data analytics readiness - cdn.fedscoop.com

PRESENTED BY UNDERWRITTEN BY

The ability for federal agencies to harness artificial intelligence and machine learning to identify anomalous behavior on their networks depends on having essential data gathering, preparation and analytics tools in place.

This FedScoop report surveyed IT leaders at large, medium and small federal agencies to explore the state of their data analytics and machine learning capabilities; the obstacles agencies continue to face; and which types of services they’re turning to for greater support.

DATA ANALYTICS READINESS FOR CYBER RESILIENCE

EXECUTIVE SUMMARY

Managing cyber threats across today’s evolving on-premises and cloud-based IT environments depends increasingly on having robust and scalable capabilities to gather, analyze and respond to user behavior and network activity data.

That need has grown more essential for federal agencies, given the shift to hybrid and remote work models, the increasing complexity of IT systems and the government’s broader embrace of zero-trust security models. At the heart of those capabilities lies the need for clean, reliable, machine-learnable data. Federal IT and agency leaders indicate in a new FedScoop survey, however, that when it comes to harnessing data and machine learning — specifically to identify behavioral patterns that signal potential threats — they face several key challenges.

Among the major findings, based on completed responses from 160 prequalified federal IT and agency decision makers:

MACHINE LEARNING HURDLES

Though roughly two-thirds or more of federal IT leaders in the survey reported having above-industry-average capabilities to monitor, collect and analyze behavioral data across their networks, trying to train machine learning (ML) algorithms to identify and respond to anomalous behaviors on their networks remains challenging.

Among the top-ranking challenges:

• Lack of experience training/testing ML algorithms

• Lack of adequate ML-related skills

• Lack of clarity of what tools/services meet their ML needs in the market

• Lack of adequate tools

• Lack of reliable ML-ready data

SKILLS GAPS ACROSS ML PROCESS

Respondents at agencies of all sizes say they face significant deficiencies in skills across the data-to-ML processing cycle, from data ingestion to extraction/transforming/loading, to analysis, to ML training, to operationalizing ML. Those deficiencies are hampering the ability to implement zero-trust models and build greater cyber resiliency.

AGENCIES APPEAR READY TO HANDLE THE DATA

On a positive note, federal IT leaders say their organizations have the capabilities to monitor, process, store and analyze behavioral data of users, devices and applications on their networks — and well over two-thirds believe those capabilities meet or exceed industry and NIST accepted standards.

ML CHALLENGES VARY BY AGENCY SIZE

Four in 10 respondents at large agencies (10,000+ employees), which tend to deal with larger-scale data challenges, cited a lack of adequate skills as a top challenge, compared to 2 in 10 respondents at small agencies (<1,000 employees), which are often still ramping up ML efforts or relying on third parties.

Conversely, 1 in 3 respondents at small agencies cited a lack of adequate tools among their biggest challenges, compared to fewer than 1 in 4 at large agencies. And more than twice as many respondents at small and mid-size agencies struggle with a lack of reliable ML-ready data, compared to their counterparts at large agencies.

RELIANCE ON EXTERNAL SUPPORT

While federal IT leaders indicate they have the capabilities to handle anomalous behavior data, a sizeable portion also report they’re opting to tap the expertise of external service providers at every stage of the data-to-machine learning process. Agencies most often seek help with data analytics and with data integration and production, but there’s also high demand for help with ML governance and ML training.

The study, which was underwritten by Cloudera, also touches on other dimensions of data readiness, including:

• Agency capability to securely gather data at the edge of their networks as well as across their network environments.

• Where agencies are storing their ML production data.

• The extent to which agencies are relying on open-source solutions versus in-house and commercial solutions to prepare their ML data.

Respondents’ ratings of their organization’s data gathering, preparation, analytics and ML capabilities varied by size of agency, however. Leaders at larger agencies, for example, rated their ability to detect “ML model drift” lower than leaders at small to midsize agencies did.

The variances reflect in part a combination of the complexity of data and security challenges agencies are facing as well as how far along they are in preparing data for machine learning. Whether agencies are fully capitalizing on their capabilities at different stages in the data assembly and ML process remains an open question.

WHO WE SURVEYED

FedScoop conducted an online survey of 160 prequalified IT decision makers from federal government agencies — plus those contractors who support them — about their data aggregation, collection and analysis strategies to support greater cyber resilience.

The survey was conducted in August/September 2021 and underwritten by Cloudera.

BY JOB TITLE

41% IT / NETWORK OPERATIONS / DEVELOPMENT PERSONNEL

14% DATA MANAGEMENT OFFICIAL

13% C-SUITE / SENIOR BUSINESS / PROGRAM LEADER

8% IT ANALYST

7% CIO / CTO / CISO

6% IT SYSTEMS ENGINEER / IT DEVELOPER

6% IT INFLUENCER

6% OTHER (E.G. IT STAFF / USERS)

BY AGENCY SIZE

44% 10,000+ EMPLOYEES

36% 1,000 – 9,999 EMPLOYEES

20% 1 – 999 EMPLOYEES*

*20% of respondents identified as government contractors who work on or support federal agency IT projects.

*Results in the study among respondents at agencies with under 1,000 employees are subject to a greater margin of error, due to their smaller base.
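The note above about smaller bases can be illustrated with a rough calculation. The sketch below is not the survey's stated methodology: it assumes simple random sampling, a 95% confidence level (z = 1.96) and the worst-case proportion of 0.5. The function name `margin_of_error` is hypothetical, and the bases (70, 58 and 32) are taken from the chart legends in this report.

```python
import math


def margin_of_error(base: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling; z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / base)


# Respondent bases by agency size, as shown in the report's chart legends.
bases = {
    "10,000+ employees": 70,
    "1,000 - 9,999 employees": 58,
    "Fewer than 1,000 employees": 32,
}

for label, base in bases.items():
    moe_points = margin_of_error(base) * 100
    print(f"{label} (base {base}): +/- {moe_points:.1f} percentage points")
```

Under these assumptions, the smallest group (base 32) carries a margin of error of roughly 17 percentage points, against roughly 12 points for the largest group (base 70), which is why differences between small-agency results and the other groups should be read with caution.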

RESPONDENT BREAKOUT

MACHINE LEARNING FOR SECURITY: TOP CHALLENGES

What are the biggest challenges your agency is experiencing trying to train machine learning algorithms to identify behavioral patterns from devices / applications operating in your network?

[Chart: responses by agency size. Legend: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Response options: lack adequate tools; lack adequate skills; lack experience training / testing ML; lack reliable ML-ready data; lack clarity of what tools / services meet our needs in the market; other; I don’t know.]

MACHINE LEARNING PROCESS SKILLS: CRITICAL GAPS

Where is your agency facing the most significant deficiency in skills?

[Chart: responses by agency size. Response options: data ingestion; extracting, transforming, loading data (ETL); exploratory data analysis; ML training / building models; operationalizing ML model; monitoring; other; I don’t know.]

MACHINE LEARNING DATA REPOSITORIES: TOP LOCATIONS

Where is most of that machine learning data being stored?

[Chart: responses by agency size. Response options: in multiple locations on premises; in a central data repository on premises; primarily in a government-sanctioned cloud; spread across multiple on-premise and cloud environments; I don’t know.]

ABILITY TO MONITOR BEHAVIORAL ACTIVITY

Respondents rate their agency’s current ability to monitor user / device behavioral activity (Scale of 1-5)

Scale: 1 = capabilities fall below minimal industry / NIST accepted practices; 3 = meets majority of industry / NIST best practices; 5 = exceeds most industry / NIST accepted standards.

[Charts: ratings by agency size for users / devices operating at the network’s edge and for users / devices operating across an agency’s IT environment.]

Respondents rate their agency’s current ability to monitor software behavioral activity (Scale of 1-5)

[Chart: ratings by agency size for software applications operating across your IT environment.]

ABILITY TO GATHER & ANALYZE BEHAVIORAL DATA

Respondents rate their agency’s current ability to collect, analyze and store behavior activity (Scale of 1-5)

[Charts: ratings by agency size for two capabilities: collect and analyze behavioral data of users / devices / software across the IT environment; store historical behavioral data from users / devices / software operating on the network.]

ABILITY TO DETECT & RESPOND TO ANOMALOUS BEHAVIOR

Respondents rate their agency’s current ability to react to anomalous behavior (Scale of 1-5)

[Charts: ratings by agency size for three capabilities: detect anomalous user / device behavior across your IT environment in near-real time; detect anomalous application behavior across your IT environment in near-real time; respond to anomalous behavior, e.g. blocking source or access in near-real time.]

ABILITY TO USE DATA FOR AI AND ML

Respondents rate their agency’s current ability to clean and prepare data for AI/machine learning (Scale of 1-5)

[Charts: ratings by agency size for two capabilities: clean / standardize data for machine learning; prepare / annotate data for machine learning.]

Respondents rate their agencies’ current ability to deploy behavioral data for AI/machine learning (Scale of 1-5)

[Charts: ratings by agency size for two capabilities: monitor and detect machine learning “model drift”; build and operationalize your machine learning model.]

MACHINE LEARNING DATA PREPARATION: PREFERRED SOLUTION

Which type of solutions are used most by your agency to label and prepare machine learning training data?

[Chart: responses by agency size. Response options: commercial solutions; open-source solutions; in-house solutions; use managed service to perform the work; I don’t know.]

DATA READINESS FOR MACHINE LEARNING: RELIANCE ON SERVICE PROVIDERS

Which of the following external services has your agency used for its ML projects?

[Chart: responses by agency size. Response options: data ingestion; data labeling; data integration / production; data analytics; ML training; ML deployment; governance; I don’t know.]

RECOMMENDATIONS

While federal IT leaders maintain their agencies have the capabilities to ingest, prepare and analyze data, they still need help harnessing those capabilities to leverage machine learning in order to better detect and respond to anomalous behavior on their networks.

The relatively high reliance on third-party support for ML analytics suggests agencies may still be relying on SQL-based workloads — and would benefit from moving to more modern, integrated platforms for ingesting and analyzing behavioral data to improve cyber resilience.

The deficiency in skills across most ML-related data processing stages — and the rapid evolution of data management and ML tools — suggest that agencies would achieve zero-trust frameworks faster by engaging with service providers specializing in modernized data and ML solutions.

Agencies are also likely to benefit from shifting from in-house solutions to more portable and adaptable open-source solutions for collecting and preparing data for ML production, especially as data gathering and analysis increasingly take place at the edge of agency networks.

Though often underappreciated, getting data and ML governance right, and conveying it across the organization, remains a crucial component in building ML competency and achieving broader zero-trust strategies.

FedScoop is the leading tech media brand in the federal government market. With more than 3.6 million monthly unique engagements and 130,000 daily newsletter subscribers, FedScoop gathers top leaders from the White House, federal agencies, academia and the tech industry to discuss ways technology can improve government and identify ways to achieve common goals. With our website, newsletter and events, we’ve become the community’s go-to platform for education and collaboration.

CONTACT

Wyatt Kash
Senior Vice President, Content Strategy
Scoop News Group
Washington, D.C.
202.887.8001
wyatt.kash@scoopnewsgroup.com