data analytics readiness - cdn.fedscoop.com
PRESENTED BY UNDERWRITTEN BY
The ability for federal agencies to harness artificial intelligence and machine learning to identify anomalous behavior on their networks depends on having essential data gathering, preparation and analytics tools in place.
This FedScoop report surveyed IT leaders at large, medium and small federal agencies to explore the state of their data analytics and machine learning capabilities; the obstacles agencies continue to face; and which types of services they’re turning to for greater support.
DATA ANALYTICS READINESS FOR CYBER RESILIENCE
EXECUTIVE SUMMARY
Managing cyber threats across today’s evolving on-premises and cloud-based IT environments depends increasingly on having robust and scalable capabilities to gather, analyze and respond to user behavior and network activity data.
That need has grown more essential for federal agencies, given the shift to hybrid and remote work models, the increasing complexity of IT systems and the government’s broader embrace of zero-trust security models. At the heart of those capabilities lies the need for clean, reliable, machine-learnable data. Federal IT and agency leaders indicate in a new FedScoop survey, however, that when it comes to harnessing data and machine learning — specifically to identify behavioral patterns that signal potential threats — they face several key challenges.
Among the major findings, based on completed responses from 160 prequalified federal IT and agency decision makers:
MACHINE LEARNING HURDLES
Though roughly two-thirds or more of federal IT leaders in the survey reported having above-industry-average capabilities to monitor, collect and analyze behavioral data across their networks, trying to train machine learning (ML) algorithms to identify and respond to anomalous behaviors on their networks remains challenging.
Among the top-ranking challenges:
• Lack of experience training/testing ML algorithms
• Lack of adequate ML-related skills
• Lack of clarity of what tools/services meet their ML needs in the market
• Lack of adequate tools
• Lack of reliable ML-ready data
SKILLS GAPS ACROSS ML PROCESS
Respondents at agencies of all sizes say they face significant deficiencies in skills across the data-to-ML processing cycle, from data ingestion to extraction/transforming/loading, to analysis, to ML training, to operationalizing ML. Those deficiencies are hampering the ability to implement zero-trust models and build greater cyber resiliency.
AGENCIES APPEAR READY TO HANDLE THE DATA
On a positive note, federal IT leaders say their organizations have the capabilities to monitor, process, store and analyze behavioral data of users, devices and applications on their networks — and well over two-thirds believe those capabilities meet or exceed industry and NIST accepted standards.
ML CHALLENGES VARY BY AGENCY SIZE
Four in 10 respondents at large agencies (10,000+ employees), which tend to deal with larger-scale data challenges, cited a lack of adequate skills as a top challenge, compared to 2 in 10 respondents at small agencies (<1,000 employees), which are often still ramping up ML efforts or relying on third parties.
Conversely, 1 in 3 respondents at small agencies cited a lack of adequate tools among their biggest challenges, compared to fewer than 1 in 4 at large agencies. And more than twice as many respondents at small and mid-size agencies struggle with a lack of reliable ML-ready data, compared to their counterparts at large agencies.
RELIANCE ON EXTERNAL SUPPORT
While federal IT leaders indicate they have the capabilities to handle anomalous behavior data, a sizeable portion also report they’re opting to tap the expertise of external service providers at every stage of the data-to-machine learning process. Agencies most often seek help with data analytics and with data integration and production, but there’s also high demand for help with ML governance and ML training.
The study, which was underwritten by Cloudera, also touches on other dimensions of data readiness, including:
• Agency capability to securely gather data at the edge of their networks as well as across their network environments.
• Where agencies are storing their ML production data.
• The extent to which agencies are relying on open-source solutions versus in-house and commercial solutions to prepare their ML data.
Respondents’ ratings of their organization’s data gathering, preparation, analytics and ML capabilities varied by size of agency, however. Leaders at larger agencies, for example, rated their ability to detect “ML model drift” lower than leaders at small to midsize agencies did.
The variances reflect in part a combination of the complexity of data and security challenges agencies are facing as well as how far along they are in preparing data for machine learning. Whether agencies are fully capitalizing on their capabilities at different stages in the data assembly and ML process remains an open question.
WHO WE SURVEYED
FedScoop conducted an online survey of 160 prequalified IT decision makers from federal government agencies — plus those contractors who support them — about their data aggregation, collection and analysis strategies to support greater cyber resilience.
The survey was conducted in August/September 2021 and underwritten by Cloudera.
BY JOB TITLE
41% IT / NETWORK OPERATIONS / DEVELOPMENT PERSONNEL
14% DATA MANAGEMENT OFFICIAL
13% C-SUITE / SENIOR BUSINESS / PROGRAM LEADER
8% IT ANALYST
7% CIO / CTO / CISO
6% IT SYSTEMS ENGINEER / IT DEVELOPER
6% IT INFLUENCER
6% OTHER (E.G. IT STAFF / USERS)
BY AGENCY SIZE
44% 10,000+ EMPLOYEES
36% 1,000 – 9,999 EMPLOYEES
20% 1 – 999 EMPLOYEES*
*20% of respondents identified as government contractors who work on or support federal agency IT projects.
*Results in the study among respondents at agencies with under 1,000 employees are subject to a greater margin of error, due to their smaller base.
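To illustrate why a smaller base carries a greater margin of error, the following is a minimal sketch in Python. It assumes simple random sampling, a 95% confidence level (z = 1.96) and the worst-case proportion of 50%; the report does not state its sampling method or confidence level, so these are illustrative assumptions rather than the study's own methodology. The function name `margin_of_error` is hypothetical. The base sizes (70, 58 and 32, totaling 160) are the ones reported in the survey's chart legends.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p observed in a sample of size n.

    Assumes simple random sampling (an assumption, not stated in the report).
    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Base sizes reported in the survey's chart legends.
bases = {
    "10,000+ employees": 70,
    "1,000 - 9,999 employees": 58,
    "Less than 1,000 employees": 32,
    "All respondents": 160,
}

for label, n in bases.items():
    moe_points = margin_of_error(n) * 100
    print(f"{label}: base {n}, about +/-{moe_points:.1f} percentage points")
```

Under these assumptions, the margin of error for the 32-respondent group is roughly 17 percentage points, compared with roughly 12 points for the 70-respondent group, so differences of a few points between agency-size groups should be read with caution.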
RESPONDENT BREAKOUT
MACHINE LEARNING FOR SECURITY: TOP CHALLENGES
What are the biggest challenges your agency is experiencing trying to train machine learning algorithms to identify behavioral patterns from devices / applications operating in your network?
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Response options: lack adequate tools; lack adequate skills; lack experience training / testing ML; lack reliable ML-ready data; lack clarity of what tools / services meet our needs in the market; other; I don’t know.]
MACHINE LEARNING PROCESS SKILLS: CRITICAL GAPS
Where is your agency facing the most significant deficiency in skills?
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Response options: data ingestion; extracting, transforming, loading data (ETL); exploratory data analysis; ML training / building models; operationalizing ML model; monitoring; other; I don’t know.]
MACHINE LEARNING DATA REPOSITORIES: TOP LOCATIONS
Where is most of that machine learning data being stored?
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Response options: in multiple locations on premises; in a central data repository on premises; primarily in a government-sanctioned cloud; spread across multiple on-premises and cloud environments; I don’t know.]
ABILITY TO MONITOR BEHAVIORAL ACTIVITY
Respondents rate their agency’s current ability to monitor user / device behavioral activity (scale of 1-5).
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Scale: 1 = capabilities fall below minimal industry / NIST accepted practices; 3 = meets majority of industry / NIST best practices; 5 = exceeds most industry / NIST accepted standards. Panels: users / devices operating at the network’s edge; users / devices operating across an agency’s IT environment.]
ABILITY TO MONITOR BEHAVIORAL ACTIVITY
Respondents rate their agency’s current ability to monitor software behavioral activity (scale of 1-5).
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Scale: 1 = capabilities fall below minimal industry / NIST accepted practices; 3 = meets majority of industry / NIST best practices; 5 = exceeds most industry / NIST accepted standards. Panel: software applications operating across your IT environment.]
ABILITY TO GATHER & ANALYZE BEHAVIORAL DATA
Respondents rate their agency’s current ability to collect, analyze and store behavioral activity (scale of 1-5).
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Scale: 1 = capabilities fall below minimal industry / NIST accepted practices; 3 = meets majority of industry / NIST best practices; 5 = exceeds most industry / NIST accepted standards. Panels: collect and analyze behavioral data of users / devices / software across the IT environment; store historical behavioral data from users / devices / software operating on the network.]
ABILITY TO DETECT & RESPOND TO ANOMALOUS BEHAVIOR
Respondents rate their agency’s current ability to react to anomalous behavior (scale of 1-5).
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Scale: 1 = capabilities fall below minimal industry / NIST accepted practices; 3 = meets majority of industry / NIST best practices; 5 = exceeds most industry / NIST accepted standards. Panels: detect anomalous user / device behavior across your IT environment in near-real time; detect anomalous application behavior across your IT environment in near-real time; respond to anomalous behavior, e.g. blocking source or access in near-real time.]
ABILITY TO USE DATA FOR AI AND ML
Respondents rate their agency’s current ability to clean and prepare data for AI/machine learning (scale of 1-5).
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Scale: 1 = capabilities fall below minimal industry / NIST accepted practices; 3 = meets majority of industry / NIST best practices; 5 = exceeds most industry / NIST accepted standards. Panels: clean / standardize data for machine learning; prepare / annotate data for machine learning.]
ABILITY TO USE DATA FOR AI AND ML
Respondents rate their agencies’ current ability to deploy behavioral data for AI/machine learning (scale of 1-5).
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Scale: 1 = capabilities fall below minimal industry / NIST accepted practices; 3 = meets majority of industry / NIST best practices; 5 = exceeds most industry / NIST accepted standards. Panels: monitor and detect machine learning “model drift”; build and operationalize your machine learning model.]
MACHINE LEARNING DATA PREPARATION: PREFERRED SOLUTION
Which type of solutions are used most by your agency to label and prepare machine learning training data?
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Response options: commercial solutions; open-source solutions; in-house solutions; use managed service to perform the work; I don’t know.]
DATA READINESS FOR MACHINE LEARNING: RELIANCE ON SERVICE PROVIDERS
Which of the following external services has your agency used for its ML projects?
[Chart, by agency size: 10,000+ employees (base: 70); 1,000 – 9,999 employees (base: 58); less than 1,000 employees (base: 32). Response options: data ingestion; data labeling; data integration / production; data analytics; ML training; ML deployment; governance; I don’t know.]
RECOMMENDATIONS
While federal IT leaders maintain their agencies have the capabilities to ingest, prepare and analyze data, they still need help harnessing those capabilities to leverage machine learning in order to better detect and respond to anomalous behavior on their networks.
The relatively high reliance on third-party support for ML analytics suggests agencies may still be relying on SQL-based workloads — and would benefit from moving to more modern, integrated platforms for ingesting and analyzing behavioral data to improve cyber resilience.
The deficiency in skills across most ML-related data processing stages — and the rapid evolution of data management and ML tools — suggest that agencies would achieve zero-trust frameworks faster by engaging with service providers specializing in modernized data and ML solutions.
Agencies are also likely to benefit from shifting from in-house solutions to more portable and adaptable open-source solutions for collecting and preparing data for ML production, especially as data gathering and analysis increasingly take place at the edge of agency networks.
Though often underappreciated, getting data and ML governance right, and conveying it across the organization, remains a crucial component in building ML competency and achieving broader zero-trust strategies.
FedScoop is the leading tech media brand in the federal government market. With more than 3.6 million monthly unique engagements and 130,000 daily newsletter subscribers, FedScoop gathers top leaders from the White House, federal agencies, academia and the tech industry to discuss ways technology can improve government and identify ways to achieve common goals. With our website, newsletter and events, we’ve become the community’s go-to platform for education and collaboration.
CONTACT
Wyatt Kash, Senior Vice President, Content Strategy, Scoop News Group, Washington, D.C., 202.887.8001