DATA SCIENCE OPS IN PRACTICE
Learn How Splunk Enables Fast Science for Cybersecurity Operations

OLISA STEPHENSBAILEY
DAVID BRENMAN
SEPTEMBER 2017

Innovation Center, Washington, D.C.


AGENDA

SECTION 1: UNDERSTANDING THE CORE NEED

SECTION 2: CROSSING THE ANALYSIS CHASM

SECTION 3: ANALYSIS WORKFLOW DEMONSTRATION

SECTION 4: ACTION ITEMS FOR YOUR PROJECTS

LEARN HOW TO:

• ADDRESS CULTURAL CHALLENGES

• ENSURE YOUR DATA SCIENCE SOLUTIONS GET USED

• HARNESS THE FULL POWER OF PYTHON WITHIN SPLUNK


UNDERSTANDING THE CORE NEED


THE ROLE OF DATA SCIENCE IN CYBER OPERATIONS

• The rate of data growth is outpacing human capabilities

• We must optimize impact of the people we do have

• Data Science is a powerful tool to reduce the scale of the problem

• In response to these needs, Booz Allen Hamilton was tasked with integrating Data Science into the Watchfloor

[1]

CYBER OPERATIONS ANALYSTS & DATA SCIENTISTS POINTS OF VIEW

Cyber Operations Analysts: "I must meet my quota, I don't have time for toys"

• Are evaluated on quantity of output
• Have a clearly defined SOP
• Will lose productivity every time they invest in learning a new tool
• Do not need new tools to be effective
• Are leery of buggy prototype code
• Have a distrust of the black box Machine Learning algorithm

Data Scientists: "The old way is out of date, we must improve"

• Like to understand what the Analyst is trying to do rather than fit existing solution to problem
• Are evaluated on development of novel methods
• Gain honor and reputation from implementing cutting edge algorithms
• Do not like supporting legacy software
• Have an unwavering trust in mathematics

APPRECIATING YOUR ROLE: FOUNDATIONAL KEY TO SUCCESS

• The most important lesson learned

• Analysts are in a power position:
- They are needed
- They own the domain knowledge
- They own the tradecraft
- They own the accesses
- They own the data

• It is the responsibility of the Data Scientist to show respect and learn
- The Data Scientist is intruding into the Analyst's domain

Analysts are fully capable of meeting their current objectives without Data Science

[2]


CROSSING THE ANALYSIS CHASM


BRIDGING THE GAP BETWEEN ANALYSTS & DATA SCIENTISTS IN OPERATIONS

• Many Analysts do not understand applied statistics or machine learning and do not understand how it can be applied to their domain

• Data Scientists wishing to make an impact should:
- Minimize Number of Tools: minimize the number of new widgets an analyst needs to learn
- Provide Evidence: provide all results with meaningful supporting evidence
- Ensure Interpretability: weight clarity as much as performance in algorithm selection
- Silence Is a Virtue: appreciate that reporting there are no results is far better than false positives

• Host your end-solutions in the tool environment they use
- If Analysts Use Splunk, You Use Splunk
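A minimal Python sketch of the "Provide Evidence" and "Silence Is a Virtue" principles above. The record layout (`src`, `dest_port`), the function name `flag_port_scanners`, and the threshold of 100 ports are illustrative assumptions, not something taken from the talk.

```python
from collections import defaultdict

def flag_port_scanners(records, min_ports=100):
    """Return findings only for sources that clearly exceed the threshold.

    Each finding carries the evidence an analyst needs to verify it.
    An empty list is a valid answer when nothing stands out.
    """
    ports_by_src = defaultdict(set)
    for rec in records:
        ports_by_src[rec["src"]].add(rec["dest_port"])

    findings = []
    for src, ports in sorted(ports_by_src.items()):
        if len(ports) >= min_ports:
            findings.append({
                "src": src,
                "reason": f"contacted {len(ports)} distinct ports (threshold {min_ports})",
                "sample_ports": sorted(ports)[:5],
            })
    return findings

# Hypothetical sample data: one quiet host and one host touching 200 ports.
quiet = [{"src": "10.0.0.5", "dest_port": 443}] * 50
noisy = [{"src": "10.0.0.9", "dest_port": p} for p in range(1, 201)]

print(flag_port_scanners(quiet))          # nothing to report
print(flag_port_scanners(quiet + noisy))  # one finding, with its evidence
```

Returning the reason and a sample of the underlying values alongside each finding lets the analyst check the claim against raw data instead of trusting a score.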

LEVERAGING THE POWER & FLEXIBILITY WITH PYTHON & SPLUNK

Python

• Pros
- Provides developers with access to wide array of data processing libraries
- Object-Oriented program design
- Rapid prototype scripting language

• Cons
- Must be able to code
- Developed projects tend to be individual objects
- Steep learning gap for users

Splunk

• Pros
- Single unified system for collecting, digesting and querying data
- Attractive 2D plotting
- Users able to seamlessly navigate to raw data behind plots

• Cons
- Query language narrows findings
- Lacks flexibility of a programming language
- Limited Python library within SDK

Combine the development flexibility of Python with the consistency of Splunk to benefit Analysts

STEP #1 - WORK DIRECTLY WITH ANALYSTS TO SOURCE A USE CASE

• Our Data Science team works directly with Analysts on analytic objectives
- To identify malicious or aberrant behavior within a new batch of log data
- To detect suspicious URLs

• Their workflow consisted of:
1. Digest log files into Splunk
2. Label fields
3. Explore the data with SMEs and via Splunk queries
4. Report any new Splunk queries of value

We expedite Analysts' Splunking by
• Grouping similar observations
• Highlighting suspicious outliers
• Unlocking new features [4]
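As a rough illustration of grouping similar observations and surfacing outliers for the suspicious-URL use case above, the sketch below groups URLs by hostname and lists the hosts seen least often. The function names, the sample URLs, and the "seen at most once" cutoff are assumptions made for this example; the talk does not say how the team grouped or scored URLs.

```python
from collections import Counter
from urllib.parse import urlsplit

def group_by_host(urls):
    """Count how many observed URLs share each hostname."""
    return Counter(urlsplit(u).hostname or "" for u in urls)

def rare_hosts(urls, max_count=1):
    """Hosts seen at most max_count times: candidates for review, not verdicts."""
    counts = group_by_host(urls)
    return sorted(host for host, n in counts.items() if n <= max_count)

# Hypothetical observations pulled from a batch of logs.
urls = [
    "http://intranet.example.com/home",
    "http://intranet.example.com/wiki",
    "http://intranet.example.com/mail",
    "http://updates.example.net/check",
    "http://updates.example.net/fetch",
    "http://x9k2.example.org/a?id=1",
]

print(group_by_host(urls))
print(rare_hosts(urls))
```

Rarity alone does not make a host malicious; the grouping only narrows where the analyst looks first.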

STEP #2 – SELECT METHOD FOR INTEGRATING DATA SCIENCE CAPABILITIES

METHOD 1
• This method has proven capable in rapid delivery situations
• Identify a linking field and export the data out of Splunk
• Process the data with any Data Science Software
• Create a new CSV and use previous linking field to enrich original data

[Flow diagram]
Splunk: Raw Data → Data Formatted & Indexed → Identify Linking Field → Data Exported to CSV
External Software: Import Any Libraries → Run Any Software Application → Print CSV With Linking Field
Splunk: Import CSV as Lookup Table → Run Splunk Processing Query → Enriched Data Ready For Use
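The external-software half of Method 1 can be sketched in a few lines of Python: read the exported CSV, compute a new column, and write a second CSV keyed on the linking field. The linking field name `event_id`, the derived column `bytes_ratio`, and the function name `enrich` are hypothetical; only `bytes_in` and `bytes_out` appear as field names elsewhere in these slides.

```python
import csv
import io

def enrich(exported_csv, link_field="event_id"):
    """Read a CSV export, score each row, and return lookup-table CSV text."""
    reader = csv.DictReader(io.StringIO(exported_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[link_field, "bytes_ratio"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        bytes_in = float(row["bytes_in"])
        bytes_out = float(row["bytes_out"])
        total = bytes_in + bytes_out
        # Share of traffic leaving the host; 0.0 when there was no traffic.
        ratio = bytes_out / total if total else 0.0
        writer.writerow({link_field: row[link_field], "bytes_ratio": f"{ratio:.3f}"})
    return out.getvalue()

# Stand-in for the file exported from Splunk.
exported = "event_id,bytes_in,bytes_out\n1,100,300\n2,0,0\n"
lookup_csv = enrich(exported)
print(lookup_csv)
```

The resulting file would then be loaded into Splunk as a lookup table and joined back to the original events on the linking field with the `lookup` search command.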

STEP #2 – SELECT METHOD FOR INTEGRATING DATA SCIENCE CAPABILITIES

METHOD 2
• Slower to set up first time, but highly effective after that
• Use your own Python environment
• Able to leverage any library: Scikit-Learn, TensorFlow, Theano, Scrapy, etc.

[Flow diagram]
Splunk: Raw Data → Data Formatted & Indexed → Call Your Splunk/Python App → Your App Starts External Python Session
External Python: Import Any Libraries → Run Any Software Application
Splunk: Your App Returns Results to Splunk → Run Standard Splunk Queries
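The hand-off in Method 2, where an app starts a separate Python session and collects its results, can be approximated with the standard `subprocess` module. This is a sketch under assumptions: the JSON-over-stdin/stdout exchange, the `run_external` helper, and the `url_length` field are invented for illustration, and `sys.executable` stands in for the path to your own Python environment. It does not show how the talk's app was wired into Splunk.

```python
import json
import subprocess
import sys

# Stand-in for a script living in your own Python environment, where any
# library could be imported. Here it only uses the standard library.
EXTERNAL_SCRIPT = r"""
import json, sys
rows = json.load(sys.stdin)
for row in rows:
    row["url_length"] = len(row["url"])
json.dump(rows, sys.stdout)
"""

def run_external(rows, interpreter=sys.executable):
    """Send rows to a separate Python process and return its enriched rows."""
    proc = subprocess.run(
        [interpreter, "-c", EXTERNAL_SCRIPT],
        input=json.dumps(rows),
        capture_output=True,
        text=True,
        check=True,  # raise if the external session fails
    )
    return json.loads(proc.stdout)

result = run_external([{"url": "http://example.com/a"}])
print(result)
```

Keeping the heavy libraries in the external interpreter means the calling side only needs to pass rows out and read rows back.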

STEP #3 – EXECUTE MACHINE LEARNING ALGORITHM DEVELOPMENT PROCESS

• Splunk is a powerful asset in many stages of the Machine Learning process

1. Data Collection & Aggregation (raw data from multiple sources): Splunk makes it easy!
2. Pre-Processing & Cleaning
3. Feature Extraction & Vectorization: external software needed for advanced feature calculations
4. Apply ML Algorithm
5. Post Analysis of Results: Splunk really shines when it comes time to present your results
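The middle stages of the process above (cleaning, feature extraction, applying an algorithm) can be sketched with the standard library alone. The field names, host names, and the choice of a modified z-score (median and median absolute deviation, with the commonly used 0.6745 constant and 3.5 cutoff) are assumptions standing in for whatever algorithm a team actually selects.

```python
import statistics

def clean(raw_rows):
    """Pre-processing: drop rows whose byte count is missing or not numeric."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append({"host": row["host"], "bytes": float(row["bytes"])})
        except (KeyError, TypeError, ValueError):
            continue
    return cleaned

def extract_features(rows):
    """Feature extraction: total bytes per host, as a {host: value} mapping."""
    totals = {}
    for row in rows:
        totals[row["host"]] = totals.get(row["host"], 0.0) + row["bytes"]
    return totals

def robust_outliers(features, cutoff=3.5):
    """Apply a simple algorithm: modified z-score from the median and MAD."""
    values = list(features.values())
    if not values:
        return []
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    return sorted(
        host for host, v in features.items()
        if 0.6745 * abs(v - median) / mad > cutoff
    )

# Hypothetical rows, including one that fails cleaning.
raw_rows = [
    {"host": "ws-01", "bytes": "100"},
    {"host": "ws-02", "bytes": "110"},
    {"host": "ws-03", "bytes": "90"},
    {"host": "ws-04", "bytes": "105"},
    {"host": "ws-05", "bytes": "95"},
    {"host": "ws-06", "bytes": "5000"},
    {"host": "ws-07", "bytes": "n/a"},
]
suspects = robust_outliers(extract_features(clean(raw_rows)))
print(suspects)
```

A transparent statistic like this is easy to explain to an analyst, which matters as much as raw performance when choosing the algorithm.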


ANALYSIS WORKFLOW DEMONSTRATION


LOOK FAMILIAR?


STEP #4 – SHOW EVIDENCE TO SUPPORT ANALYSIS RESULTS

THE NOTORIOUS BLACK BOX

JUST BELIEVE ME ‘CAUSE I’M AWESOME!

BEFORE BETTER APPS…

Classic Wireshark

Good ‘Ol Excel

OUR NEW FEATURE EXTRACTION APPLICATION BRINGS NEW INSIGHTS TO ANALYSIS

New Stream App Feature Examples – Avoid Basic Summary Table Overhead

Avg: IP, port, time

Statistical: sum(bytes), sum(bytes_in), sum(bytes_out), sum(packets_in), sum(packets_out), sum(response_time), sum(time_taken)

Our New Feature Examples - Make Better Use of ML Toolkit

Numeric: duration

Statistical: num_bytes_cli2srv, num_bytes_srv2cli, num_packets_cli2srv, num_packets_srv2cli, packet_deltat_avg_cli2srv, packet_deltat_avg_srv2cli, packet_deltat_entropy_2way, packet_deltat_entropy_cli2srv, packet_deltat_entropy_srv2cli

We added 46 new features!
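The slides do not define how features such as `packet_deltat_avg_cli2srv` or `packet_deltat_entropy_cli2srv` are computed. One plausible reading, sketched below, treats them as the mean and the Shannon entropy of the time gaps between consecutive packets in one direction. The function names, the binning step, and the 1-second bin width are assumptions for illustration only.

```python
import math
from collections import Counter

def packet_deltas(timestamps):
    """Inter-arrival times between consecutive packets, in seconds."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def deltat_avg(timestamps):
    """Mean inter-arrival time; 0.0 when there are fewer than two packets."""
    deltas = packet_deltas(timestamps)
    return sum(deltas) / len(deltas) if deltas else 0.0

def deltat_entropy(timestamps, bin_width=1.0):
    """Shannon entropy (bits) of inter-arrival times after binning."""
    deltas = packet_deltas(timestamps)
    if not deltas:
        return 0.0
    bins = Counter(math.floor(d / bin_width) for d in deltas)
    total = len(deltas)
    return sum((n / total) * math.log2(total / n) for n in bins.values())

# Hypothetical packet timestamps for two flows.
regular = [0, 1, 2, 3, 4]      # evenly spaced, like a beacon
irregular = [0, 1, 3, 6, 10]   # gaps of 1, 2, 3 and 4 seconds

print(deltat_avg(regular), deltat_entropy(regular))
print(deltat_avg(irregular), deltat_entropy(irregular))
```

Under this reading, evenly spaced traffic produces low entropy while varied gaps produce higher entropy, which is the kind of signal a summary statistic like `sum(bytes)` cannot express.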


NEW STREAM APP ENABLES DIRECT ACCESS TO RAW PCAP IN SPLUNK

NEW STREAM APP GIVES ANALYSTS MORE INFORMATION

ML TOOLKIT ENABLES EXPLORATORY DATA ANALYSIS IN SPLUNK


STOCK SPLUNK ML TOOLKIT HAS LIMITED FEATURES AVAILABLE FOR ANALYSIS

90% of ML is Pre-Processing & Feature Extraction

Crafting Features is Necessary Before Feeding The MLTK

DATA SCIENTISTS CAN ADD NEW FEATURES DIRECTLY INTO SPLUNK FOR EDA


USER EXPERIENCE AND SUPPORTING EVIDENCE FOR DATA SCIENTISTS


USER EXPERIENCE AND SUPPORTING EVIDENCE FOR ANALYSTS


LIVE DEMO


ACTION ITEMS FOR YOUR PROJECTS


CULTURAL HURDLES & SUCCESSES

• Tactics used to overcome cultural barriers
- You must go to the analyst; they will show you their analysis process AND grant you keys to their data troves
- You must be willing to explain what analysis techniques you are using simply, using their terminology as much as possible
- Someone on your team has to be willing to talk to the customers and their customers - this helps establish a new, collaborative tribe
- Your work must roll up into a story that tells the why and so what of the work - sometimes this is the closest one gets to ROI
- Marketing & branding are extremely important for breaking entrenched thinking and coaxing participation in something new & shiny

• Build an interdisciplinary team
- Unicorns are hard to find and the best solutions often are a product of divergent thought
- Data analysis is a pipeline, a journey of sorts… it takes domain experts from fields other than just computer science or mathematics
- Having data scientists that have expertise in the Cyber Operations mission space will accelerate success

FOUR STEPS TO APPLYING DATA SCIENCE WITHIN CYBER OPERATIONS

• STEP #1 - WORK DIRECTLY WITH ANALYSTS TO SOURCE A USE CASE

• STEP #2 – SELECT METHOD FOR INTEGRATING DATA SCIENCE CAPABILITIES

• STEP #3 – EXECUTE MACHINE LEARNING ALGORITHM DEVELOPMENT PROCESS

• STEP #4 – SHOW EVIDENCE TO SUPPORT ANALYSIS RESULTS

TAKE AWAYS

1) Your data science team must go to the analyst

2) Populate your results where the user checks

3) Develop self-contained limited size products that can be iteratively updated and delivered

4) Data Scientists must be concerned with justifying their claims

5) Splunk can be enhanced by leveraging external scripting

INNOVATING THE CYBER DOMAIN THROUGH THE APPLICATION OF DATA SCIENCE
