down the dependency rabbit hole - intel.github.io · down the dependency rabbit hole machine...

Post on 18-Jul-2020

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Down The Dependency Rabbit HoleMachine Learning as a first line of defense in Intel’s Dependency Review Process

2@twitter handle

$ envNAME=John AndersenHOME=Portland ORUSER=pdxjohnnyWORK=IntelROLE=Open Source Security Software EngineerEMAIL=john.s.andersen@intel.comINTERESTS=Embedded, Linux, Containers, Concurrency, Web Apps, PythonTHESIS=Machine Learning

3

Dependency Evaluation Review Form

@twitter handle

▪ Initial dataset is made up of

▪ URL of source repo

▪ Security team’s classification (Good / Bad)

▪ Review form data

▪ Plan

▪ Train model on dataset

▪ Assess accuracy

▪ Given URL, collect data to answer form questions

▪ Predict classification by feeding collected data to model

4

Automation Attempt One

@twitter handle

5

Reviewers Rarely Fill Out Evaluation Form

@twitter handle

Bad

Datase

t

▪ Initial dataset is made up of

▪ URL of source repo

▪ Security team’s classification (Good / Bad)

▪ Forms mostly not filled out 👎

▪ Plan

▪ Train model on dataset

▪ Assess accuracy

▪ Given URL, collect data to answer form questions

▪ Predict classification by feeding collected data to model

▪ ~60% Accuracy 🚫

6

Automation Attempt One

@twitter handle

▪ Initial dataset is made up of

▪ URL of source repo

▪ Security team’s classification (Good / Bad)

▪ Plan

▪ Given URL, collect data

▪ Train model on dataset

▪ Assess accuracy

▪ Predict classification by feeding collected data to model

7

Automation Attempt Two

@twitter handle

8

Brainstorming

@twitter handle

Quarterly Ratio of Lines of Comments to Code

10

Prediction Data Flow

@twitter handle

Labeled Data

Git Repo URL

Commits

Authors

...Tensorflow

Deep Neural Network

Good

Bad

11

Request Classification Estimation

Data Flow Facilitator for Machine LearningMachine Learning made easy

▪ Data Flow

▪ Dataset generation

▪ Concurrency without dealing with locking

▪ Sources

▪ CSV

▪ JSON

▪ MySQL

▪ Models

▪ Tensorflow

▪ SciKit

13

Abstractions DFFML Provides

@twitter handle

▪ Python Library

▪ Command Line Interface

▪ HTTP API

▪ JavaScript Client

14

Consistent API

@twitter handle

15

Should I Be Installing This?

@twitter handle

16

What is a Data Flow?

@twitter handle

17

Deploy Anywhere - Command Line

18

Deploy Anywhere - HTTP

19

Deploy Anywhere - HTTP

20

Deploy Anywhere - HTTP

21

Extend Without Writing Code - Modify DataFlow

@twitter handle

▪ How to Integrate Machine Learning Tutorial

▪ https://intel.github.io/dffml/usage/integration.html

▪ shouldi

▪ pip install shouldi && shouldi install some-package-name

▪ https://intel.github.io/dffml/tutorials/operations.html

▪ Use and Contribute!

▪ Weekly Meetings: Tuesdays at 9 AM

▪ Gitter, Mailing List, and Meeting Links: https://intel.github.io/dffml/community.html

▪ Documentation: https://intel.github.io/dffml

▪ Q&A

22

Where To Go From Here

@twitter handle

top related