Post on 24-Dec-2015


Protecting the World with Big Data

Bill Pfeifer
Program Manager
Microsoft Malware Protection Center
September 2014

Abstract

We live in a world of viruses, worms, and browser threats that change and adapt on an hourly basis. Learn how Microsoft’s Protection Team, who bring you Microsoft Security Essentials and Windows Defender, has built and maintained a Big Data solution to protect Windows customers. These efforts offer monitoring and tools for release management, cloud protection, automatic signature generation, and malware research.

About Me

• Tlingit from southeast Alaska
• Interested in electronics and security from a young age
• BS from University of Alaska, Fairbanks
• MS from Purdue
• Member of AISES
• Microsoft for 3.5 years

Tlingit totem pole and community house in Totem Bight State Park, Ketchikan, Alaska. Credit: Bob and Ira Spring

MS Malware Protection Center

Offerings

• Microsoft Security Essentials

• Windows Defender

• System Center Endpoint Protection

• Office 365 Protection

• Azure Protection

• Windows Store protection

MMPC main goals

Help ensure all Microsoft customers are protected
• Protect the unprotected
• Remain security vendor agnostic
• Publish world-class security content, world security posture

Disrupt malware ecosystem
• Remove malware value proposition
• Reduce malware’s reach and life span with cloud protection, machine learning, automation, and faster sample collection
• Identify and work with partners to eliminate malware monetization schemes

Build a strong and united ecosystem
• Drive increased malware sample, telemetry, knowledge sharing
• Formalize strategic partner relationships with vendors, CERTs, e-commerce, application vendors and distributors
• Sponsor coordinated malware eradication campaigns

PROTECTION SENSORS
• Windows 8+ Defender: 55M
• Windows 7- MSE: 94M
• Enterprise (SCCM, Intune): 6M
• MSRT monthly cleanup: 1.2B
• Azure, Office 365
• Windows 7- Defender: 309M

DAILY
• 675K new samples
• 250M cloud calls
• 12 sig releases

RESULTS
• 79% protected
• 13% encounters
• 3.6% infected
• 122M unique files

Too many expired AVs on Windows 8+

Malware out-paces sigs

GOALS

Ensure all of Microsoft’s customers are protected
• Measure, push user-not-protected scenarios

Eradicate malware
• Apply new protection techniques
• Amplify researchers with automation
• Block the first time with the cloud

Lead antimalware ecosystem
• Drive appropriate behavior
• Coordinate activities across industry and e-commerce players
• Fix testing perception, testing approach

The usual suspects
• Malware families rarely die: 466 make up top 99% of infections
• Disruption helps, but most families come back…
• … and when they do, they come back more resilient

Inefficiency: Lingering malware infections

Encounters vs. Infections

Data from Microsoft real time protection clients

Heat map shows rate of encounters (Blue->Green->Yellow->Red)

Country color signifies % of customers with infections

Of note
• Flooding works: more encounters mean more infections
• Worldwide: 9% encounters, 3% infections
• The trick is to stop the encounters

http://www.microsoft.com/security/sir/threat/default.aspx

Threat Family reports

Family          Encounters  Infections  Industry Misses
Jenxcus          1,804,868     188,540          177,523
OptimizerElite     195,846     157,777                -
Zbot               237,470     143,353           34,107
Brantall           433,403     113,526                -
Wysotot            536,452     105,791          186,541
Rotbrow            508,577     100,354          271,326
Necurs             127,059      96,505                -
Sality             455,342      91,492           50,485
Rovnix              85,228      71,019                -
Kilim              196,326      67,871            1,198
Ramnit             437,128      67,171           30,629
Upatre             110,502      64,375                -
Clikug             441,677      62,438                -
Gamarue            920,694      58,110           26,624
Virut              223,311      57,594            2,516
Filcout          2,791,531      52,555        7,495,272
Spacekito           77,385      51,652           53,379
Napolar            156,702      51,492            1,973
Alureon             72,305      50,762            3,860
Dorkbot            471,114      48,697            8,128
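One way to read the Threat Family figures is to divide infections by encounters for each family. The sketch below does that for four rows copied from the slide. The ratio and the function names are illustrative assumptions, not a metric the deck defines, and the slide does not say whether both columns cover the same machines and time window.

```python
# Rows copied from the Threat Family slide: (family, encounters, infections).
ROWS = [
    ("Jenxcus", 1_804_868, 188_540),
    ("OptimizerElite", 195_846, 157_777),
    ("Filcout", 2_791_531, 52_555),
    ("Rovnix", 85_228, 71_019),
]


def infection_ratio(encounters: int, infections: int) -> float:
    """Infections per encounter (illustrative ratio, not an MMPC-defined KPI)."""
    return infections / encounters


def rank_by_ratio(rows):
    """Sort families from the highest to the lowest infections-per-encounter."""
    return sorted(rows, key=lambda r: infection_ratio(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for family, enc, inf in rank_by_ratio(ROWS):
        print(f"{family:<15} {infection_ratio(enc, inf):6.1%}")
```

Under this reading, a family with few encounters can still rank high when most of its encounters end in an infection, which is the case the "stop the encounters" point does not cover on its own.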

Antimalware automation

[Diagram: a central Big Data store (samples, telemetry, reputation, determinations) connected to the stages Collection, Analysis, Auto-classification, Signature generation, and Telemetry response]

Inputs to Collection
• Industry: samples, meta-data, reputation, determinations
• Customers: telemetry, samples

Collection
- Industry and customers
- Automatic and on demand

Big Data
- Samples
- Map reduce
- Processed/Workflow

Analysis
- Dynamic and static
- Vendor rescans/determinations
- Human-supplied patterns

Auto-classification
- Combine analysis with reputation
- Assign determination, family
- Feeds sig-gen and cloud protection

Signature Generation
- Best-fit signature
- Static and proactive
- Signature release pipeline

Telemetry Monitoring
- FP detection
- Never unknowns
- Sample requests

Business Intelligence Team

Query Masters
• Dashboards
• Livesite reporting
• PoR meetings
• Researcher tools
• Query optimizations

Data Infrastructure

Multiple data sources
• Windows Update
• Watson Error Reporting
• Software Quality Metrics
• Telemetry Threat/suspicious reports

Features

Storage & Usage Numbers

Threat Telemetry

             Per day   For ½ year
Raw data     2 TB      360 TB
Reduced      200 GB    36 TB

More than 200 engineers and researchers on the protection team

2 Cosmos clusters, 3 VC instances each

2 PB stored between clusters

10K jobs/day

1.5K adhoc jobs/day

4.5 PB read/day by adhoc

Queue wait time of 2 minutes
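The storage and usage figures above are mutually consistent, which a few lines of arithmetic can confirm. The sketch assumes a 180-day half year and decimal units (1 TB = 1000 GB, 1 PB = 1000 TB). Both are assumptions implied by the slide's totals rather than stated on it, and the per-job average is derived here, not quoted from the deck.

```python
DAYS_PER_HALF_YEAR = 180  # assumption: 2 TB/day -> 360 TB implies 180 days
GB_PER_TB = 1000          # assumption: decimal units
TB_PER_PB = 1000          # assumption: decimal units


def half_year_total(per_day: float, days: int = DAYS_PER_HALF_YEAR) -> float:
    """Total volume accumulated over the retention window."""
    return per_day * days


# Threat telemetry volumes from the slide.
raw_half_year_tb = half_year_total(2)                        # 2 TB/day raw
reduced_half_year_tb = half_year_total(200) / GB_PER_TB      # 200 GB/day reduced
reduction_factor = (2 * GB_PER_TB) / 200                     # raw vs. reduced per day

# Ad hoc usage: 4.5 PB read/day spread over 1.5K ad hoc jobs/day.
avg_read_per_adhoc_job_tb = 4.5 * TB_PER_PB / 1500

if __name__ == "__main__":
    print(f"raw, half year:      {raw_half_year_tb} TB")
    print(f"reduced, half year:  {reduced_half_year_tb} TB")
    print(f"reduction factor:    {reduction_factor}x")
    print(f"avg read per ad hoc: {avg_read_per_adhoc_job_tb} TB")
```

The derived average of about 3 TB read per ad hoc job is larger than the 2 PB stored divided across jobs would suggest, so ad hoc queries must re-read the same data many times a day, which is consistent with the caching work described later in the deck.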

First Impressions

Issues we found

Missing features
• No coding guidelines
• Limited shared libraries
• No discoverability
• No scheduling

Impact
• Duplication of work
• Duplication of data
• Long execution time

Expensive operations

CROSS APPLY (de-serializing rows)

CLUSTER BY (partitioning the storage)

Data Skews
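Data skew is easiest to see with a toy partitioner. The sketch below hashes keys into a fixed number of partitions, measures how uneven the result is, and then applies key salting, a common general mitigation. Everything here is illustrative: the workload, the function names, and the salting approach are assumptions, and the deck does not say how the team handled skew in Cosmos.

```python
import zlib
from collections import Counter


def partition_of(key: str, partitions: int) -> int:
    """Deterministic hash partitioning (crc32 avoids Python's per-process hash seed)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def partition_sizes(keys, partitions):
    """Row count per partition, in partition order."""
    counts = Counter(partition_of(k, partitions) for k in keys)
    return [counts.get(p, 0) for p in range(partitions)]


def skew_factor(sizes):
    """Largest partition divided by the mean size (1.0 means perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


def salted(keys, salts):
    """Append a rotating salt so one hot key spreads over several partitions."""
    return [f"{k}#{i % salts}" for i, k in enumerate(keys)]


# Hypothetical workload: one threat family accounts for 90% of the rows.
keys = ["HotFamily"] * 9000 + [f"Family{i}" for i in range(1000)]
before = skew_factor(partition_sizes(keys, 8))
after = skew_factor(partition_sizes(salted(keys, 64), 8))

if __name__ == "__main__":
    print(f"skew before salting: {before:.2f}")
    print(f"skew after salting:  {after:.2f}")
```

Salting trades skew for a second aggregation step: partial results per salted key have to be merged back under the original key afterwards.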

Evolution

How things started to evolve

• Intermediate outputs for multi-stage jobs
  • Rerun against the middle outputs while developing
• Documenting reusable data streams (lookup tables & contextual streams)
• Caching historical data
  • Need to write stream sets to join over date ranges
• Creating views over the caches
• Creating libraries
  • Common processors (strip the Threat Family Name out of the Threat Name)
  • Contextual metadata (geolocation)
  • Enumerations
• Stopgap scheduling
  • Task scheduler
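The "common processor" mentioned in the list above, stripping the family name out of a threat name, might look like the following. This is a minimal sketch that assumes threat names follow the publicly documented Microsoft layout Type:Platform/Family.Variant!Suffix. The function name and regular expression are hypothetical and are not the team's shared library.

```python
import re
from typing import Optional

# Assumed layout: Type:Platform/Family[.Variant][!Suffix]
_THREAT_NAME = re.compile(
    r"^(?P<type>[^:/]+):(?P<platform>[^/]+)/(?P<family>[^.!]+)"
    r"(?:\.(?P<variant>[^!]+))?(?:!(?P<suffix>.+))?$"
)


def family_name(threat_name: str) -> Optional[str]:
    """Return the family part of a threat name, or None if it does not parse."""
    match = _THREAT_NAME.match(threat_name.strip())
    return match.group("family") if match else None


if __name__ == "__main__":
    for name in ("Worm:VBS/Jenxcus.A", "PWS:Win32/Zbot", "Trojan:Win32/Necurs!rfn"):
        print(name, "->", family_name(name))
```

Keeping one shared version of a helper like this is what removes the "duplication of work" listed under first impressions: every job groups by the same family string.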

Formalizing a Data Model

4  Metrics/KPIs Views, Metrics/KPI Streams
3  Aggregate Views, Aggregate Streams
2  Filter Views, Filter Streams
1  Curated Views, Curated Streams
0  Raw

Lookup Profiles
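The layering can be pictured as a chain of small functions, each reading only the layer beneath it. The sketch below is a toy illustration under stated assumptions: the record fields, the curation rules, and the KPI at the top are hypothetical and are not the deck's actual streams or metrics.

```python
# Layer 0: raw telemetry rows (hypothetical shape; real reports are far richer).
raw = [
    {"machine": "m1", "threat": "Worm:VBS/Jenxcus.A", "action": "blocked"},
    {"machine": "m1", "threat": "Worm:VBS/Jenxcus.A", "action": "blocked"},  # duplicate
    {"machine": "m2", "threat": "PWS:Win32/Zbot", "action": "active"},
    {"machine": "m3", "threat": "", "action": "blocked"},                    # malformed
]


def curate(rows):
    """Layer 1: drop malformed rows and exact duplicates."""
    seen, out = set(), []
    for r in rows:
        key = (r["machine"], r["threat"], r["action"])
        if r["threat"] and key not in seen:
            seen.add(key)
            out.append(r)
    return out


def filter_actives(rows):
    """Layer 2: a filter view, here 'threat still active on the machine'."""
    return [r for r in rows if r["action"] == "active"]


def aggregate_by_threat(rows):
    """Layer 3: number of distinct machines per threat."""
    machines = {}
    for r in rows:
        machines.setdefault(r["threat"], set()).add(r["machine"])
    return {threat: len(ms) for threat, ms in machines.items()}


def kpi_active_share(curated_rows, active_rows):
    """Layer 4: share of reporting machines that have an active threat."""
    reporting = {r["machine"] for r in curated_rows}
    active = {r["machine"] for r in active_rows}
    return len(active) / len(reporting) if reporting else 0.0


curated = curate(raw)
actives = filter_actives(curated)
per_threat = aggregate_by_threat(actives)
kpi = kpi_active_share(curated, actives)
```

Because each layer is materialized, a new KPI can start from an existing aggregate instead of re-reading raw telemetry, which is where the earlier "long execution time" came from.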

File Profile
Key: Sha1/Sha256
Provides: First Seen dates, Prevalence, Top sigseqs, Top filenames, etc.

Family Profile
Key: Family Name
Provides: Family owners, Class, Machine Impacts, etc.

Device Profile
Key: Machine GUID
Provides: Heartbeat rate, City, State, Country, Platform, Top Threat IDs, etc.

Filename Profile
Key: Filename
Provides: Ancestors, Top Threat IDs, Top Sigs, First Seen, etc.

Signature Profile
Key: SigSeq
Provides: Last check-in date, author, prevalence, family association, etc.

Sample Source Profile
Key: Source Name
Provides: Count of samples, efficacy of source, rate of samples, etc.

URL Profile
Key: URL
Provides: Top Threat IDs, Top Sigs, First Seen, family association, etc.

IP Profile
Key: IP
Provides: Top Threat IDs, Top Sigs, First Seen, family association, etc.
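A lookup profile is essentially a per-key rollup. As a minimal sketch, the File Profile could be built like this, assuming each report carries a hash, a machine ID, a filename, and a date. The record shape, the class, and the definition of prevalence as "distinct machines" are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FileProfile:
    sha1: str
    first_seen: date
    machines: set = field(default_factory=set)
    filenames: Counter = field(default_factory=Counter)

    @property
    def prevalence(self) -> int:
        """Assumed definition: number of distinct machines reporting the file."""
        return len(self.machines)

    def top_filenames(self, n: int = 3):
        """Most frequently reported filenames for this hash."""
        return [name for name, _ in self.filenames.most_common(n)]


def build_file_profiles(reports):
    """Roll (sha1, machine, filename, seen_on) reports up into one profile per hash."""
    profiles = {}
    for sha1, machine, filename, seen_on in reports:
        profile = profiles.get(sha1)
        if profile is None:
            profile = profiles[sha1] = FileProfile(sha1, seen_on)
        profile.first_seen = min(profile.first_seen, seen_on)
        profile.machines.add(machine)
        profile.filenames[filename] += 1
    return profiles


# Hypothetical reports; hashes are shortened placeholders.
reports = [
    ("aa11", "m1", "invoice.exe", date(2014, 9, 3)),
    ("aa11", "m2", "invoice.exe", date(2014, 9, 1)),
    ("aa11", "m2", "setup.exe", date(2014, 9, 2)),
    ("bb22", "m3", "update.exe", date(2014, 9, 2)),
]
profiles = build_file_profiles(reports)
```

The other profiles follow the same pattern with a different key (family name, machine GUID, SigSeq, URL, IP), so one rollup routine parameterized by key would cover most of them.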

Example Metrics: PSL/ESL

4  PSL / ESL KPI; Lookup Profiles
3  Misses Aggregate, Actives Aggregate, Incorrect Detections Aggregate, Failures Aggregate, Encounters Aggregate
2  Misses view, Actives view, Incorrect Detections view, Failures view, Encounters view**
1  Canonical Telemetry view*; File Report view, Memory Report view, Boot-removal Report view, Boot Report view, Rootkit Report view
0  Raw Telemetry: File Report, Memory Report, Boot-removal Report, Boot Report, Rootkit Report

PDM

Calculating the PSL and ESL based on the Protection Data Model

Operationalizing

• Monitoring
  • Job execution
  • Output stream creation
  • Dashboard creation
• Automated scheduling
  • Sangam workflows
• Production-level libraries
  • Common source
• Production-level caches
  • SLA on bug fixes, breaking change notification
• Documentation library
  • MSDN-like docs for reference and discoverability
• Testing framework

What is next?

• A/B testing
  • Increased production job stability
  • Increased agility
• Automatic dashboard generation
  • Custom views for each researcher
• Rule-based query generation
  • Parallel logic across a single data set

© 2013 Microsoft. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
