Real-Time Fraud Detection with Storm and Kafka

WELCOME Alexey Kharlamov, VP Technology

Upload: alexey-kharlamov

Post on 11-Apr-2017


TRANSCRIPT

Page 1: Real-Time Fraud Detection with Storm and Kafka

WELCOME

Alexey Kharlamov, VP Technology

Page 2: Real-Time Fraud Detection with Storm and Kafka

WE MEASURE AND ENSURE QUALITY

•  Ad Fraud – We eliminate fraudulent impressions by ad bots and make sure ads don’t show up on fraudulent websites

•  Brand Safety – We make sure ads don’t show up in places that brands don’t want them

•  Viewability – We measure whether a person actually viewed an ad

Page 3: Real-Time Fraud Detection with Storm and Kafka

INTEGRAL ENGINEERING BY THE NUMBERS

•  Ad impressions processed: 5+ billion/day

•  HTTP requests: 50+ billion/day

•  Data centers: 10+ (AWS and on-premises)

•  Data stored in clusters: 6+ petabytes

•  New data collected daily: 20+ terabytes

•  Hadoop cluster processing cores: ~20,000

Page 4: Real-Time Fraud Detection with Storm and Kafka

H2O World, 11/10/2015

AD FRAUD

NEARLY ALL AD FRAUD IS CAUSED BY BOT ACTIVITY

Ad Stacking: Placing multiple ads on top of one another in a single ad placement, with only the top ad in view

Illegal Bots: Compromised computers with breached security defenses conceded to a third party

Pixel Stuffing: Stuffing an entire ad-supported site into a 1x1 pixel ad

Page 5: Real-Time Fraud Detection with Storm and Kafka

FRAUD DETECTION

[Diagram. Site labels: facebook, cnn, ebay, nothingtoseehere.com, thisisnotabotnet.com]

Page 7: Real-Time Fraud Detection with Storm and Kafka

REQUIREMENTS

•  Quickly identify freshly activated bots

•  High accuracy of detection algorithms

•  Avoid transfer of personal information across borders

•  Withstand single data center failure

Page 8: Real-Time Fraud Detection with Storm and Kafka

BLOCKING

MONITORING

5+ billion events per day

Page 9: Real-Time Fraud Detection with Storm and Kafka

EVENT SESSIONIZATION

[Timeline diagram. Axis: Time. Spans: Transaction 1, Transaction 2, Join Window. Rows: Impression 1, Impression 2, Impression 3. Event labels: Init, DT (repeated), Unload, Timeout. Outcomes: Emit, Emit, Drop]
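One way to read the sessionization timeline is as a per-impression buffer that collects events until an Unload arrives or a timeout expires. The sketch below is a minimal illustration under that reading, not the implementation from the talk. The class name `Sessionizer`, the event tuple layout, the timeout value, and the rule that a session without an Init event is dropped are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    impression_id: str
    first_ts: float
    events: list = field(default_factory=list)


class Sessionizer:
    """Groups events by impression and closes sessions on Unload or timeout."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout  # hypothetical join-window length
        self.open = {}          # impression_id -> Session

    def _close(self, session):
        # Assumption: only sessions that saw an Init event are emitted.
        kinds = [kind for kind, _ in session.events]
        outcome = "emit" if "Init" in kinds else "drop"
        return (outcome, session.impression_id, kinds)

    def expire(self, now):
        """Close every session whose window has elapsed at time `now`."""
        expired = [s for s in self.open.values()
                   if now - s.first_ts >= self.timeout]
        for s in expired:
            del self.open[s.impression_id]
        return [self._close(s) for s in expired]

    def on_event(self, impression_id, kind, ts):
        """Feed one event; return the list of sessions closed by it."""
        closed = self.expire(ts)
        session = self.open.setdefault(
            impression_id, Session(impression_id, ts))
        session.events.append((kind, ts))
        if kind == "Unload":
            closed.append(self._close(self.open.pop(impression_id)))
        return closed
```

The caller passes event timestamps rather than reading the system clock, so the same code behaves identically when a feed is replayed.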

Page 10: Real-Time Fraud Detection with Storm and Kafka

DATA FLOW

[Diagram. Components: Input Topic, Session Builder, QLog Topic, Fraud Detection, Hadoop, Model Training, Assets, Firewall]

Page 11: Real-Time Fraud Detection with Storm and Kafka

INTRA-DC DATA PROCESSING

•  Local log aggregation and processing

•  Transfer over long links causes all sorts of synchronization problems

•  Intra-DC links are reliable, the Internet is NOT. We can keep data locality and log time coherence

•  A single firewall server failure is not a “stop-the-world” event. Data is present on the Kafka cluster.

•  A completely autonomous system

•  Higher availability due to DC redundancy

Page 12: Real-Time Fraud Detection with Storm and Kafka

DATA CENTER ARCHITECTURE

[Diagram. Labels: Server 1, Server N, Front-End Server (repeated), STORM (repeated)]

Page 13: Real-Time Fraud Detection with Storm and Kafka

LOG SOURCING: TAILER AGENT

●  Non-invasive event sourcing

●  Decoupled data publication and event processing

●  Data fan-out

●  Hard latency requirements
    ●  <10 ms response

●  Periodic checkpoints to recover after failure
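The checkpointing idea can be sketched as a tailer that remembers the byte offset it has published up to and resumes from there after a restart. This is an illustration only: the function names, the JSON checkpoint file, and the `publish` callback (which stands in for a Kafka producer) are hypothetical and not taken from the talk.

```python
import json
import os


def load_offset(checkpoint_path):
    """Return the last checkpointed byte offset, or 0 if none exists."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["offset"]


def save_offset(checkpoint_path, offset):
    """Write the offset via a temp file so a crash cannot tear the checkpoint."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, checkpoint_path)


def tail_once(log_path, checkpoint_path, publish, checkpoint_every=100):
    """Publish complete lines appended since the last checkpoint."""
    offset = load_offset(checkpoint_path)
    sent = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line still being written
            publish(line.rstrip(b"\n").decode("utf-8"))
            offset = log.tell()
            sent += 1
            if sent % checkpoint_every == 0:
                save_offset(checkpoint_path, offset)
    save_offset(checkpoint_path, offset)
    return sent
```

Because the offset is saved only periodically, a crash between checkpoints replays some lines on restart, so downstream consumers of this sketch would need to tolerate duplicates.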

Page 14: Real-Time Fraud Detection with Storm and Kafka

RECOVERY STRATEGY

•  Read logs in micro-batches and maintain state in memory

•  Reliable processing
    -  On successful operation: write a checkpoint
    -  On failure: return to the previous checkpoint
    -  On catastrophic failure: rewind the data feed to a point before the problem started
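The three recovery rules can be illustrated with a small in-memory processor. This is a sketch under stated assumptions, not the system described in the talk: the class `CheckpointedProcessor`, the list-based feed, and the choice to reset state to empty on rewind are all hypothetical.

```python
import copy


class CheckpointedProcessor:
    """Keeps state in memory and snapshots it after each good micro-batch."""

    def __init__(self, apply_event):
        self.apply_event = apply_event  # hypothetical handler: (state, event)
        self.state = {}
        self.offset = 0                 # feed position after the last good batch
        self._checkpoint = (0, {})

    def process_batch(self, feed, batch_size):
        """Process one micro-batch from `feed`; return True on success."""
        batch = feed[self.offset:self.offset + batch_size]
        try:
            for event in batch:
                self.apply_event(self.state, event)
        except Exception:
            # On failure: return to the previous checkpoint.
            self.offset, saved = self._checkpoint
            self.state = copy.deepcopy(saved)
            return False
        # On success: write a checkpoint.
        self.offset += len(batch)
        self._checkpoint = (self.offset, copy.deepcopy(self.state))
        return True

    def rewind(self, offset):
        """Catastrophic failure: restart from an earlier feed position.

        Assumption: state is rebuilt from scratch by replaying the feed.
        """
        self.offset, self.state = offset, {}
        self._checkpoint = (offset, {})
```

A failed batch leaves no partial updates behind, because the state is restored from the last snapshot before the batch is retried.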

Page 15: Real-Time Fraud Detection with Storm and Kafka

LOGICAL TIME

●  Wall-clock does not work
    ●  Load spikes
    ●  Recovery rewinds data feed to previous time

●  Logical clock
    ●  Maximum timestamp seen by Bolt
    ●  New messages with smaller timestamp are late

●  No clock synchronization
    ●  All bolts are in “weak synchrony”
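The logical clock described here fits in a few lines. The following is a minimal sketch; the class and method names are hypothetical, and a real bolt would keep one such clock per task.

```python
class LogicalClock:
    """Tracks logical time as the maximum event timestamp seen so far."""

    def __init__(self):
        self.now = None  # no events observed yet

    def observe(self, timestamp):
        """Advance the clock; return True if the message is late."""
        if self.now is None or timestamp >= self.now:
            self.now = timestamp
            return False
        # A timestamp smaller than the maximum seen so far is late.
        return True
```

Since the clock advances only when events arrive, a load spike or a rewound feed does not cause spurious timeouts the way a wall clock would.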

Page 16: Real-Time Fraud Detection with Storm and Kafka

DEBUGGING AND MONITORING

•  Metrics recording and visualization are an essential component of the development cycle
    -  Ease correlation of failure symptoms
    -  Accelerate the build/deploy/debug cycle
    -  Provide a trace for production issues

•  Monitor business metrics
    -  This is the only thing you care about
    -  Technical issues may or may not have consequences

•  Do it a lot
    -  150K metrics/sec

Page 17: Real-Time Fraud Detection with Storm and Kafka

GLOBAL CONFIGURATION

[Diagram. Labels: EAST COAST, EUROPE, DC-X; Stream Mirror (three instances); Kafka Backbone; CENTRAL; Spark; Hadoop]

Page 18: Real-Time Fraud Detection with Storm and Kafka

LESSONS LEARNED

•  Use staged roll-out
    -  Start from minimal infrastructure for log delivery

•  Do not try to build a fortress
    -  It is much easier to build a system that accepts limited data loss

•  Minimize persistent state
    -  Slows the system down
    -  Expensive to maintain

•  Hardware matters