PayPal's Fraud Detection with Deep Learning in H2O World 2014
TRANSCRIPT
Outline
• About PayPal
• Fraud Prevention @ PayPal
• Fraud Prevention Dilemma & Solution (Deep Learning)
• Experimental Setup
• Results
• Conclusions
About PayPal: Unmatched Competitive Advantage
• +150M Active Digital Wallets
• Deep Relationships
• Core Competency In Risk
• Global Platform with Huge Momentum
[Chart: 123M in 2012; 143M in 2013]
About PayPal: Innovative leader in payment…
PAYMENT CODE
• QR scanning that generates a payment code for easy checkout
• Fully able to integrate with existing POS systems; no rip & replace
• Available in select markets today
WEARABLE TECH
• Payments on any type of mobile device
• Available in select markets today
Fraud Prevention @ PayPal
• State-of-the-art feature engineering, machine learning and statistical models
• Highly scalable and multi-layered infrastructure software
• Superior team of data scientists, researchers, financial and intelligence analysts
Fraud Prevention @ PayPal
Transaction Level
• Employs state-of-the-art machine learning and statistical models to flag fraudulent behavior up-front
• More sophisticated algorithms after the transaction is complete
Account Level
• Monitor account-level activity to identify abusive behavior
• Abusive patterns include frequent payments, suspicious profile changes
Network Level
• Monitor account-to-account interaction
• Frequent transfer of money from several accounts to one central account
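The network-level pattern described above (several accounts sending money to one central account) can be illustrated with a simple fan-in count. This is only a minimal sketch, not PayPal's method: the function name `fan_in_accounts`, the `min_senders` threshold, and the sample transfers are all hypothetical.

```python
from collections import defaultdict

def fan_in_accounts(transfers, min_senders=3):
    """Return receivers paid by at least `min_senders` distinct senders.

    `transfers` is an iterable of (sender, receiver, amount) tuples.
    The threshold is an illustrative assumption, not a value from the slides.
    """
    senders_by_receiver = defaultdict(set)
    for sender, receiver, _amount in transfers:
        # A set keeps each sender once, however many payments they make.
        senders_by_receiver[receiver].add(sender)
    return {
        receiver: len(senders)
        for receiver, senders in senders_by_receiver.items()
        if len(senders) >= min_senders
    }

# Hypothetical sample data: three accounts funnel money into "hub".
transfers = [
    ("a1", "hub", 20.0),
    ("a2", "hub", 35.0),
    ("a3", "hub", 15.0),
    ("a1", "b1", 5.0),
    ("a2", "hub", 10.0),
]
flagged = fan_in_accounts(transfers, min_senders=3)
```

A real system would presumably also weigh amounts, timing, and account history; the count of distinct senders is just the simplest signal that matches the slide's wording.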
Fraud Prevention Dilemma
• Fraudsters are becoming increasingly smart and adaptive
• Need cost-effective solutions that can model complex attack patterns not previously observed
• Need scalable and computationally efficient prediction models
Fraud Prevention Dilemma Solution: Deep Learning
Why Deep Learning?
• Helps to unearth low-level complex abstractions
• Helps to learn complex highly varying functions not present in the training examples
• Widely employed for image, video processing and object recognition
Why H2O?
• Highly scalable
• Superior performance
• Flexible deployment
• Works seamlessly with other big data frameworks
• Simple interface
Experiment
• Dataset
– 160 million records
– 1500 features (150 categorical)
– 0.6TB compressed in HDFS
• Infrastructure
– 800-node Hadoop (CDH3) cluster
• Decision
– fraud/not-fraud
Experiment
[Diagram: R client, H2O Mapper, HDFS]
• Setup
– 800-node Hadoop (CDH3) cluster
– R as a client
• H2O cloud formation failed
– H2O mapper needs memory upfront
– Cluster capacity limitations
Experiment
[Diagram: R client, H2O Cloud, HDFS]
• Setup
– 800-node Hadoop (CDH3) cluster
– 5-node H2O cloud (24 CPUs; 144GB RAM)
– R as a client
• Import failed
– Data was Snappy-compressed
Experiment
[Diagram: R client, H2O Cloud, HDFS]
• Setup
– 800-node Hadoop (CDH3) cluster
– 5-node H2O cloud (24 CPUs; 144GB RAM)
– R as a client
– GZIP'ed data
• Import too slow
– 1GB/hour
– Not parallelized
Experiment
[Diagram: R client, H2O Cloud, HDFS]
• Setup
– 800-node Hadoop (CDH3) cluster
– 5-node H2O cloud (24 CPUs; 144GB RAM)
– R as a client
– GZIP'ed data
– Cliff's fix (1 GB import: from 1 hour to 10 minutes)
• Deep Learning failed
– Rows with missing values were skipped
– 99% of rows had missing values
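The failure above follows directly from the numbers: if rows with any missing value are skipped and nearly every row has one, almost no training data remains. The sketch below contrasts row skipping with column-mean imputation on a toy table. Mean imputation is an assumption chosen for illustration, since the slides do not say how the problem was eventually fixed; the function names and data are hypothetical.

```python
import math

def drop_incomplete(rows):
    """Keep only rows with no missing (NaN) values."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]

def impute_column_means(rows):
    """Replace each NaN with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Fall back to 0.0 for a column with no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]

nan = float("nan")
# Toy data: three of four rows have at least one missing value.
rows = [
    [1.0, nan, 3.0],
    [nan, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [3.0, 4.0, nan],
]
kept = drop_incomplete(rows)          # only the one complete row survives
imputed = impute_column_means(rows)   # all four rows survive
```

With wide data (1500 features here), even a small per-feature missing rate makes a fully complete row rare, which is consistent with the 99% figure on the slide.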
Experiment
[Diagram: R client, H2O Cloud, HDFS]
• Setup
– 800-node Hadoop (CDH3) cluster
– 5-node H2O cloud (24 CPUs; 144GB RAM)
– R as a client
– GZIP'ed data
– Cliff's fix (1 GB import: from 1 hour to 10 minutes)
– Arno's fixes
• Deep Learning slow
Experiment
[Diagram: R client, H2O Cloud, HDFS]
• Setup
– 800-node Hadoop (CDH3) cluster
– 5-node H2O cloud (24 CPUs; 144GB RAM)
– R as a client
– GZIP'ed data
– Cliff's fix (1 GB import: from 1 hour to 10 minutes)
– Arno's fixes & suggestions
– Reduced data
  • 10 million rows (60% training; 20% validation; 20% test)
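A 60/20/20 split like the one on this slide can be sketched as follows. The slides do not say whether the rows were split randomly or by time, so the seeded random shuffle here is an assumption, and `split_60_20_20` is a hypothetical helper name.

```python
import random

def split_60_20_20(rows, seed=0):
    """Shuffle rows and split them into 60% train, 20% validation, 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = n * 6 // 10
    n_valid = n * 2 // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # remainder goes to the test set
    return train, valid, test

train, valid, test = split_60_20_20(range(1000))
```

For fraud data, a time-ordered split (train on earlier weeks, test on later ones) is often preferred, and the week-based test sets reported later in the deck suggest something along those lines was also done.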
Experimental Design
Parameter               Range
# of hidden layers      2, 4, 6, 8
# of neurons            200, 300, 400, 500, 600, 700
activation function     Rectifier; Tanh; Maxout; RectifierWithDropout
feature subset          All, subset1 – subset7
test data set           All, week4 – week8
L1/L2 regularization    0 - 1
epoch                   500
10 million rows / 1500 features (60% training; 20% validation; 20% test)
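The parameter ranges above can be enumerated programmatically. The sketch below expands three of the listed parameters into individual configurations. It assumes a full factorial grid, which the slides do not confirm (the results that follow appear to vary one factor at a time), and the dictionary keys and helper names are hypothetical.

```python
from itertools import product

# Ranges copied from the experimental design table; key names are illustrative.
grid = {
    "hidden_layers": [2, 4, 6, 8],
    "neurons": [200, 300, 400, 500, 600, 700],
    "activation": ["Rectifier", "Tanh", "Maxout", "RectifierWithDropout"],
}

def configurations(grid):
    """Yield one dict per combination of parameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def hidden_layout(config):
    """Expand a config into per-layer sizes, e.g. 6 layers of 600 neurons."""
    return [config["neurons"]] * config["hidden_layers"]

configs = list(configurations(grid))  # 4 * 6 * 4 combinations
```

Adding feature subsets, test sets, and regularization values multiplies the count further, which is one reason a one-factor-at-a-time design is attractive when each run trains for 500 epochs.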
Results
How much depth is required?
# of hidden layers (Rectifier, 2 layer, 200 neurons, 500 epoch, L1/L2 = 0)    Area Under ROC Curve (AUC)
2                       0.762
4                       0.821
6                       0.839
8                       0.839
Best performance with 6 layers
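For readers unfamiliar with the metric in these tables: AUC is the probability that a randomly chosen fraudulent example receives a higher score than a randomly chosen legitimate one, with ties counted as half. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the experiment.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for fraud, 0 for not-fraud. scores: higher means more suspicious.
    This is O(P * N) and meant for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: 2 fraud cases, 3 legitimate cases.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
value = auc(labels, scores)   # 5 of 6 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the move from 0.762 to 0.839 in the table is a substantial gain in ranking quality.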
Results
Which activation function produces best result?
Activation function (6 layers; 600 neurons)    AUC
Tanh                    0.801
Rectifier               0.856
Maxout                  0.826
RectifierWithDropout    0.865
Best performance with RectifierWithDropout
Results
Which subset of features produces best result?
Feature subset          AUC
subset1                 0.836
subset2                 0.847
subset3                 0.849
subset4                 0.844
subset5                 0.834
subset6                 0.786
subset7                 0.751
Best performance with subset3; worst for subset7 (two-thirds fewer features)
Results
Can deep network improve subset7?
Configuration (Subset7)                                   AUC
Epoch: 500; Hidden: 2 layers; Neurons: 200 each layer     0.751
Epoch: 500; Hidden: 6 layers; Neurons: 600 each layer     0.86
11% improvement in performance with 1/3rd of the feature set
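The arithmetic behind the 11% figure can be checked from the two AUC values on this slide. The slides do not state whether the improvement is absolute or relative, so the reading below (that 11% refers to the absolute AUC difference) is an inference from the numbers.

```python
shallow_auc = 0.751   # 2 layers, 200 neurons each, subset7
deep_auc = 0.86       # 6 layers, 600 neurons each, subset7

# Absolute difference in AUC: about 0.109, i.e. roughly 11 points.
absolute_gain = deep_auc - shallow_auc

# Relative improvement over the shallow network: about 14.5%.
relative_gain = absolute_gain / shallow_auc
```

The absolute difference of about 0.109 matches the quoted 11%, while the relative gain would be closer to 14.5%.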
Results
Is deep learning temporally robust?
Test Set                AUC
Week 4                  0.856
Week 8                  0.861
Week 12                 0.852
Week 16                 0.858
Week 20                 0.853
Performance within 1% difference up to 20 weeks
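The "within 1%" claim can be verified from the weekly values in the table. A small sketch, assuming the claim refers to the spread between the best and worst weekly AUC:

```python
# AUC by test week, copied from the results table.
weekly_auc = {4: 0.856, 8: 0.861, 12: 0.852, 16: 0.858, 20: 0.853}

best_week = max(weekly_auc, key=weekly_auc.get)    # week with highest AUC
worst_week = min(weekly_auc, key=weekly_auc.get)   # week with lowest AUC

# Spread between best and worst week: about 0.009, under one AUC point.
spread = weekly_auc[best_week] - weekly_auc[worst_week]
```

The values show no monotonic decay over the 20 weeks; the lowest AUC is at week 12 and the highest at week 8.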
Conclusions
• Deep Learning using H2O is beneficial for payment fraud prevention
– Network architecture: 6 layers with 600 neurons each performed the best
– Activation function: RectifierWithDropout performed the best
– Improved performance with limited feature set & a deep network (11% improvement with a third of the original feature set, 6 hidden layers, 600 neurons each)
– Robust to temporal variations
Conclusions
• Lessons learned in using H2O
– Slow import process
– Issues with compressed data, missing values, sparse data
– Requires knowledge of performance knobs
– Fantastic support from H2O team
• Next Steps
– Multi-class classification
– Productionalize