Building a Machine Learning App with AWS Lambda

Posted on 08-Feb-2017

BUILDING A MACHINE LEARNING APPLICATION WITH AWS LAMBDA

Ludi Rehak, ludi@h2o.ai

Silicon Valley Big Data Science Meetup, March 17, 2016

(+ help from Tom and Prithvi)

Q: What is AWS Lambda?
A: AWS Lambda is a compute service that runs code (a Lambda function) on demand. It simplifies the process of running code in the cloud by managing compute resources automatically.

Offloads DevOps tasks related to VMs:
• Server and operating system maintenance
• Capacity provisioning
• Scaling
• Code monitoring and logging
• Security patches

MAJOR STEPS

Step 1: Identify problem to solve
Step 2: Train model on data
Step 3: Export the model as a POJO
Step 4: Write code for Lambda handler
Step 5: Build deployment package (.zip file) and upload to Lambda
Step 6: Map API endpoint to Lambda function
Step 7: Embed endpoint in application

A CONCRETE USE CASE: DOMAIN NAME CLASSIFICATION

Malicious domains
• Carry out malicious activity: botnets, phishing, malware hosting, etc.
• Names are generated by algorithms to defeat security systems

Goal: Classify domains as legitimate vs. malicious

Legitimate      Malicious
h2o             zyxgifnjobqhzptuodmzov
zen-cart        c3p4j7zdxexg1f2tuzk117wyzn
fedoraforum     batdtrbtrikw

FEATURES

• String length
• Shannon entropy
  o Measure of uncertainty in a random variable
• Number of substrings that are English words
• Proportion of vowels
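The four features can be sketched in plain Python. This is a minimal sketch, not the app's code: the names `entropy`, `p_vowels` and `num_valid_substrings` echo the Jython module, but their bodies are assumptions, as are the minimum substring length of 3, treating only a, e, i, o, u as vowels, and passing the English word set as an explicit argument. Shannon entropy is taken over the character frequencies p_i of the name, H = -Σ p_i log2(p_i), so random-looking names with many distinct characters score higher.

```python
import math
from collections import Counter


def entropy(s):
    """Shannon entropy (bits per character) of the character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def p_vowels(s):
    """Proportion of characters in s that are vowels (a, e, i, o, u; an assumption)."""
    if not s:
        return 0.0
    return sum(ch in "aeiou" for ch in s.lower()) / len(s)


def num_valid_substrings(s, words, min_len=3):
    """Count substrings of s with length >= min_len that appear in the word set.

    min_len=3 is an assumption, chosen to avoid counting trivial one- or two-letter words.
    """
    s = s.lower()
    count = 0
    for i in range(len(s)):
        for j in range(i + min_len, len(s) + 1):
            if s[i:j] in words:
                count += 1
    return count


def features(domain, words):
    """[length, entropy, vowel proportion, English-word substrings] for one domain."""
    name = domain.split(".")[0]  # keep only the leftmost label, as the Jython code does
    return [len(name), entropy(name), p_vowels(name), num_valid_substrings(name, words)]
```

For example, with the word set {"fedora", "forum", "for", "ora"}, `num_valid_substrings("fedoraforum", ...)` counts four matches.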

DATA

• Domains and whether they are malicious
  o http://datadrivensecurity.info/blog/data/2014/10/legit-dga_domains.csv.zip
  o 133,927 rows
• English words
  o https://raw.githubusercontent.com/dwyl/english-words/master/words.txt
  o 354,985 rows
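A sketch of loading the two data sources, assuming the domain CSV has been unzipped and has a header row with `domain` and `class` columns in which malicious rows are labeled `dga`. Those column names and values are assumptions; check the file before relying on them. Both helpers take open file objects, so they work with files opened via `open(path, newline="")` as well as with the small in-memory sample used here.

```python
import csv
import io


def load_words(lines):
    """Lowercase set of English words from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def load_domains(csv_file, domain_col="domain", label_col="class", malicious_value="dga"):
    """(domain, label) pairs, with label 1 for malicious and 0 otherwise.

    domain_col, label_col and malicious_value are assumptions about the CSV layout.
    """
    reader = csv.DictReader(csv_file)
    return [(row[domain_col], int(row[label_col] == malicious_value)) for row in reader]


# Usage with an in-memory sample built from the example names on the use-case slide.
sample = io.StringIO("host,domain,class\nh2o.ai,h2o,legit\nbatdtrbtrikw.com,batdtrbtrikw,dga\n")
print(load_domains(sample))  # [('h2o', 0), ('batdtrbtrikw', 1)]
```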

MODEL INFORMATION

MaliciousDomainModel

Algorithm: GLM
Model family: Binomial
Regularization: Ridge
Threshold (max F1): 0.4935

Confusion matrix on validation data (rows: actual class; columns: predicted class)

Actual \ Predicted   0        1        Error
0                    15889    315      FPR 0.0194
1                    346      10043    FNR 0.0333
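The Error column can be recomputed from the counts, treating class 1 as the positive class: FPR = FP / (TN + FP) and FNR = FN / (FN + TP). Overall accuracy is not on the slide and is derived here only for illustration.

```python
def error_rates(tn, fp, fn, tp):
    """False positive rate and false negative rate, treating class 1 as positive."""
    fpr = fp / (tn + fp)  # actual 0, predicted 1
    fnr = fn / (fn + tp)  # actual 1, predicted 0
    return fpr, fnr


def accuracy(tn, fp, fn, tp):
    """Share of all rows classified correctly."""
    return (tn + tp) / (tn + fp + fn + tp)


# Counts from the validation confusion matrix (rows = actual, columns = predicted).
fpr, fnr = error_rates(tn=15889, fp=315, fn=346, tp=10043)
print(round(fpr, 4), round(fnr, 4))  # 0.0194 0.0333
print(round(accuracy(15889, 315, 346, 10043), 4))  # 0.9751
```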

WORKFLOW FOR THIS APP

Input domain name → Get predictions → Malicious domain?
Yes → Malicious
No → Legitimate → Visit web page

APP ARCHITECTURE DIAGRAM

JavaScript App → (HTTPS POST: domain name) → REST endpoint → Lambda

Inside Lambda: Lambda Function Handler → Jython Feature Munging → H2O Model POJO → Prediction

Lambda → (JSON with prediction) → JavaScript App
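The JavaScript app's call to the REST endpoint can be sketched in Python with the standard library. The endpoint URL below is a placeholder, not the app's real endpoint, and the JSON field name `domain` is assumed from the handler's `request.domain`; the shape of the JSON response depends on the app's `ResponseClass` and is not specified here.

```python
import json
import urllib.request

# Hypothetical placeholder; API Gateway assigns the real URL when the API is deployed.
ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod/predict"


def build_request(url, domain):
    """Build (but do not send) an HTTPS POST carrying the domain name as JSON."""
    body = json.dumps({"domain": domain}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def classify(url, domain, timeout=10):
    """Send the request and decode the JSON prediction returned by the Lambda function."""
    with urllib.request.urlopen(build_request(url, domain), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the payload without network access.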

LAMBDA FUNCTION HANDLER

import com.amazonaws.services.lambda.runtime.Context;
import org.python.core.PyException;

public static ResponseClass myHandler(RequestClass request, Context context) throws PyException {
    PyModule module = new PyModule();
    // Prediction code is in pymodule.py
    double[] predictions = module.predict(request.domain);
    return new ResponseClass(predictions);
}
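The deck's handler targets the Java runtime. For comparison, a Python-runtime handler with the same contract might look like the sketch below, assuming a non-proxy API Gateway integration so that the event is the parsed JSON body (the role `RequestClass` plays in Java). `predict` is a hypothetical placeholder for the model call, and the `predictions` response key is an assumption.

```python
def predict(domain):
    """Hypothetical placeholder for the model call; the real app scores with the H2O POJO."""
    return [0.0]


def handler(event, context):
    """Python-runtime entry point: read the domain from the event, return predictions.

    Lambda serializes the returned dict to JSON, much as the Java runtime
    serializes ResponseClass.
    """
    domain = event["domain"]
    predictions = predict(domain)
    return {"predictions": predictions}
```

In the Lambda console, such a function would be configured with a handler string of the form `module_name.handler`.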

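JYTHON FEATURE MUNGING

from hex.genmodel.easy import EasyPredictModelWrapper, RowData

# entropy, p_vowels and num_valid_substrings are helper functions defined elsewhere;
# MaliciousDomainModel and NamesHolder_MaliciousDomainModel are the generated POJO classes.
def predict(domain):
    domain = domain.split('.')[0]  # keep only the leftmost label
    row = RowData()
    functions = [len, entropy, p_vowels, num_valid_substrings]
    eval_features = [f(domain) for f in functions]
    names = NamesHolder_MaliciousDomainModel().VALUES
    beta = MaliciousDomainModel().BETA().VALUES
    feature_coef_product = [beta[len(beta) - 1]]  # last coefficient is the intercept
    for i in range(len(names)):
        row.put(names[i], float(eval_features[i]))
        feature_coef_product.append(eval_features[i] * beta[i])

    # prediction
    model = EasyPredictModelWrapper(MaliciousDomainModel())
    p = model.predictBinomial(row)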

H2O MODEL POJO

static final class BETA_0 implements java.io.Serializable {
  static final void fill(double[] sa) {
    sa[0] = 1.49207826021648;
    sa[1] = 2.8502716978560194;
    sa[2] = -8.839804567200542;
    sa[3] = -0.7977065034624655;
    sa[4] = -14.94132841574946;
  }
}
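Applying the POJO coefficients by hand shows how a binomial GLM turns features into a probability: p = 1 / (1 + exp(-(β_intercept + Σ β_i x_i))), with a domain labeled malicious when p reaches the max-F1 threshold of 0.4935. This sketch follows the Jython code in treating the last BETA entry as the intercept and pairing the remaining entries with the features in order; that column order and the use of `>=` at the threshold are assumptions, and the POJO's own `predictBinomial` remains the authoritative scorer.

```python
import math

# Coefficients copied from BETA_0. Following the Jython code, the last entry is the
# intercept; the others are assumed to pair with the features in the order
# [length, entropy, vowel proportion, English-word substrings].
BETA = [1.49207826021648, 2.8502716978560194, -8.839804567200542,
        -0.7977065034624655, -14.94132841574946]
THRESHOLD = 0.4935  # max-F1 threshold from the model summary


def malicious_probability(features, beta=BETA):
    """Logistic function of intercept + sum of coefficient * feature."""
    eta = beta[-1] + sum(b * x for b, x in zip(beta[:-1], features))
    return 1.0 / (1.0 + math.exp(-eta))


def is_malicious(features, threshold=THRESHOLD):
    """Label as malicious when the probability is at or above the threshold."""
    return malicious_probability(features) >= threshold
```

A long, high-entropy, vowel-poor name such as the slide's `zyxgifnjobqhzptuodmzov` pushes the linear predictor far above zero, while a short name such as `h2o` leaves it strongly negative.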

HANDS-ON DEMONSTRATION

STEP 1: Build
$ git clone https://github.com/h2oai/app-malicious-domains
$ cd app-malicious-domains
$ gradle wrapper
$ ./gradlew build

STEP 2: Create Lambda function and set API endpoint
See instructions and screenshots in README.md

STEP 3: Use the app in a web browser
$ ./gradlew jettyRunWar -x generateModel
Then open http://localhost:8080

TROUBLESHOOTING

• Common Python errors
  o Another H2O is already running
  o Python script can't find the data in h2o.import_file()
• Common Java errors
  o Java not installed at all
  o Only a JRE installed: a JDK (Java Development Kit) is required so that the Java compiler is available
  o Not connected to the internet: Gradle needs to fetch some dependencies from the internet
• Common Lambda errors
  o Error in uploading .zip file: check whether the function already exists and, if not, try again. For slower internet connections, try uploading the .zip file with an S3 link.
  o Timeout error when testing Lambda function: go to advanced settings and increase the Timeout value
  o Gateway Timeout (504 error): this is Lambda's cold start behavior. Keep retrying; eventually Lambda kicks in.

CAVEATS

• Stateless
  o Can access stateful data by calling other web services, such as Amazon S3 or Amazon DynamoDB
• Cold start behavior
  o Containers are instantiated on the first request, reused for later requests, and stay active for a window of time (10-20 minutes)
  o "The longer I leave it between invocations, the longer the function takes to warm up"
• API Gateway timeout of 10 sec
  o Can request a longer timeout
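Cold starts follow from how Lambda reuses containers: module-level code runs once when a container starts, and later invocations routed to the same warm container skip it. The hypothetical demo handler below makes that visible with a counter. Because Lambda is stateless, module-level state like this must never be relied on for correctness (it is lost whenever a new container starts); it is only suitable for caching expensive setup such as a loaded model or word list.

```python
import time

# Module-level code runs once per container, at cold start. In a real function this is
# where expensive setup (loading the model or the English word list) would go.
_CONTAINER_STARTED = time.time()
_invocations = 0


def handler(event, context):
    """Reports whether this call hit a fresh (cold) container or a reused (warm) one."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,
        "invocations_in_this_container": _invocations,
        "container_age_sec": round(time.time() - _CONTAINER_STARTED, 3),
    }
```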

CONFIGURING LAMBDA FUNCTIONS

• Memory
  o Allocates proportional CPU power, network bandwidth, and disk I/O
  o Easy single-dial solution
  o Log shows how much memory was used, useful for tuning and cost savings
• Timeout

LAMBDA RESOURCE LIMITS

Resource                                            Default Limit
Memory                                              512 MB
Number of file descriptors                          1,024
Number of processes and threads (combined total)    1,024
Maximum execution duration per request              300 seconds
Invoke request body payload size                    6 MB
Invoke response body payload size                   6 MB
Concurrent executions per region                    100

Item                                                Default Limit
Lambda function deployment package size (.zip/.jar file)                                          50 MB
Size of code/dependencies that you can zip into a deployment package (uncompressed zip/jar size)  250 MB
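A pre-upload check against the two package-size limits in the table might look like this. It is a sketch that assumes 1 MB = 1024 × 1024 bytes and measures the uncompressed size as the sum of the archive members' file sizes; `package_sizes`, `within_limits` and `check_package_file` are hypothetical helpers, not part of any AWS tool.

```python
import io
import zipfile

MAX_ZIPPED_BYTES = 50 * 1024 * 1024     # deployment package (.zip/.jar) limit
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024  # uncompressed code/dependencies limit


def package_sizes(data):
    """(zipped_bytes, unzipped_bytes) for a deployment package given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        unzipped = sum(info.file_size for info in zf.infolist())
    return len(data), unzipped


def within_limits(data):
    """True if both the zipped and the uncompressed sizes are under the limits."""
    zipped, unzipped = package_sizes(data)
    return zipped <= MAX_ZIPPED_BYTES and unzipped <= MAX_UNZIPPED_BYTES


def check_package_file(path):
    """Check a package on disk, e.g. the .zip produced by the build."""
    with open(path, "rb") as f:
        return within_limits(f.read())


# Usage: build a tiny package in memory and check it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("lambda_function.py", "def handler(event, context):\n    return {}\n")
print(package_sizes(buf.getvalue()), within_limits(buf.getvalue()))
```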

LAMBDA PRICING

• Lambda
  o Requests
    • First 1 million requests per month are free
    • $0.20 per 1 million requests thereafter
  o Duration
    • First 400,000 GB-seconds of compute time per month are free
    • $0.00001667 for every GB-second thereafter
• API Gateway
  o $3.50 per million API calls received, plus data transfer costs
• Estimate for Malicious Domain Application:
  o Lambda: $0.37/hour with 10 threads, after free tier
  o API Gateway: $0.71/hour
  o Total: ~$1/hour
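A rough hourly estimate can be computed from the list prices above, ignoring the free tier and API Gateway data transfer. Inputs would come from the performance table (throughput and median latency); rounding billed duration up to 100 ms increments is an assumption about the pricing in effect, exposed as `billing_increment_ms`. This sketch is not expected to reproduce the deck's $0.37 and $0.71 figures exactly, since the inputs behind them are not given.

```python
import math

LAMBDA_PER_MILLION_REQUESTS = 0.20
LAMBDA_PER_GB_SECOND = 0.00001667
API_GATEWAY_PER_MILLION_CALLS = 3.50


def hourly_cost(calls_per_sec, duration_ms, memory_mb, billing_increment_ms=100):
    """(lambda_cost, gateway_cost) in $/hour after the free tier.

    billing_increment_ms: duration is rounded up to this granularity before billing
    (an assumption; set it to match the pricing in effect).
    """
    calls = calls_per_sec * 3600
    billed_sec = math.ceil(duration_ms / billing_increment_ms) * billing_increment_ms / 1000
    gb_seconds = calls * billed_sec * (memory_mb / 1024)
    lambda_cost = calls / 1e6 * LAMBDA_PER_MILLION_REQUESTS + gb_seconds * LAMBDA_PER_GB_SECOND
    gateway_cost = calls / 1e6 * API_GATEWAY_PER_MILLION_CALLS
    return lambda_cost, gateway_cost
```

For instance, one million calls per hour at 100 ms and 512 MB is 50,000 GB-seconds, so Lambda costs $0.20 + $0.8335 = $1.0335 and API Gateway $3.50 for that hour.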

LAMBDA PERFORMANCE

Memory (MB)  Threads  Loops  Samples  Median (ms)  Min (ms)  Max (ms)  % Error  Throughput (calls/sec)
512          1        10000  10000    102          85        2137      0        8.4
512          10       1000   10000    102          85        30330     0.18     44
512          100      100    10000    149          85        30307     0.43     168
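The table's columns follow common load-test definitions; note that Samples = Threads × Loops in every row (1 × 10000, 10 × 1000, 100 × 100). A sketch of computing one row, assuming throughput is total samples divided by the wall-clock duration of the run and %Error is the share of failed samples:

```python
import statistics


def summarize(latencies_ms, errors, elapsed_sec):
    """Summarize a load test in the same columns as the performance table.

    latencies_ms: one latency per sample; errors: number of failed samples;
    elapsed_sec: wall-clock duration of the whole run.
    """
    n = len(latencies_ms)
    return {
        "samples": n,
        "median_ms": statistics.median(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "pct_error": 100.0 * errors / n,
        "throughput_per_sec": n / elapsed_sec,
    }
```

Reporting the median alongside the max matters here: a handful of cold-start or timed-out requests (max around 30 s) barely moves the median but dominates the tail.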

LAMBDA SCALING

• Automatically scales to support the rate of incoming requests
• "No limit to the number of requests your code can handle"
• Starts as many instances of the Lambda function as needed
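Automatic scaling is still bounded by the concurrent-executions limit in the limits table. At steady state, the number of concurrent executions is roughly the request rate times the average duration (Little's law). A minimal sketch, using the default of 100 concurrent executions per region from the table:

```python
import math


def required_concurrency(requests_per_sec, avg_duration_sec):
    """Concurrent executions needed at steady state: arrival rate x average duration."""
    return math.ceil(requests_per_sec * avg_duration_sec)


def fits_region_limit(requests_per_sec, avg_duration_sec, limit=100):
    """Compare the estimate with the per-region concurrent-execution limit."""
    return required_concurrency(requests_per_sec, avg_duration_sec) <= limit
```

Using the 100-thread row of the performance table, with the median as a stand-in for the average, 168 calls/sec at 149 ms needs about 26 concurrent executions, well under the default limit.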

RELATED EXAMPLES

• H2O generated model POJO in a Java servlet container
  o GitHub: h2oai/app-consumer-loan
• H2O generated model POJO in a Storm bolt
  o GitHub: h2oai/h2o-world-2015-training
  o tutorials/streaming/storm
• H2O generated model POJO in Spark Streaming
  o GitHub: h2oai/sparkling-water
  o examples/src/main/scala/org/apache/spark/examples/h2o/CraigslistJobTitlesStreamingApp.scala

RESOURCES ON THE WEB

• Slides
  o GitHub: h2oai/h2o-tutorials/tree/master/tutorials/aws-lambda-app
• Source code
  o GitHub: h2oai/app-malicious-domains
• Latest stable H2O for Python release
  o http://h2o.ai/download/h2o/python
• Generated POJO model Javadoc
  o http://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/docs-website/h2o-genmodel/javadoc/index.html
• AWS Lambda
  o http://docs.aws.amazon.com/lambda/latest/dg/welcome.html

Q&A

• Thanks for attending!
• Send follow-up questions to:

Ludi Rehak, ludi@h2o.ai
