Machine Learning with R and Zeppelin on Oracle Big Data Solutions

Machine Learning with R and Zeppelin on Oracle Big Data Solutions
Marcos Arancibia, Product Manager, Data Science and Big Data
October 23, 2018
Copyright © 2018, Oracle and/or its affiliates. All rights reserved.

TRANSCRIPT

Page 1: Machine Learning with R and Zeppelin on Oracle Big Data ... › otndocs › products › ...Spark 2.1+ Algorithms by Oracle, interfaces to Spark MLlib, plus HIVE, Impala and Spark

Machine Learning with R and Zeppelin on Oracle Big Data Solutions

Marcos Arancibia
Product Manager, Data Science and Big Data
October 23, 2018

Page 2:

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Page 3:

Oracle Big Data Manager

Page 4:

Conceptual

[Diagram labels: Streaming Engine, Data Lake, Enterprise Data & Reporting, Discovery Lab, Input Events, Execution, Innovation, Discovery Output, Data, Structured Enterprise Data, Actionable Events, Actionable Metrics, Actionable Data Sets]

Page 5:

Practical

[Diagram labels: Streaming Engine, Data Lake, Enterprise Data & Reporting, Discovery Lab, Input Events, Execution, Innovation, Discovery Output, Data, Structured Enterprise Data, Actionable Events, Actionable Metrics, Actionable Data Sets, Notebooks/Analytic Services, Object Store, Hadoop/HDFS]

Page 6:

Big Data Manager
• Included with all Big Data offerings (BDA, BDCC and BDCS)
• Enables massive parallel copy of data leveraging Apache Spark
– HDFS <--> HDFS
– HDFS <--> Object Storage (Cloud)
– File Diff-ing and checking after copies
• Build and manage pipelines
• Embedded Zeppelin Notebook
– Analyze data instantly
– Analyze at scale with Oracle R Advanced Analytics for Hadoop

Page 7:

Big Data Manager - Moving Data to Discovery Lab
• Included with all Big Data offerings (BDA, BDCC and BDCS)
• Enables massive parallel copy of data leveraging Apache Spark
– HDFS <--> HDFS
– HDFS <--> Object Storage (Cloud)
– File Diff-ing and checking after copies
• Build and manage pipelines
• Embedded Zeppelin Notebook
– Analyze data instantly
– Analyze at scale with Oracle R Advanced Analytics for Hadoop

See Live Demos of Big Data Manager:
Autonomous Hub - Big Data Cloud Service
Moscone South (Monday - Wednesday)

Page 8:

Page 9:

The file browser enables self-service data movement, for example from HDFS to Object Storage.

Page 10:

Importantly, this drag-and-drop or copy is turned into a Spark program, which is executed or scheduled.

Page 11:

Machine Learning on Big Data

Page 12:

Your Discovery Lab in Today's Cloud World

Hadoop/HDFS

The easiest way to build out a lab is to leverage some known basics and clustering to run jobs in parallel.

Pick your favorite notebook environment and start to code in it against your analytics libraries.

Use libraries like R, TensorFlow and Caffe for your analytics and ML, if possible in parallel.

Page 13:

Big Data Manager with R Interpreter
Ability to visualize several sources from Notebooks on BDA, BDCS and BDCC

[Diagram labels: Impala, HIVE sources; Spark DataFrames, HDFS]

Page 14:

Announcing: ORAAH 2.8.0 for Spark 2.x

Page 15:

What is ORAAH (Oracle R Advanced Analytics for Hadoop)
• ORAAH is a set of R packages and Java libraries that provide:
– An R interface for manipulating data stored in a local File System, HDFS, HIVE, Impala or JDBC sources, and creating Distributed Model Matrices across a Cluster of Hadoop Nodes in preparation for ML.
– A general computation framework where users invoke parallel, distributed MapReduce jobs from R, writing custom mappers and reducers in R while also leveraging open source CRAN packages.
– Parallel and distributed Machine Learning algorithms that take advantage of all the nodes of a Hadoop cluster for scalable, high performance modeling on big data. Functions use the expressive R formula object optimized for Spark parallel execution.
– ORAAH's custom LM/GLM/MLP NN algorithms on Spark scale better and run faster than the open-source Spark MLlib functions, but ORAAH provides interfaces to MLlib as well.
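The mapper/reducer idea in the second sub-bullet can be illustrated in plain R without a cluster. The sketch below only shows the programming pattern with base R functions; it does not call ORAAH, and the names `records`, `mapper` and `reducer` are made up for the example.

```r
# Toy input: one text record per element, as a mapper would receive it.
records <- c("spark hive spark", "hive impala", "spark")

# Mapper: emit one (key, value) pair per word in a record.
mapper <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}

# Reducer: combine all values that share a key.
reducer <- function(values) sum(values)

# Map phase, then group by key (the "shuffle"), then reduce phase.
mapped  <- do.call(rbind, lapply(records, mapper))
grouped <- split(mapped$value, mapped$key)
counts  <- vapply(grouped, reducer, integer(1))
print(counts)
```

In a distributed framework the map and reduce phases would run on different nodes; here `lapply` and `vapply` stand in for them on a single machine.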

Page 16:

Where is ORAAH available?
• On premises:
– Part of the Oracle Big Data Connectors license for the Oracle Big Data Appliance, DIY Cloudera clusters and DIY Hortonworks clusters.
• On Oracle Cloud:
– Part of the Oracle Big Data Connectors license that is included with the Oracle Big Data Cloud Service and the Oracle Big Data Cloud at Customer
– Included as part of the Big Data Cloud (formerly known as Compute Edition)

Page 17:

ORAAH Benefits: Making Spark MLlib better for R users
The ORAAH formula parser can handle the full set of open-source R formula transformations, so it can be used with any Spark MLlib algorithm supported by ORAAH. Even in newer Spark releases (Oct 2018), SparkR fails to process a simple interaction between attributes.

Using the Spark MLlib Logistic Regression model in SparkR fails:
R> model <- glm( Kyphosis ~ (Age + Number)^2, df, family = "binomial")
ERROR RBackendHandler: fitRModelFormula on org.apache.spark.ml.api.r.SparkRWrappers failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.IllegalArgumentException: Could not parse formula: Kyphosis ~ (Age + Number)^2

Using the Spark MLlib Logistic Regression model via ORAAH…
R> model <- orch.ml.logistic( Kyphosis ~ (Age + Number)^2, data = data)
OBX Model Matrix: processed 1 factor variables, 0.050 sec
OBX Model Matrix: created MLlib LabeledPoint RDD (81 rows) 0.008 sec
OBX Machine Learning: MLlib Logistic Regression elapsed time 0.858 sec
R> model$coefficients
[1] -6.568918 0.027176503 1.022537535 -0.004490547

…produces exactly the same result as open-source R
R> glm( Kyphosis ~ (Age + Number)^2, data = kyphosis, family = "binomial")$coefficients
(Intercept)          Age       Number   Age:Number
-6.568917860  0.027176503  1.022537536 -0.004490547
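To see what the formula `Kyphosis ~ (Age + Number)^2` asks a parser to produce, it can be expanded with base R alone. This is a sketch using `terms()` and `model.matrix()` from the standard `stats` package; the data frame `toy` is invented for illustration and is not the real kyphosis data set.

```r
# Expand the interaction formula from the slide using base R only.
f <- Kyphosis ~ (Age + Number)^2
term_labels <- attr(terms(f), "term.labels")
print(term_labels)  # two main effects plus their interaction

# Small made-up data frame to show the resulting design-matrix columns.
toy <- data.frame(Kyphosis = c(0, 1, 0, 1),
                  Age      = c(10, 20, 30, 40),
                  Number   = c(2, 3, 4, 5))
X <- model.matrix(f, data = toy)
print(colnames(X))
```

The four design-matrix columns (intercept, `Age`, `Number`, `Age:Number`) correspond to the four coefficients shown in the output above.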

Page 18:

ORAAH and Python: Simple and clean code: building a Spark MLlib Random Forest model from HIVE source

Python user steps – 47 lines
• Load Libraries
• Process Formula
• Establish Spark Session
• Copy data from HIVE
• Create 3rd copy of Data for vectors
• Build Model
• Single Vector of Predictions
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
https://github.com/apache/spark/blob/master/examples/src/main/python/ml/random_forest_classifier_example.py
https://github.com/apache/spark/blob/master/examples/src/main/python/ml/rformula_example.py

ORAAH user steps – 14 lines
• Load Libraries
• Establish HIVE and Spark Session
• Build Model directly against HIVE (also HDFS, IMPALA, JDBC or Spark DF) data with full formula support
• Predictions exported with desired columns, no need to "glue back" original columns
http://www.oracle.com/technetwork/database/database-technologies/bdc/r-advanalytics-for-hadoop/documentation/index.html

Page 19:

Machine Learning Algorithms and Utilities in ORAAH 2.8.0
Spark 2.1+ Algorithms by Oracle, interfaces to Spark MLlib, plus HIVE, Impala and Spark DF interfaces
RED indicates new in release 2.8.0

Classification
• Extreme Learning Machines (Oracle's MPI/Spark-based)
• Hierarchical-ELM (Oracle's MPI/Spark-based)
• Multi-Layer Neural Nets (Oracle's Spark-based)
• Logistic Regression (Oracle's Spark-based)
• Gradient Boosted Trees (Spark MLlib)
• Logistic Regression (Spark MLlib)
• Decision Trees (Spark MLlib)
• Random Forest (Spark MLlib)

Regression
• Multi-Layer Neural Nets (Oracle's Spark-based)
• Linear Regression Model (Oracle's Spark-based)
• Gradient Boosted Trees (Spark MLlib)
• Linear Regression Model (Spark MLlib)
• Support Vector Machine (SVM) (Spark MLlib)
• LASSO (Spark MLlib)
• Ridge Regression (Spark MLlib)
• Random Forest (Spark MLlib)
• Decision Trees (Spark MLlib)

Clustering
• Hierarchical k-Means (Spark MLlib)
• Gaussian Mixture Models (Spark MLlib)
• Hierarchical k-Means (also available in Map-Red)

Feature Extraction & Creation
• Distributed Stochastic PCA (Oracle's MPI/Spark-based)
• Distributed Stochastic SVD (Oracle's MPI/Spark-based)
• Principal Component Analysis (Spark MLlib)
• Nonnegative Matrix Factorization (Map-Red)
• Low Rank Matrix Factorization (Map-Red)

Open Source R Algorithms
• Ability to run any R package via our hadoop.run function in Map-Reduce mode

Transparency Functions with IMPALA and HIVE
• Aggregations, Table Joins, summarization
• Variable Creation, Push & Pull data from IMPALA and HIVE
• Ability to push and pull data from Oracle Database
• JDBC Driver interface - build Spark DataFrames for ORAAH

Page 20:

ORAAH: Up to 7x the performance of Spark MLlib for data that fits in memory... but also able to solve a 10 billion row model (which cannot fit entirely in memory)

All tests run on a 6-Node Big Data Appliance X7-2 with 256GB of RAM per Node
Formula: cancelled ~ distance + origin + dest + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum)
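Each `as.factor(...)` term in the benchmark formula expands into one indicator column per level (minus a baseline under R's default treatment contrasts), which is why such formulas can produce very wide model matrices. A minimal base-R sketch of that expansion follows; the `flights` data frame is invented for illustration and uses only a subset of the columns named in the formula.

```r
# Made-up flight records; column names follow the formula on the slide.
flights <- data.frame(
  cancelled = c(0, 1, 0, 0, 1, 0),
  distance  = c(300, 1200, 450, 800, 950, 600),
  month     = c(1, 2, 3, 1, 2, 3),
  dayofweek = c(1, 1, 5, 5, 7, 7)
)

# Numeric predictors stay as one column; each factor with k levels
# contributes k - 1 indicator columns under default contrasts.
X <- model.matrix(cancelled ~ distance + as.factor(month) + as.factor(dayofweek),
                  data = flights)
print(colnames(X))
print(ncol(X))
```

With three levels each for `month` and `dayofweek`, the matrix has an intercept, one `distance` column and four indicator columns.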

Page 21:

Oracle Big Data solutions come with optimized settings for large-scale Machine Learning training and scoring: Classification Examples of 1 billion rows

Page 22:

ORAAH's New Distributed SVD
Benchmark of ORAAH's (Spark+MPI) vs Spark MLlib: 6x faster + linear scale up

Rank    D-SVD Spark+MPI    MLlib SVD + netlib-java
100     [illegible]        491
150     [illegible]        707
200     [illegible]        982

Singular Value Decomposition, 20k x 20k dense input (3.2Gb), 10 threads (12 actual cores, -Xmx40Gb)

[Chart legend: Spark MLlib; ORAAH Spark+MPI]
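For readers unfamiliar with the operation being benchmarked, here is a small single-machine sketch of a singular value decomposition and a rank-k approximation using base R's `svd()`. It is not the distributed ORAAH or MLlib implementation; the matrix size and `k` are arbitrary, and reading the benchmark's "Rank" column as the number of singular values kept is an assumption.

```r
set.seed(1)
A <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)  # small dense input

s <- svd(A)  # A = U diag(d) V^T

# Full reconstruction should match A up to floating-point error.
A_full   <- s$u %*% diag(s$d) %*% t(s$v)
err_full <- max(abs(A - A_full))

# Rank-k approximation keeps only the k largest singular values.
k     <- 3
A_k   <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
err_k <- norm(A - A_k, type = "F")

# The Frobenius error equals the norm of the discarded singular values.
tail_norm <- sqrt(sum(s$d[(k + 1):length(s$d)]^2))
print(c(err_full = err_full, err_k = err_k, tail_norm = tail_norm))
```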

Page 23:

New ORAAH Utility functions for Data Processing and Ingest
JDBC and Spark Data Frames

Page 24:

ORAAH 2.8.0 - New Spark DF data manipulation functions

Page 25:

ORAAH 2.8.0 - New Spark DF data manipulation functions

CSV import: Interprets a CSV file and loads it in memory into a Spark DF. Source files are on the local file system or HDFS.

Page 26:

ORAAH 2.8.0 - New Spark DF data manipulation functions

Describe: Generates a simple summary of the information in a Spark DF.

Page 27:

ORAAH 2.8.0 - New Spark DF data manipulation functions

Collect: Brings a Spark DF into R's local memory for further manipulation or result printing.

Page 28:

ORAAH 2.8.0 - New Spark DF Scaling function

Page 29:

ORAAH 2.8.0 - New Spark DF Scaling function

Scale: Scales a Spark DF using one of 15 different techniques. Critical for many ML algorithms sensitive to scale/outliers.
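The slide does not list the 15 scaling techniques, so the sketch below shows two familiar ones, standardization and min-max scaling, in base R on a local vector purely as an illustration of why scaling matters when an outlier is present. The helper names `standardize` and `min_max` are made up and are not ORAAH functions.

```r
x <- c(2, 4, 6, 8, 100)  # the last value is an outlier

# Standardization: zero mean, unit standard deviation.
standardize <- function(v) (v - mean(v)) / sd(v)

# Min-max scaling: map the observed range onto [0, 1].
min_max <- function(v) (v - min(v)) / (max(v) - min(v))

z <- standardize(x)
m <- min_max(x)
print(round(z, 3))
print(round(m, 3))
```

Under min-max scaling the outlier squeezes the other four values into a narrow band near zero, which is one reason a choice of technique is useful.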

Page 30:

ORAAH 2.8.0 - New Spark DF Scaling function

$show: Spark DFs accept many rJava default functions, from show() to count().

Page 31:

New JDBC interface
• Creates a Spark DataFrame from a JDBC source that can be used on any of ORAAH's Spark-based ML algorithms.

Page 32:

Model performance sample for Allstate Prediction Challenge
Balanced sample out of the original 13.4 million records, 60%/40% split for Testing

ORAAH Spark+MPI ELM
• Almost as good as the best MLP
• 3x faster to build

ORAAH Spark GLM2
• Fastest to Build and Score
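A 60%/40% split like the one mentioned above can be sketched in base R as follows. The data frame `d` is synthetic, and treating 60% as the training portion and 40% as the test portion is an assumption, since the slide does not say which share is which.

```r
set.seed(42)
n <- 1000
# Synthetic balanced data: half the rows in each target class.
d <- data.frame(id = seq_len(n), target = rep(c(0, 1), each = n / 2))

# Draw 60% of the row indices at random for training; the rest is for testing.
train_idx <- sample(n, size = round(0.6 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

print(c(train = nrow(train), test = nrow(test)))
```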

Page 33:

Model performance sample for Allstate Prediction Challenge
Balanced sample out of the original 13.4 million records, 60%/40% split for Testing

Page 34:

Live Demo
Big Data Manager Notebooks
Working with Large Spark Clusters

Page 35:

Page 36:

Page 37:

Page 38:

Page 39:

Page 40:

Page 41:

Page 42:

Page 43:

Page 44:

Page 45:

Page 46:

Deep Learning - motivation for the future

[Chart labels: Regression, Decision Trees]

http://www.mlyearning.org/

Page 47:

cloudcustomerconnect.oracle.com

Page 48:

Page 49: