streaming with oracle data integration
Streaming Transformations Using Oracle Data Integration
Michael Rainey | BIWA Summit 2017
Introduction
• Michael Rainey - Technical Advisor
• Spreading the good word about Gluent products with the world
• Oracle Data Integration expertise
• Oracle ACE Director
• mRainey.co
we liberate enterprise data
What is “Streaming”
• The processing and analysis of structured or “unstructured” data in real-time
• Why Streaming?
  • When speed (velocity) of data is key
  • Streaming data is processed in “time windows”, in memory, across a cluster of servers
• Examples:
  • Calculating a retail buying opportunity
  • Real-time cost calculations
  • IoT data analysis
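To make the “time windows” idea concrete, here is a minimal sketch in plain Python (not a streaming framework). It groups timestamped events into fixed, non-overlapping windows and sums each one. The function name, the sample IoT readings, and the 10-second window are all hypothetical.

```python
from collections import defaultdict

def tumbling_window_sums(events, window_seconds):
    """Sum (timestamp, value) events per fixed, non-overlapping time window."""
    sums = defaultdict(float)
    for timestamp, value in events:
        # Every event belongs to the window that starts at the nearest
        # lower multiple of the window length.
        window_start = (timestamp // window_seconds) * window_seconds
        sums[window_start] += value
    return dict(sorted(sums.items()))

# Hypothetical IoT readings: (seconds since start, sensor value)
readings = [(0, 1.0), (4, 2.0), (9, 3.0), (10, 4.0), (17, 5.0), (21, 6.0)]
window_totals = tumbling_window_sums(readings, window_seconds=10)
print(window_totals)
```

A real streaming engine does the same grouping continuously, in memory, and spread across a cluster, rather than over a finished list.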
Streaming data - Apache Kafka
“Publish-subscribe messaging rethought as a distributed commit log”
Image source: kafka.apache.org/
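The “commit log” description can be illustrated with a toy model. The sketch below is plain Python, not the Kafka client API: one append-only list stands in for a topic partition, and each consumer keeps its own read offset. The class, method, and consumer names are hypothetical.

```python
class CommitLog:
    """Toy append-only log; every consumer tracks its own read offset."""

    def __init__(self):
        self._records = []   # records are only ever appended
        self._offsets = {}   # consumer name -> next offset to read

    def publish(self, record):
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

log = CommitLog()
for order in ["order-1", "order-2", "order-3"]:
    log.publish(order)

first_read = log.poll("billing", max_records=2)   # billing reads two records
second_read = log.poll("billing")                 # then resumes where it stopped
analytics_read = log.poll("analytics")            # a new consumer starts at offset 0
```

Because reading never removes records, any number of subscribers can consume the same data at their own pace, which is what lets the log act as a shared data bus.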
Enterprise Data Bus
Stream processing - Apache Spark
• Scalable, fault-tolerant, high-throughput stream processing
• Spark Streaming receives live input data streams from various sources
• Continuous stream of data is known as a discretized stream or DStream
• Data is divided into mini-batches and processed by the Spark engine
• Operations such as join, filter, map, count, windowed computations, etc. are used to transform data in-flight
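The mini-batch model can be sketched without Spark itself. The plain-Python simulation below cuts a stream into small batches and applies a filter, a map, and a count to each. It is an illustration only: Spark Streaming cuts batches by time interval rather than by record count, and the event fields and function names here are hypothetical.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield consecutive lists of up to batch_size events from the stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def process(batch):
    """Filter out non-positive amounts, normalize the store code, count per store."""
    counts = {}
    for event in batch:
        if event["amount"] <= 0:          # filter
            continue
        store = event["store"].upper()    # map
        counts[store] = counts.get(store, 0) + 1   # count
    return counts

events = [
    {"store": "nyc", "amount": 20},
    {"store": "sfo", "amount": 0},
    {"store": "nyc", "amount": 5},
    {"store": "sfo", "amount": 7},
    {"store": "nyc", "amount": -1},
]
results = [process(batch) for batch in micro_batches(events, batch_size=2)]
print(results)
```

Each batch is transformed independently, so results are produced while the stream is still arriving.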
Why Oracle Data Integration?
• Enterprise has invested heavily in ODI and/or GoldenGate
• Getting started with development languages (Python/pySpark, Java, etc.)
• Centralized metadata management
  • Integrate with other data sources using a single interface
• Realized cost savings
  • According to Gartner, 200% increase in maintenance costs when custom coding (https://www.gartner.com/doc/3432617/does-customcoded-data-integration-stack)
Streaming with Oracle Data Integration
Relational database transactions to Kafka
Real-time data replication
Streaming integration: OGG -> Kafka
Streaming integration: Kafka -> Spark Streaming
Why GoldenGate with Kafka?
• GoldenGate
  • …is non-invasive
  • …has checkpoints for recovery
  • …moves data quickly
  • …is easy to set up
Why Oracle Data Integrator with Spark Streaming?
• Heterogeneous sources and targets
  • Built to integrate all data
• Flexibility
  • Reusable code templates (Knowledge Modules)
• Reusable Mappings
  • ODI can adapt to your data warehouse - and not the other way around
• Flow based mappings
Getting started with streaming using Oracle Data Integration
Oracle GoldenGate for Big Data - Kafka Handler Setup
• Standard GoldenGate Extract/Pump processes to capture RDBMS data
• Replicat for Java parameter file & process group created and set up
• Kafka Producer properties and Kafka Handler configuration set up
GoldenGate for Kafka setup
• Kafka handler properties
  • Set properties for how GoldenGate interacts with Kafka
  • Format, transaction vs operation mode, etc.
• Kafka producer configuration
http://mrainey.co/ogg-kafka-oow
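As a rough illustration of the two groups of settings on this slide, the sketch below renders a handler properties file and a producer properties file from Python dictionaries. The `gg.handler.*` names are assumptions modeled on the GoldenGate for Big Data Kafka Handler documentation and may differ by release. The handler name, topic name, file name, and broker address are hypothetical. The producer keys are standard Kafka producer settings.

```python
# Assumed handler settings: how GoldenGate talks to Kafka (format, tx vs op mode).
handler_properties = {
    "gg.handlerlist": "kafkahandler",
    "gg.handler.kafkahandler.type": "kafka",
    "gg.handler.kafkahandler.KafkaProducerConfigFile": "custom_kafka_producer.properties",
    "gg.handler.kafkahandler.TopicName": "oggtopic",   # hypothetical topic
    "gg.handler.kafkahandler.format": "avro_op",       # message format
    "gg.handler.kafkahandler.mode": "tx",              # transaction ("tx") vs operation ("op") mode
}

# Standard Kafka producer settings; the broker address is hypothetical.
producer_properties = {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
    "value.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
}

def render_properties(props):
    """Render a dict as key=value lines, the layout of a Java properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"

print(render_properties(handler_properties))
print(render_properties(producer_properties))
```

Check every key against the documentation for the installed GoldenGate for Big Data release before using it.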
Kafka and Oracle Data Integrator setup
Kafka and Oracle Data Integrator
• Create Model using Kafka Logical Schema
• Create Datastore
  • Similar to standard “File” datastore, define file format and set up columns
• Only CSV is supported
  • Future formats may include JSON, Avro, etc.
• Add Datastore to mapping
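Defining a CSV format and columns for a Kafka datastore amounts to mapping each delimited message onto named fields. The plain-Python sketch below shows that mapping. It is not ODI-generated code, and the column names and sample message are hypothetical.

```python
import csv
import io

# Hypothetical column list, standing in for the datastore's column definitions.
COLUMNS = ["order_id", "customer", "amount"]

def parse_message(message, columns=COLUMNS, delimiter=","):
    """Split one CSV-formatted message into a dict keyed by column name."""
    row = next(csv.reader(io.StringIO(message), delimiter=delimiter))
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
    return dict(zip(columns, row))

# The quoted field keeps its embedded comma as part of one column value.
record = parse_message('1001,"Smith, Jane",49.95')
print(record)
```

All values arrive as strings, so type conversion has to happen downstream, for example in the mapping.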
Spark Streaming and Oracle Data Integrator
• Create Spark Data Server, Physical/Logical Schema
  • Set Hadoop Data Server
  • Add properties, such as checkpointing, asynchronous execution mode, etc.
  • Additional properties can be added: http://spark.apache.org/docs/latest/configuration.html
• Spark Server is set up as Staging location
  • Source Datastore from Kafka, Oracle DB, etc.
  • Target Datastore is Cassandra, Oracle DB, etc.
• Code generated by KM is pySpark
  • pySpark code can be added to filters, joins, other components for transformations
  • Additional languages (Scala, Java) may be coming soon
Enable the Streaming flag in the Physical design of a mapping.
To generate Spark code, set the Execute On Hint option to use the Spark data server as the staging location for your mapping.
Target IKM should not be set. Spark generated code will handle integration and load into target.
Tracking the process
When executing, the process will run continuously in the ODI Operator.
If the connection between the ODI Agent and Spark Agent is lost, it will reestablish itself after recovery.
Recap
• Streaming is the “velocity” in data. AKA “Fast Data”
• Oracle Data Integrator and Oracle GoldenGate provide a framework for development and management of data streaming processes
  • Big Data add-ons continue to support new technologies
• Build a streaming architecture using GoldenGate and ODI:
  • Metadata management
  • Integration of RDBMS data with “schema on read” data
  • Build upon the skills in-house
thank you!