eliminate traditional etl
TRANSCRIPT
2© 2018 Lore IO Inc.
Eliminate Traditional ETL
Data sources20 50 >100
ETL Engineer
Data unification
Effo
rt
3© 2018 Lore IO Inc.
Eliminate Traditional ETL
The race to unify multi-source datasets is on!
Foryears,companiesofallsizeswereencouragedtobuildtheirdatalakes.Manyheededthecall,builtmoderninfrastructuresandbroughtintheirdata.Somestruggledwiththeendresultandfoundthemselvessubmergedindataswamps.Butmanyotherswereabletoplowthrough,andarenowputtingtheirdatalakesintoactionthroughwideviewsofthedata.
Thereisanewrecognitionnowamongbusinessusersthatdifferentusecasescallfordifferent
numberofdatasources—aswellasstakeholders—throughouttheimplementation.Asmallteamwhowishestoviewsalesfunneldataposesconsiderablysmallerdataunificationburdenthanamarketplacethatmustaggregatedatafromthousandsofsources.
Data sources20 50 >100
Team Wide Enterprise Wide Industry Wide
Sales Funnel CRM, Sales, Marketing and Conversion data
Logistics TeamsSales, CRM, Ticketing, Delivery and revenue
Marketing TeamsCustomer 360 for targeting
Global View, Holding CompaniesPayments, Transactions, ERP, CRM
Finance TeamsMerge usage, spending with margins and revenue
Operation TeamsMerge cost and revenue with attribution across teams
IntermediariesMarketplaces, Exchanges,Payment providers, Trades
Customer/Partner OnboardingB2B CompaniesSolution ProvidersData Platform Builders
4© 2018 Lore IO Inc.
Eliminate Traditional ETL
Solving the multi-source conundrum
Totaptheirlargenumbersofsources,may
companiesturntotheirexistingETLtoolsonlyto
reachscalabilityissuesandlimits.Accordingtoa
2017Forresterresearchpaper,theData
Orchestrationisnowawhopping$60Bindustry.
Company-widedataorchestrationthatinvolves
hundredsofsourcesisbyitselfa$13Bmarket,
where85%ofbudgetsarespentonservices.
Suchoverrelianceonservicesisadirectoutcome
ofhavingtodevelopcustomsolutionsto
overcometraditionalETLlimitation.Such
servicesinvolveaheavydoseofcustom
programmingtounifyandorchestratemultiple
datasources.Thesecustomsolutionswilllikely
involveongoinginvestmentstokeepupwiththe
demandtounifyandorchestrateagrowing
numberofsourcesfornewerdatausecases.
85% of the budget is spent on services to custom-develop data orchestration solutions.
This is a fairly small portion.
Market size of Company-wide data orchestration
Services
Tools
$13.1B
5© 2018 Lore IO Inc.
Eliminate Traditional ETL
Scaling sources (and stakeholders) is hard
TurningtotheirexistingETLanddatapreparationtoolsthatservedthemwellinthe“pre-datalake”world,companiesarefindingoutthatwhilesuchtoolselegantlyhandleasmallnumberofdatasourcesandstakeholders,theybecomecumbersometouseandmaintainfornewusecasesthatinvolvemoresourcesandteams.
CurrentsolutionsrelyonprocedurallanguageswhereETLdevelopersmustcreatestep-by-stepworkflowfordataingestionandtransformations.
Themoresourcesabusinessseekstounify,themorecomplexarethemappings,andthemoreinter-connectedtheseworkflowsbecome.
Evennewsolutionsthatofferself-servecapabilitiesforbusinessusersinindividualteamsstruggletoperformassourcesgrowandalargernumberofteamsisinvolvedindesigningcomplexdatadefinitions.
Data sources20 50 >100
Existing ETL solutions do a great job here…
…but they struggle mightily here.
6© 2018 Lore IO Inc.
Eliminate Traditional ETL
Long implementations slow down the business
“80% of data scientists’ time is spent on data collection and preparation for analysis.”Forbes.com
“While the self-service data preparation market is growing 17% every year and is expected to become a billion-dollar market by 2019, it still seems to be one of the best kept secrets around.”Gartner Group
Astheefforttounifymoredatasourcesincreases,sodoestime-to-value.Businessuserswhoseektodevelopbroaderandmoremeaningfulviewsofcustomers,investments,andprograms,theymustlongerontheirITcounterpartstodigestthesenewdatarequests,figureouthowtoincorporatethemintotheexistingdatainfrastructure,designandimplementareasonablesolution,andtestandreleaseittothebusiness.Timeisaresourceinshortsupplyforbusinessteams.Theymustansweragrowingandmoredemandingquestionsinordertoremaincompetitiveinachaoticmarketplace.Whennewdataislatetoarrive,everyoneinthebusinessfeelstheimpact.
Thelackofreadilyavailabledatathatiscleanandtrustworthyhashugeimplicationsondataanalyticsanddatascienceteamsaswell.Severalstudiesshowthatapproximately80%ofanalysts’andscientists’timeisspentondatapreparation.
Thisenormouswasteofvaluableresourceshasalottodowiththegrowthofdatasources,therelianceonlessandlessstructureddatathatrequiresmorecleanup,andtheballooningvolumesofdataitself.Again,everyoneintheorganizationisimpacted.
7© 2018 Lore IO Inc.
Eliminate Traditional ETL
Data sources20 50 >100
Why do tools struggle with complex use cases?
ETLanddatapreparationtoolscaneffectivelyhandleafewsources,butusingtheminsupportofadataplatformthathandlesdozensorhundredsofsourcesisaherculeaneffort.That’sbecauseattheircore,ETLtoolsrelyonprocedurallogicthatmustbecodifiedinsomeprogramminglanguageortheETLtool’ssystem.Engineersmustdesign,codifyandmaintainproceduresthatdescribethewaydatawillbetransformedandmovedstepbystep.
Themoresourcesandusersanimplementationinvolves,themoretimeandeffortengineersmustinvesttodevelopandcoordinatethegrowingnumberoftransformationsteps.Theymustensurethatnewproceduresdon’tbreakorconflictwithpreviousproceduresandthatdatasetsareflowingintherightdirectionsandattherighttimes.
ETL Engineer
Data unification
Data
uni
ficat
ion
effo
rt
Is it too late to change careers?
8© 2018 Lore IO Inc.
Eliminate Traditional ETL
Five ways ETLs hold businesses back
Asnoted,whileETLsarecarriedoutbydataand
technicalpros,theirimpactisfeltacrossthe
entireorganization,particularlybybusiness
groupswhodependonreadilyavailabledatafor
effectivedecisionmakingandaction.Herearefive
repercussionsofrelyingonETLstorunthe
business:
1 BUSINESSPROJECTSARE
BLOCKEDORSLOWEDDOWN
Dependingonthecomplexityofthedata
preparationinvolved,ETLsmayrequireuptotwo
yearstothoroughlydocumentbusiness
requirements,developtheETLprocess,testthe
flow,fixanyissues,andavailthedatatothe
business.ETLprocessesbecomemarkedlymore
complexwhennewdataneedstobeloadedinto
theenterprisedatawarehouse(EDW);for
instance,pullingdatafromPDFreports.
Businessesfindthemselvesinaharshdilemma:
delayprojectexecutionandsacrificebusiness
agilityandresults,orlimitdatauseandanalysis
tobasicandreadilyavailablemeasuresthat
impedeinsightgenerationandlowerbusiness
performance.
2DIMINSHEDBUSINESSAGILITY
Datastarvationduetoprotractedpreparation
isn’tlimitedtohumans.Businessesareadopting
artificialintelligenceandmachinelearning
capabilitiesatarapidpacetobetteranticipate
keybusinessandindustryeventsandautomate
adequateresponsestothem.Butformachinesto
learnquicklyandeffectively,theymustbefeda
dietofhighqualitydata,whichentailscomplex
preparation.ETLsholdbackmachinelearningin
twoways:ETLsmaynotadequatelyhandle
unanticipatedvariationsinthedatathatcanslow
downlearning,orapplicationsofmachine
learningmaybeconstrainedtosimpleusecases
thatpreventbusinessesfromreactingquicklyto
allcriticalissuesandchanges.
3 SUBOPTIMALBUSINESSDECISIONINGAND
BUDGETALLOCATION
ETLscantripbusinesseswhentheyinvolve
redundantdatacopiesandconvoluteddatatrans-
formations.AsdataprosaddmoreETLsanddata
martsinresponsetobusinessrequests,data
quality,lineage,andoveralltrustworthiness
degrade.Thebusinesslosesthesinglesourceof
truthofitsdata,cobblingtogethermetricswith
vagueormisunderstooddefinitions.
9© 2018 Lore IO Inc.
Eliminate Traditional ETL
Consequently,analyticsdashboardsandreports
getpopulatedwithdirtydata,causingmanagers
andexecutivestosuspect,ifnotaltogetherignore,
thedata,ortrustbaddata.Theendresultispoor
decisionmakingthatderailsbusiness
performance.
4 SUBSTANTIALDATAMANAGEMENT
OVERHEAD
Asthebusinessreliesmoreheavilyondataover
time,ittypicallyaccumulatesdozens,ifnot
hundredsofdifferentETLprocessesthatitmust
maintainandmanage.Basedonvaryingbusiness
requirements,theseETLsincludemanipulation
logicthatisdeployedindifferentlocationsand
configurations.Dataprosmaydevelopdata
manipulationscriptsinprogramminglanguages
suchasPythonorPerl,writecomplexSQLintool
likeHiveorPig,ormakedatamanipulationeither
manuallyorincodedirectlyinExcel.Having
manipulationlogicresideindifferentsystemsand
formats— hard-codeddirectlyintotheETL
process— canpresentsignificantmanagement
challengesandmanagementoverhead,
preventingthebusinessfromscalingitsdata
consumption.
5DECREASEDEMPLOYEEPRODUCTIVITY
Ampleevidencesuggeststhatdatapreparation
cantakeupto90%ofthedevelopmenttimeand
costindatawarehousinganddataanalytics
projects.Dataanalystsandscientistsseekingto
exercisetheircreativityandinterpretationskills
duringdataanalysisandmodelling,regularlyfind
themselvesburdenedwithrudimentaryand
lacklusterdatadesignandcleansingchores,or
altogetherineffectivewhenfailingtoaccess
criticaldata.Whethertheychaseafterdata,pre-
definetablesandschemas,orlanguishinits
transformation,hardtofindandretaindatapros
arepreventedfromfulfillingtheirpotential,
whichresultsinlowmoraleandhighchurn,
impedingthegrowthandstabilityofthebusiness.
10© 2018 Lore IO Inc.
Eliminate Traditional ETL
ETL issues — Close examination
Tobetterunderstandandappreciatethe
complexityofETLsandtheirnegativeimpacton
thebusiness,it’sworthexploringtherootcause
ofETLissues.Thenextthreesectionsdigdeeper
intotheextract-transform-loadphasesandthe
challengestheypresent.
1EXTRACTISSUES
Severalbusinessusecases,suchashaving360-
degreeviewofcustomers,campaigns,or
employeesrequireingestionofdatafrom
multiplesystemsofrecords.Thepluralityofdata
sourcesintroduceseveralchallenges:
DISPARATEDATASOURCES
Businessesrelyonavarietyoftechnology
componentstoruntheirdailyoperations.Data
foranalysis,there- fore,maybepulledfrom
differentapplicationsthataredesigned,built,and
maintainedbydifferentvendors,wherebyeach
applicationrequiresadifferentmechanismto
accessthedata.Dataprosmaythereforehaveto
learnmultipleinterfacesandregularlymaintain
themevenwithinasingleETLprocess.
OBSCUREDSOURCESCHEMA
UseLoreIO’sUI,APIsorletanalysts&business
userstousetheBItoolstheyknowandlovefor
self-servedbusinessexploration.
DISPARATEDATASTRUCTURES
Thegrowthandunpredictablenatureofdata
haveusheredinnewwaysofstoringdata,
especiallyinnon-relationaldatabases.Datamay
befullystructured,semi-structured,or
unstructured.Dataprosmustnowaccessdata
thatishousedindifferentandunfamiliarformats
whereaccessviathewellestablishedSQLisnot
possible.
LACKOFDATAHYGIENE
Businessdataistypicallygeneratedbyboth
humansandmachines,whichintroduces
discrepanciesinthedegreetowhichdatais
completeandclean.Dataprosmustaccountfora
greatervarietyinthestateofthedatathanever
be- fore,furthercomplicatingdataextraction.
DIFFERENTDATAREFRESHCYCLES
Sourcesystemscreateandrefreshtheirdataat
regularintervals.Sensorsorsocialmedia
applicationsstreamvastamountsofdatainreal
time.Somebusinessapplicationssessionize or
aggregatedatathatismadeavailableonlyona
weeklyoramonthlybasis.Analystswhopull
disparatedataelementsmustaccountforvarying
levelsofdatafreshness.
11© 2018 Lore IO Inc.
Eliminate Traditional ETL
FRAGMENTEDSECURITYPOLICIES
Asdatasecuritycapturesmoreattentionand
mind- share,sourcesystemsimplementmore
robustandoftendifferentsecurityand
governancepracticesthatdataprosnowmust
accountforandcomplywithwhenaccessingdata.
DISPARATEDATASTEWARDSHIP
Finally,sourcesystemsaremanagedbydifferent
teamsanddepartmentsinsideandoutsidethe
organization.Dataprosmustspendtime
identifyingtherightdataownersandthen
negotiateandcomplywiththeirdifferentdata
accessanduseguidelines.
2TRANSFORMISSUES
Withdataextractedfromitsoriginalsourceand
broughtintotheEDW,dataprosmustnow
transformthedataintoausableformthatwill
enablethemtoblenddisparatedatasets,
organizeit,andreadyitforconsumptionbythe
business.Giventherichnessandcomplexityof
data,theTransformphasepresentsnumerous
challenges:
FINDINGCOMMONKEYS
Somebusinessusecases,suchascustomer
analytics,forcedataprostojoinmanydifferent
datasets.Auniqueandtrustworthykeybywhich
tostitchdisparatedatasetsmaynotbealways
available,requiringanalyststospendvaluable
timeconstructingareliablekeytoensurethatthe
rightrecordsarebeingjoined.
LACKOFDATALINEAGE
Datamayundergoseveraltransformationsover
timeinsupportofdifferentbusinessusecases.
Dataprosoftenlackdatalineagetofully
understandwhatchangesoccurredinthedata
fromonetransformationtothenext.Thislackof
visibilitycanleadtotheuseofthewrongversion
orgenerationofdata,leadingtomisinformed
analysislater.
DEDUPLICATIONANDDERIVEDCOLUMNS
Dataanalystsandscientistsmustcontendwith
datasetsthatrequiredifferentlevelsof
manipulation.Forinstance,somedatapointsmay
bereadyforuse,whileothersmaybeencodedfor
privacyorbrevitypurposes,andmustbedecoded
beforeuse.Somebusinessusecasesmayrequire
thecreationofnewcalculatedmetricsthatderive
theirvaluesfromothermetrics,orthegeneration
ofsurrogatekeyvalues.Datasetsmight
accidentallyorunnecessarilyrepeatthemselves,
requiringade-duplicationeffort.
COLUMNSPLITSANDMERGES
Theorganizationofdatamayvarydramatically
duringthetransformationphase.Somedata
mightrequiresortingororderingbasedonalist
ofcolumns.Datamightneedtobeaggregated,or
disaggregated,trans- posed,orsomehowpivoted.
Adatacolumnmightneedtobesplitintoseveral
distinctcolumns,orviceversa.
12© 2018 Lore IO Inc.
Eliminate Traditional ETL
VALIDATIONANDQUALITY
Finally,theunpredictablenatureofdatameans
thattherangeofdatavaluesextractedcanexceed
theoriginalexpectationsofdataanalystsand
scientists.Tomitigatethisproblem,thetransform
phasemaygetfurthercomplicatedby
incorporatingaprocesstolookupandvalidate
thedatafromtablesorreferentialfiles.
3 LOADISSUES
Oncedataistransformedit’sreadytobeloaded
intotargetsystemsthatbusinessuserswillinter-
actwithtoderiveinsightsandactions-- thevery
purposefortheETLprocessoverall.TheLoad
phaseintroducesuniquechallengesthatfurther
complicateandprolongETLs:
SCALEANDVARIATIONS
Dataprosmayneedtohandleverydifferent
targetsystems,fromsimpleflatfilestoenormous
datawarehouses.
TESTING,PUBLISHING,ARCHIVING
Theloadprocessitselfmayvaryfromoneuse
casetoanother.Itmayinvolveadifferentnumber
oftasks,suchastesting,approvals,andarchival.
BATCHVSSTREAMVSREAL-TIME
Theschedulingofdatadeliverycanvary
dramatically,fromrealtimeapplicationsallthe
waytoweeklyormonthlydataload.
UPDATESANDCORRECTIONS
Themethodofdeliverycanvary:some
applicationsmaycallforupdatingexisting
recordswithnewdata,whileothersmaydepend
onthecreationofnewre- cords.
TARGETSYSTEMRESPONSES
Whenloadingdata,analystsmustcomplywith
thedifferentdataconstraintsofthetarget
systems,suchasuniqueness,mandatoryfields,or
referentialintegrity.Analystmusteffectively
respondtotheacceptanceandrejection
responsestheyreceivefromthetargetsystems.
ACCESS
Dataprosmustconformtoanyaccess
restrictions,encryptionlevels,orsecuritypolicies
ofthesourcesystemsandpreservetheminthe
targetsystemsforeachdataconsumertoensure
thatonlypermittedusersandapplicationscan
accesssensitivedata.
14© 2018 Lore IO Inc.
Eliminate Traditional ETL
Start at end: Declare your output views
Conventionalthinkingstartswiththesourcedata
andmapsthevarioustransformationprocedures
neededtoshapethedataforitstarget
destination.
Conventionalthinkingdoesn’tscale.
Abetterapproachistodeclarethedesiredend
stateandworkbackwards—ideally
programmatically—thevarioustasksthatmake
upthetransformationlogic.
InplainEnglish,dataanalysts(ratherthanETL
developers)controlthedataprepprocess.These
analystsusedeclarativelanguages,suchasSQLor
SQL-like,todescribetheirdataneeds.
Whenyoustartwithadeclarativelanguageand
addautomation,youcantrulyreimaginethe
wholedataprepprocessandmakeitagileand
scalable.
Dataprepapproachesthatleveragedeclarative
definitionsareprovingtobemuchmorescalable
andagilethantraditionalproceduralones.Rather
thanawaterfallmodel,declarative
transformationshavearelaxed,cyclicalstructure
whereeachtransformationtaskcanbecarried
outindependently.
Declarativetransformationsenablebusiness
userstoparticipateinthedataprepprocessfrom
theget-go.
Analystsdefinetheirtargetdataviews.Subject
matterexpertsannotatethedataelementsand
thesystemtranslatesthesedeclarationstoETL
codethatcanberunatanystepoftheprocess,
evaluatedandfixedasneeded.Businessuserscan
evaluatethedataatanytimeandrequest
modificationstothepreparationprocess.
14
15© 2018 Lore IO Inc.
Eliminate Traditional ETL
Migrate out of procedural logic
ProceduraltransformationsfollowatraditionalapproachwheretheETLownersreceivethereportingrequirementsfromthebusinessteam,andthendesign,build,testandoptimizetheimplementationontheirownbeforeprovidingbusinessuserswiththepreppeddata.
Manyoftheproceduraltasksarecarriedoutsequentiallyandpotentiallyoveralongperiodoftimebasedontheproject’scomplexity.Businessuserscannotaccess—letaloneuse—thedatawhileit’sbeingworkedon.Andanysubsequent
requirementtomodifythedatamayimposeawholenewsequenceofsteps.
Procedurallogicisimposingawaterfall-likebusinessworkflowthatistooslowtoadoptforusecasesthatinvolvemanydatasources.
16© 2018 Lore IO Inc.
Eliminate Traditional ETL
16
Automate ETL coding
Thinkofdeclarativelanguageasthe
“commander’sintent”—Theyconveythedesired
outcomewithoutspecifyinganyimplementation
detailsbecausethosemaychangeatanytime
basedonresourceavailability,theemergenceof
newtechnologies,theuseofnewdatasources,
etc.
Itismuchmoredesirable,therefore,todelegate
theimplementationprocesstomachinesthatare
muchbettersuitedtohandleincreasinglogicand
sourcecomplexitythanmeremortals.When
automatedsystemshandleandevenoptimizethe
spaghettisaladofETLjobcode,thebusinesscan
onboardandunifynewdatasourcesatamuch
fasterrate,enablinganalyststofocusmoretime
ongeneratinginsightsfromdatathanon
preppingthedata.
Agoodexampleofhowautomationhelpsdata
preparationistheprocessofjoiningdisparate
datasources.Followingaproceduralmodel,an
ETLengineermustspecifythevarious
mechanismstounifydatasourcesbymapping
datakeysandusingothertechniques.Asthe
numberofsourcesgrowsandasthedataschemas
oftheunderlyingsourceschange,itbecomes
increasinglypainfulforhumanstomanagethose
relationships,slowingdownimplementations.
Automationontheotherhand,cancontrolthe
variousdataentities,attributes,andkey
relationships,andbysodoingtakecareofallthe
necessarydatajoinsonitsown.Usersshould
alwayshavefulllineagetodiscoverand
understandhowtheirdataelementscome
together,buttheactualunificationcan(and
should)bedonebymachines.
17© 2018 Lore IO Inc.
Eliminate Traditional ETL
Sincedeclarativetransformationsenablemorestakeholderstobeinvolvedthroughoutthedatapreparationprocess,itmakessensetodividetheworkflowintosmalltasksandassigntheresponsibilityforcompletingeachtasktothosebestpositionedtoaddressthem.
Thisdivideandconquerprocessisakintoapotluckpartywhereeveryguestcontributesamealitemthattheyarefeeltheycanbestmanage.Similarly,thedatapreparationworkflowcanbedividedupandacrossthosewhobestknowthesourcedata,thosewhounderstandthebusinesslogic,thosewhocreatethefinalviews,etc.Stakeholderscollaborate onthewholeprocessbycontributingattheirownpace.
Oncestakeholderscollaborateondatadefinitionandbusinesslogic,theyaremorelikelytowanttousethenewdataelementsintheirreportsandapplications.Aglobaldatafabriccanbecomethe
singlesourceoftruthfortheentireorganization,expressingtheinsandoutsofthebusinessinacommonandstandardlanguagethateveryonecanunderstandandtrust.
Automatingdeclarativetransformationsmeansthatthesystemrunsitstransformationcodedirectlyonthesourcedata.Thismeansthatdatadiscoveryandcataloging– whilevirtual– isdirectlycoupledwiththeactualdata,sothatuserscanissuedataqueriesastheystudy– andevenrefine– thefabric,buildingnewdatadefinitionsuponpreviouslycreatedones.
Collaborate on logic creation
18© 2018 Lore IO Inc.
Eliminate Traditional ETL
Capabilities to look for in a new ETL solution
Belowareseveralcapabilitiesthatshouldbe
includedinwhateversolutionyouchooseto
eliminatetraditionalETL:
AUTOMATEDDATASCANNINGANDSCHEMA
DETECTION
Newsourcesareperiodicallyadded,andexisting
onesmayexperienceschemachangesovertime.
Lookforasolutionthatcanscanyourdata
landingarea,ingestnewsources,andadaptto
schemachanges– allautomatically– sothatthe
businessisn’tblocked.
AUTOMATEDDDL(TABLES,COLUMNS,
DESCRIPTIONS)
Sincedatachangessooften,manuallyhandlingits
physicalpersistencecancreatebusiness
bottlenecks.It’sbesttoallowthebusinessto
focusoncreatingdatalogic,anddelegatephysical
considerationstoautomatedsystems.
VIRTUALDATACANVAS
Toremaincompetitiveandeffective,thebusiness
continuouslyevolvesitsdataneedsandusecases.
AnewETLsolutionmustofferbusinessusersan
intuitiveenvironmentwheretheycanextend
theirdatasemantics,furthertransformtheirdata,
andcontinuouslydeveloptheirdatafabric
withouthavingtowaitonITtobuildnewdata
pipelinesfirst.
STATEFULDATACOMPONENTS
Sincemanydatatransformationsrequirethe
involvementofnumerousstakeholdersand
systems,suchtransformationsarebestdeveloped
incrementally.Thebusinessneedstobeableto
breakdatarequirementsintosmallcomponents,
andmanagetheirprogression(orstates)
effectively.
DATACOMPONENTOWNERSHIP
Clearownershipoftheabovementioneddata
componentsimprovesteamcollaboration,
enablingstakeholderstointeractontheirdata
componentsthroughouttheirdevelopment
lifecycles.
DATACOMPONENTDEFINITION
Todemocratizedataandenablebusinessusersto
explore,use,andbuildondatacomponents,there
needstobeacentralizedcatalogthatlistsand
definesalldataelements.
DATACOMPONENTLINEAGE,HISTORYAND
VERSIONING
Stakeholdersmusthaveconfidenceintheirdata.
Toreachastateof”singlesourceoftruth”,teams
mustbeabletounderstandhowdataelements
areconstructedandderived,aswellastotrace
backdatatoitssource.
19© 2018 Lore IO Inc.
Eliminate Traditional ETL
REUSABLEDATACOMPONENTS
Thedatafabricmustbemodular;stakeholders
whodefinenewoutputsshouldbeabletouse
theseoutputsasinputsfornewerstill
components.Therefore,thecomponentcreation
processmustincludetheabilitytopublishthem
intothedatalayerandavailthemsimilarlytoraw
dataelements.
TESTABLEDATACOMPONENTS
Dataqualityandaccuracyneedtobeassessed
andvalidatedassoonaspossible;businessescan
nolongeraffordtowaituntiltheentiredata
pipelineisestablished.Rather,datacomponents
mustbetestablerightontherawdataassoonas
theirdefinitionsarecreated.
USERSCANREQUESTNEWDATACOMPONENTS
Astheneedformoreandnewerdataincreases,
usersmustbeabletodefinedesiredoutputs
withouttaxingdataproducerstorearchitectthe
datapipelines.Thesystemmustbeflexible
enoughtoaccommodatenewrequests,even
whenthosemayprofoundlyalterdatamodels
andschemas.
AUTOMATEDMICRO-TASKSTOADVANCE
COMPONENTSTATE
Toincreasebusinessagility,anautomatedsystem
shouldbeabletodefinedesignandoperation
tasks(“micro-task”)toknownstakeholdersto
ensuredatacomponentsarecreatedandmade
availablequickly.
MICRO-TASKUSERCOLLABORATION
Multiplestakeholders(presumablyfromdifferent
areasofthebusiness)mustbeabletowork
togetherondatacomponentswithout“stepping
oneachothers’toes.”
MICRO-TASKOWNERSHIPASSIGNMENTAND
MANAGEMENT
Collaborativedatamanagementmeansthat
stakeholdersareabletotaskeachotherand
assignresponsibilitiestoparticipateinthedesign,
creation,testing,publishingoroptimizationof
datacomponents.
AUTOMATEDUNIFICATIONOFMULTIPLEDATA
SOURCESANDTYPES
Anautomatedsystemshouldbeabletoidentify
commoncharacteristicsacrossdatasourcesand
types(suchasemailaddresses,orresourceURIs),
andusethisknowledgetounifydatasetsfor
fasterdatadiscoveryandquerying.
INTUITIVEDATAEXPORT
Finally,anewETLsolutionmustenablebusiness
userstoexportanydatacomponentorviewto
oneormoretargetsystemseasilyandquickly,as
soonasnewdataismadeavailable.
20© 2018 Lore IO Inc.
Eliminate Traditional ETL
Putting it all together
CollaborativedatapreparationoffersanewalternativetotraditionalETL.Itovercomesthescalelimitationsthatimpacttraditionalworkflowsastheyattempttounifyalargeandgrowingnumberofdatasources.Thisformofagiledatapreparationenablesstakeholderstocollaborateandgraduallybuildoutthedatafabricbyincrementingtheirdatadefinitionsasnewsourcesandusecasesemerge.
Atthesametime,businessuserscanreapquickreturnsfromnewdefinitionsbyimmediatelyincorporatingthemintotheirreportsandapplications.TheynolongerneedtowaituntiltheirETLengineeringcounterpartsfullydevelopanddeploythedataplatform.BydelegatingETLcodegenerationtosystems,businessescanmorequicklyexpandanddeepen
theirsharedunderstandingoftheirdata,andevolvetheircommonlanguagetobetterexpressthenuancesoftheirbusiness.
Lastly,thecollaborativenatureofthisnewapproachbothdemocratizesdatapreparation–invitingnewstakeholderstoparticipateintheprocess– anddecentralizesdataownership,invitingthosewhoconsumedatatocontributetothegrowthandongoinghealthoftheirdatafabric.
21© 2018 Lore IO Inc.
Eliminate Traditional ETL
About Lore IO
LoreIOisaCollaborativeDataPreparationPlatformthathelpscompaniesingestandunifydisparatedatasetsfromhundredsorthousandsofsources.ItgeneratesstandardoutputswithouttheneedforengineerstodevelopproceduralETLanddatapipelines.
LoreIOcustomersunlockthefullvalueoftheirdatabyempoweringbusinessuserstocollaborateonandusedatasetsthatareinitiallyhardtounderstand,reconcile,andblend.
TheLoreIOplatformabstractsallthecomplexsemanticsofhowthedataiscapturedandjoinedtogether,enablingcustomerstoinstantlyvalidate
businesslogicinsupportofawiderangeofusecases.
LoreIOtakesanagileapproachtopartneringwithnewcustomers.Itseekstoexplorestrategicprojectsthatwillmakeamaterialimpactonthebusinessandthenstructureapartnershipthatdemonstratesvalueimmediatelyandscalesfromthere.
www.getlore.io | [email protected] | (650) 823-1979