eliminate traditional etl

22
Eliminate Traditional ETL WITH COLLABORATIVE DATA PREPARATION © 2018 Lore IO A Lore IO Whitepaper

Upload: others

Post on 28-Dec-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Eliminate Traditional ETLWITH COLLABORATIVE DATA PREPARATION

© 2018 Lore IO

A Lore IO Whitepaper

2© 2018 Lore IO Inc.

Eliminate Traditional ETL

Data sources20 50 >100

ETL Engineer

Data unification

Effo

rt

3© 2018 Lore IO Inc.

Eliminate Traditional ETL

The race to unify multi-source datasets is on!

Foryears,companiesofallsizeswereencouragedtobuildtheirdatalakes.Manyheededthecall,builtmoderninfrastructuresandbroughtintheirdata.Somestruggledwiththeendresultandfoundthemselvessubmergedindataswamps.Butmanyotherswereabletoplowthrough,andarenowputtingtheirdatalakesintoactionthroughwideviewsofthedata.

Thereisanewrecognitionnowamongbusinessusersthatdifferentusecasescallfordifferent

numberofdatasources—aswellasstakeholders—throughouttheimplementation.Asmallteamwhowishestoviewsalesfunneldataposesconsiderablysmallerdataunificationburdenthanamarketplacethatmustaggregatedatafromthousandsofsources.

Data sources20 50 >100

Team Wide Enterprise Wide Industry Wide

Sales Funnel CRM, Sales, Marketing and Conversion data

Logistics TeamsSales, CRM, Ticketing, Delivery and revenue

Marketing TeamsCustomer 360 for targeting

Global View, Holding CompaniesPayments, Transactions, ERP, CRM

Finance TeamsMerge usage, spending with margins and revenue

Operation TeamsMerge cost and revenue with attribution across teams

IntermediariesMarketplaces, Exchanges,Payment providers, Trades

Customer/Partner OnboardingB2B CompaniesSolution ProvidersData Platform Builders

4© 2018 Lore IO Inc.

Eliminate Traditional ETL

Solving the multi-source conundrum

Totaptheirlargenumbersofsources,may

companiesturntotheirexistingETLtoolsonlyto

reachscalabilityissuesandlimits.Accordingtoa

2017Forresterresearchpaper,theData

Orchestrationisnowawhopping$60Bindustry.

Company-widedataorchestrationthatinvolves

hundredsofsourcesisbyitselfa$13Bmarket,

where85%ofbudgetsarespentonservices.

Suchoverrelianceonservicesisadirectoutcome

ofhavingtodevelopcustomsolutionsto

overcometraditionalETLlimitation.Such

servicesinvolveaheavydoseofcustom

programmingtounifyandorchestratemultiple

datasources.Thesecustomsolutionswilllikely

involveongoinginvestmentstokeepupwiththe

demandtounifyandorchestrateagrowing

numberofsourcesfornewerdatausecases.

85% of the budget is spent on services to custom-develop data orchestration solutions.

This is a fairly small portion.

Market size of Company-wide data orchestration

Services

Tools

$13.1B

5© 2018 Lore IO Inc.

Eliminate Traditional ETL

Scaling sources (and stakeholders) is hard

TurningtotheirexistingETLanddatapreparationtoolsthatservedthemwellinthe“pre-datalake”world,companiesarefindingoutthatwhilesuchtoolselegantlyhandleasmallnumberofdatasourcesandstakeholders,theybecomecumbersometouseandmaintainfornewusecasesthatinvolvemoresourcesandteams.

CurrentsolutionsrelyonprocedurallanguageswhereETLdevelopersmustcreatestep-by-stepworkflowfordataingestionandtransformations.

Themoresourcesabusinessseekstounify,themorecomplexarethemappings,andthemoreinter-connectedtheseworkflowsbecome.

Evennewsolutionsthatofferself-servecapabilitiesforbusinessusersinindividualteamsstruggletoperformassourcesgrowandalargernumberofteamsisinvolvedindesigningcomplexdatadefinitions.

Data sources20 50 >100

Existing ETL solutions do a great job here…

…but they struggle mightily here.

6© 2018 Lore IO Inc.

Eliminate Traditional ETL

Long implementations slow down the business

“80% of data scientists’ time is spent on data collection and preparation for analysis.”Forbes.com

“While the self-service data preparation market is growing 17% every year and is expected to become a billion-dollar market by 2019, it still seems to be one of the best kept secrets around.”Gartner Group

Astheefforttounifymoredatasourcesincreases,sodoestime-to-value.Businessuserswhoseektodevelopbroaderandmoremeaningfulviewsofcustomers,investments,andprograms,theymustlongerontheirITcounterpartstodigestthesenewdatarequests,figureouthowtoincorporatethemintotheexistingdatainfrastructure,designandimplementareasonablesolution,andtestandreleaseittothebusiness.Timeisaresourceinshortsupplyforbusinessteams.Theymustansweragrowingandmoredemandingquestionsinordertoremaincompetitiveinachaoticmarketplace.Whennewdataislatetoarrive,everyoneinthebusinessfeelstheimpact.

Thelackofreadilyavailabledatathatiscleanandtrustworthyhashugeimplicationsondataanalyticsanddatascienceteamsaswell.Severalstudiesshowthatapproximately80%ofanalysts’andscientists’timeisspentondatapreparation.

Thisenormouswasteofvaluableresourceshasalottodowiththegrowthofdatasources,therelianceonlessandlessstructureddatathatrequiresmorecleanup,andtheballooningvolumesofdataitself.Again,everyoneintheorganizationisimpacted.

7© 2018 Lore IO Inc.

Eliminate Traditional ETL

Data sources20 50 >100

Why do tools struggle with complex use cases?

ETLanddatapreparationtoolscaneffectivelyhandleafewsources,butusingtheminsupportofadataplatformthathandlesdozensorhundredsofsourcesisaherculeaneffort.That’sbecauseattheircore,ETLtoolsrelyonprocedurallogicthatmustbecodifiedinsomeprogramminglanguageortheETLtool’ssystem.Engineersmustdesign,codifyandmaintainproceduresthatdescribethewaydatawillbetransformedandmovedstepbystep.

Themoresourcesandusersanimplementationinvolves,themoretimeandeffortengineersmustinvesttodevelopandcoordinatethegrowingnumberoftransformationsteps.Theymustensurethatnewproceduresdon’tbreakorconflictwithpreviousproceduresandthatdatasetsareflowingintherightdirectionsandattherighttimes.

ETL Engineer

Data unification

Data

uni

ficat

ion

effo

rt

Is it too late to change careers?

8© 2018 Lore IO Inc.

Eliminate Traditional ETL

Five ways ETLs hold businesses back

Asnoted,whileETLsarecarriedoutbydataand

technicalpros,theirimpactisfeltacrossthe

entireorganization,particularlybybusiness

groupswhodependonreadilyavailabledatafor

effectivedecisionmakingandaction.Herearefive

repercussionsofrelyingonETLstorunthe

business:

1 BUSINESSPROJECTSARE

BLOCKEDORSLOWEDDOWN

Dependingonthecomplexityofthedata

preparationinvolved,ETLsmayrequireuptotwo

yearstothoroughlydocumentbusiness

requirements,developtheETLprocess,testthe

flow,fixanyissues,andavailthedatatothe

business.ETLprocessesbecomemarkedlymore

complexwhennewdataneedstobeloadedinto

theenterprisedatawarehouse(EDW);for

instance,pullingdatafromPDFreports.

Businessesfindthemselvesinaharshdilemma:

delayprojectexecutionandsacrificebusiness

agilityandresults,orlimitdatauseandanalysis

tobasicandreadilyavailablemeasuresthat

impedeinsightgenerationandlowerbusiness

performance.

2DIMINSHEDBUSINESSAGILITY

Datastarvationduetoprotractedpreparation

isn’tlimitedtohumans.Businessesareadopting

artificialintelligenceandmachinelearning

capabilitiesatarapidpacetobetteranticipate

keybusinessandindustryeventsandautomate

adequateresponsestothem.Butformachinesto

learnquicklyandeffectively,theymustbefeda

dietofhighqualitydata,whichentailscomplex

preparation.ETLsholdbackmachinelearningin

twoways:ETLsmaynotadequatelyhandle

unanticipatedvariationsinthedatathatcanslow

downlearning,orapplicationsofmachine

learningmaybeconstrainedtosimpleusecases

thatpreventbusinessesfromreactingquicklyto

allcriticalissuesandchanges.

3 SUBOPTIMALBUSINESSDECISIONINGAND

BUDGETALLOCATION

ETLscantripbusinesseswhentheyinvolve

redundantdatacopiesandconvoluteddatatrans-

formations.AsdataprosaddmoreETLsanddata

martsinresponsetobusinessrequests,data

quality,lineage,andoveralltrustworthiness

degrade.Thebusinesslosesthesinglesourceof

truthofitsdata,cobblingtogethermetricswith

vagueormisunderstooddefinitions.

9© 2018 Lore IO Inc.

Eliminate Traditional ETL

Consequently,analyticsdashboardsandreports

getpopulatedwithdirtydata,causingmanagers

andexecutivestosuspect,ifnotaltogetherignore,

thedata,ortrustbaddata.Theendresultispoor

decisionmakingthatderailsbusiness

performance.

4 SUBSTANTIALDATAMANAGEMENT

OVERHEAD

Asthebusinessreliesmoreheavilyondataover

time,ittypicallyaccumulatesdozens,ifnot

hundredsofdifferentETLprocessesthatitmust

maintainandmanage.Basedonvaryingbusiness

requirements,theseETLsincludemanipulation

logicthatisdeployedindifferentlocationsand

configurations.Dataprosmaydevelopdata

manipulationscriptsinprogramminglanguages

suchasPythonorPerl,writecomplexSQLintool

likeHiveorPig,ormakedatamanipulationeither

manuallyorincodedirectlyinExcel.Having

manipulationlogicresideindifferentsystemsand

formats— hard-codeddirectlyintotheETL

process— canpresentsignificantmanagement

challengesandmanagementoverhead,

preventingthebusinessfromscalingitsdata

consumption.

5DECREASEDEMPLOYEEPRODUCTIVITY

Ampleevidencesuggeststhatdatapreparation

cantakeupto90%ofthedevelopmenttimeand

costindatawarehousinganddataanalytics

projects.Dataanalystsandscientistsseekingto

exercisetheircreativityandinterpretationskills

duringdataanalysisandmodelling,regularlyfind

themselvesburdenedwithrudimentaryand

lacklusterdatadesignandcleansingchores,or

altogetherineffectivewhenfailingtoaccess

criticaldata.Whethertheychaseafterdata,pre-

definetablesandschemas,orlanguishinits

transformation,hardtofindandretaindatapros

arepreventedfromfulfillingtheirpotential,

whichresultsinlowmoraleandhighchurn,

impedingthegrowthandstabilityofthebusiness.

10© 2018 Lore IO Inc.

Eliminate Traditional ETL

ETL issues — Close examination

Tobetterunderstandandappreciatethe

complexityofETLsandtheirnegativeimpacton

thebusiness,it’sworthexploringtherootcause

ofETLissues.Thenextthreesectionsdigdeeper

intotheextract-transform-loadphasesandthe

challengestheypresent.

1EXTRACTISSUES

Severalbusinessusecases,suchashaving360-

degreeviewofcustomers,campaigns,or

employeesrequireingestionofdatafrom

multiplesystemsofrecords.Thepluralityofdata

sourcesintroduceseveralchallenges:

DISPARATEDATASOURCES

Businessesrelyonavarietyoftechnology

componentstoruntheirdailyoperations.Data

foranalysis,there- fore,maybepulledfrom

differentapplicationsthataredesigned,built,and

maintainedbydifferentvendors,wherebyeach

applicationrequiresadifferentmechanismto

accessthedata.Dataprosmaythereforehaveto

learnmultipleinterfacesandregularlymaintain

themevenwithinasingleETLprocess.

OBSCUREDSOURCESCHEMA

UseLoreIO’sUI,APIsorletanalysts&business

userstousetheBItoolstheyknowandlovefor

self-servedbusinessexploration.

DISPARATEDATASTRUCTURES

Thegrowthandunpredictablenatureofdata

haveusheredinnewwaysofstoringdata,

especiallyinnon-relationaldatabases.Datamay

befullystructured,semi-structured,or

unstructured.Dataprosmustnowaccessdata

thatishousedindifferentandunfamiliarformats

whereaccessviathewellestablishedSQLisnot

possible.

LACKOFDATAHYGIENE

Businessdataistypicallygeneratedbyboth

humansandmachines,whichintroduces

discrepanciesinthedegreetowhichdatais

completeandclean.Dataprosmustaccountfora

greatervarietyinthestateofthedatathanever

be- fore,furthercomplicatingdataextraction.

DIFFERENTDATAREFRESHCYCLES

Sourcesystemscreateandrefreshtheirdataat

regularintervals.Sensorsorsocialmedia

applicationsstreamvastamountsofdatainreal

time.Somebusinessapplicationssessionize or

aggregatedatathatismadeavailableonlyona

weeklyoramonthlybasis.Analystswhopull

disparatedataelementsmustaccountforvarying

levelsofdatafreshness.

11© 2018 Lore IO Inc.

Eliminate Traditional ETL

FRAGMENTEDSECURITYPOLICIES

Asdatasecuritycapturesmoreattentionand

mind- share,sourcesystemsimplementmore

robustandoftendifferentsecurityand

governancepracticesthatdataprosnowmust

accountforandcomplywithwhenaccessingdata.

DISPARATEDATASTEWARDSHIP

Finally,sourcesystemsaremanagedbydifferent

teamsanddepartmentsinsideandoutsidethe

organization.Dataprosmustspendtime

identifyingtherightdataownersandthen

negotiateandcomplywiththeirdifferentdata

accessanduseguidelines.

2TRANSFORMISSUES

Withdataextractedfromitsoriginalsourceand

broughtintotheEDW,dataprosmustnow

transformthedataintoausableformthatwill

enablethemtoblenddisparatedatasets,

organizeit,andreadyitforconsumptionbythe

business.Giventherichnessandcomplexityof

data,theTransformphasepresentsnumerous

challenges:

FINDINGCOMMONKEYS

Somebusinessusecases,suchascustomer

analytics,forcedataprostojoinmanydifferent

datasets.Auniqueandtrustworthykeybywhich

tostitchdisparatedatasetsmaynotbealways

available,requiringanalyststospendvaluable

timeconstructingareliablekeytoensurethatthe

rightrecordsarebeingjoined.

LACKOFDATALINEAGE

Datamayundergoseveraltransformationsover

timeinsupportofdifferentbusinessusecases.

Dataprosoftenlackdatalineagetofully

understandwhatchangesoccurredinthedata

fromonetransformationtothenext.Thislackof

visibilitycanleadtotheuseofthewrongversion

orgenerationofdata,leadingtomisinformed

analysislater.

DEDUPLICATIONANDDERIVEDCOLUMNS

Dataanalystsandscientistsmustcontendwith

datasetsthatrequiredifferentlevelsof

manipulation.Forinstance,somedatapointsmay

bereadyforuse,whileothersmaybeencodedfor

privacyorbrevitypurposes,andmustbedecoded

beforeuse.Somebusinessusecasesmayrequire

thecreationofnewcalculatedmetricsthatderive

theirvaluesfromothermetrics,orthegeneration

ofsurrogatekeyvalues.Datasetsmight

accidentallyorunnecessarilyrepeatthemselves,

requiringade-duplicationeffort.

COLUMNSPLITSANDMERGES

Theorganizationofdatamayvarydramatically

duringthetransformationphase.Somedata

mightrequiresortingororderingbasedonalist

ofcolumns.Datamightneedtobeaggregated,or

disaggregated,trans- posed,orsomehowpivoted.

Adatacolumnmightneedtobesplitintoseveral

distinctcolumns,orviceversa.

12© 2018 Lore IO Inc.

Eliminate Traditional ETL

VALIDATIONANDQUALITY

Finally,theunpredictablenatureofdatameans

thattherangeofdatavaluesextractedcanexceed

theoriginalexpectationsofdataanalystsand

scientists.Tomitigatethisproblem,thetransform

phasemaygetfurthercomplicatedby

incorporatingaprocesstolookupandvalidate

thedatafromtablesorreferentialfiles.

3 LOADISSUES

Oncedataistransformedit’sreadytobeloaded

intotargetsystemsthatbusinessuserswillinter-

actwithtoderiveinsightsandactions-- thevery

purposefortheETLprocessoverall.TheLoad

phaseintroducesuniquechallengesthatfurther

complicateandprolongETLs:

SCALEANDVARIATIONS

Dataprosmayneedtohandleverydifferent

targetsystems,fromsimpleflatfilestoenormous

datawarehouses.

TESTING,PUBLISHING,ARCHIVING

Theloadprocessitselfmayvaryfromoneuse

casetoanother.Itmayinvolveadifferentnumber

oftasks,suchastesting,approvals,andarchival.

BATCHVSSTREAMVSREAL-TIME

Theschedulingofdatadeliverycanvary

dramatically,fromrealtimeapplicationsallthe

waytoweeklyormonthlydataload.

UPDATESANDCORRECTIONS

Themethodofdeliverycanvary:some

applicationsmaycallforupdatingexisting

recordswithnewdata,whileothersmaydepend

onthecreationofnewre- cords.

TARGETSYSTEMRESPONSES

Whenloadingdata,analystsmustcomplywith

thedifferentdataconstraintsofthetarget

systems,suchasuniqueness,mandatoryfields,or

referentialintegrity.Analystmusteffectively

respondtotheacceptanceandrejection

responsestheyreceivefromthetargetsystems.

ACCESS

Dataprosmustconformtoanyaccess

restrictions,encryptionlevels,orsecuritypolicies

ofthesourcesystemsandpreservetheminthe

targetsystemsforeachdataconsumertoensure

thatonlypermittedusersandapplicationscan

accesssensitivedata.

How to Eliminate Traditional ETL – A To-Do Checklist

14© 2018 Lore IO Inc.

Eliminate Traditional ETL

Start at end: Declare your output views

Conventionalthinkingstartswiththesourcedata

andmapsthevarioustransformationprocedures

neededtoshapethedataforitstarget

destination.

Conventionalthinkingdoesn’tscale.

Abetterapproachistodeclarethedesiredend

stateandworkbackwards—ideally

programmatically—thevarioustasksthatmake

upthetransformationlogic.

InplainEnglish,dataanalysts(ratherthanETL

developers)controlthedataprepprocess.These

analystsusedeclarativelanguages,suchasSQLor

SQL-like,todescribetheirdataneeds.

Whenyoustartwithadeclarativelanguageand

addautomation,youcantrulyreimaginethe

wholedataprepprocessandmakeitagileand

scalable.

Dataprepapproachesthatleveragedeclarative

definitionsareprovingtobemuchmorescalable

andagilethantraditionalproceduralones.Rather

thanawaterfallmodel,declarative

transformationshavearelaxed,cyclicalstructure

whereeachtransformationtaskcanbecarried

outindependently.

Declarativetransformationsenablebusiness

userstoparticipateinthedataprepprocessfrom

theget-go.

Analystsdefinetheirtargetdataviews.Subject

matterexpertsannotatethedataelementsand

thesystemtranslatesthesedeclarationstoETL

codethatcanberunatanystepoftheprocess,

evaluatedandfixedasneeded.Businessuserscan

evaluatethedataatanytimeandrequest

modificationstothepreparationprocess.

14

15© 2018 Lore IO Inc.

Eliminate Traditional ETL

Migrate out of procedural logic

ProceduraltransformationsfollowatraditionalapproachwheretheETLownersreceivethereportingrequirementsfromthebusinessteam,andthendesign,build,testandoptimizetheimplementationontheirownbeforeprovidingbusinessuserswiththepreppeddata.

Manyoftheproceduraltasksarecarriedoutsequentiallyandpotentiallyoveralongperiodoftimebasedontheproject’scomplexity.Businessuserscannotaccess—letaloneuse—thedatawhileit’sbeingworkedon.Andanysubsequent

requirementtomodifythedatamayimposeawholenewsequenceofsteps.

Procedurallogicisimposingawaterfall-likebusinessworkflowthatistooslowtoadoptforusecasesthatinvolvemanydatasources.

16© 2018 Lore IO Inc.

Eliminate Traditional ETL

16

Automate ETL coding

Thinkofdeclarativelanguageasthe

“commander’sintent”—Theyconveythedesired

outcomewithoutspecifyinganyimplementation

detailsbecausethosemaychangeatanytime

basedonresourceavailability,theemergenceof

newtechnologies,theuseofnewdatasources,

etc.

Itismuchmoredesirable,therefore,todelegate

theimplementationprocesstomachinesthatare

muchbettersuitedtohandleincreasinglogicand

sourcecomplexitythanmeremortals.When

automatedsystemshandleandevenoptimizethe

spaghettisaladofETLjobcode,thebusinesscan

onboardandunifynewdatasourcesatamuch

fasterrate,enablinganalyststofocusmoretime

ongeneratinginsightsfromdatathanon

preppingthedata.

Agoodexampleofhowautomationhelpsdata

preparationistheprocessofjoiningdisparate

datasources.Followingaproceduralmodel,an

ETLengineermustspecifythevarious

mechanismstounifydatasourcesbymapping

datakeysandusingothertechniques.Asthe

numberofsourcesgrowsandasthedataschemas

oftheunderlyingsourceschange,itbecomes

increasinglypainfulforhumanstomanagethose

relationships,slowingdownimplementations.

Automationontheotherhand,cancontrolthe

variousdataentities,attributes,andkey

relationships,andbysodoingtakecareofallthe

necessarydatajoinsonitsown.Usersshould

alwayshavefulllineagetodiscoverand

understandhowtheirdataelementscome

together,buttheactualunificationcan(and

should)bedonebymachines.

17© 2018 Lore IO Inc.

Eliminate Traditional ETL

Sincedeclarativetransformationsenablemorestakeholderstobeinvolvedthroughoutthedatapreparationprocess,itmakessensetodividetheworkflowintosmalltasksandassigntheresponsibilityforcompletingeachtasktothosebestpositionedtoaddressthem.

Thisdivideandconquerprocessisakintoapotluckpartywhereeveryguestcontributesamealitemthattheyarefeeltheycanbestmanage.Similarly,thedatapreparationworkflowcanbedividedupandacrossthosewhobestknowthesourcedata,thosewhounderstandthebusinesslogic,thosewhocreatethefinalviews,etc.Stakeholderscollaborate onthewholeprocessbycontributingattheirownpace.

Oncestakeholderscollaborateondatadefinitionandbusinesslogic,theyaremorelikelytowanttousethenewdataelementsintheirreportsandapplications.Aglobaldatafabriccanbecomethe

singlesourceoftruthfortheentireorganization,expressingtheinsandoutsofthebusinessinacommonandstandardlanguagethateveryonecanunderstandandtrust.

Automatingdeclarativetransformationsmeansthatthesystemrunsitstransformationcodedirectlyonthesourcedata.Thismeansthatdatadiscoveryandcataloging– whilevirtual– isdirectlycoupledwiththeactualdata,sothatuserscanissuedataqueriesastheystudy– andevenrefine– thefabric,buildingnewdatadefinitionsuponpreviouslycreatedones.

Collaborate on logic creation

18© 2018 Lore IO Inc.

Eliminate Traditional ETL

Capabilities to look for in a new ETL solution

Belowareseveralcapabilitiesthatshouldbe

includedinwhateversolutionyouchooseto

eliminatetraditionalETL:

AUTOMATEDDATASCANNINGANDSCHEMA

DETECTION

Newsourcesareperiodicallyadded,andexisting

onesmayexperienceschemachangesovertime.

Lookforasolutionthatcanscanyourdata

landingarea,ingestnewsources,andadaptto

schemachanges– allautomatically– sothatthe

businessisn’tblocked.

AUTOMATEDDDL(TABLES,COLUMNS,

DESCRIPTIONS)

Sincedatachangessooften,manuallyhandlingits

physicalpersistencecancreatebusiness

bottlenecks.It’sbesttoallowthebusinessto

focusoncreatingdatalogic,anddelegatephysical

considerationstoautomatedsystems.

VIRTUALDATACANVAS

Toremaincompetitiveandeffective,thebusiness

continuouslyevolvesitsdataneedsandusecases.

AnewETLsolutionmustofferbusinessusersan

intuitiveenvironmentwheretheycanextend

theirdatasemantics,furthertransformtheirdata,

andcontinuouslydeveloptheirdatafabric

withouthavingtowaitonITtobuildnewdata

pipelinesfirst.

STATEFULDATACOMPONENTS

Sincemanydatatransformationsrequirethe

involvementofnumerousstakeholdersand

systems,suchtransformationsarebestdeveloped

incrementally.Thebusinessneedstobeableto

breakdatarequirementsintosmallcomponents,

andmanagetheirprogression(orstates)

effectively.

DATACOMPONENTOWNERSHIP

Clearownershipoftheabovementioneddata

componentsimprovesteamcollaboration,

enablingstakeholderstointeractontheirdata

componentsthroughouttheirdevelopment

lifecycles.

DATACOMPONENTDEFINITION

Todemocratizedataandenablebusinessusersto

explore,use,andbuildondatacomponents,there

needstobeacentralizedcatalogthatlistsand

definesalldataelements.

DATACOMPONENTLINEAGE,HISTORYAND

VERSIONING

Stakeholdersmusthaveconfidenceintheirdata.

Toreachastateof”singlesourceoftruth”,teams

mustbeabletounderstandhowdataelements

areconstructedandderived,aswellastotrace

backdatatoitssource.

19© 2018 Lore IO Inc.

Eliminate Traditional ETL

REUSABLEDATACOMPONENTS

Thedatafabricmustbemodular;stakeholders

whodefinenewoutputsshouldbeabletouse

theseoutputsasinputsfornewerstill

components.Therefore,thecomponentcreation

processmustincludetheabilitytopublishthem

intothedatalayerandavailthemsimilarlytoraw

dataelements.

TESTABLEDATACOMPONENTS

Dataqualityandaccuracyneedtobeassessed

andvalidatedassoonaspossible;businessescan

nolongeraffordtowaituntiltheentiredata

pipelineisestablished.Rather,datacomponents

mustbetestablerightontherawdataassoonas

theirdefinitionsarecreated.

USERSCANREQUESTNEWDATACOMPONENTS

Astheneedformoreandnewerdataincreases,

usersmustbeabletodefinedesiredoutputs

withouttaxingdataproducerstorearchitectthe

datapipelines.Thesystemmustbeflexible

enoughtoaccommodatenewrequests,even

whenthosemayprofoundlyalterdatamodels

andschemas.

AUTOMATEDMICRO-TASKSTOADVANCE

COMPONENTSTATE

Toincreasebusinessagility,anautomatedsystem

shouldbeabletodefinedesignandoperation

tasks(“micro-task”)toknownstakeholdersto

ensuredatacomponentsarecreatedandmade

availablequickly.

MICRO-TASKUSERCOLLABORATION

Multiplestakeholders(presumablyfromdifferent

areasofthebusiness)mustbeabletowork

togetherondatacomponentswithout“stepping

oneachothers’toes.”

MICRO-TASKOWNERSHIPASSIGNMENTAND

MANAGEMENT

Collaborativedatamanagementmeansthat

stakeholdersareabletotaskeachotherand

assignresponsibilitiestoparticipateinthedesign,

creation,testing,publishingoroptimizationof

datacomponents.

AUTOMATEDUNIFICATIONOFMULTIPLEDATA

SOURCESANDTYPES

Anautomatedsystemshouldbeabletoidentify

commoncharacteristicsacrossdatasourcesand

types(suchasemailaddresses,orresourceURIs),

andusethisknowledgetounifydatasetsfor

fasterdatadiscoveryandquerying.

INTUITIVEDATAEXPORT

Finally,anewETLsolutionmustenablebusiness

userstoexportanydatacomponentorviewto

oneormoretargetsystemseasilyandquickly,as

soonasnewdataismadeavailable.

20© 2018 Lore IO Inc.

Eliminate Traditional ETL

Putting it all together

CollaborativedatapreparationoffersanewalternativetotraditionalETL.Itovercomesthescalelimitationsthatimpacttraditionalworkflowsastheyattempttounifyalargeandgrowingnumberofdatasources.Thisformofagiledatapreparationenablesstakeholderstocollaborateandgraduallybuildoutthedatafabricbyincrementingtheirdatadefinitionsasnewsourcesandusecasesemerge.

Atthesametime,businessuserscanreapquickreturnsfromnewdefinitionsbyimmediatelyincorporatingthemintotheirreportsandapplications.TheynolongerneedtowaituntiltheirETLengineeringcounterpartsfullydevelopanddeploythedataplatform.BydelegatingETLcodegenerationtosystems,businessescanmorequicklyexpandanddeepen

theirsharedunderstandingoftheirdata,andevolvetheircommonlanguagetobetterexpressthenuancesoftheirbusiness.

Lastly,thecollaborativenatureofthisnewapproachbothdemocratizesdatapreparation–invitingnewstakeholderstoparticipateintheprocess– anddecentralizesdataownership,invitingthosewhoconsumedatatocontributetothegrowthandongoinghealthoftheirdatafabric.

21© 2018 Lore IO Inc.

Eliminate Traditional ETL

About Lore IO

LoreIOisaCollaborativeDataPreparationPlatformthathelpscompaniesingestandunifydisparatedatasetsfromhundredsorthousandsofsources.ItgeneratesstandardoutputswithouttheneedforengineerstodevelopproceduralETLanddatapipelines.

LoreIOcustomersunlockthefullvalueoftheirdatabyempoweringbusinessuserstocollaborateonandusedatasetsthatareinitiallyhardtounderstand,reconcile,andblend.

TheLoreIOplatformabstractsallthecomplexsemanticsofhowthedataiscapturedandjoinedtogether,enablingcustomerstoinstantlyvalidate

businesslogicinsupportofawiderangeofusecases.

LoreIOtakesanagileapproachtopartneringwithnewcustomers.Itseekstoexplorestrategicprojectsthatwillmakeamaterialimpactonthebusinessandthenstructureapartnershipthatdemonstratesvalueimmediatelyandscalesfromthere.

www.getlore.io | [email protected] | (650) 823-1979

www.getlore.io