adaptive schema databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf ·...
TRANSCRIPT
AdaptiveSchemaDatabases
WilliamSpothb,Bahareh SadatArabi,EricS.Chano,DieterGawlicko,AdelGhoneimyo,BorisGlavici,BedaHammerschmidto,OliverKennedyb,
Seokki Leei,ZhenHua Liuo,XingNiui,YingYangb
b:UniversityatBuffalo i:IllinoisInst.Tech. o:Oracle
1
AdaptiveSchemaDatabases
2
Classicrelationaldatabase• Navigationalandorganizationalpurposeretaindiscovery,goodperformanceandspace,reusable.
3
Classicrelationaldatabase• But...Highupfrontcostandinflexible
4
BigData/NOSQL• Datacanbeusedimmediately.
5
BigData/NOSQL• But...SacrificenavigationalandPerformancebenefitandmayendupwithduplicateofwork
6
AdaptiveSchemaDatabases
Queriesandfeedback...
eventually
• BridgethegapbetweenrelationaldatabaseandNoSQl.
7
AdaptiveSchemaDatabases
Queriesandfeedback...
eventually
• BridgethegapbetweenrelationaldatabaseandNoSQl.
8
AdaptiveSchemaDatabases
Input:
Queries:SELECTnameFROMUndergradUNIONSELECTnameFROMGrad
SELECTdeg FROMGrad
SELECTnameFROMStudent
…
9
Outline
Unstructured Data Semi-structed Data (e.g., JSON)
Extraction workflow
SchemaWorkspace
Schema Matching
Extraction workflow Extraction workflow
Extraction Schema Candidates
SchemaWorkspace
SchemaWorkspace
SchemaWorkspace
Queries + Feedback• Extractionanddiscovery• Adaptive,personalizedschemas
fromqueries• Explanationsandfeedback• Adaptiveorganization• Conclusionsandfuturework
10
Extraction
Unstructured Data Semi-structed Data (e.g., JSON)
Extraction workflow
SchemaWorkspace
Schema Matching
Extraction workflow Extraction workflow
Extraction Schema Candidates
SchemaWorkspace
SchemaWorkspace
SchemaWorkspace
Queries + Feedback
11
• ASDextractsschemacandidateset
Giveninput:
12
Extraction
• ASDextractsschemacandidateset
Giveninput:
13
Extraction
• ASDextractsschemacandidateset
Giveninput:
14
Extraction
Extraction• ASDextractsschemacandidateset
Giveninput:
15
• ASDextractsschemacandidateset
schemacandidatesetCext={Sext,Pext},whereSext isasetofcandidateschemas,Pext isaprobabilitydistributionovertheseschemas.
16
Discovery
• ASDextractsschemacandidateset
Smax:thebestguessschema
17
Discovery
Adaptive,personalizedschemasfromqueries
Unstructured Data Semi-structed Data (e.g., JSON)
Extraction workflow
SchemaWorkspace
Schema Matching
Extraction workflow Extraction workflow
Extraction Schema Candidates
SchemaWorkspace
SchemaWorkspace
SchemaWorkspace
Queries + Feedback
18
• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.
Initially,W={}
19
Adaptive,personalizedschemas
• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.
20
Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad
FindingSchemasfromQueries
• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.
21
FindingSchemasfromQueries
Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad
• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.
22
Query2:SELECTdeg FROMGrad
FindingSchemasfromQueries
• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.
Query3:SELECTnameFROMStudent
W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)
23
SynthesizingTables
Explanationsandfeedback
Unstructured Data Semi-structed Data (e.g., JSON)
Extraction workflow
SchemaWorkspace
Schema Matching
Extraction workflow Extraction workflow
Extraction Schema Candidates
SchemaWorkspace
SchemaWorkspace
SchemaWorkspace
Queries + Feedback
24
Extractionerrorsappearinthreeforms:(1)AqueryincompatiblewithSmax
(2)AnupdatewithdatathatviolatesSmax(3)Anextractionerrorpresentedtouser
Weprovide:(1)explanationofresults(2)provenance(3)Warn theanalystwithambiguity(4)Explain theambiguity(5)Evaluate themagnitudeofambiguity(6)Assisttheanalysttoresolve theambiguity
Whatmightgowrong
25
ASDinteractswiththeoutsideworld:Schema,Data,andUpdate.
Schemainteractions:WhenaqueryincompatiblewithSmax andtheworkspace
Datainteractions:provenanceforattributeandrowlevelambiguity.
Updateinteractions:• representschemamismatchesasmissingvalues.• resolvedataerrorswithaprobabilisticrepair.• upgradeherschematomatchthechanges.• checkpointherworkspaceandignorenewupdates.
Typesoferrors
26
Explanationsandfeedback
Explanations:WematchStudentwith
bothGradandUndergrad
27
Condition2:Queryfromunknown schemaelements:SELECTnameFROMStudent
W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)
Adaptiveorganization
Unstructured Data Semi-structed Data (e.g., JSON)
Extraction workflow
SchemaWorkspace
Schema Matching
Extraction workflow Extraction workflow
Extraction Schema Candidates
SchemaWorkspace
SchemaWorkspace
SchemaWorkspace
Queries + Feedback
28
AdaptiveorganizationTrade-offbetweenstoringdatainitsnativeformatandbasedonaspecificschema.
Whatisthechallenge?Manyworkspaces,addtabletotheschema,….
ChallengesandPossibleSolutions:• Wewantmultiplepersonalizedschemas
1.Relationalworkspaceschemaisessentiallyaviewoverrawdata.Materializingviewcanbeused.
2.Useexistingadaptivephysicaldesign andcaching techniques.• Sharedmaterializations
1.Incrementalmaterializedviewmaintenance.Leveragetechniquesfromrevisioncontrolsystems.
2.Viewselectionproblem.29
Conclusionsandfuturework
ASDbridgesthegapbetweenrelationaldatabasesandNoSQL.
• Discovery:Helpuserexploreandunderstandnewdatabyprovidinganoutlineoftheavailableinformation.Done
• Materialization:Adoptworkonadaptivedatastructures.Partiallydone• DataSynthesis:Synthesisnewtablesandattributesfromexistingdata.
Done• ConflictResponse:
– Versioningorbranchingtheschema.– Loganalysistohelpusersassesstheimpactofschemarevisions.
30