adaptive schema databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf ·...

30
Adaptive Schema Databases William Spoth b , Bahareh Sadat Arab i , Eric S. Chan o , Dieter Gawlick o , Adel Ghoneimy o , Boris Glavic i , Beda Hammerschmidt o , Oliver Kennedy b , Seokki Lee i , Zhen Hua Liu o , Xing Niu i , Ying Yang b b: University at Buffalo i: Illinois Inst. Tech. o: Oracle 1

Upload: others

Post on 26-Oct-2019

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

WilliamSpothb,Bahareh SadatArabi,EricS.Chano,DieterGawlicko,AdelGhoneimyo,BorisGlavici,BedaHammerschmidto,OliverKennedyb,

Seokki Leei,ZhenHua Liuo,XingNiui,YingYangb

b:UniversityatBuffalo i:IllinoisInst.Tech. o:Oracle

1

Page 2: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

2

Page 3: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Classicrelationaldatabase• Navigationalandorganizationalpurposeretaindiscovery,goodperformanceandspace,reusable.

3

Page 4: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Classicrelationaldatabase• But...Highupfrontcostandinflexible

4

Page 5: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

BigData/NOSQL• Datacanbeusedimmediately.

5

Page 6: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

BigData/NOSQL• But...SacrificenavigationalandPerformancebenefitandmayendupwithduplicateofwork

6

Page 7: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

Queriesandfeedback...

eventually

• BridgethegapbetweenrelationaldatabaseandNoSQl.

7

Page 8: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

Queriesandfeedback...

eventually

• BridgethegapbetweenrelationaldatabaseandNoSQl.

8

Page 9: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

Input:

Queries:SELECTnameFROMUndergradUNIONSELECTnameFROMGrad

SELECTdeg FROMGrad

SELECTnameFROMStudent

9

Page 10: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Outline

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback• Extractionanddiscovery• Adaptive,personalizedschemas

fromqueries• Explanationsandfeedback• Adaptiveorganization• Conclusionsandfuturework

10

Page 11: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Extraction

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

11

Page 12: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Giveninput:

12

Extraction

Page 13: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Giveninput:

13

Extraction

Page 14: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Giveninput:

14

Extraction

Page 15: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Extraction• ASDextractsschemacandidateset

Giveninput:

15

Page 16: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

schemacandidatesetCext={Sext,Pext},whereSext isasetofcandidateschemas,Pext isaprobabilitydistributionovertheseschemas.

16

Discovery

Page 17: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Smax:thebestguessschema

17

Discovery

Page 18: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Adaptive,personalizedschemasfromqueries

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

18

Page 19: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

Initially,W={}

19

Adaptive,personalizedschemas

Page 20: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

20

Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad

FindingSchemasfromQueries

Page 21: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

21

FindingSchemasfromQueries

Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad

Page 22: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

22

Query2:SELECTdeg FROMGrad

FindingSchemasfromQueries

Page 23: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

Query3:SELECTnameFROMStudent

W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)

23

SynthesizingTables

Page 24: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Explanationsandfeedback

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

24

Page 25: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Extractionerrorsappearinthreeforms:(1)AqueryincompatiblewithSmax

(2)AnupdatewithdatathatviolatesSmax(3)Anextractionerrorpresentedtouser

Weprovide:(1)explanationofresults(2)provenance(3)Warn theanalystwithambiguity(4)Explain theambiguity(5)Evaluate themagnitudeofambiguity(6)Assisttheanalysttoresolve theambiguity

Whatmightgowrong

25

Page 26: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

ASDinteractswiththeoutsideworld:Schema,Data,andUpdate.

Schemainteractions:WhenaqueryincompatiblewithSmax andtheworkspace

Datainteractions:provenanceforattributeandrowlevelambiguity.

Updateinteractions:• representschemamismatchesasmissingvalues.• resolvedataerrorswithaprobabilisticrepair.• upgradeherschematomatchthechanges.• checkpointherworkspaceandignorenewupdates.

Typesoferrors

26

Page 27: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Explanationsandfeedback

Explanations:WematchStudentwith

bothGradandUndergrad

27

Condition2:Queryfromunknown schemaelements:SELECTnameFROMStudent

W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)

Page 28: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Adaptiveorganization

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

28

Page 29: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveorganizationTrade-offbetweenstoringdatainitsnativeformatandbasedonaspecificschema.

Whatisthechallenge?Manyworkspaces,addtabletotheschema,….

ChallengesandPossibleSolutions:• Wewantmultiplepersonalizedschemas

1.Relationalworkspaceschemaisessentiallyaviewoverrawdata.Materializingviewcanbeused.

2.Useexistingadaptivephysicaldesign andcaching techniques.• Sharedmaterializations

1.Incrementalmaterializedviewmaintenance.Leveragetechniquesfromrevisioncontrolsystems.

2.Viewselectionproblem.29

Page 30: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Conclusionsandfuturework

ASDbridgesthegapbetweenrelationaldatabasesandNoSQL.

• Discovery:Helpuserexploreandunderstandnewdatabyprovidinganoutlineoftheavailableinformation.Done

• Materialization:Adoptworkonadaptivedatastructures.Partiallydone• DataSynthesis:Synthesisnewtablesandattributesfromexistingdata.

Done• ConflictResponse:

– Versioningorbranchingtheschema.– Loganalysistohelpusersassesstheimpactofschemarevisions.

30