The Data Ring: Community Content Sharing. Serge Abiteboul (INRIA), Alkis Polyzotis (UC Santa Cruz)
Post on 20-Dec-2015
![Page 1: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/1.jpg)
The Data Ring: Community Content Sharing
Serge Abiteboul (INRIA)
Alkis Polyzotis (UC Santa Cruz)
![Page 2: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/2.jpg)
Data Sharing Communities
• Examples: UCSC genome browser, SwissProt, Flickr
• Interesting data management problem
  – Shared information is heterogeneous
  – Data is distributed and dynamic
  – Lack of central administration
  – Users are not database savvy
Data sharing community: a group of users that share and query information within some domain
![Page 3: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/3.jpg)
The Data Ring
• P2P middleware system that provides:
  – Monitoring
  – Querying
  – …and other database-like services over the distributed information
• Main goal: simplicity of use
![Page 4: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/4.jpg)
Data abstraction in the data ring
• Topological layer
• Physical layer
• External layer
![Page 5: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/5.jpg)
Data abstraction in the data ring
• Declarative query services
• Data and query model based on XML
Topological Layer
![Page 6: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/6.jpg)
Data abstraction in the data ring
• Basic service is distributed query evaluation
• Comprises the overlay network (DHT), physical access structures (indices, replicas, views), and the catalog
Physical Layer
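
To make the role of the overlay network and the catalog more concrete, here is a minimal sketch of a DHT-style catalog built on consistent hashing. The slides do not say which DHT the Data Ring uses, so the class `CatalogRing`, the peer names, and the 32-bit hash space are all assumptions made for illustration only.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the hash ring (assumed 32-bit space)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % (2 ** 32)


class CatalogRing:
    """Toy catalog: each key is owned by the first peer clockwise from its hash."""

    def __init__(self, peers):
        # Sorted (position, peer) pairs form the ring.
        self._ring = sorted((ring_position(p), p) for p in peers)
        # Each peer stores its own slice of the catalog.
        self._entries = {p: {} for p in peers}

    def owner(self, key: str) -> str:
        positions = [pos for pos, _ in self._ring]
        # First peer at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(positions, ring_position(key)) % len(self._ring)
        return self._ring[i][1]

    def publish(self, key: str, location: str) -> None:
        """Record that `location` holds data (or a replica) for `key`."""
        self._entries[self.owner(key)].setdefault(key, set()).add(location)

    def lookup(self, key: str) -> set:
        """Return all known locations for `key` (empty set if unknown)."""
        return self._entries[self.owner(key)].get(key, set())
```

A peer that wants to evaluate a query over some resource would first call `lookup` to find where the data or its replicas live, then ship (part of) the query there.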
![Page 7: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/7.jpg)
Data abstraction in the data ring
• Provides semantically richer data models
External Layer
![Page 8: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/8.jpg)
Data abstraction in the data ring
• Our focus is on the topological and physical layers
• External layer is equally important and an active research area
Physical Layer
Topological Layer
![Page 9: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/9.jpg)
Thesis #1: formalism for distributed XML data and queries
![Page 10: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/10.jpg)
Distributed XML data and queries
• What made the relational model successful:
  – A logic for describing tables
  – An algebra for query optimization
• We need the equivalent for trees in a distributed context:
  – A logic for describing distributed XML data
  – An algebra for optimizing distributed XML queries
![Page 11: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/11.jpg)
Desiderata for description logic
• Seamless transition between data and services
  – Important for loose data integration
• Support for XML streams
  – Streams are essential for subscription services
  – They are also necessary to support recursion
![Page 12: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/12.jpg)
Starting point: AXML
• AXML: XML tree with embedded web service calls
  – Seamless transition between intensional and extensional data
  – Provides a simple mechanism for loose data integration
• Core concept: XML streams
  – A web service call returns a stream of elements
  – Support for both push and pull semantics
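
The idea of an XML tree with embedded service calls can be sketched as follows. This is not the actual AXML syntax or runtime: the `<call>` element, the `SERVICES` registry, and the `recent_photos` service are hypothetical stand-ins. A service is modeled as a Python generator, so the consumer pulls as many elements from the stream as it needs (pull semantics); push semantics are not shown.

```python
import itertools
import xml.etree.ElementTree as ET


def recent_photos(tag):
    """Hypothetical service: yields a stream of <photo> elements."""
    for i in range(3):
        photo = ET.Element("photo")
        photo.set("tag", tag)
        photo.text = f"{tag}-{i}.jpg"
        yield photo


# Hypothetical registry mapping service names to stream-producing functions.
SERVICES = {"recent_photos": recent_photos}

# Extensional data (a stored photo) next to intensional data (a service call).
DOC = """
<album owner="alice">
  <photo tag="ring">local.jpg</photo>
  <call service="recent_photos" arg="ring"/>
</album>
"""


def materialize(root, limit=None):
    """Replace every <call> node by up to `limit` elements pulled from its stream."""
    for parent in list(root.iter()):
        new_children = []
        for child in list(parent):
            if child.tag == "call":
                stream = SERVICES[child.get("service")](child.get("arg"))
                new_children.extend(itertools.islice(stream, limit))
            else:
                new_children.append(child)
        parent[:] = new_children
    return root
```

Calling `materialize(ET.fromstring(DOC), limit=2)` turns the intensional part of the document into two concrete `<photo>` elements, while the stored photo is left untouched.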
![Page 13: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/13.jpg)
Desiderata for algebra
• Be amenable to rewrites
• Capture the topology of distributed computation
• Allow seamless transition between logical and physical state
  – Plans may need to be re-optimized in mid-flight
  – It may be necessary to perform partial optimization
  – Error recovery
![Page 14: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/14.jpg)
A proposal based on AXML
• A distributed plan is a workflow of web services … which is exactly an AXML tree
• Components:
  – An encoding of distributed plans in AXML
  – Rewrite rules
• A nice bonus: plans can be readily exchanged between nodes
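
As a rough illustration of a plan encoded as a tree plus a rewrite rule, the sketch below uses a made-up XML encoding (the `<plan>`, `<select>`, and `<scan>` elements and the `at` attribute naming the executing peer are assumptions, not the encoding proposed in the talk). The single rule moves a selection to the peer that holds the scanned data, so that filtering happens before results cross the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan: peer p1 filters a collection that lives on peer p2.
PLAN = """
<plan>
  <select at="p1" pred="year &gt; 2000">
    <scan at="p2" collection="papers"/>
  </select>
</plan>
"""


def push_selection(plan):
    """Rewrite rule: select@p(scan@q) => select@q(scan@q).

    Returns the number of nodes rewritten, so a caller can iterate
    until no rule applies any more.
    """
    rewrites = 0
    for select in plan.iter("select"):
        children = list(select)
        if len(children) == 1 and children[0].tag == "scan":
            target = children[0].get("at")
            if select.get("at") != target:
                select.set("at", target)
                rewrites += 1
    return rewrites
```

Because the plan is itself an XML tree, `ET.tostring(plan)` produces a document that can be sent to another node, which is the "plans can be readily exchanged" bonus in miniature.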
![Page 15: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/15.jpg)
Disclaimer
• AXML is a starting point, not a panacea
• Bottom line: we need formalisms for distributed XML queries
![Page 16: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/16.jpg)
Thesis #2: autonomic administration
![Page 17: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/17.jpg)
Autonomic administration
• Users are not database experts
  – Typically, scientists with computer experience
• Users are averse to too many “knobs”
• No central authority that is responsible for administration
• Autonomic administration is a necessity, not a gadget
![Page 18: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/18.jpg)
Facets of autonomy
• Self-monitoring
• Self-tuning
• Self-healing
![Page 19: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/19.jpg)
Some issues
• System integration
• Distribution
• On-line tuning
• Pro-active tuning
![Page 20: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/20.jpg)
Distributed vs. local tuning
• Distributed tuning
  – Based on the global workload
  – Catalog organization, replication
• Local tuning
  – Based on local workload
  – Physical design tuning
![Page 21: The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)](https://reader036.vdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1cf82/html5/thumbnails/21.jpg)
Data activation for files
• A large portion of the data is expected to be in files
• We need to develop query processors for data residing in files
• File activation: optimize access to the file based on the local workload
  – E.g., instantiate an index on file contents or materialize a relational view
• Local tuning is essential in this context
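
File activation driven by the local workload might look like the following sketch. Everything here is an assumption for illustration: the file is taken to be well-formed, line-oriented `key<TAB>value` text, the class `ActivatedFile` is hypothetical, and the policy (build an index once a fixed number of lookups has been observed) is deliberately naive. It combines self-monitoring (counting lookups) with self-tuning (instantiating an index without user intervention).

```python
import collections


class ActivatedFile:
    """Line-oriented file of `key<TAB>value` records with an on-demand key index."""

    def __init__(self, path, threshold=3):
        self.path = path
        self.threshold = threshold  # hypothetical policy: lookups seen before indexing
        self.lookups = 0            # self-monitoring: local workload counter
        self.index = None           # key -> list of byte offsets, built lazily

    def get(self, key):
        """Return all values stored under `key`."""
        self.lookups += 1
        if self.index is None and self.lookups >= self.threshold:
            self._build_index()     # self-tuning: activate the file
        if self.index is None:
            return self._scan(key)
        return self._probe(key)

    def _build_index(self):
        index = collections.defaultdict(list)
        with open(self.path, "rb") as f:
            offset = 0
            for line in f:
                key = line.split(b"\t", 1)[0].decode("utf-8")
                index[key].append(offset)
                offset += len(line)
        self.index = dict(index)

    def _scan(self, key):
        # Full scan: cost grows with file size.
        with open(self.path, "rb") as f:
            return [self._value(line) for line in f
                    if line.split(b"\t", 1)[0].decode("utf-8") == key]

    def _probe(self, key):
        # Index probe: seek directly to each matching record.
        values = []
        with open(self.path, "rb") as f:
            for offset in self.index.get(key, []):
                f.seek(offset)
                values.append(self._value(f.readline()))
        return values

    @staticmethod
    def _value(line):
        return line.rstrip(b"\r\n").split(b"\t", 1)[1].decode("utf-8")
```

The first few calls to `get` scan the file; once the threshold is reached, later calls use the index. A real system would also weigh index build cost, file updates, and available memory before activating a file.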