why google fusion tables is not a data integration tool

32
Data Management, Integration and Collaboration in the cloud Gianni Fiore

Upload: gianni-fiore

Post on 08-Apr-2017

85 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Page 1: Why Google fusion tables is not a Data Integration tool

Data Management, Integration and Collaboration in the cloud

Gianni Fiore

Page 2: Why Google fusion tables is not a Data Integration tool

from Google Labs

Page 3: Why Google fusion tables is not a Data Integration tool

3

An experimental Web Service for data management First seen on the web in 2009 Related scientific paper in 2010 Default Google Application from 2011

Main goal: providing seamless data integration over the web

Page 4: Why Google fusion tables is not a Data Integration tool

Technology Google search engine for data tables over the web Google Documents: a collaborative, community-based documental base

Realtime collaboration Comment versioning

Google Drive: a cloud storage service Abstract how the data is accessed/stored

Google Visualization API Google Maps

Resources Very large amount of public and private data over the web

Page 5: Why Google fusion tables is not a Data Integration tool

Fusion Tables Model Relational Model

Relation as a collection of homogeneous data records Spreadsheet-like data arrangement

Conjunctive queries over data: Select, to filter out a subset of the records Project, to show only properties of interest Join, to enrich data with other relations

Virtually creates new view of the same data Auto-parsing of data types (text, number, datetime or location) Advanced location parsing

Compare

View

Page 6: Why Google fusion tables is not a Data Integration tool

Sharing Model Advanced permissions model Invite any people to work with you

As viewers, can see and comment As contributors, can also merge, but not edit As collaborators, can merge and edit as you As owners, can also invite others

Compare

View

Page 7: Why Google fusion tables is not a Data Integration tool

Sharing Model Realtime data changes

Cell-based data versioning Cell, row and column based comments

Compare

View

Page 8: Why Google fusion tables is not a Data Integration tool

Let’s have an example Three ways to start playing with Fusion Tables:

From your local tables, uploading them From your Google Spreadsheets, converting them in fusion tables From scratch, i.e. creating a brand new table online From web, searching for an existing data table

Pay-as-you-go effort

Compare

View

Page 9: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Search with keyword: cars.

Compare

View

Page 10: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 A public fusion table about cars produced in the world.

Compare

View

Page 11: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Overview of the user interface (stylesheets)

Compare

View

Page 12: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Advanced Geocoding

Compare

View

Page 13: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Cards view mode

Compare

View

Page 14: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Pie view mode, a matter of graph configuration

Compare

View

Page 15: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Intensity map view mode

Compare

View

Page 16: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Intensity map view mode

Compare

View

Page 17: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Intensity map view mode

Compare

View

Page 18: Why Google fusion tables is not a Data Integration tool

An example: World cars production in 2011 Chart view mode

Compare

View

Page 19: Why Google fusion tables is not a Data Integration tool

A join example: World cars production and World population in 2011 Let’s compare with other data around the web: world population

Compare

View

Page 20: Why Google fusion tables is not a Data Integration tool

A join example: World cars production and World population in 2011 You can virtually merge together: mapping source column to target

column.

Compare

View

Page 21: Why Google fusion tables is not a Data Integration tool

A join example: World cars production and World population in 2011 You choose which columns to project out after the merge. The merged table will be computed dinamically from the two base tables,

using the rows from the first table. Changes to the base tables will be reflected in the merged table and vice versa

Compare

View

Page 22: Why Google fusion tables is not a Data Integration tool

Comparing world population and world cars production Compare

View

Page 23: Why Google fusion tables is not a Data Integration tool

Comparing world population and world cars production Compare

View

Page 24: Why Google fusion tables is not a Data Integration tool

China and India? Compare

View

Page 25: Why Google fusion tables is not a Data Integration tool

China? Compare

View

Page 26: Why Google fusion tables is not a Data Integration tool

Under the hood

Requests from Google Fusion Tables website Standalone Applications via API Charts and maps from widgets

Formalizing requests into queries

Scheduling a query plan

Triggering query plans

Asking source tables

Page 27: Why Google fusion tables is not a Data Integration tool

Under the hood

Column-based Data values versioning Table level transaction management

Page 28: Why Google fusion tables is not a Data Integration tool

Under the hood

rows table No PK specified: generated

as <table, row> Indexed and non-indexed

properties

schemas table PK: <table> Columns: table columns, with

their preferred type Access Control Lists, or public

marker

comments table PK: <table, row, col> Comments

property index table PK: <table, row,

prop_name, prop_value> Only keys

Page 29: Why Google fusion tables is not a Data Integration tool

Under the hood

Operations Key lookup Prefix scan Range scan

Example 1: retrieving all rows in a table1. Suppose a SQL-like query

select country, rank from cars_prod2. It is transformed into a prefix scan over rows table

Prefix: <cars_prod, ANY>3. Projections are applied to show only columns country and rank

Example 2: filtering out rows from a table1. Suppose a SQL-like query

select country, rank from cars_prod where rank <= 102. It uses the properties index to retrieve which rows to show

Prefix: <cars_prod, ?, rank, 10>3. Projections are applied to show only columns country and rank among

rows selected in rows table by id

Page 30: Why Google fusion tables is not a Data Integration tool

Peer-to-peer data exchange and integration No central global schema Data exchange only at import time

No wrappers Same underlying infrastructure

Virtualization mechanism based on query Single [big] table Prefix-based mappings

Autonomy of data sources No logical transparency support No pre-aggregated data, only realtime declarative queries Virtual data federation

Page 31: Why Google fusion tables is not a Data Integration tool

Any way to build an ontology on top of Google Fusion Tables? Data inaccuracy Data incompleteness No way to define integrity constraints

No primary keys No inclusion dependencies between tables No exclusion dependencies between tables No foreign keys

Can use as RDBMS? Need a layer on top to guarantee constraints Manual mappings

It’s NoSQL, you can use it at your risk...

Page 32: Why Google fusion tables is not a Data Integration tool

Thank You