edyra

© Prof. Dr.-Ing. Wolfgang Lehner | ResUbic Research Lab Dresden | EDYRA: Engineering of Do-it-Yourself Analytic Rich Internet Applications | Wolfgang Lehner, Maik Thiele, Katrin Braunschweig, Julian Eberius | ResUbic Research Seminar

Upload: maikthiele

Post on 22-Aug-2015


Page 1: Edyra

© Prof. Dr.-Ing. Wolfgang Lehner | ResUbic Research Lab Dresden

EDYRA: Engineering of Do-it-Yourself Analytic Rich Internet Applications

Wolfgang Lehner, Maik Thiele, Katrin Braunschweig, Julian Eberius

ResUbic Research Seminar

Page 2: Edyra


> MAD Skills

[Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph M. Hellerstein, Caleb Welton: MAD Skills: New Analysis Practices for Big Data. PVLDB 2009]

Page 3: Edyra


> Motivation (1)

• In the days of Kings and Priests
– Computers and data: crown jewels
– Executives depend on computers, but cannot work with them directly
– The DBA "Priesthood" and their Acronymia: EDW, BI, OLAP
• The architected Enterprise DWH
– Rational behavior … for a bygone era
– "There is no point in bringing data … into the data warehouse environment without integrating it." (Bill Inmon, Building the Data Warehouse, 2005)

Page 4: Edyra


> Motivation (2)

• New Realities
– TB disks < $100
– Everything is data
– Rise of data-driven culture: very publicly espoused by Google, Wired, etc.
– Sloan Digital Sky Survey, Terraserver, etc.

The quest for knowledge used to begin with grand theories. Now it begins with massive amounts of data. Welcome to the Petabyte Age.

Page 5: Edyra


> MAD Skills

• Magnetic: "Attract data and practitioners"
– Use all data sources, independent of their data quality
• Agile: "Rapid iteration: ingest, analyze, productionalize"
– Continuous evolution of the logical and physical structures
– ELT (Extraction, Loading, Transformation)
• Deep: "Sophisticated analytics in Big Data"
– Extended algorithmic run-time
– Ad-hoc advanced analytics and statistics
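The ELT ordering named under "Agile" can be illustrated with a small sketch: raw records are loaded unchanged first, and the transformation runs afterwards inside the database. This is only an illustration using Python's built-in sqlite3 module; the table names, columns, and sample rows are hypothetical and do not come from the MAD Skills paper.

```python
import sqlite3

# Hypothetical raw extract: every value arrives as an untyped string,
# including one dirty row ("n/a").
RAW_ROWS = [
    ("dresden", "2010-01-01", "12.5"),
    ("Dresden", "2010-01-02", "n/a"),
    ("leipzig", "2010-01-01", "7"),
]


def load_then_transform(rows):
    conn = sqlite3.connect(":memory:")
    # Load: keep the raw data as-is, without cleaning it first.
    conn.execute("CREATE TABLE staging (city TEXT, day TEXT, value TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    # Transform: derive a cleaned table inside the database, after loading.
    # Rows whose value does not start with a digit are left out of the
    # cleaned table but stay available in staging.
    conn.execute(
        """
        CREATE TABLE measurements AS
        SELECT lower(city) AS city, day, CAST(value AS REAL) AS value
        FROM staging
        WHERE value GLOB '[0-9]*'
        """
    )
    return conn


conn = load_then_transform(RAW_ROWS)
raw_count = conn.execute("SELECT count(*) FROM staging").fetchone()[0]
clean = conn.execute(
    "SELECT city, day, value FROM measurements ORDER BY city, day"
).fetchall()
```

Because the dirty row is still present in the staging table, a later iteration can change the cleaning rule without re-extracting the source, which is one way to read "magnetic" and "agile" together.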

Page 6: Edyra


> Open Data, Services and Mashups

• Web of Data
– E-Government 2.0, Initiative i2010
– Europeana, World Digital Library
– Public data catalogs: http://data.gov/, http://data.gov.uk/
– Free to: copy, distribute and transmit the data; adapt the data; exploit the data commercially, whether by sub-licensing it, combining it with other data, or by including it in your own product
• Web of Services
– OpenSocial API (Google, Yahoo!, MySpace, Xing)
– Scientific computations (http://www.wolframalpha.com)
– Entity detection (http://www.yooname.com)
– Visualization (http://manyeyes.alphaworks.ibm.com/manyeyes)
• Web of Mashups
– ProgrammableWeb (http://www.programmableweb.com/)

Page 7: Edyra


> Principles of Open Data

Data shall be considered open if it is made public in a way that complies with the principles below:
• Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
• Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
• Timely: Data is made available as quickly as necessary to preserve the value of the data.
• Accessible: Data is available to the widest range of users for the widest range of purposes.
• Machine processable: Data is reasonably structured to allow automated processing.
• Non-discriminatory: Data is available to anyone, with no requirement of registration.
• Non-proprietary: Data is available in a format over which no entity has exclusive control.
• License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.

Source: http://resource.org/8_principles.html

Page 8: Edyra


• "Data belongs to the people". Typical examples: genomes, data on organisms, medical research, environmental science data
• Public funds made the generation of the data possible in the first place, so the data must also be publicly accessible (in practice, scientists usually hand over the rights to the data they generate to private publishers when they publish their results)
• Facts cannot be subject to copyright
• Research is advanced when scientific findings are freely accessible to all researchers

Page 9: Edyra


> Gapminder

http://www.gapminder.org/

Page 10: Edyra


> Gapminder (2)

• Vision: making sense of the world by having fun with statistics!
– Gapminder is a non-profit venture for development and provision of free software to visualize human development trends
– Gapminder will ultimately be integrated into Google: this is the first time global datasets will be searchable over the Internet
• Hans Rosling @ TED
– TEDTalks: annual technology conference in California, USA (http://www.ted.com/tedtalks/)
– Hans Rosling is a professor of global health at the Karolinska Institute, data visualization extraordinaire and the creator of the Gapminder tools
– See http://www.youtube.com/watch?v=YpKbO6O3O3M

Page 11: Edyra


> Public.Resource.Org

Idea: Make government more transparent

Project funded: Public.Resource.Org is a non-profit organization focused on enabling online access to public government documents in the United States. We are providing $2 million to Public.Resource.Org to support the Law.Gov initiative, which aims to make all primary legal materials in the United States available to all.

Winner of Project 10^100

http://www.project10tothe100.com/intl/DE/index.html

Page 12: Edyra


> Microsoft’s Open Government Data Initiative

• The Open Government Data Initiative (OGDI) is a cloud-based collection of software assets that enables publicly available government data to be easily accessible. Using open standards and application programming interfaces (API), developers and government agencies can retrieve the data programmatically for use in new and innovative online applications, or mash-ups that can help:
– Improve citizen services
– Enhance collaboration between government agencies and private organizations
– Increase government transparency
• OGDI promotes the use of this data by capturing and publishing re-usable software assets, patterns, and practices. The data repository already holds over 60 different government datasets that are readily available for use in new applications, and is continuously updated with additional government datasets.
• More: http://www.microsoft.com/industry/government/opengovdata/

Page 13: Edyra


> Civic Commons

http://civiccommons.com/

Page 14: Edyra


> data.gov

Page 15: Edyra


> data.gov.uk

Page 16: Edyra


> data.worldbank.org

Page 17: Edyra


> unData

http://data.un.org/

Page 18: Edyra


> Ushahidi

http://www.ushahidi.com/

Page 19: Edyra


> Statistisches Bundesamt Deutschland

https://www-genesis.destatis.de/genesis/online/

Page 20: Edyra


> offenedaten.de

Page 21: Edyra


> Data360

http://www.data360.org

Page 22: Edyra


> IBM ManyEyes

http://manyeyes.alphaworks.ibm.com/manyeyes/

Page 23: Edyra


> Open Citizen's Platform

• Public issue tracking provides increased engagement, transparency, and participation in the community
• Manage issues in urban environments, like potholes, broken street lighting or lack of accessibility

What are the benefits to…

• Governments
– Reduce time, effort and resources in fulfilling public information requests
– Increase data quality by providing correct data to the public from the source
– Reduce duplication of effort
– Increase data access, availability, and speed of delivery
– Improve citizen satisfaction and create good public relations with your community
• Citizens
– Open access to complete, formatted data rather than relying on third party interpretations or subsets
– Information accessibility leads to greater government accountability
– Fosters better community action on social issues, e.g. crime, pollution, permits, accidents, and education
– Improves regional competitiveness by giving businesses quicker and fuller access to data

Page 24: Edyra


> What are the goals of the project?

• Long term…
– Build an open citizen platform for Dresden: www.opendresden.de
– Process it… compare it… mix it… filter it… visualize it…
– Basic premises: build a simple system and let it evolve; design for participation; openness
• For now…
– Start with a series of value-added municipal services (e.g. Mapnificent, Schooloscope, Cycling Planner, see following slides)
– Transport, Education, Economy, (Local) Politics, Environment, Entertainment
– Promote the open data principle in Saxony
– Develop a fluid data repository (for municipal data)
– Design a domain-specific language to integrate and analyze data: different levels of abstraction; reuse existing apps; visual dataflow languages; textual DSL editors
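The slide names a domain-specific language for integrating and analyzing data but does not define it. As a purely hypothetical illustration of what a textual dataflow in the spirit of "compare it, mix it, filter it" could look like, the sketch below embeds a tiny chainable pipeline in Python. The class `Flow`, its method names, and the sample datasets are all invented for this example.

```python
class Flow:
    """A reusable chain of steps; each method returns a new Flow."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def _then(self, step):
        return Flow(self._steps + (step,))

    def filter(self, predicate):
        # "filter it": keep only the rows that satisfy the predicate
        return self._then(lambda rows: [r for r in rows if predicate(r)])

    def mix(self, other_rows, key):
        # "mix it": enrich each row with the matching row of another dataset
        index = {o[key]: o for o in other_rows}
        return self._then(
            lambda rows: [{**r, **index[r[key]]} for r in rows if r[key] in index]
        )

    def compare(self, column):
        # "compare it": rank rows by one column, largest first
        return self._then(
            lambda rows: sorted(rows, key=lambda r: r[column], reverse=True)
        )

    def run(self, rows):
        # Apply the recorded steps in order to the given rows.
        for step in self._steps:
            rows = step(rows)
        return rows


# Hypothetical sample data, invented for this example.
schools = [
    {"school": "A", "district": "Neustadt", "pupils": 420},
    {"school": "B", "district": "Pieschen", "pupils": 95},
    {"school": "C", "district": "Neustadt", "pupils": 310},
]
districts = [
    {"district": "Neustadt", "tram_stops": 12},
    {"district": "Pieschen", "tram_stops": 7},
]

large_schools = (
    Flow()
    .filter(lambda r: r["pupils"] >= 100)
    .mix(districts, "district")
    .compare("pupils")
)
result = large_schools.run(schools)
```

Because a `Flow` only records steps and touches no data until `run` is called, the same description could in principle be rendered as a visual dataflow or a textual one, which is one way to read "different levels of abstraction".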

Page 25: Edyra


> Mapnificent

http://www.mapnificent.net/london/

Page 26: Edyra


> Schooloscope

http://schooloscope.com

Page 27: Edyra


> Where can I live

http://www.where-can-i-live.com/londonproperty

Page 28: Edyra


> UBC/Google cycling planner

http://www.cyclevancouver.ubc.ca/cv.aspx

Page 29: Edyra


> CitySourced

http://www.citysourced.com

Page 30: Edyra


> EveryBlock

http://chicago.everyblock.com/

Page 31: Edyra


> Architecture – Sketch

[Figure: architecture sketch]
Visualization: Google Maps, OpenStreetMap, IBM ManyEyes
Fluid Data Repository
Open Data and Public Data Sources: Geo Data, Citizen Requests, Municipal Data
Interfaces and formats: REST, JSON, KML, GeoRSS

Lightweight Composite Applications
• Create information from the data
• Uncover hidden aspects of data
• Which becomes new data itself
• Classification, prediction, clustering
• Embrace recursion

Lightweight Integration Techniques
• Join across dimensions (e.g. Entity + Time + Place)
• Aggregations

http://www.omgstandard.com

API for location-based collaborative issue-tracking: http://open311.org
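The two lightweight integration techniques named on this slide, aggregation and joining across dimensions, can be sketched in a few lines. The example below counts citizen requests by place and time and then joins the counts with municipal data that shares the place dimension. All records, field names, and figures are hypothetical; they are not taken from Open311 or from any real dataset.

```python
from collections import defaultdict

# Hypothetical citizen requests, each with an entity, a time and a place.
requests = [
    {"category": "pothole", "month": "2011-03", "district": "Neustadt"},
    {"category": "pothole", "month": "2011-03", "district": "Neustadt"},
    {"category": "street light", "month": "2011-03", "district": "Pieschen"},
    {"category": "pothole", "month": "2011-04", "district": "Pieschen"},
]

# Hypothetical municipal data keyed by the same place dimension.
road_km = {"Neustadt": 40.0, "Pieschen": 20.0}


def aggregate(rows, dims):
    """Count rows per combination of the given dimensions."""
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[d] for d in dims)] += 1
    return dict(counts)


def join_on_place(counts, place_index, lookup):
    """Divide each count by the lookup value of its place component."""
    return {
        key: n / lookup[key[place_index]]
        for key, n in counts.items()
        if key[place_index] in lookup
    }


by_place_month = aggregate(requests, ("district", "month"))
requests_per_km = join_on_place(by_place_month, 0, road_km)
```

The derived `requests_per_km` table is itself a new dataset that could be stored and joined again, which matches the slide's point that derived information "becomes new data itself".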

Page 32: Edyra


> Fluid Data Repository

• Platform for the web of things, each represented by an openly writable "social" object
• Share, annotate, augment and re-use information
• Mainly concerns data mediation and integration
• Need to access and integrate data residing in multiple and heterogeneous sources
• Adaptive: add metrics, aggregations, data sources or data connections without re-building analysis processes or visualizations ("non-destructive change")
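The slides do not specify how the repository stores its objects. One possible reading of "openly writable social object" and "non-destructive change" is sketched below: every contributor annotates an object under their own namespace, so new annotations never alter what existing analyses read. The class `SocialObject`, the owner names, and the tags are assumptions made for this illustration.

```python
class SocialObject:
    """One 'thing' that anyone may annotate under their own namespace."""

    def __init__(self, about):
        self.about = about
        self._tags = {}  # maps (owner, tag name) -> value

    def tag(self, owner, name, value):
        # Writers only touch their own namespace, so they cannot
        # overwrite annotations made by someone else.
        self._tags[(owner, name)] = value

    def get(self, owner, name, default=None):
        return self._tags.get((owner, name), default)

    def owners(self):
        return sorted({owner for owner, _ in self._tags})


def report(obj):
    """An existing 'analysis' that reads only the city's own tags."""
    return f"{obj.about}: {obj.get('city', 'status', 'unknown')}"


lamp = SocialObject("street light, Example Street 1")
lamp.tag("city", "status", "broken")
before = report(lamp)

# Other parties augment the same object later on.
lamp.tag("citizen42", "photo", "lamp.jpg")
lamp.tag("utility", "repair_date", "2011-05-02")
after = report(lamp)
```

In this sketch the report is identical before and after the additional annotations, so the analysis did not have to be rebuilt when new data connections were added.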

Page 33: Edyra


> Alternative Data Models

[Figure: map of alternative data models]
Category labels: NoSQL, Graph, Triple Stores, Column Families, Key/Value, Documents, other
Systems shown: BigTable, HBase, Cassandra, Hypertable, SimpleDB, CouchDB, MongoDB, OrientDB, RavenDB, Terrastore, Neo4J, Sones, HyperGraphDB, AllegroGraph, Voldemort, Dynomite, Dynamo, Riak, Redis, Scalaris, Tokyo Cabinet, GT.M, Pahoehoe, FluidDB, FlockDB, ThruDB, RedStore, Virtuoso, Jena, Sesame, YARS
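To make the data model categories named on this slide more concrete, the following sketch expresses one and the same record in four of them using plain Python structures. It illustrates the shapes of the models only and does not use the API of any listed system; the record and all names are hypothetical.

```python
import json

# Hypothetical record, invented for this example.
record = {"id": "school:1", "name": "Example School", "district": "Neustadt"}

# Key/value: the store only knows the key; the value is an opaque blob.
kv_store = {record["id"]: json.dumps(record, sort_keys=True)}

# Document: the store understands the fields and can filter on them.
documents = [record]


def find(docs, **criteria):
    """Return the documents whose fields match all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# Column family: row key -> column family -> column -> value.
column_rows = {
    record["id"]: {
        "info": {"name": record["name"]},
        "location": {"district": record["district"]},
    }
}

# Triples: one (subject, predicate, object) statement per field.
triples = [(record["id"], k, v) for k, v in record.items() if k != "id"]


def objects(statements, subject, predicate):
    """Return all objects stated for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]
```

The same triples can also be read as labeled edges from a subject node to an object node, which is the view a graph model takes of this data.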