edyra
© Prof. Dr.-Ing. Wolfgang Lehner
ResUbic Research Lab Dresden
EDYRA: Engineering of Do-it-Yourself Analytic Rich Internet Applications
Wolfgang Lehner, Maik Thiele, Katrin Braunschweig, Julian Eberius
ResUbic Research Seminar
> MAD Skills
[Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph M. Hellerstein, Caleb Welton: MAD Skills: New Analysis Practices for Big Data. PVLDB 2009]
> Motivation (1)
In the days of Kings and Priests
• Computers and data: crown jewels
• Executives depend on computers, but cannot work with them directly
• The DBA "priesthood" and their Acronymia: EDW, BI, OLAP
The architected enterprise DWH
• Rational behavior… for a bygone era
• “There is no point in bringing data … into the data warehouse environment without integrating it.” (Bill Inmon, Building the Data Warehouse, 2005)
> Motivation (2)
New Realities
• TB disks < $100
• Everything is data
• Rise of data-driven culture
  – Very publicly espoused by Google, Wired, etc.
  – Sloan Digital Sky Survey, Terraserver, etc.
The quest for knowledge used to begin with grand theories.
Now it begins with massive amounts of data.
Welcome to the Petabyte Age.
> MAD Skills
Magnetic: "Attract data and practitioners"
• Use all data sources, independent of their data quality
Agile: "Rapid iteration: ingest, analyze, productionalize"
• Continuous evolution of the logical and physical structures
• ELT (Extraction, Loading, Transformation)
Deep: "Sophisticated analytics in Big Data"
• Extended algorithmic run-time
• Ad-hoc advanced analytics and statistics
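The ELT ordering named under Agile can be illustrated with a small sketch: raw records are loaded unchanged into a staging table and only transformed inside the database afterwards, so the transformation can evolve without re-ingesting the source. This is a minimal sketch that assumes Python's built-in `sqlite3` as a stand-in for the analytic database; the table names `staging` and `clean` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source: untyped strings,
# including a dirty value ("n/a") that a strict ETL pipeline would reject up front.
RAW_ROWS = [
    ("2010-05-01", "Dresden", "42"),
    ("2010-05-02", "Dresden", "n/a"),
    ("2010-05-01", "Leipzig", "17"),
]


def elt(rows):
    con = sqlite3.connect(":memory:")
    # Extract + Load: land the data as-is in a permissive staging table.
    con.execute("CREATE TABLE staging (day TEXT, city TEXT, value TEXT)")
    con.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    # Transform: derive a cleaned, typed view inside the database afterwards.
    # Rows whose value does not start with a digit are filtered out here,
    # but they remain available in `staging` for a later, different transform.
    con.execute("""
        CREATE VIEW clean AS
        SELECT day, city, CAST(value AS INTEGER) AS value
        FROM staging
        WHERE value GLOB '[0-9]*'
    """)
    return con


con = elt(RAW_ROWS)
print(con.execute(
    "SELECT city, SUM(value) FROM clean GROUP BY city ORDER BY city"
).fetchall())
```

Because the raw rows stay in `staging`, a revised transformation (for example one that keeps "n/a" as NULL instead of dropping it) only requires dropping and recreating the `clean` view.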
> Open Data, Services and Mashups
Web of Data
• E-Government 2.0, Initiative i2010
• Europeana, World Digital Library
• Public data catalogs: http://data.gov/, http://data.gov.uk/
Free to
• Copy, distribute and transmit the data
• Adapt the data
• Exploit the data commercially, whether by sub-licensing it, combining it with other data, or by including it in your own product
Web of Services
• OpenSocial API (Google, Yahoo!, MySpace, Xing)
• Scientific computations (http://www.wolframalpha.com)
• Entity detection (http://www.yooname.com)
• Visualization (http://manyeyes.alphaworks.ibm.com/manyeyes)
Web of Mashups
• ProgrammableWeb (http://www.programmableweb.com/)
> Principles of Open Data
Data shall be considered open if it is made public in a way that complies with the principles below:
• Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
• Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
• Timely: Data is made available as quickly as necessary to preserve the value of the data.
• Accessible: Data is available to the widest range of users for the widest range of purposes.
• Machine processable: Data is reasonably structured to allow automated processing.
• Non-discriminatory: Data is available to anyone, with no requirement of registration.
• Non-proprietary: Data is available in a format over which no entity has exclusive control.
• License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation.
Reasonable privacy, security and privilege restrictions may be allowed.
Source: http://resource.org/8_principles.html
>
"Data belongs to the people": typical examples are genomes, data on organisms, medical research, and environmental science data
Public funding made generating the data possible in the first place, so the data must also be publicly accessible (in practice, scientists usually assign the rights to the data they generate to private publishers when they publish their results)
Facts cannot be subject to copyright
Research is advanced when scientific findings are freely accessible to all researchers
> Gapminder
http://www.gapminder.org/
> Gapminder (2)
Vision: making sense of the world by having fun with statistics!
• Gapminder is a non-profit venture for the development and provision of free software to visualize human development trends
• Gapminder will ultimately be integrated into Google: this is the first time global datasets will be searchable over the Internet
Hans Rosling @ TED
• TEDTalks: annual technology conference in California, USA (http://www.ted.com/tedtalks/)
• Hans Rosling is a professor of global health at the Karolinska Institute, data visualization extraordinaire and the creator of the Gapminder tools; see http://www.youtube.com/watch?v=YpKbO6O3O3M
> Public.Resource.Org
Idea: Make government more transparent
Project funded: Public.Resource.Org is a non-profit organization focused on enabling online access to public government documents in the United States. We are providing $2 million to Public.Resource.Org to support the Law.Gov initiative, which aims to make all primary legal materials in the United States available to all.
Winner of Project 10^100
http://www.project10tothe100.com/intl/DE/index.html
> Microsoft’s Open Government Data Initiative
• The Open Government Data Initiative (OGDI) is a cloud-based collection of software assets that enables publicly available government data to be easily accessible. Using open standards and application programming interfaces (APIs), developers and government agencies can retrieve the data programmatically for use in new and innovative online applications, or mash-ups that can help:
  – Improve citizen services
  – Enhance collaboration between government agencies and private organizations
  – Increase government transparency
• OGDI promotes the use of this data by capturing and publishing re-usable software assets, patterns, and practices. The data repository already holds over 60 different government datasets that are readily available for use in new applications, and is continuously updated with additional government datasets.
• More: http://www.microsoft.com/industry/government/opengovdata/
> Civic Commons
http://civiccommons.com/
> data.gov
> data.gov.uk
> data.worldbank.org
> unData
http://data.un.org/
> Ushahidi
http://www.ushahidi.com/
> Statistisches Bundesamt Deutschland
https://www-genesis.destatis.de/genesis/online/
> offenedaten.de
> Data360
http://www.data360.org
> IBM ManyEyes
http://manyeyes.alphaworks.ibm.com/manyeyes/
> Open Citizen's Platform
Public issue tracking provides increased engagement, transparency, and participation in the community
Manage issues in urban environments, like potholes, broken street lighting or lack of accessibility
What are the benefits to…
Governments
• Reduce time, effort and resources in fulfilling public information requests
• Increase data quality by providing correct data to the public from the source
• Reduce duplication of effort
• Increase data access, availability, and speed of delivery
• Improve citizen satisfaction and create good public relations with your community
Citizens
• Open access to complete, formatted data rather than relying on third-party interpretations or subsets
• Information accessibility leads to greater government accountability
• Fosters better community action on social issues, e.g. crime, pollution, permits, accidents, and education
• Improves regional competitiveness by giving businesses quicker and fuller access to data
> What are the goals of the project?
Long term…
• Build an open citizen platform for Dresden (www.opendresden.de)
• Process it… compare it… mix it… filter it… visualize it…
• Basic premises
  – Build a simple system and let it evolve
  – Design for participation
  – Openness
For now…
• Start with a series of value-added municipal services (e.g. Mapnificent, Schooloscope, Cycling Planner; see following slides)
  – Transport, Education, Economy, (Local) Politics, Environment, Entertainment
• Promote the open data principle in Saxony
• Develop a fluid data repository (for municipal data)
• Design a domain-specific language in order to integrate and analyze data
  – Different levels of abstraction
  – Reuse existing apps
  – Visual dataflow languages
  – Textual DSL editors
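A textual DSL for "process it, compare it, mix it, filter it" could start out as an internal DSL, i.e. a fluent API embedded in a host language. The following is a minimal sketch in Python under that assumption; the class `Flow`, its methods `filter`, `mix` and `group`, and the toy school/district records are hypothetical and are not part of any existing EDYRA tool.

```python
from collections import defaultdict
from typing import Callable, Iterable

Record = dict


class Flow:
    """Hypothetical fluent dataflow DSL: every step returns a new Flow,
    so pipelines can be chained, reused and recombined."""

    def __init__(self, records: Iterable[Record]) -> None:
        self.records = list(records)

    def filter(self, pred: Callable[[Record], bool]) -> "Flow":
        # Keep only the records that satisfy the predicate.
        return Flow(r for r in self.records if pred(r))

    def mix(self, other: "Flow", key: str) -> "Flow":
        # Equi-join on `key`: index `other`, then merge each matching pair
        # of records (fields of `other` win on name clashes).
        index = defaultdict(list)
        for r in other.records:
            index[r[key]].append(r)
        return Flow({**left, **right}
                    for left in self.records
                    for right in index.get(left[key], []))

    def group(self, key: str, field: str) -> dict:
        # Terminal step: sum `field` per distinct value of `key`.
        totals = defaultdict(int)
        for r in self.records:
            totals[r[key]] += r[field]
        return dict(totals)


# Toy records, invented for illustration only.
schools = Flow([
    {"district": "A", "kind": "primary", "n": 4},
    {"district": "A", "kind": "secondary", "n": 2},
    {"district": "B", "kind": "primary", "n": 3},
    {"district": "C", "kind": "primary", "n": 1},
])
districts = Flow([
    {"district": "A", "area": "north"},
    {"district": "B", "area": "north"},
    {"district": "C", "area": "south"},
])

result = (schools
          .filter(lambda r: r["kind"] == "primary")
          .mix(districts, "district")
          .group("area", "n"))
print(result)  # {'north': 7, 'south': 1}
```

Keeping each step side-effect free is what would let a visual dataflow editor and a textual editor share the same underlying pipeline representation; that pairing is a design assumption, not something the slide specifies.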
> Mapnificent
http://www.mapnificent.net/london/
> Schooloscope
http://schooloscope.com
> Where can I live
http://www.where-can-i-live.com/londonproperty
> UBC/Google cycling planner
http://www.cyclevancouver.ubc.ca/cv.aspx
> CitySourced
http://www.citysourced.com
> EveryBlock
http://chicago.everyblock.com/
> Architecture – Sketch
[Figure: architecture sketch. Components: Open Data and Public Data Sources; Geo Data, Citizen Requests, Municipal Data; Fluid Data Repository; Visualization (Google Maps, OpenStreetMap, IBM ManyEyes). Interfaces and formats: REST, JSON, KML, GeoRSS.]
Lightweight Composite Applications
• Create information from the data
• Uncover hidden aspects of data
• Which becomes new data itself
• Classification, prediction, clustering
• Embrace recursion
Lightweight Integration Techniques
• Join across dimensions (e.g. Entity + Time + Place)
• Aggregations
http://www.omgstandard.com
API for location-based collaborative issue tracking: http://open311.org
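The two lightweight integration techniques above can be combined in a few lines: aggregate event data to the (entity, time, place) grain, then join it with a second source on that composite key. This is a sketch only; the report and repair records, the tuple layout and the left-join semantics are assumptions chosen for illustration, with issue types borrowed from the Open Citizen's Platform examples (potholes, street lighting).

```python
from collections import Counter

# Hypothetical event data: one tuple per citizen report, described by the three
# dimensions named on the slide: entity (issue type), time (month), place (district).
reports = [
    ("pothole", "2010-05", "north"),
    ("pothole", "2010-05", "north"),
    ("streetlight", "2010-05", "south"),
    ("pothole", "2010-06", "north"),
]

# Hypothetical second source keyed by the same three dimensions: repairs done.
repairs = {
    ("pothole", "2010-05", "north"): 1,
    ("streetlight", "2010-05", "south"): 1,
}


def integrate(reports, repairs):
    # Aggregation: count reports per (entity, time, place) key.
    reported = Counter(reports)
    # Join across dimensions: match both sources on the full composite key.
    # Keys missing from `repairs` count as zero repairs (left-join semantics).
    return {key: n - repairs.get(key, 0) for key, n in reported.items()}


backlog = integrate(reports, repairs)
for (entity, month, place), open_issues in sorted(backlog.items()):
    print(entity, month, place, open_issues)

# A further aggregation: roll the joined result up to the place dimension only.
by_place = Counter()
for (_, _, place), n in backlog.items():
    by_place[place] += n
print(dict(by_place))
```

The result of the join is itself new data with the same dimensions, so it can be fed back into further joins or aggregations, in the spirit of the "embrace recursion" bullet.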
> Fluid Data Repository
Platform for the web of things, each represented by an openly writable "social" object
• Share, annotate, augment and re-use information
• Mainly concerns data mediation and integration
• Need to access and integrate data residing in multiple and heterogeneous sources
• Adaptive: add metrics, aggregations, data sources or data connections without re-building analysis processes or visualizations ("non-destructive change")
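The "non-destructive change" property can be sketched with objects that are bags of namespaced tags: anyone may add a tag, but no addition alters existing tags, so an analysis that reads a fixed set of tags keeps producing the same result. This is a minimal in-memory sketch under that assumption; `FluidRepository`, the `owner/tag` naming scheme and the sample object IDs are hypothetical, only loosely inspired by tag-based stores such as FluidDB (listed among the alternative data models), and do not reproduce FluidDB's actual API.

```python
from typing import Any


class FluidRepository:
    """Hypothetical store of openly writable objects. Each object is a bag of
    tags named 'owner/tag'; adding a tag never alters tags of other owners."""

    def __init__(self) -> None:
        self.objects: dict[str, dict[str, Any]] = {}

    def tag(self, obj_id: str, owner: str, name: str, value: Any) -> None:
        # Any user may annotate any object, but only within their own namespace.
        tags = self.objects.setdefault(obj_id, {})
        tags[f"{owner}/{name}"] = value

    def query(self, *required: str) -> dict[str, dict[str, Any]]:
        # Return copies of just the requested tags for objects carrying all of
        # them, so analyses are unaffected by tags that others add later.
        return {
            oid: {t: tags[t] for t in required}
            for oid, tags in self.objects.items()
            if all(t in tags for t in required)
        }


repo = FluidRepository()
repo.tag("school-1", "city", "district", "north")
repo.tag("school-2", "city", "district", "south")

# An existing analysis depends only on the tag "city/district".
before = repo.query("city/district")

# Later, a citizen augments one object with a new tag in her own namespace.
repo.tag("school-1", "alice", "rating", 4)
after = repo.query("city/district")

print(before == after)  # True: the earlier analysis is unchanged
print(repo.query("city/district", "alice/rating"))
```

A real repository would additionally need permissions, persistence and indexing; the sketch only shows why append-only, namespaced annotation keeps existing analyses stable.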
> Alternative Data Models
[Figure: overview of alternative data models. Categories: NoSQL, Graph, other; Column Families, Key/Value, Documents, Triple Stores.]
Systems shown: BigTable, HBase, Cassandra, Hypertable, SimpleDB, CouchDB, MongoDB, OrientDB, RavenDB, Terrastore, Neo4J, Sones, HyperGraphDB, AllegroGraph, Voldemort, Dynomite, Dynamo, Riak, Redis, Scalaris, Tokyo Cabinet, GT.M, Pahoehoe, FluidDB, FlockDB, ThruDB, RedStore, Virtuoso, Jena, Sesame, YARS
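To make the categories above concrete, the following sketch maps one record, already in document form, to three of the other listed models: an opaque value under a key (Key/Value), flat `family:column` cells under a row key (Column Families), and subject/predicate/object statements (Triple Stores, which can equally be read as labeled edges of a graph). The record, the column family name `info` and the helper functions are hypothetical; real systems add typing, versioning and indexing that this sketch omits.

```python
import json

# One hypothetical municipal record in document form: nested and schema-free.
doc = {"id": "school-1", "name": "Example School", "district": "north",
       "ratings": [4, 5]}


def to_key_value(d):
    # Key/Value: the store sees only an opaque serialized value under a key.
    return {d["id"]: json.dumps(d, sort_keys=True)}


def to_column_family(d, family="info"):
    # Column families: row key -> {"family:column": value}, one cell per field.
    return {d["id"]: {f"{family}:{k}": v for k, v in d.items() if k != "id"}}


def to_triples(d):
    # Triples: (subject, predicate, object); list values become repeated
    # predicates, so each statement stays atomic.
    triples = []
    for k, v in d.items():
        if k == "id":
            continue
        for item in (v if isinstance(v, list) else [v]):
            triples.append((d["id"], k, item))
    return triples


print(to_key_value(doc))
print(to_column_family(doc))
for t in to_triples(doc):
    print(t)
```

The trade-off the sketch hints at: the key/value form is cheapest to store but opaque to queries, while the triple form exposes every fact for joining at the cost of many small records.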