The Rensselaer IDEA: Data Exploration
Data Exploration
Jim Hendler
Director, Rensselaer Institute for Data Exploration and Applications
THE RENSSELAER IDEA
Rensselaer Polytechnic Institute, USA
http://www.cs.rpi.edu/~hendler
IDEA
Data-driven research areas at RPI
• Data-driven Medical and Healthcare Applications
• Predictive Models for Business and Economics
• “Biome” studies for Built and Natural Environments
• Question Answering from texts and data
• Resiliency Models for Population-Scale Problems and cybersecurity domains
• Semantically-enabled Data Services for Science and Engineering Research
• Materials genome and nano-manufacturing informatics
• Platforms for testing Policy and Open Data issues
• …
The Rensselaer IDEA: empowering our researchers
Data discovery, integration,
and interaction technologies
Application-specific data tools
The trunk: Shared Data Technologies
High Performance Modeling and Simulation
• Center for Computational Innovation
Cognitive Computing
• Watson at Rensselaer IBM Partnership
Perceptualization
• Experimental Multimedia Performing Arts Center
Data Science
• Data Science Research Center
Roots: Data Exploration
Discover
Integrate
Validate
Explain
Geekopedia: Data exploration helps a data consumer focus an information search on the pertinent aspects of relevant data before true analysis can be achieved. In large data sets, data are not gathered or controlled in a focused manner. Even in smaller data sets, data that are not gathered using a rigid, specific technique can end up disorganized, split into a myriad of subsets, each…
DATA
Data Exploration Challenges
Discover
Integrate
Validate
Explain
These needs live outside traditional data/info architectures
Discovery needs semantics
How do you find the Data you need?
Middle Eastern Terrorists for $800?
Discovery – there’s a lot out there
Discovery needs more than keywords
World Bank: Africa
US Data.gov: Crop
Africover: Agriculture
Kenya: Agricultural
Integration needs Semantics
Person: RIN 660125137; Address # 1118; Address St Pinehurst; Address zip 12203; Course topic CSCI; Course # 4961
Campus Personnel: RPI ID 660125137; Name Hendler
Campus Classes: CRN 1118; Name Intro to Physics
RIN 660125137 = RPI ID 660125137? YES
Address # 1118 = CRN 1118? NO!!!!
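The slide's point can be made concrete with a small sketch. Matching records on raw values alone correctly links the RIN 660125137 to the RPI ID 660125137, but it also links the street number 1118 to the course registration number (CRN) 1118. The Python below is a hypothetical illustration, not an RPI system: the field names and values come from the slide, while the `SEMANTIC_TYPE` mapping is an assumption standing in for the shared vocabulary (ontology) that tells a machine which fields denote the same kind of thing.

```python
# Records transcribed from the slide; field names are kept as shown there.
person = {
    "RIN": "660125137",
    "Address #": "1118",
    "Address St": "Pinehurst",
    "Address zip": "12203",
    "Course topic": "CSCI",
    "Course #": "4961",
}
campus_personnel = {"RPI ID": "660125137", "Name": "Hendler"}
campus_classes = {"CRN": "1118", "Name": "Intro to Physics"}

# Hypothetical semantic annotations: the real-world concept each field denotes.
SEMANTIC_TYPE = {
    "RIN": "student_id",
    "RPI ID": "student_id",
    "Address #": "street_number",
    "Address St": "street_name",
    "Address zip": "postal_code",
    "Course topic": "course_subject",
    "Course #": "course_number",
    "CRN": "course_registration_number",
}


def naive_matches(a, b):
    """Pair up fields whose values are equal, ignoring what the fields mean."""
    return [(ka, kb) for ka, va in a.items() for kb, vb in b.items() if va == vb]


def semantic_matches(a, b):
    """Keep a value match only if both fields carry the same semantic type."""
    return [
        (ka, kb)
        for ka, kb in naive_matches(a, b)
        if SEMANTIC_TYPE.get(ka) is not None
        and SEMANTIC_TYPE.get(ka) == SEMANTIC_TYPE.get(kb)
    ]


if __name__ == "__main__":
    for other in (campus_personnel, campus_classes):
        print("naive:   ", naive_matches(person, other))
        print("semantic:", semantic_matches(person, other))
```

Run as a script, the naive matcher pairs `Address #` with `CRN` (the "NO!!!!" case), while the type-aware matcher keeps only the `RIN`/`RPI ID` link (the "YES" case).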
Semantic Web and Linked Data (UK)
County Council
Ordnance Survey
Royal Mail
IOGDC Open Data Tutorial
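Linked Data lets separately published datasets be joined because they name the same things with the same URIs. The sketch below is purely illustrative: it uses plain Python sets of (subject, predicate, object) triples instead of a real RDF store and SPARQL, and every URI and fact under `http://example.org/` is invented. It only assumes that the three publishers (in the spirit of a county council, a mapping agency, and a postal service) reuse the same area identifiers.

```python
# Each publisher releases triples (subject, predicate, object) using shared URIs.
# All URIs and facts below are invented for illustration.
EX = "http://example.org/"

council = {
    (EX + "school/1", EX + "locatedIn", EX + "area/A"),
    (EX + "school/2", EX + "locatedIn", EX + "area/B"),
}
mapping_agency = {
    (EX + "area/A", EX + "name", "Area A"),
    (EX + "area/B", EX + "name", "Area B"),
}
postal_service = {
    (EX + "postcode/P1", EX + "withinArea", EX + "area/A"),
    (EX + "postcode/P2", EX + "withinArea", EX + "area/B"),
}

# Because identifiers are shared URIs, merging the graphs is just a set union.
graph = council | mapping_agency | postal_service


def objects(g, s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}


def subjects(g, p, o):
    """All subjects s such that (s, p, o) is in the graph."""
    return {s for (s, p2, o2) in g if p2 == p and o2 == o}


def school_report(g, school):
    """Follow links across all three publishers' triples for one school."""
    rows = []
    for area in sorted(objects(g, school, EX + "locatedIn")):
        names = sorted(objects(g, area, EX + "name"))
        postcodes = sorted(subjects(g, EX + "withinArea", area))
        rows.append((area, names, postcodes))
    return rows


print(school_report(graph, EX + "school/1"))
```

No single publisher holds enough to answer "which named area and postcodes go with school/1"; the answer only appears after the merge, which is the essence of a data mashup.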
http://logd.tw.rpi.edu
Data Mashups
Validation needs semantics
Easy for us
Hard for machines…
Head-to-head comparison shows that burglaries in Avon and Somerset (UK) far exceed those in Los Angeles, California
Data + everything else you know
Same or different?
Do the terms mean the same? Are they collected in the same way? Are they processed differently? …
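The three questions above can be turned into a mechanical pre-check that runs before any head-to-head comparison. This is a minimal sketch under an assumption the slide does not make: that each published statistic carries structured metadata. The `DatasetDescription` fields and both example descriptions are hypothetical; they do not describe the actual Avon and Somerset or Los Angeles data.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatasetDescription:
    """Hypothetical metadata a publisher might attach to a published statistic."""
    measure: str        # what is being counted
    definition: str     # how the publisher defines the term
    collection: str     # how the data were gathered
    period: str         # reporting period covered
    normalization: str  # e.g. raw count vs. rate per 1,000 residents


def comparability_issues(a, b):
    """Return one message per metadata field on which two datasets disagree."""
    issues = []
    for f in fields(DatasetDescription):
        va, vb = getattr(a, f.name), getattr(b, f.name)
        if va != vb:
            issues.append(f"{f.name}: {va!r} vs {vb!r}")
    return issues


# Invented descriptions of two "burglary" series; not real police data.
series_a = DatasetDescription("burglary", "includes attempted burglary",
                              "police-recorded", "calendar year", "raw count")
series_b = DatasetDescription("burglary", "completed burglary only",
                              "police-recorded", "calendar year", "per 1,000 residents")

for issue in comparability_issues(series_a, series_b):
    print(issue)
```

An empty result only means the recorded metadata agree; it cannot catch differences nobody wrote down, which is why the slide frames this as "data + everything else you know".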
Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)
Validation/Explanation need knowledge
Statistical correlation needs explanation
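A correlation coefficient is only a number; it does not say why two series move together. The sketch below computes Pearson's r on made-up yearly values (they are NOT the 1991-2007 figures from the chart). The two series correlate strongly with each other, but each also tracks the year index almost perfectly, so the statistic alone cannot separate a causal story (prices affecting smoking) from a shared time trend. Deciding between them takes outside knowledge, which is the slide's point.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly values, invented for illustration only.
years = [0, 1, 2, 3, 4, 5]
prevalence = [30, 29, 27, 26, 24, 23]    # percent, trending down
price = [2.0, 2.3, 2.5, 2.9, 3.1, 3.4]   # price index, trending up

print(round(pearson(prevalence, price), 3))  # strong negative correlation
print(round(pearson(years, prevalence), 3))  # ...but each series also tracks time
print(round(pearson(years, price), 3))
```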
Explanation also needs Semantics
Inference Web: McGuinness – various DoD/IC projects
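Explanation in the Inference Web sense means being able to show how a conclusion was reached, not just the conclusion. The sketch below is a toy forward-chaining reasoner that records, for every derived fact, which rule produced it and from which premises, then prints that record as a proof tree. It does not reproduce Inference Web's formats or APIs; the rules and fact names are hypothetical, loosely echoing the burglary comparison above.

```python
from typing import Dict, Tuple

# Hypothetical rules: (rule name, premises, conclusion). Facts are plain strings.
RULES = [
    ("r1", ("burglary_counts_raw", "populations_differ"), "comparison_misleading"),
    ("r2", ("comparison_misleading",), "normalize_before_comparing"),
]


def forward_chain(facts, rules):
    """Derive facts until nothing changes, recording a justification for each."""
    justification: Dict[str, Tuple[str, Tuple[str, ...]]] = {
        f: ("given", ()) for f in facts
    }
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in justification and all(p in justification for p in premises):
                justification[conclusion] = (name, premises)
                changed = True
    return justification


def explain(fact, justification, depth=0):
    """Return an indented proof tree (list of lines) for a fact."""
    rule, premises = justification[fact]
    lines = ["  " * depth + f"{fact}  [{rule}]"]
    for p in premises:
        lines.extend(explain(p, justification, depth + 1))
    return lines


just = forward_chain({"burglary_counts_raw", "populations_differ"}, RULES)
print("\n".join(explain("normalize_before_comparing", just)))
```

The semantics live in the recorded justifications: the same answer with no trace behind it would be much harder to validate or trust.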
Closing the loop: where do the semantics come from?
Data
Prediction
Model
Design
How do we go from the predictive analytics of Big Data to models/explanations that allow new understanding?
1. Better tools for Analytics, Agents and HPC
Make the tools and algorithms being developed by RPI researchers more “reusable” and applicable across multiple tasks (including HPC data-analytic tools)
2. Next-Gen Visualization (at scale)
How can multi-modal, multi-user, large-scale sensory (visualization, sonification, haptics) interaction change the way we understand data?
3. Include “agents” in the modeling
Develop technologies that enable researchers to work with “human-based” data at larger scales and in new ways
• Population-scale computing models for agent-based simulations
Approach
Platform: Research in using supercomputers for discrete modeling
• Carothers’ ROSS model
KR Model:
• Weaver’s restricted rules on graphs
Challenge problem:
• Classification algorithms at petaflop scale
• “Logical” (nonlinear, discontinuous) agents
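ROSS (Rensselaer's Optimistic Simulation System) is a parallel discrete-event simulator. The sketch below is much smaller in scope: a sequential discrete-event loop driven by a priority queue, with no optimistic synchronization, no rollback, and none of ROSS's API. The agent rule is a hypothetical example of a "logical" agent: it stays inactive until a threshold number of neighbors have activated, then switches abruptly, a discontinuous response that is awkward to express as a smooth equation but natural as an event.

```python
import heapq
import itertools


def simulate(neighbors, seeds, threshold, delay=1.0, end_time=100.0):
    """Sequential discrete-event simulation of threshold agents on a graph.

    An agent activates once at least `threshold` active neighbors have notified it;
    each activation schedules a 'notify' event for every neighbor `delay` later.
    Returns {agent: activation_time}.
    """
    active = {}
    heard = {a: 0 for a in neighbors}  # notifications each agent has received
    counter = itertools.count()        # tie-breaker so the heap never compares payloads
    queue = [(0.0, next(counter), "activate", s) for s in seeds]
    heapq.heapify(queue)
    while queue:
        time, _, kind, agent = heapq.heappop(queue)
        if time > end_time:
            break
        if kind == "activate":
            if agent in active:
                continue
            active[agent] = time
            for nb in neighbors[agent]:
                heapq.heappush(queue, (time + delay, next(counter), "notify", nb))
        elif kind == "notify" and agent not in active:
            heard[agent] += 1
            if heard[agent] >= threshold:  # discontinuous step rule
                heapq.heappush(queue, (time, next(counter), "activate", agent))
    return active


# Hypothetical example: six agents on a ring, each linked to its two neighbors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(simulate(ring, seeds=[0], threshold=1))     # activation spreads around the ring
print(simulate(ring, seeds=[0], threshold=2))     # one seed is not enough
print(simulate(ring, seeds=[0, 2], threshold=2))  # two seeds trigger only agent 1
```

Small changes in the seeds or threshold flip the outcome entirely, which is the kind of nonlinear behavior that makes such agents a demanding challenge problem at scale.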
4. Exploit Cognitive Computing
IDEA will be the hub of Rensselaer’s cognitive-computing research
• e.g., answer questions such as “Why” and “How”, integrated with large-scale simulations
Watson’s parallel model
Distributed (coarse-grained) parallelism
© “Making Watson Fast,” IBM J Res and Dev, 3/4, 2012
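Coarse-grained parallelism means splitting the work into large, independent tasks that can run on separate workers and be merged at the end, rather than parallelizing tiny operations. The sketch below shows only that shape: each candidate answer is scored as its own task, and the results are merged into a ranking. It is not Watson's or DeepQA's implementation; `score_candidate` is a hypothetical stand-in for an expensive evidence-scoring step, and the question and candidates are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def score_candidate(question, candidate):
    """Hypothetical stand-in for one expensive evidence-scoring step:
    here, just the fraction of the candidate's words that appear in the question."""
    q_words = set(question.lower().split())
    c_words = set(candidate.lower().split())
    return len(q_words & c_words) / max(len(c_words), 1)


def rank_candidates(question, candidates, workers=4):
    """Score candidates as independent tasks in parallel, then merge the results."""
    # Coarse-grained parallelism: each task is a whole candidate, not a tiny operation.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda c: score_candidate(question, c), candidates))
    # Merge step: pool.map preserves input order, so zip pairs each score with its candidate.
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


question = "capital city of France"
candidates = ["Lyon", "Berlin is a capital city", "Paris is the capital city of France"]
for answer, score in rank_candidates(question, candidates):
    print(f"{score:.2f}  {answer}")
```

In CPython, threads show the task structure but give little speedup for CPU-bound scoring; a `ProcessPoolExecutor`, or separate nodes of a large cluster as the next slide suggests, would be closer to the real setting.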
A DeepQA-type approach works best on large clusters
(Physical) simulation runs on supercomputers
Cognitive Computing at Scale
Approach: link these computational models
Surmise (unproven): Cognitive Computing on a fast (large) cluster can query computations run against data generated by simulations (physical or agent-based) on the supercomputer
5. Data services will provide synergy across disciplines
• Semantics is a key technology for common data services
Discovery, Integration, Validation, Curation, Citation, Archiving …
Conclusions
• The “warehouse” is only a small part of the data ecosystem
• Database technologies are only part of the story
• Discovery, integration, …, validation, and explanation are key to solving problems with data
• Closing the loop means “exploring” our data
• Humans are still a key player in this
• The Rensselaer IDEA will explore
• Data-driven applications and tools, but also…
• … multimodal visualization, multiscale and agent modeling, cognitive computing, and semantic data platforms
Rensselaer Institute for Data Exploration and Applications