Cognitive Agents to augment BigData analysis
Towards Cognitive Agents for BigData Discovery
Finding Solutions to Complex, Urgent Problems
Jack Park
BigData Science Meetup
Fremont, CA: 19 April, 2014
Shyam Sarkar, Organizer
© 2014, TopicQuests Foundation
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
A Narrative Arc
Context: Two Kinds of Discovery
• Data-based
– Harvesting nuggets from collected data
• Literature-based: Deep Question Answering
– Discovering connections between dots in the literature
Target: Deep Question Answering
[Diagram labels: Breadth, Depth, Information Retrieval, Semantic Representation, Goal]
Diagram adapted from a talk by Percy Liang at Stanford, 20140407
Our Goals
• Improve Human-Tool Capabilities
• Augment existing analytic methods
– Increase opportunities for discovery
– Improve already sophisticated methods
“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.”
–Albert Szent-Györgyi
Our Approach
• Explore and develop the technologies of so-called Cognitive Agents
– Current examples
• IBM’s Watson
• SIRI
• An opportunity
– Couple two platforms
• Berkeley Data Analytics Stack (BDAS)
• SolrSherlock
Berkeley Data Analytics Stack Deep QA Issues*
• Low latency queries
– Perform faster inferences
– Explore larger spaces
– Better decisions
• Sophisticated analysis
– Better forecasts
– Better decisions
• Unification of existing data computation models
– Integrate interactive queries, batch and streaming processing
*http://strata.oreilly.com/2013/02/the-future-of-big-data-with-bdas-the-berkeley-data-analytics-stack.html
An Observation
In this context, interesting literature is about the social lives of data
Literature-based Discovery
• Forming bisociative links* between information in different literature sources which are not known to be related
• Swanson example (simplified)**:
– Literature associated with Raynaud’s
• Raynaud’s therapy linked to blood thinners
– Literature associated with fish oils
• Fish oil linked to blood thinners
– “Blood thinners” as an implicit link between fish oil and Raynaud’s Syndrome
• Akin to the wormholes formed by tags on web pages or hashtags
*Arthur Koestler (1964). The Act of Creation
**Swanson, Don (1986). "Fish oil, Raynaud's syndrome, and undiscovered public knowledge." Perspectives in Biology and Medicine 30(1): 7-18.
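The Swanson example can be sketched in a few lines of Python. This is a minimal illustration, assuming each literature has already been reduced to a set of terms; the `literature` dictionary and the `implicit_links` function are hypothetical names introduced here, not part of SolrSherlock or of Swanson's method as published.

```python
from itertools import combinations

# Toy "literatures": each topic maps to the terms its papers mention.
# The entries mirror the simplified Swanson example and are illustrative only.
literature = {
    "Raynaud's syndrome": {"blood thinners"},
    "fish oil": {"blood thinners"},
}


def implicit_links(literature, known_links=frozenset()):
    """Return {(a, c): shared_terms} for topic pairs not already known to be related."""
    links = {}
    for a, c in combinations(sorted(literature), 2):
        if frozenset((a, c)) in known_links:
            continue  # an explicit link already exists, so nothing to discover
        shared = literature[a] & literature[c]
        if shared:
            links[(a, c)] = shared
    return links


links = implicit_links(literature)
for (a, c), terms in links.items():
    print(f"{a} <-> {c} via {sorted(terms)}")
```

The shared term plays the role of the "wormhole": neither literature cites the other, yet both touch the same intermediate concept.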
Cognitive Agents
• Examples
– Proprietary
• IBM’s Watson
• SIRI
• SRI’s CALO
– Part of which, IRIS, was made open source as OpenIRIS
• Others…
– Open Source
• Cougaar
– http://www.cougaar.org/
• Open Cog
– http://opencog.org/
• Open Advancement of Question Answering Systems
– Closely related to IBM’s Watson
– http://oaqa.github.io/
• SolrSherlock
– http://debategraph.org/SolrSherlock
• Many others…
Use Cases for Big Data Harvesting
• Resource Collection
– Federation
• Bring together and organize without filters
• Resource Augmentation
– Tagging
– Annotating
– Debate
• Knowledge Cartography
– Connecting resources
– Map maintenance
– More Debate
• Research Augmentation
– Crowd-sourced discovery
– Harvesting
– Automated inferences/reasoning
– Knowledge sharing
Federated Information Resources
Harvesting Activities
Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 29
A Strong Conjecture
• A Knowledge Federation’s topic map provides a Rosetta Stone-like substrate
– Reasoning by analogy
– Big Data mined for clues
– Map:
• Where we have been
• Where we haven’t (Dragons be here)
Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 33
Topic Maps for Knowledge Federation
• Maintain a structure that is well organized by topic
• Key issue:
– For any given information resource added to a map:
• Agents must answer this question:
– Have I seen this before by any other name or description?
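The "have I seen this before?" question can be illustrated with a small name index. The sketch below is an assumption-laden toy: the `TopicMap` class and its `add` method are hypothetical, and it matches only on normalized names, whereas matching by description is a much harder problem that this code does not attempt.

```python
class TopicMap:
    """Toy topic store that merges resources sharing any normalized name."""

    def __init__(self):
        self.topics = {}      # topic id -> set of normalized names
        self.name_index = {}  # normalized name -> topic id
        self._next_id = 0

    @staticmethod
    def _norm(name):
        # Case-fold and collapse whitespace so trivial variants compare equal.
        return " ".join(name.lower().split())

    def add(self, names):
        """Add a resource known by `names`; return the id of the topic it lands in."""
        keys = {self._norm(n) for n in names}
        matches = {self.name_index[k] for k in keys if k in self.name_index}
        if matches:
            # Seen before under some name: reuse the oldest topic, fold in the rest.
            topic_id = min(matches)
            for other in matches - {topic_id}:
                keys |= self.topics.pop(other)
        else:
            topic_id = self._next_id
            self._next_id += 1
        self.topics.setdefault(topic_id, set()).update(keys)
        for k in self.topics[topic_id]:
            self.name_index[k] = topic_id
        return topic_id


tm = TopicMap()
a = tm.add(["Raynaud's syndrome", "Raynaud's"])
b = tm.add(["raynaud's  SYNDROME", "Raynaud phenomenon"])
c = tm.add(["fish oil"])
print(a == b, a == c)
```

Merging on any shared name keeps one topic per subject, which is what lets new aliases accumulate on an existing topic instead of creating duplicates.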
Are We There Yet?
• We are now at the edges of discovery:
– Deeper ways of representing
– Deeper ways of knowing
• Relational Biology
Relational Biology
• Paraphrasing Nicholas Rashevsky*:
– We can tease open a living cell and count all its components, but we cannot put it back together and we have no clue why
• Interpreting Robert Rosen**:
– Rashevsky’s quest for a relational mathematics for biology (complex systems) entails topological algebras (Category Theory)
• Category theory is said to facilitate modeling the social lives of members of the categories
*http://en.wikipedia.org/wiki/Nicholas_Rashevsky
**http://en.wikipedia.org/wiki/Robert_Rosen_(theoretical_biologist)
Relational Modeling 1
• Starts with Ontologies
– Ontologies grant uniform vocabularies to universes of discourse
• Including describing data
– Ontology-based frameworks provide ways to model social and other relational structures
• SIOC: Semantically Interlinked Online Communities*
• SWAN: Semantic Web Applications in Neuromedicine**
*http://www.sioc-project.org/
**http://www.w3.org/TR/hcls-swan/
SIOC Closer Look
• A way to model components entailed by a situation (blog post in this case)
– Uniform vocabulary
– Structural relations
• Creates a foundation for much deeper modeling
– Including:
• Other ontologies
• Other structures
• Feedback loops
SIOC Blog Post*
*http://rdfs.org/sioc/spec/
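As a rough illustration of how SIOC describes a blog post, the sketch below writes a few statements as plain Python triples. The `http://example.org/` resources are hypothetical, the choice of classes and properties is one plausible reading of the SIOC Core vocabulary rather than a copy of the slide's diagram, and no RDF library is used.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example resources

post = EX + "post/1"
blog = EX + "blog"
author = EX + "user/alice"
reply = EX + "post/1#comment-1"

# (subject, predicate, object) statements about one blog post.
triples = [
    (post, RDF_TYPE, SIOC + "Post"),
    (post, SIOC + "has_container", blog),
    (post, SIOC + "has_creator", author),
    (post, SIOC + "has_reply", reply),
    (reply, RDF_TYPE, SIOC + "Post"),
    (blog, RDF_TYPE, SIOC + "Forum"),
    (author, RDF_TYPE, SIOC + "UserAccount"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(post, SIOC + "has_creator"))
```

The uniform vocabulary is what lets an agent ask the same structural question (who created this, where does it live, what replies to it) of any community site that publishes SIOC data.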
Massive Connectivity and Feedback
http://geography.oii.ox.ac.uk/?page=home
Complex Communication Processes
Feedback Loops: Crucial to Learning
Image: FEDERAL HEALTH FUTURES SUMMIT LEADERSHIP LEARNING for TRANSFORMATIONAL CHANGE. September 10-11, 2012 Washington DC Metro Region Page 23
Relational Biology: Context
• Context is about Relations among the components themselves
• Context is about Relations among the components and their environment
• Context is about Feedback
Example from Breast Cancer 1
Extracellular Matrix (EM) as Context
Complex Communication Processes
Milk producing tissue
http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer
Example from Breast Cancer 2
Cells missing their EM
Cells with restored EM
http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer
Towards Cognitive Agents
• Harvest and represent
– Patterns
• Actors
• Relations
• States
– Context in which patterns exist
• Discover
– Processes
– Unrecognized connections
– …
Watson’s Architecture* (Simplified)
• Analysis determines answer type and topics in play
• Hypothesis formation seeks candidate answers from sources
– Pattern matching
• Hypothesis scoring weighs evidence for each hypothesis
• Answer ranking uses models to select answer
[Diagram: Question → Analysis → Hypothesis Formation → Hypothesis Scoring → Answer Ranking → Answer, drawing on Answer Sources and Evidence Sources]
*http://www.aaai.org/Magazine/Watson/watson.php
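The four stages can be sketched as a chain of plain functions. This is a toy under stated assumptions, not Watson's implementation: every function name, the first-word heuristic for answer type, and the count-based scoring are hypothetical stand-ins, and the final sort replaces the learned ranking models the slide mentions.

```python
def analyze(question):
    """Analysis: guess an answer type and collect topic words (toy heuristic)."""
    words = question.lower().rstrip("?").split()
    answer_type = "person" if words[0] == "who" else "thing"
    return {"answer_type": answer_type, "topics": set(words[1:])}


def form_hypotheses(analysis, answer_sources):
    """Hypothesis formation: candidates are source entries of the expected type."""
    return [name for name, kind in answer_sources.items()
            if kind == analysis["answer_type"]]


def score_hypotheses(candidates, analysis, evidence_sources):
    """Hypothesis scoring: count passages naming the candidate and a topic word."""
    scores = {}
    for cand in candidates:
        scores[cand] = sum(
            1 for passage in evidence_sources
            if cand.lower() in passage.lower()
            and analysis["topics"] & set(passage.lower().split())
        )
    return scores


def rank_answers(scores):
    """Answer ranking: highest score first, ties broken alphabetically."""
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical miniature sources, built from names cited elsewhere in this deck.
answer_sources = {"Don Swanson": "person", "Arthur Koestler": "person",
                  "fish oil": "thing"}
evidence_sources = [
    "Don Swanson linked fish oil to Raynaud's syndrome",
    "Arthur Koestler wrote The Act of Creation",
]

analysis = analyze("Who linked fish oil to Raynaud's syndrome?")
candidates = form_hypotheses(analysis, answer_sources)
scores = score_hypotheses(candidates, analysis, evidence_sources)
answer = rank_answers(scores)[0]
print(answer)
```

Keeping each stage a separate function mirrors the diagram: answer sources feed only hypothesis formation here, and evidence sources feed only scoring, so either can be swapped without touching the other stages.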
SolrSherlock Architecture (Simplified)
Topic Map
Conceptual Graphs
Harvested Documents
Harvester: HyperMembrane
Information Fabrics, Agents
Literature-based Discovery: Process documents into structures (information fabrics) from which patterns are harvested.
Federate Data Analysis with Literature: Federate Data Observations and predictions with concepts and relations harvested from the literature
Model Processes, Structures, and Analogies
SolrSherlock Component Diagram
[Component diagram labels: Topic Map; Conceptual Graph; Information Fabrics; TSC; TM Provider; CG Provider; Machine Reader; TSC Provider; Open DeepQA Harvester; Persistence Providers; Agents]
Looking Forward
• Coupling Literature-based research with BigData analysis
– Common ontologies
– Hypothesis formation
– Evidence gathering
– Relation discovery
Completed Representation
[Diagram: concept graph. Nodes: Antioxidants; Bacterial Infection; Compromised Host; "antioxidants kill free radicals"; "macrophages use free radicals to kill bacteria". Relations: Contraindicates; Because; Appropriate For]
Let us co-create Cognitive Agents for Discovery [email protected]
Thanks to Martin Radley, Patrick Durusau, Sherry Jones, and Mark Szpakowski for valuable comments
SolrSherlock at: http://debategraph.org/SolrSherlock and https://github.com/SolrSherlock