From Data to Foresight - Brown University
Post on 03-Jun-2020
1 © 2011 IBM Corporation
From Data to Foresight:
Leveraging Data and Analytics
for Materials Research
Laura Haas, IBM Fellow
IBM Research - Almaden
2 © 2011 IBM Corporation
The road from data to foresight is long
• Must acquire, integrate, enhance and align
• Must deal with missing and incomplete data
• Must store, protect, and manage
• Must create models and other analytics and test them
• Must run these analyses efficiently over large data volumes
• Must understand and share results
• Requires significant (and expensive) EXPERTISE in data management, systems, analytics, and the domain
• Takes TIME
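A minimal sketch of two of the steps above (integrate, then deal with missing data), in plain Python. The record fields, the sample values, and the helper names `integrate` and `fill_missing` are hypothetical and chosen only for illustration. Mean imputation stands in for whatever method a real project would select.

```python
from statistics import mean


def integrate(primary, secondary, key):
    """Merge two lists of dict records on `key`.

    Non-missing values from `primary` override those from `secondary`.
    """
    merged = {}
    for record in secondary + primary:  # primary is applied last, so it wins
        target = merged.setdefault(record[key], {})
        target.update({k: v for k, v in record.items() if v is not None})
    return list(merged.values())


def fill_missing(records, field):
    """Fill missing `field` values with the mean of the observed values."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fallback = mean(observed) if observed else None
    return [
        {**r, field: r[field] if r.get(field) is not None else fallback}
        for r in records
    ]


# Hypothetical sample data: two sources describing overlapping materials.
lab = [
    {"id": "A", "density": 2.7, "hardness": None},
    {"id": "B", "density": None, "hardness": 5.0},
]
literature = [
    {"id": "A", "hardness": 3.0},
    {"id": "C", "density": 8.9, "hardness": 4.0},
]

records = fill_missing(integrate(lab, literature, "id"), "density")
```

Even this toy version has to decide which source wins on conflict and how to impute, which is where the domain expertise mentioned above comes in.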
[Figure: callout "How can I reduce my ?"; image label "Consumer Reports"]
[Figure: dataflow diagram of a hydrologic model; labels only]
INPUT: Observed Precipitation, Potential Evapotranspiration
Computation boxes: Rainfall Error, Saturation & Surface Runoff, Upper Layer Evaporation, Lower Layer Evaporation, Percolation, Interflow, Base Flow, Miscellaneous Fluxes, Overland Routing, Solve State Equations, Update State
Intermediate values: Effective Precipitation, Saturated Area, Surface Runoff, Interflow, Baseflow, Percolation
OUTPUT: Instantaneous Runoff, Routed Runoff, Total Water (Upper Layer, Lower Layer)
Legend: Flux computations; State computations; Inputs and outputs
Note: in addition to dependencies shown, most flux calculations are dependent on values of state variables at the previous timestep
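The note in the diagram (flux calculations depend on state values from the previous timestep) can be illustrated with a toy two-layer water store. This is a sketch under stated assumptions, not the model in the figure: the function names, the rate constants `k_perc` and `k_base`, the `capacity`, and the flux formulas are all hypothetical.

```python
def step(state, precip, pet, k_perc=0.1, k_base=0.05, capacity=100.0):
    """Advance a toy two-layer store by one timestep (hypothetical model)."""
    upper, lower = state  # state from the previous timestep

    # Flux computations: based on the previous-timestep state
    evaporation = min(pet, upper)
    percolation = k_perc * (upper - evaporation)
    baseflow = k_base * lower

    # State update, with any excess above capacity leaving as surface runoff
    new_upper = upper + precip - evaporation - percolation
    surface_runoff = max(0.0, new_upper - capacity)
    new_upper -= surface_runoff
    new_lower = lower + percolation - baseflow

    fluxes = {
        "evaporation": evaporation,
        "percolation": percolation,
        "baseflow": baseflow,
        "surface_runoff": surface_runoff,
    }
    return (new_upper, new_lower), fluxes


def simulate(initial_state, precip_series, pet_series):
    """Run the model over paired input series, collecting fluxes per step."""
    state = initial_state
    history = []
    for precip, pet in zip(precip_series, pet_series):
        state, fluxes = step(state, precip, pet)
        history.append(fluxes)
    return state, history


final_state, history = simulate((10.0, 20.0), [5.0, 0.0], [2.0, 2.0])
```

Because each step reads only the previous state, the steps must run in order, which is one reason such analyses are hard to run efficiently over large data volumes.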
3 © 2011 IBM Corporation
The 4 V’s of data
• Volume (Data at Rest): terabytes to exabytes of existing data to process
• Velocity (Data in Motion): streaming data, milliseconds to seconds to respond
• Variety (Data in Many Forms): structured, unstructured, text, multimedia
• Veracity* (Data in Doubt): uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations
* Truthfulness, accuracy or precision, correctness
4 © 2011 IBM Corporation
Valuable new insights are hidden in this wealth of data!
• Identify criminals and threats from disparate video, audio, and data feeds
• Make risk decisions based on real-time transactional data
• Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
• Detect life-threatening conditions at hospitals in time to intervene
• Discover and optimize new materials by mining data in the patents and literature
5 © 2011 IBM Corporation
Fortunately, new platforms can unlock the value of data
[Figure: IBM Big Data Platform architecture]
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
New analytic applications drive the requirements for a big data platform
• Integrate and manage the full variety, velocity and volume of data
• Apply advanced analytics to information in its native form
• Visualize all available data for ad hoc analysis
• Develop new analytic applications
• Optimize and control scheduling of many simultaneous analyses
• Protect data and applications from accidents, sabotage, and theft
6 © 2011 IBM Corporation
Outcome-based medicine vision: Leverage public and private content and rich analytics to improve treatment outcomes
[Figure: analytics capabilities and data sources mapped onto the drug development timeline]
Research & Development and Intellectual Property:
• Target Identification and Validation
• Lead Discovery and Optimization
• Safety and Efficacy
• Genomics
• Proteomics
• Metabolomics
• Chemical and Biological Extraction, Profiling, Analytics, and Reasoning
Clinical Decision Support:
• Patient Similarity and Segmentation
• Patient Cohorts for Clinical Support
• Clinical Genomics Analysis
• Comparative Effectiveness Research
• Predictive Modeling of Outcome
• Disease Progression Analysis
• Treatment Cost Analysis
• Temporal Analysis
Patient experience and social community support:
• Patient first hand experiences
• Social community development and support
Data sources: Patents, Pre-clinical, Clinical Trials, Scientific Literature, Safety, DMPK, Formulation, Claims Data, Electronic Medical Records, Ontologies, Pathways, Curated Data, Web, Social Media, High Throughput Screening
Timeline labels: Target Identification, Target Selection, Lead Discovery, Candidate Selection, Preclinical Development, Development Selection, Clinical I II III, Launch, Patient Experience, Patient Outcome, Medical Care
Key Analytics Capabilities: BI, Text analytics, NLP, Network Analysis, Relationship Discovery, ML, Modeling, …
7 © 2011 IBM Corporation
An Example: Leveraging data to accelerate life sciences R&D
► R&D: Find white space and gain insight into complex chemical and biological patents; gain early insights into a given target-compound match from past patents for better research target and compound selection decisions
► Legal: Detect IP infringement earlier and increase the quality of patent filings
► Corporate Strategy / Business Dev: Identify collaboration and acquisition targets for greater research value and effectiveness, and find patent in- and out-licensing candidates for efficient management and monetization of IP
The Situation
Highly volatile, increasingly complex environment
Traditional R&D is not delivering
New approaches are needed
Collaborative R&D models: the new normal, requiring open platforms, clear boundaries and protection
Agile responses: vital to drive fast adaptation to the changing competitive IP landscape, including adjustments to strategy, portfolio investments and partnerships
Effective IP portfolio management: delivering key value for out-licensing and monetizing of non-core IP
Strategic ecosystem development: growth and competitive differentiation through aggressive collaboration and early identification of acquisition and recruitment targets
The Solution
IBM BAO Strategic IP Insight Platform (SIIP)
A unique and powerful data and analytics offering
Aggregates and processes 30M+ patents and scientific literature from around the globe
Automatically extracts chemical and biological entities (200M+ chemical compound instances to date)
Generates chemical and biological entity profiles
Searches and analyzes using natural language-based inputs for key relationship discovery and IP insights
Reasons about causality of drugs, diseases, targets, efficacy and side effects
Integrates and enhances existing data and applications
The Benefits
► Valuable insights into competitive landscape, white space, and IP portfolio
► High quality chemical extractions available hours after patents are available from patent authorities
► Previously unobtainable insights at the scientists’ fingertips with the touch of a button
► Fast and easy search and analysis, drastically reducing search time from weeks and months to just minutes
8 © 2011 IBM Corporation
A Smart Entity Profiling, Analytics and Reasoning Methodology
[Figure: entity profiles linking Medicine, Disease, and Patients]
Profile categories:
IP: Legal status, Assignee, Foreign filings, Expiration Date, . . .
Drug: Activity, Half life, Protein Binding, . . .
Physical: Computational, Molecular Weight, MF, Bp, Mp, . . .
Spectral: IR, NMR, Mass Spectra, . . .
Toxicity: Clinical Trials, Pre-Clinical, . . .
Pathways: Metabolic, Genetic, Environmental, Cellular, Organism, . . .
Screening: Activity, . . .
Genetic: . . .
Organisms: Organism, Organ, Cell, Tissue, . . .
Life styles: . . .
Reactions: Enzymes, . . .
Medical History: . . .
Medical Records: . . .
Data sources: Patents, Literature, Experimental, HTS, Medical Records, Clinical, Business, Social
• An integrated framework leveraging a broad set of data and many types of analytics:
  • Hypothesis generation
  • Entity extraction and profiling
  • Relationship discovery and analytics
  • Summarization
  • Reasoning
  • Scoring and ranking
  • Predictive modeling
• Key steps:
  • Extract key entities
  • Combine information from multiple sources
  • Discover relationships among entities
  • Reason about relationships
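The key steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SIIP implementation: the lexicon, the sample sentences, and the function names are hypothetical. Dictionary lookup stands in for entity extraction, document-level co-occurrence counting stands in for relationship discovery, and a shared-neighbour rule stands in for reasoning.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical lexicon: entity name -> entity type (made-up names)
LEXICON = {"compound-a": "drug", "target-x": "target", "disease-y": "disease"}

# Hypothetical text snippets from different sources
DOCS = [
    "Compound-A binds Target-X.",
    "Target-X is implicated in Disease-Y.",
    "Compound-A binds Target-X in vitro.",
]


def extract_entities(text):
    """Extract key entities: lexicon terms that occur in the text."""
    tokens = set(re.findall(r"[a-z0-9-]+", text.lower()))
    return {t for t in tokens if t in LEXICON}


def cooccurrence(documents):
    """Combine sources and discover relationships: count entity pairs
    mentioned together in the same document."""
    pairs = Counter()
    for doc in documents:
        pairs.update(combinations(sorted(extract_entities(doc)), 2))
    return pairs


def infer_indirect(pairs):
    """Reason about relationships: propose a link between two entities
    that share a neighbour but were never seen together."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    proposed = set()
    for linked in neighbours.values():
        for a, c in combinations(sorted(linked), 2):
            if (a, c) not in pairs:
                proposed.add((a, c))
    return proposed


pairs = cooccurrence(DOCS)
ranked = pairs.most_common()        # scoring and ranking by support
hypotheses = infer_indirect(pairs)  # hypothesis generation
```

On the toy data the compound and the disease never appear in the same document, yet the shared target yields a candidate link for a scientist to review.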
9 © 2011 IBM Corporation
Information and Governance for Big Data
Leverage private and public clouds to share data or keep it proprietary, as appropriate
10 © 2011 IBM Corporation
Summary
• There is much to be gained from leveraging available data and content
– Accelerate discovery
– Avoid repeating work
• Unlocking the value buried in that data is difficult
– 4 V’s: Volume, Velocity, Variety, Veracity
– A long process requiring many types of expertise
• There are powerful platforms and tools that can help
– Aid development of type-specific analytics
– Enable fast and timely processing of large diverse data sets
• Sharing, with appropriate data governance, can accelerate discovery
– Controls for the entire data lifecycle
– Many industry groups are finding leverage from shared investments