DecisionSite and DiscoveryLink
Doug Del Prete
IBM Life Sciences
Visualizing data in novel ways
Spotfire User ConferenceBoston, MAOctober 28, 2003
2DecisionSite and DiscoveryLink
What is DiscoveryLink?
DiscoveryLink (DL) is a powerful technology available from IBM that will allow you to view many data sources – even non-relational ones like BLAST and HMMER – as one heterogeneous “virtual relational database”
Basically, a wide variety of data sources are all made to look like SQL-based tables/views, which makes it easy to access them and integrate all of this data together in an optimized way
3DecisionSite and DiscoveryLink
What is DiscoveryLink?
DiscoveryLink is based on the latest Information Integrator middleware technology, now a major IBM software initiative for data federation/integration
All data sources essentially become “SQL aware”, and do so under a cost-based optimizer that works against both relational and non-relational data sources and their associated queries
4DecisionSite and DiscoveryLink
Information Integration: Issues
For Scientists
Data Layer
Application Layer
Web Servers
Application
Multiple, varied sources Multiple queriesData freshness Legacy Databases
Without a Data Integration Layer
DB2DB2
OracleOracle
SQLSQLServerServer
ASCIIASCIIdata filedata file
Excel Excel SpreadsheetSpreadsheet
ApplicationApplication
ApplicationApplication
For IT
5DecisionSite and DiscoveryLink
DiscoveryLinkDiscoveryLink
Web Servers
DB2DB2
OracleOracle
SQLSQLServerServer
ASCIIASCIIdata filedata file
Excel Excel SpreadsheetSpreadsheet
™™
Multiple, varied sources Multiple queriesData freshness Legacy Databases
Application
Application
ApplicationApplication
Application
Information Integration: Solution
For Scientists
Toxicology Toxicology DataData
Proteomic Proteomic DataData
Compound Compound DataData
Genomic Genomic DataData
Textual Textual DataData
LegacyLegacyDataData
Gene Gene Expression Expression
DataData
Other Other DataData
SourcesSourcesDiscoveryLink
For IT
6DecisionSite and DiscoveryLink
Toxicology Toxicology DataData
Proteomic Proteomic DataData
Compound Compound DataData
Genomic Genomic DataData
Textual Textual DataData
LegacyLegacyDataData
Gene Gene Expression Expression
DataData
Other Other DataData
SourcesSourcesDiscoveryLink
Information Integration: Question
Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.
7DecisionSite and DiscoveryLink
Toxicology Toxicology DataData
Proteomic Proteomic DataData
Compound Compound DataData
Genomic Genomic DataData
Textual Textual DataData
LegacyLegacyDataData
Gene Gene Expression Expression
DataData
Other Other DataData
SourcesSourcesDiscoveryLink
Term Operator Value
Compound SimilarTo Ketanserin
Receptor Homologous Serotonin
IC50 <= 1E-8
Molwt > 375
Molwt < 425
logP > 4
logP < 6
BLAST Wrapper XML Wrapper Oracle Wrapper ODBC Wrapper
BLASTData Source
Oracle Compound DB
in Germany
Assay Results in
MySQL
Discovery Link
Result Set
(Visualization)
ParametersIIM
Query
Solution
XML Document
Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.
Architecture
Information Integration: IT Translation
8DecisionSite and DiscoveryLink
Toxicology Toxicology DataData
Proteomic Proteomic DataData
Compound Compound DataData
Genomic Genomic DataData
Textual Textual DataData
LegacyLegacyDataData
Gene Gene Expression Expression
DataData
Other Other DataData
SourcesSourcesDiscoveryLink
Solution
Architecture
SELECT a.compound_id, b.ic50, b.screen_nameFROM CMPNDDBS a, ACTIVITY_DATA b, BLASTP c WHERE a.compound_id = b.compound_id AND SimilarTo(a.compound_struct,:KETANSERIN_MOL) > 0.88 AND c.input_seq = :SEROTONIN AND c.protein_id = b.screen_name AND b.ic50 <= 0.000000001 AND a.mol_wt BETWEEN 375 AND 425 AND a.logP BETWEEN 4 AND 5
DL Query
BLAST Wrapper XML Wrapper Oracle Wrapper ODBC Wrapper
BLASTData Source
Oracle Coumpound DB in
Germany
Assay Results in
MySQL
Discovery Link
Result Set
(Visulaization)
IIMQuery
XMLDocument
Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.
Information Integration: IT Translation
9DecisionSite and DiscoveryLink
Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.
Results
Information Integration: Final Result
Compnd ID HTR1A US1111
HTR1B US1234
HTR1D US2534
HTR1E US 1111
HTR1F US2534
HTR2A US 1111
HTR2B US4791
HTR2C US 1111
HTR4 US1234
HTR5A US1111
HTR6 US1111
US-345123 0.001 0.5 [email protected] 2.1 3.8 1.1 0.001 53.5@5 0.01 0.02 [email protected]
UK-567345 <0.003 >5.0 [email protected] <6.0 8.8 5.5 5.9 16@5 5.6 4.3 [email protected]
US-234012 0.0025 >5.7 23@10 <6.0 8.9 5.4 7.0 15@5 4.8 19.0 [email protected]
US-321543 0.05 2.0 [email protected] 8.9 0.0 6.7 10.0 48@5 3.33 2.6 [email protected]
10DecisionSite and DiscoveryLink
Wrapper instance definition
RDB, Spreadsheet, Flat Files, Algorithms, etc. In diverse locations
<XML>text
</XML>
RqRq Rq
rsrsrs
Rq rs
Spotfire DecisionSite, Synapsia, Customer Application, SQL command line, etc.
(JDBC/ODBC)
Information IntegratorInformation Integrator
Views
Administration throughInformation Integrator
Control center
CatalogDB2
Dataloader
SwissProtKEGGdbESTLocus LinkUNIGENEand more …
WrappersWrappers
•DB2•Oracle
•Oracle Cartridge•MS SQL Server•Sybase•Informix•Teradata•ODBC (MySQL, Postgres…)•Excel•Flat Files in CSV format•Documentum•Blast•XML•ENTREZ (NCBI portal)•HMMer•Extended Search•BioRS
Wrapper Development Toolkit
Server Definition
User Mapping
Nicknames Definition
Optimizer Log
Wrappers plan
DiscoveryLink – Overall Architecture
11DecisionSite and DiscoveryLink
DiscoveryLink: A Robust Solution
• Access to multiple, heterogeneous sources
• Complex queries across distributed data sources
• Leverage existing IT infrastructure and use specialized functions of existing databases
• Integrating analysis tools and business intelligence
• Can put a SQL front-end and user security on data sources such as BLAST, Pubmed, Genbank, HMMER, XML
• Can use for fast and easy ad-hoc extensions to a data warehouse/mart
Benefits
12DecisionSite and DiscoveryLink
DiscoveryLink Value Proposition
A proven scalable data integration solution that enables efficient and effective queries across disparate data sources, thereby improving R&D efficiencies and productivity.
This translates into greater flexibility and competitive advantage in the marketplace.
14DecisionSite and DiscoveryLink
DecisionSite and DiscoveryLink
DecisionSite, in its own right, can access many data sources to drive its powerful set of visualizations – i.e. any data source that is JDBC compliant can be configured as a data source
But DiscoveryLink as the data access engine can extend the reach of DecisionSite in both the types of data sources and performance/scalability
These two products together provide a robust, flexible “best of breed” approach to analyzing your data: DecisionSite as the user interface/front-end to DL DiscoveryLink as a federated data source access engine for DS
15DecisionSite and DiscoveryLink
DecisionSite Server
Information Interaction Services
Relational DB MiddlewareDB2/
DiscoveryLink
JDBC
WebServices
Txt/csv/etc…
DecisionSite and DiscoveryLink
BLAST/ HMMER XML
Relational(Optimized
Joins)
Pubmed/Genbank …
16DecisionSite and DiscoveryLink
Benefits Overview
Access more data sources – BLAST, HMMER, Genbank, Documentum, BioRS, etc.
Access existing DecisionSite data sources faster/easier – XML, Postgres, MS Access, etc.
Extend/augment an existing visualization with information from any of the above data sources
Optimized queries cross-joined across all data sources – relational and non-relational – all under one JDBC connection
18DecisionSite and DiscoveryLink
DecisionSite and DiscoveryLink
Based on the natural fit of DiscoveryLink into DecisionSite’s IIM (Information Interaction Model)
Under the Information Interaction Designer (IID), you reference “Nicknames”, which are “virtual tables”, pointing to other tables/views and non-relational objects pre-configured under DiscoveryLink
These data sources naturally available to Information Builder and Information Library/Links, and beyond
19DecisionSite and DiscoveryLink
DecisionSite and BLAST
See how easy it is to configure a data source like BLAST to be used under DecisionSite
25DecisionSite and DiscoveryLink
This is all done quickly and easily, even though there is no JDBC driver for BLAST readily available!
DecisionSite and BLAST
26DecisionSite and DiscoveryLink
DecisionSite and XML
Similarly, see how easy it is to configure XML to be used under DecisionSite
30DecisionSite and DiscoveryLink
Again, this can all be done quickly and easily, without having to find and manually configure a JDBC driver via a text editor, and then restarting DecisionSite, etc.
DecisionSite and XML
31DecisionSite and DiscoveryLink
Merged Queries - Example
In IID, set up the BLAST and Entrez Data Models
Also the Nucleotide/BLAST join via Accession #
Configure a BLAST Information Link Configure a Nucleotide Information Link Run BLAST visualization From the visualization, run the Entrez
Information Link to get all the applicable metadata information about each displayed Accession (in Details-on-Demand, Table, etc.)
32DecisionSite and DiscoveryLink
Query Optimization
Under IID, configure key Joins under DB2/ DiscoveryLink
This includes established DecisionSite data like Oracle and MySQL (very quickly configured in DiscoveryLink Control Center)
The Information Link makes only one JDBC connection to access all the data sources!
The underlying SQL query invokes the DB2 Optimizer to improve performance even more
33DecisionSite and DiscoveryLink
Query Optimization
Great for large relational data sets such as joining assay results with a chemical compound database, etc.
34DecisionSite and DiscoveryLink
DecisionSite and DiscoveryLink - Summary
You can extend all the powerful features of DecisionSite, like guided analytics and posters, to visualize information from more data sources, in an easier manner, and do so in an optimized environment