Limsoon Wong Laboratories for Information Technology Singapore From Informatics to Bioinformatics

Upload: tyler-spencer

Post on 05-Jan-2016


Limsoon Wong

Laboratories for Information Technology

Singapore

From Informatics to Bioinformatics

What is Bioinformatics?

Themes of Bioinformatics

Bioinformatics = Data Mgmt + Knowledge Discovery

Data Mgmt = Integration + Transformation + Cleansing

Knowledge Discovery = Statistics + Algorithms + Databases

Benefits of Bioinformatics

To the patient: Better drug, better treatment

To the pharma: Save time, save cost, make more $

To the scientist: Better science

From Informatics to Bioinformatics

Integration Technology (Kleisli)

Cleansing & Warehousing (FIMM)

MHC-Peptide Binding (PREDICT)

Protein Interactions Extraction (PIES)

Gene Expression & Medical Record Datamining (PCL)

Gene Feature Recognition (Dragon)

Venom Informatics

1994 1996 1998 2000 2002

8 years of bioinformatics R&D in Singapore

ISS KRDL LIT

Data Integration

A DOE “impossible query”:

For each gene on a given cytogenetic band, find its non-human homologs.

source  type    location   remarks
GDB     Sybase  Baltimore  Flat tables; SQL joins; location info
Entrez  ASN.1   Bethesda   Nested tables; keywords; homolog info

Data Integration Results

sybase-add (#name: "GDB", ...);
create view L from locus_cyto_location using GDB;
create view E from object_genbank_eref using GDB;
select
  #accn: g.#genbank_ref, #nonhuman-homologs: H
from
  L as c, E as g,
  {select u
   from g.#genbank_ref.na-get-homolog-summary as u
   where not(u.#title string-islike "%Human%") andalso
         not(u.#title string-islike "%H.sapien%")} as H
where
  c.#chrom_num = "22" andalso
  g.#object_id = c.#locus_id andalso
  not (H = { });

• Using Kleisli:

• Clear

• Succinct

• Efficient

• Handles heterogeneity and complexity
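The logic of the query above can be read as a nested loop: join the two GDB views on locus, look up homolog summaries for each GenBank reference, drop the human ones, and keep only genes with a non-empty result. The Python sketch below mirrors that logic on in-memory data. It is an illustration only: the sample rows, the accession strings, and the `get_homolog_summaries` function are hypothetical stand-ins for the GDB tables and the Entrez lookup, and none of it is Kleisli's API.

```python
# Hypothetical stand-ins for the two flat GDB tables.
locus_cyto_location = [
    {"locus_id": 1, "chrom_num": "22"},
    {"locus_id": 2, "chrom_num": "7"},
    {"locus_id": 3, "chrom_num": "22"},
]
object_genbank_eref = [
    {"object_id": 1, "genbank_ref": "ACC_A"},
    {"object_id": 2, "genbank_ref": "ACC_B"},
    {"object_id": 3, "genbank_ref": "ACC_C"},
]

# Hypothetical stand-in for the remote Entrez homolog lookup (made-up titles).
_FAKE_ENTREZ = {
    "ACC_A": [{"title": "Human gene X"}, {"title": "Mouse gene X"}],
    "ACC_B": [{"title": "Rat gene Y"}],
    "ACC_C": [{"title": "H.sapiens gene Z"}],
}


def get_homolog_summaries(accession):
    """Return the homolog summaries for one accession (empty if unknown)."""
    return _FAKE_ENTREZ.get(accession, [])


def nonhuman_homologs(chrom_num):
    """For each gene on the given chromosome, collect its non-human homologs."""
    results = []
    for c in locus_cyto_location:
        if c["chrom_num"] != chrom_num:
            continue
        for g in object_genbank_eref:
            if g["object_id"] != c["locus_id"]:
                continue
            # Same filter as the two string-islike tests in the query.
            homologs = [
                u for u in get_homolog_summaries(g["genbank_ref"])
                if "Human" not in u["title"] and "H.sapien" not in u["title"]
            ]
            if homologs:  # corresponds to: not (H = { })
                results.append({"accn": g["genbank_ref"],
                                "nonhuman_homologs": homologs})
    return results


print(nonhuman_homologs("22"))
```

In this toy data only `ACC_A` survives for chromosome 22, because every homolog of `ACC_C` is filtered out as human.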

Data Warehousing

Motivation

efficiency

availability

“denial of service”

data cleansing

Requirements

efficient to query

easy to update

model data naturally

{(#uid: 6138971,

#title: "Homo sapiens adrenergic ...",

#accession: "NM_001619",

#organism: "Homo sapiens",

#taxon: 9606,

#lineage: ["Eukaryota", "Metazoa", …],

#seq: "CTCGGCCTCGGGCGCGGC...",

#feature: {

(#name: "source",

#continuous: true,

#position: [

(#accn: "NM_001619",

#start: 0, #end: 3602,

#negative: false)],

#anno: [

(#anno_name: "organism",

#descr: "Homo sapiens"), …] ), …)}

Data Warehousing Results

A relational DBMS alone is insufficient because it forces us to fragment data into 3NF.

Kleisli turns a flat relational DBMS into a nested relational DBMS. It can use a flat relational DBMS such as Sybase, Oracle, or MySQL as its updatable complex object store.

! Log in
oracle-cplobj-add (#name: "db", ...);

! Define table
create table GP (#uid: "NUMBER", #detail: "LONG") using db;

! Populate table with GenPept reports
select #uid: x.#uid, #detail: x into GP
from aa-get-seqfeat-general "PTP" as x
using db;

! Map GP to that table
create view GP from GP using db;

! Run a query to get title of 131470
select x.#detail.#title from GP as x
where x.#uid = 131470;
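The idea behind the statements above, keeping a whole nested record in one column of a flat table and navigating into it at query time, can be sketched by analogy in Python. This is not how Kleisli is implemented: SQLite stands in for Oracle, JSON stands in for Kleisli's own object encoding, and the record contents are made up for illustration.

```python
import json
import sqlite3

# In-memory flat table with a key column and one serialized "detail" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")

# Made-up nested record; only the uid echoes the slide.
record = {
    "uid": 131470,
    "title": "hypothetical title",
    "feature": [{"name": "source", "continuous": True}],
}
conn.execute(
    "INSERT INTO GP (uid, detail) VALUES (?, ?)",
    (record["uid"], json.dumps(record)),
)


def get_title(connection, uid):
    """Fetch the stored object by uid and navigate into its nested structure."""
    row = connection.execute(
        "SELECT detail FROM GP WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0])["title"]


print(get_title(conn, 131470))
```

The flat store only ever sees a key and an opaque value, so the nested shape of the record never has to be fragmented into separate tables.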

Epitope Prediction

TRAP-559AA

MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSEEVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLNLNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRSLLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVILTDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNRFLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEKTASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQCEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENIIDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQKPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDNQNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGNRHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHEKPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVPGAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
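A binding predictor is usually applied to every fixed-length peptide window of a protein like this one. The sketch below shows only that enumeration step. The window length of 9 is an assumption (a common length for HLA class I binders; the slide does not state one), the function name is illustrative, and the scoring model itself is left out.

```python
def peptide_windows(sequence, length=9):
    """Yield (start_index, peptide) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


# First 20 residues of the sequence above, to keep the example short.
prefix = "MNHLGNVKYLVIVFLIFFDL"
windows = list(peptide_windows(prefix))

print(len(windows))   # number of candidate peptides in the prefix
print(windows[0])     # first window
print(windows[-1])    # last window
```

Each candidate peptide would then be passed to whatever scoring model is in use, and the top-scoring windows reported as predicted binders.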

Epitope Prediction Results

Prediction by our ANN model for HLA-A11

29 predictions, 22 epitopes, 76% specificity

Prediction by BIMAS matrix for HLA-A*1101

[Figure: experimental binders by BIMAS rank; axis marks at ranks 1, 66, 100]

Number of experimental binders: 19 (52.8%), 5 (13.9%), 12 (33.3%)
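The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the 76% figure is the 22 confirmed epitopes out of 29 predictions, and that the three binder counts are shares of one common total; both readings are inferred from the numbers, not stated on the slide. Variable names are illustrative.

```python
predictions = 29
epitopes = 22
binders_by_group = [19, 5, 12]  # the three BIMAS rank groups, in slide order

# Fraction of ANN predictions that were confirmed as epitopes.
fraction_confirmed = epitopes / predictions

# Share of all experimental binders that falls in each rank group.
total_binders = sum(binders_by_group)
shares = [round(100 * n / total_binders, 1) for n in binders_by_group]

print(round(100 * fraction_confirmed))  # percent, rounded to an integer
print(total_binders)
print(shares)
```

Under those assumptions the check reproduces the slide: 22 of 29 rounds to 76%, and the three groups sum to 36 binders with shares of 52.8%, 13.9% and 33.3%.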

Transcription Start Prediction

Transcription Start Prediction Results

Medical Record Analysis

Looking for patterns that are valid, novel, useful, understandable

age  sex  chol  ecg   heart  sick
49   M    266   Hyp   171    N
64   M    211   Norm  144    N
58   F    283   Hyp   162    N
58   M    284   Hyp   160    Y
58   M    224   Abn   173    Y

Gene Expression Analysis

Classifying gene expression profiles

find stable differentially expressed genes

find significant gene groups

derive coordinated gene expression
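One simple way to look for differentially expressed genes is to rank them by how far apart the two class means are relative to the spread within each class. The sketch below uses a signal-to-noise style score for that purpose. It is a generic statistic chosen for illustration, not necessarily the method used in this work, and the gene names and expression values are made up.

```python
from statistics import mean, stdev

# Made-up expression values: gene -> (class A samples, class B samples).
expression = {
    "gene1": ([5.0, 5.2, 4.8], [9.0, 9.1, 8.9]),
    "gene2": ([3.0, 7.0, 5.0], [4.0, 6.0, 5.0]),
    "gene3": ([8.0, 8.2, 7.8], [2.0, 2.2, 1.8]),
}


def signal_to_noise(a, b):
    """Difference of class means divided by the sum of class standard deviations."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


# Rank genes by the magnitude of the score, most discriminating first.
ranked = sorted(
    expression,
    key=lambda g: abs(signal_to_noise(*expression[g])),
    reverse=True,
)
print(ranked)
```

Here `gene2` ranks last because its two class means coincide, even though its values vary widely within each class.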

Medical Record & Gene Expression Analysis Results

PCL, a novel “emerging pattern” method

Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks

Works well for gene expressions

Cancer Cell, March 2002, 1(2)
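An emerging pattern is, roughly, a combination of attribute values whose support differs sharply between two classes. The sketch below computes support and growth rate for such patterns on the five patient records from the Medical Record Analysis slide. It illustrates the underlying notion only and is not the PCL classifier; the function names are illustrative.

```python
# The five records from the Medical Record Analysis slide.
records = [
    {"age": 49, "sex": "M", "chol": 266, "ecg": "Hyp", "heart": 171, "sick": "N"},
    {"age": 64, "sex": "M", "chol": 211, "ecg": "Norm", "heart": 144, "sick": "N"},
    {"age": 58, "sex": "F", "chol": 283, "ecg": "Hyp", "heart": 162, "sick": "N"},
    {"age": 58, "sex": "M", "chol": 284, "ecg": "Hyp", "heart": 160, "sick": "Y"},
    {"age": 58, "sex": "M", "chol": 224, "ecg": "Abn", "heart": 173, "sick": "Y"},
]


def support(pattern, rows):
    """Fraction of rows that match every attribute=value pair in the pattern."""
    if not rows:
        return 0.0
    hits = sum(all(r[k] == v for k, v in pattern.items()) for r in rows)
    return hits / len(rows)


def growth_rate(pattern, rows, target="Y"):
    """Support in the target class divided by support in the other class."""
    pos = [r for r in rows if r["sick"] == target]
    neg = [r for r in rows if r["sick"] != target]
    sup_pos, sup_neg = support(pattern, pos), support(pattern, neg)
    if sup_neg == 0:
        # Present in one class and absent in the other: unbounded growth.
        return float("inf") if sup_pos > 0 else 0.0
    return sup_pos / sup_neg


print(growth_rate({"age": 58, "sex": "M"}, records))
print(growth_rate({"ecg": "Hyp"}, records))
```

On this tiny sample, `{age: 58, sex: M}` occurs in both sick patients and in none of the others, so its growth rate is unbounded, while `{ecg: Hyp}` is slightly more common among the non-sick and does not discriminate.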

Protein Interaction Extraction

“What are the protein-protein interaction pathways from the latest reported discoveries?”

Protein Interaction Extraction Results

Rule-based system for processing free texts in scientific abstracts

Specialized in

extracting protein names

extracting protein-protein interactions
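To give a feel for what a rule-based extractor does, the sketch below applies a single hand-written rule of the form "protein, interaction verb, protein" to free text. Everything in it is an assumption made for illustration: the name pattern, the verb list, and the made-up protein names in the sample text. The actual system's rules are not shown on the slide.

```python
import re

# Assumed naming rule: a capitalised token that ends in one or more digits.
PROTEIN = r"[A-Z][A-Za-z]*[0-9]+"
# Assumed list of interaction verbs.
VERB = r"interacts with|binds|phosphorylates|inhibits|activates"

RULE = re.compile(rf"\b({PROTEIN})\s+({VERB})\s+({PROTEIN})\b")


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the single rule."""
    return [(m.group(1), m.group(2), m.group(3)) for m in RULE.finditer(text)]


# Made-up sentences with made-up protein names.
abstract = (
    "ProtA1 interacts with ProtB2. "
    "KIN3 phosphorylates SUB7 in vitro. "
    "No interaction was seen for ProtC9."
)
print(extract_interactions(abstract))
```

A real system would need many such rules, plus handling for name variants, negation and passive constructions, which this single pattern ignores.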

Behind the Scene

Vladimir Bajic, Vladimir Brusic, Jinyan Li, See-Kiong Ng, Limsoon Wong, Louxin Zhang

Allen Chong, Judice Koh, SPT Krishnan, Huiqing Liu, Seng Hong Seah, Soon Heng Tan, Guanglan Zhang, Zhuo Zhang

and many more: students, folks from geneticXchange, Molecular Connections, and other collaborators….