Please have a seat. Our program will commence shortly.

Upload: ceana

Post on 23-Jan-2016

TRANSCRIPT

Page 1: Please have a seat.  Our program will commence shortly

Please have a seat. Our program will commence shortly.

Page 2: Please have a seat.  Our program will commence shortly

Biomarker Automated Retrieval Tool

Ronny Chan, Kim Ngo

Earth Science Data Systems Dept.

Page 3: Please have a seat.  Our program will commence shortly

Bioinformatics Relationship

Science produces massive amounts of data

Data needs to be analyzed, stored, & retrieved

This is data-mining

We want to apply computer science to improve this process

Page 4: Please have a seat.  Our program will commence shortly

Motivation

Problems with conventional data mining:

Time consuming

Accuracy not defined (subjective)

No objective scientific info retrieval tool

Where are the Biomarkers?

Page 5: Please have a seat.  Our program will commence shortly

Cancer Biomarkers

An indicator of cancerous growth.

Page 6: Please have a seat.  Our program will commence shortly

Proposed Solution

Create a program that allows people to quickly scan literature for the most relevant keywords/biomarkers

[Figure: B.A.R.T. with example biomarkers HER-2, HPEBP4, EP-CAM, ERBB2, BAG-1]

Page 7: Please have a seat.  Our program will commence shortly

Significance

What is the need of the project?

More efficient research

Save time

[Figure labels: conventional, enhanced, B.A.R.T.]

Page 8: Please have a seat.  Our program will commence shortly

Goals

Make biomarker/keyword searches more efficient

Learn Java

Learn SQL

Page 9: Please have a seat.  Our program will commence shortly

Approach

Write a program

Read in articles

Use part of Vector Space Model algorithm to rank terms

Output relevant terms in statistical rankings

[Figure labels: "they" vs. BRCA1]

Page 10: Please have a seat.  Our program will commence shortly

Vector Space Model

Information Retrieval System

Introduced by Gerard Salton in the 1960s

Used widely in different search engines
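To make the model concrete: in a vector space model, each text becomes a vector of term counts and two texts are compared by the cosine of the angle between their vectors. The sketch below is illustrative only; the slides say B.A.R.T. uses "part of" the model without naming which part, and the function names and whitespace tokenization here are assumptions, not the authors' Java code.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a sparse vector of raw term counts (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    # Counter returns 0 for terms missing from b, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy inputs for illustration.
query = term_vector("malignant breast cancer")
doc = term_vector("sighting of breast stage in the breast cancer")
score = cosine_similarity(query, doc)
print(f"similarity = {score:.3f}")
```

Raw counts treat every term as equally informative; weighting schemes such as tf * idf are the usual way to downweight words that appear in every document.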

Page 11: Please have a seat.  Our program will commence shortly

Algorithm for B.A.R.T.

Keywords Input

PubMed Query Agent

Data Store

Data Retrieval and Output

Content Analyzer

Keyword Parser

Content Ranker

Page 12: Please have a seat.  Our program will commence shortly

Results

[Figure: terms shown on the results slide: DCIS, CU-TP3982, ERBB2, HER-2, HPEBP4, BAG-1, EP-CAM, 99M]

Page 13: Please have a seat.  Our program will commence shortly

Lessons & Difficulties

Deciding on algorithm choice: ease of implementation and effectiveness

Limited knowledge & experience: Java, SQL

Initial implementation is slow

5 ARTICLES = 160 sec

UPDATE: AUGUST 18, 2004

100 ARTICLES = 8^19 years

20 ARTICLES = 1904 sec

100 ARTICLES = 8^38 years

Page 14: Please have a seat.  Our program will commence shortly

Future work

Apply different term weight functions to make results more robust

Optimize the program for speed
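As an illustration of what "different term weight functions" could mean, the sketch below swaps the term-frequency component of a tf * idf weight between three variants that are common in information retrieval literature: raw counts, log-scaled counts, and binary presence. The slides do not say which functions the authors had in mind, so the variants, the function names, and the sample numbers are all assumptions.

```python
import math


def raw_tf(tf):
    """Use the raw term count."""
    return float(tf)


def log_tf(tf):
    """Dampen repeated occurrences: 1 + log10(tf) for tf > 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def binary_tf(tf):
    """Record only whether the term occurs at all."""
    return 1.0 if tf > 0 else 0.0


def weight(tf, df, n_docs, tf_function=raw_tf):
    """Term weight = tf_function(tf) * log10(n_docs / df)."""
    return tf_function(tf) * math.log10(n_docs / df)


# Hypothetical case: a term occurring twice in one document out of three.
for fn in (raw_tf, log_tf, binary_tf):
    print(fn.__name__, round(weight(2, 1, 3, fn), 3))
```

Keeping the tf function as a parameter lets the same ranking code be rerun under each scheme, which makes it easy to check how sensitive the output is to the weighting choice.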

Page 15: Please have a seat.  Our program will commence shortly

Citations

1. http://ir.iit.edu/~dagr/cs529/files/handouts/03VectorSpaceImplementation-6per.PDF

2. http://classes.engr.oregonstate.edu/eecs/spring2004/cs419/10

3. http://www.cs.ust.hk/~dlee/Papers/ir/ieee-sw-rank.pdf

4. http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/04-BooleanVectorSpaceB.pdf

5. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69(3), 89-95 (2001).

Page 16: Please have a seat.  Our program will commence shortly

Acknowledgements

Earth Science Data System, JPL: Tina Xiao, Paul Ramirez, Chris Mattmann, Roshanak Roshandel, Sean Hardman

ALL SoCalBSI Colleagues

National Institutes of Health (NIH)

National Science Foundation (NSF)

Southern California Bioinformatics Summer Institute (So Cal BSI)

SoCalBSI Professors

Jacqueline Heras

Page 17: Please have a seat.  Our program will commence shortly

VSM Example

Q: malignant breast cancer

D1: detection of malignant level in the cell

D2: sighting of breast stage in the breast cancer

D3: detection of malignant stage in the cancer

ID  TERM       DF  IDF
1   the        3   0
2   stage      2   .176
3   level      1   .477
4   sighting   1   .477
5   cell       1   .477
6   malignant  2   .176
7   in         3   0
8   of         3   0
9   breast     1   .477
10  detection  2   .176
11  cancer     2   .176

$\mathrm{IDF} = \log_{10}(n / \mathrm{DF})$, where $n = 3$ is the number of documents; for example, $\log_{10}(3/2) = .176$.

Term counts per document, with each term's IDF in parentheses:

doc  the   stage    level    sighting  cell     malignant  in    of    breast   detection  cancer
D1   1(0)  0        1(.477)  0         1(.477)  1(.176)    1(0)  1(0)  0        1(.176)    0
D2   1(0)  1(.176)  0        1(.477)   0        0          1(0)  1(0)  2(.477)  0          1(.176)
D3   1(0)  1(.176)  0        0         0        1(.176)    1(0)  1(0)  0        1(.176)    1(.176)
Q    0     0        0        0         0        1(.176)    0     0     1(.477)  0          1(.176)
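The document frequency (DF) and IDF values in this example can be recomputed from the three sample documents. The following is a minimal sketch, assuming whitespace tokenization and IDF defined as the base-10 logarithm of (number of documents / DF); the function names are illustrative and not taken from B.A.R.T. itself.

```python
import math

# The three sample documents from the slide.
documents = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = {}
    for text in docs.values():
        for term in set(text.lower().split()):
            df[term] = df.get(term, 0) + 1
    return df


def inverse_document_frequency(df, n_docs):
    """IDF = log10(n_docs / DF) for every term."""
    return {term: math.log10(n_docs / count) for term, count in df.items()}


df = document_frequency(documents)
idf = inverse_document_frequency(df, len(documents))
for term in sorted(df):
    print(f"{term:10s} DF={df[term]} IDF={idf[term]:.3f}")
```

Terms that occur in all three documents ("the", "in", "of") get an IDF of 0, so they contribute nothing to any later weighting; this is how the model filters out uninformative words without an explicit stop-word list.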

Page 18: Please have a seat.  Our program will commence shortly

Example Continued…

Keyword tf * idf
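Since the table for this slide did not survive, here is a hedged sketch of how a keyword ranking by tf * idf might be computed for the three sample documents. It sums each term's tf * idf over all documents and sorts the totals; whether B.A.R.T. aggregated weights this way is an assumption, as are the function name and the tokenization.

```python
import math
from collections import Counter

# The three sample documents from the VSM example.
documents = [
    "detection of malignant level in the cell",
    "sighting of breast stage in the breast cancer",
    "detection of malignant stage in the cancer",
]


def rank_terms_by_tf_idf(docs):
    """Return (term, summed tf * idf) pairs, highest weight first."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    totals = Counter()
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            totals[term] += tf * math.log10(n_docs / df[term])
    # Break ties alphabetically so the output order is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_terms_by_tf_idf(documents)
for term, score in ranking:
    print(f"{term:10s} {score:.3f}")
```

Under these assumptions "breast" ranks first, because it is both rare across documents and repeated within one, while "the", "in", and "of" fall to the bottom with a weight of 0.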