Please have a seat. Our program will commence shortly.
Post on 23-Jan-2016
Biomarker Automated Retrieval Tool
Ronny Chan, Kim Ngo
Earth Science Data Systems Dept.
Bioinformatics Relationship
Science produces massive amounts of data
Data needs to be analyzed, stored, & retrieved
This is data mining
We want to apply computer science to improve this process
Motivation
Problems with conventional data mining: time consuming; accuracy not defined (subjective)
No objective scientific info retrieval tool
Where are the Biomarkers?
Cancer Biomarkers
An indicator of cancerous growth.
Proposed Solution
Create a program that allows people to quickly scan literature for the
most relevant keywords/biomarkers
B.A.R.T.
HER-2, HPEBP4, EP-CAM, ERBB2, BAG-1
Significance
What is the need of the project?
More efficient research
Save time
Figure labels: conventional, enhanced, B.A.R.T.
Goals
Make biomarker/keyword searches more efficient
Learn Java
Learn SQL
Approach
Write a program
Read in articles
Use part of the Vector Space Model algorithm to rank terms
Output relevant terms in statistical rankings
"they" vs. "BRCA1"
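The approach above can be sketched in a few lines. This is only an illustration, not the B.A.R.T. implementation (the slides mention Java and SQL; Python is used here for brevity). The function name `rank_terms`, the tokenizer, and the choice to sum tf * idf over all articles are assumptions made for the example.

```python
import math
import re
from collections import Counter


def rank_terms(articles, top_n=5):
    """Rank terms by tf * idf summed over all articles (hypothetical helper)."""
    # Assumed tokenizer: lowercase runs of letters, digits, and hyphens.
    tokenized = [re.findall(r"[a-z0-9-]+", text.lower()) for text in articles]
    n_docs = len(tokenized)

    # Document frequency: in how many articles each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Sum tf * idf per term; a term found in every article gets idf = 0.
    scores = Counter()
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log10(n_docs / df[term])
    return scores.most_common(top_n)
```

With this weighting, a word such as "they" that occurs in every article scores zero, while a term concentrated in a few articles, such as a gene name, rises to the top.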
Vector Space Model
Information retrieval system
Introduced by Gerard Salton in the 1960s
Used widely in different search engines
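As a rough illustration of the model, each text can be treated as a vector of term counts, and two texts compared by the cosine of the angle between their vectors. The sketch below assumes whitespace tokenization and raw counts; the function name `cosine_similarity` is chosen for the example and is not taken from the slides.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two raw term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Texts with no terms in common score 0, and identical texts score 1 (up to floating-point rounding).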
Algorithm for B.A.R.T.
Keywords Input
PubMed Query Agent
Data Store
Data Retrieval and Output
Content Analyzer
Keyword Parser
Content Ranker
DCIS CU-TP3982 ERBB2 HER-2 HPEBP4 BAG-1 EP-CAM 99M
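The slides do not say how the PubMed Query Agent talks to PubMed. One plausible route, assumed here, is the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL from the input keywords and does not send it; the function name `build_pubmed_query` and the default of 20 articles are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed; not named in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(keywords, max_articles=20):
    """Build an esearch URL for the given keywords (hypothetical helper)."""
    params = {
        "db": "pubmed",              # search the PubMed database
        "term": " ".join(keywords),  # query text
        "retmax": max_articles,      # maximum number of IDs to return
        "retmode": "xml",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

The returned article IDs would then be fetched and handed to the data store and content analyzer stages.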
Results
Lessons & Difficulties
Deciding on algorithm choice: ease of implementation and effectiveness
Limited knowledge & experience: Java, SQL
Initial implementation is slow
5 ARTICLES = 160 sec
UPDATE: AUGUST 18, 2004
100 ARTICLES = 8^19 years
20 ARTICLES = 1904 sec
100 ARTICLES = 8^38 years
Future work
Apply different term weight functions to make results more robust
Optimize the program for speed
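The slides do not name the term weight functions under consideration. As an assumption, the sketch below shows three common term-frequency variants (raw, binary, and log-scaled) that can be swapped into a tf * idf weight; all function names are illustrative.

```python
import math


def tf_raw(count):
    """Raw term count."""
    return float(count)


def tf_binary(count):
    """1 if the term occurs at all, else 0."""
    return 1.0 if count > 0 else 0.0


def tf_log(count):
    """Log-scaled count, which dampens the effect of repeated terms."""
    return 1.0 + math.log10(count) if count > 0 else 0.0


def weight(count, df, n_docs, tf_function=tf_raw):
    """tf * idf with a pluggable tf function and idf = log10(n_docs / df)."""
    return tf_function(count) * math.log10(n_docs / df)
```

For a term that occurs twice in one of three documents, `weight(2, 1, 3)` uses the raw count, while passing `tf_function=tf_log` gives a smaller weight for the same term.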
Acknowledgements
Earth Science Data System, JPL
Tina Xiao
Paul Ramirez
Chris Mattmann
Roshanak Roshandel
Sean Hardman
ALL SoCalBSI Colleagues
National Institutes of Health (NIH)
National Science Foundation (NSF)
Southern California Bioinformatics Summer Institute (SoCalBSI)
SoCalBSI Professors
Jacqueline Heras
Q: malignant breast cancer
D1: detection of malignant level in the cell
D2: sighting of breast stage in the breast cancer
D3: detection of malignant stage in the cancer
Each cell gives the term count, with the term's IDF in parentheses.
doc  the   stage    level    sighting  cell     malignant  in    of    breast   detection  cancer
D1   1(0)  0        1(.477)  0         1(.477)  1(.176)    1(0)  1(0)  0        1(.176)    0
D2   1(0)  1(.176)  0        1(.477)   0        0          1(0)  1(0)  2(.477)  0          1(.176)
D3   1(0)  1(.176)  0        0         0        1(.176)    1(0)  1(0)  0        1(.176)    1(.176)
Q    0     0        0        0         0        1(.176)    0     0     1(.477)  0          1(.176)
VSM Example
ID  TERM       DF  IDF
1   the        3   0
2   stage      2   .176
3   level      1   .477
4   sighting   1   .477
5   cell       1   .477
6   malignant  2   .176
7   in         3   0
8   of         3   0
9   breast     1   .477
10  detection  2   .176
11  cancer     2   .176
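$\mathrm{IDF} = \log_{10}(n / \mathrm{DF})$, where $n$ is the number of documents; for example, $\log_{10}(3/2) \approx .176$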
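The DF and IDF values can be recomputed directly from D1 to D3. The sketch below counts, for each term, the number of documents containing it, then applies the base-10 log of n / DF. Whitespace tokenization and the helper names are assumptions made for the example.

```python
import math

docs = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}


def document_frequency(docs):
    """Number of documents in which each term occurs at least once."""
    df = {}
    for text in docs.values():
        for term in set(text.split()):  # set(): count each document once
            df[term] = df.get(term, 0) + 1
    return df


def inverse_document_frequency(df, n_docs):
    """IDF = log10(n / DF) for every term."""
    return {term: math.log10(n_docs / count) for term, count in df.items()}


df = document_frequency(docs)
idf_values = inverse_document_frequency(df, len(docs))
```

Terms found in all three documents ("the", "in", "of") get an IDF of 0, terms found in two get about .176, and terms found in only one get about .477.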
Example Continued…
Keyword tf * idf
top related
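To continue the example, each document and the query can be turned into a tf * idf vector and compared. The slides do not show this step, so the scoring below (an unnormalized dot product between the query vector and each document vector) is an assumption, as are the helper names.

```python
import math
from collections import Counter

DOCS = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}
QUERY = "malignant breast cancer"


def idf_table(docs):
    """IDF = log10(n / DF) for every term in the collection."""
    df = Counter()
    for text in docs.values():
        df.update(set(text.split()))
    return {term: math.log10(len(docs) / count) for term, count in df.items()}


def tfidf_vector(text, idf):
    """Map each term of the text to tf * idf (unknown terms weigh 0)."""
    return {term: tf * idf.get(term, 0.0)
            for term, tf in Counter(text.split()).items()}


def score(query_vec, doc_vec):
    """Dot product over the terms of the query."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())


idf = idf_table(DOCS)
query_vec = tfidf_vector(QUERY, idf)
scores = {name: score(query_vec, tfidf_vector(text, idf))
          for name, text in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions D2 ranks first, because "breast" is both rare in the collection and repeated in D2, followed by D3 and then D1.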