Development of a Chicken Unigene Database

Post on 03-Jan-2016

DESCRIPTION

Development of a Chicken Unigene Database. Project No. 9. Ruoming Jin, Lilian Lacoste, Jianshan Tang. Department of CIS, University of Delaware. Animal Science Dept., University of Delaware. DBI - French National School of Aeronautics and Space.

TRANSCRIPT

Development of a Chicken Unigene Database

Project No. 9

Mentors: Dr. Wellington Martins - Dr. Joan Burnside

Animal Science Dept., University of Delaware

Jianshan Tang, Ruoming Jin

Department of CIS, University of Delaware

Lilian Lacoste

DBI - French National School of Aeronautics and Space

Results

Phrap clustering result:

- Input: 17,090 ESTs, assembled with Phrap
- Output: 2,815 contigs and 6,390 singlets
- Total: 9,205 clusters

Second clustering method: using BLAST output

- Contig 1 gives BLAST output 1; Contig 2 gives BLAST output 2
- Filtering and parsing of each BLAST output
- Comparing the parsed outputs with a similarity function
- The scores fill a similarity matrix
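The slides do not say which similarity function was used. As a minimal sketch, assuming each parsed BLAST output is reduced to the set of database sequences it hit, the Jaccard index (shared hits divided by all hits) can stand in for the similarity function. The names `jaccard` and `similarity_matrix` are hypothetical and are not taken from the project.

```python
from itertools import combinations


def jaccard(hits_a, hits_b):
    """Jaccard index of two collections of hit identifiers.

    Assumption: this is a stand-in for the project's similarity
    function, which the slides do not specify.
    """
    a, b = set(hits_a), set(hits_b)
    if not a and not b:
        return 0.0  # two contigs without hits share nothing
    return len(a & b) / len(a | b)


def similarity_matrix(hits_by_contig):
    """Return (names, matrix), where matrix[x][y] is the similarity
    between contigs x and y based on their BLAST hit sets."""
    names = sorted(hits_by_contig)
    # The diagonal is set to 1.0 by convention.
    matrix = {name: {name: 1.0} for name in names}
    for x, y in combinations(names, 2):
        score = jaccard(hits_by_contig[x], hits_by_contig[y])
        matrix[x][y] = matrix[y][x] = score  # the measure is symmetric
    return names, matrix
```

Two contigs whose BLAST outputs list mostly the same database sequences score close to 1, which is the situation a second clustering pass would want to detect.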

What's gbc?

Graph Based Clustering

- Clustering: the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters
- Graph: the relations among the data can be expressed as a graph. If two nodes are related, an edge connects them

Uses in bioinformatics

- Protein sequence clustering
- EST clustering
- Many other applications

Objectives of "gbc"

- Support different input formats
- Efficiently support clustering of very large sparse graphs
- Be flexible for the user

How to use gbc

Output: the cluster number and all the nodes that belong to the cluster

Clique clustering

- A clique is a completely connected subgraph
- Each maximal clique in the graph becomes a cluster
- Clusters may overlap
- Generally produces small but very tight clusters
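The slides do not show how gbc enumerates cliques. A minimal sketch of clique clustering, assuming the graph is given as a dictionary from each node to the set of its neighbours, is the basic Bron-Kerbosch recursion (without pivoting). The function name `maximal_cliques` is hypothetical.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph as a sorted list.

    adjacency maps each node to the set of its neighbours; it is assumed
    to be symmetric and free of self-loops. This is the basic
    Bron-Kerbosch recursion, not necessarily what gbc implements.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            # Nothing can extend the clique, so it is maximal.
            yield sorted(clique)
            return
        for node in list(candidates):
            neighbours = adjacency[node]
            yield from expand(clique | {node},
                              candidates & neighbours,
                              excluded & neighbours)
            # The node is done: later branches must not report it again.
            candidates = candidates - {node}
            excluded = excluded | {node}

    if adjacency:
        yield from expand(set(), set(adjacency), set())
```

For a triangle a, b, c with an extra edge c-d, this yields the clusters {a, b, c} and {c, d}. Node c appears in both, which illustrates why clique clusters may overlap.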

Single-link clustering

- A maximal connected subgraph becomes a cluster
- Produces larger but weaker clusters
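Single-link clustering on a graph amounts to finding connected components. The sketch below uses a union-find structure, which is one common way to do this. The slides do not state which method gbc uses, and the name `single_link_clusters` is hypothetical. The edges are assumed to be the node pairs whose similarity passed some threshold.

```python
def single_link_clusters(nodes, edges):
    """Group nodes into connected components (single-link clusters).

    nodes is an iterable of hashable identifiers; edges is an iterable
    of (a, b) pairs. Union-find is an assumed implementation choice.
    """
    nodes = list(nodes)
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = {}
    for node in nodes:
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())
```

A chain a-b-c forms one cluster even though a and c are not directly connected, which is why single-link clusters are larger but weaker than clique clusters.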

A little about the implementation

Two clustering algorithms

- Single-link
- Clique

Graph classes

- Efficiently support dense and sparse graphs
- Provide the same interface, so the clustering code does not need to be modified
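The idea of one interface over two storage layouts can be sketched as follows. The class and method names (`SparseGraph`, `DenseGraph`, `add_edge`, `neighbours`, `node_count`) are hypothetical and are not gbc's actual API. The point is only that `connected_components` runs unchanged on either class.

```python
class SparseGraph:
    """Adjacency sets: memory grows with the number of edges."""

    def __init__(self, node_count):
        self._neighbours = [set() for _ in range(node_count)]

    def add_edge(self, a, b):
        self._neighbours[a].add(b)
        self._neighbours[b].add(a)

    def neighbours(self, node):
        return sorted(self._neighbours[node])

    def node_count(self):
        return len(self._neighbours)


class DenseGraph:
    """Adjacency matrix: memory grows with the square of the node count."""

    def __init__(self, node_count):
        self._matrix = [[False] * node_count for _ in range(node_count)]

    def add_edge(self, a, b):
        self._matrix[a][b] = self._matrix[b][a] = True

    def neighbours(self, node):
        return [other for other, linked in enumerate(self._matrix[node]) if linked]

    def node_count(self):
        return len(self._matrix)


def connected_components(graph):
    """Clustering code that relies only on the shared interface."""
    seen, components = set(), []
    for start in range(graph.node_count()):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in graph.neighbours(node):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(component))
    return components
```

For a very large sparse graph, the adjacency-set version avoids allocating a full matrix. For a small dense graph, the matrix gives constant-time edge lookups.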

Analysis program

Labels from the program's interface:

- Reset BLAST output
- Change matrix threshold
- Reset semantics
- Run analysis
- New contig set
- Number of contigs
- Comparison algorithm
- Clustering algorithm
- Results output
- Analysis tools
- Process log output

Analysis tools: contig information

Display the BLAST output:
- sequence references
- sequence annotations
- percentage of matching base pairs

Display the list of contigs sorted according to their best matching percentage in the BLAST output

Analysis tool: EST selector

Display:
- frequency vs. length (in ESTs) of contigs
- list of ESTs in a contig

Allows selecting the best representative EST according to length and tissue type
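The slides do not give the exact selection rule. One possible reading, sketched here as an assumption, is to prefer ESTs from a chosen tissue and take the longest among them. The `Est` record, its fields, and `best_representative` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Est:
    name: str
    length: int   # sequence length in base pairs
    tissue: str   # tissue type of the source library


def best_representative(ests, preferred_tissue=None):
    """Pick one EST to represent a contig.

    Assumed rule: an EST from preferred_tissue beats any other EST,
    and ties are broken by length (longest first).
    """
    if not ests:
        raise ValueError("a contig needs at least one EST")
    return max(ests, key=lambda est: (est.tissue == preferred_tissue, est.length))
```

Without a preferred tissue, the rule reduces to choosing the longest EST in the contig.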

First results

On a set of 400 contigs representing 1000 ESTs

Contig number: 79
Contig size: 743
Best matching fraction: 0.43587786259541983
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 571 e-160
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 143 2e-31
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 143 2e-31
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 143 2e-31
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 143 2e-31
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 143 2e-31
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC009623.6|AC009623 Homo sapiens chromosome 8, clone RP11-219... 40 1.7

Contig number: 133
Contig size: 740
Best matching fraction: 0.9413109756097561
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 1235 0.0
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 184 5e-44
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 184 5e-44
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 184 5e-44
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 184 5e-44
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 184 5e-44
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC084633.1|CBRG45G04 Caenorhabditis briggsae cosmid G45G04, c... 44 0.11
dbj|AB018110.1|AB018110 Arabidopsis thaliana genomic DNA, chromo... 44 0.11
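Each hit line above follows the one-line summary layout of a BLAST report: sequence identifier, truncated annotation, bit score, E-value. A minimal parsing sketch, assuming that layout, is given below. The function name `parse_hit_line` is hypothetical and is not the project's actual parser.

```python
def parse_hit_line(line):
    """Split one BLAST summary line into (id, annotation, score, e_value).

    Assumes the last two whitespace-separated fields are the bit score
    and the E-value, as in the listings above.
    """
    description, score, e_value = line.rsplit(None, 2)
    sequence_id, _, annotation = description.partition(" ")
    if e_value.startswith("e"):
        # BLAST abbreviates 1e-160 as "e-160"; restore the mantissa.
        e_value = "1" + e_value
    return sequence_id, annotation.strip(), float(score), float(e_value)
```

Applied to the first hit of contig 79, this returns the identifier `gb|AF178529.1|AF178529`, the annotation text, a score of 571.0, and an E-value of 1e-160.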

