Information System for Comparative Analysis of Legume Genomes
Anita Dalwani
Advisors: Dr. Roger Innes,
Dr. Haixu Tang
Motivation
• Goal of legume genome project
- Investigate the process of genome restructuring following polyploidization in plants (soybean and its relatives in the Glycine genus)
- Try answering questions like :
- Genome evolution on both short (<100,000 years) and long (>50 million years) time scales
- Evolution of disease resistance (R) genes.
Motivation
• To answer these questions:
- 1 Mbp syntenic genomic regions from six taxa, as well as their duplicated regions in the polyploid members (12 such regions in total), will be sequenced and analyzed.
- These regions contain several important disease resistance (R) genes.
Motivation
Plant species and accession | No. of regions to be analysed | Whole genome size (megabases)
G. max cultivar Williams 82 | 2 | 1103
G. max PI 96983 | 2 | 1103
G. tomentella G1188 (2n=80) | 4 | 2083
G. tomentella race D3 (2n=40) | 2 | 1103
Teramnus labialis | 1 | < 700
Medicago truncatula | 1 | 466
Motivation
• Information System
- central repository for the data
- stores and retrieves updated information
- bioinformatics and visualization tools
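Participants
Participants | University | Roles
Roger Innes | Innes Lab, Indiana University, Bloomington | Principal Investigator
Tom Ashfield | Innes Lab, Indiana University, Bloomington | R gene evolution
Anita Dalwani | Innes Lab, Indiana University, Bloomington | Database development, Web application
Murali Mohan | Innes Lab, Indiana University, Bloomington | Database development
Nevin Young | Young Lab, University of Minnesota | Co-PI
Steve Cannon | Young Lab, University of Minnesota | Phylogenetics; R genes; comparative genomics
Roxanne Denny | Young Lab, University of Minnesota | Lab Manager
Jeff Doyle | Doyle Lab, Cornell University | Co-PI
Bernard Pfeil | Doyle Lab, Cornell University | Phylogenetics and polyploidy
Bruce Roe | Roe Lab, Oklahoma University | Co-PI
Majesta Siegfried | Roe Lab, Oklahoma University | BAC sequencing
Saghai Maroof | Maroof Lab, Virginia Tech | Co-PI
Milind Ratnaparkhe | Maroof Lab, Virginia Tech | R genes; comparative genomics
Jafar Mammado | Maroof Lab, Virginia Tech | R genes; comparative genomics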
Background
• Procedure
1. Create and make available Bacterial Artificial Chromosome (BAC) libraries of each species.
Indexing available BAC, BAC end sequences, library, probes, vector, gel images
Background
2. Assemble syntenic BAC contigs from each library
i. Strategically chosen soybean clones are used as probes
Probe 53 - ACCCGT
Probe 21 - AATTC
Probe 9 - GTACTT
Probe 26 - AAACT
Probe 1 - CCCC
Probe 3 - AATC
ACCCGT AATTC GTACTT AAACT CCCC AATC
ii. Individual probes are hybridized to high-density BAC filters representing all the target genomes
Background
iii. Integrity of contigs is confirmed by fingerprinting
iv. Sets of clones that hybridize to two or more probes are selected
v. BACs representing the tentative minimum tiling path will be end-sequenced
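Step iv can be illustrated with a minimal Python sketch. It assumes the screening results are available as (clone, probe) pairs; the clone names and the function name are hypothetical, and only the probe names come from the illustration above. It is not the project's actual selection code.

```python
from collections import defaultdict

# Hypothetical screening results: (BAC clone, probe) pairs.
# Clone names are invented; probe names follow the slide's illustration.
hits = [
    ("BAC_A", "Probe 53"),
    ("BAC_A", "Probe 21"),
    ("BAC_B", "Probe 21"),
    ("BAC_C", "Probe 9"),
    ("BAC_C", "Probe 26"),
    ("BAC_C", "Probe 1"),
    ("BAC_D", "Probe 3"),
]


def clones_linking_probes(hits, min_probes=2):
    """Return {clone: sorted probe names} for clones hit by at least
    min_probes distinct probes."""
    probes_by_clone = defaultdict(set)
    for clone, probe in hits:
        probes_by_clone[clone].add(probe)
    return {
        clone: sorted(probes)
        for clone, probes in probes_by_clone.items()
        if len(probes) >= min_probes
    }


selected = clones_linking_probes(hits)
for clone, probes in selected.items():
    print(clone, probes)
```

Clones that hybridize to a single probe (BAC_B and BAC_D here) are dropped, because they cannot link neighbouring probes into a contig.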
Background
3. DNA sequencing, assembly, annotation
4. Compare the content, order and sequence of genes
5. Make the results available to the public
Importance
• Information System
- Centrally available data
- User-friendly interface for retrieving the information
- Updated progress information
- Tools for interpreting the results.
Works as a Laboratory Management Information System
Design
• Steps for designing the Information System.
1. Design the Database
- Data: BAC, BES, Probes, Libraries, vector, library screen hits etc.
Design
- Visualize the relationships among this large amount of data.
For example, the Library table stores detailed information about each library used, rather than having each BAC store the library information.
Design
- Created tables based on these relationships
Main tables used in the database are:
- BAC
- BES
- GEL IMAGES
- GENOMIC SOUTHERNS
- GENOTYPE
- LIBRARY
- LIBRARY SCREENS
- LIBRARY SCREEN HITS
- PRIMER
- PROBE
- PROBE WITHIN BACS
- VECTOR
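The Library example can be sketched as two tables linked by a foreign key. This is a minimal illustration only: the column names and sample values are assumptions, not the project's schema, and SQLite (via Python's standard library) stands in for Oracle so that the sketch runs on its own.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical columns: library details are stored once in `library`,
# and each BAC row only points at its library.
conn.executescript("""
CREATE TABLE library (
    library_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    species    TEXT NOT NULL
);
CREATE TABLE bac (
    bac_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    library_id INTEGER NOT NULL REFERENCES library(library_id)
);
""")

# Invented sample data.
cur = conn.execute(
    "INSERT INTO library (name, species) VALUES (?, ?)",
    ("LIB_1", "Glycine max"),
)
library_id = cur.lastrowid
conn.executemany(
    "INSERT INTO bac (name, library_id) VALUES (?, ?)",
    [("BAC_A", library_id), ("BAC_B", library_id)],
)

# A join recovers the library details for every BAC.
rows = conn.execute("""
    SELECT bac.name, library.name, library.species
    FROM bac JOIN library USING (library_id)
    ORDER BY bac.name
""").fetchall()
print(rows)
```

With this layout, correcting a library's details means updating one row in `library` rather than every BAC that came from it.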
Design
[Entity-relationship diagram. Entities: BAC, PROBE, LIBRARY, BES, PRIMER. Relationship labels: "Has", "is a", "Derived from".]
Design
2. Populate the database with an initial set of data
- The initial set of data was stored in the form of MS Excel spreadsheets.
- A Perl script was used to parse this information.
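The parsing step might look roughly like the sketch below. It is written in Python rather than Perl, and it assumes a spreadsheet has first been exported to CSV; the column names, sample rows and cleaning rules are all hypothetical, since the original script is not shown.

```python
import csv
import io

# Hypothetical CSV export of one spreadsheet; column names are assumptions.
SAMPLE = """BAC name,Library,Probe
 BAC_A ,LIB_1,Probe 53
BAC_B,LIB_1,
,LIB_1,Probe 9
"""


def parse_rows(handle):
    """Yield cleaned records, skipping rows that lack a BAC name."""
    for row in csv.DictReader(handle):
        # Trim stray whitespace that often survives a spreadsheet export.
        record = {key: (value or "").strip() for key, value in row.items()}
        if not record["BAC name"]:
            continue
        # Represent an empty probe cell as None so it can be loaded as NULL.
        record["Probe"] = record["Probe"] or None
        yield record


records = list(parse_rows(io.StringIO(SAMPLE)))
for record in records:
    print(record)
```

Each cleaned record can then be passed to a parameterized INSERT statement for the matching table.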
Design
• Web Database Application
- understanding the needs for the project
- Web database interface: displays information about the project
- add and update interface
- tools for analyses
Design
• For determining the tiling path
- Designing a Visualization tool
- displays the locations of the clones with respect to probes
- Probes are strategically chosen from soybean genomes
Design
- Input: library name
- The subset of probes with at least one hit in the library is selected
- The library's BAC clones that have hits with these probes are listed
- Probes are arranged in order of their position
- BACs are mapped to these probes.
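The steps above can be sketched in Python as a plain-text map. The probe order follows the illustration in the Background slides; the library names, clone names, hit data and function names are hypothetical, and the real tool's rendering is not described here.

```python
# Probe order along the region, as in the Background illustration.
PROBE_ORDER = ["Probe 53", "Probe 21", "Probe 9", "Probe 26", "Probe 1", "Probe 3"]

# Hypothetical library screen hits: (library, BAC clone, probe).
SCREEN_HITS = [
    ("LIB_1", "BAC_A", "Probe 53"),
    ("LIB_1", "BAC_A", "Probe 21"),
    ("LIB_1", "BAC_B", "Probe 21"),
    ("LIB_1", "BAC_B", "Probe 9"),
    ("LIB_1", "BAC_C", "Probe 26"),
    ("LIB_2", "BAC_Z", "Probe 3"),
]


def probe_map(library, hits=SCREEN_HITS, probe_order=PROBE_ORDER):
    """Return (ordered probes with hits, [(bac, column indexes)]) for a library."""
    lib_hits = [(bac, probe) for lib, bac, probe in hits if lib == library]
    hit_probes = {probe for _, probe in lib_hits}
    # Keep only probes with at least one hit, in positional order.
    probes = [p for p in probe_order if p in hit_probes]
    column = {p: i for i, p in enumerate(probes)}
    spans = {}
    for bac, probe in lib_hits:
        if probe in column:
            spans.setdefault(bac, set()).add(column[probe])
    # List BACs from left to right along the probe axis.
    ordered = sorted(
        spans.items(), key=lambda item: (min(item[1]), max(item[1]), item[0])
    )
    return probes, [(bac, sorted(cols)) for bac, cols in ordered]


def render(library):
    """Draw one row per BAC, marking the probes it hybridizes to with X."""
    probes, rows = probe_map(library)
    labels = [p.replace("Probe ", "P") for p in probes]
    cell = max((len(label) for label in labels), default=1)
    name_w = max((len(bac) for bac, _ in rows), default=0)
    lines = [" " * name_w + " " + " ".join(l.ljust(cell) for l in labels)]
    for bac, cols in rows:
        marks = ["X" if i in cols else "." for i in range(len(probes))]
        lines.append(bac.ljust(name_w) + " " + " ".join(m.ljust(cell) for m in marks))
    return "\n".join(line.rstrip() for line in lines)


print(render("LIB_1"))
```

Overlapping rows, such as two BACs sharing a probe, are what suggest a candidate tiling path across the region.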
Design
• System Specifications
- Database: Oracle 9i
- Languages: PHP, Perl, HTML, JavaScript
- Web Server: Apache 1.3.29
- Platform: Unix (SunOS 5.9)
Acknowledgements
• Dr. Roger Innes
• Dr. Haixu Tang
• Dr. Sun Kim
• Legume genome project team