![Page 1: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/1.jpg)
Rice Sequence and Map Analysis
Leonid Teytelman
![Page 2: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/2.jpg)
Rice Genome Annotation
•Sequence Alignments
•Automation
Comparative Maps
•Genetic Marker Correspondences
•FPC Map
•FPC I-Map
EnsEMBL Pipeline
•Automated Annotation
•Compute Farms
![Page 3: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/3.jpg)
Rice Genome Annotation
![Page 4: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/4.jpg)
Aligned Data Sets:
Rice Coding Sequences
•Rice Complete CDSs
•Rice TIGR GIs
•Rice BGI EST Clusters
•Rice dbEST ESTs
•Rice BGI ESTs
Non-Rice Coding Sequences
•Maize Unigene Clusters
•Maize TIGR GIs
•Maize dbEST ESTs
•Barley dbEST ESTs
•Wheat dbEST ESTs
•Sorghum dbEST ESTs
Rice CUGI BAC ends
Rice JRGP/Cornell RFLP Markers
Rice Cornell SSRs
![Page 5: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/5.jpg)
Alignment Tools:
BLAT: search & alignment
pslReps: filtering of low-quality matches
e-PCR: matches based on near-identity to the PCR primers, and correct order
Figure labels: Target, Queries
![Page 7: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/7.jpg)
Alignment Methods:
Rice Coding Sequences:
•BLAT search & alignment
•pslReps filtering of repetitive matches
•Accept based on percent of EST length matched
Non-Rice Coding Sequences:
•BLAT search & alignment
•pslReps filtering of repetitive matches
•Accept based on hit length and hit frequency
Rice BAC ends:
•BLAT search & alignment
•Accept based on gap length, percent of BAC end length matched, percent identity, and hit frequency.
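The "percent of EST length matched" criterion can be sketched against BLAT's tab-separated PSL output. This is a minimal illustration, not the script used for this work: the 0.90 cutoff, the function names, and the sample record are all assumptions, since the slides do not state the thresholds.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    matches: int      # PSL column 1: bases that match, excluding repeats
    rep_matches: int  # PSL column 3: matching bases that fall in repeats
    q_name: str       # PSL column 10: query (EST) name
    q_size: int       # PSL column 11: query length
    t_name: str       # PSL column 14: target (BAC/PAC) name


def parse_psl_line(line: str) -> PslHit:
    """Parse one data line of a PSL file (21 tab-separated columns)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(
        matches=int(f[0]),
        rep_matches=int(f[2]),
        q_name=f[9],
        q_size=int(f[10]),
        t_name=f[13],
    )


def fraction_matched(hit: PslHit) -> float:
    """Fraction of the query length covered by matching bases."""
    return (hit.matches + hit.rep_matches) / hit.q_size


# Hypothetical cutoff; the slides do not give the value actually used.
MIN_FRACTION = 0.90


def accept(hit: PslHit, min_fraction: float = MIN_FRACTION) -> bool:
    return fraction_matched(hit) >= min_fraction


# Made-up record: a 500 bp EST with 480 matching bases on one clone.
SAMPLE = "\t".join([
    "480", "5", "0", "0", "0", "0", "1", "120", "+",
    "EST_example", "500", "10", "495",
    "BAC_example", "150000", "1000", "1605",
    "2", "200,285,", "10,210,", "1000,1320,",
])
sample_hit = parse_psl_line(SAMPLE)
```

The non-rice and BAC-end criteria (hit length, hit frequency, gap length, percent identity) would be further predicates of the same shape over other PSL columns.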
![Page 8: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/8.jpg)
Alignment Methods:
Rice Markers:
•BLAT search & alignment
•Accept based on percent of marker length matched and the gap length in case of genomic markers.
•Utilize genetic map information; accept those whose genetic & physical chromosome assignment is concordant.
Rice SSRs:
•e-PCR with default parameters, allowing 0 mismatches in the primers
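To make the SSR criterion concrete, here is a toy in-silico PCR check in Python. It is not e-PCR's algorithm or interface, only a sketch of the same idea under stated assumptions: both primers must match exactly (0 mismatches), they must occur in the correct order on the forward strand, and the product size must fall within a caller-supplied range. The sequences are invented.

```python
def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_products(template, fwd_primer, rev_primer, min_size, max_size):
    """Return (start, end) half-open spans of predicted PCR products.

    Forward strand only: the forward primer must match exactly, and the
    reverse complement of the reverse primer must match exactly downstream.
    """
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)
    products = []
    start = template.find(fwd)
    while start != -1:
        pos = template.find(rev_rc, start + len(fwd))
        while pos != -1:
            end = pos + len(rev_rc)
            if min_size <= end - start <= max_size:
                products.append((start, end))
            pos = template.find(rev_rc, pos + 1)
        start = template.find(fwd, start + 1)
    return products


# Invented example: primers flank a run of T's inside a short template.
TEMPLATE = "GGG" + "ACGTAC" + "T" * 10 + "GATTACA" + "CCC"
hits = find_products(TEMPLATE, "ACGTAC", "TGTAATC", min_size=20, max_size=30)
```

A fuller version would also scan the reverse strand and could tolerate mismatches; the slide's setting corresponds to the strict, zero-mismatch case shown here.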
![Page 9: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/9.jpg)
February 2002 BAC/PAC Dataset
Total BACs/PACs: 1,847
Total bp: 250,879,896 (~250 Mb)
Phase 1: 78
Phase 2: 1,238
Phase 3: 531
Annotated Phase 3: 330
Annotated Genes: 8,034
![Page 10: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/10.jpg)
Alignment Totals

| Dataset | Total compared | Total mapped | % mapped |
|---|---|---|---|
| Rice Complete CDSs | 1,358 | 505 | 37% |
| Rice TIGR GIs | 12,354 | 6,290 | 51% |
| Rice BGI EST Clusters | 24,179 | 12,135 | 50% |
| Rice dbEST ESTs | 104,549 | 49,773 | 48% |
| Rice BGI ESTs | 86,623 | 40,049 | 46% |
| Maize Unigene Clusters | 10,678 | 3,972 | 37% |
| Maize TIGR GIs | 27,642 | 6,941 | 25% |
| Maize dbEST ESTs | 147,657 | 38,718 | 26% |
| Barley dbEST ESTs | 148,651 | 50,579 | 34% |
| Wheat dbEST ESTs | 166,513 | 49,146 | 29% |
| Sorghum dbEST ESTs | 84,711 | 28,044 | 33% |
| Rice CUGI BAC ends | 88,053 | 18,260 | 21% |
| Rice JRGP/Cornell RFLP Markers | 2,682 | 1,320 | 49% |
| Rice Cornell SSRs | 524 | 228 | 44% |
![Page 11: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/11.jpg)
![Page 12: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/12.jpg)
Automating Alignments:
For each group of data sets, there is a script to automatically:
•Run pslReps
•Load results into the database
•Discard low-quality matches
•Update documentation
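A per-group driver of this kind might look roughly like the Python sketch below. It only builds the `pslReps` command line and runs placeholder steps in the listed order; the option values, file names, and step functions are hypothetical, and the actual scripts are not shown in the slides.

```python
def pslreps_command(in_psl, out_psl, out_psr, min_cover=0.5, near_top=0.01):
    """Build a pslReps invocation (in.psl, out.psl, out.psr).

    The -minCover and -nearTop values here are placeholders, not the
    settings used for the rice alignments.
    """
    return [
        "pslReps",
        f"-minCover={min_cover}",
        f"-nearTop={near_top}",
        in_psl,
        out_psl,
        out_psr,
    ]


def process_group(group, steps):
    """Run each (name, function) step in order and record what was done."""
    log = []
    for name, step in steps:
        step(group)
        log.append(f"{group}: {name}")
    return log


def noop(group):
    """Stand-in for a real step (database load, filtering, doc update)."""


# Step order follows the slide; every function is a stub.
STEPS = [
    ("run pslReps", noop),
    ("load results into the database", noop),
    ("discard low-quality matches", noop),
    ("update documentation", noop),
]
example_log = process_group("rice_coding", STEPS)
```

In a real driver the first step would pass the list from `pslreps_command` to `subprocess.run`, and the remaining stubs would be replaced by database and reporting code.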
![Page 13: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/13.jpg)
![Page 14: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/14.jpg)
![Page 15: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/15.jpg)
Comparative Maps
![Page 16: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/16.jpg)
Map Correspondences
Same marker in multiple mapping studies:
•Name-identity
•Curated evidence
Sequence-based correspondences for JRGP and Cornell markers:
•BLAT search & alignment
•Utilize genetic mapping information, accepting matches on the same chromosome and less than 30 cM apart.
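The acceptance rule for sequence-based correspondences can be written as a small predicate. The Python sketch below assumes each marker carries a chromosome number and a cM position on its own map; the marker names and positions are invented, and only the 30 cM limit comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    marker: str
    chromosome: int
    cm: float  # genetic position in centimorgans


MAX_CM_APART = 30.0  # limit stated on the slide


def is_accepted(a: MapPosition, b: MapPosition, max_cm: float = MAX_CM_APART) -> bool:
    """Accept a sequence match only if both markers lie on the same
    chromosome and their genetic positions differ by less than max_cm."""
    return a.chromosome == b.chromosome and abs(a.cm - b.cm) < max_cm


def filter_correspondences(pairs):
    """Keep only the candidate (JRGP, Cornell) pairs that pass the rule."""
    return [(a, b) for a, b in pairs if is_accepted(a, b)]


# Invented candidate pairs produced by a sequence search.
candidates = [
    (MapPosition("jrgp_1", 1, 10.0), MapPosition("cornell_1", 1, 35.5)),
    (MapPosition("jrgp_2", 1, 10.0), MapPosition("cornell_2", 2, 12.0)),
    (MapPosition("jrgp_3", 3, 5.0), MapPosition("cornell_3", 3, 80.0)),
]
accepted = filter_correspondences(candidates)
```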
![Page 17: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/17.jpg)
Figure legend: curator, same name, sequence-based
![Page 18: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/18.jpg)
Figure legend: curator, same name
![Page 19: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/19.jpg)
FPC data from CUGI, synchronized with the latest release.
![Page 20: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/20.jpg)
Discordant
![Page 21: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/21.jpg)
Cornell/JRGP markers mapped to sequenced clones were assigned positions on the FPC contigs.
![Page 22: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/22.jpg)
Total: 2,272 4,417
![Page 23: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/23.jpg)
EnsEMBL Pipeline in a Nutshell
![Page 24: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/24.jpg)
EnsEMBL Pipeline Overview
•System for automated genome annotation
•Executes and keeps track of computational jobs
•Analysis job execution is serial, allowing stage dependencies
•Jobs are user-defined
•Can take advantage of a compute farm
RepeatMasker → Genscan → Blast → GenomeBuilder → Hmmer
RepeatMasker → BLAT → GeneWise → Hmmer
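Serial execution with stage dependencies can be illustrated with a small rule-driven runner. This Python sketch is not the EnsEMBL pipeline's code (which is written in Perl); it assumes each analysis simply depends on the one listed before it in the second chain above, and it uses stub runners in place of the real programs.

```python
from typing import Callable, Dict, List

Rules = Dict[str, List[str]]  # analysis name -> analyses that must finish first


def is_runnable(analysis: str, rules: Rules, done: List[str]) -> bool:
    return analysis not in done and all(dep in done for dep in rules[analysis])


def run_serial(contig: str, rules: Rules,
               runners: Dict[str, Callable[[str], None]]) -> List[str]:
    """Run analyses one at a time, each only after its prerequisites."""
    done: List[str] = []
    progressed = True
    while progressed:
        progressed = False
        for analysis in rules:
            if is_runnable(analysis, rules, done):
                runners[analysis](contig)
                done.append(analysis)
                progressed = True
    return done


# Assumed linear dependencies, taken from the order of the listed chain.
RULES: Rules = {
    "RepeatMasker": [],
    "BLAT": ["RepeatMasker"],
    "GeneWise": ["BLAT"],
    "Hmmer": ["GeneWise"],
}


def make_stub(name: str, log: List[str]) -> Callable[[str], None]:
    def run(contig: str) -> None:
        log.append(f"{name} on {contig}")  # a real runner would call the program
    return run


log: List[str] = []
RUNNERS = {name: make_stub(name, log) for name in RULES}
ORDER = run_serial("contig_1", RULES, RUNNERS)
```

Because the order is derived from the rules rather than from the listing order, a user-defined job can be added by supplying one more rule and one more runner.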
![Page 25: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/25.jpg)
Organization
•Utilizes and expands on the EnsEMBL-core modules and database schema
•Database stores:
•analysis program names and parameters
•analysis results
•rules for job dependencies
•and progress status for each job
•Perl modules:
•access the database
•execute specified analysis programs
•parse the analysis results and load them into the database
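As an illustration of what such bookkeeping might involve, the following Python sketch keeps analyses, dependency rules, and job status in an in-memory SQLite database. The table and column names are invented for this example and are not the EnsEMBL schema.

```python
import sqlite3

# Hypothetical three-table layout: what to run, what depends on what,
# and how far each contig has progressed.
SCHEMA = """
CREATE TABLE analysis (name TEXT PRIMARY KEY, program TEXT, parameters TEXT);
CREATE TABLE rule (analysis TEXT, depends_on TEXT);
CREATE TABLE job (contig TEXT, analysis TEXT, status TEXT,
                  PRIMARY KEY (contig, analysis));
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def ready_analyses(conn: sqlite3.Connection, contig: str) -> list:
    """Analyses not yet done for this contig whose prerequisites are all done."""
    rows = conn.execute(
        """
        SELECT a.name FROM analysis a
        WHERE NOT EXISTS (
                SELECT 1 FROM job j
                WHERE j.contig = ? AND j.analysis = a.name AND j.status = 'done')
          AND NOT EXISTS (
                SELECT 1 FROM rule r
                WHERE r.analysis = a.name
                  AND NOT EXISTS (
                        SELECT 1 FROM job j2
                        WHERE j2.contig = ? AND j2.analysis = r.depends_on
                          AND j2.status = 'done'))
        ORDER BY a.name
        """,
        (contig, contig),
    ).fetchall()
    return [row[0] for row in rows]


def mark_done(conn: sqlite3.Connection, contig: str, analysis: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO job (contig, analysis, status) VALUES (?, ?, 'done')",
        (contig, analysis),
    )


conn = open_db()
conn.executemany(
    "INSERT INTO analysis (name, program, parameters) VALUES (?, ?, ?)",
    [("RepeatMasker", "RepeatMasker", ""), ("Genscan", "genscan", "")],
)
conn.execute("INSERT INTO rule (analysis, depends_on) VALUES ('Genscan', 'RepeatMasker')")
first = ready_analyses(conn, "contig_1")
```

Keeping the rules and status in the database, rather than in the driver script, is what lets the system resume after a failure and report progress per job.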
![Page 26: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences](https://reader035.vdocuments.us/reader035/viewer/2022062518/56649e575503460f94b4f809/html5/thumbnails/26.jpg)
Cluster Utilization
•How to split up tasks?
•Load management and scheduling (LSF, PBS, etc.)
•Contig-by-contig approach
•How to execute jobs on slave nodes?
•Management of management:
•Automatic job submission
•Error/completion checking
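The contig-by-contig approach with automatic submission and completion checking could be sketched as follows. This Python example only constructs LSF `bsub` command lines and inspects exit codes supplied by the caller; the worker command `run_analysis`, its options, and the log directory are hypothetical.

```python
def bsub_command(contig: str, analysis: str, log_dir: str = "logs") -> list:
    """Build one LSF submission per (analysis, contig) pair.

    -J names the job; -o and -e set the stdout and stderr files.
    The worker invoked on the node is a hypothetical placeholder.
    """
    job_name = f"{analysis}.{contig}"
    worker = ["run_analysis", "--analysis", analysis, "--contig", contig]
    return [
        "bsub",
        "-J", job_name,
        "-o", f"{log_dir}/{job_name}.out",
        "-e", f"{log_dir}/{job_name}.err",
    ] + worker


def submissions(contigs, analysis):
    """One job per contig, so the scheduler can spread them across nodes."""
    return [bsub_command(contig, analysis) for contig in contigs]


def failed_contigs(exit_codes: dict) -> list:
    """Contigs whose job returned a non-zero exit code and need resubmission."""
    return sorted(contig for contig, code in exit_codes.items() if code != 0)


commands = submissions(["contig_1", "contig_2"], "RepeatMasker")
to_retry = failed_contigs({"contig_1": 0, "contig_2": 1})
```

Splitting by contig keeps each job small and independent, which makes resubmitting only the failed units straightforward.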