Post on 14-Jan-2016
The progress of Glossina genomics at RIKEN GSC
Todd Taylor
taylor@gsc.riken.jp
RIKEN Genomic Sciences Center, Yokohama, Japan
(on behalf of Masahira Hattori)
December 15, 2006, IGGI, Sanger, UK
Background
• Sequencing and analysis of human chromosomes
  • 11, 18 and 21
  • Contributed about 4-5% of the human genome sequence
• Sequencing and analysis of chimpanzee genomic regions, including
  • Whole-genome BAC-end sequence analysis
  • Chimpanzee chromosome 22
  • Found differences (most minor) in nearly all of the coding genes between human and chimp
  • Chimpanzee Y chromosome
• Development of novel methods for gene and promoter prediction
  • Identifying genes missed by other high-throughput methods
  • Identification of unique regulatory mechanisms
Phase III sequence-related activities
• BAC ends
• Finished BAC clones
• Full length cDNAs
• Whole-genome shotgun
BAC end sequencing
• The first BAC library has been constructed (Yale) and 100,000 BAC end sequences are being produced (RIKEN)
  • Not yet
• We will be able to sequence the ends of up to 50,000 BACs (100,000 reads)
  • Or possibly more if fosmid ends instead?
• Can start from April 2007
  • Will take about one month
Finished BAC clone sequencing
• Five BACs have been fully sequenced (RIKEN) and no serious 'issues' have arisen.
• VMRC29 library (CHORI)
  • 97H16, 39G22, 36N9, 31O6, 3E11
• 759,387 bp
• GC level: 38.89%
• Repeat content: 6.10%
  • Using the Drosophila fruit fly genus repeat library
file name: gmm_clones
sequences:            5
total length:    759387 bp
GC level:         38.89 %
bases masked:     46333 bp ( 6.10 %)
=====================================================
                      number of    length  percentage
                      elements   occupied  of sequence
-----------------------------------------------------
Retroelements               56   12376 bp    1.63 %
   SINEs:                    0       0 bp    0.00 %
   Penelope                 31    2872 bp    0.38 %
   LINEs:                   49    7695 bp    1.01 %
     CRE/SLACS               0       0 bp    0.00 %
     L2/CR1/Rex              7    3181 bp    0.42 %
     R1/LOA/Jockey           5    1138 bp    0.15 %
     R2/R4/NeSL              1      51 bp    0.01 %
   LTR elements:             7    4681 bp    0.62 %
     BEL/Pao                 2     230 bp    0.03 %
     Gypsy/DIRS1             5    4451 bp    0.59 %
DNA transposons             10    4348 bp    0.57 %
   Tc1-IS630-Pogo            8    2143 bp    0.28 %
   Other (Mirage,            1     126 bp    0.02 %
    P-element, Transib)
Total interspersed repeats:      16724 bp    2.20 %
Small RNA:                   3    1357 bp    0.18 %
Simple repeats:            237   12658 bp    1.67 %
Low complexity:            366   15594 bp    2.05 %
The query species was assumed to be "Drosophila fruit fly genus".
Homo sapiens (4.08 %)
Anopheles genus (4.52 %)
RepeatMasker
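The figures in the RepeatMasker summary above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the numbers are copied from the summary, while the dictionary keys and the helper name `percent_of_sequence` are illustrative choices.

```python
# Values copied from the RepeatMasker summary for the five finished BACs.
TOTAL_LENGTH_BP = 759_387
BASES_MASKED_BP = 46_333

# Top-level repeat classes and the bases they occupy (bp).
# The key names are illustrative, not RepeatMasker identifiers.
masked_by_class = {
    "retroelements": 12_376,
    "dna_transposons": 4_348,
    "small_rna": 1_357,
    "simple_repeats": 12_658,
    "low_complexity": 15_594,
}


def percent_of_sequence(bp, total_bp=TOTAL_LENGTH_BP):
    """Return bp as a percentage of the total sequence length."""
    return 100.0 * bp / total_bp


# Interspersed repeats are the retroelements plus the DNA transposons.
interspersed_bp = masked_by_class["retroelements"] + masked_by_class["dna_transposons"]
# All masked bases are the sum of the top-level classes.
total_masked_bp = sum(masked_by_class.values())

if __name__ == "__main__":
    print(f"interspersed repeats: {interspersed_bp} bp "
          f"({percent_of_sequence(interspersed_bp):.2f} %)")
    print(f"bases masked: {total_masked_bp} bp "
          f"({percent_of_sequence(total_masked_bp):.2f} %)")
```

The class totals add up to the reported "bases masked" value, so the 6.10% repeat content quoted on the slide is consistent with the per-class breakdown.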
Full-length cDNA sequencing
• Full-length cDNAs for G. m. morsitans (RIKEN) will be constructed and Sanger will perform a few hundred full-length sequences on these. RIKEN will do some 5' end sequencing.
  • Full-length cDNA libraries were prepared by Junichi Watanabe (Univ. Tokyo)
  • Sequencing of 9,462 cDNA clones (5' one pass) was recently completed
Whole-genome shotgun sequencing
• RIKEN has applied to Japanese sources for funding for a further 3 million shotgun sequences (~3X coverage).
  • We failed to get the funding
• At present, we have no money for WGS or additional BAC finishing
  • Will try for more
  • Japanese-African collaborative projects looking somewhat hopeful
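The "3 million shotgun sequences (~3X coverage)" figure follows the usual relation coverage = reads × read length / genome size. The sketch below shows that arithmetic. The slides give neither a read length nor a genome size, so the read lengths used here are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def coverage(num_reads, mean_read_length_bp, genome_size_bp):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_length_bp / genome_size_bp


def implied_genome_size_bp(num_reads, mean_read_length_bp, target_coverage):
    """Genome size at which the given reads would reach the target coverage."""
    return num_reads * mean_read_length_bp / target_coverage


NUM_READS = 3_000_000      # from the slide
TARGET_COVERAGE = 3.0      # from the slide (~3X)
# ASSUMPTION: plausible mean read lengths; not stated in the slides.
ASSUMED_READ_LENGTHS_BP = (500, 600, 700)

if __name__ == "__main__":
    for read_len in ASSUMED_READ_LENGTHS_BP:
        size = implied_genome_size_bp(NUM_READS, read_len, TARGET_COVERAGE)
        print(f"{read_len} bp reads -> implied genome size ~{size / 1e6:.0f} Mb")
```

Under these assumed read lengths, the quoted numbers would correspond to a genome of several hundred megabases. The exact value depends entirely on the read length chosen.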
Library    Sample Information                        Sequences
TC         Fat Body/Milk Gland                           3,059
GMSG       Salivary Gland                                7,493
GMRE       Reproductive                                  1,502
GMM        Midgut                                        7,015
cDNA       Full Length cDNA Sequences                      190
TUM/TUF    Tsetse Fly Whole Genome cDNA Libraries        9,462
Total Number of Sequences                               28,721
Dataset containing ESTs and partial cDNA sequences
Strategy and results obtained from preliminary analysis
28,721 sequences were assembled into contigs and identified singletons
Total contigs made = 3,857; total singletons = 10,213
Translated contigs and singletons into six reading frames
Homology searched in SwissProt and NR protein databases
Annotated 2,569 ORFs out of 3,857 contigs
Annotated 2,783 ORFs out of 10,213 singletons
CAP3 → 3,857 contigs → Transeq → 30,942 ORFs
CAP3 → 10,213 singletons → Transeq → 57,860 ORFs
Selected continuous ORFs containing at least 50 amino acids
BLAT: 33% sequence identity
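The translation and ORF-selection step can be illustrated with a short, self-contained sketch. This is not Transeq itself (the EMBOSS tool named on the slide) but a pure-Python stand-in that assumes the standard genetic code and treats a "continuous ORF" as any stop-free stretch of at least 50 amino acids. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard table is appropriate for these nuclear cDNA sequences).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from position 0; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Yield (frame_label, protein) for the three forward and three reverse frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            yield f"{strand}{offset + 1}", translate(s[offset:])


def orfs_at_least(seq, min_aa=50):
    """Return (frame, peptide) for every stop-free stretch of >= min_aa residues."""
    hits = []
    for frame, protein in six_frame_translations(seq):
        for segment in protein.split("*"):
            if len(segment) >= min_aa:
                hits.append((frame, segment))
    return hits
```

For example, `orfs_at_least("GCT" * 60 + "TAA" + "GCT" * 10)` includes a 60-residue stretch in frame `+1`, while the 10-residue stretch after the stop codon is discarded. A single sequence can yield ORFs in several frames, which is why the ORF counts on the slide exceed the number of contigs and singletons.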
A large percentage of ORFs from tsetse fly contigs resemble those of the fruit fly
Drosophila (84%)
Anopheles (2%)
Aedes (3%)
Others (6%)
Glossina (5%)
A large percentage of ORFs from tsetse fly singletons resemble those of the fruit fly
Drosophila (81%)
Anopheles (2%)
Aedes (5%)
Others (9%)
Glossina (3%)
METABROWSER: a resource to analyse the metagenome
Metagenome Analysis Pipeline
• User input: genomic contigs & sequences
• Gene prediction (output: predicted genes)
  • GLIMMER, GENEMARK, GETORF, CRITICA, MetaGene
• Functional annotation (output: annotated genes)
  • BLAST, INTERPROSCAN, PLHOST, PROSITESCAN, COGs, Manatee (GO), FingerPRINTscan, JAFA ?, HT-GO-FAT, PubSearch, BLIMPS (BLOCKS), Pfam
• Advanced analysis
  • Metabolic Pathways
  • Comparative Genomics
  • Phylogenetic Classification
  • Protein Interaction
  • Enzyme Classification
  • 16s ribosomal RNA analysis
  • Taxonomic Classification
  • Pathogenicity index
  • Origin of Replication
  • Secondary Structure Prediction
  • Fold Prediction
  • Other Analysis
• Browse: query the Metagenome Data Browser
  • Metagenome Data Browser: data from our internal projects
METABROWSER: a resource to analyse the metagenome
Metagenome Data Browser
• Genes
• Proteins
• Novel Pathways
• Comparative Analysis
• Download
• Sequence
• Novel Genomes
• Novel Proteins
• Other Related Information
Current & Future Plans
• Sequencing
  • More if funding allows
• Analysis
  • We can contribute to the informatics of the Glossina genome, including cDNA analysis and annotation
  • But we don't want to duplicate anyone's efforts
  • Also BES mapping and comparative analysis with Drosophila, mosquito, etc.
• ???
Acknowledgements
• Informatics (RIKEN)
  • Tulika Prakash Srivastava
  • Vineet K. Sharma
  • Todd D. Taylor
• Sequencing & Data Access
  • Atsushi Toyoda (RIKEN)
  • Junichi Watanabe (Univ. Tokyo)
  • Hiroyuki Wakaguri (Univ. Tokyo)
  • Yamashita (Kitasato Univ.)
  • Serap Aksoy (Yale)
  • Geoff Attardo (Yale)
• Other
  • Masahira Hattori (Univ. Tokyo/RIKEN)
  • Yoshiyuki Sakaki (RIKEN)