Center for Integrated Fungal Research
Fungal Genomics Laboratory
Industrial applications
• glutamic acid
• citric acid
• amylases
• proteases
• lipases
Bioterrorism
Biologically interesting and genetically tractable
Insight into eukaryotic gene regulation and development
Framework of rice blast genome
[Figure: markers 1f-6f and 1r-6r; RFLP 1-3; BAC 1-6]
A. BAC-end sequence provides “Sequence Tag Connectors”
B. BAC fingerprints used to create contigs
C. BAC contigs anchored to genetic map
STC: ~500 bp sequence every 3-4 kb across genome
Deep (25X) large insert (130 kb) single enzyme (HindIII) BAC library from rice infecting strain 70-15 – 9,216 clones
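The library figures above are linked by simple arithmetic: coverage is the number of clones times the insert size, divided by the genome size. The sketch below is only an illustration of that relationship using the numbers on the slide. The function names are hypothetical, and the genome size it prints is the value implied by the slide's figures, not a measured one.

```python
def library_coverage(clones: int, insert_kb: float, genome_mb: float) -> float:
    """Fold coverage of a clone library: total cloned DNA / genome size."""
    total_insert_mb = clones * insert_kb / 1000.0
    return total_insert_mb / genome_mb


def implied_genome_size_mb(clones: int, insert_kb: float, coverage: float) -> float:
    """Genome size (Mb) implied by a stated fold coverage."""
    total_insert_mb = clones * insert_kb / 1000.0
    return total_insert_mb / coverage


# Slide figures: 9,216 clones, 130 kb inserts, 25X coverage.
size_mb = implied_genome_size_mb(clones=9216, insert_kb=130, coverage=25)
print(f"Implied genome size: {size_mb:.1f} Mb")
```

With the slide's figures the implied genome size is roughly 48 Mb, which sits inside the 20 to 50 megabase range quoted later in the deck.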
USDA-IFAFS project, Oct 2000
“Gene discovery in the rice blast fungus: ESTs and sequence of chromosome 7”
1. Generate ~5X draft sequence of chromosome 7 (4.2 Mb).
2. Generate 35,000 ESTs and create a set of ~5,000 ESTs representing unique genes.
3. Provide basic sequence analysis and integration of data into physical map of chromosome 7.
NSF-IFAFS project, Oct 2001
Whole genome sequence; host-pathogen function analysis
• Generate ~7X draft sequence of M. grisea
• Generate 50,000 knockouts
• Analyze host-pathogen interaction
• Provide basic sequence analysis
Consequences of Scaling
• Moore’s law has allowed labs to keep ahead of data
• Sequence data is now outpacing processing capability
• Bioinformatics processing will be a real problem
[Chart: lab processing, base pairs, sequences, and Moore's law, 1994-2002]
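The scaling argument can be made concrete with two exponential curves. The sketch below assumes compute capacity doubles every 18 months (a common reading of Moore's law) and sequence data doubles every 12 months. Both doubling times and the starting headroom are illustrative assumptions, not values taken from the chart.

```python
import math


def crossover_years(headroom: float, data_doubling_yr: float,
                    compute_doubling_yr: float) -> float:
    """Years until data volume overtakes processing capacity.

    headroom: how many times larger capacity is than data at year 0.
    Solves headroom * 2**(t / compute_doubling_yr) == 2**(t / data_doubling_yr).
    """
    if data_doubling_yr >= compute_doubling_yr:
        raise ValueError("data never overtakes capacity with these doubling times")
    rate_gap = 1.0 / data_doubling_yr - 1.0 / compute_doubling_yr
    return math.log2(headroom) / rate_gap


# Hypothetical inputs: 4x spare capacity today, data doubling yearly,
# compute doubling every 1.5 years.
years = crossover_years(headroom=4.0, data_doubling_yr=1.0, compute_doubling_yr=1.5)
print(f"Data overtakes capacity after about {years:.1f} years")
```

Under these assumed rates, a fourfold capacity margin lasts about six years. Different doubling times change the date but not the conclusion, as long as data grows faster than compute.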
Computational platforms
• Modern biology requires robust computational platforms
• Computer technology implementation is expensive (from a biologist's viewpoint)
• Computer technology development is even more expensive (you want how much?!)
• This detracts from research for small labs
On the brink
• Significant investment in off-the-shelf components and cross-training people
• Moderate sized genomes
• 20 to 50 megabases
• Takes 2 weeks for initial analyses
• Homology searches take days
Local BLAST (www.fungalgenomics.ncsu.edu)
Link to genetic information (blue)
Link to marker data and other data at http://ascus.cit.cornell.edu/blastdb/
Select a chromosome
Federated database
High Throughput Genomic Processing and Display
Rice blast / N. crassa synteny
97 out of 179 unique ESTs from chromosome 7 gave a significant (E < 10^-5) tBLASTX match to the N. crassa genome shotgun assembly
[Figure: M. grisea BAC 6J18 (111 kb) compared with N. crassa contigs 1.515, 1.13, 1.513, and 1.841; size labels: 185 kb, 20 kb, 17 kb, 15 kb, 10 kb, 3 kb, 2 kb, 1 kb, 1 kb, 0.5 kb]
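The match count on the synteny slide (97 of 179 ESTs, about 54%) is the kind of number that comes from filtering BLAST hits by E-value and counting distinct queries. The sketch below shows one way to do that, assuming the standard 12-column tab-separated BLAST output, where the query ID is column 1 and the E-value is column 11. The function name and the example rows are made up for illustration and are not data from the project.

```python
from typing import Iterable


def count_significant_queries(rows: Iterable[str], evalue_cutoff: float = 1e-5) -> int:
    """Count distinct query IDs with at least one hit below the E-value cutoff."""
    significant = set()
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = row.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < evalue_cutoff:
            significant.add(query_id)
    return len(significant)


# Hypothetical hits: query, subject, %id, length, mismatches, gaps,
# qstart, qend, sstart, send, evalue, bitscore.
example_rows = [
    "est001\tcontig_a\t62.5\t80\t30\t0\t1\t240\t1000\t1239\t2e-20\t95.1",
    "est001\tcontig_b\t40.0\t50\t30\t0\t1\t150\t500\t649\t0.5\t30.2",
    "est002\tcontig_a\t38.0\t45\t28\t0\t10\t144\t70\t204\t3e-4\t41.0",
    "est003\tcontig_c\t71.0\t90\t26\t0\t1\t270\t20\t289\t4e-12\t70.3",
]
hits = count_significant_queries(example_rows)
print(f"{hits} of 3 example ESTs have a significant match")
```

Counting distinct queries rather than rows matters here, because a single EST often hits several contigs or several regions of the same contig.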
CIFR BioInformatics
[Diagram; labels recovered: Biological results; GRL; Rube; Sequence pipeline; Sequence data; Relational data model; Submissions; Extraction; Public HTTP exposure; High-throughput WebBlaster; Genome; HTTP BLAST; Report; BLAST report db; consed; Data loading; mask; Phred; Phrap; Artemis; Curation; Curation work area; load; extract; PBS/LSF; Grid access; NC BioInformatics; Supercomputing; Grid]
Higher Order BioInformatics
[Diagram; tiers: Foundation BioInformatics, Advanced BioInformatics, Research BioInformatics. Legend: Developed at CIFR; Ongoing work at CIFR; Open source and others. Labels recovered: Cluster analysis; synteny; Pathway analysis; OO genomic analysis; Repeat analysis; Gene prediction; EST analysis; homology; In-silico mutation; Cellular models; BioPerl interface; GenBank; Alka; EST data mining; browser]
And over the . . . edge
• Our whole genome arrives Spring 2002
• Everyone wants immediate results
• Host (rice) genome size far greater than the pathogen
• Comparative genomics likely to require N-way analyses
• And then there’s proteomics ...
Research Biology
NCSU GRL
• Romulus
• Remus
~6 years to sequence M. grisea
Excellent foundation work
Industrial Scale Biology
High-throughput sequence centers (Whitehead)
~4 days to sequence M. grisea
Research Bioinformatics
CIFR FGL
• Mycelial mat
Est. 4 years to analyze M. grisea
Excellent foundation work
Industrial Scale Bioinformatics
North Carolina BioGrid
Hopefully 4 hours to analyze M. grisea
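The gap between research scale and industrial scale is easier to see as a ratio. The short sketch below converts the durations quoted on this slide to hours and computes the speedup for sequencing and for analysis. It assumes a 365-day year, and the 4-hour analysis figure is the slide's stated hope rather than a measurement.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 365 * HOURS_PER_DAY  # assumes a 365-day year


def speedup(slow_hours: float, fast_hours: float) -> float:
    """How many times faster the fast process is than the slow one."""
    return slow_hours / fast_hours


# Sequencing: ~6 years at research scale vs ~4 days at a sequencing center.
sequencing_speedup = speedup(6 * HOURS_PER_YEAR, 4 * HOURS_PER_DAY)

# Analysis: est. 4 years at research scale vs a hoped-for 4 hours on the grid.
analysis_speedup = speedup(4 * HOURS_PER_YEAR, 4)

print(f"Sequencing speedup: about {sequencing_speedup:.0f}x")
print(f"Analysis speedup:   about {analysis_speedup:.0f}x")
```

On these figures the hoped-for analysis speedup is more than an order of magnitude larger than the sequencing speedup already achieved.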
Islands of Capability
• There are not enough resources for every lab to re-implement technologies
• Individual centers specialize according to their research focus
• Grid ties together disparate systems
• Share knowledge and capabilities
• Standards-based for interoperability
Future directions: 5 years*
• Organized distributed research: “Virtual Centers”
• Bioinformatics
• Tool development
• Gene prediction algorithms for filamentous fungi
• Gene indexing
• “Distributed Annotation Systems (DAS)”
• Develop better search features (“queries”)
• Integrate sequenced and annotated BAC clones
• Integrate ESTs and expression profiles, etc.
• Functional genomics
• Comparative studies: saprophyte vs. pathogen, etc.
• Coordinate IRBGC and PGI, etc.
• Complete nucleotide sequence, full-length ESTs
• Knock out/silence all genes
• Transcriptional profiling in various backgrounds (path mutants)
• Construct protein-protein linkage maps (signaling pathways)
* The biologist's view
Future directions: 5 years*
• Collaborative knowledge sharing
• New data mining approaches
• New ways of visualizing the information
• In-silico experimentation
• Gene knockouts
• Regulatory modification
• Pathway models
• Cellular models
* The bioinformatician's view
Finding solutions to practical problems
• Seeking answers requires asking questions
• Takes 1-2 weeks per question
• BioGrid may give near real-time response
• BioGrid will bridge the islands of capability
• Focus resources back on our work
• Consequently, we are going to further accelerate the rate of discovery