Using NGS to answer biological questions
Usadellab.org @ RWTH Aachen, Forschungszentrum Jülich
You have heard it all before
• Open platform
• Good for SNP calling
• Higher dynamic range
• Better reflects RT-PCR data
• Was considered big data once
Microarrays and RNA-Seq: the old and the new
[Figure: next-generation sequencing publications in PubMed by year; x-axis 1979 to 2014, y-axis 0 to 3500]
The gold rush is still at full steam
But all that glitters is not gold
Sometimes a closed platform is not too bad: it also means standardization. And of course microarrays take much less time to download.
Did you ever say to yourself: "Oh, let's have a brief look at this dataset…"?
All that glitters is not gold
And then there is still mapping and stats
And storage
Btw, did you buy that new storage pod?
Another 180 TB
Another 20k€
Or worse, you can't build it yourself
All that glitters is not gold
So does it all come to naught?
The gold rush is still at full steam, so there must of course be something to it
Did you ever think it would be possible to sequence a full genome yourself, de novo of course?
Well, that's a PhD topic now. (Within reason: up to medium-sized plants. Bacteria can be dealt with in a BSc if you get lucky.)
So where are good claims to be had and what can one do about it?
Can Bioinformatics help biologists?
But the gold rush is still on
Wild relative of S. lycopersicum
Moyle 2008
Source: Tomato Genetics Resource Center (TGRC)
Grows in Peru and northern Chile (TGRC accessions shown)
Solanum pennellii - a wild tomato relative
[Figure: Metabolites × Introgression Lines (Schauer et al., 2006)]
Solanum pennellii - a great source of genetic variation
[Diagram: S. lyc M82 x S. pennellii gives F1; S. lyc M82 x F1 gives the Introgression Line Population]
Trimmomatic fast & precise
Filtering Effects
Scaffolds                      Total    N50 (>2000)
S. pennellii (V2.00)           943M     1,741,129
S. lycopersicum (V2.4)         781M     16,467,796
S. pimpinellifolium (A-1.0)    -        -
S. tuberosum (V3)              715M     1,354,002

Final Contigs                  Total Size    N50
S. pennellii (V2.00)           ~870M         45.7k
S. lycopersicum (V2.4)         738M          86.9k
S. pimpinellifolium (A-1.0)    689M          6k
S. tuberosum (V3)              683M          31.4k

Split on 'N's
SNP, small indel: <0.03%
Solanum pennellii Assembly
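The contig figures above come from splitting scaffolds on 'N's and then reporting N50. A minimal Python sketch of both steps follows. It is not the pipeline used for the assembly: the function names, the `min_gap` parameter and the `min_length` cutoff (one possible reading of the "(>2000)" column header) are all assumptions made for illustration.

```python
import re


def split_on_n(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into contigs at every run of at least `min_gap` N/n characters.

    `min_gap` is a hypothetical parameter; the slides only say "split on 'N's".
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # re.split can yield empty strings at the ends; drop them.
    return [piece for piece in gap.split(scaffold) if piece]


def n50(lengths: list[int], min_length: int = 0) -> int:
    """Return N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled length.

    Sequences shorter than `min_length` are ignored (assumed cutoff).
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        return 0
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Toy scaffold, not real data.
    contigs = split_on_n("ACGTNNNNACGNT")
    print(contigs)
    print(n50([len(c) for c in contigs]))
```

Because N50 depends only on the list of lengths, the same function serves for scaffolds and for the split contigs.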
Unfortunately it was the cultivar Heinz and not M82 that was sequenced
Luckily re-sequencing is relatively straightforward (sometimes)
Solanaceae... what makes them what they are
Physalis alkekengi (Chinese lantern)
Physalis peruviana (Cape gooseberry)
Physalis ixocarpa (tomatillo)
More than 2000 terms
Redundancy-reduced terms for better visualization and statistical analysis
~ 20 plant species
Automatic tool for whole transcriptome annotation
The MapMan Plant Ontology
Mercator
Data Submission
Mercator is an online resource that accepts large FASTA files containing plant sequences
Mercator compares the sequences to in-house annotated and classified plant sequences and searches for domains
Mercator then classifies all genes/proteins
Mercator typically processes one genome equivalent in 2-3 days in accurate mode (and faster in draft mode)
FASTA Sequence Results Summary and Tables
Mercator: Bulk Sequence classification
MapMan
MapMan: Omics on Plant Pathway visualization, testing
Pathway Visualization
MapMan is a graphical tool allowing
• Pathway visualization for about 20 plant species, including all major crops
• Testing for enriched pathways and processes
• Interactive data exploration and visualization, e.g. Venn diagrams, clustering, …
Expression Data | Enrichment testing | Interactive data exploration
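To make "testing for enriched pathways" concrete, here is a generic over-representation test (the upper tail of a hypergeometric distribution) in Python. This is an illustration of the general idea only and is not necessarily the test MapMan itself applies. The function name and all argument names are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(in_bin_selected: int, selected: int,
                           in_bin_total: int, total: int) -> float:
    """One-sided over-representation p-value.

    total           -- all genes measured
    in_bin_total    -- genes annotated to the pathway/bin
    selected        -- genes called differentially expressed
    in_bin_selected -- selected genes that fall into the bin

    Returns P(X >= in_bin_selected) when `selected` genes are drawn
    at random, without replacement, from `total` genes.
    """
    upper = min(selected, in_bin_total)
    denominator = comb(total, selected)
    numerator = sum(
        comb(in_bin_total, k) * comb(total - in_bin_total, selected - k)
        for k in range(in_bin_selected, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Toy numbers, not real data: 10 genes, 5 in the bin,
    # 5 selected, and all 5 selected genes are in the bin.
    print(hypergeom_enrichment_p(5, 5, 5, 10))
```

When many bins are tested at once, the resulting p-values would still need a multiple-testing correction.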
Physalis alkekengi leaf versus root
Something was done right
Bringing it together
Carbon Status
Arrays RNA Seq
Metabolic profiling
day night extended night
Diurnal Cycles and an Extended Night across species
The microarray was pretty useless (<10k genes)
Peak times seem to be conserved
If you are a cycling gene it seems to be good to peak around midday or midnight
Phases of orthologs seem to be much less conserved...
Genes ordered by phase in Arabidopsis: if you stand very far away, you might see some conservation
Diurnal Cycles and an Extended Night across species
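Comparing peak times between orthologs needs a distance that wraps around the clock: a gene peaking at 23 h and one peaking at 1 h are 2 h apart, not 22 h. A small Python sketch under that assumption follows. The gene identifiers and function names are hypothetical, and the analysis behind the slides may have been done differently.

```python
from typing import Optional


def phase_difference(phase_a: float, phase_b: float, period: float = 24.0) -> float:
    """Smallest absolute difference between two peak times on a circular clock."""
    diff = abs(phase_a - phase_b) % period
    return min(diff, period - diff)


def mean_ortholog_phase_shift(phases_a: dict[str, float],
                              phases_b: dict[str, float],
                              orthologs: list[tuple[str, str]],
                              period: float = 24.0) -> Optional[float]:
    """Average circular phase difference over ortholog pairs.

    Pairs where either gene has no phase estimate are skipped.
    Returns None if no pair can be compared.
    """
    shifts = [
        phase_difference(phases_a[a], phases_b[b], period)
        for a, b in orthologs
        if a in phases_a and b in phases_b
    ]
    return sum(shifts) / len(shifts) if shifts else None


if __name__ == "__main__":
    # Hypothetical identifiers and peak times in hours, not real data.
    species_a = {"geneA1": 23.0, "geneA2": 12.0}
    species_b = {"geneB1": 1.0, "geneB2": 12.0}
    pairs = [("geneA1", "geneB1"), ("geneA2", "geneB2")]
    print(mean_ortholog_phase_shift(species_a, species_b, pairs))
```

With a 24 h period the largest possible difference is 12 h, so randomly paired genes with uniformly distributed phases would average about 6 h.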
Looking at individual genes can help....
Myo Inositol pathway (MIOX) shows a conserved response
[Figure: the same pathway shown in three panels: Arabidopsis, Tomato, Maize. Nodes: UDP-Glucose, UDP-Glucuronic Acid, Glucuronic Acid-1-P, Glucuronic Acid, Myo-Inositol, cell wall. Enzymes: MIOX, UGD. Colour code: blue up, red down]
Conserved Pathways
MIOX pathway shows a correlated change in metabolites and transcripts
[Figure: pathway with transcripts and metabolites. Nodes: UDP-Glucose, UDP-Glucuronic Acid, Glucuronic Acid-1-P, Glucuronic Acid, Myo-Inositol, cell wall. Enzymes: MIOX, UGD, Glucuronokinase. Colour code: blue up, red down]
Conserved Pathways... And metabolites
UDP-sugars drop in response to carbon depletion
[Figure: panels UGD, MIOX, GK; x-axis: ED, EN, XN]
Carbon and the Wall
MIOX mutants show a stronger drop in UDP-sugars
[Figure: panels UGD, GK; x-axis: ED, EN, XN]
Carbon and the Wall
• Not all that glitters is gold, but handled well, NGS data yield many more unexpected stories (S. pimpinellifolium)
• NGS does allow us to actually get a handle on genomes and transcriptomes we couldn't dream of before (S. pennellii, Physalis)
• Using the openness of NGS one starts seeing new things and can compare between species
Summary
Zhangjun Fei, Jim Giovannoni, Cornell University
Raimund Tenhaken, Salzburg University
Alisdair Fernie, Mark Stitt MPI Golm
Detlef Weigel, MPI Tübingen
Acknowledgements
Thomas Herter (LC-MS)
usadellab.org