Bioinformatics T1: Bioinformatics
Bioinformatics: the (r)evolution
FBW, 25-09-2012
Wim Van Criekinge
What is bioinformatics?
• Application of information technology to the storage, management and analysis of biological information (facilitated by the use of computers)
– Sequence analysis?
– Molecular modeling (HTX)?
– Phylogeny/evolution?
– Ecology and population studies?
– Medical informatics?
– Image analysis?
– Statistics? AI?
– Power engineering or electronics ("sterkstroom of zwakstroom")?
• Medicine (Pharma)
– Genome analysis allows the targeting of genetic diseases
– The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
– Knowledge of protein structure facilitates drug design
– Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
• The same techniques can be applied to crop (Agro) and livestock improvement (Animal Health)
Promises of genomics and bioinformatics
Bioinformatics: What’s in a name ?
• Early 1990s
• “Bio-informatics”:
– convergence of the explosive growth in biotechnology, paralleled by the explosive growth in information technology
• Not new: people have been using “computers” in biology for more than 30 years
• In silico biology, database biology, ...
[Figure: growth of computing power and of GenBank (log scale) over time (years)]
Happy Birthday …
PCR + dye termination
Suddenly, a flash of insight caused him to pull the car off the road and stop. He awakened his friend dozing in the passenger seat and excitedly explained to her that he had hit upon a solution - not to his original problem, but to one of even greater significance. Kary Mullis had just conceived of a simple method for producing virtually unlimited copies of a specific DNA sequence in a test tube - the polymerase chain reaction (PCR)
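The "virtually unlimited copies" follow from exponential amplification. Under the idealized assumption that every template is copied in every cycle (100% efficiency), the copy number doubles each cycle, N = N0 · 2^n, so a single molecule gives 2^30 ≈ 1.07 × 10^9 copies after 30 cycles. A minimal sketch; the `efficiency` parameter is an illustrative assumption (real reactions run below 100% and eventually plateau):

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: N = N0 * (1 + E)^n.

    efficiency=1.0 means perfect doubling in every cycle (an idealization;
    real reactions are less efficient and eventually plateau).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Copies from a single template molecule after 10, 20 and 30 ideal cycles.
    for n in (10, 20, 30):
        print(n, "cycles:", f"{pcr_copies(1, n):.3g}", "copies")
```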
Bioinformatics, a scientific discipline …
[Diagram: Bioinformatics positioned among Math, Informatics, Computer Science, (Molecular) Biology, Theoretical Biology and Computational Biology. Labeled activities: math/algorithm development, interface design, sequence analysis, AI, image analysis and structure prediction (HTX), expert annotation, NP data mining. Bioinformatics → Discovery Informatics – Computational Genomics]
Course objective
• More than an introduction to ... the aim of the course is to provide an underlying insight into the various techniques.
• Beyond the use of recipes, which can be found in parts of the syllabus, insight into how databases work and into the underlying algorithms makes it possible to apply changing interfaces to new problems.
Lecture contents: Bioinformatics
Exam
• Theory
– Part based on a publication of your own choice that is related to the course
• e.g. from Bioinformatics or Computational Biology
– Three insight questions about the course (inclusive!!)
• Practical (“open-book”)
– About four exercises, which usually require writing a program
• Score distribution: 50/50
Course
• 25 euro
– Syllabus
– Hand-outs of lecture/practical 1
– (V)podcasts
– Weblems
– Screencasts
– Flash drive
• Image to be installed
• Timeline: Margaret Dayhoff …
Nexus > FAQ > Bioinformatics Milestones
• http://www.sciencemag.org/cgi/content/full/291/5507/1195
• Printed version in the course notes
Nature: The Human Genome
Setting the stage …
Genome Meters
• Genomes Online Database (GOLD 1.0)
– http://geta.life.uiuc.edu/~nikos/genomes.html
– http://www.ebi.ac.uk/research/cgg/genomes.html
• NCBI
– http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/bact.html
• INFOBIOGEN
– http://www.infobiogen.fr/doc/data/complete_genome.html
Genome Size
DOGS: Database Of Genome Sizes
E. coli = 4.2 × 10^6 bp
Yeast = 18 × 10^6 bp
Arabidopsis = 80 × 10^6 bp
C. elegans = 100 × 10^6 bp
Drosophila = 180 × 10^6 bp
Human/Rat/Mouse = 3000 × 10^6 bp
Lily = 300 000 × 10^6 bp
With ...: 99.9%; to primates: 99%
Biological Research
Adapted from John McPherson, OICR
And this is just the beginning ….
Next Generation Sequencing is here
Basics of the “old” technology
• Clone the DNA.
• Generate a ladder of labeled (colored) molecules that differ by 1 nucleotide.
• Separate the mixture on some matrix.
• Detect the fluorochrome by laser.
• Interpret the peaks as a string of DNA.
• Strings are 500 to 1,000 letters long.
• 1 machine generates 57,000 nucleotides/run.
• Assemble all strings into a genome.
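The last step, assembling all strings into a genome, can be illustrated with a toy greedy overlap assembler: repeatedly merge the two reads with the longest suffix/prefix overlap. This is a simplified teaching sketch, not how production assemblers work (they also handle sequencing errors, both strands and repeats). The function names and the `min_overlap` parameter are illustrative assumptions:

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        # Candidate starts are where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge reads greedily by largest overlap; returns one or more contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can mis-assemble repeats longer than the reads, which is one reason read length mattered so much for de novo assembly.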
Basics of the “new” technology
• Get DNA.
• Attach it to something.
• Extend and amplify the signal with some color scheme.
• Detect the fluorochrome by microscopy.
• Interpret series of spots as short strings of DNA.
• Strings are 30-300 letters long.
• Multiple images are interpreted as 0.4 to 1.2 GB/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
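Mapping short strings back to a known genome is usually done through an index of that genome. Below is a minimal sketch assuming exact matches only and a hash-table (k-mer) index: the first k bases of a read give candidate positions ("seeds"), and each candidate is then verified against the full read. Real short-read aligners tolerate mismatches and typically use more compact indexes. The function names are hypothetical:

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the genome to the list of positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def map_read(read: str, genome: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return all start positions where the read matches the genome exactly."""
    if len(read) < k:
        raise ValueError("read must be at least k bases long")
    hits = []
    # Seed with the read's first k-mer, then verify the whole read.
    for pos in index.get(read[:k], []):
        if genome[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    genome = "ACGTACGTTTGACCAGTACGTTAG"
    k = 4
    idx = build_kmer_index(genome, k)
    for read in ("GACCAG", "ACGT", "TTTT"):
        print(read, map_read(read, genome, idx, k))
```

A read such as "ACGT" returns several positions. This is the ambiguity that longer reads, or paired reads, help to resolve.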
Next Generation Technologies
• 454
– Emulsion PCR
– Polymerase
– Natural nucleotides
• 20-100 Mb for 5-15k
– 1% error rate
– Homopolymers
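Why homopolymers are listed as a weak point: 454 uses pyrosequencing with natural nucleotides. One nucleotide type is flowed at a time, and every base of a homopolymer run is incorporated in the same flow. The run length therefore has to be inferred from signal intensity. The sketch below converts a sequence into idealized, noise-free flow values, assuming a cyclic flow order of TACG. It is an illustration only, not a model of the real instrument's signal processing:

```python
def flowgram(seq: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealized pyrosequencing signal as (nucleotide flowed, bases incorporated).

    Each flow incorporates the whole homopolymer run of that nucleotide at the
    current position, so a run of length n yields a single signal of height n.
    """
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases that are never flowed")
    signals = []
    i = 0
    flow = 0
    while i < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        signals.append((base, run))
        flow += 1
    return signals


if __name__ == "__main__":
    print(flowgram("TTTAGC"))
```

The run "TTT" appears as one signal of height 3. Telling a height of 7 from 8 in a noisy intensity is much harder than telling 0 from 1, which is where homopolymer length errors come from.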
One additional insight ...
Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with a uniquely assignable location (0–100%) versus length of k-mer reads (8–20 bp), for E. coli and human. Source: Jay Shendure]
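A simplified version of the quantity on this slide can be computed for any sequence: for each read length k, take the fraction of k-mer positions whose k-mer occurs exactly once. The sketch works on a single strand of a toy sequence and counts single k-mers, not the paired k-mers of the original plot. The random test genome with an artificially appended repeat is an assumption made for illustration:

```python
import random
from collections import Counter


def fraction_unique_kmers(genome: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs exactly once in the genome."""
    if len(genome) < k:
        raise ValueError("genome must be at least k bases long")
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / total


if __name__ == "__main__":
    random.seed(0)
    base = "".join(random.choice("ACGT") for _ in range(20000))
    genome = base + base[:5000]  # append a copy of the first 5 kb as a repeat
    for k in range(8, 21, 2):
        print(k, round(fraction_unique_kmers(genome, k), 3))
```

For short k almost every k-mer recurs by chance. Once k is large enough the fraction saturates, and only genuinely repeated sequence stays ambiguous, which is why modest read lengths already suffice for resequencing.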
Two Short Read Technologies
• Illumina GA
• ABI SOLID
Technology Overview: Solexa/Illumina Sequencing
ABI Solid
Dressman 2003
Paired End Reads are Important!
[Figure: read pair spanning repetitive DNA and unique DNA]
• Single read maps to multiple positions
• Paired read maps uniquely: Read 1 and Read 2 are a known distance apart
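The logic of this slide can be captured in a toy sketch. A read that falls in a repeat matches several positions, but requiring its mate to match at the known distance leaves only the consistent placement. The sketch assumes exact matching, both reads on the same strand, and a single fixed distance. Real paired-end mappers allow a distance range and handle reverse-complemented mates. The function names are hypothetical:

```python
def find_all(genome: str, read: str) -> list[int]:
    """All start positions of exact matches of read (overlapping allowed)."""
    hits, start = [], genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def map_pair(genome: str, read1: str, read2: str, distance: int) -> list[tuple[int, int]]:
    """Placements where read2 starts exactly `distance` bases after read1 starts."""
    pos2 = set(find_all(genome, read2))
    return [(p, p + distance) for p in find_all(genome, read1) if p + distance in pos2]


if __name__ == "__main__":
    # "ACGTAC" is a repeat occurring twice; "GGGG" is unique sequence.
    genome = "TTTT" + "ACGTAC" + "GGGG" + "ACGTAC" + "CCCC"
    print("read 1 alone:", find_all(genome, "ACGTAC"))
    print("read pair:", map_pair(genome, "ACGTAC", "GGGG", 6))
```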
Single Molecule Sequencing
Helicos Biosciences Corp.
[Figure: single DNA molecule with primer attached to a microscope slide; Cy3-labeled dNTPs (dNTP-Cy3) are imaged with a super-cooled TIRF microscope]
Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh
Introducing
NXT GNT DXS: Next Generation Diagnostics
18 September 2009
Wim Van Criekinge
Develop, in the shortest time frame, the best assay for the most relevant clinical application
NXT GNT DXS
• GNT
– Dedicated team & network
– Operational: location
– Professionalized
• DXS
– Content engine
– Product 1 established
– Pipeline for n+1
• NXT
– Workflow management
– Bioinformatics
– Epigenetics
Next next generation sequencing
Third generation sequencing
Now sequencing
Complete Genomics
Pacific Biosciences: A Third Generation Sequencing Technology
Eid et al. 2008
Nanopore Sequencing
NCBI (educational resources)
Weblems
• What?
– Web-based problems (on the current lecture and/or in preparation for the next lecture)
• When?
– At the end of each lecture
• How?
– Solutions online via screencasts
– Practical sessions
– Preparation for the practical exam ...
Not all problems necessarily require program code ...
Weblems
W1.1: To which phyla do the following species belong: (a) starfish, (b) ginkgo tree, (c) scorpion?
W1.2: What are the common names of the following species: (a) Orycteropus afer, (b) Beta vulgaris, (c) Macrocystis pyrifera?
W1.3: Which species has the smallest known genome? Is genome size related to the number of genes?
W1.4: What are the five most recently published genomes? How complete is their “coverage”?
W1.5: For approximately 10% of Europeans, the painkiller codeine is ineffective because they lack the enzyme that converts codeine into the active molecule, morphine. What is the most common mutation that causes this condition?