what is bioinformatics?hpc.ilri.cgiar.org/beca/training/ilri-aau_2015/clc/k...1000 genomes project...
![Page 1: What is Bioinformatics?hpc.ilri.cgiar.org/beca/training/ilri-aau_2015/clc/K...1000 Genomes Project (). An effort to sequence the genome of 1000 people to identify genetic variants](https://reader035.vdocuments.us/reader035/viewer/2022062606/5fe2dab3c81b4b4bfd181c5d/html5/thumbnails/1.jpg)
What is Bioinformatics?
■ “Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.” - NCBI
■ “The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.” - NCBI
http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458
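DNA Sequencing
CHAIN TERMINATOR
⇒ A 3’ hydroxyl group is essential for chain elongation; a chain terminator (dideoxynucleotide) lacks it, so synthesis stops wherever one is incorporated
[Figure: DNA strand with its 5’ and 3’ ends labeled]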
Capillary Gel Electrophoresis
⇒ The sequencing reaction is run out in a single capillary gel.
⇒ The gel is scanned by a laser.
⇒ The sequence is read automatically using computer software from the pattern of different wavelengths emitted by the fluorescent dyes.
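The read-out step can be sketched in a few lines of Python. This is a toy model, not instrument software: it assumes one dye-labeled fragment per position of the newly synthesized strand, and the dye colours and function names are made up for illustration.

```python
import random

# Hypothetical dye colours, one per terminating base (illustrative only).
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def terminated_fragments(new_strand, seed=0):
    """Return one (length, dye) pair per position, in mixed-up order.

    The dye identifies the terminator base that ended the fragment.
    """
    fragments = [
        (length, DYE_FOR_BASE[base])
        for length, base in enumerate(new_strand, start=1)
    ]
    random.Random(seed).shuffle(fragments)  # the reaction tube is a mixture
    return fragments


def read_sequence(fragments):
    """Shorter fragments reach the laser first: sort by length, read the dyes."""
    return "".join(BASE_FOR_DYE[dye] for _, dye in sorted(fragments))


print(read_sequence(terminated_fragments("GATTACA")))
```

Sorting by fragment length plays the role of the capillary gel, and the dye lookup plays the role of the laser scan.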
Automated sequencers: ABI 3700
■ Made by Applied Biosystems
■ Most widely used automated sequencers:
– 96 capillaries
– robot loading from 384-well plates
■ Two to three hours per run
■ 600–700 bases per run
[Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar]
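A rough worked example of what these figures imply for throughput. It assumes the 600–700 bases are read per capillary, which the slide does not state explicitly, and that every capillary yields one read per run; the function names are illustrative.

```python
CAPILLARIES = 96                 # from the slide
READ_LENGTH_RANGE = (600, 700)   # bases; ASSUMED to be per capillary
RUN_HOURS_RANGE = (2, 3)         # from the slide


def bases_per_run(read_length, capillaries=CAPILLARIES):
    """Total bases from one run if every capillary yields one read."""
    return read_length * capillaries


def bases_per_hour(read_length, run_hours, capillaries=CAPILLARIES):
    """Average output rate over the duration of a run."""
    return bases_per_run(read_length, capillaries) / run_hours


shortest, longest = READ_LENGTH_RANGE
fastest, slowest = RUN_HOURS_RANGE
print("bases per run:", bases_per_run(shortest), "to", bases_per_run(longest))
print("bases per hour:", bases_per_hour(shortest, slowest),
      "to", bases_per_hour(longest, fastest))
```

Under these assumptions one run gives tens of thousands of bases, which is the scale to keep in mind when comparing with the millions of reads per run of second-generation instruments.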
Workflow of conventional vs. second-generation sequencing
■ High-throughput shotgun Sanger sequencing
– Template amplification
– Sanger cycle seq
– Capillary electrophoresis
– 96 or 384 long reads per run
■ Cyclic array shotgun sequencing
– Template immobilization
– (Template amplification)
– Seq by synthesis or hybridization
– Millions of short reads per run
Illumina
Figure from M. Metzker, Nat Rev Genet, Jan. 2010
Cost of Sequence per megabase
Benefits of Next-gen sequencing
https://genomevolution.org/wiki/images/1/16/Plant_Genome_Growth.png
Why do we sequence?
■ Genome Annotation:
A complete genome sequence provides us with the raw data to construct a "parts list".
■ Comparative Genomics:
Conserved regions in the genome are more likely to play an important role in the biology of the species.
■ Functional Genomics:
Sequencing the RNA provides us with an insight into the transcriptionally active regions of the genome.
■ Population Genetics and Genomics:
Genetic structure and diversity reveal the history and distribution of phenotypic traits (e.g. disease susceptibility alleles).
■ Genetic Analysis:
Map and characterize the molecular basis of allelic variants.
We have the genome sequence, now what?
● Well...
● We don’t know how many genes there are!
● We don’t know where they are!
● We don’t know what they do!
Definitions of Annotation
■ Interpreting raw sequence data into useful biological information
■ Information attached to genomic coordinates with start and end points; it can occur at different levels
■ Addition of as much reliable and up-to-date information as possible to describe a sequence
■ Identification, structural description, characterization of putative protein products and other features in primary genomic sequence
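One way to picture "information attached to genomic coordinates" is a small record type plus an overlap query. The sketch below is only illustrative: the `Feature` fields, the 1-based inclusive coordinate convention, and the example entries are assumptions, not taken from any particular annotation format or database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    seqid: str      # chromosome or contig name
    start: int      # 1-based, inclusive (assumed convention)
    end: int        # 1-based, inclusive
    kind: str       # level of annotation, e.g. "gene" or "exon"
    note: str = ""  # free-text description


def features_overlapping(features, seqid, start, end):
    """Return the features on `seqid` that overlap the interval [start, end]."""
    return [
        f for f in features
        if f.seqid == seqid and f.start <= end and f.end >= start
    ]


# Made-up example entries at two levels (gene and exon).
EXAMPLE_FEATURES = [
    Feature("chr1", 100, 500, "gene", "hypothetical geneA"),
    Feature("chr1", 120, 200, "exon", "first exon of geneA"),
    Feature("chr1", 900, 1200, "gene", "hypothetical geneB"),
    Feature("chr2", 100, 500, "gene", "hypothetical geneC"),
]

for feature in features_overlapping(EXAMPLE_FEATURES, "chr1", 150, 180):
    print(feature.kind, feature.start, feature.end, feature.note)
```

Because every feature carries its own start and end, annotations at different levels (gene, exon, and so on) can coexist on the same stretch of sequence.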
Genome annotation
Two Main Levels
• Structural annotation = Nucleotide-Protein level annotation. Finding genes and other biologically relevant sites, thus building up a model of the genome as objects with specific locations
• Functional annotation = Objects are used in database searches (and experiments); the aim is to attribute biologically relevant information to the whole sequence and to individual objects
Large-scale genome analysis projects
• Rate-limiting step is annotation
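As a minimal illustration of structural annotation, the sketch below scans a DNA string for open reading frames (ATG through the next in-frame stop codon). It is a toy under stated assumptions: forward strand only, no introns, and a hypothetical `min_codons` cutoff. Real gene finders rely on much richer models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end) pairs, 1-based inclusive, for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon;
    `min_codons` counts the start and stop codons as well.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))
```

Each result is an "object with a specific location" in the sense used above; functional annotation would then take the translated ORF into database searches.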
How do we get from here …
to here,
Summary of gene annotation steps
Gene prediction through comparative genomics
■ Highly similar (conserved) regions between two genomes are probably functional, or else they would have diverged
■ If genomes are too closely related all regions are similar, not just genes
■ If genomes are too far apart, analogous regions may be too dissimilar to be found
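The idea of spotting conserved regions can be sketched with a sliding-window identity score. The example assumes the two sequences are already aligned to equal length (with `-` for gaps); the window size, the 0.9 threshold, and the toy sequences are arbitrary choices for illustration.

```python
def window_identity(aligned_a, aligned_b, window=10):
    """Fraction of identical, non-gap positions in each sliding window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_windows(aligned_a, aligned_b, window=10, threshold=0.9):
    """0-based start positions of windows at or above the identity threshold."""
    scores = window_identity(aligned_a, aligned_b, window)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy data: the first 10 positions are identical, the last 10 all differ.
species_a = "ACGTACGTAC" + "AAAAAAAAAA"
species_b = "ACGTACGTAC" + "CCCCCCCCCC"
print(conserved_windows(species_a, species_b))

# Comparing a sequence with itself flags every window as conserved.
print(len(conserved_windows(species_a, species_a)))
```

The second call shows the problem with genomes that are too closely related: when everything is similar, the score no longer separates genes from the rest.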
Mouse-human comparison
From: J.W. Thomas et al - Nature 14 August 2003
The ENCODE Project Consortium (2011) A User’s Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol 9(4)
Automated Manual Merged
Basic Distributed Annotation Systems (DAS)
Contents of an Integrated Database
■ Genome and Functional Annotation: Predicted genes, GO, MIPS FunCat
■ Data to support modeling efforts: Protein-protein interactions, Protein-DNA interactions, Pathways (KEGG, AraCyc)
■ Experimental Data: Microarray, ChIP-chip
Bioinformaticians integrate the data into one database
1) Find the data.
– Decentralized databases
– Data in different formats
2) Convert to a common format
– XML is a good idea (SBML)
3) Data integration.
– Manual: Excel sheet comparisons (Biologists)
– Automated: Perl scripts (Informatician)
– Database: Queries, e.g. SQL (High-production labs)
4) Gene list intersect.
5) Modeling
– Biological function in gene list
– Need visualization and network modeling tools
[Figure labels: Experiments; Function; Models; Annotation]
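Steps 2 to 4 above can be sketched in Python (used here in place of the Perl scripts the slide mentions). The two inputs, their column names, and the gene identifiers are invented for illustration; the point is only that sources in different formats are normalized to one identifier style before the lists are intersected.

```python
import csv
import io


def load_gene_ids(text, delimiter, column):
    """Read one column of gene IDs and normalize it to upper case."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row[column].strip().upper() for row in reader}


# Hypothetical inputs: a tab-separated expression table and a
# comma-separated annotation table that spell gene IDs differently.
expression_tsv = "gene_id\tfold_change\ngene1\t2.5\ngene2\t0.4\ngene3\t3.1\n"
annotation_csv = "Gene,GO_term\nGENE2,transport\nGENE3,kinase activity\nGENE4,unknown\n"

expressed = load_gene_ids(expression_tsv, "\t", "gene_id")
annotated = load_gene_ids(annotation_csv, ",", "Gene")

# Step 4: gene list intersect.
shared = sorted(expressed & annotated)
print(shared)
```

With more than a handful of sources, the same intersection is usually easier to maintain as a database query, which is the SQL route listed under step 3.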
UCSC browser
Examples of Large Genome Projects
■ 1000 Genomes Project (www.1000genomes.org). An effort to sequence the genomes of 1000 people to identify genetic variants present in at least 1% of the human population.
■ 1001 Arabidopsis thaliana Genomes Project (www.1001genomes.org). Study the genomes and phenotypes of 1001 strains to explain differences in phenotype caused by adaptation to different conditions.
■ Metagenomics (http://commonfund.nih.gov/hmp/): Sequencing of DNA samples from environments, for example the mouth, skin, and digestive system, to identify the different bacterial species present.
Your genome
■ Personal Genome Sequencing: Several companies provide a service where you can submit your DNA to be sequenced. This can help you learn more about your heritage and about which diseases you are susceptible to.
■ Medical Genomic Studies: There is already a collection of genetic testing procedures that look for specific genes. Unfortunately, they are not accurate, which can result in individuals making bad decisions. The hope is that with more genes, we can make better and more informed decisions.