Doug Brutlag 2011
Next Generation Sequencing and Human Genome Databases
Doug Brutlag
Professor Emeritus of Biochemistry & Medicine
Stanford University School of Medicine
Genomics, Bioinformatics & Medicine
http://biochem158.stanford.edu/
Illumina Solexa Sequencing Technology
Emulsion Based Clonal Amplification
• Adapter-carrying library DNA
• Anneal DNA template to capture beads
• Add PCR reagents and emulsion oil to form a “water-in-oil” emulsion of micro-reactors
• Perform emulsion PCR
• Break micro-reactors
• Isolate DNA-containing beads
Single test tube generation of millions of clonally amplified sequencing templates
No cloning and colony picking
Pacific Biosciences SMRT Sequencing
Pacific Biosciences Sequencing
Phospholinked Fluorophores
Processive Synthesis
Synthesis of Long Duplex DNA
Circular Templates Give Redundant Sequencing and Accuracy
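The redundancy idea on this slide can be illustrated with a small sketch: if the same circular template is read several times, a per-position majority vote over the passes can outvote isolated errors. This is a hypothetical illustration, not Pacific Biosciences' actual consensus algorithm. It assumes the passes are already aligned and of equal length, and the function name `consensus` is made up for the example.

```python
from collections import Counter

def consensus(passes):
    """Majority-vote consensus over equal-length, pre-aligned reads."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to equal length")
    # For each column, keep the most common base across all passes.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*passes))

# Three passes over the same toy template, each carrying one different error.
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGGA"]
print(consensus(passes))
```

Each error appears in only one of the three passes, so the vote at every position recovers the base shared by the other two.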
Ion Torrent Sequencing
The Human Genome
How fast is the cost going down?
• 2006: $50 million
• 2008: $500,000
• 2009: $50,000
• 2010: $20,000
• 2011: $5,000
• 2012: ??? $1,000
Thanks to Serafim Batzoglou
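To put numbers on "how fast", the listed figures can be turned into fold reductions. The sketch below uses only the costs on the slide. The 2012 value is marked "???" there, so it is left out, and the helper names `fold_reduction` and `annual_fold` are invented for this example.

```python
# Cost per human genome in US dollars, as listed on the slide.
# The 2012 figure ($1,000) is marked "???" on the slide, so it is omitted.
costs = {2006: 50_000_000, 2008: 500_000, 2009: 50_000, 2010: 20_000, 2011: 5_000}

def fold_reduction(costs, start, end):
    """How many times cheaper sequencing was in `end` than in `start`."""
    return costs[start] / costs[end]

def annual_fold(costs, start, end):
    """Average fold decrease per year (geometric mean over the interval)."""
    return fold_reduction(costs, start, end) ** (1 / (end - start))

years = sorted(costs)
for prev, cur in zip(years, years[1:]):
    print(f"{prev} -> {cur}: {fold_reduction(costs, prev, cur):.1f}x cheaper")
print(f"2006 -> 2011 overall: {fold_reduction(costs, 2006, 2011):,.0f}x")
print(f"average per year: {annual_fold(costs, 2006, 2011):.1f}x")
```

From 2006 to 2011 the listed cost falls 10,000-fold, which averages out to about 6.3-fold per year.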
Archon Genomics X-Prize
Components of a Typical Human Gene
[Figure: Gene (Promoter, TFBS, Exon, Intron, Exon, Intron, Exon, Terminator)]
Active Genes are Transcribed into RNA
[Figure: Gene (Promoter, Exon, Intron, Exon, Intron, Exon, Terminator) → Primary Transcript]
Splicing Transcript Yields Mature mRNA
[Figure: Gene (Promoter, Exon, Intron, Exon, Intron, Exon, Terminator) → Transcript (5’ to 3’) → Splicing → mRNA]
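Splicing can be sketched as joining the exon segments of the primary transcript and discarding the introns between them. The sequence, the exon coordinates, and the function name `splice` below are all made up for illustration. Coordinates are assumed to be 0-based and end-exclusive.

```python
def splice(pre_mrna, exons):
    """Join exon segments of a primary transcript into a mature mRNA.

    exons: list of (start, end) pairs, 0-based, end-exclusive, in 5' to 3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy primary transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaugucccag" + "AAGUGA"
exons = [(0, 6), (18, 24), (35, 41)]

mrna = splice(pre_mrna, exons)
print(mrna)
```

Writing the introns in lower case makes it easy to check that only exon sequence survives in the result.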
Mature mRNA contains Coding Region and 5’ and 3’ Untranslated Regions
[Figure: Gene (Promoter, Exon, Intron, Exon, Intron, Exon, Terminator) → Transcript (5’ to 3’) → Splicing → mRNA (5’ UTR, Coding Region, 3’ UTR)]
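The three parts of the mature mRNA can be separated with a simple scan. The sketch below assumes, as a simplification, that the coding region starts at the first AUG and ends at the first in-frame stop codon. The function name `split_mrna` and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Return (5' UTR, coding region, 3' UTR) for the first AUG-initiated ORF.

    Simplification: translation is assumed to start at the first AUG and end
    at the first in-frame stop codon. Returns None if no such ORF exists.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # include the stop codon in the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    return None

# Toy mRNA: 5' UTR + coding region + 3' UTR.
mrna = "GGCACC" + "AUGGCCGAUUACAAGUGA" + "CUUAAUAAAGC"
utr5, cds, utr3 = split_mrna(mrna)
print(utr5, cds, utr3)
```

Real start-site selection depends on sequence context, so this scan is only a first approximation.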
Mature mRNA contains 7-Methyl-Guanosine 5’ Cap and 3’ Poly A Tail
[Figure: Gene (Promoter, Exon, Intron, Exon, Intron, Exon, Terminator) → Transcript → Splicing → mRNA (7-Me-G Cap, 5’ UTR, Coding Region, 3’ UTR, 3’ Poly A Tail)]
ESTs, Full Length cDNA, UniGene & RefSeq Databases
[Figure: Gene (Promoter, Exon, Intron, Exon, Intron, Exon, Terminator) → Transcript (5’ to 3’) → Splicing → mRNA (5’ UTR, 3’ UTR) → Protein; further labels: 5’ ESTs, 3’ ESTs, Full Length cDNA, Proteins]
GENSCAN Gene Model
http://genes.mit.edu/GENSCAN.html
Hidden Markov models of gene structure
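To give a feel for how a hidden Markov model labels a sequence, here is a toy two-state example decoded with the Viterbi algorithm. It is not GENSCAN's model, which is far richer. The two states, every probability, and the GC-rich versus AT-rich emission assumption are invented for illustration.

```python
import math

STATES = ("E", "I")  # E = exon-like (GC-rich), I = intron-like (AT-rich)
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "I": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}

def viterbi(seq):
    """Most probable state path for seq under the toy two-state HMM."""
    if not seq:
        return ""
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("GCGCGCGCATATATATAT"))  # EEEEEEEEIIIIIIIIII
```

The decoder switches state once, at the point where the composition changes, because a single transition penalty is cheaper than emitting ten poorly fitting bases.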
Genome Databases
A Mapping: uniSTS, dbSNP
B Gene Prediction: GrailEXP, GenScan, FGENESH, FGENESH+, GeneMark
C Expression Data: Mouse ESTs, Human ESTs, Entrez Gene, Mouse RefSeq, Mouse UniGene, Human RefSeq, Human Ensembl cDNA
D Protein Similarity: nrPRO, pFAM Motifs
E Additional Data: Promoters
F Summary: Entrez Gene, UCSC Browser, Ensembl
Genomic DNA: Assembled contigs
Entrez Gene Loci
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
NR-Pro
Genscan
GrailEXP
FGENESH
Entrez Gene
UniGene
ESTs
Alternative Splicing Generates Distinct Proteins in Different Tissues
[Figure: Gene (Promoter, Exon, Intron, Exon, Intron, Exon, Terminator) → Transcript (5’ to 3’) → Splicing → mRNA-1; Transcript (5’ to 3’) → Alternate Splicing → mRNA-2]
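One way to picture alternative splicing is as a choice of which exons to keep. The sketch below assumes exon skipping, which is only one of several forms. The exon sequences and the function name `assemble` are hypothetical, and the slide does not say which exon differs between mRNA-1 and mRNA-2.

```python
def assemble(exons, include):
    """Concatenate the chosen exons (by index) in 5' to 3' order."""
    return "".join(exons[i] for i in sorted(include))

# Hypothetical exon sequences for a three-exon gene.
exons = ["AUGGCC", "GAUUAC", "AAGUGA"]

mrna_1 = assemble(exons, [0, 1, 2])  # all three exons retained
mrna_2 = assemble(exons, [0, 2])     # middle exon skipped (an assumption)
print(mrna_1)
print(mrna_2)
```

The two mRNAs share their first and last exons but differ in the middle, so translation would give proteins that differ by the residues the skipped exon encodes.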
NCBI Genomes
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome
Eukaryote Genome Projects
http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi
Canis lupus familiaris Genome
http://www.ncbi.nlm.nih.gov/sites/entrez?db=bioproject&cmd=Retrieve&dopt=Overview&list_uids=10726
NCBI Entrez Gene
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
NCBI Entrez Gene: Human Opsin
http://www.ncbi.nlm.nih.gov/gene?term=human%20opsin
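The search URL above is just the query text percent-encoded into the `term` parameter. A minimal sketch that rebuilds it with the Python standard library follows. The function name `entrez_gene_search_url` is made up, and the URL pattern is taken from the slide as it stood in 2011, so it may not match the current site.

```python
from urllib.parse import quote

def entrez_gene_search_url(term):
    """Build an Entrez Gene search URL in the form shown on the slide."""
    # quote() percent-encodes the space in the query as %20.
    return "http://www.ncbi.nlm.nih.gov/gene?term=" + quote(term)

print(entrez_gene_search_url("human opsin"))
```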
Entrez Gene: Human Opsin OPN1MW
http://www.ncbi.nlm.nih.gov/gene/2652
MapViewer: Human Opsin OPN1MW
http://www.ncbi.nlm.nih.gov/gene/2652
Evidence Viewer for OPN1MW
http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_167198.1&gene=OPN1MW&lid=2652&from=4366022&to=4380289
OMIM Home Page
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
Colorblindness in OMIM
http://omim.org/entry/303800
Human Genome Resources
http://www.ncbi.nlm.nih.gov/genome/guide/human/
RefSeq
http://www.ncbi.nlm.nih.gov/RefSeq/
RefSeq Gene
http://www.ncbi.nlm.nih.gov/refseq/rsg/
NCBI UniGene
http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene
NCBI Homologene Database
http://www.ncbi.nlm.nih.gov/homologene
Comparative Genomics
Ensembl Home Page
http://www.ensembl.org/
EBI Genomes Home Page
http://www.ensembl.org/
Ensembl Human Genome
http://www.ensembl.org/Homo_sapiens/
Ensembl Human Opsin Search
http://uswest.ensembl.org/Homo_sapiens/Search/Results?species=Homo_sapiens;idx=;q=opsin
Ensembl Human Opsin Genes
http://uswest.ensembl.org/Homo_sapiens/Search/Results?species=Homo_sapiens;idx=;q=opsin
Ensembl Human OPN1MW Gene
http://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000147380;r=X:153448107-153461633
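The `r=` parameter in the URL above encodes a genomic region as chromosome, start, and end. The sketch below parses such a string and computes the span, assuming the coordinates are 1-based and inclusive. The function name `parse_region` is hypothetical.

```python
import re

def parse_region(region):
    """Parse 'X:153448107-153461633' into (chromosome, start, end)."""
    match = re.fullmatch(r"(?:chr)?(\w+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))

chrom, start, end = parse_region("X:153448107-153461633")
# Assumption: coordinates are 1-based and inclusive, so add 1 to the difference.
length = end - start + 1
print(chrom, start, end, length)
```

Under that assumption the OPN1MW region shown on the slide spans 13,527 bases of the X chromosome.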
Ensembl Opsin OPN1MW Gene Location
http://uswest.ensembl.org/Homo_sapiens/Location/View?h=Havana%20gene;r=X:153448107-153461633#r=X:153448109-153461632
Ensembl OPN1MW Transcripts
http://uswest.ensembl.org/Homo_sapiens/Location/View?h=Havana%20gene;r=X:153448107-153461633#r=X:153448109-153461632
Ensembl OPN1MW Opsin Protein
http://uswest.ensembl.org/Homo_sapiens/Transcript/ProteinSummary?db=core;g=ENSG00000147380;r=X:153448107-153461633;t=ENST00000369935
Ensembl Tutorials
http://uswest.ensembl.org/info/website/tutorials/index.html
UCSC Genome Home Page
http://genome.ucsc.edu/
UCSC Genome Browser
http://genome.ucsc.edu/cgi-bin/hgGateway
UCSC Genome Browser
http://genome.ucsc.edu/cgi-bin/hgTracks?position=chrX:153485203-153499469&hgsid=216983641&knownGene=pack&hgFind.matches=uc004fkd.2,
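The hgTracks URL above carries the browser view in its query string. As a small sketch, the standard library can pull the parameters apart. The interpretation in the comments (for example, that `knownGene=pack` selects a display mode for that track) is an assumption and is not stated on the slide.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://genome.ucsc.edu/cgi-bin/hgTracks?position=chrX:153485203-153499469"
       "&hgsid=216983641&knownGene=pack&hgFind.matches=uc004fkd.2,")

# parse_qs maps each parameter name to a list of its values.
params = parse_qs(urlsplit(url).query)

position = params["position"][0]          # genomic window being displayed
chrom, span = position.split(":")
start, end = (int(x) for x in span.split("-"))

print(chrom, start, end)
print(params["knownGene"][0])             # assumed: display mode for the track
```

Only the `position` value is needed to revisit the same window, so a shorter link with just that parameter would likely suffice.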
UCSC Proteome Browser
http://genome.ucsc.edu/cgi-bin/pbGateway
UCSC Help File
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html