introduction to bioinformatics lecture 2 genes and genomes c e n t r f o r i n t e g r a t i v e b i...
TRANSCRIPT
![Page 1: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/1.jpg)
Introduction to bioinformaticsLecture 2
Genes and Genomes
CENTR
FORINTEGRATIVE
BIOINFORMATICSVU
E
![Page 2: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/2.jpg)
Organisational• Course website:
http://ibi.vu.nl/teaching/mnw_2year/mnw2_2007.phpor click on http://ibi.vu.nl (>teaching >Introduction to Bioinformatics)
• Course book: Bioinformatics and Molecular Evolution by Paul G. Higgs and Teresa K. Attwood (Blackwell Publishing), 2005, ISBN (Pbk) 1-4051-0683-2
• Lots of information about Bioinformatics can be found on the web.
![Page 3: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/3.jpg)
.....acctc ctgtgcaaga acatgaaaca nctgtggttc tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg ggcccaggac tggggaagcc tccagagctc aaaaccccac ttggtgacac aactcacaca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc acggtgccca gagcccaaat cttgtgacac acctccccca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc ccggtgccca gcacctgaac tcttgggagg accgtcagtc ttcctcttcc ccccaaaacc caaggatacc cttatgattt cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac ggcgtggagg tgcataatgc caagacaaag ctgcgggagg agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac cgtcctgcac caggactggc tgaacggcaa ggagtacaag tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg cctggtcaaa ggcttctacc ccagcgacat cgccgtggag tgggagagca atgggcagcc ggagaacaac tacaacacca cgcctcccat gctggactcc gacggctcct tcttcctcta cagcaagctc accgtggaca agagcaggtg gcagcagggg aacatcttct catgctccgt gatgcatgag gctctgcaca accgctacac gcagaagagc ctctc.....
DNA sequenceDNA sequence
![Page 4: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/4.jpg)
Genome sizeGenome sizeOrganism Number of base
pairs
X-174 virus 5,386
Epstein Bar Virus 172,282
Mycoplasma genitalium 580,000
Hemophilus Influenza 1.8 106
Yeast (S. Cerevisiae) 12.1 106
Human Human 3.2 3.2 10 1099
Wheat 16 109
Lilium longiflorum 90 109
Salamander 100 109
Amoeba dubia 670 109
![Page 5: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/5.jpg)
Four DNA nucleotide building blocks
G-C is more strongly hydrogen-bonded than A-T
![Page 6: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/6.jpg)
A gene codes for a protein
Protein
mRNA
DNA
transcription
translation
CCTGAGCCAACTATTGATGAA
PEPTIDE
CCUGAGCCAACUAUUGAUGAA
![Page 7: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/7.jpg)
Central Dogma of Molecular Biology
Replication DNATranscription
mRNATranslation
Protein
Transcription is carried out by RNA polymerase (II)
Translation is performed on ribosomes
Replication is carried out by DNA polymerase
Reverse transcriptase copies RNA into DNA
Transcription + Translation = Expression
![Page 8: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/8.jpg)
But DNA can also be transcribed into non-coding RNA …
tRNA (transfer): transfer of amino acids to theribosome during protein synthesis.
rRNA (ribosomal): essential component of the ribosomes (complex with rProteins).
snRNA (small nuclear): mainly involved in RNA-splicing(removal of introns). snRNPs.
snoRNA (small nucleolar): involved in chemical modifi-cations of ribosomal RNAs and other RNA genes. snoRNPs.
SRP RNA (signal recognition particle): form RNA-protein complex involved in mRNA secretion.
Further: microRNA, eRNA, gRNA, tmRNA etc.
![Page 9: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/9.jpg)
Eukaryotes have spliced genes …
Promoter: involved in transcription initiation (TF/RNApol-binding sites) TSS: transcription start site UTRs: un-translated regions (important for translational control) Exons will be spliced together by removal of the Introns Poly-adenylation site important for transcription termination
(but also: mRNA stability, export mRNA from nucleus etc.)
![Page 10: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/10.jpg)
DNA makes mRNA makes Protein
![Page 11: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/11.jpg)
DNA makes RNA makes Protein
… yet another picture to appreciate the above statement
![Page 12: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/12.jpg)
Some facts about human genes
There are about 20.000 – 25.000 genes in the human genome (~ 3% of the genome)
Average gene length is ~ 8.000 bp
Average of 5-6 exons per gene
Average exon length is ~ 200 bp
Average intron length is ~ 2000 bp
8% of the genes have a single exon
Some exons can be as small as 1 or 3 bp
![Page 13: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/13.jpg)
DMD: the largest known human gene
The largest known human gene is DMD, the gene that encodes dystrophin: ~ 2.4 milion bp over 79 exons
X-linked recessive disease (affects boys)
Two variants: Duchenne-type (DMD) and becker-type (BMD)
Duchenne-type: more severe, frameshift-mutations
Becker-type: milder phenotype, “in frame”- mutations
Posture changes during progression of Duchenne muscular dystrophy
![Page 14: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/14.jpg)
Nucleic acid basics
Nucleic acids are polymers
Each monomer consists of 3 moieties
nucleoside
nucleotide
![Page 15: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/15.jpg)
Nucleic acid basics (2)
A base can be of 5 rings Purines and Pyrimidines can base-pair (Watson- Crick pairs)
Watson and Crick, 1953
![Page 16: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/16.jpg)
Nucleic acid as hetero-polymers
Nucleosides, nucleotides
(Ribose sugar,
RNA precursor)
(2’-deoxy ribose sugar,
DNA precursor)
(2’-deoxy thymidine tri-
phosphate, nucleotide)
DNA and RNA strands
REMEMBER:
DNA = deoxyribonucleotides;RNA = ribonucleotides (OH-groups at the 2’ position)
Note the directionality of DNA (5’-3’ & 3’-5’) or RNA (5’-3’)
DNA = A, G, C, T ; RNA = A, G, C, U
![Page 17: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/17.jpg)
So …
DNA RNA
![Page 18: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/18.jpg)
Stability of base-pairing
C-G base pairing is more stable than A-T (A-U) base pairing (why?)
3rd codon position has freedom to evolve (synonymous mutations)
Species can therefore optimise their G-C content (e.g. thermophiles are GC rich) (consequences for codon use?)
Thermocrinis ruber, heat-loving bacteria
![Page 19: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/19.jpg)
![Page 20: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/20.jpg)
TAA, TAG, TGA Stop Stop codons
CGT, CGC, CGA, CGG, AGA, AGGRArginine
AAA, AAGKLysine
GAT, GACDAspartic acid
GAA, GAGEGlutamic acid
CAT, CACHHistidine
AAT, AACNAsparagine
CAA, CAGQGlutamine
TGGWTryptophan
TAT, TACYTyrosine
TCT, TCC, TCA, TCG, AGT, AGCSSerine
ACT, ACC, ACA, ACGTThreonine
CCT, CCC, CCA, CCGPProline
GGT, GGC, GGA, GGG GGlycine
GCT, GCC, GCA, GCG AAlanine
TGT, TGCc Cysteine
ATGM,
StartMethionine
TTT, TTCFPhenylalanine
GTT, GTC, GTA, GTGVValine
CTT, CTC, CTA, CTG, TTA, TTGLLeucine
ATT, ATC, ATA IIsoleucine
DNA codonsSingle Letter Code
Amino Acid
![Page 21: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/21.jpg)
DNA compositional biases
Base compositions of genomes: G+C (and therefore also A+T) content varies between different genomes
The GC-content is sometimes used to classify organism in taxonomy
High G+C content bacteria: Actinobacteriae.g. in Streptomyces coelicolor it is 72%
Low G+C content: Plasmodium falciparum (~20%)
Other examples:
Saccharomyces cerevisiae (yeast) 38%
Arabidopsis thaliana (plant) 36%
Escherichia coli (bacteria) 50%
![Page 22: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/22.jpg)
Genetic diseases: cystic fibrosis
Known since very early on (“Celtic gene”)
Autosomal, recessive, hereditary disease (Chr. 7)
Symptoms:
Exocrine glands (which produce
sweat and mucus) Abnormal secretions Respiratory problems Reduced fertility and (male)
anatomical anomalies
30,000
3,00020,000
![Page 23: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/23.jpg)
cystic fibrosis (2)
Gene product: CFTR (cystic fibrosis transmembrane conductance regulator)
CFTR is an ABC (ATP-binding cassette) transporter or traffic ATPase.
These proteins transport molecules such as sugars, peptides, inorganic phosphate, chloride, and metal cations across the cellular membrane.
CFTR transports chloride ions (Cl-) ions across the membranes of cells in the lungs, liver, pancreas, digestive tract, reproductive tract, and skin.
![Page 24: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/24.jpg)
cystic fibrosis (3)
CF gene CFTR has 3-bp deletion leading to Del508 (Phe) in 1480 aa protein (epithelial Cl- channel)
Protein degraded in Endoplasmatic Reticulum (ER) instead of inserted into cell membrane
The deltaF508 deletion is the most common cause of cystic fibrosis. The isoleucine (Ile) at amino acid position 507 remains unchanged because both ATC and ATT code for isoleucine
Diagram depicting the five domains of the CFTR membrane protein (Sheppard 1999).
Theoretical Model of NBD1. PDB identifier 1NBD as viewed in Protein Explorer http://proteinexplorer.org
![Page 25: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/25.jpg)
Let’s return to DNA and RNA structure …
Unlike three dimensional structures of proteins, DNA molecules assume simple double helical structures independent of their sequences.
There are three kinds of double helices that have been observed in DNA: type A, type B, and type Z, which differ in their geometries.
RNA on the other hand, can have as diverse structures as proteins, as well as simple double helix of type A.
The ability of being both informational and diverse in structure suggests that RNA was the prebiotic molecule that could function in both replication and catalysis (The RNA World Hypothesis).
In fact, some viruses encode their genetic materials by RNA (retrovirus)
![Page 26: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/26.jpg)
Three dimensional structures of double helices
Side view: A-DNA, B-DNA, Z-DNA
Top view: A-DNA, B-DNA, Z-DNA
Space-filling models of A, B and Z- DNA
![Page 27: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/27.jpg)
Major and minor grooves
![Page 28: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/28.jpg)
Forces that stabilize nucleic acid double helix
There are two major forces that contribute to stability of helix formation: Hydrogen bonding in base-pairing Hydrophobic interactions in base stacking
5’
5’
3’
3’
Same strand stacking
cross-strand stacking
![Page 29: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/29.jpg)
Types of DNA double helix
Type A
major conformation RNAminor conformation DNA
Right-handed helixShort and broad
Type B
major conformation DNA
Right-handed helixLong and thin
Type Z
minor conformation DNA
Left-handed helixLonger and thinner
![Page 30: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/30.jpg)
Secondary structures of Nucleic acids
DNA is primarily in duplex form
RNA is normally single stranded which can have a diverse form of secondary structures other than duplex.
![Page 31: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/31.jpg)
Non B-DNA Secondary structures
Cruciform DNA
Triple helical DNA
Slipped DNA
Hoogsteen basepairs
Source: Van Dongen et al. (1999) , Nature Structural Biology 6, 854 - 859
![Page 32: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/32.jpg)
More Secondary structures
RNA pseudoknots Cloverleaf rRNA structure
Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) THE RNA WORLD. Cold Spring Harbor Laboratory Press.
16S rRNA Secondary Structure Based onPhylogenetic Data
![Page 33: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/33.jpg)
3D structures of RNA :transfer-RNA structures
Secondary structure of tRNA (cloverleaf)
Tertiary structure of tRNA
![Page 34: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/34.jpg)
3D structures of RNA :ribosomal-RNA structures
Secondary structure of large rRNA (16S)
Tertiary structure of large rRNA subunit
Ban et al., Science 289 (905-920), 2000
![Page 35: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/35.jpg)
3D structures of RNA :Catalytic RNA
Secondary structure of self-splicing RNA
Tertiary structure of self-splicing RNA
![Page 36: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/36.jpg)
Some structural rules …
Base-pairing is stabilizing
Un-paired sections (loops) destabilize
3D conformation with interactions makes up for this
![Page 37: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/37.jpg)
Three main principles
• DNA makes RNA makes Protein
• Structure more conserved than sequence
• Sequence Structure Function
![Page 38: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/38.jpg)
How to go from DNA to protein sequence
A piece of double stranded DNA:
5’ attcgttggcaaatcgcccctatccggc 3’3’ taagcaaccgtttagcggggataggccg 5’
DNA direction is from 5’ to 3’
![Page 39: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/39.jpg)
How to go from DNA to protein sequence
6-frame conceptual translation using the codon table:
5’ attcgttggcaaatcgcccctatccggc 3’
3’ taagcaaccgtttagcggggataggccg 5’
So, there are six possibilities to make a protein from an unknown piece of DNA, only one of which might be a natural protein
![Page 40: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/40.jpg)
Remark
• Identifying (annotating) human genes, i.e. finding what they are and what they do, is a difficult problem– First, the gene should be delineated on the genome
• Gene finding methods should be able to tell a gene region from a non-gene region
• Start, stop codons, further compositional differences
– Then, a putative function should be found for the gene located
![Page 41: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/41.jpg)
Dean, A. M. and G. B. Golding: Pacific Symposium on Bioinformatics 2000
Evolution and three-dimensional protein structure information
Isocitratedehydrogenase:
The distance fromthe active site(in yellow) determinesthe rate of evolution(red = fast evolution, blue = slow evolution)
![Page 42: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/42.jpg)
Genomic Data Sources
• DNA/protein sequence
• Expression (microarray)
• Proteome (xray, NMR,
mass spectrometry)
• Metabolome
• Physiome (spatial,
temporal)
Integrative bioinformatics
![Page 43: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/43.jpg)
Dinner discussion: Integrative Bioinformatics & Genomics VUDinner discussion: Integrative Bioinformatics & Genomics VU
metabolomemetabolome
proteomeproteome
genomegenome
transcriptometranscriptome
physiomephysiome
Genomic Data SourcesVertical Genomics
![Page 44: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/44.jpg)
DNA makes RNA makes Protein(reminder)
![Page 45: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/45.jpg)
DNA makes RNA makes Protein:Expression data
• More copies of mRNA for a gene leads to more protein
• mRNA can now be measured for all the genes in a cell at ones through microarray technology
• Can have 60,000 spots (genes) on a single gene chip
• Colour change gives intensity of gene expression (over- or under-expression)
![Page 46: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/46.jpg)
![Page 47: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/47.jpg)
Proteomics
• Elucidating all 3D structures of proteins in the cell
• This is also called Structural Genomics
• Finding out what these proteins do
• This is also called Functional Genomics
![Page 48: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/48.jpg)
![Page 49: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/49.jpg)
Protein-protein interaction networks
![Page 50: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/50.jpg)
Metabolic networks
Glycolysis and
Gluconeogenesis
Kegg database (Japan)
![Page 51: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/51.jpg)
High-throughput Biological Data
• Enormous amounts of biological data are being generated by high-throughput capabilities; even more are coming– genomic sequences
– arrayCGH (Comparative Genomic Hybridization) data, gene expression data
– mass spectrometry data
– protein-protein interaction data
– protein structures
– ......
![Page 52: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/52.jpg)
Protein structural data explosion
Protein Data Bank (PDB): 14500 Structures (6 March 2001)10900 x-ray crystallography, 1810 NMR, 278 theoretical models, others...
![Page 53: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/53.jpg)
Dickerson’s formula: equivalent to Moore’s law
On 27 March 2001 there were 12,123 3D protein structures in the PDB: Dickerson’s formula predicts 12,066 (within 0.5%)!
n = e0.19(y-1960) with y the year.
![Page 54: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/54.jpg)
Sequence versus structural data• Structural genomics initiatives are now in full
swing and growth is still exponential.
• However, growth of sequence data is even more rapidly. There are now more than 500 completely sequenced genomes publicly available.
Increasing gap between structural and sequence data (“Mind the gap”)
![Page 55: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/55.jpg)
BioinformaticsLarge - external(integrative) Science Human
Planetary Science Cultural Anthropology
Population Biology Sociology Sociobiology Psychology Systems Biology Biology Medicine
Molecular Biology Chemistry Physics
Small – internal (individual)
Bioinformatics
![Page 56: Introduction to bioinformatics Lecture 2 Genes and Genomes C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e4f5503460f94b4676f/html5/thumbnails/56.jpg)
Bioinformatics• Offers an ever more essential input to
– Molecular Biology– Pharmacology (drug design)– Agriculture– Biotechnology– Clinical medicine– Anthropology– Forensic science– Chemical industries (detergent industries, etc.)