CS 6463: An overview of Molecular Biology 1
21st Century = Biotech Century
• Completion of the human genome
• High-throughput microarrays and similar devices
• Cloning
• Genetic engineering
• Computational power
Everyone is moving towards Biotech
Explosive growth of biological data
Biology is becoming more computationally intensive: high-throughput bioinformatics produces lots of data.
The Molecular Biology Database Collection: 2005 update. Small excerpt from the A's:
• AARSDB: Aminoacyl-tRNA synthetase sequences
• ABCdb: ABC transporters
• AceDB: C. elegans, S. pombe, and human sequences and genomic information
• ACTIVITY: Functional DNA/RNA site activity
• ALFRED: Allele frequencies and DNA polymorphisms
Opportunities for CS
Possibilities for CS contributions:
• Data integration problem
• Data extraction from literature (natural language processing)
• Database issues (including automation)
• Visualization
• Mining large complex data sets
Objective
• Introduction to basic molecular biology for computer science students, by a computer scientist
• A survey of databases: NCBI, SwissProt, PDB, Transfac, …
• Introduction to computational techniques for analyzing genomics (and proteomics) data
Communication is important
Textbooks and course website
Required textbooks:
• Molecular Biology of the Cell (main text)
• Bioinformatics, Genomics and Proteomics (lab)
Other material (references):
• Human Molecular Genetics (2nd edition, available for free)
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (The Morgan Kaufmann Series in Data Management Systems) by Ian H. Witten and Eibe Frank (paperback)
• Microarrays for an Integrative Genomics (Computational Molecular Biology) by Isaac S. Kohane et al. (paperback)
• Molecular Biology Web Book
Course website: http://www.cs.utsa.edu/~kwek/cs6463f05.html
Intended Audience
• CS graduate students who are interested in bioinformatics or want to explore it.
• Background: high school biology.
• Not for students looking for a filler class in between classes.
Every Tuesday from noon to 1 pm, the Human Genome (HuGe) lab meets to discuss current bioinformatics issues. All are welcome, even if you are new to bioinformatics (but are taking this course).
Database Search
Course Organization
Overview of Molecular Biology (and project discussion)
Molecular Biology:
• Introduction to the Cell: 1. Cells and Genomes, 2. Cell Chemistry and Biosynthesis, 3. Proteins
• Basic Genetic Mechanisms: 4. DNA and Chromosomes, 6. From DNA to Protein, 7. Control of Gene Expression
• Diseases: 23. Cancer, 25. Pathogens
• Others: SNP, RNAi
Bioinformatics/Computational Biology:
• Databases
• Data preprocessing, classification problem, clustering problem, microarray analysis
• Sequence alignment, Hidden Markov Model
• Gene finding, motif finding
Project and Grade Distributions
• 1 quiz: 10%
• 2 tests: 30%
• Homework and lab: 10%
• Project: 50% (+10% bonus)
Project
• Serious about bioinformatics (all HuGe Lab members): a mini (NIH-style) proposal project. Besides preliminary results, include a proposal for future work (e.g., independent studies, theses). Collaborations with UTHSCSA and others are possible.
  Specific Aim(s): What do you want to do? Why is it important?
  Background: What has been done previously? (What makes your approach interesting?) Where do you get your data?
  (Preliminary) Results: to be elaborated later.
  Future Work: to be elaborated later.
• A regular project: same as above, except that future work is not needed.
Office hours (for projects): by appointment (send me an email 24 hours before). Tu, Th 10-3, 5-7, 8:30-10; W 10:30-noon.
Some Important Dates
• September 13: Quiz 1 (there will be a second-chance quiz)
• September 20: Specific aim of project due [1 meeting to discuss with me]
• October 27: Test 1
• October 18: Background of project due; you must have already started doing experiments [2 meetings to discuss with me]
• November 24: Test 2
• December 10: Final report of project [2 meetings to discuss with me]
IMPORTANT: if you do not meet with me the required number of times, I will not accept your report. Also, meetings must be at least one week apart.
Your Responsibility
• Do the assigned reading once the material has been covered in lecture; the lecture is meant to make your reading easier.
• Try printing out the slides to take notes.
• Project: observe the deadlines! Come and talk to me.
A. An overview of molecular biology
Reading: Human Molecular Genetics, Ch. 1
A.1. Background
A.2. Macromolecules
A.3. DNA structure
A.4. RNA transcription and Gene Expression
A.5. RNA processing
A.6. Translation, post-translational processing and protein structure
A.7. Project ideas
A.1 Background: Prokaryotic and Eukaryotic Cells
Two types of cells:
1. Prokaryotic (bacteria, e.g., E. coli)
2. Eukaryotic (multicellular organisms, amoebae)
A.1 Background: Prokaryotic and Eukaryotic Cells
http://www-class.unl.edu/bios201a/spring97/group6/
A.2. Building Blocks: Chemical Composition of Cells (E. coli vs. Mammalian Cell)
Water [E. coli: 70%, Mammal: 70%]
Macromolecules:
• DNA (deoxyribonucleic acid) [E. coli: 1%, Mammal: 0.25%]
• RNA (ribonucleic acid) [E. coli: 6%, Mammal: 1.1%]
• Proteins [E. coli: 15%, Mammal: 18%]
Inorganic ions: Na+, K+, Mg2+, Ca2+, Cl- [E. coli: 1%, Mammal: 1%]
Lipids:
• Phospholipids [E. coli: 2%, Mammal: 3%]
• Other lipids [E. coli: -, Mammal: 0.2%]
Polysaccharides [E. coli: 1%, Mammal: 0.25%]
Volume [E. coli: 2 x 10^-12 cm³, Mammal: 4 x 10^-9 cm³]
Relative volume [E. coli : Mammal = 1 : 2000]
A.2 Building Blocks: Structure of bases, nucleosides and nucleotides
DNA: ‘polymer of A, G, T, C’
RNA: ‘polymer of A, G, U (replaces T), C’
(Figure labels: sugar, base, purines (A, G), pyrimidines (C, T, U))
A.2. Building Blocks: Common bases found in nucleic acids
A.2 Building Blocks: 20 amino acids
Polypeptides: chains of amino acids
(Figure labels: amino group, carboxyl group)
A.2. Building Blocks: Abbreviation of Amino Acids
Name, abbreviation (three-letter, one-letter), linear structure:
Alanine ala A CH3-CH(NH2)-COOH
Arginine arg R HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH
Asparagine asn N H2N-CO-CH2-CH(NH2)-COOH
Aspartic Acid asp D HOOC-CH2-CH(NH2)-COOH
Cysteine cys C HS-CH2-CH(NH2)-COOH
Glutamic Acid glu E HOOC-(CH2)2-CH(NH2)-COOH
Glutamine gln Q H2N-CO-(CH2)2-CH(NH2)-COOH
Glycine gly G NH2-CH2-COOH
Histidine his H NH-CH=N-CH=C-CH2-CH(NH2)-COOH
Isoleucine ile I CH3-CH2-CH(CH3)-CH(NH2)-COOH
Leucine leu L (CH3)2-CH-CH2-CH(NH2)-COOH
Lysine lys K H2N-(CH2)4-CH(NH2)-COOH
Methionine met M CH3-S-(CH2)2-CH(NH2)-COOH
Phenylalanine phe F Ph-CH2-CH(NH2)-COOH
Proline pro P NH-(CH2)3-CH-COOH
Serine ser S HO-CH2-CH(NH2)-COOH
Threonine thr T CH3-CH(OH)-CH(NH2)-COOH
Tryptophan trp W Ph-NH-CH=C-CH2-CH(NH2)-COOH
Tyrosine tyr Y HO-Ph-CH2-CH(NH2)-COOH
Valine val V (CH3)2-CH-CH(NH2)-COOH
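Sequence files and tools usually store proteins with the one-letter codes from the table above, so converting between the two notations is a common first task. A minimal Python sketch, assuming residues arrive as a list of three-letter codes; `to_one_letter` and `to_three_letter` are hypothetical helper names, and an unknown code simply raises `KeyError`:

```python
# Three-letter -> one-letter codes, copied from the abbreviation table.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "glu": "E", "gln": "Q", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}
# Reverse lookup: one-letter -> three-letter.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """['Met', 'Gly', 'Trp'] -> 'MGW' (input case does not matter)."""
    return "".join(THREE_TO_ONE[r.lower()] for r in residues)


def to_three_letter(sequence):
    """'MKV' -> 'Met-Lys-Val'."""
    return "-".join(ONE_TO_THREE[c].capitalize() for c in sequence.upper())


print(to_one_letter(["Met", "Gly", "Trp"]))  # MGW
print(to_three_letter("mkv"))                # Met-Lys-Val
```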
A.2. Building blocks: Properties of Amino Acids I
http://www.russell.embl-heidelberg.de/aas/aas.html
A.2. Building blocks: Some Terms for describing Properties of Amino Acids
Hydrophobic amino acids are those with side-chains that do not like to reside in an aqueous (i.e. water) environment.
Polar amino acids are those with side-chains that prefer to reside in an aqueous (i.e. water) environment.
Strictly speaking, aliphatic implies that the protein side chain contains only carbon and hydrogen atoms.
A side chain is aromatic when it contains an aromatic ring system.
A.2 Building Blocks: Covalent and Non-covalent Bonds
Covalent bonds: stronger. Nucleic acid and protein polymers are formed by covalent bonds that connect nucleotides and amino acids (respectively) into a linear backbone.
Non-covalent bonds: weaker and reversible. 4 types:
1. Hydrogen bonds: e.g., N-H...O (double-stranded DNA, protein folding, etc.)
2. Ionic bonds: ionic interaction between charged groups, e.g., Na+ and Cl-
3. Van der Waals attractions: weak attraction between two atoms, strongest at an optimal distance
4. Hydrophobic forces: water is a polar molecule, so it forces nonpolar (hydrophobic) groups together
A. An overview of molecular biology
A.1. Background
A.2. Building Blocks of Macromolecules
A.3. DNA structure
A.4. RNA transcription and Gene Expression
A.5. RNA processing
A.6. Translation, post-translational processing and protein structure
A.7. Project ideas
A.3 DNA Structure: The Phosphodiester Bond
A.3 DNA Structure: base pairing (Watson-Crick Rule).
A.3 DNA Structure: DNA is a double-stranded anti-parallel helix
http://www.sumanasinc.com/webcontent/anisamples/molecularbiology/DNA_structure.html
(Figure labels: upstream, downstream)
Complementary DNA (cDNA)
Exercise: if %GC = 40%, what percentage of the bases is G? C? A? T?
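Working the exercise: in double-stranded DNA every G pairs with a C and every A with a T (Watson-Crick rule), so G = C and A = T. With %GC = 40%, G = C = 20% and A = T = (100 - 40)/2 = 30%. The Python sketch below encodes that reasoning and also builds the complementary strand, reversed so that it reads 5’→3’ like the original. The helper names are hypothetical, and input is assumed to contain only A, C, G and T:

```python
# Watson-Crick complements in the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """The opposite strand, written 5'->3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G + C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def base_percentages_from_gc(gc):
    """Double-stranded DNA: G = C and A = T, so %GC fixes all four bases."""
    return {"G": gc / 2, "C": gc / 2, "A": (100 - gc) / 2, "T": (100 - gc) / 2}


print(reverse_complement("ATGCGT"))   # ACGCAT
print(base_percentages_from_gc(40))   # {'G': 20.0, 'C': 20.0, 'A': 30.0, 'T': 30.0}
```

Because each G on one strand faces a C on the other, a strand and its reverse complement always have the same %GC.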
A.3 DNA Structure: DNA is a double-stranded anti-parallel helix
A.3 DNA Structure: RNA structure
(Figure label: palindrome)
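In nucleic-acid terms a palindrome is a sequence that equals its own reverse complement: GAAUUC, for example, reads the same 5’→3’ on both strands. Self-complementary stretches like this let a single RNA strand fold back and pair with itself. A minimal Python sketch that only finds strict palindromic windows; it ignores the loop of a real hairpin and G-U wobble pairs, and the function names are hypothetical:

```python
# Watson-Crick complements in the RNA alphabet.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def is_palindrome(rna):
    """True if the sequence equals its own reverse complement."""
    rna = rna.upper()
    return rna == rna.translate(RNA_COMPLEMENT)[::-1]


def find_palindromes(rna, length):
    """(0-based start, window) for every palindromic window of the given length."""
    rna = rna.upper()
    return [
        (i, rna[i:i + length])
        for i in range(len(rna) - length + 1)
        if is_palindrome(rna[i:i + length])
    ]


print(is_palindrome("GAAUUC"))             # True
print(find_palindromes("CCGAAUUCGG", 6))   # [(2, 'GAAUUC')]
```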
A.3 DNA Structure: Viral Genomes
Highly variable:
• DNA or RNA
• Single-stranded or double-stranded
• Linear or circular
• Segmented and multipartite
Many viruses replicate in the cytosol. An unusual case: retroviruses copy their RNA genome into DNA (using reverse transcriptase), and that DNA is integrated into the host genome in the nucleus.
A.3 DNA Structure: The Central Dogma
Old one-directional model: DNA → RNA → protein
A.4 Transcription and Gene Expression: Transcription
(Figure: a gene inside the nucleus and its pre-mRNA. Labels: 5’ and 3’ ends; promoter; TFBS (almost always there); 1st key; 2nd key (may not be there; mostly for non-housekeeping genes); 5’ UTR, start, exon, intron, exon, intron, exon, stop, 3’ UTR; pre-mRNA built from complementary nucleotides, with cap and poly A tail; nuclear membrane and pore.)
TFBS: transcription factor binding site
A.4 Transcription and Gene Expression: Gene Regulation
(Figure: base pairing during transcription; template DNA bases A, G, T, C are matched by RNA bases U, C, A, G.)
Animations: http://henge.bio.miami.edu/mallery/movies/transcription.mov
http://www-class.unl.edu/biochem/gp2/m_biology/animation/gene/gene_a2.html
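RNA polymerase reads the template strand and adds the complementary RNA base at each position (A→U, G→C, T→A, C→G), so the RNA has the same sequence as the coding (non-template) strand, with U in place of T. A minimal Python sketch of both views, assuming the template strand is written 3’→5’ so the output lines up 5’→3’; the function names and toy sequences are illustrative only:

```python
# RNA base added opposite each template-strand DNA base.
TEMPLATE_TO_RNA = str.maketrans("AGTC", "UCAG")


def transcribe_template(template_3_to_5):
    """Template strand written 3'->5' in; RNA written 5'->3' out."""
    return template_3_to_5.upper().translate(TEMPLATE_TO_RNA)


def transcribe_coding(coding_5_to_3):
    """Same RNA read off the coding strand: just replace T with U."""
    return coding_5_to_3.upper().replace("T", "U")


template = "TACGGATTC"   # 3'->5'
coding = "ATGCCTAAG"     # 5'->3', complementary to the template
print(transcribe_template(template))  # AUGCCUAAG
print(transcribe_coding(coding))      # AUGCCUAAG
```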
A.4 Transcription and Gene Expression: RNA Polymerase
There are three classes of RNA polymerases:
• Polymerase I: localized in the nucleolus; transcribes the 28S, 18S and 5.8S rRNAs (ribosomal RNAs).
• Polymerase II: transcribes all protein-coding genes and most snRNAs; its transcripts are the only ones that are capped and polyadenylated.
• Polymerase III: transcribes tRNAs, other rRNAs and snRNAs. [The promoter can be downstream of the transcription start site.]
Pseudogenes (gene fragments): sequences that were previously genes.
Only about 2% of the human genome encodes proteins.
A.4 Transcription and Gene Expression: Trans- and cis-elements
Cis-element | DNA sequence | Trans-acting factor
GC box | GGGCGG | Sp1
TATA box | TATAA | TFIID (TFIIA stabilizes it)
CAAT box | CCAAT | many
TRE | GTGAGT(A/C)A | AP-1 family (many)
CRE (cAMP response element) | GTGACGT(A/C)A(A/G) | CREB/ATF family
Important: the presence of a matching pattern does not necessarily mean that it is a cis-element.
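The consensus patterns in the table can be searched with regular expressions once each `(A/C)` is rewritten as the character class `[AC]`. The Python sketch below does only that: it scans one strand (real elements can lie on either strand), and, as the warning above says, a hit is merely a candidate, not a confirmed cis-element. `CIS_ELEMENTS`, `to_regex` and `scan` are hypothetical names:

```python
import re

# Consensus sequences from the table; "(A/C)" means either base at that position.
CIS_ELEMENTS = {
    "GC box": "GGGCGG",
    "TATA box": "TATAA",
    "CAAT box": "CCAAT",
    "TRE": "GTGAGT(A/C)A",
    "CRE": "GTGACGT(A/C)A(A/G)",
}


def to_regex(consensus):
    """Rewrite '(A/C)'-style alternatives as regex classes such as '[AC]'."""
    return re.sub(r"\(([ACGT])/([ACGT])\)", r"[\1\2]", consensus)


def scan(seq):
    """(element, 0-based start, matched text) for each hit on the given strand.
    The zero-width lookahead '(?=(...))' also reports overlapping hits."""
    seq = seq.upper()
    hits = []
    for name, consensus in CIS_ELEMENTS.items():
        for m in re.finditer("(?=(" + to_regex(consensus) + "))", seq):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


print(scan("GGGCGGTATAAAAGCCAATG"))
# [('GC box', 0, 'GGGCGG'), ('TATA box', 6, 'TATAA'), ('CAAT box', 14, 'CCAAT')]
```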
A.4 Transcription and Gene Expression: Promoters
Numbering starts from +1 (the transcription start site), not 0; the base just upstream is -1.
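Because promoter coordinates skip 0, mapping them onto the 0-based indices used by most programming languages is an easy place for off-by-one errors. A minimal Python sketch, assuming the 0-based index of the transcription start site (TSS) in the sequence string is already known; both function names are hypothetical:

```python
def promoter_coordinate(index, tss_index):
    """0-based string index -> promoter coordinate (TSS = +1, base before it = -1)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset  # there is no position 0


def string_index(coordinate, tss_index):
    """Promoter coordinate -> 0-based string index (inverse of the above)."""
    if coordinate == 0:
        raise ValueError("promoter numbering has no position 0")
    return tss_index + coordinate - 1 if coordinate > 0 else tss_index + coordinate


print([promoter_coordinate(i, 3) for i in range(6)])  # [-3, -2, -1, 1, 2, 3]
```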
A.4 Transcription and Gene Expression: Enhancers and Silencers (Transcription Factors)
(Figure label: many base pairs away)
A.4 Transcription and Gene Expression: Tissue Specific Genes
Housekeeping genes: e.g., genes encoding histone proteins and ribosomal proteins. Always on.
Tissue- or development-specific (non-housekeeping) genes are kept off where they are not needed by:
• transcriptionally inactive chromatin
• methylation of cytosine (replacing a hydrogen (H) with a methyl group (CH3))
• low expression levels of the relevant transcription factors
Microarrays measure the expression levels of genes
A.4 Transcription and Gene Expression: Transcription
(Figure: transcription and splicing in the nucleus. Labels: 5’ and 3’ ends; promoter; TFBS (almost always there); 1st key; 2nd key (may not be there; mostly for non-housekeeping genes); pre-mRNA of complementary nucleotides with 5’ UTR, start, exon, intron, exon, intron, exon, stop, 3’ UTR, cap and poly A tail; after the introns are spliced out, messenger RNA (mRNA) with 5’ UTR, start, exon, exon, exon, stop, 3’ UTR and poly A tail; nuclear membrane and pore.)
Splicing the introns: http://www.sumanasinc.com/webcontent/anisamples/molecularbiology/mRNAsplicing.html
A.5 RNA Processing: RNA Splicing
(Figure labels: donor, acceptor)
• GT-AG spliceosome
• AT-AC spliceosome (rare)
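Splicing can be modelled as cutting out the introns and joining the exons. In the RNA alphabet the GT-AG rule reads GU-AG, and the rare AT-AC class reads AU-AC. A minimal Python sketch, assuming the exon boundaries are already known (given here as hypothetical 0-based, end-exclusive coordinates on toy data); real splice-site recognition involves much more than these terminal dinucleotides:

```python
def intron_class(intron):
    """Classify an intron by its terminal dinucleotides (RNA alphabet)."""
    if intron.startswith("GU") and intron.endswith("AG"):
        return "GU-AG"
    if intron.startswith("AU") and intron.endswith("AC"):
        return "AU-AC"
    return "other"


def splice(pre_mrna, exons):
    """Join exons given as (start, end) pairs in 5'->3' order.
    Returns the mature sequence and the class of each removed intron."""
    rna = pre_mrna.upper().replace("T", "U")  # also accept DNA-style input
    mature = "".join(rna[start:end] for start, end in exons)
    introns = [rna[exons[k][1]:exons[k + 1][0]] for k in range(len(exons) - 1)]
    return mature, [intron_class(intron) for intron in introns]


pre = "AUGGCC" + "GUAAGUCCUUUAG" + "GAUUAA"   # exon 1 + intron + exon 2 (toy data)
print(splice(pre, [(0, 6), (19, 25)]))         # ('AUGGCCGAUUAA', ['GU-AG'])
```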
A.5 RNA Processing: Consensus Sequences at splice donor, acceptor and branch sites
A.5 RNA Processing: Mechanism of RNA Splicing (GU-AG introns)
Spliceosome (5 snRNAs)
http://www.nature.com/nrn/journal/v2/n1/animation/nrn0101_043a_swf_MEDIA1.html
A.5 RNA Processing: 5’ End Capping
A.5 RNA Processing: 3’ End Polyadenylation
A.5 RNA Processing: Functions of 5’ End Cap and Poly A tail
Functions of the 5’ end cap:
1. Protect mRNA molecules from degradation
2. Facilitate transport to the cytoplasm
3. Facilitate RNA splicing
4. Facilitate translation
Functions of the 3’ poly(A) tail:
1. Facilitate transport to the cytoplasm
2. Stabilize the mRNA in the cytoplasm
3. Facilitate translation
A.5 RNA Processing: Example of the human -globin gene
A.5 RNA Processing: Export out of the nucleus
A.6 Translation and Post-Translational Processing: The Codon-Anticodon Recognition
http://henge.bio.miami.edu/mallery/movies/translation.mov
(almost always) tRNA
A.6 Translation and Post-Translational Processing : Peptide Bond Formation
A.6 Translation and Post-Translational Processing: The Genetic Codes
(Figure labels: N-terminal, C-terminal)
A.6 Translation and Post-Translational Processing: The Genetic Codes
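(Figure labels: wobble; mitochondrial)
64 possible codons: 1 start codon (AUG, which also codes for methionine) and 3 stop codons; the 61 non-stop codons (including AUG) encode the 20 amino acids.
Signals in some mRNAs can lead to alternative interpretations of stop codons: UGA as selenocysteine (the 21st amino acid) and UAG as pyrrolysine (the 22nd amino acid).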
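The 64-codon table can be built compactly by listing the amino acids in U, C, A, G order for the first, second and third codon positions. A minimal Python sketch of translation with the standard code, starting at the first AUG and stopping at the first stop codon; it deliberately ignores the mitochondrial variants, wobble pairing and the selenocysteine/pyrrolysine recoding mentioned above, and `translate` is a hypothetical name:

```python
BASES = "UCAG"
# Standard genetic code for codons UUU, UUC, UUA, UUG, UCU, ..., GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG up to, but not including, the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))               # 64
print(translate("GGAUGGCCGAUUAAGC"))  # MAD
```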
A.6 Translation and Post-Translational Processing: Multiple Post-Translational Cleavages of Polypeptide Precursors
A.6 Translation and Post-Translational Processing: Protein Secondary Structure
A.6 Translation and Post-Translational Processing: Quaternary
(Figure: amino acid sequence → secondary structure → tertiary structure)
A.6 Translation and Post-Translational Processing: Quaternary Structure
A.6 Translation and Post-Translational Processing: Disulfide Bridges
A.6 Translation and Post-Translational Processing: Post-translational Modification
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.table.103
A.6 Translation and Post-Translational Processing: Protein Sorting (Localization)
Protein destination: (typical) location and form of signal
• Endoplasmic reticulum and secretion from the cell: N-terminal peptide of 20 or so very hydrophobic amino acids
• Mitochondria: N-terminal peptide forming an α-helix, with one side hydrophilic and one side hydrophobic
• Nucleus: internal sequence of amino acids, often a string of basic amino acids plus prolines; may be bipartite
• Lysosome: addition of mannose 6-phosphate residues
1. Signal Peptide
2. Post-translational modification
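The ER signal is described as an N-terminal stretch of very hydrophobic amino acids. One common way to put a number on "hydrophobic" is a hydropathy scale averaged over a sliding window; the Python sketch below uses the Kyte-Doolittle scale, which is a swapped-in choice and not part of these slides. The window of 9 residues, the 30-residue N-terminal region and the toy sequences are assumptions, and a high score is only a hint, not a signal-peptide predictor:

```python
# Kyte-Doolittle hydropathy scale: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein, window=9):
    """Mean hydropathy of every run of `window` consecutive residues."""
    protein = protein.upper()
    if len(protein) < window:
        raise ValueError("sequence is shorter than the window")
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


def n_terminal_peak(protein, region=30, window=9):
    """Highest window mean within the first `region` residues (hypothetical helper)."""
    return max(window_hydropathy(protein[:region], window))


hydrophobic_start = "MKLLLLLLLLLLLLLSAQA"   # toy sequence, not a real protein
hydrophilic_start = "MSDEKRNQEDKSTEQKDEN"   # toy sequence, not a real protein
print(n_terminal_peak(hydrophobic_start) > n_terminal_peak(hydrophilic_start))  # True
```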
A.6 Translation and Post-Translational Processing: Cellular Function of Proteins
Diverse cellular functions:
• Enzymes (catalysts, e.g., ones that ‘cut things into pieces’)
• Receptors
• Transport
• Transcription factors
• Signaling
• Hormones
• Structural
• etc.
A.7 Summary: Central Dogma Simplified
(Figure label: enzymes, receptors, etc.)
A.7 Summary: Don’t forget about mitochondria!
A.7 Summary: Life is more complex