Introduction to Bioinformatics: Molecular Biology Primer
Post on 20-Dec-2015
Genetic Material
• DNA (deoxyribonucleic acid) is the genetic material
• Information stored in DNA
  – the basis of inheritance
  – distinguishes living things from nonliving things
• Genes
  – the units that govern a living thing's characteristics at the genetic level
Nucleotides
• Genes themselves contain their information as a specific sequence of nucleotides found in DNA molecules
• Only four different bases are used in DNA molecules
  – Guanine (G)
  – Adenine (A)
  – Thymine (T)
  – Cytosine (C)
• Each base is attached to a phosphate group and a deoxyribose sugar to form a nucleotide.
• The only thing that makes one nucleotide different from another is which nitrogenous base it contains
[Figure: a nucleotide, drawn as a phosphate (P) and a base attached to a sugar]
Nucleotides
• Complicated genes can be many thousands of nucleotides long
• All of an organism’s genetic instructions, its genome, can be maintained in millions or even billions of nucleotides
Orientation
• Strings of nucleotides can be attached to each other to make long polynucleotide chains
• 5' (5 prime) end
  – The end of a string of nucleotides with a 5' carbon not attached to another nucleotide
• 3' (3 prime) end
  – The other end of the molecule, with an unattached 3' carbon
Base Pairing
• Structure of DNA
  – Double helix
  – Paper by Watson and Crick in 1953
• The information content on one strand is essentially redundant with the information on the other
  – Not exactly the same: it is complementary
• Base pairs
  – G paired with C (G≡C, three hydrogen bonds)
  – A paired with T (A=T, two hydrogen bonds)
Base Pairing
• Reverse complements
  – The 5' end of one strand corresponds to the 3' end of its complementary strand, and vice versa
• Example
  – One strand: 5'-GTATCC-3'
  – The other strand: 3'-CATAGG-5', which is the same as 5'-GGATAC-3'
• Upstream: Sequence features that are 5' to a particular reference point
• Downstream: Sequence features that are 3' to a particular reference point
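The reverse-complement rule can be checked in a few lines of Python. This is a minimal sketch: the name `reverse_complement` is an illustrative choice, and the function assumes the input contains only the uppercase letters A, C, G, and T.

```python
# Pairing rules from the slide: G pairs with C, A pairs with T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    Assumes `strand` is written 5' to 3' and contains only A, C, G, T.
    """
    # Complement every base, then reverse so the result also reads 5' to 3'.
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("GTATCC"))  # GGATAC, as in the slide's example
```

Applying the function twice returns the original strand, which is one quick sanity check on the pairing table.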
DNA Structure
• Let’s see what Watson and Crick said about their discovery …
Chromosome
• Different kinds of organisms have different numbers of chromosomes
• Humans: 23 pairs, 46 in all
Central Dogma of Molecular Biology
• DNA: information storage
• Protein: functional unit, such as an enzyme
• Gene: instructions needed to make a protein
• Central dogma: DNA → RNA → protein
Central Dogma of Molecular Biology
• RNA (ribonucleic acid)
  – Single-stranded polynucleotide
  – Bases: A, G, C, and U (uracil) instead of T
• Transcription
  – A → A, G → G, C → C, T → U
• Let’s see what Crick said about his proposal …
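The base mapping for transcription (A → A, G → G, C → C, T → U) describes the mRNA relative to the coding (non-template) strand of the DNA. A minimal sketch under that assumption follows; the function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand written 5' to 3'.

    Assumption: the input is the coding (non-template) strand, so the
    mRNA has the same sequence with every T replaced by U.
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGACC"))  # AUGACC
```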
[Figure: DNA and RNA nucleotides compared. The DNA sugar carries an H at its 2' carbon where the RNA sugar carries an OH]
DNA Replication Animation
Courtesy of Rob Rutherford, St. Olaf University
Transcription (DNA → RNA)
• Messenger RNA (mRNA)
  – carries information to be translated
• Ribosomal RNA (rRNA)
  – the working "spine" of the ribosome
• Transfer RNA (tRNA)
  – the "decoder keys" that will translate nucleic acids to amino acids
Transcription Animation
Courtesy of Rob Rutherford, St. Olaf University
Peptides and Proteins
• mRNA → sequence of amino acids connected by peptide bonds
• Amino acid sequence
  – Peptide: fewer than about 30 to 50 amino acids
  – Protein: a longer peptide
List of Amino Acids
Letter  Amino acid     Symbol  Codons
A       Alanine        Ala     GC*
C       Cysteine       Cys     UGU, UGC
D       Aspartic Acid  Asp     GAU, GAC
E       Glutamic Acid  Glu     GAA, GAG
F       Phenylalanine  Phe     UUU, UUC
G       Glycine        Gly     GG*
H       Histidine      His     CAU, CAC
I       Isoleucine     Ile     AUU, AUC, AUA
K       Lysine         Lys     AAA, AAG
L       Leucine        Leu     UUA, UUG, CU*
M       Methionine     Met     AUG
N       Asparagine     Asn     AAU, AAC
P       Proline        Pro     CC*
Q       Glutamine      Gln     CAA, CAG
R       Arginine       Arg     CG*, AGA, AGG
S       Serine         Ser     UC*, AGU, AGC
T       Threonine      Thr     AC*
V       Valine         Val     GU*
W       Tryptophan     Trp     UGG
Y       Tyrosine       Tyr     UAU, UAC
(* = any of the four bases)
20 letters, no B J O U X Z
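The `*` wildcard in the codon column can be expanded mechanically into an explicit codon lookup. The sketch below copies the patterns from the list of amino acids; the names `PATTERNS`, `expand`, and `CODON_TO_AA` are illustrative. It also shows that the list accounts for 61 of the 64 possible codons, and the three left over are the stop codons of the standard genetic code.

```python
from itertools import product

# One-letter code -> codon patterns, copied from the list of amino acids.
# "*" stands for any of the four RNA bases.
PATTERNS = {
    "A": ["GC*"], "C": ["UGU", "UGC"], "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"], "F": ["UUU", "UUC"], "G": ["GG*"],
    "H": ["CAU", "CAC"], "I": ["AUU", "AUC", "AUA"], "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CU*"], "M": ["AUG"], "N": ["AAU", "AAC"],
    "P": ["CC*"], "Q": ["CAA", "CAG"], "R": ["CG*", "AGA", "AGG"],
    "S": ["UC*", "AGU", "AGC"], "T": ["AC*"], "V": ["GU*"],
    "W": ["UGG"], "Y": ["UAU", "UAC"],
}


def expand(pattern):
    """Expand a pattern such as 'GC*' into its explicit codons."""
    if pattern.endswith("*"):
        return [pattern[:2] + base for base in "UCAG"]
    return [pattern]


# Explicit codon -> one-letter amino acid lookup.
CODON_TO_AA = {
    codon: aa
    for aa, patterns in PATTERNS.items()
    for pattern in patterns
    for codon in expand(pattern)
}

# Codons not assigned to any amino acid are the stop codons.
ALL_CODONS = {"".join(triplet) for triplet in product("UCAG", repeat=3)}
STOP_CODONS = sorted(ALL_CODONS - set(CODON_TO_AA))

print(len(CODON_TO_AA), STOP_CODONS)  # 61 ['UAA', 'UAG', 'UGA']
```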
Codon and Reading Frame
• 4 nucleotide letters → 4^3 = 64 triplet possibilities
• 20 (< 64) known amino acids
• Wobbling 3rd base
• Redundant → resistant to mutation
• Reading frame: linear sequence of codons in a gene
• Open Reading Frame (ORF): a potential protein-coding region of a DNA sequence
  – a reading frame that begins with a start codon and ends at a stop codon
  – a series of codons in a DNA sequence uninterrupted by the presence of a stop codon
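The ORF definition (a start codon, then codons in the same frame up to the first stop codon) can be turned into a small scanner. This is a sketch, not a gene finder: it looks only at the three forward frames, uses the standard DNA stop codons TAA, TAG, and TGA, and the function name `find_orfs` and the test sequence are made up for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str):
    """Yield (frame, start, end) for each ORF on the forward strand.

    `start` and `end` are 0-based string indices; `end` is exclusive and
    includes the stop codon. Only the three forward frames are scanned.
    """
    dna = dna.upper()
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                yield frame, start, i + 3
                start = None


# Made-up sequence containing one ORF in frame 0 and one in frame 2.
for frame, start, end in find_orfs("CCATGAAATAGGATGCCCTGA"):
    print(frame, start, end)
```

A start codon with no in-frame stop before the end of the sequence is not reported, which matches the "begins with a start codon and ends at a stop codon" reading of the definition.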
Open Reading Frame
• Given a nucleotide sequence
  – What to begin with? ATG
  – How many reading frames? 6 (3 forward and 3 backward)
• Example: ATGACCGTGGGCTCTTAA
  – Frame 1: ATG ACC GTG GGC TCT TAA → M T V G S *
  – Frame 2: TGA CCG TGG GCT CTT AA → * P W A L
  – Frame 3: GAC CGT GGG CTC TTA A → D R G L L
  – Figure out the three backward reading frames
• In a random sequence, a stop codon follows a Met after about 20 codons on average
• Substantially longer ORFs are often genes or parts of them
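The six reading frames can be generated programmatically. The sketch below is one way to do it, using the standard genetic code written as a 64-letter string in TCAG order; the names `translate` and `six_frames` are illustrative, and the backward frames are numbered from the 5' end of the reverse complement, which is an assumed convention.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; drop leftovers."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frames(dna: str) -> dict:
    """Return the translations of all six reading frames.

    Keys +1..+3 are the forward frames; -1..-3 are the frames of the
    reverse complement, numbered from its own 5' end.
    """
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for name, protein in six_frames("ATGACCGTGGGCTCTTAA").items():
    print(name, protein)
```

The three forward results reproduce the slide's example (MTVGS*, *PWAL, DRGLL). The "about 20" figure for random sequence follows from the same table: 3 of the 64 codons are stops, so with equal base frequencies a stop turns up roughly once every 64/3 ≈ 21 codons.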
Translation Animation
Courtesy of Rob Rutherford, St. Olaf University
Gene Expression
• Gene expression
  – The process of using the information stored in DNA to make an RNA molecule and then a corresponding protein
• Cells control gene expression by
  – reliably distinguishing between those parts of an organism's genome that correspond to the beginnings of genes and those that do not
  – determining which genes code for proteins that are needed at any particular time
Promoter
• The probability (P) that a string of nucleotides will occur by chance alone, if all nucleotides are present at the same frequency, is P = (1/4)^n, where n is the string's length
• Promoter sequences
  – Sequences recognized by RNA polymerases as being associated with a gene
• Example
  – Prokaryotic RNA polymerases scan along DNA looking for a specific set of approximately 13 nucleotides marking the beginning of genes
    – 1 nucleotide that serves as a transcriptional start site
    – 6 that are 10 nucleotides 5' to the start site, and
    – 6 more that are 35 nucleotides 5' to the start site
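To get a feel for P = (1/4)^n, the sketch below evaluates it for the 13 promoter nucleotides and estimates how many chance matches a random genome would contain. The function names are hypothetical, the genome length of 4.6 million nucleotides is chosen only for illustration, and the estimate assumes equal base frequencies and counts one strand.

```python
def chance_probability(n: int) -> float:
    """P = (1/4)^n for one specific string of n nucleotides."""
    return 0.25 ** n


def expected_chance_matches(genome_length: int, n: int) -> float:
    """Expected number of chance matches on one strand of a random genome."""
    positions = max(genome_length - n + 1, 0)  # possible start positions
    return positions * chance_probability(n)


# 13 nucleotides: 1 start site + 6 near -10 + 6 near -35.
p = chance_probability(13)
print(p)  # about 1.5e-08, i.e. 1 in 67,108,864

# Hypothetical genome of 4.6 million nucleotides, for illustration only.
print(expected_chance_matches(4_600_000, 13))  # about 0.07
```

An expected count well below 1 is what makes a 13-nucleotide signal specific enough to mark gene starts in a genome of that size.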
Gene Regulation
• Regulatory proteins
  – Capable of binding to a cell's DNA near the promoter of the genes
  – Control gene expression in some circumstances but not in others
• Positive regulation
  – binding of regulatory proteins makes it easier for an RNA polymerase to initiate transcription
• Negative regulation
  – binding of the regulatory proteins prevents transcription from occurring
Point Mutation Example: Sickle-cell Disease
• Wild-type hemoglobin
DNA
3’----CTT----5’
mRNA
5’----GAA----3’
Normal hemoglobin
------[Glu]------
• Mutant hemoglobin
DNA
3’----CAT----5’
mRNA
5’----GUA----3’
Mutant hemoglobin
------[Val]------
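The two pathways in the sickle-cell example can be traced in code. This sketch treats the DNA shown on the slide as the template strand written 3' to 5', so the mRNA is its base-by-base complement with U in place of T. The lookup holds only the two codons used here, and all names are illustrative.

```python
# DNA template base -> mRNA base (complementary pairing, with U in place of T).
TEMPLATE_TO_MRNA = str.maketrans("ACGT", "UGCA")
# Only the two codons used in this example; not a full codon table.
CODON_TO_AA = {"GAA": "Glu", "GUA": "Val"}


def mrna_from_template(template_3_to_5: str) -> str:
    """Transcribe a template strand written 3' to 5' into mRNA (5' to 3')."""
    # The template is already written 3' to 5', so no reversal is needed.
    return template_3_to_5.translate(TEMPLATE_TO_MRNA)


for label, template in [("wild-type", "CTT"), ("mutant", "CAT")]:
    codon = mrna_from_template(template)
    print(label, template, codon, CODON_TO_AA[codon])
```

A single base change in the template (T to A in the middle position) changes the codon from GAA to GUA and the amino acid from Glu to Val.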
Thinking about the Human Genome
• 50% is high copy number repeats
• About 10% is transcribed (made into RNA)
• Only 1.5% actually codes for protein
• 98.5% Junk DNA