Introduction to the course Introduction to Molecular Biology (Part I)
Algorithms in Bioinformatics:Lecture 01 Introduction
Lucia Moura
Fall 2010
Algorithms in Bioinformatics: Lecture 01 Introduction Lucia Moura
Intro
Introduction to the course
“Bioinformatics is the study of biology through computer modeling and analysis. It is a multi-discipline research involving biology, statistics, data-mining, machine learning and algorithms.”
Textbook: Wing-Kin Sung, Algorithms in Bioinformatics, CRC Press, 2009.
This course will give an in-depth view of algorithmic techniques used in bioinformatics.
Course contents (tentative):
- Introduction to Molecular Biology (chapter 1)
- Sequence Similarity (chapter 2): global/local/semi-global alignment, gap penalty, scoring functions
- Suffix trees and related data structures (chapter 3): algorithms to build suffix trees, applications
- Genome Alignment (chapter 4): methods using suffix trees and the longest common subsequence algorithm
- Multiple sequence alignment (chapter 6): dynamic programming, approximation algorithms, heuristics
- Phylogeny Reconstruction (chapter 7): constructing a phylogenetic tree given different types of data
- Genome Rearrangement (chapter 9): reversals, transpositions, etc.; various distances considered
- Other topics: RNA secondary structure prediction (guest lecture); other topics/guest lectures TBA
Course Administration
Please refer to the course outline: http://www.site.uottawa.ca/~lucia/courses/5126-10/outline.html
Molecular Biology: DNA, RNA, Protein, Gene, Chromosome, Genome
Intro to Molecular biology: DNA, RNA, Protein
Our body has organs formed by tissues, which are collections of similar cells that perform specialized functions.
A cell is the minimal self-reproducing unit in all living species. It performs two functions:
1 It stores and passes on genetic information, preserving life from generation to generation. This is done via DNA molecules.
2 It performs the chemical reactions necessary to maintain life. To do this, portions of DNA called genes are transcribed into RNA molecules, which in turn guide the synthesis of proteins. Proteins are the main catalysts for chemical reactions in the cell.
Next we discuss these macromolecules (molecules formed from a collection of smaller molecules): protein, DNA and RNA.
Proteins
Proteins are the building blocks of cells; they execute nearly all cell functions.
Understanding proteins is essential to understanding how the body functions and how other biological processes work.
A protein (also called a polypeptide) is a chain of amino acids (on average, around 350 amino acids form a protein), each bonded to its neighbour through a covalent peptide bond. The protein’s primary structure is given by its sequence of amino acids.
There are 20 different common amino acids.
Computer science language: a protein’s primary structure corresponds to a string (of average length about 350 symbols) over an alphabet of size 20.
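The string view of a protein can be made concrete with a minimal Python sketch. It assumes the standard one-letter amino acid codes as the 20-symbol alphabet; the function names and the example peptide are hypothetical and chosen only for illustration.

```python
# The 20 common amino acids, written with their standard one-letter codes.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_protein_sequence(s: str) -> bool:
    """Return True if s is a non-empty string over the 20-letter alphabet."""
    return len(s) > 0 and all(ch in AMINO_ACIDS for ch in s)


def composition(s: str) -> dict:
    """Count how often each amino acid symbol occurs in the sequence."""
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical short sequence, not a real protein
    print(is_protein_sequence(peptide))
    print(composition(peptide))
```

Treating the primary structure as a plain string is what lets the string algorithms covered later in the course (alignment, suffix trees) apply to proteins directly.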
Amino acid structure
1 Amino group (NH2)
2 Carboxyl Group (COOH)
3 R-group (side chain): different R-groups characterize each of the 20 common amino acids.
Amino acids join together via a peptide bond
DNA
Deoxyribonucleic acid (DNA) is the genetic material in all living organisms. It stores the instructions for the cell to perform its functions.
“DNA can be thought of as a large cookbook with recipes for making every protein in a cell. (...) The information in the genes is read, perhaps millions of times in the life of an organism, but the DNA itself is never used up.”
DNA consists of 2 strands of nucleotides forming a double helix structure.
DNA nucleotides differ in their nitrogenous base, which is one of 4 possibilities: adenine (A), guanine (G), cytosine (C), thymine (T).
One strand is a polynucleotide (a sequence of nucleotides of 4 types); the second strand carries the complementary bases (A = T with two hydrogen bonds, C ≡ G with three).
Computer science language: a DNA’s primary structure corresponds to a string over the alphabet {A, C, G, T} (the second strand is determined by the first).
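To illustrate how the second strand is determined by the first, here is a small Python sketch. It relies on one standard fact not stated on the slide: the two strands are antiparallel, so the partner strand, read in its own 5’ to 3’ direction, is the reverse complement. The function names are illustrative assumptions.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner of the strand, position for position."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5' to 3' direction (antiparallel)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    s = "ACCGT"
    print(complement(s))          # position-wise partner
    print(reverse_complement(s))  # partner strand, read 5' to 3'
```

Applying `reverse_complement` twice returns the original string, which is one way to see that either strand carries the full information.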
DNA Nucleotides
1 A pentose sugar (deoxyribose)
2 Phosphate group (bound to the 5’ carbon)
3 Nitrogenous base (bound to the 1’ carbon): A, C, T, G
DNA formed by chaining nucleotides I
DNA formed by chaining nucleotides II
Watson-Crick base pairing
DNA double helix structure (Watson and Crick 1953)
DNA replication
The cell duplicates its DNA and passes it to two daughter cells.
1 The double strand is separated.
2 Each strand forms a template for a complementary new strand.
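The two replication steps above can be sketched in Python as a toy model. It assumes a duplex is represented as a pair of equal-length strings aligned position by position (so the second string is the position-wise complement of the first, ignoring strand direction); the names `build_partner` and `replicate` are hypothetical.

```python
# Watson-Crick pairing used to build each new strand.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def build_partner(template: str) -> str:
    """Synthesize the strand that pairs with the template, base by base."""
    return "".join(PAIR[base] for base in template)


def replicate(duplex):
    """Return two daughter duplexes, each keeping one parental strand."""
    top, bottom = duplex                          # step 1: separate the strands
    daughter_1 = (top, build_partner(top))        # step 2: old top + new partner
    daughter_2 = (build_partner(bottom), bottom)  # step 2: new partner + old bottom
    return daughter_1, daughter_2


if __name__ == "__main__":
    parent = ("ATGC", "TACG")
    print(replicate(parent))
```

In this model both daughters are identical to the parent, and each contains exactly one strand inherited from it.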
RNA
Ribonucleic acid (RNA) is the nucleic acid produced during the transcription process (from DNA to RNA).
The nucleotide structure for RNA is similar to the one for DNA. Differences:
1 Ribose sugar in place of deoxyribose;
2 Nitrogenous base pairs are (A, U) and (C, G): uracil replaces thymine.
RNA is single-stranded.
RNA can form more complex 3D structures than DNA, and so can perform more functions.
Proteins can perform even more functions than RNA.
DNA is more stable than RNA for storing information.
Nucleotide structure for RNA
1 A pentose sugar (ribose)
2 Phosphate group (bound to the 5’ carbon)
3 Nitrogenous base (bound to the 1’ carbon): A, U, C, G
Different types of RNA
mRNA: messenger RNA
  carries the encoded information needed to make proteins.
ncRNA: non-coding RNA, which includes:
- ribosomal RNA (rRNA): part of ribosomes; helps translate mRNA into proteins.
- transfer RNA (tRNA): acts like a molecular dictionary that translates the nucleic acid code into the amino acid sequence of proteins.
- short ncRNA: regulates the process of generating proteins from genes.
- long ncRNA: diverse functions, some of them unknown.
Genome, Chromosome and Gene
genome: the set of all DNA in an organism.
Genome size varies, and size doesn’t necessarily correspond to complexity:
- the bacterium Mycoplasma genitalium has a genome of ∼600,000 base pairs;
- the human and mouse genomes have ∼3 billion base pairs;
- the single-cell organism Amoeba dubia has ∼670 billion base pairs!
The genome is partitioned into chromosomes; each chromosome is a double-stranded DNA chain wrapped around histones. Humans have 23 pairs of chromosomes (e.g. males have 22 pairs of autosomes, one X and one Y chromosome).
A gene is a “substring” of DNA that encodes a protein or an RNA molecule. Each chromosome contains many genes. In the human genome there are ∼30,000 genes.
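To get a feel for the genome sizes quoted above in computer science terms, the following Python sketch estimates the storage needed for one strand of each genome. It assumes a packed encoding of 2 bits per base (enough for a 4-letter alphabet), which is an illustrative choice and not something the lecture prescribes; the base-pair counts are the approximate figures from the slide.

```python
# Assumption: 4 possible bases, so log2(4) = 2 bits suffice per base.
BITS_PER_BASE = 2

# Approximate genome sizes in base pairs, as quoted in the lecture.
GENOME_SIZES_BP = {
    "Mycoplasma genitalium": 600_000,
    "human": 3_000_000_000,
    "Amoeba dubia": 670_000_000_000,
}


def packed_size_bytes(base_pairs: int) -> int:
    """Bytes needed to store one strand at 2 bits per base, rounded up."""
    return (base_pairs * BITS_PER_BASE + 7) // 8


if __name__ == "__main__":
    for name, bp in GENOME_SIZES_BP.items():
        print(f"{name}: {packed_size_bytes(bp) / 1e6:,.2f} MB")
```

Under this assumption the human genome fits in roughly 750 MB, while the Amoeba dubia genome would need over 200 times as much, which makes the point that size and complexity are not the same thing.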
Chromosomes
Processes to be understood in more detail next class:
DNA replication and DNA mutation.
Central Dogma (proposed by Crick in 1958): the process of transferring information from DNA to RNA to protein.
- Transcription (transfer of genetic information from the DNA to the mRNA): DNA is transcribed to mRNA, i.e., during the transcription process, an mRNA is synthesized from a DNA template.
- Translation (mRNA is translated to protein): the mRNA is translated into an amino acid sequence. Here the genetic code is used: each codon (3 consecutive symbols) is translated into its corresponding amino acid.
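The two steps of the Central Dogma can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the input is the coding (sense) strand, so the mRNA is the same string with T replaced by U (in the cell it is synthesized from the complementary template strand); translation starts at position 0, uses the standard genetic code, and stops at the first stop codon. The function names are illustrative.

```python
# Standard genetic code, listed with bases in the order U, C, A, G
# for the first, second and third codon position ("*" marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA for a DNA coding strand: same sequence with T replaced by U."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGTTTGGCTAA"  # hypothetical toy gene
    mrna = transcribe(dna)
    print(mrna)
    print(translate(mrna))
```

Since 4^3 = 64 codons map onto 20 amino acids plus stop signals, several codons encode the same amino acid, which the table above makes visible.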
Brief History of Bioinformatics
1866: Mendel discovered the basic principles of genetics (hybridization of peas, genes).
1869: DNA was discovered.
1944: Avery, MacLeod and McCarty showed that DNA is the carrier of genetic information.
1953: Watson and Crick deduced the double helix structure of DNA.
1970's and beyond: several biotechnology techniques were developed, e.g. DNA sequencing using any tissue; polymerase chain reaction.
1977: RNA splicing in eukaryotes was discovered (introns/exons).
1980-1990: genome sequencing of various organisms (e.g. E. coli).
1990: the Human Genome Project was launched.
1998: Fire and Mello discovered RNA interference.
2003: sequencing of the human genome was completed (first draft in 2000).
2006-now: second generation sequencing technology is available.
Other projects: Genomes to Life (understanding the detailed mechanisms of cells), ENCODE (annotating all the genes and functional elements), HapMap (studying differences in genetic data among people).