Introduction to Genetics – as relevant to this course (Ack: Roche Genetics CD-ROM, Mishra’s notes at NYU, …)

Post on 20-Dec-2015


Background (1/18)

• Genome, chromosomes, genes: all made up of DNA.
• Genetics research (largely over the last 100 years, accelerated in the last 30 years)
  – Has led to important advances in medical science.
• Nucleus of a cell: contains chromosomes (made up of DNA) and proteins.
• DNA (deoxyribonucleic acid)
  – Is the genetic material that is inherited.
  – Contains the information needed by living cells to specify their structure, function, activity and interaction with other cells and the environment.
  – A DNA molecule can be thought of as a very long sequence of nucleotides or bases.

DNA structure (2/18)

• The Nobel Prize in Physiology or Medicine 1962: Crick, Watson and Wilkins
  – “for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material”
• DNA is made up of 4 different building blocks (so-called nucleotide bases), each an almost planar nitrogenous organic compound
  – Adenine (A)
  – Thymine (T)
  – Guanine (G)
  – Cytosine (C)
  – Base pairs (A-T, C-G)

DNA Structure cont. (3/18)

• The bases are attached to a sugar-phosphate backbone to form one of the 2 strands of a DNA molecule.
  – Phosphate (PO4^3-)
  – Deoxyribose
• The two strands are bonded together by the base pairs (A-T, C-G).
• This results in complementary strands; each is twisted (helical), and when bonded they form a double helix.
• Direction of each strand (5’ marks the beginning and 3’ the end of the strand)
  – 5’ and 3’ refer to the numbered carbon atoms of the sugar molecule in the DNA backbone.
  – They are important reference points for navigating the genome.
  – The 2 complementary strands are oriented in opposite directions to each other.
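
The pairing and orientation rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are hypothetical, and the code assumes an unambiguous A/C/G/T sequence written 5’ to 3’.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner of `strand`, kept in the same left-to-right order.

    If `strand` is written 5'->3', the result is the partner written 3'->5'.
    """
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "ATGCGT"  # hypothetical example sequence, 5'->3'
    print("5'-" + top + "-3'")
    print("3'-" + complement(top) + "-5'")
    print("partner read 5'->3':", reverse_complement(top))
```

Reversing the complement is what encodes the opposite orientation of the two strands: the partner of the first base on one strand sits at the 3’ end of the other.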

DNA Structure

Genome Size

Species                  Genome size (base pairs)   No. of chromosomes
E. coli                  4.64 × 10^6                1
S. cerevisiae (yeast)    1.205 × 10^7               16
C. elegans (nematode)    10^8                       11/12
D. melanogaster          1.7 × 10^8                 4
M. musculus              3 × 10^9                   20
H. sapiens               3 × 10^9                   23
A. cepa (onion)          1.5 × 10^10                8

H. sapiens: about 6 feet of DNA when completely stretched out.
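
As a rough check of the "6 feet" figure, the sketch below converts a base-pair count into a physical length. The rise of about 0.34 nm per base pair is an assumption brought in from general knowledge of B-form DNA, not a number from the slides, and the function names are hypothetical.

```python
# Assumption (not from the slides): B-form DNA rises ~0.34 nm per base pair.
RISE_PER_BASE_PAIR_M = 0.34e-9
METERS_PER_FOOT = 0.3048


def stretched_length_m(base_pairs: float) -> float:
    """Length in meters of fully stretched DNA with this many base pairs."""
    return base_pairs * RISE_PER_BASE_PAIR_M


def meters_to_feet(meters: float) -> float:
    """Convert meters to feet."""
    return meters / METERS_PER_FOOT


if __name__ == "__main__":
    human_bp = 3e9  # genome size from the table
    for label, bp in [("one genome copy", human_bp),
                      ("two genome copies", 2 * human_bp)]:
        length = stretched_length_m(bp)
        print(f"{label}: {length:.2f} m ({meters_to_feet(length):.1f} ft)")
```

Under this assumption one genome copy comes to roughly 1 m, so a figure of about 6 feet is closer to the two copies carried by a diploid cell.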

DNA hybridization (DL 3/18)

• Hybridization: complementary DNA sequences pair up to form a double-stranded DNA molecule.
  – One of the most important DNA technologies.
• Applications of hybridization
  – PCR (Polymerase Chain Reaction)
    • Enzymatically generates millions of copies from a tiny amount of a particular nucleic acid sequence.
  – Northern blot analysis
    • Makes it possible to study (in a semi-quantitative manner) the level of transcription of a particular gene.
  – DNA microarrays
    • Can interrogate the level of transcription of several thousand different genes in one sample in one experiment.

PCR (Polymerase Chain Reaction) (DL 3/18)

• PCR allows selective amplification of a DNA sequence.
  – Only a tiny amount of DNA is necessary to obtain a PCR product (a drop of blood or less is enough).
  – Complementary DNA primers need to be designed.
    • For this, the DNA sequence flanking the target sequence needs to be known in advance.
    • Primers are short synthetic DNA sequences of about 20 bases (so-called oligonucleotides) that can specifically hybridize to a unique complementary DNA sequence.
• The approach
  – Genomic DNA (the template), primers (the starters), deoxynucleotides (the building blocks) and a special heat-resistant DNA polymerase (the motor of the reaction) are mixed together in one reaction tube.
  – The reaction takes place in a thermocycler (an apparatus that precisely heats and cools the reaction).
  – The DNA is heated to almost boiling temperature, which separates the 2 strands (this step is called heat denaturation).
  – Cooling the mixture allows the primers to bind to their complementary sequences in the genomic DNA.
  – Once the primers bind, the DNA polymerase uses them as start sites to generate a copy of each strand of the targeted gene fragment, building 2 new double-stranded molecules.
  – Repeating the cycle (denaturation followed by cooling) 30 times results in 2^30 ≈ 10^9 (1 billion) copies.
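
The copy count after repeated cycles is plain doubling arithmetic, sketched below. The `efficiency` parameter is a hypothetical addition for illustration (1.0 means every molecule is copied in every cycle, which is what the slide assumes); it is not part of the source material.

```python
def pcr_copies(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after `cycles` PCR cycles.

    Each cycle multiplies the count by (1 + efficiency); with the
    assumed perfect efficiency of 1.0 this is simple doubling.
    """
    return starting_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (1, 10, 20, 30):
        print(f"{n:2d} cycles: {pcr_copies(n):,.0f} copies")
```

Thirty perfect doublings give 2^30 = 1,073,741,824, which is the "1 billion" quoted above; real reactions fall somewhat short of perfect doubling.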

Northern Blot analysis

• The complete RNA content of a sample is separated according to size by electrophoresis.
• This is usually done in a sheet of agarose (similar to gelatine).
  – In response to an electric current, larger molecules move more slowly and smaller ones faster, thus separating the different RNA molecules by size.
• The RNAs are then transferred from the gel to a filter membrane (the blot).
• The blot is then exposed to a solution containing a nucleic acid (the probe) complementary to the sequence whose presence in the blot one wants to interrogate.
• The probe may be cRNA or cDNA with a detectable label (a radioactive isotope or a fluorescent tag).
• If the targeted sequence is present in the blot, the probe hybridizes and sticks to the blot at the location of the targeted sequence.
• After excess probe is washed off, a signal is detectable; its specificity can be checked against the expected size of the RNA, which correlates with how far it has migrated during electrophoresis.
• With this method it is possible to study, in a semi-quantitative manner, the level of transcription of a particular gene.
• Comparing the results from different samples (e.g. different organs) provides information about the transcriptional regulation of the gene.

DNA Structure cont. (4/18)

• The order of nucleotide bases along a DNA strand is known as the sequence.
• The genetic information is encoded in the precise order of the base pairs.
• DL
  – GenBank database: http://www.ncbi.nlm.nih.gov/Entrez/
  – Human genome project: http://www.genome.gov/page.cfm?pageID=10001694
  – DNA sequencing
    • Is the process designed to precisely determine the sequence of bases in the DNA.

Cells, DNA and genome (5,6,8/18)

• For cell division (mitosis), the entire DNA of the cell is copied.
  – The 2 strands separate and complementary strands are generated.
  – Two duplicate DNA sequences are produced.
• Genome: an organism’s total DNA content
• Diploid cells: cells that carry 2 genome copies
• Haploid cells: cells that have a single copy of the genome
  – Reproductive germ cells (gametes), i.e., egg and sperm cells
• The human genome consists of
  – 22 autosomal chromosomes (same in males and females)
  – 2 sex chromosomes, X and Y (males XY, females XX)

Structure of Chromosomes (7/18)

• The center is called the centromere.
• The two ends are called telomeres.
• The centromere separates the two arms
  – Short arm: p
  – Long arm: q

Structure of genes (9-11/18)

• Genes are those parts of the genome that contain the information necessary for building proteins. (Size: 100 to several million base pairs)
• Exons (coding sequence), introns (non-coding sequence), regulatory regions (at the two ends, for regulating how actively protein is synthesized from the gene)
  – Eukaryotes (organisms whose cells have a nucleus) have genes segmented into exons and introns.
  – Introns can occur between individual codons or within a single codon.
• Promoter (a regulatory element at the 5’ end)
  – Consists of several short sequences which are consensus binding sites for a number of proteins called transcription factors.

(DL 10/18)

• Prokaryotes (no nucleus): genes are not segmented into exons and introns.
• Eukaryotes: genes are normally segmented into exons and introns
  – Except mitochondrial genes and a few nuclear genes.
• During gene expression, exons and introns are transcribed to form a pre-mRNA.
• RNA splicing removes the introns and joins the exons, producing a mature mRNA molecule that codes for a polypeptide.
• Exons: sequences that are represented in the mature mRNA
  – May or may not code for protein.
  – E.g. exons at the 3’ or 5’ end of the mRNA may not be translated into protein.
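
Splicing can be pictured as keeping the exon segments of the pre-mRNA and discarding what lies between them. The sketch below assumes the exon coordinates are already known; the sequences, coordinates and the `splice` helper are all hypothetical, and real splicing is carried out by the spliceosome recognizing sequence signals rather than by given coordinates.

```python
from typing import Sequence, Tuple


def splice(pre_mrna: str, exons: Sequence[Tuple[int, int]]) -> str:
    """Join the exon segments of `pre_mrna` in order, dropping the introns.

    `exons` holds (start, end) positions, 0-based with `end` exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy transcript: exon 1, one intron, exon 2.
EXON_1 = "AUGGCC"
INTRON = "GUAAGUUUUCAG"
EXON_2 = "UUUUAA"
PRE_MRNA = EXON_1 + INTRON + EXON_2

EXON_COORDS = [
    (0, len(EXON_1)),
    (len(EXON_1) + len(INTRON), len(PRE_MRNA)),
]

if __name__ == "__main__":
    print("pre-mRNA:   ", PRE_MRNA)
    print("mature mRNA:", splice(PRE_MRNA, EXON_COORDS))
```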

Some Genes (from Mishra’s slides)

Gene product           Organism      Exon length (bp)   # Introns   Intron length (bp)
Adenosine deaminase    Human         1,500              11          30,000
Apolipoprotein B       Human         14,000             28          29,000
Erythropoietin         Human         582                4           1,562
Thyroglobulin          Human         8,500              = 40        100,000
-interferon            Human         600                0           0
Fibroin                Silk worm     18,000             1           970
Phaseolin              French bean   1,263              5           515

Some human gene locations (from Mishra’s slides)

Gene                            Chromosome
Insulin                         11
Galactokinase                   11
Viral oncogene homologues
  c-sis                         22
  c-mos                         8
  c-Ha-ras-1                    11
  c-myb                         6
Interferons
  α & β cluster                 9
  γ                             12
α-globin cluster                16
β-globin cluster                11
Immunoglobulin
  κ (light chain)               2
  λ (light chain)               22
  Heavy chain                   14
Pseudogenes                     9, 32, 15, 18
Growth hormone gene cluster     17
Thymidine kinase                17

Gene expression (12/18)

• Gene expression (transcription and translation): the 2-step process that leads from genes to proteins
• Transcription: genetic information in DNA is copied into messenger RNA (mRNA).
• Translation: mRNA is used as a template to synthesize a protein.

Central Dogma

• Due to Francis Crick (1958); it states that sequence information cannot flow back out of protein:
  – “The central dogma states that once ‘information’ has passed into protein it cannot get out again. The transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein, may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.”

Transcription (13/18)

• RNA (ribonucleic acid)
  – Similar to DNA (except for a chemical modification of the sugar backbone).
  – Instead of T it contains U (uracil), which binds with A.
  – Is single stranded rather than double stranded.
  – RNA molecules tend to fold back on themselves to make helical, twisted and rigid segments.
• RNA is synthesized
  – By unwinding the DNA double helix, separating the 2 strands.
  – Using one of the strands as a template along which to build the RNA molecule.
  – Accomplished by the enzyme RNA polymerase (binds to the promoter and copies, or transcribes, the gene in its full length).
  – The resulting molecule is called pre-mRNA.
  – The single-stranded pre-mRNA is then processed.
  – Splicing (mediated by the spliceosome, which consists of RNA and proteins) removes the introns.
  – The ends are modified (capping modifies the 5’ end and polyadenylation adds adenines at the 3’ end) to enhance stability.
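
Template-directed RNA synthesis and the poly-A tail can be sketched as string operations. The names `transcribe` and `polyadenylate`, the example template and the tail length of 20 are hypothetical choices for illustration; the 5’ cap is a chemical modification and is not modeled here.

```python
# Pairing used when RNA is built along a DNA template: U replaces T in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') built along a DNA template strand.

    The template is given 3'->5', so the RNA is its base-by-base
    complement read left to right.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def polyadenylate(mrna: str, tail_length: int = 20) -> str:
    """Append a run of adenines to the 3' end (tail length is illustrative)."""
    return mrna + "A" * tail_length


if __name__ == "__main__":
    template = "TACCGGAAAATT"  # hypothetical template strand, written 3'->5'
    rna = transcribe(template)
    print("RNA:        ", rna)
    print("with poly-A:", polyadenylate(rna, tail_length=10))
```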

Translation (14/18)

• mRNA is used as a template to synthesize a protein.
• Translation takes place outside the nucleus, on ribosomes that are either free in the cytoplasm or attached to the endoplasmic reticulum.
• Except for the 5’ and 3’ ends of the mRNA (which are non-coding), the rest of the molecule codes for 1 protein.
• Proteins: made up of amino acids
  – 20 different amino acids are used to build proteins in humans.
  – Each is encoded by one or more sets of 3 nucleotides (called triplets or codons).
  – The initial codon is almost always AUG (coding for methionine).
  – Translation is terminated by one of 3 ‘stop’ codons.
• The translation process is carried out by ribosomes, which scan the mRNA and build the polypeptide chain from amino acids supplied by transfer RNAs (tRNAs).
  – Starts at a particular location on the mRNA called the translation start site (usually AUG).
  – tRNAs are a group of small RNA molecules, each with specificity for a particular amino acid.
  – tRNAs carry the amino acids to the ribosomes, the site of protein synthesis, where they are attached to a growing polypeptide.
  – Translation stops when one of UAA, UAG or UGA is encountered.
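
The scanning process described above can be sketched as a small function: find the first AUG, read codon by codon, and stop at UAA, UAG or UGA. The table is the standard genetic code packed into one string in U, C, A, G order; the `translate` helper and the example mRNA are hypothetical, and starting at the first AUG is a simplification of how ribosomes choose a start site.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# U, C, A, G order for each of the three positions; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until the first in-frame stop codon.

    Returns one-letter amino acid codes, or "" if there is no AUG.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical mRNA with short non-coding ends around AUG GCC UUU UAA.
    print(translate("GGAUGGCCUUUUAAGG"))
```

The bases before AUG and after the stop codon play the role of the non-coding 5’ and 3’ ends mentioned in the slide.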

Post-translational modification (DL)

• The polypeptide chain that results from mRNA translation is often subject to chemical modifications, e.g.
  – Glycosylation, phosphorylation, hydroxylation
  – Addition of lipid groups (e.g. fatty acyl or prenyl groups)
  – Addition of co-factors (e.g. a heme molecule)
  – Proteolytic cleavage
• The type of modification a protein undergoes depends on its function and sub-cellular location.

Genetic Code (15/18)

• The combination of nucleotides that build the different codons represents the genetic code.
• Codon = 3 nucleotides; 4 kinds of nucleotides. So 4 × 4 × 4 = 64 possible codons.
• But there are only 20 amino acids, plus start and stop signals.
• So several codons can specify the same amino acid (the genetic code is degenerate).
• Start codon (AUG) and stop codons (UAA, UAG, UGA).
• Open reading frame (ORF): the sequence of nucleotides between and including the start and stop codons.
• The Nobel Prize in Physiology or Medicine 1968: Holley, Khorana and Nirenberg
  – “for their interpretation of the genetic code and its function in protein synthesis”
  – http://www.nobel.se/medicine/laureates/1968/
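
The counting argument and the ORF definition can both be checked with a short sketch. The function `find_orf` and the example sequence are hypothetical; the search simply takes the first AUG and the first in-frame stop codon after it, which is a simplification of real ORF finding.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Every ordered triple of the 4 nucleotides: 4 * 4 * 4 = 64 codons.
ALL_CODONS = ["".join(triple) for triple in product(NUCLEOTIDES, repeat=3)]


def find_orf(mrna: str):
    """Return the first ORF, start and stop codons included, or None."""
    mrna = mrna.upper()
    start = mrna.find(START_CODON)
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None  # no in-frame stop codon found


if __name__ == "__main__":
    sense_codons = len(ALL_CODONS) - len(STOP_CODONS)
    print(len(ALL_CODONS), "codons in total,", sense_codons, "code for amino acids")
    print("ORF:", find_orf("GGAUGGCCUUUUAAGG"))
```

With 61 non-stop codons shared among 20 amino acids, most amino acids must have more than one codon, which is the degeneracy noted above.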

Amino Acids with Codes (From Mishra’s slide)

Code  Abbr.  Amino acid      Codons
A     Ala    alanine         GC(U+A+C+G)
C     Cys    cysteine        UG(U+C)
D     Asp    aspartic acid   GA(U+C)
E     Glu    glutamic acid   GA(G+A)
F     Phe    phenylalanine   UU(U+C)
G     Gly    glycine         GG(U+A+C+G)
H     His    histidine       CA(U+C)
I     Ile    isoleucine      AU(U+A+C)
K     Lys    lysine          AA(A+G)
L     Leu    leucine         (C+U)U(A+G) + CU(U+C)
M     Met    methionine      AUG
N     Asn    asparagine      AA(U+C)
P     Pro    proline         CC(U+A+C+G)
Q     Gln    glutamine       CA(A+G)
R     Arg    arginine        (A+C)G(A+G) + CG(U+C)
S     Ser    serine          (AG+UC)(U+C) + UC(A+G)
T     Thr    threonine       AC(U+A+C+G)
V     Val    valine          GU(U+A+C+G)
W     Trp    tryptophan      UGG
Y     Tyr    tyrosine        UA(U+C)
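
The compact codon notation in this table reads like a product of alternatives: "+" separates choices and adjacent factors are concatenated. The sketch below expands each pattern into explicit codons and checks that the table accounts for all 61 non-stop codons. The parser is a hypothetical helper written for this notation only, not a general-purpose tool.

```python
import re
from itertools import product

# One-letter code -> codon pattern, copied from the table above.
CODON_PATTERNS = {
    "A": "GC(U+A+C+G)", "C": "UG(U+C)", "D": "GA(U+C)", "E": "GA(G+A)",
    "F": "UU(U+C)", "G": "GG(U+A+C+G)", "H": "CA(U+C)", "I": "AU(U+A+C)",
    "K": "AA(A+G)", "L": "(C+U)U(A+G) + CU(U+C)", "M": "AUG", "N": "AA(U+C)",
    "P": "CC(U+A+C+G)", "Q": "CA(A+G)", "R": "(A+C)G(A+G) + CG(U+C)",
    "S": "(AG+UC)(U+C) + UC(A+G)", "T": "AC(U+A+C+G)", "V": "GU(U+A+C+G)",
    "W": "UGG", "Y": "UA(U+C)",
}


def split_top_level(pattern: str) -> list:
    """Split a pattern on the '+' signs that are outside parentheses."""
    terms, current, depth = [], [], 0
    for ch in pattern.replace(" ", ""):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "+" and depth == 0:
            terms.append("".join(current))
            current = []
        else:
            current.append(ch)
    terms.append("".join(current))
    return terms


def expand(pattern: str) -> list:
    """Expand one codon pattern into the explicit codons it stands for."""
    codons = []
    for term in split_top_level(pattern):
        # Each factor is either "(x+y+...)" or a literal run of bases.
        factors = re.findall(r"\(([^)]*)\)|([ACGU]+)", term)
        choices = [group.split("+") if group else [literal]
                   for group, literal in factors]
        codons.extend("".join(parts) for parts in product(*choices))
    return codons


if __name__ == "__main__":
    for code, pattern in CODON_PATTERNS.items():
        print(code, " ".join(expand(pattern)))
    total = sum(len(expand(p)) for p in CODON_PATTERNS.values())
    print("codons covered:", total)
```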

Biological Function of Proteins

• Enzyme catalysis: DNA polymerases, lactate dehydrogenase, trypsin
• Transport: hemoglobin, membrane transporters, serum albumin
• Storage: ovalbumin (egg-white protein), ferritin
• Motion: myosin, actin, tubulin, flagellar proteins
• Structural and mechanical support: collagen, elastin, keratin, viral coat proteins
• Defense: antibodies, complement factors, blood clotting factors, protease inhibitors
• Signal transduction: receptors, ion channels, rhodopsin, G proteins, signalling cascade proteins
• Control of growth, differentiation and metabolism: repressor proteins, growth factors, cytokines, bone morphogenetic proteins, peptide hormones, cell adhesion proteins
• Toxins: snake venoms, cholera toxin

Differential Gene Expression (17/18)

• All cells in the body (that contain a nucleus) carry the full set of genetic information, but express only about 20% of the genes at any particular time.
• Gene expression is selective
  – Different proteins are expressed in different cells according to the function of the cell.
• Gene expression is tightly controlled and regulated.
  – The differential expression of genes ensures that cells develop correctly and can differentiate into and function as specialized cell types, e.g. neurons, muscle cells or fibroblasts.

cDNA and gene expression (DL)

• Goal: identify all the genes expressed in one tissue or cell line (using cDNA libraries).
• cDNA libraries are prepared from mRNA isolated from the cells or tissue being studied.
• cDNAs are DNA molecules that are complementary to the mRNA sequences in a sample.
• cDNA is synthesized by the enzyme reverse transcriptase (RT), which uses the mRNA as a template.
  – RT is a viral enzyme used by viruses whose genome is made of RNA, not DNA.
• A cDNA library represents the collection of all genes expressed in a particular cell or tissue type.
• DNA sequence → mRNA sequence → cDNA sequence (much smaller, since the introns are eliminated while generating mRNAs)
  – Hence very useful when trying to isolate a particular gene to study the protein it codes for.

NEXT SEC.

Gene Cloning (1/11)

• The first step in identifying a gene and its function is to isolate it from the rest of the genome and produce a large quantity of it (called cloning a gene).
• Cloning a DNA fragment using bacteria
  – The DNA fragment is isolated from the entire genome using restriction enzymes.
    • These enzymes cut the DNA (in a staggered fashion or straight through) at specific sites defined by a short sequence.
    • Typically they recognize specific DNA sequences of 4, 6 or 8 bases.
    • These enzymes are found in bacteria, where their role is to protect the bacteria from foreign DNA by digesting it into smaller pieces.
  – The fragment is inserted into a vector (like a mini-chromosome) using DNA ligase, and the recombinant product is introduced into bacteria (this process is called transformation).
    • Cloning vectors are DNA fragments that are able to replicate within a cell and allow the addition of exogenous DNA.
    • They are derived from plasmids, viruses, phages or chromosomes.
    • Vectors are classified according to the type of host cell they can replicate in, or the size of the exogenous DNA they are able to carry.
  – The bacteria now make new copies with every cell division.
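
How a restriction enzyme fragments DNA at a short recognition sequence can be sketched as a string search. The slides name no particular enzyme; the default below uses EcoRI (site GAATTC, cut between G and A) purely as a familiar example. The `digest` helper and the test sequence are hypothetical, and only one strand is modeled, so the staggered overhang on the partner strand is not shown.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut one DNA strand (5'->3') at every occurrence of `site`.

    `cut_offset` is the position inside the site where this strand is cut;
    the defaults model EcoRI, which cuts G^AATTC.
    """
    dna = dna.upper()
    fragments = []
    fragment_start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[fragment_start:cut])
        fragment_start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[fragment_start:])
    return fragments


if __name__ == "__main__":
    sequence = "AAGAATTCTTTTGAATTCGG"  # hypothetical sequence with two sites
    for fragment in digest(sequence):
        print(len(fragment), fragment)
```

A 6-base site like this one is expected by chance about once every 4^6 = 4,096 bases in random sequence, which is why such enzymes cut a genome into fragments of manageable size.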

DNA Sequencing (DL 1/11)

• It is the process designed to precisely determine the sequence of bases in the DNA.

• Involves enzymatically copying the DNA in the presence of compounds that terminate the copying process in a base-specific manner, resulting in a mixture of DNA copies that differ in size by one base.

• Different technologies are used to resolve the mixture and detect the different fragments.
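
The idea of reading a sequence from copies that differ in size by one base can be sketched as follows. This is an idealized model with hypothetical function names: it assumes exactly one terminated copy for every position and perfect size resolution, and it ignores the chemistry and detection technology entirely.

```python
def terminated_fragments(copied_strand: str) -> dict:
    """Map each terminating base to the lengths of the copies ending in it.

    A copy terminated at position n (1-based) has length n and ends with
    the base found at that position.
    """
    lengths = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(copied_strand.upper(), start=1):
        lengths[base].append(position)
    return lengths


def read_sequence(lengths: dict) -> str:
    """Sort all fragments by size and read off their terminating bases."""
    ladder = sorted((size, base) for base, sizes in lengths.items() for size in sizes)
    return "".join(base for _, base in ladder)


if __name__ == "__main__":
    fragments = terminated_fragments("GATTACA")  # hypothetical copied strand
    print(fragments)
    print(read_sequence(fragments))
```

Because each fragment length occurs exactly once, ordering the mixture by size recovers the order of the bases.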

Cloning issues (2-3/11)

• Clones from genomic DNA contain introns (non-coding sequence); they are very large and difficult to analyze for function.

• Alternative: start from mRNA. Convert to cDNA and clone the cDNA.

Gene function characterization (4/11)

• To characterize the function of a gene it is important to know its sequence and compare it to other sequences in the databases, and to identify where and under what conditions it is expressed and what function, if known, it has in other organisms.

• Also do gene expression studies.

Gene expression studies (5/11)

• Allow you to understand how a gene is regulated in a tissue or a cell type.
• The most useful way of studying gene expression is to measure the levels of mRNA produced from a particular gene in a particular tissue.
• Application: to understand certain biological processes it is useful to study the differences in gene expression which occur during such processes. E.g.
  – It is of interest to know which genes are induced or repressed, say in the liver, after a particular drug is taken.
  – Or which genes are expressed in a tumor but not in the surrounding normal tissue.
• Some techniques for analyzing the mRNA level of a single gene or for quantifying gene expression
  – Northern blots
  – Quantitative reverse transcriptase PCR (qRT-PCR)
  – DNA microarrays
  – Proteomics (analysis of the protein synthesis that results from gene expression)

DNA microarrays (6/11)

• Consist of thousands of DNA probes, corresponding to different genes, arranged as an array.
• Each probe (sometimes consisting of a short sequence of synthetic DNA) is complementary to a different mRNA (or cDNA).
• mRNA isolated from a tissue or cell type is converted to fluorescently labeled mRNA or cDNA and is hybridized to the array.
• Each expressed gene in the sample binds to its probe on the array and generates a fluorescent signal.
• A DNA microarray can interrogate the level of transcription of several thousand different genes from one sample in one experiment. (One DNA microarray experiment reveals the mRNA levels of 1000s of genes from one tissue or cell type at one time point.)
• Particularly useful when studying the effect of environmental factors on gene expression.
• A fingernail-size chip can interrogate 10,000 different transcripts. The chip has 30-40 different probes for each transcript; half of them are designed to perfectly match 20-nucleotide stretches of the gene and the other half contain a mismatch, as a control to test the specificity of the hybridization signal.
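
One simple way to use the mismatch probes as a specificity control is to subtract each mismatch intensity from its paired perfect-match intensity and average the differences. The slides do not say how the two halves are combined, so this averaging rule is an assumption, and the function name and intensity values below are made up for illustration.

```python
from statistics import mean


def probe_set_signal(perfect_match, mismatch):
    """Average (perfect match - mismatch) intensity over the probe pairs.

    A value near zero suggests the signal is mostly non-specific binding.
    """
    if len(perfect_match) != len(mismatch):
        raise ValueError("each perfect-match probe needs a paired mismatch probe")
    return mean(pm - mm for pm, mm in zip(perfect_match, mismatch))


if __name__ == "__main__":
    # Hypothetical fluorescence intensities for one transcript (arbitrary units).
    pm_intensities = [1200, 1500, 1100, 1300]
    mm_intensities = [200, 300, 100, 200]
    print("signal:", probe_set_signal(pm_intensities, mm_intensities))
```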

Pharmacogenomics (7/11)

• Refers to the study of differential gene expression applied to drug discovery and optimization.
• Applications: differential gene expression studies in specific tissues or cell types may help to
  – Find new disease mechanisms
  – Discover new drug targets
  – Confirm the expected mechanism of action of a drug
  – Choose the best candidate compound based on the optimal expression profile
  – Figure out a priori who will benefit from a drug and who won’t

Model organisms (9/11)

• An indispensable tool for studying the function of a gene.
• Range from bacteria and yeast to animals amenable to genetic modification.
  – Worms, insect cells, frog eggs, flies, zebra fish, mice, mammalian (human) cell lines.
• In general, the more complex the organism, the more difficult genetic modification is, but the more relevant the model becomes to humans.