human genome. human genome contents: 3200 mb genes: 1200 mb –genes 48 mb –related 1152 mb:...

Human Genome

Upload: elmer-mathews

Post on 13-Dec-2015


TRANSCRIPT


Human Genome


Human Genome Contents: 3200 Mb

• Genes: 1200 Mb
  – Genes: 48 Mb
  – Related: 1152 Mb (Pseudogenes, Gene Fragments, Introns)

• Intergenic DNA: 2000 Mb
  – Interspersed Repeats: 1400 Mb
  – Microsatellites (short tandem repeats): 90 Mb

• Telomeres: End Sequences

• Centromeres

• Single Nucleotide Polymorphisms
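The sizes above can be cross-checked with a little arithmetic. This is only a sketch using the slide's own figures; the variable and function names are illustrative. Note that the listed intergenic subcategories (1400 + 90 Mb) do not account for all 2000 Mb, so 510 Mb of intergenic DNA is simply not itemized on the slide.

```python
# Genome composition figures from the slide, in Mb.
GENOME_MB = 3200
components = {"genes (total)": 1200, "intergenic DNA": 2000}
gene_parts = {"genes": 48, "related (pseudogenes, gene fragments, introns)": 1152}
intergenic_parts = {"interspersed repeats": 1400, "microsatellites (STRs)": 90}


def percent(part_mb: float, whole_mb: float = GENOME_MB) -> float:
    """Share of the whole genome, in percent."""
    return 100.0 * part_mb / whole_mb


# The top-level categories account for the whole genome ...
assert sum(components.values()) == GENOME_MB
# ... and the gene-related subcategories add up to the gene total.
assert sum(gene_parts.values()) == components["genes (total)"]

for name, mb in {**components, **gene_parts, **intergenic_parts}.items():
    print(f"{name:50s} {mb:5d} Mb  {percent(mb):6.2f}%")
```

Protein-coding "Genes" come out at 1.5% of the genome, while gene-related sequence as a whole is 37.5%.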


Chromosomes

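• Much shorter than the DNA they contain (the DNA is tightly packaged)

• Histones: DNA-binding proteins that package the DNA

• Two copies (sister chromatids) held together at the centromere

• Telomere: terminal region

• Any two humans differ at about 0.1% of bases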
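A quick scale check of the 0.1% figure, assuming the 3200 Mb genome size from the contents slide applies to each genome being compared:

```python
GENOME_BP = 3_200_000_000     # 3200 Mb, from the genome-contents slide
DIFFERENCES_PER_1000_BP = 1   # "two humans differ by 0.1%"

# Integer arithmetic avoids floating-point rounding in the result.
differing_bases = GENOME_BP * DIFFERENCES_PER_1000_BP // 1000
mean_spacing_bp = GENOME_BP // differing_bases

print(f"~{differing_bases:,} differing bases, about one every {mean_spacing_bp:,} bp")
```

So 0.1% still means on the order of three million single-base differences between two people.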


Donors

• HGP:
  – Opportunity advertised near labs
  – First come, first served
  – 5–10 samples collected for every one used
  – No link between donor and sample

• Celera: 5 subjects (three men, two women)
  – One Asian, one African-American, one Hispanic, two Caucasians
  – Craig Venter (among the donors)


Basic Technology

• Physical Mapping

• Cloning

• Shotgun Sequencing

• Computational Sequence Reassembly


STS (Sequence Tagged Sites)

• High resolution, rapid, simple

• 100–500 bp

• Collection of overlapping fragments

• Each point represented multiple times in random fragments

• Sequence must be known

• Unique in the chromosome under study
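Because an STS must have known sequence and be unique in the chromosome under study, a candidate can be screened against available sequence by counting its occurrences on both strands. A minimal sketch, assuming the chromosome is available as a plain string; the function names are hypothetical, and real screening would also have to tolerate mismatches and consider how the PCR primers behave.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, 0
    while (i := text.find(pattern, start)) != -1:
        count += 1
        start = i + 1
    return count


def sts_site_count(chromosome: str, sts: str) -> int:
    """Number of sites matching the STS on either strand."""
    chrom, fwd = chromosome.upper(), sts.upper()
    rev = reverse_complement(fwd)
    hits = count_occurrences(chrom, fwd)
    if rev != fwd:  # a palindromic STS would otherwise be counted twice
        hits += count_occurrences(chrom, rev)
    return hits


def is_valid_sts(chromosome: str, sts: str, min_len: int = 100, max_len: int = 500) -> bool:
    """Length within the slide's 100-500 bp range and exactly one site in the chromosome."""
    return min_len <= len(sts) <= max_len and sts_site_count(chromosome, sts) == 1
```

For example, `sts_site_count("CCCCGATTACACCCC", "GATTACA")` finds one site. The toy probe is far shorter than a real STS, so `is_valid_sts` rejects it unless `min_len` is lowered.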


Physical Mapping

• A set of clone fragments whose positions relative to each other are known

• Restriction maps: relative locations of restriction sites

• Fluorescent in situ hybridization (FISH): marker locations mapped by hybridizing a probe to chromosomes

• Sequence Tagged Sites (STS): positions of short sequences mapped by PCR or hybridization analysis of genome fragments

• Expressed Sequence Tags (EST): short sequences from cDNA clones
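A restriction map records the relative locations of restriction sites. As a minimal sketch of deriving one from known sequence, the code below finds HindIII sites (AAGCTT, cut after the first A) and the fragment lengths produced on a linear molecule. HindIII is chosen only because it is the enzyme named on the fingerprinting slide; the function names are illustrative.

```python
from __future__ import annotations

HINDIII_SITE = "AAGCTT"   # palindromic, so searching one strand finds every site
HINDIII_CUT_OFFSET = 1    # HindIII cuts A^AGCTT, i.e. after the first base of the site


def restriction_sites(seq: str, site: str = HINDIII_SITE) -> list[int]:
    """0-based start positions of every occurrence of the recognition site."""
    seq, positions, start = seq.upper(), [], 0
    while (i := seq.find(site, start)) != -1:
        positions.append(i)
        start = i + 1
    return positions


def fragment_lengths(seq: str, site: str = HINDIII_SITE,
                     cut_offset: int = HINDIII_CUT_OFFSET) -> list[int]:
    """Lengths of the fragments produced by cutting a linear molecule at every site."""
    cuts = [p + cut_offset for p in restriction_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


dna = "CCCAAGCTTGGGGGAAGCTTTT"
print(restriction_sites(dna))  # [3, 14]
print(fragment_lengths(dna))   # [4, 11, 7]
```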


[Figure: Genome cut into fragments]

[Figure: Cloned as a library in a vector (red)]


Hybridisation mapping:

1. Pick clones into a grid
2. Hybridise to probe 1
3. Hybridise to probe 2
4. Build contigs

In this case, two clones hybridised to both probes and are therefore predicted to overlap. Clones hybridising to only one probe are predicted to extend out to the left or right.
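The grid logic can be sketched in Python, assuming hybridisation results are recorded in a hypothetical mapping from clone name to the set of probes it hybridised to. Clones sharing any probe are merged into one contig with a union-find structure; `classify` mirrors the reading of the grid above, where clones hitting both probes overlap and clones hitting one probe extend beyond it.

```python
from __future__ import annotations

from collections import defaultdict


def build_contigs(hits: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs: clones sharing any probe are assumed to overlap."""
    parent = {clone: clone for clone in hits}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    clones_by_probe: dict[str, list[str]] = defaultdict(list)
    for clone, probes in hits.items():
        for probe in probes:
            clones_by_probe[probe].append(clone)

    # Join every clone that hybridised to a probe with the first such clone.
    for clones in clones_by_probe.values():
        root = find(clones[0])
        for other in clones[1:]:
            parent[find(other)] = root

    contigs: dict[str, set[str]] = defaultdict(set)
    for clone in hits:
        contigs[find(clone)].add(clone)
    return list(contigs.values())


def classify(hits: dict[str, set[str]], probe1: str, probe2: str):
    """Split clones into (both probes, probe 1 only, probe 2 only)."""
    both = sorted(c for c, p in hits.items() if probe1 in p and probe2 in p)
    only1 = sorted(c for c, p in hits.items() if probe1 in p and probe2 not in p)
    only2 = sorted(c for c, p in hits.items() if probe2 in p and probe1 not in p)
    return both, only1, only2


grid = {"c1": {"p1"}, "c2": {"p1", "p2"}, "c3": {"p1", "p2"}, "c4": {"p2"}, "c5": set()}
print(build_contigs(grid))
print(classify(grid, "p1", "p2"))
```

Here `c2` and `c3` form the overlapping core, `c1` and `c4` extend it on either side, and `c5` stays unplaced.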



Fingerprinting: digest clones and run them on a gel

Overlap by shared bands
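Overlap by shared bands can be sketched as counting fragment sizes that two clones have in common. A minimal sketch, assuming band sizes in bp have already been read off the gel; the 2% tolerance and the three-band threshold are illustrative guesses, not values from the slides, and dedicated fingerprint software uses a more careful statistical similarity score.

```python
def shared_bands(bands_a: list[float], bands_b: list[float], tolerance: float = 0.02) -> int:
    """Count bands of clone A that match a distinct band of clone B within a relative tolerance."""
    unmatched_b = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for i, other in enumerate(unmatched_b):
            if abs(size - other) <= tolerance * size:
                shared += 1
                del unmatched_b[i]  # each band of B may be matched only once
                break
    return shared


def predicted_overlap(bands_a: list[float], bands_b: list[float],
                      min_shared: int = 3, tolerance: float = 0.02) -> bool:
    """Hypothetical rule: call an overlap when enough bands are shared."""
    return shared_bands(bands_a, bands_b, tolerance) >= min_shared


clone_a = [500, 1200, 2300, 4000, 7000]
clone_b = [505, 1190, 2300, 6000]
print(shared_bands(clone_a, clone_b), predicted_overlap(clone_a, clone_b))
```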


Assembly of Contiguous DNA Sequence

• Shotgun Approach

• Contigs: result of joining overlapping sequences

• Scaffold: contigs ordered and oriented relative to each other, with the gaps between them still to be filled

• BAC: bacterial artificial chromosome vector; inserts of 100–200 kb
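A scaffold can be written out as its contigs in order, each in the right orientation, with runs of `N` standing in for gap sequence that is not yet known. A minimal sketch, assuming the order, orientation and gap sizes are already known (for example, estimated from read pairs); the `Contig` record and `build_scaffold` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Contig:
    name: str
    seq: str
    reverse: bool = False  # True if the contig lies on the opposite strand in the scaffold


def build_scaffold(contigs: list[Contig], gap_sizes: list[int]) -> str:
    """Join ordered, oriented contigs, padding each unresolved gap with 'N'."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of adjacent contigs")
    parts = []
    for i, contig in enumerate(contigs):
        seq = contig.seq.upper()
        parts.append(seq.translate(COMPLEMENT)[::-1] if contig.reverse else seq)
        if i < len(gap_sizes):
            parts.append("N" * gap_sizes[i])
    return "".join(parts)


print(build_scaffold([Contig("c1", "ACGT"), Contig("c2", "GGA", reverse=True)], [3]))
# ACGTNNNTCC
```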


Regional mapping



Minimal tiling path selected for sequencing.
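Choosing a minimal tiling path from a contig map is an interval-covering problem. A greedy sketch, assuming each clone's left and right ends are known in a hypothetical common coordinate system (for example, kb along the contig): repeatedly take the clone that starts within the region already covered and reaches furthest right. For plain interval coverage this greedy rule yields the fewest clones; a real tiling path also needs enough overlap between neighbours to confirm their order, which this sketch does not check.

```python
from __future__ import annotations


def minimal_tiling_path(clones: list[tuple[str, int, int]], start: int, end: int) -> list[str]:
    """Fewest clones covering [start, end); clones are (name, left, right), right exclusive."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: list[str] = []
    covered_to, i = start, 0
    while covered_to < end:
        best = None
        # Among clones starting at or before the covered edge, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone extends coverage past position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


contig_map = [("A", 0, 120), ("B", 50, 150), ("C", 100, 260),
              ("D", 140, 200), ("E", 240, 330), ("F", 250, 300)]
print(minimal_tiling_path(contig_map, 0, 330))  # ['A', 'C', 'E']
```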



Restriction fragment fingerprinting

[Figure: fingerprint gel, fragment sizes from ~300 bp to >20 kbp; molecular weight marker every 5th lane]

- BAC clones are grown in 96-well format
- HindIII digest
- 1% agarose
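The marker lanes let band sizes be estimated from how far each band migrated. Over a limited range, log(fragment size) is roughly linear in migration distance, so a minimal sketch interpolates log10(size) between the two flanking marker bands. The ladder values below are invented for illustration, and the sketch refuses to extrapolate beyond the marker range.

```python
from __future__ import annotations

import math
from bisect import bisect_left


def estimate_size(distance_mm: float, ladder: list[tuple[float, int]]) -> float:
    """Estimate band size (bp) by interpolating log10(size) between flanking marker bands.

    ladder: (migration distance in mm, size in bp) for each marker band.
    """
    points = sorted(ladder)  # shortest migration (largest fragments) first
    distances = [d for d, _ in points]
    if not distances[0] <= distance_mm <= distances[-1]:
        raise ValueError("band lies outside the marker range; not extrapolating")
    j = bisect_left(distances, distance_mm)
    if distances[j] == distance_mm:
        return float(points[j][1])
    (d0, s0), (d1, s1) = points[j - 1], points[j]
    t = (distance_mm - d0) / (d1 - d0)
    return 10 ** (math.log10(s0) + t * (math.log10(s1) - math.log10(s0)))


hypothetical_ladder = [(10.0, 20000), (20.0, 2000), (30.0, 200)]
print(round(estimate_size(15.0, hypothetical_ladder)))
```

Halfway between the 20 kb and 2 kb markers, the estimate is their geometric mean (about 6.3 kb), not their arithmetic mean.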


Contig assembly

[Figure: fingerprints of clones A–G; legend distinguishes vector fragments from insert fragments]

FPC*: overlap identification by restriction pattern similarities; facilitated contig assembly

*Sanger Centre: C. Soderlund, I. Longden and R. Mott

All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones.
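The verification rule can be sketched as a check that every insert band of a tiling-path clone reappears in at least one overlapping clone. A minimal sketch, assuming band sizes in bp; the vector band sizes and the 2% matching tolerance are hypothetical parameters, and vector bands are excluded because they come from the cloning vector rather than the genomic insert.

```python
def _match(size: float, other: float, tolerance: float) -> bool:
    """Two bands match if their sizes agree within a relative tolerance."""
    return abs(size - other) <= tolerance * size


def unverified_fragments(clone_bands: list[float], overlapping_clones: list[list[float]],
                         vector_bands: tuple[float, ...] = (), tolerance: float = 0.02) -> list[float]:
    """Insert bands of a tiling-path clone that no overlapping clone confirms."""
    insert_bands = [b for b in clone_bands
                    if not any(_match(b, v, tolerance) for v in vector_bands)]
    return [b for b in insert_bands
            if not any(_match(b, o, tolerance)
                       for other in overlapping_clones for o in other)]


missing = unverified_fragments([7000, 3000, 1500, 800],
                               [[3000, 1200], [1510, 4000]],
                               vector_bands=(7000,))
print(missing)  # [800]: this clone's fingerprint is not fully verified
```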


BCM-HGSC


Shotgun Sequencing I: RANDOM PHASE

BAC clone (100–200 kb) → sheared DNA (1.0–2.0 kb) → sequencing templates → random reads
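How many random reads the random phase needs can be estimated with the Lander–Waterman model, which assumes reads start at uniformly random positions: coverage c = N·L/G for N reads of length L over a target of length G, and the expected fraction of the target hit by no read is about e^(−c). The 500 bp read length and 8× target below are assumptions, not figures from the slides.

```python
import math


def reads_for_coverage(target_bp: int, read_bp: int, coverage: float) -> int:
    """Reads N needed so that N * read length / target length reaches the coverage."""
    return math.ceil(coverage * target_bp / read_bp)


def fraction_unsampled(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)


READ_BP = 500  # assumed read length; not given on the slides
for label, size in [("BAC clone (150 kb)", 150_000), ("whole genome (3,000 Mb)", 3_000_000_000)]:
    n = reads_for_coverage(size, READ_BP, 8)
    print(f"{label}: {n:,} reads for 8x; ~{size * fraction_unsampled(8):,.0f} bp expected unsampled")
```

Even at 8× a small fraction of bases is left unsampled by chance, which is one reason gaps remain after the random phase.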



Shotgun Sequencing II: ASSEMBLY

[Figure: consensus sequence with a gap, a low base quality region, a single-stranded region and a mis-assembly (inverted)]
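Some of the assembly problems can be illustrated with a toy consensus caller. This sketch assumes reads have already been laid out, using a hypothetical `PlacedRead` record whose bases are given on the consensus strand. It takes a majority vote per column (ties go to the base seen first), marks columns with no reads as gaps, and marks columns covered from only one strand as single-stranded. Base quality values and mis-assemblies are not modelled.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class PlacedRead(NamedTuple):
    offset: int   # start position of the read in the layout
    seq: str      # bases as they appear on the consensus (forward) strand
    strand: str   # '+' or '-': the strand the read was sequenced from


def consensus(reads: List[PlacedRead], length: int) -> Tuple[str, List[str]]:
    """Majority-vote consensus plus a per-column flag: 'ok', 'single-stranded' or 'gap'."""
    bases, flags = [], []
    for pos in range(length):
        column, strands = Counter(), set()
        for r in reads:
            i = pos - r.offset
            if 0 <= i < len(r.seq):
                column[r.seq[i]] += 1
                strands.add(r.strand)
        if not column:
            bases.append("N")
            flags.append("gap")
        else:
            bases.append(column.most_common(1)[0][0])
            flags.append("single-stranded" if len(strands) == 1 else "ok")
    return "".join(bases), flags


layout = [PlacedRead(0, "ACGT", "+"), PlacedRead(2, "GTTA", "-"), PlacedRead(8, "CC", "+")]
seq, flags = consensus(layout, 10)
print(seq)    # ACGTTANNCC
print(flags)
```

Only the two columns covered by both a '+' and a '-' read come out as 'ok'. Those flagged columns are the ones finishing would target.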


Shotgun Sequencing III: FINISHING

[Figures: the consensus sequence starts with a gap, a low base quality region, a single-stranded region and a mis-assembly (inverted); successive slides remove the low base quality region, then the single-stranded region, then the gap, then the mis-assembly]

High accuracy sequence: < 1 error / 10,000 bases
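The finishing standard can be restated on the Phred scale, where Q = −10·log10(P(error)): an error rate of 1 in 10,000 is Q40. A short worked example; the 150 kb clone size is simply a value inside the BAC range given earlier.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred score Q = -10 * log10(P(error))."""
    return -10 * math.log10(error_rate)


def expected_errors(length_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases at a given per-base error rate."""
    return length_bp * error_rate


FINISHED_ERROR_RATE = 1 / 10_000  # "< 1 error / 10,000 bases"

print(f"Q{phred_quality(FINISHED_ERROR_RATE):.0f}")
print(f"fewer than {expected_errors(150_000, FINISHED_ERROR_RATE):.0f} errors in a 150 kb BAC")
```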


Whole Genome Shotgun Sequencing

Whole genome (3,000 Mb) → sheared DNA (1.0–2.0 kb) → sequencing templates → random reads



Whole Genome Shotgun Sequencing: Assembly

[Figure: consensus sequence with a gap, a low base quality region, a single-stranded region and a mis-assembly (inverted); a following slide shows only the gap and the low base quality region remaining]



Random fragmentation of the genome produces good sampling of its sequence space. Overlaps are identified, and subassembly of the sequence takes place after cloning into a universal vector.

Digested into random fragments

Cloned into vector

Sequenced from known ends of the plasmid (vector)

Assembled into contigs. Gaps and single-stranded regions are identified for further study and targeted for new sequencing. Double-barreled: both ends of each insert are sequenced, one read from each strand.
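Because both ends of each insert are read and the insert size is roughly known (1.0–2.0 kb on the shotgun slides), a read pair whose ends fall in two different contigs gives a rough size for the gap between them. A minimal sketch, assuming contig A lies to the left of contig B and both are in forward orientation; the parameter names are hypothetical. The median combines several pairs so that one misplaced pair does not dominate.

```python
from __future__ import annotations

from statistics import median


def estimate_gap(insert_size: int, contig_a_len: int, read_a_start: int, mate_b_end: int) -> int:
    """Gap length between contig A (left) and contig B (right).

    read_a_start: 0-based position on A where the forward read starts (pointing toward A's end).
    mate_b_end:   bases of B, counted from B's start, spanned by the insert up to the mate's outer end.
    A negative result suggests the contigs overlap or the insert size is off.
    """
    span_in_a = contig_a_len - read_a_start
    return insert_size - span_in_a - mate_b_end


def gap_from_pairs(estimates: list[int]) -> float:
    """Combine per-pair estimates with the median."""
    return median(estimates)


print(estimate_gap(2000, 10_000, 9_400, 900))   # 500
print(gap_from_pairs([500, 480, 530, 1500]))    # 515.0
```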


In the gaps:


Whole-Genome Shotgun Sequencing

• Speed-up, but is it assembled correctly?

• Avoids up-front mapping

• Huge amount of computer time needed to identify overlaps

• Has to reference a map

• Repeats are a problem:
  – Sequence between repeats can be left out
  – A missing reference end sequence means an error
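Repeats cause trouble when a repeated stretch is at least as long as the reads: a read lying inside it matches every copy equally well, so the assembler cannot tell where it belongs. A minimal sketch for spotting such stretches in a known sequence, using k as a stand-in for read length (an assumption; real assemblers detect repeats from the reads themselves, for example through unusually high coverage).

```python
from __future__ import annotations

from collections import defaultdict


def repeated_kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Every length-k substring occurring more than once, with its 0-based start positions."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


genome = "TTT" + "GATTACA" + "GGGG" + "GATTACA" + "CCC"
print(repeated_kmers(genome, 7))  # {'GATTACA': [3, 14]}
```

With 7 bp "reads", a read equal to `GATTACA` could be placed at either copy, and the `GGGG` between the copies is exactly the kind of sequence an assembler might collapse or leave out.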


HGP

• Isolate large fragments in BACs within the framework of a landmark-based physical map

• Sequence on a clone-by-clone basis

• Time-consuming subcloning of random fragments and physical mapping


Sequence Reassembly

• Phrap

• Shortest Covering Superstring

• Map Assembly

• Overlap: Finding overlapping fragments

• Layout: ordering fragments

• Consensus: deriving the sequence from the layout
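The overlap step, and the idea of a shortest covering superstring, can be illustrated with greedy merging: repeatedly join the pair of fragments with the longest suffix/prefix overlap. This is one simple heuristic, not how Phrap works, and greedy merging is not guaranteed to find the shortest superstring. The minimum overlap length is an illustrative parameter.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_superstring(reads: list, min_len: int = 1) -> list:
    """Greedily merge reads by largest overlap; returns one string per resulting contig."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_i is None:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


print(greedy_superstring(["ATTAGACC", "GACCTGCC", "TGCCGGAA"], min_len=3))
# ['ATTAGACCTGCCGGAA']
```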
