human genome project
HUMAN GENOME PROJECT
MS. RUCHI YADAV
LECTURER
AMITY INSTITUTE OF BIOTECHNOLOGY
AMITY UNIVERSITY
LUCKNOW (UP)
HUMAN GENOME PROJECT
GENOME SEQUENCING
GENOME ASSEMBLY
GENOME ANNOTATION
Human Genome Project
Background
The idea of sequencing the entire human genome was first proposed in discussions at scientific meetings organized by the US Department of Energy and others from 1984 to 1986.
A committee appointed by the US National Research Council endorsed the concept in 1988 and recommended a broader programme, to include:
The creation of genetic, physical and sequence maps of the human genome;
Parallel efforts in key model organisms such as bacteria, yeast, worms, flies and mice;
Development of technology in support of these objectives;
Research into the ethical, legal and social issues raised by human genome research.
HGP BACKGROUND……
The Human Genome Organization (HUGO) and the International Human Genome Sequencing Consortium (IHGSC) were founded to provide a forum for international coordination of genomic research.
Within the NIH, the HGP is run by the National Human Genome Research Institute (NHGRI).
The collaboration was coordinated through periodic international meetings (referred to as ‘Bermuda meetings’)
Work was shared flexibly among the centres, with some groups focusing on particular chromosomes and others contributing in a genome-wide fashion.
A guiding principle was rapid and unrestricted data release. The centres adopted a policy that all genomic sequence data should be made publicly available without restriction within 24 hours of assembly (the Bermuda Principle).
Human Genome Project
Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project was originally planned to last 15 years, but rapid technological advances accelerated the completion date to 2003. Project goals were to:
Identify all the approximately 20,000-25,000 genes in human DNA,
Determine the sequences of the 3 billion chemical base pairs that make up human DNA,
Store this information in databases,
Improve tools for data analysis,
Transfer related technologies to the private sector, and
Address the ethical, legal, and social issues (ELSI) that may arise from the project.
Milestones:
June 2000: Completion of a working draft of the entire human genome
February 2001: Analyses of the working draft are published
April 2003: HGP sequencing is completed and the project is declared finished two years ahead of schedule
Timeline of large-scale genomic analyses.
HUMAN GENOME
The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).
The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
The total number of genes was estimated at around 30,000 in the initial analyses, much lower than previous estimates of 80,000 to 140,000.
Almost all (99.9%) nucleotide bases are exactly the same in all people.
The functions are unknown for over 50% of discovered genes.
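The figures on this slide can be turned into a quick back-of-envelope check. The sketch below uses only the rounded numbers quoted above (3 billion bases, 99.9% identity, 30,000 genes of 3000 bases on average); the function names are illustrative, and the "genic fraction" is a rough implication of those averages, not a measured value.

```python
# Back-of-envelope arithmetic using the rounded figures from the slide.
GENOME_BP = 3_000_000_000   # ~3 billion bases
IDENTITY = 0.999            # 99.9% of bases identical between people
GENE_COUNT = 30_000         # estimated number of genes
AVG_GENE_BP = 3_000         # average gene length in bases


def differing_bases(genome_bp: int, identity: float) -> int:
    """Approximate number of positions at which two people differ."""
    return round(genome_bp * (1 - identity))


def genic_fraction(gene_count: int, avg_gene_bp: int, genome_bp: int) -> float:
    """Share of the genome covered if every gene had the average length."""
    return gene_count * avg_gene_bp / genome_bp


if __name__ == "__main__":
    print(differing_bases(GENOME_BP, IDENTITY))
    print(f"{genic_fraction(GENE_COUNT, AVG_GENE_BP, GENOME_BP):.1%}")
```

With these inputs, 0.1% of 3 billion bases is about 3 million differing positions between two people, and 30,000 genes of average length account for only a few percent of the genome.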
HUMAN GENOME PROJECT
PUBLIC AND PRIVATE SECTOR
Two Different Groups Worked to Obtain the DNA Sequence of the Human Genome
The publicly funded HGP is a multinational consortium established by government research agencies.
Celera Genomics is a private company whose former CEO, J. Craig Venter, ran an independent sequencing project. Francis Collins led the public HGP.
Differences arose regarding who should receive the credit for this scientific milestone.
On June 26, 2000, the HGP and Celera Genomics held a joint press conference to announce that TOGETHER they had completed ~97% of the human genome.
PUBLISHED
The International Human Genome Sequencing Consortium published their results in Nature, 409 (6822): 860-921, 2001.
“Initial Sequencing and Analysis of the Human Genome”
Celera Genomics published their results in Science, Vol 291(5507): 1304-1351, 2001.
“The Sequence of the Human Genome”
HGP SEQUENCING STRATEGIES
LARGE SCALE SEQUENCING TECHNOLOGY
Genome Glossary
HGP SEQUENCING STRATEGIES
The HGP project had three stages:
Genetic (or linkage) mapping
Physical mapping
DNA sequencing
Three-Stage Approach to Genome Sequencing
Strategic Issues
There are two approaches for sequencing large repeat-rich genomes.
The first is a whole-genome shotgun sequencing approach, as has been used for the repeat-poor genomes of viruses, bacteria and flies, using linking information and computational analysis.
The second is the ‘hierarchical shotgun sequencing’ approach, also referred to as `map-based', `BAC-based' or `clone-by-clone'.
‘HIERARCHICAL SHOTGUN SEQUENCING’
`MAP-BASED', `BAC-BASED' OR `CLONE-BY-CLONE'
Technology for large-scale sequencing
US HGP
Hierarchical shotgun sequencing
Clone-by-clone or hierarchical sequencing strategy
Advantages:
Ability to fill gaps and re-sequence uncertain regions.
Ability to distribute the clones to other labs.
Ability to check the produced sequence by restriction enzymes.
Disadvantages:
Expensive and time-consuming construction of the physical map.
Experienced personnel are required.
HIERARCHICAL ASSEMBLY OF SEQUENCE CONTIG SCAFFOLD
Assembly of the draft genome sequence
The key steps in assembling individual sequenced clones into the draft genome sequence.
Levels of clone and sequence coverage.
WHOLE-GENOME SHOTGUN
Developed by J. Craig Venter
Whole-Genome Shotgun Approach to Genome Sequencing
The whole-genome shotgun approach was developed by J. Craig Venter in 1992.
This approach skips genetic and physical mapping and sequences random DNA fragments directly.
Powerful computer programs are used to order fragments into a continuous sequence.
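To make the idea of ordering random fragments by computer concrete, here is a toy sketch of greedy overlap merging. It is not the assembler used by Celera or the HGP; it assumes error-free reads from one strand and no repeats, and the names `overlap`, `greedy_assemble` and `min_len` are illustrative only.

```python
# Toy greedy assembler: repeatedly merge the two reads with the longest
# suffix/prefix overlap. Assumes error-free, same-strand, repeat-free reads.

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads into contigs; returns one contig per unconnected group."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    fragments = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(fragments))
```

On a repeat-rich genome such as the human one, a greedy merge like this would join the wrong fragments whenever a repeat is longer than a read, which is why real assemblers rely on paired-end linking information and far more careful overlap analysis.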
Whole-Genome Shotgun Sequencing
Shotgun Sequencing Strategy
Advantages:
No physical map construction.
Less risk of recombinant clones.
Cost effective and fast.
Ideal for small genome sequencing.
Disadvantages:
Difficult to fill gaps and re-track all the sequenced plasmids.
Data less useful for positional cloning.
Whole-Genome Assembly
Hierarchical vs. Shotgun Sequencing
Assembly of a mapped scaffold
Generating the draft genome sequence
Generating a draft sequence of the human genome involved three steps:
Selecting the BAC clones to be sequenced,
Sequencing them, and
Assembling the individual sequenced clones into an overall draft genome sequence.
Assembly of the draft genome sequence
This process involved three steps: filtering, layout and merging.
The entire data set was filtered uniformly to eliminate contamination from nonhuman sequences and other artefacts that had not already been removed by the individual centres.
Assembly of the draft genome sequence
The sequenced clones were then associated with specific clones on the physical map to produce a `layout'.
The fingerprint clone contigs were then mapped to chromosomal locations, using sequence matches to mapped STSs from four human radiation hybrid maps, one YAC map and two genetic maps, together with data from FISH.
The human genome assembly and annotation process
•BUILD CYCLE
•DATA FREEZE
•RELEASE
The human genome assembly and annotation process : INPUTS
Genome Annotation
Feature Annotation
◦Clone Features
◦STS Features
◦SNP Features
◦Gene, mRNA (transcript)
◦misc_RNA (pseudogenes and non-coding transcripts)
◦Protein Features
◦Repeat features
Genome Annotation
Products
◦Sequence Data
◦Resource Support (dbSNP, Entrez Gene, Map Viewer, UniSTS)
Data Access
◦BLAST
◦Entrez Retrieval (Accession number, gene symbol, or protein name)
◦FTP (genomes FTP site)
Links from Map Viewer objects to other NCBI resources
UCSC put the human genome sequence on the web July 7, 2000
UCSC put the human genome sequence on CD in October 2000, with varying results
HGP ON WEB
Genome Browsers were developed and are maintained by the University of California at Santa Cruz (UCSC) and by the EnsEMBL project of the European Bioinformatics Institute and the Sanger Centre.
Additional browsers have been created; URLs are listed at www.nhgri.nih.gov/genome_hub.
These web-based computer tools allow users to view an annotated display of the draft genome sequence, with the ability to scroll along the chromosomes and zoom in or out to different scales.
In addition to using the Genome Browsers, one can download from these sites the entire draft genome sequence together with the annotations in a computer-readable format.
UCSC GENOME BROWSER
Broad genomic landscape
The distribution of GC content, CpG islands, recombination rates, repeat content and gene content of the human genome.
Long-range variation in GC content
GC-rich and GC-poor regions may have different biological properties:
Gene density,
Composition of repeat sequences,
Correspondence with cytogenetic bands,
Recombination rate.
CpG islands are of particular interest because they are associated with the 5’ ends of genes.
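GC content and CpG enrichment are simple to compute from raw sequence. The sketch below is a minimal illustration, not the method used for the genome-wide analysis; the function names, window size and step are assumptions. A commonly used rule of thumb (Gardiner-Garden and Frommer) calls a region a CpG island when it is at least 200 bases long, has GC content of at least 50% and an observed/expected CpG ratio above 0.6.

```python
# Minimal GC-content and CpG observed/expected calculations on a DNA string.

def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def windowed_gc(seq: str, window: int, step: int) -> list[float]:
    """GC content in successive windows, to show long-range variation."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    example = "GGGGAAAACGCGCGCG"
    print(gc_content(example), cpg_obs_exp(example))
    print(windowed_gc(example, window=4, step=4))
```

Plotting the output of `windowed_gc` along a chromosome with large windows gives the kind of long-range GC landscape described on this slide.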
Repeat content of the human genome
INTERSPERSED REPEATS
Gene content of the human genome
RNA genes and protein-coding genes in the human genome.
Noncoding RNAs
There are several major classes of ncRNA
tRNAs
rRNAs
small nucleolar RNAs (snoRNAs) are required for rRNA processing and base modification in the nucleolus
small nuclear RNAs (snRNAs) are critical components of spliceosomes, the large ribonucleoprotein (RNP) complexes that splice introns out of pre-mRNAs in the nucleus.
ncRNAs do not have translated ORFs, are often small and are not polyadenylated.
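Because ncRNAs lack translated ORFs, a gene finder that looks for open reading frames will miss them. The sketch below shows the simplest form of such an ORF scan so the limitation is visible; it is a hypothetical illustration that checks only the three forward-strand frames and ignores introns, so it is not how real ab initio gene predictors work.

```python
# Naive ORF scan: longest ATG...stop stretch in the three forward frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ORF (start codon through stop codon), or '' if none."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAGGG"))
```

A sequence encoding a tRNA or snRNA would typically return an empty or very short result here, which is why ncRNA genes need other detection methods such as sequence similarity and structure-based searches.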
Software tools for ab initio gene prediction
Distribution of the homologues of the predicted human proteins.
Conserved segments in the human and mouse genome. * Each colour corresponds to a particular mouse chromosome.
DISEASE GENES
DRUG TARGETS
Research challenges in genetics: what we still don't know, even with the full human DNA sequence in hand.
Gene number, exact locations, and functions
Gene regulation
DNA sequence organization
Chromosomal structure and organization
Noncoding DNA types, amount, distribution, information content, and functions
Coordination of gene expression, protein synthesis, and post-translational events
Interaction of proteins in complex molecular machines
Predicted vs. experimentally determined gene function
Evolutionary conservation among organisms
Protein conservation (structure and function)
Proteomes in organisms
Correlation of SNPs with health and disease
Disease-susceptibility prediction based on gene sequence variation
Genes involved in complex traits and multigene diseases
Complex systems biology, including microbial consortia useful for environmental restoration
Developmental genetics, genomics
“The more we learn about the human genome, the more there is to explore”
“We shall not cease from exploration. And the end of all our exploring will be to arrive where we started, and know the place for the first time.” T. S. Eliot