human genome project

HUMAN GENOME PROJECT

Ms. Ruchi Yadav

Lecturer, Amity Institute of Biotechnology

Amity University, Lucknow (UP)

HUMAN GENOME PROJECT: GENOME SEQUENCING, GENOME ASSEMBLY, GENOME ANNOTATION

Human Genome Project Background

The idea of sequencing the entire human genome was first proposed in discussions at scientific meetings organized by the US Department of Energy and others from 1984 to 1986.

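A committee appointed by the US National Research Council endorsed the concept in its 1988 report, but recommended a broader programme, to include: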

The creation of genetic, physical and sequence maps of the human genome;

Parallel efforts in key model organisms such as bacteria, yeast, worms, flies and mice;

Development of technology in support of these objectives;

Research into the ethical, legal and social issues raised by human genome research.

HGP BACKGROUND (CONTINUED)

The Human Genome Organization (HUGO) was founded to provide a forum for international coordination of genomic research; the public sequencing itself was carried out by the International Human Genome Sequencing Consortium (IHGSC).

In the US, the NIH arm of the project was the National Center for Human Genome Research, which in 1997 became the National Human Genome Research Institute (NHGRI).

The collaboration was coordinated through periodic international meetings (referred to as ‘Bermuda meetings’).

Work was shared flexibly among the centres, with some groups focusing on particular chromosomes and others contributing in a genome-wide fashion.

A key principle was rapid and unrestricted data release: the centres adopted a policy that all genomic sequence data should be made publicly available without restriction within 24 hours of assembly (the Bermuda Principles).

Human Genome Project

Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project was originally planned to last 15 years, but rapid technological advances accelerated the completion date to 2003. Project goals were to:

Identify all the approximately 20,000-25,000 genes in human DNA,

Determine the sequences of the 3 billion chemical base pairs that make up human DNA,

Store this information in databases,

Improve tools for data analysis,

Transfer related technologies to the private sector, and

Address the ethical, legal, and social issues (ELSI) that may arise from the project.

Milestones:

June 2000: Completion of a working draft of the entire human genome.

February 2001: Analyses of the working draft are published.

April 2003: HGP sequencing is completed and the project is declared finished two years ahead of schedule.

Timeline of large-scale genomic analyses.

HUMAN GENOME

The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G). The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.

The 2001 draft analyses estimated the total number of genes at around 30,000, much lower than previous estimates of 80,000 to 140,000; later refinements brought the figure down to about 20,000-25,000.

Almost all (99.9%) nucleotide bases are exactly the same in all people.

The functions are unknown for over 50% of discovered genes.
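The 99.9% figure can be turned into a rough count: 0.1% of 3 billion bases is about 3 million positions at which two people's genomes differ. A minimal Python sketch of that arithmetic, assuming only single-base substitutions between sequences already aligned to the same length (insertions, deletions and larger rearrangements are ignored); the function names are illustrative.

```python
def expected_differences(genome_length: int, identity: float) -> int:
    """Approximate number of differing positions implied by a fractional identity."""
    return round(genome_length * (1.0 - identity))


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched bases between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


print(expected_differences(3_000_000_000, 0.999))     # 3000000
print(count_differences("ACGTACGTAC", "ACGTTCGTAC"))  # 1
```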

HUMAN GENOME PROJECT: PUBLIC AND PRIVATE SECTOR

Two Different Groups Worked to Obtain the DNA Sequence of the Human Genome

The public HGP was a multinational consortium established by government research agencies and funded publicly.

Celera Genomics was a private company, led by J. Craig Venter, that ran an independent sequencing project; Francis Collins, director of the NHGRI, led the public effort.

Differences arose regarding who should receive the credit for this scientific milestone.

On June 26, 2000, the HGP and Celera Genomics held a joint press conference to announce that TOGETHER they had completed ~97% of the human genome.

PUBLISHED

The International Human Genome Sequencing Consortium published its results in Nature 409(6822): 860-921, 2001.

“Initial Sequencing and Analysis of the Human Genome”

Celera Genomics published its results in Science 291(5507): 1304-1351, 2001.

“The Sequence of the Human Genome”

HGP SEQUENCING STRATEGIES

LARGE-SCALE SEQUENCING TECHNOLOGY

Genome Glossary


The HGP had three stages:

Genetic (or linkage) mapping

Physical mapping

DNA sequencing

Three-Stage Approach to Genome Sequencing

Strategic Issues

There are two approaches for sequencing large, repeat-rich genomes.

First is the whole-genome shotgun sequencing approach, as has been used for the repeat-poor genomes of viruses, bacteria and flies, using linking information and computational analysis to avoid misassemblies.

Second is the ‘hierarchical shotgun sequencing’ approach, also referred to as ‘map-based’, ‘BAC-based’ or ‘clone-by-clone’ sequencing.

‘HIERARCHICAL SHOTGUN SEQUENCING’: ‘MAP-BASED’, ‘BAC-BASED’ OR ‘CLONE-BY-CLONE’

Technology for large-scale sequencing

US HGP

Hierarchical shotgun sequencing

Clone-by-clone or hierarchical sequencing strategy

Advantages:

Ability to fill gaps and re-sequence uncertain regions.

Ability to distribute the clones to other labs.

Ability to check the produced sequence by restriction enzyme digestion.

Disadvantages:

Expensive and time-consuming construction of the physical map.

Experienced personnel are required.
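The restriction-enzyme check can be pictured as an in-silico digest: the fragment sizes predicted from an assembled clone sequence are compared with the band sizes observed for that clone on a gel. A minimal sketch, assuming a single enzyme (EcoRI, which cuts G^AATTC), a linear clone and a hypothetical 5% sizing tolerance; the function names are illustrative, not a published tool.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths from cutting a linear sequence at every recognition site.

    cut_offset is where the enzyme cuts inside the site; EcoRI cuts G^AATTC, so 1.
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def matches_fingerprint(predicted: list[int], observed: list[int],
                        tolerance: float = 0.05) -> bool:
    """True if sorted predicted fragments match sorted observed bands within a relative tolerance."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance * o
               for p, o in zip(sorted(predicted), sorted(observed)))


fragments = digest("AAGAATTCTTGAATTCAA")
print(fragments)                                  # [3, 8, 7]
print(matches_fingerprint(fragments, [7, 3, 8]))  # True
```

A mismatch between predicted and observed fragments flags a clone whose assembly (or physical-map placement) should be rechecked.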

HIERARCHICAL ASSEMBLY OF SEQUENCE: CONTIG, SCAFFOLD

Assembly of the draft genome sequence

The key steps in assembling individual sequenced clones into the draft genome sequence.

Levels of clone and sequence coverage.

WHOLE-GENOME SHOTGUN

Developed by J. Craig Venter

Whole-Genome Shotgun Approach to Genome Sequencing

The whole-genome shotgun approach was developed by J. Craig Venter in 1992.

This approach skips genetic and physical mapping and sequences random DNA fragments directly.

Powerful computer programs are used to order fragments into a continuous sequence.
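The idea of ordering fragments by computer can be illustrated with a toy greedy overlap assembler: it repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a teaching sketch, not Celera's assembler; it assumes error-free reads from a single strand and an arbitrarily chosen minimum overlap (min_len). Greedy merging is also exactly what repeats longer than a read can mislead, which is why repeat-rich genomes are hard to assemble this way.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate places where b could begin inside a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain; returns contigs."""
    frags = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read; they add no new sequence
    frags = [r for r in frags if not any(r != o and r in o for o in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining fragments do not overlap: separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```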

Whole-Genome Shotgun Sequencing

Shotgun Sequencing Strategy

Advantages:

No physical map construction.

Less risk of recombinant clones.

Cost-effective and fast; ideal for small genome sequencing.

Disadvantages:

Difficult to fill gaps and to re-track all the sequenced plasmids.

Data less useful for positional cloning.
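Why gaps remain can be estimated with the classic Lander-Waterman model, which treats reads as placed uniformly at random: with coverage c = NL/G (N reads of length L on a genome of size G), a fraction of about e^(-c) of the genome receives no read at all. A sketch under those idealized assumptions (no repeats, no cloning bias, any overlap enough to join reads); the read numbers below are hypothetical, not HGP figures.

```python
import math


def coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Average sequence depth c = N * L / G."""
    return num_reads * read_len / genome_len


def fraction_uncovered(c: float) -> float:
    """Expected fraction of the genome covered by no read (Poisson zero class)."""
    return math.exp(-c)


def expected_gaps(num_reads: int, c: float) -> float:
    """Expected number of gaps (roughly, contigs) when any overlap joins two reads."""
    return num_reads * math.exp(-c)


# Hypothetical example: 30 million 500-bp reads on a 3-Gb genome
c = coverage(30_000_000, 500, 3_000_000_000)
print(c)                                # 5.0
print(round(fraction_uncovered(c), 4))  # 0.0067
print(round(expected_gaps(30_000_000, c)))
```

Even at 5-fold coverage, the model predicts a large number of gaps scattered across a mammalian-sized genome, which is why gap filling is hard without a map.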

Whole-Genome Assembly

Hierarchical vs. Shotgun Sequencing

Assembly of a mapped scaffold

Generating the draft genome sequence

Generating a draft sequence of the human genome involved three steps: selecting the BAC clones to be sequenced, sequencing them, and assembling the individual sequenced clones into an overall draft genome sequence.
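Selecting BAC clones amounts to choosing a small set of overlapping clones (a "minimal tiling path") across each fingerprint contig. A greedy interval-cover sketch, assuming clone positions on the contig are already known; the coordinates, clone names and the min_overlap parameter are hypothetical, and this is not the consortium's actual selection software.

```python
def minimal_tiling_path(clones: list[tuple[str, int, int]], min_overlap: int = 0) -> list[str]:
    """Greedy tiling path over one fingerprint contig.

    clones: (name, start, end) positions on the contig (hypothetical units, e.g. kb).
    Each chosen clone must overlap the previous one by at least min_overlap.
    """
    if not clones:
        return []
    # Start with the clone that begins first (the longest one if several tie)
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    path = [ordered[0]]
    covered_end = ordered[0][2]
    contig_end = max(end for _, _, end in ordered)
    while covered_end < contig_end:
        # Clones that overlap the covered region enough and extend beyond it
        candidates = [c for c in ordered
                      if c[1] <= covered_end - min_overlap and c[2] > covered_end]
        if not candidates:
            raise ValueError(f"no clone extends coverage past position {covered_end}")
        best = max(candidates, key=lambda c: c[2])  # reach as far as possible
        path.append(best)
        covered_end = best[2]
    return [name for name, _, _ in path]


bacs = [("A", 0, 150), ("B", 20, 100), ("C", 120, 300), ("D", 140, 260), ("E", 250, 400)]
print(minimal_tiling_path(bacs, min_overlap=20))  # ['A', 'C', 'E']
```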

Assembly of the draft genome sequence

This process involved three steps: filtering, layout and merging. The entire data set was filtered uniformly to eliminate contamination from nonhuman sequences and other artefacts that had not already been removed by the individual centres.
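The filtering idea can be sketched as screening each sequence against known contaminants and discarding those that share too much with them. This toy version uses exact k-mer matching on one strand; k, the threshold and the toy vector sequence are hypothetical, and a real pipeline would use alignment-based searches against curated contaminant databases.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_contaminated(reads: list[str], contaminants: list[str],
                        k: int = 12, max_shared: float = 0.2) -> list[str]:
    """Keep reads whose share of k-mers found in any contaminant is at most max_shared."""
    bad = set()
    for ref in contaminants:
        bad |= kmers(ref.upper(), k)
    kept = []
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        if not read_kmers:
            continue  # shorter than k: too short to assess, so it is dropped
        shared = len(read_kmers & bad) / len(read_kmers)
        if shared <= max_shared:
            kept.append(read)
    return kept


toy_vector = "GATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTT"  # hypothetical cloning-vector fragment
reads = [toy_vector[5:30], "ACGT" * 6]
print(filter_contaminated(reads, [toy_vector]))  # ['ACGTACGTACGTACGTACGTACGT']
```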

Assembly of the draft genome sequence

The sequenced clones were then associated with specific clones on the physical map to produce a ‘layout’.

The fingerprint clone contigs were then mapped to chromosomal locations, using sequence matches to mapped STSs from four human maps (radiation hybrid maps, one YAC map and two genetic maps), together with data from FISH.
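Placing a clone by its STS content can be sketched as: find which mapped STSs occur in the clone's sequence, check that they agree on a chromosome, and take a central map position. A minimal sketch, assuming exact forward-strand matches and a single map with one coordinate per STS; the STS names, toy 6-base sequences and positions are hypothetical (real STSs are a few hundred bases long).

```python
from statistics import median


def place_clone(clone_seq: str, sts_map: dict[str, tuple[str, str, float]]):
    """Place a clone using the mapped STSs its sequence contains.

    sts_map maps STS name -> (STS sequence, chromosome, map position).
    Returns (chromosome, median position), or None if there are no hits or the
    hits disagree on the chromosome (such clones would need manual review).
    """
    seq = clone_seq.upper()
    hits = [(chrom, pos) for sts_seq, chrom, pos in sts_map.values()
            if sts_seq.upper() in seq]
    if not hits:
        return None
    if len({chrom for chrom, _ in hits}) != 1:
        return None
    return hits[0][0], median(pos for _, pos in hits)


sts_map = {
    "sts_a": ("ACGTTG", "7", 100.0),   # hypothetical STS entries
    "sts_b": ("GGCCAA", "7", 104.0),
    "sts_c": ("TTTAAA", "12", 55.0),
}
print(place_clone("CCACGTTGCAGGCCAATT", sts_map))  # ('7', 102.0)
```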

The human genome assembly and annotation process

• BUILD CYCLE
• DATA FREEZE
• RELEASE

The human genome assembly and annotation process : INPUTS

Genome Annotation: Feature Annotation

◦ Clone features
◦ STS features
◦ SNP features
◦ Gene, mRNA (transcript)
◦ misc_RNA (pseudogenes and non-coding transcripts)
◦ Protein features
◦ Repeat features

Genome Annotation: Products

◦ Sequence data
◦ Resource support (dbSNP, Entrez Gene, Map Viewer, UniSTS)

Data Access

◦ BLAST
◦ Entrez retrieval (accession number, gene symbol, or protein name)
◦ FTP (genomes FTP site)
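Entrez retrieval by gene symbol or accession number can also be done programmatically through NCBI's E-utilities web API. The sketch below only builds the request URLs (no network access needed); the chosen parameters (JSON output, retmax, FASTA format) are illustrative choices, and the query looks up dystrophin by its gene symbol, DMD.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (programmatic interface to Entrez)
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL that searches an Entrez database and returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, accession: str, rettype: str = "fasta") -> str:
    """URL that downloads a record by accession number in the given format."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Look up the human dystrophin gene by its symbol
url = esearch_url("gene", "DMD[sym] AND human[orgn]")
print(url)
# Fetching it requires network access, e.g. urllib.request.urlopen(url).read()
```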

Links from Map Viewer objects to other NCBI resources

UCSC put the human genome sequence on the web on July 7, 2000.

UCSC put the human genome sequence on CD in October 2000, with varying results

HGP ON WEB

Genome browsers were developed and are maintained by the University of California at Santa Cruz (UCSC) and by the EnsEMBL project of the European Bioinformatics Institute and the Sanger Centre. Additional browsers have been created; URLs are listed at www.nhgri.nih.gov/genome_hub.

These web-based computer tools allow users to view an annotated display of the draft genome sequence, with the ability to scroll along the chromosomes and zoom in or out to different scales.

In addition to using the Genome Browsers, one can download from these sites the entire draft genome sequence together with the annotations in a computer-readable format.
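Sequence downloads are commonly distributed as FASTA text files, so a first step in working with them is a small parser. A minimal sketch, assuming plain (uncompressed) FASTA read from an open text handle; it keeps letter case, since UCSC's downloadable sequence marks repeat-masked bases in lowercase ("soft-masking"). Annotation files in other formats are not handled here, and the record names in the example are hypothetical.

```python
import io


def read_fasta(handle) -> dict[str, str]:
    """Read FASTA records from an open text handle into {name: sequence}.

    The name is the first word after '>'; sequence lines are concatenated as-is.
    """
    records = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = io.StringIO(">chr_demo toy record\nACGTNN\nacgtGG\n>chr_demo2\nTTAA\n")
print(read_fasta(example))  # {'chr_demo': 'ACGTNNacgtGG', 'chr_demo2': 'TTAA'}
```

For a real download, the same function would be called on `open(path)` for a local, decompressed file.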

UCSC GENOME BROWSER

Broad genomic landscape

The distribution of GC content, CpG islands, recombination rates, repeat content and gene content of the human genome.

Long-range variation in GC content

GC-rich and GC-poor regions may have different biological properties, such as gene density, composition of repeat sequences, and correspondence with cytogenetic bands.
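Long-range GC variation is usually examined by computing GC content in windows along a chromosome. A minimal sketch, assuming unknown bases (N) should be ignored rather than counted; the window and step sizes are hypothetical and genome-wide analyses use much larger windows than the toy example.

```python
from typing import Optional


def gc_windows(seq: str, window: int = 1000,
               step: Optional[int] = None) -> list[tuple[int, Optional[float]]]:
    """GC fraction in windows along seq; N (unknown) bases are ignored.

    Returns (window start, GC fraction) pairs; the fraction is None when a
    window holds no A/C/G/T at all (e.g. an unsequenced gap of Ns).
    A trailing partial window shorter than `window` is skipped.
    """
    step = step or window  # default: non-overlapping windows
    s = seq.upper()
    out = []
    for start in range(0, len(s) - window + 1, step):
        win = s[start:start + window]
        called = sum(win.count(b) for b in "ACGT")
        gc = win.count("G") + win.count("C")
        out.append((start, gc / called if called else None))
    return out


print(gc_windows("GGCCATATNNNN", window=4))  # [(0, 1.0), (4, 0.0), (8, None)]
```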

Recombination rate

CpG islands are of particular interest because they are associated with the 5’ ends of genes.
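A widely used way to recognise a CpG island is the Gardiner-Garden and Frommer criteria: at least 200 bp, GC content of at least 50%, and an observed/expected CpG ratio of at least 0.6. The sketch below tests one candidate window against those thresholds; it is not a genome-wide island finder, and the HGP analyses may have used a variant of these cut-offs.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a sequence.

    obs/exp CpG = (number of CG dinucleotides * length) / (C count * G count).
    """
    s = seq.upper()
    if not s:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / len(s)
    obs_exp = cpg * len(s) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds to one candidate window."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and obs_exp >= min_obs_exp


print(cpg_stats("CCGG" * 50))              # (1.0, 1.0)
print(looks_like_cpg_island("CCGG" * 50))  # True
print(looks_like_cpg_island("CAGG" * 50))  # False
```

The second example is GC-rich (75%) but has no CpG dinucleotides at all, which is the depletion pattern typical of bulk mammalian DNA outside islands.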

Repeat content of the human genome

INTERSPERSED REPEATS

Gene content of the human genome

RNA genes and protein-coding genes in the human genome.

Noncoding RNAs

There are several major classes of ncRNA:

tRNAs

rRNAs

Small nucleolar RNAs (snoRNAs) are required for rRNA processing and base modification in the nucleolus.

Small nuclear RNAs (snRNAs) are critical components of spliceosomes, the large ribonucleoprotein (RNP) complexes that splice introns out of pre-mRNAs in the nucleus.

ncRNAs do not have translated ORFs, are often small and are not polyadenylated.
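The "no translated ORF" property suggests a crude test: scan a sequence on both strands for the longest open reading frame (ATG to the first in-frame stop, standard genetic code). A minimal sketch; short ORFs arise by chance in any sequence, so this is only a rough filter, and ab initio gene predictors combine many more signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT letters are left as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> int:
    """Codons (start included, stop excluded) in the longest forward-strand
    ATG-to-stop ORF; 0 if no ATG is followed by an in-frame stop."""
    s = seq.upper()
    best = 0
    for frame in range(3):
        i = frame
        while i + 3 <= len(s):
            if s[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # ran off the end without a stop codon
                best = max(best, (j - i) // 3)
                i = j  # resume scanning after this stop codon
            i += 3
    return best


def longest_orf_both_strands(seq: str) -> int:
    return max(longest_orf(seq), longest_orf(reverse_complement(seq)))


print(longest_orf("CCATGTTTGGGTAACC"))         # 3
print(longest_orf_both_strands("CTATTTCAT"))  # 2
```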

Software tools for ab initio gene prediction

Distribution of the homologues of the predicted human proteins.

Conserved segments in the human and mouse genome. *Each colour corresponds to a particular mouse chromosome.

DISEASE GENES

DRUG TARGETS

Research challenges in genetics: what we still don't know, even with the full human DNA sequence in hand.

Gene number, exact locations, and functions

Gene regulation

DNA sequence organization

Chromosomal structure and organization

Noncoding DNA types, amount, distribution, information content, and functions

Coordination of gene expression, protein synthesis, and post-translational events

Interaction of proteins in complex molecular machines

Predicted vs. experimentally determined gene function

Evolutionary conservation among organisms

Protein conservation (structure and function)

Proteomes in organisms

Correlation of SNPs with health and disease

Disease-susceptibility prediction based on gene sequence variation

Genes involved in complex traits and multigene diseases

Complex systems biology, including microbial consortia useful for environmental restoration

Developmental genetics, genomics

“The more we learn about the human genome, the more there

is to explore”

“We shall not cease from exploration. And the end of all our exploring will be to arrive where we started, and

know the place for the first time.” T. S. Eliot
