whole genome sequencing of bacteria & analysis


Upload: drelamuruganvet

Post on 10-May-2015


Page 1: Whole genome sequencing of bacteria & analysis

WHOLE GENOME SEQUENCING OF BACTERIA & ANALYSIS

ELAMURUGAN. A

Ph.D Scholar,

Vet. Immunology

Page 2: Whole genome sequencing of bacteria & analysis

INTRODUCTION

1977 - first complete genome to be sequenced was bacteriophage φX174 (5386 bp)

1995 - first complete genome sequence from a free-living organism: Haemophilus influenzae (1.83 Mb), by the whole genome shotgun approach

Sanger & Coulson (1977) - chain-termination sequencing using dideoxynucleotide analogues

Maxam & Gilbert (1977) - chemical degradation DNA sequencing: terminally labeled DNA fragments were chemically cleaved at specific bases and separated by gel electrophoresis

Page 3: Whole genome sequencing of bacteria & analysis

Figure: genome sequencing status distribution, Genomes OnLine Database (GOLD)

http://www.genomesonline.org/cgi-bin/GOLD/sequencing_status_distribution.cgi

Page 4: Whole genome sequencing of bacteria & analysis

ARCHON X PRIZE

The X PRIZE Foundation (Santa Monica, CA) introduced the Archon X PRIZE for Genomics, which will award $10 million to the first team that can design a system capable of sequencing 100 human genomes in 10 days

Page 5: Whole genome sequencing of bacteria & analysis

SEQUENCING TECHNOLOGY

First generation:

Sanger's dideoxy chain-termination technique

Maxam & Gilbert chemical degradation technique

Second generation HT-NGS (next generation sequencing, NGS) - sequencing after amplification:

454/Roche - pyrosequencing

Illumina/Solexa - reversible dye terminators

SOLiD/ABI - sequential ligation of oligonucleotide probes

Page 6: Whole genome sequencing of bacteria & analysis
Page 7: Whole genome sequencing of bacteria & analysis

Third generation HT-NGS - single molecule sequencing:

HeliScope

SMRT (Pacific Biosciences)

Single molecule real time (RNAP) sequencer

Nanopore DNA sequencer

Ion Torrent sequencing technology (PostLight)

VisiGen Biotechnologies - FRET

Advantages of 3rd generation HT-NGS over 2nd:

Higher throughput

Faster turnaround time

Longer read lengths

Higher consensus accuracy

Small amounts of starting material

Low cost

Page 8: Whole genome sequencing of bacteria & analysis
Page 9: Whole genome sequencing of bacteria & analysis

ADVANTAGES OF HT-NGS

Massive parallel sequencing of hundreds of thousands or millions of templates

Preliminary and tedious cloning work is eliminated and substituted by PCR amplification

In the most recent technologies even PCR is eliminated, because single DNA molecules are sequenced directly

Economical

Reduced time

Page 10: Whole genome sequencing of bacteria & analysis

DISADVANTAGES OF HT-NGS

Most NGSTs produce short reads

Construction of fragment libraries remains tricky and involves several steps of fragmentation, adaptor ligation and PCR amplification

Homopolymer stretches are a source of errors with the 454 technology

Modified nucleotides cause mis-incorporation, or block further incorporation if the fluorescent moiety cannot be completely removed

Assembly of short reads into longer sequences is difficult

Page 11: Whole genome sequencing of bacteria & analysis
Page 12: Whole genome sequencing of bacteria & analysis

Illumina/Solexa technology

Page 13: Whole genome sequencing of bacteria & analysis
Page 14: Whole genome sequencing of bacteria & analysis
Page 15: Whole genome sequencing of bacteria & analysis
Page 16: Whole genome sequencing of bacteria & analysis

Zero-mode waveguides (ZMWs)

Page 17: Whole genome sequencing of bacteria & analysis

Selection of a technology for an experiment

Page 18: Whole genome sequencing of bacteria & analysis

GENOME ASSEMBLY

Assemblers join sequences together based on overlapping regions between the sequences

An assembly is composed of contigs and scaffolds

Contigs - contiguous consensus sequences that are derived from collections of overlapping reads

Scaffolds - ordered and orientated sets of contigs that are linked to one another by mate pairs of sequencing reads

N50 - basic statistic for describing the contiguity of a genome assembly: the contig length such that contigs of this length or longer contain at least half of the total assembled bases. The larger the N50, the more contiguous (and generally the better) the assembly
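The N50 statistic described above can be sketched in a few lines. This is a minimal illustration using the common definition of N50; the function name `n50` and the contig lengths are made up for the example and do not come from the slides.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, given a list of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (in kb) for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))  # 70: the two longest contigs already cover half of 290 kb
```

Because N50 depends only on the length distribution, it says nothing about whether contigs are correctly joined, so it is usually read alongside other quality checks.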

Page 19: Whole genome sequencing of bacteria & analysis

Alignment against a reference genome sequence

De novo assembly - construction of longer sequences, such as contigs or genomes, from shorter sequences, such as sequence reads, without prior knowledge of the order of the reads or reference to a closely related sequence
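As a rough illustration of how reads can be joined by their overlaps without a reference, the sketch below repeatedly merges the pair of reads with the longest suffix-prefix overlap. It is a toy greedy scheme, not the algorithm of any particular assembler; the function names, the `min_len` threshold and the reads are assumptions made for the example, and real assemblers additionally handle sequencing errors, reverse complements and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlap remains."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads do not overlap: they stay separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical error-free reads covering one short sequence.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(toy_reads))  # ['ATGGCGTACCTGA']
```

Reads that share no overlap of at least `min_len` bases are returned as separate sequences, which mirrors how an assembly ends up as several contigs rather than one chromosome.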

Page 20: Whole genome sequencing of bacteria & analysis

GENE PREDICTION

Ab initio gene prediction - uses mathematical models rather than external evidence (such as EST and protein alignments) to identify genes and to determine their intron–exon structures

Evidence-driven gene prediction - uses external evidence such as ESTs, which can identify exon boundaries unambiguously. It has great potential to improve the quality of gene prediction in newly sequenced genomes, but the ESTs and proteins must first be aligned to the genome

Commonly used tools for gene prediction in prokaryotes: Glimmer, GeneMark
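The simplest sequence-only starting point for finding genes is to scan all six reading frames for open reading frames (ATG to stop codon). The sketch below does only that; it is not how Glimmer or GeneMark work (those tools use statistical models of coding sequence), and the function names, the `min_codons` cutoff and the test sequence are assumptions for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """List ORFs as (strand, start, end, sequence) tuples.

    Coordinates are 0-based, end-exclusive, and for the "-" strand they
    refer to the reverse-complemented sequence. min_codons counts the
    start and stop codons as well.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == START:
                    start = pos  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (pos + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3, s[start:pos + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence containing one short ORF on the forward strand.
print(find_orfs("CCATGAAATTTGGGTAACC"))
# [('+', 2, 17, 'ATGAAATTTGGGTAA')]
```

In practice a much larger minimum length is used, since short ORFs occur frequently by chance.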

Page 21: Whole genome sequencing of bacteria & analysis

GENOME ANNOTATION

Is the extraction of biological knowledge from raw nucleotide sequences

Seeks to identify every potential protein coding gene (ORFs)

Predicted proteins are compared against available databases using tools such as BLASTP

‘Structural’ genome annotation is the process of identifying genes and their intron–exon structures

‘Functional’ genome annotation is the process of attaching meta-data such as gene ontology terms to structural annotations

Page 22: Whole genome sequencing of bacteria & analysis
Page 23: Whole genome sequencing of bacteria & analysis
Page 24: Whole genome sequencing of bacteria & analysis

APPLICATIONS

The very large number of short reads helps to identify single nucleotide polymorphisms (SNPs) when the reads are compared with a reference genome

Identification of rearrangements, deletions, insertions, inversions

Used to generate expressed sequence tags (ESTs) from RNA sequencing

Also used to detect small regulatory RNAs

Illumina technology - ChIP-Seq to study protein-DNA interactions

Metagenomics
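To illustrate how many short reads support a SNP call against a reference, the sketch below piles up already-aligned, ungapped reads and reports positions where a non-reference base dominates. The function name, the `(start, sequence)` read format, and the `min_depth` and `min_fraction` thresholds are all assumptions for this example; real variant callers also model base quality and mapping quality.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Return SNPs as (position, ref_base, alt_base, depth) tuples.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base of an ungapped read.
    """
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


# Hypothetical reference and reads; every read carries a T at position 4.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (1, "CGTTCGTA")]
print(call_snps(reference, reads))  # [(4, 'A', 'T', 3)]
```

The depth requirement is the reason high coverage matters: a mismatch seen in a single read is more likely a sequencing error than a true polymorphism.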

Page 25: Whole genome sequencing of bacteria & analysis

LEADS TO DEVELOPMENT

Functional genomics

Comparative genomics

Environmental genomics (Metagenomics)

Page 26: Whole genome sequencing of bacteria & analysis

FUNCTIONAL GENOMICS

Reveals genome structure and its functional relation

Orthologs are homologs derived from a common ancestor that diverged because of divergence of the organisms; they tend to have similar functions

Paralogs are homologs produced by gene duplication: genes derived from a common ancestral gene that duplicated within an organism and then diverged; they tend to have different functions

Xenologs are homologs resulting from the horizontal transfer of a gene between two organisms. The function of xenologs can be variable, depending on how significant the change in context was for the horizontally moving gene. In general, though, the function tends to be similar

Page 27: Whole genome sequencing of bacteria & analysis

PHYLOGENETIC ANALYSIS

Phylogenetic trees are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species

Figure: rooted phylogenetic tree with terminal nodes A, B, C, D and E, labeled with branches (lineages), internal nodes (divergence points), and the ancestral node (root) of the tree
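The parts of a rooted tree (root, internal nodes, branches, terminal nodes) map naturally onto a nested data structure. The sketch below stores a tree as nested tuples and writes it in Newick format; the topology shown for terminal nodes A to E is hypothetical, since the slide's figure is not recoverable, and the function names are illustrative.

```python
def to_newick(node):
    """Serialize a nested-tuple tree: strings are terminal nodes,
    tuples are internal nodes (divergence points)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def terminal_nodes(node):
    """Return the terminal node labels in left-to-right order."""
    if isinstance(node, str):
        return [node]
    labels = []
    for child in node:
        labels.extend(terminal_nodes(child))
    return labels


def count_internal_nodes(node):
    """Count divergence points, including the root."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_internal_nodes(child) for child in node)


# Hypothetical rooted topology; the outermost tuple is the root.
tree = ((("A", "B"), "C"), ("D", "E"))
print(to_newick(tree) + ";")       # (((A,B),C),(D,E));
print(terminal_nodes(tree))        # ['A', 'B', 'C', 'D', 'E']
print(count_internal_nodes(tree))  # 4
```

A fully bifurcating rooted tree with n terminal nodes always has n - 1 internal nodes, which is why five taxa give four divergence points here.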

Page 28: Whole genome sequencing of bacteria & analysis

COMPARATIVE GENOMICS

Comparison of genome sequences reveals much information about genome structure and evolution, including importance of lateral gene transfer

A tool to discover how microbes have adapted to a particular ecology, and an aid in the development of new therapeutic agents

Page 29: Whole genome sequencing of bacteria & analysis

METAGENOMICS

Genomics-based study of genetic material recovered directly from environmentally derived samples without laboratory culture; the recovered sequences are compared with all previously sequenced genes

Reveals how microbes adapt to extreme environments, which helps to discover new metabolic pathways and protective mechanisms

Page 30: Whole genome sequencing of bacteria & analysis

IMPACT OF GENOME SEQUENCING

Revealed genome reduction in intracellular (I/C) bacteria

Genome plasticity (rearrangements, mobile elements)

Gene duplication and diversification of protein function

Lateral gene transfer & acquisition of new functions

Adaptation to environments, virulence

Industrial processes - fermentation technology, bioremediation, biotransformation

Development of vaccines

Bacterial diversity

Synthetic biology

Epigenetics

Page 31: Whole genome sequencing of bacteria & analysis

REVERSE VACCINOLOGY

Use of genomic sequence information to identify novel and better suited protein candidates for vaccines

Serogroup B Neisseria meningitidis - genomic data were used to predict all proteins that are surface exposed and therefore accessible to antibodies

Streptococcus agalactiae - suitable candidates selected after sequencing various strains

Pan-genome - composed of the core genome (the genes present in all sequenced strains) and the dispensable genome (genes present in only a subset of strains)
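The core and dispensable genome definitions above reduce to set operations once each strain is represented by its set of gene families. The sketch below assumes that gene clustering across strains has already been done; the strain names, gene labels and function name are hypothetical.

```python
def pan_genome(strain_genes):
    """Split a pan-genome into core and dispensable parts.

    strain_genes maps a strain name to an iterable of gene identifiers.
    Returns (pan, core, dispensable) as sets.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)                                # genes seen in any strain
    core = set.intersection(*gene_sets) if gene_sets else set()  # genes in every strain
    dispensable = pan - core                                     # genes in only some strains
    return pan, core, dispensable


# Hypothetical gene content of three strains.
strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB", "geneC", "geneE"},
}
pan, core, dispensable = pan_genome(strains)
print(sorted(core))         # ['geneA', 'geneB']
print(sorted(dispensable))  # ['geneC', 'geneD', 'geneE']
```

Adding another sequenced strain can only shrink or preserve the core set and only grow or preserve the pan set, which is why candidates are chosen after sequencing several strains.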

Page 32: Whole genome sequencing of bacteria & analysis

Synthetic biology - from the sequence of an entire genome to the de novo synthesis of genes

Identification of the minimal genome, the smallest set of genes that enables life - Mycoplasma genitalium

Page 33: Whole genome sequencing of bacteria & analysis

DATABASES AND TOOLS RELATED TO BACTERIAL GENOMIC DATA

NCBI Entrez Genome Project database: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj
A searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms

NCBI, Bacteria Genome Database: http://www.ncbi.nlm.nih.gov/genomes/static/eub.html
The Genome database provides views for a variety of genomes, complete chromosomes, sequence maps with contigs, and integrated genetic and physical maps

Bacterial Genomes at The Sanger Institute: http://www.sanger.ac.uk/Projects/Microbes/
This website contains a list of funded, on-going, or completed projects of pathogens sequenced at this institute

TIGR Comprehensive Microbial Resource (CMR): http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi
A free website displaying information on all the publicly available, complete prokaryotic genomes

Page 34: Whole genome sequencing of bacteria & analysis

GOLD: Genomes OnLine Database: http://www.genomesonline.org/
A genome database containing information about which genomes have been sequenced or are in progress

Microbial Genome Database for Comparative Analysis (MBGD): http://mbgd.genome.ad.jp/
A database for comparative analysis of completely sequenced microbial genomes

Virulence Factors of Bacterial Pathogens (VFDB): http://zdsys.chgb.org.cn/VFs/main.htm
VFDB is an integrated and comprehensive database of virulence factors for bacterial pathogens

Genome Information Broker: http://gib.genes.nig.ac.jp/
A comprehensive data repository of complete microbial genomes in the public domain. Many microbial genomes can be explored graphically

Islander, a Database of Genomic Islands: http://www.indiana.edu/~islander
This database contains genomic islands discovered in completely sequenced bacterial genomes

Page 35: Whole genome sequencing of bacteria & analysis

GenoList genome browser at Institut Pasteur: http://genolist.pasteur.fr/
Provides access to diverse genome browsers of pathogenic bacteria

IslandPath: http://www.pathogenomics.sfu.ca/islandpath/update/IPindex.pl
An aid to the identification of genomic islands, including pathogenicity islands, of potentially horizontally transferred genes

HGT-DB: http://www.tinet.org/~debb/HGT/
A database containing the prediction of horizontally transferred genes in several prokaryotic complete genomes

E. coli genome project: http://www.genome.wisc.edu
A site devoted to the E. coli genome project with an updated annotation of the genome

Page 36: Whole genome sequencing of bacteria & analysis

Thank you