Interrelating Different Types of Genomic Data
From Proteome to Secretome: ‘Oming in on the Function
Objectives
Outline the goal of functional genomics
Describe the studies of H. Antelmann
Explain the “-Ome” and its purpose
Define the “-Omes”
Identify the different computational and experimental methods for defining the “-Omes”
Explain how “-Omes” are interrelated
Summarize the ultimate goal of genomics
The Goal of Genomics
To complement the genomic sequence by assigning useful biological information to every gene
To improve the understanding of how different biological molecules contained within the cell combine to make the organism possible
Additional Goals
To define the three-dimensional structures of the macromolecules, their sub-cellular localizations, intermolecular interactions, and expression levels
Haike Antelmann & Group
Institute for Microbiology and Molecular Biology at Ernst-Moritz-Arndt-Universität Greifswald, Greifswald, Germany
Previously, the group used computational methods to predict all exported proteins
They now aim to verify those predictions by experimentally characterizing the entire population of secreted proteins using 2D gel electrophoresis and mass spectrometry
They showed that about 50% of their original predictions were accurate
The new Lexicon of the “-Ome”
Antelmann coined the term “secretome”, one of a growing lexicon of “-omes” used to define the varied populations and subpopulations in the cell
-Omes can be divided into two categories:
Those that represent a population of molecules
Those that define their actions
-Omes
The first category provides an inventory or “parts list” of the molecules contained within an organism
Transcriptome: the population of mRNA transcripts in the cell, weighted by their expression levels
Glycome: the population of carbohydrate molecules in the cell
Secretome: the population of gene products that are secreted from the cell
Ribonome: the population of RNA-coding regions of the genome
-Omes Continued
Transportome: the population of the gene products that are transported; this includes the secretome
Functome: the population of gene products classified by their functions
Translatome: the population of proteins in the cell, weighted by their expression levels
Foldome: the population of gene products classified through their tertiary structure
-Omes Continued
The second category describes the actions of the protein products
Genome: the full complement of genetic information, both coding and noncoding, in the organism
Proteome: the protein-coding regions of the genome
Physiome: quantitative description of the physiological dynamics or functions of the whole organism
Metabolome: the quantitative complement of all the small molecules present in a cell in a specific physiological state
-Omes Continued
Morphome: the quantitative description of the anatomical structure and the biochemical and chemical composition of an intact organism, including its genome, proteome, cell, tissue and organ structures
Interactome: the list of interactions between all macromolecules in a cell
Orfeome: the sum total of open reading frames in the genome, without regard to whether or not they code; a subset of this is the proteome
Phenome: qualitative identification of the form and function derived from genes, but lacking a quantitative, integrative definition
-Omes Continued
Regulome: genome-wide regulatory network of a cell
Cellome: the entire complement of molecules and their interactions within a cell
Operome: the characterization of proteins with unknown biological function
Pseudome: the complement of pseudogenes in the proteome
Unknome: genes of unknown function
Computational Methods
Algorithmic methods for predicting genes, protein structure, interactions, or localization based on patterns in individual sequences or structures.
Annotation transfer through homology. (Inferring structure or function based on sequence and structural information of homologous proteins.)
“Guilt-by-Association” method based on clustering where functions or interactions are inferred from clusters of functional genomic data, such as expression information.
Annotation Transfer through Homology
In SWISS-PROT, as in most other sequence databases, two classes of data can be distinguished: the core data and the annotation. For each sequence entry the core data consists of the sequence data, the citation information (bibliographical references), and the taxonomic data (description of the biological source of the protein), while the annotation consists of the description of the following items:
Functions of the protein
Post-translational modifications, for example carbohydrates, phosphorylation, acetylation, GPI-anchor, etc.
Domains and sites, for example calcium binding regions, ATP-binding sites, zinc fingers, homeobox, kringle, etc.
Secondary structure
Quaternary structure
Similarities to other proteins
Diseases associated with deficiencies in the protein
Sequence conflicts, variants, etc.
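Annotation transfer can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the SWISS-PROT procedure: the `Entry` class, the `transfer_annotation` function, and the 40% identity cutoff are all hypothetical, and sequences are assumed to be pre-aligned and of equal length. Real pipelines would use a proper alignment tool.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical database entry: core data plus free-form annotation."""
    accession: str
    sequence: str
    annotation: dict = field(default_factory=dict)


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def transfer_annotation(query: str, annotated: list, min_identity: float = 40.0):
    """Copy the annotation of the most similar entry at or above the cutoff.

    The cutoff is an illustrative assumption, not a recommended value.
    Returns None when no entry is similar enough.
    """
    best, best_identity = None, min_identity
    for entry in annotated:
        if len(entry.sequence) != len(query):
            continue  # the toy identity measure needs equal lengths
        identity = percent_identity(query, entry.sequence)
        if identity >= best_identity:
            best, best_identity = entry, identity
    if best is None:
        return None
    return {"source": best.accession, "identity": best_identity, **best.annotation}


# Usage with made-up sequences and accessions
database = [
    Entry("P1", "MKTAYIAKQR", {"function": "hypothetical kinase"}),
    Entry("P2", "GGGGGGGGGG", {"function": "hypothetical structural protein"}),
]
result = transfer_annotation("MKTAYIAKQL", database)
```

Recording the source accession and the identity alongside the copied annotation keeps the inference traceable, which matters because transferred annotation is a prediction rather than an observation.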
Guilt-by-Association
A method for assessing gene function. Although not logically rigorous, it works in practice: genes already known to be related do, in fact, tend to cluster together based on their experimentally determined expression patterns. The approach is made more systematic and statistically sound by calculating the probability that the observed functional distribution of differentially expressed genes could have happened by chance.
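One common way to compute such a chance probability is a hypergeometric tail test; the slides do not say which statistic was used, so this choice is an assumption. The sketch below asks: if a cluster of genes were drawn at random from the genome, how likely is it to contain at least the observed number of genes from one functional category? The function name and all counts are illustrative.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, cluster: int, hits: int) -> float:
    """P(X >= hits) for a hypergeometric draw.

    population: total number of genes measured
    annotated:  genes in the population that belong to the functional category
    cluster:    number of genes in the expression cluster
    hits:       genes in the cluster that belong to the category
    """
    upper = min(annotated, cluster)
    total = comb(population, cluster)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Usage with made-up counts: 10 of 50 clustered genes share a category
# that covers 100 of 6000 genes overall.
p = enrichment_p_value(population=6000, annotated=100, cluster=50, hits=10)
```

A small p-value suggests the shared function is unlikely to be a coincidence of clustering. When many categories are tested at once, a multiple-testing correction would also be needed.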
Experimental Methods
The most prominent method is two-dimensional electrophoresis to isolate proteins, followed by mass spectrometry for protein identification.
Protein chip systems are capable of high-throughput screening of protein biochemical activity.
Transposon insertion methodology is also sometimes used.
2-D Electrophoresis
First introduced in 1975. Most commonly used method for protein separation in proteomics.
Proteins are first separated across a gel according to their isoelectric point, then separated in a perpendicular direction on the basis of their molecular weight.
Electrophoresis in which a second perpendicular electrophoretic transport is performed on the separate components resulting from the first electrophoresis. This technique is usually performed on polyacrylamide gels.
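The two separation steps can be pictured as assigning each protein a coordinate on the gel. The sketch below is a simplified model, not a simulation of a real gel: it assumes a linear pH gradient in the first dimension and the usual approximation that migration in the second dimension is roughly linear in the logarithm of molecular weight. The function name, the pH range, and the molecular weight range are hypothetical defaults.

```python
from math import log10


def gel_position(pi: float, mw_kda: float,
                 ph_range=(3.0, 10.0), mw_range=(10.0, 200.0)):
    """Return normalized (x, y) gel coordinates for one protein.

    x grows with the isoelectric point (first dimension).
    y grows as molecular weight falls, since small proteins migrate
    farther in the second dimension. Values outside the assumed ranges
    are not clamped and fall outside [0, 1].
    """
    ph_low, ph_high = ph_range
    x = (pi - ph_low) / (ph_high - ph_low)
    log_min, log_max = log10(mw_range[0]), log10(mw_range[1])
    y = (log_max - log10(mw_kda)) / (log_max - log_min)
    return x, y


# Usage: two made-up proteins with the same mass but different pI
# share a y coordinate and are resolved only along x.
spot_a = gel_position(pi=5.0, mw_kda=50.0)
spot_b = gel_position(pi=8.0, mw_kda=50.0)
```

The example shows why two dimensions help: proteins that would overlap in a size-only separation are pulled apart by their isoelectric points.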
Mass Spectrometry
In a typical approach, this technique for measuring and analyzing molecules involves introducing enough energy into a target molecule to cause its disintegration. The resulting fragments are then analyzed, based on their mass/charge ratios, to produce a "molecular fingerprint."
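The mass/charge arithmetic behind a fingerprint can be sketched in Python. Note the substitution: instead of the energy-induced fragmentation described above, this sketch uses peptide mass fingerprinting, in which a protein is cut by trypsin (after K or R, except before P) and the peptide masses form the fingerprint. The residue masses are standard monoisotopic values; the function names and the example sequence are illustrative assumptions.

```python
# Monoisotopic residue masses in daltons
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def mz(mass: float, charge: int) -> float:
    """Mass/charge ratio of a peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def tryptic_peptides(protein: str) -> list:
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def fingerprint(protein: str, charge: int = 1) -> list:
    """Sorted m/z values of the tryptic peptides: a toy fingerprint."""
    return sorted(round(mz(peptide_mass(p), charge), 4)
                  for p in tryptic_peptides(protein))


# Usage with a made-up sequence
peaks = fingerprint("AKRPGKMELTR")
```

Identification then amounts to comparing the measured peak list with fingerprints computed from database sequences, which is one place where the experimental and computational methods meet.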
Transposon Insertion Methodology
A composite transposon is a segment of DNA that contains insertion elements at either end but can contain just about anything in the middle (genes, markers, etc.). These transposons tend to be very large, and many of them arose when the inner two insertion elements of two smaller transposons stopped working, leaving only the two at the far ends functional, so that when the transposon moves, it takes everything between the two original transposons with it. Some composite transposons are used in genetics experiments; Tn5 and Tn10 are two such composite transposons, and both carry genes that encode resistance to certain antibiotics.
Noise
Functional genomics experiments give rise to very complicated data that are inherently hard to interpret
These data are often plagued with noise
Both factors can lead to inaccuracies and conflicting interpretations
Average or combine data to obtain more accurate results
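A minimal sketch of why averaging helps, assuming independent replicate measurements of the same quantity: the standard error of the mean shrinks with the square root of the number of replicates. The function name and the example values are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def combine_replicates(values: list) -> dict:
    """Summarize replicate measurements of one quantity.

    Returns the mean, the sample standard deviation, and the standard
    error of the mean (stdev / sqrt(n)), which falls as replicates are added.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate noise")
    spread = stdev(values)
    return {"mean": mean(values), "stdev": spread, "sem": spread / sqrt(n), "n": n}


# Usage: three made-up expression measurements for one gene
summary = combine_replicates([1.0, 2.0, 3.0])
```

Averaging only reduces random noise. A systematic bias shared by all replicates survives it, which is one reason to combine data from different kinds of experiments as well.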
Interrelating Different -Omes
A fundamental approach in genomics is to establish relationships between the different -omes
By piecing the individual -omes together, researchers hope to build a full and dynamic view of the complex processes that support the organism
Example: How do the proteome and regulome combine to produce the translatome?
Interrelating Different –Omes Cont.
Defining or assigning one ‘ome based on another
Comparing one ‘ome with another to better understand the processes that shift one population into its successor
Calculating “missing” information in one ‘ome based on information in another
Describing the intersection between multiple populations
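When each -ome is treated as a set of gene identifiers, several of these comparisons reduce to set operations. The sketch below compares a computationally predicted population with an experimentally observed one; the function name and the gene identifiers are made-up placeholders, not data from the Antelmann study.

```python
def compare_omes(predicted: set, observed: set) -> dict:
    """Intersect a predicted population with an observed one."""
    confirmed = predicted & observed
    return {
        "confirmed": confirmed,                 # predicted and observed
        "unconfirmed": predicted - observed,    # predicted only
        "unpredicted": observed - predicted,    # observed only
        "fraction_confirmed": len(confirmed) / len(predicted) if predicted else 0.0,
    }


# Usage with placeholder gene identifiers
predicted_secretome = {"g1", "g2", "g3", "g4"}
observed_secretome = {"g1", "g2", "g5"}
report = compare_omes(predicted_secretome, observed_secretome)
```

The "unpredicted" set is the kind of "missing" information that one -ome can supply for another: members found experimentally that the computational definition did not capture.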
Final Thoughts
Ultimate goal of genomics: the clarification of the functome, but there are many intermediate steps
By viewing the cell in terms of a list of distinct parts, definition of each part is possible. Determine and categorize functional information for each gene.
Computational and experimental techniques are valuable and complementary
Genomic approaches often yield inaccurate and noisy data that must be analyzed further to obtain reliable results
Sources
Cambridge Healthtech Institute; www.genomicglossaries.com
EMBL-EBI: European Bioinformatics Institute; www.ebi.ac.uk/swissprot/Publications/ismb97.html
Greenbaum, Dov; Luscombe, Nicholas; Jansen, Ronald; Qian, Jiang; Gerstein, Mark. “Interrelating Different Types of Genomic Data, from Proteome to Secretome: ‘Oming in on Function”
Apweiler, Rolf et al. “Protein Sequence Annotation in the Genome Era: The Annotation Concept of SWISS-PROT + TrEMBL.” Intelligent Systems in Molecular Biology, 1997