Orthologs Detection and Applications
Marcus Lechner
Bioinformatics Leipzig
2009-10-23
Marcus Lechner (Bioinformatics Leipzig) Orthologs Detection and Applications 2009-10-23 1 / 25
Table of contents
1 Background on homology
2 Proteinortho
3 Domain wide commons
4 Annotation pipeline
5 References
Definitions
Homologous genes
have derived from a common ancestor
Orthology
evolved by speciation
thought to have a similar function
Paralogy
homologous genes within the same species
thought to have a related function (neo-/subfunctionalization)
out-paralogs arose from a duplication preceding a speciation
in-paralogs evolved by duplication subsequent to speciation
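The definitions above can be made concrete on a toy gene tree. The following is a minimal sketch, assuming a tree whose inner nodes are already labeled as speciation or duplication events; the `Node` class, the helper names, and the example tree are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison, avoids recursing through parent links
class Node:
    name: str
    event: str = "leaf"  # "speciation", "duplication" or "leaf"
    species: Optional[str] = None
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None


def make(name, event, *children, species=None):
    """Create a node and attach the given children to it."""
    node = Node(name, event, species, list(children))
    for child in children:
        child.parent = node
    return node


def ancestors(node):
    """Return the node itself followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def relation(a, b, reference_speciation):
    """Classify two genes relative to one speciation node of the gene tree."""
    lca = last_common_ancestor(a, b)
    if lca.event == "speciation":
        return "orthologs"
    # The genes diverged by duplication: after the reference speciation
    # they are in-paralogs, before it they are out-paralogs.
    if reference_speciation in ancestors(lca):
        return "in-paralogs"
    return "out-paralogs"


# Toy tree: an old duplication D1, then a speciation S1 (species A vs. B),
# then a recent duplication D2 within species B.
a1 = make("a1", "leaf", species="A")
b1 = make("b1", "leaf", species="B")
b2 = make("b2", "leaf", species="B")
a2 = make("a2", "leaf", species="A")
d2 = make("D2", "duplication", b1, b2)
s1 = make("S1", "speciation", a1, d2)
d1 = make("D1", "duplication", s1, a2)
```

Here `relation(a1, b1, s1)` gives orthologs, `relation(b1, b2, s1)` gives in-paralogs, and `relation(a1, a2, s1)` gives out-paralogs. Choosing a different reference speciation can change the in/out label.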
Example
Figure: Illustration of relationships: Three species with orthologs, xeno-, in- and out-paralogs. Adapted from [1].
Problems
Interpretation
original definition of homology (1843): 'the same organ under every variety of form and function' [2]
still a very good quantitative indication
but neither essential nor sufficient
Homology of two proteins is not equivalent to a common function, sequence, or structure!
Problems
Relative definition
the in-/out-paralog distinction is defined only relative to a particular species
greatly dependent on available data
no absolute view
Problems
Figure: Illustration of relationships: Complete view needed.
Problems
Information benefit
duplications are known to be a major source of innovation in evolution
proteins are homologs by definition if they share a common ancestor
irrespective of their actual similarity or function
most proteins are anciently related but have evolved far
Problems
Figure: Multiple gene duplications: All are homologs by definition, but smaller groups may be more useful.
Up to which point is the homology information useful?
Conclusion
Proteinortho approach
arose from the same ancestor + similar function ⇒ similar sequence
should return a useful subset of homologs (aiming at isofunctional groups)
reciprocal best BLAST hit(s)
Reciprocal best BLAST hit(s) for homolog detection
Figure: Homology detection using blast
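The reciprocal best hit idea can be sketched in a few lines. This is a simplified illustration, not Proteinortho's implementation: it assumes the BLAST results of two species are already available as `(query, subject, bitscore)` tuples, and it keeps only the single best hit per query, whereas the "(s)" in the slide title suggests that several near-best hits may be retained. All function names and protein identifiers are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up scores for two species A and B.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90)]
b_vs_a = [("b1", "a1", 195), ("b2", "a1", 140), ("b2", "a2", 80)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this example only `("a1", "b1")` survives: `a2` picks `b2`, but `b2` prefers `a1`, so that hit is not reciprocal.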
Proteinortho
Features
ortholog and paralog assignment for proteins / protein-coding genes
designed for large-scale application
modest memory consumption
capable of distributed computing
Workflow
Figure: Proteinortho workflow: 1) Reciprocal blasts 2) Transformation into graph representation 3) Coloring and decomposition 4) Reconversion and mapping to species with encoded proteins
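Steps 2 and 4 of the workflow can be illustrated with a small sketch. It assumes the reciprocal hits are given as protein pairs and treats each connected component of the resulting graph as one candidate group; the coloring and decomposition of step 3 is not reproduced here. The function names, protein identifiers, and the `species_of` mapping are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group proteins that are linked directly or indirectly by edges."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search from the start protein
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


def by_species(component, species_of):
    """Map a component back to a table of species -> encoded proteins."""
    table = defaultdict(list)
    for protein in component:
        table[species_of[protein]].append(protein)
    return dict(table)


# Made-up reciprocal hits between proteins of species A, B and C.
edges = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
groups = connected_components(edges)
```

With this input, `groups` contains one component spanning all three species and a second one covering only A and B.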
Distributed computing
Figure: a) Multiple PCs running Proteinortho, cooperating dynamically using an N-way technique b) Workflow of synchronization
Challenge
Application to all bacteria available on NCBI
710 species, 1.5 million proteins
took about two weeks on 50 CPU cores (Intel Xeon 2.33 GHz)
peak of only 2.5 GB RAM, but 300 GB hard disk
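To put these figures in perspective, a rough back-of-the-envelope calculation may help. It assumes that "about two weeks" means 14 full days on all 50 cores and that the work is dominated by comparing every pair of species once; both are assumptions for illustration, not statements from the slides.

```python
from math import comb

species = 710
proteins = 1_500_000
cores = 50
days = 14  # assumption: "about two weeks" taken as 14 full days

species_pairs = comb(species, 2)           # unordered pairs of species
core_hours = cores * days * 24             # total compute budget
avg_proteins_per_species = proteins / species
core_seconds_per_pair = core_hours * 3600 / species_pairs

print(f"{species_pairs} species pairs")
print(f"{core_hours} core-hours in total")
print(f"{avg_proteins_per_species:.0f} proteins per species on average")
print(f"{core_seconds_per_pair:.0f} core-seconds per species pair")
```

Under these assumptions the run amounts to 251,695 species pairs and 16,800 core-hours, or roughly four core-minutes per pair of species.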
Results
Figure: Number of common proteins (cumulative coverage overview; x-axis: # of species covered, y-axis: # of connected components; series: original, blasted, blasted filtered). Sets with over 5% paralogs were filtered.
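A cumulative coverage count with a paralog filter could be computed along the following lines. This is a sketch under assumptions: a group is a list of `(species, protein)` tuples, and the paralog fraction is taken to be the share of proteins beyond the first one per species. The slides do not state how the 5% was measured, so this definition and all names are hypothetical.

```python
from collections import Counter


def species_coverage(group):
    """Number of distinct species represented in a group."""
    return len({species for species, _ in group})


def paralog_fraction(group):
    """Share of proteins beyond the first one per species (assumed definition)."""
    if not group:
        return 0.0
    counts = Counter(species for species, _ in group)
    extra = sum(n - 1 for n in counts.values())
    return extra / len(group)


def cumulative_coverage(groups, thresholds, max_paralog_fraction=0.05):
    """Count, per threshold, the kept groups covering at least that many species."""
    kept = [g for g in groups if paralog_fraction(g) <= max_paralog_fraction]
    return {t: sum(species_coverage(g) >= t for g in kept) for t in thresholds}


# Made-up groups: g2 has two proteins from species A and is filtered out.
g1 = [("A", "a1"), ("B", "b1"), ("C", "c1")]
g2 = [("A", "a2"), ("A", "a3"), ("B", "b2")]
g3 = [("A", "a4"), ("B", "b3")]
filtered = cumulative_coverage([g1, g2, g3], thresholds=[2, 3])
unfiltered = cumulative_coverage([g1, g2, g3], [2, 3], max_paralog_fraction=1.0)
```

Comparing `filtered` with `unfiltered` mirrors the difference between the filtered and unfiltered series in the plot.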
Results
Common proteins
30S ribosomal proteins S2-5, S7, S8, S10-13, S17, S19
50S ribosomal proteins L1-3, L5, L6, L11, L14, L22, L23
tRNA synthetases for seryl, arginyl, phenylalanyl (alpha chain)
preprotein translocase, SecY subunit
peptidase M22, O-sialoglycoprotein endopeptidase
transcription elongation/termination factor NusA
Annotation pipeline
Application for annotation
in: newly sequenced bacterial genome
out: annotation of protein-coding genes, candidates for non-coding genes
no previous knowledge required
runs in 10 to 90 minutes
Relatives discovery
Figure: Relatives detection using reference proteins and tree.
Relatives discovery with colors
Figure: Advanced relatives detection using colors.
Seeding
Figure: Pipeline seeding with proteins.
Pipeline overview
Figure: Pipeline seeding with proteins.
The end
Thank you for listening!
References
[1] W. M. Fitch. Homology: a personal view on some of the problems. Trends Genet, 16(5):227–31, May 2000.
[2] Richard Owen, Cooper, and William White. Lectures on the comparative anatomy and physiology of the invertebrate animals. London: Longman, Brown, Green, and Longmans, 1843. http://www.biodiversitylibrary.org/bibliography/6788.