when biology meets computer science
Introduction
Tools
Research
Methods
When Biology Meets Computer Science 1 / 28
Patrick E. Meyer Geeks Anonymes - Oct 2016
When Biology Meets Computer Science
Patrick E. Meyer
Bioinformatics and Systems Biology (BioSys) Lab, PhytoSystems Group, Université de Liège (ULg, Belgium)
Geeks Anonymes - Oct 2016
A bit of context
Thousands of years ago: domestication of animals and plants
Decades ago: "artificial natural selection"
Right now: targeted genetic modification...
The future of biology: GMO?
Sectors impacted so far: food and medicine (dangers?)
However, other industries could benefit from targeted modifications without much risk:
bioleaching (e.g. Codelco with 80% of Cu market)
biofuel (Arthrospira, Chlamydomonas, ...)
phytoremediation (water decontamination, soil depollution, ...)
Circuit
Solution for a better future: mutate one gene and create a perfect cell factory!?
Result: the mutant dies in a few hours.
Reason: genes have multiple roles; changing one alters many functions.
→ The cell is a circuit of interconnected elements (proteins/genes/metabolites/RNA/DNA/...)!
Study the circuit
New solutions: identify
the gene least involved in other functions.
a gene that can compensate for the first one.
the environmental conditions suited to the mutant.
Plant Cell and Motherboard
Could it be similar??
DNA vs Machine binary code
Could it be similar??
quaternary vs binary
Virus vs Computer virus
Could it be similar??
Virus: a self-replicating piece of DNA (needs cellular machinery to reproduce).
It infects cells in contact...
Computer virus: a self-replicating piece of binary code (needs computer architecture to reproduce).
It infects connected computers...
Cell circuitry
These similarities in circuitry and code show that biologists might actually be geeks! (Even though most are in denial.)
If a cell can be seen as a computer, how efficient is it?
Information Content in Human genome
3 billion base pairs
A:00, C:01, G:11, T:10, i.e. 2 bits per base: 6 billion bits
6000/8 = 750: 750 million octets (or bytes)
about 1 CD-ROM (750 MB)
Humans have 37 tera (10^12) cells... evolving for 80 years.
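The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the 2-bit code is the one given on the slide, while the function names `encode` and `genome_size_bytes` and the example sequence are illustrative assumptions.

```python
# 2-bit code taken from the slide: A:00, C:01, G:11, T:10
CODE = {"A": "00", "C": "01", "G": "11", "T": "10"}


def encode(seq: str) -> str:
    """Translate a DNA string into its 2-bit-per-base binary string."""
    return "".join(CODE[base] for base in seq.upper())


def genome_size_bytes(n_bases: int, bits_per_base: int = 2) -> float:
    """Raw storage needed for n_bases, ignoring any compression."""
    return n_bases * bits_per_base / 8


# Hypothetical short sequence, only to show the encoding.
print(encode("GATTACA"))

# 3 billion base pairs, as on the slide, expressed in megabytes.
print(genome_size_bytes(3_000_000_000) / 1e6, "MB")
```

The count covers one copy of the sequence at 2 bits per base; it says nothing about how the cell actually uses that information.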
Information Content in Blu-rays
For perspective, 1 Blu-ray holds 40 GB (about 60 CD-ROMs).
It codes for 1920 × 1080 ≈ 2·10^6: 2M (mega) pic-cells (pixels)... evolving for 180 min.
How can 60 times more memory code for 20M times fewer cells... evolving for roughly 230k times less time??
Super efficient!!
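The orders of magnitude in this comparison can be recomputed directly. The sketch below uses only the figures quoted on the slides (750 MB, 40 GB, 37·10^12 cells, 1920 × 1080 pixels, 80 years, 180 min). The variable names are illustrative, and a year is assumed to be 365 days.

```python
CD_BYTES = 750e6        # genome estimate from the previous slide
BLURAY_BYTES = 40e9     # Blu-ray capacity quoted on the slide
HUMAN_CELLS = 37e12     # 37 tera cells
PIXELS = 1920 * 1080    # "pic-cells" of one frame
LIFETIME_MIN = 80 * 365 * 24 * 60  # 80 years in minutes (365-day years assumed)
MOVIE_MIN = 180

memory_ratio = BLURAY_BYTES / CD_BYTES   # how much more memory the Blu-ray has
cell_ratio = HUMAN_CELLS / PIXELS        # how many more cells than pixels
time_ratio = LIFETIME_MIN / MOVIE_MIN    # how much longer a life is than the film

print(f"memory: x{memory_ratio:.0f}")
print(f"cells:  x{cell_ratio:.2e}")
print(f"time:   x{time_ratio:.0f}")
```

With these inputs the memory ratio comes out a little above 50, the cell ratio a little under 20 million, and the time ratio a little over 230,000, so the slide's figures are generous roundings of the same orders of magnitude.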
DNA computing (or biomolecular)
DNA hard disks: can store 400+ exabytes in 1 gram
DNA computing (1995):
fluorescence is used to measure whether a reaction has taken place (a chemical transistor).
many different DNA molecules can try many possibilities at once.
Examples (2011):
solving the assignment problem.
square roots of numbers up to 15 (using 130 unique DNA strands).
DNA computers
are smaller,
faster for some specialized tasks.
What is systems biology?
Systems biology is the computational and mathematical modeling of complex biological systems.
It focuses on the inference and analysis of large circuits (i.e. graphs representing a system of interconnected elements):
Gene regulatory networks
Protein networks
Metabolic networks
Environmental effect networks (e.g. drug-protein)
Meta-networks
A science that uses computers to reverse-engineer a more advanced kind of computer...
What is the future?
30 years from now, given a seed with
the DNA sequence,
the composition of the medium/organelles,
environmental parameters,
computers will be able to
grow the organism numerically... at accelerated speed.
predict, with high accuracy, the impact of environmental and molecular changes.
Numerical experiments (faster and less costly) will become common (as a precursor to lab experiments).
Local vs Global Approach to Biology
Mercator map, 16th century, made by hand and by boat.
One lifetime to produce.
Most accurate map of Africa for a century.
Satellite image.
One second to produce.
More informative and more accurate.
Satellites in Biology
Lots of manipulation (e.g., PCR).
Expression of a few genes.
Robot for RNA extraction.
High-throughput sequencing of RNA.
Data
Sequencing:
HGP (2003, 15 years, $3 billion)
NGS (2010, weeks, $5000)
Nanopore technology (2017, 100 h, $100)
New generations of sequencing techniques (NGS)
Lower cost, shorter time
Precision increases
→ Data flooding!
Data flooding
Comparative genomics
Study sequences
connect sequences with phenotypes
Study evolution... phylogeny
Homologous genes (depending on the method of calculation), roughly:
99.5% with other humans
97% with some primates
85% with cats
80% with cows
75% with mice
60% with fruit flies
Transcriptomics
DNA → RNA → protein
Expression data provide concentrations of RNA (gene activities) in a cell
High noise
Many variables (from 2000 to 20000 RNAs)
Few samples (from 50 to 500 measures)
Redundancy (co-regulations)
Non-linear interactions
→ scan 0.5 0.9 0.9 ... 0.5
Gene Signatures and Machine Learning
Find, in the space of 2^n combinations of n genes, the one that best explains the phenotype/disease under study.
Search method (e.g. genetic algorithms, ant colony optimization algorithms)
Evaluate combinations (e.g. neural networks)
Interests: knowledge representation, diagnostics, prognostics, drug discovery
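The search/evaluate split described above can be illustrated on a toy problem. The sketch below is not the method from the talk. It swaps the genetic or ant colony search for an exhaustive enumeration (feasible only because n = 4 here), and it swaps the neural network for a deliberately simple evaluator: the absolute Pearson correlation between the mean expression of the subset and the phenotype, minus a small size penalty. The data, gene names, and the `penalty` value are all made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def score(subset, expr, phenotype, penalty=0.01):
    """Stand-in evaluator: fit of the subset's mean profile, minus a size penalty."""
    n_samples = len(phenotype)
    profile = [sum(expr[g][s] for g in subset) / len(subset)
               for s in range(n_samples)]
    return abs(pearson(profile, phenotype)) - penalty * len(subset)


def best_signature(expr, phenotype):
    """Enumerate all 2^n - 1 non-empty gene subsets and keep the best one."""
    genes = sorted(expr)
    candidates = [c for k in range(1, len(genes) + 1)
                  for c in combinations(genes, k)]
    best = max(candidates, key=lambda c: score(c, expr, phenotype))
    return best, score(best, expr, phenotype), len(candidates)


# Toy expression data (4 genes x 4 samples); the phenotype equals g0 + g1.
expr = {
    "g0": [0, 0, 1, 1],
    "g1": [0, 1, 0, 1],
    "g2": [1, 0, 0, 1],
    "g3": [0, 1, 1, 1],
}
phenotype = [0, 1, 1, 2]

signature, value, n_tested = best_signature(expr, phenotype)
print(signature, round(value, 3), n_tested)
```

With n in the thousands, the 2^n candidates cannot be enumerated, which is why a heuristic search method is needed in practice.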
Transcriptional Network
DNA → RNA → protein
A protein (transcription factor, TF) can bind to DNA and modify RNA production
⇒ Each cell has a network encoded in its DNA.
Each node is a gene.
An arc connects a regulator gene (TF) to a regulated one.
Co-expression networks
Definition (Mutual Information [Shannon, 1948])
Let Xi and Xj be two (discrete) random variables. The mutual information between Xi and Xj is
\[
I(X_i; X_j) = \sum_{x_i \in \mathcal{X}_i} \sum_{x_j \in \mathcal{X}_j} p(x_i, x_j) \log \left( \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)} \right)
\]
Compute the MIM (mutual information matrix): MI (or correlation) for all pairs of genes
DATA   X1    X2    ...   Xn
s_1    0.1   0.9   ...   0.5
...    ...   ...   ...   ...
s_m    0.2   0.3   ...   0.8
⇒
T̂      X1    X2    ...   Xn
X1     -     0.2   ...   0.9
X2     0.2   -     ...   0.1
...    ...   ...   -     ...
Xn     0.9   0.1   ...   -
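As a rough illustration of how the MIM is filled in, the sketch below estimates mutual information with the plain empirical (plug-in) estimator on data that has already been discretized, then computes it for every pair of genes. The function names, the dictionary layout, and the toy values are assumptions made for this example; it is not the code of any particular package. Values are in nats (natural logarithm).

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete samples."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with probabilities as counts / n
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def build_mim(data):
    """Symmetric matrix of pairwise MI, stored as a dict keyed by gene pairs."""
    mim = {}
    for gi, gj in combinations(sorted(data), 2):
        mi = mutual_information(data[gi], data[gj])
        mim[(gi, gj)] = mim[(gj, gi)] = mi
    return mim


# Toy discretized expression levels: one list of samples per gene.
data = {
    "X1": [0, 0, 1, 1],
    "X2": [0, 0, 1, 1],   # identical to X1: maximal dependence
    "X3": [0, 1, 0, 1],   # independent of X1 in this sample
}
mim = build_mim(data)
for pair in [("X1", "X2"), ("X1", "X3")]:
    print(pair, round(mim[pair], 3))
```

On real expression data the continuous measurements must be discretized first, and with few samples the plug-in estimator is biased, which is why several entropy estimators exist.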
RELNET: False Positive Trends
The MIM is the inferred network
False positive trends: assume Xi and Xj are related only through Xk:
Xi ← Xk → Xj
Then I(Xi;Xk) and I(Xk;Xj) will be high, but so will I(Xi;Xj), which adds a false link between Xi and Xj.
ARACNE
Algorithm for the Reconstruction of Accurate Cellular NEtworks
There are three cases of indirect interaction with three variables:
Xj → Xk → Xi
Xj ← Xk → Xi
Xj → Xk ← Xi
Whatever the case, I(Xj;Xi) ≤ I(Xj;Xk) and I(Xj;Xi) ≤ I(Xk;Xi) by the data processing inequality
For all triples of genes, remove the weakest link among them
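The pruning rule can be sketched in a few lines. This is a minimal illustration of the idea, not the reference implementation. The matrix layout (a dict keyed by alphabetically sorted gene pairs), the tolerance `eps`, and the MI values are assumptions made for this example. Edges are first marked over all triples and only then removed, so the outcome does not depend on the order in which triples are visited.

```python
from itertools import combinations


def aracne(mim, genes, eps=0.0):
    """Drop, in every triple of genes, the edge with the lowest MI.

    mim maps (a, b) with a < b (alphabetical order) to an MI value.
    An edge is only dropped if it is weaker than both others by more than eps.
    """
    to_remove = set()
    for a, b, c in combinations(sorted(genes), 3):
        edges = [(a, b), (a, c), (b, c)]
        weakest = min(edges, key=lambda e: mim[e])
        others = [mim[e] for e in edges if e != weakest]
        if mim[weakest] < min(others) - eps:
            to_remove.add(weakest)
    return {e: w for e, w in mim.items() if e not in to_remove}


# Hypothetical MI values for Xi <- Xk -> Xj: the Xi-Xj link is indirect.
mim = {("Xi", "Xj"): 0.5, ("Xi", "Xk"): 0.9, ("Xj", "Xk"): 0.8}
network = aracne(mim, ["Xi", "Xj", "Xk"])
print(sorted(network))
```

The loop over all triples is what makes the procedure cubic in the number of genes.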
ARACNE: False Negative Trends
ARACNE is O(n^3)
False negative trends: assume a triple interaction in which all three links are real:
Xj → Xk, Xj → Xi, Xk → Xi
The algorithm will remove a true link.
Minet and Infotheo R/Bioconductor package
two discretizations (equal frequency, equal width)
four fast entropy estimators (sg, empirical, mm, shrink)
four fast network inference methods (relnet, clr, aracne, mrnet)
three validation tools (F-scores, ROC, PR)
modular package (other MIM, Rgraphviz)
http://www.biosys.ulg.ac.be
http://homepage.meyerp.com
Thank you!
Questions?