Page 1: When Biology Meets Computer Science

Introduction

Tools

Research

Methods

When Biology Meets Computer Science 1 / 28

Patrick E. Meyer Geeks Anonymes - Oct 2016

When Biology Meets Computer Science

Patrick E. Meyer

Bioinformatics and Systems Biology (BioSys) Lab, PhytoSystems Group, Université de Liège (ULg, Belgium)

Geeks Anonymes - Oct 2016

A bit of context

Thousands of years ago: domestication of animals and plants

Decades ago: "artificial natural selection"

Right now: targeted genetic modification...

The future of biology: GMO?

Sectors impacted so far: food and medicine (dangers?)

However, other industries could benefit from targeted modifications without much risk:

bioleaching (e.g., Codelco, with 80% of the Cu market)

biofuel (Arthrospira, Chlamydomonas, ...)

phytoremediation (water decontamination, soil depollution, ...)

Circuit

Solution for a better future: mutate one gene and create a perfect cell factory!?

Result: the mutant dies within a few hours.

Reason: genes have multiple roles; changing one alters many functions.

→ The cell is a circuit of interconnected elements (proteins/genes/metabolites/RNA/DNA/...)!

Study the circuit

New solutions: identify

the gene least involved in other functions,

a gene that can compensate for the first one,

the environmental conditions suited to the mutant.

Plant Cell and Motherboard

Could it be similar??

DNA vs Machine binary code

Could it be similar??

quaternary vs binary

Virus vs Computer virus

Could it be similar??

self-replicating piece of DNA (needs cellular machinery to reproduce)

It infects cells in contact...

self-replicating piece of binary code (needs computer architecture to reproduce)

It infects connected computers...

Cell circuitry

These similarities in circuitry and code show that biologists might actually be geeks! (even though most are in denial).

If a cell can be seen as a computer, how efficient is it?

Information Content in the Human Genome

3 billion base pairs

Encoding A:00, C:01, G:11, T:10 (2 bits per base): 6 billion bits

6,000 million bits / 8 = 750 million octets (bytes)

≈ 1 CD-ROM (750 MB)

Humans have 37 tera (10^12) cells... evolving for 80 years.
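The 2-bit arithmetic above can be made concrete with a small Python sketch that packs a DNA string four bases per byte using the slide's mapping. The packing layout (first base in the most significant bits, zero-padding of a partial last byte) and the function names are assumptions for illustration; real genome formats store sequences differently.

```python
# Two-bit codes per base, as given on the slide.
CODE = {"A": 0b00, "C": 0b01, "G": 0b11, "T": 0b10}

def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, 4 bases per byte (hypothetical layout)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # append 2 bits for this base
        byte <<= 2 * (4 - len(chunk))        # left-align a partial final chunk
        out.append(byte)
    return bytes(out)

def genome_size_bytes(n_base_pairs: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to a whole byte."""
    return (2 * n_base_pairs + 7) // 8

print(pack("ACGT").hex())                # 1e  (00 01 11 10)
print(genome_size_bytes(3_000_000_000))  # 750000000
```

Three billion base pairs at 2 bits each give the 750 million bytes quoted above.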

Information Content in Blu-rays

In perspective, 1 Blu-ray holds 40 GB (over 50 CD-ROMs). It codes for 1920 × 1080 ≈ 2·10^6: 2M (mega) picture cells (pixels)... evolving for 180 min.

How can over 50 times more memory code for about 20M times fewer cells... evolving for about 230k times less time??

Super efficient!!
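The ratios above can be rechecked with a few lines of Python. The inputs are the slide figures (750 MB per genome, 40 GB per Blu-ray, 37 trillion cells, a 1920 × 1080 frame, 80 years versus 180 minutes); counting a year as 365.25 days is an assumption of this sketch.

```python
GENOME_BYTES = 750e6                     # ~1 CD-ROM
BLURAY_BYTES = 40e9                      # one Blu-ray
HUMAN_CELLS = 37e12                      # 37 trillion cells
PIXELS = 1920 * 1080                     # ~2 million "picture cells"
HUMAN_MINUTES = 80 * 365.25 * 24 * 60    # 80 years, in minutes
MOVIE_MINUTES = 180

memory_ratio = BLURAY_BYTES / GENOME_BYTES   # how much more memory the Blu-ray has
cell_ratio = HUMAN_CELLS / PIXELS            # how many more cells than pixels
time_ratio = HUMAN_MINUTES / MOVIE_MINUTES   # how much longer a life than a film

print(f"memory x{memory_ratio:.1f}, cells x{cell_ratio:.3g}, time x{time_ratio:,.0f}")
```

The memory ratio comes out a little above 53, the cell ratio near 18 million and the time ratio near 234 thousand.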

DNA (or biomolecular) computing

DNA hard disks: can store 400+ exabytes in 1 gram

DNA computing (1995):

fluorescence to measure whether a reaction has taken place (a chemical transistor).

many different DNA molecules can try many possibilities at once.

Examples (2011):

solving the assignment problem.

computing the square root of numbers up to 15 (using 130 unique DNA strands).

DNA computers

are smaller,

faster for some specialized tasks.
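The "try many possibilities at once" idea amounts to generating every candidate solution and then filtering for the best one. The Python sketch below simulates that generate-and-filter principle sequentially for a small assignment problem. It is not how the DNA experiments were run: there each candidate would be a separate molecule tested in parallel, whereas here the n! candidates are enumerated one after another. The cost matrix is made up.

```python
from itertools import permutations

def brute_force_assignment(cost):
    """Try every one-to-one assignment of n workers to n tasks; keep the cheapest.

    Each permutation plays the role of one candidate "strand"; a conventional
    computer has to visit all n! of them in turn.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):           # generate all candidates
        total = sum(cost[w][perm[w]] for w in range(n))
        if total < best_cost:                      # filter: keep the best so far
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Hypothetical costs: cost[worker][task]
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
print(brute_force_assignment(cost))  # ((1, 0, 2), 5)
```

The factorial growth of the candidate set is what makes massive molecular parallelism attractive for such specialized tasks.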

What is systems biology?

Systems biology is the computational and mathematical modeling of complex biological systems.

It is focused on the inference and analysis of large circuits (i.e., graphs representing a system of interconnected elements):

Gene regulatory networks

Protein networks

Metabolic networks

Environmental effect networks (e.g., drug-protein)

Meta-networks

A science that uses computers to reverse-engineer a more advanced kind of computer...

What is the future?

30 years from now, given a seed with

the DNA sequence,

the composition of the medium/organelles,

environmental parameters,

computers will be able to

grow the organism numerically... at accelerated speed,

predict with high accuracy the impact of environmental and molecular changes.

Numerical experiments (faster and less costly) will become common (as a precursor to lab experiments).

Local vs Global Approach to Biology

Mercator map, 16th century, made by hand and by boat.

One lifetime to produce.

Most accurate map of Africa for a century.

Satellite image.

One second to produce.

More informative and more accurate.

Satellites in Biology

Lots of manipulation (e.g., PCR).

Expression of a few genes.

Robot for RNA extraction.

High-throughput sequencing of RNA.

Data

Sequencing:

Human Genome Project, HGP (2003, 15 years, $3 billion)

NGS (2010, weeks, $5,000)

Nanopore technology (2017, 100 h, $100)

Next-generation sequencing (NGS) techniques:

Lower cost, shorter time

Increasing precision

→ Data flooding!

Data flooding

Comparative genomics

Study sequences

Connect sequences with phenotypes

Study evolution... phylogeny

Homologous genes (depending on the method of calculation), roughly:

99.5% with other humans

97% with some primates

85% with cats

80% with cows

75% with mice

60% with fruit flies
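Because the percentages above depend on the method of calculation, it helps to see one simple measure. The Python sketch below computes percent identity between two sequences that are assumed to be already aligned (same length, '-' for gaps), skipping gapped columns. This gap convention is only one of several, and it is not necessarily how the figures above were obtained; the sequences are made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned, ungapped columns where both residues match.

    Assumes `a` and `b` are pre-aligned; columns with a '-' in either
    sequence are excluded from the count.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

Counting gaps as mismatches instead would lower the score, which is one reason published similarity figures vary.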

Transcriptomics

DNA → RNA → protein

Expression data provide the concentrations of RNA (gene activities) in a cell:

High noise

Many variables (from 2,000 to 20,000 RNAs)

Few samples (from 50 to 500 measurements)

Redundancy (co-regulation)

Non-linear interactions

[Figure: a scan converted into expression values, e.g. 0.5 0.9 0.9 ... 0.5]

Gene Signatures and Machine Learning

Find, in the space of the 2^n combinations of n genes, the one that best explains the phenotype/disease under study.

Search method (e.g., genetic algorithms, ant colony optimization algorithms)

Evaluation of combinations (e.g., neural networks)

Applications: knowledge representation, diagnostics, prognostics, drug discovery
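The search space grows as 2^n, so exhaustive search only works for a handful of genes; with thousands of genes, heuristic searches such as those listed become necessary. The Python sketch below shows the search-and-evaluate structure on a toy problem. It plainly swaps in exhaustive search over small subsets for the search method and a leave-one-out nearest-centroid classifier for the evaluation (instead of a genetic algorithm and a neural network); the data and function names are hypothetical.

```python
from itertools import combinations

def loo_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `genes`.

    Assumes every class has at least two samples, so a centroid always exists.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(n) if j != i and y[j] == label]
            centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]

        def dist(label):
            # squared Euclidean distance from held-out sample i to a centroid
            return sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroids[label]))

        if min(centroids, key=dist) == y[i]:
            correct += 1
    return correct / n

def best_signature(X, y, max_size):
    """Score every gene subset of size 1..max_size; smaller subsets win ties."""
    n_genes = len(X[0])
    best_genes, best_score = (), -1.0
    for k in range(1, max_size + 1):
        for genes in combinations(range(n_genes), k):
            score = loo_centroid_accuracy(X, y, genes)
            if score > best_score:
                best_genes, best_score = genes, score
    return best_genes, best_score

# Made-up data: 6 samples x 3 genes, two phenotype classes.
X = [
    [0.10, 0.5, 0.9],
    [0.20, 0.9, 0.1],
    [0.15, 0.3, 0.4],
    [0.90, 0.4, 0.2],
    [0.80, 0.8, 0.8],
    [0.85, 0.2, 0.5],
]
y = [0, 0, 0, 1, 1, 1]
print(best_signature(X, y, max_size=2))  # ((0,), 1.0)
```

Replacing the two nested loops in `best_signature` with a heuristic search is what makes the same idea scale to realistic gene counts.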

Transcriptional Network

DNA → RNA → protein

A protein (transcription factor, TF) can bind to DNA and modify RNA production

⇒ Each cell has a network encoded in its DNA.

Each node is a gene.

An arc connects a regulator gene (TF) to a regulated one.
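A transcriptional network as described above is a directed graph, and a plain adjacency dictionary is enough to sketch it in Python. The gene names and interactions below are hypothetical placeholders, not real regulatory data.

```python
from collections import defaultdict

# Hypothetical (regulator TF gene, regulated gene) pairs.
INTERACTIONS = [
    ("tfA", "g1"),
    ("tfA", "g2"),
    ("tfB", "g2"),
    ("tfB", "tfA"),  # a TF can itself be regulated
]

def build_network(edges):
    """Directed graph as an adjacency dict: regulator -> set of regulated genes."""
    graph = defaultdict(set)
    for regulator, target in edges:
        graph[regulator].add(target)
        graph.setdefault(target, set())  # every gene becomes a node
    return dict(graph)

def regulators_of(graph, gene):
    """All genes with an arc pointing to `gene`."""
    return {reg for reg, targets in graph.items() if gene in targets}

net = build_network(INTERACTIONS)
print(sorted(net["tfA"]))                # ['g1', 'g2']
print(sorted(regulators_of(net, "g2")))  # ['tfA', 'tfB']
```

Inferring such a graph from expression data, rather than writing it down, is the goal of the network inference methods that follow.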

Co-expression networks

Definition (Mutual Information [Shannon, 1948])

Let Xi and Xj be two (discrete) random variables; the mutual information between Xi and Xj is

$$I(X_i;X_j) = \sum_{x_i \in X_i} \sum_{x_j \in X_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)}$$

Compute the MIM (mutual information matrix): MI (or correlation) for all pairs of genes

DATA   X1    X2    ...   Xn
s1     0.1   0.9   ...   0.5
...    ...   ...   ...   ...
sm     0.2   0.3   ...   0.8

T̂      X1    X2    ...   Xn
X1     -     0.2   ...   0.9
X2     0.2   -     ...   0.1
...    ...   ...   -     ...
Xn     0.9   0.1   ...   -
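Going from the data table to the MIM can be sketched in plain Python: discretize each gene column, then apply the mutual information formula to every pair. This sketch assumes equal-width bins and the empirical (plug-in) estimator with probabilities taken as relative frequencies; it is an illustration, not the implementation of any particular package, and the expression values are made up.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information, in nats, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log(p(x,y) / (p(x) p(y))) with frequencies: the ratio is c*n/(px*py)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def discretize_equal_width(values, n_bins):
    """Map continuous values to bin indices 0..n_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mim(data, n_bins=3):
    """MI matrix from a samples x genes table; the diagonal stays 0 (the '-')."""
    n_genes = len(data[0])
    cols = [discretize_equal_width([row[g] for row in data], n_bins)
            for g in range(n_genes)]
    m = [[0.0] * n_genes for _ in range(n_genes)]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            m[i][j] = m[j][i] = mutual_information(cols[i], cols[j])
    return m

# Made-up expression table: 4 samples (rows) x 3 genes (columns).
data = [
    [0.1, 0.1, 0.1],
    [0.2, 0.2, 0.9],
    [0.9, 0.9, 0.1],
    [1.0, 1.0, 0.9],
]
for row in mim(data, n_bins=2):
    print([round(v, 3) for v in row])
```

Here X1 and X2 share log 2 ≈ 0.693 nats of information, while X3 shares none with either.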

RELNET: False Positive Trends

The MIM is the inferred network

False positive trends: assume Xi and Xj are both influenced by Xk

Xi ← Xk → Xj

Then I(Xi;Xk) and I(Xk;Xj) will be high, but so will I(Xi;Xj); hence a false link is added between Xi and Xj.
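To see this false-positive trend concretely, the Python sketch below builds a relevance network by keeping every pair whose absolute Pearson correlation reaches a threshold (the slide allows MI or correlation; correlation keeps the example short). The expression values and the 0.9 threshold are made-up assumptions: Xk drives both Xi and Xj, which never interact directly, yet the pair (Xi, Xj) still gets an edge.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def relevance_network(data, threshold):
    """Undirected edges (gene_a, gene_b) wherever |correlation| >= threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(data), 2)
        if abs(pearson(data[a], data[b])) >= threshold
    }

# Made-up profiles: Xk regulates Xi and Xj; Xi and Xj have no direct link.
data = {
    "Xk": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "Xi": [0.1, 1.0, 2.1, 2.9, 4.0, 5.1],
    "Xj": [0.0, 2.1, 3.9, 6.1, 8.0, 9.9],
}
print(sorted(relevance_network(data, threshold=0.9)))
```

All three pairs pass the threshold, including the indirect (Xi, Xj) link.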

ARACNE

Algorithm for the Reconstruction of Accurate Cellular NEtwork

There are three cases of indirect interaction with three variables:

Xj → Xk → Xi

Xj ← Xk → Xi

Xj → Xk ← Xi

Whatever the case, I(Xj;Xi) ≤ I(Xj;Xk) and I(Xj;Xi) ≤ I(Xk;Xi): in the first two cases by the data processing inequality, in the third because Xj and Xi are independent, so I(Xj;Xi) = 0.

For all triples of genes, suppress the weakest link among them.
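The triangle rule can be sketched in Python as follows. This is a minimal illustration, not the original ARACNE code or any package's implementation: it assumes the MIM is stored as a dictionary keyed by unordered gene pairs, omits the tolerance and threshold parameters that practical versions use, and marks edges first and removes them afterwards so the result does not depend on the order in which triples are visited. The MI values in the example are made up.

```python
from itertools import combinations

def aracne(mi_matrix):
    """Prune an MI matrix with the triangle rule (no tolerance).

    `mi_matrix` maps frozenset({a, b}) -> mutual information.
    """
    genes = sorted({g for pair in mi_matrix for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):          # all triples: O(n^3)
        edges = [frozenset(p) for p in ((a, b), (b, c), (a, c))]
        edges = [e for e in edges if mi_matrix.get(e, 0.0) > 0.0]
        if len(edges) == 3:                          # a full triangle
            to_remove.add(min(edges, key=lambda e: mi_matrix[e]))
    return {e: v for e, v in mi_matrix.items() if e not in to_remove}

# Hypothetical MIM for Xi <- Xk -> Xj: the Xi/Xj value is an indirect effect.
mi_matrix = {
    frozenset({"Xi", "Xk"}): 0.9,
    frozenset({"Xk", "Xj"}): 0.8,
    frozenset({"Xi", "Xj"}): 0.5,
}
pruned = aracne(mi_matrix)
print(sorted(tuple(sorted(e)) for e in pruned))  # [('Xi', 'Xk'), ('Xj', 'Xk')]
```

The weakest edge of the only triangle, (Xi, Xj), is the one suppressed.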

ARACNE: False Negative Trends

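ARACNE is O(n^3) in the number of genes (it examines all triples).

False negative trends: assume a triple interaction in which all three links are real

Xj → Xk, Xj → Xi, Xk → Xi

The algorithm will suppress one of these true links (the weakest of the three).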

Minet and Infotheo R/Bioconductor packages

two discretizations (equal frequency, equal width)

four fast entropy estimators (sg, empirical, mm, shrink)

four fast network inference methods (relnet, clr, aracne, mrnet)

three validation tools (F-scores, ROC, PR)

modular package (other MIMs, Rgraphviz)

http://www.biosys.ulg.ac.be

http://homepage.meyerp.com

Thank you!

Questions ?