Molecular Biology in the Information Era Winter School 2015 Andrés Aravena, PhD - Istanbul University Department of Molecular Biology and Genetics - 7 March 2015


My name is Andrés Aravena

I don't speak Turkish! (Türkçe bilmiyorum!)

I am

New Assistant Professor at the Molecular Biology and Genomics Department

Mathematical Engineer, U. of Chile

PhD Informatics, U Rennes 1, France

PhD Mathematical Modeling, U. of Chile

not a Biologist

but an Applied Mathematician who can speak "biologist language"


I will speak about

The Past, Present and Future

Facts, opinion and guess

What I've done before, so you can understand why I'm here

What I'm doing now at Istanbul University

What I foresee from my "outsider" point of view


I've worked on

Big and small computers

Telecommunication Networks

Between 2003 and 2014 I was the chief research engineer of the main bioinformatics group, in the top research center (CMM), in the top university (University of Chile) of my country


I come from Chile


Chile

Small country of ~17 million people

University rankings similar to Turkish ones

Spanish colony 500 years ago (so the language is Spanish)

Independent Republic 200 years ago

First Latin American country to recognize the Turkish Republic

OECD member

Everyday life very similar to Turkey


Chilean Economy: Exports

1st world producer of copper

2nd world producer of salmon

Fruits: peaches, grapes, apples, avocados

Wine: exported worldwide

Official data for 2014


The natural question was

How can we improve these industries using Molecular Biology and Bioinformatics?


Fruits: Peach and Grapes

Gene expression analysis for industrial applications:

Peach: response to cold stress

Grape: development related to seed and grape size (Sultaniye)


Fish: Salmon

Farmed salmon are fed cheap vegetable protein, but wild salmon eat animal protein

How is the salmon's metabolism affected by the diet? Which genes change their expression because of the changes in food?

Gene expression analysis using microarrays

Fish selection for breeding using microarrays (patent pending)


Fish: Salmon Genomic Sequence

... and sequencing of the whole Salmo salar genome

(a 10-million-dollar project)


Wine

Chilean wine travels long distances to final markets

Any yeast contamination means big economic losses (people stop buying all Chilean brands)

Quality control is usually done by growing samples for 3 days. But time is expensive: there is a penalty for shipping delays

We designed a qPCR method for rapid detection of yeast contamination

It is currently used by one major wine producer in Chile. It may be sold to Roche.


Mining industry: molecular biology to extract copper

A little chemistry: Copper is part of a compound, with Sulfur and Iron. Ferric iron (Fe3+) separates it.

Cu2S + 4 Fe3+ → 2 Cu2+ + 4 Fe2+ + S

Resulting Cu2+ is soluble and is recovered.

But all Fe3+ transforms to Fe2+ and the reaction stops

There are bacteria that "eat" electrons (e-) and keep the reaction going

Fe2+ → Fe3+ + e-


Why is it important?

The biological method is much better than the standard one

The goal is to understand and improve the bacteria involved so this technology can be used extensively

Enables building new mines

It is like discovering petrol reserves for the country

Reduced contamination

Cheaper


Most of the results are still industrial secrets

We had a research contract with the main mining company

State-owned, big enough to pay for long term research

Few papers, many patents


Bioidentification: Monitoring the presence of good bacteria

We need to control the "ecosystem" in the mine

Molecular Biology methods are fast, sensitive and reliable

They can be used in place: a metagenomic approach, with no culture

Key problem: Design probes that match a taxonomic branch, not a specific strain

The probes should be tolerant to mutations that occur in environmental samples with many strains

Classical tools don't work at large scales
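As a minimal sketch of what "tolerant to mutations" can mean in practice, the Python snippet below counts how many sequences of a group a candidate probe matches when a fixed number of mismatches is allowed. The sequences, the probe and the function names are hypothetical toy examples, not the tool or the probes described in this talk.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def probe_hits(probe, target, max_mismatches=1):
    """True if the probe matches some window of target within the mismatch budget."""
    k = len(probe)
    return any(
        hamming(probe, target[i:i + k]) <= max_mismatches
        for i in range(len(target) - k + 1)
    )


def coverage(probe, sequences, max_mismatches=1):
    """Fraction of the sequences that the probe hits."""
    hits = sum(probe_hits(probe, s, max_mismatches) for s in sequences)
    return hits / len(sequences)


# Hypothetical toy data: three strains of one branch and one unrelated sequence.
group = ["ACGTTGCAAG", "ACGTAGCAAG", "TTACGTTGCA"]
outsider = ["GGGGCCCCAA"]
probe = "CGTTGC"

print(coverage(probe, group, max_mismatches=0))     # exact matching misses one strain
print(coverage(probe, group, max_mismatches=1))     # one mismatch allowed: all strains hit
print(coverage(probe, outsider, max_mismatches=1))  # the unrelated sequence is not hit
```

A real design would also have to consider melting temperature, secondary structure and far larger sequence sets, which is where the computing cost comes from.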


Design of probes for complex samples: I designed and built a solution using a super-computer

Calculation took one day on 32 processors (about one processor-month)

Resulting probes worked as expected

They can be used on qPCR or in microarrays.


Automatic Interpretation of Results using a Statistical Classification Model


Publications

The microarray was published in

N. Ehrenfeld, A. Aravena, A. Reyes-Jara, N. Barreto, R. Assar, A. Maass, P. Parada. Design and use of oligonucleotide microarrays for identification of biomining microorganisms. Advanced Materials Research 71-73 (2009) 155-158.


Patents

The method and the probes have been patented in

USA, Number: US 7 853 408 B2, Date: 14/12/2010;

South Africa, Number: 2006/06828, Date: 26/03/2008;

Australia, Number: 2006203551, Date: 15/09/2011;

Mexico, Number: PXMX 32/2006, Date: November 2012;

Peru, Number: PE 5838, Date: 29/10/2010;

China, Number: 200810095172.6, Date: 2013;

Chile, Number: DPI-660-2007, Date: 06/05/2013;

Argentina, Number: AR056179


Functional genomics: How do the bacteria work?

To improve the process we need to see inside the black box. We sequenced the complete genomes of 3 bacteria:

Acidithiobacillus ferrooxidans

Acidithiobacillus thiooxidans

Leptospirillum ferrooxidans

We paid over USD 150K. Today it costs USD 5K

Hint: Sequence assembly requires a big computer. It does not work on a regular PC


Modeling Metabolism

We predict which genes code for enzymes

Each enzyme catalyzes a reaction, with a known stoichiometry

Every reaction gives an equation

All equations plus boundary conditions give a model to predict metabolite concentration

We can predict how the cell adapts to environmental changes
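A minimal sketch of the idea that every reaction gives an equation: the reactions are written as columns of a stoichiometric matrix, and the rate of change of each metabolite is the sum of the fluxes weighted by their stoichiometric coefficients. The three-reaction network, the flux values and the function names below are hypothetical toy choices, not the model built for the biomining bacteria.

```python
# Hypothetical toy network:
#   R1:   -> A   (uptake, a boundary condition)
#   R2: A -> B   (enzyme-catalyzed conversion)
#   R3: B ->     (export, a boundary condition)
metabolites = ["A", "B"]
reactions = ["R1", "R2", "R3"]

# One row per metabolite, one column per reaction.
S = [
    [1, -1, 0],   # A is produced by R1 and consumed by R2
    [0, 1, -1],   # B is produced by R2 and consumed by R3
]


def concentration_change(S, fluxes):
    """Rate of change of each metabolite: the matrix product S * v."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in S]


def is_steady_state(S, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted."""
    return all(abs(d) <= tol for d in concentration_change(S, fluxes))


balanced = [2.0, 2.0, 2.0]     # everything taken up is converted and exported
unbalanced = [2.0, 1.0, 1.0]   # uptake is faster than conversion, so A accumulates

print(concentration_change(S, balanced), is_steady_state(S, balanced))
print(concentration_change(S, unbalanced), is_steady_state(S, unbalanced))
```

Changing the uptake flux plays the role of an environmental change: the fluxes that restore the balance describe how the toy cell would have to adapt.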


Modeling Regulation

From the genome sequence we can predict which genes code for transcription factors and where they bind

They form a putative regulatory network.

But current methods produce too many false positives

We expected ~4K regulations. We got 25K regulations.

I integrate this model with microarray data to find the "most probable" regulatory network using a parsimony criterion


Systems Biology: beyond Bioinformatics

A very active research area that aims to understand the cell as a system with complex interactions

The focus is not on the genes, it is on the genome

The key is to understand networks

regulatory

metabolic

signaling

protein-protein-interaction


The present

Why Computers in Molecular Biology and Genetics?


DNA is digital information

All experimental values in science are measured with an observational error (e.g. temperature is 10.2 ± 0.05°C, pressure is 101215 ± 125 Pa).

Except genetic sequences: Nucleotides are either A, C, T or G.

There is no "average" or "intermediate case"

So it is natural to use computers and information theory to model DNA

but there is another reason ...
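To illustrate the point that DNA is digital, here is a small Python sketch, offered as an illustration only. With four possible letters, each nucleotide fits exactly in 2 bits, so a sequence can be packed into an integer and recovered without error, and the Shannon entropy of a sequence can be measured in bits per nucleotide. The particular bit assignment and the function names are arbitrary choices made for this example.

```python
import math
from collections import Counter

# Arbitrary 2-bit code for the four nucleotides (an assumption of this sketch).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTER = {bits: base for base, bits in CODE.items()}


def encode(seq):
    """Pack a DNA string into one integer, 2 bits per nucleotide."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def decode(value, length):
    """Recover the DNA string of the given length from the packed integer."""
    bases = []
    for _ in range(length):
        bases.append(LETTER[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def entropy_bits(seq):
    """Shannon entropy of the letter frequencies, in bits per nucleotide."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


packed = encode("ACGT")
print(packed, decode(packed, 4))   # the round trip loses nothing
print(entropy_bits("ACGT"))        # all four letters equally frequent: the 2-bit maximum
```

A sequence that uses only some of the letters, or uses them unevenly, has an entropy below 2 bits per nucleotide.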


Science converges to Molecular Biology

Physicists, mathematicians, computer scientists and engineers turned their attention to molecular biology questions.

They come looking with new eyes and creating new theoretical and practical tools.

Molecular Biology has always interacted with other disciplines

Just consider the word "Biochemistry"


The Internet makes Molecular Biology theory accessible to more people

Before Internet times

top science was accessible only to researchers with money to make complex experiments or buy expensive books and journals

finding references took several weeks by regular mail

professors had the only copy of the textbooks


Today

all journals are accessible online

references are downloaded in minutes at low cost (free when the article is Open Access)

experimental results of each article are also free


Anyone can analyze this data

Structured data is easy to process to discover new knowledge.

The software for this meta-analysis is also Open Source

Scientists can adapt the program's internal code to solve their specific question

Anyone can download these programs without cost.

If the analysis requires big computational power you can rent it at low cost


You don't need your own super-computer: You can rent Cloud computers

Companies like Amazon.com and Google sell their spare computer power at low prices

This enables researchers to carry out computations that would be impossible otherwise.


The World is Flat

This democratization of knowledge provides an exciting challenge.

Rich countries no longer have a monopoly on knowledge.

We can be players in the big leagues, on a level surface.

We can read the same books and the same articles, use the same machines and the same programs.

Anyone could make the next scientific breakthrough, whether in New York, New Delhi or Istanbul.

But the same opportunity is open to everyone else.


There are more PhD students than ever, and many of them will be in Molecular Biology

Cyranoski et al. 2011. “Education: The PhD Factory.” Nature 472: 276–79.


More players come to the game

Emerging economies push up the number of researchers worldwide

India graduates more than a million engineers each year. Many of them in biotechnology

Egypt has 35,000 PhD students and Israel 10,000.

Many of them will find jobs in Molecular Biology companies or academia

Hays, Thomas. 2011. “PhDs: Israel Also Trains Plenty.” Nature 473 (7347): 284.


How will we be different?


Success of Molecular Biology generates Big Data

Advances in molecular biology technology have produced

new generation sequencers

microarrays

mass spectrometers

real-time PCR.

They produce

reproducible experimental results

in big volumes

at low cost


Data production cost is falling

National Human Genome Research Institute. http://genome.gov/sequencingcosts


Extracting Information from Raw Data: Surviving the Data Tsunami

In a few years we went from a lack of data to an excess of it

We need to learn how to extract biological meaning from big volumes of data

Classical methods are not enough

What is significant? What is the "null hypothesis"?
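One common way to make the "null hypothesis" question concrete is a permutation test. The null hypothesis is that the group labels do not matter, so the labels are shuffled many times to see how often a difference as large as the observed one appears by chance. The sketch below is a generic illustration with made-up numbers; the function name, the number of permutations and the data are assumptions of this example, not a method taken from the talk.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Estimate how often shuffled labels give a mean difference as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Made-up expression levels of one gene under two conditions.
control = [10.1, 9.8, 10.3, 10.0, 9.9]
treated = [12.2, 11.9, 12.4, 12.0, 12.1]

print(permutation_p_value(control, treated))
```

When thousands of genes are tested at once, as on a microarray, a correction for multiple testing is also needed, which is part of why classical methods alone are not enough.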


If we don't fully analyze our own experimental data, someone else will do it. And they will publish it


The plan: what we will teach


Teaching "Introduction to Data Science"

The students will learn

how to handle experimental data

how to communicate with scientists of other data-oriented disciplines

how to produce publication quality reports with reproducible results

how to get raw data, extract relevant information, and filter it using several selection criteria

how to store and retrieve it in efficient and useful ways

how to transform it, organize it, categorize it, display it, and understand the results
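The steps listed above (get, filter, store, retrieve, transform) can be sketched in a few lines of Python using only the standard library. The table of expression values, the column names and the fold-change threshold are all invented for this illustration; the workshop itself may use different tools and data.

```python
import csv
import io
import sqlite3

# Made-up raw data: one measurement per gene and treatment, one value missing.
RAW = """gene,treatment,level
geneA,control,10.0
geneA,cold,25.0
geneB,control,8.0
geneB,cold,7.5
geneC,control,
"""


def load_rows(text):
    """Get the raw data and filter out rows with a missing measurement."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if rec["level"]:
            rows.append((rec["gene"], rec["treatment"], float(rec["level"])))
    return rows


def store(rows):
    """Store the clean rows in an in-memory SQL database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE expr (gene TEXT, treatment TEXT, level REAL)")
    db.executemany("INSERT INTO expr VALUES (?, ?, ?)", rows)
    return db


def fold_changes(db):
    """Retrieve and transform: ratio of cold to control level for each gene."""
    query = """
        SELECT c.gene, t.level / c.level
        FROM expr AS c JOIN expr AS t ON c.gene = t.gene
        WHERE c.treatment = 'control' AND t.treatment = 'cold'
        ORDER BY c.gene
    """
    return dict(db.execute(query).fetchall())


db = store(load_rows(RAW))
changes = fold_changes(db)
# Selection criterion (an arbitrary threshold): at least a two-fold increase.
selected = {gene: fc for gene, fc in changes.items() if fc >= 2.0}
print(changes)
print(selected)
```

Because every step is written down as code, the same result can be reproduced by anyone who has the raw file.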


Teaching "Scientific Computing"

Teach Python and BioPython to analyze, model, evaluate and predict the behavior of genomic and molecular biology entities.

The students should be able to interact with high end servers, use command line tools and be comfortable in computing environments other than Microsoft Windows.

Tools include Unix command line tools, SQL and the R statistical package.

The students should be able to understand how computer networks work and what their limitations are.
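As a hedged example of the kind of exercise such a course could start with, the plain Python functions below compute the reverse complement, the GC content and the k-mer counts of a DNA sequence. The function names and sequences are illustrative choices; Biopython provides ready-made equivalents for several of these operations, such as the reverse_complement method of its Seq objects.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read in the 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of the nucleotides that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq, k):
    """How many times each substring of length k occurs (overlaps included)."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


print(reverse_complement("ATGC"))
print(gc_content("ATGC"))
print(kmer_counts("ATAT", 2))
```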


The idea is not to be experts on computers, but to have the concepts and language to work in interdisciplinary groups


Let's start learning Data Science

To test these ideas, next week we start an

Introduction to Data Science Workshop

The mathematical tools can be explored together with the biological context, so they make sense and are easier to learn.

I will give you a link at the end of this talk.

If you are interested visit the webpage and send an email.

after all, maybe I'm just crazy


Every normal student is capable of good mathematical reasoning if attention is directed to activities of his interest

Jean Piaget, 1976. Swiss psychologist and philosopher


A Secret: You can also learn at home

Everything we will show is available on the Internet

You just need to look for it

But it is in English

Translation takes too long

Translated science is obsolete science


The Future: My personal prediction


It is hard to make predictions, especially about the future

Danish proverb


Molecular Biology has become mainstream

Genomic tools are also used outside academia.

Several companies provide "personalized DNA services".

23andMe, partially owned by Google.

The Genographic project, created by the National Geographic Society and IBM.

Both offer to trace ancestry and migrations of the human population. Anyone can learn what their true origins are.


Molecular Biology will follow the path of computers

Today PCR thermocyclers are expensive devices found in universities and research centers, very much like desktop computers were in the 70's and 80's.

Nowadays computers are low-cost and found everywhere.

Will the same happen with PCR?


PCR future

Today only a few companies produce PCR thermocyclers, just as only a few companies make smartphones such as the iPhone and Samsung.

Nevertheless you can see them everywhere.

And this is a big opportunity for creators of software applications.

The value is in the apps. Ask Nokia or Blackberry


A computer on every desk and in every home, all running Microsoft software

Bill Gates, Microsoft’s founding mission.


PCR is the new PC

Gates set this goal in the late 70's, when it was not obvious if people would even see a computer in their lives.

PCR technology is now in the same state that Personal Computers were in 1975. If PCR machines become inexpensive,

and there is "a PCR on every desk and home",

in hospitals,

restaurants

and high schools,

then who will be making "software apps" for them?


If PCR machines are available everywhere, applications can be:

Determining ancestry (e.g. race horses, farm animals, fishes)

Detection of unwanted organisms

Marker-assisted breeding

Food quality control (e.g. in a university canteen)

Security and control of Genetically Modified Organisms

Polymorphism detection

Clinical diagnosis

Personalized medicine

Police forensic analysis


Software for PCR: the specific parameters of an application

I think we should prepare our students to make these "apps".

They should have easy access to low-cost thermocyclers, and use them frequently and creatively.

Then, like in the computer industry, they may create completely new applications that we cannot foresee now.

DNA extraction protocols

Primers design

Amplification protocols

Detection methods
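As a small, hedged illustration of what a PCR "app" might compute for primer design, the sketch below estimates the melting temperature of a short primer with the Wallace rule of thumb (2 °C for each A or T, 4 °C for each G or C) and checks that a primer pair has similar melting temperatures. The primer sequences, the function names and the 5 °C tolerance are hypothetical choices for this example; real primer design uses more accurate thermodynamic models and many more criteria.

```python
def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius (Wallace rule, short primers only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primer_pair_ok(forward, reverse, max_tm_gap=5):
    """True if both primers melt at similar temperatures (the gap is an assumed tolerance)."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_gap


# Hypothetical primers, invented for this example.
forward = "ATGCGTACGTTAGCCTAG"
reverse = "TTAGGCATCGATCGGATC"

print(wallace_tm(forward), wallace_tm(reverse), primer_pair_ok(forward, reverse))
```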


New tools for new science


New Instruments trigger advances in Molecular Biology and in other sciences

They are usually named according to their inventor

Galileo created modern science when he made his own telescope

Newton also invented a new kind of telescope, still used today

Bunsen enabled spectrometry analysis with his burner

Svedberg ultracentrifuge (16S)

Sanger DNA sequencing method

Southern blot method for specific DNA detection

PCR to amplify DNA samples


Scientific Instrumentation

I propose to create a course on "Scientific Instrumentation", initially using software tools.

Making instruments is now "software", not craftsmanship.

We can understand this with a biological analogy.

Designs in digital files are like genes.

3D printers are like ribosomes, producing physical versions of the design.

Online collaboration is like evolution: designs are changed to improve their fitness.


It is not rocket science


It is not heart surgery


Thank you (Teşekkür ederim)

[email protected]


http://anaraven.github.io/data-science-workshop/