Wim de Grave: Big Data in Life Sciences

Big Data in Life Sciences 1st Symposium on Big Data and Public Health FGV 24/10/2013

Upload: flavio-codeco-coelho

Post on 13-May-2015


DESCRIPTION

Talk by Wim de Grave at the 1st Symposium on Big Data and Public Health, 2013

TRANSCRIPT

Page 1: Wim de Grave:  Big Data in life sciences

Big Data in Life Sciences

1st Symposium on Big Data and Public Health

FGV 24/10/2013

Page 2: Wim de Grave:  Big Data in life sciences

Big Data

“Big Data in the life sciences sector is now a strategic and operational issue for almost all stakeholders. Capturing, storing, data flow and analysis of information-rich processes affect all aspects of the pharmaceutical and medical device industries but particularly the discovery, research & development stages”.

“A strategic shift towards big data and data-driven approaches must be implemented from senior managers and thoroughly rolled-out across the organization”.

Capturing

Storing

Data flow

Analysis

Summarize

Represent

Page 3: Wim de Grave:  Big Data in life sciences

Life Sciences – drug development

- Drug discovery process
- Lead / target identification and research follow-up
- Translation to clinical stages
- Why & how of implementing big data approaches
- Genomic & personalized medicines
- Combination of biomarkers research and retrospective data analysis, and analysis of current clinical outcomes
- Systems Biology & detailed modeling & simulation processes
- Better design NMEs, re-engineer and re-initiate previously failed drug programs
- Design, collect & manage clinical stage data
- Selecting EDC technologies, outsourcing Data Management responsibilities
- Retain control over data quality
- Real-world big data (Real World Evidence - RWE) for drug safety & surveillance for regulators in post-marketing, feeding back vital insights into mechanisms of action and real-life prescription and use
- Understand product health outcome benefits for regulators, payers & other stakeholders: product’s effectiveness, associated health outcomes and cost effectiveness endpoints
- Manage and integrate data generated at all stages of the value chain, from discovery to real-world use after regulatory approval

In all fields, the amount of data to be collected and managed has massively increased.

Page 4: Wim de Grave:  Big Data in life sciences

Life Sciences – R&D

Genomics – metagenomics: generation of genetic code data

Clinical genomics: pharmacogenomics, disease marker genes etc.

Expression studies: transcriptomics and micro-array

Protein structure, function and protein-protein interaction (also RNA/DNA/protein; saccharides, lipids etc.)

Systems Biology – metabolic systems and their regulation, synthetic biology

Mass spectrometry analysis (biomarkers; complex mixtures)

Metabolomics; biodiversity extracts and fractionation

Phylogeny, Networks of life

Clinical research, epidemiology, models

Public health

Scientific Literature and Patents – data and text mining

Page 5: Wim de Grave:  Big Data in life sciences
Page 6: Wim de Grave:  Big Data in life sciences
Page 7: Wim de Grave:  Big Data in life sciences
Page 8: Wim de Grave:  Big Data in life sciences

C. Probst, Fiocruz Paraná

Page 9: Wim de Grave:  Big Data in life sciences

C. Probst, Fiocruz Paraná

Page 10: Wim de Grave:  Big Data in life sciences

aaacgcggaccgcacggtctgataggcaagttccggtatcgctattaccagggcagtcat

cgcttgctgtaaccggttatgggttctgtcgtcaccaacgctatgggcacttcagttggc

atgtttttctgcggataggtagcgatacgctgttgcgtcaccaaattccaaccacagaag

ccggtataccgcgatcggttggtgtgcctgtgtttatgccttaccgtaaggaaagcaaca

ggattaaggcgatagtgcgggtgacttcaatgatcgacgcaccgagccgaccggtcccag

tgtgtatcaacacgtcgctagcgcgggtgtagtcgcgtattgctgctgtagcggtcattg

tcttactgtccatcgacagcgaggatttgagacgcacgatatgtgacaaaatttgagaca

tcgcgaccaagtagtggggaagtgatgtttcatcggaggtctcgtgtcattgtggcttgt

ggtcgttgtctttcgatcttgacactccggcaaaaatatggtttatgccgaaatggccgt

aatcacgggtattgggtgtcggcgccgggaagaattggttgtgttggccggccagtatgt

tgatcgcgtcgggcttgtgggttttgctgatgatctgcagcgttttgccgacgaacggcc

ggagagtagggttcggatcgaactgtgaccggtagatttcctcggatcagaacgaatcgg

aacgattgctttgcgcagatatacaggccatagcgaaggtccggtactatcggtgtgtcg

gtattcgcacgccacgaaaacgttgacctccactcaggcctaaccgttaccgtcaaaagt

ttggatcgccactatacggtgaatatgcgagctacttggctgttgatcaaagtgcttgct

aagcgttggccggcaacaggtagaagcgtggtggcgctcaccagtgatcacacaatgaat

aacctaccctacggggctacgaaagccgtaatagatcgaattgtgcttgctgctgcctac

gcactagggtgttcaagccgtgctcgccaacgtgatcaattcgggcccgggcgacattgg

ctggatgacaccccgacctccagacgcgattaacctctatgcaaccgcccggatgtttag

gaaaccctaaaagacttccaacttggtgtgcgctttctgctgtccgactactggcagtag

gttaacggccagctcatccactgcaacggcagtttctccaaagaccaactccatgtctgc

gttagtgcaacatgcagaaaactatggtatcattcctgttatctcgcattcagctgggct

aagtctggccgcccacggttgtaagcgccgtggcggattgtgcattccggcgctgtcgtc

cgatcgtggcgaagtagtaggcaagcgggaaagaaaagctagaagcaaaaaacagccacg

gacaccgcatcccgctccggtagctataaacactggcagcagaatcatattcgtaacgaa

gtagtcacagttgcccgaaacagcggttgggttggtgatctgcatccgcaggaaatgcgg

atagctttccggtccctggaccaggttaccctgcccggccccatcgtgcacacagcgtgt

attgaaatatcatcttagtatggtagccgctataccaactatgaagtgcccgcactttgt

ggagaaaaagacggctttccagtaagtttggtataaaactgtggttttgacgtggttatc

tagccgatagcggataggttacggactgtgtggacaagaagcgagatcatgggtagtgtg

gccatgccatggtggactagggatcacatgcattcccggttacaattccggttgtgcaga

gctggagggcctgtgcagttaaccgtgttgactcagcagttcatcttccagtgcgaggaa

ctcgtcggacctagttcgtggagtaaacgccgggctcagccggagcttgggcccgtccaa

ggtaatcaagatcgacctgaatagcaggtatgagtcaagttttagctagcggtggaaatc

gagggttccccaaatgcgtaaccactgaaagaataggattaacgcttcggctttcatacc

agcatcgctttcagcgcaattaccttcgacgtgccagaaggaaagtgatagcggtgcaac

gtgattaccacgtgatccagctggacaaagccttagtcctaagattcctgcagaattgaa

gtaattttcagaaactccgcaactggtggcaccgtgaggtgagaaaacggccgtcatcca

atcgccattgttacctatacagattacaggtcggtatgtttaccgtgcggtctgccgccg

aacttcaatagatcggttttgacatgggggaagatccgctgaatctccttgcagtacaga

gtgatcgatgcgcaatcctaatgttgtctaggctaaccagctatcgtcttaagcaatgtt

ctcgtccagtcagacatgttgaagaacgtgtacagatattcgttgtagccaccggtccgc

caagccttaaggcacgtggacaagaagatggtgttgatccagtccggttcgtgattaagc

actactggtaagtaacaactccggcactatctacgaatccgtagaatagtttcataatta

gaaatctgctagcgcttgagcatgtttcggaaagtccaaaactacagtttcaagcacgat

aatcaattcgacaagatatccggttctgtcgctgataacgttgctttgcaacatgatcgg

ttcgaacaacacgcgccacctctctagcagaagatcactttctgcgatctcccaatttgc

ctgcttcgcattaagtacggaagccatctgttcggcatagtcggtgatgtaggcggactg

tttggtgttgaagatagccacaattttctcgacctggagtaaatggttcagtgaattcag

tatcctgccatcgcaccagaggatcgactcgataaaatcagcaagtcacgtcagcgcccg

ttcctgtgtatctgatccaggaacaccatgttgaggtagcgcagcaaatcgtggtacgaa

aatgactgcggcactatacaggtggtcctcatctgagtgatgtatagatgcgcactgtcc

atatgacgttggcgtttctgggtagctaatatacccttggcacccgcaggcatgtcgtag

aaattagataccatgtcgctccgaaagtattgcagtagatgtatacaaacgtggaagaac

taagatgtcaatgatttcaagttgacagggcgagcgtagtttatgttgaaaacctttgct

gtgtagtcagaaactgctgccgtcgagtagctgatcgggctgacgttggggtccgcaggc

tatgctcgtgacgttgagcttgcctttggtttcggtcaggcggtgcttgaccgagttggt

All you wanted to know, but were afraid to ask...
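A raw sequence like the one on this slide only becomes informative once it is summarized. As a minimal sketch of the "Analysis" and "Summarize" steps, the Python snippet below counts bases and computes the GC fraction of the first line of the sequence shown above. The function name `summarize_sequence` and the returned field names are illustrative assumptions, not part of the talk.

```python
from collections import Counter


def summarize_sequence(seq: str) -> dict:
    """Return length, per-base counts and GC fraction for a DNA string.

    Hypothetical helper for illustration: characters other than
    a, c, g, t (case-insensitive) are ignored.
    """
    seq = seq.lower()
    counts = Counter(base for base in seq if base in "acgt")
    total = sum(counts.values())
    # Guard against empty input so the GC fraction is always defined.
    gc_fraction = (counts["g"] + counts["c"]) / total if total else 0.0
    return {"length": total, "counts": dict(counts), "gc_fraction": gc_fraction}


# First line of the sequence shown on the slide.
first_line = "aaacgcggaccgcacggtctgataggcaagttccggtatcgctattaccagggcagtcat"
summary = summarize_sequence(first_line)
print(summary["length"], summary["counts"], round(summary["gc_fraction"], 3))
```

The same function can be applied line by line, or to the concatenated sequence, to compare composition across regions.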

Page 11: Wim de Grave:  Big Data in life sciences

Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno, R., Kerlavage, A.R., McCombie, W.R., and Venter, J.C. Complementary DNA sequencing: "expressed sequence tags" and the human genome project. Science 252, 1651-1656 (1991).

Page 12: Wim de Grave:  Big Data in life sciences


T. Otto

Page 13: Wim de Grave:  Big Data in life sciences

General Proline and Arginine Metabolism

Page 14: Wim de Grave:  Big Data in life sciences

In silico biochemistry

Metabolism and solute transport of A. fulgidus

Klenk et al., Nature 390, 364 (1997)

Page 15: Wim de Grave:  Big Data in life sciences

rat

Arabidopsis

C. elegans

mouse

human

M. leprae

Drosophila

Vibrio cholerae

Plasmodium

Neisseria

Xylella

Rickettsia

Archaeoglobus

M. tuberculosis

Helicobacter

Borrelia

Bacillus

Campylobacter

Aquifex

Chlamydia

Ureaplasma

Thermoplasma

Thermotoga

Pseudomonas

Buchnera sp.

E. coli

S. cerevisiae

Salmonella

Yersinia

Ralstonia

S. pombe