Wim de Grave: Big Data in Life Sciences
DESCRIPTION
Talk by Wim de Grave at the 1st Symposium on Big Data and Public Health, 2013
TRANSCRIPT
Big Data in Life Sciences
1st Symposium on Big Data and Public Health
FGV 24/10/2013
Big Data
“Big Data in the life sciences sector is now a strategic and operational issue for almost all stakeholders. Capturing, storing, data flow and analysis of information-rich processes affect all aspects of the pharmaceutical and medical device industries but particularly the discovery, research & development stages”.
“A strategic shift towards big data and data-driven approaches must be implemented from senior managers and thoroughly rolled-out across the organization”.
Capturing
Storing
Data flow
Analysis
Summarize
Represent
Life Sciences – drug development
- Drug discovery process
- Lead / target identification and research follow-up
- Translation to clinical stages
- Why & how of implementing big data approaches
- Genomic & personalized medicines
- Combination of biomarker research, retrospective data analysis, and analysis of current clinical outcomes
- Systems Biology & detailed modeling & simulation processes
- Better design of NMEs; re-engineer and re-initiate previously failed drug programs
- Design, collect & manage clinical stage data
- Selecting EDC technologies, outsourcing Data Management responsibilities
- Retain control over data quality
- Real-world big data (Real World Evidence, RWE) for drug safety & surveillance for regulators in post-marketing, feeding back vital insights into mechanisms of action and real-life prescription and use
- Understand product health outcome benefits for regulators, payers & other stakeholders: product's effectiveness, associated health outcomes and cost-effectiveness endpoints
- Manage and integrate data generated at all stages of the value chain, from discovery to real-world use after regulatory approval
In all fields, the amount of data to be collected and managed has massively increased.
Life Sciences – R&D
Genomics – metagenomics: generation of genetic code data
Clinical genomics: pharmacogenomics, disease marker genes etc.
Expression studies: transcriptomics and micro-array
Protein structure, function and protein-protein interaction (also RNA/DNA/protein; saccharides, lipids etc.)
Systems Biology – metabolic systems and their regulation, synthetic biology
Mass spectrometry analysis (biomarkers; complex mixtures)
Metabolomics; biodiversity extracts and fractionation
Phylogeny, Networks of life
Clinical Research, epidemiology, models
Public health
Scientific Literature and Patents – data and text mining
C. Probst, Fiocruz Paraná
aaacgcggaccgcacggtctgataggcaagttccggtatcgctattaccagggcagtcat
cgcttgctgtaaccggttatgggttctgtcgtcaccaacgctatgggcacttcagttggc
atgtttttctgcggataggtagcgatacgctgttgcgtcaccaaattccaaccacagaag
ccggtataccgcgatcggttggtgtgcctgtgtttatgccttaccgtaaggaaagcaaca
ggattaaggcgatagtgcgggtgacttcaatgatcgacgcaccgagccgaccggtcccag
tgtgtatcaacacgtcgctagcgcgggtgtagtcgcgtattgctgctgtagcggtcattg
tcttactgtccatcgacagcgaggatttgagacgcacgatatgtgacaaaatttgagaca
tcgcgaccaagtagtggggaagtgatgtttcatcggaggtctcgtgtcattgtggcttgt
ggtcgttgtctttcgatcttgacactccggcaaaaatatggtttatgccgaaatggccgt
aatcacgggtattgggtgtcggcgccgggaagaattggttgtgttggccggccagtatgt
tgatcgcgtcgggcttgtgggttttgctgatgatctgcagcgttttgccgacgaacggcc
ggagagtagggttcggatcgaactgtgaccggtagatttcctcggatcagaacgaatcgg
aacgattgctttgcgcagatatacaggccatagcgaaggtccggtactatcggtgtgtcg
gtattcgcacgccacgaaaacgttgacctccactcaggcctaaccgttaccgtcaaaagt
ttggatcgccactatacggtgaatatgcgagctacttggctgttgatcaaagtgcttgct
aagcgttggccggcaacaggtagaagcgtggtggcgctcaccagtgatcacacaatgaat
aacctaccctacggggctacgaaagccgtaatagatcgaattgtgcttgctgctgcctac
gcactagggtgttcaagccgtgctcgccaacgtgatcaattcgggcccgggcgacattgg
ctggatgacaccccgacctccagacgcgattaacctctatgcaaccgcccggatgtttag
gaaaccctaaaagacttccaacttggtgtgcgctttctgctgtccgactactggcagtag
gttaacggccagctcatccactgcaacggcagtttctccaaagaccaactccatgtctgc
gttagtgcaacatgcagaaaactatggtatcattcctgttatctcgcattcagctgggct
aagtctggccgcccacggttgtaagcgccgtggcggattgtgcattccggcgctgtcgtc
cgatcgtggcgaagtagtaggcaagcgggaaagaaaagctagaagcaaaaaacagccacg
gacaccgcatcccgctccggtagctataaacactggcagcagaatcatattcgtaacgaa
gtagtcacagttgcccgaaacagcggttgggttggtgatctgcatccgcaggaaatgcgg
atagctttccggtccctggaccaggttaccctgcccggccccatcgtgcacacagcgtgt
attgaaatatcatcttagtatggtagccgctataccaactatgaagtgcccgcactttgt
ggagaaaaagacggctttccagtaagtttggtataaaactgtggttttgacgtggttatc
tagccgatagcggataggttacggactgtgtggacaagaagcgagatcatgggtagtgtg
gccatgccatggtggactagggatcacatgcattcccggttacaattccggttgtgcaga
gctggagggcctgtgcagttaaccgtgttgactcagcagttcatcttccagtgcgaggaa
ctcgtcggacctagttcgtggagtaaacgccgggctcagccggagcttgggcccgtccaa
ggtaatcaagatcgacctgaatagcaggtatgagtcaagttttagctagcggtggaaatc
gagggttccccaaatgcgtaaccactgaaagaataggattaacgcttcggctttcatacc
agcatcgctttcagcgcaattaccttcgacgtgccagaaggaaagtgatagcggtgcaac
gtgattaccacgtgatccagctggacaaagccttagtcctaagattcctgcagaattgaa
gtaattttcagaaactccgcaactggtggcaccgtgaggtgagaaaacggccgtcatcca
atcgccattgttacctatacagattacaggtcggtatgtttaccgtgcggtctgccgccg
aacttcaatagatcggttttgacatgggggaagatccgctgaatctccttgcagtacaga
gtgatcgatgcgcaatcctaatgttgtctaggctaaccagctatcgtcttaagcaatgtt
ctcgtccagtcagacatgttgaagaacgtgtacagatattcgttgtagccaccggtccgc
caagccttaaggcacgtggacaagaagatggtgttgatccagtccggttcgtgattaagc
actactggtaagtaacaactccggcactatctacgaatccgtagaatagtttcataatta
gaaatctgctagcgcttgagcatgtttcggaaagtccaaaactacagtttcaagcacgat
aatcaattcgacaagatatccggttctgtcgctgataacgttgctttgcaacatgatcgg
ttcgaacaacacgcgccacctctctagcagaagatcactttctgcgatctcccaatttgc
ctgcttcgcattaagtacggaagccatctgttcggcatagtcggtgatgtaggcggactg
tttggtgttgaagatagccacaattttctcgacctggagtaaatggttcagtgaattcag
tatcctgccatcgcaccagaggatcgactcgataaaatcagcaagtcacgtcagcgcccg
ttcctgtgtatctgatccaggaacaccatgttgaggtagcgcagcaaatcgtggtacgaa
aatgactgcggcactatacaggtggtcctcatctgagtgatgtatagatgcgcactgtcc
atatgacgttggcgtttctgggtagctaatatacccttggcacccgcaggcatgtcgtag
aaattagataccatgtcgctccgaaagtattgcagtagatgtatacaaacgtggaagaac
taagatgtcaatgatttcaagttgacagggcgagcgtagtttatgttgaaaacctttgct
gtgtagtcagaaactgctgccgtcgagtagctgatcgggctgacgttggggtccgcaggc
tatgctcgtgacgttgagcttgcctttggtttcggtcaggcggtgcttgaccgagttggt
All you wanted to know,
but were afraid to ask...
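As an illustration of the "Summarize" step applied to raw sequence such as the block shown above, here is a minimal Python sketch that counts bases and computes GC content. It is not taken from the talk; the function names `base_composition` and `gc_content` are hypothetical, and the input is simply the first line of the sequence on the slide.

```python
from collections import Counter


def clean(seq: str) -> str:
    """Remove all whitespace (including line breaks) and lowercase the sequence."""
    return "".join(seq.split()).lower()


def base_composition(seq: str) -> dict:
    """Return a count of each character found in the cleaned sequence."""
    return dict(Counter(clean(seq)))


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are g or c (0.0 for an empty sequence)."""
    cleaned = clean(seq)
    if not cleaned:
        return 0.0
    gc = sum(1 for base in cleaned if base in "gc")
    return gc / len(cleaned)


# First line of the sequence shown on the slide (illustrative input only).
first_line = "aaacgcggaccgcacggtctgataggcaagttccggtatcgctattaccagggcagtcat"

print("length:", len(first_line))
print("composition:", base_composition(first_line))
print("GC fraction:", round(gc_content(first_line), 3))
```

The same two functions work on the whole block if it is passed as one multi-line string, since whitespace is stripped before counting. Real genome-scale data would normally be read from FASTA files with a dedicated library rather than from string literals.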
Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B.,
Moreno, R., Kerlavage, A.R., McCombie, W.R., and Venter, J.C. Complementary DNA sequencing: "expressed sequence tags"
and the human genome project. Science 252, 1651-1656 (1991).
T. Otto
General Proline and Arginine Metabolism
In silico biochemistry
Metabolism and solute transport of A. fulgidus
Klenk et al., Nature 390, 364 (1997)
rat
Arabidopsis
C. elegans
mouse
human
M. leprae
Drosophila
Vibrio cholerae
Plasmodium
Neisseria
Xylella
Rickettsia
Archaeoglobus
M. tuberculosis
Helicobacter
Borrelia
Bacillus
Campylobacter
Aquifex
Chlamydia
Ureaplasma
Thermoplasma
Thermotoga
Pseudomonas
Buchnera sp.
E. coli
S. cerevisiae
Salmonella
Yersinia
Ralstonia
S. pombe