Taken some of the hype out of Big Data again - MedTech Pharma, Nürnberg, July 2014
DESCRIPTION
I was invited to redo the talk about Big Data I did in Berlin earlier this year - slides also here. The slides are similar but updated to reflect my new company, and some slides are new. Enjoy!
TRANSCRIPT
MedTech Pharma, Nürnberg 2014
Taking (some of) the mystery out of Big Data
Contact
Claus Stie Kallesøe
Founder, CEO
+45 30 14 15 36
Introduction
Big Data – Either VERY large datasets AND/OR other complexities
Characteristics of big data
Source: IBM methodology
A couple of words about scale
• 100’s of Megabytes
  • This should not be a problem. Can be handled with Matlab, R, Ruby
• 100/500 Gigabytes – 1 Terabyte
  • 2 Terabyte hard drives can be bought in the local shop for €100
  • Connect it to your laptop and install PostgreSQL or a NoSQL database on it
• > 5 Terabytes
  • Now you might have a size issue
Inspired by: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
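The rule of thumb on the scale slide can be written down as a small lookup. This is only a sketch: the function name `suggest_tooling` is hypothetical, and the exact cut-offs between the tiers (below 100 GB, and the gap between 1 and 5 TB) are assumptions, since the slide only names rough ranges.

```python
GB = 10**9
TB = 10**12


def suggest_tooling(size_bytes: int) -> str:
    """Map a dataset size to the rough advice given on the scale slide."""
    if size_bytes < 100 * GB:
        # Assumption: everything below ~100 GB is treated like the
        # "100's of Megabytes" tier, where in-memory tools are enough.
        return "in-memory tools (Matlab, R, Ruby)"
    if size_bytes <= 5 * TB:
        # Assumption: the 1-5 TB gap is folded into the tier where an
        # external hard drive plus a database on a laptop is enough.
        return "single machine with PostgreSQL or a NoSQL database"
    # Above 5 TB the slide says you *might* have a size issue.
    return "possible size issue: consider distributed tooling"


for size in (300 * 10**6, 500 * GB, 8 * TB):
    print(size, "->", suggest_tooling(size))
```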
Big Data - “Definition”
"Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
Cool, but remember where we are!
Gartner Hype Cycle 2013
Big Data in Pharma R&D
What is Big Data in Pharma R&D?
• Many ideas/possibilities across Pharma R&D and market access
• But many of them are likely NOT “real” Big Data problems!
• Are they relevant and can they bring insights?
  • Yes, very much so
• Should we then find a way to handle them?
  • Absolutely
Disclaimer
• I am a (web) tech geek
• I have nothing against new technologies
• Like many other geeks I like them
• But do try to use the right tool for the right job
http://blog.mongohq.com/you-dont-have-big-data/
Another great tool - for some
Q: “Could you help me get to Nürnberg, pls?”
A: “Yes, absolutely. Not a problem”
Q: “Ok, btw I want to try the Endeavour”
A: “...ahh why?”
Q: “Because I have read it’s great”
A: “Yes, but the ICE….”
MapReduce explained in 41 words
Goal: Count the number of books in the library.
Map: You count up shelf #1, I count up shelf #2.
(The more people we get, the faster this part goes.)
Reduce: We all get together and add up our individual counts.
http://www.chrisstucchio.com/blog/2011/mapreduce_explained.html
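The library analogy can be sketched in a few lines of Python. The `shelves` data and the function names are made up for illustration, and everything runs on a single machine. A real MapReduce framework would spread the map step across many workers, which is possible because each shelf is counted independently of the others.

```python
from functools import reduce

# Hypothetical library: each shelf is a list of book titles.
shelves = [
    ["Book A", "Book B", "Book C"],
    ["Book D"],
    ["Book E", "Book F"],
]


def count_shelf(shelf):
    """Map step: one person counts the books on one shelf."""
    return len(shelf)


def count_books(all_shelves):
    """Run the map step per shelf, then reduce the counts to one total."""
    per_shelf_counts = map(count_shelf, all_shelves)
    # Reduce step: add up the individual counts (0 for an empty library).
    return reduce(lambda left, right: left + right, per_shelf_counts, 0)


print(count_books(shelves))
```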
What is it then? Linked data?
Does it matter what it is?
No!
It’s data - and potential analytics (business) opportunities.
Size and complexity should drive the technology
Technologies
Can we do anything on our own?
For many people/companies “Big data technology” is a black box
“A lot of stuff”
And then the vendors go:
If { box = magic or money }
then { box = expensive }
Working within a community
A lot of tools available
From: http://people10.com/blog/ruby-on-rails-the-popular-platform-for-web-development/
New visualisations – easy and free
http://philogb.github.io/jit/demos.html
Automated calculations - can bring you far
Job submitted to async calculation server
https://circleci.com/
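As a rough illustration of submitting calculation jobs asynchronously, the sketch below uses Python's standard `concurrent.futures` module. It is not the setup shown in the talk: the `mean_response` calculation, the dataset names and the worker count are all hypothetical stand-ins, and a local thread pool plays the role of the calculation server.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_response(values):
    """Hypothetical calculation standing in for a real analysis job."""
    return sum(values) / len(values)


def submit_jobs(datasets):
    """Submit one job per dataset, then collect the results by name."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns immediately with a Future for each job.
        futures = {
            name: pool.submit(mean_response, values)
            for name, values in datasets.items()
        }
        # result() blocks until that particular job has finished.
        return {name: future.result() for name, future in futures.items()}


results = submit_jobs({"assay_1": [1.0, 2.0, 3.0], "assay_2": [10.0, 20.0]})
print(results)
```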
Also a lot of great tools to handle data
Elasticsearch text indexes
• Indexed research assay metadata => Google-like search to find the relevant assay
• Indexed SharePoint project workspaces => Enable easy, fast cross-project queries to find trends
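The idea behind a text index can be shown with a toy inverted index in plain Python. This is a simplified illustration and not Elasticsearch's API: the assay descriptions are invented, and tokenising by lowercasing and splitting on whitespace is an assumption made to keep the example short.

```python
from collections import defaultdict


def build_index(documents):
    """Build an inverted index: token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


# Hypothetical assay metadata, for illustration only.
assays = {
    "assay-1": "kinase inhibition assay in human cells",
    "assay-2": "receptor binding assay in rat tissue",
    "assay-3": "kinase binding screen",
}
index = build_index(assays)
print(search(index, "kinase assay"))
```

A real search engine adds relevance ranking, stemming and fuzzy matching on top of this lookup structure.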
Conclusion – Big Data in Pharma R&D
• Many opportunities across R&D and market access
• More data linking and data analytics than Big Data
• You can use freely available tools on “normal” hardware
• No magic “under the hood” – it’s just data
BUT you still need to define the questions you want to answer – before diving into technology!