Big Data: debunking some of the myths

Posted on 17-Jul-2015

TRANSCRIPT

<ul>
<li><p>copyright 2015</p><p>Big Data: debunking some of the myths</p><p>Chris Swan @cpswan</p></li>
<li><p>Agenda</p><p>My background</p><p>What do I mean by big data?</p><p>Know your algorithm</p><p>Know your data</p><p>Performance</p></li>
<li><p>My background</p><p>CTO CTO Client Experience Co-head CTO Security Corporate Finance fintech, early stage IT R&amp;D Networks and security Grid, app server engineering Combat System Engineer</p></li>
<li><p>Recent adventure with Big Data</p></li>
<li><p>Misquoting Roger Needham</p><p>"Whoever thinks their analytics problem is solved by big data doesn't understand their analytics problem and doesn't understand big data."</p></li>
<li><p>What do I mean by big data?</p></li>
<li><p>Overview</p><p>Based on a blog post from April 2012: http://is.gd/swbdla</p><p>[Figure: Problem Types, plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Big Data and Quant]</p></li>
<li><p>Simple problems</p><p>Low data volume, low algorithm complexity</p></li>
<li><p>Quant problems</p><p>Any data volume, high algorithm complexity</p></li>
<li><p>Big Data problems</p><p>High data volume, low algorithm complexity</p><p>Types of Big Data problem:</p><p>1. Inherent</p><p>2. More data gives a better result than a more complex algorithm</p></li>
<li><p>The good, the bad and the ugly of Big Data</p><p>Good - Lots of new tools, mostly open source</p><p>Bad - The term is being abused by marketing departments</p><p>Ugly - It can easily lead to over-reliance on systems that lack transparency and ignore specific data points: 'Computer says no', but nobody can explain why</p></li>
<li><p>It's important to know your algorithms</p></li>
<li><p>Turning an assumption into a line</p></li>
<li><p>There are lots of algorithms to understand</p></li>
<li><p>Statisticians</p></li>
<li><p>Quants</p></li>
<li><p>Data scientist</p></li>
<li><p>It's also important to know your data</p></li>
<li><p>Whatever we call our experts</p></li>
<li><p>Who's heard of Anscombe's quartet?</p></li>
<li><p>Same statistical properties, but</p><p>http://en.wikipedia.org/wiki/Anscombe's_quartet</p></li>
<li><p>Performance</p></li>
<li><p>Don't agonise over distros</p><p>"The performance of Hadoop distros are all the same to within 1 server within a cluster"</p><p>Stefan Groschupf, one of the creators of Hadoop</p></li>
<li><p>Small is still beautiful</p></li>
<li><p>Because latency</p></li>
<li><p>In terms of distance</p><p>http://loci.cs.utk.edu/dsi/netstore99/docs/presentations/keynote/sld023.htm</p></li>
<li><p>Interactive &gt; Real time</p></li>
<li><p>Questions?</p></li>
</ul>
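The Overview, Simple problems, Quant problems and Big Data problems slides place every problem on two axes: data volume and algorithm complexity. As a minimal sketch, the three regions can be written as a decision rule. The boolean inputs are a simplification assumed here, because the slides say only "high" and "low" and give no thresholds. `ProblemType` and `classify_problem` are hypothetical names used for illustration.

```python
from enum import Enum


class ProblemType(Enum):
    """The three regions of the Problem Types chart."""
    SIMPLE = "Simple"
    QUANT = "Quant"
    BIG_DATA = "Big Data"


def classify_problem(high_data_volume: bool, complex_algorithm: bool) -> ProblemType:
    """Map the two chart axes onto a problem type (hypothetical helper).

    Assumption: each axis is reduced to high/low, since the slides give
    no numeric thresholds for either one.
    """
    if complex_algorithm:
        # Quant problems: any data volume, high algorithm complexity.
        return ProblemType.QUANT
    if high_data_volume:
        # Big Data problems: high data volume, low algorithm complexity.
        return ProblemType.BIG_DATA
    # Simple problems: low data volume, low algorithm complexity.
    return ProblemType.SIMPLE


if __name__ == "__main__":
    # Walk all four corners of the chart.
    for volume in (False, True):
        for complexity in (False, True):
            kind = classify_problem(volume, complexity)
            print(f"high volume={volume!s:5} complex={complexity!s:5} -> {kind.value}")
```

The rule checks complexity first because the slides define Quant problems as covering "any data volume". A high-volume, high-complexity problem therefore lands in Quant rather than Big Data.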
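The "Who's heard of Anscombe's quartet?" and "Same statistical properties, but" slides make their point with a chart, but the numbers behind it are easy to check. The sketch below takes Anscombe's four published datasets, as listed on the Wikipedia page the slide links to. For each dataset it computes the mean and sample variance of x and y, the Pearson correlation, and the ordinary least-squares line (the kind of line the "Turning an assumption into a line" slide warns about). All four agree to two or three decimal places even though their scatter plots look nothing alike. The helper names are illustrative, and only the Python standard library is used.

```python
from math import sqrt
from statistics import fmean, variance

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
_X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (_X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (_X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (_X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics that look identical across the quartet."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx              # ordinary least-squares slope
    intercept = my - slope * mx    # the fitted line passes through the means
    return {
        "mean_x": mx,
        "var_x": variance(xs),     # sample variance (n - 1 denominator)
        "mean_y": my,
        "var_y": variance(ys),
        "r": sxy / sqrt(sxx * syy),  # Pearson correlation coefficient
        "slope": slope,
        "intercept": intercept,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean x={s['mean_x']:.2f} var x={s['var_x']:.2f} "
              f"mean y={s['mean_y']:.2f} var y={s['var_y']:.2f} "
              f"r={s['r']:.3f} y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

The printed summaries cannot tell the four datasets apart. Only plotting them reveals that dataset II is a curve, dataset III has a single outlier, and dataset IV is a vertical cluster plus one influential point.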
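The quote on the "Don't agonise over distros" slide can be turned into rough arithmetic. Assume, for illustration only, that cluster throughput $P$ scales roughly linearly with the number of servers $N$; the slide does not state this. On that assumption, a performance gap worth one server is a relative difference of about $1/N$:

$$
\frac{\Delta P}{P} \approx \frac{1}{N}, \qquad \text{e.g. } N = 50 \;\Rightarrow\; \frac{\Delta P}{P} \approx \frac{1}{50} = 2\%
$$

Read this way, the larger the cluster, the less the choice of distro matters compared with simply adding or removing capacity.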
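The "Because latency" and "In terms of distance" slides frame latency as distance. The arithmetic behind that framing is distance = speed × time. The sketch below converts a latency into how far a signal could travel in that time, first in a vacuum at the speed of light (299,792,458 m/s) and then in optical fibre. The fibre speed is assumed here to be roughly two thirds of the vacuum speed, a common approximation that does not come from the slides. `distance_m` and `round_trip_latency_s` are hypothetical helper names.

```python
# Speed of light in a vacuum, in metres per second (exact by definition of the metre).
SPEED_OF_LIGHT_M_PER_S = 299_792_458
# Assumption: light in optical fibre travels at roughly 2/3 of c.
FIBRE_FRACTION_OF_C = 2 / 3


def distance_m(latency_s: float, fraction_of_c: float = 1.0) -> float:
    """Distance in metres a signal covers in `latency_s` seconds."""
    return SPEED_OF_LIGHT_M_PER_S * fraction_of_c * latency_s


def round_trip_latency_s(one_way_m: float,
                         fraction_of_c: float = FIBRE_FRACTION_OF_C) -> float:
    """Physical lower bound on round-trip time over a link `one_way_m` metres long."""
    return 2 * one_way_m / (SPEED_OF_LIGHT_M_PER_S * fraction_of_c)


if __name__ == "__main__":
    for label, seconds in [("1 ns", 1e-9), ("1 us", 1e-6), ("1 ms", 1e-3)]:
        print(f"{label:>5}: {distance_m(seconds):>12,.1f} m in vacuum, "
              f"{distance_m(seconds, FIBRE_FRACTION_OF_C):>12,.1f} m in fibre")
    # Hypothetical 1,000 km link: best-case round trip over fibre, in milliseconds.
    print(f"1,000 km round trip: {round_trip_latency_s(1_000_000) * 1e3:.1f} ms")
```

The round-trip figure is only a floor. Real networks add switching, queuing and serialisation delays on top of it, and that physical floor is one way to read why the deck says small is still beautiful.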