Post on 17-Jul-2015

  • copyright 2015

    Big Data: debunking some of the myths

    Chris Swan @cpswan

  • copyright 2015

    Agenda

    My background

    What do I mean by big data?

    Know your algorithm

    Know your data

    Performance

  • copyright 2015

    My background

    CTO

    CTO Client Experience

    Co-head CTO Security

    Corporate Finance: fintech, early stage

    IT R&D: networks and security; grid, app server engineering

    Combat System Engineer

  • copyright 2015

    Recent adventure with Big Data

  • copyright 2015

    Misquoting Roger Needham

    Whoever thinks their analytics problem is solved by big data doesn't understand their analytics problem and doesn't understand big data

  • copyright 2015

    What do I mean by big data?

  • copyright 2015

    Overview

    Based on a blog post from April 2012 http://is.gd/swbdla

    [Figure: Problem Types chart. Axes: Algorithm Complexity and Data Volume. Regions: Simple, Big Data, Quant.]

  • copyright 2015

    Simple problems

    Low data volume, low algorithm complexity

    [Figure: Problem Types chart. Axes: Algorithm Complexity and Data Volume. Regions: Simple, Big Data, Quant.]

  • copyright 2015

    Quant Problems

    Any data volume, high algorithm complexity

    [Figure: Problem Types chart. Axes: Algorithm Complexity and Data Volume. Regions: Simple, Big Data, Quant.]

  • copyright 2015

    Big Data Problems

    High data volume, low algorithm complexity

    [Figure: Problem Types chart. Axes: Algorithm Complexity and Data Volume. Regions: Simple, Big Data, Quant.]

    Types of Big Data Problem:

    1. Inherent

    2. More data gives a better result than a more complex algorithm

  • copyright 2015

    The good, the bad and the ugly of Big Data

    Good - Lots of new tools, mostly open source

    Bad - Term being abused by marketing departments

    Ugly - Can easily lead to over-reliance on systems that lack transparency and ignore specific data points: 'Computer says no', but nobody can explain why

  • copyright 2015

    It's important to know your algorithms

  • copyright 2015

    Turning an assumption into a line

  • copyright 2015

    There are lots of algorithms to understand

  • copyright 2015

    Statisticians

  • copyright 2015

    Quants

  • copyright 2015

    Data scientist

  • copyright 2015

    It's also important to know your data

  • copyright 2015

    Whatever we call our experts

  • copyright 2015

    Who's heard of Anscombe's quartet?

  • copyright 2015

    Same statistical properties, but

    http://en.wikipedia.org/wiki/Anscombe's_quartet
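
    The point of the quartet can be checked in a few lines. The sketch below uses the published Anscombe (1973) values and computes the usual summary statistics for each of the four datasets by hand; the names QUARTET, summarise and SUMMARIES are illustrative and not from the talk. The numbers come out near-identical, which is why plotting the data matters.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return means, sample variances, Pearson r and the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                # least-squares slope of y on x
    intercept = my - slope * mx      # the fitted line passes through the means
    r = sxy / (sxx * syy) ** 0.5     # Pearson correlation coefficient
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),
        "r": r, "slope": slope, "intercept": intercept,
    }


SUMMARIES = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

    Each dataset fits roughly the same line, y = 3 + 0.5x, with a correlation of about 0.82, yet the four scatter plots look nothing alike.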

  • copyright 2015

    Performance

  • copyright 2015

    Don't agonise over distros

    "The performance of Hadoop distros are all the same to within 1 server within a cluster"

    Stefan Groschupf, one of the creators of Hadoop

  • copyright 2015

    Small is still beautiful

  • copyright 2015

    Because latency

  • copyright 2015

    In terms of distance

    http://loci.cs.utk.edu/dsi/netstore99/docs/presentations/keynote/sld023.htm
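
    One way to make the link between latency and distance concrete is the speed-of-light floor on a round trip. This is a minimal sketch of that reading, not a reproduction of the linked slide; the function names and the speed_fraction parameter (for media slower than a vacuum) are assumptions introduced here.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # exact, by definition of the metre


def min_round_trip_ms(distance_km, speed_fraction=1.0):
    """Lower bound on round-trip time in milliseconds over distance_km.

    speed_fraction is a hypothetical knob for the signal speed relative to
    light in a vacuum (1.0 means a vacuum; real links are slower).
    """
    if distance_km < 0 or not 0 < speed_fraction <= 1:
        raise ValueError("distance must be >= 0 and 0 < speed_fraction <= 1")
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * speed_fraction)
    return 2 * one_way_s * 1000


def light_distance_m(seconds):
    """Distance in metres that light covers in a vacuum in the given time."""
    return SPEED_OF_LIGHT_KM_PER_S * 1000 * seconds


if __name__ == "__main__":
    print(f"1000 km round trip: at least {min_round_trip_ms(1000):.2f} ms")
    print(f"Light travels {light_distance_m(1e-9):.3f} m in one nanosecond")
```

    No amount of hardware removes this floor, which is one reason keeping data close to the computation still pays off.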

  • copyright 2015

    Interactive > Real time

  • copyright 2015

    Questions?
