bd day 1 1425. bigdataequalsbigmaths

Upload: krishnanand

Post on 01-Jun-2018


TRANSCRIPT

  • 8/9/2019 Bd Day 1 1425. Bigdataequalsbigmaths

    1/36

    Laurence Liew

    General Manager, APAC

    BigData

    = BigMaths


    2/36

    Global Industries Served

    Financial Services
    Digital Media
    Government
    Health & Life Sciences
    High Tech
    Manufacturing
    Retail
    Telco

    Our Software Delivers

    Power: Distributed, scalable high performance advanced analytics
    Productivity: Easier to build and deploy analytic applications
    Enterprise Readiness: Multi-platform

    Our Philosophy

    Customer-centric innovation
    Easy to do business with

    Who we are

    Leading provider of commercial analytics platform based on open source R statistical computing language

    Customers: 200+ Global 2000

    Global Presence: North America / EMEA / A

    Our Services Deliver

    Knowledge: Our experts enable you to be experts
    Time-to-Value: Our QuickStart projects give you a jumpstart
    Guidance: Our customer support team is here to help you


    3/36

    Consumer & Info Svcs

    200 Corporate Customers and Growing

    Finance & Insurance

    Healthcare & Life Sciences

    Manuf & Tech

    Academic & Govt

    Revolution Co


    4/36

    Centre of Excellence COE

    Partner with iLEs to create new IPs in big data analytics in Singapore

    Big data analytics training/workshops

    We will have our data scientists and developers working alongside our collaboration partners.


    5/36

    Centre of Attachment COA

    To accelerate formation of data science team wit

    organization

    Analytics/statistics skills

    Big data infrastructure skills such as Hadoop HPC clusters


    6/36

    THE PERFECT STORM

    CONVERGENCE OF

    Why Big Data Now?


    7/36

    Backdrop - Massive Data Volumes

    Increasing Volume, Variety and Velocity

    [Figure: data sources arranged along a scale of Gigabytes, Terabytes, Petabytes and Exabytes. Labels on the figure: ERP; Cost Records; Summary Operating Statistics; Vehicle Monitoring; Incidents; Alarms; Systems Logs; Volumes; Text Instructions; Workorders; Reports; Video And Imagery; Machine Sensors; Realtime Telemetry; 3D/4D Seismic; Communication Logs; Geospatial ESRI; Logistics; Daily Activity Reports.]

    7 Dec


    8/36

    What's big data?

    Volume Variety Velocity


    9/36

    Next Generation Big Data Analytics Players

    INFRASTRUCTURE AND D

    ANALYTICS

    ? ? ?

    HDD -> SSD -> In-Memo


    10/36

    What is R (Video)

    http://www.youtube.com/watch?feature=player_embedded&v=TR2bHSJ_eck


    11/36

    Statistical data analysis programming language

    Huge library algorithms for data acce

    analysis & graphics

    = Language + Analytics


    12/36

    Data Analytics Workflow

    INGEST DISTILL & ANALYZE CONSUM


    13/36

    R is open source and drives analytic innovation but... has some limitations for Enterprises

    Big Data: In memory bound -> Disk based scalability

    Speed of Analysis: Single threaded -> Parallel threading

    Enterprise Readiness: Community support -> Commercial support

    Analytic Breadth & Depth: 4500+ innovative analytic packages -> Leverage source pa plus Big ready pac

    Commercial Viability: Risk of deployment of open source -> Commercial License


    14/36

    Big Data Speed @ Scale with Revolution R Enterprise

    Fast Math Libraries

    Parallelized Algorithms

    In-Database Execution

    Multi-Threaded Execution

    Multi-Core Processing

    In-Hadoop Exe

    Memory Management

    Parallelized User Code


    15/36

    Revolution R Enterprise ScaleR: Performance and Capacity


    16/36

    SAS HPA Benchmarking comparison*
    Logistic Regression

                     SAS HPA        Revolution R   Slide callout
    Rows of data     1 billion      1 billion
    Parameters       just a few     7              Double
    Time             80 seconds     44 seconds     45%
    Data location    In memory      On disk
    Nodes            32             5              1/6th
    Cores            384            20             5%
    RAM              1,536 GB       80 GB          5%

    Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.

    *As published by SAS in HPC Wire, April 21, 2011

    Revolution R Enterprise Delivers Performance at 2% of the Cost

    32 nodes appliance ~ $2.5M
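
    The fractions quoted on the benchmarking slide can be checked directly from the table figures. The short Python sketch below is only an arithmetic check, not part of the original deck; the dictionary keys are names chosen here for illustration.

```python
from fractions import Fraction

# Figures as printed on the benchmarking slide.
sas = {"time_s": 80, "nodes": 32, "cores": 384, "ram_gb": 1536}
revo = {"time_s": 44, "nodes": 5, "cores": 20, "ram_gb": 80}


def ratio(key):
    """Revolution R figure as an exact fraction of the SAS HPA figure."""
    return Fraction(revo[key], sas[key])


ratios = {key: ratio(key) for key in sas}

for key, value in ratios.items():
    # Show each ratio both as a percentage and as "about 1/N".
    print(f"{key:8s} {float(value):6.1%}  (about 1/{float(1 / value):.1f})")
```

    Cores and RAM both come out at 5/96, a little over 5%, which matches "approximately a 20th". Nodes are 5/32, roughly a 6th, and the run time is 55% of the SAS figure, that is, 45% less.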


    17/36

    Benchmarks: RevoR vs legacy tool

    Airline data set: 123,534,969 rows and 29 columns in its original state.

    All tests were run on a laptop: 16GB RAM, SSD, and an i7-3632QM CPU.


    18/36

    Allstate compares SAS and R for BigData Insurance Models

    150 million observations and 70 degrees of freedom.

    "It's difficult to be productive on a tight schedule if it takes over 5 hours to fit one candidate models "

    Approach   Platform                              T
    1: SAS     16-core Sun Server                    5
    2: R       250 GB Server                         I
    3: RRE     5-node (4 cores / node) LSF cluster   5

    So what have we learned: SAS works, but is slow.

    The data is too big for open-source R, even on a very large server. Revolution R Enterprise gets the same results as SAS, but about 50x faster.


    19/36

    DistributedR

    ScaleR

    ConnectR

    DeployR

    Write Once. Deploy Anywhere.

    DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE

    In the Cloud CloudR

    Workstations & Servers

    Desktop

    Server: Linux

    Clustered Systems: Linux HP Windows

    EDW: Teradata

    Hadoop: Hortonworks, Cloudera


    20/36

    From Laptops, workstations and Servers


    21/36

    To High Performance Compute Clusters

    Frontend

    - 2-way or 4-

    - Cluster Ma

    - Fast HDD

    - Lots of RAM

    Comput

    - 2-way

    - Compu

    - Fast CP

    - Fast H

    - Lots of

    Storage Node

    - 2-way or NAS

    - external SCSI OR

    - external SAN

    - FAST HDD

    - Cluster FS

    Supercomputing Network

    - High Bandwidth (>250MB/s)

    - Low Latency (1.2-8us)

    - Cost effective: GE

    - Performance:

    - Infiniband

    - 1GE or 10 GE

    - NumaConnect

    Admin N

    - Good B

    - Route a

    - Typical


    22/36

    To Hadoop-scale on-disk analytics

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers

    1 node 12TB

    10 nodes

    100 nodes


    23/36

    To in-database


    24/36

    clusters

    We first became interested in shared me

    the programming paradigm I think we ar

    lot more people looking at this type of en

    - William W. Thigpen, Chief, Engineering Branch, NASA Advan

    SMP S

    8 nod

    16 CP

    256 co

    1TB R

    One L


    25/36

    To Cloud


    26/36

    Write Once. Deploy Anywhere.


    27/36

    Hadoop + R


    28/36

    Hadoop

    Dell PowerEdge Servers


    29/36

    Linear Regression with RevoR on a


    30/36

    Linear Regression with RevoR on a Hadoop Cluster!

    Total: ~ 2 lines of R code, Productivity of 50 times
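
    The deck does not show the two lines of R, nor does it say how RevoR spreads the fit across the cluster. As a rough illustration of why linear regression suits a Hadoop-style cluster, the Python sketch below fits y = a + b*x by having each chunk of data reduce to a few running sums that are then added together. This is a generic technique, not RevoR's API; the names Stats, summarize and fit and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Running sums that are sufficient to fit y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other):
        # Combining two partial results is plain addition ("reduce" step).
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def summarize(chunk):
    """'Map' step: reduce one chunk of (x, y) rows to its running sums."""
    s = Stats()
    for x, y in chunk:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit(stats):
    """Solve the least-squares normal equations from the combined sums."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x, split into chunks
# as if each chunk lived on a different node.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]
total = sum((summarize(c) for c in chunks), Stats())
intercept, slope = fit(total)
print(intercept, slope)
```

    Because each chunk is summarized independently and the summaries merely add, no node ever needs the whole data set in memory; only five numbers per chunk travel over the network.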


    31/36

    RevoR with Hadoop

    Complex & B


    32/36

    Big Analytics on Big Data in Hadoop

    100% R on Hadoop

    Full Skill Transfer - needed.

    Use 4500+ CRAN Package

    Blend Combine R & Othe / Methods

    100% Portability

    Build Once Deploy Ma

    Track Evolution of Hadoop

    Protect Against Platform Uncertainty

    Avoid Platform Lock-in

    Hadoop Performance & S

    Leverage Hadoop Parall Easily

    Analyze Data Without M

    Data

    Analytics

    Applications

    Hadoop

    +

    Scalable

    Compute

    HDFS

    HBase

    Portability.

    Parallel Storage

    Hive

    Big Data

    Scale

    100% R.


    33/36

    RRE V7 inside Hadoop

    [Diagram: Revolution R Enterprise (DistributedR Framework; ScaleR Algorithms; ConnectR: HBase, HDFS, ODBC & High-Speed Connectors) appears twice, together with an Edge Node and Hadoop (MapReduce, Other MapReduce Jobs, HDFS, HBase). Other labels: DeployR; DB, EDW; M2M; Applications; Analytics; Data.]


    34/36

    So how do I start?


    35/36

    www.bigdatastarterkit.com

    www.bigdataconsume


    36/36

    Q & A

    Revolution Analytics is the leading commercial provider of software and support for the popular open source R statistics language.

    E: Laurence.liew@revolutionanalytics.com

    W: www.revolutionanalytics.com