Why Big Data - the data rush


Upload: robert-gibbon

Post on 02-Jul-2015

DESCRIPTION

A slide deck that I put together with my thoughts on the main economic drivers that have led to the new "data rush" and the commercialisation of grid computing, and some examples of the many diverse applications for Hadoop in the contemporary, third-wave knowledge economy.

TRANSCRIPT

Page 1: Why Big Data - the data rush

Why Big Data

THE INFORMATION AGE

The so-called “economic third wave” has bankrupted or seriously damaged many blue-chip organisations

Traditional manufacturing and retail are in rapid and steep decline in Europe and the US

Technology, connectivity and access to information are restructuring our societies

Levels of political and social engagement have surged

Peer-to-peer lending platforms have revolutionised banking in many countries

NEW AGE, NEW ORDER

Manufacturing is shifting from the mass-production model of the 20th century back to build-to-order production

High street stores are being used as showrooms while the actual sale is made online

Web-based services run on tiny profit margins and huge transaction volumes

Systems like Amazon Marketplace, Etsy and eBay are empowering small businesses, delivering globalised trade and driving socioeconomic change on a scale never seen before

INNOVATION

Mass-production rarely benefits from innovation

Innovation drives change: a huge cost with little benefit for production-line-driven economies

“Refinement of product” mentality

Knowledge services need to innovate to differentiate

Change in a virtual world can be cheap and yield huge rewards

“Reinvention of product” mentality

THE ROVER BICYCLE, 1885

A SHIFT IN DEMANDS

Shifting emphasis from mass-production to knowledge services and build-to-order production means shifting priorities

Innovation and change become more valued attributes than stability and reliability

LONG TAIL

Long-tail economics underpin the information age

[Figure: long-tail distribution, with a head of "only the most popular / highest value" products (Walmart, Best Buy) and a tail of "everything else / lower value" (Amazon, eBay, Netflix)]
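The long-tail argument can be made concrete with a simple popularity model. The sketch below assumes, purely for illustration, that sales fall off with product rank according to Zipf's law (rank r sells in proportion to 1/r); the catalogue sizes are hypothetical. It compares the sales share of the 1,000 best sellers (what a shelf-limited store can stock) with the share sitting in the rest of a 1,000,000-item online catalogue.

```python
def zipf_sales(n_items: int, s: float = 1.0) -> list[float]:
    """Relative sales for items ranked 1..n_items, assuming rank r sells 1 / r**s."""
    return [1.0 / r**s for r in range(1, n_items + 1)]


def head_tail_split(sales: list[float], head_size: int) -> tuple[float, float]:
    """Return (head share, tail share) of total sales; the head is the top head_size items."""
    total = sum(sales)
    head = sum(sales[:head_size])
    return head / total, (total - head) / total


# Hypothetical catalogues: a physical store shelves the top 1,000 items,
# an online store lists 1,000,000.
sales = zipf_sales(1_000_000)
head_share, tail_share = head_tail_split(sales, 1_000)
print(f"head: {head_share:.0%}, tail: {tail_share:.0%}")
```

Under this assumed distribution the tail beyond the first 1,000 items accounts for nearly half of all sales: revenue a long-tail retailer can reach and a shelf-limited one cannot. Real demand curves vary, so the exact split depends on the exponent `s`.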

BIG DATA AND THE LONG TAIL

Knowledge and information-driven services are following the “long-tail” paradigm in many ways, including processing huge amounts of low-value data to yield profit

e.g. Google Now, Amazon recommendations, eBay search, Facebook Exchange

BIG DATA AND INNOVATION

In a competitive free market like the World Wide Web, innovation is valued because it can open up new opportunities

Consumer-grade access to grid computing technology is a recent innovation

Grid computing can open up new opportunities that would otherwise not be addressable

It is an excellent solution to the needs of ventures architected around the long-tail economic model
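Hadoop, the grid computing platform this deck discusses, processes data with the MapReduce model. The following is a toy, single-process sketch of its three phases (map, shuffle, reduce), counting product clicks in a log; the log format and function names are hypothetical, and a real Hadoop job would run each phase in parallel across many nodes with the data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Hashable, Iterable, Iterator


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[Hashable, int]]],
    reducer: Callable[[Hashable, list[int]], int],
) -> dict:
    # Map phase: every record emits zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(record) for record in records)
    # Shuffle phase: group all values by key. On a cluster this moves data between nodes.
    groups: dict = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def clicks_per_product(log_line: str) -> Iterator[tuple[str, int]]:
    # Hypothetical log format: "<user_id> <product_id>"
    _user_id, product_id = log_line.split()
    yield product_id, 1


def sum_values(_key: Hashable, values: list[int]) -> int:
    return sum(values)


logs = ["u1 p42", "u2 p42", "u1 p7", "u3 p42"]
print(map_reduce(logs, clicks_per_product, sum_values))  # {'p42': 3, 'p7': 1}
```

Because the mapper and reducer only ever see one record or one key at a time, the framework is free to spread them across as many machines as the data requires, which is what makes huge volumes of low-value data cheap to process.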

CURRENT TREND

Industrial economies and traditional production line manufacturing require stability, reliability and minimal change

Knowledge economies thrive on innovation and process huge amounts of information

The US and Europe are transitioning from industrial to knowledge economies

Big Data concepts and technologies are a key enabler for the new economy

THE FUTURE - THINGTERNET

The Internet of Things is already with us

Billions of connected devices, even e-tattoos

INTERNET OF THINGS AND BIG DATA

Billions of connected devices create a huge amount of data to process

Until grid computing, the IoT was technically nearly impossible to implement
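A back-of-the-envelope calculation shows the scale involved. Every figure below is an assumption chosen for illustration, not a measurement: one billion devices, each sending a 1 KB reading once a minute.

```python
def daily_volume_bytes(devices: int, bytes_per_reading: int, readings_per_day: int) -> int:
    """Raw bytes generated per day by a fleet of devices reporting at a fixed rate."""
    return devices * bytes_per_reading * readings_per_day


# Hypothetical fleet: one billion devices, each sending a 1 KB (1,000-byte)
# reading once a minute.
volume = daily_volume_bytes(10**9, 1_000, 24 * 60)
print(f"{volume / 10**15:.2f} PB per day")  # 1.44 PB per day
```

Even with small, infrequent readings, the fleet produces petabytes per day, far beyond what a single database server can ingest or store.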

INTERNET OF THINGS IS A WILD WEST

The IoT poses many new, unsolved challenges

For example, an internet-connected alarm clock that records how often you sleep late could be accessed by HR for employee performance evaluations

But new challenges = new opportunities

CLASSIC BIG DATA APPLICATIONS

STORAGE

Hadoop can be used purely for online data storage, with no direct processing

Low cost per GB for petascale online storage

The option of querying or analysing the data directly, if required
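When estimating cost per GB, note that HDFS, Hadoop's file system, keeps several copies of every block for fault tolerance (three by default, controlled by the `dfs.replication` setting), so raw disk must exceed the data volume. A rough sizing sketch; the 25% free-space headroom and the node configuration are assumptions, not recommendations.

```python
import math


def raw_capacity_tb(usable_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk (TB) needed to store usable_tb of data on HDFS.

    replication: copies kept of each block (HDFS defaults to 3 via dfs.replication).
    headroom: fraction of raw disk left free for temporary data (assumed 25%).
    """
    return usable_tb * replication / (1 - headroom)


def nodes_needed(usable_tb: float, raw_tb_per_node: float, **kwargs) -> int:
    """Number of data nodes required, rounding up to whole machines."""
    return math.ceil(raw_capacity_tb(usable_tb, **kwargs) / raw_tb_per_node)


# Hypothetical: 1 PB (1,000 TB) of data on nodes with 12 x 4 TB disks (48 TB raw each).
print(raw_capacity_tb(1_000))   # 4000.0
print(nodes_needed(1_000, 48))  # 84
```

The low per-GB price comes from using commodity disks, so the replication overhead is usually an acceptable trade for not needing expensive redundant storage hardware.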

PRODUCT SEARCH

A huge, constantly changing catalogue of products, like eBay and Amazon

Simple keyword search matching customer to product

SolrCloud: a full-text search engine indexing and serving up terabytes of live content, running on Hadoop clusters
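Search engines such as Solr (built on the Lucene library) answer keyword queries from an inverted index that maps each term to the documents containing it. A toy in-memory sketch of keyword matching; the catalogue and function names are hypothetical and this is not Solr's API. A production engine adds tokenisation rules, stemming, relevance ranking and, in SolrCloud, sharding and replication of the index across the cluster.

```python
from collections import defaultdict


def build_index(catalogue: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased keyword to the set of product IDs whose title contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for product_id, title in catalogue.items():
        for term in title.lower().split():
            index[term].add(product_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return product IDs matching every keyword in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())  # keep only products containing this term too
    return results


catalogue = {
    "p1": "Red cotton T-shirt",
    "p2": "Blue cotton jeans",
    "p3": "Red leather jacket",
}
index = build_index(catalogue)
print(sorted(search(index, "red cotton")))  # ['p1']
```

Lookups cost time proportional to the matching postings, not to the catalogue size, which is why the approach scales to constantly changing catalogues of millions of products.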

BEHAVIOURAL TARGETING

Matching advertising content with users based on the user's demographic and interests – like Google AdWords

Behavioural targeting can yield twice as many conversions (e.g. click-throughs) as untargeted advertising

Generates a huge amount of log data which is used for reporting and reprocessed for predictive analysis

Predictive analysis is compute-intensive, with TBs of data per day
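At its simplest, targeting means scoring each candidate ad against what is known about the user. A minimal sketch, assuming users and ads are both described by interest tags and using Jaccard similarity (shared tags divided by all tags) as the score; this is an illustrative stand-in, not how AdWords works, and the ad IDs and tags are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two tag sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_ad(user_interests: set[str], ads: dict[str, set[str]]) -> str:
    """Return the ID of the ad whose target tags best overlap the user's interests."""
    return max(ads, key=lambda ad_id: jaccard(user_interests, ads[ad_id]))


ads = {
    "running-shoes": {"sport", "running", "fitness"},
    "city-break": {"travel", "hotels"},
    "laptop": {"tech", "gadgets"},
}
user = {"running", "fitness", "travel"}
print(pick_ad(user, ads))  # running-shoes
```

In practice the user's interest tags would themselves be derived from the log data described above, in batch jobs on the cluster; the per-request scoring is cheap, the profiling is not.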

PRODUCT RECOMMENDERS

Recommending products to the user based on their demographic and interests, other [similar] users' purchase histories, and their current browsing pattern

Like Amazon and Zalando recommendations

A hybrid between behavioural ad targeting and product search

Combines product catalogue, clickstream data and passive user profiling, possibly running live in-session
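One common building block for recommenders is item-to-item co-occurrence ("customers who bought X also bought Y"). A minimal sketch on hypothetical basket data; it covers only the purchase-history signal, not the demographic or in-session signals listed above, and it is not a description of Amazon's or Zalando's actual methods.

```python
from collections import Counter
from itertools import permutations


def co_occurrence(baskets: list[set[str]]) -> dict[str, Counter]:
    """For each item, count how often every other item appears in the same basket."""
    counts: dict[str, Counter] = {}
    for basket in baskets:
        for a, b in permutations(basket, 2):  # every ordered pair of distinct items
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts: dict[str, Counter], viewed: str, k: int = 2) -> list[str]:
    """Top-k items most often bought together with the viewed item."""
    return [item for item, _ in counts.get(viewed, Counter()).most_common(k)]


baskets = [
    {"boots", "socks", "laces"},
    {"boots", "socks", "laces"},
    {"boots", "socks"},
    {"boots", "umbrella"},
]
counts = co_occurrence(baskets)
print(recommend(counts, "boots"))  # ['socks', 'laces']
```

Building the co-occurrence table is the expensive, batch part (a natural MapReduce job over all purchase histories); serving a recommendation is then a cheap lookup that can run live in-session.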

EMERGING BIG DATA APPLICATIONS

SELF SERVICE BIG DATA BUSINESS INTELLIGENCE

So-called “Enterprise Data Hub” paradigm

The fastest-growing use case in 2014 on Yahoo's YGrid, a set of 16 clusters comprising 32,500 Hadoop nodes

Sales, accounting, executive and other business users run the data analysis jobs themselves on the available datasets using discovery tools like MicroStrategy, Tableau and Tibco Spotfire

DATA WAREHOUSING

Many migrations of classical Enterprise Data Warehousing applications to Hadoop

2-3x or greater performance gains over Teradata on 3 TB to 30 TB workloads

Huge cost savings versus traditional enterprise technologies like Oracle and Teradata

Fraud detection, e.g. credit card, medical insurance, welfare

Credit risk appraisal, e.g. credit card applications

Banking and retail batch processes
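Fraud detection on the cluster usually means scanning very large transaction histories for anomalies. As a minimal, hedged illustration (not a production fraud model), the sketch below flags unusual amounts with the "modified z-score" based on the median absolute deviation (MAD); the 0.6745 constant and 3.5 cut-off follow a common rule of thumb, and the transaction data is hypothetical.

```python
from statistics import median


def flag_outliers(amounts: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of amounts whose modified z-score exceeds the threshold.

    Median and MAD are used instead of mean and standard deviation so that a
    single huge transaction cannot mask itself by inflating the spread.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread at all: nothing can be scored as unusual
    return [i for i, x in enumerate(amounts) if 0.6745 * abs(x - med) / mad > threshold]


amounts = [42.0, 55.0, 38.5, 61.0, 47.0, 52.5, 44.0, 58.0, 49.0, 2500.0]
print(flag_outliers(amounts))  # [9]
```

Real systems score many features per account (merchant, location, time of day) and run such checks per customer across the whole history, which is where the volume, and the need for a cluster, comes from.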

OLTP DBMS

Many large-scale OLTP DBMS implementations use HBase, Accumulo or other NoSQL grid databases

For low latency, high throughput, high concurrency, high volume

e.g. share dealing, real-time ad auctions

Volumes of 200BN transactions per day reliably served in real time
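To put 200BN transactions per day in perspective, it helps to convert it into a per-second rate. The peak-to-average ratio of 3 used below is a hypothetical assumption; real traffic profiles vary.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tps(transactions_per_day: float) -> float:
    """Average transactions per second implied by a daily volume."""
    return transactions_per_day / SECONDS_PER_DAY


avg = average_tps(200e9)
# Hypothetical peak-to-average ratio; real traffic profiles vary.
peak = avg * 3
print(f"average: {avg:,.0f}/s, assumed peak: {peak:,.0f}/s")
```

An average of over two million transactions per second, with higher peaks, is far beyond a single database server, which is why such workloads are spread across a grid database.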

RESEARCH

Low-cost solution for mapping the human genome

About 4TB of data per person

e.g. cancer research, personalised drugs

DEVICE MANAGEMENT

Automated, managed service for analysis of, and response to, threats detected by an SPI module on a remote switch

Central heating system management: shut down the boiler when nobody is home to reduce heating bills and emissions, e.g. Nest

Monitor drivers' propensity to break the speed limit and apply lower insurance premiums to good drivers
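The usage-based insurance example reduces to scoring telematics data per driver. A minimal sketch; the reading format, the 2% cut-off and the 15% discount are all hypothetical, not any insurer's actual rule.

```python
from dataclasses import dataclass


@dataclass
class SpeedSample:
    """One telematics reading: vehicle speed and the posted limit at that location."""
    speed_kmh: float
    limit_kmh: float


def speeding_ratio(samples: list[SpeedSample]) -> float:
    """Fraction of readings in which the driver exceeded the speed limit."""
    if not samples:
        return 0.0
    over = sum(1 for s in samples if s.speed_kmh > s.limit_kmh)
    return over / len(samples)


def premium_multiplier(ratio: float, cutoff: float = 0.02, discount: float = 0.15) -> float:
    """Hypothetical pricing rule: drivers who speed in at most `cutoff` of their
    readings get `discount` off the premium; everyone else pays the base rate."""
    return 1.0 - discount if ratio <= cutoff else 1.0


trip = [SpeedSample(48, 50), SpeedSample(52, 50), SpeedSample(95, 100), SpeedSample(110, 100)]
print(speeding_ratio(trip), premium_multiplier(speeding_ratio(trip)))  # 0.5 1.0
```

On a big data platform the same per-driver aggregation would run as a batch job over the readings from the whole insured fleet.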
