Scalability and Big Data at Senzari

SCALABILITY AND DATA ANALYTICS MATTER HCB (@boosc)

Upload: chris-boos

Post on 01-Dec-2014

TRANSCRIPT

Page 1: Scalability and Big Data at Senzari

SCALABILITY AND DATA ANALYTICS MATTER

HCB (@boosc)

Page 2: Scalability and Big Data at Senzari

Agenda

• Buzzword bingo

• Data

• Analytics

• Scalability

• Distributed and parallel concepts

• Technology and tools

• Senzari and big data

Page 3: Scalability and Big Data at Senzari

Buzzword Bingo

Big Data, Data Engineer

H-Space

Hadoop, Cassandra, HBase, Pig, redis.io, Eucalyptus

Machine Learning, Support Vector Machines

Gaussian Processes, Swarm Intelligence

Genetic Algorithms

Agents/Bots

R+, Natural Language Processing

Clustering, Core Dataset

NoStats

Page 4: Scalability and Big Data at Senzari

Data, lots of it

Page 5: Scalability and Big Data at Senzari

79 times more CPU power than used in Apollo missions on one iPhone

Page 6: Scalability and Big Data at Senzari

What we can do

Page 7: Scalability and Big Data at Senzari

Data

Page 8: Scalability and Big Data at Senzari

Knowledge pyramid

Data Processing 1960s 1950s Data

Data:

Unfiltered, Research, Creation, Gathering

Page 9: Scalability and Big Data at Senzari

Knowledge pyramid

Data Processing 1960s 1950s Data

Information Management 1980s 1970s Information

Information:

Organized Data, Patterns, Presentation

Page 10: Scalability and Big Data at Senzari

Knowledge pyramid

Data Processing 1960s 1950s Data

Information Management 1980s 1970s Information

Knowledge Management 1990s Knowledge

Knowledge:

Useful Patterns, Predictability, Conversation

Page 11: Scalability and Big Data at Senzari

Knowledge pyramid

Data Processing 1960s 1950s Data

Information Management 1980s 1970s Information

Knowledge Management 1990s Knowledge

Knowledge Ecology 2000s Intelligence

Intelligence: Choice, Understanding, Decision

Page 12: Scalability and Big Data at Senzari

Knowledge pyramid

Data Processing 1960s 1950s Data

Information Management 1980s 1970s Information

Knowledge Management 1990s Knowledge

Knowledge Ecology 2000s Intelligence

Wisdom 2010s Systems Thinking

Wisdom:

Evaluation, Interpretation, Retrospective

Page 13: Scalability and Big Data at Senzari

Knowledge pyramid

Data Processing 1960s 1950s Data

Information Management 1980s 1970s Information

Knowledge Management 1990s Knowledge

Knowledge Ecology 2000s Intelligence

Wisdom 2010s Systems Thinking

Yield

Page 14: Scalability and Big Data at Senzari

Why you need big data

Data Processing 1960s 1950s Data

Information Management 1980s 1970s Information

Knowledge Management 1990s Knowledge

Knowledge Ecology 2000s Intelligence

Wisdom 2010s Systems Thinking

Yield You Are Here!

Page 15: Scalability and Big Data at Senzari

Analytics

Page 16: Scalability and Big Data at Senzari

Even in simple datasets, common statistics (average, min, max, distribution) fail
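The slide does not say which dataset it shows, so the following is only a small sketch with two made-up samples. It illustrates how average, minimum and maximum can be identical while the underlying shape of the data is very different.

```python
from statistics import mean, pstdev

# Two invented samples (not from the talk) that share avg, min and max.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
two_clusters = [1, 1, 1, 1, 5, 9, 9, 9, 9]


def summary(values):
    """Return the simple summary statistics named on the slide."""
    return {"avg": mean(values), "min": min(values), "max": max(values)}


# The summaries are identical even though the samples look nothing alike.
print(summary(evenly_spread))
print(summary(two_clusters))

# A spread measure already hints at the difference; a plot shows it directly.
print(pstdev(evenly_spread), pstdev(two_clusters))
```

The second sample consists of two tight clusters plus one point in the middle, which none of the three summary values reveals.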

Page 17: Scalability and Big Data at Senzari

Finding clusters, evaluating outliers and interpreting white noise

Page 18: Scalability and Big Data at Senzari

Two tips for looking at data:

1. Plot it

2. Remove all labels

Page 19: Scalability and Big Data at Senzari

Scalability

Page 20: Scalability and Big Data at Senzari

Cloud Computing Is

When the IT guys are finally able to explain to business people what they were talking about 20 years ago!

Page 21: Scalability and Big Data at Senzari

=

Page 22: Scalability and Big Data at Senzari

Computation on demand

+ Pay as you go

Page 23: Scalability and Big Data at Senzari

BASE (Basically Available, Soft State, Eventual Consistency)

not

ACID (Atomicity, Consistency, Isolation, Durability)
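To make the BASE idea concrete, here is a minimal sketch of eventual consistency. The `Replica` class, the logical timestamps and the last-write-wins merge are assumptions chosen for illustration; they are not taken from the talk or from any particular database.

```python
class Replica:
    """Toy key-value replica that accepts writes without coordination."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (logical_timestamp, value)

    def write(self, key, value, timestamp):
        # Basically available: the write succeeds locally right away.
        self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        """Sync step: keep the newer version of every key (last write wins)."""
        for key, (timestamp, value) in other.store.items():
            if key not in self.store or self.store[key][0] < timestamp:
                self.store[key] = (timestamp, value)


a, b = Replica("a"), Replica("b")
a.write("plays", 10, timestamp=1)
b.write("plays", 12, timestamp=2)

stale = a.read("plays")  # soft state: replica a has not seen the newer write

a.merge_from(b)  # after the replicas exchange state ...
b.merge_from(a)  # ... both converge on the same value
```

Between the writes and the merge, readers may see different values on different replicas. An ACID system would rule that out, at the cost of coordination on every write.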

Page 24: Scalability and Big Data at Senzari

How to scale (AWS Example)

• Do not allocate instances manually

• Each component needs to be independent

• Plan for failure

• Actively provoke failure
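The last three bullets can be sketched without any cloud API. The `Cluster` class and its methods below are hypothetical stand-ins, not AWS calls: workers are independent, the caller plans for a dead worker by failing over, and a worker is killed on purpose to check that the plan holds.

```python
class InstanceDown(Exception):
    """Raised when a simulated worker instance is unavailable."""


class Cluster:
    """Hypothetical pool of independent workers (no real cloud API involved)."""

    def __init__(self, worker_ids):
        self.alive = set(worker_ids)

    def kill(self, worker_id):
        # Actively provoke failure: take an instance down deliberately.
        self.alive.discard(worker_id)

    def call(self, worker_id, payload):
        if worker_id not in self.alive:
            raise InstanceDown(f"worker {worker_id} is down")
        return f"{payload} handled by worker {worker_id}"


def call_with_failover(cluster, worker_ids, payload):
    """Plan for failure: try the next independent worker when one is down."""
    for worker_id in worker_ids:
        try:
            return cluster.call(worker_id, payload)
        except InstanceDown:
            continue
    raise InstanceDown("all workers are down")


cluster = Cluster([0, 1, 2])
cluster.kill(0)  # provoke the failure before it happens in production
result = call_with_failover(cluster, [0, 1, 2], "job-1")
print(result)
```

Because no worker depends on another, losing one only costs a retry.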

Page 25: Scalability and Big Data at Senzari

Human Software

• Click Workers and Mechanical Turks are not just cheap labour

• They let programmers hand tasks that cannot be handled algorithmically over to humans

• Use them to

• Do things too complicated for machine learning

• Pre-populate machine learning spaces

Page 26: Scalability and Big Data at Senzari

Distributed and parallel concepts

Page 27: Scalability and Big Data at Senzari

Imperative Programming

• Step-by-step explanation of what to do

• Explaining HOW to do something rather than the RESULTS you want

• Always necessary for basic algorithms


Page 28: Scalability and Big Data at Senzari

Functional Programming I

• Combine results to become a program

• Allows dynamic distribution

• Map-Reduce is only one way of doing it!
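As a rough sketch of these bullets, the word count below is built from two pure functions. The sample lines and function names are invented for illustration. Because the map step has no side effects, the same code runs unchanged whether it is applied sequentially or handed to a worker pool.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(line):
    """Map step: a pure function from one line to its word counts."""
    return Counter(line.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


lines = ["big data big tools", "data beats opinion", "big ideas"]

# Sequential: the program is just the combination of the partial results.
sequential = reduce(merge_counts, map(count_words, lines), Counter())

# Distributed: the same functions, with the map step spread over a pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = reduce(merge_counts, pool.map(count_words, lines), Counter())

print(sequential == parallel)
```

A thread pool stands in here for a cluster. The point is only that nothing in `count_words` or `merge_counts` has to change when the execution strategy does.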


Page 29: Scalability and Big Data at Senzari

Functional Programming II

F(G(H(A, B), C), D)

getMusicLikes(getFriends(facebookID))

instead of

for i in getFriends(facebookID): getMusicLikes(i)
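The contrast on this slide can be made runnable with in-memory stand-ins. The data and the Python function names below are hypothetical and only mirror the names on the slide; no real Facebook API is involved.

```python
# Hypothetical in-memory data standing in for the remote calls on the slide.
FRIENDS = {"alice": ["bob", "carol"]}
MUSIC_LIKES = {"bob": ["Daft Punk"], "carol": ["Daft Punk", "Moderat"]}


def get_friends(facebook_id):
    """Return the friend ids of one user."""
    return FRIENDS.get(facebook_id, [])


def get_music_likes(user_ids):
    """Take a whole collection, so the callee decides how to split the work."""
    return {user_id: MUSIC_LIKES.get(user_id, []) for user_id in user_ids}


# Functional style: compose the two functions and ask for the result.
likes = get_music_likes(get_friends("alice"))

# Imperative style: the caller fixes the order, one friend at a time.
likes_loop = {}
for friend in get_friends("alice"):
    likes_loop[friend] = MUSIC_LIKES.get(friend, [])

print(likes == likes_loop)
```

Both versions return the same result. The composed version leaves the callee free to batch or parallelise the lookups, which the explicit loop rules out.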

Page 30: Scalability and Big Data at Senzari

Technology and tools

Page 31: Scalability and Big Data at Senzari

Data Storage

• Cassandra - for write performance

• HBase - for read performance

• Redis.io - for predictable operation time

Page 32: Scalability and Big Data at Senzari

Other Data Storage

• MongoDB - NoSQL for beginners (close to SQL, but scalability is very manual)

• sones - Graph DB (Windows based)

• CouchDB, etc. - nice concepts, lots of great ideas, but communities too small

Page 33: Scalability and Big Data at Senzari

Distributed Computing

• Hadoop

• ZooKeeper as a distributed lock service (DLS)

Page 34: Scalability and Big Data at Senzari

Languages

• ERLANG

• HASKELL

• SCALA

• Lisp

• Prolog

• Mathematica

Page 35: Scalability and Big Data at Senzari

No, You Don't Have to Learn Erlang: Use Hadoop Streaming With Python

(Diagram: Program 1 writes lines to STDOUT; the lines are fed to instances of Program 2.)
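A minimal sketch of the Hadoop Streaming idea in Python follows. In a real streaming job the mapper and the reducer would be two separate scripts that read `sys.stdin` and print to STDOUT, and Hadoop would sort the mapper output by key in between. Here both are plain generator functions, and the hypothetical helper `run_locally` simulates the sort step so the example runs in a single process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce without a cluster (illustration only)."""
    return list(reducer(sorted(mapper(lines))))


output = run_locally(["Big data", "big tools"])
for line in output:
    print(line)
```

The only contract between the two programs is lines of text with a tab between key and value, which is why any language that can read and write standard streams will do.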

Page 36: Scalability and Big Data at Senzari

Check out my tool list: http://www.hcboos.net/100-links/

Page 37: Scalability and Big Data at Senzari

Senzari and big data

Page 38: Scalability and Big Data at Senzari

The AMP3 Platform: Adaptable Music Parallel Processing Platform

Page 39: Scalability and Big Data at Senzari

Behind AMP

Page 40: Scalability and Big Data at Senzari

Technologies

• AWS: EC2, S3, EBS, SNS, ELB

• Cassandra + Hadoop + Solandra

• Zookeeper

• Dynamic scaling server (Lich Lord)

• Asynchronous messaging system

• Modules built in Python

Page 41: Scalability and Big Data at Senzari

Effects

• Built on top of a Python platform

• Fully automated scaling

• Fully distributed data processing

• Message channels allow code decoupling

• Message channels allow replay

• Message channels allow outtasking
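The three message-channel bullets can be illustrated with a toy channel. This is not the AMP3 implementation. The `Channel` class is an assumption, and it delivers messages synchronously for brevity, whereas the platform described above is asynchronous.

```python
class Channel:
    """Toy append-only message channel (illustrative, synchronous)."""

    def __init__(self):
        self._log = []  # every message ever published, in order
        self._subscribers = []

    def subscribe(self, handler, replay=False):
        """Register a handler; optionally feed it the full history first."""
        if replay:
            for message in self._log:
                handler(message)
        self._subscribers.append(handler)

    def publish(self, message):
        """Record the message, then hand it to every current subscriber."""
        self._log.append(message)
        for handler in self._subscribers:
            handler(message)


channel = Channel()

# Decoupling: the publisher knows the channel, not the consumers.
played = []
channel.subscribe(played.append)
channel.publish({"event": "play", "track": "t1"})
channel.publish({"event": "skip", "track": "t2"})

# Replay: a consumer added later can still process the full history.
late = []
channel.subscribe(late.append, replay=True)
```

Because consumers only see messages, a handler could just as well forward a message to an external worker, which is one way to read the out-tasking bullet.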

Page 42: Scalability and Big Data at Senzari

Thank You for Your Time