Ruby for the soul of BigData Nerds

Upload: abhishek-parolkar

Post on 28-Nov-2014


Category: Technology



TRANSCRIPT

Page 1: Ruby for soul of BigData Nerds

Ruby for the soul of BigData Nerds

Page 2: Ruby for soul of BigData Nerds

Who Am I?

● Engineering Team Lead, Analytics & Data Platforms @ Viki.com

● Founder of http://BigData.SG

● Contributor to fluentd, pfeed, cartographer, watir

Page 3: Ruby for soul of BigData Nerds

BigData & Its Challenges

"big data" is when the size of the data itself becomes part of the problem - Mike Loukides

● Twitter produces over 230 million tweets per day

● Wal-Mart is logging one million transactions per hour

● Facebook creates over 30 billion pieces of content ranging from web links, news, blogs, photo

Page 5: Ruby for soul of BigData Nerds

Evolving Trends

Batch Processing: Hadoop, HPCC, Google BigQuery

Stream Processing: STORM (Twitter) & S4 (Yahoo)

Page 6: Ruby for soul of BigData Nerds

Common Engineering Challenges

● Data Collection

● Filtering / Segmentation

● Data Storage

● Analysis

● Visualization

● Prediction / Extrapolation

Page 7: Ruby for soul of BigData Nerds

Data Collection + Filtering / Segmentation

http://fluentd.org/

Page 8: Ruby for soul of BigData Nerds

Data Collection + Filtering / Segmentation

http://fluentd.org/

You send events as: http://domain:8080/namespace?key1=value1&key2=value2

Fluentd forwards the data as: <timestamp> <namespace> {key1:value1,key2:value2}
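
The mapping on this slide, from a URL to a (timestamp, namespace, record) triple, can be sketched in plain Ruby. This is only an illustration of the data shapes shown above: the helper names event_url and forwarded_event are hypothetical, nothing here talks to a running fluentd, and the exact HTTP input format fluentd accepts may differ from the slide.

```ruby
require "uri"
require "json"

# Build an event URL in the shape shown on the slide:
# http://domain:8080/namespace?key1=value1&key2=value2
def event_url(host, port, namespace, params)
  URI::HTTP.build(host: host, port: port, path: "/#{namespace}",
                  query: URI.encode_www_form(params)).to_s
end

# Turn such a URL into the triple the slide says is forwarded:
# <timestamp> <namespace> {key1:value1,key2:value2}
def forwarded_event(url, now = Time.now)
  uri = URI.parse(url)
  namespace = uri.path.sub(%r{\A/}, "")
  record = URI.decode_www_form(uri.query.to_s).to_h
  [now.to_i, namespace, record]
end

url = event_url("domain", 8080, "namespace",
                "key1" => "value1", "key2" => "value2")
puts url

# Time.at(0) is a fixed placeholder timestamp for the example.
ts, tag, record = forwarded_event(url, Time.at(0))
puts "#{ts} #{tag} #{record.to_json}"
```

The namespace in the URL path plays the role of the tag used for filtering and routing, while the query parameters become the record's key/value pairs.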

Page 9: Ruby for soul of BigData Nerds

Screencast: http://www.bigdata.sg/videos/fluentd/

Page 10: Ruby for soul of BigData Nerds

Storage

Hadoop HDFS

OpenTSDB (http://opentsdb.net)

SciDB (DMAS)

Page 11: Ruby for soul of BigData Nerds

Analysis

Hadoop Streaming (Ruby)

Hadoop Hive (Using rbhive)
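
For the Hadoop Streaming item above, a minimal sketch of what a Ruby mapper and reducer can look like. Hadoop Streaming passes records to the scripts on standard input and reads tab-separated key/value lines from standard output, with reducer input sorted by key. The word-count task and the names map_lines and reduce_lines are assumptions chosen for illustration; they are not taken from the talk.

```ruby
# Mapper: emit "word<TAB>1" for every whitespace-separated token.
def map_lines(lines)
  lines.flat_map { |line| line.split.map { |word| "#{word}\t1" } }
end

# Reducer: input lines arrive sorted by key, so equal keys are adjacent.
# Sum the counts of each run of identical keys.
def reduce_lines(lines)
  lines.map { |line| line.chomp.split("\t") }
       .chunk_while { |a, b| a[0] == b[0] }
       .map { |group| "#{group[0][0]}\t#{group.sum { |_, n| n.to_i }}" }
end

# When run as a script, pick the role from the first argument,
# e.g. "ruby wordcount.rb map" or "ruby wordcount.rb reduce".
if __FILE__ == $PROGRAM_NAME
  case ARGV[0]
  when "map"    then puts map_lines($stdin.each_line)
  when "reduce" then puts reduce_lines($stdin.each_line)
  end
end
```

The same logic can be checked locally without a cluster by sorting the mapper output before reducing, for example reduce_lines(map_lines(["b a", "a"]).sort).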

Page 12: Ruby for soul of BigData Nerds

Visualization

Custom Dashboard (Rails + Google Charts / d3.js)

Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com

Page 13: Ruby for soul of BigData Nerds

Stream Computing

Page 14: Ruby for soul of BigData Nerds

What is STORM?

Page 15: Ruby for soul of BigData Nerds

STORM terminology

● Streams

● Spouts

● Bolts

● Topologies
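
The four terms can be modelled in a few lines of plain Ruby. This is a toy, single-process sketch and not the Storm or RedStorm API: a stream is a sequence of tuples, a spout is the source of a stream, a bolt turns each incoming tuple into zero or more outgoing tuples, and a topology wires a spout to a chain of bolts. All class names below are hypothetical.

```ruby
# Bolt: one sentence in, many words out.
class SplitBolt
  def process(tuple)
    tuple.split
  end
end

# Bolt: one word in, one upcased word out.
class UpcaseBolt
  def process(tuple)
    [tuple.upcase]
  end
end

# Topology: connects a spout (any Enumerable of tuples) to bolts in order.
class Topology
  def initialize(spout, bolts)
    @spout = spout
    @bolts = bolts
  end

  # Returns a lazy stream, so tuples flow through the bolts one at a time.
  def run
    @bolts.reduce(@spout.lazy) do |stream, bolt|
      stream.flat_map { |tuple| bolt.process(tuple) }
    end
  end
end

# Spout: here just a fixed list standing in for an unbounded source.
spout = ["ruby for big data", "storm streams"]
words = Topology.new(spout, [SplitBolt.new, UpcaseBolt.new]).run.to_a
puts words.inspect
```

In a real Storm cluster the spouts and bolts run in parallel across machines and the stream never ends; the lazy enumerator here only mimics the tuple-at-a-time flow.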

Page 16: Ruby for soul of BigData Nerds

RedStorm (https://github.com/colinsurprenant/redstorm)

$ rvm use jruby-1.6.3
$ bundle install redstorm
$ bundle exec redstorm install

Page 17: Ruby for soul of BigData Nerds

Visualizing average bandwidth experienced by users while watching videos on viki.com across the globe.
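
The aggregation behind such a visualization can be sketched as a running average per country, updated as each measurement arrives. This is an assumption about the general approach, not the implementation used in the talk; the class name, the country codes, and the bandwidth values are made up for illustration.

```ruby
# Keeps a running average of bandwidth samples per country.
class RunningAverage
  def initialize
    @sum = Hash.new(0.0)
    @count = Hash.new(0)
  end

  # Record one sample and return the updated average for that country.
  def add(country, kbps)
    @sum[country] += kbps
    @count[country] += 1
    average(country)
  end

  # Current average, or nil if no sample has been seen for the country.
  def average(country)
    return nil if @count[country].zero?
    @sum[country] / @count[country]
  end
end

averages = RunningAverage.new
averages.add("SG", 1000)   # hypothetical sample, in kbps
averages.add("SG", 3000)   # hypothetical sample, in kbps
puts averages.average("SG")
```

Only a sum and a count are stored per country, so the state stays small no matter how many events flow through, which suits a stream-processing bolt.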

Page 18: Ruby for soul of BigData Nerds
Page 19: Ruby for soul of BigData Nerds
Page 20: Ruby for soul of BigData Nerds
Page 21: Ruby for soul of BigData Nerds
Page 22: Ruby for soul of BigData Nerds
Page 23: Ruby for soul of BigData Nerds
Page 24: Ruby for soul of BigData Nerds

Thank you!

Let's stay in touch :)

● Sign up for my newsletter at http://parolkar.com

● Visit the BigData.SG Meetup in Singapore.