Ruby for the soul of BigData Nerds
Who Am I?
● Engineering Team Lead, Analytics & Data Platforms @ Viki.com
● Founder of http://BigData.SG
● Contributor to fluentd, pfeed, cartographer, watir
BigData & Its Challenges
"Big data" is when the size of the data itself becomes part of the problem. - Mike Loukides
● Twitter produces over 230 million tweets per day
● Wal-Mart is logging one million transactions per hour
● Facebook creates over 30 billion pieces of content ranging from web links, news, blogs, photo
Everyone has a big data problem
Evolving Trends
Batch Processing: Hadoop, HPCC, Google BigQuery
Stream Processing: STORM (Twitter) & S4 (Yahoo)
Common Engineering Challenges
● Data Collection
● Filtering / Segmentation
● Data Storage
● Analysis
● Visualization
● Prediction / Extrapolation
Data Collection + Filtering / Segmentation
http://fluentd.org/
You send events as: http://domain:8080/namespace?key1=value1&key2=value2
Fluentd forwards the data as: <timestamp> <namespace> {key1:value1,key2:value2}
Screencast: http://www.bigdata.sg/videos/fluentd/
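The URL-to-event mapping on this slide can be sketched in plain Ruby. This is only an illustration of the format shown above, not Fluentd's own code: the `event_from_url` helper and the integer timestamp are assumptions made for the example.

```ruby
require "uri"

# Hypothetical helper (not part of Fluentd): turns a URL of the form shown
# above into the [timestamp, namespace, record] triple from the slide.
def event_from_url(url, now = Time.now)
  uri = URI.parse(url)
  namespace = uri.path.sub(%r{\A/}, "")               # "/namespace" -> "namespace"
  record = URI.decode_www_form(uri.query.to_s).to_h   # "k=v&..." -> {"k" => "v"}
  [now.to_i, namespace, record]
end

timestamp, namespace, record =
  event_from_url("http://domain:8080/namespace?key1=value1&key2=value2")
puts "#{timestamp} #{namespace} #{record}"
```

The namespace doubles as the event's routing label, so one collector endpoint can receive many kinds of events.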
Analysis
Hadoop Streaming (Ruby)
Hadoop Hive (Using rbhive)
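Hadoop Streaming lets any program that reads lines from STDIN and writes tab-separated key/value lines to STDOUT act as a mapper or reducer, which is why Ruby fits in. Below is a minimal sketch of that contract, assuming a hypothetical input layout of `user_id,country,kbps`; the function names and sample data are made up for the example.

```ruby
# Mapper: emit one "country<TAB>kbps" pair per input line.
# Assumed input layout (hypothetical): "user_id,country,kbps".
def map_lines(lines)
  lines.map do |line|
    _user, country, kbps = line.chomp.split(",")
    "#{country}\t#{kbps}"
  end
end

# Reducer: Hadoop Streaming delivers mapper output sorted by key, so all
# lines for one country arrive together; average them per country.
def reduce_lines(lines)
  lines.map { |l| l.chomp.split("\t") }
       .chunk_while { |a, b| a[0] == b[0] }
       .map do |group|
         values = group.map { |_, v| v.to_f }
         "#{group[0][0]}\t#{(values.sum / values.size).round(2)}"
       end
end

# Local dry run: the sort stands in for Hadoop's shuffle phase.
sample = ["u1,SG,1200", "u2,US,3000", "u3,SG,800"]
puts reduce_lines(map_lines(sample).sort)
```

In a real job the two functions would live in separate scripts that read STDIN, passed to the streaming jar through its `-mapper` and `-reducer` options.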
Visualization
Custom Dashboard (Rails + Google Charts / d3.js)
Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com
Stream Computing
What is STORM?
STORM terminology
● Streams
● Spouts
● Bolts
● Topologies
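To make the terms concrete, here is a toy, single-process model in plain Ruby: a spout is a source of tuples, a bolt consumes tuples and emits new ones, a stream is the sequence of tuples flowing between them, and a topology wires them together. This is not the Storm or RedStorm API; every class name is made up for illustration, and a real topology is a distributed graph rather than a simple chain.

```ruby
# Toy model only: NOT the Storm or RedStorm API.

# Spout: a source that emits tuples into a stream.
class ListSpout
  def initialize(tuples)
    @tuples = tuples
  end

  def each(&block)
    @tuples.each(&block)
  end
end

# Bolt: consumes one tuple and emits zero or more new tuples.
class SplitBolt
  def process(sentence)
    sentence.split
  end
end

# Bolt that keeps a running count per word.
class CountBolt
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def process(word)
    @counts[word] += 1
    [[word, @counts[word]]]
  end
end

# Topology: wires a spout to a chain of bolts.
class Topology
  def initialize(spout, bolts)
    @spout = spout
    @bolts = bolts
  end

  def run
    @spout.each do |tuple|
      # The stream is the list of tuples passed from bolt to bolt.
      @bolts.reduce([tuple]) do |stream, bolt|
        stream.flat_map { |t| bolt.process(t) }
      end
    end
  end
end

counter = CountBolt.new
Topology.new(ListSpout.new(["big data", "big ruby"]), [SplitBolt.new, counter]).run
p counter.counts
```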
RedStorm (https://github.com/colinsurprenant/redstorm)
$ rvm use jruby-1.6.3
$ bundle install redstorm
$ bundle exec redstorm install
Visualizing average bandwidth experienced by users while watching videos on viki.com across the globe.
Thank you!
Let's stay in touch :)
● Sign up for my newsletter at http://parolkar.com
● Visit the BigData.SG Meetup in Singapore.