
Post on 28-Jul-2015


Big-Data Processing Utilizing Open-Source Technologies

32 Slides

Amir Sedighi
Rayanesh Dadegan Data Solutions Ltd.

May 2015

Amir Sedighi - May 2015 2

References

● http://www.slideshare.net/BernardMarr/140228-big-data-slide-share?qid=017848e2-9e2a-4dc3-963c-52b6a90fba2a&v=default&b=&from_search=1

● http://www.forbes.com/fdc/welcome_mjx.shtml

● ZYMR Spark Your Real-Time Big Data Analytics

● http://dataconomy.com

● https://datakulfi.wordpress.com/2013/03/27/big-data-open-source-technology-landscape/

● http://www.slideshare.net/andrefaria/big-data-abc?qid=1ac97e4a-4acc-460a-b3f8-9122f7210440&v=qf1&b=&from_search=12

● https://wiki.apache.org/hadoop/PoweredBy

● Making Sense Of Streaming Processing by Martin Kleppmann


Data Explosion


● Big Data: everything we do increasingly leaves a digital trace that we (or others) can gather, use, and analyze.
– Data Providers
● Business Companies
● People


Volume, Velocity, Variety

● “There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” Eric Schmidt
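To get a feel for the velocity implied by the quote, the figures can be converted into an average per-second rate. This is only a back-of-the-envelope sketch: it takes the quoted numbers at face value and assumes decimal units (1 exabyte = 10^18 bytes); the function and constant names are illustrative.

```python
EXABYTE = 10**18   # assumption: decimal (SI) exabyte
TERABYTE = 10**12  # assumption: decimal (SI) terabyte


def bytes_per_second(total_bytes: float, days: float) -> float:
    """Average rate in bytes per second for a volume produced over `days`."""
    seconds = days * 24 * 60 * 60
    return total_bytes / seconds


# 5 exabytes every 2 days, as stated in the quote
rate = bytes_per_second(5 * EXABYTE, days=2)
print(f"{rate / TERABYTE:.1f} TB/s")  # prints "28.9 TB/s"
```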


Big-Data Processing


How to set up a Big-Data processing platform using commodity machines?


Vertical or Horizontal?


Scale Up vs Scale Out


Big-Data Processing Open-Source Technology Stack


Map-Reduce
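The Map-Reduce idea can be illustrated with the classic word-count example. The sketch below is plain, single-process Python and not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["big data", "open source big data"])
print(counts)  # {'big': 2, 'data': 2, 'open': 1, 'source': 1}
```

Because each document is mapped independently and each key is reduced independently, both phases can be distributed over commodity machines, which is what makes the model a fit for scaling out.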


Hadoop Framework


Apache Hadoop Main Projects


SQL on Hadoop

● Apache Hive
● Apache Drill (Dremel)
● Cloudera Impala
● Facebook Presto
● Apache Kylin


More Map-Reduce (YARN)

● Apache Spark
● Apache Flink (Stratosphere)
● Apache Hama
● Apache Tez (DAG, Complex Data Processing)


Service Programming

● Apache Thrift
● Apache ZooKeeper
● Apache Avro
● Kryo


Data Stores

● Data Stores
– Key-Value
– Graph
– Columnar
– Document Store
– In-Memory


Data Transfer

● Apache Flume
● Apache Sqoop


Search

● Elasticsearch
● Apache Solr


Log Management

● ELK
● Logstash
● Fluentd


Machine Learning

● Apache Mahout
● MLlib
● GraphX


Messaging and Queuing

● Apache Kafka
● ZeroMQ


Stream Processing

● Apache Storm
● Apache Samza
● Apache Spark


Data Processing

Transient Query
– Issued once, then forgotten

Persistent Data
– Stored until deleted by user or apps


Stream Processing

Transient Data
– Deleted as the window slides forward

Persistent Queries
– Generate up-to-date answers as time goes on

Window types: Time Based, Count Based
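The two window types can be sketched in a few lines of plain Python. This is an illustration only, not the API of Storm, Samza, or Spark: the class names `CountWindow` and `TimeWindow` are hypothetical, the persistent query is assumed to be a running average, and events are assumed to arrive in non-decreasing timestamp order.

```python
from collections import deque
from typing import Deque, Tuple


class CountWindow:
    """Count-based window: keeps only the last `size` values."""

    def __init__(self, size: int) -> None:
        # deque(maxlen=...) discards the oldest value automatically
        self.values: Deque[float] = deque(maxlen=size)

    def add(self, value: float) -> float:
        """Add a value and return the up-to-date average over the window."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


class TimeWindow:
    """Time-based window: keeps values newer than `span` seconds."""

    def __init__(self, span: float) -> None:
        self.span = span
        self.events: Deque[Tuple[float, float]] = deque()

    def add(self, timestamp: float, value: float) -> float:
        """Add an event, expire old ones, and return the current average."""
        self.events.append((timestamp, value))
        # Transient data: drop events that fell out of the window
        while self.events[0][0] <= timestamp - self.span:
            self.events.popleft()
        return sum(v for _, v in self.events) / len(self.events)


count_window = CountWindow(size=2)
count_averages = [count_window.add(v) for v in [10, 20, 30, 40]]
print(count_averages)  # [10.0, 15.0, 25.0, 35.0]

time_window = TimeWindow(span=10)
time_averages = [time_window.add(t, v) for t, v in [(0, 10), (5, 20), (12, 30)]]
print(time_averages)  # [10.0, 15.0, 25.0]
```

In both cases the query stays registered and is re-evaluated on every incoming event, while the data itself is discarded once it leaves the window. That is the reverse of the transient-query, persistent-data model of conventional data processing.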


● http://recommender.ir

● http://helio.ir


Thank You!

Find this slide here:

http://www.slideshare.net/AmirSedighi

LinkedIn:

http://www.linkedin.com/in/amirsedighi

Blog:

http://hexican.com

Email:

sedighi@gmail.com

Twitter:

@amirsedighi
