ÁìÇb aif gp=c · cassandra, mongodb, redis, mysql, elasticsearch/solr reporting, visualization...

Post on 16-Nov-2019


Agenda

► Spark

► Redis

► ZooKeeper

► Kafka


► 4V: Volume, Velocity, Variety, Value

► Wikipedia: https://en.wikipedia.org/wiki/Big_data


► TataUFO


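► Splittable

► Not splittable (x): XML, JSON (one whole document)

► Splittable: CSV, JSON (one record per line), Avro, Parquet

► Block compressible

► Not block compressible (x): CSV, JSON

► Block compressible: Avro, Parquet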
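
To illustrate why record-per-line formats are splittable: a reader handed an arbitrary byte range can skip ahead to the next newline and still parse whole records, which is not possible for one large JSON or XML document. The sketch below is plain Python; the `read_split` helper and the sample records are hypothetical, and it models the convention used by Hadoop-style line readers rather than any library's actual API.

```python
import json

def read_split(data: bytes, start: int, end: int) -> list:
    """Parse the JSON records owned by the byte range [start, end).

    Convention (as in Hadoop-style line readers): a split owns every line
    that *begins* inside it, so a reader that starts mid-line skips ahead
    to the next newline, and may read past `end` to finish its last line.
    """
    pos = start
    if start > 0 and data[start - 1:start] != b"\n":
        # We landed in the middle of a record; it belongs to the previous split.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    records = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        line = data[pos:line_end]
        if line.strip():
            records.append(json.loads(line))
        pos = line_end + 1
    return records

# Three records, one per line (hypothetical sample data).
data = b'{"id": 1}\n{"id": 22}\n{"id": 333}\n'
mid = len(data) // 2  # an arbitrary cut that falls inside the second record
first = read_split(data, 0, mid)
second = read_split(data, mid, len(data))
print(first, second)
```

Every record is read exactly once wherever the cut falls, so the two halves can be processed by different workers.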


[Figure: big data platform architecture]

► Data sources: Logs, Metrics, Social Data, Sensor Data, Messages

► Ingestion: Kafka/Flume

► Storage: HDFS / HBase / Object Storage, Alluxio

► Batch Processing: MapReduce, Spark

► Real Time Processing: Storm/Heron, Spark Streaming, Flink

► Big SQL: SparkSQL, Phoenix, Hive, Kylin, GreenPlum

► Databases: Cassandra, MongoDB, Redis, MySQL, ElasticSearch/Solr

► Reporting, Visualization (Tableau, Zeppelin, Hue…)

► Monitoring, Alarm, Metering, Security, Governance (ZooKeeper)

► IaaS


► API, Auto Scaling

Don’t try to do it yourself. Let us handle it.


Hadoop on Cloud


SQL on Cloud

► HashData, QingCloud: SQL-on-Cloud, PB scale

► PostgreSQL, Greenplum Database, HashData: SQL, BI


► IO – SDN 2.0

► IO – Unikernel, IaaS


Apache Spark

► MapReduce

► API: Python, Java, Scala and SQL

► Hadoop


Hadoop HDFS


Spark

► Spark Core: task scheduling, memory management, fault recovery, interacting with storage systems, and more

Home to the API that defines resilient distributed datasets (RDDs), which are Spark's main programming abstraction

RDDs represent a collection of items distributed across many compute nodes that can be manipulated in parallel


Spark

► sbin/start-all.sh

► conf/spark-env.sh

► conf/slaves

► conf/spark-defaults.conf

► http://<spark-master>:8080

► http://<spark-driver>:4040

► Driver, job history server: https://docs.qingcloud.com/guide/spark.html


Spark

► shell

► ./bin/spark-shell --master spark://<spark-master-ip>:7077

► SPARK_EXECUTOR_MEMORY=<n>g ./bin/spark-shell --master spark://<spark-master-ip>:7077

► ./bin/pyspark --master spark://<spark-master-ip>:7077

► word count sample


Spark RDD with word count sample

► Resilient Distributed Dataset

► An immutable distributed collection of objects

https://datamize.wordpress.com/2015/02/08/visualizing-basic-rdd-operations-through-wordcount-in-pyspark/
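
The word count sample is usually written as three stages: split each line into words, pair each word with 1, then sum the counts per word. The sketch below mirrors those stages in plain Python so that it runs without a cluster. The PySpark calls named in the comments (`flatMap`, `map`, `reduceByKey`) are the usual equivalents, and the input lines are made up for illustration.

```python
from collections import defaultdict
from itertools import chain

lines = ["to be or not to be", "to spark or not to spark"]  # hypothetical input

# Stage 1, like rdd.flatMap(lambda line: line.split()): one line -> many words.
words = chain.from_iterable(line.split() for line in lines)

# Stage 2, like rdd.map(lambda w: (w, 1)): one word -> one (word, 1) pair.
pairs = ((word, 1) for word in words)

# Stage 3, like rdd.reduceByKey(lambda a, b: a + b): sum the values per key.
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

print(sorted(counts.items()))
```

On a cluster the first two stages run independently on each partition, and only the third stage needs to move data between nodes so that all pairs with the same word meet.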


Spark RDD with word count sample using HDFS

http://www.tuicool.com/articles/…


Spark RDD

► transformations and actions

► lazy fashion – DAG (Directed Acyclic Graph)

► map(), filter(), flatMap()
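
Lazy evaluation means a transformation only records what should be computed; nothing runs until an action asks for a result. The following is a toy model in plain Python, not Spark itself: `LazyPipeline` and its method names are hypothetical stand-ins that imitate `map()`, `filter()`, `flatMap()` and the `collect()` action.

```python
from itertools import chain

class LazyPipeline:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable, e.g. a list
        self._steps = tuple(steps)   # the recorded lineage (a linear DAG)

    # Transformations: return a new pipeline and compute nothing.
    def map(self, fn):
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyPipeline(self._source, self._steps + (("filter", pred),))

    def flat_map(self, fn):
        return LazyPipeline(self._source, self._steps + (("flat_map", fn),))

    # Action: walks the recorded steps and produces a result.
    def collect(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                items = map(fn, items)
            elif kind == "filter":
                items = filter(fn, items)
            else:  # flat_map: each input item yields zero or more outputs
                items = chain.from_iterable(map(fn, items))
        return list(items)


calls = []  # records when the user function actually runs

def shout(word):
    calls.append(word)
    return word.upper()

pipeline = (LazyPipeline(["a b", "c", ""])
            .flat_map(str.split)
            .filter(lambda w: w != "b")
            .map(shout))

calls_before_action = len(calls)   # nothing has run yet, only lineage was recorded
result = pipeline.collect()        # the action triggers the work
print(calls_before_action, result)
```

Because the whole chain is known before anything executes, an engine built this way can plan the steps together instead of materializing every intermediate result.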


Spark RDD

► persist(): keeps an RDD that is used by more than one action, so it is not recomputed each time

► errorsRDD = inputRDD.filter(lambda x: "error" in x)

warningsRDD = inputRDD.filter(lambda x: "warning" in x)

badLinesRDD = errorsRDD.union(warningsRDD)

► persist(StorageLevel.DISK_ONLY)

► collect(): returns the whole RDD to the driver; write large results to HDFS instead
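
The reason to persist: each action re-runs the lineage of an RDD unless its result was kept. A toy illustration in plain Python follows. The `Dataset` class and the sample log are hypothetical and only mimic the idea; storage levels such as `DISK_ONLY` are not modeled.

```python
class Dataset:
    """Toy model: recomputes on every action unless persist() was called."""

    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the data from scratch
        self._persist = False
        self._cached = None
        self.runs = 0             # how many times the lineage was executed

    def persist(self):
        self._persist = True      # only marks the dataset; nothing is computed yet
        return self

    def _materialize(self):
        if self._persist and self._cached is not None:
            return self._cached
        self.runs += 1
        data = list(self._compute())
        if self._persist:
            self._cached = data
        return data

    # Two actions.
    def count(self):
        return len(self._materialize())

    def collect(self):
        return self._materialize()


log = ["error: disk", "ok", "warning: cpu", "error: net"]  # hypothetical input

def bad_lines():
    return (line for line in log if "error" in line or "warning" in line)

plain = Dataset(bad_lines)
plain.count()
plain.collect()                      # lineage runs a second time

kept = Dataset(bad_lines).persist()
kept.count()
kept.collect()                       # served from the cache

print(plain.runs, kept.runs)
```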


Spark

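► driver program

► Contains the application's main function, defines RDDs, and accesses Spark through a SparkContext

► In the shell, the shell itself is the driver program (the SparkContext is available as sc)

► Cluster managers: Standalone, YARN, Mesos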


Spark

► bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://skn-im9crqkd-spark-master:7077 \
    --executor-memory 1G \
    --total-executor-cores 3 \
    /usr/local/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000

► http://spark.apache.org/docs/latest/submitting-applications.html


Spark SQL

► JSON, Hive, Parquet

► DataFrame: comparable to an RDBMS table

► Dataset

► Can be created from external data sources, from the results of queries, or from regular RDDs

http://www.agildata.com/apache-spark-rdd-vs-dataframe-vs-dataset/


Spark Streaming


Spark Streaming - word count

► 1: nc -lk 9999

► 2: bin/spark-submit examples/src/main/python/streaming/network_wordcount.py 192.168.100.9 9999


Spark Streaming - word count

► Discretized Stream (DStream): a sequence of RDDs
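
A DStream can be pictured as a sequence of small batches, one per batch interval, each processed like an ordinary RDD. The sketch below is a plain-Python model of that idea, not the Spark Streaming API: the `(timestamp, line)` events, the 1-second interval and the `discretize` helper are all assumptions made for illustration.

```python
from collections import Counter

def discretize(events, batch_seconds):
    """Group (timestamp, line) events into consecutive fixed-width batches.

    Simplification: intervals with no events are skipped here, whereas a
    real DStream would still produce an (empty) RDD for them.
    """
    batches = {}
    for timestamp, line in events:
        batch_id = int(timestamp // batch_seconds)   # interval the event falls in
        batches.setdefault(batch_id, []).append(line)
    return [batches[k] for k in sorted(batches)]      # batches in time order

def word_count(batch):
    """The per-batch job: the same word count one would run on a single RDD."""
    return Counter(word for line in batch for word in line.split())

# Hypothetical lines typed into `nc -lk 9999`, with arrival times in seconds.
events = [(0.2, "hello spark"), (0.7, "hello"), (1.1, "spark streaming"), (2.9, "hello")]

per_batch = [dict(word_count(b)) for b in discretize(events, batch_seconds=1)]
for counts in per_batch:
    print(counts)
```

Counts are reported per interval rather than as a running total, which matches what the network word count example prints for each batch.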


Spark MLlib

► K-means
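
For reference, K-means alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal plain-Python sketch of that loop (Lloyd's algorithm) follows. It is not MLlib's implementation, and the sample points, initial centroids and function name are assumptions chosen for illustration.

```python
def kmeans(points, centroids, max_iter=20):
    """Lloyd's algorithm on 2-D points; returns (centroids, assignments)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        assignments = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2 + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:     # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, assignments

# Two visually obvious groups (hypothetical data).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignments)
```

The result depends on the initial centroids, which is why library implementations usually offer a smarter initialization than fixed starting points.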


Redis

► NoSQL

► Persistence: AOF, RDB

► Data types: set, list, hash, string, sorted set

► Redis vs. Memcached


Redis

► Standalone

► Sentinel

► Cluster


Redis

► Scales to 1000 nodes

► No proxy

► 16384 hash slots, divided among the nodes

► Each key maps to one hash slot

► HASH_SLOT = CRC16(key) mod 16384
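
The slot formula can be checked with a few lines of Python. The Redis Cluster specification names the CRC16 variant as CRC-16/XMODEM (polynomial 0x1021, initial value 0), whose reference output for "123456789" is 0x31C3. The sketch below ignores hash tags (`{...}`), which real clients also handle, and the example keys are arbitrary.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0x0000."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                 # feed the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:             # top bit set: shift and apply the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384 (hash tags are not handled here)."""
    return crc16_xmodem(key.encode("utf-8")) % 16384

for key in ("user:1000", "user:1001", "session:abc"):
    print(key, "->", hash_slot(key))
```

Since every client can compute the slot locally, it can send a command straight to the node that owns that slot without going through a proxy.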


Redis

► redis-trib.rb

► ./redis-trib.rb create --replicas 1 192.168.1.2:6379 192.168.1.3:6379 192.168.1.4:6379 192.168.2.2:6379 192.168.2.3:6379 192.168.2.4:6379

► ./redis-trib.rb check 192.168.1.3:6379

► https://docs.qingcloud.com/guide/cache.html#id14


Redis

► commons-pool2-2.0.jar, jedis-2.7.3.jar

https://docs.qingcloud.com/guide/cache.html#id14


ZooKeeper

► Race condition, deadlock

► Google Chubby

► Hadoop, HBase

► etcd, consul


► Transaction log, snapshot

► znode

► Nodes and ephemeral nodes


► zoo.cfg

► bin/zkServer.sh start

► ZooKeeper Commands: The Four Letter Words

► echo mntr | nc localhost 2181
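
A four-letter word is sent as plain text over a TCP connection to the client port, and the server replies and then closes the connection. Below is a minimal Python sketch of that exchange, assuming a server on `localhost:2181` (the default client port) that permits the command; newer ZooKeeper releases require commands to be enabled through `4lw.commands.whitelist`. The helper names are hypothetical, and `parse_mntr` relies on `mntr` printing one tab-separated key/value pair per line.

```python
import socket

def four_letter_word(command: str, host: str = "localhost", port: int = 2181,
                     timeout: float = 5.0) -> str:
    """Send a four-letter command (e.g. "ruok", "stat", "mntr") and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_mntr(reply: str) -> dict:
    """Turn `mntr` output (key<TAB>value per line) into a dict of strings."""
    metrics = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if sep:                    # ignore lines that carry no tab separator
            metrics[key] = value
    return metrics
```

With a server running, `parse_mntr(four_letter_word("mntr"))` returns the monitoring values as a dictionary, and `four_letter_word("ruok")` is a quick liveness check.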


Kafka

► Kafka server, producer: 821,557 records/sec (78.3 MB/sec)

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines


► Java


Thank you

@yunify.com