Session: A-1 Company: EMC Taiwan Topic: Building an Enterprise Big Data Application Architecture Speaker: 李百飛, Business Development Director


Post on 24-May-2020



Data is the new oil!

Diaoyutai Islands, oil shale

Sources and data types of big data

INFORMATION IN THE ENTERPRISE WILL GROW 50X IN THE NEXT 10 YEARS

Structured data (database form)

Unstructured data (file form)

Emerging IT technologies for big data

• Hadoop
• NoSQL DB
• MapReduce
• Visual Discovery
• Predictive Analytics
• Streaming Data Technology
• In-Memory Computing
• EDW
• Grid & Clustering
• Scale-out

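Another thorny problem for big data applications

Data is scattered everywhere (data silos)
• Data lives both inside and outside the enterprise
• Each data system operates independently
• The existing IT architecture is hard to integrate
• Sharing is extremely inefficient
• There is no complete view of the data
• System upgrades are expensive

Today's programs are silo'd and too expensive for enterprise to maintain. They are also often confusing for the constituent to access and understand!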

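Key success factors for big data applications

Big Data Value (Business Impact) = Velocity * Cost⁻

Velocity (the faster the better) =
  timely data processing (OLTP: Fast Data)
+ timely data delivery (ETL: Time Overhead)
+ timely data insight (OLAP: Fast Analytics)

Cost (the lower the better) =
  ongoing operations cost (HA, Downtime, Backup, DR, Support)
+ excess cost burden (ETL, Server, Network, Storage, License)
+ data management risk (Risk: Data Governance)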

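A complete enterprise big data application architecture

Structured data (database form)

Unstructured data (file form)

Direct cross-access between the two

• Hadoop
• MapReduce
• NoSQL DB
• Index
• OLAP
• EDW

• x86 or VM: commodity hosts & cloud architecture
• Grid & Cluster: high performance & high availability
• Scale-out: grows from small to very large
• Standard ANSI SQL support
• JDBC / ODBC: integration with custom applications
• Integration with a wide range of BI tools
• Integration with advanced statistical analysis algorithms: open source
• MPP: massively parallel analytic processing architecture
• Collaboration: support for collaborative work
• MPP: massively parallel data loading architecture (ETL)
• Handles both real-time processing & historical data analysis

Application layer (open source)

Infrastructure (open architecture; total solution for cloud, HA, backup, and DR)

Advantages and benefits
• Cost savings, support for production operations, easy to scale, easy to manage, EMC vendor support
• Open platform, standardized, easy to adopt, low learning curve, low cost

Effective Business Impact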

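EMC/Pivotal big data business operations platform and value-added applications: accelerating the delivery of enterprise big data benefits

Other Cloud & Big Data Solution

Scale-out • Virtualization • Standardization • Automation

EMC, VMware & Pivotal joint strategy map: the integrated ITaaS architecture that Cloud & Big Data require

[Diagram: ITaaS stack. Labels as extracted:]
• PaaS: Big/Fast Data; FAST Data; FAST Analytics; SQLFire, GemFire, vFabric; GPDB, GPHD; Open Source; Biz Operation
• IaaS: SD-Datacenter; SD-Server (VM, x86, vSphere); SD-Network (Nicira); SD-Storage (ViPR); Automation; Efficiency
• Storage Infrastructure: VMAX, VNX, Isilon, Atmos; Xtrem SW/SF; Flash; VAAI, VASA, REST, HDFS; HA, QoS, Backup, DR, DACDP/CRR, Mgt, Security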

Big Data Analytics Case Study:

EMC customer quality department's disk reliability improvement program: a large performance gain & much lower complexity

                 Oracle     EMC Greenplum
Lines of code:   2000+      210
Run time*:       5 days     < 5 mins

• Performance gains are mainly due to the MPP (massively parallel processing) architecture of Greenplum.

• But they are also due to improved SQL code. The previous code had:
– No window functions
– No nested queries to reshape data
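As a rough illustration of the two SQL features named above, the sketch below uses a window function (`LAG ... OVER`) inside a nested query to reshape per-day readings into day-over-day changes. It runs on Python's built-in `sqlite3` module rather than Greenplum, needs SQLite 3.25 or later, and the `disk_readings` table, its columns, and its rows are all hypothetical, not taken from the case study.

```python
import sqlite3

# Hypothetical data: daily error counts per disk (not from the case study).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE disk_readings (disk_id TEXT, day INTEGER, error_count INTEGER)"
)
conn.executemany(
    "INSERT INTO disk_readings VALUES (?, ?, ?)",
    [
        ("d1", 1, 0), ("d1", 2, 3), ("d1", 3, 9),
        ("d2", 1, 1), ("d2", 2, 1), ("d2", 3, 2),
    ],
)

# Inner (nested) query: LAG() is a window function that looks at the
# previous day's reading for the same disk, so each row becomes a delta.
# Outer query: aggregate the reshaped rows per disk.
QUERY = """
SELECT disk_id, MAX(delta) AS worst_daily_jump
FROM (
    SELECT disk_id,
           error_count - LAG(error_count) OVER (
               PARTITION BY disk_id ORDER BY day
           ) AS delta
    FROM disk_readings
) AS reshaped
WHERE delta IS NOT NULL
GROUP BY disk_id
ORDER BY disk_id
"""

worst_jumps = conn.execute(QUERY).fetchall()
print(worst_jumps)
```

Without the window function, the same result typically needs a self-join or procedural row-by-row code, which is one plausible way a code base grows from hundreds to thousands of lines.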

EMC IT mission-critical application virtualization case: SAP ERP system with 1,000,000 SAPS capacity; production: 56 TB

SAP Architecture

SAP Apps
• Supplier Relationship Management (SRM)
• Business Warehouse (BW)
• Supply Chain Management (SCM)
• Business Planning and Consolidation (BPC)
• ERP Central Component (ECC)
• Financial Supply Chain Management (FSCM)

Database (SAP + AIC)
• Oracle (non-RAC)
• SQL Server

Application Integration Cloud (AIC)
• Hyperic
• Actional
• GemFire Data Fabric
• Spring Integration
• Sonic MQ
• Contivo
• Spring Batch
• iWay Adapters
• Spring-based WS

50+ Legacy Systems / Legacy Apps

Virtualized on Vblock (VCE: VMware + Cisco + EMC)

Best-of-Breed Technology Components

Over 470 virtual hosts built to support 9 SAP landscapes
– 370 SAP
– 100+ AIC

New AIC architecture reviewed and validated by VMware

Performance:
– Simulated end-to-end transaction testing at 2.5X peak volumes
– Successfully processed 3,000 orders in < 10 hours (10X throughput vs. today)
– 448,000 SAPS in production
– 1 million SAPS capacity installed across 2 Vblocks

High Availability / DR Testing:
– Architecture validated for HA at single points of failure
– RPO: zero data loss
– RTO: < 4 hours

TECHNOLOGY SHOWCASE

EMC
• VMAX / FAST (SAN disk storage system)
• VNX (SAN + NAS disk storage system)
• PowerPath (I/O path HA and optimization)
• SRDF (DR data replication, ~1,000 km)
• NetWorker (backup software)
• Data Domain (virtual tape library)
• Avamar (data backup)
• RSA Access Manager (identity authentication)
• RSA enVision (log and security management)
• RSA Archer (security policy and dashboards)

VCE (VMware, Cisco, EMC)
• Vblock Series 700

VMware
• ESXi, vSphere, vFabric
• Site Recovery Manager

Pivotal
• tc Server
• SpringSource (Java framework)
• GemFire (in-memory NoSQL DB architecture)
• Greenplum (big data analytics)


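Online ticket booking: solution for the remaining-ticket and order query system

[Diagram in three parts: original IT system structure, data offloading, cloud application system design. Labels as extracted:]
• Original IT system structure: central database minicomputer; database minicomputers (N > 5); database minicomputers (M > 50); branch offices (N > 15); real-time data replication
• Data offloading: SQL statement extraction from the database (x86); RabbitMQ cluster (x86); data synchronization; real-time data flow
• Cloud application system design: web & app servers (N > 100) as a web server cluster and an application server cluster; GemFire server cluster (x86, > 5 servers); data partitioning; distributed parallel computing; data aggregation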

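Online ticket booking: actual operating data of the remaining-ticket query system

Before and after adopting the "data offloading" cloud application virtualization solution

Original technical design:
• A single query takes about 15 seconds, and results can lag by up to 10 minutes
• High-volume concurrent queries cannot be supported except by splitting the database
• Under extreme concurrent load the system cannot cope
• Runs on Unix hosts

With data offloading and cloud application virtualization:
• Longest single remaining-ticket query: 150-200 ms
• Shortest single query: 1-2 ms
• Changed data is synchronized within seconds
• Supports tens of thousands of concurrent queries per second, with elastic on-demand scaling
• Runs on a Linux x86 server cluster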
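The data-offloading idea in this case (changes leave the database through a message queue and land in an in-memory store that answers the read traffic) can be sketched in a few lines. This is only a single-process illustration: `queue.Queue` stands in for the message queue and a plain `dict` for the in-memory data grid, and none of the names below are RabbitMQ or GemFire APIs. The train IDs and seat counts are made up.

```python
import queue


class InMemoryTicketStore:
    """Hypothetical read-side store: train ID -> remaining seats."""

    def __init__(self):
        self._seats = {}

    def apply(self, event):
        # Each change event carries the latest seat count for one train.
        train_id, seats_left = event
        self._seats[train_id] = seats_left

    def seats_left(self, train_id):
        # Reads never touch the source database.
        return self._seats.get(train_id)


def drain(change_queue, store):
    """Apply every pending change event to the store; return how many."""
    applied = 0
    while True:
        try:
            event = change_queue.get_nowait()
        except queue.Empty:
            return applied
        store.apply(event)
        applied += 1


# Changes captured from the source database (illustrative values).
changes = queue.Queue()
changes.put(("G101", 120))
changes.put(("G101", 119))  # a later sale overwrites the earlier count
changes.put(("D202", 40))

store = InMemoryTicketStore()
applied = drain(changes, store)
print(applied, store.seats_left("G101"), store.seats_left("D202"))
```

Because queries are served from memory and the database only emits changes, read latency is decoupled from database load, which is consistent with the millisecond-level query times reported above.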

The EMC/Pivotal big data analytics platform delivers efficient massively parallel processing: Parallel Data Load

[Diagram: parallel data load. Labels: Client; Master; Segment (several); ETL Host (several), each running gpfdist over sample data rows such as "tom Jerry 123", "joe blow 456", "larr white 789"]
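The point of the parallel load picture is that every segment pulls its own share of the rows at the same time instead of funnelling everything through the master. A minimal sketch of that idea follows. It assumes rows are assigned to segments by a hash of their first column, uses `zlib.crc32` purely as a deterministic stand-in hash, and models segments as threads. The segment count and helper names are hypothetical, and this is not how gpfdist itself is invoked.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key):
    """Pick a segment from a deterministic hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def load_segment(segment_id, source_rows):
    """One segment keeps only the rows whose key hashes to it."""
    return [row for row in source_rows if segment_for(row[0]) == segment_id]


# Rows as an ETL host might serve them (values echo the slide's sample text).
source_rows = [("tom", "Jerry", 123), ("joe", "blow", 456), ("larr", "white", 789)]

# All segments load concurrently; no single node handles every row.
with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
    futures = [
        pool.submit(load_segment, seg, source_rows) for seg in range(NUM_SEGMENTS)
    ]
    segments = [future.result() for future in futures]

total_loaded = sum(len(rows) for rows in segments)
print(total_loaded)
```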

The EMC/Pivotal big data analytics platform delivers efficient massively parallel processing: Query

[Diagram: query processing. Labels: Query; Master Segment Processor; Interconnect Switch; Interconnect; Independent Segment Processors, each running the SQL; Independent Memory; Independent Direct Storage Connection; Storage]
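The query picture shows one query fanned out as SQL to independent segment processors, each with its own memory and storage. A scatter-gather sketch of that flow is below: each segment computes a partial aggregate over its own rows and the master merges the partials. Segments are modelled as in-process lists and threads, and the data, function names, and the "sum by region" query are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows already distributed across four segments: (region, amount).
segments = [
    [("north", 10), ("south", 5)],
    [("north", 7)],
    [("south", 3), ("east", 4)],
    [("east", 1)],
]


def partial_sum(rows):
    """Work done on one segment: aggregate only the local rows."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def run_query(all_segments):
    """Master side: scatter the work, then gather and merge partial results."""
    with ThreadPoolExecutor(max_workers=len(all_segments)) as pool:
        partials = list(pool.map(partial_sum, all_segments))
    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0) + subtotal
    return merged


result = run_query(segments)
print(result)
```

The master only merges small partial results, so adding segments spreads the scan work without making the master the bottleneck.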

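The EMC/Pivotal big data analytics platform provides scale-out and dynamic online expansion

Step 1: New nodes are initialized and join the MPP cluster

Step 2: Data is redistributed across all nodes

• Data is redistributed automatically across all nodes at the time scheduled by the system administrator
• Capacity and performance grow linearly after expansion

[Diagram: a Master and an interconnect linking x86 segment hosts, seg1 to seg4 before expansion and seg1 to seg6 after]

EMC Analytics Lab (a grid & cluster of 1,000 x86 hosts)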
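The two expansion steps above can be illustrated by counting rows per segment before and after new hosts join. The sketch assumes rows are placed by a hash of their key modulo the number of segments, with `zlib.crc32` as a deterministic stand-in hash. That placement rule, the key format, and the 4-to-6 expansion are assumptions for illustration, not a description of the product's internal algorithm.

```python
import zlib
from collections import Counter


def segment_for(key, num_segments):
    """Assumed placement rule: hash of the key modulo the segment count."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(keys, num_segments):
    """Count how many rows land on each segment."""
    return Counter(segment_for(key, num_segments) for key in keys)


keys = [f"row-{i}" for i in range(6000)]  # hypothetical distribution keys

before = distribute(keys, 4)  # original cluster: 4 segment hosts
after = distribute(keys, 6)   # Step 1: two new hosts have joined

# Step 2: rows whose target segment changed must be moved.
moved = sum(1 for key in keys if segment_for(key, 4) != segment_for(key, 6))

print("rows per segment before:", sorted(before.items()))
print("rows per segment after: ", sorted(after.items()))
print("rows moved during redistribution:", moved)
```

With a simple modulo rule a large share of rows move, which is one reason redistribution is something an administrator would want to schedule for a quiet period.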

EMC/Pivotal launches the world's most powerful commercial Hadoop software: Pivotal HD (March 2013)

Pivotal HD architecture

Pivotal HAWQ: SQL Benchmarks

Chart 1
Pivotal HD   Comparison   Improvement
4.2          198          47X
8.7          161          19X
2.0          415          208X
2.7          1,285        476X
2.8          1,815        648X

Chart 2
Pivotal HD   Comparison   Improvement
4.2          37           9X
8.7          596          69X
2.0          50           25X
2.7          55           20X
2.8          59           21X
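The improvement factors in the two charts appear to be the comparison figure divided by the Pivotal HD figure, rounded to the nearest whole number. That reading is an inference from the numbers, not something the slide states, and the units of the figures are not given. The short check below recomputes the factors under that assumption.

```python
import math

# (Pivotal HD figure, comparison figure) pairs copied from the two charts.
CHART_1 = [(4.2, 198), (8.7, 161), (2.0, 415), (2.7, 1285), (2.8, 1815)]
CHART_2 = [(4.2, 37), (8.7, 596), (2.0, 50), (2.7, 55), (2.8, 59)]


def improvement(pivotal_hd, comparison):
    """Assumed definition: comparison / Pivotal HD, rounded half up."""
    return math.floor(comparison / pivotal_hd + 0.5)


chart_1_factors = [improvement(p, c) for p, c in CHART_1]
chart_2_factors = [improvement(p, c) for p, c in CHART_2]

print(chart_1_factors)
print(chart_2_factors)
```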

The EMC XtremSF Family

• XtremSF 2200: 2.2 TB
• XtremSF 1400: 1.4 TB
• XtremSF 700: 700 GB
• XtremSF 550: 550 GB

All cards are HHHL: highest density in the industry

Up to 1,130,000 IOPS

Protect in-memory DB

EMC Isilon: cloud storage solution advantages and cases

• Native CIFS/NFS/HDFS support
• Supports 85.8B files in one file system
• Capacity record: 20 PB in one file system
• Performance record: 1.6M IOPS (CIFS)
• Auto ILM for inactive data
• Online balancing of capacity and performance
• Addresses the hardware EOSL challenge without data migration
• Simple and easy to use

Isilon scale-out big data storage platform with HDFS + CIFS + NFS innovation

1. Eliminates the data load process; 2. Improves HA; 3. Shortens time to analytics; 4. Lowers cost

[Diagram: Standard vs. Advanced. Labels as extracted:]
• Standard: data source generations 1, 2, ... x; LAN; CIFS/NFS; NAS 1, NAS 2, NAS 3, NAS 4, ... NAS n; load into HDFS; x86 servers; 4 copies; SPOF; 1 PB, 3 PB
• Advanced: data source generations 1, 2, ... x; LAN; CIFS/NFS and HDFS; Isilon storage; x86 servers; 1 copy; 1/3 the number of x86 servers; 1 PB

EMC big data business operations platform: fully integrated

[Diagram: integrated platform. Labels as extracted:]
• OLTP LAN: DB, Web, AP; VPLEX; scale-out storage (VMAX); SAN; block
• OA LAN: users; OA; files; scale-out NAS (Isilon); CIFS/NFS; HDFS
• Log Collector: log
• OLAP LAN: big data analytics platform on an x86 grid; structured data (GPDB); unstructured data (Hadoop); DB, ETL, BI, KM, RPT
• De-dupe backup


[Diagram: the same platform deployed across two data centers, site 1 and site 2, in active-active mode. Labels as extracted: DB clustering; IP; FC; DWDM; SIEM sync; replication; Data Domain at each site; GPHD/GPMR at each site; RSA enVision at each site]

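Big data is coming: EMC takes your performance and efficiency further

With Pivotal + EMC integration:
• FAST data in memory, with no data loss
• One storage investment
  – For both Production and Analytics
• One backup investment
  – For both Production and Analytics
• One DR investment (storage & bandwidth)
  – For both Production and Analytics
• One management investment
• Saves ETL time and IT investment
• x86 grid & clustering
• Flexibility and choice
• Vendor support

[For further inquiries] [email protected]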

How big does data have to be to count as big data?

Data Data<?

Isilon supports native NFS/CIFS/HDFS: accelerating enterprise Hadoop adoption

1. Scale-Out Storage Platform
– Multiple applications & workflows

2. No Single Point of Failure
– Distributed namenode

3. End-to-End Data Protection
– SnapshotIQ, SyncIQ, NDMP backup

4. Industry-Leading Storage Efficiency
– >80% storage utilization

5. Independent Scalability
– Add compute & storage separately

6. Multi-Protocol
– Industry-standard protocols
– NFS, CIFS, FTP, HTTP, HDFS

[Diagram labels: data sources; Isilon with a namenode on each of its nodes; computing nodes]