the challenges of open source cloud databasespeech5)oushu_mr... · 2019-11-21 · sql-on-hadoop...

The Challenges of Open Source Cloud Database

Contents

01 About Oushu

02 Background

03 Cloud Database

04 Apache HAWQ

About Oushu

Founded by the Apache HAWQ core team. Apache HAWQ is the first Apache database top-level project (TLP) initiated by a team in China.

Focused on AI and big data. Products: cloud database OushuDB/HAWQ, Littleboy AI, Lava.

Most team members come from EMC, Oracle, IBM, Teradata, Google, Amazon, and others.

The first Chinese company to sell a database to top US companies. Hundreds of enterprise users across the world.

Backed by Sequoia and Redpoint. Microsoft Accelerator, 2018.

Core members hold dozens of US patents. Work published at a top database conference: SIGMOD.

Oushu Customers & Partners

Energy

Telecom

Ministry of Public Security systems in multiple provinces

Finance

Database History: 57 Years

• Database: 1962

⁃ Inverted File Database System

⁃ System Development Corporation

• Several phases of database

⁃ 1960s: Navigational DBMS (network & hierarchical)

✓ Integrated Data Store (IDS)

✓ Information Management System (IMS)

⁃ 1970s-1990s: SQL/Relational DBMS

✓ OLTP, Data warehouse

⁃ 2000s-Present

✓ MPP, Hadoop, NoSQL (XML, KV, Graph, Tree), NewSQL, Cloud Database

Cloud Database

• Public cloud: Database as a service

⁃ Amazon Redshift (ParAccel MPP)

⁃ RDS (PostgreSQL, MySQL, and others)

⁃ OushuDB

• Private cloud: Virtual machine/Docker container

Cloud Database vs Traditional Database

• Differences

⁃ How users use the database

⁃ Billing

⁃ Running environment

✓ Virtualization: Resource management

⁃ Ecosystem

✓ Infrastructure services: S3 and others

⁃ Elasticity

⁃ Security

• Same

⁃ Data model & querying language

⁃ Query optimization & execution

⁃ Indexes & storage

⁃ Transaction processing

⁃ And others

The Evolution Path of Analytical Databases

Traditional DW: Dedicated Hardware (1980s)

• Application: Reporting
• Scale: 10s
• SQL Compatibility: Good
• Performance: Middle
• Cloud Support: Weak
• Examples: Oracle, DB2, Teradata

SQL-on-Hadoop: Separation of Storage & Compute (2000s)

• Application: Big Data
• Scale: 1000s
• SQL Compatibility: Weak
• Performance: Middle
• Cloud Support: Weak
• Examples: Hive, SparkSQL

X86 MPP: Shared Nothing (2000s)

• Application: Big Data
• Scale: 100s
• SQL Compatibility: Good
• Performance: Middle
• Cloud Support: Weak
• Examples: Greenplum, Vertica

Cloud Native (about 2015)

• Application: AI, Cloud, IoT, Big Data
• Scale: 1000s
• SQL Compatibility: Good
• Performance: Good
• Cloud Support: Native
• Examples: OushuDB, HAWQ, Snowflake

[Architecture diagrams. Labels: Network; Storage (SAN, NAS); Compute (RDBMS, EDW); Compute, Memory, Storage; CPU, Memory]

HAWQ & OushuDB

• 2011: HAWQ started. Oushu founder Lei Chang initiated the project at EMC.

• 2012: HAWQ 1.0 Alpha. Hundreds of times faster than Hive.

• 2013: HAWQ 1.0 GA. Separate compute and storage.

• 2014: HAWQ 1.X release. HAWQ SIGMOD paper published.

• 2015: HAWQ 2.0 Alpha. Becomes an Apache Incubating project.

• 2016: OushuDB. Oushu founded, focusing on HAWQ & OushuDB.

• 2017: OushuDB 3.0. 10 times faster than HAWQ 2.0.

• 2018-2019: OushuDB 4.0. Supports update/delete/index. OushuDB/HAWQ used by hundreds of companies across the world. HAWQ becomes an Apache Top-Level Project.

HAWQ Main Features

● Discover New Relationships
● Enable Data Science
● Analyze External Sources
● Query All Data Types

Multi-level Fault Tolerance

Granular Authorization

Resource Queues

High Multi-tenancy

ANSI SQL Standard

OLAP Extensions

JDBC/ODBC Connectivity

Elastic Runtime, Online Expansion

HDFS/Magma/Hadoop

Petabyte Scale

Cost-Based Optimizer

Dynamic Pipelining

ACID + Transactional

Multi-Language UDF Support

Built-in Data Science Library

Extensible (PXF): Query External Sources

Accessibility + Usability

HDFS Native File Formats

● Manage Multiple Workloads
● Petabyte-Scale Analytics
● Sub-second Performance

● Leverage Existing Skills & Tools

● Easily Integrate with Other Tools

Compression + Partitioning

core

compliance

● Well Integrated with Hadoop Ecosystem

HAWQ vs Others

Hadoop Native & Open & Scalability

Proprietary & limited scalability

Limited Performance & SQL Compliance

Big SQL

Vortex

High Performance & SQL Compliance

SQL

Contributing to HAWQ

• Documentation

• Wiki

• Bug reports

• Bug fixes

• Features

• Website: http://hawq.incubator.apache.org/

• Wiki: https://cwiki.apache.org/confluence/display/HAWQ

• Repo: https://github.com/apache/hawq

• JIRA: https://issues.apache.org/jira/browse/HAWQ

• Mailing lists: dev/user@hawq.apache.org

Code contribution process

• Start a JIRA

• Fork the GitHub repo: https://github.com/apache/hawq.git

• Clone your fork to your local machine

• Add the GitHub repo as "upstream"

• Create a feature branch and commit your code

• Start a pull request for code review
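
The local part of this process can be sketched as a short Python helper that prints the git commands in order. This is a minimal sketch, not the project's official tooling: the function name `contribution_commands`, the GitHub user name, the JIRA ID, the branch named after the JIRA issue, the commit message format, and the `master` base branch are all assumptions made for illustration. Only the upstream URL comes from the slide. Opening the JIRA, forking, and starting the pull request happen in the JIRA and GitHub web interfaces, so they are not covered.

```python
from typing import List

# Upstream repository URL, taken from the slide.
UPSTREAM_URL = "https://github.com/apache/hawq.git"


def contribution_commands(github_user: str, jira_id: str) -> List[str]:
    """Return git commands for the clone / upstream / branch / push steps.

    github_user and jira_id are hypothetical inputs. Naming the branch
    after the JIRA issue and branching from upstream/master are assumed
    conventions, not rules stated in the slides.
    """
    fork_url = f"https://github.com/{github_user}/hawq.git"
    branch = jira_id  # assumption: one feature branch per JIRA issue
    return [
        f"git clone {fork_url}",                      # clone your fork
        "cd hawq",
        f"git remote add upstream {UPSTREAM_URL}",    # add the Apache repo as "upstream"
        "git fetch upstream",
        f"git checkout -b {branch} upstream/master",  # create the feature branch
        f'git commit -a -m "{jira_id}. Describe the change"',
        f"git push origin {branch}",                  # push, then open the pull request on GitHub
    ]


if __name__ == "__main__":
    # Placeholder values; substitute a real GitHub user and JIRA issue.
    for command in contribution_commands("your-github-user", "HAWQ-0000"):
        print(command)
```

Printing the commands instead of running them keeps the sketch free of side effects, so each step can be reviewed before it is run by hand.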

Thanks for watching

Let people work only for what interests them
