variety, velocity and volume · cap theorem nosql databases, cloud computing agility polyglot...

Post on 22-Aug-2020

18 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

© 2011 by The 451 Group. All rights reserved

Matthew Aslett Senior analyst, enterprise software matthew.aslett@the451group.com

Variety, Velocity and Volume Meeting the performance challenges of Big Data in the enterprise

© 2011 by The 451 Group. All rights reserved

Big Data, Total Data

2

What is it?

Current data management trends

What technologies are involved?

When to use them

The drivers behind emerging technology choices

© 2011 by The 451 Group. All rights reserved © 2011 by The 451 Group. All rights reserved

451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.

The 451 Group

Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.

The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.

TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.

ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.

© 2011 by The 451 Group. All rights reserved

Coverage areas

Matthew Aslett Senior analyst, enterprise software

With The 451 Group since 2007

matthew.aslett@the451group.com

www.twitter.com/maslett

Commercial Adoption of Open Source (CAOS) Adoption by enterprises

Adoption by vendors

Information Management Database

Data warehousing

Data caching

4

© 2011 by The 451 Group. All rights reserved

Data Management Trends

© 2011 by The 451 Group. All rights reserved

Current data management trends

6

The volume, variety and velocity of data is growing rapidly

Data processing capabilities have never been better

The value of data has never been better understood

The data deluge problem is also a big data opportunity

RISK OPPORTUNITY

© 2011 by The 451 Group. All rights reserved

What is Big Data?

7

More than just rising data volumes

Big Data ≠ Volume

© 2011 by The 451 Group. All rights reserved

What is Big Data?

8

Also variety of data types/sources and velocity of data updates

Big Data = Volume ± Variety ± Velocity

© 2011 by The 451 Group. All rights reserved

Current data management trends

9

The volume, variety and velocity of data is growing rapidly

Data processing capabilities have never been better

The value of data has never been better understood

‘Big Data’ covers a diverse set of products that can be applied to different problems

‘Big Data’ highlights the problem – volume/variety/velocity,

and promises a solution – value,

but doesn’t provide a path in between

RISK OPPORTUNITY

© 2011 by The 451 Group. All rights reserved

Total Data

© 2011 by The 451 Group. All rights reserved

What is Total Data?

11

Not just another name for Big Data

A concept defined by The 451 Group to describe new approaches to data management – beyond restrictive silos

Reflects the changing data management landscape as pragmatic choices are being made about data storage and analysis techniques

Inspired by ‘Total Football’

© 2011 by The 451 Group. All rights reserved

What is Total Data?

12

Also the desire of the user to store and process all their data

Value = (Volume ± Variety ± Velocity) x Totality

© 2011 by The 451 Group. All rights reserved

What is Total Data?

13

Within tolerable time frames

Value = (Volume ± Variety ± Velocity) x Totality

Time

Data

volume/

variety/

velocity

Rate of query

Value of data

© 2011 by The 451 Group. All rights reserved

What is Total Data?

14

Within tolerable time frames

Value = (Volume ± Variety ± Velocity) x Totality

Time

Total Data is making the most efficient use of existing and new data management resources to deliver value from data

The technologies deployed depend on which factor is most significant to the problem and the nature of the query

© 2011 by The 451 Group. All rights reserved

Technology choices

© 2011 by The 451 Group. All rights reserved

Application stack

Hardware

Database

Users

Application

© 2011 by The 451 Group. All rights reserved

Traditional scalability

Database

Users

Application

Users

Application

Users

Application

Hardware

© 2011 by The 451 Group. All rights reserved

Commodity hardware

Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware

Database

Application Application Application

Users Users Users

© 2011 by The 451 Group. All rights reserved

User explosion

Users Users Users Users Users Users Users Users

Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware

Database

Application Application Application

© 2011 by The 451 Group. All rights reserved

Application scalability

Users Users Users Users Users Users Users Users

Application Application Application Application Application Application

Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware

Database

© 2011 by The 451 Group. All rights reserved

Data management use cases

Database

Operational

Analytic

© 2011 by The 451 Group. All rights reserved

Data management use cases

Data management

real-time transaction

and data ingestion

large scale data storage and analysis

© 2011 by The 451 Group. All rights reserved

Data management requirements

real-time transaction

and data ingestion

large scale data storage and analysis

Data ingestion/analysis • random reads and writes • real-time • low, predictable latency • high performance Data storage/analysis • lower-cost storage • large-scale analytics • read heavy • batch processing

© 2011 by The 451 Group. All rights reserved

Data management requirements

real-time transaction

and data ingestion

large scale data storage and analysis

Data ingestion/analysis • MySQL • Data caching • NoSQL, NewSQL databases • Stream processing Data storage/analysis • Data warehouse/marts • In-database analytics • Online repository • Hadoop

© 2011 by The 451 Group. All rights reserved

Emerging scalability

Users Users Users Users Users Users Users Users

Application Application Application Application Application Application

Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware

Data ingestion/

analysis

Data

storage/analysis

Data ingestion/

analysis

Data storage/a

nalysis

Data ingestion/

analysis

Data storage/a

nalysis

Data ingestion/

analysis

Data

storage/analysis

Data ingestion/

analysis

Data storage/a

nalysis

Data ingestion/

analysis

Data storage/a

nalysis

Data storage/a

nalysis

Data storage/a

nalysis

© 2011 by The 451 Group. All rights reserved

Database SPRAIN

26

Scalability Hardware economics

NoSQL, NewSQL databases, Hadoop, cloud computing

Performance Database limitations

Data caching, in-memory, virtual machine, stream/event processing

Relaxed consistency

CAP Theorem NoSQL databases, cloud computing

Agility Polyglot persistence

Agile development, schema-free, non-relational, in-memory

Intricacy Big data, total data

Non-relational database, NoSQL, Hadoop, in-database analytics, memory storage

Necessity Open source The failure of incumbent vendors to address emerging requirements

© 2011 by The 451 Group. All rights reserved

Emerging scalability

Users Users Users Users Users Users Users Users

Application Application Application Application Application Application

Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware

Data ingestion/

analysis

Data

storage/analysis

Data ingestion/

analysis

Data storage/a

nalysis

Data ingestion/

analysis

Data storage/a

nalysis

Data ingestion/

analysis

Data

storage/analysis

Data ingestion/

analysis

Data storage/a

nalysis

Data ingestion/

analysis

Data storage/a

nalysis

Data storage/a

nalysis

Data storage/a

nalysis

Virtual machine

Virtual machine

© 2011 by The 451 Group. All rights reserved

Relevant reports

Total Data

Explaining the total data management approach to dealing with the impact of big data on the data management landscape

Coming late 2011

Including the growing Hadoop ecosystem and real-time

sales@the451group.com

COMING LATE 2011

Gil Tene

CTO, Azul Systems

Enterprise Big Data

Java, Building Blocks and Performance Impacts

©2011 Azul Systems, Inc. 2 2

What Azul does for the Big Data space

• We provide Java runtimes that scale consistently

• We eliminate the Garbage Collection problem

• Our runtimes are elastic in nature

• Our runtimes are an ideal building block for Big Data

©2011 Azul Systems, Inc. 3 3

Many technologies support Big Data

• Operational databases / data warehouses/marts

• Data integration / data virtualization

• Business intelligence tools

• Hadoop / equivalent / alternative technology

• Relational / non-relational databases

• In-memory databases

• Data caching

• Stream / event processing

• In-database analytics

• Disk / memory storage

©2011 Azul Systems, Inc. | Azul Company Confidential 4

Value and infrastructure

©2011 Azul Systems, Inc. 5 5

Value inflection points, Data

Data

Value

Volume, Velocity,

Variety

Value of data

Value = (Volume ± Variety ± Velocity) x Totality

Time

©2011 Azul Systems, Inc. 6 6

Volume / Velocity/ Variety

translation to underlying capacity metrics

• Higher Volume = larger data set sizes, amounts of state

• Higher Velocity = higher processing rate, throughput

• Higher Variety = more metadata, indexes, cross-linking

©2011 Azul Systems, Inc. | Azul Company Confidential 7

Timeliness

©2011 Azul Systems, Inc. 8 8

Inflection points – Timeliness

Data

Value

1/t

Value of data

Value = (Volume ± Variety ± Velocity) x Totality

Time

©2011 Azul Systems, Inc. 9 9

Real world example of timeliness:

Affecting shopping decisions

I went [online] shopping for a camping trip last week

• Spent multi-$100: tent, sleeping bags, other items.

• Researched and shopped around for hours, across

multiple sites

• Relevant ads that showed up within my decision

window affected my shopping

• Ads that showed up a day later did not…

©2011 Azul Systems, Inc. 10 10

Big Data’s value

real-time

transaction

and data

ingestion

large scale

data storage

and analysis

Interactive application

• random reads and writes

• real-time

• low, predictable latency

• high performance

Data storage/analysis

• lower-cost storage

• large-scale analytics

• read heavy

• batch processing

©2011 Azul Systems, Inc. 11 11

Big Data’s value: enhanced by timeliness

real-time

transaction

and data

ingestion

large scale

data storage

and analysis

Interactive application

• random reads and writes

• real-time

• low, predictable latency

• high performance

Data storage/analysis

• lower-cost storage

• large-scale analytics

• read heavy

• batch processing?

Increased

Value

5 seconds

5 minutes

5 hours

1 day

1 week

©2011 Azul Systems, Inc. | Azul Company Confidential 12

Building blocks

©2011 Azul Systems, Inc. 13 13

Choice of Building blocks size

Building block granularity

©2011 Azul Systems, Inc. 14 14

Inflection points – building block size

Data

Value

Building

Block Size

Value of data

Value = (Volume ± Variety ± Velocity) x Totality

Time

©2011 Azul Systems, Inc. 15 15

Building blocks

• There are value inflection points to building block size

─ At some size levels, orders-of-magnitude improvement occur

─ E.g. when entire index fits in each replica/memory space

─ E.g. when partition sizes are big enough

• Typical physical building block

─ 12-24 core, dual-socket servers

─ 24-96 GB DRAM

• Typical process level building block

─ A Java JVM

─ 1 - 4GB of memory, 1-2 cores

• Why the mismatch?

©2011 Azul Systems, Inc. 16 16

Productivity with a scale challenge

• Variety of languages and technologies in Big Data

• Java/JVM is the enterprise default, highly productive

• But…

• Building blocks are practically limited in size

• Main limitation reason: Garbage Collection

• Main Garbage Collection issue: GC pause times

• Imposes practical limits due to stability/responsiveness

• Physical building blocks broken into 10s of tiny pieces

©2011 Azul Systems, Inc. | Azul Company Confidential 17

Critical infrastructure units

©2011 Azul Systems, Inc. 18 18

Critical unit example - HDFS

Source: Apache Hadoop documentation

©2011 Azul Systems, Inc. 19 19

Critical Infrastructure units

Size of critical infrastructure units drive cluster scale

• Central metadata nodes

• Central in-memory index nodes

• Central authoritative data nodes

• Graph-DB and dense relationship nodes

©2011 Azul Systems, Inc. 20 20

Summary: Azul’s Impact on Big Data

Break the building block size barrier

• Each instance can elastically fill an entire physical unit

• Expose/leverage value inflection points

• Remove/Expand cluster scale limitations

• Reduce/Consolidate instance counts

• Address tuning challenges

• Ensure predictable performance

For More Information: Azul Systems Web site: www.azulsystems.com

Technical resources: ./resources

Zing trial: ./trial

451 Group: www.the451group.com

blogs.the451group.com/information_management/

Webinar replay:

www.azulsystems.com/resources/webinars

top related