big data analytics preview

Post on 10-May-2015


DESCRIPTION

A preview of my Big Data Analytics course on Pluralsight.com - Presented at the San Diego Tableau User Group Nov 7th, 2013

TRANSCRIPT

Big Data Analytics Preview

Ben Sullins

bensullins.com

ben@bensullins.com

Course Outline

Introduction to Big Data

Massively Parallel Processing (MPP) Databases

Cloud Big Data sources

Accessing Big Data with Tableau

Visualizing your Big Data with Tableau

Sharing your work

Why Big Data? – New Data Types

BI & Analytics

Business Data

Web Logs

Videos

Images

Sensor Data

3rd Party Apps

Why Big Data? – Massive Content

Why Big Data? – Variety of Data

Data Volume Growth

Why Big Data? – Storage is Cheap!

Hard Drive Costs per GB since 1980

Big Data

What it IS

• Unstructured
• Petabytes+
• Evolution of RDBMS
• Many Platforms
• Difficult for Analytics

What it IS NOT

• Transactional
• Simple or Easy
• Structured DW
• One Platform
• Easy or Fast for Analytics

Big Data Storage – Document Stores

[Diagram: data stored as JSON documents, with one original and two copies]

Platforms: HDFS, ElasticSearch, CouchDB

Big Data Platforms – Platform Vendors

Big Data in the Cloud

Amazon Redshift Architecture

Columnar DB

MPP Architecture

Speed!

Amazon Redshift Scalability

XL Node (2TB)

High Storage Extra Large (XL) DW Node:
• CPU: 2 virtual cores - Intel Xeon E5
• Memory: 15 GiB
• Storage: 3 HDD with 2TB of local attached storage
• Network: Moderate
• Disk I/O: Moderate
• API: dw.hs1.xlarge

8XL Node (16TB)

High Storage Eight Extra Large (8XL) DW Node:
• CPU: 16 virtual cores - Intel Xeon E5
• Memory: 120 GiB
• Storage: 24 HDD with 16TB of local attached storage
• Network: 10 Gigabit Ethernet with support for cluster placement groups
• Disk I/O: Very High
• API: dw.hs1.8xlarge

Amazon Redshift Cost

On-Demand Pricing

DW Node Class (On-Demand) | Hourly
XL Node - 2TB storage (Per Node) | $0.850 per Hour
8XL Node - 16TB storage (Per Node) | $6.800 per Hour

Reserved Instance 1yr (41% savings)

DW Node Class (Reserved) | Up front | Hourly
XL Node - 2TB storage (Per Node) | $2,500 | $0.215 per Hour
8XL Node - 16TB storage (Per Node) | $20,000 | $1.720 per Hour

Reserved Instance 3yr (73% savings)

DW Node Class (Reserved) | Up front | Hourly
XL Node - 2TB storage (Per Node) | $3,000 | $0.114 per Hour
8XL Node - 16TB storage (Per Node) | $24,000 | $0.912 per Hour
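
The savings percentages and the "$1K/TB/Yr" figure quoted later in the deck can be checked against the listed XL node prices. The sketch below is not from the slides; it assumes a node runs around the clock for the whole term and that a year has 8,760 hours. The function and variable names are illustrative.

```python
# Assumption: the node runs 24x7 for the full term; a year is 8,760 hours.
HOURS_PER_YEAR = 24 * 365


def total_cost(upfront, hourly, years):
    """Total cost in dollars of running one node continuously for `years` years."""
    return upfront + hourly * HOURS_PER_YEAR * years


# XL node (2 TB) prices as listed on the slide.
on_demand_1yr = total_cost(0, 0.850, 1)
reserved_1yr = total_cost(2_500, 0.215, 1)
reserved_3yr = total_cost(3_000, 0.114, 3)

# Savings relative to paying the on-demand rate for the same period.
savings_1yr = 1 - reserved_1yr / on_demand_1yr
savings_3yr = 1 - reserved_3yr / (on_demand_1yr * 3)

# Effective yearly cost per TB on the 3-year reservation (XL node holds 2 TB).
cost_per_tb_year = reserved_3yr / 3 / 2

print(f"1yr reserved savings: {savings_1yr:.0%}")
print(f"3yr reserved savings: {savings_3yr:.0%}")
print(f"3yr reserved cost per TB per year: ${cost_per_tb_year:,.2f}")
```

Under these assumptions the results line up with the slide: roughly 41% and 73% savings, and just under $1,000 per TB per year on the 3-year reservation.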

Amazon Redshift Ease of Use

Fully Managed

Fault Tolerant

Automated Backups

Web Interface

Amazon Redshift Security

AES-256 bit Encryption

Amazon VPC

Firewall

Amazon Redshift Compatibility

BigQuery

Google BigQuery Architecture

Columnar DB

Tree Architecture

Speed!

Google BigQuery on Speed

“Dremel can Scan 35 Billion Rows without an Index in Tens of Seconds” – Solutions Architect, Google Cloud Solutions Team

Google BigQuery Scalability

?

Google BigQuery Cost

On-Demand Pricing

Resource | Pricing
Storage | $80 (per TB/month)
Interactive Queries | $35 (per TB processed)
Batch Queries | $20 (per TB processed)

Packaged Pricing

Data | Cost
100 TB | $3,300 per month ($33 per TB)
400 TB | $12,000 per month ($30 per TB)
1,500 TB | $40,500 per month ($27 per TB)
4,000 TB | $100,000 per month ($25 per TB)

• Packages are billed in full at the end of each month, whether the package is used or not.

• If you use more data than the amount in your chosen package, on-demand rates apply for any additional data.
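
The two billing rules above make the choice between a package and on-demand pricing depend on monthly query volume. The following is a rough sketch, not an official calculator. It assumes the package sizes measure TB of interactive queries processed, that overage is billed at the $35 interactive rate, and that storage is billed separately and left out. All function names are hypothetical.

```python
# Assumption: overage is billed at the interactive on-demand rate from the slide.
ON_DEMAND_INTERACTIVE = 35.0  # dollars per TB processed

# (included TB per month, monthly price in dollars) from the packaged-pricing table.
PACKAGES = [(100, 3_300), (400, 12_000), (1_500, 40_500), (4_000, 100_000)]


def on_demand_cost(tb_processed):
    """Monthly query cost with no package."""
    return tb_processed * ON_DEMAND_INTERACTIVE


def package_cost(tb_processed, included_tb, monthly_price):
    """Monthly query cost on a package: full price is always billed, plus overage."""
    overage_tb = max(0, tb_processed - included_tb)
    return monthly_price + overage_tb * ON_DEMAND_INTERACTIVE


def cheapest_plan(tb_processed):
    """Return (plan name, monthly cost) for the cheapest option at this volume."""
    options = [("on-demand", on_demand_cost(tb_processed))]
    for included_tb, price in PACKAGES:
        cost = package_cost(tb_processed, included_tb, price)
        options.append((f"{included_tb} TB package", cost))
    return min(options, key=lambda option: option[1])


for volume in (50, 120, 1_000):
    plan, cost = cheapest_plan(volume)
    print(f"{volume} TB/month: {plan} at ${cost:,.0f}")
```

Because a package is billed in full whether or not it is used, on-demand stays cheaper until usage approaches the package size. Under these assumptions the 100 TB package breaks even at about 94 TB per month ($3,300 / $35).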

Google BigQuery: Compatibility

Cloud Big Data Sources Comparison

Amazon Redshift

Columnar + MPP

Petabytes in Scale

Easy management interface

Straightforward billing ($1K/TB/Yr)

Great connectivity w/ BI Tools

Google BigQuery

Columnar + Tree

Infinite Scalability

No Management Required

Confusing Pricing Model

Fair Connectivity w/ BI Tools

bensullins.com
