Big Data Analytics Preview
DESCRIPTION
A preview of my Big Data Analytics course on Pluralsight.com. Presented at the San Diego Tableau User Group, Nov 7th, 2013.
TRANSCRIPT
![Page 2: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/2.jpg)
Course Outline
Introduction to Big Data
Massively Parallel Processing (MPP) Databases
Cloud Big Data Sources
Accessing Big Data with Tableau
Visualizing your Big Data with Tableau
Sharing your work
![Page 3: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/3.jpg)
Why Big Data? - New Data Types
BI & Analytics
Business Data
Web Logs
Videos
Images
Sensor Data
3rd Party Apps
![Page 4: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/4.jpg)
Why Big Data? – Massive Content
![Page 5: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/5.jpg)
Why Big Data? – Variety of Data
Data Volume Growth
![Page 6: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/6.jpg)
Why Big Data? – Storage is Cheap!
Hard Drive Costs per GB since 1980
![Page 7: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/7.jpg)
Big Data

| What it IS | What it IS NOT |
| --- | --- |
| Unstructured | Transactional |
| Petabytes+ | Simple or Easy |
| Evolution of RDBMS | Structured DW |
| Many Platforms | One Platform |
| Difficult for Analytics | Easy or Fast for Analytics |

![Page 8: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/8.jpg)
Big Data Storage – Document Stores
[Diagram: data stored as JSON documents, each kept as one original plus two copies]
Platforms: HDFS, ElasticSearch, CouchDB
![Page 9: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/9.jpg)
Big Data Platforms – Platform Vendors
![Page 10: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/10.jpg)
Big Data in the Cloud
![Page 11: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/11.jpg)
![Page 12: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/12.jpg)
Amazon Redshift Architecture
Columnar DB
MPP Architecture
Speed!
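To make the two ideas on this slide concrete, here is a toy sketch in Python. It is not how Redshift is implemented; the `sales` data, the `split` helper, and the three-node setup are all made up for illustration. It only shows the general pattern: store data by column so a query reads just the column it needs, then let several workers aggregate their own slice before a leader combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Column-oriented layout: one list per column instead of one record per row.
# (Hypothetical sample data, not from the slides.)
sales = {
    "region": ["west", "east", "west", "south", "east", "west"],
    "amount": [120, 80, 200, 50, 70, 30],
}

def split(column, n_nodes):
    """Deal a column's values out to n_nodes slices, round-robin."""
    return [column[i::n_nodes] for i in range(n_nodes)]

def parallel_sum(column, n_nodes=3):
    """Each 'node' sums its own slice; the leader adds the partial sums."""
    slices = split(column, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(sum, slices))
    return sum(partials)

# Only the "amount" column is touched; "region" is never read.
total = parallel_sum(sales["amount"])
print(total)  # 550
```

Threads stand in for separate machines here; in a real MPP database each slice would live on a different node with its own disks and CPU.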
![Page 13: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/13.jpg)
Amazon Redshift Scalability
2TB XL Node
High Storage Extra Large (XL) DW Node:
CPU: 2 virtual cores - Intel Xeon E5
Memory: 15 GiB
Storage: 3 HDD with 2TB of local attached storage
Network: Moderate
Disk I/O: Moderate
API: dw.hs1.xlarge
16TB 8XL Node
High Storage Eight Extra Large (8XL) DW Node:
CPU: 16 virtual cores - Intel Xeon E5
Memory: 120 GiB
Storage: 24 HDD with 16TB of local attached storage
Network: 10 Gigabit Ethernet with support for cluster placement groups
Disk I/O: Very High
API: dw.hs1.8xlarge
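As a rough sizing aid, the per-node storage figures above can be turned into a node count. This is a back-of-the-envelope sketch only: the `nodes_needed` function is hypothetical, and it assumes raw data volume maps directly onto node storage, ignoring compression, replication, and free-space headroom.

```python
import math

# Local storage per node in TB, keyed by the API names on the slide.
NODE_STORAGE_TB = {"dw.hs1.xlarge": 2, "dw.hs1.8xlarge": 16}

def nodes_needed(data_tb, node_type):
    """Smallest number of nodes whose combined storage holds data_tb."""
    return math.ceil(data_tb / NODE_STORAGE_TB[node_type])

print(nodes_needed(100, "dw.hs1.xlarge"))   # 50
print(nodes_needed(100, "dw.hs1.8xlarge"))  # 7
```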
![Page 14: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/14.jpg)
Amazon Redshift Cost
On-Demand Pricing

| DW Node Class (On-Demand) | Hourly |
| --- | --- |
| XL Node - 2TB storage (Per Node) | $0.850 per Hour |
| 8XL Node - 16TB storage (Per Node) | $6.800 per Hour |

Reserved Instance 1yr (41% savings)

| DW Node Class (Reserved) | Up front | Hourly |
| --- | --- | --- |
| XL Node - 2TB storage (Per Node) | $2,500 | $0.215 per Hour |
| 8XL Node - 16TB storage (Per Node) | $20,000 | $1.720 per Hour |

Reserved Instance 3yr (73% savings)

| DW Node Class (Reserved) | Up front | Hourly |
| --- | --- | --- |
| XL Node - 2TB storage (Per Node) | $3,000 | $0.114 per Hour |
| 8XL Node - 16TB storage (Per Node) | $24,000 | $0.912 per Hour |

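The savings percentages can be checked from the prices listed above. The sketch below assumes a node runs continuously (8,760 hours per year, ignoring leap years); the function names are mine, not part of any AWS tooling.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes the node runs nonstop

def total_cost(upfront, hourly, years):
    """Total cost of running one node continuously for `years` years."""
    return upfront + hourly * HOURS_PER_YEAR * years

def savings_vs_on_demand(upfront, hourly, years, on_demand_hourly):
    """Fraction saved by reserving instead of paying the on-demand rate."""
    on_demand = total_cost(0, on_demand_hourly, years)
    reserved = total_cost(upfront, hourly, years)
    return 1 - reserved / on_demand

# XL node (2 TB) prices from the slide.
xl_1yr = savings_vs_on_demand(2500, 0.215, 1, 0.850)
xl_3yr = savings_vs_on_demand(3000, 0.114, 3, 0.850)
print(f"1-year reserved savings: {xl_1yr:.0%}")  # 41%
print(f"3-year reserved savings: {xl_3yr:.0%}")  # 73%

# Cost per TB per year on the 3-year term (an XL node holds 2 TB).
per_tb_year = total_cost(3000, 0.114, 3) / 3 / 2
print(f"3-year reserved cost per TB per year: ${per_tb_year:,.0f}")  # $999
```

The last figure lines up with the roughly $1K/TB/yr billing number quoted in the comparison slide later in the deck.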
![Page 15: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/15.jpg)
Amazon Redshift Ease of Use
Fully Managed
Fault Tolerant
Automated Backups
Web Interface
![Page 16: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/16.jpg)
Amazon Redshift Security
AES-256 bit Encryption Amazon VPC Firewall
![Page 17: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/17.jpg)
Amazon Redshift Compatibility
![Page 18: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/18.jpg)
BigQuery
![Page 19: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/19.jpg)
![Page 20: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/20.jpg)
Google BigQuery Architecture
Columnar DB
Tree Architecture
Speed!
![Page 21: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/21.jpg)
Google BigQuery on Speed
“Dremel can scan 35 billion rows without an index in tens of seconds” – Solutions Architect, Google Cloud Solutions Team
![Page 22: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/22.jpg)
Google BigQuery Scalability
?
![Page 23: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/23.jpg)
Google BigQuery Cost
On-Demand Pricing

| Resource | Pricing |
| --- | --- |
| Storage | $80 (per TB/month) |
| Interactive Queries | $35 (per TB processed) |
| Batch Queries | $20 (per TB processed) |

Packaged Pricing

| Data | Cost |
| --- | --- |
| 100 TB | $3,300 per month ($33 per TB) |
| 400 TB | $12,000 per month ($30 per TB) |
| 1,500 TB | $40,500 per month ($27 per TB) |
| 4,000 TB | $100,000 per month ($25 per TB) |

• Packages are billed in full at the end of each month, whether the package is used or not.
• If you use more data than the amount in your chosen package, on-demand rates apply for any additional data.
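Because packages are billed in full and overage falls back to on-demand rates, the cheapest choice depends on how much data you actually process in a month. The sketch below compares the options using the prices above. Two assumptions of mine, not stated on the slide: the package sizes refer to TB processed by queries, and overage is charged at the $35 interactive rate. Storage charges are left out.

```python
# Package tiers from the slide: (included TB, monthly price in USD).
PACKAGES = [(100, 3300), (400, 12000), (1500, 40500), (4000, 100000)]
ON_DEMAND_PER_TB = 35  # assumption: overage billed at the interactive rate

def package_cost(included_tb, price, processed_tb):
    """Full package price plus on-demand charges for any overage."""
    overage_tb = max(0, processed_tb - included_tb)
    return price + overage_tb * ON_DEMAND_PER_TB

def cheapest_option(processed_tb):
    """Return (option name, monthly cost) for the lowest-cost choice."""
    options = {"on-demand": processed_tb * ON_DEMAND_PER_TB}
    for included_tb, price in PACKAGES:
        name = f"{included_tb} TB package"
        options[name] = package_cost(included_tb, price, processed_tb)
    best = min(options, key=options.get)
    return best, options[best]

print(cheapest_option(50))   # ('on-demand', 1750)
print(cheapest_option(120))  # ('100 TB package', 4000)
```

Under these assumptions the 100 TB package only pays off once you process more than about 94 TB a month (3,300 / 35).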
![Page 24: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/24.jpg)
Google BigQuery Compatibility
![Page 25: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/25.jpg)
Cloud Big Data Sources Comparison
Amazon Redshift
Columnar + MPP
Petabytes in Scale
Easy management interface
Straightforward billing ($1K/TB/Yr)
Great connectivity w/ BI Tools
Google BigQuery
Columnar + Tree
Infinite Scalability
No Management Required
Confusing Pricing Model
Fair Connectivity w/ BI Tools
![Page 26: Big Data Analytics Preview](https://reader035.vdocuments.us/reader035/viewer/2022062513/554e8f35b4c90526358b4ce2/html5/thumbnails/26.jpg)
bensullins.com