Getting your Big Data on with HDInsight
DESCRIPTION
Introduction to HDInsight and its capabilities, including Azure Storage, Hive, MapReduce, Mahout and HBase. See also some of the tools mentioned at http://bigdata.red-gate.com/ and the source code at https://github.com/simonellistonball/GettingYourBigDataOnMapReduce
TRANSCRIPT
Simon Elliston Ball, Head of Big Data
@sireb
Getting your Big Data on with HDInsight
http://bit.ly/GettingHDInsight #gettingHDInsight
HDInsight: Hadoop on Azure.
wasb://
YARN
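A wasb:// address names a blob container and a storage account, followed by a path inside that container. As a rough sketch of how the pieces fit together (the container, account and path below are made-up examples, not values from the talk):

```python
from urllib.parse import urlparse

def parse_wasb(uri):
    """Split a wasb:// or wasbs:// URI into container, account host and blob path."""
    parts = urlparse(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb URI: " + uri)
    # The authority has the form <container>@<account>.blob.core.windows.net
    container, _, host = parts.netloc.partition("@")
    return {"container": container, "host": host, "path": parts.path}

# Hypothetical container and storage account names, for illustration only.
example = parse_wasb(
    "wasb://mycontainer@myaccount.blob.core.windows.net/example/data/sample.log"
)
print(example)
```

An address with an empty authority (wasb:///some/path) comes back with an empty container and host here; on a cluster that form refers to the default container.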
Big Data
What can I do with it?
Data warehousing
Machine Learning
Batch Analytics
ETL
HDInsight (c. 2013)
All grown up
Portal
Creating a cluster
PowerShell
Getting data in
http://www.cerebrata.com/products/azure-explorer/
http://bigdata.red-gate.com/hdfs-explorer
Import Export tool for RDBMS
Sqoop up that SQL
Command line based
Generates MapReduce jobs
Doing it with PowerShell
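The demo drives Sqoop from PowerShell; the sketch below only shows the shape of the underlying command line, assembled in Python. The server, database, user, table and target directory are hypothetical placeholders, and the password is left out on purpose (Sqoop can prompt for it with -P).

```python
import shlex

def sqoop_import_args(jdbc_url, username, table, target_dir, mappers=1):
    """Build the argument list for a `sqoop import` of one table into a target directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
        "-m", str(mappers),  # number of parallel map tasks
    ]

# All names below are illustrative assumptions, not values from the talk.
args = sqoop_import_args(
    "jdbc:sqlserver://myserver.example.com:1433;database=mydb",
    "myuser",
    "Orders",
    "/example/data/orders",
    mappers=4,
)
print(" ".join(shlex.quote(a) for a in args))
```

Each map task imports a slice of the table, which is where the "Generates MapReduce jobs" point comes from.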
Demo!
Sqoop up that SQL
SELECT * FROM hivesampletable
Hive: like SQL
Support for window functions
Rollups, aggregates
Limited support for some SQL features
Hive: like SQL, but…
Works on arbitrary data
Schema on Read
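Schema on read means the files stay as raw text and the column names and types are applied only when a query reads them. A minimal sketch of the idea in Python, assuming tab-delimited data and invented column names (this illustrates the concept, not how Hive is implemented):

```python
import csv
import io

# Raw file contents as they might sit in storage: no types, no declared columns.
RAW = "2013-11-01\tWindows Phone\t12\n2013-11-01\tAndroid\t7\n"

# The schema lives with the reader, not with the data (hypothetical column names).
SCHEMA = [("day", str), ("platform", str), ("sessions", int)]

def read_with_schema(text, schema):
    """Apply column names and types to tab-delimited text at read time."""
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}

rows = list(read_with_schema(RAW, SCHEMA))
total_sessions = sum(row["sessions"] for row in rows)
print(rows[0], total_sessions)
```

A different SCHEMA could be laid over the same bytes without rewriting the file, which is what lets Hive work on arbitrary data.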
Demo!
Hive
Java based
MapReduce
Simple algorithm
Map -> Sort / Shuffle -> Reduce
Map (key: value): a:1  a:1  b:1  c:1
Sort / Shuffle (key: values): a:1,1  b:1  c:1
Reduce (key: value): a:2  b:1  c:1
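The three stages on this slide can be walked through in plain Python using the same keys a, a, b, c. This is a single-process sketch of the algorithm only; the function names are invented here and nothing in it touches Hadoop.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Emit a (key, 1) pair for every input record."""
    for key in records:
        yield (key, 1)

def shuffle_phase(pairs):
    """Sort by key and group the values for each key together."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield (key, [value for _, value in group])

def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    for key, values in grouped:
        yield (key, sum(values))

mapped = list(map_phase(["a", "a", "b", "c"]))
shuffled = list(shuffle_phase(mapped))
reduced = list(reduce_phase(shuffled))
print(mapped)    # [('a', 1), ('a', 1), ('b', 1), ('c', 1)]
print(shuffled)  # [('a', [1, 1]), ('b', [1]), ('c', [1])]
print(reduced)   # [('a', 2), ('b', 1), ('c', 1)]
```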
Streaming Interface
MapReduce .NET
http://hadoopsdk.codeplex.com/
PM> Install-Package Microsoft.Hadoop.MapReduce
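The streaming interface lets any executable act as mapper or reducer: it reads lines on standard input and writes tab-separated key and value lines on standard output, and the reducer receives its input sorted by key. The sketch below shows that contract in Python rather than .NET, with the sort step simulated locally; it is an illustration of the protocol, not the SDK's API.

```python
import io

def mapper(stdin, stdout):
    """Streaming mapper: read text lines, write one 'word<TAB>1' line per word."""
    for line in stdin:
        for word in line.split():
            stdout.write(word + "\t1\n")

def reducer(stdin, stdout):
    """Streaming reducer: input arrives sorted by key, so sum runs of equal keys."""
    current, total = None, 0
    for line in stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if current is not None and key != current:
            stdout.write(current + "\t" + str(total) + "\n")
            total = 0
        current = key
        total += int(value)
    if current is not None:
        stdout.write(current + "\t" + str(total) + "\n")

# Simulate the framework locally: map, sort the intermediate lines, then reduce.
map_out = io.StringIO()
mapper(io.StringIO("a b a\nc\n"), map_out)
sorted_lines = "".join(sorted(map_out.getvalue().splitlines(keepends=True)))
reduce_out = io.StringIO()
reducer(io.StringIO(sorted_lines), reduce_out)
print(reduce_out.getvalue())
```

As real streaming scripts, the same two functions would be called with sys.stdin and sys.stdout in separate mapper and reducer files.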
Demo!
MapReduce .NET
Machine learning library for Hadoop
Mahout
Just another Hadoop Job
All packaged in a jar
Demo!
Excel and HDInsight
High performance Key-Value store
HBase
Different cluster type in the portal
Can link to MapReduce and Hive
HDFS Explorer
Quick plug
http://bigdata.red-gate.com/
Hadoop Import/Export
Questions?
Simon Elliston Ball
@sireb
http://bit.ly/GettingHDInsight #gettingHDInsight