Getting your Big Data on with HDInsight

Upload: simon-elliston-ball

Post on 25-May-2015

DESCRIPTION

An introduction to HDInsight and its capabilities, including Azure Storage, Hive, MapReduce, Mahout, and HBase. See also some of the tools mentioned at http://bigdata.red-gate.com/ and the source code at https://github.com/simonellistonball/GettingYourBigDataOnMapReduce

TRANSCRIPT

Page 1: Getting your Big Data on with HDInsight

Simon Elliston Ball Head of Big Data

@sireb

Getting your Big Data on with HDInsight

http://bit.ly/GettingHDInsight #gettingHDInsight

Page 2: Getting your Big Data on with HDInsight

HDInsight: Hadoop on Azure.

Page 3: Getting your Big Data on with HDInsight

HDInsight: Hadoop

Page 4: Getting your Big Data on with HDInsight

wasb://

HDInsight: Hadoop on Azure.
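
The wasb:// scheme on this slide is how HDInsight addresses files held in Azure Blob Storage. As a rough illustration, the sketch below builds and parses URIs of the form wasb://<container>@<account>.blob.core.windows.net/<path>. The helper names, the container "data" and the account "examplestore" are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit


def wasb_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasb:// URI for a blob path inside a storage container."""
    blob_path = path.lstrip("/")
    return f"wasb://{container}@{account}.blob.core.windows.net/{blob_path}"


def parse_wasb_uri(uri: str) -> dict:
    """Split a wasb:// URI into container, storage account and blob path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb URI: {uri}")
    # The container name sits before the '@', the account is the first host label.
    container, _, host = parts.netloc.partition("@")
    account = host.split(".", 1)[0]
    return {
        "container": container,
        "account": account,
        "path": parts.path.lstrip("/"),
    }


# Hypothetical container and account names, for illustration only.
example = wasb_uri("data", "examplestore", "logs/2015/05/part-00000")
```

Calling parse_wasb_uri(example) gives back the container, account and path that went in, which is handy when a job only receives the full URI.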

Page 5: Getting your Big Data on with HDInsight

wasb://

YARN

HDInsight: Hadoop on Azure.

Page 6: Getting your Big Data on with HDInsight

wasb://

YARN

Page 7: Getting your Big Data on with HDInsight

Big Data

What can I do with it?

Data warehousing

Machine Learning

Batch Analytics

ETL

Page 8: Getting your Big Data on with HDInsight

HDInsight (c. 2013)

Page 9: Getting your Big Data on with HDInsight

All grown up

Page 10: Getting your Big Data on with HDInsight

Creating a cluster

Portal

PowerShell

Page 11: Getting your Big Data on with HDInsight

Getting data in

http://www.cerebrata.com/products/azure-explorer/

http://bigdata.red-gate.com/hdfs-explorer

Page 12: Getting your Big Data on with HDInsight

Sqoop up that SQL

Import/export tool for RDBMS

Command line based

Generates MapReduce jobs

Doing it with PowerShell
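
Since Sqoop is driven from the command line, an import is essentially a list of arguments. The talk drives it from PowerShell; the sketch below uses Python instead, purely to show the shape of a typical sqoop import call. The server, database, table and target directory are hypothetical placeholders, and only a handful of common import arguments are shown.

```python
import shlex


def sqoop_import_args(jdbc_url, table, target_dir, mappers=4):
    """Return the argument list for a basic 'sqoop import' of one table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string for the source RDBMS
        "--table", table,               # source table to copy
        "--target-dir", target_dir,     # destination directory in the cluster's storage
        "--num-mappers", str(mappers),  # number of parallel map tasks Sqoop generates
    ]


def as_command_line(args):
    """Quote each argument so the list can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical server, database, table and path, for illustration only.
args = sqoop_import_args(
    "jdbc:sqlserver://example.database.windows.net:1433;database=sales",
    "orders",
    "/data/orders",
)
command_line = as_command_line(args)
```

Keeping the arguments as a list until the last moment avoids quoting problems with the semicolon in the JDBC URL.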

Page 13: Getting your Big Data on with HDInsight

Demo!

Sqoop up that SQL

Page 14: Getting your Big Data on with HDInsight

Hive: like SQL

SELECT * FROM hivesampletable

Support for window functions

Rollups, aggregates

Page 15: Getting your Big Data on with HDInsight

Hive: like SQL, but…

Limited support for some SQL features

Works on arbitrary data

Schema on Read
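
Schema on read means the raw files stay untouched and a schema is only applied when a query reads them. The following is a minimal Python sketch of that idea, not Hive itself: the sample rows, column names and converters are made up for illustration, and a value that does not fit its declared type becomes None, in the spirit of Hive returning NULL.

```python
import csv
import io

# Raw tab-delimited text as it might sit in storage; nothing validated it on write.
RAW = "2015-05-25\tGB\t42\n2015-05-26\tUS\tn/a\n"

# The schema lives with the reader, not with the data: (column name, converter).
SCHEMA = [("day", str), ("country", str), ("clicks", int)]


def read_with_schema(text, schema):
    """Yield one dict per line, applying the schema only at read time."""
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        row = {}
        for (name, convert), raw in zip(schema, fields):
            try:
                row[name] = convert(raw)
            except ValueError:
                # A value that does not match the declared type becomes None.
                row[name] = None
        yield row


rows = list(read_with_schema(RAW, SCHEMA))
```

Changing SCHEMA and re-reading the same RAW text gives a different view of the data without rewriting anything in storage.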

Page 16: Getting your Big Data on with HDInsight

Demo!

Hive

Page 17: Getting your Big Data on with HDInsight

MapReduce

Java based

Simple algorithm

Map: a:1  a:1  b:1  c:1 (key: value)

Sort / Shuffle: a:1,1  b:1  c:1 (key: value)

Reduce: a:2  b:1  c:1 (key: value)
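
The three stages on this slide can be sketched in a few lines of plain Python. This runs in a single process and only mimics the data flow; a real job would be written in Java and distributed by Hadoop. The function names are illustrative, not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Map: emit a (key, 1) pair for every input record."""
    for key in records:
        yield (key, 1)


def shuffle_phase(pairs):
    """Sort / Shuffle: sort by key and gather each key's values together."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield (key, [value for _, value in group])


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    for key, values in grouped:
        yield (key, sum(values))


def run(records):
    """Chain the three stages, as the framework would do across machines."""
    return list(reduce_phase(shuffle_phase(map_phase(records))))


counts = run(["a", "a", "b", "c"])
```

With the slide's input of a, a, b, c the shuffle stage produces a:[1, 1], b:[1], c:[1], and the reduce stage sums those lists.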

Page 18: Getting your Big Data on with HDInsight

MapReduce .NET

Streaming Interface

http://hadoopsdk.codeplex.com/

PM> Install-Package Microsoft.Hadoop.MapReduce
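
The streaming interface lets a mapper and reducer be any program that reads lines and writes tab-separated key and value pairs, which is what makes non-Java languages usable. The sketch below shows that line protocol in Python rather than .NET, as a word count; it is an illustration of the protocol and does not use the Microsoft.Hadoop.MapReduce package. The sort between the two stages, which Hadoop performs in a real job, is simulated locally with sorted().

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


# Local stand-in for the cluster: map, sort by key, then reduce.
mapped = list(mapper(["a b a", "c a"]))
reduced = list(reducer(sorted(mapped)))
```

In a real streaming job each function would be wrapped in a small script that reads standard input and prints to standard output.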

Page 19: Getting your Big Data on with HDInsight

Demo!

MapReduce .NET

Page 20: Getting your Big Data on with HDInsight

Mahout

Machine learning library for Hadoop

Just another Hadoop Job

All packaged in a jar

Page 21: Getting your Big Data on with HDInsight

X

Page 22: Getting your Big Data on with HDInsight

Demo!

Excel and HDInsight

Page 23: Getting your Big Data on with HDInsight

HBase

High-performance key-value store

Different cluster type in the portal

Can link to MapReduce and Hive

Page 24: Getting your Big Data on with HDInsight

Quick plug

HDFS Explorer

Hadoop Import/Export

http://bigdata.red-gate.com/

Page 25: Getting your Big Data on with HDInsight

Questions?

Simon Elliston Ball

@sireb

http://bit.ly/GettingHDInsight #gettingHDInsight