Data Management in Microsoft HDInsight: How to Move and Store Your Data

Upload: saptak-sen

Post on 13-Apr-2017

TRANSCRIPT

Page 1: Data Management in Microsoft HDInsight: How to Move and Store Your Data
Page 2: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Data Management in Microsoft HDInsight: How to Move and Store Your Data

Saptak Sen, Azure Data Platform, @saptak

DBI-B334

Page 3: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Agenda
• What is HDInsight
  • Hadoop, OSS and HDInsight
  • HDInsight Architecture
• Working with Data in HDInsight
  • Where & how to store data for easy “big data” processing
• Consuming Result Sets from HDInsight Queries/Jobs
  • How to move result sets into familiar tools/solutions (Excel, RDBMS, etc)
• Questions

Page 4: Data Management in Microsoft HDInsight: How to Move and Store Your Data

What is HDInsight?

Page 5: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Hadoop Distributed Architecture

Page 6: Data Management in Microsoft HDInsight: How to Move and Store Your Data

MapReduce: Move Code to the Data
FIRST, STORE THE DATA

[Diagram: files distributed across several servers]

Page 7: Data Management in Microsoft HDInsight: How to Move and Store Your Data

So How Does It Work?
SECOND, TAKE THE PROCESSING TO THE DATA

// Map Reduce function in JavaScript
var map = function (key, value, context) {
    // Split the input line on anything that is not a letter
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            // Emit (word, 1) for every non-empty token
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    // Sum all counts emitted for this word
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

[Diagram: the runtime sends the code to each server]
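To see how the map and reduce functions on this slide fit together, the sketch below runs them locally on two lines of text. The harness (`runLocally` and the small context and iterator objects) is hypothetical: it only mimics the calls the slide's code makes (`context.write`, `values.hasNext`, `values.next`) and is not the HDInsight runtime.

```javascript
// Word-count map and reduce, following the slide.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical local harness (an assumption for illustration only).
function runLocally(lines) {
    // Group emitted values by key, standing in for the shuffle step.
    var grouped = Object.create(null);
    lines.forEach(function (line, offset) {
        map(offset, line, {
            write: function (k, v) {
                (grouped[k] = grouped[k] || []).push(v);
            }
        });
    });

    // Call reduce once per key with an iterator over that key's values.
    var results = {};
    Object.keys(grouped).sort().forEach(function (k) {
        var vals = grouped[k];
        var pos = 0;
        var iterator = {
            hasNext: function () { return pos < vals.length; },
            next: function () { return vals[pos++]; }
        };
        reduce(k, iterator, {
            write: function (word, total) { results[word] = total; }
        });
    });
    return results;
}

var counts = runLocally(["Store the data", "Take the processing to the data"]);
console.log(counts);
```

On a real cluster the map calls run on the servers that hold each block of the input, which is the point of the slide; the harness runs everything in one process.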

Page 8: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Windows Azure HDInsight Service

[Architecture diagram. Components shown:]
• Hadoop, on top of the Hadoop Filesystem Interface (HDFS, Windows Azure Blob Storage)
• Query & Metadata: Hive, Pig, Map Reduce, HCatalog
• Data Movement: Sqoop
• Workflow: Oozie
• Monitoring: Ambari
• Gateway (REST APIs)
• Client operations: data upload/download, job submission (hive query, etc)

Page 9: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Windows Azure HDInsight Service

[Cluster diagram. Components shown:]
• Hadoop Cluster: one Head Node and four Compute Nodes
• Windows Azure Blob Storage
• Gateway (REST APIs)
• Job submission (hive query, etc)
• Cluster Dashboard UI

Page 10: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Working With Data in HDInsight

Page 11: Data Management in Microsoft HDInsight: How to Move and Store Your Data

DEMO: Creating a Hadoop Cluster, Exploring the Filesystem

Page 12: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Storing Data for use with HDInsight Service
• WHERE: All persistent data is stored in Windows Azure Blob Storage
  • Provides sharable, persistent, highly-scalable storage with Geo DR
  • HDInsight has been optimized for fast access from its compute nodes to blob storage in the same Azure region (east, west, etc)
• WHAT: File format used in blob storage is up to you, but using a format with existing serializer/deserializers (aka SerDe) is often a good choice (e.g. comma delim, Avro, JSON, etc)
• WHY: By separating HDInsight compute nodes from persistent storage you can:
  • Pay only for what you need: drop your HDInsight cluster whenever you don’t have work to do
  • Let multiple clusters access the same data, but isolate the compute resources by org/job/team/etc.
• HOW:
  • All data access in Hadoop goes through a pluggable file system interface
  • In on-prem Hadoop installations, this interface is implemented by Hadoop Distributed File System (HDFS)
  • In Azure, HDInsight clusters use this mechanism to be wired to blob storage accounts by default
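The HOW point can be pictured with a small sketch: callers read and write through one interface, and the URI scheme decides which store answers. Everything here is hypothetical (`InMemoryStore`, `register`, `resolve`, and the shortened URIs); it illustrates the idea of a pluggable layer and is not Hadoop's actual file system API.

```javascript
// A stand-in backing store. Real implementations would talk to HDFS or
// to blob storage; this one just keeps data in memory.
function InMemoryStore(label) {
    this.label = label;
    this.files = Object.create(null);
}
InMemoryStore.prototype.write = function (path, data) { this.files[path] = data; };
InMemoryStore.prototype.read = function (path) { return this.files[path]; };

// Registry from URI scheme to store: the "pluggable" part.
var registry = Object.create(null);
function register(scheme, store) { registry[scheme] = store; }

function resolve(uri) {
    var match = /^([a-z]+):\/\/(.*)$/.exec(uri);
    if (!match) { throw new Error("Not a scheme-qualified URI: " + uri); }
    var store = registry[match[1]];
    if (!store) { throw new Error("No file system registered for: " + match[1]); }
    return { store: store, path: "/" + match[2] };
}

// Callers only ever use these two functions, whatever the backing store.
function writeFile(uri, data) {
    var target = resolve(uri);
    target.store.write(target.path, data);
    return target.store.label;
}
function readFile(uri) {
    var target = resolve(uri);
    return target.store.read(target.path);
}

register("hdfs", new InMemoryStore("HDFS stand-in"));
register("asv", new InMemoryStore("blob storage stand-in"));

var writtenTo = writeFile("asv://data/input.txt", "hello");
console.log(writtenTo, readFile("asv://data/input.txt"));
```

Because jobs address data only through the interface, swapping the store behind a scheme does not require changing the job code.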

Page 13: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Using Blob Storage From HDInsight

• An HDInsight cluster is bound to one “default” blob storage account & container at cluster create time
• Using the “default” container requires no special addressing to access (“/” == root folder, etc)
• To access additional blob storage accounts or containers, use the full address:
  asv[s]://<container>@<account>.blob.core.windows.net/<path>
• Storage accounts other than the default need to be registered in core-site.xml:
  <property>
    <name>fs.azure.account.key.accountname</name>
    <value>enterthekeyvaluehere</value>
  </property>
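As a minimal sketch of the addressing scheme on this slide, the helpers below assemble an `asv[s]://` address and the matching `<property>` element. The function names, parameters and sample values (`logs`, `myaccount`, the path) are hypothetical; only the address pattern and the property layout come from the slide.

```javascript
// Builds asv[s]://<container>@<account>.blob.core.windows.net/<path>.
// Pass secure = true for the asvs variant.
function asvUri(container, account, path, secure) {
    var scheme = secure ? "asvs" : "asv";
    // Drop leading slashes so the result has exactly one after the host.
    var cleanPath = String(path || "").replace(/^\/+/, "");
    return scheme + "://" + container + "@" + account +
        ".blob.core.windows.net/" + cleanPath;
}

// Renders the configuration entry for one additional storage account,
// following the property name pattern shown on the slide.
function accountKeyProperty(account, key) {
    return "<property>\n" +
        "  <name>fs.azure.account.key." + account + "</name>\n" +
        "  <value>" + key + "</value>\n" +
        "</property>";
}

var uri = asvUri("logs", "myaccount", "/2013/06/app.log", true);
var prop = accountKeyProperty("myaccount", "enterthekeyvaluehere");
console.log(uri);
console.log(prop);
```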

Page 14: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Uploading Data to Blob Storage
• For prototyping / samples: #put
• For production data interact directly with blob storage APIs.
  • AzCopy Command Line
  • CopyBlob REST API
  • Third party upload/download tools:

Page 15: Data Management in Microsoft HDInsight: How to Move and Store Your Data

AzCopy Example

File System:
  C:\blobs\a.txt
  C:\blobs\b.txt
  C:\blobs\dir1\c.txt
  C:\blobs\dir1\dir2\d.txt

Command Line:
  AzCopy c:\blobs https://<account>.blob.core.windows.net/mycontainer/ /destkey:<key> /S

Blob Storage:
  Container      Blob Name
  mycontainer    a.txt
  mycontainer    b.txt
  mycontainer    dir1\c.txt
  mycontainer    dir1\dir2\d.txt

HDInsight will treat dir1\dir2\d.txt as a file in a 2-level dir structure
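The mapping from local files to blob names can be sketched as below. This is not AzCopy's code: `blobNameFor` and `directoryDepth` are hypothetical helpers, and the sketch assumes the path separator in the blob name is normalized to "/" so that each segment reads as one directory level.

```javascript
// Derives a blob name from a local Windows path, relative to the upload root.
function blobNameFor(root, filePath) {
    // Ignore trailing separators and case differences in the root (c:\ vs C:\).
    var normalizedRoot = root.replace(/[\\\/]+$/, "").toLowerCase();
    if (filePath.toLowerCase().indexOf(normalizedRoot + "\\") !== 0) {
        throw new Error(filePath + " is not under " + root);
    }
    return filePath.slice(normalizedRoot.length + 1).split("\\").join("/");
}

// Number of directory levels above the file in a blob name.
function directoryDepth(blobName) {
    return blobName.split("/").length - 1;
}

var files = [
    "C:\\blobs\\a.txt",
    "C:\\blobs\\b.txt",
    "C:\\blobs\\dir1\\c.txt",
    "C:\\blobs\\dir1\\dir2\\d.txt"
];

var listing = files.map(function (file) {
    var name = blobNameFor("c:\\blobs", file);
    return { container: "mycontainer", blob: name, depth: directoryDepth(name) };
});
console.log(listing);
```

Blob storage itself is a flat namespace inside a container; the directory levels exist only in how the names are interpreted.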

Page 16: Data Management in Microsoft HDInsight: How to Move and Store Your Data

DEMO: Copy Blob, Query with Hive

Page 18: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Consuming Result Sets

Page 19: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Consuming HDInsight Result Sets

Target Destination | Tool / Library | Requires Active HDInsight Cluster
SQL Server, Azure SQL DB | Sqoop (Hadoop ecosystem project) | Yes
Excel | Codename “Data Explorer” | No
Another Blob Storage Account | Azure Blob Storage REST APIs (Copy Blob, etc) | No
SQL Server Analysis Services | Hive ODBC Driver | Yes
Existing BI Apps | Hive ODBC Driver (assumes app supports ODBC connections to data sources) | Yes
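The last column matters when a cluster has been dropped to save cost. As a small illustration, the sketch below encodes the table as data and filters it by whether a cluster is running. The `routes` array and `availableRoutes` function are hypothetical names; the rows are taken from the table.

```javascript
// One entry per table row: where results go, what moves them, and
// whether that tool needs a running HDInsight cluster.
var routes = [
    { destination: "SQL Server, Azure SQL DB", tool: "Sqoop", needsCluster: true },
    { destination: "Excel", tool: "Data Explorer", needsCluster: false },
    { destination: "Another Blob Storage Account", tool: "Azure Blob Storage REST APIs", needsCluster: false },
    { destination: "SQL Server Analysis Services", tool: "Hive ODBC Driver", needsCluster: true },
    { destination: "Existing BI Apps", tool: "Hive ODBC Driver", needsCluster: true }
];

// Returns the routes usable in the current cluster state.
function availableRoutes(clusterRunning) {
    return routes.filter(function (route) {
        return clusterRunning || !route.needsCluster;
    });
}

var withoutCluster = availableRoutes(false).map(function (route) {
    return route.destination;
});
console.log(withoutCluster);
```

The two routes that remain without a cluster both read result files straight from blob storage, which is why they keep working after the compute nodes are gone.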

Page 20: Data Management in Microsoft HDInsight: How to Move and Store Your Data

DEMO: Consume Result Sets – SQL DB

Page 21: Data Management in Microsoft HDInsight: How to Move and Store Your Data

DEMO: Consume Result Sets – Excel & “Data Explorer”

Page 22: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Summary
• HDInsight is an enterprise grade Hadoop-based big data storage/processing platform
• Azure Blob Storage + HDInsight == Simple big data storage and processing in the cloud, available to try today
• Consuming results from HDInsight into familiar tools, apps, etc. (Excel, etc.) is simple with Data Explorer, Azure Blob APIs, Sqoop, ODBC, etc.

Page 23: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Questions?

Page 24: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Related content
• FDN01 – Big Data. Small Data. All Data
• DBI-B304 – Large Scale Data Warehousing and Big Data…
• DBI-B325 – Do you have Big Data? Most Likely!
• DBI-B336 – Big Data Analytics with Microsoft Excel 2013
• DBI-B339 – Predictive Analytics with Microsoft Big Data
• DBI-B313 – Polybase: Hadoop integration in SQL Server …

Page 26: Data Management in Microsoft HDInsight: How to Move and Store Your Data

Resources
• Sessions on Demand: http://channel9.msdn.com/Events/TechEd
• Learning (Microsoft Certification & Training Resources): www.microsoft.com/learning
• msdn (Resources for Developers): http://microsoft.com/msdn
• TechNet (Resources for IT Professionals): http://microsoft.com/technet

Page 29: Data Management in Microsoft HDInsight: How to Move and Store Your Data

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.