big data: sql on hadoop meetup, nyc sept. 2014 part 1

15
SQL for Hadoop: Introducing Big SQL for BigInsights C. M. Saracco, IBM Silicon Valley Lab ([email protected] ) Sept. 10 - 11, 2014

Upload: cynthia-saracco

Post on 18-Nov-2014

141 views

Category:

Technology


0 download

DESCRIPTION

Part 1 of NYC Meetup for SQL on Hadoop, Sept. 2014 -- a full-day developer workshop using Big SQL and BigInsights

TRANSCRIPT

SQL for Hadoop:

Introducing Big SQL for BigInsights

C. M. Saracco, IBM Silicon Valley Lab ([email protected])

Sept. 10 - 11, 2014

Welcome

� Introductions and logistics

� Goal– Query and analyze Big Data in Hadoop using industry standard SQL

� What you’ll learn– Create Big SQL FACT and DIMENSION tables

2 © 2014 IBM Corporation

– Create Big SQL FACT and DIMENSION tables

– Populate Big SQL tables with data

– Query Big SQL tables using projections, restrictions, N-way joins, aggregations,

etc.

– Explore advanced topics: visualization tools, SerDes, security, data access

plans, etc.

� Reference platform: InfoSphere BigInsights 3.0 – Free Quick Start Edition (VMware image, native install) for non-production use

http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/downloads.html

– Also available on BlueMix (cloud), native production license

Agenda

9:00 – 9:15 am Welcome / introduction (Cindy)

9:15 – 10:15 am Introduction to SQL on Hadoop (Neal)

10:15 – 10:30 am Lab intro: Getting started with SQL on Hadoop (Neal)

10:30 – noon Hands-on lab (self-paced)

Noon – 1 pm Lunch

3 © 2014 IBM Corporation

1:00 – 1:30 pm Exploring Big SQL: DDL, LOAD, queries, tooling (Neal)

1:30 – 1:45 pm Lab intro: Querying structured data (Neal)

1:45 – 3:15 pm Hands-on lab (self-paced)

3:15 – 4:00 pm Demo of advanced features (Cindy)

4:00 – 4:15 pm Summary (Cindy)

4:15 – 5:00 pm Q&A / Discussion (Cindy / all)

Want to learn more?

� Download free BigInsights edition

� Test drive the technologies– Follow online tutorials

– Enroll in online classes

– Watch video demos, read articles, etc.

� Links all available from HadoopDev – https://developer.ibm.com/hadoop/

4 © 2014 IBM Corporation

5 © 2014 IBM Corporation

DEMO

Advanced features

Beyond the basics

� Visualization with BigSheets – Spreadsheet-style interface for business analysis

– Generate charts, customize workbooks, . . . .

� Non-traditional data (e.g., social media data in JSON) – SerDes (serializers / deserializers)

6 © 2014 IBM Corporation

� SQL security – Column masking

– Row-based permissions

� Data access plans – Collecting statistics (ANALYZE TABLE)

– Exploring data access plans (example: query rewrite in action)

� BlueMix (Cloud) service . . . .

7 © 2014 IBM Corporation

SUMMARY

SQL on Hadoop

Summary

� What is Big SQL? – Industry-standard SQL query interface for Hadoop data in BigInsights – Derived from decades of IBM R&D investment in RDBMS technology, including database parallelism and query optimization

� Why Big SQL? – Easy on-ramp to Hadoop for SQL professionals – Support familiar SQL tools / applications (via JDBC and ODBC drivers)

8 © 2014 IBM Corporation

� What operations are supported? – Create tables / views. Underlying storage provided by Hive, DFS – Load data into tables (from local files, remote files, RDBMSs) – Query data (project, restrict, join, union, wide range of sub-queries, wide range of built-in functions . . . . )

– GRANT / REVOKE privileges, create roles, create column masks and row permissions

– Transparently join / union data between Hadoop and RDBMSs in single query – Collect statistics and inspect detailed data access plan – Establish workload management controls – Monitor Big SQL usage – . . . .

Want to learn more?

� Download free BigInsights edition

� Test drive the technologies– Follow online tutorials

– Enroll in online classes

– Watch video demos, read articles, etc.

� Links all available from HadoopDev – https://developer.ibm.com/hadoop/

9 © 2014 IBM Corporation

What is Bluemix?

Bluemix is an open-standard, cloud-based platform for building, managing,

and running applications of all types (web, mobile, big data, new smart

devices, and so on).

10 © 2014 IBM Corporation10

Go Live in Seconds

The developer can choose

any language runtime or

bring their own. Zero to

production in one command.

DevOps

Development, monitoring,

deployment, and logging tools

allow the developer to run the

entire application.

APIs and Services

A catalog of IBM, third party,

and open source API services

allow the developer to stitch an

application together in minutes.

On-Prem Integration

Build hybrid environments.

Connect to on-premise assets

plus other public and private

clouds.

Flexible Pricing

Sign up in minutes. Pay as

you go and subscription

models offer choice and

flexibility.

Layered Security

IBM secures the platform and

infrastructure and provides

you with the tools to secure

your apps.

Create apps quickly with prebuilt services

• Runtimes, services, and tooling

up to you

Choice

Industry Leading IBM Capabilities

• Services leveraging the depth

of IBM software

Cloud

Integration

Services

Mobile

Services

Database

services

Watson

Services

A full range of capabilities to suit any great idea.

11 © 2014 IBM Corporation11

• Full range of capabilities

Completeness

• Open source platform and services

• Third party to enable key use cases

Security

Services

Web and

application

services

Big Data

services

12 © 2014 IBM Corporation

SUPPLEMENTAL

Getting Started with the labs

� Power on your machines (if needed)

� Click VMware workstation icon on Desktop (upper left)

� Click InfoSphere BigInsights VMware image -> Power On

13 © 2014 IBM Corporation

� Log in when prompted (biadmin / biadmin)

� Proceed with lab when this screen appears:

SQL Access for Hadoop: Why?� Data warehouse modernization is

a leading Hadoop use case– Off load “cold” data to query-ready

platform

– Explore / analyze / pre-process non-

traditional data. Aggregate / upload

results to warehouse.

� Limited availability of skills in

MapReduce, Pig, etc.

14 © 2014 IBM Corporation

MapReduce, Pig, etc.

� SQL opens the data to a much

wider audience– Familiar, widely known syntax

– Common catalog for identifying data

and structure

© 2013 IBM Corporation14

2012 Big Data @ Work Study

surveying 1144 business and IT

professionals in 95 countries

From Getting Starting to Enterprise Deployment:

Different BigInsights Editions For Varying Needs

Standard EditionEn

terp

rise c

lass

Enterprise Edition

- Accelerators

-- GPFS – FPO

-- Adaptive MapReduce

- Text analytics

- Enterprise Integration

15 © 2014 IBM Corporation© 2013 IBM Corporation15

Standard Edition

Breadth of capabilities

En

terp

rise c

lass

- Spreadsheet-style tool

-- Web console

-- Dashboards

- Pre-built applications

-- Eclipse tooling

-- RDBMS connectivity

-- Big SQL

-- Monitoring and alerts

-- Platform enhancements

-- . . .

- Enterprise Integration

-- Big R

-- InfoSphere Streams*

-- Watson Explorer*

-- Cognos BI*

-- Data Click*

-- . . .

-* Limited use license

ApacheHadoop

Quick Start Free. Non-production

Same features as Standard Edition plus text analytics and Big R