HBase: An Introduction


Upload: fabio-fumarola

Post on 14-Jul-2015


Page 1: Hbase an introduction

Introduction to HBase


Dr. Fabio Fumarola

Page 2: Hbase an introduction

Contents

• BigTable
• HBase
  – Shell
  – Admin
  – Put
  – Get
  – Scan
• Coding Session


Page 3: Hbase an introduction

BigTable


Page 4: Hbase an introduction

Bigtable at Google

• "Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance."


Page 5: Hbase an introduction

Features

• Distributed

• Sparse

• Column-Oriented

• Versioned


Page 6: Hbase an introduction

1. The map is indexed by a row key, a column key, and a timestamp.

2. Each value in the map is an uninterpreted array of bytes.

(row key, column key, timestamp) => value
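The mapping above can be pictured as nested sorted maps. The following is a minimal in-memory sketch of that idea, not Bigtable or HBase code; the class name `SparseVersionedMap` and its methods are made up for illustration, and the sample values are taken from the Key Concepts slide.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of (row key, column key, timestamp) => value. Hypothetical class, not an HBase API. */
class SparseVersionedMap {
    // row key -> column key -> timestamp (newest first) -> uninterpreted bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    /** Stores one cell version; only cells that are written take up space (sparse). */
    void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(columnKey, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest value of a cell, or null if the cell was never written. */
    byte[] getLatest(String rowKey, String columnKey) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, byte[]> versions = columns.get(columnKey);
        if (versions == null) {
            return null;
        }
        // Timestamps are sorted in descending order, so the first entry is the newest.
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SparseVersionedMap map = new SparseVersionedMap();
        map.put("20120407152657", "personal:givenName", 1239124584398L,
                "mario".getBytes(StandardCharsets.UTF_8));
        byte[] value = map.getLatest("20120407152657", "personal:givenName");
        System.out.println(new String(value, StandardCharsets.UTF_8));
    }
}
```

The store never interprets the bytes; the caller decides how to encode and decode them, which is why the example converts strings explicitly.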

Page 7: Hbase an introduction

Key Concepts

• row key => 20120407152657

• column family => "personal:"

• column key => "personal:givenName", "personal:surname"

• timestamp => 1239124584398

• column value => "mario", "rossi"
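A column key is the column family followed by a colon and a qualifier. As a small illustration, the sketch below splits a column key at the first colon; `ColumnKey` is a hypothetical helper written for this example, not a class from the HBase client library.

```java
/** Hypothetical helper that splits "family:qualifier" column keys. Not an HBase API. */
class ColumnKey {
    final String family;
    final String qualifier;

    ColumnKey(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    /** Splits at the first ':'; a key such as "content:" has an empty qualifier. */
    static ColumnKey parse(String columnKey) {
        int idx = columnKey.indexOf(':');
        if (idx < 0) {
            // No separator: treat the whole string as the family name.
            return new ColumnKey(columnKey, "");
        }
        return new ColumnKey(columnKey.substring(0, idx), columnKey.substring(idx + 1));
    }

    public static void main(String[] args) {
        ColumnKey key = ColumnKey.parse("personal:givenName");
        System.out.println(key.family + " / " + key.qualifier);
    }
}
```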


Page 8: Hbase an introduction

Example 1


Page 9: Hbase an introduction

Get row 20120407145045


Page 10: Hbase an introduction

HBase

• Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable.

http://hbase.apache.org


Page 11: Hbase an introduction

HBase Shell

hbase(main):001:0> create 'blog', 'info', 'content'

0 row(s) in 4.3640 seconds

hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'

0 row(s) in 0.0330 seconds

hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'

0 row(s) in 0.0030 seconds

hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'

0 row(s) in 0.0030 seconds


Page 12: Hbase an introduction

HBase Shell

hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'

0 row(s) in 0.0030 seconds

hbase(main):006:0> get 'blog', '20120320162535'

COLUMN           CELL
content:         timestamp=1239135042862, value=CouchDB is a doc...
info:author      timestamp=1239135042755, value=Bob Smith
info:category    timestamp=1239135042982, value=Persistence
info:title       timestamp=1239135042623, value=Document-oriented...

4 row(s) in 0.0140 seconds


Page 13: Hbase an introduction

HBase Shell

hbase(main):015:0> get 'blog', '20120407145045', {COLUMN=>'info:author', VERSIONS=>3 }

timestamp=1239135325074, value=John Doe

timestamp=1239135324741, value=John

2 row(s) in 0.0060 seconds
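The get above asks for up to three versions of one cell and receives the two that exist, newest first. A minimal sketch of that behaviour, using a plain sorted map rather than HBase itself (the class `CellVersions` and its methods are invented for this illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one versioned cell. Hypothetical class, not an HBase API. */
class CellVersions {
    // timestamp -> value, sorted by ascending timestamp
    private final NavigableMap<Long, String> byTimestamp = new TreeMap<>();

    void put(long timestamp, String value) {
        byTimestamp.put(timestamp, value);
    }

    /** Returns at most maxVersions values, newest first. */
    List<String> newest(int maxVersions) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<Long, String> entry : byTimestamp.descendingMap().entrySet()) {
            if (values.size() == maxVersions) {
                break;
            }
            values.add(entry.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        CellVersions author = new CellVersions();
        author.put(1239135324741L, "John");
        author.put(1239135325074L, "John Doe");
        // Asking for 3 versions returns only the 2 that were written.
        System.out.println(author.newest(3));
    }
}
```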

hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }

ROW               COLUMN+CELL
20120320162535    column=content:, timestamp=1239135042862, value=CouchDB is...
20120320162535    column=info:author, timestamp=1239135042755, value=Bob Smith
20120320162535    column=info:category, timestamp=1239135042982, value=Persistence
20120320162535    column=info:title, timestamp=1239135042623, value=Document...

4 row(s) in 0.0230 seconds
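The scan works because rows are kept sorted by row key: the start row is inclusive and the stop row is exclusive, so '20120300' to '20120400' selects every key from March 2012. The sketch below mimics this with a `TreeMap`; `RowRangeScan` is a hypothetical name, and Java string ordering stands in for HBase's byte-wise ordering, which agrees for ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy range scan over sorted row keys. Hypothetical class, not an HBase API. */
class RowRangeScan {
    /** Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive. */
    static List<String> scan(NavigableMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> blog = new TreeMap<>();
        blog.put("20120320162535", "Document-oriented storage using CouchDB");
        blog.put("20120407145045", "a post from April (placeholder title)");
        // Only the March row key falls inside the range.
        System.out.println(scan(blog, "20120300", "20120400"));
    }
}
```

Choosing row keys that sort in the order you want to read them, such as the timestamp-style keys used here, is what makes range scans useful.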


Page 14: Hbase an introduction

Java API


Page 15: Hbase an introduction

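Admin API

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Create a new table
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

String tableName = "people";

// Declare the table with its three column families
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);

System.out.printf("%s is available? %b\n", tableName, admin.isTableAvailable(tableName));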


Page 16: Hbase an introduction

Client API

import static org.apache.hadoop.hbase.util.Bytes.toBytes;

// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");

// All cells of one Put share the same row key
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"), toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
table.put(put);

table.flushCommits();
table.close();


Page 17: Hbase an introduction

Finding Data

• Get (by row key)

• Scan (by row key ranges, filtering)


Page 18: Hbase an introduction

Get

// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");

Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"), toBytes("email"));

Result result = table.get(get);


Page 19: Hbase an introduction

Update

// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");

Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
put.add(toBytes("contactinfo"), toBytes("address"), toBytes("San Diego, CA"));
table.put(put);

table.flushCommits();
table.close();


Page 20: Hbase an introduction

Scans

// Scan rows...
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");

long numRowsPerPage = 10;  // page size; value assumed for illustration

Scan scan = new Scan(toBytes("john-"));  // start row
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));

ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    // process result...
}
scanner.close();
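A page filter caps how many rows one scan returns; to read the next page, a client can start the following scan just after the last row key it received. The sketch below shows that paging pattern on a plain sorted map. It is an illustration of the idea only: `PagedScanSketch` and `page` are hypothetical names, and the row keys in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy row-key pagination over a sorted map. Hypothetical class, not an HBase API. */
class PagedScanSketch {
    /**
     * Returns up to pageSize row keys that sort strictly after lastSeenRow.
     * Pass null as lastSeenRow to read the first page.
     */
    static List<String> page(NavigableMap<String, String> table, String lastSeenRow, int pageSize) {
        NavigableMap<String, String> remaining =
                (lastSeenRow == null) ? table : table.tailMap(lastSeenRow, false);
        List<String> rowKeys = new ArrayList<>();
        for (String rowKey : remaining.keySet()) {
            if (rowKeys.size() == pageSize) {
                break;
            }
            rowKeys.add(rowKey);
        }
        return rowKeys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> people = new TreeMap<>();
        for (String key : new String[] {"adams-a-1", "brown-b-2", "connor-john-m-43299", "doe-j-4", "smith-s-5"}) {
            people.put(key, "");
        }
        String lastSeen = null;
        List<String> rows;
        // Keep requesting pages of two rows until a page comes back empty.
        while (!(rows = page(people, lastSeen, 2)).isEmpty()) {
            System.out.println(rows);
            lastSeen = rows.get(rows.size() - 1);
        }
    }
}
```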


Page 21: Hbase an introduction

Time to Code

This is when things start to get hard


Page 22: Hbase an introduction

Setup HBase Docker

• https://registry.hub.docker.com/u/banno/hbase-standalone/

• https://registry.hub.docker.com/u/oddpoet/hbase-cdh5/


Page 23: Hbase an introduction

Steps

• Shell
• Java Project
  – Maven
  – Gradle
