Upload: asterixsmartplatf

Post on 08-May-2015

Page 1: Dancing with the elephant   h base1_final

Dancing With The Elephant

Persistence with HBase: Part 1

www.smart-platform.com @smartplatf

Event Sponsors

Page 2: Dancing with the elephant   h base1_final

We will discuss

• Introduction to Hadoop
• HBase: Definition, Storage Model, Use Cases
• Basic Data Access from the shell
• Hands-on with the HBase API

Page 3: Dancing with the elephant   h base1_final

What is Hadoop

• Framework for distributed processing of large datasets (Big Data)
• HDFS + MapReduce
• HDFS (data):
  • Distributed filesystem responsible for storing data across the cluster
  • Provides replication on cheap commodity hardware
  • NameNode and DataNode processes
• MapReduce (processing): may be covered in a future session

Page 4: Dancing with the elephant   h base1_final

HBase: What

• A sparse, distributed, persistent, multidimensional, sorted map (as defined in Google’s Bigtable paper)

• Distributed NoSQL Database designed on top of HDFS

Page 5: Dancing with the elephant   h base1_final

RDBMS Woes (with massive data)

• Scaling is hard and expensive
• Relational features and secondary indexes get turned off in order to scale
• Quick reads are hard at larger table sizes (500 GB)
• Single points of failure
• Schema changes

Page 6: Dancing with the elephant   h base1_final

HBase: Why

• Scalable: just add nodes as your data grows
• Distributed: leverages the advantages of Hadoop’s HDFS
• Built on top of Hadoop: being part of the ecosystem, it can be integrated with multiple tools
• High performance for reads/writes
  • Short-circuit reads
  • Single reads: 1 to 10 ms; scans: 100s of rows in 10 ms
• Schema-less
• Production-ready where data is on the order of petabytes

Page 7: Dancing with the elephant   h base1_final

HBase: Storage Model 1

Page 8: Dancing with the elephant   h base1_final

HTable

• Tables are split into regions
• Region: data with a contiguous range of RowKeys, [Start, End), in sorted order
• Regions split as the table grows (region size can be configured)
• Table schema defines the column families
• (Table, RowKey, ColumnFamily, ColumnName, Timestamp) → Value
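Because each region covers a contiguous, sorted key range [Start, End), the region holding a row key is the one with the greatest start key that is less than or equal to that key. The following is a minimal sketch of that lookup using an in-memory TreeMap, not HBase itself. The class name RegionLookup, the region names, and the split points are hypothetical, and plain String ordering stands in for HBase's byte-wise key comparison.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookup {
    // Start key of each region -> region name.
    // The empty string marks the start of the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Returns the region whose [start, nextStart) range contains rowKey.
    public String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "region-1");   // ["", "g")
        table.addRegion("g", "region-2");  // ["g", "p")
        table.addRegion("p", "region-3");  // ["p", end of table)

        System.out.println(table.regionFor("apple"));
        System.out.println(table.regionFor("g"));      // start keys are inclusive
        System.out.println(table.regionFor("zebra"));
    }
}
```

A region split, in this sketch, would simply be another addRegion call with a start key inside an existing range.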

Page 9: Dancing with the elephant   h base1_final

HTable(Data Structure)

• SortedMap(RowKey, List(
    SortedMap(Column, List(
      Value, Timestamp))))
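The nested structure above can be approximated with plain Java collections. This is only an illustrative sketch, not how HBase stores data on disk: the class name SortedMapModel and its methods are hypothetical, columns are written as "family:qualifier" strings, and the list of (Value, Timestamp) pairs is modelled as a map sorted by timestamp, newest first.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SortedMapModel {
    // rowKey -> column ("family:qualifier") -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Stores one version of a cell; older versions of the same cell are kept.
    public void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell was never written
    // (the map is sparse: absent cells take no space).
    public String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SortedMapModel table = new SortedMapModel();
        table.put("row1", "colfam1:qual1", 1L, "val1");
        table.put("row1", "colfam1:qual1", 2L, "val1-updated");

        System.out.println(table.getLatest("row1", "colfam1:qual1"));
        System.out.println(table.getLatest("row1", "colfam1:missing"));
    }
}
```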

Page 10: Dancing with the elephant   h base1_final

HBase: Data Read/Write

• Get: random read
• Scan: sequential read
• Put: write/update
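The difference between the three operations can be sketched on a single sorted map of row keys. This is an in-memory illustration under simplifying assumptions (one String value per row, String ordering instead of byte ordering); the class ReadWriteSketch is hypothetical. The scan bounds mirror the default behaviour of an HBase Scan, where the start row is inclusive and the stop row is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ReadWriteSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Put: writes a new row or overwrites an existing one.
    public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Get: random read of a single row by its key.
    public String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan: sequential read of the row keys in [startRow, stopRow), in sorted order.
    public List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        ReadWriteSketch table = new ReadWriteSketch();
        table.put("row3", "c");
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row1", "a-updated"); // same key: an update, not a second row

        System.out.println(table.get("row1"));
        System.out.println(table.scan("row1", "row3"));
    }
}
```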

Page 11: Dancing with the elephant   h base1_final

HBase: Data Access Clients

• Demo of the HBase shell
• Java API

Page 12: Dancing with the elephant   h base1_final

HBase: API

• Connection
• DDL
• DML
• Filters
• Hands-On

Page 13: Dancing with the elephant   h base1_final

HBase: API

• Configuration: holds the details of where to find the cluster, plus tunable settings.

• HConnection: represents a connection to the cluster.

• HBaseAdmin: handles DDL operations (create, list, drop, alter).

• HTable (HTableInterface): a handle on a single HBase table. Sends “commands” to the table (Put, Get, Scan, Delete, Increment).

Page 14: Dancing with the elephant   h base1_final

HBase: API:DDL

Group name: ddl (Data Definition Language)

Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list

Page 15: Dancing with the elephant   h base1_final

HBase: API:DDL

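    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    // HBaseConfiguration.create() replaces the deprecated HBaseConfiguration constructor.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.master", "localhost:60010");
    HBaseAdmin hbase = new HBaseAdmin(conf);
    // Describe a table with two column families, then create it.
    HTableDescriptor desc = new HTableDescriptor("testtable");
    HColumnDescriptor meta = new HColumnDescriptor("colfam1".getBytes());
    HColumnDescriptor prefix = new HColumnDescriptor("colfam2".getBytes());
    desc.addFamily(meta);
    desc.addFamily(prefix);
    hbase.createTable(desc);
    hbase.close();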

Page 16: Dancing with the elephant   h base1_final

HBase: API:DML

Group name: dml (Data Manipulation Language)

Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate

Page 17: Dancing with the elephant   h base1_final

HBase: API:DML PUT

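    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // conf is the Configuration created on the DDL slide.
    HTable table = new HTable(conf, "testtable");
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
    table.put(put);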

Page 18: Dancing with the elephant   h base1_final

HBase: API:DML GET

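    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");
    // Read a single column of a single row.
    Get get = new Get(Bytes.toBytes("row1"));
    get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
    Result result = table.get(get);
    byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
    System.out.println("Value: " + Bytes.toString(val));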

Page 19: Dancing with the elephant   h base1_final

HBase: API:DML SCAN

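    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    // An empty Scan reads every row of the table, in RowKey order.
    Scan scan1 = new Scan();
    ResultScanner scanner1 = table.getScanner(scan1);
    for (Result res : scanner1) {
        System.out.println(res);
    }
    scanner1.close();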

Page 20: Dancing with the elephant   h base1_final

Other Projects around HBase

• SQL Layer: Phoenix, Hive, Impala
• Object Persistence: Lily, Kundera

Page 21: Dancing with the elephant   h base1_final

Follow-Up

• Part 2: Building a Key-Value Data Store in HBase
  • Challenges we faced in SMART

• {Rahul, vinay}@briotribes.com

Page 22: Dancing with the elephant   h base1_final

Shoutout To

Page 23: Dancing with the elephant   h base1_final

HBase: Use Case (Facebook)

• Facebook Messaging: Titan
  • 1.5 M ops per second at peak
  • 6B+ messages per day
  • 16 columns per operation across different families

• Facebook Insights: Puma
  • Provides developers and Page owners with metrics about their content
  • > 1 M counter increments per second
