HBaseCon 2014: Just the Basics
HBase: Just the Basics
Jesse Anderson, Curriculum Developer and Instructor
v2
©2014 Cloudera, Inc. All rights reserved.
What Is HBase?
• A NoSQL datastore built on top of HDFS (the Hadoop Distributed File System)
• An Apache top-level project
• Handles the various manifestations of Big Data
• Based on Google’s Bigtable paper
Why Use HBase?
• Storing large amounts of data (terabytes to petabytes)
• High throughput for a large number of requests
• Storing unstructured or variable-column data
• Random reads and writes on Big Data
When to Consider Not Using HBase?
• Use HBase only for Big Data problems
• Files are read straight through
• Data is written all at once, or new files are appended
• There are no random reads or writes
• The access patterns of the data are ill-defined
HBase Architecture
How it works
Meet the Daemons
• HBase Master
• RegionServer
• ZooKeeper
• HDFS
  • NameNode/Standby NameNode
  • DataNode
Daemon Locations
[Figure: which daemons run on the Master Nodes and which run on the Slave Nodes]
Tables and Column Families
Tables are broken into groupings called column families. The example table has two:
• Column family “contactinfo”: groups data that is frequently accessed together, and compresses it
• Column family “profilephoto”: groups photos, which are stored with different settings
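To make the column-family idea concrete, here is a toy in-memory model in plain Java; it is not the HBase client API, and the `ToyTable` and `FamilySettings` classes and the `compressed` flag are hypothetical. It shows two properties: families are declared up front, each with its own settings, while the columns inside a family are created on write. A write to an undeclared family is rejected, much as HBase rejects writes to a column family that is not in the table's schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-family settings (real HBase families carry settings
// such as compression and the number of versions to keep).
final class FamilySettings {
    final boolean compressed;

    FamilySettings(boolean compressed) {
        this.compressed = compressed;
    }
}

// Toy table: column families are declared when the table is created,
// but the columns inside a family are created freely on write.
final class ToyTable {
    private final Map<String, FamilySettings> families = new HashMap<>();
    // row key -> "family:column" -> value
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    ToyTable declareFamily(String name, FamilySettings settings) {
        families.put(name, settings);
        return this;
    }

    void put(String rowKey, String family, String column, String value) {
        if (!families.containsKey(family)) {
            throw new IllegalArgumentException("No such column family: " + family);
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(family + ":" + column, value);
    }

    String get(String rowKey, String family, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(family + ":" + column);
    }

    boolean isCompressed(String family) {
        return families.get(family).compressed;
    }
}
```

For the example above, one might declare `contactinfo` as compressed and leave `profilephoto` uncompressed, on the assumption that the JPEG images are already compressed.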
Rows and Columns
| Row key | contactinfo:fname | contactinfo:lname | profilephoto:image |
|---|---|---|---|
| adupont | Andre | Dupont | |
| jsmith | John | Smith | <smith.jpg> |
| mrossi | Mario | Rossi | <mario.jpg> |

• Row keys identify a row
• No storage penalty for unused columns (adupont has no image, so no image cell is stored for that row)
• Each column family can have many columns
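The sparse layout can be sketched the same way, again as a toy model rather than HBase code (`SparseTable` is a hypothetical name): a row is a sorted map from `family:column` to value, and only cells that were actually written exist. Loading the three example rows stores 8 cells rather than the 9 of a dense 3 × 3 grid, because `adupont` has no image.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy sparse table: only cells that were actually written are stored.
final class SparseTable {
    // row key -> "family:column" -> value, both levels kept sorted
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Returns the row's cells, or an empty map if the row does not exist.
    Map<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Counts stored cells across all rows; unused columns cost nothing.
    int storedCells() {
        int n = 0;
        for (TreeMap<String, String> r : rows.values()) {
            n += r.size();
        }
        return n;
    }

    // The three rows from the table above.
    static SparseTable exampleUsers() {
        SparseTable t = new SparseTable();
        t.put("adupont", "contactinfo:fname", "Andre");
        t.put("adupont", "contactinfo:lname", "Dupont");
        t.put("jsmith", "contactinfo:fname", "John");
        t.put("jsmith", "contactinfo:lname", "Smith");
        t.put("jsmith", "profilephoto:image", "<smith.jpg>");
        t.put("mrossi", "contactinfo:fname", "Mario");
        t.put("mrossi", "contactinfo:lname", "Rossi");
        t.put("mrossi", "profilephoto:image", "<mario.jpg>");
        return t;
    }
}
```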
Regions
A table is broken into regions:

Region 1

| Row key | contactinfo:fname | contactinfo:lname |
|---|---|---|
| adupont | Andre | Dupont |
| jsmith | John | Smith |

Region 2

| Row key | contactinfo:fname | contactinfo:lname |
|---|---|---|
| mrossi | Mario | Rossi |
| zstevens | Zack | Stevens |

Regions are served by RegionServers.
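Finding the region for a row key amounts to a sorted lookup. In this sketch, the hypothetical `RegionMap` identifies each region by its start key (inclusive), with the first region starting at the empty key and each region ending where the next one begins. Java `String` ordering stands in for HBase's unsigned byte ordering, which agrees for ASCII keys like these, and the split point `m` is an assumption chosen to match the two regions above.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy region map: each region is identified by its start row key
// (inclusive) and runs up to the next region's start key (exclusive).
final class RegionMap {
    private final TreeMap<String, String> regionByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionByStartKey.put(startKey, regionName);
    }

    // The region holding rowKey is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionByStartKey.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("No region covers " + rowKey);
        }
        return e.getValue();
    }
}
```

With regions starting at `""` and `"m"`, `adupont` and `jsmith` fall in the first region and `mrossi` and `zstevens` in the second.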
Write Path
1. Find which RegionServer is serving the region that contains the row key.
2. Send the write to that RegionServer.
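Both steps of the write path can be sketched in plain Java. This is a toy model, not the HBase client: `ToyCatalog` stands in for the catalog that real clients consult (the `hbase:meta` table), the server names `rs1`/`rs2` are made up, and the client caches each region location after its first lookup, as HBase clients do, so later writes to the same region skip the catalog query in step 1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// A region covers rows in [startKey, endKey); endKey == null means "to the end".
final class RegionLocation {
    final String startKey;
    final String endKey;
    final String server;

    RegionLocation(String startKey, String endKey, String server) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.server = server;
    }

    boolean contains(String row) {
        return row.compareTo(startKey) >= 0 && (endKey == null || row.compareTo(endKey) < 0);
    }
}

// Stand-in for the catalog that maps regions to the servers hosting them.
final class ToyCatalog {
    private final TreeMap<String, RegionLocation> regions = new TreeMap<>();
    int lookups = 0; // how many times a client had to ask the catalog

    void assign(RegionLocation r) {
        regions.put(r.startKey, r);
    }

    RegionLocation locate(String row) {
        lookups++;
        return regions.floorEntry(row).getValue();
    }
}

// Toy write client: step 1 finds the serving RegionServer (cached after
// the first lookup), step 2 sends the write to that server.
final class ToyWriteClient {
    private final ToyCatalog catalog;
    private final TreeMap<String, RegionLocation> cache = new TreeMap<>();
    // server name -> rows it has received (stands in for each RegionServer)
    final Map<String, Map<String, String>> servers = new HashMap<>();

    ToyWriteClient(ToyCatalog catalog) {
        this.catalog = catalog;
    }

    // Step 1: which RegionServer is serving the region for this row?
    String serverFor(String row) {
        Map.Entry<String, RegionLocation> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) {
            return cached.getValue().server;
        }
        RegionLocation loc = catalog.locate(row);
        cache.put(loc.startKey, loc);
        return loc.server;
    }

    // Step 2: write to that RegionServer.
    void put(String row, String value) {
        servers.computeIfAbsent(serverFor(row), s -> new TreeMap<>()).put(row, value);
    }
}
```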
Read Path
1. Find which RegionServer is serving the region that contains the row key.
2. Read from that RegionServer.
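The read path starts with the same lookup. Because each region is served by exactly one RegionServer, a read for a row key goes to the same server that accepted the writes for it. The hypothetical `ToyCluster` below (plain Java, no HBase classes, and no location caching) sketches that routing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy cluster: every region is served by exactly one RegionServer, so reads
// and writes for a given row key always land on the same server.
final class ToyCluster {
    // region start key -> serving RegionServer name
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();
    // RegionServer name -> that server's rows (row key -> value)
    private final Map<String, Map<String, String>> data = new HashMap<>();

    void assign(String regionStartKey, String server) {
        serverByStartKey.put(regionStartKey, server);
        data.putIfAbsent(server, new TreeMap<>());
    }

    // Step 1 of both paths: which RegionServer is serving the region?
    String serverFor(String rowKey) {
        return serverByStartKey.floorEntry(rowKey).getValue();
    }

    void write(String rowKey, String value) {
        data.get(serverFor(rowKey)).put(rowKey, value);
    }

    // Step 2 of the read path: read from that RegionServer (null if absent).
    String read(String rowKey) {
        return data.get(serverFor(rowKey)).get(rowKey);
    }
}
```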
HBase API
How to access the data
No SQL Means No SQL
• Data is not accessed over SQL
• You must:
  • Create your own connections
  • Keep track of the type of data in a column
  • Give each row a key
  • Access a row by its key
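Keeping track of types matters because HBase stores row keys and values as untyped byte arrays. The sketch below uses only the JDK to mimic two encodings of HBase's `Bytes` utility (a big-endian `long` and a UTF-8 `String`); `Codec` and its method names are hypothetical. Decoding with the wrong type may not raise any error: reading a `long`'s eight bytes as a `String` just yields meaningless text, which is why the application has to remember each column's type.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// HBase stores every cell as a byte[]; the application chooses the encoding
// and must decode with the same one. JDK-only stand-ins for two common layouts.
final class Codec {
    // 8-byte big-endian, the same layout as HBase's Bytes.toBytes(long).
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    // UTF-8, the same encoding as HBase's Bytes.toBytes(String).
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```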
Types of Access
• Gets
  • Retrieve a row’s data by its row key
• Puts
  • Upsert a row’s data by its row key (insert if the row is absent, update it if present)
• Scans
  • Return the rows whose keys fall within a row key range
  • Filters can narrow which rows and columns a scan returns
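The three access types can be modeled with a sorted map. In the hypothetical `ToyStore`, `put` upserts, `get` looks up one exact key, and `scan` returns the keys in `[startRow, stopRow)` that pass a filter. The exclusive stop row mirrors HBase's default scan behavior; the `Predicate` filter is only a stand-in for HBase's filter classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of the three access types over rows kept sorted by row key.
final class ToyStore {
    // row key -> column -> value
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    // Put: upsert; creates the row if missing, otherwise adds or overwrites columns.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Get: one row by its exact key (null if absent).
    Map<String, String> get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan: row keys with startRow <= key < stopRow whose row passes the filter.
    List<String> scan(String startRow, String stopRow, Predicate<Map<String, String>> filter) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e
                : rows.subMap(startRow, true, stopRow, false).entrySet()) {
            if (filter.test(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }
}
```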
Gets
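
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    Get g = new Get(ROW_KEY_BYTES);                              // 1. build a Get for one row key
    Result r = table.get(g);                                     // 2. fetch the row from its RegionServer
    byte[] byteArray = r.getValue(COLFAM_BYTES, COLDESC_BYTES);  // 3. read one cell (null if absent)
    String columnValue = Bytes.toString(byteArray);              // 4. decode the bytes as a String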
Puts
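
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    Put p = new Put(ROW_KEY_BYTES);              // 1. build a Put for one row key
    p.add(COLFAM_BYTES, COLDESC_BYTES,           // 2. column family and column (addColumn in HBase 1.0+)
        Bytes.toBytes("value"));                 // 3. the value, encoded as bytes
    table.put(p);                                // 4. send the upsert to the RegionServer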
HBase Schema Design
How to design
No SQL Means No SQL
• Designing HBase schemas requires in-depth knowledge
• Schema design is data-centric, not relationship-centric
• You design around how the data is accessed
• Row keys are engineered
“Treating HBase like a traditional RDBMS will lead to abject failure!”
Captain Picard
Row Keys
• A row key is more than the glue between two tables
• Much of the engineering time is spent just on constructing a row key
• The contents of a row key vary by access pattern
• A row key is often made up of several pieces of data, e.g. <group_id><email>
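A composite key such as `<group_id><email>` might be built as follows. The encoding is an assumption, not the deck's: an 8-byte big-endian group id followed by the UTF-8 email, with `GroupMemberKey` a hypothetical name. The fixed-width id keeps each group's rows contiguous in HBase's unsigned byte order, so "all members of group 7" becomes a scan over one key prefix, and for non-negative ids the byte order matches numeric order (group 2 sorts before group 10, unlike the strings "2" and "10").

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Composite row key <group_id><email>: 8-byte big-endian id + UTF-8 email.
final class GroupMemberKey {
    static byte[] build(long groupId, String email) {
        byte[] emailBytes = email.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + emailBytes.length)
                .putLong(groupId)
                .put(emailBytes)
                .array();
    }

    // The key prefix shared by every member of one group (for a prefix scan).
    static byte[] groupPrefix(long groupId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(groupId).array();
    }

    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
    }

    static long groupIdOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong();
    }

    static String emailOf(byte[] key) {
        return new String(key, Long.BYTES, key.length - Long.BYTES, StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic byte comparison, the order HBase sorts row keys in.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```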
Schema Design
• Schema design does not start in an ERD (entity-relationship diagram)
• The access patterns must be known and ascertained up front
• Denormalize to improve performance
  • Fewer, bigger tables
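Denormalization can be sketched with in-memory maps. The input mimics two normalized tables (user names by email, and group memberships); the output is one wide table with a row per group and a column per member email, with the member's name copied into each cell. Every name here is hypothetical, and this is only one possible design, aimed at the access pattern "list a group's members", which it turns into a single row read with no join.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Denormalizes "users" (email -> name) and "memberships" ({group, email} pairs)
// into one wide table: group -> (member email -> member name).
final class Denormalize {
    static TreeMap<String, TreeMap<String, String>> groupsTable(
            Map<String, String> nameByEmail, List<String[]> memberships) {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        for (String[] m : memberships) {
            String group = m[0];
            String email = m[1];
            // The name is copied into every group row on purpose: no join at read time.
            table.computeIfAbsent(group, g -> new TreeMap<>()).put(email, nameByEmail.get(email));
        }
        return table;
    }
}
```

The cost is duplication: a user who belongs to two groups has their name stored twice, and a rename must update both copies.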
Jesse Anderson
@jessetanderson