hypertable - massively scalable nosql database

Post on 27-Jun-2015

347 Views

Category:

Technology

5 Downloads

Preview:

Click to see full reader

DESCRIPTION

Doug Judd (co-founder / CEO) of hypertable talks about hypertable store @ Big Data Guru's meetup on 2014-10-21

TRANSCRIPT

Hypertable    

An  Open  Source,  High  Performance,  

Massively  Scalable  Database    

Doug Judd CEO Hypertable Inc.

Three  Reasons  to  Choose  Hypertable

•  High Performance •  Open Source •  Future Direction SQL

Introduction

Highlights •  Modeled after Google’s Bigtable database •  High Performance Implementation (C++) •  Apache Thrift interface for all popular languages

(Java, PHP, Ruby, Python, Perl, etc) •  Broad Hadoop distribution support

o  Apache 2 o  Cloudera CDH3, CDH4, CDH5 o  IBM BigInsights 3 o  Hortonworks HDP2 o  MapR

•  Actively developed for 8 years

Open  Source •  Licensed under the GPL •  Hosted on GitHub

o  git://github.com/hypertable/hypertable.git o  https://github.com/hypertable/hypertable.git

•  Online source documentation •  Mailing Lists

o  groups.google.com/group/hypertable-user o  groups.google.com/group/hypertable-dev

Bigtable •  Google’s most successful scalable database •  Bigtable underpins 100+ Google services •  YouTube, Blogger, Google Earth, Google Maps,

Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database, Google Code …

•  Data is physically ordered by primary key – it’s not a distributed hash table

How  Hypertable  Differs  From  A  Traditional  RDBMS

•  Horizontally Scalable •  Sparse Table Structure

o  Variable number of columns per-row o  Rows can have billions of columns

•  Cells can have multiple time stamped versions

Database  Model •  Sparse, two-dimensional tables •  Cells can have multiple versions •  Cells addressed by 4-part key

o  Row o  Column family o  Column qualifier

o  Timestamp

Conceptual  Table  Representation

Actual  Table  Representation

Anatomy  of  a  Key •  Column Family is 8-bit •  Timestamp and Revision are 64-bit integer

nanoseconds since Epoch •  Simple byte-wise comparison

Architecture

Table  Growth  Process

How  Scaling  Works

How  Scaling  Works

How  Scaling  Works

High  Level  Architecture

High  Level  Architecture

High  Level  Architecture

High  Level  Architecture

High  Level  Architecture

High  Level  Architecture

RangeServer  Insert  Handling

RangeServer  Query  Handling

Cellstore  Format

Bloom  Filter

Request  Routing

Administration

Cluster  Task  AutomationTool

•  ht_cluster •  Modeled after Capistrano •  Role

o  Designates a function or service and the set of machines that will perform that function or service

o  Examples: Hyperspace, Master, Slave (RangeServer), ThriftBroker o  Machines can belong to one ore more roles

•  Task o  Script written for specific roles and used to manage the associated

function or service

o  Examples: start_hyperspace, stop_hyperspace

cluster.def INSTALL_PREFIX=/opt/hypertable HYPERTABLE_VERSION=0.9.8.2 PACKAGE_FILE=/tmp/hypertable-0.9.8.2-linux-x86_64.tar.gz FS=hadoop HADOOP_DISTRO=cdh4 ORIGIN_CONFIG_FILE=/root/hypertable.cfg PROMPT_CLEAN=true role: source test00 role: master test[00-02] role: hyperspace test[00-02] role: slave test[03-99] - test37 role: thriftbroker role: spare include: "core.tasks"

Common  Tasks ht cluster start

ht cluster stop

ht cluster push_config

ht cluster install_package

ht cluster upgrade

Monitoring

Ganglia  Metrics

Thrift  Broker  Metrics Metric Units Connections count Requests requests/s Errors errors/s Virtual  Memory GB Resident  Memory GB Heap  Size GB Heap  Slack  Bytes GB CPU  user percentage CPU  sys percentage Version string

Range  Server  Metrics Metric Units Scans scans/s Updates updates/s Bytes  Returned bytes/s Bytes  Scanned bytes/s Byte  Scan  Yield percentage Bytes  WriUen bytes/s Cells  Returned cells/s Cells  Scanned cells/s Cell  Scan  Yield percentage Outstanding  Scanners count Request  Backlog count

Metric Units Major  Compactions count Minor  Compactions count Merging  Compactions count GC  Compactions count Virtual  Memory GB Resident  Memory GB Heap  Size GB Heap  Slack  Bytes GB Tracked  Memory GB CPU  user percentage CPU  sys percentage

Range  Server  Metrics Metric Units Ranges count CellStores count Block  Cache  Hits percentage Block  Cache  Memory GB Block  Cache  Fill GB Query  Cache  Hits Percentage Query  Cache  Memory GB Query  Cache  Fill GB Version string

FS  Broker  Metrics Metric Units Read  Throughput MB/s Write  Throughput MB/s Syncs syncs/s Sync  Latency milliseconds Errors count JVM  GCs count JVM  GC  Time milliseconds JVM  Heap  Size GB Virtual  Memory GB Resident  Memory GB

Metric Units Heap  Size GB Heap  Slack  Bytes GB CPU  user percentage CPU  sys percentage Version string

Master  and  Hyperspace  Metrics

Metric Units Operations operations/s Virtual  Memory GB Resident  Memory GB Heap  Size GB Heap  Slack  Bytes GB CPU  user percentage CPU  sys percentage Version string

Metric Units Requests requests/s Virtual  Memory GB Resident  Memory GB Heap  Size GB Heap  Slack  Bytes GB CPU  user percentage CPU  sys percentage Version string

Master Hyperspace

Slow  Query  Log •  ThriftBroker feature •  Logs queries that

take longer than 10 seconds

•  Log line format o  End time (seconds) o  Start time (seconds) o  Function called o  Client IP/port o  Latency (milliseconds) o  Sub-scanner count

o  Bytes Returned o  Bytes Scanned o  Disk read o  Servers contacted o  Namespace o  HQL representation of query

Features

Namespaces

Namespaces USE ‘/’;

CREATE NAMESPACE foo; USE foo;

CREATE NAMESPACE bar;

CREATE TABLE mytable (a, b, c); GET LISTING;

(bar) namespace mytable

Atomic  Counters •  Column option:

CREATE TABLE counts ( url COUNTER );

•  Modified via existing API using specially formatted values:

Value Format Description [+]n Increment counter by n -n Decrement counter by n =n Reset counter to n

Secondary  Indexes

Total  Cells  Inserted: 1  billion Total  Time  Taken: 45  minutes

Aggregate  Throughput  (inserts/s): 372,362 Aggregate  Throughput  (bytes/s): 14,763,300

§  Six test machines -  Dual Six-core Opteron HE Processors -  24 GB RAM -  4X 2TB SATA drives

§  Single Indexed column -  Key: randomly generated 20-byte integer -  Value: two randomly chosen words from /usr/share/dict/

words

Secondary  Indexes  (HQL)

CREATE TABLE products ( title, section, info, category,

INDEX section, INDEX info, QUALIFIER INDEX info, QUALIFIER INDEX category );

Secondary  Indexes SELECT title FROM products WHERE info:actor = “Jack Nicholson”;

B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)

Secondary  Indexes SELECT title, info:author FROM products WHERE info:author =~ /^Stephen [PK]/;

0307743659 title The Shining Mass Market Paperback 0307743659 info:author Stephen King 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)

0321776402 info:author Stephen Prata

Secondary  Indexes SELECT title FROM products WHERE Exists(info:studio);

B00002VWE0 title Five Easy Pieces (1970) B000Q66J1M title 2001: A Space Odyssey [Blu-ray] B002VWNIDG title The Shining (1980)

Secondary  Indexes SELECT title FROM products WHERE info:author =~ /^Stephen P/ OR info:publisher =~ /^Anchor/; 0307743659 title The Shining Mass Market Paperback 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)

Secondary  Indexes SELECT title FROM products WHERE info:author =~ /^Stephen [PK]/ AND info:publisher =~ /^Anchor/; 0307743659 title The Shining Mass Market Paperback

Secondary  Indexes SELECT title FROM products WHERE ROW =^ 'B' AND info:actor = 'Jack Nicholson'; B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)

Regex  Filtering •  Google’s RE2 regular expression engine

o  Extremely fast (up to 50X Java regex) o  Searches run in time linear in the size of the

input o  Searches constrained to a fixed amount of

memory

•  Supported Searches: o  Row key o Column qualifier o Value

Regex  Filtering SELECT info:/^a/ FROM products; 0307743659 info:author Stephen King 0321321928 info:author Stephen C. Dewhurst 0321776402 info:author Stephen Prata B00002VWE0 info:actor Karen Black B00002VWE0 info:actor Jack Nicholson B000Q66J1M info:actor Gary Lockwood B000Q66J1M info:actor Keir Dullea B002VWNIDG info:actor Shelley Duvall B002VWNIDG info:actor Jack Nicholson

Regex  Filtering

SELECT title FROM products

WHERE ROW REGEXP "2";

0321321928 title C++ Common Knowledge: Essential Intermediate Programming [Paperback]

0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)

B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)

Regex  Filtering

SELECT title FROM products WHERE VALUE REGEXP "\("; 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)

B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)

Hadoop  MapReduce •  MapReduce Input/Output formats

o  Normal (mapreduce) o  Streaming (mapred)

•  Load data from HT to Hive and vice-versa •  Use Hive types •  Use Hive QL (joins, aggregations) •  Low latency data warehousing •  Uses Hypertable’s native MapReduce Input/Output

format

Column  Family  Options •  TTL=<t>

o  “time to live” o  Remove cells that are older than <t>

•  MAX_VERSIONS=<n> o  Keep only most recent <n> cell versions

Access  Groups CREATE TABLE User ( name, address, photo, profile, ACCESS GROUP default (name, address, photo), ACCESS GROUP profile (profile) );

Adaptive  Memory  Allocation

Group  Commit •  Supports highly concurrent updates •  Trades average latency for better throughput •  By default, commit log writes are auto-coalesced •  Commit log write interval can be statically

configured per-table: CREATE TABLE counts ( url, domain ) GROUP_COMMIT_INTERVAL=100;

Caching •  Block Cache

o  Caches CellStore blocks o  Can be configured to store blocks compressed or

uncompressed (default = compressed) o  Dynamically adjusted size based on workload

•  Query Cache o  Caches query results o  Caches single row queries only

Compression •  Cell Store blocks are compressed •  Commit Log updates are compressed •  Supported Compression Schemes:

bmz, lzo, quicklz, snappy, zlib, none

•  Quicklz performance numbers:

Language Compression Speed (MB/s)

Decompression Speed (MB/s)

C++ 308 358

Java 127 95

Performance  Study

Hypertable  vs.  HBase •  Modeled after test described in Bigtable paper •  Hypertable 0.9.5.5 vs. HBase 0.90.4 •  16-node Cluster

o  CPU: 2X AMD C32 Six-core model 4170 HE 2.1GHz o  RAM: 24GB o  Disk: 4X 2TB SATA

•  Tests Run o  Random Write o  Scan o  Random Read Zipfian o  Random Read Uniform

Random  Write

Scan

Random  Read  Zipfian

Case  Studies

•  Operational Data Store •  System metrics

o  CPU o  Memory o  IO o  Network

•  Application metrics o  Web o  DB o  Caches

•  Business metrics o  Usage o  Revenue

Case  Study:  Noah  System

•  Storage Capacity o  Up to 100TB o  Up to 1 trillion records

•  Automatic Sharding o  Irregular data growth patterns

•  Heavy Writes o  ~30K inserts/s

•  Fast Reads of Recent Data •  Table Scans

System  Requirements

Architecture  Diagram

•  2nd Largest Indian Internet Portal •  Rediffmail

o  One of the world’s largest email services o  Over 100 Million registered users

•  Active Deployments o  Rediffmaill

o  Email SPAM classification o  News Crawl Database o  Recommendation System

Case  Study:  Rediff

Architectural  Overview

Query  Latency

Summary

•  High Performance •  Open Source •  Future Direction SQL

The  End

top related