nosql series-part-3-hypertable

Hypertable. Doug Judd, CEO, Hypertable, Inc.


Post on 24-Jun-2015


TRANSCRIPT

Page 1: Nosql series-part-3-hypertable

Hypertable

Doug Judd

CEO, Hypertable, Inc.

Page 2: Nosql series-part-3-hypertable

High Performance, Open Source Scalable Database

- Modeled after Bigtable
- High Performance Implementation (C++)
- Project Started in March 2007
- Runs on top of HDFS
- Thrift Interface for all popular languages
  - Java
  - PHP
  - Ruby
  - Python
  - Perl, etc.

Page 3: Nosql series-part-3-hypertable

Hypertable Deployments

Page 4: Nosql series-part-3-hypertable

Architecture

Page 5: Nosql series-part-3-hypertable

Underlying Data Representation

Page 6: Nosql series-part-3-hypertable

Scaling (part I)

Page 7: Nosql series-part-3-hypertable

Scaling (part II)

Page 8: Nosql series-part-3-hypertable

Scaling (part III)

Page 9: Nosql series-part-3-hypertable

Request Routing

Page 10: Nosql series-part-3-hypertable

Query Handling

Page 11: Nosql series-part-3-hypertable

Features

Page 12: Nosql series-part-3-hypertable

- Load data from HT to Hive and vice versa
- Use Hive types
- Use Hive QL (joins, aggregations)
- Low latency data warehousing
- Uses Hypertable’s native MapReduce Input/Output format

Page 13: Nosql series-part-3-hypertable

Namespaces

/development
    user
    tweet
/testing
    user
    tweet
/production
    /v1
        user
        tweet
    /v2
        user
        tweet

Page 14: Nosql series-part-3-hypertable

Column Family Options

- TTL=<t>
  - "time to live"
  - Remove cells that are older than <t>
- MAX_VERSIONS=<n>
  - Keep only most recent <n> cell versions
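The effect of the two options can be illustrated with a minimal Python sketch. It only models the semantics stated on the slide; the function name `prune_versions`, the `(timestamp, value)` tuple layout, and the use of seconds for <t> are assumptions made for the example, not Hypertable's actual implementation.

```python
def prune_versions(versions, now, ttl=None, max_versions=None):
    """Return the versions of one cell that survive TTL and MAX_VERSIONS.

    versions: list of (timestamp, value) tuples (hypothetical layout).
    now, ttl: same time unit as the timestamps (assumed to be seconds here).
    """
    # Newest version first, so "most recent <n>" is simply a prefix.
    survivors = sorted(versions, key=lambda v: v[0], reverse=True)
    if ttl is not None:
        # TTL=<t>: drop cells older than t.
        survivors = [v for v in survivors if now - v[0] <= ttl]
    if max_versions is not None:
        # MAX_VERSIONS=<n>: keep only the n most recent versions.
        survivors = survivors[:max_versions]
    return survivors


versions = [(100, "a"), (200, "b"), (300, "c"), (400, "d")]
kept = prune_versions(versions, now=450, ttl=300, max_versions=2)
print(kept)
```

With `now=450` and `ttl=300`, the version written at time 100 has expired; of the three that remain, only the two newest are kept.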

Page 15: Nosql series-part-3-hypertable

Access Groups

- Provides control over physical layout
  - Row oriented
  - Column oriented
  - Hybrid
- Reduces I/O

CREATE TABLE MyTable (
  a,
  b,
  c,
  d,
  ACCESS GROUP first (a),
  ACCESS GROUP second (b, c, d)
);
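Why this layout reduces I/O can be sketched in a few lines of Python. The sketch assumes that each access group is stored separately and that a query only needs to read the groups containing the columns it asks for; the dictionary and the helper `groups_to_read` are hypothetical names used for illustration only.

```python
# Mirrors the CREATE TABLE statement above: column a lives in one group,
# columns b, c and d share another.
ACCESS_GROUPS = {
    "first": ["a"],
    "second": ["b", "c", "d"],
}


def groups_to_read(columns):
    """Return the access groups a query touching `columns` would need to read."""
    wanted = set(columns)
    return sorted(name for name, cols in ACCESS_GROUPS.items() if wanted & set(cols))


print(groups_to_read(["a"]))        # only the group holding column a
print(groups_to_read(["b", "d"]))   # only the group holding b, c, d
print(groups_to_read(["a", "c"]))   # both groups
```

Putting every column in its own group approximates a column-oriented layout, putting all columns in one group approximates a row-oriented layout, and anything in between is the hybrid case.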

Page 16: Nosql series-part-3-hypertable

Regular Expression Filtering

- Google’s RE2 regular expression engine
  - Extremely fast (up to 50X Java regex)
  - Searches run in time linear in the size of the input
  - Searches constrained to a fixed amount of memory
- Supported Searches:
  - Row key
  - Column qualifier
  - Value

SELECT CELLS tag:/(?i)(nosql|bigtable)/ FROM MyTable
  WHERE ROW REGEXP "^\D+" AND VALUE REGEXP "(?i)hypertable";
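The filtering semantics of the query above can be approximated in Python. This is only a sketch: it uses Python's `re` module rather than RE2 (so it does not share RE2's linear-time guarantee), it assumes unanchored "search" matching, and the cell tuple layout and sample data are invented for the example.

```python
import re

# The three patterns from the query: column qualifier, row key, value.
QUALIFIER_RE = re.compile(r"(?i)(nosql|bigtable)")
ROW_RE = re.compile(r"^\D+")
VALUE_RE = re.compile(r"(?i)hypertable")


def matches(cell):
    """cell is a hypothetical (row, family, qualifier, value) tuple."""
    row, family, qualifier, value = cell
    return (
        family == "tag"
        and QUALIFIER_RE.search(qualifier) is not None
        and ROW_RE.search(row) is not None
        and VALUE_RE.search(value) is not None
    )


cells = [
    ("alice", "tag", "NoSQL", "Hypertable talk"),    # matches all three patterns
    ("42bob", "tag", "bigtable", "hypertable"),      # row key starts with a digit
    ("carol", "tag", "sql", "hypertable"),           # qualifier does not match
    ("dave", "tag", "Bigtable", "HBase notes"),      # value does not match
    ("erin", "url", "nosql", "hypertable"),          # wrong column family
]
selected = [c for c in cells if matches(c)]
print(selected)
```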

Page 17: Nosql series-part-3-hypertable

Atomic Counters

- New column option:

CREATE TABLE counts ( url COUNTER );

- Modified via existing API using specially formatted values:

Value Format    Description
[+]n            Increment counter by n
-n              Decrement counter by n
=n              Reset counter to n
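The value formats in the table can be interpreted as in the following Python sketch. The function `apply_counter_update` is a hypothetical helper written for illustration; it models how a stored counter would change for each formatted value, not how Hypertable applies the update internally.

```python
def apply_counter_update(current, update):
    """Apply one specially formatted value to a counter.

    "[+]n" increments by n (the plus sign is optional),
    "-n" decrements by n, and "=n" resets the counter to n.
    """
    update = update.strip()
    if update.startswith("="):
        return int(update[1:])
    # int() accepts an optional leading sign, so "+5", "5" and "-2" all parse.
    return current + int(update)


count = 0
for value in ["+5", "3", "-2", "=10", "+1"]:
    count = apply_counter_update(count, value)
print(count)
```

The sequence walks the counter through 5, 8, 6, 10 and finally 11.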

Page 18: Nosql series-part-3-hypertable

Group Commit

- Supports highly concurrent updates
- Trades minimum latency for better throughput
- Configurable commit interval per-table:

CREATE TABLE counts ( url, domain ) GROUP_COMMIT_INTERVAL=100;
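The latency-for-throughput trade can be sketched with a small simulation. It assumes the interval is expressed in milliseconds and that every update arriving within an interval is committed together at the end of that interval; both the function `group_commits` and the arrival times are illustrative assumptions, not a description of the server's internals.

```python
def group_commits(arrival_times_ms, interval_ms):
    """Group updates by the interval boundary at which they would be committed."""
    batches = {}
    for t in arrival_times_ms:
        # An update waits until the end of the interval it arrived in.
        flush_time = (t // interval_ms + 1) * interval_ms
        batches.setdefault(flush_time, []).append(t)
    return batches


arrivals = [3, 40, 99, 100, 150, 260]
batches = group_commits(arrivals, interval_ms=100)
for flush_time, updates in sorted(batches.items()):
    print(flush_time, updates)
```

Six updates are committed in three batches instead of six separate commits, at the cost of each update waiting up to one interval before it is acknowledged.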

Page 19: Nosql series-part-3-hypertable

Compression

- Block compression
  - Cell Store (SSTable) blocks
  - Commit Log blocks
- Supported Compression Schemes:
  - zlib
  - lzo
  - quicklz
  - bmz
  - none
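Block compression with one of the listed schemes (zlib) can be illustrated with Python's standard `zlib` module. The block contents below are made up for the example, and the sketch says nothing about Hypertable's actual block format or block size.

```python
import zlib

# A hypothetical block of sorted, repetitive key/value lines, similar in
# spirit to what a Cell Store block might hold.
block = b"".join(b"row%05d\tcol\tvalue\n" % i for i in range(1000))

compressed = zlib.compress(block)      # compress the block as a unit
restored = zlib.decompress(compressed)  # a reader decompresses the whole block

print(len(block), len(compressed))
```

Because each block is compressed independently, a reader only has to decompress the blocks it actually touches.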

Page 20: Nosql series-part-3-hypertable

Bloom Filter

- Dramatically reduces disk access
- Associated with each Cell Store
- Tells you if key is definitely not present
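The "definitely not present" property can be seen in a minimal Bloom filter sketch. The class below is a generic textbook construction with arbitrary parameters (1024 bits, 3 hash functions derived from SHA-256); it is not Hypertable's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was never added; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter()
for row_key in ["row-0001", "row-0002", "row-0003"]:
    bf.add(row_key)

print(bf.might_contain("row-0002"))  # an added key is always reported as possibly present
```

A `False` answer lets a reader skip the associated Cell Store without touching disk; a `True` answer may occasionally be a false positive, so the store still has to be checked.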

Page 21: Nosql series-part-3-hypertable

Performance Evaluation

Page 22: Nosql series-part-3-hypertable

Setup

- Modeled after Test described in Bigtable paper
- 1 Test Dispatcher, 4 Test Clients, 4 Tablet Servers
- Test was written entirely in Java
- Hardware
  - 1 X 1.8 GHz Dual-core Opteron
  - 10 GB RAM
  - 3X 250GB SATA drives
- Software
  - HDFS 0.20.2 running on all 10 nodes, 3X replication
  - HBase 0.20.4
  - Hypertable 0.9.3.3

Page 23: Nosql series-part-3-hypertable

Latency

Page 24: Nosql series-part-3-hypertable

Throughput

Test                                Hypertable Advantage Relative to HBase (%)
Random Read Zipfian 80 GB           925
Random Read Zipfian 20 GB           777
Random Read Zipfian 2.5 GB          100
Random Write 10KB values            51
Random Write 1KB values             102
Random Write 100 byte values        427
Random Write 10 byte values         931
Sequential Read 10KB values         1060
Sequential Read 1KB values          68
Sequential Read 100 byte values     129
Scan 10KB values                    2
Scan 1KB values                     58
Scan 100 byte values                75
Scan 10 byte values                 220

Page 25: Nosql series-part-3-hypertable

Why does Performance Matter?

$$$

Page 26: Nosql series-part-3-hypertable

Upcoming Release (0.9.5)

- Last “alpha” release
- Release Date: February 15th 2011
- Features
  - Automatic range balancing
  - Asynchronous API
  - Improved Monitoring System

Page 27: Nosql series-part-3-hypertable

Resources

Twitter: hypertable

Project Site: www.hypertable.org

Blog: blog.hypertable.com

Page 28: Nosql series-part-3-hypertable

Professional Support

Page 29: Nosql series-part-3-hypertable

Q&A