APACHE CASSANDRA
Scalability, Performance and Fault Tolerance in Distributed Databases
Jihyun.An ([email protected])
18, June 2013
TABLE OF CONTENTS
Preface
Basic Concepts
P2P Architecture
Primitive Data Model & Architecture
Basic Operations
Fault Management
Consistency
Performance
Problem handling
TABLE OF CONTENTS (NEXT TIME)
Maintaining
Cluster Management
Node Management
Problem Handling
Tuning
Playing (for development, from the client's perspective)
Designing
Client
Thrift
Native
CQL
3rd party
Hector
OCM
Extension
Baas.io
Hadoop
PREFACE
OUR WORLD
Traditional DBMS is very valuable
Storage (+memory) and computational resources are cheap (cheaper than before)
But we face new challenges
Big data
(Near) real time
Complex and varied requirements
Recommendation
Find FOAF
…
Event-driven triggering
User Session
…
OUR WORLD (CONT)
Complex applications combine different types of problems
Different languages -> more productive
ex: functional languages, multiprocessing-optimized languages
Polyglot persistence layer
Performance vs Durability?
Reliability?
…
TRADITIONAL DBMS
Relational Model
Well-defined Schema
Access with Selection/Projection
Derived data from joining/grouping/aggregating (counting ..)
Small (refined) data
…
But
Painful data model changes
Hard to scale out
Ineffective in handling large volumes of data
Not designed with the hardware in mind
…
TRADITIONAL DBMS (CONT)
Has many constraints for ACID
PK/FK & checking
Domain type checking
.. and more checking
Lots of IO / processing
OODBMS, ORDBMS
Good, but .. even more checking / processing
Do not work well with disk IO
NOSQL
Key-value store
Column: Cassandra, HBase, Bigtable …
Others: Redis, Dynamo, Voldemort, Hazelcast …
Document oriented
MongoDB, CouchDB …
Graph store
Neo4j, OrientDB, BigOWL, FlockDB ..
NOSQL (CONT)
Benefits
Higher performance
Higher scalability
Flexible data model
More effective for some use cases
Less administrative overhead
Drawbacks
Limited transactions
Relaxed Consistency
Unconstrained data
Limited ad-hoc query capabilities
Limited administrative aid tools
CAP
Brewer’s theorem
We can pick only two of
Consistency
Availability
Partition tolerance
[Figure: CAP triangle with corners A, C and P]
A + P: Amazon Dynamo derivatives: Cassandra, Voldemort, CouchDB, Riak
C + P: Neo4j, Bigtable
Bigtable derivatives: MongoDB, HBase, Hypertable, Redis
C + A: Relational: MySQL, MSSQL, Postgres
Cassandra = Dynamo (architecture) + BigTable (data model)
(Apache) Cassandra is a free, open-source, highly scalable,
distributed database system for managing large amounts of data
Written in Java
Runs on the JVM
References :
BigTable (http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf)
Dynamo (http://web.archive.org/web/20120129154946/http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
DESIGN GOALS
Simple key/value (column) store
Limited to storage
No support for anything (aggregating, grouping …) beyond basic operations (CRUD, range access)
But extensible
Hadoop (MR, HDFS, Pig, Hive ..)
ESP
Distributed Processing Interface (ex: BSP, MR)
Baas.io
…
DESIGN GOALS (CONT)
High Availability
Decentralized
Every node can be accessed by clients
Replication & access to the replicas
Multi-DC support
Eventual consistency
Less write complexity
Audit and repair on read
Tunable -> trade-offs between consistency, durability and latency
DESIGN GOALS (CONT)
Incremental scalability
Equal members
Linear scalability
Unlimited space
Write/read throughput increases linearly as nodes (members) are added
Low total cost
Minimize administrative work
Automatic partitioning
Flush / compaction
Data balancing / moving
Virtual nodes (since v1.2)
Mid-range nodes give good performance
Working together, they deliver high performance and huge capacity
FOUNDER & HISTORY
Founder
Avinash Lakshman (one of the authors of Amazon's Dynamo)
Prashant Malik (Facebook engineer)
Developer
About 50
History
Open sourced by Facebook in July 2008
Became an Apache Incubator project in March 2009
Graduated to a top-level project in Feb 2010
0.6 released (added support for integrated caching, and Apache Hadoop MapReduce) in Apr 2010
0.7 released (added secondary indexes and online schema change) in Jan 2011
0.8 released (added the Cassandra Query Language (CQL), self-tuning memtables, and support for zero-downtime upgrades) in Jun 2011
1.0 released (added integrated compression, leveled compaction, and improved read performance) in Oct 2011
1.1 released (added self-tuning caches, row-level isolation, and support for mixed ssd/spinning disk deployments) in Apr 2012
1.2 released (added clustering across virtual nodes, inter-node communication, atomic batches, and request tracing) in Jan 2013
PROMINENT USERS
User | Cluster size | Node count | Usage | Now
Facebook | >200 | ? | Inbox search | Abandoned, moved to HBase
Cisco WebEx | ? | ? | User feed, activity | OK
Netflix | ? | ? | Backend | OK
Formspring | ? (26 million accounts, 10 million responses per day) | ? | Social-graph data | OK
Others: Urban Airship, Rackspace, OpenX, Twitter (preparing to move to Cassandra)
BASIC CONCEPTS
P2P ARCHITECTURE
All nodes are the same (equal roles)
No single point of failure / decentralized
Compare with
mongoDB
broker structure (CUBRID …)
Master / slave
…
P2P ARCHITECTURE (CONT)
Drives linear scalability
References :
http://dev.kthcorp.com/2011/12/07/cassandra-on-aws-100-million-writ/
PRIMITIVE DATA MODEL & ARCHITECTURE
COLUMN
Basic and primitive type (the smallest increment of data)
A tuple containing a name, a value and a timestamp
Timestamp is important
Provided by the client
Determines the most recent version
On a collision, the version with the latest timestamp is chosen
[Figure: Column = Name | Value | Timestamp]
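A minimal Python sketch of the column tuple and the "latest timestamp wins" rule above. `Column` and `reconcile` are hypothetical names; breaking timestamp ties by comparing values (so that every replica picks the same winner) is an assumption of this sketch, not a claim about Cassandra's exact code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column: name, value and a client-supplied timestamp (hypothetical model)."""
    name: str
    value: str
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Last-write-wins: keep the version with the higher timestamp.

    Assumption: on equal timestamps the larger value wins, so that all
    replicas converge on the same column deterministically.
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


old = Column("fullname", "smith", 1000)
new = Column("fullname", "mike", 2000)
winner = reconcile(old, new)
```

Because the rule only compares timestamps (plus a deterministic tie-break), replicas can apply writes in any order and still agree.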
COLUMN (CONT)
Types
Standard: a column has a name (UUID, UTF8 …)
Composite: a column has a composite name (UUID+UTF8 …)
Expiring: marked with a TTL
Counter: has only a name and a value; the timestamp is managed by the server
Super: used to manage wide rows; inferior to using composite columns (DO NOT USE: all sub-columns are serialized)
[Figure: counter column = Name | Value; super column = Name containing sub-columns of Name | Value | Timestamp]
COLUMN (CONT)
Types (CQL3 based)
Standard: has one primary key column.
Composite: has more than one primary key column; recommended for managing wide rows.
Expiring: gets deleted during compaction (after its TTL expires).
Counter: counts occurrences of an event.
Super: used to manage wide rows; inferior to using composite columns (DO NOT USE: all sub-columns are serialized)
DDL : CREATE TABLE test (
user_id varchar,
article_id uuid,
content varchar,
PRIMARY KEY (user_id, article_id)
);
<Logical>
user_id | article_id | content
Smith | <uuid1> | Blah1..
Smith | <uuid2> | Blah2..
<Physical>
Smith -> {uuid1,content}: Blah1… (timestamp) | {uuid2,content}: Blah2… (timestamp)
SELECT user_id, article_id FROM test WHERE user_id = 'Smith' ORDER BY article_id DESC LIMIT 1;
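To illustrate the logical/physical mapping above, this sketch groups CQL3 rows of `PRIMARY KEY (user_id, article_id)` into one wide physical row per partition key, with each cell named by (clustering value, column name) and kept sorted. It is a simplified model: `to_physical` is a hypothetical helper, plain strings stand in for UUIDs (compared lexically, whereas Cassandra uses the column type's comparator), and other storage details are ignored.

```python
from collections import defaultdict


def to_physical(rows, partition_key, clustering_key):
    """Group logical CQL rows into wide physical rows (simplified model).

    partition key value -> physical row key
    each other column   -> cell named (clustering value, column name)
    """
    physical = defaultdict(dict)
    for row in rows:
        row_key = row[partition_key]
        clustering = row[clustering_key]
        for name, value in row.items():
            if name in (partition_key, clustering_key):
                continue
            physical[row_key][(clustering, name)] = value
    # Cells inside a physical row are kept sorted by their composite name.
    return {key: dict(sorted(cells.items())) for key, cells in physical.items()}


logical_rows = [  # inserted out of order on purpose
    {"user_id": "Smith", "article_id": "uuid2", "content": "Blah2.."},
    {"user_id": "Smith", "article_id": "uuid1", "content": "Blah1.."},
]
physical = to_physical(logical_rows, "user_id", "article_id")

# ORDER BY article_id DESC LIMIT 1 within one partition = the last cell.
newest = max(physical["Smith"])[0]
```

Since cells are already sorted inside the partition, the descending query above is a cheap read from one end of a single physical row, which is why CQL3 requires the partition key in the WHERE clause for ORDER BY.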
ROWS
A row contains a row key and a set of columns
A row key must be unique (usually a UUID)
Supports up to 2 billion columns per (physical) row
Columns are sorted by name (column names are indexed)
Primitive
Secondary index
Direct column access
[Figure: Row = row key + columns, each Name | Value | Timestamp]
COLUMN FAMILY
Container for columns and rows
No fixed schema
Each row is uniquely identified by its row key
Each row can have a different set of columns
Rows are sorted by row key
Comparator / Validator
Static/Dynamic CF
If the column type is super column, the CF is called a “Super Column Family”
Like a “table” in the relational world
[Figure: column family = rows, each a row key + columns of Name | Value | Timestamp]
DISTRIBUTION
[Figure: rows to be distributed over Server 1 to Server 4]
How to map?
TOKEN RING
A node is an instance (typically one per server)
Token
Used to map each row to a node
Range from 0 to 2^127 - 1 (with RandomPartitioner)
Associated with a row key
Node
Assigned a unique token (ex: token 5 to Node 5)
Owns the range from the previous node's token (exclusive) to its own token (inclusive)
token 4 < Node 5's range <= token 5
[Figure: token ring of Nodes 1 to 8; Node 5 owns the range between token 4 and token 5]
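The ownership rule above (a node owns the range from the previous node's token, exclusive, to its own token, inclusive, wrapping around) can be sketched with a sorted token list and a binary search. The tokens 10, 20, … 80 are hypothetical small numbers standing in for real 127-bit tokens.

```python
import bisect


class TokenRing:
    """Maps a token to the node owning the range (previous token, own token]."""

    def __init__(self, node_tokens):
        pairs = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in pairs]
        self.nodes = [node for _, node in pairs]

    def owner(self, token):
        # First node token >= token; past the largest token, wrap to the first node.
        i = bisect.bisect_left(self.tokens, token)
        return self.nodes[i % len(self.nodes)]


# Hypothetical tokens: Node i gets token i * 10.
ring = TokenRing({f"Node {i}": i * 10 for i in range(1, 9)})
```

For example, `ring.owner(45)` and `ring.owner(50)` both land on Node 5 (40 < t <= 50), while a token above 80 wraps around to Node 1.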
PARTITIONING
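Row key -> token, via a partitioner
Random partitioners (MD5: RandomPartitioner, Murmur3: Murmur3Partitioner, the default since v1.2)
Order-preserving partitioner / ByteOrderedPartitioner
[Figure: row keys placed on the token ring by each partitioner]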
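A sketch contrasting the two partitioner families above. The MD5 variant follows RandomPartitioner's idea of taking the absolute value of the MD5 digest read as a signed 128-bit integer; the byte-ordered variant simply uses the key bytes as the token. Murmur3Partitioner is not reproduced here, and the function names are hypothetical.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """MD5 digest read as a signed big-endian 128-bit integer, then abs()."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def byte_ordered_token(key: bytes) -> bytes:
    """Order-preserving: the raw key bytes are the token."""
    return key


keys = [b"cherry", b"apple", b"banana"]
by_bytes = sorted(keys, key=byte_ordered_token)       # lexical key order kept
by_hash = sorted(keys, key=random_partitioner_token)  # order scrambled by hashing
```

The trade-off shows directly: hashing spreads adjacent keys evenly around the ring (good balance, no range scans by key), while byte ordering keeps key ranges together (range scans possible, but hot spots if keys are skewed).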
REPLICATION
The node that handles a client's read/write request is called the coordinator node
The locator (replication strategy) determines where replicas are located
Replicas are used for
Consistency checks
Repair
Ensure W + R > N for consistency
Local cache (row cache)
[Figure: ring of Nodes 1 to 8 with replication factor 4: the coordinator node asks the locator (simple) for the first replica (the original), and the other N-1 replicas go to the following nodes; the simple locator treats ring order as proximity]
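A sketch of the simple locator described above: the first replica is the node that owns the token, and the remaining RF-1 replicas are the next nodes clockwise on the ring. The function name and the tokens 10, 20, … 80 are hypothetical.

```python
import bisect


def simple_strategy_replicas(ring, token, rf):
    """ring: list of (token, node). First replica = owner; the rest follow clockwise."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    return [ring[(start + step) % len(ring)][1] for step in range(min(rf, len(ring)))]


ring = [(i * 10, f"Node {i}") for i in range(1, 9)]
replicas = simple_strategy_replicas(ring, 45, 4)
```

With replication factor 4, a row whose token is 45 is stored on Nodes 5, 6, 7 and 8; one whose token is 75 wraps around to Nodes 8, 1, 2 and 3.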
REPLICATION (CONT)
Multi-DC support
Allows specifying how many replicas to keep in each DC
Within a DC, replicas are placed on different racks
Relies on the snitch to place replicas
Snitches (provide the topology to the strategy)
SimpleSnitch (single DC)
RackInferringSnitch
PropertyFileSnitch
Ec2Snitch
Ec2MultiRegionSnitch
[Figure: replicas spread across DC1 and DC2 of the entire cluster]
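A simplified sketch of per-DC, rack-aware placement in the spirit of the multi-DC rules above: for each DC, walk the ring clockwise from the row's token, prefer nodes on racks not yet used, and fall back to same-rack nodes only if there are not enough racks. This is an illustration under those assumptions, not Cassandra's NetworkTopologyStrategy code; the ring, DC and rack names are hypothetical.

```python
import bisect


def rack_aware_replicas(ring, token, rf_per_dc):
    """ring: list of (token, node, dc, rack). Returns {dc: [replica nodes]}."""
    ring = sorted(ring)
    start = bisect.bisect_left([entry[0] for entry in ring], token)
    # Walk the whole ring clockwise, starting at the owner of the token.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    placement = {}
    for dc, rf in rf_per_dc.items():
        chosen, racks_seen, same_rack = [], set(), []
        for _, node, node_dc, rack in walk:
            if node_dc != dc or len(chosen) == rf:
                continue
            if rack in racks_seen:
                same_rack.append(node)  # used only if we run out of new racks
            else:
                chosen.append(node)
                racks_seen.add(rack)
        placement[dc] = chosen + same_rack[: rf - len(chosen)]
    return placement


ring = [
    (10, "n1", "DC1", "r1"), (20, "n2", "DC2", "r1"),
    (30, "n3", "DC1", "r1"), (40, "n4", "DC2", "r2"),
    (50, "n5", "DC1", "r2"), (60, "n6", "DC2", "r1"),
]
placement = rack_aware_replicas(ring, 5, {"DC1": 2, "DC2": 2})
```

Here DC1 skips n3 (same rack as n1) in favour of n5 on rack r2; only when three replicas are requested in DC1 does n3 get used.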
ADD / REMOVE NODE
Data transfer between nodes is called “Streaming”
If node 5 is added,
nodes 3, 4 and 1 (assuming RF is 2) are involved in streaming
If node 2 is removed,
node 3 (which holds the next higher token and already has replicas of node 2's data) serves its range instead
[Figure: a ring of Nodes 1 to 4; the same ring with Node 5 added; the ring with Node 2 removed, leaving Nodes 1, 3 and 4]
VIRTUAL NODES
Supported since v1.2
Real-time migration support?
Shuffle utility
One node has many tokens
=> one node has many ranges
[Figure: cluster with Node 1 and Node 2; number of tokens per node is 4]
VIRTUAL NODES (CONT)
Less administrative work
Saves cost
When adding/removing a node
many nodes share the work
No need to determine tokens
Shuffle to re-balance
Shorter transition time
Smart balancing
No need to balance manually
(the number of tokens must be sufficiently high)
[Figure: cluster of Node 1 and Node 2 with 4 tokens each; after adding Node 3, it takes ranges from both]
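A sketch of why virtual nodes balance themselves: each node draws many random tokens, and a newly added node's tokens split ranges belonging to every existing node, so all of them share the streaming work. The 2^127 token space, the seed and `NUM_TOKENS = 256` (mirroring the `num_tokens` setting) are assumptions for illustration.

```python
import random

RING_SIZE = 2**127  # assumed token space (RandomPartitioner-style)
NUM_TOKENS = 256    # tokens per node, like num_tokens in cassandra.yaml


def ownership(tokens_by_node):
    """Fraction of the ring each node owns, summed over all of its tokens.

    Each token owns the range (previous token, token], wrapping around the ring.
    """
    ring = sorted((t, n) for n, ts in tokens_by_node.items() for t in ts)
    owned = {n: 0 for n in tokens_by_node}
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # for i == 0 this wraps to the last token
        owned[node] += (token - prev) % RING_SIZE
    return {n: size / RING_SIZE for n, size in owned.items()}


rng = random.Random(42)
nodes = {name: [rng.randrange(RING_SIZE) for _ in range(NUM_TOKENS)]
         for name in ("Node 1", "Node 2")}
before = ownership(nodes)
nodes["Node 3"] = [rng.randrange(RING_SIZE) for _ in range(NUM_TOKENS)]
after = ownership(nodes)
```

With 256 tokens per node, each of the three nodes ends up near one third of the ring, and both Node 1 and Node 2 give up part of their share to Node 3. With only a handful of tokens per node, the shares would be far less even, which is why the number of tokens must be high.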
KEYSPACE
A namespace for column families
Authorization
Per keyspace, and per CF as well
Replication
Key-oriented schema (see below)
{
  "row_key1": {
    "Users": {
      "emailAddress": {"name": "emailAddress", "value": "[email protected]"},
      "webSite": {"name": "webSite", "value": "http://bar.com"}
    },
    "Stats": {
      "visits": {"name": "visits", "value": "243"}
    }
  },
  "row_key2": {
    "Users": {
      "emailAddress": {"name": "emailAddress", "value": "[email protected]"},
      "twitter": {"name": "twitter", "value": "user2"}
    }
  }
}
(Nesting: row key -> column family -> column)
CLUSTER
The total amount of data managed by the cluster is represented as a ring
Cluster of nodes
Has one or more keyspaces
Defines the partitioning strategy
Authentication
GOSSIP
The gossip protocol is used for cluster membership
Failure detection at the service level (alive or not)
Responsibility
Every node in the system knows every other node's status
Implemented as
Syn -> Ack -> Ack2
Information: status, load, bootstrapping
Basic status is Alive/Dead/Join
Runs every second
Status is disseminated in O(log N) rounds (N is the number of nodes)
Seed nodes
Phi (φ) is used to judge whether a node is dead or alive within a time window
(phi threshold 5 -> detection in 15~16 s)
Data structure
HeartBeat / Application Status < Endpoint Status < Endpoint Status Map
[Figure: gossip among nodes N1 to N6]
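A small simulation of the O(log N) dissemination claim above. It uses simplified push gossip (each round, every informed node tells one random other node) rather than Cassandra's versioned Syn/Ack/Ack2 exchange, so treat it as an illustration of the growth rate only; `gossip_rounds` is a hypothetical helper.

```python
import random


def gossip_rounds(n: int, seed: int = 0) -> int:
    """Rounds until news that starts at node 0 reaches all n nodes.

    Simplified push gossip: every round, each informed node picks one
    random other node and tells it.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        rounds += 1
        for node in list(informed):  # nodes informed this round wait until the next
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1  # skip self
            informed.add(peer)
    return rounds


results = {n: gossip_rounds(n) for n in (10, 100, 1000)}
```

Growing the cluster a hundredfold only adds a handful of rounds, which is what makes once-per-second gossip practical for large clusters.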
BASIC OPERATIONS
WRITE / UPDATE
CommitLog
Abstracted as a memory-mapped (mmap) file
File & memory sync -> on system failure, this is your guardian angel ^^
Java NIO
Uses the C-heap (= native heap)
Log data (a write followed by a delete? both entries still exist in the log)
Rolling segment structure
Memtable
In-memory buffer and workspace
Sorted by row key
When it reaches a size threshold or a flush period, it is written to disk as a persistent table structure (SSTable)
WRITE / UPDATE (LOCAL LEVEL)
Write
1. Write to the CommitLog
Write : "1":{"name":"fullname","value":"smith"}
Write : "2":{"name":"fullname","value":"mike"}
Delete : "1"
Write : "3":{"name":"fullname","value":"osang"}
…
2. Write/update the Memtable
Key | Name | Value
1 | fullname | smith
2 | fullname | mike
3 | fullname | Osang
… | … | …
3. Write to disk (flush) -> SSTable, SSTable, SSTable
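The three write steps above can be sketched as a toy single-node store: append to the commit log first, then update the memtable, and flush the memtable as an immutable, sorted table when it grows past a threshold. `LocalStore` is hypothetical, the threshold is counted in row keys (Cassandra uses memory size), and commit log segment recycling is left out.

```python
class LocalStore:
    """Toy single-node write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # row key -> (value, timestamp); None value = tombstone
        self.sstables = []     # flushed, immutable, sorted tuples (newest last)
        self.flush_threshold = flush_threshold  # hypothetical: number of row keys

    def write(self, key, value, ts):
        self.commit_log.append(("write", key, value, ts))  # 1. log first (durability)
        self._apply(key, value, ts)                         # 2. then the memtable

    def delete(self, key, ts):
        self.commit_log.append(("delete", key, None, ts))
        self._apply(key, None, ts)  # a delete is just a tombstone write

    def _apply(self, key, value, ts):
        current = self.memtable.get(key)
        if current is None or ts >= current[1]:
            self.memtable[key] = (value, ts)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the memtable, sorted by row key, as an immutable table
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


store = LocalStore(flush_threshold=3)
store.write("1", "smith", 1)
store.write("2", "mike", 2)
store.delete("1", 3)
store.write("3", "osang", 4)
```

Every step is an append or an in-memory update, which is why writes never wait on random disk seeks.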
SSTABLE
SSTable = Sorted String Table
Best for log-structured DBs
Stores large numbers of key-value pairs
Immutable
Created by “flush”
Merged by (major/minor) compaction
The same column can exist in several SSTables with different versions (timestamps)
The most recent one is chosen
READ (LOCAL LEVEL)
Read
Memtable:
Key | Name | Value
2 | fullname | mike
3 | fullname | Osang
… | … | …
SSTables: each one checked through its Bloom filter (BF) and index (IDX)
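A sketch of the local read above: check the memtable, consult each SSTable's Bloom filter to skip tables that cannot contain the key, and return the version with the newest timestamp. Here a plain Python set stands in for the (probabilistic) Bloom filter, and `read` is a hypothetical helper.

```python
def read(key, memtable, sstables):
    """Return the newest live value for key, or None if absent or deleted.

    memtable: dict key -> (value, ts)
    sstables: list of (bloom, table) where bloom is a set standing in for a
              Bloom filter and table is dict key -> (value, ts)
    """
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for bloom, table in sstables:
        if key not in bloom:  # "definitely absent": no disk seek needed
            continue
        if key in table:
            candidates.append(table[key])
    if not candidates:
        return None
    value, _ts = max(candidates, key=lambda version: version[1])  # newest wins
    return value  # None here means the newest version is a tombstone


memtable = {"2": ("mike2", 5)}
sstables = [
    ({"2", "3"}, {"2": ("mike", 2), "3": ("osang", 4)}),
    ({"1"}, {"1": (None, 3)}),
]
```

Reads touch several structures and merge versions, which is why they are slower than writes; Bloom filters and caches keep the number of disk seeks low.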
READ (CLUSTER LEVEL, +READ REPAIR)
Read operation
1. The coordinator uses the locator to find the replicas; data is transferred from the original/replica nodes (according to the consistency level)
2. Digest comparing: if digests differ, choose the right one (the most recent)
3. Recover the wrong replica
Replicas: Replica (original, right), Replica (right), Replica (wrong)
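A sketch of the cluster-level read with read repair above: compare digests of the replicas' answers, and if they differ, pick the version with the newest timestamp and push it to the stale replicas. Real Cassandra requests full data from one replica and only digests from the others; this simplified version asks every replica for data. `coordinated_read` and the replica names are hypothetical.

```python
import hashlib


def digest(version):
    """Digest of one (value, timestamp) answer."""
    value, ts = version
    return hashlib.md5(f"{value}|{ts}".encode()).hexdigest()


def coordinated_read(replicas, key):
    """replicas: node -> {key: (value, ts)}. Returns (value, repaired nodes)."""
    responses = {node: data[key] for node, data in replicas.items()}
    newest = max(responses.values(), key=lambda version: version[1])
    repaired = []
    if len({digest(v) for v in responses.values()}) > 1:  # replicas disagree
        for node, version in responses.items():
            if version != newest:
                replicas[node][key] = newest  # read repair: overwrite stale copy
                repaired.append(node)
    return newest[0], sorted(repaired)


replicas = {
    "A": {"k": ("new", 2)},
    "B": {"k": ("new", 2)},
    "C": {"k": ("old", 1)},
}
value, repaired = coordinated_read(replicas, "k")
```

After the read, replica C holds the newest version as well, so the next read finds matching digests.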
DELETE
Adds a tombstone (a special kind of column)
Garbage-collected during compaction
GC grace seconds: 864000 (default, 10 days)
Issue
If a failed node recovers after GCGraceSeconds, the deleted data can be resurrected
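A sketch of the resurrection issue above. Compaction drops a tombstone once it is older than the grace period; if a replica that missed the delete rejoins after that, merging replicas by timestamp brings the old value back. The cell model, `compact` and `merge` are simplified, hypothetical helpers.

```python
GC_GRACE_SECONDS = 864_000  # default gc_grace_seconds: 10 days


def compact(cells, now):
    """cells: key -> (value, ts); a None value is a tombstone written at ts (seconds).

    Tombstones older than the grace period are purged.
    """
    return {
        key: (value, ts)
        for key, (value, ts) in cells.items()
        if not (value is None and now - ts > GC_GRACE_SECONDS)
    }


def merge(a, b):
    """Merge two replicas' cells, newest timestamp wins (as repair would)."""
    out = dict(a)
    for key, version in b.items():
        if key not in out or version[1] > out[key][1]:
            out[key] = version
    return out


tombstoned = {"1": (None, 100)}  # replica that saw the delete at t=100
missed = {"1": ("smith", 50)}    # replica that was down and missed the delete

in_time = merge(compact(tombstoned, now=200), missed)
too_late = merge(compact(tombstoned, now=100 + GC_GRACE_SECONDS + 1), missed)
```

`in_time` keeps the tombstone, so the delete wins; in `too_late` the tombstone has already been purged and "smith" comes back. This is why repair should run on every node more often than gc_grace_seconds.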
FAULT MANAGEMENT
DETECTION
Dynamic threshold for marking nodes
The accrual detection mechanism calculates a per-node threshold
Automatically takes into account network conditions, workload and other conditions that might affect the perceived heartbeat rate
From 3rd party client
Hector
Failover
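A sketch of an accrual (phi) failure detector as described above. Instead of a fixed timeout, it keeps a window of heartbeat inter-arrival times and computes phi, the negative base-10 log of the probability that a heartbeat is still this late. This sketch assumes exponentially distributed intervals, which gives phi = t / (mean × ln 10); `ArrivalWindow`, `is_dead` and the window size are hypothetical.

```python
import math
from collections import deque


class ArrivalWindow:
    """Sliding window of heartbeat inter-arrival times for one node."""

    def __init__(self, size: int = 1000):
        self.intervals = deque(maxlen=size)
        self.last_arrival = None

    def heartbeat(self, now: float) -> None:
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now: float) -> float:
        """-log10(P(no heartbeat for t)) with P = exp(-t / mean)."""
        mean = sum(self.intervals) / len(self.intervals)
        return (now - self.last_arrival) / (mean * math.log(10))


def is_dead(window: ArrivalWindow, now: float, threshold: float) -> bool:
    return window.phi(now) > threshold


window = ArrivalWindow()
for t in range(11):  # heartbeats once per second, t = 0 .. 10
    window.heartbeat(float(t))
```

Because phi is scaled by the observed mean interval, a node on a slow or busy network, whose heartbeats normally arrive late, is not convicted as quickly as a node whose heartbeats usually arrive on time.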
HINTED-HANDOFF
The coordinator stores a hint if a replica node is down or fails to acknowledge the write
A hint consists of the target replica and the mutation (column object) to be replayed
Uses the Java heap (may move off-heap in a future release)
Only saved for a limited time (default, 1 hour) after a replica fails
When the failed node is alive again, the missed writes are streamed to it
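A sketch of the hinted-handoff rules above: while a replica is down, the coordinator keeps hints for it, but only within the hint window; when the replica comes back, the stored mutations are handed back for replay. `Coordinator` is hypothetical, and the one-hour window uses the default stated on this slide.

```python
HINT_WINDOW_MS = 3_600_000  # 1 hour, the default stated on the slide (assumed)


class Coordinator:
    """Hypothetical coordinator that keeps hints for replicas that are down."""

    def __init__(self):
        self.hints = {}       # target replica -> mutations to replay later
        self.down_since = {}  # replica -> time (ms) it was first seen down

    def write(self, replica, mutation, alive, now_ms):
        if alive:
            return "sent"  # normal write path
        since = self.down_since.setdefault(replica, now_ms)
        if now_ms - since <= HINT_WINDOW_MS:
            self.hints.setdefault(replica, []).append(mutation)
            return "hinted"
        return "dropped"  # outside the window: the replica needs repair later

    def on_alive(self, replica):
        """Replica is back: return the hinted mutations to stream to it."""
        self.down_since.pop(replica, None)
        return self.hints.pop(replica, [])


coordinator = Coordinator()
first = coordinator.write("n2", "m1", alive=False, now_ms=0)
late = coordinator.write("n2", "m2", alive=False, now_ms=HINT_WINDOW_MS + 1)
replayed = coordinator.on_alive("n2")
```

Writes that arrive after the window are not hinted, which is why a node that was down for longer than the window needs an anti-entropy repair.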
REPAIR
Three repair methods are supported
CommitLog Replaying (by administrator)
Read Repair (realtime)
Anti-entropy Repair (by administrator)
READ REPAIR
Background work
Configured per CF
If replicas are inconsistent, the most recently written value is chosen and replaces the stale ones.
ANTI-ENTROPY REPAIR
Ensures all data on a replica is made consistent
Merkle trees are used
Tree of data blocks' hashes
Verifies inconsistencies
The repairing node requests Merkle trees (for a piece of the CF) from the replicas and compares them; inconsistent ranges are streamed from a replica (as in read repair)
[Figure: Merkle tree over a CF: blocks 1, 2, 3 … hashed at the leaves, pairs of hashes combined up to a single root hash]
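A sketch of the Merkle tree comparison above: each replica hashes its data blocks, hashes pairs of hashes up to a root, and the comparison descends only into subtrees whose hashes differ, so only the inconsistent blocks need streaming. SHA-256 and the helper names are choices made for this sketch; both trees are assumed to cover the same number of blocks (the same ranges).

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(blocks):
    """Build a Merkle tree bottom-up; returns the levels, leaves first, root last."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last hash with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_blocks(tree_a, tree_b):
    """Walk down from the root, descending only into subtrees whose hashes differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top, 0, -1):
        below = depth - 1
        suspects = [
            child
            for i in suspects
            for child in (2 * i, 2 * i + 1)
            if child < len(tree_a[below]) and tree_a[below][child] != tree_b[below][child]
        ]
    return suspects


replica_a = [b"block1", b"block2", b"block3"]
replica_b = [b"block1", b"BLOCK2", b"block3"]
diff = differing_blocks(merkle_levels(replica_a), merkle_levels(replica_b))
```

Matching roots mean the replicas agree and nothing is streamed; here only block index 1 is found to differ.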
CONSISTENCY
BASIC
Full ACID compliance in a distributed system is a bad idea.
(network, … )
Single-row updates are atomic (including internal indexes);
everything else is not
Relaxing consistency does not equal data corruption
Tunable consistency
Speed vs precision
Each read and write operation decides (from the client) how consistent the requested
data should be
CONDITION
Consistency is ensured if
(W + R) > N
W is the number of replicas written (successfully)
R is the number of replicas read
N is the replication factor
CONDITION (CONT)
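N is 3
Operations (in order, on the same column)
1. Write 3
2. Write 5
3. Write 1 (the latest value)
Worst case (values held by the 3 replicas)
W is 1: 3 | 5 | 1
W is 2: 1 | 5 | 1 or 3 | 1 | 1
W is 3: 1 | 1 | 1
Reads
R is 1: may return 3, 5 or 1 (stale reads are possible when W is 1 or 2)
R is 2 with W is 2, or R is 1 with W is 3: (W + R) > N, so at least one replica read holds 1
(W + R) > N ensures that at least one latest value is read, and the newest timestamp wins
When (W + R) <= N, reads can be stale until the replicas converge (eventual consistency)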
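The (W + R) > N condition is a pigeonhole argument: any W replicas written and any R replicas read must share at least one replica exactly when W + R > N. This brute-force check over all write and read sets for N = 3 confirms it; `always_overlaps` is a hypothetical helper.

```python
from itertools import combinations


def always_overlaps(n: int, w: int, r: int) -> bool:
    """True if every set of w written replicas intersects every set of r read replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


table = {(w, r): always_overlaps(3, w, r) for w in (1, 2, 3) for r in (1, 2, 3)}
```

For example, W = 2 and R = 2 always overlap, while W = 2 and R = 1 can miss the latest write, matching the worst cases listed above.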
READ CONSISTENCY LEVELS
One
Two
Three
Quorum
Local Quorum
Each Quorum
All
Specify how many replicas must respond
before a result is returned to the client
Quorum: (replication factor / 2) + 1
Local Quorum / Each Quorum are used in multi-
DC deployments
(replication factor / 2) is rounded down to a whole number
(If satisfied, the result is returned right away)
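The quorum arithmetic above, as a small sketch: divide the replication factor by 2, round down, add one. The multi-DC helpers assume that LOCAL_QUORUM counts only the coordinator's DC, EACH_QUORUM needs a quorum in every DC, and plain QUORUM counts replicas across all DCs together; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """(RF / 2) rounded down, plus one."""
    return replication_factor // 2 + 1


def local_quorum(rf_per_dc: dict, local_dc: str) -> int:
    """Quorum within the coordinator's own data center only."""
    return quorum(rf_per_dc[local_dc])


def each_quorum(rf_per_dc: dict) -> dict:
    """A quorum is required in every data center."""
    return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}


def global_quorum(rf_per_dc: dict) -> int:
    """Plain QUORUM counts replicas across all data centers together."""
    return quorum(sum(rf_per_dc.values()))


rf_per_dc = {"DC1": 3, "DC2": 2}
```

With RF 3, a quorum is 2, so one replica can be down while QUORUM reads and writes still succeed; RF 4 needs 3 and also tolerates only one failure.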
WRITE CONSISTENCY LEVELS
ANY
One
Two
Three
Quorum
Local Quorum
Each Quorum
All
Specify how many replicas must succeed
before an acknowledgement is returned to the client
Quorum: (replication factor / 2) + 1
Local Quorum / Each Quorum are used in multi-
DC deployments
The ANY level is satisfied even if only a hint was stored (hinted handoff)
(replication factor / 2) is rounded down to a whole number
(If satisfied, the acknowledgement is returned right away)
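A sketch of how a coordinator could decide whether a write meets its consistency level, including the ANY case above where a stored hint alone is enough. `write_acknowledged` is hypothetical and covers only the single-DC levels.

```python
def write_acknowledged(level: str, acks: int, hints_stored: int, rf: int) -> bool:
    """acks: replicas that acknowledged; hints_stored: hints kept for down replicas."""
    if level == "ANY":
        # A stored hint counts, so ANY succeeds even if every replica is down.
        return acks + hints_stored >= 1
    required = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    # For every other level, hints do not count toward the required replicas.
    return acks >= required[level]
```

ANY gives the highest write availability but the weakest guarantee: the data may exist only as a hint until the target replica comes back.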
PERFORMANCE
CACHE
Key/row caches can be saved to files
Key Cache
For frequently accessed keys
Holds the location of keys (pointing to their columns in the SSTable)
In memory, on the JVM heap
Row Cache
Optional
Holds all columns of the row
In memory, off-heap (since v1.1) or on the JVM heap
If you have huge rows, this can cause an OOME (OutOfMemoryError)
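A sketch of the key cache idea above: a bounded map from (SSTable, row key) to the row's position in that SSTable's data file, so a hit skips the index lookup. Least-recently-used eviction and a capacity counted in entries (Cassandra sizes the cache in megabytes) are assumptions of this sketch; `KeyCache` is hypothetical.

```python
from collections import OrderedDict


class KeyCache:
    """LRU map (sstable, row key) -> byte offset in the SSTable data file."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # entries here; real caches are sized in MB
        self.entries = OrderedDict()

    def get(self, sstable, key):
        location = (sstable, key)
        if location not in self.entries:
            return None  # miss: fall back to the index summary and index
        self.entries.move_to_end(location)  # mark as most recently used
        return self.entries[location]

    def put(self, sstable, key, offset):
        self.entries[(sstable, key)] = offset
        self.entries.move_to_end((sstable, key))
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


cache = KeyCache(capacity=2)
cache.put("sst1", "a", 0)
cache.put("sst1", "b", 100)
cache.get("sst1", "a")       # touch "a" so "b" becomes the oldest
cache.put("sst1", "c", 200)  # evicts "b"
```

Each entry is tiny compared with a cached row, which is why the key cache is cheap to keep on the heap while the row cache can exhaust memory.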
CACHE (CONT)
Memory-mapped (mmap) disk access
On a 64-bit JVM, used for data and index files (default)
Provides a virtual memory-mapped space for SSTables
On the C-heap (native heap)
GC makes this act as a cache
Frequently accessed data lives for a long period; otherwise GC purges it
If the data exists in memory, it is returned from there (= cache)
(Problem) The C-heap is GC'ed only when it is full
(Problem) Open SSTables are mapped, so Cassandra may map up to the entire size of the open SSTables; if that cannot be allocated, a native OOME occurs
For efficient key/row/mmap caching, add enough nodes to the cluster
BLOOM FILTERS
Each SSTable has one
Used to check whether a requested row key exists in the SSTable before
doing any (disk) seeks
Per row key, several hashes are generated and the corresponding buckets
are marked
To check a key, test each bucket for the key's hashes; if any is empty, the key
does not exist
False positives are possible, but false negatives are not
[Figure: keys hashed by Hash A, Hash B and Hash C into buckets set to 1; a key whose hashes land on already-set buckets looks present (false positive)]
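A minimal Bloom filter matching the description above: each key sets several bit positions, and a lookup reports "maybe present" only if all of them are set. The hash choice (SHA-256 split into two values combined by double hashing), the sizes and the class name are assumptions of this sketch; Cassandra's own filter uses a different hash function.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: bytes):
        # Double hashing: derive k bucket positions from two 64-bit values.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key: bytes) -> bool:
        # Any empty bucket means "definitely absent"; all set means "maybe present".
        return all(self.bits[position] for position in self._positions(key))


bloom = BloomFilter(num_bits=1024, num_hashes=3)
row_keys = [f"row{i}".encode() for i in range(50)]
for row_key in row_keys:
    bloom.add(row_key)
```

Every added key always tests positive (no false negatives), while an absent key only tests positive if all of its buckets happen to have been set by other keys.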
INDEX
Primary Index
Per CF
The index of the CF's row keys
Efficient access via the index summary (1 row key out of every 128 is
sampled)
In memory, on the JVM heap (moving off-heap in the next release)
[Figure: read path through the Bloom filter (BF), KeyCache, Index Summary, Primary Index and Offset Calculator to the SSTable]
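A sketch of the index summary above: keep every 128th entry of the sorted primary index in memory, binary-search the summary for the last sampled key not greater than the requested key, then scan at most 128 index entries from there. The in-memory list standing in for the on-disk index and the helper names are hypothetical.

```python
import bisect

INDEX_INTERVAL = 128  # 1 row key out of every 128 is sampled


def build_summary(primary_index):
    """primary_index: sorted list of (row_key, data_offset). Sample every 128th entry."""
    return [(primary_index[i][0], i) for i in range(0, len(primary_index), INDEX_INTERVAL)]


def find_offset(summary, primary_index, key):
    sampled_keys = [k for k, _ in summary]
    pos = bisect.bisect_right(sampled_keys, key) - 1  # last sampled key <= key
    if pos < 0:
        return None
    start = summary[pos][1]
    # Scan at most INDEX_INTERVAL index entries instead of the whole index.
    for k, offset in primary_index[start:start + INDEX_INTERVAL]:
        if k == key:
            return offset
    return None


primary_index = [(f"key{i:05d}", i * 100) for i in range(1000)]
summary = build_summary(primary_index)
```

The summary is 128 times smaller than the full index, which is what lets it stay in memory.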
INDEX (CONT)
Secondary Index
For a column's value(s)
Supports composite types
Hidden CF
Implemented as an index CF: the indexed value is the row key
and the base CF's row keys are its column names
Write/update/delete operations are atomic (together with the index)
Good for values shared by many rows
On the contrary, unique values index poorly (-> use a dynamic CF for
indexing)
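A sketch of a secondary index kept as a hidden "index CF": the indexed value acts as the index row key, and the base rows' keys are stored under it, updated together with every write. A Python set stands in for the sorted column names, and `IndexedColumnFamily` is hypothetical.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Base CF plus a hidden index CF on one column (simplified model)."""

    def __init__(self, indexed_column: str):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> base row keys

    def insert(self, row_key, columns):
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        if old is not None:
            self.index[old].discard(row_key)  # drop the stale index entry
        self.rows.setdefault(row_key, {}).update(columns)
        new = self.rows[row_key].get(self.indexed_column)
        if new is not None:
            self.index[new].add(row_key)

    def query(self, value):
        return sorted(self.index.get(value, ()))


users = IndexedColumnFamily("state")
users.insert("u1", {"state": "CA"})
users.insert("u2", {"state": "CA"})
users.insert("u3", {"state": "TX"})
users.insert("u1", {"state": "TX"})  # update moves u1 to another index row
```

With a low-cardinality value such as a state, each index row holds many base keys and one lookup returns many rows. With a unique value, every index row holds a single key, so the index costs as much as the data it points to.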
COMPACTION
Combines data from SSTables
Merges row fragments
Rebuilds primary and secondary indexes
Removes expired columns and columns marked with a tombstone
Deletes the old SSTables when complete
“Minor” compactions merge only SSTables of similar size; “major” compactions merge all SSTables in a given CF
Size-tiered compaction
Leveled compaction
Since v1.0
Based on LevelDB
Temporarily uses up to twice the disk space, with a spike in disk IO.
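A simplified sketch of how size-tiered compaction picks SSTables "of similar size": sizes are grouped into buckets whose members lie within 0.5x to 1.5x of the bucket's average, and a bucket becomes a candidate once it holds at least 4 SSTables. The parameter names follow the size-tiered compaction options, but this helper ignores details such as special handling of very small SSTables and the maximum threshold.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTable sizes into buckets of similar size; return compactable buckets."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if bucket_low * average <= size <= bucket_high * average:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return [bucket for bucket in buckets if len(bucket) >= min_threshold]


# Sizes in MB (hypothetical): four small tables, two medium, one large.
candidates = size_tiered_buckets([10, 11, 12, 9, 100, 105, 400])
```

Only the four small SSTables are merged here; the result becomes one larger SSTable that later joins a bucket of similar-sized ones. Merging a whole bucket at once is also why a compaction can temporarily need up to twice the space.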
ARCHITECTURE
Write: no race conditions, not bound by (random) disk IO
Read: slower than writes, but still fast (DHT, cache …)
Load balancing
Virtual Nodes
Replication
Multi-DC
BENCHMARK
References :
http://68.180.206.246/files/ycsb.pdf
[Figure: Workload A (update heavy): (a) read operations, (b) update operations. Throughput in this (and all figures) represents total operations per second, including reads and writes.]
[Figure: Workload B (read heavy): (a) read operations, (b) update operations]
By YCSB (Yahoo Cloud Serving Benchmark)
BENCHMARK (CONT)
References :
http://68.180.206.246/files/ycsb.pdf
[Figure: Workload E (short scans)]
By YCSB (Yahoo Cloud Serving Benchmark)
[Figure: read performance as cluster size increases]
BENCHMARK (CONT)
[Figure: Elastic speedup: time series showing the impact of adding servers online]
By YCSB (Yahoo Cloud Serving Benchmark)
References :
http://68.180.206.246/files/ycsb.pdf
BENCHMARK (CONT)
By NoSQLBenchmarking.com
References :
http://www.nosqlbenchmarking.com/2011/02/new-results-for-cassandra-0-7-2/
BENCHMARK (CONT)
By CUBRID
References :
http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
BENCHMARK (CONT)
By VLDB
References :
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
[Figures: read latency, write latency, throughput (95% read, 5% write)]
BENCHMARK (LAST)
By VLDB
References :
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
[Figures: throughput (50% read, 50% write), throughput (100% write)]
PROBLEM HANDLING
RESOURCE
Memory
Off-heap & Heap
OOME Problem
CPU
GC
Hashing
Compression / Compaction
Network Handling
Context Switching
Lazy Problem
IO
Bottleneck for everything
MEMORY
Heap (GC-managed)
Permanent (-XX:PermSize, -XX:MaxPermSize)
JVM Heap (-Xmx, -Xms, -Xmn)
C-Heap (= native heap)
OS shared
Thread stacks (-Xss)
Objects accessed through JNI
Off-Heap
OS shared
Freed by Cassandra itself (not by the JVM GC)
MEMORY (CONT)
Heap
Permanent
JVM Heap
Memtable
KeyCache
IndexSummary (moving off-heap in the next release)
Buffer
Transport
Socket
Disk
C-Heap
Thread Stack
File Memory Map (virtual space)
Data / Index buffer (default)
CommitLog (v1.2)
Off-Heap (OS shared)
RowCache
BloomFilter
Index -> CompressionMetadata -> ChunkOffset
MEMORY (CONT)
Memtable
Managed
Total size (default 1/3 of the JVM heap; when reached, the largest memtable of a CF is flushed)
Emergency: if heap usage is above a fraction of the max after a full GC (CMS) -> flush the largest memtable (each time) -> prevents full GC / OOME
KeyCache
Managed
Total size (the smaller of 100 MB or 5% of the max heap)
Emergency: if heap usage is above a fraction of the max after a full GC (CMS) -> reduce the max cache size -> prevents full GC / OOME
RowCache/CommitLog
Managed
Total size (row cache disabled by default) -> prevents OOME
MEMORY (CONT)
Thread Stack
Not managed
But -Xss is set to 180k (default)
Check the serving type of Thrift (transport level, RPC server): sync,
hsha, async (has bugs)
Set min/max threads for connections (default unlimited)
v1.2
MEMORY (CONT)
Transport buffer
Thrift
Supports many languages (cross-language)
Provides server/client interfaces and serialization
Apache project, created by Facebook
Framed buffer (default max 16M, variable size)
4k, 16k, 32k, … 16M
Determined by the client
Per connection
Adjust the max frame buffer size (client, server)
Set min/max threads for connections (default unlimited)
v1.2
[Figure: Client <-> Thrift <-> Data Service]
MEMORY (LAST)
C-Heap/Off-Heap
OS shared -> other applications can cause problems
File Memory Map (virtual space)
GC'ed only on full GC
0 <= total size <= the size of opened SSTables
If it cannot be allocated -> native OOME
But
Generally only a limited part of each SSTable is accessed
GC makes space
Worst case? (if an OOME occurs)
cassandra.yaml -> disk_access_mode: standard (restart required)
Add enough nodes
cassandra.yaml -> disk_access_mode: auto after they have joined
v1.2
CPU
GC
CMS
Marking phase: low thread priority -> but high CPU usage (it's not a problem)
CMSInitiatingOccupancyFraction is 75 (default)
-XX:+UseCMSInitiatingOccupancyOnly
Full GC
Frequency is important -> frequent full GCs may indicate a problem (e.g. Thrift transport buffers)
Add nodes, or analyze memory usage and adjust the configuration
Minor GC
It's OK
Compaction
Slow is okay
So lower its priority with “-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Dcassandra.compaction.priority=1”
Sustained high CPU load? -> time to add nodes
SWAPPING
Swapping causes big problems for real-time applications
IO block -> thread block -> gossip/compaction/flush … delayed -> causes other problems
Disable or minimize swapping
Disable the swap partition
Or enable JNA + kernel configuration
JNA: mlockall (keeps heap memory in physical memory)
Kernel
vm.swappiness=0 (but under memory pressure, swapping is still possible)
vm.overcommit_memory=1
Or vm.overcommit_memory=2 (overcommit managed)
vm.overcommit_ratio=? (a percentage, e.g. 75)
Max memory (CommitLimit) = swap partition size + (ratio / 100) × physical memory size
E.g.: 8G = 2G + 0.75 × 8G
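The CommitLimit arithmetic above as a one-line helper, to try other machine sizes. It assumes `vm.overcommit_memory=2` and ignores huge pages, which the kernel also subtracts; `commit_limit_gb` is a hypothetical name.

```python
def commit_limit_gb(swap_gb: float, ram_gb: float, overcommit_ratio_percent: int) -> float:
    """CommitLimit under vm.overcommit_memory=2: swap + ratio% of physical RAM."""
    return swap_gb + ram_gb * overcommit_ratio_percent / 100


example = commit_limit_gb(2, 8, 75)  # the slide's example: 2G swap, 8G RAM, ratio 75
```

With `vm.overcommit_ratio=75`, a machine with 2G of swap and 8G of RAM can commit 2 + 6 = 8G in total.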
MONITORING
System Monitoring
CPU / Memory / Disk
Nagios, Ganglia, Cacti, Zabbix
Network Monitoring
Per Client
NfSen (network flow monitoring, see: http://nfsen.sourceforge.net/#mozTocId376385)
Cluster Monitoring / Maintaining
OpsCenter
CHECK THREAD
“top” command
Press “H” to show individual threads
Press “P” to sort by CPU usage
Pick the PID of the busiest thread
Convert the PID to hex (http://www.binaryhexconverter.com/decimal-to-hex-converter)
“jstack <Parent PID> > filename.log” saves the Java stacks to a file
Search the file for the hex PID (shown as nid=0x…)
313C
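The decimal-to-hex step above can be done without a web converter: `top -H` shows the thread ID in decimal, and jstack prints it as `nid=0x` followed by lowercase hex. The jstack excerpt below is a hypothetical sample (thread names and addresses are made up), and `find_thread` is a hypothetical helper.

```python
import re


def to_nid(pid: int) -> str:
    """Thread ID from top (decimal) -> the nid form printed by jstack."""
    return f"nid=0x{pid:x}"


def find_thread(jstack_text: str, pid: int) -> list:
    """Return the jstack thread header lines whose nid matches the thread ID."""
    pattern = re.compile(rf"\b{to_nid(pid)}\b")
    return [line for line in jstack_text.splitlines() if pattern.search(line)]


# Hypothetical jstack excerpt, for illustration only.
sample = (
    '"CompactionExecutor:1" daemon prio=1 tid=0x00007f3c2c1b1000 nid=0x313c runnable\n'
    '"main" prio=10 tid=0x00007f3c2c00a000 nid=0x1a2b waiting on condition\n'
)
matches = find_thread(sample, 12604)  # 12604 in decimal is 313C in hex
```

The matching header line names the busy thread (here a compaction thread), and the stack lines that follow it in the real file show what it is doing.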
CHECK HEAP
Use a heap dump file from “jmap” or from an OOME
Use “jhat” or another tool to analyze it
Check [B (byte arrays)
and the objects that reference them
For development and maintenance topics:
Sorry..
I had just two days to write this presentation.
Next time I will write about them and talk to U.
See U next time
Questions, or talk about anything Cassandra
Thank you
If you have any problems or questions, please contact me by email.