how to stop worrying and start caching in java

How to Stop Worrying and Start Caching in Java

SriSatish Ambati , Azul Systems [email protected] Surtani, RedHat [email protected]

The TrailExamples

Elements of Cache Performance

Theory

Metrics

200GB Cache Design

Whodunit

Overheads in Java – Objects, GC

Locks,

Communication, Sockets, Queues (SEDA)

Serialization

Measure

Wie die wahl hat, hat die Qual!He who has the choice has the agony!

Some example caches

Homegrown caches – Surprisingly work well.

(do it yourself! It’s a giant hash)

Infinispan, Coherence, Gemstone, GigaSpaces, EhCache, etc

NoSQL stores (Apache Cassandra)

Non java alternatives: MemCached & clones.

Visualize Cache

Simple example

Visualize Cache

Replicated Cache Distributed Cache

Elements of Cache Performance

Hot or Not: The 80/20 rule.

A small set of objects are very popular!

Hit or Miss: Hit Ratio

How effective is your cache?

LRU, LFU, FIFO, LIRS.. Expiration

Long-lived objects, better locality

Spikes happen

Cascading events: Node load, Node(s) dead.

Cache Thrash: Full Table scan.

Elements of Cache Performance : Metrics

Inserts: Puts/sec, Latencies

Reads: Gets/sec, Latencies, Indexing

Updates: mods/sec, latencies (Locate, Modify & Notify)

Replication, Consistency, Persistence

Size of Objects

Number of Objects

Size of Cache

# of cacheserver Nodes (read only, read write)

# of clients

Partitioning & Distributed Caches

Near Cache/L1 Cache

Bring data close to the Logic that is using it.

Birds of feather flock together - related data live closer

Read-only nodes, Read-Write nodes

Management nodes

Communication Costs

Balancing (buckets)

Serialization (more later)

I/O considerations

Asynchronous

Sockets

Queues & Threads serving the sockets

Bandwidth

Persistence –

File, DB (CacheLoaders)

Write Behind

Data Access Patterns of Doom, ex:

“Death by a million cuts” – Batch your reads.

Buckets–Partitions, Hashing function

Birthdays, Hashmaps & Prime Numbers

Collisions, Chaining

Unbalanced HashMap

- behaves like a list O(n) retrieval

Partition-aware Hashmaps

Non-blocking Hashmaps

(see: locking)

Performance Degrades with 80% table density

Imagine – John Lennon

How many nodes to get a 200G cache?

Who needs a 200G cache?

Disk is the new Tape!

200 nodes @ 1GB heap each

2 nodes @ 100GB heap each

(plus overhead)

SIDE ONE

Join together with the band

I don’t even know myself

SIDE TWO

Let’s see action

Relay

Don’t happen that way at all

The seeker

Java Limits: Objects are not cheap!

How many bytes for a 8 char String ?

(assume 32-bit)

How many objects in a Tomcat idle instance?

char[]

String

book keeping fields 12 bytes

JVM Overhead 16 bytes

Pointer 4 bytes

data 16 bytes


A. 64bytes

31% overhead

Size of String

Varies with JVM

Picking the right collection: Mozart or Bach?

100 elements of:

Treemap <Double, Double>

82% overhead, 88 bytes constant cost per element

[pro: enables updates while maintaining order]

double[], double[] –

2% overhead, amortized

[con: load-then-use]

Sparse collections, empty collections. wrong collections.

TreeMap

Fixed Overhead: 48 bytes TreeMap$Entry

data

Per-entry Overhead: 40 bytes

Double double

*From one 32-bit JVM. Varies with JVM Architecture

Double


data 8 bytes

double

JEE is not cheap either!

Class name Size (B) Count Avg (B)Total 21,580,592 228,805 94.3char[] 4,215,784 48,574 86.8byte[] 3,683,984 5,024 733.3Built-in VM methodKlass 2,493,064 16,355 152.4Built-in VM constMethodKlass 1,955,696 16,355 119.6Built-in VM constantPoolKlass 1,437,240 1,284 1,119.30Built-in VM instanceKlass 1,078,664 1,284 840.1java.lang.Class[] 922,808 45,354 20.3Built-in VM constantPoolCacheKlass 903,360 1,132 798java.lang.String 753,936 31,414 24java.lang.Object[] 702,264 8,118 86.5java.lang.reflect.Method 310,752 2,158 144short[] 261,112 3,507 74.5java.lang.Class 255,904 1,454 176int[][] 184,680 2,032 90.9java.lang.String[] 173,176 1,746 99.2java.util.zip.ZipEntry 172,080 2,390 72

Apache Tomcat 6.0Allocated

Class name Size (B) Count Avg (B)Total 1,410,764,512 19,830,135 71.1char[] 423,372,528 4,770,424 88.7byte[] 347,332,152 1,971,692 176.2int[] 85,509,280 1,380,642 61.9java.lang.String 73,623,024 3,067,626 24java.lang.Object[] 64,788,840 565,693 114.5java.util.regex.Matcher 51,448,320 643,104 80java.lang.reflect.Method 43,374,528 301,212 144java.util.HashMap$Entry[] 27,876,848 140,898 197.9java.util.TreeMap$Entry 22,116,136 394,931 56java.util.HashMap$Entry 19,806,440 495,161 40java.nio.HeapByteBuffer 17,582,928 366,311 48java.nio.HeapCharBuffer 17,575,296 366,152 48java.lang.StringBuilder 15,322,128 638,422 24java.util.TreeMap$EntryIterator 15,056,784 313,683 48java.util.ArrayList 11,577,480 289,437 40java.util.HashMap 7,829,056 122,329 64java.util.TreeMap 7,754,688 107,704 72

Million Objects allocated live

JBoss 5.1 20 4Apache Tomcat 6.0 0.25 0.1

Live

JBoss 5.1Allocated

Java Limits: Garbage Collection

GC defines cache configuration

Pause Times: If stop_the_world_pause > time_to_live node is declared dead

Allocation Rate: Write, Insertion Speed.

Live Objects (residency)

if residency > 50%. GC overheads dominate.

Increasing Heap Size only increases pause times.

64-bit is not going to rescue us either:

Increases object header, alignment & pointer overhead

40-50% increase in heap sizes for same workloads.

Overheads – cycles spent GC (vs. real work); space

Fragmentation, Generations

Fragmentation – compact often, uniform sized objects

[Finding seats for a gang-of-four

is easier in an empty theater!]

Face your fears, Face them Often!

Generational Hypothesis

Long-lived objects promote often,

inter-generational pointers, more old-gen collections.

Entropy: How many flags does it take to tune your GC ? Avoid OOM, configure node death if OOM Shameless plug: Azul’s Pauseless GC (now software edition) ,

Cooperative-Memory (swap space for your jvm under spike: No more OOM!)

Locks: Why Amdahl’s law trumps Moore’s!

Schemes

Optimistic, Pessimistic

Consistency

Eventually vs. ACID

Contention, Waits

java.util.concurrent, critical sections: Use Lock Striping

MVCC, Lock-free, wait-free DataStructures. (NBHM)

Transactions are expensive Reduce JTA abuse, Set the right isolation levels.

Inter-node communication

• TCP for mgmt & data– Infinispan

• TCP for mgmt, UDP for data– Coherence, Infinispan

• UDP for mgmt, TCP for data– Cassandra, Infinispan

• Instrumentation– EHCache/Terracotta

• Bandwidth & Latency considerations Ensure proper network configuration in the kernel Run Datagram tests Limit number of management nodes & nodes

Sockets, Queues, Threads, Stages

How many sockets?

gemstone (VMWare) : Multi-socket implementation, infinispan

alt: increase ports, nodes, clients

How many threads per socket? Mux

Asynchronous IO/Events (apache mina, jgroups)

Curious Case of a single threaded Queue Manager Reduce context switching SEDA

Marshal Arts: Serialization/Deserialization

java.io.Serializable is S.L..O.…W

+ Use “transient”

+ jserial, avro, etc

+ Google Protocol Buffers,

PortableObjectFormat (Coherence)

+ JBossMarshalling

+ Externalizable + byte[]

+ Roll your own

Serialization + Deserialization uBench

http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2

Count what is countable, measure what is measurable, and what is not measurable, make measurable -Galileo

Latency: Where have all the millis gone?

Measure. 90th percentile. Look for consistency.

=> JMX is great! JMX is also very slow.

Reduced number of nodes means less MBeans!

Monitor (network, memory, cpu), ganglia,

Know thyself: Application Footprint, Trend data.

Q&A

References:

Making Sense of Large Heaps, Nick Mitchell, IBM

Oracle Coherence 3.5, Aleksandar Seovic

Large Pages in Java http://andrigoss.blogspot.com/2008/02/jvm-performance-tuning.html

Patterns of Doom http://3.latest.googtst23.appspot.com/

Infinispan Demos http://community.jboss.org/wiki/5minutetutorialonInfinispan

RTView, Tom Lubinski, http://www.sl.com/pdfs/SL-BACSIG-100429-final.pdf

Google Protocol Buffers, http://code.google.com/p/protobuf/

Azul’s Pauseless GC http://www.azulsystems.com/technology/zing-virtual-machine

Cliff Click’s Non-Blocking Hash Map http://sourceforge.net/projects/high-scale-lib/

JVM Serialization Benchmarks:


http://andrigoss.blogspot.com/2008/02/jvm-performance-tuning.html

http://3.latest.googtst23.appspot.com/

http://community.jboss.org/wiki/5minutetutorialonInfinispan

http://www.sl.com/pdfs/SL-BACSIG-100429-final.pdf

http://code.google.com/p/protobuf/

http://www.azulsystems.com/technology/zing-virtual-machine

http://sourceforge.net/projects/high-scale-lib/




Optimization hinders evolution – Alan Perlis

how to stop worrying and start caching in java

Technology

java objects

java limits

java alternatives

bytes data

bytes jvm

bytes treemap

g cache

java srisatish ambati