azul yandexjune010

Gil Tene CTO & co-founder, Azul Systems June 26, 2010 Azul Tech Talk Yandex, June 2010


©2010 Azul Systems, Inc. Azul Systems Confidential and Proprietary

Agenda

• Background

• Memory stagnation

• Java & Virtualization

• [Concurrent] Garbage Collection deep-dive

• Some [more] Azul Zing Platform details

• Talk-Talk / Q & A


Memory Stagnation


2GB: the new 640K


Memory.

How many of you use heap sizes:

Larger than ½ GB?

Larger than 1 GB?

Larger than 2 GB?

Larger than 4 GB?

Larger than 10 GB?

Larger than 20 GB?

Larger than 100 GB?


Problem statement: Memory footprint has stagnated

• Individual instances with ~1-2GB of heap were commonplace in 2001

─ Some were still smaller

─ Very few were larger

• Individual instances with ~1-2GB dominate in 2010

─ Very few are smaller

─ Relatively few are larger

• Could it really be that all applications have the same memory size needs?

• The practical size of an individual Java heap has not moved in ~9 years.


Why ~2GB? – It’s all about GC (and only GC)

• Seems to be the practical limit for responsive applications

• A 100GB heap won’t crash. It just periodically “pauses” for many minutes at a time.

• [Virtually] All current commercial JVMs will exhibit a periodic multi-second pause on a normally utilized 2GB heap.

─ It’s a question of “When”, not “If”.

─ GC Tuning only moves the “when” and the “how often” around

• “Compaction is done with the application paused. However, it is a necessary evil, because without it, the heap will be useless…” (JRockit RT tuning guide).


Maybe 2GB is simply enough?

• We hope not

• Plenty of evidence of a pent-up need for more heap

• Common use of lateral scale across machines

• Common use of “lateral scale” within machines

• Use of “external” memory with growing data sets

─ Databases certainly keep growing

─ External data caches (memcache, JCache, JavaSpaces)

• Continuous work on the never-ending distribution problem

─ More and more reinvention of NUMA

─ Bring data to compute, bring compute to data


Will distributed solutions “solve” the problem?

• Distributed data solutions are [only] used to solve problems that can’t be solved without distribution

─ Extra complexity & loss of efficiency only justified by necessity

• It’s always because something doesn’t fit in one simple symmetric UMA process model

─ When we need more compute power than one node has

─ When we need more memory state than one node can hold

• Distributed solutions are not used to solve tiny problems

• “Tiny” is not a matter of opinion; it’s a matter of time

─ “Tiny” gets ~100x bigger every decade


“Tiny” application history (chart)

• 1980: 100KB apps on a ¼ to ½ MB server

• 1990: 10MB apps on a 32-64 MB server

• 2000: 1GB apps on a 2-4 GB server

• 2010: ??? GB apps on a 256 GB server (chart callout: 100GB, Zing VM)

• Moore’s Law: transistor counts grow 2x every 18 months, ~100x every 10 years
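The ~100x-per-decade rule on this chart follows directly from the 18-month doubling period it quotes:

```latex
\frac{10 \times 12\ \text{months}}{18\ \text{months}} = \frac{20}{3} \approx 6.67\ \text{doublings},
\qquad 2^{20/3} \approx 101.6 \approx 100\times
```

The chart's own data points sit on the same curve: 100KB to 10MB to 1GB is a factor of 100 per decade.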


2GB is the new 640K

This is getting embarrassing….

• We do strange things within servers now…

• Java runtimes “misbehave” above ~2GB of memory

─ Most people won’t tolerate 20-second pauses

• It takes 50 JVM instances to fill up a ~$10K, ~100GB server

─ A 256GB server can now be bought for ~$18K (mid 2010)

• Using distributed SW solutions within a single commodity server

─ Similar to the EMS tricks Windows 3.1 used to deal with the 640KB cap

─ Looks a lot like Yak shaving

• The problem is in the software stack

─ Artificial constraints on memory per instance

─ GC pause time is the only limiting factor for instance size

─ Can’t just “tune it away”

• Solve GC, and you’ve solved the problem


The Zing Platform: Virtualization++ for Java


Java & Virtualization

How many of you use virtualization? (e.g. KVM, VMware, Xen, desktop virtualization such as Fusion, Parallels, VirtualBox, etc.)

How many of you use it for production applications?

How many of you think that virtualization will make your application run faster?


The Virtualization Tax

Virtualization is universally considered a “tax”

Typical focus is on measuring and reducing overhead

Everyone hopes to get to “virtually the same as non-virtualized” performance characteristics

But we can do so much better….

What if virtualization made applications better?


Virtualization with Application benefits

If you want to:

• Improve response times: Use Zing & virtualization

• Increase transaction rates: Use Zing & virtualization

• Increase concurrent users: Use Zing & virtualization

• Forget about GC pauses: Use Zing & virtualization

• Eliminate daily restarts: Use Zing & virtualization

• Elastically grow during peaks: Use Zing & virtualization

• Elastically shrink when idle: Use Zing & virtualization

• Gain production visibility: Use Zing & virtualization


Java Virtualization: New foundation for elastic, scalable Java deployments

• Virtualize the Java runtime: Liberate Java from the OS, optimizing runtime execution

• Allow highly effective use of available resources: 100x better scale, throughput and responsiveness, improving user experience

• Elastically scale applications: Smoothly scale resources up/down based on real-time demand, improving scalability, efficiency and resiliency

• Simplify deployment configuration: Reduce instance count, improve management and visibility

©2009 Azul Systems, Inc. Azul Company Confidential

Java Virtualization

(Diagram: stack of Java App / OS Layer / x86 Server)

Azul Java Virtualization: Liberating Java from the rigidities of the OS


Zing™ Platform Components

• Zing Virtual Appliances: Elastic, Scalable Capacity

• Zing Virtual Machine: Transparent Virtualization

• Zing Resource Controller: Management and Monitoring

• Zing Vision: Built-in App Profiling


Zing Elastic Deployments: Virtualizing Java workloads in the Cloud

(Diagram: apps running in Zing Virtual Appliances (ZVAs) alongside Linux and Windows guests on several hypervisors, with a Zing Resource Controller Plug-In)

Resources are utilized by the Zing virtual appliances, not the OSs


Virtues of the Zing Elastic Platform: Making virtualization the best environment for Java

Optimized Runtime Platform

• More effective use of resources (dozens of cores, 100s of GBs)

• Scales smoothly over a wide range (from 1 GB to 1 TB)

• Greater stability, resiliency and operating range

Record-breaking Scalability

• Completely eliminates GC-related barriers

• Practical support for 100x larger heaps (e.g. 200-500+ GBs)

• Sustain 100x higher throughput and allocation rates

Simplified Java App Deployments

• Better app stability with fewer, more robust JVMs

• Zero-overhead runtime visibility

• Application-aware resource control


Building on a Heritage of Elastic Runtimes: Proven, vertically integrated execution stack

• Up to 864 cores

• Heaps up to 640 GB

(Diagram: Azul Vega™ 3 Compute Appliance; layers: Hardware, Java Runtime, OS Kernel, Virtualization)


Benefits of the Zing Elastic Platform

Business Implications

• Consistently fast Java application response times

• Improved customer experience and loyalty

• Faster time to market

• Greater application availability, even during peaks

• Room for growth, delivered in a robust and cost-effective manner

• Lower costs through simplified deployments, virtualization and cloud enablement

IT Implications

• 100x improvements in key response time and throughput metrics

• Accelerate virtualization and cloud adoption

• Increased resiliency and efficiency for all Java applications through dynamic resource sharing

• Simplified deployments through instance consolidation

• Unmatched production-time visibility and management

• Fast ROI


Now you can:

Improve response times: with Zing & virtualization!

Increase Transaction rates: with Zing & virtualization!

Increase Concurrent users: with Zing & virtualization!

Forget about GC pauses: with Zing & virtualization!

Eliminate daily restarts: with Zing & virtualization!

Elastically grow during peaks: with Zing & virtualization!

Elastically shrink when idle: with Zing & virtualization!

Gain production visibility: with Zing & virtualization!

• Smoothly scale resources up/down based on real-time demand, improving scalability, efficiency and resiliency

• Allow highly effective use of available resources: 100x better scale, throughput and responsiveness, improving user experience

• Simplify deployment configuration: Reduce instance count, improve management and visibility


Thank You


Gil Tene, CTO, Azul Systems

Performance Considerations in Concurrent Garbage Collected Systems


1. Understand why concurrent garbage collection is a necessity

2. Gain an understanding of performance considerations specific to concurrent garbage collection

3. Understand what concurrent collectors are sensitive to, and what can cause them to fail

4. Learn how [not] to measure GC in a lab


What is a concurrent collector?

A Concurrent Collector performs garbage collection work concurrently with the application’s own execution

A Parallel Collector uses multiple CPUs to perform garbage collection


About the speaker: Gil Tene (CTO), Azul Systems

• We deal with concurrent GC on a daily basis

• Azul makes Java scalable through virtualization

─ We make physical (Vega™) and virtual (Zing™) appliances

─ Our appliances power JVMs on Linux, Solaris, AIX, HPUX, …

─ Production installations ranging from 1GB to 300GB+ of heap

─ Zing VM instances smoothly scale to 100s of GB, 10s of cores

• Concurrent GC has always been a must in our space

─ It’s now a must in everyone’s space: can’t scale without it

• Focused on concurrent GC for the past 8 years

─ Azul’s GPGC designed for robustness, low sensitivity


Why use a concurrent collector? Why not stop-the-world?

• Because pause times break your SLAs

• Because you need to grow your heap size

• Because your application needs to scale

• Because you can’t predict everything exactly

• Because you live in the real world…


Agenda

• Background

• Failure & Sensitivity

• Terminology & Metrics

• Detail and inter-relations of key metrics

• Collector mechanism examples

• Recommendations for measurements

• Q & A


Why we really need concurrent collectors: Software is unable to fill up hardware effectively

• 2000:

─ A 512MB-1GB heap was “large”

─ A 1-2GB commodity server was “large”

─ A 2 core commodity server was “large”

• 2010:

─ A 2GB heap is “large”

─ A 128-256GB commodity server is “medium”

─ A 24-48 core commodity server is “medium”

• The gap started opening in the late 1990s

• The root cause is Garbage Collection Pauses


What constitutes “failure” for a collector? It’s not just about correctness any more

• A Stop-The-World collector fails if it gets it wrong…

• A concurrent collector [also] fails if it stops the application for longer than requirements permit

─ “Occasional pauses” longer than SLA allows are real failures

─ Even if the Application Instance or JVM didn’t crash

─ Otherwise, you would have used a STW collector to begin with

• Simple example: Clustering

─ Node failover must occur in X seconds or less

─ A GC pause longer than X will trigger failover. It’s a fault.

─ ( If you don’t think so, ask the guy whose pager just went off… )
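To see why a long pause is a fault even when nothing crashed, consider how a peer detects failure. This hypothetical heartbeat monitor (the class name, the explicit clock parameters, and the 5-second threshold in the usage below are assumptions, not any particular clustering product's logic) cannot distinguish a dead node from a node whose JVM is stopped in GC: both simply stop sending heartbeats.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical heartbeat-based failure detector (illustrative only). */
final class HeartbeatMonitor {
    private final long failoverThresholdNanos; // the "X seconds" from the slide
    private long lastHeartbeatNanos;

    HeartbeatMonitor(long threshold, TimeUnit unit, long nowNanos) {
        this.failoverThresholdNanos = unit.toNanos(threshold);
        this.lastHeartbeatNanos = nowNanos;
    }

    /** Called whenever a heartbeat arrives from the peer node. */
    void onHeartbeat(long nowNanos) {
        lastHeartbeatNanos = nowNanos;
    }

    /**
     * True if the peer should be failed over. A crashed peer and a peer whose
     * JVM is stopped in a long GC pause look identical here: neither sends
     * heartbeats, so both are declared failed once the gap exceeds X.
     */
    boolean shouldFailOver(long nowNanos) {
        return nowNanos - lastHeartbeatNanos > failoverThresholdNanos;
    }
}
```

With a 5-second threshold, a peer that sent its last heartbeat at t = 1 s and then paused for 8 seconds is failed over at t = 9 s, even though its JVM never crashed.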


Concurrent collectors can be sensitive: Go out of the smooth operating range, and you’ll pause

• Correctness now includes response time

• Just because it didn’t pause under load X, doesn’t mean it won’t pause under load Y

• Outside of the smooth operating range:

─ More state (with no additional load) can cause a pause

─ More load (with no additional state) can cause a pause

─ Different use patterns can cause a pause

• Understand/Characterize your smooth operating range


Terminology: Useful terms for discussing concurrent collection

• Mutator: Your program…

• Parallel: Can use multiple CPUs

• Concurrent: Runs concurrently with program

• Pause time: Time during which mutator is not running any code

• Generational: Collects young objects and long-lived objects separately

• Promotion: Allocation into old generation

• Marking: Finding all live objects

• Sweeping: Locating the dead objects

• Compaction

─ Defragments heap

─ Moves objects in memory

─ Remaps all affected references

─ Frees contiguous memory regions
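The marking and sweeping definitions can be made concrete with a toy, stop-the-world version over an explicit object graph. Everything here (`ToyMarkSweep`, its nested `Obj`) is a hypothetical model, not how a production JVM lays out its heap: marking walks the graph from the roots, and sweeping reports everything that was never reached. Compaction is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy stop-the-world mark and sweep over an explicit object graph (illustrative only). */
final class ToyMarkSweep {
    /** A toy heap object: its outgoing references plus a mark bit. */
    static final class Obj {
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
    }

    /** Marking: find all objects reachable from the roots; returns the live count. */
    static int mark(List<Obj> roots) {
        Deque<Obj> worklist = new ArrayDeque<>(roots);
        int live = 0;
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            if (o.marked) continue; // already reached through another path
            o.marked = true;
            live++;
            for (Obj r : o.refs) {
                if (!r.marked) worklist.push(r);
            }
        }
        return live;
    }

    /** Sweeping: every object in the heap that was not marked is dead. */
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> dead = new ArrayList<>();
        for (Obj o : heap) {
            if (!o.marked) dead.add(o);
        }
        return dead;
    }
}
```

With roots {a} and references a→b→c plus d→a, marking finds three live objects and sweeping reports d as dead: d points at live data, but nothing live points at d.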


Metrics: Useful metrics for discussing concurrent collection

• Heap population (aka Live set): How much of your heap is alive

• Allocation rate: How fast you allocate

• Mutation rate: How fast your program updates references in memory

• Heap Shape: The shape of the live object graph (hard to quantify as a metric...)

• Object Lifetime: How long objects live

• Cycle time: How long it takes the collector to free up memory

• Marking time: How long it takes the collector to find all live objects

• Sweep time: How long it takes to locate dead objects (relevant for Mark-Sweep)

• Compaction time: How long it takes to free up memory by relocating objects (relevant for Mark-Compact)


Cycle Time: How long until we can have some more free memory?

• Heap Population (Live Set) matters

─ The more objects there are to paint, the longer it takes

• Heap Shape matters

─ Affects how well a parallel marker will do

─ One long linked list is the worst case for most markers

• How many passes matters

─ A multi-pass marker revisits references modified in each pass

─ Marking time can therefore vary significantly with load
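Why a single long linked list hurts a parallel marker: threads can split up the work, but an object's referents are only discovered after the object itself is visited. A rough proxy for the parallelism a graph allows is the number of breadth-first "waves" needed to reach every object. This sketch (class and method names are hypothetical, and real markers use work-stealing rather than strict waves) counts those waves; a list of N objects needs N sequential waves, while a balanced binary tree of the same size needs only about log2 N.

```java
import java.util.ArrayList;
import java.util.List;

final class MarkWaves {
    /**
     * Counts breadth-first "waves" needed to reach every object from the root.
     * Objects in one wave could be marked in parallel; successive waves are
     * inherently sequential. graph.get(i) lists the objects that i references.
     */
    static int waves(List<List<Integer>> graph, int root) {
        boolean[] seen = new boolean[graph.size()];
        List<Integer> frontier = new ArrayList<>();
        frontier.add(root);
        seen[root] = true;
        int waves = 0;
        while (!frontier.isEmpty()) {
            waves++;
            List<Integer> next = new ArrayList<>();
            for (int obj : frontier) {
                for (int ref : graph.get(obj)) {
                    if (!seen[ref]) {
                        seen[ref] = true;
                        next.add(ref);
                    }
                }
            }
            frontier = next;
        }
        return waves;
    }

    /** A singly linked list 0 -> 1 -> ... -> n-1. */
    static List<List<Integer>> linkedList(int n) {
        List<List<Integer>> g = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> refs = new ArrayList<>();
            if (i + 1 < n) refs.add(i + 1);
            g.add(refs);
        }
        return g;
    }

    /** A binary tree in array layout: node i references 2i+1 and 2i+2. */
    static List<List<Integer>> binaryTree(int n) {
        List<List<Integer>> g = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> refs = new ArrayList<>();
            if (2 * i + 1 < n) refs.add(2 * i + 1);
            if (2 * i + 2 < n) refs.add(2 * i + 2);
            g.add(refs);
        }
        return g;
    }
}
```

For 1023 objects, the linked list needs 1023 waves and the complete binary tree needs 10, even though both hold exactly the same number of live objects.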


Heap Population (Live Set): It’s not as simple as you might think…

• In a Stop-The-World situation, this is simple

─ Start with the “roots” and paint the world

─ Only things you have actual references to are alive

• When mutator runs concurrently with GC:

─ Not a “snapshot” of a single program state

─ Objects allocated during GC cycle are considered “live”

─ Objects that die after GC starts may be considered “live”

─ Weak references “strengthened” during GC…

• So assume:

─ Live_Set >= STW_live_set + (Allocation_Rate * Cycle_time)
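Plugging numbers into this bound shows how quickly the effective live set grows under load. The figures below (1 GB STW live set, 200 MB/s allocation rate, 15 s cycle) are made-up assumptions, not measurements, and `liveSetLowerBoundMb` is a hypothetical helper that simply evaluates the formula above:

```java
final class LiveSetBound {
    /**
     * Lower bound on what a concurrent collector must treat as live:
     * Live_Set >= STW_live_set + Allocation_Rate * Cycle_time.
     * Units: MB, MB/s, seconds.
     */
    static double liveSetLowerBoundMb(double stwLiveSetMb,
                                      double allocationRateMbPerSec,
                                      double cycleTimeSec) {
        return stwLiveSetMb + allocationRateMbPerSec * cycleTimeSec;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 1 GB STW live set, 200 MB/s, 15 s cycle.
        double bound = liveSetLowerBoundMb(1024, 200, 15);
        System.out.println(bound + " MB"); // prints "4024.0 MB"
    }
}
```

Under these assumptions the collector must treat roughly 4 GB as live even though a stop-the-world snapshot would find only 1 GB, and any increase in load (allocation rate) or cycle time pushes the bound up further.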


Mutation rate: Does your program do any real work?

• Mutation rate is generally proportional to work performed

─ The higher the load, the higher the mutation rate

• A multi-pass marker can be sensitive to mutation:

─ Revisits references modified in each pass

─ Higher mutation rate means longer cycle times

─ Can reach a point where marker cannot keep up with mutator

─ e.g. one marking thread vs. 15 mutator threads

• Some common use patterns have high mutation rates

─ e.g. LRU cache
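The LRU cache shows why high mutation rates appear even in "read-mostly" code. This sketch uses the standard `java.util.LinkedHashMap` in access-order mode (the class name `LruCache` is ours): every `get()` hit relinks the entry to the tail of the map's internal doubly linked list, and every insert beyond capacity unlinks the eldest entry, so reads alone keep rewriting references that a multi-pass marker may have to revisit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache built on LinkedHashMap's access-order mode. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration goes from LRU to MRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}
```

For example, with capacity 2, inserting "a" and "b", reading "a", and then inserting "c" evicts "b": the read of "a" was itself a reference mutation that moved it ahead of "b".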


Object lifetime: Objects are active in the Old Generation

• Most allocated objects do die young

─ So generational collection is an effective filter

• However, most live objects are old

─ You’re not just making all those objects up every cycle…

• Large heaps tend to see real churn & real mutation

─ e.g. caching is a very common use pattern for large memory

• OldGen is under constant pressure in the real world

─ Unlike some/most benchmarks (e.g. SPECjbb)


Generational Assumption: When you are forced to collect live young things

• A lot of state dies when transactions are completed

─ Transactions typically take some minimum amount of time

• Load usually shows up as concurrently active state

─ More concurrent users & transactions: more state

• Higher load always generates garbage faster

─ NewGen collections happen more often as load grows

• At some load point NewGen becomes very expensive

─ When NewGen GC cycles faster than transactions complete

─ Pauses get significantly longer if it uses a STW mechanism
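A back-of-the-envelope check for the crossover point above: NewGen fills, and is collected, roughly every `newGenSize / allocationRate` seconds. If that interval is shorter than a typical transaction, in-flight transaction state is still live at every young collection and must be copied (and eventually promoted) instead of dying young. The helper names and numbers are illustrative assumptions; real collectors size and trigger NewGen in more complicated ways.

```java
final class NewGenCheck {
    /** Approximate seconds between NewGen collections. */
    static double newGenIntervalSec(double newGenSizeMb, double allocRateMbPerSec) {
        return newGenSizeMb / allocRateMbPerSec;
    }

    /**
     * True when NewGen cycles faster than transactions complete, i.e. the
     * "most objects die young" assumption starts to break down.
     */
    static boolean youngStateSurvives(double newGenSizeMb,
                                      double allocRateMbPerSec,
                                      double transactionTimeSec) {
        return newGenIntervalSec(newGenSizeMb, allocRateMbPerSec) < transactionTimeSec;
    }
}
```

With a 512 MB NewGen and 2-second transactions, 100 MB/s of allocation gives a young collection every ~5 s (transactions finish first), while 1000 MB/s gives one every ~0.5 s (transaction state survives every cycle). Allocation rate grows with load while transaction time does not shrink, so the same configuration can cross this line as load grows.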


Major things that happen in a pause: The non-concurrent parts of “mostly concurrent”

• If the collector does reference processing in a pause

─ Weak, Soft, Final ref traversal

─ Pause length depends on # of refs.

─ Sensitive to common use cases of weak refs

─ e.g. LRU & multi-index cache patterns

• If the collector marks mutated refs in a pause

─ Pause length depends on mutation rate

─ Sensitive to load

• If the collector performs compaction in a pause…


Fragmentation & Compaction: You can’t delay it forever

• Fragmentation *will* happen

─ Compaction can be delayed, but not avoided

─ “Compaction is done with the application paused. However, it is a necessary evil, because without it, the heap will be useless…” (JRockit RT tuning guide).

• If Compaction is done as a stop-the-world pause

─ It will generally be your worst case pause

─ It is a likely failure of concurrent collection

• Measurements without compaction are meaningless

─ Unless you can prove that compaction won’t happen (Good luck with that)
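Why fragmentation forces compaction can be illustrated with a toy heap of fixed-size slots. In this hypothetical model (not how any real JVM allocator works), half the heap can be free while no contiguous run is large enough for a new object; `compact()` slides live slots together to recover one large free region. A real compactor must also find and update every reference to each moved object, which is the expensive part that many collectors do in a stop-the-world pause.

```java
import java.util.Arrays;

/** Toy heap of fixed-size slots (hypothetical model, not a real allocator). */
final class ToyHeap {
    private final int[] slots; // 0 = free, otherwise the id of the owning object

    ToyHeap(int size) { slots = new int[size]; }

    /** First-fit allocation of len contiguous slots; returns the start or -1. */
    int allocate(int id, int len) {
        int run = 0;
        for (int i = 0; i < slots.length; i++) {
            run = (slots[i] == 0) ? run + 1 : 0;
            if (run == len) {
                int start = i - len + 1;
                Arrays.fill(slots, start, start + len, id);
                return start;
            }
        }
        return -1; // enough free slots may exist, just not contiguously
    }

    /** The object with this id dies; its slots become free holes. */
    void free(int id) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == id) slots[i] = 0;
        }
    }

    int freeSlots() {
        int n = 0;
        for (int s : slots) if (s == 0) n++;
        return n;
    }

    /**
     * Slide all live slots to the bottom of the heap, preserving order.
     * (A real compactor would also remap every reference to moved objects.)
     */
    void compact() {
        int dst = 0;
        for (int src = 0; src < slots.length; src++) {
            if (slots[src] != 0) slots[dst++] = slots[src];
        }
        Arrays.fill(slots, dst, slots.length, 0);
    }
}
```

Filling a 10-slot heap with ten 1-slot objects and freeing every second one leaves 5 free slots, yet a 3-slot allocation fails until `compact()` runs; afterwards it succeeds at slot 5.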


More things that may happen in a pause: More “mostly concurrent” secrets

• When the collector does Code & Class things in a pause

─ Class unloading, Code cache cleaning, System Dictionary, etc.

─ Can depend on class and code churn rates

─ Becomes a real problem if full collection is required (PermGen)

• GC/Mutator Synchronization, Safe Points

─ Can depend on time-to-safepoint affecting runtime artifacts:

─ Long running no-safepoint loops (some optimizers do this).

─ Huge object cloning, allocation (some runtimes won’t break it up).

• Stack scanning (look for refs in mutator stacks)

─ Can depend on # of threads and stack depths
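The "long running no-safepoint loops" bullet refers to loops like the one below. This is an illustration of the risk, not a guaranteed reproduction: some JIT compilers (HotSpot's server compiler among them, depending on version and flags) may omit the safepoint poll from a counted `int` loop whose body makes no calls, on the assumption that it finishes quickly. If such a loop runs for seconds, every thread that has already stopped for the safepoint waits for it to exit, so the pause the application sees grows by the time-to-safepoint. The class and method names are hypothetical.

```java
final class CountedLoop {
    /**
     * A counted int loop with no calls or allocation in its body. A JIT may
     * compile it without a safepoint poll inside the loop; with a very large
     * n, a GC safepoint requested meanwhile cannot complete until it exits.
     */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i; // widen before multiplying to avoid int overflow
        }
        return sum;
    }
}
```

Whether a given JVM keeps a poll in this loop can only be confirmed by inspecting its compiled code or by measuring time-to-safepoint; the shape alone does not decide it.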


HotSpot CMS: Collector mechanism examples

• Stop-the-world compacting new gen (ParNew)

• Mostly Concurrent, non-compacting old gen (CMS)

─ Mostly Concurrent marking

─ Mark concurrently while mutator is running

─ Track mutations in card marks

─ Revisit mutated cards (repeat as needed)

─ Stop-the-world to catch up on mutations, ref processing, etc.

─ Concurrent Sweeping

─ Does not Compact (maintains free list, does not move objects)

• Fallback to Full Collection (Stop the world, serial)

─ Used for Compaction, etc.
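The card-marking step can be sketched as follows. The heap is divided into fixed-size "cards" (512 bytes here, the card size HotSpot uses by default, though any power of two works for the sketch), and a write barrier dirties the card covering any reference field that the mutator updates. A remark pass then rescans only dirty cards instead of the whole heap. Addresses are modeled as plain `long` offsets and the class is hypothetical; this illustrates the technique, not HotSpot's code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical card table: one dirty bit per 512-byte region of the heap. */
final class CardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards
    private final BitSet dirty;

    CardTable(long heapBytes) {
        int cards = (int) ((heapBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT);
        dirty = new BitSet(cards);
    }

    /** Post-write barrier: called after a reference field at fieldAddress is stored. */
    void onReferenceStore(long fieldAddress) {
        dirty.set((int) (fieldAddress >>> CARD_SHIFT));
    }

    /**
     * Remark step: return the start address of every dirty card, then clean
     * them. The marker would rescan the objects on these cards for new refs.
     */
    List<Long> drainDirtyCards() {
        List<Long> starts = new ArrayList<>();
        for (int c = dirty.nextSetBit(0); c >= 0; c = dirty.nextSetBit(c + 1)) {
            starts.add((long) c << CARD_SHIFT);
        }
        dirty.clear();
        return starts;
    }
}
```

The more mutation happens while a pass runs, the more cards are dirty when it ends, which is why "repeat as needed" may not converge under load and the final stop-the-world catch-up grows with the mutation rate.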


Azul GPGC: Collector mechanism examples

• Concurrent, compacting new generation

• Concurrent, compacting old generation

• Concurrent guaranteed-single-pass marker

─ Oblivious to mutation rate

─ Concurrent ref (weak, soft, final) processing

• Concurrent Compactor

─ Objects moved without stopping mutator

─ Can relocate entire generation (New, Old) in every GC cycle

• No Stop-the-world fallback

─ Always compacts, and does so concurrently


Measurement Recommendations: When you are actually interested in the results…

• Measure application – not synthetic tests

─ Garbage in, Garbage out

• Avoid the urge to tune GC out of the testing window

─ You’re only fooling yourself

─ Your application needs to run for more than 20 minutes, right?

─ Most industry benchmarks are tuned to avoid GC during test

• Rule of Thumb:

─ You should see 5+ of the “bad” GCs during test period

─ Otherwise, you simply did not test real behavior

─ Test until you can show it’s stable (e.g. What if it trends up?)

─ Believe your application, not -verbosegc
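"Believe your application" can be taken literally: measure stalls from inside the process instead of relying on the collector's own log. This hypothetical `StallMeter` samples by sleeping for a short interval and recording how late it wakes up, so any stall that stops application threads (GC pause, time-to-safepoint, OS scheduling) shows up in its worst-case number. The class name and interval are assumptions, and a sleeping thread only sees stalls longer than its sleep granularity.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process stall meter: records the worst wake-up delay. */
final class StallMeter implements Runnable {
    private final long intervalNanos;
    private volatile boolean running = true;
    private volatile long maxOvershootNanos;

    StallMeter(long interval, TimeUnit unit) {
        this.intervalNanos = unit.toNanos(interval);
    }

    /** One sample: sleep for the interval and record how late we woke up. */
    boolean sampleOnce() {
        long start = System.nanoTime();
        try {
            TimeUnit.NANOSECONDS.sleep(intervalNanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        long overshoot = System.nanoTime() - start - intervalNanos;
        if (overshoot > maxOvershootNanos) {
            maxOvershootNanos = overshoot;
        }
        return true;
    }

    @Override
    public void run() {
        while (running && sampleOnce()) {
            // keep sampling until stopped or interrupted
        }
    }

    void stop() { running = false; }

    long maxOvershootMillis() {
        return TimeUnit.NANOSECONDS.toMillis(maxOvershootNanos);
    }
}
```

Started on a daemon thread with a 1 ms interval for the whole stable-load test, its `maxOvershootMillis()` can be compared with the worst pause in the GC log; if the application-side number is larger, the log is not telling the whole story.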


Don’t ignore “bad” GC: Compaction? What Compaction?


Measurement Techniques: Make reality happen

• Aim for 20-30 minute “stable load” tests

─ If test is longer, you won’t do it enough times to get good data

─ Don’t “ramp” load during test period – it will defeat the purpose

─ We want to see several days’ worth of GC in 20-30 minutes

• Add low-load noise to trigger “real” GC behavior

─ Don’t go overboard

─ A moderately churning large LRU cache can often do the trick

─ A gentle heap fragmentation inducer is a sure bet

─ Can easily be added orthogonally to application activity

─ See Azul’s “Fragger” example (http://e2e.azulsystems.com)
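A gentle churner along these lines can run beside the application without touching its code. This is a hypothetical sketch, not Azul's Fragger example referenced above: it keeps a bounded pool of byte arrays of varying sizes and periodically replaces a random one. Once the pool has been promoted, each replacement tends to leave a hole somewhere in the old generation, so free space fragments over time while the retained size stays bounded. Pool size, object sizes, and the pause between replacements are assumptions to tune against your heap.

```java
import java.util.Random;

/** Hypothetical background heap churner (not Azul's Fragger). */
final class HeapChurner implements Runnable {
    private final byte[][] pool;
    private final int minBytes;
    private final int maxBytes;
    private final long pauseMillis;
    private final Random random = new Random(42); // fixed seed for repeatable runs
    private volatile boolean running = true;
    private volatile long replacements; // single writer: the churning thread

    HeapChurner(int poolSize, int minBytes, int maxBytes, long pauseMillis) {
        this.pool = new byte[poolSize][];
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
        this.pauseMillis = pauseMillis;
        for (int i = 0; i < poolSize; i++) {
            pool[i] = newObject();
        }
    }

    private byte[] newObject() {
        return new byte[minBytes + random.nextInt(maxBytes - minBytes + 1)];
    }

    /** Replace one randomly chosen pooled array; the old one becomes garbage. */
    void replaceOne() {
        pool[random.nextInt(pool.length)] = newObject();
        replacements++;
    }

    @Override
    public void run() {
        while (running) {
            replaceOne();
            try {
                Thread.sleep(pauseMillis); // keep the churn gentle
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stop() { running = false; }

    long replacements() { return replacements; }

    /** Total bytes currently retained by the pool. */
    long retainedBytes() {
        long total = 0;
        for (byte[] b : pool) total += b.length;
        return total;
    }
}
```

For example, `new HeapChurner(10_000, 64, 16 * 1024, 1)` run on a daemon thread retains at most about 160 MB and makes at most one replacement per millisecond; both figures are illustrative and should be scaled to the heap under test.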


Establish smooth operating range: Know where it works, and know where it doesn’t…

• Test main metrics for sensitivity

• Stress Heap population, allocation, mutation, etc.

• Add artificial load-linear stress if needed

─ E.g. Increase allocation and mutation per transaction

─ E.g. Increase state per session, increase static state

─ E.g. Increase session length in time

─ Drive load with artificially enhanced GC stress

─ Keep increasing until you find out where GC breaks SLA in test

─ Then back off and test for stability
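The "keep increasing until GC breaks the SLA, then back off" procedure can be scripted. In this sketch, `measureWorstPauseMillis` stands for a complete stable-load test run at a given stress multiplier (a hypothetical hook you would implement yourself), and the step size and back-off fraction are assumptions:

```java
import java.util.function.DoubleToLongFunction;

/** Hypothetical helpers for finding a GC "smooth operating range" in test. */
final class OperatingRange {
    /**
     * Raises a stress multiplier in fixed steps until the worst observed pause
     * exceeds the SLA. Returns the last multiplier that stayed within the SLA,
     * or 0 if even the first step failed.
     */
    static double findEdge(DoubleToLongFunction measureWorstPauseMillis,
                           long slaMillis, double step, double maxMultiplier) {
        double lastGood = 0;
        for (double m = step; m <= maxMultiplier; m += step) {
            if (measureWorstPauseMillis.applyAsLong(m) > slaMillis) {
                break; // found the cliff
            }
            lastGood = m;
        }
        return lastGood;
    }

    /** Back off from the edge before running long stability tests. */
    static double backOff(double edge, double fraction) {
        return edge * (1.0 - fraction);
    }
}
```

Each call of the hook is a full 20-30 minute run, so a coarse step keeps the number of runs manageable; the backed-off level is then the one to test for stability.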


Summary: Know where the cliff is, then stay away from the edge…

• Sensitivity is key

─ If it fails, it will be without warning

• Know where you stand on key measurable metrics

─ Application driven: Live Set, Allocation rate, Heap size

─ GC driven: Cycle times, Compaction Time, Pause times

• Deal with robustness first, and only then with efficiency

─ Efficient and 2% away from failure is not a good thing

• Establish your envelope

─ Only then will you know how safe (or unsafe) you are

http://e2e.azulsystems.com


Q & A

Remember:

Zing Announcement

JavaOne Ticket drawing

13:45 @“Double” Conf. room


Thank You

Performance Considerations in Concurrent Garbage Collected Systems