Page 1: Con5388 maier

Java Application Design Practices to Avoid When Dealing with Sub-100 ms SLAs

Daryl Maier (IBM Canada Lab), Anil Kumar (Intel Corporation)

1st October, 2012

© 2012 IBM Corporation

Page 2: Con5388 maier

Important Disclaimers

§ THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

§ WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS-IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESSED OR IMPLIED.

§ ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE, OR INFRASTRUCTURE DIFFERENCES.

§ ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.

§ IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.

§ IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

§ NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
– CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS

Page 3: Con5388 maier

Introduction to the speakers

Daryl Maier

– 12 years experience developing and deploying Java SDKs at IBM Canada Lab

– Recent work focus:
• X86 Java just-in-time compiler development and performance
• Java benchmarking

– Contact: [email protected]

Anil Kumar

– 10 years experience in server Java performance ensuring best customer experience on all Intel Architecture based platforms

– Contact: [email protected]

Page 4: Con5388 maier

Credits

The contents of this presentation were jointly produced with Elena Sayapina, Java Performance / Intel.

Intel and IBM collaborate to ensure the best user experience across all Intel Architecture based platforms.

Page 5: Con5388 maier

What this talk is about…

§ Learn what contributes to higher transactional response times within a Java application

§ How to measure response time

§ Java application design practices that lead to lower response times

§ How to tune the environment in which your application runs for better response time

§ How to determine if you can achieve an even better response time

§ Lots of practical examples

Page 6: Con5388 maier

Service Level Agreements

§ SLA == Service Level Agreement
– A commitment to provide a service that meets a prescribed level of performance
– Can be informal or contractually obligated

[Figure: dimensions an SLA can cover: CPU, availability, storage, concurrent users, response time, and others (“?”)]

Page 7: Con5388 maier

Response time

§ Measure of time needed to complete a transaction in response to a request to do work

§ Lower response times generally have positive effects

§ Different perceptions of response time: user interface, real time event, service level commitments, …

§ Isn’t improving response time simply a matter of increasing throughput? Not necessarily…

Page 8: Con5388 maier

How do you measure response time?

§ Be sure what you’re measuring is the response time you’re interested in

[Figure: requests enter a request queue, pass through a transaction queue to an executor thread pool that runs Transactions A, B, and C, and leave as responses through a response queue]

Page 9: Con5388 maier

Measuring response time from request made to response received?

Page 10: Con5388 maier

Measuring response time from transaction submitted to response received?

Page 11: Con5388 maier

Measure time to complete the transaction?
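The three measurement spans asked about on these slides can be separated in code. The following is a minimal sketch, assuming Java 8 or later and a standard ExecutorService; the class name TimedTransaction and the 10-second timeout are hypothetical choices made for illustration, not part of the presentation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that splits one transaction's time into queue wait and service time. */
class TimedTransaction {

    /** Returns {queueWaitNanos, serviceNanos, totalNanos} for one unit of work. */
    static long[] measure(ExecutorService pool, Runnable work) {
        final long submitted = System.nanoTime();      // transaction handed to the pool
        Future<long[]> result = pool.submit(() -> {
            long started = System.nanoTime();          // a worker thread picked it up
            work.run();
            long finished = System.nanoTime();         // transaction completed
            return new long[] {
                started - submitted,                   // time spent waiting in the queue
                finished - started,                    // time spent executing
                finished - submitted                   // what the submitter observes
            };
        });
        try {
            return result.get(10, TimeUnit.SECONDS);   // illustrative timeout
        } catch (Exception e) {
            throw new IllegalStateException("transaction did not complete", e);
        }
    }
}
```

Under load, the queue-wait component often dominates, which is why it matters which of the three spans an SLA actually refers to.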

Page 12: Con5388 maier

How do you measure response time?

§ Make sure your timing measurement isn’t part of the response time!

§ Be aware of accuracy and precision of Java timing methods
– System.nanoTime()
– System.currentTimeMillis()
– …and don’t use too many timers!

§ Beware of clock skew in virtual environments
– May need to keep time on an external system

Page 13: Con5388 maier

How do you measure response time?

Sample of transaction response times at an injection rate (IR) of 3000 ops/sec. Most long transactions fall above the 95th percentile.
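Percentiles such as the 95th or 99th are computed from the recorded samples rather than from an average. A minimal sketch using the nearest-rank definition follows; the class and method names are hypothetical, and other percentile definitions (for example, interpolating ones) give slightly different values.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing recorded response times. */
class Percentiles {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone();               // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }
}
```

Reporting the median alongside a high percentile exposes the long tail that a mean would hide.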

Page 14: Con5388 maier

Influences on response time are not localized

[Figure: the stack that influences response time: Application, Framework, Java VM, Operating System, Hardware]

You must design and tune the entire stack in order to achieve your response time targets

Page 15: Con5388 maier

SPECjbb2012

§ Next generation Java business logic benchmark from SPEC

§ Business model is a supermarket supply chain: headquarters, supermarkets, suppliers

§ Scalable, self-injecting workload with multiple supported configurations

§ Customer relevant technologies: security, XML, JDK 7 features

§ Metrics: max-jOPS (throughput) and critical-jOPS (response time)

§ Will be used for case studies in this presentation

Page 16: Con5388 maier

Application design influences response time

• design for scalability
• eliminate serial bottlenecks
• use appropriate JCL packages
• avoid needless synchronization
• avoid excessive object allocations
• cache data locally
• use non-blocking I/O
• be careful with logging and tracing

Page 17: Con5388 maier

Design for scalability

§ Scalability : the ability to increase throughput as more resources are applied

§ Prepare your application to run on modern multi-core architectures

§ Create more parallelism in your application and eliminate serial bottlenecks
– Change algorithms

§ Organize your application into parallel tasks
– Leverage TaskExecutor framework for high-level tasks
– Consider ForkJoin in Java 7 for fine-grained task decomposition
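Fine-grained decomposition with Fork/Join might look like the sketch below, which sums an array by splitting it recursively. The class name SumTask and the threshold value are assumptions chosen for illustration; a real application would tune the threshold and reuse one long-lived pool instead of creating one per call.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical example task: sums a slice of an array by recursive splitting. */
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;   // illustrative cutoff, not a tuned value
    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {              // small enough: do the work serially
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                               // schedule the left half asynchronously
        long rightSum = right.compute();           // compute the right half in this thread
        return left.join() + rightSum;             // wait for the left half and combine
    }

    /** Convenience entry point; creates and shuts down a pool for the sake of a short example. */
    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```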

Page 18: Con5388 maier

Use the java/util/concurrent package

§ j/u/c introduced in Java 5, additional features in Java 6/7

§ Contains building blocks for developing scalable applications
– State-of-the-art concurrency algorithms built on non-blocking synchronization
– More variety in locking operations (Lock interface, multiple Conditions)
– Atomic variables (atomic math ops such as increment, test-and-set)
– Concurrent collections
– Coarse and fine-grained task management

§ Use j/u/c classes as base classes for new data structures

§ Optimized by modern JVMs
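As one illustration of these building blocks, the sketch below combines a concurrent collection with atomic variables to count events per key without any synchronized block. The class name HitCounter is hypothetical, and computeIfAbsent with a lambda assumes Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-key event counter built only from java.util.concurrent classes. */
class HitCounter {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    /** Safe to call from many threads at once; no explicit lock is taken here. */
    void record(String key) {
        counts.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    /** Returns the current count for the key, or 0 if it was never recorded. */
    long get(String key) {
        AtomicLong counter = counts.get(key);
        return counter == null ? 0 : counter.get();
    }
}
```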

Page 19: Con5388 maier

Avoid unnecessary Java synchronization

§ Synchronization is sometimes required for correctness, so it cannot always be avoided

§ Built-in Java synchronization is coarse grained and can inhibit scalability
– Useful when true mutual exclusion is the goal
– JVMs can help

§ Strongly consider using j/u/c for finer-grained locking
– Building blocks for scalable locking

§ Eliminate contended locks

§ Use volatile fields when appropriate
– No locking
– May be suitable for single writer, multiple-reader (e.g., time stamps)
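The single-writer, multiple-reader time stamp case can be sketched as follows. The class name CoarseClock is hypothetical, and the sketch assumes exactly one thread calls tick(); with several writers doing read-modify-write updates, volatile alone would not be enough.

```java
/** Hypothetical coarse time stamp: one writer thread, any number of reader threads. */
class CoarseClock {
    // volatile makes each write visible to readers and keeps 64-bit reads and writes atomic
    private volatile long latest;

    /** Called by the single writer thread, for example from a periodic task. */
    void tick() {
        latest = System.currentTimeMillis();
    }

    /** Called by readers; takes no lock and never blocks. */
    long latestMillis() {
        return latest;
    }
}
```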

Page 20: Con5388 maier

Avoid excessive object allocations

§ Understand the effect of object creation on the heap and the strain on garbage collection

§ Consider hoisting allocations from loops

§ Consider using weak/soft references when appropriate
– Useful for caches, object metadata, or easily rematerializable data

§ Be aware of immutable classes that implicitly return new objects
– e.g., BigDecimal, Integer
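Hoisting an allocation out of a loop can be as simple as reusing one buffer. The sketch below contrasts the two forms; the class and method names are hypothetical. A JIT compiler may already eliminate some short-lived allocations through escape analysis, so measure before and after.

```java
/** Hypothetical formatter showing an allocation inside and outside a loop. */
class RecordFormatter {

    /** Allocates a fresh StringBuilder for every record. */
    static int perIteration(String[] records) {
        int totalLength = 0;
        for (String record : records) {
            StringBuilder sb = new StringBuilder(64);   // one allocation per iteration
            sb.append('[').append(record).append(']');
            totalLength += sb.length();
        }
        return totalLength;
    }

    /** Hoisted form: a single StringBuilder is allocated once and reset each time. */
    static int hoisted(String[] records) {
        StringBuilder sb = new StringBuilder(64);       // allocated once, outside the loop
        int totalLength = 0;
        for (String record : records) {
            sb.setLength(0);                            // reuse the existing buffer
            sb.append('[').append(record).append(']');
            totalLength += sb.length();
        }
        return totalLength;
    }
}
```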

Page 21: Con5388 maier

Case study: SPECjbb2012

§ Example of design choices around receipt storage in the benchmark

• Some impact on throughput

• No impact on median response time

• Significant impact on 99th-percentile response time

Page 22: Con5388 maier

Case study: SPECjbb2012

§ Example of design choices where background tasks become heavier
– Increase in background task of Data Mining (DM)

• Some impact on throughput

• No impact on median response time

• Significant impact on 99th-percentile response time

Page 23: Con5388 maier

Reduce data access latency

§ Often a problem in client/server systems

§ Cache data locally to avoid remote communication
– Particularly effective with data unlikely to change

§ Pitfall: caching more data improves remote access latency, but accumulating too much strains garbage collection
– An example of where local benefits to throughput have broader negative effects

§ Use Java NIO (Java SE 1.4) and NIO2 (Java SE 7)
– Can leverage high performance features

§ Carefully consider non-blocking, unbounded data structures (e.g., ConcurrentLinkedQueue)
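One way to limit the caching pitfall is to bound the cache so it cannot grow without limit. The sketch below is a least-recently-used cache built on LinkedHashMap; the class name BoundedCache is hypothetical, and the class is not thread-safe by itself, so concurrent use would need external coordination.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded LRU cache; not thread-safe without external coordination. */
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true);        // true = access order, so reads refresh an entry
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; evicts the least recently used entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Capping the entry count keeps the live set, and therefore the garbage collector's work, predictable.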

Page 24: Con5388 maier

Case study: SPECjbb2012

Performance effects of caching supermarket data over not caching it

• Throughput reduces by half

• Minor impact on median response time

• Some impact on 99th-percentile response time

Page 25: Con5388 maier

Application frameworks

• application containers (e.g., application servers, Eclipse)
• 3rd party packages (e.g., Apache Commons, Grizzly)
• understand thread management and local caching policies

Page 26: Con5388 maier

Java virtual machine tuning

• garbage collection
• heap tuning
• 64-bit addressing

Page 27: Con5388 maier

Java virtual machine architecture

[Figure: Java runtime architecture (e.g., J9 R26; Java API e.g., Java6/Java7). Components shown: debugger and profilers (via JVMTI), Java application code, user natives, JSE6 classes, Harmony classes, GC / JIT / class library natives, Java Native Interface (JNI), core VM (interpreter, verifier, stack walker), trace and dump engines, port library (files, sockets, memory), thread library. Legend: user code, Java platform API, VM-aware, core VM. Operating systems / architectures: AIX (PPC-32, PPC-64); Linux (x86-32, x86-64, PPC-32, PPC-64, zArch-31, zArch-64); Windows (x86-32, x86-64); z/OS (zArch-31, zArch-64).]

Page 28: Con5388 maier

Garbage collection

§ Determine the best garbage collection policy to use for your application
– Often a response time vs. throughput tradeoff

§ Most GC policies involve a “stop-the-world” phase that works against response times
– “throughput” policies tend to incur longer pauses but fewer interruptions
– “concurrent” policies lower average pause times by completing some tasks concurrently
– “balanced” policies carve heap into regions to improve parallelism and reduce pauses

§ Tune your heap parameters

§ Use -verbose:gc to correlate GC events with application events
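Besides the -verbose:gc log, an application can sample GC activity from inside the JVM and record it next to its own events. The sketch below uses the standard java.lang.management API; the class name GcStats is hypothetical, and the values are cumulative totals since JVM start, not individual pause times.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that reads cumulative GC counters from the running JVM. */
class GcStats {

    /** Total number of collections across all collectors reported by the JVM. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 means the value is unavailable
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 means the value is unavailable
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```

Sampling both counters before and after a slow transaction gives a first hint as to whether a collection overlapped with it.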

Page 29: Con5388 maier

Case study: SPECjbb2012

§ Example showing the effect of different GC policies and heap tunings

• Small throughput reduction from ConMarkSweep

• No impact on median response time

• ConMarkSweep 99th-percentile response time higher but consistent

Page 30: Con5388 maier

64-bit addressing

§ Heap addressability beyond 32 bits (> 3.5 GB)
– Common for applications with large in-memory working set (e.g., databases, object caches)

§ 64-bit addressing is a less efficient representation than 32-bit
– Cache & TLB effects stress hardware

§ Solution: build a 64-bit JVM with near 32-bit efficiency
– Use 32-bit values (offsets) to represent object fields
– With scaling, between 4 GB and 32 GB can be addressed

§ Enable with -XX:+UseCompressedOops (HotSpot) or -Xcompressedrefs (J9)
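The 4 GB to 32 GB range follows from simple arithmetic: a 32-bit offset distinguishes 2^32 positions, and multiplying each offset by the object alignment extends the reach. The sketch below works through that calculation; the class name is hypothetical, and the 8-byte alignment used for the upper bound is an assumption about a typical JVM configuration.

```java
/** Hypothetical worked example of the compressed-reference addressing range. */
class CompressedRefRange {

    /** Bytes reachable by a 32-bit offset when every object starts on an alignment boundary. */
    static long addressableBytes(int alignmentBytes) {
        return (1L << 32) * alignmentBytes;   // 2^32 distinct offsets, each scaled by the alignment
    }

    /** Same value expressed in gigabytes (2^30 bytes). */
    static long addressableGigabytes(int alignmentBytes) {
        return addressableBytes(alignmentBytes) >> 30;
    }
}
```

With no scaling (alignment 1) the result is 4 GB; with 8-byte alignment it is 32 GB.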

Page 31: Con5388 maier

Operating system tuning

• large pages
• thread scheduling

Page 32: Con5388 maier

Large data and code pages

§ OS paging architecture requires memory addresses to be mapped to more granular “pages” that are mapped to physical memory
– Translation Lookaside Buffers (TLBs) cache these mappings
– Using larger page sizes increases TLB effectiveness

§ Large pages must be enabled by the OS
– BUT require enough physical pages to be allocated together to be most effective

§ Modern JVMs place both heap and compiled code in large pages

§ -Xlp (J9) or -XX:+UseLargePages (HotSpot)

Page 33: Con5388 maier

Case study: SPECjbb2012

§ Example showing the effect of large pages

• Increase throughput by ~13%

• No impact on median response time

• Helps in keeping 99th-percentile response time lower at higher load

Page 34: Con5388 maier

Thread scheduling

§ Context switches
– Voluntary (e.g., a thread blocks while acquiring a contended lock)
– Involuntary (e.g., the scheduler preempts a thread because too many threads are active)

§ Watch for thread migration
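One common way to limit involuntary context switches for CPU-bound work is to keep the number of active worker threads close to the number of available processors. The sketch below shows that rule of thumb; the class name CpuBoundPool is hypothetical, and I/O-bound workloads usually need a different sizing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical factory for a pool sized to the processors visible to the JVM. */
class CpuBoundPool {

    /** One worker per available processor; a starting point, not a tuned value. */
    static int workerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    /** Creates a fixed-size pool so the number of runnable workers stays bounded. */
    static ExecutorService create() {
        return Executors.newFixedThreadPool(workerCount());
    }
}
```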

Page 35: Con5388 maier

Hardware tuning

• power management
• BIOS settings

Page 36: Con5388 maier

Hardware tuning

§ Power management

§ Insufficient resources
– Physical memory, amount and latency
– I/O storage latency
• RAID
• SSDs
– Network I/O bandwidth

§ Tune your BIOS settings carefully
– Hyperthreading
– Prefetching
– Power management

Page 37: Con5388 maier

Know your Intel® Xeon® Processor Family

Page 38: Con5388 maier

Know your Intel® Xeon® Processor SKU:

Page 39: Con5388 maier

Case study: SPECjbb2012

§ Example showing the effect of 8 cores vs. 4 cores
– Assumes the application leverages the parallelism of multiple cores

• Increases throughput by ~100%

• No impact on median response time

• 8 cores deliver much lower 99th-percentile response

Page 40: Con5388 maier

Leveraging your hardware topology

§ Understand the underlying hardware topology to reduce latency and increase throughput

§ For NUMA, affinitize JVMs to core/memory subsets to improve performance
– Improve NUMA performance
– Optimize the cache hierarchy of the underlying processors

• Increases throughput by ~12%

• No impact on median response time

• Much lower 99th-percentile response

Page 41: Con5388 maier

Evaluating your response time

§ Even though you may be achieving an acceptable SLA, are there tell-tale signs that you could be achieving even better?
– Lack of multi-threadedness in your application
– Lock contention
– Low CPU utilization
– Excessive time (>10%) being spent in OS kernel

§ Tooling to help diagnose response time issues
– IBM HealthCenter
• What is my JVM doing? Is everything ok?
• Why is my application running slowly? Why is it not scaling?
• Am I using the right options?
– Garbage Collector and Memory Visualizer
• Online analysis of heap usage, pause times, many others
– Memory Analyzer
• Offline tool providing insight into Java heaps

Page 42: Con5388 maier

Questions?

Page 43: Con5388 maier

References

§ Get Products and Technologies
– IBM Java Runtimes and SDKs:
• https://www.ibm.com/developerworks/java/jdk/
– IBM Monitoring and Diagnostic Tools for Java:
• https://www.ibm.com/developerworks/java/jdk/tools/
– SPEC benchmarking
• http://www.spec.org

§ Learn
– IBM Java InfoCenter:
• http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp

§ Discuss
– IBM Java Runtimes and SDKs Forum:
• http://www.ibm.com/developerworks/forums/forum.jspa?forumID=367&start=0

Page 44: Con5388 maier

Copyright and Trademarks

© IBM Corporation 2012. All Rights Reserved.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., and registered in many jurisdictions worldwide.

Other product and service names might be trademarks of IBM or other companies.

A current list of IBM trademarks is available on the Web – see the IBM “Copyright and trademark information” page at URL: www.ibm.com/legal/copytrade.shtml

Page 45: Con5388 maier

SPECjbb2012 architecture

[Figure: a single application set versus a multi-application set, each showing a Controller (Ctrl) with Transaction Injector (TxI) and Backend (BE) pairs organized into groups]

Controller (Ctrl)
– Controls and evaluates the runs

Transaction Injector (TxI)
– Issues “Requests” at a given rate
– Measures response time by sending probe requests

Backend SUT (BE)
– Some % of transactions go across BEs, exercising inter-JVM process communication

Page 46: Con5388 maier

SPECjbb2012 architecture

[Figure: two groups (Group 1 and Group 2), each with one backend (Backend 1, Backend 2) containing a headquarters (HQ), two supermarkets (SM 1, SM 2), and two suppliers (SP 1, SP 2)]

SM: Supermarket
HQ: Headquarters
SP: Supplier

Page 47: Con5388 maier

Be aware of the impact of logging and tracing

§ Tracing and logging events from your application can have hidden costs
– I/O latency
– Storage requirements
– Overhead of test guarding tracing code
– Impact on JIT compilation

§ Do try to correlate application tracing information with events in other system or JVM logs
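The guard test mentioned above typically looks like the sketch below, here written against java.util.logging. The logger name, the class TraceGuard, and the describe helper are hypothetical; the point is that the expensive message is only built when the level is enabled, at the cost of one cheap check on every call.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example of guarding an expensive trace message with a level check. */
class TraceGuard {
    static final Logger LOG = Logger.getLogger("example.trace");   // hypothetical logger name
    static int describeCalls = 0;                                   // counts expensive message builds

    /** Stands in for an expensive string conversion of application state. */
    static String describe(Object order) {
        describeCalls++;
        return "order=" + order;
    }

    static void process(Object order) {
        if (LOG.isLoggable(Level.FINE)) {     // the guard: a cheap test on the hot path
            LOG.fine(describe(order));        // message is built only when tracing is enabled
        }
        // ... actual transaction work would go here ...
    }
}
```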