2/14/01 RightOrder : Telegraph & Java 1

Telegraph Java Experiences

Sam Madden
UC Berkeley
madden@cs.berkeley.edu

2/14/01 RightOrder : Telegraph & Java 2

Telegraph Overview
• 100% Java
• In-memory database
• Query engine for alternative sources
  • Web
  • Sensors
• Testbed for adaptive query processing

2/14/01 RightOrder : Telegraph & Java 3

Telegraph & WWW: FFF
• Federated Facts and Figures
  • Collect data on the election
• Based on Avnur and Hellerstein SIGMOD '00 work: Eddies
  • Route tuples dynamically based on source loads and selectivities

2/14/01 RightOrder : Telegraph & Java 4

fff.cs.berkeley.edu

2/14/01 RightOrder : Telegraph & Java 5

Architecture Overview
• Query Parser
  • JLex & CUP
• Preoptimizer
  • Chooses access paths
• Eddy
  • Routes tuples to modules
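To make the eddy's role concrete, here is a minimal sketch of adaptive routing. It is not Telegraph's code: the class names `FilterModule` and `SimpleEddy` are hypothetical, the modules are plain filters, and the routing policy is a greedy ranking by observed pass rate, swapped in for the lottery scheduling described in the Eddies paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical filter module that records how many tuples it has seen and passed. */
final class FilterModule<T> {
    final String name;
    private final Predicate<T> predicate;
    private long seen;
    private long passed;

    FilterModule(String name, Predicate<T> predicate) {
        this.name = name;
        this.predicate = predicate;
    }

    /** Applies the predicate and updates the running statistics. */
    boolean apply(T tuple) {
        seen++;
        boolean ok = predicate.test(tuple);
        if (ok) {
            passed++;
        }
        return ok;
    }

    /** Observed pass rate; 1.0 until the module has seen a tuple. */
    double observedSelectivity() {
        return seen == 0 ? 1.0 : (double) passed / seen;
    }
}

/** Simplified eddy (assumption: greedy ranking, not lottery scheduling). */
final class SimpleEddy<T> {
    private final List<FilterModule<T>> modules = new ArrayList<>();

    void addModule(FilterModule<T> module) {
        modules.add(module);
    }

    /** Sends the tuple to the most selective module first; true if it survives all modules. */
    boolean route(T tuple) {
        List<FilterModule<T>> order = new ArrayList<>(modules);
        order.sort((a, b) -> Double.compare(a.observedSelectivity(), b.observedSelectivity()));
        for (FilterModule<T> module : order) {
            if (!module.apply(tuple)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the ordering is recomputed per tuple from running statistics, a module that starts rejecting more tuples moves to the front without any replanning.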

2/14/01 RightOrder : Telegraph & Java 6

Modules
• Doubly-Pipelined Hash Joins
• Index Joins
  • For probing into web pages
• Aggregates & Group Bys
• Scans
  • Telegraph Screen Scraper: view web pages as relations
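A doubly-pipelined (symmetric) hash join keeps a hash table on each input and emits matches as soon as either side delivers a tuple. The sketch below illustrates the idea only; the class name `SymmetricHashJoin`, the integer keys, and the string payloads are assumptions, not Telegraph's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical symmetric hash join on integer keys with string payloads. */
final class SymmetricHashJoin {
    private final Map<Integer, List<String>> leftTable = new HashMap<>();
    private final Map<Integer, List<String>> rightTable = new HashMap<>();

    /** Stores a left tuple, then probes the right table; returns "left|right" matches. */
    List<String> insertLeft(int key, String value) {
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String right : rightTable.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(value + "|" + right);
        }
        return out;
    }

    /** Stores a right tuple, then probes the left table; returns "left|right" matches. */
    List<String> insertRight(int key, String value) {
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String left : leftTable.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(left + "|" + value);
        }
        return out;
    }
}
```

Neither input has to finish before results appear, which suits slow or bursty sources such as web pages.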

2/14/01 RightOrder : Telegraph & Java 7

Execution Framework
• One Thread Per Query
  • Iterator model for queries
  • Experimented with thread per module
  • Linux threads are expensive
• Two Memory Management Models
  • Java objects
  • Home-rolled byte arrays
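The iterator model mentioned above can be sketched as a pull-based operator tree that a single query thread drives. The interface and class names here (`Operator`, `ListScan`, `Select`) and the `Object[]` tuple representation are illustrative assumptions, not Telegraph's actual API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical operator interface in the classic open/next/close style. */
interface Operator {
    void open();

    /** Returns the next tuple, or null when the input is exhausted. */
    Object[] next();

    void close();
}

/** Leaf operator that scans an in-memory list of rows. */
final class ListScan implements Operator {
    private final List<Object[]> rows;
    private Iterator<Object[]> it;

    ListScan(List<Object[]> rows) {
        this.rows = rows;
    }

    public void open() {
        it = rows.iterator();
    }

    public Object[] next() {
        return it.hasNext() ? it.next() : null;
    }

    public void close() {
        it = null;
    }
}

/** Filter operator that pulls from its child until a tuple satisfies the predicate. */
final class Select implements Operator {
    private final Operator child;
    private final Predicate<Object[]> predicate;

    Select(Operator child, Predicate<Object[]> predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    public void open() {
        child.open();
    }

    public Object[] next() {
        Object[] tuple;
        while ((tuple = child.next()) != null) {
            if (predicate.test(tuple)) {
                return tuple;
            }
        }
        return null;
    }

    public void close() {
        child.close();
    }
}
```

The caller opens the root, calls `next()` until it returns null, then closes it, so the whole plan runs on one thread with no handoff between modules.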

2/14/01 RightOrder : Telegraph & Java 8

Tuples as Java Objects
• Tuple data stored as a Java Object
  • Each in separate byte array
• Tuples copied on joins, aggregates
• Issues
  • Memory management between modules, queries, garbage collector control
  • Allocation overhead
• Performance: 30,000 200-byte tuples / sec -> 5.9 MB / sec

2/14/01 RightOrder : Telegraph & Java 9

Tuples As Byte Array
• All tuples stored in same byte array / query
• Surrogate Java Objects

[Figure: surrogate objects, each an (Offset, Size) pair, shown with the byte array and its directory]
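The byte-array layout can be sketched as one shared array per query plus small surrogate objects that record where each tuple lives. This is an illustration under assumptions: `TupleRef`, `TupleArena`, and the use of a list as the directory are hypothetical names and choices, not the original Telegraph classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Surrogate object: records where one tuple lives in the shared byte array. */
final class TupleRef {
    int offset;
    final int size;
    boolean live = true;

    TupleRef(int offset, int size) {
        this.offset = offset;
        this.size = size;
    }
}

/** Hypothetical per-query arena holding all tuples in a single byte array. */
final class TupleArena {
    private final byte[] data;
    private final List<TupleRef> directory = new ArrayList<>();
    private int used;

    TupleArena(int capacity) {
        data = new byte[capacity];
    }

    /** Copies the tuple bytes to the end of the arena and returns its surrogate. */
    TupleRef append(byte[] tuple) {
        if (used + tuple.length > data.length) {
            throw new IllegalStateException("arena full");
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        TupleRef ref = new TupleRef(used, tuple.length);
        used += tuple.length;
        directory.add(ref);
        return ref;
    }

    byte[] read(TupleRef ref) {
        return Arrays.copyOfRange(data, ref.offset, ref.offset + ref.size);
    }

    /** Marks a tuple as dead; its bytes are reclaimed by the next compact(). */
    void free(TupleRef ref) {
        ref.live = false;
    }

    /** Slides live tuples toward the front and rewrites the surrogate offsets. */
    void compact() {
        List<TupleRef> survivors = new ArrayList<>();
        int write = 0;
        for (TupleRef ref : directory) {
            if (!ref.live) {
                continue;
            }
            System.arraycopy(data, ref.offset, data, write, ref.size);
            ref.offset = write;
            write += ref.size;
            survivors.add(ref);
        }
        directory.clear();
        directory.addAll(survivors);
        used = write;
    }

    int bytesUsed() {
        return used;
    }
}
```

Memory use is bounded by the arena capacity, and reclamation happens when `compact()` is called rather than whenever the garbage collector decides to run.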

2/14/01 RightOrder : Telegraph & Java 10

Byte Array (cont)
• Allows explicit control over memory / query (or module)
• Compaction eliminates garbage collection randomness
• Lower throughput: 15,000 t/sec
  • No surrogate object reuse
  • Synchronization costs

2/14/01 RightOrder : Telegraph & Java 11

Other System Pieces
• XML-Based Catalog
  • Java introspection helps
• Applet-based Front End
• JDBC Interface
• Fault Tolerance / Multiple Servers
  • Via simple UNIX tools

2/14/01 RightOrder : Telegraph & Java 12

RightOrder Questions
• Performance vs. C
• JNI Issues
• Garbage Collection Issues
• Serialization Costs
• Lots of Java Objects
• JDBC vs ODI

2/14/01 RightOrder : Telegraph & Java 13

Performance Vs. C
• JVM + JIT Performance Encouraging: IBM JIT == 60% of Intel C compiler, faster than MSC for low-level benchmarks
• IBM JIT 2x Faster than HotSpot for Telegraph Scans
• Stability Issues
• www.javalobby.org/features/jpr

2/14/01 RightOrder : Telegraph & Java 14

JIT Performance vs C

[Chart: IBM JIT vs. optimized Intel C vs. optimized MS C]

Source: www.javalobby.org/features/jpr

2/14/01 RightOrder : Telegraph & Java 15

Performance Gotchas
• Synchronization
  • ~2x function call overhead in HotSpot
  • Used in libraries: Vector, StringBuffer
    • String allocation single most intensive operation in Telegraph
    • Mercator: 20% initial CPU cost
• Garbage Collection
  • Java dumb about reuse
  • Mercator: 15% cost
  • OceanStore: 30 ms avg latency, 1 s peak
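One common way to sidestep library synchronization is to use unsynchronized counterparts when data is confined to a single thread. The sketch below is an assumption-laden illustration: `RowFormatter` is a hypothetical helper, `ArrayList` (available since JDK 1.2) stands in for `Vector`, and `StringBuilder` stands in for `StringBuffer` even though it only arrived in Java 5, after this talk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds output using only unsynchronized library classes. */
final class RowFormatter {
    private RowFormatter() {
    }

    /** Collects values into an ArrayList, whose methods take no lock (unlike Vector). */
    static List<String> collect(String... values) {
        List<String> fields = new ArrayList<>(values.length);
        for (String value : values) {
            fields.add(value);
        }
        return fields;
    }

    /** Joins fields with commas via StringBuilder, whose append is not synchronized. */
    static String format(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fields.get(i));
        }
        return sb.toString();
    }
}
```

This is only safe when no other thread touches the list or builder; shared structures still need explicit synchronization.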

2/14/01 RightOrder : Telegraph & Java 16

More Gotchas
• Finalization
  • Declaring methods final allows inlining
• Serialization
  • RMI, JNI use serialization
  • Philippsen & Haumacher show performance slowness

2/14/01 RightOrder : Telegraph & Java 17

Performance Tools
• Tools to address some issues
  • JAX, Jopt: make bytecode smaller, faster
    • www.alphaworks.ibm.com/tech/JAX
  • www.condensity.com
    • Bytecode optimizer
  • www.optimizeit.com
    • Good profiler, memory allocation and garbage collection monitor

2/14/01 RightOrder : Telegraph & Java 18

JNI Issues
• Not a part of Telegraph
• JNI overhead quite large (JDK 1.1.8, PII 300 MHz)

Source: Matt Welsh. A System Supporting High Performance Communication and I/O in Java. Master's Thesis, UC Berkeley, 1999.

2/14/01 RightOrder : Telegraph & Java 19

More JNI
• But, this is being worked on
  • IBM JDK: 100,000 B copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII)
• JNI allows synchronization (pin / unpin), thread management
  • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
• GCJ + CNI: access Java objects via C++ classes
  • http://gcc.gnu.org/java/

2/14/01 RightOrder : Telegraph & Java 20

Garbage Collection Performance

• Big problem: 1 s or longer to GC lots of objects
• Most Java GCs blocking (not concurrent or multi-threaded)
• Unexpected latencies
  • OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC
• In high-concurrency apps, such delays disastrous

2/14/01 RightOrder : Telegraph & Java 21

Garbage Collection Cont.
• Limited Control
  • Runtime.gc() only a hint
  • Runtime.freeMemory() unreliable
  • No way to disable
• No object reuse
  • Lots of unnecessary memory allocations
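Since the runtime offers no built-in object reuse, applications that care usually recycle objects themselves. Below is a minimal sketch of that idea with a hypothetical `BufferPool` for fixed-size byte buffers; it is not thread-safe and is not taken from Telegraph.

```java
import java.util.ArrayDeque;

/** Hypothetical single-threaded pool that recycles fixed-size byte buffers. */
final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Returns a recycled buffer if one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] buffer = free.poll();
        if (buffer == null) {
            buffer = new byte[bufferSize];
            allocations++;
        }
        return buffer;
    }

    /** Hands a buffer back for reuse; buffers of the wrong size are dropped. */
    void release(byte[] buffer) {
        if (buffer.length == bufferSize) {
            free.push(buffer);
        }
    }

    /** Number of buffers actually allocated, as opposed to reused. */
    int allocations() {
        return allocations;
    }
}
```

Recycled buffers keep their old contents, so callers must overwrite or clear them before use.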

2/14/01 RightOrder : Telegraph & Java 22

Serialization
• Not in Telegraph
• Philippsen and Haumacher, "More Efficient Object Serialization." International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999.
  • Serialization costs for RMI are 50% of total RMI time
  • Discard longevity for 7x speedup
• Sun Serialization provides versioning
  • Complete class description stored with each serialized object
  • Most standard classes forward compatible (JDK docs note special cases)
  • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html

2/14/01 RightOrder : Telegraph & Java 23

Lots of Objects
• GC Issues Serious
• Memory Management
  • GC makes programmers allocate willy-nilly
  • Hard to partition memory space
    • Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries

2/14/01 RightOrder : Telegraph & Java 24

Storage Overheads
• Java Object class is big:
  • Integer requires 23 bytes in JDK 1.3
  • int requires 4.3 bytes
  • No way to circumvent object fields
• Use primitives or hand-written serialization whenever possible
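The advice about hand-written serialization can be illustrated by comparing the bytes produced by default Java serialization of an `Integer` with a hand-written `DataOutputStream.writeInt`. The helper class `SerializationSizes` is hypothetical; exact default-serialization sizes vary, so the sketch only relies on the hand-written form being 4 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper comparing default and hand-written serialization sizes. */
final class SerializationSizes {
    private SerializationSizes() {
    }

    /** Bytes produced by default Java serialization, which includes a class description. */
    static int defaultSerializedSize(Integer value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Bytes produced by writing the primitive directly: always 4 for an int. */
    static int handWrittenSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The trade-off is the one the Serialization slide names: the hand-written form drops the class description and with it the versioning that default serialization provides.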

2/14/01 RightOrder : Telegraph & Java 25

JDBC vs ODI
• No experience with Oracle
• JDBC overheads are high, but don't have specific performance numbers

2/14/01 RightOrder : Telegraph & Java 26

Bottom Line
• Java great for many reasons
  • GC, standard libraries, type safety, introspection, etc.
  • Significant reductions in development and debugging time
• Java performance isn't bad
  • Especially with some tuning
• Memory Management an issue
• Lack of control over JVMs bad
  • When to garbage collect, how to serialize, etc.
