2/14/01 RightOrder : Telegraph & Java 1
Telegraph Java Experiences
Sam Madden
UC Berkeley
madden@cs.berkeley.edu
2/14/01 RightOrder : Telegraph & Java 2
Telegraph Overview
• 100% Java
• In-memory database
• Query engine for alternative sources
  • Web
  • Sensors
• Testbed for adaptive query processing
2/14/01 RightOrder : Telegraph & Java 3
Telegraph & WWW: FFF
• Federated Facts and Figures
• Collect data on the election
• Based on Avnur and Hellerstein SIGMOD '00 work: Eddies
  • Route tuples dynamically based on source loads and selectivities
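The routing idea can be illustrated with a minimal sketch. It is not Telegraph code: the class `MiniEddy`, its methods, and the greedy "lowest observed pass rate first" policy are assumptions made for illustration, and only selectivity is modeled (source load is ignored). The eddy described by Avnur and Hellerstein uses a more elaborate routing policy.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch, not Telegraph code: an eddy over selection modules.
// Each tuple (here just an int) visits every module in an order chosen per tuple.
class MiniEddy {
    private final IntPredicate[] modules;
    private final int[] seen;    // tuples routed to each module so far
    private final int[] passed;  // tuples each module let through so far

    MiniEddy(IntPredicate... modules) {
        this.modules = modules;
        this.seen = new int[modules.length];
        this.passed = new int[modules.length];
    }

    // Observed fraction of tuples that survived module m (1.0 before any data).
    private double passRate(int m) {
        return seen[m] == 0 ? 1.0 : (double) passed[m] / seen[m];
    }

    // Routes one tuple; returns true only if it passes every module.
    boolean route(int tuple) {
        boolean[] done = new boolean[modules.length];
        for (int step = 0; step < modules.length; step++) {
            // Assumed policy: pick the unvisited module that drops the most tuples.
            int next = -1;
            for (int m = 0; m < modules.length; m++) {
                if (!done[m] && (next < 0 || passRate(m) < passRate(next))) {
                    next = m;
                }
            }
            done[next] = true;
            seen[next]++;
            if (!modules[next].test(tuple)) {
                return false; // dropped early, remaining modules are skipped
            }
            passed[next]++;
        }
        return true;
    }

    int seenBy(int m) {
        return seen[m];
    }
}
```

Because the order is re-decided for every tuple, the eddy drifts toward sending tuples to the most selective module first as statistics accumulate, without a fixed plan chosen up front.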
2/14/01 RightOrder : Telegraph & Java 4
fff.cs.berkeley.edu
2/14/01 RightOrder : Telegraph & Java 5
Architecture Overview
• Query Parser
  • JLex & CUP
• Preoptimizer
  • Chooses access paths
• Eddy
  • Routes tuples to modules
2/14/01 RightOrder : Telegraph & Java 6
Modules
• Doubly-pipelined hash joins
• Index joins
  • For probing into web pages
• Aggregates & group-bys
• Scans
  • Telegraph Screen Scraper: view web pages as relations
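A doubly-pipelined (symmetric) hash join keeps a hash table on each input and probes the opposite table as every tuple arrives, so results stream out without waiting for either input to finish. The sketch below is illustrative only; the class `SymmetricHashJoin`, the integer key, and the string payloads are assumptions, not Telegraph's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Telegraph code: symmetric hash join on an int key.
class SymmetricHashJoin {
    private final Map<Integer, List<String>> leftTable = new HashMap<>();
    private final Map<Integer, List<String>> rightTable = new HashMap<>();

    // Store a left tuple, then probe the right table for matches seen so far.
    List<String> insertLeft(int key, String payload) {
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> results = new ArrayList<>();
        for (String right : rightTable.getOrDefault(key, Collections.<String>emptyList())) {
            results.add(payload + "|" + right);
        }
        return results;
    }

    // Store a right tuple, then probe the left table for matches seen so far.
    List<String> insertRight(int key, String payload) {
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> results = new ArrayList<>();
        for (String left : leftTable.getOrDefault(key, Collections.<String>emptyList())) {
            results.add(left + "|" + payload);
        }
        return results;
    }
}
```

Each matching pair is emitted exactly once, by whichever side arrives second. The cost is that both inputs are held in memory, which is where the memory management concerns on the later slides come in.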
2/14/01 RightOrder : Telegraph & Java 7
Execution Framework
• One thread per query
• Iterator model for queries
• Experimented with thread per module
  • Linux threads are expensive
• Two memory management models
  • Java objects
  • Home-rolled byte arrays
2/14/01 RightOrder : Telegraph & Java 8
Tuples as Java Objects
• Tuple data stored as a Java object
  • Each in a separate byte array
• Tuples copied on joins, aggregates
• Issues
  • Memory management between modules, queries, garbage collector control
  • Allocation overhead
• Performance: 30,000 200-byte tuples / sec -> 5.9 MB / sec
2/14/01 RightOrder : Telegraph & Java 9
Tuples As Byte Array
• All tuples stored in same byte array / query
• Surrogate Java objects
[Diagram: a directory of surrogate objects, each holding an (offset, size) pair that points into the shared byte array]
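The layout on this slide can be sketched as follows. This is a minimal illustration under assumptions, not Telegraph's code: the names `TupleArena`, `Surrogate`, `free`, and `compact` are hypothetical, and the sketch is single-threaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Telegraph code: all tuples of a query live in one
// byte array, and small surrogate objects record where each tuple sits.
class TupleArena {
    static final class Surrogate {
        int offset;          // updated when the arena is compacted
        final int size;
        boolean live = true;

        Surrogate(int offset, int size) {
            this.offset = offset;
            this.size = size;
        }
    }

    private final byte[] data;
    private int used = 0;
    private final List<Surrogate> directory = new ArrayList<>();

    TupleArena(int capacity) {
        data = new byte[capacity];
    }

    // Copy a tuple's bytes to the end of the array and return its surrogate.
    Surrogate add(byte[] tuple) {
        if (used + tuple.length > data.length) {
            throw new IllegalStateException("arena full");
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        Surrogate s = new Surrogate(used, tuple.length);
        used += tuple.length;
        directory.add(s);
        return s;
    }

    byte[] read(Surrogate s) {
        if (!s.live) {
            throw new IllegalStateException("tuple was freed");
        }
        return Arrays.copyOfRange(data, s.offset, s.offset + s.size);
    }

    void free(Surrogate s) {
        s.live = false;
    }

    // Slide live tuples toward the front and fix up their offsets.
    void compact() {
        int write = 0;
        List<Surrogate> kept = new ArrayList<>();
        for (Surrogate s : directory) {
            if (!s.live) {
                continue;
            }
            System.arraycopy(data, s.offset, data, write, s.size);
            s.offset = write;
            write += s.size;
            kept.add(s);
        }
        directory.clear();
        directory.addAll(kept);
        used = write;
    }

    int bytesUsed() {
        return used;
    }
}
```

The point of the design is that the query, not the garbage collector, decides when space is reclaimed: `compact()` runs when the caller chooses, and the arena's capacity caps the memory one query can hold.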
2/14/01 RightOrder : Telegraph & Java 10
Byte Array (cont)
• Allows explicit control over memory / query (or module)
• Compaction eliminates garbage collection randomness
• Lower throughput: 15,000 t/sec
  • No surrogate object reuse
  • Synchronization costs
2/14/01 RightOrder : Telegraph & Java 11
Other System Pieces
• XML-based catalog
  • Java introspection helps
• Applet-based front end
• JDBC interface
• Fault tolerance / multiple servers
  • Via simple UNIX tools
2/14/01 RightOrder : Telegraph & Java 12
RightOrder Questions
• Performance vs. C
• JNI issues
• Garbage collection issues
• Serialization costs
• Lots of Java objects
• JDBC vs ODI
2/14/01 RightOrder : Telegraph & Java 13
Performance vs. C
• JVM + JIT performance encouraging: IBM JIT == 60% of Intel C compiler, faster than MSC for low-level benchmarks
• IBM JIT 2x faster than HotSpot for Telegraph scans
• Stability issues
• www.javalobby.org/features/jpr
2/14/01 RightOrder : Telegraph & Java 14
JIT Performance vs C
[Chart: benchmark results for IBM JIT, Optimized Intel, and Optimized MS]
Source: www.javalobby.org/features/jpr
2/14/01 RightOrder : Telegraph & Java 15
Performance Gotchas
• Synchronization
  • ~2x function call overhead in HotSpot
  • Used in libraries: Vector, StringBuffer
    • String allocation single most intensive operation in Telegraph
    • Mercator: 20% initial CPU cost
• Garbage collection
  • Java dumb about reuse
  • Mercator: 15% cost
  • OceanStore: 30 ms avg latency, 1 s peak
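One way to see where library synchronization hides is to ask the reflection API whether a method is declared `synchronized`. The helper below is an illustrative sketch (the class name `SyncCheck` is an assumption); it only inspects method modifiers and does not measure the call overhead quoted above.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Vector;

// Hypothetical helper: reports whether a public method is declared synchronized.
class SyncCheck {
    static boolean isSynchronized(Class<?> type, String name, Class<?>... params) {
        try {
            Method m = type.getMethod(name, params);
            return Modifier.isSynchronized(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Generic element types erase to Object in the reflective signature.
        System.out.println("Vector.add: "
                + isSynchronized(Vector.class, "add", Object.class));
        System.out.println("ArrayList.add: "
                + isSynchronized(ArrayList.class, "add", Object.class));
        System.out.println("StringBuffer.append(String): "
                + isSynchronized(StringBuffer.class, "append", String.class));
    }
}
```

Where a collection is only touched by one thread, an unsynchronized class such as `ArrayList` avoids paying for a lock on every call.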
2/14/01 RightOrder : Telegraph & Java 16
More Gotchas
• Finalization
  • Declaring methods final allows inlining
• Serialization
  • RMI, JNI use serialization
  • Philippsen & Haumacher show performance slowness
2/14/01 RightOrder : Telegraph & Java 17
Performance Tools
• Tools to address some issues
  • JAX, Jopt: make bytecode smaller, faster
    • www.alphaworks.ibm.com/tech/JAX
  • www.condensity.com
    • Bytecode optimizer
  • www.optimizeit.com
    • Good profiler, memory allocation and garbage collection monitor
2/14/01 RightOrder : Telegraph & Java 18
JNI Issues
• Not a part of Telegraph
• JNI overhead quite large (JDK 1.1.8, PII 300 MHz)
Source: Matt Welsh. A System Supporting High Performance Communication and I/O in Java. Master’s Thesis, UC Berkeley, 1999.
2/14/01 RightOrder : Telegraph & Java 19
More JNI
• But, this is being worked on
  • IBM JDK: 100,000 B copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII)
• JNI allows synchronization (pin / unpin), thread management
  • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
• GCJ + CNI: access Java objects via C++ classes
  • http://gcc.gnu.org/java/
2/14/01 RightOrder : Telegraph & Java 20
Garbage Collection Performance
• Big problem: 1 s or longer to GC lots of objects
• Most Java GCs blocking (not concurrent or multi-threaded)
• Unexpected latencies
  • OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC
• In high-concurrency apps, such delays disastrous
2/14/01 RightOrder : Telegraph & Java 21
Garbage Collection Cont.
• Limited control
  • Runtime.gc() only a hint
  • Runtime.freeMemory() unreliable
  • No way to disable
• No object reuse
  • Lots of unnecessary memory allocations
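Since the collector cannot be steered, a common workaround is to reuse objects explicitly so that fewer allocations reach it. The sketch below shows a small buffer pool next to the two `Runtime` calls named on this slide. The class `BufferPool` and its methods are hypothetical, and the pool is not thread-safe.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: reuse fixed-size buffers instead of allocating new ones.
class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations = 0;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Hand out a recycled buffer if one is available, otherwise allocate.
    byte[] acquire() {
        byte[] buffer = free.poll();
        if (buffer == null) {
            allocations++;
            buffer = new byte[bufferSize];
        }
        return buffer;
    }

    // Return a buffer for later reuse; contents are not cleared.
    void release(byte[] buffer) {
        if (buffer.length == bufferSize) {
            free.push(buffer);
        }
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long before = rt.freeMemory(); // an approximation only
        rt.gc();                       // a request; the JVM may ignore it
        long after = rt.freeMemory();
        System.out.println("free before=" + before + " after=" + after);

        BufferPool pool = new BufferPool(200);
        for (int i = 0; i < 1000; i++) {
            byte[] b = pool.acquire();
            pool.release(b);
        }
        System.out.println("allocations=" + pool.allocations());
    }
}
```

The trade-off is that the program takes on bookkeeping the collector would otherwise do: a buffer released too early, or never released, becomes the programmer's bug.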
2/14/01 RightOrder : Telegraph & Java 22
Serialization
• Not in Telegraph
• Philippsen and Haumacher, “More Efficient Object Serialization.” International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999.
  • Serialization costs for RMI are 50% of total RMI time
  • Discard longevity for 7x speed up
• Sun serialization provides versioning
  • Complete class description stored with each serialized object
  • Most standard classes forward compatible (JDK docs note special cases)
  • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html
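The cost of storing a class description with each object shows up directly in the stream size. The sketch below compares default Java serialization with a hand-written encoding for a two-field object. The `Point` class and the method names are assumptions for illustration, and the sketch measures bytes only, not the time figures reported by Philippsen and Haumacher.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch: byte counts for default vs. hand-written serialization.
class SerializationSizes {
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Default serialization: stream header and class description plus field data.
    static int defaultSize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Hand-written format: just the two ints, with no versioning information.
    static int handWrittenSize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println("default: " + defaultSize(p) + " bytes");
        System.out.println("hand-written: " + handWrittenSize(p) + " bytes");
    }
}
```

The hand-written form gives up the versioning that the default mechanism provides: the reader must already know the exact field order and types.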
2/14/01 RightOrder : Telegraph & Java 23
Lots of Objects
• GC issues serious
• Memory management
  • GC makes programmers allocate willy-nilly
  • Hard to partition memory space
    • Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries
2/14/01 RightOrder : Telegraph & Java 24
Storage Overheads
• Java Object class is big:
  • Integer requires 23 bytes in JDK 1.3
  • int requires 4.3 bytes
• No way to circumvent object fields
• Use primitives or hand-written serialization whenever possible
2/14/01 RightOrder : Telegraph & Java 25
JDBC vs ODI
• No experience with Oracle
• JDBC overheads are high, but don’t have specific performance numbers
2/14/01 RightOrder : Telegraph & Java 26
Bottom Line
• Java great for many reasons
  • GC, standard libraries, type safety, introspection, etc.
  • Significant reductions in development and debugging time
• Java performance isn’t bad
  • Especially with some tuning
• Memory management an issue
• Lack of control over JVMs bad
  • When to garbage collect, how to serialize, etc.