2/14/01 RightOrder : Telegraph & Java

Telegraph Java Experiences

Sam Madden
UC Berkeley
madden@cs.berkeley.edu

Telegraph Overview
• 100% Java
• In-memory database
• Query engine for alternative sources
    • Web
    • Sensors
• Testbed for adaptive query processing

Telegraph & WWW: FFF
• Federated Facts and Figures
    • Collect data on the election
• Based on Avnur and Hellerstein SIGMOD ’00 work: Eddies
    • Route tuples dynamically based on source loads and selectivities
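
The routing idea on this slide can be illustrated with a small sketch. It is not the routing policy from the Eddies paper and not Telegraph code: it assumes operators are simple filters, ignores source load, and greedily sends each tuple to the unvisited operator with the lowest observed pass rate. The class and method names (`EddySketch`, `route`, `timesSeen`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: a greedy, selectivity-driven router over filter operators.
class EddySketch<T> {
    private final List<Predicate<T>> ops = new ArrayList<>();
    private final List<long[]> stats = new ArrayList<>(); // {seen, passed} per operator

    void addOperator(Predicate<T> op) {
        ops.add(op);
        stats.add(new long[] {0, 0});
    }

    // Observed fraction of tuples that passed operator i (1.0 before any observation).
    private double passRate(int i) {
        long[] s = stats.get(i);
        return s[0] == 0 ? 1.0 : (double) s[1] / s[0];
    }

    // Routes one tuple through all operators; returns true if it survives every one.
    boolean route(T tuple) {
        boolean[] done = new boolean[ops.size()];
        for (int step = 0; step < ops.size(); step++) {
            int best = -1;
            for (int i = 0; i < ops.size(); i++) {
                if (!done[i] && (best < 0 || passRate(i) < passRate(best))) {
                    best = i; // most selective operator seen so far goes first
                }
            }
            done[best] = true;
            long[] s = stats.get(best);
            s[0]++;
            if (!ops.get(best).test(tuple)) {
                return false; // tuple dropped; remaining operators never see it
            }
            s[1]++;
        }
        return true;
    }

    long timesSeen(int i) {
        return stats.get(i)[0];
    }
}
```

Once the router has observed that one filter drops most tuples, that filter is tried first and the less selective operators see far fewer tuples.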

fff.cs.berkeley.edu

Architecture Overview
• Query Parser
    • JLex & CUP
• Preoptimizer
    • Chooses access paths
• Eddy
    • Routes tuples to modules

Modules
• Doubly-pipelined hash joins
• Index joins
    • For probing into web pages
• Aggregates & group-bys
• Scans
    • Telegraph Screen Scraper: view web pages as relations
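
For readers unfamiliar with the first module, here is a minimal sketch of a doubly-pipelined (symmetric) hash join. It assumes integer join keys and string payloads, and the class name `SymmetricHashJoinSketch` is hypothetical. Each arriving tuple is inserted into its own side's hash table and immediately probes the other side's table, so results appear as soon as both matching tuples have arrived.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a symmetric hash join on an integer key.
class SymmetricHashJoinSketch {
    private final Map<Integer, List<String>> left = new HashMap<>();
    private final Map<Integer, List<String>> right = new HashMap<>();

    // Feed a tuple from the left input; returns the join results it produces.
    List<String> insertLeft(int key, String value) {
        return insert(left, right, key, value, true);
    }

    // Feed a tuple from the right input; returns the join results it produces.
    List<String> insertRight(int key, String value) {
        return insert(right, left, key, value, false);
    }

    private static List<String> insert(Map<Integer, List<String>> mine,
                                       Map<Integer, List<String>> other,
                                       int key, String value, boolean fromLeft) {
        // Build step: remember this tuple for future arrivals on the other side.
        mine.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        // Probe step: match against everything already seen on the other side.
        List<String> out = new ArrayList<>();
        List<String> matches = other.get(key);
        if (matches != null) {
            for (String match : matches) {
                out.add(fromLeft ? value + "|" + match : match + "|" + value);
            }
        }
        return out;
    }
}
```

Neither input has to be read completely before output starts, which suits slow or bursty web sources.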

Execution Framework
• One thread per query
• Iterator model for queries
    • Experimented with thread per module
    • Linux threads are expensive
• Two memory management models
    • Java objects
    • Home-rolled byte arrays
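
The iterator model can be sketched as pull-based operators that wrap a child operator. The example below uses the standard `java.util.Iterator` interface as a stand-in. Telegraph's actual module interface is not shown in the slides, so `FilterIterator` is a hypothetical name.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: a selection operator in the pull-based iterator model.
class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> child;
    private final Predicate<T> predicate;
    private T lookahead;
    private boolean ready;

    FilterIterator(Iterator<T> child, Predicate<T> predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        // Pull from the child until a tuple satisfies the predicate.
        while (!ready && child.hasNext()) {
            T candidate = child.next();
            if (predicate.test(candidate)) {
                lookahead = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        return lookahead;
    }
}
```

Because every operator runs only when its parent asks for the next tuple, a whole query plan can execute on a single thread.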

Tuples as Java Objects
• Tuple data stored as a Java object
    • Each in separate byte array
• Tuples copied on joins, aggregates
• Issues
    • Memory management between modules, queries, garbage collector control
    • Allocation overhead
• Performance: 30,000 200-byte tuples/sec
    • -> 5.9 MB/sec

Tuples as Byte Array
• All tuples stored in same byte array per query
• Surrogate Java objects

[Figure: surrogate objects, each holding an (Offset, Size) pair; labels: Surrogate Objects, Byte Array, Directory]

Byte Array (cont)
• Allows explicit control over memory per query (or module)
• Compaction eliminates garbage collection randomness
• Lower throughput: 15,000 t/sec
    • No surrogate object reuse
    • Synchronization costs
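
The byte-array scheme can be sketched as follows. This is an illustration under assumptions, not Telegraph's implementation: the names `TupleArena` and `Surrogate` are hypothetical, and so are the single fixed-capacity array per query and the simple sliding compaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one byte array per query, tuples addressed via surrogates.
class TupleArena {
    // A surrogate holds no tuple bytes, only a window into the shared array.
    static final class Surrogate {
        int offset;            // rewritten when the arena is compacted
        final int size;
        boolean live = true;

        Surrogate(int offset, int size) {
            this.offset = offset;
            this.size = size;
        }
    }

    private final byte[] data;
    private int used;
    private final List<Surrogate> directory = new ArrayList<>();

    TupleArena(int capacityBytes) {
        data = new byte[capacityBytes];
    }

    // Copies the tuple into the arena; fails once the query exceeds its budget.
    Surrogate append(byte[] tuple) {
        if (used + tuple.length > data.length) {
            throw new IllegalStateException("query memory budget exceeded");
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        Surrogate s = new Surrogate(used, tuple.length);
        used += tuple.length;
        directory.add(s);
        return s;
    }

    byte[] read(Surrogate s) {
        byte[] out = new byte[s.size];
        System.arraycopy(data, s.offset, out, 0, s.size);
        return out;
    }

    void free(Surrogate s) {
        s.live = false;
    }

    // Slides live tuples toward the front; the directory is in offset order.
    void compact() {
        int write = 0;
        List<Surrogate> survivors = new ArrayList<>();
        for (Surrogate s : directory) {
            if (!s.live) {
                continue;
            }
            System.arraycopy(data, s.offset, data, write, s.size);
            s.offset = write;
            write += s.size;
            survivors.add(s);
        }
        directory.clear();
        directory.addAll(survivors);
        used = write;
    }

    int bytesUsed() {
        return used;
    }
}
```

The query decides when `compact()` runs, so reclamation happens at a chosen point rather than whenever the garbage collector fires. The sketch still allocates a new surrogate per tuple, which is one of the costs the slide lists.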

Other System Pieces
• XML-based catalog
    • Java introspection helps
• Applet-based front end
• JDBC interface
• Fault tolerance / multiple servers
    • Via simple UNIX tools

RightOrder Questions
• Performance vs. C
• JNI issues
• Garbage collection issues
• Serialization costs
• Lots of Java objects
• JDBC vs ODI

Performance vs. C
• JVM + JIT performance encouraging: IBM JIT == 60% of Intel C compiler, faster than MSC for low-level benchmarks
• IBM JIT 2x faster than HotSpot for Telegraph scans
• Stability issues
• www.javalobby.org/features/jpr

JIT Performance vs C

[Chart comparing IBM JIT, Optimized Intel, and Optimized MS]

Source: www.javalobby.org/features/jpr

Performance Gotchas
• Synchronization
    • ~2x function call overhead in HotSpot
    • Used in libraries: Vector, StringBuffer
        • String allocation single most intensive operation in Telegraph
        • Mercator: 20% initial CPU cost
• Garbage collection
    • Java dumb about reuse
    • Mercator: 15% cost
    • OceanStore: 30 ms avg latency, 1 s peak
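
The synchronization point about library classes can be checked with reflection. The sketch below only reports whether a method is declared `synchronized` and does not measure the call overhead quoted above. `SyncCheck` is a hypothetical helper. `ArrayList` is shown as the unsynchronized alternative to `Vector`, which is a suggestion of this example rather than something the slides prescribe.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Vector;

// Hypothetical helper: reports whether a public method is declared synchronized.
class SyncCheck {
    static boolean isSynchronized(Class<?> type, String name, Class<?>... params) {
        try {
            return Modifier.isSynchronized(type.getMethod(name, params).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Vector.add takes the object's monitor on every call; ArrayList.add does not.
        System.out.println("Vector.add synchronized:    "
                + isSynchronized(Vector.class, "add", Object.class));
        System.out.println("ArrayList.add synchronized: "
                + isSynchronized(ArrayList.class, "add", Object.class));
    }
}
```

In single-threaded code paths, such as one thread per query, the per-call locking in `Vector` buys nothing, so an unsynchronized collection is the usual substitute.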

More Gotchas
• Finalization
    • Declaring methods final allows inlining
• Serialization
    • RMI, JNI use serialization
    • Philippsen & Haumacher show that it is slow

Performance Tools
• Tools to address some issues
    • JAX, Jopt: make bytecode smaller, faster
        • www.alphaworks.ibm.com/tech/JAX
    • www.condensity.com
        • Bytecode optimizer
    • www.optimizeit.com
        • Good profiler, memory allocation and garbage collection monitor

JNI Issues
• Not a part of Telegraph
• JNI overhead quite large (JDK 1.1.8, PII 300 MHz)

Source: Matt Welsh. A System Supporting High Performance Communication and I/O in Java. Master’s Thesis, UC Berkeley, 1999.

More JNI
• But, this is being worked on
    • IBM JDK: 100,000 B copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII)
• JNI allows synchronization (pin / unpin), thread management
    • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
• GCJ + CNI: access Java objects via C++ classes
    • http://gcc.gnu.org/java/

Garbage Collection Performance
• Big problem: 1 s or longer to GC lots of objects
• Most Java GCs blocking (not concurrent or multi-threaded)
• Unexpected latencies
    • OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC
• In high-concurrency apps, such delays disastrous

Garbage Collection Cont.
• Limited control
    • Runtime.gc() only a hint
    • Runtime.freeMemory() unreliable
    • No way to disable
• No object reuse
    • Lots of unnecessary memory allocations
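
Both points can be illustrated with a short sketch. `GcProbe` calls the `Runtime` methods named above, and its reading is only as trustworthy as the slide suggests. `BufferPool` shows one common way for an application to reuse objects itself, since the JVM will not. Both class names are hypothetical, and the pool is a single-threaded illustration rather than anything taken from Telegraph.

```java
import java.util.ArrayDeque;

// Hypothetical probe around the Runtime calls named on the slide.
class GcProbe {
    // Approximate heap in use; the JVM may report stale or coarse values.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Requests a collection; the JVM is free to ignore the request.
    static void suggestGc() {
        Runtime.getRuntime().gc();
    }
}

// Hypothetical single-threaded pool that recycles fixed-size buffers.
class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Hands out a recycled buffer when one is available, else allocates.
    byte[] acquire() {
        byte[] buffer = free.poll();
        if (buffer == null) {
            allocations++;
            buffer = new byte[bufferSize];
        }
        return buffer;
    }

    // Returns a buffer to the pool; the caller must not touch it afterwards.
    void release(byte[] buffer) {
        if (buffer.length == bufferSize) {
            free.push(buffer);
        }
    }

    int allocations() {
        return allocations;
    }
}
```

A loop that acquires and releases one buffer at a time allocates only once, however many iterations it runs.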

Serialization
• Not in Telegraph
• Philippsen and Haumacher, “More Efficient Object Serialization.” International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999.
    • Serialization costs for RMI are 50% of total RMI time
    • Discard longevity for 7x speed up
• Sun serialization provides versioning
    • Complete class description stored with each serialized object
    • Most standard classes forward compatible (JDK docs note special cases)
    • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html
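
The cost of storing a class description with each object is easy to see by comparing stream sizes. The sketch below serializes one value with the standard `ObjectOutputStream` and writes the same value by hand with `DataOutputStream`. `SerializationSizes` is a hypothetical helper, and it compares byte counts only, not the timings reported in the paper.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper comparing default and hand-written serialized sizes.
class SerializationSizes {
    // Bytes produced by standard Java serialization, class description included.
    static int defaultSize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Bytes produced when only the raw 32-bit value is written.
    static int handWrittenSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("default:      " + defaultSize(Integer.valueOf(42)) + " bytes");
        System.out.println("hand-written: " + handWrittenSize(42) + " bytes");
    }
}
```

The hand-written form gives up the versioning that the class description provides, which is the trade-off the slide describes as discarding longevity.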

Lots of Objects
• GC issues serious
• Memory management
    • GC makes programmers allocate willy-nilly
    • Hard to partition memory space
        • Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries

Storage Overheads
• Java Object class is big:
    • Integer requires 23 bytes in JDK 1.3
    • int requires 4.3 bytes
• No way to circumvent object fields
• Use primitives or hand-written serialization whenever possible
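
As a sketch of the "use primitives" advice, the class below stores a column of integers in a growable `int[]` instead of a collection of `Integer` objects, so no per-value object header is paid. `IntColumn` is a hypothetical name and is not part of Telegraph.

```java
import java.util.Arrays;

// Hypothetical sketch: a growable column of primitive ints, with no boxing.
class IntColumn {
    private int[] values = new int[16];
    private int size;

    void add(int value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // double the capacity when full
        }
        values[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    int size() {
        return size;
    }

    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += values[i];
        }
        return total;
    }
}
```

The cost is a type-specific container for each primitive type, which is part of the ugliness the slides attribute to working around the object model.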

JDBC vs ODI
• No experience with Oracle
• JDBC overheads are high, but don’t have specific performance numbers

Bottom Line
• Java great for many reasons
    • GC, standard libraries, type safety, introspection, etc.
    • Significant reductions in development and debugging time.
• Java performance isn’t bad
    • Especially with some tuning
• Memory management an issue
• Lack of control over JVMs bad
    • When to garbage collect, how to serialize, etc.