Java Collections: The Force Awakens

Post on 23-Jan-2017

Java Collections: The Force Awakens

Darth @RaoulUK

Darth @RichardWarburto

#javaforceawakens

Evolution can be interesting ...

Java 1.2

Java 10?

Collection API Improvements

Persistent & Immutable Collections

Performance Improvements

Collection bugs

1. Element access (off-by-one error, ArrayIndexOutOfBoundsException)

2. Concurrent modification

3. Check-then-Act

Scenario 1

List<String> jedis = new ArrayList<>(asList("Luke", "yoda"));
for (String jedi : jedis) {
    if (Character.isLowerCase(jedi.charAt(0))) {
        // Structural change while iterating: ConcurrentModificationException
        jedis.remove(jedi);
    }
}
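One way to avoid the failure in Scenario 1 is to let the collection, or the iterator doing the traversal, perform the removal. The sketch below shows both options; the class and method names are illustrative only and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class SafeRemoval {
    // Java 8+: the list removes matching elements itself, so no iterator is invalidated.
    static List<String> withRemoveIf(List<String> source) {
        List<String> jedis = new ArrayList<>(source);
        jedis.removeIf(jedi -> Character.isLowerCase(jedi.charAt(0)));
        return jedis;
    }

    // Before Java 8: remove through the iterator that is doing the traversal.
    static List<String> withIterator(List<String> source) {
        List<String> jedis = new ArrayList<>(source);
        for (Iterator<String> it = jedis.iterator(); it.hasNext(); ) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return jedis;
    }

    public static void main(String[] args) {
        List<String> jedis = Arrays.asList("Luke", "yoda");
        System.out.println(withRemoveIf(jedis)); // [Luke]
        System.out.println(withIterator(jedis)); // [Luke]
    }
}
```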

Scenario 2

Map<String, BigDecimal> movieViews = new HashMap<>();
BigDecimal views = movieViews.get(MOVIE);
if (views != null) {
    movieViews.put(MOVIE, views.add(BigDecimal.ONE));
}

Check (movieViews.get, views != null) Then Act (movieViews.put)
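The check and the act in Scenario 2 are separate calls, so another thread can change the map in between. A minimal sketch of one alternative, assuming a ConcurrentHashMap is acceptable: Map.merge folds both steps into a single call. Unlike the snippet above, it also stores 1 for a movie that has no entry yet. ViewCounter, recordView and viewsOf are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ViewCounter {
    private final Map<String, BigDecimal> movieViews = new ConcurrentHashMap<>();

    // merge() stores ONE when the key is absent, otherwise applies BigDecimal::add.
    // On ConcurrentHashMap the whole read-modify-write is performed atomically.
    BigDecimal recordView(String movie) {
        return movieViews.merge(movie, BigDecimal.ONE, BigDecimal::add);
    }

    BigDecimal viewsOf(String movie) {
        return movieViews.getOrDefault(movie, BigDecimal.ZERO);
    }

    public static void main(String[] args) {
        ViewCounter counter = new ViewCounter();
        counter.recordView("The Force Awakens");
        counter.recordView("The Force Awakens");
        System.out.println(counter.viewsOf("The Force Awakens")); // 2
    }
}
```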

Reducing scope for bugs

● ~280 bugs in 28 projects including Cassandra, Lucene

● ~80% check-then-act bugs discovered are put-if-absent

● Library designers can help by updating APIs as new idioms emerge

● Different data structures can provide alternatives by restricting reads & updates to reduce scope for bugs

CHECK-THEN-ACT Misuse of Java Concurrent Collections

http://dig.cs.illinois.edu/papers/checkThenAct.pdf

Java 9 API updates

Collection factory methods

● Non-goal to provide persistent immutable collections

● http://openjdk.java.net/jeps/269

Live Demo using jShell

http://iteratrlearning.com/java9/2016/11/09/java9-collection-factory-methods
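For readers without jShell at hand, here is a small sketch of the Java 9 factory methods from JEP 269 (List.of, Set.of, Map.of). It needs Java 9 or later; the class name and the sample data are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FactoryMethodsDemo {
    static List<String> jedis() {
        return List.of("Luke", "Yoda");
    }

    static Set<String> sith() {
        return Set.of("Vader", "Maul");
    }

    static Map<String, Integer> episodes() {
        return Map.of("A New Hope", 4, "The Force Awakens", 7);
    }

    // The returned collections are unmodifiable: mutators throw.
    static boolean rejectsModification() {
        try {
            jedis().add("Rey");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(jedis());               // [Luke, Yoda]
        System.out.println(sith().contains("Maul"));
        System.out.println(episodes().get("A New Hope"));
        System.out.println(rejectsModification());
    }
}
```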

Persistent & Immutable Collections

Categorising Collections

Mutable: Unsynchronized, Concurrent

Unmodifiable View

Immutable: Non-Persistent, Persistent

Legend: Available in Core Library

Mutable

● Popular friends include ArrayList, HashMap, TreeSet

● Memory-efficient modification operations

● State can be accidentally modified

● Can be thread-safe, but requires careful design

Unmodifiable

List<String> jedis = new ArrayList<>();
jedis.add("Luke Skywalker");

List<String> cantChangeMe = Collections.unmodifiableList(jedis);

// java.lang.UnsupportedOperationException
// cantChangeMe.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker]

jedis.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
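The view above still changes when the backing list changes. If a stable snapshot is wanted, one common approach is to copy before wrapping. This is a sketch only, and snapshotOf is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SnapshotDemo {
    // Copy first, then wrap: later changes to the source are not visible through the result.
    static <T> List<T> snapshotOf(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> jedis = new ArrayList<>();
        jedis.add("Luke Skywalker");

        List<String> snapshot = snapshotOf(jedis);
        jedis.add("Darth Vader");

        System.out.println(snapshot); // [Luke Skywalker]
    }
}
```

The price is a full copy of the source, which is the cost the persistent collections later in the talk try to avoid.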

Immutable & Non-persistent

● No updates

● Flexibility to convert the source into a more efficient representation

● No locking in context of concurrency

● Satisfies co-variant subtyping requirements

● Can be copied with modifications to create a new version (can be expensive)

Immutable vs. Mutable hierarchy

ListIterable

  ImmutableList

    + MutableList<T> toList()

  MutableList (also a java.util.List)

    + ImmutableList<T> toImmutable()

Eclipse Collections (formerly GS Collections)

https://projects.eclipse.org/projects/technology.collections/

Immutable and Persistent

● Changing the source produces a new version of the collection

● The resulting collection shares structure with the source to avoid full copying on updates

LISP anyone?

Persistent List (aka Cons)

public final class Cons<T> implements ConsList<T> {
    private final T head;
    private final ConsList<T> tail;

    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @Override
    public ConsList<T> add(T e) {
        // Prepend: the new cell points at this list, so nothing is copied.
        return new Cons<>(e, this);
    }
}
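The slide leaves out the ConsList interface and the empty list. The following self-contained sketch fills those in under assumptions: the Nil class and the head(), tail() and isEmpty() methods are invented here for illustration. It shows that two lists built from the same base share that base rather than copying it.

```java
import java.util.NoSuchElementException;

final class PersistentListDemo {
    interface ConsList<T> {
        ConsList<T> add(T e); // prepend, returning a new list
        T head();
        ConsList<T> tail();
        boolean isEmpty();
    }

    // Hypothetical empty-list terminator (not shown on the slide).
    static final class Nil<T> implements ConsList<T> {
        public ConsList<T> add(T e) { return new Cons<>(e, this); }
        public T head() { throw new NoSuchElementException("empty list"); }
        public ConsList<T> tail() { throw new NoSuchElementException("empty list"); }
        public boolean isEmpty() { return true; }
    }

    static final class Cons<T> implements ConsList<T> {
        private final T head;
        private final ConsList<T> tail;

        Cons(T head, ConsList<T> tail) {
            this.head = head;
            this.tail = tail;
        }

        public ConsList<T> add(T e) { return new Cons<>(e, this); }
        public T head() { return head; }
        public ConsList<T> tail() { return tail; }
        public boolean isEmpty() { return false; }
    }

    public static void main(String[] args) {
        ConsList<String> base = new Nil<String>().add("Z").add("Y");
        ConsList<String> left = base.add("A");
        ConsList<String> right = base.add("B");
        // Both new lists point at the very same base cells.
        System.out.println(left.tail() == right.tail()); // true
    }
}
```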

Updating Persistent List

Before: A B C X Y Z

After: A B D

Blue nodes indicate new copies. Purple nodes indicate the nodes we wish to update.

Concatenating Two Persistent Lists

Before: A B C and X Y Z

After: A B C

- Poor locality due to pointer chasing

- Copying of nodes

Persistent List

● Structural sharing: no need to copy full structure

● Poor locality due to pointer chasing

● Copying becomes more expensive with larger lists

● Poor random access, and thus poor data decomposition

Updating Persistent Binary Tree

(Diagram: the tree before and after an update.)
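Since the tree diagrams did not survive in this transcript, here is a rough sketch of the usual technique for updating a persistent binary tree, often called path copying. It assumes an unbalanced binary search tree of int keys; all names are illustrative and not taken from the talk.

```java
final class PersistentTreeDemo {
    static final class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the root of a new tree that contains key. Only the nodes on the
    // path from the root to the insertion point are copied; every other
    // subtree is shared with the old tree, which stays valid.
    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key, null, null);
        }
        if (key < root.key) {
            return new Node(root.key, insert(root.left, key), root.right);
        }
        if (key > root.key) {
            return new Node(root.key, root.left, insert(root.right, key));
        }
        return root; // key already present: nothing to copy
    }

    static boolean contains(Node root, int key) {
        while (root != null) {
            if (key == root.key) {
                return true;
            }
            root = key < root.key ? root.left : root.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node before = insert(insert(insert(null, 5), 2), 8);
        Node after = insert(before, 9);
        System.out.println(contains(before, 9));       // false
        System.out.println(contains(after, 9));        // true
        System.out.println(before.left == after.left); // true
    }
}
```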

Persistent Array

How do we get the benefits of immutability with the performance of mutable variants?

Trie

1. Branch factor not limited to binary

2. Leaf nodes contain actual values

3. Picking the right branch is done by using parts of the key as a lookup

(Diagram: a trie from the root, with edges labelled by key parts a, b, c, e, f and values at the leaves.)

Persistent Array (Bitmapped Vector Trie)

(Diagram: Level 1 (root) and Level 2 nodes each have slots 0, 1, ... 31; below them are the leaf nodes.)
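To make the idea of using parts of the key as a lookup concrete, here is a deliberately simplified sketch, assuming a fixed two-level trie with 32 slots per node and no bounds checks. Real implementations such as Clojure's PersistentVector grow to more levels and add further optimisations; every name below is hypothetical.

```java
final class VectorTrieDemo {
    static final int BITS = 5;           // 2^5 = 32 slots per node, as in the diagram
    static final int WIDTH = 1 << BITS;
    static final int MASK = WIDTH - 1;

    // root[i] is a leaf array of 32 values, so two levels hold up to 32 * 32 = 1024 values.
    final Object[][] root;

    VectorTrieDemo(Object[][] root) {
        this.root = root;
    }

    static VectorTrieDemo of(Object... values) {
        if (values.length > WIDTH * WIDTH) {
            throw new IllegalArgumentException("too many values for two levels");
        }
        Object[][] root = new Object[WIDTH][];
        for (int i = 0; i < values.length; i++) {
            int branch = i >>> BITS;
            if (root[branch] == null) {
                root[branch] = new Object[WIDTH];
            }
            root[branch][i & MASK] = values[i];
        }
        return new VectorTrieDemo(root);
    }

    // The high 5 bits of the index pick the branch, the low 5 bits pick the slot in the leaf.
    Object get(int index) {
        return root[(index >>> BITS) & MASK][index & MASK];
    }

    // Persistent update: copy the root and the one affected leaf; all other leaves are shared.
    VectorTrieDemo set(int index, Object value) {
        int branch = (index >>> BITS) & MASK;
        Object[][] newRoot = root.clone();
        Object[] newLeaf = root[branch].clone();
        newLeaf[index & MASK] = value;
        newRoot[branch] = newLeaf;
        return new VectorTrieDemo(newRoot);
    }

    public static void main(String[] args) {
        Object[] values = new Object[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        VectorTrieDemo v1 = VectorTrieDemo.of(values);
        VectorTrieDemo v2 = v1.set(40, "forty");
        System.out.println(v1.get(40));                 // 40
        System.out.println(v2.get(40));                 // forty
        System.out.println(v1.root[0] == v2.root[0]);   // true
    }
}
```

An update here copies at most two arrays of 32 references, regardless of how many elements the vector holds.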

Trade-offs

● Large branching factor facilitates iteration but hinders updates

● Small branching factor facilitates updates but hinders traversal

Java Persistent Collections

- Not available as part of Java Core Library

- Existing projects include:

  - PCollections: https://github.com/hrldcpr/pcollections

  - Port of Clojure DS: https://github.com/krukow/clj-ds

  - Port of Scala DS: https://github.com/andrewoma/dexx

  - Now also in Javaslang: http://javaslang.io

Memory usage survey

10,000,000 elements, heap < 32GB

int[]: 40MB

Integer[]: 160MB

ArrayList<Integer>: 215MB

PersistentVector<Integer>: 214MB (Clojure-DS)

Vector<Integer>: 206MB (Dexx, port of Scala-DS)

Data collected using Java Object Layout: http://openjdk.java.net/projects/code-tools/jol/

Takeaways

● Immutable collections reduce the scope for bugs

● Always a compromise between programming safety and performance

● Performance of persistent data structures is improving

Performance Improvements

O(N)

O(1)

O(HYPERSPACE)

Primitive specialised collections

● Collections often hold boxed representations of primitive values

● Java 8 introduced IntStream, LongStream, DoubleStream and primitive specialised functional interfaces

● Other libraries, e.g. Agrona, Koloboke and Eclipse-Collections, provide primitive specialised collections today.

● Valhalla investigates primitive specialised generics
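A small sketch of the boxed versus primitive distinction using only the JDK; the class and method names are invented for illustration, and no performance numbers are claimed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

final class PrimitiveStreamDemo {
    // Boxed: every element is an Integer object that is unboxed before it is added.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer value : values) {
            total += value;
        }
        return total;
    }

    // Primitive: IntStream works on int values directly, with no Integer objects involved.
    static long sumPrimitive(int upToInclusive) {
        return IntStream.rangeClosed(1, upToInclusive).asLongStream().sum();
    }

    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            boxed.add(i); // autoboxing happens here
        }
        System.out.println(sumBoxed(boxed));   // 5050
        System.out.println(sumPrimitive(100)); // 5050
    }
}
```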

Java 8 Lazy Collection Initialization

Many allocated HashMaps and ArrayLists are never written to, e.g. with the Null Object pattern

Java 8 adds Lazy Initialization for the default initialization case

Typically 1-2% reduction in memory consumption

http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0

HashMaps Basics

...

Han Solo: hash = 72309

Chewbacca: hash = 72309

HashMaps

Chaining: a separate data structure for collision lookups

Probing: store inline and have a probing sequence
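To illustrate "store inline and have a probing sequence", here is a toy linear-probing map, assuming String keys, int values, a fixed capacity, and no removal or resizing. It is a sketch for intuition, not how any particular library implements it.

```java
final class ProbingMapDemo {
    private final String[] keys;
    private final int[] values;
    private int size;

    ProbingMapDemo(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Start at the hashed slot and walk forward (wrapping around) until the key
    // or an empty slot is found. put() always keeps one slot empty, so this terminates.
    private int slotFor(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(String key, int value) {
        int i = slotFor(key);
        if (keys[i] == null) {
            if (size == keys.length - 1) {
                throw new IllegalStateException("table full; a real map would resize");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    Integer get(String key) {
        int i = slotFor(key);
        return keys[i] == null ? null : values[i];
    }

    public static void main(String[] args) {
        ProbingMapDemo map = new ProbingMapDemo(8);
        // "Aa" and "BB" have the same String.hashCode() (2112), so they collide.
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```

The colliding entry simply lands in the next free slot of the same array, which is where the locality and memory advantages of probing come from.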

Aliases: Palpatine vs Darth Sidious

HashMaps

Chaining: aka Closed Addressing, aka Open Hashing

Probing: aka Open Addressing, aka Closed Hashing

HashMaps

Chaining: Linked List Based, Tree Based

Probing

java.util.HashMap

Chaining Based HashMap

Historically maintained a LinkedList in the case of a collision

Problem: with high collision rates the HashMap approaches O(N) lookup

java.util.HashMap in Java 8

Starts by using a List to store colliding values.

Trees used when there are over 8 elements

Tree based nodes use about twice the memory

Makes the heavy-collision lookup case O(log(N)) rather than O(N)

Relies on keys being Comparable

https://github.com/RichardWarburton/map-visualiser
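A sketch of the worst case these slides describe: a hypothetical key type whose hashCode() always collides, so every entry lands in one bucket. Because the key implements Comparable, a Java 8 HashMap can order the entries in that bucket as a tree. The example only checks that lookups stay correct; it does not measure speed.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeyDemo {
    // Hypothetical key: constant hashCode forces every entry into the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        // Gives the tree bins an ordering to search by.
        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        System.out.println(map.size());               // 1000
        System.out.println(map.get(new BadKey(777))); // 777
    }
}
```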

So which HashMap is best?

Example Jar-Jar Benchmark

call get() on a single value for a map of size 1

No model of the different factors that affect things!

Tree Optimization - 60% Collisions

Tree Optimization - 10% Collisions

Probing vs Chaining

Probing Maps usually have lower memory consumption

Small Maps: Probing never has long clusters, can be up to 91% faster.

In large maps with high collision rates, probing scales poorly and can be significantly slower.

Takeaways

There’s no clear-cut “winner”.

JDK Implementations try to minimise worst case.

Linear probing requires a good hashCode() distribution. Hash maps often “precondition” their hashes.

IdentityHashMap has low memory consumption and is fast, use it!

3rd Party libraries offer probing HashMaps, eg Koloboke & Eclipse-Collections.
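As an illustration of "preconditioning" a hash: java.util.HashMap in Java 8 XORs the high 16 bits of hashCode() into the low 16 before picking a bucket. The sketch below reproduces that one step in isolation; the class and method names are made up, and other maps use different mixing functions.

```java
final class HashSpreadDemo {
    // Fold the high 16 bits into the low 16 so they influence the bucket index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table size: only the low bits are used.
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // these two hashes differ only in their high bits
        int b = 0x20000;
        System.out.println(bucket(a, 16) + " " + bucket(b, 16));                 // 0 0
        System.out.println(bucket(spread(a), 16) + " " + bucket(spread(b), 16)); // 1 2
    }
}
```

Without the mixing step both hashes fall into bucket 0 of a 16-slot table; with it they separate.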

Conclusions

Any Questions?

www.iteratrlearning.com

● Modern Development with Java 8

● Reactive and Asynchronous Java

● Java Software Development Bootcamp

#javaforceawakens

Further reading

Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays

https://infoscience.epfl.ch/record/64410/files/techlists.pdf

Smaller Footprint for Java Collections

http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf

Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections

http://michael.steindorfer.name/publications/oopsla15.pdf

RRB-Trees: Efficient Immutable Vectors

https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf

Doug Lea’s Analysis of the HashMap implementation tradeoffs

http://www.mail-archive.com/core-libs-dev@openjdk.java.net/msg02147.html

Java Specialists HashMap article

http://www.javaspecialists.eu/archive/Issue235.html

Sample and Benchmark Code

https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens