Lucene KV-Store
A high-performance key-value store for large datasets
Mark Harwood
Benefits
High-speed reads and writes of key/value pairs sustained over growing volumes of data
Read costs are always 0 or 1 disk seek
Efficient use of memory
Simple file structures with strong durability guarantees
Why “Lucene” KV store?
Uses Lucene’s “Directory” APIs for low-level file access
Based on Lucene’s concepts of segment files, soft deletes, background merges, commit points, etc., BUT with a fundamentally different form of index
I’d like to offer it to the Lucene community as a “contrib” module because they have a track record in optimizing these same concepts (and could potentially make use of it in Lucene?)
Example benchmark results
Note: regular Lucene search indexes follow the same trajectory as the “Common KV Store” when it comes to lookups on a store with millions of keys
KV-Store High-level Design
Map held in RAM (key hash → disk pointer):

  Key hash (int)   Disk pointer (int)
  23434            0
  6545463          10
  874382           22

Disk record layout at each pointer:

  Num keys with hash (VInt)
  Key 1 size (VInt)
  Key 1 (byte[])
  Value 1 size (VInt)
  Value 1 (byte[])
  Key/values 2, 3, 4…

Example disk records:

  1 3 Foo 3 Bar                          (most hashes have only one associated key and value)
  2 5 Hello 5 World 7 Bonjour 8 Le Mon…  (some hashes have key collisions, requiring extra key/value entries in the record)
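As a concrete sketch of the record layout above (illustrative only: the class and method names below are invented, not the actual module’s code), one hash bucket could be encoded with Lucene’s DataOutput, which already provides the VInt encoding used here:

import java.io.IOException;
import java.util.List;
import org.apache.lucene.store.DataOutput;

// Illustrative only: encodes one hash "bucket" (all key/value pairs whose keys share a hash)
// in the layout shown above.
final class BucketWriter {

    static final class Entry {
        final byte[] key;
        final byte[] value;
        Entry(byte[] key, byte[] value) { this.key = key; this.value = value; }
    }

    // Writes: num keys with hash (VInt), then per entry: key size (VInt), key bytes,
    // value size (VInt), value bytes.
    static void writeBucket(DataOutput out, List<Entry> entries) throws IOException {
        out.writeVInt(entries.size());                    // "Num keys with hash"
        for (Entry e : entries) {
            out.writeVInt(e.key.length);                  // "Key size"
            out.writeBytes(e.key, 0, e.key.length);       // "Key (byte[])"
            out.writeVInt(e.value.length);                // "Value size"
            out.writeBytes(e.value, 0, e.value.length);   // "Value (byte[])"
        }
    }
}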
Read logic (pseudo code)
int keyHash = hash(searchKey);
Integer filePointer = ramMap.get(keyHash);
if (filePointer == null)
    return null;                      // hash never stored: zero disk seeks
file.seek(filePointer);               // the single random disk seek
int numKeysWithHash = file.readVInt();
for (int i = 0; i < numKeysWithHash; i++)
{
    storedKey = file.readKeyData();
    if (storedKey.equals(searchKey))
        return file.readValueData();  // found the matching key in this bucket
    file.readValueData();             // skip the value of a non-matching key
}
return null;                          // bucket exists but the key is absent
There is a guaranteed maximum of one random disk seek for any lookup. With a good hashing function, most lookups will only need to go once around this loop.
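A rough Java rendering of this read path, assuming the data file is read through Lucene’s IndexInput and the RAM map is a Map from key hash to file offset (class and method names are illustrative, not the module’s actual API):

import java.io.IOException;
import java.util.Arrays;
import java.util.Map;
import org.apache.lucene.store.IndexInput;

// Illustrative read path: at most one random disk seek per lookup.
final class BucketReader {

    static byte[] lookup(Map<Integer, Long> ramMap, IndexInput in, byte[] searchKey) throws IOException {
        int keyHash = Arrays.hashCode(searchKey);    // stand-in for the store's hash function
        Long filePointer = ramMap.get(keyHash);
        if (filePointer == null) {
            return null;                             // hash never stored: zero disk seeks
        }
        in.seek(filePointer);                        // the single random seek
        int numKeysWithHash = in.readVInt();
        for (int i = 0; i < numKeysWithHash; i++) {
            byte[] storedKey = readBytes(in);
            byte[] storedValue = readBytes(in);
            if (Arrays.equals(storedKey, searchKey)) {
                return storedValue;                  // found the matching key
            }
            // non-matching collision entry: its value has already been read past
        }
        return null;                                 // bucket exists but the key is absent
    }

    private static byte[] readBytes(IndexInput in) throws IOException {
        byte[] data = new byte[in.readVInt()];       // size (VInt) followed by raw bytes
        in.readBytes(data, 0, data.length);
        return data;
    }
}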
Write logic (pseudo code)
int keyHash = hash(newKey);
Integer oldFilePointer = ramMap.get(keyHash);
ramMap.put(keyHash, file.length());   // the new bucket will be appended at the current end of file
if (oldFilePointer == null)
{
    file.append(1);                   // only 1 key with this hash
    file.append(newKey);
    file.append(newValue);
}
else
{
    file.seek(oldFilePointer);
    int numOldKeys = file.readVInt();
    Map tmpMap = file.readNextNKeysAndValues(numOldKeys);
    tmpMap.put(newKey, newValue);     // replaces any earlier value stored under this key
    file.append(tmpMap.size());
    file.appendKeysAndValues(tmpMap);
}
Updates always append to the end of the file, leaving older values unreferenced. In the case of key collisions, previously stored values are copied to the new position at the end of the file along with the new content.
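The same write path sketched in Java, assuming appends go through Lucene’s IndexOutput and old buckets are re-read through an IndexInput (again, names are illustrative rather than the module’s actual API):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;

// Illustrative write path: every write appends a fresh bucket; any older copy of the
// bucket is simply left unreferenced until a merge reclaims the space.
final class BucketAppender {

    static void put(Map<Integer, Long> ramMap, IndexInput reader, IndexOutput writer,
                    byte[] newKey, byte[] newValue) throws IOException {
        int keyHash = Arrays.hashCode(newKey);
        Long oldFilePointer = ramMap.get(keyHash);
        ramMap.put(keyHash, writer.getFilePointer());   // the new bucket starts at the end of file
        if (oldFilePointer == null) {
            writer.writeVInt(1);                        // only one key with this hash so far
            writeEntry(writer, newKey, newValue);
        } else {
            // Collision or update: copy the old bucket forward, dropping any stale value for newKey.
            reader.seek(oldFilePointer);
            int numOldKeys = reader.readVInt();
            List<byte[][]> kept = new ArrayList<>();
            for (int i = 0; i < numOldKeys; i++) {
                byte[] k = readBytes(reader);
                byte[] v = readBytes(reader);
                if (!Arrays.equals(k, newKey)) {
                    kept.add(new byte[][] { k, v });
                }
            }
            writer.writeVInt(kept.size() + 1);
            for (byte[][] kv : kept) {
                writeEntry(writer, kv[0], kv[1]);
            }
            writeEntry(writer, newKey, newValue);       // the latest value wins
        }
    }

    private static void writeEntry(IndexOutput out, byte[] key, byte[] value) throws IOException {
        out.writeVInt(key.length);
        out.writeBytes(key, 0, key.length);
        out.writeVInt(value.length);
        out.writeBytes(value, 0, value.length);
    }

    private static byte[] readBytes(IndexInput in) throws IOException {
        byte[] data = new byte[in.readVInt()];
        in.readBytes(data, 0, data.length);
        return data;
    }
}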
Segment generations: writes
[Diagram: one hash → pointer map held in RAM per segment, each paired with its own key and value disk store; segment 0 is the old, read-only generation and segment 1 is the new generation receiving writes]
Writes append to the end of the latest-generation segment until it reaches a set size; it is then made read-only and a new segment is created.
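A minimal sketch of this rollover policy (the Segment interface and the size threshold are assumptions for illustration, not the module’s actual API):

import java.io.IOException;

// Writes always target the newest segment; once it grows past a configured size it is
// frozen and a fresh segment is opened for subsequent writes.
final class SegmentRollover {

    interface Segment {
        long generation();
        long sizeInBytes();
        void append(byte[] key, byte[] value) throws IOException;
        void makeReadOnly() throws IOException;   // existing pointers stay valid; no further appends
    }

    interface SegmentFactory {
        Segment create(long generation) throws IOException;
    }

    private static final long MAX_SEGMENT_BYTES = 256L * 1024 * 1024;  // example "set size"

    private final SegmentFactory factory;
    private Segment current;

    SegmentRollover(SegmentFactory factory, Segment first) {
        this.factory = factory;
        this.current = first;
    }

    void write(byte[] key, byte[] value) throws IOException {
        if (current.sizeInBytes() >= MAX_SEGMENT_BYTES) {
            current.makeReadOnly();                            // older generation becomes immutable
            current = factory.create(current.generation() + 1);
        }
        current.append(key, value);                            // appends go to the latest generation
    }
}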
Segment generations: reads
[Diagram: the same per-segment RAM maps (hash → pointer) and key and value disk stores, old generation 0 and new generation 1]
Read operations search the memory maps in reverse order. The first map found with a hash is expected to have a pointer into its associated file for all the latest keys/values with this hash.
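A sketch of that cross-segment lookup order (SegmentReader is an assumed interface, used only for illustration):

import java.io.IOException;
import java.util.List;

// Newest segment first: the first in-RAM map that knows the hash is authoritative,
// so at most one segment (and one disk seek) is ever consulted.
final class GenerationalReader {

    interface SegmentReader {
        boolean containsHash(int keyHash);                                 // RAM-only check
        byte[] lookup(int keyHash, byte[] searchKey) throws IOException;   // one seek at most
    }

    static byte[] get(List<SegmentReader> oldestToNewest, int keyHash, byte[] searchKey)
            throws IOException {
        for (int i = oldestToNewest.size() - 1; i >= 0; i--) {             // reverse (newest-first) order
            SegmentReader segment = oldestToNewest.get(i);
            if (segment.containsHash(keyHash)) {
                // This segment holds the latest keys/values for the hash.
                return segment.lookup(keyHash, searchKey);
            }
        }
        return null;                                                       // no segment has seen this hash
    }
}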
Segment generations: merges
[Diagram: per-segment RAM maps (hash → pointer) and key and value disk stores, before and after a merge of the older, read-only generations]
A background thread merges read-only segments with many outdated entries into new, more compact versions.
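One way such a compaction could look, sketched for a single segment (the BucketStore interface is an assumption for illustration; the real merge logic may also combine several read-only segments into one):

import java.io.IOException;
import java.util.Map;

// Only buckets still referenced by the segment's in-RAM map are copied into the
// replacement file, so space held by superseded (unreferenced) buckets is reclaimed.
final class SegmentCompactor {

    interface BucketStore {
        byte[] readBucket(long filePointer) throws IOException;   // raw bytes of one bucket
        long appendBucket(byte[] bucket) throws IOException;      // returns the new pointer
    }

    static void compact(Map<Integer, Long> liveMap, BucketStore oldStore, BucketStore newStore)
            throws IOException {
        for (Map.Entry<Integer, Long> entry : liveMap.entrySet()) {
            byte[] bucket = oldStore.readBucket(entry.getValue());
            long newPointer = newStore.appendBucket(bucket);
            entry.setValue(newPointer);                            // repoint the hash at the compact copy
        }
        // Once readers have switched to the new file, the old one can be deleted.
    }
}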
Segment generations: durability
[Diagram: per-segment RAM maps (hash → pointer) and key and value disk stores, with completed segments 0 and 4 alongside the active segment]
Like Lucene, commit operations create a new generation of a “segments” file, the contents of which reflect the committed (i.e. fsync’ed) state of the store:

  Completed Segment IDs: 0, 4
  Active Segment ID: 3
  Active segment committed length: 423423
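A sketch of what such a commit might look like with Lucene’s Directory API (the file name and field layout below are assumptions, not the module’s actual format):

import java.io.IOException;
import java.util.List;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;

// Data files are fsync'ed first, then a new generation of the small "segments" file
// records exactly what is durable; the previous generation remains as a fallback.
final class CommitWriter {

    static void commit(Directory dir, long generation, List<Integer> completedSegmentIds,
                       int activeSegmentId, long activeCommittedLength,
                       List<String> dataFileNames) throws IOException {
        dir.sync(dataFileNames);                           // make the key/value data durable first
        String segmentsFile = "segments_" + generation;    // hypothetical naming scheme
        try (IndexOutput out = dir.createOutput(segmentsFile, IOContext.DEFAULT)) {
            out.writeVInt(completedSegmentIds.size());
            for (int id : completedSegmentIds) {
                out.writeVInt(id);                         // e.g. 0, 4
            }
            out.writeVInt(activeSegmentId);                // e.g. 3
            out.writeVLong(activeCommittedLength);         // e.g. 423423
        }
        dir.sync(List.of(segmentsFile));                   // the commit point itself must be durable
    }
}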
Implementation details
JVM needs sufficient RAM for 2 ints for every active key (note: using “modulo N” on the hash can cap RAM at N x 2 ints, at the cost of more key collisions = more disk IO)
Uses Lucene Directory for:
  Abstraction from choice of file system
  Buffered reads/writes
  Support for VInt encoding of numbers
  Rate-limited merge operations
Borrows successful Lucene concepts:
  Multiple segments flushed then made read-only
  “Segments” file used to list committed content (could potentially support multiple commit points)
  Background merges
Uses LGPL “Trove” for maps of primitives
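A small illustration of the RAM-sizing point and the “modulo N” trade-off, assuming Trove 3’s TIntIntHashMap (the helper names are invented for the example):

import gnu.trove.map.hash.TIntIntHashMap;   // Trove 3.x package path assumed
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Two ints per map entry; capping hashes with "modulo N" bounds the map at N entries,
// trading RAM for extra collisions (and therefore extra disk reads per lookup).
final class RamSizingDemo {

    static int bucketHash(byte[] key, int modulo) {
        int h = Arrays.hashCode(key);
        return modulo <= 0 ? h : Math.floorMod(h, modulo);   // modulo <= 0 means "no cap"
    }

    public static void main(String[] args) {
        TIntIntHashMap ramMap = new TIntIntHashMap();        // primitive map: no boxing overhead
        int modulo = 1 << 20;                                // cap at ~1M entries, roughly 8 MB of int payload
        byte[] key = "user:42".getBytes(StandardCharsets.UTF_8);
        ramMap.put(bucketHash(key, modulo), 12345 /* file pointer of this key's bucket */);
        System.out.println(ramMap.containsKey(bucketHash(key, modulo)));   // true
    }
}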