craig kerstiens - scalable uniques in postgres @ postgres open

Scalable Uniques in Postgres -

Craig KerstiensHeroku Postgres

Postgresql-HLL

Truviso

• Extended Postgres to do streaming

• Various markets

• Ad space

• Wanted unique impressions

• Sort of wanted unique impressions

SELECT count(*)

Approx Top K

Compressed Bitmap

HyperLogLog

HyperLogLog• KMV - K minimum value

• Bit observable patterns

• Stochastic averaging

• Harmonic averaging

• Implemented by Aggregate Knowledge

HyperLogLog

Probabilistic uniques with small footprint

HyperLogLog

Probabilistic uniques with small footprintClose enough distinct with small footprint

Use cases

• Semi distinct count

• Think pg_stat_statements

• Ad networks

• Web traffic

Use cases

• Semi distinct count

• Think pg_stat_statements

• Ad networks

• Web traffic

• With rollups/groupings

Digging in

CREATE EXTENSION hll;

CREATE TABLE helloworld ( id integer, set hll );

Digging in

CREATE EXTENSION hll;

CREATE TABLE helloworld ( id integer, set hll );

Inserting dataUPDATE helloworld SET set = hll_add(set, hll_hash_integer(12345)) WHERE id = 1;

UPDATE helloworld SET set = hll_add(set, hll_hash_text('hello world')) WHERE id = 1;

Real world

CREATE TABLE daily_uniques ( date date UNIQUE, users hll);

Real world

INSERT INTO daily_uniques(date, users) SELECT date, hll_add_agg(hll_hash_integer(user_id)) FROM users GROUP BY 1;

Real worldSELECT

EXTRACT(MONTH FROM date) AS month, hll_cardinality(hll_union_agg(users))FROM daily_uniquesWHERE date >= '2012-‐01-‐01' AND date < '2013-‐01-‐01'GROUP BY 1;

Real worldSELECT EXTRACT(MONTH FROM date) AS month, hll_cardinality(hll_union_agg(users))FROM daily_uniquesWHERE date >= '2012-‐01-‐01' AND date < '2013-‐01-‐01'GROUP BY 1;

Good practices

• It uses update

Good practices

• It uses update

• Do as a batch in most cases

Good practices

• It uses update

• Do as a batch in most cases

• Tweak the config

Tuning Parameters

Tuning Parameters• log2m - log base 2 of registers

• Between 4 and 17

• Each 1 increase doubles storage

• regwidth - bits per register

• expthresh - threshold for explicit vs sparse

• spareson - on/off for sparse

Is it better?

1280 bytesEstimate count of 10s of billions

Few percent error

Resources• https://github.com/aggregateknowledge/

postgresql-hll

• http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/

• http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog

Questions

craig kerstiens - scalable uniques in postgres @ postgres open

hyperloglog kmv

minimum value

uniques datedateunique

uniques wheredate

scalable uniques

distinct count think

small footprint close

real world insertintodaily

Technology

keith paskett - postgres on zfs @ postgres open

postgres plus cloud database -...

postgres enterprise manager · postgres enterprise manager,...

scaling postgres

postgres open, september 2014: postgres interface...

configuring edb postgres advanced server streaming...

bob dalesio, deb kerstiens, rozelle wright oct 1, 2001

des photos uniques

using vmware vfabric postgres - vfabric postgres 9.3 · pdf...

postgres enterprise manager installation guide · postgres...

postgres (for non-postgres people) · 2020-01-04 · greg...

john melesky - federating queries using postgres fdw @...

violet collection by gate uniques

young uniques africa

using vmware vfabric postgres - vfabric postgres 9.3.5

postgres plus advanced server installation...

postgres tutorial

postgres user

scaling php to 40 million uniques

heroku's craig kerstiens: developer marketing channels