craig kerstiens - scalable uniques in postgres @ postgres open

40
Scalable Uniques in Postgres - Craig Kerstiens Heroku Postgres

Upload: postgresopen

Post on 23-Jan-2015

749 views

Category:

Technology


5 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Scalable Uniques in Postgres -

Craig KerstiensHeroku Postgres

Page 2: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Postgresql-HLL

Page 3: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Truviso

• Extended Postgres to do streaming

• Various markets

• Ad space

• Wanted unique impressions

• Sort of wanted unique impressions

Page 4: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

SELECT count(*)

Page 5: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Approx Top K

Page 6: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Compressed Bitmap

Page 7: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog

Page 8: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog• KMV - K minimum value

Page 9: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog• KMV - K minimum value

• Bit observable patterns

Page 10: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog• KMV - K minimum value

• Bit observable patterns

• Stochastic averaging

Page 11: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog• KMV - K minimum value

• Bit observable patterns

• Stochastic averaging

• Harmonic averaging

Page 12: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog• KMV - K minimum value

• Bit observable patterns

• Stochastic averaging

• Harmonic averaging

Page 13: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog• KMV - K minimum value

• Bit observable patterns

• Stochastic averaging

• Harmonic averaging

• Implemented by Aggregate Knowledge

Page 14: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open
Page 15: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog

Probabilistic uniques with small footprint

Page 16: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

HyperLogLog

Probabilistic uniques with small footprintClose enough distinct with small footprint

Page 17: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Use cases

Page 18: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Use cases

• Semi distinct count

• Think pg_stat_statements

• Ad networks

• Web traffic

Page 19: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Use cases

• Semi distinct count

• Think pg_stat_statements

• Ad networks

• Web traffic

• With rollups/groupings

Page 20: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Digging in

CREATE  EXTENSION  hll;

   CREATE  TABLE  helloworld  (            id        integer,            set      hll    );

Page 21: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Digging in

CREATE  EXTENSION  hll;

   CREATE  TABLE  helloworld  (            id        integer,            set      hll    );

Page 22: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Inserting dataUPDATE  helloworld  SET  set  =  hll_add(set,  hll_hash_integer(12345))  WHERE  id  =  1;

UPDATE  helloworld  SET  set  =  hll_add(set,  hll_hash_text('hello  world'))  WHERE  id  =  1;

Page 23: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Real world

CREATE  TABLE  daily_uniques  (        date                        date  UNIQUE,        users                      hll);

Page 24: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Real world

INSERT  INTO  daily_uniques(date,  users)    SELECT  date,  hll_add_agg(hll_hash_integer(user_id))    FROM  users    GROUP  BY  1;

Page 25: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Real worldSELECT  

             EXTRACT(MONTH  FROM  date)  AS  month,                hll_cardinality(hll_union_agg(users))FROM  daily_uniquesWHERE  date  >=  '2012-­‐01-­‐01'  AND            date  <    '2013-­‐01-­‐01'GROUP  BY  1;

Page 26: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Real worldSELECT                EXTRACT(MONTH  FROM  date)  AS  month,                hll_cardinality(hll_union_agg(users))FROM  daily_uniquesWHERE  date  >=  '2012-­‐01-­‐01'  AND            date  <    '2013-­‐01-­‐01'GROUP  BY  1;

Page 27: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Good practices

Page 28: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Good practices

Page 29: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Good practices

• It uses update

Page 30: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Good practices

• It uses update

• Do as a batch in most cases

Page 31: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Good practices

• It uses update

• Do as a batch in most cases

• Tweak the config

Page 32: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Tuning Parameters

Page 33: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Tuning Parameters• log2m - log base 2 of registers

• Between 4 and 17

• Each 1 increase doubles storage

Page 34: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Tuning Parameters• log2m - log base 2 of registers

• Between 4 and 17

• Each 1 increase doubles storage

• regwidth - bits per register

Page 35: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Tuning Parameters• log2m - log base 2 of registers

• Between 4 and 17

• Each 1 increase doubles storage

• regwidth - bits per register

• expthresh - threshold for explicit vs sparse

Page 36: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Tuning Parameters• log2m - log base 2 of registers

• Between 4 and 17

• Each 1 increase doubles storage

• regwidth - bits per register

• expthresh - threshold for explicit vs sparse

• spareson - on/off for sparse

Page 37: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Is it better?

Page 38: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

1280 bytesEstimate count of 10s of billions

Few percent error

Page 40: Craig Kerstiens - Scalable Uniques in Postgres @ Postgres Open

Questions