Audience Segments: Technical Aspects of Audience Targeting in DSP. Ivan Michailov, techhangout #6
TRANSCRIPT
Audience segments
Technical aspects of audience targeting in DSP
What is an audience segment?
- a named set of device IDs or cookies
- IDs are any advertising device IDs or their hashes: IDFA, UDID, AAID, Android Device ID, MD5, SHA1
- can be self-gathered (1st-party) or provided by a DMP (3rd-party)
Taxonomy
Example path: “In-Market --> Autos --> Makes & Models --> Audi --> A4”
Example IDs:
idfa:6630580C-4347-4FFF-8AEB-19530C143800
idfa:4C34D479-2F53-4673-B119-5F9AF0FE6BB7
aaid:3b93df69-8b57-4b64-a874-44eb37e3312c
aaid_md5:134c1ca7d0abc2f5c0b046b81f7941a3
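As a hedged illustration of the hashed ID variants (MD5, SHA1), a value such as the aaid_md5 above can be produced by hashing the ID string; the lowercase normalization used here is an assumption, since the exact convention used by each DMP is not stated in the talk.

```python
import hashlib

def hash_device_id(raw_id: str) -> dict:
    # Illustrative only: assumes hashes are taken over the lowercase ID string;
    # the actual normalization used by each DMP may differ.
    normalized = raw_id.strip().lower()
    return {
        "md5": hashlib.md5(normalized.encode("utf-8")).hexdigest(),
        "sha1": hashlib.sha1(normalized.encode("utf-8")).hexdigest(),
    }

print(hash_device_id("3b93df69-8b57-4b64-a874-44eb37e3312c"))
```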
Some numbers
Total:
- segment size up to a billion device IDs (10^9)
- up to 200,000 segments per DMP (10^5)
- the total ID set contains more than 7 billion device IDs (7*10^9)
- gzipped segment content size >75 TB
- several DMPs: BlueKai, Lotame, Mobext, Statiq, etc.
Used in active campaigns:
- segment count ~200
- total unique ID count ~1*10^9
- data size ~500 GB
Requirements
- support large sizes and counts
- reply in 20 ms (100 ms for the whole bidding cycle); see the sketch after this list
- support multiple datacenters: bidders are spread over multiple DCs, and ADB instances should be local to keep latency low, which requires full replication
- integration with many DMPs
- regular updates
- short update cycle for self-gathered segments; they should be available while gathering is still in progress
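The 20 ms lookup budget has to be enforced by the bidder itself. Below is a minimal, hypothetical sketch (the function names and the timeout-handling policy are assumptions, not part of the talk) of how the segment lookup could be bounded inside the 100 ms bidding cycle.

```python
import asyncio

SEGMENT_LOOKUP_BUDGET = 0.020  # 20 ms of the ~100 ms bidding cycle

async def lookup_segments(device_id: str) -> set[str]:
    # Placeholder for the real call into the audience-data backend
    # (KV store or Bloom-filter cluster); returns the matching segment ids.
    await asyncio.sleep(0)
    return set()

async def segments_for_bid(device_id: str) -> set[str]:
    try:
        return await asyncio.wait_for(lookup_segments(device_id), SEGMENT_LOOKUP_BUDGET)
    except asyncio.TimeoutError:
        # Bidding without audience data beats missing the exchange deadline.
        return set()

print(asyncio.run(segments_for_bid("idfa:6630580C-4347-4FFF-8AEB-19530C143800")))
```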
Solutions
- KVS-based precise solution
- Bloom-filter-based probabilistic solution
KVS-based
- fast real-time storage only for active segments
- slow storage for all segments
- a controller that uploads data from slow storage into real-time storage (the lookup pattern is sketched after the diagram below)
[Architecture diagram, KVS-based solution: external DMPs deliver segment data into slow storage (Postgres, S3); the ADB controller uploads segment data into a KV real-time storage ("fast storage") in each datacenter (US, EU, APAC); the bidder looks up IDs in the local KV real-time storage on each bid request; the campaign server supplies targeting periods, and commands/status flow between the controller and the bidders.]
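In the KVS-based design the bidder's lookup is a single key-value read keyed by device ID, with the set of active segment IDs as the value. The sketch below is illustrative only: the in-memory dict stands in for the replicated KV real-time storage, and the segment name is hypothetical.

```python
# In-memory stand-in for the KV real-time storage; in production this would be
# a replicated key-value database kept local to each datacenter.
kv_store: dict[str, set[str]] = {
    "aaid:3b93df69-8b57-4b64-a874-44eb37e3312c": {"in_market_autos_audi_a4"},
}

def active_segments(device_id: str) -> set[str]:
    # One lookup per bid request; must return within the 20 ms budget.
    return kv_store.get(device_id, set())

print(active_segments("aaid:3b93df69-8b57-4b64-a874-44eb37e3312c"))
```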
KV storage issues
- costs: slow S3 storage >$2k monthly, real-time database >$5k per datacenter
- upload bandwidth is limited and shared with lookups: lookups run at 100k/s, uploads at 30k IDs/s (see the calculation below)
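A back-of-envelope check (illustrative, using the figures above) shows why the shared 30k IDs/s upload rate is a bottleneck: re-uploading the ~10^9 IDs used in active campaigns takes most of a day.

```python
active_ids = 1_000_000_000   # ~1*10^9 ids in active campaigns
upload_rate = 30_000         # ids per second, shared with lookup traffic

hours = active_ids / upload_rate / 3600
print(f"full re-upload of active ids: ~{hours:.1f} hours")  # ~9.3 hours
```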
Alternative solution: Bloom filters
- real-time storage is replaced by a set of Bloom filters
- slow storage stays the same
- the controller creates a Bloom filter instead of uploading to real-time storage, which resolves the upload-bandwidth issue
- "real-time storage" becomes a calculation cluster that checks requested IDs against all B-filters
- each segment requires a Bloom filter of 10 MB to 200 MB at a 0.5% false-positive rate (see the sizing sketch below)
- cost for one DC is ~$1k
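The quoted filter sizes are consistent with the standard Bloom-filter sizing formula m = -n*ln(p)/(ln 2)^2 bits for n IDs at false-positive rate p. The segment size used below (100 million IDs) is a hypothetical example, not a figure from the talk.

```python
import math

def bloom_size_bytes(n_ids: int, fp_rate: float) -> float:
    # Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    bits = -n_ids * math.log(fp_rate) / (math.log(2) ** 2)
    return bits / 8

# Hypothetical segment of 100M ids at 0.5% false-positive rate -> ~138 MB,
# which sits inside the 10-200 MB range quoted above.
print(f"{bloom_size_bytes(100_000_000, 0.005) / 1e6:.0f} MB")
```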
Bloom-filters solution
[Architecture diagram, Bloom-filter solution: the ADB controller builds B-filters from the segment data in S3 and stores the filters back to S3; B-filter hosts (host 1, host 2, ...) in each datacenter (US, EU, APAC) load the filters, sharded across hosts; the bidder checks IDs against the local B-filter hosts on each bid request, and commands/status flow between the controller and the bidders.]
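What a B-filter host does can be sketched as: keep its shard of segment filters in memory and, for each requested ID, return the segments whose filter reports a (possibly false-positive) hit. The class, shard layout, and segment name below are simplified assumptions for illustration; a production host would use an optimized Bloom-filter implementation.

```python
import hashlib

class BloomFilter:
    # Simplified Bloom filter: k hash positions derived from salted MD5 digests.
    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item: str):
        for i in range(self.n_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Each B-filter host keeps its shard of segment filters in memory.
shard: dict[str, BloomFilter] = {"in_market_autos_audi_a4": BloomFilter(8_000_000, 8)}
shard["in_market_autos_audi_a4"].add("aaid:3b93df69-8b57-4b64-a874-44eb37e3312c")

def matching_segments(device_id: str) -> set[str]:
    # May contain false positives (~0.5% per filter), never false negatives.
    return {seg for seg, bf in shard.items() if device_id in bf}

print(matching_segments("aaid:3b93df69-8b57-4b64-a874-44eb37e3312c"))
```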
Cons
- there are no ready-made solutions that host Bloom filters and support sharding and replication
- it is probabilistic and can't be used for strict segments
- no strict support for cross-device matching
Q&A
Thank you for your attention!