Couchbase Live Europe 2015: Criteo: Real time advertising at NoSQL speed
Post on 16-Jul-2015
Copyright © 2014 Criteo
Real time advertising at NoSQL speed
Nicolas Helleringer
Site Reliability Engineering Manager
March 23rd 2015
Criteo
• French start-up since 2005
  • Keeping the spirit
• Machine Learning and Big Data oriented
• Algorithms as root
• Ad Tech business
• 210 in R&D, 1400 worldwide
  • R&D teams in Paris
Criteo – Personalized performance advertising at scale
• Personalized
  • Per user, per view
  • Real time product recommendation
• Performance
  • Cost of sales model
  • Cross device
• Scale
  • 994 million unique users reached globally
  • 741 billion ads in 2014
Site Reliability Engineering at Criteo
50 DevOps engineers and growing with the company
Software infrastructure
Systems (CentOS, Windows, monitoring)
Automation (Chef)
Big Data infrastructure (Hadoop, Storm, Kafka, HBase)
NoSQL infrastructure (MemCache, CouchBase, Graphite, ElasticSearch, MongoDB)
Escalation
Predictive monitoring
Build & continuous integration
Hardware Infrastructure
• 6 Datacenters
• 10k+ servers (50/50 Windows/Linux)
• Private worldwide dedicated network, up to 10 Gb/s
• One of the biggest Hadoop clusters in Europe (1k+ servers, 37 PB)
From SQL to some NoSQL
• Criteo started with
  • User data in client-side cookies
  • Client and campaign configurations in MS SQL Server
• First issues
  • RAM cache in IIS servers
  • Size of the cookies
• Added some MemCache ... big instances
Next steps
• Criteo connected to RTBs
• Real Time Bidding networks
• Server-to-server calls => bye bye client-side cookies
• Additional challenge: multi-datacenter => sync!
• Moved to scale-out MemCache/CouchBase clusters, automated by Chef
• Persistency !
• C# driver with double write
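The "double write" idea can be illustrated with a minimal sketch. This is not Criteo's C# driver: the `InMemoryCluster` class, the `double_write` function and the datacenter names are hypothetical stand-ins, and the sketch only assumes that "double write" means the client sends each write to more than one cluster and tolerates one of them being unreachable.

```python
from typing import Dict, List


class InMemoryCluster:
    """Hypothetical stand-in for one datacenter's key-value cluster."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        # Simulate a network failure when the cluster is marked unavailable.
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def double_write(clusters: List[InMemoryCluster], key: str, value: bytes) -> List[str]:
    """Write the value to every cluster.

    Returns the names of the clusters where the write failed, so the
    caller can decide whether to retry or repair them later.
    """
    failed: List[str] = []
    for cluster in clusters:
        try:
            cluster.set(key, value)
        except ConnectionError:
            failed.append(cluster.name)
    return failed
```

A caller would pass one cluster object per datacenter and treat a non-empty return value as a signal that the copies have diverged and need to be resynchronized.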
Grown big !
• 550 CouchBase servers (from 1.8 to 3)
• 24 clusters
• 107 TB Total RAM and SSD space
• 10 million hits/sec on average
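To give a feel for these figures, the short calculation below derives rough per-server and per-cluster averages. It assumes load and capacity are spread evenly and uses decimal units (1 TB = 1000 GB); the slides give only the totals, so real clusters almost certainly differ from these averages.

```python
# Totals taken from the slide.
servers = 550
clusters = 24
total_storage_tb = 107        # RAM and SSD combined
hits_per_sec = 10_000_000     # average across all clusters

# Simple averages, assuming an even spread (an assumption, not a slide fact).
servers_per_cluster = servers / clusters
storage_per_server_gb = total_storage_tb * 1000 / servers
hits_per_server = hits_per_sec / servers

print(f"servers per cluster: {servers_per_cluster:.1f}")
print(f"storage per server:  {storage_per_server_gb:.0f} GB")
print(f"hits/sec per server: {hits_per_server:.0f}")
```

That works out to roughly 23 servers per cluster, a little under 200 GB of RAM plus SSD per server, and about 18,000 hits/sec per server.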
Lessons learned / quick wins
• Do not mix RAM and persisted usages
  • This voids the flexibility of persisted-only clusters
• Before 3.0, metadata memory usage is the limiting factor on large clusters
• CouchBase admin console is fine to begin with. We extracted raw stats and put them in Graphite for more flexibility
• Invest as much in Dev as in Ops
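Pushing raw stats into Graphite can be sketched as follows. The sketch assumes the stats have already been fetched into a dictionary (how Criteo extracted them is not described in the slides) and uses Graphite's plaintext protocol, where each line is `metric.path value timestamp`. The metric prefix, stat names and function names are hypothetical.

```python
import socket
import time
from typing import Dict, Optional


def to_graphite_lines(prefix: str, stats: Dict[str, float],
                      timestamp: Optional[int] = None) -> str:
    """Format stats as Graphite plaintext lines: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else timestamp
    return "".join(
        f"{prefix}.{name} {value} {ts}\n"
        for name, value in sorted(stats.items())
    )


def send_to_graphite(payload: str, host: str, port: int = 2003) -> None:
    """Send preformatted lines to a Carbon plaintext listener (2003 is the usual default port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

For example, `to_graphite_lines("couchbase.cluster01", {"ops": 1200}, timestamp=1427100000)` produces the single line `couchbase.cluster01.ops 1200 1427100000`, which `send_to_graphite` would then ship to the Graphite host.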
Next
• Replication and Metadata dynamic memory in 3.0
• Cross datacenter replication via CouchBase
• C# driver with failover on replicas
  • Move from CP to AP
• Set/Get with metadata
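The planned move from CP to AP through replica failover can be sketched as a read path that falls back to replicas when the active copy is unreachable. This is an illustration only, not the planned C# driver: `Node` and `get_with_replica_failover` are hypothetical names, and the sketch assumes a replica may return a slightly stale value, which is the consistency traded away for availability.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """Hypothetical stand-in for a server holding one copy of the data."""

    def __init__(self, name: str, data: Dict[str, bytes], up: bool = True) -> None:
        self.name = name
        self.data = data
        self.up = up

    def get(self, key: str) -> Optional[bytes]:
        # Simulate an unreachable server.
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data.get(key)


def get_with_replica_failover(active: Node, replicas: List[Node],
                              key: str) -> Tuple[Optional[bytes], str]:
    """Read from the active copy first, then from each replica in order.

    Returns the value and the name of the node that served it. A value
    served by a replica may be stale.
    """
    for node in [active, *replicas]:
        try:
            return node.get(key), node.name
        except ConnectionError:
            continue
    raise ConnectionError("no copy of the data is reachable")
```

Returning the serving node's name lets the caller log or count replica reads, which helps to gauge how often possibly stale data is being served.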
Thanks !
Questions ?