Building a Scalable and Modern Infrastructure at CARFAX

Posted on 29-Aug-2014

Category: Technology

DESCRIPTION

The CARFAX vehicle history database contains over twelve billion documents in a twelve-shard cluster that replicates to multiple data centers. This will be a step-by-step walkthrough of how we deploy our servers, manage high-volume reads and writes, and configure for high availability. By automating everything from the operating system install up, we are able to deploy complete replica clusters quickly and efficiently. Using distributed processing and message queuing, we load millions of new documents each day, with projected growth of over a billion records per year. Through the use of tagging, server configuration, and read settings we deliver content with high consistency and availability.

TRANSCRIPT

A Scalable and Modern Infrastructure at CARFAX

About Me

• Jai Hirsch – Senior Systems Architect, Data Technologies at CARFAX
• Long-time Java and Database Developer
• Data and Distributed Processing Enthusiast

• GitHub: https://github.com/JaiHirsch
• Twitter: @JaiHirsch – https://twitter.com/JaiHirsch
• LinkedIn: http://www.linkedin.com/pub/jai-hirsch/8/a89/335

“CARFAX helps millions of people buy and sell used cars with more confidence”

CARFAX Vehicle History Report

Documents on the Report

NoSQL Before it Was Cool

Proprietary key-value store on OpenVMS, developed by CARFAX in 1984

Never mind that sh*t! Here comes Mongo!

Why MongoDB?

• Legacy structures mapped to documents (see the sketch below)
• High availability using replica sets
• Platform independence
• Support
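
The talk doesn't show the mapping code, but a minimal sketch of how a legacy key-value record keyed by VIN might become a MongoDB document could look like the following, written against the current MongoDB Java driver (not necessarily the driver version in use at the time). The connection string, database, collection, and field names are illustrative assumptions, not CARFAX's actual schema.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class LegacyRecordMapper {
    public static void main(String[] args) {
        // Replica-set connection string; host names are placeholders.
        try (MongoClient client = MongoClients.create(
                "mongodb://mongo1,mongo2,mongo3/?replicaSet=rs0")) {
            MongoCollection<Document> history =
                    client.getDatabase("vhr").getCollection("history");

            // One legacy key-value record keyed by VIN becomes one document.
            // Field names here are illustrative, not the real CARFAX schema.
            Document record = new Document("vin", "1HGCM82633A004352")
                    .append("eventType", "ODOMETER_READING")
                    .append("eventDate", "2014-08-29")
                    .append("odometer", 48123);

            history.insertOne(record);
        }
    }
}
```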

MongoDB at CARFAX

• Our Production Environment
• The Legacy Database and High Volume Loads
• High Availability Reads

Our Production Environment

Server Deployment

AUTOMATE. AUTOMATE. AUTOMATE. AUTOMATE.

Server Configuration

12 shards with two spare servers racked for failover (see the sharding sketch below)
• OS: Linux
• MongoDB 2.4.9
• 128 GB of RAM
• 1.8 TB of drive space
• 10K RPM SAS drives
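
As a rough illustration of standing up a sharded collection like the one described, the sketch below enables sharding on a database and shards a collection through a mongos router using the MongoDB Java driver. The database name, collection name, and shard key are assumptions for illustration; the slides do not state them, and the cluster in the talk ran MongoDB 2.4.9 rather than a current release.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

public class ShardingSetup {
    public static void main(String[] args) {
        // Connect through a mongos router; the host name is a placeholder.
        try (MongoClient client = MongoClients.create("mongodb://mongos-router:27017")) {
            // Allow the database to be sharded, then shard the collection on a key.
            // Database, collection, and shard key are assumptions for illustration.
            client.getDatabase("admin").runCommand(
                    new Document("enableSharding", "vhr"));
            client.getDatabase("admin").runCommand(
                    new Document("shardCollection", "vhr.history")
                            .append("key", new Document("vin", 1)));
        }
    }
}
```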

The Future

Extract, Transform, Load

Loading Millions to Billions of Records per Day

AUTOMATE. AUTOMATE. AUTOMATE. AUTOMATE.

First Attempt To Load Was Completely CPU Bound

Not Acceptable! 45 Days to Backload the Legacy Database

Distributed Processing

Acceptable! Billion+ Inserts per Day! 9 Days to Backload
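
The slides do not include the loader code, but the distributed, queue-fed approach they describe might look roughly like this: worker threads each take a batch of documents off a queue and write them with unordered bulk inserts so mongos can spread the writes across shards in parallel. The queue type, batch shape, and collection are assumptions; a real pipeline would feed the workers from an external message broker rather than an in-process BlockingQueue.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.InsertManyOptions;
import org.bson.Document;

import java.util.List;
import java.util.concurrent.BlockingQueue;

public class LoadWorker implements Runnable {
    private final BlockingQueue<List<Document>> batches;
    private final MongoCollection<Document> history;

    public LoadWorker(BlockingQueue<List<Document>> batches,
                      MongoCollection<Document> history) {
        this.batches = batches;
        this.history = history;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // In the real pipeline each batch would arrive from a message queue.
                List<Document> batch = batches.take();
                // Unordered inserts keep all shards busy and skip stop-on-first-error.
                history.insertMany(batch, new InsertManyOptions().ordered(false));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```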

The MongoDB Implementation

• 13 billion+ documents
• 1.5 billion+ new documents per year
• Document size: ~795 bytes
• The VHR uses 200+ documents

High Availability Reads

Millions of Reports per Day

AUTOMATE. AUTOMATE. AUTOMATE.

Read Scalability With Tagging

Each Data Center is Tagged

Each Replica Set is Tagged

5X More Reports per Second
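
A minimal sketch of how tagged reads can be expressed with the MongoDB Java driver is shown below: the client asks for secondaries carrying a particular data-center tag, so reports are served from the nearby data center. The tag name and value, connection string, and collection are illustrative assumptions; the exact tag layout CARFAX uses is not given in the slides.

```java
import com.mongodb.ReadPreference;
import com.mongodb.Tag;
import com.mongodb.TagSet;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

public class TaggedReads {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(
                "mongodb://mongo1,mongo2,mongo3/?replicaSet=rs0")) {
            // Prefer secondaries tagged as belonging to the local data center.
            // The "dc"/"east" tag pair is an assumption for illustration.
            ReadPreference localDc = ReadPreference.secondaryPreferred(
                    new TagSet(new Tag("dc", "east")));

            MongoCollection<Document> history = client.getDatabase("vhr")
                    .getCollection("history")
                    .withReadPreference(localDc);

            long count = history.countDocuments(eq("vin", "1HGCM82633A004352"));
            System.out.println("History documents read from local data center: " + count);
        }
    }
}
```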

But We Can Do More!

Let's Wrap It Up

➢Don't buy a used car without a CARFAX report
➢Grok your data and working set
➢Architect for your load volume
➢Scale your reads to meet demand

Keys to Success

➢AUTOMATE EVERYTHING

➢Test Many Configurations

➢Grid Computing is Awesome

➢Shard Early, Shard Often

And Remember

Friends Don’t Let Friends Use Default Ulimits!

Thank You!

The migration was a success due to the incredible teams at CARFAX and MongoDB.

We are always looking for great people to join us.

www.carfax.com/careers
