algolia - hosted search api

Instant Search API Build Unique Search Experiences Sylvain Utard VP of Engineering [email protected] @sylvainutard Enterprise Search and Analytics

Upload: enterprisesearchmeetup

Post on 17-Aug-2015


TRANSCRIPT

Page 1: Algolia - Hosted Search API

Instant Search API

Build Unique Search Experiences

Sylvain Utard, VP of Engineering

[email protected] @sylvainutard

Enterprise Search and Analytics

Page 2: Algolia - Hosted Search API

@algolia

Who am I?

5 years @ Exalead, leading the core-engine & NLP teams

• C++ • ExaScript (RIP) • Java

2 years @ Algolia, VP of Engineering

• C++ • Ruby • Java • and 10+ other languages…

@sylvainutard

Page 3: Algolia - Hosted Search API

@algolia

A hosted search API

Page 5: Algolia - Hosted Search API

@algolia

Page 6: Algolia - Hosted Search API

@algolia

A hosted search API

Replies in milliseconds

Page 7: Algolia - Hosted Search API

@algolia

A hosted search API

Replies in milliseconds

From anywhere

Page 8: Algolia - Hosted Search API

@algolia

A hosted search API

Replies in milliseconds

From anywhere With intuitive relevance

Page 9: Algolia - Hosted Search API

@algolia

Algolia Today

Page 10: Algolia - Hosted Search API

@algolia

800+ customers in 80+ countries

Algolia Today

Page 11: Algolia - Hosted Search API

@algolia

800+ customers in 80+ countries

40B+ Write operations per month

4B+ User-generated queries per month

Algolia Today

Page 12: Algolia - Hosted Search API

@algolia

Algolia Today

13 locations

800+ customers in 80+ countries

40B+ Write operations per month

4B+ User-generated queries per month

Page 13: Algolia - Hosted Search API

@algolia

Performance is our DNA

Page 14: Algolia - Hosted Search API

@algolia

Speed matters

Half a second delay caused a 20% drop in traffic

Every 100ms of latency costs them 1% in sales

Page 15: Algolia - Hosted Search API

@algolia

Behind the scenes

Page 16: Algolia - Hosted Search API

@algolia

Unique set of constraints

High volume of Read & Write operations

Page 17: Algolia - Hosted Search API

@algolia

Unique set of constraints

High volume of Read & Write operations

High-availability

Page 18: Algolia - Hosted Search API

@algolia

Unique set of constraints

High volume of Read & Write operations

High-availability

Worldwide data distribution

Page 19: Algolia - Hosted Search API

@algolia

API Software Stack

Started as a mobile offline SDK

Written in C++

Search code embedded in Nginx as a module

Indexing is done in a separate process

Two redis instances

Page 20: Algolia - Hosted Search API

@algolia

API Hardware

Fast CPU (Xeon E5 >3.5GHz)

In Memory (128GB)

Backed by high-end SSDs in RAID-0 (800GB)

Specific kernel settings

Page 21: Algolia - Hosted Search API

@algolia

Scaling horizontally

Several clusters per location

A user is assigned to one master cluster

A user can be replicated to N replica clusters

Page 22: Algolia - Hosted Search API

@algolia

What is a cluster

Master-Master

Stream of writes via Consensus

At least 3 machines

Page 23: Algolia - Hosted Search API

@algolia

A write in practice

One of the machines accepts the write operation via the API (https)

/1/indexes/MyFirstIndex/batch
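As a rough illustration of the client side of this step, the sketch below builds (but does not send) a batch write against the path shown on the slide. The JSON body shape (a `requests` list of `action`/`body` pairs), the `X-Algolia-*` headers and the host name pattern are assumptions based on Algolia's public REST API rather than anything stated in the talk; the application ID, API key and record are hypothetical placeholders.

```python
import json
import urllib.request

APP_ID = "YourApplicationID"   # hypothetical placeholder
API_KEY = "YourWriteAPIKey"    # hypothetical placeholder


def build_batch_request(index_name, records):
    """Build (but do not send) an HTTPS batch write for one index."""
    # Assumed body shape: one "addObject" action per record.
    body = {"requests": [{"action": "addObject", "body": r} for r in records]}
    return urllib.request.Request(
        url=f"https://{APP_ID}.algolia.net/1/indexes/{index_name}/batch",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Algolia-Application-Id": APP_ID,
            "X-Algolia-API-Key": API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_batch_request("MyFirstIndex", [{"name": "iPhone 6", "brand": "Apple"}])
print(req.get_method(), req.full_url)
```

Whichever machine of the cluster receives this request is the one that starts the write sequence described on the following slides.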

Page 24: Algolia - Hosted Search API

@algolia

A write in practice

The file is saved on the three machines as a temporary file

tmp1265

tmp7864

tmp2357

Page 25: Algolia - Hosted Search API

@algolia

A write in practice

Launch the consensus by contacting the Raft master

startConsensus(tmp2357, tmp7864, tmp1265)

Page 26: Algolia - Hosted Search API

@algolia

A write in practice

1 - The master sends the commit order to all nodes

2 - Each node returns the next job ID to the master

3 - If there is a majority, the file is committed
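The three steps above can be sketched as a toy majority vote. This is not Raft and not Algolia's code: the `Node` class, `run_consensus` and the starting job ID are hypothetical names chosen for illustration, and leader election, timeouts and log persistence are all left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    next_job_id: int = 42
    up: bool = True

    def propose(self, tmp_file):
        """Answer a commit order with this node's next job ID (None if down)."""
        return self.next_job_id if self.up else None

    def commit(self, job_id):
        """Record that job_id is taken; the next write gets the following ID."""
        self.next_job_id = job_id + 1


def run_consensus(nodes, tmp_files):
    """Commit the write only if a strict majority of nodes agree on the job ID."""
    # Steps 1 and 2: send the commit order, collect each node's next job ID.
    answers = {n.name: n.propose(tmp_files[n.name]) for n in nodes}
    votes = {}
    for job_id in answers.values():
        if job_id is not None:
            votes[job_id] = votes.get(job_id, 0) + 1
    if not votes:
        return None
    # Step 3: commit only when more than half of the nodes gave the same ID.
    job_id, count = max(votes.items(), key=lambda kv: kv[1])
    if count <= len(nodes) // 2:
        return None  # no majority: the temporary files are not committed
    for n in nodes:
        if answers[n.name] == job_id:
            n.commit(job_id)
    return job_id
```

With three nodes, one of them down, `run_consensus` still returns a job ID because two of three is a majority; with two nodes down it returns `None`.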

Page 27: Algolia - Hosted Search API

@algolia

A write in practice

Same job ID on all hosts

Sent to the slave replicas in parallel

Processed in parallel on all hosts

job42

job42

job42

Page 28: Algolia - Hosted Search API

@algolia

In case one host is down

Continue to accept writes

The two other hosts keep jobs

Jobs are sequential, so the host catches up at restart

job42

job42
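A minimal sketch of why sequential job IDs make catching up straightforward: a restarted host only needs to replay the jobs after the last one it applied. The `Host` and `Cluster` classes are hypothetical, and the shared in-memory `log` stands in for the jobs kept by the two healthy hosts.

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.last_applied = 0   # ID of the last job processed locally
        self.applied = []       # payloads in the order they were applied

    def apply(self, job_id, payload):
        assert job_id == self.last_applied + 1, "jobs must be applied in order"
        self.applied.append(payload)
        self.last_applied = job_id


class Cluster:
    def __init__(self, hosts):
        self.hosts = hosts
        self.log = {}           # job_id -> payload, kept by the healthy hosts

    def write(self, payload):
        """Assign the next sequential job ID and apply it on every host that is up."""
        job_id = len(self.log) + 1
        self.log[job_id] = payload
        for h in self.hosts:
            if h.up:
                h.apply(job_id, payload)
        return job_id

    def restart(self, host):
        """Bring a host back and replay, in order, every job it missed."""
        host.up = True
        for job_id in range(host.last_applied + 1, len(self.log) + 1):
            host.apply(job_id, self.log[job_id])
```

Because every host applies the same jobs in the same order, a host that was down ends up with the same state as the others once `restart` has replayed the missing range.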

Page 29: Algolia - Hosted Search API

@algolia

Distribution

Replicate jobs, not the result

Send to all machines in parallel

Consistent after a delay of a few seconds

Page 30: Algolia - Hosted Search API

@algolia

High availability

Multi-regions in one location

Page 31: Algolia - Hosted Search API

@algolia

High availability

13 fully independent locations

Page 32: Algolia - Hosted Search API

@algolia

Network Optimisations

API usage moving from servers to browser and mobile apps

Get close to end users

Page 33: Algolia - Hosted Search API

@algolia

Distributed Search Network - Worldwide Synchronization

Page 35: Algolia - Hosted Search API

@algolia

• 13 locations = 25 datacenters

• No ideal worldwide provider

• AWS is not in India, Eastern EU, Africa…

• Need to handle several providers

• Anticipate long deliveries / customs

• Keep as few providers as possible

Distributed Search Network - Worldwide Synchronization

Page 36: Algolia - Hosted Search API

@algolia

DNS is key

Used to find the closest location

Several DNS providers

Good anycast network

Page 37: Algolia - Hosted Search API

@algolia

API Clients

DNS health checks are not enough

Smart retry logic in all our API Clients
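The slide does not describe the retry strategy itself, so the following is only one plausible shape for it: try each known host in turn and widen the timeout on every full pass. The function name, the `send` callback and the timeout values are all hypothetical.

```python
def search_with_retry(hosts, send, timeouts=(1.0, 2.0, 4.0)):
    """Try each host in turn, widening the timeout on every full pass."""
    last_error = None
    for timeout in timeouts:
        for host in hosts:
            try:
                return send(host, timeout)
            except (ConnectionError, TimeoutError) as err:
                last_error = err  # host unreachable or too slow: try the next one
    raise RuntimeError("all hosts failed") from last_error


def fake_send(host, timeout):
    """Stand-in for a real HTTP call: the first host is always unreachable."""
    if host == "host-1.example.com":
        raise ConnectionError(f"{host} is down")
    return {"host": host, "hits": []}


result = search_with_retry(["host-1.example.com", "host-2.example.com"], fake_send)
print(result["host"])
```

The point of doing this in the client is that a request can fail over within the same call, without waiting for a DNS health check to notice the outage.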

Page 38: Algolia - Hosted Search API

@algolia

Analytics

• What are my users searching for?

• Top search

• Top search without hits

• Top refinements

• Where do they search from?

Page 39: Algolia - Hosted Search API

@algolia

Page 40: Algolia - Hosted Search API

@algolia

Page 41: Algolia - Hosted Search API

@algolia

Analytics

• Billions of user-generated queries per month

• As-you-type aggregation

• ~3 months retention

• Storing all of them in…

Page 42: Algolia - Hosted Search API

@algolia

Analytics

• Elasticsearch \o/

• … without FTS :)

• but with aggregations

Page 43: Algolia - Hosted Search API

@algolia

Analytics

• No FTS

• No source

• Doc values everywhere

• SSD only

• Custom aggregations

(deprecated since ES 1.1.0)
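For readers unfamiliar with these settings, here is a sketch of what an index mapping along those lines could look like, written as a Python dictionary and assuming Elasticsearch 1.x-era mapping syntax. The document type and field names are hypothetical; the talk does not show the actual mapping.

```python
import json

query_log_mapping = {
    "mappings": {
        "query_log": {                          # hypothetical document type
            "_source": {"enabled": False},      # "No source": do not store the raw JSON
            "_all": {"enabled": False},         # no catch-all full-text field
            "properties": {
                # "No FTS": strings are indexed as exact values, not analyzed text.
                # "Doc values everywhere": column data lives on disk, not on the heap.
                "query": {"type": "string", "index": "not_analyzed", "doc_values": True},
                "country": {"type": "string", "index": "not_analyzed", "doc_values": True},
                "nb_hits": {"type": "integer", "doc_values": True},
                "timestamp": {"type": "date", "doc_values": True},
            },
        }
    }
}

print(json.dumps(query_log_mapping, indent=2))
```

With a mapping like this the cluster can only answer aggregation-style questions (top queries, top queries without hits, queries per country), which matches the analytics use case described in the slides.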

Page 44: Algolia - Hosted Search API

@algolia

Top-k Aggregation

• Before

• Linear memory consumption

• Exhaustive results

• After

• Constant memory consumption

• Approximate, but good enough
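The slides do not say which algorithm is behind the "after" column. One well-known way to get approximate top-k in constant memory is the Space-Saving algorithm, sketched below purely as an illustration of the trade-off; the function name and the `capacity` parameter are hypothetical.

```python
def space_saving_top_k(stream, capacity):
    """Approximate the most frequent items while tracking at most `capacity` counters."""
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
        else:
            # Evict the current minimum; the newcomer inherits its count + 1,
            # so counts may overestimate but never underestimate.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


stream = ["iphone"] * 50 + ["ipad"] * 30 + [f"rare{i}" for i in range(20)]
top = space_saving_top_k(stream, capacity=5)
print(top[:2])
```

Memory stays bounded by `capacity` no matter how many distinct queries appear, whereas an exact count needs one counter per distinct query. The price is that the counts of infrequent items can be inflated, which is usually acceptable when only the head of the distribution matters.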

Page 45: Algolia - Hosted Search API

@algolia

Building your worldwide infra

- Is a long and difficult quest

- Is a real asset & differentiator

The Future of APIs is Distributed

Page 46: Algolia - Hosted Search API

@algolia

All the details of our architecture are on HighScalability.com

Want to know more?

Page 47: Algolia - Hosted Search API

THANK YOU!

[email protected] @algolia

Build Unique Search Experiences

We are hiring in SF, NYC and Paris 😊