nosql matters in catchoom recognition service

Upload: david-arcos

Post on 16-Apr-2017

NoSQL matters in Catchoom Recognition Service

David Arcos
http://catchoom.com

1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion

Hi! I'm David Arcos

- Python/Django developer (>4yr)

- Web backend, distributed systems, databases, scalability, security

- Team leader at Catchoom

- You can follow me at @DZPM

Catchoom technology recognizes an object by searching through a large collection of images in a fraction of a second. Catchoom targets application developers and integrators.

Our customers are leaders in Augmented Reality

Visual Recognition:

Identify an object in front of the camera by comparing it to a huge collection of reference images

Examples of recognized objects:

- CD/DVD and book covers
- Newspapers and magazines
- Logos and brands
- Posters
- Packaged goods
- Monuments and places

Catchoom Recognition Service:

- Cloud-based Visual Recognition (SaaS)
- RESTful API to integrate
- Add Visual Recognition (VR) features to your app/platform

- Small team of 4 developers, doing SCRUM

2) What did we need?

Minimum requirements:

- a public API for the final users to perform Visual Recognition

- a private API for the customer to manage the Collections and get statistics

- a nice website for the customer, providing the functionality of both APIs

Looks easy?

Must be flexible:

- A customer who does Augmented Reality, and needs a 3D model (binary format) in the item

- Another one who needs just the item id

- Our data model needs to allow everything (structured and unstructured data)

Must be reliable:

- Images or data should never be lost

- Avoid single points of failure

- We need redundancy

Must be very fast:

Layar has been using Catchoom's Visual Search technology since the launch of Layar Vision, allowing users to quickly view the AR content placed on top of images by just pointing their camera to the image.

We've benchmarked Catchoom's technology in 2011 against 3 of their main competitors and found they had the best results both on speed and on successful matches (including lowest false positives).

Dirk Groten, CTO of Layar

3) How we built it

Technology stack:

- Development: Python, Django, Tornado, Gevent
- Deployed using: Supervisord, Nginx, gunicorn, Fabric
- AWS: EC2, S3, ELB

The Panel:

- typical customer portal:

- manage your Collections, run Visual Recognition

- get usage statistics

- and configure the payment method :)

Mobile apps:

- for Android, iOS

- use the Visual Recognition API

- the code will be published

Data models:

- Collection: a set of items. Has at least one token.

- Item: has at least one Image. Has metadata.

- Image: you want several images if the item has different sides, logos, flavours...

- Token: for authenticating the requests.
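The relationships in this slide can be sketched as plain Python dataclasses. This is only an illustration of the constraints listed above ("at least one token", "at least one Image"), not the service's actual models; every class and field name here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Image:
    """One reference picture of an item (a side, a logo, a flavour...)."""
    image_id: str


@dataclass
class Item:
    """Something to recognize: at least one Image, plus free-form metadata."""
    item_id: str
    images: List[Image]
    metadata: Any = None  # anything from JSON to a binary blob

    def __post_init__(self) -> None:
        if not self.images:
            raise ValueError("an Item needs at least one Image")


@dataclass
class Token:
    """Credential used to authenticate API requests."""
    value: str


@dataclass
class Collection:
    """A set of items, reachable through at least one token."""
    name: str
    tokens: List[Token]
    items: List[Item] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.tokens:
            raise ValueError("a Collection needs at least one Token")
```

For example, `Collection("music", [Token("secret")], [Item("cd-1", [Image("front"), Image("back")])])` builds a collection with one two-sided item, while `Item("cd-2", [])` is rejected.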

Components:

- the platform is highly modular

- Do one thing, and do it well

- they pass json messages

- optimized hardware settings

- Frontend: gets the API request
- Extractor: extracts the visual points
- Collector: message exchange
- Searcher: looks for matches

Required NoSQL features:

- key-value storage
- cache
- message lists
- message pub/sub
- real-time analysis

What servers have we chosen?

Required NoSQL features:

- key-value storage
- cache
- message lists
- message pub/sub
- real-time analysis

- and Filesystem:

4) Advantages of NoSQL

Performance:

- Can't afford writing to disk, or querying slow databases

- Using Redis, everything stays in memory

- One V.R. query takes just 300 ms

Scalability:

- Need to scale different components, separately

- Load balancing using Redis Lists:

BLPOP: Remove and get the first element in a list, or block until one is available

- But focus on the bottlenecks!
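A minimal sketch of this load-balancing pattern: several identical workers block on the same Redis list, and each message is delivered to exactly one of them. The functions take any client exposing the redis-py methods `rpush` and `blpop` (for instance `redis.Redis()`); the list name and the message fields are assumptions.

```python
import json

QUEUE = "queue:extractor"  # hypothetical list name


def enqueue(client, message):
    """Producer side: append one JSON message to the shared list."""
    client.rpush(QUEUE, json.dumps(message))


def work_once(client, handler, timeout=5):
    """Consumer side: block until a message arrives, then handle it.

    Returns False if nothing arrived within `timeout` seconds.
    """
    popped = client.blpop(QUEUE, timeout=timeout)
    if popped is None:  # timed out, the queue stayed empty
        return False
    _key, raw = popped  # BLPOP returns a (list name, value) pair
    handler(json.loads(raw))
    return True
```

Scaling a component then means starting more processes that loop over `work_once`; no coordinator is needed, because Redis hands each element to a single blocked client.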

Unstructured data: query

- A query object has many optional parameters
  - each component can add/remove fields dynamically
  - schema change between versions

- Can't fit in a SQL table

- We model the query in Redis as a JSON object

(timestamps, the image index, debug info...)
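As a rough sketch of the idea, a query can live under one Redis key as a JSON string, and each component adds whatever fields it needs. The functions accept any client with the redis-py `get` and `set` methods; the key layout and field names are assumptions.

```python
import json


def query_key(query_id):
    return "query:%s" % query_id  # hypothetical key layout


def create_query(client, query_id, **fields):
    """Store a new query document with whatever fields are known so far."""
    doc = {"id": query_id}
    doc.update(fields)
    client.set(query_key(query_id), json.dumps(doc))
    return doc


def annotate_query(client, query_id, **fields):
    """Let a component add fields without any schema migration."""
    key = query_key(query_id)
    raw = client.get(key)
    if raw is None:
        raise KeyError(query_id)
    doc = json.loads(raw)
    doc.update(fields)
    client.set(key, json.dumps(doc))
    return doc
```

Note that this read-modify-write is not atomic. It is fine while only one component touches a query at a time; concurrent writers would need a transaction or one hash field per component.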

Unstructured data: metadata

- Metadata is optional and unstructured: it can be anything from JSON to a binary blob

- Can't fit in a SQL table, and would be too slow

- Serve the data from Redis, and use S3 as a backup

- Warning: in the future, if we have huge metadata files, Redis will run out of memory. We'll improve this approach
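One way to read "serve from Redis, back up in S3" is a read-through lookup: answer from Redis, and reload from the backup only on a miss. The slides do not say how the backup is restored, so that fallback is an assumption, as are the key layout and the two callables standing in for the S3 download and upload.

```python
def get_metadata(cache, item_id, load_from_backup):
    """Serve metadata from Redis; fall back to the backup copy on a miss."""
    key = "metadata:%s" % item_id  # hypothetical key layout
    blob = cache.get(key)
    if blob is None:
        blob = load_from_backup(item_id)  # e.g. a download from S3
        if blob is not None:
            cache.set(key, blob)  # repopulate Redis for the next request
    return blob


def put_metadata(cache, item_id, blob, save_to_backup):
    """Write the backup first, so losing the Redis copy never loses data."""
    save_to_backup(item_id, blob)  # e.g. an upload to S3
    cache.set("metadata:%s" % item_id, blob)
```

`cache` is any client with the redis-py `get` and `set` methods. Because the blob is opaque to both stores, the same code handles JSON strings and binary 3D models.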

Availability:

- Avoid single points of failure. Replicate everything!

- Replicating a SQL server is painful

- Redis instances configured as Master/Slave
- When the master dies:
  - promote a slave to be the new master
  - reconfigure the other slaves to use this new master
- Redis Sentinel does this (beta)

5) Cool uses of NoSQL

Do real-time calculations:

- Usage statistics
  - total, monthly, daily, hourly
  - per image, item or collection
- Metric monitoring for internal use
  - response times, queue size, etc
- QoS: enforce rate limiting
  - max hits per minute

Efficiency: avoid hitting the db
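Usage statistics like these are commonly kept as plain Redis counters, one key per granularity, bumped with `INCR` on every request. The sketch below shows that approach for a collection; the key layout is an assumption, and `client` is any object with the redis-py `incr` method.

```python
from datetime import datetime, timezone


def stat_keys(collection_id, when):
    """Build one counter key per time granularity (hypothetical key layout)."""
    base = "hits:collection:%s" % collection_id
    return [
        base + ":total",
        base + when.strftime(":%Y-%m"),        # monthly
        base + when.strftime(":%Y-%m-%d"),     # daily
        base + when.strftime(":%Y-%m-%dT%H"),  # hourly
    ]


def record_hit(client, collection_id, when=None):
    """Bump every counter for one recognition request."""
    when = when or datetime.now(timezone.utc)
    for key in stat_keys(collection_id, when):
        client.incr(key)
```

Reading a statistic is then a single `GET` on the matching key, with no aggregation query over a log table.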

Sorted Sets:

- To create indexes and filters

- For example, the most recognized images (sorted by hits)

- Just update the Sorted Set, no need to reconsolidate:

ZADD: Add one or more members to a sorted set, or update its score if it already exists

Efficiency: no need to consolidate
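A "most recognized images" index could look like the sketch below. It uses `ZINCRBY` rather than the `ZADD` quoted in the slide, because incrementing a hit counter is the natural update here; that swap, like the key name, is an assumption. `client` is any object with the redis-py `zincrby` and `zrevrange` methods.

```python
def record_recognition(client, image_id):
    """Add one hit to the image's score; the set stays sorted on its own."""
    client.zincrby("images:by_hits", 1, image_id)  # hypothetical key name


def most_recognized(client, count=10):
    """Return the top `count` (image id, hits) pairs, highest score first."""
    return client.zrevrange("images:by_hits", 0, count - 1, withscores=True)
```

The ranking is maintained on every write, so the top-N read needs no sorting step and no periodic consolidation job.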

Cache:

- Redis works as a memcached-style cache (through a Redis cache backend; it does not speak the memcached protocol)

- Cache everything:
  - Sessions, metadata, etc
- ...although the website is internal: no bottleneck here
  - Better focus on optimizing other stuff!
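For a Django project, caching sessions in Redis comes down to a few settings. The fragment below is a sketch using the Redis cache backend that ships with Django 4.0 and later; the talk predates it and does not name the backend that was used, so the backend choice and the server address are assumptions.

```python
# settings.py fragment (sketch). The backend below ships with Django 4.0+;
# older projects need a third-party Redis cache backend instead.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",  # hypothetical address
    }
}

# Keep sessions in the cache instead of a SQL table.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"
```

With the cache-only session engine, sessions are lost if Redis is flushed; `django.contrib.sessions.backends.cached_db` keeps a SQL copy if that matters.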

Volatile data:

- Redis can set an expiration time for a value

- Very easy for:
  - implementing timeouts
  - removing old queries
  - adding temporary capping
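Key expiration is what makes a simple "max hits per minute" cap cheap to build. The sketch below is a fixed-window limiter: one counter per token and minute, which Redis deletes by itself. It is one common approach, not necessarily the one used in the service; the key layout and the default limit are assumptions, and `client` is any object with the redis-py `incr` and `expire` methods.

```python
import time


def allow_request(client, token, max_per_minute=60, now=None):
    """Fixed-window rate limit: count hits in a key that expires by itself."""
    now = time.time() if now is None else now
    window = int(now // 60)  # one counter per minute
    key = "rate:%s:%d" % (token, window)  # hypothetical key layout
    hits = client.incr(key)
    if hits == 1:
        client.expire(key, 120)  # old windows vanish without a cleanup job
    return hits <= max_per_minute
```

A fixed window allows short bursts around the minute boundary; if that matters, a sliding window over a sorted set is the usual refinement.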

Messages:

- Redis implements pub/sub and lists.

- Publish/Subscribe to a channel
  - all components get the message
  - use it for monitoring
- List: push/pop messages
  - only one component gets the message
  - use the blocking versions for load balancing
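The broadcast side can be sketched as follows: any component publishes a monitoring event, and every subscriber receives its own copy. The channel name and event fields are assumptions. `client` is any object with the redis-py `publish` method, and `pubsub` is an already-subscribed redis-py `PubSub` object, obtained with `client.pubsub()` followed by `pubsub.subscribe(CHANNEL)`.

```python
import json

CHANNEL = "monitoring"  # hypothetical channel name


def publish_event(client, event):
    """Broadcast an event; every current subscriber receives a copy."""
    return client.publish(CHANNEL, json.dumps(event))


def read_events(pubsub, limit):
    """Collect up to `limit` events from an already-subscribed PubSub object."""
    events = []
    for message in pubsub.listen():
        if message["type"] != "message":  # skip subscribe confirmations
            continue
        events.append(json.loads(message["data"]))
        if len(events) >= limit:
            break
    return events
```

Unlike a list, a channel stores nothing: a monitor that is not connected when an event is published simply misses it, which is acceptable for monitoring but not for work items.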

6) Limits

Django apps compatibility:

- we use Django and several contrib and external apps
  - (Standing on the shoulders of giants)

- but no support for NoSQL in Django ORM

- dropping SQL is not an option!

- we use MySQL, with South for migrations

7) Conclusion

Summary:

- We use a combination of SQL and NoSQL

- Using NoSQL was necessary to meet the requirements

- There are a lot of different uses for NoSQL

Recommendations:

- There is no silver bullet

- Use the best tool for each task

- But avoid unneeded complexity!

- Try Redis. Don't do a migration, just add it to your stack

Thanks for attending!

- Our beta will be ready soon. Get a free trial at http://catchoom.com

- Contact me at [email protected]

- Questions?

catchoom.com | @catchoom

Copyright 2012, Catchoom A. N., S.L.
