ccb12 app development with indexes, queries and geo

Developing with Views: See Inside the Data

Matt Ingenthron, Director, Developer Solutions

What we’ll talk about

• Lifecycle of a view
• Index definition, build, and query phase
• Consistency options (async by default)
• Emergent Schema - Views and Documents
• Patterns:
  – Secondary index
  – Basic aggregations (avg ratings by brewery)
  – Time-based analytics with group_level
  – Leaderboard
  – Schema Evolution

VIEW LIFECYCLE: DEFINE - BUILD - QUERY

View Definition (in JavaScript)

like: CREATE INDEX city ON brewery city;
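
As a rough sketch of what such a definition might look like, the map function below indexes brewery documents by city, which is approximately what the CREATE INDEX analogy describes. The document shape (a "type" and a "city" field), the sample documents, and the small emit() harness are assumptions for illustration; on the server, emit() is supplied by the view engine.

```javascript
// Local stand-in for the view engine's emit(); it collects index rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: one index row per brewery, keyed by city.
// Assumes brewery documents carry "type" and "city" fields.
function map(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

// Hypothetical sample documents.
const docs = [
  { id: "b1", doc: { type: "brewery", name: "Alaskan Brewing", city: "Juneau" } },
  { id: "b2", doc: { type: "brewery", name: "New Belgium Brewing", city: "Fort Collins" } },
  { id: "x1", doc: { type: "beer", name: "1554 Enlightened Black Ale" } },
];
docs.forEach(function (d) {
  map(d.doc, { id: d.id });
});

// The index keeps rows sorted by key.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});
console.log(rows);
```

The beer document is skipped because it fails the type check, so only the two breweries end up in the index.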

Distributed Index Build Phase

• Optimized for lookups, in-order access and aggregations
• All view reads from disk (different performance profile)
• View builds against every document on every node
  – This is why you should group them in a design document
• Automatically kept up to date

[Diagram: SERVER 1, SERVER 2 and SERVER 3, each holding Active Docs and Replica Docs]

Dynamic Range Queries with Optional Aggregation

• Efficiently fetch a row or group of related rows.
• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries

[Diagram: SERVER 1, SERVER 2 and SERVER 3, each holding Active Docs and Replica Docs]

?startkey="J"&endkey="K"
{"rows":[{"key":"Juneau","value":null}]}

Queries run against stale indexes by default

• stale=update_after (default if nothing is specified)
  – always get fastest response
  – can take two queries to read your own writes

• stale=ok
  – auto update will trigger eventually
  – might not see your own writes for a few minutes
  – least frequent updates -> least resource impact

• stale=false
  – Use with Persistence observe if data needs to be included in view results
  – BUT be aware of the delay it adds; only use when really required
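
To make the options above concrete, here is a minimal sketch of building a view query URL with a key range and an explicit stale setting. The host, port, bucket, design document and view names are hypothetical placeholders, and the path layout is an assumption about the view REST endpoint; check it against your server version.

```javascript
// Build a view query URL from its parts.
// All names passed in below are hypothetical placeholders.
function viewUrl(base, bucket, ddoc, view, params) {
  const query = Object.keys(params)
    .map(function (name) {
      return encodeURIComponent(name) + "=" + encodeURIComponent(params[name]);
    })
    .join("&");
  return base + "/" + bucket + "/_design/" + ddoc + "/_view/" + view +
    (query ? "?" + query : "");
}

const url = viewUrl("http://localhost:8092", "beer-sample", "breweries", "by_city", {
  // Keys are JSON values, so strings keep their quotes.
  startkey: JSON.stringify("J"),
  endkey: JSON.stringify("K"),
  // Ask for the index to be updated before the query runs.
  stale: "false",
});
console.log(url);
```

Leaving out the stale parameter falls back to the default described above; passing "ok" trades freshness for the lowest resource impact.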

Development vs. Production Views

• Development views index a subset of the data.

• Publishing a view builds the index across the entire cluster.

• Queries on production views are scattered to all cluster members and results are gathered and returned to the client.

EMERGENT SCHEMA

Emergent Schema

• Falls out of your key-value usage
• Helps to know what's efficient
• Mostly you can relax

JSON.org, Github API, Twitter API

"Capture the user's intent"

QUERY PATTERN: FIND BY ATTRIBUTE

Find documents by a specific attribute

• Let's find beers by brewery_id!

The index definition

The result set: beers keyed by brewery_id
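
The index definition itself is not reproduced in this text, so the following is only a sketch of what such a secondary index could look like: a map function that emits each beer under its brewery_id, plus a lookup by key. The "type" field, the sample documents and the local emit() harness are assumptions for illustration.

```javascript
// Local stand-in for the view engine: rows remember the emitting document id.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function: index beers by the brewery they belong to.
function map(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, null);
  }
}

// Hypothetical sample documents, keyed by document id.
const docs = {
  beer_1554: { type: "beer", name: "1554 Enlightened Black Ale", brewery_id: "new_belgium_brewing" },
  beer_fat_tire: { type: "beer", name: "Fat Tire", brewery_id: "new_belgium_brewing" },
  beer_amber: { type: "beer", name: "Amber", brewery_id: "alaskan_brewing" },
  new_belgium_brewing: { type: "brewery", name: "New Belgium Brewing" },
};
Object.keys(docs).forEach(function (id) {
  currentId = id;
  map(docs[id], { id: id });
});

// Keep rows ordered by key, then by document id.
rows.sort(function (a, b) {
  if (a.key !== b.key) return a.key < b.key ? -1 : 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
});

// Equivalent of querying the view with ?key="...": ids of matching beers.
function lookup(key) {
  return rows
    .filter(function (r) { return r.key === key; })
    .map(function (r) { return r.id; });
}
console.log(lookup("new_belgium_brewing"));
```

Emitting null as the value keeps the index small; the application can fetch the full documents by id once it has the matching rows.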

QUERY PATTERN: BASIC AGGREGATIONS

Use a built-in reduce function with a group query

• Let's find average abv for each brewery!

We are reducing doc.abv with _stats

Group reduce (reduce by unique key)
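
A minimal sketch of this pattern, runnable outside the server: the map function emits doc.abv keyed by brewery_id, and a local stats() function stands in for the built-in _stats reduce (sum, count, min, max, sumsqr). The stand-in, the grouping helper and the sample documents are assumptions for illustration; on the server you would simply name _stats as the reduce and query with group=true.

```javascript
// Local stand-in for the view engine's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: emit each beer's abv under its brewery.
function map(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id && typeof doc.abv === "number") {
    emit(doc.brewery_id, doc.abv);
  }
}

// Stand-in for the built-in _stats reduce.
function stats(values) {
  return values.reduce(function (acc, v) {
    return {
      sum: acc.sum + v,
      count: acc.count + 1,
      min: Math.min(acc.min, v),
      max: Math.max(acc.max, v),
      sumsqr: acc.sumsqr + v * v,
    };
  }, { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 });
}

// Stand-in for a group=true query: reduce once per unique key.
function groupReduce(indexRows) {
  const groups = {};
  indexRows.forEach(function (r) {
    (groups[r.key] = groups[r.key] || []).push(r.value);
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: stats(groups[key]) };
  });
}

// Hypothetical sample documents.
const docs = [
  { type: "beer", brewery_id: "new_belgium_brewing", abv: 5.5 },
  { type: "beer", brewery_id: "new_belgium_brewing", abv: 6.5 },
  { type: "beer", brewery_id: "alaskan_brewing", abv: 5.0 },
  { type: "beer", brewery_id: "alaskan_brewing" }, // no abv: skipped
];
docs.forEach(function (doc) { map(doc, {}); });

// The average is derived in the application from sum and count.
const averages = groupReduce(rows).map(function (r) {
  return { brewery_id: r.key, avg_abv: r.value.sum / r.value.count };
});
console.log(averages);
```

The reduce does not return an average directly; it returns sum and count, which combine cleanly across partial results, and the division happens at the end.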

QUERY PATTERN: TIME-BASED ROLLUPS

Find patterns in beer comments by time

{
  "type": "comment",
  "about_id": "beer_Enlightened_Black_Ale",
  "user_id": 525,
  "text": "tastes like college!",
  "updated": "2010-07-22 20:00:20"
}
{ "id": "f1e62" }

timestamp

Query with group_level=2 to get monthly rollups

dateToArray() is your friend

• String or Integer based timestamps
• Output optimized for group_level queries
• array of JSON numbers: [2012,9,21,11,30,44]
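
The rollup can be sketched locally as follows. The map function emits an array key built from the comment's "updated" field, and a helper mimics a _count reduce queried with group_level. Note that the dateToArray() defined here is a simplified local stand-in that only parses "YYYY-MM-DD HH:MM:SS" strings, not the server's built-in; the grouping helper and sample comments are likewise assumptions for illustration.

```javascript
// Simplified stand-in for the server's dateToArray() helper.
// Only handles "YYYY-MM-DD HH:MM:SS"; returns null otherwise.
function dateToArray(s) {
  const m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(s);
  return m ? m.slice(1).map(Number) : null;
}

const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: key each comment by [year, month, day, hour, minute, second].
function map(doc, meta) {
  if (doc.type === "comment" && typeof doc.updated === "string") {
    const key = dateToArray(doc.updated);
    if (key) emit(key, null);
  }
}

// Order array keys element by element.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Stand-in for a _count reduce queried with ?group_level=N:
// rows sharing the first N key elements are counted together.
function groupLevel(indexRows, level) {
  const sorted = indexRows.slice().sort(function (x, y) {
    return compareKeys(x.key, y.key);
  });
  const out = [];
  sorted.forEach(function (r) {
    const prefix = r.key.slice(0, level);
    const last = out[out.length - 1];
    if (last && compareKeys(last.key, prefix) === 0) {
      last.value += 1;
    } else {
      out.push({ key: prefix, value: 1 });
    }
  });
  return out;
}

// Hypothetical sample comments.
[
  { type: "comment", updated: "2010-07-22 20:00:20" },
  { type: "comment", updated: "2010-07-23 09:15:00" },
  { type: "comment", updated: "2010-08-01 12:00:00" },
].forEach(function (doc) { map(doc, {}); });

const monthly = groupLevel(rows, 2); // one row per [year, month]
const daily = groupLevel(rows, 3);   // one row per [year, month, day]
console.log(JSON.stringify(monthly), JSON.stringify(daily));
```

The same index serves both queries; only the group_level changes, which is why the array key is ordered from the coarsest unit to the finest.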

group_level=2 results

• Monthly rollup
• Sorted by time; sort the query results in your application if you want to rank by value (no chained map-reduce)

group_level=3 - daily results - great for graphing

• Daily, hourly, minute or second rollup all possible with the same index.

• http://crate.im/posts/couchbase-views-reddit-data/

QUERY PATTERN: LEADERBOARD

Aggregate value stored in a document

• Let's find the top-rated beers!

{
  "brewery": "New Belgium Brewing",
  "name": "1554 Enlightened Black Ale",
  "abv": 5.5,
  "description": "Born of a flood...",
  "category": "Belgian and French Ale",
  "style": "Other Belgian-Style Ales",
  "updated": "2010-07-22 20:00:20",
  "ratings": {
    "ingenthr": 5,
    "jchris": 4,
    "scalabl3": 5,
    "damienkatz": 1
  },
  "comments": [ "f1e62", "6ad8c" ]
}

ratings

Sort each beer by its average rating

• Let's find the top-rated beers!

average
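
One way this could be written, as a sketch: the map function averages the values in doc.ratings and emits that average as the key, so the index is ordered by rating and a descending query returns the leaders first. The emit() harness, the descending sort helper and the second sample beer are assumptions for illustration.

```javascript
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: key each beer by the average of its ratings.
function map(doc, meta) {
  if (doc.ratings && typeof doc.ratings === "object") {
    let total = 0;
    let count = 0;
    Object.keys(doc.ratings).forEach(function (user) {
      const rating = doc.ratings[user];
      if (typeof rating === "number") {
        total += rating;
        count += 1;
      }
    });
    if (count > 0) {
      emit(total / count, doc.name);
    }
  }
}

// Stand-in for querying with ?descending=true&limit=N.
function top(indexRows, limit) {
  return indexRows
    .slice()
    .sort(function (a, b) { return b.key - a.key; })
    .slice(0, limit);
}

[
  {
    name: "1554 Enlightened Black Ale",
    ratings: { ingenthr: 5, jchris: 4, scalabl3: 5, damienkatz: 1 },
  },
  // Hypothetical second beer so there is something to rank against.
  { name: "Hypothetical Pale Ale", ratings: { alice: 5, bob: 4 } },
  { name: "Unrated Beer" }, // no ratings: not indexed
].forEach(function (doc) { map(doc, {}); });

const leaders = top(rows, 10);
console.log(leaders);
```

Because the aggregate is computed from a single document, no reduce is needed here; the sorted map output is the leaderboard.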

WHAT NOT TO WRITE

Most common mistakes

• Reduces that don’t reduce
• Trying to do too many things with one view
• Emitting too much data into a view value
• Expecting view query performance to be as fast as get/set
• Recursive queries require application code.

GEOGRAPHIC INDEX

Experimental Status

• Not yet using Superstar trees
  – (only fast on large clusters)

• Optimized for bulk loading

FULL TEXT INDEX

Elastic Search Adapter

• Elastic Search is good for ad-hoc queries and faceted browsing
• Our adapter is aware of changing Couchbase topology
• Indexed by Elastic Search after stored to disk in Couchbase

ElasticSearch

QUESTIONS?

Views Under The Hood

J Chris Anderson, Architect

THIS TALK IS NOT WRITTEN YET
maybe combine with Dustin’s internals talk about vbucket handoff

What we’ll talk about

• Key areas/topics discussed

Dynamic Time Range Queries

The B-tree Index
• Helps to know what's efficient
• Superstar

http://damienkatz.net/2012/05/stabilizing_couchbase_server_2.html

Logical View B-tree
• Incremental reduce values are stored in the tree

[Diagram: B-tree whose leaf nodes reduce to 7, 5, 5, 3, 2 and 3, and whose root REDUCES to 25]

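Reduce!
• Incremental reduce values are stored in the tree

[Diagram: the same tree, with 7 5 5 3 2 3 reducing to 25 using _count]

function(keys, values) {
  // keys is null on rereduce, so add up the partial counts instead
  return keys ? values.length : sum(values);
}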
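
To see how the function above produces the numbers in the diagram, here is a small local walk-through. Each leaf is reduced to a count, and those partial counts are then rereduced into the root value. The sum() defined here is a stand-in for the view engine's built-in, and the leaf sizes are taken from the slide; the placeholder keys and the split of a range query into three fully covered leaves plus two rows of a partially covered one are assumptions about how to read the figure.

```javascript
// Stand-in for the view engine's built-in sum().
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The _count-style reduce from the slide.
function countReduce(keys, values) {
  return keys ? values.length : sum(values);
}

// Reduce a leaf holding n rows (placeholder keys, null values).
function reduceLeaf(n) {
  const keys = new Array(n).fill("k");
  const values = new Array(n).fill(null);
  return countReduce(keys, values);
}

// Leaf sizes from the diagram.
const leafSizes = [7, 5, 5, 3, 2, 3];

// First pass: each leaf is reduced on its own and the result is cached.
const partials = leafSizes.map(reduceLeaf);

// Rereduce: keys is null, values are the cached partial counts.
const total = countReduce(null, partials);

// A range query can reuse cached counts for fully covered leaves and
// only reduce the rows of a partially covered leaf.
const rangeCount = countReduce(null, [7, 5, 5, reduceLeaf(2)]);

console.log(partials, total, rangeCount);
```

This is why a reduce has to return something small and combinable: the tree stores its output at every inner node and feeds it back in on rereduce.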

Dynamic Queries
• You can query that tree dynamically
• Lots of the patterns are about pulling value from this data structure

[Diagram: tree with leaf counts 7 5 5 3 2 3, reduced with _count]

?startkey="abba"&endkey="robot"
{"value":19}

Dynamic Queries
• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries

[Diagram: root 25 over leaf counts 7 5 5 3 2 3; the queried range covers (7 5 5 2), which _count reduces to 19]

?startkey="abba"&endkey="robot"
{"value":19}

Respect Reduce! (anti-pattern)
• Incremental reduce values are stored in the tree

function(keys, values) {
  return values;
}

DO NOT DO THIS! IT DOESN’T reduce

["ace", "argh!", "asphalt"]
["front", "garage", "hibernate"]
["pluto", "nectar", "mirage"]

["ace", "argh!", "asphalt", "front", "garage", "hibernate"]

Just use the Map

• If you think you need “the identity reduce”, just use the map.

["ace", "argh!", "asphalt", "front", "garage", "hibernate"]
USE THE MAP

Lookup via key-range
• Find tables during yesterday's lunch shift
• Find shifts owned by which manager

[Diagram: tree with leaf counts 7 5 5 3 2 3]

?startkey="abba"&endkey="robot"
{"value":19}

Schema evolution

Application and Views

• Interactive schema fully controlled by application
• If your code can handle it, the database can
• Learn to write views defensively

Incremental schema evolution

• Use a view to decide which documents need work
• Make your workers idempotent
• Once all your data is cleaned up, and old clients are no longer writing the old format
• The cleanup view is obsolete, so is any app code for dealing with the old case
• You've evolved!
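
The steps above might be sketched as follows, using an invented migration: old beer documents carry only a "brewery" name, and the new format adds a "brewery_id". The field names, the id derivation and the emit() harness are all hypothetical and chosen only to illustrate a defensive cleanup view paired with an idempotent worker.

```javascript
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Cleanup view: defensively checks every field before using it and
// emits only documents still in the old format.
function needsWork(doc, meta) {
  if (doc.type === "beer" && !doc.brewery_id && typeof doc.brewery === "string") {
    emit("missing_brewery_id", null);
  }
}

// Idempotent worker: upgrading an already upgraded document is a no-op.
function upgrade(doc) {
  if (doc.brewery_id || typeof doc.brewery !== "string") {
    return doc;
  }
  const copy = Object.assign({}, doc);
  copy.brewery_id = doc.brewery.toLowerCase().replace(/[^a-z0-9]+/g, "_");
  return copy;
}

// Run the cleanup view over a set of documents and return ids needing work.
function pending(docs) {
  rows.length = 0;
  Object.keys(docs).forEach(function (id) {
    currentId = id;
    needsWork(docs[id], { id: id });
  });
  return rows.map(function (r) { return r.id; });
}

// Hypothetical documents: one old format, one new format, one unrelated.
const docs = {
  beer_old: { type: "beer", name: "1554 Enlightened Black Ale", brewery: "New Belgium Brewing" },
  beer_new: { type: "beer", name: "Fat Tire", brewery: "New Belgium Brewing", brewery_id: "new_belgium_brewing" },
  comment_1: { type: "comment", text: "tastes like college!" },
};

const before = pending(docs);
before.forEach(function (id) { docs[id] = upgrade(docs[id]); });
const after = pending(docs);
console.log(before, after);
```

Once the view comes back empty and no client writes the old shape any more, both the view and the upgrade code can be deleted.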
