Transcript
Page 1: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Couchbase 104

Justin Michaels

[email protected] | @justindmichaels

Page 2: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Views and Indexes Overview

Page 3: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Indexes are “views” into Data

• A shortcut derived from, and pointing into, a greater volume of values, data, information, or knowledge

Traditional Index Examples

• Table of Contents

• Card Catalog

Indexes and Views

©2014 Couchbase, Inc. 3

Page 4: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

In Couchbase, Map-Reduce is used to maintain indexes

Map functions are applied to JSON documents and they output or "emit" data that is organized in index form

Each emit() call produces a row in the index

Couchbase Views - Map-Reduce Indexes
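As a rough illustration of how emit() calls become index rows, the sketch below imitates the process in memory. It is not Couchbase's view engine: `buildIndex`, the sample documents, and the locally defined `emit` are assumptions made for the example.

```javascript
// In-memory stand-in for the view engine (hypothetical, for illustration only).
let rows = [];
let currentId = null;

// Couchbase supplies emit() to map functions; this local version simply
// records one index row per call.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// A map function in the shape used for views.
function map(doc, meta) {
  if (doc.username) {
    emit(doc.username, doc.email);
  }
}

// Apply the map function to every document and collect the emitted rows.
function buildIndex(docs, mapFn) {
  rows = [];
  for (const { id, doc } of docs) {
    currentId = id;
    mapFn(doc, { id: id });
  }
  return rows;
}

// Made-up sample documents.
const sampleDocs = [
  { id: "u::1", doc: { username: "alice", email: "alice@example.com" } },
  { id: "u::2", doc: { username: "bob", email: "bob@example.com" } },
  { id: "u::3", doc: { note: "no username, so nothing is emitted" } },
];

const index = buildIndex(sampleDocs, map);
console.log(index); // two rows, one per emit() call
```

The third document produces no row because the map function never calls emit() for it, which is how a view filters documents.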


Page 5: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Map-Reduce is a technique designed for dealing with semi-structured data by parallel processing across a distributed system

Different from Hadoop Map/Reduce

• Map functions identify data within collections, process it, and output transformed values

• Reduce functions take the output of Map functions and perform numeric aggregate calculations on them

What is Map Reduce?


Page 6: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Map inputs:

• Document – Application data

• Metadata – Couchbase data

Map outputs:

• Document ID

• View Key: User configurable based on JSON fields

• View Value: Only needed when reducing, use ‘null’ otherwise

Produces Index:

• B-tree Structure

• Sorted Alphabetically

Map Functions
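To make the map outputs and the sorted index concrete, here is a minimal sketch with made-up rows. The comparison is a plain string comparison standing in for the real ordering rules, and `compareRows` and `range` are hypothetical helper names.

```javascript
// Rows as a map function might emit them: document ID, view key, view value.
const emitted = [
  { id: "u::35", key: "carol", value: null },
  { id: "u::1", key: "alice", value: null },
  { id: "u::20", key: "bob", value: null },
];

// Stand-in ordering: compare keys as plain strings, then break ties on the
// document ID. (The real index applies its own collation; this is simplified.)
function compareRows(a, b) {
  if (a.key !== b.key) return a.key < b.key ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

const sortedIndex = [...emitted].sort(compareRows);

// In a sorted index, all rows inside a key range sit next to each other.
function range(index, startKey, endKey) {
  return index.filter((row) => row.key >= startKey && row.key <= endKey);
}

console.log(sortedIndex.map((r) => r.key)); // [ 'alice', 'bob', 'carol' ]
console.log(range(sortedIndex, "b", "c").map((r) => r.id)); // [ 'u::20' ]
```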


Page 7: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Built-in reduce functions (Optional)

• _count – counts the rows emitted by the map function (per key when grouped)

• _sum – provides a sum total of values

• _stats – provides statistics (sum, count, min, max, sum of squares) of values

Operate on results emitted by map function

Results stored pre-computed for fast access

Custom reductions are possible

Reduce Functions
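The built-in reductions can be approximated in a few lines. This is only a sketch of what they compute over the emitted values: `count`, `sum`, and `stats` are local stand-ins, and the field names in the `stats` result are an assumption about the shape of the `_stats` output.

```javascript
// Values emitted by a map function (made-up numbers).
const values = [1000, 1200, 900];

// Stand-in for _count: number of rows.
const count = (vals) => vals.length;

// Stand-in for _sum: total of the values.
const sum = (vals) => vals.reduce((acc, v) => acc + v, 0);

// Stand-in for _stats: several aggregates computed together.
function stats(vals) {
  return {
    sum: sum(vals),
    count: count(vals),
    min: Math.min(...vals),
    max: Math.max(...vals),
    sumsqr: vals.reduce((acc, v) => acc + v * v, 0),
  };
}

const result = stats(values);
console.log(count(values)); // 3
console.log(sum(values)); // 3100
console.log(result.min, result.max); // 900 1200
// An average can be derived by the caller as sum / count.
console.log(result.sum / result.count);
```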


Page 8: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture

Page 9: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture - Couchbase View Engine

[Figure: Couchbase Server Node diagram. Labels: App Server, Doc 1, Managed Cache, Replication Queue (To other node), Disk Queue, Disk, View engine.]

Page 10: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

[Figure: COUCHBASE SERVER CLUSTER with SERVER 1, SERVER 2, and SERVER 3, each holding ACTIVE and REPLICA documents (User Configured Replica Count = 1); APP SERVER 1 and APP SERVER 2, each with a COUCHBASE Client Library and CLUSTER MAP; and a Query label.]

• Indexing is distributed across nodes

• Parallelize the effort

• Each node has index for data stored on it

• Queries combine the results from required nodes

Architecture - Couchbase View Engine
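One way to picture "queries combine the results from required nodes" is a merge of per-node index fragments that are each already sorted. The sketch below is an assumption about the general idea, not the server's actual merge code; the node names and rows are made up.

```javascript
// Hypothetical per-node index fragments, each already sorted by key.
const nodeIndexes = {
  server1: [{ id: "doc2", key: "bob" }, { id: "doc5", key: "erin" }],
  server2: [{ id: "doc1", key: "alice" }, { id: "doc4", key: "dave" }],
  server3: [{ id: "doc8", key: "carol" }, { id: "doc9", key: "frank" }],
};

// Merge sorted fragments into one sorted result (a simple k-way merge).
function mergeSorted(fragments) {
  const cursors = fragments.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < fragments.length; i++) {
      if (cursors[i] >= fragments[i].length) continue; // fragment exhausted
      if (
        best === -1 ||
        fragments[i][cursors[i]].key < fragments[best][cursors[best]].key
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every fragment is exhausted
    merged.push(fragments[best][cursors[best]]);
    cursors[best] += 1;
  }
}

const combined = mergeSorted(Object.values(nodeIndexes));
console.log(combined.map((row) => row.key));
// [ 'alice', 'bob', 'carol', 'dave', 'erin', 'frank' ]
```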

Page 11: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Buckets have one or more DESIGN DOCUMENTS

• Distributed across cluster when created

DESIGN DOCUMENTS contain one or more VIEW definitions

• Design Documents are processed in parallel

• All the views in a single design document are processed sequentially

Architecture – Design Document

[Figure: BUCKET A and BUCKET B. Design document 1: View 1, View 2, View 3. Design document 2: View 4, View 5. Design document 3: View 6, View 7.]
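A design document is a JSON document whose `views` object maps each view name to its map (and optional reduce) source. The sketch below shows that shape; the view names and function bodies are hypothetical.

```javascript
// One design document holding two view definitions (names are made up).
const designDocument = {
  views: {
    by_username: {
      map: "function (doc, meta) { emit(doc.username, doc.email); }",
    },
    points_by_email: {
      map: "function (doc, meta) { emit(doc.email, doc.points); }",
      reduce: "_sum",
    },
  },
};

// The definition is stored and sent as JSON text.
const designDocumentJson = JSON.stringify(designDocument, null, 2);
console.log(designDocumentJson);
console.log(Object.keys(designDocument.views)); // [ 'by_username', 'points_by_email' ]
```

Because the views in one design document are processed together, which views share a design document is a performance decision as much as an organizational one.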

Page 12: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture – Couchbase Map Reduce


Individual document operations are atomic

Views are eventually consistent in relation to documents

Incremental Map-Reduce

• Spread load across nodes

• Each node indexes its data

Map: Process, filter, map and emit a row

Reduce: Aggregate mapped data

Default: _count, _sum, _stats

Page 13: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture - Index Building Details


Views are maintained directly from managed cache

• The entire view is recreated if the view definition has changed

• All the views within a design document are incrementally updated

Views are updated automatically according to:

• Update Interval (time period); default 5000 milliseconds

OR (as of 3.x)

• Update Documents (number of changes); default 5000 changes

Update Controlled by:

• Configured globally, or via REST for an individual design document

• Manual updates provide application control

Page 14: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

stale = UPDATE_AFTER (default if nothing is specified)

• fast response

• can take two operations to read your own writes

stale = OK (most likely to be used)

• auto update only

• might not see your own writes

• least frequent updates -> least resource impact -> highest performance

stale = FALSE (only when TRULY required)

• use with persistTo during set if a write must be reflected in the view update

• BUT be aware of the delay it adds to set and query operations

Architecture - Index Building Details
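The three stale settings can be compared with a toy model. Everything here is an assumption for illustration: `createView`, `write`, `updateIndex`, and `query` are hypothetical names, and the update after an update_after query runs inline rather than in the background.

```javascript
// Toy view: `indexed` holds rows already in the index, `pending` holds rows
// for writes that have not been indexed yet.
function createView() {
  return { indexed: [], pending: [] };
}

function write(view, row) {
  view.pending.push(row);
}

function updateIndex(view) {
  view.indexed.push(...view.pending);
  view.pending = [];
}

function query(view, stale) {
  if (stale === "false") {
    updateIndex(view); // update first, then answer (slowest, freshest)
    return [...view.indexed];
  }
  const result = [...view.indexed]; // answer from the index as it is now
  if (stale === "update_after") {
    updateIndex(view); // then trigger an update (inline in this sketch)
  }
  return result; // "ok": the query itself triggers no update
}

const view = createView();
write(view, "row-a");
const first = query(view, "ok"); // [] : own write not visible
const second = query(view, "update_after"); // [] : still the old index
const third = query(view, "ok"); // [ 'row-a' ] : visible on the second read
write(view, "row-b");
const fourth = query(view, "false"); // [ 'row-a', 'row-b' ]
console.log(first, second, third, fourth);
```

The second and third queries show why update_after can take two operations to read your own writes.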


Page 15: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

In addition to data replicas, optionally create replica for indexes

• Build an index using the data in replica vBuckets

Enabled per bucket (Bucket Config) or per design document (REST API)

• Each node must maintain index for active and replica data

• Implies additional CPU and I/O overhead

Failover and Failures

• Without replica indexes complete view is rebuilt

• Replica indexes enabled if present and queries remain consistent

Architecture - Index Building Details (Replicas)


Page 16: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture - Disk Structure

Each design document creates its own set of index files

Index data is always read from disk

• File format allows for successful I/O caching by operating system

Separate disk devices for view versus data files

• Both are append-only

• Both are compacted in parallel

• Better use of IO and caching

• Possible to use SSDs for improved performance on one or the other (or both)


Page 17: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Development vs Production Views

Development Views

• Can be edited

• Can be tested on full/partial dataset

• Not automatically maintained

Production Views

• Always operate on full document set

• Cannot be modified

• Automatically updated

Development Views are ‘published’ to Production

Publishing simply creates the view definition in production; it is NOT a move to a new cluster

Execute Development View on Entire Cluster

[Figure: development-to-production workflow. Labels: Development View (Create, Edit/Refine), Sample Index, Subset, Promote to Production, Production View, Full Index, Full Data, Bucket Content.]
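Publishing can be pictured as copying a definition, not moving data. The sketch below assumes the convention that development design documents carry a `dev_` name prefix and that publishing stores the same definition under the plain name; `publish` and the sample design document are hypothetical.

```javascript
// Assumed naming convention for development design documents.
const DEV_PREFIX = "dev_";

// Return a new set of design documents with the development definition
// copied under its production name. The development copy stays editable.
function publish(designDocs, devName) {
  if (!devName.startsWith(DEV_PREFIX)) {
    throw new Error("not a development design document: " + devName);
  }
  const productionName = devName.slice(DEV_PREFIX.length);
  const copy = JSON.parse(JSON.stringify(designDocs[devName]));
  return { ...designDocs, [productionName]: copy };
}

const bucketDesignDocs = {
  dev_users: {
    views: {
      by_email: { map: "function (doc, meta) { emit(doc.email, null); }" },
    },
  },
};

const afterPublish = publish(bucketDesignDocs, "dev_users");
console.log(Object.keys(afterPublish)); // [ 'dev_users', 'users' ]
```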


Page 18: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Writing Views

Page 19: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Map() Function => Index

function (doc, meta) {             // doc: JSON document; meta: document metadata
  emit(doc.username, doc.email);   // creates a row: indexed key, output value(s)
}

Every Document passes through View Map() functions

Map

View Anatomy


Page 20: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Single Element Keys (Text Key)

function (doc, meta) {
  emit(doc.email, doc.points);   // doc.email is the text key
}

Map

meta.id    doc.email           doc.points
u::1       [email protected]   1000
u::35      [email protected]   1200
u::20      [email protected]   900

View Anatomy


Page 21: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Compound Keys (Array)

function (doc, meta) {
  emit(dateToArray(doc.timestamp), 1);   // the array is the key
}

Array-based index keys get sorted as strings, but can be grouped by array elements

Map

meta.id    dateToArray(doc.timestamp)    value
u::20      [2012,10,9,18,45]             1
u::1       [2012,9,26,11,15]             1
u::35      [2012,8,13,2,12]              1

View Anatomy
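Grouping by array elements can be sketched as grouping rows on a prefix of the key. In this example `dateToArrayStandIn` is a hypothetical replacement for the dateToArray() helper (it returns UTC year, month, day, hour, and minute), and `groupByLevel` is an assumed name for a grouped count.

```javascript
// Hypothetical stand-in for dateToArray(); works in UTC for predictability.
function dateToArrayStandIn(timestamp) {
  const d = new Date(timestamp);
  return [
    d.getUTCFullYear(),
    d.getUTCMonth() + 1,
    d.getUTCDate(),
    d.getUTCHours(),
    d.getUTCMinutes(),
  ];
}

// Made-up timestamps; each becomes a row with an array key and the value 1.
const timestamps = [
  "2012-10-09T18:45:00Z",
  "2012-09-26T11:15:00Z",
  "2012-08-13T02:12:00Z",
  "2012-09-02T07:30:00Z",
];
const indexRows = timestamps.map((t) => ({ key: dateToArrayStandIn(t), value: 1 }));

// Group rows by the first `level` elements of the key and add up the values.
function groupByLevel(rows, level) {
  const groups = new Map();
  for (const row of rows) {
    const prefix = JSON.stringify(row.key.slice(0, level));
    groups.set(prefix, (groups.get(prefix) || 0) + row.value);
  }
  return [...groups].map(([prefix, total]) => ({
    key: JSON.parse(prefix),
    value: total,
  }));
}

const perYear = groupByLevel(indexRows, 1); // one group: [2012]
const perMonth = groupByLevel(indexRows, 2); // one group per [year, month]
console.log(perYear);
console.log(perMonth);
```

Emitting 1 as the value is what lets a summing reduce double as a per-group counter.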

Page 22: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

key = “” (exact match)

keys = [ ] (set of keys match)

startkey/endkey = “” (range queries on view key)

startkey_docid/endkey_docid = “” (range queries on meta.id)

stale (false, update_after, ok)

group/group_level (aggregate with grouping)

View Anatomy - Parameters
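These parameters travel as a query string, with key-type values written as JSON. The helper below is a sketch under that assumption; `viewQuery` is a hypothetical name and the sample keys are made up.

```javascript
// Build a view query string. Key-type parameters are JSON-encoded, so a
// string key keeps its quotes and an array key keeps its brackets.
function viewQuery(params) {
  const jsonParams = new Set(["key", "keys", "startkey", "endkey"]);
  const qs = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    qs.set(name, jsonParams.has(name) ? JSON.stringify(value) : String(value));
  }
  return qs.toString();
}

// Exact match on a text key, answered from the existing index.
const exactMatch = viewQuery({ key: "alice", stale: "ok" });

// Range over array keys, grouped on the first two array elements.
const monthRange = viewQuery({
  startkey: [2012, 9],
  endkey: [2012, 10],
  group_level: 2,
});

console.log(exactMatch);
console.log(monthRange);
```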


Page 23: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Collation


Unicode Collation

• 1234567890 < aAbBcCdDeEfFgGhHiIjJkKlLmM...

• a < á < A < Á < b

Byte Order

• 1234567890 < A-Z < a-z
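The difference between the two orderings is easy to see by sorting the same keys both ways. This sketch uses `Intl.Collator` as a stand-in for Unicode collation; it is an assumption that it approximates the view engine's ordering, and the exact tie-breaking between "a" and "A" depends on the runtime's ICU data.

```javascript
const keys = ["b", "A", "a", "B", "1"];

// Byte (code unit) order: digits, then uppercase, then lowercase.
const byteOrder = [...keys].sort((x, y) => (x < y ? -1 : x > y ? 1 : 0));

// Locale-aware order: letters are grouped by base letter, so both forms
// of "a" come before both forms of "b".
const collator = new Intl.Collator("en");
const collated = [...keys].sort(collator.compare);

console.log(byteOrder); // [ '1', 'A', 'B', 'a', 'b' ]
console.log(collated);
```

A range query such as "a" to "b" therefore matches different rows depending on which ordering the index uses.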

Page 24: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Sample Document

[Figure: sample document with a Document ID callout]


Page 25: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Sample Index

[Figure: sample index with Key and Value callouts]


Page 26: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Examples


Page 27: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Querying


• Simple View Access

• Exact Match

• Range

• With Reduction

• With Grouping

Page 28: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Best Practices

Page 29: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View size is determined by key and value contents

• Emit as little as possible … not full document

• Only use values when required by a reduce function

• Only emit either null or the secondary key (doc ID included with each row)

View distribution:

• More views per design document require more time to update all views in the group

• A single view per design document may require more CPU

• Group views in design documents by update frequency, rather than by subject/topic

View Best Practices


Page 30: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Queries should have consistent response times

• Indexes are pre-materialized

• Expect to use stale=ok

File system cache availability for the index has a big impact on performance

• Indexes are disk based

• Reduce cluster quota to give more system cache

In-house performance results show that by doubling system cache availability

• query latency reduces by half

• throughput increases by 50%

View Best Practices


Page 31: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Best Practices


Avoid computing too many things in a single View

Select (filter) data to avoid unnecessary entries in the View

• Use document types to make Views more selective

Project (map) only necessary data and emit it as value

• When possible emit a null value and perform an additional Get to retrieve the whole document

Use the built-in reduce functions if possible
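The selection and projection advice can be combined in one small map function. The sketch below is an in-memory illustration only: the `type` field, the document store, and `findUserByEmail` are assumptions, and `emit` is defined locally in place of the one the view engine provides.

```javascript
// Hypothetical document store keyed by document ID.
const store = new Map([
  ["u::1", { type: "user", email: "alice@example.com", points: 1000 }],
  ["u::2", { type: "user", email: "bob@example.com", points: 1200 }],
  ["order::7", { type: "order", total: 25 }],
]);

const viewRows = [];
let currentDocId = null;
function emit(key, value) {
  viewRows.push({ id: currentDocId, key: key, value: value });
}

// Selective map function: only "user" documents, key only, null value.
function map(doc, meta) {
  if (doc.type === "user" && doc.email) {
    emit(doc.email, null);
  }
}

for (const [id, doc] of store) {
  currentDocId = id;
  map(doc, { id: id });
}

// Query the small index, then fetch the full document by its ID
// (the additional Get).
function findUserByEmail(email) {
  const row = viewRows.find((r) => r.key === email);
  return row ? store.get(row.id) : undefined;
}

console.log(viewRows.length); // 2
console.log(findUserByEmail("bob@example.com").points); // 1200
```

The index holds only the key and the document ID, so its size stays independent of how large the user documents grow.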


Page 32: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Couchbase Query Language


Page 33: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Querying with N1QL (“Nickel”)


[Figure: Person]

JSON can model our Complex World

N1QL Can Query that World

N1QL Developer Preview and Tutorial

http://docs.couchbase.com/developer/n1ql-dp3/n1ql-intro.html

http://query.pub.couchbase.com/tutorial/#1

Page 34: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Thank You!

Next Session:

Couchbase 105 | December 3, 2014 | 10am Pacific

Cross Data Center Replication (aka XDCR)


Justin Michaels

[email protected] | @justindmichaels

