Deep Dive into N1QL with Global Secondary Indexes – Couchbase Live New York 2015


Upload: couchbase

Post on 12-Jan-2017


Page 1: Deep Dive into N1QL with Global Secondary Indexes – Couchbase Live New York 2015

Deep Dive into N1QL & Indexing

Cihan Biyikoglu | Dir. Product Management, Couchbase


©2015 Couchbase Inc. 2

Goals
Deeper look at query performance and scale
– Look at Query and Index Service Scale Characteristics
– Understand Query Execution Flow
– Understand Index Usage


Agenda
Part I - Architectural Overview
– Architecture with Couchbase Server 4.0
– Query Processing & Indexing

Part II - Optimizing Queries
– Execution Plans and Operators

Q&A


Part I - Architecture Overview


Couchbase Server Cluster Architecture

[Diagram: a six-node cluster, Couchbase Server 1 through Couchbase Server 6. Each node runs the Cluster Manager together with the Data Service, Index Service and Query Service, backed by a Managed Cache and Storage, and the data is spread across the nodes as shards.]



Query Processing Overview


Query Execution
Submitting Queries in N1QL
– Stateless Connectivity through REST
– Load-Balance across Query Service nodes
– Prepared vs Ad-hoc Query Execution
– Consistency Dials – more on this later…

Execution Flow
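
To make the "stateless REST" and "load-balance across Query Service nodes" points concrete, here is a minimal Python sketch, not an official client. It only builds the HTTP requests and never sends them. The host names are hypothetical, the round-robin policy is an assumption for illustration, and the sketch assumes the Query Service's default REST endpoint (port 8093, path /query/service, form parameter statement).

```python
import itertools
import urllib.parse
import urllib.request

# Hypothetical Query Service nodes; replace with real host names.
QUERY_NODES = ["node1.example.com", "node2.example.com", "node3.example.com"]
_next_node = itertools.cycle(QUERY_NODES)  # simple client-side round-robin


def build_query_request(statement: str) -> urllib.request.Request:
    """Build (but do not send) a POST for the next Query Service node."""
    host = next(_next_node)
    # The N1QL text travels as a form-encoded "statement" parameter.
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    # Assumed defaults: port 8093 and path /query/service.
    return urllib.request.Request(
        url=f"http://{host}:8093/query/service",
        data=body,
        method="POST",
    )


# Each request is self-contained, so any node can serve any query.
built = [build_query_request("SELECT 1;") for _ in range(4)]
hosts = [urllib.parse.urlparse(r.full_url).hostname for r in built]
```

Because no session state lives on the server, the fourth request simply wraps around to the first node again.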


Query Service - Capacity Management
Scaling the Query Service
– Pro: Load Balance Queries across all nodes
– Con: Compete with Index and Data Workloads

[Diagram: Couchbase Cluster, node1 to node8, showing the Data Service, Index Service and Query Service]


Query Service - Capacity Management
Scaling the Query Service
– Added CPU: higher concurrent query execution and intra-query parallelization
– Added RAM: improved caching with larger result sets
– Added Node: better availability and load balancing

[Diagram: Couchbase Cluster, node1 to node8, showing the Data Service, Index Service and Query Service]


Indexing Overview


Indexing in Couchbase Server 4.0
Multiple Indexers

GSI – Index Service
– New indexing for N1QL for low latency queries without compromising on mutation performance (insert/update/delete)
– Independently partitioned and independently scalable indexes in the Index Service

Map/Reduce Views – Data Service
– Powerful programmable indexer for complex reporting and indexing logic
– Full partition alignment and paired scalability with Data Service

Spatial View – Data Service
– Incremental R-tree indexing for powerful bounding-box queries
– Full partition alignment and paired scalability with Data Service


Query and Index Today
Once upon a time in a User Profile System…
Q1: Find the top 10 most "active" customers by number of logins in AUG 2015

{
  …
  "customer_name": "Cihan",
  "total_logins": {
    …
    "aug_2015": 100,
    …
  },
  "type": "customer_profile"
  …
}


Query and Index Today
INDEX ON customer_bucket(customer_name, total_logins.aug_2015)
WHERE type="customer_profile";

SELECT customer_name, total_logins.aug_2015
FROM customer_bucket
WHERE type="customer_profile"
ORDER BY total_logins.aug_2015 DESC
LIMIT 10;

Q1: Execution Plan on N nodes
• Scatter: Execute Q1 on N nodes
• Gather: Gather N results
• Finalize: Execute Q1 on governor node
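
The scatter, gather and finalize steps can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how the server implements it: each "node" is just a list of documents, the data is made up, and the function names are hypothetical.

```python
import heapq


def local_top_n(docs, n):
    """Rank customer profiles by August 2015 logins and keep the best n."""
    profiles = (d for d in docs if d["type"] == "customer_profile")
    return heapq.nlargest(n, profiles, key=lambda d: d["total_logins"]["aug_2015"])


def scatter_gather_top_n(nodes, n):
    # Scatter: every node runs Q1 against only the documents it holds.
    partials = [local_top_n(docs, n) for docs in nodes]
    # Gather: collect the N partial results on the governor node.
    gathered = [d for partial in partials for d in partial]
    # Finalize: run Q1 once more over the gathered rows.
    return local_top_n(gathered, n)


# Made-up data: customer c<i> has i logins, spread round-robin over 3 nodes.
nodes = [[], [], []]
for i in range(30):
    nodes[i % 3].append(
        {"customer_name": f"c{i}", "type": "customer_profile",
         "total_logins": {"aug_2015": i}}
    )

top_names = [d["customer_name"] for d in scatter_gather_top_n(nodes, 10)]
```

Every node has to take part and return up to 10 rows even though only 10 rows survive overall, which is the cost that a single-node index scan avoids.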


Query and Index with GSI
INDEX ON customer_bucket(customer_name, total_logins.aug_2015)
WHERE type="customer_profile";

SELECT customer_name, total_logins.aug_2015
FROM customer_bucket
WHERE type="customer_profile"
ORDER BY total_logins.aug_2015 DESC
LIMIT 10;

Q1: Execution Plan on N nodes
• Execute Q1 on N1QL Service node
• Scan index on Index Service node


Introducing Global Secondary Indexes
What are Global Secondary Indexes?
High performance indexes for low latency queries with powerful caching, storage and independent placement.

Power of GSI
– Fully integrated into N1QL Query Optimization and Execution
– Independent Index Distribution for Limiting scatter-gather
– Independent Scalability with Index Service – more on this later
– Powerful caching and storage with ForestDB


Which to choose – GSI vs Views

Workloads | New GSI in v4.0 | Map/Reduce Views
Complex Reporting | Just In Time | Pre-aggregated
Workload Optimization | Optimized for Scan Latency & Throughput | Optimized for Insertion
Flexible Index Logic | N1QL Functions | JavaScript
Secondary Lookups | Single Node Lookup | Scatter-Gather


Which to choose – GSI vs Views

Capabilities | New GSI in v4.0 | Map/Reduce Views
Partitioning Model | Independent – Index Service | Aligned to Data – Data Service
Scale Model | Independently Scale Index Service | Scale with Data Service
Fetch with Index Key | Single Node | Scatter-Gather
Range Scan | Single Node | Scatter-Gather
Grouping, Aggregates | With N1QL | Built-in with Views API
Caching | Managed | Not Managed
Storage | ForestDB | Couchstore
Availability | Multiple Identical Indexes load balanced | Replica Based


Index Service - Capacity Management
Scaling the Index Service
– Pro: Load balance scans across all nodes
– Con: Compete with Query and Data Workloads

[Diagram: Couchbase Cluster, node1 to node8, showing the Data Service, Index Service and Query Service]


Index Service - Capacity Management
Scaling the Index Service
– Added RAM: better caching of indexes
– Added CPU: faster index maintenance & parallelized index scans
– Added Faster IO Path: faster index persistence
– Added Node: better availability and load balancing

[Diagram: Couchbase Cluster showing the Data Service, Query Service and Index Service]


[Diagram: on the Data Service, Bucket#1 and Bucket#2 feed the Projector & Router, which sends a DCP stream to the Index Service; on the Index Service, a Supervisor (index maintenance & scan coordinator) manages Index#1 to Index#4 on the ForestDB storage engine; on the Query Service, the query processor (cbq-engine) works against the Index Service.]

Projector and Router
– 1 Projector and Router per node
– 1 stream of changes per bucket per supervisor

Supervisor
– 1 Supervisor per node
– Many indexes per Supervisor


Part II - Optimizing Queries


Execution Plans & Explain
EXPLAIN query
– Plan is assembled into an execution flow expressed through the operators
– Operators stream results to one another, up and down the flow

[Diagram: example plan with a Sequence operator containing a PrimaryScan, a Parallel operator with three Fetch and InitialProject pairs, and a Limit]


Operators
Main Operations

Scans
– PrimaryScan: Scan of the Primary Index based on document keys
– IndexScan: Scan of the Secondary Index based on a predicate

Fetch
– Fetch: Reach into the Data Service with a document key

Projection Operations
– InitialProject: reducing the stream size to the fields involved in the query
– FinalProject: final shaping of the result to the requested JSON shape


Operators cont.

Operator Assembly
– Parallel: execute all child operations in parallel
– Sequence: execute child items in a sequence

Filtering Operators
– Filter: Apply a filter expression (ex. WHERE field = "value")
– Limit: limit the number of items returned to N
– Offset: start returning items from a specified item count
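
One way to picture operators streaming results to each other is a chain of Python generators. The sketch below is only an analogy under stated assumptions: the operator names mirror the slides, but the functions, the in-memory BUCKET dictionary and its documents are made up for illustration and are not the query engine's implementation.

```python
from itertools import islice

# Toy keyspace standing in for the Data Service: document key -> document.
BUCKET = {
    "b1": {"type": "beer", "name": "Amber", "abv": 5.0},
    "b2": {"type": "beer", "name": "Bock", "abv": 0.0},
    "b3": {"type": "brewery", "name": "Cellar Co"},
    "b4": {"type": "beer", "name": "Dubbel", "abv": 7.0},
    "b5": {"type": "beer", "name": "Eisbock", "abv": 9.0},
}


def primary_scan(bucket):
    """PrimaryScan: stream every document key, in key order."""
    yield from sorted(bucket)


def fetch(keys, bucket):
    """Fetch: look up each document by its key."""
    for key in keys:
        yield bucket[key]


def filter_op(docs, predicate):
    """Filter: keep only documents that satisfy the predicate."""
    return (d for d in docs if predicate(d))


def initial_project(docs, fields):
    """InitialProject: shrink each document to the fields the query needs."""
    return ({f: d.get(f) for f in fields} for d in docs)


def offset(rows, count):
    """Offset: skip the first count rows."""
    return islice(rows, count, None)


def limit(rows, count):
    """Limit: stop after count rows."""
    return islice(rows, count)


# Sequence: each operator consumes the stream produced by the one before it.
docs = fetch(primary_scan(BUCKET), BUCKET)
beers = filter_op(docs, lambda d: d["type"] == "beer" and d["abv"] > 0)
rows = list(limit(offset(initial_project(beers, ["name"]), 1), 2))
```

Because every stage is lazy, Limit stops pulling rows as soon as it has enough, so upstream operators do no more work than the result requires.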


Operators cont.

Join Operators
– Join: Join left and right keyspaces on attributes and document key
– Unnest: Join operation between a parent and a child with a nested array, where the parent is repeated for each child array item
– Nest: Grouping operation between a parent and a child array, where the child array is embedded into the parent
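
The difference between Unnest and Nest is easiest to see on a tiny data set. The Python sketch below mimics the shape of each operator's output under stated assumptions: the customer and order documents, the field names and the function signatures are all hypothetical, and real N1QL results carry more structure than this.

```python
def unnest(parents, array_field, alias):
    """Unnest: repeat the parent once per element of its nested array."""
    for parent in parents:
        for item in parent.get(array_field, []):
            yield {"parent": parent["name"], alias: item}


def nest(parents, keys_field, children_by_key, alias):
    """Nest: embed all matching child documents into the parent as one array."""
    for parent in parents:
        matched = [
            children_by_key[k]
            for k in parent.get(keys_field, [])
            if k in children_by_key
        ]
        yield {**parent, alias: matched}


# Made-up documents: customers refer to their orders by document key.
customers = [
    {"name": "Cihan", "order_ids": ["o1", "o2"]},
    {"name": "Ada", "order_ids": ["o3"]},
]
orders = {"o1": {"total": 10}, "o2": {"total": 25}, "o3": {"total": 7}}

unnested = list(unnest(customers, "order_ids", "order_id"))
nested = list(nest(customers, "order_ids", orders, "orders"))
```

Unnest turns two parents into three rows (one per array item), while Nest keeps two rows and attaches an array of children to each.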


DEMO

Execution Plans


Common Techniques for Tuning Queries


Minimize Items Scanned
Primary Index Scan vs. Index Scan
– A Primary Index can only filter on document keys, so using it typically means a "full scan" of the bucket
– A Secondary Index scan is typically driven by a predicate, and the index is smaller in size, so it is cheaper to scan
– Index Selection: based on matching the expressions in the index definition against those in the WHERE clause

DEMO #2

SELECT name, updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;

vs.

CREATE INDEX i_type ON `beer-sample`(type) USING GSI;
SELECT name, updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;
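
A rough way to see why the index helps is to count how many documents each plan has to fetch. The Python sketch below is a toy model under stated assumptions: the 20 generated documents, the dictionary-based index and the function names are all made up, and only the query predicate is borrowed from the demo.

```python
from collections import defaultdict

# Made-up bucket: 20 documents, every fourth one is a beer.
bucket = {
    f"doc{i}": {
        "type": "beer" if i % 4 == 0 else "brewery",
        "abv": float(i % 3),
        "name": f"n{i:02d}",
    }
    for i in range(20)
}


def build_index(docs, field):
    """Toy secondary index: field value -> list of document keys."""
    index = defaultdict(list)
    for key, doc in docs.items():
        index[doc[field]].append(key)
    return index


def run_query(candidate_keys, docs):
    """Fetch each candidate, apply the WHERE clause, then ORDER BY name LIMIT 10."""
    fetched = 0
    names = []
    for key in candidate_keys:
        doc = docs[key]
        fetched += 1
        if doc["type"] == "beer" and doc["abv"] > 0:
            names.append(doc["name"])
    return sorted(names)[:10], fetched


i_type = build_index(bucket, "type")
primary_rows, primary_fetched = run_query(bucket.keys(), bucket)   # full scan
index_rows, index_fetched = run_query(i_type["beer"], bucket)      # index scan
```

Both plans return the same rows, but in this toy data the primary scan fetches all 20 documents while the index scan fetches only the 5 beers. The remaining predicate (abv > 0) still has to be applied after the fetch because the index covers only type.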


Minimize Items Scanned
HINT index usage to queries
– There can be multiple indexes to choose from, and you can hint the index choice to the query engine.

SELECT name, updated FROM `beer-sample` USE INDEX(i_type USING GSI) WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;


Joins
Joins are efficient by nature
– The left hand value is joined to the right hand document key with a nested loop.

Query: Get brewery location for each beer:
– SELECT …
– FROM `beer-sample` AS b1
– JOIN `beer-sample` AS b2 ON KEYS b1.brewery_id
– WHERE b1.type="beer";

For each document with type="beer", take b1.brewery_id and look for an equal document key in b2.
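
The key-based join described above can be approximated in Python as a loop over the left side with a direct key lookup on the right. This is a sketch under stated assumptions: the three documents are invented, join_on_keys is a hypothetical helper, and a dictionary lookup stands in for fetching a document by key from the Data Service.

```python
def join_on_keys(left_docs, right_by_key, key_field):
    """Inner join: for each left document, look up the right document by key."""
    for left in left_docs:
        right = right_by_key.get(left.get(key_field))
        if right is not None:  # no matching document key: the row is dropped
            yield {"b1": left, "b2": right}


# Made-up keyspace holding both beers and breweries, keyed by document key.
docs = {
    "brewery_a": {"type": "brewery", "name": "Brewery A", "city": "Austin"},
    "beer_1": {"type": "beer", "name": "Beer 1", "brewery_id": "brewery_a"},
    "beer_2": {"type": "beer", "name": "Beer 2", "brewery_id": "brewery_missing"},
}

beers = [d for d in docs.values() if d["type"] == "beer"]  # WHERE b1.type = "beer"
joined = [
    (row["b1"]["name"], row["b2"]["city"])
    for row in join_on_keys(beers, docs, "brewery_id")
]
```

Each left-hand row costs one key lookup rather than a scan of the right-hand side, which is why a join written this way stays cheap as long as the left side is filtered well.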


Optimizing Applications


FUTURE - New Consistency Settings!
View Stale-ness
– Ok: unbounded – query what’s available in the index/view now
– False: query after all changes up to the request timestamp (and maybe more) have been indexed for a given index or view

New Indexes with Couchbase Server 4.0
– Improves granularity of the consistency logical-timestamp
– New: Scan Consistency can be set to any logical timestamp
– Express anything from stale=false to stale=ok, and everything in between


FUTURE - Flexible Consistency Settings

Time
t1: insert (k1, v1)…
t2: do other business logic computation…
t3: issue query/read on (k1, v1) with t3 vs t1

– With t3: catch up all the indexes to t3 and then issue query. Identical to "stale=false".
– With t1: catch up all the indexes to t1 and then issue query. Improved efficiency over "stale=false".
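
The t1 versus t3 trade-off can be illustrated with a toy index that applies mutations one at a time. The Python sketch below is a simplified model under stated assumptions: ToyIndex, scan_at, the integer timestamps and the mutation log are all hypothetical and only imitate the idea of waiting until the index reaches a logical timestamp before scanning.

```python
class ToyIndex:
    """Index that records the logical timestamp of the last applied mutation."""

    def __init__(self):
        self.entries = {}
        self.indexed_up_to = 0

    def apply(self, mutation):
        timestamp, key, value = mutation
        self.entries[key] = value
        self.indexed_up_to = timestamp


def scan_at(index, pending, min_ts):
    """Catch the index up to min_ts, then scan.

    Returns the scan result and the number of mutations applied while waiting.
    """
    applied = 0
    while index.indexed_up_to < min_ts and pending:
        index.apply(pending.pop(0))
        applied += 1
    return dict(index.entries), applied


def mutation_log():
    # Timestamp 1 is our own insert of (k1, v1); 2 to 5 are unrelated writes.
    return [(1, "k1", "v1"), (2, "k2", "v2"), (3, "k3", "v3"),
            (4, "k4", "v4"), (5, "k5", "v5")]


T1, T3 = 1, 5  # t1: time of our insert, t3: time the query is issued

at_t3, waited_t3 = scan_at(ToyIndex(), mutation_log(), T3)  # like stale=false
at_t1, waited_t1 = scan_at(ToyIndex(), mutation_log(), T1)  # just our own write
at_ok, waited_ok = scan_at(ToyIndex(), mutation_log(), 0)   # like stale=ok
```

Both the t3 and the t1 scans see k1, but the t1 scan waits for one mutation instead of five, while the unbounded scan waits for nothing and may miss the write entirely.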


Recap


Recap
New Unique Query and Indexing Architecture
– Workload isolation with MDS (Multi-Dimensional Scaling) gives you a great advancement in performance and scale.

Familiar Concepts from your past life will help tune queries
– Understand Execution Plans
– Understand Indexes and Index Selection
– Filter & Limit aggressively
– Understand JOINs

Soon - Use powerful new Consistency Dials for best efficiency


Get Started with Couchbase Server 4.0 - Couchbase.com/Downloads

Q&A
Cihan Biyikoglu | [email protected] | @cihangirb