deep dive into n1ql: couchbase connect 2015

43
DEEP DIVE INTO N1QL: INTERNALS AND POWER FEATURES IN COUCHBASE 4.0 Keshav Murthy Couchbase Engineering [email protected] @N1QL @rkeshavmurthy

Upload: couchbase

Post on 11-Aug-2015

133 views

Category:

Technology


3 download

TRANSCRIPT

DEEP DIVE INTO N1QL:INTERNALS AND POWER FEATURES IN COUCHBASE 4.0Keshav Murthy Couchbase [email protected]@N1QL @rkeshavmurthy

©2015 Couchbase Inc. 2

Agenda

Query Service OverviewQuery Service ArchitectureN1QL Power FeaturesQ&A

Query Service Overview

©2015 Couchbase Inc. 4

Couchbase Server Cluster Architecture

4

STORAGE

Couchbase Server 1

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Managed CacheStorage

Data Service

Index Service

Query Service

STORAGE

Couchbase Server 2

Managed Cache

Cluster ManagerCluster Manager

Data Service

Index Service

Query Service

STORAGE

Couchbase Server 3

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Data Service

Index Service

Query Service

STORAGE

Couchbase Server 4

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Data Service

Index Service

Query Service

STORAGE

Couchbase Server 5

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Data Service

Index Service

Query Service

STORAGE

Couchbase Server 6

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Data Service

Index Service

Query Service

Managed CacheStorage

Managed CacheStorage

Managed CacheStorage

Managed CacheStorage

Managed CacheStorage

©2015 Couchbase Inc. 5

Couchbase Server Cluster Service Deployment

5

STORAGE

Couchbase Server 1

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Managed CacheStorage

Data Servi

ceSTORAGE

Couchbase Server 2

Managed Cache

Cluster ManagerCluster Manager

Data Servi

ceSTORAGE

Couchbase Server 3

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Data Servi

ceSTORAGE

Couchbase Server 4

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Query

Service

STORAGE

Couchbase Server 5

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Query

Service

STORAGE

Couchbase Server 6

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Index

Service

Managed CacheStorage

Managed CacheStorage Storage

STORAGE

Couchbase Server 6

SHARD7

SHARD9

SHARD5

SHARDSHARDSHARD

Managed Cache

Cluster ManagerCluster Manager

Index

Service

Storage

Managed Cache

Managed Cache

©2015 Couchbase Inc. 6

N1QL: Query Execution Flow

Clients

1. Submit the query over REST API

8. Query result

2. Parse, Analyze, create Plan

7. Evaluate: Documents to results

3. Scan Request;

index filters

6. Fetch the documents

Index

Service

Query

Service Data

Service

4. Get qualified doc keys

5. Fetch Request, doc keys

SELECT c_id,        c_first, c_last,        c_max        FROM   CUSTOMER WHERE   c_id = 49165;

{ "c_first": "Joe", "c_id": 49165, "c_last": "Montana", "c_max" : 50000}

Query Service Architecture

©2015 Couchbase Inc. 8

Inside a Query Service

Client

FetchParse Plan Join FilterPre-Aggregate

Offset Limit ProjectSortAggregateScan

Query ServiceInde

x Servi

ce

Data Servi

ce

©2015 Couchbase Inc. 10

Client to Query Service: REST API

Communication protocol is REST on top of HTTP

The database protocol structure is embedded within the REST API.

Query Service is stateless: All query information is embedded within the REST request.

REST is open. All REST clients work with N1QL

All N1QL clients, JDBC, ODBC drivers use REST

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

import requestsimport jsonurl = "http://localhost:8093/query"s1=”SELECT * FROM CUSTOMER WHERE C_ID = 1284";r = requests.post(url, data=s1, auth=('Administrator', 'abc'))print r.json()

©2015 Couchbase Inc. 11

Query Execution: Parse & Semantic Check

Analyzes the Query for syntax & grammar

Only verifies for existence of referenced buckets

Flexible schema means, you can refer to arbitrary attribute names

Use IS MISSING clause to check if the keyname is present

Full reference to JSON structure Nested reference: CUSTOMER.contact.address.state

Array Reference: CUSTOMER.c_contact.phone_number[0]

SQL is enhanced to access & manipulate Arrays

Fetch

ParsePlan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

©2015 Couchbase Inc. 12

Query Execution: Parse & Semantic Check

SELECT c_zip,        COUNT(c_id),        AVG(c_balance)          FROM   CUSTOMER WHERE  c_state = ‘CA’   AND c_year = 2014ORDER  BY         COUNT(c_id) DESC LIMIT 100

Simple refererences to the attribute name, just like

columnsUse expressions, just like

SQL

Table/keyspace/bucket references.

Filters on the JSON document work just like SQL

Sorting of the result set

Top N clause.

Fetch

ParsePlan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

©2015 Couchbase Inc. 14

Query Execution: Plan

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

Each query can be executed in several ways

Create the query execution plan Access path for each keyspace reference Decide on the filters to push down Determine Join order and join method Create the execution tree

For each keyspace reference: Look at the available indices Match the filters in the query with index

keys Choose one or more indices for each

keyspace

©2015 Couchbase Inc. 15

Query Execution: Plan

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

EXPLAIN SELECT c_id,        c_first,        c_middle,        c_last,        c_balance        FROM   CUSTOMER WHERE  c_w_id = 49        AND c_d_id = 16        AND c_last = ‘Montana’;  

Explain provides the JSON representation of the query plan

Focus on the index selection and the predicates pushed down

©2015 Couchbase Inc. 16

Query Execution: Plan

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

{ "#operator": "IndexScan", "index": "CU_W_ID_D_ID_LAST", "keyspace": "CUSTOMER", … "spans": [ { "Range": { "High": [ "49", "16", "\"Montana\"" ], "Inclusion": 3, "Low": [ "49", "16", "\"Montana\"" ] },

©2015 Couchbase Inc. 17

Query Execution: Plan

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan"#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "keyspace": "CUSTOMER", "namespace": "default » }, {"#operator": "Filter”, "condition": "((((`CUSTOMER`.`C_W_ID`) = 49) and ((`CUSTOMER`.`C_D_ID`) = 16)) and ((`CUSTOMER`.`C_LAST`) = \"Montana\"))”},

©2015 Couchbase Inc. 18

Query Execution: Project

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": ”Project", "keyspace": "CUSTOMER", "namespace": "default » }, {

©2015 Couchbase Inc. 21

Query Execution: Scan

Data Service

Global Secondary

Index

View Indexes

Global Secondary

Index

Global Secondary

Index

KeyScan

IndexScan

IndexScan

Data Fetch

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

Query Service

Cluster Map

©2015 Couchbase Inc. 22

Query Execution: Fetch

List of qualified document-keys are grouped into batches.

List of the documents is obtained from the Index or specified directly via USE KEYS clause.

Fetch request is done in parallel. The join operation use the fetch

operation to get the matching document. Fetch results are streamed into next

operators. For big queries, scan-fetch-join-filter-

aggregation will be executing in parallel.

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

©2015 Couchbase Inc. 23

Query Execution: Join

You can join any two key spaces if one has document-key of the other.

You can store multiple entities within the same bucket and join between distinct groups

Uses Nested Loop JOIN now JOINs are done in the same order

specified in the query Index selection is important for the first

keyspace in the FROM clause. Qualified documents from that scan is

joined with the other Keyspace using the DOCUMENT KEYS

Fetch

Parse

Plan

JoinFilter

Offset

Limit

Project

Sort

Aggregate

Scan

©2015 Couchbase Inc. 24

Query Execution: Join

"CUSTOMER": {"C_D_ID": 10,"C_ID": 1938, "C_W_ID": 1,

"C_BALANCE": -10, "C_CITY": ”San Jose", "C_CREDIT": "GC”, \"C_DELIVERY_CNT": 0, "C_DISCOUNT": 0.3866,

"C_FIRST": ”Jay","C_LAST": ”Smith",

"C_MIDDLE": "OE", "C_PAYMENT_CNT": 1, "C_PHONE": ”555-123-1234", "C_SINCE": "2015-03-22 00:50:42.822518", "C_STATE": ”CA", "C_STREET_1": ”555, Tideway Drive", "C_STREET_2": ”Alameda",

"C_YTD_PAYMENT": 10, "C_ZIP": ”94501" }

Document key: “1.10.1938” Document key: “1.10.143”

“ORDERS”: { “O_CUSTOMER_KEY”: “1.10.1938”: "O_D_ID": 10, "O_ID": 1,

"O_ALL_LOCAL": 1, "O_CARRIER_ID": 2, "O_C_ID": 1938, "O_ENTRY_D": "2015-05-19 16:22:08.544472", "O_ID": 143, "O_OL_CNT": 10, "O_W_ID": 1}x

“ORDERS”: { “O_CUSTOMER_KEY”: “1.10.1938”:

"O_ALL_LOCAL": 1, "O_CARRIER_ID": 2, "O_C_ID": 1938, "O_D_ID": 10, "O_ENTRY_D": "2015-05-19 16:22:08.544472", "O_ID": 1355, "O_OL_CNT": 10, "O_W_ID": 3}

Document key: “1.10.1355”

©2015 Couchbase Inc. 25

Query Execution: Join

SELECT COUNT(o.O_ORDER_CNT ) AS CNT_O_OL_CNT FROM   ORDERS o        INNER JOIN CUSTOMER c        ON KEYS (o.O_CUSTOMER_KEY)  WHERE  o.O_CARRIER_NAME = ”Penske”        AND c.C_STATE = “CA”;  

Two keyspace joins

ON Clause for the joinFetch

Parse

Plan

JoinFilter

Offset

Limit

Project

Sort

Aggregate

Scan

©2015 Couchbase Inc. 26

N1QL: JoinSELECT *FROM   ORDERS o INNER JOIN CUSTOMER c  ON KEYS (o.O_C_ID) LEFT JOIN PREMIUM p ON KEYS (c.C_PR_ID) LEFT JOIN demographics d ON KEYS (c.c_DEMO_ID) 

Fetch

Parse

Plan

JoinFilter

Offset

Limit

Project

Sort

Aggregate

Scan

Support INNER and LEFT OUTER joins Join order follows the order in the FROM

clause. N1QL supports the nested loop joins

now. Join is always from a key of one

document(outer table) to the document key of the second document (inner table)

©2015 Couchbase Inc. 27

Query Execution: Filter

Filters not pushed to the index scan will

have to be applied.

Since the indices are maintained

asynchronously, we apply the filters again

to ensure integrity of the result set.

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

©2015 Couchbase Inc. 28

Query Execution: Aggregate, Sort, Offset, Limit

Each stream creates partial grouping & aggregates

The result set is sorted to evaluated the ORDER BY

The sort is done in parallel OFFSET and LIMIT is typically used in

pagination Evaluated after the ORDER BY clause is

evaluated.

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

AGG

Scan

©2015 Couchbase Inc. 29

Query Execution: Project

Fetch

Parse

Plan

Join

Filter

Offset

Limit

Project

Sort

Aggregate

Scan

SELECT C_ZIP, count(*) as NUMCUSTOMERS FROM CUSTOMER GROUP BY C_ZIP ORDER BY COUNT(*) DESC LIMIT 10;{ "requestID": "ff49a6e6-35f0-4eac-8d74-aa8a0aab58e7", "signature": { "C_ZIP": "json", "NUMCUSTOMERS": "number" }, "results": [ { "C_ZIP": "304811111", "NUMCUSTOMERS": 12 },... { "C_ZIP": "709811111", "NUMCUSTOMERS": 10 } ], "status": "success", "metrics": { "elapsedTime": "1.57600634s", "executionTime": "1.575851088s", "resultCount": 10, "resultSize": 228 }}

Projection

Signature of the resultset

Query execution & resultset information

N1QL Power Features: USE KEYS

©2015 Couchbase Inc. 31

Power Features: USE KEYS

Data Service

Global Secondary

Index

View Indexes

Global Secondary

Index

Global Secondary

Index

KeyScan

IndexScan

IndexScan

Data Fetch

Query Service

Cluster Map

©2015 Couchbase Inc. 32

Power Features: USE KEYS

SELECT c_id,        c_first,        c_middle,        c_last,       (c_max - c_balance)        FROM   CUSTOMER USE KEYS [‘1.10.1938’];  

KeyScan: Directly use the Couchbase cluster map to get the document

You can give one or more values in the array From N1QL, get keys via: META(CUSTOMER).id

©2015 Couchbase Inc. 33

Power Features: USE KEYSEXPLAIN SELECT * FROM CUSTOMER USE KEYS ['1.1.1634', '1.1.1639'];{ …[ { "#operator": "Sequence", "~children": [ { "#operator": "KeyScan", "keys": "[\"1.1.1634\”, "1.1.1639”]" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "keyspace": "CUSTOMER", "namespace": "default" },

©2015 Couchbase Inc. 34

Power Features: USE KEYS

UPDATE customer USE KEYS ['1.20.981', '12.42.196'] SET    c_balance = c_balance + 200;

DELETE customer USE KEYS ['1.20.198', '12.42.2848'];

Even when you use the USE KEYS, the indexes are automatically maintained.

N1QL Power Features: UNNEST

©2015 Couchbase Inc. 36

UNNEST: Denormalized CUSTOMER Document

{"C_ZIP" : "828011111","C_STATE" : "vt",

"C_FIRST" : "ykfdbqku","C_CREDIT" : "GC","C_DELIVERY_CNT" : 0,"C_W_ID" : 1,"C_CITY" : "quhpismkzumehqhr","C_STREET_1" : "rmtxadlsxqefdcwf","C_D_ID" : 1,"ORDERS" : [

{"ORDER_LINE" : [

{"OL_AMOUNT" : 0,"OL_DELIVERY_D" : "2015-02-11T14:55:25.480Z","OL_DIST_INFO" : "yptiwgjdelfxmathbjzirvye","OL_I_ID" : 35828,"OL_SUPPLY_W_ID" : 1,"OL_QUANTITY" : 5

},{

"OL_AMOUNT" : 0,"OL_DELIVERY_D" : "2015-02-11T14:55:25.480Z","OL_DIST_INFO" : "dxhqulhcgksjgqsicujzqhdb","OL_I_ID" : 26024,"OL_SUPPLY_W_ID" : 1,"OL_QUANTITY" : 5

},}

….

©2015 Couchbase Inc. 37

Power Features: UNNEST operation

SELECT COUNT(my_order_line)               AS total_orders,        MAX(my_order_line.ol_delivery_d)   AS max_delivery_date,        MAX(my_order_line.ol_quantity)     AS max_order_quantity,        MAX(my_orders.o_entry_d)           AS max_customer_entry,        MAX(my_orders.o_ol_cnt)            AS max_orderline_entry,        COUNT(customer)                    AS total_customers FROM   CUSTOMER MY_CUSTOMER            UNNEST ORDERS AS my_orders            UNNEST my_orders.order_line AS my_order_line           ;

 

N1QL Power Features: Named Prepared

Statement

©2015 Couchbase Inc. 39

Power Features: Named Prepare Statement

Client

FetchParse Plan Join FilterPre-Aggregate

Offset Limit ProjectSortAggregateScan

Query ServiceInde

x Servi

ce

Data Servi

ce

©2015 Couchbase Inc. 41

Named Prepared Statement

url="http://localhost:8093/query"s = requests.Session()s.keep_alive = Trues.auth = ('Administrator','password')

query = {'statement':'prepare select * from `beer-sample` where name = [$1]’}r = s.post(url, data=query, stream=False)prepared = str(r.json()['results'][0]['name'])

for i in range (0, 5): query={'prepared': '"' + prepared + '"', 'args': '["old_hat_brewery"]' } r = s.post(url, data=query, stream=False) print i, r.json()['metrics']['executionTime']  

Bind Values Many times

Prepare ONCE

Functional Indices

©2015 Couchbase Inc. 43

Functional Indices"contacts": { "age": 46, "children": [ { "age": 17, "fname": "Aiden", "gender": "m" }, { "age": 2, "fname": "Bill", "gender": "f" } ], "email": "[email protected]", "fname": "Dave", "hobbies": [ "golf", "surfing" ],

"lname": "Smith", "relation": "friend", "title": "Mr.", "type": "contact" } } ],

CREATE INDEX idx_lname_lower ON contacts(LOWER(lname)) using GSI;

SELECT count(*) FROM contacts WHERE lower(lname) = smith;

©2015 Couchbase Inc. 44

Functional Indices"contacts": { "age": 46, "children": [ { "age": 17, "fname": "Aiden", "gender": "m" }, { "age": 2, "fname": "Bill", "gender": "f" } ], "email": "[email protected]", "fname": "Dave", "hobbies": [ "golf", "surfing" ],

"lname": "Smith", "relation": "friend", "title": "Mr.", "type": "contact" } } ],

The value indexed is the result of the function or expression.

The query has to use the same expression in the WHERE clause for the planner to consider using the index.

Use EXPLAIN to verify using the index.

N1QL Power Features: Multi-Index Scans

©2015 Couchbase Inc. 46

Power Features: IntersectScan (Multi-Index Scan)

 SELECT * FROM   customer WHERE  c_last = ’Smith’ AND c_city = 'Santa Clara'; 

Index scan using composite index:– Needs first N keys to be used to choose the index– Will multiple indexes on same set of columns to support filter

push down

IntersectScan using multiple indices:– Multiple indices are scanned in parallel– Provides more flexibility in using the indices for filters– Requires less number of indexes defined on table.• Can save on disk space and memory space as well.

©2015 Couchbase Inc. 47

Multi-Index Scan SELECT * FROM   customer WHERE  c_last = ’Smith’ AND c_city = 'Santa Clara';  "#operator": "IntersectScan", "scans": [ { "#operator": "IndexScan", "index": "idx_cust_city", "keyspace": "CUSTOMER", "limit": 9.223372036854776e+18, "namespace": "default", "spans": [ { "Range": { "High": [ "\"Santa Clara\"" ], "Inclusion": 3, "Low": [ "\"Santa Clara\"" ] }, "Seek": null } ], "using": "view" },

{ "#operator": "IndexScan", "index": "idx_last_name", "keyspace": "CUSTOMER", "limit": 9.223372036854776e+18, "namespace": "default", "spans": [ { "Range": { "High": [ "\”Smith\"" ], "Inclusion": 3, "Low": [ "\”Smith\"" ] }, "Seek": null } ],

query.couchbase.com

@N1QL

Keshav [email protected]

@rkeshavmurthy