next gen queries
TRANSCRIPT
![Page 1: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/1.jpg)
![Page 2: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/2.jpg)
Next Gen Queries Alfred R. Fuller May 20th, 2010 http://bit.ly/NextGenQueriesWave
![Page 3: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/3.jpg)
Who?
• Software Engineer
• App Engine Datastore Team – Query Planner
– Removed 1000 Entity Limit – Cursors/Prefetching
– Python and Java SDK
3
![Page 4: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/4.jpg)
4
Notes
• View live notes and ask questions about this session on Google Wave: – http://bit.ly/NextGenQueriesWave
• Good background in previous IO talks – Building Scalable, Complex Apps on App Engine
• Google I/O 2009 – Brett Slatkin
• http://code.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html
– Under the Covers of the Google App Engine Datastore • Google I/O 2008 – Ryan Barrett
• http://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore
![Page 5: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/5.jpg)
5
Outline
• Overview (of largest query changes since launch)
• Background
• Technology – Zigzag Merge Join
– MultiQuery
• Examples – Just a few interesting cases (not necessarily practical)
• Corollaries
![Page 6: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/6.jpg)
Current System
• Incredibly Scalable
• Schema-less
• Able to perform many query operations – Equality Filter (=, IN)
– Inequality Filter (!=, <, <=, >=, >) – Sort
• Composite Indexes – Multiple filters (only 1 inequality) and sort orders – Logical AND operator
– (not required for only equality filters and no sort)
6
![Page 7: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/7.jpg)
7
Improvements – Query Capabilities
• Combine filters using arbitrary query logic! – AND, OR, NOT, sub expressions
– (still single inequality filter)
• First class support for domain specific queries: – Geo-Query – Date Range Queries
– (multiple inequality filters on numeric properties)
![Page 8: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/8.jpg)
Improvements – Composite Indexes
• Reduced requirements – Solve exploding index problem
– Fewer indexes server more queries
• Composite index selection = dark art – Space Vs Speed
– Write Vs Read Latency/Cost
8
![Page 9: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/9.jpg)
Scalability
9
![Page 10: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/10.jpg)
10
Disclaimer
• Next Gen Queries != Current or Next Release – Rolling out these features in pieces
– Infrastructure is there
– Exposing it to users will take time
• Syntax/API not finalized
• (some of the backend changes available today)
![Page 11: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/11.jpg)
11
Example Model - Photo – Owner: person who posted picture
– Tag: List of user assigned tags
– People: List of people in the photo (either user assigned or facial recognition)
– Location: Geo location of where photo was taken – Date: Time/Date when the photo was taken
![Page 12: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/12.jpg)
Background
12
![Page 13: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/13.jpg)
Indexes
• Index data = Set of ordered values
• Index = Sorted collection of index data
• Built-in Property Index:
• Composite Index:
Kind Key Value Name
Index Key Composite Value
13
![Page 14: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/14.jpg)
Query => Index Scan (Query Planner)
• Split index data into 2 pieces: – Prefix: held constant
– Postfix: different for each result
• Equality filters => prefix
• Orders => postfix
• Inequality filters => range restrictions on the postfix
14
![Page 15: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/15.jpg)
Photo Photo Photo Photo
SELECT * FROM Photo WHERE tag = ‘family’
• Prefix Constraints – Kind == Photo
– Name == tag
– Value == ‘family’
• Postfix Ordering – key asc
Kind Key Value Name Photo family tag key key
15
![Page 16: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/16.jpg)
Photo Photo Photo Photo
SELECT * FROM Photo ORDER BY date DESC
• Prefix Constraints – Kind == Photo
– Name == date
• Postfix Ordering – date desc
– key asc
Kind Key Name Photo date key key
Value 5/19/10 5/20/10
16
![Page 17: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/17.jpg)
Photo Photo Photo Photo
SELECT * FROM Photo WHERE tag = ‘family’ AND tag = ‘outside’ ORDER BY date DESC
• Index 1 on (tag, tag, date)
• Prefix Constraints: – tag == ‘family’
– tag == ‘outside’
• Postfix Order – date desc
– key asc
17
Index Composite Value 1 Key key key 5/19/10 5/20/10
tag tag family outside date
![Page 18: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/18.jpg)
Zigzag Merge Join (Current)
18
![Page 19: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/19.jpg)
Zigzag Merge Join
• Efficiently finds index postfixes common to multiple scans
• Produces intermediate false positive results – Skips past sections of non-matching entities
– Tends to scale with size of requested results
19
Mass hysteria
![Page 20: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/20.jpg)
Photo Photo Photo Photo
• Prefix Constraints – S1.Kind == Photo
– S1.Name == tag
– S1.Value == ‘family’
– S2.Kind == Photo
– S2.Name == tag
– S2.Value == ‘outside’
• Postfix Constraints – s1.Key == s2.Key
• Postfix Ordering – key asc
Kind Key Value Name Photo family tag
key key Kind Value Name Photo outside tag
SELECT * FROM Photo WHERE tag = ‘family’ AND tag = ‘outside’
Key Key
S1:
S2:
20
![Page 21: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/21.jpg)
MultiQuery (Current)
21
![Page 22: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/22.jpg)
22
MultiQuery
• Combines multiple query result sets – Uses priority queue based on orders
– (newly optimized to avoid priority queue when possible)
• Currently supports IN and != – WHERE tag IN [‘family’, ‘friends’] ORDER BY date DESC
• WHERE tag == ‘family’ ORDER BY date DESC
• WHERE tag == ‘friends’ ORDER BY date DESC
– WHERE tag != ‘beach’ AND tag !=‘coworkers’ • WHERE tag < ‘beach’
• WHERE tag > ‘beach’ AND tag < ‘coworkers’
• WHERE tag > ‘coworkers’
Warning! Has unintuitive meaning for multi-valued properties. Means: Has a tag other than ‘beach’ and ‘coworkers’
![Page 23: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/23.jpg)
Next Gen Features
![Page 24: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/24.jpg)
Zigzag Merge Join += Sort
• New features: – Merge on entire postfix (instead of just key)
24
![Page 25: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/25.jpg)
Photo Photo Photo Photo
SELECT * FROM Photo WHERE tag = ‘family’ AND tag = ‘outside’ ORDER BY date DESC
• Index 1 on (tag, date)
• Prefix Constraints: – S1.tag == ‘family’
– S2.tag == ‘outside’
• Postfix Constraints: – S1.date == S2.date
– S1.key == S2.key
• Postfix Order – date desc
– key asc
25
Index Key Composite Value tag date
Index Key tag date 1 outside
1 family S1:
S2: Key date key key 5/19/10 5/20/10
![Page 26: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/26.jpg)
26
Zigzag Merge Join += OR, NOT
• AND – S1.postfix == S2.postfix
• OR – No join constraint – Every postfix matches – Each postfix only seen once
• NOT – S1.postfix != S2.postfix for all S1.prefix and S2.prefix – Equivalent to set subtraction – NOT(a) == ALL – a – Does not require an inequality filter!! – Intuitive results for multi-valued properties!!
![Page 27: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/27.jpg)
27
Not Example • a AND NOT(b)
– a = [2,3,5]
– b = [1,3,4]
– result = [2,5]
• Algorithm – a:2 – b(>=2):3 – 2 matches – a:3 – 3 does not match – a:5 – b(>=5):Null – 5 matches – a:Null
• Performance: – Only looked at 4 out of 6 keys to produce 2 results
![Page 28: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/28.jpg)
((a AND b) OR c) AND NOT(d)
28
![Page 29: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/29.jpg)
MultiQuery += OR
• WHERE tag IN [‘family’, ‘friends’] OR person IN […] ORDER BY date DESC – WHERE tag = ‘family’ ORDER BY date DESC – WHERE tag = ‘friends’ ORDER BY date DESC
– WHERE person = ‘…’ ORDER BY date DESC
– WHERE person = ‘…’ ORDER BY date DESC – …
• No false positives but could be lots of dupes!
• Performance vs. Zigzag depends on data
29
![Page 30: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/30.jpg)
MultiQuery += Geo, Date Range, …
• QuerySplitter – Produces multiple parallel query components
– Geo/Date Range: range scans along space filling curve • Accuracy Vs # of parallel components
• Entity Filter – Removes unwanted results
– De-dupe always needed
– Geo/Date Range: Fuzzy result pruning • Removes points outside of exact range
30
![Page 31: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/31.jpg)
Examples (Current Vs Next)
31
![Page 32: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/32.jpg)
• Meaning: “Find recent photos of my family taken outside”
• Minimum composite index requirement: – Current Gen: Index on (tag, tag, date)
• Repeated multi-valued property, “tag” (exploding index)
– Next Gen: Index on (tag, date)
32
SELECT * FROM Photo WHERE tag = ‘family’ AND tag = ‘outside’ ORDER BY date DESC
![Page 33: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/33.jpg)
• Meaning: “Find all photos of my family and friends taken outside but not on the beach”
• Note: No sort order specified (will be ordered by tag ASC)
• Minimum composite index requirement: – Current Gen: Index on (tag, tag, tag)
• Repeated multi-valued property, “tag” (exploding index)
• Unintuitive results (tag != ‘beach’ will have no effect)
– Next Gen: No index required • Uses built-in indexes
33
SELECT * FROM Photo WHERE tag IN [‘family’, ‘friends’] AND tag = ‘outside’ AND tag != ‘beach’
![Page 34: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/34.jpg)
• Meaning: “Find recent photos of my family and friends taken outside but not on the beach”
• Note: Same as last query except ordered and using AND, OR, NOT and sub expressions
• Minimum composite index requirement: – Current Gen: Not possible because of order – Next Gen: Index on (tag, date)
• Arbitrary number of ‘tag =‘ filters supported by this single index
34
SELECT * FROM Photo WHERE (tag = ‘family’ OR tag = ‘friends’) AND tag = ‘outside’ AND NOT(tag = ‘beach’) ORDER BY date DESC
![Page 35: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/35.jpg)
• Meaning: “Find recent photos of my family and friends taken outside but not on the beach”
• Note – Not reliant on proper tagging – Using OR on different properties, – NOT is applied to a entire sub-expression – Requires clever use of geo encoding to order by date
• Minimum composite index requirement: – Current Gen: Not possible
– Next Gen: • Index on (tag, date) • Index on (people, date) • Index on (geo(location), date)
35
SELECT * FROM Photo WHERE (tag IN [‘family’, ‘friends’] OR people IN […]) AND tag = ‘outside’ AND NOT(tag = ‘beach’ OR location IN [{Coastal Regions}]) ORDER BY date DESC
![Page 36: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/36.jpg)
What to use when? • Zigzag: produces intermediate false positive results
– Tends to be O(R) worst case O(N)
• MultiQuery/Query: produces duplicate correct results – Guaranteed to be O(R) as # of sub-queries is constrained
• Actual performance depends on query and shape of the data
• Prefer MultiQuery/Query when possible – Not possible when indexes are missing or features are not supported – Can use both on a single query
36
Scalability Index
Requirements AND OR NOT
Domain Specific Queries
Zigzag O(R) – O(N)
MultiQuery/Query O(R)
R = size of final result set
N = size of largest sub-scan
![Page 37: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/37.jpg)
Capabilities • Composite Index Requirements
– Zigzag: many, reusable, and simple
– MultiQuery/Query: Requires index very specific index
• NOT – Zigzag: Any # of constrains on any # of properties + sort
– MultiQuery: Any # of constraints on 1 property + first sort must be on that property
37
Scalability Index
Requirements AND OR NOT
Domain Specific Queries
Zigzag O(R) – O(N) Low
MultiQuery/Query O(R) High Fixed # of
filters
Restricted # of
parallel queries
Single property
![Page 38: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/38.jpg)
AND
tag = ‘outside’
Planning/Optimizations
38 Google Confidential
WHERE tag = ‘family’ WHERE tag = ‘friends’
WHERE owner = ‘al’ WHERE owner = ‘nick’
WHERE tag = ‘outside’
AND
tag IN [‘family’, ‘friends’]
tag = ‘outside’
owner != ‘al’
owner != ‘nick’
WHERE
tag IN [‘family’, ‘friends’]
AND tag = ‘outside’
AND owner != ‘al’
AND owner != ‘nick’
OR
tag = ‘family’
tag = ‘friends’
NOT
owner = ‘al’
OR owner = ‘nick’
OR
owner = ‘al’
owner = ‘nick’
![Page 39: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/39.jpg)
WHERE
tag IN [‘family’, ‘friends’]
AND tag = ‘outside’
AND owner != ‘al’
AND owner != ‘nick’
Planning/Optimizations + Index (tag, tag, owner)
39
tag = ‘family’
tag = ‘friends’
owner < ‘al’
‘al’ < owner < ‘nick’
owner > ‘nick’
WHERE tag = ‘family’ AND tag = ‘outside’ AND owner < ‘al’ WHERE tag = ‘family’ AND tag = ‘outside’ AND owner > ‘al’ AND owner < ‘nick’ WHERE tag = ‘family’ AND tag = ‘outside’ AND owner > ‘nick’’ WHERE tag = ‘friends’ AND tag = ‘outside’ AND owner < ‘al’ WHERE tag = ‘friends’ AND tag = ‘outside’ AND owner > ‘al’ AND owner < ‘nick’ WHERE tag = ‘friends’ AND tag = ‘outside’ AND owner > ‘nick’’
![Page 40: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/40.jpg)
AND
tag = ‘outside’
OR
tag = ‘family’
tag = ‘friends’
NOT
tag = ‘beach’ AND
tag = ‘coworkers’,
-date
OR
owner = ‘al’
owner = ‘owner’
Planning/Optimizations + Index(tag, tag)
40
WHERE tag = ‘family’ AND tag = ‘outside’
WHERE tag = ‘friends’ AND tag = ‘outside’
WHERE owner = ‘al’
WHERE owner = ‘nick’
OR tag = ‘family’ AND tag = ‘outside’
tag = ‘friends’ AND tag = ‘outside’
![Page 41: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/41.jpg)
Space Vs Time (Write Vs Read Latency/Cost)
• WHERE a = ‘…’ AND b = ‘…’ AND c = ‘…’ – a = ‘…’
– b = ‘…’ – c = ‘…’
• WHERE a = ‘…’ AND b = ‘…’ AND d = ‘…’ • a = ‘…’
• b = ‘…’
• d = ‘…’
• + Index(a, b, c)
• + Index(a, b, d) • +Space, -Time, +Write, -Read
41
![Page 42: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/42.jpg)
Already Available
• + Index(a, b)
• WHERE a = ‘…’ AND b = ‘…’ AND c = ‘…’ – a = ‘…’ AND b = ‘…’
– c = ‘…’
• WHERE a = ‘…’ AND b = ‘…’ AND d = ‘…’ – a = ‘…’ AND b = ‘…’ • d = ‘…’
42
![Page 43: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/43.jpg)
Corollaries
43
![Page 44: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/44.jpg)
SearchableModel becomes useful!
• Every query can now be solved using zigzag – Never a need to have more than one property in prefix
(although can be good to improve performance)
• Almost never a reason to sort or restrict range on multi-valued properties
• Can handle arbitrary number of equal filters without needing any extra indexes
• Thus no more exploding indexes!
44
![Page 45: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/45.jpg)
Cursors
• Only store postfix – Much smaller
– Can be used on any index with the same postfix • Equality filters can change completely
– Can be used with multi-query • All queries in multi-query have the same postfix • Deduping issues
• Store raw values – Positions can be generated directly from an entity if there are
no multi-valued properties in the postfix
– A query with reversed orders can be used to scroll backwards • Requires trailing key descending order • Always requires extra indexes
45
![Page 46: Next Gen Queries](https://reader035.vdocuments.us/reader035/viewer/2022070222/613d0be5736caf36b758ad73/html5/thumbnails/46.jpg)
Questions? http://bit.ly/NextGenQueriesWave
46