sql now! how optiq brings the best of sql to nosql data
DESCRIPTION
A talk given at the August, 2013 NoSQL Now! conference in San Jose, California. 2013 is the year of that SQL meets Hadoop & NoSQL. There is a surprisingly good fit between thirty-something SQL and the upstart millennial data technologies. SQL helps you deliver faster value, improve integration with visualization and analytics tools, and bring your data to a larger audience. Julian Hyde, author of Mondrian and Optiq, and a contributor to Apache Drill and Cascading Lingual, shows how to quickly build a SQL interface to a NoSQL system using the Optiq framework. As a case study, he shows how Optiq has allows the Mondrian in-memory analysis engine (part of Pentaho's business intelligence suite) to leverage the memory and compute power of a cluster and pull data from hybrid SQL and NoSQL sources.TRANSCRIPT
![Page 1: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/1.jpg)
SQL Now!
NoSQL Now!San Jose, California
August, 2013
Julian Hyde @julianhyde
![Page 2: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/2.jpg)
About me
• Database: Oracle, Broadbase
• Streaming query: SQLstream
• Open source BI: Mondrian / Pentaho
• Open source SQL: LucidDB, Optiq
• Contributor to Apache Drill, Cascading Lingual
![Page 5: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/5.jpg)
3 things
1. Modern data challenges need both NoSQL and SQL.
2. Optiq is not a database (and this is a Good Thing).
3. How to use Optiq with your data.
![Page 6: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/6.jpg)
A disturbance in the Force
![Page 7: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/7.jpg)
SQL has been around avery, very long time
![Page 8: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/8.jpg)
NoSQL: trade-offs vs SQL
Gain Lose
Scale out Optimization
Flexibility Data independence
Productivity Central control
Purchase cost Tool integration
![Page 9: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/9.jpg)
SQL: key features
• Inspired by Codd’s 1970 “relational database” paper
• Semantics based on relational operators (scan, filter, project, join, agg, union, sort)
• All implementations transform queries
• Optimizer rewrites queries onto available data structures and operators
![Page 10: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/10.jpg)
Optiq
![Page 11: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/11.jpg)
Optiq design principles
1. Do the stuff that SQL does well
2. Let other systems do what they do well
![Page 12: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/12.jpg)
Conventional DB architecture
![Page 13: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/13.jpg)
Optiq architecture
![Page 14: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/14.jpg)
Examples
![Page 15: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/15.jpg)
Example #1: CSV
• Uses CSV adapter (optiq-csv)
• Demo using sqlline
• Easy to run this for yourself:$ git clone https://github.com/julianhyde/optiq-csv$ cd optiq-csv$ mvn install$ ./sqlline
![Page 16: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/16.jpg)
!connect jdbc:optiq:model=target/test-classes/model.json admin admin
!tables
!describe emps
SELECT * FROM emps;
EXPLAIN PLAN FOR SELECT * FROM emps;
SELECT depts.name, count(*)FROM emps JOIN depts USING (deptno)GROUP BY depts.name;
model.json:{ version: '1.0', defaultSchema: 'SALES', schemas: [ { name: 'SALES', type: 'custom', factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory', operand: { directory: 'target/test-classes/sales' } } ]}
![Page 17: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/17.jpg)
More adapters
Adapters Embedded Planned
CSV Cascading (Lingual) HBase (Phoenix)
JDBC Apache Drill Spark
MongoDB Cassandra
Splunk Mondrian
linq4j
![Page 18: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/18.jpg)
SQL on NoSQL
• No schema or schema-on-read
• Optiq MAP columns, dynamic schema
• Nested data (Drill, MongoDB)
• Optiq ARRAY columns
• Source system optimized for transactions & focused reads, not analytic queries
• Optiq cache and materializations
![Page 19: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/19.jpg)
• Mongo’s standard “zips” data set with all US zip codes
• Raw ZIPS table with _MAP column
• ZIPS view extracts named fields, including nested latitude and longitude, and converts them to SQL types
Example #2: MongoDB
![Page 20: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/20.jpg)
Splunk
• NoSQL database
• Every log file in the enterprise
• A single “table”
• A record for every line in every log file
• A column for every field that exists in any log file
• No schema
![Page 21: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/21.jpg)
Example #3:Splunk + MySQL
SELECT p.product_name, COUNT(*) AS cFROM splunk.splunk AS s JOIN mysql.products AS p ON s.product_id = p.product_idWHERE s.action = 'purchase'GROUP BY p.product_nameORDER BY c DESC
![Page 22: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/22.jpg)
Expression tree
![Page 23: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/23.jpg)
Optimized tree
![Page 24: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/24.jpg)
Analytics
![Page 25: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/25.jpg)
Interactive analytics on NoSQL?
• NoSQL operational DB (e.g. HBase, MongoDB, Cassandra)
• Analytic queries aggregate over full scan
• Speed-of-thought response (< 5 seconds)
• Data freshness (< 10 minutes)
![Page 26: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/26.jpg)
Simple analytics problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
![Page 27: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/27.jpg)
Simple analytics problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
![Page 28: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/28.jpg)
Simple analytics problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
• Not possible?! It takes 80s just to read the data.
![Page 29: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/29.jpg)
Solution: Cheat!
![Page 30: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/30.jpg)
Solution: Cheat!
• Compress data
• Column-oriented storage
• Store data in sorted order
• Put data in memory
• Cache previous query results
• Pre-compute (materialize) aggregates
![Page 31: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/31.jpg)
![Page 32: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/32.jpg)
Mondrian on Optiq on NoSQL (under construction)
![Page 33: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/33.jpg)
Hybrid analytic architecture
1. NoSQL source system
2. Access via Optiq SQL
3. Optiq rewrites queries to use materialized data (e.g. aggregate tables)
4. Cached results are treated as “in-memory tables”
5. Materializations offline dynamically as underlying data changes, and go online as they are refreshed
![Page 34: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/34.jpg)
Summary
![Page 35: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/35.jpg)
3 things (reprise)
1. SQL allows you to reorganize your data and optimize your queries—while still using your NoSQL database for what it does best.
2. Optiq is not a database. It lets you create very powerful federated data architectures.
3. Access your data using Optiq adapters. Write schemas in JSON or use schema SPI. Connect via JDBC.
![Page 36: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/36.jpg)
Questions?
![Page 37: SQL Now! How Optiq brings the best of SQL to NoSQL data](https://reader034.vdocuments.us/reader034/viewer/2022052505/5562240cd8b42af6668b4b51/html5/thumbnails/37.jpg)
Thanks!@julianhyde
optiq https://github.com/julianhyde/optiq
optiq-csv https://github.com/julianhyde/optiq-csv
drill http://incubator.apache.org/drill/
lingual http://www.cascading.org/lingual/
mondrian http://mondrian.pentaho.com
blog http://julianhyde.blogspot.com/
book http://manning.com/back/ 45% off code mlnosql13