The Fastest Distributed, In-Memory, GPU-Accelerated Database
Eric Mizell – VP, Solution Engineering
Evolution of Data Processing
DATA WAREHOUSE (1990 – 2000s)
RDBMS and data warehouse technologies enable organizations to store and analyze growing volumes of data on high-performance machines, but at high cost.

DISTRIBUTED STORAGE (2005…)
Hadoop and MapReduce enable distributed storage and processing across multiple machines. Storing massive volumes of data becomes more affordable, but performance is slow.

AFFORDABLE IN-MEMORY (2010…)
Affordable memory allows for faster data reads and writes. HBase, HANA, and MemSQL provide faster analytics. At scale, compute processing now becomes the bottleneck.

GPU-ACCELERATED COMPUTE (2016…)
GPU cores bulk-process tasks in parallel – far more efficient for compute-intensive tasks than CPUs.
A Cohesive, High-Performance Data Analytics Solution
• Massive parallel processing – process and manage massive amounts of data cost-effectively
• In-memory computing – deliver human-acceptable response times to complex analytical queries
• High performance computing – leverage GPU-accelerated hardware to massively improve performance with better cost- and energy-efficiency
GPU Acceleration Overcomes Processing Bottlenecks
• 4,000+ cores per device in many cases, versus 8 to 32 cores in a typical CPU-based device.
• High-performance computing is trending toward GPUs for massive processing challenges; GPU acceleration brings high-performance compute to commodity hardware.
• Parallel processing is ideal for scanning an entire dataset and brute-force compute.
• GPUs are designed around thousands of small, efficient cores well suited to performing repeated, similar instructions in parallel. This makes them a good fit for the compute-intensive workloads of large data sets.
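The scan workload described above is embarrassingly parallel: each core filters its own slice of the data and the partial results are combined. A minimal CPU-side sketch of that pattern in Python (illustrative only, not Kinetica code; a GPU runs the same idea with thousands of cores instead of a handful of threads):

```python
# Illustrative sketch: a brute-force scan split across worker threads.
# Each worker filters its own chunk; results are concatenated in order.
from concurrent.futures import ThreadPoolExecutor

def parallel_scan(rows, predicate, workers=4):
    # Split the data into one chunk per worker (last chunk may be short).
    chunk = max(1, len(rows) // workers)
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker scans only its own chunk.
        parts = pool.map(lambda c: [r for r in c if predicate(r)], chunks)
    # Combine partial results, preserving row order.
    return [r for part in parts for r in part]

if __name__ == "__main__":
    data = list(range(100_000))
    evens = parallel_scan(data, lambda x: x % 2 == 0)
    print(len(evens))  # 50000
```

The same structure is why a full-table scan maps so well onto thousands of GPU cores: there is no coordination between chunks until the final combine.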
Who is Kinetica?

• 2009 – "HPC Research Project" incubated by the US military
• 2011 – Patent #US8373710 B1 issued to GPUdb
• 2012 – US Army deploys GPUdb
• 2013 – GPUdb commercially available
• 2014 – IDC HPC Innovation Excellence Award (Army); GPUdb goes into production at USPS
• 2015 – IronNet selects GPUdb for cyber defense; PG&E selects GPUdb for electric grid analysis; IDC HPC Innovation Excellence Award (USPS)
• 2016 – Rebrand to Kinetica
Reference Architecture
• High speed in-memory database for your most critical enterprise data
• Real-time analytics for streaming data
• Accelerate performance of your existing analytics tools and applications
• Offload expensive relational databases
• Get more value from your Hadoop investment
APPLICATION LAYER
OLAP | NoSQL | GEOSPATIAL | APIs | TEXT SEARCH
DATA INGEST
DATA LAKE
TRANSACTIONAL
IoT / Stream Processing
Message Queue
Batch Feeds
Parallel Ingestion
• Parallel ingestion of events
• Kinetica is the speed layer, with real-time analytic capabilities
• HDFS/Object Store/SAN for data lake
• Much looser coupling than traditional streaming architectures
• Batch mode Spark or MapReduce jobs can push data to Kinetica as needed for fast query
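The loose coupling described above comes from putting a message queue between event producers and the speed layer: the consumer drains the queue in batches rather than touching the database per event. A small sketch of that consumer side (illustrative only; the `sink` callable stands in for a Kinetica bulk-insert call, which is an assumption, not the real client API):

```python
# Sketch of a decoupled speed layer: producers write to a message queue,
# a consumer drains it in batches and hands each batch to the database.
import queue

def drain_in_batches(q, sink, batch_size=3):
    """Pull everything currently queued and pass it to the sink in batches."""
    batch = []
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            sink(batch)   # one round trip per batch, not per event
            batch = []
    if batch:
        sink(batch)       # flush the final partial batch

events = queue.Queue()
for e in ["evt1", "evt2", "evt3", "evt4"]:
    events.put(e)

received = []
drain_in_batches(events, received.append)
print(received)  # [['evt1', 'evt2', 'evt3'], ['evt4']]
```

Because producers only ever see the queue, the database can be upgraded, re-sharded, or briefly offline without back-pressure reaching the event sources.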
EVENTS
MESSAGE BROKERS
Amazon Kinesis
ANALYSTS
MOBILE USERS
DASHBOARDS & APPLICATIONS
ALERTING SYSTEMS
PUT, GET, SCAN
Execute complex analytics on the fly
Kinetica Connectors
STREAM PROCESSING
HDFS/Object Store/SAN
Streaming Analytics Simplified
Kinetica Architecture
VISUALIZATION via ODBC/JDBC
APIs
Java API
JavaScript API
REST API
C++ API
Node.js API
Python API
OPEN SOURCE INTEGRATION
Apache NiFi
Apache Kafka
Apache Spark
Apache Storm
GEOSPATIAL CAPABILITIES
Geometric Objects
Tracks
Geospatial Endpoints
WMS
WKT
KINETICA CLUSTER – On-demand Scale
Each node: commodity hardware with GPUs, an HTTP head node, columnar in-memory storage (shards A1–A4, B1–B4, C1–C4), and disk for persistence. (The diagram shows four identical nodes.)
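The repeated shard labels in the cluster diagram suggest rows are distributed across nodes by a shard key. Purely to illustrate the idea, here is a hash-based shard assignment sketch (an assumption for illustration; Kinetica's actual sharding scheme is not described on this slide):

```python
# Sketch of hash-based shard assignment across cluster nodes.
# Rows with the same shard key always land on the same node, so
# key-based lookups touch one machine while full scans fan out to all.
import hashlib

def shard_for(key, num_nodes):
    # A stable hash (unlike Python's randomized hash()) keeps placement
    # consistent across processes and restarts.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_nodes

# Distribute 1,000 hypothetical order IDs across a 4-node cluster.
nodes = {n: [] for n in range(4)}
for order_id in range(1000):
    nodes[shard_for(order_id, 4)].append(order_id)

print([len(v) for v in nodes.values()])  # four roughly equal buckets
```

A uniform hash keeps the per-node row counts balanced, which is what lets ingest and scan throughput scale roughly linearly with node count.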
OTHER INTEGRATION
Message Queues
ETL Tools
Streaming Tools
Reliable, Available and Scalable
• Disk-based persistence
• Data replication for high availability
• Scale up and/or out

Performance
• GPU-accelerated (1000s of cores per GPU)
• Ingest billions of records per minute
• Ultra-low-latency performance from ingestion through to analytics

Connectors
• ODBC/JDBC
• RESTful endpoints
• Open source APIs
• Native geospatial capabilities
Code Examples/APIs
http://www.gpudb.com/docs/
https://github.com/GPUdb
Configuring Connection
Java Connection – Example:

import com.gpudb.GPUdb;

GPUdb gpudb = new GPUdb("http://localhost:9191");

Python Connection – Example:

import gpudb

h_db = gpudb.GPUdb(encoding = 'BINARY', host = '127.0.0.1', port = '9191')

JDBC Connection – Example:

String driverClass = "com.simba.client.core.jdbc4.SCJDBC4Driver";
String url = "jdbc:simba://127.0.0.1:9292;URL=http://127.0.0.1:9191;ParentSet=MASTER";
Connection conn = getConnection(driverClass, url);
Object Relational Mapping
With the APIs (Java, JavaScript, Python, C++), everything is an object.

RecordObject Extension – Example:

import com.gpudb.RecordObject;
import com.gpudb.RecordObject.Column;
import com.gpudb.ColumnProperty;

public class Order extends RecordObject {

    @Column(order = 0, properties = { ColumnProperty.PRIMARY_KEY })
    public int id;

    @Column(order = 1)
    public String name;

    @Column(order = 2, properties = { ColumnProperty.TIMESTAMP })
    public long orderDate;
}
Inserting Records

Single Insert – Example:

// requires a list for insert (could have multiple rows; don't use this for bulk loading)
List<PersonType> list = new ArrayList<PersonType>();
list.add(person);
InsertRecordsRequest<PersonType> request = new InsertRecordsRequest<PersonType>("person", list, null);
gpudb.insertRecords(request);

Bulk Insert – Example:

BulkInserter<PersonType> bulkInserter = new BulkInserter<PersonType>(gpudb, tableName, type, batchSize, null);

// flushes to the Kinetica DB automatically based on batchSize
for (int i = 1; i <= numberRows; i++) {
    PersonType record = createPerson();
    bulkInserter.insert(record);
}
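The BulkInserter above buffers records client-side and flushes automatically each time the buffer reaches batchSize, turning many single-row inserts into a few large network calls. A compact Python sketch of that buffering behavior (illustrative only, not the actual Kinetica client; `flush_fn` stands in for the real bulk network call):

```python
# Sketch of BulkInserter-style buffering: records accumulate in memory
# and are flushed in one call each time batch_size is reached.
class BulkInserter:
    def __init__(self, flush_fn, batch_size):
        self.flush_fn = flush_fn      # stands in for the real network call
        self.batch_size = batch_size
        self.buffer = []
        self.flushed = 0              # total records sent so far

    def insert(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()              # automatic flush on a full batch

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.flushed += len(self.buffer)
            self.buffer = []

batches = []
bi = BulkInserter(batches.append, batch_size=100)
for i in range(250):
    bi.insert({"id": i})
bi.flush()  # send the final partial batch explicitly
print([len(b) for b in batches])  # [100, 100, 50]
```

The explicit final flush matters: without it, any records short of a full batch would stay stuck in the client buffer.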
Querying Records

Basic Query – Example (select * from table limit 1000):

GetRecordsResponse<PersonType> response;
// provide table name, record offset, number of records to retrieve, optional options
response = gpudb.getRecords(tableName, 0, 1000, null);
List<PersonType> responseList = response.getData();
for (PersonType t : responseList) {
    // do some processing on the records
}

Query by Expression – Example (select * from table where range filter limit 5):

String expression = "timestamp >= " + startDate + " and timestamp <= " + endDate;
// provide table name, view name (for query chaining), expression, optional options (sort column/order)
FilterResponse resp = gpudb.filter(tableName, viewName, expression, null);
long numResults = resp.getCount();
// provide view name, record offset, number of records to retrieve, optional options (sort column/order)
GetRecordsResponse getRecordsRsp = gpudb.getRecords(viewName, 0, 5, null);
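The filter call above materializes matching rows under a view name, and later calls can read from or filter that view again (query chaining). To illustrate just the semantics, here is a sketch that models tables and views as in-memory lists and expressions as Python predicates (illustrative only; names like `filter_to_view` are hypothetical, and Kinetica parses string expressions rather than lambdas):

```python
# Sketch of filter/view semantics: a filter produces a named view,
# and further filters or paged reads can target that view by name.
tables = {"person": [{"name": "a", "timestamp": 5},
                     {"name": "b", "timestamp": 15},
                     {"name": "c", "timestamp": 25}]}

def filter_to_view(table_name, view_name, predicate):
    """Like the filter endpoint: store matching rows under a view name."""
    view = [row for row in tables[table_name] if predicate(row)]
    tables[view_name] = view
    return len(view)  # the match count, as FilterResponse.getCount() reports

def get_records(name, offset, limit):
    """Like getRecords: a paged read from either a table or a view."""
    return tables[name][offset:offset + limit]

start, end = 10, 30
n = filter_to_view("person", "recent", lambda r: start <= r["timestamp"] <= end)
print(n)                            # 2
print(get_records("recent", 0, 5))  # rows b and c
```

Chaining lets each successive filter work on an already-reduced view instead of rescanning the base table, which is the point of passing a view name to the filter call.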
Demo
http://demo.kinetica.com/gaiademo/
Thank You
Geoff Lunsford – [email protected]
Eric Mizell – [email protected]