![Page 1: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/1.jpg)
©Continuent 2013
Tungsten University: Load a Vertica Data
Warehouse with MySQL DataRobert Hodges
CEO, Continuent
![Page 2: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/2.jpg)
©Continuent 2013
Introducing Continuent
2
• The leading provider of clustering and replication for open source DBMS
• Our Product: Continuent Tungsten
• Clustering - Commercial-grade HA, performance scaling and data management for MySQL
• Replication - Flexible, high-performance data movement
![Page 3: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/3.jpg)
©Continuent 2013
OLTP and Data Warehouse Fundamentals
3
![Page 4: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/4.jpg)
©Continuent 2013
The Contenders
4
Popular open source RDBMS for transaction
processing
Popular closed source RDBMS
for analytics
![Page 5: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/5.jpg)
©Continuent 2013
Storage Layout in MySQL
5
id cust_id prod_id ...
1 335301 532 ...
2 2378 6235 ...
3 ... ... ...
Sales Table
id sku type
532 C00135 consumer
533 S09957 specialty
... ...
Product Tableprod_id id
532 1
6235 2
... ...
Prod_ID Index
Row format makes table scans very
slow
Indexes slow OLTP
Low/no data compression
Limited indextypes
Limited join
types
![Page 6: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/6.jpg)
©Continuent 2013
Storage Layout in Vertica
6
Sales Table
cust_id
335301
2378
...
prod_id
532
6235
...
Fast scans on columns
Updates to single rows are
hideously slow
quantity
1
3
...
id
1
2
3
Every column is an index
Good compression
id
532
533
...
sku
C00135
S09957
...
type
consumer
specialty
...
Product Table
Fast joins with parallel
query
![Page 7: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/7.jpg)
©Continuent 2013
Traditional ETL Problems
7
MySQL
SalesTable
SalesTable
LoadTransferExtract
Date columns = intrusive
Batch-oriented = not timely
Scan for changes = performance hit
![Page 8: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/8.jpg)
©Continuent 2013
Questions for Real-Time Loading
• Do I need to transform data and if so how?
• Do I need to clean up bad information?
• Do I need to process UPDATE/DELETE too?
• Do I need to load from multiple sources?
• How timely do loads need to be?
• What if something fails?
8
![Page 9: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/9.jpg)
©Continuent 2013
Tungsten Replicator Basics
9
![Page 10: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/10.jpg)
©Continuent 2013
Real-Time Data Replication
10
MySQL
SalesTable
SalesTable
Fast propagation = timely
No SQL changes = transparent
Automatic change capture = low impact
DBMSLogs
Data Replication
![Page 11: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/11.jpg)
©Continuent 2013
Tungsten Master/Slave in Action
11
Master
(Transactions + Metadata)
Slave
THL
DBMSLogs
Replicator
(Transactions + Metadata)
THLReplicator
Download transactions via network
Apply using JDBC
![Page 12: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/12.jpg)
©Continuent 2013
Pipelines with Parallel Apply
12
Extract Filter Apply
StageExtract Filter Apply
StageStage
Pipeline
RemoteMaster
TransactionHistory Log
ParallelQueue
SlaveDBMS
Extract Filter ApplyExtract Filter ApplyExtract Filter Apply
(Assign Shard ID)
![Page 13: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/13.jpg)
©Continuent 2013
Real-Time Batch Loading
13
MySQL Tungsten Master Replicator
Service my2vr
MySQLExtractorSpecial Filters* pkey - Fill in pkey info* colnames - Fill in names* replicate - Ignore tables
binlog_format=row
Tungsten Slave Replicator
Service my2vr
MySQLBinlog
CSVFilesCSVFilesCSVFilesCSVFilesCSVFiles
Large transaction batches to leverage load parallelization
Single transactions from OLTP operations
![Page 14: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/14.jpg)
©Continuent 2013
Batch Loading--The Gory Details
14
Replicator
Service my2vrTransactions from master
CSVFilesCSVFilesCSVFiles
StagingTablesStagingTablesStagingTables
Base Tables
Base Tables
Base Tables
Merge Script
(or)COPY
directly to base tables
COPY to stage tables SELECT to
base tables
![Page 15: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/15.jpg)
©Continuent 2013
Setting Up MySQL to Vertica Replication
15
![Page 16: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/16.jpg)
©Continuent 2013
DEMO
16
MySQL to Vertica replication with some bells and a whistle
MySQL
db01db02db03
db01renamed02
Xsysbenchsysbenchsysbench
![Page 17: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/17.jpg)
©Continuent 2013
Get the Code
wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.1.0-285.tar.gz
tar -xf tungsten-replicator-2.1.0-285.tar.gz
cd tungsten-replicator-2.1.0-285
17
![Page 18: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/18.jpg)
©Continuent 2013
Installing MySQL Master
18
tools/tungsten-installer --master-slave -a \ --service-name=mysql2vertica \ --master-host=mysql1 \ --cluster-hosts=mysql1 \ --datasource-user=tungsten \ --datasource-password=secret \ --home-directory=/opt/continuent \ --buffer-size=100 \ --java-file-encoding=UTF8 \ --java-user-timezone=GMT \ --mysql-use-bytes-for-string=false \ --svc-extractor-filters=replicate,colnames,pkey \ --property=replicator.filter.pkey.addPkeyToInserts=true \ --property=replicator.filter.pkey.addColumnsToDeletes=true \ --property=replicator.filter.replicate.do=db01.*,db02.* \ --start-and-report
![Page 19: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/19.jpg)
©Continuent 2013
Installing Vertica Slave
19
$ tools/tungsten-installer --master-slave -a \ --service-name=mysql2vertica \ --home-directory=/opt/continuent \ --cluster-hosts=vertica1 \ --master-host=mysql1 \ --datasource-type=vertica \ --datasource-user=dbadmin \ --datasource-password=secret \ --datasource-port=5433 \ --batch-enabled=true --batch-load-template=vertica6 \ --vertica-dbname=bigdata \ --java-user-timezone=GMT \ --java-file-encoding=UTF8 \ --svc-applier-filters=dbtransform \ --property=replicator.filter.dbtransform.from_regex1=db02 \ --property=replicator.filter.dbtransform.to_regex1=renamed02 \ --property=replicator.stage.q-to-dbms.blockCommitRowCount=25000 \ --start-and-report
![Page 20: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/20.jpg)
©Continuent 2013
Generate Schema Using ddlscan
20
•Data types?•Column lengths?•Naming conventions?•Staging tables?
MySQL Tables
ddlscan
![Page 21: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/21.jpg)
©Continuent 2013
Tungsten ddlscan Utility
cd /opt/continuent/tungsten/tungsten-replicator/bin
# Base table generation../ddlscan -template ddl-mysql-vertica.vm \ -db db01 -user tungsten -pass secret >> ddl.sql
# Staging table generation./ddlscan -template ddl-mysql-vertica-staging.vm \ -db db01 -user tungsten -pass secret >> ddl.sql
# Load into Verticavsql -Udbadmin -wsecret < ddl.sql
21
![Page 22: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/22.jpg)
©Continuent 2013
Checking Status
# Checking status on mastertrepctl -host logos1 heartbeattrepctl -host logos1 status
# Checking status on slavetrepctl -host vertica1 status
# Checking detailed performance of apply task. trepctl -host vertica1 status -name tasks
22
![Page 23: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/23.jpg)
©Continuent 2013
Application Tips and Tricks
23
![Page 24: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/24.jpg)
©Continuent 2013
Application Design Practices
24
• Primary keys on all tables
• (Tungsten requires single column keys)
• Clean schema design *really* helps
• UTF-8 character set--or at least be consistent
• Use GMT timezone--or be very consistent about dates
• Use row replication on MySQL master
![Page 25: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/25.jpg)
©Continuent 2013
Transforming Data -- Replicator Filters
25
• Tables to ignore/include?
• Schema/table/column renaming?
• Map names to upper/lower case?
• Drop data?
tungsten-installer --master-slave -a \ --service-name=mysql2vertica \ ... --svc-extractor-filters=pkey,colnames,replicate \ --property=replicator.filter.replicate.do=db01.*,db02.*\ ...
![Page 26: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/26.jpg)
©Continuent 2013
List of Commonly Used Filters
26
• CDC -- Transform log to record of changes
• colnames -- Add column names
• dbtransform -- Change db name only
• enumtostring -- Make MySQL enums a string
• pkey -- Add primary key metadata
• rename -- Rename db/table/column
• replicate -- Replicate/don’t replicate tables
• zerodate2null -- Make MySQL ‘0’ dates null
![Page 27: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/27.jpg)
©Continuent 2013
Transforming Data -- Staging Server(s)
27
OLTP Servers
StagingServer with Triggers/SQL
Vertica Cluster
![Page 28: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/28.jpg)
©Continuent 2013
Transforming Data -- Merge Script Hacks
28
# Hacked load script for Vertica--deletes always precede inserts, so# inserts can load directly.
# Extract deleted data keys and put in temp CSV file for deletes. !egrep '^"D",' %%CSV_FILE%% |cut -d, -f4 > %%CSV_FILE%%.deleteCOPY %%STAGE_TABLE_FQN%% FROM '%%CSV_FILE%%.delete' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'
# Delete rows using an IN clause. You could also set a column value to # mark deleted rows. DELETE FROM %%BASE_TABLE%% WHERE %%BASE_PKEY%% IN (SELECT %%STAGE_PKEY%% FROM %%STAGE_TABLE_FQN%%)
# Load inserts directly into base table from a separate CSV file. !egrep '^"I",' %%CSV_FILE%% |cut -d, -f4- > %%CSV_FILE%%.insert
COPY %%BASE_TABLE%% FROM '%%CSV_FILE%%.insert' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'
![Page 29: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/29.jpg)
©Continuent 2013
Provisioning -- Using CSV
29
mysql> SELECT * from sales INTO OUTFILE ‘sales.csv’;...(Fix up data if necessary)...vsql> COPY sales FROM 'sales.csv' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"';
![Page 30: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/30.jpg)
©Continuent 2013
Provisioning Using a Sandbox Server
30
OLTP Server
Temporary Sandbox Server
Vertica Cluster
1. Restore logical backup
2. Replicate restored transactions
3. Replicate normally after restore loads
![Page 31: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/31.jpg)
©Continuent 2013
Parallel Provisioning from Sandbox
31
OLTP Server
Temporary Sandbox Server
Vertica Cluster
1. Restore logical backup
2. Replicate restored data in parallel
3. Replicate normally after restore loads
![Page 32: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/32.jpg)
©Continuent 2013
Complex Topologies: Fan-In
32
VerticaCluster
logos1
Master
logos2
Master
logos2
SlaveServices
logos1
![Page 33: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/33.jpg)
©Continuent 2013
Wrapping Up
33
![Page 34: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/34.jpg)
©Continuent 2013
Tungsten University Sessions
34
• Load a Vertica Data Warehouse with MySQL Data (May 30 10am PDT and June 4, 4pm CEST)
Send feedback to: [email protected]
![Page 35: Tungsten University: Load A Vertica Data Warehouse With MySQL Data](https://reader034.vdocuments.us/reader034/viewer/2022052601/558bdee0d8b42aee458b457d/html5/thumbnails/35.jpg)
©Continuent 2012.
Continuent Web Page:http://www.continuent.com
Tungsten Replicator 2.0:http://code.google.com/p/tungsten-replicator
Our Blogs:http://scale-out-blog.blogspot.comhttp://!yingclusters.blogspot.comhttp://datacharmer.org/bloghttp://www.continuent.com/news/blogs
560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009e-mail: [email protected]