mysql nov 3 cluster perftuning

1Copyright 2009 MySQL – Sun Microsystems. The World’s Most Popular Open Source Database

Performance / Tuning Best Practices -MySQL Cluster

Johan AnderssonPS Practice Manager, MySQL Cluster [email protected]

Mat KeepMySQL Cluster Product Management [email protected]


Agenda

• General Design Concepts

• Schema Optimization

• Index Selection & Tuning

• Query & Parameter Tuning

• Development Tools

• Resources


About MySQL

High Performance ▪▪▪▪ Reliable ▪▪▪▪ Easy to Use

• 14 Years of Development• Acquired by Sun in February 2008• 400+ in Database Group• 750+ Partners• 70K+ Downloads Per Day

Customers across every major operating system, hardware

vendor, geography, industry, and application type


Open-source powers the Web & the Network

Serving Key Markets & Industry Leaders

Enterprise 2.0TelecommunicationsOn Demand, SaaS, Hosting

Web / Web 2.0 OEM / ISV's


MySQL Cluster Users and Applications

HA, Real Time Services: Web & Telecoms

http://www.mysql.com/customers/cluster/

• Subscriber Databases

• Service Delivery Platforms

• Application Servers

• Telecoms Value Added Services

• Web Session Stores

• eCommerce

• VoIP, IPTV & VoD

• Mobile Content Delivery

• On-Line app stores and portals

• DNS/DHCP for Broadband

• Payment Gateways

• Data Store for LDAP Directories


General Design Principles

design considerations• MySQL Cluster is designed for

– Short transactions– Many parallel transactions

• Utilize Simple access patterns to fetch data– Use efficient scans and batching interfaces

• Analyze what your most typical use cases are – optimize for those

Overall design goal Minimize network roundtrips for your

most important requests!



• Application spaces where Cluster is being used heavily– Subscriber databases (telecom)– Session management– Online gaming– Finance– E-commerce– As a Shard catalog in web 2.0 shops

• The key denominator for all these applications are:– high throughput, low response times, high-availability, and

simple access patterns.

• Reporting is typically performed on a subsystem– Replicate to a slave (e.g, MYISAM or INNODB database)



Tuning options• Schema Optimization

– denormalize tables

• Query Optimization– rewrite slow queries– often goes hand in hand with Schema Optimization

• Parameter Tuning– Use a good configuration– www.severalnines.com/config

• Network Tuning / OS Tuning– Tune TCP parameters– Cluster interconnects

• Hardware Tuning– get faster CPU/DISK

Speed

Increase


Detecting Problems – PT 101

• Enable the slow query log!– set global slow_query_log=1;

– set global long_query_time=3; //3 seconds

– set global log_queries_not_using_indexes=1;

– Slow queries will be written in the slow query log:mysql> show global variables like 'slow_query_log_f ile';

+---------------------+---------------------------- --+

| Variable_name | Value |

+---------------------+---------------------------- --+

| slow_query_log_file | /data1/mysql/mysqld-slow.lo g |

+---------------------+---------------------------- --+

1 row in set (0.00 sec)

– Queries will be written in plain text.

• MySQL Enterprise Monitor and Query Analyzer– MEM currently does not monitor data nodes– QUAN is fantastic for spotting problematic queries.


Detecting Problems – PT 101

• Performance tuning is an iterative processBEGIN– Start by analyzing the slow query log

• Change long_query_time if needed

– Use EXPLAIN to figure out if the query is• Using the correct indexes• JOINing the tables in the wrong order• so bad it needs to be rewritten.

– Re-run the optimized typical use cases using mysqlslap

– GOTO BEGIN;

END;• Yes, Performance tuning is a never-ending task.• Never tune unless you can measure and test• Don't optimize unless you have a problem


Simple Access Patterns

• Simple Access Patterns are key to build scalable and high performing solutions (this is not subject to Cluster only)

• PRIMARY KEY lookups are done in constant time O(1)– Fastest way to access data in MySQL Cluster

• INDEX searches are done in O(log n) time.• JOINs are ok if you understand what can make them

slow.– If your most important requests are 10-way JOINs with huge

result sets then Cluster may not be for you.• Or use scale out (write to cluster read from innodb):

http://johanandersson.blogspot.com/2009/05/ha-mysql-write-scaling-using-cluster-to.html


Primary Key Operations

• Cost of typical PK operations (depends on HW/Network)

• Synchronous replication adds ~2.7x for writes compared to reads

• Test was with 8 threads connecting to one mysqld• 'bencher' was used to generate the load.

PK Operations Avg cost (us) Min cost (us) Normalized ( scalar)

Insert 4B + 64B 768 580 2.74

read 64B 280 178 1

update 64B 676 491 2.41

Insert 4B + 255B 826 600 2.82read 255B 293 174 1

update 255B 697 505 2.38delete 636 202 2.17


Batching

• MySQL Cluster allows batching on– Inserts, index scans (when not part of a JOIN), PK reads, PK

deletes, and PK (most) updates.– Batching means that one network round trip is used to

read/modify a number of records → less ping-pong!– If you can batch - do it!

• Example – Insert 1M records– No batching:

• INSERT INTO t1(data) VALUES (<data>);

• 765 seconds to insert 1M records

– Batching (batches of 16):• INSERT INTO t1(<columns>) VALUES (<data>),

(<data>)...

• 50 seconds to insert 1M records


Batching

• Example – READ 10 records (get 10 services for a user)– No batching:

• 10 x SELECT * FROM t1 WHERE userid=1 AND serviceid= { id }

mysqlslap --create-schema=test -c8 -i 1000 -q singl e.txt

Average number of seconds to run all queries: 0.006 seconds

– Batching (batches of 10):• SELECT * FROM t1 WHERE userid=1 AND serviceid IN

(1,2,3,4,5,7,8,9,10);

mysqlslap --create-schema=test -c8 -i 1000 -q batch .txt


– Batching compared to Efficient index scan:• SELECT * FROM t1 WHERE userid=1


– Batching compared to index scan:• SELECT * FROM t1 WHERE userid=1



Batching

• Another way – batching on different tables• transaction_allow_batching can be used:SET transaction_allow_batching=1; //must be set on the connection

BEGIN;INSERT INTO t1 ....;

INSERT INTO t2 ....;

INSERT INTO t3 ....;

INSERT INTO t4 ....;DELETE FROM t5 ....;UPDATE t1 SET value='new value' WHERE id=1;

COMMIT;

– The above will be executed in one batch (one roundtrip)

• Batching using transaction_allow_batching does not work with:– SELECTs..– UPDATE .. SET counter=counter+1 - type of updates..


Efficient Index scan - Partition Pruning

Scanning only one partition is sometimes better than scanning all partitions (all nodes).

By default, all index scans hit all data nodes – good if big result set.

User-defined partitioning can help to improve equality index scans on part of a primary key.

CREATE TABLE t1 (userid, serviceid, data, PRIMARY KEY(userid, serviceid))PARTITION BY KEY(userid);

All data belonging to a particular userid will be on the same partition.

select * from user where userid=1;

Only one data node will be scanned (no matter how many nodes you have)


Efficient Index Scans - Partition Pruning

• You can verify if you got it correct:mysql> show global status like 'ndb_pruned_scan_cou nt';

+-----------------------+-------+


+-----------------------+-------+

| Ndb_pruned_scan_count | 0 |

+-----------------------+-------+


• select * from user where userid=1;mysql> show global status like 'ndb_pruned_scan_cou nt';

+-----------------------+-------+


+-----------------------+-------+

| Ndb_pruned_scan_count | 1 |

+-----------------------+-------+


PARTITION BY KEY (userid)

Increases if Partition Pruningwas possible


Efficient Index Scans - Partition Pruning

• Partition Pruning is better up to a certain point– Depends on number of data nodes and records retrieved

(shorter bars are better)

16 32 64 1280

500

1000

1500

2000Two data nodes

Non-partitioned Partitioned

Number of records

mic

rose

cond

s

16 32 64 1280

500

1000

1500

2000Four data nodes

Non-partitioned Partitioned

Number of records

mic

rose

cond

s


BLOBs/TEXTs vs VARBINARY/VARCHAR

• BLOB/TEXT columns are stored in an external hidden table.– First 255B are stored inline in main table– Reading a BLOB/TEXT requires two reads

• One for reading the Main table + reading from hidden table

• Reading/Writing a VARCHAR/VARBINARY is less expensive.

• Change to VARBINARY/VARCHAR if:– Your BLOB/TEXTs can fit within an 8052B record– (record size is currently 8052 Bytes)

Note 1: BLOB/TEXT are also more expensive in Innodb as BLOB/TEXT data is not inlined with the table. Thus, two disk seeks are needed to read a BLOB.

Note 2: We recommend (for any storage engine) to store images, movies etc outside the database on the filesystem.


Schema Optimization -Denormalization

• Two tables with the same PRIMARY KEY can be denormalized into a single table:

– Requires two roundtrips to get data• Denormalize:

USERID VOIP_DATA

1 <data>

2 <data>

3 <data>

4 <data>

USERID BB_DATA

1 <data>

2 <data>

3 <data>

4 <data>

USER_SVC_VOIP USER_SVC_BROADBAND

USERID VOIP_DATA BB_DATA

1 <data> <data>

2 <data> <data>

3 <data> <data>

4 <data> <data>

USER_SVC_VOIP_ BB


Schema Optimization - Denormalization

• Normalized:– SELECT * FROM

USER_SVC_BROADBAND AS bb, USER_SVC_VOIP AS voip WHERE bb.id=voip.id AND bb.id=1 ;

– Total throughput = 12623.09 tps– Average response time=658us

• Denormalized:– SELECT * FROM USER_SVC_VOIP_BB AS bb_voip

WHERE bb_voip=1;

– Total throughput = 21591.64 tps– Average response time=371us


Query Optimization

• JOINs are executed in the MySQL server.• The OPTIMIZER in MYSQL only knows one algorithm

– Nested Loop Join– This algorithm is not brilliant in its effectiveness

• If we have the following query:– SELECT fname, lname, title

FROM a,b WHERE b.id=a.id AND a.country='France';

Authid (PK) Frame Iname Country

1 Albert Camus France

2 Sully Prudhomme France

3 Johann Goethe Germany

4 Junichiro Tanizaki Japan

Authid (PK) ISBN (pk) title

1 1111 La Peste

1 1112 La Chute

1 1113 Caligula

2 2111 La France

Author a AuthorBook b


Query Optimization

• SELECT fname, lname, title FROM a,b WHERE b.id=a.id AND a.country='France';

• Index scan left table to find matches. • For each match in 'a', find matches in 'b'

– In this an index scan on the right table on b.id for each matching record in 'a'

– This could be very expensive if there are many records matching a.country='France'







1 1111 La Peste

1 1112 La Chute

1 1113 Caligula

2 2111 La France



Query Optimization

• JOINs can be expensive if:– We have many matches in the “left” table .

• E.g, we have 1000 matches in left table

– Each match in left table matches in “right” table with Index Scans

• Primary Key joins are cheaper – but perhaps you can de-normalize table with the same PK (if you can you should).

• E.g, for each match in 'left' table we have 1000 matches in the right table → we now have to read up 1000x1000 records, in 1000 network roundtrips!

– If each match in <a,b> then matches 1000 in a third table <c>...







1 1111 La Peste

1 1112 La Chute

1 1113 Caligula

2 2111 La France



Query Optimization

• Which of my book shops stock books from France?– SELECT c.id FROM a,b,c WHERE

c.isbn=b.isbn AND b.id=a.id AND a.country='France' ANDc.count>0 ;

• Continue from previous slide… for each ISBN in AuthorBook we have to JOIN on ISBN in Bookshop. – Each record found in AuthorBook renders an index scan on

Bookshop (so in this example 4 more network hops)– Index scans will be executed sequentially...– This query can perform really slow as network hops cost.

• Depending on the requirements this may be ok or not ok.







1 1111 La Peste

1 1112 La Chute

1 1113 Caligula

2 2111 La France


id(pk) count

1 1111 0

1 1112 2

1 1113 3

2 2111 12

2 1111 5

..

10000 1111 2

ISBN (pk,key)

Bookshop c


Query Optimization

• In real life...– A user does not have to view all email threads but perhaps

only be presented with 10 at a time.– A subscriber does not have 1M telecom services associated– An author does not write 1M books

• It is easier to optimize if we know the boundaries.• Careful analysis of this problem reveals:

– Authors in our database have written an ending amount of books. It is not N books!

– In this case an author writes max 64 books

• Can we do better with this information?


Query Optimization• Denormalization could be an option:

• SELECT ISBN_LIST FROM a WHERE a.country='France'• SELECT id FROM c WHERE c.isbn in (<ISBN_LIST>);

– Why better?• Multi read range scan: one index scan takes all values in

<isbn_list> instead of one index scan for each ISBN (many roundtrips)!

• We are down at two roundtrips for this query!• Let's implement this as a SPROC and compare!

Authid (PK) Frame Iname Country ISBN_LIST

1 Albert Camus France <64 isbns>

2 Sully Prudhomme France <64 isbns>



ISBN (pk) title

1111 La Peste

1112 La Chute

1113 Caligula

2111 La France

Author2 a

isbn_title b

id(pk) count

1 1111 0

1 1112 2

1 1113 3

2 2111 12

2 1111 5

..

10000 1111 2

ISBN (pk,key)

Bookshop c


Query Optimization

• SPROC with dynamic SQL:CREATE PROCEDURE ab(IN c VARCHAR(255))

BEGIN

SELECT @list:=isbn FROM author2 WHERE country=c;

SET @s = CONCAT("SELECT DISTINCT id

FROM bookshop WHERE count>0 ANDisbn IN (", @list, ");");

PREPARE stmt FROM @s;

EXECUTE stmt;

END


Query Optimization

• Normal querymysqlslap --create-schema=test

-q "SELECT DISTINCT id FROM author a, authorbook b, bookshop c WHERE b.authid=a.authid AND

a.country='France' AND b.isbn=c.isbn AND c.count>0" -i 1000 -c 8


• SPROCmysqlslap --create-schema=test -q "call ab('France' )" -i 1000 -c 8


• 4.5X faster – but updates are more difficult etc..


Query Optimization - Cluster Interconnects

• Cluster Interconnects– Instead of spending $$$ on application development

• DX (SCI) is a Cluster Interconnect offering:– High Bandwidth ( 20 Gb/sec full duplex), Low latency (<2us)– Offers a socket interface – any socket based application

benefits from it– 10 Gig-E form factor on cabling– Seamless fallback to Gig-E

– >2x more performance just plugging it in.

• Products– DXH510 PCI Express Host Adapter (600USD list price Oct

2008)– DXS410 DX 10 Port Switch (4200USD list price Oct 2008)– Read more at http://www.dolphinics.com


Query Optimization - Cluster Interconnects

• With SCI...LD_PRELOAD=/opt/DIS/lib64/libksupersockets.so mysq lslap

--create-schema=test -q "SELECT DISTINCT id FROM author a, authorbook b, bookshop c WHERE b.authid=a.authid AND

a.country='France' AND b.isbn=c.isbn AND c.count>0" -i 1000 -c 8

Benchmark


Minimum number of seconds to run all queries: 0.020 seconds

Maximum number of seconds to run all queries: 0.023 seconds

Number of clients running queries: 8

Average number of queries per client: 1

• 2.5x faster that 'Normal'


Query Optimization - considerations

• The main performance limiter for JOINs are– Number of tables in JOIN – Number of Records matching the JOIN criteria

• In general – JOINs are limiting scalability even for INNODB/MYISAM

• It is a complex access pattern

– JOINs should be as simple as possible when used in the Realtime part of the application:

• WHERE – conditions should be as limiting as possible• 2-way/3-way inspecting ~2000 records in each table is usually

no problem if well indexed and well written query.• N-way JOINs with PK/Index scan matching a couple of records in

each table is also ok if well indexed (N<10)• N-way (N>1) JOINs matching > 5000 records will likely not be

able to be executed within 5ms limit.

– Using SCI/DX can help a lot as JOINs are subject to network latency that is problematic.


Query Tuning

• Don't trust the OPTIMIZER!– Statistics gathering is very bad– Optimizer thinks there are only 10 rows to examine in each

table!

• If you have two similar indexes:– index(a)– index(a,ts)

on the following tableCREATE TABLE `t1`

(

ìd` int(11) NOT NULL AUTO_INCREMENT,

à` bigint(20) DEFAULT NULL,

`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,

PRIMARY KEY (ìd`),

KEY ìdx_t1_a` (à`),

KEY ìdx_t1_a_ts` (à`,`ts`)

) ENGINE=ndbcluster DEFAULT CHARSET=latin1


Ndb_cluster_connection_pool

• Problem: – A mutex on the connection from the mysqld to the data nodes

prevents scalability.– Many threads → contention on the mutex– Must have many mysqld processes running...

• Solution:– Ndb_cluster_connection_pool (in my.cnf) creates more connections

from one mysqld to the data nodes• One free [mysqld] slot is required in config.ini for each connection.

– Threads load balance on the connections→ less contention on mutex → increased scalabilty

– Less mysqlds needed to drive load!– www.severalnines.com/config allows you to specify the connection

pool.


Ndb_cluster_connection_pool

• Gives atleast 70% better performance (>150% has been seen).

• How to set Ndb_cluster_connection_pool depends on how many CPU cores you have for the mysqld.– Ndb_cluster_connection_pool=2x<CPU cores> is a good

starting point.

1 2 4 8 16 320

10000

20000

30000

40000Ndb_cluster_connection_pool

Ndb_cluster_connection_pool=1 (default)Ndb_cluster_connection_pool=32

Number of threads

Tra

nsac

tion

per

seco

nd


Other things

• Statistics gathering for MySQL Cluster is bad– ndb_use_exact_count=0

– ndb_index_stat_enable=1

• Enable these actually cost more – statistics are fetched before each query is executed. Costly.

– You can enable this on a single test mysqld server dedicated to testing.

• Understanding how many round trips there is in a query:– Ndb_execute_count in SHOW GLOBAL STATUS;– Shows how many times mysqld has called Execute to the data

nodes. – Execute means “send”. Each Execute is a round trip.


Other things

• Query cache is most often useless with MySQL Cluster– Invalidating the cache is expensive– Invalidation must happen on all mysql servers– Disable it

• Ndb_auto_increment_prefetch_size– Specifies how big range the mysqld should cache before

asking Cluster for the next range.– Default is Ndb_auto_increment_prefetch_size=1

• When inserting it means one roundtrip down to cluster before each INSERT!

– Set it to 256 (Severalnines Configurator does this for you)• Mysqld will have 256 auto_incs to use before fetching more from

the data nodes.


Other things

• Make sure you never:– Run in SWAP – the data nodes will be sub-performing and

you will have an unstable system.

• Make sure you– Lock data nodes threads to CPUs not handling interrupts for

ETH.– On SUN CMT (T5240 etc) it is very important to create

processor sets and bind interrupt handling to particular CPUs


Tools

• Third party tools at www.severalnines.com– Configurator

• Uses best practices for setting up a good config.ini and my.cnf • Scripts to control cluster from one single location.

– CMON - Cluster Monitor• Monitor X number of Clusters (and mysqld statistics) from a

single web interface (to be released 8th of November)


“MySQL Cluster won the performance test hands-down, and it fitted our needs perfectly.We evaluated shared-disk clustered databases, but the cost would have been at least 10x more.”

— François Leygues, Systems Manager, Alcatel-Lucent

• Applications• Infrastructure for next-generation

converged services including subscriber data management (HLR, HSS) and the XDMS contacts and presence management SDP

• Key business benefits• Only MySQL Cluster CGE offers

the scalability and the high availability required for the system

• Why MySQL? • Performance

• Reliability

• Lower costs

Alcatel-Lucent: Real-Time Platform

41

http://mysql.com/customers/view/?id=566


“ Since deploying MySQL Cluster as our eCommerce database, we have had continuous uptime with linear scalability enabling us to exceed our most stringent SLAs”

— Sean Collier, CIO & COO, Shopatron Inc

• Applications• ECommerce back-end, user authentication,

order data & fulfilment, payment data & inventory tracking. Supports several thousand queries per second

• Key business benefits• Scale quickly and at low cost to meet demand

• Self-healing architecture, reducing TCO

• Why MySQL? • Low cost scalability

• High read and write throughput

• Extreme availability

Shopatron: eCommerce Platform

42


Resoures to Get Started

• Getting Started with MySQL Cluster– http://www.mysql.com/products/database/cluster/get-started.html

• Severalnines Tool Set

– http://www.severalnines.com/

• MySQL Cluster Evaluation Guide

– http://www.mysql.com/why-mysql/white-papers/mysql_cluster_eval_guide.php

• MySQL Cluster Benchmark Results

– http://www.mysql.com/why-mysql/benchmarks/mysql-cluster/


Questions ?

Johan Andersson

PS Practice Manager, MySQL Cluster [email protected]

Mat Keep

MySQL Cluster Product Management [email protected]


The most cost-effective open source relational database for real-time, carrier-grade data management requirements

mysql nov 3 cluster perftuning

Documents

slow query log

slow queries

tables query optimization

web telecomshttp

hostingweb web

global log

thoseoverall design

application type4copyright