optimizing mysql without touching sql or my · optimizing mysql without touching sql or my.cnf...
Post on 06-Aug-2020
6 Views
Preview:
TRANSCRIPT
Percona Live, Santa Clara 2017
Optimizing MySQL without touching SQL or my.cnf
Maxim Bublis, Peter Boros
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
We want to be more efficient• Less hardware • Faster responses • Power savings
Percona Live, Santa Clara 2017
OS-level optimizationsSet cpufreq scaling governor to performance
$ echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Percona Live, Santa Clara 2017
What’s going onstrace and perf are our friends
Percona Live, Santa Clara 2017
TCP Wrapper libraryWritten by Wietse Venema in 1990
Host-based networking ACL
It is NOT the same as MySQL ACL – doesn’t verify credentials
Percona Live, Santa Clara 2017
How it worksHow libwrap works in general – it reads /etc/hosts.allow and /etc/hosts.deny
Verifies that client host can connect to server
Percona Live, Santa Clara 2017
Let’s run strace$ sudo strace -c -p 28894 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 94.39 56.109363 678 82816 poll 0.82 0.489786 1 328312 read 0.66 0.391561 2 164156 munmap 0.55 0.328845 1 248451 fcntl 0.50 0.298584 2 157468 865 futex 0.49 0.291448 2 164156 open 0.44 0.259529 2 164156 mmap 0.39 0.232314 1 164894 738 setsockopt 0.38 0.228411 1 164156 close 0.35 0.209210 1 164156 fstat 0.32 0.192175 2 82079 rt_sigaction 0.26 0.156935 2 82817 accept 0.18 0.108958 1 82079 getsockname 0.17 0.100658 1 82079 getpeername 0.01 0.003137 3 1045 clone ------ ----------- ----------- --------- --------- ---------------- 100.00 59.446237 2132821 1603 total
Percona Live, Santa Clara 2017
Let’s run strace$ sudo strace -c -p 28894 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 94.39 56.109363 678 82816 poll 0.82 0.489786 1 328312 read 0.66 0.391561 2 164156 munmap 0.55 0.328845 1 248451 fcntl 0.50 0.298584 2 157468 865 futex 0.49 0.291448 2 164156 open 0.44 0.259529 2 164156 mmap 0.39 0.232314 1 164894 738 setsockopt 0.38 0.228411 1 164156 close 0.35 0.209210 1 164156 fstat 0.32 0.192175 2 82079 rt_sigaction 0.26 0.156935 2 82817 accept 0.18 0.108958 1 82079 getsockname 0.17 0.100658 1 82079 getpeername 0.01 0.003137 3 1045 clone ------ ----------- ----------- --------- --------- ---------------- 100.00 59.446237 2132821 1603 total
Percona Live, Santa Clara 2017
Better to use iptablesMakes tens of thousands context switches per second
• open() • fstat() • read() • close() • etc
We spend 2.5% of system time there
Percona Live, Santa Clara 2017
Fast Mutexes
Percona Live, Santa Clara 2017
Mutex implementation On Linux it is spin lock + futex.
Futex is user-space mutex implementation provided by kernel.
Percona Live, Santa Clara 2017
Fast Mutexes“Fast” mutex
volatile ulong j;
j = 0;
for (i = 0; i < delayloops * 50; i++) j += i;
return(j);
NPTL
do { if (cnt++ >= max_cnt) { LLL_MUTEX_LOCK (mutex); break; } atomic_spin_nop (); } while (LLL_MUTEX_TRYLOCK (mutex) != 0);
Percona Live, Santa Clara 2017
Fast Mutexes“Fast” mutex
Does load+store+add per loop iteration
Number of spins is produced by PRNG
PRNG uses floating-point arithmetics
Utilizes CPU execution port
NPTL
Emits nop / pause instruction
Adaptive number of spins
Percona Live, Santa Clara 2017
Fixed in 5.7.8https://dev.mysql.com/worklog/task/?id=4601
“Pretending that we can implement faster mutexes with such a simple code is justnaive and labeling then as "fast mutex" without any evidence is misleading toour users.”
Percona Live, Santa Clara 2017
sysbenchhttps://github.com/akopytov/sysbench
Scriptable multi-threaded benchmark tool based on LuaJIT
Provides a collection of OLTP-like database benchmarks
Percona Live, Santa Clara 2017
sysbench example$ sysbench \ --num-threads=$num_threads \ --max-time=300 \ --max-requests=0 \ --test=/usr/local/share/sysbench/oltp_read_write.lua \ --mysql-user=sbtest \ --mysql-password=sbtest \ --mysql-host=127.0.0.1 \ --mysql-port=3306 \ --table-size=1000000 \ --tables=64 \ --report-interval=1 \ --auto-inc=off \ run
Percona Live, Santa Clara 2017
CompilersUbuntu 16.04
• GCC 4.9 • GCC 5.4 • Clang 3.8
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Profile-guided optimizations• Requires “training data” • Usually requires to rebuild binary • Common mistake: collecting profile data from unit tests
To build “instrumented” binary: -fprofile-generate
To use profiling data: -fprofile-use (GCC additionally requires -fprofile-correction)
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
AutoFDOhttps://github.com/google/autofdo
Helps to convert perf data into profiling data
Works with both GCC and Clang
Percona Live, Santa Clara 2017
AutoFDOAlmost as efficient as building instrumented binaries
Requires LBR support (last branch record) from both CPU and kernel
Regular perf works well, but works better with perf -b
Percona Live, Santa Clara 2017
Link-time optimizationsAlso known as Full Program Optimization)
Supported by both modern GCC and Clang with -flto option
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Native instruction setsBy default compilers generate code using k8 instruction set for x86_64
It was created and released by AMD in 2000.
-march=native and -mtune=native
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
PGO + LTOWe can combine it!
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
PGO + LTO + NativeEven better!
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Percona Live, Santa Clara 2017
Real world scenarios• Artificial profiling data won’t help much in real world scenarios • Probably will hurt in reality – like slower replication, etc.
Percona Live, Santa Clara 2017
Percona Query Playbackhttps://github.com/Percona-Lab/query-playback
Workload reproduction based on slow log
Percona Live, Santa Clara 2017
Options for capturingslow_query_log_timestamp_precision = microsecond slow_query_log_timestamp_always = 1 log_slow_verbosity = full long_query_time = 0
log_slow_rate_limit = 1
Percona Live, Santa Clara 2017
Slow log event example# Time: 170330 6:25:14.040355 # User@Host: edgestore_user[edgestore_user] @ [127.0.0.1] Id: 10671469 # Schema: stage_edgestore_shard232 Last_errno: 0 Killed: 0 # Query_time: 0.000274 Lock_time: 0.000037 Rows_sent: 0 Rows_examined: 0 Rows_affected: 1 # Bytes_sent: 11 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0 # InnoDB_trx_id: 125F7CBE2A # QC_Hit: No Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No # Filesort: No Filesort_on_disk: No Merge_passes: 0 # InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000 # InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000 # InnoDB_pages_distinct: 5 SET timestamp=1490855114; ACTUAL QUERY SQLTEXT;
Percona Live, Santa Clara 2017
Learn more about Edgestore
26 April - 3:30 PM - 4:20 PM @ Ballroom A
Percona Live, Santa Clara 2017
Query Playback• For every thread id in slow log it opens new connection • For each client connection it enqueues queue_depth number of statements to
replay • Replays those on those connections
Percona Live, Santa Clara 2017
Query Playback issues• Leaks connections • Often runs into lock waits, and seems to do nothing
(Slow log ordering issue) • Output is unreadable, reasonable debugging is next to impossible
Percona Live, Santa Clara 2017
Query Playback enhancements• No more connection leak (preprocessing) • Correct ordering (preprocessing) • Accurate playback mode
Percona Live, Santa Clara 2017
Kudos to Marius Wachtlerhttps://github.com/undingen
Pyston JIT compiler team member
Joined us for a couple of planning cycles
Bicycling from Austria to Australia now
Percona Live, Santa Clara 2017
Example$ percona-playback \
--mysql-user=testuser \
--mysql-host=127.0.0.1 \
--mysql-password=secret \
--query-log-file my.slow.log \
--queue-depth 1 \
--mysql-schema=stage_edgestore_shard232 \
--disable-reporting-plugin error_report \
--ignore-row-result-diffs \
--mysql-max-retries=0
Percona Live, Santa Clara 2017
Accurate mode• Controlled by query-log-accurate-mode option • Use with high queue depth
• In theory ordering issues are still possible, but we didn’t experience any, restoring the timing solves it.
• Keeps the gap between queries if there were any • Query 1 starts at T, query 2 at T + 5 sec
• If Query 1 was 1 second originally, and was played back in 0.8 seconds, playback will wait 4.2 seconds before Query 2
• If Query 1 was 1 second originally, and was played back in 2 seconds, playback will wait 2 seconds before Query 2
• If Query 1 was 1 seconds originally, and was played back in 6 seconds, playback will start Query 2 immediately
Percona Live, Santa Clara 2017
Tips for larger queue depth• “Faster than possible” playback, while ordering is guaranteed
to have issues • The session_init_query option can help here, for example for
reducing innodb_lock_wait_timeout • New mysql_filter_error option to filter out known issues
to a specific workload
Percona Live, Santa Clara 2017
We are hiring!
Percona Live, Santa Clara 2017
top related