© 2007 IBM Corporation
IBM – Informix Dynamic Server
Cheetah - Agile & Fast Performance enhancements
Slide 2
Agenda
► Non-Blocking Checkpoints
► Automatic Checkpoints
► Recovery Time Objective
► Automatic LRU Tuning
► Automatic AIO VP Tuning
► Support for Direct I/O
Cheetah Checkpoint Improvements
Slide 4
What is a checkpoint?
► A checkpoint is a point in time where cached data (bufferpool) is flushed to disk to create a consistency point for fast recovery, backups, HDR…
Slide 5
What is an LRU?
► LRUs are the queues used to manage the bufferpool
► An LRU is composed of 2 lists
■ MLRU
• Tracks modified pages in the queue
■ FLRU
• Tracks free or unmodified pages in the queue
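The split between the two lists can be pictured with a toy model. This is only a sketch for illustration, not the server's implementation: the class name `LRUQueue` and its methods are hypothetical, and a real queue also handles page replacement and latching.

```python
from collections import OrderedDict


class LRUQueue:
    """Toy model of one LRU queue: an FLRU list and an MLRU list."""

    def __init__(self):
        self.flru = OrderedDict()  # free or unmodified pages, oldest first
        self.mlru = OrderedDict()  # modified (dirty) pages, oldest first

    def read(self, page_id):
        """Touch a page for reading; it stays on the list it already belongs to."""
        target = self.mlru if page_id in self.mlru else self.flru
        target.pop(page_id, None)
        target[page_id] = True  # re-insert at the most-recently-used end

    def modify(self, page_id):
        """Dirty a page: it moves from the FLRU list to the MLRU list."""
        self.flru.pop(page_id, None)
        self.mlru.pop(page_id, None)
        self.mlru[page_id] = True

    def flush(self, page_id):
        """Write a dirty page to disk: it returns to the FLRU list."""
        if self.mlru.pop(page_id, None):
            self.flru[page_id] = True

    def dirty_percent(self):
        """Share of pages on the MLRU list, as a percentage."""
        total = len(self.flru) + len(self.mlru)
        return 100.0 * len(self.mlru) / total if total else 0.0
```

The dirty percentage computed here is the kind of figure that LRU flushing thresholds are compared against.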
Slide 6
Existing characteristics of Checkpoints
►Significant transaction blocking, even with fuzzy checkpoints
►Fuzzy checkpoints
■ Unpredictable checkpoint processing time
■ Unpredictable recovery time
Slide 7
Existing characteristics of Checkpoints
►Checkpoint tuning vs OLTP tuning
■ Tuning LRU very aggressively
• Causes constant flushing of the buffer pool
• Reduces the write cache
• Flushers consume CPU cycles
• Increases buffer contention
■ Tuning LRU less aggressively
• Checkpoints were longer
• Transactions blocked for longer periods
• Longer disaster recovery time
►It wasn’t easy to figure out the optimal tuning.
Non-Blocking Checkpoints
Slide 9
Non Blocking Checkpoints
►Most checkpoints do not block transactions during buffer flushing.
►Exceptions
■ Checkpoint running short on resources
• Physical log 75% full
• At least one checkpoint per logical log space
■ Admin and archive checkpoints
►Fuzzy checkpoints completely removed
►Phase A recovery has been removed
►Physical logging activity returns to 7.3 levels
• Will need to increase the size of the physical log!
Slide 10
Benefits of Non-Blocking Checkpoints?
►Transaction processing continues during the disk flush portion of checkpoint processing
►Allows LRU flushing to be relaxed
■ Dramatic transaction performance improvement
►More frequent checkpoints
■ Shortens fast recovery
Slide 11
Interval checkpoint
Slide 12
Recommendations
►Increase LRUMIN and LRUMAX to at least 60 and 70
►Make sure the physical log is large
■ Can be moved online
■ Can be larger than 2GB
►Make sure the logical log space is large
►Check the new onstat -g ckp output
Slide 13
What do you do if checkpoints block?
►Use the automatic checkpoint feature
■ The server will automatically trigger checkpoints based on the resources remaining.
►Increase the size of the physical/logical log
■ The server will suggest which resource to increase and what size it should be
►Make LRU flushing more aggressive
►Increase I/O performance
■ More AIO VPs and cleaners
■ Improve performance of the I/O subsystem
Automatic Checkpoints
Slide 15
Automatic Checkpoints
►Triggered if potential transaction blocking is detected
►Calculation based on
■ Physical and logical log usage
■ Buffer flush speed
■ Transaction throughput
►To help automatic checkpoints
■ Increase the physical log size
■ Increase the logical log size
■ Increase LRU flushing (use automatic LRU tuning)
►The server will make suggestions when resources are lacking
►Monitor online.log and onstat -g ckp
Slide 16
Automatic Checkpoints
►Default is always on
►onmode -wm AUTO_CKPTS=0 … turn off
►onmode -wm AUTO_CKPTS=1 … turn on
Slide 17
Checkpoint Performance Advisory
► During a checkpoint, IDS evaluates the checkpoint-related configuration parameters and produces a performance advisory if their settings are not optimal for avoiding transaction blocking.
► The performance advisory appears in the second part of the onstat -g ckp output and in online.log
► Configuration parameters evaluated at checkpoint:
■ PHYSFILE
■ PHYSBUFF
■ LOGBUFF
■ LOGFILES and LOGSIZE
Slide 18
PHYSFILE – Physical log Size
► 110% of the combined size of all bufferpools for optimum performance
► Enables fast recovery to use all bufferpool resources
► Depends on transactional workload and speed of the disks
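The 110% guideline is simple arithmetic. The sketch below works it through for a hypothetical pair of buffer pools; the function name and the pool sizes are illustrative assumptions, not values from the slides.

```python
def recommended_physfile_kb(bufferpools):
    """Return 110% of the combined buffer pool size, in KB.

    bufferpools is an iterable of (buffer_count, page_size_kb) pairs.
    """
    combined_kb = sum(count * page_kb for count, page_kb in bufferpools)
    # Integer arithmetic, rounded up, to avoid floating-point rounding.
    return (combined_kb * 110 + 99) // 100


# Hypothetical instance: 50,000 buffers of 2 KB plus 10,000 buffers of 8 KB.
# Combined size is 180,000 KB, so the guideline gives 198,000 KB.
print(recommended_physfile_kb([(50000, 2), (10000, 8)]))
```

As the slide notes, the right value also depends on the transactional workload and disk speed, so this figure is a starting point rather than a rule.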
Slide 19
PHYSBUFF - Physical buffer size
► With RTO_SERVER_RESTART off, default value is 128KB
► With RTO_SERVER_RESTART on, default value is 512 KB
► If a smaller value is used, a message appears in the online.log.
Slide 20
Checkpoint Performance Advisory – Physical log
► During checkpoint processing, potential physical log overflow is detected.
Performance advisory: Physical log is running out of room.
Results: Blocking transactions until checkpoint is complete.
Action: Increase physical log size.
Slide 21
Physical log and automatic checkpoints ON
► If the physical log is smaller than 10MB (10000KB), or automatic checkpoints are occurring every 35 seconds, automatic checkpoints are turned off
Performance advisory: The physical log is too small for automatic checkpoints.
Results: Automatic checkpoints are disabled.
Action: Increase the physical log size to at least ## Kb.
Slide 22
LOGBUFF – Logical log buffer
► Default value is 64KB
► If value < 64 KB, a message appears in the online.log
► Assumes buffered logging is used. If non-buffered logging is used, smaller buffers can be used
Slide 23
Checkpoint Performance Advisory – Logical log
► During checkpoint processing, the system detects the potential for reaching the checkpoint-per-log-span limit.
Performance advisory: Logical log is running out of room.
Results: Blocking transactions until checkpoint is complete.
Action: Increase logical log size.
Slide 24
Long Transaction blocking checkpoints
► Long transactions are triggering frequent checkpoints
Performance advisory: Long transactions are triggering blocking checkpoints.
Results: Blocking transactions until checkpoint is complete.
Action: Increase logical log size.
Slide 25
Logical log and automatic checkpoints ON
► If the logical log is smaller than 20MB (20000KB), or automatic checkpoints are generated every 35 seconds, automatic checkpoints are turned off
Performance advisory: The logical log space is too small for automatic checkpoints.
Results: Automatic checkpoints are disabled.
Action: Increase the logical log space to at least ## Kb.
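The two size thresholds quoted on the slides (10000 KB for the physical log, 20000 KB for the logical log) can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, and the checkpoint-frequency condition ("every 35 seconds") is deliberately not modeled because the slides do not define it precisely.

```python
MIN_PHYSICAL_LOG_KB = 10000  # threshold quoted on the physical log slide
MIN_LOGICAL_LOG_KB = 20000   # threshold quoted on the logical log slide


def auto_checkpoint_advisories(physical_log_kb, logical_log_kb):
    """Return the advisory texts that would apply for the given log sizes."""
    advisories = []
    if physical_log_kb < MIN_PHYSICAL_LOG_KB:
        advisories.append(
            "The physical log is too small for automatic checkpoints.")
    if logical_log_kb < MIN_LOGICAL_LOG_KB:
        advisories.append(
            "The logical log space is too small for automatic checkpoints.")
    return advisories


def auto_checkpoints_enabled(physical_log_kb, logical_log_kb):
    """Automatic checkpoints stay on only if no size advisory applies."""
    return not auto_checkpoint_advisories(physical_log_kb, logical_log_kb)
```

For example, `auto_checkpoints_enabled(8000, 30000)` is false in this model because the physical log falls below the 10000 KB threshold.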
Slide 26
Performance Warning Examples
23:28:26 Performance Advisory: The current size of the physical log buffer is smaller than recommended.
23:28:26 Results: Transaction performance might not be optimal.
23:28:26 Action: For better performance, increase the physical log buffer size to 128.

13:25:54 Performance Advisory: Based on the current workload, the physical log might be too small to accommodate the time it takes to flush the buffer pool.
13:25:54 Results: The server might block transactions during checkpoints.
13:25:54 Action: If transactions are blocked during the checkpoint, increase the size of the physical log to at least 14000 KB.
13:25:54 Performance Advisory: The physical log is too small for automatic checkpoints.
13:25:54 Results: Automatic checkpoints are disabled.
13:25:54 Action: To enable automatic checkpoints, increase the physical log to at least 14000 KB.
Slide 27
onstat -g ckp
IBM Informix Dynamic Server Version 11.10.FB7TL -- On-Line -- Up 01:03:54 -- 39936 Kbytes
AUTO_CKPTS=Off RTO_SERVER_RESTART=Off

                                                                        Critical Sections                     Physical Log  Logical Log
          Clock                             Total  Flush  Block  #      Ckpt   Wait   Long   # Dirty  Dskflu  Total  Avg    Total  Avg
Interval  Time      Trigger    LSN          Time   Time   Time   Waits  Time   Time   Time   Buffers  /Sec    Pages  /Sec   Pages  /Sec
24        16:04:11  Plog       26:0x2d50f8  0.4    0.4    0.4    1      0.0    0.4    0.4    709      709     750    10     638    8
25        16:04:31  Plog       28:0x108c    0.6    0.6    0.6    2      0.0    0.6    0.6    940      940     722    38     1276   67
26        16:05:03  *User      28:0x32b018  0.1    0.0    0.0    1      0.0    0.1    0.1    34       34      187    5      810    24
27        16:20:05  CKPTINTVL  28:0x32e018  0.0    0.0    0.0    0      0.0    0.0    0.0    1        1       0      0      3      0
28        16:21:38  Plog       29:0x1c676c  0.5    0.5    0.5    1      0.0    0.5    0.5    705      705     750    8      640    6
29        16:21:52  *User      29:0x3b9018  0.1    0.0    0.0    1      0.0    0.1    0.1    33       33      186    12     499    33
30        16:23:45  *Backup    29:0x3bd018  0.1    0.0    0.0    0      0.0    0.0    0.0    16       16      18     0      4      0

Max Plog   Max Llog   Max Dskflush  Avg Dskflush  Avg Dirty  Blocked
pages/sec  pages/sec  Time          pages/sec     pages/sec  Time
200        200        1             405           10         1

The server is blocking transactions because the physical log is too small. Based on the current workload, to prevent the server from blocking future transactions, increase the size of the physical log to 14000 KB.
Based on the current workload, the logical log space might be too small to accommodate the time it takes to flush the buffer pool. The server might block transactions during checkpoints. If the server blocks transactions, increase the size of the logical log space to at least 14000 KB.
Slide 28
onstat -g ckp
Field                    Unit             Description
AUTO_CKPTS               On/Off           Displays whether the automatic checkpoints feature is on or off
RTO_SERVER_RESTART       Seconds          Displays the RTO policy. 0 = RTO policy is off.
Estimated recovery time  Seconds          Estimated time it would take the IDS server to perform fast recovery
Interval                 Number           Checkpoint interval id
Clock Time               Wall clock time  Wall clock time at which the checkpoint occurred
Trigger                  Text             Event that triggered the checkpoint. The most common are RTO, Plog (physical log) and Llog (running out of logical log resources).
LSN                      Log position     Log position of the checkpoint
Total Time               Seconds          Total checkpoint duration from request time to checkpoint completion
Flush Time               Seconds          Time to flush the bufferpools
Block Time               Seconds          Transaction blocking time
# Waits                  Number           Number of transactions that blocked waiting for the checkpoint
Ckpt Time                Seconds          Amount of time it takes for all transactions to recognize that a checkpoint has been requested
Wait Time                Seconds          Average time a thread waited for the checkpoint
Long Time                Seconds          Longest amount of time a transaction waited for the checkpoint
# Dirty Buffers          Number           Number of buffers flushed to disk during checkpoint processing
Dskflu/Sec               Number           Number of buffers flushed to disk per second during checkpoint processing
Plog Total Pages         Number           Total number of pages physically logged during the checkpoint interval
Plog Avg/Sec             Number           Average rate of physical log activity during the checkpoint interval
Llog Total Pages         Number           Total number of pages logically logged during the checkpoint interval
Llog Avg/Sec             Number           Average rate of logical log activity during the checkpoint interval
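A checkpoint history row can be mapped onto the fields described above with a short script. This is a sketch that assumes each row is whitespace-separated and carries the seventeen per-checkpoint columns in the order shown in the sample output; the key names and the function `parse_ckp_row` are hypothetical, and other server versions may print different columns.

```python
FIELDS = [
    "interval", "clock_time", "trigger", "lsn",
    "total_time", "flush_time", "block_time", "waits",
    "ckpt_time", "wait_time", "long_time",
    "dirty_buffers", "dskflu_per_sec",
    "plog_total_pages", "plog_avg_per_sec",
    "llog_total_pages", "llog_avg_per_sec",
]
TEXT_FIELDS = {"clock_time", "trigger", "lsn"}
SECONDS_FIELDS = {"total_time", "flush_time", "block_time",
                  "ckpt_time", "wait_time", "long_time"}


def parse_ckp_row(line):
    """Turn one checkpoint row into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row = {}
    for name, value in zip(FIELDS, values):
        if name in TEXT_FIELDS:
            row[name] = value          # keep times, triggers and LSNs as text
        elif name in SECONDS_FIELDS:
            row[name] = float(value)   # durations are printed in seconds
        else:
            row[name] = int(value)     # counts and rates are whole numbers
    return row


sample = ("24 16:04:11 Plog 26:0x2d50f8 0.4 0.4 0.4 1 "
          "0.0 0.4 0.4 709 709 750 10 638 8")
row = parse_ckp_row(sample)
print(row["trigger"], row["block_time"], row["dirty_buffers"])
```

Rows with a non-zero `block_time` are the ones worth checking against the advisory text at the end of the output.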
Slide 29
New SYSMASTER Tables
►syscheckpoint
■ Keeps history on the last 20 checkpoints
►sysckptinfo
■ Keeps info on automatic checkpoints
Recovery Time Objective (RTO)
Slide 31
Onconfig parameter
► New onconfig parameter
► RTO_SERVER_RESTART
■ Amount of time in seconds that Dynamic Server has to recover from a problem after you restart Dynamic Server and bring the server into online or quiescent mode.
■ Seeds the logical recovery pages in the physical log
■ Valid values are 60 – 1800
■ Default is 0 (disabled)
Slide 32
RTO
► Facts about RTO_SERVER_RESTART
■ Allows users to set a target fast recovery time.
■ RTO_SERVER_RESTART and CKPTINTVL are mutually exclusive.
■ If turned off, the system will use CKPTINTVL to trigger checkpoints (the old style).
■ Valid values are 60 - 1800 seconds (1–30 minutes).
■ Automatically adjusts the checkpoint frequency to meet the RTO policy.
■ The server will fine-tune with each fast recovery to improve predictability.
■ This parameter can be updated with onmode -wf and -wm.
■ RTO_SERVER_RESTART=0 (off) is the default.
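The value rules above (0 means off, otherwise 60 to 1800 seconds, and mutually exclusive with CKPTINTVL) can be captured in a small validation helper. This is an illustrative sketch; `checkpoint_trigger_policy` is a hypothetical name, not a server or utility function.

```python
def checkpoint_trigger_policy(rto_server_restart):
    """Return the parameter that drives checkpoints for a given setting."""
    if rto_server_restart == 0:
        # RTO policy is off: the old-style interval applies.
        return "CKPTINTVL"
    if 60 <= rto_server_restart <= 1800:
        # RTO policy is on: checkpoint frequency is adjusted automatically.
        return "RTO_SERVER_RESTART"
    raise ValueError("RTO_SERVER_RESTART must be 0 or between 60 and 1800")


print(checkpoint_trigger_policy(0))
print(checkpoint_trigger_policy(120))
```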
Slide 33
How does RTO_SERVER_RESTART work?
► Estimate/calculate the speed of fast recovery
■ Server boot time
■ Physical log recovery (RAS_PLOG_SPEED)
■ Logical log recovery (RAS_LLOG_SPEED)
■ Assume all updates fit into bufferpools (pages seeded in physlog)
► Automatic checkpoints based on resource usage to meet RTO policy.
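One way to picture the estimate is as a sum of the three components listed above. The formula below is an assumption made for illustration: the slides do not give the server's actual calculation, the speeds are taken here to be pages per second, and all function and parameter names are hypothetical.

```python
def estimate_recovery_seconds(boot_seconds, plog_pages, plog_pages_per_sec,
                              llog_pages, llog_pages_per_sec):
    """Assumed model: boot time plus physical and logical log replay time."""
    physical = plog_pages / plog_pages_per_sec
    logical = llog_pages / llog_pages_per_sec
    return boot_seconds + physical + logical


def checkpoint_needed(estimate_seconds, rto_server_restart):
    """In this model, a checkpoint is due once the estimate exceeds the RTO."""
    return rto_server_restart > 0 and estimate_seconds > rto_server_restart


# Hypothetical figures: 10 s boot, 20,000 physical log pages at 10,000/s,
# 6,000 logical log pages at 1,000/s.
estimate = estimate_recovery_seconds(10, 20000, 10000, 6000, 1000)
print(estimate, checkpoint_needed(estimate, 60))
```

The point of the sketch is only that taking a checkpoint shrinks the log pages to replay, which is how checkpoint frequency can be steered toward a recovery-time target.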
Auto LRU Tuning
Slide 35
Automatic LRU Tuning (lru_min/max_dirty)
►With interval checkpoints, LRU flushing can be less aggressive.
■ So go ahead and relax your lru_min/max_dirty
■ Can bring dramatic increases in performance.
►LRU flushing will automatically adjust to be more aggressive
■ When a hot page is replaced: 1% more aggressive
■ When a foreground write occurs: 5% more aggressive
■ When the time to flush the bufferpool exceeds RTO_SERVER_RESTART: 10% more aggressive
■ Continues adjusting until optimal.
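The three adjustment steps can be sketched as a lookup table. The slides do not say how "more aggressive" is applied, so this example makes one possible reading explicit: both dirty thresholds are lowered by the stated number of percentage points, never below zero. The event names and the function `adjust_lru` are hypothetical.

```python
# Step sizes from the slide, in percentage points (assumed interpretation).
STEP_BY_EVENT = {
    "hot_page_replaced": 1,
    "foreground_write": 5,
    "flush_exceeds_rto": 10,
}


def adjust_lru(min_dirty, max_dirty, event):
    """Return (new_min_dirty, new_max_dirty) after one tuning event."""
    step = STEP_BY_EVENT[event]
    new_min = max(min_dirty - step, 0)
    new_max = max(max_dirty - step, 0)
    return new_min, new_max


# Starting from min 70 / max 80, a foreground write tightens both by 5.
print(adjust_lru(70, 80, "foreground_write"))
```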
Slide 36
LRU_MAX_DIRTY and LRU_MIN_DIRTY
► Default values
■ LRU_MAX_DIRTY 60%
■ LRU_MIN_DIRTY 50%
► A good starting point when AUTO_LRU_TUNING is ON
■ LRU_MAX_DIRTY 80%
■ LRU_MIN_DIRTY 70%
Slide 37
Automatic LRU Tuning – Configuration
► AUTO_LRU_TUNING
■ 0 or 1
■ ON by default
► Dynamically switch off LRU_TUNING
■ onmode -wm AUTO_LRU_TUNING=0
► Dynamically switch on LRU_TUNING
■ onmode -wm AUTO_LRU_TUNING=1,min=val,max=val
► Dynamically set LRU parameters when LRU tuning is on/off
■ onmode -wm AUTO_LRU_TUNING=min=val
■ onmode -wm AUTO_LRU_TUNING=max=val
Slide 38
Performance Advisory when auto LRU tuning ON
► Issued during a checkpoint if the buffer flush time exceeds the RTO.
Performance advisory: The time to flush the bufferpool ## is longer than RTO_SERVER_RESTART ##.
Results: The IDS server can't meet the RTO policy
Action: Automatically adjusting LRU flushing to be more aggressive.
Adjusting LRU for bufferpool - id ## size ##k
Old max ## min ## New max ## min ##
Slide 39
… when auto LRU tuning OFF
Performance advisory: The time to flush the bufferpool ## is longer than RTO_SERVER_RESTART ##.
Results: The IDS server can't meet the RTO policy
Action: Automatic LRU tuning is off. Either turn on automatic LRU tuning or change LRU flushing to be more aggressive.
Automatic AIO VP Tuning
Slide 41
Automatic Tuning of AIO VPs
► For cooked chunks
► Monitor I/O performance and add more AIO VPs and/or cleaners if needed
► AUTO_AIOVPS configuration parameter
■ 0 or 1
■ ON by default
► Dynamically change it using onmode
■ onmode -wm/-wf AUTO_AIOVPS=1
■ onmode -wm/-wf AUTO_AIOVPS=0
Slide 42
NUMAIOVPS or VPCLASS aio_num=#
► Initial setting will be 2 AIO VPs per cooked chunk
► If you add one cooked chunk, 2 more AIO VPs will be added up to a value of 128
► Changing the value in ONCONFIG does not have any impact if RTO_SERVER_RESTART is ON.
► Possible to change the value dynamically using onmode -p
Slide 43
CLEANERS
► Initial setting will be 1 cleaner thread per AIO VP
► Value adjusted in conjunction with changes to the number of AIO VPs.
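The initial sizing rules from the AIO VP and CLEANERS slides (2 AIO VPs per cooked chunk up to 128, and 1 cleaner per AIO VP) reduce to two small functions. This is a sketch of the arithmetic only; the function names are hypothetical, and later automatic adjustments based on I/O performance are not modeled.

```python
MAX_AIO_VPS = 128  # upper limit quoted on the slide


def initial_aio_vps(cooked_chunks):
    """2 AIO VPs per cooked chunk, capped at 128."""
    if cooked_chunks < 0:
        raise ValueError("cooked_chunks must not be negative")
    return min(2 * cooked_chunks, MAX_AIO_VPS)


def initial_cleaners(aio_vps):
    """1 cleaner thread per AIO VP."""
    return aio_vps


vps = initial_aio_vps(3)
print(vps, initial_cleaners(vps))
```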
Slide 44
Additional Information on checkpoints
http://www.ibm.com/developerworks/db2/library/techarticle/dm-0703lashley
Direct I/O for cooked files
Slide 46
Behavior of cooked files
► Cooked file performance can be much slower than raw devices.
[Figure: File System Cache]
Slide 47
The Solution with Cooked files
► Direct I/O bypasses file system cache
► Unix and Linux OS support Direct I/O
► Performance close to that of raw devices
[Figure: File System Cache]
Slide 48
When is DIO used
► DIO is not used by default on cooked files
■ Set DIRECT_IO = 1 in onconfig to turn it on
► When using DIO, KAIO will be used by default. This can be switched off by setting KAIOOFF=1
Slide 49
What are the benefits of DIO?
► File reads/writes bypass the operating system read and write caches.
► This reduces CPU consumption and eliminates the overhead of copying data twice:
■ first between the disk and the file buffer cache
■ second from the file buffer cache to the application’s buffer.
► Can reduce the number of AIO VPs if KAIOOFF is not set.
Slide 50
Limitations
► Cannot be used for temporary dbspaces.
► Can only be used for dbspace chunks whose file systems support direct I/O for the page size
Slide 51
What are customers saying
► "During the IDS 'Cheetah' beta program we extensively tested IDS ... Our major focus was the non-blocking checkpoints in 'Cheetah' which will bring our customers additional performance boost." -- Wolfgang Kraus, Bytec GmbH, Head of IT-Services
► "We have seen enormous performance improvements, up to seven times faster in some cases, using IDS Cheetah." -- Rob Prop, Manager Professional Services, Informa
Slide 52
Summary
► These are just some of the performance improvements that have been made in Cheetah
Slide 53
Questions