lascon storage - tsm database and log
7/24/2019 LasCon Storage - TSM Database and Log
8/16/13 LasCon Storage - TSM Database and Log
www.lascon.co.uk/tsm-database-and-log.php#db6base 1/12
Managing the TSM database and log files.
TSM Database and Recovery Log
TSM Database version 6.1 and later
Basic database structure
Database sizing and tuning
Recovery log sizing and tuning
Best practice for Database and Storage Pool disks
Using DB2 commands on a TSM server
Investigating Problems with the Server Instance
When a TSM database is initially created on multiple file systems, that database will be spread equally over all
the file systems. However, if you add an extra file system to the database space using the 'extend dbs' command,
DB2 will not rebalance the database to spread the data equally. This means that if some of the original file
spaces were 100% full, they will still be 100% full after the new filespace is added, and this could cause the TSM
server to stop.
If you are running TSM Server V6.2 or above you can rebalance the database dynamically using DB2 commands.
I suggest that you look up the IBM technote about this, and also contact IBM for advice before trying this.
The database can be sized anywhere between 2.2GB and 1TB. The DB2 database will be between 35 and 50%
bigger than the equivalent legacy database, partly because it holds sort space for SQL queries. The DB2 database
is largely self tuning, so there is no requirement for DB2 tuning skills. A new parameter, DBMEMPERCENT,
replaces the old BUFFPOOLSIZE. This set of buffers contains much more data than the old buffer, so the
recommendation is to set its size to unlimited. In fact, TSM/DB2 will try to change it to unlimited on startup.
Two other legacy features are not required now: database audits and indexed tables.
The database uses DB2 relational consistency rules to prevent incorrect data from entering, and is self auditing.
The database will also run automatic 'runstats' from time to time. This is a DB2 feature that optimises access
paths through the database to improve performance.
The database also uses relational indices, so it does not require special index tables to speed up SQL queries.
Recovery log sizing and tuning
TSM 6.1 has three recovery logs.
The Active log contains updates that have not been committed to disk yet and is used for roll-forward or roll-back
in case of problems. Once a transaction is committed, the data is moved to the archive log. The default size for
the Active log is 2GB and the size can be increased in increments of 512MB right up to 128GB.
The Archive log contains committed transaction data and is used for PIT recovery of the database. The Archive log
is cleared out by a full database backup. However, it retains all data updates applied right back to the second last
backup, so you need to size your archive log with that in mind.
The Failover Archive log
TSM collectively calls these three logs the 'recovery log', but a DB2 DBA would just call them 'transaction logs'.
The log files form part of the TSM database, and unlike the legacy TSM database there is no need to create and
format log volumes. The logmode is equivalent to legacy roll-forward. In DB2 terms, these are archive logs, not
circular logs. This means that the log files can fill up, so log file management is still required. You can specify a
failover log for the Archive log to help prevent this, but the Active log cannot fail over and its size is fixed between
2GB and 128GB, so don't allocate all the space that you have available to the Active log; keep some in reserve for
emergencies.
It is highly recommended that FailoverArchiveLog space be set aside for possible emergency use. You can use
slower disks for FailoverArchiveLog space.
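The Active log limits described above (2GB minimum, 128GB maximum, 512MB increments) are easy to get wrong when you are resizing under pressure. The following is a minimal Python sketch for checking a proposed size before you edit dsmserv.opt. The function names are made up for illustration, and the sketch assumes the size is expressed in megabytes and that valid sizes are the 2GB default plus whole 512MB steps.

```python
# Sketch only: limits taken from the text above, names are hypothetical.
MIN_MB = 2 * 1024      # 2GB default and lower limit
MAX_MB = 128 * 1024    # 128GB upper limit
STEP_MB = 512          # growth increment


def valid_active_log_size(size_mb: int) -> bool:
    """Return True if size_mb lies in range and on a 512MB step."""
    return MIN_MB <= size_mb <= MAX_MB and (size_mb - MIN_MB) % STEP_MB == 0


def next_valid_size(required_mb: int) -> int:
    """Round a required size up to the next valid Active log size."""
    rounded = -(-required_mb // STEP_MB) * STEP_MB  # ceiling to a 512MB step
    size = max(MIN_MB, rounded)
    if size > MAX_MB:
        raise ValueError("required size exceeds the 128GB Active log limit")
    return size


if __name__ == "__main__":
    print(valid_active_log_size(2300), next_valid_size(2300))
```

Rounding up rather than down is deliberate here, since an undersized Active log is the failure you are trying to avoid.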
If the Active log fills up and the server stops, the process to get your TSM server up again is:
DSMSERV DISPLAY LOG - to check the current log status
Update the Active log size parameter in dsmserv.opt
Start the server up
Best practice for Database and Storage Pool disks
The following are some of the 'Best Practices' recommendations from IBM for setting up DB disk volumes for
TSM Servers
Use fast, low latency disks for the Database, and use SSD if you can afford it. Avoid the slower internal disks included
by default in most AIX servers, and avoid consumer grade PATA/SATA disks. Use faster disks for the Active Logs
too. Do not mix active logs with disks containing the DB, archive logs, or system files such as page or swap
space. Slower disks can be used for archive logs and failover archive logs, if needed.
Use multiple database containers. For an average size DB, it is recommended to use at least 4 containers
initially for the DB. Larger TSM servers, or TSM servers planning on using data deduplication, should have up to 8
containers or more. You should plan for growth with additional containers up front, as adding containers later can
result in an imbalance of IO and create hot spots.
Place each database container in a different filesystem. This improves performance, as DB2 will stripe the database
data across the various containers. Tivoli Storage Manager supports up to 128 containers for the DB. Ideally,
place each container on a different LUN, though this is not so important for high end disks like XIV or VMAX.
There should be a ratio of one database directory, array, or LUN for each inventory expiration process.
The block size for the DB varies depending on the tablespace; most are 16K, but a few are 32K. Segment/strip
sizes on disk subsystems should be 64K or 128K.
If you use RAID, then define all your LUNs with the same size and type. Don't mix 4+1 RAID5 and 4+2 RAID6
together. RAID10 will outperform RAID for heavy write workloads, but costs twice as much. RAID1 is good for
active logs.
Smaller capacity disks are better than larger ones if they have the same rotational speed. Have containers on
disks that have the same capacity and IO characteristics. Don't mix 10K and 15K drives for the DB containers.
Cache subsystem "readahead" is good to use for the active logs; it helps in archiving them faster. Disk
subsystems detect readahead on a LUN by LUN basis. If you have multiple reads going against a LUN, then this
detection fails. So several smaller LUNs are better than a few large ones, but too many LUNs can be harder to
manage.
However, it is very difficult to give generic rules about disk configuration, as this very much depends on what type
of disks you are using.
High end disk subsystems such as the EMC DMX, the HDS VSP and the IBM DS8000 have very large front end
cache to speed up performance, and stripe data in a way that makes it difficult to separate data by physical
spindle. The IBM XIV takes this virtualisation to a higher level again. To get the best performance from these
devices you want enough LUNs to spread the IO and get readahead cache benefit, but not so many that they
become difficult to manage. For the XIV, consider using a queue depth of 64 per HBA to get best advantage of the
parallelism capabilities.
Don't stripe your data using logical volumes; let the hardware do the striping. As a rule of thumb, consider using
50GB volumes for DISK pools and 25GB volumes for file pools. Define the same number of volumes per LUN as
the number of data disks in the RAID type that makes up the LUN, so for example with 4+1 RAID5, define 4*50GB
volumes per LUN, then each LUN will use 250GB, with effective capacity of 200GB.
The Unix Tips section contains some detail on how to use VMSTAT and IOSTAT commands to investigate
potential disk bottlenecks.
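The volumes-per-LUN rule of thumb above can be worked through with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and it assumes that every disk in the RAID group, data and parity alike, contributes one volume's worth of raw capacity, which is how the 4+1 RAID5 example arrives at 250GB raw for 200GB usable.

```python
def lun_layout(data_disks: int, parity_disks: int, volume_gb: int = 50) -> dict:
    """Apply the rule of thumb: one storage pool volume per data disk."""
    volumes = data_disks                       # e.g. 4 volumes for 4+1 RAID5
    usable_gb = volumes * volume_gb            # capacity available to TSM
    raw_gb = (data_disks + parity_disks) * volume_gb  # capacity consumed on disk
    return {"volumes": volumes, "usable_gb": usable_gb, "raw_gb": raw_gb}


if __name__ == "__main__":
    print(lun_layout(4, 1))        # 4+1 RAID5 with 50GB DISK pool volumes
    print(lun_layout(4, 2, 25))    # 4+2 RAID6 with 25GB file pool volumes
```

The default of 50GB matches the DISK pool suggestion; pass 25 for file pools.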
Using DB2 commands on a TSM server
IBM's design model for TSM v6 and upwards is to store TSM metadata in a DB2 database, without the TSM
administrators needing to know anything about DB2 and how to manage it. That design model holds well, but
there are a few circumstances where a bit of DB2 comes in useful. On Windows, you start a DB2 command line
from Start -> All Programs -> IBM DB2 -> Command Line Tools.
AUTHORISING A USERID TO BE ABLE TO START THE TSM SERVER
The TSM DB2 system is 'owned' by the userid that installed it, and normally only that userid has the
administration authority needed to manage the DB2 database, including the ability to start the TSM service.
However, you can give access to another userid using DB2 commands. Open up a command line as the TSM
instance owner by right clicking on it and taking the 'run as' option. You will need the instance owner userid and
password to do this. Once you have the command line, type the following commands
db2==> connect to tsmdb1
db2==> grant dbadm on database to user TSM_ADMIN
Userid TSM_ADMIN can now be used to stop and start the TSM services
Recovering from a full archive log
Under TSM 6.x, the archive and active log directories can fill up, and if they do, the server will shut down. To
prevent this, you need to make sure you trigger a FULL database backup once the archive log hits a threshold,
but if the worst happens and the log files do fill up, you need a recovery process.
If this happens then you cannot use TSM commands to move the logs into bigger directories, as you cannot start
TSM. What you need to do is create temporary logs elsewhere, then prune the archive log using native DB2
commands. However, remember that the archive log will hold enough information to wind back through the last 2
full backups, so you need to run 2 full backups to clear it down.
1. Create a temporary directory large enough to hold your active logs. The dsmserv.opt file may contain the log sizes in
the ACTIVELOGSIZE parameter, and if not, it will point to the physical log location.
2. Open a DB2 command line and run the commands below to switch the logs to a new location
set db2instance=SERVER1
db2start
db2 update db cfg for tsmdb1 using newlogpath path\to\new\logs
db2stop
db2start
3. 'Activate' the database to copy the log files with the following command. This command does not affect the original logs.
This may take a while, and success will be indicated when you see a command prompt again.
db2 activate db tsmdb1
4. Now you need to back the database up to clear the logs out, and you need to do this on disk, so identify or create a
directory with enough space to take a database backup, then run the following DB2 commands.
db2stop
db2start
db2 backup db tsmdb1 to path\to\database\backup\directory
The archive logs will start pruning once you see the Backup Successful message, but this could take a while to
appear if your database is large. Make a note of the backup timestamp, which will look something like: The
timestamp for this backup image is: 20120412130821
5. Find some more space, and run another full DB2 database backup with the command.
db2 backup db tsmdb1 to path\to\another\database\backup\directory
When this second backup completes, the archive log directory and original active log directory are empty of log files.
Make a note of the backup timestamp again, let us call this one 20120412150425
6. Now you need to delete the first backup using these commands. Note how you use the timestamp from step 4.
db2stop
db2start
db2 connect to tsmdb1
db2 PRUNE HISTORY 20120412130821 WITH FORCE OPTION AND DELETE
7. Point DB2 back to the original, empty active log in the original location. You will get this from the
ACTIVELOGDIRECTORY parameter in dsmserv.opt
db2 UPDATE DATABASE CONFIG FOR TSMDB1 USING NEWLOGPATH path\to\activelogdir
8. Connect to the database again, and that will automatically start moving the active logs from the temporary location to
the original active log location. Again, this can take a while if the logs are big.
db2 force application all
db2stop
db2start
db2 connect to tsmdb1
9. Now you need to start the TSM server up and run a good backup. You need to start the server in the foreground to do
this, so open a normal Windows command line, navigate to the server directory and run dsmserv. If you have more
than one TSM server on this machine, you may need to use the -k option to get the right server. This will bring you up
a TSM server command line. Disable your client sessions, then take 2 full database backups. You need to know your
backup device classes to be able to do this.
Disable sessions
Backup db type=full dev=your_db_devclass
10. Delete the second DB2 database backup as follows, using the database timestamp that you recorded in step 5.
db2 PRUNE HISTORY 20120412150425 WITH FORCE OPTION AND DELETE
11. Now you can halt your server in the foreground, and start it normally. Remember to enable sessions.
It is possible to query what is happening while a database recovery is in progress with the db2pd utility, a DB2
diagnostic tool that is provided with the TSM server installation code. You simply run this as a command from the
shell prompt, like this:
tsm:~ # db2pd
db2pd> You are running db2pd in interactive mode.
db2pd> If you want command line mode, rerun db2pd with valid options.
db2pd> Type -h or -help for help.
db2pd> Type q to quit.
To check out what is happening with a database recovery, run
db2pd> -recovery -db tsmdb1
STARTING AND STOPPING AUTO RUNSTATS
Runstats is used to optimise access paths through the TSM tables and should normally be set to run
automatically as required. However, if runstats starts automatically when TSM is started up after a database
upgrade, it can cause performance problems to the extent that no-one can log into the system.
To temporarily suspend auto runstats, before halting the TSM server for an upgrade, submit the following
commands to the DB2 instance that is associated with the TSM server:
db2 connect to tsmdb1
db2 update db cfg for TSMDB1 using AUTO_RUNSTATS OFF
Now runstats will not start automatically when you restart the TSM server. However, you need runstats to keep your
database optimised, so once you are happy that your TSM server is up and running, submit the following
commands to the DB2 instance for your TSM server and Runstats will resume normal processing.
db2 connect to tsmdb1
db2 update db cfg for TSMDB1 using AUTO_RUNSTATS ON
SOME OTHER POTENTIALLY USEFUL COMMANDS
You can enter any DB2 command from the DB2 command line, including SQL queries and commands that
update or delete the database, so be careful. Some of the query commands could be useful for investigating TSM
problems
get instance - returns the name of the TSM server
list active databases - will show the TSM database name as known to DB2, and the path to it.
get dbm config - shows the settings for the database configuration manager
db2start and db2stop - obvious what these are! They should never be necessary as DB2 should be started
automatically as part of the server startup, but if necessary, it can be done manually
The following commands require you to be logged in with administrator authority and connected to a
database
get db cfg show detail - shows the configuration parameters for the database
list tables - for the connected database
describe table table_name - lists the columns for the specified table
Investigating Problems with the Server Instance
The first place to start is the TSM Active log, but if you need to go deeper, then the DB2 logs can be useful.
However, finding those logs can be a challenge, as the location can depend on the OS platform or even the OS
release level. The best way to be sure you have the correct log is to check the DIAGPATH variable in DB2.
Start up a DB2 command line. In Windows go to Start->Programs->IBM DB2->Command Line tools->Command
Window, and in UNIX, su - db2inst1 (db2inst1 is the default instance; if you change the instance name or have
multiple instances, you need to su to the correct userid for your instance). You then type 'db2' to open the DB2
command line.
From the db2 command line type db2=> get dbm cfg. The command produces a lot of output; look for the line like
"Diagnostic data directory path (DIAGPATH) = /home/E1WT1/e1wt1/sqllib/db2dump" and this shows the path
to the log files. If the DIAGPATH is blank, look for the default PATH directory instead.
'quit' gets you out of that DB2 command line.
The db2diag.log contains information like database backups, table reorganizations, memory management
messages, start and stop of the TSM server and hardware information logged at instance start time, as well as error
and warning messages.
Sometimes when investigating TSM server problems, the DB2 terminology does not quite match TSM, so the
error messages in the DB2 logs can look a bit strange. For example, Tivoli Storage Manager refers to
transactions, which DB2 calls units of work (UOW). Tivoli Storage Manager uses select
statements, where DB2 uses SQL statements, which are also sometimes referred to as DML, or data manipulation
language statements.
Another potentially useful file is the startup trace log dsmupgdx.trc, which is located in c:\program files\tivoli\tsm
for Windows or /opt/tivoli/tsm/server for UNIX and Linux servers. If you get database startup problems it's always
worth checking the file to see if any useful error messages exist.
ANR0170E on Database Startup
When trying to start up TSM the following error message can appear: "ANR0170E - Error detected, database
restart required", and you may see errors in the actlog a bit like
ANR0171I dbieval.c(874): Error detected on 3:2, database in evaluation mode.
ANR0170E dbieval.c(935): Error detected on 3:2, database restart required.
ANR0162W Supplemental database diagnostic information: -1:58031:-1034
([IBM][CLI Driver] SQL1034C The database is damaged. The application has been disconnected from the
database. All applications processing the database have been stopped.
The resolution is to restart DB2 manually with the RESTART command. Open up a DB2 command line window
as explained above, then issue the following
set db2instance=db2inst1 (this is the default instance)
db2 force application all
db2stop
db2 restart database db2inst1
You might have to run the restart command a few times before the issue is resolved. If this does not fix the
problem you probably need to contact IBM Support, although you can use the db2dart command to run a
database analysis. This generates a report file that would be useful for IBM support.
db2 force application all
db2stop
db2dart db2inst1 /db
AIX Maximum Number of Processes
You may see a database backup failing on an AIX server with an error like 'ANR2968E Database backup
terminated. DB2 sqlcode: -2033. DB2 sqlerrmc: 292'. If this error is not corrected the recovery log will fill up and
crash the server. You may also see a message like 'Insufficient AIX system resource' in the db2diag log file.
The API error code 292 means that the API was unable to 'fork' or create a process to do its database backup. AIX
has a parameter called maxuproc which limits the maximum number of processes that a single user is allowed
to run, and this value should be increased.
To see what value is set, use the command
lsattr -l sys0 -E | grep maxuproc
and to change the value use the command below, selecting a value that is suitable for your server.
chdev -l sys0 -a maxuproc=2048
Effect of Deduplication on Database size
Deduplication will save a lot of backend storage, but it does this at the expense of increasing the size of the TSM
database, because the TSM database has to store and track the metadata that is required to manage the
deduplication. The exact amount of extra space required is difficult to calculate up front, as it depends on your
average 'deduplication chunk size' and this will vary depending on how well your data deduplicates. IBM suggests
a typical chunk size of 100,000 bytes, and provides some scripts that you can run to measure your exact average
chunk size once you have deduplication working. Each chunk needs 490 bytes of metadata to describe the data
in the primary pool, and another 190 bytes for the data in each copy pool.
A starting point is to estimate your database size without deduplication, and to do this you use the formula
db_size = file_count * number_of_backup_copies * 200
To give you an idea of how many backup files exist, you can find the number of backup files that you are holding
with the following SQL query on an existing server
select sum(cast(num_files as bigint)) from occupancy -
where node_name is not null -
and filespace_id is not null
To calculate the deduplication overhead, use the formula below to get the number of chunks
chunk_count = total_backedup_data_in_GB * 10,000 * 2
The doubling factor at the end of the formula is to cater for 'base deduplication chunks' that is, chunks that must
remain even after a file is expired and deleted from TSM. The extra database overhead is then
chunk_count * (490 + 190 * extra_backup_copies)
Running this formula on an existing server with a 135GB database predicted an increase of 105GB with
deduplication, which is not a trivial amount.
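The sizing formulas above can be combined into a short Python sketch so that you can try different data volumes and copy pool counts. The function and constant names are invented for illustration. The sketch assumes the per-chunk figures (490 and 190) are bytes, as stated above, and that the base formula's factor of 200 is also in bytes, which the text does not say explicitly.

```python
# Sketch only: constants come from the formulas in the text, names are hypothetical.
CHUNKS_PER_GB = 10_000        # 1GB divided by a typical 100,000 byte chunk
BASE_CHUNK_FACTOR = 2         # doubling for 'base deduplication chunks'
PRIMARY_CHUNK_BYTES = 490     # metadata per chunk in the primary pool
COPY_CHUNK_BYTES = 190        # metadata per chunk in each copy pool


def base_db_size(file_count: int, backup_copies: int) -> int:
    """db_size = file_count * number_of_backup_copies * 200 (assumed bytes)."""
    return file_count * backup_copies * 200


def chunk_count(total_backed_up_gb: float) -> int:
    """chunk_count = total_backedup_data_in_GB * 10,000 * 2."""
    return int(total_backed_up_gb * CHUNKS_PER_GB * BASE_CHUNK_FACTOR)


def dedup_overhead_bytes(total_backed_up_gb: float, copy_pools: int) -> int:
    """Extra database space needed to track deduplication chunks."""
    per_chunk = PRIMARY_CHUNK_BYTES + COPY_CHUNK_BYTES * copy_pools
    return chunk_count(total_backed_up_gb) * per_chunk


if __name__ == "__main__":
    # 1000GB of backed up data with one copy pool
    print(dedup_overhead_bytes(1000, 1) / 1e9, "GB of extra database space")
```

If you have measured your real average chunk size with the IBM scripts, replace CHUNKS_PER_GB with 1GB divided by that figure.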
TSM Database version 5.5 and earlier
TSM 5.5 goes out of support at the end of April 2014
Recovery log processing
Database Defragmentation
Extending the TSM database under AIX
Formatting the TSM database and log
Auditing the TSM database
Database size and disk setup
Recovering a TSM 5.x Database on a Windows server.
Database and log Mirroring
Disk Storage Pools
Recovery log processing
The TSM database is quite sophisticated, and uses a transaction log, called the recovery log. Multiple updates
are grouped together into 'transactions', which have to be applied to the database as a unit. If the server crashes,
say, and the updates in a transaction have not been applied, then the partial updates must be backed out using
the log records. This all or nothing approach protects database integrity during updates.
If the server cannot update the recovery log, because it is full, then the server crashes. So it's worth knowing what
makes the log fill up, and how to avoid it.
The log has two pointers, a 'head' pointer and a 'tail'. The head marks the position where the next update can take
place; new updates are added at the head. The tail marks the position where the oldest transaction is still
processing, and also where the last update can take place. Tail movement depends on how the 'logmode' is set
up. If you define logmode=rollforward, then the tail will only move when a database backup is run. If you use
logmode=normal, then the tail moves when the oldest transaction completes. When the pointers reach the end of
the file, they start again from the beginning. Consider the logfile as being a circle, with the head and tail pointers
being points on the circumference. The command Q STATUS will tell you which logmode you are using.
The tail is then 'pinned' by the oldest in-flight transaction, and if this is not cleared before the head catches up,
then the file is full. Tivoli provided a new command with TSM 4.2.2.1, 'show logpinned', which will identify the
transaction which is pinning the log.
The log file usually fills up due to a combination of two events. An old transaction hangs around and 'pins' the tail,
while another process is causing the head to move rapidly, so it catches up.
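The head and tail behaviour described above can be illustrated with a toy Python model. This is not how the TSM server is implemented; it is a sketch of the logmode=normal case only, with invented class and method names, where the tail is pinned at the first record of the oldest transaction that has not yet committed.

```python
class CircularLog:
    """Toy model of a circular recovery log in logmode=normal."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.head = 0          # count of records written so far
        self.open_txns = {}    # transaction id -> position of its first record

    @property
    def tail(self) -> int:
        # The oldest in-flight transaction pins the tail; with none in
        # flight the tail catches up with the head.
        return min(self.open_txns.values(), default=self.head)

    def used(self) -> int:
        return self.head - self.tail

    def write(self, txn: str) -> None:
        if self.used() >= self.capacity:
            raise RuntimeError("recovery log full: head has caught up with tail")
        self.open_txns.setdefault(txn, self.head)
        self.head += 1

    def commit(self, txn: str) -> None:
        self.open_txns.pop(txn, None)


if __name__ == "__main__":
    log = CircularLog(capacity=10)
    log.write("slow backup")            # long running transaction pins the tail
    for i in range(9):                  # rapid updates, e.g. expire inventory
        log.write(f"expire-{i}")
        log.commit(f"expire-{i}")
    print("used", log.used(), "of", log.capacity)
    log.commit("slow backup")           # tail is released
    print("used", log.used(), "of", log.capacity)
```

Even though every fast transaction commits immediately, none of its log space is freed while the slow transaction is still open, which is the combination of events described above.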
Long running transactions can be caused by very large database backups, or smaller backups running over slow
lines. A process which is trying to recover from a tape I/O error can also hang around for a l ong time.
Rapid head movement is caused by something which is doing large quantities of database updates, very fast.
Expire Inventory is a good example of this. There are ways to manage this
Don't schedule inventory expiration when large backups are running
Make the log almost as large as possible, which is about 13GB at the moment. But leave a bit of free space so you can
extend the log if the server crashes.
Consider clearing out your log before the backups start, by temporarily reducing the dbbackup trigger. UPDATE DBB
LOGF=20 should force a backup. However, remember that if you are running with logmode=rollforward, and the tail is
pinned, then the database backup will not clear out the log.
Consider running with a smaller value of dbbackuptrigger during the backup run, to help prevent the log from filling.
However, this can cause lots of backups to be triggered, so use with caution.
Monitor the log utilisation, and alert support staff if the log exceeds, say, 80%. The support staff then need to look for
a process which is holding the tail, and cancel it, or look for a process which is rapidly filling up the log and cancel
that. Or, to be on the safe side, cancel them both.
TXNGroupmax (maximum number of files sent to the server in a single transaction) and TXNBytelimit (total number
of bytes in a single transaction) are usually set high to speed up backup performance. If you are getting problems with
your log filling up, consider reducing these to force more frequent commit points.
Recovery log processing was enhanced in TSM 4.2. If the DB Backup Trigger is set correctly, and the LOGMODE
is ROLLFORWARD, then a database backup will start when the log reaches 100% full. If the Recovery log hits
100%, then TSM will stop all processes except the database backup. When the backup completes, TSM issues
the message
ANR2104I Activity log process restarted - recovered from an insufficient space condition in the Log or Database.
This should help us avoid some difficult recovery situations.
Database Defragmentation
This contentious issue applies to legacy databases only. The legacy TSM Server database has a b-tree format,
and grows sequentially from the beginning to end. When file entries are deleted by expiration processing or file
space/volume delete processing, this leaves spaces or holes in that database. These may be re-used later
when new information is added, but they mean that the TSM database is using more space than it needs to. The
only way you can compress the database so that the 'holes' left by deleted pages are not present is to use the
database unload/reload utility.
The problem is that while the dump takes about an hour, the load utility can take several hours. Does it make a
difference? I have seen performance improve after defragmenting a database, and I've also seen an unload/reload
make performance worse. A defrag will reduce the physical size of your database.
The Tivoli team supplied a new command with TSM 5.3 so that you can check what an unload/reload would
achieve, called 'ESTIMATE DBREORGSTATS'. This will estimate the amount of space that would be recovered by
an unload/reload.
For older releases of TSM use the QUERY DB command to see if you need to defrag your TSM DB.
Available  Assigned  Maximum    Maximum    Page     Total       Used       Pct   Max.
Space      Capacity  Extension  Reduction  Size     Usable      Pages      Util  Pct
(MB)       (MB)      (MB)       (MB)       (bytes)  Pages                        Util
---------  --------  ---------  ---------  -------  ----------  ---------  ----  ----
50,208     49,004    1,204      9,412      4,096    12,545,024  9,544,454  76.1  76.1
Here a 49GB database can be reduced by 9.4GB = 19%, but it is only 76% used, so 24% is free and the remaining 5%
could be reclaimed by defragging. Some people claim that TSM tries to allocate pages in a way that leaves you with
as good as possible performance, and defragging the database will degrade performance. It's also possible that after a
defrag, the database will quickly become fragmented again, as it inserts data into the tree. The following
formula can be used to see how much space could be reclaimed by an unload/reload.
SELECT CAST((100 - (CAST(MAX_REDUCTION_MB AS FLOAT) * 256 ) /
(CAST(USABLE_PAGES AS FLOAT) - CAST(USED_PAGES AS FLOAT) ) * 100) AS
DECIMAL(4,2)) AS PERCENT_FRAG FROM DB
A high PERCENT_FRAG value can indicate problems. If you think your database needs a defrag, then if possible,
take a copy and try that first. That will give you an indication of how much time is needed for the load.
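To see what the PERCENT_FRAG query is actually computing, here is the same arithmetic as a small Python sketch, fed with the figures from the QUERY DB output above. The function name is made up for illustration. The factor of 256 is the number of 4,096 byte pages in 1MB, so the formula compares the pages that a simple reduction could release against all the free pages in the database.

```python
def percent_frag(max_reduction_mb: float, usable_pages: int, used_pages: int) -> float:
    """Percentage of free pages that a simple reduction could NOT release."""
    free_pages = usable_pages - used_pages
    if free_pages <= 0:
        raise ValueError("no free pages, so fragmentation cannot be estimated")
    reducible_pages = max_reduction_mb * 256   # 256 pages of 4,096 bytes per MB
    return round(100 - reducible_pages / free_pages * 100, 2)


if __name__ == "__main__":
    # Maximum Reduction 9,412MB, Total Usable Pages 12,545,024, Used Pages 9,544,454
    print(percent_frag(9412, 12_545_024, 9_544_454))
```

For the sample output this comes to roughly 19.7, meaning that about a fifth of the free pages are scattered through the database rather than sitting at the end where they can be released. That is the same space as the 5% of the whole database worked out above.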
Extending the TSM database under AIX
Create a new file system in AIX using SMITTY
make LV
make FS on existing LV
mount new-filesystem
THEN in TSM
dsmadmc ... define dbv /new-filesystem/filename
dsmadmc ... extend db
If you use incremental database backups, then remember that after an EXTEND DB the next DB backup must be
a full backup.
Formatting the TSM database and log
Legacy TSM database files and log files have to be formatted before they can be used. There are two different
commands for this, and it is vitally important that you know the difference. If you want to add a file to the database
or recovery log, then you use the DSMFMT command to format the file. The DSMSERV FORMAT command looks similar, but
that command will format the whole recovery log and database. So just to make things clear, DSMSERV FORMAT
will wipe all your existing database and log files, so if you want to make a complete fresh start, that's what you
use. DSMFMT will just format the file that you specify. The syntax of DSMFMT is
dsmfmt -m -log tsmlogvol7 5
This formats a 5 MB log volume called tsmlogvol7. Size options are 'k', 'm' or 'g' and data type options are 'db', 'log' or 'data'.
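Because DSMFMT and DSMSERV FORMAT are easy to confuse, it can help to build the DSMFMT command line from checked values before running it. The helper below is a hypothetical sketch, not part of TSM; it only knows the size and data type options listed above and assumes the argument order shown in the example command.

```python
SIZE_UNITS = {"k", "m", "g"}        # size options listed above
DATA_TYPES = {"db", "log", "data"}  # data type options listed above


def dsmfmt_command(unit: str, data_type: str, file_name: str, size: int) -> str:
    """Return a DSMFMT command string after validating the options."""
    if unit not in SIZE_UNITS:
        raise ValueError(f"unknown size unit: {unit!r}")
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    if size <= 0:
        raise ValueError("size must be a positive number")
    return f"dsmfmt -{unit} -{data_type} {file_name} {size}"


# Rebuilds the example command from the text.
command = dsmfmt_command("m", "log", "tsmlogvol7", 5)
print(command)
```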
Auditing the TSM database
The Audit process only applies to legacy TSM databases.
Richard Sims has correctly pointed out that a database audit with FIX=YES is a dangerous procedure. "Correcting database problems without TSM Support direction can result in worse problems, including data loss. Structural problems and inconsistencies in any database system can be much more complex than a vanilla utility can properly deal with. If one has a reason to believe that their TSM database has problems, they need to contact TSM Support for assistance in dealing with them, rather than attempt amateur surgery. IBM repeatedly advises customers NOT to attempt to fix database problems themselves".
I'd also suggest that if you run an audit, you always make sure you have a full database backup available first.
Database audits are used to fix inconsistency problems between the database and its storage components. A full database audit can run for several hours, but it is possible to run smaller audits on parts of the database. As a general rule of thumb, a full database audit takes about 3 hours per million pages, and a 4 GB utilised database holds about a million pages. The actual times will mostly depend on the processing power of your server. An audit will write a lot of log records, so if you normally run with your recovery log in 'ROLL FORWARD' mode it is advisable to put the log into 'NORMAL' mode before running an audit, then put it back into 'ROLL FORWARD' mode when the audit completes.
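The rule of thumb above can be turned into a quick estimate. This is a sketch under stated assumptions: the function names are illustrative, the page size is assumed to be 4,096 bytes, and the 3 hours per million pages figure is only the general guide quoted above, so real run times will vary with server power.

```python
PAGE_SIZE_BYTES = 4096  # assumption: legacy TSM database page size


def pages_from_used_mb(used_mb: int) -> int:
    """Convert utilised database space in MB to a page count."""
    return used_mb * 1024 * 1024 // PAGE_SIZE_BYTES


def full_audit_hours(used_pages: int, hours_per_million_pages: float = 3.0) -> float:
    """Apply the rule of thumb of about 3 hours per million pages."""
    return used_pages / 1_000_000 * hours_per_million_pages


# A 4 GB utilised database holds about a million pages.
print(pages_from_used_mb(4096))
# A database with 9,544,454 used pages would need more than a day.
print(round(full_audit_hours(9_544_454), 1))
```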
/dsmserv auditdb fix=yes admin detail=yes
Is a very quick check of the admin data
/dsmserv auditdb fix=yes archstorage detail=yes
will audit the archive storage, and runs for 1-2 hours depending on your database size
/dsmserv auditdb fix=yes diskstorage detail=yes
will audit the disk storage pools, and takes about 30 mins, depending on the size of the database and how full the disk pools are. Best done when all the data is migrated out to tape.
/dsmserv auditdb fix=yes inventory detail=yes
This is the long running one, 8-12 hours.
The following information was supplied by Maureen O'Connor of Fiserv Technology in April 2007. Maureen has provided some excellent detail on how to estimate how long an audit will take, and how to run audits against multiple TSM servers on one AIX server.
Running an audit of the TSM database can be a very long and time-consuming process, and it is not well documented by IBM, so estimations can be difficult to make.
Generally speaking, the best way to run the audit is to run it against the whole database, not just a piece of it, but if the db is very large, this can mean an extensive outage, so it should be planned well in advance.
The audit follows 33 steps:
1. ANR4726I The ICC support module has been loaded.
2. ANR0990I Server restart-recovery in progress.
3. ANR0200I Recovery log assigned capacity is 1000 megabytes.
4. ANR0201I Database assigned capacity is 2500 megabytes.
5. ANR0306I Recovery log volume mount in progress.
6. ANR0353I Recovery log analysis pass in progress.
7. ANR0354I Recovery log redo pass in progress.
8. ANR0355I Recovery log undo pass in progress.
9. ANR0352I Transaction recovery complete.
10. ANR4140I AUDITDB: Database audit process started.
11. ANR4075I AUDITDB: Auditing policy definitions.
12. ANR4040I AUDITDB: Auditing client node and administrator definitions.
13. ANR4135I AUDIT DB: Auditing central scheduler definitions.
14. ANR3470I AUDIT DB: Auditing enterprise configuration definitions.
15. ANR2833I AUDIT DB: Auditing license definitions.
16. ANR4136I AUDITDB: Auditing server inventory.
17. ANR4138I AUDITDB: Auditing inventory backup objects.
18. ANR4137I AUDITDB: Auditing inventory file spaces.
19. ANR2761I AUDIT DB: auditing inventory virtual file space mappings.
20. ANR4307I AUDITDB: Auditing inventory external space-managed objects.
21. ANR4310I AUDIT DB: Auditing inventory space-managed objects.
22. ANR4139I AUDIT DB: Auditing inventory archive objects.
23. ANR4230I AUDIT DB: Auditing data storage definitions.
24. ANR4264I AUDIT DB: Auditing file information.
25. ANR4265I AUDIT DB: Auditing disk file information.
26. ANR4266I AUDIT DB: Auditing sequential file information.
27. ANR4256I AUDITDB: Auditing data storage definitions for disk volumes.
28. ANR4263I AUDITDB: Auditing data storage definitions for sequential volumes.
29. ANR6646I AUDITDB: Auditing disaster recovery manager definitions.
30. ANR4210I AUDITDB: Auditing physical volume repository definitions.
31. ANR4446I AUDITDB: Auditing address definitions.
32. ANR4141I AUDITDB: Database audit process completed.
33. ANR4134I AUDITDB: Processed 187 entries in database tables and 255998 blocks in bit vectors. Elapsed time is
0:00:10.
Each step is called based on the architecture; the DSMSERV utility runs several concurrently, 5-10 at a time, returning output as each step completes and picking up the next step in order. Steps 1-9 will finish almost immediately. Steps 10-16 will run next and take slightly longer; these follow definitions in order of creation. When Step 17 begins, it will trigger Step 33, and depending on how many entries there are in the database, the output from Step 33 will appear mixed with the output from Steps 18-32. Step 33 reviews all client data in the database and is the longest running step in the audit process.
Typical output from Step 33 (from a large database) will look like this:
ANR4134I AUDITDB: Processed 8260728 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:05:00.
ANR4134I AUDITDB: Processed 9035641 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:10:00.
ANR4134I AUDITDB: Processed 9812999 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:15:00.
ANR4134I AUDITDB: Processed 10663992 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:20:00.
ANR4134I AUDITDB: Processed 11677212 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:25:00.
ANR4134I AUDITDB: Processed 12014759 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:30:00.
Note that this output refers to 'entries'. Entries are not a standard reference in TSM; they are a parsed view of data files, part of the occupancy. To estimate how many entries the audit will scroll through, run this query on a command line within TSM:
select sum(num_files)*3 from occupancy
The '3' refers to the three pieces of a file: the entry, a header for the entry, and an active/inactive flag. Remember that this is only an estimate; the reason for running the audit is possible corruption, so there may be pieces missing or mis-filed.
Entries are read at anywhere from 500K to 1 million every five minutes, so dividing the output from this formula by that rate gives an estimate of the time the audit will take to complete.
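The estimate described above is simple enough to script. The sketch below assumes you have already run the SELECT and have the raw SUM(NUM_FILES) figure to hand; the function names and the 10 million file count are hypothetical, and the rates are only the 500K to 1 million entries per five minutes quoted above.

```python
from typing import Tuple


def estimate_entries(total_files: int) -> int:
    """Three entries per file: the entry, its header and the active/inactive flag."""
    return total_files * 3


def estimate_audit_hours(entries: int,
                         slow_rate: int = 500_000,
                         fast_rate: int = 1_000_000,
                         interval_minutes: int = 5) -> Tuple[float, float]:
    """Return (best case, worst case) hours for the inventory pass."""
    best = entries / fast_rate * interval_minutes / 60
    worst = entries / slow_rate * interval_minutes / 60
    return best, worst


# Hypothetical server holding 10 million files in the occupancy table.
entries = estimate_entries(10_000_000)
best_hours, worst_hours = estimate_audit_hours(entries)
print(f"{entries} entries, between {best_hours} and {worst_hours} hours")
```

Treat the result as a planning range for the outage rather than a prediction, since a damaged database may hold more or fewer entries than the occupancy figures suggest.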
Audits can be run on pieces of the database instead of the whole - a specific storage pool or the administrative portion - this can be a considerable time-saver, but if it is unknown what part of the database is corrupt, this may not be a worthwhile option.
To run an audit, the TSM server instance must be down. If there are multiple TSM instances on a server, the DSMSERV executable must be in the primary server directory, but if the audit is running on a secondary instance, for example, parameters must be passed to the operating system so the utility will know where to look for the database:
AIX# export DSMSERV_DIR=/usr/tivoli/tsm/server/bin
AIX# export DSMSERV_CONFIG=/usr/tivoli/tsm/server/bin//dsmserv.opt
To run an audit just on the administrative portion of the database (the fastest, 10-15 minutes), start the utility this way:
AIX# at now
dsmserv auditdb fix=yes admin detail=yes > /tmp/tsmauditadmin.log[ctl-D]
The process will run in the background, and a log will be kept; this log can be followed with the tail -f command by multiple users to track the progress.
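If the log is being kept in a file, the ANR4134I progress messages in it can also be used to measure how fast the audit is actually running. The following is a minimal sketch: the function names are illustrative, and the pattern assumes the message wording matches the sample ANR4134I output shown earlier on this page.

```python
import re
from typing import List, Optional, Tuple

# Matches one ANR4134I progress message once line wraps have been flattened.
PROGRESS = re.compile(
    r"ANR4134I AUDITDB: Processed (\d+) entries in database tables and "
    r"\d+ blocks in bit vectors\. Elapsed time is (\d+):(\d\d):(\d\d)\."
)


def progress_points(log_text: str) -> List[Tuple[int, int]]:
    """Return (elapsed seconds, entries processed) for each progress message."""
    flat = " ".join(log_text.split())  # messages wrap across lines in the log
    points = []
    for match in PROGRESS.finditer(flat):
        hours, minutes, seconds = (int(match.group(i)) for i in (2, 3, 4))
        points.append((hours * 3600 + minutes * 60 + seconds, int(match.group(1))))
    return points


def entries_per_five_minutes(points: List[Tuple[int, int]]) -> Optional[float]:
    """Average rate between the first and last progress message, if measurable."""
    if len(points) < 2 or points[-1][0] <= points[0][0]:
        return None
    (t0, e0), (t1, e1) = points[0], points[-1]
    return (e1 - e0) * 300 / (t1 - t0)


sample_log = """ANR4134I AUDITDB: Processed 8260728 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:05:00.
ANR4134I AUDITDB: Processed 9035641 entries in database tables and 0 blocks in
bit vectors. Elapsed time is 1:10:00.
"""
rate = entries_per_five_minutes(progress_points(sample_log))
print(rate)
```

In practice the text would be read from the audit log file, for example the /tmp/tsmauditadmin.log file named above.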
To run the audit on the archived data (1-2 hours, depending on size of archives), enter this:
dsmserv auditdb fix=yes archstorage detail=yes >/tmp/tsmauditarchive.log
To run the audit on the diskpool (very fast if all data is migrated), enter this:
dsmserv auditdb fix=yes diskstorage detail=yes > /tmp/tsmauditdisk.log
To run on the client data only, not including the archives (still the longest running), enter this:
dsmserv auditdb fix=yes inventory detail=yes > /tmp/tsmauditdata.log
Again, while the inventory can be audited separately, doing so is almost a moot point, as it is still the longest running part.
If any data is found to be damaged, location messages as well as the fix (usually a deletion) will output to the
log as follows:
ANR1777I afaudit.c(967: Object 0.62882489 is \WINDOWS\INF\DSUP.PNF for node
(257), filespace \\\c$ (1).
ANR1777I afaudit.c(967: Object 0.62882490 is \WINDOWS\INF\DSUPT.PNF for node
(257), filespace \\\c$ (1).
ANR1777I afaudit.c(967: Object 0.62882491 is \WINDOWS\INF\DVD.PNF for node
(257), filespace \\\c$ (1).
ANR4303E AUDITDB: Inventory references for object(0.62882489) deleted.
ANR4303E AUDITDB: Inventory references for object(0.62882490) deleted.
ANR4303E AUDITDB: Inventory references for object(0.62882491) deleted.
Be sure sufficient outage time is scheduled. Once an audit begins, it is not good practice to halt the process, because the current location of the audit is not truly known - a data file could be open, and halting may actually cause further corruption.
Legacy Database size and disk setup
The TSM database is critical to open systems backup and recovery. It needs to be 100% available as without it, it is impossible to recover files. The 'incremental forever' philosophy behind TSM means that it is impossible to build a list of files needed to recover a server without the TSM database. If the TSM database setup is not designed correctly then the database will perform badly and this will affect your ability to fit backups within the overnight window.
TSM performance is very much dependent on the size of the database. TSM performance suffers if a database becomes too large, but there are no exact rules on how big too large is. The maximum possible size for a TSM database is 530GB. IBM recommends 120 GB as a general rule, with the caveat that 'when expiration, database restores, and other Tivoli Storage Manager admin processes take too long and client restores become too slow, it is too big'. Database backup and Expire Inventory are both CPU intensive processes that can be used to indicate server performance problems in general. The only sensible answer to 'how big should a TSM database be?' is to let your database grow until these processes start to become an issue. Expire Inventory should really run within 12 hours and should be processing 3 million pages an hour or more. Database backups should run in 30 minutes and process 6 million pages per hour or more, but these are just general rules-of-thumb. The actual size will depend on how fast your hosting server is, how good your disks are and what level of service you need to provide.
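The two rules of thumb above can be checked against your own run times with a few lines of arithmetic. This is a sketch only: the function names and the measurements in the example are hypothetical, and the thresholds are simply the figures quoted in the paragraph above.

```python
EXPIRE_MAX_MINUTES = 12 * 60          # Expire Inventory should finish within 12 hours
EXPIRE_MIN_PAGES_PER_HOUR = 3_000_000
DB_BACKUP_MAX_MINUTES = 30            # database backup should finish within 30 minutes
DB_BACKUP_MIN_PAGES_PER_HOUR = 6_000_000


def pages_per_hour(pages_processed: int, elapsed_minutes: float) -> float:
    """Throughput of a server process in database pages per hour."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed time must be positive")
    return pages_processed * 60 / elapsed_minutes


def within_rule_of_thumb(pages_processed: int, elapsed_minutes: float,
                         max_minutes: float, min_rate: float) -> bool:
    """True when the process met both the time limit and the page rate."""
    rate = pages_per_hour(pages_processed, elapsed_minutes)
    return elapsed_minutes <= max_minutes and rate >= min_rate


# Hypothetical measurements taken from the server activity log.
expire_ok = within_rule_of_thumb(40_000_000, 600,
                                 EXPIRE_MAX_MINUTES, EXPIRE_MIN_PAGES_PER_HOUR)
backup_ok = within_rule_of_thumb(2_000_000, 25,
                                 DB_BACKUP_MAX_MINUTES, DB_BACKUP_MIN_PAGES_PER_HOUR)
print(expire_ok, backup_ok)
```

A process that misses either figure is a hint that the database may be getting too large for the server, not proof of it.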
A TSM Database consists of a number of files, called 'disks'. As TSM will schedule one concurrent operation for each database disk it makes sense to allocate a lot of small disks, rather than a few large ones. A disk file size of 2 GB seems to be about right (the maximum possible size for a disk volume is 8 TB). IBM recommends that these database disk files be spread over as many physical disks as possible. This makes sense for low or mid tier disk subsystems, as this means that multiple disk heads can be seeking, reading, and writing simultaneously, but as high tier subsystems perform most of their I/O in cache this is less of an issue.
Most operating systems allow you to stripe files over logical and physical disks, or partitions, and recommend that this be used for large performance critical files. It is very difficult to get any kind of consensus from the TSM user community on the benefits of disk striping. For example, to quote two users:
USERA; 250GB! database on a high tier EMC DMX disk subsystem. Disk striping introduced and database backup reduced by more than half.
USERB; 80GB database striped on a mid-tier IBM FASTT subsystem; striping removed and database converted to RAID5. No impact on database backup times, expire inventory run times or client backup times.
TSM will allocate a default database and log file during an AIX install, usually in the server installation directory /usr/tivoli/tsm/server/bin. These default files should be deleted and re-allocated to your strategic size and location.
Recovering a 5.x database on a Windows Server
The basic steps you need to take to recover a legacy database are:
Prepare the files you need to do a restore
Format the database and logs
Restore the database
Sort out any storage pool issues
FILE PREPARATION
Obviously, you need a good backup of a TSM database, and you need to know the device class that was used for the backup. For illustration, we will assume the latest database backup is on a tape called T012456L and used a devclass called T_LTO3.
You also need a list of the database and log file names and sizes. If you use DRM then the best place to get this from is the latest prepare. If you don't use DRM, you can get this info when your TSM server is running with the commands
query dbvol f=d
query logvol f=d
That's all very well if you have a planned outage, but what if your database crashes and you don't have a prepare? You can still get the info with dsmserv commands; use
dsmserv display dbvolumes
dsmserv display logvolumes
Create two text files, one called DB.VOLS that contains the database file names, paths and sizes and one called LOG.VOLS for the log files. The files should look like this, but use your own file names, paths and sizes. The file sizes are in MB, so this is a 20GB database.
DB.VOLS
"H:\TSMDB\EXT01\DB01.DSM" 5000
"H:\TSMDB\EXT02\DB02.DSM" 5000
"H:\TSMDB\EXT03\DB03.DSM" 5000
"H:\TSMDB\EXT04\DB04.DSM" 5000
LOG.VOLS
"H:\TSMRLOG\RLOG01.DSM" 4096
"H:\TSMRLOG\RLOG02.DSM" 4096
Place these files in your c:\program files\tivoli\tsm\server\ directory
FORMAT THE DATABASE
Navigate to the c:\program files\tivoli\tsm\server\ directory and run the following command. Note that the logs are described first, then the database, and that you need to say how many of each type of file you are formatting, so 2 log volumes and 4 database volumes.
DSMSERV FORMAT 2 FILE:LOG.VOLS 4 FILE:DB.VOLS
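Since the volume counts in the command have to match the contents of the two text files, one option is to generate all three from a single list. The sketch below is illustrative only: the helper names are made up, the paths and sizes are the example values from this page, and it prints the results rather than writing files or running anything.

```python
from typing import List, Tuple

Volume = Tuple[str, int]  # (file path, size in MB)


def render_vols(volumes: List[Volume]) -> str:
    """Render a volume list in the quoted-path-then-size layout shown above."""
    return "".join(f'"{path}" {size_mb}\n' for path, size_mb in volumes)


def format_command(log_vols: List[Volume], db_vols: List[Volume]) -> str:
    """Build the DSMSERV FORMAT command: log volumes first, then database volumes."""
    return (f"DSMSERV FORMAT {len(log_vols)} FILE:LOG.VOLS "
            f"{len(db_vols)} FILE:DB.VOLS")


# Example values copied from the DB.VOLS and LOG.VOLS listings above.
db_vols = [(rf"H:\TSMDB\EXT0{i}\DB0{i}.DSM", 5000) for i in range(1, 5)]
log_vols = [(r"H:\TSMRLOG\RLOG01.DSM", 4096), (r"H:\TSMRLOG\RLOG02.DSM", 4096)]

db_text = render_vols(db_vols)      # contents for DB.VOLS
log_text = render_vols(log_vols)    # contents for LOG.VOLS
command = format_command(log_vols, db_vols)
print(db_text, log_text, command, sep="\n")
```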
When you run a DSMSERV FORMAT on a Windows server, it resets the registry entry for the TSM server, and this must be put back before you attempt the restore. Use REGEDIT and navigate to the correct registry entry for your server. If you just have one TSM server on this Windows box, it will be Server1, otherwise Server2-4 depending on which server you are working with. The Server1 key is HKEY_LOCAL_MACHINE\SOFTWARE\IBM\ADSM\CurrentVersion\Server\Server1 and you need to change the path entry from c:\program files\tivoli\TSM\Server to Server1.
RESTORE THE DATABASE
Navigate to the c:\program files\tivoli\tsm\server\ directory and run the following command. The tape name and devclass are the ones we found before we started the restore; substitute your own names.
DSMSERV RESTORE DB VOLUMENAMES=T012456L DEVCLASS=T_LTO3 commit=yes
CLEAN UP
OK, now you have a copy of your TSM database as it was when it was backed up. Your problem now is that you may have data on your disk storage pools that is not recorded in your database, or your database will think data exists on disk that has been moved off. The database has also lost all record of any tape activity that has happened since the backup, so you need to get these two in step again.
Before you start TSM up, go into the dsmserv.opt file and add the lines
NOMIGRRECL
DISABLESCHED YES
EXPINT 0
These three options will prevent migration, client schedules and expire inventory from running. Now start your server in the foreground and run the command DISABLE SESSIONS to stop clients from contacting the server.
Audit your disk storage pools using AUDIT VOLUME volume_name FIX=YES, and that will hopefully fix any problems, but you may need to delete and redefine your disk volumes, discarding faulty data, to get migration to run clean.
Audit your tape library, and that will let TSM know the current location of all tapes.
Check your latest saved volhist file for any tapes that have been deleted or updated since the backup ran. You will need to audit these tapes too. Once you complete the audits, back out the changes you made to the dsmserv.opt file, halt the TSM server, then start it normally and enable sessions again.
Database and log Mirroring
There are three levels of mirroring: hardware controlled, operating system controlled and TSM controlled. Mirroring protects the database from disk failure, and also subsystem or site failure if the mirroring is between subsystems or sites. Mirroring also offers some protection from system failure as the chance that at least one of the mirror writes was successful is much higher. TSM mirroring can detect if a partial write has occurred, and a mirror volume can then be used to construct valid images of the missing pages. TSM mirroring can complement hardware mirroring. It is best to mirror both the database and the recovery log to optimise availability and recoverability.
If you are using automatic database or logfile expansion with mirroring, then this will place both the primary file and the mirrored file in the same directory, as only one directory path can be specified. This means that the primary file and mirrored file could end up on the same disk, so they will need to be separated.
This sounds obvious, but the mirrors need to be on different disks. It is possible to place them on the same disk, but that would be pretty pointless. It is also possible to mirror three ways as well as two ways. With three-way mirroring you get three copies of the data.
Hardware mirroring (RAID1)
Most disk subsystems support RAID1 mirroring, which is expensive as it needs twice as much disk, and will not detect logical errors in the data. All data is mirrored even if it is corrupt.
Operating System Mirroring
IBM state that disk striping is suitable for large sequential-access files that need high performance. AIX supports RAID0, RAID1 and RAID10. RAID0 is not really a good idea, as if a logical volume was spread over five physical volumes, then the logical volume is five times more likely to be affected by a disk crash. If one disk crashes, the whole file is lost. RAID1 is straight disk mirroring with two stripes and requires twice as much disk. RAID10 combines striping and mirroring, and also uses twice as much disk.
If AIX is mirroring raw logical volumes it is possible for it to overwrite some TSM control information, as they both write to the same user area on a disk. The impact would be that TSM would be unable to vary volumes online.
TSM mirroring
Software mirroring just applies to the legacy database. If TSM is managing the mirror and it detects corrupt data during a write, it will not write the corrupt data to the second copy. TSM can then use the good copy to fix the corrupt mirror. TSM also mirrors at transaction level, and hardware at IO level. Hardware will always mirror every IO, but TSM will only mirror complete transactions. This also protects the mirror from corruption.
Disk Storage Pools
TSM will only perform one concurrent IO operation to every storage pool volume, so it is better to have a number of smaller volumes than a single large volume in your disk storage pools. Also, it is easier to relocate a small volume to a different disk pool if space requirements change. However, every volume will use one TSM processing thread, and the TSM server will crash if too many threads are allocated.
The normal process is to initially write backup data to disk, then move it off to tape. It is possible to copy the data to tape but not delete it from the disk pool, if the disk cache setting is set to 'yes'. The TSM server would then use the disk data for restores and delete it as space is needed for fresh backups. This would speed up recovery, but slow down backups as the TSM server has to do extra work clearing out data, and would also make the TSM database bigger as TSM needs to store two locations for recently backed up data. It is your choice: faster restores at the cost of slower backups and a bigger database.
Cookies Policy: This site does not use cookies to gather personal data. 2012 Lascon Storage
See the Privacy section for details