database administration guide - sap · pdf filedatabase administration guide sap ... netweaver...

Database Administration Guide

SAP® NetWeaver 2004s

Business Intelligence - Administration Tasks: IBM DB2 Universal Database for UNIX and Windows

Document Version 1.00 – November 11, 2005

© Copyright 2004 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation. IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390, OS/400, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli, and Informix are trademarks or registered trademarks of IBM Corporation in the United States and/or other countries. Oracle is a registered trademark of Oracle Corporation. UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group. Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc. HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology. Java is a registered trademark of Sun Microsystems, Inc. JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape. MaxDB is a trademark of MySQL AB, Sweden.

SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP Library document classification: PUBLIC Disclaimer Some components of this product are based on Java™. Any code change in these components may cause unpredictable and severe malfunctions and is therefore expressively prohibited, as is any decompilation of these components. Any Java™ Source Code delivered with this product is only to be used by SAP’s Support Services and may not be modified or altered in any way. Documentation in the SAP Service Marketplace You can find this documentation at the following Internet address: service.sap.com/instguides

SAP AG Neurottstraße 16 69190 Walldorf Germany T +49/18 05/34 34 24 F +49/18 05/34 34 20 www.sap.com

Terms for Included Open Source Software This SAP software contains also the third party open source software products listed below. Please note that for these third party products the following special terms and conditions shall apply.

SAP License Agreement for STLport SAP License Agreement for STLport

between

SAP Aktiengesellschaft Systems, Applications, Products in Data Processing

Neurottstrasse 16 69190 Walldorf

Germany ( hereinafter: SAP )

and

you

( hereinafter: Customer ) 1. Subject Matter of the Agreement

a. SAP grants Customer a non-exclusive, non-transferable, royalty-free license to use the STLport.org C++ library (STLport) and its documentation without fee.

b. By downloading, using, or copying STLport or any portion thereof Customer agrees to abide by the intellectual property laws, and to all of the terms and conditions of this Agreement.

c. The Customer may distribute binaries compiled with STLport (whether original or modified) without any royalties or restrictions.

d. Customer shall maintain the following copyright and permission notices on STLport sources and its documentation unchanged: Copyright 2001 SAP AG

e. The Customer may distribute original or modified STLport sources, provided that: • The conditions indicated in the above permission notice

are met; • The following copyright notices are retained when

present, and conditions provided in accompanying permission notices are met:

Copyright 1994 Hewlett-Packard Company Copyright 1996,97 Silicon Graphics Computer Systems, Inc. Copyright 1997 Moscow Center for SPARC Technology. Copyright 1999,2000 Boris Fomitchev Copyright 2001 SAP AG Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Hewlett-Packard Company makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty. Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Silicon Graphics makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty. Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Moscow Center for SPARC

Technology makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty. Boris Fomitchev makes no representations about the suitability of this software for any purpose. This material is provided "as is", with absolutely no warranty expressed or implied. Any use is at your own risk. Permission to use or copy this software for any purpose is hereby granted without fee, provided the above notices are retained on all copies. Permission to modify the code and to distribute modified code is granted, provided the above notices are retained, and a notice that the code was modified is included with the above copyright notice. Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. SAP makes no representations about the suitability of this software for any purpose. It is provided with a limited warranty and liability as set forth in the License Agreement distributed with this copy. SAP offers this liability and warranty obligations only towards its customers and only referring to its modifications.

2. Support and Maintenance

SAP does not provide software maintenance for the STLport. Software maintenance of the STLport therefore shall be not included. All other services shall be charged according to the rates for services quoted in the SAP List of Prices and Conditions and shall be subject to a separate contract.

3. Exclusion of warranty As the STLport is transferred to the Customer on a loan basis and free of charge, SAP cannot guarantee that the STLport is error-free, without material defects or suitable for a specific application under third-party rights. Technical data, sales brochures, advertising text and quality descriptions produced by SAP do not indicate any assurance of particular attributes.

4. Limited Liability a. Irrespective of the legal reasons, SAP shall only be liable for

damage, including unauthorized operation, if this (i) can be compensated under the Product Liability Act or (ii) if caused due to gross negligence or intent by SAP or (iii) if based on the failure of a guaranteed attribute.

b. If SAP is liable for gross negligence or intent caused by employees who are neither agents or managerial employees of SAP, the total liability for such damage and a maximum limit on the scope of any such damage shall depend on the extent to which its occurrence ought to have anticipated by SAP when concluding the contract, due to the circumstances known to it at that point in time representing a typical transfer of the software.

c. In the case of Art. 4.2 above, SAP shall not be liable for indirect damage, consequential damage caused by a defect or lost profit.

d. SAP and the Customer agree that the typical foreseeable extent of damage shall under no circumstances exceed EUR 5,000.

e. The Customer shall take adequate measures for the protection of data and programs, in particular by making backup copies at the minimum intervals recommended by SAP. SAP shall not be liable for the loss of data and its recovery, notwithstanding the other limitations of the present Art. 4 if this loss could have been avoided by observing this obligation.

f. The exclusion or the limitation of claims in accordance with the present Art. 4 includes claims against employees or agents of SAP.

Typographic Conventions

Type Style Description

Example Text Words or characters quoted from the screen. These include field names, screen titles, pushbuttons labels, menu names, menu paths, and menu options.

Cross-references to other documentation

Example text Emphasized words or phrases in body text, graphic titles, and table titles

EXAMPLE TEXT Technical names of system objects. These include report names, program names, transaction codes, table names, and key concepts of a programming language when they are surrounded by body text, for example, SELECT and INCLUDE.

Example text Output on the screen. This includes file and directory names and their paths, messages, names of variables and parameters, source text, and names of installation, upgrade and database tools.

Example text Exact user entry. These are words or characters that you enter in the system exactly as they appear in the documentation.

<Example text>

Variable user entry. Angle brackets indicate that you replace these words and characters with appropriate entries to make entries in the system.

EXAMPLE TEXT Keys on the keyboard, for example, F2 or ENTER.

Icons

Icon Meaning

Caution

Example

Note

Recommendation

Syntax

Additional icons are used in SAP Library documentation to help you identify different types of information at a glance. For more information, see Help on Help → General Information Classes and Information Classes for Business Information Warehouse on the first page of any version of SAP Library.

SAP BW Administration Tasks in Multi-Partition Installations: IBM DB2 UDB for UNIX and Windows

6 11.11.2005

Contents SAP NetWeaver 2004s Business Intelligence - Administration Tasks: IBM DB2 Universal Database for UNIX and Windows........8

1 Naming Conventions ........................................................................10 2 Standard SAP NetWeaver 2004s BI System on DB2 UDB for UNIX and Windows ..............................................................................11

2.1 Default Database Layout ..................................................................... 11 2.2 Default Database and File System Layout ......................................... 13

3 Database Layout Planning ...............................................................15 3.1 Number of Partitions............................................................................ 16 3.2 Data Distribution in a SAP BI System with Multipe Database Partitions..................................................................................................... 17 3.3 System Growth..................................................................................... 22

4 Creation of Large Infocubes, DataStore Objects, and PSA Tables....................................................................................................24

4.1 Creating Database Partition Groups................................................... 24 4.1.1 Creating a Database Partition Group for InfoCubes...................................................25 4.1.2 Creating Database Partition Groups for DataStore Objects.......................................26 4.1.3 Creating Database Partition Groups for PSA Tables .................................................27

4.2 Creating Tablespaces .......................................................................... 27 4.3 Creating Data Classes ......................................................................... 31

4.3.1 Assigning InfoCubes and Their Aggregates to Data Classes ....................................32 4.4.2 Assigning DataStore Objects to Data Classes ...........................................................33 4.4.3 Assigning PSA Tables to Data Classes......................................................................33

5 Improving Database Performance with Multi-Dimensional Clustering .............................................................................................35

5.1 DB2 Clustering ..................................................................................... 35 5.1.1 Index Clustering ..........................................................................................................36 5.1.2 Multi-Dimensional Clustering ......................................................................................37

5.2 Clustering support in SAP NetWeaver 04s ........................................ 41 5.3 Setting up MDC for DataStore Objects............................................... 51

5.3.1 Choosing the Right MDC-Dimensions for DataStore Objects ....................................51 5.3 Setting Up MDC for DataStore Objects Manually..........................................................53

5.4 Setting up MDC for InfoCubes ............................................................ 56 5.4.1 Choosing the Right MDC-Dimension for InfoCubes ...................................................56 5.4.2 Setting Up MDC for InfoCubes Manually....................................................................59

5.5 Reclustering InfoCubes and DataStore Objects................................ 62 5.5.1 Reclustering of InfoCubes ..........................................................................................63

5.5.1.2 Creating and Scheduling a Reclustering Job for an InfoCube..............................64 5.5.2 Reclustering of DataStore Objects .............................................................................66

5.5.2.1 Creating and Scheduling a Reclustering Job for DataStore Objects ....................67 5.5.3 Monitoring and Maintaining Reclustering Jobs...........................................................69

SAP BW Administration Tasks in Multi-Partition Installations: IBM DB2 UDB for UNIX and Windows

11.11.2005 7

5.6 Checking the Disk Space Consumption of MDC InfoProviders.....................................70 6 Adding of New Partitions to the Database .....................................71

6.1 Preparing New Partitions..................................................................... 72 6.1.1 Creating Additional Partitions .....................................................................................72 6.1.2 Updating the Database Configuration for Each New Partition ...................................73

6.2 Redistributing the Data with DB6CONV ............................................. 75 6.2.1 Adding Containers to Temporary Tablespaces on Each New Partition .....................75 6.2.3 Creating New Database Partition Groups ..................................................................76 6.2.4 Creating New Tablespaces in the New Database Partition Group.............................77 6.2.5 Checking the Partitioning Keys of All Partitioned Tables............................................77 6.2.6 Redistributing All Tables of a Tablespace ..................................................................79

6.3 Redistributing the Data with DB2 REDISTRIBUTE ............................ 80 6.3.1 Adding Containers to Temporary Tablespaces on Each New Partition .....................81 6.3.2 Adding Partitions to Database Partitions Groups .......................................................81 6.3.3 Creating Tablespace Containers on Each New Partition ...........................................82 6.3.4 Checking The Partitioning Keys of all Partitioned Tables...........................................82 6.3.5 Redistributing Database Partition Groups ..................................................................83

7 Best Practices ...................................................................................84 7.1 Data Clustering of InfoCube Fact Tables and DataStore Objects .... 84 7.2 Controlling Database Log Space Consumption ................................ 85 7.3 Dropping Indexes Before Deletion and Compression of InfoPackages .............................................................................................. 87 7.4 Increasing the Buffer Pool Hit Ratio ................................................... 88

8 Additional Sources of Information ..................................................90

SAP NetWeaver 2004s BI - Administration Tasks: IBM DB2 UDB for UNIX and Windows

8 11.11.2005

SAP NetWeaver 2004s Business Intelligence - Administration Tasks: IBM DB2 Universal Database for UNIX and Windows Purpose This documentation describes the physical database layout, the administration tasks that you have to perform and performance recommendations for an SAP NetWeaver 2004s BI system if your database is IBM DB2 Universal Database for UNIX or Windows.

It focuses especially on an environment with multiple database partitions and is also valid for SAP OLAP (online analytical processing) systems that are based on SAP NetWeaver 2004s BI SAP OLAP systems are, for example, SAP BW, SAP SCM, SAP SRM (formerly known as SAP APO) and SAP SEM.

The documentation contains the following information:

• Naming conventions [Page 10]

• A description of the standard database layout [Page 11]

• Information about and recommendations for the database layout planning [Page 15]

• Information about how to create large InfoCubes, DataStore Objects and PSA tables [Page 22]

• Information about how to improve database performance with multi-dimensional clustering MDC [Page35]

• Information about how to add new partitions to the database [Page 35]

• Best practices [4] with information about special database-related recommendations

• Additional sources of information about SAP BI [Page 70]

The information provided in this documentation mainly applies to newly installed SAP NetWeaver 2004s BI systems.

For additional information about administration tasks in SAP BI multi-partition installations, see the following SAP Notes:

• SAP Note 688647

• SAP Note 546262

• SAP Note 374502

Constraints When using this documentation, you have to consider the following information:

• SAP NetWeaver 2004s BI is only available for version 8 of IBM DB2 Universal Database for UNIX and Windows.

• Most of the recommendations in this documentation are also valid for earlier SAP BI releases starting with SAP BW 2.x.


11.11.2005 9

• There are differences in the database layout between the different SAP BW releases, which are not explicitly described in this documentation. Make sure that you adapt the given recommendation of this documentation to the requirements of your SAP BI release, for example:

DataStore and PSA tablespaces are not distributed across multiple partitions with SAP BW 2.x.

For SAP NetWeaver 2004s BI, a page size of 16 KB is used for fact tablespace. For SAP BW release 2.x, a page size of 8 KB is used.

Multi-dimensional clustering (MDC) is not supported for SAP BI releases before SAP NetWeaver 2004s BI.


10 11.11.2005

1 Naming Conventions In this documentation, the following naming conventions apply:

Database Versions Formal Name Short Name

IBM DB2 Universal Database for UNIX and Windows

DB2 UDB for UNIX and Windows

IBM DB2 Universal Database Enterprise Server Edition for UNIX and Windows

DB2 UDB ESE for UNIX and Windows

Database Terminology Term Description

Partition or database partition Part of the database that consists of its own data, indexes, configuration files and transaction logs

Database partition group Group of one or more partitions

New Tablespace Names As of SAP BW 3.1 Content the tablespace names have changed. Except for temporary tablespaces, the prefix <SAPSID># replaces the former prefix PSAP. Throughout this documentation the prefix <SAPSID># is used.

If your system is a SAP BI system with a release lower than 3.1 Content, the tablespace names do not change when you perform an upgrade to SAPNetWeaver 2004s BI. The prefix PSAP is still used.


11.11.2005 11

2 Standard SAP NetWeaver 2004s BI System on DB2 UDB for UNIX and Windows The following sections provide information about:

• Default Database Layout []

• Default Database and File System Layout [Page 13]

2.1 Default Database Layout This section provides information about default buffer pools, database partition groups and tablespaces that are created during an SAP NetWeaver installation. You probably will have to adapt the default database layout for production use of SAP BI. Throughout this section, we distinguish between SAP base database objects and BI-specific database objects.

Make sure that you apply all SAP BI-specific steps of an SAP NetWeaver installation as described in the documentation SAP NetWeaver 2004s ABAP on <Operating system platform>: IBM DB2 Universal Database for UNIX and Windows.

You can find this documentation in SAP Service Marketplace at service.sap.com/instguidesNW2004s → Installation.

Buffer Pools The default buffer pool IBMDEFAULTTBP is automatically created during an SAP system installation and exists on all partitions. IBMDBEFAULTTBP has a page size of 16 KB and a default size of 5000 pages. In general, you have to increase the size of this buffer pool size for production usage.

Database Partition Groups

SAP Base Database Partition Group

The SAP base database partition group is SAPNODEGRP_<SAPSID> and contains partition 0.

BI-Specific Database Partition Groups The following BI-specific default database partition groups are created during the SAP system installation:

• The default dimension database partition group NGRP_DIM_<SAPSID> that contains partition 0

• The default fact database partition group NGRP_FACT_<SAPSID> that contains all partitions

• The PSA and DataStore database partition group NGRP_ODS_<SAPSID> that contains all partitions


12 11.11.2005

Tablespaces All tablespaces use default buffer pool IBMDEFAULTBP and have a page size of 16 KB and extend size equals 2.

If you are upgrading a system, the tablespace names do not change. If the source system release is lower than SAP BW 3.1 Content, the prefix PSAP remains.

Base Tablespaces

All SAP base tablespaces are located on partition 0. SAP BI master data is located in SAP base tablespaces.

BI-Specific Tablespaces

The following BI-specific default tablespaces are created during the SAP system installation:

• The dimension tablespaces <SAPSID>#DIMD (data tablespace) and <SAPSID>#DIMI (index tablespace) are created in the database partition group NGRP_DIM_<SAPSID>. The corresponding data class is DDIM.

• The fact tablespaces <SAPSID>#FACTD (data tablespace) and <SAPSID>#FACTI (index tablespace) are created in the database partition group NGRP_FACT_<SAPSID>. The corresponding data class is DFACT.

• The tablespaces for PSA and DataStore tables are <SAPSID>#ODSD (data tablespace) and <SAPSID>#ODSI (index tablespace) and are created in the database partition group NGRP_ODS_<SAPSID>. The corresponding data class is DODS.


11.11.2005 13

The following graphic shows the SAP BI system tablespaces after an SAP BI server installation as part of the SAP NetWeaver ‘04 installation.

SAP BI Tablespace Layout After the Installation

Partition 0 Partition 1 Partition n

<SAPSID>#FACTD, <SAPSID>#FACTI

<SAPSID>#DIMD, <SAPSID>#DIMI

Base standardtablespaces

NGRP_FACT_<SAPSID>NGRP_DIM_<SAPSID>

NGRP_ODS_<SAPSID>

PSAPTEMP16

<SAPSID>#ODSD, <SAPSID>#ODSI

2.2 Default Database and File System Layout In a multi-partition system the BI specific tablespaces for fact, DataStore and PSA tables are distributed across all database partitions. The database containers of the BI-specific tablespaces are distributed across all sapdata directories. One container is created on each sapdata directory.

The following example shows fact tablespace containers (DMS) on UNIX where two sapdata directories are located on one database partition and <NNNN> equals the database partition number starting with 0000.

• For the data tablespace:

/db2/<SAPSID>/sapdata1/NODE<NNNN>/<SAPSID>#FACTD.container000

/db2/<SAPSID>/sapdata2/NODE<NNNN>/<SAPSID>#FACTD.container001

• For the index tablespace:

/db2/<SAPSID>/sapdata1/NODE<NNNN>/<SAPSID>#FACTI.container000

/db2/<SAPSID>/sapdata2/NODE<NNNN>/<SAPSID>#FACTI.container001


14 11.11.2005

The following example shows fact tablespace containers (DMS) on Windows where two sapdata directories are located on one database partition and <NNNN> equals the database partition number starting with 0000.

• For the data tablespace:

<drive>:\db2\<SAPSID>\sapdata1\NODE<NNNN>\<SAPSID>#FACTD.

container000

<drive>:\db2\<SAPSID>\sapdata2\NODE<NNNN>\<SAPSID>#FACTD.

container001

• For the index tablespace:

<drive>:\db2\<SAPSID>\sapdata1\NODE<NNNN>\<SAPSID>#FACTI.

container000

<drive>:\db2\<SAPSID>\sapdata2\NODE<NNNN>\<SAPSID>#FACTI.

container001


11.11.2005 15

3 Database Layout Planning Purpose You should consider using multiple database partitions for databases that are larger than 200 GB and that have BI database tables with more than 5 GB.

Using multiple database partitions requires a database server with several processors or that the database is distributed across several database servers.

The layout of your database has a large impact on the performance of your SAP BI system. It therefore requires careful planning.

Using multiple database partitions has the following advantages:

• Since the BI queries are processed in parallel, the query performance for large tables improves. Tests show that performance improves linear with the number of database partitions for most of the BI queries.

• Performance of other SQL statements also benefit from parallel processing, for example, UPDATE, DELETE, INSERT, JOINS, GROUP BY, INDEX/TABLE SCANS, SORT, INDEX CREATE.

• Improved performance of the database administration task, for example, backup, restore, REORG and RUNSTATS due to database partitioning parallelism.

• Overcome tablespace size limit. The size limit per partition is 256 GB for 16 KB page size and 512 GB for 32 KB page size.

• Parallel processing uses more hardware resources, such as, CPU, memory

or I/O capacity. If you have running many BI queries in parallel , you can then either create less database partitions or distribute BI tables over a subset of all database partitions.

• A BI system with multiple database partitions can require an increased administration effort.

Process Flow ...

1. You check and decide how much hardware you will need. The hardware sizing must reflect the size of the final hardware needed for the installation as closely as possible.

For general hardware sizing information for SAP BI, see SAP Service Marketplace at service.sap.com/bi → Performance.

To make a first sizing estimation, we recommend that you use the Web-based tool SAP Quick Sizer that is available on SAP Service Marketplace at service.sap.com/quicksizer.

You determine the number of machines and the amount of processors, memory, and disks that you need for the chosen hardware platform.


16 11.11.2005

2. You need to plan the database layout itself.

To be able to correctly plan the database layout, you must take the following sections into consideration:

Number of Database Partitions []

Data Distribution [Page 17]

System Growth [Page 22]

3. You create large InfoCubes, DataStore objects and PSA tables [Page 22].

3.1 Number of Partitions In general we distinguish between following two database partition types:

• Administration partition

The administration partition is the database partition that contains the database catalog, SAP basis tables, BI master data and BI dimension tables. All SAP work processes are connected to the administration partition.

You can optionally store InfoCube fact table, DataStore or PSA tables on the administration partition.

• Data partition

The data partition is a database partition on which InfoCube fact or DataStore or PSA tables are stored. Ideally, each data partition has equal data volume and equal data access workload.

The number of database partitions, which is based on a given hardware sizing, depends on the following:

• Number of processors on each database server.

We recommend the following:

Two to eight processors per administration partition.

One to four processors per data partition.

• Main Memory

We recommend the following:

4 to 16 GB per administration partition

4 to 8 GB memory per data partition

The number of CPUs and main memory size per database partition could be higher for very large BI systems that have a high concurrent workload, for example, several hundred database connections and more than 50 parallel BI queries running in parallel.

To determine the optimal number of database partitions, we recommend that you run a performance test with the customer-specific workload.


11.11.2005 17

To achieve optimal performance, make sure that there are no bottlenecks with I/O throughput and with communication between database partitions. We recommend that you switch on shared memory usage for logical database partitions and use a fast network switch for physical database partitions, for example, Gigabit Ethernet switch.

3.2 Data Distribution in a SAP BI System with Multipe Database Partitions General Recommendations When planning the data distribution in a SAP BI system, you should consider the following:

• SAP basis tables, BI master data tables and InfoCube dimension tables should be located on partition 0.

We recommend that you do not change this setup.

If you expect, that the number of distinct values in one dimension is getting close to the fact table cardinality, you should use a line item dimension for this dimension in the InfoCube’s layout.

• Use the same page size for all tablespaces.

If you do not use the same page size for all tablespaces, the database optimizer might choose the wrong bufferpool for DB2 temporary tables that are used for sort operations.

• Create groups of tables for PSA, InfoCube fact and datastore tables that should run with the same degree of parallelism. Distribute these table groups on the database partitions. Ideally each data partition has an equal volume of data and similar access of data workload.

You can locate table group with small fact tables on one partition and you can distribute groups with large fact tables over several partitions.

If you have a large number of small InfoCubes, you can create several groups for InfoCube fact tables and store all tables of each group in one separate tablespace. You can then distribute the separate tablespaces on different partitions. As a result you achieve a better workload distribution across the partitions.

• In general the workload on the administration partition is higher than on data partitions. Make sure, that the database server, on which the administration partition is located, has enough system resources available (for example, CPU, memory, I/O throughput) for the expected workload.


18 11.11.2005

You can choose one of the following options to deal with the expected workload:

Reduce the workload on the administration partition by storing InfoCube fact, DataStore or PSA tables on data partitions only.

The database partition server A, which contains the administration partition, has the highest workload with a CPU utilization higher than 90 %. All other database partition servers have a workload with a CPU utilization of less than 40 %.

A heavy workload on database partition server A can increase the query response time. In this case heavy workload means that database processes on the other database partition servers are accessing tables (for example, master data or SAP system basis tables) that are located on database partitionserver A.

To increase the total throughput of the SAP BI system, you should reduce the workload on server A. For example, InfoCube fact, DataStore or PSA tables should be redistributed over data partitions that are located on other database partition servers.

Leave the workload but increase system resources (for example, CPU, memory, I/O throughput) on the database server, on which the administration partition is located.

• Ideally keep database configuration and buffer pool size the same on all partitions. If the workload on the administration partition is higher than on a data partition, you might have to increase the buffer pool size and some database configuration values, for example, sort heap valueon administration partition.

• You always have to back up or restore the administration partition before you can back up or restore all other partitions in parallel.

To minimize the backup or restore execution time of the administration partition, we recommend that you should reduce the data volume size on this partition. Therefore, you should avoid to store InfoCube fact tables, DataStore tables or PSA tables on the administration partition.

• You can store all three types of tables of a DataStore object (active table, activation queue table and change log table).

• Production DataStore objects and PSA tables should be stored in different tablespaces and not in the default DataStore tablespace.

If you are using the SAP BI default settings, DataStore and PSA tables are stored in the default DataStore tablespace.

Examples of DatabaseConfigurations The following are examples of SAP Bi system configurations with:

• One database partition

• Multiple database partitions where large BI tables are distributed over all partitions

• Multiple database partitions where groups of tables are distributed over several partitions

For details about the database manager, database configuration and recommended buffer pool sizes, see SAP Note 584952.


11.11.2005 19

SAP BI system with one Database Partition.

The database size of the SAP BI system is less than 200 GByte. Therefore, the database is created with one partition only. All tables of the SAP BI system are stored in one partition.

Partition 0

(Administration Partition)

SAP Basis Tables

BI InfoCube and Aggregate Fact Fables

BI Dimension Tables

BI Master Data Tables

BI DataStore and PSA Tables


20 11.11.2005

Multiple Database Partitions Where Large BI Tables are Distributed on all Partitions.

The following example describes the database layout for a medium-size BI system, which has a 400 GB database distributed over 8 partitions.

Almost all InfoCube fact tables, DataStore tables and PSA tables are large, that is, larger than 5 GB. Therefore, these tables are distributed on all partitions. Since each table size is only a few GB, all aggregate fact tables are located on the administration partition.

The database configuration and buffer pool size are the same on all data partitions. Depending on the workload, the database configuration values and buffer pool size on the administration partition can be higher.

Partition 1...

(Data Partition)

Partition 7

(Data Partition)

Data Distribution Using 8 Partitions

Partition 0


SAP Basis Tables

BI Dimension Tables

BI Master Data Tables

BI Aggregate Fact Tables

BI InfoCube Fact Tables

BI DataStore Tables

BI PSA Tables


11.11.2005 21

Multiple Database Partitions Where Groups of Tables are Distributed Over Several Partitions.

The following example describes the database layout for a large SAP BI system with more than one Terabyte of data and with 16 partititions.

You can implement the database server layer as follows:

• You can create the 16 partitions on one large SMP database server (that is, logical database partitioning)

• You can distribute the 16 partitions on several smaller database servers with less hardware resources (CPUs, main memory). For example, two database servers with 16 CPUs and 32 GB memory.

In this example, the two different groups of BI tables are stored on data partitions as follows:

• Small BI Aggregate and InfoCube fact tables

Each table has a table size of a few GB. The query response time when accessing these tables is fast enough without parallel processing. Therefore, these tables are distributed on a single data partition.

• To use parallel processing wihin the database and to get fast query response times, large InfoCube and aggregate fact tables, large DataStore and PSA tables with table sizes larger than 5 GB are distributed over 14 partitions.

The database configuration and buffer pool size are the same on all data partitions. Depending on the workload, the database configuration values and buffer pool size on the administration partition can be higher.

Partition.1

(Data Partition)

Data Distribution Using 16 Partitions

Partition 0


SAP Basis Tables

BI Dimension Tables

BI Master Data

Small BI

Aggregate

and InfoCube

Fact Tables

Partition.2

(Data Partition)

Partition.15

(Data Partition)

Large BI InfoCube and Aggregate Fact Tables

Large BI DataStore Tables

Large BI PSA Tables


22 11.11.2005

3.3 System Growth The following section provides information about one of the most important objectives of the SAP BI design with multiple partitions, that is, the support of system growth, typical growth scenarios and the options to deal with it.

There are the following three typical growth scenarios in SAP BI systems:

• Increased data volume

• Increased workload

• Additional new SAP BI application areas

Increased Data Volume Over time the data volume increases on each partition. Thus, the query response time can increase, too. To reduce query response time, you can increase CPU and memory capacity on the SAP BI system and add new data partitions. You should then redistribute the data on the old and new data partitions. For more information about redistributing data, see Redistributing Data with DB6CONV [Page 75] or Redistributing Data with DB2 REDISTRIBUTE [Page 80].

The following describes a scenario where no data redistribution is required:

If you know that the maximum amount of data will be twice as much as in the first year of your SAP BI system, you can start with twice the number of partitions that you would need at the beginning (see the graphic below). At a later point in time, you can either add additional CPU, memory and I/O capacities to the existing database server or add new database partition servers and move logical partitions to the new database partition servers.

You have a system with an InfoCube with an initial size of 500 GB, but you plan to have a maximum data size of 1 TB. Initially, you have a ten-way machine with eight partitions. At a later point in time you add another machine and move four partitions to the new machine. On AIX, for example, this is easily possible if you separate the file systems of each partition in different volume groups.


11.11.2005 23

Machine 1Initial Layout

Scaled Layout

Partition0

Volume group 1

Partition1

Volume group 2

Partition2

Volume group 3

Partition3

Volume group 4

Partition4

Volume group 5

Partition5

Volume group 6

Partition6

Volume group 7

Partition7

Volume group 8

Machine 1

Partition0

Volume group 1

Partition1

Volume group 2

Partition2

Volume group 3

Partition3

Volume group 4

Machine 2

Partition4

Volume group 5

Partition5

Volume group 6

Partition6

Volume group 7

Partition7

Volume group 8

Increased Workload An increased workload with an unchanged data volume results in an increase of query activity either by the addition of BI applications or users to the SAP BI system. Typically, both causes longer query response time. To reduce the query response time, workload growth can require additional CPU and memory capacity. You might not have to increase the number of data partitions.

New BI application areas The introduction of new BI application areas, for example new InfoCubes for new business areas, into the SAP BI system usually involves a combination of additional data volume and additional workload. You can increase the CPU and memory capacity on your SAP BI system and add new data partitions, over which the new tables are distributed. You might not have to redistribute the existing data.

Make sure that for all three growth scenarios the data volume and workload is evenly distributed over all data partitions.


24 11.11.2005

4 Creation of Large Infocubes, DataStore Objects, and PSA Tables If you plan to create large InfoCubes, DataStore objects or PSA tables, you can create separate database partition groups, separate data tablespaces, separate index tablespaces and separate data classes for each new object in advance.

The advantages are as follows:

• You can maintain the physical disk layout separately for each object.

• You can add new partitions for single InfoCubes, DataStore objects or PSA tables, and you can redistribute them one at a time.

• You can locate InfoCubes, DataStore objects or PSA tables on different partitions.

You should consider adding new partitions to the database before you add a new InfoCube, DataStore object, or PSA table. For more information, see Adding New Partitions to the Database [Page 35].

The following sections describe the database-related steps you have to perform when you create new InfoCube, DataStore objects, or PSA tables on database partitions:

• Creating Database Partition Groups []

• Creating Tablespaces [Page 27]

• Creating Data Classes [Page 31]

4.1 Creating Database Partition Groups Use Creating database partition groups is part of the overall process of creating large InfoCubes, DataStore objects and PSA tables. A database partition group determines the partitions on which a tablespace is created.

Procedure ...

1. Identify the partitions where you create the large InfoCube, DataStore object, or PSA table. For more information, see Database Layout Planning [Page 15].

2. Create the database partition group(s) for the InfoCube, DataStore object, or PSA table as described in the following sections:

Creating Database Partition Groups for InfoCubes [Page 25]

Creating Database Partition Groups for DataStore Objects [Page 26]

Creating Database Partition Groups for PSA Tables [Page 27]


11.11.2005 25

4.1.1 Creating a Database Partition Group for InfoCubes Use You use the following procedure to create a database partition group for InfoCubes. The following two steps have to be performed: ...

1. You create a database partition group for fact tables.

2. You create a database partition group for aggregate tables.

You create a database partition group for aggregate tables to specify the single partition where the tablespace should reside or to specify a number of partitions for larger aggregates. You can create aggregates of different InfoCubes in the same tablespace on a single partition.

Procedure You create database partition groups either using the DBA Cockpit or native DB2 commands.

Using the DBA Cockpit

For information about how to create database partition groups using the DBA Cockpit, see the SAP Online documentation and choose SAP NetWeaver → SAP NetWeaver by Key Capabilities → Solution Life Cycle Management → System Management → Database Administration in CCMS → CCMS: SAP/DB2 UDB for UNIX and Windows → Configuration: Database Partition Groups.

Using Native DB2 Commands

To create a database partition group for Fact tables, proceed as follows:

1. Log on to a database server as user db2<dbsid>.

2. Open a command prompt and enter the following SQL statement: db2 create database partition group

NGRP_FACT_<SAPSID>#<INFOCUBE-SHORT-IDENTIFIER>

on dbpartitionnums(<partition_number>, <second_partition_number>,..)

where <INFOCUBE-SHORT-IDENTIFIER> is the technical name of the InfoCube.

To create a database partition group with one partition for aggregate tables, proceed as follows:



NGRP_AGGR_<SAPSID>#<INFOCUBE-SHORT-IDENTIFIER>

on dbpartitionnum(<partition_number>)

where <INFOCUBE-SHORT-IDENTIFIER> is the technical name of the InfoCube.


26 11.11.2005

4.1.2 Creating Database Partition Groups for DataStore Objects Use You can create a database partition group for the tables of a large DataStore object.



For information about how to create database partition groups using the DBA Cockpit, see the SAP Online documentation and choose SAP NetWeaver → SAP NetWeaver by Key Capabilities → Solution Life Cycle Management → System Management → Database Administration in CCMS → CCMS: SAP/DB2 UDB for UNIX and Windows → Configuration: Database Partition Groups.

Using Native DB2 Commands ...



NGRP_ODS_<SAPSID>#AT_<DATASTORE-SHORT-IDENTIFIER>

on dbpartitionnums (<partition_number>, <second_partition_number>,..)

where <DATASTORE-SHORT-IDENTIFIER> is the technical name of the DataStore object.

If you want to store the activation queue tables of all standard DataStore objects on a specific single partition, you create a database partition group to specify this partition by entering the following SQL statement: db2 create database partition group

NGRP_ODS_<SAPSID>#ODS_AQUEUES on dbpartitionnums (<partition_number>)


11.11.2005 27

4.1.3 Creating Database Partition Groups for PSA Tables Use PSA tables can grow very large in size. Therefore, partitioning of the PSA tablespaces considerably improves the performance of an SAP BI system when data is loaded from PSA.



For information about how to create database partition groups using the DBA Cockpit, see the SAP Online documentation and choose SAP NetWeaver → Application Platform (SAP Web Application Server) → Databases → IBM DB2 Universal Database for UNIX and Windows → CCMS: SAP/DB2 UDB for UNIX and Windows → Configuration: Database Partition Groups.

Using Native DB2 Commands



NGRP_PSA_<SAPSID>#<PSA-SHORT-IDENTIFIER>

on dbpartitionnums (<partition_number>,

<second_partition_number>,..)

where <PSA-SHORT-IDENTIFIER> is the technical name of the PSA object.

4.2 Creating Tablespaces Use You create new tablespaces as part of the overall process of Creating Large Infocubes, DataStore Objects and PSA Tables [Page 22].

There are several advantages in separating large SAP BI tables of InfoCubes, DataStore or PSA objects on different tablespaces:

• The size of tablespaces with 16 KB pages is limited to 256 GB per partition. If tables are located in the same tablespace, the size limit applies to the sum of these tables.

• Changing the tablespace disk layout is easier because you can perform the changes step by step with smaller data volumes.

You use and create managed-by-database (DMS) tablespaces for all types of data: dimension, fact, DataStore, PSA and aggregate tables and indexes. We recommend that you either use file or raw device containers for these tablespaces.


28 11.11.2005

Procedure You create data and index tablespaces for InfoCubes, DataStore objects, or PSA tables either using the DBA cockpit or an SQL statement.

DBA Cockpit

For information about how to create tablespaces using the DBA Cockpit, see the SAP Online documentation and choose SAP NetWeaver → Application Platform (SAP Web Application Server) → Databases → IBM DB2 Universal Database for UNIX and Windows → CCMS: SAP/DB2 UDB for UNIX and Windows → Space: Tablespaces.

SQL Statement ...


2. Open a command prompt and enter the following SQL statement: db2 create tablespace <SAPSID>#<TABLESPACE_NAME><T>

in database partition group <partition_group>

pagesize <pagesize> managed by database

using (file <container_1> <container_size>) on dbpartitionnum(<partition_number_1>)

using (file <container_2> <container_size>) on dbpartitionnum(<partition_number_2>)

…

using (file <container_n>) on dbpartitionnum(<partition_number_n>)

autoresize yes extentsize <extentsize> prefetchsize automatic

bufferpool <bufferpool>

The partition number should correspond to the 4-digit number in the container directory.

A rule of thumb is to create the index tablespace with half the size of the related data tablespace.

3. To create new objects in the tablespaces, you have to grant the corresponding authorization to user sap<sapsid>. To do so, enter the following SQL statement:

db2 grant use of tablespace <SAPSID>#<TABLESPACE_NAME><T> to public


11.11.2005 29

Explanation of Variables

Variable Description <SAPSID> 3-character SAP system ID

<TABLESPACE_NAME> Tablespace identifier that should correspond to the object identifier:

• <INFOCUBE_SHORT_IDENTIFIER>, <DATASTORE_SHORT_IDENTIFIER>

• <PSA_SHORT_NAME_IDENTIFIER>

<T> Type of tablespace:

• 'D’ for data

• 'I’ for index

<partition_group> Name of the database partition group as created in Creating Database Partition Groups [Page 24]

<pagesize> We recommend that you have 16 KB for fact, aggregate, DataStore and PSA tablespaces (data and index).

<container_x> File name of the DMS container:

• UNIX: ’/db2/<SAPSID>/sapdata<mm>/NODE<nnnn>/

<SAPSID>#<TABLESPACE_NAME><T>.

container<ccc>’

• Windows: ’<drive>:\db2\<SAPSID>\sapdata<mm>\

NODE<nnnn>\SAPSID>#<TABLESPACE_NAME><T>.

container<ccc>’ where <drive> is the drive letter of the logical drive.

<mm> Number of the sapdata directory.

The default values are 1 … n, where n is the number of sapdata directories that were created by SAPinst.

<nnnn> 4-digit partition number, for example, 0001 for partition 1.

<ccc> 3-digit container number of this tablespace container on the local partition

<container_size> Size of container

<partition_number_x> Number of partition where the container is located. It should correspond to <nnnn>.

<extentsize> For all tablespace types the recommended value is 2.

<bufferpool> Name of buffer pool this tablespace should be assigned to


30 11.11.2005

We recommend that you use the conventions for tablespace names, container directories, and container file names as described in the table above.

Example In the SAP BI system BWP in the database partition group NGRP_FACT_BWP#IC01, you want to create a new DMS fact tablespace for the fact tables of the InfoCube IC01. The containers should be located in sapdata directories 1 and 2. You then want to assign this tablespace to buffer pool IBMDEFAULTBP. To do so, proceed as follows:

...


2. Open a command prompt and enter the following SQL statement: db2 CREATE TABLESPACE BWP#IC01D IN DATABASE PARTITION GROUP NGRP_FACT_BWP#IC01 PAGESIZE 16K MANAGED BY DATABASE

USING (FILE ’/db2/BWP/sapdata1/NODE0000/BWP#IC01D.container000’ 25G) ON DBPARTITIONNUM (0)




AUTORESIZE YES EXTENTSIZE 2 PREFETCHSIZE AUTOMATIC

BUFFERPOOL IBMDEFAULTBP

The fact tablespace is stored in four containers on two partitions. On each partition, the containers are located in the sapdata1 and sapdata2 subdirectories. The size of each container is 25 GB. The short name of the tablespace is IC01.

Constraints If you are not using a storage system, you can avoid I/O bottlenecks by applying the disk striping function of DB2. To do so, create a tablespace with several containers of the same size, each one located on a separate disk. If the number of disks is limited, index and data tablespace containers can reside in pairs on the same disk.


11.11.2005 31

4.3 Creating Data Classes Use A data class is the counterpart of a tablespace in the SAP system. It associates a data tablespace with an index tablespace and defines a logical name for this combination. This logical name is referenced in the ABAP Dictionary and in the BI metadata.

To make the previously created tablespaces known to the SAP BI system, you create new data classes by defining a data class for the fact tables, aggregates, DataStore and PSA tablespaces according to the default data classes DFACT and DODS.

You assign these data classes in the SAP BI Data Warehousing Workbench when you create a new InfoCube, DataStore object, or PSA table as described in the following sections. The assigned data classes determine the tablespaces and the corresponding database partition groups.

Procedure To create a new data class, proceed as follows:

1. In your SAP BI system, call transaction DB6COCKPIT.

2. In the navigation frame, choose Configuration → Data Classes.

The Configuration: Data Classes screen appears

3. To add a new data class, choose Add.

The Add Data Class dialog box appears.

4. Enter the name and description of the new data class. The data class name should be in compliance with the information in SAP Note 46272.

You should use the following prefixes for the data classes:

Fact table: ZF*

Dimension table: ZD*

Aggregate table: ZA*

DataStore table: ZO*

PSA table: ZP*

Keep in mind that not defining data classes according to the naming conventions has an impact on future upgrades of your system.

Such entries will not be recognized as customer entries and will be lost during the upgrade.

5. From the list box, select a data and index tablespace for the new data class.

6. To confirm your entries, choose Add.

See also:

• Assigning InfoCubes and their Aggregates to Data Classes [Page 32]

• Assigning DataStore Objects to Data Classes [Page 33]

• Assigning PSA Tables to Data Classes [Page 33]


32 11.11.2005

4.3.1 Assigning InfoCubes and Their Aggregates to Data Classes Use The following describes the two steps that you have to perform to assign InfoCubes and their aggregates to data classes.

By default, the fact tables of an InfoCube and its aggregates are created in the tablespace that is defined in the data class DFACT. By default, the dimension tables are created in the tablespace that is defined in the data class DDIM. For each InfoCube that should be stored in a separate tablespace, you need to overwrite the default data class with the new data class.

You can assign new data classes to fact tables of an InfoCube as long as the InfoCube does not contain any data.

Procedure

Changing the Data Class of Infocubes

1. In your SAP BI system, start the Data Warehousing Workbench.

2. Create a new InfoCube or choose the required InfoCube and choose Change from the context menu.

The Edit InfoCube screen appears.

3. Choose Extras → Maintain DB-storage parameters.

The Maintain Storage Parameters dialog box appears.

4. In the InfoCube: Logical Storage Parameter area, change the data class for fact tables.

5. If you created separate tablespaces for the InfoCube or Aggregate dimension tables, change the value in the Data class field to the new data class for dimension tables.

6. Once you have completed all entries, choose Activate.

When you activate the InfoCube, the tables are created in the tablespaces when you activate the InfoCube.

Changing the Data Class of Aggregates

By default, the aggregate fact tables of an InfoCube are created in the same data class as the fact tables of the InfoCube. To determine the default fact table data class for all subsequently created aggregates of an InfoCube, proceed as follows: ...

1. In your SAP BI system start the Data Warehousing Workbench, choose the required InfoCube and choose Maintain Aggregates…. from the context menu.

The Maintenance for Aggregate screen appears.

2. Choose Extras → Change Data Class for Aggregates.

The Choose Data Class for Aggregates for InfoCube <InfoCube_name> dialog box appears.


11.11.2005 33

3. Choose the new data class for aggregate tables and confirm your entries.


If you want to change the data class of already filled aggregates, you have to deactivate the aggregates first. Then re-activate and fill them again.

4.4.2 Assigning DataStore Objects to Data Classes Use By default, all tables of a DataStore object are created in the tablespaces that were defined in the data class DODS. You can change the default data class of the active table and the data class of the activation queue table as long as the DataStore object does not contain any data.

Procedure 1. In your SAP BI system, start the Data Warehousing Workbench.

2. Create a new DataStore object or choose the required DataStore object and choose Change from the context menu.

The Edit DataStore Object screen appears.


The Maintain Storage Parameters dialog box appears.

4. In the Log. Storage Parameter for DataStore Object (Active Records) area, choose a new data class.

5. In the Log. Storage Parameter for DataStore Object (Activation Queue) area, choose a new data class.

6. Confirm your entries.


4.4.3 Assigning PSA Tables to Data Classes Use By default, PSA tables are created in the data class DODS. You can change the data class of a PSA table if the corresponding DataSource has been assigned to an InfoSource. Once the transfer structure of the InfoSource has been activated, you cannot change the data class of the PSA table.

A PSA table is created when the Transfer Structure of an InfoSource is activated.


34 11.11.2005

Procedure 1. In your SAP BI system, start the Data Warehousing Workbench.

2. Assign a DataSource to an InfoSource object.

The InfoSource <name> Change screen appears.


The Maintain Storage Parameters for PSA Table dialog box appears.

4. Choose the new data class for PSA tables and confirm your entries.

5. To activate the transfer structure, choose Activate.


11.11.2005 35

5 Improving Database Performance with Multi-Dimensional Clustering The following sections describe in detail the integration of the new database performance feature Multi-Dimensional Clustering (MDC) into SAP Netweaver 2004s BI. A technical overview of the feature is provided and it is shown how SAP Netweaver 2004s BI takes advantage of it for the several InfoProviders.

5.1 DB2 Clustering Using clustering, DB2 maintains a physical order of the data records of a table. Records with same key values or key values in the same range are stored physically close together. Clustering can reduce the number of data pages that must be read to process an SQL operation and thus improves the performance of SQL statements.

SAP NetWeaver 2004s BI supports the following two types of DB2 clustering:

• Index clustering

Sorts the records in the key order of an index (clustering index). Clustering indexes improve the performance of range queries that have restrictions on the key or keys of the clustering index.

For more information, see Index Clustering [page 36].

• Multi-dimensional clustering (MDC).

Sorts the records along multiple dimensions and considerably improves the performance of queries that have restrictions on multiple columns.

MDC is especially designed for data warehousing. Tests proved that MDC - compared to index clustering - reduced the query execution time on fact tables of InfoCubes up to 80 % when using multiple MDC dimensions.

MDC offers fast roll-in and roll-out capabilities that you can use to improve the performance of the loading and deleting of larger portions of a table. For example, an OLAP cube contains historical data and you want to delete the data of a past year that is not used anymore in reporting. If MDC is defined on the time dimension, this considerably improves the delete execution time and limits the required log space. Internal tests showed an improvement up to 30% in comparison to index clustering.

For more information, see Multi-Dimensional Clustering [page 37].


36 11.11.2005

5.1.1 Index Clustering Using a clustering index, DB2 maintains the physical order of the data in the key order of the index when records are inserted and updated in a table. Performance is improved with a clustering index because only a portion of a table needs to be accessed and more efficient pre-fetching is performed.

Figure: Table with a Clustering Index and a Non-Clustered Index

The table shown here has two record-based indexes defined:

• Clustering index on Year

As keys are scanned in the index, the corresponding records should be found for the most part on the same or neighboring data pages in the table.

• Non-clustered index on Region

As keys are scanned in that index, the corresponding records are likely to be found on random data pages throughout the table.

Scans on the clustering index will result in better I/O performance and will benefit more from sequential pre-fetching.

Table

Year Clustering Index

Region Un-clustered


11.11.2005 37

Using index clustering has the following drawbacks:

• Clustering is not guaranteed

Since space is filled up in the data pages over time, clustering is not guaranteed. An insert operation attempts to add a record physically close to records that have the same or similar clustering key values. If no space can be found in the ideal location, the record is inserted elsewhere in the table. This will degrade the quality of the clustering over time.

To re-establish the best clustering of a table and to set up data pages with additional free space to accommodate future clustered insert requests, periodic table reorganizations are required.

• Clustering on a single dimension

Only one index can be designated as the clustering index, and all other indexes are non-clustered

5.1.2 Multi-Dimensional Clustering In an MDC table, every unique combination of MDC dimension values forms a logical cell, which can physically consist of one or more blocks of pages. Each data page belongs to exactly one block and all blocks consist of an equal number of pages.

The logical cell will only have enough blocks associated with it to store the records that have the dimension values of this logical cell. If there are no records in the table that have the dimension values of a particular logical cell, no blocks will be allocated for that logical cell.

The size of a block equals to the extent size of the table space, so that block boundaries line up with extent boundaries.

MDC introduces indexes that are block-based. These indexes are much smaller than regular record-based indexes. Thus block indexes consume a lot less less disk space and can be scanned faster.

Block indexes point to blocks or groups of records instead of individual records. They are structurally the same as regular indexes, except that they point to blocks instead of records.

Block indexes are smaller than regular indexes by a factor of the block size multiplied by the average number of records on a page. The number of pages in a block is equal to the extent size of the table space, which can range from 2 to 256 pages. The page size can be 4 KB, 8 KB, 16 KB, or 32 KB.

MDC physically organizes data in a table into blocks according to the clustering values and then accesses these blocks using block indexes.


38 11.11.2005

Features Compared to a clustering index, MDC has the following features:

• MDC allows you to physically cluster a table on more than one key or dimension simultaneously. With MDC, the benefits of single-dimensional clustering, which are described in the previous section, are extended to multiple dimensions or clustering keys.

Performance of queries with predicates on the clustered dimensions of a table is improved because these queries access only those pages that have records with the correct dimension values. The qualifying pages are grouped into blocks which results in most efficient sequential pre-fetching.

• An MDC-table maintains and guarantees its clustering over all dimensions automatically and continuously. Using MDC, you do not have to reorganize MDC tables to restore the physical order of the data.

• MDC supports the typical data warehousing operations of loading data from external source systems (roll-in) and deleting old data that is no longer needed for reporting (roll-out).

• If bulks of data are inserted into separate cells by different processes, locking is reduced to locking MDC blocks instead of single rows. Block locking is enabled for an individual table by setting the LOCKSIZE parameter to BLOCKINSERT in the database system catalog. The default for this parameter is ROW which signifies that each row is locked.

Block locking is automatically enabled in SAP BI where applicable.

• If data is deleted from an MDC cell the blocks that belong to the cell are processed in the following way:

The block is locked. Then a log entry is written and it is marked as rolled-out without actually clearing the data. The block ID is removed from the block indexes. Additionally, for each data page one log record has to be written. If there are record-based indexes defined on the table, the index entries have to be processed in the standard way. Blocks can be re-used as soon as the rollout is committed.


11.11.2005 39

Figure: MDC Table with two Block Indexes

The figure above shows an MDC table that is organized in a way that records with the same Year and Region values are grouped together into separate blocks, or extents. In this case, two block indexes are created, one for the Year dimension and one for the Region dimension. These block indexes only point to the blocks in the table.

A scan of the Year block index for all records with the value Year between 1998 and 1999 will find three blocks that qualify. A data scan of each of these three blocks will return only the records that are between 1998 and 1999, and the data scan will find these records clustered on the sequential pages within each of the blocks.

At the same time, and completely independently, a scan of the Region index for records Region = EAST will find two blocks that qualify. Only the records with Region = EAST will be found in these two blocks.

If there is no single MDC dimension that already contains all dimension columns, a composite block index that contains all columns across all MDC dimensions is also automatically created in addition to the dimension block indexes. This composite block index is used to maintain the clustering of the data over insert and update operations. In SAP BI, the composite block index is always created when more than one MDC dimension is selected.

Block Index

Region Block Index

Table1997 EAST

1997 WEST

1997 WEST

1998 NORTH

1999 EAST

1999 SOUTH

Year

Block


40 11.11.2005

Figure: Blocks Allocated and Block Indexes for an MDC Table

This figure shows the block indexes and those blocks that might have been allocated for the table after a larger amount of data has been inserted. The blocks are represented by blue rectangles and are numbered according to the logical order of allocated extents in the table. The grid represents the logical partitioning of these blocks, and each square represents a logical cell.

You want to insert a new row with Region = EAST and Year = 1999 into a table:

• If there is still enough free space for the row in the blocks that are allocated for rows with Region = EAST and Year = 1999, the new row is inserted into one of the blocks. The block indexes do not have to be updated because no new blocks have to be allocated.

• If there is not enough free space, either an existing empty block is re-used or a new block (extent) is allocated. The row is then inserted into the block and the block indexes are updated with the new block ID.

When the last row of a block is deleted, the block ID is removed from the block indexes so that the block ID can be reused in either the same or in a different logical cell.

When the dimension values of a row are updated so that the row belongs to another logical cell after the update, the row is deleted from the block where it is currently stored and inserted into a block that belongs to its new logical cell.

East North South West

1999

1998

1997 Block 2

Block 14

Block 10

Block 13

Block 3

Block 4

Dimension block index for "Year"

Block 1 Block 5

Block 7

Block 6

Block 12

Block 8

Block 9

Block 11

Dimension block index for

"Region"


11.11.2005 41

Advantages of MDC In addition to the clustering improvements described above, the use of MDC provides the following advantages:

• Probes and scans of block indexes are much faster due to their small size compared to record-based indexes.

• Block indexes and the corresponding organization of data allow fine-grained partition elimination or selective table access.

• Queries that are using the block indexes benefit from the reduced index size, optimized pre-fetching of blocks, and guaranteed clustering of the corresponding data.

• Reduced locking and predicate evaluation is possible for some queries.

• Block indexes have much less overhead associated with them for logging and maintenance because they only need to be updated when adding the first record to a block, or removing the last record from a block.

• Data rolled in can reuse the contiguous space left by data previously rolled out.

5.2 Clustering Support in SAP NetWeaver 2004s Since release SAP Business Information Warehouse 2.0A, index clustering was exclusively used for fact tables of InfoCubes and Aggregates.

As of SAP NetWeaver 2004s Business Intelligence (BI), the support of the new clustering type MDC is added to more objects as follows:

• Persistent Staging Area (PSA)

• Standard DataStore

• DataStore for Direct Update

• Write-Optimized DataStore

• InfoCube and Aggregate.


42 11.11.2005

This section describes the table layout for each object, the indexes that are created on these tables and how MDC is applied to them.

The following figure shows the indexes that can be created on SAP BI tables.

Persistent Staging Area (PSA) A PSA table has all fields of the corresponding transfer structure and a technical key that consists of the following three fields:

• Request ID

• Data package ID

• Record number.

The technical key allows to uniquely address a single row and is the primary key. A request can contain a number of data packages that have a configurable maximum number of records each.

A PSA table can reside in a partitioned tablespace. The partitioning key always is the record number field.

All operations on the PSA are done request-wise, that is, INSERT, READ and DELETE. To generally improve database performance of all operations, the table is organized by request. Therefore, a PSA table is created by default with MDC and has the request ID field as the only MDC dimension. Thus, records that belong to the same request are stored in the same blocks (extents).

Space consumption is minimal because MDC is performed on a single field and it is likely that a request has more records than can be stored in a single block (extent).

To reduce locking during insert and to speed up loading into PSA, the lock size of a PSA table is set to BLOCKINSERT.

You cannot change the clustering settings of a PSA table.

MDC Block Index

Secondary Index

Index name

Secondary Clustering

Index Index name

Table table name

Key Field Field name

Field Field Name

Table with a set of fields. The table name and the field names as they appear in the ABAP dictionary are in italic. The key fields are the fields of the primary key. If the table is a fact table the key fields are the dimension ID fields.

Secondary index and its ABAP dictionary name in italic. The position of the icon indicates on which field the index is created. A compound index can span across several fields.

Secondary index that is the clustering index for this table.

AN MDC block index that is created by the database. It will not appear in the ABAP dictionary.


11.11.2005 43

The following figure shows the default MDC settings of a PSA table.

Standard DataStore A standard DataStore consists of three database tables:

• Activation Queue table

• Change Log table

• Active Table.

The Activation Queue table is a PSA-like table that has fields of all InfoObjects and a technical key that consists of the three fields, that is, request SID, data package ID and record number. The technical key is the primary key. All operations on the Activation Queue table are performed request-wise.

An Activation Queue table can reside in a partitioned tablespace. The partitioning key alwaysis the record number field.

To generally improve database performance of INSERT, READ and DELETE operations on the Activation Queue table, the table is created by default with MDC and has the request SID field as the only MDC dimension. The lock size is set to BLOCKINSERT.

You cannot change the clustering settings for this table.

The Change Log table has the same structure as the Activation Queue table except for the request ID field which replaces the request SID field. The technical or primary key consists of three fields, that is, request ID, data package ID and record number.

An Change Log table can reside in a partitioned tablespace. The partitioning key always is the record number field.

Request ID REQUEST

InfoObject /BIC/…

PSA table /BI0/B0000101000

Default MDC of a PSA table

Package DATAPAKID

Record RECORD

InfoObject /BIC/…

InfoObject /BIC/…

…

MDC Block Index

Unique Primary Key Index ~0


44 11.11.2005

By default the Change Log table is created with MDC and has the request ID field as the only MDC dimension for the same reasons as for the Activation Queue table. The lock size is set to ROW because multiple processes can write into the same request at a time.

The Active Table has the fields of all InfoObjects. The primary key consists of the InfoObject fields that were selected as key fields by the user before activation of the DataStore object.

An Active Table can reside in a partitioned tablespace. The partitioning key always contains all user-selected key fields.

To improve query performance on the Active Table, the user can create secondary indexes or define MDC on InfoObject fields when creating the DataStore object. For more information about defining MDC, see Setting Up MDC for DataStore Objects [Page 51].

MDC dimensions can only be InfoObject fields that were selected as key fields. The lock size is always set to ROW, even if MDC is selected because multiple activation processes can write concurrently into the same MDC logical cell of the Active Table.

By default, no secondary indexes are created and no clustering is used for the Active Table. While you can create additional indexes any time, even if the DataStore object already contains data, you can only change the MDC settings as long as the DataStore object is empty.

The following figures show the default MDC-settings for Activation Queue table, Change Log table and Active Table:

Request ID REQUEST

InfoObject /BIC/…

Change Log table /BI0/B0000801000

Default MDC of the Change Log Table of the DataStore STORE

Package DATAPAKID

Record RECORD

InfoObject /BIC/…

InfoObject /BIC/…

…

MDC Block Index


Request SID SID

InfoObject /BIC/…

Activation Queue table /BIC/ASTORE40

Default MDC of the Activation Queue Table of the DataStore STORE

Package DATAPAKID

Record RECORD

InfoObject /BIC/…

InfoObject /BIC/…

…

MDC Block Index



11.11.2005 45

DataStore for Direct Update A DataStore for Direct Update consists of the Active Table only.

The structure and the clustering of Active Table is exactly the same as for the Standard DataStore.

Write-Optimized DataStore Write-Optimized DataStore consists of a single PSA-like table that has the fields of all InfoObjects and a technical key with the following fields:

• Request ID

• Data package ID

• Record number.

A secondary index, which can either be unique or non-unique, is created for the user-selected key fields. Another non-unique secondary index is created on the technical key fields.

The table can reside in a partitioned tablespace. The partitioning key always contains all user–selected key fields.

To generally improve the database performance of INSERT and DELETE operations, the table is created by default with MDC having the request ID field as the only MDC dimension. Lock size is set to BLOCKINSERT.

You cannot change the clustering settings for the Write-Optimized DataStore.

InfoObject /BIC/…

Active table /BIC/ASTORE00

MDC of the Active Table of the DataStore STORE The DataStore contains four InfoObjects in the primary key and a set of other InfoObjects. MDC has been defined on two InfoObjects in the primary key.

InfoObject /BIC/…

InfoObject /BIC/…

InfoObject /BIC/…

InfoObject /BIC/…

…

MDC Block Index

InfoObject /BIC/…


MDC Block Index


46 11.11.2005

InfoCube An InfoCube consists of the following tables:

• Tables for each dimension

• Fact table (F fact table)

• Compressed fact table (E fact table)

Aggregates that belong to an InfoCube are small InfoCubes by themselves and are discussed later in this chapter.

A dimension table contains a dimension ID field and SID fields for all InfoObjects that are assigned to the dimension. The primary key only contains the dimension ID field.

A dimension table cannot reside in a partitioned tablespace and therefore has no partitioning key. By default, no clustering is used for dimension tables.

The F fact table contains dimension ID fields for all dimensions and fields for all key figures. A primary key is not created for this table. Secondary indexes are created on each dimension ID field except for the unit dimension.

Line item dimensions contain a field with SID-values for the line item characteristic in the F fact table instead of a dimension ID field.

The F fact table can reside in a partitioned tablespace and the partitioning key contains all dimension ID fields except for the request dimension ID field.

Data is loaded and deleted request-wise into and from the F fact table. Reading data is in general not performed request-wise and queries are normally restricted on dimension ID fields.

By default, index clustering is used for the F fact table to improve query performance. To improve performance of queries that contain restrictions on the time dimension (which is in general the case for all queries), the clustering index is the secondary index on the time dimension ID field.

If the InfoCube does not contain any time characteristics, the clustering index is the secondary index on the request dimension ID field. In this rare case, index clustering improves only the performance of DELETE operations while it has no influence on the query performance.

Request ID REQUEST

InfoObject /BIC/…

Write-optimized DataStore table /BIC/ASTORE00

Default MDC of the Write-Optimized DataStore STORE

Package DATAPAKID

Record RECORD

InfoObject /BIC/…

InfoObject /BIC/…

…

MDC Block Index

Non-Unique Index ~RDR

Unique or Non-Unique Index ~KEY


11.11.2005 47

To speed up data INSERT and DELETE operations and to further improve the query performance, you can manually change the clustering to MDC for an InfoCube. For more information about how to define MDC dimensions for an InfoCube, see Setting up MDC for InfoCubes [page 56].

The clustering settings take effect on E fact and F fact tables as follows:

MDC dimensions can be the time dimension ID field and all dimension ID fields of the user-defined dimensions. You cannot select the request dimension ID field. By default, it is always the first MDC dimension on the F fact table. This ensures a better performance of INSERT and DELETE operations for the F fact table.

In place of the time dimension ID field, you can select the time characteristic Calendar Month (0CALMONTH) or Fiscal Period (0FISCPER) as MDC dimension. An additional field is added to the E fact and F fact table that contain the SID values for this characteristic and an MDC block index is created on this field.

If a dimension is selected as an MDC dimension, the secondary index on the dimension ID field is not created. Instead, a database-internal block index is created that does not appear in the ABAP DATA dictionary.

If MDC is selected, the lock size of the table is BLOCKINSERT.

The E fact table has exactly the same fields as the F fact table. It also does not have a primary key. Instead a unique, compound index is created that contains all dimension ID fields. The compound index is called P-index and is the clustering index if index clustering is used. The partitioning key is the same as for the F fact table.

Data is inserted into the E fact table only by using compression. The relation to a request is lost and only selective deletion can be performed.

MDC is applied to the E fact table in the same way as to the F fact table except for the request dimension ID field, which is not added to the MDC dimensions.

The following figures show the clustering settings for F fact tables when you are using MDC or index clustering.


48 11.11.2005

Secondary Index ~020

Multi-dimensional Clustering of InfoCube CUBE The InfoCube contains 3 user-defined dimensions and 2 cumulative key figures.

MDC has been defined on the time characteristic 0CALMONTH. Only the fact tables are shown.








Unique Index~P

Request dimension

KEY_CUBEP

Time dimension

KEY_CUBET

Time charac. /BIC/…

Unit dimension

KEY_CUBEU

User-def. dimension

KEY_CUBE1

User-def. dimension

KEY_CUBE2

User-def. dimension

KEY_CUBE3

Key Figure /BIC/…

F fact table /BIC/FCUBE

Key Figure /BIC/…

MDC Block Index

MDC Block Index

Request dimension

KEY_CUBEP

Time dimension

KEY_CUBET

Key Figure /BIC/…

Unit dimension

KEY_CUBEU

User-def. dimension

KEY_CUBE1

User-def. dimension

KEY_CUBE2

User-def. dimension

KEY_CUBE3

Key Figure /BIC/…

E fact table /BIC/ECUBE

Time charac. /BIC/…

MDC Block Index


Request dimension

KEY_CUBEP

Time dimension

KEY_CUBET

Key Figure /BIC/…

Unit dimension

KEY_CUBEU

User-def. dimension

KEY_CUBE1

User-def. dimension

KEY_CUBE2

User-def. dimension

KEY_CUBE3

Key Figure /BIC/…

F fact table /BIC/FCUBE

Request dimension

KEY_CUBEP

Time dimension

KEY_CUBET

Key Figure /BIC/…

Unit dimension

KEY_CUBEU

User-def. dimension

KEY_CUBE1

User-def. dimension

KEY_CUBE2

User-def. dimension

KEY_CUBE3

Key Figure /BIC/…

E fact table /BIC/ECUBE

Default Index Clustering of InfoCube CUBE The InfoCube contains 3 user-defined dimensions and 2 cumulative key figures. Only the fact tables are shown.

Secondary Clustering

Index ~020








Unique Clustering Index~P


11.11.2005 49

Aggregate An aggregate is a copy of its Base-InfoCube with the reduction of one or several characteristics, dimensions or key figures. The key figures are aggregated for the remaining dimensions.

If an aggregate contains an unchanged dimension of the Base-InfoCube with the same characteristics, the dimension table of the Base-InfoCube is reused by the aggregate rather than a new dimension table is created. The fact tables contain the same dimension IDs for this dimension as the fact tables of the Base-InfoCube.

If an aggregate contains a reduced dimension of the Base-InfoCube which has fewer characteristics, a new dimension table is created.

If the maximum number of dimensions (16) is not exceeded for the aggregate, dimensions of the Base-InfoCube are split into a number of line item dimensions by creating for each characteristic a line item dimension in the aggregate.

The fact tables contain the dimension ID fields of the aggregate dimensions and the fields of the aggregate key figures. The layout of indexes and partitioning keys are the same as for the fact tables of an InfoCube.

The clustering of the aggregate’s fact tables is inherited from the clustering of the Base-InfoCube. If Index Clustering is used the Clustering Index for the F fact table is the secondary index on the time dimension ID field or, if the time dimension does not exist, the secondary index on the request dimension ID field. The Clustering Index for the E fact table is the P-index.

If MDC is defined on the Base-InfoCube, a dimension ID field in the aggregate’s fact table will be an MDC dimension if the dimension is unchanged and points to a dimension of the Base-InfoCube that has been defined as an MDC dimension. If MDC is defined on a time characteristic, the aggregate’s fact tables will only be clustered on the time characteristic if the aggregate contains the time characteristic. If the aggregate does not contain any unchanged MDC dimensions, Index Clustering is used instead. This is also the case when MDC has been disabled for the aggregate in the aggregate maintenance screen by selecting Change Partitioning from the Aggregate menu or the context menu of the aggregate.


50 11.11.2005

The following table summarizes the MDC settings for all objects:

Table MDC dimension(s) Fix MDC dimensions

Clustering Changeable

Locksize

PSA Request ID Yes No Block

Standard DataStore - Activation Queue

Request SID Yes No Block

Standard DataStore – Change Log

Request ID Yes No Row

Standard DataStore – Active

User-defined, only key fields

No Yes Row

DataStore for Direct Update – Change Log

Request ID Yes No Row

DataStore for Direct Update – Active

User-defined, only key fields

No Yes Row

Write Optimized DataStore

Request ID Yes No Block

F Fact Table – InfoCube

Fix: Request Dimension ID field, User-defined: only dimension ID fields and 1 time characteristic (0CALMONTH or 0FISCPER), same as E Fact Table

No Yes Block

E Fact Table – InfoCube

User-defined: only dimension ID fields and 1 time characteristic (0CALMONTH or 0FISCPER), same as F Fact Table

No Yes Block

F Fact Table – Aggregate

Fix: Request Dimension ID field, Inherit MDC dimensions from InfoCube

Yes (Inheritance)

No (Inheritance)

Block

E Fact Table – Aggregate

Inherit MDC dimensions from InfoCube

Yes (Inheritance)

No (Inheritance)

Block

For more information about how to change standard MDC settings, see SAP Note 832621.


11.11.2005 51

5.3 Setting up MDC for DataStore Objects The default setting for the Active Table of a DataStore Object is no clustering. To take advantage of MDC, the user must manually set up the MDC dimensions.

The following sections provide information about:

• Choosing the right MDC dimensions before you set up MDC

• Setting up MDC for DataStore objects manually

5.3.1 Choosing the Right MDC Dimensions for DataStore Objects Before you actually set up MDC in your SAP BI system, you have to make sure that you choose the right MDC dimensions.

The following section provides important background information about:

• Improving performance of queries

• Improving performance of selective deletion

• Limiting disk space consumption

Read this information carefully before you set up MDC manually.

Improving Performance of Queries The response time of a query on an MDC table only improves if the query is restricted on one or more MDC dimensions. You therefore choose MDC dimensions according to the expected queries on the DataStore object as follows:

• You choose dimensions for which restrictions in your queries occur frequently.

Usually, most queries have a restriction on a time characteristic. Therefore, you should select an InfoObject that is a time characteristic. If most of the queries are restricted on the time characteristic 0CALMONTH or 0FISCPER, you should select either of these time characteristics as MDC dimension.

If there are not enough records to fill up MDC blocks, choosing 0CALDAY as MDC dimension can lead to excessive disk space consumption.

• Specify the status numbers of the MDC dimensions in the order of the frequency with which they occur in the queries.

Specify lower status numbers for dimensions that occur more often. If you do not know the frequencies of restrictions or if they are nearly the same, you should sort the MDC dimensions according to their selectivity. That is, choose lower status numbers for dimensions with a larger cardinality.


52 11.11.2005

Improving Performance of Selective Deletion The performance of DELETE statements is improved by the fast MDC roll-out capability that flags MDC blocks when they are deleted and releases the data pages later on.

If you have selected an InfoObject as an MDC dimension, you do not have to create an additional secondary index on the InfoObjec because an MDC block index is already created. DELETE operations benefit from the reduced maintenance of less secondary indexes.

Limiting Disk Space Consumption The more MDC dimensions are selected the more granular the blocks get and are thus likely to be sparsely filled. For each value combination, which occurs in one or more data row, in the MDC dimensions at least one block is reserved. If the table is distributed over several partitions, the different value combinations can occur on more than one partition. In this case, blocks are allocated on all partitions on which a value combination occurs. If there are only a few rows for each value combination on each partition, a bigger portion of each block remains empty and the number of allocated data pages is larger than required. This does not only lead to a waste of disk space, but it also has a negative impact on the performance of queries because many partially filled data pages have to be read instead of a few that are filled.

To reduce the risk of sparsely filled MDC blocks, the maximum number of MDC dimensions, that you can select, is limited to three. Choose the MDC dimensions according to the expected cardinalities of the key InfoObjects of the DataStore object (for example, you choose InfoObjects with a low cardinality). MDC dimensions are key InfoObjects of the DataStore object. The cardinality equals the number of different values for the InfoObject. Therefore, you should select InfoObjects with only a small number of characteristics values.

The block size is determined by the page size and the extent size of the tablespace. By default, the page size of DataStore data is 16 KB. The default extent size in SAP NetWeaver 2004s BI is 2 (in previous releases it was 16). With a page size of 16 KB and an extent size of 2, 32 KB would be reserved for each block. The number of bytes for each record in a DataStore Active Table depends on the number and bytes of its InfoObjects.

A DataStore object has 5 key InfoObjects and 10 data InfoObjects. All InfoObjects have a data type of CHAR and a length of 10. Each record needs 5 * 10 + 10 * 10 bytes = 150 bytes. About 218 records would fit into one block. If you have selected the time characteristic 0CALMONTH and one additional InfoObject for MDC there should be at least 150 records per calendar month and value of the second InfoObject to optimally use the allocated space in the Active Table.

Assume that each MDC value combination occurs on every partition. If the Active table is distributed over four database partitions, there should be at least 150 records per calendar month and value of the second InfoObject on each partition, that is, 600 rows altogether. Thus the allocated space of the Active table is optimally used.

You can estimate the number of logical cells, for which blocks will be allocated, and the number of records in the logical cells on each database partition. This estimation is based on the expected number of records in the table, the cardinalities of the MDC dimensions, and the correlation between the MDC dimensions.

The following example describes how the correlation of MDC dimensions can lead to a reduction of disk space consumption:

DataStore object DS1 is expected to have 100,000,000 rows. The Active table is distributed over four database partitions. DS1 contains the following key InfoObjects with the following cardinalities:


11.11.2005 53

InfoObject Cardinality

Customer 10,000

Material 5,000

Sales organization 10

Calendar day 1,825

Calendar month 60

If customer and calendar day are selected as MDC dimensions and they are not correlated there will be a maximum of 10,000 * 1,825 = 18,250,000 logical cells. If each cell exists on every partition, there will be at least 18,250,000 * 4 = 73,000,000 MDC blocks allocated. On average there would be 1.3 records per MDC block per partition, which is not enough to fill an MDC block. If most of the queries contained restrictions on customer and calendar day, a lot of space is wasted.

If, however, the selected MDC dimensions were correlated, the number of logical cells, for which records exist, could be much smaller and thus the number of records in the logical cells much higher. In this case, no space or a lot less space would be wasted.

5.3.2 Setting Up MDC for DataStore Objects Manually 1. In your SAP BI system, call transaction RSA1.

2. In the tree structure, navigate to an InfoArea where you either want to create a new DataStore object or edit an existing one.

3. From the context menu, choose either Edit DataStore Object or Create New DataStore Object, depending on the choice you have made in the previous step.

The Edit DataStore Object screen appears.

4. Define the key fields and the data fields of the DataStore object.


54 11.11.2005

5. Choose Extras → DB Performance → Clustering.

The Multi-Dimensional Clustering dialog box appears.

If you are editing an existing DataStore object and it already contains data, you cannot change the clustering settings. To set up or change clustering of an active DataStore object that contains data, you first have to recluster it. To do so, follow the instructions described in Reclustering [page 62].

Otherwise, proceed with the next step.

6. To select MDC dimensions from the InfoObjects that belong to the key of the DataStore object, you have to enter a status number in the Stat.No. field.

The status number indicates that a dimension is selected as an MDC dimension. The status numbers that you enter must be consecutive.

You cannot select - and thus enter - more than three MDC dimensions.


11.11.2005 55

Examples The following examples provide information about MDC-selections that you can use for DataStore objects.

MDC with Only One Time Characteristic Your DataStore object contains at least one InfoObject that is a time characteristic, for example 0CALMONTH. In the Stat.No. field of this InfoObject that is a time characteristic, enter 1. Leave the Stat.No. fields of all others empty.

This improves the performance of all queries that are restricted on the time characteristic. The disk space consumption depends on the number of records with the same value for the time characteristic. If there are enough records in each MDC logical cell to fill at least one extent to 90% or more, disk space consumption will not become a problem.

The advantages of this MDC selection are as follows:

• Improved query performance compared to using a secondary index on the time characteristic

• Fast roll-out capability of MDC table

Selective deletion of data from your DataStore object, which is based exclusively on the time characteristic values, is faster than without MDC.

• Reduced maintenance of secondary indexes during INSERT and DELETE operations

Compared to using an additional secondary index on the time characteristic field, the MDC block index is smaller and easier to maintain.

You use this MDC-selection in the following case:

• You have queries with restrictions on the time dimension

• You perform selective deletion with restrictions on the time characteristic values

• You know that the DataStore object will contain enough records for each time dimension value to fill one or more extents.

If you have several time characteristics in your DataStore object, you should select the one that occurs in complex queries more often, is used for selective deletion and guarantees that the allocated extents can be filled to a reasonable level.

MDC with Two Characteristics

In the Multi-Dimensional Clustering dialog box, select an InfoObject that is a time characteristic by entering the value 1 in the Stat.No. field. Select any one other characteristic by entering the value 2 in the Stat.No. field.

This considerably improves the performance of queries that are restricted on both selected MDC dimensions. If the MDC-selection is not appropriate to the actual data distribution in the MDC dimensions, the disk space consumption can be considerably higher compared to MDC with only the time characteristic or without MDC.

The advantages of this MDC-selection are:

• Improved performance of queries that are restricted on the time characteristic or the additional characteristic or even both.

• Fast roll-in, roll-out capability of MDC table

If you have a good knowledge of the expected queries and the cardinalities of the characteristics in the DataStore object, you use this MDC-selection.


56 11.11.2005

5.4 Setting up MDC for InfoCubes The default setting for InfoCubes is index clustering. To be able to take advantage of MDC, you must manually set up MDC dimensions for InfoCubes.

The following sections provide information about:

• Choosing the right MDC dimensions before you set up MDC

• Setting up MDC for InfoCubes manually

5.4.1 Choosing the Right MDC Dimensions for InfoCubes Before you set up MDC in your SAP BI system, you have to make sure that you choose the right MDC dimensions. The following section provides important background information about:

• Improving performance of queries

• Improving performance of INSERT and DELETE statements

• Limiting disk space consumption

Read this information carefully before you set up MDC manually.

Improving Performance of Queries The response time of a query on an MDC table only improves if the query is restricted on one or more MDC dimensions. Therefore, you choose MDC dimensions according to the expected queries on the InfoCube taking the following into consideration:

• You choose dimensions for which restrictions in your queries occur frequently.

Usually, all queries have a restriction on the time dimension. Therefore, you should select a time dimension.

If most of the queries are restricted on the time characteristic 0CALMONTH or 0FISCPER, you should choose this time characteristic as MDC dimension.

If you expect more granular queries, you should select the time dimension ID field as MDC dimension.

For more information, see Setting Up MDC dimensions for InfoCubes Manually [page 59].

• Specify the status numbers of the MDC dimensions in the order of the frequency with which the dimensions occur in the queries.

Specify lower status numbers for dimensions that occur more often. If you do not know the frequencies of restrictions or if they are nearly the same you should sort the MDC dimensions according to their selectivity. That is, choose lower status numbers for dimensions with a larger cardinality.


11.11.2005 57

Improving Performance INSERT and DELETE Statements The performance of INSERT statements is improved by the fast roll-in capability of MDC tables. The fast roll-in capability reduces the amount of information that is written to the database log files.

The performance of DELETE statements is improved by the fast roll-out capability of MDC tables. The fast roll-out capability flags MDC blocks as deleted and releases the data pages later on.

Additionally, both operations benefit from the reduced maintenance of less secondary indexes. All these improvements apply for any MDC dimensions selections.

Limiting Disk Space Consumption To reduce the risk of sparsely filled MDC blocks, the maximum number of MDC dimensions is limited to three. Choose the MDC dimensions according to the expected cardinalities of the InfoCube’s dimensions and characteristics:

• Choose dimensions with a low cardinality

MDC dimensions are dimension ID fields of the InfoCube. The cardinality equals the number of distinct combinations of the characteristics SID-values in the dimension. Therefore, you should select dimensions with only one or a small number of characteristics and with only a small number of distinct characteristics SID values.

• If a characteristic occurs often in restrictions of queries and has a low cardinality, define a line item dimension for this characteristic and select it as an MDC dimension.

For example, choosing characteristics like country or material group, which are normally part of larger dimensions (like customer or material), can lead to unnecessary high disk space consumption. An MDC dimension country or material group with a significantly lower cardinality limits the space consumption and at the same time improves the query performance. This is opposed to the general rule that line item dimensions are characteristics with a high cardinality.

An InfoCube has three standard dimensions, five user-defined dimensions, and 20 key figures of data type decimal. Each record needs 3 * 4 + 5 * 4 + 20 * 9 bytes = 212 bytes. If the tablespace where the table resides has the default page size 16 KB and the default extent size 2, the block size would be 32 KB and about 154 records would fit into one block.

If you have selected the time characteristic 0CALMONTH and one of the customer-defined dimensions for MDC, there should be at least 150 records per request, calendar month and dimension ID of the customer-defined dimension to optimally use the allocated space in the F fact table. For the E fact table this is applied per calendar month and dimension ID of the customer-defined dimension.

If the table is distributed over four database partitions, per partition there should be at least 150 records per request, calendar month and dimension ID of the customer-defined dimension in the F Fact table. For the E fact table this is applied per calendar month and dimension ID of the customer-defined dimension per partition.

You can estimate the number of logical cells for which blocks will be allocated and the number of records in the logical cells on each database partition. This estimation is based on the expected number of records in the table, the cardinalities of the MDC dimensions, and the correlation between the MDC dimensions.


58 11.11.2005

The following example describes how the correlation of MDC dimensions can lead to a reduction of disk space consumption:

InfoCube IC1 is expected to have 100,000,000 rows in the E fact table. The fact tables are distributed over four database partitions. IC1 contains the following dimensions with the following cardinalities:

Dimension / Characteristic Cardinality

Customer 10,000

Material 5,000

Sales organization 10

Package dimension 200

Time dimension 1,825

Time characteristic calendar month within time dimension 60

If the customer and the time dimension are selected as MDC dimensions and they are not correlated there will be a maximum of 10,000 * 1,825 = 18,250,000 logical cells for the E fact table. If each cell exists on every partition, there will be at least 18,250,000 * 4 = 73,000,000 MDC blocks allocated. On average, per partition there would be 100,000,000 / 73,000,000 = 1.4 records per MDC block, which is not enough to fill a block.

If most of the queries contained restrictions on customer and calendar day, a lot of disk space is wasted. However, depending on the correlation of the two MDC dimensions, only a certain percentage of these logical cells can contain rows. In this case, with the same total number of rows in the table, you can estimate a larger number of records per MDC block. Thus, a lot less space is probably wasted.

In addition, the F fact table uses the package dimension for MDC. Theoretically this leads to a maximum of 10,000 * 1,825 * 200 = 3,650,000,000 logical cells. However, because we can assume that the time and the package dimension values are correlated, the maximum number of logical cells will not be reached.


11.11.2005 59

5.4.2 Setting Up MDC for InfoCubes Manually 1. In your SAP BI system, call transaction RSA1.

2. In the tree structure, navigate to an InfoArea where you either want to create a new InfoCube or edit an existing one.

3. From the context menu, choose either Edit InfoCube or Create InfoCube, depending on the choice you have made in the previous step

The Edit InfoCube screen appears.

4. Define the dimensions and key figures of the InfoCube.

5. Choose Extras → DB Peformance → Custering.

The Selection of Clustering dialog box appears.

If the InfoCube already contains data, the dialog box will be inactive. To set up or change the clustering settings of an active InfoCube that contains data, you have to recluster it. To do so, follow the instructions described in Reclustering [page 62].

Otherwise, proceed with the procedure described here.

6. If you want to define index clustering for the InfoCube and all its aggregates, choose Index Clustering and confirm your entry.

When the InfoCube is activated or reclustered, the fact tables will be created with a clustering index. On the F fact table, the secondary index on the time dimension ID field will be the clustering index. On the E fact table, the unique P-index will be the clustering index.

7. If you want to define MDC for the InfoCube and all its aggregates, choose Multi-Dimensional Clustering.


60 11.11.2005

The Multi-Dimensional Clustering dialog box appears:

In the Selected Column field of the Time Dimension group box, you exclusively choose a time characteristic or the time dimension ID field.

In the Char. Dimension group box, you specify one or more user-defined dimensions.

The status number indicates that a dimension is selected as an MDC dimension. If you select an item in the Selected Column field, the inactive field Status Number is automatically filled with 1.

As a result, the selected column will always be the first MDC dimension. If the Selected Column field is empty, the status number is 0. In the Char. Dimension group box you can manually add a number in the Stat. No. field of a dimension. The status numbers must be consecutive. No more than three MDC dimensions can be selected including the time dimension.


11.11.2005 61

You proceed as follows:

To define the time dimension ID field as an MDC dimension on both fact tables, choose Dim. ID of Time Dimension in the Selected Column field. When the InfoCube is activated or reclustered, the fact tables will not contain an additional SID field for the time characteristic 0CALMONTH or 0FISCPER. The secondary index on the time dimension ID field will not be created. Instead, an MDC block index will be created.

To define a time characteristic as an MDC dimension on both fact tables, choose Fiscal year/period or Calendar year/month in the Selected Column field. A time characteristic will only appear in the drop-down list if it exists in the InfoCube. When the InfoCube is activated or reclustered, an additional field for the SID value of the time characteristic is added to the F fact table and E fact table. A block index will be created on this additional field.

To define a user-defined dimension as an MDC dimension on both fact tables, add a number in the Stat. No. field in the box Char. Dimension for the dimension. When the InfoCube is activated or reclustered, the secondary index on the dimension ID field will not be created. Instead, an MDC block index will be created.

Examples The following examples provide information about MDC-selections that you can use for InfoCubes

MDC-Selection with One Time Dimension Only

In the Selected Column field, choose Dim. ID of Time Dimension. Leave all Stat. No. fields in the box Char. Dimension empty.

In general, this improves the performance of all queries that are restricted on an arbitrary time characteristic. The query performance will be at least as good as if you are using Index Clustering, which organizes the data by the time dimension, too. The disk space consumption will be similar to Index Clustering because the F fact table is clustered on the MDC dimensions request and time, which are often correlated. The E fact table is clustered on the single MDC dimension time. The advantages of this MDC-selection are as follows:

• Improved query performance

Internal tests showed an average improvement by 25% compared to index clustering

• Fast roll-in, roll-out capability of MDC table

• Reduced maintenance of secondary indexes during INSERT and DELETE operations

• One secondary index is replaced by an MDC block index

Use this MDC-selection if you do not know about the queries and the data distribution in the InfoCube’s dimensions beforehand and thereby cannot adapt the MDC dimensions to a given workload. This MDC-selection can be generally used for any kind of time-related InfoCubes.


62 11.11.2005

MDC-Selection with One Time Characteristic Only

In the Selected Column field, choose Fiscal year/period or Calendar year/month. Leave all Stat. No. fields empty.

This improves particularly the performance of queries that are restricted on the selected time characteristic. The disk space consumption will be similar to index clustering because the F fact tables is clustered on two correlated MDC dimensions and the E fact table on a single MDC dimension. The advantages of this MDC selection are as follows:

• Improved performance of queries restricted on this time characteristic

Internal tests showed an average improvement by 15% for this type of queries compared to the MDC selection with one time dimension only.

• Fast roll-in, roll-out capability of MDC tables

If you know that most of the queries will restrict on this time characteristic, you use this MDC-selection.

MDC-Selection with a Time Characteristic and a User-Defined Dimension

In the Selected Column field, choose Fiscal year/period or Calendar year/month. Choose a user-defined dimension by entering the value 2 in the corresponding Stat. No. field.

This considerably improves the performance of queries that are restricted on these selected MDC dimensions. If the MDC-selection is not appropriate to the actual data distribution in the MDC dimensions, the disk space consumption can be considerably higher compared to when using index clustering.

The advantages of this MDC-selection are as follows:

• Improved performance of queries that are restricted on this time characteristic and user-defined dimensions,

Internal tests showed an improvement up to 80% for this type of queries compared to index clustering

• Fast roll-in, roll-out capability of MDC tables

If you have a good knowledge of the expected queries and the cardinalities of characteristics and dimensions in the InfoCube, you use this MDC-selection.

5.5 Reclustering InfoCubes and DataStore Objects You recluster InfoCubes and DataStore objects in the following cases:

• To apply MDC to InfoCubes and DataStore objects that you created before MDC was supported by SAP NetWeaver 2004s BI.

• To change the clustering settings if you have defined MDC for an InfoCube or DataStore object that requires too much space for the fact tables or the Active Table.

• To change the clustering settings in a way that the queries that you run on your InfoCubes or DataStore objects are better supported.

You cannot recluster write-optimized DataStore objects and PSA tables.


11.11.2005 63

During the process of reclustering, target tables are created. These target tables are copies of the fact tables of InfoCubes and the Active Table of DataStore objects and contain the new clustering settings. The data is then copied from the source tables to the target tables. Afterwards, the target tables are renamed with the name of the original source tables and the original source tables are renamed so that you can store them for security purposes.

For InfoCubes, the clustering settings of their aggregates are adapted to the clustering settings of the InfoCube. That is, if you select MDC for an InfoCube, the aggregates will be MDC-clustered as well. If you select index clustering the aggregates will also be index clustered. Using index clustering requires that you first deactivate the aggregates and then re-activate them to be refilled.

You should be aware that reclustering requires copying the data to a new table with the new clustering settings. Copying the data may take a long time.

Before you start to recluster DataStore objects and InfoCubes, we strongly recommend that you perform a complete backup of the database.

During reclustering, a WRITE lock is acquired for the InfoCube or DataStore object so that operations that ADD, DELETE, or CHANGE data in these objects are blocked. While the source and target tables are renamed, a READ lock, which blocks all reporting functions, for InfoCubes is acquired. For DataStore objects, the READ lock is not acquired. You have to make sure that no reporting or delta updates into other InfoProviders are executed.

5.5.1 Reclustering of InfoCubes Purpose If you want to use MDC clustering for the InfoCube and the InfoCube’s fact tables reside in a tablespace with a large extent size, you should define a new tablespace with extent size 2 for the fact tables that are clustered with MDC. Thus you avoid that the MDC clustered InfoCube occupies too much disk space.

In releases prior to SAP NetWeaver 2004s, the default Fact tablespace was created with an extent size of 16 pages. If your InfoCube resides in the default Fact tablespace or a tablespace with the same extent size, at least 16 pages or a multiple of 16 pages will be allocated for each MDC logical cell. If there are not enough rows to fill up the extents, this can lead to a waste of space.

If you create a new tablespace with extent size of 2 pages, only 2 pages or a multiple of 2 pages will be allocated for each MDC logical cell. This also applies to the InfoCube’s aggregates because aggregates have fewer rows than InfoCubes.

If you want to use index clustering for the InfoCube, it is not necessary to create new tablespaces with extent size 2.


64 11.11.2005

Process Flow 1. Optionally, you create a new data class [page 31] for the InfoCube and for the

InfoCubes aggregates and new data and index tablespaces [page 27] for the data classes.

2. You create and schedule a new reclustering job for the InfoCube [page 64].

5.5.1.1 Creating and Scheduling a Reclustering Job for an InfoCube

1. In your SAP BI system call transaction RSA1.

2. Navigate to the InfoCube for which you want to create and schedule a reclustering job.

3. From the context menu choose Additional Functions → Reclustering.

The Reclustering of InfoProviders dialog box appears.

4. To use new data classes for the InfoCube or its aggregates, enter the data class names in the Target Data Class field.

5. To specify the size of data packages to be copied in one transaction from the source tables to the target tables, enter the required value in the Commit Size (MB) field.

The default value is 300 MB.

6. To continue, choose Initialize.

The Confirmation of Database Backup dialog box appears.

If you already performed a backup of your database, choose Yes.


Continue with step 7.

If you have not yet performed a backup of your database, exit this procedure by choosing No and perform the database backup. Start again with step 1.

7. Choose the clustering method that you want to use and continue as described in Setting Up MDC for InfoCubes Manually [page 59], step 5.

When selecting MDC, you can estimate the MDC disk space consumption. The estimation is based on the data currently stored in the InfoCube. To check, choose Execute Space Check before confirming your entries. The percentage of free space for the fact tables and the aggregates will be estimated and displayed in a popup window. If the percentage of free space is higher than 33% you should consider selecting different MDC dimensions.

8. After you have entered all required values in the clustering dialogs, a pop-up window displaying a message that a new reclustering request has been created appears.


11.11.2005 65

9. To continue, confirm the pop-up.

The Start Time dialog box appears.

10. Schedule the reclustering job as a batch job. The system will inform you that the job

has been scheduled or started.

You can monitor the job using transaction SM37 or using the SAP BI request monitor. Invoke the request monitor by choosing Monitor on the Reclustering of InfoProviders screen.

The batch job for reclustering of InfoCubes consists of the following steps:

1. Creation of target tables for the E and the F fact table of the InfoCube.

2. Copy of data from the source tables to the target tables.

3. Creation of the required secondary indexes on the target tables.

4. Setting of a READ lock on the InfoCube to block READ operations while old and new fact tables are being exchanged.

5. Deactivation of the aggregates.

6. Dropping of the UNION ALL view on the old E and F fact table.

7. Consistency check of the copied data.


66 11.11.2005

8. Exchange of the target tables and the original source fact tables. The original source fact tables are saved under the name of SAP BI temporary tables. The target tables are renamed to the names of the original source fact tables.

9. Recreation of the UNION ALL view on the new fact tables.

10. Adaptation of the SAP BI metadata of the InfoCube and reactivation of the transfer structures.

11. Collection of statistics on the new fact tables.

12. Removal of the READ lock.

13. Reactivation of the aggregates (this will adapt their layout to the new clustering settings of the InfoCube).

14. Cleanup.

5.5.2 Reclustering of DataStore Objects Purpose The following sections describe the steps that you have to perform to recluster DataStore objects.

You can only recluster standard DataStore objects and DataStore objects for direct update. Reclustering cannot be applied to write-optimized DataStore objects.

If you want to use MDC clustering for the DataStore object and the DataStore object’s Active Table resides in a tablespace with a large extent size, you should define a new tablespace with extent size 2 for the Active Table that is clustered by MDC. Thus, you avoid that the MDC clustered DataStore object occupies too much disk space.

In releases prior to SAP NetWeaver 2004s, the default ODS/PSA tablespace was created with an extent size of 16 pages. If your DataStore object resides in the default ODS/PSA tablespace or a tablespace with the same extent size, at least 16 pages or a multiple of 16 pages will be allocated for each MDC logical cell.

If there are not enough rows to fill up the extents this can lead to a waste of disk space. If you create a new tablespace with extent size of 2 pages, only 2 pages or a multiple of 2 pages will be allocated for each MDC logical cell.

Process Flow 1. Optionally, you create a new data class [page 31] for the Active Table of the

DataStore object and a new data and index tablespace [page 27] for the data class.

2. You create and schedule a new reclustering job for the DataStore object [page 67].


11.11.2005 67

5.5.2.1 Creating and Scheduling a Reclustering Job for DataStore Objects

1. In your SAP BI system call transaction RSA1.

2. Navigate to the DataStore object for which you want to create and schedule a reclustering job.

3. From the context menu choose Additional Functions → Reclustering.

The Reclustering of InfoProviders dialog box appears.

4. To define new data classes for the DataStore object, enter the data class names in the Target Data Class field.

5. To specify the size of data packages to be copied in one transaction from the source table to the target table, enter the required value in the Commit Size (MB) field.

The value that you enter here restricts the size of the units of work so that log overflows do not occur

The default value is 300 MB.

6. To continue, choose Initialize.

The Confirmation of Database Backup dialog box appears.

If you already performed a backup of your database, choose Yes.


Continue with step 7.

If you have not yet performed a backup of your database, exit this procedure by choosing No and perform the database backup. Start again with step 1.

7. Choose the clustering method that you want to use and continue as described in Setting Up MDC for DataStore Objects Manually [page 59], step 5.

When selecting MDC, you can estimate the disk space consumption. The estimation is based on the data that is currently stored in the DataStore object. To check, choose Execute Space Check before confirming your entries,

The percentage of free space for the Active Table will be estimated and displayed in a popup window. If the percentage of free space is higher than 33% you should consider selecting different MDC dimensions.

8. After you have entered all required values in the clustering dialog, a pop-up window displaying that a new reclustering request has been created appears.


68 11.11.2005

9. To continue, confirm the pop-up.

The Start Time dialog box appears.

10. Schedule the reclustering job as a batch job. The system will inform you that the job

has been scheduled or started.

You can monitor the job using transaction SM37 or using the SAP BI request monitor. Invoke the request monitor by choosing Monitor on the reclustering screen.


11.11.2005 69

The batch job for reclustering of InfoCubes consists of the following steps:

1. Creation of the source table of the Active Table

2. Copying the data to the target table

3. Creation of the secondary indexes on the target table

4. Consistency check of the copied data

5. Exchange of the target table and the original source Active Table

6. Modification of the SAP BI metadata of the DataStore object

7. Collection of statistics on the new Active Table

8. Cleanup

During reclustering of DataStore objects no READ lock is set because this is not supported. Make sure that you do no reporting on the DataStore object and no delta updates into other InfoProviders while reclustering is running.

5.5.3 Monitoring and Maintaining Reclustering Jobs To monitor a reclustering job for an InfoCube or DataStore object, choose Reclustering from the context menu. In the Reclustering of InfoProviders screen select Monitor.

For each reclustering job the steps are displayed as follows:

• Steps that have already been completed are displayed with a green icon.

• Steps that are currently being executed are displayed with a yellow icon.

• Steps that have not yet been started are displayed with a white icon.

If a step fails, it is displayed with a red icon. The reclustering request aborts in this case and is displayed with a red icon, too.

To display the details of a reclustering step, double-click it. In the right frame, the details of the step and their start and end times are displayed.

Deleting Reclustering Jobs from the Monitor You delete a reclustering job from the monitor by choosing Delete Request from the context menu of the selected the InfoCube or DataStore object. The request is removed from the monitor.

Do not delete aborted requests until you are sure that the InfoCube or DataStore object is consistent. If you delete a reclustering request, all tables remain in the current state. No action is taken to restore the InfoCube or DataStore object to the state before reclustering.


70 11.11.2005

Resetting Aborted Reclustering Jobs You reset an aborted reclustering request by choosing Reset Request from the context menu of the selected InfoCube or the DataStore object. Locks on the InfoCube or DataStore objects are removed and the target tables are deleted.

Restarting Aborted Reclustering Jobs You restart an aborted reclustering job by choosing Restart Request from the context menu of the selected InfoCube or DataStore object after you have removed the cause of the failure. The reclustering job will continue with the step that failed.

Restarting a Failed Reclustering Job Step You restart a step of a reclustering job that failed by choosing Restart Step from the context menu of the selected InfoCube or DataStore Object after removing the cause of the failure. The reclustering job will continue the execution of the failed step and, if this is successful, proceed with the execution of the remaining steps.

5.6 Checking the Disk Space Consumption of MDC InfoProviders Use You use the new check function in transaction RSRV to check the disk space consumption of InfoCubes and DataStore objects that are using MDC.

Procedure 1. In your SAP BI system, call transaction RSRV.

The Analysis and Repair of BI Objects screen appears.

2. Choose Tests in Transaction RSRV → All Elementary Tests → Database.

3. Drag and drop test Check space consumption of MDC InfoProviders in DB2/UDB into the frame on the right side.

4. Double-click it and specify the name of the InfoProvider in the dialog box that appears.

5. To start the test, choose Execute.

If the InfoProvider is an InfoCube that uses MDC, the percentage of free disk space for the fact tables and the Aggregate fact tables will be determined based on the database statistics for these tables.

If the InfoProvider is a DataStore object that uses MDC, the percentage of free space for the Active table will be determined based on the database statistics for this table.

Make sure that the database statistics are up-to-date before executing the test. If more than 33% of the space allocated is free space, you should consider reclustering the InfoProvider.


11.11.2005 71

6 Adding New Partitions to the Database Purpose The following section describes the steps that are necessary if you want to add new partitions to the database.

Process Flow

Before you start adding partitions, we strongly recommend that you perform a complete backup of the database.

...

1. You create new partitions and perform all necessary steps as described in Preparing New Partitions [Page 72].

2. If you want to add new, large InfoCubes, DataStore objects, or PSA tables to the system and if you want to create new database partition groups for these objects, proceed with Creating Large InfoCubes, DataStore Objects, and PSA Tables [Page 22].

3. If you want to redistribute InfoCubes, DataStore objects, or PSA tables to the new partitions, you can either perform this step by using the table conversion report DB6CONV or using the DB2 REDISTRIBUTE DATABASE PARTITION GROUP command:

Report DB6CONV

You create new database partition groups and new tablespaces and move the tables to the new tablespaces.

You use report DB6CONV for the conversion of single tables. It is well suited for single, large InfoCubes, DataStore, and PSA objects in separate tablespaces. For each table, a background job in the SAP system is scheduled. For a large number of tables, the manual effort for scheduling all conversions can be high.

For BI tables, report DB6CONV supports the conversion option not logged initially. The conversion is not written to the log files, that is, this conversion method saves execution time and log space. However, you must perform a complete backup after the conversion has finished.

For detailed information about report DB6CONV, see SAP Note 362325. If you want to use this report, see Redistributing the Data with DB6CONV [Page 75].

DB2 REDISTRIBUTE

You add the new partitions to the existing database partition groups using the Partition Integration Wizard of the DBA Cockpit and redistribute all tables with the DB2 REDISTRIBUTE command.

You should use the DB2 REDISTRIBUTE command for tablespaces with a large number of tables. DB2 REDISTRIBUTE requires free log space of at least the amount of the largest distributed table since each table is redistributed in one unit of work. Therefore, you might have to increase the active log space.

If you want to use this command, see Redistributing the Data with DB2 REDISTRIBUTE [Page 80].


72 11.11.2005

When partitions are added to an existing database partition group and tables exist in this group, the new partitions are not added automatically to the partition map of the group. In this case, you must use the DB2 REDISTRIBUTE command because only this command is able to update partition maps. You cannot use report DB6CONV in this case.

For the redistribution with DB6CONV, you have to create a new database partition group with a partition map that includes all new partitions and a new tablespace in this group. The partitioned tables are then moved to the new tablespace created in the new database partition group.

No matter which option you use, be aware that tables cannot be accessed during the whole redistribution process.

6.1 Preparing New Partitions To prepare new partitions, proceed as follows: ...

1. Create additional partitions [].

2. Update the database configuration for each new partition [Page 73].

6.1.1 Creating Additional Partitions Use You use one of the following procedures to create additional partitions in a multi-partition environment, that is you:

• Add new database partition servers to a multi-partition installation

• Create additional logical partitions on already existing database partition servers

Procedure

Adding New Database Partition Servers

To add new database partition servers or new database partitions to a multi-partition installation, you use the SAP installation tool SAPinst.

For more information, see the SAP documentation Installation Guide – SAP NetWeaver 2004s<ABAP|ABAP+Java |Java> on <OS>: IBM DB2 Universal Database for UNIX and Windows, section Installing SAP BW on Multiple Database Partition Servers. This documentation is included in the installation package or available on SAP Service Marketplace at

service.sap.com/instguidesNW2004s → Installation.


11.11.2005 73

6.1.2 Updating the Database Configuration for Each New Partition Use Once you have created additional partitions, you might have to update the database configuration.

In a multi-partition installation, each partition has its own database configuration. The database configuration parameters of a new partition are set to the values that we recommend in SAP Note 584952. If you want to change these values you have to update the database configuration.

Update the parameters according to the SAP Note 584952 (DB2 UDB Version 8.1) and your experiences. A general recommendation in addition to the mentioned SAP Note cannot be given.

Prerequisites Before you update the database configuration parameters, you can check the database configuration of a partition using one of the following commands:

• You can check the database configuration by entering the following CLP command: db2_all “<<+<partition_number>< db2 get database configuration for <database_name>”

• You can also check the database configuration by setting the environment variable DB2NODE to the desired partition number. To do so, enter the following commands:

set DB2NODE=<partition_number>

db2 get database configuration for <database_name>

After setting the DB2NODE environment variable, all DB2 commands are executed on the specified partition.


74 11.11.2005

Procedure To update the database configuration parameters, we recommend that you write an SQL script that you execute on each new partition:

1. Log on to the instance-owning database partition server as user db2<dbsid>.

2. To create the SQL script in file <dbconfig_file>, add the following line for each parameter: update database configuration for <database_name> using <parameter_name> <value>;

It is important that you make sure to set the parameter NEWLOGPATH to <log_path>.

3. To update the database configuration, enter the following command for each new partition: db2_all “<<+<partition_number>< db2 –tf <dbconfig_file>”

4. To stop and restart the database manager, use the following commands: db2stop

db2start.

5. To connect to the database, enter the following command: db2 connect to <database_name>

This may take some time since the log files for all new partitions are created in the specified log directory.


Variable Description

<partition_number> Partition number of the new partition (If four partitions 0, 1, 2, 3 exist, the number is 4)

<dbconfig_file> Name of the file created by the user to update the database configuration parameters

<log_path> SAP standard log path:

• UNIX: “/db2/<SAPSID>/log_dir/”

• Windows: “<drive>:\db2\<SAPSID>\log_dir\”


11.11.2005 75

6.2 Redistributing the Data with DB6CONV Use As described in Adding New Partitions to the Database [Page 35], you use report DB6CONV to convert tables and redistribute data over multiple partitions.

Procedure To redistribute the data with DB6CONV, you proceed as follows: ...

1. Add containers to temporary tablespaces on each new partition [].

2. Create new database partition groups [Page 76].

3. Create new tablespaces in the new database partition groups [Page 77].

4. Check the partitioning keys of all partitioned tables [Page 77].

5. Redistribute all partitioned tables [Page 79].

6.2.1 Adding Containers to Temporary Tablespaces on Each New Partition


2. To add containers for each new partition, enter the following SQL statements: db2 alter tablespace PSAPTEMP16 add (‘<temp16_containers>’) on dbpartitionnum(<partition number>)

3. The 4-digit-number in the container directory should correspond to the partition number.



<temp16_containers>

SAP standard path for tablespace containers of the temporary system tablespace with a page size of 16 KB.

• UNIX: “/db2/<SAPSID>/saptemp1/NODE<nnnn>/temp16”

• Windows: “<drive_letter>:\db2\<SAPSID>\saptemp1\NODE<nnnn>\temp16”

<nnnn> 4-digit partition number, for example, 0001 for partition 1


76 11.11.2005

6.2.3 Creating New Database Partition Groups Use When redistributing the data with DB6CONV, you cannot add new partitions to the partition maps of existing database partition groups. You have to create new database partition groups that include the new partitions in the partition map and then add the new database partition group to the corresponding buffer pool.

To redistribute all partitioned tables, you create a new group for each database partition group. Then move all tables from the original group to the new group using report DB6CONV, and drop the original groups.

Procedure ...

1. Create new database partition groups as described in section Creating Database Partition Groups [Page 24].

2. Add the new database partition groups to the corresponding buffer pools as described in the following example:

To add the database partition group <partition_group> to the default buffer pool for fact tables IBMDEFAULTBP , log on to a database server as user db2<dbsid> and enter the following SQL statement:

db2 alter bufferpool IBMDEFAULTBP add database partition group <partition_group>

where <partitition_group> is the name of the database partition group.


11.11.2005 77

6.2.4 Creating New Tablespaces in the New Database Partition Group Use DB6CONV moves the tables from the current tablespaces to the target tablespaces that were created in the new database partition groups. You have to create a new tablespace for each original tablespace. Then you move all tables from the original tablespace to the new tablespace and drop the original tablespace.

Procedure ...

1. Create new tablespaces as described in section Creating Tablespaces [Page 27].

Always create the data tablespace and the corresponding index tablespace.

If you replace original tablespaces with new tablespaces, change the corresponding data classes in the SAP system. To avoid inconsistency with the metadata of BI objects, you should not drop or rename data classes. Hence, in the existing data classes, replace the original tablespaces with the new tablespaces rather than creating new data classes and dropping the old data classes. This ensures later that new BI objects are created in the correct tablespaces.

2. Change data classes as described in Creating Data Classes [Page 31].

6.2.5 Checking the Partitioning Keys of All Partitioned Tables Use You only use the following procedure if your tables are not evenly distributed in a multi-partition SAP BI system.

Adding new partitions neither invalidates partitioning keys nor does it require changing the partitioning keys. If you migrate from a database system that does not support hash partitioning, incorrect or missing partitioning keys can occur after a heterogeneous system migration.

We strongly recommend that you perform these checks after a heterogeneous system migration. If you are not sure about the status of partitioning keys or if you have a single-partition system, you should perform these checks as well.

To find out whether all partitioned tables are distributed evenly across the partitions defined in the database partition groups, you can check and repair the partitioning key of each partitioned table using the following function modules:

• RSDU_CHKREP_PKEY_ALL_CUBES_DB6

• RSDU_CHKREP_PKEY_ALL_ODS_DB6

• RSDU_CHKREP_PKEY_ALL_PSA_DB6

For more information about these function modules, see SAP Note 648432.


78 11.11.2005

Procedure 1. Log on to the SAP BI system.

2. Call transaction SE37.

The Function Builder: Initial Screen appears.

3. Enter the function module name RSDU_CHKREP_PKEY_ALL_CUBES_DB6 and choose Single Test.

The Test Function Module: Initial Screen appears.

4. For parameters I_PROC_AGGREGATES and I_PREP_REPAIR, enter X

5. Choose Execute.

The Test Function Module: Result Screen appears.

6. Go back to the Function Builder: Initial Screen and repeat steps 2 to six for the modules RSDU_CHKREP_PKEY_ALL_ODS_DB6 and RSDU_CHKREP_PKEY_ALL_PSA_DB6.

There are two scenarios that you have to take into account:

• If the source tablespace is on a single partition, missing partitioning keys are created and wrong partition keys are repaired immediately.

• If the source tablespace is distributed on two or more partitions, partitioning keys cannot be repaired without copying the table content. In this case, the check function modules create entries for the distributed tables with incorrect partitioning keys in table DB6CONVT. You then have to run report DB6CONV to convert the tables.

By default, the source and target tablespaces for the repair conversions are the same. Therefore, if you have to repair missing or incorrect partitioning keys of tables and you want to redistribute these tables to a new database partition group, you have to run DB6CONV twice. The first run corrects the partitioning keys and the second run redistributes and moves the tables to the new tablespaces.

You can perform the actions repair and redistribute in a single execution of DB6CONV, which is especially helpful for very large tables whose conversion may take a long time. For a single execution, call report DB6CONV and edit the DB6CONVT entry for all tables that you want to repair and redistribute in a single step. Then change the fields Target Data Tablespace and Target Index Tablespace to the names of the new tablespaces, which you previously created as described in Creating New Tablespaces in the New Database Partition Group [Page 77].


11.11.2005 79

6.2.6 Redistributing All Tables of a Tablespace Use To schedule tables for conversion with DB6CONV, you have to make entries in table DB6CONVT. To create entries for all partitioned tables in a source tablespace, you can use function module RSDU_REPARTITION_PREPARE_DB6. For more information, see SAP Note 648432.

Function module RSDU_REPARTITION_PREPARE_DB6 schedules all tables with the option not logged initially.

Procedure ...

1. Log on to the SAP BI system.



3. Enter the function module name RSDU_REPARTITION_PREPARE_DB6 and choose Single Test.

The Test Function Module :Initial Screen appears.

4. Enter the name of the source tablespace in parameter field I_TABLESPACE and the name of the target tablespace in parameter field I_TABLESPACE_NEW.

5. Choose Execute.

A data class must exist that contains the target tablespace and its corresponding index tablespace. You can edit the data classes to contain the new tablespaces before you call this function module.

Index tablespaces do not need to be specified as source tablespaces since they do not contain any tables.


7. Enter the report name DB6CONV and choose Execute.

8. To schedule one or more DB6CONV jobs, select one or more tables with status NEW.

9. To start the redistribution, choose Convert NEW.

Since the redistribution of the tables is not logged, you must perform a complete backup after all tables have been redistributed. If you have enabled automatic RUNSTATS for the database, DB2 will update the statistics for the redistributed tables. If you have disabled automatic RUNSTATS for the database, you also have to schedule a runstats_all job afterwards to update the database statistics for all redistributed tables.


80 11.11.2005

6.3 Redistributing the Data with DB2 REDISTRIBUTE Use In contrast to using DB6CONV, the existing database partition groups and tablespaces are altered. You can either use the Partition Integration Wizard of the DBA Cockpit for this or perform the steps manually using DB2 REDISTRIBUTE.

Procedure Using the Partitition Integration Wizard

You can access the Partition Integration wizard by calling the DBA Cockpit using transaction DB6COCKPIT and choosing Wizards: Partition Integration in the navigation frame.

To perform the required steps, follow the instructions on the screens of the Partition Integration wizard.

Using DB2 REDISTRIBUTE

To manually alter database partition groups and redistribute the data with DB2 REDISTRIBUTE, you have to proceed as follows: ...

1. Add containers to temporary tablespaces on each new partition [Page 81].

2. Add partitions to database partition groups [Page 81].

3. Add tablespace containers on each new partition [Page 82].

4. Check the partitioning keys of all partitioned tables [Page 82].

5. Redistribute the database partition groups [Page 83].


11.11.2005 81

6.3.1 Adding Containers to Temporary Tablespaces on Each New Partition


2. To add containers for each new partition, enter the following SQL statements: db2 alter tablespace PSAPTEMP16 add (‘<temp16_containers>’) on dbpartitionnum(<partition number>)

The 4-digit-number in the container directory should correspond to the partition number.



<temp16_containers>

SAP standard path for tablespace containers of the temporary system tablespace with a page size of 16 KB.

• UNIX: “/db2/<SAPSID>/saptemp1/NODE<nnnn>/temp16”

• Windows: “<drive_letter>:\db2\<SAPSID>\saptemp1\NODE<nnnn>\temp16”

<nnnn> 4-digit partition number, for example, 0001 for partition 1

6.3.2 Adding Partitions to Database Partitions Groups

1. Log on to a database partition server as user db2<dbsid>.

2. To add one ore more partitions to a database partition group, open a command prompt and enter the following SQL statement: db2 alter database partition group <partition_group>

add dbpartitionnums (<partition_number>[, <second_partition_number>, …]) without tablespaces

Do not omit the option without tablespaces. Otherwise, DB2 default containers, which differ from the SAP standard, are created on the new partitions for each tablespace that is defined in the database partition group. Tablespace containers in the new partitions are created explicitly in the next step.


82 11.11.2005

6.3.3 Creating Tablespace Containers on Each New Partition Use On each partition that is added to a database partition group, you have to create tablespace containers for all tablespaces that belong to the database partition group.

Procedure 1. Log on to a database server as user db2<dbsid>.

2. To create the required tablespace containers, enter the following SQL statement: db2 alter tablespace <table_space> add (file '<container>' <size>) on dbpartitionnum (<partition_number>)



<table_space> Tablespace name

<container> Path or file name of tablespace container

<size> Size of the new container

6.3.4 Checking The Partitioning Keys of all Partitioned Tables Use You only use the following procedure if your tables are not evenly distributed in a multi-partition SAP BI system.

Adding new partitions does neither invalidate partitioning keys nor requires changing the partitioning keys. If you migrate from a database system that does not support hash partitioning, incorrect or missing partitioning keys can occur after a heterogeneous system migration,

We strongly recommend that you perform these checks after a heterogeneous system copy. If you are not sure about the status of partitioning keys or if you have a single-partition system, you should perform these checks as well.

To find out whether all partitioned tables are distributed evenly across the partitions defined in the database partition groups, you can check and repair the partitioning key of each partitioned table with the following function modules:

• RSDU_CHKREP_PKEY_ALL_CUBES_DB6

• RSDU_CHKREP_PKEY_ALL_ODS_DB6

• RSDU_CHKREP_PKEY_ALL_PSA_DB6

For more information about these function modules, see SAP Note 648432.


11.11.2005 83

Procedure 1. Log on to the SAP BW system.



3. Choose function module RSDU_CHKREP_PKEY_ALL_CUBES_DB6 and Single Test.

The Test Function Module: Initial Screen appears.

4. For both parameters I_PROC_AGGREGATES and I_PREP_REPAIR enter X

5. Choose Execute.

6. Repeat this procedure with the modules RSDU_CHKREP_PKEY_ALL_ODS_DB6 and RSDU_CHKREP_PKEY_ALL_PSA_DB6.

6.3.5 Redistributing Database Partition Groups Use You use the following procedure to redistribute all tables in tablespaces belonging to a database partition group.

Procedure 1. Log on to a database partition server as user db2<dbsid>.

2. To redistribute database partition groups, enter the following CLP command: db2 redistribute database partition group <partition_group> uniform

The following SQL error can occur although you have checked and repaired tables (as described in Checking The Partitioning Keys of all Partitioned Tables [Page 82]):

SQL6047N: Nodegroup cannot be redistributed because table name does not have a partitioning key

Use report DB6CONV to move all tables that do not have a partitioning key to a non-distributed tablespace for example, the tablespace for dimension tables.

After successfully redistributing database partition groups, perform a reorganization of all redistributed tables to free empty data pages on the source partitions. For reorganization, use the action REORG of Tables in Tablespace(s) in the DBA Planning Calendar. If you have disabled automatic statistics collection for the database, it is also necessary to schedule a runstats_all job afterwards. For more information see the SAP Library and choose SAP NetWeaver → SAP NetWeaver by Key Capability → Solution Life Cycle Management → System Management → Database Administration in CCMS → CCMS: SAP/DB2 UDB for UNIX and Windows.


84 11.11.2005

7 Best Practices

7.1 Data Clustering of InfoCube Fact Tables and DataStore Objects Concern The queries on the base fact tables of InfoCubes and tables of DataStore objects have a long execution time because the qualifying rows are spread across many data pages in the database.

Goal To reduce the I/O effort during query processing and to improve the query performance, the respective rows should be clustered in the physical data pages.

Solution Recluster the base fact tables of InfoCubes and tables of DataStore objects using MDC as described in Reclustering InfoCubes and DataStore Objects [page 63].

We recommend that you use multi-dimensional clustering (MDC) whenever applicable.

Example The following describes how you can improve the performance of an SAP BI query and thus of an SAP BI system by switching from index clustering to multi-dimensional clustering:

In general, the clustering index on a fact table has the following order according to the defined dimensions: ...

1. Time dimension

2. Dimension 1

3. Dimension 2

4. Dimension 3

5. Dimension <n>

Thus, the assignment of the characteristics to InfoCube dimensions determines the data clustering of fact tables in the database.

The query selects all materials that a specific customer acquired last year at a given time. On average, 2000 rows per customer are selected.

The distinct values of the characteristics are as follows:

• 5000 Customers

• 10000 Materials

• 10 Sales Organizations

• 50 Departments.


11.11.2005 85

The following is an example of a wrong assignment of dimensions to an InfoCube :

• Dimension 1: Material

• Dimension 2: Sales Organizations

• Dimension 3: Department

• Dimension 4: Customer

If the Customer characteristic is defined as the last dimension, 2000 rows from all over the fact table have to be selected for the query.This results in bad query performance because up to 2000 data pages must be read.

The following is an example of a suitable assignment of MDC dimensions as InfoCube dimension:

• MDC Dimension 1: TIME

• MDC Dimension 2: Customer

If the Customer characteristic is defined as an MDC dimension, 2000 rows have to be selected from only a few data pages of the fact table. This results in improved query performance because the same amount of rows has to be selected from a few data pages only.

7.2 Controlling Database Log Space Consumption Concern A SAP BI function stops with the error message that the database log space is exceeded.

Goal You want to reduce the amount of database log space that is consumed by a BI function.

Solution Certain operations in an SAP BI system require a considerable amount of log space in the database. You can control the log space consumption of these operations by setting the corresponding BI parameters, which belong to table RSADMINC or ROIDOCPRMS, as described in the following list:

• Function Aggregate Build / Change Run:

You have to set parameter BLOCKSIZE in table RSADMINC. In your SAP BI system call transaction SPRO and choose SAP Reference IMG → SAP Customizing Implementation Guide → SAP NetWeaver → General BW settings → Parameters for aggregates.

By default parameter BLOCKSIZE is set to 100,000,000. We recommend that you change this setting to a value between 5,000,000 and 10,000,000. Thus, you reduce sorting of large temporary tables, which are stored on disk, and also reduce log space consumption.


86 11.11.2005

You should not set parameter BLOCKSIZE to lower values, because this can result in an SQL WHERE statement that forces the optimizer to use an index other than the clustered index or MDC index. We suggest that you use value 5,000,000 for systems with less than 10 GB of real memory. If you have more real memory, we recommend that you set parameter BLOCKSIZE to a value up to 10,000,000.

The unit of measurement is records. For more information, see SAP Note 780104.

• Function Loading InfoPackage from file into PSA or InfoCube:

You have to set parameter IDOCPACKSIZE in table RSADMINC. In your SAP BI system, call transaction SPRO and choose SAP Reference IMG → SAP Customizing Implementation Guide → SAP NetWeaver → Business Intelligence → Links to Other Systems → Maintain Control Parameters for the data transfer → Package Size.

Parameter IDOCPACKSIZE determines how many records are inserted in one transaction. A reasonable value for IDOCPACKSIZE is 50,000. Very small values can have a negative impact on performance.

• Function Loading InfoPackage from R/3 Source System into PSA and/or InfoCubes:

You have to set parameter MAXSIZE in table ROIDOCPRMS in the source system. For more information about how to set this parameter, see SAP Note 417307.

Parameter MAXSIZE determines the size of a data package. The unit of measurement is "kilobytes".

If you are using parallel load processes and load data into several data targets concurrently, log space consumption will increase.

• Function InfoPackage Deletion:

You have to set parameter DB6_DELETE_PIECEWISE in table RSADMIN. By default, InfoCube data requests are deleted with one single SQL DELETE statement. You can delete InfoCube data requests in pieces with several SQL DELETE statements (piecewise delete). Thus you make sure that the required log space for deletion is limited. The deletion of InfoPackages also happens during compression. For more information about how to set parameter DB6_DELETE_PIECEWISE, see SAP Note 597053.

If the parameter is set to YES, data is deleted in blocks of 100,000 records.


11.11.2005 87

7.3 Dropping Indexes Before Deletion and Compression of InfoPackages Concern The deletion of InfoPackages from InfoCubes requires a long execution time. The same applies during compression when InfoPackages are deleted.

Goal You want to speed up the deletion of InfoPackages and thus improve the performance of your SAP BI system.

Solution The performance of INSERT and DELETE SQL statements is affected by the maintenance of indexes that exist on the tables. This specifically applies to data manipulation operations that affect a large number of rows on tables with several indexes.

E and F fact tables of InfoCubes have one row-based index defined on each BI dimension, which is not chosen as MDC dimension; E-Fact tables additionally have a compound row-based index over all dimensions.

Maintaining the indexes decreases the overall performance of loading and deleting large InfoPackages. Therefore, it is faster to drop the row-based indexes before the operations and to rebuild the indexes afterwards.

SAP BI provides functions to drop and to recreate the row-based secondary indexes, that is, the indexes on a single dimension of both fact tables of an InfoCube. MDC indexes are not dropped because they are required for data clustering. MDC indexes are smaller than row-based secondary indexes. Therefore, the effort for MDC index maintenance is a lot smaller compared to row-based secondary indexes.

The compound index on E fact tables over all dimensions, also known as P-index, is never dropped. It ensures the uniqueness on the compressed E fact table.

The dropping and recreation of secondary row-based indexes can be automatically applied when InfoPackages are loaded by switching on the flag Delete InfoCube indexes before each data load and then refresh.

To do so, proceed as follows: ...

1. In your SAP BI system, start the Data Warehousing Workbench.

2. Select the required InfoCube and choose Manage from the context menu.

The Manage Data Targets screen appears.

3. On the Performance tab, choose Create Indexes (batch).

The Create index for InfoCube <infocube_name> dialog box appears.

4. Check that Delete InfoCube indexes is selected before each data load.

5. Choose Refresh and confirm your entries.


88 11.11.2005

You can improve the overall performance of InfoPackage deletion or compression by dropping or recreating secondary indexes under the following circumstances:

• The InfoCube contains many uncompressed InfoPackages.

• The InfoCube contains no compressed InfoPackages

• The amount of data in the uncompressed F fact table is much higher than the amount of data in the compressed E fact table.

In single cases, there has been an overall improvement of performance for DELETE statements of up to 60 percent.

We recommend that you use a process chain to drop the secondary indexes before deletion or compression of InfoPackages and to recreate them afterwards.

You should not drop the indexes if the compressed E fact table is large or if the dataset for deletion or compression is small.

7.4 Increasing the Buffer Pool Hit Ratio Concern The hit ratio of a buffer pool is low and you see increasing number of READ/WRITE operations on the disk.

Goal To reduce I/O activities.

Solution Increase the size of the buffer pools. In addition, to improve performance you can also create one separate buffer pools for dimension, master data and SAP base tables.

By default one bufferpool is used for all tables. If you have fact, DataStore object or PSA tables, which are located on the same database partition as, dimension, master data and SAP base tables, then data and index pages of these different tables are written to the same buffer pool.

In general there are differences how often data is read from the different types of tables. Many times fact data is read only once for a single query. Dimension data or master data are frequently accessed by many queries. Thus, fact data can overwrite dimension or master data tables that are frequently used.


11.11.2005 89

Therefore, we would recommend that you have two separate buffer pools for the following objects:

• SAP base, Master and Dimension tables,

• Fact, Aggregate, DataStore and PSA tables.

In a multiple database partition environment you should use only one page size for your tablespaces and bufferpools otherwise the database optimizer might choose the wrong bufferpool for DB2 temporary tables used for sort operations.

For more information about how to create a bufferpool, see Creating Buffer Pools [Page 27].

Example The following figure shows an example with 4 database partitions.

… Partition 3

Buffer pools

Partition 0

TablespacesInfoCube and large Aggregate fact tables

Small Aggregate Fact tables

SAP base tables

Shared buffer pool for InfoCube / Aggragate Fact, DataStore and PSA tables

Shared buffer poolfor dimension, master data andSAP base tables

PSA tables

DataStore tables

Dimension tables of all InfoCubesand Aggregates

Master data tables

Two bufferpools are defined:

1. One buffer pool for dimension, master data and SAP base tables is only located on database partition 0.

2. A separate bufferpool is distributed across all partitions. The buffer pool is shared by all InfoCube, Aggregate, DataStore and PSA tables.

All tablespaces and bufferpools should be created with the same page size.


90 11.11.2005

8 Additional Sources of Information The following list provides additional sources of information:

• For detailed information about SAP BI, see SAP Service Marketplace at service.sap.com/bi

• For the installation of an SAP BI system on DB2 UDB for UNIX and Windows, see documentation Installation Guide – SAP NetWeaver 2004s <ABAP | ABAP+Java | Java> on <OS>: IBM DB2 Universal Database for UNIX and Windows.

You can find this documentation in SAP Service Marketplace at service.sap.com/instguidesNW2004s → Installation.

• For information about how to build and scale an SAP BI system, see the IBM redbook Building and Scaling an SAP Business Information Warehouse on DB2 UDB ESE, no. SG24-7094.

You can download this redbook as a PDF at www.ibm.com/redbooks.

• For information about SAP solutions on IBM DB2 UDB V8.2.2, see the IBM redbook SAP Solutions on IBM DB2 UDB V8.2.2 Handbook, no. SG24-6765

You can download this redbook as a PDF at www.ibm.com/redbooks.

• For information about the released operating systems for an SAP BI installation, see the Product Availability Matrix (PAM) in SAP Service Marketplace at service.sap.com/pam.

• SAP Notes

Note Number Title

609384 DB6: Known Errors and Available Fixes in DB2 UDB V8

584952 DB6: DB2 UDB ESE Version 8 Standard Parameter Settings

546262 DB6: Administration & Performance on SAP BW, SAP SCM and SAP SEM

374502 DB6: DB2 UDB - BW Performance - Overview of notes

192658 Setting basis parameters for BW Systems

database administration guide - sap · pdf filedatabase administration guide sap ... netweaver...

Documents