the complete guide to sas indexes by michael a. raithel reviewed by horst wolter
DESCRIPTION
The Complete Guide to SAS Indexes by Michael A. Raithel Reviewed by Horst Wolter. About the Author. Wrote two previous SAS books and over twenty SAS technical papers. Section Chair at SUGI, SESUG and NESUG and co-chaired NESUG in 1995. - PowerPoint PPT PresentationTRANSCRIPT
Slide 2
About the Author
• Wrote two previous SAS books and over twenty SAS technical papers.
• Section Chair at SUGI, SESUG and NESUG and co-chaired NESUG in 1995.
• First book entitled Tuning SAS Applications in the MVS Environment resides in the Smithsonian Institution of American History’s Permanent Research Collection of Information Technology.
Slide 3
Goals of Indexing
• The main goal of using a SAS index is to read only a small portion of a large SAS dataset instead of reading the entire SAS dataset.
• Reduce I/Os.
• Reduce wall clock time.
• Reduce CPU time.
Slide 4
When to Index
Consider
• the size of the subsets (the overhead of using an index can become greater than the overhead of a sequential read of the entire dataset)
• frequency of use (is building an index cost effective)
• variability of the data (indexed SAS datasets have an overhead to maintain if data changes often and may not be cost effective)
Slide 5
1% to 15% An index will definitely improve processing. There should be dramatic resource savings in the lower end of this range.
16% to 30% An index will improve processing. However, the resource savings will not be as dramatic as in the lower range.
31% to 60% An index may improve processing, or it might worsen processing. Be very careful in this subset range.
61% to 100% Do not use an index. A sequential read of the entire dataset is very likely to be more efficient.
Indexing ActionSubset Size
Indexing Guidelines(Table 2.1 from book)
Slide 6
Index Variable Selection
What variables to consider for an index?
• Often used to subset data
• Values that represent a small subset
• A SAS dataset sorted by the index variable is more efficient
Slide 7
How to create an Index
• Data Step Method
DATA data-set-name(INDEX=(index-name=(var1 var2 etc.) </UNIQUE> </NOMISS>));
• SQL Method
proc sql;CREATE <UNIQUE> INDEX index-name ON table-name ( column
<, ... column>); quit;
Slide 8
Viewing Index Information
proc contents data=data-set-name <centiles>;run;
Indexes 1
Alphabetic List of Indexes and Attributes # of Unique # Index Values 1 var1 1535958
Slide 9
Using Indexes
• With a WHERE clause
• With a BY statement
SAS determines whether or not to use an index based on a number of rules as described in the book.
Slide 10
Other Topics covered in the book
• Composite indexes
• Creating indexes while using PROC SORT, PROC DATASETS
• Creating Multiple Indexes
• Rules for SAS Using an Index
• Removing Indexes
• Index options e.g. Unique Option, NoMiss Option
• Recovering Missing Index Files
• Repairing Damaged Index Files
• And more