creating and using custom formats for data manipulation and summarization presented by john schmitz,...
TRANSCRIPT
![Page 1: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/1.jpg)
Creating and Using Custom Formats for Data Manipulation and Summarization
Presented byJohn Schmitz, Ph.D.
Schmitz Analytic Solutions, LLCCertified Advanced Programmer for SAS ®9
Presentation to Denver SAS Users Group
January 14, 2009
![Page 2: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/2.jpg)
Summary• Summary:
• This presentation will review creation of custom formats directly in PROC FORMAT as well as through the data step.
• These formats will then be used as lookup tables in a data step, replacing a merge and for summarization within a CLASS statement.
• Audience: • The presentation is introductory to intermediate level. No
experience with SAS formats is assumed.• This talk will serve as a base for our second talk on
Multilabel formats.Schmitz Analytic Solutions, 2009
![Page 3: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/3.jpg)
Outline• Define meaning of a ‘SAS Custom Format’• Creating custom formats with PROC FORMAT.• Creating custom formats in a DATA STEP.• Using a custom format within a CLASS statement.• Using a custom format in place of a data merge.
Schmitz Analytic Solutions, 2009
![Page 4: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/4.jpg)
Before I Begin• Nikki Carroll presented ‘Tap Into the Power of Formats’ (JUL
07). That presentation is available on www.denversug.org. It will provide a nice complement to the comments I have here.
Schmitz Analytic Solutions, 2009
![Page 5: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/5.jpg)
A Quick note on Naming Convention
• Throughout today’s presentations, you will notice a naming convention.• Format names end in Fmt• Informat names end in InFmt• Multilabel formats end in MLF• When data sets are used, dataset names match the name
of the format they define.
• This is my naming convention and is not required to make the code work.
Schmitz Analytic Solutions, 2009
![Page 6: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/6.jpg)
Defining a ‘SAS Custom Format’• SAS has many built in formats that most users already use:
(6.2, DOLLAR8.2, YYQ6., COMMA15.0, $15., DATE9.)• Base SAS provides PROC FORMAT which allows the user to expand
the pre-defined format list with formats tailored to specific needs. These are ‘SAS Custom Formats’.
• PROC FORMAT allows creation of a• FORMAT (numeric or character)• INFORMAT (numeric or character)• PICTURE (numeric)
• Custom formats are stored in catalogs. • by default in work.formats and cleared at end of session.• User can define an alternate catalog and retain formats.
Schmitz Analytic Solutions, 2009
![Page 7: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/7.jpg)
Creating a simple custom formatPROC FORMAT;
value $BossNamefmt
‘1’ = ‘Abe Lincoln’
‘2’ = ‘George Washington’ ;
RUN;
DATA example1;
SET sashelp.company;
boss = PUT(DeptHead,$BossNameFmt20.);
RUN;
Schmitz Analytic Solutions, 2009
Input field is $15 format so a character format is required.
Use format width to define length of output character variable.
Use FORMAT with PUT and use INFORMAT with INPUT.
For a PUT, match VARIABLE and FORMAT types, output will be CHAR.
For a INPUT, use CHAR variable, output type will match format type.
See SAS Example1.
![Page 8: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/8.jpg)
A quick INFORMAT CasePROC FORMAT; invalue $GroupInFmt 'FINANCE' = 'FIN' ‘MARKETING' = 'M/S' 'SALES' = 'M/S' 'MIS' = 'MIS' other = 'N/A’; invalue CityInFmt 'LONDON' = 1 'NEW YORK' = 2 'TOKYO' = 3;RUN;
DATA example2; set sashelp.company (keep=level2 level4);
GroupCode = input(LEVEL4,$GroupInFmt.);
CityID = input(LEVEL2,CityInFmt.);
RUN;
Schmitz Analytic Solutions, 2009
Output variables are same type as the informat used.
This format is a great example of how custom formats could be used to replace a large SELECT statement within code.
Keyword OTHER is used for any unspecified value.
See SAS Example2.
![Page 9: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/9.jpg)
A more realistic format examplePROC FORMAT;
value seasonsFmt
other = ‘Missing’
low - < ‘21DEC2008’d = ‘Prior ’
‘21DEC2008’d - < ’20MAR2009’d = ‘2009S1’
’20MAR2009’d - < ’21JUN2009’d = ‘2009S2’
’21JUN2009’d - < ’22SEP2009’d = ‘2009S3’
’22SEP2009’d - < ’21DEC2009’d = ‘2009S4’
’21DEC2009’d - high = ‘Post ’;
RUN;
Schmitz Analytic Solutions, 2009
Special values LOW and HIGH can be used for MIN and MAX values.
-< START is INCLUSIVE. END is EXCLUSIVE<- START is EXCLUSIVE. END is INCLUSIVE<-< NEITHER VALUE IS INCLUSIVE - BOTH VALUES ARE INCLUSIVE
![Page 10: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/10.jpg)
Creating custom formats in a DATA STEP.
• The PROC FORMAT examples shown have many limitations. • More complex cases can become very lengthy.• Content is not dynamic.• Code can be difficult to maintain.• May wish to ‘auto generate’ code from data set contents
from programming logic.• DATA STEPS can be used to greatly expand the usability of
custom formats.
Schmitz Analytic Solutions, 2009
![Page 11: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/11.jpg)
PROC FORMAT IMPORT / EXPORT FEATURE
• PROC FORMAT provides options to read (write) format definitions from (to) data sets.
• To generate a data set containing data from custom formats:
PROC FORMAT cntlout = formats;
RUN;• To generate a (or multiple) custom format from data set:
PROC FORMAT cntlin = formatData;
RUN;
Schmitz Analytic Solutions, 2009
This data file must contain variables: START FMTNAME and LABEL.Other fields can be included to further control the format definition.
![Page 12: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/12.jpg)
Equinox data
Schmitz Analytic Solutions, 2009
Season Begins Ends
1988S1 22-Dec-87 20-Mar-88
1988S2 20-Mar-88 21-Jun-88
1988S3 21-Jun-88 22-Sep-88
1988S4 22-Sep-88 21-Dec-88
1989S1 21-Dec-88 20-Mar-89
1989S2 20-Mar-89 21-Jun-89
… … …
2010S4 23-Sep-10 21-Dec-10
Data stored in EquinoxData.xls
![Page 13: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/13.jpg)
Converting RAW DATA to FORMAT
Schmitz Analytic Solutions, 2009
data SeasonsFmt; set equinoxData (rename=(begins=START ends=END season=LABEL)) end=last; length HLO $5; retain fmtname 'SeasonsFmt' SEXCL 'N' EEXCL 'Y' hlo '‘ firstDate ; if _N_ = 1 then firstDate = start; output; if last then do; start = end; end=.; label = 'POST'; hlo = 'H';EEXCL=‘N’; output; EEXCL=‘Y’; hlo=‘O'; start=.; end=.; label='MISSING'; output; start = .; end= FirstDate; label = 'PRIOR'; hlo = 'L'; output; end;
keep start end label fmtname hlo sexcl eexcl; format start end date9.;run;
![Page 14: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/14.jpg)
Using the Custom Format in PROC MEANS
Schmitz Analytic Solutions, 2009
DATA sample;
set sashelp.citiday
(keep=date SNYDJCM);
RUN;
PROC MEANS data=sample N MEAN;
class DATE;
format DATE seasonsFmt.;
var SNYDJCM;
RUN;
See SAS Example3.
![Page 15: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/15.jpg)
Using custom formats as a convenient lookup table.
Schmitz Analytic Solutions, 2009
• Custom formats can be used for lookup tables, potentially avoiding complex table joins and resource intensive sorts.
• These lookup tables are limited to one field as a lookup criteria. *
• Use of lookup formats will require additional memory since the format definition is maintained in memory.
• They perform best when the lookup table is small relative to the main table.
* One can use HASH tables for lookups involving multiple criteria and obtain similar performance benefits.
![Page 16: Creating and Using Custom Formats for Data Manipulation and Summarization Presented by John Schmitz, Ph.D. Schmitz Analytic Solutions, LLC Certified Advanced](https://reader035.vdocuments.us/reader035/viewer/2022062803/56649f3e5503460f94c5f4d7/html5/thumbnails/16.jpg)
COMMENTS / QUESTIONS?
Schmitz Analytic Solutions, 2009
For More Information, ContactJohn Schmitz
Schmitz Analytic Solutions
Phone: 303-482-1860
Email: [email protected]
Web: http://www.SchmitzAnalyticSolutions.com