TRANSCRIPT
Managing Extended Attributes With an Enterprise Guide Add-In
Larry Hoyle, Institute for Policy & Social Research, University of Kansas
2
SAS 9.4 – Extended Attributes: (name, value) pairs
data mySasData; fee=1; run;
proc datasets lib=work nolist;
  modify mySasData;
  xattr set var fee (Question='How much did you pay?');
  xattr set var fee (MeasurementUnits='Total payment in Euros.');
  xattr set ds Abstract='dataset abstract';
run;
quit;
Callouts: Name (e.g. Question), Value (e.g. 'How much did you pay?')
3
PROC Contents data=mySasData;
Output callouts: Dataset attributes, Variable attributes
4
Metadata in the Dataset
• Extended attributes are (name, value) pairs useful for metadata about:
  • the dataset
  • each variable
• Recording metadata early is desirable. If you don't have a repository, how about in the dataset?
• Using standard names is desirable (searching, comparison)
• Prompts could be useful (one less vocabulary to learn)
5
Enterprise Guide Add-Ins
.NET applications integrated into EG
Reference: Hemedinger, Chris. Custom Tasks for SAS® Enterprise Guide® Using Microsoft .NET
6
Addin “Extended Attributes”
7
Edit Settings and Extended Attributes
Multiple tab interface
Here the “Edit” tab is the tab for entering the extended attributes
8
Global Settings
These global settings apply to all output
In detail they are ….
9
Agency, Version, Dataset Name
DDI IDs unique within agency
Default metadata version
Name for the updated data
10
Example: Set a Title for the Dataset
When “{DATASET_Attribute}” is selected, the (Key, Value) pair applies to the whole dataset
11
Set a MeasurementUnits for the Variable “fee”
12
Extended Attribute Names from a Vocabulary
A combo box allows for:
• selection from a list, or
• entry of a new term
Promotes consistency
13
Make Definitions of Provided Terms Available
14
Vocabulary Based on a Metadata Standard (DDI)
Metadata Can be Output in XML
15
Outputs DDI Lifecycle 3.2 – Data Documentation Initiative Metadata
A major standard for the social sciences
See: http://www.ddialliance.org/
Could be adapted for other standards (e.g. CDISC? EML? Open Data?)
Would need code to write the XML, JSON, etc.
16
Recognized Terms Generate Valid DDI Structure
<l:VariableName>
  <r:String>fee</r:String>
</l:VariableName>
<r:Label>
  <r:Content>Fee in Euros</r:Content>
</r:Label>
<r:MeasurementUnit>Euros</r:MeasurementUnit>
Callout: r:MeasurementUnit is defined in the DDI 3.2 schema
17
SAS Built-in Metadata into Valid XML Structure
<l:VariableName>
  <r:String>fee</r:String>
</l:VariableName>
<r:Label>
  <r:Content>Fee in Euros</r:Content>
</r:Label>
<r:MeasurementUnit>Euros</r:MeasurementUnit>
Callouts:
• r:String: from the SAS variable name
• r:Content: from the SAS variable label
• r:MeasurementUnit: entered by the user
18
Other Terms in DDI as (Name, Value) Pairs
<r:UserAttributePair>
  <r:AttributeKey>RockChalk</r:AttributeKey>
  <r:AttributeValue>Jayhawk</r:AttributeValue>
</r:UserAttributePair>
<r:UserAttributePair>
  <r:AttributeKey>MeasurementUnits</r:AttributeKey>
  <r:AttributeValue>Euro</r:AttributeValue>
</r:UserAttributePair>
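The split between recognized terms and free-form pairs can be sketched roughly as follows. This is an illustration in Python, not the add-in's C# code: the `RECOGNIZED_TERMS` mapping and the `variable_fragment` helper are hypothetical, only `r:MeasurementUnit` is taken from the slides, and the output is a fragment without namespace declarations or any guarantee of schema-valid element order.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from attribute names to DDI element names.
# Only r:MeasurementUnit appears on the slides; a real vocabulary would be larger.
RECOGNIZED_TERMS = {"MeasurementUnit": "r:MeasurementUnit"}


def variable_fragment(name, label, attributes):
    """Return DDI-style XML text for one variable (a fragment, not a full document)."""
    lines = [
        "<l:VariableName>",
        f"  <r:String>{escape(name)}</r:String>",
        "</l:VariableName>",
        "<r:Label>",
        f"  <r:Content>{escape(label)}</r:Content>",
        "</r:Label>",
    ]
    for key, value in attributes.items():
        element = RECOGNIZED_TERMS.get(key)
        if element is not None:
            # Recognized term: write the dedicated DDI element.
            lines.append(f"<{element}>{escape(value)}</{element}>")
        else:
            # Any other term: fall back to a generic key/value pair.
            lines += [
                "<r:UserAttributePair>",
                f"  <r:AttributeKey>{escape(key)}</r:AttributeKey>",
                f"  <r:AttributeValue>{escape(value)}</r:AttributeValue>",
                "</r:UserAttributePair>",
            ]
    return "\n".join(lines)


fragment = variable_fragment(
    "fee", "Fee in Euros", {"MeasurementUnit": "Euros", "RockChalk": "Jayhawk"}
)
print(fragment)
```

With this layout, adding support for another standard term would only mean adding an entry to the mapping; unknown names still survive as user attribute pairs.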
19
Outputs a Codebook
Note: A codebook could also be generated with an XSLT from the DDI
20
.Net Code Writes SAS Code (PROC Datasets)
21
Saves, and Runs it
22
Extended Attributes in a Copy of the Dataset
The extended attributes become part of the dataset
On Windows, XATTRs are stored in a .sas7bxat file
23
How do you Build a Custom Task?
24
GUI Tool Helps Lay Out the Interface
Form Designed in Visual Studio
DLL called by Enterprise Guide
25
.NET Code Handles the Interface
Controls Generate Events
private void btnEnterAttribute_Click(object sender, System.EventArgs e)
{
    try
    {
        // composite key: var name and attribute
        string key = Settings.PackVarAttrKey(cmbVariable.Text, cmbExVarAttribute.Text);
        // Look up the key in the hash. If it is there, replace the text in the hash.
        // If it is not, add a new entry to the hash.
        if (Settings.hashVarAttrs.ContainsKey(key))
        {
            Settings.hashVarAttrs.Remove(key);
        }
        Settings.hashVarAttrs.Add(key, txtVarAttributeValue.Text);
        rtbXmlProperties.Text = Settings.ToXml();
        // ... (the rest of the handler is not shown on the slide)
Events Trigger Event Handler Code
Callout: Click
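The replace-or-add logic in the handler can be illustrated with a short Python sketch. It is not the add-in's code: how `PackVarAttrKey` actually combines the two strings is not shown on the slides, so the tab separator and the helper names below are assumptions.

```python
# Assumption: a separator that cannot occur in a SAS variable name.
SEPARATOR = "\t"


def pack_var_attr_key(variable, attribute):
    """Build one composite key from a variable name and an attribute name."""
    return f"{variable}{SEPARATOR}{attribute}"


def unpack_var_attr_key(key):
    """Split a composite key back into (variable, attribute)."""
    variable, attribute = key.split(SEPARATOR, 1)
    return variable, attribute


def enter_attribute(var_attrs, variable, attribute, value):
    """Replace the value if the key exists, otherwise add a new entry."""
    var_attrs[pack_var_attr_key(variable, attribute)] = value


var_attrs = {}
enter_attribute(var_attrs, "fee", "Question", "How much?")
enter_attribute(var_attrs, "fee", "Question", "How much did you pay?")
enter_attribute(var_attrs, "fee", "MeasurementUnits", "Total payment in Euros.")
```

Entering the same variable and attribute twice leaves a single entry holding the latest value, which is the behavior the Remove-then-Add sequence in the handler produces.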
26
Embedded SAS Code Scrapes Dataset for Metadata
/* capture the current user formats in _MyFormats */
proc format cntlout=work._MyFormats0; run;
/* ods select nlevels; */
ods output nlevels=_VariableLevelsNF;
proc freq data=&lib..&dataset nlevels;
  tables _all_;
  format _all_;
run;
Lots more SAS code…
27
C# Code Writes SAS Code
sb.Append("title 'Contents of the Revised Dataset';");
sb.AppendFormat("proc datasets lib={0} nolist ;\n", ActiveLibrary);
// NOTE: the segment length is hard-wired here
sb.AppendFormat("\tmodify {0} ;\n xattr options seglen = 4000;\n", outMember);
// go through the hash table and generate the appropriate xattr commands
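The step of walking the stored attributes and emitting XATTR statements might look like the following Python sketch. It is an illustration, not a port of the add-in: the function names and the use of (variable, attribute) tuples as keys are assumptions, and the SEGLEN option from the slide is left out for brevity.

```python
def sas_quote(value):
    """Quote a value as a SAS single-quoted literal (embedded quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_xattr_program(library, member, dataset_attrs, variable_attrs):
    """Return PROC DATASETS code that sets dataset and variable extended attributes."""
    lines = [
        f"proc datasets lib={library} nolist;",
        f"  modify {member};",
    ]
    for name, value in dataset_attrs.items():
        lines.append(f"  xattr set ds {name}={sas_quote(value)};")
    for (variable, name), value in variable_attrs.items():
        lines.append(f"  xattr set var {variable} ({name}={sas_quote(value)});")
    lines += ["run;", "quit;"]
    return "\n".join(lines)


program = build_xattr_program(
    "work",
    "mySasData",
    {"Abstract": "dataset abstract"},
    {("fee", "Question"): "How much did you pay?"},
)
print(program)
```

Doubling embedded single quotes matters here because user-entered text such as a survey question can easily contain an apostrophe.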
28
C# Program Can Run SAS Code
// submit program and wait for completion
submitter.SubmitSasProgramAndWait(rtbSASProgram.Text, out log);
Submits the SAS Code in this window
29
On Closing, EG Calls GetSasCode()
• EG submits SAS code to run in a SAS session after the .NET program has terminated
• Important user settings can be saved and retrieved for this session
• In this case, the user's choices for XATTRs are needed for a PROC DATASETS step
• These can be saved in an XML structure with ToXml()
30
Saving State: XML Produced by ToXml()
Preserved:
Valid attribute names
(Name,value) pairs
31
An Ad Hoc XML Schema for Saving Settings
XDocument doc = new XDocument(
    new XDeclaration("1.0", "utf-8", string.Empty),
    new XElement("ExtendedAttributesTask",
        new XElement("Version", XmlVersion),
        new XElement("OutMember", outMember),
        datasetAttributeElements,
        variableAttributeElements,
        xattrs));
Callout (resulting element): <Version>1.0</Version>
32
Settings Retrieved by FromXml()
public void FromXml(string xml)
{
    XDocument doc = XDocument.Parse(xml);
    XElement elVersion = doc
        .Element("ExtendedAttributesTask")
        .Element("Version");
    XmlVersion = elVersion.Value;
    // ... (the rest of the method is not shown on the slide)
}
Callout (element being read): <Version>1.0</Version>
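The save-and-restore round trip can be sketched in Python as below. The element names `ExtendedAttributesTask`, `Version`, and `OutMember` come from the slides; the `VariableAttribute` element, its attribute names, and the function signatures are hypothetical stand-ins, since the add-in's full schema is not shown.

```python
import xml.etree.ElementTree as ET

XML_VERSION = "1.0"


def to_xml(out_member, variable_attrs):
    """Serialize the task settings to an XML string."""
    root = ET.Element("ExtendedAttributesTask")
    ET.SubElement(root, "Version").text = XML_VERSION
    ET.SubElement(root, "OutMember").text = out_member
    for (variable, name), value in variable_attrs.items():
        # "VariableAttribute" and its attributes are hypothetical names.
        el = ET.SubElement(
            root, "VariableAttribute", {"variable": variable, "name": name}
        )
        el.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(xml):
    """Parse settings written by to_xml() back into Python values."""
    root = ET.fromstring(xml)
    version = root.findtext("Version")
    out_member = root.findtext("OutMember")
    variable_attrs = {
        (el.get("variable"), el.get("name")): el.text or ""
        for el in root.findall("VariableAttribute")
    }
    return version, out_member, variable_attrs


settings = {("fee", "Question"): "How much did you pay?"}
saved = to_xml("mySasData2", settings)
restored = from_xml(saved)
```

Storing a version element first, as the slides do, gives a later reader of the saved state a way to detect and handle older layouts.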
33
Potential Uses
34
Codebook - Dataset
35
Codebook – Variables, Native to SAS
Sorry, Couldn’t avoid the pun
36
Codebook – Variables, Native to SAS
Categorical Variables
Continuous Variables
37
Variables – Extended Attributes
38
Just some kinds of metadata
• Attribution (e.g. Creator, Contributor)
• Variable relationships (e.g. derived from)
• Universe (e.g. persons over 65)
• Survey question
• Code snippets
39
Issues
40
Representing a Hierarchy as (Name, Value) Pairs
Sometimes more structure is needed than simple values
Example: Contributor* (several contributors)
• Name
• ORCID
• (Role, Degree of Contribution)* (each contributor with multiple roles)
Use XML for values?
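One way the "XML for values" idea could work is sketched below in Python: a contributor with several roles is packed into a single string that could be stored as one attribute value. The element names, the `degree` attribute, and the sample contributor (including the placeholder ORCID) are all hypothetical; the slides only pose this as a question.

```python
import xml.etree.ElementTree as ET


def contributor_to_value(name, orcid, roles):
    """Encode a contributor as an XML string; roles is a list of (role, degree) pairs."""
    root = ET.Element("Contributor")
    ET.SubElement(root, "Name").text = name
    ET.SubElement(root, "ORCID").text = orcid
    for role, degree in roles:
        ET.SubElement(root, "Role", {"degree": degree}).text = role
    return ET.tostring(root, encoding="unicode")


def value_to_contributor(value):
    """Decode a string produced by contributor_to_value()."""
    root = ET.fromstring(value)
    roles = [(el.text, el.get("degree")) for el in root.findall("Role")]
    return root.findtext("Name"), root.findtext("ORCID"), roles


# Placeholder data, not a real person or identifier.
value = contributor_to_value(
    "A. Researcher",
    "0000-0000-0000-0000",
    [("data collection", "lead"), ("analysis", "supporting")],
)
decoded = value_to_contributor(value)
```

Several contributors would still need several attributes (or a wrapping element), so the naming question for repeated items remains open.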
41
Future Directions
42
Possibilities
• Process: capture EG process flow
• Allow structure in attributes (XML?)
  Example: multiple roles and degree of contribution for contributors
43
Other Approaches
Adapt “literate programming” methods to incorporate structured metadata
Examples: SASweave, StatRep
See:
http://homepage.cs.uiowa.edu/~rlenth/SASweave/
http://support.sas.com/rnd/app/papers/statrep.html
http://www.mwsug.org/proceedings/2012/PH/MWSUG-2012-PH09.pdf
44
Suggestions for SAS
45
Allow Extended Attributes to Follow Variables
“Regular” attributes follow variables into other datasets,
e.g. via a DATA step SET statement or an SQL SELECT statement.
Shouldn’t extended attributes do the same?
46
Imagine
Imagine if, when a variable was created, it was assigned a UUID
• This would not change if the variable was renamed (e.g. data dataB(rename=(sex=gender)); )
• It could be used for provenance chains: for BMI = Weight/Height**2; a “BasedOn” attribute for BMI gets the UUIDs of Weight and Height
• This could be harvested for documentation
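The idea can be made concrete with a small Python model. This is purely a thought experiment: SAS does not assign UUIDs to variables, and the `Variable` class and its fields are hypothetical.

```python
import uuid


class Variable:
    """Toy model of a variable that carries a permanent identifier."""

    def __init__(self, name, based_on=()):
        self.name = name
        # Assigned once at creation and never changed afterwards.
        self.uuid = str(uuid.uuid4())
        # "BasedOn" records the UUIDs of the source variables.
        self.based_on = [source.uuid for source in based_on]

    def rename(self, new_name):
        # Renaming changes only the name, not the identity.
        self.name = new_name


weight = Variable("Weight")
height = Variable("Height")
bmi = Variable("BMI", based_on=[weight, height])

sex = Variable("sex")
original_id = sex.uuid
sex.rename("gender")
```

Because the provenance chain refers to identifiers rather than names, it would stay intact even after `sex` becomes `gender`.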
47
Acknowledgements
Thanks to Chris Hemedinger for his very useful book and his response to questions.
48
Session ID #2481
Thank you.
Contact Information: [email protected]
Complete code for this project has been archived at: http://kuscholarworks.ku.edu/dspace/handle/1808/12488