

Contents

About This Book
  Audience
  Prerequisites
  Conventions

What’s New in SAS Sentiment Analysis Studio 1.3
  Overview
  Improved User Interface
  Rule Enhancements
  Additional Languages
  Licensing

1 About SAS Sentiment Analysis Studio
  1.1 What Is SAS Sentiment Analysis Studio?
  1.2 Benefits of Using SAS Sentiment Analysis Studio
  1.3 How Does SAS Sentiment Analysis Studio Work?
  1.4 Architecture

2 Before You Begin

3 Understanding the Interface Components
  3.1 Your First Look at the SAS Sentiment Analysis Studio User Interface
  3.2 The SAS Sentiment Analysis Studio Menus

    3.2.1 About the Availability of Menus and Menu Selections
    3.2.2 About Menus
    3.2.3 File Menu
    3.2.4 Edit Menu
    3.2.5 View Menu
    3.2.6 Build Menu
    3.2.7 Help Menu

  3.3 The Standard Toolbar
  3.4 The New Project Wizard

    3.4.1 Overview of the New Project Wizard
    3.4.2 Access Project Settings
    3.4.3 Specify the Rule-Based Model Settings


    3.4.4 Specify the Statistical Model Settings
    3.4.5 Use the Summary Window

  3.5 The Main SAS Sentiment Analysis Studio User Interface
    3.5.1 Overview of the User Interface
    3.5.2 The Corpora Tab
    3.5.3 The Statistical Tab

      3.5.3.A Overview of the Statistical Tab
      3.5.3.B Define a New Model
      3.5.3.C Use the Statistical Model Configuration Tab for the Simple Model
      3.5.3.D Use the Statistical Model Configuration Tab for the Advanced Model

    3.5.4 Using the Text Result Tab
    3.5.5 Using the Graphical Result Tab
    3.5.6 Train, Activate, Validate, or Delete the Model

  3.6 The Rule Tab
    3.6.1 Overview of the Rule Tab
    3.6.2 Add Concepts
    3.6.3 Drop-down Operations
    3.6.4 Using the Search Field
    3.6.5 The Rule-Writing Pane
    3.6.6 The Search Result, Syntax Errors Result, and Rule Evaluation Panes

  3.7 The Test Tab
  3.8 The Project Settings Dialog Wizard
  3.9 The Miscellaneous Windows

    3.9.1 The Search Rules Dialog Box
    3.9.2 The New Model Dialog Dialog Box
    3.9.3 The Import Precompiled Model Dialog Box
    3.9.4 The Import Learned Features Dialog Box
    3.9.5 The Add Intermediate Entity Dialog Box
    3.9.6 The Intermediate Entity Property Dialog Box
    3.9.7 The Add Product Dialog Box
    3.9.8 The Product Property Dialog Box
    3.9.9 The Add New Feature Dialog Box
    3.9.10 The Feature Property Dialog Box
    3.9.11 The Rule Evaluation Dialog Dialog Box
    3.9.12 The Rule Editor Dialog Box for Rules with Boolean Operators
    3.9.13 The Test Configuration Dialog Box

iv SAS Sentiment Analysis Studio: User’s Guide


    3.9.14 The Select Font Dialog Box
    3.9.15 The New Corpus Dialog Dialog Box
    3.9.16 The Choose a File and Similar Dialog Boxes
    3.9.17 The Information and SAS Sentiment Analysis Studio Windows

4 Accessing a Project and Choosing a Model
  4.1 Overview of Project Creation
  4.2 Defining a New SAS Sentiment Analysis Studio Project

    4.2.1 Create a New Project
  4.3 Access an Existing SAS Sentiment Analysis Studio Project
  4.4 How Your Selected Model Appears in the Interface

5 Creating a Statistical Model
  5.1 Assembling Training Documents
  5.2 Set Up the Training and Validation Corpora
  5.3 Create One or More Models for Your Project

    5.3.1 Overview of Choosing a Model
    5.3.2 Create a Simple Statistical Model
    5.3.3 Create an Advanced Statistical Model

  5.4 Building, Training, and Validating a Model
    5.4.1 Understanding the Interrelationship of These Operations
    5.4.2 Building a Statistical Model
    5.4.3 Train a Statistical Model
    5.4.4 Validate a Statistical Model
    5.4.5 Activate a Statistical Model

  5.5 Import a Precompiled Model
  5.6 Before You Test Your Statistical Model

6 Creating a Rule-Based Model
  6.1 Overview of the Rule-based Model
  6.2 Understanding Sentiment Computation

    6.2.1 Overview of Sentiment Computation
    6.2.2 Using Pseudo Code to Understand Sentiment Computation
    6.2.3 Change the Weight for a Rule

  6.3 Step 1: Defining Your Keywords
    6.3.1 Understanding Keywords
    6.3.2 Add Keyword Rules into the Keywords Node
    6.3.3 Load Keywords Using the Import Learned Features Operation


    6.3.4 Write Keyword Definitions
    6.3.5 Adding Weights

  6.4 Step 2: Specifying Product and Feature Information
    6.4.1 Add a Product
    6.4.2 Add a Product Feature

  6.5 Step 3: Specifying Rules
    6.5.1 Overview of Specifying Rules
    6.5.2 Writing Rules
    6.5.3 Import Rules

  6.6 Step 4: Edit Your Rules
  6.7 Step 5: Build the Rule-based Model
  6.8 Step 6: Specify the Test Configuration
  6.9 Export Your Rules

7 Writing Sentiment Analysis Rules
  7.1 The Benefits of Rules
  7.2 What You Need to Know Before You Write Your Sentiment Analysis Rules

    7.2.1 Overview of What You Need to Know
    7.2.2 Verify the Accuracy of Your Rules
    7.2.3 Ensure Accurate Rule Matches

  7.3 Rule Types
  7.4 Table of Rule Modifiers
  7.5 What Are the Building Blocks for Sentiment Analysis Rules?

    7.5.1 About the n-gram Sequence
    7.5.2 Entering Comments into Rules
    7.5.3 Specify a Match within an XML Field
    7.5.4 Specifying the _c Marker
    7.5.5 Specifying Arguments
    7.5.6 Specifying the _w Marker
    7.5.7 Specifying the _cap Marker
    7.5.8 Specifying the > Symbol
    7.5.9 Specifying the @ Sign to Match Word Forms
    7.5.10 Specifying Rule Punctuation Marks

      7.5.10.A Specifying Quotation Marks
      7.5.10.B Specifying Parentheses, Square Braces, and Curly Braces
      7.5.10.C Specifying Commas
      7.5.10.D Specifying Colons
      7.5.10.E Specifying Spaces


    7.5.11 Specifying Boolean Operators
      7.5.11.A Overview of Boolean Operators
      7.5.11.B Specifying the ALIGNED Operator
      7.5.11.C Specifying the AND Operator
      7.5.11.D Specifying the OR Operator
      7.5.11.E Specifying the ORD Operator
      7.5.11.F Specifying the DIST_n Operator
      7.5.11.G Specifying the ORDDIST_n Operator
      7.5.11.H Specifying the SENT Operator

    7.5.12 Specifying the _def Marker
    7.5.13 Specify a Part-of-Speech Tag
    7.5.14 Writing Regular Expressions
    7.5.15 Specifying Coreference Operators

  7.6 Sentiment Analysis Concept Definition Examples
    7.6.1 Example: Matching a Term with a CLASSIFIER Rule
    7.6.2 Example: Matching a CONCEPT Rule
    7.6.3 Example: Context Matching with a C_CONCEPT Rule
    7.6.4 Example: Matching Boolean Operators in a Concept_Rule
    7.6.5 Examples: Predicate Rule

      7.6.5.A Overview of Predicate Rule Examples
      7.6.5.B Example 1: Matching Two Different Concepts
      7.6.5.C Example 2: Referencing One Concept and Two Terms
      7.6.5.D Comparing Products and Features Using a Predicate Rule

    7.6.6 Example: Using Regular Expressions to Match Patterns
  7.7 Troubleshoot Your Rules

8 Creating a Hybrid Model
  8.1 Understanding a Hybrid Model
  8.2 Create a Hybrid Model
  8.3 Test Your Hybrid Model

9 Testing Your Models, Exporting Your Rules
  9.1 Assembling Testing Documents

    9.1.1 Overview of Assembling Testing Documents
  9.2 Import the Testing Documents
  9.3 Testing a Model

    9.3.1 Overview of Testing a Model
    9.3.2 Test One Folder
    9.3.3 Test One Document
    9.3.4 Manually Test a Document


Appendixes

A The Program Files
  A.1 Overview of the Program Files
  A.2 What Are the Files in the Project Folder?
  A.3 What Are the Tags in the Project Settings XML File Format?
  A.4 What Are the Tags in the Rules File?

B Regular Expressions
  B.1 What Rules and Restrictions Apply to Regular Expressions?
  B.2 What Special Characters Are Used with Regular Expressions?
  B.3 What Are the Special Cases?

C Part-of-Speech Tags

D Recommended Reading

E Glossary
Index


About This Book

Audience

SAS Sentiment Analysis Studio is designed for the following users:
- subject matter experts who write the complex rules that identify the sentiment expressed in input documents.
- persons who assemble sets of representative documents to use for training and testing purposes.
- (optional) linguists who develop the rules to extract the expressed sentiments.
- (optional) users who select one of the advanced statistical models available in SAS Sentiment Analysis Studio.

You might be assigned one of these roles, or you might develop a project by yourself.

Prerequisites

Here are the prerequisites for using SAS Sentiment Analysis Studio:
- SAS Sentiment Analysis Studio installed on your machine
- access to representative documents in which you want to locate sentiments about your products and their features


Conventions

This manual uses the following typographical conventions:

Convention     Description
TGM_ROOT       The root directory where SAS Sentiment Analysis Studio is installed, typically the following:
                 Windows: C:/Program Files/Teragram/SAS Sentiment Analysis Studio/SAM
                 UNIX: /opt/SAS Sentiment Analysis Studio
.sam           The code examples for the .sam file are shown in a fixed-width font.
Test button    The labels for user interface controls are shown in a bold, sans-serif font.
Product        The names of taxonomy nodes appear in a fixed-width font.
www.sas.com    The hypertext links are shown in a light blue, fixed-width font, and are underlined.


What’s New in SAS Sentiment Analysis Studio 1.3

Overview

New and enhanced features in SAS Sentiment Analysis Studio include the following:

- improved user interface
- rule enhancements
- additional languages
- SAS licensing replaces the Teragram license

Improved User Interface

The following enhancements were made to the user interface:
- The Set test configuration and Search rules buttons appear in the standard toolbar.
- The Polarity Keywords and Product tabs have been merged into the Rule tab.
- The Rule pane displays the Search Result, Syntax Errors, and Rule Evaluation Result tabs.
- The Project Settings Dialog dialog box replaces the Preferences Wizard. You can specify the settings for your rule-based and statistical models separately.

Rule Enhancements

The following enhancements were made to the rules:
- Positive and negative rules are now weighted equally. By default, the Relative weight of positive rules in rule-based model value in the Set Test Configuration window is 100%.


Note: At this time, changing this setting has no effect on sentiment scores.

- In-place editing enables you to edit your rules in the Body field using the operations that are available in a drop-down menu.

- Use the Search Rules dialog box to locate matching terms in your rules. The results appear in the Search Result pane of the Rule pane.

- Syntax checking occurs automatically when you either edit a rule or build a model.

- Use the coreference operator (_ref) for pronoun resolution. When a pronoun or another word refers to a term, the canonical form of that term is returned.

- Define rules that limit matches to specified XML fields.
- Use the Rule Editor window to edit your rules in tree format. Other editing operations are available from a menu when you right-click a rule.

- Specify and reference intermediate entities (concepts) in your rules. Other concepts can reference an intermediate concept, which shortens the rule-writing process: you write a definition once and reference it multiple times.

- Append an at sign (@) in your rules to match the other word forms of a term (morphological expansion).

- Use the Set Test Configuration dialog box to specify the model settings for the testing operation.

Additional Languages

The following languages have been added: Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Indonesian, Norwegian, Romanian, Russian, Slovak, Thai, Turkish, Vietnamese, and Farsi (beta version).


Licensing

SAS licensing replaces the Teragram license. For more information, see SAS Sentiment Analysis Studio: Installation Guide.


1 About SAS Sentiment Analysis Studio

- What Is SAS Sentiment Analysis Studio?
- Benefits of Using SAS Sentiment Analysis Studio
- How Does SAS Sentiment Analysis Studio Work?
- Architecture

1.1 What Is SAS Sentiment Analysis Studio?

SAS Sentiment Analysis Studio is a comprehensive solution to the multifaceted challenges of analyzing sentiment in input documents. It automatically evaluates the feelings that input documents express about an object. Sentiment can apply to any object, person, event, or experience, and also to the features and attributes of these objects. In this document, these objects, persons, events, and experiences are referred to as products, and their attributes are referred to as features.

SAS Sentiment Analysis Studio analyzes the sentiment expressed in the text to determine whether it is positive, negative, or neutral. It also extracts the critical comments that alert you to product and feature trends early in the life cycle of a product.

SAS Sentiment Analysis Studio combines several key technologies to provide a comprehensive solution to sentiment analysis:

Sentiment Analysis

SAS Sentiment Analysis Studio evaluates input documents by using statistical text analysis, rule-based text analysis with advanced linguistic technologies, or both types of analysis. These analyses evaluate the subtle, or seemingly contradictory, emotional content of the documents.

A choice of models and tools
Choose to deploy any of several statistical methods to analyze your input documents. You have the same choice for the rules that are used to determine the sentiment of the input documents. You can use either statistical or rule-based models, or combine both in one hybrid model.

Rule-based analyses
Analyze the sentiment of input documents at a granular level.

Statistical analyses
Discover the overall emotions shaping sentiment in input documents by using numerical data.

Hybrid analyses
Combine rule-based and statistical analysis in one hybrid model.

A choice of languages
Build a project in English or in any of the 28 available world languages. This capability makes it possible for you to build an application in your native language.

Intuitive user interface
Use the SAS Sentiment Analysis Studio interface to create a custom project.

Evaluation and Display
Return a detailed breakdown of sentiment in input documents, and display color-coded graphs of the extrapolated information on a dashboard. This reporting feature includes the ability to track both overall sentiment and the sentiment about specific features.

Sample Project
Use the sample project that is included with SAS Sentiment Analysis Studio.


1.2 Benefits of Using SAS Sentiment Analysis Studio

SAS Sentiment Analysis Studio provides the following benefits:

Empowers business owners by analyzing customer sentiment that is found in input documents
SAS Sentiment Analysis Studio includes functionality for analyzing customer sentiment that is designed to fit your organization’s requirements. The product interface enables you to create a project that uses linguistic rules to locate sentiment. The application also provides the capabilities that are required to analyze these sentiments mathematically, so that you can discover trends and defects early in a product’s life cycle.

Improves the business value of IT and the corporate data that it manages
SAS Sentiment Analysis Studio provides you with easy, self-service access to customer sentiment. Use it to obtain the reliable analyses that you require to meet your customers’ demands.

Saves money on training and support costs
SAS Sentiment Analysis Studio is designed so that you can quickly become self-sufficient, with minimal IT support and no need for extensive training. Once you start using SAS Sentiment Analysis Studio, you are no longer dependent on the IT staff.

1.3 How Does SAS Sentiment Analysis Studio Work?

SAS Sentiment Analysis Studio is an application that anyone can use to interpret input documents about their products and the features of these products. You can obtain these analyses interactively by using models that you define in the SAS Sentiment Analysis Studio Windows interface, without using a programming language. Use the SAS Sentiment Analysis Studio interface to define the terms that are used in favorable and unfavorable documents. You can also determine whether to use rule-based (linguistic) analysis, statistical analysis, or a combination of linguistic and statistical analyses. List your products and their features, and then test the analytical model that you define to determine whether you are obtaining the results that you require.

1.4 Architecture

Figure 1-1 SAS Sentiment Analysis Studio

During the management phase, subject matter experts specify sentiment concepts that are based on the information in your training documents. This group of training documents is called the training corpus; multiple collections of training documents are called training corpora. During the second part of this phase, subject matter experts write definitions to ensure that all of the documents that should match a sentiment are located.

The .sam file is used by the SAS Sentiment Analysis Server to identify the expressed sentiment in the input documents.


2 Before You Begin

The SAS Sentiment Analysis Studio user interface is designed to enable you to progress, from top to bottom, through the steps of building a project.

Figure 2-1 Building a Project Using the User Interface

The SAS Sentiment Analysis Studio user interface appears after you create a new project.

Note: In order to build a project, it is not necessary to change any of the project settings in the New Project Wizard that appears when you define a new project.


3 Understanding the Interface Components

- Your First Look at the SAS Sentiment Analysis Studio User Interface
- The SAS Sentiment Analysis Studio Menus
- The Standard Toolbar
- The New Project Wizard
- The Main SAS Sentiment Analysis Studio User Interface
- The Rule Tab
- The Test Tab
- The Project Settings Dialog Wizard
- The Miscellaneous Windows

3.1 Your First Look at the SAS Sentiment Analysis Studio User Interface

To access the SAS Sentiment Analysis Studio user interface, select Start --> Programs --> SAS Sentiment Analysis Studio --> SAS Sentiment Analysis Studio.


Display 3-1 SAS Sentiment Analysis Studio User Interface

The components of the main window are listed below, from top to bottom:

Program and Project title bar

see the name of the program that you are running and the title of the project that you are working on. (The project title appears only after you create a new project.)

Menu bar

access drop-down lists for project tasks. For more information, see Section 3.2 The SAS Sentiment Analysis Studio Menus on page 17.

Standard toolbar

click shortcut buttons for some operations. For more information, see Section 3.3 The Standard Toolbar on page 20.


3.2 The SAS Sentiment Analysis Studio Menus

3.2.1 About the Availability of Menus and Menu Selections

All of the following conditions influence whether a menu or menu selection is available to use:

- Your location in the SAS Sentiment Analysis Studio application. For example, some tasks are available only after you create a project and select a tab.

- Whether you have created a project

- The type of model that you are building

- The selections that you choose

3.2.2 About Menus

The menus contain operations that apply to the entire project or to the currently displayed tab. For example, you can create a new project, open an existing project, or build a project.

3.2.3 File Menu

Here are the operations that are available in the File menu:

New

access the New Project Wizard, where you name your new project, set its path, and choose its language. You can also change the default settings for your models using this wizard.

Open

access the Choose a project file dialog box that enables you to locate an existing project.

Close

exit the current project.


Save

store the current project and its specifications.

Export Rules

send your rules, as an .xml file, to a location on your local machine.

Import Rules

bring in your rules, as an .xml file, from a location on your local machine.

Recent Projects

select one of the projects that you created from the drop-down list that appears. The most recently accessed projects are listed in order, from the most recent to the oldest.

Exit

close this program.

3.2.4 Edit Menu

Here are the operations that are available in the Edit menu:

Font

access the Select Font dialog box where you can select the font, style, and size for the interface component names, rules, manual testing texts, and so on. For more information, see Section 3.9.14 The Select Font Dialog Box on page 83.

Preferences

access the Project Settings Dialog wizard where you can specify the log and model settings. For more information, see Section 3.8 The Project Settings Dialog Wizard on page 61.

Search Rules

access the Search Rules dialog box where you enter the text to search and make any additional specifications. For more information, see Section 3.9.1 The Search Rules Dialog Box on page 62.


3.2.5 View Menu

Here are the operations that are available in the View menu:

Toolbar

see, or hide, the toolbar by clicking this selection.

Status Bar

see, or hide, the status bar.

Note: At this time, this feature does not display any operations.

Output pane

see, or hide, the Search Result, Syntax Errors, and the Rule Evaluation Result tabs.

3.2.6 Build Menu

Here are the operations that are available in the Build menu:

Build Statistical Model

compile a model that uses numerical data to determine sentiment. For more information, see Chapter 5: Creating a Statistical Model.

Import Learned Features

import the learned features from a statistical model; these features form the rules for the Positive and Negative concepts in the Rule tab. Learned features are the keywords generated by a statistical model. Use this operation instead of writing concept rules manually. For more information, see Section 6.3 Step 1: Defining Your Keywords on page 132.

Build Rule-based Model

compile a model that uses rules to extract the sentiments expressed in the input documents. For more information, see Chapter 6: Creating a Rule-Based Model.


Build Hybrid Model

compile a model that uses a combination of statistical analysis and linguistic rules to analyze input documents. For more information, see Chapter 8: Creating a Hybrid Model.

Set Test Configuration

specify which model to test. This operation enables you to build multiple models but test only the selected model. For more information, see Section 3.9.13 The Test Configuration Dialog Box on page 81.

3.2.7 Help Menu

Here are the operations that are available in the Help menu:

SAS Sentiment Analysis Studio: User’s Guide

displays a PDF version of this SAS Sentiment Analysis Studio: User’s Guide.

About SAS Sentiment Analysis Studio

enables you to see license, version, and build date information.

3.3 The Standard Toolbar

Access a number of operations using the standard toolbar that is located below the menu bar. These standard toolbar icons are shortcuts to some, but not all, of the commands available in the menu bar.

Display 3-2 Standard Toolbar

To hide or show the standard toolbar, select View --> Toolbar.


Table 3-1: Standard Toolbar

New

Click the New button and the New Project Wizard appears. Name your project and perform other operations. For more information, see Section 3.4 The New Project Wizard on page 21.

Open

Click the Open button and the Choose a project file dialog box appears, asking you to locate an existing project file (.xml). For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.

Save

Click the Save button to save your project.

Preferences

Click the Preferences button and the Project Settings Dialog wizard appears.

The following operations appear only after you create a project:

Set test configuration

Click the Set test configuration button and the Set Test Configuration dialog box appears. For more information, see Section 3.9.13 The Test Configuration Dialog Box on page 81.

Search rules

Click the Search rules button and the Search Rules dialog box appears. For more information, see Section 3.9.1 The Search Rules Dialog Box on page 62.

3.4 The New Project Wizard

3.4.1 Overview of the New Project Wizard

Use the New Project Wizard to specify the settings for a new project. These settings determine the name and location of the project and apply only at the project level; they do not affect the installation.


Tip: To change the project settings after you create a project, select Edit --> Preferences. The Project Settings Dialog wizard appears. When you select this operation, the Log File pane also appears.

3.4.2 Access Project Settings

Access Project Settings in the New Project Wizard to specify the basic settings for your SAS Sentiment Analysis Studio project. After you specify these settings, the user interface is accessible and you can begin creating a project.

To access and use the Project Settings window, complete these steps:

1. Select File --> New. The New Project Wizard appears.

2. Enter the name of the new project into the Name field. For example, type MyNewProject.


3. Enter the path to the folder where the project is stored into the Directory field, or click and the Browse For Folder window appears.

4. Use the Browse For Folder window to select a location for your project.

a. Select a folder.

b. Click OK.

5. If you purchased more than one language, click in the Language field in the Project Settings window. Select one of the languages that you purchased from the drop-down list that appears.

6. Click Next and the Settings for Rule-based Model window appears. For more information, see Section 3.4.3 Specify the Rule-Based Model Settings below.

3.4.3 Specify the Rule-Based Model Settings

The rule-based model settings enable you to choose the specifications for the rule-based model. Use this model when you want to write rules that determine the sentiment of the input documents. (If you do not plan to use the rule-based model, or if you are satisfied with the default settings, no action is required.)


Display 3-3 Specify the Settings for the Rule-Based Model

To use the Rule-based Model Settings window, complete these steps:

1. (Optional) By default, the Word Tokenizer field specifies a path to the tokenizer file that is used to break input text into tokens. Click to select a tokenizer file that is not the default entry. The Browse For Folder dialog box appears. Follow Step 3. through Step 4. on page 23.


2. (Optional) By default, Use sentence tokenizer is selected. SAS Sentiment Analysis Studio uses the specified file to break input text into sentences.

3. (Optional) Click and the Select a File window appears where you can choose your custom sentence tokenizer file. For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.

4. (Optional) By default, Use part-of-speech tagger is selected. SAS Sentiment Analysis Studio uses the .htagger file, or a similar file, to recognize parts of speech in the input text.

5. (Optional) See Step 3. above and select your custom part-of-speech tagger file.

6. (Optional) By default, Enable case insensitive is selected. SAS Sentiment Analysis Studio uses the .bin file to recognize upper- and lowercase letters in the input text.

7. (Optional) See Step 3. above and select your custom case file.

8. (Optional) By default, Enable morphological expansion is selected. Words that have an at sign (@) appended are automatically expanded into all of their word forms. Each of these forms can produce a match.

9. (Optional) By default, a filename and path appear in Tlp file. This file provides a list of expanded word forms enabling SAS Sentiment Analysis Studio to recognize various word forms when the at sign (@) is appended to the original term.

10. (Optional) See Step 3. above and select your custom expanded word forms file.

11. (Optional) By default, a filename and path are entered into Tags to expand file. This file provides a list of the part-of-speech tags that are necessary to recognize words based on their function in the input text.

12. (Optional) See Step 3. above and select your custom part-of-speech file.

13. (Optional) By default, Enable XML support is selected. If you deselect this check box, SAS Sentiment Analysis Studio matches rule text in .xml files as if they are .txt files.


14. (Optional) In the Default fields field, enter a comma-separated list of the XML fields to search for matching terms.

15. (Optional) In the XML tags to ignore field, enter a comma-separated list of the XML tags that are not searched.

16. Click Next and the Settings for Statistical Model window appears.
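Steps 14 and 15 can be pictured with the following minimal Python sketch of how a comma-separated field list and an ignore list might narrow the text that rules are matched against. It is not the product's implementation. The function names `parse_field_list` and `searchable_text`, the sample review, the rule that an ignored tag hides everything nested inside it, and the rule that an empty Default fields setting means "search every field" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def parse_field_list(value: str) -> set[str]:
    """Split a comma-separated setting such as 'title, body' into tag names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def searchable_text(xml_doc: str, default_fields: str, tags_to_ignore: str) -> list[str]:
    """Return the text chunks that rules would be matched against (hypothetical model)."""
    fields = parse_field_list(default_fields)
    ignored = parse_field_list(tags_to_ignore)
    chunks: list[str] = []

    def walk(elem: ET.Element, inside_field: bool) -> None:
        if elem.tag in ignored:
            return  # assumption: an ignored tag hides its whole subtree
        # Assumption: an empty Default fields setting means "search every field".
        in_scope = inside_field or not fields or elem.tag in fields
        if in_scope and elem.text and elem.text.strip():
            chunks.append(elem.text.strip())
        for child in elem:  # tail text after nested tags is skipped for brevity
            walk(child, in_scope)

    walk(ET.fromstring(xml_doc), inside_field=False)
    return chunks


review = (
    "<review><title>Great phone</title>"
    "<body>Battery life is poor<sig>Sent from my phone</sig></body>"
    "<meta>id 42</meta></review>"
)
print(searchable_text(review, "title, body", "sig"))
```

With Default fields set to `title, body` and XML tags to ignore set to `sig`, the sketch searches only the title and body text and skips the signature and the metadata.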

3.4.4 Specify the Statistical Model Settings

The statistical model settings enable you to choose the specifications for the statistical model. Use this model when you want to determine the sentiment of the input documents using numerical data. (If you do not plan to use the statistical model, no action is required.)

Display 3-4 Specify the Settings for the Statistical Model


To use the Statistical Model Settings pane, complete these steps:

1. (Optional) Use any or all of the selections in Step 1. through Step 5. on page 25.

2. (Optional) By default, Use noun phrase extraction is selected. SAS Sentiment Analysis Studio uses the .concepts file to recognize noun phrases such as the boat in the input text.

(Optional) Click to select a noun phrase extraction file that is not the default entry. The Select a File window appears where you can select a different noun phrase extraction file. For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.

3. (Optional) By default, Use predefined stop words filter is selected. SAS Sentiment Analysis Studio uses the .txt file to recognize words such as the, a, and at that are “noise.” Using this file prevents these words from being used by the statistical models in this project.

4. (Optional) See Step 2. above and select your stop words file.

Note: Alternatively, you can add your custom file when you specify the settings for an advanced statistical model.

5. Click Next and the Summary window appears.
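The effect of a stop words filter can be sketched in Python as follows. This illustrates the idea under stated assumptions and is not SAS code: the stop words file is assumed to list one word per line, the short `PREDEFINED_STOP_WORDS` set is a hypothetical stand-in for the file shipped with the product, and the regular-expression tokenizer is a simplification of the product's word tokenizer.

```python
import re

# Hypothetical stand-in for the predefined stop words file shipped with the product.
PREDEFINED_STOP_WORDS = {"the", "a", "an", "at", "of", "and"}


def load_stop_words(text: str) -> set[str]:
    """Parse a stop words file that lists one word per line (assumed format)."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def candidate_features(document: str, stop_words: set[str]) -> list[str]:
    """Lowercase the text, split it into simple word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", document.lower())
    return [token for token in tokens if token not in stop_words]


text = "The battery of the phone is great"
print(candidate_features(text, PREDEFINED_STOP_WORDS))
# -> ['battery', 'phone', 'is', 'great']

# Extra noise words are combined with the predefined list by set union.
custom = load_stop_words("phone\n\nIs\n")
print(candidate_features(text, PREDEFINED_STOP_WORDS | custom))
# -> ['battery', 'great']
```

Adding extra noise words, as the Runtime stop words field does for advanced models, amounts to a set union in this sketch.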


3.4.5 Use the Summary Window

Use the Summary window to review the settings that you selected for your project.

Display 3-5 Check Your Project Specifications in the Summary Window

To use the Summary window, complete these steps:

1. Scroll through this pane.

2. (Optional) If you want to make a change, click the Back button until you access the pane where you can make the necessary revision.

3. Click Finish and the main SAS Sentiment Analysis Studio user interface appears.


3.5 The Main SAS Sentiment Analysis Studio User Interface

3.5.1 Overview of the User Interface

By default, the Corpora pane of the main window of the SAS Sentiment Analysis Studio user interface appears after you set up your project using the New Project Wizard. For more information, see Section 3.4 The New Project Wizard on page 21. Use these panes, from top to bottom, to build a project.

Display 3-6 Corpora Tab

Corpora tab

(for use with the statistical model only) import the training documents that are used by the statistical model to define sentiment types. For more information, see Section 3.5.2 The Corpora Tab on page 30.

Statistical tab

specify one, or more, types of statistical models. For more information, see Section 3.5.3 The Statistical Tab on page 31.


Rule tab

write the rules that are used by the rule-based model to locate expressed sentiment in input documents. For more information, see Section 3.6 The Rule Tab on page 42.

Test tab

test your models against preselected testing documents. For more information, see Section 3.7 The Test Tab on page 55.

3.5.2 The Corpora Tab

By default, the Corpora tab is displayed when you start SAS Sentiment Analysis Studio. See Display 3-6 on page 29. Use the Corpora pane to import and navigate the training documents that you assemble for the various sentiment types. You manually collect, analyze, and sort these documents by expressed sentiment. After you import a corpus and expand the components, this pane looks similar to the example displayed below.

Display 3-7 Corpora Tab


These are the sections in the Corpora tab:

Corpus

Define your training taxonomy and import training documents using the following predefined classes:

Positive

import the training documents that express a positive sentiment into this folder.

Negative

import the training documents that express a negative sentiment into this folder.

Neutral

import the training documents that express ambivalence into this folder.

Unclassified

import the training documents that have not been reviewed for their expressed sentiment into this folder.

Reference Files pane

Access the following operations in this pane:

- see the list of training documents.
- click a training file to see its text in the File Content pane.
- right-click a training file and select the Delete File operation that appears.

File Content pane

display the text of a training document.
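The four predefined classes can be pictured as folders of training documents. The following Python sketch loads such a corpus into memory, one list of texts per class. The on-disk layout (one subfolder per class, one .txt file per document) and the name `load_corpus` are assumptions for illustration, not the format that SAS Sentiment Analysis Studio uses internally.

```python
from pathlib import Path

# The predefined classes shown in the Corpora tab.
SENTIMENT_CLASSES = ("Positive", "Negative", "Neutral", "Unclassified")


def load_corpus(corpus_dir: Path) -> dict[str, list[str]]:
    """Read each class subfolder's .txt files; a missing subfolder yields an empty list."""
    corpus: dict[str, list[str]] = {}
    for label in SENTIMENT_CLASSES:
        folder = corpus_dir / label  # assumed layout: <corpus>/<class>/<document>.txt
        files = sorted(folder.glob("*.txt")) if folder.is_dir() else []
        corpus[label] = [path.read_text(encoding="utf-8") for path in files]
    return corpus
```

For example, `load_corpus(Path("MyCorpus"))["Positive"]` would return the texts of the positive training documents, where `MyCorpus` is a hypothetical folder.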

3.5.3 The Statistical Tab

3.5.3.A Overview of the Statistical Tab

Use the Statistical tab to define a model that uses numerical data to determine the sentiment of input texts. SAS Sentiment Analysis Studio uses the statistical model, with or without the rule-based model, to analyze input texts. (A hybrid model combines the rule-based and statistical models.) You can also import a .li file of concepts from SAS Contextual Extraction Studio and use these definitions to train your statistical model.

Display 3-8 Statistical Pane

3.5.3.B Define a New Model

By default, the Statistical Models pane is displayed when you select the Statistical tab. Use this pane to name your statistical models, to access the Configuration, Text Results, and Graphical Results tabs, and to select a model to test. You gain access to these tabs when you create a new model or import a precompiled model.

Caution: If you import a precompiled model and have also defined simple or advanced models, unexpected behavior can occur if you try to delete one of these models. You might lose the data for these statistical models.

To access these tabs, complete these steps:


1. Right-click in the white space of the Statistical Models pane. A drop-down menu appears.

2. Select New Model to define a new statistical model for your project. The New Model Dialog window appears.

3. Enter the name of the new statistical model that you are defining into the Model name field. For example, type MySimpleModel.

4. (Optional) Leave the default setting Simple selected under the Work Mode heading. Alternatively, select Advanced. For more information, see Section 3.5.3.C Use the Statistical Model Configuration Tab for the Simple Model on page 35. Also see Section 3.5.3.D Use the Statistical Model Configuration Tab for the Advanced Model on page 36.


5. Click OK and the Statistical Model Configuration pane appears.

Note: Select Import Precompiled Model to access the Import Precompiled Model dialog box. Use this operation only after you build a statistical model and want to use this model with another project or on another machine. For more information, see Section 3.9.3 The Import Precompiled Model Dialog Box on page 66. A precompiled model does not display the Configuration, Text Results, and Graphical Results tabs.


3.5.3.C Use the Statistical Model Configuration Tab for the Simple Model

Use this section of the Statistical tab to specify the settings that define this model. See the image in Step 5. on page 34.

Statistical Model Configuration

To configure the basic settings for the simple model, complete these steps:

1. Click in the Training corpus field to locate the folder containing the training and validating documents for this model.

2. Click either or to change the default setting of 80% in the Set percentage for training field. By default, 80% of the texts in the selected folder are used to train the model. The remaining 20% are used to validate the model. (SAS Sentiment Analysis Studio automatically selects the validating texts.)

3. (available for simple models only) Click Best mode to run 16 training algorithms while the model is trained. By default, the Smoothed Relative Frequency text normalization algorithm is run in combination with each of the four feature ranking algorithms.

Tip: It takes longer to train the model when Best mode is selected.

4. (available only after you train a model) Click Text Result to see the testing results of this model.

5. (available only after you train a model) Click Graphical Result to see the testing results of this model in bar chart form.
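The default 80/20 division between training and validating documents described in Step 2 can be sketched in Python as follows. SAS Sentiment Analysis Studio selects the validating texts automatically and does not document how; the seeded random shuffle and the name `split_corpus` here are assumptions for illustration only.

```python
import random


def split_corpus(documents: list[str], training_percent: int = 80,
                 seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle a copy of the documents and split it into training and validation sets."""
    if not 0 <= training_percent <= 100:
        raise ValueError("training_percent must be between 0 and 100")
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # seeded so that the split is reproducible
    cut = len(shuffled) * training_percent // 100  # integer number of training documents
    return shuffled[:cut], shuffled[cut:]


docs = [f"review{i}.txt" for i in range(10)]
train, validate = split_corpus(docs)
print(len(train), len(validate))  # -> 8 2
```

Every document lands in exactly one of the two sets, so the validation documents are never seen during training.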


3.5.3.D Use the Statistical Model Configuration Tab for the Advanced Model

If you select Advanced, see the following example and the explanations that follow:

To define an advanced model, complete these steps:

1. See the information for the Training corpus and Set percentage for training fields in Section 3.5.3.C Use the Statistical Model Configuration Tab for the Simple Model on page 35.

2. (No selection is possible.) The Bayes Method is the only available Solution at this time.

3. (Optional) Use either or to set the value in the Probability threshold field. The probability threshold is used to determine whether the overall sentiment of a document is positive, negative, or neutral. This value is used during the validation step when the statistical model is built. When SAS Sentiment Analysis Studio processes a document, the text is assigned a probability score between 0 and 1 (or, expressed as a percentage, between 0 and 100%). A document whose probability score is above the threshold is considered positive. A document whose score falls below the threshold is considered negative. A document whose score equals the threshold is considered neutral.

- (Default setting) 0.5: signifying that this value is neutral

- 0: signifying the most negative value

- 1: signifying the most positive value

Note: Most values fall somewhere between the extremes of 0 and 1.

4. (Optional) Click to select one of the following algorithms in the Text normalization model field:

(Default) Smoothed Relative Frequency or Relative Frequency

normalize the term frequency by dividing it by the normalization factor.

Okapi BM25

is derived from the 2-Poisson probabilistic model.

Pivoted Length Normalization

is derived from studies of the relationship between document length and the probability of relevance in test collections.

5. (Optional) Click to the right of the Contextual extraction field and the Select a File dialog box appears. (For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.) Use this window to locate a .li file. This file can be one of two types: a noun-phrase file for languages such as Chinese or Japanese, or a .li file that was output from SAS Contextual Extraction Studio.

When you specify a .li file, SAS Sentiment Analysis Studio applies it to the training documents and uses the extracted concepts as additional features to train the statistical model.


6. Click to the right of the Runtime stop words field and the Open dialog box appears. Use this dialog box to locate an existing .txt file that lists the noise words that you want to add to the stopwords file (stopwords.txt). If you chose to use a custom stopwords file, the file that you specify here is added to that custom file. The stopwords file that you specify here is used for this project only. (The default stopwords.txt file applies to the entire program.)
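The probability threshold rule described in Step 3 can be written as a small Python function. The score itself would come from the Bayes model, which is not reproduced here; the function name `classify` and the label strings are hypothetical.

```python
def classify(score: float, threshold: float = 0.5) -> str:
    """Label a document from its probability score using the threshold rule."""
    if not (0.0 <= score <= 1.0 and 0.0 <= threshold <= 1.0):
        raise ValueError("score and threshold must lie between 0 and 1")
    if score > threshold:
        return "positive"
    if score < threshold:
        return "negative"
    return "neutral"  # the score is exactly equal to the threshold


for s in (0.73, 0.20, 0.50):
    print(s, classify(s))
```

Because the scores are continuous, a score that exactly equals the threshold is uncommon in this sketch, so few documents receive the neutral label.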

3.5.4 Using the Text Result Tab

The Text Result pane provides a quick overview of the training results and shows which model is determined to be the best for your project. To see this pane, click the Text Result tab. The BEST MODEL information appears at the bottom of the pane, whether you select a simple or an advanced statistical model.

Display 3-9 Result Region Text Results


Tip: If you do not see the Text Result tab, make sure that View --> Output Pane is selected. If you still do not see these tabs, place your cursor at the bottom of the right side of the user interface. When the cursor forms two parallel lines with arrows pointing up and down, pull the cursor up.

3.5.5 Using the Graphical Result Tab

The Graphical Result pane provides a quick overview of the training results, including the BEST MODEL information. To see this pane, click the Graphical Result tab. The example shown below is for an advanced model; the BEST MODEL information appears at the top of the pane.


Display 3-10 Results in Graph Format

Tip: If you do not see the Graphical Result tab, place your cursor at the bottom of the right side of the user interface. Pull the cursor up.


3.5.6 Train, Activate, Validate, or Delete the Model

After you define your model, you must train, validate, and activate it before you can use it. To perform these operations, complete these steps:

1. Right-click a model in the Statistical Models pane. A drop-down menu appears.

2. Select Train Model to develop the model using the training corpus. The SAS Sentiment Analysis Studio window appears with a message stating that the model is being built.

3. Select Validate Model. When you select this operation, the process of building the model type that you selected begins and a check is automatically run. The documents that SAS Sentiment Analysis Studio sets aside according to the Set percentage for training field are used for the Validate Model operation.

4. Select Activate Model. When you select this operation, you enable SAS Sentiment Analysis Studio to analyze the input documents using the selected model. The most recently activated model is selected in the Default test type field of the Set Test Configuration window when you select Build --> Set Test Configuration.

5. Select Delete Model at any time.
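This guide does not describe how SAS Sentiment Analysis Studio chooses which documents to hold out for validation. As a rough illustration of the idea behind the Set Percentage For Training field, the following Python sketch splits a labeled corpus into a training part and a validation part. The function name split_corpus, the random shuffle, and the fixed seed are assumptions for illustration only, not the product's actual behavior.

```python
import random


def split_corpus(documents, training_percentage, seed=0):
    """Hypothetical sketch: hold out part of a corpus for validation.

    training_percentage plays the role of the Set Percentage For Training
    field; the split that the product really performs is not documented here.
    """
    if not 0 < training_percentage < 100:
        raise ValueError("training_percentage must be between 0 and 100")
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    cut = round(len(shuffled) * training_percentage / 100)
    return shuffled[:cut], shuffled[cut:]  # (training set, validation set)


docs = [f"doc{i}.txt" for i in range(10)]
train, validate = split_corpus(docs, 80)
print(len(train), len(validate))  # 8 2
```

With 80 percent reserved for training, 8 of the 10 documents train the model and the remaining 2 are available for a validation check.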


Tip: If you delete a statistical model, this model might continue to exist in the Workshop folder.

3.6 The Rule Tab

3.6.1 Overview of the Rule Tab

Use the Rule tab to specify the definitions that are used to locate the sentiment in an input document. The types of definitions that you define are specified below the Rules heading. These definitions, or rules, can collectively be referred to as concepts.

Figure 3-1 Default Concepts

By default, the following concepts appear:

Intermediate Entities

Use intermediate entities to define concepts that are frequently referenced in rules that use _def in their definition (CONCEPT, C_CONCEPT, CONCEPT_RULE, and PREDICATE_RULE). When you develop intermediate entities, you can write rules once and then refer to them many times.

Tips: If you enable intermediate entities, they are also matched. At this time, if you enable a concept, close the project, and reopen it, the concept might appear to be disabled. Right-click the concept, examine the enabled operations in the drop-down menu, and make the appropriate selection.

Products

Specify the rules that define the products and the features of these products. For example, specify the products and their features to eliminate the possibility of returning sentiment for products that should not be matched, such as a competing brand.

Keywords

Use the Tonal Keywords section, which is divided into Positive, Negative, and Neutral nodes, to write definitions that locate the corresponding sentiment.

Display 3-11 Rule Pane


Note: After you add products and features, the Summary node is added. This node contains all of the rules from the corresponding product’s features. For this reason, do not reference these nodes in your rules.

3.6.2 Add Concepts

To specify the concepts that enable you to write the rules that are applied to the testing documents, complete these steps:

1. Right-click on Intermediate Entities and select New Entity from the drop-down list that appears.

2. Enter the name of this concept into the Entity name field of the Add Intermediate Entity dialog box that appears. For example, type PositiveTerms.

3. Click OK to save this name.


4. (Optional) Expand the taxonomy tree. The rule-writing pane appears. For more information about the rule-writing pane, see Section 3.6.5 The Rule-Writing Pane on page 50.

5. Right-click on the Products node and select New Product from the drop-down list that appears. The Add Product dialog box appears.

6. Enter the name of an object, service, or brand into the Product name field. For example, type Nikon.

7. Click OK. This node, with the Feature, Summary, Positive, Negative, and Neutral subnodes, is added to the taxonomy. For more information about how to access the operations that are available for these concepts, see Table 3-2 on page 48.

8. Right-click on the Feature node and select Add Feature from the drop-down menu. The Add New Feature dialog box appears.

a. Enter the title for this feature into the Name field. For example, type Price.

b. (Optional) Use the increase or decrease button to reset the weight for this feature in the Relative weight field.


Note: At this time, changing the weight has no effect on the sentiment score.

c. Click OK.

9. Select any node in the taxonomy to display the rule-writing pane.

10. (Optional) Click the CLASSIFIER rule type under the Type heading and select a different rule type. For information about how to write rules, see Chapter 7: Writing Sentiment Analysis Rules.

11. (Optional) After you write two or more rules, you can change the display order by clicking Type, Body, or Weight.


3.6.3 Drop-down Operations

When you add entities, products, and features, you do so by right-clicking on a node. Use the following table to understand how to access and use these nodes.

Table 3-2: Drop-down Operations

Node | Operation | Description
Intermediate Entities | New Entity | The Add Intermediate Entity dialog box appears, where you can specify the name of the new entity. These concepts can be referenced with the _def operator in CONCEPT, C_CONCEPT, CONCEPT_RULE, and PREDICATE_RULE rules. For more information, see Section 3.9.5 The Add Intermediate Entity Dialog Box on page 72.
The entity node that you added | Enable Entity | By default, these concepts are disabled because they are intended to be used for reference purposes only. Select this operation to see these concepts matched in the Test --> Text Result tab.
The entity node that you added | Entity Property | Change the name of this concept in the Intermediate Entity Property dialog box that appears. For more information, see Section 3.9.6 The Intermediate Entity Property Dialog Box on page 73.
The entity node that you added | Delete Entity | Remove the concept.
The entity node that you added | Remove Duplicate Rules | Remove a rule whose syntax is identical to that of another rule.
Products | New Product | The Add Product dialog box appears. For more information, see Section 3.9.7 The Add Product Dialog Box on page 74.
Products | Paste Copied Product | Add the product that you copied, with all of its specifications, to the taxonomy.

After you add a new product, the following operations are available:

The product node that you added | Product Property | The Product Property dialog box appears. Use this dialog box to specify the new name for the product. For more information, see Section 3.9.8 The Product Property Dialog Box on page 75.
The product node that you added | Delete Product | A SAS Sentiment Analysis Studio status window appears, asking whether you want to remove this product from the project.
The product node that you added | Copy Product | After you select this operation, the Paste Copied Product operation is available when you right-click the Products node.
The product node that you added | Paste Copied Feature | This operation is available after you copy a feature, and only when you select a different product. You cannot add a feature to the same product that you copied it from.
The node that you added | Remove Duplicate Rules | Remove a rule whose syntax is identical to that of another rule.
Feature | Add Feature | Use the Add New Feature dialog box that appears to insert a new feature node into the taxonomy. For more information, see Section 3.9.9 The Add New Feature Dialog Box on page 75.
Feature | Paste Copied Feature | Access this operation after you copy a feature, add a new product node, and right-click the Feature node under the new product.

3.6.4 Using the Search Field

Use the search field in the Rule tab to locate the rules for the selected node that contain a search term.


Display 3-12 Enter a Search Term into the Search Field

If you want to locate a matching term only within one type of rule, click the All Types field and select that rule type.

Note: Clear the input term before you enter a new search term.

3.6.5 The Rule-Writing Pane

Use the rule-writing pane, on the right-hand side of the Rule pane, to develop the definitions that locate products, features, and the sentiment expressed about these objects or services. The rule-writing pane has the following headings:

Type

click the rule type to select a different type of rule. By default, CLASSIFIER is displayed. Table 3-3 lists the types of rules and briefly describes each one. For more information about rule types, see Chapter 7: Writing Sentiment Analysis Rules.

Body

write the rule syntax here. After you write your rules, you can right-click on the rule and use the editing operations that are available. With the exception of Edit in Tree View, these are standard Windows operations. If you select Edit in Tree View, the Rule Editor dialog box appears. For more information, see Section 3.9.12 The Rule Editor Dialog Box for Rules with Boolean Operators on page 79.

Table 3-3: Rule Types and Descriptions

Rule Type | Description
CLASSIFIER | Use this rule to match a string.
CONCEPT | Locate related terms by using strings, markers, and part-of-speech tags.
C_CONCEPT | Specify the context for matches. The required _c marker highlights the match in an input text.
CONCEPT_RULE | Return related information that includes matches on other concepts. Specify markers and part-of-speech tags. You must specify Boolean operators and use the _c marker, which highlights the match in the Test tab.
PREDICATE_RULE | Locate sentiment that is specific to a product and feature by using arguments. For example, define a relationship between the sentiment terrific and the feature focus for the camera product. Without this relationship, any match on the sentiment could be returned, even when there are no matches on the product and feature concepts. Use the _def marker to reference products or features in the same domain, or intermediate concepts.
REGEX | Return information that follows a preset pattern, such as dollar signs, percentages, and star (*) ratings. For example, type \d+ dollars to locate 10 dollars, 100 dollars, and so on. You can also specify these expressions within, or at the end of, a word to locate various word forms.

Note: For more complete information, see Chapter 7: Writing Sentiment Analysis Rules.
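To see what the \d+ dollars pattern from Table 3-3 matches, the following sketch runs it with Python's re module. This assumes that the REGEX rule type interprets \d+ the same way Python does; other regular-expression constructs might behave differently in the rule engine, so confirm real rules in the Test tab.

```python
import re

# The REGEX rule example from Table 3-3, tried with Python's re module.
# Assumption: \d+ means "one or more digits" in the rule engine as well.
pattern = re.compile(r"\d+ dollars")

text = "It cost 10 dollars here and 100 dollars at the mall, not ten dollars."
matches = pattern.findall(text)
print(matches)  # ['10 dollars', '100 dollars']
```

The spelled-out "ten dollars" is not matched because \d matches only digit characters.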


Weight

by default, every rule has the same weight, 1. This means that each match on a rule in the input document adds 1 to the total score for the document. For example, suppose that an input document has matches on two rules: three matches on the first rule and one match on the second rule. The document has a total of four matches, so its score is 4. Reset the weight to reflect the relative value of the matched keyword. For example, when you compare reviews of dance costumes, the word horrible could be assigned a higher weight than the phrase not too good.

Note: At this time, changing the weight does not affect your results.
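The scoring described under Weight can be written as a sum of weight times match count over the matched rules. The following Python sketch reproduces the worked example. The rule names and the document_score function are hypothetical, and because changing the weight does not currently affect results, the custom-weight call only illustrates what a weight is intended to mean.

```python
def document_score(match_counts, weights=None):
    """Sum (weight x number of matches) over the rules that matched.

    match_counts maps a rule name to how many times it matched in one document.
    Rules without an explicit weight use the default weight of 1.
    """
    weights = weights or {}
    return sum(count * weights.get(rule, 1) for rule, count in match_counts.items())


# The example from the text: one rule matches three times, another once.
counts = {"rule_a": 3, "rule_b": 1}
print(document_score(counts))                 # 4
# Illustration only: a weight of 2 on rule_a would give 3 * 2 + 1 * 1.
print(document_score(counts, {"rule_a": 2}))  # 7
```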

3.6.6 The Search Result, Syntax Errors Result, and Rule Evaluation Panes

The Search Result, Syntax Errors, and Rule Evaluation tabs appear at the bottom of the Rule pane. These tabs appear after you perform a search operation by using the search button, or when one or more rules contain a syntax error.

Tips: At this time, the Rule Evaluation pane does not work; it is the output pane for the rule evaluation operation. If you do not see the Search Result, Syntax Errors, and Rule Evaluation tabs, make sure that View --> Output Pane is selected. If you still do not see these tabs, place your cursor at the bottom of the right side of the user interface. When the cursor forms two parallel lines with arrows pointing up and down, pull the cursor up.


To see the Search Result tab, complete the following steps:

1. Access the Rule tab.

2. Click the search button. The Search Rules dialog box appears.

3. Enter the term that you want to find in any of the rules in your project into the Search for field of the Search Rules dialog box. For example, type good. For more information, see Section 3.9.1 The Search Rules Dialog Box on page 62.

Tip: The Search Rules dialog box appears only when you use this search button. If you use the search field in the Rule tab, you search only the current definition.

4. See the search matches highlighted in yellow in the Search Result pane that appears.

To see the Syntax Errors tab, complete the following steps:

1. Select Build --> Build Rule-based Model when there is at least one syntax error in one of the rules.

Tip: A syntax error is a mistake in the rule specification. At this time, some errors, such as a misspelled name of a referenced concept, are not recognized as syntax errors.

2. See the list of rule syntax errors that is displayed in the Syntax Errors tab.

3. (Optional) You can expand the node names to display all of the errors for each listed node.


3.7 The Test Tab

Use the Test tab to run tests on the model that you develop. Use the left-hand pane to load the testing files and the right-hand pane to see the sentiment results. The information returned in the panes on the right side of the interface enables you to make appropriate changes, if necessary.

Display 3-13 Test Tab

The Test tab contains these components:

Test Data

right-click in the white space below this node and select New Test Directory from the menu that appears. Use the Browse for Folder dialog box to locate a taxonomy of testing documents. For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86. Also see the example shown in Display 3-14 on page 56.

Manual Test

use this operation to test text that you enter, or copy and paste. Click the Manual Test button to display a blank space below the File field. Enter the text that you want to test in this space. See Display 3-13 above.


File field

enter the path to a testing document. Alternatively, click the button next to this field to open the SAS Sentiment Analysis Studio dialog box, where you can select a file.

Test button

click Test to apply the model that you selected to the input document.

Testing pane

after the testing operation is complete, see the results in this pane.

Text Result tab

see the testing results in text format, including the prominence and dominance for the product. See the example shown below:

Display 3-14 Testing Results


Graphical Result tab

see the sentiment distribution when you test a folder of files, such as neg or pos. (This feature is not available for an individual document because you can see its matches in the Text Result pane.) The numbers to the right of each feature specify the number of documents that are determined to be positive, negative, and neutral, respectively. The bar that appears below the color descriptions shows the number of documents tested for overall sentiment; the numbers to the right of this bar add up to the total number of tested documents. Below this bar, a group of bars appears, one for each product feature. These bars show the number of instances within the test documents where the sentiment expressed about the feature is positive, negative, or neutral.

Note: If you test only a statistical model, results are not returned for neutral sentiment. At this time, the statistical model is not trained using a neutral set of training documents.

In the following example, there are a total of 10 test documents. One of these documents has an overall sentiment of positive. The other nine texts express negative sentiment. Quality is mentioned positively in 33 instances and negatively in 52 instances within these 10 documents.
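The distinction between document counts and instance counts in this example can be sketched with Python's collections.Counter. The labels below are hypothetical stand-ins for the results that the Graphical Result tab summarizes; they only show why the overall bar adds up to 10 while the Quality bars can add up to 85.

```python
from collections import Counter

# Hypothetical overall labels for the 10 tested documents in the example.
overall = ["positive"] + ["negative"] * 9
totals = Counter(overall)
print(totals["positive"], totals["negative"], totals["neutral"])  # 1 9 0
print(sum(totals.values()))  # 10: the overall bar covers every tested document

# Hypothetical feature mentions: one entry per instance, so a single
# document can contribute many instances of the Quality feature.
mentions = [("Quality", "positive")] * 33 + [("Quality", "negative")] * 52
per_feature = Counter(mentions)
print(per_feature[("Quality", "positive")], per_feature[("Quality", "negative")])  # 33 52
```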


Figure 3-2 Graphical Results

Tip: If you do not see the Text Result and Graphical Result tabs, make sure View --> Output Pane is selected. If you still do not see these tabs, place your cursor at the bottom of the right side of the user interface. When the cursor forms two parallel lines with arrows pointing up and down, pull the cursor up.

Drop-down testing operations are available for the testing documents that you import into the Test Data pane.


Display 3-15 Drop-down Testing Operations

See the following table for a description of each available operation and the locations where these operations are accessible.

Table 3-4: Drop-down Operations for Test Documents

Operation | Description | Entire Testing Corpus | Negative and Positive Folders | Individual Testing Document
Test in Statistical Model | Use the statistical model to test | X | X | X
Test in Rule-based Model | Use the rule-based model to test | X | X | X
Test in Hybrid Model | Use the hybrid model to test | X | X | X
Export Results of Statistical Model | Export the test results as one .xml file per tested document or as a single .csv file | X | X | X
Export Results of Rule-based Model | Same as above, for the rule-based model | X | X | X
Export Results of Hybrid Model | Same as above, for the hybrid model | X | X | X
Delete Directory | Remove the selected test folder from the Test pane | X | X | X

When you select the export results operation for one of the models, the Test Result Export Dialog dialog box appears:

Display 3-16 Using the Test Result Export Dialog Dialog Box

To use the Test Result Export Dialog dialog box, complete these steps:

1. Leave the default selection, XML files, selected to export the test results for each file individually. Alternatively, select Single CSV file to export the results into one comma-separated values (CSV) file, which you can open in an application such as Microsoft Excel.

2. Enter the path where the output is stored in the Location to store exported file(s) field. Alternatively, click the button next to this field and use the Browse For Folder dialog box to choose this location. For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.

3. Click OK to save these settings and to exit this dialog box.
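If you post-process exported results, the following Python sketch shows the general shape of a single CSV file with one row per tested document, built with the standard csv module. The column names document and overall_sentiment and the sample rows are hypothetical; this guide does not describe the actual layout of the exported files, so inspect a real export before you rely on any column.

```python
import csv
import io

# Hypothetical rows; the real export layout is not described in this guide.
results = [
    {"document": "neg/review1.txt", "overall_sentiment": "negative"},
    {"document": "pos/review2.txt", "overall_sentiment": "positive"},
]


def to_single_csv(rows):
    """Write all results into one CSV text, one row per tested document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["document", "overall_sentiment"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_single_csv(results))
```

One file with a row per document is convenient for spreadsheet analysis, while the per-document XML option keeps each result separate.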

3.8 The Project Settings Dialog Wizard

Use the Project Settings Dialog wizard to change the project settings that you specified when you created a new project. To access the Project Settings Dialog wizard, select Edit --> Preferences. The Project Settings Dialog dialog box appears.


The Log settings pane applies to all of the projects in this installation of SAS Sentiment Analysis Studio. Use of this pane is optional. To use the Log settings pane, complete these steps:

1. Select Write log file so that SAS Sentiment Analysis Studio writes any errors to a file.

2. Enter the path to the log file in the Log file field. Alternatively, click the button next to this field.

3. Click OK to save these specifications.

For more information, see Section 3.4 The New Project Wizard on page 21.

3.9 The Miscellaneous Windows

3.9.1 The Search Rules Dialog Box

Use the Search Rules dialog box to locate a specific term in your rules. To access and use the Search Rules dialog box, complete these steps:

1. Select Edit --> Search Rules. The Search Rules dialog box appears.

2. Enter the term that you want to locate into the Search for field. For example, type quality.

3. (Optional) Select Case sensitive to locate an exact match on the term in the specified uppercase and lowercase letters.


4. (Optional) Select Whole word only to specify that matches on partial words are not returned. For example, if you type fire, no matches are returned for firearms.

5. (Optional) Select Use regular expressions. Enter a regular expression when you want to match a specific pattern. For example, type n\w+ to locate words like not, no, and Nikon. (Nikon is matched only when Case sensitive is not selected.)

Note: Enter regular expressions only when this option is selected.

6. Click OK to begin the search.

7. When you use this operation, the Search Result pane of the Rule tab appears. Use this pane to see the search results in text format. See the example below:


8. See the matching terms highlighted in yellow when you expand each matched concept in the Search Result tab.
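The three optional settings in the Search Rules dialog box map loosely onto ordinary regular-expression options. The following Python sketch emulates them with the re module to show how the fire and n\w+ examples behave. The search_rules function and the sample rule bodies are hypothetical, and the studio's own definition of a word boundary might differ from Python's \b.

```python
import re


def search_rules(rules, term, case_sensitive=False, whole_word=False, use_regex=False):
    """Return the rules that contain term, loosely mirroring the dialog's options."""
    pattern = term if use_regex else re.escape(term)  # plain terms are matched literally
    if whole_word:
        pattern = rf"\b(?:{pattern})\b"  # reject matches inside longer words
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    return [rule for rule in rules if compiled.search(rule)]


rules = ["fire", "firearms", "not good", "no way", "Nikon"]
print(search_rules(rules, "fire"))                   # ['fire', 'firearms']
print(search_rules(rules, "fire", whole_word=True))  # ['fire']
print(search_rules(rules, r"n\w+", use_regex=True))  # ['not good', 'no way', 'Nikon']
print(search_rules(rules, r"n\w+", use_regex=True, case_sensitive=True))  # ['not good', 'no way']
```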


3.9.2 The New Model Dialog Dialog Box

Use the New Model Dialog dialog box to define a new model. To access and use the New Model Dialog dialog box, complete these steps:

1. Right-click in the white space in the Statistical Models pane.

2. Select New Model in the drop-down menu that appears. The New Model Dialog dialog box appears.

3. Enter the name of the new model into the Model Name field. For example, type MyNewModel.

4. Leave the default selection Simple selected under the Work Mode heading. If you want to define a model with additional specifications, select Advanced.

5. Click OK.


3.9.3 The Import Precompiled Model Dialog Box

Import a precompiled model when a model is large. If the training corpus is large and the model takes a long time to train, you can move the training to a UNIX machine. Use the Import Precompiled Model dialog box to bring the .sam file for a statistical model that you already built into your current project.

Caution: If you import a precompiled model and have also defined simple or advanced models, unexpected behavior can occur if you try to delete one of these models. You might lose the data for these statistical models.

To use the Import Precompiled Model dialog box, complete these steps:

1. Make sure that you have created a project, specified a statistical model, and built that model. This is the model that you import.

2. In your current project, right-click on the white space in the Statistical Models pane.

3. Select Import Precompiled Model from the drop-down list that appears. The Import Precompiled Model dialog box appears.

4. Enter the name of the new statistical model into the Model name field.


5. Enter the path to the .sam file into the Load a pre-compiled SAS Sentiment Analysis object field. Alternatively, click the button next to this field to open the SAS Sentiment Analysis Studio dialog box.

a. Select the Profile_<model_name> directory in the Workshop folder for the project with a statistical model that you want to import.

b. Select the <model_name>_stat_object.sam file.

c. Click Open and the filename and path appear in the Load a pre-compiled SAS Sentiment Analysis object field of the Import Precompiled Model dialog box.


6. Click OK in the Import Precompiled Model dialog box. The precompiled model appears in the Statistical Models pane. See the example below:

Note: Precompiled models do not display the Configuration, Text Result, and Graphical Result tabs.


3.9.4 The Import Learned Features Dialog Box

Use the Import Learned Features dialog box to bring the keywords that SAS Sentiment Analysis Studio automatically extracted from the input training corpora into the Rule tab. Learned features are the keywords generated by a statistical model. This operation creates only CLASSIFIER rules in the rule-based model. To access and use the Import Learned Features dialog box, complete these steps:

1. (Optional) Select the Keywords node in the Rule tab.

2. Select Build --> Import Learned Features. The Import Learned Features dialog box appears.

3. Select one of the following operations:

- Leave the default selection, Load features from a model, selected.

(Optional) If you created more than one model, select another model. Alternatively, click in the field below Load features from a file to select an _ss_feature.txt file to import by using the Open dialog box.

- Select Load features from a file to select an _ss_feature.txt file to import. Use this operation when you have created and trained a model on one machine and you want to use the _ss_feature.txt file on another machine.

(Optional) Leave the default Number of keywords to import value, 1000, selected. Alternatively, click the increase or decrease button to change the number of keywords that are imported.

4. Click OK and the SAS Sentiment Analysis Studio window appears:

Note: The number 1000 applies separately to the negative and to the positive learned features that are added to the project. For this reason, 2000 appears in the example above.
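The arithmetic in the note (1000 keywords for each polarity, 2000 in total) can be sketched as selecting the top N keywords per polarity. This Python sketch assumes that keywords are ranked by the weight computed when the statistical model is built; the select_learned_features function and its input structure are hypothetical, because the guide does not describe the ranking or the _ss_feature.txt format.

```python
def select_learned_features(weighted_features, n):
    """Keep the n highest-weighted keywords for each polarity.

    weighted_features maps a polarity ("positive" or "negative") to a list of
    (keyword, weight) pairs. Hypothetical structure for illustration only.
    """
    selected = {}
    for polarity, pairs in weighted_features.items():
        ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
        selected[polarity] = ranked[:n]
    return selected


# Synthetic input: 1500 candidate keywords per polarity.
features = {
    "positive": [(f"pos_word{i}", i / 1500) for i in range(1500)],
    "negative": [(f"neg_word{i}", i / 1500) for i in range(1500)],
}
chosen = select_learned_features(features, 1000)
print(sum(len(keywords) for keywords in chosen.values()))  # 2000
```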


5. Click Positive or Negative to see the learned features.

The float values in the Weight column are computed when the statistical model is built.


3.9.5 The Add Intermediate Entity Dialog Box

You can add an intermediate entity, or a concept that is referenced by other concepts, when you use this dialog box. To use the Add Intermediate Entity dialog box, complete these steps:

1. Right-click on Intermediate Entities in the Rules pane and select New Entity.

2. Enter the name of the new intermediate entity into the Add Intermediate Entity dialog box that appears. For example, type StoreName.

3. Click OK.


3.9.6 The Intermediate Entity Property Dialog Box

You can change the name of an intermediate entity that you added to your project when you use this dialog box. To use the Intermediate Entity Property dialog box, complete these steps:

1. Right-click an intermediate entity that you added to your project. For example, right-click PositivePhrases.

2. Select Entity Property.

3. In the Intermediate Entity Property dialog box that appears, the name of the selected concept is displayed by default in the New entity name field.

4. Edit this name.

5. Click OK to save the change.


3.9.7 The Add Product Dialog Box

Use the Add Product dialog box to add a product to your project. To use the Add Product dialog box, complete these steps:

1. Right-click on Products in the Rules pane and select New Product.

2. Enter the name of the new product into the Add Product dialog box that appears. For example, type Cars.

3. Click OK.


3.9.8 The Product Property Dialog Box

Use this dialog box to change the name of a product that you added to the Rules pane. To use the Product Property dialog box, complete these steps:

1. Right-click a product in the Rules pane. The Product Property dialog box appears.

2. By default, the name of the existing product appears. Enter the new name into the New product name field.

3. Click OK to close this dialog box.

3.9.9 The Add New Feature Dialog Box

Use this dialog box to add a feature to a product that you defined in the Rules pane. To use the Add New Feature dialog box, complete these steps:

1. Right-click on Feature in the Rules pane and select Add Feature.


2. Enter the name of the new feature into the Add New Feature dialog box that appears. For example, type Image.

3. (Optional) Click the up or down arrow to change the default setting of 0 in the Relative weight field.

Note: At this time, this weight is not factored into the sentiment score calculation.

4. Click OK.

3.9.10 The Feature Property Dialog Box

Use the Feature Property dialog box to change a product feature in the Rules pane. To access and use the Feature Property dialog box, complete these steps:

1. Right-click on a feature and select Feature Property from the drop-down menu that appears. The Feature Property window appears.


2. Enter the new name of the feature into the Name field. For example, type PointandShoot.

3. (Optional) Click the up or down arrow to change the default setting of 0 in the Relative weight field. For example, use this field if you defined a Camera project and want to rank matches on PointandShoot higher than matches on Color.

Note: At this time, the weight entered into the Relative weight field is not factored into the sentiment score calculation.

4. Click OK.

3.9.11 The Rule Evaluation Dialog Dialog Box

Use the Rule Evaluation Dialog dialog box to specify the documents that you want SAS Sentiment Analysis Studio to assess. When you use this dialog box, you can see the matching terms, the number of documents that match that term, and the precision in the tree or text views.

Tip: At this time, this feature does not work.

To use the Rule Evaluation Dialog dialog box, complete the following steps:

1. Select Build --> Evaluate Rules. The Rule Evaluation Dialog dialog box appears.


2. Enter the path to the Positive file directory, or click the browse button to open the Browse For Folder dialog box. Use this window to locate the folder of positive testing documents that you assembled. For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.

3. Enter the path to the Negative file directory, or click the browse button to open the Browse For Folder dialog box. Use this window to locate the folder of negative testing documents that you assembled.

4. Click OK to see the results in the Rule Evaluation Result tab in the Rule pane.
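This guide does not define how the Rule Evaluation Dialog computes precision. A common definition, assumed here, is the fraction of documents matching a term that come from the folder whose sentiment the term is expected to signal. The minimal Python sketch below uses plain case-insensitive substring matching in place of real rule matching. `term_precision` and the sample texts are hypothetical.

```python
from typing import Sequence, Tuple


def term_precision(
    term: str,
    positive_docs: Sequence[str],
    negative_docs: Sequence[str],
    expected: str = "positive",
) -> Tuple[int, float]:
    """Return (number of matching documents, precision) for one term.

    Hypothetical helper: substring matching stands in for the
    product's rule matching, which is not reproduced here.
    """
    needle = term.lower()
    pos_hits = sum(needle in doc.lower() for doc in positive_docs)
    neg_hits = sum(needle in doc.lower() for doc in negative_docs)
    total = pos_hits + neg_hits
    if total == 0:
        return 0, 0.0  # no matches: precision is undefined, report 0.0
    correct = pos_hits if expected == "positive" else neg_hits
    return total, correct / total


if __name__ == "__main__":
    positives = ["Great lens", "great battery life", "Poor grip"]
    negatives = ["Not great at all", "Blurry photos"]
    print(term_precision("great", positives, negatives))
```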


3.9.12 The Rule Editor Dialog Box for Rules with Boolean Operators

Use the Rule Editor dialog box to change any of your rules that begin with a Boolean operator (CONCEPT_RULE and PREDICATE_RULE). You can use either the text view or the taxonomy view in this dialog box. To access and use the Rule Editor dialog box, complete these steps:

1. Right-click on a rule in the Body section of the Rule pane.


2. Select Edit in Tree View from the drop-down list that appears. The Rule Editor dialog box appears.

3. (Optional) Right-click on a node and select one of the following operations:

a. Select Add Statement to add a new section of the rule between the quotation marks (“”) that appear.

b. Select Add Keyword to add a Boolean operator. For more information, see Table 7-2 on page 170.

Tip: The Add Keyword operation refers to adding a Boolean operator, not a keyword definition.

c. Select Select Keyword to edit the selected Boolean operator.

d. Select Delete Node to remove the selected section of the rule.


4. (Optional) Click Text View to see the rule in a text editor where you can make changes to the syntax.

Tip: In this window, the Boolean operators are highlighted in blue.

5. Click OK to save your changes.

3.9.13 The Test Configuration Dialog Box

Use the Test Configuration dialog box to choose the type of model that you are testing.

Notes: After you set the test configuration, these settings are used for each test in your current project until you change them. Unless you change the default setting, Test Rule-based Model, a statistical model cannot be built.

To access and use the Test Configuration dialog box, complete these steps:


1. Select Build —> Set Test Configuration and the Set Test Configuration dialog box appears.

2. (Optional) Leave the default setting, Test Rule-based Model, selected in the Default test type field. If you want to test a different type of model, click the drop-down arrow and select either Test Statistical Model or Test Hybrid Model.

3. (Optional) Click the up or down arrow to the right of the Probability threshold field to change the default value, 50%. This value is used to determine the overall sentiment of a document. When a document is processed, it is assigned a probability score. If the score of a processed document equals this value, the sentiment is neutral. If the score exceeds this value, the sentiment is positive. If the score falls below this value, the sentiment is negative.

4. (Specify this setting if you selected Test Statistical Model or Test Hybrid Model in Step 2 above.) You can leave the activated statistical model selected, or you can change this model. When you select a different model from the drop-down list, the selected model is activated and applied to the project.

5. (Optional) Click the up or down arrow to the right of the Relative weight of positive rules in rule-based model field to change the default value, 100%.


Notes: 100% indicates that the positive rules are treated with the same importance as the negative rules. For more information, see Section 6.2 Understanding Sentiment Computation on page 130. At this time, this setting has no effect on the sentiment analysis results.

6. (Optional, and for hybrid models only) Click the up or down arrow to the right of the Weight of statistical model in hybrid model field to change the default weight of the statistical model, which is 70%.

7. Click OK to save these settings.
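The probability threshold rule in Step 3 can be written as a small function. The minimal Python sketch below implements that rule as described: a score equal to the threshold is neutral, a higher score is positive, and a lower score is negative. The 50% default is expressed as 0.5. The `hybrid_probability` function is an assumption. This guide states only that the statistical model's default weight in a hybrid model is 70%, so the sketch uses a simple linear blend of the two probabilities for illustration. It is not the product's documented formula.

```python
def classify(probability: float, threshold: float = 0.5) -> str:
    """Map a document's probability score to an overall sentiment."""
    if probability > threshold:
        return "positive"
    if probability < threshold:
        return "negative"
    return "neutral"  # the score equals the threshold


def hybrid_probability(
    statistical: float, rule_based: float, statistical_weight: float = 0.7
) -> float:
    """Blend two probability scores (assumed linear weighting).

    statistical_weight=0.7 mirrors the 70% default in the Test
    Configuration dialog box; the blending formula itself is hypothetical.
    """
    if not 0.0 <= statistical_weight <= 1.0:
        raise ValueError("statistical_weight must be between 0 and 1")
    return statistical_weight * statistical + (1.0 - statistical_weight) * rule_based


if __name__ == "__main__":
    for score in (0.2, 0.5, 0.8):
        print(score, classify(score))
    blended = hybrid_probability(statistical=0.8, rule_based=0.3)
    print(round(blended, 2), classify(blended))
```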

3.9.14 The Select Font Dialog Box

Use the Select Font dialog box to choose the display lettering for the SAS Sentiment Analysis Studio interface components. To access and use the Select Font dialog box, complete these steps:


1. Select Edit —> Font and the Select Font dialog box appears.

2. (Optional) To use a font other than the default, Arial, scroll the list beneath the Font heading and select a font.

3. (Optional) To display the component labels in a style other than the default, Normal, select a different style under the Font style heading.

4. (Optional) To use a font size other than the default, 10, scroll the list beneath the Size heading and select a size.

5. (Optional) Click either Strikeout or Underline under the Effects heading.

6. (Optional) Select a script for some of the world languages under the Script heading.

7. (Optional) Preview your selections in the Sample pane.

8. Click OK to save your settings.


3.9.15 The New Corpus Dialog Dialog Box

Use the New Corpus Dialog dialog box to define a new collection of training documents. To access and use the New Corpus Dialog dialog box, complete these steps:

1. Right-click in the white space of the Corpora tab and select New Corpus from the drop-down list that appears.

2. Type the name of the training set of documents into the Corpus Name field.

3. Click OK.


3.9.16 The Choose a File and Similar Dialog Boxes

Use the Choose a File dialog box to select testing files. There are several similar windows, including Browse For Folder, Choose a file to save under, Open, and Select a File. You use all of these dialog boxes in the same way as the Choose a File dialog box.

Display 3-17 Choose a File Dialog Box

To use the Choose a File dialog box, complete any of these steps that you need in order to locate a file:

1. (Optional) Click in the Look in field to select a folder.

2. (Optional) Select a file in the pane below the Look in field.

3. (Optional) Click in the File name field to select a file.

4. (Optional) Click in the Files of type field to select a file extension.


5. Click Open to access this file.

3.9.17 The Information and SAS Sentiment Analysis Studio Windows

The Information screen appears when you try to perform an operation without the appropriate files.

Display 3-18 Information Screen

The SAS Sentiment Analysis Studio status screen appears when you build a model and perform other operations.

Display 3-19 SAS Sentiment Analysis Studio Screen

Click OK to exit either of these windows.


4 Accessing a Project and Choosing a Model

- Overview of Project Creation
- Defining a New SAS Sentiment Analysis Studio Project
- Access an Existing SAS Sentiment Analysis Studio Project
- How Your Selected Model Appears in the Interface

4.1 Overview of Project Creation

Each SAS Sentiment Analysis Studio project uses one or more models to extract sentiment from input documents. These models can include multiple types of statistical models. Only one rule-based model can be deployed for each project, but a project with a rule-based model can also include several statistical models. Alternatively, you can build a project that contains only a rule-based model.

Although elements of both the statistical and rule-based models are deployed for each SAS Sentiment Analysis Studio project, you select a model based on the type of analysis that you want to perform. Before you access SAS Sentiment Analysis Studio to create a project, you should understand the benefits of each model. The benefits of each type of model include the following:

Statistical Models: both simple and advanced

- Train a classifier using a corpus, or several collections (corpora), of preselected documents for each type of sentiment.

- Use at least 1,000 hand-selected documents for each of the positive and negative sentiment nodes in your training corpora.


Tip: At this time, neutral and unclassified sets of documents are not used by the statistical model.

- Statistical models are good predictors of the overall sentiment expressed in an input document.

Rule-based Model: build only one in each project

- Extract sentiment at a granular level.

- Extract sentiment that is related to objects, brands, features, products, and so on.

- Write your own custom linguistic rules using several rule types to extract sentiment and to locate objects, brands, features, products, comparisons between features, products, and so on.

Hybrid Model: Apply the numerical data from the activated statistical model and the custom rules from your rule-based model to analyze the sentiment expressed in input documents.

SAS Sentiment Analysis Studio enables you to train, validate, activate, and test your model before it is deployed, using sample documents that meet your requirements. You can also use automatically created or customized, hand-written rules, and you can select the mathematical methods that are deployed with the statistical model.


4.2 Defining a New SAS Sentiment Analysis Studio Project

4.2.1 Create a New Project

To create a project, complete these steps:

1. Select Start —> Programs —> SAS Sentiment Analysis Studio —> SAS Sentiment Analysis Studio.

2. Select File —> New and the New Project Wizard appears.

3. Type the name of the new project into the Name field.


4. Click Browse and the Browse for Folder dialog box appears.

a. Select a directory location for the new project.

b. (Optional) Click Make New Folder and name the folder that appears.

c. Click OK to save your selection and exit this dialog box.

5. (Optional) If you purchased more than one language, click the arrow on the right side of the Language field in the New Project Wizard. Select a language from the drop-down menu that appears.


6. Click Next. The Settings for Rule-based Model page appears.

7. Leave the default settings unless you have custom files that you want to use in place of those provided with the application.

8. (Optional) If you are using XML files, leave the default selection Enable XML support selected.

9. (Optional) In the Default fields field, enter the XML fields that are searched when a rule does not specify another field.

10. (Optional) In the XML fields to ignore field, enter the XML tags that should not be searched.


11. Click Next. The Settings for Statistical Model page appears.

12. Leave the default settings unless you have custom files that you want to use in place of those provided with the application.


13. Click Next to exit this page. The Summary page appears.

14. Review the text in the Summary pane.

15. (Optional) Click Back to return to a previous page and make any changes on that page.


16. Click Finish and the Corpora tab appears. The SAS Sentiment Analysis Studio user interface displays the name of the new project such as My_New_Project.

4.3 Access an Existing SAS Sentiment Analysis Studio Project

If you have already created a SAS Sentiment Analysis Studio project, you can access this .xml project file. For example, you can access the SamProject.xml file.

Note: When you uninstall and reinstall SAS Sentiment Analysis Studio, reinstall it in the same folder if you want to access the projects that you have already created. Open the <projectname>.xml file to access a saved project. All of the files that you imported into a project appear if you reinstall SAS Sentiment Analysis Studio into the same folder. This applies to training and validation files, rules files, and testing files.


To access an existing project, complete these steps:

1. Select File —> Open and the SAS Sentiment Analysis Studio dialog box appears.

2. Navigate to the <project_name>.xml file that you want to access.


3. Click Open and the selected project appears. You can see the name of the project in the title bar.

4.4 How Your Selected Model Appears in the Interface

The model types differ in how they appear in, are created within, and interact with the user interface. This high-level information is summarized below:

Statistical model

There are two types of statistical models: simple and advanced. Each model that you develop is named and listed in the Statistical tab. The most recently activated model is highlighted in black. Each of these models creates an _ss_feature.txt output file. Use this file to import the keywords that SAS Sentiment Analysis Studio identifies from the training corpora in the Corpora tab. You can also import a precompiled model.


Rule-based model

There is only one rule-based model for each project. However, you can add multiple products, and with each product, you are able to add an unlimited number of features for that product. This model is unnamed and does not appear in the interface. You can also use the Intermediate Entities node to define phrases that incorporate your Keywords definitions. Reference these intermediate entities, or the keywords, in rules that you write for products and features. When you reference concepts, you save time because you can reference all of the rules in a definition without rewriting these rules.

Hybrid model

Specify a combination of the most recently activated statistical model and the rule-based model.


5 Creating a Statistical Model

- Assembling Training Documents
- Set Up the Training and Validation Corpora
- Create One or More Models for Your Project
- Building, Training, and Validating a Model
- Import a Precompiled Model
- Before You Test Your Statistical Model

5.1 Assembling Training Documents

Before you define a statistical model, assemble the documents that express sentiment for your product and its features. Select these training documents by hand to ensure that their content reflects the overall types of sentiment that are expressed in your testing documents. (SAS Sentiment Analysis Studio reserves a portion of the training documents for validation purposes.) Select documents that express positive sentiment about a product and place them in one folder. Repeat this operation for documents that express negative sentiment.

SAS Sentiment Analysis Studio uses these documents to train the statistical model. When you define this type of model, the overall sentiment of each document is identified, along with the terms that express that sentiment.

For example, you can assemble approximately 1,000 documents in which the overall sentiment expressed about a product is positive. Collect another 1,000 documents in which reviewers express sentiment that is negative overall. (At this time, the neutral and unclassified categories are not used by SAS Sentiment Analysis Studio.)


Place each of these groups of texts into a taxonomy of folders with a structure that is similar to the example that is displayed below. (You can also see this structure in the Nikon sample project, which is supplied with the application.)

Display 5-1 Folders of Training and Testing Documents

See Section 9.1 Assembling Testing Documents on page 215 for information about selecting testing documents.

After you create your training and testing directory structure and add all of the documents that SAS Sentiment Analysis Studio uses, you can begin developing your project.
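Before you import the folders, you might check that each sentiment folder holds enough documents. Section 4.1 recommends at least 1,000 hand-selected documents for each of the positive and negative nodes. The following minimal Python sketch assumes a layout with one subfolder per sentiment, named Positive and Negative. These folder names are hypothetical, so use whatever structure you chose. The sketch counts only the files directly inside each folder.

```python
from pathlib import Path
from typing import Dict, Iterable

MIN_DOCS_PER_CLASS = 1000  # guidance from Section 4.1


def count_training_docs(
    root: Path, classes: Iterable[str] = ("Positive", "Negative")
) -> Dict[str, int]:
    """Count the files directly inside each sentiment subfolder of root.

    A missing subfolder is reported as 0 documents.
    """
    counts: Dict[str, int] = {}
    for name in classes:
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(1 for item in folder.iterdir() if item.is_file())
        else:
            counts[name] = 0
    return counts


def shortfalls(counts: Dict[str, int], minimum: int = MIN_DOCS_PER_CLASS) -> Dict[str, int]:
    """Return how many more documents each under-filled folder needs."""
    return {name: minimum - n for name, n in counts.items() if n < minimum}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_corpus.py path/to/Training
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    counts = count_training_docs(root)
    print(counts)
    for name, missing in shortfalls(counts).items():
        print(f"{name}: add at least {missing} more documents")
```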

5.2 Set Up the Training and Validation Corpora

Before you train your statistical model, import the training documents that you collected as described in Section 5.1 Assembling Training Documents on page 101. These training documents are used to identify sentiment keywords and overall sentiment in an input document. (SAS Sentiment Analysis Studio automatically removes a portion of the training documents and uses these texts to validate the model.)


To set up your training corpora, complete these steps:

1. Right-click in the Corpus pane in the Corpora tab and select New Corpus.

2. The New Corpus Dialog dialog box appears. Enter the name of the collection of training documents into the Corpus Name field. For example, type Cameras.


3. Click OK and a taxonomy of sentiment nodes appears in the Corpus pane.

Tip: At this time, only the Positive and Negative nodes are used by the statistical model.

4. Right-click the Positive node and select Add a Directory from the drop-down menu that appears. This feature enables you to import the entire collection of training documents for this node. (If you want to add your documents one at a time, select Add a File.)

The Browse For Folder dialog box appears.

5. Select the folder that contains the documents that express the selected sentiment for this model.


6. Click OK.

7. A list of the training documents in the selected folder is displayed in the Reference Files pane.

8. Click on a document name in the Reference Files pane to see its text in the File Content pane.

9. (Optional) Repeat Step 4 on page 104 through Step 8 above for the remaining nodes in the Corpus pane.


SAS Sentiment Analysis Studio automatically removes a percentage of training documents for validation purposes. When SAS Sentiment Analysis Studio runs the validation operation, the application adjusts the parameters of the training results. These adjustments are based on the results obtained from the validating corpora of documents for negative and positive sentiment expressions.

5.3 Create One or More Models for Your Project

5.3.1 Overview of Choosing a Model

Choose a statistical model based on your precision requirements:

Simple model

The configuration for this type of model is one of four combinations: Smoothed Relative Frequency text normalization combined with no feature ranking, or with one of the Risk Ratio, Chi Square, and Information Gain feature rankings. These combinations are tried to automatically obtain the best model, which is derived from the training documents that you assemble.

Advanced model

This model enables you to select one of the four types of text normalization. This selection is combined with any of the feature rankings listed above, or with no feature ranking, so four combinations are possible for each of the four types of text normalization. You can choose to try each combination, or use only one text normalization type. In either case, the best model is applied to input documents.
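The combinations described above can be enumerated directly. A simple model has four, an advanced model has four per normalization type, and trying every normalization type gives sixteen. The minimal Python sketch below enumerates that grid. It also includes textbook versions of the three feature rankings, computed from a 2x2 table of document counts for one term. These are common definitions assumed for illustration, not documented SAS Sentiment Analysis Studio internals. The add-one smoothing in `risk_ratio` is a further assumption that avoids division by zero. `candidate_configs` and the other names are hypothetical.

```python
import math
from itertools import product
from typing import List, Optional, Tuple

NORMALIZATIONS = [
    "Smoothed Relative Frequency",
    "Relative Frequency",
    "Okapi BM25",
    "Pivoted Length Normalization",
]
RANKINGS = ["None", "Risk Ratio", "Chi Square", "Information Gain"]


def candidate_configs(
    advanced: bool, normalization: Optional[str] = None
) -> List[Tuple[str, str]]:
    """List the (text normalization, feature ranking) pairs to try."""
    if not advanced:
        return [("Smoothed Relative Frequency", r) for r in RANKINGS]
    norms = [normalization] if normalization else NORMALIZATIONS
    return list(product(norms, RANKINGS))


# Counts for one term: a = positive docs containing it, b = negative docs
# containing it, c = positive docs without it, d = negative docs without it.
def chi_square(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    # P(term | positive) / P(term | negative), add-one smoothed (assumption)
    return ((a + 1) / (a + c + 2)) / ((b + 1) / (b + d + 2))


def _entropy(*counts: int) -> float:
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(k / total * math.log2(k / total) for k in counts if k)


def information_gain(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    if n == 0:
        return 0.0
    before = _entropy(a + c, b + d)  # class entropy, ignoring the term
    after = (a + b) / n * _entropy(a, b) + (c + d) / n * _entropy(c, d)
    return before - after
```

In this sketch, picking the best configuration is `max(configs, key=score)`, where `score` would return your validation accuracy for each pair.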


5.3.2 Create a Simple Statistical Model

Define the type of statistical model that you want to build. To define a simple statistical model, complete the following steps:

1. Click the Statistical tab and the Statistical Models pane appears.

2. Right-click on the white space in the Statistical Models pane and select New Model in the drop-down menu that appears.


3. The New Model Dialog dialog box appears. Enter the name of the model into the Model Name field. For example, type My_Simple_Model.

4. By default, the Simple radio button is selected.

5. Click OK and the selected model appears in the Statistical Models pane.

6. By default, the name of the training corpus that you added appears in the Training corpus field. For example, see My_Simple_Model. If you have added more than one corpus to the Corpora tab, click the arrow to select a different corpus.

7. By default, 80% is selected in the Set percentage for training field for simple models. This means that 80% of the documents in the selected corpus are used as training documents. The remaining 20% of the texts, which SAS Sentiment Analysis Studio selects randomly and automatically, are used for validation purposes. To change this number, click the up or down arrow. For more information, see Section 5.4.4 Validate a Statistical Model on page 118.

8. (Optional) Click Best mode to run the training algorithms while the model is trained. By default, the Smoothed Relative Frequency text normalization algorithm is run with no feature ranking and with each of the feature ranking algorithms.

Tip: At this time, the best model is always used by SAS Sentiment Analysis Studio. This is true whether you click Best mode or not.

9. Click the Save icon to save your project.

10. Select Build —> Build Statistical Model. The Build Statistical Model dialog box appears.

11. Make sure that the model that you defined is selected in the Select a model field, or click the arrow to select a different model.

12. Click OK to save your selection.
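The default 80% training setting in Step 7 amounts to a random split of each corpus. The minimal Python sketch below illustrates the idea. The function name `split_corpus`, the optional seed, and the choice to split each sentiment node separately are assumptions for illustration, because this guide does not describe how the random selection is made.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(
    docs: Sequence[T], train_pct: int = 80, seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Randomly split documents into (training, validation) lists."""
    if not 0 <= train_pct <= 100:
        raise ValueError("train_pct must be between 0 and 100")
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_pct / 100)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Split each sentiment node on its own so both keep the same 80/20 ratio.
    corpus = {
        "Positive": [f"pos_{i}.txt" for i in range(1000)],
        "Negative": [f"neg_{i}.txt" for i in range(1000)],
    }
    for node, files in corpus.items():
        train, validate = split_corpus(files, train_pct=80, seed=42)
        print(node, len(train), len(validate))
```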

5.3.3 Create an Advanced Statistical Model

Build an advanced statistical model if you have a strong mathematical background and want to design your own model. SAS Sentiment Analysis Studio selects the best model available based on the text normalization selection that you make.

To create an advanced model, complete these steps:


1. Complete Step 1 and Step 2 on page 108.

2. Type the name of the model into the Model name field of the New Model Dialog dialog box that appears. For example, type My_Advanced_Model.

3. Select Advanced.

4. Complete Step 5 through Step 7 on page 109.

5. (No action is required.) At this time, the Bayes Method is the only available Solution.


6. By default, the Probability threshold is set to 0.50 (neutral). To change it, type a number between 0 (the most negative value) and 1 (the most positive value). Most values fall somewhere between these extremes. The probability threshold determines whether the sentiment in the document is a match.

7. By default, the Text normalization model is set to Smoothed Relative Frequency. Click the drop-down arrow to the right of the Text Normalization Model field to choose another model:

Relative Frequency

Works like Smoothed Relative Frequency. The term frequency is normalized by dividing it by the normalization factor. Differences in the lengths of the texts affect the analysis results.

Okapi BM25

This model is derived from the 2-Poisson probabilistic model.

Pivoted Length Normalization

This model is derived from an investigation of the relationship between document length and the probability of relevance in test collections.

Note: When the model is built, four combinations (three types of feature ranking plus no feature ranking) are applied to identify the best model. The best model is used to return the overall sentiment of the document.

8. (Optional) Click the button to the right of the Contextual extraction field. The Select a file dialog box appears. Use this dialog box to locate a .li file. This file can be a noun-phrase file, which is used with languages such as Chinese or Japanese, or a .li file that was output from SAS Contextual Extraction Studio. When you select this option, the imported file is used to train the statistical model. The rules written for SAS Contextual Extraction Studio are similar to the rules that you can write in SAS Sentiment Analysis Studio.


9. (Optional) Click the button to the right of the Runtime stop words field. The Select a file dialog box appears. Use this dialog box to locate your custom .txt file that contains a list of noise words such as the, then, and, and so on. This file is added to the stopwords.txt file that is specified in the Project Settings Dialog Wizard.

10. Click the Save button to save your settings.

11. Select Build --> Build Statistical Model. The Build Statistical Model dialog box appears.

12. Leave the default selection in the Select a model field, or click the drop-down arrow to select a different model.

13. Click OK to save your selection and to exit this dialog box.

14. When you build this model, the training results appear in the Text Result tab (see the image in Step 1 on page 111). The best model information appears at the bottom of the Text Result pane and is circled in the example shown in Step 1 on page 111. This best model is the one that is used for the selected statistical model.


15. Click Graphical Result to see the training results in the Graphical Result pane.

16. Click OK.
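Step 7 and Step 9 above name the text normalization models and the runtime stop-word file but give no formulas. For orientation, here is a minimal Python sketch of stop-word filtering followed by the textbook forms of three weightings: relative frequency, the Okapi BM25 term-frequency component, and pivoted length normalization. These are standard information-retrieval formulas assumed for illustration, not taken from this guide. The function names, the k1, b, and s defaults, and the sample stop words are hypothetical; the IDF factors are omitted; and Smoothed Relative Frequency is left out because its smoothing is not documented here. The internal implementation in SAS Sentiment Analysis Studio may differ.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the project's stopwords.txt list.
DEFAULT_STOP_WORDS = {"the", "then", "and", "a", "an", "is", "of"}


def tokenize(text, stop_words):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stop_words]


def relative_frequency(tf, doc_len):
    """Term frequency divided by document length (assumed normalization factor)."""
    return tf / doc_len


def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 term-frequency component; the IDF factor is omitted."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))


def pivoted_tf(tf, doc_len, avg_doc_len, s=0.2):
    """Pivoted length normalization term weight (IDF omitted); tf must be >= 1."""
    return (1 + math.log(1 + math.log(tf))) / ((1 - s) + s * doc_len / avg_doc_len)


# A custom runtime stop-word list is merged with the default list.
runtime_stop_words = {"camera"}
stop_words = DEFAULT_STOP_WORDS | runtime_stop_words
corpus = ["The camera is great and the lens is great", "Battery life is poor"]
tokenized = [tokenize(doc, stop_words) for doc in corpus]
avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)

for tokens in tokenized:
    for term, tf in sorted(Counter(tokens).items()):
        print(f"{term:8s} rel={relative_frequency(tf, len(tokens)):.3f} "
              f"bm25={bm25_tf(tf, len(tokens), avg_len):.3f} "
              f"pivoted={pivoted_tf(tf, len(tokens), avg_len):.3f}")
```

The BM25 and pivoted weights both shrink as a document grows longer than average, which is how these models reduce the effect of text length that plain Relative Frequency leaves in place.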


5.4 Building, Training, and Validating a Model

5.4.1 Understanding the Interrelationship of These Operations

Whether you choose the build, train, or validate operation, the model is built, trained, and validated. However, the results that are displayed in the Text Result and Graphical Result panes differ. When you use the build and train operations, these tabs show the best model results. If you select the Validate Model operation instead, you see the overall, positive, and negative precision percentages. These percentages are based on the documents that are set aside for validation. For more information, see Section 3.5.3 The Statistical Tab on page 31.

Select the Activate Model operation to choose the model that is used to test the input documents. This operation enables you to create and use several models in a single project. However, only one model can be applied at a time.

Note: You must train a model before you can activate it.

You can access all of these operations by right-clicking a model in the Statistical Models pane of the Statistical tab. The Build Statistical Model operation is also available in the Build menu.

Use the Set Test Configuration operation (in the Build menu) to specify the statistical model type and the specific testing model. For more information, see Section 3.9.13 The Test Configuration Dialog Box on page 81.

5.4.2 Building a Statistical Model

Use any of the following operations to build your model:

Train Model

Build and train the new model that you create. For more information, see Section 5.4.3 Train a Statistical Model on page 116.

Validate Model


Validate a trained model. For more information, see Section 5.4.4 Validate a Statistical Model on page 118.

Activate Model

Activate a trained model to use this model for testing purposes. For more information, see Section 5.4.5 Activate a Statistical Model on page 121.

Build --> Build Statistical Model

Use the Build Statistical Model dialog box to build the model of your choice. For more information, see Step 10 on page 110.

5.4.3 Train a Statistical Model

After you define a simple or advanced model, train it using the training documents that you added to the Corpora tab. (The build operation produces the same information.)

The following steps describe the training operation. You can use the same steps for the build operation if you select Build Model in Step 1 below.

To train your model, complete these steps:

1. Right-click on the model node. For example, choose Simple. Select Train Model in the drop-down menu that appears.


2. The SAS Sentiment Analysis Studio status screen appears. When this screen disappears, the training results appear in the Text Result tab. Check your results.

3. (Optional) Click Graphical Result to see this information in bar chart format.

Tip: If you do not see the Text Result and Graphical Result tabs, place your cursor at the bottom of the right side of the user interface. Pull the cursor up to display these tabs.


4. (Optional) Make changes to a model to obtain the results that you require. For example, if you specified a simple model, you could choose to make the edits necessary to define an advanced model.

Note: If you make a change to the training corpus, retrain your model.

5.4.4 Validate a Statistical Model

After you create a model, you can validate it to see the percentages for overall, positive, and negative precision. SAS Sentiment Analysis Studio automatically sets aside a percentage of the training documents for validation, so the documents that are used to validate the model are different from those that are used to train it. For example, if you specify 80% in the Set percentage for training field in the Statistical tab, 20% of the training documents are used in the validation operation. When SAS Sentiment Analysis Studio runs the build or validate operation, it adjusts the parameters of the training results based on the results obtained from the validation documents for negative and positive sentiment.

To validate your model, complete these steps:

1. Right-click on the model and select Validate Model from the drop-down menu that appears.


2. The SAS Sentiment Analysis Studio status screen appears. Click OK to exit this window.

3. See the results in the Text Result tab.


4. Click Graphical Result to see these results in bar chart form.

5. (Optional) Make any necessary changes to your model or corpus of training documents.
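Validation reports overall, positive, and negative precision, but this section does not define these figures. The Python sketch below assumes the usual definitions: positive (or negative) precision is the share of validation documents that the model labeled positive (or negative) that really carry that sentiment, and overall precision is taken here to be the share of all validation documents classified correctly. The function name and the label lists are hypothetical.

```python
def precision_report(actual, predicted):
    """Return overall, positive, and negative precision as percentages.

    actual and predicted are equal-length lists of "positive"/"negative"
    labels for the validation documents (hypothetical data layout).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")

    def precision_for(label):
        # Of the documents the model labeled `label`, how many really are `label`?
        hits = [a == label for a, p in zip(actual, predicted) if p == label]
        return 100.0 * sum(hits) / len(hits) if hits else 0.0

    overall = 100.0 * sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    return {"overall": overall,
            "positive": precision_for("positive"),
            "negative": precision_for("negative")}


actual = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
print(precision_report(actual, predicted))
```

In this sample, three of five documents are classified correctly (60% overall), two of the three documents predicted positive are positive, and one of the two documents predicted negative is negative.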


5.4.5 Activate a Statistical Model

Activating a model enables you to build several models in one project but apply only one model to each input set of documents. Choose the applied model based on the input set of documents and the results that you want to obtain.

When you activate a model, you bypass the build operation and make the selected model the one that SAS Sentiment Analysis Studio applies to input documents.

To activate a model, complete this step:

Right-click a model that you have previously trained and built. Select Activate Model from the drop-down menu that appears.

You can repeat this process with each of your models. The last model to be activated is highlighted in black.

5.5 Import a Precompiled Model

Import a precompiled model when a model is large. For example, if the training corpus is large and the model takes a long time to train, you can move the training to a UNIX machine.

Use the Import Precompiled Model dialog box to bring the .sam file for a statistical model that you have already built into your current project.

Caution: If you import a precompiled model and have also defined simple or advanced models, unexpected behavior can occur if you try to delete one of these models. You might lose the data for the affected statistical models.

To import a precompiled model, complete these steps:


1. Right-click in the Statistical Models pane of the Statistical tab. Select Import Precompiled Model from the drop-down menu that appears.

2. The Import a Precompiled Model dialog box appears.

3. Enter a name for this model into the Model Name field. For example, type Precompiled.


4. Click the button to the right of the Load a pre-compiled SAS Sentiment Analysis Object field. The SAS Sentiment Analysis Studio dialog box appears.

5. Select a .sam file. For example, choose simple_stat_object.sam.

6. Click Open.


7. Click OK and the new model is loaded into the Statistical Model tab.

5.6 Before You Test Your Statistical Model

After you build, train, and activate the statistical model that you want to test, click the Save button to save your project. You can now specify the test configuration. Use the Test Configuration dialog box to specify the statistical model before you test your input testing documents.

To access and use the Test Configuration dialog box, complete these steps:


1. Select Build --> Set Test Configuration. The Set Test Configuration dialog box appears.

2. By default, Test Rule-based Model is selected in the Default test type field. Click the drop-down arrow to select Test Statistical Model.

Tip: If you do not set the Default test type to Test Statistical Model, no results are returned when you test a statistical model. This is true even if you build, train, and activate the statistical model and save the project.

3. (Optional) Click the up or down arrow to the right of the Probability threshold field to change the default value of 50%. This value is used to determine the overall sentiment of a document. When a document is processed, it is assigned a probability score. If the score equals this value, the sentiment is neutral. If the score exceeds this value, the sentiment is positive. If the score falls below this value, the sentiment is negative.

4. (Optional) Click the drop-down arrow to select a different statistical model in the Activated statistical model field. Use this operation if you created more than one statistical model.

Note: The selected model is the activated model. If you change this selection, the newly selected model is activated.

5. (Optional) Click the up or down arrow to change the weight in the Relative weight of positive rules in rule-based model field. This setting is one of the determinants of sentiment. For more information, see Section 6.2 Understanding Sentiment Computation on page 130.

Note: At this time, changing the weight does not affect your results.

6. Click OK to save these settings.

7. Click Test to access the testing pane, where you can see all of the loaded testing documents.

Tip: If you did not load your testing documents, see Section 3.7 The Test Tab on page 55.

8. Select a document and click Test. The matching terms are highlighted in black in the testing pane.

9. See the overall sentiment results displayed in the Text Result pane.

For more information about testing, see Chapter 9, Testing Your Models, Exporting Your Rules.
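Step 3 above describes how the probability threshold turns a document's probability score into an overall sentiment. The comparison can be sketched in Python as follows, assuming that the 50% default corresponds to 0.50 (the value used for an advanced model's Probability threshold) and that a score "matches" the threshold only when it is exactly equal. The name classify_sentiment is hypothetical.

```python
def classify_sentiment(probability, threshold=0.50):
    """Map a document's probability score to an overall sentiment label.

    threshold stands in for the Probability threshold setting (50% -> 0.50).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > threshold:
        return "positive"
    if probability < threshold:
        return "negative"
    return "neutral"  # the score equals the threshold exactly


for score in (0.72, 0.50, 0.31):
    print(score, classify_sentiment(score))
```

Raising the threshold makes a positive result harder to reach: with a threshold of 0.70, a score of 0.60 would be reported as negative.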


6 Creating a Rule-Based Model

- Overview of the Rule-based Model
- Understanding Sentiment Computation
- Step 1: Defining Your Keywords
- Step 2: Specifying Product and Feature Information
- Step 3: Specifying Rules
- Step 4: Edit Your Rules
- Step 5: Build the Rule-based Model
- Step 6: Specify the Test Configuration
- Export Your Rules

6.1 Overview of the Rule-based Model

The rule-based model is more complex to develop than the statistical models. Statistical models depend on numerical data applied to training documents. The rule-based model enables you to write custom rules that identify sentiment at a granular level, because your rules define matched instances of products, features, and sentiment. This chapter describes how to build the model, so it precedes Chapter 7: Writing Sentiment Analysis Rules. Use the rule-writing chapter after you use this chapter to build your model.


6.2 Understanding Sentiment Computation

6.2.1 Overview of Sentiment Computation

Before you build a taxonomy of products and features and specify the rules that identify the sentiment expressed about these objects, you should understand how sentiment is computed. When you write your rules, you can weight them. Although a default weight is set, changing it enables you to affect how the overall sentiment of the document is computed.

Note: At this time, the relative weight setting has no effect on the sentiment score.

For each input document that you test, sentiment information is returned in the Text Result tab. This information includes the overall sentiment, the probability and confidence scores, and a list of the matched terms and the definitions that they match.

The overall sentiment is a function of the matches on all of the product, feature, and keyword rules, together with a common weighting factor for the positive rules. The weighting factor for positive rules determines how much importance is assigned to positive rules compared to negative rules. This factor is expressed as a percentage that is internally converted to a normalized value. For example, 100% indicates that positive and negative rules are weighted equally.

6.2.2 Using Pseudo Code to Understand Sentiment Computation

The sentiment computation can be demonstrated in the form of pseudo code as follows:

overall_sentiment = 0

for each positive rule match:
    overall_sentiment = overall_sentiment + (weight of the rule * common weighting factor for positive rules)

for each negative rule match:
    overall_sentiment = overall_sentiment - (weight of the rule)

overall_sentiment_probability = 1 / (1 + e^((0 - overall_sentiment) * log(1.5)))
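The pseudo code translates directly into a short, runnable Python function. This sketch rests on two assumptions that the guide does not state: log is the natural logarithm, and the Relative weight of positive rules percentage becomes a factor by dividing by 100, so the default 100% gives 1.0. With natural logarithms, the probability simplifies to 1 / (1 + 1.5^(-overall_sentiment)), so a score of 0 maps to 0.5. The name sentiment_score is hypothetical.

```python
import math


def sentiment_score(positive_match_weights, negative_match_weights,
                    positive_weight_percent=100.0):
    """Follow the sentiment computation pseudo code.

    Each list holds one rule weight per match. positive_weight_percent stands in
    for the Relative weight of positive rules setting; dividing by 100 is an assumption.
    """
    factor = positive_weight_percent / 100.0
    overall_sentiment = 0.0
    for weight in positive_match_weights:
        overall_sentiment += weight * factor
    for weight in negative_match_weights:
        overall_sentiment -= weight
    # log is taken to be the natural logarithm, so this equals 1 / (1 + 1.5 ** -overall_sentiment).
    probability = 1.0 / (1.0 + math.exp((0 - overall_sentiment) * math.log(1.5)))
    return overall_sentiment, probability


# One positive match weighted 3 and one negative match weighted 1.
print(sentiment_score([3], [1]))
```

With these weights (the ones used in Example 6-1), the overall sentiment is 2 and the probability is about 0.69, which is above the default 50% threshold.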

6.2.3 Change the Weight for a Rule

To change the weighting factor for positive rules, change the default setting in the Relative weight of positive rules field. This field is in the Set Test Configuration dialog box.

Note: At this time, changing the weight does not affect your results.

To change the relative weight of positive rules, complete these steps:

1. Select Build --> Set Test Configuration.

2. Make sure that Test Rule-based model is selected in the Default test type field.


3. Click the up or down arrow to the right of the Relative weight of positive rules in rule-based model field to change this number.

4. Click OK to save your settings.

6.3 Step 1: Defining Your Keywords

6.3.1 Understanding Keywords

Keywords are the basic building blocks for each rule type. Each keyword, or string, defines a unique term that identifies the sentiment that you are matching in input documents. Each match on a term is added to the total weight for the definition.

You can use keywords like intermediate concepts. In other words, you can reference a positive or negative keyword node in an intermediate concept, product, or feature rule. Referencing one definition multiple times saves time: write a lengthy definition once, and reference it multiple times.

If you choose to reference your keywords, consider writing CLASSIFIER rules that specify strings. For example, develop a list of terms that express positive sentiment, such as like, enjoy, wonderful, great, and terrific. You can then reference these definitions in the intermediate entities that you develop. Use CONCEPT_RULE and PREDICATE_RULE rules to reference keywords directly. Alternatively, reference rules such as C_CONCEPT and CONCEPT rules in intermediate concepts.

Although you can enter these keywords by hand, you can also import them using the Import Learned Features operation. For more information, see Section 3.9.4 The Import Learned Features Dialog Box on page 69.


6.3.2 Add Keyword Rules into the Keywords Node

Use the Rule tab to enter the keyword rules that form the building blocks of your rules. When you use these nodes, you can also specify the type of rule that you are writing and weight each of these rules.

Tip: Typically, these rules are CLASSIFIER rules that are referenced by intermediate entities and product and feature rules.

To access and use the Rule tab, complete these steps:

1. Click the Rule tab.

2. Select one of the following Tonal Keyword nodes to access the concept definition interface:

Positive

Rules that locate positive sentiment define this node.

Negative

Rules that locate negative sentiment define this node.

Neutral

Rules that locate neutral sentiment define this node.

3. Choose one of the following two methods to define the keywords for your rules:

- If you are using CLASSIFIER rules only, load the keywords defined by the statistical model. For more information, see Section 6.3.3 Load Keywords Using the Import Learned Features Operation below.

- Hand-write the keywords. For more information, see Section 6.3.4 Write Keyword Definitions on page 136.

6.3.3 Load Keywords Using the Import Learned Features Operation

Use the Import Learned Features operation to load the keywords for your model. SAS Sentiment Analysis Studio generates these keywords from the training corpora when you create a statistical model. Use this operation if you are writing only CLASSIFIER rules. The operation uses the _ss_feature.txt file that is generated by the statistical model.

To import your keywords after you activate your rule-based model, complete these steps:

1. Select the Keywords node in the Rule tab.


2. Select Build --> Import Learned Features. The Import Learned Features dialog box appears.

3. Select one of the following operations:

- Leave the default selection in the Load features from a model field. (This is the activated statistical model.) For example, leave the default selection Simple. If you have created another model and want to use that model, click the drop-down arrow to select it from the list that appears.

- Select Load features from a file to specify an _ss_feature.txt file to import. Alternatively, click the button next to this field to use the SAS Sentiment Analysis Studio dialog box to select an _ss_feature.txt file to import. For more information, see Section 3.9.17 The Information and SAS Sentiment Analysis Studio Windows on page 87.

4. Leave the default setting of 1000 in the Number of keywords to import field, or click the up or down arrow to change this number. (This number applies separately to positive and negative sentiment, so the total is twice the number that you enter.)


5. Click OK and a SAS Sentiment Analysis Studio confirmation window appears.

Note: The 2000 learned features is the combined number of positive and negative keywords that are imported. In this case, 1000 is specified in the Import Learned Features dialog box, so 1000 positive keywords and 1000 negative keywords are imported.

6.3.4 Write Keyword Definitions

You can add keyword definitions to your project. When you do, you define a list of sentiment definitions that other concepts in your project can reference. Matches are returned for these concepts whether they are referenced or matched as stand-alone concepts.

To write the rules that form the definitions for your keywords, complete these steps:

1. Click to select a rule type under the Type heading. For more information about concept rules, see Chapter 7: Writing Sentiment Analysis Rules.

2. Place your cursor in the selected Rule Body field and enter a string. For example, type terrific for a CLASSIFIER rule for a Positive node. The string that you can write is determined by the specified syntax for each rule type.

3. Specify the weight that is assigned to each occurrence of a match using the Weight field. For more information, see Section 6.3.5 Adding Weights below.


4. Repeat Step 1 through Step 3 above to add all of the rules for the Positive keyword concept.

5. Repeat Step 1 through Step 3 above to define the rules for the Negative concept.

6. Repeat Step 1 through Step 3 above to define the rules for the Neutral concept.

6.3.5 Adding Weights

After you add your keywords and write your rules, you can define the weight for each rule match. Each match on a rule in the input document adds to the total weight of the match for this concept definition. The number set under the Weight heading applies to each instance of a match.

Set rule weights using the following information:

- If you import the <model_name>_ss_feature.txt file, the weights are automatically determined by SAS Sentiment Analysis Studio. You can manually enter a new weight into each of the Weight fields.

- If you write your own rules, the default value in the Weight field is 1. If there is a match on a negative keyword and a match on a positive keyword, the document is returned as one that expresses neutral sentiment. If a positive rule is weighted 0.5, it might take two matches on positive keywords to offset one match on a negative keyword. (Weight is only one part of the determination that is used for matching. For more information, see Section 6.2 Understanding Sentiment Computation on page 130.) You can change these weights for each rule individually.

- You can manually enter a weight for each rule.

SAS Sentiment Analysis Studio compares all of the weights for the concepts that the document matches. The match for the concept with the highest weight has the greatest effect on the overall sentiment determination for the input document. See the following examples:


Example 6-1: Setting Weight Comparatively

When you specify a weight for a rule, you determine the value of the matched string in comparison with strings that match other rules.

A negative CLASSIFIER rule weighted 1 matches rather poor.

A positive CLASSIFIER rule weighted 3 matches adequate.

In the example above, if these are the only matches that occur in the input document, the overall sentiment of the document is considered positive. This statement is true if the default specification of 100% for Relative weight of positive rules remains unchanged. You can change this specification in the Set Test Configuration window. For more information, see Section 6.2.3 Change the Weight for a Rule on page 131.

Example 6-2: Affecting Overall Sentiment

Three matches might be located in an input document for one positive concept definition that is defined by two rules. In this case, the weight for this document is three. However, if this document also matches a negative concept definition, this text might have a weight of six. This weight is the total that includes the match on one rule and two matches on another rule. The total weight of this document is eight for the negative concept. In this example, the match on the negative concept could return an overall sentiment of negative for the input document. This statement is true if the default setting of 100% in the Relative weight of positive rules field remains unchanged.

However, a human reviewer might change these weights to reflect the strength of the sentiment. For example, a match on the CLASSIFIER rule terrific might be considered stronger than a match on the CLASSIFIER rule not too good. A human reviewer might also weigh the matches differently when the first match applies to the product and the second applies to a feature.

For information about setting weights, see Section 6.2 Understanding Sentiment Computation on page 130. You specify these weights in the Weight field for each rule and the Add New Feature dialog box. For more information, see Section 3.9.9 The Add New Feature Dialog Box on page 75.
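How matches accumulate weight per concept, as described in Section 6.3.5 and Example 6-1, can be sketched in Python. This is a deliberate simplification: it assumes plain keyword strings matched as whole words or phrases without regard to case, and it does not implement CLASSIFIER rule syntax or any other rule type. The name concept_weights and the rules layout are hypothetical.

```python
import re


def concept_weights(text, rules):
    """Total weight per concept: sum of (rule weight x number of matches).

    rules maps a concept name to (keyword, weight) pairs. Keywords are matched
    as whole words or phrases, ignoring case (a simplification of CLASSIFIER rules).
    """
    lowered = text.lower()
    totals = {}
    for concept, keyword_rules in rules.items():
        total = 0.0
        for keyword, weight in keyword_rules:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            total += weight * len(re.findall(pattern, lowered))
        totals[concept] = total
    return totals


# The two rules from Example 6-1.
rules = {"Positive": [("adequate", 3)], "Negative": [("rather poor", 1)]}
review = "The zoom is adequate, but the flash is rather poor."
print(concept_weights(review, rules))  # {'Positive': 3.0, 'Negative': 1.0}
```

Because every match counts, a document that says adequate twice would contribute 6 to the positive concept, while a word such as inadequate does not match the whole-word pattern at all.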

6.4 Step 2: Specifying Product and Feature Information

6.4.1 Add a Product

Add a product on the Rule tab; later, you can add the features that make up this product. When you add a product, you define the object about which sentiment is expressed. This object can have components, which are called features.

You can add more than one product when you want to compare your products and their features, or when you want to identify multiple products in input documents.

To define your object, develop a definition that consists of rules. These rules should capture the various names of the product and all of the terms that are used to identify it.

To add a product, complete these steps:

1. Select the Rule tab and right-click the Products node, which appears by default.

2. Click New Product and the Add Product dialog box appears.

3. Enter the name of the new product into the Product name field. For example, type NikonCamera.

Tip: If you specify an underscore (_), hyphen (-), colon (:), or a space in a node name, a reference to this concept by another rule does not return a match.

4. Click OK to save your change. The new product appears in the taxonomy of the Rules pane.

5. Repeat these steps until you have added all of your products.

You can now add the features for your products. For more information, see Section 6.4.2 Add a Product Feature below.
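If you name many products, you might want to check each name against the Tip in step 3 before you add it. The following Python sketch does only that: the hypothetical function `is_referenceable_node_name` flags a name that contains any of the four characters that the Tip lists (underscore, hyphen, colon, or space). It is not part of SAS Sentiment Analysis Studio.

```python
# Characters that, according to the Tip in step 3, keep other rules from
# matching a reference to the node. The function name below is hypothetical.
PROBLEM_CHARACTERS = {"_", "-", ":", " "}


def is_referenceable_node_name(name):
    """Return True if the node name contains none of the problem characters."""
    return not any(ch in PROBLEM_CHARACTERS for ch in name)


for candidate in ["NikonCamera", "Nikon_Camera", "Nikon Camera", "Nikon-Camera"]:
    print(candidate, is_referenceable_node_name(candidate))
```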

6.4.2 Add a Product Feature

After you add a product, you can add the features for this product. Use definitions, which consist of rules, to identify all instances of each feature, or component, of a product. When you identify the features of a product, you can write rules that link sentiment to these features. To add a feature, complete these steps:

1. Select the Rule tab and right-click a Feature node that is below the product node that you want to reference, such as NikonCamera.

2. Select Add Feature from the drop-down list that appears. The Add New Feature window appears.

3. Enter the name of a product feature into the Name field. For example, type Lens.

4. (Optional) Click the up arrow or the down arrow to change the default setting of 0 in the Relative weight field.

Note: At this time, this weight is not entered into the sentiment score calculation.

5. Click OK to save this feature and to see it in the taxonomy.

6. Repeat these steps until you have added all of the features for all of your products.

6.5 Step 3: Specifying Rules

6.5.1 Overview of Specifying Rules

You can write rules, or you can import an entire taxonomy that includes all of the rules in a saved project. When you write rules, you also specify the products and features in your taxonomy. If you import rules, the products and features in the imported taxonomy replace the products and features in your current taxonomy.

6.5.2 Writing Rules

Write the rules that together define each Positive, Negative, or Neutral node. For instructions, see Chapter 7: Writing Sentiment Analysis Rules.

6.5.3 Import Rules

You can also import the rules that define the nodes in your taxonomy. When you import rules, all of the products, features, and sentiment definitions are imported. For this reason, the taxonomy that you develop in your current project is overwritten by the taxonomy in the imported project. To import the rules that you defined in another project, complete these steps:

1. Select File --> Import Rules. The SAS Sentiment Analysis Studio window appears. For more information, see Section 3.9.17 The Information and SAS Sentiment Analysis Studio Windows on page 87.

2. Select the sam_rule.xml file in the project folder that contains the rules that you want to import.

3. Click Open and the imported taxonomy with all of its nodes and rules replaces the existing taxonomy in the Rules pane.

4. Click the Rule tab and click on one of the nodes in the taxonomy to see its rules.

6.6 Step 4: Edit Your Rules

You can change rules for greater precision, or for any other reason. You can rewrite a rule in either of two ways:
- Select the rule and type the new syntax.
- Use the Rule Editor dialog box to change any rule that begins with a Boolean operator. Use either the text view or the taxonomy view in this dialog box to make these changes. In the taxonomy view, the rule appears with its Boolean operators as nodes.

Tip: The Rule Editor dialog box works only with CONCEPT_RULEs and PREDICATE_RULEs because these rules contain a Boolean operator.

To access and use the Rule Editor dialog box, complete these steps:

1. Right-click on the Body section of a PREDICATE_RULE or a CONCEPT_RULE in the Rule pane.

2. Select Edit in Tree View from the drop-down list that appears. The Rule Editor dialog box appears.

3. (Optional) Right-click on a node and select one of the following operations:

a. Select Add Statement to add a new section of the rule between the quotation marks (“”) that appear.

b. Select Add Keyword to add a Boolean operator. For more information, see Table 7-2 on page 170.

Note: The Add Keyword operation refers to adding a Boolean operator, not a keyword definition.

c. Select Select Keyword to edit the selected Boolean operator.

d. Select Delete Node to remove the selected section of the rule.

4. (Optional) Click Text View to see the rule in a text editor where you can make changes to the syntax. (In this window, the Boolean operators are highlighted in blue.) Make any changes by entering your edits into the rule syntax.

5. Click OK to save your changes.

6.7 Step 5: Build the Rule-based Model

Build your rule-based model after you make major changes to your project and before you perform certain operations. For example, you cannot test until you specify the rule-based model in the Set Test Configuration dialog box and build the model.

Before you build your model, specify the rule-based model in the Set Test Configuration dialog box. To build your model, complete these steps:

1. Select Build --> Build Rule-based Model. A SAS Sentiment Analysis Studio confirmation screen appears.

2. Click OK.

6.8 Step 6: Specify the Test Configuration

Use the Test Configuration dialog box to choose the type of model that you are testing.

Tip: After you set the test configuration, these settings are used for each test in your current project until you change them.

To access and use the Test Configuration dialog box, complete these steps:

1. Select Build --> Set Test Configuration. The Set Test Configuration dialog box appears.

2. Leave the default setting Test Rule-based Model selected in the Default test type field.

3. (Optional) Click the up arrow or the down arrow to the right of the Probability threshold field to change the default value of 50%. This value determines the overall sentiment of a document. When a document is processed, it is assigned a probability score. If the score equals this value, the sentiment is neutral. If the score exceeds this value, the sentiment is positive. If the score falls below this value, the sentiment is negative.

4. (Optional) Click the up arrow or the down arrow to the right of the Relative weight of positive rules in rule-based model field to change the default value of 100%.

Notes:
- 100% indicates that the positive rules are treated with the same importance as the negative rules. For more information, see Section 6.2 Understanding Sentiment Computation on page 130.
- At this time, changing the weight does not affect the sentiment score.

5. Click OK to save these settings.
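The rule that step 3 describes for the Probability threshold field can be sketched in Python. The sketch assumes that both the document's probability score and the threshold are expressed as fractions between 0 and 1, so 0.5 stands for the default of 50%. The function name `classify_by_threshold` is hypothetical, and the Studio computes the probability score itself.

```python
def classify_by_threshold(score, threshold=0.5):
    """Map a document's probability score to an overall sentiment.

    Follows step 3: a score equal to the threshold is neutral, a higher
    score is positive, and a lower score is negative.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > threshold:
        return "positive"
    if score < threshold:
        return "negative"
    return "neutral"


print(classify_by_threshold(0.72))  # positive
print(classify_by_threshold(0.50))  # neutral
print(classify_by_threshold(0.31))  # negative
```

Raising the threshold (for example, `classify_by_threshold(0.72, threshold=0.8)`) makes the same score count as negative.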

For more information about testing, see Chapter 9: Testing Your Models, Exporting Your Rules.

6.9 Export Your Rules

You can export the rules that are defined in the Rule tab. The exported rule file is equivalent to the sam_rule.xml file that is created automatically when you define rules, provided that sam_rule.xml was saved just before you export. Use the export operation when you want to save a copy of your rules file in another directory. To export your rules, complete the following steps:

1. Select File --> Export Rules. The SAS Sentiment Analysis Studio dialog box appears.

2. Use this dialog box to name the exported rules file and save it to the location of your choice. For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.

3. You can see the exported rules by opening the .xml file that you exported. See the sample below.

Example 6-3: sam_rule.xml File

4. Import your rules into another project. For more information, see Section 6.5.3 Import Rules on page 143.

7 Writing Sentiment Analysis Rules

- The Benefits of Rules
- What You Need to Know Before You Write Your Sentiment Analysis Rules
- Rule Types
- Table of Rule Modifiers
- What Are the Building Blocks for Sentiment Analysis Rules
- Sentiment Analysis Concept Definition Examples
- Troubleshoot Your Rules

7.1 The Benefits of Rules

Write definitions to define the concepts that locate sentiment in an input document. Concepts also define products and features and the sentiment expressed about them. The definitions that specify sentiment, products, and features can consist of numerous rules. You can also use definitions to identify comparative sentiments between products, features, or brands. SAS Sentiment Analysis Studio provides the following rule-writing benefits for concepts:

- Write a simple rule that matches one term, or complex rules that identify multiple, related terms.
- Reference and match a concept.
- Write restrictive rules to prevent matches that do not appear within a specified context.
- Specify part-of-speech tags to locate concepts.
- Write regular expressions to locate matches on patterns, including various spellings.
- Use Boolean operators with some rule types.
- Use various types of markers to increase the matching precision of some rules.
- Choose whether to highlight specific matches when you write some types of rules.
- Use rules to define taxonomy nodes, intermediate entities, and tonal keywords. The taxonomy consists of one or more product nodes, each of which includes its feature and sentiment nodes.

This chapter describes how to write the definitions that specify your sentiment analysis concepts and provides examples of these rules.

7.2 What You Need to Know Before You Write Your Sentiment Analysis Rules

7.2.1 Overview of What You Need to Know

Read the following information before you write your sentiment analysis rules:

- If there are underscores (_) in your node names, matches might not occur for definitions.

- The terms rule and definition are sometimes used interchangeably. Strictly speaking, definitions consist of rules.

- Rule types, such as CLASSIFIER and PREDICATE_RULE, are written in uppercase letters, as they appear in the Rule pane.

- SAS Sentiment Analysis Studio performs case-insensitive matching for all rule types.

- If you specify the name of a concept using the _def marker, matches occur if there is a match on the concept that is referenced by this marker. This statement is true except for CLASSIFIER and REGEX rules, for which you cannot specify a match on another concept.

- In the tested document, the term specified by the _c marker is highlighted. This marker is required for C_CONCEPT rules and CONCEPT_RULEs.

- Specify one or more unique arguments to highlight matched terms in a PREDICATE_RULE.

- If you specify an intermediate entity in a rule, enable the intermediate entity to see this match highlighted in the tested document.

- A match on any rule for a concept returns a match on the concept. For example, you could write the CLASSIFIER rule perfect. In this case, any instances of the word perfect in an input document are highlighted and returned as a match.

- By default, matches can occur in any part of an input document. You can restrict where matches occur. For example, when the SENT operator is specified, a match is returned only if all of the concepts and terms are located in one sentence. If you want to limit matches in a .xml document, specify the searchable fields in your rule. Also check the default and excluded XML fields in the Rule-based Model Settings tab of the Project Settings Dialog wizard.

- Click under the Type heading in the Rule tab to select a rule type before you type the text for the rule.
- Specify the weight as you write each rule. The weight of each rule reflects the degree of sentiment expressed or the relative importance of the rule. For example, the CLASSIFIER rule Terrific! could carry a higher weight than the CLASSIFIER rule good. By default, however, the weight of each rule is set to 1.

Notes:
- At this time, if you change the weight of a rule, there is no effect on the sentiment score.
- If you want to enable a disabled concept, right-click the concept and select Enable Concept.
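Two behaviors from the list above, case-insensitive matching and restricting matches to one sentence with the SENT operator, can be illustrated together. The following Python sketch is not the Studio's matcher: it splits sentences with a naive regular expression on ., !, and ? and treats each required item as a plain substring. The function names `split_sentences` and `matches_in_one_sentence` are hypothetical.

```python
import re


def split_sentences(text):
    """Naively split text into sentences after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def matches_in_one_sentence(text, terms):
    """Return True if every term occurs, case-insensitively, in a single sentence."""
    wanted = [t.lower() for t in terms]
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if all(t in lowered for t in wanted):
            return True
    return False


review = "The Lens is sharp. The flash is worthless!"
print(matches_in_one_sentence(review, ["lens", "sharp"]))      # True
print(matches_in_one_sentence(review, ["lens", "worthless"]))  # False
```

Both terms of the second call appear in the review, but in different sentences, so a sentence-restricted rule would not match.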

7.2.2 Verify the Accuracy of Your Rules

SAS Sentiment Analysis Studio validates the syntax of your rules automatically. The syntax is checked when you finish writing a rule and again when you build the rule-based model. You can see the results in the Syntax Errors tab.

Tip: If the Syntax Errors tab is not visible, place your cursor at the bottom edge of the window and drag upward.

7.2.3 Ensure Accurate Rule Matches

Use the Text Result tab after you test a document to see all of the definitions that match terms in the tested document. Check this tab to confirm that your rules match as expected. If a rule does not match as you anticipate, rewrite the rule or edit a referenced rule.

Any terms that are highlighted in the input text are also matched to a rule. However, when a C_CONCEPT rule or a CONCEPT_RULE references another concept definition, the terms matched by that reference are not highlighted as matches in the input document unless a _c marker precedes the bracketed term, terms, referenced concept, or multiple concepts. The same is true for a PREDICATE_RULE, except that PREDICATE_RULEs use arguments, which work like the _c context marker but do not use a c. In all cases where another concept is referenced, _def must precede the reference; otherwise, no match is returned.

When you reference another concept, a match is returned to the referenced concept. The exception is a PREDICATE_RULE that returns a match on a string that includes words that are not specified by the rule.

In some cases, the referenced concept returns a match on a sentiment subnode, such as Positive for the Image feature of the CAMERA product. In these cases, the match is not returned to the referring rule; matches are returned only for the feature and product nodes.

Matches are returned for intermediate entities if you enable those concepts. Like all other concept matches, tonal keyword matches appear in the Text Result pane.

See the examples in this chapter and refer back to the explanations that are provided here as necessary. Also see Section 9.3.3 Test One Document on page 220.

7.3 Rule Types

Develop various types of sentiment analysis rules using the explanations that are listed below:

CLASSIFIER

This rule consists of a word or string. When a match occurs on this rule, the entire string is highlighted in the input document. This rule is typically used in the definitions of Tonal Keywords to locate the terms that express sentiment. For more information, see Section 7.6.1 Example: Matching a Term with a CLASSIFIER Rule on page 192.

CONCEPT

Specify matched, highlighted terms like you do for a CLASSIFIER rule. Add the _def marker to locate referenced concepts in an input document. Also specify part-of-speech tags, and the _cap and _w markers to locate types of terms. You can also use CONCEPT rules to locate, or to discover, related information. For more information, see Section 7.6.2 Example: Matching a CONCEPT Rule on page 194.

C_CONCEPT

A C_CONCEPT rule is similar to a CONCEPT rule. Use a C_CONCEPT rule to locate matches within a specified context. For this reason, the context marker (_c) is required for this rule. The context marker also specifies the match that is highlighted in an input document. Use this rule, only, to specify coreference. You can also use some of the available operators and markers with this rule. For more information, see Section 7.6.3 Example: Context Matching with a C_CONCEPT Rule on page 196.

CONCEPT_RULE

As with C_CONCEPT rules, the _c marker is required for CONCEPT_RULEs. However, CONCEPT_RULEs also require Boolean operators. Also use the _def marker and other markers to specify the matches that are returned to this rule. For more information, see Section 7.6.4 Example: Matching Boolean Operators in a Concept_Rule on page 198.

PREDICATE_RULE

This rule is the most complex and most precise of the rule types. You can specify arguments that return highlighted matches in a tested document. These matches include all of the text between the first and last located arguments. For example, you can specify relationships between products, features, and the sentiment expressed about them. Use the _def marker with strings to reference other concepts. Boolean operators, like arguments, are required; they identify related pieces of information that are often located and matched as phrases. You can also use a PREDICATE_RULE to locate an unfavorable comparison, for example, a comparison of the flavor of two different brands of the same type of soup. For more information, see Section 7.6.5 Examples: Predicate Rule on page 200.

REGEX

Locate information that follows a preset pattern such as price, percent, and various word spellings. Do not specify any of the Boolean operators or markers with this rule. For more information, see Section 7.6.6 Example: Using Regular Expressions to Match Patterns on page 207 and Appendix B: Regular Expressions.
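The REGEX rule type locates text that follows a pattern. As an analogy only, the following Python sketch uses the standard `re` module to find prices and percentages. The Studio's own regular-expression syntax is documented in Appendix B: Regular Expressions and might differ from Python's; the patterns `PRICE` and `PERCENT` here are illustrative assumptions, not rules taken from the Studio.

```python
import re

# Illustrative patterns (assumptions): a dollar amount such as $399 or
# $1,299.99, and a percentage such as 69% or 2.5%.
PRICE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
PERCENT = re.compile(r"\b\d+(?:\.\d+)?%")

text = "It dropped from $1,299.99 to $399, a 69% discount, plus 2.5% cash back."
print(PRICE.findall(text))    # ['$1,299.99', '$399']
print(PERCENT.findall(text))  # ['69%', '2.5%']
```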

7.4 Table of Rule Modifiers

The following table provides an overview of the various operators and modifiers that are available for the rules that you write. Use the links for each capability in order to see a more detailed explanation of each modifier. This table is provided as a quick reference.

Table 7-1: Rule Modifiers

Modifier (rule types): CLASSIFIER, CONCEPT, C_CONCEPT, CONCEPT_RULE, PREDICATE_RULE, REGEX

- Match specified strings: X X X X X X
- Comments: X X X X X X
- _c marker: required required
- _def marker: X X X X
- arguments: required
- _w: X X
- _cap: X X
- > symbol: X
- @, @N, and @V: X X X X
- Part-of-speech tags: X X X X
- Coreference: X X
- Regular expressions: required

7.5 What Are the Building Blocks for Sentiment Analysis Rules

7.5.1 About the n-gram Sequence

SAS provides n-gram sequence features that are often used in natural language processing (NLP). These elements specify the context for a match. Before you write your sentiment analysis rules, consider the building blocks that are explained in this section.

7.5.2 Entering Comments into Rules

Any characters that follow the pound sign (#) are treated as a comment. To match a literal #, escape it as \#.

For example, enter this comment into the CLASSIFIER rule very: #write Positive- and NegativePhrase rules that reference this concept.

You can also use comments to prevent a rule from matching, for example, when you want to test only one rule in a definition. To do so, place the comment character at the beginning of the rule syntax. The example below shows a comment in a rule that does not prevent matching on the term very.

Display 7-1 Comment in a Rule
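The comment rule in this section can be sketched as a small function that drops everything after the first unescaped pound sign and turns each escaped \# into a literal #. This Python sketch is an assumption about how such stripping could work, not the Studio's parser; `strip_rule_comment` is a hypothetical name.

```python
def strip_rule_comment(rule):
    """Return the active part of a rule line (hypothetical helper).

    Text after the first unescaped # is a comment and is dropped.
    An escaped \\# is kept as a literal # character.
    """
    result = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "\\" and i + 1 < len(rule) and rule[i + 1] == "#":
            result.append("#")  # escaped pound sign: keep it literally
            i += 2
            continue
        if ch == "#":
            break  # start of a comment
        result.append(ch)
        i += 1
    return "".join(result).rstrip()


print(strip_rule_comment("very #write Positive- and NegativePhrase rules"))  # very
print(repr(strip_rule_comment("#very")))  # '' (the whole rule is commented out)
print(strip_rule_comment(r"\#1 rated"))  # #1 rated
```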

7.5.3 Specify a Match within an XML Field

You can choose to enable matches only within specified fields in an input .xml document. In order to specify a field, type the following sequence:

underscore (_), field name, colon (:)

See the following syntax examples:
- Type _title: to locate a match only in the <title> field.
- Type _author: to locate a match only within the <author> field.
- Type _body: to locate a match only in the <body> field.

Use this specification with any of the rule types except the REGEX rules.
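The effect of a field prefix such as _body: can be pictured with Python's standard `xml.etree.ElementTree` module. The sketch assumes a simple input document whose fields are child elements of the root; `match_in_field` is a hypothetical helper that only illustrates the idea of field-limited matching, not how the Studio evaluates the prefix.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<review>
  <title>Great camera</title>
  <author>A. Reader</author>
  <body>The flash is worthless, but the lens is great.</body>
</review>
"""


def match_in_field(xml_text, field, term):
    """Return True if term occurs, case-insensitively, in the direct text
    of any element whose tag is field."""
    root = ET.fromstring(xml_text.strip())
    for element in root.iter(field):
        if term.lower() in (element.text or "").lower():
            return True
    return False


print(match_in_field(DOCUMENT, "title", "great"))      # True
print(match_in_field(DOCUMENT, "title", "worthless"))  # False
print(match_in_field(DOCUMENT, "body", "worthless"))   # True
```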

To enable matching in .xml documents that is limited to one or more specific fields, complete these steps:

1. Select Edit --> Preferences. The Project Settings Dialog wizard appears.

2. Click Rule-based Model Settings. The Rule-based Model Settings page appears.

3. (Default) Leave Enable XML support selected.

4. Check the Default fields to make sure there are no conflicts with the fields that you plan to specify.

5. Check the XML tags to ignore to make sure there are no conflicts with the fields that you plan to specify.

Tip: For more information about Step 4 and Step 5 above, see Section 3.4.3 Specify the Rule-Based Model Settings on page 23.

6. Click OK to exit this page and to save any of your changes.

7. Write your rule. See the following example:

8. See the matching text in the following example:

7.5.4 Specifying the _c Marker

Use the context marker (_c) to locate related terms within a specific context. This marker is required for C_CONCEPT rules and CONCEPT_RULEs. In a C_CONCEPT rule, no match is returned unless the located terms are adjacent to one another. CONCEPT_RULEs do not have this restriction.

Display 7-2 C_CONCEPT Rule

In this example, see the following C_CONCEPT rule components:
- _c: The context marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.
- CAMERAImage: The definition for the Image feature is specified. In this example, flash is matched.
- _def: This term specifies a match on the definition of the specified concept.
- NegativePhrases: The definition for the NegativePhrases intermediate entity is matched twice. In this example, is worthless and is ever needed are matched.

The rule matches are circled below:

Figure 7-1 Rule Matches

Other examples of rules with the _c marker include:
- Section 7.5.11.B Specifying the ALIGNED Operator on page 171
- Section 7.6.3 Example: Context Matching with a C_CONCEPT Rule on page 196
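The adjacency requirement for C_CONCEPT rules described at the start of this section can be modeled in a few lines of Python. In this hypothetical sketch, each concept is a set of word sequences, a rule is an ordered list of concepts with one element flagged as the _c element, and a match is reported only when the elements occur back to back. The concept contents echo the example above (flash, is worthless) but are invented; the sketch does not reproduce the rule in Display 7-2 or the Studio's matcher.

```python
# Hypothetical concept definitions: each phrase is a tuple of lowercase words.
CONCEPTS = {
    "CAMERAImage": {("flash",), ("lens",)},
    "NegativePhrases": {("is", "worthless"), ("is", "blurry")},
}

# Hypothetical rule: the feature (the _c element) must be immediately
# followed by a negative phrase.
RULE = [("CAMERAImage", True), ("NegativePhrases", False)]


def match_at(words, start, phrases):
    """Return the end index if one of the phrases starts at words[start], else None."""
    for phrase in phrases:
        end = start + len(phrase)
        if tuple(words[start:end]) == phrase:
            return end
    return None


def c_concept_matches(text, rule=RULE, concepts=CONCEPTS):
    """Return the words matched by the _c element wherever the whole rule matches."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    highlighted = []
    for start in range(len(words)):
        position, context_span = start, None
        for concept_name, is_context in rule:
            end = match_at(words, position, concepts[concept_name])
            if end is None:
                break  # elements must be adjacent, so stop at the first gap
            if is_context:
                context_span = " ".join(words[position:end])
            position = end
        else:  # every element matched back to back
            highlighted.append(context_span)
    return highlighted


print(c_concept_matches("The flash is worthless, and the lens is sharp."))  # ['flash']
print(c_concept_matches("The flash really is worthless."))  # []
```

Because a CONCEPT_RULE does not require the located terms to be adjacent, a model of that rule type would need to search ahead instead of stopping at the first gap.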

7.5.5 Specifying Arguments

Arguments are specific to PREDICATE_RULEs, and each argument is unique unless you specify the same type of information in more than one argument. For example, if you specify the _a argument for Reserve Officer Training Corps, you might also specify the _a argument for ROTC, because both return the same information. For more information, see Section 7.6.5 Examples: Predicate Rule on page 200.
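Section 7.3 notes that a PREDICATE_RULE match includes all of the text between the first and last located arguments, and this section notes that one argument can cover equivalent strings such as ROTC and Reserve Officer Training Corps. The Python sketch below models only those two ideas and ignores Boolean operators; the argument names `_a` and `_b`, their values, and the helpers `find_first` and `predicate_span` are hypothetical.

```python
import re

# Hypothetical arguments: each argument lists equivalent strings.
ARGUMENTS = {
    "_a": ["Reserve Officer Training Corps", "ROTC"],
    "_b": ["loved", "enjoyed"],
}


def find_first(text, alternatives):
    """Return (start, end) of the earliest case-insensitive match, or None."""
    hits = []
    for alt in alternatives:
        m = re.search(re.escape(alt), text, flags=re.IGNORECASE)
        if m:
            hits.append((m.start(), m.end()))
    return min(hits) if hits else None


def predicate_span(text, arguments=ARGUMENTS):
    """Return the text from the first located argument to the last one.

    Every argument must be found; otherwise there is no match (None).
    """
    spans = [find_first(text, alts) for alts in arguments.values()]
    if any(span is None for span in spans):
        return None
    start = min(s for s, _ in spans)
    end = max(e for _, e in spans)
    return text[start:end]


print(predicate_span("She joined ROTC in college and loved it."))
# ROTC in college and loved
print(predicate_span("She joined ROTC in college."))  # None
```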

7.5.6 Specifying the _w Marker

Use the word marker (_w) to specify that a match can occur on a word. You can specify this term multiple times to locate several words or more than one occurrence of the same word. For an example of how _w is used, see Section 7.5.13 Specify a Part-of-Speech Tag on page 181.

7.5.7 Specifying the _cap Marker

Use the _cap marker in ways that are similar to the _w term. However, _cap only returns matches on words that begin with an uppercase letter. You can specify this term multiple times to locate several words or more than one occurrence of the same word.

Display 7-3 Specifying _cap

In this example, see the following CONCEPT rule components:
- _cap: This marker specifies a match on a word that begins with an uppercase letter. In this example, a match is returned for the word Great.
- camera: This term matches itself.

The rule match is circled below:

Figure 7-2 Rule Match
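The CONCEPT rule in Display 7-3 pairs _cap with the term camera. The following Python sketch approximates that pattern with a regular expression, as an assumption for illustration rather than the Studio's implementation: `[A-Z]\w*` stands in for _cap (a word that begins with an uppercase letter), and camera is matched case-insensitively, consistent with the Studio's case-insensitive matching. Replacing `[A-Z]\w*` with `\w+` would approximate the _w marker instead.

```python
import re

# Approximation of "_cap camera": a capitalized word followed by camera.
# [A-Z] stays case-sensitive; only "camera" gets the scoped (?i:...) flag.
CAP_CAMERA = re.compile(r"\b[A-Z]\w*\s+(?i:camera)\b")

print(CAP_CAMERA.findall("Great camera for the price."))  # ['Great camera']
print(CAP_CAMERA.findall("A great camera."))  # []
```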

7.5.8 Specifying the > Symbol

This symbol is used with coreference to specify that every occurrence of the bracketed term is a match. For more information, see Section 7.5.15 Specifying Coreference Operators on page 187.

7.5.9 Specifying the @ Sign to Match Word Forms

Append the at sign (@) to a word when you want to return all of the word forms for this word. You could also append an @N to match all noun forms and @V to match all verb forms.

Display 7-4 Appending an @ Sign

When you append an @ sign to a term, do not type a space between the term and the @ sign, and do not type a space after the @ sign. The rule matches are circled below:

Figure 7-3 Rule Matches

For more information about the at sign, see Section 7.5.9 Specifying the @ Sign to Match Word Forms on page 166.
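Matching word forms requires a lexicon that knows the inflections of each word. The following Python sketch uses a tiny hand-written lexicon (an assumption; the Studio relies on its own language resources) to show how buy@, buy@V, and buy@N could expand into the forms to match. The `LEXICON` entries and the `expand` function are hypothetical.

```python
# Hypothetical lexicon: lemma -> part of speech -> word forms.
LEXICON = {
    "buy": {
        "V": {"buy", "buys", "buying", "bought"},
        "N": {"buy", "buys"},
    },
}


def expand(term):
    """Expand term@, term@N, or term@V into the set of forms to match."""
    if "@" not in term:
        return {term}  # no @ sign: match the term itself
    lemma, _, pos = term.partition("@")
    forms = LEXICON.get(lemma, {})
    if pos == "":
        # Plain @: all forms of every part of speech.
        return set().union(*forms.values()) if forms else {lemma}
    return set(forms.get(pos, {lemma}))


print(sorted(expand("buy@")))   # ['bought', 'buy', 'buying', 'buys']
print(sorted(expand("buy@N")))  # ['buy', 'buys']
```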

7.5.10 Specifying Rule Punctuation Marks

7.5.10.A Specifying Quotation Marks

Place quotation marks (“”) around terms and concepts when you write a CONCEPT_RULE or a PREDICATE_RULE. See the following examples:
- Section 7.5.11.B Specifying the ALIGNED Operator on page 171
- Section 7.5.11.C Specifying the AND Operator on page 173
- Section 7.6.4 Example: Matching Boolean Operators in a Concept_Rule on page 198

7.5.10.B Specifying Parentheses, Square Braces, and Curly Braces

Use parentheses (()), square braces ([]), and curly braces ({}) to qualify the information that you are seeking to match. Use these delimiters with all of the sentiment analysis definitions except the CLASSIFIER and REGEX types. (The exception is square braces, which you use in REGEX rules, as described below.)

Use parentheses (()) to group the elements that comprise C_CONCEPT, CONCEPT_RULE, and PREDICATE_RULE definitions. For example, use parentheses with Boolean operators, which are followed by a comma (,) and a space. For more information, see Section 7.6.4 Example: Matching Boolean Operators in a Concept_Rule on page 198.

Use curly braces ({}) to delimit the information that is returned as a match. Depending on the type of definition, curly braces are used with or without parentheses (()). For more information, see Section 7.6.4 Example: Matching Boolean Operators in a Concept_Rule on page 198.

Use square braces ([]) to group REGEX rule elements. For more information, see Section 7.6.6 Example: Using Regular Expressions to Match Patterns on page 207.

7.5.10.C Specifying Commas

Commas (,) separate definition elements:

- Commas follow Boolean operators.
- Quotation marks (“”) enclose concept names, and a comma follows the closing quotation mark when another element follows.
- A comma is placed after the closing set of quotation marks when a match is specified using _c, _def, an argument, or any combination of these markers.

See the list of examples in Section 7.5.10.A Specifying Quotation Marks on page 167.


7.5.10.D Specifying Colons

Type a colon (:) before the part-of-speech tag or after an XML field specification. For example, type :Prep and :sep as part-of-speech tags. For more information, see Section 7.5.13 Specify a Part-of-Speech Tag on page 181. Also see Section 7.5.3 Specify a Match within an XML Field on page 158.

7.5.10.E Specifying Spaces

When you write CONCEPT, C_CONCEPT, CONCEPT_RULE, or PREDICATE_RULE definitions, type at least one space between tokens, concepts, part-of-speech tags, and the _w and _cap terms. Also type a space before the _c marker if it is preceded by a token, a comma (,), or the name of a concept.

Note: Do not add a space at the beginning or the end of a rule, or the rule might not match the input text.

See the following example of the use of spaces:

Display 7-5 Spaces in Rules


7.5.11 Specifying Boolean Operators

7.5.11.A Overview of Boolean Operators

To locate related information with greater precision, specify Boolean (logical) operators in CONCEPT_RULEs or PREDICATE_RULEs.

Type a comma (,) and a space after a Boolean operator, and enclose the operator and its arguments in parentheses (()). For example, write (SENT, “PRODUCT”).

Table 7-2: Boolean Operators

Operator Description

ALIGNED Specify that a match occurs only when two concepts or terms overlap; for example, when the arguments _a and _b locate a common match on at least one instance of the rules that comprise their definitions.

AND Specify that all of the terms linked to this operator must be present; otherwise, a match does not occur.

OR Specify that a match is returned if one or more of the terms are located.

ORD Specify the order for a match. If matched instances are located out of order, a match is not returned for the document.

DIST_n Specify the number of words between matches on rule terms. The first match takes the starting position 1 while the last match falls at or before the specified number of words.

ORDDIST_n Specify the maximum word count between matches and the order necessary to return a match. Otherwise, this operator functions like the DIST operator above.

SENT Specify this operator to return the specified matches but only when they occur in the same sentence.
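To make the table concrete, the following Python sketch evaluates AND, OR, ORD, DIST_n, ORDDIST_n, and SENT over the positions of located matches. It assumes that each match is reduced to the index of its first word and the index of its sentence, and it reads DIST_n literally from the table: counting the first match as position 1, the last match must fall at or before position n. The Match class and the satisfies and evaluate functions are hypothetical; they approximate the documented behavior and are not the product's matching engine. ALIGNED is omitted because it depends on overlapping text rather than on positions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Match:
    """Hypothetical model of one located term in a document."""
    word: int      # 0-based index of the first word of the match
    sentence: int  # 0-based index of the sentence containing the match


def satisfies(operator: str, combo: tuple[Match, ...]) -> bool:
    """Check one choice of matches (one per rule argument, in rule order)."""
    words = [m.word for m in combo]
    span = max(words) - min(words) + 1          # first match counts as position 1
    in_order = all(a < b for a, b in zip(words, words[1:]))
    if operator == "AND":
        return True                              # presence of every argument is enough
    if operator == "ORD":
        return in_order
    if operator == "SENT":
        return len({m.sentence for m in combo}) == 1
    name, _, n = operator.partition("_")         # "DIST_10" -> ("DIST", "_", "10")
    if name == "DIST":
        return span <= int(n)
    if name == "ORDDIST":
        return in_order and span <= int(n)
    raise ValueError(f"unsupported operator: {operator}")


def evaluate(operator: str, args: list[list[Match]]) -> bool:
    """args[i] lists every match found for the i-th argument of the rule."""
    if operator == "OR":
        return any(args)                         # at least one argument was located
    if not args:
        return False
    # Other operators need one match per argument that meets the condition;
    # product() yields nothing when any argument has no match at all.
    return any(satisfies(operator, combo) for combo in product(*args))


# "Does the job" starts at word 0 and "nice and light" at word 7, same sentence.
args = [[Match(word=0, sentence=0)], [Match(word=7, sentence=0)]]
print(evaluate("DIST_10", args), evaluate("ORDDIST_10", args[::-1]))  # True False
```

Reversing the argument order leaves DIST_10 satisfied but breaks ORDDIST_10, which is the difference between the two operators described in the table.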


7.5.11.B Specifying the ALIGNED Operator

Specify the ALIGNED operator for two or more matches when one of the located terms matches each of the concepts specified in the rule. This operator specifies that a highlighted match is returned for each of the concepts.

Display 7-6 ALIGNED Operator with a CONCEPT_RULE

In this example, see the following CONCEPT_RULE components:

- ALIGNED: This operator specifies that a match occurs when one of the located terms matches each of the concepts specified in the rule.
- _c marker: This marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.
- _def: This term specifies a match on the definition of the following concept.
- CAMERAPrice: This definition matches when it is located in an input document. In this example, a match occurs for the terms Nikon L14. (There is a match on the subnode CAMERAPrice: Positive for great buy.)
- CAMERA: The definition for the CAMERA product completes the match for the rule. (Nikon L14 is specified in this rule, triggering the ALIGNED rule match.)


Tip: This example uses the manual test operation instead of an imported document. For more information, see Section 9.3.4 Manually Test a Document on page 223.

The rule matches are circled below:

Figure 7-4 Rule Matches
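ALIGNED differs from the other operators because it looks for overlap rather than for distance or order. The following Python sketch models that idea, assuming that each concept match is recorded as a half-open character span (start, end). The spans_overlap and aligned functions and the sample spans are hypothetical.

```python
def spans_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True when half-open spans [start, end) share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def aligned(first: list[tuple[int, int]],
            second: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return every pair of spans, one from each concept, that overlap."""
    return [(x, y) for x in first for y in second if spans_overlap(x, y)]


text = "I think the Nikon L14 is a great buy."
start = text.index("Nikon L14")
camera = [(start, start + len("Nikon L14"))]       # hypothetical CAMERA match
great = text.index("great buy")
price = [(start, start + len("Nikon L14")),        # hypothetical CAMERAPrice matches
         (great, great + len("great buy"))]
for x, _ in aligned(camera, price):
    print(text[x[0]:x[1]])                         # Nikon L14
```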


7.5.11.C Specifying the AND Operator

Specify the AND operator for two or more arguments. A match occurs only if all of the specified arguments are present. For example, the AND operator in the following rule limits matches on image quality to documents where the term is superb also occurs and the rest of the rule is matched.

Display 7-7 AND Operator in a CONCEPT_RULE

In this example, see the following CONCEPT_RULE components:

- AND: A match occurs when both of the specified terms appear in the document.
- _c marker: This marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.
- _def: This term specifies a match on the definition of the following concept.
- CAMERAImage: The definition for the Image feature is specified. In this example, it matches image quality.
- CAMERAUsability: A match occurs when the definition for the Usability feature of the CAMERA product is located. In this example, is superb is matched. However, this match is not returned for this rule, because it is not specified using a context marker (_c).
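The AND example combines two ideas: every argument must be located, but only arguments wrapped in the _c context marker contribute returned matches. The following Python sketch models that split, assuming that each argument has already been reduced to a list of matched strings and that the rule wraps CAMERAImage, but not CAMERAUsability, in _c. The and_rule function is hypothetical.

```python
def and_rule(args: list[tuple[bool, list[str]]]) -> list[str]:
    """Evaluate an AND rule.

    args: one (has_context_marker, matched_strings) pair per argument.
    Returns the matches reported for the rule, or [] when the rule fails.
    """
    if not all(matches for _, matches in args):
        return []          # AND: an argument with no match means no rule match
    # Only arguments marked with the _c context marker are returned.
    return [m for marked, matches in args if marked for m in matches]


# CAMERAImage (inside _c) matched "image quality"; CAMERAUsability matched "is superb".
print(and_rule([(True, ["image quality"]), (False, ["is superb"])]))  # ['image quality']
print(and_rule([(True, ["image quality"]), (False, [])]))             # []
```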

The rule matches are circled below:


Figure 7-5 Rule Matches


7.5.11.D Specifying the OR Operator

Specify the OR operator for one or more matched rule components. A match occurs for an input document if at least one of these matches is present. For example, the following rule matches if either the token but or disappointing is present in the input document:

Display 7-8 OR Operator in a CONCEPT_RULE

In this example, see the following CONCEPT_RULE components:

- OR: A match occurs when one or more terms are located in an input document.
- _c marker: This marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.
- but: A match occurs on the word but, if this word is present in the input document.
- _def: This term specifies a match on the definition of the following concept.
- CAMERAUsability: In this rule, a match occurs on the word disappointing. However, this match is not returned for this rule, because the match is not specified using a context marker (_c).

The rule matches are circled below:


Figure 7-6 Rule Matches


7.5.11.E Specifying the ORD Operator

Specify the ORD operator to delineate the order for two or more matches. A match on the rule occurs only when the matched terms occur in the specified order within the document.

Display 7-9 ORD Operator in a PREDICATE_RULE

In this example, see the following PREDICATE_RULE components:

- ORD: A match occurs when the located terms occur in the specified order.
- _def: This term specifies a match on the definition of the following concept.
- NegativePhrases: A match on the definition for the NegativePhrases intermediate concept triggers the rule. In this example, a match occurs on the word Poor because this intermediate concept is enabled.
- _a and _b: These arguments specify that if a match is located in an input document, it is highlighted in the input document.
- CAMERAImage: A match on the definition for the Image feature of the CAMERA product is highlighted in the text. In this example, a match is returned for image.
- CAMERAPrice: A match on the definition for the Price feature of the CAMERA product is highlighted in the text. In this example, a match is returned for price.

See the matching string that is circled below:


Figure 7-7 String Match

Tip: The entire string is highlighted because this is a PREDICATE_RULE. For more information, see Section 7.6.5 Examples: Predicate Rule on page 200.


7.5.11.F Specifying the DIST_n Operator

Specify the maximum distance, in words, between located terms in order to return a match on the selected concept.

Note: This operator does not specify the order of the matching terms. To require a specific order, use the ORDDIST_n operator; see Section 7.5.11.G Specifying the ORDDIST_n Operator on page 180.

Display 7-10 DIST Operator

In this example, see the following CONCEPT_RULE components:

- DIST_10: A match occurs when the located terms occur within 10 words of each other.
- _c marker: This marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.
- _def: This term specifies a match on the definition of the following concept.
- PositivePhrases: A match on the definition for the PositivePhrases intermediate entity is highlighted in the input document. In this example, a match occurs on the phrase Does the job because this intermediate concept is enabled.
- CAMERAOverview: The definition for the Overview feature of the CAMERA product is matched in the text. In this example, nice and light is matched. However, this match is not returned for this rule because it is not specified with a context marker (_c).

The rule match is circled below:

Figure 7-8 Rule Match

7.5.11.G Specifying the ORDDIST_n Operator

Specify the order of, and the distance between, the terms or concepts that you want the specified sentiment analysis concept to match. This operator locates and returns a match even when the usual contextual clues provided by adjacent matches are missing. The following example adapts the DIST_n rule from Section 7.5.11.F Specifying the DIST_n Operator on page 179:

(ORDDIST_10, "_c{_def{PositivePhrases}}", "_def{CAMERAOverview}")

The matches displayed in Figure 7-8 above also satisfy this ORDDIST rule: Does the job and nice and light appear in the specified order, within a distance of 10 words.


7.5.11.H Specifying the SENT Operator

Use the SENT operator to locate matches in the same sentence. Matches are returned only when the matching concepts or terms are located in the same sentence. The following example adapts the DIST_n rule from Section 7.5.11.F Specifying the DIST_n Operator on page 179:

(SENT, "_c{_def{PositivePhrases}}", "_def{CAMERAOverview}")

The matches displayed in Figure 7-8 on page 180 also satisfy this SENT rule, because they occur within the same sentence.

7.5.12 Specifying the _def Marker

Use the _def marker to reference products, features, and other concepts within the same taxonomy. The taxonomy consists of one or more product nodes, each of which includes its feature and sentiment nodes. Use the _def marker with CONCEPT, C_CONCEPT, CONCEPT_RULE, and PREDICATE_RULE definitions. For more information, see Section 7.6.5 Examples: Predicate Rule on page 200.

7.5.13 Specify a Part-of-Speech Tag

Specify part-of-speech tags when you don’t know the exact word that you are seeking. For example, specify :A to return any adjective and :sep to specify a separator character, which is any punctuation mark. Precede each part-of-speech tag with a colon (:). Each tag, except the :sep tag, begins with an uppercase letter. Type a space between tags. For a complete list of part-of-speech tags, see Appendix C.

To write a rule that uses a part-of-speech tag, complete these steps:

1. Select Edit > Preferences.


2. Click Rule-based Model Settings in the Project Settings Dialog wizard that appears.

3. Make sure that Use Part-of-speech Tagger is selected and that the default .htagger file, or another .htagger file, is specified.

4. Click OK to exit the Project Settings Dialog wizard and to save any changes that you made.


5. Write a CONCEPT rule that specifies a part-of-speech tag. For more information about CONCEPT rules, see Section 7.6.2 Example: Matching a CONCEPT Rule on page 194.

In this example, see the following CONCEPT rule components:

- :A: This part-of-speech marker matches an adjective such as digital.
- :N: This part-of-speech marker matches a noun such as camera.
- _w: This marker matches any word such as I, ever, and had.


6. Test the rule and see a rule match that is similar to the example circled below:
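The matching idea behind part-of-speech tags and _w can be sketched in Python as a pattern of slots applied to a tagged sentence. This is a minimal sketch, assuming that the tagger has already produced (word, tag) pairs. Apart from A, N, and sep, the tag names below (Pron, Adv, V) are hypothetical placeholders; see Appendix C for the real tag set. The slot_matches and match_pattern functions are also hypothetical.

```python
def slot_matches(slot: str, word: str, tag: str) -> bool:
    """Does one rule slot (':TAG', '_w', or a literal word) accept this token?"""
    if slot == "_w":
        return tag != "sep"                 # assumption: _w skips punctuation
    if slot.startswith(":"):
        return tag == slot[1:]              # ':A' -> adjective, ':N' -> noun, ...
    return word.lower() == slot.lower()     # literal token, compared case-insensitively


def match_pattern(pattern: str, tagged: list[tuple[str, str]]) -> list[str]:
    """Return each run of consecutive tokens that fills every slot of the pattern."""
    slots = pattern.split()                 # slots are separated by spaces
    hits = []
    for start in range(len(tagged) - len(slots) + 1):
        window = tagged[start:start + len(slots)]
        if all(slot_matches(s, w, t) for s, (w, t) in zip(slots, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits


tagged = [("best", "A"), ("digital", "A"), ("camera", "N"),
          ("I", "Pron"), ("ever", "Adv"), ("had", "V"), (".", "sep")]
print(match_pattern(":A :N _w _w _w", tagged))  # ['digital camera I ever had']
```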

7.5.14 Writing Regular Expressions

To match known patterns, use regular expressions to specify a range of letters or numbers. For example, specify a-z inside square braces ([]). This string matches any word beginning with an ASCII character whose value is between a and z.

Note: Write regular expressions only in REGEX rules. Regular expressions are currently case-insensitive, which means that a match occurs on a string regardless of the case specified in the rule.

If you add a plus sign (+) after the closing square brace (]), all of the matching characters within a whitespace-delimited term are matched. See the following example of a REGEX rule:

[a-z]+


In this case, any term in an input document that begins with a letter of the English alphabet can be matched. For example, the words like, bad, and lens could be returned as matches. You can also use a regular expression to match various forms of a word. See the following example:

L\w+v

This REGEX rule could return matches on Luv, luv, Lov, and lov.

Tips: You could also add a CLASSIFIER rule to the definition to match instances of Love.

To build a phrase in a REGEX rule, you can also add one or more terms. For example, specify L\w+v it to return a match on Luv it. Alternatively, reference the L\w+v Tonal Keyword rule in an Intermediate Concept and build this phrase as a positive phrase.
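To experiment with these patterns outside the product, the following Python sketch applies them with the standard re module and the re.IGNORECASE flag, mirroring the case-insensitive behavior described in the note earlier in this section. Python's regular-expression dialect may differ in details from the one used by REGEX rules, so treat this as an approximation; the regex_matches helper is hypothetical.

```python
import re


def regex_matches(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of pattern in text, ignoring case."""
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]


print(regex_matches(r"[a-z]+", "like bad lens"))                  # ['like', 'bad', 'lens']
print(regex_matches(r"L\w+v", "Luv it, luv it, Lov it, lov it"))  # ['Luv', 'luv', 'Lov', 'lov']
print(regex_matches(r"L\w+v it", "I Luv it"))                     # ['Luv it']
```

Note that re.finditer also matches inside longer words: the L\w+v pattern finds Lov in Lovely. In Python, add \b word boundaries if you need whole-word matches.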

After a bracketed range of numbers, add either the % symbol or the word percent. This enables you to locate percentage matches in your documents. See the following examples:


Display 7-11 REGEX Rule

Tip: You could also add a REGEX rule that specifies [0-9]+ percent. Examples of matches on this rule include 50 percent, 6 percent, and 100 percent.

This regular expression specifies that only numbers followed by the percentage sign (%) match. The rule match is circled below:


Figure 7-9 REGEX Rule Match

For more information, see Section 7.6.6 Example: Using Regular Expressions to Match Patterns on page 207 and Appendix B.
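Both percentage rules can be tried with Python's re module as a rough approximation of REGEX rule matching. The sample sentence is invented for illustration.

```python
import re

text = "Zoom improved by 50% while the price fell 6 percent, not 100 percent."
symbol_hits = re.findall(r"[0-9]+%", text)                       # numbers followed by %
word_hits = re.findall(r"[0-9]+ percent", text, re.IGNORECASE)  # numbers followed by " percent"
print(symbol_hits)  # ['50%']
print(word_hits)    # ['6 percent', '100 percent']
```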

7.5.15 Specifying Coreference Operators

Use coreference operators to write rules that return a match on a partial term only when the full term is matched at least once in the document. Specify coreference using either a C_CONCEPT or a CONCEPT_RULE. In this example, Nikon Coolpix L10 is a match for each instance of a match on the partial term L10 in an input document.

When the tested document is displayed in the Rule tab, both the canonical word form and the matching term are highlighted, because these matches are linked in SAS Sentiment Analysis Studio. Both terms are displayed as matched in the Full Text tab.

Use the coreference operator (_ref) to specify the partial match. Type a greater than symbol (>) immediately after the term specified by the _ref operator to locate all instances of matches on the partial match.


Note: If you do not specify the greater than symbol, no partial matches are returned.

Display 7-12 Coreference Operators

In this example, see the following C_CONCEPT rule components:

- _c: The context marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.
- Nikon Coolpix: This is the term that is completed by the partial match term L10.
- _ref: The coreference operator specifies the partial match.
- L10: This partial match is returned only when the term Nikon Coolpix L10 is matched at least once in an input document.
- >: The greater than symbol indicates that a match is returned for each instance of L10. This is true when Nikon Coolpix L10 is matched at least once in an input document.
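The coreference behavior can be summarized as a two-step check: first confirm that the full term occurs, and then collect the partial references. The following Python sketch models this on plain text. It is a hypothetical illustration: coreference_matches and its every_instance flag (standing in for the > symbol) are not product APIs, and whole-word matching with \b is an assumption.

```python
import re


def coreference_matches(text: str, full_term: str, partial: str,
                        every_instance: bool = True) -> list[str]:
    """Return full-term matches plus partial references to them.

    every_instance=True models the '>' symbol; without it, only the full
    term is returned, mirroring the Note that no partial matches are returned.
    """
    def find(term: str) -> list:
        return list(re.finditer(r"\b" + re.escape(term) + r"\b", text))

    full = find(full_term)
    if not full:
        return []            # a partial term alone never matches
    hits = [m.group(0) for m in full]
    if every_instance:
        for m in find(partial):
            # Skip the partial term when it is part of a full-term occurrence.
            if not any(f.start() <= m.start() and m.end() <= f.end() for f in full):
                hits.append(m.group(0))
    return hits


review = "The Nikon Coolpix L10 is light. The L10 zooms well and the L10 is cheap."
print(coreference_matches(review, "Nikon Coolpix L10", "L10"))
# ['Nikon Coolpix L10', 'L10', 'L10']
```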

The rule matches are circled below:


Figure 7-10 The Rule Matches

You can also add more rules to this project to locate related phrases. See the following example:


Display 7-13 Adding a Rule for Coreference

This PREDICATE_RULE is specified using the following components:

- AND: This Boolean operator specifies that unless both arguments (_a and _b) are located, no match is returned.
- _a and _b: These arguments specify that the returned matches are displayed in the input document. (Each argument is distinct because it refers to a different entity.)

Note: These matches are not returned to this PREDICATE_RULE in the Text Result tab, because of the reference to a coreference rule.

- _def: This operator references the concept that follows.
- CAMERA: This referenced product specifies that a match is returned when the CAMERA product definition is located in an input document. In this example, the terms Nikon L10 and L10 are matched.
- CAMERAImage: This referenced feature specifies that a match is returned when the Image feature of the CAMERA product definition is located. In this example, the terms as my first digital and was very good are matched.

This rule returns the matches that are circled below:

Figure 7-11 Matches for Coreference and the Related Rule


7.6 Sentiment Analysis Concept Definition Examples

7.6.1 Example: Matching a Term with a CLASSIFIER Rule

Each CLASSIFIER rule corresponds to one term, or dictionary entry.

Display 7-14 CLASSIFIER Rules

All matched CLASSIFIER definitions are highlighted in an input document. The rule matches are circled below:


Figure 7-12 Rule Matches
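Because each CLASSIFIER rule is a single term, matching reduces to a dictionary lookup over the input text. The following Python sketch assumes whole-word, case-insensitive matching, which might not reflect the product's exact behavior. The classifier_matches helper is hypothetical, and the terms are invented examples.

```python
import re


def classifier_matches(terms: list[str], text: str) -> list[str]:
    """Return every occurrence of each dictionary term, term by term."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"   # whole words only (assumption)
        hits += [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
    return hits


print(classifier_matches(["great", "sharp pictures"],
                         "Great zoom and sharp pictures; great value."))
# ['Great', 'great', 'sharp pictures']
```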


7.6.2 Example: Matching a CONCEPT Rule

CONCEPT rules, like CLASSIFIER rules, match strings. However, CONCEPT rules are more powerful than CLASSIFIER rules because they enable you to use modifiers such as _w, _cap, part-of-speech tags, and the _def marker. You can also specify an XML tag in order to locate matches within a specified field.

Display 7-15 CONCEPT Rule

When you write a CONCEPT rule, all of the matches are highlighted in an input document. These matches are also returned to the rule in the Text Result tab. In other words, CONCEPT rules display results like CLASSIFIER rules.

The syntax for this CONCEPT rule is explained below:

- _body: This marker specifies the XML field where the matches are located.

- CAMERAImage: A match occurs when one of the rules in this definition is matched. In this example, a match is returned for pictures.

- _w: This term specifies a match on any word. In the following example, matches are returned to this rule for are, really, and great.

The rule match is circled below:


Figure 7-13 Rule Match


7.6.3 Example: Context Matching with a C_CONCEPT Rule

Write a C_CONCEPT definition to locate matches within a specific context. The _c marker locates and matches concepts within that context; use this marker to specify a match that is highlighted in the input document.

Display 7-16 C_CONCEPT Rule

This C_CONCEPT definition specifies a relationship between the enabled PositivePhrases intermediate entity and the word camera. The syntax for this C_CONCEPT rule is explained below:

- _c: The context marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.

- _w: This term specifies a match on any word. In the following example, matches are returned to this term for the words is, are, and arguably.

- _def: This term specifies a match on the definition of the following concept.

- PositivePhrases: This Intermediate Entities definition is highlighted in the input document and returned as a match to this rule. In this example, this concept is enabled. However, these matches are not returned for this rule because they are not preceded by a context marker (_c).

The rule matches are circled below:

Figure 7-14 Rule Matches


7.6.4 Example: Matching Boolean Operators in a Concept_Rule

Write CONCEPT_RULEs like C_CONCEPT rules. However, CONCEPT_RULEs expand the syntax of C_CONCEPT rules to include Boolean operators. CONCEPT_RULEs also require you to enclose the terms that you want to match in quotation marks (“”).

Display 7-17 CONCEPT_RULE

The syntax for this CONCEPT_RULE is explained below:

- SENT: A match occurs when the terms are located in one sentence.
- _c: The context marker specifies that a match is highlighted, and returned as a match to the rule, if it appears in the specified context.
- _def: This operator references the concept that follows.
- NegativePhrases: This Intermediate Entities definition is highlighted in the input document and returned as a match to this rule. In this example, this concept is enabled and a match is returned for do not.
- CAMERAOverview: A match occurs when one of the rules in this definition is matched. In this example, buy is matched. However, this term is not returned as a match for the rule because it is not specified with the context marker.

The rule matches are circled below:

Figure 7-15 Rule Matches


7.6.5 Examples: Predicate Rule

7.6.5.A Overview of Predicate Rule Examples

PREDICATE_RULEs are the most complex and exact of the SAS Sentiment Analysis Studio rules. They extend the functionality of CONCEPT_RULEs. The unique feature of this rule type is the ability to define arguments that locate relationships between two or more concepts; for example, relationships between products, features, and the sentiment expressed about them. Like CONCEPT_RULEs, these rules use arguments. Arguments specify the concept, or a specific term, that is located in input documents. Arguments use the _c syntax, but specify a unique letter or term other than c after the underscore character (_). When you use arguments before you specify _def, you can see the matched term highlighted in your document. This match is also returned for both the rule using the _def marker and the referenced concept.

The argument feature, combined with the available operators and symbols, enables you to locate matches on related entities. For example, locate a match on tasty meat or tasty meat is very delicious when cooked. In the second example, the product is meat, the feature is cooked, and the expressed sentiment terms are tasty and very delicious (where very delicious could be one concept or two).


7.6.5.B Example 1: Matching Two Different Concepts

This example shows how a PREDICATE_RULE can be used to locate a match on a product and one of its features. In this case, the PointandShoot feature with its CAMERA product is specified. In this example, both of the referenced concepts are preceded by an argument. These arguments return a string.

Display 7-18 Two Referenced Concepts

This PREDICATE_RULE is specified using the following components:

- AND: This Boolean operator specifies that unless both arguments (_ab and _ac) are located, no match is returned.
- _ab and _ac: These arguments specify that the returned matches are displayed in the input document and returned as a rule match. (Each argument is distinct because it refers to a different entity.)
- _def: This operator references the concept that follows.
- CAMERA: This referenced product specifies that a match is returned when the CAMERA product definition is located in an input document. In this example, the word camera is matched.
- CAMERAPointandShoot: This referenced feature specifies that a match is returned when the PointandShoot feature of the CAMERA product definition is located. In this example, the term very easy is matched.

This rule returns the string match that is circled below:


Figure 7-16 String Match

The first word that is highlighted, camera, is a match on the CAMERA product. The last word that is highlighted, easy, is a match on the PointandShoot feature of the CAMERA product. The entire string is returned as a match because this is a PREDICATE_RULE.


7.6.5.C Example 2: Referencing One Concept and Two Terms

This example shows how a PREDICATE_RULE can be used to locate a match on sentiment that is followed by the terms five stars and well.

Display 7-19 One Referenced Concept and Two Referenced Terms

This PREDICATE_RULE definition is specified using the following components:

- DIST_20: This operator specifies that a match on the rule is returned when all of the terms are located within a distance of 20 words.

- _rev, _a, _b: These arguments specify matches on the CAMERAReview concept, five stars, and well, respectively.

- _def: This operator references the concept that follows.

- CAMERAReview: This referenced concept specifies a match on the CAMERAReview definition. In this example, the word Overall is matched.

- AND: This operator specifies that a match on the rule is returned when matches occur on both arguments _a and _b. These two matches (five stars and well) must be located within a distance of 20 words from the match on the first argument (CAMERAReview).

This rule returns the string match that is circled below:


Figure 7-17 String Match
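To reason about which texts a DIST_20 rule such as the one above should match, you can model the operator outside SAS Sentiment Analysis Studio. The following Python sketch is a hypothetical approximation, not the product's implementation. It assumes that literal phrases stand in for the arguments (the real rule matches the CAMERAReview concept, not only the word overall), that words are runs of letters, digits, and apostrophes, and that distance is measured between the starting word positions of the matches. The function names are illustrative only.

```python
import re
from itertools import product


def tokenize(text):
    """Split text into lowercase words made of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_positions(tokens, phrase):
    """Return the starting word index of every occurrence of phrase in tokens."""
    words = tokenize(phrase)
    k = len(words)
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == words]


def dist_match(text, phrases, n):
    """Hypothetical DIST_n check: True if one occurrence of every phrase can be
    chosen so that all chosen occurrences start within n words of each other."""
    tokens = tokenize(text)
    hits = [find_positions(tokens, p) for p in phrases]
    if not all(hits):
        return False  # every argument must match at least once
    # Try each combination of occurrences (acceptable for short test texts).
    return any(max(combo) - min(combo) <= n for combo in product(*hits))


review = "Overall, I give this camera five stars and it works well."
print(dist_match(review, ["overall", "five stars", "well"], 20))  # True
print(dist_match(review, ["overall", "five stars", "well"], 5))   # False
```

Because the sketch tries every combination of occurrences, it is meant only for previewing short test strings, not for processing large collections.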


7.6.5.D Comparing Products and Features Using a Predicate Rule

Use a PREDICATE_RULE to compare products and their features. When you locate these comparisons in input text, they help explain why the sentiment was expressed. Use them to understand how one product compares to another and how these factors affect consumers.

Display 7-20 Products and Features Comparison

This PREDICATE_RULE definition is specified using the following components:

- ORDIST_40: This operator specifies that a match on the rule is returned when the terms are located in the specified order and within a distance of 40 words.

- _m, _o, _p: These arguments specify matches on the CAMERAPrice, AllCameras, and CAMERA concepts, respectively.

- _def: This operator references the concept that follows.

- CAMERAPrice, AllCameras, and CAMERA: These referenced definitions specify that a match is returned when each of these definitions is located in an input document. In this example, buy, Canon SD230, and Nikon are matched.

This rule returns the string match that is circled below:


Figure 7-18 String Match

Note: You can also specify matches on the same feature for two different products to return a comparative match. In this case, you might specify the DIST operator for one set of matches and the AND or SENT operator for the other set.
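The ordered operator ORDIST_40 used in this comparison can be approximated in the same spirit. This self-contained Python sketch is hypothetical: literal phrases stand in for the CAMERAPrice, AllCameras, and CAMERA concepts, words are runs of letters, digits, and apostrophes, and the distance is assumed to run from the start of the first match to the start of the last. The function names and the sample sentence are invented for illustration.

```python
import re
from itertools import product


def tokenize(text):
    """Split text into lowercase words made of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_positions(tokens, phrase):
    """Return the starting word index of every occurrence of phrase in tokens."""
    words = tokenize(phrase)
    k = len(words)
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == words]


def ordist_match(text, phrases, n):
    """Hypothetical ORDIST_n check: True if occurrences of the phrases appear
    in the listed order and the first and last start within n words."""
    tokens = tokenize(text)
    hits = [find_positions(tokens, p) for p in phrases]
    if not all(hits):
        return False  # every argument must match at least once
    for combo in product(*hits):
        in_order = all(a < b for a, b in zip(combo, combo[1:]))
        if in_order and combo[-1] - combo[0] <= n:
            return True
    return False


comparison = "I was going to buy the Canon SD230, but I chose the Nikon instead."
print(ordist_match(comparison, ["buy", "canon sd230", "nikon"], 40))  # True
print(ordist_match(comparison, ["nikon", "canon sd230", "buy"], 40))  # False
```

Reversing the phrase order makes the match fail even though all three phrases are present, which is the behavior that distinguishes an ordered operator from DIST.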


7.6.6 Example: Using Regular Expressions to Match Patterns

Regular expressions can be used only in REGEX rules. Use them to specify patterns to match, such as a range of letters or numbers. For example, type a-z or 0-9 inside square brackets ([]), on its own or combined with a word, to write a REGEX (regular expression) rule.

Display 7-21 REGEX Rule

This REGEX rule uses the following syntax:

- \$: The dollar sign ($) is escaped with a backslash (\).

- [0-9]: Any digit from 0 to 9 can be matched.

- +: The plus sign indicates that one or more occurrences of a digit between 0 and 9 are returned as a match.

The rule match is circled below:


Figure 7-19 Rule Match

Tip: In this rule, the match is on a Neutral feature node. For this reason, the match is highlighted in black.
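Taken together, the syntax components listed for this rule form the pattern \$[0-9]+. Python's re module gives these three constructs the same meaning, so the sketch below can preview which strings the rule is expected to match. It is an illustration only: the function name and sample sentences are invented, and how SAS Sentiment Analysis Studio reports matches in a document may differ from Python's findall.

```python
import re

# \$ is a literal dollar sign, [0-9] is one digit, + means one or more digits.
PRICE = re.compile(r"\$[0-9]+")


def find_prices(text):
    """Return every whole-dollar amount found in text."""
    return PRICE.findall(text)


print(find_prices("It cost $250 last month and $199 today."))  # ['$250', '$199']
print(find_prices("On sale for $12.99"))  # ['$12'] (cents are not covered)
```

Because [0-9]+ stops at the decimal point, a price such as $12.99 matches only $12. If cents should be included, an optional group such as (?:\.[0-9]+)?, built from the grouping and ? constructs listed in Appendix B, could be appended.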

You can also use REGEX rules to match various word spellings. For example:

- Arghhhh: Type Arg[h]+.

- Liiike: Type Li+ke.

- *Super*: Type \*\w+\*.
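The three spelling patterns can be tried out in Python before you add them as REGEX rules. This sketch assumes that re.IGNORECASE is a reasonable stand-in for the case-insensitive matching described in Appendix B; the dictionary, function name, and sample sentence are hypothetical.

```python
import re

# Patterns from the spelling examples; labels are the sample spellings.
PATTERNS = {
    "Arghhhh": r"Arg[h]+",   # "Arg" followed by one or more h characters
    "Liiike": r"Li+ke",      # "L", one or more i characters, then "ke"
    "*Super*": r"\*\w+\*",   # one or more word characters between asterisks
}


def find_variants(text):
    """Return the matches of each spelling pattern, ignoring case."""
    return {label: re.findall(pattern, text, flags=re.IGNORECASE)
            for label, pattern in PATTERNS.items()}


print(find_variants("Arghhhh, I liiike this camera. It is *Super*!"))
```

Note that Li+ke also matches the ordinary spelling like, because one i satisfies the + quantifier.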

For more information, see Appendix B: Regular Expressions.


7.7 Troubleshoot Your Rules

If you do not obtain the results that you require, or if SAS Sentiment Analysis Studio returns syntax error messages, use the following steps, as necessary, to troubleshoot your rules:

1. Rule type: Be sure to specify the rule type that matches your syntax.

2. Text Result tab: Check the matches in this tab to ensure that the results that you expect for your rule are displayed.

3. Related Rules: Check the definitions for any referenced rules to ensure that the matches are returned as expected.

4. Spaces: Are there any spaces at the beginning or end of your rule? If so, the rule might not match.

5. _def: Make sure that you use this marker with any referenced concepts.

6. _c: This marker is required for C_CONCEPT and CONCEPT_RULEs.

7. Arguments: Specify unique arguments for each PREDICATE_RULE.

8. REGEX rules: These rules are used only for regular expressions.

9. Case-sensitivity: At this time, all of the sentiment analysis rules are case-insensitive.

10. Table 7-1 on page 156: Use this table, and its links, to ensure that you are using the appropriate modifiers.

11. Curly braces ({}): Surround any terms that you want to return with curly braces.

12. Colon (:): Did you specify a colon after XML field syntax?

13. sep part-of-speech: Did you remember to specify sep beginning with a lowercase s?
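Items 4, 11, and 13 of this checklist are mechanical enough to check with a script before you paste a rule into the rule editor. The following Python sketch is hypothetical: the function name lint_rule is invented, it treats a rule as plain text without knowing the SAS rule grammar, and it checks only for leading or trailing spaces, unbalanced curly braces, and an uppercase SEP or Sep token. An empty result does not mean that the rule is valid.

```python
import re


def lint_rule(rule_text):
    """Hypothetical pre-check for checklist items 4, 11, and 13.
    Returns a list of problem descriptions (empty if none were found)."""
    problems = []
    # Item 4: spaces at the beginning or end can prevent a match.
    if rule_text != rule_text.strip():
        problems.append("leading or trailing whitespace")
    # Item 11: every opening curly brace needs a matching closing brace.
    depth = 0
    for ch in rule_text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                break  # a closing brace appeared before its opening brace
    if depth != 0:
        problems.append("unbalanced curly braces")
    # Item 13: the sep tag must start with a lowercase s.
    if re.search(r"\b(?:SEP|Sep)\b", rule_text):
        problems.append("sep should be lowercase")
    return problems


print(lint_rule(" {camera"))  # ['leading or trailing whitespace', 'unbalanced curly braces']
```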


8 Creating a Hybrid Model

- Understanding a Hybrid Model
- Create a Hybrid Model
- Test Your Hybrid Model

8.1 Understanding a Hybrid Model

To create a hybrid model, you define both statistical and rule-based models. After you specify all of the settings for these models, including writing rules for the rule-based model, you can develop a hybrid model. The hybrid model applies the rules from the rule-based model and the numerical data from the activated statistical model to your input documents. These technologies work together to locate the expressed sentiment in an input document.

8.2 Create a Hybrid Model

To create a hybrid model, complete these steps:

1. Use Chapter 5: Creating a Statistical Model to define, build, and test your statistical model. When you are satisfied that the testing results meet your requirements, you can activate this model.

2. Use Chapter 6: Creating a Rule-Based Model to define, build, and test your rule-based model. Use Chapter 7: Writing Sentiment Analysis Rules to write the rules for this model. When you are satisfied with your testing results, save this model.


3. Select Build > Set Test Configuration. The Set Test Configuration dialog box appears.

4. (Optional) By default, Test Rule-based Model is selected in the Default test type field. Click to select Test Hybrid Model.

5. (Optional) Click either button to the right of the Probability threshold field to change the default value of 50%. This value is used to determine the overall sentiment of a document. When a document is processed, it is assigned a probability score. If the score equals this value, the sentiment is neutral. If the score exceeds this value, the sentiment is positive. If the score falls below this value, the sentiment is negative.

6. (Optional) Click to select a different statistical model in the Activated statistical model field. Use this operation if you created more than one statistical model.

7. (Optional) Click either button to the right of the Relative weight of positive rules in rule-based model field to change the default value of 100%.


Notes:

- 100% indicates that the positive rules are treated with the same importance as the negative rules. For more information, see Section 6.2 Understanding Sentiment Computation on page 130.

- At this time, a change in the relative weight setting does not affect sentiment analysis.

8. (Optional) Click either button to the right of the Weight of statistical model in hybrid model field to change the default weight of the statistical model, which is 70%.

9. Click OK to save these settings.
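The threshold rule in step 5 can be restated as a small function. This Python sketch assumes that probability scores are fractions between 0 and 1 (so the default of 50% is 0.50); the function name is hypothetical, and the sketch only maps an existing score to a label. It does not reproduce how SAS Sentiment Analysis Studio computes the score itself.

```python
def overall_sentiment(probability, threshold=0.50):
    """Map a document's probability score to an overall sentiment label,
    following the Probability threshold rule in step 5."""
    if probability > threshold:
        return "Positive"
    if probability < threshold:
        return "Negative"
    return "Neutral"  # the score exactly equals the threshold


print(overall_sentiment(0.73))                  # Positive
print(overall_sentiment(0.60, threshold=0.65))  # Negative
```

Raising the threshold therefore moves borderline documents from positive to negative without changing their scores.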

8.3 Test Your Hybrid Model

After you create your hybrid model, test it to make sure that it returns the results that you expect. To test your hybrid model, complete these steps:

1. Assemble your testing documents. For more information, see Section 5.1 Assembling Training Documents on page 101. Also see Section 6.8 Step 6: Specify the Test Configuration on page 147.

2. After you build and specify the test configuration for your hybrid model, click the Test tab.

3. Select one document, or select a folder of documents to test.


4. Click Test to see the testing results in the testing pane and in the Text Result tab.

Tip: The terms that are highlighted in black are matched by the statistical model.

If you do not receive the anticipated results, you can take one of the following steps:

- Use the Set Test Configuration dialog box to reconfigure your hybrid model.

- Make changes to your statistical model.

- Make changes to the rules in your rule-based model.

For more information about testing, see Chapter 9: Testing Your Models, Exporting Your Rules.


9 Testing Your Models, Exporting Your Rules

- Assembling Testing Documents
- Import the Testing Documents
- Testing a Model

9.1 Assembling Testing Documents

9.1.1 Overview of Assembling Testing Documents

Assemble at least 10 documents for each of the positive and negative categories of testing documents. Testing documents are used to test each of the models, so they should be representative of the types of documents that you plan to input. Place each group of texts in a taxonomy of folders with a structure similar to the example shown in Display 9-1 below. (You can also see this structure in the Nikon sample project that is supplied with the application.)


Display 9-1 Taxonomy of Testing Documents

After you create your testing directory structure and add all of the documents that you assembled into negative and positive folders, you can test your project. (In this example, the positive and negative folders are labeled pos and neg.)

9.2 Import the Testing Documents

Import your testing documents into the Test pane, where you can test them. Test your documents after you specify the definitions for your products, their features, and sentiment in the Rule tab. To import your testing documents, complete these steps:


1. Click the Test tab to access the Test Data pane.

2. Right-click in the white space below Manual Test and select New Test Directory.

3. Select the testing folder where you stored your testing documents in neg and pos (or similarly named) subfolders. For more information, see Display 9-1 on page 216.

4. Click OK to import these files.

5. You can expand the folders to view the subfolders and files that you imported.


9.3 Testing a Model

9.3.1 Overview of Testing a Model

After you define and build a model, test it. Test your model against documents that you are familiar with to ensure that your rules return the results that you expect.

Tip: In this section, the rule-based model is used as an example. For information and examples specific to the statistical and hybrid models, see Chapter 5 and Chapter 8.

9.3.2 Test One Folder

You can test one folder of documents to see the combined test results for that entire collection of testing documents. Because the results are reported for the folder as a whole, matches are not highlighted in individual documents. This operation also enables you to use the Graphical Result tab. To test one folder of testing documents, complete these steps:


1. Select a folder and click Test. The test results for the entire folder appear in the Text Result pane.

For information about these results, see Table 9-1 on page 222.


2. Click the Graphical Result tab to see the testing results for the selected folder in bar chart format.

The colors in the Graphical Result pane represent the same types of results that appear in the Text Result pane. For more information, see Table 9-1 on page 222.

Sentiment Distribution is divided into overall sentiment and feature distribution. The features shown are determined by your project; in the example above, they are Overview, Image, and Price.

9.3.3 Test One Document

You can test each document, individually, to see the matched results in the text. You can also see matching and sentiment analysis information for this document in the Text Result pane. To test one document, complete these steps:

1. You can expand each folder to see all of the individual documents.


2. Select a document and click Test to see the testing results in the pane under the File field.

Tip: The File field is used only with the Manual Test node. For more information, see Section 9.3.4 Manually Test a Document on page 223.


3. You can also see information about the document in the Text Result tab.

Table 9-1: Text Result Tab Information

Information Description

green Highlights positive sentiment expressions.

blue Highlights product and feature matches.

red Highlights negative sentiment expressions.

black Highlights keywords or phrases that match the statistical model or a neutral sentiment node.

overall sentiment Expressed by a statement such as Test in rule-based model is Negative. If the overall sentiment is Positive or Neutral, that word replaces Negative.

probability One of the scores that SAS Sentiment Analysis Studio uses to determine the sentiment of an input document.

confidence One of the scores that SAS Sentiment Analysis Studio uses to determine the sentiment of an input document.

matches Information about all of the matching terms that are highlighted in the tested document. This data also includes information about any matched intermediate concepts, if these concepts are enabled. (Intermediate entities are used to define phrases that can be referenced by other concepts. When enabled, intermediate entities can also be matched.)

product prominence This score indicates where in the document the product is mentioned. It can take one of these values:

- top 20%: the product is discussed in the first 20% of the text.

- bottom 80%: the product is discussed in the last 80% of the text.

- not mentioned: the product is not discussed in the text.

product dominance This score indicates how exclusively this product is mentioned in the document:

- Exclusive: No product other than the specified product is mentioned in this document.

- Dominant: Although other products are mentioned, this product is mentioned more often.

- Average: This product is mentioned as frequently as another product.

- In passing: This product is mentioned infrequently.

- Irrelevant: This product is not mentioned in the document.

After you test your directory, the individual files display the following signs:

- +: for a document that expresses positive sentiment

- -: for a document that expresses negative sentiment

- *: for a document that expresses neutral sentiment

9.3.4 Manually Test a Document

You can manually test a document that you import, or text that you enter directly into the test interface. Use this operation for a variety of purposes. For example, choose this operation when you want to test one specific rule against a specific string, paragraph, or document that is not included in the testing set.

1. Click the Test tab to access the testing pane. The Manual Test node appears in the Test pane.


2. Type text, or copy and paste it, into the blank pane below the File field.

Alternatively, click in the File field and use the SAS Sentiment Analysis Studio dialog box that appears to select a file to test. For more information, see Section 3.9.16 The Choose a File and Similar Dialog Boxes on page 86.

3. Click Test.

Matches for the rules are highlighted in color in the testing document. Each color represents a different type of match.


In the Text Result tab, you can also see matching and other sentiment analysis information such as Test in rule-based model result is Negative. This statement provides the analysis of the overall sentiment for the document. For more information about the colors that are used, see Table 9-1 on page 222.

Note: When you select a test document in one of the folders that you imported and click Test, the results for the new document replace the manual test results in the Text Result pane. For more information about importing test documents, see Section 9.2 Import the Testing Documents on page 216.

If you test a document that matches a PREDICATE_RULE, the Text Result tab might appear similar to Figure 7-18 on page 206. This example is shown below:


Display 9-2 String Match in a PREDICATE_RULE


Appendixes

- Appendix A: The Program Files on page 229
- Appendix B: Regular Expressions on page 235
- Appendix C: Part-of-Speech Tags on page 239
- Appendix D: Recommended Reading on page 245
- Appendix E: Glossary on page 247


Appendix A: The Program Files

- Overview of the Program Files
- What Are the Files in the Project Folder?
- What Are the Tags in the Project Settings XML File Format?
- What Are the Tags in the Rules File?

A.1 Overview of the Program Files

By default, the SAS Sentiment Analysis Studio application is installed in the following folder:

C:\Program Files\Teragram\SAM

Display A-1: Program Directory

After you create a model, the folders and files for this project appear in a new folder with the name of the new project.


Display A-2: Project Directory

A.2 What Are the Files in the Project Folder?

The project folder contains the configuration and the binary files that define the project. If you want to use a previously created project with a new installation of SAS Sentiment Analysis Studio, install the new version in the same folder as the previous version.

Tip: You can create projects in any folder. However, if you want to use the project settings from a project created with an earlier installation, install SAS Sentiment Analysis Studio into the same folder as that installation.

See the following description of the files that you can use when you create your project.

Table A-1: Program Files in the Projects Folder

Filename Description

projectname.xml The project configuration file where the project settings are specified. For example, the name of the project, languages, and so on.


Workshop This directory contains the rules file, model files, intermediate files, and so on.

sam_rule.xml This file stores the settings, taxonomy, and sentiment rules for the rule-based model. This file can be imported into another project.

tmp_li.config This file is the intermediate configuration file that is generated from sam_rule.xml when you compile the rule-based model.

rule_object.sam This file is the rule-based model binary file that is generated by building tmp_li.config when you compile the rule-based model.

hybrid_object.sam This file is the hybrid model binary file that is generated when you build the hybrid model.

Profile_modelname This directory contains all of the files that are generated when you build the statistical model named modelname. For example, Profile_bayes_model.

model_name_profile.config The configuration file of the statistical model, where the model settings are specified, for example, the type of statistical model and the training documents.

modelname_stat_object.sam This is the statistical model binary file.

model_name_ss_feature.txt This file lists the keywords that are generated by a statistical model.

model_name_validation.corpus The validation corpus that is used by the validation procedure during statistical model training.

model_name_profile.config.log The log file lists the training results of the statistical model.

model_name_pos_training.lst The training document list for positive sentiment.

model_name_pos_testing.lst The testing document list for positive sentiment.

model_name_neg_training.lst The training document list for negative sentiment.

model_name_neg_testing.lst The testing document list for negative sentiment.

Tmp The temporary directory contains impermanent files generated when a model is trained.


A.3 What Are the Tags in the Project Settings XML File Format?

The <projectname>.xml file is an XML file that describes the various settings for the SAS Sentiment Analysis project. See the list of tags and their descriptions in Table A-2 below:

Table A-2: Project Settings XML Tags

Tag Description and Values

SAM_PROJECTV1 Every project settings XML file begins with <SAM_PROJECTV1> and ends with </SAM_PROJECTV1>. All of the other tags lie between the starting and ending SAM_PROJECTV1 tags.

Note: The number after SAM_PROJECTV is the version number of the file format.

StatisticalModelSettings Model settings for the statistical model.

RuleBasedModelSettings Model settings for the rule-based model.

Tokenizer Path to the tokenizer.

SentenceTokenizer Path to the sentence tokenizer.

Tagger Path to the part-of-speech tagger.

NounPhrase Path to the concepts’ file.

StopWords Path to the noise words’ file.

CaseMapping Path to the case mapping file.

Morph Paths to morphological expansion files.

XmlOption List of default XML fields and fields to be ignored in the processing of XML files.

TestSet Test configuration file. For example, positive threshold, relative weight of positive rules, and so on.

Statistical Profiles of created or imported statistical models.

Test Test directories.

Corpora List containing each training corpus.
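If you want to inspect a project settings file programmatically, for example, to confirm its file-format version before you reuse a project, a short script is enough. The Python sketch below uses the standard xml.etree.ElementTree module. The function names are hypothetical, and the sample document is invented and heavily trimmed: its placeholder values, and the assumption that setting tags such as Tokenizer sit directly under the root element, are for illustration only, so check a real projectname.xml file before relying on its layout.

```python
import re
import xml.etree.ElementTree as ET


def project_format_version(xml_text):
    """Return the file-format version encoded in the root tag,
    for example 1 for <SAM_PROJECTV1>."""
    root = ET.fromstring(xml_text)
    match = re.fullmatch(r"SAM_PROJECTV(\d+)", root.tag)
    if match is None:
        raise ValueError(f"not a project settings file: root tag is {root.tag}")
    return int(match.group(1))


def top_level_tags(xml_text):
    """List the names of the tags directly inside the root element."""
    return [child.tag for child in ET.fromstring(xml_text)]


# Hypothetical sample with placeholder paths; a real file has more settings.
sample = """<SAM_PROJECTV1>
  <Tokenizer>path/to/tokenizer</Tokenizer>
  <Tagger>path/to/tagger</Tagger>
</SAM_PROJECTV1>"""

print(project_format_version(sample))  # 1
print(top_level_tags(sample))          # ['Tokenizer', 'Tagger']
```

To read a file on disk instead of a string, ET.parse(path).getroot() returns the same root element.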


A.4 What Are the Tags in the Rules File?

The sam_rule.xml file in the WorkShop directory is an XML file that contains the settings, product taxonomy, and sentiment rules of the rule-based model. The tags listed in Table A-3 below are supported in the sam_rule.xml file:

Table A-3: Project Settings XML Tags

Tag Description and Values

SAM_RULE The rules XML file begins with <SAM_RULE> and ends with </SAM_RULE>. All of the other tags lie between the starting and ending SAM_RULE tags.

Tools Settings for building the rule-based model.

Tokenizer Path to the tokenizer.

TempFilePrefix Path to the directory for temporary files.

SentTokenizer Path to the sentence tokenizer file.

POStagger Path to the part-of-speech tagger.

CaseMapping Path to the case mapping file.

Tlp Path to the morphological expansion file.

TagsEx Path to tags_to_expand file for morphological expansion.

DefaultXMLFields Default XML fields that are specified for use when processing XML documents.

XMLTagsToIgnore Default XML fields that are ignored when processing XML documents.

Product Product taxonomy that contains the product definitions and their features.

Feature Product feature containing its definitions, positive rules (positive sentiment about this feature) and negative rules (negative sentiment about this feature).

IntermediateEntity Intermediate concept file.

Polarity Tonal keywords containing positive and negative sentiment.


Appendix B: Regular Expressions

- What Rules and Restrictions Apply to Regular Expressions?
- What Special Characters Are Used with Regular Expressions?
- What Are the Special Cases

B.1 What Rules and Restrictions Apply to Regular Expressions?

The following rules and restrictions apply to regular expressions:

- Any single character a (ASCII 1 through 255, subject to escaping restrictions in 14 below) is a regular expression. The specified character matches, regardless of case.

- A character class is a regular expression. One or more characters inside square brackets ([]) match any one of the characters specified inside the brackets. For example, [abc] matches a, b, or c. A range inside a character class, such as a-z, matches any ASCII character whose value is between a and z, inclusive. Any character, including special characters, can appear in a character class. However, \ (backslash), - (hyphen), and [ and ] (open and closed brackets) must be preceded by a backslash. If you want to return a literal match on these characters, see Section B.3 What Are the Special Cases on page 238.

- A negated character class is a regular expression. It consists of one or more characters inside square brackets, with ^ (caret) as the first character to indicate negation. For example, [^abc] matches any character except a, b, or c. (To return a literal match on a caret, precede the caret with a backslash.)

Note: Matching is case-insensitive.
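The following is a minimal sketch that approximates the rules above with Python's standard re module. It assumes that the re.IGNORECASE flag models the case-insensitive matching described in the note, and that re.fullmatch models the implicitly assumed ^ and $ anchors described in Section B.3. It is not the SAS Sentiment Analysis Studio matching engine, and the helper name sas_like_match is hypothetical.

```python
import re


def sas_like_match(pattern: str, text: str) -> bool:
    """Hypothetical helper approximating the appendix's matching rules.

    re.IGNORECASE models "matching is case-insensitive";
    re.fullmatch models the implicit ^ and $ anchors.
    """
    return re.fullmatch(pattern, text, flags=re.IGNORECASE) is not None


# A single character matches regardless of case.
print(sas_like_match("a", "A"))        # True
# A character class matches exactly one of the listed characters.
print(sas_like_match("[abc]", "b"))    # True
print(sas_like_match("[abc]", "abc"))  # False: a class consumes one character
# A range matches any single character from a through z.
print(sas_like_match("[a-z]", "Q"))    # True (case-insensitive)
# A negated class matches any single character except those listed.
print(sas_like_match("[^abc]", "d"))   # True
print(sas_like_match("[^abc]", "A"))   # False: A is a, ignoring case
# A caret preceded by a backslash is matched literally.
print(sas_like_match(r"[\^]", "^"))    # True
```

Because a character class consumes exactly one character, [abc] does not match the three-character text abc.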


Also see the table below for more information about the rules and restrictions for regular expressions.

Table B-1: More Rules and Restrictions

If Statement Explanation

If a and b are regular expressions

then so is ab that matches whatever a matches followed by whatever b matches (concatenation)

then so is a|b that matches either whatever a matches or whatever b matches

If a is a regular expression

then so is (?:a) that simply serves as a grouping mechanism without remembering what it grouped. For example, (?:ab)+ matches ab, abab, ababab, and so on. Without the grouping, ab+ would instead match a followed by one or more b characters.

then so is a* that matches 0 or more occurrences of whatever a matches

then so is a+ that matches 1 or more occurrences of whatever a matches

then so is a? that matches 0 or 1 occurrences of whatever a matches

then so is a{n,m} that matches at least n but no more than m concatenated occurrences of whatever a matches

then so is a{n,} that matches at least n concatenated occurrences of whatever a matches

then so is a{n} that matches exactly n concatenated occurrences of whatever a matches
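The constructs in Table B-1 can be tried out with a short Python sketch. Python's re module is used here only as an approximation, assuming case-insensitive matching against the whole text; the helper matches and the sample strings are illustrative, not part of SAS Sentiment Analysis Studio.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: case-insensitive match against the whole text."""
    return re.fullmatch(pattern, text, flags=re.IGNORECASE) is not None


# (pattern, text) pairs, one or more per row of Table B-1.
EXAMPLES = [
    ("ab", "AB"),             # ab: a followed by b (concatenation)
    ("good|great", "great"),  # a|b: either alternative
    ("(?:ab)+", "ababab"),    # (?:a): group without remembering, then repeat
    ("ab+", "abab"),          # without grouping, + applies only to b: no match
    ("no*", "n"),             # a*: zero or more o
    ("yes+", "yesss"),        # a+: one or more s
    ("colou?r", "color"),     # a?: zero or one u
    ("x{2,3}", "xxx"),        # a{n,m}: two to three x
    ("x{2,}", "xxxxx"),       # a{n,}: at least two x
    ("x{2}", "xxx"),          # a{n}: exactly two x, so three is no match
]

for pattern, text in EXAMPLES:
    print(f"{pattern!r:14} {text!r:10} {matches(pattern, text)}")
```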


B.2 What Special Characters Are Used with Regular Expressions?

The table below lists the special characters that are used with regular expressions and explains the meaning of each.

Table B-2: Special Characters in Regular Expressions

Character Meaning

\a Alarm (beep)

\n Newline

\r Carriage return

\t Tab

\f Form feed

\e Escape

\d Digit (same as [0-9])

\D Not a digit (same as [^0-9])

\w Word character (same as [a-zA-Z_0-9])

\W Non-word character (same as [^a-zA-Z_0-9])

\s Whitespace character (same as [ \t\n\r\f])

\S Non-whitespace character (same as [^ \t\n\r\f])

. Wildcard (matches any character)

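\xh Character whose code is the hexadecimal value h, where h is a hexadecimal digit

\xhh Character whose code is the hexadecimal value hh, where each h is a hexadecimal digit

\0o Character whose code is the octal value o, where o is an octal digit

\0oo Character whose code is the octal value oo, where each o is an octal digit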
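A sketch of several Table B-2 escapes in Python's re module follows. Python's syntax differs from the table in a few places, so treat this as an approximation: the re.ASCII flag is assumed in order to restrict \d, \w, and \s to the ASCII sets shown (Python's ASCII \s also includes the vertical tab), Python's . does not match a newline unless re.DOTALL is set, Python's \x requires exactly two hexadecimal digits (so the one-digit \xh form has no direct equivalent), and Python rejects \e. The helper name matches is hypothetical.

```python
import re

# Approximation of Table B-2: case-insensitive, ASCII-only classes, whole-text match.
FLAGS = re.IGNORECASE | re.ASCII


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern matches all of text."""
    return re.fullmatch(pattern, text, flags=FLAGS) is not None


print(matches(r"\d\d\d", "727"))     # \d: three digits -> True
print(matches(r"\D", "7"))           # \D: 7 is a digit -> False
print(matches(r"\w+", "sas_2010"))   # \w: letters, digits, underscore -> True
print(matches(r"\W", "_"))           # \W: underscore is a word character -> False
print(matches(r"a\sb", "a\tb"))      # \s: a tab counts as whitespace -> True
print(matches(r"\S+", "no spaces"))  # \S: the text contains a space -> False
print(matches(r"a.c", "a-c"))        # . : wildcard -> True
print(matches(r"\x41", "a"))         # \xhh: 0x41 is "A", matched ignoring case -> True
print(matches(r"\041", "!"))         # \0oo: octal 41 is "!" -> True
```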


B.3 What Are the Special Cases

There are several special cases for regular expressions. These cases include the following metacharacters: [, ], (, ), ?, *, +, ., -, \, |

For these metacharacters to have a literal meaning, they must be escaped with a backslash (\). Inside a character class, however, only the metacharacters that are explicitly mentioned for character classes in Section B.1 need escaping.

No support is provided for the following:

- backward references
- () as a remembering grouping mechanism
- ^ as the beginning-of-line zero-width assertion
- $ as the end-of-line zero-width assertion

Note: Unlike Perl regular expressions, the ^ and $ markers are implicitly assumed.
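The sketch below illustrates both special cases with Python's re module. It assumes case-insensitive matching and uses re.fullmatch to model the implicitly assumed ^ and $ anchors, so a pattern must account for the entire text it is tested against; this is an approximation, not the SAS engine. The helper name and sample strings are hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: case-insensitive, whole-text match (implicit ^ and $)."""
    return re.fullmatch(pattern, text, flags=re.IGNORECASE) is not None


# Escaping: an unescaped metacharacter keeps its special meaning.
print(matches("3.5", "3x5"))         # True: . is a wildcard
print(matches(r"3\.5", "3x5"))       # False: \. matches only a period
print(matches(r"3\.5", "3.5"))       # True
print(matches(r"\(c\)", "(C)"))      # True: escaped parentheses are literal
# Inside a character class, an escaped hyphen is literal rather than a range.
print(matches(r"[a\-c]", "-"))       # True
print(matches(r"[a\-c]", "b"))       # False: b is not in the set {a, -, c}

# Implicit anchors: the pattern must cover the whole text.
print(matches("good", "GOOD"))         # True
print(matches("good", "very good"))    # False: nothing matches "very "
print(matches(".*good", "very good"))  # True
```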


Appendix: C Part-of-Speech Tags

The table below provides examples of the majority of morphological feature combinations for English parts of speech. For more information about how these parts of speech are used to write rules, see Section 3.5.14 The Part-of-Speech Tags on page 25. Also see the language book for each language that you purchased.

Table C-1: Part-of-Speech Morphological Features

Code Part-of-Speech Example

A adjective The sky is azure.

ABBREV abbreviation etc.

Acomp comparative adjective The green bag is heavier than the red one.

Adv adverb He is easily the best candidate.

Asup superlative adjective He cooked the best dish.

C conjunction Say nothing of former informers and spies.


date valid date formats

YYYY-MM-DD

YYYYMMDD

YY-MM-DD

YYMMDD

YYYY-MM

YYYYMM

YY-MM

Standard US Date Formats

MM-DD-YYYY

MM/DD/YYYY

MM-DD-YY

MM/DD/YY

04JAN2001

04jan2001

Det determiner Nothing can be further from the truth.

digit numeric symbols, including floating point decimals

5, 2.14, or 5,254

F French word We went to see the chateaux.

inc word unknown to the part-of-speech tagger

Int interjection Yum!

Md modal verb This might be the best idea.

Mdn't modal verb negated I won't elaborate on this any further.

N noun The e-mail went to the spam folder.

Npl plural noun The geese are leaving for the South.


Num number She just turned seventeen years old.

PN proper noun We are going to England for vacation.

PossDet possessive determiner It is her choice.

PossPro possessive pronoun The choice is hers alone.

PreDet predeterminer All the king's soldiers could not put him together again.

Prefix prefix The multi-millionaire Soros is going to help us out.

Prep preposition Let's go to grandma's house.

Pro pronoun Give me one of each.

ProMD pronoun contracted with a modal If it weren't for him, we'd still be here.

ProV pronoun contracted with a verb we’re

Ptl particle I would go across if I could.

RelPro relative pronoun I want the coin that represents King Kong.

sep separator character matches all punctuation such as , . ; : ?


time time formats

23:59:59

235959

23:59:59.9942

235959.9942

23:59:59Z

23:59:59.9942Z

235959.9942Z

23:59:59+HH:MM

23:59:59-HH:MM

235959+HHMM


12:56:32


Standard US and British Time Formats

10:15AM

10:15A.M.

10:15am

10.15a.m.

10AM

10A.M.

10am

10a.m.

10:15PM

10:15P.M.

10:15pm

10.15p.m.

10PM

10P.M.

10pm

10p.m.

9:00PM

url URL www.sas.com/success/

V verb You should verbalize your wishes.

V3sg verb, 3rd person singular The boy amuses himself throwing rocks.

V3sgn't verb, 3rd person singular negated This isn't funny.

Ving present participle Why is the hen crossing the street?

Vn't negated verb "it don't mean a thing..."


Vpp past participle Those tapes were released.

Vpt verb, past tense The president hated broccoli.

Vptn't verb, past tense negated If it weren't for him, we'd still be here.

WAdv wh-adverb Why do you say that?

WDet wh-determiner What is he saying?

WPossPro wh-possessive pronoun Whose hat is this?

WPro wh-pronoun Whom did you meet?
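To make the date, time, digit, and url rows of Table C-1 more concrete, the following Python sketch classifies a single token by its shape. The patterns are assumptions written for illustration from the examples in the table; they are not the rules that the SAS part-of-speech tagger uses, and the names PATTERNS and classify are hypothetical. All-digit forms such as YYMMDD and 235959 are deliberately left out, because shape alone cannot distinguish them from each other or from plain numbers.

```python
import re

MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"

# Hypothetical shape patterns, tried in order; the first whole-token match wins.
PATTERNS = [
    ("date",
     r"\d{4}-\d{2}(?:-\d{2})?"              # YYYY-MM-DD, YYYY-MM
     r"|\d{2}-\d{2}(?:-\d{2}|-\d{4})?"      # YY-MM-DD, YY-MM, MM-DD-YY, MM-DD-YYYY
     r"|\d{2}/\d{2}/(?:\d{4}|\d{2})"        # MM/DD/YYYY, MM/DD/YY
     r"|\d{2}(?:" + MONTHS + r")\d{4}"),    # 04JAN2001, 04jan2001
    ("time",
     r"\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"  # 23:59:59.9942Z, +HH:MM
     r"|\d{1,2}(?:[:.]\d{2})?(?:AM|PM|A\.M\.|P\.M\.)"),    # 10:15AM, 10.15a.m., 10pm
    ("digit", r"\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?"),         # 5,254 / 2.14 / 5
    ("url", r"(?:https?://|www\.)\S+"),                     # www.sas.com/success/
]


def classify(token: str) -> str:
    """Return the first tag whose pattern matches the whole token, or "other"."""
    for tag, pattern in PATTERNS:
        if re.fullmatch(pattern, token, flags=re.IGNORECASE):
            return tag
    return "other"


for token in ["2001-01-04", "04jan2001", "23:59:59.9942Z",
              "10.15a.m.", "5,254", "www.sas.com/success/", "azure"]:
    print(token, classify(token))
```

Order matters in a sketch like this: the list is scanned from top to bottom, and the first pattern that matches the whole token determines the tag.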


Appendix: D Recommended Reading

The following books are recommended as companion guides:

- SAS Sentiment Analysis Studio: Installation Guide: Install SAS Sentiment Analysis Studio.

- SAS Sentiment Analysis Server: Administrator’s Guide: Install and set up SAS Sentiment Analysis Server to automate sentiment extraction from input texts.

- SAS Sentiment Analysis Workbench: Administrator’s Guide: Set up projects that enable analysts to review the sentiment extraction in input documents. You can also assign users to these projects.

- SAS Sentiment Analysis Workbench: User’s Guide: Use SAS Sentiment Analysis Workbench to analyze the expressed sentiment in input documents according to your permission level.

- SAS Sentiment Analysis Workbench: Installation Guide: Install SAS Sentiment Analysis Workbench.

- SAS Contextual Extraction Studio: User’s Guide: Create complex concept definitions using multiple types of rules. This product is an add-on to SAS Content Categorization Studio. You can use these concepts in SAS Sentiment Analysis Studio.

- SAS Content Categorization Studio: User’s Guide: Create a SAS Content Categorization Studio project that specifies concepts that are matched in input texts.

- SAS Content Categorization Studio: Installation Guide: Install SAS Content Categorization Studio.

- Use the language books for each language purchased to see the comprehensive list of part-of-speech tags that are available for concepts.

- SAS offers instructor-led training and self-paced e-learning courses to help you get started with the SAS add-in, learn how the SAS add-in works with the other products in the SAS Enterprise Intelligence Platform, and learn how to run stored processes in the SAS add-in.


For more information about the courses available, see support.sas.com/training.

For a complete list of SAS publications, see the current SAS Publishing Catalog. To order the most current publications or to receive a free copy of the catalog, contact a SAS representative at:

SAS Publishing Sales
SAS Campus Drive
Cary, NC 27513
Telephone: (800) 727-3228*
Fax: (919) 677-8166
E-mail: [email protected]
Web address: support.sas.com/pubs

* For other SAS Institute business, call (919) 677-8000.

Customers outside the United States should contact their local SAS office.


Appendix: E Glossary

_c

specifies the context for matches.

_cap

refers to a word beginning with an uppercase letter.

CLASSIFIER

specifies a string to match in an input document.

CONCEPT

specifies a string, token, or argument to locate in an input document.

C_CONCEPT

specifies the context for the matches.

corpus

specifies a set of training documents. For multiple sets, see corpora.

corpora

specifies multiple sets of training documents. See corpus for one set.

definition

defines a concept and is used interchangeably with rule. See rule.

domain

refers to the taxonomy of nodes for one product, including its feature and sentiment nodes.

event

is used interchangeably with fact. See fact.

fact

links two or more concepts to reveal otherwise overlooked relationships in input documents. See event.

learned features

keywords generated by a statistical model.


precision

measures how accurate the model's classifications are for a category. It is the percentage of documents classified into that category (for example, positive) that actually belong to it.

priority

ranks concepts. It is set in either the Data dialog box or written into a definition.

PREDICATE_RULE

returns matches when an operator is specified with arguments.

recall

measures how completely the model finds the documents in a category. It is the percentage of documents that actually belong to that category (for example, positive) that the model classifies into it.

REGEX

uses regular expression syntax to define its rule.

rule

defines the concept. There can be many rules for each concept definition. This term is used interchangeably with definition. See definition.

sentiment

expresses feeling, such as like or dislike. Sentiment is expressed as a value between 0 and 1, where 0 is negative, 0.5 is neutral, and 1 is positive. If keywords are used instead, these words are written into concept rules that determine the degree of sentiment expressed in the document.

string

refers to a group of words or characters that you specify for a rule.

Syntax error

refers to a mistake in the rule specification. Misspelled references to concepts are not currently recognized as syntax errors.

Taxonomy

consists of one or more product nodes, each including its feature and sentiment nodes.


token

is used as a synonym for word. A token refers to only one word. It is not a synonym for string, which can refer to several words or characters.

validation

refers to an automatic process that adjusts the parameters of the training results.

weight

is a number that adjusts the value to prioritize one match, or setting, over another.
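Precision and recall can be computed from a list of predicted labels and a list of actual labels. The following Python sketch uses the standard definitions for a single category: precision divides the correct assignments by all documents assigned the label, and recall divides them by all documents that truly have the label. The function name, the sample labels, and the choice to return 0.0 when a ratio is undefined are assumptions for illustration, not part of SAS Sentiment Analysis Studio.

```python
def precision_recall(predicted: list[str], actual: list[str], label: str) -> tuple[float, float]:
    """Hypothetical helper: standard precision and recall for one label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
    predicted_pos = sum(1 for p in predicted if p == label)  # documents assigned the label
    actual_pos = sum(1 for a in actual if a == label)        # documents that truly have it
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Illustrative labels for five documents (not real results).
actual = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "positive", "negative", "negative"]
print(precision_recall(predicted, actual, "positive"))  # (0.5, 0.3333333333333333)
```

Here one of the two documents predicted positive is actually positive (precision 0.5), and one of the three actual positives is found (recall 1/3).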


Index

$ sign, 207
% symbol, 185
+ sign, 184, 207
.li file, 37
.sam file
  identifying expressed sentiment, 11
  importing a precompiled model, 66
  importing a statistical model, 121
_c
  CONCEPT_RULE example, 173
  specify matches, 162
_cap marker, 164
_def marker, 181
  C_CONCEPT example, 198
  match two concepts, 201
  PREDICATE_RULEs, 200
  string match, 205
  troubleshoot, 209
_ref operator
  partial match example, 188
  specify coreference, 187
_w marker
  match any word, 164

A

About SAS Sentiment Analysis Studio in Help menu, 20
Activate Model operation, 41
ALIGNED operator
  CONCEPT_RULE example, 171
  match on overlap, 170
AND operator
  CONCEPT_RULE example, 173
  match both terms, 170
application installation location, 229
architecture, 10


B

Bayes Method, 36
Best Mode
  maximize precision, 35
  training algorithms, 110
black highlighted matches, 222
blue highlighted matches, 222
Body field, 51
Boolean operators, 170
Build Hybrid Model operation, 20
Build Rule-based Model operation, 19
Build Statistical Model operation, 19
Build the models, 20

C

C_CONCEPT rule
  match in context, 51
  overview, 155
  specify spaces, 169
case-insensitive
  matching, 152
  troubleshoot, 209
  with REGEX rules, 184
CaseMapping tag, 232, 233
CLASSIFIER rule
  example, 192
  match a string, 51
  overview, 155
Close operation, 17
colon characters, 169
commas
  follow definition elements, 168
CONCEPT rule
  example, 164, 183, 194
  match related terms, 51
  overview, 155
  specify spaces, 169


CONCEPT_RULE rule
  example, 171, 198
  match related information, 51
  OR example, 175
  overview, 155
  specify spaces, 169
confidence as sentiment determinant, 222
context marker (_c), 162
Contextual extraction field, 37
coreference operators, 187
corpora
  multiple collections of training files, 10
  train a classifier, 89
  training documents, 69, 98
Corpora tab, 29
  training documents, 30
Corpora tag, 232
Corpus pane, 31
corpus used to train statistical model, 35
curly braces
  qualify a match, 168

D

DefaultXMLFields tag, 233
definition
  comprised of rules, 143, 152
  develop, 139
  for a keyword, 146
  total weight, 132
  total weight for definition, 137
Delete Model operation, 41
DIST operator, 170
dollar sign ($), 207

E

Exit operation, 18
Export Rules operation, 18


F

Feature tag, 233
File Content pane, 31
Font operation, 18

G

Graphical Result tab
  not displayed for precompiled models, 68
  see results in graph format, 39
  see statistical model testing results, 35
  see the training results for a statistical model, 114
green highlighted matches, 222

H

hybrid_object.sam file, 231

I

icons in the toolbar, 20
Import Learned Features dialog box, 69
Import Learned Features operation, 19
  import keywords, 69
Import Rules operation, 18
IntermediateEntity tag, 233

M

main window, 16
matches described, 222
Menu bar, 16
model_name_neg_testing.lst file, 231
model_name_neg_training.lst file, 231
model_name_pos_testing.lst file, 231
model_name_pos_training.lst file, 231
model_name_profile.config file, 231
model_name_profile.config.log file, 231


model_name_ss_feature.txt file, 231
model_name_validation.corpus file, 231
modelname_stat_object.sam file, 231
Morph tag, 232

N

Negative training documents, 31
Neutral training documents, 31
New button, 21
New operation, 17
NounPhrase tag, 232

O

Okapi BM25 algorithm, 37
Open operation, 17, 21
OR operator
  locate one match, 170
  match if one, or more, are present, 175
ORD operator
  rule example, 177
  specify the matched order, 170
ORDDIST operator
  match based on order and distance, 170
  specify order and distance, 180
Output pane in the Rule tab, 19
overall sentiment for an input document, 222

P

parentheses
  group elements, 168
part-of-speech tags
  codes, 239
percent symbol (%), 185
Pivoted Length Normalization algorithm, 37
plus sign (+) in a REGEX rule, 207
Polarity tag, 233
Positive training documents, 31


POStagger tag .... 233
PREDICATE_RULE rule
  overview .... 156
  specify spaces .... 169
  use arguments to define relationships .... 51
Preferences operation .... 18, 21
probability as sentiment determinant .... 222
Probability threshold .... 36
Product tag .... 233
Profile_modelname file .... 231
Program and Project title bar .... 16
Project Settings .... 22
project.xml file .... 230
pseudo code for sentiment computation .... 130

Q
quotation marks .... 167

R
Recent Projects operation .... 18
red highlighted matches .... 222
References Files pane .... 31
REGEX rules
  $ sign .... 207
  % symbol .... 185
  + sign .... 207
  example .... 207
  overview .... 156
  use regular expressions .... 51
regular expressions
  REGEX rule example .... 207
  troubleshoot .... 209
  write REGEX rules .... 184
Relative Frequency algorithm .... 37
rule
  benefits .... 151
  part of definition .... 152
  types .... 155
Rule tab .... 30
  see the Search Result tab .... 63
  using the search field .... 49
  write rules to locate sentiment .... 42
rule_object.sam .... 231
RuleBasedModelSettings tag .... 232
Runtime stop words .... 38

S
SAM_PROJECTV1 tag .... 232
SAM_RULE tag .... 233
sam_rule.xml file .... 231
SAS Sentiment Analysis Studio User’s Guide in Help menu .... 20
Save operation .... 18, 21
search field .... 49
Search Result tab .... 53, 63
Search rules operation .... 18, 21
SENT operator .... 153
  match only within a sentence .... 181
  match only within sentence .... 170
SentenceTokenizer tag .... 232
SentTokenizer tag .... 233
Set Percentage For Training operation .... 35
Set test configuration button .... 21
Smoothed Relative Frequency algorithm .... 37
space characters .... 169
specify rule weights .... 52
square braces .... 168
standard toolbar .... 16, 20
statistical model
  configure model settings .... 35
  import testing documents .... 216
  settings .... 27
  testing documents .... 215
Statistical tab .... 29
  create a mathematical model .... 31
  specify statistical model settings .... 35
Statistical tag .... 232
StatisticalModelSettings tag .... 232
Status Bar operation .... 19
StopWords tag .... 232
stopwords.txt .... 38
Summary window .... 28

T
Tagger tag .... 232
Tags to expand file field .... 25
TagsEx tag .... 233
TempFilePrefix tag .... 233
Test Configuration .... 20
Test pane
  delete a directory .... 60
Test tab .... 30
Test tag .... 232
TestSet tag .... 232
Text normalization model field .... 37
Text Result tab
  accurate rule matches .... 154
  how to use .... 38
  see best model .... 38
  see matched concepts .... 48
  see sentiment information .... 130
  see training results .... 113
  troubleshoot .... 209
Tlp tag .... 233
Tmp file .... 231
tmp_li.config file .... 231
Tokenizer tag .... 232, 233
Toolbar operation .... 19
Tools tag .... 233
Train Model operation .... 41
training corpus .... 35
  collection of training files .... 10
troubleshoot your rules .... 209

U
Unclassified training documents .... 31
understanding overall sentiment .... 130
Use noun phrase extraction field .... 27
Use Part-of-speech Tagger .... 182
Use predefined stop words filter field .... 27

V
Validate Model operation .... 41
View menu .... 20

W
Weight field .... 52
Workshop file .... 231

X
XmlOption tag .... 232
XMLTagsToIgnore tag .... 233