User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Guido Cervone, Ken Kaufman, Ryszard Michalski
Machine Learning and Inference Laboratory, School of Computational Sciences
George Mason University, Fairfax, VA, USA
{cervone, kaufman, michalski}@gmu.edu
http://www.mli.gmu.edu


Page 1: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Guido Cervone, Ken Kaufman, Ryszard Michalski

Machine Learning and Inference Laboratory, School of Computational Sciences

George Mason University, Fairfax, VA, USA

{cervone, kaufman, michalski}@gmu.edu

http://www.mli.gmu.edu

Page 2: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Research Objectives

The main objectives of this research are:

(1) To develop a new methodology for user modeling, called LUS (Learning User Style)

(2) To test and evaluate LUS on datasets consisting of real user activity

(3) To implement an experimental computer intrusion detection system based on the LUS methodology

Page 3: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Main Features of LUS

(1) User models are created automatically through a process of symbolic inductive learning from training data sets characterizing users’ interaction with computers

(2) Models are in the form of symbolic descriptions based on attributional calculus, a representation system that combines elements of propositional logic, first-order predicate logic, and multiple-valued logic

(3) Generated user models are easy to interpret by human experts, and can thus be modified or adjusted manually

(4) Generated user models are evaluated automatically on testing data sets using an episode classifier

Page 4: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Terminology

An event is a description of an entity (e.g., a user activity) at a given time or during a given time period; here, it is a vector of attribute values characterizing a user's use of the computer at a specific time.

A session is a sequence of events characterizing a user's interaction with the computer from logon to logoff.

An episode is a sequence of user states extracted from one or more sessions and used for training or for testing/execution of user models; it may contain consecutive states or selected states.

In the training phase (during which user models are learned) it is generally desirable to use long episodes, as this helps to generate more accurate and complete user models. In the testing (or execution) phase it is desirable to be able to use short episodes, so that a legitimate or illegitimate user can be identified from as little information as possible.
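As a rough illustration of this terminology, the sketch below (Python; the class and attribute names are hypothetical and not part of LUS) shows one way events, sessions, and episodes could be represented:

from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    # One observation of the user's activity at a given time,
    # described by a vector of attribute values (here just 'mode').
    timestamp: float
    mode: str                 # e.g. "compiler", "print", "web"

@dataclass
class Session:
    # All events of one user from logon to logoff.
    user_id: int
    events: List[Event]

# An episode: a sequence of user states (here, modes) taken from one or
# more sessions and used for training or for testing user models.
def episode_from_session(session: Session) -> List[str]:
    # Simplest case: keep every state of the session, in order.
    return [e.mode for e in session.events]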

Page 5: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Approach

The system polls active processes every half-second and logs information about the processes and the users responsible for them

Data extracted from the logs takes the form of vectors of values of nominal, temporal and structured attributes

Initial experiments concentrated on one attribute, mode, a derived attribute based on the class of process that was running (e.g., compiler)

Data from successive records are combined into n-grams, e.g., <compiler, print, web, print>

Sets of n-grams comprising an episode are passed to the AQ20 learner
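As a rough illustration of the n-gram construction above, a minimal sketch (Python; the function name and example mode values are only illustrative):

def make_ngrams(modes, n=4):
    # Combine successive mode values into overlapping n-grams,
    # e.g. ["compiler", "print", "web", "print"] -> one 4-gram.
    return [tuple(modes[i:i + n]) for i in range(len(modes) - n + 1)]

modes = ["compiler", "print", "web", "print", "web"]
for gram in make_ngrams(modes, n=4):
    print(gram)
# ('compiler', 'print', 'web', 'print')
# ('print', 'web', 'print', 'web')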

Page 6: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

AQ20 Algorithm Application

Each training n-gram is used as an example of the class representing the user whose activity it reflects.

To learn a user’s profile, AQ20 divides the n-grams into positive examples (examples representing the user whose profile is being learned) and negative examples (examples representing other users’ activities)

AQ20 searches for maximal conjunctive rules that cover positive examples, but not negative ones, and selects the best ones according to user-specified criteria

The rule "[User = 1] if [mode1 = compiler] and [mode2 = print] and [mode4 = print]" will be returned in the form: [User = 1] <= <compiler, print, *, print>

Rules and conditions may be annotated with weights (e.g., p, n, u)
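The sketch below (Python; a simplified illustration, not the AQ20 algorithm itself) shows how such an n-gram pattern can be interpreted: each position is either a set of admissible modes or a wildcard "*", and an n-gram matches only if every non-wildcard position is satisfied.

def matches(ngram, pattern):
    # An n-gram satisfies a pattern if every non-wildcard position of the
    # pattern admits the mode found at that position of the n-gram.
    return all(cond == "*" or mode in cond
               for mode, cond in zip(ngram, pattern))

# [User = 1] <= <compiler, print, *, print> written as a pattern:
rule_user1 = ({"compiler"}, {"print"}, "*", {"print"})

print(matches(("compiler", "print", "web", "print"), rule_user1))  # True
print(matches(("compiler", "web", "web", "print"), rule_user1))    # False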

Page 7: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

EPICn: Episode Classification and User Identification by Matching Episodes with n-gram Patterns

EPICn matches episodes with n-gram-based patterns of different users’ behavior and computes a degree of match for each user

EPIC employs the ATEST program for matching individual events with patterns

The results from ATEST for each n-gram in the episode are aggregated to give overall episode scores for each class (profile)

EPIC allows flexible classification: all classes whose scores are both above the episode threshold and within the episode tolerance of the best achieved score are returned as classifications
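A minimal sketch of this aggregation and threshold/tolerance decision (Python; a simple average is used as the aggregation function here, whereas the actual EPICn/ATEST scoring is more elaborate):

def classify_episode(ngram_scores, threshold=0.5, tolerance=0.05):
    # ngram_scores: one dict per n-gram in the episode, mapping each
    # class (user profile) to its degree of match for that n-gram.
    classes = ngram_scores[0].keys()
    episode_score = {c: sum(s[c] for s in ngram_scores) / len(ngram_scores)
                     for c in classes}
    best = max(episode_score.values())
    # Return every class above the episode threshold and within the
    # episode tolerance of the best score.
    return [c for c, v in episode_score.items()
            if v >= threshold and best - v <= tolerance]

scores = [{"user0": 0.9, "user1": 0.4, "user2": 0.85},
          {"user0": 0.8, "user1": 0.5, "user2": 0.80}]
print(classify_episode(scores))  # ['user0', 'user2']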

Page 8: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Experiments

Two sets of preliminary experiments were performed for different training and testing data sizes:

Small: first 7 users (SD)

Large: all 23 users (LD)

Rules were learned with AQ19 and AQ20 using different control parameters (TF and PD modes, and three different LEFs each for SD and LD)

EPICn was used to test the learned hypotheses.

Page 9: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Data Used in the Experiments

24 users, for a total of 4,808,024 4-grams. Each user has a different number of sessions, each varying in length. The data contains many repetitions. This is by far the largest dataset AQ20 has been applied to.

Page 10: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Distribution of the Sessions for Each User

[Chart: number of sessions for each user; x-axis: users 0-23; y-axis: sessions, 0-100]

Page 11: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

4-grams for Each User

[Chart: total number of 4-grams for each user; x-axis: users 0-23; y-axis: 4-grams, 0-1,600,000]

[Chart: average number of 4-grams for each user; x-axis: users 0-23; y-axis: 4-grams, 0-30,000]

Page 12: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Experiment 1: A Sample of Results from AQ20 (7 Users)

[user = 0] <{explorer,web,office,sql,rundll32,system,time,install},
 {explorer,web,logon,rundll32,system,time,install},
 {explorer,web,office,logon,printing,rundll32,system,time,install},
 {web,office,rundll32,system,time,install,multimedia}>
 : pd=171, nd=52, ud=27, pt=2721, nt=710, ut=160, qd=0.372459, qt=0.60304

[user = 1] <{netscape,msie,telnet,explorer,web,acrobat,logon,system,welcome,help},
 {netscape,msie,telnet,explorer,web,acrobat,logon,rundll32,welcome,help},
 {netscape,msie,telnet,explorer,web,acrobat,logon,printing,welcome,dos,help},
 {netscape,msie,telnet,explorer,web,acrobat,logon,welcome,dos,help}>
 : pd=260, nd=54, ud=28, pt=20713, nt=132, ut=2019, qd=0.610064, qt=0.986564

...

Page 13: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Distribution of Positive and Negative Events in the Training Set for Each User

(80% of the total data; the remaining 20% constituted the testing dataset)

User   Distinct +   Distinct -   Total +    Total -
0          345          5236        3573     616828
1          348          5214       20858     671154
2          784          4497       19477     570508
3          226          5253        9351     627480
4         3006          2012       92626     545656
5          142          5537       59524     647063
6          865          4413      506532      84895

Page 14: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Predictive Accuracy of User Models Generated Using PD Mode and LEF1 (MaxNewPositives,0; MinNumSelectors,0)

Confidence matrix for 100% of the training data and 100% of the testing data, PD mode, default LEF (entries are degrees of match):

         User 0   User 1   User 2   User 3   User 4   User 5   User 6
User 0    0.17     0.11     0.13     0.12     0.13     0.16     0.11
User 1    0.14     0.2      0.08     0.11     0.1      0.09     0.11
User 2    0.16     0.14     0.24     0.16     0.16     0.15     0.14
User 3    0.12     0.1      0.12     0.18     0.14     0.14     0.12
User 4    0.15     0.17     0.23     0.17     0.23     0.15     0.19
User 5    0.13     0.13     0.09     0.12     0.08     0.17     0.1
User 6    0.13     0.16     0.12     0.14     0.16     0.14     0.24

Page 15: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Predictive Accuracy of User Models Generated Using PD Mode and LEF2 (MaxQ,0; MaxNewPositives,0; MinNumSelectors,0)

Confidence matrix for 100% of the training data and 100% of the testing data, PD mode, MaxQ (entries are degrees of match):

         User 0   User 1   User 2   User 3   User 4   User 5   User 6
User 0    0.18     0.11     0.13     0.12     0.13     0.16     0.11
User 1    0.14     0.2      0.08     0.11     0.1      0.1      0.11
User 2    0.17     0.14     0.24     0.16     0.16     0.15     0.14
User 3    0.11     0.09     0.11     0.19     0.14     0.14     0.12
User 4    0.15     0.17     0.23     0.17     0.23     0.14     0.19
User 5    0.13     0.12     0.09     0.1      0.08     0.18     0.09
User 6    0.13     0.16     0.12     0.14     0.16     0.13     0.24

Page 16: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Predictive Accuracy of User Models Generated Using PD Mode and LEF3 (MaxTotQ,0; MaxNewPositives,0; MinNumSelectors,0)

Confidence matrix for 100% of the training data and 100% of the testing data, PD mode, MaxTotalQ (entries are degrees of match):

         User 0   User 1   User 2   User 3   User 4   User 5   User 6
User 0    0.18     0.11     0.13     0.12     0.13     0.16     0.11
User 1    0.14     0.2      0.09     0.11     0.1      0.1      0.11
User 2    0.17     0.14     0.24     0.16     0.16     0.15     0.14
User 3    0.12     0.09     0.11     0.19     0.14     0.14     0.12
User 4    0.15     0.17     0.23     0.17     0.23     0.14     0.19
User 5    0.12     0.12     0.09     0.1      0.07     0.18     0.09
User 6    0.13     0.16     0.12     0.14     0.16     0.13     0.24

Page 17: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Sample Results Using TF

Confidence matrix for 100% of the training data and 100% of the testing data, TF mode, default LEF (entries are degrees of match):

         User 0   User 1   User 2   User 3   User 4   User 5   User 6
User 0    0.17     0.11     0.13     0.12     0.13     0.15     0.11
User 1    0.15     0.2      0.09     0.11     0.1      0.09     0.11
User 2    0.16     0.14     0.24     0.16     0.16     0.15     0.14
User 3    0.12     0.1      0.12     0.2      0.13     0.15     0.12
User 4    0.15     0.17     0.22     0.18     0.24     0.14     0.19
User 5    0.12     0.12     0.09     0.1      0.08     0.19     0.09
User 6    0.13     0.16     0.12     0.14     0.16     0.13     0.24

Page 18: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Sample Rules for User 0 (PD mode, LEF1)

# -- This learning took:
# -- System time 10.45
# -- User time   10
# -- Number of stars generated = 46
# -- Number of rules for this class = 42
# -- Average number of rules kept from each stars = 1

# -- Size of the training events in the target class:          345
# -- Size of the training events in the other class(es):       5236
# -- Size of the total training events in the target class:    3573
# -- Size of the total training in the other class(es):        616828

[User = 0] <{mail,office,printing,rundll32,system,time,install},
 {web,rundll32,system,time,install},
 {explorer,web,mail,office,logon,rundll32,system,install,multimedia},
 {explorer,web,office,logon,sql,rundll32,system,help,install,multimedia}>
 : pd=149, nd=20, ud=22, pt=2490, nt=75, ut=37, qd=0.377406, qt=0.676398

<{explorer,web,office,sql,rundll32,system,time,install,multimedia},
 {explorer,web,office,logon,sql,rundll32,system,time,install},
 {web,office,logon,sql,printing,rundll32,system,time,install},
 {web,rundll32,system,time,install}>
 : pd=136, nd=30, ud=8, pt=2481, nt=1148, ut=14, qd=0.318267, qt=0.473443

<{explorer,web,rundll32,system,multimedia},
 {explorer,system,time,install},
 {explorer,rundll32,system,time,install,multimedia},
 {explorer,rundll32,system,time,install,multimedia}>
 : pd=107, nd=21, ud=32, pt=2453, nt=930, ut=474, qd=0.255909, qt=0.496713

Page 19: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Experiment 2

In this experiment hypotheses were generated to describe the behavior of all 24 users

The training set consisted of approximately 4 million 4-grams

The testing set consisted of approximately 1 million 4-grams

Page 20: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Description of Experiment 2

Experiments were performed using 20% and 100% of the training set (the training set itself constituting 80% of the available sessions)

Experiments were performed in PD and TF modes

Three different LEFs were used:

LEF1 (TF mode): <MaxNewPositives,0; MinNumSelectors,0>

LEF2 (TF mode): <MaxEstimatedPositives,0; MinEstimatedNegatives,0; MaxNewPositives,0; MinNumSelectors,0>

LEF3 (PD mode): <MaxQ,0; MaxNewPositives,0; MinNumSelectors,0>

Page 21: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Experiment 2

When combining all of a user's testing data into a single long episode, out of the 24 users:

20 users were classified correctly

3 users could not be classified because the degrees of match of the best-scoring users were insufficiently separated

1 user was classified incorrectly

Page 22: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 0-2

[Chart: predictive accuracy for 100% testing, 100% training; users 0-2; y-axis: degree of match, 0-0.08]

Page 23: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 3-5

[Chart: predictive accuracy for 100% testing, 100% training; users 3-5; y-axis: degree of match, 0-0.07]

Page 24: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 6-8

[Chart: predictive accuracy for 100% testing, 100% training; users 6-8; y-axis: degree of match, 0-0.08]

Page 25: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 9-11

[Chart: predictive accuracy for 100% testing, 100% training; users 9-11; y-axis: degree of match, 0-0.07]

Page 26: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 12-14

[Chart: predictive accuracy for 100% testing, 100% training; users 12-14; y-axis: degree of match, 0-0.08]

Page 27: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 15-17

[Chart: predictive accuracy for 100% testing, 100% training; users 15-17; y-axis: degree of match, 0-0.07]

Page 28: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 18-20

[Chart: predictive accuracy for 100% testing, 100% training; users 18-20; y-axis: degree of match, 0-0.08]

Page 29: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Users 21-23

[Chart: predictive accuracy for 100% testing, 100% training; users 21-23; y-axis: degree of match, 0-0.1]

Page 30: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Sample Rules for User 0

# -- This learning took:
# -- System time 767.15
# -- User time 768
# -- Number of stars generated = 57
# -- Number of rules for this class = 52
# -- Average number of rules kept from each stars = 1

# -- Size of the training events in the target class:          346
# -- Size of the training events in the other class(es):       71931
# -- Size of the total training events in the target class:    1826
# -- Size of the total training in the other class(es):        3750169

[user=0] <- <explorer,install,multimedia,system,time> <multimedia,system> <explorer,install,system> <explorer,install,multimedia,system> : pd=64,nd=31,ud=8,pt=916,nt=404,ut=11,qd=0.124322,qt=0.348035 # 18648

<- <explorer,install,office,rundll32,system,time> <multimedia,system> <install,multimedia,rundll32,system,time> <explorer,install,rundll32,system,time> : pd=68,nd=42,ud=9,pt=919,nt=73,ut=11,qd=0.121131,qt=0.466232 # 24747

<- <explorer,help,install,mail,multimedia,rundll32,system,time,web> <help,install,logon,mail,office,rundll32,system,time,web> <help,install,mail,office,printing,rundll32,system,time,web> <help,install,rundll32,system,time> : pd=140,nd=343,ud=41,pt=1316,nt=701,ut=66,qd=0.1159,qt=0.470102 # 5068

<- <install,office,printing,system> <install,rundll32,time> <install,multimedia,office,sql,system,web> <explorer,install,multimedia,rundll32,system,web> : pd=43,nd=4,ud=2,pt=397,nt=4,ut=2,qd=0.11365,qt=0.215245 # 7642

Page 31: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Best Rule for User 23

# -- This learning took:
# -- System time -39.9073
# -- User time 4256
# -- Number of stars generated = 658
# -- Number of rules for this class = 533
# -- Average number of rules kept from each stars = 1

# -- Size of the training events in the target class:          9712
# -- Size of the training events in the other class(es):       40602
# -- Size of the total training events in the target class:    1337548
# -- Size of the total training in the other class(es):        2063808

[user=23] <- <ControlPanel,activesync,id,mail,multimedia,netscape,network,spreadsheet,system,wordprocessing>
 <ControlPanel,activesync,explorer,id,logon,mail,msie,multimedia,netscape,network,printing,spreadsheet,web,wordprocessing>
 <ControlPanel,activesync,mail,multimedia,netscape,printing,spreadsheet,wordprocessing>
 <ControlPanel,activesync,mail,multimedia,netscape,spreadsheet,web,wordprocessing>
 : pd=4685, nd=878, ud=975, pt=1296647, nt=1166, ut=34254, qd=0.388046, qt=0.967985 # 3524022

Page 32: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Experiments with Smaller Test Episodes

In experiments with 150 session-sized testing episodes, some performed with traditional “best matching” and others with threshold-tolerance matching, identification accuracy was as follows:

Traditional ATEST (Rform) scoring, threshold-tolerance matching: 169 classifications, 75 correct, 84 incorrect

Traditional scoring, best-only matching: 71 (47.3%) correct

Simple scoring, threshold-tolerance matching: 165 classifications, 117 correct, 48 incorrect

Simple scoring, best-only matching: 112 (74.7%) correct

Page 33: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Prediction-Based Approach

In the prediction-based approach, events characterizing a user are pairs

<predecessor, successor>,

where:

predecessor is a sequence of lb states of the user (in the experiments, modes) that directly precede a given time instance t, and successor is a sequence of lf states of the user (in the experiments, modes) that occur immediately after t.

Parameters lb and lf, called look-back and look-forward respectively, are determined experimentally.
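A small sketch (Python; the function name is hypothetical) of how <predecessor, successor> pairs with look-back lb and look-forward lf could be extracted from a sequence of user states:

def prediction_pairs(states, lb=3, lf=1):
    # For each time instant t with enough context, pair the lb states
    # directly preceding t with the lf states that follow immediately after.
    pairs = []
    for t in range(lb, len(states) - lf + 1):
        pairs.append((tuple(states[t - lb:t]), tuple(states[t:t + lf])))
    return pairs

modes = ["web", "mail", "compiler", "print", "web"]
for predecessor, successor in prediction_pairs(modes, lb=3, lf=1):
    print(predecessor, "->", successor)
# ('web', 'mail', 'compiler') -> ('print',)
# ('mail', 'compiler', 'print') -> ('web',)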

Page 34: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

An Initial Small Experiment

Rules were learned using the decomposition model with look-backs of 1, 2, 3, and 4. The results provided by EPICp were as follows:

CONFUSION MATRIX

          Data-1   Data-2   Data-3
User 1:     374       86       66
User 2:     202      141      130
User 3:     176       97      557

Page 35: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Topics for Further Research

1. Comparative study of the n-gram-based methodology for currently available datasets using different control parameters

2. Study performance degradation on reduced session size

3. Annotate process tables with window information

4. Testing the ability to identify unknown users

5. Development and implementation of a prediction-based approach using a dedicated sequential pattern discovery program (SPARCum)

6. Employment of multivariate representation, e.g., <mode, process name, time>

7. Improving the representational space through constructive induction

8. Handling drift and shift of user models

9. Coping with incremental growth and change in the user population

Page 36: User Modeling Through Symbolic Learning: The LUS Method and Initial Results

Conclusions

LUS methodology uses symbolic learning to generate user signatures

Unlike traditional classifiers, EPICn classifies based on episodes rather than individual events

Initial experiments have been promising, but several real world situations have yet to be addressed in full

Multistrategy approaches may lead to further performance improvement