Strategies for Managing Missing or Incomplete Data in Biometric and Business Applications

Mark Ritzmann

Pace University

March 17, 2007

Contents

• Overview
• Essence and Significance of Work
• Experiment Design
• Outcomes
• Analysis
• Future Work

Overview: Essence of this work

Address the problem of missing or incomplete data and put forth strategies to overcome that problem

Add to the accuracy of the existing Keystroke Biometric Recognition System

Apply findings to other application areas

Overview: The Impact of Missing Data

• <1% considered trivial
• 1-5% considered manageable
• 5-15% requires sophisticated methods
• >15% may severely impact any interpretation

P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
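The four bands above can be turned into a quick check on a data column. The following is a minimal sketch, not part of the original work: the function names are hypothetical, missing entries are assumed to be encoded as `None`, and a rate of exactly 5% is assumed to fall in the milder band because the slide does not say.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None) in a list."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)


def severity(rate):
    """Map a missing-data rate (0.0 to 1.0) to the four bands on the slide.

    Assumption: a rate of exactly 5% is placed in the milder band,
    since the slide lists 5% in both "1-5%" and "5-15%".
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate < 0.01:
        return "trivial"
    if rate <= 0.05:
        return "manageable"
    if rate <= 0.15:
        return "requires sophisticated methods"
    return "may severely impact any interpretation"


# Hypothetical column of key durations with 2 of 20 entries missing.
column = [None, 95.0, None] + [100.0] * 17
print(severity(missing_rate(column)))
```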

Overview: Missing Data Mechanisms

• MCAR – Missing Completely at Random
• MAR – Missing at Random
• NMAR – Not Missing at Random

Most missing data treatment methods assume the data are MAR

P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
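The three mechanisms differ in what the chance of a value being missing depends on: nothing at all (MCAR), another variable that is fully observed (MAR), or the very value that goes missing (NMAR). The sketch below simulates each one on toy data. It is an illustration only; the function names, thresholds, and probabilities are hypothetical and do not come from the cited paper.

```python
import random


def mcar_mask(n, p, rng):
    """MCAR: every entry is missing with the same probability p."""
    return [rng.random() < p for _ in range(n)]


def mar_mask(observed, threshold, p_above, p_below, rng):
    """MAR: missingness depends only on another, fully observed variable."""
    return [rng.random() < (p_above if x > threshold else p_below)
            for x in observed]


def nmar_mask(values, threshold, p_above, p_below, rng):
    """NMAR: missingness depends on the value that goes missing."""
    return [rng.random() < (p_above if v > threshold else p_below)
            for v in values]


def apply_mask(values, mask):
    """Replace masked entries with None."""
    return [None if m else v for v, m in zip(values, mask)]


rng = random.Random(0)                 # seeded for repeatability
durations = [60, 150, 90, 200]         # hypothetical key durations
# With probabilities 1.0 and 0.0, every duration above 100 goes missing.
print(apply_mask(durations, nmar_mask(durations, 100, 1.0, 0.0, rng)))
```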

Overview: Missing Data Treatment, High Level

Heuristic

• Based on established rules and guidelines
• Similar to an expert system
• Association is prime example

Statistical

• Existing data used to calculate missing data
• Care needs to be taken not to overfit
• Mean/mode is prime example

Overview: Missing Data Treatment Methods

• Case Deletion
• Parameter Estimation
• Mean/Mode Imputation
• Method of Assigning All Possible Values of the Attribute
• Regression Imputation
• Hot Deck Imputation and Cold Deck Imputation
• Multiple Imputation
• K-Nearest Neighbor Imputation
• Internal Treatment Method
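Two of the simpler methods in this list, case deletion and mean/mode imputation, can be sketched in a few lines. This is a minimal illustration under stated assumptions: missing entries are encoded as `None`, and the function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def case_deletion(rows):
    """Drop every row (case) that has at least one missing field."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(column):
    """Fill missing numeric entries with the mean of the observed ones."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def mode_impute(column):
    """Fill missing categorical entries with the most frequent observed one."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


print(case_deletion([(100, "desktop"), (None, "laptop")]))
print(mean_impute([100, None, 120]))
print(mode_impute(["desktop", None, "laptop", "desktop"]))
```

Case deletion shrinks the data set, while imputation keeps every case at the cost of inventing values, which is the trade-off the remaining methods in the list try to manage.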

Overview: Biometric background

• Roots in CIA & Dept of Defense work
• Early Issues – technology, cost, lack of standards
• Basic Uses
– Verification (easier of the two; yes/no)
– Identification (harder of the two; 1 of n)
• Basic types
– Physiological – generally do not change
– Behavioral – can change, easier to mimic
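The difference between the two basic uses shows up directly in code: verification compares a sample with one claimed template and answers yes or no, while identification searches all n enrolled templates for the best match. The sketch below assumes Euclidean distance over feature vectors and a hypothetical threshold; neither is taken from the slides.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(sample, claimed_template, threshold):
    """Verification: does the sample match the one claimed identity (yes/no)?"""
    return distance(sample, claimed_template) <= threshold


def identify(sample, templates):
    """Identification: which of the n enrolled users is closest (1 of n)?"""
    if not templates:
        raise ValueError("no enrolled templates")
    return min(templates, key=lambda user: distance(sample, templates[user]))


# Hypothetical enrolled templates: mean key duration and mean transition time.
templates = {"alice": [100.0, 80.0], "bob": [140.0, 120.0]}
sample = [102.0, 79.0]
print(verify(sample, templates["alice"], threshold=5.0))
print(identify(sample, templates))
```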

Overview: Biometric Issues

Biometrics: Challenges & Caveats

Operational
• Lab vs Field
• Scalability
• Continuous Authentication
• Security

System
• Business Process
• Design
• Control
• Enrollment Challenge
• System Downtime
• Availability of template database
• Effects of malicious code

Business
• Financial feasibility
• Interaction with traditional controls
• Application not subject to rigor
• Incompatibility with business partners
• Transition to e-business
• Control locus

People
• User confidence
• Privacy issues
• User preferences
• User acceptance
• User profile
• Trust

Legal & Regulatory
• Lack of precedence
• Ambiguous process
• Imprecise definition
• Logistics of proof of defense

Technical
• Adaptation
• Hardware
• Evolving nature of technology
• Scattered proliferation & polarization
• Uniqueness of biometric
• Scalability

A. Chandra & T. Calderon, Challenges and Constraints to the Diffusion of Biometrics Information Systems, Communications of the ACM, December 2005, Vol 48, No 12

Overview: Privacy Issues – special mention

• Opt in/Opt out
– Any application or web site that used this system would need to do so with full disclosure. The user could then knowingly decide.
• Dictated environment
– Any corporate or instructional e-mail system where the ultimate ownership of the keystroke resides with that entity
• Capture results, not text itself
– Use keystrokes to authenticate/identify, not the words themselves or the intact messages

Overview: Keyboard Biometric Studies in the Literature

• Key Concepts
– Copy vs Free
– Authentication vs Identification
• Classic Studies
– Gaines, 1980
– Umphress & Williams, 1985
– Leggett & Williams, 1988
– Joyce & Gupta, 1990
– Bleha et al, 1990
– Brown & Rogers, 1993
• Recent Studies
– University of Torino
• Pace University contributions

Essence and Significance of Work: High Level Objectives

1. Improve the accuracy of the current Keystroke Biometric Recognition System
2. Develop strategies to manage the significant problem of missing or incomplete data
3. Apply findings to other areas

Essence and Significance of Work: Detailed Objectives

First Objective:

• Improve the accuracy of the current Keystroke Biometric Recognition system by improving the FALLBACK model invoked when a sample is of insufficient size

Second Objective:

• Gain insight as to the effectiveness and application of MISSING DATA strategies and decision making with incomplete information

Third Objective:

• Identify a potential application for a Keystroke Biometric recognition system
• Project the findings to other potential areas

Experiment Design: Re-use of assets from previous Pace work

• Data set
• Features/feature extraction
• Tests
• Optimal settings

Experiment Design: Future inclusion?

Experiment Design: 6 Test scenarios

Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Feature set

Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Summary of Subject Participation

Subjects by Experiment

• 36 subjects – all four quadrants
• 52 subjects – 1. Copy Task
• 40 subjects – 2. Free Text
• 93 subjects – 3. Desktop
• 47 subjects – 4. Laptop
• 41 subjects – 5. Desk Copy / Lap Free
• 40 subjects – 6. Lap Copy / Desk Free

Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Data/Sample Capture Application

Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Application Version 2.0 - developed Fall 2006

• Development and Implementation of 2 additional Fallback Models
• Tremendously enhanced Testing functionality
• Development and Implementation of Trace Mechanism

Experiment Design: New Bio Feature Extractor Interface

Experiment Design: New Classifier Interface

Experiment Design: High Level Overview of Fallback Models

[Diagram: the Linguistic Model, Touch Type Model, and Statistical Model placed under the headings Heuristic and Statistical, with the new models marked]

Experiment Design: Overview of Models

• Linguistic
• Touch Type
• Statistical

Experiment Design: Linguistic Fallback Model - Duration

Experiment Design: Linguistic Fallback Model - Transition

Experiment Design: Touch Type Fallback Model - Background

• Touch Type approach invented by Frank Edgar McGurrin in the late 1800s
– Won speed contest on July 25, 1888
– Was front page news
• Touch Type idea: use sense of touch rather than sight (looking at key label)
• Most keyboards still have a raised indicator on “f” and “j” to indicate home position

Experiment Design: Touch Type Fallback Model

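Experiment Design: Touch Type Fallback Model - Duration

[Diagram: fallback hierarchy from individual keys, to fingers, to hands, to all keys]

Keys by finger:

• Left Little: A Q Z 1
• Left Ring: S W X 2
• Left Middle: D E C 3
• Left Index: F G R T V B 4 5
• Right Index: H J Y U N M 6 7
• Right Middle: K I , 8
• Right Ring: L O . 9
• Right Little: ; P / 0

Next level: All Left Hand, All Right Hand

Top level: All Keys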
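The hierarchy in the diagram above suggests a simple lookup: when a key has too few duration samples, pool the samples for its finger, then its hand, then all keys. The following is a minimal sketch of that idea, not the system's actual code. The finger assignments are read from the diagram; the minimum sample count, the function name, and the use of a pooled mean are assumptions.

```python
from statistics import mean

# Finger assignments as read from the diagram.
FINGER_KEYS = {
    "LeftLittle": "AQZ1",
    "LeftRing": "SWX2",
    "LeftMiddle": "DEC3",
    "LeftIndex": "FGRTVB45",
    "RightIndex": "HJYUNM67",
    "RightMiddle": "KI,8",
    "RightRing": "LO.9",
    "RightLittle": ";P/0",
}
KEY_TO_FINGER = {k: f for f, keys in FINGER_KEYS.items() for k in keys}

MIN_SAMPLES = 5  # hypothetical threshold; the slides do not give one


def fallback_duration(key, durations, min_samples=MIN_SAMPLES):
    """Return (level, mean duration) from the first level with enough samples.

    `durations` maps an upper-case key to its list of observed durations.
    """
    key = key.upper()
    finger = KEY_TO_FINGER[key]
    hand = "Left" if finger.startswith("Left") else "Right"
    levels = [
        ("key", [key]),
        (finger, list(FINGER_KEYS[finger])),
        (f"All {hand} Hand",
         [k for k, f in KEY_TO_FINGER.items() if f.startswith(hand)]),
        ("All Keys", list(KEY_TO_FINGER)),
    ]
    for name, keys in levels:
        pooled = [d for k in keys for d in durations.get(k, [])]
        if len(pooled) >= min_samples:
            return name, mean(pooled)
    raise ValueError("not enough samples at any level")


# Hypothetical observations: "A" alone is too sparse, so its finger is used.
observed = {"A": [90, 110], "Q": [100, 100, 100], "F": [80] * 5}
print(fallback_duration("a", observed))
```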

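Experiment Design: Touch Type Fallback Model - Transition

[Diagram: fallback hierarchy from individual key-to-key transitions, to hand pairs, to any letter pair]

Transitions by hand pair:

• Left/left: E/A, R/E, A/T, E/S, S/T, E/R
• Right/right: O/N, I/N
• Left/right: T/I, E/N, A/N, T/H
• Right/left: O/R, N/D, H/E

Top level: Letter/letter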
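The transition hierarchy can be sketched the same way: an individual transition with too few samples falls back to its hand pair, and then to all letter-to-letter transitions. This is an illustrative sketch only. The left and right key sets follow the touch-type finger layout, and the minimum sample count, function names, and pooled mean are assumptions.

```python
from statistics import mean

# Hand assignment under the touch-type layout.
LEFT = set("QWERTASDFGZXCVB12345")
RIGHT = set("YUIOPHJKLNM67890;,./")


def hand(key):
    """Return 'Left' or 'Right' for a key."""
    key = key.upper()
    if key in LEFT:
        return "Left"
    if key in RIGHT:
        return "Right"
    raise ValueError(f"unknown key: {key!r}")


def hand_pair(first, second):
    """Label a transition by the hands involved, e.g. 'Left/Right'."""
    return f"{hand(first)}/{hand(second)}"


def fallback_transition(first, second, transitions, min_samples=5):
    """Return (level, mean time) from the first level with enough samples.

    `transitions` maps (first, second) upper-case key tuples to lists of
    observed transition times. `min_samples` is a hypothetical threshold.
    """
    pair = (first.upper(), second.upper())
    own = transitions.get(pair, [])
    if len(own) >= min_samples:
        return "/".join(pair), mean(own)
    group = hand_pair(*pair)
    pooled = [t for (a, b), ts in transitions.items()
              if hand_pair(a, b) == group for t in ts]
    if len(pooled) >= min_samples:
        return group, mean(pooled)
    everything = [t for ts in transitions.values() for t in ts]
    if len(everything) >= min_samples:
        return "Letter/letter", mean(everything)
    raise ValueError("not enough samples at any level")


# Hypothetical observations: T/H is sparse, so Left/Right samples are pooled.
seen = {("T", "H"): [50, 60], ("E", "N"): [70, 80, 90], ("O", "N"): [40] * 5}
print(fallback_transition("t", "h", seen))
```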

Experiment Design: Statistical Fallback Model

• For Duration – Mean Imputation
• For Transition – Multiple Imputation
– Mean and standard deviation calculated on the full transition data set
– Any value >1 standard deviation from the mean was removed
– New mean and standard deviation calculated on remaining data
– Process repeated 3 times
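The trimming steps listed above can be sketched as a short loop. This is a hedged illustration rather than the original implementation: the function name is hypothetical, the sample standard deviation is assumed (the slide does not say which form was used), and "repeated 3 times" is read as three trimming passes in total.

```python
from statistics import mean, stdev


def outlier_wash(values, passes=3, width=1.0):
    """Repeatedly drop values more than `width` standard deviations from the mean.

    Returns (kept_values, mean, standard_deviation) after the final pass.
    Assumption: the sample standard deviation is used.
    """
    kept = list(values)
    if not kept:
        raise ValueError("values must not be empty")
    for _ in range(passes):
        if len(kept) < 2:
            break  # standard deviation is undefined for fewer than two values
        centre, spread = mean(kept), stdev(kept)
        if spread == 0:
            break  # all remaining values are identical
        kept = [v for v in kept if abs(v - centre) <= width * spread]
    final_spread = stdev(kept) if len(kept) >= 2 else 0.0
    return kept, mean(kept), final_spread


# Hypothetical transition times with one obvious outlier.
kept, centre, spread = outlier_wash([100, 102, 98, 101, 99, 300])
print(kept, centre, spread)
```

Because each pass recomputes the mean and standard deviation on a smaller set, the band tightens quickly; a one-standard-deviation cut can discard a large share of the sample after three passes.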

Experiment Design: Statistical Fallback Model – Duration Clusters

Experiment Design: Statistical Fallback Model - Duration

[Diagram: duration fallback tree. Individual keys (A, S, W, D, E, C, F, G, R, T, B, H, Y, U, N, M, I, ",", L, O, ".", P, "-") are grouped into Clusters 2 through 9. The clusters fall back to Node A (under 100) and Node B (over 100), which fall back to Cluster 1, All Keys.]

Experiment Design: Statistical Fallback Model – Transition development

[Chart: data compacting process, showing sample size and % of sample left after outlier wash, starting from 100% of the data]

Experiment Design: Statistical Fallback Model – Transition, Raw Order

Experiment Design: Statistical Fallback Model – Transition, Cluster Development

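Experiment Design: Statistical Fallback Model - Transition

[Diagram: transition fallback tree. Individual transitions (E-R, R-E, T-H, O-R, O-N, E-N, A-T, T-I, E-S, A-N, H-E, N-D, E-A, S-T, I-N) are grouped into Nodes 1 through 4 and Nodes A through D, split into Under 50 and Over 50, with Any/Any at the top.]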

Outcomes: Results Comparison

Analysis: Fallback Trace

Experiment Design: Linguistic Fallback Model – Duration (repeat of previous)

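Analysis: Proposed Second Generation Touch Type Fallback Model - Duration

[Diagram: fallback hierarchy from individual keys, to fingers, to hands, to all keys]

Keys by finger:

• Left Little: A Q Z
• Left Ring: S W X
• Left Middle: D E C
• Left Index: F G R T V B
• Right Index: H J Y U N M
• Right Middle: K I
• Right Ring: L O
• Right Little: P

Next level: All Left Hand, All Right Hand

Top level: All Keys

Red circles remain as leaves; all else falls back to the next level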

Future Work: Two Main Areas

• Academic
– Hybrid System development – keystroke, mouse movement, stylistic
– Principal Components
– Eigenvalues
• Application
– For Keystroke Biometric system: academic online testing, Biometric Marketing
– For general missing data: analytical applications

Future Work: Key success factors to system acceptance

• Robustness – level of trust
• Acceptance Level – support by third party processes
• Cost – hardware/software, communications and support
• Ease of Use/Portability – extent of support across client machines
• Security – privacy, integrity, and non-repudiation

“future research into the use of biometric technology in online marketing applications must consider not only technical feasibility, but also social and legal acceptability.”

Future Work: Biometric Marketing

• Use of Biometric technology to identify and segment users/consumers
• What you have to believe:
– Segmentation is better
– Short + short + short = long for sampling (chat rooms, e-mails, etc.)

Future Work: Analytical Applications

• Currently growing in use and acceptance
• It can be assumed that the missing data problem is present
– SAP considers <10% a non-factor
– IBM identifies missing data, but does not manage it
– Case deletion most prevalent
– Advanced strategies not identified

Future Work: Analytical Applications - Examples

Store Operations Management
• Activity Based Costing Analysis
• Location Exposure
• Location profitability
• Loss Prevention Analysis
• Store Location Analysis
• Store Optimization Analysis
• Suspicious Activity Analysis

Corporate Finance Management
• Capital Allocation Analysis
• Credit Risk Analysis
• Financial Management Accounting
• Income Analysis
• Non Performing Loan Analysis
• Organization Unit Profitability
• Performance Measurement
• Staffing Analysis

Customer Management
• Campaign & Promotion Analysis
• Cross Purchase Behavior
• Cross Sell Analysis
• Customer Attrition Analysis
• Customer Complaints Analysis
• Customer Credit Risk Profile
• Customer Delinquency Analysis
• Customer Interaction Analysis
• Customer Lifetime Value Analysis
• Customer Loyalty
• Customer Movement Dynamics
• Customer Profile Analysis
• Customer Profitability
• Involved Party Exposure
• Lead Analysis
• Market Analysis

Products & Services Management
• Business Performance Analysis
• Planning and Forecasting Analysis
• Product Analysis
• Product profitability
• Service Delivery Analysis
• Transaction Profitability Analysis
• Vendor Performance Analysis

Merchandising Management
• Assortment and Allocation Analysis
• Inventory Analysis
• Physical Merchandising / Space Management Analysis
• Pricing Analysis
• Promotion Analysis