Thesis Proposal PrActive Learning: Practical Active Learning, Generalizing Active Learning for Real-World Deployments

Upload: buck-bruce

Post on 29-Dec-2015


Page 1

Thesis Proposal

PrActive Learning: Practical Active Learning, Generalizing Active Learning for Real-World Deployments

Page 2

Generic example system flow for interactive classification problems

Rule-Based System to Flag Transactions for Manual Intervention

• Large volume (in millions) of transactions coming in
• Domain-specific transaction processing
• Credit card fraud transactions
• Hypothesis/rule-based system for flagging exceptions
• Majority of transactions automatically cleared
• Minority of transactions flagged for manual processing
• Transactions processed successfully
• High false positive rates for typical rule-based/hypothesis systems

Page 3

Generic example system flow for interactive classification problems

Introduce Learning Model to Flag Transactions for Manual Intervention

• Large volume (in millions) of transactions coming in
• Domain-specific transaction processing
• Machine learning model
• Majority of transactions automatically cleared
• Minority of transactions flagged for auditing
• Transactions processed successfully
• Lower false positive rates based on learning model

Goal: Optimize the return on investment of the auditor's time over the long term

Common Characteristics

• Skewed class distribution (minority events)
• Concept/feature drift
• Expensive domain experts
• Biased sampling of labeled historical data
• Lots of unlabeled data

Page 4

Interactive Classification Applications

• Fraud detection
• Network intrusion detection
• Video surveillance
• Information filtering / recommender systems
• Error prediction / quality control

Page 5

Interactive Classification Setting

• Classifier trained from labeled data
• Human (user/expert) in the loop using the results but also providing feedback at a cost
• Goal: Maximize the return on investment, which is equivalent to the productivity of the human

Unlabeled + Labeled Data

Trained Classifier

Ranked List scored by classifier
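The setting above (score items with a trained classifier, hand the expert a ranked list, measure the expert's productivity) can be illustrated with a minimal sketch. The item tuples, the scores, and the use of precision at k as a stand-in for "productivity of the human" are assumptions made for illustration; the slides do not specify a metric here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical item: (identifier, classifier score, true relevance).
Item = Tuple[str, float, bool]


def rank_by_score(items: Sequence[Item]) -> List[Item]:
    """Return items sorted from highest to lowest classifier score."""
    return sorted(items, key=lambda item: item[1], reverse=True)


def precision_at_k(ranked: Sequence[Item], k: int) -> float:
    """Fraction of the top-k reviewed items that turn out to be relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item[2]) / len(top)


# Illustrative data only: scores and labels are made up.
items = [
    ("a", 0.9, True),
    ("b", 0.2, False),
    ("c", 0.7, False),
    ("d", 0.6, True),
]
ranked = rank_by_score(items)
order = [item[0] for item in ranked]   # ['a', 'c', 'd', 'b']
p_at_2 = precision_at_k(ranked, 2)     # 0.5
```

If the expert can review only k items per period, a higher precision at k means less expert time spent on irrelevant items.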

Page 6

Factorization of the problem

• Cost (Time of human expert)
• Exploration (Future classifier performance)
• Exploitation (Relevancy to the expert)

Exploration-Exploitation Tradeoffs

Cost-Sensitive Active Learning

Standard Ranking / Relevance Feedback Active Learning

Cost-Sensitive Exploitation
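One way to picture how the three factors could be traded off is a single score per candidate item. The sketch below is an assumption, not the formulation used in the thesis: exploitation is taken to be the classifier's estimated probability of relevance, exploration is taken to be an uncertainty score that peaks at probability 0.5, and the weighted benefit is divided by the estimated review cost. The names `Candidate`, `w_explore`, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    p_positive: float  # assumed classifier estimate that the item is relevant
    cost: float        # assumed expert review time (e.g. minutes), must be > 0


def exploitation_value(c: Candidate) -> float:
    """Immediate relevance to the expert (assumption: the predicted probability)."""
    return c.p_positive


def exploration_value(c: Candidate) -> float:
    """Uncertainty proxy: 1 at p = 0.5, falling to 0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * c.p_positive - 1.0)


def joint_utility(c: Candidate, w_explore: float = 0.5) -> float:
    """Weighted benefit per unit of expert time (one possible combination)."""
    if c.cost <= 0:
        raise ValueError("cost must be positive")
    benefit = (1.0 - w_explore) * exploitation_value(c) + w_explore * exploration_value(c)
    return benefit / c.cost


def rank(cands: List[Candidate], w_explore: float) -> List[str]:
    """Names ordered from highest to lowest joint utility."""
    ordered = sorted(cands, key=lambda c: joint_utility(c, w_explore), reverse=True)
    return [c.name for c in ordered]


candidates = [
    Candidate("A", p_positive=0.9, cost=1.5),
    Candidate("B", p_positive=0.5, cost=1.0),
    Candidate("C", p_positive=0.8, cost=4.0),
]
pure_exploit = rank(candidates, w_explore=0.0)  # ['A', 'B', 'C']
balanced = rank(candidates, w_explore=0.5)      # ['B', 'A', 'C']
```

With no exploration weight the confident, cheap item A comes first; giving exploration equal weight moves the uncertain item B to the top, while the expensive item C stays last in both rankings.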

Page 7

Interactive Classification: High-Level Picture

Labeled Data (1,…,t-1)

Trained Classifier (1,…,t-1)

Unlabeled Data (t)

Ranked List

• Cost (Time of human expert)
• Exploration (Future classifier performance)
• Exploitation (Relevancy to the expert)

Labeled Data (t)
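The period-by-period picture can be sketched as a loop: train on labels from periods 1 to t-1, rank the unlabeled data of period t, let the expert label the top of the list, and carry those labels into the next period. Everything concrete below is an assumption chosen to keep the sketch small: a one-dimensional feature, a nearest-centroid scorer, a fixed review budget, and an `oracle` function standing in for the human expert.

```python
from typing import Callable, List, Sequence, Tuple

Labeled = Tuple[float, bool]  # (feature value, label)


def fit_centroids(labeled: Sequence[Labeled]) -> Tuple[float, float]:
    """Toy classifier: mean feature value of positives and of negatives."""
    pos = [x for x, y in labeled if y]
    neg = [x for x, y in labeled if not y]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def score(model: Tuple[float, float], x: float) -> float:
    """Higher when x is closer to the positive centroid than to the negative one."""
    mu_pos, mu_neg = model
    return abs(x - mu_neg) - abs(x - mu_pos)


def run_periods(
    seed_labels: Sequence[Labeled],
    batches: Sequence[Sequence[float]],
    oracle: Callable[[float], bool],
    budget: int,
) -> Tuple[List[Labeled], List[int]]:
    """Retrain each period on all labels so far; the expert reviews the top `budget` items."""
    labeled = list(seed_labels)
    hits_per_period = []
    for batch in batches:
        model = fit_centroids(labeled)                 # trained on periods 1..t-1
        ranked = sorted(batch, key=lambda x: score(model, x), reverse=True)
        new_labels = [(x, oracle(x)) for x in ranked[:budget]]  # expert feedback
        hits_per_period.append(sum(1 for _, y in new_labels if y))
        labeled.extend(new_labels)                     # becomes labeled data (t)
    return labeled, hits_per_period


# Illustrative run: the "expert" marks values above 5.0 as relevant.
seed = [(8.0, True), (1.0, False)]
batches = [[2.0, 9.0, 6.0, 4.0], [7.0, 3.0, 5.5, 0.5]]
labeled, hits = run_periods(seed, batches, oracle=lambda x: x > 5.0, budget=2)
# hits == [2, 2]
```

This loop is purely exploitative and assumes a fixed labeling rule; it does not model cost differences, exploration, or concept drift.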

Page 8

Thesis Contributions

• Problem Statement: How to generalize active learning to incorporate the differential utility of a labeled example (dynamic/variable exploitation), the dynamic cost of labeling an example, and concept drift in a unified framework that makes the deployment of such learning systems practical

• Contributions
  – Generalization of Active Learning along the following dimensions
    • Differential utility of a labeled example
    • Dynamic cost of labeling an example
    • Tackling concept drift
    • Cost-Sensitive Exploitation
    • A unified framework to solve these considerations jointly
  – First solution: Optimizing a joint utility function based on cost, exploration utility and exploitation utility
  – Second solution: Using an Upper Confidence Bound approach with a contextual multi-armed bandit setup to incorporate the different factors
  – Empirical evaluation of the proposed framework
    • Using an evaluation metric motivated by real business tasks
    • Datasets
      – Synthetic dataset
      – Real-world dataset: Health Insurance Claims Rework
    • Comparison with multiple baselines based on underlying factors
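To give a feel for the Upper Confidence Bound idea named in the second solution, the sketch below implements plain UCB1 over a few hypothetical selection strategies. This is a deliberate simplification: it is not contextual, it is not the thesis's formulation, and the arm names and fixed per-arm payoffs are invented so that the run is deterministic.

```python
import math
from typing import Dict, Iterable


class UCB1:
    """Standard (non-contextual) UCB1: pick the arm with the highest mean plus bonus."""

    def __init__(self, arms: Iterable[str]) -> None:
        self.counts: Dict[str, int] = {arm: 0 for arm in arms}
        self.means: Dict[str, float] = {arm: 0.0 for arm in self.counts}

    def select(self) -> str:
        # Play every arm once before using the confidence bound.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        total = sum(self.counts.values())

        def index(arm: str) -> float:
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(self.counts, key=index)

    def update(self, arm: str, reward: float) -> None:
        # Incremental update of the running mean reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical strategies with made-up, fixed payoffs per review.
payoff = {"exploit": 0.9, "explore": 0.4, "random": 0.1}
bandit = UCB1(payoff)
for _ in range(300):
    arm = bandit.select()
    bandit.update(arm, payoff[arm])
most_pulled = max(bandit.counts, key=bandit.counts.get)
```

The confidence bonus shrinks as an arm is pulled more often, so rarely tried strategies keep getting occasional pulls while the best-paying one dominates. A contextual variant would additionally condition each arm's estimate on features of the current item.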

Page 9

Situating the thesis work with respect to related work

Active Learning

Cost-sensitive

Proactive Learning
• Unreliable oracle
• Oracle variation

PrActive Learning
• Differential utility
• Dynamic cost
• Concept drift

Efficiency & Representation
• Feature-level feedback
• Feature acquisition
• Batch active learning