Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat Map Visualization for Biomarkers Detection of LGL Leukemia
By: David Garcia
Table of Contents
• What is LASSO?
• How does LASSO Work?
• LASSO and Feature Selection
• LGL Leukemia
• Statistical Biomarker Discovery
• Methods and Results
• Questions
What is LASSO?
• LASSO = Least Absolute Shrinkage and Selection Operator
• Developed by Robert Tibshirani in 1996
• LASSO is a method of feature selection
What is LASSO?
• Estimates regression coefficients bi for each feature xi
• Uses a penalty function via a tuning parameter λ
• Sets coefficients of less relevant features to zero
How Does LASSO Work?
Regression Equation:
ŷ = b0 + b1x1 + b2x2 + … + bnxn
x1, x2, ..., xn are the variables/features
ŷ is the predicted outcome
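The regression equation can be sketched in a few lines of Python. This is only an illustration: the function name predict and the numeric values are hypothetical and do not come from the slides.

```python
def predict(b0, coefs, x):
    """Return b0 + b1*x1 + ... + bn*xn for one sample x."""
    if len(coefs) != len(x):
        raise ValueError("need exactly one coefficient per feature")
    return b0 + sum(b * xi for b, xi in zip(coefs, x))


# Hypothetical intercept, coefficients, and feature values.
y_hat = predict(1.0, [2.0, 0.0, -1.0], [3.0, 5.0, 4.0])
print(y_hat)
```

A coefficient of zero, as for the second feature here, means that feature has no effect on the prediction.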
How Does LASSO Work?
y1 = b0 + b1x11 + b2x12 + … + bnx1n
y2 = b0 + b1x21 + b2x22 + … + bnx2n
.
.
.
ym = b0 + b1xm1 + b2xm2 + … + bnxmn
How Does LASSO Work?
y1 = b0 + b1x11 + b2x12 + … + bnx1n + e1
y2 = b0 + b1x21 + b2x22 + … + bnx2n + e2
.
.
.
ym = b0 + b1xm1 + b2xm2 + … + bnxmn + em
How Does LASSO Work?
• GOAL:
find b0, b1, …, bn that minimize the
sum of the squared prediction errors
e1² + e2² + ... + em²
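A minimal sketch of the least-squares criterion, assuming the usual form in which each error ei = yi - ŷi is squared before summing, so that positive and negative errors cannot cancel. The function name and the tiny data set are hypothetical.

```python
def sum_squared_errors(b0, coefs, X, y):
    """Sum of squared prediction errors e1^2 + ... + em^2."""
    total = 0.0
    for row, target in zip(X, y):
        pred = b0 + sum(b * v for b, v in zip(coefs, row))
        total += (target - pred) ** 2
    return total


# Hypothetical data generated by y = 2 * x1 with no noise.
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]

perfect = sum_squared_errors(0.0, [2.0], X, y)  # coefficients match the data
worse = sum_squared_errors(0.0, [1.0], X, y)    # slope too small
print(perfect, worse)
```

The fitting step searches for the coefficients that make this quantity as small as possible.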
How Does LASSO Work?
• Correlated features (xi that depend on one another) lead to regression coefficients (bi) with very large variances
• Tuning parameter λ is used to restrict the size of the regression coefficients
|b1| + |b2| + … + |bn| ≤ c
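For reference, the standard statement of the lasso problem (Tibshirani, 1996) in the notation of these slides, with m samples and n features. The bound applies to the absolute values of the coefficients, and the intercept b0 is conventionally left unpenalized. The second line is the equivalent penalized form, where each bound c corresponds to some value of λ.

```latex
\min_{b_0, b_1, \dots, b_n} \; \sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{n} |b_j| \le c

\min_{b_0, b_1, \dots, b_n} \; \sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2
+ \lambda \sum_{j=1}^{n} |b_j|
```

A smaller c (equivalently, a larger λ) forces more shrinkage of the coefficients.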
How Does LASSO Work?
[Figure: plot with axes marked at -c and c]
LASSO and Feature Selection
• Use of λ drives less relevant bi to zero
• LASSO can be used to filter features that contribute less to the expected result
ŷ = b0 + b1x1 + b2x2 + b3x3 + b4x4
LASSO and Feature Selection
λ = 0.5
ŷ = b0 + 0x1 + b2x2 + 0x3 + b4x4
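The zeroing of coefficients can be reproduced with a small sketch. It assumes the penalized objective (1/(2m)) · (sum of squared errors) + λ · (|b1| + … + |bn|) and solves it by coordinate descent with soft-thresholding, one common approach that is not necessarily what the presented study used. The data are synthetic and all names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2m)) * SSE + lam * sum(|b_j|); intercept is unpenalized."""
    m, n = len(X), len(X[0])
    y_mean = sum(y) / m
    x_means = [sum(row[j] for row in X) / m for j in range(n)]
    # Center features and outcome so the intercept can be recovered afterwards.
    Xc = [[row[j] - x_means[j] for j in range(n)] for row in X]
    yc = [v - y_mean for v in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(m)) / m for j in range(n)]
    b = [0.0] * n
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            rho = 0.0
            for i in range(m):
                # Residual with feature j's own contribution left out.
                partial = yc[i] - sum(b[k] * Xc[i][k] for k in range(n) if k != j)
                rho += Xc[i][j] * partial
            b[j] = soft_threshold(rho / m, lam) / col_sq[j]
    b0 = y_mean - sum(b[j] * x_means[j] for j in range(n))
    return b0, b


# Synthetic data: four mutually uncorrelated +/-1 features, eight samples.
h1 = [1, -1, 1, -1, 1, -1, 1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, -1, -1, 1, 1, -1, -1, 1]
h4 = [1, 1, 1, 1, -1, -1, -1, -1]
X = [list(map(float, row)) for row in zip(h1, h2, h3, h4)]
# x1 and x3 contribute only weakly to y; x2 and x4 contribute strongly.
y = [1.0 + 0.2 * x1 + 3.0 * x2 + 0.1 * x3 - 2.0 * x4 for x1, x2, x3, x4 in X]

b0, coefs = lasso_coordinate_descent(X, y, lam=0.5)
print(round(b0, 6), [round(c, 6) for c in coefs])
```

With λ = 0.5 the weak coefficients for x1 and x3 are set exactly to zero, while the strong ones for x2 and x4 survive, each shrunk toward zero by 0.5. This mirrors the pattern ŷ = b0 + 0x1 + b2x2 + 0x3 + b4x4.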
LASSO and Feature Selection
• LASSO can be used in bioinformatics to select genes that may contribute more to the presence of disease
λ = 0.5
ŷ = b0 + 0x1 + b2x2 + 0x3 + b4x4
xi is the transcription level of gene i
ŷ is the presence or absence of disease
LGL Leukemia
• LGL = large granular lymphocytic
• Results from lack of programmed cell death
• No current standard treatment
Statistical Biomarker Discovery
• Other methods of biomarker detection select genes based on biomedical perspectives
• Proposed method uses a purely statistical approach
• Results need to be verified via further biomedical studies
Methods and Results
• sample of 45 subjects with 10444 attributes
• 37 infected / 8 normal
• y = 0 for normal / 1 for infected
• sample data standardized based on z score
• combination of heat map visualization and LASSO
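The z-score standardization step can be sketched as follows. The slides do not say whether the population or the sample standard deviation was used, so the sample version here is an assumption, and the expression values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("cannot standardize a constant attribute")
    return [(v - mu) / sigma for v in values]


# Hypothetical expression levels of one gene across eight subjects.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_scores(expression)
print([round(z, 3) for z in standardized])
```

Standardizing each attribute this way puts all genes on the same scale, so the LASSO penalty treats their coefficients comparably.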
Methods and Results
• Testing set contains one sample
• Leave-one-out cross validation used to choose optimal λ
• Authors choose λ that results in the most shrinkage with a mean squared error within one standard error of the minimum
• λ = 0.02868446
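The one-standard-error rule described above can be sketched as below. The sketch assumes the cross-validation errors have already been computed, one squared error per held-out subject for each candidate λ. The function name, the candidate values, and the error values are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def one_standard_error_choice(lambdas, fold_errors):
    """Return (lambda with minimum mean error, lambda chosen by the 1-SE rule).

    fold_errors[k] holds the held-out squared errors for lambdas[k].
    """
    means = [mean(errs) for errs in fold_errors]
    ses = [stdev(errs) / sqrt(len(errs)) for errs in fold_errors]
    best = min(range(len(lambdas)), key=lambda k: means[k])
    threshold = means[best] + ses[best]
    # A larger lambda means more shrinkage, so take the largest eligible one.
    eligible = [lambdas[k] for k in range(len(lambdas)) if means[k] <= threshold]
    return lambdas[best], max(eligible)


# Hypothetical held-out errors for four candidate values of lambda.
lambdas = [0.01, 0.02, 0.05, 0.1]
fold_errors = [
    [0.10, 0.14, 0.12, 0.16],
    [0.09, 0.13, 0.11, 0.15],
    [0.10, 0.14, 0.12, 0.16],
    [0.20, 0.24, 0.22, 0.26],
]
lam_min, lam_1se = one_standard_error_choice(lambdas, fold_errors)
print(lam_min, lam_1se)
```

In this toy example the smallest mean error occurs at 0.02, but 0.05 is still within one standard error of it, so the rule prefers 0.05 and the sparser model it produces.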
Methods and Results
• 21 genes selected from LASSO method
• "FCGBP", "KIT", "CD34", "NLGN2", "SPINK2", "HIPK1", "SNORA31", "NR4A3", "SNORA27", "CASK", "SNORA4", "ACSM3", "NELL2", "NAGPA", "VPS25", "LYZ", "DUSP2", "GOLGA8A", "PHGDH", "SERF1A", "TNFSF9"
Methods and Results
• Database for Annotation, Visualization and Integrated Discovery (DAVID) tool used to classify genes
• One gene shows potential as LGL leukemia biomarker
Questions