Get Rich and Cure Cancer with Support Vector Machines
![Page 1: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/1.jpg)
+Get Rich and Cure Cancer with Support Vector Machines
(Your Summer Projects)
![Page 2: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/2.jpg)
+Kernel Trick
https://www.youtube.com/watch?v=3liCbRZPrZA
![Page 3: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/3.jpg)
+This is achieved with a polynomial kernel
Feature map:
Kernel:
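The formulas for this slide did not survive extraction. As an illustration only, assuming a two-dimensional input and the homogeneous degree-2 polynomial kernel (the slide may have used a different degree or an added constant), a feature map and its kernel can be written as:

```latex
\phi(x) = \left( x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2 \right)

K(x, z) = \langle \phi(x), \phi(z) \rangle
        = x_1^2 z_1^2 + 2\,x_1 x_2 z_1 z_2 + x_2^2 z_2^2
        = (x \cdot z)^2
```

The point of the example: the kernel value can be computed from the ordinary dot product in the original space, without ever forming $\phi(x)$.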
![Page 4: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/4.jpg)
+Optimization of transformed problem: Only kernel matters
Dual Lagrangian for transformed problem:
Optimal weight vector:
Thus, optimal hyperplane:
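The three formulas named on this slide were lost in extraction. A sketch of their standard hard-margin form follows, with assumed notation: training points $x_i$, labels $y_i \in \{-1, +1\}$, multipliers $\alpha_i$, feature map $\phi$ and kernel $K(x, z) = \langle \phi(x), \phi(z) \rangle$. The slide's own symbols may differ.

```latex
% Dual Lagrangian, maximised over \alpha_i \ge 0 subject to \sum_i \alpha_i y_i = 0
L_D(\alpha) = \sum_i \alpha_i
            - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)

% Optimal weight vector in the transformed space
w^* = \sum_i \alpha_i^* \, y_i \, \phi(x_i)

% Optimal hyperplane
w^* \cdot \phi(x) + b^* = \sum_i \alpha_i^* \, y_i \, K(x_i, x) + b^* = 0
```

In both the dual objective and the final hyperplane, $\phi$ appears only inside inner products, so only $K$ is ever needed.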
![Page 5: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/5.jpg)
+Kernel Trick
We can choose the kernel without first defining a feature map.
How to get a feature map from a kernel?
Define
i.e. map vectors in the original feature space to functions.
Inner product on transformed space:
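The definitions on this slide were lost in extraction. The standard construction it appears to describe, assuming $K$ is a symmetric positive-definite kernel, is sketched here with assumed notation:

```latex
% Feature map: send each vector x to the function "kernel centred at x"
\phi(x) = K(\cdot, x), \qquad \text{i.e. } \phi(x) \text{ is the function } z \mapsto K(z, x)

% Inner product on the transformed space (reproducing property)
\langle \phi(x), \phi(z) \rangle = \langle K(\cdot, x), K(\cdot, z) \rangle = K(x, z)
```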
![Page 6: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/6.jpg)
+Get rich off of support vectors
![Page 7: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/7.jpg)
+Making 5-day forecasts of financial futures
Given data on the returns for 5 days
Predict the return on the next day
To achieve this, we need to figure out which 5-day stretches tend to predict good returns on the 6th day, and which predict not-so-good returns
A training data set is used for this purpose
![Page 8: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/8.jpg)
+Making 5-day forecasts of financial futures
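| Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 |
|-------|-------|-------|-------|-------|-------|
| x11 | x12 | x13 | x14 | x15 | y1 |
| x21 | x22 | x23 | x24 | x25 | y2 |
| x31 | x32 | x33 | x34 | x35 | y3 |
| x41 | x42 | x43 | x44 | x45 | y4 |
| … | … | … | … | … | y5 |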
5-dimensional feature space
Return on the 6th day is the class label for the data
The routine learns how to classify 5-day-return data points by working with a training data set covering 500 days. It constructs a dividing hypersurface and uses it to predict the 6th-day return class for new data points.
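A minimal sketch of how the table above could be built from a single series of daily returns. The function name `make_windows`, the sample returns, and the choice to label each window by the sign of the next day's return are all assumptions for illustration; the slides do not say how the 6th-day return is turned into classes.

```python
def make_windows(returns, width=5):
    """Split a return series into (features, label) pairs.

    Each feature row holds `width` consecutive daily returns; the label is
    +1 if the following day's return is positive and -1 otherwise.
    """
    X, y = [], []
    for start in range(len(returns) - width):
        X.append(returns[start:start + width])
        y.append(1 if returns[start + width] > 0 else -1)
    return X, y


# Hypothetical daily returns, invented purely for illustration.
daily_returns = [0.01, -0.02, 0.005, 0.003, -0.01, 0.02, -0.004, 0.007]
X, y = make_windows(daily_returns)
```

Each row of `X` is one point in the 5-dimensional feature space, and `y` holds the matching 6th-day labels that a classifier would be trained on.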
![Page 9: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/9.jpg)
+Good results – you can try it yourself!
Complete with R code: http://www.r-bloggers.com/trading-with-support-vector-machines-svm/
![Page 10: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/10.jpg)
+Another example: gene expression in normal and cancerous tissue
Gene = unit of heredity
Human genome contains about 21,000 genes
Public domain image from Wikipedia
![Page 11: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/11.jpg)
+Another example: gene expression in normal and cancerous tissue
DNA is transcribed into RNA, which is translated into proteins
This is the process whereby the “genetic code” is made manifest as biological characteristics (genotype gives rise to phenotype)
Wikimedia Commons image by Madeleine Price Ball
![Page 12: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/12.jpg)
+Big question: Which genes are responsible for which outcomes?
In various tissues (e.g. tumor versus normal), which genes are active, hyperactive, or silent?
Can use DNA microarrays to measure gene expression levels.
![Page 13: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/13.jpg)
+DNA Microarray
https://www.youtube.com/watch?v=_6ZMEZK-alM
Source: National Human Genome Research Institute
![Page 14: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/14.jpg)
+Using support vector machines to determine which genes are important for cancer classification
![Page 15: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/15.jpg)
+Data
Data points: Patients
Features: Gene expression coefficients (activity level of a given gene)
Feature space will have a huge number of dimensions! Need a way to reduce.
Could examine all possible subspaces of feature space, but if the dimension N of feature space is in the thousands (one dimension per gene), the number of n-dimensional subspaces spanned by n of the features is $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
Too large for practical examination of each subspace
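To get a feel for the size of this count, here is a small sketch that counts the ways of choosing a subset of features. It takes the roughly 21,000 genes quoted earlier as the number of features, which is an illustrative assumption (a real microarray may measure a different number), and `count_feature_subsets` is a hypothetical helper name.

```python
from math import comb


def count_feature_subsets(total_features, subset_size):
    """Number of ways to pick `subset_size` features out of `total_features`."""
    return comb(total_features, subset_size)


pairs = count_feature_subsets(21000, 2)    # all 2-gene subsets
tens = count_feature_subsets(21000, 10)    # all 10-gene subsets
print(pairs, tens)
```

Even subsets of only ten genes are far too numerous to train and evaluate a classifier on each one.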
![Page 16: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/16.jpg)
+Generate ranking of features
A ranking of features allows us to make a nested sequence of subspaces of feature space F: $F_1 \subset F_2 \subset \cdots \subset F$
and then determine the optimum subspace to work with
One possibility for ranking: Work with each gene individually and get its correlation coefficient with the class label (i.e. find the correlation of gene expression level with the classification of tissue into tumor v. normal, or into two different types of cancer).
Note: ranking by correlation coefficient assumes all the features are independent of one another.
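A minimal sketch of correlation-based ranking, assuming Pearson correlation and ranking by absolute value. The function names and the tiny expression matrix are invented for illustration and are not taken from the slides or from any real data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(X, y):
    """Return feature indices, most to least correlated with the labels."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Hypothetical expression levels: 4 patients (rows) x 3 genes (columns).
X = [[5.0, 1.0, 3.0],
     [6.0, 2.0, 3.0],
     [1.0, 1.5, 3.0],
     [0.5, 2.5, 3.0]]
y = [1, 1, -1, -1]  # +1 = tumor, -1 = normal (made-up labels)
ranking = rank_by_correlation(X, y)
```

Each gene is scored on its own here, which is exactly the independence assumption the note above warns about.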
![Page 17: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/17.jpg)
+Generate ranking of features
Another possible way to generate a ranking of features: sensitivity analysis.
Have a training data set, already classified into two classes (cancerous v. non-cancerous, or cancer type 1 v. cancer type 2)
Construct a cost function to estimate error in classification
Sensitivity of cost function to removal of a feature measures the importance of that feature and allows the construction of a ranking.
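One way to make this concrete, offered as a sketch rather than as the slide's own derivation: assume a linear classifier with weight vector $w$ and take the margin objective of a linear SVM as the cost function. Both the choice of cost and the symbols $J$ and $\Delta J(i)$ are assumptions.

```latex
J(w) = \frac{1}{2} \lVert w \rVert^2 = \frac{1}{2} \sum_j w_j^2

% Removing feature i amounts to setting w_i to zero, which changes the cost by
\Delta J(i) = \frac{1}{2} w_i^2
```

Under this assumption, the features whose removal changes the cost least are those with the smallest $|w_i|$.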
![Page 18: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/18.jpg)
+Ranking by Support Vector Machines: Recursive Feature Elimination
Idea of how to use SVM to identify important features: Consider a cartoon scenario.
[Figure: cartoon with axes x1 and x2]
Indicates that the x1 direction is completely superfluous for classification.
![Page 19: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/19.jpg)
+Ranking by Support Vector Machines
This suggests the following recursive algorithm for ranking features:
Find weight vector, using all features
Identify the least important feature to be the one with the smallest (in absolute value) component of the weight vector
List that feature as least important and eliminate it from the data
Iterate the procedure, with the least important feature thrown out.
End result: Ranked list of features!
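The recursive procedure above can be sketched as follows. To keep the example dependency-free, the linear SVM is fitted with a simple full-batch subgradient descent on the hinge loss. That trainer is a stand-in chosen for illustration, not the solver used in the slides, and the function names, hyperparameters and toy data are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by full-batch subgradient descent on the hinge loss.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    A deliberately simple stand-in for a real SVM solver.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rfe_ranking(X, y):
    """Rank feature indices from most to least important via RFE."""
    remaining = list(range(len(X[0])))
    eliminated = []  # filled least-important first
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Least important feature: smallest weight component in absolute value.
        worst = min(range(len(remaining)), key=lambda k: abs(w[k]))
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


# Toy version of the cartoon scenario: only the second coordinate (x2)
# separates the two classes; the first coordinate (x1) is superfluous.
X = [[1.0, 2.0], [-1.0, 2.0], [1.0, -2.0], [-1.0, -2.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
ranking = rfe_ranking(X, y)
```

On this toy data the x1 component of the weight vector is the smaller one, so x1 is eliminated first and ends up last in the ranking. Retraining after every elimination is what distinguishes this from ranking once by the initial weights.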
![Page 20: Get Rich and Cure Cancer with Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062519/56815249550346895dc087bd/html5/thumbnails/20.jpg)
+Try this at home!
Data is available online!
http://www.broadinstitute.org/software/cprg/?q=node/55
Classify two types of leukemia.