Imputation Algorithms for Data Mining: Categorization and New Ideas
Aleksandar R. Mihajlovic
Technische Universität München
mihajlovic@mytum.de
+49 176 673 41387
+381 63 183 0081
Overview
• Explain the input-data-based categorization scheme for imputation algorithms
• Introduce a new categorization scheme of imputation algorithms
• Introduce some new ideas for re-categorization and improvement of existing algorithms and creation of new ones
Digitization of Microarray Data and the Missing Value Problem
• Missing SNPs in individual DNA
• These missing values statistically blur SNP allele association with the disease gene allele
Earlier Input Data Based Classification of Imputation Algorithms [2]
• Categorized according to the input data
Global Approach
Local Approach
Hybrid Approach
Knowledge Based Approach
Earlier Input Data Based Classification of Imputation Algorithms
• Classification example: Imputation Algorithms (briefly describe each)
  – Global
    • SVDimpute
  – Local
    • KNNimpute
  – Hybrid
    • LinCmb
  – Knowledge
    • GOimpute
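As an illustration of the local category, here is a minimal sketch of nearest-neighbour imputation in the spirit of KNNimpute. It is not the published algorithm. The function name `knn_impute_value`, the use of `None` for a missing entry, the root-mean-square distance over shared columns, and the unweighted mean of the k neighbours are all assumptions made for this example.

```python
import math


def knn_impute_value(rows, target_row, target_col, k=2):
    """Estimate rows[target_row][target_col] from the k most similar rows.

    Similarity is the root-mean-square difference over the columns that
    are observed in both rows (the target column itself is excluded).
    """
    target = rows[target_row]
    candidates = []
    for idx, other in enumerate(rows):
        # A row can only donate a value if it has one in the target column.
        if idx == target_row or other[target_col] is None:
            continue
        shared = [
            c for c in range(len(target))
            if c != target_col and target[c] is not None and other[c] is not None
        ]
        if not shared:
            continue
        dist = math.sqrt(
            sum((target[c] - other[c]) ** 2 for c in shared) / len(shared)
        )
        candidates.append((dist, other[target_col]))
    if not candidates:
        raise ValueError("no neighbour has an observed value in this column")
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    # Unweighted mean of the neighbours' values (a simplification).
    return sum(value for _, value in nearest) / len(nearest)


rows = [
    [1.0, 2.0, None],   # row with the missing value
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [9.0, 9.0, 100.0],  # far away, should be ignored for k=2
]
print(knn_impute_value(rows, target_row=0, target_col=2, k=2))  # 4.0
```

Only the two close rows contribute, so the distant fourth row does not pull the estimate.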
Ideas for Algorithmic Improvement [3]
• Ideas for a new categorization model of algorithms based on the methods they use
  – Link between the method used and the input data
  – Room for subcategories based on methods
• Revising the categorization model
  – Mendeleyevization
  – Hybridization
  – Transdisciplinarization
  – Retrajectorization
Mendeleyevization
• Catalyst
  – Probability based algorithms
    • EM: expectation-maximization algorithms have not been classified
• Accelerator
  – Algebraic based algorithms
    • With more memory and better processing power we can increase the number of subjects to be examined. This would improve the precision of Principal Component Analysis algorithms such as BPCA and of Singular Value Decomposition algorithms such as SVDimpute.
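To make the algebra based idea concrete, the following is a rough sketch of low-rank imputation in the spirit of SVDimpute, not the published method. It assumes a rank-1 model, marks missing entries with `None`, and obtains the leading singular triplet by power iteration as a dependency-free stand-in for a library SVD. The names `leading_singular_triplet` and `rank1_impute` are hypothetical.

```python
import math


def leading_singular_triplet(a, steps=100):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(steps):
        w = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
        z = [sum(a[i][j] * w[i] for i in range(rows)) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in z))
        v = [x / norm for x in z]
    w = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in w))
    u = [x / sigma for x in w]
    return sigma, u, v


def rank1_impute(matrix, iterations=200):
    """Fill None entries by repeatedly projecting onto a rank-1 approximation."""
    rows, cols = len(matrix), len(matrix[0])
    missing = [(i, j) for i in range(rows) for j in range(cols)
               if matrix[i][j] is None]
    filled = [row[:] for row in matrix]
    # Start from the mean of the observed values in the same column.
    for i, j in missing:
        observed = [matrix[r][j] for r in range(rows) if matrix[r][j] is not None]
        filled[i][j] = sum(observed) / len(observed)
    for _ in range(iterations):
        sigma, u, v = leading_singular_triplet(filled)
        # Only the missing cells are overwritten; observed data stay fixed.
        for i, j in missing:
            filled[i][j] = sigma * u[i] * v[j]
    return filled


data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, None],  # the observed entries follow a rank-1 pattern
]
completed = rank1_impute(data)
print(round(completed[2][2], 3))
```

Because every row and column feeds the singular vectors, adding more subjects gives the global estimate more data to work with, which is the point the slide makes about memory and processing power.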
Mendeleyevization
[Figure: categorization tree. Imputation Algorithms, Global branch: Probability Based, Algebra Based]
Hybridization
• Symbiosis
  – NN Based and Regression Based: the local algorithms can be classified as both symbiotic and synergic. The difference lies in the data types available for the imputation process. Depending on the data set, the appropriate algorithm from the statistical closeness category can be selected.
• Synergy
  – Statistical Closeness: nearest neighbor (NN) based and regression based algorithms can be made to work together. Neither is too computationally expensive, so running both is feasible. It can be assumed that the regression based result can correct the NN based result by averaging the two.
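The synergy idea, averaging a regression based estimate with an NN based one, can be sketched as follows. This is only an illustration under simple assumptions: a single predictor variable, an ordinary least-squares line, a one-nearest-neighbour lookup, and equal weights for the two results. All function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbour_estimate(xs, ys, x_query):
    """Return the y value of the observation whose x is closest to x_query."""
    idx = min(range(len(xs)), key=lambda i: abs(xs[i] - x_query))
    return ys[idx]


def combined_estimate(xs, ys, x_query):
    """Average the regression based and the NN based estimates."""
    slope, intercept = least_squares_line(xs, ys)
    regression_value = slope * x_query + intercept
    nn_value = nearest_neighbour_estimate(xs, ys, x_query)
    return (regression_value + nn_value) / 2


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(combined_estimate(xs, ys, 2.4))  # regression gives about 4.8, NN gives 4.0
```

Equal weighting is the simplest choice. A weighted combination would be a natural variation, but the slides do not specify one.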
Hybridization
[Figure: categorization tree. Imputation Algorithms, Local branch: NN Based, Regression Based, Statistical Closeness]
Transdisciplinarization
• Modification
  – Modified NN:
    • Modify KNN to include additional parameters
      – Compare large K to small K or find the average of all plausible K values
      – Use different numbers of flanking markers
      – Average out all possible outcomes
• Mutation
  – Modified probability
    • Compare the probabilities of the flanking marker sequences around the j’th SNP allele of the i’th subject with those of the remaining subjects. The value that comes with the highest-probability sequence wins.
Transdisciplinarization (1)
[Figure: categorization tree. Imputation Algorithms, Local branch: NN Based, Regression Based, Statistical Closeness, Modified NN]
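One way to read the Modified NN idea of averaging over all plausible K values is sketched below. It assumes the neighbours' values are numeric and already sorted from nearest to farthest. The names `knn_estimate` and `averaged_over_k` are hypothetical. For categorical SNP alleles a vote would be more appropriate than a mean.

```python
def knn_estimate(sorted_neighbour_values, k):
    """Mean of the k nearest values (the list is ordered nearest first)."""
    nearest = sorted_neighbour_values[:k]
    return sum(nearest) / len(nearest)


def averaged_over_k(sorted_neighbour_values, k_values):
    """Average the KNN estimates obtained for each K in k_values."""
    estimates = [knn_estimate(sorted_neighbour_values, k) for k in k_values]
    return sum(estimates) / len(estimates)


# Values of the missing feature in the neighbours, nearest first.
neighbour_values = [2.0, 4.0, 6.0, 8.0]
print(knn_estimate(neighbour_values, 1))                   # 2.0 (small K)
print(knn_estimate(neighbour_values, 4))                   # 5.0 (large K)
print(averaged_over_k(neighbour_values, [1, 2, 3, 4]))     # 3.5
```

The averaged result sits between the small-K and large-K estimates, which is the comparison the slide proposes.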
Transdisciplinarization (2)
[Figure: categorization tree. Imputation Algorithms, Global branch: Probability Based, Algebra Based, Modified Probability]
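The Modified Probability idea can be sketched as a frequency count over flanking markers. This is an interpretation, not the author's implementation. Sequences are assumed to be equal-length strings with `?` marking a missing allele, the probability is estimated as a relative frequency among subjects whose flanking markers match exactly, and the name `impute_by_flanking_pattern` is hypothetical.

```python
from collections import Counter


def impute_by_flanking_pattern(sequences, subject, position, flank=1):
    """Pick the allele seen most often among subjects with matching flanks.

    Returns (allele, relative_frequency), or (None, 0.0) if no other
    subject shares the flanking markers.
    """
    target = sequences[subject]
    start = max(0, position - flank)
    left = target[start:position]
    right = target[position + 1:position + 1 + flank]
    counts = Counter()
    for idx, seq in enumerate(sequences):
        if idx == subject or seq[position] == "?":
            continue
        # Count the allele only when both flanks agree with the target.
        if seq[start:position] == left and seq[position + 1:position + 1 + flank] == right:
            counts[seq[position]] += 1
    if not counts:
        return None, 0.0
    allele, hits = counts.most_common(1)[0]
    return allele, hits / sum(counts.values())


sequences = ["AC?TG", "ACGTG", "ACGTG", "ACATG", "TTGCC"]
allele, probability = impute_by_flanking_pattern(sequences, subject=0, position=2, flank=2)
print(allele, round(probability, 3))  # G 0.667
```

Changing `flank` corresponds to the slide's suggestion of using different numbers of flanking markers.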
Retrajectorization
• Reparametrization
  – Proteome Based and Gene Based Algorithms
    • How protein/amino acid/codon databases can be utilized in gene imputation is being researched
• Regranularization
  – Process Based: Data Set Partitioning
    • Check whether there is linkage disequilibrium (LD) between the i’th subject with missing values and other sets of diseased patients
      – Sets are organized by the geographic origin of the subjects
    • Find the frequencies of the j’th SNP alleles (the missing SNP allele under scrutiny in one subject) in the other sets
      – If LD exists between another set and the subject, take that set's allele into account; if not, leave it out
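A possible reading of the data set partitioning step is sketched below. It swaps in a concrete, standard LD measure that the slides do not name: the r² statistic between a flanking marker and the j’th SNP, computed separately in each geographic set. The 0/1 allele coding, the threshold of 0.2, and the names `r_squared` and `pooled_allele_counts` are assumptions for illustration only.

```python
from collections import Counter


def r_squared(haplotypes):
    """LD r^2 for a list of (marker_allele, target_allele) pairs coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def pooled_allele_counts(population_sets, threshold=0.2):
    """Pool target-allele counts from the sets whose r^2 reaches the threshold."""
    counts = Counter()
    used = []
    for name, haplotypes in population_sets.items():
        if r_squared(haplotypes) >= threshold:
            used.append(name)
            counts.update(b for _, b in haplotypes)
    return counts, used


population_sets = {
    "north": [(1, 1), (1, 1), (0, 0), (0, 0)],  # marker and SNP fully linked
    "south": [(1, 1), (1, 0), (0, 1), (0, 0)],  # no association
}
counts, used = pooled_allele_counts(population_sets)
print(used)          # ['north']
print(dict(counts))  # {1: 2, 0: 2}
```

Only the set that shows LD contributes allele frequencies. The set without LD is ignored, as the slide prescribes.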
Retrajectorization
[Figure: categorization tree. Imputation Algorithms, Knowledge branch: Gene Based, Proteome Based, Process Based]
The Whole Categorization Tree
Imputation Algorithms
• Global: Probability Based, Algebra Based, Modified Probability
• Local: NN Based, Regression Based, Statistical Closeness, Modified NN
• Hybrid
• Knowledge: Gene Based, Proteome Based, Process Based
References
• [1] Frey M., Gierl A., De Angelis, Beckers J., Kieser A., Genomics Lecture; Fakultät für Biowissenschaft, TUM, Weihenstephan, Freising bei München; Winter Semester 2011
• [2] Liew A.W., Law N., Yan H., Missing value imputation for gene expression data: computational techniques to recover missing data from available information, Briefings in Bioinformatics, December 14, 2010, pp.3
• [3] Milutinovic V., Korolija N., A Short Course for PhD Students in Science and Engineering: How to Write Papers for JCR Journals
Questions
Aleksandar R. Mihajlovic
Technische Universität München
mihajlovic@mytum.de
+49 176 673 41387
+381 63 183 0081
The End