cs 548 knowledge discovery and data mining project 1
![Page 1: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/1.jpg)
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING
Fall 2016 - Project 1
By:
Yousef Fadila, ML Tlachac, Francisco Guerrero
![Page 2: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/2.jpg)
Filling in the missing value
Discretize: ? = “unknown”
Manually filling in the data:
? = (Germany GDPPC + Switzerland GDPPC) = 31.35
Regression imputation:
GDPPC = 2.1069 * LIFE-EXP
      + 0.1911 * AC-S-ED
      - 40.4882 * (SWL = [175-200), [125-150), [200-225), [225-250), [250-275))
      - 16.6881 * (SWL = [200-225), [225-250), [250-275))
      - 100.3841
GDPPC (USA) = 2.1069 * 77.4 + 0.1911 * 94.6 - 40.4882 * 1 - 16.6881 * 1 - 100.3841 = 23.59
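The regression imputation above can be checked with a few lines of Python. This is a minimal sketch: the coefficients and the USA inputs (77.4, 94.6) come from the slide, while the function name `predict_gdppc` and the constant names are made up for illustration. The slide does not state the USA's exact SWL bin, only that both indicator terms equal 1, so `"[225-250)"` is an assumed placeholder from the second set.

```python
# Coefficients copied from the regression model on the slide.
COEF_LIFE_EXP = 2.1069
COEF_AC_S_ED = 0.1911
COEF_SWL_FIRST = -40.4882    # applies when SWL falls in SWL_FIRST_SET
COEF_SWL_SECOND = -16.6881   # applies when SWL falls in SWL_SECOND_SET
INTERCEPT = -100.3841

SWL_FIRST_SET = {"[175-200)", "[125-150)", "[200-225)", "[225-250)", "[250-275)"}
SWL_SECOND_SET = {"[200-225)", "[225-250)", "[250-275)"}


def predict_gdppc(life_exp, ac_s_ed, swl_bin):
    """Evaluate the linear model; each SWL term is a 0/1 indicator."""
    in_first = 1 if swl_bin in SWL_FIRST_SET else 0
    in_second = 1 if swl_bin in SWL_SECOND_SET else 0
    return (COEF_LIFE_EXP * life_exp
            + COEF_AC_S_ED * ac_s_ed
            + COEF_SWL_FIRST * in_first
            + COEF_SWL_SECOND * in_second
            + INTERCEPT)


# Assumption: the USA's SWL bin is not given on the slide; any bin in
# SWL_SECOND_SET makes both indicators 1, matching the slide's arithmetic.
usa_gdppc = predict_gdppc(77.4, 94.6, "[225-250)")
print(round(usa_gdppc, 2))  # 23.59
```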
![Page 3: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/3.jpg)
Transforming COUNTRY attribute
| COUNTRY | HDI score | COUNTRY | HDI score |
|---|---|---|---|
| Ethiopia | LOW | Switzerland | VERY-HIGH |
| India | MEDIUM | Germany | VERY-HIGH |
| Mexico | HIGH | Japan | VERY-HIGH |
| Thailand | HIGH | Canada | VERY-HIGH |
| Russia | HIGH | Brazil | HIGH |
| USA | VERY-HIGH | France | VERY-HIGH |
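One way to apply this transformation is a plain lookup table. The sketch below uses the country-to-HDI pairs listed on this slide; the names `HDI_LEVEL` and `transform_country` are hypothetical, and raising an error for an unlisted country is an assumption, not something the slide specifies.

```python
# Mapping taken from the slide: nominal COUNTRY -> ordinal HDI category.
HDI_LEVEL = {
    "Ethiopia": "LOW",
    "India": "MEDIUM",
    "Mexico": "HIGH",
    "Thailand": "HIGH",
    "Russia": "HIGH",
    "Brazil": "HIGH",
    "USA": "VERY-HIGH",
    "Switzerland": "VERY-HIGH",
    "Germany": "VERY-HIGH",
    "Japan": "VERY-HIGH",
    "Canada": "VERY-HIGH",
    "France": "VERY-HIGH",
}


def transform_country(country):
    """Replace a country name with its HDI category (KeyError if unlisted)."""
    return HDI_LEVEL[country]


countries = ["Ethiopia", "Germany", "Brazil"]
print([transform_country(c) for c in countries])
```

The 12 distinct country values collapse into 4 categories, which gives a learner far fewer nominal values to handle.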
![Page 4: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/4.jpg)
Discretizing AC-S-ED
Equal width
Equal frequency
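The difference between the two discretization schemes can be sketched as follows. This is an illustrative implementation, not Weka's filter, and the input values are hypothetical because the AC-S-ED column is not reproduced in the transcript. Both function names are made up for this example.

```python
def equal_width_bins(values, k):
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: one bin
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bins by rank so each bin holds about len(values)/k items."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical values with one outlier (not the project's data).
values = [10.0, 20.0, 30.0, 40.0, 50.0, 100.0]
print(equal_width_bins(values, 3))      # [0, 0, 0, 1, 1, 2]
print(equal_frequency_bins(values, 3))  # [0, 0, 1, 1, 2, 2]
```

With an outlier, equal width leaves the top bin almost empty, while equal frequency keeps the bins balanced at the cost of uneven interval widths. Note that this rank-based sketch may split tied values across bins.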
![Page 5: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/5.jpg)
CfsSubsetEval algorithm
![Page 6: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/6.jpg)
Merit
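The CfsSubsetEval formula used to calculate merit is

$$\text{merit} = \frac{\sum_j \operatorname{corr}(a_j, t)}{\sqrt{\sum_j \sigma(a_j)^2 + 2\,\operatorname{corr}(a_{j_1}, a_{j_2}) \prod_j \sigma(a_j)}}$$

where $t$ is the target attribute (play) and the $a_j$ are the selected attributes (outlook & humidity).

$$\begin{aligned}
\text{merit} &= \frac{\operatorname{corr}(\text{outlook}, \text{play}) + \operatorname{corr}(\text{humidity}, \text{play})}{\sqrt{1^2 + 1^2 + 2\,\operatorname{corr}(\text{humidity}, \text{outlook})\,(1)(1)}} \\
&= \frac{0.1960 + 0.1565}{\sqrt{1 + 1 + 2\,(0.01610)}} = \frac{0.3525}{\sqrt{2.032202}} = 0.2473
\end{aligned}$$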
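The merit arithmetic above can be reproduced in a short script. This sketch covers only the two-attribute case with unit standard deviations, as used on the slide; the function name `cfs_merit_two_attributes` is hypothetical, and the correlation values are the ones quoted on the slide rather than recomputed from the weather data.

```python
import math


def cfs_merit_two_attributes(corr_a1_target, corr_a2_target, corr_a1_a2):
    """Merit of a two-attribute subset, assuming sigma = 1 for both attributes."""
    numerator = corr_a1_target + corr_a2_target
    denominator = math.sqrt(1**2 + 1**2 + 2 * corr_a1_a2 * 1 * 1)
    return numerator / denominator


# Values quoted on the slide for outlook, humidity and play.
merit = cfs_merit_two_attributes(0.1960, 0.1565, 0.01610)
print(round(merit, 4))  # 0.2473
```

The denominator grows with the correlation between the selected attributes, so a subset of mutually redundant attributes scores lower than a subset of equally predictive but uncorrelated ones.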
![Page 7: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/7.jpg)
Observing the Data
![Page 8: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/8.jpg)
Correlation Matrix
Remove: numbUrban & medFamIncome
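A correlation matrix is commonly used to spot redundant attributes. The sketch below assumes that attributes were dropped because of high pairwise correlation, which the slide does not state explicitly. The column names, data values, and the 0.9 threshold are all hypothetical; the project's actual matrix is not reproduced in the transcript.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical columns (not the project's data).
columns = {
    "attr_a": [1.0, 2.0, 3.0, 4.0],
    "attr_b": [2.0, 4.0, 6.0, 8.1],   # nearly a multiple of attr_a
    "attr_c": [4.0, 1.0, 3.0, 2.0],
}
THRESHOLD = 0.9  # assumed cut-off for "highly correlated"

names = list(columns)
redundant_pairs = [
    (p, q)
    for i, p in enumerate(names)
    for q in names[i + 1:]
    if abs(pearson(columns[p], columns[q])) > THRESHOLD
]
print(redundant_pairs)  # [('attr_a', 'attr_b')]
```

For each flagged pair, one of the two attributes is a candidate for removal.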
![Page 9: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/9.jpg)
Multidimensional arrays and OLAP operations
Operations:
1. Roll-up time from day to year
2. Slice year == 2014
3. Roll-up patients from individual patients to all
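The three operations above can be sketched on a small fact table. The rows, the `department` dimension, and the `visits` measure are hypothetical, since the slide's cube is only shown as an image; only the sequence of operations comes from the slide.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, patient, department, visits).
facts = [
    ("2013-05-01", "p1", "cardiology", 1),
    ("2014-01-10", "p1", "cardiology", 2),
    ("2014-03-22", "p1", "cardiology", 1),
    ("2014-03-22", "p2", "cardiology", 1),
    ("2014-06-30", "p2", "oncology", 3),
    ("2015-07-04", "p3", "oncology", 1),
]

# 1. Roll-up time from day to year: aggregate over (year, patient, department).
by_year = defaultdict(int)
for date, patient, department, visits in facts:
    by_year[(int(date[:4]), patient, department)] += visits

# 2. Slice year == 2014: fix the time dimension, which drops it from the key.
slice_2014 = {(p, d): v for (y, p, d), v in by_year.items() if y == 2014}

# 3. Roll-up patients from individual patients to all: sum the patient dimension away.
all_patients = defaultdict(int)
for (patient, department), visits in slice_2014.items():
    all_patients[department] += visits

result = dict(all_patients)
print(result)  # {'cardiology': 4, 'oncology': 3}
```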
![Page 10: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/10.jpg)
OLAP operations on car sales data
1. Rolling-up
2. Drilling-down
3. Slicing
4. Dicing
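The four operations can be contrasted side by side on a toy cube. The sales rows, dimension names, and the `aggregate` helper below are hypothetical and only illustrate the operations; they are not the project's car sales data.

```python
from collections import defaultdict

# Hypothetical sales facts (not the project's data).
sales = [
    {"year": 2015, "quarter": "Q1", "region": "East", "model": "sedan", "units": 10},
    {"year": 2015, "quarter": "Q2", "region": "East", "model": "sedan", "units": 5},
    {"year": 2015, "quarter": "Q1", "region": "West", "model": "suv", "units": 7},
    {"year": 2016, "quarter": "Q1", "region": "East", "model": "suv", "units": 4},
    {"year": 2016, "quarter": "Q3", "region": "West", "model": "truck", "units": 6},
]


def aggregate(rows, dims):
    """Sum units over every dimension not listed in dims."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["units"]
    return dict(totals)


# Rolling-up: coarser time granularity (quarter -> year), model summed away.
rolled_up = aggregate(sales, ["year", "region"])

# Drilling-down: back to finer granularity by re-adding the quarter level.
drilled_down = aggregate(sales, ["year", "quarter", "region"])

# Slicing: fix a single value on one dimension.
sliced = [row for row in sales if row["region"] == "East"]

# Dicing: select a sub-cube with conditions on two or more dimensions.
diced = [row for row in sales
         if row["year"] == 2015 and row["model"] in {"sedan", "suv"}]

print(rolled_up[(2015, "East")])           # 15
print(drilled_down[(2015, "Q1", "East")])  # 10
print(len(sliced), len(diced))             # 3 3
```

Slicing removes one dimension by fixing it, whereas dicing keeps all dimensions and restricts the values allowed on several of them.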
![Page 11: CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1](https://reader036.vdocuments.us/reader036/viewer/2022062503/58d13f161a28ab455d8b5627/html5/thumbnails/11.jpg)
Thank You
Questions?