A Classification Data Set for PLM



TRANSCRIPT

Page 1: A Classification Data Set for PLM

(c) 2000-2005 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr


A Classification Data Set for PLM

Information Theory of Learning

Sep. 15, 2005

Page 2: A Classification Data Set for PLM


Introduction to Data (1)

Handwritten digits (0 ~ 9)

From 32x32 bitmaps, non-overlapping 4x4 blocks are extracted.

Page 3: A Classification Data Set for PLM


Introduction to Data (2)

The number of "on" pixels is counted in each block (range: 0 ~ 16).

Each block is set to 1 if the count is greater than 1, otherwise 0.

The original 32x32 bitmap is thus reduced to an 8x8 binary matrix.

[Figure: example of a reduced binary matrix; recoverable row: 0 0 0 1 1 0 0 0]
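The reduction described on this slide can be sketched as follows. This is a minimal illustration, not the lab's own preprocessing code: the function name `reduce_bitmap`, the use of NumPy, and the synthetic `demo` bitmap are all assumptions introduced here. The threshold follows the slide's rule (count greater than 1).

```python
import numpy as np

def reduce_bitmap(bitmap, block=4, threshold=1):
    """Reduce a square 0/1 bitmap to a binary matrix of block indicators.

    Hypothetical helper: each non-overlapping block x block tile becomes 1
    if it contains more than `threshold` "on" pixels, otherwise 0.
    """
    bitmap = np.asarray(bitmap)
    n = bitmap.shape[0] // block  # 32 // 4 = 8 blocks per side
    # Reshape to (n, block, n, block) so that axes 1 and 3 index the
    # pixels inside each tile, then sum them to get per-block counts.
    counts = bitmap.reshape(n, block, n, block).sum(axis=(1, 3))
    # counts holds the number of "on" pixels per block (0 to 16).
    return (counts > threshold).astype(int)

# Synthetic bitmap (an assumption, not data from the slides).
demo = np.zeros((32, 32), dtype=int)
demo[0:4, 12:20] = 1   # two fully "on" blocks in the first block row
demo[4, 0] = 1         # a lone pixel: a count of 1 is not greater than 1
reduced = reduce_bitmap(demo)
print(reduced[0])
```

With this synthetic input, the first row of the reduced matrix has ones only in block columns 3 and 4, and the lone pixel is thresholded away.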

Page 4: A Classification Data Set for PLM


Introduction to Data (3)

Data

train.txt: 3823 examples

test.txt: 1797 examples

Representation

In the text files, each row consists of 64 binary values, with its label in the 65th column.
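A row in this format could be read roughly as below. The slides do not state the column delimiter, so whitespace separation is an assumption here, and `parse_row` and `load_dataset` are hypothetical helper names rather than anything shipped with the data set.

```python
def parse_row(line):
    """Split one text row into (features, label).

    Assumes whitespace-separated integers: 64 binary values
    followed by the class label in the 65th column.
    """
    values = [int(tok) for tok in line.split()]
    if len(values) != 65:
        raise ValueError(f"expected 65 columns, got {len(values)}")
    return values[:64], values[64]

def load_dataset(path):
    """Read every non-empty row of a file such as train.txt."""
    features, labels = [], []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                x, y = parse_row(line)
                features.append(x)
                labels.append(y)
    return features, labels

# A made-up row (not taken from train.txt) to exercise the parser.
sample = " ".join(["0"] * 3 + ["1"] * 2 + ["0"] * 59 + ["7"])
x, y = parse_row(sample)
print(len(x), y)
```

If the files turn out to be comma-separated, only the `line.split()` call would need to change.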

Class distribution

Class    0    1    2    3    4    5    6    7    8    9

Train  376  389  380  389  387  376  377  387  380  382

Test   178  182  177  183  181  182  181  179  174  180
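As a quick consistency check, the per-class counts can be summed and compared against the stated file sizes. The variable names below are illustrative; the numbers are copied from the table above.

```python
# Per-class example counts for digits 0..9, copied from the slide.
train_counts = [376, 389, 380, 389, 387, 376, 377, 387, 380, 382]
test_counts = [178, 182, 177, 183, 181, 182, 181, 179, 174, 180]

train_total = sum(train_counts)
test_total = sum(test_counts)
print(train_total, test_total)  # 3823 1797, matching train.txt and test.txt

# Share of each class in the training set, to see how balanced it is.
train_shares = [count / train_total for count in train_counts]
for digit, share in enumerate(train_shares):
    print(f"digit {digit}: {share:.3f}")
```

Every class holds close to one tenth of the examples, so the data set is nearly balanced.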


Page 6: A Classification Data Set for PLM


Preliminary Result

k-NN result (k = 3) on the test set

Accuracy: 93.10% (ratio of correctly classified examples)

   a   b   c   d   e   f   g   h   i   j   <-- classified as
 174   0   0   0   1   1   2   0   0   0 | a = 0
   0 178   1   0   1   0   2   0   0   0 | b = 1
   0   9 167   0   0   0   0   1   0   0 | c = 2
   1   2   0 174   0   1   0   1   2   2 | d = 3
   0  11   0   0 168   0   0   0   0   2 | e = 4
   0   2   0   1   1 172   1   0   0   5 | f = 5
   2   1   0   0   0   1 176   0   1   0 | g = 6
   0   0   1   0   1   0   0 174   1   2 | h = 7
   1  16   4   7   1   6   2   1 132   4 | i = 8
   2   2   0  10   0   4   0   1   3 158 | j = 9
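The sketch below shows one way a 3-nearest-neighbour classifier over the 64 binary features might look, and then recomputes the reported accuracy from the confusion matrix. The slides do not say which distance or tie-breaking rule was used, so Hamming distance and the tie handling here are assumptions, and `knn_predict` is a hypothetical name. The matrix values are copied from the slide, with rows as actual classes and columns as predicted classes.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows nearest in Hamming distance.

    Assumption: ties in distance are broken by the smaller label, because
    the (distance, label) pairs are sorted together.
    """
    dists = sorted(
        (sum(a != b for a, b in zip(row, query)), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Confusion matrix from the slide: row = actual digit, column = predicted.
confusion = [
    [174, 0, 0, 0, 1, 1, 2, 0, 0, 0],
    [0, 178, 1, 0, 1, 0, 2, 0, 0, 0],
    [0, 9, 167, 0, 0, 0, 0, 1, 0, 0],
    [1, 2, 0, 174, 0, 1, 0, 1, 2, 2],
    [0, 11, 0, 0, 168, 0, 0, 0, 0, 2],
    [0, 2, 0, 1, 1, 172, 1, 0, 0, 5],
    [2, 1, 0, 0, 0, 1, 176, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 174, 1, 2],
    [1, 16, 4, 7, 1, 6, 2, 1, 132, 4],
    [2, 2, 0, 10, 0, 4, 0, 1, 3, 158],
]

correct = sum(confusion[i][i] for i in range(10))  # diagonal entries
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"{correct}/{total} = {100 * accuracy:.2f}%")  # 1673/1797 = 93.10%

# Per-class recall: fraction of each actual digit classified correctly.
recall = [confusion[i][i] / sum(confusion[i]) for i in range(10)]
hardest = min(range(10), key=lambda i: recall[i])
print("lowest recall: digit", hardest)
```

The diagonal sums to 1673 of 1797 test examples, which reproduces the 93.10% figure. Digit 8 has the lowest recall, with 16 of its 174 examples classified as 1.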