
Learning-Based Indexing of Works of Art

Kurt Grieb

Presentation Overview

Research divided into two parts:

Parallel Upgrade of ALIP
– Structure of Parallelization
– Results

EMPEROR Database Tests
– Setup of Tests
– Results

Reasons for Parallelization

ALIP's statistical computations are expensive

Corel Image Library comparison:
– 15–20 minutes
– Unacceptable for Web and other applications

Parallelization Concept

One server receives a request and divides the workload among all of the clients.

Server

Client (concepts 1–30)

Client (concepts 31–60)

. . . .

Client (concepts 541–570)

Client (concepts 571–600)
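The division in the diagram can be sketched as contiguous ranges of concept IDs. This is an illustration under assumptions, not ALIP's code: concepts are assumed to be numbered 1..N, and `split_concepts` is a hypothetical name. With 600 concepts and 20 clients it produces exactly the ranges shown, since 600 / 30 = 20.

```python
def split_concepts(num_concepts: int, num_clients: int) -> list[tuple[int, int]]:
    """Split concept IDs 1..num_concepts into contiguous, inclusive ranges,
    one per client. Range sizes differ by at most one concept."""
    if num_clients < 1:
        raise ValueError("need at least one client")
    base, extra = divmod(num_concepts, num_clients)
    ranges = []
    start = 1
    for i in range(num_clients):
        # The first `extra` clients take one additional concept each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more clients than concepts: remaining clients stay idle
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

`split_concepts(600, 20)` returns `(1, 30), (31, 60), …, (571, 600)`. When the concepts do not divide evenly, the first clients take one extra concept each, so no client is left with a much larger share.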

Parallelization Structure

Perl GUI → Server: request with URL

Server → Clients: range of concepts

Clients → Server: likelihoods

Server → Perl GUI: best fit
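A minimal sketch of the request, likelihood, and best-fit flow, assuming each client returns one log-likelihood per concept in its range and the server simply keeps the highest-scoring concepts. `score_concept`, `client_job`, `best_fit`, and `top_k` are hypothetical names. `score_concept` is a deterministic placeholder rather than ALIP's statistical model, and threads stand in for separate client machines.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def score_concept(image_url: str, concept_id: int) -> float:
    """HYPOTHETICAL stand-in for scoring one image against one trained ALIP
    concept model. Returns a deterministic pseudo log-likelihood so the
    sketch runs without any models."""
    h = zlib.crc32(f"{image_url}:{concept_id}".encode())
    return -(h % 10_000) / 100.0


def client_job(image_url: str, first: int, last: int) -> list[tuple[float, int]]:
    """One client: score every concept in its inclusive range [first, last]."""
    return [(score_concept(image_url, c), c) for c in range(first, last + 1)]


def best_fit(image_url: str, ranges: list[tuple[int, int]], top_k: int = 5) -> list[int]:
    """Server: hand each concept range to a client, merge the returned
    likelihoods, and return the top_k concept IDs (highest score first)."""
    # Threads stand in for separate client processes or machines.
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        futures = [pool.submit(client_job, image_url, a, b) for a, b in ranges]
        scored = [pair for f in futures for pair in f.result()]
    scored.sort(reverse=True)  # highest log-likelihood first
    return [concept for _, concept in scored[:top_k]]
```

Because the server only sorts the returned (score, concept) pairs, the result is the same as scoring every concept on one processor; parallelism changes the wall-clock time, not the answer.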

Results

The Speedup of ALIP

[Figure: speedup vs. number of processors used (0–10), with linear fit y = 0.6275x + 0.6884]

Results

600 concepts can now be computed in roughly 40 seconds over 30 processors.

Roughly ideal speedup
– Using more processors on a smaller problem size reduces the efficiency of the speedup
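Speedup on p processors is conventionally S_p = T_1 / T_p, and efficiency is E_p = S_p / p, with E_p = 1 for ideal speedup. The sketch below evaluates the slide's fitted line y = 0.6275x + 0.6884 at a few processor counts. These are values of the fit, not measured timings, and the function names are illustrative.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_1 / T_p: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(s_p: float, processors: int) -> float:
    """E_p = S_p / p: 1.0 would be ideal (linear) speedup."""
    return s_p / processors


def fitted_speedup(processors: float) -> float:
    """Linear fit reported on the speedup slide: y = 0.6275x + 0.6884."""
    return 0.6275 * processors + 0.6884


for p in (2, 4, 8):
    s = fitted_speedup(p)
    print(f"p={p}: fitted speedup {s:.2f}, efficiency {efficiency(s, p):.2f}")
```

The loop prints fitted speedups of about 1.94, 3.20, and 5.71 for 2, 4, and 8 processors, with efficiency falling from about 0.97 to 0.80 to 0.71. This matches the observation that adding processors reduces efficiency.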

The EMPEROR Library

1,700 Chinese historical images

The Testing

Two sets of tests (9 and 20 concepts)
Four runs per set (best, worst, two random)
Four training sizes per run (3, 6, 9, 12)

Set 1
– Runs: Best Sub, Worst Sub, Random 1, Random 2
– Sizes: 3, 6, 9, 12

Set 2
– Runs: Best Sub, Worst Sub, Random 1, Random 2
– Sizes: 3, 6, 9, 12
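The test plan is a full grid of set × run × training size. The sketch below enumerates it and adds a seeded random-subset helper for the Random runs. Assumptions: Set 1 has 9 concepts and Set 2 has 20 (the slide gives the counts but not the mapping), image IDs are strings, and `experiment_grid` and `random_training_subset` are hypothetical helpers, not the lab's scripts.

```python
import itertools
import random

# Assumption: sets are paired with concept counts in the order "9 and 20 concepts".
SETS = {"Set 1": 9, "Set 2": 20}
RUNS = ["Best Sub", "Worst Sub", "Random 1", "Random 2"]
SIZES = [3, 6, 9, 12]  # training images per concept


def experiment_grid() -> list[tuple[str, str, int]]:
    """Every (set, run, training size) combination in the test plan."""
    return list(itertools.product(SETS, RUNS, SIZES))


def random_training_subset(image_ids: list[str], size: int, seed: int) -> list[str]:
    """HYPOTHETICAL helper for a random run: draw `size` distinct training
    images for one concept, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(image_ids, size)
```

The grid has 2 × 4 × 4 = 32 configurations, and fixing the seed makes each Random run reproducible.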

Motivation For Test Structure

Effects of more specific classes
Effects of different training classes
Determine reasonable training sizes

Results

Set 1 Total Percentages

[Figure: percentage correct (plotted as a fraction, 0–0.8) vs. sample size (3, 6, 9, 12) for the Best Case, Worst Case, Random 1, and Random 2 runs]

Random Generation

Interesting Cases / Notable Trends

Set One vs. Set Two
The Black and White Sketches
General Trends vs. Specific Classes
Weak Classes
Misclassification of Similar Objects
– Black & White Images vs. Text
– All Faces vs. Color/BW Faces
– Faces and Upper Bodies

The Black and White Sketches

Performed the best of all classes
– Accuracy of 99% across all tests
– Due to the difference between this class and most other classes

Interesting Cases / Notable Trends

Overall accuracy across all classes went up with more training images
– In certain classes, accuracy went down as all concepts were trained with more images
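The two observations are compatible because overall accuracy pools every test image, so gains in some classes can outweigh a loss in another. A minimal sketch of computing both numbers from (true class, predicted class) pairs; `accuracies` is a hypothetical helper, and in practice the pairs would come from ALIP's annotations of the test images.

```python
from collections import Counter


def accuracies(pairs):
    """pairs: iterable of (true_class, predicted_class) for test images.
    Returns (overall accuracy, {class: per-class accuracy})."""
    total, correct = Counter(), Counter()
    for truth, predicted in pairs:
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_class = {c: correct[c] / total[c] for c in total}
    return overall, per_class
```

With invented labels where class `a` drops from 2/2 to 1/2 correct while classes `b` and `c` improve, overall accuracy still rises from 0.5 to about 0.83.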

Paintings Accuracy

[Figure: percentage correct for Paintings (plotted as a fraction, 0.5–0.85) vs. number of training images (3, 6, 9, 12)]

Weak Classes

In certain concepts, a weak class outperformed other classes

This could be due to the openness of the concept spaces

Horses Comparison

[Figure: percentage correct for Horses (plotted as a fraction, 0.2–0.8) vs. training image size (axis 2–12) for the Best Case, Worst Case, Random 1, and Random 2 runs]

Misclassification of Similar Objects

Pictures containing more than one concept can sometimes confuse ALIP

Misclassification of Similar Objects
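One way to find which similar objects ALIP mixes up is to count the wrong (true, predicted) pairs, that is, the off-diagonal cells of a confusion matrix. A minimal sketch; `most_confused` is a hypothetical helper, and the class labels passed to it are up to the caller.

```python
from collections import Counter


def most_confused(pairs, top_n=3):
    """pairs: iterable of (true_class, predicted_class).
    Returns the top_n most frequent wrong (true, predicted) pairs with their
    counts, i.e. the largest off-diagonal cells of the confusion matrix."""
    errors = Counter((t, p) for t, p in pairs if t != p)
    return errors.most_common(top_n)
```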

Further Work

Overlapping of concepts
3-D representations of objects
Improved accuracy of ALIP
Current results are promising

ABSTRACT

Digital images are widely used and readily available. Text-based indexing of these images is becoming more difficult as the number of digital images grows. Content-Based Image Retrieval is therefore becoming a more viable alternative, because it can automate this process. Dr. Wang’s Automatic Linguistic Indexing of Pictures (ALIP) shows great promise as a Content-Based Image Retrieval system. Our lab is looking to extend this indexing of pictures to artistic and historical purposes; such pictures are harder to classify due to certain characteristics they share. Additionally, ALIP needs some upgrades to become a more user-friendly, mainstream program. I present the results of these upgrades to ALIP and of the experiments conducted on a historical image database.
