


MSc Group Project 2001

Relevance Feedback For Image Databases

Supervisor: Stefan Rüger

Paul Aljabar Simon John Pennifer

Rita Hennigan Jim Kilbey Stanton

Philip Cheung Susan Huang


Relevance Feedback for Image Databases

Abstract

This project implements an image database: algorithms for the extraction of simple features (colour and texture) and a search engine to retrieve images. An R-tree structure is used to implement the database. A graphical user interface has been developed which allows the user to send feedback to the database, determining the relative emphasis to be placed on individual images and which images are to be considered irrelevant to the search. This feedback then forms the basis for a modified search through the database.

Acknowledgements

We would like to thank Stefan Rüger for his support and guidance during this project. We are also grateful to Theodore Hong and Olav Beckmann for their technical assistance on (the many!) occasions on which we have needed it.


Contents:

1. Introduction
2. Existing work on image analysis for retrieval systems
3. Outline of theoretical issues and algorithms used
   3.1 Colour: some established methods involving colour features; types of features and possible algorithms; first moment; second central moment; third central moment; scaling
   3.2 Texture: general; texture feature measures
   3.3 Database structure
   3.4 Distance metrics and feature weightings
   3.5 Creating a new query based on feedback from the user
4. R-trees from theory to practice
   4.1 General
   4.2 Building the tree
   4.3 Searching
5. Implementation issues
   5.1 Implementing colour extraction
   5.2 Algorithms used for texture extraction: scaling of results; optimization
   5.3 Implementation of the RMI server: general; RMI registration; remote methods; calculating the post-feedback query vector; processing the response data
   5.4 Implementation of the client-side applet: interaction with the user; the Img class; server and client behaviour; getting into the swing of things; applet summary; suggestions for improvement
   5.5 Implementation of distributed aspects: state
   5.6 R-trees, issues of implementation
   5.7 Front end implementation and design
6. Testing
   6.1 Testing the R-tree construction: optimal range
   6.2 Testing the search with an initial query
   6.3 Testing the search after feedback: initial query; first feedback; response to feedback
7. Conclusion

References

Appendices:
Appendix A: R-tree representation based on colour features
Appendix B: R-tree representation based on texture features
Appendix C: R-tree representation based on colour and texture
Appendix D: All the images
Appendix E: Outline of user interface structure
Appendix F: Outline of system structure
Appendix G: Time spent and summary record of group meetings

Illustrations:
Figure 1: Colour vectors, means
Figure 2: Colour vectors, means
Figure 3: Colour vectors, standard deviations
Figure 4: Colour vectors, standard deviations
Figure 5: Examples of image comparison using Tamura texture features
Figure 6: Illustration of search over more than one coordinate
Figure 7: Images on a screen
Figure 8: The working applet
Figure 9: Double-clicking on a thumbnail brings forward the full image
Figure 10: Applet structure
Figure 11: Outline of distributed data flow
Figure 12: Flow diagram representing the different stages of initial query
Figure 13: Introductory page
Figure 14: Submission page
Figure 15: Selecting from the database
Figure 16: Confirming the submission
Figure 17: Draft results page
Figure 18: R-tree groupings
Figure 19: Initial search results
Figure 20: Initial search results
Figure 21: Response to initial query
Figure 22: First feedback query
Figure 23: Secondary results post-feedback


Relevance Feedback for Image Databases

1. Introduction

The project aims to implement a database of simple features of images (representing colour and texture) that can be accessed by a remote user. Users should be able to query the database by providing their own image file, for which features are extracted and compared with those in the database. The search engine should then return a number of thumbnail images that are ‘close’ to the image provided according to a default metric. The interface should allow the images to be manipulated in order to provide ‘relevance feedback’ as the basis for a new search. The metric used in the new search is a modified version of the default metric, depending on the user’s response. This part of the interface should be intuitive, allowing the user to identify images that are to be excluded from the new search and to change the emphasis given to other images by moving them nearer to or away from the centre of the screen. On re-submitting the modified thumbnails, a new search would return what is (hopefully) a set of images closer to the target of the original query. At either stage the user should have the option of downloading a larger version of a picture file shown in a thumbnail. Algorithms are to be chosen to return particular data structures as a representation of the corresponding feature (e.g. a vector to represent colour), and this in turn will partly determine the structure and implementation of the database. We shall define a measure of the colour of an image using the RGB content of the pixels. As a measure of texture we intend to use the three main Tamura texture features, which are well established. If time permits, we will implement an analysis of shape using the wavelet transform.

2. Existing work on image analysis for retrieval systems

The subject of image analysis is a widely researched and well-developed field, and prior to carrying out specific work on developing image analysis software a brief review of existing systems was carried out with two aims:

• to establish the general feature types and approaches to analysis that represent the state-of-the-art;

• to identify if possible open source software that could be built into the project to enhance its functionality.

A large number of systems are described in the literature, and in general these can be categorized under three headings:

a) Commercial systems (e.g. the IBM Corp. QBIC system and the Excalibur Technologies (now Convera) Visual RetrievalWare SDK). These and similar systems provide comprehensive development tools for use by companies, museums, collections and other organizations in creating searchable image databases.

b) Research systems (e.g. the MIT Photobook system, the Berkeley Blobworld system and the CMLA Megawave system)


The features offered by such systems vary widely. Some, such as Photobook, are intended to aid researchers in the development of their own feature extraction tools and therefore provide file searching and handling capability but do not provide significant image analysis tools. Other systems, such as Megawave, are presented as demonstrations of research into feature extraction and may include full open source access.

c) C++ class libraries (Paintlib and the Intel OpenCV library). These libraries provide low-level routines that offer extended capability for analyzing and manipulating image-related data sets, and are generally used for the development of software packages using or supporting the use of hardware for image capture.

Whilst the large systems available under each of these three headings offer much that could be useful, there are two main disadvantages in adopting any as the basis for the image analysis part of this project:

• much of the active code within these systems is either not available as source code or is contained within a complex, many layered dependency structure;

• the time investment required to understand the architecture of such a system sufficiently well to create the basic functionality required at the start of the project would be large.

For the purposes of this project it was therefore decided to create a small set of simple image file manipulation and feature extraction functions, using, if feasible, available open source implementations of the basic algorithms. This is in keeping with the project specification and would provide the data sets on which the comparison and relevance feedback components of the project would operate.

3 Outline of theoretical issues and algorithms used

Retrieving images from large and varied collections is a challenging task, and there can be many shortcomings of such image retrieval systems. Most existing applications represent images based only on their low-level features, such as colour, texture and shape, and do not incorporate high-level features, such as searching for particular objects or identifying spatial location. Given the constraints in time and other resources, our project aims to implement colour and texture extraction in the first instance; we therefore chose an image database that consists of symmetrical wallpaper designs with relatively simple patterns.

3.1 Colour

Modelling of colour

Colour is the perceptual result of light with wavelengths from 400nm to 700nm incident upon the retina. The human retina has three types of colour photoreceptor cells called cones, each of which responds to incident radiation with a different spectral response curve, so human vision is inherently based on three colours. That is the reason why most colour models have a three-dimensional description. Any colour within the visible spectrum can be composed from relative quantities of red, green and blue light, so colour is reasonably well modelled by a representation within a three-dimensional RGB space. In some applications more intuitive notions of hue, saturation and brightness are used in conjunction with other models, such as HSI or HSV, both of which can be derived from the RGB colour space.
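As an aside, such a derivation is available in the Java standard library; the following is a purely illustrative sketch (the class and method names are ours, not part of the project code, which works directly in RGB):

```java
import java.awt.Color;

// Aside: deriving hue/saturation/brightness values (HSB, Java's close
// relative of HSV) from RGB using the standard library.  Purely
// illustrative; not part of the project code.
public class ColourModel {

    // Returns {hue, saturation, brightness}, each in the range [0, 1].
    public static float[] rgbToHsb(int r, int g, int b) {
        return Color.RGBtoHSB(r, g, b, null);
    }

    public static void main(String[] args) {
        float[] hsb = rgbToHsb(255, 0, 0);   // pure red
        System.out.println("h=" + hsb[0] + " s=" + hsb[1] + " b=" + hsb[2]);
    }
}
```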
Colour histograms

Several methods for retrieving images on the basis of colour similarity are based on the computation of a colour histogram using the RGB space, which shows the proportion of pixels of each colour within the image. The image is considered an array of pixel intensities, with a range of 0 to 255 for each primary colour. This range is divided into intervals or ‘bins’, and the proportion of pixels with various intensities is recorded in these bins. Computing a colour histogram for tiled images would give a result that is more sensitive to variation in the distribution of colour across the image.

Some established methods involving colour features

‘Blobworld’ representation: This involves the transformation from the raw pixel data to a small set of image regions which are coherent in colour and texture space. This technique is based on segmentation using the Expectation-Maximization algorithm [7] on combined colour and texture features. After the image is segmented into regions, a description of each region’s colour, texture, and spatial characteristics is produced. In a querying task, the user can access the regions directly, in order to see the segmentation of the query image and specify which aspects of the image are important to the query. When query results are returned, the user sees the Blobworld representation of the returned images; this assists greatly in refining the query.

JSEG colour image segmentation: This is a new approach to fully automatic colour image segmentation [8]. Colour image segmentation is useful in many applications: from the segmentation results it is possible to identify regions of interest and objects in the scene, which is very useful in subsequent image analysis or annotation. In the JSEG method, colours in the image are quantized into several representative classes that can be used to differentiate regions in the image. Image pixel colours are then replaced by their corresponding colour class labels, forming a class-map of the image. Experiments show that JSEG provides good segmentation results on a variety of images.

Types of features and possible algorithms

To histogram or not to histogram? The most straightforward feature extraction mechanism for colour is the computation of histograms. It is a relatively simple task to code and has been thoroughly tested in many existing image retrieval systems.
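The binning step can be sketched as follows. This is a minimal illustration with hypothetical names, assuming four intervals per channel and pixels supplied as {r, g, b} integer triples; it is not the project's code, which uses colour moments instead:

```java
// Sketch of the colour-histogram approach described above, assuming four
// intervals per channel (so 4 * 4 * 4 = 64 bins) and pixels supplied as
// {r, g, b} integer triples.  Hypothetical names; illustrative only.
public class ColourHistogram {

    public static double[] histogram(int[][] rgbPixels) {
        double[] bins = new double[64];
        for (int[] p : rgbPixels) {
            int r = p[0] / 64, g = p[1] / 64, b = p[2] / 64;  // bin index 0..3 per channel
            bins[r * 16 + g * 4 + b] += 1.0;
        }
        for (int i = 0; i < bins.length; i++)
            bins[i] /= rgbPixels.length;      // proportion of pixels per bin
        return bins;
    }
}
```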
Then why did we opt for a different method? In order to preserve reasonable accuracy, a minimum of four intervals is required for each RGB colour. That gives 4 × 4 × 4 = 64 bins, so the colour feature vector would have to have 64 components. If tiling were introduced the colour vector would contain even more elements (256 with 4 tiles), many more than the texture vector we are using, and given our relatively simple relevance-feedback algorithm this imbalance did not appear to be sensible.

Intensity central moments: The rth moment of a frequency distribution about the mean is defined as:

$$M_r = \frac{\sum f\,(x - \mu)^r}{\sum f}$$

where µ is the mean of the distribution and f is the frequency of the value x. Moments describe particular statistical characteristics of a distribution, but moments beyond the first three are rarely used since, for increasing r:

a) the moment becomes more arduous to calculate, and b) it becomes more sensitive to small changes in the distribution.

We decided to construct the colour feature vector as a concatenation of the first three moments for each RGB colour. Representing colour in an image solely by central moments is inferior to computing histograms, since each RGB component is summarised over its whole range rather than binned. This design decision is based on consideration of the acceptable length for the colour feature vector. To avoid the moments simply being global features, spatial reference is introduced by tiling the images into four windows of the same size. There is a lot of self-similarity in our images, so locality is not very important, but we felt we should consider future additions to the database, for which this may not be the case. There are nine moment values in each window, which gives thirty-six components for the colour vector. Pixel data is not grouped; instead each pixel is treated separately, to improve accuracy and ease computation. In the formula for Mr, Σf is the total number of pixels present in the image, and x represents individual pixel intensities for each colour C. Hence the equation becomes:

$$M_{C,r} = \frac{\sum (x_C - \mu_C)^r}{\text{no. of pixels}}$$
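This per-channel computation can be sketched as follows (a hypothetical helper class for one colour channel of one tile, not the project's actual extraction code):

```java
// Sketch: central moments for one colour channel of one tile, following the
// formula above: M_r = sum((x - mu)^r) / N.  Hypothetical helper class.
public class ColourMoments {

    // Mean intensity (the first moment about zero).
    public static double mean(int[] intensities) {
        double sum = 0;
        for (int x : intensities) sum += x;
        return sum / intensities.length;
    }

    // rth central moment about the mean.
    public static double centralMoment(int[] intensities, int r) {
        double mu = mean(intensities);
        double m = 0;
        for (int x : intensities) m += Math.pow(x - mu, r);
        return m / intensities.length;
    }

    // The feature vector would store, per channel and tile: the mean, the
    // square root of M2, and the sign-preserving cube root of M3 (Math.cbrt).
}
```

For each of the four tiles this yields three values per channel, giving the thirty-six colour components described above.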

First moment: It is easy to see that the first moment about the mean is 0 for any distribution. Therefore the first moment about zero is used instead, which is simply the mean value of the intensity distribution. The first four elements in the colour vector are the red mean values for the four tiles/quadrants, the next four represent the green averages, and the remainder are the blue mean values. Clearly light images have higher average colours than darker ones. For example, Image5 is predominantly white, while Image19 is rather dark (Figure 1).

Img5 mean values:   R: 234 240 232 239   G: 234 240 232 239   B: 234 239 233 240
Img19 mean values:  R: 82 78 79 78       G: 13 10 11 10       B: 27 24 25 24

Figure 1: Colour vectors, means

It is also possible to infer the principal colour of an image: in Image18 the red component is dominant, and the relatively low values suggest a dark image. Image33, on the other hand, has high averages with a prevalence of red and green, which suggests a yellow principal colour and a light hue (Figure 2).

Img18 mean values:  R: 139 140 144 134   G: 60 58 63 53       B: 50 45 52 41
Img33 mean values:  R: 231 230 230 229   G: 236 235 235 233   B: 113 112 112 111

Figure 2: Colour vectors, means

Second central moment: This feature represents the statistical variance of a colour. The square root of the second moment, namely the standard deviation, is coded in the feature vector for each component; this ensures that the order of magnitude of the values is commensurate with those for the means. The standard deviation is a measure of dispersion, so in general images with a flat, monotonous colour have smaller values than those with a large range of colours. For illustration we can compare Image52, with large values for this feature, and Image46, with much smaller variances (Figure 3).


Img52 standard deviation:  R: 209 229 219 213   G: 217 246 230 218   B: 200 221 210 204
Img46 standard deviation:  R: 77 84 85 86       G: 70 74 75 76       B: 61 63 63 64

Figure 3: Colour vectors, standard deviations

More interesting is the case when the standard deviation of one colour component differs significantly from the other two. Take, for example, Image34 (Figure 4). The four standard deviations of the blue component are much smaller than those for red and green. The only pattern in this image is a light yellow grid, which comprises large red and green pixel values and very small blue ones. The dark background must have small pixel values for each colour. The conclusion is that the blue pixel intensities are small and range over a narrow interval, while the red and green ones vary much more.

Img34 standard deviation:  R: 155 158 157 148   G: 133 136 135 124   B: 68 74 64 61

Figure 4: Colour vectors, standard deviations

Third central moment: The cube root of the third moment gives the last twelve coordinates of the colour features. The third central moment is used in the measure of the skewness of the distribution. More precisely, a measure of skewness is given by:

$$\frac{M_3}{M_2^{3/2}}$$

For the purpose of defining the feature vector it seems sufficient to use the cube root of the third moment instead of a full skewness coefficient, though we will then not be able to compare images just by looking at the coordinates. If a distribution has a positive skew (a long tail to the right), the mean is pulled above the median by the extreme values, so more than half of the data lies below the mean. The term (x − µ)³ is then negative for the majority of the data, but the few large positive values of (x − µ)³ in the tail dominate the sum, so M3 is positive. A positive value for the third central moment therefore indicates a positive skew, since dividing by the term M2^(3/2) does not change the sign; similarly, a negative result indicates a negative skew. The argument may not hold when the mean is close to the median, as the size of the term (x − µ)³ for values lying far from the mean could tip the balance between positive and negative skew. For an image with a large positive skew the majority of the pixels would have intensities below the mean value, with a few extreme values of very high intensity; such images would tend to have some patches lighter than the average, while a negative skew would indicate a portion of darker-than-average area. In order to illustrate skew with an example it would be necessary to calculate M3 / M2^(3/2) before any roots are taken and scaling is done.

Scaling

It is convenient for retrieval purposes to have all coordinates as integers in roughly the same range, so they are scaled linearly between 0 and 255. This reduces the effect of having a diverse set of coordinates for different moments. While the mean can vary between 0 and 255, it is not easy to establish theoretical limits for the second and third moments, so scaling is done experimentally once the feature values of the database are calculated. The smallest occurrence of each feature is scaled to 0 and the largest to 255. In the unlikely event of the query image having more extreme values, the feature-extracting program adjusts the coordinates so that they all appear in the 0-255 range and there are no negative numbers amongst them. The fact that scaling is based on the feature vectors of the existing database implies that the constants required to shift an image’s values may need to be adjusted with the introduction of a new image to the database. This may require some computation, but it is unlikely that a new image will have original coordinates outside the 0-255 range, as our existing images are quite diverse. Hence the database is reasonably ‘scalable’ as long as extensions to it are done incrementally.

Weightings and suggestions for improvement

The colour vector is given a single weighting in the calculations for similarity (see the section on relevance feedback), though the three central moments represent different colour features of the given image. A more sophisticated use of weightings would incorporate three weightings, one for each moment, and this could well improve relevance feedback for this system. The first moment seems to prevail in determining what an image looks like, so it could be given the highest initial weighting. The second central moment should probably come next, but precise values cannot be determined without experimentation.
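The scaling described above amounts to a min-max mapping onto 0-255 with clamping for out-of-range query values; a minimal sketch (hypothetical names, not the project's code):

```java
// Sketch of the experimentally determined linear scaling described above:
// the smallest database value of a feature maps to 0, the largest to 255,
// and out-of-range query values are clamped.  Hypothetical names.
public class FeatureScaler {

    public static int scale(double value, double min, double max) {
        double scaled = 255.0 * (value - min) / (max - min);
        long rounded = Math.round(scaled);
        // Clamp query coordinates falling outside the database's observed range.
        return (int) Math.max(0, Math.min(255, rounded));
    }
}
```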
3.2 Texture

General

A basis for much of the current research in texture analysis of images is the original work carried out by Tamura et al. [2]. In this paper Tamura proposes three principal and three secondary texture features, and validates specific algorithms for calculating characteristic values of each of these against the perceptual results of human assessments of specific images.

Texture Feature Measures

The six texture features proposed by Tamura are:

a) Coarseness: This measures the scale of the repeating structures within an image and was considered by Tamura the most fundamental texture feature. Images range from a fine to a coarse texture, with large texture element size and low repetition rate characterising an image as coarse. (See figure 5 for specific images illustrating comparisons of coarseness and the other texture metrics.)

b) Contrast: This measures the dynamic range of grey levels in the image and how this range is distributed about the mean. Images range from low to high contrast, with high-contrast images having a wide deviation of grey levels about the mean value. Tamura also considered the effects on contrast of sharpness of edges and periodicity of repeating patterns, but concluded that these were not significant.

c) Directionality: This measures the significance of global features by identifying the presence of long lines or sweeping curves. Images range from non-directional to directional, with a significant proportion of edge elements in the same direction characterising an image as directional.

d) Line-likeness: This measures the shape of texture elements and was proposed as a means to distinguish between patterns that had insufficient directionality for that measure to be significant. Images range from blob-like to line-like, with straight-edged shapes characterising an image as line-like.

e) Regularity: This measures the propensity of the pattern to deviate from an inferred repeating placement rule. Images range from irregular to regular, with uniform repeating patterns characterising an image as regular.

f) Roughness: This measures the sum of the coarseness and contrast of an image. It was not considered a characteristic feature for images, but was introduced by Tamura to investigate a possible correlation between image properties and the roughness of textiles.

Of the 16 Brodatz textures [3] studied by Tamura, figure 5 below illustrates those images which represent the maximum and minimum value for each of the metrics used for the study. Also illustrated are the images from the database for this project which represent the maximum and minimum values for coarseness, contrast and directionality.


Figure 5: Examples of image comparison using Tamura texture features:

a) Coarseness: fine texture: Brodatz D.93 (project database: Img 26); coarse texture: Brodatz D.98 (project database: Img 11)
b) Contrast: low contrast: Brodatz D.34 (project database: Img 14); high contrast: Brodatz D.20 (project database: Img 40, note 1)
c) Directionality: non-directional texture: Brodatz D.33 (project database: Img 16); directional texture: Brodatz D.15 (project database: Img 25)
d) Line-likeness: blob-like texture: Brodatz D.67; line-like texture: Brodatz D.68
e) Regularity: irregular texture: Brodatz D.69; regular texture: Brodatz D.34
f) Roughness: smooth texture: Brodatz D.93; rough texture: Brodatz D.28

Note 1: This image has a fine, high-contrast speckle not apparent at this scale.
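For illustration, Tamura's contrast measure is commonly defined as σ / α4^(1/4), where α4 = µ4/σ⁴ is the kurtosis of the grey-level distribution. The following is a sketch under that standard definition, with hypothetical names; it is not necessarily the project's implementation:

```java
// Sketch of Tamura's contrast measure as commonly defined:
// contrast = sigma / kurtosis^(1/4), with kurtosis = mu4 / sigma^4 computed
// over the grey-level distribution.  Illustrative, hypothetical names.
public class TamuraContrast {

    public static double contrast(double[] grey) {
        int n = grey.length;
        double mu = 0;
        for (double g : grey) mu += g;
        mu /= n;
        double m2 = 0, m4 = 0;       // second and fourth central moments
        for (double g : grey) {
            double d = g - mu;
            m2 += d * d;
            m4 += d * d * d * d;
        }
        m2 /= n;
        m4 /= n;
        double kurtosis = m4 / (m2 * m2);
        return Math.sqrt(m2) / Math.pow(kurtosis, 0.25);
    }
}
```

Since kurtosis is at least 1, an image whose grey levels split evenly between two extreme values attains, for a given spread, the highest contrast under this definition.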


3.3 Database structure

In planning and building an image database, once images have been collected, we needed to develop our criteria for comparing images, which would then define the information our database needs to hold and how it is used. For each image we envisaged holding pre-computed information in vector form, for example:

(image identifier, location of image, coordinates representing features)

We needed to plan the way our data is stored and how we search it in order to formulate a search algorithm to retrieve the images that most closely match the query image. This led us to consider structures for our database that might produce more efficient searches. In the simplest case our database could be visualised as an ordered list of data items, all of which would be compared with our query image in order to return those which have sufficiently similar features according to a particular metric. Computing the distances between every image in the database and the query image would, however, lead to very poor performance. As a step towards improving search efficiency we could consider performing a search for each feature coordinate separately. Each search could be implemented as a binary search through a B-tree to improve efficiency, since the individual coordinates are numerical and hence each search is one-dimensional. After performing a separate search for each feature coordinate, the sets of returned images could then be compared in order to identify which images are sufficiently similar overall to the query image, using a second set of criteria. If we considered, for example, just two coordinates to represent overall colour and texture, then each image could be visualised as lying in a two-dimensional plane (figure 6), with its position defined by its feature values. In searching using each feature in turn and then finding out which items are returned by both searches, we are effectively identifying those images which lie in a rectangle around the position of the query image. This process becomes rather elaborate if the number of dimensions is large.

Figure 6: Illustration of search over more than one coordinate

Clearly, some images may be a close match in terms of one feature but a bad match in terms of the other.
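The search-and-intersect procedure described above can be sketched as follows. The map-based 'database' and all names are hypothetical; this is illustrative only:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the per-coordinate search-and-intersect idea described above:
// each feature coordinate is searched independently for values within a
// tolerance of the query, and the result sets are intersected, which picks
// out the images lying in a rectangle around the query point.
public class CoordinateSearch {

    public static Set<String> search(Map<String, int[]> db, int[] query, int tol) {
        Set<String> result = null;
        for (int i = 0; i < query.length; i++) {
            // Linear scan per coordinate; in practice each coordinate could be
            // indexed with a B-tree, since each search is one-dimensional.
            Set<String> matches = new HashSet<>();
            for (Map.Entry<String, int[]> e : db.entrySet())
                if (Math.abs(e.getValue()[i] - query[i]) <= tol)
                    matches.add(e.getKey());
            if (result == null) result = matches;
            else result.retainAll(matches);   // intersect with previous coordinates
        }
        return result == null ? new HashSet<>() : result;
    }
}
```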


In the case of more than two, say n, feature measures, this is equivalent to identifying a ‘rectangle’ in n-space. This led us to the concept of an R-tree, an indexing structure for spatial searches first proposed by Guttman [1]. It offers an algorithm for developing a tree structure which represents data objects by intervals in several dimensions, and is based on the idea that all child nodes of a single node in the tree are contained in a bounding volume whose location and dimensions are stored in the parent node. Searches through the tree begin at the root node, which defines the bounding volume for all the data points. At each stage the query point is tested for containment against the regions defined in the sub-nodes. This eliminates sub-trees that do not contain it, with the aim of improving efficiency. The method is a generalisation of B-tree searching. R-tree searches seem appropriate to the type of searches we want to implement, based not on exact matching (which is most common in database searching) but on locating data objects which are similar to the query image in each of the feature values; our data items and searches are essentially spatial. Further developments of the R-tree have also been proposed, including the R*-tree, which attempts to minimize the bounding spaces and any overlaps between them in order to further improve performance [5]. The VP-tree proposed by Yianilos [6] is similarly designed for multi-dimensional nearest-neighbour searching, but partitions data on relative distance (i.e. on the distances between points) rather than on coordinate values.

3.4 Distance metrics and feature weightings

Given that the features extracted from images are represented in vector form, we have a choice of metrics to define the distance between the points corresponding to two images (e.g. the Manhattan, Euclidean or more general Minkowski metrics).
For any metric used, the distance it provides will be represented by a single number. When more than one feature is used, the distances between two images for each feature need to be combined to provide an overall measure of the distance between the images. We chose to define the distance between two vectors (a1, …, an) and (b1, …, bn) as

$$d(a, b) = \max_{1 \le i \le n} |a_i - b_i|$$

(strictly this is the L∞ or Chebyshev metric, although we refer to it throughout as the Manhattan metric).

This metric was chosen for ease of implementation and for speed of run-time calculation. Further testing would be needed to evaluate how well this metric corresponds to perceived similarity. Clearly we want images which the metric defines as close to be perceived as similar, i.e. we would like the metric to be uniform. The R-tree building and searching programs will use the Manhattan metric to define a region around each query point and then test for overlap between this region and pre-computed regions around the database images. This means we have to decide on an appropriate size for the region we define around each image. Using the Manhattan metric for a given vector defined by a point P, the set of all other points a fixed distance d from P will form a ‘cube’ of edge 2d with P at its centre. (In two dimensions the set will define a square, in three a cube, and in four a hypercube and so on.)
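The metric and the overlap test it implies can be sketched in Java (a minimal illustration; the class and method names are ours, not the project's):

```java
// Sketch of the metric and overlap test described above. The
// maximum-coordinate-difference formula is the L-infinity (Chebyshev)
// metric, which the report refers to as the Manhattan metric.
public class Metric {
    // distance between feature vectors a and b: max |a[i] - b[i]|
    public static int distance(int[] a, int[] b) {
        int max = 0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Under this metric the points within distance d of a query point form
    // a 'cube' of edge 2d; two such cubes overlap iff their intervals
    // overlap in every dimension.
    public static boolean overlaps(int[] min1, int[] max1,
                                   int[] min2, int[] max2) {
        for (int i = 0; i < min1.length; i++) {
            if (max1[i] < min2[i] || max2[i] < min1[i]) return false;
        }
        return true;
    }
}
```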


As well as calculating a distance for each feature, these distances need to be combined in order to give an overall distance between two images. In the first instance this could simply be a weighted average with equal weighting given to each feature. For example, if d represents the overall distance between two images, and dc and dt represent normalised colour and texture distances respectively, we could use: d = 0.5dc + 0.5dt

On receipt of feedback from the user regarding the relevance of the images returned by the initial query, we need to consider how these weightings should be altered to provide a more appropriate overall distance. For example, if texture is more important, the revised distance measure could be: d = 0.2dc + 0.8dt

In our implementation the use of overlaps with the Manhattan metric means that weightings need to be converted into acceptable ranges around feature coordinates. Giving a single weighting for, say, colour leads to all the coordinates that define colour having an equal effect on the outcome of the search. While this may be over-simplifying matters, it is much easier to implement. In a feedback query we want to be able to give colour and texture different weightings in order to reflect the user’s feedback.

The colour and texture weightings determine the shape of the region around the query vector that is considered ‘close’. Equal weightings mean that the region is a ‘cube’ in the feature space. A large weighting for, say, colour means the user is being more specific about this feature, and hence the range of colour values defined as acceptably close should be correspondingly narrow, i.e. the ‘edge length’ in the colour dimension is reduced. If we consider just three features, say c, t and w (for colour, texture and wavelet), with a set of normalised weightings α, β and γ (i.e. α+β+γ=1), the values of α, β and γ will affect how we search the database(s) containing the feature data. For the purposes of implementation the weightings are scaled to lie between 0 and 255 and to sum to 255. In practical terms we ended up using the formulae:

colour_Range = int(20.0/((colour_weighting+1)/255.0))+1;
texture_Range = int(20.0/((texture_weighting+1)/255.0))+1;

Each weighting is normalised out of 255. One is added to the weighting to avoid division by zero. The value of 20 was obtained by experiment. One is added to the result to ensure no zero ranges are created.
These formulae ensure that the ranges are essentially inversely proportional to the weightings, as required. Having conducted various searches with weightings of 127 and 128 (i.e. initial queries), we found that perceptually similar images are returned; see figure 16.

The above does not preclude having a finer grain of weightings for subsets of an individual feature’s coordinates. For example, we may wish to give separate weightings to coarseness, contrast and directionality within the texture coordinates. Additionally, algorithms exist for relating the variance of a particular set of values to the weighting it is given [4]. Qualitative differences between what two sets of values represent may well have a bearing on their relative weightings; for example, it is not obvious how weightings should be assigned to coordinates representing average colour intensity as opposed to, say, colour variance, since two perceptually different images can have very close values for variance. With more time we would have explored these issues further.
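The weighting-to-range conversion can be illustrated with a small sketch (the method name is ours; the constants are those given above):

```java
// Sketch of the range formula quoted above: the range is (roughly)
// inversely proportional to the weighting, with +1 offsets to avoid
// division by zero and zero-width ranges.
public class RangeCalc {
    public static int rangeFor(int weighting) {
        return (int) (20.0 / ((weighting + 1) / 255.0)) + 1;
    }
}
```

For example, a weighting of 254 gives a narrow range of 21, while a weighting of 127 (as in an initial query) gives a range of 40.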


3.5 Creating a new query based on feedback from the user

Let Q be a query image and VQ be its associated feature vector, containing coordinates for colour, texture and (if time permits) wavelet measures. Following a query, suppose there are n result images R1, …, Rn returned by the database, with vectors VR1, …, VRn representing their feature values. The user identifies some of these as irrelevant to their search and then modifies the positions of the remaining thumbnail images. Let the remaining modified set of images be MR1, …, MRk, where clearly k ≤ n and the MRi are a subset of the Ri. Let O represent the centre of the screen; the arrangement of modified images could then be something like that shown in figure 7.

We then take all the distances from the centre in order, starting from the smallest. Define the radius of the viewing area as a fixed number of units, say 10. Define the distance of each image from the centre as the nearest integer number of units in proportion to the radius of the viewing area. This gives a ranking according to distance from the centre, with the possibility of some or all of the images having equal rank. Let di be the distance of image MRi and let D be the maximum of the di. Assign a weighting to each image of

wi = D − di + 1

so the furthest image will have a weighting of 1 and the nearest a weighting of D. Give the original query image a weighting of D + 1.

(i) Calculate the vector VN of a new point in the feature space based on the weightings just assigned, using the weighted linear combination

VN = [ (D + 1)·VQ + Σi=1..k wi·VMRi ] / [ (D + 1) + Σi=1..k wi ]

(ii) Obtain the dimensions of the smallest ‘rectangle’ R in the feature space that contains VQ and VMR1, …, VMRk.
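Step (i) above (assigning the weightings and forming the weighted combination) can be sketched as follows (class and method names are ours, not the project's):

```java
// Sketch of step (i): weighted linear combination of the query vector VQ
// and the retained result vectors VMR1..VMRk, with weights
// w[i] = D - d[i] + 1 and query-image weight D + 1.
public class Feedback {
    public static double[] newQuery(double[] vq, double[][] vmr, int[] d) {
        int D = 0;
        for (int di : d) D = Math.max(D, di);       // D = maximum distance
        double totalWeight = D + 1;                 // query image weight
        double[] w = new double[d.length];
        for (int i = 0; i < d.length; i++) {
            w[i] = D - d[i] + 1;
            totalWeight += w[i];
        }
        double[] vn = new double[vq.length];
        for (int j = 0; j < vq.length; j++) {
            double sum = (D + 1) * vq[j];
            for (int i = 0; i < vmr.length; i++) sum += w[i] * vmr[i][j];
            vn[j] = sum / totalWeight;
        }
        return vn;
    }
}
```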

[Figure 7: images on a screen — result images MR1, MR2, MR3, …, MRn arranged at varying distances from the centre O]


Obtain the edge ‘length’ of this rectangle in each of the feature directions. For colour, for example, find the colour length LC as:

Max over all vectors [ Max(colour coords of individual vector) ] − Min over all vectors [ Min(colour coords of individual vector) ]

Find LT and LW in a similar manner. A small value indicates that the user is more particular about the corresponding feature, so that feature should have a higher weighting. The weightings can therefore be calculated by ensuring that they are inversely proportional to the lengths obtained.

(iii) So we could set the new weightings for the colour, texture and wavelet distances, call them α, β and γ, as:

α = (1/LC) / (1/LC + 1/LT + 1/LW)

β = (1/LT) / (1/LC + 1/LT + 1/LW)

γ = (1/LW) / (1/LC + 1/LT + 1/LW)
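This weighting calculation can be sketched directly (names are ours):

```java
// Sketch: normalised feature weightings inversely proportional to the
// edge lengths LC, LT and LW of the bounding 'rectangle'.
public class FeatureWeights {
    public static double[] weights(double lc, double lt, double lw) {
        double s = 1 / lc + 1 / lt + 1 / lw;
        return new double[]{ (1 / lc) / s, (1 / lt) / s, (1 / lw) / s };
    }
}
```

For example, edge lengths of 1, 2 and 2 give weightings of 0.5, 0.25 and 0.25, so the feature with the narrowest spread receives the highest weighting.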

That is, the overall distance is now defined as αdc + βdt + γdw, where dc, dt and dw are the distances in terms of colour, texture and wavelet respectively. These weightings can now be used to calculate the edge lengths of the box around the new query vector. These lengths define the maximum distance in each feature direction within which an image is considered close.

4 R-trees from theory to practice

4.1 General

Rather than attempting to use an existing database package, which would be complex to use and would include far more functionality than we need, we took the decision to attempt our own implementation of the R-tree data structure. Before implementing the R-tree structure, we needed to consider how the theoretical background relates to the data we are trying to represent. This form of structure appeared to be the most appropriate for the type of data we were dealing with, as it allows us to build up a structure for representing multi-dimensional data. By providing such a structure we hope to make searching of our database more efficient and to provide a mechanism for ‘nearest neighbour(s)’ searching, rather than searching based on exact matching, which is the norm in standard databases. This means we are attempting to return images from our database which are ‘close to’ a query image, by a specified definition of ‘closeness’, rather than exactly matching it. We have used Guttman’s approach [1] for constructing the structure, although we have chosen not to include mechanisms for deleting records at this stage.


4.2 Building the tree

The tree structure is built up of a pyramid of nodes, each attached by pointers to child nodes, or to images if they are leaf nodes at the bottom level. We have chosen to set a maximum of three child nodes or images for any node. According to Guttman’s research, altering this number has performance implications, but given the constraints of time, and bearing in mind that our database is relatively small, we fixed this value in order to make progress. Similarly, each node is attached by a pointer to the node above it, unless it is the root node at the top of the tree.

In order to compare images we have extracted a feature vector of forty-eight values (colour and texture) for each image, and the images are therefore located in a space of forty-eight dimensions. To allow for ‘nearest neighbour’ searching we define a small interval around each feature value, and hence each image is allocated a space delimited by two points in 48-space. Thus ninety-six values are needed, giving maximum and minimum values for the range around each feature value. Each node, at whatever level, then holds the dimensions of the minimal box in 48-space which encloses all of the images in the sub-tree below it.

Initially the tree is set up as a single (root) node with pointers to images attached to it. When an image is inserted into the structure it is added to the node in the tree which contains the other images it is ‘closest to’. This is found by searching down the tree to locate the leaf node which requires the least enlargement of its bounding dimensions to include the new image. If the chosen node has a free image pointer, the image can be attached to this leaf node and the bounding dimensions updated for this node and for all its ancestor nodes. If, however, the identified node is ‘full’ (i.e. it already has three images associated with it) then a new leaf node must be created and the four images being considered (three from the ‘full’ node and the new image) must be divided between these two nodes. This is done by first identifying the two images which are ‘furthest apart’. Each possible pair from the four images is considered and the volume needed to contain them is calculated; whichever pair requires the largest volume is considered to be the furthest apart. These two images are then separated and each assigned to one of the two nodes. The remaining images are then taken in turn and assigned to whichever of the two nodes holds the images they are nearest to, using the same process described earlier.

Once this process has been completed, the new node needs to be attached to the same upper node in the tree as the original node, and the bounding dimensions updated for all attached upper nodes. This process becomes more complex if the upper node is itself already ‘full’, in which case the ‘node splitting’ process must propagate up the tree as necessary in order to ensure that no node has more than three lower nodes attached to it. Indeed, the root node itself may need to be split, which necessitates the creation of a new root node above this level.
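The ‘furthest apart’ selection during node splitting can be sketched as follows (a simplification in which each image is a single point; names are ours):

```java
// Sketch of seed selection for node splitting: the pair of points whose
// minimal bounding box has the largest volume is treated as 'furthest
// apart'. (Degenerate boxes with a zero-length edge have zero volume
// here; the real implementation works with ranges around each point.)
public class NodeSplit {
    static double pairVolume(double[] a, double[] b) {
        double v = 1.0;
        for (int i = 0; i < a.length; i++) v *= Math.abs(a[i] - b[i]);
        return v;
    }

    // returns the indices of the pair requiring the largest volume
    public static int[] pickSeeds(double[][] pts) {
        int[] best = {0, 1};
        double bestV = -1.0;
        for (int i = 0; i < pts.length; i++) {
            for (int j = i + 1; j < pts.length; j++) {
                double v = pairVolume(pts[i], pts[j]);
                if (v > bestV) { bestV = v; best = new int[]{i, j}; }
            }
        }
        return best;
    }
}
```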


4.3 Searching

Once this structure has been created, searching the database given a query image vector involves checking whether there is any overlap between the region defined by the limits held at a node and a region around the query vector which defines acceptable similarity. The minimum and maximum values for the range of each feature value define a region of similarity around the query image, in the same way as this is defined for the original images in the database. Clearly, for any search we only consider two regions to overlap if all the coordinates defining them overlap. For a texture search, for example, the space around the query image must overlap on all three texture values (coarseness, contrast and directionality) for all four quadrants or tiles. Similarly, a colour search looks for overlaps on all 36 colour values, and a search on colour and texture looks for overlaps on all 48 colour and texture values. Setting up our overlap criteria in this way is equivalent to specifying that images will only be sufficiently similar if the ranges around all their required feature values overlap. In saying this we are implicitly using the Manhattan distance to specify the distance between two images, as the maximum difference between corresponding pairs of feature values must be less than a fixed value in order to achieve overlap on all feature values.

The first stage of searching is to check whether there is any overlap between the space around the query image and the bounds of the database space, which are held at the root node. If there is no overlap then there are no images in the database which are sufficiently similar, in which case we have implemented a method for extending the region around the query image and beginning the search process again. The search method is called recursively when the query region is extended, so the query region can be repeatedly extended until there is overlap. Clearly, each time the query region is extended the images returned will be less similar, but we took the decision that it would be far more useful for a user to have some images returned from the database to work with than to have none.

From the root node, each node below needs to be checked in order to ascertain which nodes have images below them which may be sufficiently similar to the query image. This is done by again checking whether there is any overlap between the query region and the bounds of the space defined at the node. If there is overlap then the search descends a level, and this is repeated until the images are located. It is possible that even if there is overlap at, say, the root node, there will not be overlap on all dimensions at any of the nodes on the next level down: each node may overlap on some dimensions but not on all. In this case the query region is again extended in order to guarantee some returns. At the image level, overlap on all feature values is again checked and, if this is successful, the image identifier, location and feature values are sent to a buffer array to be output to a text file once the search is complete.

At this stage we have not implemented any mechanism for guaranteeing a minimum or maximum number of returned images. Clearly this would be a refinement that would benefit the user, as it would ensure that a reasonable range of images was returned. Although the tree structure attempts to make searching more efficient, it cannot be guaranteed that only one path from a given node holds similar images, and it may be that paths through several lower nodes will need to be followed in order to locate the closest images.
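The descent-with-extension strategy can be sketched as follows (a simplified, assumed structure; the real tree stores 48-dimensional bounds and per-image regions):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the search in section 4.3: descend wherever the query region
// overlaps a node's bounding box, and widen the region until at least one
// image is found. (Assumes the tree contains at least one image.)
public class RTreeSearch {
    static class Node {
        int[] min, max;                              // bounding box
        List<Node> children = new ArrayList<>();
        List<String> images = new ArrayList<>();     // leaves hold image ids
    }

    static boolean overlaps(int[] qMin, int[] qMax, Node n) {
        for (int i = 0; i < qMin.length; i++)
            if (qMax[i] < n.min[i] || n.max[i] < qMin[i]) return false;
        return true;
    }

    public static List<String> search(Node root, int[] q, int range, int step) {
        List<String> hits = new ArrayList<>();
        while (hits.isEmpty()) {
            int[] qMin = new int[q.length], qMax = new int[q.length];
            for (int i = 0; i < q.length; i++) {
                qMin[i] = q[i] - range;
                qMax[i] = q[i] + range;
            }
            collect(root, qMin, qMax, hits);
            range += step;                           // extend and retry
        }
        return hits;
    }

    static void collect(Node n, int[] qMin, int[] qMax, List<String> hits) {
        if (!overlaps(qMin, qMax, n)) return;
        hits.addAll(n.images);
        for (Node c : n.children) collect(c, qMin, qMax, hits);
    }
}
```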


It was difficult initially, however, to have any understanding of how some of the feature values corresponded to features of the images. Even if the returned images are close in terms of the numerical values of their feature vectors, are they perceptually similar to each other, or indeed to the original query image? Does submitting two separate but perceptually similar query images generate similar sets of results? With further testing once the R-tree building and searching program was implemented, it has been possible to identify some of the strengths and weaknesses of the design; see the section on testing.

5 Implementation issues

5.1 Implementing colour extraction

Why Java? Java’s PixelGrabber class can access RGB colour values directly, which means that features can be extracted directly from the original JPEG image format. We opted for this simple solution for colour feature extraction. Texture extraction algorithms are more computation-intensive, and are therefore written in C++.

The method used, in a ‘nutshell’: first, the name and path of the file is supplied as a command-line argument to the ImageDataTest program. An instance of our ImageData class is created and the image is loaded as an instance of Java’s Image class to become the instance variable thisImage of the ImageData class. Once the pixels of thisImage are grabbed, the three RGB components are extracted with the help of the ColorModel class, and the moments are calculated for the four sub-images. The results are scaled, then concatenated to give the instance variable featureVector. This is then written to a text file by calling the instance method writeFeatureVector().

The ImageData class:

Instance variables:
- Image thisImage: the instance of the java.awt.Image class in which the submitted image is stored.
- int height: the height of thisImage.
- int width: the width of thisImage.
- int[] featureVector: integer array for storing feature coordinates.
It has 12 coordinates for the Red, Green and Blue mean values for each quadrant, then the same for the standard deviation and for the cube root of the 3rd central moment.

Methods:
- ImageData(Image img): the constructor, which initialises every instance variable.
- Image loadImage(String infile): a static method called by main that loads the image specified by infile and returns the corresponding instance of the Image class.
- void findFeatureVector(): calls getSubImageFeature() and concatenates the four results into the featureVector.
- int[] getSubImageFeature(int x, int y, int w, int h): calculates the scaled moments of the sub-image given by x and y, the coordinates of the top left corner of the sub-image; w and h stand for the width and height of thisImage.
- void writeFeatureVector(String outFile): outputs the String version of the featureVector to a text file as a byte array, for reading by the C search program.
- String vectorToString(): converts featureVector into a string.
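The per-channel moment calculation can be sketched as follows (names are ours; the project scales and concatenates these values per quadrant):

```java
// Sketch of the three colour moments used per channel: the mean, the
// standard deviation, and the cube root of the third central moment
// (the cube root preserves the sign of a negative third moment).
public class ColourMoments {
    public static double[] moments(double[] channel) {
        int n = channel.length;
        double mean = 0;
        for (double v : channel) mean += v;
        mean /= n;
        double m2 = 0, m3 = 0;
        for (double v : channel) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= n;
        m3 /= n;
        return new double[]{ mean, Math.sqrt(m2), Math.cbrt(m3) };
    }
}
```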


5.2 Algorithms used for texture extraction

With the limitations on the time available and the size of the test database for this project, it was considered that the three primary texture characteristics would provide sufficient discrimination. The texture analysis program was therefore limited to calculating only the coarseness, contrast and directionality metrics. The algorithms for these calculations each operate on a luminance map of the image, and therefore require the pixel colour intensities of the original image to be converted to the appropriate L* luminosity values. Following this conversion, the algorithms can be summarised as follows:

a) Coarseness:

i. Define a size variable k, and a map of k values for each pixel of the image (initialised to k=1).
ii. Define a map of average values of 2^k × 2^k blocks of pixels.
iii. Calculate the difference between the average values in the forward and reverse neighbour blocks. Repeat for the upward and downward neighbour blocks. Take the larger of these two calculated values.
iv. For k=1, define a difference map of these values. For k>1, if the new calculated value is not less than the corresponding value saved in the difference map, then update the difference map and k map with the new values.
v. Increment k and repeat until k = 5.
vi. For the values of k now stored in the k map, calculate the average value of 2^k.

b) Contrast:
i. Calculate the mean value of the pixel intensities.
ii. Calculate the variance of the pixel intensities as a measure of the dynamic range.
iii. Calculate the kurtosis as a measure of the distribution about the mean.
iv. Divide the variance by the 4th root of the kurtosis.

c) Directionality:
i. At each pixel, in the horizontal direction, calculate the difference in luminosity between the forward and backward neighbours. Add the value of the same calculation at the upward and downward neighbouring pixel locations (= gradH).
ii. At each pixel, repeat this calculation in the vertical direction (= gradV).
iii. Calculate a “gradient” vector at each pixel, as follows:

magnitude = (gradH² + gradV²)^0.5
direction = arctan(gradV/gradH) + π/2

iv. Discard pixels with gradient magnitude less than a threshold (set to 12, as suggested by Tamura).
v. For the other pixels, create a histogram of directions (16 direction bins were used, as by Tamura).
vi. Normalise the histogram by dividing by the number of counted pixels.
vii. Divide the histogram into separate peaks and valleys and calculate the variance of each region separately.
viii. Calculate the arithmetic sum of these variances.
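The gradient calculation in step iii can be sketched directly (names are ours):

```java
// Sketch of the per-pixel gradient from horizontal and vertical
// luminance differences gradH and gradV.
public class Gradient {
    public static double magnitude(double gradH, double gradV) {
        return Math.sqrt(gradH * gradH + gradV * gradV);
    }

    // arctan(gradV/gradH) + pi/2, as in step iii above
    public static double direction(double gradH, double gradV) {
        return Math.atan(gradV / gradH) + Math.PI / 2;
    }
}
```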

Scaling of results

For convenience, the raw values calculated in accordance with these texture algorithms were scaled for storage as integer values between 0 and 255. The scaling was carried out by establishing the following ranges for each of the texture features:


a) Coarseness:
Minimum value: Would result from a k map of values all equal to 1, corresponding to coarseness of 2^1 = 2.
Maximum value: Would result from a k map of values all equal to 5, corresponding to coarseness of 2^5 = 32.

b) Contrast:
If ∆p is the difference of a pixel’s intensity from the mean, then the contrast value varies as

( Σ∆p² / n ) / ( Σ∆p⁴ / n )^(1/4)

i.e. the variance divided by the fourth root of the fourth moment about the mean. Hence:

Minimum value: As ∆p → 0 for all pixels, contrast → 0.
Maximum value: The calculation of luminance produces values that range from 0–100. Hence for ∆p = 50 for all pixels, contrast = 50²/50 = 50.

c) Directionality:

Minimum: Occurs when the histogram has minimum variance, i.e. equal numbers of pixels occur in each bin, and the variance = 0.
Maximum: Occurs when the histogram has maximum variance, i.e. all pixels occur in a single bin, and the variance = ((1 − 1/16)²)/16 ≈ 1/16.

Optimization

To be able to carry out a search based on the texture features of a query image, these calculations must be carried out in real time. As each feature requires significant calculation for each pixel of the image, it was expected that performance would be an issue for this program, and it was therefore written in C++ in preference to Java. Initial programming suggested that memory allocation would also be an issue, and the first implementation was therefore based on minimizing memory use. The resulting program executed successfully, but at several minutes per image it was unacceptable for use in a real-time search. Once the program’s correct operation had been established, however, attention was paid to controlling memory usage so as to allow more efficient calculation techniques, and this enabled the processing time per image to be reduced to about 3–4 seconds.

5.3 Implementation of the RMI server

General

The communication interface on the server side is provided by the Java program JcoordServer, which carries out the following functions:

1. registers its availability as a server with the RMI registry of the host machine;
2. provides implementations for two remote methods;
3. converts the image location data from the applet into the query vector format required by the database search program;
4. converts search results back to the format required for image display in the applet.


The implementation of JcoordServer to provide these functions is described in the following sections. Difficulties in providing an environment capable of supporting the interface between the Java-based server and the C++ database search system prevented full testing of the program’s functionality, and testing was generally carried out using pre-processed search response files and a single user. Preliminary testing has, however, indicated that the client-server communication functions correctly with multiple clients. Modifications would nevertheless be necessary to support this in practice, to ensure that the query image and the responses returned relate correctly to the same user.

RMI Registration

The JcoordServer class is required to run as a Java program and therefore implements the public static void main() method. This method instantiates a JcoordServer object and binds it to the remote server name “Jcoordinator” at the RMI registry. The server is then available for calls to its remote methods.

Remote Methods

JcoordServer implements the interface JcoordInt, and must therefore provide bodies for two remote methods:

1. public Img[] processInitialQuery() – which takes no arguments; the returned image array is the data set defining the images for the applet to display and the radius at which to locate each image.

2. public Img[] processFeedback(Img[]) – which takes as its argument the Img array defining the image locations selected by the user for searching, and returns a new Img array defining the images and display radii returned by the database search.
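The remote interface implied by these two methods might look like the following sketch (the Img stand-in here is minimal and its fields are assumptions, not the project's actual class):

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Minimal stand-in for the project's Img class (fields assumed); remote
// arguments and results must be serializable.
class Img implements Serializable {
    String fileName;     // fully qualified image filename
    double myR;          // display radius
    boolean present;     // whether the image is in the current data set
}

// Reconstruction of the JcoordInt remote interface described above.
public interface JcoordInt extends Remote {
    Img[] processInitialQuery() throws RemoteException;
    Img[] processFeedback(Img[] images) throws RemoteException;
}
```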

Once the applet is loaded, processInitialQuery() is called by the init() method. Following this call, the method waits for input on its System.in stream giving the filename for the image set returned by a database search. It then reads the file, parses the data contained into an Img array and returns this to the applet for display. processFeedback(Img[]) is called by the applet when the user presses the ‘submit feedback’ button, and receives the current values in the Img array, i.e. the new coordinates selected by the user for the images sent by the previous remote method call. The method processes this information to determine the new query vector and feature value weightings. These are then serialized and written to an output file for passing to the database search program. The method then waits for input on its System.in stream giving the filename of the results file. This file is processed using the same class methods as used by processInitialQuery().

Calculating the post-feedback query vector

The initial query vector is defined by earlier calls to the colour and texture feature extraction programs, so the processInitialQuery() method does not need to perform any feature extraction itself. Within the processFeedback() method, the image coordinates received from the applet are processed to form a new query vector by implementing the algorithms described in the section on creating a new query based on feedback from the user. Processing is carried out in the method writeQueryFile(Img[]), which at this development stage has been written in sequential form. In summary, the algorithm comprises:


1. For each active image

- calculate the distance of the image from the centre of the circle in relative units (radius of applet target = 10)
- calculate the weighting as 10 − distance (= imageWeight[i])
- set the weighting of the query image to 11
- calculate the sum of the weightings (= totalWeight)

2. For each active image and the original query vector
- modify each feature vector by the appropriate weighting calculated in 1.
- add together the corresponding vector coordinates from each image (the calculation is therefore based on Manhattan distances)
- divide by the sum of weightings
The array feedbackVector[48] then contains the new query vector.

3. Calculate the range of values of the texture vector co-ordinates (= textureLength)

4. Calculate the range of values of the colour vector coordinates (= colourLength)

5. Calculate the texture feedback weight (α = feedbackWeight[0]) and the colour feedback weight (β = feedbackWeight[1]) according to the formulae given in the section on creating a new query vector on feedback from the user.

6. Use the sub-function writeFile() to serialize the feedback weightings and the new query vector into a file for reading by the database search system. The data is structured as follows (each integer separated by whitespace):
- texture feedback weight
- colour feedback weight
- 48 values representing the new query vector
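The serialized layout can be sketched as follows (class and method names are ours; the field order is as listed above):

```java
import java.io.FileWriter;
import java.io.IOException;

// Sketch of the writeFile() step: texture weight, colour weight and the
// 48 query vector values written as whitespace-separated integers for
// the search program to read.
public class QueryFileWriter {
    public static String format(int textureWeight, int colourWeight, int[] query) {
        StringBuilder sb = new StringBuilder();
        sb.append(textureWeight).append(' ').append(colourWeight);
        for (int v : query) sb.append(' ').append(v);
        return sb.toString();
    }

    public static void writeFile(String path, int textureWeight,
                                 int colourWeight, int[] query) throws IOException {
        try (FileWriter out = new FileWriter(path)) {
            out.write(format(textureWeight, colourWeight, query));
        }
    }
}
```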

Processing the response data

The JcoordServer class defines two static variables, SEARCH_SIZE and APPLET_SIZE, which allow the maximum number of returns from the database and the maximum number of images displayed to differ. The class member array featureVector[SEARCH_SIZE+1][48] is sized one greater than the search size in order to contain the features for both the returned images and the original query vector (at index SEARCH_SIZE). The requirements for reading and processing the response data are identical for both of the remote methods; both therefore call the local methods readResponseFile() and buildImageSet(). readResponseFile() uses the filename read in on the System.in stream and forms a byte array of the data from this file. The response file comprises a list of data sets, each comprising an image filename, a pathname and 48 feature vector values. The data is then processed as follows:

1. Read characters to the first whitespace, these being the image filename

2. Read characters to the next whitespace, these being the image pathname, and append the filename to form the fully qualified filename


3. For the response to the initial query (identified by the Boolean variable init), this name is discarded. For subsequent queries, copy the filename into the Img object buildImg[j-1] (where j is the data set index).

4. Read and convert to integers 48 times, obtaining the feature vector values for the referenced image

5. Repeat until either the end of the data set is reached or the SEARCH_SIZE limit is exceeded.

6. For the response to the initial query, write the 48 feature vector values of the first data set into the array row featureVector[SEARCH_SIZE]. For subsequent responses the first data set will contain the previous search vector. This may be used for verification purposes, but in the current implementation it is unused and discarded.

7. For the remaining data sets (up to a maximum of SEARCH_SIZE), read the feature vector values into the array featureVector[j-1] (overwriting previous values where relevant). Set the Boolean value present of buildImg[j-1] to true (used by the applet to determine whether the data is included in the current data set). Set present to false for any remaining indexes if the response file contains fewer than SEARCH_SIZE data sets.

8. For each present image, calculate the Manhattan distance from the original query image over the 12 texture feature values and apply the texture feature weighting to this value. Repeat for the 36 colour feature values. (For the response to the initial query, weightings of 0.5, scaled to 128, are assumed.)

9. Store the calculated distance value in the data member myR of buildImg[j-1].

10. Sort the Img array buildImg[] into distance order in the sub-function buildImageSet() (which therefore also requires a corresponding re-ordering of the int[] array featureVector[]).

11. The finished Img array returnImg[] is then returned for processing and display in the applet.

5.4 Implementation of the client side Applet

Interaction with the user

On submission of a query image the applet starts on a new page, where up to ten thumbnails are displayed. The thumbnails appear in a spiral inside a circle, with the better matches lying closer to the centre. The user then moves the images around, arranging acceptable images inside the circle according to the same principle by which they were originally displayed, moving less similar ones away from the centre. If an image does not satisfy the user's idea of similarity, it can be dragged outside the circle, where it becomes blurred. Deselected images can be moved back into the circle, restoring their original colour. When the user is happy with the spatial arrangement of the images, he/she clicks the 'submit feedback' button. A new set of thumbnails, expected to be a closer match to the query, is then displayed, and the process is repeated until a suitable match is found.
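The weighted Manhattan distance of steps 8 and 9 in the response processing above can be sketched in Java as follows. This is an illustrative sketch, not the JcoordServer's actual code; the index layout (texture values first, then colour) and the integer arithmetic over the 128-scaled weights are assumptions.

```java
/** Illustrative sketch of the weighted distance from steps 8 and 9.
 *  Assumed layout of a 48-element vector: indices 0-11 hold the 12
 *  texture values, indices 12-47 the 36 colour values (all scaled 0-255). */
public class FeatureDistance {

    /** Manhattan (L1) distance over features [from, to) of two vectors. */
    static int manhattan(int[] a, int[] b, int from, int to) {
        int sum = 0;
        for (int i = from; i < to; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Weighted distance; weights are scaled 0-255, so 128 stands for 0.5.
     *  Dividing by 256 keeps the result in the unweighted range. */
    static int weightedDistance(int[] image, int[] query,
                                int textureWeight, int colourWeight) {
        int texture = manhattan(image, query, 0, 12);   // 12 texture values
        int colour  = manhattan(image, query, 12, 48);  // 36 colour values
        return (texture * textureWeight + colour * colourWeight) / 256;
    }
}
```

With the assumed initial weightings of 128 (i.e. 0.5 scaled), weightedDistance() reduces to half the plain Manhattan distance over all 48 features.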


Figure 8: The working applet

The Img class

Instances of the Img class have many responsibilities: they must maintain information about the location of the thumbnail and the full image in the database, the position of the thumbnail in the applet and its distance from the centre of the circle, and whether it is currently moving. There are two Java classes suitable for loading and displaying images: the Image class from the java.awt package and the ImageIcon class from the javax.swing package. The ImageIcon class can be used to load an image easily and safely into any applet or application, as instances created from a URL or filename are preloaded using a MediaTracker to monitor the loaded state of the image. If the Image class is chosen, it is possible that the applet will display the image before it has completely downloaded. To avoid this we opted for the ImageIcon class. Unfortunately we encountered difficulties transporting its instances between server and client: even though the same compiler was used for the source code of all classes, we received error messages about 'incompatible types of ImageIcon'. This problem was overcome by settling for a less favourable solution, that of transporting the file paths of the images instead of the actual ImageIcon instances. The applet then has to load the image files itself prior to display. The methods of the Img class comprise constructors and accessors only.
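The workaround of transporting file paths instead of ImageIcon instances can be sketched as follows. ImgSketch is a hypothetical simplification of the project's Img class; the field and method names are illustrative only.

```java
import java.io.Serializable;

/** Sketch of an RMI-friendly image record: only the file path travels over
 *  the wire, and the applet loads the ImageIcon itself after receipt. */
public class ImgSketch implements Serializable {
    private final String path;      // fully qualified thumbnail filename
    private double myR;             // distance from the circle centre
    private boolean present = true; // is this image in the current data set?

    public ImgSketch(String path) { this.path = path; }

    public String getPath()       { return path; }
    public double getMyR()        { return myR; }
    public void setMyR(double r)  { myR = r; }
    public boolean isPresent()    { return present; }

    // On the client, loading is deferred until after transport:
    //   ImageIcon icon = new ImageIcon(img.getPath());
}
```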

Figure 9: Double-clicking on a thumbnail brings forward the full image in a separate window


Server and client behaviour

The applet looks up and connects to the server during the execution of its init() method. On connection it calls the remote method processInitialQuery(), which returns the first set of images to be displayed. When the user submits his/her feedback by clicking the 'submit feedback' button, the class implementing the ActionListener interface first calculates the distance of each image from the centre of the circle (myR), then calls the remote method processFeedback(), and the array of Img instances is sent to the server. Once the feedback has been evaluated, a new array is returned to the client. The images' relative order of similarity, established by the server, is given by their myR variables. On receipt of the new array the applet calculates the x and y coordinates corresponding to each distance from the centre of the circle. Exceptions are handled carefully to ensure successful interaction between client and server.

Getting into the swing of things

The original GUI components from the java.awt package are tied directly to the local platform's graphical user interface capabilities, so a Java program executing on different Java platforms has a different appearance and sometimes even different user interactions. The Swing components allow the programmer to specify a uniform look-and-feel across all platforms, or even to change the look-and-feel while the application is running. Believing in the importance of this greater portability and flexibility, we decided to use the javax.swing package for the implementation of the applet. Unfortunately things did not go according to plan, and we encountered many problems with the Swing components. The difficulties of transporting ImageIcons between server and client have already been mentioned. We could not find a way to display images on a JFrame, so the ordinary Frame class was used to show the full images. We were also unable to control the look-and-feel of the JButton across platforms; for example, it worked fine under Windows 95 but flickered when the applet ran on Windows NT. Thus the fancy JButton, too, had to go, and was replaced by the Button class. Another unforeseen problem was that browsers do not yet support the Swing components, so the applet had to run in the appletviewer.

Applet summary

The client ImageApplet is derived from the JApplet class of the javax.swing package. It implements the Runnable interface, and a single thread is used to allow smoother operation of the paint() method. Double-buffering is applied to eliminate flicker, and the update() method is overridden to reduce the work done by the processor: the implicit call to update() would clear the onscreen background before calling paint(), but this is unnecessary as the buffer's background is cleared anyway when repaint() is called in the run() loop. For event handling the classes from the java.awt.event package are used with the components. Event listeners are registered for mouse events and for the event of clicking on the button. For the mouse events both the MouseListener and the MouseMotionListener interfaces had to be implemented, while the button event required the implementation of the ActionListener interface. The class MyFrame is responsible for displaying the full image when a thumbnail is double-clicked. See figure 10 for the OMT diagram.

Applet Methods

public void init()

Connection to the server is established and the first set of images is obtained. Button is added to the content pane and connected to its action listener object, an instance of the class ButtonHandler. A MouseHandler object is then registered with the applet to listen to its mouse events. A call to the polToCart() method of the ButtonHandler class calculates the x and y coordinates of the images to be displayed. With the exception of the Thread instance runner all variables are now initialised.


public void start() Initialises runner then calls run()

public void run() Ensures that the applet is repainted every 200 ms

public void paint() Draws the background and images on the buffer winScratch. If an image is outside the circle, an instance of the Java ImageFilter class is created to display the blurred image. Finally, the contents of the buffer are painted on the screen.

public void update() Calls paint()
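The double-buffering arrangement described above, drawing into an offscreen buffer (winScratch) and overriding update() to skip the default background clear, can be sketched as follows. This is a simplified, component-free sketch using a BufferedImage as the buffer; the shapes drawn are stand-ins for the real thumbnail painting.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;

/** Sketch of the double-buffering pattern: paint() draws the whole scene
 *  into an offscreen buffer and blits it in one operation, and update()
 *  calls paint() directly instead of clearing the on-screen background. */
public class BufferedPainter {
    private final BufferedImage winScratch; // offscreen buffer

    public BufferedPainter(int w, int h) {
        winScratch = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
    }

    /** Draw the scene into the buffer, then copy it to the target in one go. */
    public void paint(Graphics target) {
        Graphics g = winScratch.getGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, winScratch.getWidth(), winScratch.getHeight());
        g.setColor(Color.RED);
        g.fillOval(10, 10, 20, 20); // stand-in for drawing a thumbnail
        g.dispose();
        target.drawImage(winScratch, 0, 0, null);
    }

    /** Overriding update() to call paint() avoids the flicker caused by
     *  the default implementation clearing the background first. */
    public void update(Graphics target) {
        paint(target);
    }
}
```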

The ButtonHandler class

Methods

public void polToCart()

Calculates x and y coordinates for the displayable images, such that the least similar image appears close to the boundary of the circle and the images spiral around its centre.

public void cartToPol() Finds the distance from the centre of the circle given the x and y coordinates of an image.

public void actionPerformed(ActionEvent e) Calls the remote method processFeedback(). When the new set of images is returned, calls polToCart() to get images ready for display.
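The polToCart()/cartToPol() pair can be sketched as below. The exact spiral formula used by the applet is not documented here, so the radius and angle expressions (radius proportional to similarity rank, angle advancing by a fixed step) are assumptions.

```java
/** Sketch of a polToCart()-style spiral layout: image i is placed at a
 *  radius proportional to its similarity rank, with the angle advancing
 *  by a fixed step so the results spiral outwards from the centre. */
public class SpiralLayout {
    static final double ANGLE_STEP = 2 * Math.PI / 5; // assumed angular step

    /** Returns {x, y} for rank i of n images inside a circle of radius R
     *  centred on (cx, cy); the best match (i = 0) lands at the centre. */
    static double[] polToCart(int i, int n, double R, double cx, double cy) {
        double r = R * i / (double) n;
        double theta = i * ANGLE_STEP;
        return new double[] { cx + r * Math.cos(theta), cy + r * Math.sin(theta) };
    }

    /** Inverse used on submission (cartToPol): distance from the centre. */
    static double cartToPol(double x, double y, double cx, double cy) {
        return Math.hypot(x - cx, y - cy);
    }
}
```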

The MouseHandler class

Instance variables

private MyFrame frame This is initialised if the user double-clicks on an image.

private int count = 0 Records the number of instances of MyFrame at any one time; the maximum is one.

All methods of the MouseListener and MouseMotionListener interfaces must be implemented, but mouseEntered(), mouseExited() and mouseMoved() are set to do nothing.

public void mousePressed(MouseEvent e) Checks whether the mouse is currently over an image and, if so, selects the first one found.

public void mouseReleased(MouseEvent e)

Deselects the selected image and, if it is outside the circle, sets the instance variable active of that image to false.

public void mouseDragged( MouseEvent e) Updates the x and y coordinates of any selected image with the current coordinates of the mouse.

public void mouseClicked( MouseEvent e) The aim is to open a new window with the full image if a thumbnail is double-clicked, so the method first checks whether the mouse is currently over an image. If so, it creates an instance of MyFrame for the first such image found and adds a window-listener object to it. The frame must be disposed of when the window is closed, and an anonymous instance of the Java WindowAdapter class performs this task.
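The 'is the mouse over an image?' check that mousePressed() and mouseClicked() both rely on can be sketched as a first-hit bounding-box test. The thumbnail size here is an assumed constant, and the method is illustrative rather than the applet's actual code.

```java
/** Sketch of the hit test used when the mouse is pressed or double-clicked:
 *  scan the thumbnails in order and return the first whose bounding box
 *  contains the mouse position. */
public class HitTest {
    static final int THUMB_W = 64, THUMB_H = 64; // assumed thumbnail size

    /** Returns the index of the first image under (mx, my), or -1 if none.
     *  xs[i], ys[i] give the top-left corner of thumbnail i. */
    static int firstImageAt(int mx, int my, int[] xs, int[] ys) {
        for (int i = 0; i < xs.length; i++) {
            if (mx >= xs[i] && mx < xs[i] + THUMB_W
                    && my >= ys[i] && my < ys[i] + THUMB_H) {
                return i;
            }
        }
        return -1;
    }
}
```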

Suggestions for improvement

In order to make the applet work in browsers, it would be necessary to add code to the HTML file so that the Java plug-in is used to run the program, since most browsers do not have the Swing classes installed.


The user interface could be made more intuitive with the inclusion of bubble-help and some labels to guide the user’s actions. Also, if the user finds the image he/she was looking for, it should be made possible to save that image on the user’s local machine.

Figure 10: Applet structure (OMT diagram). ImageApplet (init(), start(), paint(Graphics), run(), update()) passes button presses to ButtonHandler (polToCart(), cartToPol(), actionPerformed(ActionEvent)), mouse events to MouseHandler (mousePressed(MouseEvent e), mouseReleased(MouseEvent e), mouseDragged(MouseEvent e), mouseClicked(MouseEvent e)), and a double-click on an image to MyFrame (MyFrame(int), paint(Graphics)).


5.5 Implementation of distributed aspects

One of the early considerations in making the project manageable was to isolate individual tasks, e.g. colour extraction, storing the database in flat files, etc. After we divided up these tasks, we each needed to explore the best way(s) to implement them. Clearly it would have been too restrictive to ask every member of the group to use a single programming language, since different languages suit different purposes: C executables have a performance advantage, Java has a wealth of reasonably accessible classes for performing specific functions, and Perl has powerful text processing and parsing capabilities. Having said this, our programming experience was not exactly extensive at the start of the project, being mainly limited to C++ and a little Java.

Figure 11: Outline of distributed data flow. A JPEG file is decompressed by djpeg into a BMP file; colour extraction (from the JPEG) and texture extraction (from the BMP) produce colour and texture feature files, which feed the DB search. The returns and original query file is passed to the JcoordServer, which exchanges remote calls with the applet and writes a new weightings and query file back to the DB search.


Another aspect of the project was the need to pass around quite a lot of data, and passing this data in files seemed appropriate. Additionally, the project required that the original query be a submitted file. This necessitated forming and keeping track of the names of different files as they were passed between the components of the system. Given these circumstances we decided to use CGI scripts written in Perl to manage the various components. Perl is relatively accessible given our training in C and C++ in the Autumn term. It manages processes well by passing strings, which could equally be written on the command line, to shells for execution, and the output of these calls can be piped to files. Perl's parsing capability allowed us to create and manage the names of the different files produced at the various stages of processing. Uploading and passing files around using filehandles was also relatively straightforward, as was gaining access to environment variables.

A remote client opens a file submission page and uploads an image file in JPEG format to the server. The file format is validated, and the data is piped using filehandles in a Perl CGI script. The file is placed in a specified directory, ready for preprocessing. Once the file is on the server, and after the user confirms that the image is indeed the one they want to submit, the JPEG file is decompressed (using djpeg) to form a bitmap file; this is achieved by making a system call from within a script. The bmp file is passed to a C program that extracts the 3 principal Tamura texture features for the original query image, split into 4 tiles/quadrants. The resulting list of 12 numbers is piped to a temporary holding file. We decided to decompress the file and pass it to a C executable because the large number of calculations required for extracting texture features suggested that performance might have been an issue had these calculations been done in Java.

The colour features are extracted using a Java application (described earlier). This makes use of features within the Java 2D API that allow access to individual pixel RGB intensities within a JPEG file. Again the image is split into 4 tiles, and values are calculated for the mean intensity, the variance and the cube root of the third moment, separately for red, green and blue. In total, therefore, a vector of 36 numbers is produced, which is stored in a temporary file on the server.

The Java application makes use of the Java Abstract Window Toolkit (awt) package. We deliberately chose this rather than Swing because we felt it would be more compatible with existing browsers. One of the disadvantages is that the awt package (and perhaps even Swing) needs access to the graphics drivers on the machine where the JVM runs. This became a difficulty because the server running the CGI script, and hence the JVM, may not have graphics drivers installed. We solved this by allowing the server access to the client's display by setting the appropriate environment variable. Clearly this is far from ideal, both from the security point of view and because it requires the client to actively grant this access. With more time we could have re-implemented the colour extraction program either in C/C++ or in Java using a less demanding package.

Another problem we faced with Java was that for a good deal of the term the Java application ran successfully from the command line but would not run when called from the CGI script, because the Java Runtime Environment was not installed on the public server we were using. This was solved by running our scripts on a different server that had both the Apache web server and the JRE installed.

The resulting colour and texture files are passed to the search engine, which computes the initial set of results (names, paths and feature values) and places them in a file for use by the JcoordServer. It also includes the original query and its feature values, as the JcoordServer uses these to calculate a post-feedback query vector and weights.
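The per-channel colour statistics described above (mean intensity, variance, and the cube root of the third moment) can be sketched for a single tile and channel as follows. This is an illustrative sketch, not the project's actual extractor.

```java
/** Sketch of the colour statistics computed per tile and per channel:
 *  mean, variance, and the (signed) cube root of the third central moment.
 *  In the real system this runs over the R, G and B intensities of each
 *  of the 4 tiles, giving 3 x 3 x 4 = 36 values. */
public class ColourStats {

    /** Returns {mean, variance, cbrt(thirdMoment)} for one channel's values. */
    static double[] stats(int[] values) {
        double mean = 0;
        for (int v : values) mean += v;
        mean /= values.length;

        double m2 = 0, m3 = 0;
        for (int v : values) {
            double d = v - mean;
            m2 += d * d;       // second central moment (variance)
            m3 += d * d * d;   // third central moment (skew direction)
        }
        m2 /= values.length;
        m3 /= values.length;
        return new double[] { mean, m2, Math.cbrt(m3) };
    }
}
```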


As far as RMI is concerned, we have had numerous difficulties in starting and binding to the rmiregistry on the web server through the use of CGI calls; hence the feedback part of the system remains disconnected from the submission and initial search sections. The RMI server and applet do work successfully on one machine, and with more time we could make further attempts to distribute them fully. The architecture required for this, given what we have done already, would be as follows. The CGI script opens pipes to and from the server, and suspends while waiting for the processing of the initial query. When this completes, the script writes the HTML for the applet and passes the file containing the image returns to the server using the pipe defined earlier. The user manipulates the images on the applet and submits feedback. The server calculates the new query vector and weights and places them in a file on the server. The CGI script, which has been suspended testing for the existence of this file (using Perl's -x file test), then passes the response file to the C search program for a second search. Meanwhile the JcoordServer suspends waiting for input on System.in. The CGI script passes the name of the file containing the results of the second search through the pipe to the JcoordServer, which then takes over the job of updating the applet.

State

As an initial effort to maintain state, the file that is downloaded, and subsequent files generated from it, are prefixed by the IP address of the remote client. This is for the purposes of a 'fast' prototype. Clearly, if multiple clients make submissions via a single proxy server, there will be a corresponding loss of information. Given the time, we would implement a more sophisticated state maintenance method, e.g. by setting cookies for the user.
5.6 R-trees, issues of implementation, practical problems and decisions

Unsurprisingly, a number of issues arose during the course of our work on implementing the R-tree structure and the search procedures, and a number of significant decisions were made regarding implementation. One major issue has been producing and reading files that are either produced by, or need to be read by, other parts of the system. The database image information has been precomputed and is held in text files, which are read by the program to rebuild the R-tree each time a search is requested. With a database of 54 images this is extremely fast, and hence the delay in regenerating the tree was not felt to be an issue. For a much larger database the question of how to store and regenerate the structure without rebuilding it from scratch would need to be investigated.

After developing the R-tree program we decided that it would be more useful for the user to give feedback on both colour and texture features: an initial search should provide a wider range of returned images, which the user can then rearrange to provide relevance feedback on the weighting attached to these features. To this end we implemented two C++ programs: one for an initial search, returning images similar in colour or texture (or both), and one for a subsequent feedback search, returning images similar on all features and incorporating the feedback weightings by using different search ranges for different features. For an initial search the program reads text files containing the colour and texture information extracted from the query image by the feature extraction programs. The colour information is read first, and an R-tree is constructed and searched using colour information only, returning in a text file those images providing a reasonable match on colour. The process is then repeated to generate a second text file of images similar in texture.
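The range-based matching behind the R-tree search can be sketched as below: each query feature value v is widened to the interval [v - range, v + range], and boxes are compared dimension by dimension. This is a sketch of the idea only (in Java, for consistency with the other examples), not the C++ R-tree code.

```java
/** Sketch of the range test behind the R-tree search: the query vector is
 *  widened to a box [v - range, v + range] per dimension, and an entry
 *  matches when the two boxes overlap in every dimension. */
public class RangeSearch {

    /** True if the point lies within +/- range of the query in every dimension. */
    static boolean withinRange(int[] query, int[] point, int range) {
        for (int i = 0; i < query.length; i++) {
            if (Math.abs(query[i] - point[i]) > range) return false;
        }
        return true;
    }

    /** MBR overlap test used when descending the tree; boxes are given as
     *  per-dimension minimum and maximum arrays. */
    static boolean overlaps(int[] minA, int[] maxA, int[] minB, int[] maxB) {
        for (int i = 0; i < minA.length; i++) {
            if (maxA[i] < minB[i] || maxB[i] < minA[i]) return false;
        }
        return true;
    }
}
```

With feature values scaled 0-255, the bounding range of 50 found in testing (section 6.1) corresponds to a box of width 100 around each feature value.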


From this point, duplicates returned in both files must be removed, and the complete feature information for each returned image, together with the image identifier and directory location, is returned in a text file. Removing duplicates and accessing the complete information for each returned image proved to be more work than originally envisaged. The complete feature vectors are required, however, to allow the feedback program to calculate a new query vector from the returned images' feature vectors, incorporating weighting according to the user's feedback. Although implementing separate programs for the initial search and the feedback search undoubtedly generated more work, this design allows potentially more useful results to be presented to the user. The structure of the initial search is shown in the following diagram:

Figure 12: Flow diagram representing the different stages of the initial query. The query image passes through the interface to the search engine, which queries a colour R-tree (against the colour database) and a texture R-tree (against the texture database). The colour-related and texture-related returns are merged, duplicate images are eliminated, and the returned images are passed back to the interface.
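The duplicate-elimination step shown in figure 12 can be sketched as a merge of the two result lists keyed on image identifier (illustrative only; the real system works with C++ text files rather than Java collections).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Sketch of merging the colour and texture result lists while dropping
 *  duplicates: a LinkedHashSet keeps the first occurrence of each image
 *  identifier and preserves insertion order. */
public class MergeResults {

    static List<String> merge(List<String> colourHits, List<String> textureHits) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(colourHits);
        merged.addAll(textureHits); // duplicates already present are ignored
        return new ArrayList<>(merged);
    }
}
```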


For a feedback search, the second program again reads a text file, this time provided by the feedback program, giving two weighting values for colour and texture and the feature values it has calculated for a new query vector. Once this file is read, an R-tree using the complete feature information is generated and searched, using different ranges for the colour and texture values calculated from the weightings. Again a text file is generated with the identifier, directory location and feature values of each returned image. The Java feedback program initially produced a file of byte data, and required a file giving the image identifiers and locations as strings and the feature values as unsigned characters. Unfortunately this proved problematic, particularly the issue of producing a file with a mixture of ASCII characters and bytes. Since the initial search program reads standard text files, it was decided to adapt the feedback program to generate and read files in the same format rather than to struggle to adapt the C++ programs.

The R-tree building and searching programs are complex and, although the algorithms are outlined in Guttman's original work [1], implementing them required a considerable amount of work. In particular, the recursive nature of the algorithms and the dynamic way the tree configures itself as images are added required careful programming. On a relatively small database searching time may not be an issue, but on a large one it could be significant. From the groupings that the R-tree structure generates and the efficiency of a search through the tree, it is clear that such a database structure could produce a significant improvement in performance over a less efficient algorithm.

5.7 Front End implementation and design

All the web pages are written in HTML, either statically or dynamically using CGI scripts. The default page, which is the first page the visitor sees, simply welcomes the visitor to the web site.

Figure 13: Introductory page


JavaScript is used to add an animated graphic to the default page. The animated graphic is a cycling banner, which simply cycles through an array of six images. The next page the visitor comes across is the submission page. The user can decide whether to submit his/her own image or to choose an image from the database.

Figure 14: Submission page

If the user decides to choose an image from the database, he/she is presented with a list of images. Again using JavaScript, radio buttons are added to this page alongside the list of images to ensure that only one of the images is chosen. If the user submits without picking one of the images, a warning box appears, reminding the user that an image must be picked.

Figure 15: Selecting from the database


After the submission there will be a confirmation page to make sure that the image submitted is the one that the user would like to use as the query image.

Figure 16: Confirming the submission

From this page, the user can either choose to submit again, to choose an image from the database instead, or proceed to the result page, which displays the images that are similar to the image submitted.

Figure 17: Draft results page

The user can then decide how relevant the displayed images are to the image that he/she is searching for. The images can be dragged around in the circle. If an image is not relevant at all, the user can drag it out of the circle, and it becomes blurred. The more relevant images should be placed nearer the centre, the less relevant further away. Once the user is satisfied with the positions of the images, he/she can submit again. Another set of images is then displayed, which will hopefully be more relevant to what the user has in mind.


6 TESTING

6.1 Testing the R-tree construction

Optimal Range

Several R-trees were constructed, each with different bounding regions for the individual images. Ranges from 20 to 100 were set around the feature values, with the minimum and maximum values for each feature given by feature_value - range and feature_value + range (recall that values are scaled 0-255). After manually analysing the resulting groupings of images in the R-trees generated, it was established that perceptually optimal grouping occurred with a bounding range of 50, and that sequential increments of the bounding range by 10 resulted in little or no variation in the grouping of images within a leaf node. Example groupings produced using colour feature values and texture feature values separately, and colour and texture values together, are shown in figure 18. Although the example groupings each contain three images, the nature of the tree is such that images appear in groups of up to three, so an image may be isolated in a group or paired with just one other. Groups of three images have been used here to illustrate the similarities of the images that are grouped together.

Figure 18: R-tree groupings

Two groupings using colour values only:


Two groupings using texture values only:

Using colour and texture values together:


Groupings of the images may well produce perceptual anomalies, but these are perhaps difficult to predict. Some images are much stronger with regard to colour or texture, and so a grouping based only on colour or only on texture values may produce a perceptually better grouping. This is certainly the case with the three striped images, which clearly have a strong pattern that will be reflected in the texture values but which will not be grouped together if colour is also taken into account. It is also illuminating to consider the three very similar groups illustrated, which differ only in a single image. The effect of comparing on feature values separately and together becomes clearer from the third image in each of these groups. Since it was clear that there were some perceptual anomalies in the combined R-tree, it was decided that potentially more useful results could be produced for the user if the R-tree was separated for the initial query, with results returned matching on colour or texture separately. A feedback query is then based on the overall tree, once the user has been able to reflect on the images returned and to give the system feedback on the relative importance of the different features.

6.2 Testing the search with an initial query

Clearly the system can only be effectively evaluated by testing it with query images, in order to ascertain whether the images returned are perceptually similar to the query image. To demonstrate some of the results achieved, we have illustrated (figures 19 and 20) query images (taken from the database itself) and the database images returned. If the query image is itself one from the database, it will also be returned in the results. In any search there may be images returned which do not appear to perceptually match the query image, although similarities in colour and texture may be apparent. It may be possible to improve the system's performance by refining the feature vectors, perhaps giving some values greater weighting, but this would require considerable further testing and evaluation. It is because a user's perception of images can be very subjective that relevance feedback is an important element of the system. It does appear, however, that our criteria for matching images provide an effective mechanism for selecting appropriate images to return, and the speed of searching illustrates the efficiency of searching through an R-tree structure, even with multidimensional feature vectors to compare.


Figure 19: Initial search results

Query 1:

Texture search results:

Colour search results:


Figure 20: Initial search results

Query 2:

Texture search results:

Colour search results:

Again in this case the user is provided with a much greater range of images on which to give feedback than would be the case for an initial search using both colour and texture together.


6.3 Testing the search after feedback

Initial Query

For this demonstration, image 52 was chosen as the search image; we therefore expect to receive an exact match with this image, together with a range of similar images in accordance with the distance algorithms used. The response images displayed were seen to be:

Figure 21: Response to initial query

The database search finds an exact match in the database, which is therefore located at zero distance from the centre of the applet. The other images returned are positioned according to their distance and demonstrate varying similarities in colour and/or texture features.

First feedback

A feedback query was devised to concentrate on colour features: predominantly red images were placed centrally, images with some red were placed towards the edge, and images with little or no red content were deselected (see figure 22).
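The feedback placement described above can be read as a weighting scheme: position encodes emphasis and deselection marks irrelevance. A small sketch follows, in which the linear falloff and the maxRadius parameter are our assumptions rather than details taken from the project code:

```cpp
// Illustrative mapping from an image's position in the feedback applet to
// a relevance weight: an image at the centre gets full weight 1.0, the
// weight falls off linearly to 0 at the edge of the display, and a
// deselected image is marked irrelevant with weight 0 outright.
// (Assumption: the project may use a different falloff.)
double relevanceWeight(double distanceFromCentre, double maxRadius, bool deselected) {
    if (deselected) return 0.0;
    double w = 1.0 - distanceFromCentre / maxRadius;
    return w < 0.0 ? 0.0 : w;  // clamp images placed beyond the edge
}
```

The resulting per-image weights can then drive the modified search, with weight-0 images contributing nothing.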


Figure 22: First feedback query

Response to feedback

The response to this feedback is shown in figure 23, and shows significant redness in many of the images returned. Some images with low redness but similar texture are also returned, but this can be seen to be a secondary effect in this query sequence.

Figure 23: Secondary results post-feedback


The results show that the mechanisms developed do appear to implement user feedback effectively and to provide the user with appropriate images.

7 Conclusion

We have all felt this to be an enormously challenging project. While we may not have achieved everything we set out to achieve, mainly because we have yet to ensure that the system is fully distributed, we have overcome many other hurdles along the way. We feel we have implemented the relatively sophisticated database structure and search algorithms reasonably well, and we have learnt a great deal about the issues underlying the design and building of a distributed system, despite our relative lack of experience in many technical areas. We have also learnt a lot from our research into methods for extracting features from images and representing image content. Our implementation of the database, the client–server interaction and the distribution through a scripting language has given us the opportunity to develop many practical skills that will no doubt be useful in the future.

If we had more time we would first attempt to make the CGI Perl calls mesh with an RMI registry and include wavelet features in the database. We would also attempt to implement a more sophisticated form of state maintenance. If we were to repeat the project from scratch, we would almost certainly choose a radically different architecture, perhaps using Java RMI throughout, and we would be much more likely to run our server from a non-public machine. Other possible refinements are discussed in the sections of the report to which they pertain.

We have been fortunate in being part of a cohesive and hardworking group. This is, in part, the reason for the length of this report; editing the many contributions towards this document has been a challenge in itself.


References:

[1] Antonin Guttman. R-trees: a dynamic index structure for spatial searching. University of California, Berkeley, 1984.
[2] Hideyuki Tamura, Shunji Mori and Takeshi Yamawaki. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man and Cybernetics, June 1978.
[3] P. Brodatz. Textures. Dover, 1966.
[4] G. Ciocca and R. Schettini. Using a relevance feedback mechanism to improve content-based image retrieval. In Dionysius P. Huijsmans and Arnold W. M. Smeulders (eds.), Visual Information and Information Systems: Third International Conference, VISUAL ’99, Amsterdam, The Netherlands, June 2-4, 1999, Proceedings.
[5] Norbert Beckmann, Hans-Peter Kriegel, Ralph Schneider and Bernhard Seeger. The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD Conference, 1990.
[6] Peter N. Yianilos. Data structures and algorithms for nearest neighbour search in general metric spaces. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 311-321, Austin, Texas, 25-27 January 1993.
[7] Serge Belongie, Chad Carson, Hayit Greenspan and Jitendra Malik. Color and texture based image segmentation using EM and its application to content-based image retrieval. Computer Science Division, University of California at Berkeley.
[8] Yining Deng, B. S. Manjunath and Hyundoo Shin. Color image segmentation. Department of Electrical and Computer Engineering, University of California, Santa Barbara (Shin: Samsung Electronics Inc.).


Appendix A: R-tree representation of the images based on colour features

(Tree diagram: a root node with leaf groupings of the image numbers 01–54; the layout cannot be reproduced in text.)

Appendix B: R-tree representation of the images based on texture features

(Tree diagram: a root node with leaf groupings of the image numbers 01–54; the layout cannot be reproduced in text.)

Appendix C: R-tree representation of the images based on both colour and texture features

(Tree diagram: a root node with leaf groupings of the image numbers 01–54; the layout cannot be reproduced in text.)


Appendix D: All the images

img01.jpg img02.jpg img03.jpg img04.jpg img05.jpg img06.jpg img07.jpg

img08.jpg img09.jpg img10.jpg img11.jpg img12.jpg img13.jpg img14.jpg

img15.jpg img16.jpg img17.jpg img18.jpg img19.jpg img20.jpg img21.jpg

img22.jpg img23.jpg img24.jpg img25.jpg img26.jpg img27.jpg img28.jpg

img29.jpg img30.jpg img31.jpg img32.jpg img33.jpg img34.jpg img35.jpg

img36.jpg img37.jpg img38.jpg img39.jpg img40.jpg img41.jpg img42.jpg

img43.jpg img44.jpg img45.jpg img46.jpg img47.jpg img48.jpg img49.jpg

img50.jpg img51.jpg img52.jpg img53.jpg img54.jpg


Appendix E: Outline of user interface structure

(Flow diagram. The Front Page leads to the Introduction, from which the user may submit their own image or select an image from the database. Submit Picture warns that the image submitted must be in the correct format, then shows a confirmation with options to submit again, select an image from the database instead, go to search, or return to the Introduction. Select Image from Database likewise shows a confirmation with options to select from the database again, submit a picture instead, go to search, or return to the Introduction. The Search Result page lets the user search again after giving feedback, submit another image, select another image from the database, or go back to the Introduction.)


Appendix F: Outline of system structure

(Diagram of the client side, the network and the server side. The client side comprises the Java GUI, with a display generator, a query co-ordinate reader and a local directory interface (open, download …). Across the network it sends query image feature values, the coordinates and image references of thumbnails as feedback, and requests for full images; it receives coordinates and references to thumbnails in response to a query, references to images with distances, and full image files. On the server side a client-side manager of CGI script(s) controls the flow of information between the file decoder (JPEG, BMP), the calculation unit (weighting calculation and feature extractor) and the database (the search engine and the R-tree containing feature values and references to images).)


SUMMARY RECORD OF GROUP PROJECT MEETINGS

Regular group meetings were held throughout the term, some informally as a check on ongoing progress and others more formally to facilitate decision-making and discussion of project issues. It is hard to determine how many hours we have each spent on the various parts of the project: all group members have spent a considerable number of hours on it throughout the term. This is partly because the new knowledge and skills we have needed have mainly been self-taught, and partly because, given our lack of experience, a lot of time has been spent simply overcoming technical hitches. A rough average is probably 15 hours per person per week.

IMAGE DATABASE GROUP PROJECT

Initial meeting with Stefan Rüger – 10.01.01

In working on the project we should aim to concentrate on:
• feature extraction from images
• the user interface – the user interface and the aspects we can demonstrate during our presentation will be particularly important.
• This project has been done previously as an individual project, so the earlier project report should be an important starting point.
• We need to form a picture database of images which will work well with our retrieval system.
• Any format will be suitable – converting images from different formats is not an essential part of the project.
• It should be a substantial database of approximately 50 pictures.
• The database will depend on our feature extraction method. Using colour and pattern, for example, we could use wallpaper patterns.
• We should research what extraction mechanisms exist; existing mechanisms can be used.
• Hence a primary task will be to identify images and our feature extraction mechanisms.
• In considering image retrieval we could think about speed of retrieval and scalability, i.e. how the time and complexity of search mechanisms will grow as the size of the database increases.
• We can research R-trees – a branching decision-making process – and B-trees in order to try to optimise the decision-making process. (R-trees are an extension of one-dimensional B-trees.) We will only be using a small number of features.

• We also need to consider how to store the images and their features, which may be precomputed and stored, in order to make the database efficient.
• In programming, our program needs to be network-aware.
• Only necessary data should be transferred, as the speed of data transfer may be slow.
• Research the CGI interface model.
• The program must not affect the client computer – design a Java applet which works within the browser but with restricted functionality.
• C++ can be used for programming the server.
• Research Java applets, which have libraries we can use.
• We discussed ideas for image retrieval based on two features as an example, say texture and colour. If a user queries with an image q having colour feature vector cq and texture feature vector tq,


our program could compute a metric d(q, pi) representing the distance between the query image and any other picture pi in the database. If the ‘distance’ between colours is dc = |cq − cpi| and similarly for texture dt = |tq − tpi|, then the ‘distance’ between pictures is given by wc dc + wt dt, where wc and wt are positive weightings such that wc + wt = 1. These weightings could then be amended as required to place different emphasis on features.
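The weighted distance discussed in this meeting can be sketched in C++; the Euclidean per-feature distance is our assumption, as the minutes do not fix the underlying metric:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two feature vectors of equal length
// (e.g. colour histogram values or texture values). The choice of
// Euclidean distance is illustrative.
double featureDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Combined picture distance wc*dc + wt*dt, with wt = 1 - wc so the two
// weightings are positive and sum to one, as in the minutes.
double combinedDistance(double dc, double dt, double wc) {
    return wc * dc + (1.0 - wc) * dt;
}
```

For example, with dc = 1.0 and dt = 3.0, equal emphasis (wc = 0.5) gives a combined distance of 2.0, while wc = 1.0 reduces the comparison to colour alone.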

• Our retrieval mechanism should present a series of thumbnail images of pictures from the database which most closely match the query image. Images could be displayed according to how well they match the query image, with the closest-matching images displayed, for example, in the centre of the screen.
• We will meet with Stefan Rüger next Thursday, 18.01.01, at 12pm, and will meet together on Tuesday 16.01.01, also at 12pm.
• I will send Stefan our e-mail addresses and he will send us a prototype web page to set up for this project.

IMAGE DATABASE GROUP PROJECT

Group meeting – 11.01.01

We discussed points which had arisen in our meeting with Stefan and identified areas we needed to research:
• feature extraction methods – possibly three or four – and wavelet analysis to compare two images. (James and Rita will research this area.)
• Databases and tree structures which might be suitable for our purposes. (Philip and Simon)
• Putting together a client–server interface and linking together the constituent parts of our system. (Paul)
• A database of pictures. (Susan)
• All of us can use the report of the previous project as an initial source of information and references.
• Our vision at present is of a system made up as follows:

a CGI interface linking Java applets for the user interface, a C++ program and a database – but we need to research both the individual parts and how to link these together!
• Olav may be able to offer pointers on CGI scripting.
• We need to investigate how to obtain web space for our work.

IMAGE DATABASE GROUP PROJECT

Page 55: MSc Group Project 2001 - pdfs.semanticscholar.org file1 MSc Group Project 2001 Relevance Feedback For Image Databases Supervisor: Stefan Rüger Paul Aljabar Simon John Pennifer Rita

55

Group meeting – 16.01.01

We reported back on areas of research:
• A CD of images, which we thought we might be able to obtain from Designer’s Guild, will not be available for several months! The difficulty is that although we can obtain thumbnail images from the website, we cannot enlarge these to create larger images without losing too much definition. Susan will continue to investigate other possible sources. We will also need to look for code for compression and decompression of images.
• Paul plans to speak to Per for help with setting up a website for our project.
• Although we have found information on feature extraction, we still need to find code in order to make our task achievable.
• Database packages such as Oracle can implement B-trees automatically, but this may not be appropriate for our data if it is multidimensional. Simon has references to code for implementing R-trees, but we need to consider whether these are appropriate and can be implemented in an existing database package. Philip is planning to begin setting up a simple database and attempting to link this with a C++ program.
• We will continue our research and plan to meet again at 1pm on Friday in order to clarify our questions before we meet with Stefan again on Monday.

IMAGE DATABASE GROUP PROJECT

Group meeting – 19.01.01
• We’ve had no success so far with finding a database of images (Susan investigated the V&A, but without success). We may end up having to take images from the web. We will need to consider what format to store images in. Software for creating thumbnails is available to download from the web on a trial basis. We still need to look for compression and decompression algorithms.
• The most commonly mentioned algorithms for feature extraction are for colour, texture and shape, with shape being the most difficult to implement, but we have yet to find any readily available code. It would be a good idea to collect some images so that we can test each feature extraction algorithm separately. We could easily produce, for example, some flat colour images to test a colour algorithm.
• As we are still looking for feature extraction algorithms we do not yet know what kind of values we will need to store in our database, although it will need to be an array or vector; we cannot yet be clear how large. Some algorithms may be difficult to incorporate into our system. OpenGL may have functions which can be imported.
• Philip will try to begin setting up a basic database structure and investigating how to get Oracle to ‘speak to’ a C++ program. We are not certain at this stage that a database package will be appropriate for the data we will want to store, or whether we will have to implement our own data structure in a C++ program.
• Susan will begin planning out a draft web page for the database interface.
• We discussed Jim’s diagram showing our proposed structure; we will need to get some feedback from Stefan on this on Monday.


• We have some introductory lectures on Java next week and probably on CGI the following week.
• We need to discuss priorities with Stefan in order to identify the areas of most immediate importance.
• We also need a greater allocation of server space, as some downloads may be very large, and to clarify permissions to install packages, for example for databases.
• Paul has now set up a web page for us to communicate through and to record our progress at http://www.doc.ic.ac.uk/~pa100/bb
• Simon is keeping a log of minutes from our meetings and will keep our web page up to date, but each of us also needs to keep an individual record of the time we spend working on the group project, which can also be entered onto the web page.
• Our next meeting with Stefan is on Monday at 12pm.

IMAGE DATABASE GROUP PROJECT

Meeting with Stefan Rüger – 22.01.01
• We discussed Jim’s structure diagram for our proposed system.
• In planning the database we need to think of the types of queries we want our system to be able to handle. Holding one image per file may be most efficient, but we should not put too many images into one directory; building a directory tree may improve performance.
• We would be better off not pursuing commercial database packages, as these are too complex and offer far more features than we need. We could look for a public-domain library for tree searches. It may be better to implement the database as a C/C++ program accessing text/binary files; binary flat files for the features are easy to access in C/C++. We could envisage adding a database at a later stage if our data set grew large enough to justify this.
• Most feature extraction requires bitmap format to access individual pixels. So far there seems to be a lack of documentation on feature extraction algorithms, which makes it difficult to decide what is useful. We need to read a picture into memory as a bitmap and then manipulate it using feature extraction. We have some libraries whose functionality we can look at. Two sources of algorithms are Cubic (not open source) and Megawave. Libraries have functions which take any format; we need to decide which to use and what functionality we need.

Colour histograms extract information more easily e.g. from a colour histogram we could take the most common colour or say the most common three colours as a representation of overall colour.
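The histogram summary suggested here can be sketched as follows; quantising each pixel into a small set of colour bins beforehand is our assumption, as the minutes do not fix a scheme:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Given each pixel's quantised colour bin (e.g. an index into a coarse
// RGB cube -- the quantisation itself is assumed done elsewhere), return
// the k most common bins as a compact summary of overall colour.
std::vector<int> mostCommonColours(const std::vector<int>& pixelBins, int k) {
    std::map<int, int> counts;                      // bin -> pixel count
    for (int bin : pixelBins)
        ++counts[bin];
    std::vector<std::pair<int, int>> byCount(counts.begin(), counts.end());
    std::stable_sort(byCount.begin(), byCount.end(),
                     [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                         return a.second > b.second;  // most frequent first
                     });
    std::vector<int> top;
    for (std::size_t i = 0; i < byCount.size() && (int)i < k; ++i)
        top.push_back(byCount[i].first);
    return top;
}
```

With k = 3 this yields the "most common three colours" mentioned in the minutes.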

Tamura algorithms extract six features for texture, e.g. coarseness and directionality. It may be easier to write the feature extraction algorithms ourselves than to attempt to use complex packages.

• We can use a library to open files, e.g. to open JPEG images as bitmaps. PPM and PGM formats have RGB values as ASCII. The TIFF library may also be useful.
• In order to make a web page dynamic, with the server taking input and dynamically producing a page, we will need to use CGI scripting. CGI is a program (possibly in the PHP language) whose input is an HTML file.
• Susan has produced a good rough layout of the functionality of the front end. We may want to allow the user to specify the URL of an image in order to submit their own image, or to use an image from our own database as the query image. Some form of Java applet will be needed.


• For the preliminary report we need to detail some background and motivation for the project, including what the challenges are and how we are going to solve them. Current research projects could be referred to.
• Performance enhancements can always be discussed in later reports but may not be implemented.
• Stefan will send us copies of his lecture slides about colour, texture and wavelets. Texture extraction will tend to be more ad hoc, as perception of texture is not well understood.
• Demonstrating the relevance feedback is the most important part of the project.
• We will meet ourselves on Tuesday 23rd January at 12pm and with Stefan again on Monday 29th January at 12pm.

IMAGE DATABASE GROUP PROJECT

Group meeting – 23.01.01

We discussed points which had arisen during our meeting with Stefan.
• Jim has e-mailed us information on bitmap files; Jim and Paul will work on accessing data from bitmap files.
• Stefan has e-mailed us copies of his slides, which contain algorithms we can incorporate into programs.
• For wavelet analysis we would need to use a library. Rita has seen references to C code.
• Binary files may be a better form in which to hold data. We could set up text files and change to binary files later. Rita will investigate algorithms.
• Simon and Philip will continue to research data structures, as a database package seems to be inappropriate and Stefan is keen that we implement a structured search tree such as an R-tree.
• We can all attend Theo’s introductory lectures on Java on Wednesday at 10am.
• As the first report is due at the end of January, each of us needs to produce a short summary, of around one page, on the area(s) we have been working on. If we can e-mail these to Paul by Saturday morning then he will begin to put together a rough draft of the first report over the weekend.

IMAGE DATABASE GROUP PROJECT

Meeting with Stefan Rüger – 29.01.01
• Jim has found code which can access pixels in bitmap files.
• Paul is investigating the use of Perl, as he feels he will have access to more support for Perl programming than for PHP. The functionality we require can be achieved with a Perl script.
• Stefan suggested we look at an MSc conversion visualisation project from last year, which may be another source of information. A PDF file is available from Stefan’s home directory.


• The Java API from the Sun site may be useful and worth investigating.
• The feature algorithms we will use have largely been decided. At this stage we will aim to implement feature algorithms for colour and texture, but consider the possibility of wavelets for shape features if time and progress allow. Jim and Rita will continue to work on this.
• Simon will continue to work on implementing the algorithms for R-trees.
• Susan is continuing to work on applets and the design and implementation of the front end of the system.
• Susan has also now found suitable images from a fabric manufacturer’s website. This represents a good application area with ideal images to demonstrate the forms of image similarity that we want to implement.
• We considered our possible resource needs, but Stefan agreed that we should not need any additional resources, and hence it was decided that we did not need to apply for a project machine. He can distribute space on a 70GB server if needed, as we may need more storage space if we decide to extend the database. Again this will depend on time and progress.
• Our next meeting with Stefan was arranged for two weeks’ time, to allow us to make more significant progress before we discuss issues and progress again. We will meet again briefly on Friday 2nd February.

IMAGE DATABASE GROUP PROJECT

Group meeting – 02.02.01
• The first report was completed and handed in, describing our initial design of the image database system and the elements of it which we will aim to implement.
• Only a brief meeting was necessary to ensure that each member of the group was clear about the area of the project they are working on. The nature of the system we are developing has meant that we have been able to identify separate elements which can be worked on almost independently at this stage. Hopefully this will mean that we can make faster progress. All areas will involve a considerable amount of learning, as we have little prior experience of the technical skills needed.
• Rita and Jim are working on implementing feature extraction algorithms both in C and in Java.
• Susan is learning about applets and implementing the functionality required for the front end. Rita can also help with this, as she has some experience with Java.
• Simon and Philip will continue to work on building and searching the R-tree data structure which will form the database element of the system. This can be implemented in C/C++.
• Paul will continue to learn about CGI scripting and Perl, which will be required to link the separate elements of the system together.
• Our next meeting with Stefan will be on Tuesday 13th, so any issues which need to be raised should be passed to Simon before this date.

IMAGE DATABASE GROUP PROJECT


Meeting with Stefan Rüger – 13.02.01
• We outlined to Stefan our progress in the different areas we have been working on.
• Colour and texture feature algorithms have been successfully coded. Some investigation of wavelets has been done, but any attempt to code this may have performance implications. If we decide to incorporate wavelet analysis it may save time to use available software rather than developing our own implementation. At this stage wavelet analysis will not be pursued further unless significant progress is made overall, as it is not a priority.
• We are developing the coding of the R-tree data structure algorithms, although their complex nature makes the implementation difficult. It was felt, however, that this should be pursued, as a data structure should be an element of our database system. Such a structure would be essential for a larger database, as simple sequential searching through a large database would be far too time-consuming.
• We also discussed the issue of saving the database structure, as it is not clear how we can save the structure in order to minimise the time needed to reconstruct it when the program is run. One possibility might be to save it in a flat file, using offsets to mimic the behaviour of pointers within the linked structure.
• We have also begun to consider ideas based on the theory of metrics, in order to develop ideas for implementing ‘distances’ between images and weightings of features based on user feedback.
• Using CGI scripting we can now allow a user to submit a file for a query image, and we rename files to handle the issue of multiple users. We have implemented this using the user’s IP address as the file name, but this may not be sufficient. It may be necessary to set a cookie to distinguish users, but this is a technique in which none of us has experience.
• Methods for passing information to and from the applet were discussed. One possibility is using sockets in Unix, as sockets can be opened in Java. The socket tutorial on the Sun site may be useful.
• Our main concern at this stage is time. We are making progress in all areas, but the amount of learning required and the complexity of the task mean that we may not be able to make as much progress as we would like within the time we have.
• We need to set priorities in order to try to ensure that we realise our main tasks and develop a prototype system. Further improvements and refinements can be discussed in the final report.
• Our next group meeting will be on Friday 16th February.

IMAGE DATABASE GROUP PROJECT

Group meeting – 16.02.01
• Following on from our meeting with Stefan, it was agreed that implementing wavelets for shape analysis will only be attempted if we have time later.
• Susan is continuing to work on the GUI, investigating mouse dragging and in particular how to drag panels. Clients can submit image files using a browse button.
• Simon is continuing to work on a C++ program to implement R-trees.


• Storage of the database of images will be in a flat text file, as this seems to be the most realistic option for us. It may be that the tree itself will need to be reconstructed from scratch each time the program is run; with fifty images this should not have too great a performance implication.
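The flat-text-file storage decided on here could look like the following sketch; the line format (image name followed by its feature values) is our illustration, as the minutes do not give the file layout:

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Record {
    std::string image;                 // e.g. "img01.jpg"
    std::vector<double> features;      // precomputed colour/texture values
};

// One line per image: "imgNN.jpg f1 f2 ...". (Assumed format.)
std::string save(const std::vector<Record>& db) {
    std::ostringstream out;
    for (const Record& r : db) {
        out << r.image;
        for (double f : r.features)
            out << ' ' << f;
        out << '\n';
    }
    return out.str();
}

// Reads the flat file back; the R-tree is then rebuilt from these records.
std::vector<Record> load(const std::string& text) {
    std::vector<Record> db;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        Record r;
        fields >> r.image;
        double f;
        while (fields >> f)
            r.features.push_back(f);
        db.push_back(r);
    }
    return db;
}
```

Stringstreams stand in for the actual file here; in the system the same format would be written to and read from disk.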

• Paul is continuing to work on CGI and RMI. It may be useful for all of us to look at the Java tutorial on networking basics using RMI.
• Jim is currently trying to compile his feature extraction program to run on the Unix command line. This is not an issue with Rita’s colour extraction program, as it is written in Java.
• Our next meeting will be on Friday 23rd February.

IMAGE DATABASE GROUP PROJECT

Group meeting – 23.02.01
• Progress in all areas was reviewed and some issues discussed.
• The most significant issue, which is becoming clearer, is that elements of the program will need to be able to read files generated by other elements. This may well require some elements of the system to change the format of files, and perhaps to generate intermediate files, before they can proceed.
• Significant progress has been made on implementing the R-tree structure, after it had seemed that we might have to resort to a simpler but less efficient search mechanism. This is now being tested and refined.
• Paul has encountered problems calling the Java extraction program, and further problems regarding calling the Java program which implements the user feedback are beginning to emerge. Paul has contacted Olav about these, but he is currently away.
• We also discussed the requirements of the second report and identified the areas that individuals or subgroups have been working on and can write about. Paul will edit our contributions in order to produce a more consistent style.
• Our next meeting will be on Friday 2nd March.

IMAGE DATABASE GROUP PROJECT

Group meeting – 02.03.01
• As we have been communicating effectively while working on the elements of the project, most of our progress and the resulting issues have already been discussed between ourselves.
• As we anticipated, time is proving to be one of the most significant problems. All the group members have put in, and continue to spend, a considerable amount of time working on the project.
• The time spent developing the individual elements of the system is limiting the time available for integration, and this appears likely to be a significant problem. Olav is still away, so we have not had any real support with this problem.
• Our next meeting will be on Tuesday 13th March.


IMAGE DATABASE GROUP PROJECT

Group meeting – 13.03.01
• Olav has now returned from holiday and, having contacted him over two weeks ago, Paul was finally able to get some support with our integration problems.
• The main issue is that Java programs cannot be called via CGI on the public server, for security reasons. Had we known this would be a problem, we could have requested a project machine while these were still available, but it was agreed with Stefan at the time that we would not need one.

• At this stage it is almost certainly unrealistic to hope to integrate a fully operational system.
• Our priority now must be to ensure that all the separate elements of the system are working, so that we can demonstrate the results achieved even if files have to be passed manually from one element to another for testing purposes.
• We should not be penalised for being unable to integrate the system, as the problem was beyond our control. Paul has already discussed this with Stefan, who did not see it as a failing on our part.
• We will all be putting in a considerable amount of time over the next week, and everyone should look at the first and second reports, and Olav’s guidelines for the final report, to begin thinking about their own contributions.

• Rita will be absent for most of the rest of this week following her eye operation.
• Further meetings at this stage seem unnecessary, as we will undoubtedly be spending a considerable amount of time together in the labs!