University of Surrey
School of Electronics, Computing & Mathematics
Department of Computing
Centre for Knowledge Management

Technical Report

Image Retrieval using Object-Relational Technology: A Case Study

Mariam Tariq, AI Group

February 2002
Image Retrieval Using Object-Relational Technology

“Images must be understood as a kind of language”
W. J. T. Mitchell, Iconology
CONTENTS

1 Introduction
2 Characteristics of Images
  2.1 Image Categorization and Indexing
  2.2 Specialist Images: Scene of Crime Images
3 Methods of Image Retrieval
  3.1 Text-Based Image Retrieval
  3.2 Content-Based Image Retrieval
  3.3 Hybrid/Integrated Approaches to Image Retrieval
4 Image Retrieval using Object-Relational Technology: A Case Study
  4.1 Storage and Manipulation of Multimedia Data
  4.2 Example of Retrieval by Visual Properties
  4.3 Example of Text-Based Retrieval
  4.4 A Comparison of Image- and Text-Based Retrieval
5 Discussion
References
1 Introduction
Increasingly, digital image collections are becoming an intrinsic part of many professional and specialist groups in domains as disparate as art, medicine and scene of crime documentation, not to mention the plethora of image galleries now available on the World Wide Web. In response to this increase in image repositories, recent research is focussing on the most effective ways to organize, index and retrieve images. This report aims to review the current state of the art in image retrieval techniques and provide a case study of how images can be stored and retrieved using object-relational technology. Until recently, most information retrieval systems dealt with unstructured text repositories, with searches limited to matching keywords or full text, while most database systems provided for the storage and querying of highly structured but simple alphanumeric data. Now, with the need and ability for systems to store, utilize and distribute more complex data such as images, audio and spatial data, a whole new paradigm for a more intelligent search and retrieval solution has become necessary.
The existing methods and systems available for image retrieval will be discussed with the aim of identifying their strengths and limitations, as well as raising some issues associated with multi-modal information extraction and retrieval. Firstly, the various types of images that might be used by different specialist groups are briefly discussed, with a more detailed overview of scene of crime images. An attempt is made to identify the various characteristics of the images that might be relevant for indexing purposes. The various methods currently being used for image retrieval are also discussed, from text-based retrieval to content-based retrieval by visual properties, and finally the more recent trend by some researchers to combine these two methods. Next, a case study is provided of Informix, an object-relational database system that provides support for the storage and retrieval of complex data based on DataBlade technology. Finally, a detailed discussion is provided to highlight the major issues and define the main criteria that need to be considered for the most effective image retrieval solution based on current technology.
2 Characteristics of Images
This section aims to identify the various characteristics an image may possess and how these characteristics can be utilized for indexing and querying purposes. Issues related to classification are briefly discussed, followed by a look at the different levels of information that could be related to an image. Next we talk about “specialist images”, which have peculiar patterns and structures that can be exploited. Different groups of professionals deal with different sets of images, and the types of queries performed may vary greatly. It is important to study the way specialists understand and describe images, as well as to define what particular features they look for when searching for a particular image. Srihari (1995b, p409) suggests that “Understanding a picture can be informally defined as the process of identifying relevant people and objects”. Some users might be looking for a specific object, a certain scene, a particular event or an aberration from a reference image. For example, a doctor might be searching for a slight variation in colour or texture in an image of a tissue sample to detect a tissue abnormality, while an architect might be looking for the north elevation of a specific building. An art critic might search for a more abstract theme in an image, such as the depiction of pain or love.
It may be argued that individual images have characteristic properties: an image is essentially a contrast, a contrast of colours, textures and shapes. These basic contrasts help humans to identify objects within an image. Hence it is important to identify the various characteristics that an image might have, so that certain classes of images with similar features may be grouped together and placed under a certain category. Depending on this categorization, images can be organized and stored in a fashion that makes retrieval and browsing more effective. There are numerous criteria on which images can be categorized: e.g. an image may be coloured or black & white, indoor or outdoor, represent a seascape or a landscape, and so on. Some images may depict complex scenes, such as a group of players in a football match, while others focus on just one object, e.g. a ball. Unfortunately, due to the variations in different types of images, a standard scheme has not yet been developed for classifying them. Fine art cataloguers were among the first to attempt to organize and classify pictures of works of art. Two classification schemes that have been used frequently are ICONCLASS [1] and the AAT thesaurus produced by the Getty [2].
2.1 Image Categorization and Indexing
There can be various kinds of information associated with an image: visual information that relates directly to the properties of the image, such as the colours and shapes of the objects present (e.g. green, brown, round), as well as non-visual information such as the identification of objects (e.g. football field, players, ball) and events (e.g. football match, world cup), or the format of the image (e.g. gif, jpeg). Objects may have meanings at different levels. For example, an image of an eagle may be described in terms of its colour and shape, which is difficult to do with words: brown (or black and white) in colour, with a streamlined shape? At a higher level it can be said to be the image of a bird, which may be identified as an eagle. In a more abstract sense it may, for example, depict strength to certain people, such as the Nigerians, who use it on their national emblem.
According to Del Bimbo (1999) there may be three different types of information associated with images: content-independent metadata, such as the image format, author name and date; content-dependent metadata, which refers to low-level or perceptual features such as colour, shape, texture and spatial relationships; and content-descriptive metadata, which deals with semantic primitives, such as the identity and role of image objects or their relationship to real-world objects, and with impressions, which are more abstract concepts such as the significance of the depicted scenes. One can argue that an image can be indexed and retrieved at these three levels, namely perceptual, semantic and impressions, with an increasing level of abstraction when moving from perceptual to impressions. The semantic and impressions levels will most probably use text descriptors for indexing and retrieval. These various attributes (e.g. semantic relationships) can also be used to classify images into distinct categories, which could be arranged hierarchically to enable navigation through image collections.

[1] http://www.iconclass.nl/
[2] http://www.getty.edu/
Jaimes & Chang (2000) have discussed the need to index visual information at different levels, depending on the different ways users might want to access the information. There may be a variety of information associated with an image, establishing the need to determine what information is important for indexing, and at which level of abstraction. The authors distinguish between a percept, which refers to what our senses perceive, and a concept, an abstract or generic idea that is an interpretation of what is perceived, usually based on some background knowledge. The authors also make a distinction between syntax, which refers to the way visual elements are arranged, and semantics, which deals with the meaning of these elements; finally, a distinction is drawn between general and visual concepts. Based on the above distinctions, they have developed a 10-level conceptual framework for indexing visual information at different levels (see Figure 1).
The 10-level structure can be divided into two sections: those features based on syntax/percepts, such as the type/technique of the image, global distribution (e.g. colour, shape and texture), local structure (colour, shape and texture of local components) and global composition (the particular arrangement of different items in the image); and those features based on semantics/visual concepts, such as generic, specific or abstract objects and scenes. There does not seem to be a provision in the model, though, for the different objects that make up an image to be recursively defined in terms of those objects' own visual properties. It should be noted that indexing at the perceptual level is completely automatic; at the semantic level there might be some automation, but some manual annotation may have to be done as well; at the abstract level it has to be completely manual. More knowledge is required as one moves down these levels, i.e. from perceptual to abstract.
Figure 1 10-level conceptual framework proposed by Jaimes and Chang (2000, p5)
Jaimes and Chang's 10-level conceptual framework shares many features with Del Bimbo's broader classification. The following table shows how their classifications relate to each other.
GENERAL CLASSIFICATION OF IMAGES

  Del Bimbo                      | Jaimes & Chang
  -------------------------------+---------------------------
  Content-Independent Metadata   | Syntax/Percept
  Content-Dependent Metadata     |   1. Type/Technique
                                 |   2. Global Distribution
                                 |   3. Local Structure
                                 |   4. Global Composition
  Content-Descriptive Metadata   | Semantics/Visual Concept
   - Semantic primitives         |   5. Generic Objects
                                 |   6. Generic Scene
                                 |   7. Specific Objects
                                 |   8. Specific Scene
   - Impressions                 |   9. Abstract Objects
                                 |   10. Abstract Scene

Table 1 Comparison of Del Bimbo and Jaimes & Chang's indexing levels for images.
Jaimes & Chang's syntax/percept category seems to relate to both Del Bimbo's content-independent metadata and content-dependent metadata categories. According to Jaimes & Chang, the type/technique category could be a description of the type of image, such as a painting or a black and white photograph, in which case it would correspond with Del Bimbo's content-dependent metadata level. The content-independent metadata, for example the image format, could also be said to fall under the type/technique level. The semantics/visual concept category can be seen to correspond with the content-descriptive metadata, which is divided into semantic primitives and impressions. Since impressions deal with abstract concepts, they could map to levels 9 (abstract objects) and 10 (abstract scene) of Jaimes & Chang's framework. It is interesting to note that if we try to draw an analogy with linguistics, the syntax/percept level (content-independent and content-dependent metadata) is similar to the syntax and grammar of a language, the semantics/visual concept level (content-descriptive metadata) is akin to the semantics of a language, and impressions could be likened to pragmatics, where context is important. The above table gives us a scheme for indexing all types of images in general. The next section goes on to discuss the characteristics of specialist images, with the aim of studying whether they have distinct features that could be used for indexing purposes in addition to the general features discussed above.
2.2 Specialist Images: Scene of Crime Images
Specialist images may be classified as a set of images that are used by professional or specialist groups, such as architects, engineers, radiologists and crime scene investigators. It usually takes an expert in the specific field to fully comprehend the complex information that might be depicted by the images. Specialist images tend to be relatively constrained, which makes them easier to categorize than a random set of images. Consider, for example, a radiologist looking at a set of x-ray images. Each image varies from the others, but there are certain features common to all, such as depicting bone material and being on a grey scale. A specialist image generally focuses on objects that are specific to the domain, and a number of these objects will recur frequently in different images. One image may contain a number of different objects, which may be related to each other through a variety of meaning relations. The presence of one object may help to elaborate the meaning or role of another, or it might preclude the existence of the other.
Key meaning relationships include spatial relationships and part-whole relationships. Hence images can be said to have a number of 'hidden' features that are discernible only to a trained person. For instance, most people will appreciate an aesthetically pleasing building, but only a few of us will be able to discern the aesthetics from the architectural design of the building. Similarly, those of us who have studied physics or electronics at school may be able to identify the components of an electronic circuit, but it is only the well trained who can tell at a glance whether one or more components of the circuit are missing or misplaced. When experts look at a building, a circuit or an x-ray image, they see a unity, which conveys a certain meaning that escapes the novice. A specialist image can thus be distinguished at the level of meaning and at the level of communicative intent or pragmatics; in Del Bimbo's classification, for example, a specialist image may require rich content-descriptive metadata.
In this section we discuss in some detail the different types of images a scene of crime officer may take, with respect to the content of the image, in order to elicit some characteristic features that could be used for categorization. Crime scene photographs play a key role in most serious crime scene investigations. Their main purpose is to document the scene of crime exactly as the crime site was found. These photographs are then used for detailed analysis by the investigating officers and later as evidence in court. Since every crime scene is unique, there is no standard rule for the number or types of photos that are taken, but a basic pattern is usually followed. The crime scene officer usually follows along the supposed 'path' of the crime, including the point of entry, the location of the crime, and the point of exit. There are generally three types of photographs that are taken (Staggs, 1997):
1) Overview photographs show the entire scene. In the case of a burglary, for example, photos might be taken of the four sides of the building, especially the elevation(s) that might contain the point of entry/exit, as well as the surroundings. In some cases bird's-eye-view or aerial photos might be taken as well to show a wider extent of the surroundings. If an incident took place in a room, photos of each face of the walls might be taken, as well as from each corner of the room. Sketches are usually made to indicate the location and angle of the camera.

2) Midrange photographs are taken of interesting objects to show the spatial relationships amongst them.

3) Lastly, close-up photographs are taken of relevant objects. These might be taken using a special one-to-one lens, or horizontal and vertical rulers might be placed to show the scale of the object. In this case the object is first photographed without the rulers. Sometimes overlapping photographs might be taken to provide a panoramic view of the crime scene.
Table 2 Examples of overview, midrange and close-up photographs of some ridge detail.
The crime scene photographer uses special techniques, such as “painting by light”, to photograph images at night or indoors when it is dark. The types of photographs taken will generally differ depending on the specific crime scene, e.g. car accidents, homicide, arson and so on. Photographs are often taken of fingerprint marks, footwear marks, tool marks and tyre marks that can later be compared for identification. Bloodstain patterns and trajectories, as well as ricochet marks, might be photographed for further analysis by a forensic expert to determine, for example, the angle and height of a shot or stab wound (Staggs, 1997). Currently, scene of crime officers provide brief textual captions for the images, which may or may not have more detailed descriptions provided in the report. It should be noted that the officers have to be careful not to provide any interpretation related to the scene; for example, they tend to refer to a blood-like substance or red-coloured liquid instead of saying blood, even though that is what it obviously appears to be. The language used to describe the images is discussed in some detail in the next section.
Figure 2 Overview of possible image categories for scene of crime: general scenes, cars, bodies, bloodstain patterns, tyre marks, shoe marks, fingerprints and weapons.

Figure 2 provides an overview of some categories of images that are used in the scene of crime domain. It is important to know in what manner the scene of crime officers (SOCOs) will be searching for the images. For fingerprints, shoe marks, tyre marks and bloodstain patterns it might be impossible to articulate a description in words, so a content-based search will definitely be needed. Fingerprint databases are already one of the most successful examples of CBIR and are being used extensively for identification purposes, such as AFIS (Automated Fingerprint Identification System) used in the USA and NAFIS [3] (National Automated Fingerprint Identification System), recently launched in the UK. All these categories have very distinct patterns, so CBIR works best for them. However, when it comes to general scene of crime pictures and images of bodies and weapons, a CBIR search has many limitations, so there is a definite need for indexing by concepts in the form of text.
The table below shows an image taken at a scene of crime. An attempt has been made, purely on the intuition of this author, to classify the image based on Jaimes & Chang's stratified model. The content-independent information, such as the fact that it is a colour image in jpeg format, can go in the type/technique level. This image looks like it has been taken at midrange, with the intention of showing the relationship between the body and the table. Since midrange is a type of image, that information can go to level 1 as well. Levels 2 to 4 involve low-level image processing techniques, which will be performed automatically with no human intervention. Generic objects could be 'table', 'body', 'gun' and 'floor', while a generic scene could be 'indoor'. One can then go on to define the objects more specifically in level 7, such as 'a wooden table', 'male body', 'Browning pistol'. The specific scene in question could be a crime scene. It is much harder to define abstract objects and scenes: one could say that the pool cue could have been used as a 'weapon', and the crime scene could be a 'murder scene'. Scene of crime officers will most probably not be searching for images at the abstract level.
[3] http://www.pito.org.uk/news/press/26apr01.asp
[Image: body on floor showing adjacent table]

  Syntax/Percept                             |
  Jpeg, colour, midrange                     | 1. Type/Technique
  (automatic low-level processing            | 2. Global Distribution
   techniques used here)                     | 3. Local Structure
                                             | 4. Global Composition
  Semantics/Visual Concept                   |
  Body, floor, table, can, gun               | 5. Generic Objects
  Indoor                                     | 6. Generic Scene
  Wooden table, male body, Budweiser can,    | 7. Specific Objects
   Browning pistol, brown pool cue           |
  Crime scene                                | 8. Specific Scene
  Weapon, deceased man                       | 9. Abstract Objects
  Murder scene                               | 10. Abstract Scene

Table 3 Example of how Jaimes & Chang's classification can be used for a crime scene image.
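The annotations of Table 3 lend themselves to a structured representation. The following sketch shows one possible way of capturing them; the class and field names are invented for illustration here and are not taken from any system described in this report.

```python
# Illustrative sketch: Table 3's annotations as a structured record.
# Level names follow Jaimes & Chang's 10-level framework; the class and
# field names are this report's own invention, not an existing API.

from dataclasses import dataclass, field

@dataclass
class ImageIndexRecord:
    # Syntax/percept levels (1-4). Levels 2-4 would hold automatically
    # extracted features, so they default to None in this sketch.
    type_technique: list                  # level 1
    global_distribution: object = None    # level 2 (e.g. colour histogram)
    local_structure: object = None        # level 3
    global_composition: object = None     # level 4
    # Semantics/visual concept levels (5-10), using text descriptors.
    generic_objects: list = field(default_factory=list)   # level 5
    generic_scene: str = ""                               # level 6
    specific_objects: list = field(default_factory=list)  # level 7
    specific_scene: str = ""                              # level 8
    abstract_objects: list = field(default_factory=list)  # level 9
    abstract_scene: str = ""                              # level 10

# The crime-scene image of Table 3 expressed in this structure:
record = ImageIndexRecord(
    type_technique=["jpeg", "colour", "midrange"],
    generic_objects=["body", "floor", "table", "can", "gun"],
    generic_scene="indoor",
    specific_objects=["wooden table", "male body", "Budweiser can",
                      "Browning pistol", "brown pool cue"],
    specific_scene="crime scene",
    abstract_objects=["weapon", "deceased man"],
    abstract_scene="murder scene",
)
```

A record like this makes explicit which levels are text-indexable (5-10) and which would be filled by automatic feature extraction (2-4).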
3 Methods of Image Retrieval
Initially, keyword-based searching was one of the most popular ways of retrieving images. Images were sometimes placed under specific categories, which were then organized into a hierarchy that users could navigate and browse to search for relevant images. Images were generally annotated and/or categorized manually, which may be very time-consuming if the image repository is large, and which also introduces a degree of subjectivity. Content-based image retrieval techniques were developed in order to address these limitations. Here retrieval is based on the visual properties of an image, such as colour and texture. Though successful to a certain extent, this method has its own limitations. Recently there have been trends towards combining these two approaches, on the premise that the two methods will complement each other. The next three sections discuss the text-based, content-based and integrated image retrieval methods in some detail.
3.1 Text-Based Image Retrieval
Keyword-based search is amongst the oldest techniques used for indexing and retrieving information, initially just for text repositories and later for images as well. Experts, or people familiar with a certain domain, would manually annotate images with keywords they thought appropriate. These keywords could then be used for indexing the images and performing searches. Systems later began supporting more complex searches, such as Boolean searches, where various keywords can be combined using connectives to make a query more precise. This method is used by most search engines on the Web.
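The keyword-and-connective scheme described above can be sketched with an inverted index. The image names and annotations below are invented examples, not data from any system in this report.

```python
# Minimal sketch of keyword indexing with Boolean connectives, assuming
# images have been manually annotated with keyword sets (the annotations
# below are invented examples).

annotations = {
    "img1.jpg": {"body", "floor", "indoor", "gun"},
    "img2.jpg": {"table", "indoor", "can"},
    "img3.jpg": {"gun", "outdoor", "car"},
}

# Build an inverted index: keyword -> set of images annotated with it.
index = {}
for image, keywords in annotations.items():
    for kw in keywords:
        index.setdefault(kw, set()).add(image)

def search_and(*terms):
    """Images annotated with ALL of the given keywords."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(*terms):
    """Images annotated with ANY of the given keywords."""
    result = set()
    for t in terms:
        result |= index.get(t, set())
    return result

print(search_and("gun", "indoor"))   # only img1.jpg carries both keywords
print(search_or("table", "car"))     # images carrying either keyword
```

The AND connective narrows the result set while OR widens it, which is how the connectives make a query more or less precise.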
Web-based search engines such as Yahoo and WebSeek (Chang 1997) place images into categories, which are hierarchically arranged so that people can navigate and browse through an image collection. Recently, automatic indexing has been attempted by using text in proximity to an image on a Web page to generate keywords. Frequency of occurrence is used as a measure to determine relevancy, with, for example, more weight being given to words in title blocks and words in proximity to the image in multimedia documents. Different methods are used to automatically produce a set of relevant index terms while limiting the number of false terms and repetitions, such as the elimination of stopwords, the use of stemming, the identification of noun groups to eliminate verbs and adjectives, and the exploitation of the structure of a document if present (Yates & Neto, 1999). WordNet is quite extensively used for query expansion purposes; for example, Flank (1998) uses weights for the different lexical semantic relationships present in WordNet, such as hypernymy and meronymy, for semantic expansion. Recently some researchers have started using NLP techniques to process the captions associated with images to aid in the retrieval process. Rose et al (1999) extract dependency structures from image captions as well as queries, and a matching algorithm is used to compare the two (they call this phrase matching). The match is weighted and combined with keyword matching to provide an overall score. Their system is limited to working with short captions. The evaluation, conducted by two judges on captions averaging nine words, showed an average precision of 85.5% for keyword matching and 93.5% for phrase matching at 10% recall.
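The index-term generation steps mentioned above (stopword removal, stemming and weighting by position) can be sketched as follows. The stopword list, the title weight and the suffix-stripping rule are illustrative choices, not taken from the cited systems; a real implementation would use a proper stemmer such as Porter's.

```python
# Sketch of automatic index-term generation: stopword removal, a crude
# suffix stripper standing in for a real stemmer, and extra weight for
# words appearing in the title. All parameters here are illustrative.

from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "is", "and", "to"}

def crude_stem(word):
    # Toy stand-in for a stemmer such as Porter's: strip common suffixes.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(title, caption, title_weight=3):
    """Weighted term frequencies; title words count title_weight times."""
    weights = Counter()
    for text, w in ((title, title_weight), (caption, 1)):
        for raw in text.lower().split():
            word = raw.strip(".,;:!?")
            if word and word not in STOPWORDS:
                weights[crude_stem(word)] += w
    return weights

terms = index_terms("Players in a football match",
                    "The players surround the ball on the football field.")
# 'football' occurs in both title and caption, so it outweighs 'ball'.
```

Frequency of occurrence, boosted for privileged positions such as the title, gives exactly the kind of relevancy ranking of index terms described in the text.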
3.2 Content-Based Image Retrieval
One can argue that content-based image retrieval (CBIR) systems work at the perceptual level. Primitive features such as colour, shape and texture are automatically extracted, using certain algorithms, for a stored set of images. The user supplies a sample visual query whose features are extracted and compared with those of the stored images, and the images most similar to the query image are returned. This process is known as query by visual example, where the example may be an image, a sketch produced by the user, or, in the case of retrieval by colour, patches of relevant colour strategically located by the user. This type of retrieval is known as similarity-based retrieval and differs from matching in that the images in the database are reordered according to their measured similarity to the query example. Similarity-based retrieval is concerned with ranking rather than classification. The user can interact with the system to search for relevant images by recursively defining and refining a query using a mechanism known as relevance feedback (Del Bimbo 1999).
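The ranking idea behind similarity-based retrieval can be sketched in a few lines. The feature vectors below are invented placeholders for whatever the feature-extraction stage would produce, and Euclidean distance is one common choice of similarity measure among several.

```python
# Sketch of similarity-based retrieval: images are ranked by the distance
# between their feature vectors and the query example's vector, rather
# than classified as matching or not. Vectors are invented placeholders.

import math

database = {
    "sunset.jpg": [0.9, 0.4, 0.1],
    "forest.jpg": [0.1, 0.8, 0.2],
    "beach.jpg":  [0.7, 0.5, 0.3],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_features):
    """Return image names ordered from most to least similar."""
    return sorted(database,
                  key=lambda name: euclidean(database[name], query_features))

ranking = rank_by_similarity([0.8, 0.5, 0.2])  # a query-by-example vector
# 'beach.jpg' ranks first: its feature vector lies closest to the query.
```

Every image receives a rank rather than a yes/no decision, which is precisely the contrast with matching drawn above; relevance feedback would then adjust the query vector and re-run the ranking.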
This section very briefly discusses the different methods used in image processing to compute the various visual properties. Colour is one of the most popular visual features used for indexing. The most common technique used is the colour histogram, which identifies different colour channels in an image and is constructed by counting the number of pixels belonging to each channel. The colour histogram has a number of variations, such as the cumulative colour histogram, colour moments and colour sets. When a large set of images is used, a global colour scheme can result in a large number of false positives. Due to this limitation, a colour layout method can be used, which divides the image into different regions and then computes the colour features of each region. The problem here is that the segmentation may not work very reliably.
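A minimal sketch of the colour histogram technique follows: pixels are counted per quantized colour channel, and two histograms are compared by histogram intersection, one common similarity measure. The tiny pixel grids are invented test data.

```python
# Sketch of the colour histogram: count pixels per quantized RGB bin,
# then compare two images with histogram intersection. The pixel data
# below are invented miniature "images" of (r, g, b) tuples.

def histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and count pixels."""
    counts = {}
    step = 256 // bins
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
    return counts

def intersection(h1, h2):
    """Histogram intersection: shared pixel mass as a fraction of h1."""
    total = sum(h1.values())
    shared = sum(min(h1.get(k, 0), h2.get(k, 0)) for k in h1)
    return shared / total

grass  = [(30, 200, 40)] * 8 + [(25, 180, 35)] * 8    # mostly green
grass2 = [(32, 205, 45)] * 10 + [(40, 150, 50)] * 6   # also green
sky    = [(80, 120, 220)] * 16                        # mostly blue

sim_grass = intersection(histogram(grass), histogram(grass2))
sim_sky   = intersection(histogram(grass), histogram(sky))
# The two grass patches overlap far more than grass and sky do.
```

Note that the histogram is global: two quite different scenes with similar overall colour balance would score as similar, which is the false-positive problem that motivates the colour layout method.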
Shape is one of the key features used to identify an object. In image processing there are mainly two different methods of representing shapes: boundary-based methods, such as the Fourier descriptor, which define just the outer contour of a shape, and region-based methods, such as moment invariants, which compute values over the entire shape region. Invariance is an important property: it deals with the issue of shape representations remaining unaltered under object transformations such as translation, rotation and scaling. Owing to these transformation issues, automatic segmentation that is reliable enough to identify objects in images from their shape features is difficult to achieve.
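To illustrate the region-based family, the sketch below computes Hu's first moment invariant from normalized central moments; central moments are unaffected by translation, so the descriptor is identical for a shape and a shifted copy of it. The shape representation (a set of pixel coordinates) is a simplification chosen for this example.

```python
# Sketch of a region-based shape descriptor: normalized central moments
# and Hu's first invariant (eta20 + eta02). Central moments are computed
# about the centroid, so translating the shape leaves them unchanged.
# The binary shape is represented, for simplicity, as a set of (x, y)
# pixel coordinates.

def central_moment(shape, p, q):
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    return sum((x - cx) ** p * (y - cy) ** q for x, y in shape)

def hu_first_invariant(shape):
    """phi1 = eta20 + eta02, where eta_pq = mu_pq / mu00^((p+q)/2 + 1)."""
    mu00 = float(len(shape))  # mu00 is simply the area of a binary shape
    eta20 = central_moment(shape, 2, 0) / mu00 ** 2
    eta02 = central_moment(shape, 0, 2) / mu00 ** 2
    return eta20 + eta02

# An L-shaped region and a translated copy of it:
shape = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}
moved = {(x + 7, y + 3) for x, y in shape}
# phi1 is identical for both sets, despite the translation.
```

The mu00 normalization additionally gives approximate scale invariance for sampled shapes; the full set of seven Hu invariants extends this to rotation as well.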
Texture is a property of most surfaces and is characterized by differences in brightness and intensity. It is an important feature used to distinguish between image patches of the same colour, such as the sky and the sea. Examples of visual texture properties are coarseness, contrast, regularity and directionality. The most popular method used is the wavelet transform, which has been combined with other methods, such as Kohonen maps, for improved results (Del Bimbo 1999, Chang 1999).
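As a flavour of the wavelet approach, the sketch below performs a single level of the 2D Haar decomposition, the simplest member of the wavelet family, and uses the energy of the detail coefficients as a crude texture cue. This is a toy illustration, not the multi-level schemes used in the cited work.

```python
# Sketch of a one-level 2D Haar wavelet decomposition. It splits an image
# into a low-resolution approximation plus three detail bands; the energy
# in the detail bands is one crude indicator of how textured a patch is.

def haar_2d(image):
    """One decomposition level; `image` is a 2D list with even dimensions."""
    rows, cols = len(image), len(image[0])
    approx, hor, ver, diag = [], [], [], []
    for i in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            a_row.append((a + b + c + d) / 4)   # approximation
            h_row.append((a - b + c - d) / 4)   # detail: left/right differences
            v_row.append((a + b - c - d) / 4)   # detail: top/bottom differences
            d_row.append((a - b - c + d) / 4)   # detail: diagonal differences
        approx.append(a_row); hor.append(h_row)
        ver.append(v_row); diag.append(d_row)
    return approx, hor, ver, diag

def detail_energy(bands):
    """Sum of squared detail coefficients: higher for busier textures."""
    return sum(x * x for band in bands for row in band for x in row)

flat   = [[5, 5, 5, 5]] * 4                  # uniform patch: no texture
stripy = [[0, 9, 0, 9], [0, 9, 0, 9]] * 2    # strongly striped patch
_, *flat_details   = haar_2d(flat)
_, *stripy_details = haar_2d(stripy)
# The striped patch has large detail energy; the flat patch has none.
```

Two patches of identical mean colour, like the sky and the sea, can thus still be separated by comparing their detail-band energies.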
Spatial relationships between objects in images can be useful in identifying objects, as well as in providing information on how different objects relate to each other. These relationships can be directional or topological (Del Bimbo 1999). Directional relationships consider the relative directions of different objects, such as right of, below and above, as well as the distance between objects, which may be defined using the Euclidean metric. Topological relationships, which are invariant under various transformations, use set-theoretical concepts such as adjacency, overlapping, disjunction and containment to determine the relationships between objects. If the different visual objects in an image can be identified, the spatial relationships between them can be determined using a spatial parser. Similarly, natural language can describe the spatial relationship between a reference object and an unknown object using relational propositions, which can then be used to locate the unidentified object. Srihari (1995b) has used this information, present in photographic captions, to aid in identifying the people present.
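The directional and topological relations just described can be sketched over axis-aligned bounding boxes. The relation definitions and the crime-scene-flavoured boxes below are simplified illustrations invented for this example (with y increasing downwards, as in image coordinates).

```python
# Sketch of directional and topological relations between two objects
# represented by axis-aligned bounding boxes (x1, y1, x2, y2), with y
# increasing downwards as in image coordinates. The definitions are
# deliberately simplified illustrations of the ideas above.

def right_of(a, b):
    """Directional: a lies entirely to the right of b."""
    return a[0] >= b[2]

def below(a, b):
    """Directional: a lies entirely below b (larger y in image terms)."""
    return a[1] >= b[3]

def overlaps(a, b):
    """Topological: the boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def contains(a, b):
    """Topological: box a fully contains box b."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]

table = (10, 40, 60, 80)
gun   = (70, 50, 90, 60)    # to the right of the table
body  = (20, 85, 55, 120)   # below the table
scene = (0, 0, 200, 200)    # the whole frame

print(right_of(gun, table))   # True
print(below(body, table))     # True
print(overlaps(gun, body))    # False
print(contains(scene, table)) # True
```

The topological relations stay true if the whole scene is translated or scaled, whereas the directional ones depend on orientation, mirroring the invariance distinction drawn above.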
A number of CBIR systems are currently available, which may be used either independently or incorporated into database modules (see section 3). At the time of writing this report, three commercial systems are available (Visual RetrievalWare by Excalibur Technologies, Virage and QBIC). The QBIC system (Maybury 1999), developed at IBM, is one of the earliest commercial CBIR systems and had a large influence on later systems that use the same techniques and ideas. QBIC supports queries on image and video databases based on sample images, user-defined sketches, and user-selected colour and texture patterns. The QBIC system has two main components: a database and a visual query language. When populating the database, the images are processed to extract the relevant features, which are then stored in the database. The query language is used to generate a graphical query whose features can be used to search for similar images in the database. Most of the other systems are experimental systems that are available on the Web for demonstration purposes. Chang et al. (1999) provide an insightful discussion of the current techniques and issues involved in the CBIR approach, while Veltkamp & Tanase (2000) provide an extensive survey of 39 systems based on a detailed framework of the relevant features.
3.3 Hybrid/Integrated Approaches to Image Retrieval

There has been a great deal of research on developing models and systems in the fields of Vision Processing and Natural Language Processing, but until recently there has been little interest in integrating the two areas. Srihari was one of the first researchers to contemplate the notion that pictures can be understood and retrieved more effectively if the two modalities of vision and language are combined. This section uses her work as a basis to discuss similar research trends, as well as to identify some of the main issues that need to be considered when building a multi-modal system (Srihari 1995a).

The central issue addressed by Srihari's research is the correspondence problem (Srihari 1995a), i.e. how visual information can be correlated with words (it is important to note that words could refer to sentences, events etc., not just nouns). She proposes developing models for vision and language which, when combined with domain-specific knowledge, would enable a mapping from one modality to another. An interesting issue to investigate here is how the different semantics of vision and language relate to each other. Srihari has developed a system called PICTION which extends the notion of using text associated with images, known as collateral text, in scene understanding by defining visual semantics: a theory of how to systematically extract and interpret the visual information present in language. This is carried out at the lexical, syntactic and semantic levels as well as by interpreting spatial prepositions. Image interpretation is carried out to derive the meaning of a scene in terms of the objects present and their interrelationships.

According to Srihari, visual descriptions can be organised in a hierarchy similar to textual descriptions, and she has extended WordNet by superimposing a visual hierarchy with links such as visual is_a to represent hyponymy and visual part_of for meronymy. Srihari's research is, however, limited to photographs of people and certain types of structured information available in the respective photographic captions. The idea needs to be extended further for use with arbitrary text or language. Another point she raises is that of co-referencing: the ability to determine the pictorial object being referred to by an entity described in a text. It should be noted that the word used here is entity, since there will always be a set of words that have no corresponding image (e.g. certain adjectives). As Srihari has pointed out, the correspondence between words and images is generally many-to-many. It should also be considered that words and images can be inter-related amongst themselves, as well as with each other, in various ways.

Paek et al. (1999) use a different methodology for integrating linguistic and visual-based approaches to the labelling and classification of indoor/outdoor photographs in the domain of terrorist news. On the text side they use a term frequency-inverse document frequency (TF*IDF) vector-based approach, with different types of words extracted from different amounts of text such as the caption and the full article. On the image side they use an object frequency-inverse image frequency (OF*IIF) vector-based approach to classify objects, which are defined by clustering automatically segmented regions of training images. By combining these two vectors they achieve an improvement in classification accuracy of around 12% over other existing methods.

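The core idea of fusing a textual TF*IDF vector with a per-image object-frequency vector can be sketched as follows. The captions, object counts and weighting details below are invented for illustration; the original OF*IIF formulation differs in detail.

```python
# Hedged sketch of combining a textual TF*IDF vector with a visual
# object-frequency vector into one vector per image, loosely in the
# spirit of Paek et al. (1999). All data here is hypothetical.
import math

def tf_idf_vectors(docs):
    """Build one TF*IDF vector per document over the shared vocabulary."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    vectors = []
    for d in docs:
        tf = {w: d.count(w) / len(d) for w in vocab}
        vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vocab, vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

captions = [
    "car bomb outside embassy".split(),
    "crowd gathers outside embassy".split(),
    "beach holiday photo".split(),
]
# Hypothetical per-image object frequencies from segmented regions,
# e.g. counts of [building, person, sea] regions:
visual = [[3, 1, 0], [2, 2, 0], [0, 0, 4]]

_, text_vecs = tf_idf_vectors(captions)
# Concatenate the two modalities into a single vector per image:
combined = [t + v for t, v in zip(text_vecs, visual)]

# Image 0 should be closer to image 1 (same scene type) than to image 2:
print(cosine(combined[0], combined[1]), cosine(combined[0], combined[2]))
```

Either modality alone can misclassify; the concatenated vector lets text evidence and region evidence reinforce each other.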
Similarly, Sclaroff et al. (1999) have proposed a system that combines visual and textual statistics in a single vector that can be used to search for images on the Web. The novel idea here was to use Latent Semantic Indexing (LSI) on the text side and colour and orientation histograms for the images. LSI provides some advantages over the classical keyword method used by most search engines in that it implicitly handles issues such as synonyms, word senses and term omissions. The text used is either taken from the URL and/or determined heuristically from its proximity to the image in a web document. Their experiments showed that better performance was achieved by a combination of the text and visual vectors.

The research discussed above mainly focuses on integrated indexes rather than on knowledge structures that model both visual and linguistic features. Benitez et al. have developed a knowledge representation framework called MediaNet that represents both semantic and perceptual information related to multimedia data. A semantic network is used in which the concepts (nodes) refer to the semantic notion of what an object is. Each concept may have a number of text representations, such as "man", "human" and "homo", as well as audio and visual representations. Concepts can be linked together by various relationships such as hyponymy, meronymy and entailment. Concepts with their textual representations were created using WordNet, which also generated all the senses and synonyms for each word. WordNet was also used to automatically generate all the required relationships. Visual representations were generated automatically using colour and texture feature extraction tools.
4 Image Retrieval using Object-Relational Technology

This section aims to provide an overview of the current state of the art in database technology
within the context of image retrieval. Databases have become an essential and integral part of most modern enterprises. A database system consists of a database management system (DBMS) and one or more databases. A database is a collection of interrelated data representing information about some real-world enterprise, such as a scene of crime information system. The DBMS is a software system that handles the access, storage and maintenance of the data, as well as functioning as the interface between the users and the database. There has been a steady evolution of database systems in response to changes in the type and amount of data, as well as advances in hardware support. For many years, relational database management systems (1980s to present) have dominated the market, being optimised for storing and querying simple alphanumeric data, which was sufficient for traditional applications. However, modern database applications increasingly need to store and manipulate more complex data types such as images, video, spatial and time-series data. This need led to the advent of object-oriented database technology (early 1990s), based on the object-oriented programming paradigm, which acquired only a niche market; the main reason is that OODBMSs are optimised for handling complex objects but are not query-oriented. Recently these two technologies have been merged, resulting in object-relational database technology, with the WWW being the major driving force behind it.

ORDBMSs tend to be query-oriented over complex data, thereby taking the best of both worlds, as well as being downward compatible with relational systems. Most of the mainstream relational vendors, such as Oracle, Informix, IBM and Sybase, have adopted object-relational technology. At present, OR model functionality is based on the SQL-3 standard. ORDBMS features include support for inheritance and encapsulation, definition of domain-specific data types with related routines and functions, and support for smart large objects.
4.1 Storage And Manipulation Of Multimedia Data
As discussed in section 3.2, certain software companies such as Excalibur Technologies and Virage have developed retrieval solutions for image, audio and video data. An interesting recent trend has been to integrate such technologies with database systems in the form of plug-in modules (for example, Informix's extensible DataBlade module technology enables the movement of business logic from the client to the server). A DataBlade module is a software package consisting of a collection of domain-specific data types and their related functions that can be plugged into the Informix dynamic server, enabling it to provide the same level of support for the new data types as it provides for the built-in ones. A DataBlade module may consist of a number of the following components:
1. User-defined data types, which may be created as row, distinct or opaque types, and whose values can be stored, manipulated using queries or routines, and passed as arguments to routines.
2. A collection of routines that operate on the data types, providing new domain-specific operations that extend the processing and aggregation functions provided by the server.
3. Interfaces, which are collections of routines providing a standard for DataBlade development and use.
4. Tables and indexes for storing and accessing data directly from a database.
5. Access methods defined by the user to operate on tables and indexes.
6. Client code, which may provide a user interface so that users can query, modify and display the new data types.

Informix provides a tool known as the DataBlade Developers Kit (DBDK) to help in creating a DataBlade. Examples of existing DataBlades are those that provide support for multimedia data, such as the Excalibur Image DataBlade (discussed in the next section), the Video Foundation DataBlade, the Spatial DataBlade for geographical information, and the Excalibur and Verity Text DataBlades for document management. As mentioned in the previous section, the Informix server provides support for smart large objects such as images, audio and Microsoft Word files, which are used by these DataBlades. Various DataBlade packages can be used together if a combination of data types is needed for an application, e.g. using the Image and Text DataBlades together to store images with captions, so that searches can be done on visual properties as well as keywords. Other database vendors provide similar technologies, such as DB2 Extenders (e.g. the Image and Video Extenders) and Oracle's Data Cartridges.

Figure 3 Various components of a DataBlade module.

4 http://www.informix.com, Developing DataBlade Modules for Informix Internet Foundation, 2000 (White Paper).
4.2 Example of Retrieval By Visual Properties

Until recently, most photographic data, such as scene of crime photos, were stored as hard copies on film, with an ID or name tag that might be stored on a computer, similar to an art cataloguing system. With advances in hardware and communications support, images could then be stored as files on computer systems. This approach is still commonly used in most image databases and has the major disadvantage that the files can be moved or tampered with while the database table still holds a reference to them, or that a wrong reference is stored in the table. Now most major DBMSs provide support for Binary Large Objects (BLOBs), so that images can be stored in binary form directly in the database, which provides the advantages of integrity and security. The Excalibur Image DataBlade provides data types and functions that allow the storage and retrieval of images. Images of all common types and formats are supported, from BMP to JPEG and TIFF. Images can be stored as external files or BLOBs, and a feature extractor function can be used to extract features based on colour, shape, texture, brightness structure, colour structure and aspect ratio (see Table 4), which are then stored as a combined feature vector. A trigger can be used to update the feature vector if an image is changed.

FEATURE DESCRIPTION
Colour content: Measure of the colours in an image
Shape content: Measure of the relative orientation, curvature, and contrast of lines in an image
Texture content: Measure, on a small scale, of the flow and roughness of an image
Brightness structure: Measure of the brightness at each point in the image
Colour structure: Measure of the hue, saturation, and brightness at each point in the image
Aspect ratio: Measure of the ratio of the width to the height of the image
Table 4 Properties of an image used for the feature vector
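The notion of a combined feature vector can be sketched with a few simplified stand-ins for the properties in Table 4. Excalibur's actual extractors are proprietary and far more sophisticated; the features below (mean colour, brightness, aspect ratio) are only illustrative.

```python
# Illustrative sketch of building a combined feature vector for an image,
# using simplified stand-ins for some of the Table 4 properties.

def feature_vector(pixels, width, height):
    """pixels: flat list of (r, g, b) tuples, row-major."""
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    # Brightness as the average of the per-pixel channel means:
    brightness = sum(sum(p) / 3 for p in pixels) / n
    aspect_ratio = width / height
    # A real extractor would also append shape, texture, and the
    # per-point brightness/colour structure measures of Table 4.
    return [mean_r, mean_g, mean_b, brightness, aspect_ratio]

# A 2x2 "image": three red pixels and one white one.
img = [(255, 0, 0), (255, 0, 0), (255, 0, 0), (255, 255, 255)]
fv = feature_vector(img, width=2, height=2)
print(fv)   # prints [255.0, 63.75, 63.75, 127.5, 1.0]
```

A vector like this would be stored alongside the image row and recomputed by the trigger whenever the image changes.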
A content-based search (see section 3.2) can be carried out by providing a sample image and searching for similar images in the database based on their feature vectors. Searches can be made on a combination of all the image features, or a subset of the features can be combined. For example, you can search for similarity by shape alone, or combine colour and shape, using flags that can be turned on (1 or above) or off (0). Different relative weights can also be given to the properties, such as colour: 2, shape: 6, texture: 4, brightness: 3 and so on. A 'Resembles' function can be used to provide a similarity threshold for the returned images; for example, if a resemblance of 0.80 is used, then images with less than 80% similarity will not be returned. The following example shows three features (colour, shape and texture, from left to right) weighted equally with a 75% resemblance, the resulting images being given a ranking.

5 Excalibur Image DataBlade Users Guide. http://www.informix.com/answers/english/docs/datablade/5356.pdf

SELECT img_id, rank FROM gallery
WHERE Resembles(fv, GetFeatureVector(IfdImgDescFromFile('\images\deadBody.gif')),
                0.75, 1, 1, 1, 0, 0, 0, rank #REAL)
ORDER BY rank;

Figure 4 The different components of an image feature-based search query: the feature vector column, the feature vector of the query image, the similarity threshold, and the colour content, shape content, texture content, brightness structure, colour structure and aspect ratio flags.

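The weighted, thresholded ranking performed by the Resembles() call can be sketched as follows. The per-feature similarity scores and image names are invented; a real system derives them from the stored feature vectors.

```python
# Hedged sketch of weighted, thresholded similarity ranking in the style
# of the Resembles() function. All similarity scores are hypothetical.

def resembles(per_feature_sims, weights, threshold):
    """Weighted average of per-feature similarities, filtered by threshold.

    A weight of 0 switches a feature off, like the 0/1 flags in the
    query; weights above 1 emphasise a feature."""
    total = sum(weights)
    if total == 0:
        return None
    score = sum(s * w for s, w in zip(per_feature_sims, weights)) / total
    return score if score >= threshold else None

# Hypothetical [colour, shape, texture] similarities of three gallery
# images to the query image:
gallery = {
    "img-1": [0.95, 0.90, 0.85],
    "img-2": [0.82, 0.78, 0.80],
    "img-3": [0.60, 0.55, 0.50],
}
weights = [1, 1, 1]          # colour, shape, texture weighted equally
scores = {img: resembles(sims, weights, 0.75)
          for img, sims in gallery.items()}
ranked = sorted(((img, s) for img, s in scores.items() if s is not None),
                key=lambda pair: -pair[1])
print(ranked)
```

With a threshold of 0.75, img-3 falls below the cut-off and is dropped, while the remaining images are returned in ranked order, mirroring the behaviour described above.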
A selection of different types of images from the scene of crime domain was used to test the image retrieval capabilities of the DataBlade. Different categories of relevant images were used, based on the types of photographs taken at a crime scene, such as close-up shots of single objects and indoor and outdoor photographs. The images were divided into the following four categories based on their complexity:
1. One object/image - same object, different rotations. Close-up photographs are often taken of the same object from various sides or angles. The background would usually be the same in this case.
2. One object/image - different objects. In this case close-up photographs are taken of different objects, which may or may not belong to the same set (for example, all guns or all knives). The background may vary.
3. Many objects/image - indoor. These are photographs taken inside a room in a building where a crime may have been committed, and may be mid-range or overview photographs.
4. Many objects/image - outdoor. These are overview or mid-range photographs taken outside, if a crime has been committed outdoors, or of the approach path to a building or car in which the crime may have been committed.
The next few sections discuss an example scenario for each of the four cases presented above. For every scenario, each image is used as a query image and the similarity ranking of the retrieved images is noted.
SEARCH SCENARIO ONE
(One Object/Image - Same Object)

The purpose of this test was to see, if an object is rotated in different directions, what range of rankings would be returned in the search. A pistol was chosen as the required object, and six images of the object at different angles were used, as shown in Table 5a. The images have been given the IDs Pistol-1 to Pistol-6.

Pistol-1�
Pistol-2
Pistol-3
Pistol-4
Pistol-5
Pistol-6�
Table 5a Images of a Browning pistol at different angles and rotations
� 22
Table 5b shows the percentage similarity of each of the images to Pistol-1, which was chosen as the query image. It is interesting to note that even though images Pistol-1 and Pistol-6 are of the same pistol, they are shown to have a 20% difference in similarity, because they show different sides of the pistol.

Image 1 2 3 4 5 6
Rank 100%� 97.5%� 96.9%� 94.7%� 83.3%� 79.4%�
Table 5b Relative similarity and order of retrieval of the Pistol images when Pistol-1 is given as the sample query image

The matrix below was created to see the variations in ranks when each image of the set of six above was used as the search image. It can be seen from the matrix that, even though the images are of the same object, the different rotations cause them to be returned with different rankings. In certain cases, if the angle of rotation is important, then the difference in ranking is justified; otherwise it gives the indication that the images are of different objects.

1 2 3 4 5 6
Pistol-1 1.0000
Pistol-2 0.9747 1.0000
Pistol-3 0.9688 0.9335 1.0000
Pistol-4 0.9467 0.9823 0.9424 1.0000
Pistol-5 0.8327 0.8324 0.8242 0.8088 1.0000
Pistol-6 0.7940 0.7915 0.8016 0.7859 0.7367 1.0000
Table 5c Matrix�showing�similarity�ranking�of�Single�Object�images�
SEARCH SCENARIO TWO
(One Object/Image - Different Objects)
The images shown below are of 1-, 2- and 3-bladed knives, with a black background and one knife per image. The images have been given the IDs Knife-1 to Knife-9. The images are similar in that they are all of pocketknives, but they vary in size, in the number of blades, in the angle at which the blades are open, and in the colour of the handle. The aim was to see the efficacy of the ranking based on the equally weighted features of colour, texture and shape.

Knife-1
Knife-2
�
Knife-3
Knife-4
Knife-5
Knife-6
Knife-7
Knife-8
Knife-9
Table 6a Images�of�1-,�2-�and�3-bladed�knives�with�the�same�background
The result set returned when Knife-1 was given as the query image is shown in the table below. It is interesting to note that the difference in similarity between Knife-1 and Knife-9 is 21.5%, which is very close to the difference in the previous example with the gun rotated to a different side, even though these two images of the knives are much more dissimilar.

Image 1 2 3 4 5 6 7 8 9
Rank 100%� 85.6%� 85.5%� 85.2%� 81.8%� 79.7%� 79.3%� 78.5%� 78.5%�
Table 6b Relative�similarity�and�order�of�retrieval�of�the�Knife�images�when�Knife-1�is�given�as�the�sample�query�image���
� 24
The matrix below was produced, similarly to the one for the pistol, to see how the rank varied when each of the nine knife images was used as the search image. It is interesting to note that for each of the 2-bladed knives (Knife-2, 3 & 4), the other two were retrieved as the closest match. The 1-bladed knives (Knife-6, 7 & 8) had the closest similarity to Knife-9, but when 1-bladed Knife-5 was given as the query image, 2-bladed Knife-6 was returned as the closest match. The similarity in this case was based on the type of handle and the angle of the blade, not the number of blades.

1 2 3 4 5 6 7 8 9
Knife-1 1.0000
Knife-2 0.8563 1.0000
Knife-3 0.8546 0.8949 1.0000
Knife-4 0.8517 0.8847 0.9209 1.0000
Knife-5 0.8180 0.8548 0.8671 0.8708 1.0000
Knife-6 0.7965 0.8091 0.7903 0.7968 0.8021 1.0000
Knife-7 0.7925 0.8044 0.8006 0.7924 0.7824 0.8403 1.0000
Knife-8 0.7853 0.8323 0.8139 0.8143 0.8471 0.8059 0.8112 1.0000
Knife-9 0.7852 0.7825 0.7687 0.7687 0.7854 0.8414 0.8204 0.8060 1.0000
Table 6c Matrix�showing�similarity�ranking�of�Knife�images�
SEARCH SCENARIO THREE
(Many Objects/Image - Indoor Scene)

In this case we have taken nine images from an indoor mock crime scene. The images show a body on the floor near a table with various items on it, as well as some images of ridge detail. These images are much more complex than those in the previous two scenarios in that they have many objects in one image, as well as differences in background due to the different colours and textures of the floor, the walls and the furniture. This complexity makes it difficult for the user to search for an item present in an image with many objects. The images have been labelled Indoor-1 to Indoor-9 and are shown in Table 7a.
� 25
Indoor-1
Indoor-2�
�
Indoor-3
Indoor-4
Indoor-5�
�
Indoor-6
Indoor-7
Indoor-8�
�
Indoor-9
Table 7a�Images�of�an�indoor�crime�scene
Table 7b shows the percentage similarity of the images above to the image Indoor-1. From the result set it can be seen that Indoor-9 is the most dissimilar, which is to be expected. It is not clear, though, on the basis of the objects present, why Indoor-6 should be a closer match than Indoor-2 or Indoor-4.

Image 1 6 4 7 5 2 8 3 9
Rank 100%� 74.2%� 70.5%� 69.3%� 69.3%� 67%� 61.5%� 58.6%� 53.2%�
Table 7b Relative similarity and order of retrieval of the Indoor images when Indoor-1 is given as the sample query image

It can be observed in the matrix below that Indoor-8 is the closest match to Indoor-9, which is good, but it does not hold the other way round. The most similar image to Indoor-8 is Indoor-7, which would be expected since Indoor-8 is a close-up shot of Indoor-7. Images Indoor-1 and Indoor-4 were retrieved as the closest matches to Indoor-5, indicating that the similarity was based mainly on the expanse of the floor.
1 2 3 4 5 6 7 8 9
Indoor-1 1.0000 0.6695 0.5861 0.7047 0.6926 0.7418 0.6931 0.6148 0.5321
Indoor-2 0.6695 1.0000 0.6135 0.6843 0.6148 0.6575 0.6451 0.6271 0.5936
Indoor-3 0.5861 0.6135 1.0000 0.6047 0.6196 0.5846 0.5955 0.5992 0.6039
Indoor-4 0.7047 0.6843 0.6047 1.0000 0.6748 0.7366 0.7183 0.6395 0.5978
Indoor-5 0.6926 0.6148 0.6196 0.6748 1.0000 0.6641 0.6434 0.5328 0.5877
Indoor-6 0.7418 0.6575 0.5846 0.7366 0.6641 1.0000 0.7397 0.6313 0.5982
Indoor-7 0.6931 0.6451 0.5955 0.7183 0.6434 0.7397 1.0000 0.6454 0.5955
Indoor-8 0.6148 0.6271 0.5992 0.6395 0.5328 0.6313 0.6454 1.0000 0.6057
Indoor-9 0.5321 0.5936 0.6039 0.5978 0.5877 0.5982 0.5955 0.6057 1.0000
Table 7c Matrix�showing�similarity�ranking�of�Indoor�images
SEARCH SCENARIO FOUR
(Many Objects/Image - Outdoor Scene)

Outdoor images can be even more complex than indoor images. The images in Table 8a, which are labelled Outdoor-1 to Outdoor-9, can be seen to have "noise" in the background, such as leaves, grass and bushes, which may make it even more difficult to identify objects of interest. The images are of a mock crime scene set up at a church where a body was found outdoors. The overall view of the church is shown, and then the photographer zooms in to the exact location where the body was found. Close-up shots were taken of the body and other objects of interest. As in the examples above, the percentage similarity of all the images to image Outdoor-1 is shown in Table 8b, while the matrix of all the images is shown in Table 8c.
� 27
Outdoor-1
Outdoor-2
Outdoor-3
Outdoor-4
Outdoor-5
Outdoor-6
Outdoor-7
Outdoor-8
Outdoor-9
Table 8a Images�of�an�Outdoor�crime�scene
The table below shows that images 2, 3 and 4 are the most similar to Outdoor-1, which is what would be expected, since they all show part of the building and some nearby landscape.
� 28
Image 1 2 3 4 8 9 7 6 5
Rank 100%� 76.7%� 71.7%� 71.6%� 71.3%� 70.8%� 70%� 67.9%� 67.1%�
Table 8b Relative similarity and order of retrieval of the Outdoor images when Outdoor-1 is given as the sample query image

In the matrix below it can be observed that the closest match to Outdoor-9, which is a close-up shot of the body, was Outdoor-4, while one would expect Outdoor-5 or Outdoor-8 to be more similar, since they show a closer view of the body.

1 2 3 4 5 6 7 8 9
Outdoor-1 1.0000 0.7669 0.7171 0.7163 0.6705 0.6785 0.6992 0.7132 0.7075
Outdoor-2 0.7669 1.0000 0.8672 0.8031 0.7105 0.7026 0.7947 0.7873 0.7785
Outdoor-3 0.7171 0.8672 1.0000 0.8328 0.7475 0.7178 0.7996 0.8105 0.7833
Outdoor-4 0.7163 0.8031 0.8328 1.0000 0.8285 0.8132 0.8507 0.8450 0.8447
Outdoor-5 0.6705 0.7105 0.7475 0.8285 1.0000 0.8181 0.8051 0.8349 0.8163
Outdoor-6 0.6785 0.7026 0.7178 0.8132 0.8181 1.0000 0.8052 0.7950 0.8104
Outdoor-7 0.6992 0.7947 0.7996 0.8507 0.8051 0.8052 1.0000 0.8598 0.8326
Outdoor-8 0.7132 0.7873 0.8105 0.8450 0.8349 0.7950 0.8598 1.0000 0.8396
Outdoor-9 0.7075 0.7785 0.7833 0.8447 0.8163 0.8104 0.8326 0.8396 1.0000
Table 8c Matrix�showing�similarity�ranking�of�Outdoor�images�
4.3 Example of Text-Based Retrieval

Simple keyword searches have been supported for a very long time, both by databases and by information retrieval systems. Traditional relational databases supported searching for keywords or phrases in columns containing text by using the LIKE or MATCHES clause. Such a search matches the phrase or keyword exactly, without accounting for misspellings, alternative spellings or similar phrases. The Text Search DataBlade module extends the Informix Dynamic Server by providing a set of data types and routines that enable much more sophisticated searches, such as fuzzy searching, proximity searching, the use of thesauri for query expansion, and stop-word lists to improve efficiency.
Text can be stored as various types, such as CHAR, CLOB or BLOB, depending on the size and type of text. An operator, etx_contains(), is defined for the DataBlade and instructs the server which type of search needs to be performed. A special index, which the search engine then uses, is created for each table on which searching is required, specifying various parameters such as the inclusion of a synonym list and the exclusion of stop words. The simplest search that can be performed is the keyword search, which can be used together with a synonym list if required. Boolean search enables the combination of keywords to form more complex queries. Exact or approximate phrase searches can also be done (a phrase being a query term that contains more than one word and is treated by the search engine as a single unit). The DataBlade also supports proximity searches, where a number can be specified as the maximum number of words allowed between the specified words in the search phrase. Finally, fuzzy or pattern searches can be performed, where words that closely resemble the keywords given in the search query are also considered.

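The proximity and fuzzy searches just described can be sketched in Python. These are simplified stand-ins written only to illustrate the ideas, not the DataBlade's actual matching engine; the caption is taken from Table 10.

```python
# Hedged sketch of proximity and fuzzy keyword matching over captions.
import difflib

def proximity_match(caption, words, max_gap):
    """True if the words occur in order with at most max_gap other
    words between consecutive ones."""
    tokens = caption.lower().split()
    pos = -1
    for w in words:
        nxt = next((i for i in range(pos + 1, len(tokens))
                    if tokens[i] == w.lower()), None)
        if nxt is None or (pos >= 0 and nxt - pos - 1 > max_gap):
            return False
        pos = nxt
    return True

def fuzzy_match(caption, keyword, cutoff=0.8):
    """True if some word in the caption closely resembles the keyword."""
    tokens = caption.lower().split()
    return bool(difflib.get_close_matches(keyword.lower(), tokens,
                                          n=1, cutoff=cutoff))

caption = "Gun found near body of victim"
print(proximity_match(caption, ["gun", "body"], max_gap=2))  # prints True
print(proximity_match(caption, ["gun", "body"], max_gap=1))  # prints False
print(fuzzy_match(caption, "victum"))  # misspelling still matches: True
```

"gun" and "body" are separated by two words in the caption, so the match succeeds with a gap of 2 but fails with a gap of 1, and the misspelt "victum" still matches "victim" under the fuzzy comparison.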
A synonym list can be extremely helpful, especially in the scene of crime domain, where the use of differing terminologies and acronyms amongst police forces is very common. For example, 'lift' and 'ridge detail' are common alternatives to 'fingerprint'; an 'index' is the same as a 'number plate' or 'car registration number'; 'in situ' can be used to mean at the 'crime site' or 'crime scene'; and when describing a (dead) person, 'Caucasian woman' is the same as 'white female' or the code 'IC1'. Similarly, many acronyms are used, such as MO for modus operandi and DOA for dead on arrival. The Text DataBlade allows you to create a domain-specific synonym list, but the problem here is that it is not possible to have compound terms in the list, and the scene of crime terminology is full of compound terms, as discussed in the next chapter. The table below shows a few examples from the synonym list created for the forensic science corpus.

FINGERPRINT: LIFT
LIFT: FINGERPRINT
GUN: PISTOL
PISTOL: GUN
DRUGS: NARCOTICS, CONTRABAND
NARCOTICS: DRUGS, CONTRABAND
CONTRABAND: NARCOTICS, DRUGS
MARIJUANA: CONTRABAND, NARCOTICS
ROCK: STONE
VICTIM: BODY
BODY: VICTIM
Table 9 Example of a synonym list
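The effect of a synonym list on a keyword search can be sketched as follows. The lookup here is a plain dictionary built from a few Table 9 entries; the DataBlade's MATCH_SYNONYM option works on a compiled synonym index instead, so this is only an illustration of the behaviour.

```python
# Hedged sketch of keyword search expanded through a synonym list,
# using a subset of the Table 9 entries and Table 10 captions.

SYNONYMS = {
    "lift": ["fingerprint"],
    "fingerprint": ["lift"],
    "gun": ["pistol"],
    "pistol": ["gun"],
    "drugs": ["narcotics", "contraband"],
    "victim": ["body"],
    "body": ["victim"],
}

def keyword_search(captions, keyword):
    """Return ids whose caption contains the keyword or any synonym."""
    terms = {keyword.lower(), *SYNONYMS.get(keyword.lower(), [])}
    return [img_id for img_id, caption in captions.items()
            if terms & set(caption.lower().split())]

captions = {
    "CS-7": "Gun found near body of victim",
    "CS-8": "Fingerprint from body of victim",
}
# Searching for 'lift' also matches the fingerprint caption:
print(keyword_search(captions, "lift"))   # prints ['CS-8']
```

This mirrors the first query in Table 11: the keyword 'lift' does not occur in any caption, yet the fingerprint row is returned through the synonym expansion.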
�
Table 10 shows a list of 10 image IDs with their captions, which were used as the data set to test the Text DataBlade. The images are shown in Appendix B.

IMAGE ID CAPTION OF THE IMAGE
CS-1� Overall�view�of�crime�scene�
CS-2� Beer�cans�
CS-3� Rock�stained�with�blood�
CS-4� Close-up�of�clothes�on�victim�
CS-5� Drugs�found�near�body�of�victim�
CS-6� Gas�container�found�near�body�of�victim�
CS-7� Gun�found�near�body�of�victim�
CS-8� Fingerprint�from�body�of�victim�
CS-9� Close-up�of�rock�stained�with�blood�
CS-10� Close-up�of�gas-container�
Table 10 Image IDs with their captions
The table below shows some example queries with their output results. The first query demonstrates the use of the synonym list: even though the search keyword was 'lift', the row with 'fingerprint' in it was returned. The other two queries illustrate the use of a proximity search as compared with an exact search.

1. Example of a query showing the use of a synonym list

SELECT img_id, caption FROM mock_scene2
WHERE etx_contains(caption, Row('lift', 'MATCH_SYNONYM = FS-SynonymList-1'));

img_id   caption
CS-8     Fingerprint from body of victim
2. Example of a text-based query for a proximity match

SELECT img_id, caption FROM mock_scene2
WHERE etx_contains(caption, Row('Gas container found near body', 'SEARCH_TYPE = PHRASE_APPROX'));

img_id   caption
CS-6     Gas container found near body of victim
CS-5     Drugs found near body of victim
CS-7     Gun found near body of victim
CS-10    Close-up of gas-container
CS-8     Fingerprint from body of victim
3. Example of text-based query for an exact match

SELECT img_id, caption FROM mock_scene2
WHERE etx_contains(caption, Row('Gas container found near body', 'SEARCH_TYPE = PHRASE_EXACT'));

img_id   caption
CS-6     Gas container found near body of victim

Table 11 Some example text-based SQL queries
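The difference between the proximity (PHRASE_APPROX) and exact (PHRASE_EXACT) searches above can be illustrated with a toy model. The scoring below is a deliberate simplification for illustration; the DataBlade's actual proximity matching is more sophisticated.

```python
# Toy illustration of exact versus approximate phrase matching over
# captions. The captions are from Table 10; the scoring is hypothetical.
captions = {
    "CS-5": "Drugs found near body of victim",
    "CS-6": "Gas container found near body of victim",
    "CS-7": "Gun found near body of victim",
}

def phrase_exact(query, caption):
    """True only if the whole phrase occurs verbatim in the caption."""
    return query.lower() in caption.lower()

def phrase_approx(query, caption):
    """Score a caption by the fraction of query words it contains."""
    q = query.lower().split()
    c = caption.lower().split()
    return sum(w in c for w in q) / len(q)

query = "Gas container found near body"
exact = [i for i, c in captions.items() if phrase_exact(query, c)]
approx = [i for i, c in captions.items() if phrase_approx(query, c) >= 0.5]
print(exact)   # ['CS-6']
print(approx)  # ['CS-5', 'CS-6', 'CS-7']
```

As in Table 11, the exact match returns only CS-6, while the approximate match also returns captions sharing enough of the query words.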
Image and text datatypes can be stored in one table, and combined queries can be made by using the AND or OR connectives, as shown in the example below. AND acts as a filter, in that only those images are returned that have the keywords present in their captions, regardless of how closely the image resembles the search image.
�
SELECT img_id, caption, rank FROM mock_scene2
WHERE Resembles(fv, GetFeatureVector(IfdImgDescFromFile('C:\images\Album4\Crime-Scenes\Mock-Scene\piece5-gas-container.jpg')), 0.5, 1, 1, 1, 0, 0, 0, rank #REAL)
AND etx_contains(caption, Row('container', 'MATCH_SYNONYM = FS-SynonymList-1 & PATTERN_ALL'))
ORDER BY rank;

img_id   caption                                   rank
CS-6     Gas container found near body of victim   0.67007446
CS-10    Close-up of gas-container                 0.97129822
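The filtering role of AND described above can be sketched outside the database as follows. The rows and ranks are copied from the combined-query example; `combined_query` is a hypothetical helper standing in for the SQL query, not an Informix API.

```python
# Sketch of the combined-query behaviour: the keyword predicate acts
# as a filter, and the image-similarity rank (supplied here as
# precomputed values from the example) only orders the survivors.
rows = [
    ("CS-6",  "Gas container found near body of victim", 0.67007446),
    ("CS-10", "Close-up of gas-container",               0.97129822),
    ("CS-7",  "Gun found near body of victim",           0.63639832),
]

def combined_query(rows, keyword):
    """Keep rows whose caption contains the keyword, ordered by rank."""
    hits = [r for r in rows if keyword.lower() in r[1].lower()]
    return sorted(hits, key=lambda r: r[2])

for img_id, caption, rank in combined_query(rows, "container"):
    print(img_id, rank)
# CS-6 0.67007446
# CS-10 0.97129822
```

CS-7 has the best image-similarity rank of the three rows, but it is excluded because its caption lacks the keyword, exactly as described for the AND connective.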
When just the query image was used, the result returned was as shown below.
�
SELECT img_id, caption, rank FROM mock_scene2
WHERE Resembles(fv, GetFeatureVector(IfdImgDescFromFile('C:\images\Album4\Crime-Scenes\Mock-Scene\piece5-gas-container.jpg')), 0.5, 1, 1, 1, 0, 0, 0, rank #REAL)
ORDER BY rank;

img_id   caption                                   rank
CS-8     Fingerprint from body of victim           0.58262634
CS-5     Drugs found near body of victim           0.59703064
CS-4     Close-up of clothes on victim             0.60975647
CS-2     Beer cans                                 0.61682129
CS-7     Gun found near body of victim             0.63639832
CS-9     Close-up of rock stained with blood       0.64620972
CS-3     Rock stained with blood                   0.64979553
CS-6     Gas container found near body of victim   0.67007446
CS-1     Overall view of crime scene               0.67984009
4.4 A Comparison of Image- and Text-based Retrieval
In this section we discuss the comparative effectiveness of image-based retrieval versus text-based retrieval. We use a set of 65 captioned images taken at an artificial crime scene set up for training purposes. The scene is set at a pub where there has been a break-in, theft and murder. The captions of the images were taken from the spoken commentary provided by the scene of crime officer. The images of the pistol used in scenario 1 were taken from this set of images, as were the indoor images in scenario 3 above.
�
The following tests include the different types of photographs taken at the scene, such as photographs of the gun, body, and ridge details. The purpose was to test different scenarios in which a scene of crime officer would search for relevant images. Initially a close-up shot of a fingerprint (ridge detail) was used to test whether all other images showing ridge detail would be retrieved. Then a close-up shot of a gun found at the scene was used to find all other images with a gun in them, and finally two different images of a body lying on the floor (overview and midrange shots) were used to test whether other images containing the body would be found. The search was based on colour, shape, texture, brightness structure and colour structure. The tables shown below display the top ten closest matching images sorted in increasing order of rank, with 1.00000 being the highest. All the images used, with their captions, are available in Appendix A.
�
1. Testing for retrieval of image with fingerprint

DSCN1494
DSCN1495
DSCN1496
Table 12a Close-up photographs of ridge detail
�
Image DSCN1496 was given as the query image and the ten closest ranked images are shown in the table below. This query result was quite good – it retrieved the two other images of fingerprints and also picked up three images of footwear impressions, which is interesting.
�
img_id     caption                                                        rank
DSCN1473   Photograph of writing in dust on games machine saying "clean me"     0.67176819
DSCN1445   Shot of male dressed                                           0.67431641
DSCN1459   Showing top of bar                                             0.68014526
DSCN1450   Photograph of footwear impression in blood fully labelled      0.68031311
DSCN1440   Close-up of hand, of left hand showing footwear impression, zig-zag pattern     0.68505859
DSCN1475   Wooden chair with vinyl seat partially broken found on floor adjacent to the feet of the body     0.69798279
DSCN1465   Photograph of footwear impression in dust inside machine on the second shelf up, eighteen inches from floor     0.70825195
DSCN1495   Fingerprints ridge detail                                      0.72203064
DSCN1494   Fingerprints ridge detail                                      0.79878235
DSCN1496   Fingerprints ridge detail                                      1.0000000
Table 12b Results of giving DSCN1496 (close-up of fingerprint) as the query image.
There are some other images showing ridge detail, such as DSCN1488 ("Showing window with visible ridge detail"), DSCN1489 ("Same shot close-up"), DSCN1491 ("Photograph of ridge detail from behind the bar"), DSCN1492 ("Photograph of ridge detail from behind the bar showing salon in background") and DSCN1493 ("Photograph close-up of ridge detail"), but none of these were retrieved because there are other objects in the background and the ridge detail is shown at a much smaller scale (see table 12c below). If the SoCO wants to retrieve all the images with fingerprints, then using a sample image for the query will not produce a satisfactory result. However, if the SoCO typed in the keyword fingerprint and a synonym list was used (FINGERPRINT = RIDGE DETAIL), then all the expected images would be retrieved.
�
DSCN1489
DSCN1491
DSCN1493
Table 12c Overview and mid-range photographs of ridge detail.
�
2. Testing for retrieval of image with gun

Here we want to determine whether using a close-up photograph of a gun will retrieve all the other images containing a gun as well. Surprisingly, the system did not pick up DSCN1458, the image of the same pistol taken from a different side, as the next closest match. If only the shape and colour parameters were used, then the other image was located as the closest match. As discussed later in the section, varying the combination and weights of the features used in the search can result in a different output.
�
DSCN1457
DSCN1458
DSCN1436
Table 13a Close-up and mid-range photographs of a pistol found at the crime scene
�
The other two images (DSCN1436 and DSCN1455), which are distant shots of the pistol lying on the table with some other items, were retrieved with a similarity of 67% and 65% respectively. There were also four and five irrelevant images, respectively, appearing with a closer similarity than the two above (see table 13b).
�
img_id     caption                                                        rank
DSCN1455   Just a distance shot of the table showing knife, firearm and pool cue for information     0.65344238
DSCN1437   Table showing bottles and pool cue, knife                      0.65429688
DSCN1436   Table showing browning high power, bottles and pool cue, knife     0.67178345
DSCN1438   Body on floor showing adjacent table                           0.67379761
DSCN1452   Close-up photograph of bottle showing apparent blood-like substance     0.67881775
DSCN1492   Photograph of ridge detail from behind the bar showing salon in the background     0.68412781
DSCN1491   Photograph of ridge detail from behind the bar                 0.68791199
DSCN1458   Browning high power self-loading pistol                        0.69396973
DSCN1454   Part of pool cue with apparent blood-like substance thereon    0.71173096
DSCN1457   Nine millimetre browning high power self-loading pistol        1.0000000
Table 13b Results of giving DSCN1457 (close-up of pistol) as the query image.
�
If the keyword 'browning high power' was used instead of the query image, then DSCN1436 would also have been retrieved.
3. Testing for retrieval of image with body

DSCN1434
DSCN1435
DSCN1438
DSCN1439
DSCN1440
DSCN1441
DSCN1442
DSCN1443
DSCN1444
DSCN1445
DSCN1446
Table 14a Photographs taken at the crime scene showing the body.
�
img_id     caption                                                        rank
DSCN1480   Shot of room from behind bar                                   0.70158386
DSCN1436   Table showing browning high power, bottles and pool cue, knife     0.70312500
DSCN1453   Photograph of green wine bottle on floor in fragments          0.70510864
DSCN1445   Shot of male dressed                                           0.71545410
DSCN1477   Wooden table with bottles and cigar packet thereon with broken glass and ashtray     0.71885681
DSCN1429   Sign of the baskerville arms                                   0.72137451
DSCN1479   Shot of counter behind bar                                     0.72909546
DSCN1440   Close-up of hand, of left hand showing footwear impression, zig-zag pattern     0.72961426
DSCN1434   Body on floor surrounded by blood                              0.76255798
DSCN1446   Full length shot of body                                       1.0000000
Table 14b Results of giving DSCN1446 (full length shot of body) as the query image.
�
In this query the attempt is to retrieve all photographs that show the body. An image of the full length of the body was given as the search image. The expected result should have displayed images DSCN1434, DSCN1435, DSCN1437, DSCN1438 and DSCN1439 – DSCN1446, but only two other images of the body were retrieved among the closest matches, with one other image appearing as the 7th closest match.
�
4. Testing for retrieval of image with close-up of body (DSCN1443)

The query below also aimed to retrieve images of the body, but giving a close-up shot as the search image. The result here shows the 20 closest matches. Three correct images were retrieved, but then there was a gap of one image before the next relevant one, and a gap of six images before three other relevant ones.
�
img_id     caption                                                        rank
DSCN1463   Photograph of footwear impression with zig-zag and target on piece of wood     0.65731812
DSCN1440   Close-up of hand, of left hand showing footwear impression, zig-zag pattern     0.65827942
DSCN1462   Photograph of rear of games machine showing footwear impression on piece of wood in foreground and footwear impression on chair adjacent to the machine     0.66088867
DSCN1459   Showing top of bar                                             0.66104126
DSCN1490   Photograph of bar area from extreme corner of room. Extreme left-hand side of picture green arrow     0.66152954
DSCN1438   Body on floor showing adjacent table                           0.67022705
DSCN1434   Body on floor surrounded by blood                              0.67086792
DSCN1446   Full length shot of body                                       0.67202759
DSCN1450   Photograph of footwear impression in blood fully labelled      0.67501831
DSCN1467   Close-up of same footwear mark                                 0.67640686
DSCN1436   Table showing browning high power, bottles and pool cue, knife     0.68112183
DSCN1480   Shot of room from behind bar                                   0.68278503
DSCN1461   Front of games machine                                         0.68499756
DSCN1479   Shot of counter behind bar                                     0.68737793
DSCN1445   Shot of male dressed                                           0.68955994
DSCN1477   Wooden table with bottles and cigar packet thereon with broken glass and ashtray     0.70429993
DSCN1432   Showing common approach path to body                           0.70771790
DSCN1441   Close-up of head and blood on floor                            0.70849609
DSCN1442   Close-up of body, head and blood on floor                      0.73866272
DSCN1443   Male laying on his back showing open shirt, blood on nose and chest, with tie around neck     1.0000000
Table 14c Results of giving DSCN1443 (close-up shot of the body) as the query image.
�
These tests illustrate that when there are many objects in an image it becomes very difficult to retrieve other images with similar objects. In example (1), if the user was looking for all images showing fingerprints, the search using a query image was not successful, since a large number of relevant images were not retrieved. Similarly, in example (2) with the gun, images with the gun on a table with other objects were retrieved with a much lower ranking. The next two examples, with query images of the body, show that not all the relevant images were retrieved as the closest matches. In all of these examples the relevant keywords have been rendered in blue, and one can see that if these keywords were used to search the respective captions of the images then the retrieval would be much more effective, for example using "Browning high power" in example (2) and "body" in examples (3) and (4).
�
Another important observation is that with some query expansion, such as the SoCO providing the keyword "fingerprint" (example 1) and all the images with "ridge detail" in the caption being returned, the effectiveness increases even further. Similarly, the keyword "firearm" could be used to retrieve all the images of the browning pistol (example 2). Another example would be to provide the keyword "body" and retrieve all images showing any part of the body, containing keywords such as "head", "neck" and "male" (examples 3 & 4). This could be done through the use of a domain-specific ontology, which is a conceptualisation of all the entities present in the domain as well as their interrelationships. If a text and image query is combined, then even higher precision might be achieved.
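Such ontology-driven query expansion could be sketched as follows. The ontology fragment and the `ontology_expand` and `search_captions` helpers are hypothetical illustrations, not part of the SoCIS system.

```python
# Hypothetical fragment of a domain-specific ontology: each term maps
# to narrower or related terms that should also match in captions.
ONTOLOGY = {
    "body": ["head", "neck", "male", "victim"],
    "firearm": ["gun", "pistol", "browning high power"],
    "fingerprint": ["ridge detail", "lift"],
}

def ontology_expand(keyword):
    """Return the keyword and everything subsumed by it (transitively)."""
    terms, queue = [], [keyword.lower()]
    while queue:
        term = queue.pop(0)
        if term not in terms:
            terms.append(term)
            queue.extend(ONTOLOGY.get(term, []))
    return terms

def search_captions(keyword, captions):
    """Return ids of captions matching the keyword or any related term."""
    terms = ontology_expand(keyword)
    return [i for i, c in captions.items()
            if any(t in c.lower() for t in terms)]

captions = {
    "DSCN1441": "Close-up of head and blood on floor",
    "DSCN1445": "Shot of male dressed",
    "DSCN1461": "Front of games machine",
}
print(search_captions("body", captions))
# ['DSCN1441', 'DSCN1445']
```

The keyword "body" thus also retrieves captions mentioning only "head" or "male", as suggested for examples (3) and (4).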
�
It was observed that if the data set is constrained or limited, the search has a higher precision, while if a large database of random images is used, the search becomes less effective. A search is generally more effective if objects have very distinct colours or shapes. If an image comprises a large combination of indistinct objects and colours, the retrieval was extremely ineffective. It was also observed that varying the combination of visual features used for the search resulted in different result sets.
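The observation that varying the feature weights changes the result set can be made concrete with a toy weighted-average ranking. All similarity values and weights below are invented for illustration; the actual Resembles() computation is internal to the DataBlade.

```python
# Illustrative weighted combination of per-feature similarity scores,
# in the spirit of the Resembles() weight parameters.
def weighted_rank(feature_sims, weights):
    """Weighted average of per-feature similarities (all in 0..1)."""
    total = sum(weights.values())
    return sum(weights[f] * feature_sims[f] for f in weights) / total

# Hypothetical per-feature similarities of two candidates to a query:
cand_a = {"colour": 0.9, "shape": 0.2, "texture": 0.8}
cand_b = {"colour": 0.3, "shape": 0.9, "texture": 0.5}

equal_weights = {"colour": 1.0, "shape": 1.0, "texture": 1.0}
shape_heavy   = {"colour": 0.2, "shape": 1.0, "texture": 0.2}

# With equal weights candidate A ranks higher; emphasising shape
# reorders the candidates, as observed in the gun test above.
print(weighted_rank(cand_a, equal_weights) > weighted_rank(cand_b, equal_weights))  # True
print(weighted_rank(cand_a, shape_heavy) > weighted_rank(cand_b, shape_heavy))      # False
```

This mirrors the gun test in section 4.4: with all features enabled the side-on pistol shot was not the next closest match, but restricting the search to shape and colour brought it to the top.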
�
5 Discussion

One main issue to consider here is the text-based versus content-based versus hybrid approaches to indexing, and consequently retrieving, images. It is interesting to consider in the first place whether communication is more effective when several modes work together. This question may have a different answer when considered from the perspective of a machine as opposed to a human. For example, suppose there is an image of Bush shaking hands with Tony Blair, with the British flag in the background, and a caption "Bush (left) shaking hands with Blair." For most people the caption will be redundant, since they are familiar with the physical appearance of Bush and Blair, and the flag in the background will help place the image in context. For those who are familiar with the names but not the appearances, the spatial clue 'left' in the caption will help identify the people. Those who have no idea who Bush and Blair are will need some supporting information, apart from the caption, to identify them as the president of the USA and the prime minister of the UK respectively.
�
The current state of the art in computer vision technology is very limited when it comes to identifying an arbitrary object based on just perceptual features such as colour, shape, texture or position in an image. It becomes even more difficult to capture higher-level semantic or abstract meanings based on context, roles, events and impressions. Similarly, text has its limitations when it comes to describing certain visual features such as shape (e.g. of a knife, fingerprint, or blood pattern) or texture (e.g. sky and sea), defining exact colour shades, or making spatial relationships explicit. Hence if these two modes work together they can complement and reinforce each other to address their respective limitations; e.g. if a machine detects a round brown object in an image and the supporting text is related to soccer, then the object is more likely to be a ball as opposed to, say, a coconut. Another disadvantage of the CBIR approach is that it is necessary to have a query sample, whether an image or a sketch provided by the user, otherwise the system cannot be used. As seen in section 3.3, all the researchers who used an integrated approach considered it an improvement on using a single mode – linguistic or visual – and achieved higher precision and recall.
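The ball-versus-coconut example can be made concrete with a toy evidence-combination sketch. All probabilities below are invented for illustration only.

```python
# Toy illustration of two modes reinforcing each other: visual evidence
# alone cannot separate "ball" from "coconut", but textual context can.
visual = {"ball": 0.5, "coconut": 0.5}            # round, brown: ambiguous
text_given_soccer = {"ball": 0.9, "coconut": 0.1}  # caption mentions soccer

def fuse(p_visual, p_text):
    """Multiply per-label evidence from each mode and renormalise."""
    joint = {k: p_visual[k] * p_text[k] for k in p_visual}
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}

fused = fuse(visual, text_given_soccer)
print(fused)  # ball is now strongly preferred over coconut
```

The visual mode on its own leaves the two labels tied; adding the textual evidence resolves the ambiguity, which is the complementarity argued for above.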
�
Another interesting question to consider is the relative merits of IR systems versus database systems versus knowledge-based systems – what role can each of them play in the image retrieval scenario, individually as well as collectively? Traditional IR systems are optimised for dealing with unstructured textual data and do not provide for metadata information such as a database schema. However, they provide for query iteration and expansion, which is important when retrieving images due to the lack of precision in the querying. IR systems also provide a similarity-based approach to image retrieval, with results usually displayed in ranked form. Databases have been optimised for structured textual data. As discussed in the previous section, object-relational (OR) technology now allows the storage and retrieval of more complex data while maintaining the benefits of the relational model, such as the provision of metadata, data security and integrity, transaction management and update, as well as the simple structured query language, which are not supported by IR systems. Of course, an obvious advantage is having all this different data stored in one place with a shared data model. Text, image and video data are not structured, and specific methods for feature extraction and representation are required, which have been implemented by IR systems. According to Baeza-Yates & Ribeiro-Neto (1999), the technologies provided by information retrieval systems and database systems should be combined for multimedia information retrieval. This concept has been implemented in Informix using DataBlade technology. When it comes to representing visual and textual data, it has to be decided what purpose the representation will serve – just retrieval, or whether there is a need to reason over the data. As Srihari (1995a) suggested, it may be more effective to use a database system to store the data using a suitable data model, since there may be a large amount of overhead associated with a powerful representation scheme. As discussed previously, the new OR paradigm provides a data model based on the object-oriented paradigm, which makes it more powerful and versatile than the relational model, so in a situation where vast quantities of data need to be stored without the need for reasoning it might be suitable to use the OR model.
If a text-based or hybrid approach is to be used for image indexing and retrieval, a main issue is how to analyse the various texts related to an image. There could be closely collateral texts, such as a description or caption of the image, or there could be broadly collateral texts that might contain generic or informal descriptions of objects in the image. For example, consider the images in table 5a, relating to a pistol. The caption of the image, "browning 9mm pistol found on table," could be classified as a closely collateral text. The next closest collateral text could be the case notes of the SoCOs, which describe part or all of the scene but focus on the pistol. Amongst the class of broadly collateral texts there could be the report for that particular crime, which refers to the pistol but to other objects and points of interest as well; the next more distant collateral text could be an encyclopaedic description of the pistol. There could also be even more broadly collateral texts, which may be written in the less formal general language of everyday use, including newspaper reports about the particular crime whose images are being discussed, or reports about other related crimes. The above examples of texts are essentially those in the formal register, written in an official language with a clear view of the readership, such as the crime scene investigators associated with the case. This formal register may include occasional informal words, shorthands or ellipses; for instance, instead of stating that a fingerprint was taken at a scene of crime they may say a 'lift' was taken.
�
Figure 5 Closely and broadly collateral texts.
�
This research was carried out for the benefit of the EPSRC-sponsored SoCIS (Scene of Crime Information System) project. Since one of the aims of the project is to build a visual information system to store and retrieve digital photographs taken at the scene of crime, it was important to review the current technology available in order to make an informed decision about the methods and systems to use for the storage and effective retrieval of scene of crime images.
[Figure 5 depicts nested regions of collateral texts: a caption or description as a closely collateral text, surrounded by progressively more broadly collateral texts such as a crime scene report, a newspaper article and a dictionary or encyclopaedic definition.]
References

Ahmad et al. (2002). Khurshid Ahmad, Bogdan Vrusias & Mariam Tariq, "Co-operative Neural Networks and 'Integrated' Classification," To appear in the International Joint Conference on Neural Networks (IJCNN 2002), Honolulu, Hawaii, May 12-17.

Al-Khatib et al. (1999). Wasfi Al-Khatib, Y. Francis Day, Arif Ghafoor & P. Bruce Berra, "Semantic Modelling and Knowledge Representation in Multimedia Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, pp. 64-80, IEEE.

Baeza-Yates & Ribeiro-Neto (1999). Ricardo Baeza-Yates & Berthier Ribeiro-Neto, Modern Information Retrieval, Essex, England: ACM Press.

Benitez et al. (2000). Ana B. Benitez, John R. Smith & Shih-Fu Chang, "MediaNet: A Multimedia Information Network for Knowledge Representation," Proceedings of the SPIE 2000 Conference on Internet Multimedia Management Systems (IS&T/SPIE-2000, Nov 6-8), Vol. 4210, Boston, MA.

Bertino et al. (2001). Elisa Bertino, Barbara Catania & Gian Piero Zarri, Intelligent Database Systems, ACM Press & Addison Wesley, Oxford, England.

Brown (2001). Paul Brown, Object-Relational Database Development: A Plumbers Guide, Menlo Park, CA: Informix Press/Prentice Hall.

Chang et al. (1999). Shih-Fu Chang, Yong Rui & Thomas S. Huang, "Image Retrieval: Current Techniques, Promising Directions, and Open Issues," Journal of Visual Communication and Image Representation, Vol. 10, pp. 39-62, Academic Press.

Del Bimbo (1999). Alberto Del Bimbo, Visual Information Retrieval, Morgan Kaufmann Publishers.

Jaimes & Chang (2000). Alejandro Jaimes & Shih-Fu Chang, "A Conceptual Framework for Indexing Visual Information at Multiple Levels," IS&T/SPIE Internet Imaging, Vol. 3964, San Jose, CA.

Maybury (1997). Mark T. Maybury (ed.), Intelligent Multimedia Information Retrieval, Menlo Park, CA: AAAI Press/The MIT Press.

McKeown et al. (1998). Kathleen R. McKeown, Steven K. Feiner, Mukesh Dalal & Shih-Fu Chang, "Generating Multimedia Briefings: Coordinating Language and Illustration," Artificial Intelligence, Vol. 103, pp. 95-116, Elsevier Science.

Mitchell (1987). W. J. T. Mitchell, Iconology: Image, Text, Ideology, Chicago: University of Chicago Press.

Ogle & Stonebraker (1995). Virginia E. Ogle & Michael Stonebraker, "Chabot: Retrieval from a Relational Database of Images," IEEE Computer Magazine, Vol. 28(9), pp. 40-48.

Paek et al. (1999). S. Paek, C. L. Sable, V. Hatzivassiloglou, A. Jaimes, B. H. Schiffman, S.-F. Chang & K. R. McKeown, "Integration of Visual and Text-Based Approaches for the Content Labeling and Classification of Photographs," ACM SIGIR'99 Workshop on Multimedia Indexing and Retrieval, Berkeley, CA, Aug. 19, 1999.

Sclaroff et al. (1999). Stan Sclaroff, Marco La Cascia & Saratendu Sethi, "Unifying Textual and Visual Cues for Content-Based Image Retrieval on the World Wide Web," Computer Vision and Image Understanding, Vol. 75, Nos. 1-2, pp. 86-98, Academic Press.

Sonka et al. (1999). Milan Sonka, Vaclav Hlavac & Roger Boyle, Image Processing, Analysis, and Machine Vision, Pacific Grove, CA: Brooks/Cole Publishing Company.

Srihari (1995a). Rohini K. Srihari, "Computational Models for Integrating Linguistic and Visual Information: A Survey," Artificial Intelligence Review, special issue on Integrating Language and Vision, Vol. 8 (5-6), pp. 349-369.

Srihari (1995b). Rohini K. Srihari, "Use of Collateral Text in Understanding Photos," Artificial Intelligence Review, special issue on Integrating Language and Vision, Vol. 8, pp. 409-430.

Srihari & Zhang (2000). Rohini K. Srihari & Zhongfai Zhang, "Show&Tell: A Semi-Automated Image Annotation System," IEEE Multimedia, Vol. 7, No. 3, July-Sept. 2000, pp. 61-71, IEEE, USA.

Srihari et al. (2000). Rohini K. Srihari, Zhongfai Zhang & Aibing Rao, "Intelligent Indexing and Semantic Retrieval of Multimodal Documents," Information Retrieval, Vol. 2 (2-3), pp. 245-275, Kluwer, USA.

Staggs (1997). Steven Staggs, Crime Scene & Evidence Photographer's Guide, Staggs Publishers, Temecula, CA.

Stonebraker (1999). Michael Stonebraker & Paul Brown, with Dorothy Moore, Object-Relational DBMSs: Tracking the Next Great Wave, San Francisco, CA: Morgan Kaufmann Publishers, Inc.

Subramanian (1999). V. S. Subramanian, Principles of Multimedia Database Systems, Morgan Kaufmann.

Veltkamp & Tanase (2000). Remco C. Veltkamp & Mirela Tanase, "Content-Based Image Retrieval Systems: A Survey," Technical Report, Dept. of Computing Science, Utrecht University.