summarizing data collections by their spatial, temporal ... · andreas henrich and daniel blank −...

69
15. November 2012 Summarizing data collections by their spatial, temporal, textual, and image footprint: Techniques for source selection and beyond Andreas Henrich and Daniel Blank [email protected] University of Bamberg Media Informatics Group Leipzig, 15.11.2012

Upload: others

Post on 16-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

15. November 2012

Summarizing data collections by their

spatial, temporal, textual, and image footprint:

Techniques for source selection and beyond

Andreas Henrich and Daniel Blank [email protected] University of Bamberg Media Informatics Group Leipzig, 15.11.2012

Page 2: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 2)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Agenda

1. Motivation and Introduction

(1) Distributed indexing and search

(2) Media types in use

(3) Scenario: source selection for image retrieval

2. Source Selection: Research Goals & Work in our Group

(1) Efficient source selection with HFS and UFS

(2) Visualizing the source selection process

(3) Applicability in other application fields (geographic IR, access methods)

3. Conclusion and Outlook

Page 3: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 3)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Dramatic increase in data volumes in many areas:

in the world wide web

on private devices

in companies

adequate indexing and search techniques needed

distributed indexing and search

1. Motivation and Introduction

Page 4: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 4)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

global vs. local indexing:

global indexing: “one” global index is distributed

inserts and updates → high network load

no locality of data (autonomy) → replication

local Indexing: many local indexes & query routing based on

local data summaries

query processing with logarithmic cost hard to guarantee

hybrid approaches: caching, replication, data specialization, …

1. Motivation and Introduction

Distributed indexing and search

Page 5: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 5)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Visualization: features of 1 object/image 1 point in 2d

1. Motivation and Introduction

Distributed indexing and search

Page 6: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Müller, Boykin, Sarshar, Roychowdhury: Comparison of CBIR Queries in P2P Cambridge, 2006-09-06

query (dot in 0; 1 × 0; 1 )

query result (4-NN)

white points: 2d vectors

black squares with axes: peers

25 peers/resources and their data

1. Motivation and Introduction

Distributed indexing and search

Page 7: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 7)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

CAN image

Peer/resource responsible for (a) region(s) within the whole feature space

global indexing 1 coordinate system in total

1. Motivation and Introduction

Distributed indexing and search

Data has to be transferred to / indexed by the responsible peer

Page 8: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 8)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

local indexing 1 coordinate system per resource

Data summaries per resource enhance query routing

Querying resource knows summaries of all other resources (scalability: Rumorama)

our scenario 1. Motivation and Introduction

Distributed indexing and search

Page 9: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 9)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Administration of distributed media items:

Content-based image retrieval (CBIR) just an example scenario.

Goal: Provide resource description and selection techniques | for general

metric spaces | and apply them elsewhere (e.g. geo-context)

media item (MI)

low-level features

1. Motivation and Introduction

Media types in use

textual content

timestamps

geographic footprint

location

color and texture

tags, description

shot date

Page 10: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 10)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Resource description and selection: Identify promising MAs

according to a query based on a set of known data summaries.

media item

textual content

low-level features

timestamps

geographic footprint

… …

MA desc. text

MA desc. low-level

MA desc. time

media archive (MA)

MA desc. geo

four summaries per MA (one per criterion)

1. Motivation and Introduction

Media types in use

Page 11: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 11)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

text → (counting) Bloom filters, Topic models, etc.

time → histogram, clustering, our approach feasible

geo → discussed at the end of the talk

low-level → focus in the remainder of the talk (CBIR)

Search by multiple criteria: e.g. ‘town hall Leipzig’ at sunset

merging of several criteria-specific resource rankings

need to model correlations in the resource descriptions

future work: not covered so far

1. Motivation and Introduction

Media types in use → resource descriptions

Page 12: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 12)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Callan [Cal00] defines distributed information retrieval (IR) based on three

problems and tasks:

1. resource description

adequate descriptions/summaries

2. resource selection

adequate selection mechanisms

3. result merging

merging of resource-specific search results

Often in literature (and also in this talk): resource selection = resource description & selection;

result merging trivial

space requirements

selectivity

1. Motivation and Introduction

Scenario: source selection for image retrieval

Page 13: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 13)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Image retrieval: often text search & tags

Only text search not sufficient:

Bolettieri et al. [BEF+09]: >100 Mio. Flickr images

30%: no comments and no tags → searchable?

on average: 0.5 comments and 5.0 tags

tag spam, homonyms/synonyms,

selectivity (‘phone’), …

Benefits of content-based image retrieval (CBIR):

automatically extracted image properties

features: color (histograms), texture, salient points, …

1. Motivation and Introduction

Scenario: source selection for image retrieval

Page 14: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 14)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

given: query object 𝑞

database 𝑂

wanted: ‘similar’ images w.r.t. query; criterion: 𝑑 𝑞, 𝑜 with 𝑜 ∈ 𝑂

important query types:

range queries (search radius 𝑟):

range 𝑞, 𝑟 = 𝑜 ∈ 𝑂 𝑑 𝑞, 𝑜 ≤ 𝑟

𝑘-nearest-neighbor queries (desired #hits: 𝑘):

kNN 𝑞, 𝑘 = 𝐾 with ∀𝑜 ∈ 𝐾, 𝑜′ ∈ 𝑂 ∖ 𝐾: 𝑑 𝑞, 𝑜 ≤ 𝑑 𝑞, 𝑜′ and 𝐾 = 𝑘

1. Motivation and Introduction

Scenario: source selection for CBIR

Page 15: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 15)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

resource A

resource B resource C

1. Motivation and Introduction

Scenario: source selection for CBIR

Page 16: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 16)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

1. Motivation and Introduction

Scenario: source selection for CBIR resource A

resource B resource C

Page 17: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 17)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

1. Motivation and Introduction

Scenario: source selection for CBIR

C

A

B

resource description

similarity query

1. C 2. A 3. B

resource selection

Page 18: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 18)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

2. Source Selection:

Research Goals and Work in our Group

Design resource selection techniques which improve

earlier work w.r.t. efficiency:

Space efficiency and selectivity of the resource descriptions

Visualize the source selection process

Show usefulness of the techniques in other scenarios (apart from distributed IR: metric access methods, geo IR, …)

Page 19: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 19)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

1. Assign every DB object to the closest reference object (RO).

2. Count per reference object, how many DB objects have been assigned to it.

cluster histogram computation

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

DB objects

reference objects

distance d

Baseline: [MEH05b]

resource description

|.|= γ

Page 20: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 20)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

● reference objects ■ feature objects

assumption: 2-dim. feature space euclidean distance

cluster histogram:

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

Page 21: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 21)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Anfragebearbeitung mit

Zusammenfassungen

cluster histograms as robust approximations of

Page 22: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 22)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Sicht auf andere Peers mit

Zusammenfassungen

… density distributions

Page 23: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 23)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Histogrammerstellung cluster histogram

computation

DB objects

reference objects

distance d

[MEH05b]

distributed clustering

resource description

γ

γ

Zusammenfassung

1. Assign every DB object to the closest reference object (RO).

2. Count per reference object, how many DB objects have been assigned to it.

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

Page 24: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 24)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Histogrammerstellung cluster histogram

computation

DB objects

reference objects

distance d resource description

random selection γ

γ

Zusammenfassung

1. Assign every DB object to the closest reference object (RO).

2. Count per reference object, how many DB objects have been assigned to it.

[EMH+06]

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

Page 25: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 25)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Source Selection as in [EMH+06]: version 1 for small 𝛾

1. compute the reference object 𝑐𝑗 closest to 𝑞 w.r.t. 𝑑(𝑞, 𝑐𝑗)

2. ranking of peers

peer 𝑝𝑎 has more DB objects in cluster 𝑐𝑗 than 𝑝𝑏

⇒ 𝑝𝑎 is contacted before 𝑝𝑏

peer 𝑝𝑏 has more DB objects in cluster 𝑐𝑗 than 𝑝𝑎

⇒ 𝑝𝑏 is contacted before 𝑝𝑎

otherwise (same number of objects in 𝑐𝑗) a random decision is made

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

Page 26: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 26)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Anfragebearbeitung mit

Zusammenfassungen

starting point

query

Page 27: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 27)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Starting 4-NN query: resource ranking

criterion: number of images in query

cluster

query object

Page 28: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 28)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

intermediate result

top 4 documents of contacted

resource

1st resource visited

Page 29: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 29)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

intermediate result (improved)

top 4 documents of contacted

resource

2nd resource visited

Page 30: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 30)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3rd resource visited

Page 31: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 31)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

4th resource visited

Page 32: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 32)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

5th resource visited

Page 33: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 33)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Cluster Histogramm Erstellung

HFS computation

DB objekts

distance d resource description

random selection

γ* ≫ γ

γ* ≫ γ

reference objects

[BEM+07]

1. Assign every DB object to the closest reference object (RO).

2. Count per reference object, how many DB objects have been assigned to it.

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

Page 34: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 34)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

20 ROs

Page 35: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 35)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

1000 ROs

Page 36: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 36)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Source Selection as in [EMH+06]: version 2 for higher values of 𝛾

1. compute list 𝐿

sort reference objects 𝑐𝑗 ascending w.r.t. 𝑑(𝑞, 𝑐𝑗)

2. ranking of peers

Iterate over the list 𝐿 (from beginning to end):

peer 𝑝𝑎 has more DB objects in current cluster than 𝑝𝑏

⇒ 𝑝𝑎 is contacted before 𝑝𝑏

peer 𝑝𝑏 has more DB objects in current cluster than 𝑝𝑎

⇒ 𝑝𝑏 is contacted before 𝑝𝑎

otherwise (same number of objects in all clusters) random decision is made

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

Page 37: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 37)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Cluster Histogramm Erstellung

HFS computation

DB objects

distance d

random selection γ* ≫ γ

reference objects

[BEM+07]

resource description !

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

γ* ≫ γ

Page 38: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 38)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Cluster Histogramm Erstellung

HFS computation

DB objects

distance d

random selection

reference objects

[BEM+07]

compression

resource description

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

γ* ≫ γ

γ* ≫ γ

Page 39: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 39)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Cluster Histogramm Erstellung

HFS computation

DB objects

distance d

reference objects

[BEM+07]

compression

resource description

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

γ* ≫ γ

γ* ≫ γ

transferred with software updates

external collection selection

Page 40: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 40)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Cluster Histogramm Erstellung

UFS computation

DB objects

distance d

reference objects

compression

resource description

2. Source Selection – Research Goals and Work in our Group

Efficient source selection with HFS and UFS

γ* ≫ γ

γ* ≫ γ

transferred with software updates

external collection selection

[BH10a, BH10b]

UFS are bit vectors. A bit set at position 𝑗 indicates, that at least one DB object is assigned to reference object 𝑐𝑗.

Page 41: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 41)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Another view on the source selection process with UFS:

2. Source Selection – Research Goals and Work in our Group

Visualizing the source selection process

Set of peers: 𝑃 = 𝑝𝑖 1 ≤ 𝑖 ≤ 𝑛 ,

(1 summary 𝑠𝑖 𝑗 per row)

Set of ROs: 𝐶 = 𝑐𝑗 1 ≤ 𝑗 ≤ 𝛾 ,

(1 RO 𝑐𝑗 with cluster 𝑐𝑗 per column)

𝑠𝑖 𝑗 = 0

𝑝𝑖 has NO docs in 𝑐𝑗

𝑠𝑖 𝑗 > 0

𝑝𝑖 has ≥ 1 doc(s) in 𝑐𝑗

source selection as an 𝛾 × 𝑛 image.

Page 42: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 42)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Another view on the source selection process with UFS:

2. Source Selection – Research Goals and Work in our Group

Visualizing the source selection process

Set of peers: 𝑃 = 𝑝𝑖 1 ≤ 𝑖 ≤ 𝑛 ,

(1 summary 𝑠𝑖 𝑗 per row)

Set of ROs: 𝐶 = 𝑐𝑗 1 ≤ 𝑗 ≤ 𝛾 ,

(1 RO 𝑐𝑗 with cluster 𝑐𝑗 per column)

𝑠𝑖 𝑗 = 0

𝑝𝑖 has NO docs in 𝑐𝑗

𝑠𝑖 𝑗 > 0

𝑝𝑖 has ≥ 1 doc(s) in 𝑐𝑗

source selection as an 𝛾 × 𝑛 image.

one cluster /

reference object

one peer

Page 43: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 43)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

2. Source Selection – Research Goals and Work in our Group

Visualizing the source selection process

no ranking

top 80 peers

lowest 80 peers

MIRFLICKR 25k collection: http://press.liacs.nl/mirflickr/ 24.258 images assigned to 9.862 peers based on the Flickr-UserID CEDD descriptor + Hellinger distance

Page 44: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 44)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

2. Source Selection – Research Goals and Work in our Group

Visualizing the source selection process

… …

no ranking by peer size HFS/UFS ranker

𝑐𝑗 ordered by

Increasing

𝑑 𝑞, 𝑐𝑗

top 80 peers

lowest 80 peers

Peers having docs in

query cluster

Peers having NO docs in

query cluster

Page 45: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 45)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

2. Source Selection – Research Goals and Work in our Group

Applicability in other application fields

a) Design of a MAM based on inverted files:

𝑐0 𝑜7 𝑜8

𝑐1 𝑜1 𝑜4 𝑜6

𝑐γ−1 𝑜5 …

… 𝑑 𝑐1, 𝑜4 , … , 𝑑 𝑐γ−1 , 𝑜4

store object-pivot distances in the postings

map clusters to posting list

Search: pruning of clusters/lists pruning of DB objects

Page 46: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 46)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

2. Source Selection – Research Goals and Work in our Group

Applicability in other application fields

b) Apply HFS/UFS and extensions in the context of geographic data and

compare it with:

geometric approaches space partitioning approaches

→ also: hybrid combinations!

= peer data

Page 47: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 47)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

: peer data

2. Source Selection – Research Goals and Work in our Group

Applicability in other application fields

kMeans++ MBR

RecMBR CH

geometric approaches

Page 48: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 48)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Exemplary ranking for MBR summaries, other rankers for geometric

resource descriptions work likewise!

Given:

Query 𝑞 and two peers: 𝑝𝑎 with MBRa and 𝑝𝑏 with MBRb

Peer Ranking:

kMeans++ and RecMAR: For each peer a queue of shapes is created (ordered

by minimum distance to query point), queue based consideration of shapes.

),,(),(),(

),(),(),(

),(),(

),(),(

baba

baba

abab

baba

MBRMBRqdistToMBRqMBRcontainsqMBRcontains

MBRMBRrecMBRSizeqMBRcontainsqMBRcontains

ppqMBRcontainsqMBRcontains

ppqMBRcontainsqMBRcontains

reciprocal size of area covered

by MBR

minimum distance to MBR

2. Source Selection – Research Goals and Work in our Group

Applicability in other application fields

Page 49: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 49)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

: peer data

2. Source Selection – Research Goals and Work in our Group

Applicability in other application fields

GFBun

UFSn

GRIDn

DFSn

: reference object

space partitioning approaches

Page 50: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 50)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

1. Create list L of subspaces/clusters sorted by minimum

distance of subspace to query object (cf. HFS/UFS as before)

2. j = 1;

3. Choose subspace [cj], positioned j-th in list L (1 ≤ j ≤ γ):

pa administers documents in [cj], pb does not ⇒ 𝑝𝑎 ≻ 𝑝𝑏

pa and pb both do (not) administer documents in [cj]

j++; GOTO 3;

2. Source Selection – Research Goals and Work in our Group

Applicability in other application fields

Grid: considers rings of neighboring cells instead of single subspaces

UFS/DFS: list ordered by min. distance of ROs to query object

Page 51: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 51)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

20

40

60

80

100

120

140

160

0,5

0,7

0,9

1,1

1,3

1,5

1,7

1,9

512 2048 8192

avg s

um

mary

siz

e in b

yte

fraction o

f conta

cte

d p

eers

in %

number of subspaces

queryMode=1

GFBu_e UFS_e DFS_e

2. Source Selection – Research Goals and Work in our Group

Applicability in other application fields

Experimental Results – Space Partitioning Approaches

better selectivity than

grid-based approach

storing additional

distances increases

selectivity but increases

storage requirements

Page 52: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 52)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

02040

0

peers

ROs cluster

populations peer sizes

Page 53: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 53)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

02040

0

no ranking

Page 54: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 54)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

… …

0 050 0

ranking by peer size

Page 55: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 55)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

… …

0 050 0

visualize ROs

Page 56: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 56)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

… …

0 050 0

‘arbitrary’ ROs possible

(metric space)

AGSTVAVREA

| |||||| |

ATSTVAVRSA

X

Y Z

B

D

visualize ROs

Page 57: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 57)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

… …

0 050 0

choose RO as query

Page 58: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 58)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0 0 02040

[cj] ordered by increasing d(q,cj)

UFS ranking

0

100

200

300

0

Page 59: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 59)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0 0 02040

[cj] ordered by increasing d(q,cj)

0

100

200

300

0

… …

9 NNs

Page 60: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 60)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

0 0 02040

0

[cj] ordered by increasing d(q,cj)

visualize peer summaries

Page 61: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 61)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

0 0 02040

0

[cj] ordered by increasing d(q,cj)

visualize peer summaries

Page 62: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 62)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

0 0 02040

0

[cj] ordered by increasing d(q,cj)

find similar peers

Similarity on UFS/HFS summaries: Hamming dist. (ext.) quadratic form KL divergence … → using correlations: distances between ROs

Page 63: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 63)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

0 0 02040

0

[cj] ordered by increasing d(q,cj)

browsing: images/peers least similar to q → interestingness?

interestingness specialties

Page 64: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 64)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

0 0 02040

0

[cj] ordered by increasing d(q,cj)

browsing: images/peers least similar to q → interestingness?

interestingness specialties

Page 65: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 65)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0

100

200

300

0 0 02040

0

interestingness specialties

browsing: images/peers least similar to q → interestingness?

Page 66: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Summarizing data collections by their spatial, temporal, textual and image footprint (p. 66)

Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012

15. November 2012

Thank you very much for your attention!

Questions?

3. Conclusion and Outlook: ‘Source selection through visual analytics’

0100200300 0

50

100

0

the end

Page 67: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Literaturverzeichnis (1/2)

[BDP04] S. Berretti, A. Del Bimbo, and P. Pala. Merging results for distributed content based image

retrieval. Multimedia Tools Appl., 24(3):215–232, 2004.

[BEF+09] P. Bolettieri, A. Esuli, F. Falchi, C. Lucchese, R. Perego, T. Piccioli, and F. Rabitti. CoPhIR: a Test

Collection for Content-Based Image Retrieval. CoRR, abs/0905.4627v2,

http://arxiv.org/abs/0905.4627v2 (last visit: 2.11.2011), 2009.

[BEM+07] D. Blank, S. El Allali, W. Müller, and A. Henrich. Sample-based creation of peer summaries for

efficient similarity search in scalable peer-to-peer networks. In Intl. SIGMM Workshop on

Multimedia Information Retrieval, pages 143–152, Augsburg, Germany, 2007. ACM.

[BH10a] D. Blank and A. Henrich. Binary histograms for resource selection in peer-to-peer media

retrieval. In Proc. of LWA Workshop Lernen, Wissen, Adaptivität, pages 183–190, Kassel, Germany,

2010.

[BH10b] D. Blank and A. Henrich. Description and selection of media archives for geographic nearest

neighbor queries in P2P networks. In Proc. of the Information Access for Personal Media Archives

Workshop, pages 22–29, http://doras.dcu.ie/15373/ (last visit: 2.11.2011), 2010.

[BH12] D. Blank and A. Henrich. Inverted file-based indexing for efficient multimedia information

retrieval in metric spaces. In Proc. of the 27th ACM Intl. Symp. on Applied Computing. Riva del

Garda, Italy, 2012 (to appear).

p. 69 Summarizing data collections by their spatial, temporal, textual and image footprint:

Techniques for source selection and beyond

Page 68: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Literaturverzeichnis (2/2)

[Cal00] J. Callan. Distributed information retrieval. In W. B. Croft, editor, Advances in Information

Retrieval, pages 127–150. Kluwer Academic Publishers, 2000.

[EMH+06] M. Eisenhardt, W. Müller, A. Henrich, D. Blank, and S. El Allali. Clustering-based source selection

for efficient image retrieval in peer-to-peer networks. In Proc. of the 8th Intl. Symp. on Multimedia,

pages 823–830, San Diego, CA, USA, 2006. IEEE.

[MEH05a] W. Müller, M. Eisenhardt, and A. Henrich. Scalable summary based retrieval in P2P networks. In

Proc. of the 14th Intl. Conf. on Information and Knowledge Management, pages 586–593, Bremen,

Germany, 2005. ACM.

[MEH05b] W. Müller, M. Eisenhardt, and A. Henrich. Fast retrieval of high-dimensional feature vectors in

P2P networks using compact peer data summaries. Multimedia Systems, 10(6):464–474, 2005.

[MKL+03] D. S. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne, B. Richard, S. Rollins, and Z. Xu.

Peer-to-peer computing. Technical Report HPL-2002-57 (R.1), HP Laboratories, Palo Alto, CA, USA, July

2003.

[ZAD+05] P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach.

Springer New York, Inc., Secaucus, NJ, USA, 2005.

p. 70 Summarizing data collections by their spatial, temporal, textual and image footprint:

Techniques for source selection and beyond

Page 69: Summarizing data collections by their spatial, temporal ... · Andreas Henrich and Daniel Blank − "Scalable Visual Analytics" Text-Workshop in Leipzig, November 15, 2012 15. November

Bildquellen des Vortrags

Seite 3: [BEF+09]

Seite 5: http://commons.wikimedia.org/wiki/File:Bamberg-Rathaus1-Asio.JPG

Seite 6: http://commons.wikimedia.org/wiki/File:Dreieck.svg?uselang=de

Seite 7 und 12: http://commons.wikimedia.org/wiki/File:Bamberg-Rathaus3-Asio.JPG

Seite 10 und 11:

http://commons.wikimedia.org/wiki/File:Bamberg-Rathaus1-Asio.JPG

http://commons.wikimedia.org/wiki/File:Bamberg_Klein-Venedig_I.jpg

http://commons.wikimedia.org/wiki/File:Marienberg_wuerzburg.jpg

http://commons.wikimedia.org/wiki/File:Marienkapelle.JPG

http://commons.wikimedia.org/wiki/File:Zugspitze_Sonnenuntergang_Westseite.JPG

http://commons.wikimedia.org/wiki/File:Sonnenuntergang_001.jpg

Das pinke Sonnenuntergangsbild ist derzeit nicht mehr bei Wikimedia verfügbar.

Seite 17: http://commons.wikimedia.org/wiki/File:Voronoi_diagram.svg

p. 71 Summarizing data collections by their spatial, temporal, textual and image footprint:

Techniques for source selection and beyond