Link Discovery Tutorial
Part I: Efficiency
Axel-Cyrille Ngonga Ngomo (1), Irini Fundulaki (2), Mohamed Sherif (1)
(1) Institute for Applied Informatics, Germany
(2) FORTH, Greece
October 18th, 2016
Kobe, Japan
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Efficiency October 17, 2016 1 / 106
Table of Contents
1 Introduction
2 LIMES
3 MultiBlock
4 Reduction-Ratio-Optimal Link Discovery
5 Link Discovery for Temporal Data
6 GNOME
7 GPUs and Hadoop
8 Summary and Conclusion
Introduction: Link Discovery
Definition (Declarative Link Discovery)
- Given sets $S$ and $T$ of resources and a relation $R$
- Find $M = \{(s, t) \in S \times T : R(s, t)\}$
- Here, find $M' = \{(s, t) \in S \times T : \delta(s, t) \geq \tau\}$
Problem
- Naïve complexity $\in O(|S| \times |T|)$, i.e., $O(n^2)$
- Example: ≈ 70 days for cities in DBpedia and LinkedGeoData
Solutions
1. Reduce the number of comparisons $C(A) \geq |M'|$ [NA11; IJB11; Ngo12; Ngo13]
2. Use planning [Ngo14]
3. Reduce the complexity of linking to $< n^2$ [GSN16]
4. Use features of hardware environment [Ngo+13; NH16]
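
As a baseline for the complexity argument, here is a minimal sketch of the naïve approach. It assumes the measure is passed as a Python callable and that a pair is kept when the measure reaches the threshold, as in the definition of $M'$. The names `naive_link_discovery`, `delta` and `tau` and the toy data are illustrative and do not come from the tutorial.

```python
from itertools import product


def naive_link_discovery(S, T, delta, tau):
    """Evaluate delta on every pair of S x T, i.e. |S| * |T| comparisons."""
    comparisons = 0
    links = []
    for s, t in product(S, T):
        comparisons += 1
        if delta(s, t) >= tau:
            links.append((s, t))
    return links, comparisons


# Toy measure (an assumption for illustration): 1.0 for a case-insensitive match, else 0.0.
def toy_measure(a, b):
    return 1.0 if a.lower() == b.lower() else 0.0


links, comparisons = naive_link_discovery(
    ["Kobe", "Leipzig"], ["kobe", "Heraklion", "leipzig"], toy_measure, tau=1.0
)
```

Every approach in the following sections tries to return the same `links` while keeping `comparisons` well below `len(S) * len(T)`.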
LIMES: Intuition
Definition (Declarative Link Discovery)
- Given sets $S$ and $T$ of resources and a relation $R$
- Find $M = \{(s, t) \in S \times T : R(s, t)\}$
- Here, find $M' = \{(s, t) \in S \times T : \delta(s, t) \geq \tau\}$
Some $\delta$ are distances (in the mathematical sense) [NA11]
- Examples: Levenshtein distance, Minkowski distance
Intuition
- Distances abide by the triangle inequality: $\delta(x, z) - \delta(z, y) \leq \delta(x, y) \leq \delta(x, z) + \delta(z, y)$
- Use exemplars for pessimistic approximation
- Reduce the number of computations
LIMES: Approach
1. Start with random $e_1 \in T$
2. $e_{n+1} = \arg\max_{x \in T} \sum_{i=1}^{n} \delta(x, e_i)$
3. Assign each $t$ to $e(t)$ with $e(t) = \arg\min_{e_i} \delta(t, e_i)$
4. Approximate $\delta(s, t)$ by using $\delta(s, t) \geq \delta(s, e(t)) - \delta(e(t), t)$
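
The four steps can be sketched as follows. This is a minimal illustration, not the LIMES framework's API. It assumes that $\delta$ is a metric, that a pair counts as a match when $\delta(s, t) \leq \tau$ (the distance reading of the threshold), and that resources are hashable. All function names and the integer toy data are hypothetical.

```python
import random


def select_exemplars(T, delta, k, seed=0):
    """Steps 1 and 2: random first exemplar, then maximise the summed distance."""
    rng = random.Random(seed)
    exemplars = [rng.choice(T)]
    while len(exemplars) < k:
        candidates = [x for x in T if x not in exemplars]
        if not candidates:
            break
        exemplars.append(
            max(candidates, key=lambda x: sum(delta(x, e) for e in exemplars))
        )
    return exemplars


def assign(T, exemplars, delta):
    """Step 3: attach each t to its closest exemplar and remember that distance."""
    clusters = {e: [] for e in exemplars}
    for t in T:
        e = min(exemplars, key=lambda ex: delta(t, ex))
        clusters[e].append((t, delta(t, e)))
    return clusters


def limes_match(S, T, delta, tau, k):
    """Step 4: skip pairs whose triangle-inequality lower bound exceeds tau."""
    clusters = assign(T, select_exemplars(T, delta, k), delta)
    links, comparisons = [], 0
    for s in S:
        for e, members in clusters.items():
            d_se = delta(s, e)
            comparisons += 1
            for t, d_et in members:
                if d_se - d_et > tau:  # delta(s, t) >= d_se - d_et > tau
                    continue
                comparisons += 1
                if delta(s, t) <= tau:
                    links.append((s, t))
    return links, comparisons


def abs_diff(a, b):
    return abs(a - b)


S = [1, 10, 50]
T = [0, 2, 9, 49, 51, 100]
links, comparisons = limes_match(S, T, abs_diff, tau=1, k=2)
```

Integer data keeps the bound exact here; with floating-point distances, a small tolerance on the pruning test may be needed to avoid discarding borderline pairs.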
LIMES: Evaluation
- Measure = Levenshtein, threshold = 0.9, KB base size = $10^3$
- [Figure: number of comparisons (y-axis) against number of exemplars (x-axis, 0 to 300); curves labeled 0.75, 0.8, 0.85, 0.9, 0.95 and Brute force]
LIMES: Evaluation
- Measure = Levenshtein, threshold = 0.9, KB base size = $10^4$
- Optimal number of exemplars $\approx \sqrt{|T|}$
- [Figure: number of comparisons (y-axis) against number of exemplars (x-axis, 0 to 300); curves labeled 0.75, 0.8, 0.85, 0.9, 0.95 and Brute force]
LIMES: Conclusion
- Time-efficient approach
- Reduces the number of comparisons through approximation
- No guarantee pertaining to the reduction ratio
MultiBlock: Intuition
Idea
- Create a multidimensional index of the data
- Partition the space so that $\sigma(x, y) < \theta \Rightarrow index(x) \neq index(y)$
- Decrease the number of computations by only comparing pairs $(x, y)$ with $index(x) = index(y)$
MultiBlock: Approach
- Similarity functions are aggregations of atomic measures $sim$, e.g., $\sigma = \mathrm{MIN}(levenshtein, jaccard)$
- Each atomic measure must define a blocking function $block : (S \cup T) \times [0, 1] \to \mathcal{P}(\mathbb{N}^n)$
- Each aggregation must define a similarity, a block and a threshold aggregation
MultiBlock: Example
- $sim = \frac{levenshtein(s, t)}{\max(|s|, |t|)}$ (normalized Levenshtein)
- $block : \Sigma^q \to \mathbb{N}$
  - Maps each q-gram to exactly one block (e.g., Goedelisation)
  - Only $c = \max(|s|, |t|)(1 - \theta) \cdot q + 1$ q-grams to be indexed per string
- $index(s, \theta) = block(qgrams(s[0 \ldots c]))$
  - String mapped to several blocks
  - Comparison only within blocks
Comparison of drugs in DBpedia and Drugbank

| Setting | Comparisons | Runtime | Links |
|---|---|---|---|
| Full comparison | 22,242,292 | 430 s | 1,403 |
| Blocking (100 blocks) | 906,314 | 44 s | 1,349 |
| Blocking (1,000 blocks) | 322,573 | 14 s | 1,287 |
| MultiBlock | 122,630 | 6 s | 1,403 |
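
The figures in the comparison can be read as saved comparisons and retained links. The sketch below only redoes that arithmetic. It assumes the full comparison is the reference for both counts, and the helper name `summarize` is illustrative.

```python
FULL_COMPARISONS = 22_242_292
FULL_LINKS = 1_403

# (comparisons, links) per setting, copied from the comparison of drugs
settings = {
    "Blocking (100 blocks)": (906_314, 1_349),
    "Blocking (1,000 blocks)": (322_573, 1_287),
    "MultiBlock": (122_630, 1_403),
}


def summarize(comparisons, links):
    """Return (share of comparisons saved, share of reference links found)."""
    saved = 1 - comparisons / FULL_COMPARISONS
    retained = links / FULL_LINKS
    return saved, retained


summary = {name: summarize(*values) for name, values in settings.items()}
for name, (saved, retained) in summary.items():
    print(f"{name}: saved {saved:.2%} of comparisons, kept {retained:.2%} of links")
```

Read this way, MultiBlock is the only setting listed that saves comparisons while keeping all 1,403 links of the full comparison.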
HR3: Link Discovery
Definition (Declarative Link Discovery)
- Given sets $S$ and $T$ of resources and a relation $R$
- Find $M = \{(s, t) \in S \times T : R(s, t)\}$
- Here, find $M' = \{(s, t) \in S \times T : \delta(s, t) \geq \tau\}$
Intuition
- Reduce the number of comparisons $C(A) \geq |M'|$
- Maximize the reduction ratio: $RR(A) = 1 - \frac{C(A)}{|S||T|}$
Question
- Can we devise lossless approaches with guaranteed RR?
- Advantages: space management, runtime prediction, resource scheduling
HR3: RR Guarantee
- Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M'|}{|S||T|}$
- Approach $H(\alpha)$ fulfills the RR guarantee criterion iff: $\forall r < RR_{max}, \exists \alpha : RR(H(\alpha)) \geq r$
- Here, we use the relative reduction ratio (RRR): $RRR(A) = \frac{RR_{max}}{RR(A)}$
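
The three ratios translate directly into code. A minimal sketch follows, with function names and the sample numbers chosen purely for illustration (they are not measurements from the tutorial).

```python
def reduction_ratio(comparisons, size_s, size_t):
    """RR(A) = 1 - C(A) / (|S| |T|)."""
    return 1 - comparisons / (size_s * size_t)


def max_reduction_ratio(num_links, size_s, size_t):
    """RR_max = 1 - |M'| / (|S| |T|): only the true matches are compared."""
    return 1 - num_links / (size_s * size_t)


def relative_reduction_ratio(comparisons, num_links, size_s, size_t):
    """RRR(A) = RR_max / RR(A); it is at least 1 whenever C(A) >= |M'|."""
    return max_reduction_ratio(num_links, size_s, size_t) / reduction_ratio(
        comparisons, size_s, size_t
    )


# Hypothetical run: |S| = |T| = 1000, 500 true links, 20000 comparisons carried out.
rr = reduction_ratio(20_000, 1_000, 1_000)
rr_max = max_reduction_ratio(500, 1_000, 1_000)
rrr = relative_reduction_ratio(20_000, 500, 1_000, 1_000)
```

An RRR close to 1 means the approach carries out few comparisons beyond those needed for the matches themselves.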
Goal
Formal Goal: devise $H(\alpha)$ such that $\forall r > 1, \exists \alpha : RRR(H(\alpha)) \leq r$
HR3: Restrictions
Minkowski distance: $\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}$, $p \geq 2$
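
For reference, a minimal sketch of the Minkowski distance on coordinate tuples. The function name and the length check are illustrative choices.

```python
def minkowski(s, t, p=2):
    """p-th root of the sum of |s_i - t_i|^p over all dimensions."""
    if len(s) != len(t):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1 / p)


d_euclid = minkowski((0, 0), (3, 4), p=2)  # the Euclidean case
d_cubic = minkowski((0, 0), (3, 4), p=3)
```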
HR3: Space Tiling
HYPPO
- $\delta(s, t) \leq \theta$ describes a hypersphere
- Approximate the hypersphere by using a hypercube
  - Easy to compute
  - No loss of recall (blocking)
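HR3: Space Tiling
- Set the width of a single hypercube to $\Delta = \theta / \alpha$
- Tile $\Omega = S \cup T$ into the adjacent cubes $C$
  - Coordinates: $(c_1, \ldots, c_n) \in \mathbb{N}^n$
  - Contains points $\omega \in \Omega : \forall i \in \{1 \ldots n\}, c_i \Delta \leq \omega_i < (c_i + 1)\Delta$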
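
The tiling amounts to an integer division of every coordinate by $\Delta$. A minimal sketch follows, with hypothetical names `cube_of` and `tile` and made-up points.

```python
import math
from collections import defaultdict


def cube_of(point, theta, alpha):
    """Coordinates (c_1, ..., c_n) of the cube with c_i * D <= w_i < (c_i + 1) * D."""
    width = theta / alpha  # D = theta / alpha
    return tuple(math.floor(x / width) for x in point)


def tile(points, theta, alpha):
    """Group points by the cube that contains them."""
    grid = defaultdict(list)
    for point in points:
        grid[cube_of(point, theta, alpha)].append(point)
    return grid


points = [(2.5, 0.4), (2.6, 0.1), (7.0, 3.2)]
grid = tile(points, theta=1.0, alpha=2)  # cube width 0.5
```

The first two points share the cube `(5, 0)`, so they end up in the same list of `grid`.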
HYPPO: Idea
- Combine $(2\alpha + 1)^n$ hypercubes around $C(\omega)$ to approximate the hypersphere
- $RRR(HYPPO(\alpha)) = \frac{(2\alpha + 1)^n}{\alpha^n S(n)}$
- $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{2^n}{S(n)}$
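HYPPO: Reduction Ratio
- [Figure: RRR(HYPPO) for $p = 2$, $n = 2, 3, 4$ and $2 \leq \alpha \leq 50$]
- $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{4}{\pi} \approx 1.27$ ($n = 2$)
- $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{6}{\pi} \approx 1.91$ ($n = 3$)
- $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{32}{\pi^2} \approx 3.24$ ($n = 4$)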
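
These limits can be checked numerically. The sketch assumes that $S(n)$ denotes the volume of the unit ball in $n$ dimensions for $p = 2$, a reading that reproduces the three limits above. The function names are illustrative.

```python
import math


def unit_ball_volume(n):
    """Volume of the Euclidean unit ball in n dimensions (assumed meaning of S(n))."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def rrr_hyppo(alpha, n):
    """(2 * alpha + 1)^n / (alpha^n * S(n))."""
    return (2 * alpha + 1) ** n / (alpha ** n * unit_ball_volume(n))


def rrr_hyppo_limit(n):
    """Limit for alpha -> infinity: 2^n / S(n)."""
    return 2 ** n / unit_ball_volume(n)


for n in (2, 3, 4):
    print(n, round(rrr_hyppo(50, n), 3), round(rrr_hyppo_limit(n), 3))
```

Because the limit stays above 1 and grows with $n$, HYPPO cannot get arbitrarily close to the best achievable reduction ratio.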
HR3: Idea
$index(C, \omega) = \begin{cases} 0 & \text{if } \exists i : |c_i - c(\omega)_i| \leq 1,\ 1 \leq i \leq n, \\ \sum_{i=1}^{n} (|c_i - c(\omega)_i| - 1)^p & \text{else} \end{cases}$
HR3: Idea
- Compare $C(\omega)$ with $C$ iff $index(C, \omega) \leq \alpha^p$
- Example: $\alpha = 4$, $p = 2$
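
A minimal sketch of the index and the resulting cube selection follows. The index is implemented as stated in the definition. Restricting the candidates to the $(2\alpha + 1)^n$ cubes that HYPPO would consider is an assumption of this sketch, and the names `hr3_index` and `candidate_cubes` are illustrative.

```python
from itertools import product


def hr3_index(cube, ref_cube, p=2):
    """0 if some coordinate differs by at most 1, else the sum of (|c_i - c(w)_i| - 1)^p."""
    diffs = [abs(c - r) for c, r in zip(cube, ref_cube)]
    if any(d <= 1 for d in diffs):
        return 0
    return sum((d - 1) ** p for d in diffs)


def candidate_cubes(ref_cube, alpha, p=2):
    """Cubes in the alpha-neighbourhood of ref_cube whose index is at most alpha^p."""
    n = len(ref_cube)
    selected = []
    for offset in product(range(-alpha, alpha + 1), repeat=n):
        cube = tuple(r + o for r, o in zip(ref_cube, offset))
        if hr3_index(cube, ref_cube, p) <= alpha ** p:
            selected.append(cube)
    return selected


cubes = candidate_cubes((0, 0), alpha=4, p=2)
```

For $\alpha = 4$, $p = 2$ and $n = 2$ this keeps 77 of the 81 cubes; only the four corner cubes have an index of $3^2 + 3^2 = 18 > 16$ and are discarded.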
HR3: Idea
Claims
- No loss of recall
- $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
HR3: Lemma 1
- Lemma: $index(C, s) = x \Rightarrow \forall t \in C,\ \delta^p(s, t) > x\Delta^p$
- Example: $p = 2$, $n = 2$, $index(C, s) = 5$
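HR3: Lemma 1
Lemma: $index(C, s) = x \Rightarrow \forall t \in C,\ \delta^p(s, t) > x\Delta^p$
Proof.
- $index(C, s) = x \Rightarrow \sum_{i=1}^{n} (|c_i - c_i(s)| - 1)^p = x$
- From the definition of the cube index, it follows that $|s_i - t_i| > (|c_i - c_i(s)| - 1)\Delta$
- Thus, $\sum_{i=1}^{n} |s_i - t_i|^p > \sum_{i=1}^{n} (|c_i - c_i(s)| - 1)^p \Delta^p$
- Therefore, $\delta^p(s, t) > x\Delta^p$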
HR3: Lemma 2
Lemma: $\forall s \in S$: $index(C, s) > \alpha^p$ implies that all $t \in C$ are non-matches
Proof. Follows directly from Lemma 1: $index(C, s) > \alpha^p \Rightarrow \forall t \in C,\ \delta^p(s, t) > \Delta^p \alpha^p = \theta^p$
HR3: Idea
Claims
- No loss of recall ✓
- $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
HR3: Lemma 3
- Lemma: $\forall \alpha > 1$: $RRR(HR3(2\alpha)) < RRR(HR3(\alpha))$
- Proof (idea): [Figures for $p = 2$ and $\alpha = 4, 8, 25, 50$]
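HR3: Theorem
Theorem: $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
Proof.
- $\alpha \to \infty \Rightarrow \Delta \to 0$
- Thus, $C(s) = \{s\}$, $C = \{t\}$
- Index function: $\sum_{i=1}^{n} \Delta^p (|c_i(s) - c_i| - 1)^p \leq \Delta^p \alpha^p$
- $\Delta \to 0 \Rightarrow \sum_{i=1}^{n} |s_i - t_i|^p \leq \theta^p$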
HR3: Idea
Claims
- No loss of recall ✓
- $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$ ✓
HR3: Evaluation
- Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1
- Experimental setup:
  - Deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49 m, 99 m)
  - Linking Geonames and LinkedGeoData by longitude and latitude (θ = 1, 9)
  - Windows 7 Enterprise 64-bit machine with a 2.8 GHz i7 processor and 8 GB RAM
HR3: Evaluation (Comparisons)
- Experiment 2: Deduplicating DBpedia places, θ = 99 m
- $0.64 \times 10^6$ fewer comparisons
HR3: Experiments (Comparisons)
- Experiment 4: Linking Geonames and LinkedGeoData, θ = 9
- $4.3 \times 10^6$ fewer comparisons
HR3: Experiments (RRR)
- Experiments 3 and 4: Geonames and LGD, θ = 1, 9
HR3: Experiments (Runtime)
- Experiments 3 and 4: Geonames and LGD, θ = 1, 9
- [Figure: runtime in seconds (y-axis, 0 to 180) against granularity (x-axis: 2, 4, 8, 16, 32) for HYPPO and HR3 in Exp. 3 and Exp. 4]
HR3: Experiments (Runtime)
- Experiments 1 and 2: DBpedia, θ = 49 m, 99 m
- Experiments 3 and 4: Geonames and LGD, θ = 1, 9
- [Figure: runtime in seconds (y-axis, logarithmic, $10^0$ to $10^4$) for HR3, HYPPO and SILK in Exp. 1 to Exp. 4]
HR3: Conclusion
- Main result: new category of algorithms for link discovery
- Outperforms the state of the art (runtime, comparisons)
- Future work
  - Combine HR3 with multi-indexing approach
  - Devise resource management approach
  - Develop other algorithms (esp. for strings) with the same/similar theoretical guarantees
AEGLE: Why?
:E1 rdfs:label "Engine failure"@en
:E1 rdf:type :Error
:E1 :beginDate :"2015-04-22T11:39:35"
:E1 :endDate :"2015-04-22T11:39:37"
:E2 rdfs:label "Car accident"@en
:E2 rdf:type :Accident
:E2 :beginDate :"2015-06-28T11:45:22"
:E2 :endDate :"2015-06-28T11:45:24"
- Need to create links between events, e.g., :startsEvent
- Need to deal with volume and velocity
AEGLE: Event Definition
Definition (Event): events can be modeled as time intervals $v = (b(v), e(v))$
- $b(v)$ is the beginning time (:beginDate)
- $e(v)$ is the end time (:endDate)
- $b(v) < e(v)$
AEGLE: Allen's Interval Algebra

| Relation | Notation | Inverse |
|---|---|---|
| X before Y | bf(X, Y) | bfi(X, Y) |
| X meets Y | m(X, Y) | mi(X, Y) |
| X finishes Y | f(X, Y) | fi(X, Y) |
| X starts Y | st(X, Y) | sti(X, Y) |
| X during Y | d(X, Y) | di(X, Y) |
| X equal Y | eq(X, Y) | eq(X, Y) |
| X overlaps with Y | ov(X, Y) | ovi(X, Y) |

[Illustration column: interval diagrams of X and Y for each relation]
AEGLE: Solution
- Aegle: Allen's intErval alGebra for Link discovEry
- Efficient computation of temporal relations between events
- Allen's Interval Algebra: distinct, exhaustive, and qualitative relations between time intervals
- Intuition
  - Expressing 13 Allen relations using 8 atomic relations
  - Time is ordered: find matching entities using two sorted lists
AEGLE: Express st(s, t) using atomic relations
- $b(s) = b(t)$
- $b(s) < e(t)$
- $e(s) > b(t)$
- $e(s) < e(t)$
AEGLE: Atomic relations
Compute 8 atomic Boolean relations between begin and end points
BeginBegin (BB) for b(s), b(t):
  $BB^{1}(s, t) \Leftrightarrow (b(s) < b(t))$
  $BB^{0}(s, t) \Leftrightarrow (b(s) = b(t))$
  $BB^{-1}(s, t) \Leftrightarrow (b(s) > b(t)) \Leftrightarrow \neg(BB^{1}(s, t) \vee BB^{0}(s, t))$
BeginEnd (BE) for b(s), e(t)
EndBegin (EB) for e(s), b(t)
EndEnd (EE) for e(s), e(t)
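As a rough illustration of how one sorted list can yield these atomic relations without comparing every pair point by point, consider the sketch below. The `Interval` record, the function names, and the toy intervals are assumptions made for this example; this is not Aegle's actual implementation.

```python
from bisect import bisect_left, bisect_right
from collections import namedtuple

# Hypothetical interval record: identifier, begin point, end point.
Interval = namedtuple("Interval", ["name", "begin", "end"])


def atomic_relation(sources, targets, s_key, t_key, mode):
    """Return {(s.name, t.name)} with s_key(s) < t_key(t) (mode 1) or
    s_key(s) == t_key(t) (mode 0), using one sorted list of targets."""
    ordered = sorted(targets, key=t_key)  # O(n log n) sort
    keys = [t_key(t) for t in ordered]
    pairs = set()
    for s in sources:
        value = s_key(s)
        if mode == 0:
            # Targets whose key equals value form one contiguous run.
            lo, hi = bisect_left(keys, value), bisect_right(keys, value)
        else:
            # Targets whose key is strictly larger start at bisect_right.
            lo, hi = bisect_right(keys, value), len(keys)
        pairs.update((s.name, t.name) for t in ordered[lo:hi])
    return pairs


def begin(x):
    return x.begin


def end(x):
    return x.end


# Toy data (assumed): two source and two target intervals.
S = [Interval("s1", 1, 4), Interval("s2", 1, 6)]
T = [Interval("t1", 1, 8), Interval("t2", 3, 6)]

BB0 = atomic_relation(S, T, begin, begin, 0)
BB1 = atomic_relation(S, T, begin, begin, 1)
EE0 = atomic_relation(S, T, end, end, 0)
EE1 = atomic_relation(S, T, end, end, 1)

# BB^{-1} follows by complement: every pair that is in neither BB1 nor BB0.
ALL = {(s.name, t.name) for s in S for t in T}
BBm1 = ALL - (BB1 | BB0)
```

Only the strict "less than" and "equal" variants are computed directly; the "greater than" variant is derived as a set complement, mirroring the negation in the definition of $BB^{-1}$.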
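AEGLE: Combination of relations
[Figure: intervals s and t for st]
$st(s, t) \Leftrightarrow BB^{0}(s, t) \wedge BE^{1}(s, t) \wedge EB^{-1}(s, t) \wedge EE^{1}(s, t) \Leftrightarrow BB^{0}(s, t) \wedge EE^{1}(s, t)$
[Figure: intervals t and s for sti]
$sti(s, t) \Leftrightarrow BB^{0}(s, t) \wedge BE^{1}(s, t) \wedge EB^{-1}(s, t) \wedge EE^{-1}(s, t) \Leftrightarrow BB^{0}(s, t) \wedge EE^{-1}(s, t) \Leftrightarrow BB^{0}(s, t) \wedge \neg(EE^{0}(s, t) \vee EE^{1}(s, t))$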
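The two conjuncts dropped in each simplification follow from $BB^{0}$ whenever both intervals are proper, i.e. $b(x) < e(x)$. That properness condition is an assumption here; it is not stated on the slide. A short derivation under that assumption:

```latex
\begin{align*}
BB^{0}(s,t) &\Rightarrow b(s) = b(t) < e(t) \Rightarrow BE^{1}(s,t),\\
BB^{0}(s,t) &\Rightarrow e(s) > b(s) = b(t) \Rightarrow EB^{-1}(s,t).
\end{align*}
```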
AEGLE: Algorithm for st, sti
[Figure: source intervals s1, s2 and target intervals t1, t2]
For st:
  Compute $BB^{0}$ over s1 s2 t1 t2: (s1, t1), (s2, t1)
  Compute $EE^{1}$ over s1 s2 t1 t2: (s1, t1), (s2, t1), (s1, t2)
  Intersection between $BB^{0}$ and $EE^{1}$: (s1, t1), (s2, t1)
AEGLE: Algorithm for st, sti
[Figure: source intervals s1, s2 and target intervals t1, t2]
For sti:
  Retrieve $BB^{0}$ and $EE^{1}$
  Compute $EE^{0}$ over s1 s2 t1 t2: (s2, t2)
  Union between $EE^{0}$ and $EE^{1}$: (s1, t1), (s2, t1), (s1, t2), (s2, t2)
  Difference between $BB^{0}$ and $EE^{0}$, $EE^{1}$: (s1, t2), (s2, t2)
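Once the atomic relations are available as sets of pairs, st and sti reduce to plain set operations. The sketch below uses hypothetical atomic-relation results (three sources, two targets) chosen only for illustration; it does not reproduce the figure's data.

```python
def starts(bb0, ee1):
    """st(s, t): same begin point and s ends before t (BB0 and EE1)."""
    return bb0 & ee1


def started_by(bb0, ee0, ee1):
    """sti(s, t): same begin point and s ends after t, computed as
    BB0 minus the union of EE0 and EE1."""
    return bb0 - (ee0 | ee1)


# Hypothetical atomic-relation results (assumed for this example).
BB0 = {("s1", "t1"), ("s2", "t1"), ("s3", "t2")}
EE0 = {("s2", "t2")}
EE1 = {("s1", "t1"), ("s2", "t1"), ("s1", "t2")}

st = starts(BB0, EE1)
sti = started_by(BB0, EE0, EE1)
```

Because $EE^{-1}$ is never materialised, sti only needs the two end-point sets that were already computed for other relations.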
AEGLE: Experimental Set-Up
Datasets (S = T):

| Log type | Dataset name | Size | Unique b(s) | Unique e(s) |
|---|---|---|---|---|
| Machinery | 3KMachines | 3,154 | 960 | 960 |
| Machinery | 30KMachines | 28,869 | 960 | 960 |
| Machinery | 300KMachines | 288,690 | 960 | 960 |
| Query | 3KQueries | 3,888 | 3,636 | 3,638 |
| Query | 30KQueries | 30,635 | 3,070 | 3,070 |
| Query | 300KQueries | 303,991 | 184 | 184 |

State-of-the-art:
  Silk extended to deal with spatio-temporal data
  Baseline for eq using brute-force
Evaluation measures:
  atomic runtime of each of the atomic relations
  relation runtime required to compute each Allen's relation
  total runtime required to compute all 13 relations
AEGLE: Results
Q1: Does the reduction of Allen relations to 8 atomic relations influence the overall runtime of the approach?
AEGLE: Results
Q2: How does Aegle perform when compared with the state of the art in terms of time efficiency?
Total runtime:

| Log type | Dataset name | Aegle | Aegle * | Silk |
|---|---|---|---|---|
| Machine | 3KMachines | 11.26 | 5.51 | 294.00 |
| Machine | 30KMachines | 1,016.21 | 437.79 | 29,846.00 |
| Machine | 300KMachines | 189,442.16 | 78,416.61 | NA |
| Query | 3KQueries | 26.94 | 17.91 | 541.00 |
| Query | 30KQueries | 988.78 | 463.27 | 33,502.00 |
| Query | 300KQueries | 211,996.88 | 86,884.98 | NA |
AEGLE: Results

| Relation | Approach | 3KMachines | 30KMachines | 300KMachines | 3KQueries | 30KQueries | 300KQueries |
|---|---|---|---|---|---|---|---|
| m | Aegle | 0.02 | 0.19 | 3.42 | 0.02 | 0.21 | 3.89 |
| m | Silk | 23.00 | 2,219.00 | NA | 41.00 | 2,466.00 | NA |
| eq | Aegle | 0.05 | 0.79 | 49.84 | 0.05 | 0.45 | 348.51 |
| eq | Silk | 23.00 | 2,250.00 | NA | 41.00 | 2,473.00 | NA |
| eq | baseline | 2.05 | 171.10 | 23,436.30 | 3.15 | 196.09 | 31,452.54 |
| ovi | Aegle | 3.16 | 222.27 | 38,226.32 | 11.97 | 257.59 | 42,121.68 |
| ovi | Silk | 22.00 | 2,189.00 | NA | 42.00 | 2,503.00 | NA |

Conclusion
  Aegle = efficient temporal linking
  Reduction of 13 Allen Interval relations to 8 atomic relations
  Efficiency: simple sorting with complexity O(n log n)
  Scalable LD
Gnome: Problem
What if ...
  Data does not fit memory C, i.e., |S| + |T| > |C|
1 Common problem
2 Growing size and number of datasets
  ≈ 150 × 10⁹ triples
  ≈ 10 × 10³ datasets
  Largest dataset with > 20 × 10⁹ triples
3 Mostly in-memory solutions
Gnome: Idea
Insight
  Most approaches rely on divide-and-merge paradigm
  Example: HR3
  σ(s, t) ≥ θ ⇔ δ(s, t) ≤ ∆
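Gnome: Formal Model
1 Define $\mathcal{S} = \{S_1, \ldots, S_n\}$ with $S_i \subseteq S \wedge \bigcup_i S_i = S$
2 Define $\mathcal{T} = \{T_1, \ldots, T_m\}$ with $T_j \subseteq T \wedge \bigcup_j T_j = T$
3 Find mapping function $\mu : \mathcal{S} \to 2^{\mathcal{T}}$ with
  elements of $S_i$ only compared with elements of sets in $\mu(S_i)$
  union of results over all $S_i \in \mathcal{S}$ is exactly $M'$.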
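To make the three steps concrete, here is a minimal sketch on one-dimensional points. The grid-cell partition, the threshold `DELTA`, and all names are assumptions chosen for illustration (loosely in the spirit of grid-based approaches such as HR3); they are not the method's actual data structures.

```python
from collections import defaultdict

DELTA = 1.0  # hypothetical distance threshold


def partition(points, width):
    """Split points into blocks keyed by grid-cell index (a cover of the input)."""
    blocks = defaultdict(list)
    for p in points:
        blocks[int(p // width)].append(p)
    return blocks


def mu(i, target_blocks):
    """Mapping function: source block i is only compared with target blocks
    i-1, i and i+1, the only cells that can hold points within DELTA."""
    return [target_blocks[j] for j in (i - 1, i, i + 1) if j in target_blocks]


def link(source, target):
    """Compare each source block only with the target blocks given by mu."""
    s_blocks = partition(source, DELTA)
    t_blocks = partition(target, DELTA)
    result = set()
    for i, s_block in s_blocks.items():
        for t_block in mu(i, t_blocks):
            result.update((s, t) for s in s_block for t in t_block
                          if abs(s - t) <= DELTA)
    return result


S = [0.2, 1.9, 3.5, 7.0]
T = [0.9, 2.1, 4.6, 9.5]

# Reference result M' obtained by comparing every pair.
brute_force = {(s, t) for s in S for t in T if abs(s - t) <= DELTA}
```

The check `link(S, T) == brute_force` corresponds to the requirement that the union of the block-wise results is exactly M′.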
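GNOME: Task Graph
Definition
  A task $E_{ij}$ stands for comparing $S_i$ with $T_j \in \mu(S_i)$
  Task Graph $G = (V, E, w_v, w_e)$, with
    $V = \mathcal{S} \cup \mathcal{T}$
    $w_v(v) = |v|$
    $w_e(e_{ij}) = |S_i||T_j|$
[Figure: task graph with node weights S1 = 3, S2 = 2, S3 = 4, T1 = 2, T2 = 1, T3 = 3 and edge weights 6, 3, 4, 2, 4, 12, 6]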
GNOME: Problem Reformulation
Locality maximization
Two-step approach:
1 Clustering: Find groups of nodes that fit in memory, and
2 Scheduling: Compute sequence of groups that minimizes hard drive access
[Figure: task graph with node and edge weights]
GNOME: Step 1: Clustering
Naïve Approach
  Cluster by Si
Example: |C| = 7
[Figure: task graph with the resulting clusters; edge weights listed as 6, 3, 4, 2, 6, 4, 12]
GNOME: Step 1: Clustering
Greedy Approach
  Start by largest task
  Add connected largest tasks until none fits in C
Example: |C| = 7
[Figure: task graph with the resulting clusters; edge weights listed as 12, 6, 4, 6, 3, 2, 4]
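The naïve and greedy clustering strategies can be sketched as follows. The node sizes and the task list are one reading of the example task graph and should be treated as assumptions, as should the tie-breaking and splitting rules; the sketch also assumes that every single task fits in memory on its own. It is not meant to reproduce the exact clusters of the figure.

```python
# Node sizes and tasks: one reading of the example task graph (an assumption).
SIZE = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
TASKS = [("S1", "T1"), ("S1", "T2"), ("S2", "T1"), ("S2", "T2"),
         ("S2", "T3"), ("S3", "T2"), ("S3", "T3")]
CAPACITY = 7  # |C|


def weight(task):
    """Edge weight |S_i| * |T_j| of a task."""
    return SIZE[task[0]] * SIZE[task[1]]


def footprint(tasks):
    """Memory needed to hold every node touched by the given tasks."""
    return sum(SIZE[v] for v in {v for task in tasks for v in task})


def naive_clustering(tasks, capacity):
    """Group tasks by source block; open a new cluster when memory is full."""
    clusters = []
    for source in sorted({s for s, _ in tasks}):
        current = []
        for task in (x for x in tasks if x[0] == source):
            if current and footprint(current + [task]) > capacity:
                clusters.append(current)
                current = []
            current.append(task)
        clusters.append(current)
    return clusters


def greedy_clustering(tasks, capacity):
    """Seed each cluster with the largest open task, then keep adding the
    largest connected task that still fits."""
    remaining = sorted(tasks, key=weight, reverse=True)
    clusters = []
    while remaining:
        current = [remaining.pop(0)]
        while True:
            nodes = {v for task in current for v in task}
            fitting = [x for x in remaining
                       if nodes & set(x) and footprint(current + [x]) <= capacity]
            if not fitting:
                break
            best = max(fitting, key=weight)
            remaining.remove(best)
            current.append(best)
        clusters.append(current)
    return clusters


naive = naive_clustering(TASKS, CAPACITY)
greedy = greedy_clustering(TASKS, CAPACITY)
```

In this sketch the greedy variant re-scans the open tasks for every addition, which hints at why it tends to be slower than the naïve grouping.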
GNOME: Step 2: Scheduling
Insights
  Output of clustering: Sequence $G_1, \ldots, G_N$ of clusters
  Intuition: Consecutive clusters should share data
  Goal: Maximize overlap of generated sequence
Overlap of two clusters: $o(G_i, G_j) = \sum_{v \in V(G_i) \cap V(G_j)} |v|$
  Example: $o(G_4, G_1) = 4$
Overlap of a sequence: $o(G_1, \ldots, G_N) = \sum_{i=1}^{N-1} o(G_i, G_{i+1})$
  Example: $o(G_4, G_1, G_2, G_3) = 9$
[Figure: task graph with the clusters from Step 1]
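The two overlap definitions translate directly into code. In the sketch below, clusters are represented as sets of node names; the node sizes and the three clusters are hypothetical values chosen for illustration.

```python
def overlap(g, h, size):
    """o(G_i, G_j): total size of the nodes shared by two clusters."""
    return sum(size[v] for v in g & h)


def sequence_overlap(sequence, size):
    """o(G_1, ..., G_N): sum of the overlaps of consecutive clusters."""
    return sum(overlap(a, b, size) for a, b in zip(sequence, sequence[1:]))


# Hypothetical node sizes and clusters (given as node sets).
SIZE = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
A = {"S3", "T3"}
B = {"S3", "T2"}
C = {"S1", "T2"}
```

Here `sequence_overlap([A, B, C], SIZE)` is larger than `sequence_overlap([B, A, C], SIZE)`, because the first order keeps the shared nodes S3 and T2 in memory across consecutive clusters.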
GNOME: Step 2: Scheduling
Best-Effort
  Select random pair of clusters
  If permutation improves overlap, then permute
  Relies on local knowledge for scalability
Trick:
$$\Delta(G_i, G_j) = \big(o(G_{i-1}, G_j) + o(G_j, G_{i+1}) + o(G_{j-1}, G_i) + o(G_i, G_{j+1})\big) - \big(o(G_{i-1}, G_i) + o(G_i, G_{i+1}) + o(G_{j-1}, G_j) + o(G_j, G_{j+1})\big) \quad (1)$$
Example: G3 G1 G2 G4 (overlaps 0, 3, 2) becomes G4 G1 G2 G3 (overlaps 4, 3, 2)
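A minimal sketch of the best-effort idea with the local gain from Eq. (1) follows. The pairwise overlaps are those that can be read off the example; the value for o(G3, G4) does not appear there and is an assumption. The handling of sequence ends and of adjacent pairs (where Eq. (1) as written does not apply) is also this sketch's own choice.

```python
import random

# Pairwise overlaps from the example; o(G3, G4) = 0 is an assumption.
PAIRS = {("G1", "G2"): 3, ("G1", "G3"): 0, ("G1", "G4"): 4,
         ("G2", "G3"): 2, ("G2", "G4"): 2, ("G3", "G4"): 0}
O = {frozenset(pair): value for pair, value in PAIRS.items()}


def o(a, b):
    """Symmetric overlap lookup for two distinct clusters."""
    return O[frozenset((a, b))]


def total(seq):
    """Overlap of a whole sequence."""
    return sum(o(a, b) for a, b in zip(seq, seq[1:]))


def delta(seq, i, j):
    """Gain of swapping positions i < j with j - i > 1, following Eq. (1).
    Missing neighbours at the ends of the sequence contribute 0."""
    def nb(cluster, k):
        return o(cluster, seq[k]) if 0 <= k < len(seq) else 0

    gi, gj = seq[i], seq[j]
    after = nb(gj, i - 1) + nb(gj, i + 1) + nb(gi, j - 1) + nb(gi, j + 1)
    before = nb(gi, i - 1) + nb(gi, i + 1) + nb(gj, j - 1) + nb(gj, j + 1)
    return after - before


def best_effort(seq, rounds, rng):
    """Swap random pairs of clusters whenever the swap increases the overlap."""
    seq = list(seq)
    for _ in range(rounds):
        i, j = sorted(rng.sample(range(len(seq)), 2))
        if j - i > 1:
            gain = delta(seq, i, j)
        else:
            # Adjacent pair: recompute the total instead of using Eq. (1).
            swapped = seq[:]
            swapped[i], swapped[j] = swapped[j], swapped[i]
            gain = total(swapped) - total(seq)
        if gain > 0:
            seq[i], seq[j] = seq[j], seq[i]
    return seq


start = ["G3", "G1", "G2", "G4"]
result = best_effort(start, 50, random.Random(0))
```

Since `delta` only touches the neighbours of the two swapped positions, each step costs a constant number of overlap lookups regardless of the sequence length.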
GNOME: Step 2: Scheduling
Greedy
  Start with random cluster
  Choose next cluster with largest overlap
  Global knowledge needed
Example: G3 G1 G2 G4 (overlaps 0, 3, 2) becomes G3 G2 G1 G4 (overlaps 2, 3, 4)
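The greedy scheduler can be sketched in a few lines. The overlap table repeats the values visible in the example, with o(G3, G4) = 0 as an assumption; the start cluster is passed in explicitly here rather than drawn at random, so that the run is reproducible.

```python
# Pairwise overlaps from the example; o(G3, G4) = 0 is an assumption.
PAIRS = {("G1", "G2"): 3, ("G1", "G3"): 0, ("G1", "G4"): 4,
         ("G2", "G3"): 2, ("G2", "G4"): 2, ("G3", "G4"): 0}
O = {frozenset(pair): value for pair, value in PAIRS.items()}


def o(a, b):
    """Symmetric overlap lookup for two distinct clusters."""
    return O[frozenset((a, b))]


def greedy_schedule(clusters, start):
    """Repeatedly append the open cluster that overlaps most with the last one."""
    sequence = [start]
    open_clusters = [c for c in clusters if c != start]
    while open_clusters:
        # Needs the overlap with every open cluster, i.e. global knowledge.
        best = max(open_clusters, key=lambda c: o(sequence[-1], c))
        open_clusters.remove(best)
        sequence.append(best)
    return sequence


schedule = greedy_schedule(["G1", "G2", "G3", "G4"], "G3")
overlaps = [o(a, b) for a, b in zip(schedule, schedule[1:])]
```

Each step scans all open clusters, so the whole schedule needs a quadratic number of overlap lookups, in contrast to the constant-cost swaps of the best-effort variant.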
GNOME: Experimental Setup
Datasets
1 DBP: 1 million labels from DBpedia version 04-2015
2 LGD: 0.8 million places from LinkedGeoData
Hardware
1 Intel Xeon E5-2650 v3 processors (2.30GHz)
2 Ubuntu 14.04.3 LTS
3 10GB RAM
Measures
1 Total runtime
2 Hit ratio
GNOME: Evaluation of Clustering
Only show results of LGD
Results on DBP lead to similar insights

| $\lvert C \rvert$ | Runtimes: Naive | Runtimes: Greedy | Hit ratio: Naive | Hit ratio: Greedy |
|---|---|---|---|---|
| 100 | 568.0 | 646.3 | 0.57 | 0.77 |
| 200 | 518.3 | 594.0 | 0.66 | 0.80 |
| 400 | 532.0 | 593.3 | 0.67 | 0.80 |
| 1,000 | 5,974.0 | 118,454.7 | 0.51 | 0.64 |
| 2,000 | 6,168.0 | 115,450.0 | 0.51 | 0.63 |
| 4,000 | 7,118.3 | 121,901.7 | 0.50 | 0.63 |

Conclusion
1 Naïve approach is more efficient
2 Greedy approach is more effective
3 Select naïve approach for clustering
GNOME: Evaluation of Scheduling
Only show results of LGD
Results on DBP lead to similar insights

| $\lvert C \rvert$ | Runtimes (ms): Best-Effort | Runtimes (ms): Greedy | Hit ratio: Best-Effort | Hit ratio: Greedy |
|---|---|---|---|---|
| 100 | 571.3 | 1,599.3 | 0.56 | 0.68 |
| 200 | 565.7 | 1,448.3 | 0.66 | 0.85 |
| 400 | 581.0 | 1,379.3 | 0.67 | 0.88 |
| 1,000 | 5,666.0 | 814,271.7 | 0.51 | 0.86 |
| 2,000 | 6,268.0 | 810,855.0 | 0.51 | 0.86 |
| 4,000 | 6,675.7 | 814,041.7 | 0.50 | 0.86 |

Conclusion
1 Best-effort approach more time-efficient
2 Greedy more effective
3 Best-effort approach is to be used for scheduling
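GNOME: Comparison with Caching Approaches
Runtimes (ms):

| $\lvert C \rvert$ | Gnome | FIFO | F2 | LFU | LRU | SLRU |
|---|---|---|---|---|---|---|
| 1,000 | 5,974.0 | 37,161.0 | 42,090.3 | 45,906.7 | 54,194.3 | 56,904.3 |
| 2,000 | 6,168.0 | 31,977.0 | 39,071.3 | 39,872.0 | 45,473.0 | 46,795.0 |
| 4,000 | 7,118.3 | 21,337.0 | 40,860.0 | 28,028.3 | 26,816.7 | 27,200.0 |

Hit ratio:

| $\lvert C \rvert$ | Gnome | FIFO | F2 | LFU | LRU | SLRU |
|---|---|---|---|---|---|---|
| 1,000 | 0.51 | 0.17 | 0.16 | 0.19 | 0.17 | 0.17 |
| 2,000 | 0.51 | 0.29 | 0.30 | 0.32 | 0.30 | 0.30 |
| 4,000 | 0.51 | 0.54 | 0.55 | 0.59 | 0.55 | 0.56 |

Conclusion
1 Gnome is more time-efficient
2 Leads to higher hit rates in most cases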
GNOME: Scalability

| Dataset | 100k | 200k | 400k | 800k |
|---|---|---|---|---|
| LGD | 362,141.3 | 1,452,922.0 | 5,934,038.7 | 20,001,965.7 |
| DBP | 434,630.7 | 1,790,350.7 | 6,677,923.0 | 12,653,403.3 |

Conclusion
  Sub-quadratic growth of runtime
  Runtime grows linearly with number of mappings
  For LGD, 360 – 370 mappings/s
Reach for the Cloud? Dealing with Time Complexity
1 Devise better algorithms
  Blocking
  Algorithms for given metrics
    PPJoin+, EDJoin
    HR3
2 Use parallel hardware
  Thread pools
  MapReduce
  Massively Parallel Hardware
Reach for the Cloud?
Parallel Implementations
Different architectures
  Memory (shared, hybrid, distributed)
  Execution paths (different, same)
  Location (remote, local)
Question
  When should we use which hardware?
Reach for the Cloud?
Premises and Goals
Premises
  Given an algorithm that runs on all three architectures ...
  Note to self: Implement one
  Picked HR3
    Reduction-ratio-optimal
Goals
  Compare runtimes on all three parallel architectures
  Find break-even points
Reach for the Cloud?
HR3
1 Assume spaces with Minkowski metric and p ≥ 2
$\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}$
2 Create grid of width ∆ = θ/α
3 Link discovery condition describes a hypersphere
4 Approximate hypersphere with hypercube
5 Use index to discard portions of hypercube
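The five steps can be sketched in a few lines of Python. This is a simplified illustration in the spirit of HR3, not the authors' implementation: the function names (`minkowski`, `cell_of`, `grid_link`) are made up here, and the pruning rule is a plain lower bound on the distance between two grid cells rather than the exact index from the HR3 paper. Links are taken to be pairs with δ(s, t) ≤ θ.

```python
import math
from collections import defaultdict
from itertools import product


def minkowski(s, t, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1.0 / p)


def cell_of(point, width):
    """Grid cell (tuple of integer coordinates) that contains the point."""
    return tuple(math.floor(x / width) for x in point)


def grid_link(source, target, theta, alpha=4, p=2):
    """Return all (s, t) with minkowski(s, t) <= theta, using a grid of width theta/alpha."""
    width = theta / alpha
    dims = len(source[0])

    # Index the target points by grid cell.
    index = defaultdict(list)
    for t in target:
        index[cell_of(t, width)].append(t)

    # Hypercube of cell offsets, minus offsets whose cells cannot hold a match:
    # cells that differ by d in one dimension are at least (|d| - 1) * width apart there.
    offsets = [
        o for o in product(range(-alpha, alpha + 1), repeat=dims)
        if sum(max(0, abs(d) - 1) ** p for d in o) <= alpha ** p
    ]

    links = []
    for s in source:
        home = cell_of(s, width)
        for o in offsets:
            neighbour = tuple(c + d for c, d in zip(home, o))
            for t in index.get(neighbour, ()):
                if minkowski(s, t, p) <= theta:
                    links.append((s, t))
    return links
```

A larger α gives a finer grid, so more corner cells of the hypercube are discarded at the cost of more cells to visit.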
Reach for the Cloud?
HR3 in GPUs
Large number of simple compute cores
Same instruction, multiple data
Bottleneck: PCI Express Bus
1 Run discretization on CPU
2 Run indexing on GPU
3 Run comparisons on CPU
[Figure: GPU memory layout with work items, work groups, local memory, global/constant cache, and global/constant memory]
Reach for the Cloud?
HR3 on the Cloud
Naive Approach
1 Rely on MapReduce paradigm
2 Run discretization and assignment to cubes in map step
3 Run distance computation in reduce step
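The naive approach can be simulated in plain Python without a Hadoop cluster. The sketch below is an assumption-laden illustration for a deduplication task: it uses a grid of width θ (α = 1), sends each point to its own cube and to all adjacent cubes, and all names (`map_step`, `reduce_step`, `run_job`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations, product


def cube_of(point, width):
    """Discretize a point into the coordinates of its grid cube."""
    return tuple(math.floor(x / width) for x in point)


def map_step(pid, point, theta):
    """Emit the point to its own cube ("home") and to all adjacent cubes ("guest")."""
    home = cube_of(point, theta)
    for offset in product((-1, 0, 1), repeat=len(point)):
        cube = tuple(c + o for c, o in zip(home, offset))
        yield cube, ("home" if cube == home else "guest", pid, point)


def reduce_step(values, theta, p=2):
    """Compute distances for all points that were sent to one cube."""
    homes = [(i, pt) for role, i, pt in values if role == "home"]
    guests = [(i, pt) for role, i, pt in values if role == "guest"]

    def close(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p) <= theta

    for (i, a), (j, b) in combinations(homes, 2):
        if close(a, b):
            yield (min(i, j), max(i, j))
    for i, a in homes:
        for j, b in guests:
            # Each cross-cube pair meets in two cubes; report it only once.
            if i < j and close(a, b):
                yield (i, j)


def run_job(points, theta):
    """Simulate map, shuffle (group by cube), and reduce on one machine."""
    groups = defaultdict(list)
    for pid, point in enumerate(points):
        for cube, value in map_step(pid, point, theta):
            groups[cube].append(value)
    links = set()
    for values in groups.values():
        links.update(reduce_step(values, theta))
    return links
```

In this scheme the work of a reducer depends on how many points land in its cube, so skewed data leaves some reducers with far more comparisons than others.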
Load Balancing
1 Run two jobs
2 Job1: Compute cube population matrix
3 Job2: Distribute balanced linking tasks across mappers and reducers
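A rough sketch of the two-job idea follows. The slide does not say how tasks are balanced, so the greedy "largest task first" assignment below is an assumption chosen for illustration, as are the names `cube_population` and `balanced_tasks` and the cost estimate (number of comparisons per cube pair).

```python
import heapq
import math
from collections import Counter


def cube_population(points, width):
    """Job 1 (sketch): count how many points fall into each grid cube."""
    return Counter(tuple(math.floor(x / width) for x in p) for p in points)


def balanced_tasks(population, workers):
    """Job 2 (sketch): spread cube-pair linking tasks over workers by estimated cost."""
    cubes = sorted(population)
    tasks = []
    for i, a in enumerate(cubes):
        for b in cubes[i:]:
            if all(abs(x - y) <= 1 for x, y in zip(a, b)):  # same or adjacent cube
                n, m = population[a], population[b]
                cost = n * (n - 1) // 2 if a == b else n * m  # number of comparisons
                if cost:
                    tasks.append((cost, a, b))

    # Greedy assignment: hand the largest remaining task to the least-loaded worker.
    heap = [(0, w, []) for w in range(workers)]
    heapq.heapify(heap)
    for cost, a, b in sorted(tasks, reverse=True):
        load, w, assigned = heapq.heappop(heap)
        assigned.append((a, b))
        heapq.heappush(heap, (load + cost, w, assigned))
    return {w: (load, assigned) for load, w, assigned in heap}
```

The population counts from the first job make the cost of each linking task known before any distance is computed, which is what allows the second job to plan the distribution up front.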
Reach for the Cloud?
Evaluation Hardware
1 CPU (Java)
  32-core server running Linux 10.0.4
  AMD Opteron 6128 clocked at 2.0GHz
2 GPU (C++)
  AMD Radeon 7870 with 20 compute units, 64 parallel threads
  Host program ran on Intel Core i7 3770 CPU with 8GB RAM and Linux 12.10
  Ran the Java code on the same machine for scaling
3 Cloud (Java)
  10 c1.medium nodes (2 cores, 1.7GB) for small experiments
  30 c1.large nodes (8 cores, 7GB) for large experiments
Reach for the Cloud?
Experimental Setup
Run deduplication task
Evaluate behavior on different number of dimensions
Important: Scale results
  Different hardware (2-7 times faster C++ workstation)
  Programming language
Evaluate scalability (DS4)
Reach for the Cloud?
Results – DS1
DBpedia, 3 dimensions, 26K
GPU scales better
Reach for the Cloud?
Results – DS2
DBpedia, 2 dimensions, 475K
GPUs scale better across different θ
Break-even point ≈ 10^8 results
Reach for the Cloud?
Results – DS3
LinkedGeoData, 2 dimensions, 500K
Similar picture
Reach for the Cloud?
Results – Scalability
LinkedGeoData, 2 dimensions, 6M
Cloud (with load balancing) better for ≈ 10^10+ results
Reach for the Cloud?
Summary
Question: When should we use which hardware for link discovery?
Results
  Implemented HR3 on different hardware
  Provided the first implementation of link discovery on GPUs
  Devised a load balancing approach for linking on the cloud
  Discovered 10^8 and 10^10 results as break-even points
Insights
  Load balancing important for using the cloud
  GPUs: Need faster buses than PCIe (e.g., Firewire speed)
  Accurate use of local resources sufficient for most of the current applications
Summary and Conclusion
Four approaches to scalability
1 Improve reduction ratio (LIMES, MultiBlock, HYPPO, HR3, ...)
2 Reduce runtime complexity (AEGLE)
3 Better use of hardware (GNOME)
4 Improve execution of specifications
Challenges include
1 Increase number of reduction-ratio-optimal approaches (HR3, ORCHID)
2 Adaptive resource scheduling
3 Self-regulating approaches
4 Distribution in modern in-memory architecture (SPARK)
5 ...
Acknowledgment
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
References
Kleanthi Georgala, Mohamed Ahmed Sherif, and Axel-Cyrille Ngonga Ngomo. “An Efficient Approach for the Generation of Allen Relations”. In: ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands - Including Prestigious Applications of Artificial Intelligence (PAIS 2016). Ed. by Gal A. Kaminka et al. Vol. 285. Frontiers in Artificial Intelligence and Applications. IOS Press, 2016, pp. 948–956. isbn: 978-1-61499-671-2. doi: 10.3233/978-1-61499-672-9-948. url: http://dx.doi.org/10.3233/978-1-61499-672-9-948.
Robert Isele, Anja Jentzsch, and Christian Bizer. “Efficient Multidimensional Blocking for Link Discovery without losing Recall”. In: Proceedings of the 14th International Workshop on the Web and Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011. Ed. by Amélie Marian and Vasilis Vassalos. 2011. url: http://webdb2011.rutgers.edu/papers/Paper%2039/silk.pdf.
Axel-Cyrille Ngonga Ngomo and Sören Auer. “LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data”. In: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011. Ed. by Toby Walsh. IJCAI/AAAI, 2011, pp. 2312–2317. isbn: 978-1-57735-516-8. doi: 10.5591/978-1-57735-516-8/IJCAI11-385. url: http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-385.
Axel-Cyrille Ngonga Ngomo et al. “When to Reach for the Cloud: Using Parallel Hardware for Link Discovery”. In: The Semantic Web: Semantics and Big Data, 10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. Proceedings. Ed. by Philipp Cimiano et al. Vol. 7882. Lecture Notes in Computer Science. Springer, 2013, pp. 275–289. isbn: 978-3-642-38287-1. doi: 10.1007/978-3-642-38288-8_19. url: http://dx.doi.org/10.1007/978-3-642-38288-8_19.
Axel-Cyrille Ngonga Ngomo. “Link Discovery with Guaranteed Reduction Ratio in Affine Spaces with Minkowski Measures”. In: The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part I. Ed. by Philippe Cudré-Mauroux et al. Vol. 7649. Lecture Notes in Computer Science. Springer, 2012, pp. 378–393. isbn: 978-3-642-35175-4. doi: 10.1007/978-3-642-35176-1_24. url: http://dx.doi.org/10.1007/978-3-642-35176-1_24.
Axel-Cyrille Ngonga Ngomo. “ORCHID - Reduction-Ratio-Optimal Computation of Geo-spatial Distances for Link Discovery”. In: The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part I. Ed. by Harith Alani et al. Vol. 8218. Lecture Notes in Computer Science. Springer, 2013, pp. 395–410. isbn: 978-3-642-41334-6. doi: 10.1007/978-3-642-41335-3_25. url: http://dx.doi.org/10.1007/978-3-642-41335-3_25.
Axel-Cyrille Ngonga Ngomo. “HELIOS - Execution Optimization for Link Discovery”. In: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I. Ed. by Peter Mika et al. Vol. 8796. Lecture Notes in Computer Science. Springer, 2014, pp. 17–32. isbn: 978-3-319-11963-2. doi: 10.1007/978-3-319-11964-9_2. url: http://dx.doi.org/10.1007/978-3-319-11964-9_2.
Axel-Cyrille Ngonga Ngomo and Mofeed M. Hassan. “The Lazy Traveling Salesman - Memory Management for Large-Scale Link Discovery”. In: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings. Ed. by Harald Sack et al. Vol. 9678. Lecture Notes in Computer Science. Springer, 2016, pp. 423–438. isbn: 978-3-319-34128-6. doi: 10.1007/978-3-319-34129-3_26. url: http://dx.doi.org/10.1007/978-3-319-34129-3_26.