efficient algorithms for association finding and frequent association pattern mining
TRANSCRIPT
![Page 1: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/1.jpg)
Efficient Algorithms for Association Finding and
Frequent Association Pattern Mining
Gong Cheng, Daxin Liu, Yuzhong QuWebsoft Research Group
National Key Laboratory for Novel Software TechnologyNanjing University, China
Websoft
![Page 2: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/2.jpg)
Background and motivation• To suggest friends, recognize suspected terrorists,
answer questions … based on massive graph data
![Page 3: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/3.jpg)
Problem statementAn association connecting a set of query entities isa minimal subgraph that• contains all the query entities, and• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
![Page 4: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/4.jpg)
Problem statementAn association connecting a set of query entities isa minimal subgraph that• contains all the query entities, and• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
![Page 5: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/5.jpg)
Problem statementAn association connecting a set of query entities isa minimal subgraph that• contains all the query entities, and• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
![Page 6: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/6.jpg)
Problem statementAn association connecting a set of query entities isa minimal subgraph that• contains all the query entities, and• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
• Tree-structured• Leaves Query entities⊆
![Page 7: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/7.jpg)
Problem statement1. How to efficiently find associations in a possibly
very large graph?2. How to help users explore a possibly large set of
associations that have been found?
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
![Page 8: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/8.jpg)
Problem statement1. How to efficiently find associations in a possibly
very large graph?2. How to help users explore a possibly large set of
associations that have been found?
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
Association finding
Frequent association pattern mining
![Page 9: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/9.jpg)
Association finding: Problem• To find all the associations having a limited diameter
(Diameter = Greatest distance between any pair of vertices)
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
![Page 10: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/10.jpg)
Association finding: Basic solution
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice
Chris
attended
Paper-AisAuthorOfacceptedAt
ISWC
Alice
Bobreviewer
ISWC
ISWC
An association A set of paths
![Page 11: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/11.jpg)
Association finding: Basic solution
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice
Chris
attended
Paper-AisAuthorOfacceptedAt
ISWC
Alice
Bobreviewer
ISWC
ISWC
1. Path finding2. Path merging
![Page 12: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/12.jpg)
Association finding: Optimization• Distance-based search space pruning
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
• DiameterConstraint ≤ 3• Length(AliceDan) = 1• Distance(Dan, Bob) = 4Length+Distance > DiameterConstraint
![Page 13: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/13.jpg)
Association finding: Optimization• Distance computation• Materializing offline computed results: O(V2) space• Online computing: O(E) time per pair• Using distance oracle: a space-time trade-off
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellenknows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attendedreviewer
reviewer
Frankknows
Alice
![Page 14: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/14.jpg)
Association finding: Deduplication
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice
Chris
attended
Paper-AisAuthorOfacceptedAt
ISWC
Alice
Bobreviewer
ISWC
ISWC
An association A set of paths
![Page 15: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/15.jpg)
Association finding: Deduplication
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice
Chris
attended
Paper-AisAuthorOf
Alice
Bobreviewer
ISWC
ISWC
A duplicate association Another set of paths
Paper-AacceptedAt
Paper-AacceptedAt
![Page 16: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/16.jpg)
Association finding: Deduplication• Canonical code
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice
code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper-A))$ = … Paper-A,acceptedAt,code(Tree(ISWC))$ … = … ISWC,reviewer,code(Tree(Bob)),~attended,code(Tree(Chris))$ … = … Bob$ …
(Assuming Bob precedes Chris)
![Page 17: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/17.jpg)
Frequent association pattern mining• Association pattern: A conceptual abstract that
summarizes a group of associations
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice Association Association pattern(Intermediate entity Class)
Chris
Bob
PaperisAuthorOfacceptedAt
Conference attendedreviewer
Alice
![Page 18: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/18.jpg)
Frequent association pattern mining• Problem: To mine all the association patterns matched by
more than a threshold proportion of associations
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice Association Association pattern(Intermediate entity Class)
Chris
Bob
PaperisAuthorOfacceptedAt
Conference attendedreviewer
Alice
![Page 19: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/19.jpg)
Frequent association pattern mining• Basic solution:
Chris
Bob
Paper-AisAuthorOfacceptedAt
ISWC attendedreviewer
Alice Association Association pattern(Intermediate entity Class)
Chris
Bob
PaperisAuthorOfacceptedAt
Conference attendedreviewer
Alice
Calculating the frequency of an association pattern= Counting the occurrence of its canonical code
![Page 20: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/20.jpg)
Frequent association pattern mining• Canonical code
Chris
Bob
PaperisAuthorOfacceptedAt
Conference
attended
reviewer
Alice PaperisAuthorOf acceptedAt
Conference
code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper)),isAuthorOf,code(Tree(Paper))$
Paper equals Paper.So which subtree should go first?(Canonical code may not be unique!)
? ?
![Page 21: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/21.jpg)
Frequent association pattern mining• Canonical code
Chris
Bob
PaperisAuthorOfacceptedAt
Conference
attended
reviewer
Alice PaperisAuthorOf acceptedAt
Conference
code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper)),isAuthorOf,code(Tree(Paper))$
Smallest leaf as its proxy to be compared
Equality would never happen.(Canonical code is now unique!)
(Assuming Bob precedes Chris)
![Page 22: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/22.jpg)
Experiments• Datasets
• LinkedMDB: 1M vertices and 2M arcs• DBpedia (2015-04): 4M vertices and 15M arcs
• Parameter settings• Diameter constraint (λ): 2, 4• Number of query entities (n): 2, 3, 4, 5
• Test queries• 1,000 random sets of query entities under each setting of λ and n
• Hardware configuration• 3.3GHz CPU, 24GB memory• Data graphs: in memory• Distance oracles: on disk
![Page 23: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/23.jpg)
Experiments• Results: Association finding
BSC: Basic solution (not pruning)PRN: Optimized solution (distance-based pruning)PRN-1: Optimized solution (distance-based pruning except for the last level of search)
![Page 24: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/24.jpg)
Experiments• Results: Frequent association pattern mining• LinkedMDB
• <10,000 associations: <21ms• 13,531 associations: 68ms
• DBpedia• <10,000 associations: <65ms• 1,198,968 associations: 2909ms
![Page 25: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/25.jpg)
Takeaway messages• Subgraph finding and mining are faster than what we expected.• Consider distance oracle and canonical code in your own research.
![Page 26: Efficient Algorithms for Association Finding and Frequent Association Pattern Mining](https://reader036.vdocuments.us/reader036/viewer/2022062522/589bfa4f1a28ab40348b6e17/html5/thumbnails/26.jpg)
Takeaway messages• Subgraph finding and mining are faster than what we expected.• Consider distance oracle and canonical code in your own research.