An Architecture for Efficient Hardware Data Mining using Reconfigurable Computing Systems
Bryan Lahartinger
“The Apriori algorithm is a fundamental correlation-based data mining [technique]”
“Software implementations of the Apriori algorithm utilize…methods...for the support and candidate generation operations”
“This paper demonstrates an efficient structure for computing the support of a set of candidates.”
“…through the combination of Content-Addressable-Memories (CAM)”
“As far as we know, the Apriori algorithm has not been studied in any significant way for hardware implementation.”
Objective Investigation
To exploit parallelism in hardware to accelerate a bottleneck in the Apriori algorithm with applications specifically to data mining.
What is the Apriori algorithm?
What is the bottleneck?
How does hardware acceleration fit into the picture?
Objective
• Background
  • Apriori Algorithm
  • Apriori bottleneck
  • Bitmapped CAM
• Implementing Bitmap CAM
• Analysis of the Approach
• Results of software comparisons
• Conclusions
Paper Overview
Given transactions consisting of sets:
{1,2,3,4}, {2,3,4}, {2,3}, {1,2,4}, {1,2,3,4}, and {2,4}
Apriori
Item Support
1 3
2 6
3 4
4 5
Itemset Support
{1,2} 3
{1,3} 2
{1,4} 3
{2,3} 4
{2,4} 5
{3,4} 3
Itemset Support
{1,2,4} 3
{2,3,4} 3
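The three support tables above can be reproduced with a short level-wise sketch. This is a minimal illustration, not the paper's implementation: the function names `support` and `apriori` are hypothetical, and the minimum support of 3 is an assumption (the slide does not state a threshold). With that threshold, {1,3} (support 2) is dropped, which is consistent with only {1,2,4} and {2,3,4} appearing in the last table.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        # Support counting: the step the slides identify as the bottleneck.
        counts = {c: support(c, transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets,
        # then prune any candidate with an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# Transactions from the slide; min_support=3 is an assumed threshold.
transactions = [frozenset(t) for t in
                ({1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4})]
result = apriori(transactions, min_support=3)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Every pass over `transactions` inside `support` is repeated for each candidate, which is why support counting dominates the run time as the candidate set grows.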
• Each candidate can be addressed to a row of bits
• Each column indicates whether a candidate is included in the CAM entry
• Column bits can be summed to form the number of matching candidates
Bitmapped CAM
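One possible software reading of these bullets is sketched below. The slides do not give the hardware details, so everything here is an assumption made for illustration: rows are indexed by item, each bit column stands for one candidate itemset, and a candidate matches a transaction when its column sum equals its size. The names `build_bitmap`, `matching_candidates`, and `support_counts` are hypothetical and do not come from the paper.

```python
def build_bitmap(candidates):
    """One row of bits per item; bit j is set when candidate j contains the item."""
    rows = {}
    for j, cand in enumerate(candidates):
        for item in cand:
            rows[item] = rows.get(item, 0) | (1 << j)
    return rows


def matching_candidates(rows, candidates, transaction):
    """Sum each column over the rows addressed by the transaction's items."""
    column_sums = [0] * len(candidates)
    for item in transaction:
        row = rows.get(item, 0)  # items that appear in no candidate give an empty row
        for j in range(len(candidates)):
            column_sums[j] += (row >> j) & 1
    # A candidate is contained in the transaction when all of its items were seen.
    return [j for j, cand in enumerate(candidates) if column_sums[j] == len(cand)]


def support_counts(candidates, transactions):
    """Support of every candidate, computed through the bitmap lookup."""
    rows = build_bitmap(candidates)
    counts = [0] * len(candidates)
    for t in transactions:
        for j in matching_candidates(rows, candidates, t):
            counts[j] += 1
    return counts


# 2-item candidates and transactions from the Apriori example slide.
pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
baskets = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4}]
print(support_counts(pairs, baskets))
```

In this model one row lookup updates every candidate column at once, which is the kind of parallelism a hardware bitmap could exploit; the inner loop over `j` is sequential here only because this is software.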
• Large LUT in memory
Candidate 249 is frequently associated with candidates 1-11 but not 12…
Implemented CAM Bitmap
• They varied the number of CAM elements to candidates
• Max CAM blocks of 32
• 32 Blocks fit most cases
• When they didn’t…
  • Solution:
  • Stop adding candidates to the block when full [why?]
Analysis of the Approach
• VHDL architecture requires only 10 cycles per CAM stage (Xilinx 7.2 on Virtex-II)
• Max clock rate 120 MHz
• Used standard datasets
• Compared against software on only 1 hardware platform
• Used half the logic cells per candidate compared to USC FCCM05 (half the FPGA area?)
Results
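The two headline figures above can be turned into a rough per-stage time. The sketch below is back-of-the-envelope arithmetic only, assuming a CAM stage occupies exactly 10 cycles at the maximum 120 MHz clock with no overlap between stages; since the design is described as highly pipelined, real throughput may differ. The helper `cam_stage_timing` is a hypothetical name.

```python
def cam_stage_timing(cycles_per_stage, clock_hz):
    """Return (nanoseconds per CAM stage, CAM stages per second)."""
    seconds_per_stage = cycles_per_stage / clock_hz
    return seconds_per_stage * 1e9, clock_hz / cycles_per_stage


# Figures from the slide: 10 cycles per CAM stage, 120 MHz maximum clock.
ns_per_stage, stages_per_second = cam_stage_timing(10, 120e6)
print(f"{ns_per_stage:.1f} ns per stage, {stages_per_second:.0f} stages per second")
```

Under these assumptions a stage takes roughly 83 ns, or about 12 million stages per second.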
• CAM = awesome vs. software = sucks
• Allows similarities between candidates to be utilized
• Their previous paper on a systolic array architecture of the Apriori algorithm in hardware would work even better with this improvement
• An ideal architecture will be constructed/tested with both architectures combined
Conclusions
Pros
• Intro was unclear at first, i.e., NOT about Apriori but about more general applications
• Reasonable explanation of Apriori and CAM
Criticisms
Cons
• No VHDL implementation details – “highly pipelined”, that’s it…for real
• Software only tested on one hardware platform – 2.8 GHz Xeon, 3 GB RAM
• Bad analysis of their methodology
  • Hard to follow
  • Unclear how to reproduce
• Unclear results
  • Questionable standard datasets
  • 120 MHz??? 10 cycles/CAM stage?????