Post on 15-Jan-2016
Powerset Explorer: A Datamining Application
Jordan Lee
Background
PAST
– Datamining accomplished with human intuition
PRESENT
– Computer aided with AI and brute force CPU cycles
FUTURE
– Enter PowersetViewer…
Dataset
Alphabet
– Items that can be found in transactions
– E.g. apples, bread, chips
Transaction
– Set of items (unordered)
– E.g. Tx1 = { Apples, Chips }
– E.g. Tx2 = { Bread }
Transaction database
– Collection of transactions (unordered, possibly with repeats)
– E.g. Walmart transaction logs
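The three definitions above can be sketched in Java. This is only an illustration under stated assumptions: the class `DatasetSketch`, its fields, and the helper `countContaining` are hypothetical names, not part of Powerset Explorer. A set models a transaction because items are unordered, and a list models the database because transactions may repeat.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the definitions above; not the Powerset Explorer code.
final class DatasetSketch {
    // Alphabet: the items that may appear in a transaction.
    static final Set<String> ALPHABET = Set.of("Apples", "Bread", "Chips");

    // Transaction database: a list, because the same transaction may occur more than once.
    static final List<Set<String>> DATABASE = List.of(
        Set.of("Apples", "Chips"),   // Tx1
        Set.of("Bread"),             // Tx2
        Set.of("Apples", "Chips"));  // a repeat of Tx1, which the definition allows

    // Counts the transactions that contain every item of the given itemset.
    static long countContaining(Set<String> itemset) {
        return DATABASE.stream().filter(tx -> tx.containsAll(itemset)).count();
    }

    public static void main(String[] args) {
        System.out.println(countContaining(Set.of("Apples")));
    }
}
```

Counting how many transactions contain a given itemset is the kind of question such a database is mined for.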
Example Dataset
Student enrollment database
– Alphabet = courses
  { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }
– Transaction = courses a student is enrolled in
  #29389002 -> { CPSC124, PHIL120, ENGL112 }
– Transaction DB = list of student course schedules
Example Dataset (cont’d)
72423298 5 676 1701 3046 3900 1327
38578546 7 175 178 1182 1701 3038 680 391
27660625 5 326 676 1701 3038 3908
43359163 3 1177 1699 4317
26495781 6 676 1177 1701 3038 3900 4275
48536452 4 1699 2339 1327 2826
64251972 6 676 1177 1701 3038 3900 2549
23212318 5 676 1701 3040 3813 3900
19820119 5 104 676 1699 3038 3900
65954629 4 480 676 3040 3908
54392012 5 676 1701 3038 3813 3899
85833501 5 676 1699 3040 3813 3900
65136197 5 676 1699 3038 3900 2580
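The records above appear to share one layout: an identifier, then an item count, then that many item codes. That layout is inferred from the data, not stated on the slide. A minimal Java sketch of a parser under that assumption follows; `RecordParser` and `Record` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for lines of the assumed form "<id> <count> <item> <item> ...".
final class RecordParser {
    // One parsed line: an identifier and its item codes.
    static final class Record {
        final long id;
        final List<Integer> items;

        Record(long id, List<Integer> items) {
            this.id = id;
            this.items = items;
        }
    }

    static Record parse(String line) {
        String[] tok = line.trim().split("\\s+");
        if (tok.length < 2) {
            throw new IllegalArgumentException("expected an id and a count: " + line);
        }
        long id = Long.parseLong(tok[0]);
        int count = Integer.parseInt(tok[1]);
        // The declared count must match the number of item codes that follow it.
        if (tok.length != count + 2) {
            throw new IllegalArgumentException("count does not match items: " + line);
        }
        List<Integer> items = new ArrayList<>();
        for (int i = 2; i < tok.length; i++) {
            items.add(Integer.parseInt(tok[i]));
        }
        return new Record(id, items);
    }
}
```

Checking the declared count against the tokens catches lines that were fused or truncated when the file was copied.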
Why?
Why is this interesting?
– Consumer transaction logs -> trends in consumer buying
– Student enrollment database -> trends in enrollment
  What electives do most undergrad computer science students take?
  Departments can determine which joint majors would fit the student population.
Why? (cont’d)
Dataset sizes growing exponentially
– Human intuition has reached its limits
– Require computers and AI (expensive)
– Information visualization can scale the power of human intuition
TreeJuxtaposer
Powerset Explorer
Code base from TreeJuxtaposer (Munzner)
– AccordionDrawer package
Goals
– Focus + context exploration using grids
– Guaranteed visibility
– Marking of groups
Milestones Status Update
#1 Completion of the basic visualization of a randomized database of small set size (~10)
#2 Addition of a single level of “marking”
#3 Addition of multiple levels of “marking” (6)
#4 Addition of background marking to demarcate areas of sets containing different numbers of items
#5 Implement multiple constraints
#6 Increase maximum possible dataset size to at least 100
Difficulties
Multiple constraints difficult to implement on current server-side dataminer
Cannot enumerate a powerset of alphabet size greater than 14 elements (integer = 32 bits)
– Solution: use Java class BigInteger
High CPU and memory usage
– Solution: upgrade computer! hack
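A rough Java sketch of how `java.math.BigInteger` lifts the fixed-width limit when indexing a powerset. It assumes a bitmask encoding, in which bit i of the index is set exactly when item i is in the set. The slides do not say which enumeration order Powerset Explorer uses, so the encoding and the class name `PowersetIndex` are assumptions.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bitmask indexing of a powerset; the encoding is an assumption.
final class PowersetIndex {
    // Number of subsets of an n-item alphabet: 2^n, with no overflow for large n.
    static BigInteger size(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    // Index of a set: bit i is set exactly when item i is present.
    // Repeated items are harmless because setting a bit twice changes nothing.
    static BigInteger indexOf(int[] items) {
        BigInteger index = BigInteger.ZERO;
        for (int item : items) {
            index = index.setBit(item);
        }
        return index;
    }

    // Inverse of indexOf: the items of the set at the given index, in ascending order.
    static List<Integer> itemsOf(BigInteger index) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < index.bitLength(); i++) {
            if (index.testBit(i)) {
                items.add(i);
            }
        }
        return items;
    }
}
```

With this encoding an alphabet of 100 items, the target of milestone #6, has 2^100 possible sets, far more than a 32-bit or 64-bit integer can index.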
Current Status
Reduced database
8680433 3 0 7 5
2768129 2 6 4
6385608 5 1 9 10 9 11
147924 5 5 2 9 5 2
234140 3 11 4 8
4331093 4 4 6 0 0
3158394 5 12 1 12 5 4
5797538 6 11 4 3 13 12 4
6243191 1 5
5872060 4 3 8 9 6
Current Status
Property file
0 CPSC 325 75.0 3
1 PHIL 327 84.0 1
2 ANTH 329 45.0 2
3 MATH 327 0.0 3
4 PSYC 328 0.0 1
5 ENGL 329 0.0 2
6 APSC 540 0.0 1
7 MECH 541 0.0 1
8 STAT 543 0.0 1
9 SPAN 201 71.0 1
10 FREN 258 76.0 2
11 ECON 260 84.0 1
12 LING 295 42.0 1
13 EECE 302 73.0 1
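A small Java sketch of reading the property file into a lookup from item index to course name. It assumes each entry starts with the index, the department code, and the course number. The slide does not explain the last two columns, so the sketch ignores them. `PropertyFileSketch` and `courseNames` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for entries of the assumed form "<index> <dept> <number> ...".
final class PropertyFileSketch {
    // Maps each item index to a course name such as "CPSC 325".
    static Map<Integer, String> courseNames(String text) {
        Map<Integer, String> names = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String[] tok = line.trim().split("\\s+");
            if (tok.length < 3) {
                continue; // skip blank or incomplete lines
            }
            // Remaining columns are undocumented on the slide and are not interpreted here.
            names.put(Integer.parseInt(tok[0]), tok[1] + " " + tok[2]);
        }
        return names;
    }
}
```

Such a map would let the item indices 0 to 13 in the reduced database be shown as course names.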
Questions?