Plan B Defense
Context-Inclusive Approach to Speed-up Function Evaluation for Statistical Queries: An Extended Abstract

Vijay Gandhi, Masters Student
Advisor: Dr. Shashi Shekhar
Committee Members: Dr. Bradley Carlin, Dr. Jaideep Srivastava
Department of Computer Science, University of Minnesota, USA
Biography
- Bachelor of Computer Science & Engineering, Madras University
- Master of Science, Computer Science, University of Minnesota (current)
- 3 years of work experience in data warehousing and business intelligence at Oracle Corporation

Publications
- Context-Inclusive Approach to Speed-up Function Evaluation for Statistical Queries: An Extended Abstract. Vijay Gandhi, James Kang, Shashi Shekhar, Junchang Ju, Eric Kolaczyk, Sucharita Gopal. IEEE International Conference on Data Mining Workshop on Spatial and Spatio-Temporal Data Mining (SSTDM), 2006.
- Parallelizing Multi-scale and Multi-granular Spatial Data Mining Algorithm. Vijay Gandhi, Mete Celik, Shashi Shekhar. PGAS Programming Models Conference, 2006.
- Context-Inclusive Approach to Speed-up Function Evaluation for Statistical Queries. Vijay Gandhi, James Kang, Shashi Shekhar, Junchang Ju, Eric Kolaczyk, Sucharita Gopal. Submitted to the Journal on Knowledge and Information Systems, 2007.
Scope of the talk
- NSF project on land-use classification
- Joint collaboration with Boston University
- Main goal: reduce the execution time of the algorithm
- Contributions:

| Approach | Description | Results | Effort (hrs) |
| --- | --- | --- | --- |
| Black box tuning | Limiting factor was modified to control the precision | Reduced CPU time by 50% with 98% accuracy | 20 |
| Code conversion | Code was converted from MATLAB to C | Reduced CPU time by about 50% | 120 |
| Parallel implementation | Converted code from C to UPC on Cray X1 | Obtained near-linear scalability | 20 |
| Algorithm changes | Context-Inclusive approach | | 100 |
Overview
- Motivation
- Background
- Problem Statement
- Related Work
- Contribution
- Validation
- Conclusion & Future Work
Motivation: Land-cover Change
- Loss of land: 217 square miles of Louisiana's coastal lands were transformed to water after Hurricanes Katrina and Rita.
- Deforestation: Brazil lost 150,000 sq. km of forest between May 2000 and August 2006.
- Urban sprawl

[Figure: Mississippi River Delta, Louisiana. Red represents land loss between 2004 and 2005. Courtesy: USGS]
[Figure: Deforestation, Ariquemes, Brazil. Courtesy: Global Change Program, University of Michigan]
[Figure: Urban sprawl in Atlanta. Red indicates expansion between 1976 and 1992]
Multiscale Multigranular Image Classification (MSMG)
- Input: class hierarchy, likelihood of specific classes
  [Figure: land-use class hierarchy; likelihood images for the specific classes Conifer, Hardwood, Brush, Grass]
- Output: classified images at multiple scales
  [Figure: classified images at scales 1x1, 2x2, 4x4, ..., 64x64]

Source courtesy: Boston University
Background: Algorithm
1. Divide the input image into quad segments recursively
2. Calculate the log-likelihood for each class using Expectation Maximization
3. Decide the class for each segment

Profiling (input image: 128 x 128)

| Function | # Calls | % Time |
| --- | --- | --- |
| gen_loglike | 294,915 | 80 |
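The recursive quad division in the first step can be sketched as follows. This is a minimal Python illustration, not the project's MATLAB or C code; the names `quad_segments` and `segments_per_scale` and the `(row, col, size)` tuple layout are assumptions made for the example, and the image is assumed to be square with a power-of-two side.

```python
def quad_segments(row, col, size, min_size=1):
    """Yield (row, col, size) for a square region and, recursively, its four quadrants."""
    yield (row, col, size)
    if size > min_size:
        half = size // 2
        for d_row in (0, half):
            for d_col in (0, half):
                yield from quad_segments(row + d_row, col + d_col, half, min_size)


def segments_per_scale(image_size):
    """Count the quad segments at each scale of a square, power-of-two image."""
    counts = {}
    for _, _, size in quad_segments(0, 0, image_size):
        counts[size] = counts.get(size, 0) + 1
    return counts


# Hypothetical usage for the 128 x 128 profiling input.
counts = segments_per_scale(128)
print(counts[1], counts[2], counts[128])  # prints "16384 4096 1"
```

Each level up divides the segment count by four, so a 128 x 128 image has 21,845 segments in total across all scales. How these segments map to the profiled number of `gen_loglike` calls is not stated on the slide and is not modeled here.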
Background: gen_loglike
- gen_loglike calculates the log-likelihood of a non-specific class
- It uses Expectation Maximization
- Number of gen_loglike calls = f(number of general classes, image size, spatial scale)
- The number of iterations depends on the number of general classes, the image size, and the spatial scale
Problem Statement
Given:
- An algorithm for Multiscale Multigranular Image Classification
- An input image with the likelihood of specific classes, and a class hierarchy
Find: the classification at each quad segment
Objective: minimize computation time
Constraints:
- Use the Expectation Maximization (EM) algorithm to calculate the quality measure of each non-specific class
- High accuracy
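Classification Examples
The likelihoods of the classes are compared to find the best candidate. Class hierarchy: C is the parent of C1 and C2. Each 2x2 grid is written row by row, with rows separated by a semicolon.

| Likelihood of C1 | Likelihood of C2 | Total for C1 | Total for C2 | Best candidate |
| --- | --- | --- | --- | --- |
| [0.9 0.9; 0.9 0.9] | [0.1 0.1; 0.1 0.1] | 3.6 | 0.4 | C1 |
| [0.1 0.1; 0.1 0.1] | [0.9 0.9; 0.9 0.9] | 0.4 | 3.6 | C2 |
| [0.9 0.1; 0.9 0.1] | [0.1 0.9; 0.1 0.9] | 2.0 | 2.0 | C |
| [0.1 0.9; 0.9 0.1] | [0.9 0.1; 0.1 0.9] | 2.0 | 2.0 | C |
| [0.2 0.2; 0.9 0.9] | [0.8 0.8; 0.1 0.1] | 2.2 | 1.8 | C |
| [0.25 0.25; 0.9 0.9] | [0.75 0.75; 0.1 0.1] | 2.3 | 1.7 | C1 |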
Algorithm: Expectation Maximization
Given: class hierarchy, likelihood of specific classes
Find: best class and corresponding likelihood for a region (e.g. a 2x2 region)

Likelihood of a specific class = sum of the corresponding pixel likelihoods
- Likelihood of C1 = 2.2; likelihood of C2 = 1.8
- Log-likelihood of best specific class = -3.4296 (C1)

Likelihood of a non-specific class (EM):
1. Initialize the proportion of each corresponding specific class
2. Multiply each likelihood by the corresponding specific-class proportion
3. Add the weighted likelihoods at each pixel
4. Divide each value from Step 2 by the corresponding value from Step 3
5. Average the result over the pixels for each specific class to obtain its new proportion
6. Repeat Step 2 to Step 5 until the required accuracy is reached

Example: likelihood of classes C1 and C2 at a 2x2 region (class hierarchy: C is the parent of C1 and C2)
- Lij(C1) = [0.2 0.2; 0.9 0.9]
- Lij(C2) = [0.8 0.8; 0.1 0.1]
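The EM update for the mixing proportions can be sketched in a few lines of Python. This is an illustrative reconstruction, not the project's `gen_loglike`: the function names, the flattened row-by-row pixel lists, and the stopping rule (largest change in any proportion below `tol`) are assumptions. The example assigns the grid that sums to 2.2 to C1, as stated on the slide.

```python
import math


def em_step(likelihoods, proportions):
    """One EM update: multiply, add, divide, then average.

    likelihoods[c][i] is the likelihood of specific class c at pixel i.
    """
    n_classes = len(likelihoods)
    n_pixels = len(likelihoods[0])
    # Multiply each class likelihood by its current proportion.
    weighted = [[proportions[c] * likelihoods[c][i] for i in range(n_pixels)]
                for c in range(n_classes)]
    # Add the weighted likelihoods at each pixel.
    totals = [sum(weighted[c][i] for c in range(n_classes)) for i in range(n_pixels)]
    # Divide by the per-pixel total, then average over the pixels per class.
    return [sum(weighted[c][i] / totals[i] for i in range(n_pixels)) / n_pixels
            for c in range(n_classes)]


def mixture_loglike(likelihoods, proportions):
    """Log-likelihood of the non-specific class under the given proportions."""
    n_pixels = len(likelihoods[0])
    return sum(
        math.log(sum(p * lik[i] for p, lik in zip(proportions, likelihoods)))
        for i in range(n_pixels)
    )


def em_fit(likelihoods, tol=1e-5, max_iter=1000):
    """Iterate em_step from equal proportions until the change is below tol (assumed rule)."""
    proportions = [1.0 / len(likelihoods)] * len(likelihoods)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        updated = em_step(likelihoods, proportions)
        change = max(abs(u - p) for u, p in zip(updated, proportions))
        proportions = updated
        if change < tol:
            break
    return proportions, mixture_loglike(likelihoods, proportions), iterations


# 2x2 example from the slide, flattened row by row.
L_C1 = [0.2, 0.2, 0.9, 0.9]
L_C2 = [0.8, 0.8, 0.1, 0.1]

first = em_step([L_C1, L_C2], [0.5, 0.5])
proportions, loglike, iterations = em_fit([L_C1, L_C2])
print(first, proportions, round(loglike, 4), iterations)
```

Starting from (0.5, 0.5), the first update gives (0.55, 0.45), and the proportions settle near (0.6042, 0.3958) with a log-likelihood near -2.7314, in line with the values reported in the execution trace. The iteration count depends on the stopping rule, which is assumed here, so it need not equal the count in the trace.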
Execution Trace: Expectation Maximization
Given: class hierarchy, likelihood of specific classes
Find: best class for the 2x2 region
Class hierarchy: C is the parent of C1 and C2
Likelihood of classes C1 and C2 at a 2x2 region: Lij(C1) = [0.2 0.2; 0.9 0.9], Lij(C2) = [0.8 0.8; 0.1 0.1]

Likelihood of C:
1. Iteration 1: EM(p1^n, p2^n) = EM(0.5, 0.5)
2. Multiply: L1ij(C1) = Lij(C1) * p1^n = [0.1 0.1; 0.45 0.45]; L2ij(C2) = Lij(C2) * p2^n = [0.4 0.4; 0.05 0.05]
3. Add: Lij = L1ij(C1) + L2ij(C2) = [0.5 0.5; 0.5 0.5]
4. Divide: L1ij(C1) = L1ij(C1) ./ Lij = [0.2 0.2; 0.9 0.9]; L2ij(C2) = L2ij(C2) ./ Lij = [0.8 0.8; 0.1 0.1]
5. Average: p1^(n+1) = Avg(L1ij(C1)) = 0.55; p2^(n+1) = Avg(L2ij(C2)) = 0.45
Execution Trace: Expectation Maximization (continued)
Likelihood of classes C1 and C2 at a 2x2 region: Lij(C1) = [0.2 0.2; 0.9 0.9], Lij(C2) = [0.8 0.8; 0.1 0.1]
After 17 iterations: mixing proportions = (0.6042, 0.3958)

Likelihood of C:
- Weighted likelihoods: 0.6042 * Lij(C1) = [0.12084 0.12084; 0.54378 0.54378]; 0.3958 * Lij(C2) = [0.31664 0.31664; 0.03958 0.03958]
- Log-likelihood of C = -2.7314

Log-likelihood of best specific class = -3.4296 (C1)
Log-likelihood considering penalties:
- C (penalty: 4.3922) = -7.1235
- C1 (penalty: 4.0456) = -7.4752

Best class: the class with the maximum log-likelihood, here C (-7.1235 > -7.4752)
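The figures on this slide can be rechecked with a short Python calculation. It is only a sketch: the pixel values assign the grid summing to 2.2 to C1, and treating the penalty as a value subtracted from the log-likelihood is an inference from the slide's numbers rather than a stated formula.

```python
import math

# 2x2 example, flattened row by row.
L_C1 = [0.2, 0.2, 0.9, 0.9]
L_C2 = [0.8, 0.8, 0.1, 0.1]
p1, p2 = 0.6042, 0.3958                   # mixing proportions reported on the slide
penalty = {"C": 4.3922, "C1": 4.0456}     # penalties as given on the slide

loglike = {
    # Non-specific class C: mixture of C1 and C2 at each pixel.
    "C": sum(math.log(p1 * a + p2 * b) for a, b in zip(L_C1, L_C2)),
    # Specific class C1: no EM needed, just the pixel log-likelihoods.
    "C1": sum(math.log(a) for a in L_C1),
}

# Assumption: penalized log-likelihood = log-likelihood minus penalty.
penalized = {name: loglike[name] - penalty[name] for name in loglike}
best = max(penalized, key=penalized.get)

for name in penalized:
    print(name, round(loglike[name], 4), round(penalized[name], 4))
print("best:", best)
```

The computed values agree with the slide's log-likelihoods to about four decimal places, which supports reading the penalty as a simple subtraction.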
Related Work
EM performance improvement:
- Single candidate: Aitken's Method; Triple Jump (Huang et al.)
- Multiple candidate: Context-Inclusive
Context-Exclusive Approach
- Each candidate model is analyzed independently until convergence
- The candidate model with the maximum likelihood is selected

[Figure: land-use class hierarchy and instance tree, with the evaluation order marked 1 to 4]

Context-Exclusive Approach:
1. Select the best specific class, Brush (specific classes do not require EM)
2. Vegetation is evaluated until convergence (46 iterations)
3. Forest is evaluated until convergence (34 iterations)
4. Non-Forest is evaluated until convergence (3 iterations)
5. Select the best class (Non-Forest)

Total iterations: 46 + 34 + 3 = 83
Limitations of Context-Exclusive Approach
Computational scalability:
- For 512 x 512 pixels: 7 hours of CPU time

Where is the computational bottleneck?
- 80% of the total execution time is spent in computing the maximum likelihood
- The number of function calls depends on the number of pixels and the spatial scale
- As the spatial scale increases, the computation time increases exponentially

[Figure: CPU time for example datasets]
Contributions: Context-Inclusive Approaches
Two variants: Ideal and Heuristic

Context-Inclusive Approach: Ideal
- If a theoretical upper bound on the likelihood can be calculated for each class, it can be used to filter candidates
- If the upper bound is cheap to calculate and filters well, the Context-Inclusive approach may perform better

[Figure: land-use class hierarchy]

Context-Inclusive Approach, Ideal:
1. Select the best specific class, Brush (specific classes do not require EM)
2. Calculate the upper bound for each non-specific class (Non-Forest, Forest, Vegetation)
3. Assume Non-Forest is evaluated first (3 iterations)
4. Forest can be pruned because its upper bound is less than the likelihood of Non-Forest (0 iterations)
5. Vegetation can be pruned because its upper bound is less than the likelihood of Non-Forest (0 iterations)
6. Non-Forest is selected

Total EM iterations: 3 + 0 + 0 = 3
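The pruning idea can be sketched in Python as below. Everything here is hypothetical scaffolding: the function name, the `run_em` callback, and in particular the quality and upper-bound numbers are made up for illustration. Only the class names and the iteration counts (3, 34, 46) come from the slides.

```python
def select_with_upper_bounds(best_specific, candidates, upper_bound, run_em):
    """Pick the best class, skipping EM for candidates whose bound cannot win.

    best_specific: (name, quality) of the best specific class.
    candidates:    non-specific class names, in evaluation order.
    upper_bound:   dict name -> upper bound on that class's quality measure.
    run_em:        callable name -> (converged quality, EM iterations used).
    """
    best_name, best_quality = best_specific
    total_iterations = 0
    for name in candidates:
        if upper_bound[name] < best_quality:
            continue  # pruned: 0 EM iterations spent on this candidate
        quality, iterations = run_em(name)
        total_iterations += iterations
        if quality > best_quality:
            best_name, best_quality = name, quality
    return best_name, total_iterations


# Illustrative values only: qualities and bounds are invented, iteration counts are from the slides.
em_results = {"Non-Forest": (-7.0, 3), "Forest": (-8.0, 34), "Vegetation": (-8.5, 46)}
bounds = {"Non-Forest": -6.5, "Forest": -7.5, "Vegetation": -7.2}

result = select_with_upper_bounds(
    ("Brush", -9.0),
    ["Non-Forest", "Forest", "Vegetation"],
    bounds,
    lambda name: em_results[name],
)
print(result)  # prints "('Non-Forest', 3)"
```

The saving depends on evaluation order and on how tight the bounds are: with these made-up numbers, evaluating Non-Forest first lets both remaining candidates be skipped.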
Context-Inclusive Approach: Ideal
Lemma: Context-Inclusive is correct, in that each region is classified with the best candidate class from the user-defined concept hierarchy.
- The likelihood value of a non-specific class can never exceed its upper bound
- The upper bound can therefore be used to compare the likelihoods of non-specific classes

Discussion: is there a method to calculate the upper bound?
- C. Biernacki. An Asymptotic Upper Bound of the Likelihood to Prevent Gaussian Mixtures from Degenerating. Preprint, 2005.
- Expensive to compute
Context-Inclusive Approach: Heuristic
The instance tree is evaluated with context:
- Each candidate model is analyzed until it is better than the current best
- Uses an instance-level syntax tree

[Figure: land-use class hierarchy and instance tree, with the evaluation order marked 1 to 4]

Context-Inclusive Approach, Heuristic:
1. Select the best specific class, Brush (specific classes do not require EM)
2. Vegetation is evaluated until convergence (46 iterations)
3. Forest is evaluated (4 iterations)
4. Non-Forest is evaluated (1 iteration)
5. Non-Forest is the best so far

Total iterations: 46 + 4 + 1 = 51
Context-Exclusive vs. Context-Inclusive Heuristic

Algorithm 1: Context-Exclusive Approach
1: Function ContextExclusive(set Cand)
2:   Select the best specific class
3:   for each candidate model c ∈ Cand do
4:     repeat
5:       Refine the quality measure of candidate model c
6:     until EM converges
7:   end for
8:   Select the candidate model c with the maximum quality measure
9:   return c

Algorithm 2: Context-Inclusive Approach, Heuristic
1: Function ContextInclusive(set Cand)
2:   Select the best specific class
3:   for each remaining candidate model c ∈ Cand do
4:     repeat
5:       Refine the quality measure of candidate model c
6:     until EM converges OR the quality measure exceeds the best so far
7:   end for
8:   Select the candidate model c that is the best so far
9:   return c
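The difference between the two algorithms can be made concrete with a small Python sketch. To keep it self-contained, each candidate is represented by a precomputed list of quality measures, one per EM iteration, with the last entry standing for the converged value; a real implementation would run EM lazily instead. The function names and all numbers in `TRACES` are hypothetical.

```python
def context_exclusive(best_specific, traces):
    """Run every candidate to convergence, then keep the maximum quality measure."""
    best_name, best_quality = best_specific
    iterations = 0
    for name, trace in traces.items():
        iterations += len(trace)          # every EM iteration is paid for
        if trace[-1] > best_quality:      # compare converged values only
            best_name, best_quality = name, trace[-1]
    return best_name, iterations


def context_inclusive_heuristic(best_specific, traces):
    """Stop refining a candidate as soon as it exceeds the best so far."""
    best_name, best_quality = best_specific
    iterations = 0
    for name, trace in traces.items():
        for quality in trace:
            iterations += 1
            if quality > best_quality:
                # Early exit: the candidate becomes the new best so far.
                best_name, best_quality = name, quality
                break
    return best_name, iterations


# Made-up quality measures per EM iteration (higher is better), in evaluation order.
TRACES = {
    "Vegetation": [-9.0, -8.0, -7.0],
    "Forest": [-6.0, -4.5, -4.0, -3.9],
    "Non-Forest": [-4.2, -3.5, -3.0],
}
BEST_SPECIFIC = ("Brush", -5.0)

print(context_exclusive(BEST_SPECIFIC, TRACES))            # prints "('Non-Forest', 10)"
print(context_inclusive_heuristic(BEST_SPECIFIC, TRACES))  # prints "('Non-Forest', 6)"
```

Both variants pick the same class on this toy input, but the heuristic spends fewer iterations. Because the heuristic compares against a best-so-far value that may not have converged, it can in principle disagree with the exclusive result, which is consistent with the accuracy figures being slightly below 100%.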
Experimental Design
- Input: synthetic dataset and real dataset
- Language: MATLAB
- Platform: UltraSparc III 1.1 GHz, 1 GB RAM
- Measurements: number of iterations, CPU time, accuracy

Experimental questions: how does Context-Exclusive compare to the Context-Inclusive Heuristic approach in
- Accuracy
- Computational efficiency

[Figure: experimental design diagram with the boxes Benchmark Datasets (Synthetic, Real), Limiting Factor, Candidates (Context-Exclusive, Context-Inclusive Heuristic), Image Classification, Compare Classifications, Classification Accuracy, Measurements]
Experiments: Dataset S
- Synthetic dataset
- 128 x 128 pixels, 7 classes
- Input: class hierarchy, likelihood of specific classes
  [Figure: land-use class hierarchy; likelihood images for the specific classes Conifer, Hardwood, Brush, Grass]
- Output: classified images at multiple scales
  [Figure: classified images at scales 1x1, 2x2, 4x4, ..., 64x64]
Experiments: Dataset R
- Real dataset: Plymouth County, Massachusetts
- 128 x 128 pixels, 12 classes
- Input: class hierarchy, likelihood of specific classes
  [Figure: land-use class hierarchy; likelihood images for the specific classes Barren, Brush, Pitch Pine, Bogs, …]
- Output: classified images at multiple scales
  [Figure: classified images at scales 1x1, 2x2, 4x4, ..., 64x64]
How accurate is Context-Inclusive compared to Context-Exclusive?
[Figure: accuracy (limiting factor = 0.00001)]
Accuracy:
- Above 99% for the synthetic dataset
- About 98% for the real dataset
How does the computation of Context-Exclusive compare to that of Context-Inclusive?
[Figure: number of iterations for Dataset S and Dataset R (limiting factor = 0.00001)]
Iterations reduced by:
- 67% for the synthetic dataset
- 61% for the real dataset
How does the computation of Context-Exclusive compare to that of Context-Inclusive?
[Figure: CPU time for Dataset S and Dataset R (limiting factor = 0.00001)]
CPU time reduced by:
- 53% for the synthetic dataset
- 47% for the real dataset
Conclusion
- Context-Inclusive approach for function evaluation
- Experimental results supporting the contributions
- Reduced the CPU time by about 50% (53% on the synthetic dataset, 47% on the real dataset) while keeping accuracy at about 98% or higher
- Future work: Context-Inclusive with upper bound
Summary

| Approach | Description | Results | Effort (hrs) |
| --- | --- | --- | --- |
| Black box tuning | Limiting factor was modified to control the precision | Reduced CPU time by 50% with 98% accuracy | 20 |
| Code conversion | Code was converted from MATLAB to C | Reduced CPU time by about 50% | 120 |
| Parallel implementation | Converted code from C to UPC on Cray X1 | Obtained near-linear scalability | 20 |
| Algorithm changes | Context-Inclusive approach | Reduced CPU time without sacrificing accuracy | 100 |
Acknowledgements
- James Kang
- Dr. Junchang Ju
- Dr. Eric Kolaczyk
- Dr. Sucharita Gopal
Context-Exclusive Approach
- Each candidate model is analyzed independently until convergence
- The candidate model with the maximum likelihood is selected

[Figure: instance tree, and a plot of quality measure against iterations for Forest, Non-Forest, and Vegetation, with levels L1, L2, L3 marked]

Context-Exclusive Approach:
1. Vegetation is evaluated until convergence, L1
2. Forest is evaluated until convergence, L2
3. Non-Forest is evaluated until convergence, L3
Contributions: Context-Inclusive Approach
The instance tree is evaluated with context:
- Each candidate model is analyzed until it is better than the current best
- Uses an instance-level syntax tree

[Figure: plot of quality measure against iterations for Forest, Non-Forest, and Vegetation, with levels L1, L2, L3 marked]

Context-Inclusive Approach:
1. Vegetation is evaluated until convergence, L1
2. Forest is evaluated until L2
3. Non-Forest is evaluated until L3