Inference by randomly perturbing max-solvers
Tamir Hazan, University of Haifa
Inference in machine learning
• 20 years ago: does an image contain a person?
• 10 years ago: which object is in the image?
• Today’s challenge: exponentially many options
• For each pixel: decide if it is foreground or background.
• The space of possible structures is exponential in the number of pixels.
• Interactive annotation.
Outline
• Random perturbation - why and how?
  - Sampling likely structures as fast as finding the most likely one.
• Connections and alternatives to the Gibbs distribution:
  - the marginal polytope
  - non-MCMC sampling for Gibbs with perturb-max
• Application: interactive annotation.
  - New entropy bounds for perturb-max models.
• Machine learning applications are characterized by:
  - complex structures y = (y_1, \ldots, y_n), e.g. y \in \{0,1\}^n
  - a potential function that scores these structures (high score for likely structures, low score for unlikely ones):
    \theta(y_1, \ldots, y_n) = \sum_{i \in V} \theta_i(y_i) + \sum_{(i,j) \in E} \theta_{i,j}(y_i, y_j)
• For machine learning we need to efficiently infer from distributions over complex structures.
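A minimal Python sketch of such a pairwise model; the chain graph, the unary/pairwise numbers, and the helper names are illustrative assumptions, not values from the talk:

```python
import itertools

# Toy chain of 3 binary variables:
# theta(y) = sum_i theta_i(y_i) + sum_{(i,j) in E} theta_ij(y_i, y_j)
unary = [            # theta_i(y_i) for i = 0, 1, 2 (illustrative values)
    {0: 0.5, 1: -0.5},
    {0: 0.0, 1: 0.2},
    {0: -0.3, 1: 0.3},
]
edges = [(0, 1), (1, 2)]

def theta_pair(yi, yj):
    # smoothness term: reward agreement, penalize disagreement
    return 1.0 if yi == yj else -1.0

def score(y):
    s = sum(unary[i][y[i]] for i in range(len(y)))
    s += sum(theta_pair(y[i], y[j]) for i, j in edges)
    return s

# the space of structures is exponential (2^n); enumerating it is
# only feasible here because n = 3
best = max(itertools.product([0, 1], repeat=3), key=score)
print(best, score(best))
```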
Gibbs distribution

p(y_1, \ldots, y_n) = \frac{1}{Z} \exp\Big( \sum_i \theta_i(y_i) + \sum_{i,j} \theta_{i,j}(y_i, y_j) \Big)

• MCMC samplers: Gibbs sampling, Metropolis-Hastings, Swendsen-Wang.
• Many efficient sampling algorithms for special cases:
  - Ising models (Jerrum 93)
  - Counting bipartite matchings in planar graphs (Kasteleyn 61)
  - Approximating the permanent (Jerrum 04)
  - Many others…
Sampling likely structures

p(y) \propto \exp\Big( \sum_i \theta_i(y_i) + \sum_{i,j} \theta_{i,j}(y_i, y_j) \Big)

• Sampling from the Gibbs distribution is provably hard in AI applications (Goldberg 05, Jerrum 93). For image segmentation:
  - data term: \theta_i(y_i) = \log p(y_i | x_i), where x_i is the RGB color of pixel i
  - smoothness term: \theta_{i,j}(y_i, y_j) = 1 if y_i = y_j, and -1 otherwise
• Recall: sampling from the Gibbs distribution is easy in Ising models (Jerrum 93), i.e., when \theta_i(y_i) = 0.
• Data terms (signals) that are important in AI applications significantly change the complexity of sampling.
Most likely structure
• Instead of sampling, it may be significantly faster to find the most likely structure:

  y^* = \arg\max_{y_1, \ldots, y_n} \sum_i \theta_i(y_i) + \sum_{i,j} \theta_{i,j}(y_i, y_j)

• Maximum a-posteriori (MAP) inference.
• Many efficient optimization algorithms for special cases:
  - Belief propagation: trees (Pearl 88), perfect graphs (Jebara 10)
  - Graph-cuts for image segmentation
  - Branch and bound (Rother 09), branch and cut (Gurobi)
  - Linear programming relaxations (Schlesinger 76, Wainwright 05, Kolmogorov 06, Werner 07, Sontag 08, Hazan 10, Batra 10, Nowozin 10, Pletscher 12, Kappes 13, Savchynskyy 13, Tarlow 13, Kohli 13, Jancsary 13, Schwing 13)
  - CKY for parsing
  - Many others…
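For a chain (a special case of a tree), exact MAP inference is a simple max-product dynamic program. A minimal sketch, with illustrative potentials that are assumptions rather than numbers from the talk:

```python
# Max-product (Viterbi-style) MAP inference on a chain of binary variables.
unary = [[0.5, -0.5], [0.0, 0.2], [-0.3, 0.3]]    # theta_i(y_i), illustrative
def pair(a, b):                                    # theta_{i,i+1}(y_i, y_{i+1})
    return 1.0 if a == b else -1.0

def map_chain(unary):
    n, k = len(unary), len(unary[0])
    m = list(unary[0])           # m[b]: best score of a prefix ending in state b
    back = []                    # backpointers for decoding
    for i in range(1, n):
        prev, m, ptr = m, [], []
        for b in range(k):
            best_a = max(range(k), key=lambda a: prev[a] + pair(a, b))
            m.append(prev[best_a] + pair(best_a, b) + unary[i][b])
            ptr.append(best_a)
        back.append(ptr)
    y = [max(range(k), key=lambda b: m[b])]
    for ptr in reversed(back):   # follow backpointers from the last variable
        y.append(ptr[y[-1]])
    return y[::-1], max(m)

y_star, val = map_chain(unary)
print(y_star, val)
```

The cost is O(n k^2) rather than O(k^n), which is what makes maximization so much cheaper than naive enumeration.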
The challenge
Sample likely high-dimensional structures (with millions of variables, e.g., image segmentation with 12 million pixels) as efficiently as finding the most likely one.
Most likely structure
• Selecting the maximizing structure is appropriate when one structure (e.g., a segmentation or a parse) dominates the others. (Figure: scores \theta(y) over structures, sharply peaked at y^*.)
• The maximizing structure is not robust when there are multiple high-scoring alternatives. (Figure: scores over structures with several comparable peaks.)
• Example - probabilistic vs. rule-based parsing: rule-based grammars do not generalize well across domains and languages.
Random perturbations
• Randomly perturbing the system reveals its complexity:
  - little effect when the maximizing structure is "evident"
  - substantial effect when there are alternative high-scoring structures
• Related work:
  - McFadden 74 (discrete choice theory)
  - Talagrand 94, Barvinok 07 (canonical processes)
• Notation:
  - scores (potential): \theta(y)
  - perturbations: \gamma(y)
  - perturbed score: \theta(y) + \gamma(y)
  (Figure: scores \theta(y) over structures y, with the maximum \theta(y^*) at y^*.)
• For every structure y, the perturbation value \gamma(y) is a random variable (here y is an index; the traditional notation is \gamma_y).
• Perturb-max models: how stable is the maximal structure to random changes in the potential function?
Perturb-max models
• Theorem: Let \gamma(y) be i.i.d. Gumbel random variables with zero mean,

  F(t) \stackrel{\text{def}}{=} P[\gamma(y) \le t] = \exp(-\exp(-t)), \qquad f(t) = F'(t) = \exp(-t) F(t).

  (Strictly, this F is the zero-mode Gumbel CDF; shifting by the Euler constant makes the mean zero, and the argmax below is unaffected by a constant shift.)
  Then the perturb-max model is the Gibbs distribution:

  \frac{1}{Z} \exp(\theta(y)) = P_{\gamma \sim \text{Gumbel}}\big[ y = \arg\max_{\hat{y}} \{ \theta(\hat{y}) + \gamma(\hat{y}) \} \big]
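The theorem yields a non-MCMC sampler: add an independent Gumbel variable to every score and return the argmax. A sketch over three toy structures (the structures and their scores are assumptions); since the argmax is invariant to shifting all perturbations by a constant, the zero-mode CDF F(t) = exp(-exp(-t)) can be inverted directly:

```python
import math
import random

random.seed(0)
theta = {"a": 1.0, "b": 0.0, "c": -1.0}          # illustrative scores theta(y)
Z = sum(math.exp(v) for v in theta.values())
gibbs = {y: math.exp(v) / Z for y, v in theta.items()}

def perturb_max_sample():
    # F(t) = exp(-exp(-t)) inverts to t = -log(-log(u)), u ~ Uniform(0, 1)
    return max(theta, key=lambda y: theta[y] - math.log(-math.log(random.random())))

N = 200_000
counts = {y: 0 for y in theta}
for _ in range(N):
    counts[perturb_max_sample()] += 1

for y in theta:
    print(y, counts[y] / N, round(gibbs[y], 4))  # empirical vs. Gibbs probability
```

Each sample costs one max-solver call, which is the point: sampling becomes as fast as maximizing.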
• Why the Gumbel distribution? Because the maximum of Gumbel variables is itself a Gumbel variable: F(t) = \exp(-\exp(-t)).
• Claim: Let \gamma(y) be i.i.d. Gumbel, P[\gamma(y) \le t] = F(t). Then \max_y \{\theta(y) + \gamma(y)\} has a Gumbel distribution whose mean is \log Z, where Z = \sum_y \exp(\theta(y)).
• Proof:

  P_\gamma\big[ \max_y \{\theta(y) + \gamma(y)\} \le t \big] = \prod_y F(t - \theta(y)) = \exp\Big( -\sum_y \exp\big( -(t - \theta(y)) \big) \Big) = F(t - \log Z)
• Max stability:

  \frac{1}{Z} \exp(\theta(y)) = P_{\gamma \sim \text{Gumbel}}\big[ y = \arg\max_{\hat{y}} \{ \theta(\hat{y}) + \gamma(\hat{y}) \} \big]

• Implications (taking gradients):

  \log\Big( \sum_y \exp(\theta(y)) \Big) = E_{\gamma \sim \text{Gumbel}}\Big[ \max_y \{\theta(y) + \gamma(y)\} \Big]
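The expectation identity suggests a Monte Carlo estimator of the log-partition function: average the perturbed maxima over repeated Gumbel draws. A sketch with assumed toy scores; the draws are shifted by the Euler-Mascheroni constant so they have mean zero, making the expectation exactly log Z:

```python
import math
import random

random.seed(1)
EULER = 0.5772156649015329                 # Euler-Mascheroni constant
theta = [2.0, 1.0, 0.0, -1.0]              # illustrative scores theta(y)
log_Z = math.log(sum(math.exp(t) for t in theta))

def gumbel():
    # zero-mean Gumbel: invert F(t) = exp(-exp(-t)), then subtract the Euler constant
    return -math.log(-math.log(random.random())) - EULER

N = 100_000
est = sum(max(t + gumbel() for t in theta) for _ in range(N)) / N
print(round(est, 3), round(log_Z, 3))      # estimate vs. exact log Z
```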
• Representing the Gibbs distribution with perturb-max models may require an exponential number of perturbations: one independent \gamma(y) per structure y = (y_1, \ldots, y_n) in

  P_\gamma\big[ y = \arg\max_{\hat{y}} \{ \theta(\hat{y}) + \gamma(\hat{y}) \} \big]

• Instead, use low-dimensional perturbations:

  P_\gamma\big[ y = \arg\max_{\hat{y}} \{ \theta(\hat{y}) + \sum_{i=1}^n \gamma_i(\hat{y}_i) \} \big]
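A sketch of sampling with low-dimensional perturbations: draw one Gumbel value per (variable, state) pair, i.e. 2n values instead of 2^n, and maximize the perturbed score. The chain model and its numbers are illustrative assumptions; the induced distribution concentrates on high-scoring structures but is in general only an approximation of the Gibbs distribution:

```python
import itertools
import math
import random

random.seed(2)
n = 3
unary = [[0.6, -0.6], [0.2, -0.2], [0.4, -0.4]]   # illustrative theta_i(y_i)
def pair(a, b):                                    # illustrative theta_{i,i+1}
    return 0.8 if a == b else -0.8

def gumbel():
    return -math.log(-math.log(random.random()))

def sample_low_dim():
    g = [[gumbel(), gumbel()] for _ in range(n)]   # gamma_i(y_i): 2n draws only
    def perturbed(y):
        s = sum(unary[i][y[i]] + g[i][y[i]] for i in range(n))
        s += sum(pair(y[i], y[i + 1]) for i in range(n - 1))
        return s
    return max(itertools.product([0, 1], repeat=n), key=perturbed)

counts = {}
for _ in range(20_000):
    y = sample_low_dim()
    counts[y] = counts.get(y, 0) + 1
mode = max(counts, key=counts.get)
print(mode)                                        # most frequently sampled structure
```

Enumeration is used here only because n is tiny; in practice the inner maximization is done by an efficient max-solver (graph-cuts, LP relaxations, etc.).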
The marginal polytope

\theta(y_1, \ldots, y_n) = \sum_{i \in V} \theta_i(y_i) + \sum_{(i,j) \in E} \theta_{i,j}(y_i, y_j)

(Figure: a chain y_1 - y_2 - y_3 with unary potentials \theta_i(0), \theta_i(1) at each node and pairwise potentials \theta_{i,j}(y_i, y_j), (y_i, y_j) \in \{0,1\}^2, on each edge.)
The marginal polytope
y1 y2 y3
✓(y1, ..., yn) =X
i2V
✓i(yi) +X
i,j2E
✓i,j(yi, yj)
M
µ
![Page 66: Inference by randomly perturbing max-solvers · Outline •Random perturbation - why and how? !-Sampling likely structures as fast as finding the most likely one.!•Connections](https://reader035.vdocuments.us/reader035/viewer/2022071218/604f1d4b8c0a52062474845a/html5/thumbnails/66.jpg)
The marginal polytope
y1 y2 y3
✓(y1, ..., yn) =X
i2V
✓i(yi) +X
i,j2E
✓i,j(yi, yj)
µ =
0
@µ1(0), µ1(1), µ2(0), µ2(1), µ3(0), µ3(1),µ1,2(0, 0), µ1,2(0, 1), µ1,2(1, 0), µ1,2(1, 1),µ2,3(0, 0), µ2,3(0, 1), µ2,3(1, 0), µ2,3(1, 1))
1
AM
µ
![Page 67: Inference by randomly perturbing max-solvers · Outline •Random perturbation - why and how? !-Sampling likely structures as fast as finding the most likely one.!•Connections](https://reader035.vdocuments.us/reader035/viewer/2022071218/604f1d4b8c0a52062474845a/html5/thumbnails/67.jpg)
The marginal polytope
y1 y2 y3
✓(y1, ..., yn) =X
i2V
✓i(yi) +X
i,j2E
✓i,j(yi, yj)
µ =
0
@µ1(0), µ1(1), µ2(0), µ2(1), µ3(0), µ3(1),µ1,2(0, 0), µ1,2(0, 1), µ1,2(1, 0), µ1,2(1, 1),µ2,3(0, 0), µ2,3(0, 1), µ2,3(1, 0), µ2,3(1, 1))
1
AM
µ9p(y1, y2, y3) s.t. µ1(y1) =
X
y2,y3
p(y1, y2, y3), ...
µ1,2(y1, y2) =X
y3
p(y1, y2, y3), ...
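The membership condition above can be verified numerically on a tiny chain. This is a minimal sketch with made-up potentials (all names and values are illustrative): it builds a joint distribution p(y1, y2, y3), reads off entries of the marginal vector μ, and checks the consistency constraints that define M.

```python
import itertools
import numpy as np

# Hypothetical potentials for a 3-variable binary chain y1 - y2 - y3.
theta_i = np.array([[0.5, -0.5], [1.0, 0.0], [-0.2, 0.2]])   # theta_i(y_i)
theta_ij = {(0, 1): np.array([[1.0, -1.0], [-1.0, 1.0]]),    # theta_{i,j}(y_i, y_j)
            (1, 2): np.array([[0.5, 0.0], [0.0, 0.5]])}

def score(y):
    s = sum(theta_i[i, y[i]] for i in range(3))
    s += sum(th[y[i], y[j]] for (i, j), th in theta_ij.items())
    return s

# A joint distribution p(y) ∝ exp(theta(y)) over all 8 configurations.
ys = list(itertools.product([0, 1], repeat=3))
w = np.array([np.exp(score(y)) for y in ys])
p = w / w.sum()

# Entries of the marginal vector mu: unary and pairwise marginals of p.
mu_1 = [sum(p[k] for k, y in enumerate(ys) if y[0] == v) for v in (0, 1)]
mu_12 = [[sum(p[k] for k, y in enumerate(ys) if (y[0], y[1]) == (a, b))
          for b in (0, 1)] for a in (0, 1)]

# Consistency conditions of the marginal polytope M: marginals normalize
# and pairwise marginals sum back to the unary ones.
assert abs(sum(mu_1) - 1.0) < 1e-12
assert all(abs(sum(mu_12[a]) - mu_1[a]) < 1e-12 for a in (0, 1))
```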
The marginal polytope

Marginals in $M$ arise from a minimal Gibbs distribution

$$p(y) \propto \exp\Big(\sum_i \theta_i(y_i) + \sum_{i,j} \theta_{i,j}(y_i, y_j)\Big)$$

and, alternatively, from a minimal perturb-max model:

$$p(y) = P_\gamma\Big[y = \arg\max_{\hat y}\Big\{\sum_i \theta_i(\hat y_i) + \sum_{i,j} \theta_{i,j}(\hat y_i, \hat y_j) + \sum_i \gamma_i(\hat y_i)\Big\}\Big]$$

• Proof idea: the marginals are the derivatives of the expected max-value with respect to the corresponding potentials:

$$\mu_i(y_i) = \frac{\partial\, E_\gamma\big[\max_y \big\{\sum_i \theta_i(y_i) + \sum_{i,j} \theta_{i,j}(y_i, y_j) + \sum_i \gamma_i(y_i)\big\}\big]}{\partial\, \theta_i(y_i)}$$

$$\mu_{i,j}(y_i, y_j) = \frac{\partial\, E_\gamma\big[\max_y \big\{\sum_i \theta_i(y_i) + \sum_{i,j} \theta_{i,j}(y_i, y_j) + \sum_i \gamma_i(y_i)\big\}\big]}{\partial\, \theta_{i,j}(y_i, y_j)}$$
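The perturb-max model above can be simulated directly whenever the structures can be enumerated. This is a minimal sketch with hypothetical potentials: each unary potential is perturbed by an i.i.d. Gumbel variable γ_i(y_i), the argmax structure is recorded, and the empirical frequencies approximate the perturb-max distribution.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical potentials for a 3-variable binary chain (same form as above).
theta_i = np.array([[0.5, -0.5], [1.0, 0.0], [-0.2, 0.2]])
theta_ij = {(0, 1): np.array([[1.0, -1.0], [-1.0, 1.0]]),
            (1, 2): np.array([[0.5, 0.0], [0.0, 0.5]])}
ys = list(itertools.product([0, 1], repeat=3))

def score(y, gamma):
    # Potential plus per-variable perturbation gamma_i(y_i).
    s = sum(theta_i[i, y[i]] + gamma[i, y[i]] for i in range(3))
    s += sum(th[y[i], y[j]] for (i, j), th in theta_ij.items())
    return s

# Perturb-max sampling: draw i.i.d. Gumbel perturbations, keep the argmax.
counts = {y: 0 for y in ys}
n = 20000
for _ in range(n):
    gamma = rng.gumbel(size=(3, 2))
    counts[max(ys, key=lambda y: score(y, gamma))] += 1
p_perturb = {y: c / n for y, c in counts.items()}
```

Each sample costs one max-solver call, which is the point of the approach: sampling likely structures is as fast as finding the most likely one.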
Outline
• Random perturbation - why and how?
  - Sampling likely structures as fast as finding the most likely one.
• Connections and alternatives to the Gibbs distribution:
  - the marginal polytope
  - non-MCMC sampling for Gibbs distributions with perturb-max
• Application: interactive annotation.
  - New entropy bounds for perturb-max models.
Non-MCMC sampling
• Perturb-max provably approximates the Gibbs marginals on trees [H, Maji, Jaakkola 13], and does so in practice on general graphs.
• Perturb-max samples from tree-shaped Gibbs distributions [Gane, H, Jaakkola 14].
Outline
• Random perturbation - why and how?
  - Sampling likely structures as fast as finding the most likely one.
• Connections and alternatives to the Gibbs distribution:
  - the marginal polytope
  - non-MCMC sampling for Gibbs distributions with perturb-max
• Application: interactive annotation.
  - New entropy bounds for perturb-max models.
Image annotation
• Image annotation is a time-consuming (and tedious) task. Can computers do it for us?
• Why not use the most likely annotation instead?
• The most likely annotation is inaccurate around:
  - "thin" areas
  - clutter
Interactive image annotation
• Perturb-max models show the boundary of the decision.
• Interactive annotation directs the human annotator to areas of uncertainty, significantly reducing annotation time [Maji, H., Jaakkola 14].
Uncertainty

• Entropy:

$$H(p_\theta) = -\sum_y p_\theta(y) \log p_\theta(y)$$

• Entropy = uncertainty:
  - It is a nonnegative function over probability distributions.
  - It attains its maximal value at the uniform distribution.
  - It attains its minimal value at a zero-one distribution.
• Computing the entropy requires summing over exponentially many configurations $y = (y_1, \dots, y_n)$.
• Can we bound it with the perturb-max approach?
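The listed properties of the entropy are easy to check numerically. A minimal sketch, with illustrative distributions over 8 configurations:

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_y p(y) log p(y), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

uniform = np.full(8, 1 / 8)          # maximal uncertainty: H = log 8
point = np.array([1.0] + [0.0] * 7)  # zero-one distribution: H = 0

assert abs(entropy(uniform) - np.log(8)) < 1e-12
assert entropy(point) == 0.0
```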
Uncertainty

• Perturb-max models:

$$p_\theta(y) \overset{\mathrm{def}}{=} P_\gamma\Big[y = \arg\max_{\hat y}\Big\{\theta(\hat y) + \sum_{i=1}^n \gamma_i(\hat y_i)\Big\}\Big]$$

• Entropy:

$$H(p_\theta) = -\sum_y p_\theta(y) \log p_\theta(y)$$

• Entropy bound:

$$H(p_\theta) \le E_\gamma\Big[\sum_{i=1}^n \gamma_i(y^*_i)\Big], \qquad y^* = \arg\max_y\Big\{\theta(y) + \sum_{i=1}^n \gamma_i(y_i)\Big\}$$
Uncertainty

• How does it compare to standard entropy bounds?

• Perturb-max entropy bound:

$$H(p_\theta) \le E\Big[\sum_i \gamma_i(y^*_i)\Big] = \sum_i E\big[\gamma_i(y^*_i)\big]$$

• Standard entropy independence bound:

$$H(p_\theta) \le \sum_i H(p_\theta(y_i)), \qquad p_\theta(y_i) = P_\gamma\Big[y_i = \arg\max_{\hat y}\Big\{\theta(\hat y) + \sum_{i=1}^n \gamma_i(\hat y_i)\Big\}\Big]$$

• The perturb-max entropy bound requires fewer samples, since the tail of the sampled average decreases exponentially [Orabona, H., Sarwate, Jaakkola 14].
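Both the perturb-max bound and the model entropy can be estimated by Monte Carlo on a toy model. The sketch below uses hypothetical chain potentials and enumeration over 8 structures: it estimates $E[\sum_i \gamma_i(y^*_i)]$ from the perturbations at each sampled maximizer, alongside a plug-in entropy of the sampled perturb-max distribution.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical potentials for a 3-variable binary chain.
theta_i = np.array([[0.5, -0.5], [1.0, 0.0], [-0.2, 0.2]])
theta_ij = {(0, 1): np.array([[1.0, -1.0], [-1.0, 1.0]]),
            (1, 2): np.array([[0.5, 0.0], [0.0, 0.5]])}
ys = list(itertools.product([0, 1], repeat=3))

def score(y, gamma):
    s = sum(theta_i[i, y[i]] + gamma[i, y[i]] for i in range(3))
    s += sum(th[y[i], y[j]] for (i, j), th in theta_ij.items())
    return s

n = 20000
counts = {y: 0 for y in ys}
bound_terms = []
for _ in range(n):
    gamma = rng.gumbel(size=(3, 2))
    y_star = max(ys, key=lambda y: score(y, gamma))
    counts[y_star] += 1
    # One sample of sum_i gamma_i(y*_i), the quantity inside the bound.
    bound_terms.append(sum(gamma[i, y_star[i]] for i in range(3)))

# Monte Carlo estimate of the perturb-max entropy bound E[sum_i gamma_i(y*_i)].
bound_est = float(np.mean(bound_terms))

# Plug-in entropy of the sampled perturb-max distribution.
p = np.array([counts[y] / n for y in ys])
entropy_est = float(-np.sum(p[p > 0] * np.log(p[p > 0])))
```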
Perturb-max entropy bounds

• Spin glass, 10x10 grid:

$$\sum_i \theta_i(y_i) + \sum_{i,j} \theta_{i,j}(y_i, y_j)$$

[Figure: entropy estimates as a function of the coupling strength $\lambda$ on a 3 x 3 grid, with energy $E = \sum_i \theta_i y_i - \sum_{i,j} \lambda \theta_{ij} y_i y_j$; the curves compare the exact entropy, the marginal entropy bound, and the perturb-max entropy bound.]
Interactive image annotation

[Figure: annotation refinement sequence, from the initial boundary to the final boundary; each step shows the current boundary and its refined version.]
Outline
• Random perturbation - why and how?
  - Sampling likely structures as fast as finding the most likely one.
• Connections and alternatives to the Gibbs distribution:
  - the marginal polytope
  - non-MCMC sampling for Gibbs distributions with perturb-max
• Application: interactive annotation.
  - New entropy bounds for perturb-max models.
Open problems
• Perturb-max models:
  - When does fixing variables in the max-function amount to statistical conditioning?
  - When do perturb-max models preserve the most likely assignment?
  - How does the perturbation dimension affect the model's properties?
  - In what ways do higher-dimensional perturbations reveal complex structures in the model?
  - How can perturbations be applied in restricted spaces, e.g., supermodular potential functions?
  - How can diverse sampling be encouraged?
  - Perturb-max models stabilize the prediction. Do they connect computational and statistical stability?
Thank you
Uncertainty*

• Theorem:

$$H(p_\theta) \le E\Big[\sum_i \gamma_i(y^*_i)\Big], \qquad y^* = \arg\max_y\Big\{\theta(y) + \sum_{i=1}^n \gamma_i(y_i)\Big\}$$

• Proof idea: conjugate duality,

$$H(p) = \min_{\hat\theta}\Big\{\log Z(\hat\theta) - \sum_y \hat\theta(y)\, p(y)\Big\}$$

combined with an upper bound on the log-partition function,

$$\log Z(\hat\theta) \le E_\gamma\Big[\max_y\Big\{\hat\theta(y) + \sum_i \gamma_i(y_i)\Big\}\Big]$$

The flashback slide

• Max stability:

$$\log\Big(\sum_y \exp(\theta(y))\Big) = E_{\gamma \sim \mathrm{Gumbel}}\Big[\max_y\,\{\theta(y) + \gamma(y)\}\Big]$$
Uncertainty*

Substituting the log-partition bound into the duality formula:

$$H(p) \le \min_{\hat\theta}\Big\{E_\gamma\Big[\max_y\Big\{\hat\theta(y) + \sum_i \gamma_i(y_i)\Big\}\Big] - \sum_y \hat\theta(y)\, p(y)\Big\}$$

Setting $p = p_\theta$ and evaluating the minimand at $\hat\theta = \theta$:

$$H(p_\theta) \le E_\gamma\Big[\max_y\Big\{\theta(y) + \sum_i \gamma_i(y_i)\Big\}\Big] - \sum_y \theta(y)\, p_\theta(y)$$

Since the maximum is attained at $y^*$ and $\sum_y \theta(y)\, p_\theta(y) = E_\gamma[\theta(y^*)]$, the right-hand side equals $E_\gamma\big[\sum_i \gamma_i(y^*_i)\big]$, which is the claimed bound.