TRANSCRIPT

Learning A Stroke-Based Representation for Fonts
E. Balashova¹, A. H. Bermano¹, V. G. Kim², S. DiVerdi², A. Hertzmann², T. Funkhouser¹
1 - Princeton University  2 - Adobe Research
Goal: Stroke-Based Font Representation Suitable for Learning
Stroke-Based Parametrization → Manifold Learning → Examples
Motivation
Desired Font Representation Characteristics
• Detail-preserving
• Structure-aware
• Scales without artifacts
• Suitable for learning
Manifold Learning:
• Examples
• Consistency
• Retrieval
• Completion
• Interpolation
Font Representations
• Raster-Based
• Contour-Based
Font Representations
Raster-Based
Limitations
• Scales with artifacts
• Not detail-preserving
• Not structure-aware
Font Representations
Contour-Based
Font Representations
Bézier-Based
Limitations
• Not suitable for learning
[Microsoft Docs]
Font Representations
Consistent Outline-Based
[Campbell '14]
Limitations
• Not structure-aware
Font Representations
Proposed Representation
Characteristics
• Detail-preserving
• Structure-aware
• Scales without artifacts
• Suitable for learning
Part-Aware
Skeleton-Based Representation
[Slide figure: skeleton strokes s_a with per-side outlines o_a+, o_a− and cap segments; angles θ_s, θ_o and topology indicator t = (1,1,1,0,1,0); connectivity constraints link outline segments (o_a+, o_a c, o_a−) across alternative topologies s_a,b]
Our Approach
Examples · Consistency

Template Definition → Template Fitting

Template Fitting: Skeleton Optimization
E. Balashova et al. / Learning A Stroke-Based Representation for Fonts
Figure 5: We illustrate our skeleton fitting pipeline. Given an input glyph, we first extract its skeleton (a). We then register this skeleton to a template (b) to obtain an initial segmentation (c). We then use CRF-based segmentation to obtain the final consistent segmentation of the skeleton with respect to the template (d).
of the skeleton to the template, and the pairwise term favors cuts at sharp features and intersections. In particular, we optimize:

E = \sum_{x \in V} W_u \cdot U(x, L(x)) + \sum_{x, y \in V,\, L(x) \neq L(y)} W_b \cdot B(x, y),    (5)

where the unary term U(x, l) is a penalty for assigning a point x a label l, and the binary term B(x, y) is a penalty for a pair of adjacent skeleton points to have different labels. These are formulated as:

U(x, l) = h_{\sigma_u}(D_l(x))    (6)

B(x, y) = \begin{cases} 1 - h_{\sigma_a}(A(x, y)), & (x, y) \in E \land x \notin J \\ 0, & \text{otherwise} \end{cases}    (7)

To define the unary term, we use D_l(x), a distance between the template and the extracted skeleton after it is registered onto the template skeleton using the Coherent Point Drift method [MSCP*07], where the distance is measured to the l-th skeleton segment. To define the binary term we use the angle between the normals of adjacent points, A(x, y), where E denotes adjacency and J denotes junctions: points where more than two skeleton paths meet. Both terms are smoothed with the aforementioned Gaussian kernel h_\sigma, and we set \sigma_u = 0.25, \sigma_a = \sqrt{3}, W_u = 1, W_b = 4.
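As a concrete reading of Eqs. (5)-(7), the labeling energy can be sketched in a few lines of Python. The exact kernel h_σ is defined earlier in the paper, outside this excerpt, so we assume a smooth monotone response h_σ(d) = 1 − exp(−d²/2σ²) (larger distances and angles give larger responses); all function and variable names below are ours, not the paper's.

```python
import numpy as np

SIGMA_U, SIGMA_A = 0.25, np.sqrt(3.0)  # paper's settings
W_U, W_B = 1.0, 4.0

def h(d, sigma):
    # Assumed smoothing kernel: 0 at d = 0, saturating to 1 as d grows.
    return 1.0 - np.exp(-d**2 / (2.0 * sigma**2))

def energy(labels, D, A, edges, junctions):
    """E = sum_x W_u * U(x, L(x)) + sum_{adjacent x,y, L(x) != L(y)} W_b * B(x, y).

    labels:    per-point labels L(x)
    D:         D[x, l] = distance from skeleton point x to template segment l
    A:         dict {(x, y): angle between normals of adjacent points}
    edges:     set of adjacency pairs (x, y), the set E in Eq. (7)
    junctions: set J of points where more than two skeleton paths meet
    """
    # Unary term, Eq. (6): penalty for each point's assigned label.
    E_total = sum(W_U * h(D[x, labels[x]], SIGMA_U) for x in range(len(labels)))
    # Binary term, Eq. (7): cuts are cheap at sharp features (large angle).
    for (x, y) in edges:
        if labels[x] != labels[y] and x not in junctions:
            E_total += W_B * (1.0 - h(A[(x, y)], SIGMA_A))
    return E_total
```

Minimizing this energy over labelings is what the CRF solver does; the sketch only evaluates it for a fixed labeling.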
After the skeleton is consistently segmented with the template, we parametrize every segment according to the template T_c(\theta), i.e. we fit a Bézier curve to match each stroke s_{c,f}. Once the strokes of the glyph are fitted, we turn to initialize the outlines next.
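The per-segment fit can be sketched as a small least-squares problem. The cubic degree, chord-length parametrization, and fixed endpoints are our assumptions for illustration, not details stated in the paper:

```python
import numpy as np

def fit_cubic_bezier(points):
    """Fit one cubic Bezier to ordered samples (n, 2) along a skeleton segment."""
    pts = np.asarray(points, dtype=float)
    # Chord-length parametrization: cumulative arc length, normalized to [0, 1].
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = d / d[-1]
    # Cubic Bernstein basis evaluated at each sample parameter.
    B = np.stack([(1 - t) ** 3, 3 * (1 - t) ** 2 * t,
                  3 * (1 - t) * t ** 2, t ** 3], axis=1)
    # Pin the endpoints to the segment ends; solve for the two inner
    # control points in the least-squares sense.
    p0, p3 = pts[0], pts[-1]
    rhs = pts - np.outer(B[:, 0], p0) - np.outer(B[:, 3], p3)
    inner, *_ = np.linalg.lstsq(B[:, 1:3], rhs, rcond=None)
    return np.vstack([p0, inner, p3])  # 4 control points, shape (4, 2)
```

For a straight segment sampled uniformly, the fit recovers the exact control points at 0, 1/3, 2/3, and 1 of the way along the segment.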
5.3.2. Outline Initialization, \Theta_o^0
We represent the outline in the coordinate system of the extracted skeleton segment (see Figure 2). To create an initial outline estimate, we first generate a profile curve at a constant distance that fits closely to the input outline for every stroke (i.e., such that the average distance between the curve and the closest point on the outline is minimized) (see Figure 6). The resulting outline is typically not continuous, since the curve parameters are estimated independently for each stroke. We enforce C0 continuity at junctions of outline segments by snapping the corresponding points to their average position. Sometimes this averaged control point can flip sides of the skeleton, which leads to inferior optimization performance. We detect these cases and project the average control point back to the correct side of the corresponding skeleton segment.
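The junction snap described above can be sketched as follows. The 2D setup and the use of a per-segment normal (pointing toward the side the outline should stay on) to detect and correct side flips are our assumptions:

```python
import numpy as np

def snap_junction(p_end, q_start, skel_point, skel_normal):
    """Merge two outline segment endpoints into one shared C0 control point.

    skel_point / skel_normal describe the local skeleton segment;
    skel_normal points toward the side the outline belongs on.
    """
    # Snap to the average position of the two endpoints.
    avg = 0.5 * (p_end + q_start)
    # Signed distance of the averaged point from the skeleton segment.
    offset = np.dot(avg - skel_point, skel_normal)
    if offset < 0.0:
        # The average flipped to the wrong side: mirror it back across
        # the skeleton so the outline stays on the correct side.
        avg = avg - 2.0 * offset * skel_normal
    return avg
```

When the two endpoints already lie on the correct side, the function reduces to plain averaging.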
6. Learning
Figure 6: Outline optimization. Given an input glyph and its calculated skeleton (top left), we use its segmented version (see Section 5.3.1) to generate uniform-width outlines around it (top middle). Our initial guess O_c(\Theta^0) is the latter, after incorporating the template's connectivity constraints (top right). Our fitting procedure takes three considerations into account. Optimizing using only E_corr (see Section 5.1.1) can attract incompatible regions, as can be seen in the red box (bottom left). Considering normal directions (see Section 5.1.2) mitigates this issue (bottom middle). Finally, incorporating feature point alignment (see Section 5.1.3) is crucial for a semantically meaningful fitting, inducing consistent representation across different fonts. This is highlighted in the red circles (bottom right).

Given a collection of fonts F = {F_1, F_2, ..., F_N}, we use our glyph fitting techniques to represent all letter glyphs consistently. We concatenate per-glyph parameters to create a feature vector representing a font: F_i = [\Theta_{'a'}, \Theta_{'b'}, ..., \Theta_{'z'}]. In this representation each font becomes a point in a high-dimensional feature space: F_i \in R^{D_{font}}, where D_{font} = 3998 (the number of parameters required to represent all glyph parts in a font, after enforcement of continuity and template part-sharing constraints). We expect this representation to be redundant due to stylistic and structural similarities in fonts: glyphs in the same font will share stylistic elements, and glyphs that correspond to the same letter will have similar stroke structure. We propose to learn these relationships by projecting all fonts to a lower-dimensional embedding. The main motivation behind this is to create a space where important correlations between different attributes are captured implicitly, and thus any sample point X in this representation will implicitly adhere to common design principles and font structures in the training data F. For this we need an embedding function M that projects a font to a low-dimensional feature space, i.e. M(F) = X, where X \in R^D, and an inverse function C that can reconstruct a font from the embedding, C(X) = F. To enforce learning correlations in the data we set the latent dimensionality D = \alpha D_{font} = 39 with \alpha = 0.01, which was determined experimentally to work best for our setup.
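A minimal sketch of the embedding M and reconstruction C described above, using plain PCA as a stand-in: this excerpt does not specify the embedding model, so treat the PCA choice (and the class/attribute names) as illustrative assumptions.

```python
import numpy as np

D_FONT, ALPHA = 3998, 0.01
D = int(ALPHA * D_FONT)  # 39 latent dimensions

class FontEmbedding:
    def fit(self, F):
        """F: (N, D_FONT) matrix of concatenated per-glyph parameter vectors."""
        self.mean = F.mean(axis=0)
        # Principal directions from the SVD of the centered data.
        _, _, Vt = np.linalg.svd(F - self.mean, full_matrices=False)
        self.W = Vt[:D].T  # up to D basis vectors (fewer if N < D)
        return self

    def M(self, F):
        # Embed: font feature vector(s) -> latent point(s) X.
        return (F - self.mean) @ self.W

    def C(self, X):
        # Reconstruct: latent point(s) -> font feature vector(s).
        return X @ self.W.T + self.mean
```

A linear embedding like this captures correlations as principal directions; the missing-entry handling discussed next would require an extension (e.g. fitting only over observed coordinates).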
In addition to capturing important correlations, we want the embedding function to handle missing entries in the input feature vectors F_i, because most fonts do not have glyphs with all possible topologies (i.e., decorative elements and stroke structures). In addition, in many applications, in order to help the typeface designer we need to be able to reason about the whole font before it has been
© 2018 The Author(s). Computer Graphics Forum © 2018 The Eurographics Association and John Wiley & Sons Ltd.
Template Fitting: Skeleton Optimization (Initial Skeleton)
Template Fitting: Skeleton Optimization (Initial Skeleton → Registration)
In addition to capturing important correlations, we want the em-bedding function to handle missing entries in the input feature vec-tors Fi because most fonts do not have glyphs with all possibletopologies (i.e., decorative elements and stroke structures). In addi-tion, in many applications in order to help the typeface designer weneed to be able to reason about the whole font before it has been
c� 2018 The Author(s)Computer Graphics Forum c� 2018 The Eurographics Association and John Wiley & Sons Ltd.
Initial Skeleton Registration
Initial Segmentation
Our Approach
Examples:
Skeleton OptimizationTemplate Fitting
Consistency:
E. Balashova et al. / Learning A Stroke-Based Representation for Fonts
Figure 5: We illustrate our skeleton fitting pipeline. Given an input glyph, we first extract its skeleton (a). We then register this skeleton to a template (b) to obtain an initial segmentation (c). We then use CRF-based segmentation to obtain the final consistent segmentation of the skeleton with respect to the template (d).
of the skeleton to the template, and the pairwise term favors cuts at sharp features and intersections. In particular, we optimize:
E = \sum_{x \in V} W_u \cdot U(x, L(x)) + \sum_{x, y \in V,\, L(x) \neq L(y)} W_b \cdot B(x, y), \quad (5)
where the unary term U(x, l) is a penalty for assigning a point x a label l, and the binary term B(x, y) is a penalty for a pair of adjacent skeleton points having different labels. These are formulated as:
U(x, l) = h_{\sigma_u}(D_l(x)) \quad (6)

B(x, y) = \begin{cases} 1 - h_{\sigma_a}(A(x, y)), & (x, y) \in E \wedge x \notin J \\ 0, & \text{otherwise} \end{cases} \quad (7)
To define the unary term, we use D_l(x), the distance between the template and the extracted skeleton after the latter is registered onto the template skeleton using the Coherent Point Drift method [MSCP*07], where the distance is measured to the l-th skeleton segment. To define the binary term, we use the angle A(x, y) between the normals of adjacent points, where E denotes adjacency and J denotes junctions, i.e., points where more than two skeleton paths meet. Both terms are smoothed with the aforementioned Gaussian kernel h_σ, and we set σ_u = 0.25, σ_a = √3, W_u = 1, W_b = 4.
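The segmentation energy can be sketched as follows. This is a minimal evaluation of Eq. 5 for one candidate labeling, not the paper's optimizer: it assumes the "Gaussian kernel" h_σ is the saturating form 1 - exp(-d^2 / 2σ^2) (the exact form is not stated here), and `dist_to_segment` and `normal_angle` are hypothetical callbacks standing in for D_l(x) and A(x, y).

```python
import math

def h(d, sigma):
    # Assumed saturating "Gaussian" penalty kernel: 1 - exp(-d^2 / 2 sigma^2),
    # so the penalty is 0 at d = 0 and approaches 1 for large d.
    return 1.0 - math.exp(-d * d / (2.0 * sigma * sigma))

def crf_energy(points, edges, labels, dist_to_segment, normal_angle, junctions,
               sigma_u=0.25, sigma_a=math.sqrt(3), w_u=1.0, w_b=4.0):
    """Evaluate E (Eq. 5) for one labeling of the skeleton graph.

    labels: dict point -> template segment label; dist_to_segment(x, l) = D_l(x);
    normal_angle(x, y) = A(x, y); junctions: points where >2 skeleton paths meet.
    """
    # Unary term: points far from their assigned template segment are penalized.
    e = sum(w_u * h(dist_to_segment(x, labels[x]), sigma_u) for x in points)
    # Pairwise term: label changes cost nothing at junctions, and B = 1 - h(A)
    # is largest where adjacent normals agree, so cuts prefer sharp features.
    for x, y in edges:
        if labels[x] != labels[y] and x not in junctions:
            e += w_b * (1.0 - h(normal_angle(x, y), sigma_a))
    return e
```

In a full system this energy would be minimized with a standard CRF/graph-cut solver; the sketch only scores a given labeling.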
After the skeleton is consistently segmented with respect to the template, we parametrize every segment according to the template T_c(θ), i.e., we fit a Bézier curve to match each stroke s_{c,f}. Once the strokes of the glyph are fitted, we next initialize the outlines.
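As an illustration of the per-segment parameterization, here is a least-squares cubic Bézier fit with pinned endpoints under a chord-length parameterization. This is a sketch of one common fitting recipe, not necessarily the paper's exact procedure; `fit_cubic_bezier` and `eval_bezier` are names introduced here.

```python
import numpy as np

def fit_cubic_bezier(pts):
    """Least-squares cubic Bezier fit to ordered 2D samples.

    Endpoints are pinned to the first/last sample; the two inner control
    points are solved in closed form under a chord-length parameterization.
    """
    pts = np.asarray(pts, dtype=float)
    # chord-length parameter t in [0, 1]
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = d / d[-1]
    b0 = (1 - t) ** 3
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3
    p0, p3 = pts[0], pts[-1]
    # remove the fixed endpoint contribution, then regress the inner controls
    r = pts - np.outer(b0, p0) - np.outer(b3, p3)
    A = np.c_[b1, b2]                      # (n, 2) basis for the inner controls
    inner, *_ = np.linalg.lstsq(A, r, rcond=None)
    return np.vstack([p0, inner[0], inner[1], p3])

def eval_bezier(ctrl, t):
    # evaluate a cubic Bezier with control points ctrl (4, 2) at parameters t
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * ctrl[0] + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2] + t ** 3 * ctrl[3])
```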
5.3.2. Outline Initialization, Θ_o^0
We represent the outline in the coordinate system of the extracted skeleton segment (see Figure 2). To create an initial outline estimate, we first generate, for every stroke, a profile curve at a constant distance that fits closely to the input outline (i.e., such that the average distance between the curve and the closest point on the outline is minimized; see Figure 6). The resulting outline is typically not continuous, since the curve parameters are estimated independently for each stroke. We enforce C0 continuity at junctions of outline segments by snapping the corresponding points to their average position. Occasionally this averaged control point flips to the wrong side of the skeleton, which degrades optimization performance. We detect these cases and project the average control point back to the correct side of the corresponding skeleton segment.
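The snapping and side-flip fix can be sketched with two small helpers. Both names are hypothetical, and "projecting back to the correct side" is realized here as a reflection across the skeleton segment's supporting line, which is one plausible reading of the step described above.

```python
import numpy as np

def snap_junction(points):
    """Enforce C0 continuity: snap coincident junction endpoints to their mean."""
    return np.mean(np.asarray(points, dtype=float), axis=0)

def keep_on_side(p, a, b, side):
    """If p fell on the wrong side of skeleton segment a->b, reflect it back.

    side: +1 or -1, the sign the 2D cross product (b-a) x (p-a) should have.
    """
    a, b, p = (np.asarray(v, dtype=float) for v in (a, b, p))
    ab, ap = b - a, p - a
    cross = ab[0] * ap[1] - ab[1] * ap[0]
    if np.sign(cross) == side or cross == 0:
        return p
    # reflect p across the segment's supporting line
    n = np.array([-ab[1], ab[0]]) / np.linalg.norm(ab)
    return p - 2.0 * np.dot(ap, n) * n
```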
6. Learning
Given a collection of fonts F = {F_1, F_2, ..., F_N}, we use our glyph fitting techniques to represent all letter glyphs consistently. We concatenate per-glyph parameters to create a feature vector representing a font: F_i = [Θ_'a', Θ_'b', ..., Θ_'z']. In this representation each font becomes a point in a high-dimensional feature space: F_i ∈ R^{D_font}, where D_font = 3998 (the number of parameters required to represent all glyph parts in a font, after enforcement of continuity and template part-sharing constraints). We expect this representation to be redundant due to stylistic and structural similarities in fonts: glyphs in the same font share stylistic elements, and glyphs that correspond to the same letter have similar stroke structure. We propose to learn these relationships by projecting all fonts to a lower-dimensional embedding. The main motivation is to create a space where important correlations between different attributes are captured implicitly; thus, any sample point X in this representation will implicitly adhere to common design principles and font structures in the training data F. For this we need an embedding function M that projects a font to a low-dimensional feature space, i.e., M(F) = X, where X ∈ R^D, and an inverse function C that can reconstruct a font from the embedding, C(X) = F. To enforce learning correlations in the data, we set the latent dimensionality D = αD_font = 39 with α = 0.01, which was determined experimentally to work best for our setup.

Figure 6: Outline optimization. Given an input glyph and its calculated skeleton (top left), we use its segmented version (see Section 5.3.1) to generate uniform-width outlines around it (top middle). Our initial guess O_c(Θ^0) is the latter, after incorporating the template's connectivity constraints (top right). Our fitting procedure takes three considerations into account. Optimizing using only E_corr (see Section 5.1.1) can attract incompatible regions, as can be seen in the red box (bottom left). Considering normal directions (see Section 5.1.2) mitigates this issue (bottom middle). Finally, incorporating feature point alignment (see Section 5.1.3) is crucial for a semantically meaningful fitting, inducing a consistent representation across different fonts. This is highlighted in the red circles (bottom right).
In addition to capturing important correlations, we want the embedding function to handle missing entries in the input feature vectors F_i, because most fonts do not have glyphs with all possible topologies (i.e., decorative elements and stroke structures). Moreover, in many applications, in order to help the typeface designer, we need to be able to reason about the whole font before it has been
© 2018 The Author(s). Computer Graphics Forum © 2018 The Eurographics Association and John Wiley & Sons Ltd.
Our Approach: Template Fitting, Skeleton Optimization
Initial Skeleton Registration → Initial Segmentation → Final Segmentation

Our Approach: Template Fitting, Outline Optimization
Consistency: E = E_corr + E_ft
E_{corr}(G_{c,f}, \Theta) = \sum_{x \in P(G_{c,f})} h_{\sigma_{corr}}(D(x, O_c(\Theta))) + \sum_{y \in P(O_c(\Theta))} h_{\sigma_{corr}}(D(y, G_{c,f}))

where G_{c,f} is the ground-truth (GT) outline, Θ the template parameters, P(·) a dense curve sampling, O_c(Θ) the generated outline, and h_σ a Gaussian kernel. Each sample is an augmented point x = [q_X, q_Y, w_n n_X(q), w_n n_Y(q)], stacking the coordinates q with the normal n scaled by a normal weight w_n.
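The symmetric correspondence energy can be sketched as follows. This is an illustrative brute-force version with hypothetical names (`augment`, `e_corr`): both curves are densely sampled, each sample is matched to its nearest neighbor in the 4D augmented space, and the same saturating kernel assumed earlier stands in for h_σ.

```python
import numpy as np

def augment(points, normals, w_n):
    # build augmented samples x = [q_X, q_Y, w_n*n_X, w_n*n_Y]
    return np.hstack([points, w_n * normals])

def e_corr(gt_aug, gen_aug, sigma_corr=1.0):
    """Symmetric correspondence energy E_corr between two augmented samplings.

    gt_aug, gen_aug: (N, 4) and (M, 4) arrays from `augment`; distances are
    measured in the augmented space, so a match must agree in both position
    and (weighted) normal direction.
    """
    d = np.linalg.norm(gt_aug[:, None, :] - gen_aug[None, :, :], axis=2)
    h = lambda x: 1.0 - np.exp(-x * x / (2.0 * sigma_corr ** 2))  # assumed kernel
    # each GT sample looks up its nearest generated sample, and vice versa
    return h(d.min(axis=1)).sum() + h(d.min(axis=0)).sum()
```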
Our Approach: Template Fitting, Outline Optimization

E_{ft}(G_{c,f}, \Theta) = \sum_{x \in F(O_c(\Theta))} D_{curve}(x, F_{\tau_{ft}}(G_{c,f}))

where F(·) is the set of feature points, D_curve the arc-length intrinsic distance, and F_{τ_ft}(G_{c,f}) the subset of junction points of the ground-truth (GT) outline near each feature point; Θ are the template parameters.
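The feature alignment term can be sketched in simplified form. Assumed simplification: both outlines share an arc-length parameterization, so the intrinsic distance D_curve reduces to a difference of arc-length positions; `arc_positions` and `e_ft` are names introduced here.

```python
import numpy as np

def arc_positions(points):
    # cumulative arc length along an ordered polyline sampling of an outline
    seg = np.linalg.norm(np.diff(np.asarray(points, dtype=float), axis=0), axis=1)
    return np.r_[0.0, np.cumsum(seg)]

def e_ft(feature_s, candidate_s):
    """Feature alignment energy sketch.

    feature_s: arc-length positions of feature points on the generated outline;
    candidate_s: positions of the GT junction points near feature points (the
    set F_tau_ft in the slide notation). Each feature point pays the intrinsic
    distance to its nearest candidate.
    """
    feature_s = np.asarray(feature_s, dtype=float)
    candidate_s = np.asarray(candidate_s, dtype=float)
    d = np.abs(feature_s[:, None] - candidate_s[None, :])
    return d.min(axis=1).sum()
```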
Our Approach: Outline Optimization Examples
[Figure: Input, Initial Width, Initial Guess; then Only E_corr, With Normal Reg., With Feature Alignment]
Our Approach: Manifold Learning
Missing glyph topologies leave gaps in the per-font feature vector: f = [f_a1, f_a2, NaN, ...]

EM-PCA [Row98]: Y = CX + V, where Y are the input vectors, X the latent coordinates, C the transformation matrix, and V the noise.
Learning steps:
E-step: X = (C^T C)^{-1} C^T Y
M-step: C_new = Y X^T (X X^T)^{-1}
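The EM-PCA update above can be sketched directly in NumPy. This minimal version assumes Y is centered and fully observed; the paper additionally handles NaN entries for missing glyph topologies, which this sketch omits.

```python
import numpy as np

def em_pca(Y, d, n_iter=50, seed=0):
    """EM-PCA [Row98] sketch: fit Y ~ C X for a (D, N) data matrix Y."""
    D, N = Y.shape
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((D, d))        # random initial basis
    for _ in range(n_iter):
        # E-step: infer latent coordinates given the current basis
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: re-estimate the basis given the latent coordinates
        C = Y @ X.T @ np.linalg.inv(X @ X.T)
    return C, X
```

At convergence, C spans the principal subspace and C @ X reconstructs the data's projection onto it.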
Our Approach: Manifold Learning, Iterative Improvement
Initial Fit → Manifold Guess → Improved Fit (enables Retrieval and Completion)
Fitting Results
[Plot: percentile of glyphs (0-100) vs. fitting error (0-10 px), one curve per glyph a-z; annotated error: 0.3 px]
Iterative Improvement Evaluation
[Plot: % of glyphs below learning threshold vs. iteration (Start, 1-5), axis range 0.58-0.72]
[Plot: mean error vs. iteration (Start, 1-5), axis range 1.1-1.5]
Comparison to Non-Part-Aware Approach
Part-Based Representation vs. Single Contour Representation [CK14] vs. Raster-Based
Generative Model Comparison
EM-PCA vs. Denoising VAE vs. Vanilla VAE
Applications
Style Completion
Style Completion: Light Fonts (MerriweatherSans-Light, Exo-Light)
Style Completion: Regular/Bold Fonts (Nobile-BoldItalic, DoppioOne-Regular)
Style Completion: Italic Fonts (Nobile-BoldItalic, MinionPro-It)
Style Completion: Serif Fonts (Amethysta-Regular, Oldenburg-Regular; Neuton Light, Neuton Regular)
Each slide shows the input glyphs, the predicted completion, and the ground truth (GT) for evaluation.
Font Retrieval
Query Font: Ubuntu-Light; Nearest Neighbor: Ubuntu; Farthest Font: Dosis-ExtraLight

Topology-Aware Retrieval
Nearest Serifed Font: AnticSlab-Regular
Nearest Font With Topology 1 of 'a': Inder-Regular
Nearest Font With Topology 2 of 'a': Ubuntu
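Retrieval in the learned space can be sketched as nearest-neighbor search over latent coordinates. This is a hypothetical sketch: `embed` stands in for the embedding function M (e.g., the EM-PCA projection), and the optional `topologies` tags let the search be restricted, as in the topology-aware examples above.

```python
import numpy as np

def retrieve(query, fonts, embed, topology=None, topologies=None):
    """Return the index and distance of the nearest font in embedding space.

    fonts: list of font feature vectors; embed: maps a feature vector to
    latent coordinates; topology/topologies: optional per-font tags used to
    restrict the candidate set (topology-aware retrieval).
    """
    q = embed(query)
    best, best_d = None, np.inf
    for i, f in enumerate(fonts):
        if topology is not None and topologies[i] != topology:
            continue  # keep only fonts with the requested structure
        d = np.linalg.norm(embed(f) - q)
        if d < best_d:
            best, best_d = i, d
    return best, best_d
```

The "farthest font" example above is the same search with the comparison reversed.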
Limitations
[image credits: https://design.google, https://inventingsituations.net]
Future Work
[image credits: https://www.theatlantic.com, https://www.123rf.com/]
Acknowledgments
• Princeton Graphics and Vision groups, Ariel Shamir, anonymous reviewers
• Princeton CS Staff (Asya Dvorkin)
• Support: National Science Foundation Graduate Fellowship (NSF GRFP), NSF CRI 1729971, and Adobe

Thank You!