apermutationtestfortheregression kink design

43
A Permutation Test for the Regression Kink Design Peter Ganong and Simon Jäger The Regression Kink (RK) design is an increasingly popular empirical method for estimat- ing causal eects of policies, such as the eect of unemployment benefits on unemployment duration. Using simulation studies based on data from existing RK designs, we document empirically that the statistical significance of RK estimators based on conventional stan- dard errors can be spurious. In the simulations, false positives arise as a consequence of nonlinearities in the underlying relationship between the outcome and the assignment vari- able, confirming concerns about misspecification bias of discontinuity estimators pointed out by Calonico et al. (2014b). As a complement to standard RK inference, we propose that researchers construct a distribution of placebo estimates in regions with and without a policy kink and use this distribution to gauge statistical significance. Under the assumption that the location of the kink point is random, this permutation test has exact size in finite samples for testing a sharp null hypothesis of no eect of the policy on the outcome. We implement simulation studies based on existing RK applications, which estimate the eect of unemployment benefits on unemployment duration, and show that our permutation test as well as estimators proposed in Calonico et al. (2014b) improve upon the size of standard approaches, while having enough power to detect an eect of unemployment benefits on unemployment duration. KEY WORDS: Randomization Inference; Permutation Test; Policy Discontinuities. We thank Alberto Abadie, Josh Angrist, David Card, Matias Cattaneo, Avi Feller, Edward Glaeser, Paul Goldsmith-Pinkham, Guido Imbens, Maximilian Kasy, Zhuan Pei, Mikkel Plagborg- Møller, and Guillaume Pouliot as well as participants at Harvard University’s Research in Statistics Seminar and the Research in Econometrics Seminar for helpful comments and discussions. We are especially thankful to Gary Chamberlain, Raj Chetty and Larry Katz for guidance and suggestions. We thank Andrea Weber and Camille Landais for sharing supplemental figures based on administra- tive UI data. We thank Harvard’s Lab for Economic Applications and Policy for financial support and Thorben Wölk, Shawn Storm, and Cody Tuttle for excellent research assistance. Harvard University, email: [email protected] Harvard University and IZA, email: [email protected] 1

Upload: others

Post on 07-May-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: APermutationTestfortheRegression Kink Design

A Permutation Test for the Regression

Kink Design

Peter Ganong

†and Simon Jäger

The Regression Kink (RK) design is an increasingly popular empirical method for estimat-ing causal effects of policies, such as the effect of unemployment benefits on unemploymentduration. Using simulation studies based on data from existing RK designs, we documentempirically that the statistical significance of RK estimators based on conventional stan-dard errors can be spurious. In the simulations, false positives arise as a consequence ofnonlinearities in the underlying relationship between the outcome and the assignment vari-able, confirming concerns about misspecification bias of discontinuity estimators pointedout by Calonico et al. (2014b). As a complement to standard RK inference, we proposethat researchers construct a distribution of placebo estimates in regions with and without apolicy kink and use this distribution to gauge statistical significance. Under the assumptionthat the location of the kink point is random, this permutation test has exact size in finitesamples for testing a sharp null hypothesis of no effect of the policy on the outcome. Weimplement simulation studies based on existing RK applications, which estimate the effectof unemployment benefits on unemployment duration, and show that our permutation testas well as estimators proposed in Calonico et al. (2014b) improve upon the size of standardapproaches, while having enough power to detect an effect of unemployment benefits onunemployment duration.

KEY WORDS: Randomization Inference; Permutation Test; Policy Discontinuities.

∗We thank Alberto Abadie, Josh Angrist, David Card, Matias Cattaneo, Avi Feller, EdwardGlaeser, Paul Goldsmith-Pinkham, Guido Imbens, Maximilian Kasy, Zhuan Pei, Mikkel Plagborg-Møller, and Guillaume Pouliot as well as participants at Harvard University’s Research in StatisticsSeminar and the Research in Econometrics Seminar for helpful comments and discussions. We areespecially thankful to Gary Chamberlain, Raj Chetty and Larry Katz for guidance and suggestions.We thank Andrea Weber and Camille Landais for sharing supplemental figures based on administra-tive UI data. We thank Harvard’s Lab for Economic Applications and Policy for financial supportand Thorben Wölk, Shawn Storm, and Cody Tuttle for excellent research assistance.

†Harvard University, email: [email protected]‡Harvard University and IZA, email: [email protected]

1

Page 2: APermutationTestfortheRegression Kink Design

1 Introduction

We develop a permutation test for Regression Kink (RK) designs which rely on an

identification principle analogous to the one underlying the better-known Regression

Discontinuity designs. Regression Discontinuity designs estimate the change in the

level of an outcome Y at the threshold level of the assignment variable V at which

the level of the policy changes discontinuously (see, e.g., Thistlethwaite and Camp-

bell (1960), Imbens and Lemieux, 2008). RK designs exploit discontinuous changes

in the slope of a policy B at a specific level of the assignment variable and assess

whether there is also a discontinuous change in the slope of the outcome variable. By

comparing the ratio of the slope change in the outcome variable to the slope change

in the policy variable at the kink point, the RK design recovers a causal effect of the

policy on the outcome at the kink point. This is again analogous to RD designs which

calculate the ratio of changes in the level of the outcome to the change in the level of

treatment at the discontinuity. The parameter of interest that the slope change at the

kink point identifies is the average effect of increasing the policy conditional on the

level of the assignment variable at the kink point. Key identification and inference

results for the RK design were derived in Nielsen, Sørensen, and Taber (2010), Card,

Lee, Pei and Weber (2015, CLPW in the following), and Calonico, Cattaneo, and

Titiunik (2014, CCT in the following).

This article discusses the proposed permutation test for the RK design, its under-

lying assumptions and implementation. For motivation, we begin by describing the

regression kink design based on an example from a growing body of literature that

implements the RK design to estimate the causal effect of unemployment benefits B

on an outcome Y such as unemployment duration or employment (see, Britto, 2015,

CLPW, Card et al., 2015a, Kyyrä and Pesola, 2015, Landais, 2015, Sovago, 2015 and

Kolsrud, Landais, Nilsson and Spinnewijn 2015, KLNS in the following). Even though

crucial for policy design, the question of whether unemployed individuals stay out of

2

Page 3: APermutationTestfortheRegression Kink Design

work for longer if they receive more generous benefits is hard to address in the absence

of a randomized experiment. RK studies of unemployment insurance aim to fill this

gap by exploiting the fact that many unemployment insurance systems pay out ben-

efits B that rise linearly with prior income V —up to a benefit cap B for individuals

earning above a reference income V , the kink point. In such a schedule, the slope

between the policy variable B and the assignment variable V changes discontinuously

at V . To study the effect of benefits B on unemployment duration, researchers esti-

mate by how much the slope of the outcome variable Y — unemployment duration —

with respect to the assignment variable changes discontinuously at such kink points.

Intuitively, if unemployment benefits have no impact on unemployment duration, one

would not expect to see discontinuous changes in the slope of unemployment duration

Y and prior income V at a kink point V . However, if unemployment benefits do deter

individuals from finding employment, then one would expect discontinuous changes in

the slope of unemployment duration with respect to prior income at kink points that

depend on the strength of the causal effect of benefits B on unemployment duration

Y . Section 2 provides a more detailed review of the RK design and key identification

results.

While the RK design has become increasingly popular, RK estimators as typically

implemented may suffer from non-negligible misspecification bias and consequently

too narrow confidence intervals (CCT). In most applications of the RK design, re-

searchers use local linear or quadratic estimators to estimate the slope change at the

kink and choose bandwidths with the goal of minimizing mean squared error (Fan

and Gijbels, 1996). Table A.1 in the Appendix shows an overview of 43 RK stud-

ies and that the vast majority of them use a linear or quadratic specification. In

these specifications, misspecification bias can arise as a consequence of non-linearity

in the conditional expectation function. To see how, consider Figure 1 which displays

data with a piecewise linear data generating process (DGP) featuring a kink and a

3

Page 4: APermutationTestfortheRegression Kink Design

quadratic DGP with no kink. For both DGPs, local linear estimators that are com-

mon in the RK literature indicate statistically highly significant slope changes at the

kink, even though the quadratic DGP does not feature a discontinuous slope change.

CCT show that such misspecification bias is non-negligible with standard bandwidth

selectors and leads to poor empirical coverage of the resulting confidence intervals.

As a remedy, CCT develop an alternative estimation and inference approach for RD

and RK designs based on a bias-correction of the estimators and a new standard error

estimator that reflects the bias correction.

To provide a complement and robustness check to existing RK inference, we pro-

pose a permutation test which treats the location of the kink point V as random

and has exact size in finite samples. The key assumption underlying our test is that

the location of the policy kink point is randomly drawn from a set of potential kink

locations. This assumption is rooted in a thought experiment in which the data are

taken as given and only the location of the kink point V is thought of as a random

variable. In many RK contexts, this assumption is appealing because the kink point’s

location is typically not chosen based on features of the DGP and in some cases—such

as kinks in many unemployment insurance schedules—is determined as the outcome

of a stochastic process (see Section 3). Under the null hypothesis that the policy has

no effect on the outcome and the assumption that the location of the policy kink

is randomly drawn from a specified support, the distribution of placebo estimates

provides an exact null distribution for the test statistic at the policy kink. We prove

that the permutation test controls size exactly in finite samples.

We implement simulation studies to compare the size and power of our permuta-

tion test to that of standard RK inference as well as the RK estimators and confidence

intervals developed in CCT. Specifically, we simulate data based on estimated DGPs

from existing RK applications aimed at estimating the effect of unemployment ben-

efits on unemployment duration in Austria and Sweden (Card et al., 2015b, Kolsrud

4

Page 5: APermutationTestfortheRegression Kink Design

1015

2025

30U

nem

ploy

men

t Dur

atio

n

-500 0 500Prior Income Relative to Kink

Y (Linear DGP with Kink)Y (Quad. DGP, no Kink)

τRKD = -.016, SE: (.0015)

1015

2025

30U

nem

ploy

men

t Dur

atio

n

-500 0 500Prior Income Relative to Kink

Y (Linear DGP with Kink)Linear RK Estimator

τRKD = -.016, SE: (.0014)

1015

2025

30U

nem

ploy

men

t Dur

atio

n

-500 0 500Prior Income Relative to Kink

Y (Quad. DGP, no Kink)Linear RK Estimator

Figure 1: Piecewise Linear and Quadratic Simulated DGPs. The data generating process (DGP) iseither linear with a kink (blue dots) or quadratic (red diamonds) without a kink. We generate 1000observations with a variance of 12 and plot the data in bins based on the approach in Calonico et al.(2014a, 2015). We estimate a linear Regression Kink model with heteroskedasticity-robust standarderrors. The top panel shows the relationship between the outcome variable and the running variablefor both the piecewise linear and the quadratic DGP. In the middle panel, we display the data forthe piecewise linear DGP and add the predictions from a local linear model. The bottom panelshows predictions from a linear RK model estimated on the quadratic DGP which features no kink.Significance levels are reported based on heteroskedasticity-robust standard errors following CLPW.

5

Page 6: APermutationTestfortheRegression Kink Design

et al., 2015). These Monte Carlo simulations document that asymptotic inference and

standard bandwidth choice procedures (Fan and Gijbels, 1996) lead to over-rejection

of the null hypothesis; RK estimates are statistically significant using asymptotic

methods even when the kink is fact zero. Further simulation studies of linear and

non-linear data-generating processes document that asymptotic inference and stan-

dard bandwidth choice procedures are particularly problematic—as evidenced by high

type I error rates—when the relationship between outcome and assignment variable

is non-linear. In contrast, CCT and the permutation test maintain actual size close

to nominal size.

The simulation studies further reveal that both the permutation test and the

estimator based on CCT have enough power to reject the null hypothesis based on

the effect sizes in KLNS, i.e. they reject that unemployment benefits do not affect

unemployment duration. In the simulation studies, we also document examples of

settings in which the permutation test fails to detect non-zero kinks, which can occur

when the relationship between the outcome and the assignment variable is sufficiently

non-linear relative to the magnitude of the kink. In additional simulation studies

based on linear and non-linear data-generating processes, we document that there is

a spectrum where asymptotic inference substantially over-rejects, CCT does better

on size and worse on power and the permutation test performs best on size but worst

on power.

Our permutation test also serves as a complement to existing inference methods for

Regression Discontinuity (RD) designs. In particular, Cattaneo et al. (2015) develop

a randomization inference approach for RD designs based on an interpretation of

RD designs as local randomized experiments in a narrow window around the RD

cutoff. While the procedure developed by Cattaneo et al. (2015) treats the locations

of observations above and below the cutoff as random within a narrow window and

also proposes a method for the selection of the window in which the randomization

6

Page 7: APermutationTestfortheRegression Kink Design

assumption holds, our test offers a complementary approach by treating the location

of the kink point or cutoff itself as random.

By drawing on randomization inference, the permutation test we propose builds

on a long tradition in the statistics literature (Fisher, 1935; Lehmann and Stein,

1949; Welch and Gutierrez, 1988; Welch, 1990; Rosenbaum, 2001; Ho and Imai, 2006,

see Rosenbaum, 2002, for an introduction), which has recently seen renewed interest

from econometricians (see, for instance, Bertrand, Duflo, and Mullainathan, 2004;

Imbens and Rosenbaum, 2005; Chetty, Looney, and Kroft, 2009; Abadie, Diamond,

and Hainmueller, 2010; Abadie, Athey, Imbens, and Wooldridge, 2014; Cattaneo,

Frandsen, and Titiunik, 2015). Our approach generalizes a suggestion by Imbens and

Lemieux (2008) for the RD design, “testing for a zero effect in settings where it is

known that the effect should be 0”, to the RK design by estimating slope changes in

regions where there is no change in the slope of the policy. Engström et al. (2015)

and Gelman and Imbens (2014) pursue related placebo analyses. Our paper builds

on CCT in developing new methods to deal with misspecification bias in RK and RD

designs. In related work, Landais (2015) proposes an alternative way of gauging the

robustness in RK estimates by constructing difference-in-differences RK estimates

which are based on the same kink point but using data from time periods with

and without the presence of an actual policy kink. Ando (2013) uses Monte Carlo

simulations to argue that linear RK estimates are biased in the presence of plausible

amounts of curvature.

2 Notation and Review

2.1 Identification in the Regression Kink Design

The following section reviews key results and notation for the RK design based on

7

Page 8: APermutationTestfortheRegression Kink Design

CLPW and CCT and illustrates the method based on RK designs aimed at estimating

the causal effect of unemployment benefits B on an outcome Y unemployment dura-

tion, by exploiting kinks in the unemployment insurance schedule (see Britto, 2015,

Card et al., 2015a, CLPW, Kolsrud et al., 2015, Kyyrä and Pesola, 2015, Landais,

2015, and Sovago, 2015).

Formally, the outcome Y is modeled as

Y = y(B, V, U) (1)

where B denotes the policy variable, V denotes a running variable, here thought of

as prior income, which determines the assignment of B, and U denotes an error term.

The parameter of interest is the effect of the policy variable, B, on the outcome Y .

Analogous to treatment effects for binary treatments, defined as y(1, V, U)�y(0, V, U)

in a potential outcomes framework, the treatment parameter that RK design intend

to estimate is the marginal effect of increasing the level of the policy B on the outcome

Y , i.e.dy(B, V, U)

dB

(2)

Integrating this marginal effect over the distribution of U conditional on B = b

and V = v leads to the “treatment on the treated” parameter in Florens, Heckman,

Meghir, and Vytlacil (2008):

TT

B=b,V=v

=

ˆ@y(b, v, u)

@b

dF

U |B=b,V=v

(u) (3)

where F

U |B=b,V=v

denotes the conditional c.d.f. of the error term U . In the

context of unemployment benefits, this corresponds to the average effect of marginally

increasing unemployment benefits on unemployment duration for individuals with

unemployment benefits B = b and prior income V = v.

The key feature that RK designs exploit is a discrete slope change in the assign-

8

Page 9: APermutationTestfortheRegression Kink Design

ment mechanism of the policy. Let B = b(V ) denote the policy function. In many

unemployment systems, benefits B rise linearly with the prior income V that an in-

dividual earned before becoming unemployed. A maximum level of benefits is also

typically offered for individuals earning above a higher reference income V . This im-

plies that the slope between the policy variable B (unemployment benefits) and the

assignment variable V (prior income) changes discontinuously at V when the prior

income rises above the reference income. To illustrate, the top panel of Figure 2 shows

plots of unemployment benefits plotted against earnings in the previous year based

on Austrian UI data (CLPW). The benefit schedule or policy function B = b(V ) can

be simply described as follows:

@b(V )

@v

=

8>><

>>:

1

, v < V

2

, V < v

(4)

where ↵

1

6= ↵

2

and limv" ¯V db(v)/dv = ↵

1

and limv# ¯V db(v)/dv = ↵

2

.

Researchers can exploit the discrete change in the policy function b(V ) at the

kink point to identify the marginal effect of the policy. Intuitively, if unemployment

benefits have no impact on unemployment duration, one would not expect to see

discontinuous changes in the slope of unemployment duration and prior income at

kink points. However, if unemployment benefits do deter individuals from going back

to work, then one would expect discontinuous changes in the slope of unemployment

duration with respect to prior income at kink points. At the kink, one would expect a

positive slope change as individuals above the reference income receive more generous

benefits and consequently stay unemployed for longer. To study the effect of benefits

B on unemployment duration, researchers can then estimate by how much the slope of

the outcome variable Y — unemployment duration — with respect to the assignment

variable changes discontinuously at such kink points. To illustrate, the bottom panel

in Figure 2 plots a measure of unemployment duration Y against the running variable

9

Page 10: APermutationTestfortheRegression Kink Design

Slope+ - Slope- = 2.22222

.523

23.5

2424

.5Av

erag

e D

aily

UI B

enefi

t

-1800 -900 0 900 1800Base Year Earnings Relative to T-min

First Stage -- UI Benefit

Elasticity of Dur w.r.t. Bens:= 3.8/2.2 = 1.7Slope+ - Slope- = 3.8

4.5

4.55

4.6

4.65

4.7

Log(

Dur

atio

n)

-1800 -900 0 900 1800Base Year Earnings Relative to T-min

Reduced Form -- Job Search Duration

Figure 2: RK Example: UI Benefits in Austria. Notes: The figures are based on Card et al. (2015b).T-min refers to the earnings threshold at which benefits start to rise. Bin number chosen based onthe approach in Calonico et al. (2014a, 2015).

10

Page 11: APermutationTestfortheRegression Kink Design

V , earnings in the previous year, again based on Austrian UI data (CLPW) and

documents an apparent slope change at the kink point.

2.2 Estimation and Identification

The RK estimator, ⌧RK

, is defined in the population as the change in the slope of the

outcome variable at the kink point normalized by the slope change in the policy at

the kink point:

RK

⌘ lim

v#V dE(Y |V=v)/dv�lim

v"V dE(Y |V=v)/dv

lim

v#V db(v)/dv�lim

v"V db(v)/dv

(5)

In the example of unemployment benefits, the denominator of this expression,

i.e. the slope change in the policy variable at the reference income, corresponds to

limv# ¯V dE(Y |V = v)/dv � lim

v" ¯V dE(Y |V = v)/dv = ↵

2

� ↵

1

. This is analogous to

the denominator in fuzzy RD designs which scales up the difference in the level of the

outcome variable at the discontinuity by the difference in the level of the treatment

at the discontinuity.

CLPW prove that this RK estimator in (5) identifies the “treatment on the treated”

parameter in (3) (Florens et al., 2008) for individuals at the kink point under mild

regularity conditions, in particular an assumption of smoothness of y, so that:

RK

=

ˆ@y(b, v, u)

@b

dF

U |B=b,V=V

(u). (6)

Local polynomial regression techniques are used for estimation of ⌧RK

(Fan and

Gijbels, 1996). The data is split into two subsamples to the left and right of the

kink point (denoted by + and -, respectively) and a local polynomial regression is

11

Page 12: APermutationTestfortheRegression Kink Design

estimated separately for each subsample. For the sharp RK design, in which the slope

change in the policy at the kink point is known, this amounts to solving the following

least squares problem in the sample:

min{��

j

}

PN

i=1

{Y �i

�P

p

j=0

�j

(V �i

)j}2K⇣

V

�i

h

min{�+

j

}

PN

+

i=1

{Y +

i

�P

p

j=0

+

j

(V +

i

)j}2K⇣

V

+i

h

subject to �

�0

= �

+

0

p

RK

⌘ (�+

1

� �

�1

)/(↵2

� ↵

1

) (7)

Here, p denotes the order of the polynomial, K the kernel function, and h the band-

width used for estimation. In the literature, the bandwidth is typically chosen based

on the formula in Fan and Gijbels or the procedure in CCT. The denominator of the

left-hand side of equation (7) is identified as �+

1

��

�1

. The papers in the RK literature

have primarily adopted a uniform kernel as choice of K and overwhelming use local

linear and quadratic specifications.

2.3 Asymptotic Bias

A potential problem of RK designs is that non-linearities of the conditional expecta-

tion function E[Y |V = v] can generate bias in the estimator ⌧

P

RK

. Panel 3 of Figure

1 illustrates the intuition of this result as curvature of the conditional expectation

function generates bias of linear RK estimators. A formal argument supporting this

intuition follows from CCT who derive a general formula for the asymptotic bias of

RK and RD estimators. Based on the general formula, the asymptotic misspecifica-

tion bias of local linear RK estimators is proportional to (m(2)

+

+ m

(2)

� )h where h is

the bandwidth and the terms m

(j)

+

and m

(j)

� denote the limits of the j

th derivative

12

Page 13: APermutationTestfortheRegression Kink Design

of m(v) ⌘ E[Y |V = v] from above and below at the kink. A similar expression can

be derived for local quadratic estimators for which first-order bias is proportional to

third-order terms of the conditional mean function. CCT prove that such misspecifi-

cation bias is non-negligible with standard bandwidth selectors and, as a consequence,

leads to poor empirical coverage of the resulting confidence intervals.

3 A Permutation Test for the Regression Kink De-

sign

3.1 The Thought Experiment

We propose a simple permutation test to assess the null hypothesis that treatment

has no effect on the outcome of interest. At the core of our test is the assumption that

the location of the policy kink can be considered as randomly drawn from a known

set of placebo kink points. This assumption needs to be evaluated in the context of

the specific research design under scrutiny. We describe a method for how researchers

can estimate a distribution of placebo kink points in the context of unemployment

insurance systems. In this interval, we can reassign the location of the kink and

calculate RK estimates, ⌧ pRK

, at these placebo kinks. The permutation test assesses

the extremeness of the estimated change in the slope at the kink point relative to

estimated slope changes at non-kink points under the null hypothesis that the policy

does not affect the outcome.

The thought experiment underlying this permutation test and randomization in-

ference more generally is different from the one underlying asymptotic inference.

Whereas the idea underlying asymptotic inference is one of sampling observations

from a large population, the thought experiment in randomization inference is based

on a fixed population that the researcher observes in the data, with the realizations

13

Page 14: APermutationTestfortheRegression Kink Design

of the running variable v and the outcome variable y, in which the assignment of

treatment is sampled repeatedly. In the latter approach, treatment assignment is

thought of as the random variable. Our test therefore does not treat the sample as

being drawn from a (super) population for which we seek inference but rather takes

the observed sample as given and tests hypotheses regarding this particular sample,

treating the location of the policy kink as a random variable.

3.2 The Permutation Test Statistic

This subsection introduces the reduced form RK estimator as the test statistic for the

permutation test, which can be easily adjusted to correspond to researchers’ modeling

choices in a given RK application. We let y denote the vector of yi

values, v denote

the vector of vi

realizations and k denote a potential kink point, with a policy kink

featuring a discontinuous slope change in the policy or a placebo kink not featuring

such a discontinuous slope change. The data are a vector of n observations each

with (yi

, v

i

, b(vi

)) denoting outcome, running variable and policy variable: in the

context of using the RK design to estimate the effect of unemployment benefits on

unemployment duration, these would correspond to unemployment duration, prior

income, and unemployment benefits, respectively.

For notational tractability and expositional clarity, our exposition pertains to the

linear RK model with a uniform kernel. This can be easily generalized to higher-order

polynomials and other kernels. Define the matrix

vk ⌘ v(k) (8)

0

BBBB@

1 (v1

� k) (v1

� k)1(v1

� k)

......

...

1 (vn

� k) (vn

� k)1(vn

� k)

1

CCCCA.

14

Page 15: APermutationTestfortheRegression Kink Design

We define the test statistic for the slope change at the potential kink point k as

T (v,y, k) ⌘ ( 0 0 1)0�vk 0vk

��1

vk 0y, (9)

|vi

� k| h(v,y, k),

where h(v,y, k) denotes the bandwidth used for estimation. This test statistic corre-

sponds to the reduced form of a linear RK estimator. At the true kink point, which

we label k⇤, this estimator—scaled up by the slope change at the policy—identifies

the causal effect of the policy on the “treated”,´

@y(b,v,u)

@b

dF

U |B=b(k

⇤),V=k

⇤,

(u), under the

assumptions laid out in CLPW. We can calculate the test statistic T (v,y, k) at the

true policy kink point k⇤, T (v,y, k⇤), and at other points k 2 [vmin

, v

max

] in the range

of v.

Modeling choices. The test statistic used for the permutation test should corre-

spond to the RK estimator and preferred modeling choices, including bandwidth (or a

bandwidth selection mechanism), polynomial order and bias-correction, implemented

by the researcher for the RK estimator at the actual policy kink. The permutation test

approach can be easily generalized to incorporate alternative RK estimators, poly-

nomial orders, and bandwidth choices. It also can be easily applied to a Regression

Discontinuity application. The Monte Carlo studies we present in Section 4 provide

some guidance for the modeling choices and suggest that estimators based on the

procedure in CCT perform relatively well compared to local polynomial estimators

with bandwidth choice based on Fan and Gijbels (1996).

3.3 The Randomization Assumption

The core assumption underlying our permutation test is that the location of the policy

kink point k

⇤ can be thought of as being randomly drawn:

Assumption: Random Kink Placement. k

⇤ is a realization of a random variable

15

Page 16: APermutationTestfortheRegression Kink Design

K distributed according to a known distribution P .

The assumption that the policy kink location can be thought of as being randomly

drawn is a strong one but natural in the context of many RK designs. It would be

violated if, for instance, policy-makers had chosen a kink location explicitly or im-

plicitly in response to the shape of the conditional expectation function E[Y |V ], e.g.,

at a location where curvature is particularly high or low. For example, KLNS study

an example where the benefit cap was removed beginning in 2001. If policymakers

removed the cap specifically because they observed curvature in unemployment du-

ration near the policy kink, then that would contaminate this analysis. However, the

institutional setup in KLNS makes this type of violation unlikely. We discuss four

implementable strategies for researchers to identify P .

1. Estimation of stochastic process based on institutional features. In the example

of estimating the causal effect of unemployment benefits on unemployment duration ,

researchers implementing the permutation test for the RK design can exploit features

of many unemployment insurance systems to directly estimate the distribution P .

In many unemployment insurance systems, the location of the kink point at which

benefits are capped is determined as a consequence of past aggregate wage growth

in the economy. For instance, in Austria—the setting of the study by CLPW—

the earnings ceiling in the unemployment insurance system changes as a function of

aggregate wage growth from the third to the second previous calendar year (§ 108

Allgemeines Sozialversicherungsgesetz ). Therefore, data on past wage growth can

be used to directly estimate the properties of the stochastic process that determines

the realization of k⇤ in a given year or researchers can directly use the distribution

of past changes in the location of the kink point as a set of potential kink locations

for implementing the permutation test. In Section 4 and Appendix Table A.3, we

describe how we draw from the distribution of realized kink location changes in the

context of the Austrian unemployment insurance system.

16

Page 17: APermutationTestfortheRegression Kink Design

If directly estimating the stochastic process determining k

⇤ is infeasible, then there

is a class of alternative selection mechanisms which involve a discretized set of kinks

on range [v, v].

2. Documentary evidence on rule-making. Researchers can still proxy P by draw-

ing on information on the institutional environment of the relevant RK application.

In the spirit of randomization inference, K should correspond to the actual range

of proposals for kinks that could have been adopted. If, for example, several policy

proposals existed in a political debate regarding the choice of a reference income V

in the example of unemployment insurance we discussed, researchers could use the

discretized range [v, v] that includes all of these proposals. For instance, prior to

switching to a system of automatic updates of the earnings ceiling based on aggre-

gate wage growth in 1969, the German Bundestag adjusted the earnings ceiling in a

discretionary fashion so that the minutes of plenary proceedings can be used to gauge

the range of discussed proposals.

3. Local randomization neighborhood (Cattaneo et al., 2015). In the context of

developing a randomization inference approach for the RD design treating observa-

tions as randomly assigned, Cattaneo et al. (2015) design a data-driven procedure to

select a window around an RD cutoff based on balance tests of pre-treatment covari-

ates in which treatment status is arguably as good as randomly assigned. A natural

extension of their procedure is to treat the location of the cutoff or kink as randomly

assigned within this window.

4. Range of available data. As a final benchmark, we suggest that researchers

consider the whole range of available data [vmin

, v

max

] and treat the empirical dis-

tribution of V as the distribution of K. This follows the approach in Section 4 of

Gelman and Imbens (2014) for selecting pseudo-thresholds in the context of evaluat-

ing RD designs. In the context of using the RK design to estimate the causal effect of

unemployment benefits on unemployment duration this approach is natural because

17

Page 18: APermutationTestfortheRegression Kink Design

there are a wide range of policy kink locations in practice. For example the maximum

weekly UI benefit is a direct function of prior income varies across US states from

$235 to $698.

3.4 Exact Size For Testing the Null Hypothesis of Policy Irrel-

evance

The goal of our permutation test is to assess whether the data reject the null hy-

pothesis that the policy does not affect outcomes. We formalize this as a sharp null

hypothesis where B and V denote the range of the policy and assignment variable,

respectively:

Null Hypothesis: Policy Irrelevance. The policy does not affect outcomes at any

v: dy(b,v,U)

db

= 0, 8b 2 B, 8v 2 V .

Note that this hypothesis implies that in the range the policy is irrelevant: y(b1

, v, U) =

y(b2

, v, U), 8b1

, b

2

2 B, 8v 2 V . Under the Policy Irrelevance Hypothesis and the

Assumption of Random Kink Placement, the distribution of kink estimates over P

corresponds to the exact distribution of possible estimates which could have arisen

had the policy kink been at a different location in the same dataset. Under these

assumptions, we can construct an exact test following the logic of Fisher (1935) and

Pitman (1937).

Proposition 1. Under the Null Hypothesis of Policy Irrelevance and the Random

Kink Location assumption, there exists a test function �(v,y, k) for significance level

↵ which has an exact finite sample level of ↵.

In Appendix B, we follow the structure of a simple proof in Romano (1990) docu-

menting that, under the Null Hypothesis of Policy Irrelevance and the Random Kink

Location assumption, there exists a test function �(v,y, k) for significance level ↵

which has an exact finite sample level of ↵.

18

Page 19: APermutationTestfortheRegression Kink Design

Under the assumption of Random Kink Placement, the Null Hypothesis thus leads

to a testable implication which can be assessed by measuring how unusual a given

realization of the test statistic is at the policy kink. Analogous to the test outlined

above, researchers can also calculate p-values for assessing the likelihood that the Null

Hypothesis is true given the RK estimate at the policy kink k

⇤ and the distribution of

placebo kink estimates. Suppose a researcher had calculated 1000 placebo kink esti-

mates and the estimate at the policy kink k

⇤ were the 20th lowest of these estimates.

Then the two-sided p-value would be calculated to be 4% corresponding to twice the

one-sided p-value of 2%. More generally, the two-sided p-value can be calculated as

twice the minimum of the two one-sided p-values, i.e., the minimum of the fraction

of placebo estimates—including the one at the actual policy kink k

⇤—that are no

greater than or no smaller than the test statistic at the policy kink k

⇤.

4 Applications of the Permutation Test

4.1 The Effect of Unemployment Benefits on Duration: Simu-lation Studies Comparing Asymptotic and RandomizationInference

We simulate data based on two RK applications estimating the effect of unem-

ployment insurance on unemployment duration – CLPW and KLNS; we find that FG

bandwidth choice has good power but bad size and that CCT and the permutation

test have improved size, with sufficient power to reject the null hypothesis in the

KLNS setting. Both of these applications implement our permutation test as part of

their analysis. The running variable in both applications corresponds to measures of

prior income and the outcome variable to a measure of unemployment duration. The

top panel of Figure 3 is based on data from CLPW and shows the relationship between

log unemployment duration and base year earnings at the bottom kink. Similarly,

19

Page 20: APermutationTestfortheRegression Kink Design

the bottom panel of Figure 3 shows the relationship between unemployment duration

and the running variable, prior wages, for the application in KLNS, normalizing the

kink point to zero.

4.45

4.5

4.55

4.6

4.65

4.7

Log

Une

mpl

oym

ent D

urat

ion

0 5000 10000 15000 20000Annual Income Relative to Policy Threshold

Data: CLPW BandwidthData: No-Kink RegionCubic Spline, No KinkCubic Spline, With Kink

Card, Lee, Pei & Weber (2015)

2025

3035

Dur

atio

n of

Une

mpl

oym

ent S

pell

-400 -300 -200 -100 0 100 200 300Daily Wage Relative to Policy Threshold

Data: KLNS BandwidthData: No-Kink RegionCubic Spline, No KinkCubic Spline, With Kink

Kolsrud, Landais, Nilsson & Spinnewijn (2015)

Figure 3: Conditional mean functions for RK applications in Card et al. (2015b) and Kolsrud et al.(2015). The data showing the global relationship between the outcome variable and the runningvariable were shared with the authors by Andrea Weber and Camille Landais. The solid maroonline denotes the fit for a natural cubic spline estimated on the data without allowing for a slopechange at the kink point; the dashed maroon line denotes a natural cubic spline allowing for a slopechange at the kink point.

For our simulation studies based on RK studies on the effect of unemployment

benefits, we draw on data on the global relationship between the outcome and the

running variables. We construct our analysis sample for simulations in two steps:

first we estimate a cubic spline model on these data and second simulate data from

the estimated model. We implement this procedure twice: once with a cubic spline

that assumes there is no discontinuous slope change to evaluate size and again with

20

Page 21: APermutationTestfortheRegression Kink Design

Table 1: Empirical Study: Regression Kink Estimators

Estimate Interval Length Error RateData Kink? Method Mean SD Asymp Permute Error Type Asymp Permute

CLPW No FG 1.58 2.86 3.11 5.58 Type I (Size) 0.51 0.10CLPW No CCT -0.11 7.55 19.21 24.33 Type I (Size) 0.11 0.10KLNS No FG 0.14 0.48 0.79 1.58 Type I (Size) 0.42 0.10KLNS No CCT 0.00 0.42 1.27 1.42 Type I (Size) 0.12 0.10CLPW Yes FG 4.25 1.52 4.15 5.70 Type II (Power) 0.06 0.70CLPW Yes CCT 5.58 14.54 41.58 25.46 Type II (Power) 0.82 0.72KLNS Yes FG -5.99 0.12 0.44 5.46 Type II (Power) 0.00 0.00KLNS Yes CCT -6.36 0.29 0.93 1.88 Type II (Power) 0.00 0.00Note: To compare the false rejection rate (size) and false acceptance rate (power) of asymptotic

and permutation-based methods, we analyze data from two empirical applications which computethe elasticity of unemployment duration with respect to benefits: CLPW (2015) and KLNS (2015).We fit a natural cubic spline to each dataset. To estimate the type I error rate, we assume no kink(rows 1-4), and require that the first derivative is continuous at all of the knots. To estimate thetype II error rate (rows 5-8), we allow the first derivative to change discontinuously at the policykink. We randomly generate 10,000 unemployment durations y = E(y|x) + " with y = E(y|x) fromthe cubic spline and " ⇠ N(0, 0.125). FG uses a local linear estimator with Fan and Gijbels (1996)’sbandwidth, while CCT uses Calonico Cattaneo and Titiunik (2014b)’s local linear estimator withquadratic bias-correction. We set the nominal level of the test to 10%. We reject the null hypothesiswhen the estimate at the kink location is outside the 90% confidence interval, where the intervalis constructed either using standard asymptotic methods or the permutation method described inSection 3 for a set of placebo kinks on [-1400,17800] for CLPW and [400,895] for KLNS. We useone draw of the dataset and 100 placebo kinks to estimate the type I error rate; because there isno slope change by assumption, the type I error rate for an accurate estimation method should be10%. To estimate the type II error rate, we use 50 draws of the dataset.

a modified cubic spline which allows for a discontinuous slope change at the policy

kink to evaluate power. To illustrate, the solid maroon lines in both panels of Figure

3 show the estimated conditional mean function in both applications using a cubic

spline without allowing for a slope change at the kink point; the dashed maroon

lines show a cubic spline fit allowing for discontinuous slope changes at the kink

point. Following the precedent in the RK literature and the modeling choices in

both applications, we test the performance of linear and quadratic estimators and

bandwidth choice procedures based on the approach in Fan and Gijbels (1996) (FG

in the following) and the estimator proposed in CCT and then compare size and power

based on asymptotic and permutation test-based inference for each of these modeling

choices. To implement the permutation test, we estimate 100 equally-spaced placebo

RK estimates on the feasible range of the running variable. We chose a grid of 100

placebo kinks to arrive at round rejection rates; our results remain robust for finer

21

Page 22: APermutationTestfortheRegression Kink Design

grids. In addition, we also implement the permutation test by drawing on information

on the institutional environment and using the past realization of changes in the

earnings ceiling as described in Table A.3 in the Appendix.

Table 1 shows the size of different linear RK estimators under asymptotic and

permutation test-based inference and demonstrates that the conclusions drawn under

asymptotic inference with FG bandwidth choice are not reliable, while CCT’s pro-

cedure and the permutation test achieve exact size. To calculate size, we randomly

generate a dataset with 10,000 unemployment durations from the no-kink cubic spline

using y = E

no�kink

(y|x) + " where " ⇠ N(0, 0.125). At 100 potential kink locations,

we calculate a regression kink estimate and an asymptotic standard error. For asymp-

totic inference, we reject the null hypothesis if the 90% asymptotic confidence interval

excludes zero; for permutation inference, we reject the null hypothesis if the kink is

below the 5th or above the 95th percentile in the placebo kink distribution. To evalu-

ate size, we report the type I error rate of different estimators for tests with nominal

level of 10% in the first four rows of the table. Both linear and quadratic estima-

tors with FG bandwidth choice have substantially higher than nominal coverage rates

(51% and 42%), demonstrating a failure of asymptotic inference for these modeling

choices. In contrast, estimators following CCT’s procedure perform substantially bet-

ter and lead to type I error rates much closer to the nominal level. In all cases, the

type I error rate of the permutation test corresponds to the nominal level of 10%,

illustrating that the permutation test has exact size under the assumption that the

kink location is random (see Proposition 1).

Our analyses of power of the permutation test and asymptotic inference in Table

1 reveals that the permutation test and CCT’s method have lower power compared

to standard asymptotic inference, but still have enough power to reject the null hy-

pothesis in the application of KLNS. To calculate power, we randomly generate a

dataset with 10,000 unemployment durations from the cubic spline with a kink us-

22

Page 23: APermutationTestfortheRegression Kink Design

ing y = E

kink

(y|x) + " where " ⇠ N(0, 0.125). At 100 potential kink locations, we

calculate a regression kink estimate. For asymptotic inference, we accept the null hy-

pothesis if the 90% asymptotic confidence interval at the policy kink includes zero; for

permutation inference, we accept the null hypothesis if the estimate at the policy kink

is within the 5th-95th percentile of the placebo kink distribution.To analyze power,

we report the type II error rate of the different estimators and inference procedures

for simulated data that features a slope change at the kink point in the last four rows

of Table 1. To get a further sense of the relative power of the different estimators

and inference approaches, the table also reports interval lengths for 90% confidence

intervals based on asymptotic standard errors as well as the length of the permutation

test-based intervals, i.e. the average distance between the 5th and 95th placebo esti-

mate. The lowest type II error rates and interval lengths for asymptotic inference are

achieved with FG bandwidth choice—although the short interval sizes come at the

cost of over-rejecting the null hypothesis and inadequate size. Both CCT’s procedure

as well as the permutation test have insufficient power to reject the null hypothesis

in the simulated data based on CLPW. In contrast, the application based on KLNS

demonstrates a setting where CCT’s procedure as well as the permutation test have

enough power to reject the null hypothesis.

We also assess the performance of quadratic and cubic specifications and find that

standard inference achieves lower type I and higher type II error rates as the order of

the local polynomial model increases. Table A.2 in the Appendix demonstrates quali-

tatively similar results for quadratic specifications, while Table A.4 demonstrates that

cubic specifications with standard inference also have lower type I error rates. Finally,

Table A.3, which reports error rates of the permutation test using the past realization

of changes in the earnings ceiling as the distribution of placebo kink locations—as

proposed in Section 3.3—illustrates that the permutation test achieves exact size,

albeit with higher type II error rates.

23

Page 24: APermutationTestfortheRegression Kink Design

Table 2: Simulation Study: Size of Regression Kink Estimators

Estimate Interval Length Type I Error RateDGP Method Mean SD Asymp Permute Asymp Permute

1 FG 0.01 0.15 0.37 0.54 0.18 0.101 CCT -0.07 0.53 1.48 1.89 0.11 0.102 FG -0.92 1.72 2.31 5.91 0.60 0.102 CCT 0.08 1.76 4.54 6.32 0.25 0.103 FG -2.62 19.98 2.63 59.24 0.94 0.103 CCT -0.37 6.24 16.24 20.13 0.23 0.10Note: To compare the false rejection rate (size) of asymptotic and permutation-based methods, we

analyze the data-generating processes displayed in Figure A.1. For every DGP, we randomly generate10,000 observations with x distributed uniformly on [-2,2] and y = E(y|x) + " with " ⇠ N(0, 0.25).FG uses a local linear estimator with Fan and Gijbels (1996)’s bandwidth, while CCT uses CalonicoCattaneo and Titiunik (2014b)’s estimator. We set the nominal level of the test to 10%. We rejectthe null hypothesis when the estimate at the kink location is outside the 90% confidence interval,where the interval is constructed either using standard asymptotic methods or the permutationmethod described in Section 3 for a set of 100 placebo kinks on [-1,1]. In this setting, the nullhypothesis is true by assumption and the type I error rate for an accurate estimation method shouldbe 10%.

Taken together, the results show that linear and quadratic estimators with FG

bandwidth choice over-reject the null hypothesis in data simulated based on actual

RK applications, while CCT’s estimator as well as permutation test-based inference

lead to type I error rates closer to the nominal level. In the application based on

KLNS, both CCT and the permutation test have enough power to detect an effect of

unemployment benefits on unemployment duration.

4.2 Comparing Standard and Randomization Inference: Addi-tional Simulation Studies

To understand where our approach has power, we extend the analysis to three

artificial data-generating processes (DGPs). These simulations show that there is a

spectrum where asymptotic inference using FG bandwidth substantially over-rejects,

CCT does better on size and worse on power and the permutation test performs best

on size but worst on power. In general, the power of RK estimators is worst when

24

Page 25: APermutationTestfortheRegression Kink Design

the DGP is highly non-linear relative to the effect size studied. The conditional mean

functions for the linear and non-linear DGPs that we analyze are displayed in Figure

A.1 in the Appendix. For each of these conditional mean functions, we simulate data

with and without a kink at zero analogous to the procedure in the previous section

and report the same set of statistics as in Table 1. DGP 1 is a linear function and

piece-wise linear in the specification with a kink. DGP 2 is based on a combination

of trigonometric polynomial and exponential functions functions with and without

kinks. DGP 3 follows a sine function.

Table 2 reports the type I error rate of linear RK estimators under asymptotic

and permutation test-based inference. In line with the previous section, the results

document that asymptotic inference with FG bandwidth choice leads to substantial

overrejection of the null hypothesis—type I error rates greater than 60%—when the

DGP features non-linearity (DGPs 2 and 3); it performs relatively well when the

DGP is linear—in the case of DGP 1—with a type I error rate of 0.18. Asymptotic

inference based on CCT’s procedure outperforms the estimator with FG bandwidth,

but leads to overrejection of the null hypothesis in some settings with non-linearity

(DGPs 2 and 3). Inference based on the permutation test, randomly drawing from

100 placebo kink locations on the interval [�1, 1], achieves type I error rates at the

nominal level of the test. In Appendix Table A.4 and A.5 we repeat the exercise

for the case of quadratic and cubic RK estimators. The relative performance of

these estimators with asymptotic inference improves relative to the case of linear RK

estimators, but still stays above nominal in many cases. Again, the permutation test

leads to empirical coverage at the nominal level.

Table 3 reports the type II error rate of linear RK estimators under asymptotic

and permutation test-based inference for slope changes of 5 and 20 at the kink point

and shows that the permutation test suffers from lower power for highly non-linear

DGPs. Asymptotic inference with RK estimators relying on the bandwidth choice

25

Page 26: APermutationTestfortheRegression Kink Design

Table 3: Simulation Study: Power of Regression Kink Estimators

True Estimate Interval Length Type II Error RateDGP Kink Size Method Mean SD Asymp Permute Asymp Permute

1 5 FG 4.98 0.16 0.36 5.70 0.00 0.121 5 CCT 5.02 0.58 1.44 6.14 0.00 0.022 5 FG 4.23 1.72 2.26 8.12 0.02 0.542 5 CCT 4.97 2.20 4.44 7.39 0.12 0.303 5 FG 3.98 20.87 2.62 60.03 0.02 0.863 5 CCT 5.11 5.83 16.35 19.47 0.70 0.781 20 FG 20.01 0.14 0.40 20.11 0.00 0.001 20 CCT 20.02 0.50 1.45 14.30 0.00 0.002 20 FG 18.81 1.83 2.22 19.63 0.00 0.042 20 CCT 20.00 2.34 4.32 14.64 0.00 0.003 20 FG 18.61 21.05 2.65 62.56 0.08 0.763 20 CCT 18.87 5.40 16.64 21.38 0.02 0.06Note: To compare the false acceptance rate (power) of asymptotic and permutation-based methods,we analyze the data-generating processes displayed in Figure A.1. For every DGP, we randomlygenerate 10,000 observations with x distributed uniformly on [-2,2] and y = E(y|x) + " with " ⇠N(0, 0.25). FG uses a local linear estimator with Fan and Gijbels (1996)’s bandwidth, while CCTuses Calonico Cattaneo and Titiunik (2014b)’s estimator. In our baseline specification, we randomlychoose a kink location on [-1,1] with a kink size specified in column 2. We set the nominal level of thetest to 10%. We accept the null hypothesis when the 90% confidence interval includes zero, wherethe interval is constructed either using standard asymptotic methods or the permutation methoddescribed in Section 3 for a set of 100 placebo kinks on [-1,1]. This process is repeated 50 times toestimate a type II error rate.

26

Page 27: APermutationTestfortheRegression Kink Design

procedure in FG leads to the lowest type II error rates and has small interval lengths

compared to asymptotic inference based on CCT as well as the permutation test.

Yet, comparing the mean of the estimators across specifications reveals that CCT’s

estimator is much less biased, in particular in the case of non-linear DGPs. The

results also document that the power of the permutation test is particularly low

when the DGP is very non-linear. Curvature makes the implementation of the RK

design particularly problematic due to misspecification bias as discussed in section

2.3 and pointed out by CCT. In the case of highly non-linear DGPs, point estimates

are also much more dispersed as documented by the standard deviation and longer

intervals for the permutation test. In Appendix Tables A.4 (last six rows) and A.6

we repeat the exercise for the case of quadratic and cubic estimators.

4.3 Applying the Permutation Test to Existing RD Applica-tions

In this setting we briefly document results of applying the permutation test to RD

designs based on two well-known studies by Lee (2008) and Ludwig et al. (2007) and

illustrate that standard asymptotic inference and the permutation test deliver similar

conclusions in the RD setting. We display the conditional mean functions in Figure

A.2 in the Appendix. To apply our analysis to an RD setting, we modify equations

(8) and (9) to allow for an intercept shift at the discontinuity:

vk ⌘ (

1 1(v1 � k) (v1 � k) (v1 � k)1(v1 � k)

.

.

.

.

.

.

.

.

.

.

.

.

1 1(v1 � k) (v

n

� k) (v

n

� k)1(vn

� k)

) (10)

T (v,y, k) ⌘ (

0 1 0 0)

0⇣vk 0vk

⌘�1vk 0y, (11)

The analysis of size and power of asymptotic and permutation test-based inference

follows the same structure as in Sections 4.1 and 4.2 and we report the results of

our simulations in Table 4, which also features randomization inference for the RD

design based on the approach in Cattaneo et al. (2015). The simulation studies reveal

27

Page 28: APermutationTestfortheRegression Kink Design

Table 4: Empirical Study: Regression Discontinuity Estimator

Estimate Interval Length Error RateData Jump? Method Mean SD Asymp Permute Error Asymp Permute

Lee No IK 0.00 0.11 0.31 0.35 Type I (Size) 0.11 0.10Lee No CCT -0.00 0.05 0.13 0.16 Type I (Size) 0.12 0.10LM No IK 0.11 1.32 3.94 4.23 Type I (Size) 0.12 0.10LM No CCT -0.03 1.49 4.61 4.88 Type I (Size) 0.09 0.10Lee Yes IK 0.46 0.09 0.33 0.25 Type II (Power) 0.00 0.00Lee Yes CCT 0.44 0.04 0.12 0.20 Type II (Power) 0.00 0.00LM Yes IK -1.84 1.37 4.19 4.21 Type II (Power) 0.54 0.68LM Yes CCT -1.04 1.75 4.87 4.88 Type II (Power) 0.78 0.86Note: To compare the false rejection rate (size) and false acceptance rate (power) of asymptotic

and permutation-based methods, we analyze data from Lee (2008) and Ludwig-Miller (LM, 2007).For the ’Yes’ Jump rows, we estimate a natural cubic spline, allowing for a jump in the intercept inthe knot at the policy discontinuity. For the ’No’ Jump rows, we take the true DGP and subtractthe estimated jump at the policy discontinuity so that the conditional mean function is continuousat zero. For Lee, we randomly generate results from 10,195 elections with a victory probabilityequal to the predicted mean of the cubic spline. For LM, we randomly generate a mortality rate in2,810 counties using a predicted mean of the cubic spline and a standard deviation of 5.7. IK usesa local linear estimator with Imbens-Kalyanaraman (2012)’s bandwidth, while CCT uses CalonicoCattaneo and Titiunik (2014b)’s local linear estimator with quadratic bias-correction. We computetwo-sided asymptotic p-values and permutation-based p-values from placebo kinks on [-48,47] forLee and [-40,9.5] for LM using the method described in Section 3. We use one draw of the datasetand 100 placebo kinks to compute a type I error rate. We use 50 draws of the dataset to compute atype II error rate. We also implemented the Cattaneo, Frandsen and Titiunik (2015) randomizationinference test for 50 simulated datasets based on Lee (window of 1) and LM (window of 1.1, followingCattaneo, Titiunik and Vasquez-Bare 2016). With nominal size of 10%, the type I error rates were0.16 and 0.56 for Lee and LM respectively. The type II error rates were 0.00 and 0.86 for Lee andLM respectively.

that for both RD applications, the permutation test and asymptotic inference lead to

actual size close to nominal size and have comparable type II error rates.

5 Conclusion

We have developed a permutation test for the regression kink design and document

its performance compared to standard asymptotic inference. The thought experiment

underlying our test differs from the one of standard asymptotic inference which has is

based on the thought experiment of drawing observations from a large population so

that standard errors reflect sampling uncertainty. Our test follows the randomization

inference approach in taking the sample as given and takes the assignment of treat-

ment, here the location of the kink point, as a random variable and thus the source

of uncertainty. The test can offer a complement to existing methods and will be more

28

Page 29: APermutationTestfortheRegression Kink Design

robust than asymptotic inference with standard bandwidth choice in the presence of

non-linearity which is ubiquitous in many of the settings in which RK designs are

applied. Based on the results of our simulation studies, we recommend that prac-

titioners: (1) avoid using linear and quadratic RK estimators with FG bandwidth

choice, (2) use CCT’s robust procedure as preferred procedure for estimating kinks,

(3) use the distribution of placebo estimates to assess whether they will have power

to detect economically meaningful results in their context, (3) report p-values con-

structed by comparing their point estimate to the distribution of placebo estimates

and (4) report the robustness of the permutation test to different assumptions on the

placebo kink distribution.

References

Abadie, A., Athey, S., Imbens, G., and Wooldridge, J. “Finite Population Causal StandardErrors.” Working Paper (2014).

Abadie, A., Diamond, A., and Hainmueller, J. “Synthetic control methods for comparativecase studies: Estimating the effect of California’s tobacco control program.” Journal ofthe American Statistical Association, 105(490) (2010).

Ando, M. “How Much Should We Trust Regression-Kink-Design Estimates?” Departmentof Economics Working Paper 2013:22 Uppsala University (2013).

Bertrand, M., Duflo, E., and Mullainathan, S. “How Much Should We Trust Differences-in-Differences Estimates?” Quarterly Journal of Economics, 119:249–75 (2004).

Blouin, A. “Culture and Contracts: The Historical Legacy of Forced Labour.” University ofWarwick Working Paper (2013).

Böckerman, P., Kanninen, O., and Suoniemi, I. “A Kink that Makes You Sick: The IncentiveEffect of Sick Pay on Absence.” IZA DP No. 8205 (2014).

Bravo, J. “The Effects of Intergovernmental Grants on Local Revenue: Evidence from Chile.”Documentos de Trabajo (Instituto de Economía PUC), (393):1 (2011).

Britto, D. G. C. d. “Unemployment Insurance and the Duration of Employment: Evidencefrom a Regression Kink Design.” SSRN Working Paper 2648166 (2015).

Bulman, G. B. and Hoxby, C. “The Returns to the Federal Tax Credits for Higher Educa-tion.” In Tax Policy and the Economy , volume 29, 13–88. University of Chicago Press(2015).

29

Page 30: APermutationTestfortheRegression Kink Design

Calonico, S., Cattaneo, M. D., and Titiunik, R. “Robust data-driven inference in theregression-discontinuity design.” Stata Journal , 14(4):909–946 (2014a).

—. “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.”Econometrica, 82(6):2295–2326 (2014b).

—. “Optimal Data-Driven Regression Discontinuity Plots.” Journal of the American Statis-tical Association, 110(512):1753–1769 (2015).

Card, D., Johnston, A., Leung, P., Mas, A., and Pei, Z. “The Effect of UnemploymentBenefits on the Duration of Unemployment Insurance Receipt: New Evidence from aRegression Kink Design in Missouri, 2003-2013.” American Economic Review , 105(5):126–30 (2015a).

Card, D., Lee, D. S., Pei, Z., and Weber, A. “Inference on Causal Effects in a GeneralizedRegression Kink Design.” Econometrica, 83(6):2463–2483 (2015b).

Cattaneo, M., Frandsen, B., and Titiunik, R. “Randomization Inference in the RegressionDiscontinuity Design: An Application to Party Advantages in the U.S. Senate.” Journalof Causal Inference, 3(1):1–24 (2015).

Cattaneo, M., Titiunik, R., and Vazquez-Bare, G. “Comparing Inference Approaches forRD Designs: A Reexamination of the Effect of Head Start on Child Mortality,.” WorkingPaper (2016).

Chetty, R., Looney, A., and Kroft, K. “Salience and Taxation: Theory and Evidence.” TheAmerican Economic Review , 99(4):1145–1177 (2009).

Conlin, M. and Thompson, P. N. “Impacts of New School Facility Construction: An Analysisof a State-Financed Capital Subsidy Program in Ohio.” (2015).

Dahlberg, M., Mörk, E., Rattsø, J., and Ågren, H. “Using a discontinuous grant rule toidentify the effect of grants on local taxes and spending.” Journal of Public Economics,92(12):2320–2335 (2008).

d’Astous, P. and Shore, S. H. “Liquidity Constraints and Credit Card Delinquency: Evidencefrom Raising Minimum Payments.” Working Paper (2015).

Dobbie, W. and Skiba, P. M. “Information Asymmetries in Consumer Credit Markets: Evi-dence from Payday Lending.” American Economic Journal: Applied Economics , 5(4):256–282 (2013).

Dobridge, C. L. “Fiscal Stimulus and Firms: A Tale of Two Recessions.” Working Paper(2016).

Dong, Y. “Jump or Kink? Identifying Education Effects by Regression Discontinuity Designwithout the Discontinuity.” Working Paper (2016).

Engels, B., Geyer, J., and Haan, P. “Labor Supply and the Pension System-Evidence froma Regression Kink Design.” Netspar Discussion Paper (2015).

30

Page 31: APermutationTestfortheRegression Kink Design

Engström, P., Nordblom, K., Ohlsson, H., and Persson, A. “Tax compliance and loss aver-sion.” American Economic Journal , 7(4):132–164 (2015).

Fan, J. and Gijbels, I. Local Polynomial Modelling and its Applications., volume 66. Chap-man and Hall (1996).

Fe, E. and Hollingsworth, B. “Estimating the effect of retirement on mental health via paneldiscontinuity designs.” Working Paper (2012).

Fidrmuc, J. and Tena, J. d. D. “National minimum wage and employment of young workersin the UK.” CESifo Working Paper, No. 4286 (2013).

Fisher, R. The Design of Experiments. Oliver and Boyd, Oxford, England (1935).

Florens, J.-P., Heckman, J. J., Meghir, C., and Vytlacil, E. “Identification of treatmenteffects using control functions in models with continuous, endogenous treatment and het-erogeneous effects.” Econometrica, 76(5):1191–1206 (2008).

Fritzdixon, K. and Skiba, P. M. “The Consequences of Online Payday Lending.” WorkingPaper (2016).

Garmann, S. “The Causal Effect of Coalition Governments on Fiscal Policies: Evidence froma Regression Kink Design.” Applied Economics , 46(36):4490–4507 (2014).

Gelber, A., Moore, T., and Strand, A. “The Effect of Disability Insurance on BeneficiariesMortality.” GSPP Working Paper (2015).

Gelman, A. and Imbens, G. “Why high-order polynomials should not be used in regressiondiscontinuity designs.” National Bureau of Economic Research Working Paper (2014).

Hanson, A. “The Incidence of the Mortgage Interest Deduction: Evidence from the Marketfor Home Purchase Loans.” Public Finance Review , 40(3):339–359 (2012).

Ho, D. E. and Imai, K. “Randomization inference with natural experiments: An analysis ofballot effects in the 2003 California recall election.” Journal of the American StatisticalAssociation, 101(475):888–900 (2006).

Huang, P.-C. and Yang, T.-T. “Evaluation of Optimal Unemployment Insurance with Reem-ployment Bonuses Using Regression Discontinuity (Kink) Design.” Working Paper (2016).

Imbens, G. and Kalyanaraman, K. “Optimal bandwidth choice for the regression disconti-nuity estimator.” The Review of Economic Studies, 79(3):933–959 (2012).

Imbens, G. W. and Lemieux, T. “Regression discontinuity designs: A guide to practice.”Journal of Econometrics, 142(2):615–635 (2008).

Imbens, G. W. and Rosenbaum, P. R. “Robust, accurate confidence intervals with a weakinstrument: quarter of birth and education.” Journal of the Royal Statistical Society:Series A (Statistics in Society), 168(1):109–126 (2005).

Jones, M. R. “The EITC and Labor Supply: Evidence from a Regression Kink Design.”Washington, DC: Center for Administrative Records Research and Applications, US Cen-sus Bureau (2013).

31

Page 32: APermutationTestfortheRegression Kink Design

Kisin, R. and Manela, A. “Funding and Incentives of Regulators: Evidence from Banking.”SSRN Working Paper (2014).

Kolsrud, J. “Precaution versus Risk Aversion: Decomposing the effect of UnemploymentBenefits on Saving.” Working Paper (2013).

Kolsrud, J., Landais, C., Nilsson, P., and Spinnewijn, J. “The Optimal Timing of Unem-ployment Benefits: Theory and Evidence from Sweden.” IZA Discussion Paper 9185(2015).

Kristensen, S. R., Fe, E., Bech, M., and Mainz, J. “Is the quality of hospital care pricesensitive? Regression kink estimates from a volume dependent price setting scheme.”COHERE-Centre of Health Economics Research, University of Southern Denmark (2013).

Kyyrä, T. and Pesola, H. “The effects of unemployment insurance benefits on subsequentlabor market outcomes: Evidence from an RKD approach.” (2015).

Landais, C. “Assessing the Welfare Effects of Unemployment Benefits Using the RegressionKink Design.” American Economic Journal: Economic Policy , 7(4):243–278 (2015).

Lee, D. S. “Randomized experiments from non-random selection in US House elections.”Journal of Econometrics, 142(2):675–697 (2008).

Lehmann, E. and Stein, C. “On the theory of some non-parametric hypotheses.” The Annalsof Mathematical Statistics, 20(1):28–45 (1949).

Ludwig, J., Miller, D. L., et al. “Does Head Start Improve Children’s Life Chances? Ev-idence from a Regression Discontinuity Design.” The Quarterly Journal of Economics,122(1):159–208 (2007).

Lundqvist, H., Dahlberg, M., and Mörk, E. “Stimulating local public employment: Do gen-eral grants work?” American Economic Journal: Economic Policy , 6(1):167–92 (2014).

Manoli, D. and Turner, N. “Cash-on-Hand and College Enrollment: Evidence from Popula-tion Tax Data and Policy Nonlinearities.” NBER Working Paper 19836 (2014).

Marx, B. and Turner, L. “Borrowing Trouble? Human Capital Investment with Opt-InCosts and Implications for the Effectiveness of Grant Aid.” Working Paper (2015).

Messacar, D. “Do Workplace Pensions Crowd Out Other Retirement Savings? Evidencefrom Canadian Tax Records.” Statistics Canada Working Paper (2015).

Nielsen, H. S., Sørensen, T., and Taber, C. R. “Estimating the effect of student aid oncollege enrollment: Evidence from a government grant policy reform.” American EconomicJournal: Economic Policy , 2:185–215 (2010).

Peck, J. R. “Can Hiring Quotas Work? The Effect of the Nitaqat Program on the SaudiPrivate Sector.” Working Paper, MIT (2014).

Pitman, E. “Significance Tests Which May be Applied to Samples From any Populations.”Supplement to the Journal of the Royal Statistical Society , 4(1):119–130 (1937).

32

Page 33: APermutationTestfortheRegression Kink Design

Romano, J. P. “On the behavior of randomization tests without a group invariance assump-tion.” Journal of the American Statistical Association, 85(411):686–692 (1990).

Rosenbaum, P. R. “Stability in the absence of treatment.” Journal of the American StatisticalAssociation, 96(453):210–219 (2001).

—. Observational studies. New York: Springer, 2 edition (2002).

Scharlemann, T. C. and Shore, S. H. “The Effect of Negative Equity on Mortgage Default:Evidence from HAMP PRA.” Working Paper (2015).

Seim, D. “Behavioral Responses to an Annual Wealth Tax: Evidence from Sweden.” WorkingPaper, University of Toronto (2014).

Simonsen, M., Skipper, L., and Skipper, N. “Price sensitivity of demand for prescriptiondrugs: Exploiting a regression kink design.” Journal of Applied Econometrics, 31(2)(2016).

Sovago, S. “The effect of the UI benefit on labor market outcomes-regression kink evidencefrom the Netherlands.” Working Paper (2015).

Sukhatme, N. U. and Cramer, J. N. “How Much Do Patent Applicants Care About PatentTerm? Cross-Industry Differences in Term Sensitivity.” Working Paper, Princeton Uni-versity (2014).

Thistlethwaite, D. L. and Campbell, D. T. “Regression-discontinuity analysis: An alternativeto the ex post facto experiment.” Journal of Educational psychology , 51(6):309 (1960).

Turner, L. J. “The Road to Pell is Paved with Good Intentions: The Economic Incidence ofFederal Student Grant Aid.” Working Paper University of Maryland (2014).

Welch, W. J. “Construction of permutation tests.” Journal of the American StatisticalAssociation, 85(411):693–698 (1990).

Welch, W. J. and Gutierrez, L. G. “Robust permutation tests for matched-pairs designs.”Journal of the American Statistical Association, 83(402):450–455 (1988).

Wong, M. “Estimating Ethnic Preferences Using Ethnic Housing Quotas in Singapore.” TheReview of Economic Studies, 80(3):1178–1214 (2013).

33

Page 34: APermutationTestfortheRegression Kink Design

A Appendix: Tables

Tabl

eA

.1:

Ove

rvie

wof

Exist

ing

RK

Pape

rs

Pape

r:Po

licy

Varia

ble

Out

com

eVa

riabl

ePr

efer

red

Poly

nom

ial/

Estim

atio

n

And

o(2

013)

Fede

ralS

ubsid

yG

ov’t

Expe

nditu

reLi

near

/Qua

drat

icB

loui

n(2

013)

Forc

edLa

bor

Inte

reth

nic

Trus

tC

ubic

/Qui

ntic

Böc

kerm

an,K

anni

nen,

and

Suon

iem

i(20

14)

Sick

ness

Insu

ranc

eD

urat

ion

ofSi

ckne

ssA

bsen

ceLi

near

Bra

vo(2

011)

Fede

ralS

ubsid

yLo

calR

even

ueQ

uadr

atic

Brit

to(2

015)

UIb

enefi

tsEm

ploy

men

tdu

ratio

nLi

near

/Qua

drat

icB

ulm

anan

dH

oxby

(201

5)Fe

dera

lTax

Cre

dits

Col

lege

Att

enda

nce

Line

ar/Q

uad/

Cub

icC

ard

etal

.(20

15b)

UIB

enefi

tsU

nem

ploy

men

tD

urat

ion

Line

ar/Q

uadr

atic

Car

d,Jo

hnst

on,L

eung

,Mas

,and

Pei(

2015

a)U

IBen

efits

Une

mpl

oym

ent

Dur

atio

nLi

near

/Qua

drat

ic/C

CT

bias

-cor

rect

edC

onlin

and

Tho

mps

on(2

015)

Cap

italS

ubsid

yTa

xR

even

ue,T

est

scor

esQ

uadr

atic

/Cub

icd’

Ast

ous

and

Shor

e(2

015)

Min

imum

Paym

ents

Del

inqu

ency

Line

ar(D

iff-in

-Diff

)D

ahlb

erg,

Mör

k,R

atts

ø,an

gren

(200

8)Fe

dera

lGra

ntG

ov’t

Expe

nditu

reQ

uadr

atic

Dob

bie

and

Skib

a(2

013)

Payc

heck

Loan

Def

ault

Line

arD

obrid

ge(2

016)

Tax

Ref

und

Inve

stm

ent,

Cas

hH

oldi

ngQ

uadr

atic

Don

g(2

016)

Ret

irem

ent

Food

Expe

nditu

reQ

uadr

atic

Enge

lset

al.(

2015

)Pe

nsio

nde

duct

ions

Labo

rsu

pply

Line

arEn

gstr

öm,N

ordb

lom

,Ohl

sson

,and

Pers

son

(201

5)Ta

xLi

abili

tyTa

xB

ehav

ior

Line

ar,Q

uadr

atic

,Cub

icFe

and

Hol

lings

wort

h(2

012)

Ret

irem

ent

Hea

lthO

utco

mes

Qui

ntic

Fidr

muc

and

Tena

(201

3)M

inim

umW

age

Empl

oym

ent

Stat

usQ

uadr

atic

Fritz

dixo

nan

dSk

iba

(201

6)Lo

ansiz

eLo

anre

paym

ent

Line

ar,Q

uadr

atic

,Cub

icG

arm

ann

(201

4)Vo

teSh

are

Mun

icip

alEx

pend

iture

/Tax

esC

ubic

to6t

hor

der

poly

n.G

elbe

ret

al.(

2015

)D

isabi

lity

Insu

ranc

eEa

rnin

gsLi

near

,Qua

drat

ic,C

ubic

Han

son

(201

2)Lo

anA

mou

ntIn

tere

stR

ates

Line

arH

uang

and

Yang

(201

6)R

eem

ploy

men

tbo

nus

Labo

rsu

pply

Line

arJo

nes

(201

3)EI

TC

Hou

rsW

orke

dLi

near

/Qua

drat

icK

isin

and

Man

ela

(201

4)Fe

espa

idby

bank

sLe

nien

cyof

regu

latio

nQ

uadr

atic

/Cub

icK

olsr

ud(2

013)

UIB

enefi

tsA

sset

Acc

umul

atio

nLi

near

/Qua

drat

icK

olsr

udet

al.(

2015

)U

IBen

efits

Une

mpl

oym

ent

dura

tion

Line

ar/Q

uadr

atic

/Cub

icK

riste

nsen

,Fe,

Bec

h,an

dM

ainz

(201

3)R

eim

burs

emen

tR

ate

Hos

pita

lPro

cess

Qua

lity

qMLE

Kyy

räan

dPe

sola

(201

5)U

Iben

efits

Empl

oym

ent/

earn

ings

Line

ar/Q

uadr

atic

Land

ais

(201

5)U

IBen

efits

Sear

chD

urat

ion

Line

arLu

ndqv

ist,D

ahlb

erg,

and

Mör

k(2

014)

Fede

ralS

ubsid

yLo

calE

mpl

oym

ent

Qua

drat

icM

anol

iand

Turn

er(2

014)

EIT

CC

olle

geEn

rollm

ent

Line

arM

arx

and

Turn

er(2

015)

Col

lege

Subs

idy

Stud

ent

Loan

sLi

near

Mes

saca

r(2

015)

Pens

ion

Plan

Ret

irem

ent

Savi

ngLi

near

Nie

lsen,

Søre

nsen

,and

Tabe

r(2

010)

Col

lege

Subs

idy

Enro

llmen

tLi

near

Peck

(201

4)%

Empl

oyee

sH

iring

Line

ar/Q

uadr

atic

Scha

rlem

ann

and

Shor

e(2

015)

Mor

tgag

eB

alan

ceR

educ

tion

Mor

tgag

eD

elin

quen

cyLi

near

Seim

(201

4)Ta

xLi

abili

tyW

ealth

Acc

umul

atio

nLi

near

Sim

onse

n,Sk

ippe

r,an

dSk

ippe

r(2

016)

Dru

gSu

bsid

yD

rug

Expe

nditu

reLi

near

Sova

go(2

015)

UIb

enefi

tsU

nem

ploy

men

tdu

ratio

n/re

-em

ploy

men

tjo

bqu

ality

Line

ar/Q

uadr

atic

Sukh

atm

ean

dC

ram

er(2

014)

App

licat

ion

Pend

ency

ofPa

tent

Stoc

kPr

ice

Line

arTu

rner

(201

4)C

olle

geSu

bsid

yEn

rollm

ent

Line

arW

ong

(201

3)%

Ethn

icgr

oups

Hou

sing

Pric

esQ

uart

ic

34

Page 35: APermutationTestfortheRegression Kink Design

Table A.2: Empirical Study: Quadratic Regression Kink Estimators

Estimate Interval Length Error RateData Kink? Method Mean SD Asymp Permute Error Type Asymp Permute

CLPW No FG -0.40 3.75 6.16 8.89 Type I (Size) 0.36 0.10CLPW No CCT 0.43 16.05 38.67 45.35 Type I (Size) 0.09 0.10KLNS No FG 0.04 0.42 0.58 1.08 Type I (Size) 0.24 0.10KLNS No CCT 0.01 0.40 1.13 1.27 Type I (Size) 0.11 0.10CLPW Yes FG 4.74 3.01 9.91 9.97 Type II (Power) 0.56 0.68CLPW Yes CCT 4.65 27.99 78.28 53.26 Type II (Power) 0.86 0.86KLNS Yes FG -6.72 0.05 0.18 6.73 Type II (Power) 0.00 0.00KLNS Yes CCT -6.31 0.33 0.90 2.92 Type II (Power) 0.00 0.00Note: To compare the false rejection rate (size) and false acceptance rate (power) of asymptotic

and permutation-based methods, we analyze data from two empirical applications which computethe elasticity of unemployment duration with respect to benefits: CLPW (2015) and KLNS (2015).We fit a natural cubic spline to each dataset. To estimate the type I error rate, we assume no kink(rows 1-4), and require that the first derivative is continuous at all of the knots. To estimate thetype II error rate (rows 5-8), we allow the first derivative to change discontinuously at the policykink. We randomly generate 10,000 unemployment durations y = E(y|x) + " with y = E(y|x) fromthe cubic spline and " ⇠ N(0, 0.125). FG uses a local linear estimator with Fan and Gijbels (1996)’sbandwidth, while CCT uses Calonico Cattaneo and Titiunik (2014b)’s local linear estimator withquadratic bias-correction. We set the nominal level of the test to 10%. We reject the null hypothesiswhen the estimate at the kink location is outside the 90% confidence interval, where the intervalis constructed either using standard asymptotic methods or the permutation method described inSection 3 for a set of placebo kinks on [-1400,17800] for CLPW and [400,895] for KLNS. We useone draw of the dataset and 100 placebo kinks to estimate the type I error rate; because there isno slope change by assumption, the type I error rate for an accurate estimation method should be10%. To estimate the type II error rate, we use 50 draws of the dataset.

35

Page 36: APermutationTestfortheRegression Kink Design

Table A.3: Empirical Study: Regression Kink with Placebo Kink Locations ChosenFrom Data

Estimate Interval Length Error RateKink? Method? Mean SD Asymp Permute Error Asymp Permute

No FG 1.14 2.36 6.45 6.84 Type I (Size) 0.38 0.10No CCT 1.38 6.75 30.58 13.60 Type I (Size) 0.05 0.10Yes FG 4.45 1.35 4.02 6.30 Type II (Power) 0.04 0.80Yes CCT 5.04 14.04 39.41 27.16 Type II (Power) 0.82 0.48Note: To compare the false rejection rate (size) and false acceptance rate (power) of asymptotic

and permutation-based methods, we analyze data from an empirical application which computes theelasticity of unemployment duration with respect to benefits: CLPW (2015). We fit a natural cubicspline to each dataset. To estimate the type I error rate, we assume no kink, and require that thefirst derivative is continuous at all of the knots. To estimate the type II error rate, we allow the firstderivative to change discontinuously at the policy kink. We randomly generate 10,000 unemploymentdurations y = E(y|x)+" with y = E(y|x) from the cubic spline and " ⇠ N(0, 0.125). FG uses a locallinear estimator with Fan and Gijbels (1996)’s bandwidth, while CCT uses Calonico, Cattaneo, andTitiunik (2014b)’s local linear estimator with quadratic bias-correction. We set the nominal levelof the test to 10%. We reject the null hypothesis when the estimate at the kink location is outsidethe 90% confidence interval, where the interval is constructed either using standard asymptoticmethods or the permutation method described in Section 3 for a set of placebo kinks. Specifically,we use historical data on changes in the earnings ceiling of the Austrian unemployment insurancesystem from 1996 onwards. Per law, the earnings ceiling moves as a function of past aggregate wagegrowth (§ 108 Allgemeines Sozialversicherungsgesetz). For each year from to 2016, we calculate thenominal change in the earnings ceiling and normalize them by the average change from 1996 to 2016.We then draw from this distribution of changes in the earnings ceiling to simulate counterfactualkink locations by adding a given draw to the actual realization of the kink point, here correspondingto the lower kink in the unemployment insurance system. We use one draw of the dataset and 100placebo kinks to estimate the type I error rate; because there is no slope change by assumption,the type I error rate for an accurate estimation method should be 10%. To estimate the type IIerror rate, we use 50 draws of the dataset and test for an effect size as indicated by the actual slopechange based on a natural cubic spline specification (see Section 4 for more details).

36

Page 37: APermutationTestfortheRegression Kink Design

Table A.4: Regression Kink with Cubic Specification

Estimate Interval Length Error RateData Kink? Kink Size? Mean SD Asymp Permute Error Asymp Permute

KLNS No – -0.04 0.46 1.45 1.60 Type I (Size) 0.13 0.10CLPW No – 1.71 9.08 15.47 14.19 Type I (Size) 0.14 0.10Artificial DGP 1 No 0 -0.03 0.34 1.08 1.19 Type I (Size) 0.16 0.10Artificial DGP 2 No 0 -0.52 3.72 3.49 14.26 Type I (Size) 0.66 0.10Artificial DGP 3 No 0 -0.43 53.62 6.49 176.03 Type I (Size) 0.95 0.10KLNS Yes – -6.32 0.13 0.44 5.52 Type II (Power) 0.00 0.00CLPW Yes – 6.34 8.16 24.80 19.31 Type II (Power) 0.78 0.80Artificial DGP 1 Yes 5 5.00 0.34 1.00 7.70 Type II (Power) 0.00 0.18Artificial DGP 1 Yes 20 20.02 0.35 1.02 24.87 Type II (Power) 0.00 0.00Artificial DGP 2 Yes 5 4.93 3.55 3.45 11.83 Type II (Power) 0.16 0.66Artificial DGP 2 Yes 20 19.55 3.63 3.37 25.30 Type II (Power) 0.00 0.16Artificial DGP 3 Yes 5 4.27 53.56 6.35 164.31 Type II (Power) 0.08 0.90Artificial DGP 3 Yes 20 20.17 52.64 6.55 146.58 Type II (Power) 0.08 0.86Note: This table replicates Fan and Gijbels (1996) specifications in Tables 1, 2 and 3 using a cubic

polynomial estimator instead of a linear estimator. See notes to Tables 1, 2 and 3 for details.

37

Page 38: APermutationTestfortheRegression Kink Design

Table A.5: Simulation Study: Size of Quadratic RK Estimators

Estimate Interval Length Type I Error RateDGP Method Mean SD Asymp Permute Asymp Permute

1 FG -0.03 0.10 0.43 0.33 0.05 0.101 CCT -0.04 1.06 2.87 3.32 0.06 0.102 FG 0.33 2.61 1.42 9.03 0.80 0.102 CCT -0.25 2.07 6.46 6.52 0.11 0.103 FG -2.94 20.95 3.75 73.71 0.81 0.103 CCT -0.13 5.88 17.56 19.62 0.15 0.10Note: To compare the false rejection rate (size) of asymptotic and permutation-based methods, we

analyze the data-generating processes displayed in Figure A.1. For every DGP, we randomly generate10,000 observations with x distributed uniformly on [-2,2] and y = E(y|x) + " with " ⇠ N(0, 0.25).FG uses a local linear estimator with Fan and Gijbels (1996)’s bandwidth, while CCT uses CalonicoCattaneo and Titiunik (2014b)’s estimator. We set the nominal level of the test to 10%. We rejectthe null hypothesis when the estimate at the kink location is outside the 90% confidence interval,where the interval is constructed either using standard asymptotic methods or the permutationmethod described in Section 3 for a set of 100 placebo kinks on [-1,1]. In this setting, the nullhypothesis is true by assumption and the type I error rate for an accurate estimation method shouldbe 10%.

38

Page 39: APermutationTestfortheRegression Kink Design

Table A.6: Simulation Study: Power of Quadratic RK Estimators

True Estimate Interval Length Type II Error RateDGP Kink Size Method Mean SD Asymp Permute Asymp Permute

1 5 FG 4.98 0.12 0.40 7.13 0.00 0.241 5 CCT 5.01 1.05 2.90 7.35 0.02 0.102 5 FG 5.37 2.86 1.40 11.36 0.08 0.782 5 CCT 4.50 2.24 6.22 8.53 0.24 0.463 5 FG 1.91 23.65 3.82 84.07 0.06 0.883 5 CCT 6.54 6.37 18.28 17.97 0.56 0.801 20 FG 19.98 0.14 0.41 26.60 0.00 0.001 20 CCT 20.08 1.06 2.88 16.83 0.00 0.002 20 FG 20.37 2.90 1.36 27.30 0.00 0.142 20 CCT 19.67 2.08 6.39 17.56 0.00 0.003 20 FG 16.30 24.52 3.91 113.10 0.02 0.903 20 CCT 19.28 6.11 17.60 22.18 0.08 0.14Note: To compare the false acceptance rate (power) of asymptotic and permutation-based methods,we analyze the data-generating processes displayed in Figure A.1. For every DGP, we randomlygenerate 10,000 observations with x distributed uniformly on [-2,2] and y = E(y|x) + " with " ⇠N(0, 0.25). FG uses a local linear estimator with Fan and Gijbels (1996)’s bandwidth, while CCTuses Calonico Cattaneo and Titiunik (2014b)’s estimator. In our baseline specification, we randomlychoose a kink location on [-1,1] with a kink size specified in column 2. We set the nominal level of thetest to 10%. We accept the null hypothesis when the 90% confidence interval includes zero, wherethe interval is constructed either using standard asymptotic methods or the permutation methoddescribed in Section 3 for a set of 100 placebo kinks on [-1,1]. This process is repeated 50 times toestimate a type II error rate.

39

Page 40: APermutationTestfortheRegression Kink Design

-11

-1 -.5 0 .5 1

DGP 1, No Kink

05

1015

20

-1 -.5 0 .5 1

DGP 1, Kink @ 0

-15

-10

-50

-1 -.5 0 .5 1

DGP 2, No Kink

-4-2

02

46

-1 -.5 0 .5 1

DGP 2, Kink @ 0

-2-1

01

2

-1 -.5 0 .5 1

DGP 3, No Kink

05

1015

20

-1 -.5 0 .5 1

DGP 3, Kink @ 0

Figure A.1: Conditional mean function for simulation DGPs. To compare the performance of asymptotic and

permutation-based methods, we analyze four different data-generating processes. They are:

E(y|x) =

0

@0

0.3(0.5 + |x|) cos2(5(x� 0.1))� 5e

x

2 (sin 10(x� 0.1))

1

A

We add kinks with a slope change of 20 to these DGPs. The Figure displays the conditional mean function E(y|x)

for each of these DGPs. To illustrate, we also show DGPs where a kink at x= 0 (20x1(x > 0)) has been added. In

Simulation Study I, the kink location is always at x = 0, In Simulation Study II, we randomly draw kink locations

on [�1, 1].

40

Page 41: APermutationTestfortheRegression Kink Design

0.2

.4.6

.81

Dem

ocra

t Win

Nex

t Ele

ctio

n-100 -50 0 50 100

Democratic Vote Share

DataCubic Spline, No JumpCubic Spline, With Jump

Lee (2008)

12

34

5C

hild

Mor

talit

y

-40 -20 0 20Poverty Rate Relative to Cutoff

Data Cubic Spline, No JumpCubic Spline, With Jump

Ludwig & Miller (2007)

Figure A.2: Conditional mean functions for RD applications in Lee (2008) and Ludwig et al. (2007).The solid maroon line denotes the fit for a natural cubic spline estimated on the data withoutallowing for a discontinuity at the cutoff; the dashed maroon line denotes a natural cubic splineallowing for a discontinuity at the cutoff. Bin number chosen based on the approach in Calonicoet al. (2014a, 2015).

41

Page 42: APermutationTestfortheRegression Kink Design

B Appendix: Proof of Finite Sample Size of Permu-

tation Test

Proposition: Under the Null Hypothesis of Policy Irrelevance and the Random KinkLocation assumption, there exists a test function �(v,y, k) for significance level ↵which has an exact finite sample level of ↵.

Proof: Let K denote the number of potential kink points k drawn according to

distribution P . For any data vector (v,y) let

T

(1)(v,y) T

(2)(v,y) ... T

(

¯

K)(v,y)

denote the kink estimates of T (v,y, k) ranked from smallest to largest.

For a given nominal level ↵ 2 (0, 1) and a two-sided test, we define the critical

values c and c as

c =h↵

2K

i

and c = K �h↵

2K

i

where⇥↵

2

K

⇤denotes the largest integer less than or equal to ↵

2

K.

Let K+(v,y) and K

�(v,y) denote the number of values T (j)(v,y) (j = 1, ... , K)

that are smaller than T

(c)(v,y) and greater than T

(c)(v,y), respectively, and let

K

0(v,y) be the total number of values of T (j)(v,y) (j = 1, ... , K) equal to T

(c)(v,y)

or T

(c)(v,y).

Define

a(v,y) =↵K � (K+(v,y) +K

�(v,y))

K

0(v,y)

as the rejection probability in the case of a tie.

42

Page 43: APermutationTestfortheRegression Kink Design

We can define a permutation test function �(v,y, k) which determines the prob-

ability of rejecting the null hypothesis given a test statistic T (v,y, k):

�(v,y, k) =

8>>>>>><

>>>>>>:

1 , T (v,y, k) < T

(c)(v,y) or T (v,y, k) > T

(c)(v,y)

a(v,y) , T (v,y, k) = T

(c)(v,y) or T (v,y, k) = T

(c)(v,y)

0 , T

(c)(v,y) < T (v,y, k) < T

(c)(v,y)

This implies that for any realization of (v,y) we have that:

X

k2K

�(v,y, k) = K

+(v,y) +K

�(v,y) + a(v,y) ·K0(v,y) = ↵K.

Note that under the Null Hypothesis of Policy Irrelevance, the distribution of

(v,y) is invariant to the realized location of the kink point. This implies that the test

statistic T (v,y, k) is also invariant to the realization of the kink point k

⇤, since the

test statistic depends on the policy variable only through its effect on the outcome.

Taking expectations of the expressions above over the distribution of the location of

kink points which we had denoted as P , we thus have

↵K = E

P

[X

k2K

�(v,y, k)] =X

k2K

E

P

[�(v,y, k)] = KE

P

[�(v,y, k)]

Thus, it follows that EP

[�(v,y, k)] = ↵ proving that the exact finite sample level

of the permutation test is ↵. ⌅

43