unveiling the identity of pin from the flash crash


  • 8/7/2019 Unveiling the Identity of PIN from the Flash Crash

    Electronic copy available at: http://ssrn.com/abstract=1697879

    Unveiling the Identity of PIN from the Flash Crash:

    Illiquidity or Information Asymmetry?

    Qin Lei

    First Draft: October 25, 2010

    Current Version: November 25, 2010

    Qin Lei, Finance Department, Cox School of Business at Southern Methodist University, 6212 Bishop Blvd, Dallas, TX 75275-0333. Phone: (214) 768-3183. Email: [email protected].


    Abstract

    This paper extends the original PIN framework to explicitly allow for the coexistence of liquidity shocks and fundamental news, both of which can lead to order imbalances. The pseudo market makers submit contrarian orders in the event of liquidity shocks and thus move the stock prices back to the fundamental level. Consequently, the conventional PIN measure consists of one component driven by the informed traders who receive the fundamental news and another component due to pseudo market makers who arrive upon liquidity shocks. During the flash crash on May 6, 2010, there is a nearly ten-fold market-wide increase in the illiquidity component of PIN but no uniform increase in the information asymmetry component, based on the estimation of the extended PIN model for common stocks listed on NYSE and AMEX. In contrast, the original PIN model disallows liquidity shocks and thus overestimates the extent of asymmetric information. In addition to introducing a conceptually purer measure of asymmetric information than was previously available, this paper also contributes to the literature through methodological improvements to the PIN estimation: it provides a recipe to eradicate the numerical overflow and underflow problems and to impute the daily PIN series from repeated estimations of quarterly PINs.

    JEL Classifications: G10, G14

    Keywords: Probability of Informed Trading (PIN), Information Asymmetry, Flash Crash, Floating Point Overflow, Daily PIN Series


    1 Introduction

    The U.S. financial markets experienced a tumultuous day on May 6, 2010, when the Dow Jones Industrial Average (DJIA) stock index witnessed its biggest one-day loss of 998.5 points since its inception more than 100 years ago. Miraculously, the sharp decline across a much wider spectrum of stocks than just the thirty component stocks in DJIA reversed itself within a thirty-minute interval in the same day. Due to the dramatic plummet and swift reversal of stock prices, this event became known as a "flash crash" in the popular press. The flash crash continues to reverberate among market participants and policy makers, and academic researchers also have a keen interest in knowing more about the mechanism underlying this event. I study the flash crash in this paper because this event amounts to a critical challenge to the PIN literature, but at the same time presents a unique opportunity to unveil the identity of PIN. This paper addresses the challenge in an extended PIN model that is better supported by the data and exploits the flash crash as a natural experiment to reveal that PIN does not always purely represent asymmetric information.

    In a series of important papers, Easley, O'Hara and their coauthors design and test the probability of information-based trading (with the shorthand PIN) to capture the most likely fraction of informed trades among all orders submitted by informed and uninformed traders under a statistical structure that describes the news and order arrival processes. The PIN measure has subsequently been studied in a number of contexts to quantify the effect of asymmetric information.1 Yet the flash crash reveals a key weakness of the PIN measure in that it overlooks the possibility of order imbalances induced by firm-specific or market-wide liquidity shocks and thus would overestimate the fraction of informed trades during such times. The notion of PIN as a measure of asymmetric information being contaminated by liquidity effects extends beyond the flash crash event. In fact, the interpretation of PIN has been subject to much controversy despite its popularity. For instance, Easley, Hvidkjaer, and O'Hara (2002) find that a 10% difference in PIN of two stocks results in a 2.5% difference in expected returns and interpret this finding as the information risk being priced. Duarte and Young (2009) counter that the PIN factor is priced only because it is a proxy for illiquidity in light of the evidence regarding the disappearance of the pricing power for the private information factor after controlling for illiquidity. Even before the identity crisis of PIN in the asset pricing context, there has been some confusion in the literature over the varying interpretations of PIN.2 In light of the potential dual roles of the PIN measure, it is important

    1 Here is an incomplete list of studies that apply the PIN measure outside the market microstructure field. Easley and O'Hara (2004) and Duarte, Han, Harford and Young (2008) study the effect of PIN on the cost of capital. Vega (2006) and Jayaraman (2008) examine the role of PIN in the context of corporate earnings. Bharath, Pasquariello and Wu (2009) study the relationship between PIN and the capital structure. Easley, Hvidkjaer and O'Hara (2002) and Duarte and Young (2009) examine the pricing power of an aggregated PIN factor in the asset pricing context.

    2 For instance, Easley, Kiefer, O'Hara and Paperman (1996) understandably advocate PIN as a measure of private information, yet Easley, Engle, O'Hara and Wu (2008) assert PIN as a simple measure of illiquidity


    to ascertain the true identity of the PIN measure. Is it a pure measure of information asymmetry as originally designed? Or, is it confounded by the extent of illiquidity? If the latter is true, how can one possibly carve the illiquidity component out of the PIN measure and obtain a more pure measure of asymmetric information? Answering these questions requires disentangling the role of asymmetric information from the role of illiquidity and it can be done either qualitatively or quantitatively. This paper takes the initiative to do both.

    Given the elusive nature of illiquidity and information asymmetry, it is difficult to tell them apart unless there is some exogenous shock that naturally separates them. The flash crash on May 6, 2010, is such a natural experiment because it enables a qualitative distinction between the two competing interpretations for the PIN measure. The U.S. Commodity Futures Trading Commission (CFTC) and the Securities and Exchange Commission (SEC) attribute the flash crash to a short-lived liquidity crisis on both the index futures market and the equity market (e.g., CFTC-SEC, 2010). This quick episode of a market-wide liquidity crunch would necessarily imply a hike in an illiquidity measure for many stocks on the day of the flash crash. Consequently, one can largely rule out a uniform increase in the estimated PIN if it purely measures the extent of asymmetric information. I repeatedly estimate the quarterly PINs with and without the day of the flash crash and then impute the daily PIN series based on the knowledge that the informed orders have to add up over time. It turns out that there is a marked increase in the estimated PINs for almost all stocks on May 6, 2010, and this finding goes against PIN as a pure measure of asymmetric information. It appears unlikely that there is a simultaneous hike in the amount of private information across all stocks. A systematic liquidity shock is more plausible than the common arrival of private information across all stocks at once, especially because of the sharp and swift price reversal on the day of the flash crash. Stock prices are supposed to gradually incorporate information revealed from the informed orders and thus the information-based trading activities would imply a price continuation rather than a sharp reversal. In other words, the qualitative inference around the flash crash event suggests that the empirical data lean in favor of the illiquidity interpretation for the PIN measure at least during the flash crash.

    Having achieved the qualitative separation between asymmetric information and illiquidity, I turn to quantifying the distinction so as to obtain a more pure measure of information asymmetry. For this purpose, I extend the original PIN framework to explicitly allow for the coexistence of liquidity shocks and fundamental news, both of which can lead to order imbalances. The news probability of a liquidity shock is observed from the actual frequency with which sizeable intraday price reversals occur. The idea is that fundamental news should be steadily incorporated into stock prices without major reversal within a short time span, while a sharp and quick reversal in stock prices is the hallmark of liquidity shocks that are

    (pp. 190). Amihud (2002) treats the PIN measure as both a finer and better measure of illiquidity (pp. 32) and a measure of microstructure risk ... that reflects the adverse selection cost resulting from asymmetric information (pp. 34).


    unrelated to the fundamental news. The pseudo market makers submit contrarian orders in the event of liquidity shocks and thus move the stock prices back to the fundamental level. Consequently, the conventional PIN measure consists of one component driven by the informed traders who receive the fundamental news and another component due to pseudo market makers who arrive upon liquidity shocks. During the flash crash on May 6, 2010, there is a nearly ten-fold market-wide increase in the illiquidity component of PIN that would have been mistakenly attributed to information-based trading under the original PIN model that disallows liquidity shocks. The information asymmetry component also rises relative to the preceding quarter but not in a uniform manner. The daily increment in asymmetric information is not statistically significant at the 1% level for four of ten volume deciles, and stocks in the highest two volume deciles experience the largest hike since they have a very low level of asymmetric information in the preceding quarter. These findings could suggest an environment with higher information asymmetry among those heavily traded stocks on the day of the flash crash. The staff report in CFTC-SEC (2010) identifies the flash crash as initiated in the S&P 500 index futures market. Since the index futures market leads the equity market on an intraday basis (e.g., Chan, 2002), it is plausible that some of the S&P 500 index component stocks were indeed traded as if accompanied by material private information on the day of the flash crash. This circumstance would naturally translate into a more prominent effect among the most heavily traded stocks.

    Using a novel idea to trace the identity of PIN through the flash crash as a natural experiment, this paper helps to address some challenges that the original PIN model faces. Aktas, de Bodt, Declerck, and Van Oppens (2007) document the apparent difficulty in reconciling the information leakage with the lower PIN estimates prior to the announcements of mergers and acquisitions. One would have expected a higher estimated PIN during periods of information leakage if PIN purely captures the extent of information asymmetry. Though Aktas et al. label the inconsistency as a PIN anomaly, it is possible that the stock liquidity actually improves when traders exploit the leaked information so long as the lower PIN estimates reflect lower illiquidity. Therefore, it warrants further investigation to see if the extended PIN model resolves the anomaly.

    This paper is closely related to Duarte and Young (2009) in that both papers extend the original PIN framework to address the concern that the original PIN measure captures illiquidity as well as information asymmetry. One critical distinction is that my extension explicitly allows pseudo market makers to submit one-sided orders upon the occurrence of liquidity shocks and thus addresses the problematic premise in the original PIN framework that informed traders are the exclusive source of order imbalances. Duarte and Young (2009) acknowledge the possibility that order imbalances could also result from liquidity shocks rather than informed trades, leading to potentially misleading inferences from the problematic premise. However, they also carry on the tradition in Easley, Kiefer, O'Hara and Paperman (1996) that order imbalances are interpreted as an exclusive indication for informed trades, and leave the issue unaddressed as an important caveat. Moreover, Duarte and Young (2009) have a fairly different motivation behind their extension compared to this paper. My model extension is inspired by the flash crash and I introduce the pseudo market makers to break the exclusivity of the informed traders in creating order imbalances. In contrast, Duarte and Young (2009) are most concerned about the mismatch between theory and reality because the observed correlation between buy orders and sell orders is positive even though the original PIN framework implies a negative correlation. They introduce symmetric positive order flow shocks on both the buy side and the sell side to accomplish the goal of eliminating the mismatch.

    Beyond the theoretical extension of the PIN framework to explicitly allow for liquidity effects to coexist with, and thus be separated from, information asymmetry, this paper also contributes to the literature by providing methodological improvements to the PIN estimation. Specifically, I design one simple procedure to dynamically factorize the daily log-likelihood function for the maximum likelihood estimation of PIN and effectively eliminate the numerical overflow and underflow problems that have long plagued academic researchers and practitioners alike. The buy and sell orders have steadily increased in recent years especially in light of the prevalence of algorithmic trading that often splits large orders into smaller pieces. The explosive growth of the number of trades often contributes to the failure of PIN estimations. After applying the dynamic factorization scheme, my estimation has a 100% convergence rate while avoiding corner solutions and local maxima. Without applying the scheme, however, the estimation failure rate is a staggering 54.88%. I also make available a technique to impute the daily PIN series through repeated estimations of quarterly PINs. Researchers are expected to benefit from these methodological improvements in different settings, especially among studies of short-lived corporate events where the change of asymmetric information needs to be measured on a daily basis.

    The balance of the paper proceeds as follows. Section 2 describes the original PIN framework and proposes a few methodological improvements to the PIN estimation. The PIN framework is then extended in Section 3 to explicitly allow for liquidity shocks. Section 4 contains the empirical analysis and the concluding remarks are in Section 5.

    2 PIN Estimation

    2.1 Original PIN Framework

    The expanding literature concerning the probability of informed trading is built on the theoretical foundation in Easley and O'Hara (1992). Easley, Kiefer and O'Hara (1996) and especially Easley, Kiefer, O'Hara and Paperman (1996) popularize the PIN measure by providing an empirical recipe for the maximum likelihood estimation of PIN. The parsimonious structure in Easley, Kiefer, O'Hara and Paperman (1996) becomes the natural starting point for many subsequent papers that extend the trading process, the parameterization underlying the PIN measure or both. In essence the original PIN framework in Easley, Kiefer, O'Hara and Paperman (1996) imposes a statistical structure on the observed order flows for a given stock and relies on the parameter values that maximize the sample likelihood to compute the average fraction of orders due to information-based trading.

    Only two types of investors trade stocks in the setting of Easley, Kiefer, O'Hara and Paperman (1996), either informed or uninformed. The orders from these traders are modeled as Poisson processes with arrival rates $\mu$ and $\varepsilon$ for the informed traders and the uninformed traders, respectively. While the uninformed traders submit both buy orders and sell orders with equal probabilities on average, the informed traders commit to one-sided orders that are consistent with the private news about the stock fundamentals. There is an $\alpha$ probability that the fundamental news would arrive on any trading interval, and the arrived news has a $\delta$ probability of being negative. Therefore, there is an $\alpha\delta$ probability for a trading interval to be associated with bad news, during which the informed traders submit only sell orders. Likewise, there is an $\alpha(1-\delta)$ probability that the informed traders submit only buy orders on a trading interval with good news. When there is a lack of news with probability $1-\alpha$, only the uninformed traders participate in trading the stock.

    Easley, Kiefer, O'Hara and Paperman (1996) recommend aggregating the order flows at the daily level for all stocks so that the modeled trading interval lasts exactly one day. The daily likelihood of observing $B$ buy orders and $S$ sell orders on one specific stock is

    $$
    \begin{aligned}
    L[(B,S)\mid\Theta] ={}& (1-\alpha)\,\exp(-2\varepsilon)\,\frac{\varepsilon^{B}}{B!}\,\frac{\varepsilon^{S}}{S!} \\
    &+ \alpha\delta\,\exp(-\mu-2\varepsilon)\,\frac{\varepsilon^{B}}{B!}\,\frac{(\mu+\varepsilon)^{S}}{S!} \\
    &+ \alpha(1-\delta)\,\exp(-\mu-2\varepsilon)\,\frac{(\mu+\varepsilon)^{B}}{B!}\,\frac{\varepsilon^{S}}{S!},
    \end{aligned}
    \tag{1}
    $$

    where $\Theta$ denotes the vector of parameters to be estimated.

    The standard practice in the literature is that under the assumption of constant parameters over each calendar quarter, one can estimate the set of parameters that maximize the sample likelihood of observing the daily order flows. The average orders from the informed traders are $\alpha\mu$ while the uninformed traders contribute $2\varepsilon$. Therefore, the most likely fraction of informed orders, or the probability of informed trading, can be defined as

    $$ PIN = \frac{\alpha\mu}{\alpha\mu + 2\varepsilon}. \tag{2} $$
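To make equations (1) and (2) concrete, the following is a minimal Python sketch rather than the estimation code used in the paper. The function and argument names are illustrative assumptions: `alpha` is the news probability, `delta` the probability that news is bad, `mu` the informed arrival rate and `eps` the uninformed arrival rate per side.

```python
import math


def ekop_daily_likelihood(B, S, alpha, delta, mu, eps):
    """Daily likelihood of B buys and S sells, term by term as in equation (1)."""

    def pois(k, lam):
        # Poisson probability mass computed directly; only safe for small counts.
        return math.exp(-lam) * lam ** k / math.factorial(k)

    no_news = (1 - alpha) * pois(B, eps) * pois(S, eps)
    bad_news = alpha * delta * pois(B, eps) * pois(S, mu + eps)
    good_news = alpha * (1 - delta) * pois(B, mu + eps) * pois(S, eps)
    return no_news + bad_news + good_news


def pin(alpha, mu, eps):
    """Probability of informed trading as in equation (2)."""
    return alpha * mu / (alpha * mu + 2 * eps)


# Illustrative parameter values (assumed, not estimates from the paper):
# informed orders average 0.4 * 50 = 20, uninformed orders 2 * 40 = 80.
example_pin = pin(alpha=0.4, mu=50.0, eps=40.0)
```

With these assumed values the informed share is 20 out of 100 expected orders. The direct evaluation of the Poisson terms is shown only for transparency; it breaks down for realistic order counts, which is the issue taken up in Sections 2.2 and 2.3.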

    It is fairly intuitive that under this framework the informed traders are the sole source of order imbalance by construction and the observation of high order imbalance is necessarily associated with a high level of estimated PIN. As long as the order imbalance can be exclusively attributed to the informed traders, it is straightforward to demonstrate that the PIN is essentially equivalent to the absolute percentage order imbalance. Three teams of researchers uncover this relationship independently around the same time. Kaul, Lei and Stoffman (2005) derive it through the change of variables using a system of equations after invoking perfect foresight. Aktas, de Bodt, Declerck, and Van Oppens (2007) find that the PIN is the ratio of expected absolute order imbalances to expected total orders. Easley, Engle, O'Hara and Wu (2008) document the same relationship with a first-order approximation while noting that the expected absolute difference of Poisson variables is quite complicated.

    2.2 Common Factorization of Log-likelihood

    As the number of orders gets very large, the likelihood function becomes harder, and even impossible in certain cases, to compute due to the factorial, the exponential and the power functions. Regardless of the specific hardware and software used for the computation, there are limits on the maximum and minimum numbers allowed, beyond which an overflow and underflow error would be triggered, respectively. To get around this issue, one can re-arrange the likelihood function to produce a common factor whose natural logarithm is easy to compute. That is, rewrite the likelihood function as

    $$ L[(B,S)\mid\Theta] = c \cdot m / (B!\,S!), $$

    where the common factor $c$ makes $\ln(c)$ easy to compute and the multiplicative factor $m$ is constructed to moderate the magnitude of inputs to the exponential functions and the power functions. One can skip calculating the common factorial in the denominator because it does not involve any parameters to be estimated and thus affects only the absolute magnitude of the likelihood value.

    After purging some constants unrelated to parameters in $\Theta$, one can write the daily log-likelihood function in the following computation-friendly form,

    $$ \mathcal{L}(\Theta) = -2\varepsilon + (B+S)\ln(\varepsilon) + \ln\left\{(1-\alpha) + \alpha\delta\exp[-\mu - S\ln(k)] + \alpha(1-\delta)\exp[-\mu - B\ln(k)]\right\}, \tag{3} $$

    where $k \equiv \varepsilon/(\mu+\varepsilon)$, and thus $\ln(k) \le 0$.

    The ratio $k$ of arrival rates is bounded between 0 and 1, and thus the inputs for the exponential functions can be of moderate size.
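Equation (3) translates almost line by line into code. The sketch below is an illustration under assumed names (`loglik_eq3`, `alpha`, `delta`, `mu`, `eps`), not the author's implementation, and it omits the constant $-\ln(B!\,S!)$ exactly as the text does.

```python
import math


def loglik_eq3(B, S, alpha, delta, mu, eps):
    """Daily log-likelihood in the common-factor form of equation (3).

    The parameter-free constant -ln(B! S!) is dropped, as in the text.
    """
    ln_k = math.log(eps / (mu + eps))  # ln(k) <= 0 because 0 < k <= 1
    inner = (
        (1 - alpha)
        + alpha * delta * math.exp(-mu - S * ln_k)
        + alpha * (1 - delta) * math.exp(-mu - B * ln_k)
    )
    return -2 * eps + (B + S) * math.log(eps) + math.log(inner)
```

For moderate order counts this agrees with the logarithm of equation (1) up to the dropped constant. With tens of thousands of one-sided orders, however, the input to `math.exp` can still exceed the representable range and Python raises `OverflowError`, which is the limitation discussed in Section 2.3.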


    2.3 Eliminating Overflow and Underflow Problems

    The common factorization in equation (3) works reasonably well to alleviate the overflow and underflow problems among stocks with low to moderate trading volumes, but it is far from eliminating these problems. Stocks with high trading volume often suffer from the overflow and underflow problems even after the moderation introduced by the factorization. Given the recent trends of institutional investors breaking up their orders into smaller pieces and the increasing prevalence of high frequency traders who often submit orders of small size, more and more stocks fall into the category for which the PIN estimation simply fails. Note, however, that the overflow and underflow problems do not afflict only stocks with high trading volumes. To break down the estimation process, it takes no more than one day of severely one-sided order flows or the optimization procedures directing one of the interested parameters into a certain region of value that would trigger an overflow or underflow problem.

    In contrast to the dire situation regarding the empirical estimation of PIN, the PIN measure as a theoretical concept has clearly gained popularity among researchers who are keen to measure the extent of information asymmetry in various contexts. It is thus understandably desirable to effectively eliminate the overflow and underflow problems from the PIN estimation. This paper provides one such solution by dynamically changing the factorization process for each pair of order flows on a daily basis so as to actively avoid triggering any overflow or underflow error.

    To implement the dynamic factorization, it is necessary to first identify the trigger value for overflow and underflow errors in the hardware and software combination for the PIN estimations. In order to obtain the trigger value, the researcher can keep increasing the input value $C$ to an exponential function $\exp\{|C|\}$ until the calculation fails. For instance, I use the SAS software on a desktop computer that associates $\exp\{708\}$ with an overflow and $\exp\{-708\}$ with an underflow. Alternatively, one can use the constant function in SAS to identify the trigger values. On my computer, I find that constant('logsmall') $= -708.396$ and constant('logbig') $= 709.783$. In other words, the combination of my computing hardware and software yields an approximate critical value $C = 708$, and the factorization of the daily log-likelihood function has to be done in a way to actively avoid numbers outside the range $[\exp(-C), \exp(C)]$, which is equivalent to $[10^{-C/\ln(10)}, 10^{C/\ln(10)}]$. Otherwise, an overflow or underflow problem can occur.
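For readers working outside SAS, the same trigger values can be read off directly in any environment that uses IEEE 754 double precision. The snippet below is a small Python sketch of that lookup; the variable names are illustrative, and taking the floor of the tighter bound as the critical value is an assumption that mirrors the approximate value used in the text.

```python
import math
import sys

# Largest x for which exp(x) is still a finite double.
log_big = math.log(sys.float_info.max)

# Log of the smallest positive normalized double.
log_small = math.log(sys.float_info.min)

# A conservative symmetric critical value: the tighter of the two bounds, rounded down.
C = math.floor(min(log_big, -log_small))
```

These two logarithms agree to three decimal places with the `logbig` and `logsmall` constants reported above, and the resulting critical value is 708.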

    Note that the overflow and underflow problems affect mainly the multiplicative factor $m$ of the daily likelihood. The basic strategy of my factorization scheme is to pull the largest exponential input out of the multiplicative factor $m$, make it part of the common factor $c$, and replace with zero any exponential that would otherwise trigger an underflow problem. The Appendix spells out the full details of a simple three-step procedure to dynamically factorize the daily log-likelihood function. Once the overflow and underflow problems are completely eliminated through the dynamic factorization algorithm, it is fairly easy to conduct a grid search over different regions of parameter values so as to ensure a global maximization.
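The Appendix procedure is not reproduced in this section, so the following Python sketch should not be read as the author's three-step algorithm. It applies the standard log-sum-exp rearrangement to the braces of equation (3), which follows the same basic strategy: the largest exponent is moved into the common factor, and any remaining exponential that is too small simply evaluates to zero. All names are illustrative assumptions.

```python
import math


def loglik_stable(B, S, alpha, delta, mu, eps):
    """Equation (3) evaluated with the largest exponent factored out."""
    ln_k = math.log(eps) - math.log(mu + eps)

    # Logarithm of each term inside the braces of equation (3).
    log_terms = []
    for weight, exponent in (
        (1 - alpha, 0.0),
        (alpha * delta, -mu - S * ln_k),
        (alpha * (1 - delta), -mu - B * ln_k),
    ):
        if weight > 0:  # a zero-probability state contributes nothing
            log_terms.append(math.log(weight) + exponent)

    e_max = max(log_terms)
    # Every input to exp is now <= 0, so overflow cannot occur;
    # very negative inputs underflow quietly to 0.0.
    total = sum(math.exp(t - e_max) for t in log_terms)
    return -2 * eps + (B + S) * math.log(eps) + e_max + math.log(total)
```

Because the function stays finite across the parameter space, it can be evaluated on a coarse grid of starting values before handing the best candidates to a numerical optimizer, in line with the grid search mentioned above.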

    2.4 Computing Daily PIN Series

    The presence of information asymmetry is applicable in many contexts. It is very often the case that empirical researchers seek to measure the change in the extent of asymmetric information around certain corporate events that can be short-lived. It is the common practice in the literature to estimate the PIN measure over one quarter of daily data for a given stock. Therefore, the estimated quarterly PIN measures are not well suited for studying corporate events whose effects on information asymmetry may last only a day or two. While there are studies that extend the original theoretical framework underlying the PIN measure to allow for the estimation of daily PIN series (e.g., Lei and Wu, 2005; Easley, Engle, O'Hara and Wu, 2008), the high frequency series comes at the cost of imposing more elaborate structures on the observed order flows and thus the extended models are not nearly as popular as the simple estimation of quarterly PINs. In fact, even the studies that promote the extended PIN models allowing for high frequency PIN series often avoid a large scale estimation for many stocks and restrict the exercise to a selected few stocks instead.

    In this paper, I propose a simple method to impute the daily PIN series from the quarterly PIN estimates. The basic idea is to estimate the quarterly PIN measures with and without the trading day $t$ and infer the daily PIN measure from the difference in quarterly PIN estimates. Denote by $N_x$ the total number of trades in the quarter (e.g., 62 trading days) prior to trading day $t$. Denote by $N_t$ the total number of trades in trading day $t$. Then the cumulative total number of trades over a 63-trading-day span ended on trading day $t$ is

    $$ N_c = N_x + N_t. $$

    Denote by $PIN_x$ the PIN estimated from using the first 62 days of trades. Denote by $PIN_c$ the PIN estimated from using the 63 days of trades. Denote by $PIN_t$ the imputed PIN measure for trading day $t$. Clearly, the informed orders have to add up over the period of 63 days. In other words, the following relation holds

    $$ N_x \, PIN_x + N_t \, PIN_t = N_c \, PIN_c. $$

    Substituting the definition of the cumulative total number of trades and re-arranging the terms, the implied daily PIN measure is

    $$ PIN_t = PIN_c + \frac{N_x}{N_t}\,(PIN_c - PIN_x). \tag{4} $$

    The daily incremental PIN relative to the prior 62 trading days is

    $$ PIN_t - PIN_x = \frac{N_c}{N_t}\,(PIN_c - PIN_x). $$

    So one alternative representation of the imputed daily PIN measure is

    $$ PIN_t = PIN_x + \frac{N_c}{N_t}\,(PIN_c - PIN_x). \tag{5} $$
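A short numerical check may help to see that equations (4) and (5) are the same imputation. The Python sketch below uses made-up inputs (62,000 trades over the prior 62 days, 2,000 trades on day $t$, and quarterly PINs of 0.15 and 0.16); these numbers and the function names are illustrative assumptions, not results from the paper.

```python
def impute_daily_pin_eq4(pin_x, pin_c, n_x, n_t):
    """Imputed daily PIN via equation (4)."""
    return pin_c + (n_x / n_t) * (pin_c - pin_x)


def impute_daily_pin_eq5(pin_x, pin_c, n_x, n_t):
    """Imputed daily PIN via equation (5), with N_c = N_x + N_t."""
    n_c = n_x + n_t
    return pin_x + (n_c / n_t) * (pin_c - pin_x)


# Hypothetical inputs for illustration only.
n_x, n_t = 62_000, 2_000
pin_x, pin_c = 0.15, 0.16

pin_t_eq4 = impute_daily_pin_eq4(pin_x, pin_c, n_x, n_t)
pin_t_eq5 = impute_daily_pin_eq5(pin_x, pin_c, n_x, n_t)
```

Both forms give a daily PIN of about 0.47: a one-point rise in the quarterly estimate implies a large jump on day $t$ because that day carries only 2,000 of the 64,000 trades. The adding-up relation can be verified by hand, since 62,000 × 0.15 + 2,000 × 0.47 = 64,000 × 0.16 = 10,240 informed orders.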

    The intuition behind the daily PIN measure is straightforward. The PIN estimated over 63 trading days is essentially a weighted average of the PIN measure on day $t$ and the PIN estimated over the preceding 62 trading days. If the PIN measure over the period inclusive of the trading day $t$ is higher than the PIN measure excluding the trading day $t$, then it must be the case that the PIN on the trading day $t$ is higher than before. On the other hand, if the PIN drops lower on the trading day $t$, then the inclusive PIN measure must be lower than the PIN measure excluding the trading day $t$.

    The inference above also delivers a boundary condition between $PIN_x$ and $PIN_c$ as an added benefit. Since the daily PIN is bounded between zero and one, the estimated $PIN_c$ must be bounded as well,

    $$ \frac{N_x}{N_c}\,PIN_x \;\le\; PIN_c \;\le\; \frac{N_x}{N_c}\,PIN_x + \frac{N_t}{N_c}. \tag{6} $$

    In practice, estimated pairs of $PIN_x$ and $PIN_c$ that do not satisfy the above boundary condition should be re-visited. The violation of the boundary condition could have resulted from local maximum rather than global maximum estimates for either $PIN_x$ or $PIN_c$ and thus re-estimations may help. If the model structure is too rigid to fit the data well, however, the estimated pairs have to be either discarded or re-estimated under an alternative model structure. For instance, an estimation window with a different time span may fit the data better, with daily order flows over one month as opposed to one quarter.
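The boundary condition in equation (6) is easy to apply as a screening step after estimation. The following Python sketch shows one way to flag estimate pairs for re-visiting; the function names and the example numbers are assumptions made for illustration.

```python
def pin_c_bounds(pin_x, n_x, n_t):
    """Lower and upper bounds on PIN_c implied by equation (6)."""
    n_c = n_x + n_t
    lower = n_x / n_c * pin_x
    upper = lower + n_t / n_c
    return lower, upper


def satisfies_boundary(pin_x, pin_c, n_x, n_t):
    """True if the pair (PIN_x, PIN_c) implies a daily PIN within [0, 1]."""
    lower, upper = pin_c_bounds(pin_x, n_x, n_t)
    return lower <= pin_c <= upper


# Hypothetical example: 62,000 prior trades, 2,000 trades on day t, PIN_x = 0.15.
bounds = pin_c_bounds(0.15, 62_000, 2_000)
ok_pair = satisfies_boundary(0.15, 0.16, 62_000, 2_000)
bad_pair = satisfies_boundary(0.15, 0.20, 62_000, 2_000)
```

In this example the admissible interval for the inclusive estimate is narrow, roughly 0.145 to 0.177, so a pair such as (0.15, 0.20) would be flagged: it implies a daily PIN above one.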

    3 PIN Extension

    The flash crash on May 6, 2010, provides a good motivation to introduce liquidity shocks into an extended PIN framework. In the event of a firm-specific or market-wide liquidity shock, the stock prices can experience a sizeable reversal over a short period of time that would not necessarily be consistent with the presence of informed traders. Stock prices are typically assumed to gradually incorporate the information revealed from the informed orders and thus the information-based activities would imply a price continuation rather than a sizeable reversal. One way of justifying the sizeable price reversals associated with liquidity shocks is to introduce a third group of investors, known as the pseudo market makers, whose orders arrive only upon the liquidity shocks. In the extended PIN model, the pseudo market makers trade in a contrarian fashion in the same way as market makers would and thus move stock prices back to the fundamental level.

    3.1 Revised Trade Process and Sample Likelihood

    It is useful to extend the news arrival process so that news reflects either signals about the fundamental value of the stock or simply liquidity shocks. The trading process can be revised as follows. There is a news event in each trading interval (e.g., one day) with probability $\alpha$. Conditional on the arrival of a news event, there is a probability $\theta$ that the news reflects a liquidity shock and a probability $1-\theta$ that the news reflects the value-relevant fundamental. The orders from the pseudo market makers arrive only upon the occurrence of a liquidity shock and follow a Poisson process with arrival rate $\lambda$. As before, the informed orders arrive upon the release of fundamental news and follow a Poisson process with arrival rate $\mu$. Regardless of whether a news event occurs, the uninformed orders always arrive in each trading interval and follow a Poisson process with arrival rate $\varepsilon$. Irrespective of the news type, each news event has an identical probability $\delta$ of being negative. The uninformed orders are insensitive to the nature of the news and thus balanced across the buy and sell sides. In contrast, both the informed investors and the pseudo market makers submit only one-sided orders depending upon the nature of the news. Specifically, the liquidity shock triggers only buy (or sell) orders from the pseudo market makers on days with bad (or good) news, while the fundamental news induces only sell (or buy) orders from the informed traders on days with bad (or good) news.
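
    The trading process described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not code from the paper: the function names `poisson` and `simulate_day` and all argument names are hypothetical. Here `alpha` is the news probability, `theta` the conditional probability of a liquidity shock, `delta` the probability that news is bad, and `mu`, `lam` and `eps` the arrival rates of informed, pseudo-market-maker and uninformed orders (with `eps` applying to each of the buy and sell sides).

```python
import random


def poisson(rng, rate):
    """Draw one Poisson variate by counting unit-rate exponential gaps."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_day(rng, alpha, theta, delta, mu, lam, eps):
    """Return (buys, sells, label) for one trading interval."""
    # Uninformed orders arrive on both sides regardless of news.
    buys, sells = poisson(rng, eps), poisson(rng, eps)
    if rng.random() >= alpha:
        return buys, sells, "no news"
    bad = rng.random() < delta
    if rng.random() < theta:
        # Liquidity shock: pseudo market makers trade against the shock,
        # buying on bad-news days and selling on good-news days.
        extra = poisson(rng, lam)
        if bad:
            buys += extra
        else:
            sells += extra
        return buys, sells, "liquidity shock"
    # Fundamental news: informed traders sell on bad news, buy on good news.
    extra = poisson(rng, mu)
    if bad:
        sells += extra
    else:
        buys += extra
    return buys, sells, "fundamental news"
```

    Calling `simulate_day(random.Random(1), 0.4, 0.1, 0.5, 60.0, 90.0, 40.0)` repeatedly yields a sample of daily buy and sell counts whose imbalances come from both informed traders and pseudo market makers; the parameter values are arbitrary.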

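    Denote by $\Theta$ the vector of parameters to be estimated. The daily likelihood of observing $B$ buy trades and $S$ sell trades on one specific stock is

    $$
    \begin{aligned}
    L[(B,S)\mid\Theta] ={}& (1-\alpha)\,\exp(-\varepsilon-\varepsilon)\,\frac{\varepsilon^{B}}{B!}\,\frac{\varepsilon^{S}}{S!} \\
    &+ \alpha\theta\delta\,\exp(-\lambda-\varepsilon-\varepsilon)\,\frac{(\lambda+\varepsilon)^{B}}{B!}\,\frac{\varepsilon^{S}}{S!} \\
    &+ \alpha\theta(1-\delta)\,\exp(-\lambda-\varepsilon-\varepsilon)\,\frac{\varepsilon^{B}}{B!}\,\frac{(\lambda+\varepsilon)^{S}}{S!} \\
    &+ \alpha(1-\theta)\delta\,\exp(-\mu-\varepsilon-\varepsilon)\,\frac{\varepsilon^{B}}{B!}\,\frac{(\mu+\varepsilon)^{S}}{S!} \\
    &+ \alpha(1-\theta)(1-\delta)\,\exp(-\mu-\varepsilon-\varepsilon)\,\frac{(\mu+\varepsilon)^{B}}{B!}\,\frac{\varepsilon^{S}}{S!}.
    \end{aligned}
    \tag{7}
    $$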


    Purging some constants unrelated to the parameters in $\Theta$, one can write the daily log-likelihood function in the following computation-friendly form,

    $$L(\Theta) = -2\varepsilon + (B+S)\ln(\varepsilon) + \ln\{\,\ldots$$
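
    A minimal Python sketch of how such a daily log-likelihood can be evaluated without overflow follows. It is not the paper's dynamic factorization from the Appendix; it assumes the five-branch structure of equation (7), factors $e^{-2\varepsilon}\varepsilon^{B+S}$ out of every branch, and combines the remaining terms with a log-sum-exp step. The names `daily_loglik`, `log_sum_exp` and `safe_log` and the argument names are illustrative assumptions: `alpha` is the news probability, `theta` the liquidity-shock probability, `delta` the bad-news probability, and `mu`, `lam`, `eps` the arrival rates of informed, pseudo-market-maker and uninformed orders.

```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) that tolerates -inf entries."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def daily_loglik(B, S, alpha, theta, delta, mu, lam, eps):
    """Daily log-likelihood of (B, S), dropping the -ln(B! S!) constant."""
    up_mu = math.log1p(mu / eps)    # ln((mu + eps) / eps)
    up_lam = math.log1p(lam / eps)  # ln((lam + eps) / eps)
    terms = [
        safe_log(1.0 - alpha),                                        # no news
        safe_log(alpha * theta * delta) - lam + B * up_lam,           # shock, bad
        safe_log(alpha * theta * (1.0 - delta)) - lam + S * up_lam,   # shock, good
        safe_log(alpha * (1.0 - theta) * delta) - mu + S * up_mu,     # news, bad
        safe_log(alpha * (1.0 - theta) * (1.0 - delta)) - mu + B * up_mu,  # news, good
    ]
    return -2.0 * eps + (B + S) * math.log(eps) + log_sum_exp(terms)
```

    Summing `daily_loglik` over the trading days of one stock gives the sample log-likelihood that a numerical optimizer would maximize; with `theta = 0` the liquidity-shock branches drop out and the expression reduces to the original PIN likelihood.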


    Instead of specifying a constant $\theta$ based on the observed frequency of intraday price reversals, one may choose to directly introduce the price series into the model and make both the probability of liquidity shock and the PIN time-varying. It is also possible to link the time-varying probability of liquidity shock to certain well-known liquidity measures. One has to carefully balance, though, the benefits of having a model with rich dynamics against the costs of imposing a more elaborate and thus complex structure on the data. I choose to allow for a constant $\theta$ in the PIN extension here because of its inherent parsimony.

    3.3 PIN Decomposition

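    The conventional PIN measure in the extended framework can be re-defined as

    $$PIN = \frac{\alpha(1-\theta)\mu + \alpha\theta\lambda}{\alpha(1-\theta)\mu + \alpha\theta\lambda + 2\varepsilon} \tag{8}$$

    to reflect the fact that both the informed investors and the pseudo market makers contribute to order imbalances. The component of the PIN measure that is purely related to information asymmetry can be isolated as

    $$PIN_{inasy} = \frac{\alpha(1-\theta)\mu}{\alpha(1-\theta)\mu + \alpha\theta\lambda + 2\varepsilon}, \tag{9}$$

    after carving out the component of the PIN measure related to illiquidity,

    $$PIN_{illiq} = \frac{\alpha\theta\lambda}{\alpha(1-\theta)\mu + \alpha\theta\lambda + 2\varepsilon}. \tag{10}$$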

    Note that the arrival rate $\lambda$ of the pseudo market makers is endogenously determined by the extended PIN framework and thus can have great influence over the PIN decomposition, even though the constant probability $\theta$ is pinned down from the price reversal statistics that are outside the PIN framework.
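
    The decomposition in equations (8) to (10) can be sketched as a short helper. The function name `pin_decomposition` and its argument names are hypothetical, and the numbers in the usage note are made up for illustration: `alpha` is the news probability, `theta` the liquidity-shock probability, and `mu`, `lam`, `eps` the arrival rates of informed, pseudo-market-maker and uninformed orders.

```python
def pin_decomposition(alpha, theta, mu, lam, eps):
    """Return (PIN, PIN_inasy, PIN_illiq) in the spirit of equations (8)-(10)."""
    informed = alpha * (1.0 - theta) * mu  # expected informed order flow
    pseudo_mm = alpha * theta * lam        # expected pseudo-market-maker flow
    total = informed + pseudo_mm + 2.0 * eps
    return (informed + pseudo_mm) / total, informed / total, pseudo_mm / total
```

    For example, `pin_decomposition(0.4, 0.1, 100.0, 300.0, 80.0)` gives expected flows of 36 and 12 against a total of 208, so the two components are 36/208 and 12/208 and they sum to the conventional PIN of 48/208.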

    It is clear now that the conventional PIN measure actually consists of both an illiquidity component and an information asymmetry component. In the special case with $\theta = 0$, the PIN measure fully represents the extent of asymmetric information. In the special case with $\theta = 1$, the PIN measure fully represents the extent of illiquidity. One of these two roles can dominate the other from time to time. Keeping in mind the coexisting roles of illiquidity and information asymmetry helps one to reconcile the potential confusion over the varying interpretations of the PIN measure in the literature (see the second footnote).

    From the perspective of the dual roles that the PIN measure can take, it is possible to address the PIN anomaly documented in Aktas, de Bodt, Declerck, and Van Oppens (2007). Instead of finding higher PIN estimates in periods with information leakage prior to the announcements of mergers and acquisitions, these authors find lower PIN estimates and thus label the finding an anomaly. It is not inconceivable that the lower PIN estimates could actually reflect the improved liquidity due to the heightened trading activities along with information leakage. I examine this empirical possibility in a separate paper.

    3.4 Literature Review

    In a closely related paper, Duarte and Young (2009) decompose the PIN measure into two components and attribute the pricing power of the PIN factor in the asset pricing context to the illiquidity component rather than the information asymmetry component. This finding makes an important contrast to the finding of information risk being priced in Easley, Hvidkjaer, and O'Hara (2002). Despite the similarity over the decomposition of the PIN measure, this paper is distinctively different from Duarte and Young (2009) in several aspects. As discussed earlier, the PIN measure is essentially equivalent to absolute order imbalance under the original framework in Easley, Kiefer, O'Hara and Paperman (1996). Duarte and Young (2009) inherit the same critical premise as Easley, Kiefer, O'Hara and Paperman (1996) that the informed investors are the exclusive source of order imbalances. Since this assumption does not have to hold in reality, Duarte and Young (2009) carefully discuss the potential problem with this assumption. They acknowledge the possibility that the order imbalances could also result from liquidity shocks rather than informed trades, leading to potentially problematic inferences. My paper directly tackles this important caveat in Duarte and Young (2009) and explicitly allows both the informed investors and the pseudo market makers to create order imbalances. This paper's decomposition of the PIN measure clearly reflects the importance of liquidity shocks. In light of this important distinction, it is worthwhile to examine whether or not the conclusion in Duarte and Young (2009) regarding the pricing power of the two components of the PIN factor is robust to introducing liquidity shocks to the PIN framework. I carry out this empirical exercise in another paper.

    Moreover, the motivation behind the PIN extension in Duarte and Young (2009) is quite different. In this paper, I introduce the liquidity shocks, upon which the pseudo market makers arrive to move prices back to the fundamental level, in order to break the exclusivity of informed traders in creating order imbalances. In contrast, Duarte and Young (2009) reasonably argue that the one-sided nature of informed orders necessarily implies a negative correlation between buy orders and sell orders even though the observed daily correlation is positive. The PIN extension in Duarte and Young (2009) is motivated by eliminating the mismatch with the observed order flows in terms of correlation, and they accomplish the goal by introducing symmetric positive shocks to both buy and sell orders. In a way, their motivation and approach are quite similar to the PIN extension in Weston (2001), who also worries about trading volume on information days being abnormally large on both the buy and sell sides. Weston (2001) argues that the positive correlation between buy and sell orders is driven by noise trading, which is characterized as a third group of traders that submit both buy and sell orders simultaneously. While Weston (2001) allows the symmetric order flow hikes to take place only on a day with news arrival, Duarte and Young (2009) introduce the symmetric order flow hikes regardless of whether the fundamental news arrives. Since informed traders and pseudo market makers in my extension would submit only one-sided orders depending upon the nature of news events, one limitation of my extension is that it does not imply a positive correlation between buy and sell order flows. This is an empirical limitation in the sense that the modelled trading interval does not have to span exactly one day and the observed positive correlation between buy and sell orders does not necessarily extend to sampling intervals other than one day. One way to address the limitation is to have a finer grid of trading intervals so as to allow intraday interactions between different news events and thus higher buy orders and sell orders on the same day. Alternatively, one can follow the lead of Weston (2001) and Duarte and Young (2009) and further complement the orders from the pseudo market makers with symmetric order flow hikes to ensure daily order flows that are positively correlated.

    Easley, Lopez and O'Hara (2010) also study the PIN around the flash crash but rely on an approximation rather than the maximum likelihood estimation that is typically used in the literature. In light of the finding that the original PIN measure is essentially equivalent to absolute order imbalance as discussed earlier, Kaul, Lei and Stoffman (2005) advocate using the absolute percentage of order imbalance (AIM) in place of the PIN measure, which is much harder to estimate than AIM. Easley, Lopez and O'Hara (2010) advance this proposal by detailing a procedure to measure the absolute order imbalance in lieu of PIN and applying the revised measure to a number of different security products beyond stocks. As a result, these two papers step outside the typical PIN framework and do not conduct maximum likelihood estimations for the proposed PIN alternative. It is noteworthy that Easley, Lopez and O'Hara (2010) update the order imbalance more frequently among heavily traded stocks than thinly traded stocks and thus partly address the issue that Kaul, Lei and Stoffman (2005) raise regarding the practice in the literature of applying a uniform frequency to measure order flows for all stocks. Unfortunately, however, the absolute percentage order imbalance can be a proxy for both illiquidity and information asymmetry, much in the same way as the original PIN measure. Both Kaul, Lei and Stoffman (2005) and Easley, Lopez and O'Hara (2010) suffer from the lack of distinction between these two roles precisely because of the flawed assumption that the informed traders are the sole source of order imbalance. Moreover, Easley, Engle, O'Hara and Wu (2008) illustrate that using the absolute percentage order imbalance as an approximation for PIN may actually miss the dynamics over short-lived corporate events such as earnings announcements that a daily PIN series would have captured. So it is not straightforward to conclude that the documented properties of the alternative measure in Easley, Lopez and O'Hara (2010) necessarily reflect those of PIN around the flash crash.
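
    For concreteness, the absolute percentage order imbalance discussed above can be sketched for a single interval as follows. The exact averaging and sampling conventions in the cited papers are not reproduced here; the function name `absolute_imbalance` and the treatment of intervals with no trades are assumptions made for illustration.

```python
def absolute_imbalance(buys, sells):
    """Absolute percentage order imbalance |B - S| / (B + S) for one interval."""
    total = buys + sells
    if total == 0:
        return None  # no trades, so the ratio is undefined
    return abs(buys - sells) / total
```

    With 120 buys and 80 sells the measure is 40/200 = 0.2. Like the conventional PIN, it cannot tell whether the imbalance came from informed traders or from liquidity-motivated contrarian orders.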


    In sum, this paper's extension to the PIN framework marks an important departure from the extant literature and contributes a measure of information asymmetry that is conceptually purer than what was previously available.

    4 Empirical Analysis

    4.1 Construction of Sample

    My primary data source is the detailed stock transactions from the New York Stock Exchange (NYSE) Trade and Quote (TAQ) database between February 5 and May 6 of 2010. This study focuses on stocks listed on the NYSE and the American Stock Exchange (AMEX). Because the auto-quotes are not filtered in TAQ, I follow Chordia, Roll and Subrahmanyam (2001) in using only the primary market (NYSE) quotes, and retain quotes within the regular trading block after purging those quotes with non-positive bid or ask prices, negative bid or ask sizes, missing time stamps, or bid prices higher than ask prices. I also remove trades that are out of sequence, recorded before the open or after the close time, have special settlement conditions, or have missing trade size or time stamp. As is the standard practice in the literature, the algorithm in Lee and Ready (1991) is utilized to determine the buyer-initiated or seller-initiated nature of each trade.⁴ Basically, all trades with a price higher (or lower) than the midpoint of the bid and ask prices are classified as buyer-initiated (or seller-initiated). Trades with a price identical to the midpoint of the prevailing quote are subject to a tick test so that a trade is classified as buyer-initiated (or seller-initiated) if the price is higher (or lower) than the preceding trade. I follow the advice of Chordia, Roll and Subrahmanyam (2005), who recommend revoking the five-second delay rule in Lee and Ready (1991) for matching trades with quotes starting in 1999.
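
    The classification rule just described can be sketched as follows. This is a simplified illustration rather than the full Lee and Ready (1991) procedure: the function name `classify_trade` is hypothetical, the quote is assumed to be the one prevailing at the trade time (no delay), and a midpoint trade at the same price as the preceding trade is simply left unclassified here.

```python
def classify_trade(price, bid, ask, prev_price):
    """Return 'buy', 'sell', or None using the quote rule, then the tick test."""
    mid = (bid + ask) / 2.0
    if price > mid:
        return "buy"
    if price < mid:
        return "sell"
    # Trade at the midpoint: fall back on the tick test against the
    # preceding trade price.
    if prev_price is None or price == prev_price:
        return None  # unresolved in this simplified sketch
    return "buy" if price > prev_price else "sell"
```

    Counting the `'buy'` and `'sell'` labels per stock and per day gives the daily $B$ and $S$ that enter the likelihood.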

    For each stock the PIN measure is estimated separately for the 62-day period ending on May 5, 2010, and the 63-day period ending on May 6, 2010. With a minimum requirement of order flows for 30 trading days, the maximum likelihood estimation is carried out using the NLMIXED procedure in SAS. The dynamic factorization of the daily log-likelihood function is remarkably successful. After an extensive grid search over different regions of parameter values to ensure a global maximum, the optimization exercise finishes successfully for all stocks in each estimation period. To facilitate imputing the daily PIN series on the date of the flash crash, stocks with zero trades on May 6, 2010, are removed from the sample. As discussed earlier in Section 2, the imputation of the daily PIN involves a set of boundary conditions on the resulting pair of quarterly PIN estimates. Only stock quarters that survive this additional requirement remain in the final sample.

    ⁴ Note that Boehmer, Grammig and Theissen (2007) study the bias on PIN estimates introduced by the sometimes erroneous classification of the trade initiation and provide a method to correct this bias.


    The master file of the TAQ database provides the CUSIP underlying each stock ticker symbol and I rely on the Center for Research in Security Prices (CRSP) database to extract the stock characteristics (such as primary exchange, share code and market equity) after merging the two datasets on CUSIP. There are 1,765 stocks on the NYSE/AMEX with qualified pairs of quarterly PIN estimates thus far. To check the sensitivity of the results to the exact grouping of stocks, I employ a set of filters to further refine the sample. After removing the American Depositary Receipts (ADRs), the sample size becomes 1,600. Focusing on common stocks with a CRSP share code of either 10 or 11 further reduces the sample size to 998 stocks. To guard against the potential confounding effects from the earnings announcements adjacent to the flash crash event, I also remove stocks that have their earnings announced between May 5 and May 7, 2010, inclusive on both ends. The announcement dates are extracted from the actual earnings file for the U.S. firms in the I/B/E/S database. The sample size comes down to 847.

    4.2 Estimation of Original PIN Measure

    The simple algorithm of dynamic factorization for the daily log-likelihood function outlined in the Appendix is quite successful, achieving a 100% convergence rate in my sample while avoiding corner solutions and local maxima. In contrast, the common factorization in equation (3) fares much worse and has a success rate of 45.12% in the same sample. The staggering failure rate from the common factorization in equation (3) illustrates the dire situation of the PIN estimation for the trading data in recent years. With algorithmic trading increasingly popular, many orders are split into smaller pieces, often resulting in tens of thousands of trades for one stock on one typical day. The sharp increase in the observed order flows makes it more likely to trigger a numerical overflow or underflow. Hence it is critical to have an effective factorization scheme that is flexible enough to adapt to various patterns of daily order flows in eradicating the overflow and underflow problem.
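
    To illustrate why direct evaluation of Poisson terms breaks down under heavy order flow, the sketch below compares a naive evaluation with one carried out on the log scale. It is a generic demonstration of floating-point limits, not the factorization in equation (3) or the dynamic factorization in the Appendix; the function names are hypothetical.

```python
import math


def naive_poisson_pmf(rate, k):
    """exp(-rate) * rate**k / k!, evaluated directly in floating point."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def log_poisson_pmf(rate, k):
    """The same probability mass evaluated on the log scale."""
    return -rate + k * math.log(rate) - math.lgamma(k + 1)


def naive_fails(rate, k):
    """True if direct evaluation overflows or underflows to zero."""
    try:
        return naive_poisson_pmf(rate, k) == 0.0
    except OverflowError:
        return True
```

    With an arrival rate and a trade count of 5,000 each, `naive_fails(5000.0, 5000)` is true because `exp(-5000.0)` underflows and `5000.0 ** 5000` overflows, while `log_poisson_pmf(5000.0, 5000)` remains a finite number. With a rate of 3 and a count of 2, both routes agree.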

    To show that extreme cases of order imbalances significantly contribute to the estimation complexity, I run a logit regression to explain the success of maximum likelihood estimations for the original PIN framework with the common factorization of daily log-likelihood in equation (3). The cross-sectional regression results are reported in Table 1. When the total number of trades averaged across all trading days is the sole predictor, it is inversely related to the estimation success. In other words, the PIN estimation is more difficult among heavily traded stocks. The maximum absolute order imbalance also adds to the difficulty of maximum likelihood estimation in that extreme imbalances often trigger numerical overflow and underflow problems. Note that the extreme absolute order imbalance delivers a better fit than the total trades as a sole predictor for the estimation success, and there is little incremental explanatory power from the total trades after controlling for the extreme order imbalance. The percentage absolute order imbalance averaged across all trading days beats the aforementioned two predictors, however, by delivering a pseudo-$R^2$ of 0.42 as a sole predictor. The positive coefficient on the percentage absolute order imbalance suggests that the original PIN framework thrives in cases with extremely imbalanced orders on average, which in turn strongly reflect the presence of informed orders. Putting these predictive variables together to explain the estimation success retains their respective signs with the exception of the total orders. Further augmenting the logit regression with the logarithmic market equity does not materially change the inferences, and as expected the estimations for large cap stocks are more difficult. All the estimated coefficients in the top panel of Table 1, including the intercepts, are statistically significant at the 1% level.

    In the bottom panel of Table 1, I repeat the same set of six regression designs while replacing the independent variables by the cross-sectional percentile rank when possible. The percentile ranks help us to gauge the sensitivity of the results to potential outliers since the regressions in the top panel would place too much weight on observations with extreme values. The qualitative pattern of results remains largely unchanged with a few exceptions. The goodness of fit has improved after the transformation of independent variables. The firm size is no longer statistically significant and the intercepts in two designs are also less statistically significant than before. Moreover, the total number of trades has one extra change of sign in the bottom panel compared to the top panel.

    Overall, order imbalances contribute to the estimation complexity in an interesting way. While a high level of extreme imbalances implies a lower estimation success, stocks with a higher percentage of order imbalances are actually easier to estimate. The former finding speaks directly to the numerical overflow and underflow problems of the estimation and the latter points to the strategy of the original PIN framework in identifying order imbalances as informed trades.

    4.3 Inferences based on Daily PIN Series

    Table 2 reports the cross-sectional mean probability of informed trading related to the flash crash on May 6, 2010, based on the estimations of the original PIN model for a number of sub-samples. The reported PIN measures include the quarterly PIN excluding the flash crash event, the imputed daily PIN on the day of the flash crash as well as the incremental PIN on the day of the flash crash. Relative to the PIN estimated for the preceding quarter, there appears to be a market-wide hike of about 0.12 (or a doubling effect) in the imputed daily PIN on the flash crash event regardless of whether we exclude American Depositary Receipts, focus on the common stocks only, or exclude stocks with earnings announced on days immediately adjacent to the flash crash. The incremental PIN on May 6, 2010, is reliably positive, as are the quarterly and daily PIN measures.

    To better understand the cross-sectional differences, I further classify the 847 common stocks in the final sample into ten volume deciles based on the daily average total number of trades over the quarter ended on May 5, 2010. The pattern of PIN estimates in the quarter leading up to the flash crash appears similar to that reported in Easley, Kiefer, O'Hara and Paperman (1996). That is, thinly traded stocks have higher estimated PINs than heavily traded stocks. The PIN estimates decline monotonically as the volume decile gets higher. The average PIN of 0.221 for the stocks in the lowest volume decile nearly triples that for the stocks in the highest volume decile at 0.080.

    The pattern of imputed PINs on the day of the flash crash is remarkably different. While the stocks in the lowest volume decile continue to have the highest average daily PIN at 0.318, the average daily PIN for the remaining nine volume deciles ranges from 0.217 to 0.258 without any discernible pattern among them. The heavily traded stocks in the 9th and 10th volume deciles share the same average imputed PIN of 0.238 on the day of the flash crash, which almost triples their respective PIN level in the preceding quarter. The stark contrast of the estimated PINs around the flash crash, coupled with the seeming lack of distinction between stocks with high volume and those with modest volume on the event day, suggests the uniqueness and the usefulness of the flash crash event in revealing the true identity of the PIN. The pattern of daily imputed PIN series points out one key weakness of the original PIN as a pure measure of information asymmetry. For someone holding such a pure view, it is very worrisome that the level of asymmetric information exceeds 0.217 for stocks in every volume decile, even among the most heavily traded stocks. It is also difficult to make the case that all stocks outside the most thinly traded group exhibit the same extent of asymmetric information on the day of the flash crash. In contrast, it is far easier for someone viewing the PIN as a simple measure of illiquidity to associate the flash crash event with a market-wide liquidity shock that affects almost all stocks to the same degree on average.

    There are at least two ways to present the contrast between the daily imputed PIN on the day of the flash crash and the quarterly PIN just prior to that date. Table 2 reports both the cross-sectional mean incremental PIN and the ratio of average daily PIN to average quarterly PIN. The most thinly traded stocks experience the smallest increase in PIN on the day of the flash crash while the most heavily traded stocks experience the largest hike. Based on the ratio of means, the most thinly traded stocks register a 44% hike in PIN and the most heavily traded stocks 199%. The degree of PIN hike gradually increases as the volume decile climbs higher, but not in a strictly monotonic fashion. The finding of a stronger PIN hike on the day of the flash crash among the most frequently traded stocks is another piece of evidence corroborating the notion that the conventional PIN measure may actually better proxy for illiquidity than information asymmetry on the day of the flash crash. After all, the staff report by CFTC-SEC (2010) traces the flash crash to a large and aggressive trade in the S&P 500 index futures market, and the highest two volume deciles indeed include many stocks in the S&P 500 index.

    In light of the findings above, it appears reasonable to conclude that the empirical evidence surrounding the flash crash leans in favor of the illiquidity interpretation rather than the information asymmetry interpretation for the conventional PIN measure. After all, it is very difficult to exclusively attribute the market-wide hike in the PIN on the day of the flash crash to asymmetric information as the original PIN model would. The extended PIN framework demonstrates that the conventional PIN measure consists of both an illiquidity component and an information asymmetry component. It is interesting to see how well the extended PIN model addresses the situation.

    4.4 Estimation of Extended PIN Measures

    As discussed in Section 3, identifying possible liquidity shocks requires going beyond the observation of daily order flows. Consequently, the constant probability of liquidity shocks $\theta$ is determined outside the PIN structure and becomes a crucial input for the extended PIN model. In this paper, I equate the constant probability of liquidity shock $\theta$ to the empirical frequency for the occurrence of a sizeable intraday reversal of stock prices within a given stock quarter. Here is the detailed procedure to identify sizeable intraday reversals. First, one can cut each regular trading day into thirteen half-hour slots from 9:30am EST to 4:00pm EST and find the minimum and maximum prices within each time slot. Second, the timing information of these minimum and maximum prices along with the opening and closing prices helps us create an intraday return series and determine the intraday maximum and minimum returns. Suppose that the aforementioned intraday maximum return happens to be positive and the intraday minimum return is negative. Moreover, suppose that both the intraday maximum and minimum returns exceed a pre-specified threshold level in absolute value. Then this trading day qualifies as a day with sizeable intraday price reversals. Finally, one can tally the number of trading days with sizeable price reversals and compute the fraction of such days within all trading days over the entire estimation period. The resulting fraction is the constant probability of liquidity shocks $\theta$ that is used to estimate the rest of the parameters in the maximum likelihood estimation and construct the two components of PIN.
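
    The procedure above can be sketched in a few lines. The text does not spell out how the intraday return series is built, so this sketch assumes it consists of simple returns between consecutive price points once the open, the half-hour minima and maxima (in order of occurrence) and the close are arranged in time order. The function names and the input layout are hypothetical.

```python
def has_sizeable_reversal(prices, threshold):
    """Flag one day given its time-ordered intraday price points.

    prices: open, half-hour extremes in order of occurrence, close.
    threshold: minimum absolute return required in each direction.
    """
    returns = [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]
    if not returns:
        return False
    # Require a large enough move up and a large enough move down.
    return max(returns) > threshold and min(returns) < -threshold


def liquidity_shock_frequency(days, threshold):
    """Fraction of trading days flagged as having a sizeable reversal."""
    flagged = sum(has_sizeable_reversal(prices, threshold) for prices in days)
    return flagged / len(days)
```

    For instance, with a 2% threshold, a day whose price points are 100, 95, 101, 100 is flagged (a 5% drop followed by a rebound of more than 6%), whereas a day that only rises from 100 to 105 is not. The resulting fraction would play the role of the constant probability of liquidity shocks.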

    The pre-specified return threshold can be either stock-specific or uniform across all stocks. For the former, I use the sample standard deviation of daily stock returns based on the consecutive daily closing prices over the entire estimation period. The intuition behind this benchmark is that intraday stock price reversals exceeding one standard deviation of daily returns in each direction constitute a sizeable swing within the day. In a robustness check, I also set a uniform cutoff of 2% across all stocks to identify sizeable intraday reversals. The cross-sectional average $\theta$ is 0.0805 based on the stock-specific cutoffs and 0.1191 based on the uniform cutoff of 2% during the quarter ended on the flash crash. When the date of the flash crash is excluded, the cross-sectional averages of $\theta$ are 0.0801 and 0.1081, respectively. Given the equal weight assigned to all trading days associated with liquidity shocks irrespective of the magnitude of the price reversal beyond the threshold, the inclusion of the flash crash event only slightly boosts the empirical frequency of liquidity shocks.

    The extended PIN model is repeatedly estimated for the final sample of common stocks listed on the NYSE/AMEX, excluding those with earnings announced on days immediately adjacent to the flash crash. The maximum likelihood estimations for each stock produce two pairs of PIN components, one for the quarter excluding the flash crash and the other including the flash crash. As before, each PIN component can be imputed for the day of the flash crash based on the set of quarterly PIN components with and without the flash crash. Depending upon the cutoff used to identify liquidity shocks, it is possible that none of the trading days in the estimation period qualifies as a day with liquidity shocks, resulting in a zero probability of liquidity shock. For instance, 8.84% of stock quarters correspond to a zero probability of liquidity shock when the stock-specific cutoff is used to identify liquidity shocks. In such cases, the extended PIN model degenerates to the original PIN model and no further estimation is needed.

    4.5 PIN Decomposition

    Under the extended PIN model, the conventional PIN measure can be decomposed into an information asymmetry component and an illiquidity component. Table 3 presents the decomposition for common stocks across ten volume deciles around the flash crash. In the quarter ended one day before the flash crash, the information asymmetry component $PIN_{inasy}$ is unsurprisingly large (at the level of around 0.20) among the lowest three volume deciles, gradually declines in trading volume but not in a strictly monotonic fashion, and reaches the lowest value of 0.066 for the highest volume decile. The estimated $PIN_{inasy}$ for the low volume deciles is two to three times larger than for the highest volume decile. The quantitative pattern here appears comparable to the conventional PIN measure in Table 2. In the same quarter, the illiquidity component $PIN_{illiq}$ for the lowest volume decile is about twice as large as that for each of the remaining nine volume deciles, reaching 0.026 and about 0.013, respectively. The quarterly decomposition prior to the flash crash suggests that the information asymmetry component strictly dominates the illiquidity component by a factor of 4.7 to 22.6. Even at the lowest volume decile, where the illiquidity component is twice as large as in the rest of the volume deciles, the information asymmetry component is more than seven times as large as the illiquidity component.

    While the quarterly PIN decomposition is highlighted by the strict dominance of the information asymmetry component over the illiquidity component, the imputed PIN components on the day of the flash crash are characterized by the disappearance of this strong dominance and the lack of any distinctive pattern across volume deciles. The daily $PIN_{inasy}$ for stocks in each of the lowest three volume deciles exceeds 0.200, followed by the fourth and the ninth volume deciles at 0.192 and 0.191, respectively. One might have expected the lowest volume decile to continue having the highest $PIN_{illiq}$ on the day of the flash crash as it does in the quarter prior to the flash crash. This is actually not the case as the fifth volume decile has the highest $PIN_{illiq}$. As far as the magnitude is concerned, the illiquidity component beats the information asymmetry component in the fifth and sixth volume deciles, and is only slightly behind in the other eight volume deciles.

    The quarterly and daily PIN components reported in Table 3 are all reliably positive and
    statistically significant at any conventional level. The daily incremental PIN relative to the
    quarterly PIN in terms of the information asymmetry component shows a modest increase
    among thinly traded stocks but registers a fairly large hike among heavily traded stocks,
    ranging from 0.041 for the lowest volume decile to 0.106 for the highest volume decile. Also
    note that the incremental $PIN_{inasy}$ is not statistically different from zero at the 1% level
    for the lowest three volume deciles and the fifth decile. When expressed in terms of the
    ratio of average daily $PIN_{inasy}$ to average quarterly $PIN_{inasy}$, the PIN hike varies from
    21% to 124%. Overall, the hike in asymmetric information on the day of the flash crash is
    considerably weaker, both economically and statistically, under the extended model than
    under the original PIN framework, which fails to distinguish the information asymmetry
    component from the illiquidity component.

    In striking contrast, the daily PIN in terms of the illiquidity component is drastically
    higher than its quarterly counterpart. The daily incremental $PIN_{illiq}$ is invariably positive
    and reliably so for all ten volume deciles. The boost in $PIN_{illiq}$ on the day of the flash crash
    amounts to a more than five-fold increase for the most thinly traded stocks and a nearly
    14-fold increase for the fourth volume decile.

    The aforementioned results of the PIN decomposition are not confined to using the
    stock-specific cutoffs to identify liquidity shocks. In a robustness check, I repeat the exercise
    after defining liquidity shocks as intraday price reversals exceeding 2% for all stocks. The results
    under the uniform 2% cutoff are reported in Table 4, which closely replicates all the qualitative
    patterns in Table 3 under the stock-specific cutoffs.

    There are a number of lessons we can draw from the PIN decomposition around the flash
    crash. First and foremost, it is critically important to introduce liquidity shocks to extend
    the original PIN framework. Otherwise, the original PIN measure can be misleadingly high in
    cases where the credit is due to the illiquidity component. Second, even though the illiquidity
    component of PIN is negligibly small in the quarter leading up to the flash crash, it accounts
    for nearly as large a fraction as the information asymmetry component on the day of the flash
    crash. Since the asset pricing tests of the information risk as measured by a PIN factor have
    been done at the annual interval using the original PIN model, it may well be worthwhile to
    revisit the test using the extended PIN model at the monthly interval that is typically used
    by asset pricing studies. To the extent that the illiquidity component of PIN is declining in
    the length of the sampling interval, the factor based on the illiquidity component of PIN may
    be even stronger than previously reported in Duarte and Young (2009). Third, the roughly
    similar magnitude of $PIN_{illiq}$ across volume deciles points to the commonality of liquidity
    shocks across all stocks at the time of crisis (see footnote 5). This evidence further corroborates
    the staff report by CFTC-SEC (2010), which documents the flash crash as twin liquidity crises
    on the S&P 500 index futures market and the equity market.

    4.6 Forecasting the Opening Bid-Ask Spread

    To examine the role of PIN components in predicting future spreads, I run the regression

    $$\ln(ospread_{i,t}) = a_0 + a_1 \ln(PIN_{inasy,i,x}) + a_2 \ln(PIN_{illiq,i,x}) + a_3 \ln(volume_{i,x}) + m_{i,t}.$$

    The dependent variable is the logarithmic opening bid-ask spread as a percentage of midpoint
    price on the day of the flash crash (with time subscript $t$). The set of predictors are measured
    in the quarter immediately preceding the flash crash (with time subscript $x$) and include
    the logarithmic PIN components on information asymmetry and illiquidity as well as the
    logarithmic share volume. The individual stocks are denoted by the subscript $i$ and the
    residuals are denoted by $m_{i,t}$. The logarithmic transformation of variables is partly motivated
    by theory and has appeared in previous studies such as Weston (2001) and Easley, Engle,
    O'Hara and Wu (2008). The cross-sectional regression results are presented in Table 5.
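To make the specification above concrete, the following is a minimal sketch of a log-log cross-sectional regression with an adjusted R-squared. It is not the estimation code behind Table 5: the data are synthetic, the coefficient values in `true_a` are made up for illustration, and the helper names `solve_linear` and `ols_with_adj_r2` are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_with_adj_r2(y, rows):
    """Regress y on the regressors in rows (an intercept is added here).

    Returns (coefficients, adjusted R^2), intercept first.
    """
    design = [[1.0] + list(r) for r in rows]
    n, k = len(design), len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve_linear(xtx, xty)  # normal equations (X'X) a = X'y
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return coef, adj_r2


# Synthetic cross-section (assumption: values chosen only for illustration).
random.seed(0)
true_a = [-1.0, 0.5, 0.2, -0.3]  # a0, a1 (inasy), a2 (illiq), a3 (volume)
rows, y = [], []
for _ in range(500):
    x = [
        math.log(random.uniform(0.05, 0.30)),        # ln(PIN_inasy), prior quarter
        math.log(random.uniform(0.005, 0.05)),       # ln(PIN_illiq), prior quarter
        math.log(random.lognormvariate(12.0, 1.0)),  # ln(volume), prior quarter
    ]
    rows.append(x)
    signal = true_a[0] + sum(a * v for a, v in zip(true_a[1:], x))
    y.append(signal + random.gauss(0.0, 0.1))        # ln(ospread) on the event day

coef, adj_r2 = ols_with_adj_r2(y, rows)
print([round(c, 3) for c in coef], round(adj_r2, 3))
```

Solving the normal equations directly keeps the sketch dependency-free; a statistical package would normally be used instead, since it also reports standard errors.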

    When liquidity shocks are identified using firm-specific cutoffs, both the information
    asymmetry and the illiquidity components are positive and statistically significant as standalone
    predictors. The illiquidity component has much weaker forecasting power for the opening
    spread than does the asymmetric information component, with adjusted $R^2$ of 0.012 and
    0.195, respectively. Both PIN components in the preceding quarter are positive and highly
    statistically significant when they join the share volume in forecasting the opening spread on
    the day of the flash crash. Not surprisingly, the average daily share volume in the preceding
    quarter is negatively associated with the opening spread and statistically significant.

    All the qualitative patterns of the cross-sectional regression results above are preserved
    when liquidity shocks are identified through intraday price reversals that uniformly exceed 2%
    for all stocks. As far as the goodness of fit is concerned under the extended PIN estimations
    with a uniform cutoff, the forecasting power of the information asymmetry component
    weakens somewhat while that of the illiquidity component strengthens, with adjusted $R^2$ of
    0.095 and 0.181, respectively.

    Footnote 5: Chordia, Roll and Subrahmanyam (2000), Hasbrouck and Seppi (2001) and Korajczyk
    and Sadka (2008) study the cross-sectional commonality of liquidity. It would be interesting to
    carry out a principal component analysis on the illiquidity component of the PIN over an
    extended period of time, even though the data limitation around the flash crash event prevents
    such an exercise here.

    It is clear that the two PIN components and the share volume are able to jointly explain a
    large fraction of the cross-sectional variation in the opening spread. The adjusted $R^2$ is 0.292
    with stock-specific cutoffs and 0.394 with a uniform 2% cutoff. The power of the information
    asymmetry component in forecasting the opening spread and the positive and highly significant
    association between these two variables are consistent with the findings in the literature (e.g.,
    Easley, Kiefer, O'Hara and Paperman, 1996; Weston, 2001; Lei and Wu, 2005; Easley, Engle,
    O'Hara and Wu, 2008). This is the first paper to my knowledge that formally introduces
    firm-specific or market-wide liquidity shocks into the PIN framework so as to break the
    exclusivity of the informed trades in creating order imbalances. The new finding that the
    illiquidity component of PIN is an important predictor of future bid-ask spreads therefore
    validates the PIN extension in this paper. It contributes to the literature by helping us better
    understand the role of liquidity shocks and allowing practitioners to better anticipate trading
    costs and design trading strategies accordingly.

    4.7 Explaining the Illiquidity Component of PIN

    In a further analysis of the illiquidity component of PIN, I run the contemporaneous
    cross-sectional regression for stocks in the final sample on the day of the flash crash

    $$PIN_{illiq,i,t} = b_0 + b_1 \ln(twspread_{i,t}) + b_2 \ln(volume_{i,t}) + n_{i,t}.$$

    The individual stocks are denoted by subscript $i$ and the residuals are denoted by $n_{i,t}$. Since
    the opening spread provides only a snapshot, it may not adequately represent the full-day
    dynamics of the spread on the day of the flash crash. So I construct the time-weighted
    average spread (denoted by twspread) as a percentage of the midpoint price (see footnote 6).
    The illiquidity component of PIN is expected to be positively correlated with the time-weighted
    average spread. The contemporaneous share volume is also included in the regression.

    Table 6 reports the regression results with liquidity shocks defined using either
    stock-specific cutoffs or the uniform 2% cutoff. Regardless of the cutoff scheme, the time-weighted
    average spread is positive and highly statistically significant in explaining the cross-sectional
    variation of the illiquidity component of PIN. This relationship is quite remarkable in that
    the illiquidity component of PIN is based on the rather coarse price reversal statistics and
    primarily driven by the daily order flows that are abstracted from any price information. So
    the positive relationship is quite revealing in light of the fact that the time-weighted spread
    is purely price information.

    Footnote 6: For the construction of the time-weighted average spread on the day of the flash
    crash, I retain the same set of quotes that are used to determine the trade initiation as the
    Lee and Ready (1991) procedure requires, and purge other quotes that do not correspond to any
    actual trades. The time span in seconds between the retained quotes is then used as the weight
    for each bid-ask spread in percentage of the midpoint price to compute the time-weighted
    average spread for each day.
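The time-weighted average spread described in footnote 6 can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the function name `time_weighted_spread` and the tuple layout are hypothetical, the quotes are assumed to be already filtered to those matched with trades, and the treatment of the last quote (held until an `end_time` supplied by the caller) is an assumption, since the footnote does not say how the final interval is closed.

```python
def time_weighted_spread(quotes, end_time):
    """Time-weighted average percentage spread for one stock-day.

    quotes: list of (time_in_seconds, bid, ask), sorted by time and already
            restricted to the retained quotes.
    end_time: time in seconds at which the last retained quote stops counting
              (assumption: for example, the market close).
    """
    total_time = 0.0
    weighted_sum = 0.0
    for idx, (t, bid, ask) in enumerate(quotes):
        next_t = quotes[idx + 1][0] if idx + 1 < len(quotes) else end_time
        duration = next_t - t  # seconds this quote stays in force
        if duration <= 0:
            continue  # skip zero-length intervals from simultaneous quotes
        midpoint = 0.5 * (bid + ask)
        pct_spread = 100.0 * (ask - bid) / midpoint  # spread as % of midpoint
        weighted_sum += duration * pct_spread
        total_time += duration
    if total_time == 0.0:
        raise ValueError("no quote interval with positive duration")
    return weighted_sum / total_time


# Three retained quotes over two minutes (made-up numbers).
example_quotes = [(0, 9.99, 10.01), (60, 9.90, 10.10), (90, 9.98, 10.02)]
twspread = time_weighted_spread(example_quotes, end_time=120)
print(round(twspread, 4))
```

In the example, the wide quote is in force for only a quarter of the window, so it is weighted by 30 of the 120 seconds rather than counting as one of three equally weighted observations.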

    As a single explanatory variable, the share volume is inversely related to the illiquidity
    component. This relationship is statistically weak, however, reinforcing
    the conclusion from the visual inspection of Tables 3 and 4 that volume is not a very good
    sorting device for the illiquidity component of PIN on a standalone basis. After controlling for
    the time-weighted average spread, the logarithmic share volume registers a positive
    coefficient that is highly statistically significant. The positive relationship with share volume
    is distinctive here because it alludes to the fact that some of the S&P 500 index component
    stocks are hit the hardest with extreme price reversals during the flash crash, and the most
    heavily traded stocks experience the largest hike in the illiquidity component of PIN.

    When liquidity shocks are identified with stock-specific cutoffs, the time-weighted spread
    and the share volume explain a fairly small fraction of the cross-sectional variation in the
    illiquidity component, with an adjusted $R^2$ of 0.024. When a uniform 2% cutoff is used instead,
    the time-weighted spread and the share volume fit the data much better, with an
    adjusted $R^2$ of 0.161. While further research is needed to glean additional insights from the
    illiquidity component of PIN, the findings thus far in this paper illustrate the importance of
    introducing liquidity shocks into the PIN framework.

    5 Conclusion

    The flash crash event on May 6, 2010, provides both the motivation and the testing field
    for this paper. During this event, the sharp drop of stock prices and the swift reversal over
    a thirty-minute interval are very interesting in that they essentially amount to a serious
    challenge to the original PIN framework in Easley, Kiefer, O'Hara and Paperman (1996). On
    the day of the flash crash, there is a widespread, large increase in PIN for various sub-samples
    of stocks, and the PIN nearly tripled among the most heavily traded stocks. Such a pervasive
    PIN hike cannot be solely attributed to the increase of asymmetric information, as the
    original PIN model would imply, and the gradual incorporation of private information into
    stock prices seems at odds with the sizeable and quick reversal of stock prices across the board.

    By explicitly allowing for liquidity shocks, this paper extends the original PIN framework
    to introduce a third trading motive in addition to the private information and the exogenous
    liquidity needs. The pseudo market makers can submit contrarian orders during periods of
    liquidity shocks and thus help to restore stock prices back to the fundamental level, resulting
    in an observed price reversal. The coexistence of fundamental news and liquidity shocks in
    the extended PIN model implies that the informed investors are no longer the sole source
    of order imbalances and the pseudo market makers can also submit one-sided orders during
    liquidity shocks. Consequently, the conventional PIN measure consists of both an information
    asymmetry component and an illiquidity component.

    The extended PIN model is then put to the test around the flash crash. The illiquidity
    component of PIN accounts for a negligible fraction during the quarter leading up to the flash
    crash but experiences a five- to fourteen-fold hike on the day of the flash crash, reaching a
    level nearly on par with the information asymmetry component. Even though the information
    asymmetry component also witnesses an increase on the day of the flash crash, it is not nearly
    as drastic as that of the illiquidity component. Compared to the original PIN framework, the hike
    in asymmetric information on the day of the flash crash is weakened substantially under the
    extended model from both the statistical and the economic perspectives. Moreover, there is
    evidence that both the information asymmetry component and the illiquidity component of
    PIN can forecast the opening bid-ask spread. On the day of the flash crash, the illiquidity
    component of PIN is positively and contemporaneously correlated with the time-weighted average
    spread, further supporting the notion that firm-specific or market-wide liquidity shocks
    affect the inference on information-based trading.

    These new findings contribute to the literature and deepen our understanding of the
    role of information asymmetry. They certainly point to the importance of accounting for
    liquidity shocks in the PIN framework and invite us to revisit a number of interesting issues.
    For instance, would the PIN decomposition under the extended model imply a stronger PIN
    factor or a weaker one in the asset pricing context? Can we actually resolve the documented
    PIN anomaly in the context of mergers and acquisitions announcements? I study these
    and other interesting questions in a series of companion studies.

    In addition to the development and testing of an extension to the PIN framework, this
    paper also provides a number of methodological improvements to the PIN estimation. In the
    Appendix, I outline one simple procedure to dynamically factorize the daily log-likelihood
    function for the maximum likelihood estimation and effectively eliminate the numerical
    overflow and underflow problems that have long plagued academic researchers and practitioners
    alike in the PIN context. Moreover, this paper also furnishes guidelines for imputing the
    daily PIN series through repeated estimations of quarterly PINs. Researchers are expected
    to benefit from these methodological improvements in a wide variety of settings, especially
    among the corporate event studies that would most appreciate the availability of a daily PIN
    series without the cost of imposing a complex data structure.

    6 Appendix. Dynamic Factorization of Log-likelihood

    For ease of exposition, I illustrate the factorization process under the original PIN framework.
    The daily log-likelihood function can be written as

    $$L(\theta) = -2\varepsilon + (B + S)\ln(\varepsilon) + \ln\Big[\sum_i w_i \exp(x_i)\Big],$$

    where $k = \varepsilon/(\mu + \varepsilon)$ and the weights and the exponential inputs are given
    in the table below.

    Weight $w_i$            Exponential input $x_i$
    $1 - \alpha$            $0$
    $\alpha\delta$          $-\mu - S\ln(k)$
    $\alpha(1 - \delta)$    $-\mu - B\ln(k)$

    The computational complexity lies in the weighted sum of exponential functions, each of which
    has the potential of triggering an overflow or underflow. One can dynamically factorize the
    log-likelihood function on a daily basis using a three-step procedure.
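For readers who want to see where the weights and exponential inputs come from, the following derivation is a sketch that assumes the standard mixture-of-Poissons likelihood of Easley, Kiefer, O'Hara and Paperman (1996), with news probability $\alpha$, bad-news probability $\delta$, informed arrival rate $\mu$, uninformed arrival rate $\varepsilon$, and daily buys and sells $B$ and $S$. Under that assumption, the likelihood of one day is

$$\ell(B,S \mid \theta) = (1-\alpha)\, e^{-2\varepsilon}\frac{\varepsilon^{B+S}}{B!\,S!} + \alpha\delta\, e^{-(\mu+2\varepsilon)}\frac{\varepsilon^{B}(\mu+\varepsilon)^{S}}{B!\,S!} + \alpha(1-\delta)\, e^{-(\mu+2\varepsilon)}\frac{(\mu+\varepsilon)^{B}\varepsilon^{S}}{B!\,S!}.$$

Writing $k = \varepsilon/(\mu+\varepsilon)$, so that $(\mu+\varepsilon)^{S} = \varepsilon^{S}\exp(-S\ln k)$ and likewise for $B$, the common factor can be pulled out:

$$\ell(B,S \mid \theta) = \frac{e^{-2\varepsilon}\varepsilon^{B+S}}{B!\,S!}\Big[(1-\alpha)\,e^{0} + \alpha\delta\, e^{-\mu - S\ln k} + \alpha(1-\delta)\, e^{-\mu - B\ln k}\Big].$$

Taking logarithms and dropping the term $-\ln(B!\,S!)$, which does not depend on the parameters, gives $-2\varepsilon + (B+S)\ln(\varepsilon)$ plus the logarithm of a weighted sum of three exponentials, with the weights and exponents as the bracketed terms.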

    First, find the maximum input $x_{max}$ and pull $x_{max}$ into the common factor. In other
    words, one can compute the modified exponential input

    $$y_i = x_i - x_{max}.$$

    Second, examine each modified exponential input $y_i$ and see if it falls below the critical
    value $-C$, where $C > 0$ is determined by the researcher's hardware and software for estimating
    the PINs. Note that the maximum input $y_{max}$ is zero and thus $y_i \le 0$ always holds. If
    $y_i \le -C$, then it is necessary to force a zero weight so as to avoid the underflow from
    evaluating $\exp(y_i)$. If $-C < y_i \le 0$, then it is fine to directly evaluate $\exp(y_i)$.
    In other words, one can compute the modified weight

    $$v_i = w_i \cdot \mathbf{1}(-C < y_i \le 0),$$

    where the indicator function $\mathbf{1}(\cdot)$ takes the value of 1 if $-C < y_i \le 0$ and 0 otherwise.

    Third, the daily log-likelihood function can be rewritten as

    $$L(\theta) = -2\varepsilon + (B + S)\ln(\varepsilon) + x_{max} + \ln\Big[\sum_{v_j \neq 0} v_j \exp(y_j)\Big].$$

    Note that there is no need to check for the logarithmic inputs. The arrival rates are often
    coded as an exponential function to ensure their positiveness, so $\ln(\varepsilon)$ will not cause
    numerical problems. Moreover, the fact that $y_i \le 0$ implies that the input for the second
    logarithmic function is properly bounded between 0 and 1.
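The three-step factorization can be sketched in a few lines. This is an illustrative implementation under assumptions, not the author's code: the function name `pin_daily_loglik` is hypothetical, the default cutoff of 500 is an arbitrary choice that is safe in IEEE double precision, and the weights and exponents follow the standard Easley, Kiefer, O'Hara and Paperman (1996) likelihood with $k = \varepsilon/(\mu+\varepsilon)$. One small deviation is flagged in the comments: the maximum is taken only over terms with a positive weight, so that the final logarithm never receives zero.

```python
import math


def pin_daily_loglik(buys, sells, alpha, delta, mu, eps, cutoff=500.0):
    """Daily log-likelihood of the original PIN model, up to the constant
    -ln(B! S!), using the max-shift factorization.

    Assumes eps > 0, mu >= 0, 0 <= alpha < 1 and 0 <= delta <= 1.
    cutoff plays the role of C (assumption: 500 is a conservative value).
    """
    ln_k = -math.log1p(mu / eps)  # ln(k), k = eps / (mu + eps)
    weights = [1.0 - alpha, alpha * delta, alpha * (1.0 - delta)]
    inputs = [0.0, -mu - sells * ln_k, -mu - buys * ln_k]

    # Step 1: pull the largest exponent into the common factor.
    # Deviation from the text: zero-weight terms are ignored when taking the max.
    x_max = max(x for w, x in zip(weights, inputs) if w > 0.0)

    # Step 2: drop terms whose shifted exponent is at or below -C.
    total = 0.0
    for w, x in zip(weights, inputs):
        y = x - x_max
        if w > 0.0 and y > -cutoff:
            total += w * math.exp(y)  # y <= 0, so exp(y) cannot overflow

    # Step 3: reassemble the log-likelihood.
    return -2.0 * eps + (buys + sells) * math.log(eps) + x_max + math.log(total)


# A heavy-volume day on which a direct evaluation of exp(x_i) would overflow.
value = pin_daily_loglik(buys=40000, sells=10000, alpha=0.4, delta=0.5,
                         mu=30000.0, eps=10000.0)
print(math.isfinite(value))
```

With these inputs, the largest exponent is in the tens of thousands, so evaluating `math.exp` on it directly would raise an `OverflowError`; after the shift, the dominant term contributes `exp(0)` and the remaining terms are dropped by the cutoff.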

    Having discussed the process of factoring the log-likelihood function, I should also note the
    importance of checking for the presence of overflow and underflow problems when handling
    the various transformations of raw parameter inputs that help to ensure that $0 \le \alpha \le 1$,
    $0 \le \delta \le 1$, $\mu > 0$, and $\varepsilon > 0$. Researchers often use the exponential
    transformation to ensure a positive parameter and the logistic transformation for a parameter
    that is a probability. One can use techniques similar to the ones documented above to handle
    these transformations. For instance, denote by $\tilde\alpha$, $\tilde\delta$, $\tilde\mu$ and
    $\tilde\varepsilon$ the parameters before transformation and by $c$ a constant used to scale the
    arrival rates so as to make the Hessian matrix of the vector of parameters well behaved. The
    transformation for the news probability $\alpha$ and the probability $\delta$ of arrived news
    being negative can be written as

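    $$\alpha = \begin{cases} 0 & \text{if } \tilde\alpha \le -C, \\ \dfrac{1}{1+\exp(-\tilde\alpha)} & \text{if } |\tilde\alpha| < C, \\ 1 & \text{if } \tilde\alpha \ge C, \end{cases} \qquad \delta = \begin{cases} 0 & \text{if } \tilde\delta \le -C, \\ \dfrac{1}{1+\exp(-\tilde\delta)} & \text{if } |\tilde\delta| < C, \\ 1 & \text{if } \tilde\delta \ge C. \end{cases}$$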

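    The informed arrival rate $\mu$ and the uninformed arrival rate $\varepsilon$ can be written as

    $$\mu = \begin{cases} 0 & \text{if } \tilde\mu \le -C, \\ \exp(\tilde\mu)\exp(c) & \text{if } |\tilde\mu| < C, \\ \exp(C)\exp(c) & \text{if } \tilde\mu \ge C, \end{cases} \qquad \varepsilon = \begin{cases} 0 & \text{if } \tilde\varepsilon \le -C, \\ \exp(\tilde\varepsilon)\exp(c) & \text{if } |\tilde\varepsilon| < C, \\ \exp(C)\exp(c) & \text{if } \tilde\varepsilon \ge C. \end{cases}$$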

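    Finally, one needs to anticipate potential overflow and underflow problems for the
    computation of

    $$k = \frac{\varepsilon}{\mu + \varepsilon} = \frac{1}{1 + \exp(\tilde\mu - \tilde\varepsilon)} \quad \text{and} \quad \ln(k) = -\ln\left[1 + \exp(\tilde\mu - \tilde\varepsilon)\right].$$

    There are four cases to consider. (1) If $\tilde\mu - \tilde\varepsilon \le -C$, then $k = 1$ and
    $\ln(k) = 0$. (2) If $\tilde\mu - \tilde\varepsilon \ge C$, then $k = 0$ and
    $\ln(k) = \tilde\varepsilon - \tilde\mu$. (3) If $|\tilde\mu - \tilde\varepsilon| < C$ and
    $1 + \exp(\tilde\mu - \tilde\varepsilon) \ge 10^{C/\ln(10)}$, then $k = 0$ and
    $\ln(k) = \tilde\varepsilon - \tilde\mu$. (4) If $|\tilde\mu - \tilde\varepsilon| < C$ and
    $1 + \exp(\tilde\mu - \tilde\varepsilon) < 10^{C/\ln(10)}$, then
    $k = 1/[1 + \exp(\tilde\mu - \tilde\varepsilon)]$ and
    $\ln(k) = -\ln[1 + \exp(\tilde\mu - \tilde\varepsilon)]$.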
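The guarded transformations above can be sketched as follows. The function names (`to_probability`, `to_arrival_rate`, `k_and_log_k`), the cutoff value of 500, and the default scale of zero are assumptions made for illustration; the text leaves $C$ and $c$ to the researcher. The sketch follows the logistic and exponential forms and the four-case treatment of $k$, and uses `math.log1p` in the fourth case only as a numerically careful way of evaluating $\ln[1 + \exp(\cdot)]$.

```python
import math

C = 500.0  # assumed cutoff; exp(500) is still finite in IEEE double precision


def to_probability(raw, cutoff=C):
    """Logistic transform of an unconstrained input into [0, 1]."""
    if raw <= -cutoff:
        return 0.0
    if raw >= cutoff:
        return 1.0
    return 1.0 / (1.0 + math.exp(-raw))


def to_arrival_rate(raw, scale=0.0, cutoff=C):
    """Exponential transform of an unconstrained input into a non-negative rate.

    scale plays the role of the constant c that rescales the arrival rates.
    """
    if raw <= -cutoff:
        return 0.0
    capped = min(raw, cutoff)  # exp(C) * exp(c) once raw reaches the cutoff
    return math.exp(capped) * math.exp(scale)


def k_and_log_k(raw_mu, raw_eps, cutoff=C):
    """Return (k, ln k) for k = eps / (mu + eps), following the four cases."""
    d = raw_mu - raw_eps
    if d <= -cutoff:                            # case 1: mu negligible
        return 1.0, 0.0
    if d >= cutoff:                             # case 2: eps negligible
        return 0.0, -d
    if 1.0 + math.exp(d) >= math.exp(cutoff):   # case 3: note 10**(C/ln 10) == exp(C)
        return 0.0, -d
    return 1.0 / (1.0 + math.exp(d)), -math.log1p(math.exp(d))  # case 4


alpha = to_probability(0.3)
mu = to_arrival_rate(1.2, scale=3.0)
eps = to_arrival_rate(0.7, scale=3.0)
k, ln_k = k_and_log_k(1.2, 0.7)
print(alpha, mu, eps, k, ln_k)
```

Because both arrival rates share the same scale constant, it cancels in the ratio, which is why `k_and_log_k` needs only the difference of the two raw inputs.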

    The log-likelihood function under the extended PIN framework can be handled in a similar
    way. For instance, the table of weights and exponential inputs can be simply augmented with
    two more rows along with the introduction of the additional parameter. I omit the details for brevity.

    References

    [1] Aktas, Nihat, Eric de Bodt, Fany Declerck, and Herve Van Oppens, 2007, The PIN
    anomaly around M&A announcements, Journal of Financial Markets 10, 169-191.

    [2] Benos, Evangelos, and Marek Jochec, 2007, Testing the PIN variable, University of
    Illinois at Urbana-Champaign Working Paper.

    [3] Bessembinder, Hendrik, Kalok Chan, and Paul J. Seguin, 1996, An empirical examination
    of information, differences of opinion, and trading activity, Journal of Financial
    Economics 40, 105-134.

    [4] Bharath, Sreedhar T., Paolo Pasquariello, and Guojun Wu, 2009, Does asymmetric
    information drive capital structure decisions?, Review of Financial Studies 22, 3211-3243.

    [5] Boehmer, Ekkehart, Joachim Grammig, and Erik Theissen, 2007, Estimating the probability
    of informed trading: does trade misclassification matter?, Journal of Financial
    Markets 10, 26-47.

    [6] Chan, Kalok, 1992, A further analysis of the lead-lag relationship between the cash
    market and stock index futures market, Review of Financial Studies 5, 123-152.

    [7] Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam, 2000, Commonality in
    liquidity, Journal of Financial Economics 56, 3-28.

    [8] Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam, 2001, Market liquidity
    and trading activity, Journal of Finance 56, 501-530.

    [9] Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam, 2005, Evidence on the
    speed of convergence to market efficiency, Journal of Financial Economics 76, 271-292.

    [10] CFTC-SEC, 2010, Findings regarding the market events of May 6, 2010, Report of the
    Staffs of CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory
    Issues.

    [11] Duarte, Jefferson, Xi Han, Jarrad Harford, and Lance Young, 2008, Information asymmetry,
    information dissemination and the effect of regulation FD on the cost of
    capital, Journal of Financial Economics 87, 24-44.

    [12] Duarte, Jefferson, and Lance Young, 2009, Why is PIN priced?, Journal of Financial
    Economics 91, 119-138.

    [13] Easley, David, Robert F. Engle, Maureen O'Hara, and Liuren Wu, 2008, Time-varying
    arrival rates of informed and uninformed trades, Journal of Financial Econometrics
    6, 171-207.

    [14] Easley, David, Soeren Hvidkjaer, and Maureen O'Hara, 2002, Is information risk a
    determinant of asset returns?, Journal of Finance 57, 2185-2221.

    [15] Easley, David, Nicholas M. Kiefer, and Maureen O'Hara, 1996, Cream-skimming or
    profit-sharing? The curious role of purchased order flow, Journal of Finance 51,
    811-833.

    [16] Easley, David, Nicholas M. Kiefer, and Maureen O'Hara, 1997, One day in the life of a
    very common stock, Review of Financial Studies 10, 805-835.

    [17] Easley, David, Nicholas M. Kiefer, Maureen O'Hara, and Joseph B. Paperman, 1996,
    Liquidity, information, and infrequently traded stocks, Journal of Finance 51, 1405-1436.

    [18] Easley, David, Marcos M. Lopez de Prado, and Maureen O'Hara, 2010, The microstructure
    of the Flash crash: Flow toxicity, liquidity crashes and the probability of
    informed trading, Cornell University Working Paper.

    [19] Easley, David, and Maureen O'Hara, 1992, Time and the process of security price
    adjustment, Journal of Finance 47, 577-605.

    [20] Easley, David, and Maureen O'Hara, 2004, Information and the cost of capital, Journal
    of Finance 59, 1553-1583.

    [21] Hasbrouck, Joel, and Duane J. Seppi, 2001, Common factors in prices, order flows, and
    liquidity, Journal of Financial Economics 59, 383-411.

    [22] Jayaraman, Sudarshan, 2008, Earnings volatility, cash flow volatility, and informed
    trading, Journal of Accounting Research 46, 809-851.

    [23] Kaul, Gautam, Qin Lei, and Noah Stoffman, 2005, AIMing at PIN: Order flow, information,
    and liquidity, University of Michigan Working Paper.

    [24] Korajczyk, Robert A., and Ronnie Sadka, 2008, Pricing the commonality across
    alternative measures of liquidity, Journal of Financial Economics 87, 45-72.

    [25] Lee, Charles M. C., and Mark J. Ready, 1991, Inferring trade direction from intraday
    data, Journal of Finance 46, 733-746.

    [26] Lei, Qin, and Guojun Wu, 2005, Time-varying informed and uninformed trading activities,
    Journal of Financial Markets 8, 153-181.

    [27] Vega, Clara, 2006, Stock price reaction to public and private information, Journal of

    F