using r in continuous assurance: restricted vector autoregressive model (rvar) of continuity...

Upload: erik-van-kempen

Post on 14-Apr-2018


  • 7/30/2019 Using R in Continuous Assurance: Restricted Vector Autoregressive Model (RVAR) of Continuity Equations

    1/6

    CONTRIBUTED RESEARCH ARTICLE 1

    Using R in Continuous Assurance: Restricted Vector Autoregressive Model (RVAR) of Continuity Equations

    by Erik van Kempen

    Abstract Continuous assurance is a methodology to provide assurance on financial data on a near real-time basis. One of the fundamental elements of continuous assurance is continuous data auditing, in which the integrity of the data provided by the client is tested. Continuity equations can be used to evidence assertions regarding data integrity. To do so, the data is tested by predicting subsequent values with a fitted model. In this paper the Restricted Vector Autoregressive Model (RVAR) is covered as a method for continuous data auditing. The vars package is used to implement the model and prediction methods in R.

    Introduction

    Continuous assurance has been a subject of interest for auditors and financial professionals for the last three decades. However, this field of research took off only after Vasarhelyi et al. (2004) published a widely accepted conceptual framework for continuous assurance. In the following years additional studies were performed in this field, but most of them focused on refining the theoretical framework and developing new and innovative analysis methods. Fully functional implementations were not yet in scope.

    This paper focuses on the implementation in R of continuity equations, one of the most powerful tools from the continuous assurance domain. As most auditors and financial professionals are not trained to develop algorithms, code or applications, the final implementation in R needs to be fairly easy to understand for these target groups.

    Continuous assurance

    "Continuous auditing [or continuous assurance] is a methodology that enables independent auditors to provide written assurance on a subject matter using a series of auditors' reports issued simultaneously with, or a short period of time after, the occurrence of events underlying the subject matter." (Canadian Institute of Chartered Accountants (CICA), 1999)

    In order to provide assurance on a near real-time basis, auditors have to rely heavily on automated testing. Vasarhelyi et al. (2004, 2010) have defined three elements of continuous assurance and continuous monitoring: Continuous Control Monitoring (CCM), Continuous Data Auditing (CDA) and Continuous Risk Monitoring and Assessment (CRMA). CCM can be compared to interim testing of procedures in the conventional audit framework, and CDA can be compared to final testing, focusing more on data than on procedures. These two elements combined can be used to provide sufficient assurance. CRMA can be used as an additional part of the control framework, but is not essential for providing assurance.

    CDA verifies the integrity of the data flowing through the information system. The data provided by the client is the basis for all testing procedures, so data assurance forms an essential part of continuous assurance. Continuity equations can be used as a tool from the CDA sub-domain to evidence assertions focusing on data integrity.

    Continuity equations

    Continuity equations have been a fundamental part of classical physics since the 18th century. These equations describe the transport of a quantity while ensuring conservation of this quantity (such as mass or energy). Similar relations can be defined for the transport of quantities within a system in the financial domain. The movement of reported quantities, e.g. ordered kilograms or invoiced units, between steps in the key business processes can be described with continuity equations.

    The term continuity equations was coined in 1991, when Vasarhelyi and Halper (1991) modeled the flow of billing data at AT&T. Although Vasarhelyi and Halper proposed continuity equations more than 20 years ago, little research has been performed on their application in practice and on the implementation of a sound continuity equations model. Research focusing on the Vector Autoregressive (VAR) model in particular has been rare in the last two decades: only Dzeng (1994) and Kogan et al. (2010) have considered the VAR model in their papers.

    In most businesses the flow of goods is the most important basis for revenue recognition. As such, it can be used to provide evidence for the completeness, timeliness and accuracy of the reported revenue. If the continuity equations hold for a specific business process, one can assert that there are no leakages from the transaction flow, i.e. the integrity of the flow of goods can be asserted. Therefore, continuity equations provide a method to evidence the integrity of the basis for revenue recognition, which makes them a valuable tool in continuous assurance.

    Continuity equations are based on historical data of quantities in the separate steps of business processes. For example, the sales cycle can be modeled as three separate steps: receiving the order from the customer, shipping goods to the customer and invoicing for the ordered and shipped goods. The quantity of goods ordered today will show up in the invoicing step a certain number of days later. The daily flow of goods between these steps can be described by a certain quantity Q and a lag between the steps.

    In this paper we focus on the sales cycle consisting of the three process steps defined above. The continuity equations for the sales cycle can be represented as Equation 1. In this model $\mathit{ordered}_t$, $\mathit{shipped}_t$ and $\mathit{invoiced}_t$ are respectively the quantities ordered, shipped and invoiced at time $t$, the $\Phi$ terms are $N \times 1$ transition vectors for a multivariate linear model, the $M$ terms are $N \times 1$ vectors containing daily aggregates of quantities $Q$ for the given dimension, and $N$ is the number of time periods covered in the model.

    $$
    \begin{aligned}
    \mathit{ordered}_t  &= \Phi_{oo} M(\mathit{ordered}) + \Phi_{so} M(\mathit{shipped}) + \Phi_{io} M(\mathit{invoiced}) \\
    \mathit{shipped}_t  &= \Phi_{os} M(\mathit{ordered}) + \Phi_{ss} M(\mathit{shipped}) + \Phi_{is} M(\mathit{invoiced}) \\
    \mathit{invoiced}_t &= \Phi_{oi} M(\mathit{ordered}) + \Phi_{si} M(\mathit{shipped}) + \Phi_{ii} M(\mathit{invoiced})
    \end{aligned}
    \tag{1}
    $$

    Each of these sub-equations models a predictor for the reported quantities in a specific step of the business process. As defined above, the quantities are related to the quantities in the other process steps by a time delay (lag). For example, if orders are shipped in exactly one day, without exception, and invoicing is performed simultaneously with shipping, the resulting predictors can be defined as Equation 2.

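    $$
    \begin{aligned}
    \mathit{ordered}_t  &= \alpha_1\,\mathit{shipped}_{t+1} + \alpha_2\,\mathit{invoiced}_{t+1} \\
    \mathit{shipped}_t  &= \beta_1\,\mathit{ordered}_{t-1} + \beta_2\,\mathit{invoiced}_t \\
    \mathit{invoiced}_t &= \gamma_1\,\mathit{ordered}_{t-1} + \gamma_2\,\mathit{shipped}_t
    \end{aligned}
    \tag{2}
    $$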
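    The one-day-lag case can be checked numerically. The following is a minimal sketch on invented quantities (not the data used in this paper): it shifts an order series by one day to obtain the shipped and invoiced series, and then recovers the lag relation with a regression on the lagged orders. The variable names and values are assumptions for illustration only.

```r
# Synthetic daily order quantities (illustrative values, not the paper's data).
ordered <- c(120, 80, 100, 150, 90, 110)

# Orders ship exactly one day later; invoicing happens on the shipping day.
shipped  <- c(NA, head(ordered, -1))   # shipped[t]  = ordered[t - 1]
invoiced <- shipped                    # invoiced[t] = shipped[t]

# Align each day with the orders of the previous day (drops the first day).
lagged <- data.frame(
  shipped      = shipped[-1],
  invoiced     = invoiced[-1],
  ordered_lag1 = ordered[-length(ordered)]
)

# Regress shipped quantities on lagged orders, without an intercept.
fit <- lm(shipped ~ 0 + ordered_lag1, data = lagged)
coef(fit)
```

    Because the synthetic process has no noise, the estimated coefficient on the lagged orders equals one. Note that the lagged orders and the same-day invoiced quantities are identical in this idealized case, so they cannot both be used as regressors in one equation.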

    In practice most business processes cannot be modeled adequately with such a simple specification, because lags and dependencies between process steps vary.

    Basic Vector Autoregressive model

    In the basic Vector Autoregressive (VAR) model, the model is estimated by optimizing the overall $R^2$ over different lags for the process steps. Only the maximum expected lag is provided to the algorithm, which then searches for the best fitting model by iterating through all lag possibilities. The exact lags do not have to be known prior to modeling, as the best fitting lags are determined during modeling.

    It is not always trivial to determine lags prior to the modeling process, e.g. lags in the purchasing cycle are highly dependent on the policies and processes at third parties. Therefore, the VAR model can be a powerful tool for modeling continuity equations when lags cannot easily be predefined.
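    As an illustration, a basic VAR with automatic lag selection might be fitted as sketched below. The data are simulated (orders shipped two days later, invoiced one day after shipping) and the column names SO, GS and IS are only borrowed from the data model used later in this paper. The sketch assumes the vars package is installed; its VAR function takes the maximum lag via lag.max and chooses the lag order with an information criterion (here AIC), which is a slightly different selection rule from the $R^2$ wording above.

```r
library(vars)  # assumed to be installed from CRAN

set.seed(42)
n <- 300

# Simulated sales cycle: ship two days after ordering, invoice one day later.
ordered  <- runif(n, 100, 200)
shipped  <- c(rep(150, 2), head(ordered, -2)) + rnorm(n, sd = 2)
invoiced <- c(150, head(shipped, -1)) + rnorm(n, sd = 2)

# Combine the three process steps into one multiple time series object.
sales <- ts(cbind(SO = ordered, GS = shipped, IS = invoiced))

# Only the maximum lag is supplied; the lag order is selected by AIC.
model.var <- VAR(sales, lag.max = 7, ic = "AIC", type = "const")
model.var$p   # selected lag order
```

    The selected lag order is stored in the p element of the fitted object, so it can be inspected before the model is restricted further.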

    Restricted Vector Autoregressive model

    Kogan et al. (2010) have shown in their studies that the VAR model produces outstanding results. More importantly, they showed that the Subset VAR or Restricted VAR model performed even better. With a MAPE (mean absolute percentage error) of 0.3374 on the test set it outscored several other models, namely SEM, GARCH and LRM. Only the BVAR model performed better when taking only the MAPE into account, but it also resulted in a larger standard deviation of the absolute percentage error. The RVAR model was found to be one of the best models for continuity equations.
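    For reference, the MAPE used in this comparison can be computed as in the sketch below. The helper name mape and the sample values are assumptions made for illustration; the definition is the usual mean of the absolute errors relative to the actual values.

```r
# Hypothetical helper: mean absolute percentage error, as a fraction.
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual))
}

# Invented actual and predicted quantities for three days.
actual    <- c(100, 200, 400)
predicted <- c(110, 180, 400)

mape(actual, predicted)  # (0.10 + 0.10 + 0.00) / 3
```

    Days with an actual quantity of zero make the ratio undefined, so such days would have to be excluded or handled separately before the MAPE is computed on daily aggregates.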

    The Restricted Vector Autoregressive Model translates roughly to optimizing the $R^2$ of the predictor by removing insignificant variables from the VAR model. For example, if the mean lag between ordering and shipping is less than a month, the term $\mathit{shipped}_{t+365}$, a shipment a year after ordering, is obviously not significant and is thus excluded from the model. This method iterates the modeling process per equation by removing all variables with |t|-statistics below a predefined threshold, as illustrated in Figure 1.

    Figure 1: RVAR modeling process. The initial VAR model is restricted by excluding parameters with a t-statistic below a predefined threshold. The model is re-estimated, followed by the next exclusion iteration, until all parameters satisfy the t-statistic requirement.
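    The loop in Figure 1 can be sketched for a single equation in base R. This is an illustration only, not the code of the vars package: the helper restrict_equation and the simulated regressors are assumptions, and the sketch drops the single weakest regressor per iteration before re-estimating, which is one possible reading of the exclusion step.

```r
# Hypothetical helper: backward elimination on |t| for one equation.
restrict_equation <- function(y, X, thresh = 2) {
  keep <- colnames(X)
  repeat {
    # Re-estimate the equation on the regressors that are still included.
    fit   <- lm(y ~ 0 + ., data = as.data.frame(X[, keep, drop = FALSE]))
    tvals <- abs(coef(summary(fit))[, "t value"])
    # Stop when every remaining regressor meets the threshold.
    if (all(tvals >= thresh) || length(keep) == 1) break
    # Otherwise exclude the regressor with the smallest |t| and iterate.
    keep <- setdiff(keep, names(which.min(tvals)))
  }
  list(fit = fit, kept = keep)
}

# Simulated data: y depends on x1 and x3 only; x2 is pure noise.
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X[, "x1"] - 1.5 * X[, "x3"] + rnorm(n)

result <- restrict_equation(y, X, thresh = 2)
result$kept
```

    In a restricted VAR the same procedure is applied to each equation separately, so different equations can end up with different sets of lagged regressors.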

    Implementation

    The RVAR modeling is implemented in four stages: data collection, pre-processing, modeling and prediction. The code is centered around the vars package, which has been developed and published by Bernhard Pfaff and Matthieu Stigler and is available via CRAN (Pfaff, 2008b; Pfaff and Im Taunus, 2007; Pfaff, 2008a). The package includes several functions for modeling VARs, testing the VARs and presenting the results.

    Data collection

    The proposed base model for the sales cycle is based on three different quantities: the ordered quantity, the quantity of goods shipped and the quantity invoiced. These three variables can be provided by most ERP systems on a daily basis.

    In this implementation data was used from a wholesaler in technical supplies. This company uses an off-the-shelf solution of Microsoft Dynamics AX 2009. The data was extracted from separately generated reports for each of the process steps by merging the columns by date, as presented in Figure 2.

    Figure 2: Data model consisting of daily aggregates for three different stages in the sales cycle: ordered quantity (SO), quantity of goods shipped to the customer (GS) and quantity invoiced (IS), combined by date via a SQL join clause. The date serves as both the primary and the foreign key of the data sources involved.

    The resulting data is exported as a CSV file to be imported by the implementation of the modeling tool in R. The CSV file consists of four data fields, i.e. the date, the quantities ordered, the quantities shipped and the quantities invoiced, and is then read into R.

    http://cran.r-project.org/package=vars

    > data.raw <- ...
    > summary(data.raw)
          Date                  SO               GS               IS
     Min.   :2007-01-02   Min.   :     0   Min.   :     0   Min.   :  -100
     1st Qu.:2007-03-22   1st Qu.: 38384   1st Qu.: 42098   1st Qu.: 39736
     Median :2007-06-14   Median : 63227   Median : 63738   Median : 60765
     Mean   :2007-06-14   Mean   : 67769   Mean   : 62624   Mean   : 60695
     3rd Qu.:2007-09-06   3rd Qu.: 85723   3rd Qu.: 83428   3rd Qu.: 79757
     Max.   :2007-11-30   Max.   :547694   Max.   :285074   Max.   :299235
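    A minimal sketch of such an import is given below. The file is a temporary stand-in written by the example itself, and the read.csv arguments are assumptions about a CSV file with the four fields described above; the actual file name and import call used for the wholesaler data are not reproduced here.

```r
# Write a tiny stand-in CSV file with the four fields (invented values).
csv.path <- tempfile(fileext = ".csv")
writeLines(c("Date,SO,GS,IS",
             "2007-01-02,120,90,90",
             "2007-01-03,80,110,105"),
           csv.path)

# Import the file; the first column is parsed as a Date, the rest as numbers.
data.raw <- read.csv(csv.path,
                     colClasses = c("Date", "numeric", "numeric", "numeric"))
summary(data.raw)
```

    Parsing the first column as a Date at import time keeps the later merge by date straightforward.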

    Pre-processing

    The data generated by the ERP system is probably not provided in the data format expected by the modeling functions. Therefore, the data has to be pre-processed before it can be used as input for the modeling stage. First the raw data has to be imported. If data is missing for a specific day, e.g. on weekends, the date is left out of the reports from the ERP system. The missing dates are added to the data set with quantities of zero, resulting in a complete, gap-free daily data set. This complete data set can be converted to a multiple time series object (mts), which is used in the modeling stage.
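    The pre-processing steps described above can be sketched in base R as follows. The object names follow those used in this paper, but the small data set and the specific calls (seq, merge, ts) are assumptions chosen for illustration, not necessarily the calls of the original script.

```r
# Invented raw data in which 2007-01-04 is missing from the reports.
data.raw <- data.frame(
  Date = as.Date(c("2007-01-02", "2007-01-03", "2007-01-05")),
  SO   = c(120, 80, 100),
  GS   = c(90, 110, 95),
  IS   = c(90, 105, 95)
)

# Empty frame holding every calendar day in the covered period.
data.empty <- data.frame(
  Date = seq(min(data.raw$Date), max(data.raw$Date), by = "day")
)

# A left join keeps all days; days without a report get NA quantities.
data.merged <- merge(data.empty, data.raw, by = "Date", all.x = TRUE)

# Missing days are treated as days with zero quantities.
data.merged[is.na(data.merged)] <- 0

# Convert the three quantity columns to a multiple time series object.
data.tseries <- ts(data.merged[, c("SO", "GS", "IS")])
data.merged
```

    The date column is left out of the time series object, because the observations are now equally spaced and the position in the series identifies the day.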

    > data.empty <- ...
    > data.merged <- ...
    > data.merged[is.na(data.merged)] <- ...
    > data.tseries <- ...

    Modeling

    > library("vars")
    > model.var <- ...
    > model.var.restricted <- ...
    > summary(model.var.restricted)

    ...

    Estimation results for equation SO:
    ===================================
    SO = IS.l1 + IS.l5 + SO.l7 + IS.l7 + GS.l9 + SO.l16 + GS.l19 + GS.l20 + SO.l21

           Estimate Std. Error t value Pr(>|t|)
    IS.l1   0.22214    0.05723   3.881 0.000127 ***
    IS.l5  -0.18138    0.05998  -3.024 0.002705 **
    SO.l7   0.18734    0.05663   3.308 0.001052 **
    IS.l7   0.29996    0.07139   4.202 3.49e-05 ***
    GS.l9  -0.19140    0.06479  -2.954 0.003381 **
    SO.l16  0.11165    0.05287   2.112 0.035512 *
    GS.l19  0.16278    0.06126   2.657 0.008301 **
    GS.l20  0.27508    0.05881   4.677 4.38e-06 ***
    SO.l21  0.15094    0.05026   3.003 0.002896 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    ...
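    A restricted model of this kind could be obtained as in the sketch below. The data are simulated, and the lag order of 3 and the threshold of 2 are arbitrary assumptions, so the output will not match the listing above. The sketch relies on the VAR and restrict functions of the vars package, where method = "ser" re-estimates each equation while t-values below the threshold remain.

```r
library(vars)  # assumed to be installed from CRAN

set.seed(42)
n <- 300

# Simulated sales cycle: ship two days after ordering, invoice one day later.
ordered  <- runif(n, 100, 200)
shipped  <- c(rep(150, 2), head(ordered, -2)) + rnorm(n, sd = 2)
invoiced <- c(150, head(shipped, -1)) + rnorm(n, sd = 2)
sales <- ts(cbind(SO = ordered, GS = shipped, IS = invoiced))

# Unrestricted VAR with a fixed lag order (arbitrary choice for the sketch).
model.var <- VAR(sales, p = 3, type = "const")

# Drop regressors with |t| below the threshold, equation by equation.
model.var.restricted <- restrict(model.var, method = "ser", thresh = 2.0)

# 0/1 matrix: rows are equations, columns are candidate regressors.
model.var.restricted$restrictions
```

    The restrictions matrix shows at a glance which lagged quantities survive in each equation; in the simulated data the second lag of SO should remain in the GS equation.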


    Prediction

    Finally, the RVAR model is used to generate predictions for subsequent time periods with the predict function. These predictions can be compared with the actual quantities reported in subsequent time periods for conformance. Deviations between the actual and predicted values are flagged as exceptions if they exceed a predefined threshold. In this specific implementation the restricted model is used to predict 10 subsequent time periods with a confidence interval of 1%. In this example the confidence interval parameter ci is deliberately misused in order to obtain a narrow range between the upper and lower bounds. The resulting upper and lower bounds can be used as the thresholds for flagging dates as exceptions. The vars package also provides functions, like print and fanchart, to visualize the predicted values, as shown in Figure 3.

    > model.predictions <- ...
    > print(model.predictions)

    $SO

    fcst lower upper CI

    [1,] 7583.843 6982.703 8184.983 601.1400

    [2,] 4133.422 3523.785 4743.058 609.6366

    [3,] 58130.976 57521.340 58740.613 609.6366

    ...

    $GS

    fcst lower upper CI

    [1,] 9136.999 8706.614 9567.384 430.3852

    [2,] 3682.716 3252.331 4113.101 430.3852

    [3,] 44230.601 43800.215 44660.986 430.3852

    ...

    $IS

    fcst lower upper CI

    [1,] 8660.376 8203.789 9116.964 456.5877

    [2,] 2995.246 2538.658 3451.833 456.5877

    [3,] 49854.651 49398.063 50311.239 456.5877

    ...
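    The flagging step can be sketched in base R. The forecast bounds below are copied from the first three rows of the $SO listing above, while the actual quantities are invented for illustration. In the vars package such bounds are produced by predict with the arguments n.ahead and ci, so a call along the lines of predict(model.var.restricted, n.ahead = 10, ci = 0.01) would be consistent with the description above; that call is an assumption, not a quotation of the original script.

```r
# Forecast and bounds for SO, copied from the first three forecast rows.
forecast.SO <- cbind(
  fcst  = c(7583.843, 4133.422, 58130.976),
  lower = c(6982.703, 3523.785, 57521.340),
  upper = c(8184.983, 4743.058, 58740.613)
)

# Hypothetical reported quantities for the same three days (invented).
actual.SO <- c(7600, 5200, 58000)

# A day is an exception when the actual value falls outside the bounds.
is.exception <- actual.SO < forecast.SO[, "lower"] |
  actual.SO > forecast.SO[, "upper"]

data.frame(day = 1:3, actual = actual.SO, exception = is.exception)
```

    With these invented values only the second day falls outside its bounds and would be reported for follow-up.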

    Figure 3: Plot of the three predicted (or forecasted) variables, i.e. SO, GS and IS, using the fanchart function (panels: Fanchart for variable SO, Fanchart for variable GS, Fanchart for variable IS).


    Conclusion

    The resulting script shows that the Restricted Vector Autoregressive model for continuity equations can be implemented in R fairly easily and in an understandable way. Due to the availability of the vars package by Pfaff, the VAR model and the RVAR model can be implemented in just a few lines of code. Once the data is processed to fit the data model expected by the modeling functions from the package, only three functions in total have to be called to generate predictions for the RVAR model.

    Recommendations

    As expressed by Roy Sidebotham, Professor of Accountancy at Victoria University of Wellington, "Accountants are cautious men, and their caution is expressed in the concept of conservatism." Certainly in the field of continuous assurance this has been shown to be true throughout the years. Although continuous assurance has been available for a very long time, only a handful of auditors have embraced the advantages the tools can add to the organization. If these tools are given a more intuitive look-and-feel, auditors might be more inclined to try and use them.

    Furthermore, this paper focused on a single model of continuity equations. Other models might provide a better fit on data for specific company types. Additional model implementations, e.g. Bayesian VAR, GARCH or SEM, should be studied to provide a wider range of models.

    Bibliography

    Canadian Institute of Chartered Accountants (CICA). Continuous auditing, 1999. [p1]

    S. Dzeng. A comparison of analytical procedures expectation models using both aggregate and disaggregate data. Auditing: A Journal of Practice & Theory, 13(Fall):1-24, 1994. [p2]

    A. Kogan, M. G. Alles, M. A. Vasarhelyi, and J. Wu. Analytical procedures for continuous data level auditing: Continuity equations. 2010. [p2]

    B. Pfaff. vars: VAR modelling. R package version, pages 1-3, 2008a. [p3]

    B. Pfaff. VAR, SVAR and SVEC models: Implementation within R package vars. Journal of Statistical Software, 27(4):1-32, 2008b. [p3]

    B. Pfaff and K. Im Taunus. Using the vars package. 2007. [p3]

    M. A. Vasarhelyi and F. B. Halper. The continuous audit of online systems. Auditing: A Journal of Practice & Theory, 10(1):110-125, 1991. [p1]

    M. A. Vasarhelyi, M. G. Alles, and A. Kogan. Principles of analytic monitoring for continuous assurance. Journal of Emerging Technologies in Accounting, 1(1):1-21, 2004. [p1]

    M. A. Vasarhelyi, M. Alles, and K. T. Williams. Continuous assurance for the now economy. Institute of Chartered Accountants in Australia, Sydney, Australia, 2010. [p1]

    Erik van Kempen
    Fontys University of Applied Sciences
    Fontys Hogeschool Financieel Management (Fontys School of Financial Management)
    Rachelsmolen 1
    5612 MA Eindhoven
    The Netherlands
    [email protected]
