

Continuity Equations in Continuous Auditing: Detecting Anomalies in Business Processes

Michael Alles

Department of Accounting & Information Systems

Rutgers University

180 University Ave

Newark, NJ 07102

Alex Kogan

Department of Accounting & Information Systems

Rutgers University

180 University Ave

Newark, NJ 07102

Miklos Vasarhelyi

Department of Accounting & Information Systems

Rutgers University

180 University Ave

Newark, NJ 07102

Jia Wu

Dept of Accounting and Finance

University of Massachusetts – Dartmouth

285 Oldwestport Road

North Dartmouth, MA 02747

Nov, 2005

Abstract:

This research discusses how Continuity Equations (CE) can be developed and implemented in Continuous Auditing (CA) for anomaly detection purposes. We use real-world data sets extracted from the supply chain of a large healthcare management firm. Our first objective is to demonstrate how to develop CE models using a Business Process (BP) auditing approach. Two types of CE models are constructed in our study: the Simultaneous Equation Model (SEM) and the Multivariate Time Series Model (MTSM). Our second objective is to design a set of online learning and error correction protocols for automatic model selection and updating. Our third objective is to evaluate the CE models through comparison. First, we compare the prediction accuracy of the CE models and a traditional analytical procedure (AP) model. Our results indicate that the CE models have relatively good prediction accuracy. Second, we compare the anomaly detection capability of AP models with and without error correction. We find that models with error correction perform better than models without error correction. Lastly, we examine the difference in detection capability between the CE models and the traditional AP model. Overall, we find that the CE models outperform the linear regression model in terms of anomaly detection.

Keywords: continuous auditing, analytical procedure, anomaly detection

Data availability: Proprietary data, not available to the public, contact the author for details.

Table of Contents

I. Introduction

II. Background, Literature Review and Research Questions

2.1 Continuous Auditing

2.2 Business Process Auditing Approach

2.3 Continuity Equations

2.4 Analytical Procedures

2.5 Research Questions

III. Research Method

3.1 Data Profile and Data Preprocessing

3.2 Analytical Modeling

3.2.1 Simultaneous Equation Model

3.2.2 Multivariate Time Series Model

3.2.3 Linear Regression Model

3.3 Automatic Model Selection and Updating

3.4 Prediction Accuracy Comparison

3.5 Anomaly Detection Comparison

3.5.1 Anomaly Detection Comparison of Models with Error Correction and without Error Correction

3.5.2 Anomaly Detection Comparison of SEM, MTSM and Linear Regression

IV: Conclusion, Limitations and Future Research Directions

4.1 Conclusion

4.2 Limitations

4.3 Future Research Directions

V: References

VI: Figures, Tables and Charts

VII: Appendix: Multivariate Time Series Model with All Parameter Estimates

I. Introduction

The CICA/AICPA Research Report defines CA as “a methodology that enables independent auditors to provide written assurance on a subject matter using a series of auditors’ reports issued simultaneously with, or a short period of time after, the occurrence of events underlying the subject matter.” Generally speaking, audits in a CA environment are performed on a more frequent and timely basis than in traditional auditing. CA is a great leap forward in both audit depth and audit breadth and is expected to improve audit quality. Thanks to rapid advances in information technologies, the implementation of CA has become technologically feasible. In addition, the recent spate of corporate scandals and related auditing failures is driving the demand for higher-quality audits, and new regulations such as the Sarbanes-Oxley Act require verifiable corporate internal controls and shorter reporting lags. Taken together, these developments have created a favorable environment for CA, since CA is expected to outperform traditional auditing in many respects, including anomaly detection.

In the past few years CA has caught the attention of more and more academic researchers, auditing professionals, and software developers, and research on CA continues to flourish. A number of papers discuss the enabling technologies of CA (Vasarhelyi and Halper 1991; Kogan et al. 1999; Woodroof and Searcy 2001; Rezaee et al. 2002; Murthy and Groomer 2004, etc.). Other papers, mostly normative ones, address CA from a variety of theoretical perspectives (Alles et al. 2002 and 2004; Elliott 2002; Vasarhelyi 2002). However, there is a dearth of empirical research on CA due to limited data availability. This study extends the prior research by using real-world data sets to build analytical procedure models for CA. It proposes and demonstrates how a set of novel analytical procedure (AP) models, Continuity Equation models, can be developed and implemented in CA for anomaly detection, which is considered one of the fortes of CA.

Statement on Auditing Standards (SAS) No. 56 requires that analytical procedures be performed during the planning and review stages of an audit. It also recommends the use of analytical procedures in substantive tests. Effective and efficient AP can reduce the audit workload of substantive tests and cut audit costs because it helps auditors focus their attention on the most suspicious accounts. In applying analytical procedures, an auditor first relies on an AP expectation model, or AP model, to predict the value of an important business metric (e.g. an account balance). Then, the auditor compares the predicted value with the actual value of the metric. Finally, if the variance between the two values exceeds a pre-established threshold, an alarm is triggered, warranting the auditor's further investigation.
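To make the predict-compare-alarm mechanics concrete, here is a minimal Python sketch. The function name, the 10% threshold, and the simple relative-variance rule are illustrative assumptions, not the expectation models developed later in this paper, which use statistical prediction intervals instead.

```python
def ap_alarm(predicted: float, actual: float, threshold: float = 0.10) -> bool:
    """Flag an observation when the relative variance between the predicted
    and actual value exceeds a pre-established threshold.

    The 10% relative-variance rule is an illustrative assumption; the models
    developed later in the paper use statistical prediction intervals instead.
    """
    variance = abs(actual - predicted) / abs(actual)
    return variance > threshold


# Hypothetical example: predicted vs. observed daily voucher quantity.
if ap_alarm(predicted=5900.0, actual=7600.0):
    print("Variance exceeds threshold: flag for detailed investigation.")
```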

The expectation models in AP therefore play an important role in helping auditors identify anomalies. In comparison with traditional auditing, CA usually involves high frequency audit tests, highly disaggregated business process data, and continuous new data feeds. Moreover, any detected anomalies must be corrected in a timely fashion. Therefore, an expectation model in CA must be capable of processing high volumes of data, detecting anomalies at the business process level, self-updating from the new data feeds, and correcting errors immediately after detection. It is also of vital importance for the expectation model in CA to detect anomalies in an accurate and timely manner.

With these expectations in mind we define four requirements for AP models in CA. First, the analytical modeling process should be largely automated and the AP models should be self-adaptive, requiring as little human intervention as possible. The high frequency of audit tests makes it impossible for human auditors to select the best model on a continuous basis. Moreover, new data are continuously fed into a CA system, so a good AP model for CA should be able to assimilate the additional information contained in the new data feeds and adapt itself continuously. Second, the AP models should generate accurate predictions. Auditors rely on expectation models to forecast business metric values, so it is very important for the expectation model to generate accurate forecasts. Third, the AP models should detect errors effectively and efficiently; the ultimate objective of applying AP is to detect anomalies and then apply tests of details to them. Fourth, to improve error detection capability, the AP model should correct any detected errors as soon as possible, so that new predictions are based on corrected data rather than erroneous data.

In this study we construct the expectation models using supply chain procurement cycle data provided by a large healthcare management firm. These models are built using the Business Process (BP) approach as opposed to the traditional transaction-level approach. Three key business processes are identified in the procurement cycle: the ordering process, the receiving process, and the voucher payment process. Our CE models are constructed on the basis of these three BPs. Two types of CE models are proposed in this paper: the Simultaneous Equation Model and the Multivariate Time Series Model. We evaluate the two CE models through comparison with a traditional AP model, the linear regression model. First, we examine the prediction accuracy of these models. Our findings suggest that the two CE models can produce relatively accurate forecasts. Second, we compare AP models with and without error correction. Our findings show that AP models with error correction outperform AP models without error correction. Lastly, we compare the two CE models with the traditional linear regression model in an error correction scenario. Our findings indicate that the Simultaneous Equation Model and the Multivariate Time Series Model outperform the linear regression model in terms of anomaly detection.

The remainder of this paper is organized as follows. Section II provides background, reviews the literature on CA and AP, and states the research questions. Section III describes the data profile and data preprocessing steps, discusses the model construction procedures, and presents the findings of the study. The final section discusses the results, identifies the limitations of the study, and suggests future research directions.

II. Background, Literature Review and Research Questions

2.1 Continuous Auditing

Continuous auditing research came into being over a decade ago. The majority of the papers on continuous auditing are descriptive, focusing on the technical aspects of CA (Vasarhelyi and Halper 1991; Kogan et al. 1999; Woodroof and Searcy 2001; Rezaee et al. 2002; Murthy 2004; Murthy and Groomer 2004, etc.). Only a few papers discuss CA from other perspectives (e.g. economics, concepts, research directions), and most of these are normative (Alles et al. 2002 and 2004; Elliott 2002; Vasarhelyi 2002; Searcy et al. 2004). Due to data unavailability, there is a lack of empirical studies on CA in general and on analytical procedures for CA in particular. This study enhances the prior CA literature by using empirical evidence to illustrate the prowess of CA in anomaly detection. Additionally, it extends prior CA research by discussing the implementation of analytical procedures in CA and proposing new models for it.

2.2 Business Process Auditing Approach

When Vasarhelyi and Halper (1991) introduced the concept of continuous auditing over a decade ago, they discussed the use of key operational metrics and analytics generated by the CPAS auditing system to help internal auditors monitor and control AT&T’s billing system. Their study uses the operational process auditing approach and emphasizes the use of metrics and analytics in continuous auditing. Bell et al. (1997) also propose a holistic approach to auditing an organization: structurally dividing a business organization into various business processes (e.g. the revenue cycle, procurement cycle, payroll cycle, etc.) for auditing purposes. They suggest expanding the subject of auditing from business transactions to the routine activities associated with different business processes.

Following these two prior studies, this paper also adopts the Business Process auditing approach in our AP model construction. One advantage of the BP auditing approach is that anomalies can be detected in a more timely fashion, at the transaction level rather than at the account balance level. Traditionally, AP is applied at the account balance level after business transactions have been aggregated into account balances. This not only delays anomaly detection but also creates an additional layer of difficulty, because transactions are consolidated into accounting numbers. The BP auditing approach addresses these problems.

2.3 Continuity Equations

We use Continuity Equations to model the different BPs in our sample firm. Continuity Equations are commonly used in physics as mathematical expressions of various conservation laws, such as the law of the conservation of mass: “For a control volume that has a single inlet and a single outlet, the principle of conservation of mass states that, for steady-state flow, the mass flow rate into the volume must equal the mass flow rate out.” This paper borrows the concept of CE from the physical sciences and applies it in a business scenario. We consider each business process as a control volume made up of a variety of transaction flows, or business activities. If the transaction flows into and out of each BP are equal, the business process is in a steady state, free from anomalies. Otherwise, if spikes occur in the transaction flows, the steady state of the business process cannot be maintained, and auditors should initiate detailed investigations into the causes of these anomalies. We use Continuity Equations to model the relationships between the different business processes.
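As a simple illustration of this control-volume idea (not the CE models estimated later, and ignoring the lags those models account for), the sketch below compares the daily quantities flowing between two adjacent processes and flags days where the flows diverge; the column names, figures, and 10% tolerance are all hypothetical.

```python
import pandas as pd

# Hypothetical daily aggregates of transaction quantity flowing out of the
# ordering process (inflow to receiving) and out of the receiving process.
flows = pd.DataFrame(
    {"order_qty": [6600, 6500, 6700, 6400, 9900],
     "receive_qty": [6550, 6480, 6650, 6450, 6500]},
    index=pd.date_range("2004-01-05", periods=5, freq="B"),
)

# In a steady state the flow into the receiving process should roughly equal
# the flow out of the ordering process; a large imbalance is an anomaly signal.
imbalance = (flows["order_qty"] - flows["receive_qty"]).abs() / flows["order_qty"]
print(flows[imbalance > 0.10])  # days violating the assumed 10% tolerance
```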

2.4 Analytical Procedures

There is an extensive body of research on analytical procedures in auditing. Many papers discuss traditional analytical procedures (Hylas and Ashton 1982; Kinney 1987; Loebbecke and Steinbart 1987; Biggs et al. 1988; Wright and Ashton 1989). A few papers examine new analytical procedure models using disaggregated data, which are more relevant to this study. Dzeng (1994) introduces the vector autoregression (VAR) model, comparing 8 univariate and multivariate AP models using quarterly and monthly financial and non-financial data of a university. His study finds that less aggregated data yield better precision in time-series expectation models, and concludes that VAR is better than other modeling techniques for generating expectation models. Other studies also find that applying new AP models to high frequency data can improve analytical procedure effectiveness (Chen and Leitch 1998 and 1999; Leitch and Chen 2003). On the other hand, Allen et al. (1999) do not find any supporting evidence that geographically disaggregated data can improve analytical procedures. In this study we test the CE models’ effectiveness using daily transaction data, which has a higher frequency than the data sets used by prior studies.

We propose two types of CE models for our study: the Simultaneous Equation Model (SEM) and the Multivariate Time Series Model (MTSM). The SEM can model the interrelationships between different business processes simultaneously, while traditional expectation models such as the linear regression model can only model one relationship at a time. In SEM each interrelationship between two business processes is represented by an equation, and a SEM usually consists of a simultaneous system of two or more equations representing a variety of business activities co-existing in a business organization. The use of SEM in analytical procedures has been examined by Leitch and Chen (2003). They use monthly financial statement data to compare the effectiveness of different AP models. Their findings indicate that SEM can generally outperform other AP models, including the Martingale and ARIMA models.

In addition to SEM, this paper also proposes a novel AP model, the Multivariate Time Series Model. To the best of our knowledge, the MTSM has never been explored in the prior auditing literature, even though there are a limited number of studies on univariate time series models (Knechel 1988; Lorek et al. 1992; Chen and Leitch 1998; Leitch and Chen 2003). The computational complexity of MTSM has hampered its application as an AP model: prior researchers and practitioners were unable to apply this model because appropriate statistical tools were unavailable. With recent developments in statistical software, however, estimating this sophisticated model is no longer difficult. Starting with version 8, SAS (Statistical Analysis System) allows users to make multivariate time series forecasts. The MTSM can not only model the interrelationships between BPs but also represent the time series properties of these BPs. Although MTSM has never been discussed in the auditing literature, studies in other disciplines have employed or discussed MTSM as a forecasting method (Swanson et al. 1999; Pandher 2002; Corman and Mocan 2004).

2.5 Research Questions

Because the statistically sophisticated CE models can better represent business processes, we expect that the CE models can outperform traditional AP models. We select the linear regression model for comparison purposes because it is considered the best traditional AP model (Stringer and Stewart 1986). Following the previous line of research on AP model comparison (Dzeng 1994; Allen et al. 1999; Chen and Leitch 1998 and 1999; Leitch and Chen 2003), this study compares the SEM and MTSM with the traditional linear regression model on two aspects. First, we compare the prediction accuracy of these models. A good expectation model is expected to generate predicted values close to actual values, and auditors can rely on these accurate predictions to identify anomalies. This leads to our first research question:

Question 1: Do Continuity Equation models have better prediction accuracy than the traditional linear regression model?

We use the Mean Absolute Percentage Error (MAPE) as the benchmark to measure the prediction accuracy of expectation models. For each observation, we first calculate the absolute variance between the predicted value and the actual value and then express it as a percentage of the actual value; the MAPE is the average of these percentages over all observations. A good expectation model should have better prediction accuracy and therefore a low MAPE.
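A minimal sketch of the MAPE computation described above, with hypothetical numbers:

```python
import numpy as np


def mape(actual, predicted) -> float:
    """Mean Absolute Percentage Error: average of |actual - predicted| / |actual|."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / np.abs(actual)))


# Hypothetical daily voucher quantities and their one-step-ahead forecasts.
print(mape([6000, 5800, 6200], [5700, 6100, 6000]))  # roughly 0.045
```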

Our primary interest in developing AP models is anomaly detection. To the best of our knowledge, previous auditing studies have not discussed how error correction can affect the detection capabilities of AP models. In this study we compare the anomaly detection capabilities of models with and without error correction. In a continuous auditing scenario involving high frequency audit tests, an error may need to be corrected immediately after its detection, before subsequent audit tests, so that the AP models make subsequent predictions based on the correct value rather than the erroneous value. We expect that AP models with error correction can outperform AP models without error correction. This leads to our second research question:

Question 2: Do AP models with error correction have better anomaly detection capability than AP models without error correction?

The ultimate purpose of developing CE models is anomaly detection. We expect that CE models can outperform traditional AP models in terms of anomaly detection. Hence our third research question is stated as follows:

Question 3: Do Continuity Equation models have better anomaly detection capability than the traditional linear regression AP model?

After analyzing our second research question, we find that models with error correction generally outperform models without error correction. Therefore, when we analyze our third research question, we specify that both the CE models and the linear regression model have error correction capability. We use the false positive error rate and the false negative error rate as benchmarks to measure anomaly detection capability. A false positive error, also known as a false alarm or Type I error, is a non-anomaly mistakenly flagged by the AP model as an anomaly. A false negative error, or Type II error, is an anomaly that the model fails to detect. An effective AP model is expected to have both a low false positive error rate and a low false negative error rate.
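The two benchmarks can be computed directly from the set of observations a model flags and the set of seeded anomalies. The sketch below assumes both are available as index sets; note that the table footnotes later in the paper normalize the false positive count by the number of seeded errors, so the denominator used here is an assumption.

```python
def error_rates(flagged: set, seeded: set, normal: set) -> tuple:
    """Compute the two benchmarks from index sets of observations.

    False negative rate: share of seeded anomalies the model fails to flag.
    False positive rate: share of normal observations the model flags.
    (The tables later in the paper normalize the false positive count by the
    number of seeded errors instead; the denominator here is an assumption.)
    """
    fn_rate = len(seeded - flagged) / len(seeded)
    fp_rate = len(flagged & normal) / len(normal)
    return fn_rate, fp_rate


# Hypothetical example: 8 seeded errors, 40 normal observations, 7 flags raised.
seeded = set(range(8))
normal = set(range(8, 48))
flagged = {0, 1, 2, 3, 4, 5, 47}
print(error_rates(flagged, seeded, normal))  # (0.25, 0.025)
```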

In summary, we expect that AP models in CA should be equipped with an error correction function for better detection rates, and that CE models can outperform traditional linear regression models in a simulated CA environment.

III. Research Method

3.1 Data Profile and Data Preprocessing

The data sets are extracted from the data warehouse of a large healthcare management firm. At the current stage we are working with the supply chain procurement cycle data, which consists of 16 tables concerning a variety of business activities. The data sets include all procurement cycle daily transactions from October 1, 2003 through June 30, 2004. These transactions are performed by ten facilities of the firm, including one regional warehouse and nine hospitals and surgical centers. The data were first collected by the ten facilities and then transferred to the central data warehouse at the firm’s headquarters. Even though the firm’s headquarters has implemented an ERP system, many of the 10 facilities still rely on legacy systems. Not surprisingly, we have identified a number of data integrity issues which we believe are caused by the legacy systems. These problems are resolved in the data preprocessing phase of our study.

Following the BP auditing approach, and also as a means to facilitate our research, our first step is to identify key business processes in the supply chain procurement cycle and focus our attention on them. The three key BPs we have identified are: ordering, receiving, and voucher payment, which involve six tables in our data sets.

[Insert Figure 1 here]

At the second step we clean the data by removing the erroneous records in the 6 tables. Two categories of erroneous records are removed from our data sets: those that violate data integrity and those that violate referential integrity. Data integrity violations include, but are not limited to, invalid purchase quantities, receiving quantities, and check numbers. Referential integrity violations are largely caused by unmatched records among different business processes; for example, a receiving transaction that cannot be matched with any related ordering transaction, or a payment for a purchase order that cannot be matched with the related receiving transaction. Before we can build any analytical model, these erroneous records must be eliminated. We expect that in a real world CA environment the data cleansing task can be completed automatically by the auditee’s ERP systems.

The third step in the data preprocessing is to identify records with complete transaction cycles. To facilitate our research, we exclude records with partial deliveries or partial payments. We specify that all the records in our sample must have undergone a complete transaction cycle; in other words, each record in one business process must have a matching record, with the same transaction quantity, in the related business process.

The fourth step in the data preprocessing phase is to delete non-business-day records. Though we find sporadic transactions occurring on some weekends and holidays, the number of these transactions is only a small fraction of that on a working day. If we left these non-business-day records in our sample, they would inevitably trigger false alarms simply because of their low transaction volume.

The last step in the data preprocessing is to aggregate individual transactional records by day. Aggregation is a critical step before the construction of an AP model. It can reduce the variance among individual transactions. The spikes among individual transactions can be somewhat smoothed out if we aggregate them by day, which can lead to the construction of a stable model. Otherwise, it would be impossible to derive a stable model based on data sets with enormous variances because the model would either trigger too many alarms or lack the detection power. On the other hand, if we aggregate individual transactions over a longer time period such as a week or a month, then the model would fail to detect many abnormal transactions because the abnormality would be mostly smoothed out by the longer time interval.
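A sketch of this daily aggregation step using pandas; the table layout, column names, and example values are assumptions, since the firm's actual schema is proprietary.

```python
import pandas as pd

# Hypothetical transaction-level extract of the ordering process; the actual
# schema of the firm's tables is not public, so these column names are assumed.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2003-10-01", "2003-10-01", "2003-10-02",
                                  "2003-10-04", "2003-10-06"]),
    "quantity": [120, 75, 240, 60, 310],
})

# Aggregate individual transactions into daily totals of transaction quantity.
daily = orders.set_index("order_date")["quantity"].resample("D").sum()

# Drop non-business days (a calendar of firm holidays would be applied similarly).
daily = daily[daily.index.dayofweek < 5]
print(daily)
```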

Aggregation can be performed on other dimensions besides the time interval. For example, aggregation can be based on each facility (hospitals or surgical centers), each vendor, each purchase item, etc. Moreover, various metrics can be used for aggregation. At the current stage, we use transaction quantity as the primary metric for aggregation. Other metrics, including the dollar amount of each transaction or the number of transactions, can also be aggregated. Analytical procedures can be performed on these different metrics to monitor the transaction flows in the business organization, and auditing multiple metrics would enable auditors to detect more suspicious transaction patterns. Summary statistics are presented in Table 1.

[Insert Table 1 here]

3.2 Analytical Modeling

3.2.1 Simultaneous Equation Model

Following the BP auditing approach, we have identified three key business processes in our sample firm: the ordering, receiving, and voucher payment processes. We model the interrelationships between these processes, selecting transaction quantity as our audit metric and the individual working day as our level of aggregation. After completing these initial steps, we are able to estimate our first type of CE model, the Simultaneous Equation Model. We specify the daily aggregate of order quantity as the exogenous variable and the daily aggregates of receiving quantity and payment quantity as endogenous variables. Time stamps are attached to the transaction flow among the three business processes: the transaction flow originates in the ordering process at time t, appears in the receiving process after a lag period Δ1, and re-appears in the voucher payment process after a further lag period Δ2. The basic SEM is:

receivet = α * ordert-Δ1 + ε1

vouchert = β * receivet-Δ2 + ε2

We select transaction quantity, rather than dollar amounts, as the primary metric for testing for two reasons. First, we want to illustrate that CA can work efficiently and effectively on operational (non-financial) data. Second, in our sample the dollar amounts contain noisy information, including sales discounts and tax. We aggregate the transaction quantities for the ordering, receiving, and voucher payment processes respectively. After excluding weekends and holidays, we obtain 147 observations for each business process.

Our next step in constructing the simultaneous equation model is to estimate the lags. Initially, we used the mode and average of the individual transactions’ lags as estimates for the lags between the BPs. The mode lag between the ordering process and the receiving process is 1 day. The mode lag between the receiving process and the payment process is also 1 day. The average lags are 3 and 6 days respectively. Later, we tried different combinations of lag estimates from 1 day to 7 days to test our model. Our results indicate that the mode estimate works best among all estimates for the simultaneous equation model. Therefore, we can express our estimated model as:

receivet = α * ordert-1 + ε1

vouchert = β * receivet-1 + ε2

Where

order = daily aggregate of transaction quantity for the purchase order process

receive = daily aggregate of transaction quantity for the receiving process

voucher = daily aggregate of transaction quantity for the voucher payment process

t = transaction time

We divide our data set into two parts. The first part, which accounts for 2/3 of the observations, is the training set and is used to estimate the model. The second part, which accounts for the remaining 1/3 of the observations, is the hold-out set and is used to test the model. The simultaneous equation model estimated on the training set is as follows:

receivet = 0.8462 * ordert-1 + e1

vouchert = 0.8874 * receivet-1 + e2

The R-squares for the two equations are 0.73 and 0.79 respectively, which indicates a good fit of the simultaneous equation model to the data. However, we have also realized some limitations associated with SEM. First, the lags have to be estimated separately, and such estimations are not only time-consuming but also prone to errors. Second, the SEM is a simplistic model: each variable can depend only on a single lagged value of the other variable. For example, vouchert can only depend on receivet-1, even though vouchert may well depend on other lagged values of the receive variable, or even lagged values of the order variable. Due to these limitations, we need to develop a more flexible CE model.
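Before moving to that model, a note on estimation: because each equation above contains only a single lagged (predetermined) regressor, the system is recursive and each equation can be fit separately. The paper does not state the estimator used, so the sketch below, which fits each equation by OLS without an intercept under assumed column names, is only an approximation of the procedure.

```python
import pandas as pd
import statsmodels.api as sm


def fit_sem(daily: pd.DataFrame):
    """Estimate receive_t = a*order_(t-1) and voucher_t = b*receive_(t-1) by
    per-equation OLS without an intercept.

    `daily` is assumed to hold the daily aggregates in columns named
    'order', 'receive' and 'voucher', indexed by business day.
    """
    d = daily.copy()
    d["order_lag1"] = d["order"].shift(1)
    d["receive_lag1"] = d["receive"].shift(1)
    d = d.dropna()
    eq_receive = sm.OLS(d["receive"], d[["order_lag1"]]).fit()
    eq_voucher = sm.OLS(d["voucher"], d[["receive_lag1"]]).fit()
    return eq_receive, eq_voucher


# Usage with a hypothetical `daily` DataFrame of the 147 daily aggregates:
# eq_receive, eq_voucher = fit_sem(daily)
# print(eq_receive.params, eq_receive.rsquared)
# print(eq_voucher.params, eq_voucher.rsquared)
```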

3.2.2 Multivariate Time Series Model

We continue to follow the BP auditing approach and use daily aggregates of transaction quantity as the audit metric to develop the MTSM. However, unlike in the case of SEM, no lag estimation is necessary; we only need to specify the maximum lag period, and all possible lags within that period are tested by the model. We specify 18 days as the maximum lag because 95% of the lags of all the individual transactions fall within this time frame. Our basic multivariate time series model is expressed as follows:

ordert = Φro*M(receive)+ Φvo*M(voucher)+ εo

receive t = Φor*M(order)+ Φvr*M(voucher)+ εr

vouchert = Φov*M(order)+ Φrv*M(receive)+ εv

M(order)= n*1 vector of daily aggregate of order quantity

M(receive)= n*1 vector of daily aggregate of receive quantity

M(voucher)= n*1 vector of daily aggregate of voucher quantity

Φ = corresponding 1*n transition vectors

Again we split our data set into two subsets: the training set and the hold-out (test) set. We use the SAS VARMAX procedure to estimate the large MTSM (a 3x18x3 coefficient array has been estimated; see the Appendix). Although this model fits our training data well, the predictions it generates for the hold-out (test) sample have large variances, and a large number of the parameter estimates are not statistically significant. We believe the model suffers from over-fitting. Therefore, we apply a step-wise procedure: in each step we restrict the insignificant parameters to zero, retain only the significant parameters, and re-estimate the model. If new insignificant parameters appear, we restrict them to zero and re-estimate again. We repeat this step-wise procedure until no insignificant parameters remain in the model. One of our estimated multivariate time series models is expressed as:

ordert = 0.24*ordert-4 + 0.25*ordert-14 + 0.56*receivet-15 + eo

receivet = 0.26*ordert-4 + 0.21*ordert-6 + 0.60*vouchert-10 + er

vouchert = 0.73*receivet-1 - 0.25*ordert-7 + 0.22*ordert-17 + 0.24*receivet-17 + ev
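The unrestricted starting point of this model can be approximated with a vector autoregression. The paper estimates it with the SAS VARMAX procedure; the statsmodels sketch below is only a rough stand-in under assumed column names, and it cannot impose the individual zero restrictions of the final model directly.

```python
import pandas as pd
from statsmodels.tsa.api import VAR


def fit_mtsm(daily: pd.DataFrame, max_lag: int = 18):
    """Fit an unrestricted vector autoregression on the three daily aggregates
    with all lags up to `max_lag`, mirroring the initial (unrestricted) MTSM.
    The stepwise zero restrictions of Section 3.3 are applied separately."""
    endog = daily[["order", "receive", "voucher"]]
    # trend="n" drops the intercept, matching the no-intercept specification.
    return VAR(endog).fit(maxlags=max_lag, ic=None, trend="n")


# Usage with a hypothetical `daily` DataFrame of daily aggregates:
# res = fit_mtsm(daily)
# print(res.params)  # coefficients for all 18 lags of the three series
# one_step = res.forecast(daily[["order", "receive", "voucher"]].values[-18:], steps=1)
```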

3.2.3 Linear Regression Model

We construct the linear regression model for comparison purposes. In our linear regression model we specify the lagged values of the daily aggregates of transaction quantity in the ordering process and the receiving process as the two independent variables, and the daily aggregate of voucher payment quantity as the dependent variable. Again, we use the mode of the lags in individual transactions as estimates for the lags in the model (i.e. a 2-day lag between the ordering and voucher payment processes, and a 1-day lag between the receiving and voucher payment processes). No intercept is used in our model because the intercept has no meaningful interpretation in this setting. Our OLS linear regression model is expressed as follows:

vouchert = α*ordert-2 + β*receivet-1 + ε

Where

order = daily aggregate of transaction quantity for the ordering process

receive = daily aggregate of transaction quantity for the receiving process

voucher = daily aggregate of transaction quantity for the voucher payment process

t = transaction time

Again we use the first 2/3 of our data set as the training subset to estimate our model. The estimated linear regression model is:

vouchert = 0.02* ordert-2 + 0.81* receivet-1 + e

The α estimate is statistically insignificant (p > 0.73) while the β estimate is significant at the 1% level (p < 0.0001).

3.3 Automatic Model Selection and Updating

One distinctive feature of analytical modeling in CA is the automatic model selection and updating capability. Traditional analytical modeling is usually based on static archival data. Auditors generally apply one model to the entire audit data set. In comparison, analytical modeling in CA can be based on the continuous data streams dynamically flowing into the CA system. The analytical modeling in CA should be able to assimilate the new information contained in every segment of the data flows and adapt itself constantly. Each newly updated analytical model is used to generate a prediction only for one new segment of data. This model updating procedure is expected to improve prediction accuracy and anomaly detection capability.

[Insert Figure 2 here]
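A sketch of the updating protocol described above and depicted in Figure 2: the model is re-estimated on an expanding window after each new daily observation and used to predict only the next one. The generic `fit` and `predict_next` callables are assumptions standing in for whichever expectation model is being updated.

```python
import pandas as pd


def rolling_one_step_forecasts(daily: pd.DataFrame, n_train: int, fit, predict_next):
    """Expanding-window re-estimation: after each new observation arrives,
    re-fit the model on all data seen so far and forecast only the next day.

    `fit(history)` returns a fitted model and `predict_next(model, history)`
    returns its one-step-ahead forecast; both callables are user-supplied and
    stand in for whichever CE or regression model is being updated.
    """
    forecasts = {}
    for t in range(n_train, len(daily)):
        history = daily.iloc[:t]              # training data plus new data feeds
        model = fit(history)                  # model re-estimated with new information
        forecasts[daily.index[t]] = predict_next(model, history)
    return pd.Series(forecasts)
```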

When developing the multivariate time series models, we encountered a model over-fitting problem. Specifically, our initial model is large and complex, including many parameters. Though this model fits the training data set very well, it suffers from severe over-fitting, as indicated by its poor prediction accuracy on the hold-out sample. To improve the model, we apply a step-wise procedure. First, we determine a p-value threshold for all the parameter estimates. Then, in each step, we retain only the parameter estimates below the pre-determined threshold, restrict those above the threshold to zero, and re-estimate the model. If we find new parameter estimates above the threshold, we apply the procedure again, until all the parameter estimates are below the threshold. The step-wise procedure ensures that all retained parameters are statistically significant, and the over-fitting problem is largely eliminated.

[Insert Figure 3 here]

When we apply the step-wise procedure to the multivariate time series model, we test a set of different p-value thresholds: 5%, 10%, 15%, 20% and 30%. Comparing the prediction accuracy for each variable in the MTSM, we find that the 15% threshold gives the best overall prediction accuracy.
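The restriction procedure can be approximated one equation at a time as backward elimination by p-value, as sketched below with the 15% threshold chosen above. Working per equation with OLS on the stacked lag regressors is an assumption of this sketch; the paper applies the restrictions within the SAS VARMAX estimation.

```python
import pandas as pd
import statsmodels.api as sm


def stepwise_restrict(y: pd.Series, X: pd.DataFrame, p_threshold: float = 0.15):
    """Backward elimination by p-value for one equation of the model:
    drop every lagged regressor whose p-value exceeds the threshold
    (i.e. restrict its parameter to zero), re-estimate, and repeat until
    all remaining parameter estimates are significant."""
    cols = list(X.columns)
    while cols:
        res = sm.OLS(y, X[cols]).fit()
        keep = [c for c in cols if res.pvalues[c] <= p_threshold]
        if len(keep) == len(cols):
            return res               # every remaining parameter is significant
        cols = keep                  # restrict the insignificant ones to zero
    return None                      # no significant regressors remain
```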

3.4 Prediction Accuracy Comparison

While performing the analytical procedures, auditors use different measures to make predictions on account numbers. The methods they use include historical analysis, financial ratio analysis, reasonableness tests, and statistical AP models. One of the expectations for AP models is that auditors can rely on the models to make accurate predictions. Hence, it is important for AP models to make forecasts as close to actual values as possible. In this subsection we compare the prediction accuracy for the three AP models: the Simultaneous Equation Model, the Multivariate Time Series Model, and the Linear Regression Model.

We use MAPE as the benchmark to measure prediction accuracy, expecting that a good model should have a small MAPE (i.e. the variance between the predicted value and the actual value is small). We first use the training sample to estimate each of the three models. Then each estimated model is used to make one-step-ahead forecasts and to calculate the forecast variance, and the model is updated with the new data feeds from the hold-out (test) sample. Finally, all the variances are summed and divided by the total number of observations in the hold-out sample to compute the MAPE. The MAPEs of the voucher predictions are presented in Table 2.

[Insert Table 2 here]

We find that the MAPEs generated by the three AP models differ by less than 2%, which indicates that all three models have very similar prediction accuracy. The linear regression model has the lowest MAPE, followed by the MTSM, and the SEM has the highest MAPE. Therefore, H1 is not supported. Theoretically, the best AP model should have the lowest MAPE. The slightly higher MAPEs for SEM and MTSM can possibly be attributed to pollution in our data sets. As mentioned in a later section, our AP models detect 5 or 6 original anomalies in the hold-out (test) sample before we seed any errors. These outliers can inflate the MAPEs even of AP models that are otherwise capable of making accurate predictions. In summary, the CE models have prediction accuracy similar to that of the linear regression model, and compared with prior literature, the predictions are relatively accurate for all of our AP models.

3.5 Anomaly Detection Comparison

The primary objective of AP models is to detect anomalies, and a good AP model detects anomalies in an effective and efficient fashion. To measure the anomaly detection capability of the AP models, we use two benchmarks: the number of false positive errors and the number of false negative errors. A false positive error, also called a false alarm or Type I error, is a non-anomaly mistakenly flagged by the model as an anomaly. A false negative error, also called a Type II error, is an anomaly that the model fails to detect. While a false positive error wastes the auditor’s time and thereby increases audit cost, a false negative error is usually more detrimental because of the material uncertainty associated with the undetected anomaly. An effective and efficient AP model should keep both the number of false positive errors and the number of false negative errors at a low level.

To compare the anomaly detection capabilities of the CE models and the linear regression model, we need to seed errors into our hold-out (test) sample. Our AP models detect around 5 original anomalies even before we seed any errors; therefore, we seed errors only into observations other than these original anomalies. Each time we randomly seed 8 errors into the hold-out sample. We also want to test how the error magnitude affects each AP model’s anomaly detection capability, so we use 5 different magnitudes in every round of error seeding: 10%, 50%, 100%, 200% and 400% of the original actual value of the seeded observations. The entire error seeding procedure is repeated 10 times to reduce selection bias and ensure randomness.
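A sketch of one round of this seeding procedure follows; the series layout and random seed are assumptions, and because the paper does not spell out exactly how a value is perturbed, the multiplicative form below is an interpretation.

```python
import numpy as np
import pandas as pd


def seed_errors(holdout: pd.Series, exclude_idx, magnitude: float,
                n_errors: int = 8, rng=None):
    """Return a copy of the hold-out series with `n_errors` randomly seeded errors.

    Each chosen observation is perturbed by `magnitude` (e.g. 0.5 for 50%) of its
    original value; how exactly the paper perturbs the values is not spelled out,
    so the multiplicative form below is an assumption. Observations listed in
    `exclude_idx` (the original anomalies) are never selected."""
    rng = rng or np.random.default_rng(0)
    candidates = holdout.index.difference(exclude_idx)
    chosen = rng.choice(candidates, size=n_errors, replace=False)
    seeded = holdout.copy()
    seeded.loc[chosen] = seeded.loc[chosen] * (1 + magnitude)
    return seeded, list(chosen)
```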

We use the confidence interval for the individual dependent variable, or the prediction interval, as the acceptable threshold of variance for anomaly detection. If the actual value exceeds the upper prediction limit or falls below the lower prediction limit, we mark the observation as an anomaly. The choice of prediction interval width is another issue. If we choose a high percentage for the prediction interval (e.g. 95%), the interval is too wide and results in a low detection rate. On the other hand, if a low percentage is selected, the interval is too narrow and many normal observations are categorized as anomalies. To address this, we test a set of prediction interval percentages from 50% through 95%. We find that a 97% prediction interval works best for the simultaneous equations, a 70% prediction interval works best for the multivariate time series model, and a 90% prediction interval works best for the linear regression model. The relatively low prediction interval percentage for the multivariate time series model is most probably due to the data pollution problem.

Leitch and Chen (2003) use both a positive and a negative approach to evaluate the anomaly detection capability of various models. In the positive approach all observations are treated as non-anomalies, and the model is used to detect the seeded errors. In contrast, the negative approach treats all observations as anomalies, and the model is used to find the non-anomalies. This study adopts only the positive approach because it fits the BP auditing scenario better.

3.5.1 Anomaly Detection Comparison of Models with Error Correction and without Error Correction

In a CA environment, when an anomaly is detected the auditor is notified immediately and a detailed investigation is initiated. Ideally, the auditor corrects the error with the true value in a timely fashion, usually before the next round of audit starts; in other words, errors are detected and corrected in real time. We use the error correction model to simulate this scenario. Specifically, when the AP model detects a seeded error in the hold-out (test) sample, the seeded error is replaced with the original actual value before the model is used again to predict subsequent values.
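A sketch of this detect-and-correct loop over the hold-out sample follows. The prediction interval here is a simple normal approximation built from a residual standard deviation, and the `forecast_next` callable stands in for whichever estimated model is being run, so both are assumptions rather than the paper's exact procedure.

```python
import pandas as pd
from scipy.stats import norm


def detect_and_correct(seeded: pd.Series, true_values: pd.Series,
                       forecast_next, resid_std: float,
                       interval: float = 0.90, correct: bool = True):
    """Walk through the hold-out sample: forecast each day, flag the observation
    if it falls outside the prediction interval, and (in the error-correction
    variant) replace a flagged value with its true value before the next forecast.

    `forecast_next(history)` is a user-supplied callable that also has access to
    the training data; the normal-approximation interval based on `resid_std`
    is an assumption, not the exact interval used in the paper."""
    z = norm.ppf(0.5 + interval / 2.0)            # half-width multiplier
    working = seeded.copy()
    flagged = []
    for i, t in enumerate(working.index):
        pred = forecast_next(working.iloc[:i])    # corrected data observed so far
        lower, upper = pred - z * resid_std, pred + z * resid_std
        if not (lower <= working.loc[t] <= upper):
            flagged.append(t)
            if correct:                           # error-correction variant
                working.loc[t] = true_values.loc[t]
    return flagged
```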

For comparison purposes, we also test how our CE models and the linear regression model work without error correction. Unlike in continuous auditing, in traditional auditing anomalies are detected but usually not corrected immediately. To simulate this scenario, we simply do not correct any of the seeded errors in the hold-out (test) sample even if the AP model detects them.

[Insert Table 3A, 3B, 4, 5A, 5B, 6, 7A, 7B, 8, Chart 1A, 1B, 2A, 2B, 3A, 3B here]

We find that SEM with error correction consistently outperforms SEM without error correction: it has a lower false negative error rate and a higher detection rate (Tables 3A and 3B, Charts 1A and 1B). Neither model generates any false positive errors (Table 4). The results also indicate that the MTSM with error correction generally has lower false negative error rates than the MTSM without error correction (Tables 5A and 5B, Charts 2A and 2B), which supports H2 that error correction models have better detection rates. In addition, the MTSM with error correction has no false positive errors, while the model without error correction occasionally produces false positive errors (Table 6). Similar results are found for the linear regression model (Tables 7A and 7B, Charts 3A and 3B), except that neither the model with error correction nor the model without it produces any false positive errors (Table 8). A further investigation reveals that some of the additional false negative errors of the models without error correction are due to failures to detect the original anomalies, especially when the magnitude of the seeded errors increases. This indicates that AP models without error correction may fail to detect relatively small errors when large errors are present simultaneously. In general, the results are consistent with our expectation that error correction can significantly improve the anomaly detection capability of AP models. H2 is supported.

3.5.2 Anomaly Detection Comparison of SEM, MTSM and Linear Regression

It is of interest to know whether the CE models are better than the linear regression model in terms of anomaly detection. We know from the test results for H2 that error correction models generally have better anomaly detection capability than models without error correction. Hence, we compare SEM, MTSM and the linear regression model in an error correction scenario.

[Insert Tables 9A, 9B, 10 and Charts 4A, 4B here]

Table 9A and Chart 4A present the false negative error rates of the three AP models with error correction. Table 9B and Chart 4B present the detection success rates of the AP models, which is another way to represent anomaly detection capability. Although the three models yield mixed results when the error magnitude is small (at the 10% and 50% levels), the multivariate time series model detects more anomalies than the linear regression model as the error magnitude increases, with the difference most pronounced at the 200% level. The simultaneous equation model also performs better than the linear regression model when the error magnitude is larger than 100%, although it is not as good as the multivariate time series model at the 200% level. Table 10 presents the false positive error rates of the AP models with error correction: none of the three models generates any false positive errors, indicating perfect performance on this aspect. In summary, we believe that both the simultaneous equation model and the multivariate time series model perform better than the linear regression model in general, because it is more important for AP models to detect material errors than small errors. Our finding supports H3.

IV: Conclusion, Limitations and Future Research Directions

4.1 Conclusion

In this study we have explored how to implement analytical procedures to detect anomalies in a continuous auditing environment. Specifically, we have constructed two continuity equation models, a simultaneous equation model and a multivariate time series model, and compared the CE models with the linear regression model in terms of prediction accuracy and anomaly detection performance. We do not find evidence to support our first hypothesis that CE models generate better prediction accuracy. We do find evidence to support our second hypothesis that models with error correction are better than models without error correction in anomaly detection. The results of the empirical tests are also consistent with our third hypothesis that the CE models generally outperform the traditional linear regression model in terms of anomaly detection in a simulated CA environment with high frequency data.

This is the first study on analytical procedures for continuous auditing, and the first attempt to use empirical data to compare different AP models in a CA context. We have also proposed a novel AP model in auditing research, the multivariate time series model, and examined the difference in detection capability between models with and without error correction.

4.2 Limitations

This study has a number of limitations. First, our data sets are extracted from a single firm, which may constitute a selection bias. Until we test our CE models using other firms’ data sets, we will not have empirical evidence that our AP models are portable and can be applied to other firms. In addition, our data sets contain some noise. Since they are extracted from a central data warehouse which accepts data from both ERP and legacy systems in the firm’s subdivisions, it is inevitable that they are contaminated by some errors and noise. The date truncation problem also introduces some noise into our data sets. The presence of the original anomalies is one indication of this noise.

4.3 Future Research Directions

Since this paper is devoted to a new research area, there are many directions for future research. For example, it would be interesting to see whether our models are portable to other firms or to other audit dimensions, such as financial numbers, if data become available. It would also be of interest to compare CE models with other innovative AP models such as artificial intelligence models and other time series models, including the Martingale and X-11 models. Moreover, our models do not include many independent variables and control variables, which could be added to CE models in future studies.

V: References

1. Allen R.D., M.S. Beasley, and B.C. Branson. 1999. Improving Analytical Procedures: A Case of Using Disaggregate Multilocation Data, Auditing: A Journal of Practice and Theory 18 (Fall): 128-142.

2. Alles M.G., A. Kogan, and M.A. Vasarhelyi. 2002. Feasibility and Economics of Continuous Assurance. Auditing: A Journal of Practice and Theory 21 (spring):125-138.

3. ____________________________________.2004. Restoring auditor credibility: tertiary monitoring and logging of continuous assurance systems. International Journal of Accounting Information Systems 5: 183-202.

4. ____________________________________ and J. Wu. 2004. Continuity Equations: Business Process Based Audit Benchmarks in Continuous Auditing. Proceedings of American Accounting Association Annual Conference. Orlando, FL.

5. Bell T., Marrs F.O., I. Solomon, and H. Thomas 1997. Monograph: Auditing Organizations Through a Strategic-Systems Lens. Montvale, NJ, KPMG Peat Marwick.

6. Chen Y. and Leitch R.A. 1998. The Error Detection of Structural Analytical Procedures: A Simulation Study. Auditing: A Journal of Practice and Theory 17 (Fall): 36-70.

7. ______________________. 1999. An Analysis of the Relative Power Characteristics of Analytical Procedures. Auditing: A Journal of Practice and Theory 18 (Fall): 35-69.

8. Corman H. and H.N. Mocan 2004. A Time-series Analysis of Crime, Deterrence and Drug Abuse in New York City. American Economic Review. Forthcoming.

9. Dzeng S.C. 1994. A Comparison of Analytical Procedures Expectation Models Using Both Aggregate and Disaggregate Data. Auditing: A Journal of Practice and Theory 13 (Fall): 1-24.

10. Elliott, R.K. 2002. Twenty-First Century Assurance. Auditing: A Journal of Practice and Theory 21 (Spring): 129-146.

11. Groomer, S.M. and U.S. Murthy. 1989. Continuous auditing of database applications: An embedded audit module approach. Journal of Information Systems 3 (2): 53-69.

12. Kogan, A., E.F. Sudit, and M.A. Vasarhelyi. 1999. Continuous Online Auditing: A Program of Research. Journal of Information Systems 13 (Fall): 87-103.

13. Koreisha, S. and Y. Fang. 2004. Updating ARMA Predictions for Temporal Aggregates. Journal of Forecasting. 23: 275-396.

14. Leitch, R.A. and Y. Chen. 2003. The Effectiveness of Expectation Models in Recognizing Error Patterns and Eliminating Hypotheses While Conducting Analytical Procedures. Auditing: A Journal of Practice and Theory 22 (Fall): 147-206.

15. Murthy, U.S. 2004. An Analysis of the Effects of Continuous Monitoring Controls on e-Commerce System Performance. Journal of Information Systems. 18 (Fall): 29–47.

16. ___________and M.S. Groomer. 2004. A continuous auditing web services model for XML-based accounting systems. International Journal of Accounting Information Systems 5: 139-163.

17. Pandher G.S. 2002. Forecasting Multivariate Time Series with Linear Restrictions Using Unconstrained Structural State-space Models. Journal of Forecasting 21. 281-300.

18. Rezaee, Z., A. Sharbatoghlie, R. Elam, and P.L. McMickle. 2002. Continuous Auditing: Building Automated Auditing Capability. Auditing: A Journal of Practice and Theory 21 (Spring): 147-163.

19. Searcy, D. L., Woodroof, J. B., and Behn, B. 2003. Continuous Audit: The Motivations, Benefits, Problems, and Challenges Identified by Partners of a Big 4 Accounting Firm. Proceedings of the 36th Hawaii International Conference on System Sciences: 1-10.

20. Stringer, K. and T. Stewart. 1986. Statistical techniques for analytical review in auditing. Wiley Publishing. New York.

21. Swanson, N., E. Ghysels, and M. Callan. 1999. A Multivariate Time Series Analysis of the Data Revision Process for Industrial Production and the Composite Leading Indicator. Chapter in Cointegration, Causality, and Forecasting: Festschrift in Honour of Clive W.J. Granger. Eds. R. Engle and H. White. Oxford: Oxford University Press.

22. Vasarhelyi, M.A. and F.B. Halper. 1991. The Continuous Audit of Online Systems. Auditing: A Journal of Practice and Theory 10 (Spring): 110-125.

23. _______________ 2002. Concepts in Continuous Assurance. Chapter 5 in Researching Accounting as an Information Systems Discipline, Edited by S. Sutton and V. Arnold. Sarasota, FL: AAA.

24. ______________, M.A. Alles, and A. Kogan. 2004. Principles of Analytic Monitoring for Continuous Assurance. Forthcoming, Journal of Emerging Technologies in Accounting.

25. Woodroof, J. and D. Searcy 2001. Continuous Audit Implications of Internet Technology: Triggering Agents over the Web in the Domain of Debt Covenant Compliance. Proceedings of the 34th Hawaii International Conference on System Sciences.

VI: Figures, Tables and Charts

Figure 1: Business Process Transaction Flow Diagram

Figure 2: Model Updating Protocol

Figure 3: Multivariate Time Series Model Selection

Table 1: Summary Statistics

Variable    N      Mean       Std Dev    Minimum    Maximum
Order       147    6613.37    3027.46    3240       30751
Receive     147    6488.29    3146.43    171        29599
Voucher     147    5909.71    3462.99    0          30264

The table presents the summary statistics of the daily aggregates of transaction quantity for each business process. The low minimums for Receive and Voucher are due to the date cutoff problem: our data sets span 10/01/03 to 06/30/04, and many transactions related to the Receive and Voucher records of the first two days may have occurred before 10/01/03.

Table 2 – MAPE Comparison among SEM, MTSM, and Linear Regression Model

Analysis variable: Voucher quantity variance (absolute percentage variance between the predicted and actual voucher quantity)

Model                       N     Mean (MAPE)    Std Dev      Minimum      Maximum
Simultaneous Equations      45    0.3805973      0.3490234    0.0089706    2.0227909
Multivariate Time Series    47    0.3766894      0.3292023    0.0147789    1.9099106
Linear Regression           45    0.3632158      0.3046678    0.0366894    1.7602224

The MAPE of each model is the value in the Mean column.

Table 3A: False Negative Error Rates of Simultaneous Equation Models with and without Error Correction

Error Magnitude    SEM with Error Correction    SEM without Error Correction
10%                90%                          91.25%
50%                78.75%                       78.75%
100%               33.75%                       40%
200%               12.5%                        16.25%
400%               0                            10%

The false negative error rate indicates the percentage of errors that are not detected by the AP model. It is calculated as (total number of undetected errors) / 8 (the number of seeded errors) * 100%.

Table 3B: Detection Rates of Simultaneous Equation Models with and without Error Correction

Error Magnitude    SEM with Error Correction    SEM without Error Correction
10%                10%                          8.75%
50%                21.25%                       21.25%
100%               66.25%                       60%
200%               87.5%                        83.75%
400%               100.00%                      90%

The detection rate indicates the percentage of errors that have been successfully detected. It is calculated as 100% - false negative error rate.

Chart 1A: Anomaly Detection Comparison between Simultaneous Equation Models (SEM) with and without Error Correction — False Negative Error Rate

[Chart: false negative error rate by seeded error magnitude (10%, 50%, 100%, 200%, 400%) for SEM with and without error correction; data as in Table 3A]

Chart 1B: Anomaly Detection Comparison between Simultaneous Equation Models (SEM) with and without Error Correction — Detection Rate

[Chart: detection rate by seeded error magnitude (10%, 50%, 100%, 200%, 400%) for SEM with and without error correction; data as in Table 3B]

Table 4: False Positive Error Rates of Simultaneous Equation Models with and without Error Correction

Error Magnitude    SEM with Error Correction    SEM without Error Correction
10%                0                            0
50%                0                            0
100%               0                            0
200%               0                            0
400%               0                            0

The false positive error rate indicates the percentage of non-errors that are reported by the AP model as errors. It is calculated as (total number of reported non-errors) / 8 (the number of seeded errors) * 100%.

Table 5A: False Negative Error Rates of MTSM with and without Error Correction

Error Magnitude    MTSM with Error Correction    MTSM without Error Correction
10%                96.25%                        95%
50%                71.25%                        75%
100%               32.5%                         40%
200%               8.75%                         42.5%
400%               0                             37.5%

The false negative error rate indicates the percentage of errors that are not detected by the AP model. It is calculated as (total number of undetected errors) / 8 (the number of seeded errors) * 100%.

Table 5B: Detection Rates of MTSM with and without Error Correction

Error Magnitude   MTSM with Error Correction   MTSM without Error Correction
10%               3.75%                        5.00%
50%               28.75%                       25.00%
100%              67.50%                       60.00%
200%              91.25%                       57.50%
400%              100.00%                      62.50%

The detection rate is the percentage of seeded errors that have been successfully detected. It is calculated as 100% minus the false negative error rate.

Chart 2A: Anomaly Detection Comparison between MTSM with and without Error Correction — False Negative Error Rate

[Chart omitted: line chart of the false negative error rates in Table 5A for MTSM with and without error correction, plotted against error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Chart 2B: Anomaly Detection Comparison between MTSM with and without Error Correction — Detection Rate

[Chart omitted: line chart of the detection rates in Table 5B for MTSM with and without error correction, plotted against error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Table 6: False Positive Error Rates of MTSM with and without Error Correction

Error Magnitude   MTSM with Error Correction   MTSM without Error Correction
10%               0.00%                        0.00%
50%               0.00%                        2.50%
100%              0.00%                        2.50%
200%              0.00%                        1.25%
400%              0.00%                        0.00%

The false positive error rate is the percentage of observations without seeded errors that are nevertheless reported by the AP model as errors. It is calculated as (number of non-error observations flagged as errors / total number of seeded errors) × 100%.

Table 7A: False Negative Error Rates of Linear Regression Model with and without Error Correction

Error Magnitude   Linear Regression with Error Correction   Linear Regression without Error Correction
10%               95.00%                                    92.50%
50%               68.75%                                    76.25%
100%              33.75%                                    45.00%
200%              17.50%                                    28.75%
400%              2.50%                                     21.25%

The false negative error rate is the percentage of seeded errors that are not detected by the AP model. It is calculated as (number of undetected seeded errors / total number of seeded errors) × 100%.

Table 7B: Detection Rates of Linear Regression Model with and without Error Correction

Error Magnitude   Linear Regression with Error Correction   Linear Regression without Error Correction
10%               5.00%                                     7.50%
50%               31.25%                                    23.75%
100%              66.25%                                    55.00%
200%              82.50%                                    71.25%
400%              97.50%                                    78.75%

The detection rate is the percentage of seeded errors that have been successfully detected. It is calculated as 100% minus the false negative error rate.

Chart 3A: Anomaly Detection Comparison between Linear Regression Model with and without Error Correction — False Negative Error Rate

[Chart omitted: line chart of the false negative error rates in Table 7A for the linear regression model with and without error correction, plotted against error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Chart 3B: Anomaly Detection Comparison between Linear Regression Model with and without Error Correction — Detection Rate

[Chart omitted: line chart of the detection rates in Table 7B for the linear regression model with and without error correction, plotted against error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Table 8: False Positive Error Rates of Linear Regression with and without Error Correction

Error Magnitude   Linear Regression without Error Correction   Linear Regression with Error Correction
10%               0.00%                                        0.00%
50%               0.00%                                        0.00%
100%              0.00%                                        0.00%
200%              0.00%                                        0.00%
400%              0.00%                                        0.00%

The false positive error rate is the percentage of observations without seeded errors that are nevertheless reported by the AP model as errors. It is calculated as (number of non-error observations flagged as errors / total number of seeded errors) × 100%.

Table 9A: False Negative Error Rates of SEM, MTSM, and Linear Regression

Error Magnitude   Simultaneous Equations   Multivariate Time Series   Linear Regression
10%               90.00%                   96.25%                     95.00%
50%               78.75%                   71.25%                     68.75%
100%              33.75%                   32.50%                     33.75%
200%              12.50%                   8.75%                      17.50%
400%              0.00%                    0.00%                      2.50%

The false negative error rate is the percentage of seeded errors that are not detected by the AP model. It is calculated as (number of undetected seeded errors / total number of seeded errors) × 100%.

Table 9B: Detection Rates of SEM, MTSM, and Linear Regression

Error Magnitude   Simultaneous Equations   Multivariate Time Series   Linear Regression
10%               10.00%                   3.75%                      5.00%
50%               21.25%                   28.75%                     31.25%
100%              66.25%                   67.50%                     66.25%
200%              87.50%                   91.25%                     82.50%
400%              100.00%                  100.00%                    97.50%

The detection rate is the percentage of seeded errors that have been successfully detected. It is calculated as 100% minus the false negative error rate.

Chart 4A: Anomaly Detection Comparison of SEM, MTSM and Linear Regression — False Negative Error Rate

[Chart omitted: line chart of the false negative error rates in Table 9A for SEM, MTSM, and the linear regression model, plotted against error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Chart 4B: Anomaly Detection Comparison of SEM, MTSM and Linear Regression — Detection Rate

[Chart omitted: line chart of the detection rates in Table 9B for SEM, MTSM, and the linear regression model, plotted against error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Table 10: False Positive Error Rates of SEM, MTSM, and Linear Regression

Error Magnitude   Simultaneous Equations   Multivariate Time Series   Linear Regression
10%               0.00%                    0.00%                      0.00%
50%               0.00%                    0.00%                      0.00%
100%              0.00%                    0.00%                      0.00%
200%              0.00%                    0.00%                      0.00%
400%              0.00%                    0.00%                      0.00%

The false positive error rate is the percentage of observations without seeded errors that are nevertheless reported by the AP model as errors. It is calculated as (number of non-error observations flagged as errors / total number of seeded errors) × 100%.

VII: Appendix: Multivariate Time Series Model with All Parameter Estimates (No Restriction Model)

Equation   Parameter   Estimate   StdErr     tValue     Pr > |t|      Variable
Voucher    AR1_1_1     -0.16037   0.186429   -0.86023   0.396966301   Voucher(t-1)
Voucher    AR1_1_2     0.773021   0.170199   4.541852   0.0001        Receive(t-1)
Voucher    AR1_1_3     0.056123   0.161157   0.348252   0.730255749   Order(t-1)
Voucher    AR2_1_1     -0.03406   0.178949   -0.19033   0.85042111    Voucher(t-2)
Voucher    AR2_1_2     0.093277   0.203196   0.459047   0.649743948   Receive(t-2)
Voucher    AR2_1_3     -0.07466   0.165114   -0.45219   0.654615021   Order(t-2)
Voucher    AR3_1_1     0.005592   0.171766   0.032554   0.974261525   Voucher(t-3)
Voucher    AR3_1_2     0.105725   0.197241   0.536021   0.596177159   Receive(t-3)
Voucher    AR3_1_3     -0.0933    0.161175   -0.57885   0.567318487   Order(t-3)
Voucher    AR4_1_1     -0.17319   0.18603    -0.93098   0.35982232    Voucher(t-4)
Voucher    AR4_1_2     0.084414   0.188326   0.448234   0.65743388    Receive(t-4)
Voucher    AR4_1_3     -0.17791   0.151337   -1.17561   0.249649354   Order(t-4)
Voucher    AR5_1_1     -0.14743   0.194655   -0.75738   0.45515182    Voucher(t-5)
Voucher    AR5_1_2     0.179332   0.197046   0.9101     0.370538137   Receive(t-5)
Voucher    AR5_1_3     -0.14668   0.138626   -1.0581    0.299052293   Order(t-5)
Voucher    AR6_1_1     -0.19199   0.198044   -0.96941   0.340638372   Voucher(t-6)
Voucher    AR6_1_2     0.104713   0.201667   0.519238   0.607675379   Receive(t-6)
Voucher    AR6_1_3     -0.02084   0.137556   -0.1515    0.880666117   Order(t-6)
Voucher    AR7_1_1     0.089424   0.183408   0.48757    0.629650204   Voucher(t-7)
Voucher    AR7_1_2     0.105111   0.198981   0.528246   0.601490467   Receive(t-7)
Voucher    AR7_1_3     -0.22252   0.142094   -1.566     0.128581577   Order(t-7)
Voucher    AR8_1_1     0.162881   0.167633   0.971649   0.339544558   Voucher(t-8)
Voucher    AR8_1_2     -0.00173   0.188214   -0.00917   0.992744711   Receive(t-8)
Voucher    AR8_1_3     0.092181   0.140709   0.65512    0.517737052   Order(t-8)
Voucher    AR9_1_1     -0.01444   0.197208   -0.07324   0.942134933   Voucher(t-9)
Voucher    AR9_1_2     -0.2778    0.202377   -1.37267   0.180752008   Receive(t-9)
Voucher    AR9_1_3     0.140415   0.136786   1.026532   0.313428089   Order(t-9)
Voucher    AR10_1_1    -0.06404   0.220088   -0.29099   0.773200909   Voucher(t-10)
Voucher    AR10_1_2    0.215473   0.284546   0.757252   0.455224862   Receive(t-10)
Voucher    AR10_1_3    0.052242   0.138868   0.376195   0.709607308   Order(t-10)
Voucher    AR11_1_1    0.137301   0.25217    0.544478   0.590423095   Voucher(t-11)
Voucher    AR11_1_2    0.120841   0.277529   0.435417   0.666597994   Receive(t-11)
Voucher    AR11_1_3    0.070449   0.149185   0.472223   0.640427632   Order(t-11)
Voucher    AR12_1_1    0.06304    0.251674   0.250484   0.804041662   Voucher(t-12)
Voucher    AR12_1_2    -0.03973   0.277996   -0.14291   0.887381603   Receive(t-12)
Voucher    AR12_1_3    -0.12179   0.151897   -0.80181   0.429415981   Order(t-12)
Voucher    AR13_1_1    -0.21532   0.228969   -0.94037   0.355071398   Voucher(t-13)
Voucher    AR13_1_2    0.134275   0.251694   0.533484   0.597908337   Receive(t-13)
Voucher    AR13_1_3    -0.06533   0.149511   -0.43696   0.665491493   Order(t-13)
Voucher    AR14_1_1    -0.15346   0.229446   -0.66885   0.50907024    Voucher(t-14)
Voucher    AR14_1_2    -0.2001    0.250259   -0.79958   0.430686579   Receive(t-14)
Voucher    AR14_1_3    -0.05806   0.148166   -0.39182   0.698154055   Order(t-14)
Voucher    AR15_1_1    0.069123   0.230829   0.299456   0.76680342    Voucher(t-15)
Voucher    AR15_1_2    0.187622   0.268598   0.698524   0.490610739   Receive(t-15)
Voucher    AR15_1_3    0.087902   0.147067   0.597699   0.554844733   Order(t-15)
Voucher    AR16_1_1    -0.04247   0.252551   -0.16817   0.867660209   Voucher(t-16)
Voucher    AR16_1_2    0.068967   0.273247   0.252396   0.802578559   Receive(t-16)
Voucher    AR16_1_3    -0.15622   0.146218   -1.06839   0.294469019   Order(t-16)
Voucher    AR17_1_1    0.18883    0.249186   0.757787   0.454909542   Voucher(t-17)
Voucher    AR17_1_2    0.394092   0.274718   1.43453    0.162496664   Receive(t-17)
Voucher    AR17_1_3    0.169976   0.146612   1.159358   0.256101901   Order(t-17)
Voucher    AR18_1_1    -0.15459   0.251826   -0.61386   0.544263909   Voucher(t-18)
Voucher    AR18_1_2    0.03755    0.294286   0.127596   0.899380593   Receive(t-18)
Voucher    AR18_1_3    -0.03482   0.151271   -0.23016   0.819643158   Order(t-18)

Receive    AR1_2_1     -0.03055   0.217119   -0.14073   0.889092916   Voucher(t-1)
Receive    AR1_2_2     0.020737   0.198218   0.104617   0.917425712   Receive(t-1)
Receive    AR1_2_3     0.320549   0.187687   1.707894   0.098722011   Order(t-1)
Receive    AR2_2_1     0.121545   0.208408   0.58321    0.564420534   Voucher(t-2)
Receive    AR2_2_2     -0.01138   0.236646   -0.04809   0.961986335   Receive(t-2)
Receive    AR2_2_3     0.118937   0.192296   0.618512   0.541237292   Order(t-2)
Receive    AR3_2_1     0.014457   0.200043   0.072267   0.942903001   Voucher(t-3)
Receive    AR3_2_2     0.163507   0.229711   0.711796   0.482480094   Receive(t-3)
Receive    AR3_2_3     -0.09866   0.187708   -0.5256    0.603303003   Order(t-3)
Receive    AR4_2_1     0.12031    0.216655   0.555308   0.583093628   Voucher(t-4)
Receive    AR4_2_2     0.082318   0.219328   0.37532    0.710250662   Receive(t-4)
Receive    AR4_2_3     0.04232    0.17625    0.240113   0.811992104   Order(t-4)
Receive    AR5_2_1     -0.0029    0.2267     -0.01278   0.98989428    Voucher(t-5)
Receive    AR5_2_2     -0.02093   0.229484   -0.09123   0.927963093   Receive(t-5)
Receive    AR5_2_3     -0.03364   0.161446   -0.20834   0.836473545   Order(t-5)
Receive    AR6_2_1     0.065528   0.230646   0.284106   0.778419671   Voucher(t-6)
Receive    AR6_2_2     -0.20524   0.234866   -0.87385   0.389635351   Receive(t-6)
Receive    AR6_2_3     0.241198   0.1602     1.505604   0.14336769    Order(t-6)
Receive    AR7_2_1     -0.18737   0.213601   -0.87721   0.387837709   Voucher(t-7)
Receive    AR7_2_2     -0.07976   0.231737   -0.34419   0.73327743    Receive(t-7)
Receive    AR7_2_3     0.128889   0.165486   0.778849   0.442601876   Order(t-7)
Receive    AR8_2_1     -0.01147   0.195229   -0.05877   0.953553137   Voucher(t-8)
Receive    AR8_2_2     0.245291   0.219198   1.119039   0.272631917   Receive(t-8)
Receive    AR8_2_3     -0.04621   0.163872   -0.28199   0.780028486   Order(t-8)
Receive    AR9_2_1     0.209188   0.229672   0.91081    0.370170555   Voucher(t-9)
Receive    AR9_2_2     0.134783   0.235692   0.57186    0.571979672   Receive(t-9)
Receive    AR9_2_3     -0.13692   0.159304   -0.85951   0.397355911   Order(t-9)
Receive    AR10_2_1    0.713458   0.25632    2.783469   0.009526644   Voucher(t-10)
Receive    AR10_2_2    -0.222     0.331388   -0.66992   0.508396592   Receive(t-10)
Receive    AR10_2_3    -0.18318   0.161729   -1.13262   0.266982012   Order(t-10)
Receive    AR11_2_1    -0.07424   0.293683   -0.25279   0.802280928   Voucher(t-11)
Receive    AR11_2_2    0.270723   0.323216   0.837592   0.409352563   Receive(t-11)
Receive    AR11_2_3    -0.00188   0.173745   -0.01082   0.99144303    Order(t-11)
Receive    AR12_2_1    -0.04491   0.293105   -0.15321   0.87933249    Voucher(t-12)
Receive    AR12_2_2    -0.31936   0.32376    -0.98642   0.332374044   Receive(t-12)
Receive    AR12_2_3    -0.00739   0.176902   -0.04176   0.966987811   Order(t-12)
Receive    AR13_2_1    -0.17556   0.266663   -0.65836   0.515684352   Voucher(t-13)
Receive    AR13_2_2    0.08359    0.293128   0.285165   0.777616586   Receive(t-13)
Receive    AR13_2_3    0.124446   0.174124   0.714697   0.480713307   Order(t-13)
Receive    AR14_2_1    -0.00485   0.267218   -0.01816   0.98563778    Voucher(t-14)
Receive    AR14_2_2    -0.15028   0.291458   -0.51563   0.610159856   Receive(t-14)
Receive    AR14_2_3    0.031932   0.172557   0.185049   0.854524356   Order(t-14)
Receive    AR15_2_1    -0.02061   0.268828   -0.07668   0.939424617   Voucher(t-15)
Receive    AR15_2_2    -0.07347   0.312815   -0.23488   0.8160131     Receive(t-15)
Receive    AR15_2_3    -0.07068   0.171278   -0.41263   0.683017006   Order(t-15)
Receive    AR16_2_1    0.162509   0.294126   0.552515   0.584979682   Voucher(t-16)
Receive    AR16_2_2    -0.403     0.31823    -1.26638   0.21581074    Receive(t-16)
Receive    AR16_2_3    0.095081   0.170288   0.558351   0.581042391   Order(t-16)
Receive    AR17_2_1    -0.20449   0.290208   -0.70463   0.486859807   Voucher(t-17)
Receive    AR17_2_2    0.219422   0.319943   0.685814   0.498469878   Receive(t-17)
Receive    AR17_2_3    -0.02093   0.170748   -0.12259   0.903307972   Order(t-17)
Receive    AR18_2_1    0.03724    0.293282   0.126976   0.899866486   Voucher(t-18)
Receive    AR18_2_2    0.096999   0.342732   0.283016   0.779246537   Receive(t-18)
Receive    AR18_2_3    0.019438   0.176174   0.110332   0.91293285    Order(t-18)

Order      AR1_3_1     0.154733   0.203475   0.76045    0.453341887   Voucher(t-1)
Order      AR1_3_2     -0.07571   0.185762   -0.40756   0.68669768    Receive(t-1)
Order      AR1_3_3     -0.22663   0.175892   -1.28846   0.208130573   Order(t-1)
Order      AR2_3_1     -0.00724   0.195311   -0.03705   0.9707106     Voucher(t-2)
Order      AR2_3_2     -0.15633   0.221775   -0.70491   0.486686616   Receive(t-2)
Order      AR2_3_3     0.011352   0.180212   0.062995   0.950218399   Order(t-2)
Order      AR3_3_1     0.152747   0.187472   0.814774   0.422077713   Voucher(t-3)
Order      AR3_3_2     -0.1919    0.215276   -0.89143   0.380297082   Receive(t-3)
Order      AR3_3_3     0.264969   0.175912   1.506259   0.143200403   Order(t-3)
Order      AR4_3_1     0.232692   0.20304    1.14604    0.261478928   Voucher(t-4)
Order      AR4_3_2     -0.20221   0.205545   -0.98378   0.33364933    Receive(t-4)
Order      AR4_3_3     0.427858   0.165175   2.590337   0.015051624   Order(t-4)
Order      AR5_3_1     -0.07093   0.212454   -0.33388   0.740960556   Voucher(t-5)
Order      AR5_3_2     -0.21575   0.215063   -1.00321   0.324351539   Receive(t-5)
Order      AR5_3_3     0.116105   0.151301   0.767376   0.449281197   Order(t-5)
Order      AR6_3_1     0.329587   0.216152   1.524792   0.138527803   Voucher(t-6)
Order      AR6_3_2     -0.16738   0.220107   -0.76044   0.453345853   Receive(t-6)
Order      AR6_3_3     0.356457   0.150133   2.374273   0.024676001   Order(t-6)
Order      AR7_3_1     0.22186    0.200178   1.108314   0.277155871   Voucher(t-7)
Order      AR7_3_2     -0.27512   0.217175   -1.26682   0.215657318   Receive(t-7)
Order      AR7_3_3     0.241143   0.155087   1.554892   0.131203438   Order(t-7)
Order      AR8_3_1     -0.13461   0.182961   -0.73572   0.46802254    Voucher(t-8)
Order      AR8_3_2     -0.36707   0.205423   -1.78689   0.084787272   Receive(t-8)
Order      AR8_3_3     0.135778   0.153574   0.884119   0.384161153   Order(t-8)
Order      AR9_3_1     -0.49843   0.215239   -2.31572   0.028116941   Voucher(t-9)
Order      AR9_3_2     0.065586   0.220881   0.296931   0.768710913   Receive(t-9)
Order      AR9_3_3     -0.16232   0.149293   -1.08723   0.28620511    Order(t-9)
Order      AR10_3_1    -0.47439   0.240212   -1.97487   0.058217519   Voucher(t-10)
Order      AR10_3_2    0.383478   0.310563   1.234784   0.22717318    Receive(t-10)
Order      AR10_3_3    -0.16732   0.151566   -1.10395   0.279012969   Order(t-10)
Order      AR11_3_1    -0.02051   0.275228   -0.07452   0.941126086   Voucher(t-11)
Order      AR11_3_2    -0.20287   0.302905   -0.66974   0.508511703   Receive(t-11)
Order      AR11_3_3    0.115627   0.162826   0.710127   0.483498415   Order(t-11)
Order      AR12_3_1    0.224882   0.274686   0.818685   0.419879493   Voucher(t-12)
Order      AR12_3_2    0.350439   0.303415   1.154983   0.257859377   Receive(t-12)
Order      AR12_3_3    -0.06855   0.165785   -0.4135    0.682392815   Order(t-12)
Order      AR13_3_1    0.460876   0.249905   1.844203   0.07575913    Voucher(t-13)
Order      AR13_3_2    -0.3442    0.274708   -1.25296   0.22058352    Receive(t-13)
Order      AR13_3_3    0.147679   0.163182   0.904999   0.373187325   Order(t-13)
Order      AR14_3_1    0.076131   0.250425   0.304008   0.763369746   Voucher(t-14)
Order      AR14_3_2    0.50425    0.273142   1.846109   0.075473617   Receive(t-14)
Order      AR14_3_3    0.387588   0.161714   2.396757   0.023460166   Order(t-14)
Order      AR15_3_1    0.036202   0.251934   0.143697   0.886769198   Voucher(t-15)
Order      AR15_3_2    0.693183   0.293157   2.364547   0.02521952    Receive(t-15)
Order      AR15_3_3    0.00421    0.160514   0.026226   0.979263435   Order(t-15)
Order      AR16_3_1    0.643019   0.275643   2.332796   0.02707079    Voucher(t-16)
Order      AR16_3_2    -0.11341   0.298232   -0.38026   0.706622645   Receive(t-16)
Order      AR16_3_3    0.102113   0.159587   0.639856   0.527467718   Order(t-16)
Order      AR17_3_1    -0.29129   0.271971   -1.07104   0.293298539   Voucher(t-17)
Order      AR17_3_2    -0.18914   0.299837   -0.63081   0.533277278   Receive(t-17)
Order      AR17_3_3    -0.2682    0.160018   -1.67606   0.104859286   Order(t-17)
Order      AR18_3_1    -0.09114   0.274851   -0.33158   0.742673334   Voucher(t-18)
Order      AR18_3_2    -0.54341   0.321194   -1.69183   0.101780131   Receive(t-18)
Order      AR18_3_3    -0.22331   0.165103   -1.35253   0.187027552   Order(t-18)

[Figure omitted: schematic of the online model updating protocol in the CA system. AP Model 1 is estimated on data segments 1 through 100 and produces predicted values (segments 101, 102, 103) for the Ordering, Receiving, and Voucher Payment processes; the data window is then updated, so AP Model 2 is estimated on segments 1 through 101 to predict segments 102 to 104, and AP Model 3 on segments 1 through 102 to predict segments 103 to 105.]
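A minimal sketch of this rolling re-estimation protocol; the fit/predict pair below is a generic stand-in (a simple mean model) for the SEM and MTSM estimation actually performed, and the data are simulated:

```python
import numpy as np

def fit(history):
    """Stand-in for SEM/MTSM estimation: here, just a mean model."""
    return {"level": np.mean(history, axis=0)}

def predict(model):
    """Stand-in one-step-ahead prediction for Order/Receive/Voucher."""
    return model["level"]

# Simulated daily aggregates for the three processes.
rng = np.random.default_rng(0)
segments = rng.poisson(lam=[900, 850, 800], size=(110, 3)).astype(float)

# Online protocol: AP Model k is fit on segments 1..(99+k) and used to
# predict the next segment; the window then grows by the realized segment.
for k, t in enumerate(range(100, 105), start=1):
    model = fit(segments[:t])       # e.g. AP Model 1 uses segments 1-100
    forecast = predict(model)       # prediction for segment t+1 (101, 102, ...)
    print(f"AP Model {k}: predicted segment {t + 1}:", np.round(forecast, 1))
```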

[Figure omitted: flowchart of the model estimation and restriction procedure. Initial model estimation; determine the parameter p-value threshold; retain parameters below the threshold and restrict parameters over the threshold to zero; re-estimate the model; if all remaining parameter estimates are below the threshold (Yes), the model is final, otherwise (No) the restriction and re-estimation steps are repeated.]
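A minimal sketch of the estimate-restrict-re-estimate loop in the flowchart, using an ordinary least squares regression from statsmodels as a stand-in for the VAR/SEM estimation in the study; restricting a parameter to zero is implemented here by dropping its regressor:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def restrict_and_reestimate(y, X, p_threshold=0.10):
    """Iteratively restrict (drop) parameters whose p-value exceeds the
    threshold and re-estimate, until all retained parameters pass."""
    cols = list(X.columns)
    while cols:
        results = sm.OLS(y, X[cols]).fit()
        over = results.pvalues[results.pvalues > p_threshold]
        if over.empty:                      # all estimates below threshold
            return results                  # -> final (restricted) model
        cols = [c for c in cols if c not in over.index]
    raise ValueError("every parameter was restricted to zero")

# Toy data: y depends on x1 only; x2 is pure noise, so it will typically
# be restricted away by the loop above.
rng = np.random.default_rng(1)
X = sm.add_constant(pd.DataFrame({"x1": rng.normal(size=200),
                                  "x2": rng.normal(size=200)}))
y = 2.0 * X["x1"] + rng.normal(size=200)
print(restrict_and_reestimate(y, X).params)
```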

http://www.cica.ca/index.cfm/ci_id/989/la_id/1.htm

Compustat, CRSP, and other data sets popular with capital market researchers are usually not sufficient for empirical research in CA, which generally requires very high frequency data.

http://www.tpub.com/content/doe/h1012v3/css/h1012v3_33.htm

See Section 3.4 for a detailed description of false negative and false positive errors.

We found negative or zero values in these fields, which our data provider could not always explain.

For example, a purchase record of 1,000 pairs of surgical gloves must have matching receiving and payment records with the same order number, item number, and transaction quantity (1,000).

For example, the transaction quantity can differ substantially across individual transactions, and the lag times between order and delivery and between delivery and payment can also vary. Aggregating individual transactions by day largely reduces this variance.

Audits also need to be performed on metrics other than financial amounts. For example, the Patriot Act requires banks to report the source of funds for any client deposit larger than US$100,000. A client can bypass the mandatory reporting by splitting a deposit over $100,000 into several smaller deposits: each individual deposit stays under the reporting threshold even though the total deposited exceeds it. Auditors can catch such activity only by using the number of deposit transactions as an audit metric.

We find that the MAPEs for predictions of the Order, Receive, and Voucher variables are all over 54%.

In the Chen and Leitch (2003) study, the MAPEs of the four AP models are 0.3915, 0.3944, 0.5964, and 0.5847. Other auditing literature sometimes reports MAPEs exceeding 100%.

For presentation purposes, we also include tables and charts of the detection rate, which equals 1 minus the false negative error rate.

We also tested other benchmarks for anomalies, such as a MAPE of 50% as a threshold. Using the prediction interval produced the best anomaly detection performance, resulting in the smallest numbers of false positive and false negative errors.
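For illustration, a minimal sketch of the prediction-interval benchmark, assuming the model supplies a point prediction and a standard error for each hold-out observation; the interval width (a two-sided 95% band) and the numbers are ours, not the study's:

```python
import numpy as np

def flag_anomalies(actual, predicted, stderr, z=1.96):
    """Flag observations falling outside the ~95% prediction interval."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    stderr = np.asarray(stderr, dtype=float)
    lower = predicted - z * stderr
    upper = predicted + z * stderr
    return (actual < lower) | (actual > upper)

# Toy hold-out window: day 2 carries a seeded 100% overstatement.
actual    = [1010.0, 2200.0, 980.0]
predicted = [1000.0, 1050.0, 1000.0]
stderr    = [120.0, 120.0, 120.0]
print(flag_anomalies(actual, predicted, stderr))   # [False  True False]
```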

Our data sets are not extracted from a relational database; as a result, there may be non-trivial noise that can affect our test results.

