Accelerated Fluctuation Analysis by Graphic Cards and Complex Pattern Formation in Financial Markets



The open access journal for physics

    New Journal of Physics

    Accelerated fluctuation analysis by graphic cards

    and complex pattern formation in financial markets*

    Tobias Preis1,2,3, Peter Virnau1, Wolfgang Paul1

    and Johannes J Schneider1

1 Institute of Physics, Johannes Gutenberg University Mainz, Staudinger Weg 7, 55128 Mainz, Germany
2 Artemis Capital Asset Management GmbH, Gartenstrasse 14, 65558 Holzheim, Germany

    E-mail: [email protected]

    New Journal of Physics 11 (2009) 093024 (21pp)

    Received 2 June 2009

    Published 16 September 2009

    Online at http://www.njp.org/

    doi:10.1088/1367-2630/11/9/093024

    Abstract. The compute unified device architecture is an almost conventional

    programming approach for managing computations on a graphics processing

    unit (GPU) as a data-parallel computing device. With a maximum number of

    240 cores in combination with a high memory bandwidth, a recent GPU offers

    resources for computational physics. We apply this technology to methods of

    fluctuation analysis, which includes determination of the scaling behavior of a

    stochastic process and the equilibrium autocorrelation function. Additionally, the

    recently introduced pattern formation conformity (Preis T et al 2008 Europhys.

    Lett. 82 68005), which quantifies pattern-based complex short-time correlations

    of a time series, is calculated on a GPU and analyzed in detail. Results are

    obtained up to 84 times faster than on a current central processing unit core.

When we apply this method to high-frequency time series of the German BUND future, we find significant pattern-based correlations on short time

    scales. Furthermore, an anti-persistent behavior can be found on short time

    scales. Additionally, we compare the recent GPU generation, which provides

a theoretical peak performance of up to roughly 10^12 floating point operations per second, with the previous one.

Supplementary information can be found at http://www.tobiaspreis.de.

3 Author to whom any correspondence should be addressed.

New Journal of Physics 11 (2009) 093024
1367-2630/09/093024+21$30.00 © IOP Publishing Ltd and Deutsche Physikalische Gesellschaft

    mailto:[email protected]://www.njp.org/http://www.tobiaspreis.de/http://www.tobiaspreis.de/http://www.njp.org/mailto:[email protected]
  • 8/8/2019 Accelerated Fluctuation Analysis by Graphic Card and Complex Pattern Formation in Financial Markets

    2/21

    2

    Contents

1. Introduction
2. GPU device architecture
3. Hurst exponent
4. Equilibrium autocorrelation
5. Fluctuation pattern conformity
6. Conclusion and outlook
Acknowledgments
Appendix. GPU source code
References

    1. Introduction

    In computer science applications and diverse interdisciplinary science fields such as

    computational physics or quantitative finance, the computational power requirements increase

monotonically in time. In particular, the history of time series analysis mirrors the need for computational power and, simultaneously, the opportunities arising from its use. Up to

the present day, it is a frequently made simplifying assumption that price dynamics in financial time

    series obey random walk statistics in order to simplify analytic calculations in econophysics

    and in financial applications. However, such approximations, which are, e.g. used in the famous

options pricing model introduced by Black and Scholes [1] in 1973, neglect the real nature of financial market observables, and a large number of empirical deviations between financial market time series and models presuming only random walk behavior have been observed in recent decades [2]–[6]. Mandelbrot [7, 8] had already discovered in the 1960s that commodity market time series obey fat-tailed price change distributions [9]. His analysis was based on historical cotton time-and-sales records dating back to the beginning of the 20th century. In

    accordance with the technological improvements in computing resources, trading processes

were adapted in order to create fully electronic market places. Thus, the available amount of

    historical price data increased impressively. As a consequence, the empirical properties found

    by Mandelbrot were confirmed. However, the amount of transaction records available today

    in time units of milliseconds also requires increased computing resources for its analysis.

From such analyses, scaling behavior, short-time anti-correlated price changes and volatility clustering [10, 11] of financial markets are well established and can be reproduced, e.g. by a model of the continuous double auction [12, 13] or by various agent-based models of financial markets [14]–[22]. Furthermore, the price formation process and cross correlations [23, 24]

    between equities and equity indices have been studied with the clear intention to optimize

    asset allocation and portfolios. However, in contrast to such calculations, which can be done

    with conventional computing facilities, a large computational power demand is driven by the

quantitative hedge fund industry and also by modern market making, which mostly requires real-time analytics. A market maker usually provides quotes for buying or selling a given asset.

    In the competitive environment of electronic financial markets this cannot be done by a human

    market maker alone, especially if a large number of assets is quoted concurrently. The rise


    of the hedge fund industry in recent years and their interest in taking advantage of short time

    correlations boosted the real-time analysis of market fluctuations and the market micro-structure

analysis in general, which is the study of the process of exchanging assets under explicit trading rules [25], and which is studied intensely by the financial community [26]–[29].

Such computing requirements, which arise in various interdisciplinary fields such as computational physics, including, e.g. Monte Carlo and molecular dynamics simulations [30]–[32] or stochastic optimization [33], make the use of high-performance computing resources necessary. This includes recent multi-core computing solutions based on

    a shared memory architecture, which are accessible by OpenMP [34] or MPI [35] and can

    be found in recent personal computers as a standard configuration. Furthermore, distributed

    computing clusters with homogeneous or heterogeneous node structures are available in order

    to parallelize a given algorithm by separating it into various sub-algorithms.

    However, a recent trend in computer science and related fields is general purpose

computing on graphics processing units (GPUs), which can yield impressive performance, i.e. the required processing times can be reduced to a great extent. Some applications have already been realized in computational physics [31], [36]–[43]. Recently, the Monte Carlo

    simulation of the two-dimensional and three-dimensional ferromagnetic Ising model could be

accelerated up to 60 times [44] using a graphics card architecture. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics processing. In the

    beginning, GPU programs used C-like programming environments for kernel execution such

    as OpenGL shading language [45] or C for graphics (Cg) [46]. The compute unified device

    architecture (CUDA) [47] is an almost conventional programming approach making use of the

    unified shader design of recent GPUs from NVIDIA corporation. The programming interface

    allows one to implement an algorithm using standard C language without any knowledge of

    the native programming environments. A comparable concept Close To the Metal (CTM) [48]

was introduced by Advanced Micro Devices Inc. for ATI graphics cards. One has to state that the computational power of consumer graphics cards roughly exceeds that of a central processing unit (CPU) by 1–2 orders of magnitude. A conventional CPU nowadays provides a peak performance of roughly 20 × 10^9 floating point operations per second (FLOPS) [31]. The consumer graphics card NVIDIA GeForce GTX 280 reaches a theoretical peak performance of 933 × 10^9 FLOPS. If one tried to realize the computational power of one GPU with a cluster of several CPUs, a much larger amount of electric power would be required. A GTX 280 graphics card exhibits a maximum power consumption of 236 W [49], while a recent Intel CPU consumes roughly 100 W.

We apply this general-purpose graphics processing unit (GPGPU) technology to methods of time series analysis, which include determination of the Hurst exponent and the

    equilibrium autocorrelation function. Additionally, the recently introduced pattern conformity

    observable [50], which is able to quantify pattern-based complex short-time correlations of a

    time series, is calculated on a GPU. Furthermore, we compare the recent GPU generation with

    the previous one. All methods are applied to a high-frequency data set of the Euro-Bund futures

    contract traded at the electronic derivatives exchange Eurex.

    The paper is organized as follows. In section 2, a brief overview of key facts and properties

of the GPU architecture is provided in order to clarify implementation constraints for the

    following sections. A GPU-accelerated Hurst exponent estimation can be found in section 3. In

    section 4, the equilibrium autocorrelation function is implemented on a GPU and in section 5,

    the pattern conformity is analyzed on a GPU in detail. In each of these sections, the performance


    Figure 1. Schematic visualization of the grid of thread blocks for a two-

    dimensional thread and block structure.

of the GPU code as a function of parameters is first evaluated for a synthetic time series and compared to the performance on a CPU. Then the time series methods are applied to a financial

    market time series and a discussion of numerical errors is presented. Finally, our conclusions

    are summarized in section 6.

    2. GPU device architecture

    In order to provide and discuss information about implementation details on a GPU for time

    series analysis methods, key facts of the GPU device architecture are briefly summarized in

    this section. As mentioned in the introduction, we use the compute unified device architecture

(CUDA), which allows the implementation of algorithms using standard C language with CUDA-specific extensions. Thus, CUDA issues and manages computations on a GPU as a

    data-parallel computing device.

    The graphics card architecture, which is used in recent GPU generations, is built around

a scalable array of streaming multiprocessors (MPs) [47]. One such MP contains, among other components, eight scalar

    processor cores, a multi-threaded instruction unit, and shared memory, which is located on-chip.

    When a C program using CUDA extensions and running on the CPU invokes a GPU kernel,

which is a synonym for a GPU function, many copies of this kernel, known as threads,

    are enumerated and distributed to the available MPs, where their execution starts. For such

    an enumeration and distribution, a kernel grid is subdivided into blocks and each block is

    subdivided into various threads as illustrated in figure 1 for a two-dimensional thread and

    block structure. The threads of a thread block are executed concurrently in the vacated MPs.


    Figure 2. Schematic visualization of GPU MPs with on-chip shared memory.

    In order to manage a large number of threads, a single-instruction multiple-thread (SIMT) unit

    is used. An MP maps each thread to one scalar processor core and each scalar thread executes

    independently. Threads are created, managed, scheduled and executed by this SIMT unit in

    groups of 32 threads. Such a group of 32 threads forms a warp, which is executed on the same

    MP. If the threads of a given warp diverge via a data-induced conditional branch, each branch

    of the warp is executed serially and the processing time of this warp consists of the sum of the

branches' processing times.
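The serialization rule for divergent warps can be captured in a toy cost model (our own sketch, not NVIDIA's scheduler; `cost_a` and `cost_b` are hypothetical per-branch cycle counts, not measured values):

```c
#include <assert.h>

#define WARP_SIZE 32

/* Toy cost model for warp divergence: if all threads of a warp take
 * the same branch, the warp pays only that branch's cost; if they
 * diverge, both branches run serially and their costs add up. */
int warp_cost(const int take_a[WARP_SIZE], int cost_a, int cost_b)
{
    int any_a = 0, any_b = 0;
    for (int i = 0; i < WARP_SIZE; ++i) {
        if (take_a[i]) any_a = 1;
        else           any_b = 1;
    }
    return (any_a ? cost_a : 0) + (any_b ? cost_b : 0);
}
```

A uniform warp pays one branch; a mixed warp pays the sum, which is why data-dependent branching inside a warp is expensive.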

As shown in figure 2, each MP of the GPU device contains several local 32-bit registers per processor and memory that is shared by all scalar processor cores of an MP. Furthermore, constant and texture caches are available, which are also shared on an MP. In order to allow the reduction of results across the involved MPs, slower global memory can be used, which is shared among all MPs and is also accessible by the C function running on the CPU. Note that the GPU's global memory is still roughly 10 times faster than the current main memory of personal computers. Detailed facts of the consumer graphics cards 8800 GT and GTX 280 used by us can be found in table 1.

Furthermore, note that older GPU devices only support single-precision floating-point operations; double precision is available only on the most modern graphics cards, starting with the GTX 200 series. Moreover, the IEEE-754 standard for single-precision numbers is not completely realized: deviations can be found especially for rounding operations. The GTX 200 series also supports double-precision floating-point numbers; however, each MP features only one double-precision processing core, and so the theoretical peak performance is dramatically reduced for double-precision operations. Further information about the GPU device properties and CUDA can be found in [47].


    Table 1. Key facts and properties of the applied consumer graphics cards. The

    theoretical acceleration factor between GeForce 8800 GT and GeForce GTX 280

is given by the ratio of (number of cores) × (clock rate).

                                    GeForce 8800 GT    GeForce GTX 280
Global memory                       512 MB             1024 MB
Number of multiprocessors (MPs)     14                 30
Number of cores                     112                240
Constant memory                     64 kB              64 kB
Shared memory per block             16 kB              16 kB
Registers available per block       8192               16384
Warp size                           32                 32
Clock rate                          1.51 GHz           1.30 GHz
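Taking the table at face value, the theoretical performance ratio between the two cards, quoted later in the text as roughly 1.84, can be checked directly (a sketch of ours, not part of the paper's code):

```c
#include <assert.h>
#include <math.h>

/* Theoretical peak performance is proportional to
 * (number of cores) x (clock rate); values from table 1. */
double theoretical_ratio(void)
{
    double gtx280 = 240 * 1.30; /* GTX 280: 240 cores at 1.30 GHz */
    double g8800g = 112 * 1.51; /* 8800 GT: 112 cores at 1.51 GHz */
    return gtx280 / g8800g;     /* roughly 1.84 */
}
```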

    3. Hurst exponent

The Hurst exponent H [51] provides information on the relative tendency of a stochastic process. A Hurst exponent H < 0.5 indicates an anti-persistent behavior of the underlying process, whereas H > 0.5 mirrors a super-diffusive behavior: extreme values tend to be followed by extreme values. If the deviations from the mean values of the time series are independent, which corresponds to a random walk behavior, a Hurst exponent of H = 0.5 is obtained.

    The Hurst exponent H was originally introduced by Harold Edwin Hurst [52], a British

government administrator. He studied records of the Nile river's volatile rain and drought conditions and noticed interesting regularities in flood periods. Hurst observed in the eight

    centuries of records that there was a tendency for a year with good flood conditions to be

    followed by another year with good flood conditions. Nowadays, the Hurst exponent as a scaling

exponent is well studied in the context of financial markets [50], [53]–[56]. Typically, an anti-persistent behavior can be found on short timescales due to the nonzero gap between offer

    and demand. On medium timescales, a super-diffusive behavior can be detected [54]. On long

    timescales, a diffusive regime is reached, due to the law of large numbers.

For a time series p(t) with t ∈ {1, 2, . . . , T}, the time lag-dependent Hurst exponent H_q(Δt) can be determined by the general relationship

⟨|p(t + Δt) − p(t)|^q⟩^(1/q) ∝ Δt^(H_q(Δt)),    (1)

with the time lag Δt ≪ T and Δt ∈ ℕ. The brackets ⟨. . .⟩ denote the expectation value. Apart from (1), there are also other calculation methods, e.g. the rescaled range analysis [51]. We present in the following the Hurst exponent determination implementation on a GPU for q = 1 and use H(Δt) ≡ H_{q=1}(Δt). The process to be analyzed is a synthetic anti-correlated random

    walk, which was introduced in [50]. This process emerges from the superposition of two

    random walk processes with different timescale characteristics. Thus, a parameter-dependent

    anti-correlation at time lag one can be realized. As a first step, one has to allocate memory

on the GPU device's global memory for the time series, intermediate results and final results. In a first approach, the time lag-dependent Hurst exponent is calculated up to Δt_max = 256.


Figure 3. Schematic visualization of the determination of the Hurst exponent on a GPU architecture for T = 32 + 4 and Δt_max = 4. The calculation is split into various processing steps to ensure that their predecessors are completed.

In order to simplify the reduction process of the partial results, the overall number of time steps T has to satisfy the condition T = (2^λ + 1)·Δt_max, with λ being an integer number called the length parameter of the time series. The number of threads per block, known as the block size, is equivalent to Δt_max. The array for intermediate results possesses length T too, whereas the array for the final results contains Δt_max entries. After allocation, the time series data have to be transferred from the main memory to the GPU's global memory. If this step is completed, the main calculation part can start. As illustrated in figure 3 for block 0, each block, which contains Δt_max threads each, loads Δt_max data points of the time

series from global memory to shared memory. In order to realize such a high-performance loading process, each thread4 loads one value and stores this value in the array located in the shared memory, which can be accessed by all threads of a block. Analogously, each block also loads the next Δt_max entries. In the main processing step, each thread is in charge of one specific time lag. Thus, each thread is responsible for a specific value of Δt and sums the terms |p(t + Δt) − p(t)| over the block's sub-segment of the time series. As the maximum time lag is equivalent to the maximum number of threads, and as the maximum time lag is also equivalent to half the number of data points loaded per block, all threads have to sum the same number of addends, and so a uniform workload of the graphics card

is realized. However, as it is only possible to synchronize threads within a block and a native block synchronization does not exist, the partial results of each block have to be stored in block-dependent areas of the array for intermediate results, as shown in figure 3. The termination of the GPU kernel function ensures that all blocks have been executed. In a post-processing step, the partial arrays have to be reduced. This is realized by a binary tree structure, as indicated in figure 3. After this reduction, the resulting values can be found in the first Δt_max entries of the intermediate array, and a final processing kernel is responsible for normalization and gradient calculation. The source code of these GPU kernel functions can be found in the appendix.
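The post-processing step can be mimicked on a single CPU core roughly as follows (a sketch with our own naming, assuming a power-of-two number of blocks): each block's Δt_max partial sums occupy one segment of the intermediate array, and segments are added pairwise until the totals sit in the first Δt_max entries.

```c
#include <assert.h>

/* Binary-tree reduction of block-wise partial results (cf. figure 3):
 * `partial` holds num_blocks segments of length dtmax, one per block;
 * entry k of a segment is that block's partial sum for time lag k+1.
 * After the reduction, the first dtmax entries hold the totals.
 * Assumes num_blocks is a power of two. */
void tree_reduce(double *partial, int num_blocks, int dtmax)
{
    for (int stride = num_blocks / 2; stride >= 1; stride /= 2)
        for (int b = 0; b < stride; ++b)
            for (int k = 0; k < dtmax; ++k)
                partial[b * dtmax + k] += partial[(b + stride) * dtmax + k];
}
```

On the GPU each pairwise addition level corresponds to one kernel launch, since only kernel termination guarantees that all blocks of the previous level have finished.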

    For the comparison between CPU and GPU implementations, we use an Intel Core 2 Quad

    CPU (Q6700) with 2.66 GHz and 4096 kB cache size, of which only one core is used. The

    4 Thread and block IDs are accessible in standard C language via built-in variables.


[Figure 4 plot: processing time (ms) versus length parameter λ for GPU allocation, memory transfer, main function, post-processing and final processing, plus total GPU and total CPU time; inset: acceleration factor versus λ.]

Figure 4. Processing times for the calculation of the Hurst exponent H(Δt) on GPU and CPU for Δt_max = 256. The consumer graphics card 8800 GT is used as GPU device. The total processing time on the GPU can be broken into allocation time, time for memory transfer, time for main processing, time for post-processing, and time for final processing. The acceleration factor is shown in the inset. A maximum acceleration factor of roughly 40 can be obtained. Furthermore, at λ = 9 there is a break in the slope of the acceleration curve, which is influenced by cache-size effects.

    standard C source code executed on the host is compiled with the gcc compiler (version 4.2.1).

The results for Δt_max = 256 and the consumer graphics card 8800 GT can be found in figure 4.

The acceleration factor α, which is shown in the inset, reaches a maximum value of roughly 40, and is determined by the relationship

α = (Total processing time on CPU) / (Total processing time on GPU).    (2)

A smaller speed-up factor is measured for small values of λ, as the relative fraction of allocation time and time for memory transfer is larger in comparison with the time needed for the calculation steps. The corresponding analysis for the GTX 280 yields a larger acceleration factor of roughly 70. If we increase the maximum time lag Δt_max to 512, which is only possible on the GTX 280, a maximum speed-up factor of roughly 80 can be achieved, as shown in figure 5. This indicates that Δt_max = 512 leads to a higher workload on the GTX 280.

    At this point, we can also compare the ratio between the performances of 8800 GT and

    GTX 280 for our application to the ratio of theoretical peak performances. The latter is given

as the number of cores multiplied by the clock rate, which amounts to a ratio of roughly 1.84. If we compare the total processing times on these GPUs for λ = 15 and Δt_max = 256, we obtain an empirical performance ratio of 1.7. If we use the acceleration factors for Δt_max = 256 on the 8800 GT and for Δt_max = 512 on the GTX 280 for comparison, we obtain a value of 2.


[Figure 5 plot: processing time (ms) versus length parameter λ for the individual GPU steps and the total GPU and CPU times; inset: acceleration factor versus λ.]

Figure 5. Processing times for the calculation of the Hurst exponent H(Δt) on GPU and CPU for Δt_max = 512. These results are obtained with the GTX 280. The total processing time on the GPU can again be broken into allocation time, time for memory transfer, time for main processing, time for post-processing and time for final processing. A maximum acceleration factor of roughly 80 can be reached, which is shown in the inset.

After this performance analysis, we apply the GPU implementation to real financial market data in order to determine the Hurst exponent of the Euro-Bund futures contract traded at

    the European exchange (Eurex). In this context, we will also gauge the accuracy of the GPU

    calculations by quantifying deviations from the calculation on a CPU. The Euro-Bund futures

    contract (FGBL) is a financial derivative. A futures contract is a standardized contract to buy

    or sell a specific underlying instrument at a proposed date in the future, which is called the

    expiration time of the futures contract, at a specified price. The underlying instruments of the

    FGBL contract are long-term debt instruments issued by the Federal Republic of Germany

    with remaining terms of 8.5 to 10.5 years and a coupon of 6%. We use the Euro-Bund futures

    contract with expiration time June 2007. The time series shown in figure 6 contains 1 051 982

    trades, recorded from 8 March 2007 to 7 June 2007. In all presented calculations of the FGBL

time series on the GPU, λ is fixed to 11. Thus, the data set is limited to the first T = 1 049 088 trades in order to fit the data set length to the constraints of the specific GPU implementation. In figure 7, the time lag-dependent Hurst exponent H(Δt) is presented. On short timescales, the well-known anti-persistent behavior can be detected. On medium timescales, there is slight evidence that the price process reaches a super-diffusive regime. For long timescales, the price dynamics tend to random walk behavior (H = 0.5), which is also shown for comparison. The

relative error

ε = |H_GPU(Δt) − H_CPU(Δt)| / H_CPU(Δt)    (3)

shown in the inset of figure 7 is smaller than one-tenth of a per cent.


[Figure 6 plot: FGBL JUN 2007 price p(t) (per cent of par value, roughly 111.5 to 116.5) versus t (10^5 units of time tick).]

Figure 6. High-frequency financial time series of the FGBL with expiration time June 2007 traded at the Eurex. The time series contains 1 051 982 trades recorded from 8 March 2007 to 7 June 2007. In all presented calculations of the FGBL time series on a GPU, λ is fixed to 11. Thus, the data set is limited to the first T = 1 049 088 trades in order to fit the data set length to the constraints of the specific GPU implementation.

    4. Equilibrium autocorrelation

The autocorrelation function is a widely used concept in order to determine dependencies within a time series. The autocorrelation function is given by the correlation between the time series and the time series shifted by the time lag Δt through

ρ(Δt) = [⟨p(t) p(t + Δt)⟩ − ⟨p(t)⟩⟨p(t + Δt)⟩] / [√(⟨p(t)²⟩ − ⟨p(t)⟩²) √(⟨p(t + Δt)²⟩ − ⟨p(t + Δt)⟩²)].    (4)

For a stationary time series, (4) reduces to

ρ(Δt) = [⟨p(t) p(t + Δt)⟩ − ⟨p(t)⟩²] / [⟨p(t)²⟩ − ⟨p(t)⟩²],    (5)

as the mean value and the variance stay constant, i.e. ⟨p(t)⟩ = ⟨p(t + Δt)⟩ and ⟨p(t)²⟩ = ⟨p(t + Δt)²⟩.

Applied to financial markets, one observes that the autocorrelation function of price changes exhibits a significant negative value at a time lag of one tick, whereas it vanishes for time lags Δt > 1. Furthermore, the autocorrelation of absolute or squared price changes, which is related to the volatility of the price process, decays slowly [15]. In order to implement (5) on a GPU architecture, steps similar to those in section 3 are necessary. The calculation of the time lag-dependent part ⟨p(t) p(t + Δt)⟩ is analogous to the determination of the Hurst exponent on the GPU. The input time series, which is transferred to the GPU's global memory, does not contain prices but price changes. However, in addition one needs the results for ⟨p(t)⟩


[Figure 7 plot: Hurst exponent H(Δt) versus time lag Δt (2^1 to 2^9 time ticks) for a random walk, FGBL (CPU) and FGBL (GPU); inset: relative error (%) versus time lag (up to 500 time ticks).]

Figure 7. Hurst exponent H(Δt) in dependence of the time lag Δt calculated on CPU and GPU. Additionally, the theoretical Hurst exponent of a random walk process (H = 0.5) is included for comparison. One can clearly see the well-known anti-persistent behavior of the FGBL time series on short timescales (Δt < 2^4 time ticks). Furthermore, evidence is given that the process reaches a slightly super-diffusive region (H ≈ 0.525) on medium timescales (2^4 time ticks < Δt < 2^7 time ticks). On long timescales, an asymptotic random walk behavior can be found. In order to quantify deviations from calculations on a CPU, the relative error (see main text) is presented for each time lag Δt in the inset. It is typically smaller than 10^-3.

and ⟨p(t)²⟩. For this purpose, an additional array of length T is allocated, in which a GPU kernel function stores the squared values of the time series. Then, the time series and the squared time series are reduced with the same binary tree reduction process as in section 3. However, as this procedure produces arrays of length Δt_max, one has to sum these values in order to obtain ⟨p(t)⟩ and ⟨p(t)²⟩.
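A single-core C sketch of equation (5) (our own reference version, not the paper's GPU kernel) makes the required averages explicit; applied to anti-correlated price changes it reproduces a negative value at lag one:

```c
#include <assert.h>
#include <math.h>

/* Stationary autocorrelation, equation (5):
 * rho(dt) = (<x(t) x(t+dt)> - <x>^2) / (<x^2> - <x>^2),
 * where x is a series of price changes. */
double autocorr(const double *x, int n, int dt)
{
    double mean = 0.0, sq = 0.0, cross = 0.0;
    for (int t = 0; t < n; ++t) {
        mean += x[t];
        sq   += x[t] * x[t];
    }
    mean /= n;
    sq   /= n;
    for (int t = 0; t + dt < n; ++t)
        cross += x[t] * x[t + dt];
    cross /= (double)(n - dt);
    return (cross - mean * mean) / (sq - mean * mean);
}
```

For synthetic increments x(t) = ε(t + 1) − ε(t) built from i.i.d. noise ε, the expected values are ρ(1) = −0.5 and ρ(Δt > 1) = 0, mimicking the negative tick-by-tick autocorrelation described above.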

The processing times for determining the autocorrelation function for Δt_max = 256 on the CPU and the 8800 GT can be found in figure 8. Here, we find that allocation and memory transfer dominate the total processing time on the GPU for small values of λ and thus, only a fraction of the maximum acceleration factor of 33, which is shown in the inset, can be reached. Using the consumer graphics card GTX 280, we obtain a maximum speed-up factor of roughly 55 for Δt_max = 256 and 68 for Δt_max = 512, as shown in figure 9. In figure 10, the autocorrelation

function of the FGBL time series is shown. At time lag one, the time series exhibits a large negative autocorrelation, ρ(Δt = 1) ≈ −0.43. In order to quantify deviations between GPU and CPU calculations, the relative error is presented in the inset of figure 10. Note that small absolute errors can cause relative errors of up to three per cent because the values ρ(Δt > 1) are close to zero.

For some applications, it is interesting to study larger maximum time lags of the autocorrelation function. Based on our GPU implementation, one has to modify the program code in the following way. So far, each thread was responsible for a specific time lag Δt.

    New Journal of Physics 11 (2009) 093024 (http://www.njp.org/)


Figure 8. Processing times for the calculation of the equilibrium autocorrelation function ρ(Δt) on the GPU and CPU for Δt_max = 256. The graphics card 8800 GT is used as the GPU device. The total processing time on the GPU is broken down into allocation time, time for memory transfer, time for main processing, time for post-processing and time for final processing. The acceleration factor is shown as an inset. A maximum acceleration factor of roughly 33 can be obtained.


Figure 9. Processing times for the calculation of the equilibrium autocorrelation function ρ(Δt) on the GPU and CPU for Δt_max = 512. The GTX 280 is used as the GPU device. A maximum acceleration factor of roughly 68 can be obtained.

In a modified ansatz, each thread is responsible for more than one time lag in order to realize a maximum time lag which is a multiple of the maximum of 512 threads per block. This way, one obtains a maximum speed-up factor of, e.g., roughly 84 for Δt_max = 1024 using the GTX 280.
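The modified thread-to-lag assignment can be illustrated by a small sketch. This is a Python emulation of the block-strided indexing, with hypothetical names; the actual kernel derives the lags from the CUDA thread and block indices directly:

```python
# Illustrative sketch (not the authors' kernel): with a fixed block of
# `threads_per_block` threads, thread `tid` processes the time lags
# tid, tid + threads_per_block, tid + 2 * threads_per_block, ...
# so that the maximum time lag may exceed the 512-thread-per-block limit.

def lags_for_thread(tid, threads_per_block, dt_max):
    """Time lags handled by one thread in the modified ansatz."""
    return list(range(tid, dt_max, threads_per_block))

# Every lag in 0..dt_max-1 is covered exactly once by the 512 threads:
cover = sorted(l for tid in range(512) for l in lags_for_thread(tid, 512, 1024))
```

Each thread thus handles dt_max / threads_per_block lags, and no two threads touch the same lag.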


Figure 10. Equilibrium autocorrelation function ρ(Δt) in dependence on the time lag Δt calculated on the CPU and GPU. One can clearly see the well-known negative autocorrelation of the FGBL time series at time lag one. In order to quantify deviations from calculations on a CPU, the relative error is presented for each time lag Δt in the inset. The relative error is always smaller than 3 × 10⁻².

    5. Fluctuation pattern conformity

As a third method of time series analysis, the recently introduced fluctuation pattern conformity (PC) determination [50] was migrated to a GPU architecture. The PC quantifies pattern-based complex short-time correlations of a time series. In the context of financial market time series, the existence of complex correlations implies that reactions of market participants to a given time series pattern are related to comparable patterns in the past. On medium and long timescales, one can state that no significant complex correlations can be measured because the price process exhibits random walk statistics. However, if one investigates the trading process on a tick-by-tick basis, evidence is given for recurring events. In the course of these considerations, a general pattern conformity observable is defined in [50], which is not limited to the application to financial market time series. In general, the aim is to compare a current pattern of time interval length δt with all possible previous patterns of the time series p(t). The current observation time shall be denoted by t. Then, the current pattern's time interval, measured in time ticks, is given by [t − δt; t). The evolution after this current pattern interval (the distance to t is expressed by Δt⁺, see below) is compared with the prediction derived from all historical patterns. However, as the standard deviation of the price process is not constant in time, all comparison patterns have to be normalized with respect to the current pattern. For this purpose, the true range is used, i.e. the difference between high and low within each interval. Let p_h(t, δt) be the maximum value of a pattern of length δt at time t and, analogously, let p_l(t, δt) be the minimum value. Thus, we can create a modified time series, which is true range adapted in the appropriate time interval, through

\tilde{p}_t^{\delta t}(\tilde{t}) = \frac{p(\tilde{t}) - p_l(t, \delta t)}{p_h(t, \delta t) - p_l(t, \delta t)} \qquad (6)

with p̃_t^δt(t̃) ∈ [0; 1] ∀ t̃ ∈ [t − δt; t), as illustrated in figure 11.
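Equation (6) amounts to a min-max rescaling of the pattern window by its true range. A minimal sketch (plain NumPy, function name ours):

```python
import numpy as np

def true_range_adapt(p, t, dt):
    """Normalize the pattern p(t-dt), ..., p(t-1) to the unit interval
    via the true range (high minus low) of that window, cf. (6)."""
    window = np.asarray(p[t - dt:t], dtype=np.float64)
    p_l, p_h = window.min(), window.max()
    return (window - p_l) / (p_h - p_l)
```

The rescaled window always contains the values 0 and 1, which is what makes the fit quality in (7) below bounded.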


Figure 11. Schematic visualization of the pattern conformity estimation mechanism. The current pattern p̃_t^δt(t̃) and the comparison pattern p̃_t̄^δt(t̃) have the maximum value 1 and the minimum value 0 in [t − δt; t), as shown by the filled rectangle. For the pattern conformity calculation, we need to analyze for each time difference Δt⁺ whether the current pattern value and the comparison pattern value after Δt⁺ further time ticks are above or below the last value of the current pattern p̃_t^δt(t − 1). If both are above or below this last value, then +1 is added to the non-normalized pattern conformity ω̃(Δt⁺, δt). If one is above and the other below, then −1 is added.

In order to assess the match of a pattern with a comparison pattern, the fit quality Q_t^δt(t̄) between the current pattern sequence p̃_t^δt(t̃) and a comparison pattern sequence p̃_t̄^δt(t̃) for t̃ ∈ [t − δt; t) has to be determined by the summation of the squared variations through

Q_t^{\delta t}(\bar{t}) = \sum_{\tilde{t}=1}^{\delta t} \frac{\left[\tilde{p}_t^{\delta t}(t - \tilde{t}) - \tilde{p}_{\bar{t}}^{\delta t}(\bar{t} - \tilde{t})\right]^2}{\delta t}. \qquad (7)

Note that Q_t^δt(t̄) takes values in the interval [0, 1] as a result of the true range adaption. With these elements, one can define a pre-stage of the PC, which is not yet normalized, as motivated in figure 11, by

\tilde{\omega}(\Delta t^{+}, \delta t) = \sum_{t=\delta t}^{T-\Delta t^{+}} \sum_{\bar{t}=\varsigma}^{t-1} \operatorname{sgn}\left(\Xi_t^{\bar{t}}(\delta t, \Delta t^{+})\right) \exp\left(-\xi\, Q_t^{\delta t}(\bar{t})\right) \qquad (8)

with ς = t − κ if t − κ − δt ≥ 0 and ς = δt else. In general, we limit the evaluation for each pattern to maximally κ historical patterns. Furthermore, for the sign function, we use the standard definition sgn(x) = 1 for x > 0, sgn(x) = 0 for x = 0 and sgn(x) = −1 for x < 0. In (8), the parameter ξ weighs pattern terms according to their qualities Q_t^δt(t̄): the larger ξ, the stricter the pattern weighting, in order to use only pattern sequences with good agreement with the current pattern sequence. The expression Ξ_t^t̄(δt, Δt⁺) in (8), which takes into account the value



Figure 12. Processing times for the calculation of the pattern conformity on the GPU and CPU for δt_max = Δt⁺_max = 20. The GTX 280 is used as the GPU device. The total processing time on the GPU is broken down into allocation time, time for memory transfer and time for main processing. The acceleration factor is shown as an inset. A maximum acceleration factor of roughly 19 can be obtained.

of current and comparison pattern sequences after δt for a proposed Δt⁺ relative to p̃_t^δt(t − 1), is given by

\Xi_t^{\bar{t}}(\delta t, \Delta t^{+}) = \left[\tilde{p}_t^{\delta t}(t - 1 + \Delta t^{+}) - \tilde{p}_t^{\delta t}(t - 1)\right] \left[\tilde{p}_{\bar{t}}^{\delta t}(\bar{t} - 1 + \Delta t^{+}) - \tilde{p}_{\bar{t}}^{\delta t}(\bar{t} - 1)\right]. \qquad (9)

By normalizing (8) through its altered version, in which the sign function is replaced by its absolute value, the pattern conformity can be written as

\omega(\Delta t^{+}, \delta t) = \frac{\tilde{\omega}(\Delta t^{+}, \delta t)}{\sum_{t=\delta t}^{T-\Delta t^{+}} \sum_{\bar{t}=\varsigma}^{t-1} \left|\operatorname{sgn}\left(\Xi_t^{\bar{t}}(\delta t, \Delta t^{+})\right)\right| \exp\left(-\xi\, Q_t^{\delta t}(\bar{t})\right)}. \qquad (10)

We repeat that the pattern conformity is the most accurate measure to characterize the short-term correlations of a general time series. It is essentially given by the comparison of subsequences of the time series. Subsequences of various lengths are compared with historical sequences in order to extract similar reactions to similar patterns.
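A direct, unoptimized CPU sketch of the quantities defined in (6)-(10) may clarify the structure of the computation. This is a plain Python illustration with our own names, not the GPU implementation; the handling of a degenerate flat window, which (6) leaves undefined, is an assumption (such windows are simply skipped here):

```python
import numpy as np

def _range(p, t, dt):
    """Low value and true range (high minus low) of the window [t - dt, t), cf. (6)."""
    w = p[t - dt:t]
    lo = w.min()
    return lo, w.max() - lo

def pattern_conformity(p, dt, dt_plus, xi=100.0, kappa=None):
    """CPU sketch of omega(dt+, dt) following (6)-(10).

    For each current pattern ending at t and each comparison pattern
    ending at tb < t, the sign of the product of post-pattern moves,
    eq. (9), is accumulated with weight exp(-xi * Q), eqs. (7) and (8);
    eq. (10) normalizes by the same sum with |sgn(.)|. The optional
    kappa limits the number of historical comparison patterns.
    """
    p = np.asarray(p, dtype=np.float64)
    T = len(p)
    num = den = 0.0
    for t in range(dt, T - dt_plus):
        lo, rng = _range(p, t, dt)
        if rng == 0.0:
            continue  # flat window: true-range adaption undefined, skip
        cur = (p[t - dt:t] - lo) / rng
        start = dt if kappa is None else max(dt, t - kappa)
        for tb in range(start, t):
            lob, rngb = _range(p, tb, dt)
            if rngb == 0.0:
                continue
            cmp_ = (p[tb - dt:tb] - lob) / rngb
            q = np.mean((cur - cmp_) ** 2)                      # fit quality (7)
            weight = np.exp(-xi * q)
            move_cur = (p[t - 1 + dt_plus] - lo) / rng - cur[-1]
            move_cmp = (p[tb - 1 + dt_plus] - lob) / rngb - cmp_[-1]
            s = np.sign(move_cur * move_cmp)                    # sgn of (9)
            num += s * weight                                   # (8)
            den += abs(s) * weight                              # denominator of (10)
    return num / den if den > 0 else 0.0
```

A strictly monotone series, for which all post-pattern moves point the same way, yields the perfectly correlated limit of 1; uncorrelated data scatter around 0.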

In order to realize a GPU implementation of the pattern conformity provided in (10), one has to allocate memory as for the Hurst exponent and the autocorrelation function determination in sections 3 and 4, respectively. The allocation is needed for the array containing the time series, which has to be transferred to the global memory of the GPU, and for further processing arrays. The main processing GPU function is invoked with a proposed Δt⁺ and a given δt. In the kernel function, shared memory arrays for comparison and current pattern sequences are allocated and loaded from the global memory of the GPU. In the main calculation, each thread handles one specific comparison pattern, i.e. each thread is responsible for one


Figure 13. (a) Pattern conformity ω_{ξ=100}^GPU(δt, Δt⁺) of the FGBL time series with κ = 16 384, calculated on the consumer graphics card GTX 280. (b) Relative error (in per cent) between the calculations on the GPU and CPU (ω_{ξ=100}^CPU(δt, Δt⁺), with the same parameter settings). The processing time on the GPU was 5.8 h; the results on the CPU were obtained after 137.2 h, which corresponds to roughly 5.7 days. Thus, for these parameters an acceleration factor of roughly 24 is obtained.

value of t̄. Thus, κ is given by the product of the scan interval parameter and the number of threads per block, and the scan interval parameter corresponds to the number of blocks. The partial results of ω̃(Δt⁺, δt) are stored in a global memory based array for each value of Δt⁺. These partial results have to be reduced in a further processing step, which uses the same binary tree structure as applied in section 3 for the Hurst exponent determination.

The pattern conformity for a random walk time series, which exhibits no correlations by construction, is 0. The pattern conformity for a perfectly correlated time series is 1 [50]. A maximum speed-up factor of roughly 10 can be obtained for the calculation of the pattern conformity for δt_max = Δt⁺_max = 20, T = 25 000, ξ = 100 and 256 threads per block using the 8800 GT. In figure 12, corresponding results obtained with the GTX 280 are shown in dependence on the scan interval parameter. Here, a maximum acceleration factor of roughly 19 can be realized.

With this method, which is able to detect complex correlations of a time series, it is also possible to search for pattern conformity based complex correlations in financial market data, as shown in figure 13 for the FGBL time series. In figure 13(a), the results for the pattern conformity ω_{ξ=100}^GPU(δt, Δt⁺) are presented with κ = 16 384, calculated on the GTX 280. One can clearly see that for small values of δt and Δt⁺, large values of ω_{ξ=100}^GPU are obtained, with a maximum value of roughly 0.8. For the results shown in figure 13(b), the calculation of the pattern conformity is also executed on the CPU, and the relative absolute error

\epsilon = 10^{2}\, \frac{\left|\omega_{\xi=100}^{\mathrm{GPU}} - \omega_{\xi=100}^{\mathrm{CPU}}\right|}{\omega_{\xi=100}^{\mathrm{CPU}}} \qquad (11)


Figure 14. FGBL pattern conformity corrected by the ACRW with q = 0.044 and with 1.5 × 10⁶ time steps. Thus, Δω = ω_{ξ=100}^FGBL − ω_{ξ=100}^ACRW is shown (see the text).

is shown, which is smaller than two-tenths of a per cent. This small error arises because the GPU device merely sums up a large number of the weighted values +1 and −1. Thus, the limitation to single precision has no significant negative effect on the result.

This raw pattern conformity profile is dominated by trivial pattern correlation parts caused by the jumps of the price process between best bid and best ask price: the best bid price is given by the highest limit order price of all buy orders in an order book and, analogously, the best ask price is given by the lowest limit order price of all sell orders in an order book. As performed in [50], there are possibilities for reducing these trivial pattern conformity parts. For example, it is possible to add such jumps around the spread synthetically to a random walk. Let p be the time series of the synthetically created anti-correlated random walk (ACRW), generated in a Monte Carlo simulation through p(t) = a(t) + b(t), which was used in sections 3–5 as synthetic time series. With probability q ∈ [0; 0.5], the increment a(t + 1) − a(t) = +1 will be applied and with probability q a decrement a(t + 1) − a(t) = −1 will occur. With probability 1 − 2q, the expression a(t + 1) = a(t) is used. The stochastic variable b(t) models the bid-ask spread and can take the value 0 or 1 in each time step, each with probability 0.5. Thus, by changing q, the characteristic timescale of process a in comparison to process b can be modified.
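The ACRW construction can be sketched as follows. This is an illustrative Python version; the parameter name q for the step probability and the function name are ours:

```python
import random

def acrw(T, q, seed=None):
    """Synthetic anti-correlated random walk p(t) = a(t) + b(t).

    a(t) makes a +1 step with probability q, a -1 step with probability q
    and stays unchanged with probability 1 - 2q; b(t) in {0, 1} models the
    bid-ask spread, each value taken with probability 0.5 per time step.
    """
    rng = random.Random(seed)
    p, a = [], 0
    for _ in range(T):
        u = rng.random()
        if u < q:
            a += 1
        elif u < 2 * q:
            a -= 1
        b = rng.randrange(2)   # spread jump: 0 or 1
        p.append(a + b)
    return p
```

For small q, the increments of p are dominated by the bid-ask term b(t), whose differences are anti-correlated at lag one, which is exactly the trivial effect to be subtracted.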

Parts of the pattern-based correlations in figure 13 stem from this trivial negative autocorrelation for Δt = 1. In order to try to correct for this, in figure 14 (an animated visualization can be found in the multimedia enhancements of this publication), the pattern conformity of the ACRW with q = 0.044, which reproduces the anti-correlation of the FGBL time series at time lag Δt = 1, is subtracted from the data of figure 13(a). Obviously, the autocorrelation at time lag Δt = 1, which is understood from the order book structure, is not the sole reason for the pattern formation conformity shown in figure 13(a). Thus, evidence is obtained that financial market time series show pattern correlations on very short timescales beyond the simple anti-persistence which is due to the gap between bid and ask prices.


    6. Conclusion and outlook

In this paper, we applied the compute unified device architecture, a programming approach for issuing and managing computations on a GPU as a data-parallel computing device, to methods of fluctuation analysis. Firstly, the Hurst exponent calculation performed on a GPU was presented. These results for the scaling behavior of a stochastic process can be obtained up to 80 times faster than on a current CPU core, and the relative absolute error of the results obtained from the CPU and GPU is smaller than 10⁻³. The calculation of the equilibrium autocorrelation function was also migrated successfully to a GPU device and applied to a financial market time series. In this case, acceleration factors of up to roughly 84 were realized. In a further part, the pattern formation conformity algorithm, which quantifies pattern-based complex short-time correlations of a time series, was determined on a GPU. For this application, the GPU was up to 24 times faster than the CPU, and the values provided by the GPU and CPU differ only by a relative error of maximally two-tenths of a per cent. Furthermore, we could verify that the current GPU generation is roughly two times faster than the previous one. The presented methods were applied to an FGBL time series of the Eurex, which exhibits an anti-persistent regime on short timescales. Evidence was found that a super-diffusive regime is reached on medium timescales. On long timescales, the FGBL time series complies with random walk statistics. Furthermore, the anti-correlation at time lag one, an empirical stylized fact of financial market time series, was verified. The pattern conformity which is used is the most accurate measure to characterize the short-term correlations of a general time series. It is essentially given by the comparison of subsequences of the time series. Subsequences of various lengths are compared with historical sequences in order to extract similar reactions to similar patterns. The pattern conformity of the FGBL contract exhibits large values of up to 0.8. However, these values also include the trivial autocorrelation property at time lag one, which can be removed by means of the pattern conformity of a synthetic anti-correlated random walk. Significant pattern-based correlations are still exhibited after this correction. Thus, evidence is obtained that financial market time series show pattern correlations on very short timescales beyond the simple anti-persistence which is due to the gap between bid and ask prices. Further applications of the GPU-accelerated techniques in the context of Monte Carlo simulations and agent-based modeling of financial markets are certainly well worth pursuing. As already mentioned in the introduction, the main advantage of general-purpose computations on GPUs is that one does not need special-purpose computers. Although GPU computing opens up a large variety of possibilities, the recent development of using graphics cards for scientific computing will perhaps also revive special-purpose computing, as GPU implementations are not appropriate for every problem.

    Acknowledgments

This work was financially supported by the German Research Foundation (DFG) and benefited from the Forschungsfonds of the Materialwissenschaftliches Forschungszentrum (MWFZ) of the Johannes Gutenberg University of Mainz. The present work is based on the private opinion of the authors and does not necessarily reflect the views of Artemis Capital Asset Management GmbH.


    Appendix. GPU source code

Source code of the GPU kernel functions for determining the time lag-dependent Hurst exponent H(Δt). These functions are executed on the GPU device.
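The original kernel listing is not reproduced in this text extraction. As an illustrative CPU reference only (assuming the mean absolute displacement scaling ⟨|p(t + Δt) − p(t)|⟩ ∝ Δt^H(Δt) as the definition of the lag-dependent Hurst exponent; the names are ours, and this is not the authors' GPU code):

```python
import numpy as np

def hurst(p, dts):
    """Time lag-dependent Hurst exponent H(dt) from the scaling of the
    mean absolute displacement <|p(t+dt) - p(t)|> ~ dt^H.

    Returns the local log-log slope between successive lags in `dts`,
    i.e. one H estimate per pair of adjacent lags.
    """
    p = np.asarray(p, dtype=np.float64)
    m = np.array([np.mean(np.abs(p[dt:] - p[:-dt])) for dt in dts])
    logdt, logm = np.log(dts), np.log(m)
    return (logm[1:] - logm[:-1]) / (logdt[1:] - logdt[:-1])
```

A ballistic (linear) trend yields H = 1, an uncorrelated random walk H ≈ 0.5, and anti-persistent data H < 0.5, matching the regimes discussed for the FGBL series.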

    References

    [1] Black F and Scholes M 1973 J. Polit. Econ. 81 637

    [2] Cont R and Bouchaud J P 2000 Macroecon. Dyn. 4 170

    [3] Gopikrishnan P, Plerou V, Amaral L A N, Meyer M and Stanley H E 1999 Phys. Rev. E 60 5305

[4] Mantegna R N and Stanley H E 2000 An Introduction to Econophysics: Correlations and Complexity in Finance (Cambridge: Cambridge University Press)

[5] Bouchaud J P and Potters M 2000 Theory of Financial Risks: From Statistical Physics to Risk Management (Cambridge: Cambridge University Press)


    [6] Paul W and Baschnagel J 2000 Stochastic Processes: From Physics to Finance (Heidelberg: Springer)

    [7] Mandelbrot B 1963 J. Bus. 36 394

[8] Mandelbrot B 1964 J. Bus. 37 393

[9] Kiyono K, Struzik Z R and Yamamoto Y 2006 Phys. Rev. Lett. 96 068701

    [10] Bouchaud J P, Matacz A and Potters M 2001 Phys. Rev. Lett. 87 228701

    [11] Krawiecki A, Holyst J A and Helbing D 2002 Phys. Rev. Lett. 89 158701

    [12] Daniels M G, Farmer J D, Gillemot L, Iori G and Smith E 2003 Phys. Rev. Lett. 90 108102

    [13] Smith E, Farmer J D, Gillemot L and Krishnamurthy S 2003 Quant. Finance 3 481

    [14] Preis T, Golke S, Paul W and Schneider J J 2006 Europhys. Lett. 75 510

    [15] Preis T, Golke S, Paul W and Schneider J J 2007 Phys. Rev. E 76 016108

    [16] Bak P, Paczuski M and Shubik M 1997 Physica A 246 430

    [17] Challet D and Stinchcombe R 2003 Quant. Finance 3 155

    [18] Maslov S 2000 Physica A 278 571

    [19] Maslov S and Mills M 2001 Physica A 299 234

[20] Stauffer D and Penna T J P 1998 Physica A 256 284

[21] Stauffer D and Penna T J P 1999 Physica A 271 496

    [22] Lux T and Marchesi M 1999 Nature 397 498

    [23] Laloux L, Cizeau P, Bouchaud J P and Potters M 1999 Phys. Rev. Lett. 83 1467

    [24] Plerou V, Gopikrishnan P, Rosenow B, Amaral L A N and Stanley H E 1999 Phys. Rev. Lett. 83 1471

[25] O'Hara M 1997 Market Microstructure Theory (Malden, MA: Blackwell)

    [26] Biais B, Hillion P and Spatt C 1995 J. Finance 50 1655

    [27] Brunnermeier M and Pedersen L 2005 J. Finance 60 1825

    [28] Bertsimas D and Lo A 1998 J. Financ. Markets 1 1

    [29] Bank P and Baum D 2004 Math. Finance 14 1

[30] Landau D and Binder K 2005 A Guide to Monte Carlo Simulations in Statistical Physics (Cambridge: Cambridge University Press)

[31] van Meel J A, Arnold A, Frenkel D, Zwart S P and Belleman R G 2008 Mol. Simul. 34 259

[32] Köstler H, Schmid R, Rüde U and Scheit C 2008 Comput. Vis. Sci. 11 115

    [33] Schneider J J and Kirkpatrick S 2006 Stochastic Optimization (Berlin: Springer)

    [34] Dagum L and Menon R 1998 IEEE Comput. Sci. Eng. 5 46

[35] Gabriel E et al 2004 Open MPI: goals, concept and design of a next generation MPI implementation Proc. 11th European PVM/MPI Users' Group Meeting (Budapest, Hungary) pp 97–104

    [36] Tomov S, McGuigan M, Bennett R, Smith G and Spiletic J 2005 Comput. Graph. 29 71

    [37] Stone J E, Phillips J C, Freddolino P L, Hardy D J, Trabuco L G and Schulten K 2007 J. Comput. Chem.

    28 2618

    [38] Susukita R et al 2003 Comput. Phys. Commun. 155 115

[39] Zwart S F P, Belleman R G and Geldof P M 2007 New Astron. 12 641

[40] Belleman R G, Bédorf J and Zwart S F P 2007 New Astron. 13 103

    [41] Li W, Wei X and Kaufman A 2003 Vis. Comput. 19 444

    [42] Anderson J A, Lorenz C D and Travesset A 2008 J. Comput. Phys. 227 5342

[43] Yang J, Wang Y and Chen Y 2007 J. Comput. Phys. 221 799

    [44] Preis T, Virnau P, Paul W and Schneider J J 2009 J. Comput. Phys. 228 4468

    [45] Rost R J 2006 OpenGL Shading Language 2nd edn (Reading, MA: Addison-Wesley)

    [46] Fernando R and Kilgard M J 2003 The Cg Tutorial: The Definitive Guide to Programmable Real-time

    Graphics (Reading, MA: Addison-Wesley)

[47] NVIDIA CUDA Compute Unified Device Architecture, Programming Guide 2008 version 2.0 NVIDIA Corporation (http://www.nvidia.com)

[48] ATI CTM Guide, Technical Reference Manual 2006 version 1.01 Advanced Micro Devices, Inc. (http://www.amd.com)


    [49] NVIDIA GeForce GTX 280 Specifications 2008 NVIDIA Corporation (http://www.nvidia.com)

    [50] Preis T, Paul W and Schneider J J 2008 Europhys. Lett. 82 68005

[51] Mandelbrot B and Hudson R 2004 The (mis)Behavior of Markets: A Fractal View of Risk, Ruin and Reward (New York: Basic Books)

[52] Hurst H E 1951 Trans. Am. Soc. Civ. Eng. 116 770

    [53] Darbellay G A and Wuertz D 2000 Physica A 287 429

    [54] Ausloos M 2000 Physica A 285 48

    [55] Carbone A, Castelli G and Stanley H E 2004 Physica A 344 267

    [56] Gu G F and Zhou W X 2009 Eur. Phys. J. B 67 585
