a 64-bit horse that can count

A 64-bit horse that can count

Author: Andrey Karpov

Date: 11.05.2009

Abstract The article concerns the peculiarities of Visual C++ compiler's behavior when generating 64-bit code and

possible errors relating to it.

Introduction The phenomenon of "The Clever Hans", Mr. von Osten's horse, was described in 1911 [1]. The Clever

Hans was famous because of his ability to read and solve mathematical problems by tapping with his

front hoof. Of course, there were a lot of skeptics. That's why a team of experts tested Hans' abilities

and proved that the horse was showing them without any help of Mr. von Osten. But how could a

common horse possess such an intellectual level - a human one?! The psychologist O. Pfungst carried

out some very thorough experiments and discovered that Hans received very faint unintentional hints

from those who were asking him questions. For example, when people asked Hans about anything they

started to stare at his front hoof with the help of which the horse "answered". But as soon as Hans had

tapped the right number, they raised their eyes or head just a little waiting for him to finish his answer.

And the horse, that had been trained to note and use these very subtle motions considered them as

signals to stop his action. From aside it looked as if the horse had given the right answer to the question.

Such a wonderful horse it was that counted and solved arithmetic problems although he was unable to

do it. 64-bit programs turned out to be such digital horses of the beginning of the 21st

century, many of

which cannot count either although are successful in pretending to do so. Let's consider this

phenomenon in detail.

1. Potential errors I am the author and co-author of some articles devoted to the problems of developing 64-bit

applications. You can see the articles on our site: http://www.viva64.com/articles/64-bit-development/.

In these articles, I try to use the term "a potential error" or "a hidden error" rather than just "an error"

[2, 3, 4].

This is explained by that one and the same code can be viewed upon as both correct and incorrect

depending on its purpose. A simple example - using a variable of int type for indexing an array's items. If

we address an array of graphics windows with the help of this variable, everything is alright. We never

need, and moreover it is impossible, to operate billions of windows. But when we use a variable of int

type for indexing an array's items in 64-bit mathematical programs or data bases, it can well be a

problem when the number of items excesses 0..INT_MAX range.

But there is one more much subtler reason to call errors "potential". The point is that it depends not

only on the input data but on the mood of the compiler's optimizer if an error occurs or not. I have been

avoiding this topic for a long time for most of such errors occur explicitly in the debug-version and only

in release-versions they are "potential". But not every program built as debug can be debugged at large

data sizes. There is a situation when the debug-version is tested only at very small sizes of data. And

overload testing and testing by end users at actual data is performed only in release-versions where

errors can be temporarily hidden. That's why I decided to tell you what I know about it. I hope that I will

manage to persuade you that it is dangerous to rely only on the checks of the execution stage (unit-

tests, dynamic analysis, manual testing) when porting a program on a different platform. You will say

that all this is meant for promoting Viva64 tool. Yes, you are right, but still read the horror stories I'm

going to tell you. I am fond of telling them.

2. How it all begun

- Why do you have two identical JMPs in a row in your code?

- What if the first one wouldn't work?

I faced the peculiarities of Visual C++ 2005 compiler's optimization for the first time when developing

PortSample program. This is a project included into Viva64 distribution kit and is intended for

demonstrating all the errors which Viva64 analyzer diagnoses. The examples included into this project

must work correctly in 32-bit mode and cause errors in 64-bit one. Everything was alright in the debug-

version but I faced difficulties in the release-version. The code which was to lead to a hang or crash in

64-bit mode worked successfully! The cause lay in optimization. The solution consisted in additional

redundant complication of the examples' code and adding "volatile" key words which you can see in

PortSample project in a great number.

The same relates to Visual C++ 2008. The code differs a bit but everything written in this article can be

applied both to Visual C++ 2005 and Visual C++ 2008. We won't make any difference between them

further.

If you think that it is good that some errors don't occur, refuse this thought. Code with such errors

becomes very unstable and a smallest change of it not relating directly to an error can cause change of

the code's behavior. To make sure, I would like to point out that this is not the fault of the compiler but

of the hidden defects of the code. Further, we will show sample phantom errors which disappear and

occur in release-versions when smallest alterations of the code are introduced and which you have to

hunt for a long time.

3. Phantoms

The section will be long and boring, so I will begin with a funny story which is an abstract of the section:

Once Heracles was walking by a lake and there he saw Hydra. He ran up to her and cut her single head

off. But instead of one head two more grew. Heracles cut them off too but 4 more appeared. He cut the 4

heads off - and there were 8 ones... So passed one hour, two hours, three hours... And then Heracles cut

Hydra's 32768 heads off and Hydra died for she was 16-bit.

Like in this funny story errors lie in types' overflow which can occur or fail to occur depending on the

code the compiler will generate when optimization is enabled. Let's consider the first example of the

code which works in release mode although it shouldn't be so:

int index = 0;

size_t arraySize = ...;

for (size_t i = 0; i != arraySize; i++)

array[index++] = BYTE(i);

This code fills the whole array with values correctly even if the array's size is much larger than INT_MAX.

Theoretically it is impossible because index variable has int type. Some time later, because of the

overflow access to items by a negative index must occur. But optimization leads to generating the

following code:

0000000140001040 mov byte ptr [rcx+rax],cl

0000000140001043 add rcx,1

0000000140001047 cmp rcx,rbx

000000014000104A jne wmain+40h (140001040h)

As you can see, 64-bit registers are used and there is no overflow. But let's alter the code a bit:

int index = 0;

for (size_t i = 0; i != arraySize; i++)

{

array[index] = BYTE(index);

++index;

}

Let's consider that the code look more beautiful this way. I think you will agree that functionally it

remains the same. But the result will be quite different - a program crash will occur. Let's examine the

code generated by the compiler:

0000000140001040 movsxd rcx,r8d

0000000140001043 mov byte ptr [rcx+rbx],r8b

0000000140001047 add r8d,1

000000014000104B sub rax,1

000000014000104F jne wmain+40h (140001040h)

That very overflow occurs that must occur in the previous example as well. r8d = 0x80000000 register's

value extends into rcx as 0xffffffff80000000. The consequence is writing outside the limits of the array.

Let's consider another example of optimization and see how easy it is to spoil everything:

unsigned index = 0;

for (size_t i = 0; i != arraySize; ++i) {

array[index++] = 1;

if (array[i] != 1) {

printf("Error\n");

break;

}

}

Assembler code:

0000000140001040 mov byte ptr [rdx],1

0000000140001043 add rdx,1

0000000140001047 cmp byte ptr [rcx+rax],1

000000014000104B jne wmain+58h (140001058h)

000000014000104D add rcx,1

0000000140001051 cmp rcx,rdi

0000000140001054 jne wmain+40h (140001040h)

The compiler decided to use 64-bit register rdx for storing index variable. As a result the code may

correctly process arrays with the size more than UINT_MAX.

But the world is fragile. It is enough just to complicate the code a bit and it becomes incorrect:

volatile unsigned volatileVar = 1;

...

unsigned index = 0;

for (size_t i = 0; i != arraySize; ++i) {

array[index] = 1;

index += volatileVar;

if (array[i] != 1) {

printf("Error\n");

break;

}

}

Using "index += volatileVar;" expression instead of index++ leads to participation of 32-bit registers in

the code and therefore occurrence of overflows:

0000000140001040 mov ecx,r8d

0000000140001043 add r8d,dword ptr [volatileVar (140003020h)]

000000014000104A mov byte ptr [rcx+rax],1

000000014000104E cmp byte ptr [rdx+rax],1

0000000140001052 jne wmain+5Fh (14000105Fh)

0000000140001054 add rdx,1

0000000140001058 cmp rdx,rdi

000000014000105B jne wmain+40h (140001040h)

In conclusion I will give an interesting but large example. Unfortunately, I didn't manage to abridge it

because it was necessary to show the behavior. It is this why such errors are dangerous for you cannot

foresee the consequence of a smallest alteration of the code.

ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {

int result = x + y * width;

return result;

}

...

int domainWidth = 50000;

int domainHeght = 50000;

for (int x = 0; x != domainWidth; ++x)

for (int y = 0; y != domainHeght; ++y)

array[UnsafeCalcIndex(x, y, domainWidth)] = 1;

This code cannot fill correctly the array consisting of 50000*50000 items. It is impossible because when

calculating "int result = x + y * width;" an overflow must occur.

Miraculously the array is filled correctly in the release-version. UnsafeCalcIndex function integrates

inside the loop and 64-bit registers are used:

0000000140001052 test rsi,rsi

0000000140001055 je wmain+6Ch (14000106Ch)

0000000140001057 lea rcx,[r9+rax]

000000014000105B mov rdx,rsi

000000014000105E xchg ax,ax

0000000140001060 mov byte ptr [rcx],1

0000000140001063 add rcx,rbx

0000000140001066 sub rdx,1

000000014000106A jne wmain+60h (140001060h)

000000014000106C add r9,1

0000000140001070 cmp r9,rbx

0000000140001073 jne wmain+52h (140001052h)

All this takes place because UnsafeCalcIndex function is simple and can be integrate easily. But once you

make it a bit more complicated or the compiler considers that it shouldn't be integrated, an error occurs

at large data sizes.

Let's modify (complicate) UnsafeCalcIndex function a bit. Pay attention that the function's logic has not

been changed at all:

ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {

int result = 0;

if (width != 0)

result = y * width;

return result + x;

}

The result is a program crash when the array's limits are exceeded:

0000000140001050 test esi,esi

0000000140001052 je wmain+7Ah (14000107Ah)

0000000140001054 mov r8d,ecx

0000000140001057 mov r9d,esi

000000014000105A xchg ax,ax

000000014000105D xchg ax,ax

0000000140001060 mov eax,ecx

0000000140001062 test ebx,ebx

0000000140001064 cmovne eax,r8d

0000000140001068 add r8d,ebx

000000014000106B cdqe

000000014000106D add rax,rdx

0000000140001070 sub r9,1

0000000140001074 mov byte ptr [rax+rdi],1

0000000140001078 jne wmain+60h (140001060h)

000000014000107A add rdx,1

000000014000107E cmp rdx,r12

0000000140001081 jne wmain+50h (140001050h)

I think you have become bored by this moment. I am sorry. I just wanted to show you how simply an

efficient 64-bit program may fail after introducing most harmless alterations into it or building it by

another version of the compiler.

4. Diagnosis of potential errors A program is a sequence of processing errors.

(c) An unknown author

I suppose that many already existing 64-bit applications or those which will be soon ported on 64-bit

systems, can suddenly spring more and more unpleasant surprises. A lot of defects may be found in

them when increasing the size of input data which was unavailable for processing in 32-bit systems.

Hidden defects can suddenly occur during further modification of the program code or change of

libraries or a compiler.

Like in the story about the horse, the first impression can be deceptive. It can only seem to you that

your program processes large data sizes successfully. You need to perform a more thorough check to

see exactly if your 64-bit horse can actually count.

To make sure that a 64-bit program is correct, the minimum thing you can do is to use not only the

release-version but the debug-version as well at all stages of testing. Keep in mind that it is a necessary

but far not sufficient condition. If your tests use data sets which, for example, don't cover a large main

memory size, an error can fail to occur both in release- and debug-versions [5]. It is necessary to extend

unit-tests and data sets for overload and manual testing. It is necessary to make algorithms process new

data combinations which are available only in 64-bit systems [6].

An alternative way of diagnosing 64-bit errors lies in using static analysis tools. It is much more radical

and safe than guessing if you have added enough tests or not. It is convenient for it doesn't demand

using the debug-version for crunching gigabytes of data.

The point of the method is to perform a full analysis of a project for a single time when porting the

program and look through all the diagnostic messages on suspicious sections in the code. Many are

frightened off by the list of thousands and tens of thousands of warnings. But the total time spent at

once on analyzing them will be much less than the time spent on correcting various bug-reports

appearing literally from nowhere for many years. It will be those very phantoms described above.

Besides, when you start working with the list of warnings you will soon find out that most of them can

be filtered and there will be much less work than you have expected. Further, you will have only to use

static analysis for a new code and it doesn't take much time.

Of course, when speaking about a toolkit for searching 64-bit phantoms, I offer the tool that we develop

- Viva64. By the way, this tool will soon be included into PVS-Studio which will unite all our static

analysis tools.

To be more objective and avoid constantly being driven out from sites with this article as an advertising

one, I will mention other tools as well. We should list Gimpel PC-Lint and Parasoft C++test. Rules for

testing 64-bit errors are implemented in them too, but they possess less diagnostic abilities than a highly

tailored Viva64 [7]. There is also Abraxas CodeCheck in the new version of which (14.5) functions of

diagnosing 64-bit errors are also implemented but I don't possess more detailed information about it.

Conclusion I will be glad if this article helps you master new platforms easier, for you will know what hidden

problems can occur. Thank you for attention.

References 1. Wikipedia. Clever Hans. http://www.viva64.com/go.php?url=223.

2. Andrey Karpov. 64 bits, Wp64, Visual Studio 2008, Viva64 and all the rest...

http://www.viva64.com/art-1-2-621693540.html

3. Andrey Karpov, Evgeniy Ryzhkov. Static code analysis for verification of the 64-bit applications.


4. Andrey Karpov. Seven Steps of Migrating a Program to a 64-bit System.


5. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform.


6. Andrey Karpov, Evgeniy Ryzhkov. Traps detection during migration of C and C++ code to 64-bit

Windows. http://www.viva64.com/art-1-2-2140958669.html

7. Andrey Karpov. Comparison of analyzers' diagnostic possibilities at checking 64-bit code.


a 64-bit horse that can count

Technology