asm to c translation table

Assembly to C Language Translation and C Language Structure Recovering by Equivalency Table Definition

By: Enzo P.

Version: 5

Date: February, 2009

Summary

1. Introduction
2. Why Translate From Assembly to C?
3. C Calling Conventions
4. ASM to C - Primitive Data Types Equivalency Table
5. ASM to C - Complex Data Types and Structures Equivalency Table
6. ASM to C - Instructions Set Equivalency Table
    6.1. CPU - 8086 + FPU
    6.2. CPU - 80186 + FPU
    6.3. CPU - 80286 + FPU
    6.4. CPU - 80386 + FPU
    6.5. CPU - 80486 + FPU
    6.6. CPU - Pentium + MMX + FPU
    6.7. CPU - Pentium Pro + FPU
    6.8. CPU - AMD + 3Dnow + FPU
7. Details about translation of FPU ASM codes to C
8. CPU Manual Reference
    8.1. From Intel Manual
9. Conclusion
10. Bibliography

Introduction

Some of the information in the table is taken from the Intel manual and/or the AMD manual [and, without any doubt, such information is subject to their copyright].

When I started this work, I noticed one problem: some instructions are not explained clearly enough in the Intel manual to allow a 1:1 translation to C/C++, or they have no equivalent in the C/C++ language at all. This means that we need to interpret some instructions ourselves, which will very possibly produce inconsistencies and defective translations. At best, for instructions that have no C equivalent, we can use inline ASM, which can make the source code too cryptic. Without doubt, for some ASM instructions nothing can be done other than using inline ASM.

Here, I plan to define and use my own strategy for ASM to C translation. It uses the equivalency table to define simple code replacements, rather than trying to interpret every single instruction to build structures or anything like that.

The idea of this strategy is very simple, and it seems to be powerful. It is the opposite of what the compiler does. It first recognizes the instructions that are not complex, building basic blocks of code, or "primitives"; then, depending on the composed instructions, the basic blocks are merged to build complex blocks of code, or "structures".

So, the idea is to build a map of blocks of code. As the translation proceeds, each translated section is mapped according to the type of operation performed in it. Sections are recognized one by one, starting with those that seem to contain the basic blocks of code.

For instructions that are not very complex and can simply be replaced 1:1 by a C statement, I don't plan to do any interpretation, but simply to replace them with their equivalents. Such a section is mapped as a simple replacement.
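The simple replacement described above can be pictured as a lookup table. The following is only a minimal sketch, not the author's implementation: the names `asm_to_c_rule`, `simple_rules`, and `lookup_simple_rule` are hypothetical, the `DEST`/`SRC` placeholders are an assumption about how operands would be substituted, and status flags are ignored entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one ASM mnemonic and a C statement template. */
struct asm_to_c_rule {
    const char *mnemonic;   /* e.g. "ADD" */
    const char *c_template; /* DEST and SRC are placeholders for operands */
};

/* Instructions assumed simple enough for a 1:1 replacement (flags ignored). */
static const struct asm_to_c_rule simple_rules[] = {
    { "MOV", "DEST = SRC;" },
    { "ADD", "DEST += SRC;" },
    { "SUB", "DEST -= SRC;" },
    { "AND", "DEST &= SRC;" },
    { "OR",  "DEST |= SRC;" },
    { "XOR", "DEST ^= SRC;" },
    { "NOT", "DEST = ~DEST;" },
    { "NEG", "DEST = -DEST;" },
    { "INC", "DEST++;" },
    { "DEC", "DEST--;" },
};

/* Returns the C template for a mnemonic, or NULL when the instruction
   is not in the simple table and needs complex recognition instead. */
const char *lookup_simple_rule(const char *mnemonic)
{
    size_t i;
    for (i = 0; i < sizeof simple_rules / sizeof simple_rules[0]; i++) {
        if (strcmp(simple_rules[i].mnemonic, mnemonic) == 0)
            return simple_rules[i].c_template;
    }
    return NULL;
}
```

A `NULL` result is the signal that the instruction belongs to the composed group, such as CALL or the conditional jumps.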

For composed instructions, that is, instructions that depend on other instructions to build a logical structure, I plan to find patterns so that recognition is as simple as possible, without much code interpretation. Where such structures depend on code that has already been translated to C, the idea is to do simple "code merging", joining the already translated code into the structure being built. Such a section is mapped as a complex recognition, with or without code merging, pattern matching, or anything like that.
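As an illustration of pattern recognition for a composed instruction, the sketch below merges a `CMP` followed by a conditional jump into the head of a C `if` statement. It is an assumption about how such a pattern could look, not a finished recognizer: `jcc_rule`, `jcc_rules`, and `merge_cmp_jcc` are hypothetical names, only signed comparisons are covered, and the operands are assumed to be already translated C expressions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: a conditional jump and the C operator that holds on
   the fall-through path, i.e. when the jump is NOT taken. */
struct jcc_rule {
    const char *jcc;
    const char *c_operator;
};

static const struct jcc_rule jcc_rules[] = {
    { "JE",  "!=" }, { "JNE", "==" },
    { "JL",  ">=" }, { "JGE", "<"  },
    { "JG",  "<=" }, { "JLE", ">"  },
};

/* Merges "CMP lhs, rhs" followed by a Jcc into "if (lhs OP rhs)".
   Returns the number of characters written, or -1 when the jump
   mnemonic is not one of the recognized patterns. */
int merge_cmp_jcc(char *out, size_t out_size,
                  const char *lhs, const char *rhs, const char *jcc)
{
    size_t i;
    for (i = 0; i < sizeof jcc_rules / sizeof jcc_rules[0]; i++) {
        if (strcmp(jcc_rules[i].jcc, jcc) == 0)
            return snprintf(out, out_size, "if (%s %s %s)",
                            lhs, jcc_rules[i].c_operator, rhs);
    }
    return -1;
}
```

For example, `CMP a, b` followed by `JGE skip` jumps over the next block when `a >= b`, so the block itself runs under `if (a < b)`.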

Why Translate From Assembly to C?

C Calling Conventions

ASM to C - Primitive Data Types Equivalency Table

Typical limits of integral types

Implicit C specifier(s) | Explicit C specifier | Bits | Bytes | ASM Type | Minimum value | Maximum value
signed char | same | 8 | 1 | Byte | −128 | +127
unsigned char | same | 8 | 1 | Byte | 0 | 255
char | one of the above | 8 | 1 | Byte | −128 or 0 | +127 or 255
short | signed short int | 16 | 2 | Word | −32,768 | +32,767
unsigned short | unsigned short int | 16 | 2 | Word | 0 | 65,535
int | signed int | 16 or 32 | 2 or 4 | Word or Double Word | −32,768 or −2,147,483,648 | +32,767 or +2,147,483,647
unsigned | unsigned int | 16 or 32 | 2 or 4 | Word or Double Word | 0 | 65,535 or 4,294,967,295
long | signed long int | 32 | 4 | Double Word | −2,147,483,648 | +2,147,483,647
unsigned long | unsigned long int | 32 | 4 | Double Word | 0 | 4,294,967,295
long long[1] | signed long long int | 64 | 8 | Quad Word | −9,223,372,036,854,775,808 | +9,223,372,036,854,775,807
unsigned long long[1] | unsigned long long int | 64 | 8 | Quad Word | 0 | 18,446,744,073,709,551,615

The size and limits of the plain int type (without the short, long, or long long modifiers) vary much more than the other integral types among C implementations. The Single UNIX Specification specifies that the int type must be at least 32 bits, but the ISO C standard only requires 16 bits. Refer to limits.h for guaranteed constraints on these data types. On most existing implementations, two of the five integral types have the same bit widths.
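Because the width of plain `int` varies, a translation may prefer the fixed-width types from `<stdint.h>` for the ASM types in the table. The sketch below is one possible mapping, not something the table prescribes: the `asm_byte`, `asm_word`, `asm_dword`, and `asm_qword` names and the `bits_of_plain_int` helper are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical aliases mapping ASM types to fixed-width C types. */
typedef uint8_t  asm_byte;   /* Byte        */
typedef uint16_t asm_word;   /* Word        */
typedef uint32_t asm_dword;  /* Double Word */
typedef uint64_t asm_qword;  /* Quad Word   */

/* Width of plain int on the implementation compiling this code;
   ISO C only guarantees that the result is at least 16. */
int bits_of_plain_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```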

Integral type literal constants may be represented in one of two ways, by an integer type number, or by a single character surrounded by single quotes. Integers may be represented in three bases: decimal (48 or -293), octal with a "0" prefix (0177), or hexadecimal with a "0x" prefix (0x3FE). A character in single quotes ('F'), called a "character constant", represents the value of that character in the execution character set (often ASCII). In C, character constants have type int (in C++, they have type char).

ASM to C - Complex Data Types and Structures Equivalency Table

ASM to C - Instructions Equivalency Table

CPU - 8086 + FPU

Instruction Set Name: General Purpose x86
Architecture: x86
CPU: 8086 [16bits]

Instruction Name | Description | Opcode | Pseudo Code | C Code | Notes

AAA - ASCII Adjust AL After Addition
Opcode: 37
Pseudo Code:
    IF ((AL AND 0FH) > 9) OR (AF = 1) THEN
        AL ← (AL + 6);
        AH ← AH + 1;
        AF ← 1;
        CF ← 1;
    ELSE
        AF ← 0;
        CF ← 0;
    FI;
    AL ← AL AND 0FH;
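The AAA pseudo code above can be followed line by line in C. This is a minimal sketch under the assumption that the registers and flags live in a small state structure; `struct cpu8086` and its field names are hypothetical and not part of the table.

```c
#include <stdint.h>

/* Hypothetical register and flag state used only for this sketch. */
struct cpu8086 {
    uint8_t al, ah;
    int af, cf;
};

/* AAA: adjusts AL (and AH) after adding two unpacked BCD digits. */
void aaa(struct cpu8086 *cpu)
{
    if ((cpu->al & 0x0F) > 9 || cpu->af == 1) {
        cpu->al = (uint8_t)(cpu->al + 6);
        cpu->ah = (uint8_t)(cpu->ah + 1);
        cpu->af = 1;
        cpu->cf = 1;
    } else {
        cpu->af = 0;
        cpu->cf = 0;
    }
    cpu->al &= 0x0F;
}
```

With AL = 0EH (the binary sum of 7 + 7) and AH = 0, the function leaves AH = 1 and AL = 4, the unpacked BCD digits of 14.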

AAD - ASCII Adjust for Division [ASCII Adjust AX Before Division]
Opcode: D5 0A
Pseudo Code:
    tempAL ← AL;
    tempAH ← AH;
    AL ← (tempAL + (tempAH ∗ imm8)) AND FFH; (* imm8 is set to 0AH for the AAD mnemonic *)
    AH ← 0
Notes: The immediate value (imm8) is taken from the second byte of the instruction.

AAM - ASCII Adjust for Multiplication [ASCII Adjust AX After Multiplication]
Opcode: D4 0A
Pseudo Code:
    tempAL ← AL;
    AH ← tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *)
    AL ← tempAL MOD imm8;
Notes: The immediate value (imm8) is taken from the second byte of the instruction.
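AAD and AAM are close to being inverses of each other, which a short C sketch makes visible. The function names `aad` and `aam` and the choice to pass AX as a single 16-bit value (AH in the high byte) are assumptions made for illustration. The hardware raises a divide error when AAM is given an immediate of 0; the sketch leaves that case to the caller.

```c
#include <stdint.h>

/* AAD: AL = (AL + AH * imm8) AND FFH; AH = 0. Returns the new AX. */
uint16_t aad(uint16_t ax, uint8_t imm8)
{
    uint8_t al = (uint8_t)(ax & 0xFF);
    uint8_t ah = (uint8_t)(ax >> 8);
    al = (uint8_t)((al + ah * imm8) & 0xFF);
    return al; /* AH is cleared, so AX equals AL */
}

/* AAM: AH = AL / imm8; AL = AL MOD imm8. Returns the new AX.
   The caller must not pass imm8 == 0. */
uint16_t aam(uint16_t ax, uint8_t imm8)
{
    uint8_t al = (uint8_t)(ax & 0xFF);
    return (uint16_t)(((al / imm8) << 8) | (al % imm8));
}
```

With the default immediate of 10, `aad(0x0305, 10)` packs the unpacked digits 3 and 5 into the binary value 35, and `aam(35, 10)` splits it back into 0305H.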

AAS - ASCII Adjust for Subtraction [ASCII Adjust AL After Subtraction]
Opcode: 3F
Pseudo Code:
    IF ((AL AND 0FH) > 9) OR (AF = 1) THEN
        AL ← AL - 6;
        AH ← AH - 1;
        AF ← 1;
        CF ← 1;
    ELSE
        CF ← 0;
        AF ← 0;
    FI;
    AL ← AL AND 0FH;

ADC - Add With Carry
Pseudo Code:
    DEST ← DEST + SRC + CF;

ADD - Arithmetic Addition
Pseudo Code:
    DEST ← DEST + SRC;

AND - Logical And
Pseudo Code:
    DEST ← DEST AND SRC;
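ADD and AND map directly to the C operators `+` and `&`, but ADC needs the carry flag, which C does not expose. One possible way to model it, assuming the flag is kept in an ordinary `int` owned by the translated code, is sketched below; `adc16` is a hypothetical helper name.

```c
#include <stdint.h>

/* 16-bit ADC: returns DEST + SRC + CF and stores the carry out in *cf. */
uint16_t adc16(uint16_t dest, uint16_t src, int *cf)
{
    /* Widen first so the carry out of bit 15 is not lost. */
    uint32_t wide = (uint32_t)dest + (uint32_t)src + (uint32_t)(*cf != 0);
    *cf = (wide > 0xFFFFu);
    return (uint16_t)wide;
}
```

Calling `adc16` with the flag cleared behaves like ADD. Calling it twice, low words first and high words second, reproduces the usual ADD/ADC pair for a 32-bit addition on a 16-bit CPU.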

CALL - Call Procedure
Pseudo Code: Instruction composed
C Code: function(parameters)
Notes: Complex Instruction

CBW - Convert Byte to Word
Opcode: 98
Pseudo Code:
    AX ← SignExtend(AL);

CLC - Clear Carry Flag
Opcode: F8
Pseudo Code:
    CF ← 0;

CLD - Clear Direction Flag
Opcode: FC
Pseudo Code:
    DF ← 0;

CLI - Clear Interrupt Flag
Opcode: FA
Pseudo Code:
    IF ← 0;
Notes: Complex Instruction

CMC - Complement Carry Flag
Opcode: F5
Pseudo Code:
    CF ← NOT CF;

CMP - Compare Operands
Pseudo Code:
    temp ← SRC1 − SignExtend(SRC2);
    ModifyStatusFlags; (* Modify status flags in the same manner as the SUB, ADD instruction *)
C Code: Instruction composed
Notes: Is used in conjunction with Jcc, or CMOVcc, or SETcc.

CMPSB - Compare String (Byte)
Opcode: A6
Pseudo Code:
    IF DF = 0 THEN
        (E)SI ← (E)SI + 1;
        (E)DI ← (E)DI + 1;
    ELSE
        (E)SI ← (E)SI - 1;
        (E)DI ← (E)DI - 1;
    FI;
Notes: Can be preceded by the REP prefix for block comparisons of CX bytes. Can be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made.

CMPSW - Compare String (Word)
Opcode: A7
Pseudo Code:
    IF DF = 0 THEN
        (E)SI ← (E)SI + 2;
        (E)DI ← (E)DI + 2;
    ELSE
        (E)SI ← (E)SI - 2;
        (E)DI ← (E)DI - 2;
    FI;

CWD - Convert Word to Double-Word
Opcode: 99
Pseudo Code:
    DX:AX ← SignExtend(AX);
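CBW and CWD are both sign extensions, which C performs implicitly when a narrower signed type is converted to a wider one. The sketch below spells the operation out on unsigned register values so that no implementation-defined conversion is involved; `cbw` and `cwd` are hypothetical helper names, and `cwd` returns DX in the upper 16 bits of its result.

```c
#include <stdint.h>

/* CBW: sign-extends AL into AX. */
uint16_t cbw(uint8_t al)
{
    return (al & 0x80) ? (uint16_t)(0xFF00u | al) : (uint16_t)al;
}

/* CWD: sign-extends AX into DX:AX (DX is the high half of the result). */
uint32_t cwd(uint16_t ax)
{
    return (ax & 0x8000u) ? (0xFFFF0000u | ax) : (uint32_t)ax;
}
```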

DAA - Decimal Adjust AL After Addition
Opcode: 27
Pseudo Code:
    IF (((AL AND 0FH) > 9) or AF = 1) THEN
        AL ← AL + 6;
        CF ← CF OR CarryFromLastAddition; (* CF OR carry from AL ← AL + 6 *)
        AF ← 1;
    ELSE
        AF ← 0;
    FI;
    IF (((AL AND F0H) > 90H) or CF = 1) THEN
        AL ← AL + 60H;
        CF ← 1;
    ELSE
        CF ← 0;
    FI;

DAS - Decimal Adjust AL After Subtraction
Opcode: 2F
Pseudo Code:
    IF (AL AND 0FH) > 9 OR AF = 1 THEN
        AL ← AL − 6;
        CF ← CF OR BorrowFromLastSubtraction; (* CF OR borrow from AL ← AL − 6 *)
        AF ← 1;
    ELSE
        AF ← 0;
    FI;
    IF ((AL > 9FH) or CF = 1) THEN
        AL ← AL − 60H;
        CF ← 1;
    ELSE
        CF ← 0;
    FI;

DEC - Decrement by 1
Pseudo Code:
    DEST ← DEST - 1;

DIV - Unsigned Divide
Pseudo Code:
    temp ← AX / SRC;
    IF temp > FFH THEN
        #DE; (* divide error *)
    ELSE
        AL ← temp;
        AH ← AX MOD SRC;
    FI;
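The 8-bit form of DIV shown above can be sketched in C with the `/` and `%` operators, as long as the divide error is handled explicitly. In the sketch below the error is reported through a return code, which is only one possible convention; the name `div8` is hypothetical. A zero divisor is also rejected, since the hardware raises the same error for it and division by zero is undefined in C.

```c
#include <stdint.h>

/* 8-bit DIV: divides AX by SRC, leaving the quotient in AL and the
   remainder in AH. Returns 0 on success, or -1 where the CPU would
   raise a divide error (#DE); *ax is left unchanged in that case. */
int div8(uint16_t *ax, uint8_t src)
{
    uint16_t quotient;

    if (src == 0)
        return -1;
    quotient = (uint16_t)(*ax / src);
    if (quotient > 0xFF)
        return -1;
    *ax = (uint16_t)(((*ax % src) << 8) | quotient);
    return 0;
}
```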

ESC - Escape [Used with floating-point unit]
Pseudo Code: ???
Notes: No references on manual

IDIV - Signed Integer Division
Opcode: F6 /7
Pseudo Code:
    IF SRC = 0 THEN
        #DE; (* divide error *)
    FI;
    temp ← AX / SRC; (* signed division *)
    IF (temp > 7FH) OR (temp < 80H) THEN
        (* if a positive result is greater than 7FH or a negative result is less than 80H *)
        #DE; (* divide error *)
    ELSE
        AL ← temp;
        AH ← AX SignedModulus SRC;
    FI;

IMUL - Signed Integer Multiply
Pseudo Code:
    IF (NumberOfOperands = 1) THEN
        IF (OperandSize = 8) THEN
            AX ← AL ∗ SRC (* signed multiplication *)
            IF ((AH = 00H) OR (AH = FFH)) THEN
                CF = 0; OF = 0;
            ELSE
                CF = 1; OF = 1;
            FI;
        FI;
    ELSE IF (NumberOfOperands = 2) THEN
        temp ← DEST ∗ SRC (* signed multiplication; temp is double DEST size *)
        DEST ← DEST ∗ SRC (* signed multiplication *)
        IF temp ≠ DEST THEN
            CF = 1; OF = 1;
        ELSE
            CF = 0; OF = 0;
        FI;
    ELSE (* NumberOfOperands = 3 *)
        DEST ← SRC1 ∗ SRC2 (* signed multiplication *)
        temp ← SRC1 ∗ SRC2 (* signed multiplication; temp is double SRC1 size *)
        IF temp ≠ DEST THEN
            CF = 1; OF = 1;
        ELSE
            CF = 0; OF = 0;
        FI;
    FI;

IN - Input Byte or Word From Port
Pseudo Code:
    IF ((PE = 1) AND ((CPL > IOPL) OR (VM = 1))) THEN
        (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1) THEN
            (* I/O operation is not allowed *)
            #GP(0);
        ELSE (* I/O operation is allowed *)
            DEST ← SRC; (* Reads from selected I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Reads from selected I/O port *)
    FI;

INC - Increment by 1
Pseudo Code:
    DEST ← DEST + 1;

INT - Call to Interrupt
Pseudo Code: Instruction composed
Notes: Used with stack and system calls

INT03 - Call to Interrupt
Pseudo Code: Instruction composed
Notes: Used with stack and system calls

INT3 - Call to Interrupt
Pseudo Code: Instruction composed
Notes: Used with stack and system calls

INTO - Call to Interrupt on Overflow
Pseudo Code: Instruction composed
Notes: Used with stack and system calls

IRET - Return From Interrupt
Opcode: CF
Pseudo Code: Instruction composed
Notes: Used with stack and system calls

IRETW - Return From Interrupt
Pseudo Code: Instruction composed
Notes: Used with stack and system calls

JXX - Jump Instructions Table [JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ]
Pseudo Code: Instruction composed
Notes: Used in conjunction with cmp instruction, to build comparison structures or loop blocks

*JCXZ/JECXZ - Jump if Register (E)CX is Zero
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

JMP - Unconditional Jump
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

LAHF - Load Register AH From Flags [Load flags into AH register]
Opcode: 9F
Pseudo Code:
    AH ← EFLAGS(SF:ZF:0:AF:0:PF:1:CF);
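The bit layout in the LAHF pseudo code (SF:ZF:0:AF:0:PF:1:CF, from bit 7 down to bit 0) can be reproduced with shifts and ORs. C has no access to the flags register, so the sketch below assumes the translated code tracks each flag in its own `int`; the function name `lahf` and its parameter order are hypothetical.

```c
#include <stdint.h>

/* LAHF: packs the tracked flags into the AH layout SF:ZF:0:AF:0:PF:1:CF. */
uint8_t lahf(int sf, int zf, int af, int pf, int cf)
{
    return (uint8_t)(((sf != 0) << 7) |
                     ((zf != 0) << 6) |
                     ((af != 0) << 4) |
                     ((pf != 0) << 2) |
                     0x02 |              /* bit 1 always reads as 1 */
                     (cf != 0));
}
```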

LDS - Load Pointer Using DS
Pseudo Code: ???

LEA - Load Effective Address
Pseudo Code: ???

LES - Load Pointer Using ES
Pseudo Code: ???

LOCK - Lock Bus [Assert BUS LOCK# signal]
Opcode: F0
Pseudo Code:
    AssertLOCK#(DurationOfAccompaningInstruction)

LODSB - Load String (Byte)
Opcode: AC
Pseudo Code:
    AL ← SRC; (* byte load *)
    IF DF = 0 THEN
        (E)SI ← (E)SI + 1;
    ELSE
        (E)SI ← (E)SI - 1;
    FI;

LODSW - Load String (Word)
Opcode: AD
Pseudo Code:
    AX ← SRC; (* word load *)
    IF DF = 0 THEN
        (E)SI ← (E)SI + 2;
    ELSE
        (E)SI ← (E)SI - 2;
    FI;

LOOP - Decrement CX and Loop if CX Not Zero
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

LOOPE - Loop While Equal
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

LOOPNE - Loop While Not Equal
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

LOOPNZ - Loop While Not Zero
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

LOOPZ - Loop While Zero
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

MOV - Move Byte or Word
Pseudo Code:
    DEST ← SRC;

MOVSB - Move String (Byte) [Move byte from string to string]
Pseudo Code:
    DEST ← SRC;
    IF DF = 0 THEN
        (E)SI ← (E)SI + 1;
        (E)DI ← (E)DI + 1;
    ELSE
        (E)SI ← (E)SI - 1;
        (E)DI ← (E)DI - 1;
    FI;

MOVSW - Move String (Word) [Move word from string to string]
Pseudo Code:
    DEST ← SRC;
    IF DF = 0 THEN
        (E)SI ← (E)SI + 2;
        (E)DI ← (E)DI + 2;
    ELSE
        (E)SI ← (E)SI - 2;
        (E)DI ← (E)DI - 2;
    FI;

MUL - Unsigned Multiply
Pseudo Code:
    AX ← AL ∗ SRC
    Or
    DX:AX ← AX ∗ SRC
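Both forms of MUL produce a result twice as wide as the operands, so a C translation has to widen before multiplying. The sketch below is one way to write it; `mul8` and `mul16` are hypothetical names, and `mul16` returns DX:AX as one 32-bit value with DX in the high half.

```c
#include <stdint.h>

/* 8-bit MUL: AX = AL * SRC. */
uint16_t mul8(uint8_t al, uint8_t src)
{
    return (uint16_t)((uint16_t)al * src);
}

/* 16-bit MUL: DX:AX = AX * SRC. The operands are widened to 32 bits
   first; multiplying two uint16_t values directly would promote them
   to int, and 0xFFFF * 0xFFFF overflows a 32-bit int. */
uint32_t mul16(uint16_t ax, uint16_t src)
{
    return (uint32_t)ax * (uint32_t)src;
}
```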

NEG - Two's Complement Negation
Pseudo Code:
    IF DEST = 0 THEN
        CF ← 0
    ELSE
        CF ← 1;
    FI;
    DEST ← - (DEST)
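NEG corresponds to the unary minus in C, plus the carry flag rule from the pseudo code. The sketch below works on an unsigned 16-bit value so that the two's complement wrap-around is well defined; `neg16` is a hypothetical name and the flag is assumed to be tracked in a plain `int`.

```c
#include <stdint.h>

/* 16-bit NEG: CF is 0 only when DEST is 0; DEST becomes its
   two's complement. */
uint16_t neg16(uint16_t dest, int *cf)
{
    *cf = (dest != 0);
    return (uint16_t)(0u - dest);
}
```

Note that negating 8000H yields 8000H again, as on the CPU.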

NOP - No Operation
Opcode: 90
Pseudo Code: ???

NOT - One's Complement Negation (Logical NOT)
Pseudo Code:
    DEST ← NOT DEST;

OR - Inclusive Logical OR
Pseudo Code:
    DEST ← DEST OR SRC;

OUT - Output Data to Port
Pseudo Code:
    IF ((PE = 1) AND ((CPL > IOPL) OR (VM = 1))) THEN
        (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1) THEN
            (* I/O operation is not allowed *)
            #GP(0);
        ELSE (* I/O operation is allowed *)
            DEST ← SRC; (* Writes to selected I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Writes to selected I/O port *)
    FI;

POP - Pop Word off Stack [POP CS works only on 8086/8088]
Pseudo Code:
    DEST ← SS:SP; (* copy a word *)
    SP ← SP + 2;
Notes: Used in conjunction with call structures and stack

POPF - Pop Flags off Stack [Pop data into flags register]
Pseudo Code: ???

PUSH - Push Word onto Stack
Pseudo Code:
    SP ← SP − 2;
    SS:SP ← SRC; (* push word *)
Notes: Used in conjunction with call structures and stack
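In translated C code PUSH and POP normally disappear into function calls and local variables, but when the stack has to be modeled explicitly, a byte array and an offset are enough. The sketch below is an assumption made for illustration: `struct stack16`, `STACK_BYTES`, `push16`, and `pop16` are hypothetical names, the array stands in for the SS segment, and there is no overflow or underflow checking.

```c
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model of a small stack segment. */
struct stack16 {
    uint8_t  mem[STACK_BYTES]; /* stands in for the SS segment */
    uint16_t sp;               /* offset into mem; empty at STACK_BYTES */
};

/* PUSH: SP = SP - 2, then the word is stored at SS:SP (little-endian). */
void push16(struct stack16 *s, uint16_t src)
{
    s->sp = (uint16_t)(s->sp - 2);
    s->mem[s->sp] = (uint8_t)(src & 0xFF);
    s->mem[s->sp + 1] = (uint8_t)(src >> 8);
}

/* POP: the word at SS:SP is read, then SP = SP + 2. */
uint16_t pop16(struct stack16 *s)
{
    uint16_t value = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp = (uint16_t)(s->sp + 2);
    return value;
}
```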

PUSHF - Push Flags onto Stack [Push flags onto stack]
Pseudo Code: ???

RCL - Rotate Through Carry Left [Rotate left (with carry)]

RCR - Rotate Through Carry Right [Rotate right (with carry)]
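The table has no pseudo code yet for RCL and RCR. As a possible starting point, the sketch below rotates a 16-bit value by a single position through the carry flag, so that the 16 data bits and CF behave as one 17-bit ring. The names `rcl16_once` and `rcr16_once` are hypothetical, the flag is assumed to be tracked in a plain `int`, and rotation counts greater than 1 and the overflow flag are deliberately left out.

```c
#include <stdint.h>

/* RCL by 1: the old CF enters bit 0, the old bit 15 becomes CF. */
uint16_t rcl16_once(uint16_t dest, int *cf)
{
    int msb = (dest >> 15) & 1;
    dest = (uint16_t)((dest << 1) | (*cf != 0));
    *cf = msb;
    return dest;
}

/* RCR by 1: the old CF enters bit 15, the old bit 0 becomes CF. */
uint16_t rcr16_once(uint16_t dest, int *cf)
{
    int lsb = dest & 1;
    dest = (uint16_t)((dest >> 1) | ((*cf != 0) ? 0x8000u : 0u));
    *cf = lsb;
    return dest;
}
```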

REP - Repeat String Operation [Repeat CMPS/LODS/MOVS/SCAS/STOS]
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

REPE/REPZ - Repeat Equal / Repeat Zero
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

REPNE/REPNZ - Repeat Not Equal / Repeat Not Zero
Pseudo Code: Instruction composed
Notes: Used to build loop blocks

RET - Return From Procedure
Pseudo Code: Instruction composed
Notes: Used in conjunction with call structures and stack

RETF - Return From Procedure [Return from far procedure]
Pseudo Code: Instruction composed
Notes: Used in conjunction with call structures and stack

RETN - Return From Procedure [Return from near procedure]
Pseudo Code: Instruction composed
Notes: Used in conjunction with call structures and stack

ROL - Rotate Left

ROR - Rotate Right

SAHF - Store AH Register into FLAGS

SAL - Shift Arithmetic Left [Shift Arithmetically left (signed shift left)]

SAR - Shift Arithmetic Right [Shift Arithmetically right (signed shift right)]

SBB - Subtract with Borrow/Carry

SCASB - Scan String (Byte) [Compare byte string]
Notes: Can be preceded by the REP prefix for block comparisons of CX bytes. Can be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made.

SCASW - Scan String (Word) [Compare word string]

SHL - Shift Logical Left [Shift left (unsigned shift left)]

SHR - Shift Logical Right [Shift right (unsigned shift right)]

STC - Set Carry

STD - Set Direction Flag

STI - Set Interrupt Flag (Enable Interrupts)

STOSB - Store String (Byte) [Store byte in string]
Notes: Can be preceded by the REP prefix for block comparisons of CX bytes. Can be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made.

STOSW - Store String (Word) [Store word in string]

SUB - Subtract

TEST - Test For Bit Pattern [Logical compare (AND)]

WAIT - Event Wait [Wait until not busy] [Waits until BUSY# pin is inactive (used with floating-point unit)]

XCHG - Exchange data

XLAT/XLATB - Table look-up translation

XOR - Exclusive OR

HLT - Halt CPU [Enter halt state] [Private]

POP CS - Pop top of the stack into CS Segment register [Only available on 8086. Beginning with 80286 this opcode is used as a prefix for 2 [Undocumented]

Instruction Set Name: FPU
Architecture: x86
CPU: 8087 [16bits]

Instruction Name | Description | Opcode | Pseudo Code | C Code | Notes

F2XM1/f2mfx1

FABS FADD FADDP

FBLD FBSTP

FCHS FCLEX FCOM FCOMP FCOMPP

FDECSTP FDISI FDIV FDIVP FDIVR FDIVRP

FENI

FFREE

FIADD FICOM FICOMP FIDIV FIDIVR FILD FIMUL FINCSTP FINIT FIST FISTP FISUB FISUBR

FLD FLD1 FLDCW FLDENV FLDENVW FLDL2E FLDL2T FLDLG2 FLDLN2 FLDPI FLDZ

FMUL FMULP

FNCLEX FNDISI FNENI FNINIT FNOP FNSAVE FNSAVEW FNSTCW FNSTENV FNSTENVW FNSTSW

FPATAN FPREM FPTAN

FRNDINT FRSTOR FRSTORW

FSAVE

FSAVEW FSCALE FSQRT FST FSTCW FSTENV FSTENVW FSTP FSTSW FSUB FSUBP FSUBR FSUBRP

FTST

FWAIT Event Wait

FXAM FXCH FXTRACT

FYL2X/fyl2xp FYL2XP1

CPU – 80186 + FPU

Instruction Set Name: General Purpose x86 Architecture: x86 CPU: 80186/80188 [16bits]

Instruction Name Description Opcode Pseudo Code C Code Notes

CPU – 80286 + FPU

Instruction Set Name: Architecture: x86 CPU:


CPU – 80386 + FPU



CPU – 80486 + FPU



CPU - Pentium + MMX + FPU



CPU - Pentium Pro + FPU



CPU - AMD + 3Dnow + FPU



Details about translation of FPU ASM codes to C

CPU Manual Reference

From Intel Manual

• ZeroExtend(value): Returns a value zero-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, zero extending a byte value of -10 converts the byte from F6H to a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.
• SignExtend(value): Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value -10 converts the byte from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand-size attribute are the same size, SignExtend returns the value unaltered.
• SaturateSignedWordToSignedByte: Converts a signed 16-bit value to a signed 8-bit value. If the signed 16-bit value is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
• SaturateSignedDwordToSignedWord: Converts a signed 32-bit value to a signed 16-bit value. If the signed 32-bit value is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
• SaturateSignedWordToUnsignedByte: Converts a signed 16-bit value to an unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
• SaturateToSignedByte: Represents the result of an operation as a signed 8-bit value. If the result is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
• SaturateToSignedWord: Represents the result of an operation as a signed 16-bit value. If the result is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
• SaturateToUnsignedByte: Represents the result of an operation as an unsigned 8-bit value. If the result is less than zero it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
• SaturateToUnsignedWord: Represents the result of an operation as an unsigned 16-bit value. If the result is less than zero it is represented by the saturated value zero (00H); if it is greater than 65535, it is represented by the saturated value 65535 (FFFFH).
• LowOrderWord(DEST * SRC): Multiplies a word operand by a word operand and stores the least significant word of the doubleword result in the destination operand.
• HighOrderWord(DEST * SRC): Multiplies a word operand by a word operand and stores the most significant word of the doubleword result in the destination operand.
• Push(value): Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction. Refer to the “Operation” section in “PUSH - Push Word or Doubleword Onto the Stack” in this chapter for more information on the push operation.
• Pop(): Removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop will return either a word or a doubleword depending on the operand-size attribute. Refer to the “Operation” section in “POP - Pop a Value from the Stack” in this chapter for more information on the pop operation.
• PopRegisterStack: Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.
• Switch-Tasks: Performs a task switch.
• Bit(BitBase, BitOffset): Returns the value of a bit within a bit string, which is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the base operand is a register, the offset can be in the range 0..31. This offset addresses a bit within the indicated register.
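Several of the helper functions above have short C counterparts. The sketch below covers the byte-sized cases with a 32-bit operand size. The function names are hypothetical, and the sign extension is written with explicit arithmetic so that it does not rely on an implementation-defined conversion to a signed type.

```c
#include <stdint.h>

/* ZeroExtend for a byte with a 32-bit operand size: F6H -> 000000F6H. */
uint32_t zero_extend_byte(uint8_t value)
{
    return (uint32_t)value;
}

/* SignExtend for a byte with a 32-bit operand size: F6H -> -10. */
int32_t sign_extend_byte(uint8_t value)
{
    return (value & 0x80) ? (int32_t)value - 256 : (int32_t)value;
}

/* SaturateSignedWordToSignedByte: clamps to the range -128..127. */
int8_t saturate_signed_word_to_signed_byte(int16_t value)
{
    if (value < -128)
        return -128;
    if (value > 127)
        return 127;
    return (int8_t)value;
}

/* SaturateSignedWordToUnsignedByte: clamps to the range 0..255. */
uint8_t saturate_signed_word_to_unsigned_byte(int16_t value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return (uint8_t)value;
}
```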

Conclusion

Without doubt, there is a lot of work left to do here. No matter how much work one person puts into it, unless it is done together with many specialized, prepared, and motivated people, not much feedback can come from it, because of the amount of work it requires. Still, it is a good subject for research and, for those who like it, a delightful one.

For now, this is just a table of instructions meant to standardize and define the meaning of each instruction. When possible, I plan to define the pseudo code and the equivalent C code for all of them.

Anyone who wants to contribute any kind of information is welcome. At the current stage of the table, we mostly need the pseudo codes and possible solutions [mainly for instructions that don't have a clear or direct representation in C] for representing the instructions in plain, clear C code. We also need reference C code, so that compiled ASM output can be analyzed for code patterns and backward code representations. A backward code representation is obtained by compiling C code to try to generate one specific ASM instruction, then checking which C statement generated that instruction. In this way you can define one ASM to C equivalency, and so translate the ASM code back to C.

Bibliography

- Intel CPU Manual
- AMD CPU Manual
- Wikipedia
