Source: db.zmitac.aei.polsl.pl/kt/lecture9.pdf (ZMiTAC, Politechnika Śląska)

Assembler Programming

Lecture 9


• Floating point coprocessor instructions. MMX instructions.

Math coprocessor
• 8087, 80287, 80387 – separate chips.
• i486DX and above – built in.
• Calculates many times faster than an 8086-based processor when working with:
  – real numbers,
  – packed BCD numbers,
  – long integers.
• Has its own set of registers.

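Math coprocessor registers

[Figure: eight 80-bit data registers R0 to R7, addressed as a stack through ST (the top), ST(1), and so on. Each register holds a sign (bit 79), an exponent (bits 78-64), and a significand (bits 63-0).]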
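To make the register layout concrete, here is a minimal Python sketch that splits a 10-byte extended real into its sign, exponent, and significand and rebuilds the value. The helper names `decode_ext80` and `encode_fields` are made up for this illustration, and the sketch only covers zero, denormal, and normal values; infinities and NaNs are rejected.

```python
from fractions import Fraction

EXPONENT_BIAS = 16383  # bias of the 15-bit exponent field


def decode_ext80(raw: bytes) -> Fraction:
    """Decode a 10-byte little-endian extended real (hypothetical helper)."""
    if len(raw) != 10:
        raise ValueError("an extended real is exactly 10 bytes")
    bits = int.from_bytes(raw, "little")
    sign = bits >> 79                    # bit 79
    exponent = (bits >> 64) & 0x7FFF     # bits 78..64
    significand = bits & (2**64 - 1)     # bits 63..0, integer bit stored at 63
    if exponent == 0x7FFF:
        raise ValueError("infinity and NaN are outside this sketch")
    # An all-zero exponent field (zero, denormals) uses exponent 1 - bias.
    power = (exponent or 1) - EXPONENT_BIAS - 63
    value = Fraction(significand) * Fraction(2) ** power
    return -value if sign else value


def encode_fields(sign: int, exponent: int, significand: int) -> bytes:
    """Pack the three fields into 10 little-endian bytes (hypothetical helper)."""
    return ((sign << 79) | (exponent << 64) | significand).to_bytes(10, "little")
```

For example, the fields (0, 0x3FFF, 0x8000000000000000) decode to 1, because the exponent equals the bias and only the integer bit of the significand is set.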

Groups of instructions

• Classical stack format.
• Memory.
• Register.
• Register pop.

Classical stack format

• Instructions treat the registers like a stack.
• Only the top item can be accessed.
• The first and sometimes the second register are assumed as operands.
• ST is the source operand.
• ST(1) is the destination operand.
• The source is popped off the stack.
• The result is at the top of the stack.

Memory format

• Instructions treat the registers like a stack.
• Items are pushed from memory or popped to memory.
• The memory operand is always the source operand.
• ST is the destination operand.
• The result is at the top of the stack, without popping the destination operand.

Register format

• Instructions treat the coprocessor registers like ordinary registers.
• One operand is always the stack top ST.
• The first operand is the destination.
• The second operand is the source.
• The stack position does not change.

Register pop format

• Instructions treat the registers as a modified stack.
• The source must be ST (the stack top).
• The destination is the other register.
• The result is placed into the destination register.
• The source (top) is popped off the stack.
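The four operand formats are easier to compare side by side on one instruction. The following Python sketch models how FADD behaves in each format on a toy register stack. The class `X87Stack` and its method names are invented for this illustration; it models only the data movement, not tags, exceptions, or precision.

```python
class X87Stack:
    """Toy model of the register stack: regs[0] is ST, regs[1] is ST(1), ..."""

    def __init__(self):
        self.regs = []

    def push(self, value: float) -> None:
        if len(self.regs) == 8:
            raise OverflowError("all eight registers are in use")
        self.regs.insert(0, float(value))

    def pop(self) -> float:
        return self.regs.pop(0)

    def add_classical(self) -> None:
        """fadd (no operands): ST(1) = ST(1) + ST, then pop the source."""
        source = self.pop()
        self.regs[0] += source

    def add_memory(self, mem_value: float) -> None:
        """fadd mem: ST = ST + mem, nothing is popped."""
        self.regs[0] += mem_value

    def add_register(self, dst: int, src: int) -> None:
        """fadd st(dst), st(src): one operand must be ST; no stack movement."""
        if 0 not in (dst, src):
            raise ValueError("one operand must be the stack top")
        self.regs[dst] += self.regs[src]

    def add_register_pop(self, dst: int) -> None:
        """faddp st(dst), st: result goes to st(dst), then ST is popped."""
        self.regs[dst] += self.regs[0]
        self.pop()
```

Pushing 1, 2, 3 and then calling `add_classical()` leaves 5 in ST and 1 in ST(1): the old top (3) was added into ST(1) and popped, so the sum became the new top.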

Coprocessor instructions

• Loading and storing data.
• Doing arithmetic calculations.
• Controlling program flow.

Loading and Storing Data
• Copy data between memory and registers.
• Copy data between registers.
• Data in memory can be:
  – an integer,
  – a BCD number,
  – a real number.
• Data transferred to the coprocessor is always converted to a 10-byte real number.

Loading and Storing Data
• Load commands push data onto the stack.
• Store commands pop data off the stack or copy the data to another register.
• Constants cannot be operands.
• You can load constants like 0, 1, and pi with special instructions.
• You can save the coprocessor status to memory and later load the status back into the registers.

Loading and storing data

• FLD, FST, FSTP - Load and store real numbers.
• FILD, FIST, FISTP - Load and store binary integers.
• FBLD - Loads BCD.
• FBSTP - Stores BCD.
• FXCH - Exchanges register values.
• In all instructions, a trailing P means that ST is popped.

Loading constants

• FLDZ - Pushes 0 into ST.
• FLD1 - Pushes 1 into ST.
• FLDPI - Pushes the value of pi into ST.
• FLDL2E - Pushes the value of log2(e) into ST.
• FLDL2T - Pushes log2(10) into ST.
• FLDLG2 - Pushes log10(2) into ST.
• FLDLN2 - Pushes loge(2) into ST.
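The logarithmic constants are mostly useful for changing the base of a logarithm: FYL2X computes Y * log2 X, so using log10(2) or loge(2) as Y turns the base-2 result into a common or natural logarithm. The Python sketch below checks that identity with the standard `math` module standing in for the coprocessor. The dictionary keys are only labels for the values the instructions push, and the function names are invented for the example.

```python
import math

# Values pushed by the constant-loading instructions, here in double
# precision (the coprocessor keeps them as 10-byte reals).
CONSTANTS = {
    "FLDZ": 0.0,
    "FLD1": 1.0,
    "FLDPI": math.pi,
    "FLDL2E": math.log2(math.e),
    "FLDL2T": math.log2(10.0),
    "FLDLG2": math.log10(2.0),
    "FLDLN2": math.log(2.0),
}


def fyl2x(y: float, x: float) -> float:
    """Model of FYL2X: Y * log2(X)."""
    return y * math.log2(x)


def log10_via_fyl2x(x: float) -> float:
    """log10(x) = log10(2) * log2(x)."""
    return fyl2x(CONSTANTS["FLDLG2"], x)


def ln_via_fyl2x(x: float) -> float:
    """ln(x) = ln(2) * log2(x)."""
    return fyl2x(CONSTANTS["FLDLN2"], x)
```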

Loading and storing status
• FLDCW mem2byte - Loads the control word into the coprocessor.
• F[N]STCW mem2byte - Stores the control word in memory.
• FLDENV mem14byte - Loads the environment from memory.
• F[N]STENV mem14byte - Stores the environment in memory.
• FRSTOR mem94byte - Restores the state from memory.
• F[N]SAVE mem94byte - Saves the state in memory.

Loading and storing example

        .DATA
m1      REAL4   1.0
m2      REAL4   2.0

        .CODE
        fld     m1              ; Push m1 into first item
        fld     st(2)           ; Push third item into first
        fst     m2              ; Copy first item to m2
        fxch    st(2)           ; Exchange first and third items
        fstp    m1              ; Pop first item into m1


Arithmetic calculations
• FADD - Adds the source and destination.
• FSUB - Subtracts the source from the destination.
• FSUBR - Subtracts the destination from the source.
• FMUL - Multiplies the source and the destination.
• FDIV - Divides the destination by the source.
• FDIVR - Divides the source by the destination.
• FABS - Sets the sign of ST to positive.
• FCHS - Reverses the sign of ST.

Arithmetic calculations
• FRNDINT - Rounds ST to an integer.
• FSQRT - Replaces the contents of ST with its square root.
• FSCALE - Multiplies the stack-top value by 2 to the power contained in ST(1).
• FPREM - Calculates the remainder of ST divided by ST(1).

Arithmetic calculations (387)
• FSIN - Calculates the sine of the value in ST.
• FCOS - Calculates the cosine of the value in ST.
• FSINCOS - Calculates the sine and cosine of the value in ST.
• FPTAN - Calculates the tangent of the value in ST.
• FPATAN - Calculates the arctangent of the ratio Y/X.

Arithmetic calculations (387)
• FPREM1 - Calculates the partial remainder by performing modulo division on the top two stack registers.
• FXTRACT - Breaks a number down into its exponent and mantissa and pushes the mantissa onto the register stack.
• F2XM1 - Calculates 2^x - 1.
• FYL2X - Calculates Y * log2(X).
• FYL2XP1 - Calculates Y * log2(X + 1).
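FSCALE, FPREM, and FPREM1 are easy to confuse, so here is a small Python sketch of their results using the standard `math` module. It assumes the completed (not partial) remainder, and the function names simply mirror the mnemonics. The difference between the two remainder instructions is how the quotient is rounded: FPREM truncates it toward zero, FPREM1 rounds it to the nearest integer as IEEE 754 specifies.

```python
import math


def fscale(st: float, st1: float) -> float:
    """FSCALE: ST * 2**trunc(ST(1)); the scale is truncated toward zero."""
    return math.ldexp(st, math.trunc(st1))


def fprem(st: float, st1: float) -> float:
    """FPREM: remainder with a truncated quotient; the sign follows ST."""
    return math.fmod(st, st1)


def fprem1(st: float, st1: float) -> float:
    """FPREM1: IEEE 754 remainder; the quotient is rounded to nearest."""
    return math.remainder(st, st1)
```

With ST = 7 and ST(1) = 4, `fprem` gives 3 (quotient 1) while `fprem1` gives -1 (quotient 2, the integer nearest to 1.75).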

Arithmetic calculations (387)
• F[N]INIT - Resets the coprocessor and restores all the default conditions in the control and status words.
• F[N]CLEX - Clears all exception flags and the busy flag of the status word.
• FINCSTP - Adds 1 to the stack pointer in the status word.
• FDECSTP - Subtracts 1 from the stack pointer in the status word.
• FFREE - Marks the specified register as empty.

Arithmetic calculations - Example

        .DATA
a       REAL4   3.0
b       REAL4   7.0
cc      REAL4   2.0
posx    REAL4   0.0
negx    REAL4   0.0

        .CODE
        .

; Solve quadratic equation - no error checking
; The formula is: (-b +/- sqrt(b^2 - 4ac)) / (2a)

        fld1                    ; Get constants 2 and 4
        fadd    st,st           ; 2 at bottom
        fld     st              ; Copy it
        fmul    a               ; = 2a
        fmul    st(1),st        ; = 4a
        fxch                    ; Exchange st and st(1)
        fmul    cc              ; = 4ac

        fld     b               ; Load b
        fmul    st,st           ; = b^2
        fsubr                   ; = b^2 - 4ac

                                ; Negative value here produces error
        fsqrt                   ; = square root(b^2 - 4ac)
        fld     b               ; Load b
        fchs                    ; Make it negative
        fxch                    ; Exchange

        fld     st              ; Copy square root
        fadd    st,st(2)        ; Plus version = -b + root(b^2 - 4ac)
        fxch                    ; Exchange
        fsubp   st(2),st        ; Minus version = -b - root(b^2 - 4ac)

        fdiv    st,st(2)        ; Divide plus version
        fstp    posx            ; Store it
        fdivr                   ; Divide minus version
        fstp    negx            ; Store it
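The stack juggling in the quadratic listing is hard to follow by eye, so the Python sketch below replays it instruction by instruction on a plain list, where `st[0]` stands for ST, `st[1]` for ST(1), and so on. The function name is invented for this illustration, and it works in double precision rather than the REAL4 variables of the listing, so the last digits can differ from what the coprocessor stores.

```python
import math


def solve_quadratic_stack(a: float, b: float, c: float):
    """Replay the listing; each comment shows the stack after the step."""
    st = []
    st.insert(0, 1.0)              # fld1            -> 1
    st[0] += st[0]                 # fadd st,st      -> 2
    st.insert(0, st[0])            # fld st          -> 2, 2
    st[0] *= a                     # fmul a          -> 2a, 2
    st[1] *= st[0]                 # fmul st(1),st   -> 2a, 4a
    st[0], st[1] = st[1], st[0]    # fxch            -> 4a, 2a
    st[0] *= c                     # fmul cc         -> 4ac, 2a
    st.insert(0, b)                # fld b           -> b, 4ac, 2a
    st[0] *= st[0]                 # fmul st,st      -> b^2, 4ac, 2a
    top = st.pop(0)                # fsubr pops the source, then
    st[0] = top - st[0]            #   ST = source - dest -> b^2-4ac, 2a
    st[0] = math.sqrt(st[0])       # fsqrt (ValueError if negative)
    st.insert(0, b)                # fld b           -> b, root, 2a
    st[0] = -st[0]                 # fchs            -> -b, root, 2a
    st[0], st[1] = st[1], st[0]    # fxch            -> root, -b, 2a
    st.insert(0, st[0])            # fld st          -> root, root, -b, 2a
    st[0] += st[2]                 # fadd st,st(2)   -> plus, root, -b, 2a
    st[0], st[1] = st[1], st[0]    # fxch            -> root, plus, -b, 2a
    st[2] -= st[0]                 # fsubp st(2),st  -> root, plus, minus, 2a
    st.pop(0)                      #   then pop      -> plus, minus, 2a
    st[0] /= st[2]                 # fdiv st,st(2)   -> plus/2a, minus, 2a
    posx = st.pop(0)               # fstp posx       -> minus, 2a
    top = st.pop(0)                # fdivr pops the source, then
    st[0] = top / st[0]            #   ST = source / dest -> minus/2a
    negx = st.pop(0)               # fstp negx       -> (empty)
    return posx, negx
```

With the data from the listing (a = 3, b = 7, cc = 2) the discriminant is 25, so the two results are (-7 + 5) / 6 and (-7 - 5) / 6.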

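Status Word register

[Figure: layout of the status word (SW). Exception flags: invalid operation, denormalized, zero divide, overflow, underflow, precision (bits 0-5). Stack fault (bit 6). Exception flag (bit 7). Condition codes C0, C1, C2 (bits 8-10) and C3 (bit 14). Top of stack (bits 11-13). The figure also marks a reserved field.]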

Controlling program flow
• The status word can be stored:
  – into memory,
  – into the AX register (80287 and above).
• The coprocessor has instructions for:
  – comparing operands,
  – testing control flags.
• These instructions compare ST to:
  – the specified source operand,
  – ST(1) if no operand is specified.

Controlling program flow
• FCOM - Compares the stack top to the source. The source and destination are unaffected by the comparison.
• FTST - Compares ST to 0.
• FCOMP - Compares the stack top to the source and then pops the stack.
• FUCOM, FUCOMP, FUCOMPP - Compare the source to ST and set the condition codes of the status word according to the result (80386/486 only).
• F[N]STSW mem2byte - Stores the status word in memory.
• FXAM - Sets the value of the control flags based on the type of the number in ST.

Controlling program flow
• FPREM - Finds a correct remainder for large operands. It uses the C2 flag to indicate whether the remainder returned is partial (C2 is set) or complete (C2 is clear).
• FNOP - Copies the stack top onto itself without having any effect on registers or memory.
• FDISI, FNDISI, FENI, FNENI - Enable or disable interrupts (8087 only).
• FSETPM - Sets protected mode. Requires a .286P or .386P directive (80287, 80387, and 80486 only).

Controlling the flow - Example

        .DATA
down    REAL4   10.35           ; Sides of a rectangle
across  REAL4   13.07
diamtr  REAL4   12.93           ; Diameter of a circle
status  WORD    ?
P287    EQU     (@Cpu AND 00111y)

        .CODE
; Get area of rectangle
        fld     across          ; Load one side
        fmul    down            ; Multiply by the other

; Get area of circle: Area = PI * (D/2)^2
        fld1                    ; Load one and
        fadd    st, st          ; double it to get constant 2
        fdivr   diamtr          ; Divide diameter to get radius
        fmul    st, st          ; Square radius
        fldpi                   ; Load pi
        fmul                    ; Multiply it

; Compare area of circle and rectangle
        fcompp                  ; Compare and throw both away
        IF      p287
        fstsw   ax              ; (For 287+, skip memory)
        ELSE
        fnstsw  status          ; Load from coprocessor to memory
        mov     ax, status      ; Transfer memory to register
        ENDIF
        sahf                    ; Transfer AH to flags register
        jp      nocomp          ; If parity set, can't compare
        jz      same            ; If zero set, they're the same
        jc      rectangle       ; If carry set, rect. is bigger
        jmp     circle          ; else circle is bigger

nocomp: ...                     ; Error handler
        ...
same:   ...                     ; Both equal
        ...
rectangle: ...                  ; Rectangle bigger
        ...
circle: ...                     ; Circle bigger
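The jp/jz/jc sequence works because SAHF copies AH, which holds status-word bits 8-15 after the store, into the low byte of the flags register: C0 lands on the carry flag, C2 on the parity flag, and C3 on the zero flag. The Python sketch below models that mapping and the branch order of the example. The function names are invented for the illustration, and `sahf` simply returns the raw AH byte, ignoring the flag bits that the real instruction leaves fixed.

```python
import math

C0, C2, C3 = 1 << 8, 1 << 10, 1 << 14   # condition codes in the status word
CF, PF, ZF = 1 << 0, 1 << 2, 1 << 6     # carry, parity, zero in the flags byte


def fcom_status(st: float, source: float) -> int:
    """Condition codes set by a compare (all other status bits left at 0)."""
    if math.isnan(st) or math.isnan(source):
        return C3 | C2 | C0             # operands cannot be compared
    if st == source:
        return C3
    if st < source:
        return C0
    return 0                            # st > source


def sahf(status_word: int) -> int:
    """AH (status bits 8..15) becomes the low byte of the flags."""
    return (status_word >> 8) & 0xFF


def branch_taken(circle_area: float, rectangle_area: float) -> str:
    """Label reached in the example: ST is the circle, ST(1) the rectangle."""
    flags = sahf(fcom_status(circle_area, rectangle_area))
    if flags & PF:
        return "nocomp"                 # jp
    if flags & ZF:
        return "same"                   # jz
    if flags & CF:
        return "rectangle"              # jc
    return "circle"                     # jmp
```

With the data from the listing, the circle area is roughly 131.3 and the rectangle area roughly 135.3, so the compare sets C0 and the code branches to `rectangle`.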

Program flow – new mechanism
• Available beginning with the P6 family processors.
• New instructions:
  – FCOMI, FCOMIP, FUCOMI, FUCOMIP,
  – compare and set the ZF, PF, and CF flags in the EFLAGS register directly.
• New conditional transfer instructions:
  – FCMOVcc,
  – conditionally moves floating point values, which eliminates branches.

Memory access

• When using the coprocessor, follow these three steps:
  – Load data from memory to coprocessor registers.
  – Process the data.
  – Store the data from coprocessor registers back to memory.
• Processing the data can occur while the main processor is handling other tasks.
• Loading and storing data must be coordinated.

Memory access

• A coprocessor instruction follows a processor instruction:
  – the assembler coordinates this conflict automatically for the 8086,
  – the processor coordinates it automatically on 80186 and above processors.

; Processor instruction first - No wait needed
        mov     WORD PTR mem32[0], ax   ; Load memory
        mov     WORD PTR mem32[2], dx
        fild    mem32                   ; Load to register

Memory access

• A processor instruction follows a coprocessor instruction:
  – synchronization is not automatic,
  – you must include a WAIT or FWAIT instruction.

; Coprocessor instruction first - Wait needed
        fist    mem32                   ; Store to memory
        fwait                           ; Wait until coprocessor is done
        mov     ax, WORD PTR mem32[0]   ; Move to register
        mov     dx, WORD PTR mem32[2]

Coprocessor example

; computing the average of the table elements

count   DW      100
average REAL4   0.0

        mov     cx, count
        mov     si, OFFSET table        ; address of table (8-byte reals, assumed defined elsewhere)
        fld     qword ptr [si]          ; load first element to the ST
        dec     cx
sum:    add     si, 8                   ; index of next element
        fld     qword ptr [si]          ; load next element to the ST
        fadd                            ; add ST to ST(1) and pop; running sum is in ST
        loop    sum
        fidiv   count                   ; divide sum in ST / count
        fstp    average                 ; store the result

MMX
• Introduced in the Pentium MMX.
• SIMD – Single Instruction Multiple Data.
• Handles 64-bit packed integer data.
• Works on 8 new 64-bit registers.
• Three new packed data types:
  – 64-bit packed byte integers (signed and unsigned).
  – 64-bit packed word integers (signed and unsigned).
  – 64-bit packed doubleword integers (signed and unsigned).
• 47 new instructions.

MMX registers

[Figure: the eight 64-bit MMX registers MM0 to MM7 occupy bits 63-0 of the 80-bit math coprocessor registers R0 to R7.]

MMX data types

[Figure: the three packed data types, each filling bits 63-0: packed byte (eight bytes), packed word (four words), packed doubleword (two doublewords).]

SIMD instructions

[Figure: PADDSW adds the four words of one packed-word source operand to the four words of the other in parallel; each of the four sums is written to the matching word of the packed-word destination.]

Wraparound (PADDW)

  Source operand    0000  FFFF  8000  FFFF
  Source operand    0001  0001  8000  FFFF
  Destination       0001  0000  0000  FFFE

Signed saturation (PADDSW)

  Source operand    0000  FFFF  8000  FFFF
  Source operand    0001  0001  8000  FFFF
  Destination       0001  0000  8000  FFFE

Unsigned saturation (PADDUSW)

  Source operand    0000  FFFF  8000  FFFF
  Source operand    0001  0001  8000  FFFF
  Destination       0001  FFFF  FFFF  FFFF
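The three overflow behaviours can be reproduced with a few lines of Python. This is a minimal sketch that models each packed word as an integer in the range 0 to 0xFFFF and lists the words in the same left-to-right order as the diagrams; the function names mirror the mnemonics, and the helper `_to_signed` is invented for the example.

```python
def _to_signed(word: int) -> int:
    """Reinterpret a 16-bit pattern as a two's-complement value."""
    return word - 0x10000 if word & 0x8000 else word


def paddw(a, b):
    """Wraparound: keep only the low 16 bits of each sum."""
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]


def paddsw(a, b):
    """Signed saturation: clamp each sum to -32768..32767."""
    result = []
    for x, y in zip(a, b):
        total = _to_signed(x) + _to_signed(y)
        total = max(-0x8000, min(0x7FFF, total))
        result.append(total & 0xFFFF)
    return result


def paddusw(a, b):
    """Unsigned saturation: clamp each sum to 0..65535."""
    return [min(0xFFFF, x + y) for x, y in zip(a, b)]


# Operands taken from the diagrams.
FIRST = [0x0000, 0xFFFF, 0x8000, 0xFFFF]
SECOND = [0x0001, 0x0001, 0x8000, 0xFFFF]
```

The third word shows the difference most clearly: 8000 + 8000 wraps to 0000, saturates to 8000 when read as signed (-32768 + -32768), and saturates to FFFF when read as unsigned.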

MMX instructions
• Data transfer
• Arithmetic
• Comparison
• Conversion
• Unpacking
• Logical
• Shift
• Empty MMX state instruction (EMMS)

Data transfer
• MOVD - moves 32 bits between an MMX register and memory, or a general purpose register.
• MOVQ - moves 64 bits between an MMX register and memory, or between MMX registers.

Arithmetic instructions
• PADDB, PADDW, PADDD
  – add packed integers with wraparound.
• PSUBB, PSUBW, PSUBD
  – subtract packed integers with wraparound.
• PADDSB, PADDSW
  – add packed signed integers with signed saturation.
• PSUBSB, PSUBSW
  – subtract packed signed integers with signed saturation.
• PADDUSB, PADDUSW
  – add packed unsigned integers with unsigned saturation.
• PSUBUSB, PSUBUSW
  – subtract packed unsigned integers with unsigned saturation.

Arithmetic instructions
• PMULHW
  – multiply packed signed integers and store the high result.
• PMULLW
  – multiply packed signed integers and store the low result.
• PMADDWD
  – multiply and add packed integers.
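"High result" and "low result" refer to the two halves of each 32-bit product of two signed words, and PMADDWD sums neighbouring products into doublewords. The Python sketch below models the three instructions on lists of 16-bit patterns, with index 0 as the lowest-order word. The function names mirror the mnemonics and `_s16` is a helper invented for the example.

```python
def _s16(word: int) -> int:
    """Reinterpret a 16-bit pattern as a two's-complement value."""
    return word - 0x10000 if word & 0x8000 else word


def pmullw(a, b):
    """Low 16 bits of each signed 16x16 product."""
    return [(_s16(x) * _s16(y)) & 0xFFFF for x, y in zip(a, b)]


def pmulhw(a, b):
    """High 16 bits of each signed 16x16 product."""
    return [((_s16(x) * _s16(y)) >> 16) & 0xFFFF for x, y in zip(a, b)]


def pmaddwd(a, b):
    """Multiply four word pairs, then add adjacent products into two doublewords."""
    products = [_s16(x) * _s16(y) for x, y in zip(a, b)]
    return [
        (products[0] + products[1]) & 0xFFFFFFFF,
        (products[2] + products[3]) & 0xFFFFFFFF,
    ]
```

Running PMULLW and PMULHW on the same operands and interleaving their results is one way to recover the full 32-bit products. PMADDWD, by contrast, is the building block for dot products.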

Comparison instructions
• PCMPEQB, PCMPEQW, PCMPEQD
  – compare packed data for equality.
• PCMPGTB, PCMPGTW, PCMPGTD
  – compare packed signed data for greater than.

Conversion instructions
• PACKSSWB
  – pack words into bytes with signed saturation.
• PACKSSDW
  – pack doublewords into words with signed saturation.
• PACKUSWB
  – pack words into bytes with unsigned saturation.
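Packing narrows each element and saturates values that do not fit. The Python sketch below models PACKSSWB and PACKUSWB on lists of 16-bit patterns. The result lists the four bytes made from the destination operand first, followed by the four made from the source operand; the helper names `_s16` and `_clamp` are invented for the example. Both instructions read the input words as signed; only the target range differs.

```python
def _s16(word: int) -> int:
    """Reinterpret a 16-bit pattern as a two's-complement value."""
    return word - 0x10000 if word & 0x8000 else word


def _clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def packsswb(dst_words, src_words):
    """Signed words -> signed bytes, clamped to -128..127."""
    return [_clamp(_s16(w), -128, 127) & 0xFF for w in dst_words + src_words]


def packuswb(dst_words, src_words):
    """Signed words -> unsigned bytes, clamped to 0..255."""
    return [_clamp(_s16(w), 0, 255) for w in dst_words + src_words]
```

For instance, the word 00FF (255) packs to 7F with signed saturation but stays FF with unsigned saturation, while FFFF (-1) packs to FF signed and to 00 unsigned.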

Unpack instructions
• PUNPCKHBW, PUNPCKHWD, PUNPCKHDQ
  – unpack high-order data elements.
• PUNPCKLBW, PUNPCKLWD, PUNPCKLDQ
  – unpack low-order data elements.

Logical instructions
• PAND
  – bitwise logical AND.
• PANDN
  – bitwise logical AND NOT.
• POR
  – bitwise logical OR.
• PXOR
  – bitwise logical exclusive OR.
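A common use of the logical instructions is to combine them with a packed comparison. The comparison instructions write all ones into each element where the condition holds and all zeros elsewhere, and that mask can then select elements without any branch. The Python sketch below shows the idea for a per-word signed maximum. It is one illustrative idiom, not something taken from the slides, and the function names other than the mnemonics are invented.

```python
def _s16(word: int) -> int:
    """Reinterpret a 16-bit pattern as a two's-complement value."""
    return word - 0x10000 if word & 0x8000 else word


def pcmpgtw(a, b):
    """Per word: FFFF where a > b (signed), otherwise 0000."""
    return [0xFFFF if _s16(x) > _s16(y) else 0x0000 for x, y in zip(a, b)]


def pand(a, b):
    return [x & y for x, y in zip(a, b)]


def pandn(a, b):
    """AND NOT: inverts the first operand, then ANDs it with the second."""
    return [(~x & 0xFFFF) & y for x, y in zip(a, b)]


def por(a, b):
    return [x | y for x, y in zip(a, b)]


def packed_max(a, b):
    """Branch-free signed maximum: (a AND mask) OR (b AND NOT mask)."""
    mask = pcmpgtw(a, b)
    return por(pand(a, mask), pandn(mask, b))
```

Because the logical instructions work bit by bit, the same mask-and-merge pattern applies unchanged to packed bytes and doublewords; only the comparison instruction changes.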

Shift instructions
• PSLLW, PSLLD, PSLLQ
  – shift packed data left logical.
• PSRLW, PSRLD, PSRLQ
  – shift packed data right logical.
• PSRAW, PSRAD
  – shift packed data right arithmetic.

EMMS instruction
• Marks the math coprocessor registers as empty.
• Must be executed at the end of an MMX routine.