[ Home | Optimize | Current | Project | Tools | Made in France | F.A.Q | Forums | Links | Sources Codes ]

There are many kinds of optimisations as there many way to do them.

REMOVING THE MUL INSTRUCTION

You can replace the multiply instruction by a left shit if you are using a power of two constant. You can make many left shifts for replacing a multiply instruction with any constant. Now here are some examples.

Suppose you want to multiply EAX by 2, just write SHL EAX,1

Now suppose you need to multiply the same register by 100. For this operation it is very useful to get all the powers of two that we find in 100. This number is equal to 64 + 32 + 4 = 26 + 2+ 22. You need to use three registers for this operation. At the beginnig EAX contains the number to multiply. Initialize EBX and ECX with EAX. EAX = EBX = ECX = 100. Now write :

SHL EAX,6 ; EAX *= 64
SHL EBX,5 ; EBX *= 32
SHL ECX,2 ; ECX *= 4

ADD EAX,EBX ; EAX = (X * 64) + (X * 32) => EAX = X * 96
ADD EAX,ECX ; EAX = (X * 96) + (X * 4) => EAX = X * 100

Suppose that the registers are initialized with the value 97.

EAX = 97 << 5 = 6208
EBX = 97 << 4 = 3104
ECX = 97 << 2 = 388

6208 + 3104 + 388 = 9700

This method is not the best. There are too many instructions. Just before the first SHL instruction you have two MOV instructions for setting the EAX register into EBX and ECX. The best way is given by Paul Hsieh. My method is only for learning purpose.

Paul Hsieh replaces these SHIFTS / ADD with the following instructions. Take a look at Paul Hsieh's multiplication table. I am trying to decrypt it. In this example EAX is initialized with the number to multiply. We suppose that EAX is now 97.

Mul0100: ; // 3 clocks
LEA EBX,[EAX + EAX * 2] ; EBX = EAX + (2 * EAX) = > EBX = 97 + (2 * 97) => EBX = 97 + 194 = > EBX = 291
LEA EAX,[EAX * 4]       ; EAX = EAX * 4         => EAX = 97 * 4        => EAX = 388
SHL EBX,5               ; EBX = EBX << 5        => EBX = 291 * 32         EBX = 9312
ADD EAX,EBX             ; EAX = EAX + EBX       => EAX = 388 + 9312       EAX = 9700 

The first line multiplies the EBX register by 3. The second multiplies EAX by 4. The third line multiplies by 32 the register EBX. The last line creates the result by adding EBX to EAX. This code is very compact. It shows us how to multiply a number by 3 with one line of code only. All the instructions take one clock. There are 4 lines and 4 * 1 != 3. How does it compute ? The two LEA instructions are executed at the same time and the result is one clock. The two others lines are not pairables so they are executed one by one, in that case they use two clocks. 1 + 2 = 3 clocks.

REMOVE THE DIV INSTRUCTION

Replacing the DIV instruction is more complex than replacing the MUL instruction. If we divide by a power of two, DIV can be replaced by a right shift. If we divide by an other constant, it can be quicker to apply the following formula : A = B / C = B * 1/C. The 1/C is equal to NOT C. For some divisions we can use a magic number. I had a post on MASM Forum for dividing by 100, it is at http://www.masm32.com/board/index.php?topic=5821.0

Use "Magic divider" (coded by The Svin) to generate other divisors

MagicNumber = 2748779069

MOV EAX,X
MOV EDX, MagicNumber
INC EAX
MUL EDX
SHR EDX,6

This is only valid for integers. Suppose that the value of X is 00241259h, it is equal to 2363993.

MOV EAX,X ; EAX = 2363993
MOV EDX,MagicNumer ; EDX = 2748779069
INC EAX   ; EAX = 2363994
MUL EDX   ; EAX = EAX * EDX  = > EAX = 2363994 * 2748779069 = 6498097226441586= 1715FC28E5E372h<BR>         ; = 1715FC28E5E372 => EDX = 1715FCh and EAX      = 28E5E372h
SHR EDX,6 ; EDX = EDX / 64   => 1715FCh / 64               = 1512956 / 64     = 23639

Now we can see that we have lost the number 93 from 2363993. But how is the MagicNumber calculated ? Raymond gives an answer.

An other way to multiply is like this A = B / C is equal to A = B * (1/C). Here B = 1724 and C = 33. A = 1724 * (1/33) = 1724 * 0.0303 = 52.2424 but this operation cannot be made with integers. For computing like this you have to use the floatting Point Unit (FPU).

 

Many kinds of optimizations can be found Agner Fog's Web Site.

My Own Methods

What I suggest is to replace the INVOKE instruction when two or more calls are following themselves. I just want to say if you have the following code :

INVOKE TranslateMessage,ADDR Msg
INVOKE DispatchMessage,ADDR Msg

It is tranlated like this :

LEA  EAX,Msg
push EAX
CALL TranslateMessage
LEA  EAX,MSG
PUSH EAX
CALL DispatchMessage

One way to optimize could be by pushing twice the EAX register like this the second LEA EAX,Msg and PUSH EAX are simply replaced with only one PUSH EAX. The new code is :

LEA  EAX,MSG
PUSH EAX ; That is for DispatchMessage
PUSH EAX ; That is for TranslateMessage
CALL TranslateMessage
CALL DispatchMessage

What have we won ? One line, and one or two clocks ? Wonderful, but there is better to do. We will keep the manner the parameters are pushed onto the stack. When the call to the TranslateMessage is made, it pushes the return address which is the address of the CALL DispatchMessage, and it jumps to the TranslateMessage function. OK ?

I suggest that when the TranslateMessage ends it does not return to the line following this call but goes directly to the DispatchMessage function. This is made like this :

   LEA  EAX,Msg                ; Load Structure address into EAX
    PUSH EAX                    ; That is for DispatchMessage
    PUSH OFFSET @F              ; Return address for DispatchMessage
    PUSH EAX                    ; That is for TranslateMessage
    PUSH OFFSET DispatchMessage ; Return address for TranslateMessage
    JMP  TranslateMessage       ; Call the function
@@ :

Is it clear ? Good, now there is a last optimizations to do. If the last call is followed by a JMP instruction. Wa have something like this :

LEA  EAX,Msg                ; Load Structure address into EAX
     PUSH EAX                    ; That is for DispatchMessage
     PUSH OFFSET @F              ; Return address for DispatchMessage
     PUSH EAX                    ; That is for TranslateMessage
     PUSH OFFSET DispatchMessage ; Return address for TranslateMessage
     JMP  TranslateMessage       ; Call the function
@@ : JMP Loop

Now we just have to replace "PUSH OFFSET @F" with "PUSH OFFSET Loop". The final code is :

   @Loop :
        LEA  EAX,Msg                ; Load Structure address into EAX
        PUSH EAX                    ; That is for DispatchMessage
        PUSH OFFSET Loop            ; Return address for DispatchMessage
        PUSH EAX                    ; That is for TranslateMessage
        PUSH OFFSET DispatchMessage ; Return address for TranslateMessage
        JMP  TranslateMessage       ; Call the function
@@ :

 

[ Home | Optimize | Current | Project | Tools | Made in France | F.A.Q | Forums | Links | Sources Codes ]