[
Home |
Optimize
| Current
| Project
| Tools
| Made in France
| F.A.Q
| Forums
| Links |
Sources Codes ]

There are many kinds of optimisations as there many way to do
them.
REMOVING THE MUL INSTRUCTION
You can replace the multiply instruction by a left shit if you
are using a power of two constant. You can make many left shifts for replacing
a multiply instruction with any constant. Now here are some examples.
Suppose you want to multiply EAX by 2, just write SHL EAX,1
Now suppose you need to multiply the same register by 100. For
this operation it is very useful to get all the powers of two that we find in
100. This number is equal to 64 + 32 + 4 = 26 + 25 +
22. You need to use three registers for this operation. At the
beginnig EAX contains the number to multiply. Initialize EBX and ECX with
EAX. EAX = EBX = ECX = 100. Now write :
SHL EAX,6 ; EAX *= 64
SHL EBX,5 ; EBX *= 32
SHL ECX,2 ; ECX *= 4
ADD EAX,EBX ; EAX = (X * 64) + (X * 32)
=> EAX = X * 96
ADD EAX,ECX ; EAX = (X * 96) + (X * 4) => EAX = X * 100
Suppose that the registers are initialized with the value 97.
EAX = 97 << 5 = 6208
EBX = 97 << 4 = 3104
ECX = 97 << 2 = 388
6208 + 3104 + 388 = 9700
This method is not the best. There are too many instructions.
Just before the first SHL instruction you have two MOV instructions for setting
the EAX register into EBX and ECX. The best way is given by Paul Hsieh. My
method is only for learning purpose.
Paul Hsieh
replaces these SHIFTS / ADD with the following instructions. Take a look at
Paul Hsieh's multiplication table. I am trying to decrypt it. In this example
EAX is initialized with the number to multiply. We suppose that EAX is now 97.
Mul0100: ; // 3 clocks
LEA EBX,[EAX + EAX * 2] ; EBX = EAX + (2 * EAX)
= > EBX = 97 + (2 * 97) => EBX = 97 + 194 = > EBX = 291
LEA EAX,[EAX * 4] ; EAX = EAX *
4 => EAX = 97 *
4 => EAX = 388
SHL EBX,5
; EBX = EBX << 5 => EBX
= 291 * 32 EBX = 9312
ADD EAX,EBX ;
EAX = EAX + EBX => EAX = 388
+ 9312 EAX = 9700
The first line multiplies the EBX register by 3. The second
multiplies EAX by 4. The third line multiplies by 32 the register EBX. The last
line creates the result by adding EBX to EAX. This code is very compact. It
shows us how to multiply a number by 3 with one line of code only. All the
instructions take one clock. There are 4 lines and 4 * 1 != 3. How does it
compute ? The two LEA instructions are executed at the same time and the result
is one clock. The two others lines are not pairables so they are executed one
by one, in that case they use two clocks. 1 + 2 = 3 clocks.
REMOVE THE DIV INSTRUCTION
Replacing the DIV instruction is more complex than replacing the
MUL instruction. If we divide by a power of two, DIV can be replaced by a right
shift. If we divide by an other constant, it can be quicker to apply the
following formula : A = B / C = B * 1/C. The 1/C is equal to NOT C. For
some divisions we can use a magic number. I had a post on MASM Forum for
dividing by 100, it is at http://www.masm32.com/board/index.php?topic=5821.0
Use "Magic divider" (coded by The Svin) to generate other divisors
MagicNumber = 2748779069
MOV EAX,X
MOV EDX, MagicNumber
INC EAX
MUL EDX
SHR EDX,6
This is only valid for integers. Suppose that the value
of X is 00241259h, it is equal to 2363993.
MOV EAX,X ; EAX = 2363993
MOV EDX,MagicNumer ; EDX = 2748779069
INC EAX ; EAX = 2363994
MUL EDX ; EAX = EAX * EDX = > EAX =
2363994 * 2748779069 = 6498097226441586=
1715FC28E5E372h<BR> ; = 1715FC28E5E372 =>
EDX = 1715FCh and EAX = 28E5E372h
SHR EDX,6 ; EDX = EDX / 64 => 1715FCh /
64
= 1512956 / 64 = 23639
Now we can see that we have lost the number 93 from
2363993. But how is the MagicNumber calculated ?
Raymond gives an answer.
An other way to multiply is like this A = B / C is equal to A = B * (1/C). Here
B = 1724 and C = 33. A = 1724 * (1/33) = 1724 * 0.0303 = 52.2424 but this
operation cannot be made with integers. For computing like this you have
to use the floatting Point Unit (FPU).
Many kinds of optimizations can be found
Agner Fog's Web Site.
My Own Methods
What I suggest is to replace the INVOKE
instruction when two or more calls are following themselves. I just want to say
if you have the following code :
INVOKE TranslateMessage,ADDR Msg
INVOKE DispatchMessage,ADDR Msg
It is tranlated like this :
LEA EAX,Msg
push EAX
CALL TranslateMessage
LEA EAX,MSG
PUSH EAX
CALL DispatchMessage
One way to optimize could be by pushing twice the EAX register
like this the second LEA EAX,Msg and PUSH EAX are simply replaced with only one
PUSH EAX. The new code is :
LEA EAX,MSG
PUSH EAX ; That is for DispatchMessage
PUSH EAX ; That is for TranslateMessage
CALL TranslateMessage
CALL DispatchMessage
What have we won ? One line, and one or two clocks ? Wonderful, but
there is better to do. We will keep the manner the parameters are pushed onto
the stack. When the call to the TranslateMessage is made, it pushes the return
address which is the address of the CALL DispatchMessage, and it jumps to the
TranslateMessage function. OK ?
I suggest that when the TranslateMessage ends it does not return to the line
following this call but goes directly to the DispatchMessage function. This is
made like this :
LEA EAX,Msg
; Load Structure address into EAX
PUSH
EAX
; That is for DispatchMessage
PUSH OFFSET
@F
; Return address for DispatchMessage
PUSH
EAX ;
That is for TranslateMessage
PUSH OFFSET DispatchMessage ; Return address for
TranslateMessage
JMP
TranslateMessage ; Call the function
@@ :
Is it clear ? Good, now there is a last optimizations to do. If the
last call is followed by a JMP instruction. Wa have something like this :
LEA EAX,Msg
; Load Structure address into EAX
PUSH
EAX
; That is for DispatchMessage
PUSH OFFSET
@F
; Return address for DispatchMessage
PUSH
EAX ;
That is for TranslateMessage
PUSH OFFSET DispatchMessage ; Return address for
TranslateMessage
JMP
TranslateMessage ; Call the function
@@ : JMP Loop
Now we just have to replace "PUSH OFFSET @F" with "PUSH OFFSET
Loop". The final code is :
@Loop
:
LEA
EAX,Msg
; Load Structure address into EAX
PUSH
EAX
; That is for DispatchMessage
PUSH
OFFSET Loop
; Return address for DispatchMessage
PUSH
EAX ;
That is for TranslateMessage
PUSH OFFSET DispatchMessage ; Return
address for TranslateMessage
JMP
TranslateMessage ; Call the function
@@ :