Re: Inline assembler reference

> Everything really depends on the reason why people use inline assembly. When
> I hear that the main reason is "speed", normally I just laugh. Performance
> improvement can be a reason for using inline assembly only if you use it in
> code that may run at IRQL >= DISPATCH_LEVEL and/or while interrupts are
> disabled; otherwise, the performance improvement (if any; as both you and
> Skywing have properly pointed out, it may result in a slowdown rather than
> an improvement) is negligible.

You assume that we notice performance isn't all it could be, reach for inline
assembly, and then, voilà, simply believe that performance has magically
improved.

Every time I've used inline assembly to improve performance, it has been under
the microscope of a profiler and other performance testing techniques. I have
never released anything that performed worse after adding inline assembly
than it did before.
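A before/after comparison of the kind described above can be sketched in C
as follows. This is a minimal sketch, not my actual test setup: `kernel_fn`,
`best_time_ns` and `sum_scalar` are hypothetical names, wall-clock time comes
from C11 `timespec_get`, and taking the best of several runs is just one
common way to reduce noise. A real profiler tells you far more.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the routine under test: processes n floats. */
typedef float (*kernel_fn)(const float *data, size_t n);

/* Current wall-clock time in nanoseconds (C11 timespec_get). */
static double now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Sink that keeps the compiler from discarding the timed calls. */
static volatile float bench_sink;

/* Run fn `reps` times and return the fastest run in nanoseconds.
   Taking the minimum filters out interrupts, context switches and
   cold caches on the first call. Returns -1.0 if reps <= 0. */
double best_time_ns(kernel_fn fn, const float *data, size_t n, int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        double t0 = now_ns();
        bench_sink = fn(data, n);
        double elapsed = now_ns() - t0;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Plain C baseline. A hand-optimized variant must produce the same
   result before its timing means anything. */
float sum_scalar(const float *data, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += data[i];
    return s;
}
```

Time the original routine and the hand-optimized one on the same input,
check that both return the same result, and keep the new version only if its
best time is actually lower.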

For a gain of a few nanoseconds or milliseconds I usually don't bother,
depending on where the bottleneck is. If the bottleneck is all I/O and user
interaction, using assembly usually isn't justified. But if it's in a critical
loop of an important calculation, then with hand-optimized SSE*/MMX/floating
point instructions (MMX doesn't count on 64-bit Windows) I've always been able
to do better than both the compiler and the intrinsics the compiler vendor
provides. Even when it made the rest of the function slower, the net gain was
still better than before the hand-optimization. I would have to design the
inner loops differently if the entire function or module had to be written in
assembly and linked in at build time, and I'd rather not do that if it's
avoidable.

I've never written a device driver, so I have no idea what the performance
implications there might be... and yes, I realize this is a kernel-related
newsgroup. But at the application level, my hand optimizations (inline
assembly) with the 32-bit compiler have always netted me 10-150% better
performance than either no hand optimization or intrinsics, depending on what
I'm doing.

I suppose it's because, when inlining assembly, I can make certain assumptions
that the compiler would never know how to make, and they work in my favor. For
a critical loop, even a 10% performance gain is appreciable; anything more is
just icing on the cake.
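As an illustration of the kind of assumption meant here, the sketch below
sums floats with SSE, relying on two facts the caller guarantees but a
compiler cannot prove on its own: the buffer is 16-byte aligned and its length
is a multiple of 4. It uses SSE intrinsics rather than inline assembly, a
swapped-in technique chosen because intrinsics compile with GCC, Clang and
MSVC alike; it needs an x86/x86-64 target, and `sum_sse_aligned` is a
hypothetical name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_add_ps, ... */

/* Sum n floats four lanes at a time.
   Caller-guaranteed assumptions (checked in debug builds):
     - data is 16-byte aligned, so the aligned load _mm_load_ps is safe;
     - n is a multiple of 4, so no scalar tail loop is needed.
   The four partial sums reorder the additions, so results may differ
   slightly from a sequential scalar loop for non-integral inputs. */
float sum_sse_aligned(const float *data, size_t n)
{
    assert(((uintptr_t)data & 15u) == 0 && n % 4 == 0);

    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(data + i));

    /* Horizontal reduction of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

A general-purpose version would need an unaligned load or a peeling loop for
the head and a scalar loop for the tail; dropping that code is exactly where
knowing the data layout pays off.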

I would never release something that performs worse after hand-optimizing
than without it.


Thanks,
Shawn
