Re: Inline assembler reference
- From: "Shawn B." <leabre@xxxxxxxx>
- Date: Wed, 31 May 2006 15:26:07 -0700
Everything really depends on the reason why people use inline assembly.
When I hear that the main reason is "speed", normally I just laugh.
Performance improvement can be the reason for using inline assembly only
if you use it in code that may run at IRQL >= DISPATCH_LEVEL and/or while
interrupts are disabled. Otherwise, the performance improvement (if any;
as both you and Skywing have properly pointed out, it may result in a
slowdown rather than an improvement) is just negligible.
You're assuming that we notice performance isn't all it could be, drop in
some inline assembly, and then, voila, simply believe that performance has
magically improved.
Every time I've used inline assembly to improve performance, it has been
under the microscope of a profiler and other performance-testing
techniques. I have never released something that performed worse after
using inline assembly than it did before.
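To illustrate the measure-before-and-after workflow described above, here is
a minimal sketch in portable C. It is not code from this thread: the names
dot_baseline, dot_candidate, and time_dot are hypothetical, and the
"candidate" is just a 4x unrolled loop standing in for whatever hand-tuned
routine is actually being evaluated. A real profiler gives far better data
than clock(); this only shows the shape of the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical baseline: plain scalar dot product. */
static float dot_baseline(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical candidate: a 4x unrolled loop standing in for the
 * hand-optimized routine under test. */
static float dot_candidate(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Time `reps` calls of fn and return the elapsed CPU seconds.
 * The accumulated result is written to *sink so the compiler
 * cannot discard the calls as dead code. */
static double time_dot(dot_fn fn, const float *a, const float *b,
                       size_t n, int reps, float *sink)
{
    volatile float acc = 0.0f;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        acc += fn(a, b, n);
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

The idea is to call time_dot once with each implementation on the same
inputs, confirm both return the same answer, and keep the candidate only if
it is measurably faster on the real workload.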
Usually, for a few nanoseconds or milliseconds I may not even bother,
depending on where the bottleneck is. If the bottleneck is all I/O and user
related, then using assembly usually isn't justified. But if it's in a
critical loop of an important calculation, then by hand-optimizing with
SSE*/MMX/floating-point instructions (MMX doesn't count on 64-bit Windows)
I've always been able to do better than both the compiler and the
intrinsics the compiler vendor provides. And even if it made performance
for the rest of the function worse, the net result was still better than
before the hand-optimizations. I would have to design the inner loops
differently if the entire function/module had to be coded in assembly and
linked in at build time. I'd rather not do that if it's avoidable.
I've never written a device driver, so I have no idea what kinds of
performance implications there might be... and yes, I realize this is a
kernel-related newsgroup. But at the application level, my hand
optimizations (inline assembly) with the 32-bit compiler have always
netted me 10-150% better performance than both the unoptimized code and
the intrinsics, depending on what I'm doing.
I suppose it's because, when inlining assembly, I can make certain
assumptions that the compiler has no way of knowing are safe, and that
works in my favor. For a critical loop, even a 10% performance gain is
appreciable; anything more than that is just icing on the cake.
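As an illustration of the kind of assumption meant here, the sketch below
uses portable C rather than inline assembly. The two assumptions (the
buffers never overlap, and the element count is always a multiple of 4) are
my own hypothetical examples, not ones named in this post, and the function
names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* General version: the compiler must allow for dst and src
 * overlapping and for any value of n. */
static void scale_general(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Specialized version built on two things the programmer knows
 * but the compiler cannot prove on its own:
 *   1. dst and src never overlap (expressed with restrict);
 *   2. n is always a multiple of 4 (no cleanup loop needed). */
static void scale_assumed(float *restrict dst, const float *restrict src,
                          float k, size_t n)
{
    assert(n % 4 == 0);         /* documents and checks assumption 2 */
    for (size_t i = 0; i < n; i += 4) {
        dst[i]     = src[i]     * k;
        dst[i + 1] = src[i + 1] * k;
        dst[i + 2] = src[i + 2] * k;
        dst[i + 3] = src[i + 3] * k;
    }
}
```

Whether the specialized form is actually faster depends on the compiler and
target, so it still has to be measured; the point is only that the extra
knowledge lives with the programmer, not the compiler.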
I would never release something that performs worse after hand-optimizing
than it did without.
Thanks,
Shawn