Re: Inline assembler reference


I would say that an improvement normally has to reach about 30% before the
user notices it, unless you are talking about runtimes in the tens of minutes.
Anything less is not significant enough to impress anyone except the
"benchmarks are gold" group. True, the cumulative inefficiencies of the OS
and other software keep eating up the gains in processor and disk speed, so in
many ways our systems are no faster than an old 286 system. Most of the
software we use has many more features, but the basic functionality that most
of us use is no faster. How fast can you type? Can the screen keep up with
what you are typing? It couldn't on an old CP/M system with WordStar and an
80-words-per-minute typist. I am only about 60 WPM, so it wasn't too bad.

"Shawn B." <leabre@xxxxxxxx> wrote in message
news:eb1P3EQhGHA.3916@xxxxxxxxxxxxxxxxxxxxxxx
Everything really depends on the reason why people use inline assembly.
When I hear that the main reason is "speed", normally I just laugh.
Performance improvement can be the reason for using inline assembly only if
you use it in code that may run at IRQL >= DISPATCH_LEVEL and/or while
interrupts are disabled. Otherwise the performance improvement (if any; as
both you and Skywing have properly pointed out, it may result in a slowdown
rather than an improvement) is just negligible.

You make the assumption that we notice that performance isn't all it can
be, then we use inline assembly, and then voila, we just believe that
performance has magically improved.

In all the times where I've used inline assembly to improve performance, it
has been under the microscope of a profiler and other performance-related
testing techniques. I have never released something that had less
performance after using inline assembly than it did before I used inline
assembly.
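
As a rough illustration of that before-and-after discipline, here is a
minimal sketch in portable C. It is not the poster's code: the names
sum_baseline, sum_candidate, time_calls and speedup are hypothetical, and
the "candidate" is a plain C unrolled loop standing in for wherever a
hand-tuned inline assembly version would go. The point is only the shape of
the check: confirm both versions return the same result, then compare timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*sum_fn)(const uint32_t *v, size_t n);

/* Baseline: straightforward loop, left entirely to the compiler. */
static uint64_t sum_baseline(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Candidate: hypothetical stand-in for a hand-tuned version.
   Four independent accumulators; inline assembly would replace this body. */
static uint64_t sum_candidate(const uint32_t *v, size_t n)
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += v[i];
        b += v[i + 1];
        c += v[i + 2];
        d += v[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        a += v[i];
    return a + b + c + d;
}

/* Run fn `reps` times; return CPU seconds and store the last result. */
static double time_calls(sum_fn fn, const uint32_t *v, size_t n,
                         int reps, uint64_t *out)
{
    volatile uint64_t sink = 0; /* keeps the calls from being discarded */
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = fn(v, n);
    clock_t end = clock();
    *out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Ratio baseline_time / candidate_time; above 1.0 means the candidate won.
   Returns 0.0 when the candidate ran too fast for clock() to measure. */
static double speedup(const uint32_t *v, size_t n, int reps)
{
    uint64_t r0 = 0, r1 = 0;
    double t0 = time_calls(sum_baseline, v, n, reps, &r0);
    double t1 = time_calls(sum_candidate, v, n, reps, &r1);
    assert(r0 == r1);           /* a faster wrong answer is not a win */
    return t1 > 0.0 ? t0 / t1 : 0.0;
}
```

clock() is coarse, so a harness like this only means something with a large
array and many repetitions; a real profiler, as described above, gives a far
better picture of where the time actually goes.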

Usually, for a few nano/milliseconds I may not even bother, depending on
where the bottleneck is. If the bottleneck is all I/O and user related,
then using assembly really isn't always justified. But if it's in a
critical loop of an important calculation, then by hand-optimizing with
SSE*/MMX/floating-point instructions (MMX doesn't count on 64-bit Windows)
I've always been able to do better than both the compiler and the
intrinsics the compiler vendor provides. And even if it made performance
for the rest of the function worse, the net gains were still better
than before the hand-optimizations. I would have to design the inner loops
differently if the entire function/module had to be coded in assembly and
linked in at build time. I'd rather not do that if it is avoidable.

I've never written a device driver so I have no idea what kinds of
performance implications there might be... and yes, I realize this is a
kernel-related newsgroup. But at the application level, my hand
optimizations (inlining assembly) on the 32-bit compiler have always netted
me 10-150% better performance than either code without hand optimization or
intrinsics, depending on what it is I'm doing.

I suppose it's because when inlining assembly, I can make certain
assumptions that the compiler would never know how to make, and that works
in my favor. But for a critical loop, even a 10% performance gain is
appreciable; any more than that is just icing on the cake.
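
One common example of such an assumption is aliasing. The sketch below is
not from the post and uses hypothetical function names; it shows the kind of
fact a programmer may know but the compiler cannot prove, expressed here
with the C99 restrict qualifier rather than assembly (older Microsoft
compilers spell it __restrict).

```c
#include <stddef.h>

/* The compiler has to assume dst might overlap src or *scale, so a store
   to dst[i] could change them; it may reload *scale on every iteration
   and be more cautious about vectorizing. */
void scale_may_alias(float *dst, const float *src,
                     const float *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* Same loop, but the caller promises the three pointers never overlap.
   That promise is the programmer's assumption; the compiler does not
   check it, and breaking it is undefined behavior. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

Both functions compute the same values when the buffers are distinct;
whether the second is actually faster depends on the compiler and flags, so
it still needs to be measured.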

I would never release something that has worse performance after
hand-optimizing than without.


Thanks,
Shawn


