Re: Double-Checked Locking pattern issue




Thanks for your correction, Ben!


I agree with you. :-)

I understand that, in general, reordering instructions to make full use of
the pipeline is a good idea. But in my specific case, why is it faster to
swap step 2 and step 3? Could you describe that in more detail, please?


regards,
George

"Ben Voigt [C++ MVP]" wrote:


"George" <George@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:D8AE73B1-06F4-4A4B-9986-11DF04C9165B@xxxxxxxxxxxxxxxx
Thanks Ben,


I understand in general how a pipeline works. :-)

But I am not sure how the general rules apply to my specific case (assuming
the compiler does the reordering for the sake of the pipeline). Any ideas?

Out of order execution is reordering in the CPU, not the compiler, to make
more efficient use of the pipeline.
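
For reference, here is a minimal sketch of the pattern under discussion, written so that the reordering cannot be observed. It assumes a C++11 or later compiler (std::atomic and std::mutex postdate this thread), and it assumes that "step 2" and "step 3" mean running the constructor and storing the pointer, as in the usual description of the pattern. Widget and WidgetHolder are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Widget {
    int value;
    Widget() : value(42) {}
};

class WidgetHolder {
public:
    // Returns the single shared Widget, creating it on first use.
    // The Widget is intentionally never deleted in this sketch.
    static Widget* instance() {
        // First check, without the lock. The acquire load pairs with the
        // release store below: a thread that sees a non-null pointer also
        // sees the fully constructed Widget.
        Widget* p = ptr_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(mutex_);
            // Second check, under the lock: another thread may have
            // created the object while this one was waiting.
            p = ptr_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                // Steps 1 and 2 (assumed numbering): allocate the memory
                // and run the constructor.
                p = new Widget();
                // Step 3: publish the pointer. The release ordering keeps
                // the constructor's writes from becoming visible after
                // the pointer store.
                ptr_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
    static std::atomic<Widget*> ptr_;
    static std::mutex mutex_;
};

std::atomic<Widget*> WidgetHolder::ptr_{nullptr};
std::mutex WidgetHolder::mutex_;
```

With a plain pointer and no ordering constraints, nothing stops the pointer store from becoming visible to another thread before the constructor's writes do, which is the failure this thread is about. The acquire/release pair is one way to rule that out; it is not the only one.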



regards,
George

"Ben Voigt [C++ MVP]" wrote:


"George" <George@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:AFF9AD5D-1564-4099-AA56-C36A23EA124B@xxxxxxxxxxxxxxxx
Thanks Igor,


to memory is expensive, it is conceivable the CPU may reorder the write to
the pointer as early as possible.

The only reason I can think of is that writing to memory earlier frees the
register, so it can be reused later.

Why would writing early improve performance? Could you give more detail or
some pseudocode, please?

You aren't considering pipelining, cache effects, speculative branching, or
any of the other things that new CPUs use to retire multiple instructions
per clock.

In short, while some CPUs can retire four instructions per clock, there
aren't four copies of every unit, and they certainly can't transfer that
much to/from memory at once. So the CPU reorders instructions so that
instructions using different parts of the CPU execute together. In essence,
this is what hyperthreading also does, except it interleaves a separate
flow of execution instead of reordering a single flow.

All of this is part of the reason that function calls are so very expensive.
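
As a rough illustration of instructions that can execute together, the sketch below sums the same data twice: once as a single dependency chain, and once as two independent chains that an out-of-order core is free to overlap. The function names are hypothetical, and whether the second version is actually faster depends on the compiler and the CPU; both return the same result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One long dependency chain: each addition needs the previous result.
std::uint64_t sum_single_chain(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Two independent chains: the additions into `even` and `odd` do not
// depend on each other, so the CPU may execute them in parallel.
std::uint64_t sum_two_chains(const std::vector<std::uint32_t>& v) {
    std::uint64_t even = 0;
    std::uint64_t odd = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    // Pick up the last element when the size is odd.
    if (i < v.size()) {
        even += v[i];
    }
    return even + odd;
}
```

The point is only that the second loop gives the hardware two streams of work with no dependency between them; an optimizing compiler may well transform the first loop in a similar way on its own.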



regards,
George





