Re: volatile and win32 multithreading

From: Jerry Coffin (jcoffin_at_taeus.us)
Date: 01/26/05


Date: Tue, 25 Jan 2005 20:52:53 -0700

In article <esk28GxAFHA.2712@TK2MSFTNGP15.phx.gbl>,
MikeAThon2000@nospam.hotmail.com says...

[ ... ]

> Thanks for your thorough reply. I have seen some of those comments before
> (guess where <g!>) and I admit that I still do not fully understand all of
> the points you make.

Perhaps it would help to step back for a moment.

Volatile makes the compiler produce code that reads from memory when
the value is read and writes to memory when the value is written. The
problems arise primarily due to caching: when the code attempts to
read or write memory, it will typically only REALLY read from or write
to the cache. What goes into the cache doesn't normally get written
out to memory immediately.

In fact, the processor typically attempts to keep things in the cache
as long as it can. It will flush something out to memory only when it
needs to. The primary reason is that the cache is full, and it needs
to make room to load something new. In this case, it attempts to find
the least recently used line in the right part of the cache, and
flushes it out to memory. There are two complications here: first,
keeping track of exactly when each item was last used takes too much
space, so the processor really only guesses at which line is the
least recently used. Second, an item at any particular address always
gets put into one of a (fairly small) number of specific places in
the cache.
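
To make the "specific places" point concrete, here is a minimal sketch
of how an address is commonly mapped to a set in a set-associative
cache. The CacheGeometry struct, the set_index function and the sizes
used with them are hypothetical, chosen only for illustration; real
processors differ in the details and may hash the address bits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical cache geometry, for illustration only.
struct CacheGeometry {
    std::size_t line_bytes;  // bytes per cache line
    std::size_t num_sets;    // number of sets in the cache
    std::size_t ways;        // lines per set (4 for a four-way cache)
};

// Which set an address falls into: drop the offset-within-line part,
// then take the line number modulo the number of sets. Any address
// can only ever live in one of the `ways` lines of that one set.
inline std::size_t set_index(std::uintptr_t addr, const CacheGeometry& g) {
    return static_cast<std::size_t>((addr / g.line_bytes) % g.num_sets);
}
```

With 64-byte lines and 128 sets, addresses that are 8192 bytes apart
(64 * 128) land in the same set, so a handful of such addresses is
enough to push an older line out of a four-way set.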

Assume I have two volatile variables X and Y. I update them in that
order, and since they're volatile, I assume other processors will see
the updates in that order as well. Having finished that, I do
something else that reads from A, B, C and D. Just for the sake of
argument, we'll assume this is a four-way set-associative cache, and
that A, B, C and D all happen to map to the same set of cache lines
as Y did. Since we read A, B and C after we updated Y, when we read
D, Y is now the oldest item in that set, so it gets flushed out
to memory. Meanwhile, we haven't done anything that touches the part
of the cache that X is in, so it hasn't been written out yet, and we
haven't even a good idea of when it will be.
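
The scenario above can be replayed with a toy model. This is only a
sketch under stated assumptions: CacheSet and replay_example are
hypothetical names, the model uses exact LRU (real hardware only
approximates it, as noted earlier), and it assumes X sits alone in
one set while Y, A, B, C and D all map to another.

```cpp
#include <algorithm>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// One set of a set-associative cache with exact LRU replacement.
class CacheSet {
public:
    explicit CacheSet(std::size_t ways) : ways_(ways) {}

    // Touch a line by name. Returns the name of the line that was
    // evicted (written back) to make room, or "" if none was.
    std::string touch(const std::string& name) {
        auto it = std::find(lines_.begin(), lines_.end(), name);
        if (it != lines_.end()) {
            lines_.erase(it);            // hit: becomes most recent
            lines_.push_back(name);
            return "";
        }
        std::string evicted;
        if (lines_.size() == ways_) {    // miss in a full set
            evicted = lines_.front();    // oldest line goes to memory
            lines_.erase(lines_.begin());
        }
        lines_.push_back(name);
        return evicted;
    }

private:
    std::size_t ways_;
    std::vector<std::string> lines_;     // front = least recently used
};

// Replays: write X, write Y, then read A, B, C, D.
// Returns the lines written back to memory, in order.
std::vector<std::string> replay_example() {
    CacheSet set_x(4), set_y(4);         // assumed: X and Y in different sets
    std::vector<std::string> written_back;
    auto access = [&](CacheSet& s, const char* name) {
        std::string e = s.touch(name);
        if (!e.empty()) written_back.push_back(e);
    };
    access(set_x, "X");
    access(set_y, "Y");
    for (const char* n : {"A", "B", "C", "D"}) access(set_y, n);
    return written_back;
}
```

Calling replay_example() returns a list containing only "Y": in this
model Y reaches memory when D is read, while X, although written
first, is still sitting in the cache.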

Worse: memory is already a bottleneck in most situations. When you
add more processors, writing lots of updates out to memory becomes
even more of a bottleneck. Therefore, the more processors you want to
support, the harder you work at keeping things in caches, and the
more you (usually) relax how up-to-date you keep memory. Now that
processor clock speeds aren't going up constantly like they used to,
we can expect nearly all machines to start to have more and more
processors. Code that can't take advantage of them will be considered
poor, and code that works incorrectly on an MP machine will become
essentially unusable.

-- 
    Later,
    Jerry.
The universe is a figment of its own imagination.