Re: Memory Allocation in a Multi-Threaded Environment



See below...
On Sun, 8 Jun 2008 18:20:16 +0100, "Jeff" <someone@xxxxxxxxxxxxx> wrote:


"David Lowndes" <DavidL@xxxxxxxxxxxxxxx> wrote in message
news:dnkn44l5vhk751e51pb8t39nu7tmirq0do@xxxxxxxxxx
I quickly realised that multi-threading opens up a whole new slew of bugs
for the novice, but there is one area that is really confusing me.
I have a small class that uses a std::vector. I create a new worker
thread with AfxBeginThread - this thread will expand/contract the vector
(and do some other stuff) and then return. The problem is, I seem to get
heap corruption errors when I try to access the memory allocated in the
worker thread after that thread has finished. Is this to be expected or
should I be looking elsewhere for the bug?

Jeff,

It's not expected, but without more details it's nigh on impossible to
know where your error lies. Try the same code in a single threaded
test and see if you suffer the same issue.
****
In general, I believe if you are sharing the same data between two threads, the design is
already hopeless and needs to be redone. See, for example, my essay "the best
synchronization is no synchronization", the point being that you should not create designs
where synchronization is required for correct behavior.

It would not at all be surprising to see heap corruption errors if you are sharing a
std::vector between two threads. It has nothing to do with "storage allocation" as such,
but a LOT to do with the fact that none of the STL containers are "thread-safe" and
therefore should not be used by one thread unless the entire container is locked against
access by all other threads. This is usually a design error, because it can mean lengthy
locks, and therefore is usually a Very Bad Idea.

For example, imagine you are iterating over a std::vector of n elements. While you are
doing the iteration, and modifying the elements, some other thread does a push_back. Alas,
the first thread, which cleverly has a *pointer* to the array position, will find itself
writing to unallocated memory, because the push_back could have freed the old vector
contents after copying to the new vector.

This means you must never allow ANY access to the vector that can change it while another
thread is using it.
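
To make the reallocation hazard concrete, here is a minimal single-threaded sketch. It only records the address of the vector's storage before and after a push_back on a full vector; the stale address is never dereferenced. The function name push_back_moves_storage is made up for illustration, and the sketch assumes the default std::allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if push_back on a full vector moved the element storage.
// Addresses are captured as integers, so the stale pointer is never
// dereferenced (dereferencing it would be undefined behavior).
bool push_back_moves_storage()
{
    std::vector<int> v;
    v.reserve(4);
    while (v.size() < v.capacity())    // fill exactly to capacity
        v.push_back(0);

    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    const std::size_t old_capacity = v.capacity();

    v.push_back(1);                    // size == capacity, so this must reallocate

    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(v.data());
    assert(v.capacity() > old_capacity);
    return before != after;            // the old block was freed after the copy
}
```

Any pointer, reference, or iterator a second thread obtained before that push_back now refers to freed memory, which is the situation described above.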
*****


Thanks a lot for that, I guess the issue must be somewhere else then.

The particular example concerns reading and processing data from a URL file.
A worker thread is created to read the entire file into a buffer whilst the
main thread begins processing the data as it becomes available. The program
was working fine on my local host where the file was "downloaded" almost
before the processing had begun. When I try it on a remote file however the
heap corruption problem arises - I suspect because the processing is now
catching up with the download. I'm using a local data member to synchronise
reading and resizing of the buffer. In particular the buffer can only be
resized if it isn't being read and can only be read if it isn't being
resized - since a resize could move the buffer. I assume that the buffer can
be read and written to simultaneously.
*****
This sounds like a really bad design. If you want to handle partial downloads, this is
about the worst possible way to do it. What you should do is have a thread that is taking
the partial download and filling a buffer; when the buffer is filled, you
PostMessage/PostThreadMessage/PostQueuedCompletionStatus this buffer to the thread that is
doing the analysis. It performs the analysis as far as it can, then, if it has run out of
things to do, waits for more information to come down (and this is not as simple as it
sounds). When more information arrives, you resume processing using the new information.
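
A rough sketch of that handoff follows. It is not the Win32 mechanism named above: BufferQueue is a hypothetical stand-in, written in portable standard C++, for the thread message queue or I/O completion port, and download_and_count merely simulates a download by slicing a string. The point it tries to show is that each buffer is filled by one thread, handed over whole, and never touched by the sender again.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Buffer = std::vector<char>;      // one independently owned chunk

// Hypothetical stand-in for a thread message queue or I/O completion port.
class BufferQueue {
public:
    void post(Buffer b) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(b));
        }
        cv_.notify_one();
    }
    void close() {                     // no more buffers will be posted
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a buffer arrives; returns false once closed and drained.
    bool wait(Buffer& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Buffer> q_;
    bool closed_ = false;
};

// Simulated download: the producer fills a fresh buffer per chunk and posts it.
std::size_t download_and_count(const std::string& data, std::size_t chunk_size)
{
    assert(chunk_size > 0);
    BufferQueue queue;
    std::size_t bytes_seen = 0;

    std::thread consumer([&] {
        Buffer chunk;
        while (queue.wait(chunk))
            bytes_seen += chunk.size();    // placeholder for the real analysis
    });

    for (std::size_t pos = 0; pos < data.size(); pos += chunk_size) {
        const std::size_t n = std::min(chunk_size, data.size() - pos);
        queue.post(Buffer(data.data() + pos, data.data() + pos + n));
    }
    queue.close();
    consumer.join();                   // join() makes bytes_seen safe to read
    return bytes_seen;
}
```

For example, download_and_count("hello, world", 5) posts three buffers of 5, 5, and 2 bytes and returns 12. The only shared object is the queue itself; the buffers change owner and are never resized while another thread is looking at them.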

I would never consider any possibility of doing this by using a single shared buffer
between the threads. It is far too difficult to get right (and I've been doing
multithreaded programming since 1968). Your assumption that the buffer can be read and
written simultaneously is very suspect. I would work with partial buffers which were
always treated as being independent; and I would use a model called "distributed finite
state machine" or "distributed context free grammar" to handle the parsing. It looks
harder, but it ultimately is not, and is more robust. Key here is to not confuse state
with stack depth, that is, state is contained in variables, and is not part of the call
stack at all. So at any time, you can return to the main data-fetching loop and all the
state is there. If you've run out of data, you just block waiting for more data to be
delivered, and you can resume the parse at any time because everything you need is stored
as part of the state of your parse.
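
As an illustration of keeping the state in variables rather than on the stack, here is a small sketch of a resumable parser. The grammar (unsigned integers separated by commas) and the names NumberListParser and parse_in_chunks are invented for the example; a real parser would have more states, but the shape is the same.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical example grammar: unsigned integers separated by commas.
class NumberListParser {
public:
    // Consume one chunk; call again whenever more data arrives.
    void feed(const std::string& chunk) {
        for (char c : chunk) {
            if (c >= '0' && c <= '9') {
                current_ = current_ * 10 + (c - '0');
                in_number_ = true;
            } else if (c == ',') {
                finish_number();
            }
            // any other character is ignored in this sketch
        }
    }
    // Call once after the last chunk to flush a trailing number.
    void end_of_input() { finish_number(); }
    const std::vector<long>& values() const { return values_; }
private:
    void finish_number() {
        if (in_number_) values_.push_back(current_);
        current_ = 0;
        in_number_ = false;
    }
    // All parse state lives here, not in local variables or call depth.
    bool in_number_ = false;
    long current_ = 0;
    std::vector<long> values_;
};

// Feeds the chunks one at a time, as if each had arrived separately.
std::vector<long> parse_in_chunks(const std::vector<std::string>& chunks)
{
    NumberListParser p;
    for (const std::string& c : chunks) p.feed(c);
    p.end_of_input();
    return p.values();
}
```

Feeding the chunks "12,3", "4,5", "6" yields 12, 34, 56: the numbers split across chunk boundaries are completed when the next chunk arrives, because feed() can return at any point without losing anything.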

If you have to wait to resize until the processing is done, or delay the processing until
the resize is done, you have given away most of the advantage of multiple threading,
because you have forced the threads to run in "lock-step", one or the other but not both.
I consider this bad design. Create your threads to run completely independently and
asynchronously, with no interlocks between them because none are required by such a
design.
joe
*****

If I do the download before (and in the same thread as) the processing then
I have no problem even with a remote Url.
****
This says that your fundamental design has flaws, and you need to redesign.
****

Does anyone have a pointer to a simple program that does this type of
simultaneous buffer and process operation?
*****
It's called the "producer-consumer" model and is one of the oldest and simplest of
designs. There is a discussion of doing this on my Web site; see my essay on semaphores.
But my own preference is to, whenever possible, avoid events, mutexes, and semaphores,
ESPECIALLY mutexes and semaphores, and use the built-in primitives. See my essays on
worker threads, on UI threads, and the use of I/O Completion Ports for interthread
queuing, all on my MVP Tips site.
joe
*****

Thanks again
Jeff

Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm