Re: Yet again - closing a thread



See my post which is part of this thread.

In the system I described, which I spent a year making bulletproof, most of the effort
went into adding exception handlers to recover at each level where recovery was needed.
It was an event-driven system, so in effect the interior of the "GetMessage" loop, in
Windows terms, was protected by an exception frame (much as the MFC dispatch handler is
protected).
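
A minimal sketch of that idea in portable C++, assuming a simple in-memory event queue: each handler runs inside its own try/catch, so one failing handler is logged and the loop keeps going. The names Event and run_protected_loop are made up for illustration and are not from the system described above. Note that this catches only C++ exceptions; recovering from hardware faults such as an access violation needs an OS mechanism (structured exception handling on Windows), which is not shown here.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type: a name for logging plus the handler to run.
struct Event {
    std::string name;
    std::function<void()> handler;
};

// Drains the queue, running every handler inside its own exception frame.
// A handler that throws is recorded and skipped; the loop itself survives.
// Returns one log line per failed handler.
std::vector<std::string> run_protected_loop(std::queue<Event>& events) {
    std::vector<std::string> failures;
    while (!events.empty()) {
        Event ev = std::move(events.front());
        events.pop();
        assert(ev.handler && "every event needs a handler");
        try {
            ev.handler();
        } catch (const std::exception& e) {
            failures.push_back(ev.name + ": " + e.what());
        } catch (...) {
            failures.push_back(ev.name + ": unknown exception");
        }
    }
    return failures;
}
```

A caller would push events, call run_protected_loop, and then decide per failure whether to retry, reset some state, or give up. Logging and carrying on is the simplest possible recovery policy; real recovery code at each level would do considerably more.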

What pleased me, and I still have the printout, was the day the program took a parity
error which had been reflected to the user app by the kernel. My recovery code was so
thorough that it actually recovered from this error (probably three months' effort went
into that piece of code, including some kernel changes to allow me to effectively restart
the entire app "in place" and retrieve all handle state by the metadata I could attach to
each handle).

I also regularly recovered from an access fault that we never did find the cause of. I
finally gave up and made the code robust. So when the access fault was taken (it was a
highly-multithreaded app on a 16-way multiprocessor; we always suspected some weird race
condition), I was able to recover and keep running. We had one of these about every three
days. In the entire year we (and I mean a team of several people) could never figure out
the cause. At least one theory was that occasionally the multiported memory might be
returning a 0 value.

But as you point out, it is very likely a bug in the program. As such, it is critical to
identify what the bug actually is, which the one-line loop cannot possibly do.
joe

On Fri, 03 Feb 2006 11:04:37 -0600, "Doug Harrison [MVP]" <dsh@xxxxxxxx> wrote:

On Fri, 03 Feb 2006 10:33:04 -0600, "Doug Harrison [MVP]" <dsh@xxxxxxxx>
wrote:

On Fri, 3 Feb 2006 03:26:12 -0800, "dududuil"
<dududuil@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Thanks all, I've made some serious progress since, and true - the main problem
was some windowing stuff the worker thread had done.

One more question: assuming that my thread does simple tasks and terminates
itself very quickly, if the Wait call fails (WHY?) I cannot continue - and
have no solution but to terminate everything altogether - which is
unacceptable.

Now that's a good point. If you know your handle is valid, and you have the
necessary access rights, then the Wait will not fail. If it does fail, it
will be due to a bug, most likely in your code, or you might have a
hardware failure. I suppose it could also fail if some malicious code
injected itself into your process and invalidated the handle.

Does anyone have an idea of what to do after the Wait call fails, when you
still must be certain that the thread has terminated itself?

If the Wait fails, then all bets are off. It indicates the program state has
changed in such a way that you can no longer reason correctly about its
behavior, and it would be reasonable to terminate your program right then
and there. You might instead throw an exception.

I added the last sentence at the last minute, and assuming you're talking
about Wait returning WAIT_FAILED (which I was assuming), I already regret
it. Clearly, you expect the wait to succeed, and if it doesn't, you have an
error whose cause you can only guess at. C++ exceptions are for dealing
with anticipated errors due to known causes that occur in correct programs.
The problem with throwing an exception here is that it conflates errors due
to bugs or other random glitches with errors that can occur in the normal
operation of your program. This can lead to errors far away from the Wait
call, which can greatly hinder debugging; instead of catching the bug in
the act, you can end up having to track it down to the scene of the crime
based on indirect evidence that something bad happened. Also, if program
state is corrupted, who can say that stack unwinding won't do more harm
than good? So this is a difficult question. It would be reasonable for a
debug build to assert and for a release build to std::terminate() or
abort(). When things that "cannot happen" do happen, the first rule is, "Do
no harm." For example, if you're a word processor, you shouldn't save all
the user's documents by overwriting his existing good files on disk; before
exiting, you should tell him something went wrong and perhaps offer to
write the documents to new files that he can inspect later.
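
To make the "do no harm" rule concrete, here is a rough sketch, assuming C++17 and std::filesystem, of saving a document's in-memory text to a new file next to the original instead of overwriting it. The function names and the ".recovered-N" naming scheme are invented for this example.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Picks a sibling path such as "report.recovered-1.txt" that does not exist yet.
// (Hypothetical naming scheme; there is a small window between this check and
// the later file creation, which a real implementation would need to close.)
fs::path unused_recovery_path(const fs::path& original) {
    for (int n = 1;; ++n) {
        fs::path candidate = original;
        candidate.replace_filename(original.stem().string() + ".recovered-" +
                                   std::to_string(n) +
                                   original.extension().string());
        if (!fs::exists(candidate)) {
            return candidate;
        }
    }
}

// Writes the in-memory text to a new file and returns its path, or an empty
// path if the write failed. The original file is never opened for writing.
fs::path save_recovery_copy(const fs::path& original, const std::string& text) {
    const fs::path target = unused_recovery_path(original);
    std::ofstream out(target, std::ios::binary);
    out << text;
    out.close();
    if (!out) {
        return fs::path{};
    }
    return target;
}
```

A fatal-error path could call save_recovery_copy for each open document, tell the user where the copies went, and only then call abort(). Whether it is safe to touch document state at all after a "cannot happen" failure is a judgment call; if the in-memory data might be corrupt, at least the good files on disk are left alone.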
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm