Re: output.c error in multithreaded program
From: Joseph M. Newcomer (newcomer_at_flounder.com)
Date: 12/31/04
- Next message: Joseph M. Newcomer: "Re: output.c error in multithreaded program"
- Previous message: Jonathan Wood: "Re: Best Way to Add Custom Control to Dialog"
- Maybe in reply to: Joe: "Re: output.c error in multithreaded program"
- Next in thread: Joe: "Re: output.c error in multithreaded program"
- Reply: Joe: "Re: output.c error in multithreaded program"
- Messages sorted by: [ date ] [ thread ]
Date: Fri, 31 Dec 2004 15:41:46 -0500
See below...
On Thu, 30 Dec 2004 02:59:27 -0000, "Joe" <not@home.com> wrote:
>
>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>news:nku5t05qu2vilpvi3jueqbdhsaic6h0art@4ax.com...
>> On Mon, 27 Dec 2004 12:28:13 -0000, "Joe" <not@home.com> wrote:
>>
>> >
>> >"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>> >news:81dvs0hgpp195fhu2usp9j6pj37i772srt@4ax.com...
>> >> CWinThread::Create is essentially AfxBeginThread. Rarely is it
>necessary
>> >to create an
>> >> instance of CWinThread and start it separately.
>> >>
>> >> pThread = (CThreadType1 *)AfxBeginThread(CThreadType1::ThreadFunc,
>> >ThreadParams);
>> >
>> >> >Firstly.
>> >> >#####
>> >> >
>> >> >I want to set up priorites for the
>> >> >threads to access the internet - but I could find any examples so I
>set
>> >up
>> >> >this structure using semaphores. I include the function desc which
>should
>> >> >explain my rationale
>> >> >
>> >>
>>
>>>dwRet=GetHttpFileHarness(LOW_PRIORITY,pThrPars->m_strServerName,pThrPars->
>m
>> >_
>> >> >strObject, pThrPars,FALSE);
>> >> ****
>> >> I don't know what you mean by "priorities". Do you mean that requests
>will
>> >not be in FIFO
>> >> order? That's easy: just modify your queue insertion routine to put the
>> >request in the
>> >> queue in priority order.
>> >
>> >Well I was trying to be clever. Dont forget there are several processes
>with
>> >several groups.
>> >So I dont have a single queue.
>> >The application sees type A header data as more important than C. I
>didn't
>> >want a process thread drilling a header page having to wait for 20 type C
>> >threads to finish downloading their pages in another group or even
>another
>> >process entirely.
>> >The other motivation was that I dont want to hit the server hard, I
>restrict
>> >the global semaphores to 6 in total - so I can only be making 6 requests
>at
>> >a time. A new type 'A' request may get stuck behind 100 type C requests
>if I
>> >can't have some sort of prioritisation.
>> *****
>> Then there's something wrong with your structure. Type A requests would go
>in a type A
>> queue, type C requests in a type C queue. Alternatively, a type A request
>migrates past
>> all type C requests in a queue until it hits a type A request. Thus you
>implement
>> priorities easily. But using thread priorities to try to implicitly
>implement a notion of
>> application importance is almost always a bad idea, because you are
>working under the
>> illusion that these priorities actually mean something. Think of them as
>"suggestions"
>> which the system may or may not honor. Because the Balance Set Manager
>will boost a
>> starved thread to priority 15 at random times, you don't have any way to
>enforce the
>> notion that all A threads will always preempt all C threads. Messing with
>priorities is
>> risky business. Every once in a while you need to, but it should not be
>considered a way
>> of doing business. It offers no real guarantees.
>
>I don't have conventional thread priorites, just the "brute force" approach
>above where I leave a couple of global semaphore openings for priority 1
>request that priority 2 requests dont have access to.
>
>
>> The problem is that there are several problems you are trying to solve
>with fairly complex
>> mechanisms. Instead, you need to reparition your program into simpler
>tasks, and address
>> the tasks. The mechanisms that fall out will be fairly straightforward.
>>
>> I've built some really complex multithreaded programs in my career, and
>I've learned that
>> simpler structures are better. Overall, I just try to keep everything as
>simple as
>> possible, working from the goal back to the mechanisms. Otherwise, the
>complexity of the
>> mechanisms blinds you about the goal.
>>
>> For example, one thing you've said is that there can be several processes.
>A question is
>> "why?" Why not a single process? But if the processes are independent,
>then the number of
>> them doesn't really matter; if you solve the problem once; it is then
>solved for all
>> processes.
>> *****
>
>Right, I did consider having one process, with the parent processes having a
>queue(s) of URL requests to be executed at a time in the future. After a
>request was executed the next one would be set up on a Timer.
>
>But the problems I envisaged with this was firstly I thought I might hit
>some system limits.
>
>At an extreme I could have, say, 100 events. For each event I want to
>download the state of the webpage at the exact same relative time to an
>event "event". To use a ubiquitous example, say I wanted the stock price
>on 100 stocks on exchanges around the globe at 10 minute intervals and
>always *exactly* 1 minute before bourse closure. I *thought* I had more
>chance of achieving this accuracy if I had several processes, one for each
>bourse. They could all issue requests at the same time, only my global
>semaphore restricting concurrent web requests (and of course the response of
>the servers) would impinge on this accuracy. With just one process I was
>unsure how it would cope with so much "going on" at the same time.
*****
Nothing is ever going to happen "exactly" in Windows, Linux, Unix, or any other
general-purpose operating system. "Approximately" is the best you can ever hope for. And
you have no idea how long the transmission delays are; they are indeterminate. You have no
idea of the queueing that would happen at the receipt site, or the response time. So you
have little hope of getting something within a minute of any specific time, let alone
"exactly" a minute before a specified time. You have zero chance of achieving this
accuracy in any case. But since multiple processes actually involve more overhead than
multiple threads, you have created a less efficient construct than using a single process
with multiple threads. Not that any of it really matters. It is no harder to cope with all
that in one process than in multiple processes, and in fact it is easier in a single
process. You have created a situation in which you have to gate multiple processes using
even more complex mechanisms than you would need for a single process. And a hundred
anything is so tiny as to be irrelevant. Except for the limit on WaitForMultipleObjects,
worrying about whether or not a hundred anything will overload a single process or require
multiple processes is just a waste of time. Note that you consume *more* resources with
multiple processes than with a single process.
*****
>
>Note that the stock prices would be my class A requests. Depending on the
>time to bourse closure and stock price we would want occassional snapshots
>of a variable number of the most recent trades struck, say 5 to 40 - this
>would be my class C. Once we have all the trades downloaded we can do some
>stats or whatever on *all* the data. If the download succeeds I put each A
>thread assigned to a stock to sleep until the next scheduled time (the C's
>terminate) , if A fails (the site is down or whatever) I sleep for a shorter
>period.
*****
I do something like this with respect to server maintenance. I typically only have about
30 threads running, but each thread can, at various times, query each server. It then
captures the output from the server. I'm not using http, but pure tcp/ip. This is just an
issue of scheduling. All threads run asynchronously, using asynchronous I/O. All from a
single process. It would scale up to 255 connections, but as far as I know, nobody has
more than 50. I can handle over 1,000 messages per minute (under simulation, we saturated
it at 1350 messages/minute; I could do better with some improvements to the internal
messaging, but realisitically we need about 1/3 of that rate to support real environments)
*****
>
>I give this scenario as an example becuase it's possible/likely that under
>certain "time" conditions we could have a *lot* of pending requests.
*****
Yes, which means any attempt to deal with "exactly one minute" is completely doomed. And
the notion that you are using a global semaphore to do gating leads to the inevitable
"priority inversion" problem where a blocked thread with a low-priority event holds up a
high-priority event. Limiting the number of requests per server can make sense; limiting
the total number of outstanding requests can have serious consequences.
******
>
>So I could have gone either way, but my "procedural driven" rather than
>"message driven" mindset coupled with the worries about "one process"
>system limitations (say on the number of threads or maximum number of TIMER
>requests) meant I took a guess and plumped for my "procedural" approach
>along with multiple processes.
>
*****
The last time somebody came up with a thread-limit idea, I wrote a toy program that
created 5,000 threads. Took me ten minutes to code it. I then recompiled it for 10000
threads and it still worked. At that point, I decided that we were beyond any reasonable
limits and didn't pursue the upper bound issue. I am not aware of any practical limit on
the number of timer requests. There is an upper bound of about 16,384 windows in a
process, or at least there was under Windows NT 4.0; I don't know if that has been
relaxed. This is not MS-DOS. Windows is a real operating system, with immense limits on
most resources. 100 anythings are so trivial as to be unworthy of discussion. I've even
been known to map 100megabyte files into memory.
When attempting to design to limits, the first and most important thing you must do is
actually measure the limits. Only after you have measured them do you know if it is worth
distorting your design to take limits into account. Guessing at limits and distorting the
design without any actual knowledge is bad methodology. I've built systems where the
limits were very small and very real, and where a lot of design effort dealt with coping
with those limits. One thing it taught me is that until you have quantifiable data on the
limits, you don't do a lot of unnecessary extra work to cope with them. Coping with real
limits can be very difficult; one system I worked in had three different algorithms,
depending on the amount of memory actually avaiable. The first was simple and fast. The
second was harder and slow. The last took weeks to debug and was dead slow. But it worked.
So in the best case, I could sort in-memory with a side vector and qsort. If I couldn't do
the side vector, I did bubble sort. And if I couldn't fit it into memory, I did a
multipass disk merge-sort. But I only had 512K bytes of address space to work in.
****
>Throughout the whole development process I was consistently
>wondering/worrying whether the one process approach, driven by the parent
>thread/process and Timers would have been better- but I had no point of
>reference on which to base any decision.
*****
Key here is to take the time to do the experiments that tell you where the limits are.
These take a small integer number of minutes, and you have probably wasted more time
trying to cope with nonexistent limits than it would have taken to determine what the
limits are.
******
>
>I don't want to start from scratch again with an "event/message driven"
>unless I really really must. Also the HTTP
>asynchronous method is much different to the synchronous one -
>including having to use a callback function. I would basically have to go
>back to the drawing board.
*******
Yes, this is a problem. But I'm not sure the current design could actually be made to
work. And if that is true, you may have no choice but to go back to the drawing board. Not
the news you wanted to hear, but it may be the actual news.
******
>
>I dont know how to have one global queue for all the processes to allow me
>to replace my two stage global semaphores prioritising and restricting
>concurrent internet requests. Is it (efficiently) possible? (I suppose I
>could go through a pipe to a single HTTP queueing/sending/receiving process,
>but that seems to add another level of complexity).
*******
This is actually straightforward, at least in an event-driven world. The key here is that
you have two kinds of transactions. Also, there is no reason to limit the total number of
transactions, only the transactions to a particular server. So what I would do for
something like this is have one queue per server, and remove elements from the queue based
on a timer notification. So on each WM_TIMER notification, I would look at each queue; for
every queue that had a pending element, I'd send out an asynchronous HTTP request. The
queue would not be inspected as long as there was an outstanding request.
But the notion of needing to get data within exactly one minute is so hard to consider as
possible I have no idea how I'd even approach that. For one thing, you really have no idea
what time it actually is, or how your clock synchronizes with the clock of the server you
are communicating with. Even if you synchronize with one of the time servers, are you sure
the server you are talking to is synchronized to a time server, and if so, how closely?
There are so many variables involved here it isn't even within the realm of sanity to
consider doing real-time synchronization with a networked server an indeterminate number
of hops away, with an unknown turnaround time.
The simpler approaches are usually best. In the communication world, asynchronous
communication is almost invariably simpler than synchronous communication, once you take
second-order effects into consideration (synchronous is easier if there is never a
failure, hang, or need to shut the program down. However, these are unrealistic
assumptions)
******
>
>For example, process 1 might have no type A requests but 10 type C. Process
>2 might have 10 type A but no type C about to be sent. How can I ensure,
>that the type A's requests are sent before the type C's and only say a
>maximum of 6 requests are active at any time (as I don't wish to have any
>chance of "overloading" the server)?
******
In a single process it is trivial: multiple queues. In multiple processes, you are in such
deep trouble trying to keep them synchronized that it is nearly impossible to come up a
good way to handle this. I think the notion of trying to limit the total number of
messages is what is at fault here. Flow management with priority is not trivial (although
not as complex as you have created a situation to handle it in), but it gets nearly
impossible when distributed among multiple independent processes that have to interact.
Certainly you need more than one semaphore; you need one per type of request, and in fact,
there is no reason to not have multiple outstanding requests in process. If your goal is
to not overload a server (a reasonable goal) then you need one queue per server, not one
queue per type of request.
The problem as I see it is that you started with a simple synchronous model, then starting
throwing things at it to support multiple processes, flow management, and failure
recovery. This is not a good approach to the design. I've done things like this, alas, and
I ended up scrapping the program and rewriting it. I know this is painful. What is more
painful are the times I inherited messes from other programmers and tried to salvage them.
I end up regretting it. Sometimes the only effective way is to back off, reconsider the
design, and start over.
******
>
>I will probably do all the changes over time and develop a one process
>approach if, given my application specifics detailed above, you still thing
>it's the way to go - but I just need something *working* ASAP.
*****
I know, it is frustrating, and no fun. But "working" is one of the problems that is going
to be hard to solve in the current structure, as I've understood it. Wish I had better
news, but I think the structure is where the problems are.
*****
>
>Cheers
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
- Next message: Joseph M. Newcomer: "Re: output.c error in multithreaded program"
- Previous message: Jonathan Wood: "Re: Best Way to Add Custom Control to Dialog"
- Maybe in reply to: Joe: "Re: output.c error in multithreaded program"
- Next in thread: Joe: "Re: output.c error in multithreaded program"
- Reply: Joe: "Re: output.c error in multithreaded program"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|