Re: Multi Threading Options
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Wed, 01 Jun 2005 22:35:08 -0400
See below...
On Wed, 1 Jun 2005 22:46:44 +0100, "Mark Randall" <markyr@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
wrote:
>> Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message
>> news:id6s91potodhqpm8sfqhjq5sjds21c7sev@xxxxxxxxxx
>> See below...
>> On Wed, 1 Jun 2005 18:46:26 +0100, "Mark Randall"
>> <markyr@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>>class CAwAvatar
>>>{
>>> int x, y, z, messages_to, session;
>> ****
>> AVOID ALL USE OF COMMAS IN VARIABLE DECLARATIONS. I see what should be
>> five separate lines
>> of code lumped into one unreadable line above
>
>... It was for the purposes of reducing the length of this post.
>
***
fine. I wouldn't even do it for that purpose.
***
>>> void SendMsg(LPCTSTR text)
>>> {
>>> aw_whisper(session, text) << External C API
>>> messages_to++;
>>> }
>> ****
>> messages_to apparently does not participate in any of the logic dealing
>> with messaging (is
>> it only for doing performance analysis?)
>
>Yes, it was the simplest example - in other situations the variables changed
>are used immediately.
>
****
got it
****
>>> void Kill(LPCTSTR reason)
>>> {
>>> FOR_EACH_OF_AVATARS << Linked List Enumerator
>>> pUser->SendMsg(reason);
>>> END_FOR_EACH << aka '}'
>>> }
>>>}
>> *****
>> Presumably this is shutting down the avatar threads?
>
>It is to notify everyone else connected to the server that a person has been
>removed... my client is a bot on the server, not the server itself.
>
>>>class CAwAvatars
>>>extern CAwAvatars avatars; << defined globally in stdafx
>> ******
>
>> HUGE HUGE RANT GOES HERE
>> Code that does this is unmaintainable. If you need an extern, you will
>> put it, in the
>> best case, in a totally separate file which contains that one line of
>> declaration.
>
>I have a master.h file containing all of my global declairations, few
>hundred of them, most are defines.
>
*****
That sounds suspicious. Such declarations would have to be used in every module to justify
this. Break them up into pieces if they are logically unrelated.
****
>> I
>> cannot emphasize strongly enough that abuse of the purpose of the stdafx
>> files leads to
>> unmaintainable disasters. MFC/C++ is *not* "C programming". Abysmal habits
>> such as lumping
>> everything into a single #include file were poor programming in C, and
>> unacceptable in
>> C++.
>
>My single 'master.h' include file makes all of the variables I use
>everywhere extern, about 10 of them.
>
****
That's probably ten too many
****
>> Using stdafx files for any purpose other than the one they were designed
>> for is
>> unforgiveable.
>
>According to the docs I have, use of the precompiled headers is to reduce
>time inserting the header at the start of every single CPP file, and some of
>my headers are several hundred lines long for my 2 main objects.
>
****
Wow! Several Hundred Lines! Let's see. The last time I measured the compiler it was
compiling about 70,000 lines per minute, so that would mean that a 200 lines takes, let's
see...170 millisconds, just about.
Compare this with the three million lines of Windows headers (use /P and see how many
lines of code you get in the .i file!). Get real. The few hundred lines of your header
file aren't even noticeable in the compilation time. Takes less time than to let the F7
come back up after a recompile request.
You have to look at these things in terms of their actual impact, no some fantasy about
improving your compilation speed. Three million lines of Windows code makes a difference.
A few thousand lines of your own code aren't going to matter in the slightest. But what
you did was compromise the effectiveness and maintainability of your code. Also, if you
partitioned those files up so that you used smaller header files it probably wouldn't even
matter.
The net effect is to LOWER your effectiveness. Instead of having short rebuilds when you
modify something in master.h, you have to go through a complete build of stdafx.obj, which
takes a VERY long time. So if you work it out, the milliseconds you save by putting your
header file in stdafx.h (say a round 200ms) have to be measured against the fact that not
only do you have to do the multi-second rebuild of stdafx.obj, but EVERY module needs to
be recompiled. This is what Ellen's thesis proves: that the increase in rebuild times
caused by techniques such as master.h ultimately result in significantly slower long-term
development due to unnecessary recompilations.
*****
> Even the concept that there is a single global header file for your private
>> variables is a mistake. Adding #include directives is never a mistake.
>
>Private variables? I do not follow. I link my Master.h into the stdafx.h at
>the bottom to avoid having to declaire extern evywhere... The speed increase
>on compilation was a few hundred %.
****
Wrong. Putting the #include directive in every file that uses the variables, and making
sure that every .h file is as small as it can be, will be more effective. You do not get a
gain of a few hundred % by putting your header file in stdafx.h; what you get is a
significant DECREASE in overall performance due to triggering unnecessary builds...and if
you measure it, the compilation of stdafx.obj is the slowest part of any build, in some
cases I've measured it at 4-5 seconds, where a typical build-and-link step can usually
take place in less time than that. This is because not only does it trigger a massive
rebuild, but it must write out the .pch file, 4-6MB, which actually takes significant
time. You get the real large multiples by using .pch files for the Windows headers, which
would otherwise result in a massive slowdown.
*****
>
>> Splitting is good.
>> Lumping is bad. Always. (One of my Ph.D. students did her dissertation and
>> has the numbers
>> to prove that this is always a mistake! This is not just my opinion, it is
>> substantiated
>> by carefully measured evidence). You include the files you need ONLY in
>> the modules that
>> use them! (For example, when I subclass, the FIRST thing I do is remove
>> that asinine
>> #include that Microsoft puts into every derived class, which includes the
>> app class.
>
>I too, but it should get included in the PCH anyway.
****
Why? It isn't used anywhere except in the project.cpp and mainframe.cpp files, and its
presence in the .pch file is completely useless. Worse than useless; it tempts people to
write ill-structured code.
****
>
>> There
>> is absolutely no excuse for this to have been done, and it is always a
>> mistake to leave it
>> in. At best, you may need resource.h, and should include only that. The
>> last major app I
>> delivered had close to 200 source files, and exactly TWO of them included
>> the project.h
>> file, and they were the project.cpp and the mainfrm.cpp files).
>> *****
>
>http://zetech.swehli.com/pics/too_much_code.jpg
>
****
And the point is? I once got a project that had four source files. One of them contained
all the menuing logic AND the interrupt handler! When I was done, I had over 400 files,
all nicely arraged in a sensible fashion. You could find anything with a single keystroke
(put the caret on the function name, hit the key, bang, at the definition). Incremental
rebuild was trivial. I find the philosophy of "fewer files are better" is what I call "the
punched-card mindset", where we would submit our entire FORTRAN program as a 4000-card
deck and have it compiled and linked. I stopped doing that sometime around 1967. I even
have been known to split classes into two or three files. Is small file count a valid
metric? I have not found it so since 1967, and I've never seen the slightest shred of
evidence that it is a good metric. Ellen's research shows that small files and lots of
header files is *provably* a more efficient development strategy. I always suspect this,
but she studied several large projects and has the statistics to leave no doubt that more,
smaller, files, smaller header files, and including only the definitions needed in each
compilation unit, is substantially more efficient.
****
>>>[could be any thread #2]
>>>void some_externally_triggered_func(int session, int x, int y, int z,
>>>[...])
>>>{
>>> CAwAvatar* pUser = avatars.find(session);
>>> if (pUser)
>>> {
>>> pUser->x = x;
>>> pUser->y = y;
>>> [...]
>>> }
>>>}
>> ****
>> If the avatars struct could be modified by any other thread, it *must* be
>> synchronized. At
>> that point, your modifications are so short that releasing the main lock
>> and locking just
>> the individual struct doesn't pay off, unless that stuff inside the
>> conditional has on the
>> order of a thousand lines of code, or could block. It looks like neither
>> would apply to
>> your case, so you only need a single lock to modify it)
>
>At most it is 150 lines.
****
Look at the number of lines of assembly code that get executed, including subroutine
calls. Figure 1ns per line. Look at the cost of a single lock API call, and remember you
would need two: one to first lock the object, and one to unlock the main avatars table
lock. 150 lines is probably more efficient if you don't use the second lock. But before
I'd bet on that, I'd use the high-resolution timer to get some real numbers. My experience
suggests that a second lock will actually significantly decrease performance
****
>
>> ****
>>>
>>>[could be any thread except #2]
>>>void another_external_func(int session, LPCTSTR msg)
>>>{
>>> CAwAvatar* pUser = avatars.find(session);
>>> if (!strcmp(msg, "bad word"))
>>> pUser->Kill("User said something... regrettable");
>>>}
>> ****
>> basically the same problem. SInce you have to actually lock the avatars
>> list in its
>> entirety, unless your lstrcmp represents a long computational loop looking
>> things up from
>> a massive dictionary, the single lock will suffice.
>>
>> Note that if performance is an issue, you might consider a CMap instead of
>> a CList <OOPS,
>> Hit the send sequence by accident...keyboard is flaky> if you have a lot
>> of atavars. The
>> linear lookup time may become the dominant performance problem.
>> *****
>
>With a maximum of about 5,000 items in it, searching for a long - I have not
>encountered any problems in any simulations.
>
****
If you do linear search, that means that on the average you have to search 2500 items. If
you have that much time to spare, then using CRITICAL_SECTION locking and having only one
lock is probably not going to hurt you very much.
****
>>>[ --- thread #2 --- ]
>>>void MyDlg::OnTimer( )
>>>{
>>> FOR_EACH_OF_AVATARS
>>> DoDrawingOf(pUser);
>>> END_FOR_EACH
>>>}
>> ****
>> Given that the refresh is timer=based, you could probably eliminate all
>> synchronization
>> just by at this point sending a draw notification to the thread that
>> handles the avatars.
>
>The avatars are handled in the main thread - its the drawing a map of them
>in another thread that is the problem. All the other thread does at the
>moment is read.
****
The problem is that the drawing thread will then cause the avatar thread, the main thread,
to block if you use simple mutex-style synchronization. That has the potential for being a
serious performance hit.
****
>
>> That is, the avatar struct, intead of being a global variable, is actually
>> a thread-local
>> variable of a CWinThread class. Nobody can even touch it without
>> contacting that thread.
>> If it is a UI thread, then while the drawing is happening, the
>> Post[Thread]Message
>> requests to the thread just pile up waiting for the drawing to finish, but
>> don't block the
>> sending threads (a lock would force blocking of those threads to do the
>> update), and every
>> once in a while, the main thread, on the OnTimer handler, posts a message
>> to redraw. This
>> message will queue up behind all the pending update requests. If the
>> pending queue can get
>> long enough to block updating, you might want to consider my earlier
>> suggestion of a mix
>> of a message pump and an I/O Completion Port (a bad name since it is
>> really a
>> general-purpose queuing mechanism which happens to have a nice interface
>> to the I/O
>> system. The way I do this in MFC, by the way, is described in my essay; I
>> handle it in the
>> OnIdle handler of the CWinThread.
>> *****
>>>
>>>void MyDlg::DoDrawingOf(CAwAvatar* pUser)
>>>{
>>> >> rather large drawing code here based on variables in pUser <<
>>> >> also some methods of pUser may be called here that change its data
>>> <<
>>>}
>>>
>>>Thats the bread and butter of it, of course there are over 100 methods and
>>>properties associated with the AwAvatar class, all are set directly via
>>>pUser->x = x and the like (somehow setting and getting vars through
>>>functions sounds like a good way to add extra overhead, is this going to
>>>come back and bite me in the back side now?)
>> ****
>> Only if you believe that set/get functions actually generate function
>> calls. If you create
>> them in the class itself, you will discover in the optimized code that
>> they generate no
>> real function calls, just inline automatically.
>>
>> void SetSomeValue(int n) { somevalue = n; }
>> int GetSomeValue() { return somevalue; }
>>
>> But never judge what generated code looks like unless you have generated
>> the code using
>> compiler optimizations on. Debug code always looks crappy in this regard.
>>
>> The problem is that if you have to synchronize accesses, the cost of the
>> function call is
>> undetectable (even if it were a full function call) compared with
>> synchronization
>> overhead, which is why it is a good idea to avoid the need for
>> synchronization overhead.
>>
>> What are the expected update rates? Tens, hundreds, thousands per second?
>> If thousands,
>> you may need to rethink what you are doing, because synchronizaiton
>> overhead is going to
>> kill your performance by orders of magnitude over what subroutine calls
>> would have cost.
>> ****
>
>Approx 5 repaints a second, with upto a few hundred updates to the main list
>per second.
***
Not too bad. It might work with a simple PostThreadMessage queue. But to be sure, you
should actually gather performance statistics. Fifteen years of performance measurement
convinced me that the times usually goes to places you don't expect (one of the reasons I
tend to flame about too-clever optimization is that if you compare performance of the
poor-performing but simple algorithm vs. the really-clever, barely-maintainable algorithm,
you find that in nearly all cases you can't even measure the performance improvement of
the clever algorithm. Meanwhile, 30% of the CPU cycles are being eaten by some other place
in the program, which is where the effort should have gone. I couldn't even optimize my
own code without getting performance data, and since I was the performance management
expert, everyone came to me with their programs. They were almost invariably wrong about
what they optimized. Over the years I have learned that simple is best, measure, and
direct the efforts in ways that make sense relative to actual performance data).
****
>
>>>So really, I dont want CAwAvatar to be added or removed while I am looping
>>>through them, and I do not want the AwAvatar object to be changed by
>>>anything else while I am actually reading from it, but while I am reading
>>>from it I need to call functions inside it that change things - even if
>>>those variables are shielded from obvious view by the class
>> *****
>> That's where I'd tend to use queueing models. The painting thread would
>> either be painting
>> or processing update requests. While it was in its painting loop, it
>> wouldn't be seeing
>> update requests, so there would be no problem. The risk is that as soon
>> as you have
>> cycles in the structures, deadlock becomes inevitable if both threads can
>> access the
>> structure from different roots in the structure. Canonical ordering
>> constraints would
>> readily be violated.
>>
>> Another approach I used, when draw performance was an issue, was to
>> "compile" the updated
>> list into an array of draw requests. I could then draw from the "compiled"
>> array at high
>> speed while letting alternate threads update the "source" structure. But
>> my update rate
>> was fairly low, although disruptive to the drawing logic.
>
>Considering this at the moment.
****
Going back to my paragraph above, the compilation was dictated by the fact that list
traversal in the presence of disruptive updates was the killer, that synchronization was
causing me to miss hard real-time deadlines. The idea was that once the realtime data
phase was entered, get the update logic out of the way. I was actually able to measure
lock time on that system, and it was absolutely fatal. [To give you an idea of how long
ago this was, we had a 20us timer which was good enough for extremely high-resolution
timing. Today, I'd need a 10ns timer to get the same degree of resolution relative to the
code stream].
*****
>
>> Another approach I took was to segment the drawing list, so I could be
>> adding to
>> "inactive" segments without requiring synchronization overhead in them. I
>> didn't have
>> cycles to deal with, so it was easy to partition. Since there was only one
>> drawing thread,
>> and it only locked one segment at a time, I had no deadlock issues to
>> address. If you
>> could partition the drawing list into logically disjoint segments, then
>> high-performance
>> synchronization using CRITICAL_SECTION would work pretty well. The only
>> time you would
>> block the update thread is while a segment was being drawn, and if you
>> make the segments
>> small enough, this puts an upper bound on the amount of time the insertion
>> thread would be
>> delayed.
>>
>> If you are at the tens of items per second, PostMessage/PostThreadMessage
>> should work
>> fine.
>>
>> It is unfortunate that the kernel resource known as an "Executive
>> Resource" which allows
>> multiple readers and a single writer is not exported to user space. It
>> could be done using
>> a device driver, but at that point, you are into potentially fairly large
>> overheads.
>>
>> I found two articles on the mutliple-reader-single-writer problem. One
>> uses my earlier
>> suggestion of a file system lock as the locking primitive; the other
>> refers to a .NET
>> feature of a multiple-reader-single-writer lock, which is described as the
>> ReaderWriterLock class. It has a C++ interface, but it would require using
>> .NET
>> components.
>> *****
>>>
>>>So in summary:
>>>
>>>- Looping and Adding to linked list must be mutally exclusive
>>>- Any data from the avatar should be readable at any time, but should wait
>>>until all data has finished being updated;
>>>- Any thread needs to be able to call methods in the avatar class without
>>>fear of half-way-changing other things in it for something else thats
>>>reading.
>> ****
>> The issue is one of "priority". If you need to do an update, RIGHT NOW,
>> then you can
>> simply use my two-tier approach to allow drawing to get priority over
>> updates. But this
>> delays the updates. Using Post[Thread]Message or PostQueuedEventStatus (to
>> an I/O
>> completion port) means the sending threads don't have to block during the
>> drawing, even
>> though the updates are delayed. The delay could be seen by the user, but
>> not by the
>> threads generating the update requests.
>> ****
>>>
>>>I hope that allows you to suggest anything else you may be able to think
>>>of.
>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer@xxxxxxxxxxxx
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
>Cheers Joe... But calm down about PCH's or you might have a heart attack and
>then we dont get your help anymore! :P
****
It is a performance and correctness issue. If you save 200ms by having your header file
precompiled, you have to have a minimum of about 25 compilations with an unchanged header
file to even break even. And you violate serious issues about structuring programs in the
MFC culture by using those files in ways outside of their intent. For example, your ten
externs cost you on the order of 150us to compile, at 70,000 slpm, not enough to justify a
five-second rebuild on a change. File system caching and #pragma once work wonders to
reduce that time either further. I finally came to the conclusion that every .h file must
compile standalone, that is, whatever project files it takes to make it compile must be
included in the .h file itself; it is not legitimate to assume the user will put the right
headers in, or in the correct order. It took me years to come to that conclusion. The more
I program, the more I am attracted by the notion of one function per file, when the
functions are unrelated to each other.
****
>
>- Mark R
>
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.
- Follow-Ups:
- Re: Multi Threading Options
- From: Mark Randall
- Re: Multi Threading Options
- References:
- Multi Threading Options
- From: Mark Randall
- Re: Multi Threading Options
- From: Joseph M . Newcomer
- Re: Multi Threading Options
- From: Mark Randall
- Re: Multi Threading Options
- From: Scott McPhillips [MVP]
- Re: Multi Threading Options
- From: Joseph M . Newcomer
- Re: Multi Threading Options
- From: Mark Randall
- Re: Multi Threading Options
- From: Joseph M . Newcomer
- Re: Multi Threading Options
- From: Mark Randall
- Multi Threading Options
- Prev by Date: Re: Message Map with multiple inheritance - FIXED!
- Next by Date: Re: Check Box
- Previous by thread: Re: Multi Threading Options
- Next by thread: Re: Multi Threading Options
- Index(es):
Loading