Re: Treads in the new 6 core CPU from Intel
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Sat, 01 Mar 2008 21:15:24 -0500
Why do you think six cores are "fighting" for resources? First, what resources do you
think they are "fighting" for? Until you identify the resources, you have no idea what to
optimize. A naive assumption that every situation results in "fighting" for resources
will mean that you will be unable to make any decisions because every decision will have
six levels of questions you will be unable to answer.
What do you mean by "serious lag"? What do you imagine the issues with "shared pointers"
might be? (Since I rarely share pointers, I rarely care). Why do you think threads as
such will cause any "fights"? If you design your threads wrong, you WILL have serious
contention, but your questions suggest that you are actually designing your threads
incorrectly. Your concentration on the notion of "sharing", for example, indicates that
you think that creating situations in which multiple threads access the same memory
location was actually a good idea in the first place; even in a uniprocessor, this was
usually a bad idea, and you should try to avoid it.
Do you know how the L1 and L2 caching systems work on an x86? Do you know how distributed
L2 cache control works? Do you know what cache coherency is? Do you know what cache
optimization algorithms are, and how you can avoid cache thrashing? Do you know the
implications of an asymmetric I/O bus structure on file system I/O?
Note that there are very few reasons multiple threads should *want* to access "the same
object" in memory.
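
A minimal sketch of what "not accessing the same memory location" can look like, assuming portable C++11 threads rather than any particular Windows API. Each worker sums its own slice into a local variable and writes once to its own slot. The names PaddedSum and partitioned_sum are hypothetical, and the 64-byte cache-line size is an assumption that should be checked against the actual processor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: one result slot per thread. The padding keeps the
// `value` fields 64 bytes apart (an ASSUMED cache-line size), so two slots
// never land in the same 64-byte line.
struct PaddedSum {
    std::uint64_t value = 0;
    char pad[64 - sizeof(std::uint64_t)];
};

// Sum `data` with `workers` threads. While the workers run, no thread
// reads or writes a location that another thread writes.
std::uint64_t partitioned_sum(const std::vector<std::uint32_t>& data, unsigned workers) {
    assert(workers > 0 && "need at least one worker");
    std::vector<PaddedSum> partial(workers);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &partial, chunk, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            std::uint64_t local = 0;            // private to this thread
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[w].value = local;           // one write, to this thread's own slot
        });
    }
    for (auto& t : pool) t.join();              // the only synchronization point
    std::uint64_t total = 0;
    for (const auto& p : partial) total += p.value;
    return total;
}
```

The input is only read, the output slots are disjoint, and the results are combined after the joins, so there is nothing for the threads to contend over except the memory bus itself.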
What, pray tell, is an I/O lock? Do you mean a File Lock (as in LockFileEx)? That
applies only to file systems, not I/O in general. That's a different question and has
nothing to do with multithreading as such; it has to do with concurrent access to a file,
and that can happen on a uniprocessor, and will be no different on a uniprocessor than on
a multiprocessor. In fact, the effects are usually *worse* in a uniprocessor.
Why do you think the file I/O is not taking advantage of the File System Cache? Why do
you think every access to file data requires a disk access? Why do you think the 6ms
number bears any relationship whatsoever to reality--why do you think modern hard drives
have onboard caches on the drive itself? [See, for example,
http://www.seagate.com/www/en-us/products/external/esata_hard_drive/ which describes a
low-end consumer drive with a 16MB cache, or
http://www.hothardware.com/Articles/Seagate_Barracuda_720011_1TB/ which describes a 4.6ms
access drive with a 32MB cache, and even knowing this fact, how would you compare this to
a large-scale RAID array in a NAS system, which would have even heavier caching?] Why do
you think the database system has not already handled most of the details of file system
optimization for you? Why do you think that the file system should be involved in
certain computations? Why do you think that aggressive file I/O is a better strategy than
lazy file I/O? Why do you think that systems like SQL Server maintain incredibly complex
caching structures for indexes? What, in fact, do you mean by the phrase "database
developer"? Are you someone who is WRITING a database system, or are you someone who is
the client of a database system, e.g., using SQL, Access, etc.? If you are the client of
a database system, that is, calling upon a database system, realize that its programmers
have spent YEARS tuning that system. If you are trying to "roll-your-own" database by
writing low-level file I/O (ReadFile, WriteFile, LockFileEx, etc.) you had better
consider using asynchronous I/O anyway, to maximize concurrency. If you are working with
file system locks, are you aware that LockFileEx can delay until the locked area is
released? Would you know when to use the LOCKFILE_FAIL_IMMEDIATELY flag? Would you know
when to make the decision to use asynchronous LockFileEx? Do you know how to implement a
transacted database and how to roll back changes on a transaction abort? (I've worried
about these issues, and one thing I would never try today is to re-implement a transacted
file system; I'd buy a third-party package that has thousands, or preferably tens of
thousands, of users; I would rely on that file system to deal with all I/O optimizations,
file system locks, caching strategies, etc. because I know how incredibly difficult all
these are to deal with. Not on a bet would I try to write a high-performance database
system from scratch. For what it would cost to have me build this from scratch, my
customer could buy a top-of-the-line database system, such as SQL Server, Oracle, or DB2,
which would mean that I would have to expend ZERO effort worrying about the things you
think you need to worry about, and could concentrate on solving the problem instead. Don't
confuse database issues with programming issues such as concurrency, NUMA access costs,
L1/L2 caching, etc.) Note that I not only know the answers to some of these questions, I
teach courses in them. But for other questions you are asking, I could easily spend
months figuring out what the answers might be, and a great deal of that time would be
spent figuring out what *precise* questions I have to ask of the hardware, operating
system, and database software. Until I know how to ask the questions, I would not be able
to understand the answers; and I would have to understand the answers to do the design
of my system to maximize the performance. You have to get yourself up to speed in the
basic technologies before you know what questions to ask. THEN you can expect to spend
months figuring out those questions and figuring out the answers.
Thread starvation is always an issue, but the REAL problems of large-scale multiprocessor
systems rarely have anything to do with the issues you raised. For example, you first
have to decide what you mean by "object", and how large the "object" is, and what you
think the actual contention is going to be. For example, to manage a large image, you
would have hundreds or perhaps even thousands of discrete "objects", each 4096 bytes long,
and you would first have to identify how these pages are managed. Do you know what a NUMA
architecture is? (You have to know this before the discussion can proceed meaningfully).
Are you using Vista? (Vista is NUMA-aware). Note that in a NUMA architecture, your
current concerns about "shared pointers" either vanish or become even worse than you have
imagined, depending on a lot of other issues that you need to think very deeply about, and
for which your current questions show that you need to first learn the basics before you
can worry about the deep problems. There is a huge body of multiprocessor literature,
going back to the mid-1960s. You could spend months studying this before you begin to
understand even the questions, let alone know what the answers should be. There are a few
key topics that you need to look at; early work at CMU, for example, on C.mmp and CM* (the
CM* group invented the term "NUMA") would be a good start. Understanding the bus
architectures of the AMD processors and the high-order Intel processors by reading the
hardware manuals would be useful. A good read of Solomon & Russinovich "Inside Windows"
would be essential. A Google search of "NUMA performance" and reading the articles in the
first five pages or so would be a good start. AFTER that, you will have some perspective
on the problem. It is not an easy set of issues, but you are going after them by worrying
about what are often superficial problems, and perhaps believing that there are simple
answers. If you ask the wrong question, however, you will get an answer that can be
irrelevant, misleading, or just wrong.
The most serious multiprocessor problem arises when you do too much synchronization, which
is why you want to AVOID doing synchronization by creating programs for which it is not
necessary. In a program which uses the storage allocator extensively, your most likely
performance bottleneck will be contention for the storage allocator critical section. If
you have an object that requires synchronized access, that will be your bottleneck, so
don't do it. Don't use synchronization if you can avoid it, and that means not writing
your multithreaded app as if it is logically a single-threaded app that just happens to
have lots of threads.
Consider a mutex or a CRITICAL_SECTION as a sign that you might be doing the wrong thing.
There are places where they matter, but as I tell my students: "Synchronization is where
threads rub together. Like any physical system, where things rub, the friction generates
heat and wastes energy. Synchronization wastes energy, and therefore the correct solution
is to avoid the friction by redesigning the system".
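
One way to redesign away the friction, sketched under stated assumptions: instead of a shared container guarded by a mutex or CRITICAL_SECTION, each worker fills a private buffer that is handed over once and merged after the threads finish. The function collect_multiples and its task are hypothetical, chosen only to illustrate the pattern, and the code uses portable C++11 threads rather than the Win32 API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical task: collect every multiple of `divisor` in [0, limit).
// No lock is taken anywhere; each worker owns its buffer until it is done.
std::vector<int> collect_multiples(int limit, int divisor, unsigned workers) {
    assert(workers > 0 && divisor > 0 && limit >= 0);
    std::vector<std::vector<int>> buckets(workers);
    std::vector<std::thread> pool;
    const int n = static_cast<int>(workers);
    const int chunk = (limit + n - 1) / n;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&buckets, limit, divisor, chunk, w] {
            const int begin = std::min(limit, static_cast<int>(w) * chunk);
            const int end = std::min(limit, begin + chunk);
            std::vector<int> local;  // private to this thread
            // Reserve once so the loop below never goes back to the allocator.
            local.reserve(static_cast<std::size_t>(end - begin) / divisor + 1);
            for (int i = begin; i < end; ++i)
                if (i % divisor == 0) local.push_back(i);
            buckets[w] = std::move(local);  // single hand-off to this thread's own slot
        });
    }
    for (auto& t : pool) t.join();
    // Merge on one thread after the workers are done: still no lock needed.
    std::vector<int> all;
    for (const auto& b : buckets) all.insert(all.end(), b.begin(), b.end());
    return all;
}
```

Reserving the buffer up front also keeps each worker out of the storage allocator inside its loop, which is the other contention point mentioned above.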
You can be concerned, but your knowledge of the issues right now is far too superficial to
let you ask the right questions. Get yourself up to speed on the real details of what is
going on. Note that I don't have answers to a lot of the questions you are asking, but
you need to understand all the questions I asked. If a term is unfamiliar to you, you
need to do some research in it first. Otherwise, you might ask the wrong question of your
system, and get a wrong answer. In fact, you've already started to leap to conclusions
based on a failure to understand either hardware or software at the depth required to
actually solve the problems you are concerned about.
joe
On Tue, 26 Feb 2008 22:00:52 -0800, "Roger Rabbit" <roger@xxxxxxxxxx> wrote:
The new 6 core CPU from Intel is getting me thinking about my development of
software and working with hardware too. With 6 cores fighting for resources,
it seems that there is going to be some serious lag issues working with
shared pointers and threads in large scale programming projects. The problem
is with 6 threads all wanting access to the same object in memory such as an
index to a database are going to put a lot of stress on memory. This is
because memory has not kept up with CPU advances.
Worse is going to be for data on a disk, threads are going to use I/O locks
heavily and this will degrade performance when a transaction oriented
program with a large number of queries are operating. Consider a disk with
6ms access times, that's 6 million times slower than a 1GHz processor. Now
with 6 threads all wanting something from the disk, it's going to make life
complicated for a database developer.
I can see the possibility of a thread literally being starved to death for
CPU in some situations that could lead to serious synchronization problems.
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm