Re: Is there a maximum contiguous memory allocation?
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Mon, 21 Dec 2009 21:46:05 -0500
See below...
On Mon, 21 Dec 2009 19:38:14 -0600, "Peter Olcott" <NoSpam@xxxxxxxxxxxxx> wrote:
****
"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in
message news:gu7vi5d9q8ojc8ae00fm119sf7d2neg6f4@xxxxxxxxxx
See below...
On Mon, 21 Dec 2009 07:17:18 -0600, "Peter Olcott"
<NoSpam@xxxxxxxxxxxxx> wrote:
****
"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in
message news:0dssi5thoj8m18oc3nintm5bdqdfhc4op7@xxxxxxxxxx
See below...
On Sat, 19 Dec 2009 09:15:38 -0600, "Peter Olcott"
<NoSpam@xxxxxxxxxxxxx> wrote:
****
"Bo Persson" <bop@xxxxxx> wrote in message
news:7p47djFhkhU1@xxxxxxxxxxxxxxxxxxxxx
Peter Olcott wrote:
My application needs to create a std::vector > 5GB; is that
possible in x64?
Yes, if you have enough virtual memory.
The limitation is not in the amount of RAM, but in the
address space.
Bo Persson
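As a rough illustration of the address-space point, here is a minimal C++ sketch. The helper name fits_in_vector is hypothetical. It only checks whether the build's size type and the vector's max_size() can describe a given byte count. It does not attempt the allocation, so it says nothing about whether enough virtual memory is actually available.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: can a std::vector<unsigned char> in this build even
// describe `bytes` elements? This checks the size type and max_size() only;
// it does not try to allocate anything.
inline bool fits_in_vector(std::uint64_t bytes) {
    // size_t is 32 bits in a Win32 build, so a count above 4GB - 1 cannot
    // even be expressed as a size there.
    if (bytes > std::numeric_limits<std::size_t>::max()) {
        return false;
    }
    const std::vector<unsigned char> probe;  // empty, allocates nothing
    return static_cast<std::size_t>(bytes) <= probe.max_size();
}
```

Calling fits_in_vector(5ULL << 30) asks about a 5GB vector; it should be true in a 64-bit build and false in a 32-bit one.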
I have read that Microsoft has placed an artificial 2GB
limit on the size of an array. I also read that this same
limit applies to 64-bit .NET applications: a maximum object
size of 2GB.
My application requires a single contiguous block of
physical memory, is this possible?
I'd missed this, and I'm only coming back to it based on
another reply.
Unless you are writing a device driver for a piece of
hardware designed by an amateur
designer, the chance that you will require contiguous
physical memory is zero.
My DFA recognizer needs contiguous physical memory,
Nonsense. Complete and utter nonsense, beyond any shadow
of a doubt. Why do you keep
talking about "contiguous physical memory"?
(a) it doesn't matter if the physical memory is or is not
contiguous
(b) from an application, you cannot control physical
memory
(c) even if you could control physical memory, you can't
allocate large blocks of it
(d) what part of "virtual memory" are you failing to
understand?
****
So contiguous virtual memory can be mapped to fragmented
physical memory?
As I responded in another answer, this is EXACTLY how it is designed to work.
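A toy model may make the mapping concrete. The sketch below is an illustration only: real x64 page tables are multi-level and owned by the OS, and the names ToyPageTable, make_scattered_table and the frame numbers are invented for the example. It shows three virtually adjacent pages backed by physical frames that are nowhere near each other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; the 4KB page size is the usual x86/x64 small-page size.
constexpr std::uint64_t kPageSize = 4096;

struct ToyPageTable {
    // frame_of[v] is the physical frame backing virtual page v.
    std::vector<std::uint64_t> frame_of;

    // Translate a virtual address to the physical address the toy table maps it to.
    std::uint64_t translate(std::uint64_t virtual_address) const {
        const std::uint64_t page = virtual_address / kPageSize;
        const std::uint64_t offset = virtual_address % kPageSize;
        assert(page < frame_of.size());
        return frame_of[page] * kPageSize + offset;
    }
};

// Three contiguous virtual pages, backed by scattered (made-up) frames.
inline ToyPageTable make_scattered_table() {
    return ToyPageTable{{907, 12, 4410}};
}
```

The application only ever sees virtual addresses 0 through 12287 as one contiguous block; where the frames sit physically is invisible to it.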
****
****
or disk swap time would make this process infeasibly slow.
In the trade, we call this "life is hard". Meaning,
there's nothing you can do about it.
You are making an impossible request, which
(a) has no meaning
(b) makes no sense because it is impossible to achieve
(c) requires something that makes no sense in a virtual
memory world
(d) is impossible to achieve even for a kernel programmer
working with physical memory
(MmAllocateContiguousMemory)
(e) even if it was possible, it would not change anything,
since you can't address more
than 2GB
(f) that 2GB has to include space for all your
application, other structures you use,
DLLs, all the storage they use, and the OS interface, so
you are reduced to something less
than 2GB
Win64 has a 2GB Limit?
No. We are speaking about Win32 here. As I've already said, Win64 does not have these
limitations. Even 32-bit code in Win64 does not have a 2GB limit if you link with
/LARGEADDRESSAWARE (it does have a 4GB limit for total address space; how much of that
you can get in a single allocation depends on a lot of features of your application).
A native Win64 program running in Win64 has a much larger limit. Note that malloc,
which will try to initialize the pages in a debug build, can take a long time to
perform this allocation there (but will run much faster in a release build). So
there you are going to be limited by the size of your swap space. If you want 10GB of
address space, you had better have a 10GB swapfile.
*****
*****
(g) since those various pieces I just alluded to can
fragment memory, in practice you
cannot get arbitrarily large contiguous blocks any time
you feel like it; there is a
practical limit to the maximum block size, which varies
from moment to moment in your
program; the longer your program runs, the smaller this
size becomes.
****
There would be a possible disk read for every pixel on the screen.
Current whole screen response time <= 100 ms.
This is called "need to redesign the algorithm".
Typically, in VM systems, you have to
consider things that repack FSM models to maximize
locality of reference. This is a
problem that has been known and understood since at least
1961, and was well-understood
when I started using virtual storage in 1968 (that's 41
years ago). In 1969, we were
spending hours analyzing our algorithms and repacking data
to minimize paging; in fact, we
were even using features of our linker to pack code
adjacent. In 1971, I wrote a
diagnostic program that measured code page transitions
during execution of an application
so we could understand how to pack our code to minimize
page faults by studying its actual
behavior. The first LISP machines (in the 1980s) did not
store lists as lists but as
contiguous arrays to minimize page faults (the extra cost
and complexity of handling
complex array/list structures including automatic
repacking of lists into arrays more than
paid off in terms of performance gains achieved by
avoiding page faults). Once we got
machines with caches, we started redesigning algorithms to
maximize cache hits even for
pages that were resident. Cache hit performance can
improve your program performance by a
factor of 10; paging optimization can improve your program
performance by a factor of
100,000 to 1,000,000. Or more.
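To make the locality argument concrete, here is a small sketch in the spirit of the page-transition measurement described above. It is not that diagnostic program; the function names, the 4KB page size and the 4-byte cell size are all assumptions chosen for the example. It counts how often consecutive accesses cross a page boundary when the same row-major table is walked in two different orders.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kPageBytes = 4096;  // assumed page size

// Count how often two consecutive accesses land on different pages.
inline std::size_t page_transitions(const std::vector<std::size_t>& byte_offsets) {
    std::size_t transitions = 0;
    for (std::size_t i = 1; i < byte_offsets.size(); ++i) {
        if (byte_offsets[i] / kPageBytes != byte_offsets[i - 1] / kPageBytes) {
            ++transitions;
        }
    }
    return transitions;
}

// Byte offsets touched when walking a rows x cols table of 4-byte cells that
// is stored row-major, either row by row or column by column.
inline std::vector<std::size_t> walk(std::size_t rows, std::size_t cols, bool by_rows) {
    std::vector<std::size_t> offsets;
    offsets.reserve(rows * cols);
    const std::size_t outer = by_rows ? rows : cols;
    const std::size_t inner = by_rows ? cols : rows;
    for (std::size_t a = 0; a < outer; ++a) {
        for (std::size_t b = 0; b < inner; ++b) {
            const std::size_t r = by_rows ? a : b;
            const std::size_t c = by_rows ? b : a;
            offsets.push_back((r * cols + c) * 4);
        }
    }
    return offsets;
}
```

For a 64 x 1024 table, where each row fills exactly one page, the row-by-row walk crosses a page boundary 63 times and the column-by-column walk 65535 times, although both touch the same 65536 cells. Whether each crossing becomes a page fault depends on what is resident, but the access order alone changes the exposure by three orders of magnitude.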
In my case any change to the algorithm would degrade its
performance. In the case of recognizing Asian glyphs I will
have no choice, 2.0 TB of RAM is not yet cost-effective.
Note that you are confusing "executing more code" with "making performance worse
overall". As I already pointed out, if your algorithm executes lots of extra code to
avoid page faults, it will be *faster*, possibly by orders of magnitude. You are making
the error of confusing instruction cycles with performance.
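A back-of-the-envelope model shows why. The two cost constants below are assumed order-of-magnitude figures (about 1 ns per simple instruction, about 10 ms per page fault serviced from a conventional disk), not measurements, and estimated_seconds is a hypothetical helper.

```cpp
// Assumed, order-of-magnitude costs; not measurements.
constexpr double kSecondsPerInstruction = 1e-9;  // roughly 1 ns per simple instruction
constexpr double kSecondsPerPageFault = 1e-2;    // roughly 10 ms per fault served from disk

// Crude estimate: instruction time plus page-fault time, nothing else.
inline double estimated_seconds(double instructions, double page_faults) {
    return instructions * kSecondsPerInstruction + page_faults * kSecondsPerPageFault;
}
```

Under these assumptions, a version that runs 1e9 instructions with 1e5 page faults comes out at about 1001 seconds, while a version that runs ten times as many instructions (1e10) but takes only 100 faults comes out at about 11 seconds.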
*****
*****
Note that you can use raw VirtualAlloc to improve your
chances of getting storage (malloc
already guarantees fragmentation most of the time). But
you are still going to hit limits
far smaller than 2GB.
Even in the case of a machine that only has Win XP x64, and
my application with 32GB of RAM? That seems implausible. If
it is true then I could only conclude a horribly bad
architecture design.
Note that I have been talking always in terms of Win32, except when I explicitly refer to
Win64. In Win64, you will be able to allocate MASSIVE structures. However, they are
going to page like crazy, and you need a paging file at least as large as your largest
expected app usage. So if you need 32GB of data space, you will expect to need 32GB of
swap space for it. Now, given particular physical configurations of RAM (e.g., 64GB), you
may not need much of that swap space much of the time. There is even a slight chance you
will never have to page. Of course, this also requires that your app's working set size
be set high enough that it will not get trimmed. But this is what we have been saying for several
days now: if you need lots of space, you need Win64. That is not an option, that is a
necessity. If you cannot live with a 1GB or smaller contiguous allocation, you have no
choice but to move to Win64. Perhaps the simplest solution for you is simply to move to
Win64 now, and stop worrying about this, because otherwise you are just going to keep
insisting that you want the OS to do something it is incapable of.
Architecture is a collection of interacting decisions, many of which (such as working set)
are user-definable, but whose default values may be unsuitable for you. You have to find
out what all the parameters are, and make sure you have configured them correctly. So you
start by buying a machine with mongo memory, and proceed from there.
*****
*****
I just tried an experiment; I ran a program that tried to
allocate
storage. If it succeeded, I would exit the program and
try again.
Using either VirtualAlloc or malloc, the largest size I
could allocate was 1100MB; 1200MB
failed. I did not try values between these two. And that
was in a trivial MFC program
Someone else on this same thread was able to allocate 4 GB.
Not in this world. It is simply impossible. The system has no concept of allowing more
than either 2GB or 3GB TOTAL USER SPACE. So nobody is EVER going to be able to allocate
4GB of memory on Win32. On Win64, it is equally impossible in a 32-bit program, because
some segment of that memory holds your code, stacks, static data, and heap other than this
one massive structure. So you MIGHT get around 3.5GB plus or minus some change, but never
4GB (don't forget that the first 64K and the last 64K don't exist).
In Win64, 4GB is just a toy allocation. Sort of like 4K in Win32. Serious allocations
are in the terabytes.
*****
****
that did no other allocation, had no user DLLs loaded
(just what MFC loads), and allocated
essentially immediately upon startup. Your Mileage May
Vary, but it shows that hopes of
getting larger allocations are very unlikely. Note that
it took about 6 seconds to do the
allocation.
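An experiment of this kind can be sketched portably with malloc. This is not the program used above, which was restarted for each trial and also tried VirtualAlloc; the function name and parameters are hypothetical. Note too that on some systems malloc can succeed without the pages being backed until they are touched, so the result is an upper bound on what is usable.

```cpp
#include <cstddef>
#include <cstdlib>

// Largest multiple of `step` bytes, up to `limit`, that one malloc call grants.
// Each successful block is freed before the next, larger, attempt.
inline std::size_t largest_single_malloc(std::size_t step, std::size_t limit) {
    if (step == 0) {
        return 0;  // nothing sensible to probe
    }
    std::size_t best = 0;
    std::size_t size = step;
    while (size <= limit) {
        void* block = std::malloc(size);
        if (block == nullptr) {
            break;  // stop at the first failure
        }
        std::free(block);
        best = size;
        if (limit - size < step) {
            break;  // next size would exceed the limit (also avoids overflow)
        }
        size += step;
    }
    return best;
}
```

For example, largest_single_malloc(100u << 20, 2000u << 20) would probe in 100MB steps up to 2000MB. In a 32-bit process the answer is expected to fall well short of 2GB and to vary with whatever else is already loaded.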
An assumption of uniform time to access large data arrays
is not a valid assumption and
has NEVER been a valid assumption in virtual memory
systems. If you created an algorithm
whose success depends on a performance that is in practice
impossible to achieve, then you
need to rethink your design. It can be as simple as
repacking your FSM so adjacent states
are packed adjacent. Or it simply may be that it is
impossible to achieve the performance
you thought was possible.
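One possible way to repack, offered as a sketch and not as the method meant above, is to renumber the states in breadth-first order from the start state, so that states reached together tend to sit in neighbouring rows of the table. The dense table layout and the names TransitionTable and repack_bfs are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Assumed layout: next[state][symbol] gives the target state.
using TransitionTable = std::vector<std::vector<std::size_t>>;

// Renumber states in breadth-first order from `start`. Unreachable states
// are dropped; the new start state is always 0.
inline TransitionTable repack_bfs(const TransitionTable& next, std::size_t start) {
    assert(start < next.size());
    const std::size_t kUnseen = next.size();  // sentinel: not a valid new id
    std::vector<std::size_t> new_id(next.size(), kUnseen);
    std::vector<std::size_t> order;  // old state ids in visit order
    std::queue<std::size_t> pending;
    new_id[start] = 0;
    order.push_back(start);
    pending.push(start);
    while (!pending.empty()) {
        const std::size_t s = pending.front();
        pending.pop();
        for (std::size_t target : next[s]) {
            if (new_id[target] == kUnseen) {
                new_id[target] = order.size();
                order.push_back(target);
                pending.push(target);
            }
        }
    }
    // Rebuild the table row by row in the new order, with renamed targets.
    TransitionTable packed(order.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        for (std::size_t target : next[order[i]]) {
            packed[i].push_back(new_id[target]);
        }
    }
    return packed;
}
```

Breadth-first order is only one heuristic. A real recognizer would more likely order states by measured transition frequency, but the mechanics of renumbering and rewriting the table are the same.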
Note that the issues of working set and VM do not go away
in Win64. Paging does not go
away. Physical memory still has no meaning.
****
****
If there was some disk equivalent technology that was
comparable in speed to RAM, then this limitation would not
have the same degree of impact. Conventional disk seek
time
would kill my performance. The alternatives that I have
examined are solid state drives and various types (and
redesigns) of RAID arrays.
Yes, those help. Sometimes the only solution is faster
technology. Raw hardware can
solve problems. So can algorithm redesign. Those of us
who grew up on machines with slow
swapping files and small address spaces learned these
lessons. The current generation
thinks that memory is uniform, and it always comes as a
surprise when they discover it is
not.
****
****
It would be really great if this problem did not exist
because I would then be able to process Chinese glyphs
efficiently. The current process is estimated to require
about 2.0 TB RAM. I am working on redesigning the process
to
eliminate this restriction.
Sounds like Win64 to me (8TB limit). But note that you
will still be limited by how many
pages are available in physical memory, and that's not
going to change a lot in the
foreseeable future because of memory costs. Memory costs
are not only the cost of the
chips, but the cost of the space made available on the
motherboards to plug the memory
into (sockets cost money; printed circuit board space
costs money, and there are physical
limits to how many sockets you can place on a
motherboard). For example, a 2GB chip costs
about $80. So a 2TB RAM system requires 1000 chips and
would cost $80,000. But note that
this means you would need 1000 sockets on your
motherboard! Not going to happen. So you
are going to be paging. Take that as a given. It is not
negotiable, it is not avoidable,
it is going to be part of what you live with and you
cannot change that fact. So your
algorithms have to change to account for that.
joe
****
Yes so I have several alternatives.
(1) Conventional Windows like glyphs can easily be
recognized with (by today's standards) small amounts of RAM
(2) Complex glyphs such as those that Apple, PDF, and some
Unix systems have may sometimes exceed the limits of Win32,
and thus require a 64 bit OS. (it will sometimes require
std::vector::size() > 4.0 GB)
(3) The recognition of Asian glyphs will require a redesign
that will save memory at the cost of speed. I think that I
have the elements of this redesign figured out.
These all sound reasonable. The time vs. memory tradeoffs have always been with us.
joe
****
Joseph M. Newcomer [MVP]
(Professional hardware designers as a matter of course
specify what are called "infinite
scatter-gather DMA controllers". Although "infinite"
is a bit of a misnomer (you are
usually limited to 4GB of descriptors), each descriptor
specifies a 32-bit address and a
32-bit length, allowing a single DMA transfer to
move as many discontiguous blocks of
data as are needed to complete the I/O in a single
operation.)
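A descriptor list of that kind can be pictured with the sketch below. The struct layout and names are hypothetical and not taken from any real controller; it only shows that one transfer is described by a list of (address, length) pairs, one per physically contiguous fragment.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical descriptor: one entry per physically contiguous fragment.
struct DmaDescriptor {
    std::uint32_t physical_address;  // 32-bit start address of the fragment
    std::uint32_t length_bytes;      // 32-bit length of the fragment
};

// Total bytes moved by a single transfer that walks the whole list.
inline std::uint64_t total_transfer_bytes(const std::vector<DmaDescriptor>& list) {
    std::uint64_t total = 0;
    for (const DmaDescriptor& d : list) {
        total += d.length_bytes;
    }
    return total;
}
```

A buffer that is contiguous in virtual memory but spread over three physical fragments would simply be described by three descriptors, which is why the hardware has no need for physically contiguous memory.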
You may require contiguous *virtual* memory, which is a
different question, and when you
start looking at objects the size you are describing,
either you have to assume that you
will be working with discontiguous memory or you have to
go to a 64-bit native platform.
There are no other solutions.
joe
****
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm