Re: Is there a maximum contiguous memory allocation?




See below...
On Tue, 22 Dec 2009 07:39:14 -0600, "Peter Olcott" <NoSpam@xxxxxxxxxxxxx> wrote:


"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in
message news:qjb0j55qf7r19bdd2dcm1dciotd7g79jq3@xxxxxxxxxx
Followup: with my trivial MFC app, the largest allocation
I could get with VirtualAlloc or
malloc was 1194*1024*1024 bytes. 1195 failed.

Yes but then are you testing this on Win64 with 8 GB + RAM?
I need to know that Win64 with 32 GB of RAM could provide me
with a 30 GB std::vector that maps to that much physical
memory.
****
No, I'm testing it in the context of Win32. If you have Win64, then try the experiment
yourself!

Note that if I had my Win64 system up, with 4GB of RAM, I could STILL allocate 30GB of
vector. Most of it would be paged out most of the time, but I could ALLOCATE it! I could
allocate it if I had 2GB of physical memory! I see no reason I would need 8GB+ of RAM to
do a simple allocation. In fact, if the allocation failed I would seriously start
worrying about Windows. Note, of course, that I also need, in order to allocate 30GB of
VM, probably on the order of 60GB of paging file, which would consume pretty much my
entire 68GB C: drive (which won't happen because I have Office and VS installed on it, so
I don't have enough space left for that massive pagefile.sys), so I'd have to install a
second hard drive just to hold the paging file. So I have to meet the minimum system
requirements. But with a 60GB paging file I can allocate 30GB of vector independent of
the amount of physical memory I have installed.

PLEASE STOP USING THE PHRASE "PHYSICAL MEMORY". IT IS NEVER GOING TO HAPPEN! Your
continued use of this phrase in consistently erroneous ways is not helping make your case.
It keeps screaming "I have no idea what I'm talking about, but answer my question anyway".

If you allocate a 30GB std::vector, you get 30GB of virtual memory. The chances that this
will all be resident in a 32GB physical memory configuration is vanishingly small, let's
just call it "no chance whatsoever" to simplify the discussion. If you want 30GB to be
largely resident, you must
- make sure the user has the privileges to modify the working set quota
- set the working set quota to be large enough to encompass your
code, your data, AND your 30GB vector

Note that this merely IMPROVES the likelihood that your vector will not be paged out when
you go to access part of it. In NO WAY does it make any guarantees! And "working set" is
a "request", not a "non-negotiable demand". The system is completely free to ignore
anyone's working set request if it needs pages for any purpose it deems suitable.

If you want a 100% guarantee that the vector will be resident, and not get paged out, you
need to
- make sure the user has the privileges to use VirtualLock, and to be able
to lock at least 30GB down
- use VirtualLock to lock the memory [probably will fail on a system with
32GB, so plan on 64GB as your minimum config]

NOW you have locked down 30GB of VIRTUAL memory. It won't page out. But on a machine of
< 64GB, it is unlikely this can happen. It may not even be possible because there might
be no way to allow a user to set a working set large enough, or do a VirtualLock large
enough; I don't know what the administrative limits are. But these are the minimum steps
you need to take.
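As a rough illustration of the two steps above (raise the working set, then lock), here is a minimal sketch. The helper names try_pin_region and round_up_to_page and the "slack" allowance are assumptions made for this example, not anything from the original experiment. The Win32 calls used are GetSystemInfo, SetProcessWorkingSetSize and VirtualLock. Either of the last two can fail if the account lacks the rights or the system simply refuses, and outside Windows the function is a stub that returns false.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Round n up to a multiple of page_size (page_size must be a power of two).
constexpr std::size_t round_up_to_page(std::size_t n, std::size_t page_size) {
    return (n + page_size - 1) & ~(page_size - 1);
}

// Hypothetical helper: try to pin [addr, addr + bytes) so it is not paged out.
// `slack` is an assumed allowance for code, other data and locking overhead.
// Returns false if the OS refuses, which it is free to do.
bool try_pin_region(void* addr, std::size_t bytes, std::size_t slack) {
    assert(addr != nullptr && bytes > 0);
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const SIZE_T min_ws = round_up_to_page(bytes + slack, si.dwPageSize);
    const SIZE_T max_ws = min_ws + round_up_to_page(slack + 1, si.dwPageSize);
    // The number of pages a process may lock is bounded by its minimum
    // working set, so the working set has to be raised before locking.
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), min_ws, max_ws)) {
        return false;
    }
    return VirtualLock(addr, bytes) != FALSE;
#else
    (void)addr; (void)bytes; (void)slack;
    return false;  // Windows-only sketch; nothing is pinned elsewhere
#endif
}
```

A caller would allocate the block first (for example with VirtualAlloc), pass its address and size here, and treat a false return as "run unpinned or give up".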

Note that both of these require administrative controls be exercised. Note that virtually
none of your customers would have a clue as to how to set any of these parameters, so you
will have to give them a script to follow (and I have never worked with these parameters,
and I have no idea how to access them). The script itself will require someone with
administrative privileges to set these parameters.

Note that a system is extremely unlikely to survive with only 2GB left over to run the
kernel and the rest of your application. So if you want 30GB, you had better have a LOT
of spare memory around! So don't even think of trying to do this on a system < 64GB.

You will never, ever, under any circumstances imaginable be able to allocate physical
memory. You can only allocate virtual memory.
joe
****


For a real app, one with lots of DLLs, threading, etc. the value will typically be
smaller. Perhaps much smaller.

Your Mileage May Vary.
joe


On Mon, 21 Dec 2009 12:35:49 -0500, Joseph M. Newcomer
<newcomer@xxxxxxxxxxxx> wrote:

See below...
On Mon, 21 Dec 2009 07:17:18 -0600, "Peter Olcott"
<NoSpam@xxxxxxxxxxxxx> wrote:


"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in
message
news:0dssi5thoj8m18oc3nintm5bdqdfhc4op7@xxxxxxxxxx
See below...
On Sat, 19 Dec 2009 09:15:38 -0600, "Peter Olcott"
<NoSpam@xxxxxxxxxxxxx> wrote:


"Bo Persson" <bop@xxxxxx> wrote in message
news:7p47djFhkhU1@xxxxxxxxxxxxxxxxxxxxx
Peter Olcott wrote:
My application needs to create a std::vector > 5GB, is that
possible in x64?

Yes, if you have enough virtual memory.

The limitation is not in the amount of RAM, but in the
address space.


Bo Persson



I have read that Microsoft has placed an artificial 2GB
limit on the size of an array. I also read that this same
limit applies to 64 bit .NET applications, maximum object
size of 2GB.

My application requires a single contiguous block of
physical memory, is this possible?

****
I'd missed this, and I'm only coming back to it based on another reply.

Unless you are writing a device driver for a piece of hardware designed by an amateur
designer, the chance that you will require contiguous physical memory is zero.

My DFA recognizer needs contiguous physical memory,
****
Nonsense. Complete and utter nonsense, beyond any shadow
of a doubt. Why do you keep
talking about "contiguous physical memory"?
(a) it doesn't matter if the physical memory is or is not
contiguous
(b) from an application, you cannot control physical
memory
(c) even if you could control physical memory, you can't
allocate large blocks of it
(d) what part of "virtual memory" are you failing to
understand?
****
or disk
swap time would make this process infeasibly slow.
****
In the trade, we call this "life is hard". Meaning,
there's nothing you can do about it.
You are making an impossible request, which
(a) has no meaning
(b) makes no sense because it is impossible to achieve
(c) requires something that makes no sense in a virtual memory world
(d) is impossible to achieve even for a kernel programmer working with physical memory
(MmAllocateContiguousMemory)
(e) even if it was possible, it would not change anything, since you can't address more
than 2GB
(f) that 2GB has to include space for all your application, other structures you use,
DLLs, all the storage they use, and the OS interface, so you are reduced to something less
than 2GB
(g) since those various pieces I just alluded to can fragment memory, in practice you
cannot get arbitrarily large contiguous blocks any time you feel like it; there is a
practical limit to the maximum block size, which varies from moment to moment in your
program; the longer your program runs, the smaller this size becomes.
****
There would be a possible disk read for every pixel on
the screen.
Current whole screen response time <= 100 ms.
****
This is called "need to redesign the algorithm".
Typically, in VM systems, you have to
consider things that repack FSM models to maximize
locality of reference. This is a
problem that has been known and understood since at least
1961, and was well-understood
when I started using virtual storage in 1968 (that's 41
years ago). In 1969, we were
spending hours analyzing our algorithms and repacking data
to minimize paging; in fact, we
were even using features of our linker to pack code
adjacent. In 1971, I wrote a
diagnostic program that measured code page transitions
during execution of an application
so we could understand how to pack our code to minimize
page faults by studying its actual
behavior. The first LISP machines (in the 1980s) did not
store lists as lists but as
contiguous arrays to minimize page faults (the extra cost
and complexity of handling
complex array/list structures including automatic
repacking of lists into arrays more than
paid off in terms of performance gains achieved by
avoiding page faults). Once we got
machines with caches, we started redesigning algorithms to
maximize cache hits even for
pages that were resident. Cache hit performance can
improve your program performance by a
factor of 10; paging optimization can improve your program
performance by a factor of
100,000 to 1,000,000. Or more.
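
To make "locality of reference" concrete, here is a small sketch (the function names are made up for illustration). Both functions compute the same sum over a matrix stored row by row in one contiguous vector. The first walks memory in storage order. The second jumps by a whole row on every step, so on a large matrix it touches a different cache line, and eventually a different page, on nearly every access. No timings are claimed here; the point is only that the access pattern, not the arithmetic, is what differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row-major in one contiguous vector,
// visiting elements in storage order (consecutive addresses).
long long sum_row_order(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same sum, but each step strides by `cols` elements, so consecutive
// accesses land far apart in memory (poor locality on large matrices).
long long sum_column_order(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```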

Note that you can use raw VirtualAlloc to improve your
chances of getting storage (malloc
already guarantees fragmentation most of the time). But
you are still going to hit limits
far smaller than 2GB. I just tried an experiment; I ran a
program that tried to allocate
storage. If it succeeded, I would exit the program and
try again.

Using either VirtualAlloc or malloc, the largest size I
could allocate was 1100MB; 1200MB
failed. I did not try values between these two. And that
was in a trivial MFC program
that did no other allocation, had no user DLLs loaded
(just what MFC loads), and allocated
essentially immediately upon startup. Your Mileage May
Vary, but it shows that larger allocations are
very unlikely to succeed. Note that
it took about 6 seconds to do the
allocation.
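
An experiment of this kind can be sketched as a binary search over block sizes. This is not the program described above: that one was restarted for every attempt, whereas this sketch probes repeatedly inside one process, and the names largest_block_mb and malloc_probe are invented for the example. It assumes that if a given size fails, every larger size fails too. Note also that on systems that overcommit memory, a successful malloc says nothing about whether the pages can actually be backed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <limits>

// Largest size in [lo, hi] (in MB) for which can_alloc succeeds, assuming
// success is monotonic in size. Returns 0 if even `lo` MB fails.
std::size_t largest_block_mb(std::size_t lo, std::size_t hi,
                             const std::function<bool(std::size_t)>& can_alloc) {
    assert(lo <= hi);
    if (!can_alloc(lo)) return 0;
    while (lo < hi) {
        // Upper midpoint, so the range always shrinks.
        const std::size_t mid = lo + (hi - lo + 1) / 2;
        if (can_alloc(mid)) lo = mid; else hi = mid - 1;
    }
    return lo;
}

// Probe: can malloc hand out one contiguous block of `mb` megabytes?
// The block is released immediately; only success or failure matters.
bool malloc_probe(std::size_t mb) {
    const std::size_t mega = 1024u * 1024u;
    if (mb > std::numeric_limits<std::size_t>::max() / mega) return false;
    void* p = std::malloc(mb * mega);
    std::free(p);
    return p != nullptr;
}
```

Calling largest_block_mb(1, 2048, malloc_probe) in a 32-bit build gives a figure to compare against the numbers reported in this thread; the result will vary with what else is loaded into the process.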

An assumption of uniform time to access large data arrays
is not a valid assumption and
has NEVER been a valid assumption in virtual memory
systems. If you created an algorithm
whose success depends on a performance that is in practice
impossible to achieve, then you
need to rethink your design. It can be as simple as
repacking your FSM so adjacent states
are packed adjacent. Or it simply may be that it is
impossible to achieve the performance
you thought was possible.
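
One possible way to repack an FSM, shown only as a sketch: renumber the states in breadth-first order from the start state, so that a state and its successors tend to end up near each other in the transition table. Breadth-first order is an assumption chosen for simplicity here; a real repacking would more likely be driven by measured transition frequencies. The Table type and the name repack_bfs are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// table[state][symbol] = next state, or -1 for "no transition".
using Table = std::vector<std::vector<int>>;

// Renumber states in breadth-first order from `start` and rebuild the table.
// If old_to_new is non-null it receives the mapping old id -> new id.
Table repack_bfs(const Table& table, int start, std::vector<int>* old_to_new = nullptr) {
    const int n = static_cast<int>(table.size());
    if (n == 0) return {};
    assert(start >= 0 && start < n);
    std::vector<int> new_id(n, -1);
    std::vector<int> order;  // order[new id] = old id
    std::queue<int> pending;
    new_id[start] = 0;
    order.push_back(start);
    pending.push(start);
    while (!pending.empty()) {
        const int s = pending.front();
        pending.pop();
        for (int t : table[s]) {
            if (t >= 0 && new_id[t] < 0) {
                new_id[t] = static_cast<int>(order.size());
                order.push_back(t);
                pending.push(t);
            }
        }
    }
    // States unreachable from `start` keep their relative order at the end.
    for (int s = 0; s < n; ++s) {
        if (new_id[s] < 0) {
            new_id[s] = static_cast<int>(order.size());
            order.push_back(s);
        }
    }
    Table packed(n);
    for (int i = 0; i < n; ++i)
        for (int t : table[order[i]])
            packed[i].push_back(t < 0 ? -1 : new_id[t]);
    if (old_to_new) *old_to_new = new_id;
    return packed;
}
```

The recognizer itself does not change; it simply runs over the packed table, starting from state 0.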

Note that the issues of working set and VM do not go away
in Win64. Paging does not go
away. Physical memory still has no meaning.
****

If there was some disk equivalent technology that was
comparable in speed to RAM, then this limitation would not
have the same degree of impact. Conventional disk seek time
would kill my performance. The alternatives that I have
examined are solid state drives and various types (and
redesigns) of RAID arrays.
****
Yes, those help. Sometimes the only solution is faster
technology. Raw hardware can
solve problems. So can algorithm redesign. Those of us
who grew up on machines with slow
swapping files and small address spaces learned these
lessons. The current generation
thinks that memory is uniform, and it always comes as a
surprise when they discover it is
not.
****

It would be really great if this problem did not exist
because I would then be able to process Chinese glyphs
efficiently. The current process is estimated to require
about 2.0 TB RAM. I am working on redesigning the process to
eliminate this restriction.
****
Sounds like Win64 to me (8TB limit). But note that you
will still be limited by how many
pages are available in physical memory, and that's not
going to change a lot in the
foreseeable future because of memory costs. Memory costs
are not only the cost of the
chips, but the cost of the space made available on the
motherboards to plug the memory
into (sockets cost money; printed circuit board space
costs money, and there are physical
limits to how many sockets you can place on a
motherboard). For example, a 2GB chip costs
about $80. So a 2TB RAM system requires 1000 chips and
would cost $80,000. But note that
this means you would need 1000 sockets on your
motherboard! Not going to happen. So you
are going to be paging. Take that as a given. It is not
negotiable, it is not avoidable,
it is going to be part of what you live with and you
cannot change that fact. So your
algorithms have to change to account for that.
joe
****

(Professional hardware designers as a matter of course specify what are called "infinite
scatter-gather DMA controllers". Although "infinite" is a bit of a misnomer (you are
usually limited to 4GB of descriptors), each descriptor specifies a 32-bit address and a
32-bit length, allowing a single DMA transfer to transfer as many discontiguous blocks of
data as are needed to complete the I/O, in a single operation.)

You may require contiguous *virtual* memory, which is a
different question, and when you
start looking at objects the size you are describing,
either you have to assume that you
will be working with discontiguous memory or you have to
go to a 64-bit native platform.
There are no other solutions.
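
For the "discontiguous memory" route, a minimal sketch of what that can look like: an array that presents one index space but stores its elements in fixed-size chunks, so no single allocation has to be larger than one chunk. The class name ChunkedArray and its interface are assumptions made up for this example. std::deque works along similar lines, but does not let you choose the chunk size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// One logical array of `size` elements, stored as separately allocated
// chunks of at most `chunk_elems` elements each.
template <typename T>
class ChunkedArray {
public:
    ChunkedArray(std::size_t size, std::size_t chunk_elems)
        : size_(size), chunk_(chunk_elems) {
        assert(chunk_elems > 0);
        for (std::size_t done = 0; done < size_; done += chunk_) {
            const std::size_t n = std::min(chunk_, size_ - done);
            chunks_.push_back(std::make_unique<T[]>(n));  // value-initialized
        }
    }

    // Element i lives in chunk i / chunk_ at offset i % chunk_.
    T& operator[](std::size_t i) { return chunks_[i / chunk_][i % chunk_]; }
    const T& operator[](std::size_t i) const { return chunks_[i / chunk_][i % chunk_]; }

    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t size_;
    std::size_t chunk_;
    std::vector<std::unique_ptr<T[]>> chunks_;
};
```

The price is one extra division and indirection per access, and code that needs a raw pointer to the whole array cannot use it.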
joe
****
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
