Re: Fast way to allocate buffer for producer/consumer scenario



Ulrich Eckhardt wrote:
Hi!

The topic already says most of what I want, but here is a slightly more
verbose explanation:
I have a thread that performs network service by simply reading from a
socket. At the moment, it then copies the received data into the internal
queue and triggers another thread to handle the received data. The size of
the received data ranges from around 6 bytes to 500 bytes in 98% of all
cases, but in rare cases it can reach a few megabytes.

What I now did was to restructure this so that instead of copying the buffer
to the queue, the data is moved into the queue, i.e. the queue assumes
ownership of the buffer and the network service thread allocates a new
buffer for the next transfer.

How about a hybrid approach - copy if up to 500 bytes, move if over 500 bytes?


During that, I noticed that the performance strongly depends on the way the
buffers are allocated, which is why I wanted to ask for the best way to do
it. I guess that the fastest way would probably be a dedicated memory pool,
i.e. one that is used only for this transfer. However, this is not a
trivial task a) to get right and b) to test that you got it right, so I'd
rather not do this but build on a good system-provided way.

The fastest method is to make no memory allocations at all while running (e.g. allocate enough buffers up front), though this sounds like it might have too much memory overhead for your situation.
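
As a rough sketch of the "allocate up front" idea: the pool below takes all its memory in one go at construction and afterwards only pushes and pops pointers. The names (buffer_pool, acquire, release, available) are made up for illustration, and the sketch is single-threaded; in a producer/consumer setup, acquire() and release() would need a lock or a lock-free free list around them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size buffer pool: all memory is allocated once in the
// constructor; acquire() and release() only pop and push pointers.
class buffer_pool
{
public:
  buffer_pool(std::size_t buffer_size, std::size_t count)
    : m_buffer_size(buffer_size), m_storage(buffer_size * count)
  {
    // Reserve the free list up front so release() never has to reallocate.
    m_free.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
      m_free.push_back(&m_storage[i * buffer_size]);
  }

  // Returns a buffer of buffer_size() bytes, or a null pointer if the
  // pool is exhausted.
  char* acquire()
  {
    if (m_free.empty())
      return 0;
    char* p = m_free.back();
    m_free.pop_back();
    return p;
  }

  // Hands a buffer previously obtained from acquire() back to the pool.
  void release(char* p) { m_free.push_back(p); }

  std::size_t buffer_size() const { return m_buffer_size; }
  std::size_t available() const { return m_free.size(); }

private:
  std::size_t m_buffer_size;
  std::vector<char> m_storage;  // the single up-front allocation
  std::vector<char*> m_free;    // buffers currently not in use
};
```

Sizing every slot for the rare multi-megabyte message is what makes the memory overhead unattractive here; a pool like this would more plausibly cover the common small case, with a fallback to the normal allocator for anything larger.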


So far I have tried GlobalAlloc, VirtualAlloc and 'operator new'.
All three had comparable performance, but I'm not satisfied with
any of them. GlobalAlloc() is marked as obsolete and slower than
VirtualAlloc(), which is supposed to replace it. VirtualAlloc() however
interoperates with the OS to reserve pages in virtual and physical memory,
which in itself presents an overhead. Doing so repeatedly seems like a lot
of overhead to me, I'd rather recycle the pages inside the process before
interacting with the OS. 'operator new' does exactly that, but it uses
byte-size granularity instead of page-size granularity, which imposes a
certain overhead.

Given that your allocations are mostly in the range of 6-500 bytes, surely you want byte granularity?


I have taken a look at the heap functions for creating and accessing a heap,
is that perhaps the right approach? To be honest, this API seems to be
quite complicated, which is why I ask here first if it would solve my
problems.

Any suggestions anyone?

In a release build, operator new calls the Heap* functions pretty much directly, but you might get a small benefit from using a dedicated heap. For small operator new allocations, you can get a speedup by calling:
_set_sbh_threshold(512);
or similar at program startup. However, I think you'll get the biggest benefit from a "small buffer" optimization, where smaller allocations (e.g. up to 500 bytes) are held directly in the object and are therefore copied rather than moved. This is easy enough to encapsulate in a class, for example:


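#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy

class mybuffer
{
public:
  mybuffer(std::size_t size)
    :m_size(size)
  {
    //only buffers larger than MAX_INTERNAL live on the heap
    if (m_size > MAX_INTERNAL)
    {
      m_buffer.m_external = new char[m_size];
    }
  }

  ~mybuffer()
  {
    if (m_size > MAX_INTERNAL)
      delete[] m_buffer.m_external;
  }

  //transfers buffer: a large buffer is moved (other is left empty),
  //a small buffer is copied
  mybuffer(mybuffer& other)
    :m_size(other.m_size)
  {
    if (m_size > MAX_INTERNAL)
    {
      m_buffer.m_external = other.m_buffer.m_external;
      other.m_buffer.m_external = 0;
      other.m_size = 0;
    }
    else
    {
      std::memcpy(m_buffer.m_internal,
        other.m_buffer.m_internal, m_size);
    }
  }

  //access members like size() and get().

  //operator=(mybuffer& other) should be done too; until it is, the
  //compiler-generated one would leave two objects owning one buffer

private:
  //static, so that it is a compile-time constant usable as an array bound
  static const std::size_t MAX_INTERNAL = 500;
  std::size_t m_size;
  union
  {
    char* m_external;
    char m_internal[MAX_INTERNAL];
  } m_buffer;
};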
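
For compilers with C++11 support (an assumption; the class above does not need it), the same idea could be written with move semantics, which also covers the assignment and accessor members left open above. This is only a sketch: the name small_buffer and the helpers reset() and take() are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical C++11 variant of the small-buffer class.
class small_buffer
{
public:
  static constexpr std::size_t max_internal = 500;

  explicit small_buffer(std::size_t size) : m_size(size)
  {
    // Only buffers larger than max_internal live on the heap.
    if (m_size > max_internal)
      m_buffer.m_external = new char[m_size];
  }

  ~small_buffer() { reset(); }

  // Move constructor: steals a large buffer, copies a small one.
  small_buffer(small_buffer&& other) noexcept : m_size(0) { take(other); }

  small_buffer& operator=(small_buffer&& other) noexcept
  {
    if (this != &other)
    {
      reset();
      take(other);
    }
    return *this;
  }

  // Copying is disabled so that ownership is always explicit.
  small_buffer(const small_buffer&) = delete;
  small_buffer& operator=(const small_buffer&) = delete;

  std::size_t size() const { return m_size; }

  char* get()
  {
    return m_size > max_internal ? m_buffer.m_external
                                 : m_buffer.m_internal;
  }

private:
  // Frees the external buffer, if any, and leaves the object empty.
  void reset()
  {
    if (m_size > max_internal)
      delete[] m_buffer.m_external;
    m_size = 0;
  }

  // Precondition: *this owns no external buffer.
  void take(small_buffer& other)
  {
    m_size = other.m_size;
    if (m_size > max_internal)
      m_buffer.m_external = other.m_buffer.m_external;
    else
      std::memcpy(m_buffer.m_internal, other.m_buffer.m_internal, m_size);
    other.m_size = 0;  // other no longer owns anything
  }

  std::size_t m_size;
  union
  {
    char* m_external;
    char m_internal[max_internal];
  } m_buffer;
};
```

The network thread would fill a small_buffer and hand it to the queue with std::move; small messages then cost one memcpy of at most 500 bytes and no heap allocation, while large ones change owner without being copied.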

Tom