Re: Byte alignment



The __declspec would apply to compiler-allocated data, but not to dynamically allocated
data. Read about __declspec(align), which explicitly says it is there to "precisely
control the alignment of user-defined data (for example, static or automatic data in a
function)". Right away that indicates that it doesn't apply to dynamically-*allocated*
data. It even explicitly says you have to use _aligned_malloc to get dynamic data
properly aligned.
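
For what it's worth, here is a minimal sketch of what I mean by allocating the matrix
with an explicit alignment instead of decorating the new-expression. The helper names
(AllocAligned, FreeAligned, IsAligned) are hypothetical, made up for illustration. The
_aligned_malloc/_aligned_free pair is the Microsoft CRT one; the #else branch assumes a
POSIX system and uses posix_memalign, purely so the sketch can be tried outside VC++.
Note that the alignment is an ordinary runtime argument, not a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign and free on POSIX systems
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: returns `bytes` of storage whose address is a
// multiple of `alignment` (a power of two), or nullptr on failure.
inline void* AllocAligned(std::size_t bytes, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_MSC_VER)
    return _aligned_malloc(bytes, alignment);
#else
    // Assumption: POSIX. posix_memalign needs at least sizeof(void*).
    if (alignment < sizeof(void*))
        alignment = sizeof(void*);
    void* p = nullptr;
    return posix_memalign(&p, alignment, bytes) == 0 ? p : nullptr;
#endif
}

// Hypothetical helper: releases storage obtained from AllocAligned.
// Memory from _aligned_malloc must not be passed to plain free().
inline void FreeAligned(void* p)
{
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}

// True if the address of p is a multiple of `alignment` (a power of two).
inline bool IsAligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

A 4x4 float matrix would then be obtained with something like
static_cast<float*>(AllocAligned(16 * sizeof(float), 16)) and released with
FreeAligned, never with delete[]. Checking the result with IsAligned in a debug build
is cheap insurance.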

I was told by one client who is using high-end Xeons that even the 32-bit chipsets are now
using wider cache lines. I haven't checked this out, but it came up in a discussion of
optimizing cache hits for regular computations during my system programming course.
AMD64s, assuming it is possible to find one that actually runs Win64 successfully (I am
just packing up my shuttle.com unit and returning it because it doesn't), will have 32- or
64-byte alignment.

By the way, if anyone knows of a workable AMD64/Win64 configuration, I am trying to buy
one. However, after fighting the nVidia nForce4 chipset for several days and asking
around, several other people have told me that the nVidia 64-bit drivers essentially don't
work, and they are very unhappy with their nVidia-based motherboards. One had returned
his, the other is stuck and very, very unhappy. The rest didn't say what they had done,
but they were definitely unhappy.
joe


On Wed, 28 Sep 2005 14:50:18 -0400, "Hanna-Barbera" <NULL@xxxxxxxxxx> wrote:

>
>"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message
>news:oejlj19i9ib91fbqc67jqnaf711g1mpqlu@xxxxxxxxxx
>> The problem is that you have applied a __declspec to a meaningless context.
>> Saying that you want something aligned on 16 byte boundaries does not cause
>> the allocation to take place on 16-byte boundaries
>
>I figured I'd give it a shot. D3DX has a typedef specifically for this
>
>#define D3DX_ALIGN16 __declspec(align(16))
>typedef D3DX_ALIGN16 _D3DXMATRIXA16 D3DXMATRIXA16, *LPD3DXMATRIXA16;
>
>I did something of the sort for my class members, variables declared on the
>stack, ...
>but for some reason it did not work.
>So I tried
>
>float *pMatrix = new D3DX_ALIGN16 float[16];
>
>Still didn't work.
>
>> (why 16 bytes, by the way? Cache alignment? On what chip set? Be aware that
>> some chipsets have wider cache lines than 16 bytes these days. AMD64
>> systems, even running 32-bit operating systems, have 32-byte or 64-byte
>> cache lines).
>
>I'm aiming for 32 bit CPUs.
>This is what the SSE instruction set requires. I should say that certain SSE
>instructions require it. They pull data faster from cache to the registers
>this way.
>I think this is part of the design: even the P4 (all 32 bit), Athlon XP, and
>Athlon 64 need 16-byte alignment.
>
>> There is an _aligned_malloc call that takes an alignment constraint, but you
>> might want to make this a runtime value, not a compile-time value (note that
>> the DirectX code may predate newer chipsets, and naively assume that 16 is
>> optimal)
>> joe
>
>I'll give it a shot. I have no idea what D3DX does. I know what I'm doing :)
>
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm