Alignment of arrays in .NET

Posted 2019-01-26 14:47

Question:

Are arrays in .NET aligned to any boundary?

If yes, to which? And is it the same for all array types?

Answer 1:

The common language infrastructure (ECMA-335) places the following restrictions on alignment:

12.6.2 Alignment

Built-in data types shall be properly aligned, which is defined as follows:

  • 1-byte, 2-byte, and 4-byte data is properly aligned when it is stored at a 1-byte, 2-byte, or 4-byte boundary, respectively.
  • 8-byte data is properly aligned when it is stored on the same boundary required by the underlying hardware for atomic access to a native int.

Thus, int16 and unsigned int16 start on even address; int32, unsigned int32, and float32 start on an address divisible by 4; and int64, unsigned int64, and float64 start on an address divisible by 4 or 8, depending upon the target architecture. The native size types (native int, native unsigned int, and &) are always naturally aligned (4 bytes or 8 bytes, depending on the architecture). When generated externally, these should also be aligned to their natural size, although portable code can use 8-byte alignment to guarantee architecture independence. It is strongly recommended that float64 be aligned on an 8-byte boundary, even when the size of native int is 32 bits.

The CLI also specifies that you can use the unaligned. prefix to allow for arbitrary alignment. When that prefix is present, the JIT must produce code that reads and writes correctly regardless of the actual alignment.

Additionally, the CLI allows for the explicit layout of class fields:

  • explicitlayout: A class marked explicitlayout causes the loader to ignore field sequence and to use the explicit layout rules provided, in the form of field offsets and/or overall class size or alignment. There are restrictions on valid layouts, specified in Partition II.

...

Optionally, a developer can specify a packing size for a class. This is layout information that is not often used, but it allows a developer to control the alignment of the fields. It is not an alignment specification, per se, but rather serves as a modifier that places a ceiling on all alignments. Typical values are 1, 2, 4, 8, or 16. Generic types shall not be marked explicitlayout.
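
In C#, these layout options are exposed through StructLayoutAttribute. The following is a minimal sketch, not taken from the specification: the struct names are hypothetical, and the sizes reported are the marshaled (unmanaged) sizes returned by Marshal.SizeOf.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical types for illustration; the names are not from the specification.
[StructLayout(LayoutKind.Sequential)]            // default packing
struct DefaultPacked { public byte Tag; public int Value; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]  // packing size 1 caps every field alignment
struct TightlyPacked { public byte Tag; public int Value; }

[StructLayout(LayoutKind.Explicit, Size = 16)]   // explicit field offsets and overall size
struct ExplicitLayout
{
    [FieldOffset(0)] public byte Tag;
    [FieldOffset(8)] public int Value;
}

static class LayoutDemo
{
    // Marshaled offset of a field, in bytes.
    public static int OffsetOf<T>(string field)
    {
        return (int)Marshal.OffsetOf<T>(field);
    }

    static void Main()
    {
        Console.WriteLine(OffsetOf<DefaultPacked>("Value"));   // 4
        Console.WriteLine(Marshal.SizeOf<DefaultPacked>());    // 8
        Console.WriteLine(OffsetOf<TightlyPacked>("Value"));   // 1
        Console.WriteLine(Marshal.SizeOf<TightlyPacked>());    // 5
        Console.WriteLine(OffsetOf<ExplicitLayout>("Value"));  // 8
        Console.WriteLine(Marshal.SizeOf<ExplicitLayout>());   // 16
    }
}
```

Note that the packing size only lowers the alignment of fields inside the type. It does not say anything about the address at which an instance, or an array of instances, is placed.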



Answer 2:

I haven't done it myself, but if you need to control the alignment of an array for interoperability with unmanaged code, you could try an unsafe fixed-size buffer inside a struct that has StructLayoutAttribute applied, and see if that works.
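
A rough, untested-in-production sketch of that idea follows. The struct name, field names, and sizes are hypothetical, and the code must be compiled with unsafe blocks enabled. It only shows where the buffer sits relative to the start of the struct; it does not control the address of the struct itself.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical interop struct; the name and field sizes are illustrative only.
[StructLayout(LayoutKind.Sequential, Pack = 8)]
unsafe struct Samples
{
    public byte Header;
    public fixed double Values[4]; // inline storage for 4 doubles, not a managed array
}

static class FixedBufferDemo
{
    // Byte offset of the fixed buffer from the start of the struct.
    public static unsafe long ValuesOffset()
    {
        Samples s = new Samples();
        byte* start = (byte*)&s;
        byte* buffer = (byte*)s.Values; // a local struct is already fixed in place
        return buffer - start;
    }

    static unsafe void Main()
    {
        Console.WriteLine("offset of Values: " + ValuesOffset());
        Console.WriteLine("sizeof(Samples):  " + sizeof(Samples));
    }
}
```

If the reported offset is a multiple of 8, the buffer is aligned for double as long as the struct itself starts on an 8-byte boundary, which is something the caller still has to arrange (for example with unmanaged memory).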



Answer 3:

In .NET, objects (of which arrays are one kind) are always aligned to the pointer size (4-byte alignment in a 32-bit process, 8-byte alignment in a 64-bit process). So object pointers and object arrays are always aligned in .NET.

The code in Michael Graczyk's answer checks alignment on the index because, although the array itself is aligned, it's an Int32 array, so the elements at odd indices won't be aligned to the pointer size on 64-bit systems. On 32-bit systems, every element of an Int32 array would be aligned.

So technically that method could be faster if it checked the process's bitness. In a 32-bit process it wouldn't need the alignment check for Int32 arrays, since every element would be word aligned and pointers are word length as well in that case.
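
To see the effect on a given runtime, one option is to pin an Int32 array and look at each element's address modulo 4 and modulo 8. This is only a sketch; the class and method names are made up for the example, and the actual addresses depend on the runtime and process bitness.

```csharp
using System;
using System.Runtime.InteropServices;

static class ElementAlignment
{
    // Returns (address % boundary) for each element of a pinned int[].
    public static long[] Remainders(int[] array, int boundary)
    {
        GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
        try
        {
            // For an array, AddrOfPinnedObject is the address of element 0.
            long baseAddress = handle.AddrOfPinnedObject().ToInt64();
            long[] result = new long[array.Length];
            for (int i = 0; i < array.Length; i++)
                result[i] = (baseAddress + (long)i * sizeof(int)) % boundary;
            return result;
        }
        finally
        {
            handle.Free();
        }
    }

    static void Main()
    {
        int[] data = new int[4];
        Console.WriteLine("Pointer size: " + IntPtr.Size);
        Console.WriteLine("mod 4: " + string.Join(", ", Remainders(data, 4)));
        Console.WriteLine("mod 8: " + string.Join(", ", Remainders(data, 8)));
    }
}
```

Consecutive Int32 elements are 4 bytes apart, so their remainders modulo 8 alternate; at most every other element can sit on an 8-byte boundary.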

I should also point out that dereferencing a pointer in .NET doesn't require alignment, although a misaligned access will be slower. For example, if you have a valid byte* pointer to data that is at least eight bytes long, you can cast it to long* and read the value:

unsafe
{
    var data = new byte[ 16 ];
    fixed ( byte* dataP = data )
    {
        var misalignedlongP = ( long* ) ( dataP + 3 );
        long value = *misalignedlongP;
    }
}
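
If you would rather not use unsafe code, a possible alternative (not part of the original answer) is System.Buffers.Binary.BinaryPrimitives, which reads from a span at any offset. This sketch assumes .NET Core 2.1 or later, or the System.Memory package on .NET Framework; the helper name ReadInt64At is hypothetical.

```csharp
using System;
using System.Buffers.Binary;

static class UnalignedRead
{
    // Reads a little-endian Int64 starting at an arbitrary byte offset.
    public static long ReadInt64At(byte[] data, int offset)
    {
        return BinaryPrimitives.ReadInt64LittleEndian(data.AsSpan(offset));
    }

    static void Main()
    {
        byte[] data = new byte[16];
        for (int i = 0; i < data.Length; i++)
            data[i] = (byte)i;

        // Bytes 3..10 form the value, least significant byte first.
        Console.WriteLine(ReadInt64At(data, 3).ToString("X16")); // 0A09080706050403
    }
}
```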

Reading through .NET's source code, you can see that Microsoft sometimes accounts for alignment and often does not. An example is the internal System.Buffer.Memmove method (see https://referencesource.microsoft.com/#mscorlib/system/buffer.cs,c2ca91c0d34a8f86). That method has code paths that cast the byte* to long* without any alignment checks in a few places, and the calling methods do not check alignment either.



Answer 4:

I do not know about managed arrays, but in a few places Microsoft's BCL code assumes that fixed arrays are word aligned. Here is an example from BitConverter.cs in .NET 4.0:

    public static unsafe int ToInt32(byte[] value, int startIndex) {
        // ... Parameter validation

        fixed (byte* pbyte = &value[startIndex]) {
            if (startIndex % 4 == 0) { // data is aligned
                return *((int*)pbyte);
            }
            else {
                // ... do it the slow way
            }
        }
    }

As you can see, the code checks for alignment using startIndex rather than the address held in pbyte. There are only two reasons why this would be the case:

  1. The start of the pinned array is always word aligned, so pbyte is aligned whenever startIndex is.
  2. It's a bug.

I don't think it is a bug. I use ToInt32 all the time, and it has never caused me problems. I also tend to give the BCL the benefit of the doubt because the authors sometimes have intimate knowledge of the CLR internals.

I think it is safe to assume that fixed arrays are always word aligned.
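
If you would rather test that assumption than take it on trust, a quick experiment like the one below may help. It is only a sketch with made-up names, and a clean run on one runtime is evidence about that runtime's behavior, not a documented guarantee.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinnedBaseCheck
{
    // Address of element 0 of a pinned byte[], modulo the given boundary.
    public static long BaseRemainder(byte[] array, int boundary)
    {
        GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
        try
        {
            return handle.AddrOfPinnedObject().ToInt64() % boundary;
        }
        finally
        {
            handle.Free();
        }
    }

    static void Main()
    {
        // Try several lengths so the arrays land at different heap positions.
        for (int length = 1; length <= 64; length++)
        {
            long remainder = BaseRemainder(new byte[length], 4);
            if (remainder != 0)
                Console.WriteLine("length " + length + ": base address is off by " + remainder);
        }
        Console.WriteLine("done");
    }
}
```

When the base address is a multiple of 4, the check startIndex % 4 == 0 is equivalent to checking the address of value[startIndex] itself.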