Chapter 10. The ARM Structure Alignment FAQ

Q: What is structure alignment?
Q: Why is this an issue for ARM systems?
Q: How is this related to the alignment trap?
Q: Which compilers are affected?
Q: What are the advantages of word structure alignment?
Q: What are the disadvantages of word alignment?
Q: What is the magnitude of the porting problem?
Q: Why can't we just change the compiler?
Q: What about mixed distributions?
Q: Some examples of code with problems?
Q: How do I find alignment problems in code from other platforms?
Q: How do I fix alignment problems?
Q: What about C++?

A: All modern CPUs expect that fundamental types like ints, longs and floats will be stored in memory at addresses that are multiples of their length.

CPUs are optimized for accessing memory aligned in this way.

Some CPUs:

When a C compiler processes a structure declaration, it can:

The specifications for C/C++ state that the existence and nature of these padding bytes are implementation defined. This means that each CPU/OS/Compiler combination is free to use whatever alignment and padding rules are best for their purposes. Programmers however are not supposed to assume that specific padding and alignment rules will be followed. There are no controls defined within the language for indicating special handling of alignment and padding although many compilers like gcc have non-standard extensions to permit this.
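The padding a particular compiler inserts can be inspected with sizeof and offsetof. The following is a minimal sketch rather than anything taken from this FAQ: the structure names Sample and PackedSample and the helper functions are hypothetical, and __attribute__((packed)) is the non-standard gcc extension (also accepted by clang), so the second structure is not portable to other compilers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure for illustration: a char followed by an int. */
struct Sample {
    char tag;
    int  value;
};

/* Same members, but using gcc's non-standard packed attribute, which asks
   the compiler to insert no padding between or after the members. */
struct PackedSample {
    char tag;
    int  value;
} __attribute__((packed));

/* Number of padding bytes this compiler placed between tag and value. */
size_t sample_padding(void)
{
    return offsetof(struct Sample, value) - sizeof(char);
}

/* Print the layout chosen by this compiler; the numbers vary by platform. */
void print_layout(void)
{
    printf("Sample:       size %zu, value at offset %zu\n",
           sizeof(struct Sample), offsetof(struct Sample, value));
    printf("PackedSample: size %zu, value at offset %zu\n",
           sizeof(struct PackedSample),
           offsetof(struct PackedSample, value));
}
```

Calling print_layout() on each target platform gives a quick picture of how its padding rules differ. Accessing members of a packed structure may itself generate unaligned memory references, so the attribute trades one problem for another.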

Summary

Structure Alignment

Structure alignment may be defined as the choice of rules which determine when and where padding is inserted together with the optimizations which the compiler is able to effect in generated code.

A:

These rules are acceptable to the C/C++ language specifications but they are different from the rules that are used by virtually all 32 and 64 bit microprocessors. Linux and its applications have never been ported to a platform with these alignment rules before, so there are latent defects in the code where programmers have incorrectly assumed certain alignment rules. These defects surface when applications are ported to the ARM platform.

The Linux kernel itself contains these types of assumptions.

These latent defects can lead to:

  • decreased performance;

  • corrupted data;

  • program crashes.

The exact effect depends on how the compiler and OS are configured as well as the nature of the defective code.

These defects may be fixed by:

  1. changing the compiler's alignment rules to match those of other Linux platforms;

  2. using an alignment trap to fix incorrectly aligned memory references;

  3. finding and fixing all latent defects on a case-by-case basis.

The three alternatives are, to some extent, mutually exclusive. All of them have advantages and disadvantages, and all have been applied in the past, so there is some experience with each. The correct solution depends on your goals (see below).

A: On the StrongARM processor, the OS can establish a trap to handle unaligned memory references. This is important because unaligned memory references are a frequent consequence of alignment defects, although they are not the only consequence.

Thus, some, but not all, alignment defects can be fixed within an alignment trap.

Furthermore, not every unaligned access indicates a defect. In particular, compilers for processors without halfword access will use unaligned accesses to efficiently load and store these values. If the alignment trap fixes these memory references, the program will produce incorrect results.

On the ARM and StrongARM, if you ask for a non-aligned word and you don't take the alignment trap, then you get the aligned word containing that address, rotated so that the byte at the address you asked for is in the LSB.

Consider:

        Address: 0  1  2  3  4  5  6  7
        Value  : 10 21 66 23 ab 5e 9c 1d

Using *(unsigned long*)2 would give:

        on x86: 0x5eab2366
        on ARM: 0x21102366
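The two results above can be reproduced with a small simulation. This is a sketch under stated assumptions, not kernel or compiler code: it assumes little-endian byte order, models memory as a byte array, and the names example_mem, le_word, load_x86_style and load_arm_style are hypothetical.

```c
#include <stdint.h>

/* The eight example bytes from the table above, at addresses 0..7. */
static const uint8_t example_mem[8] = {
    0x10, 0x21, 0x66, 0x23, 0xab, 0x5e, 0x9c, 0x1d
};

/* Assemble four consecutive bytes into a little-endian 32-bit word. */
static uint32_t le_word(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* x86-style load: read the four bytes starting exactly at addr. */
uint32_t load_x86_style(const uint8_t *mem, unsigned addr)
{
    return le_word(mem + addr);
}

/* ARM-style load with the alignment trap off: read the aligned word that
   contains addr, then rotate right so the addressed byte is in the LSB. */
uint32_t load_arm_style(const uint8_t *mem, unsigned addr)
{
    uint32_t word  = le_word(mem + (addr & ~3u));
    unsigned shift = (addr & 3u) * 8;

    if (shift == 0)
        return word;            /* aligned access: nothing to rotate */
    return (word >> shift) | (word << (32 - shift));
}
```

With addr set to 2, load_x86_style returns 0x5eab2366 and load_arm_style returns 0x21102366, matching the values listed above. For an aligned address the two functions agree.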

An alignment trap can distinguish between kernel code and application code and do different things for each.

The basic choices for the alignment trap are:

  1. It can be turned off. The unaligned access will then behave like unaligned accesses on other members of the ARM family without performance penalty.

  2. It can "fixup" the access to simulate a processor that allows unaligned access.

  3. It can fixup the access and generate a kernel message.

  4. It can terminate the application or declare a kernel panic.

There is a significant performance penalty for fixing up unaligned memory references.

A: The disadvantages of word alignment are that:

There is hot debate on both the number of Linux packages that have latent alignment defects and how difficult these defects will be to find and fix. Estimates of the magnitude of the problem include:

 

The only programs that I found that were violating this when I did the original port were very few and far between. I think it was in the order of 1 in 200. However, as of lately, maybe because of the commercialisation of the Internet, this figure appears to be increasing.
--unattributed

Generally, the defects I've found stick out like a sore thumb.
--unattributed

These problems are so severe that I'd be very surprised if any major Linux application runs reliably or can be made to run reliably without superhuman effort.
--unattributed

Unless other measures are taken, this debate will not be resolved until ARM distributions that align all structures are complete and widely deployed or the attempt is abandoned. Distributions that elect to not align all structures avoid the problem and thus never find out its magnitude in detail.

The alignment trap for application code can be used to produce an estimate of the problem magnitude earlier than this. Application code will execute unaligned memory references in the following circumstances:

When the alignment trap is set to generate a count of traps from application code and code compiled for the StrongARM is run, then every trap signals the existence of a defect that needs to be fixed. If the problem magnitude is large, many messages/counts will be recorded. If the problems are rare or have already been fixed, the trap will be silent.

The early results of this testing on the NetWinder have been:

This picture has changed as more and more packages are updated to newer versions and compiled with newer compiler versions to the point that the number of traps has declined to about 1,000 per CPU minute even with X windows use.

Setting the alignment trap to produce messages or counts is obviously useful for debugging as well. However, it produces only an estimate of the magnitude because there are potential latent defects that will cause applications to fail without ever doing an unaligned memory reference.

The argument that aligned structures are effectively slower is based on three positions:

  1. the fixes to alignment defects often result in slower code;

  2. the alignment trap would be called less frequently if the compiler didn't align all structures;

  3. code compiled for ARM processors will execute slower than code compiled specifically for the StrongARM.

A: All of the following examples are defective in a way that works for most Linux platforms and fails under the ARMLinux distribution. The behaviour of the ARMLinux distribution is described.

Example A

Suppose I'm doing something to a truecolour image in C++ (brightening it, for instance) and I have a pointer to the image in memory.

struct Pixel
{
    unsigned char red;
    unsigned char green;
    unsigned char blue;
};

unsigned char* image;

// Defective: assumes that sizeof(Pixel) is 3
Pixel* ptr = (Pixel*)image;

inline void brighten(Pixel* pix)
{
    //...a bunch of code that references *pix
}

for (int x = 0; x < 1024; x++)
{
    brighten(ptr++);
}

The Pixel structure will be padded with an extra byte at the end and will be aligned to a word boundary. Each ptr++ will step the pointer by four bytes instead of three as intended and thus the image will be corrupted. If image is aligned on a word boundary (this is random chance), no unaligned memory references will be made.

If the loop is altered so that ptr is incremented by three bytes instead of four, then the image may still be corrupted, depending on what brighten does and on the optimization level.
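One way to sidestep the problem in Example A is to never cast the image to a structure pointer and to step through it with an explicit three-byte stride. The following is a minimal sketch of that idea, not a fix prescribed by this FAQ: BYTES_PER_PIXEL, brighten_channel and brighten_image are hypothetical names, and the saturating add merely stands in for whatever brighten really does.

```c
#include <stddef.h>

enum { BYTES_PER_PIXEL = 3 };   /* red, green, blue: one byte each */

/* Hypothetical brighten step: add amount to one channel, saturating at 255. */
static unsigned char brighten_channel(unsigned char c, unsigned char amount)
{
    unsigned sum = (unsigned)c + amount;
    return (unsigned char)(sum > 255 ? 255 : sum);
}

/* Brighten `pixels` packed RGB pixels in place. The image is addressed as
   plain bytes, so the stride is 3 whatever the structure padding rules. */
void brighten_image(unsigned char *image, size_t pixels, unsigned char amount)
{
    for (size_t i = 0; i < pixels; i++) {
        unsigned char *px = image + i * BYTES_PER_PIXEL;

        px[0] = brighten_channel(px[0], amount);   /* red   */
        px[1] = brighten_channel(px[1], amount);   /* green */
        px[2] = brighten_channel(px[2], amount);   /* blue  */
    }
}
```

Because only unsigned char accesses are made, this version performs no unaligned word references and does not depend on sizeof(Pixel).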

Example B

Suppose now that I have an alpha field:

struct RGBAPixel
{
    unsigned char alpha;
    Pixel pxl;
};

This is an 8 byte structure with a layout totally different from

struct RGBAPixel
{
    unsigned char alpha;
    unsigned char red;
    unsigned char green;
    unsigned char blue;
};

which is the layout on most other Linux platforms.
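Whether the two forms in Example B coincide on a given compiler can be checked directly. The sketch below is illustrative only: the structures are renamed RGBAPixelNested and RGBAPixelFlat (hypothetical names) so that both can exist in one file, and nested_matches_flat is a hypothetical helper.

```c
#include <stddef.h>

struct Pixel {
    unsigned char red;
    unsigned char green;
    unsigned char blue;
};

/* Nested form from Example B. */
struct RGBAPixelNested {
    unsigned char alpha;
    struct Pixel  pxl;
};

/* Flat form: the layout the programmer had in mind. */
struct RGBAPixelFlat {
    unsigned char alpha;
    unsigned char red;
    unsigned char green;
    unsigned char blue;
};

/* Returns 1 when the nested form has the same byte layout as the flat
   form on the compiler in use, and 0 otherwise. */
int nested_matches_flat(void)
{
    return sizeof(struct RGBAPixelNested) == sizeof(struct RGBAPixelFlat)
        && offsetof(struct RGBAPixelNested, pxl)
           == offsetof(struct RGBAPixelFlat, red)
        && offsetof(struct Pixel, green) == 1
        && offsetof(struct Pixel, blue) == 2;
}
```

On a platform that pads Pixel to four bytes and aligns it to a word boundary, the nested form grows to eight bytes and the function returns 0. A check like this could be run once at program start-up to catch the assumption before any data is corrupted.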

Example C

struct Date
{
    char hasHappened;
    char year[4];
    char month[2];
    char day[2];
};

struct Record
{
    char name[20];
    Date birthday;
    Date marriage;
    Date death;
    Date last_taxes_paid;
} inbuf;

#define RECORD_LENGTH (20+4*9)

read(fd, &inbuf, RECORD_LENGTH);

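Each Date is padded from 9 to 12 bytes, so only birthday, which begins at the word-aligned offset 20, still lines up with the data in the file. The marriage, death and last_taxes_paid fields will be corrupt after the read.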
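A layout-independent alternative for Example C is to read the record into a plain byte buffer and copy each field from its known file offset. This is a minimal sketch under the assumption that the file stores the fields back to back with no padding (as RECORD_LENGTH implies). DATE_LENGTH, unpack_date and unpack_record are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

struct Date {
    char hasHappened;
    char year[4];
    char month[2];
    char day[2];
};

struct Record {
    char name[20];
    struct Date birthday;
    struct Date marriage;
    struct Date death;
    struct Date last_taxes_paid;
};

enum { DATE_LENGTH = 9, RECORD_LENGTH = 20 + 4 * DATE_LENGTH };

/* Copy one 9-byte date from the file image into d, field by field.
   Returns a pointer to the first byte after the date. */
static const unsigned char *unpack_date(const unsigned char *p,
                                        struct Date *d)
{
    d->hasHappened = (char)p[0];
    memcpy(d->year,  p + 1, sizeof d->year);
    memcpy(d->month, p + 5, sizeof d->month);
    memcpy(d->day,   p + 7, sizeof d->day);
    return p + DATE_LENGTH;
}

/* Fill r from a RECORD_LENGTH-byte buffer holding the file layout. */
void unpack_record(const unsigned char *buf, struct Record *r)
{
    const unsigned char *p = buf;

    memcpy(r->name, p, sizeof r->name);
    p += sizeof r->name;
    p = unpack_date(p, &r->birthday);
    p = unpack_date(p, &r->marriage);
    p = unpack_date(p, &r->death);
    (void)unpack_date(p, &r->last_taxes_paid);
}
```

The caller would read RECORD_LENGTH bytes into an unsigned char array and pass that array to unpack_record, so the in-memory padding of Date and Record no longer matters.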

Example D

This example is from the Kernel source.

struct nls_unicode {
    unsigned char uni1;
    unsigned char uni2;
};

static struct nls_unicode charset2uni[256] = {...};

Each unicode character consumes four bytes instead of two as on other platforms. In this case, the only impact is benign (extra memory consumption).

Attempting to read, write, or copy unicode strings based on this definition would lead to problems.

A: This section is fairly specific to ARMLinux application porting. Fixing all alignment problems, including those that may cause problems in future or on other platforms, is beyond the scope of this FAQ.

The gcc compiler for the ARMLinux distribution aligns all structures containing ints, longs, floats and pointers in the same way as gcc on x86 and other 32 bit platforms. The differences that may expose latent alignment defects all relate to structures consisting entirely of chars and shorts, either signed or unsigned. On ARMLinux, these are aligned to a word (4 byte) boundary. On other platforms, structures containing only chars are aligned to a character boundary (i.e. unaligned), and structures containing shorts, or shorts and chars, are aligned to a halfword boundary.

In practice, structures of this nature are relatively rare, so this is a good place to start looking.

The uses of these structures that may cause problems are: