Memory and Data Alignment in C++ — What is Misaligned Access?
Data or memory alignment is an important concept in software development that is discussed surprisingly rarely, mostly because compilers and interpreters have become advanced enough to handle these lower-level details without most of us noticing.
But since you are programming in C++, you’re always looking for more performance. Ensuring that your data is optimally aligned in memory is one of the more straightforward ways to immediately boost performance. But to do so, one should understand how memory alignment works in C++.
What is Memory Alignment?
As a short prerequisite, let’s discuss how our data is stored in memory. Data is stored in single-byte “cells” arranged sequentially. In the early days of computing, computers had data buses that were only one or two bytes wide. For instance, 8-bit computers had memory arranged sequentially in a single bank, fetched over an 8-bit data bus.
This means that fetching an 8-byte value would require 8 memory read cycles, which isn’t very efficient.
As we moved to 32-bit and 64-bit computing, the data buses also grew wider. To benefit from the larger data buses, memory can be arranged in multiple banks that are accessed in parallel, with the sequential arrangement of addresses preserved.
Here is a more realistic diagram for the folks that prefer them :)
In this case, an 8-byte value, like a double, needs only one memory read cycle to fetch! However, this is only true if the data is allocated at an address divisible by 8, that is, on an 8-byte boundary.
If the data is misaligned, i.e. its address is not divisible by 8, it will span two rows, and fetching it now requires two memory read cycles.
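The divisibility rule above can be checked directly on a pointer. The following is a minimal sketch; the helper name is_aligned is made up for illustration, and it assumes a mainstream platform where converting a pointer to std::uintptr_t yields its numeric address.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when `ptr` sits on an `alignment`-byte
// boundary. `alignment` must be non-zero.
inline bool is_aligned(const void* ptr, std::size_t alignment)
{
    // The pointer-to-integer conversion is implementation-defined; on
    // flat-memory platforms it gives the numeric address.
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

For a buffer declared as alignas(8) unsigned char buffer[16], is_aligned(buffer, 8) holds while is_aligned(buffer + 1, 8) does not.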
Alignment Requirement
In C++, every object has an alignment requirement. To quote from cppreference,
alignment requirement is a non-negative integer value representing the number of bytes between successive addresses at which objects of this type can be allocated.
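As a quick illustration of that definition, the alignof operator reports the alignment requirement of a type at compile time. The sketch below only states properties that hold on any conforming compiler; is_power_of_two is a helper name invented for this example.

```cpp
#include <cstddef>

// Hypothetical helper: true when `value` is a power of two (1, 2, 4, 8, ...).
constexpr bool is_power_of_two(std::size_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

// char has the weakest possible alignment requirement.
static_assert(alignof(char) == 1, "char can live at any address");

// Alignment requirements are always powers of two.
static_assert(is_power_of_two(alignof(int)), "alignof(int) is a power of two");
static_assert(is_power_of_two(alignof(double)), "alignof(double) is a power of two");

// sizeof(T) is a multiple of alignof(T), so array elements stay aligned.
static_assert(sizeof(double) % alignof(double) == 0, "size is a multiple of alignment");
```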
Let’s take a look at a simple application to understand this.
#include <iostream>

struct A
{
    char mChar; // size: 1
    int mInt;   // size: 4
};

struct B
{
    char mChar;     // size: 1
    double mDouble; // size: 8
    int mInt;       // size: 4
};

struct C
{
    double mDouble; // size: 8
    int mInt;       // size: 4
    char mChar;     // size: 1
};

int main()
{
    std::cout << "Size of struct A is: " << sizeof(A)
              << " with alignment: " << alignof(A) << std::endl;
    std::cout << "Size of struct B is: " << sizeof(B)
              << " with alignment: " << alignof(B) << std::endl;
    std::cout << "Size of struct C is: " << sizeof(C)
              << " with alignment: " << alignof(C) << std::endl;
    return 0;
}
Without compiling the code, we could naively calculate and assume the memory allocation as follows (assuming an LP64 environment; see 64-bit data models):
struct A = 1 (char) + 4 (int) = 5 bytes
struct B = 1 (char) + 8 (double) + 4 (int) = 13 bytes
struct C = 8 (double) + 4 (int) + 1 (char) = 13 bytes
Let’s compile the code and see if our assumptions are right.
$ g++ --version
g++ (GCC) 11.4.0
$ g++ -dumpmachine
x86_64-pc-cygwin
$ g++ align.cpp -o align
$ ./align
Size of struct A is: 8 with alignment: 4
Size of struct B is: 24 with alignment: 8
Size of struct C is: 16 with alignment: 8
As observed, the compiler has added extra bytes to our structures!
This is done to satisfy the alignment requirements of each member variable of the struct, and of the struct itself.
In struct A, mChar has 1-byte alignment and is followed by mInt, which has 4-byte alignment. In order to place mInt on a 4-byte boundary, 3 bytes of padding are added after mChar. Hence, our total size is 1 + 3 (padding) + 4 = 8 bytes.
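The padding inside struct A can be observed with the standard offsetof macro. This is a small sketch rather than part of the original program; padding_before_mInt is a name invented here, and the 3-byte result assumes a platform where int is 4 bytes with 4-byte alignment.

```cpp
#include <cstddef> // offsetof, std::size_t

struct A
{
    char mChar;
    int mInt;
};

// Hypothetical helper: number of padding bytes the compiler placed
// between the end of mChar and the start of mInt.
constexpr std::size_t padding_before_mInt()
{
    return offsetof(A, mInt) - (offsetof(A, mChar) + sizeof(char));
}

// mInt always starts on a boundary that satisfies its own alignment.
static_assert(offsetof(A, mInt) % alignof(int) == 0, "mInt is aligned within A");
```

On the x86-64 setup used in this article, offsetof(A, mInt) is 4 and padding_before_mInt() is 3.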
If we apply the same steps for struct B and C, we would get:
- B = 1 + 7 (padding) + 8 + 4 = 20 bytes
- C = 8 + 4 + 1 = 13 bytes
However, we observed that the compiler has added more padding for both.
This is because alongside the member variables, we still have to satisfy the alignment requirement of the struct itself.
Let’s take a look at struct B. Say we have an array of objects of this type, each only 20 bytes long. The alignment requirement will be violated for the second object, as seen in the figure below.
The object at index 0 would be aligned just fine, but the object at index 1 would start at byte offset 20, which puts its mDouble at offset 28, not a multiple of 8. Fetching it is a misaligned access.
To fix this, the compiler adds trailing padding bytes so that the size of the struct is a multiple of the struct’s own alignment, which is equal to the largest alignment among its member variables. In struct B’s case, this is 8 (mDouble), which is the value returned by alignof(B), so its 20 bytes are rounded up to 24. The same process applies to struct C, where 13 bytes are rounded up to 16.
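The two padding rules can be written down as a small calculator. This is a simplified model of the usual layout rule for plain structs, not what any particular compiler literally runs; Member, round_up and struct_size are names invented for this sketch.

```cpp
#include <cstddef>
#include <initializer_list>

// Size and alignment of one member, e.g. {sizeof(int), alignof(int)}.
struct Member
{
    std::size_t size;
    std::size_t align;
};

// Rounds `offset` up to the next multiple of `align` (align must be non-zero).
constexpr std::size_t round_up(std::size_t offset, std::size_t align)
{
    return (offset + align - 1) / align * align;
}

// Models the layout: pad before each member so it is aligned, then pad
// the end so the total size is a multiple of the largest member alignment.
inline std::size_t struct_size(std::initializer_list<Member> members)
{
    std::size_t offset = 0;
    std::size_t struct_align = 1;
    for (const Member& m : members)
    {
        offset = round_up(offset, m.align) + m.size;
        if (m.align > struct_align)
        {
            struct_align = m.align;
        }
    }
    return round_up(offset, struct_align);
}
```

With the sizes from this article, struct_size({{1, 1}, {8, 8}, {4, 4}}) gives 24 for struct B and struct_size({{8, 8}, {4, 4}, {1, 1}}) gives 16 for struct C, matching the compiler output.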
The keen-eyed amongst us will have noticed that struct B and C contain the same data, yet one occupies more bytes than the other (24 and 16 respectively).
This means that the order of member variables can reduce padding, which results in less space consumed.
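Note: For special cases, padding bytes can be avoided through compiler-specific directives such as #pragma pack(n), which GCC, Clang and MSVC all accept. Members of a packed struct may end up misaligned, so this trades access safety and speed for space.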
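As a rough sketch of what packing does, here is struct A again with 1-byte packing. PackedA is a name chosen for this example, and the asserted sizes assume a compiler that supports the push/pop form of #pragma pack (GCC, Clang and MSVC do) and a 4-byte std::int32_t.

```cpp
#include <cstdint>

#pragma pack(push, 1) // pack members on 1-byte boundaries from here on
struct PackedA
{
    char mChar;
    std::int32_t mInt; // now at offset 1, no longer 4-byte aligned
};
#pragma pack(pop) // restore the previous packing setting

static_assert(sizeof(PackedA) == 5, "no padding between or after the members");
static_assert(alignof(PackedA) == 1, "packing lowers the alignment of the struct");
```

Reading p.mInt through the struct is fine because the compiler knows the member is packed; taking its address as a plain int pointer and dereferencing it elsewhere is where misaligned access can creep in.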
Misaligned Access in C++
It’s important to be cognizant that misaligned access in C++ is undefined behaviour. Take a look at the following code for instance.
#include <cstdint>
#include <iostream>

int main()
{
    uint64_t a = 42;
    // Cast address of 'a' to pointer to char
    char* x = reinterpret_cast<char*>(&a);
    // Increment the address by one (size of char)
    uint32_t* b = reinterpret_cast<uint32_t*>(x + 1);
    // Access object through pointer
    std::cout << *b;
    return 0;
}
If we compile and run the code with UBSAN (Undefined Behaviour SANitizer, enabled with -fsanitize=undefined), it reports a runtime error for the load through b, because the address is misaligned for uint32_t.
Note that without UBSAN, the code runs fine on my machine. This is because most x86 CPUs support unaligned access, but might come with a cost on performance. Nevertheless, undefined behaviour should always be avoided, since nothing is guaranteed. Don’t be surprised if it hands you a pint of 🍺 the next time you run it!
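If the bytes at an odd address really do need to be read as an integer, a common well-defined alternative to the cast above is to copy them out with std::memcpy. The sketch below is one way to do it; load_u32 is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a uint32_t from any address, aligned or not.
// memcpy copies byte by byte as far as the language is concerned, so no
// misaligned uint32_t object is ever accessed.
inline std::uint32_t load_u32(const void* src)
{
    std::uint32_t value;
    std::memcpy(&value, src, sizeof value);
    return value;
}
```

Compilers typically turn this small memcpy into a single load instruction on targets that allow unaligned access, so the safe version usually costs nothing extra there.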
Why Align?
As we discussed, memory alignment restrictions ensure minimum memory cycles to fetch data, and that our code runs properly on all platforms — regardless of support for unaligned access.
Additionally, memory alignment is also used for optimisation, such as cache line alignment (which I also wrote a story on!) and SIMD intrinsics which require specific alignment to take advantage of the modern vector processors of today.
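For these cases C++11 provides the alignas specifier to request a stricter alignment than the default. The following sketch assumes a 64-byte cache line (common on x86-64, but not universal) and the 16-byte alignment expected by aligned 128-bit SSE loads; PaddedCounter, Float4 and kCacheLineSize are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64 bytes per cache line; check your target before relying on it.
constexpr std::size_t kCacheLineSize = 64;

// Each counter gets a cache line to itself, which avoids false sharing
// between threads updating neighbouring counters.
struct alignas(kCacheLineSize) PaddedCounter
{
    std::uint64_t value;
};

static_assert(alignof(PaddedCounter) == kCacheLineSize, "over-aligned as requested");
static_assert(sizeof(PaddedCounter) == kCacheLineSize, "size is padded to the alignment");

// Four floats on a 16-byte boundary, suitable for aligned 128-bit loads.
struct alignas(16) Float4
{
    float data[4];
};

static_assert(alignof(Float4) == 16, "16-byte aligned");
```

Note that dynamically allocating over-aligned types with new only honours the requested alignment from C++17 onwards.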
Hopefully that helped you understand the concept and importance of memory alignment in C++. Feel free to leave a comment if there are any doubts, or something you would like to add. Happy coding! :)