tepples wrote:
blargg wrote: It's a bad idea to assume the sizes of the integral types in C and C++.
Two specifications impose constraints on a C compiler: the C standard and each platform's application binary interface (ABI). In C, char means byte, and C guarantees that char is always at least 8 bits (CHAR_BIT >= 8).[1] The ABIs of the most popular platforms (x86, PowerPC, ARM) guarantee that CHAR_BIT == 8, making unsigned char and uint8_t equivalent.
There are several things that are not guaranteed in C or C++ (except maybe in C99 or the latest version of the C++ spec, I dunno?). However, they are true on practically all platforms that anyone has made in the past 20 years, and will continue to be true basically forever:
(1) A byte is 8 bits, and types exist which are 8, 16 and 32 bits in size. Modern compilers all support 64-bit integer types as well. However, you might not know which types are which size! What most people do is simply define their own types for known sizes. Then if you want to support multiple compilers or port the code to a different platform, it's easy to supply alternate definitions.
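One way to "supply alternate definitions" without guessing is to let the preprocessor pick a built-in type by comparing the limits from <limits.h>. This is a minimal sketch, assuming a C11 compiler (for static_assert); the U16/S16/U32/S32 names follow the post, while the #if selection logic is only an illustration, not the author's code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, USHRT_MAX, UINT_MAX, ULONG_MAX */

static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Pick a built-in type for each fixed size by inspecting the limits the
   compiler reports, instead of assuming which type has which size. */
#if USHRT_MAX == 0xFFFF
typedef unsigned short U16;
typedef signed short   S16;
#elif UINT_MAX == 0xFFFF
typedef unsigned int U16;
typedef signed int   S16;
#else
#error "no 16-bit integer type found"
#endif

#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int U32;
typedef signed int   S32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long U32;
typedef signed long   S32;
#else
#error "no 32-bit integer type found"
#endif

/* Fail the build immediately if the selection above went wrong. */
static_assert(sizeof(U16) == 2 && sizeof(U32) == 4, "unexpected integer sizes");
```

Porting to a new compiler then usually means adding one more #elif branch rather than editing every use site.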
In my own code, I usually use the following definitions:
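typedef unsigned char U8;        // 8 bits
typedef signed char S8;          // 8 bits
typedef unsigned short U16;      // 16 bits on all common compilers
typedef signed short S16;        // 16 bits on all common compilers
typedef unsigned int U32;        // 32 bits on common 32- and 64-bit compilers
typedef signed int S32;          // 32 bits on common 32- and 64-bit compilers
typedef unsigned long long U64;  // 64 bits
typedef signed long long S64;    // 64 bits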
Then I use those types everywhere, so it is easy for me to keep track of what is going on when I do arithmetic or other operations on them. The only time I use "int" or "unsigned" is for a loop counter that never gets mixed with those fixed-size types in arithmetic; for example, if I'm only using it to index an array, I might use "int" or "unsigned". Even then, I tend to prefer U32 or S32 for loop counters. If it makes you feel better, typedef these to the new standard types (uint8_t and friends from <stdint.h>), but I've personally never bothered to do that.
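For readers who do want to map these names onto the C99 <stdint.h> types, here is a small sketch, assuming a C11 compiler for static_assert. The typedef names are the post's; the compile-time checks and the hypothetical multiply_u32 helper are illustrations added here, showing the kind of explicit mixing of fixed-size types the post describes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* exact-width integer types (C99) */

/* The same short names, mapped onto the C99 exact-width types. */
typedef uint8_t  U8;
typedef int8_t   S8;
typedef uint16_t U16;
typedef int16_t  S16;
typedef uint32_t U32;
typedef int32_t  S32;
typedef uint64_t U64;
typedef int64_t  S64;

/* Compile-time checks; they cost nothing at run time. */
static_assert(CHAR_BIT == 8, "expected 8-bit bytes");
static_assert(sizeof(U8) == 1 && sizeof(U16) == 2, "unexpected small type sizes");
static_assert(sizeof(U32) == 4 && sizeof(U64) == 8, "unexpected large type sizes");

/* Hypothetical helper: widening one operand to U64 first means the product
   is computed in 64 bits and cannot overflow a 32-bit intermediate. */
static U64 multiply_u32(U32 a, U32 b) {
    return (U64)a * b;
}
```

Because uint8_t and friends only exist when the platform has exact-width types, simply including <stdint.h> and using them acts as a build-time check of the post's assumptions.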
(2) Integers use two's complement representation for negative values: the top bit is the sign bit, there is only one representation of zero (all bits clear), and -1 is represented by the number with all bits set. (Contrast this with floating-point numbers, which actually have *two* representations of zero.) No one has made a mainstream machine with any other integer representation for at least 20 years.
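To see these representations directly, a sketch can copy a value's bytes into an unsigned integer with memcpy (the well-defined way to inspect an object's bits). It assumes float is a 32-bit IEEE 754 single, as on all mainstream platforms; the helper names s32_bits and float_bits are hypothetical. Note that C99 already requires the exact-width int32_t, where provided, to use two's complement.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* int32_t, uint32_t, INT32_MIN */
#include <string.h>   /* memcpy */

/* Raw bit pattern of a 32-bit signed integer. */
static uint32_t s32_bits(int32_t value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Assumes float is a 32-bit IEEE 754 single. */
static_assert(sizeof(float) == sizeof(uint32_t), "expected 32-bit float");

/* Raw bit pattern of a float. */
static uint32_t float_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

With these, -1 comes out as 0xFFFFFFFF, integer 0 has exactly one pattern, and 0.0f and -0.0f have different bit patterns (0x00000000 and 0x80000000) even though they compare equal.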
(3) Null pointers to any data type (including void*) are represented by a bit pattern with all bits clear. So you can (for example) use memset(data, 0, sizeof(MyStruct)); to clear a structure, and assume that any pointers in it are now NULL. The C and C++ languages actually allow the implementation to use almost anything it wants for a null pointer, even different values for different types! But nobody does this, and too much existing code would break if anyone ever tried to change it. So go ahead and assume it.
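A sketch of the memset idiom, next to the fully portable alternative for comparison. The struct MyStruct layout and both function names are hypothetical illustrations; the memset version relies on the all-bits-zero null pointer described above, while the = {0} version gets real null pointers from the language regardless of their bit pattern.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* NULL */
#include <string.h>   /* memset */

/* Hypothetical structure used only for this illustration. */
struct MyStruct {
    int   count;
    char *name;
    void *userData;
};

/* Pointer fields end up NULL only because the platform represents a null
   pointer as all-bits-zero (not promised by the standard, but universal). */
static void clear_with_memset(struct MyStruct *s) {
    assert(s != NULL);
    memset(s, 0, sizeof *s);
}

/* Portable alternative: zero-initialization yields genuine null pointers. */
static void clear_with_initializer(struct MyStruct *s) {
    const struct MyStruct zero = {0};
    assert(s != NULL);
    *s = zero;
}
```

On every mainstream platform the two functions produce identical results; the initializer form simply does not depend on the assumption.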
(4) Most platforms nowadays are "32-bit", which means sizeof(int) == 4 and sizeof(void*) == 4 (in fact, the size of any pointer type, except C++ pointer-to-member types, should be 32 bits). If you want to be future-proof for 64-bit platforms, it's a good idea to keep in mind that their pointer types might be 64 bits instead of 32 (while int usually stays at 32 bits). But supporting those two combinations should be plenty for most code (unless you plan to port it to cell phones or something... and most of those have 32-bit processors now anyway).
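A sketch of how code can accept exactly the two combinations mentioned above and reject anything else at build time. It assumes C11 static_assert and that uintptr_t (optional in C99, but present on mainstream platforms) is available; pointer_bits, pointer_to_int and int_to_pointer are hypothetical helper names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uintptr_t */

/* Accept 32-bit int with either 32- or 64-bit pointers. */
static_assert(sizeof(int) == 4, "expected 32-bit int");
static_assert(sizeof(void *) == 4 || sizeof(void *) == 8,
              "expected 32- or 64-bit pointers");

/* Width of a data pointer in bits: 32 or 64 given the checks above. */
static unsigned pointer_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* If a pointer must live in an integer, use uintptr_t rather than U32,
   so the code keeps working when pointers grow to 64 bits. */
static uintptr_t pointer_to_int(void *p) { return (uintptr_t)p; }
static void *int_to_pointer(uintptr_t u) { return (void *)u; }
```

Storing a pointer in a U32 is exactly the kind of code that works on every 32-bit target and silently truncates on 64-bit ones, which is why the uintptr_t round trip is shown here.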
(5) "Natural" alignment: this is not guaranteed on every platform, but it works on all x86-based platforms (as well as all of the common PPC-, Sparc- and ARM-based platforms, and probably most others). Basically, small types like to be aligned to their size (i.e. a 4-byte integer type should be aligned on a 4-byte boundary, meaning the bottom 2 bits of its address are zero). A structure's alignment is the largest alignment of any of its members. *Also, a structure's size is rounded up to a multiple of its alignment by adding padding at the end*, so that if you have an array of that struct, the members of the array are all properly aligned. Classes == structures (but if there are any virtual methods or virtual base classes, assume the compiler added some crud to your structure that you can't see, to support the virtual stuff).
On some platforms, a misaligned access is harmless (on x86 this covers any type whose natural alignment is 8 bytes or less), though it is probably slower. On others it is NOT harmless and crashes the program! So compilers have to insert extra code to do misaligned accesses (which is a lot slower), AND they have to know that they're doing it. So if you cast a structure pointer to a U64* for example, you might get crashes, because you tricked the compiler into thinking the data accessed through the pointer would be aligned when it isn't.
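The sketch below illustrates two of the points above: tail padding rounds a struct's size up to a multiple of its alignment, and memcpy is a safe way to read a value from an address that may be misaligned (instead of casting to a U64* and dereferencing). It assumes C11 (<stdalign.h>, static_assert); struct Tail, is_aligned and load_u64 are hypothetical names, and the 8-byte size of struct Tail is what common ABIs produce rather than a language guarantee.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdalign.h>   /* alignof (C11) */
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uint8_t, uint32_t, uint64_t, uintptr_t */
#include <string.h>     /* memcpy */

/* 4 + 1 bytes of members; common ABIs add 3 bytes of tail padding so the
   size (8) is a multiple of the alignment (4). */
struct Tail {
    uint32_t value;
    uint8_t  tag;
};

/* This one is guaranteed by the language, so arrays stay aligned. */
static_assert(sizeof(struct Tail) % alignof(struct Tail) == 0,
              "struct size is a multiple of its alignment");

/* True if p is a multiple of align (align assumed to be a power of two). */
static int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Reads a 64-bit value from a possibly misaligned address. memcpy lets the
   compiler emit whatever access the target supports, avoiding the undefined
   behaviour of dereferencing a misaligned uint64_t pointer. */
static uint64_t load_u64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On x86 compilers typically turn load_u64 into a single plain load, so the safe version usually costs nothing there.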
Anyway, you can avoid nearly all alignment problems if you use "natural alignment" for all of your data: Simply don't change structure packing from the compiler default (some people like #pragma pack(1) and such, but I always avoid them because of these alignment requirements), and always put the larger members of your structure first, *or* count the sizes of the members to make sure the later ones are properly aligned:
struct Foo
{
    U8  m_type;
    U8  m_flags;
    U16 m_blockSize;  // <-- offset 2, "natural" alignment == 2
    U8* m_pData;      // <-- offset 4, "natural" alignment == 4 (on most platforms anyway)
    U16 m_dataAge;    // <-- offset 8, "natural" alignment == 2
    U16 m_padding0;   // <-- only exists to make the next field 4-byte aligned
    U32 m_counter;    // <-- offset 12, "natural" alignment == 4
};
Two things to notice about this little example:
(1) I assumed that sizeof(U8*) == sizeof(U32) == 4. You can always check that with a compile-time assertion, but it's true on all 32-bit platforms. (NOT necessarily on 64-bit platforms, though! There, the compiler would insert an extra 4 bytes of padding before the m_pData field!)
(2) I inserted a 2-byte m_padding0 field, just so that m_counter would have the proper alignment. Actually, the compiler will insert padding by itself (if it's necessary, and unless you've told it not to)... but I prefer to stick to the "natural" alignment rule by inserting padding fields myself so that the compiler never has to add them. That makes it easier to add up the size of the structure at a glance, too.
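The compile-time assertions mentioned in note (1) could look like this sketch, assuming C11 static_assert and <stdint.h> typedefs for U8/U16/U32. The expected offsets for 32-bit pointers come from the comments in the example; the 64-bit branch encodes the extra 4 bytes of padding before m_pData that common 64-bit ABIs produce, and is an assumption about those ABIs rather than a language rule.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width types, UINTPTR_MAX */

typedef uint8_t  U8;
typedef uint16_t U16;
typedef uint32_t U32;

struct Foo {
    U8   m_type;
    U8   m_flags;
    U16  m_blockSize;
    U8  *m_pData;
    U16  m_dataAge;
    U16  m_padding0;   /* keeps m_counter 4-byte aligned */
    U32  m_counter;
};

static_assert(offsetof(struct Foo, m_blockSize) == 2, "m_blockSize offset");

#if UINTPTR_MAX == 0xFFFFFFFFu
/* 32-bit pointers: no hidden padding anywhere, 16 bytes total. */
static_assert(offsetof(struct Foo, m_pData) == 4, "m_pData offset");
static_assert(offsetof(struct Foo, m_counter) == 12, "m_counter offset");
static_assert(sizeof(struct Foo) == 16, "struct Foo size");
#elif UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
/* 64-bit pointers: the compiler inserts 4 bytes before m_pData. */
static_assert(offsetof(struct Foo, m_pData) == 8, "m_pData offset");
static_assert(offsetof(struct Foo, m_counter) == 20, "m_counter offset");
static_assert(sizeof(struct Foo) == 24, "struct Foo size");
#endif
```

If someone later reorders fields or enables a different packing, the build breaks at the assertion instead of the layout changing silently.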
[Edit: I forgot to describe the main benefit of the "natural alignment" rule... many platforms, x86 for example, have rules where a 2-, 4- or 8-byte type can have any alignment you want, but if it happens to cross a cache line boundary, it will be slower to access (sometimes much slower). Or they have rules where the integer types support misaligned accesses but the floating-point types don't. If you just stick to "natural alignment", you guarantee that no 2-, 4- or 8-byte type is ever going to cross a 32- or 64-byte cache line boundary, and you avoid having to deal with any of those special cases. "Natural alignment" is a simple rule that's easy to follow, and will avoid 99% of potential alignment problems for most code.]
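The cache-line claim is easy to check by brute force on model addresses (plain integers, not real pointers). This sketch uses the 32- and 64-byte line sizes from the edit above; real line sizes vary by CPU, and crosses_line and natural_alignment_never_crosses are hypothetical helpers. The claim holds because 2, 4 and 8 all divide the line size, so a naturally aligned value always starts and ends inside the same line.

```c
#include <assert.h>    /* assert */
#include <stdbool.h>   /* bool */
#include <stdint.h>    /* uintptr_t */

/* True if an access of `size` bytes starting at `addr` touches two
   different `line`-byte cache lines. */
static bool crosses_line(uintptr_t addr, uintptr_t size, uintptr_t line) {
    assert(size > 0 && line > 0);
    return addr / line != (addr + size - 1) / line;
}

/* Tries every naturally aligned 2-, 4- and 8-byte access over four lines
   and reports whether any of them crosses a line boundary. */
static bool natural_alignment_never_crosses(uintptr_t line) {
    for (uintptr_t size = 2; size <= 8; size *= 2) {
        for (uintptr_t addr = 0; addr < 4 * line; addr += size) {
            if (crosses_line(addr, size, line))
                return false;
        }
    }
    return true;
}
```

By contrast, a misaligned 4-byte value at offset 62 of a 64-byte line spills into the next line, which is the slow case the rule avoids.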
Anyway, just some ideas. Happy coding!