06 November 2016

Array Initialization in C and C++

A smart student new to C/C++ was confused by array initialization in C and C++. That is understandable: the standard says one thing, best practices say another, and compilers may say something else again. So I wrote this.
Note that I wrote this in 2016. This matters because things change quickly in the computing world! I will try to update it when I have time.

/* Array initialization in C and C++ */


// Stop GCC from nagging about unused variables. You can compile this file
// with gcc; g++ will reject the lines noted below as errors in C++.
#pragma GCC diagnostic ignored "-Wunused-variable"

// This declares an array of 4 integers.
// - It has static storage duration (it is declared at file scope),
//   so all 4 elements are initialized to 0.
int ar01[4];

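// The first two elements will be initialized to 1 and 2, and the
// remaining two to 0.
// - The variable is declared at file scope, so it has static storage
//   duration and there is only one copy of it.
// - The array is initialized only once, before main() starts; typically
//   its values are stored in the executable image and loaded with it.
int ar02[4] = {1,2};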

int main(void)
{
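  // This declares an array of 4 integers.
  // - It has automatic storage duration and no initializer, so its
  //   elements are NOT initialized: their values are indeterminate.
  int ar03[4];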

  // The first two elements will be initialized to 1 and 2, and the
  // remaining two to 0.
  int ar04[4] = {1,2};

  // The array will be initialized to 2, 3, 4 and 5.
  // - How about 6? It is an excess element. The C standard treats this as
  //   a constraint violation, so a diagnostic is required.
  // - A C compiler generally issues a warning and ignores the 6
  //   E.g., gcc: warning: excess elements in array initializer
  // - A C++ compiler generates an error instead
  //   E.g., g++: error: too many initializers for int [4]
  int ar05[4] = {2,3,4,5,6};
  // This is a stack-dynamic (automatic) variable. Therefore it has to be
  // initialized each time execution reaches its declaration.
  // - Conceptually, a compiler translates the code to:
  //     int ar05[4];
  //     ar05[0] = 2;
  //     ar05[1] = 3;
  //     ar05[2] = 4;
  //     ar05[3] = 5;
  // - In other words, array initialization here is just syntactic sugar.

  // This declares an array of characters.
  // - The first 3 elements are initialized to 'a', 'b', 'c', and the
  //   remaining element to 0 ('\0').
  char ar06[4] = {'a', 'b', 'c'};

  // In C, a string is simply an array of characters terminated by a
  // null character ('\0').
  // The array below thus holds a C string.
  char ar07[4] = {'a', 'b', 'c', '\0'};

  // We can use a string literal as a 'shortcut' to initialize this array.
  char ar08[4] = "abc";
  // It is important to note that, when you do so, the compiler
  // conceptually translates the code to:
  //    char ar08[4];
  //    ar08[0] = 'a';
  //    ar08[1] = 'b';
  //    ar08[2] = 'c';
  //    ar08[3] = '\0';

  // How about this?
  char ar09[4] = "abcdef";
  // The initializer ("abcdef" plus its terminating null, 7 characters)
  // is longer than the array (4 characters).
  // So the compiler will ignore the excess elements.
  // - In other words, the compiler will simply treat the code above as:
  //     char ar09[4] = {'a', 'b', 'c', 'd'};
  // - Note that 'e', 'f' and the terminating '\0' will simply be *ignored*!
  //     (If in doubt, use gcc -S and examine the assembly output.)
  // - The string "abcdef" will NEVER be generated by the compiler!
  // - A C compiler generally issues a warning for this
  //   E.g., gcc: warning: initializer-string for array of chars is too long
  // - A C++ compiler generates an error instead
  //   E.g., g++: error: initializer-string for array of chars is too long

  // How about this?
  char ar10[4] = "abcd";
  // Similar to the array above, except that only the terminating null
  // character is excess.
  // - So the compiler will drop the terminating null character.
  // - This is explicitly allowed in C: the null character is stored only
  //   if there is room for it. So a C compiler may not issue a warning!
  //   - E.g., gcc 5.4.1 and clang 3.8.0 do not issue a warning here
  //   - However, this array must NOT be used as a normal C string,
  //     since it is NOT null-terminated!
  // - A C++ compiler generates an error instead!
  //   E.g., g++: error: initializer-string for array of chars is too long
  // - So this is a 'boundary' case between C and C++:
  //   - Valid in C (possibly without even a warning!)
  //   - Error in C++

  // If we want the array to be exactly as large as its initializer,
  // we can omit the size and let the compiler fill it in for us.
  int ar11[] = {3, 4, 5};
  // In this case, the size of the array is 3.

  // Same for an array of characters
  char ar12[] = {'a', 'e', 'i', 'o', 'u'};
  // In this case, the size of the array is 5.
  // (No '\0' is added, so this array does NOT hold a C string.)

  // How about a C string?
  char ar13[] = "aeiou";
  // In this case, the code is equivalent to:
  //     char ar13[] = {'a', 'e', 'i', 'o', 'u', '\0'};
  // So the size of the array is 6.

  // Now, consider the following code:
  char *str01 = "aeiou";
  // We are NOT declaring an array, so there is no array initializer here!
  // Instead, the statement above is equivalent to:
  //     char *str01;
  //     str01 = "aeiou";
  // Here:
  // - When the program is loaded, (at least) 6 bytes will be allocated
  //   to store the null-terminated string "aeiou".
  //   - The string "aeiou" is typically placed in a region that is
  //     supposed to be read-only, enforced by the hardware and/or the
  //     operating system. Modifying it is undefined behavior (often a
  //     crash). In other words, the technically more honest data type
  //     is "const char *", even though in C the string literal itself
  //     has type char[6].
  //   - However, many C programmers do not use the 'const' keyword this
  //     rigorously. Nonetheless, care must be taken not to modify such
  //     strings, since doing so leads to undefined behavior.
  // - When the statement above is reached:
  //   (1) The character pointer 'str01' occupies a few bytes of automatic
  //       storage (typically 4 or 8).
  //   (2) The pointer is then assigned the starting address of the
  //       string "aeiou" allocated above.
  //
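  // Differences between 'ar13' and 'str01':
  // (1) While 'str01' can be assigned a new value (an address), 'ar13'
  //     cannot be reassigned: it always refers to the array, whose
  //     starting address never changes.
  // (2) Storage is allocated for the pointer 'str01' itself, separate from
  //     the string it points to. No such storage is allocated for the
  //     name 'ar13': it names the 6-byte array directly. Consequently,
  //     sizeof(ar13) is 6, while sizeof(str01) is the size of a pointer.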
  //
  // - A C compiler generally accepts the statement.
  //   - Since 'const' is not used rigorously in many C programs, the
  //     warning is disabled by default in gcc and clang (it can be
  //     enabled with -Wwrite-strings).
  // - A C++ compiler warns about it, since 'const' is encouraged
  //   wherever possible (C++11 removed this conversion from the language;
  //   g++ still accepts it, with a warning):
  //     g++ -std=c++98: warning: deprecated conversion from string constant to ‘char*’
  //     g++ -std=c++11: warning: ISO C++ forbids converting a string constant to ‘char*’
  //   - So the statement is better written as:
  //         const char *str01 = "aeiou";

  return 0;
}