3. Can the sizeof operator be used to tell the size of an array passed to a function?
No. There's no way to tell, at runtime, how many elements are in an array parameter just by looking at the array parameter itself. Inside the function, sizeof applied to the parameter yields the size of a pointer, not the size of the caller's array. Remember, passing an array to a function is exactly the same as passing a pointer to the first element. This is a Good Thing. It means that passing pointers and arrays to C functions is very efficient.
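To make the point concrete, here is a minimal sketch. The names COUNT_OF, sizeSeenByCallee, and countSeenByOwner are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* Element count of a true array; valid only where the array declaration is visible. */
#define COUNT_OF(arr) (sizeof (arr) / sizeof (arr)[0])

/* Hypothetical helper. Declaring the parameter as "int arr[]" would mean
   exactly the same thing as "int *arr", so sizeof can report only the
   size of a pointer, whatever array the caller passed. */
size_t sizeSeenByCallee(int *arr)
{
    return sizeof arr;          /* sizeof(int *) */
}

/* Hypothetical helper: in the scope that declares the array, sizeof works. */
size_t countSeenByOwner(void)
{
    int values[10] = { 0 };

    return COUNT_OF(values);    /* 10 elements */
}
```

The macro is useful only in the scope that owns the array; once the array has been passed to a function, the count has to travel some other way.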
It also means that the programmer must use some mechanism to tell how big such an array is. There are two common ways to do that. The first method is to pass a count along with the array. This is what memcpy() does, for example:
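A minimal sketch of the count-passing style follows. The wrapper copyInts is a hypothetical name; the only library call is the standard memcpy(), whose third argument is a size in bytes.

```c
#include <string.h>

/* Hypothetical helper: copies count ints from src to dst.
   memcpy() measures in bytes, so the element count is scaled by the
   element size. The two arrays must not overlap. */
void copyInts(int *dst, const int *src, size_t count)
{
    memcpy(dst, src, count * sizeof *src);
}
```

The caller is the only one who knows how big the arrays are, so the caller supplies the count.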
The second method is to have some convention about when the array ends. For example, a C "string" is just a pointer to the first character; the string is terminated by an ASCII NUL ('\0') character. This is also commonly done when you have an array of pointers; the last is the null pointer. Consider the following function, which takes an array of char*s. The last char* in the array is NULL; that's how the function knows when to stop.
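One way such a function might look is sketched here; the name printMany is an assumption made for this example.

```c
#include <stdio.h>

/* Prints each string on its own line.
   strings must end with a null pointer; that is the only way
   this function can tell where the array stops. */
void printMany(char *strings[])
{
    int i = 0;

    while (strings[i] != NULL)
    {
        puts(strings[i]);
        ++i;
    }
}
```

A caller would build the array with the terminator in place, for example char *names[] = { "one", "two", NULL }; and then call printMany(names).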
Most C programmers would write this code a little more cryptically:
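A sketch of the terser form, using the same hypothetical name printMany:

```c
#include <stdio.h>

/* Same contract as before: strings must end with a null pointer. */
void printMany(char *strings[])
{
    while (*strings)
        puts(*strings++);   /* print the current string, then step to the next */
}
```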
C programmers often use pointers rather than indices. You can't change the value of an array tag, but because strings is an array parameter, it's really the same as a pointer. That's why you can increment strings. Also,
while ( *strings )
means the same thing as
while ( *strings != NULL )
and the increment can be moved up into the call to puts().
If you document a function (if you write comments at the beginning, or if you write a "manual page" or a design document), it's important to describe how the function "knows" the size of the arrays passed to it. This description can be something simple, such as "null terminated," or "elephants has numElephants elements." (Or "arr should have 13 elements," if your code is written that way. Using hard-coded numbers such as 13 or 64 or 1024 is not a great way to write C code, though.)
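As an illustration of such a comment, here is a small sketch. The function totalWeight and the choice of kilograms are invented for the example; only the parameter names come from the text above.

```c
#include <stddef.h>

/*
 * totalWeight() adds up the weights of a herd.
 *
 * elephants has numElephants elements; each element is one weight
 * in kilograms. numElephants may be 0, in which case elephants is
 * not examined.
 */
long totalWeight(const int elephants[], size_t numElephants)
{
    long   total = 0;
    size_t i;

    for (i = 0; i < numElephants; ++i)
        total += elephants[i];
    return total;
}
```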
4. Is it better to use a pointer to navigate an array of values, or is it better to use a subscripted array name?
It's easier for a C compiler to generate good code for pointers than for subscripts.
Say that you have this:
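A plausible set of declarations is sketched here. The element type int and the length 8 are assumptions made so the example compiles; any object type and any positive length would do.

```c
#define MAX 8       /* assumed array length */
typedef int X;      /* assumed element type */

X   a[MAX];         /* array */
X  *p;              /* pointer */
X   x;              /* element */
int i;              /* index */
```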
Here's one way to loop through all elements:
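Version (a), the index loop counting up, might look like this sketch. The element type, the array contents, and the function name sumVersionA are assumptions; the "do something" is a running total so that the loop has a visible result.

```c
#define MAX 8
typedef int X;

X a[MAX] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* version (a): subscript, counting up */
long sumVersionA(void)
{
    long total = 0;
    X    x;
    int  i;

    for (i = 0; i < MAX; ++i)
    {
        x = a[i];
        total += x;     /* do something with x */
    }
    return total;
}
```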
On the other hand, you could write the loop this way:
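Version (b), the pointer loop counting up, is sketched here under the same assumptions (int elements, eight of them, a running total as the work, and the hypothetical name sumVersionB).

```c
#define MAX 8
typedef int X;

X a[MAX] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* version (b): pointer, counting up */
long sumVersionB(void)
{
    long total = 0;
    X    x;
    X   *p;

    /* &a[MAX] is the address just past the last element. It is legal
       to compute and compare against, but not to dereference. */
    for (p = a; p < &a[MAX]; ++p)
    {
        x = *p;
        total += x;     /* do something with x */
    }
    return total;
}
```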
What's different between these two versions? The initialization and increment in the loop are the same. The comparison is about the same; more on that in a moment. The difference is between x=a[i] and x=*p. The first has to find the address of a[i]; to do that, it needs to multiply i by the size of an X and add it to the address of the first element of a. The second just has to go indirect on the p pointer. Indirection is fast; multiplication is relatively slow.
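The address arithmetic hidden inside a[i] can be spelled out by hand. This is only a sketch of what the compiler does on your behalf; addressOfElement is a hypothetical name, and the element type is assumed to be int.

```c
#include <stddef.h>

#define MAX 8
typedef int X;

X a[MAX];

/* Hypothetical helper: computes &a[i] the long way,
   as (address of first element) + i * (size of an X). */
X *addressOfElement(size_t i)
{
    char *firstByte = (char *) a;

    return (X *) (firstByte + i * sizeof(X));
}
```

The pointer loop avoids this computation because p already holds the address; each pass only adds a constant to it.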
This is "micro efficiency." It might matter, it might not. If you're adding the elements of an array, or simply moving information from one place to another, much of the time in the loop will be spent just using the array index. If you do any I/O, or even call a function, each time through the loop, the relative cost of indexing will be insignificant.
Some multiplications are less expensive than others. If the size of an X is 1, the multiplication can be optimized away (1 times anything is the original anything). If the size of an X is a power of 2 (and it usually is if X is any of the built-in types), the multiplication can be optimized into a left shift. (It's like multiplying by 10 in base 10.)
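For instance, with an assumed element size of 8 bytes, the two hypothetical functions below compute the same byte offset. A compiler is free to turn the first into the second.

```c
#include <stddef.h>

/* Byte offset of element i when each element is 8 bytes. */
size_t offsetByMultiply(size_t i)
{
    return i * 8;
}

/* The same offset as a left shift, because 8 is 1 << 3. */
size_t offsetByShift(size_t i)
{
    return i << 3;
}
```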
What about computing &a[MAX] every time through the loop? That's part of the comparison in the pointer version. Isn't it as expensive as computing a[i] each time? It's not, because &a[MAX] doesn't change during the loop. Any decent compiler will compute that, once, at the beginning of the loop, and use the same value each time. It's as if you had written this:
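A sketch of version (b) with the end address computed once by hand follows. The variable name temp, the function name sumVersionBHoisted, and the int element type are assumptions.

```c
#define MAX 8
typedef int X;

X a[MAX] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* version (b), rewritten: the end address is computed before the loop */
long sumVersionBHoisted(void)
{
    long total = 0;
    X    x;
    X   *p;
    X   *temp = &a[MAX];    /* the optimization, done by hand */

    for (p = a; p < temp; ++p)
    {
        x = *p;
        total += x;         /* do something with x */
    }
    return total;
}
```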
This works only if the compiler can tell that a and MAX can't change in the middle of the loop.
There are two other versions; both count down rather than up. That's no help for a task such as printing the elements of an array in order. It's fine for adding the values or something similar. The index version presumes that it's cheaper to compare a value with zero than to compare it with some arbitrary value:
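Version (c), the index loop counting down, is sketched here with the same assumed element type and contents; sumVersionC is a hypothetical name.

```c
#define MAX 8
typedef int X;

X a[MAX] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* version (c): subscript, counting down; the test is against zero */
long sumVersionC(void)
{
    long total = 0;
    X    x;
    int  i;     /* must be signed, or i >= 0 would always be true */

    for (i = MAX - 1; i >= 0; --i)
    {
        x = a[i];
        total += x;     /* do something with x */
    }
    return total;
}
```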
The pointer version makes the comparison simpler:
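Version (d), the pointer loop counting down, is sketched here as a function that takes a as a parameter. That is a deliberate change from the other sketches, and sumVersionD is a hypothetical name. The loop is well defined only when an element exists in front of a[0], so the usage note passes a pointer into the interior of a larger array.

```c
#define MAX 8
typedef int X;

/* version (d): pointer, counting down.
   Caveat: the final --p leaves p one element before a[0]. This
   sketch is safe only when a is NOT the start of its array. */
long sumVersionD(const X *a)
{
    long     total = 0;
    X        x;
    const X *p;

    for (p = &a[MAX - 1]; p >= a; --p)
    {
        x = *p;
        total += x;     /* do something with x */
    }
    return total;
}
```

Given X storage[MAX + 1], the call sumVersionD(storage + 1) stays inside the array for the whole loop, because storage[0] sits in front of the elements being visited.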
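Code similar to that in version (d) is common, but not necessarily right. The loop ends only when p is less than a, that is, when p points before the first element of the array. That might not be possible: standard C defines pointers to the elements of an array and to the position just past its end, but computing a pointer before the first element has undefined behavior.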
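If you do want to walk backward with a pointer, one common way to stay within the rules is to start just past the end and decrement before each use. This sketch is not one of the four versions measured below, and the name sumCountingDownSafely is hypothetical.

```c
#define MAX 8
typedef int X;

X a[MAX] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Counts down without ever forming a pointer before a[0]. */
long sumCountingDownSafely(void)
{
    long total = 0;
    X    x;
    X   *p = &a[MAX];       /* one past the end: legal to hold */

    while (p > a)
    {
        --p;                /* now p points at a real element */
        x = *p;
        total += x;         /* do something with x */
    }
    return total;
}
```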
The common wisdom would finish by saying, "Any decent optimizing compiler would generate the same code for all four versions." Unfortunately, there seems to be a lack of decent optimizing compilers in the world. A test program (in which the size of an X was not a power of 2 and in which the "do something" was trivial) was built with four very different compilers. Version (b) always ran much faster than version (a), sometimes twice as fast. Using pointers rather than indices made a big difference. (Clearly, all four compilers optimize &a[MAX] out of the loop.)
How about counting down rather than counting up? With two compilers, versions (c) and (d) were about the same as version (a); version (b) was the clear winner. (Maybe the comparison is cheaper, but decrementing is slower than incrementing?) With the other two compilers, version (c) was about the same as version (a) (indices are slow), but version (d) was slightly faster than version (b).
So if you want to write portable, efficient code to navigate an array of values, using a pointer is faster than using subscripts. Use version (b); version (d) might not work, and even if it does, it might be compiled into slower code.
Most of the time, though, this is micro-optimizing. The "do something" in the loop is where most of the time is spent, usually. Too many C programmers are like half-sloppy carpenters; they sweep up the sawdust but leave a bunch of two-by-fours lying around.