Sunday, November 6, 2011

What I Like (and Don't Like) About Arrays in C/C++

Good and bad things about arrays in C/C++.
This is a continuation of a previous post on Arrays in Programming Languages. This also serves as a prelude to the posts I intend to write in the near future on implementing a two-dimensional array in C++.

Here is a list of things about C/C++ arrays that I don't like:

  1. C arrays start with the index 0 while most numerical methods start counting from 1. It is not a difficult task to reorient one's thinking to start counting from zero instead of 1, but it is certainly an additional task to be handled.
  2. When passing C arrays as arguments to functions, size of the array must be specified as a constant. That statement needs to be qualified. All but the right most index must be specified as a constant. This is a consequence of the fact that internal storage of arrays is row-wise. Thus, for a one dimensioned array, it is acceptable not to specify the size at all (as long as you take care not to exceed the number of elements). For a two-dimensioned array, it is acceptable to specify only the number of columns per row while leaving out the number of rows (as long as you take care not to exceed the last row). For higher order arrays, it acceptable to leave out the size of the right most index while specifying the rest. The result of this is that C functions with arrays as parameters have to specify some of the dimensions as constants, which breaks the modularity of such functions because the arrays that will be passed as arguments to such functions from the parent function must have the same size as specified while defining the function.
  3. Operators such as +, -, * are not defined for array data types (of course, these operations make sense only when arrays are numbers, not any other type).
The above so called "objections" are pertinent only to numerical and scientific computations and are perfectly acceptable for non-numeric, computer science kind of applications. But it is surprising that C has dynamic memory allocation but places a restriction on size of arrays sent as function arguments, whereas Fortran, has no dynamic memory allocation but thinks it wise to implement array size definitions in terms of variables.

Here are a few things about C/C++ that are promising when it comes to numerical and scientific computations:
  1. Pointer arithmetic allows one to use any starting index you wish, as long as you operate only on that part of memory allocated to the array.
  2. Dynamic memory allocation. Being able to decide the size of an array at run-time makes it possible to write functions without worrying about the things one must know before using such functions.
  3. Operator overloading feature of C++ makes it possible to define operations on user defined data types. Although this has run-time overheads affecting speed, it would make writing programs requiring matrix operations simple to use and re-engineer.
So here are the specifications for a simple C++ implementation of two-dimensioned arrays:
  1. Indexing must start from any user-specified value. Thus it should be possible to define an array with indices starting from 1, 0 or a negative index if you so choose (like it is done in Fortran or Pascal).
  2. Memory allocation to an array can be done at run-time or compile time. Array size can be changed (increased or decreased) at run-time.
  3. Operators for +, - and * must be defined so that a statement such as c = a + b can be written, where a, b and c are all matrices.
  4. Memory allocated will be compatible with C arrays, in the sense that the memory is contiguous (this is not necessary when you have your own memory allocation strategy, but is necessary to maintain compatibility with other C functions). Elements will be stored row-wise.
  5. Class must be a class template so that it could cover all numeric types in one definition.
The above wish list has one limitation - it looks only at two-dimensioned arrays, not at multi-dimensioned arrays as a generalization.

One source of information for implementing such a C++ class would be Numerical Recipes in C. This book discusses how to use pointer rearrangement to get any starting index you want.

Another excellent reference is The C++ Programming Language (there is a separate chapter on Numerics). Concepts such as slices are already implemented in C++. You just have to use them in your C++ array implementation.

There are a large number of C++ array classes available in the open source world. Some of them are Blitz++, Boost multi_array library, MTL etc. They are more efficient and do more things than what is outlined above, but then there is nothing that can compare with learning by doing. Further, if you don't need the additional learning curve associated with some of them, you could try this implementation (at the cost of time and effort in developing your own library and living with its limitations and inefficiencies).

No comments:

Post a Comment

Your comments are ciritical input for the sustainance and improvement of this blog.