Tuesday, October 11, 2011

Arrays in Programming Languages

Which programming language implements matrices best?
Matrices and operations on matrices are central to a wide variety of engineering and scientific applications. To name just a few, take for example finite element analysis, computational fluid dynamics, computer graphics.

Most, so called, third generation programming languages (3GLs), such as Fortran, C, C++ implement matrices as arrays. They offer some degrees of efficiency but none support operations on them. That is, the language specification does not define how operations on arrays are to be performed. That is not to say that it cannot be done, for example, in C++ using OOP and operator overloading but the language specification is silent about these operations.

However, the so called fourth generation languages (4GLs) such as Matlab, Scilab, GNU Octave, Mathematica define operations on matrices. This makes them extremely powerful when it comes to engineering and scientific applications. But there is a trade-off, most 4GLs are slower compared to 3GLs. There may be workarounds to this issue of performance, because all the above 4GLs allow linking functions written in Fortran or C into the language through APIs that allow two-way exchange of data between the 4GL and the code written in a 3GL. The same is true of Python when used along with NumPy, SciPy and matplotlib but I am not sure if it is right to call Python a 4GL (although the combination of Python with the above libraries certainly makes it a 4GL).

This post attempts to describe how various 3GLs implement arrays, specifically 1 and 2 dimensioned arrays and the possible ways of implementing operations on arrays in C++. An array is a collection of elements all of which are of the same type and occupy contiguous space in computer's memory. Basic operations on arrays include:
  1. Creating arrays (allocating contiguous memory to accommodate the elements of the array), usually at compile time (static arrays)
  2. Accessing specific elements in an array to either put a value or get a value from that location
  3. Passing arrays as arguments to child functions
Additional operations could include being able to:
  1. Create and resize arrays dynamically (at run time as against compile time)
  2. Perform operations on arrays, such as, addition, subtraction, multiplication, transpose, inversion, determinant, solve linear simultaneous equations, eigenvalue analysis, etc.
Arrays in Fortran
Fortran arrays are said to have the best performance and are excellent at passing arrays to child functions. It is possible to make a one dimensioned array created in the parent function to behave like a multi-dimensioned array within a child function. The only rider is that the size of the array must be passed on as additional arguments to the child function. This technique is used to simulate dynamic arrays. Let me illustrate this:

C23456789
      REAL A(12)
      INTEGER M, N, I

      DO 10 I = 1, 12
        A(I) = I
10    CONTINUE
      CALL PRINTMAT(A, 12, 1)
      CALL PRINTMAT(A, 3, 4)
      CALL PRINTMAT(A, 4, 3)
      STOP
      END

      SUBROUTINE PRINTMAT(X, M, N)
      INTEGER M, N, I, J
      REAL X(M, N)

      DO 100 I = 1, M
        PRINT(*, 10) (A(I, J), J = 1, N)
100   CONTINUE
10    FORMAT(20F12.3)
      RETURN
      END

The main program creates a one dimensioned array of real numbers with 12 elements with the name A. Its elements are assigned the values 1, 2, 3, ..., 12. Now, look at the subroutine PRINTMAT(), which has 3 parameters X, M and N. Within the subroutine, X is defined as a two dimensioned array of real numbers with M rows and N columns. Notice that the size of the array X is defined not in terms of constants but in terms of variables M and N, which can be defined at run time.

Let us switch our attention back to the main program, where the subroutine PRINTMAT() is being invoked. The first time it is called with the arguments A, 12 and 1. During this subroutine call, A of the main program appears like the two dimensioned array X(12, 1), a column vector. In the second subroutine call, array A is treated as X(3, 4) and in the third subroutine call, A is treated as X(4, 3). Note the following points:
  1. A is one-dimensioned in the main program but appears as a two-dimensioned array in the subroutine
  2. When A is passed as an argument to the subroutine, programmer must ensure that the number of elements in the array in the subroutine must be less than or equal to 12, the number of elements allocated to A in the main program.
To simulate dynamic arrays, the typical practice is to create one huge one-dimensioned array in the main program and partition it off into smaller arrays of the required number of dimensions inside the subroutines. Main program keeps track of the number of elements allocated so as to ensure the total number of elements used by the smaller arrays fitted into the huge array does not exceed the size of the array created in the main program. Here is a short illustration:

C23456789
      REAL A(20000)

      CALL READMAT(A(1), 50, 10)
      N1 = 50 * 10
      CALL READMAT(A(N1+1), 100, 10)
      N2 = N1 + (100 * 10)
      CALL READMAT(A(N2+1), 200, 10)
      N3 = N2 + (200 * 10)
      CALL PRINTMAT(A(1), 50, 10)
      CALL PRINTMAT(A(N1+1), 100, 10)
      CALL PRINTMAT(A(N2+1), 100, 10)
      STOP
      END

      SUBROUTINE READMAT(X, M, N)
      INTEGER M, N, I, J
      REAL X(M, N)

      READ(*,*) ((X(I, J), J=1,N), I=1,M)
      RETURN
      END

Even though Fortran does not have dynamic memory allocation in the traditional sense, it can be simulated as shown in the above program. It is just that the programmer ends up doing a lot of book keeping that is done by automatically by the compiler in languages such as C/C++.

Arrays in C
Strangely, even though C/C++ can allocate memory dynamically (at run time), it insists on defining sizes of arrays as constants and does not permit use of variables in defining array sizes, even for arguments to function calls. This serious drawback makes C arrays ill suited for engineering and scientific applications. Thus, while rewriting the above Fortran subroutines in C, the sizes of parameter x must be specified as constants. With this requirement, it becomes necessary to define the size of the array created in the main function to be the same as the size of the array defined in the function. This breaks the main principle of 3GLs, modularity, because now the size of the array in the parent function is dependent on the size defined in the child function.

Here is the C program similar to the above Fortran programs:
void readmat(float x[3][4], int m, int n) {
  int i, j;

  for(i = 0; i < m; i++) {
    for(j = 0; j < n; j++) {
      scanf("%f" &x[i][j]);
    }
  }
}

void printmat(float x[3][4], int m, int n) {
  int i, j;

  for(i = 0; i < m; i++) {
    for(j = 0; j < n; j++) {
      printf("%f, " x[i][j]);
    }
    printf("\n");
  }
}

int main() {
  float a[3][4] = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};

  readmat(a, 3, 4);
  printmat(a, 3, 4);

  readmat(a, 2, 4);
  printmat(a, 2, 4);

  readmat(a, 3, 2);
  printmat(a, 3, 2);
}

The following points are worth noting:
  1. Size of array in the main function, that is passed as an argument to a child function must have the same size as in the child function
  2. It as acceptable to use fewer rows and/or columns than the actual size of the array but an error if you exceed either the number of rows or columns

2 comments:

  1. Regarding the Fortran: Dynamic memory allocation is built into the language and is very easy to use. Modularity, data hiding, overloading, pointers and a host of other capabilities are also there. All of these capabilities have been part of standardized fortran for over 20 years and are ready available from multiple vendors and even from open-source. Dan :-)

    ReplyDelete
    Replies
    1. My apologies, in the article above, I should have said Fortran 77. What you said applies to Fortran 90 and Fortran 95. It is true that even in Fortran 77, arrays had features that C does not have even today given its dynamic memory allocation capabilities. I believe the next standard of C will address this issue, but I don't know much about it.

      It is most unfortunate that engineering curricula in India stopped teaching Fortran in the late 80s and then switched to C. I believe that in US, Universities are teaching Matlab or Python. A modern version of Fortran is better than any other when it comes to scientific computing.

      Delete

Your comments are ciritical input for the sustainance and improvement of this blog.