The Computational Kernel

Next: Performance for sparse matrix Up: The Sparse Matrix Benchmark Previous: I/O considerations

The Computational Kernel

Once the matrix is read in, it is converted into the internal data structure. Here the popular compacted sparse row format for storing sparse matrices is used. This comprises two integer arrays $ia,\ ja$ and one double precision array , where is of dimension and points to the start of each row in the arrays and . The array is of dimension and contains the column indices, while is also of dimension and contains the matrix entries.

The main kernel in the computational part of the benchmark is the code for the product of two matrices. In F90 this is of the following form, where the product is calculated. Each matrix is represented in the compacted sparse row format; for example, matrix is represented by arrays $ia,\ ja,\ a$ , matrix by arrays , and product matrix by arrays $ic,\ jc,\ c$ .


subroutine spmatmul(n,m,ia,ja,a,ib,jb,b,ic,jc,c)
  ! C = A*B
  integer (kind = myint), intent (in) :: n,m
  integer (kind = myint), intent (in) :: ia(*),ib(*),ja(*),jb(*)
  integer (kind = myint), intent (out) :: ic(*),jc(*)
  real (kind = myreal), intent (in) :: a(*),b(*)
  real (kind = myreal), intent (out) :: c(*)
  integer (kind = myint) :: mask(m),nz,i,j,k,icol,icol_add,neigh
  real (kind = myreal) :: aij
  ! initialise the mask array which is an array that has 
  ! non-zero value if the column index already exist, in which 
  ! case the value is the index of that column 
  mask = 0
  nz = 0
  ic(1) = 1
  do i=1,n
     do j = ia(i),ia(i+1)-1
        aij = a(j)
        neigh = ja(j)
        do k = ib(neigh),ib(neigh+1)-1
           icol_add = jb(k)
           icol = mask(icol_add)
           if (icol == 0) then
              nz = nz + 1
              jc(nz) = icol_add
              c(nz) =  aij*b(k)
              mask(icol_add) = nz     ! add mask
           else
              c(icol) = c(icol) + aij*b(k)
           end if
        end do
     end do
     mask(jc(ic(i):nz)) = 0   
     ic(i+1) = nz+1
  end do
end subroutine spmatmul

The main characteristic of the above computation is that although the arrays $(ia,\ ja, a)$ are accessed sequentially, the arrays and may be accessed randomly through indirect addressing.

In Java, the kernel looks very similar:


static void spmatmul_double(    
                            int n, int m,
                            double[] a, int[] ia,int[] ja, 
                            double[] b, int[] ib,int[] jb,     
                            double[] c, int[] ic,int[] jc)
{
    int nz;
    int i,j,k,l,icol,icol_add;
    double aij;
    int neighbour;

    // extra space for FORTRAN like array indexing
    int[] mask = new int[m+1]; 

    // starting from one for FORTRAN like array index	
    for (l = 1; l <= m; l++) mask[l] = 0; 

    ic[0] = 1;
    nz = 0;

    // starting from one for FORTRAN like array indexing
    for (i = 1; i <= n; i++) {   
       for (j = ia[i]; j < ia[i+1]; j++){
	    neighbour = ja[j];
	    aij = a[j];
            for (k = ib[neighbour]; k < ib[neighbour+1]; k++){ 
                icol_add = jb[k];
                icol = mask[icol_add];
                if (icol ==  0) { 
                    jc[++nz] = icol_add;
                    c[nz] =  aij*b[k];
                    mask[icol_add] = nz;
                    }
                    else {
                    c[icol] +=  aij*b[k];
                }
            }
        }
        for (j = ic[i]; j < nz + 1; j++) mask[jc[j]] = 0;
        ic[i+1] = nz+1;
    }

}

Notice that because the matrix is stored such that the row and column entries always start from , in Java each array will be given one extra element so that the array addressing can start from . In C however it is possible to utilize the flexibility of pointer manipulation to avoid this extra element, as the following C code shows.


void spmatmul_double(
                     const int* n, const int* m,
                     const double *a, const int *ia,const int *ja, 
                     const double *b, const int *ib,const int *jb, 
                     double *c, int *ic,int *jc,
                     const int* ind_base)
{
  int nz;
  int i,j,k,icol,icol_add;
  const double *aij;
  const int *neighbour;
  int *mask, *pmask;

  mask = (int*) malloc( (*m) * sizeof(int));
  pmask = mask;

  for (i = 0; i != *m; i++) *pmask++ = 0;
  /* shift the index base for fortran like indexing: *ind_base=1 */
  ib -= *ind_base;
  jb -= *ind_base;
  b -= *ind_base;
  jc -= *ind_base;
  c -= *ind_base;
  mask -= *ind_base;

  aij = a;
  neighbour = ja;
  nz = -1+*ind_base;
  *ic = 1;
  for (i = 0; i != *n; i++) {
    for (j = ia[i]; j != ia[i+1]; j++){
      for (k = ib[*neighbour];k != ib[*neighbour+1];k++){ 
        icol_add = jb[k];
        icol = mask[icol_add];
        if (icol ==  0) 
          {
            jc[++nz] = icol_add;
            c[nz] =  (*aij)*b[k];
            mask[icol_add] = nz;
          }
        else
          {
            c[icol] +=  (*aij)*b[k];
           }
      }
      aij++;
      neighbour++;
    }
    for (j = *ic; j != nz + 1; j++) mask[jc[j]] = 0;
    *(++ic) = nz+1;
  }
  mask += *ind_base;
  free(mask);
  
}

Next: Performance for sparse matrix Up: The Sparse Matrix Benchmark Previous: I/O considerations

2000-08-16