Once the matrix is read in, it is converted into the internal data structure. Here the popular compacted sparse row format for storing sparse matrices is used. This comprises two integer arrays and one double precision array , where is of dimension and points to the start of each row in the arrays and . The array is of dimension and contains the column indices, while is also of dimension and contains the matrix entries.
The main kernel in the computational part of the benchmark is the code for the product of two matrices. In F90 this is of the following form, where the product is calculated. Each matrix is represented in the compacted sparse row format; for example, matrix is represented by arrays , matrix by arrays , and product matrix by arrays .
subroutine spmatmul(n,m,ia,ja,a,ib,jb,b,ic,jc,c) ! C = A*B integer (kind = myint), intent (in) :: n,m integer (kind = myint), intent (in) :: ia(*),ib(*),ja(*),jb(*) integer (kind = myint), intent (out) :: ic(*),jc(*) real (kind = myreal), intent (in) :: a(*),b(*) real (kind = myreal), intent (out) :: c(*) integer (kind = myint) :: mask(m),nz,i,j,k,icol,icol_add,neigh real (kind = myreal) :: aij ! initialise the mask array which is an array that has ! non-zero value if the column index already exist, in which ! case the value is the index of that column mask = 0 nz = 0 ic(1) = 1 do i=1,n do j = ia(i),ia(i+1)-1 aij = a(j) neigh = ja(j) do k = ib(neigh),ib(neigh+1)-1 icol_add = jb(k) icol = mask(icol_add) if (icol == 0) then nz = nz + 1 jc(nz) = icol_add c(nz) = aij*b(k) mask(icol_add) = nz ! add mask else c(icol) = c(icol) + aij*b(k) end if end do end do mask(jc(ic(i):nz)) = 0 ic(i+1) = nz+1 end do end subroutine spmatmul
The main characteristic of the above computation is that although the arrays are accessed sequentially, the arrays and may be accessed randomly through indirect addressing.
In Java, the kernel looks very similar:
static void spmatmul_double( int n, int m, double[] a, int[] ia,int[] ja, double[] b, int[] ib,int[] jb, double[] c, int[] ic,int[] jc) { int nz; int i,j,k,l,icol,icol_add; double aij; int neighbour; // extra space for FORTRAN like array indexing int[] mask = new int[m+1]; // starting from one for FORTRAN like array index for (l = 1; l <= m; l++) mask[l] = 0; ic[0] = 1; nz = 0; // starting from one for FORTRAN like array indexing for (i = 1; i <= n; i++) { for (j = ia[i]; j < ia[i+1]; j++){ neighbour = ja[j]; aij = a[j]; for (k = ib[neighbour]; k < ib[neighbour+1]; k++){ icol_add = jb[k]; icol = mask[icol_add]; if (icol == 0) { jc[++nz] = icol_add; c[nz] = aij*b[k]; mask[icol_add] = nz; } else { c[icol] += aij*b[k]; } } } for (j = ic[i]; j < nz + 1; j++) mask[jc[j]] = 0; ic[i+1] = nz+1; } }
Notice that because the matrix is stored such that the row and column entries always start from , in Java each array will be given one extra element so that the array addressing can start from . In C however it is possible to utilize the flexibility of pointer manipulation to avoid this extra element, as the following C code shows.
void spmatmul_double( const int* n, const int* m, const double *a, const int *ia,const int *ja, const double *b, const int *ib,const int *jb, double *c, int *ic,int *jc, const int* ind_base) { int nz; int i,j,k,icol,icol_add; const double *aij; const int *neighbour; int *mask, *pmask; mask = (int*) malloc( (*m) * sizeof(int)); pmask = mask; for (i = 0; i != *m; i++) *pmask++ = 0; /* shift the index base for fortran like indexing: *ind_base=1 */ ib -= *ind_base; jb -= *ind_base; b -= *ind_base; jc -= *ind_base; c -= *ind_base; mask -= *ind_base; aij = a; neighbour = ja; nz = -1+*ind_base; *ic = 1; for (i = 0; i != *n; i++) { for (j = ia[i]; j != ia[i+1]; j++){ for (k = ib[*neighbour];k != ib[*neighbour+1];k++){ icol_add = jb[k]; icol = mask[icol_add]; if (icol == 0) { jc[++nz] = icol_add; c[nz] = (*aij)*b[k]; mask[icol_add] = nz; } else { c[icol] += (*aij)*b[k]; } } aij++; neighbour++; } for (j = *ic; j != nz + 1; j++) mask[jc[j]] = 0; *(++ic) = nz+1; } mask += *ind_base; free(mask); }