Once the matrix is read in, it is converted into the internal
data structure. Here the popular compacted sparse row format
for storing sparse matrices is used.
This comprises two integer arrays and one double precision array
, where
is of dimension
and points to the start of each row
in the arrays
and
. The array
is of dimension
and
contains the column indices, while
is also of dimension
and contains the matrix entries.
The main kernel in the computational part of the benchmark is the code
for the product of two matrices. In F90 this is of the following form,
where the product is calculated. Each matrix is represented in
the compacted sparse row format; for example, matrix
is
represented by arrays
, matrix
by arrays
,
and product matrix
by arrays
.
subroutine spmatmul(n,m,ia,ja,a,ib,jb,b,ic,jc,c) ! C = A*B integer (kind = myint), intent (in) :: n,m integer (kind = myint), intent (in) :: ia(*),ib(*),ja(*),jb(*) integer (kind = myint), intent (out) :: ic(*),jc(*) real (kind = myreal), intent (in) :: a(*),b(*) real (kind = myreal), intent (out) :: c(*) integer (kind = myint) :: mask(m),nz,i,j,k,icol,icol_add,neigh real (kind = myreal) :: aij ! initialise the mask array which is an array that has ! non-zero value if the column index already exist, in which ! case the value is the index of that column mask = 0 nz = 0 ic(1) = 1 do i=1,n do j = ia(i),ia(i+1)-1 aij = a(j) neigh = ja(j) do k = ib(neigh),ib(neigh+1)-1 icol_add = jb(k) icol = mask(icol_add) if (icol == 0) then nz = nz + 1 jc(nz) = icol_add c(nz) = aij*b(k) mask(icol_add) = nz ! add mask else c(icol) = c(icol) + aij*b(k) end if end do end do mask(jc(ic(i):nz)) = 0 ic(i+1) = nz+1 end do end subroutine spmatmul
The main characteristic of the above computation is that although the arrays
are accessed sequentially, the arrays
and
may be accessed randomly through indirect addressing.
In Java, the kernel looks very similar:
static void spmatmul_double( int n, int m, double[] a, int[] ia,int[] ja, double[] b, int[] ib,int[] jb, double[] c, int[] ic,int[] jc) { int nz; int i,j,k,l,icol,icol_add; double aij; int neighbour; // extra space for FORTRAN like array indexing int[] mask = new int[m+1]; // starting from one for FORTRAN like array index for (l = 1; l <= m; l++) mask[l] = 0; ic[0] = 1; nz = 0; // starting from one for FORTRAN like array indexing for (i = 1; i <= n; i++) { for (j = ia[i]; j < ia[i+1]; j++){ neighbour = ja[j]; aij = a[j]; for (k = ib[neighbour]; k < ib[neighbour+1]; k++){ icol_add = jb[k]; icol = mask[icol_add]; if (icol == 0) { jc[++nz] = icol_add; c[nz] = aij*b[k]; mask[icol_add] = nz; } else { c[icol] += aij*b[k]; } } } for (j = ic[i]; j < nz + 1; j++) mask[jc[j]] = 0; ic[i+1] = nz+1; } }
Notice that because the matrix is stored such that the row and column entries
always start from , in Java each array will be given one extra element
so that the array addressing can start from
. In C however
it is possible to utilize the flexibility of
pointer manipulation to avoid this extra element, as the following C code
shows.
void spmatmul_double( const int* n, const int* m, const double *a, const int *ia,const int *ja, const double *b, const int *ib,const int *jb, double *c, int *ic,int *jc, const int* ind_base) { int nz; int i,j,k,icol,icol_add; const double *aij; const int *neighbour; int *mask, *pmask; mask = (int*) malloc( (*m) * sizeof(int)); pmask = mask; for (i = 0; i != *m; i++) *pmask++ = 0; /* shift the index base for fortran like indexing: *ind_base=1 */ ib -= *ind_base; jb -= *ind_base; b -= *ind_base; jc -= *ind_base; c -= *ind_base; mask -= *ind_base; aij = a; neighbour = ja; nz = -1+*ind_base; *ic = 1; for (i = 0; i != *n; i++) { for (j = ia[i]; j != ia[i+1]; j++){ for (k = ib[*neighbour];k != ib[*neighbour+1];k++){ icol_add = jb[k]; icol = mask[icol_add]; if (icol == 0) { jc[++nz] = icol_add; c[nz] = (*aij)*b[k]; mask[icol_add] = nz; } else { c[icol] += (*aij)*b[k]; } } aij++; neighbour++; } for (j = *ic; j != nz + 1; j++) mask[jc[j]] = 0; *(++ic) = nz+1; } mask += *ind_base; free(mask); }