Next: The SciMark2 Benchmark Up: The Sparse Matrix Benchmark Previous: The Computational Kernel

Performance for sparse matrix multiplication

Table 2 compares the I/O performance of Java with F90 and C on the three platforms. In this table and in Table 3, the F90 timings (for the Pentium, those of the Absoft F90) are used as the baseline: the timings for each other compiler are divided by the corresponding F90 timings, and the ratios are averaged to give the score in the last row of the table. By this measure the baseline F90 compiler always scores 1, and a lower score means better performance.
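The scoring can be summarised in a few lines. The sketch below (class and variable names are ours, not taken from the benchmark code) computes the mean of the per-matrix timing ratios relative to the F90 baseline:

```java
// Scoring used in Tables 2 and 3: divide each compiler's timing by the
// F90 timing for the same matrix, then average the ratios over all matrices.
public class Score {
    // f90[i] and other[i] are the timings (in seconds) for matrix i.
    static double score(double[] f90, double[] other) {
        double sum = 0.0;
        for (int i = 0; i < f90.length; i++) {
            sum += other[i] / f90[i];
        }
        return sum / f90.length;
    }

    public static void main(String[] args) {
        double[] abf90 = {13.779, 29.921, 15.200};  // baseline timings
        double[] gcc   = { 3.490,  7.530,  3.930};  // another compiler's timings
        System.out.println(score(abf90, abf90));    // baseline always scores 1.0
        System.out.println(score(abf90, gcc));      // below 1.0: faster than baseline
    }
}
```

By construction the baseline column scores exactly 1, so a score directly expresses the average slowdown (or speedup) relative to F90.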

On the Pentium system, the I/O performance of both versions of Java is about 25-40% worse than that of C, but about three times better than that of the Absoft F90! The disappointing I/O performance of the Absoft compiler is, however, not inherent to F90: with the Portland Group F90 compiler, the I/O performance is close to that of C. Unfortunately, as we will see later, the computing performance of the Portland Group F90 compiler is rather disappointing, as was known from earlier experience [3]. The I/O performance of Java (Sun) is slightly (about 10%) better than that of Java (IBM).

On the Sun Ultra 80, it is surprising that the I/O of the C version is about 70% slower than that of F90, while the Java version takes about 3.5 times the F90 I/O time.

On the IBM Power3, the F90 version has the fastest I/O, followed by Java and C, which are both about 60-75% slower.

One annoying feature of Java is that the Java Virtual Machines assume a fixed default heap size (30 MB on the IBM platform in our experiment). For large problems one therefore has to specify the amount of memory needed explicitly with the flag -mx<size>, which is inconvenient since in our experiments the amount of memory needed is not known in advance.
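For reference, the I/O task being timed is conceptually simple. The following is a minimal sketch of a reader for the MatrixMarket coordinate format in Java (the class and field names are illustrative, not the benchmark's actual code): it skips '%' comment lines, reads the "rows columns nonzeros" size line, and then reads one "row column value" triple per nonzero.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.StringTokenizer;

// Minimal sketch of reading a sparse matrix in MatrixMarket coordinate
// form (the I/O kernel timed in Table 2). Assumes a real-valued general
// matrix; indices in the file are 1-based.
public class MmReader {
    int rows, cols, nz;          // matrix dimensions and number of nonzeros
    int[] row, col;              // coordinates of each nonzero
    double[] val;                // value of each nonzero

    void read(BufferedReader in) throws IOException {
        // Skip the banner and any comment lines, which start with '%'.
        String line;
        do {
            line = in.readLine();
        } while (line != null && line.startsWith("%"));
        // Size line: "rows cols nz".
        StringTokenizer st = new StringTokenizer(line);
        rows = Integer.parseInt(st.nextToken());
        cols = Integer.parseInt(st.nextToken());
        nz   = Integer.parseInt(st.nextToken());
        row = new int[nz];
        col = new int[nz];
        val = new double[nz];
        // One "row col value" triple per line.
        for (int k = 0; k < nz; k++) {
            st = new StringTokenizer(in.readLine());
            row[k] = Integer.parseInt(st.nextToken());
            col[k] = Integer.parseInt(st.nextToken());
            val[k] = Double.parseDouble(st.nextToken());
        }
    }
}
```

Most of the measured time is presumably spent tokenising and parsing ASCII numbers rather than in raw disk reads, which would account for the large differences between compilers.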



Table 2: Comparing the I/O performance (in seconds) of Java with F90 and C for reading in sparse matrices in MatrixMarket coordinate form as ASCII files.
Pentium II
matrix             n       nz   abf90   pgf90     gcc  Java (IBM)  Java (Sun)
af23560.mtx    23560   484256  13.779   4.426   3.490       4.821       4.441
bcsstk30.mtx   28924  1036208  29.921   9.564   7.530       9.445       9.336
e40r0000.mtx   17281   553956  15.200   5.037   3.930       5.278       4.913
fidap011.mtx   16614  1091362  30.345   9.787   7.720      10.673      10.090
fidapm11.mtx   22294   623554  20.024   5.678   4.470       5.898       5.641
memplus.mtx    17758   126150   3.745   1.096   0.880       1.840       1.112
qc2534.mtx      2534   463360  13.360   3.891   3.110       3.766       3.697
s3dkt3m2.mtx   90449  1921955  62.245  18.034  14.180      18.549      18.061
score                           1.000   0.306   0.242       0.341       0.304
Sun Ultra 80
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   0.989   1.720   3.560
bcsstk30.mtx   28924  1036208   2.044   3.720   7.672
e40r0000.mtx   17281   553956   1.106   1.910   3.966
fidap011.mtx   16614  1091362   2.109   3.760   7.741
fidapm11.mtx   22294   623554   1.263   2.210   4.552
memplus.mtx    17758   126150   0.293   0.410   0.906
qc2534.mtx      2534   463360   0.896   1.450   3.090
s3dkt3m2.mtx   90449  1921955   3.831   7.100  14.386
score                           1.000   1.711   3.564

IBM Power3
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   1.970   3.510   3.247
bcsstk30.mtx   28924  1036208   4.320   7.550   7.061
e40r0000.mtx   17281   553956   2.300   4.020   3.624
fidap011.mtx   16614  1091362   4.480   7.770   7.118
fidapm11.mtx   22294   623554   2.700   4.510   4.168
memplus.mtx    17758   126150   0.488   0.887   0.776
qc2534.mtx      2534   463360   1.755   3.190   2.671
s3dkt3m2.mtx   90449  1921955   8.280  14.310  13.528
score                           1.000   1.756   1.592


The compute performance for the sparse matrix multiplications is compared in Table 3. On the Pentium platform, the Absoft F90 compiler performed best, and it is interesting that the C version of the matrix multiplication performed almost as well (about 10% slower). The Java (IBM) version runs about 27% slower than the Absoft F90. The big disappointment is the Portland Group F90, which needed 75% more time (this is reduced to about 67% on another Linux system with the newer 3.1-3 release of the Portland Group F90)! Java (Sun) is around 35% slower than Java (IBM), and 70% slower than the Absoft F90.

On the Sun Ultra 80, the F90 and C versions have almost the same performance, but the Java version does not perform well at all, requiring on average 2.4 times the CPU time of the F90 version!

On the IBM Power3, the C version is about 30% slower than the F90 version, and the Java version is about 2.9 times slower! Note, however, from Table 1 that the Java compiler used is version 1.1.8 rather than the latest version 1.3 (beta). This is because the IBM Power3 used in this experiment runs AIX 4.3.3.0, and installing Java 1.3 (beta) would require an operating system upgrade to AIX 4.3.3.10 or above, for which we were unable to obtain permission. We would expect the gap in computing performance to narrow with a newer Java compiler and JVM. It is also worth noting that, before benchmarking on the Power3 platform, we experimented on a PowerPC Silver processor. There the I/O performance of the C and Java versions was no more than 25% slower than that of the F90 version, while in compute performance C and Java were about 17% and 85% slower, respectively. In view of this, we do not fully understand why Java performed so poorly on the Power3; it may indicate that Java is unable to utilise the extra floating-point pipelines that are available on the Power3 and the Sun Ultra 80 but not on the PowerPC and Pentium II processors.
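The computational kernel itself is a sparse matrix-matrix product. As a point of reference, the sketch below shows the standard row-by-row (Gustavson) formulation with both operands in compressed sparse row form and a dense accumulator per row; it is our illustration of the technique, not necessarily the exact kernel used in the benchmark:

```java
import java.util.Arrays;

// Sketch of C = A * B with A (n x m) and B (m x p) in compressed sparse
// row (CSR) form: ptr[i]..ptr[i+1]-1 index the entries of row i in
// ind[]/val[]. Row-by-row ("Gustavson") algorithm with a dense accumulator.
public class SpGemm {
    static int[] cPtr;       // CSR row pointers of the product
    static int[] cInd;       // CSR column indices of the product
    static double[] cVal;    // CSR values of the product

    static void multiply(int n, int p,
                         int[] aPtr, int[] aInd, double[] aVal,
                         int[] bPtr, int[] bInd, double[] bVal) {
        cPtr = new int[n + 1];
        int cap = 16;                     // grow the output arrays as needed
        cInd = new int[cap];
        cVal = new double[cap];
        double[] acc = new double[p];     // dense accumulator for one row of C
        int[] touched = new int[p];       // columns occupied in the current row
        boolean[] seen = new boolean[p];
        int nnz = 0;
        for (int i = 0; i < n; i++) {
            int count = 0;
            // Row i of C is the sum over j of A(i,j) times row j of B.
            for (int ka = aPtr[i]; ka < aPtr[i + 1]; ka++) {
                int j = aInd[ka];
                double av = aVal[ka];
                for (int kb = bPtr[j]; kb < bPtr[j + 1]; kb++) {
                    int k = bInd[kb];
                    if (!seen[k]) { seen[k] = true; touched[count++] = k; acc[k] = 0.0; }
                    acc[k] += av * bVal[kb];
                }
            }
            // Flush the accumulated row into the CSR output.
            for (int t = 0; t < count; t++) {
                int k = touched[t];
                if (nnz == cap) {
                    cap *= 2;
                    cInd = Arrays.copyOf(cInd, cap);
                    cVal = Arrays.copyOf(cVal, cap);
                }
                cInd[nnz] = k;
                cVal[nnz] = acc[k];
                seen[k] = false;          // reset the marker for the next row
                nnz++;
            }
            cPtr[i + 1] = nnz;
        }
    }
}
```

The inner loops are dominated by irregular, indirect array accesses, which is exactly the kind of code where JIT quality and bounds-check elimination can be expected to matter most.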



Table 3: Comparing the computing performance (in seconds) of Java with F90 and C for multiplying two sparse matrices.
Pentium II
matrix             n       nz   abf90   pgf90     gcc  Java (IBM)  Java (Sun)
af23560.mtx    23560   484256   1.765   3.584   1.820       2.182       2.824
bcsstk30.mtx   28924  1036208   5.034   8.600   5.780       6.484       9.319
e40r0000.mtx   17281   553956   2.790   4.870   3.100       3.618       5.093
fidap011.mtx   16614  1091362   8.997  14.110  10.730      12.358      16.948
fidapm11.mtx   22294   623554   3.232   5.896   3.380       3.899       5.202
memplus.mtx    17758   126150   2.631   4.719   2.390       2.863       3.757
qc2534.mtx      2534   463360  10.266  12.508  12.760      13.348      19.072
s3dkt3m2.mtx   90449  1921955   5.159  10.778   5.600       6.948       8.329
score                           1.000   1.747   1.096       1.267       1.709

Sun Ultra 80
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   1.081   1.070   2.529
bcsstk30.mtx   28924  1036208   3.323   3.510   8.530
e40r0000.mtx   17281   553956   1.862   1.940   4.751
fidap011.mtx   16614  1091362   5.972   6.560  15.195
fidapm11.mtx   22294   623554   2.063   2.080   4.428
memplus.mtx    17758   126150   1.656   1.440   2.793
qc2534.mtx      2534   463360   5.958   6.780  16.263
s3dkt3m2.mtx   90449  1921955   3.368   3.340   8.849
score                           1.000   1.024   2.399

IBM Power3
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   0.380   0.482   1.085
bcsstk30.mtx   28924  1036208   1.100   1.435   3.264
e40r0000.mtx   17281   553956   0.598   0.783   1.796
fidap011.mtx   16614  1091362   1.865   2.580   5.524
fidapm11.mtx   22294   623554   0.730   0.907   1.935
memplus.mtx    17758   126150   0.490   0.573   1.339
qc2534.mtx      2534   463360   1.665   2.360   5.428
s3dkt3m2.mtx   90449  1921955   1.220   1.575   3.513
score                           1.000   1.298   2.914




2000-08-16