Next: The SciMark2 Benchmark Up: The Sparse Matrix Benchmark Previous: The Computational Kernel

Performance for sparse matrix multiplication

Table 2 compares the I/O performance of Java with F90 and C on the three platforms. In this table and in Table 3, the F90 timings (for the Pentium, those of the Absoft F90) are used as the baseline: the timings for each other compiler are divided by the corresponding F90 timings, and the ratios are averaged to give the score in the last row of the table. By this measure the baseline F90 compiler always scores 1, and a lower score means better performance.
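The scoring can be summarised in a few lines. The sketch below (class and variable names are ours, not taken from the benchmark code) computes the mean of the per-matrix timing ratios relative to the F90 baseline:

```java
// Scoring used in Tables 2 and 3: divide each compiler's timing by the
// F90 timing for the same matrix, then average the ratios over all matrices.
public class Score {
    // f90[i] and other[i] are the timings (in seconds) for matrix i.
    static double score(double[] f90, double[] other) {
        double sum = 0.0;
        for (int i = 0; i < f90.length; i++) {
            sum += other[i] / f90[i];
        }
        return sum / f90.length;
    }

    public static void main(String[] args) {
        double[] abf90 = {13.779, 29.921, 15.200};  // baseline timings
        double[] gcc   = { 3.490,  7.530,  3.930};  // another compiler's timings
        System.out.println(score(abf90, abf90));    // baseline always scores 1.0
        System.out.println(score(abf90, gcc));      // below 1.0: faster than baseline
    }
}
```

By construction the baseline column scores exactly 1, so a score directly expresses the average slowdown (or speedup) relative to F90.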

On the Pentium system, the I/O performance of both versions of Java is about 25-40% worse than that of C, but about three times better than that of the Absoft F90! The disappointing I/O performance of the Absoft compiler is, however, not inherent to F90: with the Portland Group F90 compiler, the I/O performance is close to that of C. Unfortunately, as we will see later, the computing performance of the Portland Group F90 compiler is rather disappointing, as was known from earlier experience [3]. The I/O performance of Java (Sun) is slightly (about 10%) better than that of Java (IBM).

On the Sun Ultra 80, it is surprising that the I/O of the C version is about 70% slower than that of F90, while the Java version takes about 3.5 times the F90 I/O time.

On the IBM Power3, the F90 version has the fastest I/O, followed by Java and C, which are both about 60-75% slower.

One annoying feature of Java is that the Java Virtual Machines assume a fixed default heap size (30 MB on the IBM platform in our experiment). For large problems one therefore has to specify the amount of memory needed explicitly with the flag -mx<size>, which is inconvenient since in our experiments the amount of memory needed is not known in advance.
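For reference, the I/O task being timed is conceptually simple. The following is a minimal sketch of a reader for the MatrixMarket coordinate format in Java (the class and field names are illustrative, not the benchmark's actual code): it skips '%' comment lines, reads the "rows columns nonzeros" size line, and then reads one "row column value" triple per nonzero.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.StringTokenizer;

// Minimal sketch of reading a sparse matrix in MatrixMarket coordinate
// form (the I/O kernel timed in Table 2). Assumes a real-valued general
// matrix; indices in the file are 1-based.
public class MmReader {
    int rows, cols, nz;          // matrix dimensions and number of nonzeros
    int[] row, col;              // coordinates of each nonzero
    double[] val;                // value of each nonzero

    void read(BufferedReader in) throws IOException {
        // Skip the banner and any comment lines, which start with '%'.
        String line;
        do {
            line = in.readLine();
        } while (line != null && line.startsWith("%"));
        // Size line: "rows cols nz".
        StringTokenizer st = new StringTokenizer(line);
        rows = Integer.parseInt(st.nextToken());
        cols = Integer.parseInt(st.nextToken());
        nz   = Integer.parseInt(st.nextToken());
        row = new int[nz];
        col = new int[nz];
        val = new double[nz];
        // One "row col value" triple per line.
        for (int k = 0; k < nz; k++) {
            st = new StringTokenizer(in.readLine());
            row[k] = Integer.parseInt(st.nextToken());
            col[k] = Integer.parseInt(st.nextToken());
            val[k] = Double.parseDouble(st.nextToken());
        }
    }
}
```

Most of the measured time is presumably spent tokenising and parsing ASCII numbers rather than in raw disk reads, which would account for the large differences between compilers.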



Table 2: Comparing the I/O performance (in seconds) of Java with F90 and C for reading in sparse matrices in MatrixMarket coordinate form as ASCII files.
Pentium II
matrix             n       nz   abf90   pgf90     gcc  Java (IBM)  Java (Sun)
af23560.mtx    23560   484256  13.779   4.426   3.490       4.821       4.441
bcsstk30.mtx   28924  1036208  29.921   9.564   7.530       9.445       9.336
e40r0000.mtx   17281   553956  15.200   5.037   3.930       5.278       4.913
fidap011.mtx   16614  1091362  30.345   9.787   7.720      10.673      10.090
fidapm11.mtx   22294   623554  20.024   5.678   4.470       5.898       5.641
memplus.mtx    17758   126150   3.745   1.096   0.880       1.840       1.112
qc2534.mtx      2534   463360  13.360   3.891   3.110       3.766       3.697
s3dkt3m2.mtx   90449  1921955  62.245  18.034  14.180      18.549      18.061
score                           1.000   0.306   0.242       0.341       0.304
Sun Ultra 80
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   0.989   1.720   3.560
bcsstk30.mtx   28924  1036208   2.044   3.720   7.672
e40r0000.mtx   17281   553956   1.106   1.910   3.966
fidap011.mtx   16614  1091362   2.109   3.760   7.741
fidapm11.mtx   22294   623554   1.263   2.210   4.552
memplus.mtx    17758   126150   0.293   0.410   0.906
qc2534.mtx      2534   463360   0.896   1.450   3.090
s3dkt3m2.mtx   90449  1921955   3.831   7.100  14.386
score                           1.000   1.711   3.564

IBM Power3
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   1.970   3.510   3.247
bcsstk30.mtx   28924  1036208   4.320   7.550   7.061
e40r0000.mtx   17281   553956   2.300   4.020   3.624
fidap011.mtx   16614  1091362   4.480   7.770   7.118
fidapm11.mtx   22294   623554   2.700   4.510   4.168
memplus.mtx    17758   126150   0.488   0.887   0.776
qc2534.mtx      2534   463360   1.755   3.190   2.671
s3dkt3m2.mtx   90449  1921955   8.280  14.310  13.528
score                           1.000   1.756   1.592


The compute performance for the sparse matrix multiplications is compared in Table 3. On the Pentium platform, the Absoft F90 compiler performed best, and it is interesting that the C version of the matrix multiplication performed almost as well (about 10% slower). The Java (IBM) version runs about 27% slower than the Absoft F90. The big disappointment is the Portland Group F90, which needed 75% more time (this is reduced to about 67% on another Linux system with the newer 3.1-3 release of the Portland Group F90)! Java (Sun) is around 35% slower than Java (IBM), and 70% slower than the Absoft F90.

On the Sun Ultra 80, the F90 and C versions have almost the same performance, but the Java version does not perform well at all, requiring on average 2.4 times the CPU time of the F90 version!

On the IBM Power3, the C version is about 30% slower than the F90 version, and the Java version is about 2.9 times slower! Note, however, from Table 1 that the Java compiler used is version 1.1.8 rather than the latest version 1.3 (beta). This is because the IBM Power3 used in this experiment runs AIX 4.3.3.0, and installing Java 1.3 (beta) would require an operating system upgrade to AIX 4.3.3.10 or above, for which we were unable to obtain permission. We would expect the gap in computing performance to narrow with a newer Java compiler and JVM. It is also worth noting that, before benchmarking on the Power3 platform, we experimented on a PowerPC Silver processor. There the I/O performance of the C and Java versions was no more than 25% slower than that of the F90 version, while in compute performance C and Java were about 17% and 85% slower, respectively. In view of this, we do not fully understand why Java performed so poorly on the Power3; it may indicate that Java is unable to utilise the extra floating-point pipelines that are available on the Power3 and the Sun Ultra 80 but not on the PowerPC and Pentium II processors.
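The computational kernel itself is a sparse matrix-matrix product. As a point of reference, the sketch below shows the standard row-by-row (Gustavson) formulation with both operands in compressed sparse row form and a dense accumulator per row; it is our illustration of the technique, not necessarily the exact kernel used in the benchmark:

```java
import java.util.Arrays;

// Sketch of C = A * B with A (n x m) and B (m x p) in compressed sparse
// row (CSR) form: ptr[i]..ptr[i+1]-1 index the entries of row i in
// ind[]/val[]. Row-by-row ("Gustavson") algorithm with a dense accumulator.
public class SpGemm {
    static int[] cPtr;       // CSR row pointers of the product
    static int[] cInd;       // CSR column indices of the product
    static double[] cVal;    // CSR values of the product

    static void multiply(int n, int p,
                         int[] aPtr, int[] aInd, double[] aVal,
                         int[] bPtr, int[] bInd, double[] bVal) {
        cPtr = new int[n + 1];
        int cap = 16;                     // grow the output arrays as needed
        cInd = new int[cap];
        cVal = new double[cap];
        double[] acc = new double[p];     // dense accumulator for one row of C
        int[] touched = new int[p];       // columns occupied in the current row
        boolean[] seen = new boolean[p];
        int nnz = 0;
        for (int i = 0; i < n; i++) {
            int count = 0;
            // Row i of C is the sum over j of A(i,j) times row j of B.
            for (int ka = aPtr[i]; ka < aPtr[i + 1]; ka++) {
                int j = aInd[ka];
                double av = aVal[ka];
                for (int kb = bPtr[j]; kb < bPtr[j + 1]; kb++) {
                    int k = bInd[kb];
                    if (!seen[k]) { seen[k] = true; touched[count++] = k; acc[k] = 0.0; }
                    acc[k] += av * bVal[kb];
                }
            }
            // Flush the accumulated row into the CSR output.
            for (int t = 0; t < count; t++) {
                int k = touched[t];
                if (nnz == cap) {
                    cap *= 2;
                    cInd = Arrays.copyOf(cInd, cap);
                    cVal = Arrays.copyOf(cVal, cap);
                }
                cInd[nnz] = k;
                cVal[nnz] = acc[k];
                seen[k] = false;          // reset the marker for the next row
                nnz++;
            }
            cPtr[i + 1] = nnz;
        }
    }
}
```

The inner loops are dominated by irregular, indirect array accesses, which is exactly the kind of code where JIT quality and bounds-check elimination can be expected to matter most.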



Table 3: Comparing the computing performance (in seconds) of Java with F90 and C for multiplying two sparse matrices.
Pentium II
matrix             n       nz   abf90   pgf90     gcc  Java (IBM)  Java (Sun)
af23560.mtx    23560   484256   1.765   3.584   1.820       2.182       2.824
bcsstk30.mtx   28924  1036208   5.034   8.600   5.780       6.484       9.319
e40r0000.mtx   17281   553956   2.790   4.870   3.100       3.618       5.093
fidap011.mtx   16614  1091362   8.997  14.110  10.730      12.358      16.948
fidapm11.mtx   22294   623554   3.232   5.896   3.380       3.899       5.202
memplus.mtx    17758   126150   2.631   4.719   2.390       2.863       3.757
qc2534.mtx      2534   463360  10.266  12.508  12.760      13.348      19.072
s3dkt3m2.mtx   90449  1921955   5.159  10.778   5.600       6.948       8.329
score                           1.000   1.747   1.096       1.267       1.709

Sun Ultra 80
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   1.081   1.070   2.529
bcsstk30.mtx   28924  1036208   3.323   3.510   8.530
e40r0000.mtx   17281   553956   1.862   1.940   4.751
fidap011.mtx   16614  1091362   5.972   6.560  15.195
fidapm11.mtx   22294   623554   2.063   2.080   4.428
memplus.mtx    17758   126150   1.656   1.440   2.793
qc2534.mtx      2534   463360   5.958   6.780  16.263
s3dkt3m2.mtx   90449  1921955   3.368   3.340   8.849
score                           1.000   1.024   2.399

IBM Power3
matrix             n       nz     F90       C    Java
af23560.mtx    23560   484256   0.380   0.482   1.085
bcsstk30.mtx   28924  1036208   1.100   1.435   3.264
e40r0000.mtx   17281   553956   0.598   0.783   1.796
fidap011.mtx   16614  1091362   1.865   2.580   5.524
fidapm11.mtx   22294   623554   0.730   0.907   1.935
memplus.mtx    17758   126150   0.490   0.573   1.339
qc2534.mtx      2534   463360   1.665   2.360   5.428
s3dkt3m2.mtx   90449  1921955   1.220   1.575   3.513
score                           1.000   1.298   2.914




2000-08-16