Table 2 shows the I/O performance of Java compared with F90 and C on the three platforms. In this table and in Table 3, the F90 timing (on the Pentium, the Absoft F90 timing) is used as the baseline. For each matrix, the timing of every other compiler is divided by the F90 timing, and these ratios are averaged over the matrices to give the score in the last row of each table. With this measure, the F90 compiler always scores 1, and a lower score means better performance.
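The scoring rule can be illustrated with a minimal sketch in Java. The class and method names are hypothetical and chosen only for this illustration; the timings are the F90 and C columns for the Sun Ultra 80 in Table 2.

```java
// Sketch of the scoring rule: mean of per-matrix time ratios against the F90 baseline.
// ScoreDemo and score() are illustrative names, not taken from the paper.
class ScoreDemo {
    /** Returns the average of times[i] / baseline[i] over all matrices. */
    static double score(double[] times, double[] baseline) {
        if (times.length != baseline.length || times.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < times.length; i++) {
            sum += times[i] / baseline[i];
        }
        return sum / times.length;
    }

    public static void main(String[] args) {
        // I/O timings on the Sun Ultra 80, copied from Table 2.
        double[] f90 = {0.989, 2.044, 1.106, 2.109, 1.263, 0.293, 0.896, 3.831};
        double[] c   = {1.720, 3.720, 1.910, 3.760, 2.210, 0.410, 1.450, 7.100};
        System.out.println(String.format(java.util.Locale.ROOT, "C score: %.3f", score(c, f90)));
        System.out.println(String.format(java.util.Locale.ROOT, "F90 score: %.3f", score(f90, f90)));
    }
}
```

Run on these inputs, the sketch reproduces the C score of 1.711 reported for the Sun Ultra 80. Since each matrix contributes one ratio with equal weight, a small matrix influences the score as much as a large one.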
On the Pentium system, the I/O performance of both versions of Java is about 25-40% worse than that of C, but about three times better than that of the Absoft F90! The disappointing I/O performance of the Absoft compiler is, however, not inherent to F90: with the Portland Group F90 compiler, the I/O performance is close to that of C. However, as we will see later, the computing performance of the Portland Group F90 compiler is rather disappointing, as was already known from earlier experience [3]. The I/O performance of Java (Sun) is slightly (about 10%) better than that of Java (IBM).
On the Sun Ultra 80, it is surprising that the I/O of the C version is 70% slower than that of F90, while the Java version took about 3.5 times the I/O time of F90.
On the IBM Power3, the F90 I/O took the least time, followed by Java and C, which are both about 60-75% slower.
One annoying feature of Java is that all Java Virtual Machines assume a certain fixed default heap size (30 MB on the IBM platform in our experiment). For large problems, the amount of memory needed has to be specified explicitly using the flag -mx<size>. This can be inconvenient, since in our experiments we do not know in advance how much memory is needed.
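To make the memory issue concrete, the following sketch estimates the storage for one matrix and compares it with the heap ceiling reported by the JVM. It rests on several assumptions that are not stated in this section: that the matrix is held in compressed sparse row (CSR) form with 8-byte values and 4-byte indices, and that the two integer columns of Table 2 are the matrix order and the number of stored entries. The class and method names are hypothetical. Runtime.maxMemory() is a standard call, but it was only added in Java 1.4, so it postdates the 1.1.8 JVM used on the Power3; on more recent JVMs the corresponding flag is spelled -Xmx<size>.

```java
// Sketch: compare an estimated storage requirement with the JVM's heap ceiling.
// The CSR layout and all names here are assumptions made for illustration.
class HeapCheck {
    /** Estimated bytes for one n-row CSR matrix with nnz stored entries. */
    static long estimateCsrBytes(long n, long nnz) {
        long values = 8L * nnz;          // one double per stored entry
        long columns = 4L * nnz;         // one int column index per stored entry
        long rowPointers = 4L * (n + 1); // one int offset per row, plus a sentinel
        return values + columns + rowPointers;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // ceiling the JVM will try to use
        long needed = estimateCsrBytes(90449, 1921955);  // s3dkt3m2.mtx, figures from Table 2
        System.out.println("max heap (bytes):   " + maxHeap);
        System.out.println("one matrix (bytes): " + needed);
        if (needed > maxHeap) {
            System.out.println("heap too small: restart the JVM with a larger limit");
        }
    }
}
```

Under these assumptions a single copy of the largest test matrix needs roughly 23 MB before any product matrix or work arrays are allocated, which suggests why a fixed default limit is easy to exceed. The estimate ignores object headers and any temporary buffers used while reading the file.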
Pentium II
matrices | n | nnz | abf90 | pgf90 | gcc | Java (IBM) | Java (Sun) |
af23560.mtx | 23560 | 484256 | 13.779 | 4.426 | 3.490 | 4.821 | 4.441 |
bcsstk30.mtx | 28924 | 1036208 | 29.921 | 9.564 | 7.530 | 9.445 | 9.336 |
e40r0000.mtx | 17281 | 553956 | 15.200 | 5.037 | 3.930 | 5.278 | 4.913 |
fidap011.mtx | 16614 | 1091362 | 30.345 | 9.787 | 7.720 | 10.673 | 10.090 |
fidapm11.mtx | 22294 | 623554 | 20.024 | 5.678 | 4.470 | 5.898 | 5.641 |
memplus.mtx | 17758 | 126150 | 3.745 | 1.096 | 0.880 | 1.840 | 1.112 |
qc2534.mtx | 2534 | 463360 | 13.360 | 3.891 | 3.110 | 3.766 | 3.697 |
s3dkt3m2.mtx | 90449 | 1921955 | 62.245 | 18.034 | 14.180 | 18.549 | 18.061 |
score | | | 1.000 | 0.306 | 0.242 | 0.341 | 0.304 |
Sun Ultra 80
matrices | n | nnz | F90 | C | Java |
af23560.mtx | 23560 | 484256 | 0.989 | 1.720 | 3.560 |
bcsstk30.mtx | 28924 | 1036208 | 2.044 | 3.720 | 7.672 |
e40r0000.mtx | 17281 | 553956 | 1.106 | 1.910 | 3.966 |
fidap011.mtx | 16614 | 1091362 | 2.109 | 3.760 | 7.741 |
fidapm11.mtx | 22294 | 623554 | 1.263 | 2.210 | 4.552 |
memplus.mtx | 17758 | 126150 | 0.293 | 0.410 | 0.906 |
qc2534.mtx | 2534 | 463360 | 0.896 | 1.450 | 3.090 |
s3dkt3m2.mtx | 90449 | 1921955 | 3.831 | 7.100 | 14.386 |
score | | | 1.000 | 1.711 | 3.564 |
IBM Power3
matrices | n | nnz | F90 | C | Java |
af23560.mtx | 23560 | 484256 | 1.970 | 3.510 | 3.247 |
bcsstk30.mtx | 28924 | 1036208 | 4.320 | 7.550 | 7.061 |
e40r0000.mtx | 17281 | 553956 | 2.300 | 4.020 | 3.624 |
fidap011.mtx | 16614 | 1091362 | 4.480 | 7.770 | 7.118 |
fidapm11.mtx | 22294 | 623554 | 2.700 | 4.510 | 4.168 |
memplus.mtx | 17758 | 126150 | 0.488 | 0.887 | 0.776 |
qc2534.mtx | 2534 | 463360 | 1.755 | 3.190 | 2.671 |
s3dkt3m2.mtx | 90449 | 1921955 | 8.280 | 14.310 | 13.528 |
score | | | 1.000 | 1.756 | 1.592 |
The compute performance for the sparse matrix multiplications is compared in Table 3. On the Pentium platform, the Absoft F90 compiler performed the best. It is interesting that the C version of the matrix multiplication performed almost as well (about 10% slower). The Java (IBM) version runs about 27% slower. The big disappointment is the Portland Group F90, which needed 75% more time (this is reduced to about 67% on another Linux system which has the newer 3.1-3 version of the Portland Group F90)! Java (Sun) is around 35% slower than Java (IBM), and 70% slower than the Absoft F90.
On the Sun Ultra 80, the F90 and C versions have almost the same performance, but the Java version does not perform well at all, requiring on average 2.4 times the CPU time!
On the IBM Power3, the C version is 30% slower than the F90 version, and the Java version takes about 2.9 times as long! Note, however, from Table 1 that the Java version used is 1.1.8 rather than the latest version 1.3 (beta). This is because the IBM Power3 used in this experiment runs AIX 4.3.3.0; installing Java 1.3 (beta) would require an operating system upgrade to AIX 4.3.3.10 or above, which we were unable to obtain permission to do. We would expect the gap in computing performance to narrow with a newer Java compiler and JVM. It is also worth noting that, prior to benchmarking on the Power3 platform, we experimented on a PowerPC Silver processor. There, the I/O performance of the C and Java versions was no more than 25% slower than that of the F90 version, while in terms of compute performance, C and Java were about 17% and 85% slower, respectively. In view of this, we do not understand why Java performed so poorly on the Power3. One possibility is that Java is not able to utilise the extra floating point pipelines that are available on the Power3 and the Sun Ultra 80, but not on the PowerPC and the Pentium II processors.
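For readers unfamiliar with the kind of kernel being timed, a sparse product can be sketched as below. This is only an illustration under assumptions not stated in this section: that the kernel is a sparse matrix-matrix product, that the matrices are held in compressed sparse row (CSR) form, and that a row-wise (Gustavson-style) accumulation is used. The class, field, and method names are hypothetical and this is not the benchmarked code.

```java
// Sketch of a row-wise (Gustavson-style) sparse product C = A * B in CSR form.
// The storage format and every name below are assumptions, not the paper's code.
class CsrMatrix {
    final int rows, cols;
    final int[] rowPtr;   // rowPtr[i] .. rowPtr[i+1]-1 index the entries of row i
    final int[] colIdx;   // column of each stored entry
    final double[] val;   // value of each stored entry

    CsrMatrix(int rows, int cols, int[] rowPtr, int[] colIdx, double[] val) {
        this.rows = rows; this.cols = cols;
        this.rowPtr = rowPtr; this.colIdx = colIdx; this.val = val;
    }

    static CsrMatrix multiply(CsrMatrix a, CsrMatrix b) {
        if (a.cols != b.rows) throw new IllegalArgumentException("dimension mismatch");
        int[] rowPtr = new int[a.rows + 1];
        int capacity = Math.max(16, a.val.length + b.val.length);
        int[] colIdx = new int[capacity];
        double[] val = new double[capacity];
        double[] acc = new double[b.cols]; // dense accumulator for one row of C
        int[] marker = new int[b.cols];    // marker[j] == i + 1: column j already seen in row i
        int nnz = 0;
        for (int i = 0; i < a.rows; i++) {
            int rowStart = nnz;
            for (int p = a.rowPtr[i]; p < a.rowPtr[i + 1]; p++) {
                int k = a.colIdx[p];
                double aik = a.val[p];
                for (int q = b.rowPtr[k]; q < b.rowPtr[k + 1]; q++) {
                    int j = b.colIdx[q];
                    if (marker[j] != i + 1) {   // first contribution to C(i, j)
                        marker[j] = i + 1;
                        if (nnz == capacity) {  // grow the output arrays
                            capacity *= 2;
                            colIdx = java.util.Arrays.copyOf(colIdx, capacity);
                            val = java.util.Arrays.copyOf(val, capacity);
                        }
                        colIdx[nnz++] = j;
                        acc[j] = aik * b.val[q];
                    } else {
                        acc[j] += aik * b.val[q];
                    }
                }
            }
            for (int p = rowStart; p < nnz; p++) { // gather the accumulated row
                val[p] = acc[colIdx[p]];
            }
            rowPtr[i + 1] = nnz;
        }
        return new CsrMatrix(a.rows, b.cols, rowPtr,
                java.util.Arrays.copyOf(colIdx, nnz), java.util.Arrays.copyOf(val, nnz));
    }

    public static void main(String[] args) {
        // A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]]
        CsrMatrix a = new CsrMatrix(2, 2, new int[]{0, 2, 3}, new int[]{0, 1, 1}, new double[]{1, 2, 3});
        CsrMatrix b = new CsrMatrix(2, 2, new int[]{0, 1, 3}, new int[]{0, 0, 1}, new double[]{4, 5, 6});
        CsrMatrix c = multiply(a, b);
        System.out.println(java.util.Arrays.toString(c.val)); // [14.0, 12.0, 15.0, 18.0]
    }
}
```

In a kernel of this shape the inner loop is dominated by indirect array accesses and one multiply-add per visited entry, so differences between compilers in bounds checking and in scheduling floating point work would show up directly in the timings. Column indices within each output row are left in order of first appearance rather than sorted.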
Pentium II
matrices | n | nnz | abf90 | pgf90 | gcc | Java (IBM) | Java (Sun) |
af23560.mtx | 23560 | 484256 | 1.765 | 3.584 | 1.820 | 2.182 | 2.824 |
bcsstk30.mtx | 28924 | 1036208 | 5.034 | 8.600 | 5.780 | 6.484 | 9.319 |
e40r0000.mtx | 17281 | 553956 | 2.790 | 4.870 | 3.100 | 3.618 | 5.093 |
fidap011.mtx | 16614 | 1091362 | 8.997 | 14.110 | 10.730 | 12.358 | 16.948 |
fidapm11.mtx | 22294 | 623554 | 3.232 | 5.896 | 3.380 | 3.899 | 5.202 |
memplus.mtx | 17758 | 126150 | 2.631 | 4.719 | 2.390 | 2.863 | 3.757 |
qc2534.mtx | 2534 | 463360 | 10.266 | 12.508 | 12.760 | 13.348 | 19.072 |
s3dkt3m2.mtx | 90449 | 1921955 | 5.159 | 10.778 | 5.600 | 6.948 | 8.329 |
score | | | 1.000 | 1.747 | 1.096 | 1.267 | 1.709 |
Sun Ultra 80
matrices | n | nnz | F90 | C | Java |
af23560.mtx | 23560 | 484256 | 1.081 | 1.070 | 2.529 |
bcsstk30.mtx | 28924 | 1036208 | 3.323 | 3.510 | 8.530 |
e40r0000.mtx | 17281 | 553956 | 1.862 | 1.940 | 4.751 |
fidap011.mtx | 16614 | 1091362 | 5.972 | 6.560 | 15.195 |
fidapm11.mtx | 22294 | 623554 | 2.063 | 2.080 | 4.428 |
memplus.mtx | 17758 | 126150 | 1.656 | 1.440 | 2.793 |
qc2534.mtx | 2534 | 463360 | 5.958 | 6.780 | 16.263 |
s3dkt3m2.mtx | 90449 | 1921955 | 3.368 | 3.340 | 8.849 |
score | | | 1.000 | 1.024 | 2.399 |
IBM Power3
matrices | n | nnz | F90 | C | Java |
af23560.mtx | 23560 | 484256 | 0.380 | 0.482 | 1.085 |
bcsstk30.mtx | 28924 | 1036208 | 1.100 | 1.435 | 3.264 |
e40r0000.mtx | 17281 | 553956 | 0.598 | 0.783 | 1.796 |
fidap011.mtx | 16614 | 1091362 | 1.865 | 2.580 | 5.524 |
fidapm11.mtx | 22294 | 623554 | 0.730 | 0.907 | 1.935 |
memplus.mtx | 17758 | 126150 | 0.490 | 0.573 | 1.339 |
qc2534.mtx | 2534 | 463360 | 1.665 | 2.360 | 5.428 |
s3dkt3m2.mtx | 90449 | 1921955 | 1.220 | 1.575 | 3.513 |
score | | | 1.000 | 1.298 | 2.914 |