Java is now widely recognized as a good object-oriented language for writing portable programs quickly. However, its penetration into computationally intensive numerical calculation remains low. One of the main reasons is its poor performance, or the perception of poor performance, in such numerically intensive computing.
Nevertheless, Java, as a programming language that promises ``write once, run anywhere'' portability, is very attractive for scientific and engineering calculations that involve many researchers using different platforms. With Java's strong connections to Internet technology, the potential of running applications over the Internet, either for server/client side computing, or in terms of using computing resources over the Internet as a Computational Grid, is enormous.
That Java can suffer from performance problems is perhaps not at all surprising. Java was not designed for numerical computing; rather, it is a truly object-oriented language that aims to achieve bit-for-bit reproducibility of results on different platforms, safety of execution, and ease of programming and testing. As a result, everything apart from the primitive types is an object, with the associated overhead of handling objects. Furthermore, access to array elements is subject to expensive bound checking and null pointer checking. Array objects are also not guaranteed to occupy a contiguous section of memory, which hurts optimal cache usage. Moreover, Java is not allowed to take advantage of some special features of hardware, such as the fused multiply-add instruction found on the IBM POWER architecture.
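The non-contiguity point can be made concrete. A Java two-dimensional array is not one block of memory but an array of row objects, each allocated independently on the heap; rows can even be replaced or given different lengths. The small sketch below (the class and helper names are ours, purely for illustration) demonstrates this:

```java
class JaggedArrays {
    // Build a 3x4 array, then replace row 1 with a longer row. This is
    // legal precisely because each row is an independent heap object;
    // a Fortran or C compiler, by contrast, can lay the whole matrix out
    // as one contiguous, cache-friendly block.
    static double[][] makeJagged() {
        double[][] a = new double[3][4];
        a[1] = new double[7]; // rows need not share a length or be adjacent
        return a;
    }

    public static void main(String[] args) {
        double[][] a = makeJagged();
        System.out.println(a[0].length + " " + a[1].length + " " + a[2].length);
    }
}
```

Because the rows may lie anywhere in the heap, a row-by-row traversal gives the runtime no contiguity guarantee to exploit.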
In addition to the above, compiler technology takes time to mature. Compared with Fortran and C, Java is still relatively new. Java also differs from C and Fortran in that Java code is first compiled into bytecode, which is then interpreted on any platform by a Java Virtual Machine (JVM). This is similar to what other interpreted languages, such as Basic and Perl, do. Java is therefore optimized at run time, rather than at compile time.
There are, however, already major advances in Java compiler technology. One of these technologies is the JIT (Just-In-Time) compiler. When a JIT is present, the Java Virtual Machine hands the bytecode to the JIT, which in turn compiles it into native code for the platform and runs the resulting executable. The JIT is an integral part of the Java Virtual Machine and is transparent to the user. Since Java is a dynamic language, the JIT is really ``just-in-time'': it compiles methods one at a time, just before they are called. There has also been effort in reducing the number of array bound checks through clever transformations of the code.
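To illustrate the kind of loop such transformations target, consider the summation below. Because the induction variable is trivially bounded by `a.length`, a JIT that recognizes this pattern can hoist or eliminate the per-access bound check. This is an illustrative sketch only; whether the check is actually removed depends on the particular JVM:

```java
class BoundCheckFriendly {
    // The loop condition i < a.length makes the index provably lie in
    // [0, a.length), so a sufficiently clever JIT need not test the
    // bounds on every iteration of a[i].
    static double sum(double[] a) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new double[]{1.0, 2.0, 3.0}));
    }
}
```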
Not long ago, Java could only achieve, say, 20% of the performance of Fortran on a good day. With the introduction of new compiler technologies such as JIT and HotSpot (http://developer.java.sun.com/developer/technicalArticles/Networking/HotSpot/index.html), there have been an increasing number of reports of Java compilers delivering application performance comparable to that of statically compiled languages such as C (e.g., ).
It is therefore our intention in this report to take a close look at the Java performance issue as it stands today, and to compare Java with Fortran 90 (F90) and C on benchmarks that are important for scientific and engineering applications. This is of course a task that is impossible to achieve fully, given the varying kernels that dominate different applications. These may range from dense matrix calculations to sparse matrix operations, the solution of eigenvalue problems, or even repeated evaluations of elementary or special functions. We therefore decided to restrict our comparison to two benchmarks.
For our first study, we compare the performance of Java with C and F90 for sparse matrix based calculations. Sparse matrices appear frequently in large-scale scientific and engineering applications, and Java's ability to handle such sparse systems efficiently is of vital importance to the usefulness of the language for these applications. Our sparse benchmark compares the three languages on the speed of sparse matrix multiplication. I/O speed is also tested.
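For readers unfamiliar with sparse kernels, a minimal Java sketch of a sparse matrix-vector product follows. It assumes compressed row storage (CRS), one common layout in which `val` holds the nonzeros row by row, `col` the column index of each nonzero, and `rowPtr[i]..rowPtr[i+1]` delimits row `i`; the storage scheme used by the actual benchmark may differ:

```java
class SparseMatVec {
    // Compute y = A*x for an n-by-n matrix A in CRS format.
    // val:    nonzero values, row by row
    // col:    column index of each nonzero
    // rowPtr: rowPtr[i]..rowPtr[i+1]-1 are the entries of row i
    static double[] multiply(double[] val, int[] col, int[] rowPtr, double[] x) {
        int n = rowPtr.length - 1;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                s += val[k] * x[col[k]];
            }
            y[i] = s;
        }
        return y;
    }

    public static void main(String[] args) {
        // A = [[1, 2], [0, 3]], x = [1, 1]  =>  y = [3, 3]
        double[] y = multiply(new double[]{1, 2, 3},
                              new int[]{0, 1, 1},
                              new int[]{0, 2, 3},
                              new double[]{1, 1});
        System.out.println(y[0] + " " + y[1]);
    }
}
```

The indirect access `x[col[k]]` is exactly the pattern where per-element bound checks and irregular memory access make sparse kernels a demanding test for any language.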
For our second study, we measure the performance of Java against C using the SciMark2 benchmark (http://math.nist.gov/scimark2/), which contains a variety of kernels including FFT, dense LU factorization, and sparse matrix-vector products.
It is worth noting that since Java 2 (version 1.2 onwards), there are two floating-point modes: strictfp and default. The strictfp mode, which applies to classes or methods carrying the strictfp keyword, corresponds to the original (Java 1) floating-point semantics. Although this mode enforces bit-for-bit reproducibility of results across JVMs, it can lead to severe performance deterioration on Intel Pentium-class processors. The floating-point registers of these processors operate in IEEE 754's 80-bit double-extended format, so under strictfp both the fractional part and the exponent of every intermediate result have to be rounded to the IEEE 754 64-bit double format, at great cost. For performance reasons, the default mode is therefore no longer strictfp. All the benchmarks in this report are run under the default mode, although, as far as the authors understand, no JVM has implemented the strictfp mode yet.
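In source code, the distinction is a single modifier. The sketch below (our own illustrative class) shows how strictfp is requested for a whole class; everything else in the report runs without it, i.e., in default mode:

```java
// The strictfp modifier on a class (or an individual method) requests the
// original Java 1 semantics: every intermediate floating-point result is
// rounded to the IEEE 754 64-bit format, guaranteeing bit-for-bit
// identical answers on every JVM, at a potential performance cost on
// processors with 80-bit extended registers.
strictfp class StrictDot {
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i]; // no 80-bit extended intermediates allowed
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[]{1, 2}, new double[]{3, 4}));
    }
}
```

Omitting the modifier yields the default mode used throughout our benchmarks, in which the JVM is free to keep intermediates in an extended format.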