This is actually not true. I benchmarked three libraries: Colt (uses double[]), Apache Commons Math (uses double[][]) and Jama (uses double[][] cleverly). At first it looks like Jama performs about the same as Colt (they avoid the slow [][] access with a clever algorithm). But once HotSpot kicks in, the difference is crazy and Jama becomes the fastest, far ahead.
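To make the "clever algorithm" concrete, here is a minimal sketch of the kind of trick I mean: copy one column of the right-hand matrix into a flat array, grab each row of the left-hand matrix once, and run the inner loop over two plain 1D arrays. As far as I know Jama's multiplication works along these lines, but treat this as an illustration, not Jama's actual source. The class and method names are made up for the example.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical, not taken from any library.
final class ColumnCachedMultiply {

    // Naive version: b[k][j] walks down a column, so every step of the
    // inner loop dereferences a different row array.
    static double[][] multiplyNaive(double[][] a, double[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        double[][] c = new double[m][p];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < p; j++) {
                double s = 0;
                for (int k = 0; k < n; k++) {
                    s += a[i][k] * b[k][j];
                }
                c[i][j] = s;
            }
        }
        return c;
    }

    // Column-cached version: the inner loop only touches two 1D arrays.
    static double[][] multiplyColumnCached(double[][] a, double[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        double[][] c = new double[m][p];
        double[] bCol = new double[n];
        for (int j = 0; j < p; j++) {
            // Copy column j of b once, then reuse it for every row of a.
            for (int k = 0; k < n; k++) {
                bCol[k] = b[k][j];
            }
            for (int i = 0; i < m; i++) {
                double[] aRow = a[i]; // one row lookup per (i, j) pair
                double s = 0;
                for (int k = 0; k < n; k++) {
                    s += aRow[k] * bCol[k];
                }
                c[i][j] = s;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiplyColumnCached(a, b)));
    }
}
```

Both methods return the same product; the only difference is the memory access pattern in the inner loop.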
JDK 1.6.0, Linux, 1000x1000 matrix multiplication on an Intel Q6600
|loop index|Colt|Commons Math|Jama|
We don't include matrix construction time or fetching the result. Only the multiplication is taken into account.
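For anyone who wants to repeat this kind of measurement, here is a rough sketch of a timing loop with the same shape: the matrices are built before the clock starts, only the multiplication is timed, and each loop index gets its own timing so the HotSpot warm-up is visible. This is not my original benchmark code. The `multiply` method is a plain stand-in you would swap for the library call under test, and all names are hypothetical.

```java
import java.util.Random;

// Hypothetical timing harness; not the code behind the numbers in this answer.
final class MultiplyBenchmark {

    // Written to after each run so the JIT cannot discard the multiplication.
    static volatile double sink;

    static double[][] randomMatrix(int n, Random rnd) {
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = rnd.nextDouble();
            }
        }
        return m;
    }

    // Stand-in for the library call being measured.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }

    // Returns the elapsed nanoseconds of each iteration, in order.
    static long[] timeMultiply(int n, int iterations) {
        Random rnd = new Random(42);
        // Construction happens outside the timed region.
        double[][] a = randomMatrix(n, rnd);
        double[][] b = randomMatrix(n, rnd);
        long[] nanos = new long[iterations];
        for (int it = 0; it < iterations; it++) {
            long start = System.nanoTime();
            double[][] c = multiply(a, b);
            nanos[it] = System.nanoTime() - start;
            sink = c[0][0]; // read the result only after the clock stops
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeMultiply(200, 10);
        for (int it = 0; it < nanos.length; it++) {
            System.out.println("loop " + it + ": " + nanos[it] / 1_000_000.0 + " ms");
        }
    }
}
```

Expect the first few loop indexes to be slower than the later ones; that drop is the JIT compiling the hot loop. A hand-rolled loop like this is only a rough guide, so don't read too much into small differences.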
The difference is less pronounced on smaller matrices, but it is still there. Jama looks very good in this simple test case. In more realistic scenarios, the difference is not so obvious. For example, Commons Math's SVD is faster than Jama's.