This is a kind of follow-up to the CUDA performance myth. There was recent news on the Java concurrency mailing list about the SplittableRandom class proposed for JDK 8. It is a new parallel random number generator that should, in principle, be usable for Monte-Carlo simulations.
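To make the idea concrete, here is a minimal sketch of how such a splittable generator could feed a parallel Monte-Carlo estimate of pi. It assumes the class as it is available in `java.util` (constructor taking a seed, `split()`, `nextDouble()`); the class name `SplitPiDemo`, the method names, and the task and sample counts are made up for illustration.

```java
import java.util.SplittableRandom;
import java.util.stream.IntStream;

class SplitPiDemo {
    // Count random points of the unit square that fall inside the quarter unit circle.
    static long hits(SplittableRandom rng, int samples) {
        long inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y < 1.0) {
                inside++;
            }
        }
        return inside;
    }

    static double estimatePi(long seed, int tasks, int samplesPerTask) {
        SplittableRandom root = new SplittableRandom(seed);
        // Split sequentially, before going parallel, so that each task owns
        // its own generator and the result does not depend on scheduling.
        SplittableRandom[] rngs = new SplittableRandom[tasks];
        for (int t = 0; t < tasks; t++) {
            rngs[t] = root.split();
        }
        long total = IntStream.range(0, tasks)
                .parallel()
                .mapToLong(t -> hits(rngs[t], samplesPerTask))
                .sum();
        return 4.0 * total / ((double) tasks * samplesPerTask);
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(42L, 8, 1_000_000));
    }
}
```

Splitting up front rather than inside the tasks is a design choice: it keeps the run reproducible for a given seed, which is usually what one wants when debugging a simulation.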
SplittableRandom seems to rely on a very recent algorithm. There are some slightly older alternatives: the ancestor, L'Ecuyer's MRG32k3a, which can be parallelized through relatively cheap skipTo methods; a Mersenne Twister variant, MTGP; and even the less rigorous XorWow, popularized by NVIDIA CUDA.
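The skip-ahead trick for MRG32k3a can be sketched as follows. Each of its two components is a linear recurrence on three values, so one step is a 3x3 matrix multiplication modulo m, and jumping n steps only needs the n-th power of that matrix, computed by repeated squaring. This is an illustrative sketch, not a reference implementation: the class and method names are hypothetical, the moduli and multipliers are the ones I recall from L'Ecuyer's published parameters and are worth checking against the paper, and the final combination of the two components into an output value is left out.

```java
import java.util.Arrays;

class Mrg32k3aSkip {
    static final long M1 = 4294967087L; // 2^32 - 209
    static final long M2 = 4294944443L; // 2^32 - 22853

    // One-step transition matrices acting on the state (x[n-3], x[n-2], x[n-1]).
    // Negative coefficients of the recurrence are stored as m - a.
    static final long[][] A1 = {{0, 1, 0}, {0, 0, 1}, {M1 - 810728L, 1403580L, 0}};
    static final long[][] A2 = {{0, 1, 0}, {0, 0, 1}, {M2 - 1370589L, 0, 527612L}};

    // (a * b) mod m for 0 <= a, b < 2^32: the product is below 2^64,
    // so it is exact when the long is read as an unsigned value.
    static long mulMod(long a, long b, long m) {
        return Long.remainderUnsigned(a * b, m);
    }

    static long[][] matMul(long[][] x, long[][] y, long m) {
        long[][] r = new long[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                long sum = 0;
                for (int k = 0; k < 3; k++) {
                    sum += mulMod(x[i][k], y[k][j], m); // each term < 2^32, no overflow
                }
                r[i][j] = sum % m;
            }
        }
        return r;
    }

    // a^n mod m by repeated squaring: O(log n) matrix products.
    static long[][] matPow(long[][] a, long n, long m) {
        long[][] result = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        long[][] base = a;
        while (n > 0) {
            if ((n & 1L) == 1L) {
                result = matMul(result, base, m);
            }
            base = matMul(base, base, m);
            n >>>= 1;
        }
        return result;
    }

    // Multiply a matrix by a state vector; with A1 or A2 this advances one step.
    static long[] apply(long[][] a, long[] state, long m) {
        long[] r = new long[3];
        for (int i = 0; i < 3; i++) {
            long sum = 0;
            for (int k = 0; k < 3; k++) {
                sum += mulMod(a[i][k], state[k], m);
            }
            r[i] = sum % m;
        }
        return r;
    }

    // Advance one component's state by n steps without generating them.
    static long[] skipAhead(long[][] a, long[] state, long n, long m) {
        return apply(matPow(a, n, m), state, m);
    }

    public static void main(String[] args) {
        long[] seed = {12345L, 12345L, 12345L};
        System.out.println(Arrays.toString(skipAhead(A1, seed, 1L << 40, M1)));
        System.out.println(Arrays.toString(skipAhead(A2, seed, 1L << 40, M2)));
    }
}
```

With this, each parallel path can start from the same seed advanced by a different large offset, so the substreams do not overlap as long as no path consumes more numbers than the offset between two of them.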
The book GPU Computing Gems provides some interesting stats on GPU vs CPU performance for various generators (L'Ecuyer, Sobol, and Mersenne Twister).