I just stumbled upon Julia, a new programming language aimed at numerical computation. It's quite new but it looks very interesting, with the promise of C like performance (thanks to LLVM compilation) with a much nicer syntax and parallelization features.
Out of curiosity, I looked at their cumulative normal distribution implementation. I found that the (complimentary) error function (directly related to the cumulative normal distribution) algorithm relies on an algorithm that can be found in the Faddeeva library. I had not heard of this algorithm or this library before, but the author, Steven G. Johnson, claims it is faster and as precise as Cody & SLATEC implementations. As I previously had a look at those algorithms and was quite impressed by Cody's implementation.
The source of Faddeeva shows a big list (100) of Chebychev expansions for various ranges of a normalized error function. I slightly modified the Faddeva code to compute directly the cumulative normal distribution, avoiding some exp(-x*x)*exp(x*x) calls on the way.
Is it as accurate? I compared against a high precision implementation as in my previous test of cumulative normal distribution algorithms. And after replacing the exp(-x*x) with Cody's trick to compute it with higher accuracy, here is how it looks (referenced as "Johnson").
I also measured performance on various ranges, and found out that this Johnson algorithm is around 2x faster than Cody (in Scala) and 30% faster than my optimization of Cody (using a table of exponentials for Cody's trick).