Friday, December 18, 2009
Java & 3D Surface
I am quite shocked that something that has been in Excel for 15 years, the 3D surface chart, is still not available in Java. And it is not easy to make your own.
Thursday, November 26, 2009
double[][] Is Fine
JDK 1.6.0, Linux, 1000x1000 matrix multiplication on Intel Q6600:

| loop index | Colt      | Commons Math | Jama     |
|------------|-----------|--------------|----------|
| 1          | 11.880748 | 24.455125    | 9.828977 |
| 2          | 11.874975 | 24.265102    | 9.848916 |
| 3          | 9.772616  | 14.374153    | 9.826572 |
| 4          | 9.759679  | 14.368105    | 2.655915 |
| 5          | 9.799622  | 15.238928    | 2.649129 |
| 6          | 9.780556  | 14.741863    | 2.668104 |
| 7          | 9.728310  | 15.509909    | 2.646811 |
| 8          | 9.798380  | 15.724348    | 2.646069 |
| 9          | 9.726143  | 15.988762    | 2.646052 |
| 10         | 9.784505  | 15.121782    | 2.644572 |
Matrix construction and fetching of the result are not included; only the multiplication is timed.
The difference is less pronounced on smaller matrices, but it is still there. Jama looks very good in this simple test case. In more realistic scenarios the difference is not so obvious: for example, the Commons Math SVD is faster than the Jama one.
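For reference, here is a minimal sketch of the kind of timing loop used, assuming the standard Jama API (Matrix and its times method); matrix construction happens outside the timed section, as stated above:

import Jama.Matrix;

public class MultiplyBench {
    public static void main(String[] args) {
        int n = 1000;
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = Math.random();
                b[i][j] = Math.random();
            }
        }
        Matrix ma = new Matrix(a); // construction is excluded from the timing
        Matrix mb = new Matrix(b);
        for (int run = 1; run <= 10; run++) {
            long start = System.nanoTime();
            ma.times(mb); // only the multiplication is timed
            long elapsed = System.nanoTime() - start;
            System.out.println(run + " | " + (elapsed / 1e9));
        }
    }
}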
The Pain of Java Matrix Libraries
Sure, there is Apache Commons Math, but it is still changing a lot and is not very performance-optimized yet, even though it has been active for several years. There is also Java3D: it provides matrices through GMatrix, but not much linear algebra, and if you look at the implementation it is very basic, not performance-oriented.
The other candidates seem to be one-man projects that could disappear any day (some of them look quite good, like ojalgo; most of them are not interesting). Then you also have the serious but no longer maintained CERN Colt library.
Compared to C/C++, the Java world is worrying if you want to do maths.
In those libraries, a dense matrix of double can be implemented in two ways (see the sketch after this list):
- by maintaining a double[][] internally. Those libraries usually allow wrapping the array without a copy, which is neat if your interfaces already pass around such arrays.
- by maintaining a flat double[] internally. The reason is performance, but then each time you build a matrix from a double[][], an expensive copy happens. So you need to use the Matrix object in your interfaces instead of double[][].
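Here is a minimal sketch of the second approach, with hypothetical names, showing both the row-major indexing that motivates it and the copy that a double[][] constructor forces:

// Hedged sketch of a dense matrix backed by a flat array (hypothetical class).
public class FlatMatrix {
    private final double[] data; // row-major storage: element (i,j) lives at i*cols + j
    private final int rows, cols;

    public FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    // Building from double[][] forces the full copy mentioned above.
    public FlatMatrix(double[][] source) {
        this(source.length, source[0].length);
        for (int i = 0; i < rows; i++) {
            System.arraycopy(source[i], 0, data, i * cols, cols);
        }
    }

    public double get(int i, int j) {
        return data[i * cols + j]; // one contiguous array: cache-friendly access
    }

    public void set(int i, int j, double value) {
        data[i * cols + j] = value;
    }
}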
Wednesday, August 19, 2009
Java Calendar Is Broken On JVM Upgrade
The reason behind this is daylight saving time conventions. An old JVM won't necessarily have the same daylight saving time rules for a given TimeZone as a newer JVM, and will therefore interpret the date differently.
Here is the output of a very simple program on 2 different JVMs. The number is the long used internally to represent the date. I only use SimpleDateFormat with different TimeZones:
JVM 1.5.0_12
loading date=Sat Jul 18 06:59:36 CEST 2009, number=1247893176505
using formatter in EST: 7/17/09 11:59 PM
using formatter in Asia/Singapore: 7/18/09 12:59 PM
JVM 1.5.0_20
loading date=Sat Jul 18 06:59:36 CEST 2009, number=1247893176505
using formatter in EST: 7/18/09 12:59 AM
using formatter in Asia/Singapore: 7/18/09 12:59 PM
What does this mean? It means that if you entered a particular date and time in a GUI under the first JVM, using the EST time zone, it will have changed when you read it back under the second JVM.
The source code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class DateBug {
    private static final String FILE_NAME = "datebug.txt";

    public static void load() throws IOException {
        FileReader fr = new FileReader(FILE_NAME);
        BufferedReader br = new BufferedReader(fr);
        String l = br.readLine();
        br.close();
        long time = Long.parseLong(l);
        Date d = new Date(time);
        System.out.println("loading date=" + d + ", number=" + d.getTime());
        SimpleDateFormat formatter = new SimpleDateFormat();
        formatter.setTimeZone(TimeZone.getTimeZone("EST"));
        System.out.println("using formatter in EST: " + formatter.format(d));
        formatter.setTimeZone(TimeZone.getTimeZone("Asia/Singapore"));
        System.out.println("using formatter in Asia/Singapore: " + formatter.format(d));
    }

    public static void saveNew() throws IOException {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("EST"));
        c.set(2009, Calendar.JULY, 17, 23, 59); // Calendar months are 0-based: 6 means July
        Date d = c.getTime();
        System.out.println("saving date=" + d + ", number=" + d.getTime());
        SimpleDateFormat formatter = new SimpleDateFormat();
        formatter.setTimeZone(TimeZone.getTimeZone("EST"));
        System.out.println("using formatter in EST: " + formatter.format(d));
        formatter.setTimeZone(TimeZone.getTimeZone("Asia/Singapore"));
        System.out.println("using formatter in Asia/Singapore: " + formatter.format(d));
        FileWriter fw = new FileWriter(FILE_NAME);
        PrintWriter pw = new PrintWriter(fw);
        pw.println(d.getTime()); // persist the raw epoch millis
        pw.close();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("JVM " + System.getProperty("java.version"));
        if (args.length == 1) {
            if (args[0].equals("save")) {
                saveNew();
            }
        } else {
            load();
        }
    }
}
This suggests that if you want to keep the same dates across JVM upgrades, you are better off saving in UTC, where daylight saving time is not used, and tricking DateFormat. But I have to say this looks quite ugly.
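As a sketch of that UTC trick (one possible reading of it, with hypothetical names): persist and re-read dates through a SimpleDateFormat pinned to UTC, so that what goes to disk never depends on DST rules:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class UtcRoundTrip {
    private static final String PATTERN = "yyyy-MM-dd HH:mm";

    static String save(Date d) {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN);
        f.setTimeZone(TimeZone.getTimeZone("UTC")); // UTC has no DST transitions
        return f.format(d);
    }

    static Date load(String s) throws ParseException {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.parse(s); // the same instant back, whatever tzdata the JVM ships
    }
}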
Monday, August 17, 2009
Implicit Finite Differences Method For Pricing Barrier Option
While trying to price a simple down-and-out barrier option, I encountered several difficulties I did not expect with the implicit finite differences method. The explicit method has fewer issues with barrier option pricing. I will show here what the tricky parts are and why explicit seems simpler in this case.
The full article is here (pdf) or here (html) (the latter is not very well formatted).
Algorithms used in this article can be found at http://code.google.com/p/javamc/
Saturday, June 27, 2009
Pulseaudio Nightmares - Pure ALSA to the Rescue
One thing works wonderfully: pure ALSA. To have multiple apps sharing ALSA, I just use dmix. As I use digital out there is no hardware mixer, but ALSA can provide one through softvol. It works really well. ALSA alone is already not that simple to configure properly, but with PulseAudio on top, welcome to your worst configuration nightmares.
Here is the .asoundrc I use:
pcm.amix {
type dmix
ipc_key 50557
slave {
pcm "hw:0,1"
period_time 0
period_size 1024
buffer_size 8192
}
bindings {
0 0
1 1
}
}
pcm.softvol {
type softvol
slave {
pcm "amix" #redirect the output to dmix (instead of "hw:0,0")
}
control {
name "PCM" #override the PCM slider to set the softvol volume level globally
card 0
}
}
pcm.!default {
type plug
slave.pcm "softvol" #make use of softvol
}
Monday, June 15, 2009
Java int Overflow Behavior
Can we rely on int x = Integer.MAX_VALUE + 1 being the same for every JVM on any platform?
I thought the answer would be easy to find in the Java specification document. But I was wrong: I could not find it clearly stated.
I found a trick that suggests this behavior is indeed standard and will stay the same; it is related to type casting. Java guarantees that casting a long to int simply truncates the long to int precision. Therefore if
long l = Integer.MAX_VALUE;
l = l + 1;
int x = (int) l;
x has a guaranteed value.
http://java.sun.com/docs/books/jls/third_edition/html/conversions.html
paragraph 5.1.3
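A quick standalone check, consistent with the cast argument above (the printed value follows from the guaranteed truncation):

public class OverflowCheck {
    public static void main(String[] args) {
        long l = Integer.MAX_VALUE;
        l = l + 1;
        int viaCast = (int) l;              // truncation guaranteed by paragraph 5.1.3
        int direct = Integer.MAX_VALUE + 1; // the overflow in question
        System.out.println(viaCast);        // -2147483648
        System.out.println(direct == viaCast); // true: both wrap to Integer.MIN_VALUE
    }
}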
Static Fields and Inheritance
class Toto extends TotoParent{
final static Toto a = new Toto ("a");
public Toto(String a){
super(a);
}
}
import java.util.ArrayList;
import java.util.List;
public abstract class TotoParent {
static List list = new ArrayList();
public TotoParent(String a) {
list.add(a);
}
protected static List get() {
return list;
}
}
import org.junit.Test;
import static org.junit.Assert.*;
public class TotoTest {
@Test
public void testGet(){
assertEquals(1, Toto.get().size());
}
}
I am quite used to static initialization, and would have answered the same as the first answer in the thread: "get() is static and associated with TotoParent, so it is the same as calling TotoParent.get().size()". I would even have thought that the compiler compiles the call Toto.get() into TotoParent.get(). But running javap, you can see it is still compiled as Toto.get(), so a lookup is still done at runtime. This is why the first answer is actually not quite correct.
The important bit here is that Toto is never initialized, even though we call Toto.get(). The Java specification (an invaluable reference) explains clearly that calling a static method not declared in a class does not initialize that class.
Calling Toto.get() is therefore not exactly the same as calling TotoParent.get(). If TotoParent.get() itself called a TotoSuperParent.get(), we would have:
Toto.get() -> TotoParent.get() -> TotoSuperParent.get()
If we compile, and later change TotoParent to provide its own specific implementation of get(), Toto will automatically pick it up, without even being recompiled.
http://java.sun.com/docs/ (Java Language Specification), paragraph 12.4.1
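A minimal standalone illustration of that rule: static initializer blocks reveal which class actually gets initialized.

public class InitDemo {
    static class Parent {
        static { System.out.println("Parent initialized"); }
        static String hello() { return "hello"; }
    }

    static class Child extends Parent {
        static { System.out.println("Child initialized"); }
    }

    public static void main(String[] args) {
        // Invoking a static method inherited from Parent via Child
        // initializes Parent only: "Child initialized" is never printed.
        System.out.println(Child.hello());
    }
}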
Wednesday, June 03, 2009
Benchmarking Languages Is Difficult
Looking at the existing Java implementations for the test, I decided to try submitting a tricky one using a pool of threads, pooling message processing rather than creating one thread per node. To my surprise it was accepted without questions, and I held the best score for a Java program for a while. Shortly after, someone copied my program, got rid of the various parts not useful for this particular benchmark (breaking the interesting part of the design), and got accepted as well, with of course a better result.
I decided to see if I could write an even more silly program, tailored for the test only. I managed to be orders of magnitude faster: one thread, no synchronization, everything processed through a FIFO queue (a LinkedList). This is actually a standard way to reimplement recursion. But I was honest enough not to hide that I consider that kind of program to be cheating the test, and my entry went into the "interesting alternatives".
In reality there is no difference in the "cheating" between my new program and the program that got accepted on the official list: both cheat by using only one thread and processing everything one by one. Neither program has one thread per node, so both avoid any concurrency issues. One "looks" better because it uses a pool of 503 threads (but really uses only one or two), while the other does not hide its use of a single processing thread. But this is not evident to the people accepting the programs.
When I look at the Haskell code, I cannot really tell whether it creates 503 threads in the language, or a pool, or something else; you have to know each language quite well, and sometimes it is not easy to define what cheating is. That makes this kind of benchmark a bit disappointing. One should force the use of the same algorithm; but can you (a functional language won't use the same algorithm as a procedural one)?
Friday, May 15, 2009
Cholesky & Jakarta Commons Math
Looking at the source, one can easily understand why. I ran a small test (many people will say a 1-million-loop test is not representative) and found:
cholesky GPL = 5.4ms
cholesky BSD = 37.1ms
So the BSD code is 7 times slower! Of course it can do a bit more and has many validity checks, but still. It shows that math libraries are not easy to get right: some people will care a lot about this performance difference, while others won't, and will prefer the extra "features".
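For context, here is a minimal Cholesky sketch (lower triangular, no validity checks); this is the kind of tight loop where defensive checks and copies account for the difference measured above:

// Hedged sketch: decompose a symmetric positive-definite matrix A into L,
// with A = L * L^T, in the spirit of the leaner implementation discussed above.
public class Cholesky {
    static double[][] decompose(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int j = 0; j < n; j++) {
            double diag = a[j][j];
            for (int k = 0; k < j; k++) {
                diag -= l[j][k] * l[j][k];
            }
            l[j][j] = Math.sqrt(diag); // NaN here if A is not positive definite
            for (int i = j + 1; i < n; i++) {
                double sum = a[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                l[i][j] = sum / l[j][j];
            }
        }
        return l;
    }
}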
Hull American Option Price Fallacies
Hull says an American put is best exercised immediately, while an American call is optimally exercised at expiry, like a European option. Is this really true?
At first it seems really clever, and models show this clearly. But if we change the market assumptions only a tiny bit, everything falls apart.
I could not detail everything in a blog post, so I created a static web page about it. Everything was produced in Java, using algorithms found in popular books, with graphs drawn through JFreeChart.
Tuesday, May 05, 2009
On Quasi Random Numbers - MersenneTwister vs Sobol precision in Asian Option Pricing
This brought me to read several books about Monte Carlo and finance more carefully (Haug's Option Pricing, Sobol's Primer on Monte Carlo, and Glasserman's Monte Carlo Methods in Financial Engineering). I had quite a hard time understanding why the dimension of the quasi-random generator was so important for pricing an Asian option. Intuitively, I thought the averaging points of an Asian option were all on the same path, so they should use the same random generator. This is very wrong: one does not care about the path in the first place, only about simulating each point in the average (under the regular Black-Scholes hypothesis). Estimating the average over the given points forces the use of an independent random generator for each point, because we want to approximate the estimate by a sum over random samples at each point.
There is another simple argument for why independence of the generators is so important. If we use the same generator for each point, then every point will move in exactly the same way in each simulation. The average of those points will therefore behave exactly as if there were only one point using that generator, and we are no longer pricing an Asian option but a regular vanilla one.
With a pseudo-random generator one does not see the dimension problem, because we can create N independent dimensions by simply taking numbers N by N from the generator. So having 1 or N dimensions is effectively the same thing for a pseudo-random generator.
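A small sketch of that N-by-N slicing, using java.util.Random as a stand-in for MersenneTwister; each simulation consumes 10 consecutive draws, one per averaging date:

import java.util.Arrays;
import java.util.Random;

public class DimensionSlicing {
    public static void main(String[] args) {
        Random rng = new Random(2009L); // stand-in for MersenneTwister
        int dimension = 10;             // one dimension per Asian averaging date
        for (int sim = 0; sim < 3; sim++) {
            double[] point = new double[dimension];
            for (int d = 0; d < dimension; d++) {
                point[d] = rng.nextDouble(); // consecutive draws act as independent dimensions
            }
            // each point[d] would drive the lognormal draw for averaging date d
            System.out.println(Arrays.toString(point));
        }
    }
}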
Still, I wrote a small test to see whether a 1D quasi-random generator was really so bad at simulating N dimensions (taking values N by N from the quasi-random generator). Here are the results:
MersenneTwister vs MersenneTwister on 10D Asian:
14:43:51,111 INFO MonteCarloSimulationTest:114 - 867970000 -- expPrice=0.978958644504466
14:43:51,428 INFO MonteCarloSimulationTest:120 - 314619000 -- expPrice=0.9733220318545934
14:43:51,430 INFO MonteCarloSimulationTest:122 - relative difference=-0.005757763804951897
can be as high as 2%
Sobol vs MersenneTwister on 10D Asian:
14:48:46,909 INFO MonteCarloSimulationTest:115 - 980209000 -- expPrice=0.9895032774079221
14:48:47,345 INFO MonteCarloSimulationTest:121 - 433685000 -- expPrice=0.9790264042895171
14:48:47,348 INFO MonteCarloSimulationTest:123 - relative difference=-0.010588012548932534
about 1%; it is actually bounded by the MersenneTwister precision.
Sobol vs Sobol1D on 10D Asian:
14:47:08,614 INFO MonteCarloSimulationTest:115 - 717444000 -- expPrice=0.8810736428068913
14:47:08,925 INFO MonteCarloSimulationTest:121 - 308499000 -- expPrice=0.9791449305055208
14:47:08,927 INFO MonteCarloSimulationTest:123 - relative difference=0.11130884290920073
about 10%, and it stays that way even when increasing the number of simulations.
Using an Asian rate with 10 points, we see that Sobol1D will always give a very bad estimate, no matter the number of simulations, while Sobol used properly gives (much) better precision with fewer iterations. So even though the word random appears in quasi-random, those numbers are very far from being random or even behaving like random numbers. Reading about Van der Corput and Halton numbers helped me really understand quasi-random numbers; a sketch follows.
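For the curious, the Van der Corput sequence (the one-dimensional building block of Halton sequences) is short enough to sketch: the digits of the index n in base b are mirrored around the radix point, filling [0,1) ever more evenly.

public class VanDerCorput {
    static double radicalInverse(int n, int base) {
        double result = 0.0;
        double factor = 1.0 / base;
        while (n > 0) {
            result += (n % base) * factor; // take the lowest digit...
            n /= base;                     // ...then shift right
            factor /= base;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 8; i++) {
            // base 2: 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625
            System.out.println(radicalInverse(i, 2));
        }
    }
}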
Friday, April 24, 2009
Java Logging Still Crap in 2009
I remember having looked at it only very briefly before continuing with Log4j on all the projects I have been involved with.
Today, while doing a very small project, I tried once more to use java.util.logging. The main reason is that I was reluctant to add a dependency on one more jar for this small project. While trying, I found out that:
- you still need a damned JVM parameter to point to your configuration file;
- you cannot change the formatting without writing a Formatter class!
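To illustrate the second point, here is roughly the Formatter you are forced to write just to get one-line output (a sketch against the standard java.util.logging API):

import java.util.logging.Formatter;
import java.util.logging.LogRecord;

public class OneLineFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        // millis, level, logger name, then the formatted message, on one line
        return record.getMillis() + " " + record.getLevel() + " "
                + record.getLoggerName() + " - " + formatMessage(record) + "\n";
    }
}

It then still has to be registered on a Handler, either programmatically or through the configuration file pointed to by the java.util.logging.config.file JVM parameter complained about above.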
Monday, March 23, 2009
Bachelier vs. Black
Black-Scholes gives a strange result for the price of a binary option under high volatility. You will learn here how to simulate a stock price evolution using Java, and how to display it using the JFreeChart library. It starts with the more complex concepts (don't be afraid) and goes down towards simpler things.
I could not write all that in a blog format, so I created a plain old HTML page about it here, plus a PDF version.
Thursday, March 19, 2009
Linux Audio State = Miserable
Rhythmbox does some crossfade, but crashes when you seek manually within a song. Audacious does some crossfade, but regularly crashes with its crossfade plugin.
The real alternatives are AIMP2 or Foobar2000 under Wine. It is quite incredible that you can have good, solid crossfade in Wine and not natively on Linux.
Maybe people spent too much time on the useless PulseAudio (I have far fewer issues using only ALSA).
Tuesday, February 10, 2009
Senior Developers Team Productivity X4 (from MS Research Paper)
"TDD seems to be applicable in various domains and can significantly reduce the defect density of developed software without significant productivity reduction of the development team"Their data gives also other interesting results:
- An experienced team (5 people over 10 years + 2 people under 5 years) : 155KLOC C# code (+60 test).
- A junior team (3 people under 10 years + 6 people under 5 years): 41 KLOC Java code (+28 test).
I know this is very far from scientific evidence and more like astrology, but still, the most conservative ratio for senior/junior is 4.23!
Thursday, January 15, 2009
The End Of Rings Around Plain Java - A Better Concurrency Test
His suggestion was simply to send N start messages, where N >= the number of processors. In theory, performance should then become optimal with N threads. Unfortunately this is not what happens in real life: the single-threaded version is still faster even if you send 16 messages on a biprocessor machine.
public static void main(String[] args) throws Exception {
    OptimizedRing ring = new OptimizedRing();
    RingNode node = ring.startRing(Integer.parseInt(args[0]));
    node.sendMessage(new StartMessage());
    node.sendMessage(new TokenMessage(node.nodeId, 1));
    node.sendMessage(new TokenMessage(node.nodeId, 1));
    node.sendMessage(new TokenMessage(node.nodeId, 1));
    ring.executor.awaitTermination(10, TimeUnit.MINUTES);
}
My idea was that it was related to the overhead of switching from thread to thread, which is precisely what I think the original author of the test had in mind. I am not 100% convinced that is really what happens. I wanted a test that would actually be faster using N threads, so I decided to add a bit of computation before processing each token. Unfortunately, I had the bad idea of computing Pi by the Monte Carlo method to do it. Running my tests, I was surprised that it did not change the results, and that the more compute-intensive the work was (increasing the number of Monte Carlo iterations), the worse things got. It scared me a bit, wondering what the hell could be wrong there. The following class performs much worse with 2 threads than with 1:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BadParallelPi {
private static void startExecutors() throws Exception {
long startTime = System.currentTimeMillis();
System.out.println(startTime);
ExecutorService executor1 = Executors.newFixedThreadPool(1);
executor1.execute(new Computation());
executor1.execute(new Computation());
executor1.shutdown();
executor1.awaitTermination(60, TimeUnit.SECONDS);
long delay = System.currentTimeMillis() - startTime;
System.out.println("finished single thread in "+(delay/1000.0));
startTime = System.currentTimeMillis();
System.out.println(startTime);
executor1 = Executors.newFixedThreadPool(2);
executor1.execute(new Computation());
executor1.execute(new Computation());
executor1.shutdown();
executor1.awaitTermination(60, TimeUnit.SECONDS);
delay = System.currentTimeMillis() - startTime;
System.out.println("finished 2 threads in "+(delay/1000.0));
}
public static class Computation implements Runnable {
public volatile int count = 0;
private double computePi() {
double pi = 0;
double x,y;
int n = 10000000;
for (int i=0;i<n;i++) {
x = Math.random();
x *= x;
y = Math.random();
y *= y;
if (x+y < 1) {
pi +=1;
}
}
pi = 4*pi/n;
return pi;
}
public void run() {
double pi = computePi();
long time = System.currentTimeMillis();
System.out.println(time+" thread "+Thread.currentThread().getId()+" pi="+pi);
count++;
}
}
public static void main(String[] args) throws Exception {
startExecutors();
}
}
Did you figure out why?
It took me less time with this simple code than with the original ring test to find out why. It is simply the Math.random() call: Math.random uses one single shared random number generator, so every thread waits on the others at that point. Creating one random generator per thread showed 2 threads to be much faster than 1, finally.
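Here is a sketch of the fix, as a drop-in replacement for the Computation class above: give each instance its own java.util.Random (on newer JVMs, ThreadLocalRandom would do the same job):

public static class Computation implements Runnable {
    // one generator per Computation instance: no cross-thread contention
    private final java.util.Random rng = new java.util.Random();

    private double computePi() {
        double pi = 0;
        double x, y;
        int n = 10000000;
        for (int i = 0; i < n; i++) {
            x = rng.nextDouble(); // was Math.random(): one generator shared by all threads
            x *= x;
            y = rng.nextDouble();
            y *= y;
            if (x + y < 1) {
                pi += 1;
            }
        }
        return 4 * pi / n;
    }

    public void run() {
        double pi = computePi();
        System.out.println(System.currentTimeMillis() + " thread "
                + Thread.currentThread().getId() + " pi=" + pi);
    }
}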
Back to the original ring test. After adding the correct way to compute Pi by Monte Carlo, I finally had decent test results, as long as the number of iterations is not too small: 10 iterations are enough to show a real difference between N threads and 1. Adding a small computation helps to figure out what happens behind the scenes. You can also verify D. Andreou's claim: with only 1 start message, the single-threaded version is faster. If the computation is too weak (for example a Monte Carlo iteration count of 0), one only measures method calls between threads (context switching), which is obviously optimal for 1 thread. Measuring Actor libraries on such a test is dangerous: if I write a single-threaded Actor library, it will be the fastest on this test, but it is certainly not what you want to use as an Actor library.
Let's see now how Scala fares compared to the Plain Java solution, using computation:
| Machine   | Algorithm                | Time for 100000 ring count, 10 mc, 4 messages | Time for 10000 ring count, 100 mc, 4 messages |
|-----------|--------------------------|------|-----|
| Core2Duo  | OptimizedRing 2 Threads  | 57s  | 37s |
| Core2Duo  | OptimizedRing 4 Threads  | 78s  | 39s |
| Core2Duo  | Scala Actors             | 82s  | 47s |
| Core2Duo  | SimpleRing (100 Threads) | 137s | 58s |
| Core2Duo  | OptimizedRing 1 Thread   | 89s  | 71s |
| Core2Quad | OptimizedRing 4 Threads  | 81s  | 25s |
| Core2Quad | Scala Actors             | 71s  | 30s |
| Core2Quad | OptimizedRing 2 Threads  | 61s  | 43s |
| Core2Quad | OptimizedRing 1 Thread   | 100s | 80s |
The Core2Quad is Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
It is interesting to compare the results of 4 threads on a biprocessor with a Monte Carlo count of 10 versus 100. We see a much higher thread overhead when there is less computation: with too little work per token, the overhead of 4 threads over 2 concurrent threads dominates. This explains why the very simple threading architecture fares much better in the last column than in the previous one.
Scala Actors fare much better when not hindered by the creation of too many threads. They actually seem very good at abstracting multithreading intricacies, while still providing near-Java performance in the real world, where each actor does enough computation and multithreading is important.