Archive for February, 2008

No need to support Safari 2 any more

Thursday, February 28th, 2008

I am probably one of the rare type of Mac owners who did neither of the following:

  • Bought Mac OS X 10.5 (including Safari 3)
  • Upgraded to Safari 3 by downloading it for free from Apple’s website

Therefore I was still using Safari 2 on my Mac.

A few nights ago, Apple’s software-update program pushed Safari 3 to my computer.

Although I could have deselected that download, I assume that the sort of people who would deselect an update that are the sort of people who care about their computer, and those are probably the sort of people who’ve already got Safari 3 by one of the above methods.

Therefore I assume nearly nobody is using Safari 2 any more, and thus see no need why it need be explicitly supported for future web projects.

Monitor troubles

Wednesday, February 27th, 2008

The monitor I am using at work suddenly went white. I.e. every pixel went white; not black, and not blue. A white screen of death, as it were.

I was in the middle of programming an algorithm requiring some degree of thought, so I wasn’t really up for being interrupted by unreliable technology.

The monitor a 22″ wide-screen flat monitor. My initial suspicion obviously lay with Windows.

For some reason, some instinct told me to turn the monitor off and on. I pressed the on/off button but amazingly nothing happened, and the “on” light stayed on. The on/off button is hidden in some inconvenient place, and one can’t see it unless one gets up and peers round the side of the monitor. It has no real tactile feedback when pressed, so initially I assumed, when the button did nothing, that I’d pressed it wrongly.

But getting up and looking at the button and pressing it a few more times confirmed that it was in fact doing nothing.

Again, some instinct told me to hold the button down for 5 seconds. Amazingly, that did turn the monitor off.

I had managed to crash my monitor !

To quote my friend Robin:

As life goes on, one owns more and more devices which need rebooting.

Slowest ever response to a job application?

Tuesday, February 19th, 2008

I suddenly got an email out of the blue today:

Dear Sir / Madam:

We have received your application, an interview will be arranged for shortlisted candidates. Thank you for your interest to join our team!

Human Resources

Initially I thought it was spam (although the normally good Gmail spam system hadn’t marked it as spam), then I saw the company name, and saw they were based in Macau.

A quick search for that company’s name in my email found one outgoing message to them, applying for a job, dated December 8th, 2006! Today is February 19th, 2008.

Java gotcha: anArray.hashCode isn’t deep

Thursday, February 14th, 2008

Every object has a hashCode and an equals method. These are used to determine where to place an object within a hashing algorithm, and if two objects with the same place in the hashing algorithm actually are the same, respectively. If you want to add objects to a Set—which stores only unique objects—it uses these methods to determine whether two objects are the same and thus shouldn’t both be stored.

If you have code like:

Set<byte[]> uniqueArrays = new HashSet<byte[]>();
uniqueArrays.add(new byte[] { 1,2,3 });
uniqueArrays.add(new byte[] { 1,2,3 });
uniqueArrays.add(new byte[] { 1,2 });
System.out.println(uniqueArrays.size() + " unique byte arrays");

This code prints 3. You might expect this program to print 2, as there are only two unique arrays within the Set. But arrays’ hashCode methods do not return the same result for two different arrays with the same contents. This is in contrast to, for example, the String class, which does indeed consider the String’s contents when computing the hashCode.

Set<String> uniqueStrings = new HashSet<String>();
uniqueStrings.add(new String("123"));
uniqueStrings.add(new String("123"));
uniqueStrings.add(new String("12"));
System.out.println(uniqueStrings.size() + " unique strings");

This code prints 2. (The slightly strange-looking “new String” here is to make sure that there are actually different object instances with the same content being passed to the add method; otherwise the Java compiler would use the same object instance for the two calls, as the string-content is the same.)

The solution is to use the Arrays.hashCode(anArray) method.

This isn’t particularly convenient if you want to store unique arrays in a set. But if you have an object with e.g. a byte[] instance variable, then you can implement the hashCode method on that object to use Arrays.hashCode, or you can use the code:

Map<Integer, byte[]> map = new HashMap<Integer, byte[]>();
map.put(Arrays.hashCode(anArray), anArray);
Collection<byte[]> uniqueByteArrays = map.values();

The importance of remembering how to spell your name

Tuesday, February 12th, 2008

There was a discussion on Slashdot about corporate ethics and system administrators reading other people’s emails. That reminded me of the following story:

A friend of a friend was working in IT as a Windows administrator. He was called to fix some boss’ computer, who then went out to lunch leaving the friend alone with the computer. The friend happened to see a mail on the boss’ computer that he found interesting, so he forwarded it to himself.

This is surely a bad thing to do, and the end of the story is that he got fired, but he probably would have got away with it apart from the mistake he made…

He managed to spell his own name wrong in his email address. So when the the boss got back from lunch, there was the bounce mail waiting for him in his inbox, with the friend’s misspelt name…

Creating an Iterator for a streaming ResultSet in Java

Monday, February 11th, 2008

The Java Iterator interface requires one implements a hasNext method, to determine if the current item is the last to be iterated over, or not. The MySQL driver’s implementation of the JDBC ResultSet object, if one uses streaming mode throws an exception from its isLast method. (Streaming mode prevents the JVM from running out of memory, which it would do if it tried to fetch all the results at once.)

Therefore I’ve developed an Iterator class based on such a ResultSet whose “next” method actually pre-fetches the row after the current one. The Iterator’s “hasNext” method therefore just returns if the row was created or not. And the “next” method returns the pre-fetched one, and fetches the next one.

And in order to make this code reusable, it’s an abstract superclass, and you can implement a method in a concrete subclass which converts the row into an object of your choosing. And thus the concrete subclass will provide an implementation of Iterator<T> for your T.

And to make this code reusable to people other than me, I hereby make it available.

ResultSetIterator.java

Reading row-by-row into Java from MySQL

Thursday, February 7th, 2008

Trying to read a large amount of data from MySQL using Java using one query is not as easy as one might think.

I want to read the results of the query a chunk at a time. If I read it all at once, the JVM understandably runs out of memory. In this case I am stuffing all the resulting data into a Lucene index, but the same would apply if I was writing the data out to a file, another database, etc.

Naively, I assumed that this would just work by default. My initial program looked like this (I’ve left out certain things such as closing the PreparedStatement):

public void processBigTable() {
    PreparedStatement stat = connection.prepareStatement(
        "SELECT * FROM big_table");
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

Failed with the following error:

Exception in thread "main"
        java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2823)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763)
    ...
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1657)
    ...

The line it failed at was the exceuteQuery. So as we can see from the stack backtrace, it’s clearly trying to load all the results into memory simultaneously.

I tried all sorts of things but it was only after I took at the MySQL JDBC driver code did I find the answer. In StatementImpl.java:

protected boolean createStreamingResultSet() {
    return ((resultSetType == ResultSet.TYPE_FORWARD_ONLY)
        && (resultSetConcurrency == ResultSet.CONCUR_READ_ONLY)
        && (fetchSize == Integer.MIN_VALUE));
}

This boolean function determines if it’s going to use the approach “read all data first” or “read rows a few at a time” (= “streaming” in their terminology). I clearly need the latter.

You can specify, using the generic JDBC API, the number of rows you want to fetch at once (the “fetchSize”). Why would you have to set that to Integer.MIN_VALUE, which is stated to be −231, in order to get streaming data? I wouldn’t have guessed that.

Basically this very important decision about which approach to use, which in my case amounts to “program works” or “program crashes”, is left to test whether three variables are set to various values. I am not aware if this is in the documentation (I didn’t find it), nor if this decision is guaranteed to be stable, i.e. won’t change in some future driver version.

Now my code looks like the following:

public void processBigTable() {
    PreparedStatement stat = c.prepareStatement(
        "SELECT * FROM big_table",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
    stat.setFetchSize(Integer.MIN_VALUE);
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

This code works, and reads chunks of rows at a time.

Well I’m not sure if it reads chunks of rows at a time, or just one row at a time. I hope it doesn’t read one row at a time, because that would be very inefficient in terms of number of round trips from the software to the database. I assumed this was what the fetchSize parameter was controlling, so you could tune the size of the chunks to meet your particular latency and memory setup. But being forced to set it to a large negative number in order to get it to work means one has no control over the size of the chunks (as far as I can see).

(I am using Java 6 with MySQL 5.0 and the JDBC driver “MySQL Connector” 5.1.15.)

Which log levels to use when?

Tuesday, February 5th, 2008

I’m sure there are a lot of opinions in the world about which log levels to use for which errors. Log levels in the sense of if a text destined for a logfile should be prefixed with “Info”, “Warning”, “Error” etc.

There is even great debate about which log levels there should be. E.g. should there be a “Debug” level or a “Trace” level? Or both? In which case what’s the difference?

I tended to ignore those debates—for me a log statement was simply a log statement—but recently I was deploying an application, and various people had worked on it and used log levels inconsistently. Hardly a surprise as I’d never set any guidelines on how they should be used, due to aforestated ignoring. So I started to think about how I would have wanted the log levels to have been in the application I was at that moment deploying.

Firstly, what does a log level actually influence?

  • Different texts are stated in the log file. The only differences this makes to anyone is that operations teams, not too familiar with the software, tend to understandable freak out when one has lots of ERRORs in the log file; whereas they tend to freak out less with lots of INFOs.
  • You can do monitoring based on log files. E.g. one can create assertions which can be monitored, such as “INFOs may be in the log file, but ERRORs may not.”
  • You can set a log level and state that errors below a certain level will not be displayed, e.g. “ERRORs should be displayed but WARNINGs should not”. So the log level of an error may influence whether it gets logged at all.

And how would those differences influence ones decision about which level to make a particular log?

I came up with the following scheme for me:

TRACE For things that should happen, but one doesn’t want to see live. (For example SQL statement logs, etc.)
INFO For things that should happen, but one does want to log live. (For example, “writing file to /x/y.pdf…”) It’s important to state IDs and paths in log files, for if something goes wrong, one needs to be able to work out exactly what the robot did with what rows and objects. And if the robot crashes, a log, written before an action is done, such as “Writing PDF file…” can help to identify what part of the process caused the error.
WARNING   For things that shouldn’t happen, but where the user sees the correct result. For example files that should always exist, but if they don’t, they can be automatically recreated. If those happen live, one might want to look at them. But as long as the user sees the right thing, one can look at them in ones own time.
ERROR For things that shouldn’t happen, and where the user sees an incorrect result. Including “fatal” errors.

And in that case, one would monitor ERRORs (or maybe WARNINGs if one was otherwise bored), and would set the log level to TRACE on all test servers and INFO on all live servers.