Archive for the ‘Java’ Category

Code generation? Don’t generate to Java

Wednesday, February 10th, 2010

I tried to write a program using Java; it all seemed to be going well but then I hit a ridiculous limit. Java cannot be used for this type of problem. I have now completely re-written it in a different programming language, and that works fine.

Be aware of this limit. I was unaware of it when I started this project. But it makes Java completely unsuitable for a whole class of problem.

My customer supplies me with a config file from time to time, this specifies a certain algorithm. When the user enters data, this algorithm must be applied. The algorithm is complex, so performance is an issue.

The solution I chose was to generate code to execute the algorithm, based on the information in the config file. This is a valid computer-science approach, and is used for similar problems. For example, language parsers are often expressed as a grammar, and code to parse documents in the grammar are generated. JSPs are turned into Java classes which are then compiled and executed. WebTek pre-compiles HTML templates containing macros into code which produces the resulting HTML when executed.

However, don’t try this in Java, unless you are only working with small problems. A single method in Java can only be 64KB in size, once compiled.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4262078

This means, JSPs can only be of a certain length, parsers can only parse languages of a certain complexity, if WebTek were written in Java then templates could only be of a certain length and complexity, and so on. Do you want to place such restrictions on the software you produce?

My specific problem involves simulating one million variations to a particular solution. How can I fit that into 64K?

  • That is 0.06 bytes per solution variation; yet the simulation of a single variation involves many lines of code (i.e. in total compiling to more than 0.06 bytes!).
  • I could put each variation into its own method, and have a big method which calls them all—but a method call takes more than 0.06 bytes!
  • I could have a hierarchy of methods: one main method which calls, say, 100 sub-methods, each of those call 100 sub-sub-methods, and finally those methods call the methods for the individual variations.

It’s not even possible to know how many bytes a method will generate to! So, as the complexity of the simulation of a variation is expressed in the config file, I would have to essentially have to do a “trial and error” approach: generate a method, compile it, if I get the error concerning the 64KB limit, split the problem up into slightly smaller methods, try the compilation again, repeat, etc. (And the Java compiler is not even very fast.)

This is all so wrong! This is complexity, which isn’t solving the customer’s problem. This complexity costs me time (and thus my customer money), complexity leads to bugs and difficulty of maintenance, etc.

So I have changed the language. Rather than generate Java, I generate C and compile it using the GNU gcc compiler. From the GNU coding standards:

Avoid arbitrary limits on the length or number of any data structure, including file names, lines, files, and symbols, by allocating all data structures dynamically.

This is a good standard! I like it. All programs should be written with this in mind. Your program may well be online in 10 or 20 years, and the hardware may well have changed: a 64KB limit may seem reasonable one year but is a real limitation 10 or 20 years later in software which would otherwise still be useful.

So, if you are solving this type of problem, don’t use Java.

P.S. On a separate project I used a similar approach using Perl, and that worked out fine too.

Java: Always explicitly specify which XML parser to use

Tuesday, September 15th, 2009

There is the following design error in Java (at least in Servlets):

  1. A server may serve multiple applications; each application may use different libraries or even different versions of the same library, “side by side”.
  2. XML parsers, transformers (XSLT), etc., have a standard interface, and there may be different implementations of this interface from different vendors, open-source projects, etc.
  3. Which XML parser, transformed etc. is actually used depends on a global system variable.

And it’s point 3 that’s the problem really. Points 1 and 2 are debatable: they certainly bring advantages, but they certainly bring complexity too.

I just had the problem that one of my web applications stopped working, but only intermittently. Restarting the server led to everything being OK, but later things would not be OK. I do hate environments where everything appears to work, yet in fact doesn’t. I mean how do you know when you’re “done” in such an environment? (Or how do you even know you are in such an environment?)

The bug was caused by:

  1. Application one used the default XML parser, and didn’t have any extra JARs (libraries) for reading XML
  2. Application two required a special XML parser, set the global variable so it would be used, and included the JARs necessary for the special XML parser

So when a request came to application 1, after a request had come to application 2, then the system would try to instantiate the special XML parser within application 1 (specified in the global variable set by application 2), but wouldn’t find it, as it wasn’t deployed in application 1 (and applications can’t use one another’s libraries, due to feature #1).

This seems obvious when one describes it, but looking at the logs, on a live server, with the system down and the clock ticking? – Far from obvious.

So now, I assert, every time you want to create an XML parser, do the following:

If you require a special XML library, use:

System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
    "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
...

If you require the standard XML library, use:

Properties systemProperties = System.getProperties();
systemProperties.remove("javax.xml.parsers.DocumentBuilderFactory");
System.setProperties(systemProperties);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
...

There is also the possibility to pass a parameter to DocumentBuilderFactory to specify which XML parser technology to use. That’s a good option too, as it wouldn’t “corrupt” this global variable (“system property”). However I think one should be defensive, and always delete the global variable if one wishes to use the standard XML parser, and therefore it doesn’t matter if this global variable gets corrupted or not.

Never do the following:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

This simply relies on whichever XML parser is currently set in the global variable. You have no way to guarantee that some other application running on the same server won’t set the global variable to use an XML parser you don’t have installed in your application. Even if you have control of the server and all applications, you don’t know what software you’ll be writing in the future. (In this case I installed a new application to a server which’d been running fine for 1 year, but due to setting the global variable, the old application broke..)

The same applies for all those other “factory” situations such as TransformerFactory.newInstance() etc.

This feels all quite inelegant to me, and has just cost me a lot of time, and it’s not as if I’m so new to programming Java. I am wondering if there is a better way to approach it? Or is Java just broken in this particular respect?

P.S. This is not the only thing that went wrong with the old application today. I upgraded from Java 5 to Java 6 and suddenly some XML was not compliant against a schema according to Java – I had hit this error.

Starting Jetty: FAILED

Sunday, July 12th, 2009

It is literally 23:19 on a Sunday and I’ve been working through the weekend to get a release out of some software I’m working on.

The Java application webserver (Jetty) was taking a long time to restart each time I did a change, so for some reason I thought I’d experiment with some new command-line options. Probably not the right time to do that.

Normally I would type

$ sudo  /etc/init.d/jetty6 restart
Stopping Jetty: OK
Starting Jetty: OK

and everything would be good. I tried typing

$ sudo /etc/init.d/jetty6 supervise

Then some stuff happened that I didn’t really understand. Rather than try and work out what it did I tried to restart it again using the old restart mechanism

$ sudo  /etc/init.d/jetty6 restart
...
Starting Jetty: FAILED

OK I mean that was basically what was going on, it just wrote FAILED. How helpful! There was no info in the logfile. I searched Google but didn’t come up with anything.

A reboot later, and about half an hour of looking into /etc/init.d/jetty6 with vi and randomly making changes and printing more stuff out yielded the fact that the “supervise” command had evidently run Jetty “as me” and not as the “jetty” user. So when the normal “restart” command came along and tried to run the program as “jetty” then there were files it couldn’t write to.

Solution:

$ sudo chown jetty /var/log/jetty6/2009_07_12.stderrout.log

My favourite Hibernate error

Wednesday, June 10th, 2009

… is this one. I’ve wasted many an hour searching for the cause of this. And it’s one you’re likely to run into pretty quickly when you try to write your first Hibernate configuration file.

The XML

<one-to-many type=”OtherClass”/>

delivers the error

Error parsing XML: Attribute “type” must be declared for element type “one-to-many”.

This looks like a perfectly self-explanatory error, however looking at the file, the element does have a “type” attribute. What should one do?

Thinking about it, I only just introduced the “type” attribute to the <one-to-many> element in my config. What happens if I change the attribute name to “fsdjkfdk”?

<one-to-many fsdjkfdk=”OtherClass”/>

The error is now:

Error parsing XML: Attribute “fsdjkfdk” must be declared for element type “one-to-many”.

What the error means is that the attribute must not be declared, as opposed to must.

It’s amusing to read even people on the Hibernate team get confused by this error, and can’t find a solution.

(Hibernate 3.3.1 – the most current version – although I encountered this error within the first hour of ever using Hibernate in Q1/2006.)

Java really delivers “write once, run anywhere”

Wednesday, May 6th, 2009

Java’s slogan was “write once, run anywhere”. They received a certain amount of criticism but I have to say that compared to other programming languages it’s really true. You can use it for:

  • Background jobs (without user-interface)
  • Server-side web applications (many web servers & web frameworks available)
  • In the web browser (applets)
  • In the web browser (translation from Java to Javascript by GWT)
  • On the desktop (using platform-independent Swing, or with native Apple UI using Cocoa)
  • On some mobile phones (J2ME, Google Android) – although I haven’t tested how good that really works?

Writing the previous version of a certain application website in Perl, there was no easy way to give the customer a “tool” to test out new versions of the configuration file. These files would normally be installed on the server, were multiple megabytes in size, and the Perl would parse and use them. For testing, it was not ideal to have to upload potential new files to the test server, due to their size.

The new version is in Java and also takes a configuration file, but I have written a Swing (desktop) tool which simply allows the tester to select a new potential configuration file from their local hard disk, and the desktop tool reuses all the processing logic 1:1 that the web server in production would use.

That wouldn’t have been possible with the old version of the logic written in Perl. (I know there are windowing libraries for languages like Perl but they are hardly as easy to deploy – i.e. install on a Windows workstation – as a Java application – simply double-click the .jar file once Java is installed)

I am writing the web front-end for the new version in GWT so I can reuse certain (mainly user validation) code between the web browser (giving the user instant feedback in case of errors) and the web server (necessary for security in case someone bypasses the client and sends HTTP requests directly.) And simply pass Java objects between the web server and the web client, without having to worry about how that transfer works (JSON, XML, etc.)

Other mainstream candidates for languages which run on multiple places:

  • Objective-C is not too bad, running on Apple desktops, Apple iPhones, and on the web server via WebObjects (does the current version of WO still use Objective-C or is it Java-only these days?) – but not in the browser
  • Javascript runs on the web browser and server, and no doubt mobile browsers (but not desktops as far as I know)
  • Perhaps C#? Certainly good desktop, web server integration, no doubt IE/Windows integration via ActiveX

Jetty “null127.0.0.1″ error

Thursday, November 20th, 2008

I have just spent an hour debugging the following problem. Suddenly my Jetty Java web server didn’t work any more. Starting it claimed:

Starting Jetty: OK

Yet connecting to the port (telnet localhost 8080) showed that the port was not open. Looking in the logfile (/var/log/jetty6) showed the following error:

2008-11-20 12:08:47.477::WARN:
  failed SelectChannelConnector@null127.0.0.1:8080
...
Caused by: java.nio.channels.UnresolvedAddressException

That meant that the web server was unable to open the socket on port 8080.

Googling for this error showed only 1 result which lead to a 404. Javadocs and Jetty docs did not mention any problem like that.

Thankfully I had a working system to compare it with. The working system wrote in its logfile:

2008-11-03 13:34:41.283::INFO:
  Started SelectChannelConnector@0.0.0.0:8080

Notice the difference between 0.0.0.0 (works) and null127.0.0.1 (doesn’t work). It turned out that on the good system, the config file /etc/jetty6/jetty.xml had the line

<Set name="host"><SystemProperty name="jetty.host" /></Set>

and the non-working system had:

<Set name="host"><SystemProperty name="jetty.host" />127.0.0.1</Set>

I have absolutely no idea how this non-working line came into the config. I didn’t change it, and the operations department surely didn’t change it either. I have absolutely no idea.

Why is Java so difficult !?

Jetty 6.1.11, Debian Etch, Linux 2.6.24, Sun Java 1.5

Automatic reconnect from Hibernate to MySQL

Friday, October 24th, 2008

Yesterday I spent the entire day getting the following amazing state-of-the-art not-ever-done-before feature to work:

  • Executing a SQL statement from my program

Because, as everyone knows, I don’t suffer from NIHS, I used standard object-relational mapping software Hibernate, with a standard programming language Java, using the standard web-application server Tomcat, and now I am using the standard “connection pooling” software C3P0 (which I didn’t know I needed to execute a SQL statement, see below..)

The program is, in fact, already completed, and is nearly deployed. On the test server it works fine and even on the (future) live server it worked fine. But the customer noticed that if one installed it one day, the next day it didn’t work. I’ve had such symptoms many times before, so I know immediately what was going on:

  • MySQL drops a connection after 8 hours (configurable)
  • The software is used during the day, but isn’t used during the night, therefore the connection times out in the night
  • Therefore in the morning, the program one installed the day before no longer works

Perhaps I exaggerated the simplicity above of what I was really trying to achieve. It should really be expressed as the following:

  • Executing a SQL statement from my program, even if a long time has passed since the last one was executed

But that amounts to the same thing in my opinion! It isn’t rocket science! (But in fact is, see below..)

A obvious non-solution is to increase the “connection drop after” time on the MySQL server from 8 hours to e.g. “2 weeks” (“wait_timeout” in “mysql.cnf”). But software has got to be capable of reconnecting after a connection drops. The database server may need to be reset, it may crash, it may suffer hardware failure, etc. If, every time one restarts one particular service, one has to restart a thousand dependent services (maybe some Java, some Perl, some PHP, some robots, ..) and then maybe restart services which are dependent on them – that’s a maintenance nightmare. So the software has to be altered to be able to handle connection drops automatically, by reconnecting. Once the software has been so altered, one no longer needs to alter the “wait_timeout” on the server.

The error was:

org.hibernate.util.JDBCExceptionReporter: The last packet successfully received from the server was 56697 seconds ago. The last packet sent successfully to the server was 56697 seconds ago, which  is longer than the server configured value of ‘wait_timeout’. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property ‘autoReconnect=true’ to avoid this problem.

Quite a helpful error message, don’t you think? But

  • I’m not going to increase “wait_timeout” as discussed above,
  • “testing validity” in the application – well I was using standard software Hibernate which should take care of this sort of thing automatically, but evidently wasn’t
  • and we were already using ?autoReconnect=true in the JDBC URL (this evidently wasn’t working).

I figured I really needed to get to the bottom of this. Googling just showed (many) people with the same problem, but no solutions. The only way to get to the bottom of software is to read the source. (It has been the way to resolve issues of simple things simply not working in MySQL before.)

I stopped looking in the MySQL source for why “autoReconnect=true” didn’t work when I saw the following text in the source describing the autoReconnect parameter:

The use of this feature is not recommended, because it has side effects related to session state and data consistency

I have no idea what particular side-effects are meant here? I guess that’s left as an exercise for the reader, to test their imagination.

And anyway, I figure that a reconnect-facility belongs in the “application” (Hibernate in my case) as opposed to in database-vendor specific code. I mean the exactly the same logic would be necessary if one were connecting to PostgreSQL or Oracle, so it doesn’t make sense to build it in to the database driver.

So then I looked in the Hibernate code. To cut a long story short, the basic connection mechanism of Hibernate (as specified in all the introductory books and websites, which is probably how most people learn Hibernate) doesn’t support reconnecting, one has to use H3C0 connection pool (which itself didn’t always support reconnecting)

(I don’t want to use container/Tomcat-managed connections, as I have some command-line robots which do some work, and I don’t want to use different code for the robots as the web application. Although another company defined Servlets which did “robot work”, and the robot was just a “wget” entered into Tomcat – to get the user of container-managed connections – but this seems a too-complex solution to my taste..

But once one’s used H3C0, the default behavior seems to be that to process a request, if the connection is dead then the user sees and error – but at least it reconnects for the next request. I suppose one error is better than infinite errors, but still not as good as zero errors. It turns out one needs the option testConnectionOnCheckout - which the documentation doesn’t recommend because testing the connection before a request might lead to lower performance. Surely the software firstly has to work, only secondly does it have to work fast.

So, to summarize, to get a connection to “work” (which I define as including handling dropped connections by reconnecting without error): In “hibernate.cfg.xml”:

<!-- hibernate.cfg.xml -->
<property name="c3p0.min_size">5</property>
<property name="c3p0.max_size">20</property>
<property name="c3p0.timeout">1800</property>
<property name="c3p0.max_statements">50</property>
<property name="connection.provider_class">
   org.hibernate.connection.C3P0ConnectionProvider</property>
<!-- no "connection.pool_size" entry! -->

Then create a file “c3p0.properties” which must be in the root of the classpath (i.e. no way to override it for particular parts of the application):

# c3p0.properties
c3p0.testConnectionOnCheckout=true

Amazing, that that stuff doesn’t just work out of the box. Programming the solution myself in Uboot took, I think, 1 line, and I’m sure it’s not more in WebTek either.

That was an amazing amount of effort and research to get the simplest thing to work. Now if only this project had been paid by the hour…..

[Update 28 May 2009] More Java hate today. Starting a new application, deployed it, and it didn’t work. In the morning, the application was down. Reason: The new project used Hibernate 3.3, and upgrade from 3.2 to 3.3 requires the “connection.provider_class” property to be set. Previously the presence of “c3p0.max_size” was enough.

Java gotcha: anArray.hashCode isn’t deep

Thursday, February 14th, 2008

Every object has a hashCode and an equals method. These are used to determine where to place an object within a hashing algorithm, and if two objects with the same place in the hashing algorithm actually are the same, respectively. If you want to add objects to a Set—which stores only unique objects—it uses these methods to determine whether two objects are the same and thus shouldn’t both be stored.

If you have code like:

Set<byte[]> uniqueArrays = new HashSet<byte[]>();
uniqueArrays.add(new byte[] { 1,2,3 });
uniqueArrays.add(new byte[] { 1,2,3 });
uniqueArrays.add(new byte[] { 1,2 });
System.out.println(uniqueArrays.size() + " unique byte arrays");

This code prints 3. You might expect this program to print 2, as there are only two unique arrays within the Set. But arrays’ hashCode methods do not return the same result for two different arrays with the same contents. This is in contrast to, for example, the String class, which does indeed consider the String’s contents when computing the hashCode.

Set<String> uniqueStrings = new HashSet<String>();
uniqueStrings.add(new String("123"));
uniqueStrings.add(new String("123"));
uniqueStrings.add(new String("12"));
System.out.println(uniqueStrings.size() + " unique strings");

This code prints 2. (The slightly strange-looking “new String” here is to make sure that there are actually different object instances with the same content being passed to the add method; otherwise the Java compiler would use the same object instance for the two calls, as the string-content is the same.)

The solution is to use the Arrays.hashCode(anArray) method.

This isn’t particularly convenient if you want to store unique arrays in a set. But if you have an object with e.g. a byte[] instance variable, then you can implement the hashCode method on that object to use Arrays.hashCode, or you can use the code:

Map<Integer, byte[]> map = new HashMap<Integer, byte[]>();
map.put(Arrays.hashCode(anArray), anArray);
Collection<byte[]> uniqueByteArrays = map.values();

Note: This only applies to arrays. Lists and Sets do consider their entries when computing their hashCodes, and thus can usefully be used as they keys of Sets.

Creating an Iterator for a streaming ResultSet in Java

Monday, February 11th, 2008

The Java Iterator interface requires one implements a hasNext method, to determine if the current item is the last to be iterated over, or not. The MySQL driver’s implementation of the JDBC ResultSet object, if one uses streaming mode throws an exception from its isLast method. (Streaming mode prevents the JVM from running out of memory, which it would do if it tried to fetch all the results at once.)

Therefore I’ve developed an Iterator class based on such a ResultSet whose “next” method actually pre-fetches the row after the current one. The Iterator’s “hasNext” method therefore just returns if the row was created or not. And the “next” method returns the pre-fetched one, and fetches the next one.

And in order to make this code reusable, it’s an abstract superclass, and you can implement a method in a concrete subclass which converts the row into an object of your choosing. And thus the concrete subclass will provide an implementation of Iterator<T> for your T.

And to make this code reusable to people other than me, I hereby make it available.

ResultSetIterator.java

Reading row-by-row into Java from MySQL

Thursday, February 7th, 2008

Trying to read a large amount of data from MySQL using Java using one query is not as easy as one might think.

I want to read the results of the query a chunk at a time. If I read it all at once, the JVM understandably runs out of memory. In this case I am stuffing all the resulting data into a Lucene index, but the same would apply if I was writing the data out to a file, another database, etc.

Naively, I assumed that this would just work by default. My initial program looked like this (I’ve left out certain things such as closing the PreparedStatement):

public void processBigTable() {
    PreparedStatement stat = connection.prepareStatement(
        "SELECT * FROM big_table");
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

Failed with the following error:

Exception in thread "main"
        java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2823)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763)
    ...
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1657)
    ...

The line it failed at was the exceuteQuery. So as we can see from the stack backtrace, it’s clearly trying to load all the results into memory simultaneously.

I tried all sorts of things but it was only after I took at the MySQL JDBC driver code did I find the answer. In StatementImpl.java:

protected boolean createStreamingResultSet() {
    return ((resultSetType == ResultSet.TYPE_FORWARD_ONLY)
        && (resultSetConcurrency == ResultSet.CONCUR_READ_ONLY)
        && (fetchSize == Integer.MIN_VALUE));
}

This boolean function determines if it’s going to use the approach “read all data first” or “read rows a few at a time” (= “streaming” in their terminology). I clearly need the latter.

You can specify, using the generic JDBC API, the number of rows you want to fetch at once (the “fetchSize”). Why would you have to set that to Integer.MIN_VALUE, which is stated to be −231, in order to get streaming data? I wouldn’t have guessed that.

Basically this very important decision about which approach to use, which in my case amounts to “program works” or “program crashes”, is left to test whether three variables are set to various values. I am not aware if this is in the documentation (I didn’t find it), nor if this decision is guaranteed to be stable, i.e. won’t change in some future driver version.

Now my code looks like the following:

public void processBigTable() {
    PreparedStatement stat = c.prepareStatement(
        "SELECT * FROM big_table",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
    stat.setFetchSize(Integer.MIN_VALUE);
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

This code works, and reads chunks of rows at a time.

Well I’m not sure if it reads chunks of rows at a time, or just one row at a time. I hope it doesn’t read one row at a time, because that would be very inefficient in terms of number of round trips from the software to the database. I assumed this was what the fetchSize parameter was controlling, so you could tune the size of the chunks to meet your particular latency and memory setup. But being forced to set it to a large negative number in order to get it to work means one has no control over the size of the chunks (as far as I can see).

(I am using Java 6 with MySQL 5.0 and the JDBC driver “MySQL Connector” 5.1.15.)