/etc/cron.daily scripts may not have dots in their name

I had placed a script in /etc/cron.daily on my Debian Lenny system, but it wasn’t being run (or at least it didn’t seem to be).

Two things I learned:

(1) Execute the following to see which scripts would get run:

run-parts --test /etc/cron.daily

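(2) In my case, the script was never executed because I had called it “xyz.sh”, as it was a shell script; the dot in the filename was the problem. By default, run-parts only runs files whose names consist entirely of letters, digits, underscores and hyphens. I removed the dot and now it runs.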
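The naming rule can be sketched as a small check. This assumes run-parts in its default mode (no --lsbsysinit or --regex option), where a name is only accepted if it consists of ASCII letters, digits, underscores and hyphens; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class RunPartsNameCheck {
    // Assumption: default run-parts mode; other modes accept different names.
    private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z0-9_-]+");

    // Hypothetical helper: true if the default naming rule would accept the file name.
    static boolean wouldBeConsidered(String fileName) {
        return VALID_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(wouldBeConsidered("xyz.sh")); // prints false
        System.out.println(wouldBeConsidered("xyz"));    // prints true
    }
}
```

This only mirrors the name check; the file also has to be executable, and "run-parts --test" remains the authoritative way to see what will run.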

Java is lacking a String “join” function

Between Java 1.0 and Java 1.3 (1996-2002 according to Wikipedia) String had no method to split a string into an array or a list; the closest thing was StringTokenizer.

In Java 1.4 the authors of Java saw fit to introduce a method to split strings:

String csvData = "field1,field2,field3";
String[] fields = csvData.split(",");

However, they did not introduce a method to “join” strings! Even in Java 7 there is no way to do this in the standard library, e.g. via a static String.join method (2002-now).

OK, I realize this is not “rocket science”, and I appreciate that it exists in various forms in various third-party libraries. Still, it’s something every program needs to do at some point, and it’s annoying to have to redefine it or think about it for each application.

For example, in one project I was working on in the last 6 months, such a function was created, and then had a bug! (OK but to be fair, that was not the only bug in the application!)

Come on, I mean this is a totally trivial function, totally necessary, and available in all scripting languages. Why is it still not in Java?!
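For reference, here is a minimal sketch of the kind of helper every project ends up writing. The class and method names are my own, not from any library. (Java 8 did later add a static String.join, which makes this unnecessary on newer versions.)

```java
import java.util.Arrays;
import java.util.Iterator;

public class JoinExample {
    // Hypothetical helper: concatenates the parts, placing the separator between them.
    static String join(String separator, Iterable<?> parts) {
        StringBuilder result = new StringBuilder();
        Iterator<?> it = parts.iterator();
        while (it.hasNext()) {
            result.append(it.next());
            if (it.hasNext()) {
                result.append(separator); // only between elements, never at the end
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String[] fields = "field1,field2,field3".split(",");
        System.out.println(join(",", Arrays.asList(fields))); // prints field1,field2,field3
    }
}
```

The usual bug in hand-written versions is a stray separator at the start or end, or a crash on an empty list; checking hasNext() after appending each element avoids both.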

(P.S. Want to see an “enterprise java” solution to this problem? Check out the number of methods on this class)

Java varargs: inconsistent behaviour if you pass an array

In Java 1.4 there was the function Arrays.asList. You could pass it an array and it would make a list out of it.

String[] myArray = new String[] { "foo", "bar" };
List myList = Arrays.asList(myArray);

In Java 1.5 this was retrofitted for varargs; you could simply pass elements to the function:

List<String> myList = Arrays.asList("foo", "bar");

I never really understood how that worked in a backwards-compatible way; I mean either the function takes an array of stuff, or it takes individual elements, surely?

It turns out that with the varargs syntax the caller is not forced to pass individual elements; the caller can instead pass an array of elements.

List<String> myList1 = Arrays.asList("foo", "bar");
List<String> myList2 = Arrays.asList(new String[] { "foo", "bar" });

The above two calls are identical, both return a List<String>.

But surely this is really dangerous? I mean Arrays.asList does not make any assumptions about what types of arguments it accepts; the list can be composed of any object.

How can it be certain that you want to have a List of Strings, and not a List containing a single element which is a String array? (An array is an object.)

To demonstrate this inconsistency:

String[] arr = new String[] { "foo", "bar" };
Arrays.asList(arr);            // returns List<String>
Arrays.asList(arr, arr);       // returns List<String[]>
Arrays.asList(arr, arr, arr);  // returns List<String[]>
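The difference is easiest to see from the list sizes. Below is a small sketch (class and method names are made up for illustration) that also shows the cast to Object, which is how you tell the compiler that you do want the array treated as one single element:

```java
import java.util.Arrays;
import java.util.List;

public class VarargsDemo {
    // The array itself becomes the varargs array: one list entry per string.
    static List<String> spread(String[] arr) {
        return Arrays.asList(arr);
    }

    // The cast to Object stops the array being used as the varargs array:
    // the list has exactly one entry, which is the array.
    static List<Object> wrapped(String[] arr) {
        return Arrays.asList((Object) arr);
    }

    public static void main(String[] args) {
        String[] arr = new String[] { "foo", "bar" };
        System.out.println(spread(arr).size());             // prints 2 (two strings)
        System.out.println(Arrays.asList(arr, arr).size()); // prints 2 (two arrays)
        System.out.println(wrapped(arr).size());            // prints 1 (one array)
    }
}
```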

foreach syntax

Most modern languages use very similar syntax inspired by C; but the features added since C are really non-standard! The “for-each” syntax annoys me particularly. I mean none of these is significantly better/worse than the others, but I program in all these languages (apart from C#) on a regular basis and I always have to think when typing in the line in order not to get the wrong syntax.

PHP foreach (list as element)
Perl foreach my element (list)
Java for (element : list)
JavaScript for (var element in list) (which actually iterates over the keys, not the values)
C# foreach (element in list)

For what it’s worth, I think “foreach” is nicer than “for” as it reads more like a sentence (the word “for” really makes no sense at all in that context); and about “in” vs. colon I’ve got no preference really.

NoSQL because SQL is too slow?

I just read this excellent article about performance problems being the reasoning for using NoSQL databases.

http://thoughts.j-davis.com/2010/03/07/scalability-and-the-relational-model/

However, I wonder if it’s even reasonable to search for solutions (NoSQL, etc.) to performance problems with the relational model.

I think as computers get faster and faster (CPUs, SSDs, more memory, …) the set of problems which are “too slow” for a particular technology (or mindset) gets smaller and smaller.

I used to have the pleasure of optimizing systems (based on SQL; some of our solutions involved leaving the relational model), e.g. a community with 4M users (uboot.com). Nowadays, though, I have no customer for whom even an open-source database installed on reasonably inexpensive hardware is insufficient.

Of course, that’s just my experience, and there certainly are many problems, companies, etc. today which require solving performance problems in the database. But I assert that a lot of the people proposing NoSQL as a solution to performance problems with SQL databases don’t have the performance problems in the first place; I mean not everyone is implementing Facebook, Google, …

Upgrade to Lenny, everything down :(

How annoying, I upgraded from Debian Etch (Apache 2.2.3-4) to Debian Lenny (Apache 2.2.9), and then my Subversion Server (over HTTPS) gave the following error when surfed to from Firefox, which worked fine before:

An error occurred during a connection to svn.example.com.
SSL received a record that exceeded the maximum permissible length.
(Error code: ssl_error_rx_record_too_long)

What does that mean!? There’s not a great deal of info on the web.

Fundamentally, the first thing to work out is that this error message means (or meant, in my case at least) that HTTP was being transmitted over the HTTPS port, i.e. it wasn’t valid HTTPS at all, thus the protocol error. This could be confirmed by surfing to http://…:443/ (i.e. not https://) and seeing that the content (the Subversion server in my case) was correct.

The question was: why? I had a bunch of sites in the “sites-enabled” directory, and another one of them (not my Subversion site!) had a

<VirtualHost *>

whereas it should have been

<VirtualHost *:80>

i.e. the port was missing. I’m not quite sure why it had that effect, as the request to the Subversion HTTPS URL did deliver the Subversion content, just not over HTTPS any more. But perhaps without the :80, it decided all ports should be subject to NameVirtualHost, and as that’s not possible with HTTPS, switched HTTPS off for all ports and all sites?

Nightmare ….

See also: http://stackoverflow.com/questions/119336/ssl-error-rx-record-too-long-and-apache-ssl

We took uboot.com online 10 years ago

At approx 6:30am on Monday 21st Feb 2000 my boss, my colleague and I took the first version of Uboot online.

It certainly didn’t have all the features it needed back then, it took us one extra week to add an address book to the messaging functionality, for example. And it didn’t have any photo sharing, video sharing, blogging, or the e-commerce functions that it has now.

It’s actually amazing how fast we developed the first version with a team of three software developers. I started at the company on 3rd Jan 2000 and completed two other projects before starting Uboot; the first commit was on 19th Jan 2000.

With a few exceptions, I have been involved with Uboot software development, sometimes more, sometimes less, since then.

“me” vs “you” in user-interface dialogs

Computers, if they are addressing the user, should address the user as “you”, not as “me”.

Computers need, from time to time, to address the user, for example “You have updated your setting successfully”.

Some programs use the word “you” to address the user, some use the word “me” on the grounds that the user is reading it, and to them, they are “me”.

However, using “me” to address someone is ridiculous! That’s as logical as a human using the word “I” or “me” to refer to the recipient of a piece of communication, “hey, do I fancy going to the pub?” on the grounds that, to the recipient, they are “I”.

“me” is the source of communication, “you” is the destination of communication; if a computer is communicating, the user is the recipient of the communication so should be called “you”.

Specifically, Gmail is the main culprit for this in my life: it lists conversations between “me, Joe”; having “you, Joe” or “Adrian, Joe” would be much better! Gmail even, on the same screen that it lists conversations between “me” and other people, says “You are using 25% of your 7000MB”, so it’s not even consistent!

Code generation? Don’t generate to Java

I tried to write a program using Java; it all seemed to be going well but then I hit a ridiculous limit. Java cannot be used for this type of problem. I have now completely re-written it in a different programming language, and that works fine.

Be aware of this limit. I was unaware of it when I started this project. But it makes Java completely unsuitable for a whole class of problem.

My customer supplies me with a config file from time to time, which specifies a certain algorithm. When the user enters data, this algorithm must be applied. The algorithm is complex, so performance is an issue.

The solution I chose was to generate code to execute the algorithm, based on the information in the config file. This is a valid computer-science approach, and is used for similar problems. For example, language parsers are often expressed as a grammar, and code to parse documents in the grammar is generated. JSPs are turned into Java classes which are then compiled and executed. WebTek pre-compiles HTML templates containing macros into code which produces the resulting HTML when executed.

However, don’t try this in Java, unless you are only working with small problems. A single method in Java can only be 64KB in size, once compiled.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4262078

This means JSPs can only be of a certain length, parsers can only parse languages of a certain complexity, and if WebTek were written in Java then templates could only be of a certain length and complexity, and so on. Do you want to place such restrictions on the software you produce?

My specific problem involves simulating one million variations to a particular solution. How can I fit that into 64K?

  • That is 0.06 bytes per solution variation; yet the simulation of a single variation involves many lines of code (i.e. in total compiling to more than 0.06 bytes!).
  • I could put each variation into its own method, and have a big method which calls them all—but a method call takes more than 0.06 bytes!
  • I could have a hierarchy of methods: one main method which calls, say, 100 sub-methods, each of which calls 100 sub-sub-methods, and finally those methods call the methods for the individual variations.
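To give an idea of what the hierarchy option means for the generator, here is a rough sketch. It is not the code from my project: the class name, the method naming scheme and the fan-out parameter are made up for illustration, and the variation bodies are left as empty placeholders.

```java
public class DispatchGenerator {
    // Hypothetical generator: emits Java source in which no method body
    // contains more than fanOut calls. Level 0 holds the variations; each
    // higher level calls up to fanOut methods of the level below.
    static String generate(int variations, int fanOut) {
        if (variations < 1 || fanOut < 2) {
            throw new IllegalArgumentException("need variations >= 1 and fanOut >= 2");
        }
        StringBuilder src = new StringBuilder("public class Generated {\n");
        for (int i = 0; i < variations; i++) {
            src.append("  static void level0_").append(i)
               .append("() { /* variation ").append(i).append(" */ }\n");
        }
        int count = variations; // number of methods on the current level
        int level = 0;
        while (count > 1) {
            int parents = (count + fanOut - 1) / fanOut; // round up
            for (int p = 0; p < parents; p++) {
                src.append("  static void level").append(level + 1).append('_').append(p).append("() {");
                int end = Math.min(count, (p + 1) * fanOut);
                for (int c = p * fanOut; c < end; c++) {
                    src.append(" level").append(level).append('_').append(c).append("();");
                }
                src.append(" }\n");
            }
            count = parents;
            level++;
        }
        src.append("  public static void run() { level").append(level).append("_0(); }\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(5, 2));
    }
}
```

Even this is not enough for a million variations: the class file format also limits a class to 65535 methods, so a real generator would have to spread the methods over several classes as well. And it still says nothing about whether each individual variation method stays under 64KB.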

It’s not even possible to know in advance how many bytes a method will compile to! So, as the complexity of the simulation of a variation is expressed in the config file, I would essentially have to take a “trial and error” approach: generate a method, compile it, and if I get the error concerning the 64KB limit, split the problem up into slightly smaller methods, try the compilation again, repeat, etc. (And the Java compiler is not even very fast.)

This is all so wrong! This is complexity, which isn’t solving the customer’s problem. This complexity costs me time (and thus my customer money), complexity leads to bugs and difficulty of maintenance, etc.

So I have changed the language. Rather than generate Java, I generate C and compile it using the GNU gcc compiler. From the GNU coding standards:

Avoid arbitrary limits on the length or number of any data structure, including file names, lines, files, and symbols, by allocating all data structures dynamically.

This is a good standard! I like it. All programs should be written with this in mind. Your program may well be online in 10 or 20 years, and the hardware may well have changed: a 64KB limit may seem reasonable one year but is a real limitation 10 or 20 years later in software which would otherwise still be useful.

So, if you are solving this type of problem, don’t use Java.

P.S. On a separate project I used a similar approach using Perl, and that worked out fine too.

Learnt two new smileys today

Thanks to Nessus!

\o/ – arms raised in the air – success gesture
<3  – heart