Archive for the ‘Coding’ Category

“v” command in “less”

Wednesday, March 4th, 2009

Cool, just discovered: if you’re using the (UNIX command-line program) “less” to view a file, you can hit the “v” key to open the file in “vi” (or in whichever editor $VISUAL or $EDITOR names)!

Klingon programming proverb

Tuesday, February 17th, 2009

From the autodie manual:

It is better to die() than to return() in failure.
  — Klingon programming proverb.

(“die” in perl is like throwing an exception.)

(via $foo magazin)

“Annotating” things mid-unit-test

Monday, February 9th, 2009

I use JUnit, PHPUnit and Perl’s Test::Unit. Perl’s version has a very cool feature. You can call annotate(string) at any point during the test, and this text will get recorded, and only output if the test fails.

sub test_foo {
   my $self = shift;
   my $value = setup_something();
   $self->annotate("value = $value \n");
   $self->assert_equals(12, do_something($value));
}

So you can just log arbitrary stuff during the test, and it will be output when you need it (when the test fails), but won’t clutter up the test output (when the test succeeds).

I wish this was available from the PHP and Java unit testing frameworks!
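For comparison, here is a minimal sketch of how the same idea could be approximated with Python’s unittest. The decorator name with_annotations and the annotate helper are made up for this illustration and are not part of any framework: notes are collected while the test runs and are attached to the failure message only if an assertion fails.

```python
import functools
import io
import unittest


def with_annotations(test_func):
    """Collect notes during a test; show them only if the test fails."""
    @functools.wraps(test_func)
    def wrapper(self):
        notes = []
        # Hypothetical helper: the test calls self.annotate("...") to log a note.
        self.annotate = notes.append
        try:
            test_func(self)
        except AssertionError as exc:
            # Re-raise with the collected notes appended to the message.
            raise AssertionError(f"{exc}\nannotations: {''.join(notes)}") from exc
    return wrapper


def run_demo():
    """Run one passing and one failing test quietly; return the result object."""
    class DemoTest(unittest.TestCase):
        @with_annotations
        def test_passes(self):
            self.annotate("value = 12\n")
            self.assertEqual(12, 12)

        @with_annotations
        def test_fails(self):
            self.annotate("value = 13\n")
            self.assertEqual(12, 13)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_demo()
print(len(result.failures))   # 1
```

The note from the passing test is simply thrown away, so the output stays uncluttered.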

Why Subversion is better than Git

Tuesday, February 3rd, 2009

A while back I had to make a decision, as the most senior developer on a team, whether we should stick with the Git version control system. Git is extremely popular these days in the open source community. It was developed by Linus Torvalds to manage the Linux source code. And everyone wants to be as cool as Linus.

But hold on. Linux is not your average project. Certainly not the average project I work on. Linux is programmed by thousands of developers all over the world, with a hierarchy of committers and reviewers. Linux has a large volume of code. People using and making modifications to the source code are programmers, who can handle complex concepts and who know why merge conflicts occur and when to use feature branches, release branches, etc.

My world looks different. I sit in an office with 2-3 onsite people. Some of them are graphics people who have a good sense of what looks good, but not a good grasp of resolving merge conflicts using vi. Some of them are managers, bosses, project leaders, product creators, who should be enabled to edit certain things such as wording, but who certainly won’t make much effort to learn anything new and will simply order me to do their wording changes myself if the system doesn’t work to their liking. This may not be everyone’s world, but it’s my world.

So it would be surprising to find that one system fits all sizes (although certainly not impossible, look at TCP/IP!). But it certainly can’t be assumed without evidence or investigation, that a system working for one world is the best solution for the other.

What do I want from a version control system? I want the following, in the following order (most important first):

  1. A system which can manage files between multiple users. A Windows share would suffice for this requirement. What I don’t want is me having to send a file to someone via email, me changing the file, them sending a changed version 2 weeks later back to me, and another 2 weeks after that, and me having to manually work out what changed when, and trying to make a combined version in a text editor. Or me working on my codebase and my colleague working on their codebase, and then 1 month later having to try and merge the results in a text editor.
  2. History. Who did what when, and why? If something breaks, who did it? Why did they do it? If I revert their change, what will I break? The ability to revert to a previous version.
  3. Branches. If more than one person creates a new feature, I want to be able to check stuff in, they check stuff out. But it shouldn’t go live yet. But live bugfixes need to happen. Multiple people could work on multiple different features, and one merges each feature into the “to release” history stream when they’re ready: in whatever order they become ready.
  4. Being able to create patches of changes, send them to other people, have them review them, etc. Actually I don’t need this functionality at all, but maybe I would if I worked on open source software.

OK let’s look at objectives #1 and #2, my most important objectives. As stated above, as many people should be using the system as possible (incl. non programming roles), in order to make me send/receive emails of various versions of stuff as infrequently as possible. Here’s my experience in this area:

  • At Uboot, a long time ago, we used to use CVS. There was only WinCVS at the time, a terrible program by all accounts. My manager Helge simply refused to use it. He said “I don’t need to use version control, that’s for programmers”. Therefore he was not technically able to do any wording, CSS or HTML changes. He had to delegate even the smallest thing to a web-designer or programmer.
  • A manager before that, when I suggested web design should be using version control, said “don’t even bother trying, web designers are too stupid to use version control”. He literally said that. I didn’t believe him. Years later at Uboot, the web designers were using CVS, thanks to my campaigning.
  • At Uboot, some web designers were better using CVS than others. Some could handle it, others would constantly get confused. Understandably, the error messages were terrible.
  • Using Subversion at Easyname (approx 5 people in total), managers, graphics people and software developers are able to work on the code base: anyone can alter wording, HTML, etc. Anyone can fix any bug.
  • Using Subversion on another project, my non-programming manager can alter wording in the HTML files I produce. That’s significantly better than him sending me Excel files with wording, and me building it in to the code. Not only is my life easier: it’s much less effort in total, meaning we can offer the projects at a lower cost to our customers, for the same amount of value for the customer.
  • Then, on this project, we switched to Git. The manager could just about handle checking in (although didn’t really understand the messages, there was a lot of “delete everything, check out everything again, try again” going on). The graphics department couldn’t handle it at all. They’d been working on a new version of the product for months, and had produced an un-checked-in tree of changes, un-merged with what development had been doing. Trying to check it in just resulted in errors such as ! [rejected] with no further info (literally, git is this unusable). Forget about creating branches or higher-level objectives of a VCS, this wasn’t even achieving objective #1.

Subversion is good because it’s simple and it has a good user-interface. Having a good user-interface is made possible (although in no way guaranteed!) by it being simple. Errors don’t happen often (as it’s easy for the user to have a model of how it works consistent with how it does actually work) and if they do, errors are clear and understandable, and often include a description of what to do next. Tortoise SVN is integrated with Windows Explorer and works perfectly, I think everyone who’s ever used it would agree. (Is there anything equivalent for Git?)

Subversion 1.5 (the latest version) even supports some fancy Git features. You can import all the changes from one branch onto another branch (“rebasing”) and it remembers which ones it has imported, so next time you do it, it won’t import them again. You can “cherry pick”, i.e. decide that a particular revision should be imported from one branch to another, but no revisions before or after it: useful for applying a particular bug fix from development to live, for example, without carrying across any other new features. So there’s really nothing Subversion doesn’t do, which I would need.

So I took the decision to convert our repository away from the modern shiny “git” system, into the nearly-legacy (if you read open-source evangelists!) well-engineered easy-to-use Subversion. I haven’t looked back.

In addition (independent of distributed vs central source repositories), Subversion has so many well-engineered practical features:

  • Directories being empty and directories not existing are different states. Once a web designer created a whole structure of empty directories and checked them in to CVS only to see them not be checked out. On Subversion a directory can be empty on one revision, deleted on the next, replaced by a file of the same name on the next commit, etc. Neither CVS nor Git can handle this.
  • Want to revert your changes? Just type “svn revert”. I have watched a “Git expert” spend 30 minutes reading man pages to find out why the equivalent git command worked on modified files, but didn’t restore files which had been deleted. Reverting things is important, objective #2.
  • Binary files are automatically detected (in contrast to CVS).
  • Executable flag allows files to be marked as needing the UNIX “executable flag” (in contrast to CVS)
  • Files can be ignored based on versioned properties (i.e. *.class files, etc.). This is more elegant than creating a special file like “.cvsignore”
  • Want to commit with a different user? (e.g. shared check-out like on a live system) Just use the “--username” option or simply press return when prompted for a password and it’ll prompt you for a new username.
  • Absolute paths. You can execute “svn commit /home/adrian/checkout/file.txt”. This is convenient with Mac Finder: you can just drag a file from the Finder to the terminal to get its (absolute) path to appear. Neither CVS nor Git allows absolute paths for its commands.
  • The help is clear because the features are so simple. Check out the manual for svn log vs git log (shows the history of a file: how complex can it be!?)
  • You can do a quick fix on the live server and commit it. Subversion demands that the modified files be up-to-date before you can commit. Git demands that every file in the checkout is up-to-date, basically meaning you always have to pull the latest version from the repository and install it live, before you can commit the fix.
  • Don’t have Subversion? Don’t even know what it is? Subversion repositories are URLs, so you can just type it into a browser and see the files. CVS and Git repository descriptions are weird strings, so if you don’t have the software, you don’t know where to start.
  • You can force locking. You can specify that certain unmergable files should adhere to “lock, edit, commit & unlock” semantics. I use this for Excel spreadsheets, for example.
  • You can embed expandable keywords like $Revision$ into the source and set the svn:keywords property to say they should get expanded to e.g. $Revision: 213 $ when a commit is done. This is useful because if you send a file to someone, they edit it and send it back, you need to know which version they originally got in case you’ve updated the file in the meantime. (One could say they “should do a checkout” but try telling a large many-thousand-person company to install a new version control system just to edit e.g. one wording file or an image, when they don’t even have physical access let alone administrative rights to their PC.)
  • Those $Revision$ tags don’t generate conflicts when merges are done. (CVS also replaces tags, but if you merge two versions together, CVS sees that the tag has been “modified” (by itself) in incompatible ways, and generates a conflict. Subversion normalizes such tags back to $Revision$ before applying merges.)
  • You can checkout only a subdirectory of the repository just by adding the directory to the end of the URL. (so on http://svn.x.com/project you can check out http://…/project/dir to get just “dir”).
  • Old servers can talk to new clients and vice-versa. This is good software engineering.
  • Output is copy-paste friendly. From the “svn diff” command you see your differences. You can copy/paste the filenames easily. In Git, you always have an extra “a/” in front of the name, e.g. “a/dir/myfile.txt”. Double-click on it and you can’t copy-paste it anywhere without having to remove the “a/” first.
  • Internally, Subversion separates logic from UI. A library of logic can be used from various different UIs. CVS tools all just wrap the command-line CVS tools (and always have a “command pane” where you can see the input/output to these tools, in case something goes wrong, e.g. you try and check in a file whose name contains a space!) It seems “git” has also been developed around the command-line-tool model. In 2008-11 someone suggested making a UI-independent library of functionality for git.
  • Subversion can handle Unicode file names.
  • SVN Externals are cool, for example a checkout of an application can check out the libraries as well, even if the libraries are hosted somewhere else. That’s better than checking the source of the libraries into the VCS, or the compiled form (e.g. JAR) or having a more complex build procedure.

A few positive things to say about Git vs Subversion:

  • Git really is unbelievably fast. But is that actually a useful feature? For the Linux kernel, maybe. But I just did an “svn up” on my current project, it took 1.2 seconds (incl. server communication) and it was 1,684 files. On another project (uboot) “svn up” took 7.4 seconds and has 18k files. But I only do “svn up” once a day.
  • Grep -v. Because Subversion stores the original versions of all files it checks out (so it can send diffs to the server), if you do a “recursive search” for any string, every file shows up twice: once in the file, a second time in Subversion’s internal directory. That’s quite annoying. Git has “git grep” but even if one uses “grep -v” they don’t show up for some reason.
  • Scrolling. Git automatically pipes the output of commands to a “more” program.
  • “svn log” on a directory shows the differences to that directory object (which you rarely want), not all the files underneath it (which is what you always want). “svn info” shows the URL for that directory, “svn log <url>” shows the differences to all files (what you want).
  • Subversion renames don’t work too well: if you modify a file and someone else renames it you get a conflict, as opposed to your changes getting taken over to the new file name (i.e. contents changes and renaming being changes to two different aspects to a file, so should be mergeable). Does git handle this better? No idea, I didn’t understand git’s output when I tried to rename a file. (“git mv a b; git status” and “mv a b; git rm a; git add b; git status” produced different results, even though they should be the same according to the docs?)

By the way, when converting between version control systems:

  • Do always copy/convert the entire history, don’t just check in a “snapshot” of the current versions. Old versions are really amazingly useful. I regularly use “svn blame” on uboot and see changes from 2004 and beyond. Particularly satisfactory/useful if the change occurred at the time I wasn’t working for the company :)
  • Branches. Make sure to convert not only the “trunk” / “master branch”, but also any other branches or tags which are in use.
  • Send an email to all developers explaining how to download/use the new system (e.g. Subversion), how to make a checkout (e.g. the URL), and any username/passwords.
  • Other checkouts. Don’t forget clients of the VCS are not only humans, but all sorts of live servers, integration testing environments, daily build systems, etc. Make sure they get converted too.
  • Turn off writes to the old system. Keep the old system there (in case something goes wrong) but make sure it’s read-only.

(Subversion 1.5, Git 1.5)

Unicode characters in file names

Monday, January 26th, 2009

It’s amazing that the following all works:

  • Using my terminal program (PuTTY on Windows, set to UTF-8) connected to a Linux computer (terminal settings set to UTF-8), created a file whose name had various Unicode characters
  • The shell allowed me to type those characters (vi <filename>)
  • The standard programs such as “ls”, “cat”, “vi” seemed to be able to handle these file names
  • I checked the file into the Subversion version control system - it worked [1]
  • On Windows (XP, NTFS) I checked the file out using Tortoise SVN, it worked.
  • Windows Explorer showed the file having the Unicode characters in its file name.
  • I opened the file in Windows Notepad, it opened the file and displayed the name correctly in the title bar

That means, for my uses, I can absolutely use Unicode characters in file names. That’s a cool situation.

(In this particular situation, the user of my program should choose a “report” from a drop-down of possible report types. Each report has a directory on the disk, with some files in a standard layout inside the directory. There is no additional data, the file names do not need to be localized, so rather than creating an extra config file, which could get out-of-date with the directories on the disk, it is much more convenient and normalized to simply scan the directory from the program. The reports are created on Windows, my program is running on Linux, and the communication between the two is the Subversion VCS.)

There was one slight problem, which I didn’t notice at first, which is that perl can’t read the Unicode file names correctly on Linux. I didn’t notice it because, as is often the case with character set situations, there were two errors which cancelled one another out, to make it look like it had worked. Perl read the file name thinking that each UTF-8 two-byte character was actually two characters, and by default output Latin1 even though the terminal was UTF-8, so the two “characters” were output and interpreted by the terminal as the single original character in the file name. In such situations, I find the only way to debug and test such things is to output the length in characters as a number, as then such cancelling-out errors cannot occur.
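The two cancelling errors can be reproduced in a few lines. This is only a sketch in Python rather than Perl, and the file name is a made-up example, but the mechanism is the same: the bytes are misread as Latin1 on the way in and written as Latin1 on the way out, so the terminal gets the original bytes back and only the character count gives the game away.

```python
# Hypothetical file name containing one non-ASCII character: "é.txt".
name = "\u00e9.txt"
raw = name.encode("utf-8")            # the bytes as stored on a UTF-8 file system

# Error 1: the bytes are read as if every byte were one (Latin1) character.
misread = raw.decode("latin-1")

# Error 2: the string is written out as Latin1, although the terminal expects UTF-8.
written = misread.encode("latin-1")

# The errors cancel out: the terminal receives exactly the original bytes.
print(written == raw)                 # True

# Only the length in characters exposes the problem.
print(len(name), len(misread))        # 5 6
```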

[1] If you’re experiencing problems on the UNIX command line using a UTF-8 terminal, try the command “export LANG=en_US.UTF-8”!

An email address field should not be called “email”

Friday, January 23rd, 2009

Naming is extremely important in software development. Any software that deals with emails probably also deals with email addresses. Emails and email addresses are two completely different types of objects, with a completely different set of attributes, and a completely different set of actions which one can perform on them. Even if your application only deals with the one or the other type, it would be unwise to confuse them.

But yet constantly I see the word “email” being used to refer to an email address. Don’t do this, it’s confusing. Only use “email address” to refer to the address and “email” to refer to the message.

Currently seen in RFC 3733 section 2.9 describing the attributes of an Internet domain’s contact information. But there are a million other examples as well.

Jetty “null127.0.0.1” error

Thursday, November 20th, 2008

I have just spent an hour debugging the following problem. Suddenly my Jetty Java web server didn’t work any more. Starting it claimed:

Starting Jetty: OK

Yet connecting to the port (telnet localhost 8080) showed that the port was not open. Looking in the logfile (/var/log/jetty6) showed the following error:

2008-11-20 12:08:47.477::WARN:
  failed SelectChannelConnector@null127.0.0.1:8080
...
Caused by: java.nio.channels.UnresolvedAddressException

That meant that the web server was unable to open the socket on port 8080.

Googling for this error showed only 1 result, which led to a 404. The Javadocs and Jetty docs did not mention any problem like that.

Thankfully I had a working system to compare it with. The working system wrote in its logfile:

2008-11-03 13:34:41.283::INFO:
  Started SelectChannelConnector@0.0.0.0:8080

Notice the difference between 0.0.0.0 (works) and null127.0.0.1 (doesn’t work). It turned out that on the good system, the config file /etc/jetty6/jetty.xml had the line

<Set name="host"><SystemProperty name="jetty.host" /></Set>

and the non-working system had:

<Set name="host"><SystemProperty name="jetty.host" />127.0.0.1</Set>

I have absolutely no idea how this non-working line came into the config. I didn’t change it, and the operations department surely didn’t change it either. I have absolutely no idea.

Why is Java so difficult !?

Jetty 6.1.11, Debian Etch, Linux 2.6.24, Sun Java 1.5

Automatic reconnect from Hibernate to MySQL

Friday, October 24th, 2008

Yesterday I spent the entire day getting the following amazing state-of-the-art not-ever-done-before feature to work:

  • Executing a SQL statement from my program

Because, as everyone knows, I don’t suffer from NIHS, I used standard object-relational mapping software Hibernate, with a standard programming language Java, using the standard web-application server Tomcat, and now I am using the standard “connection pooling” software C3P0 (which I didn’t know I needed to execute a SQL statement, see below..)

The program is, in fact, already completed, and is nearly deployed. On the test server it works fine and even on the (future) live server it worked fine. But the customer noticed that if one installed it one day, the next day it didn’t work. I’ve had such symptoms many times before, so I knew immediately what was going on:

  • MySQL drops a connection after 8 hours (configurable)
  • The software is used during the day, but isn’t used during the night, therefore the connection times out in the night
  • Therefore in the morning, the program one installed the day before no longer works

Perhaps I exaggerated the simplicity above of what I was really trying to achieve. It should really be expressed as the following:

  • Executing a SQL statement from my program, even if a long time has passed since the last one was executed

But that amounts to the same thing in my opinion! It isn’t rocket science! (But in fact is, see below..)

An obvious non-solution is to increase the “connection drop after” time on the MySQL server from 8 hours to e.g. “2 weeks” (“wait_timeout” in “my.cnf”). But software has got to be capable of reconnecting after a connection drops. The database server may need to be reset, it may crash, it may suffer hardware failure, etc. If, every time one restarts one particular service, one has to restart a thousand dependent services (maybe some Java, some Perl, some PHP, some robots, ..) and then maybe restart services which are dependent on them – that’s a maintenance nightmare. So the software has to be altered to be able to handle connection drops automatically, by reconnecting. Once the software has been so altered, one no longer needs to alter the “wait_timeout” on the server.

The error was:

org.hibernate.util.JDBCExceptionReporter: The last packet successfully received from the server was 56697 seconds ago. The last packet sent successfully to the server was 56697 seconds ago, which  is longer than the server configured value of ‘wait_timeout’. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property ‘autoReconnect=true’ to avoid this problem.

Quite a helpful error message, don’t you think? But

  • I’m not going to increase “wait_timeout” as discussed above,
  • “testing validity” in the application – well I was using standard software Hibernate which should take care of this sort of thing automatically, but evidently wasn’t
  • and we were already using ?autoReconnect=true in the JDBC URL (this evidently wasn’t working).

I figured I really needed to get to the bottom of this. Googling just showed (many) people with the same problem, but no solutions. The only way to get to the bottom of software is to read the source. (It has been the way to resolve issues of simple things simply not working in MySQL before.)

I stopped looking in the MySQL source for why “autoReconnect=true” didn’t work when I saw the following text in the source describing the autoReconnect parameter:

The use of this feature is not recommended, because it has side effects related to session state and data consistency

I have no idea what particular side-effects are meant here? I guess that’s left as an exercise for the reader, to test their imagination.

And anyway, I figure that a reconnect-facility belongs in the “application” (Hibernate in my case) as opposed to in database-vendor specific code. I mean exactly the same logic would be necessary if one were connecting to PostgreSQL or Oracle, so it doesn’t make sense to build it in to the database driver.

So then I looked in the Hibernate code. To cut a long story short, the basic connection mechanism of Hibernate (as specified in all the introductory books and websites, which is probably how most people learn Hibernate) doesn’t support reconnecting; one has to use the C3P0 connection pool (which itself didn’t always support reconnecting).

(I don’t want to use container/Tomcat-managed connections, as I have some command-line robots which do some work, and I don’t want to use different code for the robots than for the web application. Another company defined Servlets which did “robot work”, and the robot was just a “wget” entered into Tomcat, to get the use of container-managed connections, but this seems a too-complex solution for my taste.)

But once one’s used C3P0, the default behavior seems to be that, when processing a request, if the connection is dead then the user sees an error, but at least it reconnects for the next request. I suppose one error is better than infinite errors, but still not as good as zero errors. It turns out one needs the option testConnectionOnCheckout, which the documentation doesn’t recommend because testing the connection before a request might lead to lower performance. Surely the software firstly has to work, only secondly does it have to work fast.
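To make the idea of testing on checkout concrete, here is a minimal sketch in Python. It is not how C3P0 is implemented; the FakeConnection and CheckedPool classes are invented for this illustration and stand in for a real driver and pool. The point is only the order of operations: test first, replace a dead connection silently, and hand out something that works.

```python
class FakeConnection:
    """Stand-in for a database connection (hypothetical, for illustration)."""

    def __init__(self):
        self.alive = True

    def ping(self):
        # A real pool would run a cheap query such as "SELECT 1" here.
        return self.alive

    def execute(self, sql):
        if not self.alive:
            raise ConnectionError("connection was dropped by the server")
        return f"ok: {sql}"


class CheckedPool:
    """Single-connection 'pool' that validates the connection on checkout."""

    def __init__(self, connect):
        self._connect = connect       # factory that opens a new connection
        self._conn = connect()
        self.reconnects = 0

    def checkout(self):
        # Test before handing the connection out; replace it if it is dead.
        if not self._conn.ping():
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn


pool = CheckedPool(FakeConnection)
first = pool.checkout()
print(first.execute("SELECT 1"))      # ok: SELECT 1
first.alive = False                   # simulate the server-side timeout overnight
second = pool.checkout()              # the dead connection is replaced silently
print(second.execute("SELECT 1"))     # ok: SELECT 1
```

The cost is one extra round trip per checkout, which is the performance concern the documentation mentions.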

So, to summarize, to get a connection to “work” (which I define as including handling dropped connections by reconnecting without error): In “hibernate.cfg.xml”:

<!-- hibernate.cfg.xml -->
<property name="c3p0.min_size">5</property>
<property name="c3p0.max_size">20</property>
<property name="c3p0.timeout">1800</property>
<property name="c3p0.max_statements">50</property>
<property name="connection.provider_class">
   org.hibernate.connection.C3P0ConnectionProvider</property>
<!-- no "connection.pool_size" entry! -->

Then create a file “c3p0.properties” which must be in the root of the classpath (i.e. no way to override it for particular parts of the application):

# c3p0.properties
c3p0.testConnectionOnCheckout=true

Amazing, that that stuff doesn’t just work out of the box. Programming the solution myself in Uboot took, I think, 1 line, and I’m sure it’s not more in WebTek either.

That was an amazing amount of effort and research to get the simplest thing to work. Now if only this project had been paid by the hour…..

[Update 28 May 2009] More Java hate today. Starting a new application, deployed it, and it didn’t work. In the morning, the application was down. Reason: The new project used Hibernate 3.3, and the upgrade from 3.2 to 3.3 requires the “connection.provider_class” property to be set. Previously the presence of “c3p0.max_size” was enough.

mysqli_affected_rows

Wednesday, October 8th, 2008

Recently I programmed the following screen in PHP:

  • The user logs in
  • The user has a subscription
  • The subscription has a number of states (“terminate”, “auto-extend”, ..)
  • There is a screen allowing the user to change this state
  • The screen is a set of radio buttons – each radio button relates to one state
  • The user clicks on the radio-button representing the state they wish, clicks “ok”, and the new state gets saved to the database

Not rocket science eh? Well, unbelievably my implementation of the above had a bug. How on earth was that possible?

The bug was the following: If you changed the state, everything worked fine. But if you chose the same state as was already selected, an exception was thrown.

Initially I suspected a simple coding mistake. When I looked at the code, everything looked right. I had used the following “algorithm”:

  • Update the “subscription” row using SQL
  • Check the result of the SQL statement, that exactly 1 row was updated (in case e.g. id referenced a non-existing subscription, which would be an error)

I used the PHP function mysqli_affected_rows for that and unbelievably that has the following functionality: it only returns the number of changed rows i.e. the number of rows:

  • Matching the where clause, and
  • Currently having values different to those values being written to the row.

I can’t imagine a case where one would want to know that. I couldn’t find any function to return the number of rows matching, regardless of whether the values were changed or not. (The older version mysql_affected_rows behaves identically.)

So I had to write the following function:

/**
 * Returns the number of rows which matched the WHERE
 * clause on the last UPDATE statement. This is not the
 * same as mysqli_affected_rows, which only returns the
 * number of changed rows.
 */
public static function DbUpdatedRows() {
    $link = self::DbGetLink();  // mysqli object
    $info = mysqli_info($link);
    if (preg_match('/Rows matched: (\d+) +Changed/',
            $info, $matches))
        return $matches[1];
    throw new Exception("DbUpdatedRows called although ".
        "it doesn't look like an UPDATE was the ".
        "last statement: mysqli_info returned '$info'");
}
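To show the difference between the two counts side by side, here is the same parsing sketched in Python. The function name parse_update_info and the two sample strings are assumptions for illustration; the strings follow the format the regular expression above expects.

```python
import re


def parse_update_info(info):
    """Return (matched, changed) from an UPDATE info string, or None."""
    m = re.search(r"Rows matched: (\d+)\s+Changed: (\d+)", info)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# Assumed sample strings: re-selecting the current state vs picking a new one.
same_state = "Rows matched: 1  Changed: 0  Warnings: 0"
new_state = "Rows matched: 1  Changed: 1  Warnings: 0"

print(parse_update_info(same_state))   # (1, 0)
print(parse_update_info(new_state))    # (1, 1)
```

In the first case the row exists and was matched, yet a check for “exactly 1 affected row” would fail, which is the bug described above.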

I’ve just checked, and in InnoDB inside a transaction, it’s good to see that (as with Oracle) write-locks are indeed placed on all matched rows not just updated rows.

And don’t get me started on using DB-specific function calls (i.e. functions named mysql_x) as opposed to using a DB-abstraction layer like DBI in Perl, JDBC in Java, etc. Nor why I’m using PHP or MySQL in the first place.

Programming Languages: Is newer always better? (Part 2)

Friday, March 28th, 2008

Let me respond to some of the comments left at “Programming Languages: Is newer always better?”

First up, Knowing what’s going on:

This is a terrible example. You are really arguing that PHP programmers don’t know how their language works while C programmers do. This is a horribly wrong-headed assertion. How about I counter your straw man with one of my own. I know plenty of new (as of the last 5 years) C programmers who have no idea that 0 is equivalent to NULL.

Yeah you’re right, this point is probably untrue.

At the time I wrote it I was getting frustrated with PHP programmers who didn’t know the difference between == and ===. I still have the feeling that Java and C books tend to concentrate firstly on the fundamentals of the available data types and operations, whereas introductions to PHP tend to focus on just writing code that looks OK and seems to do the right thing (an attitude which leads one to write programs with subtle bugs).

But, having thought about that a bit more, that probably has more to do with my exposure to books written for people who can already program, vs articles about PHP on the web. And probably really does have nothing to do with the language whatsoever.

Strict typing

You want the compiler to check that a method can only receive an object of type SomeObject while I want any method to be able to receive any object as long as it responds to (or has the same interface) as SomeObject.

I used to think this way for quite a time, when I was programming Objective-C: that it was cool to write code which took any object as long as it responded to a certain set of methods. And that asserting an object must be of a particular class or respond to a particular interface made my code less flexible and reusable.

However, after a time, looking at both my own Objective-C code and that written by colleagues, I kept seeing methods like saveToDb:anObject. Such a method assumes that the parameter anObject responds to certain methods (by virtue of the method’s body calling those methods on its parameter), yet this is not documented in the method’s prototype (although it could be placed in a comment if the programmer chooses to), and cannot be checked at compile-time. It gets worse when anObject is simply passed on to some other function, so you have to open that function in the editor to determine what type of object you can pass there. And you’re out of luck if you don’t have the source code. And even if you do document the type in a comment, you can’t build an IDE where you can just click on the type and it opens the definition, immediately listing its methods and documentation.
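A minimal Java sketch of the alternative, where the requirement is stated in the method signature instead of being implied by the method body. The names Persistable, UserRecord and Db.saveToDb are hypothetical and chosen only for illustration; they are not taken from the Objective-C code described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: names the methods that saveToDb relies on.
interface Persistable {
    String tableName();
    String serialize();
}

// Hypothetical class that satisfies the contract.
class UserRecord implements Persistable {
    private final String name;

    UserRecord(String name) {
        this.name = name;
    }

    @Override
    public String tableName() {
        return "users";
    }

    @Override
    public String serialize() {
        return "name=" + name;
    }
}

class Db {
    // Stand-in for a real database: just remembers what was "saved".
    static final List<String> rows = new ArrayList<>();

    // The parameter type documents the requirement, and the compiler enforces it.
    static void saveToDb(Persistable anObject) {
        rows.add(anObject.tableName() + ":" + anObject.serialize());
    }
}

class SaveDemo {
    public static void main(String[] args) {
        Db.saveToDb(new UserRecord("alice"));
        // Db.saveToDb("just a string");  // would not compile: String is not Persistable
        System.out.println(Db.rows);
    }
}
```

Anyone reading the signature of saveToDb can jump to Persistable and see exactly which methods an argument must provide, without opening the method body.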

C, Fortran, C++, Java and Pascal require static definitions and suffer greatly for it. C++ (again) and Java (again) have templates/generics to fake this kind of feature and suffer horribly for it.

I have to agree that what really has improved in modern languages and runtimes (a post about these improvements will follow in the future) is that the runtime knows what type of object a reference points to. Using void* in C is nasty.

No, Perl isn’t strictly typed and can’t do what you’re saying. But once again, you can check things. You can validate that an Object is a particular class or descendant of a particular class. As with the variable bounds, you can validate your data.

This is true, you can do that. But it doesn’t happen at compile-time (which means if you didn’t unit test or click-through that code path, you don’t see the error), and other programmers may choose not to even put the acceptable ranges or types in comments, and then you’ve got code which takes $x and then you’re really stuck. (Although I suppose if you work with programmers who don’t like to make readable code, you’re stuck no matter what language they’re programming in; I mean you can make unreadable code in any language.)

Enumerated types

This is a great feature modern day languages have though maybe it isn’t called “enumerated type.” Ruby has symbols so you can say your types are :hot, :warm, :lukewarm, :cold. These symbols mean the same thing everywhere. To use your PHP example in Ruby, how about error_log(“user not found”, :user_not_found). In this example, you don’t know the languages you are criticizing.

Well that’s great that Ruby has such a feature, but Perl and PHP still do not have such a feature. If they did, PHP wouldn’t have defined its error_log function that way. So when I’m programming in those two languages (which I do a lot, alas) I am forced to write less readable code. (Even after defining constants, I can still pass gender_male with a value of 3 to a function expecting a state where 3 means the user has been deleted, and it won’t even exit with an error, let alone give me a compile error: it will simply do the wrong thing.)
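To make the difference concrete, here is a small Java sketch that puts the two styles side by side. The enums Gender and AccountState, the class Accounts and the constant values are all hypothetical, invented for this illustration.

```java
// Hypothetical enums: each is its own type, so values cannot be mixed up.
enum Gender { FEMALE, MALE }

enum AccountState { ACTIVE, SUSPENDED, DELETED }

class Accounts {
    // Only an AccountState is accepted; passing a Gender is a compile error.
    static boolean canLogIn(AccountState state) {
        return state == AccountState.ACTIVE;
    }

    // The integer-constant style for comparison: any int is accepted.
    static final int GENDER_MALE = 3;
    static final int STATE_DELETED = 3;

    static boolean isDeleted(int state) {
        return state == STATE_DELETED;
    }
}

class EnumDemo {
    public static void main(String[] args) {
        System.out.println(Accounts.canLogIn(AccountState.ACTIVE));
        // Accounts.canLogIn(Gender.MALE);  // would not compile
        // With plain ints the same mistake compiles and silently gives the wrong answer:
        System.out.println(Accounts.isDeleted(Accounts.GENDER_MALE));
    }
}
```

With the integer constants, the call isDeleted(GENDER_MALE) is accepted and reports the user as deleted, which is exactly the silent wrong behaviour described above.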

No Compiler

Please point me to a modern language that is slower with longer variable and method names. Ruby, Perl, Python, OCaml and Erlang all “compile” the code to an intermediate form (bytecodes) and then execute those.

What? Are you suggesting that a comment in a procedure is parsed every time the procedure executes? I don’t know a single interpreted language implementation that would do that. The only exception are calls to “eval” or similar functions. 

As Perl, PHP etc. all take plain-text files as their input, it follows that they have to process these files, byte by byte. Agreed, the better ones parse the source to an intermediate form where e.g. execution of loops will not be slower for longer variable names or a more complex programming style, but they still have to take the hit once, during the conversion from the text form to the intermediate form.

I have experienced this first hand. Uboot has about 350k lines of code (which is not unreasonable, the system provides mail, sms, photo galleries, blogs, subscriptions, and many more features, some of which are not active any more.) That takes about 4 CPU-seconds to convert to intermediate code (maybe faster these days, that was about 2 years ago). On each server we have about 30 instances of that code running. That means when we restart a webserver, it’s down for about 2 minutes. It does 2 minutes of useless work!
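The arithmetic behind that figure, as a tiny Java sketch. It assumes the instances compile one after another rather than in parallel, which is my reading of the numbers above; the class and method names are made up for the example.

```java
class RestartCost {
    // Total seconds spent compiling when each instance compiles the code itself,
    // assuming the instances start one after another.
    static int downtimeSeconds(int instances, int compileSecondsPerInstance) {
        return instances * compileSecondsPerInstance;
    }

    public static void main(String[] args) {
        int seconds = downtimeSeconds(30, 4);
        System.out.println(seconds + " seconds, i.e. " + (seconds / 60) + " minutes");
    }
}
```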

I have been told often enough, since working at Uboot, that I use the language wrong, that my programming is too “Java style”. The solution, I’m told by experienced Perl web developers, is simply not to write 350k lines of reusable library code, but instead write a simple large script with all the code rolled together. It starts faster, runs faster, and consumes less memory. And I’ve tried it: on some performance-critical sections I have indeed manually copy-pasted sections of code together to form one simple script, and it really does compile and run orders of magnitude faster.

I’ve essentially manually done what I would like a compiler to do. But that’s not the way I want to program. I do not want to be rewarded at runtime for bad programming practice!

*Every* language bears this cost because they *all* have to parse the code at some point to either turn it into bytes or machine code.

That is very true, but some languages do this on your build machine, not on your production machines when you start the service.

Also, doing this on your build machine means you can perform more expensive optimizations, as you don’t have to worry about how long those optimizations take, which you do if the compiling means your service starts slower.

No linker

Your argument here is about memory footprint. This is a total non-starter on any modern operating system that does demand paging. If huge sections of your ruby/perl/python/whatever library are not used, the OS will never page them into RAM.

This depends where you wish to deploy to. For sure, on a web-server, this doesn’t matter.

On Uboot I wrote the “Uboot Joe” which is a program you can download to your Windows computer. I made the mistake of writing it in Java. To distribute it, I distributed the whole JVM (as most users won’t have one) which includes all sorts of things I never used; I also included XML-RPC libraries (which no doubt include methods I never used), as well as my own code. The entire bundle came to 15MB. Our users had to download that just to get a program sitting on the tray, connecting to the Uboot servers, and popping up a few notifications. The size of this download was cited as one of the reasons why the program was not successful.

Yet cutting out unused functions via a linker is not rocket science. All C linkers do this (as far as I know).

I don’t think including the JVM was an incorrect decision; but the file would not have been so excessively big if the download had included only those classes and methods of the Java runtime which I, or the libraries I had used, could actually call at runtime.

I don’t write massive GUI apps in Perl.

Unfortunately I do write massive apps in Perl (albeit not GUI ones). And I did use Java to write a downloadable GUI app (albeit a simple one).

Multiple compile errors

I prefer to write a test, watch it fail, write the code to make it pass.

Right, but I’m tired of having to write test cases for trivial methods.

If I write a setter, I have to write a test case in Perl, otherwise it might fail because I made a spelling mistake. (I know from experience, writing test cases for even such trivial things really does actually help in Perl.)

In Java I don’t bother testing trivial methods; they just work.

Formatted Strings

I went through a long period of time wondering this myself. I thought sprintf was good enough all these years, why should I bother with iostreams. Well, I experienced one too many crashes from the simple error of mismatching the printf format specifier with argument type (%s -> int). These instances usually occur in logging statements that you don’t always encounter in normal code paths. This problem goes away completely with iostreams, as the most important benefit is type safety.

Ah that’s true. And one of the good things about modern systems (article forthcoming) is that they know what the types of things are at runtime. If they don’t (C++ by default), then I agree with you completely.
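As a sketch of what that looks like in Java: because the runtime knows the type of each argument, a mismatched format specifier produces a catchable exception rather than a crash. String.format and IllegalFormatException are part of the standard library; the class FormatDemo and the helper tryFormat are just names I picked for the example.

```java
import java.util.IllegalFormatException;
import java.util.Locale;

class FormatDemo {
    // Returns the formatted string, or a description of the formatting error.
    static String tryFormat(String format, Object arg) {
        try {
            return String.format(Locale.ROOT, format, arg);
        } catch (IllegalFormatException e) {
            return "format error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryFormat("count = %d", 42));
        // %d with a String argument: detected at runtime, reported as an exception.
        System.out.println(tryFormat("count = %d", "forty-two"));
    }
}
```

The mismatch is still only found at runtime, so a rarely executed logging statement can hide it for a long time, but it fails in a defined way.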

I suppose my point related more to the needless leaving out of good things which existed in the past. Java had to wait till 1.5 to get printf (and 1.4 for regular expressions). One should be more aware of the history of programming languages, and what things have already been thought of.

Auto-creation of variables

I agree with you on this one. It should be noted this is considered horribly bad practice in Perl now. Adding one line, “use strict;”, stops this from happening and every program I write begins with that. I think the PHP folk have long since started declaring and initializing variables for the most part. So it didn’t work.

That is true, that “use strict” helps.

Alas many languages such as PHP do not have such a “use strict”.

However, even in Perl with “use strict”, you can still misspell a function/method name and that will only get picked up at runtime (assuming you unit-test or click-through that path, otherwise it will go unnoticed), and if you misspell an attribute name in a $self hash, that only gets picked up at runtime.

I mean the flexibility that Perl offers (i.e. you can fill the $self hash with anything, and write an AUTOLOAD method which gets called when a method does not exist) would mean that it would not be possible to check those things at compile-time. However for me the benefit of catching errors at compile-time outweighs the benefits of the flexibility. But that is a matter of opinion, for sure.
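The trade-off can be imitated in Java by comparing an object backed by a map (roughly what a Perl $self hash gives you) with one that uses declared fields. This is only an analogy, not Perl semantics, and HashUser, FieldUser and the attribute name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hash-style object, roughly like a Perl $self hash: any key is accepted.
class HashUser {
    private final Map<String, Object> self = new HashMap<>();

    void set(String attribute, Object value) {
        self.put(attribute, value);
    }

    Object get(String attribute) {
        return self.get(attribute);  // unknown keys silently yield null
    }
}

// Field-style object: the attribute names are known to the compiler.
class FieldUser {
    private String nickname;

    void setNickname(String nickname) {
        this.nickname = nickname;
    }

    String getNickname() {
        return nickname;
    }
}

class TypoDemo {
    public static void main(String[] args) {
        HashUser h = new HashUser();
        h.set("nickname", "alice");
        System.out.println(h.get("nickame"));  // misspelled key: compiles, surfaces only at runtime

        FieldUser f = new FieldUser();
        f.setNickname("alice");
        // f.getNickame();  // misspelled method: would not compile
        System.out.println(f.getNickname());
    }
}
```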

Several features are dropped from new languages because the designers consider it “very dangerous, no _real_ programmer would ever use that”. As that’s a matter of opinion, we lose several powerful features just because they are… hmm… powerful. For example: GOTOs and Multiple Inheritance.

That’s for sure true. However I would use that argument to say that the totally dynamic runtimes and languages (such as the Perl $self hash and AUTOLOAD mentioned above) are too powerful, because that power means certain static checks cannot be done. But that’s a matter of opinion for sure.

If it’s Turing-complete, your language is ultimately fine.

I’m not sure about that. For me, a programming language is firstly a communication tool from one programmer to another programmer (or to the first programmer, but later). Secondly it is a way to express as many invariants as possible. Only thirdly is it a way to command the machine (which, as you say, all languages, including assembler, are capable of).

In that respect, one should choose a language firstly giving you maximum expressiveness (e.g. using an object-oriented language to program an object-oriented design, using a language which does not penalise you for creating libraries even if not all functions in the library are used in every program, etc.).

And secondly one should choose a language which enables you to express as many invariants as possible (e.g. the object being passed here should always be a User, this number should always be between 2 and 20, this reference should never be null), serving both as mandatory documentation and as a way for a computation process (e.g. compiler) to check as many of these invariants as possible.
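A rough Java sketch of those three invariants. Only the “must be a User” part is checked by the compiler here; the range and the null checks happen at runtime, but each lives in one place and is visible in the types. TeamSize, User and Team are hypothetical names for the example.

```java
import java.util.Objects;

// Hypothetical value type: a number that is always between 2 and 20.
final class TeamSize {
    private final int value;

    TeamSize(int value) {
        if (value < 2 || value > 20) {
            throw new IllegalArgumentException("team size must be between 2 and 20, got " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}

final class User {
    private final String name;

    User(String name) {
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    String name() {
        return name;
    }
}

final class Team {
    private final User owner;
    private final TeamSize size;

    // The signature states two invariants: the owner is a User, the size is a TeamSize.
    Team(User owner, TeamSize size) {
        this.owner = Objects.requireNonNull(owner, "owner must not be null");
        this.size = Objects.requireNonNull(size, "size must not be null");
    }

    String describe() {
        return owner.name() + " owns a team of " + size.value();
    }
}
```

Once a TeamSize exists, every method that receives one can rely on the range without checking it again or repeating it in a comment.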