Archive for the ‘Coding’ Category

Programming Languages: Is newer always better? (Part 2)

Friday, March 28th, 2008

Let me respond to some of the comments left on my post “Programming Languages: Is newer always better?”

First up, “Knowing what’s going on”:

This is a terrible example. You are really arguing that PHP programmers don’t know how their language works while C programmers do. This is a horribly wrong-headed assertion. How about I counter your straw man with one of my own. I know plenty of new (as of the last 5 years) C programmers who have no idea that 0 is equivalent to NULL.

Yeah, you’re right; this point is probably untrue.

At the time I wrote it, I was getting frustrated with PHP programmers who didn’t know the difference between == and ===. I still have the feeling that Java and C books tend to concentrate first on the fundamentals of the available data types and operations, whereas introductions to PHP tend to focus on just writing code that looks OK and seems to do the right thing (an attitude which leads one to write programs with subtle bugs).

But having thought about it a bit more, that probably has more to do with my exposure to books written for people who can already program, versus articles about PHP on the web. It probably really does have nothing to do with the language itself.

Strict typing

You want the compiler to check that a method can only receive an object of type SomeObject while I want any method to be able to receive any object as long as it responds to (or has the same interface) as SomeObject.

I used to think this way for quite some time, when I was programming Objective-C: that it was cool to write code which took any object as long as it responded to a certain set of methods, and that asserting an object must be of a particular class or implement a particular interface made my code less flexible and reusable.

However after a time, looking at both my own Objective-C code and that written by colleagues, you would see methods like saveToDb:anObject. That method assumed that the parameter anObject responded to certain methods (by virtue of the method’s body calling those methods on its parameter), yet this was not documented in the method’s prototype (although it could have been placed in a comment had the programmer decided to), and could not be checked at compile-time. It gets worse when anObject is simply passed to some other function, so you have to open that in the editor to determine what type of object you can pass there. And you’re out of luck if you don’t have the source code. And even if you do document the type in a comment, you can’t build an IDE where you can just click on the type and it opens the definition, immediately listing its methods and documentation.
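In a statically typed language such as Java, the implicit requirement on anObject can be written into the prototype itself as an interface. The following is a minimal sketch of that idea, not code from Uboot. Persistable, Customer and Db are hypothetical names, and saveToDb only builds a string instead of talking to a real database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the method signature now states exactly which
// operations saveToDb needs, instead of leaving that implicit in its body.
interface Persistable {
    String tableName();
    List<String> columnValues();
}

// Hypothetical domain class; the compiler checks that it really provides
// every method Persistable requires.
class Customer implements Persistable {
    private final String name;

    Customer(String name) { this.name = name; }

    public String tableName() { return "customers"; }

    public List<String> columnValues() {
        List<String> values = new ArrayList<String>();
        values.add(name);
        return values;
    }
}

class Db {
    // Passing something that is not Persistable (e.g. a String) is a
    // compile error here, not a runtime "does not respond" failure.
    static String saveToDb(Persistable anObject) {
        // A real implementation would build and execute an INSERT; this
        // sketch just describes what it would write.
        return "INSERT INTO " + anObject.tableName() + " " + anObject.columnValues();
    }
}
```

With this, Db.saveToDb("some string") does not compile, and an IDE can jump from Persistable straight to the list of methods the parameter must provide.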

C, Fortran, C++, Java and Pascal require static definitions and suffer greatly for it. C++ (again) and Java (again) have templates/generics to fake this kind of feature and suffer horribly for it.

I have to agree that what really has improved in modern languages and runtimes (a post on further improvements is forthcoming) is that the runtime knows what type of object a reference points to. Using void* in C is nasty.

No, Perl isn’t strictly typed and can’t do what you’re saying. But once again, you can check things. You can validate that an Object is a particular class or descendant of a particular class. As with the variable bounds, you can validate your data.

This is true, you can do that. But it doesn’t happen at compile-time (which means that if you didn’t unit-test or click through that code path, you don’t see the error). Other programmers may not even put the acceptable ranges or types in comments, and then you have code which takes $x and you’re really stuck. (Although I suppose if you work with programmers who don’t like to write readable code, you’re stuck no matter what language they’re programming in; you can write unreadable code in any language.)

Enumerated types

This is a great feature modern day languages have though maybe it isn’t called “enumerated type.” Ruby has symbols so you can say your types are :hot, :warm, :lukewarm, :cold. These symbols mean the same thing everywhere. To use your PHP example in Ruby, how about error_log("user not found", :user_not_found). In this example, you don’t know the languages you are criticizing.

Well, that’s great that Ruby has such a feature, but Perl and PHP still do not have one. If they did, PHP wouldn’t have defined its error_log function that way. So when I’m programming in those two languages (which I do a lot, alas), I am forced to write less readable code. (Even after defining constants, I can still pass gender_male with a value of 3 to a function expecting a state where 3 means the user has been deleted. It won’t even exit with an error, let alone give me a compile error: it will simply do the wrong thing.)
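For comparison, here is roughly what the gender/state mix-up looks like with Java 5 enums. It is a minimal sketch. Gender, UserState and Users.describe are hypothetical names chosen to mirror the example above.

```java
// Two hypothetical enumerated types. In PHP or Perl both might be plain
// integer constants that happen to share the value 3.
enum Gender { FEMALE, MALE }

enum UserState { ACTIVE, DISABLED, DELETED }

class Users {
    // Only a UserState is accepted: describe(Gender.MALE) is rejected by the
    // compiler rather than silently doing the wrong thing.
    static String describe(UserState state) {
        switch (state) {
            case ACTIVE:   return "user is active";
            case DISABLED: return "user is disabled";
            case DELETED:  return "user has been deleted";
            default:       throw new IllegalArgumentException("unknown state: " + state);
        }
    }
}
```

Because Gender and UserState are distinct types, the mix-up is caught at compile time, whatever integer values the equivalent PHP constants might have had.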

No Compiler

Please point me to a modern language that is slower with longer variable and method names. Ruby, Perl, Python, OCaml and Erlang all “compile” the code to an intermediate form (bytecodes) and then execute those.

What? Are you suggesting that a comment in a procedure is parsed every time the procedure executes? I don’t know a single interpreted language implementation that would do that. The only exception are calls to “eval” or similar functions. 

As Perl, PHP etc. all take plain-text files as their input, they have to process these files byte by byte. Agreed, the better ones parse the source into an intermediate form, so that e.g. loops will not execute more slowly because of longer variable names or a more verbose programming style. But they still have to take the hit once, during the conversion from the text form to the intermediate form.

I have experienced this first hand. Uboot has about 350k lines of code (which is not unreasonable: the system provides mail, SMS, photo galleries, blogs, subscriptions, and many more features, some of which are no longer active). That takes about 4 CPU-seconds to convert to intermediate code (maybe less these days; that was about 2 years ago). Each server runs about 30 instances of that code, so restarting a webserver costs about 30 × 4 = 120 CPU-seconds. It’s down for about 2 minutes, doing 2 minutes of useless work!

I have been told often enough, since working at Uboot, that I use the language wrong, that my programming is too “Java style”. The solution, I’m told by experienced Perl web developers, is simply not to write 350k lines of reusable library code, but instead write a simple large script with all the code rolled together. It starts faster, runs faster, and consumes less memory. And I’ve tried it: on some performance-critical sections I have indeed manually copy-pasted sections of code together to form one simple script, and it really does compile and run orders of magnitude faster.

I’ve essentially manually done what I would like a compiler to do. But that’s not the way I want to program. I do not want to be rewarded at runtime for bad programming practice!

*Every* language bears this cost because they *all* have to parse the code at some point to turn it into either bytecode or machine code.

That is very true, but some languages do this on your build machine, not on your production machines when you start the service.

Also, doing this on your build machine means you can perform more expensive optimizations, as you don’t have to worry about how long those optimizations take, which you do if the compiling means your service starts slower.

No linker

Your argument here is about memory footprint. This is a total non-starter on any modern operating system that does demand paging. If huge sections of your ruby/perl/python/whatever library are not used, the OS will never page them into RAM.

This depends on where you wish to deploy. On a web server, it certainly doesn’t matter.

On Uboot I wrote the “Uboot Joe”, a program you can download to your Windows computer. I made the mistake of writing it in Java. To distribute it, I bundled the whole JVM (as most users won’t have one), which includes all sorts of things I never used. I also included XML-RPC libraries (which no doubt include methods I never used), as well as my own code. The entire bundle came to 15MB. Our users had to download all that just to get a program sitting in the tray, connecting to the Uboot servers and popping up a few notifications. The size of the download was considered one of the reasons why the program was not successful.

Yet cutting out unused functions via a linker is not rocket science. All C linkers do this (as far as I know).

I don’t think including the JVM was the wrong decision. The file would not have been so excessively big if the download had included the Java runtime, but only those classes and methods of it which I, or the libraries I used, could actually call at runtime.
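The pruning I wished for is, at its core, a reachability walk over the call graph, starting from the program’s entry point. The following is a minimal sketch. It assumes the call graph has already been extracted, as a map from each function name to the functions it calls. Real shrinkers for Java bytecode also have to cope with reflection and dynamic class loading, which this ignores. DeadCodePruner and the function names in the usage are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DeadCodePruner {
    // callGraph maps each function to the functions it calls directly.
    // Returns every function transitively reachable from the entry points;
    // anything not in the result could be left out of the shipped bundle.
    static Set<String> reachable(Map<String, List<String>> callGraph,
                                 List<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<String>();
        Deque<String> work = new ArrayDeque<String>(entryPoints);
        while (!work.isEmpty()) {
            String fn = work.pop();
            if (!seen.add(fn)) {
                continue;                 // already visited (handles cycles)
            }
            List<String> callees = callGraph.get(fn);
            if (callees != null) {
                for (String callee : callees) {
                    work.push(callee);
                }
            }
        }
        return seen;
    }
}
```

Anything not in the returned set, such as an XML-RPC helper that the program never calls, could be left out of the download.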

I don’t write massive GUI apps in Perl.

Unfortunately I do write massive apps in Perl (albeit not GUI ones). And I did use Java to write a downloadable GUI app (albeit a simple one).

Multiple compile errors

I prefer to write a test, watch it fail, write the code to make it pass.

Right, but I’m tired of having to write test cases for trivial methods.

If I write a setter, I have to write a test case in Perl, otherwise it might fail because I made a spelling mistake. (I know from experience, writing test cases for even such trivial things really does actually help in Perl.)

In Java I don’t bother testing trivial methods; they just work.

Formatted Strings

I went through a long period of time wondering this myself. I thought sprintf was good enough all these years, why should I bother with iostreams. Well, I experienced one too many crashes from the simple error of mismatching the printf format specifier with argument type (%s -> int). These instances usually occur in logging statements that you don’t always encounter in normal code paths. This problem goes away completely with iostreams, as the most important benefit is type safety.

Ah, that’s true. And one of the good things about modern systems (article forthcoming) is that they know the types of things at runtime. If they don’t (C++ by default), then I agree with you completely.

I suppose my point related more to needlessly leaving out good things which existed in the past. Java had to wait until 1.5 to get printf (and until 1.4 for regular expressions). One should be more aware of the history of programming languages, and of what has already been thought of.
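As a small illustration of runtime type knowledge making printf-style formatting safe: Java’s String.format checks each argument against its conversion at runtime. On a mismatch it throws an IllegalFormatException subclass rather than misreading memory. It also supports explicit argument indices such as %2$s, so parameters can be reordered. SafeLog is a hypothetical helper. It sketches how a logging statement on a rarely exercised path might guard against a bad format string.

```java
import java.util.IllegalFormatException;
import java.util.Locale;

final class SafeLog {
    // Formats a log line. A mismatched specifier ("%d" given a String) throws
    // IllegalFormatConversionException and a missing argument throws
    // MissingFormatArgumentException; both extend IllegalFormatException.
    // Here the problem is reported in the output instead of crashing the caller.
    static String format(String pattern, Object... args) {
        try {
            // Locale.ROOT keeps the digits and separators locale-independent.
            return String.format(Locale.ROOT, pattern, args);
        } catch (IllegalFormatException e) {
            return "[bad log format \"" + pattern + "\": " + e + "]";
        }
    }
}
```

Whether to swallow the error like this is a design choice. In a unit test you would rather let the exception propagate. Usage: SafeLog.format("%2$s %1$s", "world", "hello") yields "hello world".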

Auto-creation of variables

I agree with you on this one. It should be noted this is considered horribly bad practice in Perl now. Adding one line, “use strict;”, stops this from happening and every program I write begins with that. I think the PHP folk have long since started declaring and initializing variables for the most part. So it didn’t work.

True, “use strict” helps.

Alas, many languages such as PHP do not have such a “use strict”.

However, even in Perl with “use strict”, you can still misspell a function/method name and that will only get picked up at runtime (assuming you unit-test or click-through that path, otherwise it will go unnoticed), and if you misspell an attribute name in a $self hash, that only gets picked up at runtime.

The flexibility that Perl offers (you can fill the $self hash with anything, and write an AUTOLOAD method which gets called when a method does not exist) means that it is not possible to check those things at compile-time. However, for me the benefit of catching errors at compile-time outweighs the benefit of that flexibility. But that is a matter of opinion, for sure.

Several features are dropped from new languages because the designers consider it “very dangerous, no _real_ programmer would ever use that”. As that’s a matter of opinion, we lose several powerful features just because they are… hmm… powerful. For example: GOTOs and Multiple Inheritance.

That’s certainly true. However, I would use that argument to say that the power one gains from totally dynamic runtimes and languages (such as Perl’s $self hash and AUTOLOAD mentioned above) is too great, as it means certain static checks cannot be done. But that’s a matter of opinion, for sure.

If it’s Turing-complete, your language is ultimately fine.

I’m not sure about that. For me, a programming language is firstly a communication tool from one programmer to another programmer (or to the first programmer, but later). Secondly it is a way to express as many invariants as possible. Only thirdly is it a way to command the machine (which, as you say, all languages, including assembler, are capable of).

In that respect, one should choose a language firstly giving you maximum expressiveness (e.g. using an object-oriented language to program an object-oriented design, using a language which does not penalise you for creating libraries even if not all functions in the library are used in every program, etc.).

And secondly one should choose a language which enables you to express as many invariants as possible (e.g. the object being passed here should always be a User, this number should always be between 2 and 20, this reference should never be null), serving both as mandatory documentation and as a way for a computation process (e.g. compiler) to check as many of these invariants as possible.
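The following is a minimal sketch of what expressing those three invariants can look like in Java. User, TeamSize and Team are hypothetical classes. The compiler checks the “always a User” invariant. Java has no range types, so “between 2 and 20” and “never null” are checked at runtime, once, at construction.

```java
import java.util.Objects;

// Hypothetical domain type: being a distinct class means the compiler
// rejects, say, an email-address String where a User is expected.
final class User {
    final String name;

    User(String name) {
        this.name = Objects.requireNonNull(name, "name");   // never null
    }
}

// Hypothetical value type for "this number is always between 2 and 20".
final class TeamSize {
    static final int MIN = 2;
    static final int MAX = 20;
    final int value;

    TeamSize(int value) {
        // Checked once here; every TeamSize that exists is known to be valid.
        if (value < MIN || value > MAX) {
            throw new IllegalArgumentException(
                "team size must be between " + MIN + " and " + MAX + ", got " + value);
        }
        this.value = value;
    }
}

final class Team {
    final User owner;
    final TeamSize size;

    // The parameter types document and enforce "always a User" and "always a
    // valid size" at compile time; requireNonNull adds "never null".
    Team(User owner, TeamSize size) {
        this.owner = Objects.requireNonNull(owner, "owner");
        this.size = Objects.requireNonNull(size, "size");
    }
}
```

The constraints double as documentation. Anyone reading Team’s constructor sees what it accepts without opening its body.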

Programming Languages: Is newer always better?

Wednesday, March 26th, 2008

I constantly hear the belief that modern programming languages and environments are better than older programming languages: more productive, easier to use, and so on. It would stand to reason: nobody would make a new programming language with worse features than an existing one. Or would they?

Everyone seems to think that this is a fact. But surprisingly, it’s not. There are many features in older programming languages which are not present in today’s languages. I predict these features will be re-invented by the next generation of programming language authors, and everyone will think they are geniuses for having come up with these ideas. But at the same time, those new languages will omit most of the good points of today’s languages. This cycle can go on forever.

It’s like the recurring cycle of “the network” vs “the standalone computer”.

  • Central - IBM used to make mainframe computers, which one would access from terminals, i.e. central computing power, distributed usage.
  • Local - But those computers were slow because they were remote. Then e.g. Sun invented the “workstation”. The PC then followed. Local power to everyone.
  • Central - Then the web happened. Suddenly everything was remote again. “All you need is a browser!”. No local software installation nightmare. (Perhaps) independence from the single operating system vendor.
  • Local - And now “using the web offline” is back in fashion. So that’ll be local computing again then.

A few facts, for those who think there was no programming before JavaScript and the web:

  • 1957 - Fortran released: expressions, variables, loops, subroutines
  • 1959 - LISP released: treating functions as data, enabling higher-order programming
  • 1967 - Simula 67 released: Object-oriented programming

Consider the following:

  • Variable Bounds. Ada, developed for the American military with a strong emphasis on program correctness, allows one to define bounds on variables. For example “array with index between 1 and 100”, “0 and 10”, or a number “not more than 5”. Most variables, in reality, have allowed ranges. Why not express that in the program? It’s more self-documenting, and it allows the run-time, and to an extent the compiler, to check the constraints. Isn’t minimizing bugs important to more than just the military?
  • Strict typing. If you know an object being passed to a function is a “User”, it’s no good being passed an “Email Address”. The set of operations those objects can perform are completely different, so even if the programming language is “advanced” enough to be able to accept the parameter, the first method call to the object will fail. Why not express that, and let the compiler check it? C++ can do it (since 1983), so let’s use that, not Perl, which can’t. Recently I read an article making a joke about casting everything to a string, but in reality that’s the default behaviour (in fact the only behaviour) of all scripting languages.
  • Knowing what’s going on. In C, it’s well defined what “0” means or what the string “abc” in a program means, and so on. Ask a C programmer if 0==NULL and ask a PHP programmer if 0==null, and see a) their reaction times and b) whether they’re correct. The C programmer will know fast and be correct; the PHP programmer will not. Who do you think writes programs with fewer subtle bugs?
  • Enumerated types. Is a user “active”, “disabled”, “inactive”? Such options are common to all domains. C can define an enumerated type since ANSI C (1989) and Lisp since 1959. Why did Java have to wait until Java 5.0 (in 2004), and why do we have to create unreadable programs with languages like Ruby which can’t do them at all? For example, what does the function error_log("user not found", 2) do in PHP; what does the 2 mean?
  • No compiler. Every byte in an interpreted language costs time to interpret. So it makes sense to have short variable names, fewer comments, for run-time efficiency. Is this the sort of programming style one should be encouraging?
  • No linker. You can build big libraries in a linked language, and only those functions used by the program (or used by the functions used by the program) will be included in the final executable. In Java, PHP etc, all the code you use is available all the time, taking up memory. I am often criticized for writing “too many libraries”, or code being “too object-oriented” in scripting languages, which is a fair criticism, as that code will run slower. However is it really an improvement to remove this function-pruning feature, which means bad programming practices will produce more efficient code?
  • Multiple compile errors. Why do modern programming languages such as PHP only tell you the first error in your program, then abort? This is laziness on the part of the compiler writer. Old compilers tell you all the errors in your program, so you can correct them all, without having to correct one, retry, correct next one, retry, and so on.
  • Formatted strings. There is nothing wrong with the format concept behind C’s “sprintf” function, originating from 1972. You can print numbers and strings, and specify precision, field length and so on. (Apart from the inability to reorder parameters.) Why did C++ introduce the “<<” notation? (At least you can still use printf in C++.) Why is this re-invented, worse, in .NET? Why did Java have to wait until Java 5.0 to get this feature? Why do we have to reinvent the wheel (worse) all the time?
  • Auto-creation of variables. When programming languages like C were created, the authors made the decision that it was an error to use a variable without declaring it. This caught all sorts of errors such as misspellings of variables. Why have these decisions been forgotten, and every scripting language allows you to just use variables without declaring them? This means hours of searching for bugs when you simply misspell a variable name, something that’s going to happen to everyone at some point. We’re only human and we have to take that into account.

The above is a list of things that have got worse over the last 2 decades, i.e. they haven’t merely failed to improve by staying the same; they have actually got worse.

Java gotcha: anArray.hashCode isn’t deep

Thursday, February 14th, 2008

Every object has a hashCode and an equals method. These are used to determine where to place an object within a hashing algorithm, and if two objects with the same place in the hashing algorithm actually are the same, respectively. If you want to add objects to a Set—which stores only unique objects—it uses these methods to determine whether two objects are the same and thus shouldn’t both be stored.

If you have code like:

Set<byte[]> uniqueArrays = new HashSet<byte[]>();
uniqueArrays.add(new byte[] { 1,2,3 });
uniqueArrays.add(new byte[] { 1,2,3 });
uniqueArrays.add(new byte[] { 1,2 });
System.out.println(uniqueArrays.size() + " unique byte arrays");

This code prints 3. You might expect it to print 2, as there are only two distinct array contents. But arrays don’t override hashCode and equals. They inherit Object’s identity-based versions, so two different arrays with the same contents are never considered equal, and the Set keeps all three. This is in contrast to, for example, the String class, which does consider the String’s contents when computing the hashCode.

Set<String> uniqueStrings = new HashSet<String>();
uniqueStrings.add(new String("123"));
uniqueStrings.add(new String("123"));
uniqueStrings.add(new String("12"));
System.out.println(uniqueStrings.size() + " unique strings");

This code prints 2. (The slightly strange-looking “new String” here is to make sure that there are actually different object instances with the same content being passed to the add method; otherwise the Java compiler would use the same object instance for the two calls, as the string-content is the same.)

The solution is to use the Arrays.hashCode(anArray) method, together with Arrays.equals(a, b) for the equality test.

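This isn’t particularly convenient if you want to store unique arrays in a set. But if you have an object with e.g. a byte[] instance variable, then you can implement the hashCode and equals methods on that object to use Arrays.hashCode and Arrays.equals, or you can key a map on the array’s contents:

// Keying on Arrays.hashCode(anArray) alone would drop distinct arrays whose hashes collide;
// ByteBuffer's equals and hashCode both compare the bytes (don't modify anArray meanwhile).
Map<ByteBuffer, byte[]> map = new HashMap<ByteBuffer, byte[]>();
map.put(ByteBuffer.wrap(anArray), anArray);
Collection<byte[]> uniqueByteArrays = map.values();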
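For the wrapper-object approach, here is a minimal sketch. ByteArrayKey is a hypothetical class. It gives a byte[] value semantics by implementing both equals and hashCode in terms of the array’s contents, since a HashSet needs the two to agree.

```java
import java.util.Arrays;

// Hypothetical wrapper: two keys are equal when their bytes are equal,
// so a HashSet<ByteArrayKey> treats them as duplicates.
final class ByteArrayKey {
    private final byte[] bytes;

    ByteArrayKey(byte[] bytes) {
        // Defensive copy: if the caller later modified the array, the hash
        // code would change while the key sits in a set, breaking lookups.
        this.bytes = bytes.clone();
    }

    byte[] bytes() {
        return bytes.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ByteArrayKey
            && Arrays.equals(bytes, ((ByteArrayKey) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);   // content-based, consistent with equals
    }
}
```

Wrapping each array in a ByteArrayKey before adding it to a HashSet<ByteArrayKey> makes the first example print 2.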

Creating an Iterator for a streaming ResultSet in Java

Monday, February 11th, 2008

The Java Iterator interface requires one to implement a hasNext method, which determines whether there are any more items to iterate over. The MySQL driver’s implementation of the JDBC ResultSet object, if one uses streaming mode, throws an exception from its isLast method. (Streaming mode prevents the JVM from running out of memory, which it would do if it tried to fetch all the results at once.)

Therefore I’ve developed an Iterator class based on such a ResultSet, whose “next” method actually pre-fetches the row after the current one. The Iterator’s “hasNext” method therefore just returns whether a row was pre-fetched or not, and the “next” method returns the pre-fetched row and fetches the next one.

And in order to make this code reusable, it’s an abstract superclass, and you can implement a method in a concrete subclass which converts the row into an object of your choosing. And thus the concrete subclass will provide an implementation of Iterator<T> for your T.

And to make this code reusable to people other than me, I hereby make it available.

ResultSetIterator.java
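The file itself isn’t reproduced here. As a rough illustration of the approach described above (pre-fetch one row, never call isLast), a minimal sketch might look like the following. It is not the original ResultSetIterator.java. Wrapping SQLException in a RuntimeException and fetching the first row lazily are assumptions of this sketch.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Iterator over a (possibly streaming) ResultSet that never calls isLast():
// it always reads one row ahead instead.
abstract class ResultSetIterator<T> implements Iterator<T> {
    private final ResultSet results;
    private boolean started;        // has the first row been fetched yet?
    private boolean hasPrefetched;  // false once the ResultSet is exhausted
    private T prefetched;           // the row the next call to next() returns

    protected ResultSetIterator(ResultSet results) {
        this.results = results;
    }

    // Implemented by concrete subclasses: convert the current row into a T.
    protected abstract T convert(ResultSet row) throws SQLException;

    // The first fetch is lazy rather than in the constructor, so convert()
    // is never called before a subclass's own fields are initialised.
    private void ensureStarted() {
        if (!started) {
            started = true;
            fetch();
        }
    }

    private void fetch() {
        hasPrefetched = false;
        prefetched = null;
        try {
            if (results.next()) {
                // Convert now: the next results.next() moves the cursor and
                // this row's data is no longer available.
                prefetched = convert(results);
                hasPrefetched = true;
            }
        } catch (SQLException e) {
            // Iterator methods cannot throw checked exceptions.
            throw new RuntimeException(e);
        }
    }

    @Override
    public boolean hasNext() {
        ensureStarted();
        return hasPrefetched;
    }

    @Override
    public T next() {
        ensureStarted();
        if (!hasPrefetched) {
            throw new NoSuchElementException();
        }
        T current = prefetched;
        fetch();                    // read ahead before handing back this row
        return current;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("ResultSetIterator is read-only");
    }
}
```

A concrete subclass only implements convert. For example, returning row.getString("name") (for a hypothetical name column) gives an Iterator<String>.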

Reading row-by-row into Java from MySQL

Thursday, February 7th, 2008

Trying to read a large amount of data from MySQL into Java with a single query is not as easy as one might think.

I want to read the results of the query a chunk at a time. If I read it all at once, the JVM understandably runs out of memory. In this case I am stuffing all the resulting data into a Lucene index, but the same would apply if I was writing the data out to a file, another database, etc.

Naively, I assumed that this would just work by default. My initial program looked like this (I’ve left out certain things such as closing the PreparedStatement):

public void processBigTable() throws SQLException {
    PreparedStatement stat = connection.prepareStatement(
        "SELECT * FROM big_table");
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

It failed with the following error:

Exception in thread "main"
        java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2823)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763)
    ...
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1657)
    ...

The line it failed at was the executeQuery. As we can see from the stack trace, it’s clearly trying to load all the results into memory simultaneously.

I tried all sorts of things, but it was only after I looked at the MySQL JDBC driver code that I found the answer. In StatementImpl.java:

protected boolean createStreamingResultSet() {
    return ((resultSetType == ResultSet.TYPE_FORWARD_ONLY)
        && (resultSetConcurrency == ResultSet.CONCUR_READ_ONLY)
        && (fetchSize == Integer.MIN_VALUE));
}

This boolean function determines if it’s going to use the approach “read all data first” or “read rows a few at a time” (= “streaming” in their terminology). I clearly need the latter.

You can specify, using the generic JDBC API, the number of rows you want to fetch at once (the “fetchSize”). Why would you have to set that to Integer.MIN_VALUE, which is −2^31 (−2147483648), in order to get streaming data? I wouldn’t have guessed that.

Basically, this very important decision about which approach to use, which in my case amounts to “program works” or “program crashes”, depends on whether three variables happen to be set to particular values. I don’t know whether this is in the documentation (I didn’t find it), nor whether this decision is guaranteed to be stable, i.e. won’t change in some future driver version.

Now my code looks like the following:

public void processBigTable() throws SQLException {
    PreparedStatement stat = connection.prepareStatement(
        "SELECT * FROM big_table",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
    stat.setFetchSize(Integer.MIN_VALUE);
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

This code works, and reads chunks of rows at a time.

Well, I’m not sure whether it reads chunks of rows at a time, or just one row at a time. I hope it doesn’t read one row at a time, because that would be very inefficient in terms of the number of round trips between the software and the database. I assumed this was what the fetchSize parameter controlled, so you could tune the size of the chunks to meet your particular latency and memory setup. But being forced to set it to a large negative number in order to get it to work means one has no control over the size of the chunks (as far as I can see).

(I am using Java 6 with MySQL 5.0 and the JDBC driver “MySQL Connector” 5.1.15.)

Random unreproducible Java error of the day

Monday, January 21st, 2008

I’m really starting to be of the opinion that Java Servlets, at least when using Tomcat and the other open source tools, just don’t work. I mean, surely it can’t be difficult to implement a Servlet container or a logging framework!

I just tried to start Tomcat and it refused to start because of the following error:

log4j:ERROR Error occured while converting date.
java.lang.NullPointerException
  at java.lang.System.arraycopy(Native Method)
  at java.lang.AbstractStringBuilder.getChars
  at java.lang.StringBuffer.getChars
  at org.apache.log4j.helpers.ISO8601DateFormat.format
  at java.text.DateFormat.format
  ...
  at org.apache.log4j.Category.log
  at org.apache.commons.logging.impl.Log4JLogger.error
  ...
  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt
  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run
  at java.lang.Thread.run
log4j:ERROR Error occured while converting date.

So I just hit “start” again, and this time it starts without error.

And people trust their mission-critical server architecture to this stuff!

Web software front-end test cases

Wednesday, January 16th, 2008

Recently I was working on a website which was developed in PHP without a web framework. A lot of things were programmed manually which would normally be taken care of by a web framework (for example things like: in case of an error, the HTML fields on the form are re-populated on the response page).

So I came up with an extensive set of test cases, and made sure I used them all on every field on every page.

These are the tests, in no particular order.

  1. On a form with radio buttons and text fields next to the radio buttons, clicking on the radio button should position the text cursor in the text field (as typing there is the next thing you’re always going to want to do).
  2. Clicking on the text field by a radio button should select the radio button.
  3. Checkboxes and radio buttons which have text beside them must have the text in a <label> so that clicking the text selects the checkbox or radio button.
  4. On a <form>, pressing return must do the same as clicking “OK”.
  5. Pressing the TAB key on the keyboard should progress from one field to the next in a reasonable order.
  6. Every action where something is done (e.g. delete an FTP account) should have a confirmation text on the result page like “ftp account XYZ deleted”. I think it’s important to include the name of the object being acted upon in the message.
  7. All forms should use “post” and “get” appropriately. People have various strong views about when to use one and when to use the other. But for me the difference is the browser’s “are you sure you want to repost?” message when you click refresh. Do you want it? If so, use “post”; otherwise use “get”. Also consider bookmarkability.
  8. Is a reasonably small amount of HTML generated? E.g. on Uboot, displaying the address book page requires about 1MB of HTML to be downloaded to the browser (not including any external CSS, graphics etc.). That’s too much.
  9. The back button should work everywhere. (E.g. Uboot multimedia gallery: click on a picture, click back, you’re at page 1 of the gallery rather than the page you were on.)
  10. Strings too long? Then “…” should be displayed, to avoid breaking the layout.
  11. If breadcrumbs are used (e.g. main page > hosting > ftp accounts > ftp account ‘x’), then they must work. Ideally they should contain data such as “ftp account ‘x’”.
  12. While loading a page on a slow connection, is the text readable before the background image loads? E.g. white text on a black background image may not be readable before the image loads. There should be an alternative flat-colour background of a similar colour. (Thanks Helge for pointing this out a few years ago!)
  13. While loading the page, does the page move around a lot due to width=x height=x attributes missing on an <img> tag? It’s annoying when you start to read the text on the page and then it moves a few seconds later just because some logo has finally loaded.
  14. Every time there is an error in the form and the page is reloaded with the error: is the original data still in the form?
  15. In all places where the user may enter free text: do unusual non-Latin-1 characters work correctly, both when entering and when displaying them? Do " and ' work correctly? Are < > & displayed correctly? Is all of this written to the database correctly? (Sometimes the browser sends &#123;-style character references to the server. If these aren’t escaped by the display code, the character may appear to have been processed correctly. But you don’t want &#123; strings in your database data.)
  16. Type long values into text fields. Values should be neither silently truncated nor cause an internal error (the default behaviour if you’re using MySQL or Oracle, respectively).
  17. If there are any id=nn type parameters in the URL, then adding or subtracting 1 from the id should not allow you to view other people’s content.
  18. Is the <title> tag set usefully? This matters when you use the down-arrow by the browser’s “back” button to decide how far back you want to go. A list of entries reading “MyWebsite”, “MyWebsite”, “MyWebsite” is not very helpful.
  19. Test in a high-latency environment, such as over a UMTS connection. If there are a lot of redirects, images referenced from CSS referenced from HTML, or JavaScript making AJAX calls, processing the result and then making more calls, then the page will be slow, even though it works fine over a LAN connection.
  20. Test in an unreliable environment, e.g. where packets get lost. Google Spreadsheets, when you type in a value, lets you continue editing the page while it’s sending the value to the server. But if there’s an error or timeout with the sending you see an error, and the cell is reverted to its previous value. You simply have to type in the same data all over again. Instead of that it should remember your data, and display an warning “can’t connect to server right now; retrying…”
  21. If you type in URLs in capitals or mixed case: do they still work? (Thanks again Helge!)
  22. AJAX progress indicators: if you’ve written an AJAX site, does the user get any visual feedback when he clicks on an action? There needs to be some kind of feedback, e.g. Gmail’s “Loading…” indicator.
  23. Are the buttons large enough to be clicked easily? E.g. a confirmation page with “Yes” and “No” options displayed as links in a tiny font: they’re hard to click!
  24. Do the colours still work when you view your laptop screen at an odd angle? A site I was using recently had a white background (what a surprise...) and highlighted the selected tool with a light-grey background. I’m sure that worked fine on their monitors, but at the airport with your laptop on your lap, the highlight is difficult to see.
  25. Does the site work, or at least fail gracefully, on old browsers? An immediate error message is preferable to letting the user type in lots of text and then lose it on pressing “OK” because of some browser incompatibility (e.g. MediaWiki on Safari 3 beta for Windows).
  26. What about small screens? My parents’ computer runs at 800×600, and I normally use my laptop at 1064×600. In Google Reader I can hardly see any feeds at that size; the whole screen is taken up with toolbars, menus, etc.
  27. Is the session timeout compatible with the company’s policy? E.g. do you really need to make the user log in again after 10 minutes, just because he nipped off to a meeting?
  28. If the session expires, what happens to the user’s data? Composing a long email and clicking “send” only to receive the response page “please log in again!”, and losing the email, is wrong.
  29. If it’s possible to log in from every screen (e.g. “logged out” shown at the top right of the screen, as on uboot), then logging in should take you back to the screen you were on, because that’s what the user wants.
  30. If you are logged out and go straight to a particular screen, e.g. via a URL or bookmark that was saved or sent while someone was logged in, do you get a useful message? A redirect to the homepage is also OK, but “general error” is not.
  31. If you go to a page with a form, is the text cursor already in the first field? Or do you have to reach for the mouse and click on the first field in order to use the form on the page?
  32. Does the site look good on both LCD monitors and conventional monitors? Some colour combinations (e.g. dark green on a light green background) are perfectly readable and look nice on LCDs, but are completely unreadable on conventional monitors.
  33. If a browser window is open, and a user is logged in, and from another browser (or directly in the database) that user’s password is changed, the user deleted, or disabled, is the first browser immediately logged out?
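Items 15 and 17 are the two on this list that come down to a few lines of server-side code. As a minimal sketch, assuming a Python back end (the original doesn't name one), and with `render_comment`, `load_document`, `Forbidden` and the `documents` dictionary all being hypothetical stand-ins: store the text exactly as the user typed it and escape it only at display time, and check ownership of every id that arrives in a URL rather than trusting it.

```python
import html


def render_comment(text: str) -> str:
    """Turn raw user text (as stored in the database) into an HTML fragment.

    Escaping happens only here, at display time, so the stored data keeps
    the characters the user typed rather than entity references.
    """
    # quote=True also escapes " and ', which matters inside attribute values.
    # Non-ASCII characters such as ü or 日 pass through unchanged.
    return "<p>" + html.escape(text, quote=True) + "</p>"


class Forbidden(Exception):
    """Raised when a user asks for a record that belongs to someone else."""


def load_document(doc_id: int, current_user: str, documents: dict) -> str:
    """Return a document body only if current_user owns it.

    `documents` stands in for a database table mapping id -> (owner, body).
    """
    owner, body = documents[doc_id]  # KeyError if the id doesn't exist
    if owner != current_user:
        # Guessing a neighbouring id must not reveal another user's data.
        raise Forbidden(f"document {doc_id} does not belong to {current_user}")
    return body


print(render_comment('Tom & "Jerry" <b>'))
# <p>Tom &amp; &quot;Jerry&quot; &lt;b&gt;</p>
```

With `documents = {41: ("bob", ...), 42: ("alice", ...)}`, a request for id 42 by "alice" succeeds, while changing the URL to id 41 raises `Forbidden` instead of showing Bob's document.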

The cycle of programming languages

Thursday, January 10th, 2008

The following cycle never ceases to amaze me:

  1. People learning programming find “real” languages such as C++ or Java filled with too many “complex” constructs.
  2. They find or invent languages such as Javascript or PHP or BASIC and think they can get the job done without “unnecessary complexity”.
  3. As these programmers gain experience, they write increasingly complex programs, and find that constructs such as classes, inheritance, exceptions, generics/templates, errors upon encountering undefined variables, and static typing help them debug their code and write better code more quickly.
  4. They then add these features to their programming languages and everyone rejoices, believing they’ve done something new and great.
  5. Other programmers, just starting out, find the current set of languages too complex, as they contain features such as classes, inheritance, exceptions, etc. that they don’t yet realize they need: go to step 1.

For example, PHP5 includes Java-like features such as classes, exceptions, and “phpdoc”. When an uncaught exception is displayed, its $ex->__toString() method even returns a stack backtrace, just like Java. (But PHP’s global errors, which are different from exceptions because they were invented before PHP5, do not.)

And now Axel has blogged enthusiastically about the improvements to Javascript in the next version. I agree that these are great improvements, but I believe it is incorrect to applaud Javascript for them. They are simply useful features which already exist in other languages. One can applaud Javascript for realizing they are useful, but at the same time one must observe that its designers did not realize this when designing previous versions.

I also started programming using BASIC. It did not have advanced constructs such as abstract classes and exception handling. I did not know I needed them when I started programming. So I can certainly sympathize with people at step #1 above.

But it is incorrect to treat languages such as (the original versions of) Javascript, PHP or BASIC as anything other than beginners’ languages, useful as a stepping stone in the process of learning to program. If you want a programming language for writing expressive and maintainable software, it would seem less effort just to use existing languages which already have the necessary constructs, rather than extending beginners’ languages with constructs identical to those in the existing languages.

Next PHP uselessness of the day

Tuesday, December 11th, 2007

There is the option “php -l” which checks the validity of a source file. Obviously it doesn’t do a wonderful job as it doesn’t detect misspelt variable or method names; but I suppose it’s better than nothing.

So I run this recursively on all the files under a certain directory. For reasons I won’t go into here, there are Postscript font files checked into the source directory. For these files, “php -l” outputs:

No syntax errors detected in ./pdflib/fonts/php_Times-Italic.afm

I assert “php -l” is not very useful.
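One workaround is to pass only PHP source files to “php -l” in the first place. As a minimal sketch in Python (chosen only for illustration; `php_files` and `lint_all` are hypothetical names), assuming `php` is on the PATH and that “php -l” exits with a non-zero status when it finds a syntax error:

```python
import subprocess
import sys
from pathlib import Path


def php_files(root):
    """Return every *.php file under root, sorted, skipping everything else
    (e.g. .afm font metrics that php -l would happily "validate")."""
    return sorted(p for p in Path(root).rglob("*.php") if p.is_file())


def lint_all(root):
    """Run `php -l` on each PHP file under root; return the files that failed."""
    failed = []
    for path in php_files(root):
        result = subprocess.run(["php", "-l", str(path)],
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(path)
            # Depending on configuration the message may go to stdout or
            # stderr, so show both.
            sys.stdout.write(result.stdout + result.stderr)
    return failed
```

Calling `lint_all(".")` from the project root lints every `.php` file and returns the ones that failed; the fonts are never passed to “php -l” at all. It still won’t catch misspelt variable or method names, of course.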

Unbelievable PHP limitation of the day

Monday, December 10th, 2007

If one defines a class with the member variable:

protected static $bytes = 12582912;

then that’s fine. However if one defines it as:

protected static $bytes  = 12*1024*1024; // 12 MB

then that gives a compile error:

Parse error: syntax error, unexpected '*', expecting ',' or ';'

I know of no other language I’ve ever programmed in (including BASIC and C) where you can write a value but not an expression.

How broken is that!

Putting spaces around the *, or adding brackets around the whole expression, does not help.

(PHP 5.2.0)
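For contrast, here is the same declaration in Python, chosen only for illustration; the class name `UploadLimits` is hypothetical. The initializer is an ordinary expression, evaluated once when the class body runs. (PHP itself later added constant scalar expressions in PHP 5.6, which allow this kind of initializer in property declarations.)

```python
class UploadLimits:
    # An expression is fine here; no need to precompute 12582912 by hand.
    max_bytes = 12 * 1024 * 1024  # 12 MB


print(UploadLimits.max_bytes)  # 12582912
```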