Programming Languages: Is newer always better? (Part 2)

By Adrian Smith | 28 Mar 2008 | 2400 words | 11 mins to read

Let me respond to some of the comments left at Programming Languages: Is newer always better?

Knowing what's going on

This is a terrible example. You are really arguing that PHP programmers don't know how their language works while C programmers do. This is a horribly wrong-headed assertion. How about I counter your straw man with one of my own. I know plenty of new (as of the last 5 years) C programmers who have no idea that 0 is equivalent to NULL.

Yeah you're right, this point is probably untrue.

At the time I wrote it I was getting frustrated with PHP programmers who didn't know the difference between == and ===. I still have the feeling that Java and C books tend to concentrate firstly on the fundamentals of the available data types and operations, whereas introductions to PHP tend to focus on just writing code that looks OK and seems to do the right thing (an attitude which leads one to write programs with subtle bugs).

But having thought about it a bit more, that probably has more to do with my exposure to books written for people who can already program, versus articles about PHP on the web, and probably has nothing to do with the language itself.

Strict typing

You want the compiler to check that a method can only receive an object of type SomeObject while I want any method to be able to receive any object as long as it responds to (or has the same interface) as SomeObject.

I thought this way for quite some time, when I was programming Objective-C: that it was cool to write code which took any object as long as it responded to a certain set of methods, and that asserting an object must be of a particular class or respond to a particular interface made my code less flexible and reusable.

However after a time, looking at both my own Objective-C code and that written by colleagues, I kept seeing methods like saveToDb:anObject. Such a method assumes that the parameter anObject responds to certain methods (by virtue of the method's body calling those methods on its parameter), yet this is not documented in the method's prototype (although it could have been placed in a comment had the programmer chosen to), and cannot be checked at compile-time. It gets worse when anObject is simply passed on to some other function: you then have to open that function in the editor to determine what type of object you can pass. And you're out of luck if you don't have the source code. Even if you do document the type in a comment, you can't build an IDE where you can just click on the type and have it open the definition, immediately listing its methods and documentation.
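To make the saveToDb case concrete, here is a minimal Java sketch of the alternative. The names Persistable, User, tableName and toRow are invented for illustration, and the "database" is just a list of strings. The method still accepts any object that offers the right methods, but the requirement now sits in the signature, where the compiler (and an IDE) can see it.

```java
import java.util.ArrayList;
import java.util.List;

class SaveDemo {
    // Hypothetical interface: names exactly the methods saveToDb relies on.
    interface Persistable {
        String tableName();
        String toRow();
    }

    // Hypothetical domain class that opts in to being saved.
    static final class User implements Persistable {
        private final String name;

        User(String name) {
            this.name = name;
        }

        @Override
        public String tableName() {
            return "users";
        }

        @Override
        public String toRow() {
            return name;
        }
    }

    // Stand-in for a real database: collects the statements that would be run.
    static final List<String> log = new ArrayList<>();

    // The parameter type documents what callers may pass, and the compiler enforces it.
    static void saveToDb(Persistable object) {
        log.add("INSERT INTO " + object.tableName() + " VALUES ('" + object.toRow() + "')");
    }

    public static void main(String[] args) {
        saveToDb(new User("adrian"));
        // saveToDb("just a string");  // does not compile: String is not Persistable
        System.out.println(log);
    }
}
```

Any class, in any library, can implement Persistable, so the flexibility I liked in Objective-C is largely kept; what is lost is only the ability to pass something that does not have the methods.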

C, Fortran, C++, Java and Pascal require static definitions and suffer greatly for it. C++ (again) and Java (again) have templates/generics to fake this kind of feature and suffer horribly for it.

I have to agree that what really has improved in modern languages and runtimes (a post about those improvements will follow) is that the runtime knows what type of object a reference points to. Using void* in C is nasty.

No, Perl isn’t strictly typed and can’t do what you’re saying. But once again, you can check things. You can validate that an Object is a particular class or descendant of a particular class. As with the variable bounds, you can validate your data.

This is true, you can do that. But it doesn't happen at compile-time, which means that if you didn't unit-test or click through that code path, you don't see the error. And other programmers may choose not to put the acceptable ranges or types even in comments; then you've got code which takes $x, and you're really stuck. (Although I suppose if you work with programmers who don't care to write readable code, you're stuck no matter what language they're programming in; you can write unreadable code in any language.)

Enumerated types

This is a great feature modern day languages have though maybe it isn't called "enumerated type." Ruby has symbols so you can say your types are :hot, :warm, :lukewarm, :cold. These symbols mean the same thing everywhere. To use your PHP example in Ruby, how about error_log("user not found", :user_not_found). In this example, you don't know the languages you are criticizing.

Well that's great that Ruby has such a feature, but Perl and PHP still do not. If they did, PHP wouldn't have defined its error_log function that way. So when I'm programming in those two languages (which I do a lot, alas) I am forced to write less readable code. (Even after defining constants, I can still pass gender_male with a value of 3 to a function expecting a state where 3 means the user has been deleted, and it won't even exit with an error, let alone give me a compile error: it will simply do the wrong thing.)
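For comparison, this is roughly what the gender/state mix-up looks like in a language with real enumerated types. It is only a sketch in Java; Gender, AccountState and describe are made-up names, not code from any real system.

```java
class EnumDemo {
    // Two unrelated enumerations. Unlike integer constants, they cannot be confused.
    enum Gender { MALE, FEMALE }

    enum AccountState { ACTIVE, SUSPENDED, DELETED }

    // Accepts only an AccountState; the signature is the documentation.
    static String describe(AccountState state) {
        switch (state) {
            case ACTIVE:
                return "user is active";
            case SUSPENDED:
                return "user is suspended";
            case DELETED:
                return "user has been deleted";
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(AccountState.DELETED));
        // describe(Gender.MALE);  // does not compile: Gender is not AccountState
        // describe(3);            // does not compile: int is not AccountState
    }
}
```

The two commented-out calls are the interesting part: the mistake described above is rejected before the program ever runs.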

No compiler

Please point me to a modern language that is slower with longer variable and method names. Ruby, Perl, Python, OCaml and Erlang all "compile" the code to an intermediate form (bytecodes) and then execute those.

What? Are you suggesting that a comment in a procedure is parsed every time the procedure executes? I don’t know a single interpreted language implementation that would do that. The only exception are calls to "eval" or similar functions. 

As Perl, PHP etc. all take plain-text files as their input, it follows that they have to process these files byte by byte. Agreed, the better ones parse the source to an intermediate form where e.g. execution of loops will not be slower for longer variable names or a more complex programming style, but they still have to take the hit once, during the conversion from the text form to the intermediate form.

I have experienced this first hand. Uboot has about 350k lines of code (which is not unreasonable: the system provides mail, SMS, photo galleries, blogs, subscriptions, and many more features, some of which are not active any more). That takes about 4 CPU-seconds to convert to intermediate code (maybe faster these days; that was about 2 years ago). On each server we have about 30 instances of that code running. That means when we restart a webserver, it's down for about 2 minutes (30 instances at about 4 CPU-seconds each). It does 2 minutes of useless work!

I have been told often enough, since working at Uboot, that I use the language wrong, that my programming is too "Java style". The solution, I'm told by experienced Perl web developers, is simply not to write 350k lines of reusable library code, but instead write a simple large script with all the code rolled together. It starts faster, runs faster, and consumes less memory. And I've tried it: on some performance-critical sections I have indeed manually copy-pasted sections of code together to form one simple script, and it really does compile and run orders of magnitude faster.

I've essentially manually done what I would like a compiler to do. But that's not the way I want to program. I do not want to be rewarded at runtime for bad programming practice!

*Every* language bears this cost because they *all* have to parse the code at some point to either turn it into bytes or machine code.

That is very true, but some languages do this on your build machine, not on your production machines when you start the service.

Also, doing this on your build machine means you can perform more expensive optimizations, as you don't have to worry about how long those optimizations take, which you do if the compiling means your service starts slower.

No linker

Your argument here is about memory footprint. This is a total non-starter on any modern operating system that does demand paging. If huge sections of your ruby/perl/python/whatever library are not used, the OS will never page them into RAM.

This depends on where you wish to deploy. For sure, on a webserver, this doesn't matter.

On Uboot I wrote the "Uboot Joe", which is a program you can download to your Windows computer. I made the mistake of writing it in Java. To distribute it, I bundled the whole JVM (as most users won't have one), which includes all sorts of things I never used; XML-RPC libraries (which no doubt include methods I never used); and my own code. The entire bundle came to 15MB. Our users had to download that just to get a program sitting in the tray, connecting to the Uboot servers, and popping up a few notifications. The size of the download was cited as one of the reasons why the program was not successful.

Yet cutting out unused functions via a linker is not rocket science. All C linkers do this (as far as I know).

I don't think including the JVM was the wrong decision. But the file would not have been so excessively big if the download had included only those classes and methods of the Java runtime which my code, or the libraries I used, could actually call at runtime.

I don’t write massive GUI apps in Perl.

Unfortunately I do write massive apps in Perl (albeit not GUI ones). And I did use Java to write a downloadable GUI app (albeit a simple one).

Multiple compile errors

I prefer to write a test, watch it fail, write the code to make it pass.

Right, but I'm tired of having to write test cases for trivial methods.

If I write a setter in Perl, I have to write a test case for it, otherwise it might fail because I made a spelling mistake. (I know from experience that writing test cases for even such trivial things really does help in Perl.)

In Java I don't bother testing trivial methods; they just work.

Formatted Strings

I went through a long period of time wondering this myself. I thought sprintf was good enough all these years, why should I bother with iostreams. Well, I experienced one too many crashes from the simple error of mismatching the printf format specifier with argument type (%s -> int). These instances usually occur in logging statements that you don’t always encounter in normal code paths. This problem goes away completely with iostreams, as the most important benefit is type safety.

Ah that's true. And one of the good things about modern systems (article forthcoming) is that they know what the types of things are at runtime. If they don't (C++ by default), then I agree with you completely.
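As an illustration of what "knowing the types at runtime" buys you, here is a small Java sketch (FormatDemo and tryFormat are invented names). A conversion that does not match its argument cannot read garbage from memory as it can with C's printf; the formatter inspects the argument's class and raises a well-defined exception. It is still a runtime error rather than a compile error, so the point about rarely executed logging statements stands.

```java
import java.util.IllegalFormatConversionException;
import java.util.Locale;

class FormatDemo {
    // Formats one argument, turning a conversion mismatch into a readable message.
    static String tryFormat(String pattern, Object arg) {
        try {
            // Locale.ROOT keeps the digits the same on every machine.
            return String.format(Locale.ROOT, pattern, arg);
        } catch (IllegalFormatConversionException e) {
            return "format error: %" + e.getConversion()
                    + " cannot take " + e.getArgumentClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryFormat("user id %d", 42));          // user id 42
        System.out.println(tryFormat("user id %s", 42));          // user id 42
        System.out.println(tryFormat("user id %d", "forty-two")); // format error: %d cannot take String
    }
}
```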

I suppose my point was more about needlessly leaving out good things which already existed. Java had to wait till 1.5 to get printf (and 1.4 for regular expressions). Language designers should be more aware of the history of programming languages, and of what has already been thought of.

Auto-creation of variables

I agree with you on this one. It should be noted this is considered horribly bad practice in Perl now. Adding one line, "use strict;", stops this from happening and every program I write begins with that. I think the PHP folk have long since started declaring and initializing variables for the most part. So it didn’t work.

It is true that "use strict" helps.

Alas many languages such as PHP do not have such a "use strict".

However, even in Perl with "use strict", you can still misspell a function/method name and that will only get picked up at runtime (assuming you unit-test or click-through that path, otherwise it will go unnoticed), and if you misspell an attribute name in a $self hash, that only gets picked up at runtime.

Of course, the flexibility that Perl offers (i.e. you can fill the $self hash with anything, and write an AUTOLOAD method which gets called when a method does not exist) means that it is not possible to check those things at compile-time. However for me the benefit of catching errors at compile-time outweighs the benefit of the flexibility. But that is a matter of opinion, for sure.

Several features are dropped from new languages because the designers consider it "very dangerous, no _real_ programmer would ever use that". As that’s a matter of opinion, we lose several powerful features just because they are... hmm... powerful. For example: GOTOs and Multiple Inheritance.

That's true for sure. However I would turn that argument around and say that the totally dynamic runtimes and languages (such as Perl's $self hash and AUTOLOAD mentioned above) are too powerful, and mean that certain static checks cannot be done. But that's a matter of opinion for sure.

If it’s Turing-complete, your language is ultimately fine.

I'm not sure about that. For me, a programming language is firstly a communication tool from one programmer to another programmer (or to the first programmer, but later). Secondly it is a way to express as many invariants as possible. Only thirdly is it a way to command the machine (which, as you say, all languages, including assembler, are capable of).

In that respect, one should choose a language firstly giving you maximum expressiveness (e.g. using an object-oriented language to program an object-oriented design, using a language which does not penalise you for creating libraries even if not all functions in the library are used in every program, etc.).

And secondly one should choose a language which enables you to express as many invariants as possible (e.g. the object being passed here should always be a User, this number should always be between 2 and 20, this reference should never be null), serving both as mandatory documentation and as a way for a computation process (e.g. compiler) to check as many of these invariants as possible.
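The three example invariants can be sketched in Java as follows. TeamSize, User and invite are hypothetical names chosen for the illustration. Note the limits: the compiler checks only that a User and a TeamSize are passed; the range and the null checks still run at runtime, but they run once, at the boundary, and every later use of the value can rely on them.

```java
import java.util.Objects;

class InvariantDemo {
    // Hypothetical value type: an int guaranteed to lie between 2 and 20.
    static final class TeamSize {
        final int value;

        TeamSize(int value) {
            if (value < 2 || value > 20) {
                throw new IllegalArgumentException(
                        "team size must be between 2 and 20, got " + value);
            }
            this.value = value;
        }
    }

    // Hypothetical domain type: a user always has a non-null name.
    static final class User {
        final String name;

        User(String name) {
            this.name = Objects.requireNonNull(name, "name");
        }
    }

    // The signature states "a User and a valid team size"; no comment is needed.
    static String invite(User owner, TeamSize size) {
        Objects.requireNonNull(owner, "owner");
        Objects.requireNonNull(size, "size");
        return owner.name + " may invite " + (size.value - 1) + " others";
    }

    public static void main(String[] args) {
        System.out.println(invite(new User("adrian"), new TeamSize(5)));
        // invite("adrian", 5);  // does not compile: wrong argument types
        // new TeamSize(21);     // compiles, but throws IllegalArgumentException when run
    }
}
```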

This article was written by Adrian Smith on 28 Mar 2008
