Archive for the ‘Coding’ Category

Transferring some hex. Sometimes gets replaced by string "INF". Why?

Thursday, May 10th, 2007

This was never going to work out. Data transfer interface. Our side in Perl and their side in PHP. Both scripting languages (bad) and not even the same scripting language (incompatible badness).

Over the data transfer interface, we are transferring users. Including a code to enable them to unsubscribe from an email newsletter. The first 7 characters of the code identify the users (digits) and the rest of the code is a hex string containing some security information.

All works great. But some users can’t use the code? It turns out on the destination system they have “INF” in the field instead of the code.

It turns out that some of these users have e.g. 1234567 to identify the user, and e.g. 123e1234567 as their hex code. That makes the security code “1234567123e1234567”. And that “looks like” a floating point number to Perl. But quite a big one. Almost as big as Infinity in fact, so might as well call it that.

I hardly think the flexibility we “won” through every data instance having its own type based on what its data “looks like” compensates for the anger of a segment of our users not being able to unsubscribe from their newsletter, or for the extra expense to the company of the time to debug this problem (which was by then an urgent problem: it affected only 0.6% of our users, so it was discovered only after the system went live).

P.S. my solution was to put a space in front of the code, which is taken off by the receiving system, so the data always “looks like” a string. But I wouldn’t like to guarantee that what “looks like” a string won’t change with the next version of the Perl SOAP client libraries we are using.
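For illustration, here is a minimal Java sketch of the failure and of the space workaround. The `LOOKS_NUMERIC` pattern and the `serialize` method are assumptions made up for this example; the actual rule used by the Perl SOAP client library is not shown in this post and may well differ.

```java
import java.util.regex.Pattern;

class TypeGuessing {
    // Hypothetical "looks like a number" rule (an assumption, not the real library's rule).
    private static final Pattern LOOKS_NUMERIC =
        Pattern.compile("[+-]?\\d+(\\.\\d+)?([eE][+-]?\\d+)?");

    // Mimics a serializer that picks a type from the text of the value.
    static String serialize(String value) {
        if (LOOKS_NUMERIC.matcher(value).matches()) {
            double number = Double.parseDouble(value);
            // A huge exponent overflows the double range and becomes infinity.
            return Double.isInfinite(number) ? "INF" : Double.toString(number);
        }
        return value; // treated as a plain string and passed through unchanged
    }

    public static void main(String[] args) {
        // Only digits plus a single "e": guessed to be a float, and it overflows.
        System.out.println(serialize("1234567123e1234567"));
        // Contains "a": does not look numeric, stays a string.
        System.out.println(serialize("1234567123a1234567"));
        // Leading space: does not look numeric, stays a string.
        System.out.println(serialize(" 1234567123e1234567"));
    }
}
```

Only codes whose hex part happens to consist of digits and exactly one "e" in the right place are hit, which would fit with so few users being affected.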

Class names repeating information stated in the package name

Sunday, May 6th, 2007

Classes in modern programming languages can be arranged in hierarchies, e.g. a Perl class might be called “Uboot::Message::Mail” or a Java class “com.uboot.message.Mail”.

In some programming languages (e.g. Perl) one always refers to the class by its full name (such as “Uboot::Message::Mail”) and never by its leaf name (e.g. “Mail”). For example:

use Uboot::Message::Mail;
my $mail = Uboot::Message::Mail->new();
print "it's a mail" if ($mail->isa("Uboot::Message::Mail"));

In other languages (e.g. Java) one almost always refers to classes via their leaf name, such as:

import com.uboot.message.Mail;
class MyClass {
   public static void main(String[] args) {
      Mail mail = new Mail();
      if (mail instanceof Mail) System.out.println("it's a mail");
   }
}

For those languages such as Perl, which require using the class’ full path at all times, it’s not necessary to repeat information in the leaf name that has been specified already in the path. For example, a class to model an entry in a Uboot address book might be in a directory called “Uboot/ABook” in which case the entry class can be called “Uboot::ABook::Entry”.

But in Java, you don’t want to have a class called “Entry” because, as soon as the “import” statement scrolls out of sight, you’ll not know if your instance, helpfully statically typed to be an “Entry”, is an address book entry, a guestbook entry, a blog entry, or any other conceivable type of entry. In that case the class needs to be called something like “com.uboot.abook.ABookEntry”.

Class names like “Uboot::ABook::ABookEntry” or “Uboot::Monitoring::MonitoringResult” are (only in languages such as Perl) needlessly redundant and long.

Perl / switch statement: Cool Limitation

Wednesday, May 2nd, 2007

Look at the documentation for the Perl switch statement. Look down the bottom at the “limitations” section. Look at the last limitation.

vi

Tuesday, May 1st, 2007

Here I am, programming using “vi” and, as usual, it’s annoying me. Why am I using it?

It’s just occurred to me, I remember from my childhood, my father would come home from work and complain about “vi”.

I wonder if my children will use “vi”?

Scripting languages are only for advanced programmers

Wednesday, April 25th, 2007

Why do people believe scripting languages are suitable for beginners to programming? This may be the way they were designed, but they end up having the opposite effect.

Essentially I think scripting languages are languages for experts - like when you can play the piano and get a really hard piece to play, to show off how good you are. In programming, if you can already write bug-free code using a normal language, why not try the same thing using a scripting language? If you can do it, it shows your intellectual dexterity and cleverness (although the result would in that case be the same, i.e. a working program, and in the case you failed, it would be a less-working program).

Look at the following: http://at2.php.net/manual/en/function.stripos.php

I understand that “warning” completely. Essentially this function returns zero to indicate the string occurs at position zero and false to indicate the string does not occur in the text. But they are both the same if compared with the PHP == operator. And for that reason there is a === operator, called “exactly equals”, which also tests the type of the thing being compared. With ===, 0 and false are not equal: (0 === false) is false.

I mean this is not simple stuff. Nor is it abstracting away from the details of programming.

How is this in any way better, or simpler, than:

  • C, where strstr returns NULL if the item is not found?
  • Java which
    • gives a compile error if you try and compare false to 0,
    • and returns -1 if the item is not found
  • A hypothetical language which threw a SubstringNotFoundException, which gave a compile error if not checked or thrown?
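To make the Java and "hypothetical language" alternatives concrete, here is a small sketch. `String.indexOf` is the real Java API; `SubstringNotFoundException` and `positionOf` are invented for this example, mirroring the hypothetical exception named in the list above.

```java
class SubstringSearch {
    // Hypothetical checked exception: callers must catch it or declare it.
    static class SubstringNotFoundException extends Exception {
        SubstringNotFoundException(String needle) {
            super("substring not found: " + needle);
        }
    }

    // Wraps indexOf so that "not found" can never be mistaken for a position.
    static int positionOf(String haystack, String needle)
            throws SubstringNotFoundException {
        int index = haystack.indexOf(needle);
        if (index == -1) {
            throw new SubstringNotFoundException(needle);
        }
        return index;
    }

    public static void main(String[] args) throws SubstringNotFoundException {
        // Found at the very start: position 0, which is a perfectly good int.
        System.out.println("hello".indexOf("he"));
        // Not found: -1, which cannot be confused with position 0.
        System.out.println("hello".indexOf("xyz"));
        // The next line would not compile, as int and boolean are not comparable:
        // if ("hello".indexOf("he") == false) { }
        System.out.println(positionOf("hello", "llo"));
    }
}
```

Forgetting to handle the "not found" case of `positionOf` is a compile error, not a surprise at run time.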

Scripting languages should just be banned in my opinion.

Exceptions: use them

Friday, April 20th, 2007

Exceptions have been around for a long time. There’s no reason not to use them.

I never want to see code such as this again.

if ( ! $user) return false;

We all know what happens with such code:

  • Nobody checks the return value
  • Especially if half the code is written using exceptions, and half using return values, then definitely nobody will check the return code
  • It breaks the linguistics of the language. A function called “getUser” should return a User, but a function called “deleteFile”: why does it return “true” or “false”?
  • Where are the log statements, stating why this function returned false? What if the function has multiple places to return false? Then (perhaps) the calling function can print the log “function deleteFile returned false” but that doesn’t tell you which of the multiple places failed.
  • What if you want to programmatically check the return result of the function? If all the errors return the same value, there’s no way you can programmatically respond to each error differently (as some might be permanent errors such as the file doesn’t exist, some might only be temporary errors such as network currently unavailable.)
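As a rough Java sketch of the alternative: a "deleteFile" that returns nothing and throws instead, so the caller can react differently to different failures. `Files.delete` and `NoSuchFileException` are standard Java; the class name, `describeOutcome` and the returned messages are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class DeleteFileExample {
    // Returns nothing: it either deletes the file or throws.
    static void deleteFile(Path path) throws IOException {
        Files.delete(path); // throws NoSuchFileException when the file is missing
    }

    // The caller can tell failures apart by exception type.
    static String describeOutcome(Path path) {
        try {
            deleteFile(path);
            return "deleted";
        } catch (NoSuchFileException e) {
            // Retrying will not help: the file is simply not there.
            return "permanent: no such file: " + e.getFile();
        } catch (IOException e) {
            // Any other I/O problem; this one might be worth retrying.
            return "possibly temporary: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("exceptions-demo", ".tmp");
        System.out.println(describeOutcome(file)); // the file exists here
        System.out.println(describeOutcome(file)); // now it is already gone
    }
}
```

The exception also carries the reason and a stack trace, which answers the "where are the log statements" question from the list above.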

Every language in use today, at least in the projects I work on, has ways of handling exceptions.

  • PHP5 has exceptions.
  • Java has exceptions.
  • C++ has exceptions.
  • Perl has “eval” and “die” which are like exceptions.
  • JavaScript has exceptions.

One needs to simply be assertive, realize that exceptions are much better, and insist on using them.

Don't think that task tracking numbers are a replacement for documentation

Wednesday, April 11th, 2007

Do you think this is appropriate and sufficient documentation for this function?

# 3978
#
sub is_contact_in_abook_for_user {

This number refers to the task number in a task/bug tracking system. The idea being: why write documentation when that would simply duplicate what is already available?

There are a number of reasons why this sort of documentation is bad, but the main one is that a feature lives on for a long time, as does the reusable code which one creates in order to implement that feature. But a task is just a task; once it's done, no one cares about it any more. So if one sees e.g. a class modelling a user exposing methods such as "fetch by user name", "fetch by id", "fetch by telephone number", it isn't really helpful to know that the first two were implemented as part of an "implement payment" feature, and the last for a "mobile phone shop" feature. There are so many aspects of those features which have nothing to do with a function which can be reused time and time again.

The other reasons are:

  • You cannot click "3978" and see the documentation. You have to open a browser window, search for the task, and so on. This is a lot of work, so many people will never do it, meaning they'll alter the code without understanding what it's doing. This may be their fault, as opposed to the fault of the process, but it's a reality nevertheless.
  • What about the other documentation for the function? What are the return types? What are the parameter types? It's not documented here and I bet it's not documented as a "comment" to task #3978.
  • Task trackers get changed over the life of the program. In this case, the original task tracker was maintained by another company with whom my customer no longer has a relationship. So I have no way whatsoever to find anything about task #3978 or any other. So all those comments littered throughout the program are literally useless.

I think the solution is the task tracker should reference the code, not the other way around. E.g. a comment by a programmer "adrian 20th dec 2006: committed file MyClass.java in svn revision 2938".

Floats and Doubles

Friday, March 9th, 2007

C has the types "float" and "double" (normally 4 and 8 bytes long). Java has them too (but their length and behaviour are exactly defined in Java). PHP also has the names "float" and "double", although there they refer to the same type. Most people only ever program with doubles.

I always wondered where this concept of having two exactly-sized types came from. Databases allow you to specify precision and scale to suit your application.
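A small Java sketch of the contrast, for what it's worth: the two fixed-size types on one side, and on the other a value whose scale is chosen by the application, roughly like a database column with precision and scale. The class name and the `money` helper are made up for this example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class FloatSizes {
    // Hypothetical helper: a decimal value with exactly two digits after the point.
    static BigDecimal money(String text) {
        return new BigDecimal(text).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // The sizes are fixed by the language: 4 and 8 bytes.
        System.out.println(Float.BYTES + " and " + Double.BYTES);
        // The same literal stored in each type gives two different values.
        System.out.println((double) 0.1f == 0.1);
        // With BigDecimal the application picks the scale itself.
        System.out.println(money("19.999"));
    }
}
```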

I always assumed C introduced this, and every other language copied it (like many other features, such as for-loop syntax). But maybe it was Fortran?

http://www.php.net/manual/en/language.types.php#43671

Making changes to a Script

Wednesday, March 7th, 2007

I need to change a bunch of functions in a bunch of classes to take a "user" object as opposed to a "user_id" number.

I am using a scripting language. How am I going to do this? I am going to do it the best I can, then compile, but the compiler is not going to find any problem as they are the same "type" i.e. they are both values.

Then I'm going to get lots of runtime errors. I'll correct them until they all go away. Then there will be no known runtime errors. Which is better than known runtime errors. But both more difficult and much less satisfactory than a simple compile in a language like Java or C++ which would find all occurrences of such errors.
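For comparison, this is roughly what the same change looks like in Java. The `User` class and the `greeting` method are invented for the example; the point is only that every call site still passing a number stops compiling.

```java
class UserRefactor {
    // Hypothetical user class, just enough for the example.
    static class User {
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Before the change the signature was: static String greeting(int userId)
    static String greeting(User user) {
        return "Hello, " + user.name;
    }

    public static void main(String[] args) {
        User user = new User(42, "adrian");
        System.out.println(greeting(user));
        // A forgotten call site that still passes the id is rejected by the
        // compiler, because an int is not a User:
        // greeting(42);
    }
}
```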

Thankfully I have lots of unit test scripts which will help in my unnecessary debugging process. A lot of them I just wrote to test simple things like "do getters work?". In a compiled language many of those tests would be unnecessary as the only errors that could occur, the compiler would pick up.

So scripting languages have caused me more effort:

  1. To do and test this change
  2. To write and run and maintain unnecessary unit test scripts
  3. To write this blog post (which wouldn't have been necessary had points 1+2 not been necessary).

Using UTF-8 and Unicode data with Perl MIME::Lite

Tuesday, February 27th, 2007

MIME::Lite predates Perl 5.8 which supports Unicode and UTF-8. But it's easy to get MIME::Lite to work with Unicode bodies and subjects.

To attach a plain text part to a message, with a string which contains Unicode characters, use:

use Encode qw(encode);
$msg->attach(
   Type => 'text/plain; charset=UTF-8',
   Data => encode("utf8", $utf8string),
);

To set the subject of a mail from a string containing Unicode characters, use:

use Encode qw(encode);
use MIME::Base64;
my $msg = MIME::Lite->new(
   ...
   Subject =>   "=?UTF-8?B?" .
      encode_base64(encode("utf8", $subj), "") . "?=",
   ...
);

Note that the above methods also work even if the strings do not contain Unicode characters, or do not have the UTF-8 flag set.
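The subject trick above is not Perl-specific: the "=?UTF-8?B?...?=" form is the "encoded-word" syntax from RFC 2047. As a sketch, here is the same construction in Java. The class and method names are made up, and the sketch ignores the RFC's limit of 75 characters per encoded word, so a long subject would need to be split into several words.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class SubjectEncoder {
    // Builds one RFC 2047 encoded-word: UTF-8 bytes, Base64, wrapped in =?UTF-8?B?...?=
    static String encodeSubject(String subject) {
        byte[] utf8 = subject.getBytes(StandardCharsets.UTF_8);
        // The basic encoder adds no line breaks, like the "" argument to encode_base64.
        String base64 = Base64.getEncoder().encodeToString(utf8);
        return "=?UTF-8?B?" + base64 + "?=";
    }

    public static void main(String[] args) {
        // Non-ASCII subject, written with escapes to stay independent of file encoding.
        System.out.println(encodeSubject("Gr\u00fc\u00dfe"));
        // Plain ASCII goes through the same path unharmed.
        System.out.println(encodeSubject("hello"));
    }
}
```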

It would be better to change MIME::Lite such that subject and data strings are accepted and the above code happens inside MIME::Lite. I've filed a bug report.