No need to support IE8 (even for corporate software)

One of my customers (mobilreport) is used by accountancy departments in large companies in Austria and Germany to analyze their employees’ mobile phone usage. More so than average web software, this web software needs to work on whatever web browsers large companies have installed; large companies tend to use more Internet Explorer than the world at large.

So it’s with pleasure that I report the following trend (each figure is the percentage of all users visiting the website in that month who used IE8):

Month            IE8 users (%)
June 2012        34.94
July 2012        44.97
August 2012      22.12
September 2012   17.43
October 2012     16.00
November 2012    14.00
December 2012     7.78
January 2013      4.21
February 2013     1.85 (so far)

It’s amazing to think that only 7 months ago, in July 2012, 45% of our users were using IE8 (the remaining 55% being split between IE6, IE7, IE9, FF, …). Now it’s less than 2%, and hopefully it will soon be even less.

I am developing more and more software which uses SVG to visualize things (e.g. United Youth Symbol Generator). IE9+ (like other desktop browsers and most mobile browsers) supports SVG, whereas IE8 does not. So the decline of IE8 is great news.

Constraint name visibility on MySQL and PostgreSQL

Can one have two tables with constraints of the same name? Is that even consistent between types of constraints? What about between database vendors?

It turns out that it’s neither consistent between types, nor is the way in which it’s inconsistent consistent between database vendors.

Constraint name scope          MySQL 5.5        PostgreSQL 9.2
Unique constraint names        Local to table   Global to DB
Foreign key constraint names   Global to DB     Local to table

From MySQL:

mysql> CREATE TABLE foo (x INTEGER, 
    ->   CONSTRAINT foo_unique UNIQUE(x));
Query OK, 0 rows affected (0.01 sec)

-- Second UNIQUE constraint CAN be created with same name
mysql> CREATE TABLE foo2 (x INTEGER, 
    ->   CONSTRAINT foo_unique UNIQUE(x));
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE foo3 (x INTEGER, 
    ->   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));
Query OK, 0 rows affected (0.01 sec)

-- Second FOREIGN KEY constraint CANNOT be created with same name
mysql> CREATE TABLE foo4 (x INTEGER, 
    ->   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));
ERROR 1005 (HY000): Can't create table 'test.foo4' (errno: 121)

From PostgreSQL:

postgres=# CREATE TABLE foo (x INTEGER, 
postgres-#   CONSTRAINT foo_unique UNIQUE (x));
CREATE TABLE

-- Second UNIQUE constraint CANNOT be created with same name
postgres=# CREATE TABLE foo2 (x INTEGER,
postgres-#   CONSTRAINT foo_unique UNIQUE (x));
ERROR:  relation "foo_unique" already exists

postgres=# CREATE TABLE foo3 (x INTEGER,
postgres-#   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));
CREATE TABLE

-- Second FOREIGN KEY constraint CAN be created with same name
postgres=# CREATE TABLE foo4 (x INTEGER,
postgres-#   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));
CREATE TABLE
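Because the two databases disagree about which constraint names must be unique, one portable way to sidestep the question is never to reuse a constraint name at all. A minimal sketch of such a naming convention in Java, assuming names are built as table_column_kind; the class and method names are hypothetical, not from any library:

```java
import java.util.Locale;

/** Hypothetical helper: builds constraint names that embed the table name. */
public class ConstraintNames {
    // MySQL allows identifiers of up to 64 characters; PostgreSQL truncates
    // identifiers longer than 63 bytes by default. For plain ASCII names,
    // staying within 63 characters is safe on both.
    private static final int MAX_LENGTH = 63;

    /** For example, name("foo4", "x", "fk") gives "foo4_x_fk". */
    public static String name(String table, String column, String kind) {
        String result = (table + "_" + column + "_" + kind).toLowerCase(Locale.ROOT);
        if (result.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("Constraint name too long: " + result);
        }
        return result;
    }
}
```

With this convention the tables above would get foo_x_unique and foo2_x_unique, and foo3_x_fk and foo4_x_fk, so neither database's scoping rule comes into play. Names built this way can still collide in contrived cases (table a_b with column c versus table a with column b_c), so it is a convention rather than a guarantee.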

Do not use automatic code reformatting

Beautiful source code is about communication between the author of a program and its reader: a communication between two people. A computer cannot communicate with a human as well as another human can.

I’ve often heard it asserted that people cannot read other people’s code without auto-formatting. However, I think that reflects poorly on both the author (who isn’t thinking enough about the reader), and the reader (reading code often requires as much effort as writing code; often readers are lazy). If the authoring programmer cannot lay out code so that it’s understandable, they’re probably writing bad code; reformatting their code won’t suddenly make it good code.

Further to the reasons why auto-formatting does not help, here are some examples where it actually hinders. They come from a customer I went back to work with in 2007, whom I hadn’t visited for a few months.


In the following example, there is a clear structure to the statement. A restriction is added which is an “or”, consisting of a left-hand side and a right-hand side. I formatted each of those sides on its own line, indented relative to the surrounding “or”.

result.add(Restrictions.or(
    Restrictions.ltProperty("wonUnitCount", "desiredUnitCount"),
    Restrictions.eq("reserveMet", false)));

The reformatted code does not have this structure. The “.eq” at the start of the second line is a method of the Restrictions class, called for the right-hand side. Its position suggests that it’s one of the most structurally significant calls in the statement, but it’s not.

result.add(Restrictions.or(Restrictions.ltProperty("wonUnitCount", "desiredUnitCount"), Restrictions
    .eq("reserveMet", false)));

I needed to protect a bunch of parameters against being null. The eye scanning the following code can clearly see the “==” and the “=” in all the statements, thus one can see that all the statements are related and do similar things.

if (offeredUnitCount     == null) offeredUnitCount     = 1;
if (startCentsPerUnit    == null) startCentsPerUnit    = 1;
if (reserveCentsPerUnit  == null) reserveCentsPerUnit  = startCentsPerUnit;
if (autoBidIntervalCents == null) autoBidIntervalCents = 1;

However, the reformatted code is not only twice as long (meaning that some statements and methods which one would otherwise be able to see now scroll off the bottom of the monitor), but the reader is also forced to examine each statement to see what it does, as the “==” and “=” visual pattern is no longer there.

if (offeredUnitCount == null)
    offeredUnitCount = 1;
if (startCentsPerUnit == null)
    startCentsPerUnit = 1;
if (reserveCentsPerUnit == null)
    reserveCentsPerUnit = startCentsPerUnit;
if (autoBidIntervalCents == null)
    autoBidIntervalCents = 1;

A method was going to throw a number of exceptions but didn’t do so yet; this was just “in progress” code.

* @throws CentsPerUnitBeyondSystemMaximumException if either startCentsPerUnit
* or autoBidMaxCentsPerUnit are greater than the system maximum.
*
* TODO - throws DesiredUnitCountTooLowException(min)
* TODO - throws DesiredUnitCountGreaterThanOfferedUnitCountException
* TODO - throws StartCentsPerUnitTooLowException(min)
* TODO - throws CannotBidOnOwnItemException
* TODO - throws MustIncreaseUnitCountOrChangeMaxBid

After reformatting, one doesn’t know what’s going on at all.

* @throws CentsPerUnitBeyondSystemMaximumException if either startCentsPerUnit or autoBidMaxCentsPerUnit are greater than the
* system maximum. TODO - throws DesiredUnitCountTooLowException(min) TODO - throws
* DesiredUnitCountGreaterThanOfferedUnitCountException TODO - throws StartCentsPerUnitTooLowException(min) TODO - throws
* CannotBidOnOwnItemException TODO - throws MustIncreaseUnitCountOrChangeMaxBid

This method takes only a few arguments. Listed one per line, it’s clear that the method takes four.

public static List<AuctionForSeller> getAuctionListForSeller(
    AuctionListTypeForSeller type,
    Member seller,
    int firstIndex,
    int itemsPerPage
) {

After reformatting, there are three arguments on the first line and one on the second. It looks almost as if the fourth argument had some special significance; that’s the way it appears to the eye (even before one thinks about it consciously). But that’s not the case.

public static List<AuctionForSeller> getAuctionListForSeller(AuctionListTypeForSeller type, Member seller, int firstIndex,
    int itemsPerPage) {

Put SQL together with the code that uses it

I have heard some strong developers say that “inline SQL in and of itself is not evil”. I do not understand how inline SQL is acceptable. To me it’s just like hard coding. Many a developer would scoff in my face for putting a connection string in the code versus a config file. So why is it that “SELECT value1,value2 FROM TABLE” is perfectly acceptable in compiled code?

There is a lot of intrinsic coupling between a database query and the code that surrounds it.

For example, a query might fetch the first name and last name of a particular user from a database, and the code then creates an XML file containing the first name and last name. If you put the database query somewhere else (e.g. a config file) it might seem like you’ve increased configurability and flexibility. However, if you actually want to change it, say to add a field called “age” to the XML file, you can’t change the query alone; you also need to change the code that writes the XML file.

SELECT statements produce rows with certain columns, and take parameters either in a certain order (“?”) or with certain names (“:foo”). Your code needs to consume the columns produced by the SELECT, and needs to set all the parameters. If you change what the SELECT is doing, you’ll almost certainly need to change the columns it returns, or the parameters it takes. That means you’ll need to change the code that consumes those columns or sets those parameters.

By separating the code and SQL, you’ve replaced the problem of having to change one thing (code), with the problem of having to change two things which depend on one another (code and configuration). Having two things introduces the danger the things might not be consistent, and you haven’t gained any extra flexibility.

Based on http://stackoverflow.com/questions/5303746/is-inline-sql-hard-coding/5303878
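To illustrate the coupling, here is a minimal JDBC sketch of the first name / last name example, with the query kept in the same class as the code that sets its parameter and reads its columns. The table users, its columns id, first_name and last_name, and the XML element names are all hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserXmlExport {

    // The SQL lives next to the code that depends on its "?" parameter and its columns.
    private static final String SELECT_USER =
        "SELECT first_name, last_name FROM users WHERE id = ?";

    /** Fetches one user and renders it as XML. */
    public static String exportUser(Connection connection, long userId) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(SELECT_USER)) {
            stmt.setLong(1, userId);                  // binds the single "?" above
            try (ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("No user with id " + userId);
                }
                // Must read exactly the columns the SELECT produces.
                return toXml(rs.getString("first_name"), rs.getString("last_name"));
            }
        }
    }

    /** Builds the XML from the values the query returned. */
    public static String toXml(String firstName, String lastName) {
        return "<user><firstName>" + escape(firstName) + "</firstName>"
             + "<lastName>" + escape(lastName) + "</lastName></user>";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Adding the “age” field from the example means changing the SELECT, the ResultSet access and toXml, but all of them sit in this one file, so they cannot silently drift apart.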

SMS trailers at Uboot

We developed a cool algorithm to add informational trailer texts to SMS at Uboot. It was online from mid 2000 to mid 2011 (when Uboot stopped doing SMS). I developed this algorithm together with my colleague Mike Weinzettl. So that it doesn’t get lost forever, here is a description of it :-)

The general situation was this:

  • The users may send SMS from Uboot to phones
  • SMS have a fixed length (e.g. a single-part SMS holds 160 characters)
  • The user may well type fewer than the maximum number of characters (e.g. 100 characters in a single-part SMS, which can hold 160, leaving 60 free)
  • We can use this space to promote Uboot (e.g. add “Sent by www.uboot.com” text)
  • We realized we needed a trailer system to define what text should be added to which SMS under which circumstances

Requirements:

  • Texts should be stored in a separate file/files or DB (i.e. not in the middle of the source code)
  • Different SMSs have a different number of free characters at the end (from zero characters upwards). It would be nice to put longer texts on those SMS which have more characters free.
  • Users speak different languages (English, German, etc.)
  • It would be nice to put variables in the text (e.g. “Sent by USERNAME”)
  • The text depends on various other factors, e.g. whether the recipient’s telephone number belongs to a registered user

We came up with the following solution.

  • We would have a set of candidate texts, combined with the conditions under which they may be used (e.g. recipient is known user, which language)
  • Conditions could either be a single value (e.g. “en” language), a list of acceptable values, or an indication that any value is acceptable (no condition)
  • Inspired by crontab, we used a file, one line of the file per rule.
  • Each line had a set of columns for the conditions (language, etc.) with values such as “de” (single value), or “en,de” (multiple values), or * (any value is acceptable)
  • The last column of the file was the text
  • The text may contain variables such as ${USER}

For example:

language  recipient-registered    text
en        yes                     Sent by ${USER} on Uboot
en        no                      Sent by ${USER} on www.uboot.com
en        yes                     Sent by Uboot
en        no                      Sent by www.uboot.com 
*         *                       Uboot
*         no                      www.uboot.com
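A minimal sketch of how one line of such a rule file might be parsed and matched in Java. It assumes exactly the two condition columns shown above, separated by whitespace, and that the caller skips the header line; the class name TrailerRule is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** One line of the trailer rule file: two condition columns, then the text. */
public class TrailerRule {
    private final Set<String> languages;   // null means "*": any language
    private final Set<String> registered;  // "yes"/"no"; null means "*"
    private final String text;

    private TrailerRule(Set<String> languages, Set<String> registered, String text) {
        this.languages = languages;
        this.registered = registered;
        this.text = text;
    }

    /** Parses a rule line such as "en   yes   Sent by ${USER} on Uboot". */
    public static TrailerRule parse(String line) {
        // At most three fields, so spaces inside the text itself survive.
        String[] fields = line.trim().split("\\s+", 3);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Malformed rule: " + line);
        }
        return new TrailerRule(condition(fields[0]), condition(fields[1]), fields[2]);
    }

    /** "*" accepts anything; "en,de" accepts any of the listed values. */
    private static Set<String> condition(String column) {
        if (column.equals("*")) {
            return null;
        }
        return new HashSet<>(Arrays.asList(column.split(",")));
    }

    public boolean matches(String language, boolean recipientRegistered) {
        return accepts(languages, language)
            && accepts(registered, recipientRegistered ? "yes" : "no");
    }

    private static boolean accepts(Set<String> allowed, String value) {
        return allowed == null || allowed.contains(value);
    }

    public String text() {
        return text;
    }
}
```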
de        yes                     Verschickt von ${USER} auf Uboot

The algorithm would then do the following:

  • Search through the file, find all rules that match the conditions, and take their texts
  • Expand the variables in those texts
  • See how many characters are available in the user’s SMS
  • Throw away any texts (with variables expanded) which are longer than the available space
  • Take the longest of the remaining texts
  • If more than one text has that length, choose one at random
  • If no text fits, simply use no trailer (e.g. if only 1 character is free, it’s unlikely a useful trailer of that length is defined)
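The selection steps above might look like this in Java. This is a sketch, not the original Uboot code: the ${NAME} syntax follows the example file, while the class and method names, and the use of String.length() as the character count, are assumptions (real SMS lengths depend on the encoding, e.g. GSM 7-bit versus UCS-2):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

/** Sketch of choosing one trailer from the texts whose rules matched. */
public class TrailerChooser {

    /** Replaces each ${NAME} with its value; placeholders without a value are left unchanged. */
    static String expand(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    /**
     * Returns the longest expanded text that fits into freeChars characters,
     * picking at random among equally long candidates, or empty if none fits.
     */
    public static Optional<String> choose(List<String> matchingTexts, Map<String, String> variables,
                                          int freeChars, Random random) {
        List<String> longest = new ArrayList<>();
        int bestLength = -1;
        for (String template : matchingTexts) {
            String text = expand(template, variables);
            if (text.length() > freeChars) {
                continue;                       // too long for this SMS: throw it away
            }
            if (text.length() > bestLength) {   // new longest: forget shorter candidates
                bestLength = text.length();
                longest.clear();
            }
            if (text.length() == bestLength) {
                longest.add(text);
            }
        }
        if (longest.isEmpty()) {
            return Optional.empty();            // nothing fits: send without a trailer
        }
        return Optional.of(longest.get(random.nextInt(longest.size())));
    }
}
```

For example, with the texts Sent by ${USER} on Uboot, Sent by Uboot and Uboot and 30 free characters, the user adrian gets “Sent by adrian on Uboot” (23 characters), while a user with a 21-character username gets “Sent by Uboot”, because the personalized text (38 characters) no longer fits.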

One advantage of this algorithm is that, given two users who send an SMS with the same amount of free space, a different trailer may be the longest that fits, depending on how long each username is.

I’m proud of this algorithm :-)

Wicket component to count up to a value, then refresh its value from the server

On the avaaz website there is a nice animation to display the number of users the site currently has. The animation is a number, which:

  • Initially, counts up from 0 to the current number of users client-side, taking a few minutes
  • Thereafter, refreshes its value from the server regularly, increasing its value as new users register.

I needed something similar for a project I’m working on, so I’ve written a Wicket component to do it.

It’s perhaps not the most revolutionary component ever! But maybe it’s useful to someone. It’s available under the GPL.

CountingUpThenAutoRefreshingLabel  JavaDoc | Download.

The above example counts up to 1M and then the “refresh” is faked, for this example, by being a number which slowly increases; imagine it’s the number of users the site has, or similar. (It wouldn’t normally be in an iframe either; that is a consequence of the fact this blog is WordPress not Wicket.)

I think this animation is good in terms of keeping the server load low. Initially there are no server calls, as the counting up is done client-side; this covers the normal case, where the page isn’t open for long. The second phase re-queries the server at an interval, but this interval gets longer over time (i.e. the updates become less frequent), so that the other case (a user opens the page and goes to lunch) also won’t result in N users hitting the server every 2 seconds for the rest of time.

Wicket 1.4–1.5, Java 1.6
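The growing refresh interval can be sketched as a simple geometric back-off. This is plain Java rather than the component’s actual Wicket code, and the starting interval, growth factor and cap are hypothetical parameters:

```java
/** Hypothetical sketch of a refresh interval that grows after every server call. */
public class RefreshBackoff {
    private final double factor;
    private final long maxMillis;
    private long currentMillis;

    public RefreshBackoff(long initialMillis, double factor, long maxMillis) {
        if (initialMillis <= 0 || factor < 1.0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("need initial > 0, factor >= 1, max >= initial");
        }
        this.currentMillis = initialMillis;
        this.factor = factor;
        this.maxMillis = maxMillis;
    }

    /** Delay before the next refresh; each call lengthens the following delay, up to the cap. */
    public long nextDelayMillis() {
        long delay = currentMillis;
        currentMillis = Math.min(maxMillis, (long) Math.ceil(currentMillis * factor));
        return delay;
    }
}
```

Starting at 2 seconds, doubling, with a cap of 10 seconds, the delays are 2, 4, 8, 10, 10 seconds and so on; a real page would probably use a much larger cap, so an abandoned browser tab costs almost nothing.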

“Let’s rewrite this in Java!”

I’ve heard the phrase “Let’s rewrite this in Java!” uttered in various meetings at various companies at various times in my career. Often by managers. All such projects to completely re-write the company’s software in a new language or framework inevitably end in disaster. Why is this? I must confess I don’t really know.

I suppose it doesn’t have to be Java, although it often is; Java is the classic. For example, a big company comes along and sees, oh my god, the entire code-base is written in a scripting language!! and feels the need to “clean up” the source code. Without further thought, the decision is taken to re-write everything. Companies like this, at least, seem inevitably to choose Java as the new language.

It’s the same with frameworks. Oh my god, this is written with web framework X which is no longer trendy, or with no framework at all!!! The decision is taken to “introduce” a new front-end framework, often trendy, which necessitates a re-write of at least the entire front-end, if not the entire system.

Consequences

By “end in disaster,” I mean, mainly, that the system is never completed and never goes online. At some point the code just gets deleted. (Or it gets stored in some backup system, or on a branch of the VCS where nobody ever looks at it; these are basically equivalent if one judges the success of software by it going online and the benefit it brings to its users and/or the money it brings to its sponsoring company, etc.)

There is a danger to re-writes beyond just costing the sponsoring organization money without any benefit. In a small team it seems too much to concentrate on two projects at once (the old and the new). The tasks associated with the new system (conception, programming afresh) are much more fun than the tasks associated with the old one (bug fixing, performance issues), so programmers tend to prefer them. They also feel that work on the new system is a better investment of their time, as that work will be online a year from now, in contrast to work on the old system. With weak technical management, programmers may do what they prefer as opposed to what is important. But if this takes a year or two, what about existing customers? The live system can just stagnate while competitors add new features and move ahead. I think in this case it’s important to have people assigned to the old project and people assigned to the new one. It might suck to be the guy assigned to the old project, but what can you do? The work has to be done.

Reasons for failure

Perhaps it’s just the nature of re-writing software, as described by Joel on Software. But I think it’s more than that. I’ve seen re-writes or refactoring projects succeed. Classic examples are the re-write of the Mac operating system, which was very successful, the re-writing of the Netscape browser into what became Mozilla Firefox, and the re-writing of Windows into Windows NT. At Uboot we re-wrote many pieces of software and they all worked out fine, and were much better for having been re-written.

I think it could have something to do with how the project starts out. What is trying to be achieved? I think if the project starts out with the objective “we need these completely new features” and then it turns out that a re-write is necessary, as the current code-base can’t support these new features, then such a re-write can be successful. If it starts out with the only objective being “we need to re-write it”, for example because the programming language is considered wrong, or the software design is considered wrong, i.e. things which are more faults of the old system than benefits of the new, then things will start to get into trouble.

I think if the objective is simply “we need a re-write” then everyone will chime in with all the problems the old system had and how they can be fixed. For example, the software design can be improved (for its own sake). Newer frameworks can be used (for their own sake, or for some vague reason such as being more modern). Radical new features, which have been on the roadmap for a while but never really seemed totally necessary, can be introduced. Perhaps this is just too much at once, combined with re-writing a system which might have had years of work already put into it without all those new features? But I’m not sure it’s just that.

I also notice that, while the old system is still online, such re-writes do not have any feeling of urgency. (“The old system is online anyway. It’s important that we get it right this time, even if that takes longer.”) Perhaps that too is bad: urgency is often a good motivator for simplicity, and simplicity is an attribute of systems with desirable properties such as ease of maintenance, fewer bugs and fewer scalability problems. Relatedly, perhaps people feel freer to act as architecture astronauts; after all, the objective is to have an architecture, not to have features for customers.

Perhaps it’s because, if there are no clearly expressed objectives beyond “we need a re-write”, the other objectives which haven’t been clearly stated (such as which new features to add) can change over time? Half way through the project a new framework is released and the team decides to use it, on the grounds that, well, we’re doing a re-write, so we should use the most modern framework available. The same happens with new requirements which crop up during the project. If one introduces a new framework or feature sufficiently often, and these introductions require re-writing parts of the already written re-write, then perhaps the re-write will never get a chance to be finished.

Perhaps it’s that people have a problem conceiving of a new system (which, on the day it’s released, must already have lots of features and scale to lots of users)? Perhaps people find it easier to go one feature at a time, and scale the existing system when necessary? Which is how an initial system often starts out.

Perhaps the projects fail because they take too long? On the one hand, you want people on the old team producing new features for customers and keeping up with the competition. On the other hand, you don’t want the new project to work against a moving target and keep on re-writing bits of the re-write. These objectives contradict each other. Perhaps the trick is to get the re-write out the door in 3-6 months, before the old system has had a chance to change too much? That is a problem if the system is beyond a certain size, or the re-write too ambitious.

Perhaps implementing a system in a language and/or framework with which the development team, and perhaps also architects, are not familiar, is a really bad idea too? This will inevitably lead to mistakes being made. Perhaps that can be mitigated by having an external consultant review the designs, perform design, do code reviews, or by using a language/framework with which senior members of the team are already familiar.

Conclusion

I don’t really know why, when I hear the words “Let’s rewrite it in X!” (where X is nearly always Java!), I just know the project is going to cost the company 1-2 years and most of the development team, get thrown away at the end, bring no benefit to either the users or the development team, and cause the live product to stagnate behind the competitors for this time.

And I don’t really know what you can do about it. Not re-writing things is clearly not acceptable either (would you like your primary system to be the latest incarnation of Netscape 4 on Windows ME?). I think you need to have clear objectives (which features do you need that you can’t implement on the old system, and which technology do you want to move to and why?), keep separate teams for old and new, have at least one person with years of experience in the new technology, and make sure the new system gets wrapped up within 3-6 months, before the live system has acquired too many new features.

Shameless self-promotion: I work as a software architect and have been involved in many successful re-writes. We re-wrote the UCP Media Album from Perl into Java J2EE and deployed it at an American mobile operator. For Offer Ready we’ve completely re-written our calculation engine. Uboot was in a constant state of flux, e.g. introducing payments into messaging. At easyname I oversaw the re-write of many key features such as the way money was handled. If you are hitting a wall with your current system and want to know your options, contact me.

Email Template

One often needs to send users email notifications. Ideally these should have the following capabilities:

  • Text is stored in separate files (not strings in the source code)
  • Localizable into different languages
  • Variables (“Hello Adrian!”)
  • Attachments (e.g. PDF invoices)
  • HTML (for styling)
  • Plain text alternative version
  • Inline images (e.g. company logo)
  • Variables displayed properly in HTML mails (escaping of “&” etc.)
  • Variables which are embedded directly in the HTML (“Here are your products: <table>…”)
  • Supporting unit testing (no email sent; assertions over what would have been sent)
  • HTML is easy to edit (double-click file, see inline images, use HTML editor, etc.)
  • Unicode (UTF-8) only

Finding no such library available, I wrote my own in Java: EmailTemplate. Javadoc | Download.

Did I overlook something? Is there a library out there that does all of the above, which everyone else uses? (In case not, my library is open source.)

This software has a long history. Originally developed for Onestop Concept in Perl, then developed for Easyname in PHP (no longer used), and then for United Youth in Java, it’s been updated for the HR website I’m developing.
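Two of the requirements listed above, escaping ordinary variables in HTML mails while letting some variables insert raw HTML, can be illustrated with a small sketch. This is not EmailTemplate’s API; the class name, the ${NAME} placeholder syntax and the split into two maps are assumptions for illustration:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: text variables are HTML-escaped, HTML variables are inserted as-is. */
public class HtmlTemplateSketch {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static String render(String template, Map<String, String> textVars,
                                Map<String, String> htmlVars) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value;
            if (htmlVars.containsKey(name)) {
                value = htmlVars.get(name);                 // e.g. a generated <table>
            } else if (textVars.containsKey(name)) {
                value = escapeHtml(textVars.get(name));     // e.g. "Tom & Jerry"
            } else {
                throw new IllegalArgumentException("Unknown variable: " + name);
            }
            // quoteReplacement stops "$" or "\" in values being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Failing on an unknown variable, rather than leaving the placeholder in the mail, is a deliberate choice here: a unit test over the rendered mail then catches typos in variable names.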

Store users’ birth dates, not ages, in the database

If you store your users’ ages in the database, in one year’s time their ages will be wrong. If you store your users’ birth dates in the database, and calculate their age from that, this age will always be correct.

Some sites wish to ask the user their age and display it. It would seem simplest to just store this number in an integer field alongside the user in the database. But you can be sure that, in one year, this value will be wrong.

Instead you should ask the user their birth date. Store that in the database. Always calculate their age by seeing how many years have passed between their stored birth date and the current date. As they get older, their age will always be displayed correctly.

In case you really wish to ask the user only their age, one trick we used at Uboot was to calculate an approximate birth date. Assuming users are half way between two birthdays, if they say they are n years old, assume their birth date was n+½ years ago (i.e. 12n+6 months ago). Calculate the age to display as described above. On the day they enter their age, the display will be correct. One year later it’ll be correct as well. In between it’ll be, well, an approximation. But that’s better than displaying their entered age forever, showing them never ageing.
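Both calculations are short with java.time (Java 8 or later). A minimal sketch, with hypothetical class and method names:

```java
import java.time.LocalDate;
import java.time.Period;

public class Ages {

    /** Age in whole years on the given day, computed from the stored birth date. */
    public static int ageOn(LocalDate birthDate, LocalDate day) {
        return Period.between(birthDate, day).getYears();
    }

    /** Approximate birth date for someone who only states an age: 12n+6 months before entry. */
    public static LocalDate approximateBirthDate(int statedAge, LocalDate entryDay) {
        return entryDay.minusMonths(12L * statedAge + 6);
    }
}
```

For someone who says they are 30 on 15 February 2013, the approximate birth date is 15 August 1982; ageOn then gives 30 on the day of entry and 31 one year later.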

Projects I’ve taken online in 2012

This year has been an amazing year professionally. Together with my employee Martin, and the contractors I’ve been using for visual design and HTML, I’m proud to report that we’ve taken the following features online for the following customers in 2012.

mobilreport

Companies can give their employees mobile phones. Companies pay the bill for these mobile phones. Companies with thousands of such phones get PDF bills which would be inches thick if one were to print them out. mobilreport imports the data electronically and allows various reports over the data: which employee phoned the most? To which countries? During their free-time or during business hours? etc.

The system is programmed in Java using Wicket. We use MySQL to store our data, which then gets exported to XML and transformed via XSLT to create the reports that you see in the browser, and which you can download as PDF, RTF or XLS. My colleague Martin Schmidt at Onestop Concept used Altova Stylevision, which makes designing the reports a case of WYSIWYG.

In 2012 we released:

  • Monthly emails. We send employees monthly emails informing them of how much their mobile phone has cost the company. These are also designed in a WYSIWYG fashion with Stylevision.
  • Complete new visual design. Design by Mandy, HTML from my people in the Philippines, implementation by Martin.
  • Login administration. Previously, which users existed was defined by altering configuration files; now a full GUI allows customers to manage this themselves. Implemented by Martin.
  • More providers. Importing data from providers is a messy process! The files are not well-formed (e.g. CSV files where the “;” field separator appears unescaped in the middle of a cell, making the CSV file technically unparseable). We have always supported A1; now we also support Telering AT, T-Mobile AT, A1 Landline AT, COLT AT, Vodafone DE and T-Mobile DE. (Want to see some nastiness? Check out EDIFACT. Want some changes? Apply to the UN.) Implemented by Martin.
  • Master-detail reports. For example, a table shows the countries to which calls were made. Clicking [+] next to a country opens an expansion listing all users who made calls to that country, sorted by the cost of those calls. Which data forms the master, which data forms the detail, and how they are connected, is specified by configuration.
  • Subsidiaries. Previously, if one company had multiple subsidiaries, it was necessary to log out and log back in again as a different “customer”, so aggregating data over all subsidiaries was impossible. Now the model has been changed; each customer has 1-n subsidiaries.
  • PostgreSQL. Ported the system from MySQL to PostgreSQL; although this was finished, it never hit production. (The issues we had with MySQL were resolved.)

Offer Ready

I’ve been working for Offer-Ready on-and-off since March 2007. Operators can integrate this software into their workflow and find the optimal tariff for their customers. With this optimal tariff, documents such as offers can be generated.

This is great software that throws up a number of interesting technical challenges. We consider millions (no exaggeration!) of mobile phone tariff variations in well under a second, to find the optimal one.

This year we designed three new pieces of software from scratch; all of them are now online:

  • Publisher. When a change is made to the configuration defining the mobile phone tariffs, this software takes those changes and does what needs to be done so that the new configuration is online and available. This is a non-trivial process involving many steps and much wall-clock time; this software ties all those steps together. While publishing is happening, the old version of the configuration is online and can be used; if publishing fails, the old version stays online.
  • Profile Editor. Data entry for the system. Users enter their usage data, and the editor calls the existing system (via its API) to work out the best tariff for that data. Programmed by Martin.
  • Document Viewer. Results of calculations may be displayed in the browser; options to download PDF, RTF, XLS exist.
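The Publisher’s key property, that the old configuration stays online until a new one has been completely built, can be sketched with an atomic reference swap. This is a hypothetical illustration, not Offer Ready’s implementation; the configuration type and the build step are placeholders:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch: readers always see one complete configuration version. */
public class ConfigurationPublisher<C> {
    private final AtomicReference<C> live;

    public ConfigurationPublisher(C initialVersion) {
        this.live = new AtomicReference<>(initialVersion);
    }

    /** The configuration currently online. */
    public C current() {
        return live.get();
    }

    /**
     * Runs the (possibly long) build of a new version; until it completes,
     * current() keeps returning the old version. Returns false, leaving the
     * old version online, if the build fails.
     */
    public boolean publish(Supplier<C> buildNewVersion) {
        C candidate;
        try {
            candidate = buildNewVersion.get();
        } catch (RuntimeException e) {
            return false;
        }
        if (candidate == null) {
            return false;        // treat an empty result as a failed build
        }
        live.set(candidate);     // one atomic switch-over from old to new
        return true;
    }
}
```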

HR Gamification System

New product, not released yet. The website is intended to help Human Resources departments use the talent already within the company to find new talented employees, using appropriate current techniques such as gamification.

I will be providing the software development for this product. I’m currently supporting the customer with requirements and doing the software design. HTML is being done by my employees in the Philippines. Next year we’ll get on to the software development (with Martin) and take the system into production.

United Youth

Between July 2011 and February 2012 inclusive I was the sole developer for a charity web platform called United Youth for Kurt Braunhofer. Although development was continued by a new team from March onwards, in December 2012 the new team took their platform online and included one piece of software from me: the Symbol Page.