Archive for June, 2008

Programming with unique constraints

Friday, June 13th, 2008

If you’re using a remote database system, your application doesn’t have access to all the data at any point in time. (I.e. you just load and save the rows you’re interested in in a particular transaction). Therefore if you want to do some database-wide operation, you need to ask the database to do it.

When you want to enforce uniqueness, for example across a whole table (for example a document name needs to be unique), or across a particular part of a table (for example each user must have documents with unique names), you need to do that in the database.

There is only one acceptable way to do this with a SQL database:

  1. Insert a new row
  2. If the insert succeeds, there wasn’t a row there before, and now there is
  3. If the insert throws a unique constraint violation, the row is already there
  4. If you want to update the row (i.e. an “insert or update” operation to maintain a “lazy singleton” in the database), you can update the row with safety after the unique constraint violation, as you can be certain the row is already there.

The following methods are all not acceptable:

  • Do a “select” to find out how many rows there are. If there aren’t any, do an “insert”. However someone may have inserted a row between your “select” and your “insert”.
  • Do an “update” and if the database says that 0 rows have been updated, do an “insert”. Again, someone may have inserted a row between your “update” and the “insert”.
  • Do a “select for update” statement (Oracle, Postgres, InnoDB) to check that there aren’t any rows while creating a lock, and then do an insert. However that statement only locks the rows it returns, so if it doesn’t return any rows, it doesn’t create any locks, so you still can’t be certain that no one has inserted a row between the “select” and the “insert”.
  • Lock the whole table and do one of the above. This works, but it means that all write access is “serialized” i.e. happens after one another. Any other operation, writing something completely irrelevant, will now also have to wait until the end of your transaction, whereas it shouldn’t have. This reduces concurrency.

The way I program this is the following. On the “insert” statement, I catch the database error, and see if it’s a “unique constraint violation”-type error. If it is, I throw an (unchecked) Exception. The calling code can catch that and do something with it (or not, if the statement should not generate such an error, in which case it will propogate to the main loop like any other database exception). I have had the pleasure of introducing this to easyname.eu and now also the pleasure of introducing this to the WebTek framework.

It is extremely frustrating working with big complex frameworks, whose usage is much more complex than just writing SQL manually, and not being able to do the above properly.

  • Hibernate clearly states in its documentation that if any database error occurs, the Session (main object managing all persistence) must be destroyed as its state will be out-of-sync with the database. But there is no way other than the above to do this sort of check (as far as I know).
  • OptimalJ generates code that, if a database error occurs, sets the transaction to rollback. Including any other work you may have done.

I mean checking unique constraints is something every database application needs, so the fact it’s not supported by major frameworks is just unbelievable.

Oracle, Nulls and the empty string

Thursday, June 12th, 2008

Oracle has an unusual feature, which attracts it a lot of criticism. If you try to insert the empty string into a column marked “not null”, you get an error. The empty string is treated the same as “null” by Oracle.

This is different to programming languages (and indeed other databases, at least MySQL), which means one has to be careful not to make a mistake when using a programming language to talk to the database. That it’s different from other systems is the main reason for the criticism.

However, how many times have you wanted the following to be allowed in your data schema, for example on a “first name” column:

  • Writing nulls is not allowed
  • Writing the empty string is allowed

I just wrote a program and the front-end framework helpfully noticed that the field was “not null” in the database and gave the user an error in the front-end if the field was empty. However when I altered the code slightly, it no longer gave an error. Because the field in the program was the empty string, and not null.

However, I assert, when dealing with data, checking only for not null is not useful; you also want to check that the string contains some data in that case.

I have now updated the framework, so that if the field is marked “not null”, then the error is presented to the front-end not only if the variable in the program is “null”, but also if it is the empty string.

(Note: I am not advocating e.g. Java lose the distinction between ==null and .isEmpty(): for some reason this is a useful distinction when doing programming and data manipulation—such as null indicating that a variable isn’t initialized yet—I just don’t think it’s a useful distinction in a system solely designed to model persistent data.)

Using fixed-width fonts for data entry fields

Wednesday, June 4th, 2008

I’m sure that Joel on Software is well read: his views on user interfaces are certainly well worth reading.

But there’s a point hidden down at the end of Chapter 7 which is well worth repeating. And that’s that using a wide fixed-width font for text entry is much better than a thin proportionally-spaced font.

Recently I noticed that the invitation panel on Google Spreadsheets, allowing you to type in email addresses of people to invite to collaboratively edit the document, used a fixed-width font. And conveniently they didn’t alter their word processing program’s similar facility, making it easy to compare the two.

It really does just feels so much nicer to enter text using a fixed width font. So all projects I’m doing now, I make sure I use a fixed-width font for data entry. And I don’t even think it has to look ugly!