Archive for February, 2007

Using UTF-8 and Unicode data with Perl MIME::Lite

Tuesday, February 27th, 2007

MIME::Lite predates Perl 5.8 which supports Unicode and UTF-8. But it's easy to get MIME::Lite to work with Unicode bodies and subjects.

To attach a plain text part to a message, with a string which contains unicode characters, use:

$msg->attach(
   Type => 'text/plain; charset=UTF-8',
   Data => encode("utf8", $utf8string),
);

To set the subject of a mail from a string containing unicode characters, use:

use MIME::Base64;
my $msg = MIME::Lite->new(
   ...
   Subject =>   "=?UTF-8?B?" .
      encode_base64(encode("utf8", $subj), "") . "?=",
   ...
);

Note that the above methods also work even if the strings do not contain unicode characters, or do not have the UTF-8 bit set.

It would be better to change MIME::Lite such that subject and data strings are accepted and the above code happens inside MIME::Lite. I've filed a bug report.

Starting a Software Development company

Friday, February 23rd, 2007

For those of you who don't know, I'm now living in Macau as well as Vienna.

I always wanted to start a software development company, but hiring people in Europe is expensive. And now I've found a country with a lower cost of living and salary expectations than Europe, and that not only permits international business but actually encourages it. This country is Macau (or Macao).

The starting of the company is underway. I offer good quality outsourced software development, at a good price. For large or small projects. So if you need to increase speed in your existing software development projects, or develop an entirely new system at a good price, then don't hesitate to contact me for a quote.

http://www.smith-software-development-macau.com/

CHAR vs VARCHAR (and VARCHAR2)

Friday, February 23rd, 2007

A friend just asked me:

I have a DB, informix actually but I think its unimportant. A column is a char100. I have a string of text in a row in that col. the string is 4 characters long. When I select the char100 column I get a space padded string of 100 characters with my string at the front. Have you ever seen this?

Yep that's normal. A char(100) column is exactly that: 100 characters, no more, no less. So if you put too few characters in the field, rather than giving an error, the db pads the value with spaces.

The "very advanced" datatype varchar (or varchar2 in oracle) remembers how many characters you put in the field. (I shudder to think what oracle's varchar datatype must have been like. These days it's an alias to varchar2).

There are basically no reasons to use a char. We used e.g. char(2) to store a user's country in uboot ("at", "de" etc) but in fact we even had some problems with that, so we decided to always use varchar thenceforth.

In MySQL, if you define a table where it can pre-compute the byte-width of a row in advance, i.e. it's only composed of chars, ints, etc, no varchars, then if you delete rows and then re-insert rows, you never lose any space. As it can slot new rows exactly into the space taken up by old rows.

But if you use any varchars, then it can't. And in that case, it states there is no advantage to having any chars. So if you have a table with some chars and some varchars, then it converts those chars to varchars.

Incidentally, my book on Oracle (which serves also as an advertisement for Oracle) says that Informix, if you do an insert, then the page where the row gets written gets locked for the duration of the transaction, i.e. page lock not row lock. So if you do another insert from another transaction, even though it's an independent row, it's probably going to be written to the same page, so it must wait on the first transaction. So there's a lot of waiting going on. Not being able to do two inserts on the same table from two different transactions simultaneously would seem to be exeedingly rubbish. So I'm curious if it's really true or just Oracle marketing. Or maybe it was the case in Informix 1.0. But let's not forget that Oracle 1.0 probably didn't have varchar2.

Update: my friend writes: spoke to our informix guy, he says that was true in informix 7.10 from '97.

Database error messages

Wednesday, February 21st, 2007

Database error messages in general are very bad. Why?

Oracle Version 8 had lots of messages such as "Invalid column name" where they meant:

  1. Column name not found in the table in question. The word "invalid" is the wrong word as it implies illegal characters or something like that.
  2. Which column? Which table? The parser surely knows this at the time it generates the error message. But it helpfully chooses not to inform the user.

Thankfully Oracle 10 has improved its error messages a lot. They include the statement in question and the point in the statement producing the problem. And error messages contain which foreign key constraint has been violated, and so on.

But I have the following problem with MySQL. I try to create a table with InnoDB with a foreign key constraint and it says:

ERROR 1005 (HY000):
Can't create table './myschema/mytable.frm' (errno: 150)

What it means is: the statement has an error in it. But what is the error? In this case, there was a foreign key constraint and the column didn't exist in the referenced table. But why couldn't it tell me this?

How MySQL reduces error messages in your program

Thursday, February 1st, 2007

Ah MySQL (at least MyISAM) so isn't a real database!

Firstly, when doing an insert, I did some arithmetic. The numeric column was of a certain width. If the result of the arithmetic is larger than the maximum allowed value my number was just getting turned into that maximum allowed value, without warning or error. A large number suddenly becoming some other large number may sound good in the philosophy of "errors are bad - we want to minimize errors!" but literally it's never what you want. Oracle gives an error if a number is too big to be stored in a column. Which is what you always want.

Secondly, due to above arithmetic overflow errors, my insert statement was failing (as multiple values that should have been distinct, but beyond the maximum, were then identical, equal to the maximum). I kept on doing it and it kept on failing. Then I looked at the table and each time I'd done such an unsuccessful insert (a single statement to insert maybe 10k rows) some rows (but not all - due to the error) were getting inserted. Having half a statement succeed is never what you want! Oracle sets an invisible checkpoint before each statement and if the statement fails, rolls the database back to that checkpoint. That's always what you want!