Archive for the ‘Databases’ Category

If you’re going to change databases, do it in one go

Tuesday, June 25th, 2013

At Uboot, there was the suggestion that we should change database technology, because we would save money by moving away from Oracle to e.g. MySQL.

It was further decided that, as we had 200k+ lines of code and 100+ tables, any migration shouldn’t happen “all at once”; instead, some parts of the system would be migrated while other parts were not yet migrated.

But the strategy of migrating databases piece by piece has the following problems:

  1. Data consistency. Initially, all important data are within the source database. It is possible to guarantee certain consistency constraints (e.g. all photos are owned by existing users). However, if e.g. photos are migrated to MySQL, while users remain in Oracle, no such consistency can be guaranteed. Inconsistency has the consequence that software must accept not only valid data, but also invalid data. It is impossible to test the software for all combinations of invalid data, and with the rule that “software which is not tested does not work”, one can deduce that the software will then not work. The set of inconsistencies will increase over time, so the set of users who experience bugs will also increase over time. Afterwards, going from an inconsistent state to a consistent state is nearly impossible.
  2. Reporting over multiple system areas. With one database, one can ask a question such as “how many photos belong to users logged in in the last month”. A “SQL join” is used. However, if the tables reside in different databases, no such query can be asked. One would have to export the data from one database to the other in order to run the query. Or export all data into a centralized data warehouse. Or write software to do what a single line of “join” would do. All this effort is saved by using a single database.
  3. Point-in-time recovery. With one database, if the database crashes or an error is made (e.g. an employee executes “drop table account”), a “point-in-time recovery” can be performed: data is restored to the consistent state it was in at a certain point in time. If one uses more than one database, a point-in-time recovery will be more difficult. For example, if the restored databases are out-of-sync by one second, then some photos will exist for which there are no users, or vice versa, and the data consistency problems described in point 1 appear.
  4. Spending money on migration; no saving in license fees. If a migration is done, time and money are spent: not only on the migration itself, but on fixing performance problems after the migration is done. If money can be saved on the Oracle license, perhaps this expense is worth it; however, as long as one is using multiple databases, one is still paying for Oracle.
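
The cross-area query from point 2 is a single statement when everything lives in one database. A sketch, assuming hypothetical users and photo tables with these columns (Oracle syntax):

```sql
-- Hypothetical schema: users(id, last_login), photo(id, owner_id)
-- "How many photos belong to users logged in in the last month?"
SELECT COUNT(*)
FROM photo p
JOIN users u ON u.id = p.owner_id
WHERE u.last_login >= SYSDATE - 30;
```

Split the two tables across two databases and this one-liner turns into an export job or custom join code.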

The solution is either to change databases in one go, or simply don’t change databases. You want to have your data in one place, at all times.

PostgreSQL vs MySQL “update” difference

Tuesday, March 26th, 2013

While one session is doing a “delete then insert”, and another session is doing an “update”, the update waits until the “delete then insert” has committed. But does the “update” then affect the “inserted” row or the “deleted” row?

Create a table foo(id) and insert one row with id=5 into it. Then:

Session A                                Session B
DELETE FROM foo WHERE id=5
INSERT INTO foo (id) VALUES (5)
                                         UPDATE foo SET id=6 WHERE id=5 (blocks)
COMMIT
                                         (unblocks)

What does Session B see?

The result depends on your database vendor:

  • PostgreSQL: Row has id=5 (Session B’s update affected the deleted row)
  • MySQL InnoDB: Row has id=6 (Session B’s update affected the inserted row)

I don’t think I’d say either of these is the “right” approach, I think they are both valid. If two things happen at the same time, who’s to say which should win?

I would actually advise against deleting then inserting. I would insert then, if a unique constraint violation is triggered, do an update. But this doesn’t change the fact that, if you’re doing two things at the same time, you can’t attach meaning to which one should win.
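In MySQL, the insert-then-update pattern can even be collapsed into one atomic statement (PostgreSQL later gained an equivalent, “INSERT … ON CONFLICT”, in 9.5). A sketch against the foo table above, assuming id has a unique or primary key constraint:

```sql
-- MySQL: insert the row, or update it if the key already exists
INSERT INTO foo (id) VALUES (5)
ON DUPLICATE KEY UPDATE id = 6;
```

This sidesteps the race entirely, though the point stands: with two concurrent writers, which one “wins” carries no meaning.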

BTW if you’re using an ORM like Hibernate, which claims to abstract away from the database, and allow you to use any vendor, do you think it takes differences like this into account?

Constraint name visibility on MySQL and PostgreSQL

Thursday, February 7th, 2013

Can one have two tables with constraints of the same name? Is that even consistent between types of constraints? What about between database vendors?

It turns out that it’s neither consistent between types, nor is the way in which it’s inconsistent consistent between database vendors.

                                 MySQL 5.5        PostgreSQL 9.2
  Unique constraint names        Local to table   Global to DB
  Foreign Key constraint names   Global to DB     Local to table

From MySQL:

mysql> CREATE TABLE foo (x INTEGER, 
    ->   CONSTRAINT foo_unique UNIQUE(x));
Query OK, 0 rows affected (0.01 sec)

-- Second UNIQUE constraint CAN be created with same name
mysql> CREATE TABLE foo2 (x INTEGER, 
    ->   CONSTRAINT foo_unique UNIQUE(x));
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE foo3 (x INTEGER, 
    ->   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));
Query OK, 0 rows affected (0.01 sec)

-- Second FOREIGN KEY constraint CANNOT be created with same name
mysql> CREATE TABLE foo4 (x INTEGER, 
    ->   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));
ERROR 1005 (HY000): Can't CREATE TABLE 'test.foo4' (errno: 121)

From PostgreSQL:

postgres=# CREATE TABLE foo (x INTEGER, 
postgres-#   CONSTRAINT foo_unique UNIQUE (x));

-- Second UNIQUE constraint CANNOT be created with same name
postgres=# CREATE TABLE foo2 (x INTEGER,
postgres-#   CONSTRAINT foo_unique UNIQUE (x));
ERROR:  relation "foo_unique" already exists

postgres=# CREATE TABLE foo3 (x INTEGER,
postgres-#   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));

-- Second FOREIGN KEY constraint CAN be created with same name
postgres=# CREATE TABLE foo4 (x INTEGER,
postgres-#   CONSTRAINT foo_fk FOREIGN KEY (x) REFERENCES foo(x));

MySQL Performance: MyISAM or InnoDB?

Friday, November 16th, 2012

A friend of mine posed the following question recently:

Do you have an opinion over using myisam or innodb for a read heavy MySQL DB?

I do indeed have an opinion on this topic. It is thus: Use InnoDB for everything, use MyISAM for nothing.

In some speed tests, it might appear that MyISAM is faster for read-heavy workloads. However, that only translates to reality when the database is doing nothing apart from read-only work (e.g. a data-warehouse system generating reports during the day, and importing new transaction data in the night).

In most environments, the speed you get is measured by the amount of concurrency you can support, not by the throughput of a single transaction running without anything else running on the machine. InnoDB allows reads and writes simultaneously (row-level locking; multi-version concurrency control), whereas MyISAM just locks the whole table as soon as you do anything.

For example, in a system where one user is reading their data, another user is writing their data, and a third user is running reports over the entire database, the following will happen with MyISAM:

  • As soon as e.g. a minute-long report query starts, nothing else can write to the database until it ends. This means that all users who want to write to the database will simply wait those minutes. Further, if it’s a web application, as each waiting user consumes an Apache child process, further read-only requests will also fail once Apache runs out of available child processes.
  • If you have a number of individual users doing small write queries and small read queries, the writes will wait on the reads and vice versa. This means you will have low concurrent throughput.

With InnoDB, on the other hand, writes lock only the rows involved, and only against other writes.

  • The minute-long report query doesn’t lock anything. Other users can read and write anything they like.
  • Multiple individual users can write to their own rows concurrently without waiting on each other.
  • Even after a “commit” has been issued for a change, the old value is stored. This means that the minute-long report sees the values of the data which were valid at the point in time the statement or transaction started, even if, during the execution of the statement, another transaction altered the data and committed that change (“consistent read”).

And then there is the long list of non-performance-related advantages of InnoDB over MyISAM: the fact that InnoDB allows you to use transactions, allows foreign key constraint checking, and various other things, which you’ll be wanting to use.
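
The storage engine is chosen per table, and an existing MyISAM table can be converted in place (note the ALTER rewrites the whole table, so it can take a while on large tables). A sketch with a hypothetical table:

```sql
-- New tables: state the engine explicitly rather than relying on the server default
CREATE TABLE account (
  id      INT PRIMARY KEY,
  balance DECIMAL(10,2)
) ENGINE=InnoDB;

-- Existing tables: convert from MyISAM to InnoDB
ALTER TABLE account ENGINE=InnoDB;
```
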

BTW if you’re going to do a migration, here is what a company I worked for once did:

Rather than doing a “turn off; migrate; turn on” scenario, they set up a new server with the new storage engine and turned on replication. A few weeks later, all the data had been replicated to the new server. They then turned off the old server, made sure the two instances were in sync, and turned on the new server, which reduced the downtime by quite a bit. That also works from MySQL 4.x to 5.0, and 5.0 to 5.1; indeed, going from 4.x MyISAM to 5.1 InnoDB involved a chain of databases all replicating from one another.

Thursday, July 7th, 2011

The MySQL function curtime has (at least) the following issues:

  1. The function returns the current time as a number, i.e. the time quarter past eight in the evening is returned from the function as the number 201500. (So if you subtract one from such a number you get 201499, which is not a valid time.)
  2. The function only returns this if you use it in an “integer context”, i.e. (x+0) causes x to be evaluated to an integer. (Otherwise it produces a string in a different format.)
  3. At some point the function was changed to return not an integer but a decimal with 6 decimal places. As far as I can tell these 6 decimal places are always zero; the example in the documentation also supports this experience.
  4. There is no way to specify a timezone. When going from a point in time (e.g. unix time or simply the concept of “now”) to/from an external format (hours, minutes, year, …) you must always specify a timezone. The concept of the “default timezone” should just be eradicated and deleted from the world. If I had a time-machine, the first thing I would do is go back and prevent this concept’s creation. (The user must see the data in their timezone, which on a web application needn’t be the same as the server’s. And if you are writing a desktop application, you need to decide explicitly to use the user’s timezone, as not all data might be for the user, e.g. data from a server or database.)
  5. I don’t really think the name of the function is very good (saving characters by abbreviating the name, but why?)
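
The first two points can be seen directly in a client session. A sketch (exact output depends on the server version and the current time):

```sql
-- String context: formatted time
SELECT CURTIME();         -- e.g. '20:15:00'

-- Integer context, forced by arithmetic: digits mashed together
SELECT CURTIME() + 0;     -- e.g. 201500 (or 201500.000000 on newer versions)

-- Arithmetic on that number is meaningless
SELECT CURTIME() + 0 - 1; -- e.g. 201499, which is not a valid time
```
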

Why would you ever use such a function? Why would you ever add it to your database product? *Shakes head*

I welcome feedback of the form “this is an amazing function, I’ve used it in my application and it fitted my needs exactly”. I do really welcome this. Please don’t hold back.

Related: curdate, now.

Circular dependencies on Foreign Key constraints (Oracle)

Tuesday, November 9th, 2010

Sometimes it’s valid to have tables that reference one another. For example:

  • A photo table stores photos, each of these photos belongs in a folder
  • A folder table stores folders, but each folder has a mandatory (not null) “cover” photo

So both tables reference one another, but how do you create the tables, and how do you insert data?

  • If you try and create either of the tables, the other would need to exist in order to create the table with the foreign key constraint. The solution is to firstly create both tables without the foreign key constraints and secondly “alter” the tables to add the foreign key constraints.
  • If you try and insert a photo, the folder would need to exist; if you try and insert a folder, then its cover photo would already need to exist. The solution here is to create the foreign keys as “deferrable initially deferred”; then the foreign key constraints are checked at commit-time rather than insert-time. You can insert the rows in any order, and only when you commit will the system check that the constraints are not violated.
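
Both steps can be sketched in Oracle SQL (table, column, and constraint names are hypothetical):

```sql
-- Step 1: create both tables without the cross-referencing foreign keys
CREATE TABLE folder (id NUMBER PRIMARY KEY, cover_photo_id NUMBER NOT NULL);
CREATE TABLE photo  (id NUMBER PRIMARY KEY, folder_id      NUMBER NOT NULL);

-- Step 2: add the foreign keys, deferred until commit
ALTER TABLE photo  ADD CONSTRAINT photo_folder_fk
  FOREIGN KEY (folder_id) REFERENCES folder (id)
  DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE folder ADD CONSTRAINT folder_cover_fk
  FOREIGN KEY (cover_photo_id) REFERENCES photo (id)
  DEFERRABLE INITIALLY DEFERRED;

-- Rows can now be inserted in either order within one transaction
INSERT INTO folder (id, cover_photo_id) VALUES (1, 10);
INSERT INTO photo  (id, folder_id)      VALUES (10, 1);
COMMIT;  -- both constraints are checked here
```
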

MySQL “lock tables” does an implicit commit

Thursday, May 7th, 2009

The MySQL “lock tables” (and “unlock tables”) command has a nasty side-effect: it implicitly commits the current transaction.

This caused a bug in production code. A normally irrelevant temporary error, which should have caused a rollback, only rolled back to the last “unlock tables” command due to its implicit commit, leaving the database in an inconsistent state. When the request was retried, the software found the database in a state it wasn’t expecting and couldn’t process the request, even though the irrelevant temporary error had by then been fixed. So the request was retried indefinitely, and the result was that a lot of orphan invoices were created in the database. That alone wouldn’t have been a problem, but invoice numbers were consumed by these unused invoices, and Austrian law dictates that invoice numbers must be sequential, i.e. the numbering system shouldn’t have huge holes in it. Or, as in our case, a number of huge holes: the system was up and running for other requests during this bug (a feature, as one faulty request shouldn’t take the whole system down), so other invoices were correctly generated at the time, and their invoice numbers became small islands in the sea of unused/invalid invoice numbers produced by the bug.

I wish this had been more obviously documented (MySQL 5.1 lock tables documentation). It is there in the bullet points (and the comments!), but it’s hardly well emphasized, and I missed it.

The trouble is that the “lock tables” command comes from the stone-age of MySQL before it supported transactions at all.

This has been “fixed” in MySQL 6.0 (MySQL 6.0 changelog), via the introduction of new options “in share mode” / “in exclusive mode”.

In the meantime, if you need to lock a table (which one needs to do in order to ensure table-wide sequential numbering, as a primary key isn’t designed for that), I would recommend creating a table with a name like “mutex”, creating a row for each type of table-lock you might like to acquire, and then doing a “select for update” on the row corresponding to your table. This creates an in-transaction row-level lock, doesn’t do an implicit commit, and the lock is automatically released when the transaction ends.
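
The mutex-table idea above can be sketched as follows (table and row names are hypothetical):

```sql
-- One row per logical lock; must be InnoDB for row-level locking
CREATE TABLE mutex (name VARCHAR(50) PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO mutex (name) VALUES ('invoice_numbering');

-- In the transaction that needs the table-wide lock:
START TRANSACTION;
SELECT * FROM mutex WHERE name = 'invoice_numbering' FOR UPDATE;
-- ... compute the next sequential invoice number and insert the invoice ...
COMMIT;  -- the row lock is released here; no implicit commit occurred
```

Unlike “lock tables”, this keeps the whole operation inside one transaction, so a rollback really rolls everything back.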

“Ask Tom” Oracle Seminar in Prague

Tuesday, March 10th, 2009

Today and tomorrow I am at an Ask Tom Live Seminar in Prague. Tom Kyte works for Oracle and has written many books on Oracle. For me he’s a bit of a “star”, so when I heard he was talking in Europe, I had to come. (But is that wrong? Normal people are in to stars like Robbie Williams surely, not Oracle experts.)

Wow, there is evidently so much I don’t know about Oracle. I mean, the solutions he is presenting concerning Oracle 11g are solutions to problems in features I’ve never even used, such as partitioning.

Some fun things I learnt:

  • In Oracle 6, “SQL Plus” was offered as a for-cost extra, as it was considered so good! (Apparently there was a previous tool, and it was worse.) SQL Plus still doesn’t support e.g. the “up” arrow key to recall the previous command.
  • In Oracle 3 “insert into x select from x” would be an infinite statement as it would constantly re-select then insert the data from x that it had just inserted. This was fixed in Oracle 4. 
  • Oracle 11 can create “virtual columns”, i.e. “create table x(a number, b as (a+1))”. You can then use these virtual columns for indexes (like functional indexes), foreign key constraints, etc. A statement like “select .. from x where a+1=?” will be rewritten as “where b=?” which will then use the index. Audience member asked “can I write an update statement to update b?”
  • You can do “select …. as of <date>” and if the data is still available in the rollback segment, it will be returned.
  • Adding a column is now always instantaneous, i.e. “alter table x add y number default 4” will not alter the data in the table; when a row created before the alter statement is read, the column is added with its default value. (But presumably altering the column’s default value later then causes the long write?)
  • Apparently Oracle was the first RDBMS. Well that’s what Tom Kyte claims anyway.
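
Two of the features above can be sketched in Oracle 11g SQL (table and index names are hypothetical):

```sql
-- Virtual column, expanded from the bullet's example
CREATE TABLE x (a NUMBER, b AS (a + 1));
CREATE INDEX x_b ON x (b);
-- A query "WHERE a + 1 = :v" can be rewritten by the optimizer to "WHERE b = :v",
-- which then uses the index on b

-- Flashback query against data still held in the rollback/undo segment
SELECT * FROM x AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '10' MINUTE);
```
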

P.S. I should have brought my Oracle One on One with me for him to autograph… OK that would have been ridiculous.

What you can’t deduce by looking at primary keys

Wednesday, January 28th, 2009

There is often a misunderstanding about primary keys in database tables. Primary keys identify rows uniquely, so the only property they have to obey is that they must be unique.

Primary keys don’t have to be numbers. But if they are, here are a few properties these numbers don’t have to have:

  1. Numerical PKs don’t indicate ordering, i.e. which row was inserted first.
  2. Looking at the highest PK value doesn’t indicate the number of rows created (cardinality).
  3. There is no guarantee that there are no “holes” in the numerical sequence: e.g. rows 11 and 13 exist, but row 12 doesn’t.

In addition to PKs not having to have these properties, there are good reasons why auto-generated PKs actually don’t have them in practice.

Many types of systems (for example websites, or the central transaction database of a supermarket) measure their performance not by the duration of a particular transaction (0.10s or 0.08s doesn’t make a big difference) but measure their performance by concurrency, which influences the throughput, i.e. if you can handle 100 or 1,000 transactions per second.

If all data and locks which are system-wide are stored in the database, then all other parts of the system (web servers, application servers, cash tills) can be independent of one another. If the performance of the system is limited by one of these non-database parts, one can improve the throughput of the system (although not the duration of individual transactions) by increasing the concurrency, i.e. simply adding more of these non-database parts in parallel. (Surprisingly, this is indeed the case in one system I’m building, concerning simulation of many different mobile phone tariffs for a particular phone usage scenario: the slowest part by far is the application server.)

Increasing the concurrency in the database, on the other hand, is not such an easy task. One can’t just split the data up into different databases: unique constraints must be enforced over all data, referential integrity (foreign keys) must be enforced over all data, and locks must be system-wide (e.g. “does user X have enough money? first lock, then check, then take away money, then unlock”).

But there are techniques available:

  • Is the limiting factor the database CPU? If so, more cores can be added. If that limit is reached, an expensive cluster database can be used. But, in my opinion, the database should be storing and reading data, not doing CPU-intensive processing. If the processing is done by the application servers, then they can easily and cheaply be scaled, as stated above.
  • Is the limiting factor the disk? One can add more disks, which can physically read/write at the same time. This is really the (concurrency-reducing) contention point on most databases; and as the database is the contention point of the whole system, and concurrency is the objective of scalability, disk contention is the main focus of my work when I do performance consultancy.

So, given that the main contention point is the database, and that multiple parallel CPUs and multiple parallel disks are in use, it would be unfortunate to make e.g. one disk wait for the other to finish. It should be possible that both disks are in use at the same physical moment in time.

And to do this, one needs to make sure that there is no reason why one transaction (e.g. a purchase by a user) needs to wait on another transaction (e.g. a different purchase by a different user). One needs to use locking where appropriate, but no more. (E.g. lock the user’s balance for the duration of a transaction, so that if two items are charged to the same account at the same time one doesn’t get “check balance for transaction A -> OK; check balance for transaction B -> OK; subtract A; subtract B”, only to find the user had enough money for either A or B but not A+B together. But two transactions for two different users can be totally independent.)

Imagine if primary keys could indicate either ordering or cardinality, or weren’t allowed to have holes. This would have the following consequences:

  • If one transaction (e.g. purchase) needed to create a new row, it would need a new primary key value. It would need to acquire that primary key value, do some things with it, create the row, maybe do other things e.g. write some logs, and then “commit” the transaction. If, during this time, a second transaction (e.g. a purchase by a different user) were to wish to execute the same purchase procedure, it would also need to acquire a new primary key value. It wouldn’t be able to get this value, until the database knew if the first purchase would “commit” or “rollback”. So the execution of one transaction would have to wait on the execution of another transaction. One disk would be sitting waiting, while the second disk was at work.
  • It is sometimes necessary to retrieve a primary key value before creating a new row. Only the database knows the current highest value in the table. The “retrieve primary key” request would have to send a request to the database, and wait for the reply. However, if the only requirement is uniqueness, the first such request can retrieve e.g. 20 PK values from the database, and the subsequent 19 requests can use the rest of the numbers. If the client crashes, those PK values remain unused.
  • What is the “ordering” of two transactions which are written to separate disks, processed by separate CPUs, and which happen at exactly the same time?
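
The batched-PK idea in the second point is exactly what e.g. Oracle sequences provide. A sketch with a hypothetical sequence name:

```sql
-- Each fetch from the sequence hands the caller a cached batch of 20 values;
-- values unused at crash/restart time are simply lost (hence the holes)
CREATE SEQUENCE purchase_seq CACHE 20;

-- Get a new PK value: uniqueness is guaranteed,
-- ordering and gaplessness are not
SELECT purchase_seq.NEXTVAL FROM dual;
```
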

So primary keys are just arbitrary data guaranteeing uniqueness alone. And that’s good, because if they guaranteed more, there would be much more inter-transaction contention, and one couldn’t utilize one’s multi-server multi-disk multi-CPU environment: the speed of the system would be limited to the speed of the slowest transaction.

Automatic reconnect from Hibernate to MySQL

Friday, October 24th, 2008

Yesterday I spent the entire day getting the following amazing state-of-the-art not-ever-done-before feature to work:

  • Executing a SQL statement from my program

Because, as everyone knows, I don’t suffer from NIHS, I used the standard object-relational mapping software Hibernate, with the standard programming language Java, using the standard web-application server Tomcat, and now the standard “connection pooling” software C3P0 (which I didn’t know I needed in order to execute a SQL statement, see below…)

The program is, in fact, already complete, and is nearly deployed. On the test server it works fine, and even on the (future) live server it worked fine. But the customer noticed that if one installed it one day, the next day it didn’t work. I’ve had such symptoms many times before, so I knew immediately what was going on:

  • MySQL drops a connection after 8 hours (configurable)
  • The software is used during the day, but isn’t used during the night, therefore the connection times out in the night
  • Therefore in the morning, the program one installed the day before no longer works

Perhaps I exaggerated above the simplicity of what I was really trying to achieve. It should really be expressed as the following:

  • Executing a SQL statement from my program, even if a long time has passed since the last one was executed

But that amounts to the same thing in my opinion! It isn’t rocket science! (But in fact it is, see below…)

An obvious non-solution is to increase the “connection drop after” time on the MySQL server from 8 hours to e.g. “2 weeks” (“wait_timeout” in “my.cnf”). But software has got to be capable of reconnecting after a connection drops. The database server may need to be reset, it may crash, it may suffer hardware failure, etc. If, every time one restarts one particular service, one has to restart a thousand dependent services (maybe some Java, some Perl, some PHP, some robots, ..) and then maybe restart services which are dependent on them, that’s a maintenance nightmare. So the software has to be altered to be able to handle connection drops automatically, by reconnecting. Once the software has been so altered, one no longer needs to alter the “wait_timeout” on the server.

The error was:

org.hibernate.util.JDBCExceptionReporter: The last packet successfully received from the server was 56697 seconds ago. The last packet sent successfully to the server was 56697 seconds ago, which is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.

Quite a helpful error message, don’t you think? But

  • I’m not going to increase “wait_timeout” as discussed above,
  • “testing validity” in the application – well I was using standard software Hibernate which should take care of this sort of thing automatically, but evidently wasn’t
  • and we were already using ?autoReconnect=true in the JDBC URL (this evidently wasn’t working).

I figured I really needed to get to the bottom of this. Googling just showed (many) people with the same problem, but no solutions. The only way to get to the bottom of software is to read the source. (It has been the way to resolve issues of simple things simply not working in MySQL before.)

I stopped looking in the MySQL source for why “autoReconnect=true” didn’t work when I saw the following text in the source describing the autoReconnect parameter:

The use of this feature is not recommended, because it has side effects related to session state and data consistency

I have no idea what particular side-effects are meant here? I guess that’s left as an exercise for the reader, to test their imagination.

And anyway, I figure that a reconnect facility belongs in the “application” (Hibernate in my case) as opposed to database-vendor-specific code. Exactly the same logic would be necessary if one were connecting to PostgreSQL or Oracle, so it doesn’t make sense to build it into the database driver.

So then I looked in the Hibernate code. To cut a long story short, the basic connection mechanism of Hibernate (as specified in all the introductory books and websites, which is probably how most people learn Hibernate) doesn’t support reconnecting; one has to use the C3P0 connection pool (which itself didn’t always support reconnecting).

(I don’t want to use container/Tomcat-managed connections, as I have some command-line batch jobs, and I don’t want to use different code for the batch jobs than for the web application. Another company defined servlets which were “batch job servlets”, executed via a “wget” entered into crontab, in order to get the use of container-managed connections, but that seems too complex a solution for my taste.)

But once one uses C3P0, the default behavior seems to be that, if the connection is dead when a request is processed, the user sees an error; but at least it reconnects for the next request. I suppose one error is better than infinite errors, but still not as good as zero errors. It turns out one needs the option testConnectionOnCheckout, which the documentation doesn’t recommend because testing the connection before each request might lead to lower performance. Surely the software firstly has to work; only secondly does it have to work fast.

So, to summarize, to get a connection to “work” (which I define as including handling dropped connections by reconnecting without error): In “hibernate.cfg.xml”:

<!-- hibernate.cfg.xml -->
<property name="c3p0.min_size">5</property>
<property name="c3p0.max_size">20</property>
<property name="c3p0.timeout">1800</property>
<property name="c3p0.max_statements">50</property>
<property name="connection.provider_class">org.hibernate.connection.C3P0ConnectionProvider</property>
<!-- no "connection.pool_size" entry! -->

Then create a file “” which must be in the root of the classpath (i.e. there is no way to override it for particular parts of the application), containing the option described above:

c3p0.testConnectionOnCheckout=true
Amazing that that stuff doesn’t just work out of the box. Programming the solution myself at Uboot took, I think, one line, and I’m sure it’s not more in WebTek either.

That was an amazing amount of effort and research to get the simplest thing to work. Now if only this project had been paid by the hour…..

[Update 28 May 2009] More Java hate today. Started a new application, deployed it, and it didn’t work: in the morning, the application was down. Reason: the new project used Hibernate 3.3, and upgrading from 3.2 to 3.3 requires the “connection.provider_class” property to be set. Previously the presence of “c3p0.max_size” was enough.