Archive for the ‘Databases’ Category

MySQL “lock tables” does an implicit commit

Thursday, May 7th, 2009

The MySQL “lock tables” (and “unlock tables”) command has a nasty side-effect, it implicitly commits the current transaction.

This caused a bug in production code (a normally irrelevant temporary error, which should have normally caused a rollback, only rolled back to the last “unlock tables” command due to its implicit commit, and thus left the database in an inconsistent state, meaning that when the request was retried the software found the database in a state it wasn’t expecting and thus couldn’t process the request even though the irrelevant temporary error had now been fixed. So the request got retried indefinitely, and the result was that a lot of orphan invoices were created in the database, which wouldn’t have been a problem but invoice numbers were then consumed by these unused invoices, and Austrian law dictates that invoice numbers must be sequential, i.e. the numbering system shouldn’t have huge holes in it. Or, as in our case, a number of huge holes, as the system was up and running for other requests during this bug (a feature, that one faulty request shouldn’t take the whole system down), so other invoices did get correctly generated at the time, and their invoice numbers were then small islands in the sea of unused/invalid invoice numbers produced by the bug.)

I wish this had been more obviously documented (MySQL 5.1 lock tables documentation). It is there in the bullet points (and the comments!), but it’s hardly well emphasized, and I missed it.

The trouble is that the “lock tables” command comes from the stone-age of MySQL before it supported transactions at all.

This has been “fixed” in MySQL 6.0 (MySQL 6.0 changelog), via the introduction of a new options ”in share mode” / “in exclusive mode”.

In the mean-time, if you need to lock a table (which one needs to do in order to ensure table-wide sequential numbering, as a primary key isn’t designed for that) I would recommend creating a table with a name like “mutux” and creating a row for each type of table-lock you might like to acquire, and then “select for update” the row corresponding to your table. This creates an in-transaction row-level lock, doesn’t do an implicit commit, and the lock will be automatically released when the transaction ends.

“Ask Tom” Oracle Seminar in Prague

Tuesday, March 10th, 2009

Today and tomorrow I am at an Ask Tom Live Seminar in Prague. Tom Kyte works for Oracle and has written many books on Oracle. For me he’s a bit of a “star”, so when I heard he was talking in Europe, I had to come. (But is that wrong? Normal people are in to stars like Robbie Williams surely, not Oracle experts.)

Wow there is so much I don’t know about Oracle evidently. I mean the solutions that he is presenting concerning Oracle 11g, are solutions to problems in features I’ve never used, such as partitioning, etc.

Some fun things I learnt:

  • In Oracle 6 “sql plus” used to be offered as a for-cost extra, as it was considered so good! (Apparently there was a previous tool, and it was worse.) Sql Plus still doesn’t support e.g. the “up” arrow key to get the previous command.
  • In Oracle 3 “insert into x select from x” would be an infinite statement as it would constantly re-select then insert the data from x that it had just inserted. This was fixed in Oracle 4. 
  • Oracle 11 can create “virtual columns”, i.e. “create table x(a number, b as (a+1))”. You can then use these virtual columns for indexes (like functional indexes), foreign key constraints, etc. A statement like “select .. from x where a+1=?” will be rewritten as “where b=?” which will then use the index. Audience member asked “can I write an update statement to update b?”
  • You can do “select …. as of <date>” and if the data is still available in the rollback segment, it will be returned.
  • Adding a column is now always instantaneous, i.e. “alter table x add y number default 4″ will not alter the data in the table, but when a row is read which was created before the alter table statement happened, the column will be added with its default value. (But presumably altering the column’s default value then causes the long write?)
  • Apparently Oracle was the first RDBMS. Well that’s what Tom Kyte claims anyway.

P.S. I should have brought my Oracle One on One with me for him to autograph… OK that would have been ridiculous.

What you can’t deduce by looking at primary keys

Wednesday, January 28th, 2009

There is often a misunderstanding about primary keys in database tables. Primary keys identify rows uniquely, so the only property they have to obey is they must be unique.

Primary keys don’t have to be numbers. But if they are, here are a few properties these numbers don’t have to have:

  1. Numerical PKs don’t indicate ordering, i.e. which row was inserted first.
  2. Looking at the highest PK value doesn’t indicate the number of rows created (cardinality).
  3. There are not guaranteed not to be “holes” in the numerical sequence: e.g. row 11 and row 13 exist, but row 12 doesn’t.

In addition to PKs not having to have these properties, there are good reasons why auto-generated PKs actually don’t have them practice.

Many types of systems (for example websites, or the central transaction database of a supermarket) measure their performance not by the duration of a particular transaction (0.10s or 0.08s doesn’t make a big difference) but measure their performance by concurrency, which influences the throughput, i.e. if you can handle 100 or 1,000 transactions per second.

If all data and locks which are system-wide are stored on the database, then all other parts of the system (web servers, application servers, cash tills) can be independent of one another. If the performance of system were limited by one of these non-database parts, one can improve the throughput of the system (although not the duration of individual transactions) by increasing the concurrency, i.e. simply adding more of these non-database parts in parallel. (Surprisingly, the case in one system I’m building, concerning simulation of many different mobile phone tariffs for a particular phone usage scenario, this is indeed the case: the slowest part by far is the application server)

Increasing the concurrency in the database, on the other hand, is not such an easy task. One can’t just split the data up into different databases: unique constraints must be enforced over all data, referential integrity (foreign keys) must be enforced over all data, locks must be system-wide (e.g. “does user X have enough money? first lock, then check, then take away money, then unlock”)

But there are techniques available:

  • Is the limiting factor the database CPU? If so, more cores can be added. If that limit is reached, an expensive cluster-database can be used. But, in my opinion, database should be storing and reading the database, and not doing CPU-intensive processing. If the processing is done by the application servers, then they can easily and cheaply be scaled, as stated above.
  • Is the limiting factor the disk? One can add more disks, which can physically read/write at the same time. And this is really the (concurrency-reducing) contention point on most databases, and as databases are the contention point of the whole system, and as concurrency is the objective of scalability: Disk contention is the main focus of my work when I do performance consultancy.

So, given that the main contention point is the database, and that multiple parallel CPUs and multiple parallel disks are in use, it would be unfortunate to make e.g. one disk wait for the other to finish. It should be possible that both disks are in use at the same physical moment in time.

And to do this, one needs to make sure that there is no reason why one transaction (e.g. purchase by a user) needs to wait on another transaction (e.g. different purchase by a different user). One needs to use locking where appropriate – but no more (e.g. lock the user’s balance for the duration of a transaction so if two items are charged the same account at the same time so one doesn’t get “check balance transaction A -> OK; check balance transaction B -> OK; subtract A; subtract B” and find the user had enough money for either A or B but not A+B together – but two transactions for two different users can be totally independent).

Imagine if primary keys could indicate either ordering or cardinality, or weren’t allowed to have holes. This would have the following consequences:

  • If one transaction (e.g. purchase) needed to create a new row, it would need a new primary key value. It would need to acquire that primary key value, do some things with it, create the row, maybe do other things e.g. write some logs, and then “commit” the transaction. If, during this time, a second transaction (e.g. a purchase by a different user) were to wish to execute the same purchase procedure, it would also need to acquire a new primary key value. It wouldn’t be able to get this value, until the database knew if the first purchase would “commit” or “rollback”. So the execution of one transaction would have to wait on the execution of another transaction. One disk would be sitting waiting, while the second disk was at work.
  • It is sometimes necessary to retrieve a primary key value before creating a new row. Only the database knows the current highest value in the table. The “retrieve primary key” request would have to send a request to the database, and wait for the reply. However, if the only requirement is uniqueness, the first such request can retrieve e.g. 20 PK values from the database, and the subsequent 19 requests can use the rest of the numbers. If the client crashes, those PK values remain unused.
  • What is the “ordering” of two transactions which are written to separate disks, processed by separate CPUs, and which happen at exactly the same time?

So primary keys are just arbitrary data indicating uniqueness alone. And that’s good, because if they didn’t, there would be much more inter-transaction contention, and one couldn’t utilize ones multi-server multi-disk multi-CPU environment: the speed of the system would be limited to the speed of the slowest transaction.

Automatic reconnect from Hibernate to MySQL

Friday, October 24th, 2008

Yesterday I spent the entire day getting the following amazing state-of-the-art not-ever-done-before feature to work:

  • Executing a SQL statement from my program

Because, as everyone knows, I don’t suffer from NIHS, I used standard object-relational mapping software Hibernate, with a standard programming language Java, using the standard web-application server Tomcat, and now I am using the standard “connection pooling” software C3P0 (which I didn’t know I needed to execute a SQL statement, see below..)

The program is, in fact, already completed, and is nearly deployed. On the test server it works fine and even on the (future) live server it worked fine. But the customer noticed that if one installed it one day, the next day it didn’t work. I’ve had such symptoms many times before, so I know immediately what was going on:

  • MySQL drops a connection after 8 hours (configurable)
  • The software is used during the day, but isn’t used during the night, therefore the connection times out in the night
  • Therefore in the morning, the program one installed the day before no longer works

Perhaps I exaggerated the simplicity above of what I was really trying to achieve. It should really be expressed as the following:

  • Executing a SQL statement from my program, even if a long time has passed since the last one was executed

But that amounts to the same thing in my opinion! It isn’t rocket science! (But in fact is, see below..)

A obvious non-solution is to increase the “connection drop after” time on the MySQL server from 8 hours to e.g. “2 weeks” (“wait_timeout” in “mysql.cnf”). But software has got to be capable of reconnecting after a connection drops. The database server may need to be reset, it may crash, it may suffer hardware failure, etc. If, every time one restarts one particular service, one has to restart a thousand dependent services (maybe some Java, some Perl, some PHP, some robots, ..) and then maybe restart services which are dependent on them – that’s a maintenance nightmare. So the software has to be altered to be able to handle connection drops automatically, by reconnecting. Once the software has been so altered, one no longer needs to alter the “wait_timeout” on the server.

The error was:

org.hibernate.util.JDBCExceptionReporter: The last packet successfully received from the server was 56697 seconds ago. The last packet sent successfully to the server was 56697 seconds ago, which  is longer than the server configured value of ‘wait_timeout’. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property ‘autoReconnect=true’ to avoid this problem.

Quite a helpful error message, don’t you think? But

  • I’m not going to increase “wait_timeout” as discussed above,
  • “testing validity” in the application – well I was using standard software Hibernate which should take care of this sort of thing automatically, but evidently wasn’t
  • and we were already using ?autoReconnect=true in the JDBC URL (this evidently wasn’t working).

I figured I really needed to get to the bottom of this. Googling just showed (many) people with the same problem, but no solutions. The only way to get to the bottom of software is to read the source. (It has been the way to resolve issues of simple things simply not working in MySQL before.)

I stopped looking in the MySQL source for why “autoReconnect=true” didn’t work when I saw the following text in the source describing the autoReconnect parameter:

The use of this feature is not recommended, because it has side effects related to session state and data consistency

I have no idea what particular side-effects are meant here? I guess that’s left as an exercise for the reader, to test their imagination.

And anyway, I figure that a reconnect-facility belongs in the “application” (Hibernate in my case) as opposed to in database-vendor specific code. I mean the exactly the same logic would be necessary if one were connecting to PostgreSQL or Oracle, so it doesn’t make sense to build it in to the database driver.

So then I looked in the Hibernate code. To cut a long story short, the basic connection mechanism of Hibernate (as specified in all the introductory books and websites, which is probably how most people learn Hibernate) doesn’t support reconnecting, one has to use H3C0 connection pool (which itself didn’t always support reconnecting)

(I don’t want to use container/Tomcat-managed connections, as I have some command-line robots which do some work, and I don’t want to use different code for the robots as the web application. Although another company defined Servlets which did “robot work”, and the robot was just a “wget” entered into Tomcat – to get the user of container-managed connections – but this seems a too-complex solution to my taste..

But once one’s used H3C0, the default behavior seems to be that to process a request, if the connection is dead then the user sees and error – but at least it reconnects for the next request. I suppose one error is better than infinite errors, but still not as good as zero errors. It turns out one needs the option testConnectionOnCheckout - which the documentation doesn’t recommend because testing the connection before a request might lead to lower performance. Surely the software firstly has to work, only secondly does it have to work fast.

So, to summarize, to get a connection to “work” (which I define as including handling dropped connections by reconnecting without error): In “hibernate.cfg.xml”:

<!-- hibernate.cfg.xml -->
<property name="c3p0.min_size">5</property>
<property name="c3p0.max_size">20</property>
<property name="c3p0.timeout">1800</property>
<property name="c3p0.max_statements">50</property>
<property name="connection.provider_class">
   org.hibernate.connection.C3P0ConnectionProvider</property>
<!-- no "connection.pool_size" entry! -->

Then create a file “c3p0.properties” which must be in the root of the classpath (i.e. no way to override it for particular parts of the application):

# c3p0.properties
c3p0.testConnectionOnCheckout=true

Amazing, that that stuff doesn’t just work out of the box. Programming the solution myself in Uboot took, I think, 1 line, and I’m sure it’s not more in WebTek either.

That was an amazing amount of effort and research to get the simplest thing to work. Now if only this project had been paid by the hour…..

[Update 28 May 2009] More Java hate today. Starting a new application, deployed it, and it didn’t work. In the morning, the application was down. Reason: The new project used Hibernate 3.3, and upgrade from 3.2 to 3.3 requires the “connection.provider_class” property to be set. Previously the presence of “c3p0.max_size” was enough.

Programming with unique constraints

Friday, June 13th, 2008

If you’re using a remote database system, your application doesn’t have access to all the data at any point in time. (I.e. you just load and save the rows you’re interested in in a particular transaction). Therefore if you want to do some database-wide operation, you need to ask the database to do it.

When you want to enforce uniqueness, for example across a whole table (for example a document name needs to be unique), or across a particular part of a table (for example each user must have documents with unique names), you need to do that in the database.

There is only one acceptable way to do this with a SQL database:

  1. Insert a new row
  2. If the insert succeeds, there wasn’t a row there before, and now there is
  3. If the insert throws a unique constraint violation, the row is already there
  4. If you want to update the row (i.e. an “insert or update” operation to maintain a “lazy singleton” in the database), you can update the row with safety after the unique constraint violation, as you can be certain the row is already there.

The following methods are all not acceptable:

  • Do a “select” to find out how many rows there are. If there aren’t any, do an “insert”. However someone may have inserted a row between your “select” and your “insert”.
  • Do an “update” and if the database says that 0 rows have been updated, do an “insert”. Again, someone may have inserted a row between your “update” and the “insert”.
  • Do a “select for update” statement (Oracle, Postgres, InnoDB) to check that there aren’t any rows while creating a lock, and then do an insert. However that statement only locks the rows it returns, so if it doesn’t return any rows, it doesn’t create any locks, so you still can’t be certain that no one has inserted a row between the “select” and the “insert”.
  • Lock the whole table and do one of the above. This works, but it means that all write access is “serialized” i.e. happens after one another. Any other operation, writing something completely irrelevant, will now also have to wait until the end of your transaction, whereas it shouldn’t have. This reduces concurrency.

The way I program this is the following. On the “insert” statement, I catch the database error, and see if it’s a “unique constraint violation”-type error. If it is, I throw an (unchecked) Exception. The calling code can catch that and do something with it (or not, if the statement should not generate such an error, in which case it will propogate to the main loop like any other database exception). I have had the pleasure of introducing this to easyname.eu and now also the pleasure of introducing this to the WebTek framework.

It is extremely frustrating working with big complex frameworks, whose usage is much more complex than just writing SQL manually, and not being able to do the above properly.

  • Hibernate clearly states in its documentation that if any database error occurs, the Session (main object managing all persistence) must be destroyed as its state will be out-of-sync with the database. But there is no way other than the above to do this sort of check (as far as I know).
  • OptimalJ generates code that, if a database error occurs, sets the transaction to rollback. Including any other work you may have done.

I mean checking unique constraints is something every database application needs, so the fact it’s not supported by major frameworks is just unbelievable.

Oracle, Nulls and the empty string

Thursday, June 12th, 2008

Oracle has an unusual feature, which attracts it a lot of criticism. If you try to insert the empty string into a column marked “not null”, you get an error. The empty string is treated the same as “null” by Oracle.

This is different to programming languages (and indeed other databases, at least MySQL), which means one has to be careful not to make a mistake when using a programming language to talk to the database. That it’s different from other systems is the main reason for the criticism.

However, how many times have you wanted the following to be allowed in your data schema, for example on a “first name” column:

  • Writing nulls is not allowed
  • Writing the empty string is allowed

I just wrote a program and the front-end framework helpfully noticed that the field was “not null” in the database and gave the user an error in the front-end if the field was empty. However when I altered the code slightly, it no longer gave an error. Because the field in the program was the empty string, and not null.

However, I assert, when dealing with data, checking only for not null is not useful; you also want to check that the string contains some data in that case.

I have now updated the framework, so that if the field is marked “not null”, then the error is presented to the front-end not only if the variable in the program is “null”, but also if it is the empty string.

(Note: I am not advocating e.g. Java lose the distinction between ==null and .isEmpty(): for some reason this is a useful distinction when doing programming and data manipulation—such as null indicating that a variable isn’t initialized yet—I just don’t think it’s a useful distinction in a system solely designed to model persistent data.)

XML in the database

Monday, March 3rd, 2008

As I do a lot of programming using open source databases, I couldn’t help but notice that inclusion of XML in the database seems to be a hot topic. Postgres 8.1 (just released) and MySQL 5.1 (not released yet) will allow XML documents to be stored as the value of cells, and allow those documents to be manipulated and inspected. (Oracle has had this feature since 2001)

I generally don’t like XML (as Robin put it: most formats are designed to be either conveniently readable by humans, or by computer. XML is conveniently readable by neither.)

So I especially wouldn’t like to store XML in the database. However, there are some times when one would naturally want to store some hierarchical data and the relational model doesn’t really allow that conveniently. Compared to:

  • Storing e.g. Java serialized objects as a BLOB in the database (with no chance of ever doing a “select” against that data, or migrating that data using a SQL migration script);
  • Or storing direct XML without manipulation functions (slightly more readable, but still not programatically manipulatable using SQL)

I suppose using the XML features provided in the database is better than either of the above solutions.

The following table contains documentation copied from the documentation or the changelog of the database in question:

New in Postgres 8.3
(4th Feb 2008)
“XML Support: New XML data type fully supports the SQL/XML specification of ANSI SQL:2003, including well-formedness checks, type-safe operations, SQL/XML publishing and XPath queries. Version 8.3 also includes additional functions for XML data export.”
New in MySQL 5.1
(In development)
“XML functions with XPath support. ExtractValue() returns the content of a fragment of XML matching a given XPath expression. UpdateXML() replaces the element selected from a fragment of XML by an XPath expression supplied by the user with a second XML fragment (also user-supplied), and returns the modified XML. See Section 11.10, ‘XML Functions’.”
New in Oracle 9.0.1
(Released 2001)
“The introduction of Oracle XML DB and the XMLType datatype provides new techniques that facilitate the persistence of XML content in the database. These techniques include the ability to store XML documents in an XMLType column or table, or in Oracle XML DB Repository. Storing XML as an XMLType column or table makes Oracle Database aware that the content is XML. This allows the database to: Perform XML-specific validations, operations, and optimizations on the XML content; Facilitate highly efficient processing of XML content by Oracle XML DB.”
New in MS SQL Server 2005
(Released 2005)
“SQL Server 2005 introduces a native data type called XML. A user can create a table that has one or more columns of type XML in addition to relational columns. … Together with a large set of functions, embedded XQuery and data modification languages provide rich support for manipulating XML data.”

Reading row-by-row into Java from MySQL

Thursday, February 7th, 2008

Trying to read a large amount of data from MySQL using Java using one query is not as easy as one might think.

I want to read the results of the query a chunk at a time. If I read it all at once, the JVM understandably runs out of memory. In this case I am stuffing all the resulting data into a Lucene index, but the same would apply if I was writing the data out to a file, another database, etc.

Naively, I assumed that this would just work by default. My initial program looked like this (I’ve left out certain things such as closing the PreparedStatement):

public void processBigTable() {
    PreparedStatement stat = connection.prepareStatement(
        "SELECT * FROM big_table");
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

Failed with the following error:

Exception in thread "main"
        java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2823)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763)
    ...
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1657)
    ...

The line it failed at was the exceuteQuery. So as we can see from the stack backtrace, it’s clearly trying to load all the results into memory simultaneously.

I tried all sorts of things but it was only after I took at the MySQL JDBC driver code did I find the answer. In StatementImpl.java:

protected boolean createStreamingResultSet() {
    return ((resultSetType == ResultSet.TYPE_FORWARD_ONLY)
        && (resultSetConcurrency == ResultSet.CONCUR_READ_ONLY)
        && (fetchSize == Integer.MIN_VALUE));
}

This boolean function determines if it’s going to use the approach “read all data first” or “read rows a few at a time” (= “streaming” in their terminology). I clearly need the latter.

You can specify, using the generic JDBC API, the number of rows you want to fetch at once (the “fetchSize”). Why would you have to set that to Integer.MIN_VALUE, which is stated to be −231, in order to get streaming data? I wouldn’t have guessed that.

Basically this very important decision about which approach to use, which in my case amounts to “program works” or “program crashes”, is left to test whether three variables are set to various values. I am not aware if this is in the documentation (I didn’t find it), nor if this decision is guaranteed to be stable, i.e. won’t change in some future driver version.

Now my code looks like the following:

public void processBigTable() {
    PreparedStatement stat = c.prepareStatement(
        "SELECT * FROM big_table",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
    stat.setFetchSize(Integer.MIN_VALUE);
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

This code works, and reads chunks of rows at a time.

Well I’m not sure if it reads chunks of rows at a time, or just one row at a time. I hope it doesn’t read one row at a time, because that would be very inefficient in terms of number of round trips from the software to the database. I assumed this was what the fetchSize parameter was controlling, so you could tune the size of the chunks to meet your particular latency and memory setup. But being forced to set it to a large negative number in order to get it to work means one has no control over the size of the chunks (as far as I can see).

(I am using Java 6 with MySQL 5.0 and the JDBC driver “MySQL Connector” 5.1.15.)

COUNT(*) vs COUNT(pk_col)

Monday, November 19th, 2007

A while back I was doing some performance tuning on MySQL 5 for a customer. A SELECT was counting the number of rows in the table. I always use COUNT(*) for that but I know a lot of people, including the customer, use COUNT(pk_col). The query was taking a long time (a few minutes). I analyzed that the problem came from the usage of COUNT(pk_col) instead of COUNT(*). With COUNT(*) it was instantaneous.

I didn’t know that there was a difference between the two. There is no difference in their semantics, therefore it didn’t occur to me that there might be a difference in the way they were executed.

Just to recap the SQL syntax:

  • COUNT(DISTINCT col) counts the number of distinct values that are in “col”
  • COUNT(col) counts the number of rows where col is not null.
  • COUNT(*) counts the number of rows in a table

It follows that for columns which don’t have any “null” values in them, COUNT(col) must be the same as COUNT(*). For any column marked “not null”, such as a primary key, this is always the case. And thus it follows that COUNT(pk_col) must always deliver an identical result to COUNT(*).

However, the database in question was executing a COUNT(pk_col) query and a COUNT(*) query differently.

  • For the COUNT(*) queries it was simply counting the number of rows in the table (taking constant time and not requiring reading the rows from disk)
  • For the COUNT(pk_col) it was going through all the rows, presumably checking if the pk_col was in fact null, and counting the number of rows where it wasn’t, thus requiring a lot of disk access.

Alas I didn’t write down the output from the EXPLAIN statement and I’ve just tried it on some other databases (using MySQL 4.1 and 5.0) and was unable to reproduce this behaviour. So this is a bit of a useless blog entry! Nevertheless – you must believe me – it did happen!

So the conclusion is, one should always use COUNT(*) and never COUNT(pk_col).

Interesting Oracle/MySQL locking difference

Wednesday, September 5th, 2007

I know the rules for Oracle row locking well. A row can be locked for write if one updates it, or if one “select for update”s it.

  1. create table a (x number); (and equivalent in MySQL for InnoDB)
  2. Session A: insert into a values (9);
  3. Session A: commit;
  4. Session A: start transaction (in mysql)
  5. Session A: select * from a where x=9 for update;
  6. Session B: start transaction (in mysql)
  7. Session B: select * from a where x=9 for update;
  8. Session B hangs, waiting for row-level lock to be release from the row by Session A
  9. Session A: update a set x=4;
  10. Session A: commit;
  11. At this point, Session B returns no rows. Lock has been released, and row no longer confirms to where, so is not returned.
  12. Session A: update a set x=5;

This is where the difference occurs:

  • Oracle returns the row to session A. The command “for update” in session B did not return any rows, and thus no rows were locked, and thus session A has nothing to wait on.
  • MySQL (version 4.1.18) blocks session A waiting on the transaction in session B to be ended. That means A requires a lock owned by B. But what is this lock? Is it a row-level lock on the row which was not selected? Some other type of lock?

The reason I tested this was it just occurred to me that in a “select for update” the where clause is applied twice:

  1. Firstly to determine which rows to return. A lock is requested for those rows, which can involve waiting until the lock is released if it is owned by some other transaction.
  2. After the lock has been acquired, the rows is checked again to see if it still confirms to the where clause. If not, it is not returned (although the session blocked waiting for it)

I suspect that both databases simply do step 1 involving waiting and acquiring the lock on the rows which initially conform to the where clause. And both apply the where in step 2, as in both cases no rows were actually returned dispite the wait. But Oracle, upon noticing the row will not be returned, actually releases the lock as the command implies only rows returned will be locked. But MySQL just leaves the lock form step 1 hanging around, which is wrong.