Use the “Repeatable Read” Isolation Level on PostgreSQL

The database system like PostgreSQL serves requests from many clients simultaneously. Sometimes multiple simultaneous requests might pertain to the same set of data. Programmers often (wrongly) think about the software they write as if it were the only process using the database. The danger is that often their software is the only piece of software using a particular piece of data, meaning that bugs due to this incorrect mindset often go undetected and are deployed to the live system.

You should strive to reduce the set of bugs introduced by this mismatch in the programmer’s perception of a single-user system, and the reality of a multi-user concurrent system. Because such bugs are so hard to detect, rigorous testing is an insufficient strategy to eliminate them.

An example of such a bug might be if the software, responding to an API call, within one transaction, firstly reads information about a user, and secondly adds a new authentication token to the user. If there is a foreign key constraint between the authentication tokens and the user, a programmer looking at the code might reasonably expect that if the reading of the user succeeds, the writing of the authentication tokens will not fail with the error that the user does not exist. However if, between the time of the read of the user and the time of the write of the token, another process deleted the user, and only the default “Read Committed” isolation level is used, then this error will indeed occur.

Ideally, the programmer will have foreseen this possibility, and their code will respond appropriately when the write of the token fails due to the user not existing. However, because the code looks so reasonable, they may have overseen the possibility. In the worst case, the error from the write of the token is unexpected, and will fail with an internal error: this is a bug.

PostgreSQL’s Repeatable Read Isolation Level cannot solve this problem, but it can mitigate it in two important ways vs. the default Read Committed:

When a piece of software wishes to read from the database, the state of the database at the time the transaction began is queried, rather than its current state. Therefore if there is a foreign key constraint, reading from a child row will imply that reading from the parent row will succeed. Errors due to rows disappearing, such as NullPointerException errors and internal errors, will not occur.
When a piece of software wishes to write to the database, if e.g. the user has disappeared since the user was queried, therefore child rows like authentication tokens cannot be created, the database will return a specific error that the database has changed (“ERROR: could not serialize access due to concurrent update”). There can be general code that catches this error, meaning that the programmer doesn’t have to think about catching it at every possible place, and potentially forgetting.

PostgreSQL has an even stricter level “Serializable”, however, that adds guarantees that do not solve the problems addressed here better than “Repeatable Read”, but those additional guarantees will cause transactions to fail in more circumstances.

For those familiar with Oracle, Oracle calls this isolation level “Serializable”.

P.S. I recently created a nerdy privacy-respecting tool called When Will I Run Out Of Money? It's available for free if you want to check it out.