Unit testing code that needs data in a database

12 April 2017 — in Software Architecture, Coding

If you write a method like “add product to user’s cart”, you want to test that that method works.

That method may require state to exist e.g. in a database (e.g. that the user exists, to whose cart the product shall be added), and testing it automatically e.g. with JUnit may require querying the new state (e.g. that the product is now in the user’s cart).

You have these options:

You could just not test it. I will waste no words describing why this option is not the one you should choose.
You could test the method manually, probably as part of testing a piece of user-interface that uses the method by clicking through it. Controversially I’m going to suggest this isn’t a bad option for simple methods, as has the advantage of not having to spend time developing and maintaining tests. But for complex methods, with a lot of edge cases which require a lot of testing, nobody’s going to test all the edge cases in the future, so your method, and thus your software, will have bugs at some point in the future.
You could manually set up a local database with the necessary data that the method requires. This isn’t reproducible on another developer’s machine, or even on your machine in a year’s time.
You could “mock” the database by “injecting” something other than a database driver to the code which performs the queries. When testing, it’s important to test on an environment that’s as similar to the live system as possible. For example, same operating system, same version of any third-party software, etc. (I once had a bug which only happened with a particular point release of the JVM.) Testing database code by using something other than a database is an approach so far away from that ideal that it is worthy of no further consideration.
You could set up a database containing the data, and check a dump of that database into your VCS. Not a bad idea, but over time this dump will get unmaintainable. Nobody will understand the relationship of which tests require which data. Nobody will dare alter the dump, everyone will just add new rows for their new tests. It’s OK, but the data a test needs (pre-requisites) are so tightly coupled with the test, it’s a shame to have the pre-requisites in a completely different place from the actual test code which needs them.
A better option is to write code at the start of the test to add the data that the particular test needs. For example a JUnit test method might do “add user” (set up state) then “insert product into that user’s cart” (method under test). The primary key will probably be assigned by the database, but what about other unique values like username? What if the previous test failed and didn’t clean up? A user with that username might already exist, and thus the “add user” part of the test will fail (although the code under test works; the problem is that on the last run it didn’t work). It might seem that with database transactions, you can ensure all such changes from the last test are rolled back and gone, but there are still times that you must test a process spanning multiple transactions, or the code under test requires data in a legacy data source not supporting transactions such as MyISAM, the filesystem, REST APIs, etc.
This can be solved by using random values for all those values that need to be unique.

So these days my unit tests compose of:

Selecting random values for any usernames or other IDs that I need, putting them into variables
Creating new rows in the database for all the data that I need, using these IDs
Performing the operation under test
Asserting the results are correct, by querying the database if necessary, using the IDs generated earlier
If possible, I rollback the transaction just for neatness, but those non-transactional data sources won’t get rolled back, and that’s fine too.

P.S. I recently created a nerdy privacy-respecting tool called When Will I Run Out Of Money? It's available for free if you want to check it out.