MySQL’s “enum” datatype is a good thing

I’ve often had discussions with people about whether the “enum” type in MySQL is a good thing or not. Basically there are two ways to use your database:

  1. As an unstructured bunch of “stuff” to store whatever the software needs to persist. Such databases use lots of “blob” data with serialized objects (it’s easy to program), tables with multiple functions (“object” table with “type” column), few constraints, and so on.
  2. As a representation of the data the program is trying to model. Such databases have meaningful column names, two different types of things are two different tables. Adding constraints is easy.

There are good arguments for both. As one is programming in a programming language, whatever is most convenient for that programming language (e.g. serialize a bunch of objects to a blob) means less effort, less code: which means less cost and less bugs. Those are all good things.

But for some reason I’ve always been the 2nd type. I like to look at a database and understand what data is being modeled. It creates a certain self-documentation which can often be lacking from complicated software. Constraints can be added which acts like functional assertions (functional in the sense that they involve no state: you say that this value must be between 1 and 10 and then it is that way. You don’t have to program any “path” or state to check that).

That an item can be in exactly one a distinct set of states is a fact of life in all types of domain modeling:

  • What state is this invoice in? Is it “paid”, “open”, “paid+processed”?
  • Is this item deleted or not? (Or marked temporarily suspended, pending administrator checking the content)
  • Is this photo public, or does it belong to a user?

To explicitly work with enumerations is to acknowledge their existence within the domain, and to create a more 1:1 mapping between the domain and its implementation. That’s what all software should be about, modeling. Ideally a program is nothing more than a model and a few rules for how that model transitions between one consistent state and another.

If you program databases like #2 above, like I do, then enumerations really do make the data model richer.

  • The database maps to the domain easier (thereby documenting the domain, in case this is not done elsewhere)
  • The database maps to a domain-consistent implementation in the programming language easier
  • The database implicitly then has a constraint, as you can’t set the column to be some value which makes no sense for the domain (and therefore the program)

So enums in DBs are a good thing, and should be used wherever possible. Oracle should implement them just like MySQL does. And enums should be used in programming languages too. Why enums appeared only since Java 1.5 I don’t know. And the fact that php, perl etc do not support them, is yet another reason not to use those programming languages to try and attempt to build any software system whose function it is to model a domain.

It can be said that then it makes it “more difficult” to add a new state, as now you have to “change the database as well”. But if the model changes, and the database reflects the model, then that’s a good thing, not a bad thing. And it isn’t even much effort: If you say that invoices can be in a new state, then there’ll be a lot of programming work to support that change (UI, billing logic, robots, test scripts). The “alter table” statement is no work at all proportionally, and you’ll also be thankful for every extra constraint the database or programming language can offer you (which such a change feels very scary in a Perl program, but not so much in a Java program).

(Enums are supported in PostgreSQL since v8.3)

  • Michael C

    I think you’ve completely missed the point. The question is not whether enums compared to no enums is a good thing (obviously it’s better to have greater constraint). The real question is whether it’s better to use an enum or a another table to store the list of these enum values. To follow your invoice example above, you would have an invoice table and, say, an InvoiceState table which listed all the possible states of the invoice. You would then have a foreign key joining the 2 tables. The advantage to this is that if you need to add further data to this InvoiceState table then you could (for example, you could have shortname, longname for the enum values etc). Using an enum data type in a database is just a poor substite for those too lazy to create a table.

  • Michael C, I think it is you who has missed the point :-)

    You use a lookup table if the meaning is purely illustrative for the user. E.g. moods

    Happy, Sad, Angry

    but if an application logic needs to function differently depending on the state and the user cannot define the states, as in a purchase order

    Raising, Raised, Cancelled

    then this should be defined as an enum and not a data table that contains an additional numeric column with a “special value” in it.

    Whether the enum belongs in the DB as an enum or not is not my argument, it’s just that your comment about people being too lazy is not accurate.

    ==== – Think domain, not database