If you’re designing a standard library, insist that all map keys are strings

Associative arrays. Maps. Hashes. Dictionaries. They are a cornerstone of data modelling in software, and practically every language has them. Question: what type should the keys be?

The answer depends on which language you're using. I have spent a lot of time programming in Perl and Java, and they answer this question differently. Perl's hashes force you to use string keys. Java's Map allows any object as a key, and you override the equals method (and, for a HashMap, hashCode) so that the map can tell whether two keys are the same. C++ STL's map likewise accepts any key type, and lets you supply a comparison function, when you create the map, that determines whether two keys are equivalent.

The Java and C++ STL approaches definitely seem more general. But after years of programming in both worlds, I've come to a conclusion: the way Perl does it, allowing only strings as keys, is better.

It's obvious whether two strings are equal or not. It's not obvious whether two arbitrary objects are equal or not.

Take a User object, which represents a user in the system. It has an ID, a name, a picture and so on. You want to store information about users, so you create a Map with Users as keys and the information as values. The code Map<User, MyInfo> reads easily enough, and so does the code that adds entries to this map and fetches them from it.

But what does it actually do? Under what circumstances are two User objects equal? When their IDs are equal? When all fields are equal? When certain key fields, such as names, are equal? What if two users haven't been inserted into the database yet, and thus have no IDs?

This is a whole can of worms. I have encountered subtle bugs caused by object equality not being what I expected in a particular situation (even though it looked reasonable enough when the class was defined). These bugs have been difficult to trace and fix, and they don't always manifest quickly, or every time.
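To make this concrete, here is a minimal Java sketch. The User class is hypothetical, and so is the decision to base its equality on the database id; both were chosen to mirror the questions above, and String stands in for MyInfo. The sketch shows two ways such an equals/hashCode pair can bite. First, two distinct unsaved users collapse into a single map entry. Second, an entry becomes unreachable once the user is saved and its id changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class UserKeyPitfalls {
    // Hypothetical User class whose author decided that identity means "same database id".
    static final class User {
        Long id; // null until the user has been saved to the database
        final String name;

        User(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof User)) return false;
            return Objects.equals(id, ((User) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    // Two distinct unsaved users both have a null id, so they compare equal
    // and the second put() silently replaces the first user's information.
    static int entriesAfterAddingTwoUnsavedUsers() {
        Map<User, String> info = new HashMap<>();
        info.put(new User("Alice"), "info about Alice");
        info.put(new User("Bob"), "info about Bob");
        return info.size();
    }

    // Saving the user assigns an id, which changes its hash code,
    // so the entry stored under the old hash code can no longer be found.
    static String lookupAfterSaving() {
        Map<User, String> info = new HashMap<>();
        User alice = new User("Alice");
        info.put(alice, "info about Alice");
        alice.id = 42L; // simulate the database assigning an id
        return info.get(alice);
    }

    public static void main(String[] args) {
        System.out.println(entriesAfterAddingTwoUnsavedUsers()); // prints 1
        System.out.println(lookupAfterSaving()); // prints null
    }
}
```

Neither problem throws an exception. The map simply behaves differently from what the caller expected, which is part of why such bugs are hard to trace.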

Keying the map by the user's ID instead, with map.put(user.id, myInfo), is also easy to read, and you know exactly what it does. You don't have to write those annoying equals and hashCode methods, so your business logic is less diluted by implementation artefacts. And methods you haven't written can't have bugs.
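Here is a minimal Java sketch of the string-keyed alternative. It assumes a hypothetical User class with a String id and no equals or hashCode overrides. Two separately constructed User objects that represent the same user find the same entry, because String equality is unambiguous.

```java
import java.util.HashMap;
import java.util.Map;

public class StringKeyedInfo {
    // Hypothetical User class: it deliberately does NOT override equals or hashCode.
    static final class User {
        final String id;
        final String name;

        User(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Stores information under the user's String id, then looks it up using a
    // different User object that represents the same user.
    static String lookUpViaSeparateObject() {
        Map<String, String> infoById = new HashMap<>();
        User alice = new User("u-1", "Alice");
        infoById.put(alice.id, "info about Alice");

        // e.g. the same user loaded again elsewhere: a new object, but the same id
        User aliceAgain = new User("u-1", "Alice");
        return infoById.get(aliceAgain.id);
    }

    public static void main(String[] args) {
        System.out.println(lookUpViaSeparateObject()); // prints: info about Alice
    }
}
```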

A disadvantage: in a language with static typing and generics, stating the type of a relationship's key adds readability and lets the compiler check it. A plain String doesn't allow this (is it a user ID? the user's name?). But I still think the simplicity described above matters more.

Perhaps one could argue that numbers should be allowed as keys too. But I think, for simplicity, there should only be one type of key. A number can be converted to a string easily enough, as long as the conversion is canonical: for example, any leading zeros must be removed, or two identical numbers would appear to be different strings. More complex data that one might want as a key is also more easily converted to a string than to a number.
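Here is a small Java sketch of the conversions described above. The helper names numberKey and compositeKey are hypothetical. numberKey assumes its input is a decimal integer that fits in a long. compositeKey assumes the separator character never appears inside the fields; otherwise a real implementation would need to escape it.

```java
import java.util.HashMap;
import java.util.Map;

public class StringKeys {
    // Canonical string form of a decimal integer: parsing and re-printing
    // drops leading zeros, so "007" and "7" produce the same key.
    static String numberKey(String digits) {
        return Long.toString(Long.parseLong(digits));
    }

    // Hypothetical composite key built from two fields. Assumes '|' never
    // appears inside either field; otherwise it would need escaping.
    static String compositeKey(String country, String city) {
        return country + "|" + city;
    }

    public static void main(String[] args) {
        Map<String, String> byNumber = new HashMap<>();
        byNumber.put(numberKey("007"), "first entry");
        System.out.println(byNumber.get(numberKey("7"))); // prints: first entry

        Map<String, String> byPlace = new HashMap<>();
        byPlace.put(compositeKey("UK", "London"), "head office");
        System.out.println(byPlace.get(compositeKey("UK", "London"))); // prints: head office
    }
}
```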

Regarding sets and finding unique objects

The situation is similar. Perl has no built-in set type. Java lets you add objects to a Set, which uses the elements' equals method (and, for a HashSet, hashCode) to determine whether two objects are the same. C++ STL's set likewise accepts any element type, and lets you supply a comparison function that determines whether two elements are equivalent.

Again, if you want to find which (unique) users certain information pertains to, in Java you'd create a Set<User> and put the users in it. Again, what makes one User the same as another? Again, subtle bugs can creep in when your understanding of what makes a user unique doesn't match that of the author of the User class.

This is the way Perl does it: there are only hashes, no sets. You place the objects in the hash as values, and the aspect that makes them unique becomes the key. At the end, you just take all the values of the hash and ignore the keys. map.put(user.id, user) is easy to write, and it's clear what's going on.
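In Java, the same idea might be sketched with a LinkedHashMap keyed by a hypothetical String id, so that no equals or hashCode methods on User are involved. LinkedHashMap is used here only to keep the result in first-seen order; a plain HashMap would do if order doesn't matter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniqueUsers {
    // Hypothetical User class with a String id; no equals or hashCode overrides.
    static final class User {
        final String id;
        final String name;

        User(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Keeps one User per id. A later User with the same id replaces the stored
    // value, but LinkedHashMap keeps the key in its first-seen position.
    static List<User> uniqueById(List<User> users) {
        Map<String, User> byId = new LinkedHashMap<>();
        for (User user : users) {
            byId.put(user.id, user);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<User> mentioned = List.of(
                new User("u-1", "Alice"),
                new User("u-2", "Bob"),
                new User("u-1", "Alice"));
        for (User user : uniqueById(mentioned)) {
            System.out.println(user.id + " " + user.name);
        }
        // prints:
        // u-1 Alice
        // u-2 Bob
    }
}
```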


This article is © Adrian Smith.
It was originally published on 18 Mar 2013