Quotes in CSV Files

Quotes in CSV files are a classic case of a solution to a problem, which in fact does nothing to solve the problem, but is a problem of its own.

The problem: If you have a row of data like foo,bar and you wish to include commas in one data element, you can't, as the recipient system will treat the comma as a field separator as opposed to part of the data. The solution used by programs like Excel: surround such data in double-quotes, i.e. "fo,o",bar.

In my opinion, there are two solutions to embedding data in other data (e.g. a string in a CSV file, or a piece of arbitrary text in an XML file):

  1. Either any character may be contained within the embedded data
  2. Or only certain characters may be contained within the embedded data

If you are in situation #2, it doesn't really matter how many characters you can/can't contain, the fact is, you'll have to deal with the situation.

The CSV solution if introducing double-quotes has simply been changed the problem from "you can't use commas" to "you can't use double-quotes" which are pretty equivalent problems IMHO. Yet, as with most solutions, the complexity has been increased by its introduction.

There needs to be a solution to the new problem of "you can't use double-quotes", and that solution should be to define an escape character e.g. "" to prefix the double-quotes in case you wish to represent them literally. And that solution would work just as well to solve the original "you can't use commas" problem!

(In fact the solution chosen by Excel is to use two double-quotes to represent a literal double-quote, and not an escape prefix like the backslash: this makes parsing harder and can't be used for field separators as they couldn't be differentiated from two field separators around an empty field.)

This article is © Adrian Smith.
It was originally published on 11 Apr 2011
More on: FAIL | Coding