Which log levels to use when?

I’m sure there are a lot of opinions in the world about which log levels to use for which errors. Log levels in the sense of whether a text destined for a logfile should be prefixed with “Info”, “Warning”, “Error”, etc.

There is even great debate about which log levels there should be. E.g. should there be a “Debug” level or a “Trace” level? Or both? In which case what’s the difference?

I tended to ignore those debates (for me, a log statement was simply a log statement), but recently I was deploying an application that various people had worked on, and they had used log levels inconsistently. Hardly a surprise, as I’d never set any guidelines on how they should be used, due to the aforementioned ignoring. So I started to think about which log levels I would have wanted in the application I was at that moment deploying.

Firstly, what does a log level actually influence?

  • Different texts appear in the log file. The main difference this makes to anyone is that operations teams, not too familiar with the software, understandably tend to freak out when there are lots of ERRORs in the log file; they tend to freak out less over lots of INFOs.
  • You can do monitoring based on log files. E.g. one can create assertions which can be monitored, such as “INFOs may be in the log file, but ERRORs may not.”
  • You can set a log level threshold, so that messages below a certain level are not logged at all, e.g. “ERRORs should be displayed but WARNINGs should not”. So the level of a message may influence whether it gets logged at all.
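The threshold behaviour can be sketched with Python’s standard `logging` module; the logger name and messages below are just illustrative:

```python
import logging

# Set the threshold to WARNING: INFO and DEBUG messages are
# discarded before they ever reach the log output.
logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
log = logging.getLogger("app")

log.info("writing file to /x/y.pdf ...")          # suppressed: below the threshold
log.warning("cache file missing, recreating it")  # logged
log.error("could not write the PDF")              # logged
```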

And how would those differences influence one’s decision about which level to give a particular log statement?

I came up with the following scheme for me:

TRACE For things that should happen, but that one doesn’t want to see live. (For example, SQL statement logs.)
INFO For things that should happen, and that one does want to log live. (For example, “writing file to /x/y.pdf…”) It’s important to state IDs and paths in log files: if something goes wrong, one needs to be able to work out exactly what the robot did with which rows and objects. And if the robot crashes, a log written before an action is taken, such as “Writing PDF file…”, can help to identify which part of the process caused the error.
WARNING For things that shouldn’t happen, but where the user still sees the correct result. For example, files that should always exist, but which can be automatically recreated if they don’t. If these happen live, one might want to look into them; but as long as the user sees the right thing, one can do so in one’s own time.
ERROR For things that shouldn’t happen, and where the user sees an incorrect result. Including “fatal” errors.
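Python’s `logging` module has no built-in TRACE level, so a sketch of this scheme has to register one; the numeric value 5 (one notch below DEBUG’s 10) is a common convention, not part of the standard library, and the messages are illustrative:

```python
import logging

TRACE = 5  # assumption: a custom level just below DEBUG (10)
logging.addLevelName(TRACE, "TRACE")

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("robot")
log.setLevel(TRACE)  # as on a test server; a live server would use logging.INFO

log.log(TRACE, "SELECT * FROM orders WHERE id = 42")    # should happen, not wanted live
log.info("Writing PDF file to /x/y.pdf (order 42)...")  # should happen, wanted live
log.warning("Template cache missing; recreated it")     # shouldn't happen, result correct
log.error("Could not render the PDF for order 42")      # shouldn't happen, result wrong
```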

And in that case, one would monitor ERRORs (or maybe WARNINGs if one was otherwise bored), and would set the log level to TRACE on all test servers and INFO on all live servers.
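A monitoring assertion such as “ERRORs may not be in the log file” can be as simple as a grep. The function name and the plain “ERROR” marker below are assumptions about the log format:

```shell
#!/bin/sh
# check_log FILE: succeed if FILE contains no ERROR lines, fail otherwise.
# The "ERROR" marker is an assumption about how levels appear in the file.
check_log() {
    if grep -q "ERROR" "$1"; then
        echo "ERRORs found in $1" >&2
        return 1
    fi
    return 0
}
```

A monitoring system or a cron job can then alert on a non-zero exit status.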

  • I would put FATAL errors on a separate level. FATAL errors are (for me) errors that have resulted in a catastrophic failure of some component of the system. I work on a system that has multiple failover components, and without the separate tag of a fatal error we wouldn’t know immediately that a component had died and been failed over. The FATAL errors trigger a sound event on a number of PCs around the desk so we can respond rapidly. (One of the sound effects is the Red Dwarf line “This ol’ baby’s crashed more times than a ZX81!”, so you get the picture on how reliable the system is ;-)).

  • Robin Salih

    Yeah, that’s basically how I do logging, although instead of TRACE I have DEBUG and SPAM. If you set SPAM on, it will basically fill up a hard disc very quickly, as it tends to log each SQL statement, messages between servers, function calls, etc. DEBUG is useful during the development stage; it logs in more detail than INFO, so you can basically follow the path of execution.

    Additionally there is CRITICAL, for errors you cannot recover from.