I’m sure there are a lot of opinions in the world about which log levels to use for which errors. Log levels in the sense of if a text destined for a logfile should be prefixed with “Info”, “Warning”, “Error” etc.
There is even great debate about which log levels there should be. E.g. should there be a “Debug” level or a “Trace” level? Or both? In which case what’s the difference?
I tended to ignore those debates—for me a log statement was simply a log statement—but recently I was deploying an application, and various people had worked on it and used log levels inconsistently. Hardly a surprise as I’d never set any guidelines on how they should be used, due to aforestated ignoring. So I started to think about how I would have wanted the log levels to have been in the application I was at that moment deploying.
Firstly, what does a log level actually influence?
- Different texts are stated in the log file. The only differences this makes to anyone is that operations teams, not too familiar with the software, tend to understandable freak out when one has lots of ERRORs in the log file; whereas they tend to freak out less with lots of INFOs.
- You can do monitoring based on log files. E.g. one can create assertions which can be monitored, such as “INFOs may be in the log file, but ERRORs may not.”
- You can set a log level and state that errors below a certain level will not be displayed, e.g. “ERRORs should be displayed but WARNINGs should not”. So the log level of an error may influence whether it gets logged at all.
And how would those differences influence ones decision about which level to make a particular log?
I came up with the following scheme for me:
|TRACE||For things that should happen, but one doesn’t want to see live. (For example SQL statement logs, etc.)|
|INFO||For things that should happen, but one does want to log live. (For example, “writing file to /x/y.pdf…”) It’s important to state IDs and paths in log files, for if something goes wrong, one needs to be able to work out exactly what the robot did with what rows and objects. And if the robot crashes, a log, written before an action is done, such as “Writing PDF file…” can help to identify what part of the process caused the error.|
|WARNING||For things that shouldn’t happen, but where the user sees the correct result. For example files that should always exist, but if they don’t, they can be automatically recreated. If those happen live, one might want to look at them. But as long as the user sees the right thing, one can look at them in ones own time.|
|ERROR||For things that shouldn’t happen, and where the user sees an incorrect result. Including “fatal” errors.|
And in that case, one would monitor ERRORs (or maybe WARNINGs if one was otherwise bored), and would set the log level to TRACE on all test servers and INFO on all live servers.