Marco Valtas

The forgotten art of Failing

Failing is as important as not failing (accomplishing the task) in software, but for some time I have had the impression that developers are not well aware of this. For amusement and horror you can check Error'd on The Daily WTF. I will write about regular software, not mission-critical systems or services; those have different contexts, where whether and how to fail is a more complex decision. Regular software is what we use daily: a word processor, an image editor, a browser, a command line tool and similar tools.

Knowing how to fail is important because it gives your user a chance to fix the issue. If a configuration is misspelled (syntax error), if a flag is not recognized, if the user tried to do some illegal operation, a proper failure message will help the user learn more about your software and continue using it. On the other hand, a huge cryptic message, a whole confusion of errors mixed with native language messages, will drive your users away. Common users will be puzzled, scratch their heads and ask the closest tech savvy friend about the error. If the friend doesn't know what it means but knows other software that does the same job, you know, your software has lost its chance. Developers, the tech savvy guys, will scratch their heads, shout a couple of curses, copy the error and paste it into Google. If the solution doesn't show up by the third matching result and the navigator (assuming that pairing is happening) knows other software that does the same job, again, chance lost. But not all is lost, so let's see some ways to improve your failing.

The first thing to consider is scope: keep it under control. There are hundreds of causes that can lead your software to fail, and dealing with all of them may sound like too much work. Here is a suggested list of where to concentrate your efforts when dealing with errors:

user input - direct input, input files, command line flags, config files...
user environment - user file system, net connection, peripherals...
other dependencies - other services...

Failing is part of the user experience. Keep this in mind and try to deal with failure in a way that lets your user tell what is missing, or at least gives them enough information to search for a solution.
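
As an illustration of a failure the user can act on, here is a minimal Python sketch of a config file reader. Everything in it is hypothetical and made up for this example: the `key = value` format, the `ConfigError` class, the `parse_config` function and the `host` and `port` settings. The point is only that the message names the file, the line and what was expected.

```python
class ConfigError(Exception):
    """A user-facing configuration problem; the message is meant to be shown as-is."""


KNOWN_KEYS = {"host", "port"}  # hypothetical settings, for illustration only


def parse_config(text, source="config"):
    """Parse 'key = value' lines, stopping at the first problem with a precise message."""
    settings = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=" not in line:
            raise ConfigError(
                f"{source}, line {number}: expected 'key = value' but found '{line}'"
            )
        key, value = (part.strip() for part in line.split("=", 1))
        if key not in KNOWN_KEYS:
            known = ", ".join(sorted(KNOWN_KEYS))
            raise ConfigError(
                f"{source}, line {number}: unknown setting '{key}' (known settings: {known})"
            )
        settings[key] = value
    return settings
```

With a misspelled setting on the second line of a file called app.conf, the user would see "app.conf, line 2: unknown setting 'prot' (known settings: host, port)", which is enough to fix the typo without asking anyone.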

Finally, here are some tips plus reading:

Fail Fast (or Die Earlier, Die Often) - As soon as you get to a point where recovering is not possible, fail. This helps you give a more meaningful message with one error. Try not to accumulate errors and give them all at once; this could hide the real error and confuse the user.
Usage messages - Command line tools are expected to respond to --help (Unix-like) or /? (DOS-like). Most languages have libraries to parse command line arguments, so take the time to write these help messages.
Log failures properly - Logging levels (an art in themselves) should be used consistently. Check the JBoss logging conventions for a suggestion on how to use these levels.
Static analysis - Java has the nice FindBugs, which helps you find a lot of bugs that could reach your user. Other languages have similar tools; use them. Don't let a NullPointerException (plus a huge stack trace) spoil your user experience when a nice simple message would do.
Errors should not expose the internal structure of your software. That is of interest only to you and your team (use a debug flag for it).
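
Several of these tips can be combined in a few lines. The sketch below is one possible shape, in Python, for a made-up command line tool called `wordcount`; the tool, its `--debug` flag and the function names are assumptions for this example. It relies on the standard `argparse` module for the `--help` message, fails as soon as the file cannot be read, prints one short message, and shows the stack trace only when the debug flag is given.

```python
import argparse
import logging
import sys
import traceback

log = logging.getLogger("wordcount")  # hypothetical tool name


def build_parser():
    """argparse generates the --help text from these descriptions."""
    parser = argparse.ArgumentParser(
        prog="wordcount",
        description="Count the words in a text file.",
    )
    parser.add_argument("path", help="file to read")
    parser.add_argument("--debug", action="store_true",
                        help="show internal details (stack trace) on failure")
    return parser


def count_words(path):
    """Return the number of whitespace-separated words in the file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return len(handle.read().split())


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Debug details go to the log only when the user asks for them.
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.WARNING)
    log.debug("reading %s", args.path)
    try:
        total = count_words(args.path)
    except OSError as error:
        # Fail fast: one error, one short message, no internals by default.
        print(f"wordcount: cannot read '{args.path}': {error.strerror}",
              file=sys.stderr)
        if args.debug:
            traceback.print_exc()
        return 1
    print(total)
    return 0
```

A script would end with `sys.exit(main())` so that the shell sees a non-zero exit status on failure. Running it with `--help` prints the usage message that argparse builds from the descriptions above.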

Published on Mar 22, 2011