In my previous post we established that a key factor in professional software development is delivering value to the business - that we are constructing a business asset and that we should be building the greatest possible value into that asset.

One of the less obvious consequences of this perspective is the necessity of delivering more than just the requested features and functionality - those features and functions must be manageable in a production environment as well.

The indefatigable Ayende Rahien once wrote on his blog:

A professional system is one that can be supported in production easily. About the most unprofessional thing that you can say is: “I have no idea what is going on.”

Think about this for a minute - when you have finished writing the code, the build is complete and the automated tests pass, when your manual testing phase is complete, the business has signed off and your code is released to production, what happens then?

If you haven’t given any thought to the problem of monitoring the system in production, you might just find your operations team wash their hands of it and pass any and all issues to you - and then you’re stuck providing support instead of creating the next big feature.

Logging is one important way to monitor the system - but it involves a lot more than throwing a few tracing statements into some error handlers.

Regular activity needs to be logged - if you only log errors then you’re unable to tell the difference between normal success (nothing logged because nothing is going wrong) and catastrophic failure (nothing logged because nothing is going).
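As a rough illustration of logging regular activity, here is a minimal Python sketch of a polling cycle that writes a summary line on every run, including runs where there was nothing to do. The logger name and the `process_pending` function are hypothetical, invented for this example.

```python
import logging

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("poller.example")


def process_pending(items, handle):
    """Run one polling cycle and always log a summary, even when it was empty."""
    processed = 0
    for item in items:
        handle(item)
        processed += 1
    # Logged on every cycle, so a silent log means the poller itself has stopped.
    logger.info("cycle complete: processed=%d", processed)
    return processed
```

With a line like this in place, a quiet period shows up as a steady stream of `processed=0` entries, while a dead process shows up as no entries at all.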

Business operations need to be logged - while the definition of a business operation varies from system to system, each error needs to be handled within the context of the operation that was being attempted when the failure occurred. If you’re not logging any detail about the business operation, good luck sorting out that error.
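One possible way to tie errors to the operation being attempted is to wrap each business operation in a small context manager that logs its start, its outcome and its details. The sketch below is only an illustration: `business_operation`, the log line layout and the short operation id are all assumptions made for this example, not an established convention.

```python
import contextlib
import logging
import uuid


@contextlib.contextmanager
def business_operation(logger, name, **details):
    """Log the start and outcome of one business operation (hypothetical helper)."""
    op_id = uuid.uuid4().hex[:8]  # short id to correlate the lines of one operation
    context = " ".join(f"{key}={value}" for key, value in sorted(details.items()))
    logger.info("START op=%s id=%s %s", name, op_id, context)
    try:
        yield op_id
    except Exception:
        # Record the failure together with the operation it belongs to, then re-raise.
        logger.exception("FAILED op=%s id=%s %s", name, op_id, context)
        raise
    else:
        logger.info("OK op=%s id=%s", name, op_id)
```

Usage would look something like `with business_operation(log, "transfer_funds", account=42): ...`, so that any failure inside the block is logged alongside the operation name and its details.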

Logs might contain sensitive data - having logs that are useful enough to troubleshoot problems often means having logs that contain actual data, details of customer activities, SQL used for querying the database, and so on. Without this information the logs may be useless, so the answer is to ensure that only the appropriate staff have access to the logs.

Error logging must be complete - it’s amazing how some developers will log the message out of an exception they’ve caught and believe their work is done. In most environments, an exception contains a wealth of information that is of use for diagnostic purposes; throwing this information away serves no good purpose and can make it substantially harder to solve the underlying problem.
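To make the difference concrete, here is a small Python sketch contrasting the two styles. The function names are hypothetical; `logging.Logger.exception` is the standard library call that records the full traceback along with the message.

```python
import logging


def parse_quantity(raw):
    """Hypothetical business function that fails on bad input."""
    return int(raw)


def log_message_only(logger, raw):
    """The incomplete style: only the exception message survives."""
    try:
        return parse_quantity(raw)
    except ValueError as exc:
        logger.error("Failed: %s", exc)
        return None


def log_complete(logger, raw):
    """The complete style: message, exception type and traceback are all logged."""
    try:
        return parse_quantity(raw)
    except ValueError:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception("Could not parse quantity %r", raw)
        return None
```

The second version costs one line less than the first, and its log entry shows exactly which call failed and where.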

Logging introduces its own hassles - regardless of the approach you take to logging, it introduces issues that you need to address. For example, you don’t want your production system to go down because your proactive logs fill the disk; nor do you want it to go down because the database used as a log target is temporarily unavailable.
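Both of those failure modes can be softened with a little defensive setup. The following Python sketch shows one possible approach, under stated assumptions: `RotatingFileHandler` is the standard library's size-capped file handler, while `FallbackHandler` and `UnavailableDatabaseHandler` are hypothetical classes written for this example. Note that the built-in handlers already catch their own errors inside `emit`, so the fallback only matters for a target (such as a custom database handler) that lets exceptions escape.

```python
import logging
import logging.handlers


def bounded_file_handler(path):
    """Cap disk usage: about 1 MB per file, keeping at most 5 rotated backups."""
    return logging.handlers.RotatingFileHandler(path, maxBytes=1_000_000, backupCount=5)


class FallbackHandler(logging.Handler):
    """Try a primary log target; if it raises, write to a fallback instead."""

    def __init__(self, primary, fallback):
        super().__init__()
        self.primary = primary
        self.fallback = fallback

    def emit(self, record):
        try:
            self.primary.emit(record)
        except Exception:
            try:
                self.fallback.emit(record)
            except Exception:
                # Last resort: the standard hook, which never raises to the caller.
                self.handleError(record)


class UnavailableDatabaseHandler(logging.Handler):
    """Hypothetical stand-in for a database log target that is currently down."""

    def emit(self, record):
        raise ConnectionError("log database unavailable")
```

The sizes and backup count here are arbitrary; the point is only that the application keeps running, and keeps logging somewhere, when its preferred log target misbehaves.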

What are you working on at the moment? Does it have a good logging story? If not, why not? - and how are you going to improve it?

