When a bomb beheld

Here’s a puzzle for you all …

Assume you’re working with a complex system that’s highly important to the business, one that is mission critical for some of the staff, who depend on the system to do their jobs every day.

Further, let’s assume the system has just been restarted after a major meltdown and you’re investigating the cause of that failure.

When you think you’ve identified the sequence of actions that caused the system to go down, when you have a plausible theory of the crash that matches known information, what do you do next?

1. Document your theory and actively solicit feedback from other knowledgeable parties?
2. Run a controlled test of your theory in a non-critical (non-production!) environment to see what happens?
3. Try it out on the production system to see if the meltdown recurs?

If you happen to think that option #3 is acceptable, think about what happens if your theory is correct and the production system goes down again. (Bonus points if you can guess which of these was inflicted on a critical system at work today.)

Having a production system go down for reasons outside of your control can be both catastrophic and expensive, with costs in time, direct expenses and lost revenue.

If that downtime occurs for reasons within your control (if, in fact, you actively caused the failure), then the consequences should also be “career limiting.”

For most systems, test environments are set up and maintained for exactly this reason - to provide a safe place where failure is less significant, less costly, less important.

No matter how good you think you are - in fact, regardless of how good you actually are - playing fast and loose with a production system is not ok.
