Monday, June 7, 2010

All failed deployments are anachronisms.

Your code doesn't care what day it was released on. If there is an extended outage or degradation as a result of a deployment, the code is always in the wrong place at the wrong time. Here are all the wrong days to deploy a broken release:

  1. Monday
  2. Tuesday
  3. Wednesday
  4. Thursday
  5. Friday
  6. Saturday
  7. Sunday
and the times:
  1. One O'Clock
  2. Two O'Clock
  3. ...
and obviously a bunch of holidays and business events that have nothing to do with your release (Black Friday, sales deadlines, etc.) that you also shouldn't deploy on.

However, if you really believe this, you should stop writing, managing or using software NOW! Unfortunately, you will deploy broken code, because:
"I’m sorry to say so but, sadly, it’s true that Bang-ups and Hang-ups can happen to you."
- Dr. Seuss (Oh, the Places You'll Go!)
And when you do deploy that broken code, the breakage probably had nothing to do with when it was deployed. In fact, probably no one would have noticed if you had done just one thing...

ROLLBACK!
Three rules:
  1. Code must always be able to be rolled back.
  2. Rollback must be a single command.
  3. The rules for rollback must be simple, easy to follow and aggressive (e.g. a customer call about an issue with the release, an exception related to the release, etc.)... then, just...
ROLLBACK!

Then figure out what went wrong and how to prevent it from happening again... rinse and repeat... any day... any time.
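
To make the three rules a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `Deployer` is a hypothetical in-memory stand-in for whatever deploy tool you actually use, and the trigger names are made up. The point is only the shape: rollback is one call, and the decision to roll back is a dumb, aggressive lookup rather than a debate.

```python
from dataclasses import dataclass, field


@dataclass
class Deployer:
    """Hypothetical stand-in for a real deploy tool (not a real library)."""
    history: list = field(default_factory=list)  # releases, oldest first

    @property
    def current(self):
        # The live release is the most recent entry, or None before any deploy.
        return self.history[-1] if self.history else None

    def deploy(self, version):
        self.history.append(version)

    def rollback(self):
        # Rules 1 and 2: a single call that returns to the previous release.
        if len(self.history) < 2:
            raise RuntimeError("no previous release to roll back to")
        return self.history.pop()  # returns the release that was pulled


# Rule 3: simple, aggressive triggers. These names are illustrative only.
ROLLBACK_TRIGGERS = {"customer_call", "release_exception"}


def handle_signal(deployer, signal):
    """Roll back first on any trigger; investigate afterwards."""
    if signal in ROLLBACK_TRIGGERS:
        deployer.rollback()
        return True
    return False
```

With `v1` and `v2` deployed, `handle_signal(deployer, "customer_call")` leaves `deployer.current` at `v1` without anyone having to decide anything. In real life the `rollback` body would be whatever your tooling needs (repointing a symlink, redeploying the previous artifact), but it should still be one command.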

Thursday, June 3, 2010

Follow your debugging process, stupid. In 10 easy steps.

This post is a reminder to myself. I wasted a lot of my own time and my team's time by not following a simple, repeatable debugging process. The process below won't work 100% of the time, but when it does it will save you hours and stomach lining, and leave you extra gas in the tank for handling problems that are actually hard to replicate and debug.

  1. Verify the input and output creating the defect.
  2. Replicate the defect in your development environment.
  3. Write a unit test that replicates the defect using the same input as the defect and desired output. (You may want to extend this to other permutations of the defect as well.)
  4. Verify the unit test breaks.
  5. Fix the code.
  6. Verify the unit test passes.
  7. Verify that the defect is resolved in your local development environment.
  8. Release to the next environment (Prod/QA).
  9. Verify that the defect is resolved in the next environment.
  10. Repeat until defect is fixed for all users.
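
Steps 3 through 6 might look something like the sketch below, written with Python's standard `unittest` module. The function `parse_price` and the defect itself (a crash on input with a thousands separator) are invented for illustration, so treat this as one possible shape and not a recipe.

```python
import unittest


def parse_price(text):
    """Hypothetical function under repair.

    The imagined buggy version was `return float(text)`, which raised
    ValueError on input with a thousands separator such as "1,299.50".
    """
    # Step 5: the fix strips separators before converting.
    return float(text.replace(",", ""))


class ParsePriceDefectTest(unittest.TestCase):
    # Step 3: the same input as the reported defect, with the desired output.
    def test_reported_input(self):
        self.assertEqual(parse_price("1,299.50"), 1299.50)

    # Step 3, extended: other permutations of the same defect.
    def test_other_permutations(self):
        self.assertEqual(parse_price("1,000,000"), 1000000.0)
        self.assertEqual(parse_price("42"), 42.0)
```

Run it with `python -m unittest` against the unfixed code first (step 4) and watch it fail, then apply the fix and watch it pass (step 6). If the test never failed, you have no evidence it replicates the defect.
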
Do not try to skip steps in the process. You will miss something important. You will waste your own or other people's time. It's not worth it.