Optimize IT

Post authored by: John Atkinson

I was recently presenting to a group of executives in Silicon Valley when I posed a question to them about the astronauts of the Mercury program. Specifically, I wanted to know who was the most controversial. Surprisingly, many familiar names were rattled off: John Glenn, Alan Shepard, Gus Grissom. Finally, someone piped up with "Sam!" He was right.


Sam, for those of you who are not familiar with him, was chosen to fly in the Mercury program before Alan Shepard and before John Glenn. This wouldn't have been a big deal if Sam had been a former military pilot, a renowned test pilot, or even a human. Sam, you see, was a rhesus monkey.

This caused all kinds of problems among the human astronauts. Was NASA really going to send a monkey into space to do a man's job, or were they sending a man into space to do a monkey's job? The fact is, for Sam, flying the Mercury capsule was easy: it was fully automated, so the monkey was basically along for the ride. Of course, there was more to it than that, and the Mercury pilots had very challenging and dangerous jobs to do, but the automation allowed the astronauts to focus on the critical tasks that simply couldn't be automated.


A few years back I was running the Tech Ops group for an online bank, where I had my own "monkey" moment. At the time, we were one of the largest advertisers on the Internet and were doing millions of dollars a day in transactions. The work the tech ops staff did was complicated and time-consuming, but because our margins were so tight, there was no way I would be allowed to keep hiring staff to do it. Without that additional staff, we would never be in a position to take on the more interesting, higher-value tasks, like redesigning our network infrastructure or implementing a CDN for ad placement. We had to automate to survive, and our first automation project was automating our application releases.


A typical release for us involved pushing code and content to the 50-60 servers that made up our production environment, and it took 4-5 hours to execute, and that was if everything went right. On the Sunday nights when we did roll-outs, I would schedule half the team to be in the office all night, while the other half would be ready to take over in the morning, looking for bugs or other deployment-related problems. We always booked a couple of rooms at the Palace Hotel so the engineers could grab a power nap if things went sideways. The process was manual and error-prone, had a history of running past our approved change window, and was costing us business. I needed to fix this, and my job was on the line.


Server and network automation tools were virtually nonexistent at the time, so four of us got together and pulled off a week-long hackathon to build our own automation system. I’m not sure why, perhaps because we were desperate or masochists, but we decided to build this automation platform out of scripts that leveraged MKS Toolkit and a monitoring system that could execute scripts based on events. It was a world-class kluge that would have made Rube Goldberg proud.


After a week of development, we were finally ready to show our work off to the rest of the team. We whiteboarded the environment, explained how we had parameterized it, and went through the list of scripts we had created and the events that triggered each one. Then we were ready to execute a release in our staging environment, and as I ran the first script, one of the engineers quipped, "And the monkey presses the button"…
