Tim Fessenden

ROI for Dummies

Posted by Tim Fessenden May 31, 2011

In my last post, I talked about the need for our customers to develop a “Center of Excellence”. There are two good reasons to do this:


  • Develop standards that assist in enabling the broader organization, and
  • Identify value-add projects with solid ROI that can be measured & marketed to those who cut the checks




You really can’t be successful with any implementation unless you do both. But if I had to choose, I’d go with ROI measurement over standards. It’s simple really: if I’m cutting the checks and I can’t see what I’m getting, I’m inevitably going to ask to see that value, or cut the program altogether. Better to be prepared up front so you never get into that situation.




If you’ve avoided the first pitfall and invested in ROI measurement from the outset, do yourself a huge favor: don’t over-engineer it. There can be a thousand different ways to measure ROI, but in reality your management only needs you to be directionally accurate, likely within +/- 10%. Presented below are some key dos and don’ts for the most common ROI measurements of any automation system: labor cost and time reduction.






  • Do: Survey those living the day-to-day job to get a solid understanding of the time involved to execute a given use case.
  • Don’t: Include aspects of the end-to-end process where automation will have no bearing. For example, in a provisioning process, a large chunk of time is spent in procurement. Your procurement process is unlikely to be the first thing you automate with your automation tools, so it doesn’t make sense to include it in the measurement.
  • Do: Use the POC process and early tests to determine what the average savings (in minutes or hours) will be each time the use case is run. Convert that savings to $$$ by looking at the fully-loaded costs of your administrators.
  • Don’t: Attempt to dissect actual job run times or workflow start/end times to determine savings. The reality is that the time it takes for automation routines to run has little bearing on the actual savings. If you’re using automation, you’re likely off doing other things while the job is running, or better yet, you’ve got it scheduled.
  • Do: Identify the key marker for when a given use case has been run. This could be a workflow that executes, a job that is run in the system, or an email that the automation system sends; really anything that tells you that the use case was run and is completed. It’s pretty simple to get an operational report that will tell you how many times something was run. From there it’s simple math:

# of Use Case Executions * Average Savings per Execution = $$$ & Time Saved

  • Do: Roll up your ROI metrics by use case and in terms that your management understands. Those terms are:
    • $$$ Saved
    • Headcount Reduced
    • Reduction in Time to Market
    • Velocity/Efficiency Increased
  • Don’t: Attempt to showcase raw automation data as a substitute for ROI. Data points such as “# of Successful Job Runs”, “% Compliant”, and “Average Run Time” are important metrics for the operators of the system, but don’t tell the check-writers a single thing about the value of the solution they purchased.
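
The “simple math” above can be sketched in a few lines of Python. This is only an illustration: the use case names, run counts, minutes saved per run and the fully-loaded hourly rate are all made-up assumptions, not figures from any real environment.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str             # hypothetical use case label
    runs: int             # times the use case ran, taken from an operational report
    minutes_saved: float  # average savings per run, taken from POC or early tests


def roi_rollup(use_cases, hourly_rate):
    """Return per-use-case (name, hours saved, dollars saved) rows plus totals."""
    rows = []
    for uc in use_cases:
        hours = uc.runs * uc.minutes_saved / 60
        rows.append((uc.name, hours, hours * hourly_rate))
    total_hours = sum(row[1] for row in rows)
    total_dollars = sum(row[2] for row in rows)
    return rows, total_hours, total_dollars


# Illustrative numbers only.
use_cases = [
    UseCase("server provisioning", runs=120, minutes_saved=90),
    UseCase("patch deployment", runs=400, minutes_saved=15),
]
rows, total_hours, total_dollars = roi_rollup(use_cases, hourly_rate=75.0)
for name, hours, dollars in rows:
    print(f"{name}: {hours:.0f} hours, ${dollars:,.0f} saved")
print(f"Total: {total_hours:.0f} hours, ${total_dollars:,.0f} saved")
```

Note that the only inputs are run counts and an average savings figure per run; job run times never enter the calculation.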





In general, automation solutions are great tools for the day-to-day operators, but they have a long way to go to spit out the kinds of ROI measurements and management reports discussed above. Until that time comes – and it is most definitely coming – use these tips to simplify the way you look at ROI and report up to your management. It will save you time and put your project in the good graces of those cutting the checks – and that’s exactly where you want to be!


After a hard day's work of touting the benefits of Data Center Automation to our customers and prospects, I often like to come home, relax on the couch, and watch some TV on the new entertainment center that I recently set up. Just when you would think that my mind is far, far away from the topic of Automation, I look at the universal remote control that I just used to power on my system and I find myself yet again surrounded by Automation... Entertainment Center Automation!


Before we dive any further, let's take a brief look at the history of remote controls...


The Old Days


Back in the old days, every component in your entertainment center (e.g. VCR, Receiver, TV, Cable Box, etc.) shipped with its very own remote control. Each remote control knew only the correct infrared control codes to operate its respective component and nothing else. The problem with this was that if you wanted to watch your new Betamax copy of "Fast Times at Ridgemont High" in analog Dolby Pro Logic on your new Trinitron TV set, you would have to switch between 3 or 4 remote controls and manually punch a series of buttons in the correct order on each to get all the components powered on and working together. Before long, your coffee table looked like the image to the right:

The Not As Old (But Still Old) Days


Then came the first pass at a "universal" remote control.  These remote controls were much more intelligent in that they knew the proper control codes for many of the common components in your entertainment center.  Some of these remote controls were even programmable in case your component was not supported out-of-box.  While this was definitely a step in the right direction by eliminating the need to have multiple remote controls on your coffee table, these remote controls still required the user to manually punch a complex series of buttons in the correct order to get all the components powered on and working together.  Valiant effort but still... Automation FAIL.




Today's universal remote controls are a big step towards realizing the full potential of Entertainment Center Automation. Just like the previous generation of universal remote controls, they know the correct control codes for most of the components found in a modern entertainment system, but they take things a step further by letting the user create automated workflows for the common use cases of an entertainment system. Because the remote can communicate with all of the components in my entertainment center, and because it automates the complex series of button presses that the previous generation required, I can now sit down, press a single button, and have the remote reliably power on and configure each component in the correct order... Automation SUCCESS!


Just as the modern universal remote control automates many of the manual processes of interacting with the various components in your entertainment center, BMC's Atrium Orchestrator does exactly the same for your datacenter. BMC's Atrium Orchestrator provides out-of-box adapters that know how to communicate with the various components of your IT infrastructure and also lets you define automated workflows across these components. Just as the modern universal remote removed much of the manual effort of using my entertainment center, BMC's Atrium Orchestrator can do the same for your datacenter.
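
To make the analogy concrete, here is a minimal sketch of the "adapters plus workflow" idea using the entertainment center example. The `Adapter` class, the `run_workflow` function and the command names are all hypothetical and invented for illustration; they are not Atrium Orchestrator's API.

```python
class Adapter:
    """Hypothetical adapter: knows how to talk to one kind of component."""

    def __init__(self, name):
        self.name = name
        self.log = []  # commands sent so far, kept for inspection

    def send(self, command):
        # A real adapter would emit an IR code or call a device API here.
        self.log.append(command)
        return f"{self.name}: {command}"


def run_workflow(steps, adapters):
    """Run (component, command) steps in order through the matching adapter."""
    return [adapters[component].send(command) for component, command in steps]


adapters = {name: Adapter(name) for name in ("tv", "receiver", "cable_box")}

# One button press on the remote maps to one ordered workflow.
watch_tv = [
    ("tv", "power_on"),
    ("receiver", "power_on"),
    ("receiver", "input_cable"),
    ("cable_box", "power_on"),
]
results = run_workflow(watch_tv, adapters)
for line in results:
    print(line)
```

The workflow only names components and commands; everything device-specific stays inside the adapter, which is what lets one workflow span many different components.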


Happy Automation!


Assurance and Automation must converge


When I was a kid I would occasionally get sent to the principal’s office. It was usually due to a hall monitor or safety patrol detecting undesirable behavior. A couple of infractions that come to mind are snowball throwing during recess and giving out cooties “no trade backs” when I was supposed to be sitting in my seat. The principal would then take action in the form of a stern talking-to or some other punishment that could vary in severity depending on the permission received from home. The safety patrol provided the monitoring, the principal provided the remediation.


Are you beginning to see where I’m taking this?


This is similar to the sequence of actions that occurs in a Network Operations Center (NOC). Monitoring is handled by network operations using Assurance tools that monitor and correlate events. Remediation is handled, once approved, by network engineering using some sort of device management or automation tool. Let’s take it one step further. A friend of mine once threw a paper airplane in class. This was detected and he was given a warning by the teacher (suppressed event). After a double dog dare he threw another one that hit her in the head. This was correlated with the previous event and led to escalation of a trouble ticket (go to the principal’s office). Because the principal had permission on file from my friend’s parents (Approval), my buddy got the paddle which meant a canoe paddle to the backside (remediation). He never threw another paper airplane in class. I didn’t just make that up.


Assurance and Automation, the two main pillars of this dance, are typically treated as two completely separate responsibilities handled by different personnel and different buying groups. The handoff from one to the other is usually manual. But it doesn’t have to be that way. There are out-of-the-box tools and integrations that handle detection, correlation and remediation, all while following ITIL best practices and process compliance. Here is an example:


Automated Fault Management

  • Change to network device is detected by Network Automation tool and sent to Event Management
  • Fault on that network device is detected by Network Management tool and sent to Event Management
  • Event Management correlates these events and automatically generates an Incident enriched with the relevant data.
  • Network Automation tool builds a script to remove the change to the network device that caused the outage.  This can be done manually or automatically depending on site policy or severity of event.
  • Change Ticket with change details is created automatically
  • Once the Change Ticket is approved, which again can be done manually or automatically depending on policy, the Network Automation tool sends the script to the network device
  • Change Ticket is closed automatically
  • Network Management tool detects that service is back up and automatically closes the Incident.


Note that there are points in this sequence where pauses can occur for manual action and review; however, if site policy allows it, the entire sequence could run automatically, taking full advantage of the automation tools (i.e. Network Automation in this example), assurance tools (i.e. Network Management and Event Management) and the service desk (i.e. Incident and Change management).
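
The sequence above can be sketched as a small event-driven flow. This is an illustration only: the class names, fields, device names and the `auto_approve` flag are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass


@dataclass
class Event:
    device: str
    kind: str    # "change" or "fault"
    detail: str


@dataclass
class Incident:
    device: str
    events: list
    status: str = "open"


@dataclass
class ChangeTicket:
    device: str
    rollback_script: str
    status: str = "pending"


def correlate(events):
    """Open an incident for each device that reports both a change and a fault."""
    by_device = {}
    for ev in events:
        by_device.setdefault(ev.device, []).append(ev)
    return [
        Incident(device, evs)
        for device, evs in by_device.items()
        if {"change", "fault"} <= {ev.kind for ev in evs}
    ]


def remediate(incident, auto_approve):
    """Build a rollback, raise a change ticket, and apply it if policy allows."""
    change = next(ev for ev in incident.events if ev.kind == "change")
    ticket = ChangeTicket(incident.device, f"undo: {change.detail}")
    if not auto_approve:
        return ticket              # pause here for manual review
    ticket.status = "approved"
    # A real system would push the script to the device at this point.
    ticket.status = "closed"
    incident.status = "closed"     # closed once monitoring sees service restored
    return ticket


events = [
    Event("router-1", "change", "acl 101 modified"),
    Event("router-1", "fault", "interface down"),
    Event("switch-7", "fault", "high latency"),
]
incidents = correlate(events)
ticket = remediate(incidents[0], auto_approve=True)
print(incidents[0].status, ticket.status, ticket.rollback_script)
```

Passing `auto_approve=False` stops the flow at the pending change ticket, which models the manual pause points that site policy may require.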


We have customers who are doing this level of automation today, but the market has not nearly exploited the possibilities yet. Another example that customers are beginning to leverage is compliance automation where, instead of a Fault being detected, a compliance violation in a device’s configuration is detected and auto-remediated.


So go forth and further automate your service operations by collapsing Assurance and Automation.  It will keep you out of the Principal’s office. I double dog dare you.

Fred Breton

I had a dream ....

Posted by Fred Breton May 16, 2011


I had a dream, and it was not about electric sheep but about freedom, freeing up time and doing more. Before sharing my dream I should explain the context in which it happened.


Over the last few days I was configuring an environment that required several building blocks to get an application running. Looking at what needed to be installed for the application to run, I realized that 80% of it was things I had already installed over the last 5 years, and that at least 15% of the remaining 20% had already been done by people I know. What's more, at least 99% of what I had to do has surely been done hundreds of times already by people in the data center community. The environments I build are used for demos or tests and... I work on automation.

As usual, I started by looking through my “private catalogue”, meaning the directory structures, packages and scripts I keep in various automation environments. At the end of the day, I had found less than 20% of what I needed, for various reasons:

  1. Some of my content was crappy, not parameterized enough, or too specific
  2. Some was no longer on my storage (or I couldn’t find it… try finding a script you haven’t used in 3 years)
  3. My colleagues’ content was not parameterized and/or documented enough


Bottom line: I did 90% manually because the clock was running and I had no time to improve the content I had or to wait for others to provide usable content.


In the meantime I got some e-mails requesting help on topics where I didn’t help efficiently, or didn’t answer at all, because I was not able to provide content that was easy and immediate to use (point 1 above) and I was already chasing cycles to get through everything on my plate.


I was so disappointed that such a story could happen to me when I work on automation, in the age of Cloud and social media. It followed me into the night and I had a dream…:


I was in front of a web UI, designing the architecture of the environment I needed to build. I was specifying devices (server, network, storage…) and their relationships according to their roles, and dragging and dropping components (OS, patches, middleware, DB, software, hardening level, network zone…) from a centralized catalogue (in the Cloud) containing private content, public content and content shared among groups I belong to. From the catalogue, I could see the thread of comments on each piece of content and how many times it had been used successfully in each context, and I had access to advice, shared experience…

I created the parameterization relationships between my components, and the template for my service was done. I was ready to deploy. To do that, I just needed to map the template to an environment composed of VMs, physical devices and Cloud resources, and then request a deployment. Everything went well, so I published the service in the catalogue, granting access to the various groups who might need it and adding a description. Some information was added automatically:

-     one successful deployment, and the kind of environment on which it happened,

-     all sub-components updated to increment their number of successful deployments in the relevant context (kind of OS, version…),

-     a post to the community for the groups I had given access to.


During the deployment, while looking through the posts, I saw a request from a PS guy who was on site. He wanted to know whether someone had already created rules for specific compliance checks. In 3 clicks I gave him access to some private content I had created a month earlier that should do the job. He immediately got access to it from the customer site and 30 minutes later, after a few checks, he had the job done for the customer. Better still, he even added a few improvements to the content I had given him.


BMC Software has the building blocks to provide this service to data centers with its BSM platform. I don’t actually manage a data center, but I need this kind of service, as do many people who build environments for test, demo, training or production, whatever the size of their environment or company. BMC could provide this by putting BSM on a SaaS architecture with multi-tenant and community capabilities.


How many of you have had the same dream?

Bill Robinson

I am Sylar

Posted by Bill Robinson May 9, 2011


I’ve been watching the last two seasons of Heroes recently. I’d lost touch during the writers’ strike and finally downloaded the last two seasons from iTunes. On an ability level, the character I identify most with is Sylar (but not the serial killer part). His ability (all the characters have some superhuman ability) is ‘intuitive aptitude’, which the Heroes wiki defines as “the ability to understand the structure and operation of complex systems without special education or training.” You might say this could also be defined by the statement “Jack of all trades, master of none”, but I would disagree. In any system there are some basic principles to grasp hold of, and then it is much easier to understand the more complex bits of that system.


As a consultant for the better part of 10 years, this “ability” has served me well. I would run into new software, new architectures or new requirements that I would need to quickly understand and then turn around and do something with the next day. Many times I’d never seen or heard of these things, and had no understanding of how they specifically worked. But I knew how my product worked. And I knew in general what this new product or system did. From there I could find a SME or documentation and bridge the gap between the basics and the specifics. As someone else noted here in a post, the pace of change in the technology world is very fast, so the grounding in the fundamentals is so much more important.


There’s a relation here also to troubleshooting and understanding why something is broken, and how to fix it (though Sylar stopped fixing watches). I see a lot of people who know how to look up error messages in a KB, but if they don’t find the exact error text they give up. They don’t understand how to read through one sequence, see parallels with the problem at hand, and adapt a similar methodology for resolution. If you don’t understand how something works, how can you ever hope to troubleshoot effectively? Understanding is not that hard; it can be done without much technical knowledge. Ask simple questions. What talks to what? Was this working before? Was there a change? How does this talk to that? If this isn’t working, what happens to that? Questions lead to understanding. If X failed because of A and caused Y, next time Y happens check X and A, or maybe X and B. I had a professor in college who was really big on ‘thought experiments’: taking what we know about a system and predicting what would happen if we changed some of the variables. When you are limited by time and budget it’s good to be able to focus your efforts.


I don't know exactly how my car works. But I know enough that when the mechanic says I need my headlight fluid replaced, I take my business elsewhere. How did I figure that out? Google, Wikipedia, some auto repair websites. Can I actually replace my headlights? Probably not. But I know there is no fluid in them.





Having recently had my first child, I’m quickly learning about the miracle cures and preventions that, it seems, every older relative has. After one discussion with my wife’s aunt, I learned that sufficiently covering a baby in Vaseline will essentially cure it of, or make it immune to, almost any ailment known to people today. The funny part is how similar the discussion can be when it comes to virtualization. It seems like with almost any problem or issue you run into, the first solution is to just use virtualization.

I’ve definitely been guilty of it myself. There has been more than one conversation with our Director of Engineering where I’m trying to shoehorn one last requirement into the plan for the next release, he mentions we would need hardware, and my immediate response is “Couldn’t we just spin up a few VMs and use them?”. I mean, after all, it’s virtualization, the magic pixie dust of the IT data center. It just happens, right? No cost, no hassle, no approvals to go through. It just makes problems go away!


In some ways, this isn’t far off from the truth. By this point everyone is aware that virtualization has the ability to change the entire cost structure and time to value of getting systems up and running. But while there are huge potential savings, many people are also beginning to realize that the additional management costs and security implications can be significant as well. I’ve lost count of the number of times I have heard from a customer that they don’t know how many virtual machines are running in their environment, that they don’t know who owns them, and that they have no clue if they are even close to compliant with any corporate standards. This lack of transparency is obviously a huge risk and exposure. In some ways virtualization makes these types of issues more significant than in a physical world, due to the velocity and volume at which services are stood up in these environments.

It is these exact challenges that have been some of the key drivers behind many of the investments BMC has been making in the virtualization space. Whether a system is physical, virtual, or cloud, we want to provide a seamless mechanism to provision it, manage it and ensure its compliance. We are focusing on areas such as:


  • Discovery and drift detection with BBSA and ADDM
  • Provisioning integration with change approvals via Closed Loop Compliance
  • True heterogeneous virtualization support, including VMware, Solaris Zones, AIX LPARs, Citrix Xen, and Red Hat KVM
  • End to end lifecycle and request management via CLM


Leveraging these features helps users keep the same control over their virtual resources as they have over their physical resources, so that customers can reap the benefits of virtualization without creating a larger issue elsewhere.

Michael Ducy

NoOps? No Way.

Posted by Michael Ducy May 3, 2011

There was a time when a NoOp was a machine code instruction that told a microprocessor to do nothing, other than slow down the execution of the program for various reasons. Today Ops, or Operations, is seen as the routine that slows down the execution of a company. This view of Operations has given birth to methodologies such as DevOps, where Operations takes a more Development-focused attitude of quick releases, nimble infrastructure, a level of automation and even developers executing operations functions.


From my perspective, DevOps has arisen from the ongoing conflict between development, operations and the business. Development tends to be more strongly aligned with achieving the goals of the business, and operations tends to be in the way of development executing those goals. Operations often acts as the canary in the coal mine, calling out all the potential pitfalls, and trying to hedge against every edge case no matter how small.


With the advent of cloud platforms and the ongoing struggle of Development vs. Operations, a new methodology has formed: NoOps. Instead of slowing down execution, NoOps seeks to speed up the execution of the business by removing the greatest detractor from progress, Operations. NoOps relies on third-party providers to manage infrastructure, and developers can have relatively unfettered access to the platform to push new projects, fixes, and business initiatives.


As a former employee of Operations departments, I at first see NoOps as an attack on my livelihood. But taking the emotion out of the equation, it is completely understandable why companies want to remove the ball and chain of Operations. Operations departments often have a single focus, "how is this change going to impact my operations", versus the more correct view of "how is this going to affect the organization as a whole." The notion of NoOps is simply a natural extension of "innovate or die", or "survival of the fittest". Operations departments have long been the harbingers of doom and gloom, thus giving the impression that they are holding back the progress of the organization. Thus, the natural inclination is to eliminate those things that hold back progress.


But is NoOps really the way to go? I think the recent Amazon EBS outage would give you a definite answer of "No NoOps". I make this statement not based on how the outage came about, but rather based on the large number of companies that suffered an outage because they didn't have a scalable, reliable, and redundant architecture. These companies relied on a single point of failure, one Amazon availability zone, which any student of Operations could tell you is a big no-no. Eliminating Operations means you eliminate years of experience in building environments that are scalable, reliable, and redundant.


Rather than eliminating Operations, Operations needs to evolve.  Operations can start the evolution by focusing on three things:


  • Institute Dynamic Business Service Management (BSM) and eliminate the old Operations mindset of knobs and widgets. BSM focuses on aligning the Operations department to the needs of the business, much like development has already done. Instead of looking at the individual servers, switches, or routers, BSM focuses on the services IT provides to the business and how those services impact the bottom line.
  • Institute automation to become more agile and flexible.  This will allow Operations teams to better respond to the needs of the business with repeatability and consistency.
  • Stop saying no.  Instead Operations needs to start listening, understanding, and collaborating on moving the business forward.  Do you still think that .01% chance of incurring a $50,000 outage is a reason not to do a new project?  Use basic tools like decision trees to convince your teams that projects make sense from a dollars perspective, edge cases included.
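
As a rough sketch of the decision-tree arithmetic in the last point: the .01% probability and the $50,000 outage cost come from the text, while the project benefit is a hypothetical figure chosen only for illustration.

```python
def expected_value(outcomes):
    """Expected value of one decision branch, given (probability, payoff) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


outage_probability = 0.0001   # the .01% chance from the text
outage_cost = 50_000          # the $50,000 outage from the text
project_benefit = 20_000      # hypothetical benefit, for illustration only

# Branch 1: do the project, with a small chance of also incurring the outage.
do_project = expected_value([
    (outage_probability, project_benefit - outage_cost),
    (1 - outage_probability, project_benefit),
])
# Branch 2: skip the project, gaining nothing and risking nothing.
skip_project = expected_value([(1.0, 0.0)])

print(f"Expected outage cost: ${outage_probability * outage_cost:.2f}")
print(f"Do project: ${do_project:,.2f}  Skip project: ${skip_project:,.2f}")
```

The expected cost of that edge case is about $5, so even a modest benefit outweighs it; the edge case stays in the analysis instead of vetoing the project.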


Of course, these are just 3 starting points for turning around your Operations departments. Ask your peers in development and the lines of business what your Ops team can do to avoid becoming NoOps, or post your suggestions below.
