
Michael Ducy

Organize For Success

Posted by Michael Ducy Nov 16, 2010


Much is said about implementing automation solutions. Technically speaking, implementing automation is simple.  Problems arise when the automation clashes with the organization trying to adopt it.  Often, individual teams already have some level of automation; it works well for them, and from where they sit there is no reason to change.  To implement automation successfully in this environment, organizations need to reorganize to be most effective.  In reorganizing, companies should consider the following.

 

Form Teams Focused on Automation

 

From my experience, companies that form dedicated automation teams tend to be more successful than those that simply attempt to layer automation onto existing processes and teams.  These automation teams can assist the other parts of the organization in implementing the parts of the automation solution that are relevant to them.  For example, the automation team would help the security team create compliance scans (PCI, SOX, etc.) for servers.
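
To make that last example concrete, here is a minimal sketch of the kind of compliance scan an automation team might help a security team build.  This is not BladeLogic's actual compliance API; the host names, the SSH transport, and the 90-day password-age rule are illustrative assumptions.

    # Minimal compliance-scan sketch. The host list, SSH usage, and the
    # 90-day maximum password age are illustrative assumptions, not a
    # real product's job definition.
    import subprocess

    HOSTS = ["app01", "app02", "db01"]      # hypothetical inventory
    MAX_PASSWORD_AGE_DAYS = 90              # example PCI-style rule

    def password_max_age(host):
        """Read PASS_MAX_DAYS from /etc/login.defs on a Linux host."""
        out = subprocess.run(
            ["ssh", host, "grep", "^PASS_MAX_DAYS", "/etc/login.defs"],
            capture_output=True, text=True, check=True).stdout
        return int(out.split()[1])

    for host in HOSTS:
        age = password_max_age(host)
        status = "PASS" if age <= MAX_PASSWORD_AGE_DAYS else "FAIL"
        print(f"{host}: PASS_MAX_DAYS={age} -> {status}")

The point of the dedicated team is that checks like this get built once, by people who know the tooling, rather than reinvented separately in each group.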

 

In successful companies, automation teams are often broken down into Automation Administrators and Automation Developers.  Automation Administrators run the day-to-day operations of the automation solution.  They help teams report on the success of particular automation jobs and troubleshoot any problems with the automation.

 

Automation Developers help the various teams in the organization initially set up automation.  Often these individuals have the experience and mindset needed to implement automation, along with an in-depth knowledge of the automation solution and a programming background.  These characteristics are important because these individuals will need to implement new automation and translate existing automation to the new solution.  Additionally, the Automation Developers help teams modify automation jobs to fit the changing needs of the organization.

 

Build Cross-Platform Teams

 

Build teams that are focused on more than just one specific function.  Instead of having a Windows server team and a Linux server team, have one server team.  Individuals on the team may still focus on a particular specialty, but a single team helps break down the walls that prevent innovation and growth, and team members can assist each other in developing and implementing automation.  Automation solutions such as BladeLogic are cross-platform, and a cross-platform team can maximize the value every member of the team gets from such a solution.

 

Plan, Build, Run

 

I used to think that reorganizing into a Plan, Build, Run structure was ineffective.  After stepping back and looking at successful companies, a Plan, Build, Run structure makes complete sense for organizations looking to implement automation.

 

Plan - the plan team works with the rest of the organization to determine which solutions should be automated, gathers requirements for automation, and acts as a liaison to the build team.  This team should consist of individuals who have experience with automation, understand the intricacies involved in implementing it, and know the right questions to ask.

 

Build - the build team should be focused solely on implementing the automation.  This team should consist of individuals who have a strong knowledge of the automation solution and understand how to get the most out of it.  It should also have a multi-platform foundation, as it will be building automation for the entire organization.

 

Run - the run team ensures that the automation is working and producing the desired results.  It can also assist other teams in making small modifications to the automation jobs (for example, deploying a new patch via an existing automated process).

 

Required Automation Features

 

In creating such an organizational structure, it is important that your automation solution can support it.  Your automation solution should have these key features to support your new organization:

  • Cross-Platform Support - Unix, Linux, and Windows should all be manageable through one interface, with a common set of functions and features applicable to all platforms.
  • Role-Based Access Controls - Strong role-based access controls that allow you to grant and revoke access to elements of the automation solution at a granular level.  In addition, you should be able to easily promote packages between teams (a rough sketch of this idea follows the list).
  • Packaging Technology - Your automation solution needs a strong cross-platform packaging technology that makes it easy to update and change existing processes, as well as to rapidly develop new solutions.
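
The access-control point above can be pictured with a small model like the one below.  The roles, permissions, and package names are hypothetical; a real product such as BladeLogic implements this natively, and the sketch only illustrates granular grants plus a permissioned hand-off of a package from one team to another.

    # Rough model of role-based access and package promotion between teams.
    # Roles, permissions, and package names are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class Role:
        name: str
        permissions: set = field(default_factory=set)   # e.g. {"job.run"}

    @dataclass
    class Package:
        name: str
        owner_team: str

    def can(role, permission):
        return permission in role.permissions

    def promote(package, from_team, to_team, role):
        """Hand a package from one team to another if the role allows it."""
        if package.owner_team != from_team:
            raise ValueError("package is not owned by the source team")
        if not can(role, "package.promote"):
            raise PermissionError(f"{role.name} cannot promote packages")
        package.owner_team = to_team
        return package

    dev = Role("automation-developer", {"package.create", "package.promote"})
    pkg = Package("q4-windows-patches", owner_team="build")
    promote(pkg, "build", "run", dev)   # build team hands the package to run
    print(pkg.owner_team)               # -> "run"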

 

With an automation solution that has these features in place, and with your teams organized into a structure that supports automation, you can ensure your company's success in adopting data center automation.



So you've got some reasonable, enterprise-class problem to solve, say, patching 1600 Windows servers on Patch Tuesday, or deploying a specific configuration change across 1100+ WebSphere instances in under an hour.  How do you (a) figure out how long it'll take, and (b) figure out whether you've got the infrastructure to do it?  I'll take the long way to get there, but perhaps it'll help to think through the different parts of a real-world example.

 

When I first started working in the BladeLogic Services group, one of my tasks at the customer I worked with regularly was patching the Windows server environment on six continents within a couple of days of new patches being released.

 

There are three major parts to this problem.  First, older, less reliable machines that didn't always restart completely.  Of course, if you're patching the system, the app owner looks to you to explain why their application, which had been running "fine" up until now, suddenly no longer works.  The second challenge is getting payloads out to these machines in a timely fashion; for some systems, we may need to transmit several hundred MB if a large patch or a new service pack needs to go out.  And lastly, executing these changes quickly, so that we're not keeping production systems out of service any longer than necessary.  Other challenges include the fairly basic ones inherent to any automation system: tracking newly built, imported, and decommissioned servers, and determining all the ways a system can become unavailable for the duration of the patching window.

 

The first challenge is relatively easy: on Windows, even more so than on UNIX, services expect reboots and are usually configured to start at boot, so a reboot will rarely cause hardship.  Even so, a reboot executed without any other changes, in the weekend or window before the one we want to patch in, can greatly simplify the patching or change-making administrator's troubleshooting during the maintenance window.
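
One simple way to act on that is to reboot ahead of time and then confirm each box actually comes back.  A minimal sketch along those lines follows; the host list, the port checked, and the timeout are assumptions, and checking RDP reachability stands in for whatever health check matters in your environment.

    # Confirm pre-rebooted hosts are reachable again before patch night.
    # The host list and port 3389 (RDP) are illustrative assumptions.
    import socket

    HOSTS = ["web01", "web02", "app01"]
    PORT = 3389
    TIMEOUT_SECONDS = 5

    def is_back_up(host):
        try:
            with socket.create_connection((host, PORT), timeout=TIMEOUT_SECONDS):
                return True
        except OSError:
            return False

    stragglers = [h for h in HOSTS if not is_back_up(h)]
    print("Still down after the pre-window reboot:", stragglers or "none")

Anything on that straggler list gets investigated on a quiet weekday instead of at 2AM during the patch window.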

 

Payload distribution has always been a challenge in distributed environments.  Every host either pulls or is pushed its payloads, resulting in either slow network connections or a long time to delivery.  By using a push model, with relays or repeaters in designated data centers, administrators can balance the added overhead of maintaining another software component against the direct benefit of local copies over fast networks.  Payloads can be synchronized in a number of different ways, but at least one copy is going to need to reach each remote data center one way or another, and then ideally be copied onward across a faster, lower-cost network.
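
In outline, that push-through-repeaters model looks something like the sketch below.  The data-center map, host names, and the choice of rsync over SSH are assumptions; any copy mechanism would do, and a real automation product ships its own transport.

    # Push a payload once per data center, then fan out over the local network.
    # The repeater map and rsync-over-SSH transport are illustrative assumptions.
    import subprocess

    PAYLOAD = "/staging/win-sp-bundle.zip"   # hypothetical patch bundle
    REPEATERS = {
        "emea": {"repeater": "rep-fra01", "targets": ["web10", "web11"]},
        "apac": {"repeater": "rep-syd01", "targets": ["web20", "web21"]},
    }

    def push_to_repeater(repeater, path):
        """One WAN copy per data center."""
        return subprocess.run(["rsync", "-a", path, f"{repeater}:{path}"]).returncode == 0

    def fan_out(repeater, target, path):
        """Run rsync on the repeater so the copy stays on the fast local network."""
        return subprocess.run(
            ["ssh", repeater, "rsync", "-a", path, f"{target}:{path}"]).returncode == 0

    for dc, info in REPEATERS.items():
        if not push_to_repeater(info["repeater"], PAYLOAD):
            continue                          # skip the fan-out if the WAN copy failed
        for target in info["targets"]:
            fan_out(info["repeater"], target, PAYLOAD)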

 

The execution of changes in a timely fashion is the critical piece of any successful maintenance window.  Usually there's more than one change that has to go in, usually there's more than one team, and usually it means that someone's staying up past their usual bedtime.  When I was regularly executing OS patching, that meant that if I could figure out a way to successfully patch twice as many boxes in parallel, I might hand the system off to the next guy an hour or two early, and on an 11PM-7AM maintenance window that might mean they get home before their kids are up for Sunday morning, or even before the bars close.

 

So what does all this have to do with Service Automation, or Server Automation?  We have customers who, every weekend, are trying to fit 10 pounds of changes into a 5 pound bag.  Through appropriate tuning of the various tools they work with, they can only run a certain number of things at any given time.  For a parallelizable task, the number of parallel threads times the wall-clock time available in a given maintenance window dictates how many actions you can take: how many things you can kick off, monitor, and close out in that window.
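
That arithmetic is simple enough to write down.  The numbers below are made up; the point is only to show how thread count, window length, and per-server time combine, and where you would plug in your own measurements.

    # Window capacity = parallel threads x (window length / time per action).
    # All values are illustrative; substitute measured numbers from your environment.
    threads = 30                 # concurrent jobs the infrastructure can sustain
    window_minutes = 8 * 60      # an 11PM-7AM maintenance window
    minutes_per_server = 25      # measured time to patch and verify one box

    servers_per_window = threads * (window_minutes // minutes_per_server)
    print(servers_per_window)    # 30 * 19 = 570 servers in this window

Run the same calculation with the thread count you can actually sustain after tuning and you can see immediately whether a given window is feasible, and roughly how much headroom extra parallelism buys you.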

 

We worked out the number of parallel threads available out of the box in the environment we were working with, and found the task wasn't going to get done in the window we wanted with the basic configuration.  So we looked at the available resources and determined we could safely increase the number of parallel threads for this task significantly (10x) while staying within the performance constraints of our back-end systems (which were not especially beefy).  We were able to achieve a result 8x faster than our initial design constraints (think about getting 8 hours of work done in 1: catching the end of Saturday Night Live at midnight instead of falling into bed at 7AM).

 

Key factors in our success: objectively evaluating how parallel-friendly the workloads are (and on which end, the infrastructure or the endpoints, they will be most intensive); the ability to effectively measure the performance capabilities of the key systems involved; an in-depth knowledge, born of long experience, of how those systems scale; the ability to effectively package change; and some practical on-site testing.

 

You too can achieve tighter maintenance windows through some basic performance assessment, using the right tools for the job, and tuning for the most effective performance.  If this is something you're struggling with, reach out to me.




Bill Robinson’s recent column, “Long Live the Desktop,” raised an interesting point.  The untapped potential of using desktop administrators for server automation tasks is an area more companies need to work on to make effective use of their resources.  The same argument can be extended to network administrators and storage administrators.  The trend in the industry is toward converged computing solutions (CCS), systems that unify the major components of the data center: servers, networks, and storage.

 

It has long been the norm in the industry to treat the individual areas of CCS as silos.  Each of these areas has its own set of tools and processes, along with its own management organization.  The recent trend toward cloud computing (private or public) requires the tools, people, and processes across the different areas of CCS to act as one, so that an agile IT department can respond to business-critical requirements with the least amount of overhead.  Most of the complex interactions between the different areas of CCS can be handled by proper use of automated tools and processes, but doing so still requires skilled people who are well versed across the breadth of the CCS functional areas.  These folks are the Universal Administrators (UAs) of the IT department (akin to GPs in the medical field).

 

An organization needs to be willing to break down the silos and integrate them as a unified entity in order to create Universal Administrators.  The effectiveness of this approach can be increased by:

  1. Automation

Tools that automate most of the tasks across the different CCS areas are a good start; better yet are tools that can be integrated with each other to function cohesively.  The best solution is to use tools that were built to treat the CCS in a unified way rather than as piecemeal additions.  This helps UAs manage most of the CCS without getting bogged down in the nitty-gritty details.

  2. Policy and Procedures

The policies and procedures pertaining to IT personnel in an organization play an important role in creating Universal Administrators.  Companies should encourage people to develop cross-functional skills spanning multiple CCS areas.  The policies should not preclude the need for specialists in the different areas of CCS, but they should actively encourage personnel to develop skill sets across all areas so they can be effective as UAs.

 

Obviously, the BSM platform solution from my company, BMC Software Inc., comes to mind as the best fit for 1), and that has been validated by its success in the marketplace.  For 2), the change has to come from within the organization to make the best use of its most important IT resource, the IT staff, in order to create effective Universal Administrators.

Bill Robinson

Long Live the Desktop

Posted by Bill Robinson Nov 2, 2010


I’ve been to more than one customer struggling to get their server administrators to adopt automation.  Most server administrators are used to thinking about the servers they manage in the single or low double digits: the two database cluster nodes, those three mail relays.  There are a couple of big applications that take a week to install and configure, which only happens a couple of times a year, so there is no need to spend the extra time figuring out silent installs and the like.  With virtualization and “cloud,” the numbers are getting into three and four digits, and those big applications are getting pushed out to these systems, in addition to all the smaller applications needed for endpoint protection, performance monitoring, and configuration management.  All of these applications need to be packaged and installed in an unattended fashion.  The tools are there (e.g. BBSA and BARA) to do the job, but the administrators don’t know how to speak the automation language.
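
Packaging for unattended installation mostly comes down to knowing each installer’s silent switches.  As a minimal sketch, here is one way to drive a Windows Installer package with no user interaction; the MSI path and log location are hypothetical, while /qn, /norestart, and /l*v are standard msiexec options.

    # Unattended MSI install: /qn = no UI, /norestart = defer the reboot,
    # /l*v = verbose log. The package and log paths are hypothetical.
    import subprocess

    result = subprocess.run([
        "msiexec", "/i", r"C:\packages\endpoint-agent.msi",
        "/qn", "/norestart", "/l*v", r"C:\logs\endpoint-agent-install.log",
    ])
    # 0 means success; 3010 means success but a reboot is still required.
    print("msiexec exit code:", result.returncode)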

 

I’ve been to a couple of environments that had an interesting way to solve this problem – rotate folks from the desktop support team through the server support team.  In most environments these teams are separate – separate managers, reporting into different parts of the business – but if you think about it, they are doing the same thing: managing applications and configurations on a set of systems.  Desktop admins often do this on a very large scale, as an enterprise will have a huge number of desktops and laptops compared to the number of servers.  These systems typically have many applications on them: office productivity, an email client, anti-virus, anti-spyware, a web browser, and many more.  Desktop admins have become experts at packaging, deploying, and managing applications across a wide array of systems, usually in a heterogeneous environment – all of the things server admins are now being asked to do.  We often lose sight of the similarities in the day-to-day work because of the perceived prestige of servers versus desktops.  The servers are where the big boys play and where all the big money is made; desktops are a reality we have to put up with and need to just work.  If a web server crashes we’re losing money, but if a laptop crashes, just reboot, and too bad if you lost the document you were working on.

 

The desktop admin is an untapped resource pool that at first does not seem to make sense – will a desktop admin be able to get his head around a large RDBMS install or that mail server setup?  Why not?  Is it really that different from installing the office productivity suite?  No; the process of figuring out the install, then testing and validating it, is the same.  Coaxing along an application that desperately wants user interaction during install is something most desktop admins can do in their sleep.  The exposure to the toolset and the sheer number and variety of applications and installation issues should be reason enough to rotate them through.  There might be some political battles to fight and some bruised egos to nurse (“Really, that desktop guy is going to tell me how to manage my *server*?”), but in the end it’s about leveraging a skill set in the best way possible to make your enterprise successful.
