Optimize IT [ARCHIVED]

Another day, another SSL vulnerability. This one doesn't have the cool logo and memorable name that Heartbleed had, but it's no less serious. 


We shouldn't be surprised by this. IT security compliance is a constant arms race, where nobody can ever get far enough ahead to stop running. The best that IT professionals can hope for is not to lose ground. There is a constant deluge of patches, updates, security bulletins, new policy requirements, and best practices to keep up with. There is the new technology to learn, and the old technology that has to keep working in the meantime. All the while, there are constantly-evolving requests and requirements from the business that need to be satisfied.


The Heartbleed bug at least brought with it the advantage of a high media profile, so it was easy to justify a fire drill. When even the mainstream press was publishing horror stories about the impact and consequences of a bug, there was plenty of attention focused on solving that problem. There is much less patience for a constant state of emergency, which is what IT security amounts to if it's done by hand.


The status quo in IT security was born of an era when there were small numbers of systems and all of them were known to the IT operations team. Today, with servers multiplying continuously, and unknown but large portions of the IT estate outside of corporate IT's control or even visibility, that approach is showing its limits.


This is hardly a new problem, and most IT operations teams have adopted tools to help them deal with the constant demand. The problem is that each team has chosen its own tool for its own immediate needs, without considering the wider requirements. This ad-hoc and disconnected approach has failed to deliver the expected results, because local optimisation of each sub-task did not impact the overall process. This failure to apply standards to the entire compliance and security workflow is the reason that despite years of effort and investment, most IT departments are still stuck in fire-drill mode when it comes to responding to the constant flow of new issues.


BMC proposes a new maturity model for compliance and security. To find out more, join us for a free webinar on Thursday the 12th of June at 11am CST / 6pm CET. Register here to attend the webinar, and bring your questions to the live Q&A session afterwards.


It’s been a few weeks now, but the Heartbleed OpenSSL bug has not gone away. The bug itself is not that hard to fix. Patches for affected operating systems and libraries were available on the same day as the announcement.



However, recent survey data from Netcraft (as reported by re/code) indicate that the fires are still smouldering, with only 14% of sites having completed all the required steps to fix the Heartbleed bug.

Traditionally, IT professionals divide their work into “keeping the lights on” and “firefighting”. Fixing the Heartbleed bug definitely fell into the second category. Fighting fires means lots of people running around, shouting, confusion, elevated stress levels, and mess all around. Because of the confusion, it's not the most efficient way to get things done, but most of the time it works - eventually.


But what happens once the fires are out and everyone goes back to concentrating on keeping the lights on?




Six Steps to Fix Any Configuration Problem - Not Just Heartbleed



What is a good process to put into place to make sure that problems like Heartbleed don’t crop up all over again once the IT team have taken their eye off the immediate issue?


  1. Schedule regular discovery across the entire IT estate
  2. Deploy management agents to any previously unknown servers
  3. Execute audit of all discovered systems to identify security and compliance violations
  4. Review dependencies and impact of remediation, set exceptions as necessary
  5. Approve and schedule remediation
  6. Update documentation trail automatically
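As a rough illustration, the six steps above could be wired into a single automated pass. This is a minimal, hypothetical sketch: every name and data structure is an illustrative stand-in, not a real discovery or automation API.

```python
# Hypothetical sketch of the six-step cycle as one automated pass.
# All names and data structures are illustrative stand-ins.

def remediation_cycle(inventory, discovered, audit, approved):
    """Run one compliance pass; return the updated inventory and an audit trail."""
    trail = []

    # Steps 1-2: discovery finds servers IT didn't provision; deploy agents there
    new_servers = [s for s in discovered if s not in inventory]
    inventory = inventory + new_servers
    trail.append(("discovered", new_servers))
    trail.append(("agents_deployed", new_servers))

    # Step 3: audit every known system for violations
    violations = [s for s in inventory if not audit(s)]
    trail.append(("violations", violations))

    # Steps 4-5: remediate only approved systems; the rest become exceptions
    trail.append(("remediated", [s for s in violations if s in approved]))
    trail.append(("exceptions", [s for s in violations if s not in approved]))

    # Step 6: the trail itself is the automatically generated documentation
    return inventory, trail

inventory, trail = remediation_cycle(
    inventory=["web01", "db01"],
    discovered=["web01", "shadow07"],   # shadow07 was provisioned outside IT
    audit=lambda s: s != "shadow07",    # only shadow07 fails the audit
    approved=["shadow07"],
)
```

The point of the sketch is that discovery, audit, remediation and documentation run as one loop, not as disconnected manual tasks.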



Automated discovery is a must. The IT department can no longer rely on being the sole source of IT resources. It follows that their list of provisioned systems is no longer an exhaustive list of all systems, and therefore a system is needed to discover systems provisioned through parallel, “shadow IT” mechanisms.


This means that the old security paradigm that divides the world into “inside the firewall” and “outside the firewall”, with maybe a DMZ in between, is no longer valid. Systems way off in the public cloud may still be logically “inside” the network. This means that compliance and security audits can no longer focus on a short list of “core” systems. Any and all systems are potentially “core” in terms of vulnerability and dependency, and need to be audited accordingly.


Because more and more systems are being provisioned without input from the IT department, IT can no longer rely on controlling the templates used for provisioning or the definitive media library used to deploy applications. The assumption has to be that there will be systems provisioned with obsolete or otherwise vulnerable configurations, which will have to be updated in the field.


The democratization of IT means that the owners of those systems may well not have the skills to harden them on their own, so IT will need a mechanism to distribute, validate and enforce secure configurations. Because of the rate of change of virtualized environments, this process has to be automated to keep up with the current state of IT.


Finally, once the fires have been put out and everyone has moved on to the next issue, people will immediately start forgetting the details of Heartbleed and all of the things they had to do to fix it. An audit trail is needed to keep that historical memory, but nobody has time to create one of those by hand while also fighting fires. The documentation needs to be created automatically for every action.

None of these requirements – automated discovery and automated configuration management – are new. Vulnerabilities like Heartbleed simply serve to underline the urgency of fulfilling them, to avoid a future of constant fire-fighting in IT.

What to do today


If you are a current customer of Discovery, you can register here for a free assessment of your environment for compliance and security, including Heartbleed.

BMC offers a ZipKit for BladeLogic TrueSight Server Automation to automate the Heartbleed remediation process. We also have integration with Discovery to make sure that we catch all the vulnerable systems - including ones that might have been provisioned without full IT oversight or even awareness. The whole process can also be documented in Remedy ITSM.

This way, Heartbleed remediation moves from firefighting to routine background activity, and you can focus on activities that add value to the business.


If you are reading this blog, I assume you are familiar with the Heartbleed bug. For a while there it seemed that even mainstream news sites were discussing little else apart from Heartbleed.


The short version is that a certain library used for secure communication, OpenSSL, was found to have a vulnerability, and to have had it for the last couple of years. Because this is an open-source library, it was re-used in many other products and web services. BMC BladeLogic can help with applying the fix to your affected servers, and many of our customers have already used BladeLogic for this purpose.




The tricky bit about Heartbleed actually comes after the first round of patches. How can you find and fix all the affected devices and services on your network? Remember, this is not a one-time task. Six months from now, after you’ve cleaned your network of Heartbleed, someone clones an old VM template that pre-dates awareness of the problem and has not been patched. How are you going to deal with that situation in the future?


As it happens, BMC can be a big part of the solution to that problem.


The first part of the solution is discovery. BMC’s Discovery solution will discover devices and services on a continuous basis. You can then look for affected software versions and determine correct remediation actions. Because ADDM discovery is frequent and does not depend on external inputs, this approach will give a good view of the actual vulnerability of your environment to Heartbleed - and also all sorts of other issues. For instance, scans might turn up old systems running Windows XP, which are now completely out of support.
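Once discovery has inventoried the software versions in the estate, flagging exposure is a simple filter. The sketch below uses a made-up inventory dictionary mapping hosts to reported OpenSSL versions; the affected range itself (1.0.1 through 1.0.1f, fixed in 1.0.1g) is real.

```python
# Hypothetical post-discovery check: flag hosts whose reported OpenSSL
# version falls in the Heartbleed-affected range. The inventory dict is
# an illustrative stand-in for real discovery output.

VULNERABLE = {"1.0.1" + letter for letter in ["", "a", "b", "c", "d", "e", "f"]}

def heartbleed_exposed(inventory):
    """Return hosts running an OpenSSL build in the affected range."""
    return sorted(host for host, version in inventory.items()
                  if version in VULNERABLE)

hosts = {
    "web01": "1.0.1e",   # vulnerable
    "web02": "1.0.1g",   # patched
    "db01":  "0.9.8y",   # older branch, not affected by Heartbleed
}
print(heartbleed_exposed(hosts))   # ['web01']
```

Because discovery runs continuously, the same filter would also catch a freshly cloned VM built from an old, unpatched template.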


The next step is to put those remediation actions into place. BMC BladeLogic will let you distribute patches to the affected servers or even shut them down if they are unauthorised. Automating this patching process is crucial to ensuring complete coverage, especially in large-scale environments. Patching by hand would take too long, but automation is fast.


The other advantage is that automation is predictable. Once you have automated a process, you can be sure that the results will be the same whenever and wherever it is run. This means that you can test the upgrade in a pre-production environment and make sure it does not have unforeseen impacts before you roll it out in production. BladeLogic will let you validate that those environments are congruent and therefore that your test makes sense.


Either way, Heartbleed is not a vulnerability you can afford to ignore. Attackers can not only extract random data from your server’s memory; it has also been proven possible to hijack user sessions. This sort of impersonation could have very serious consequences, especially if it takes place some time in the future when much of the current attention to Heartbleed has died down. You need to ensure defence in depth, not just against Heartbleed but against the next bug and the one after that. Continuous automated discovery and remediation is the only way to do that.


If you are an ADDM user we can schedule an assessment in your environment, identifying Heartbleed and other vulnerabilities. Please register here to arrange that. For more information about how to use BladeLogic in your fight against Heartbleed and other similar problems, join the discussion on the TrueSight Server Automation forum or contact your BMC account manager directly.



Regarding our own products, BMC has published a constantly-updated list of affected products and the status of those patches, which you can find here. BladeLogic automation products (Server, Network, Middleware and Database) are not affected, but the Decision Support reporting engine is affected on Unix and Linux (not on Windows).


As with many words in the English language, the word integrate (or integration) is derived from the Latin word integratus, which means “to bring together or incorporate (parts) into a whole.”  The word integration has been in existence since the 1600s, which means that people have been integrating things for a very long time.  When I hear that two products are integrated I generally have a positive reaction and assume that the integrated whole is better than the individual parts.  But is that really the case – and is that enough?



After years of hearing the challenges, pains and requirements of IT operations professionals, I have the impression that when people learn that something is “integrated”, they assume that it’s also (by default) simple to use and will meet most or all of their needs.  Now I was inclined to agree with them, but then there’s that old adage, “don’t assume anything.”


Many times, integration between management products simply consists of a launch-in-context capability, where the operator has the ability, for example, to seamlessly switch from viewing a device in an incident ticket to viewing the historical performance or capacity utilization of the same device in another application.  Now it might be simple enough to swivel from one application to another, but is it worth it?  Does this integration (e.g. launch-in-context) give the operator enough information to triage and remediate the incident faster as a result, or does he/she simply have more data?  Unfortunately, in many cases it’s the latter, which is minimally useful, since it is information that enables decision making, and data is simply the input.



So if integration is not enough, then what is enough or required to make IT operations more proactive?  I suggest the combination of analytics, visibility and workflow is the key…



Analytics – is a contemporary and very popular word that is at times overused and under-defined, but its relevance to IT Operations is growing.  In the context of IT operations (i.e. managing the availability, performance and capacity of the IT infrastructure) the word analytics equates to intelligence.  For example, intelligence that can be applied to vast amounts of complex data, to identify patterns of behavior or correlate business metrics to infrastructure utilization, allows IT operations to identify normal/abnormal activity (and reduce incidents) and understand which infrastructure resources will respond to business demand, and how.
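As a toy illustration of that kind of intelligence (not any vendor's actual engine), one can learn a baseline from historical samples and flag values that deviate beyond a few standard deviations. The data below is synthetic.

```python
# Minimal behavior-learning sketch: learn a baseline (mean and standard
# deviation) from history, then flag values beyond 3 sigma as abnormal.
from statistics import mean, stdev

def abnormal(history, value, sigmas=3):
    """Return True if value deviates from the learned baseline by > sigmas."""
    mu, sd = mean(history), stdev(history)
    return abs(value - mu) > sigmas * sd

history = [48, 50, 52, 49, 51, 50, 47, 53, 50, 50]   # % CPU under normal load
print(abnormal(history, 51))   # False: within the learned band
print(abnormal(history, 95))   # True: an abnormal spike
```

Real analytics engines are far more sophisticated (seasonality, correlation across metrics), but the principle is the same: the baseline is learned from data, not configured by hand.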



Visibility – is something that IT Operations has had for years in the form of availability, performance and capacity charts, graphs and reports.  But many of these views display nothing more than data and require extensive analysis to understand where to focus your efforts.  However, visibility that’s driven by analytics provides “actionable” views that can quickly identify the hot spots in the infrastructure and help prioritize efforts, speed analysis and remediation.  Having multiple levels of visibility is also a must.  No longer are device utilization views all that are required.  Some views, like one showing the servers at or above the 95th percentile of CPU capacity used, are required by the majority of performance/capacity analysts.  Factor in professional preference as well as the skill and ability of the analyst, and what’s required are views at all levels - infrastructure, applications, services and the data center.
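As a toy example of such an analytics-driven view, the sketch below (with made-up server data) surfaces only the servers at or above a chosen utilization percentile, using a simple nearest-rank cutoff:

```python
# Hot-spot view sketch: rank servers by CPU utilization and surface only
# those at or above the given percentile. Data and names are illustrative.

def hot_spots(cpu_by_server, pct=95):
    """Return servers whose CPU utilization is at or above the pct-th percentile."""
    values = sorted(cpu_by_server.values())
    # nearest-rank percentile: index of the cutoff value in the sorted list
    k = max(0, int(round(pct / 100 * len(values))) - 1)
    cutoff = values[k]
    return sorted(s for s, v in cpu_by_server.items() if v >= cutoff)

cpu = {f"s{i:02d}": i * 5 for i in range(1, 21)}   # s01=5% ... s20=100%
print(hot_spots(cpu))   # ['s19', 's20']
```

Instead of twenty utilization charts, the analyst gets the two servers worth looking at first.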



Workflow – is the means by which actions are taken and things get done “automatically” – driven by intelligence so the right things get done right (and quickly), instead of doing the wrong things right, or the right things wrong.  IT workflows have been around for decades, but in my experience there are certain things that IT operations can and will allow to be done automatically, such as server configuration.  Then there are actions, such as automatically adjusting capacity for a mission-critical server, that (rightfully paranoid) IT operations folks will only do semi-automatically at most.


We have seen an explosion in IT complexity, speed and scale with the advent of virtualization, converged infrastructure and cloud computing.  In my opinion IT challenges are only compounding exponentially, as these new technologies are not completely displacing physical systems, which come with their own challenges.  In short, people can’t keep up, and automated workflows, driven by intelligence, are what will reduce human error and increase operational efficiency.



For years IT organizations have struggled to deliver proactive IT operations.  While product integration provides efficiency benefits, it’s the addition of analytics coupled with visibility and workflows that will propel organizations from simply improving mean time to repair (MTTR) to achieving the higher-value mean time between failures (MTBF).



Don’t get me wrong, I enjoy using integrated products – my iPhone being one of them.  But sometimes integration is not enough.


Last week, we talked about how shared resource pools change the way IT operates the cloud environment. We mentioned how to avoid false positives and save maintenance costs by measuring pool decay. Today, I am going to explain how you can avoid another major challenge in cloud operations - the outage storm.


An outage storm is typically caused by cascading errors and the lack of a mechanism to detect them. Chances are you are familiar with this issue. In April 2011, Amazon AWS experienced a week-long outage on many of its AWS service offerings. I examined this incident in the article "thunderstorm from Amazon reminded us the importance of weather forecast". In a nutshell, a human error diverted major network traffic to a low-bandwidth management channel. This flooded the communication between many EBS nodes. Because of a built-in automation process, these nodes started to unnecessarily replicate themselves and quickly consumed all the storage resources in the availability zone. Eventually this brought down not only EBS but all other services relying on it. Almost a year later, Microsoft Azure experienced a day-long outage. This time, a software glitch triggered an unnecessary built-in automation process and brought down the server nodes. You can see the similarity between these two incidents: an error unintentionally triggered automation processes that were built for a different purpose. The outage storm, without any warning, brings your cloud down.


So how can you detect and stop the cascade as soon as possible? Let's look at these two incidents. The environment seemed normal at the onset. The capacity in the pool seemed good. I/O was normal. The services running from these pools were not impacted. You felt everything was under control, since you were monitoring the availability of each of those resources. Suddenly, you started to notice a number of events showing up on your screen. While you were trying to make sense of these events, more and more kept coming in, alerting you that the availability of many devices was gone. Before long, service desk tickets swamped in. Customers started to complain that large numbers of their services were experiencing performance degradation. Everything happened so fast that you didn't have time to understand the root cause and make the necessary adjustments. Sounds like a nightmare?

How can one prevent that from happening? My suggestion is that you need to do two things. First, you need to measure pool health. In particular, you need to monitor the distribution of health status across the pool's member resources. How many of them are in trouble? Do you see any trend in how the trouble is propagating? What is the rate of propagation? Case in point: the Azure incident could have lasted longer and impacted more customers if the Microsoft team hadn't implemented its "human investigate" threshold. But it still lasted more than 12 hours. The main reason was that these thresholds relied on availability monitoring through periodic pings, and it took three timeouts in a row to trigger the threshold for the pool, which delayed the alert. So if you want to detect a storm at its onset, the second thing you need to do is detect abnormal behavior in the member resources, not just failed pings. Combining these two measurements, each device can reflect its abnormal health status, and the pool can detect changes in the health distribution among its member resources. You, as an IT operations person, can set up rules to alert you when the health distribution crosses a critical threshold.
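As a minimal sketch of that first measurement, the fragment below (all names, statuses and thresholds invented for illustration) computes the fraction of pool members whose behavior is abnormal and raises an alert when it crosses a configurable threshold:

```python
# Pool-health sketch: alert when the share of members with abnormal
# behavior (not just failed pings) exceeds a threshold. Illustrative only.

def pool_alert(member_health, threshold=0.10):
    """Return (unhealthy ratio, True if the ratio exceeds the threshold)."""
    unhealthy = sum(1 for status in member_health.values() if status != "ok")
    ratio = unhealthy / len(member_health)
    return ratio, ratio > threshold

# Capacity and I/O can still look fine while trouble spreads member by member:
health = {f"node{i}": ("abnormal" if i < 3 else "ok") for i in range(20)}
ratio, alert = pool_alert(health)
print(round(ratio, 2), alert)   # 0.15 True
```

The pool-level rule fires on the trend, long before every member has gone dark and the event screen has flooded.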


How does this benefit you? First, you get an alert as soon as that threshold is crossed, even if the overall performance and capacity of the pool seem good. You will then have enough time to respond, for example by diverting services to another pool or quarantining the troubled devices. In addition, you won't be swamped by massive numbers of alerts from each affected device while trying to guess which one to look at first. You can execute root cause analyses right from that alert, at the pool level.


Cloud is built with automation as the main mechanism to ensure its elasticity and agility. But occasionally, as in these two incidents, errors can amplify their damage very quickly by cascading through that automation. Because of this inherent nature, outage storms happen more often than you think. If you operate a cloud environment, chances are you will face one pretty soon. You need a solution that can assess resource health by learning each resource's behavior and can measure the change in the distribution of those health statuses at the pool level. The shared pool changes how you operate your cloud environment. Operations solutions need to evolve to help you better measure pool decay and detect outage storms. Cloud-washing is not going to cut it.


To see how this works in the real world, you can visit the BMC booth (#701) at this year's VMworld. You can see a live demo on how to detect an outage storm at its onset and get some ideas on how you would approach these problems. If you want to discuss this with me, please let the booth staff know.


Jun 23, 2012 marks the 100th anniversary of Alan Turing's birth. 76 years ago, Turing, just 24 years old, designed an imaginary machine to solve an important question: are all numbers computable? In the process, he designed a simple but powerful computing model - the most powerful known to computer scientists. To honor Turing, two scientists, Jeroen van den Bos and Davy Landman, constructed a working Turing machine. It is not the first time such a machine has been built. The interesting thing this time is that the machine was built entirely from a single LEGO Mindstorms NXT set.



The modern brick design of LEGO was developed in 1958. It was a revolutionary concept. The first LEGO bricks built 54 years ago still interlock with those made today to construct toys and even the Turing machine. When you want to build a LEGO toy or machine, you don't need to worry about when and where the bricks were manufactured. You focus on the thing you are building, which standard shapes you need, and how many of each. And you can get them in any LEGO store, no matter what you are building.


Sounds familiar? This is very similar to how one builds a cloud service using resources in a shared fabric pool. You don't care which clusters or storage arrays host these resources. All you care about is the types (e.g. 4-CPU vs. 8-CPU VMs) and service levels (e.g. platinum vs. gold) these resources need to support. Instead of treating individual element devices, such as compute hosts or storage arrays, as the key building blocks, IT now needs to focus on the logical layer that provides computing power to everything running inside the cloud - VMs, storage, databases, and application services. This new way of building services changes everything about how to measure, analyze, remediate and optimize resources shared within the fabric pool in the cloud.


To understand why we need to shift our focus to pools and away from element devices, let's talk about another popular toy - the jigsaw puzzle. Last year, I bought a 3D earth jigsaw puzzle set for my son, who was 3 years old at the time. He was very excited, as he had just taken a trip to Shanghai and was expecting a trip to Disney World. He was eager to learn all the places he had been and would be visiting. So he and I (well, mostly I) built the earth using all those puzzle pieces. The final product was a great sphere constructed from 240 pieces. We enjoyed it for 2 weeks, until one of the pieces went missing. How can you blame a 3-year-old boy who wanted to redo the whole thing by himself? Now here is the problem: unlike those two scientists who used LEGO bricks to build the Turing machine, I can't easily go to a store and just buy that missing piece. I need to somehow find it or call the manufacturer to send me a replacement. In IT, this is called incident-based management. When all your applications are built on dedicated infrastructure devices, you can customize those devices, and the way they are put together, to the particular needs of each application. If one of those devices has an issue, it impacts the overall health of that application. So you file a ticket and the operations team does triage, isolation, and remediation.



In a cloud environment with shared resource pools, things happen differently. Since the pool is built from standard blocks and is shared by applications, you have the ability, through the cloud management system, to set policies that move VMs or logical disks around if their underlying infrastructure blocks are hit by issues. So a small percentage of unhealthy infrastructure blocks doesn't necessarily need immediate triage and repair. If you monitor only the infrastructure blocks themselves, you will be overwhelmed by alerts that don't necessarily impact your cloud services. Responding to all these alerts immediately increases your maintenance costs without necessarily improving your service quality. Google did a study on the failure rate of their storage devices and found an AFR (annual failure rate) of 8%. Assuming Google has 200,000 storage devices (in reality, it may have more), you will have a storage alert somewhere in your environment every half hour. How expensive is it to have a dedicated team constantly doing triage and fixing those problems?
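The arithmetic behind that "every half hour" figure is easy to check under the same assumptions:

```python
# Back-of-the-envelope check: 200,000 devices at an 8% annual failure rate.
devices = 200_000
afr = 0.08                                  # 8% annual failure rate
failures_per_year = devices * afr           # 16,000 failures per year
minutes_per_year = 365 * 24 * 60            # 525,600
minutes_between_failures = minutes_per_year / failures_per_year
print(round(minutes_between_failures))      # ~33 minutes: roughly one every half hour
```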


So how do we know when services hosted in a pool will be impacted? We give this problem a name - pool decay. You need to measure the decay state: the combination of the performance behavior of the pool itself and the distribution of unhealthy building blocks underneath it. In this way, you will be able to tell how the pool, as a single unit, performs and how much ability it has to provide computing power to hosted services. When you go out to look for a solution that can truly understand the cloud, you need to check whether it has the ability to detect pool decay without giving you excessive false positives. Otherwise, you will just get a solution that is cloud-washing.


Back to my missing piece in the 3D jigsaw set: I finally found it under the sofa. But lesson learned - I now buy my boy LEGO sets instead.


Next week, we will examine how resource pools combined with automation introduce another well-known challenge - the outage storm. Stay tuned.


I haven't posted a blog since last August. One reason is that I had two great vacations on beautiful Caribbean islands. But most of the time, I was working with a team of excellent people to finish a project that allows IT to run its cloud environment with high confidence. Today, I am very proud to say - we did it.


I have talked about how cloud computing poses new challenges in IT operations and why proactive performance management is even more important now. Today, we launched the next generation of our cloud operations management solution, providing a set of capabilities to help IT take on those new challenges. These capabilities range from the cloud panorama, showing performance health for every shared fabric resource in your environment, to automated workflows that allow those resources to be optimized for provisioned workloads.


Actionable Views

A cloud environment is complex. Not only do you have to manage the infrastructure resources, such as storage, network, and compute, but you also need to understand how they collectively power cloud services through shared pools. Many approaches you can find in the market today try to collect and show as much data as possible. We believe this is not efficient and actually prevents you from spotting the real issues and risks in your cloud environment. This new release gives IT operations and administration teams an actionable view - the cloud panorama. The cloud panorama not only summarizes the data organized the way you see it in the cloud (e.g. pools, containers, etc.) but also allows you to act upon what you learn from that data.

High-precision Root Cause Analysis

The data is important, but the meaning of the data is even more important. What operations staff want to understand is what the data really means for their day-to-day job. This is where analytics comes in. Analytics for performance and capacity data is not new. What is unique about the analytics enhanced in this new release is that, for the first time, an analytics engine can provide insight into how shared pools in the cloud power highly automated cloud services. Lack of this type of insight causes serious problems. Think about last year's Amazon AWS outage and this year's Microsoft Azure disruption. In coming blogs, I will explain why this matters to you and how you can execute high-precision root cause analyses to prevent this type of outage from happening in your cloud environment.


Intelligent Workflows

When end users ask for a new cloud service, such as new VMs, a new database instance, or new storage space, they get it almost instantly, because the provisioning of these services is automated. The challenge for cloud operators is ensuring these services run as expected from the get-go. Manually identifying, deploying and configuring monitoring agents for these services is not an option. In this new release, we enable you to automatically deploy and configure your monitoring agents during the provisioning of the service. By doing so, all your cloud services will be instant-on and instant-assured. In addition, when a service is provisioned, the solution tells the provisioning engine how to optimize the workload, leveraging the workload patterns it has learned and the capacity supply it knows. Finally, the solution analyzes the data it collects and provides showback reports.


Cloud computing gives IT tremendous advantages in providing automated and elastic services to its end users. But it also creates new challenges that IT operations teams have to face. In the past year, we at BMC worked very hard to understand those new needs. Today, we are excited to announce this new release of our cloud operations management solution. Through its actionable views, high-precision root cause analyses, and intelligent workflows, this release enables IT to confidently power the cloud, control the cloud, and take intelligent action to ensure high-quality service delivery. Take a look at the clickable demo my colleague Thad did and check out the product page, particularly that 2-minute explainer. We will get into more details in the coming weeks.




By now you've probably heard about the new Cloud Operations functionality we've added to our BMC ProactiveNet Performance Management platform.  You may have even read a bit about what it is and how it can help manage a cloud environment.  But if you are anything like me, you get MUCH more out of driving a short demonstration than you do from reading a whitepaper, viewing a PowerPoint presentation, or reading a datasheet.  (Disclaimer to keep my job - not that there is anything wrong with those much-needed mediums.)  BMC has created a very short, user-driven demonstration of our new Cloud Operations Management solution that is now available at


In the demonstration you will see how the new Cloud Operations Management solution uses Visibility, Analytics and Workflow together to drive faster resolution of complex cloud performance issues.


You may not recognize the acronym BNY, but you’ve likely heard of the Bank of New York Mellon (www.bnymellon.com), a worldwide investment management and investment services company headquartered in New York City.  BNY Mellon competes in the rapidly changing global financial services marketplace, where IT is a key player in supporting the business by providing technology-based solutions that enhance the business’s ability to be competitive and successful.  With a global IT infrastructure consisting of multiple data centers, various hardware and software platforms, and many servers, the capacity planning group has its hands full ensuring that the appropriate amount of resources is available to handle all business, application and system requirements.




Now, the art and science of capacity management is not new to Boris Gdalevich, Capacity and Performance Strategist at BNY Mellon.  With a 15-year history in the capacity management discipline, he’s seen a lot of changes in IT.  As has Giuseppe Nardiello, Principal Product Manager at BMC Software, who also has a long and storied IT background.


Recently Boris and Giuseppe teamed up to produce a podcast discussing the evolution of capacity management and how BNY Mellon has adapted its capacity management practice to keep up with increasing IT complexity and the steady decline in the number of capacity planners.  Whether you’re just establishing a capacity management practice or are a seasoned professional, listen to the podcast and learn...


  • How the capacity management landscape has changed
  • Why “automation” is the key ingredient for any organization implementing a capacity management process
  • The contemporary challenges of capacity management and the approach BNY Mellon has taken
  • The benefit and business value of an automated approach

The “Automated capacity management – with Boris Gdalevich & Giuseppe Nardiello” podcast and white paper are posted on the BMC Software web site at…


So, someone else owned (BladeLogic) Server Automation in your environment.  They left, quit, got fired, got promoted, or moved to New Zealand to raise sheep.  Now you own Server Automation.  Your company sent the other people (let's call them Bob and Kelly) to training.  They got the knowledge transfer, they had the product expertise, and they knew a support person by name.


Either way, now you own it, and your manager wants to know when you'll have that inventory report working, and what you want to do about setting up Disaster Recovery in the shiny new data center. 


Where do you start?


Two easy places to start.  First is the rapidly-wiki-izing documentation; the links below include the basic documentation and a few of the Best Practices that have been developed recently.  Start with Deployment Architecture to understand some of the moving pieces.  Next is the list of "Howto" videos that have been developed over the last few months.  These are basic walkthroughs that show you how to build your own basic self-contained "sandbox" environment of BSA on a Windows VM.


BSA 8.2 base documentation:


Deployment Architecture:


Sizing and Scalability:


Disaster Recovery and High Availability:


Large Scale Installations:


Howto Videos:


•        Initial Install – Database Setup: On BMCdocs YouTube at

•        Initial Install – File Server and App Server Installs: On Communities YouTube at

•        Initial Install – Console GUI and Appserver Config: On Communities YouTube at

•        Compliance Content Install: On BMCdocs YouTube at

•        Compliance Quick Audit: On BMCdocs YouTube at

•        Setting up Compliance – Discovery Jobs:

•        BSA 8.2 Patching - Setting Up a Windows Patch Catalog: On Communities YouTube at

•        Windows Patch Analysis: On Communities YouTube at

•       Patching in Short Maintenance Windows with BMC BladeLogic Server Automation: On Communities YouTube at



In April 2012 BMC introduced a new product in its capacity optimization line called "Moviri Integration for BMC Capacity Optimization."  This new product comes to BMC through a partnership with an Italian company, Moviri.  The folks at Moviri have many years of experience in the capacity management discipline, and many customers have asked to learn more about BMC's partnership with Moviri and the product offered through BMC.  I recently had the opportunity to talk with Riccardo Casero, Product Manager at Moviri, and ask him a few questions about the new product and the partnership with BMC...





CG:  How did this partnership come about?

Riccardo C:  The partnership between BMC and Moviri is actually twofold; a little bit of history explains why. Neptuny was an Italian company acting both as a software development firm and as a consulting firm in the IT systems performance evaluation market. In 2010 BMC acquired Neptuny’s flagship product (Caplan), now called BMC Capacity Optimization (BCO), as well as the Caplan product development team and the Neptuny brand. The company Neptuny was renamed Moviri and retained the entire consulting arm, with its many years of professional services experience in the deployment, integration, customization, and end use of the tool. Integrations in particular play a key role in BCO’s success, thanks to the openness of the BCO framework, which can incorporate IT data from potentially any electronic data source. This is the area where Moviri helped customers the most and where it grew its deepest expertise.

So, under this scenario, there was an immediate professional services partnership between Moviri and BMC, with Moviri being the most trusted company to engage for BCO-related projects. This partnership has flourished since then.

Soon, as a natural evolution, BMC and Moviri saw the opportunity to make Moviri's expertise available to customers more efficiently, enabling them to purchase integration software components as full-featured, supported products. And this is how the partnership came about.

CG:  What is Moviri providing to BMC?  What would you like customers to know?
Riccardo C: 
Moviri is extending BMC Capacity Optimization's visibility into IT infrastructure utilization and performance data, enabling customers to leverage their deployed monitoring solutions to feed the BCO data warehouse. The BCO framework already enables customers to build their own connectors to in-house platforms; with Moviri Integrations they can now get these connectors in addition to those available out of the box, with standard BMC product quality and level of support.




The current offering includes packages for IBM Tivoli Monitoring and HP Reporter. The integrations enable the transfer of historical performance and configuration data from monitored standalone OS instances in a robust and controlled manner. Enhancements to cover specific metrics for monitored virtualized platforms, such as IBM LPARs and Solaris Containers, are already on the roadmap.


We are also actively discussing with BMC how to further leverage Moviri's field experience to enlarge the offering, e.g., more integrations with other monitoring and system management platforms. So continue to follow our communication channels for news.

CG:  How complex is it to create a new connector (development through testing)?

Riccardo C:  There are a lot of factors influencing the effort to build a new connector. The first is the extent of the IT entities and metrics to be imported. Two examples: there is a considerable difference in the effort required to import just server network card traffic versus importing metrics from all the network devices (firewalls, routers, load balancers) recorded by a network monitoring tool; the same holds true for importing basic OS-level server utilization metrics rather than OS-specific and virtualization-aware metrics.

A second aspect is understanding the subject data and matching it to the BCO data model. In order to have meaningful capacity reports and models in BCO, data cannot simply be transferred with its original label from one tool to the other; appropriate matching and transformation need to be done at the data-flow level.

The third aspect is the means of integration: how, how often, and through which protocol the data is transferred.  Finally, there are general software development concerns, such as code reusability, maintainability, and extensibility, that need to be addressed.
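As a rough illustration of the second aspect, matching source labels to a target data model, here is a minimal sketch of a connector's transform stage. All names here (`Sample`, `METRIC_MAP`, the metric labels and unit conversions) are hypothetical examples, not actual BCO or Moviri interfaces:

```python
# Hypothetical sketch of a capacity-data connector's transform stage.
# Metric names, units, and the Sample type are invented for illustration.
from dataclasses import dataclass

@dataclass
class Sample:
    host: str
    metric: str   # metric label
    value: float

# Map source-tool labels onto the target data model's names,
# applying unit conversions where needed.
METRIC_MAP = {
    "CPU_Busy_Pct": ("CPU_UTIL", lambda v: v / 100.0),        # percent -> fraction
    "Mem_Used_MB":  ("MEM_USED", lambda v: v * 1024 * 1024),  # MB -> bytes
}

def transform(samples):
    """Drop unmapped metrics; relabel and rescale the rest."""
    out = []
    for s in samples:
        mapping = METRIC_MAP.get(s.metric)
        if mapping is None:
            continue  # metric not part of the target data model
        target_name, convert = mapping
        out.append(Sample(s.host, target_name, convert(s.value)))
    return out
```

The point of the sketch is Riccardo's: data is not just copied across, it is filtered, relabeled, and converted so that the capacity reports downstream are meaningful.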





CG:  You mentioned that Moviri connectors are supported.  Does that mean that you will enhance them in the future?

Riccardo C:  Yes, together with BMC we will continuously look for ways to add functionality that brings value to customers. The main objective is to fully cover the set of entities imported from the integrated data source, so that BCO can take full advantage of the integration. Scalability, robustness, and performance are other areas where we plan to improve regularly.  We will also keep a close eye on new versions of the integrated data sources in order to provide the necessary upgrades for the available connectors.





CG:  How can customers access/buy Moviri connectors?

Riccardo C: The Moviri integration for BCO product can be purchased by customers directly through BMC.  For more information contact your BMC sales representative, or visit Moviri Integration for BMC Capacity Optimization.



I’m a big fan of low tech.  I probably shouldn’t say that, given that I work for a high tech company.  Every time I see a new device with a stylus, I think: why buy a $100 pen that I am going to lose when the virtually free one in my hand does just fine?  That said, I’m also a big fan of leveraging technology in ways where it makes sense and where the benefits vastly outweigh the costs.  When I can leverage technology to make myself or my customers more efficient, effective, and thereby more productive, I’m the first one in line to check into it.


For Kalyan Kumar (aka KK), Worldwide Head of IT Consulting and Cross Tower Services at HCL Technologies, using the predictive analytics in BMC ProactiveNet Performance Management is an advanced technology that makes sense. It enables his team to manage a growing base of 100,000 servers, deployed globally across multiple customers, and deliver a higher quality of service at a lower cost.  The ability of predictive analytics to aggregate, analyze, and act on massive amounts of data allows HCL to do something that is simply not humanly possible without this technology: proactively identify and repair problems before things fail.  This is simply something that cannot be done with threshold-based monitoring tools - or with a pen.


In a BMC podcast KK discusses the reasons why HCL adopted predictive analytics as well as the business and customer value derived from it...


1.  Incident reduction - HCL staff can proactively detect application failures and reduce incident volumes by 60%.


2.  Integration - a single predictive analytics platform increases his staff's efficiency and effectiveness, providing a single, globally integrated source for performance, event, and service-impact management.


3.  Actionable intelligence - automated root cause analysis delivers reduced MTTR, lower TCO, and better, more consistent customer performance.
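The contrast KK draws between threshold-based monitoring and behavior learning can be shown with a toy sketch. This is not BMC's algorithm, just a minimal mean-and-standard-deviation baseline that illustrates why a learned model catches what a fixed threshold misses:

```python
# Toy contrast: static threshold vs. a simple learned baseline
# (mean +/- k standard deviations over recent history). Real predictive
# analytics products use far richer models; this only shows the idea.
from statistics import mean, stdev

def static_alert(value, threshold=90.0):
    """Classic threshold monitoring: alert only above a fixed line."""
    return value > threshold

def baseline_alert(history, value, k=3.0):
    """Behavior learning, crudely: alert when a value sits far outside
    the range learned from this server's own recent history."""
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) > k * sigma
```

A server that normally idles around 10% CPU and suddenly jumps to 60% never trips a 90% static threshold, but it sits far outside its learned baseline, so the behavior-learning check flags it before the hard limit is ever reached.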


You know, I keep hearing from industry analysts that 2012 will be the breakout year when behavior learning and predictive analytics go from the margins to the mainstream.  And I have to believe they are not too far off, given the increasing complexity and continued expansion of IT environments, both enterprise IT and cloud.


Learn more about predictive analytics with a podcast from an early adopter, HCL Technologies:


The Defense Information Systems Agency (DISA) is a US DoD Combat Support Agency responsible for providing continuous command and control capabilities and global enterprise infrastructure to joint warfighters, national-level leaders, and other mission and coalition partners.  For DISA, information is the greatest source of military power, and it is imperative that its customers have the information they need, when and wherever they need it.


DISA is responsible for ensuring its services are accessible while protecting the network - and the information on it - from adversaries.  DISA’s Field Security Operations (FSO) has the responsibility of ensuring the strength of those systems and networks by certifying and testing them against threats, using Security Readiness Reviews (SRRs).  SRRs employ extensive scripts to audit the configuration and security posture of systems.  Rapid identification of vulnerabilities, proper alerting and auditing, and fast, accurate remediation are critical capabilities that cannot be performed manually at the scale the environment demands.  DISA’s FSO needs automation tools that can quickly find potential vulnerabilities and remediate them across a global network.
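To make the idea of a scripted configuration audit concrete, here is a hedged sketch of the kind of check an SRR-style tool automates: compare a host's actual settings against a security baseline and report deviations. The rule names and settings are invented for illustration; they are not DISA STIG content or BladeLogic APIs.

```python
# Sketch of an automated configuration audit. The baseline rules below
# are invented examples, not real DISA requirements.
BASELINE = {
    "PasswordMinLength": lambda v: int(v) >= 14,   # strong password policy
    "SSHProtocol":       lambda v: v == "2",       # no legacy SSH protocol 1
    "TelnetEnabled":     lambda v: v == "no",      # cleartext logins forbidden
}

def audit(actual_config):
    """Return the sorted list of baseline rules the host fails.

    A missing setting counts as a finding, since an unverifiable
    control cannot be trusted."""
    findings = []
    for setting, check in BASELINE.items():
        value = actual_config.get(setting)
        if value is None or not check(value):
            findings.append(setting)
    return sorted(findings)
```

The value of automation is that this same check runs identically across thousands of hosts, producing an auditable list of deviations that a remediation job can then act on.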


The Access Security Configuration & Control (ASCC) project - DISA’s implementation of BMC BladeLogic - has recently been accredited as an FSO SRR tool, meaning that it provides the trust, auditability, accuracy, and reliability to meet DISA’s stringent requirements.  This puts BMC BladeLogic at the front lines of defense for securing the US military’s most critical network and information systems - and in turn, its warfighters and its nuclear command capabilities.


Regular cooks look for a recipe, check their cupboards and fridge, and then go shopping for the missing ingredients.  Home chefs either peruse their kitchen or walk the grocery store, looking for inspiration.  A great chef can create a new dish from leftovers and bits of this and that.  They simply see things in a new way.  Taking a second, and even a third, look at ingredients inspires them.


How much do you find yourself doing your job on auto-pilot, looking at the same things in the same way, using your tools exactly as you did 10 years ago?  But the challenges change, and so do your tools.  When was the last time you looked at what you do and how you do it with fresh eyes?  Start with your job – what has changed this year?  Have your company’s priorities or business focus changed?  Are you moving from a primarily storefront interface to the web? To mobile? Are you virtualizing or moving to a cloud?  Look back 10 years (or 5) and see if your job is different now. Then ask – are you still doing it the same way?


Next, look at the tools you use to do your job.  Are you on the current release? Have you read up on all the new features?  Do you even use a lot of what is available with the tool now?  Find at least one capability that you could use, but haven’t, and learn it.  Approach your old tool with fresh eyes; what else can it do for you?  Once upon a time, I got a notion that my modeling tool, now called BMC Capacity Management for Mainframes, could do more than just predict the future, as it is commonly used.  It could also tell me whether our various disaster recovery plans would actually work.  For some key scenarios, we were able to determine that they wouldn’t perform, even though, on the surface, they looked okay.  A new use for an old tool – what could be better?


Ask who else might benefit from the information generated by the tool.  Too often, we don’t see a way to share the information, or in some cases, don’t want to share it.  But it can be great to give the business a report that pairs their business transaction counts with their IT costs; they can then see how they are doing from a profitability standpoint.  And you can include response time and other metrics showing them the quality of your work.


Tomorrow, take a step back and review your data center “kitchen.”  What else can you make from what you already have?  How can you get more value out of those old ingredients?  You may be surprised at what you find.



Back when I was a performance analyst, I began getting some oddly technical requests from senior management, literally “out of the blue.”  In one famous case, someone with a fancy title, ESVP or some other interesting combination of letters, demanded that we immediately begin to do “parallel SYSPLEX.”  I had a notion that he didn’t actually know what parallel SYSPLEX meant, so I asked, “How much of it should we do?”  He replied, “We’re a bank. Let’s be conservative and do about 10% to start.”  I filed that request under a dump and ignored it.  But it got me wondering – where are these ideas coming from?  It was then I discovered a selection of management magazines, heralding the next new thing for IT and urging managers to get on board.  We were ordered to convert all VSAM files to DB2, to move from the mainframe to UNIX, to take perfectly good CICS systems and move them all to MRO, and more.


These “good ideas” could absorb an army of technicians without necessarily resulting in any business benefit.  It isn’t that any of them were necessarily bad, but you had to be reading more carefully to understand under what circumstances these ideas were warranted and the cost of making those choices.  At the same time, real world issues presented themselves, but to many, it seemed a career-limiting move to focus on those. 

Now, the buzz is cloud and again, it’s not that moving work to the cloud is bad.  But you have to ask first – What problem are you trying to solve and will this be the best way to solve it? The savvy technician – the one who wants to retain his job while still doing the right thing – will take the following steps:


  1. Read the magazines.  If you don’t know what your managers are reading, you won’t really understand what is behind the request.  Figure out what the “free lunch” is to them and whether or not your real world works that way.
  2. Develop the right list of questions.  No manager in the world likes being told by his senior technician that he is an idiot.  But if you ask powerful questions, you can work together to understand the real problem and then, derive a good solution.
  3. Understand the business. At the heart of it all is the value the business gets from IT versus the cost.  Powerful arguments, when needed, will always involve framing the issue in terms of business value. 
  4. Be a diplomat.  Diplomacy is the art of letting somebody else have your way.
  5. Be prepared to learn more.
  6. Use this as a solution-buying occasion.

Once an idea has been agreed upon, you will quickly find your new challenges require new tools.  If you have done the research and you are ready to implement the new direction, you should know before you start what tools you need so you can manage it.  If you wait, it is much harder to upgrade your toolset.  But in the beginning of a project, it can be readily folded into the cost of the project.  These journals can be your friend, or your enemy.  It’s your choice.  Don’t let a crisis go to waste.
