There's a new feature in BladeLogic 7.4 that can speed up file deployment, and it also offers a few capabilities that aren't available in a BLPackage. While pondering how to deploy files to thousands of servers, I came up with a method that's worth considering.
BladeLogic version 7.4.x quietly added a feature where you could
specify a URL for the source of a software deployment. So for
instance, you could reference an NFS mount on a UNIX box, or an SMB
share on a Windows box. You can also reference an nsh path.
According to pg 340 of the User's Guide, the option "Agent mounts
source for direct use at deployment (no local copy)"
instructs an agent to mount or map the device specified in the URL and
deploy the software package directly to the agent. When you select
this option, the agent uses the data transmission protocol specified
in the URL to access the specified source files. The software package
is not copied to a staging area on the agent, so no local copy of the
source file is created.
To use this option, the Source Location field (see step c) must
provide a URL that complies with BladeLogic's requirements for network
data transmission, including a data transmission protocol: either NFS
or SMB. See "URL Syntax for Network Data Transmission" for
detailed information about the required syntax.
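To make the URL requirement concrete, here's a minimal sketch in Python. The hostnames and paths are invented for illustration; the authoritative syntax lives in the guide's "URL Syntax for Network Data Transmission" section.

```python
# Sketch: only nfs:// and smb:// sources qualify for direct-use deployment.
# Hostnames and paths below are hypothetical examples, not from the docs.
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"nfs", "smb"}

def is_direct_use_source(url: str) -> bool:
    """Return True if the URL uses a protocol the agent can mount directly."""
    return urlparse(url).scheme in SUPPORTED_SCHEMES

print(is_direct_use_source("nfs://nfshost/export/patches/10_Recommended.zip"))  # True
print(is_direct_use_source("smb://winhost/patches/10_Recommended.zip"))         # True
print(is_direct_use_source("https://webhost/patches/10_Recommended.zip"))       # False
```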
To illustrate how significant this is, let's discuss a typical use for
a BladeLogic software deployment package.
Let's say you have ten thousand servers running Sun Solaris 10, and
you need to deploy Sun's latest patch cluster. That's 458.4
megabytes. Typically you would load the patch cluster into the
BladeLogic depot, then deploy to the targets. Theoretically, a
BladeLogic server with a gigabit connection could deploy 458 megabytes
to a target in 7.33 seconds (the data effectively crosses the wire
twice, fileserver to app server and app server to agent, so 2 x 458.4
MB at 125 MB/s). In the real world, this is rarely the case. Jobs of
this size have been known to take three minutes, even on a fast
network and server. With 10,000 targets, it would take the better
part of a month to deploy the patch cluster to all targets. No one is
going to spend three weeks deploying patches 24x7, so there has to be
a better way.
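As a quick sanity check, here's the arithmetic behind those two numbers, taking gigabit as 125 MB/s and the factor of two from the fileserver-to-app-server-to-agent path:

```python
# Back-of-the-envelope numbers for the sequential deploy scenario.
PATCH_MB = 458.4              # Solaris 10 patch cluster size
GIGABIT_MB_PER_S = 125.0      # 1 Gb/s expressed in megabytes per second
TARGETS = 10_000
REAL_MINUTES_PER_TARGET = 3   # observed real-world time per target

theoretical_seconds = 2 * PATCH_MB / GIGABIT_MB_PER_S   # data moves twice
total_days = TARGETS * REAL_MINUTES_PER_TARGET / 60 / 24

print(f"{theoretical_seconds:.2f} seconds per target")  # 7.33
print(f"{total_days:.1f} days for all targets")         # 20.8
```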
For ages we've had repeaters to speed up the deploy process.
Repeaters are a great solution, and they're easy to implement. But a
repeater is three times slower than the method I'm about to describe.
The reason is that our file is copied three times when a repeater is
used: once from the fileserver to the app server, once from the app
server to the repeater, and once from the repeater to the target.
Here's the fastest method to deploy files to numerous hosts with
BladeLogic. Create a software deploy job, and for the installable
source, specify an NFS mount, selecting the option "Agent mounts
source for direct use at deployment (no local copy)". According to
the 7.4.x docs, this sidesteps use of the staging area; this fact
alone will double the throughput of a deploy job with a single
target.
Reducing the time of our deploy from 20.8 days to 10.4 days is a huge
improvement. But how can we improve it further? For that trick, we
call upon the property dictionary.
Here's how to do it:
1. Using the property dictionary, create a server property that will
hold each target's patch source.
2. Parameterize your software deploy job to reference that property
for the name of the server where the patch cluster will be copied
from.
3. Copy the patch cluster to a server that can serve up the file via NFS.
4. Last but not least, pick 21 servers that need to be patched, and
set their server property to reference the NFS URL from step 3 (see
pg 345 of the user's guide for details).
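Conceptually, steps 1 and 2 amount to templating the source URL per target. Here's a sketch of that idea; the property name PATCH_SOURCE, the hostnames, and the path are all hypothetical stand-ins for whatever you define in the property dictionary:

```python
# Sketch of per-server source parameterization. PATCH_SOURCE is a made-up
# property name; BladeLogic's real substitution happens inside the deploy job.
URL_TEMPLATE = "nfs://{PATCH_SOURCE}/export/patches/10_Recommended.zip"

# Each target server carries its own value for the property.
server_properties = {
    "sol101": {"PATCH_SOURCE": "nfs-server-a"},
    "sol102": {"PATCH_SOURCE": "nfs-server-b"},
}

def resolve_source(props: dict) -> str:
    """Expand the job's source URL using one server's property values."""
    return URL_TEMPLATE.format(**props)

for server, props in server_properties.items():
    print(server, "->", resolve_source(props))
```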
With 21 targets, it should be possible to deploy the file to all
targets in about 31.5 minutes. (This takes the original 3 minute
deploy time, halves it because there's only one copy instead of two,
and multiplies it by the 21 targets.)
At this point, you're thinking "who cares?" Deploying to 21 servers
is no big deal. The trick is to have those 21 servers serve the file
up to the next 21, and the next 21 after that. Just like a pyramid
scheme!
By using this scheme, deploy time could be cut down from 21 days to
roughly 90 minutes.
By creating a "software deployment pyramid", we're leveraging the
network and storage bandwidth of every server we deploy the patch to.
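The pyramid math can be sketched directly. Assuming every server that already holds the file serves 21 fresh targets per round, and each round takes 1.5 minutes per copy times 21 sequential copies, about 31.5 minutes:

```python
# Sketch of the "deployment pyramid": every server that has the file
# serves 21 new targets in each round.
FANOUT = 21
ROUND_MINUTES = 1.5 * FANOUT   # 31.5 minutes per round
TOTAL_TARGETS = 10_000

holders = 1   # the initial NFS source
rounds = 0
while holders < TOTAL_TARGETS:
    holders += holders * FANOUT   # every holder feeds 21 new targets
    rounds += 1

print(rounds, "rounds,", rounds * ROUND_MINUTES, "minutes")  # 3 rounds, 94.5 minutes
```

Three rounds at roughly half an hour each is where the roughly-90-minute figure comes from, versus about three weeks for the sequential approach.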
If anyone else can tell me another way to reduce file deployment time
by 99.7%, I'm all ears. This method is ideal for BladeLogic, because
traditional patching methods don't give you access to all 10,000
servers at once. BladeLogic also scales better than traditional
methods, because we support the use of multiple app servers. There's
nothing to stop you from using this method with 20 app servers and ten
thousand managed hosts. The pyramid scheme also reduces load on the
BladeLogic depot, the storage it lives on, and the network interface
of both the app server and depot.
I've intentionally ignored a number of factors, to keep the discussion
simple. I haven't factored in the bandwidth of the network. If you
do the math, you'll see that copying a 500 megabyte file in 90
seconds takes roughly 44 megabits per second, under five percent of a
gigabit link.
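For the record, here's that per-copy bandwidth calculation, treating the file as a flat 500 MB:

```python
# Bandwidth needed for one 90-second copy of the patch cluster.
FILE_MB = 500
SECONDS = 90

megabits_per_second = FILE_MB * 8 / SECONDS
print(f"{megabits_per_second:.1f} Mb/s")  # 44.4 Mb/s, a small slice of gigabit
```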
I've ignored the bottleneck of the deploy job itself, but that's easy
to fix. Just use lots of app servers. BladeLogic performance
improves dramatically with multiple app servers, and ANYONE with
10,000 managed servers should be using at least twenty of them.
There are other ways to deploy very large files, such as repeaters,
but they're not as fast. Another drawback with the use of repeaters
is that it's all-or-nothing. The method I've outlined here could be
used when you need to patch or deploy large files, without resorting
to re-configuring your BladeLogic infrastructure to use repeaters.
The bottom line is that this method of file deployment is dramatically
faster than existing methods, and offers a level of BladeLogic
performance which was impossible prior to 7.4.x.
The best description I've seen of the traditional file deployment
process was posted by Sean Daley in the internal forums. According to
what he posted:
"In the past we've noticed that our performance can lag behind normal
scp performance quite a bit (.. edited for brevity...). There are a
couple of things to look at / note.
1) When deploying 1 GB worth of files you're actually copying 2 GB
worth of files. The process of copying a file from one agent to
another agent does not make the data go directly from agent A to agent
B. The data first has to pass through the client initiating the copy
before heading to agent B. So when you're deploying 1 GB of files, the
appserver has to copy these files from the fileserver to the target
agent. This means 1 GB of data passes from the fileserver to the
appserver and then from the appserver to the agent (so effectively 2
GB of data). Even 2 GB of data with those numbers seems weird
2) Are you doing a direct or an indirect deploy (using a repeater)
when you do this?
If you're doing an indirect deploy, are you using the maximum cache
size feature on the repeater? If so don't. That feature performs
horribly. (.. edited ..)
3) Going back to #1 again, because the data gets copied between two
machines, you'll need to make sure that the network to both servers is
not messed up. For example, if a switch port is misconfigured on the
file server this could have catastrophic consequences for
performance."
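Sean's first point is easy to quantify: because the payload passes through the app server, the effective rate an agent sees is at best half the wire rate. A quick sketch:

```python
# The file crosses the network twice: fileserver -> appserver -> agent.
PAYLOAD_MB = 1024.0    # "1 GB worth of files"
LINK_MB_PER_S = 125.0  # gigabit wire speed in MB/s

wire_mb_moved = 2 * PAYLOAD_MB              # total bytes on the wire
transfer_seconds = wire_mb_moved / LINK_MB_PER_S
effective_mb_per_s = PAYLOAD_MB / transfer_seconds

print(effective_mb_per_s)  # 62.5 -- half the 125 MB/s wire rate
```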