1 Reply Latest reply on Feb 26, 2009 9:12 PM by Bill Robinson

    Deploying Thousands of Files

    John Van Ommen

      There's a new feature in BladeLogic 7.4 that can speed up file deployment and offers a few capabilities that aren't available in a BLPackage. While pondering how to deploy files to thousands of servers, I came up with a method that's worth consideration.

      BladeLogic version 7.4.x quietly added a feature where you could
      specify a URL for the source of a software deployment. So for
      instance, you could reference an NFS mount on a UNIX box, or an SMB
      share on a Windows box. You can also reference an nsh path.

      According to pg 340 of the User's Guide,

      Agent mounts source for direct use at deployment (no local copy)—The
      Deploy Job instructs an agent to mount or map the device specified in
      the URL and deploy the software package directly to the agent. When you
      select this option, the agent uses the data transmission protocol
      specified in the URL to access the specified source files. The software
      package is not copied to a staging area on the agent, so no local copy
      of the source file is created.

      To use this option, the Source Location field (see step c) must provide
      a URL that complies with BladeLogic's requirements for network data
      transmission, including a data transmission protocol: either NFS or SMB.
      See URL Syntax for Network Data Transmission for detailed information
      about the required syntax.
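
      I haven't reproduced the syntax section here, but the general shape is
      protocol://host/path. The examples below are placeholders I've made up
      for illustration (real URLs may need extra options or credentials, so
      check the syntax section before copying them):

          nfs://nfshost01/export/patches/10_Recommended.zip
          smb://winhost01/patches/10_Recommended.zip
          //nshproxy01/opt/patches/10_Recommended.zip     (a plain nsh path)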

      To illustrate how significant this is, let's discuss a typical use for
      a BladeLogic software deployment package.

      Let's say you have ten thousand servers running Sun Solaris 10, and
      you need to deploy Sun's latest patch cluster. That's 458.4
      megabytes. Typically you would load the patch cluster into the
      BladeLogic depot, then deploy to the targets. Theoretically, a
      BladeLogic server with a gigabit connection could deploy 458 megabytes
      to a target in 7.33 seconds*. In the real world, this is rarely the
      case. Deploys of this size have been known to take three minutes per target, even on
      a fast network and server. With 10,000 targets, it would take the
      better part of a month to deploy the patch cluster to all targets. No
      one is going to spend three weeks deploying patches 24x7, so there has
      to be a better way.
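
      For anyone who wants to check my math, here's the back-of-the-envelope
      arithmetic as a quick Python sketch (the doubling in the theoretical
      figure reflects the payload passing through the app server, as described
      in the footnote at the bottom of this post):

          # Baseline: traditional depot-based deploy of the Solaris patch cluster.
          size_mb = 458.4          # patch cluster size
          gigabit_mbps = 1000.0    # line rate of a gigabit link

          # The payload is copied twice: file server -> app server -> agent.
          theoretical_s = size_mb * 2 * 8 / gigabit_mbps       # ~7.33 seconds

          # Observed real-world time per target, serialized across all targets.
          per_target_min = 3.0
          targets = 10_000
          total_days = per_target_min * targets / 60 / 24      # ~20.8 days

          print(f"{theoretical_s:.2f} s theoretical, {total_days:.1f} days for {targets} targets")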

      For ages we've had repeaters to speed up the deploy process.
      Repeaters are a great solution, and they're easy to implement. But a
      repeater is three times slower than the method I'm about to describe.
      The reason is that our file is copied three times when a repeater is
      used: from the file server to the app server, from the app server to
      the repeater, and from the repeater to the target.
      Here's the fastest method to deploy files to numerous hosts with
      BladeLogic. Create a software deploy job, and for the installable
      source, specify an NFS mount.
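
      The Source Location ends up looking something like this (the host name
      and path are placeholders I've made up for illustration):

          nfs://patchhost01/export/patches/10_Recommended.zip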


      Note that I've selected the option "Agent mounts source for direct use at
      deployment (no local copy)". According to the 7.4.x docs, this
      sidesteps the staging area; that fact alone will double the
      throughput of a deploy job with a single target.

      Reducing the time of our deploy from 20.8 days to 10.4 days is a huge
      improvement. But how can we improve it further? For that trick, we
      call upon the property dictionary.

      Here's how to do it:

      1. Using the property dictionary, create a server property to hold the
      name of the NFS host each target should pull from (for illustration,
      I'll call it NFS_SOURCE).

      2. Parameterize your software deploy job so the source location
      references that property instead of a hard-coded server name (see the
      sketch after this list).

      3. Copy the patch cluster to a server that can serve up the file via NFS.

      4. Last but not least, pick 21 servers that need to be patched, and
      set their server property to reference the NFS URL from step 3. (see
      pg 345 of the user's guide for details)
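
      Here's roughly what the parameterized pieces look like. NFS_SOURCE is
      just the placeholder name from step 1, and ??TARGET.NFS_SOURCE?? uses
      BladeLogic's property-reference syntax for target server properties
      (verify the exact notation against your release):

          Per-server property value:   NFS_SOURCE = patchhost01
          Source Location in the job:  nfs://??TARGET.NFS_SOURCE??/export/patches/10_Recommended.zip

      The idea is that when the job runs against a given target, the property
      reference resolves to that target's own NFS_SOURCE value, so different
      groups of servers can pull the file from different NFS hosts without
      cloning or editing the job.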

      With 21 targets, it should be possible to deploy the file to all
      targets in about 31.5 minutes. (This takes the original 3 minute deploy
      time, halves it because there's only one copy instead of two, and
      multiplies by 21.)

      At this point, you're thinking "who cares?" Deploying to 21 servers is no
      big deal. The trick is to have those 21 servers each serve the file up
      to the next 21, and those to the next 21 after that. Just like a
      pyramid scheme!


      By using this scheme, deploy time could be cut down from 21 days to 90
      minutes (!!!)
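
      To see why the numbers work out, here's a quick sketch of the wave
      arithmetic in Python (the fan-out of 21 and the roughly 31 minutes per
      wave are the figures from above; everything else is just bookkeeping):

          # Each freshly patched server becomes an NFS source for the next wave.
          fanout = 21           # targets each source serves per wave
          wave_minutes = 31.5   # ~1.5 min per copy x 21 serial copies
          targets = 10_000

          done, sources, waves = 0, 1, 0
          while done < targets:
              newly_patched = sources * fanout
              done += newly_patched
              sources += newly_patched   # patched servers join the pool of sources
              waves += 1

          print(waves, waves * wave_minutes)   # 3 waves, roughly 90-95 minutes

      Three waves of 21-way fan-out cover 21 servers, then a few hundred, then
      well over 9,000, which is why the wall-clock time collapses from weeks to
      about an hour and a half.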

      By creating a "software deployment pyramid", we're leveraging the
      network and storage bandwidth of every server we deploy the patch to.
      If anyone else can tell me another way to reduce file deployment time
      by 99.7%, I'm all ears. This method is ideal for BladeLogic, because
      traditional patching methods don't give you access to all 10,000
      servers at once. BladeLogic also scales better than traditional
      methods, because we support the use of multiple app servers. There's
      nothing to stop you from using this method with 20 app servers and ten
      thousand managed hosts. The pyramid scheme also reduces load on the
      BladeLogic depot, the storage it lives on, and the network interface
      of both the app server and depot.

      I've intentionally ignored a number of factors, to keep the discussion
      simple. I haven't factored in the bandwidth of the network. If you
      do the math, you'll see that we don't need much bandwidth to copy a
      500-megabyte file in 90 seconds. In fact, it's less than a 56K modem
      (46 kilobits per second, to be exact).

      I've ignored the bottleneck of the deploy job itself, but that's easy
      to fix. Just use lots of app servers. BladeLogic performance
      improves dramatically with multiple app servers, and ANYONE with
      10,000 managed servers should be using at least twenty of them.

      There are other ways to deploy very large files, such as repeaters,
      but they're not as fast. Another drawback with the use of repeaters
      is that it's all-or-nothing. The method I've outlined here could be
      used when you need to patch or deploy large files, without
      resorting to re-configuring your BladeLogic infrastructure to use
      repeaters.
      The bottom line is that this method of file deployment is dramatically
      faster than existing methods, and offers a level of BladeLogic
      performance which was impossible prior to 7.4.x.

      * The best description I've seen of the traditional file deployment
      process was posted by Sean Daley in the internal forums. According to
      what he posted:
      "In the past we've noticed that our performance can lag behind normal
      scp performance quite a bit (.. edited for brevity...). There are a
      couple of things to look at / note.

      1) When deploying 1 GB worth of files you're actually copying 2 GB
      worth of files. The process of copying a file from one agent to
      another agent does not make the data go directly from agent A to agent
      B. The data first has to pass through the client initiating the copy
      before heading to agent B. So when you're deploying 1 GB of files, the
      appserver has to copy these files from the fileserver to the target
      agent. This means 1 GB of data passes from the fileserver to the
      appserver and then from the appserver to the agent (so effectively 2
      GB of data). Even 2 GB of data with those numbers seems weird.

      2) Are you doing a direct or an indirect deploy (using a repeater)
      when you do this?
      If you're doing an indirect deploy, are you using the maximum cache
      size feature on the repeater? If so, don't. That feature performs
      horribly. (.. edited ..)

      3) Going back to #1 again, because the data gets copied between two
      machines, you'll need to make sure that the network to both servers is
      not messed up. For example, if a switch port is misconfigured on the
      file server this could have catastrophic consequences for performance.

        • 1. Re: Deploying Thousands of Files
          Bill Robinson

          if/when we move this content to the bmcdn we should get you a blog :)


          a note on this:


          the 'agent mount at staging' option has one important limitation - if you want to use SMB, your appserver and target need to be Windows. if you want to use NFS, the appserver and target need to be UNIX. i've raised this issue, as i don't see why it matters what OS the appserver is.