we also make "heavy" use of NSH-Proxies due to the usage of Automation Principals.
We also noticed a performance decrease, though not by a factor of 7.
How is your application server configured from a deployment perspective?
Is your JOB_SERVER also the NSH_PROXY?
Do you use a load balancer in your environment?
The reason I'm asking about the LB: if your NSH proxy service URL is set to an LB address, all of your NSH traffic is routed through the LB, which can also be a huge performance bottleneck.
Just out of curiosity, what is BMC trying to fix on that matter for 8.6 ?
There's no workaround - there's overhead for NSH to open and use the nsh -> nsh proxy connection, and that's where the performance hit is happening. You can of course try to run more in parallel, and you can try to reduce the number of connections made to the proxy in your scripts.
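To illustrate the second suggestion: batching several commands into a single invocation pays the connection-setup cost once instead of once per command. The sketch below demonstrates the pattern with plain sh so it runs anywhere; the host name webserver01 and the commands in the comments are made-up examples, not from this thread.

```shell
# Three separate invocations would each pay the connection-setup cost:
#   nexec webserver01 uptime      # (hypothetical host; one connection)
#   nexec webserver01 df -k /     # (a second connection)
#   nexec webserver01 uname -r    # (a third connection)
# Batching them into one invocation pays the cost once:
#   nexec webserver01 sh -c 'uptime; df -k /; uname -r'

# The same batching pattern with plain sh, runnable locally:
out=$(sh -c 'echo one; echo two; echo three')
echo "$out"
```

Whether this helps in your environment depends on how many of your per-command connections can actually be folded together this way.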
This issue has been known since BSA 7.6 and was never really fixed. The issue, as Bill stated, is how NSH initiates the NSH Proxy connections: it does so for every NSH command in a script, instead of once for the whole script. So if, for example, your script has 60 commands and each adds 1 second of delay for NSH Proxy connection initiation, that's a whole 60 seconds added to the execution of the script. This adds up pretty fast when you have large scripts.
We are also waiting for an improvement here: we retested this behavior recently and it's still an issue even in 8.3. On average, it takes about 1 second extra per NSH command, whether we use SOCKS proxies or not.
We also use NSH Proxy for Automation principals.
We have 3 app servers: two are configured with the "ALL" server role, and the other one has just the "JOB" server role. NSH Proxy is NOT configured on the JOB role server.
We also have a load balancer, and the proxy service URL is set to the LB address.
Why can that produce a performance bottleneck?
Support told us that an RFE was created for this problem, described as "something like NSH performance commands which would enhance the performance of NSH Jobs". Anyway, it is supposed to be implemented in the next BSA version, 8.6.
Bill, how can I reduce the number of proxy connections in our scripts? Is there any configuration option to limit the number of proxy connections? If so, how would that affect BSA functionality?
Thank you to everyone.
I'm the one who opened that RFE, but as far as I know, it hasn't been implemented yet. As for your question (Bill, correct me if I'm wrong), there's no way to limit the number of NSH Proxy connections used in a script besides using fewer NSH commands. Each NSH command will initiate a connection to the NSH Proxy if the app server's secure file is configured to go through it.
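Since each command can mean one more proxy connection, a rough audit of a script can gauge the per-run overhead. The sketch below is a made-up example (the sample script, its path, and the list of command names are all illustrative): it counts lines starting with commands that would typically go through the proxy.

```shell
# Create a throwaway sample script (contents are made-up examples):
cat > /tmp/sample.nsh <<'EOF'
nexec host1 uptime
cp //host1/etc/hosts /tmp/hosts.bak
nexec host1 df -k
echo "done"
EOF

# Count lines starting with commands likely to open a proxy connection;
# the command list here is illustrative, not exhaustive.
count=$(grep -c -E '^(nexec|cp|mv|ls|cat) ' /tmp/sample.nsh)
echo "approx NSH Proxy connections per run: $count"
```

At roughly 1 second of setup per connection (the figure quoted in this thread), that count translates directly into extra seconds per target.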
One thing you can check is optimizing the use of pipes. For example, instead of doing this:
VARNAME=$(echo $1 | tr -d "\n")
do this:
VARNAME=$(tr -d "\n" <<< "$1")
That way you get rid of the echo. Basically, look for any pipes you can remove, since each command in a pipe spawns a new NSH Proxy connection.
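A couple more rewrites in the same spirit, sketched in plain shell (the variable values and the demo file are just illustrative); each one removes a process from a pipeline:

```shell
# Illustrative value; in a real script this might come from $1 or an
# NSH command's output.
fqdn="web01.example.com"

# Before: echo piped into cut spawns two commands
short=$(echo "$fqdn" | cut -d. -f1)
# After: shell parameter expansion, no external command at all
short=${fqdn%%.*}

# Before: cat piped into wc spawns two commands
printf 'a\nb\nc\n' > /tmp/pipes_demo.txt   # throwaway sample file
lines=$(cat /tmp/pipes_demo.txt | wc -l)
# After: redirect the file straight into wc, one command
lines=$(wc -l < /tmp/pipes_demo.txt)

echo "$short: $lines lines"
```

The parameter-expansion trick is the biggest win, since it removes the external command entirely rather than just shortening the pipe.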
As far as I know, the BSA jobs and features affected by this are any that use NSH:
1. NSH Script Jobs (type 1, type 2, and I believe type 4, not sure about type 3 since it's basically like scriptutil).
2. File Deploy Jobs: They use NSH in the background to copy the file.
3. Package Deploy Jobs: They use NSH during the staging phase as far as I know.
4. Extended Objects using centralized NSH commands (not remote)
When your proxy service URL is set to the LB address, all your NSH traffic also goes through the LB, which slowed things down massively for us.
We decided to use the LB only to balance the authentication service. Once you are authenticated, your session_credentials contain the app service URL and the NSH proxy service URL of a "real" app server, not the LB.
The downside of this approach is that when the app server you are connected to goes down, you have to re-connect in order to keep working.
Can you share the RFE number so I can follow its implementation? We have the same performance issue.
Our workaround is not to configure the NSH client on the application server to use the NSH Proxy.
All NSH scripts are stored in the console (NSH Script Jobs). We no longer run NSH scripts directly from an NSH shell.
For users who want to execute NSH scripts directly without creating them in the BladeLogic console, we have set up a Linux box where only the NSH client is configured to pass through the NSH Proxy.
Thank you everyone for your replies.
I'll study all of this and see what the best option is for us.
The RFE support gave us is : QM001778941
The problem is that BMC doesn't consider the low NSH performance to be a defect.
We had the same problem when we tried to escalate it. It's considered a design flaw, but not a defect per se. Since it would require core changes, it is not something easy to address. Now that multiple customers have noticed the issue, however, the priority may change. At first, we were the only ones who reported it.
Would setting up a dedicated NSH Proxy server be helpful? Route all app server traffic through that URL (not the LB name). Also, change all blcli commands in scripts to blcli_execute; that definitely speeds up the job.
It probably won't. The issue is with the nsh -> nsh proxy connection itself. We made some changes to the crypto stack for 8.5.01 Patch 5 that may improve things here, as some of the ciphers used slowed things down.
Until the root of the issue is fixed - which is the fact that a new NSH Proxy connection is opened for each new NSH command performed in the script - nothing will make much of a difference unfortunately.
It's not a matter of resources; it's that the overhead scales with the number of NSH commands. The more NSH commands you have in a script, the more extra time it takes to run when the application server goes through the NSH Proxy. In other words, if it takes around 1 second extra to execute an NSH command through an NSH Proxy, and you have 60 NSH commands in your script, that's a whole minute extra to run the script.
This may seem small, but when you're running a multi-hundred-line script against thousands of targets (with 50-100 in parallel), it adds up to a lot.
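The scale effect described above can be sketched as simple arithmetic. The 1-second figure and the job sizes below are the assumptions quoted in this thread, not measurements, and the helper function is just an illustration:

```shell
# Extra wall-clock time ~= delay_per_cmd * commands * targets / parallelism
# (targets run in parallel batches, so divide by the batch size)
extra_seconds() {
  # $1 = delay per command (s), $2 = commands per script,
  # $3 = total targets, $4 = targets run in parallel
  echo $(( $1 * $2 * $3 / $4 ))
}

# 1 s per command, 60 commands, 2000 targets, 100 in parallel:
extra_seconds 1 60 2000 100    # -> 1200 s, i.e. 20 extra minutes
```

So even with healthy parallelism, the per-command connection overhead alone can add tens of minutes to a large job run.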
We had to disable the NSH Proxy on our app servers because of this. We keep only one app server that goes through it, and use job routing to exclude it from normal runs. If we need to use Automation Principals (which only work when going through an NSH Proxy), we force that job to go through that server.