12 Replies Latest reply on May 5, 2016 11:00 AM by Bill Robinson

    Error running NSH job against one of my app servers

    Neal Meagher

      I have an NSH script that runs on all my app servers but fails on one.

       

       

      This is the job log:

       

      Participant    Step    Attempt  Date                 Type     Message  Item Id  Item Name
      run level log                   04/15/2016 00:26:19  Info     Started running the job 'csv2xml_RHEL5' with priority 'NORMAL' on application server 'ulvblgp03.fg.rbc.com'(9)
      run level log                   04/15/2016 00:26:20  Warning  no repeater is set for server bsaloopback. Deploy to the target directly.
      run level log                   04/15/2016 00:26:51  Error    The job 'csv2xml_RHEL5' has failed
      bsaloopback    Commit  1        04/15/2016 00:26:49  Error    Could not open configuration file /var/tmp/stage/edf34bbd81b93748804feae7a311a8a2/7790f69ad5d23546899bb13d1586e4ff.cfg for writing (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Error    Configuration initialization on parse command line failed (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Error    Could not open configuration file /var/tmp/stage/edf34bbd81b93748804feae7a311a8a2/7790f69ad5d23546899bb13d1586e4ff.cfg for writing (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Error    Could not open configuration file /var/tmp/stage/edf34bbd81b93748804feae7a311a8a2/7790f69ad5d23546899bb13d1586e4ff.cfg for writing (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Error    Could not open configuration file /var/tmp/stage/edf34bbd81b93748804feae7a311a8a2/7790f69ad5d23546899bb13d1586e4ff.cfg for writing (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Warning  Deploy failed. Cleaning up staging area. (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Warning  Deploy failed. Cleaning up staging area. (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Warning  Unable to delete; Folder: /opt/bmc/bladelogic/NSH/Transactions/7790f69ad5d23546899bb13d1586e4ff does not exist (Time in agent's deploy log:: 04/15/2016 00:26:40)
      bsaloopback    Commit  1        04/15/2016 00:26:49  Error    APPLY failed for server bsaloopback. Exit code = -4001

       

      This is from app server log:

       

      [15 Apr 2016 00:26:39,504] [WorkItem-Thread-5] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11149: scriptutil -d "/var/tmp/stage" -h "usvw35d1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.3.1.1.sh

      [15 Apr 2016 00:26:39,529] [WorkItem-Thread-76] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11157: scriptutil -d "/var/tmp/stage" -h "usvexad1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.3.1.1.sh

      [15 Apr 2016 00:26:41,782] [WorkItem-Thread-5] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11165: nexec usvw35d1.devfg.rbc.com "ps -aef | egrep [i]netd"

      [15 Apr 2016 00:26:41,786] [WorkItem-Thread-76] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11167: nexec usvexad1.devfg.rbc.com "ps -aef | egrep [i]netd"

      [15 Apr 2016 00:26:44,035] [WorkItem-Thread-5] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11186: scriptutil -d "/var/tmp/stage" -h "usvw35d1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.check_service.sh -x smtp:sendmail

      [15 Apr 2016 00:26:44,062] [WorkItem-Thread-76] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11194: scriptutil -d "/var/tmp/stage" -h "usvexad1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.check_service.sh -x smtp:sendmail

      [15 Apr 2016 00:26:47,471] [WorkItem-Thread-72] [ERROR] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Connection timed out;//uacctid21.devfg.rbc.com/etc/inetd.conf

      [15 Apr 2016 00:26:47,651] [WorkItem-Thread-5] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11205: scriptutil -d "/var/tmp/stage" -h "usvw35d1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.3.5.15.sh

      [15 Apr 2016 00:26:47,674] [WorkItem-Thread-76] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11213: scriptutil -d "/var/tmp/stage" -h "usvexad1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.3.5.15.sh

      [15 Apr 2016 00:26:49,915] [WorkItem-Thread-5] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11222: scriptutil -d "/var/tmp/stage" -h "usvw35d1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.4.1.1.3.sh

      [15 Apr 2016 00:26:49,947] [WorkItem-Thread-76] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Compliance] Started pid 11230: scriptutil -d "/var/tmp/stage" -h "usvexad1.devfg.rbc.com" -s /opt/bmc/fileserver/extended_objects/tsssolaris.4.1.1.3.sh

      [15 Apr 2016 00:26:51,538] [Job-Execution-4] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Deploy] Phase: Commit completed with errors for target: bsaloopback

      [15 Apr 2016 00:26:51,542] [Job-Execution-4] [ERROR] [erausche@oak.fg.rbc.com:SubAdmin:] [Deploy] The job 'csv2xml_RHEL5' has failed

      [15 Apr 2016 00:26:51,643] [Job-Execution-2] [ERROR] [erausche@oak.fg.rbc.com:SubAdmin:] [Batch] Member job csv2xml_RHEL5 failed

      [15 Apr 2016 00:26:51,703] [Job-Execution-2] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [Batch] Started running member job Copy_XML_RHEL5

      [15 Apr 2016 00:26:51,718] [Job-Execution-1] [INFO] [erausche@oak.fg.rbc.com:SubAdmin:] [FileDeploy] Started running the job 'Copy_XML_RHEL5' with priority 'NORMAL' on application

       

        • 1. Re: Error running NSH job against one of my app servers
          Bill Robinson

          this is a nsh job or a deploy job?  because it looks like a deploy job.

          • 3. Re: Error running NSH job against one of my app servers
            Bill Robinson

            ok, then fix the title :)?

             

             

            i'm not sure what the point of the appserver log snippet was - there's nothing related to the job in question there that's useful.

             

             

            can you export the deploy job run log from the gui and also get the related bldeploy log from the target and attach ?

            • 4. Re: Error running NSH job against one of my app servers
              Neal Meagher

              The failure happens in a batch file. Step 1 gathers information, step 2 runs compliance, and step 3 writes the results to a csv. It then runs the Python script, which converts the csv to xml. That is where it fails, and only on app 3. We checked the Python libraries on the server and those are fine. We ran the same Python script outside the job on the command line and it works. We are having a hard time creating a new deploy job outside the batch and binding it to app3. We tried a job execution rule; the job starts on app03 and then hands off to another app server. This happens when there is little going on in the system. Is there a job sub-property I can create to bind the job to an app server?

              • 5. Re: Error running NSH job against one of my app servers
                Bill Robinson

                so there's still a bit that's unclear.  so you have:

                 

                a batch job that contains multiple jobs?

                - member job is a compliance job.  is this running remediation ? or what ?

                - member job is a nsh script job ? (what is writing the csv)

                - member job that runs your python script to convert the csv to xml (you know there's a nsh utility for this, right?  csv2xml)

                - ...

                 

                which is the deploy job ?

                 

                if you are running a deploy job, then what matters is the target of the job, not what appserver runs it.  everything in the deployed package runs on the target.  nothing from the appserver running the job is involved.

                 

                so why does the job routing rule matter here?

                 

                what does 'we are having a hard time creating a new deploy job outside the batch and binding it to app3' mean ?  you know that member job runs of a batch run will follow the routing of the batch, right ?

                 

                your python script is part of a nsh job or a deploy job/blpackage ?

                • 6. Re: Error running NSH job against one of my app servers
                  Neal Meagher

                  The csv2xml script is being run as a bldeploy and is part of a batch job. It now fails intermittently on different app servers. I am getting the error below. It seems to be resource related.

                   

                  Typical Run Results for Job:

                  Info        04/20/2016 18:18:36        The job 'csv2xml_RHEL7->csv2xml_RHEL7' has succeeded on server bsaloopback                               

                  Info        04/20/2016 18:18:29        Package "Run csv2xml" UUID(5e48e6e0e1e23478a673c6bb792f33ff) completed. exitCode = 0 (Apply successful)                

                  Info        04/20/2016 18:18:26        Package "Run csv2xml" UUID(5e48e6e0e1e23478a673c6bb792f33ff) processing instructions                       

                  Info        04/20/2016 18:18:26        Package "Run csv2xml" UUID(5e48e6e0e1e23478a673c6bb792f33ff) initialized, entering wait queue for processing                        

                  Info        04/20/2016 18:18:21        Package "Run csv2xml" UUID(5e48e6e0e1e23478a673c6bb792f33ff) started                       

                  Info        04/20/2016 18:18:14        Started running the deploy step job 'csv2xml_RHEL7->csv2xml_RHEL7' on application server 'ulvblgp08.fg.rbc.com'(11) against target server 'bsaloopback'                      

                  Info        04/20/2016 18:18:14        Deploy Apply Job (Pre-Execute):csv2xml_RHEL7; Server:bsaloopback; PkgID:"9544ca39-b4b3-4f5a-8c45-f47effd19647-5629.12"; UUID:5e48e6e0e1e23478a673c6bb792f33ff                    

                   

                   

                  Info        04/21/2016 18:18:28        The job 'csv2xml_RHEL7->csv2xml_RHEL7' has failed on server bsaloopback                       

                  Error      04/21/2016 18:18:28        APPLY failed for server bsaloopback. Exit code = -4001                  

                  Info        04/21/2016 18:18:21        Package "Run csv2xml" UUID(72828ce04e7d345eb0c6652c6e2cb70a) completed. exitCode = -4001 (Deployment failed to process)                             

                  Info        04/21/2016 18:18:18        Package "Run csv2xml" UUID(72828ce04e7d345eb0c6652c6e2cb70a) started                      

                  Info        04/21/2016 18:18:10        Started running the deploy step job 'csv2xml_RHEL7->csv2xml_RHEL7' on application server 'ulvblgp06.fg.rbc.com'(16) against target server 'bsaloopback'                      

                    Info        04/21/2016 18:18:10        Deploy Apply Job (Pre-Execute):csv2xml_RHEL7; Server

                   

                  Line 2720: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key property_set_class.-2026234.name

                    Line 2739: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key property_set_class.-2026234.description

                    Line 3751: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.0.display_name

                    Line 3784: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248256.display_name

                    Line 3821: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.1.display_name

                    Line 3854: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248258.display_name

                    Line 3891: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.2.display_name

                    Line 3924: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248260.display_name

                    Line 3961: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.3.display_name

                    Line 3994: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248262.display_name

                    Line 4031: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.4.display_name

                    Line 4064: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248264.display_name

                    Line 4101: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.5.display_name

                    Line 4134: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248266.display_name

                    Line 4171: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.6.display_name

                    Line 4204: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248268.display_name

                    Line 4241: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.7.display_name

                    Line 4274: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2248270.display_name

                    Line 4311: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.8.display_name

                    Line 4344: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2381720.display_name

                    Line 4381: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.9.display_name

                    Line 4414: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2422370.display_name

                    Line 4451: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.10.display_name

                    Line 4484: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2422380.display_name

                    Line 4521: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enum_data_type_legal_value.2024402.11.display_name

                    Line 4554: java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key enumerated_bl_value.-2531473.display_name

                   

                   

                   

                  • 7. Re: Error running NSH job against one of my app servers
                    Bill Robinson

                    please answer the rest of the questions i posed.

                    • 8. Re: Error running NSH job against one of my app servers
                      Neal Meagher

                      It's a long story regarding the use of a loopback to try to force the use of a certain app server for testing.

                       

                      Disregard:

                       

                      "if you are running a deploy job, then what matters is the target of the job, not what appserver runs it.  everything in the deployed package runs on the target.  nothing from the appserver running the job is involved.

                       

                      so why does the job routing rule matter here?"

                       

                      Here is what's in the package:

                       

                       

                      CAE_TSS_RHEL7 - Batch Job

                                      Update Server Properties Job

                                      TSS_RHEL7_discovery - Discovery Job

                                      TSS_RHEL7_compliance - Compliance Job

                                      Export_RHEL7 - NSH Script Job that executes  a 'blcli_execute Utility exportComplianceRunLatest' blcli command that generates an export of the latest compliance job run and puts the results in a csv file.

                                      Csv2xml_RHEL7 - (This is the job that is having the issue. Some days this job runs successfully, and on others it fails.) A BL Package Deploy Job that has only an external command, which runs a csv2xml.py Python script. This script reads the csv file and generates a readable xml file.

                                      Copy_XML_RHEL7 - This is a filecopy job that just moves the xml file from the appserver to a dedicated server for post processing that is outside of BladeLogic.
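
For readers who want a concrete picture of the conversion step, here is a minimal sketch of what a csv-to-xml script could look like. It is not the csv2xml.py from this thread, which was never posted; the function name, the element names (`results`, `row`, `field`) and the sample columns are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="results", row_tag="row"):
    """Convert CSV text with a header row into an XML string.

    Hypothetical layout: each data row becomes a <row> element and each
    column value goes into a <field name="..."> child, so column names
    never have to be valid XML tag names.
    """
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            if column is None:
                # Extra cells beyond the header row are skipped.
                continue
            field = ET.SubElement(row, "field", name=column)
            field.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data, not taken from a real compliance export.
    sample = "server,rule,result\nhost1,3.1.1,PASS\nhost2,3.1.1,FAIL\n"
    print(csv_to_xml(sample))
```

A script like this only touches local files, which is why it behaves the same on every host when run by hand.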

                      • 9. Re: Error running NSH job against one of my app servers
                        Bill Robinson

                        is the batch job set to use its own targets or the targets of the individual jobs ?

                         

                        so the export job puts the csv file where ?  on the appserver its running on ?  in a specific place ?

                         

                        the csv2xml_RHEL7 is targeted against what system (if the batch is using individual job targets)?

                         

                        how does the copy_xml_rhel7 file deploy job (it is a file deploy job?) know where to get the exported file ?

                         

                        more to the point, why are you using a bldeploy and then a file deploy job to do something you can do in the nsh job that runs the export ?

                        • 10. Re: Error running NSH job against one of my app servers
                          Neal Meagher

                          There are also multiple regions. Each region has its own Compliance Batch Job, and within a region there is a separate Compliance Batch Job per OS.

                          • 11. Re: Error running NSH job against one of my app servers
                            Bill Robinson

                            ok?  not sure how that is relevant.  you have one batch job w/ some member jobs.  there's a problem in that job.  that's what you need to focus on.  other batch jobs will not affect each other.

                            • 12. Re: Error running NSH job against one of my app servers
                              Bill Robinson

                              so after a webex the problem seems to be:

                              the deploy job is targeted to a 'bsaloopback' "server" which resolves to 'localhost' of whatever appserver picks up the job.  so it looks like what is happening is this: the staging phase is picked up by appserver1, and the payload (just the bldeploy.xml) is staged to /var/tmp/stage on appserver1.  then appserver2 picks up the apply/commit and looks for the staged files in its own /var/tmp/stage directory, but they're not there because they were put on appserver1 during the staging phase.
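
The mismatch can be illustrated with a toy model. This is only a sketch of the behaviour described here, not BladeLogic code: the `AppServer` class, the `run_deploy` function and the job ids are hypothetical, and each server's staging area is modelled as a plain dictionary that the other server cannot see.

```python
class AppServer:
    """Toy stand-in for an appserver with its own local /var/tmp/stage."""

    def __init__(self, name):
        self.name = name
        self.staging = {}  # staged path -> payload, visible to this server only

    def stage(self, job_id, payload):
        # Staging writes into this server's local staging area.
        self.staging[f"/var/tmp/stage/{job_id}/bldeploy.xml"] = payload

    def commit(self, job_id):
        # Commit looks only in this server's local staging area.
        path = f"/var/tmp/stage/{job_id}/bldeploy.xml"
        if path not in self.staging:
            raise FileNotFoundError(f"{self.name}: {path} was never staged here")
        return f"{self.name}: applied {job_id}"


def run_deploy(job_id, stage_server, commit_server):
    """A loopback target resolves to whichever server runs each phase."""
    stage_server.stage(job_id, "<bldeploy/>")
    return commit_server.commit(job_id)


if __name__ == "__main__":
    appserver1, appserver2 = AppServer("appserver1"), AppServer("appserver2")

    # Both phases on the same server: the staged file is found.
    print(run_deploy("job-a", appserver1, appserver1))

    # Staging on one server, commit on another: the file is missing.
    try:
        run_deploy("job-b", appserver1, appserver2)
    except FileNotFoundError as err:
        print("failed:", err)
```

Whether a given run succeeds depends only on which server happens to pick up each phase, which would match the intermittent failures reported earlier in the thread.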

                               

                              so there are multiple ways to handle this:

                              - job routing rule to run the batch job from one appserver (all member jobs will run from one appserver)  this may not scale well depending how many targets you

                              - change the STAGING_DIR on the 'bsaloopback' server object to some location that's shared across appservers.  this may not be ideal because other jobs using this server object as a target may not like this based on whatever they are doing.

                              - change the target of the bldeploy job to an actual appserver server object that's not a VIP or loopback alias.

                              - collapse the bldeploy and file deploy job into the nsh script that you have and make that a 'type 2' script that runs w/o any targets.

                               

                               

                              imo the last option is the easiest.
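
To show why collapsing the steps helps, here is a rough sketch of the same idea in Python. In BladeLogic this would be an NSH script, so treat the code purely as an illustration: the step functions, file names and sample data are hypothetical placeholders. The point is that export, convert and copy all run in one process against one local working directory, so nothing has to be handed from one appserver to another.

```python
import shutil
import tempfile
from pathlib import Path


def export_step(workdir):
    # Placeholder for the real export; writes a tiny made-up CSV.
    (workdir / "results.csv").write_text("server,result\nhost1,PASS\n")


def convert_step(workdir):
    # Placeholder conversion: wraps each data line in a <row> element.
    rows = (workdir / "results.csv").read_text().splitlines()[1:]
    body = "".join(f"<row>{line}</row>" for line in rows)
    (workdir / "results.xml").write_text(f"<results>{body}</results>")


def copy_step(workdir, destination):
    # Placeholder for the final copy to the post-processing location.
    shutil.copy(workdir / "results.xml", destination)


def run_pipeline(destination):
    """Run all three steps on this host, sharing one local directory."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        export_step(workdir)
        convert_step(workdir)
        destination.mkdir(parents=True, exist_ok=True)
        copy_step(workdir, destination)
    return destination / "results.xml"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        print(run_pipeline(Path(out) / "xml").read_text())
```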