15 Replies Latest reply on Aug 18, 2020 2:11 AM by Andreas Mitterdorfer

    Data Management Job Hung on Load Step

    Timothy Mobley

      [First, I wanted to mention that I found this thread and followed most of the steps that solved the problem for that user, but either I am not understanding one of their steps or their fix doesn't work in my situation.]

       

      I created a job and transformation in AI Spoon based on CI-CMDB_Express, but for importing bulk inventory (that integration is not provided OOTB). I then created a Data Management job in the Data Management Console and ran it. All the new CIs are imported to BMC.AM:BMC_BulkInventory in the BMC.ITSM.CI.DATA dataset, as expected. However, the job gets stuck there: the 'Load' step shows 'In Progress', and the 'Validate' and 'Promote' steps stay 'Queued' forever. As a result, the new CIs are not visible in Asset Management, because they haven't been promoted. When I go to Atrium Core Console and run the 'BMC Asset Management CI DATA LOAD' reconciliation job, it does run, but when I try to edit a new CI record I get "ARERR [302] Entry does not exist in database." Additionally, there are then duplicate records: one in the production (BMC.ASSET) dataset and the other in BMC.ITSM.CI.DATA.

       

      As mentioned at the top, I've tried all the steps that solved this problem for another user with no success. Here are a few additional details on what I've tried (based on that other article I mentioned):

      • Verified that the password in UDM:RAppPassword was correct
      • Verified that the share configured in System Configuration Settings is working (a folder is created when the job runs)
      • Created an overlay and assigned DMT:DJS:SetStatus to pool 5 (arjavaplugin logs show the job running on 'pool-5-thread-19') - this was previously set to '4'
      • Set 'Escalation' queue (in Server Information > Ports and Queues) for Min and Max both to '5' - was previously set to '1'
      • Changed 'Master' entry in UDM:Repository:Slave to the server name (was previously 'localhost')
        • However, when I went back in to check, this appears to have reverted back to 'localhost'
        • I'm not sure what port this should be. The original default setting is 9000, but which port in ar.cfg does this correspond to? The Plugin-Port? Db-Server-Port? Other?
      • I wasn't sure about what needed to be updated in the Data Store Connection or CAI:AppRegistry mentioned in the other thread, so I didn't make any changes there.
        • 1. Re: Data Management Job Hung on Load Step
          Andreas Mitterdorfer

          Can you open DMT:Step, search for the ci-cmdb load step of your job, and copy the instanceid? Then open UDM:ExecutionInstance and search for the step's instanceid in 'execution instance name'.

          Is the Job Status Finished?

          If yes, then check escalation DMT:DJS:SetStatus and its workflow. It should have triggered filter DMT:DJS:ChkStepCompleted and the next step in sequence should activate.

          If the status is not Finished, then you need to investigate the job run. Set row-level logging and check the carte server logs.

          • 2. Re: Data Management Job Hung on Load Step
            Timothy Mobley

            **I've edited this reply to make some corrections.**

             

            Thank you for the pointers, Andreas. I did find the Instance ID of the load step in UDM:ExecutionInstance. The load step definitely finished, but it never proceeds to validate or promote. I also looked in DMT:DJS:SetStatus, and don't see the filter DMT:DJS:ChkStepCompleted (unless I'm looking in the wrong place?). How do I see if/where it triggered that? The only relationships listed are the DMT:Step form, no filters.

             

            **I had another problem with the Remedy Application Password, but I have fixed that now.**

            • 3. Re: Data Management Job Hung on Load Step
              Timothy Mobley

              Update: Following this troubleshooting KBA, I did the following steps:

                   1. Cleaned up UDM:Variable, which had over 100,000 duplicate entries.

                   2. Noticed that arcarte.log said all the new CIs I was trying to load were showing "ERROR (12006) : Instance not found"

                        a. I found this KBA on the above error, which said it could be caused by "half" of a record being deleted improperly

                        b. I located the records (which I thought I had deleted) in BMC.CORE:BMC_BaseElement and manually deleted them

                   3. I cancelled the job and reran it, but still get the same error in arcarte.log and the job still hangs on the Load step
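For anyone repeating this check, here is a minimal sketch for tallying which AR error codes appear in a log such as arcarte.log, and how often. It only assumes that errors show up as `ERROR (<number>)`, as in the message quoted above; the function names and the sample lines are hypothetical, not taken from a real log.

```python
import re
from collections import Counter

# Matches AR-style error tags such as "ERROR (12006)". The exact layout of
# arcarte.log lines is an assumption, so the pattern is deliberately loose.
ERROR_TAG = re.compile(r"ERROR\s*\((\d+)\)")


def count_error_codes(lines):
    """Return a Counter mapping each error code found to its occurrence count."""
    counts = Counter()
    for line in lines:
        for code in ERROR_TAG.findall(line):
            counts[code] += 1
    return counts


def summarize_log(path):
    """Read a log file and return its error-code counts."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_error_codes(handle)


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from a real arcarte.log.
    sample = [
        "2020/08/10 10:00:01 - CI load - ERROR (12006) : Instance not found",
        "2020/08/10 10:00:02 - CI load - row processed",
        "2020/08/10 10:00:03 - CI load - ERROR (12006) : Instance not found",
    ]
    print(count_error_codes(sample))
```

Running `summarize_log` on the log before and after a cleanup would show whether the number of 12006 entries actually changed.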

              • 4. Re: Data Management Job Hung on Load Step
                Andreas Mitterdorfer

                Create an api/filter/sql log to one file when you import a CI and check on the reason for the 12006 error.

                Just import 1 CI to keep the log file size down.

                 

                Can you check whether the OOTB CI-CMDB job is working fine or whether it shows the same issue with not validating/promoting?

                Do you have entries in CAI:Events from DMT which are not processed or have an error message?

                 

                FYI: there is a comprehensive troubleshooting guide on DMT in

                https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=000163160

                • 5. Re: Data Management Job Hung on Load Step
                  Timothy Mobley

                  Updates:

                  1. This problem happens on all DMT jobs - not just custom jobs (to answer your question).
                  2. Following this KBA on Escalation Pools, I changed the Min/Max Threads in Server Information > Ports and Queues to '9' because that is the total number of escalation pools that exist in our system.
                  3. I found an overlay on the CHECKATTRIBUTES escalation and deleted it, just to ensure there are now no overlays on any escalations (this may be nothing).
                  4. Still looking through api/filter/sql logs, but wanted to make these notes for now.
                  • 6. Re: Data Management Job Hung on Load Step
                    Andreas Mitterdorfer

                    If it happens on all jobs: Please create a foundation job with an empty support group (or people) sheet and enable api/filter/sql logging to one file when you run the job. Check on the carte server web page (or in UDM:ExecutionInstance) whether the import job is finished, wait 2-3 minutes after that, then disable logging and check the log for the escalation and filter.

                     

                    Which AR/ITSM version are you running?

                    • 7. Re: Data Management Job Hung on Load Step
                      Carl Wilson

                      Hi Timothy,

                      >Additionally, there are then duplicate records - one in the production (BMC.ASSET) dataset and the other in BMC.ITSM.CI.DATA.

                       

                      This is correct behavior. 

                      The data is first loaded into a temporary Dataset (BMC.ITSM.CI.DATA), then through reconciliation promoted to BMC.ASSET.

                       

                      The other settings regarding Server names, etc. for the UDM are required for you to load the data into the BMC.ITSM.CI.DATA Dataset and classes.  As this is working, then those settings are correct.

                       

                      Quick explanation of the process for normal data loads:

                       

                      • Data is retrieved from the spreadsheet, converted and loaded into the system.  You can see this working from the UDM configured directory and the directory created with the Job ID on the server, and the file being present.  This is usually where 95% of the problems occur, in the AI Jobs/Transforms and the configuration settings passed to do the conversion/load.
                      • The data is then validated and promoted. This is done via Remedy workflow, so if there is an issue here you can use the normal logging to see what it may be.  You can also expose the "z1D Action" field on the load forms and use the keywords to trigger the Validation and Promotion i.e. VALIDATE, VALIDATELOAD (you can find discussions on this).

                       

                      For the CMDB transactions, there are a couple of other steps involved: once the load is completed it "should" trigger the reconciliation to do the push to BMC.ASSET.  Danny Kellett can explain how this works.

                       

                      However, you will find that the Reconciliation Job is actually set to run at, say, 1:00am each morning, so if you wait long enough (overnight) your data will probably be there in the morning, barring any issues. 

                      You can also trigger the Reconciliation manually, as you have found. 

                       

                      So it sounds like the trigger for the completion has not been sent to the Job correctly. Andreas has done a great job of checking why this may be occurring.

                       

                      Cheers

                      Carl

                      • 8. Re: Data Management Job Hung on Load Step
                        Timothy Mobley

                        Andreas,

                         

                        I'm running version 9.1.04. Also, I am not running a server group environment (only one AR Server). I'm not sure how to determine whether the job finished from looking at UDM:ExecutionInstance (no 'Status' field or anything on that form). I ran a Foundational job for Support Group with blank spreadsheet as you suggested and the job hung on the Load step as usual. Looking at the combined API/Filter/SQL log file, I noticed the following:

                        • It successfully makes a copy of the data load spreadsheet in C:\Program Files\BMC Software\ARSystem\Arserver\Db\UDM\<DJB000000###>\
                        • However, no result file is found (as expected) in C:\Program Files\BMC Software\ARSystem\ARServer\Db\UDM\<DJB000000###>\RESULT.zip
                        • Logs mentioned DMT:AuditLog, so I checked it out and found the following:
                          1. Job START: Cleanup Completed
                          2. Load Trigger CAI (not sure what this means?)
                          3. Finished Copying Sequence Records from Skeleton
                          4. Load RUN

                         

                        ...I can understand why my custom job might hang (misconfiguration as Carl Wilson mentioned), but I still don't understand when or why all jobs do this - even the jobs created in the Onboarding Wizard. I'm going to go back to that comprehensive DMT troubleshooting guide again - that you mentioned - and see if there's any clues to be found.

                         

                        ***P.S. ~ This may be nothing, but I just noticed I don't have an arcarte server hostname mentioned in my armonitor.cfg file, which is supposed to match the host name in UDM:Config.

                        • 9. Re: Data Management Job Hung on Load Step
                          Carl Wilson

                          Hi Timothy,

                          you can check the carte.log to see what is happening with your job and spreadsheet load; you should be able to determine the cause from this log file.

                           

                          Have you tried an OOB load sheet with say one entry e.g. Support Group, Person, Cat?

                           

                          The UDM:Config entry should match the name of the server in the ar.cfg file, some customers put both the short and FQDN names to be safe - some put the LB name.  Unless all servers are named the same in the ar.conf, and resolved by host entries, the job needs to be run on the server that is designated the primary in the UDM:Config.  This is something that BMC can improve on for SG configurations ....

                           

                          It is usually an issue with name mismatches, or data conversion issues that cause the problems on the load.  This will be shown in the carte.log.

                           

                          Each Job has "steps", which will create entries in the associated forms which you can check for the status i.e. load, validate, promote.

                          It is quite a complicated process for something as simple as loading a spreadsheet into a staging form. This is because the Jobs can be scheduled, and local and global variables can be passed into the Job/Transform using the UDM Console.

                           

                          >Load Trigger CAI (not sure what this means?)

                           

                          The UDM uses the CAI subsystem (CAI:Events) to trigger the data load subsystem.

                           

                          Cheers

                          Carl

                          • 10. Re: Data Management Job Hung on Load Step
                            Timothy Mobley

                            Update (for what it's worth):

                             

                            As a test, I ran the Foundation job Product_Catalog (row level logging turned on) with OOB load sheet with one entry. The arcarte.log file shows the transformation looking through the tabs in the spreadsheet, and when it got to PCT_LoadProductCatalog it wrote the one record (I=1, O=0, R=0, W=1, U=0, E=0) and then ended with  "Job execution finished" - no further log entries. But in DMT console the Load Step still says "In Progress" and Validate and Promote are "Queued". The entry on the spreadsheet doesn't appear in the Product Catalog, so it didn't load.

                             

                            Also tested with the Onboarding Wizard (entering a Company). In that case, the company did load into the system, but the job still hangs on load step.
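The counters quoted above, (I=1, O=0, R=0, W=1, U=0, E=0), look like the usual Pentaho (Kettle) step metrics: input, output, read, written, updated, errors. As a rough sketch, assuming that layout, they can be pulled out of a log line and checked for errors like this (the function and field names are made up for illustration):

```python
import re

# Step metrics in the layout quoted above: (I=1, O=0, R=0, W=1, U=0, E=0).
METRICS = re.compile(r"\(I=(\d+), O=(\d+), R=(\d+), W=(\d+), U=(\d+), E=(\d+)\)")

# Assumed meaning of each counter, in the order they appear in the log.
FIELDS = ("input", "output", "read", "written", "updated", "errors")


def parse_step_metrics(line):
    """Return the step counters found in a log line as a dict, or None if absent."""
    match = METRICS.search(line)
    if match is None:
        return None
    return dict(zip(FIELDS, map(int, match.groups())))


def step_wrote_rows_cleanly(line):
    """True when the step reports at least one written row and no errors."""
    metrics = parse_step_metrics(line)
    return metrics is not None and metrics["written"] > 0 and metrics["errors"] == 0
```

A line that passes this check only says the transformation itself finished cleanly. It says nothing about whether the Load step status was updated afterwards, which is the part that hangs here.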

                            • 11. Re: Data Management Job Hung on Load Step
                              Andreas Mitterdorfer

                              Please can you review https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=kA33n000000Y9I3CAK&type=Solution and check whether you have these errors in arjavaplugin.log?

                               

                              Please can you configure the escalations DMT:DJS:SetStatus and DMT:VIS:JOB_Stall_Check to run on their own escalation pool.

                              Then create an api/filter/sql log to one file and a separate escalation log when you run the Foundation Product_Catalog job and after that attach the zipped log files.

                              If nothing is working it might be that the sequencing records are broken.

                               

                              Regarding carte server name in armonitor.cfg:

                              There should be the servername and port after the org.pentaho.di.www.Carte parameter. Can you correct this?
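A small sketch for checking this without reading the whole file by eye: it looks for the first uncommented line that contains org.pentaho.di.www.Carte followed by a server name and a numeric port. It assumes that comment lines in armonitor.cfg start with `#`; the function name, server name and port in the usage note are hypothetical.

```python
import re

# Class name followed by two arguments: a server name and a numeric port.
CARTE_ARGS = re.compile(r"org\.pentaho\.di\.www\.Carte\s+(\S+)\s+(\d+)\b")


def find_carte_host_port(config_text):
    """Return (host, port) from the first uncommented Carte line, or None.

    None means no uncommented line has both a server name and a port
    after the Carte class name.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):  # assumed comment marker
            continue
        match = CARTE_ARGS.search(line)
        if match:
            return match.group(1), int(match.group(2))
    return None
```

For example, a line ending in `org.pentaho.di.www.Carte remedy01 20000` would give `("remedy01", 20000)`, while a line with only a port after the class name gives `None`. The returned name can then be compared with the host name in UDM:Config.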

                              • 12. Re: Data Management Job Hung on Load Step
                                Carl Wilson

                                Hi Timothy,

                                the data will be loaded in the associated "Load" form i.e. PCT:LoadProductCatalog, from there the validation and promote workflow will push the entry to the Product Catalog.  This is the same for other Foundation forms i.e. they have an associated "Load" form, and workflow then checks for errors (validation) and then "promotes" the entry from the load form to the target destination form if no validation errors.

                                 

                                If the data is in the Load form, then it is just a matter of sorting out why once the AI load has completed it is not triggering the Validate/Promote steps (which can be done manually by exposing the "z1D Action" field and setting the keyword to perform the actions) - this is what Andreas is working with you to establish.

                                 

                                Cheers

                                Carl

                                • 13. Re: Data Management Job Hung on Load Step
                                  Timothy Mobley

                                  Andreas Mitterdorfer - Yes, I am getting "ERROR (90): Cannot establish a network connection to the AR System server", however I do not have a load balancer set up yet in this deployment. However, I found this BMC video on Troubleshooting Reconciliation Jobs Stuck in Queued Status, and discovered that the AR Dispatcher Process (arsvcdsp.exe) in armonitor.cfg is not starting. When I tried manually running that line (in elevated CMD prompt) I get the following error:

                                  "The program can't start because MSVCR71.dll is missing from your computer". I was able to locate the missing DLL, which was hiding in a subdirectory of C:\Users\<username>\AppData\Local\Temp\. According to this discussion, it should be in either C:\Windows\System32\ or C:\Windows\SysWOW64\, so I copied that DLL to both of those locations. When I reran arsvcdsp.exe, the process started right up and appeared in Task Manager.
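Since the DLL seems to go missing now and then, a quick check like the following could confirm whether it is present in both locations. This is only a sketch: the helper name is made up, and the two directories are simply the ones mentioned above.

```python
from pathlib import Path


def find_dll(dll_name, directories):
    """Return the directories (as Path objects) that contain dll_name.

    The file name is compared case-insensitively; directories that do not
    exist are skipped.
    """
    wanted = dll_name.lower()
    found = []
    for directory in map(Path, directories):
        if not directory.is_dir():
            continue
        if any(
            entry.name.lower() == wanted
            for entry in directory.iterdir()
            if entry.is_file()
        ):
            found.append(directory)
    return found


if __name__ == "__main__":
    locations = [r"C:\Windows\System32", r"C:\Windows\SysWOW64"]
    hits = find_dll("MSVCR71.dll", locations)
    print("present in:", [str(path) for path in hits])
    print("missing from:", [d for d in locations if Path(d) not in hits])
```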

                                   

                                  Carl Wilson - I sort-of tried your suggestion by making an overlay field for 'z1D Action' and set the default value to VALIDATELOAD. If I'm understanding you right, this should move all load forms on to the Validate and Load step automatically. Since this is more of a workaround, I suppose I'll have to do that for all load forms, but that's fine with me. Also, just for good measure, on the Data Management Job Console > Other Functions > Application Preferences > Data Management tab, I set 'Job Wait' to Auto Validate-Auto Promote.

                                   

                                  I'll do some testing tomorrow, but am hoping I'm close to a solution. We'll see... Thank you both!

                                  • 14. Re: Data Management Job Hung on Load Step
                                    Carl Wilson

                                    Hi,

                                    sounds like you are close.

                                    That DLL, I have noticed, also goes missing on customer environments occasionally so I am wondering if a Windows update removes it?

                                     

                                    Keep us informed

                                     

                                    Cheers

                                    Carl
