27 Replies Latest reply on Feb 19, 2018 10:59 AM by Andrew Waters

    discovery.run commands & connection timeouts

    Mark Lemar

      We've observed an issue where we intermittently drop data quality on SoftwareInstance.version for some products, where this attribute is populated using an active versioning technique (running a command).


      For example, for 'IBM Tivoli Monitoring Linux OS Monitoring Agent', the pattern uses the following command to get version info:


      /opt/Tivoli/itm/bin/cinfo -d


      When this issue occurs, I can see that this run command has failed with a FailureReason of 'NoAccessMethod' & Error of 'Connection timed out'.  On previous and subsequent scans of the same host, I can see that the command has executed successfully and returned version information as expected.  When the command times out, the version attribute on the SI is flushed, so the value is missing from the extract we run that day.  This causes flip-flopping in our downstream policy engine, which ingests the data.


      I'm investigating the hosts in question more closely, to understand the timeouts better.  However, there are a couple of things that I'd like to understand:


      Why does the SI lose its version info when this happens?  Is there anything that can be coded in the TPL to check for such timeout scenarios & ensure that the SI retains its version value?


      Which of the timeout options is controlling the cmd timeouts referred to by these error messages?

        • 1. Re: discovery.run commands & connection timeouts
          Andrew Waters

          "Connection timed out" would normally be the timeout associated with the credential.


          TPL does not specifically present the different kinds of failure.


          As to what the pattern does, that will depend upon how important the information is in uniquely identifying the software.

          • 2. Re: discovery.run commands & connection timeouts
            Mark Lemar

            In terms of what information is used to uniquely identify the software, I assume you mean something like the SI key?


            For IBM Tivoli Monitoring, looking at the IBM.TivoliMonitoring pattern module, the SI key appears to be generated from the SI type and Host key.  The version information is not used in the construction of the key.


            As a principle, I'd expect ADDM to retain an SI version value it had previously discovered if it was unable to obtain it on a subsequent scan (for whatever reason).  Is that a fair assessment & is that how ADDM should work in regard to software discovery?

            • 3. Re: discovery.run commands & connection timeouts
              Andrew Waters

              As I said, it depends upon the software. For example, there is software which can be installed multiple times. If you cannot distinguish the instances then it becomes significantly harder to retain details.
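
A minimal sketch of the "retain the last known version" idea, applied downstream of the extract rather than inside ADDM. It assumes each SI can be identified by a stable key (the string keys and the dictionary layout here are hypothetical), which is exactly the condition raised above: if instances cannot be told apart, carrying a value forward is not safe.

```python
from typing import Dict, Optional


def carry_forward(
    previous: Dict[str, str],
    latest: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Fill blank versions in the latest extract from the previous one.

    Both mappings are keyed by a hypothetical SI key. A version that is
    None or empty in `latest` is replaced by the value last seen for the
    same key, if there was one.
    """
    merged: Dict[str, Optional[str]] = {}
    for si_key, version in latest.items():
        merged[si_key] = version if version else previous.get(si_key)
    return merged


# Yesterday's extract had a version; today's scan timed out for the same SI.
yesterday = {"si-key-1": "1.0"}
today = {"si-key-1": None, "si-key-2": "2.0"}
result = carry_forward(yesterday, today)
```

Here `result` keeps "1.0" for `si-key-1`, so the downstream consumer does not see the value disappear and reappear between extracts.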

              • 4. Re: discovery.run commands & connection timeouts
                Mark Lemar

                I can only really comment on IBM Tivoli Monitoring products at this point.  I know that there are multiple agents available, but would only expect to see a particular agent running once on a given host.


                Interestingly, for one of the hosts running some ITM products (IBM Tivoli Monitoring Linux OS Monitoring Agent & IBM Tivoli Monitoring Log File Agent), we lost the version info again for both during the last scan.  There are 2 session logs for this last host scan and I can see that the 'cinfo' command ran and produced the required output in the earliest session log.  However, the discovery access doesn't show this in the command result and instead shows the 'Connection timed out' error.  I don't understand why that would be.


                I have raised case # 00398568 to investigate.

                • 5. Re: discovery.run commands & connection timeouts
                  Andrew Waters

                  How does the session log containing the cinfo command end? There may be some reason the system fails to identify the command line prompt.

                  • 6. Re: discovery.run commands & connection timeouts
                    Mark Lemar

                    The tail of the 1st session log looks like this, just after the cinfo command is run.


                    I'd expect to see the 'Permission denied' output in the discovery command result, as it indicates that we can't get results from the command when run unelevated.  We then attempt to run the same command with elevated privs, which gets us the required command output.


                    • 7. Re: discovery.run commands & connection timeouts
                      Andrew Waters

                      Okay - so that looks like the problem. cinfo -d is writing lots of failures, which means the shell $ prompt is lost. That would also explain why there is some variability, depending upon how quickly the output is written vs the echoing of the shell prompt.


                      If you run the command manually does redirecting standard error to /dev/null hide all the permission denied messages, i.e. running


                      /opt/Tivoli/itm/bin/cinfo -d 2>/dev/null
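
To illustrate what the redirect changes, here is a small sketch that uses a stand-in child process instead of the real cinfo (the child script and its output lines are invented for the example). It compares capturing stdout with stderr merged in against capturing stdout with stderr discarded, which is what `2>/dev/null` does in the shell.

```python
import subprocess
import sys

# Hypothetical stand-in for `cinfo -d`: writes errors to stderr and one
# line of made-up output to stdout.
FAKE_COMMAND = (
    "import sys\n"
    "for _ in range(3):\n"
    "    print('Permission denied', file=sys.stderr)\n"
    "print('VERSION 1.2.3')\n"
)


def run(discard_stderr: bool) -> str:
    """Run the stand-in command and return what a reader of stdout sees."""
    stderr_target = subprocess.DEVNULL if discard_stderr else subprocess.STDOUT
    completed = subprocess.run(
        [sys.executable, "-c", FAKE_COMMAND],
        stdout=subprocess.PIPE,
        stderr=stderr_target,  # DEVNULL is the equivalent of 2>/dev/null
        text=True,
        check=True,
    )
    return completed.stdout


merged_output = run(discard_stderr=False)
clean_output = run(discard_stderr=True)
```

With stderr discarded, only the stdout line remains, so nothing from the error stream can end up interleaved with the shell prompt.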

                      • 8. Re: discovery.run commands & connection timeouts
                        Mark Lemar

                        Unfortunately, we no longer have the ability to log on to target hosts with our discovery ID to test things like this.  I'll see if I can find somebody who can do this.


                        Forgive my ignorance, but how can you tell that there are lots of failures from the command?  Are you referring to the number of "Permission denied" messages?


                        When we successfully obtain version info, it usually looks like this (the following is lifted from another example host, where unelevated cinfo didn't produce the required result but running it elevated did).  We found that we didn't need to run the command elevated on all hosts, just some.  Now I'm thinking that maybe we should run it elevated every time?

                        • 9. Re: discovery.run commands & connection timeouts
                          Andrew Waters

                          Yes - I presume cinfo is a script which is trying to grep the configuration file but has insufficient permissions.

                          • 10. Re: discovery.run commands & connection timeouts
                            Mark Lemar

                            Sorry, I was editing my last comment when you replied & wanted to ensure you saw my last update.


                            I'm wondering if we should just run the command elevated every time?  If so, we could obviously alter our existing custom pattern accordingly, just to do this.  However, I'm not sure if this is something that would be changed in a TKU pattern, based on how PRIV_RUNCMD is used these days?

                            • 11. Re: discovery.run commands & connection timeouts
                              Andrew Waters

                              More recent patterns tend to run the command without privilege escalation and, if they fail to get the expected output, run it again with privilege escalation. This is because the system does not know if escalation is allowed or will work. That is unlikely to change.
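
The "try without escalation first, then with" approach can be sketched as follows. This is not TPL and not the TKU pattern's code: the two runner callables and the `version:` output format are hypothetical placeholders for whatever actually executes the command and whatever the command prints.

```python
import re
from typing import Callable, Optional

# Hypothetical output format, used only so the sketch has something to parse.
VERSION_LINE = re.compile(r"\bversion\s*[:=]\s*(\S+)", re.IGNORECASE)


def parse_version(output: Optional[str]) -> Optional[str]:
    """Return the version found in the command output, or None."""
    match = VERSION_LINE.search(output or "")
    return match.group(1) if match else None


def get_version(
    run_plain: Callable[[], Optional[str]],
    run_elevated: Callable[[], Optional[str]],
) -> Optional[str]:
    """Run unelevated first; escalate only if no version was parsed."""
    version = parse_version(run_plain())
    if version is not None:
        return version
    return parse_version(run_elevated())


calls = []


def plain() -> str:
    calls.append("plain")
    return "Permission denied"


def elevated() -> str:
    calls.append("elevated")
    return "version: 6.3"


found = get_version(plain, elevated)
```

In this run the unelevated attempt yields no version, so the elevated runner is called second and `found` is "6.3". When the first attempt succeeds, the elevated runner is never invoked.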


                              If redirecting stderr works then that should be added to the pattern.


                              As to what you want to do... I would think maintaining your own versions of patterns is rather painful.

                              • 12. Re: discovery.run commands & connection timeouts
                                Mark Lemar

                                I'll see what I can do in terms of logging on to the host and redirecting stderr.


                                The bit I don't understand is the intermittent nature of how the output from the unelevated cinfo command is being handled.  I wouldn't expect the "Permission denied" output to change from day-to-day, so what's preventing it appearing as per the session log in the discovery cmd result on some days?


                                In regard to custom patterns, we try to avoid these as much as possible to stay aligned to TKU versions.  For this scenario, we originally coded an 'overlay' pattern which co-existed with the TKU pattern and filled in the blanks (i.e. triggering on the respective SI types, checking for blank versions and running an elevated cinfo command).  We then observed some timing issues, as a result of there being an undetermined amount of time between the respective patterns being triggered & commands being run.  As a result, we collapsed this logic into a single custom pattern, de-activating the TKU ITM Common Functions pattern in the process.  This meant that if the unelevated cinfo command returned "Permission denied", we'd immediately try again with elevated privs - I've attached the custom pattern to the case.

                                • 13. Re: discovery.run commands & connection timeouts
                                  Andrew Waters

                                  The log you showed has a permission denied response after the shell prompt had appeared. This means the prompt is not recognised, because there is output after the $ and hence the $ from the prompt is ignored. I would imagine that in the cases which work, i.e. do not time out, the permission errors are all reported before the shell prompt appears.
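
A simplified model of why late output hides the prompt. It assumes the session decides a command has finished by checking whether the buffered output ends with a prompt character; the regular expression and the sample buffers are illustrative only and do not reproduce the product's actual prompt matching.

```python
import re

# Assumption: "finished" means the buffer ends with a prompt such as "$ ".
PROMPT_AT_END = re.compile(r"[$#>] ?\Z")


def command_finished(buffer: str) -> bool:
    """Return True if the buffer ends with something that looks like a prompt."""
    return PROMPT_AT_END.search(buffer) is not None


# All errors arrive before the prompt: the prompt is the last thing seen.
errors_then_prompt = "Permission denied\nPermission denied\n$ "

# One error arrives after the prompt was echoed: the "$" is no longer last,
# so the session keeps waiting until the timeout expires.
prompt_then_error = "Permission denied\n$ Permission denied\n"

finished_early = command_finished(errors_then_prompt)
finished_late = command_finished(prompt_then_error)
```

Under this model the first buffer is recognised as complete and the second is not, which matches the intermittent behaviour described: the same errors are produced every time, and only their timing relative to the prompt differs.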

                                  • 14. Re: discovery.run commands & connection timeouts
                                    Mark Lemar

                                    For info, the following screenshots are from the latest session logs.  These relate to the last scan of the same host, which did successfully get the version info - obtained because the elevated cinfo command was executed.


                                    Taken from the 1st of 3 session logs for this scan:

                                    Taken from the 3rd of 3 session logs for this scan:

