6 Replies Latest reply on Aug 7, 2015 12:30 AM by Andrew Waters

    ADDM Discovery – How does it Work (Exactly?)

    Doug Connell

      I am not sure I understand discovery properly. 


      I have documented the discovery procedure (as I understand it) as thoroughly as possible below, on the assumption that if I don't understand it, there are probably others who are struggling to understand it too.

       

      I have highlighted all my queries in Yellow.  (This information is based on ADDM v9.02)

       

       

      Contents

       

      1 Device Recognition
      2 Standard Discovery
      3 Pattern Execution
      4 Additional Discovery
      5 Consolidation

       

      1 Device Recognition

       

      1.1 Port Scanning


      The ADDM Appliance first performs port scanning of the ports defined in the Discovery configuration.  There are about 10 ports that it scans by default, including 135 (Windows RPC) and 22 (ssh).
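As a rough illustration of what a TCP port probe involves, here is a minimal Python sketch. It is not how ADDM implements its scan; the function name and the port list are assumptions, and only ports 135 and 22 are named in this post.

```python
import socket

# Hypothetical port list: the post only names 135 (Windows RPC) and 22 (ssh);
# the remaining default ports are not listed in this thread.
PORTS_TO_PROBE = [22, 135]


def probe_ports(host, ports, timeout=1.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise.
            results[port] = sock.connect_ex((host, port)) == 0
    return results
```

Calling `probe_ports("192.0.2.10", PORTS_TO_PROBE)` would return a dictionary such as `{22: False, 135: True}` for a typical Windows host, which is the kind of evidence the later steps can build on.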


      1.2 Header information


      ADDM also interprets the header information returned by the various protocols in order to try and determine the type of the device that has been discovered (e.g. Windows, Linux etc.)


      1.3 Reverse DNS Lookup


      At some point, ADDM does a reverse DNS lookup of the IP-Address.  The FQDN is linked to both the DA Node and the Host Node.  The FQDN is linked via a Node Kind called FQDNList.  The Reverse DNS Lookup is important for Windows Discovery (see Kerberos Issue below).
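To check by hand whether an IP address has a usable PTR record, something like the following Python sketch can help. It only wraps the standard `socket.gethostbyaddr` call; the `resolver` parameter is an assumption added so the function can be exercised without a live DNS server, and it says nothing about how ADDM performs its own lookup.

```python
import socket


def reverse_lookup(ip_address, resolver=socket.gethostbyaddr):
    """Return the name the PTR lookup yields for ip_address, or None."""
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
    except (socket.herror, socket.gaierror):
        # No PTR record (or the resolver could not answer).
        return None
    return hostname
```

A `None` result for a Windows host is a hint that the Kerberos problem described in the next section may apply.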


      1.3.1 Kerberos Issue


      ADDM establishes WMI connections using DCOM/Kerberos, and the hostname returned by the reverse DNS lookup is used somehow as a key for the Kerberos encrypted session (I think).  Without a hostname, this communication can fail to get established.  WMI is supposed to fall back to the older protocol (NTLM), but I believe support for this protocol is being dropped by Microsoft (need to check).  To be honest, I do not completely understand this area, but I have encountered issues here.  I believe that all hosts in your Windows environment must have Reverse DNS lookup activated (this is a checkbox) on your DNS servers.


      1.4 DeviceInfo Node


      At some point, ADDM then creates a DeviceInfo Node. The DeviceInfo Node is created even when the DA end_state is "NoAccess".  In the DeviceInfo node, the OS Type is determined by "Heuristics".


      2 Standard Discovery

       

      2.1 Login

       

      After completing the port scanning, ADDM then tries to login to each IP-Address using the credentials configured on the ADDM Appliance.  For Windows AD Proxies, ADDM uses the credentials defined in the "Run As" service.

      ADDM will try all credentials for all devices.  For example, if you look at the SessionResults for a Windows Host, you can see that ADDM attempts to use many different types of credentials including UNIX credentials.

       

      2.1.1 UNIX Credentials used for Windows Hosts?

       

      On many of our Windows hosts that returned "NoAccess", you see that ADDM has tried to login to a Windows host using UNIX credentials.  Why?  I don't understand this.  sshd can be installed on Windows, but ADDM has no ssh platform script for Windows, so why does ADDM try to use UNIX credentials for Windows hosts?  Is this done in case Heuristics misidentified the OS?


      2.1.2 SNMP Use for Windows

       

      I have seen that some of our Windows Hosts have been discovered via SNMP.

       

      2.1.3 Optimization of Login Credentials

       

      ADDM remembers which credential was successful during the last scan and then retries this credential on the next discovery.
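The following Python sketch shows one way such an ordering could work. It is an assumption-laden illustration, not ADDM's algorithm: the `Credential` class, the credential names and the port-based filter are all hypothetical, although the filter follows a later reply in this thread saying that a credential marked for ssh is not used when the ssh port is closed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    name: str      # hypothetical label
    protocol: str  # e.g. "ssh", "wmi", "snmp"
    port: int      # port the protocol needs


def candidate_credentials(credentials, open_ports, last_successful=None):
    """Order credentials for a scan: last successful first, unusable ones dropped."""
    usable = [c for c in credentials if c.port in open_ports]
    # sorted() is stable, so the configured order is kept for the rest.
    return sorted(usable, key=lambda c: c.name != last_successful)
```

With this ordering a host that was last reached via SNMP is tried with SNMP first, which matches the behaviour described above, while every other applicable credential remains available as a fallback.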

       

      2.2 getDeviceInfo

       

      One of the first sections (methods) in the Solaris platform script is getDeviceInfo, which runs commands such as uname and looks at files such as /etc/release and /etc/resolv.conf (for the dns_domain).

      I don't believe that the getDeviceInfo method is used to populate the DeviceInfo Node at all.  The getDeviceInfo method contributes to information in the Host node.  This confuses me.

      All the other methods in the platform scripts are executed at this point.

       

      2.3 getNetworkInterfaces

       

      During standard discovery, ADDM runs the getNetworkInterfaces method.  On UNIX, this is part of the platform script (e.g. solaris.sh) and it runs ifconfig.  On Windows, it is a WMI query.
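To make the UNIX side more concrete, here is a small Python sketch that turns Solaris-style ifconfig output into structured records. The sample output, the regular expressions and the record layout are my own assumptions for illustration; they are not taken from solaris.sh or from ADDM's parser.

```python
import re

# Invented sample in the style of Solaris "ifconfig -a" output.
SAMPLE_IFCONFIG = """\
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.1.10 netmask ffffff00 broadcast 192.168.1.255
"""

IFACE_LINE = re.compile(r"^(\S+): flags=")
INET_LINE = re.compile(r"^\s+inet (\S+) netmask (\S+)")


def parse_ifconfig(text):
    """Return a list of {'name', 'ip', 'netmask'} dicts, one per inet line."""
    interfaces = []
    current = None
    for line in text.splitlines():
        iface = IFACE_LINE.match(line)
        if iface:
            current = iface.group(1)
            continue
        inet = INET_LINE.match(line)
        if inet and current is not None:
            interfaces.append(
                {"name": current, "ip": inet.group(1), "netmask": inet.group(2)}
            )
    return interfaces
```

Running `parse_ifconfig(SAMPLE_IFCONFIG)` yields one record for `lo0` and one for `e1000g0`, which is roughly the shape of information a network interface node would need.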

       

      2.4 Scan Optimization (IP-Addresses).

       

      Every host may have more than one IP-Address.  It would be very inefficient to scan every IP for every host.  At some point, ADDM will mark one IP Address on each host as the preferred IP to use for discovery.  All secondary IP-Addresses will be marked as NotBestIP and the scan will be skipped.

       

      2.5 Scan Optimization Timeout

       

      IP-Addresses are recycled and changed and hosts are retired, so periodically, IP Addresses that are skipped should be re-evaluated.  This is controlled by the parameter Scan Optimization Timeout.  (see Model Maintenance).
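My understanding of how the preferred IP and the timeout interact can be written as a short Python sketch. The function and its parameters are hypothetical, and how ADDM actually chooses the preferred IP is not something I know; the sketch only captures the skip-until-timeout idea.

```python
from datetime import datetime, timedelta


def scan_decision(ip, preferred_ip, last_full_scan, now, optimization_timeout):
    """Return 'scan' or 'NotBestIP' for one IP address of a known host."""
    if ip == preferred_ip:
        return "scan"
    if last_full_scan is None or now - last_full_scan >= optimization_timeout:
        # The skip has expired: the address may belong to another host by now.
        return "scan"
    return "NotBestIP"
```

For example, with a seven day timeout a secondary address that was fully scanned yesterday is skipped as NotBestIP, while one last scanned eight days ago is scanned again.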

      I believe this parameter only affects the IP addresses that are marked as NotBestIP.  What about credentials?  ADDM also remembers the best credential to use.  If a Windows server is currently being scanned by SNMP, but then the customer configures a service account for that server, will ADDM then stop using SNMP and switch to WMI?


      2.6 FQDN Issue on Windows

       

      At about this time, ADDM sets (or re-affirms) the Domain and FQDN of the host.  This information may come from the NetworkInterfaces configured on the Windows host (even disabled interfaces).  When a Physical Windows host is migrated to a virtual Host using the P2V process, old Network Interfaces from the physical server (even disabled ones) are scanned.  So if a server is migrated from DEV to PROD (using the P2V process), the FQDN and Domain may not match.  It is a smallish issue, but for me it is confusing.

       

      2.7 Single Session

       

      BMC tells me that ADDM opens a single session to the target host for both Standard Discovery and Additional Discovery.  The session is held open, so that the credentials do not have to be re-authenticated.  I guess this is done for performance reasons.

       

      2.8 DDD Data Collection

       

      All the information gathered by the platform scripts is shipped back to the ADDM Appliance.  This information is called DDD (Directly Discovered Data).  When the data arrives back at the Appliance, it is interpreted and all the DDD Nodes are created.

      I am guessing that this initial interpretation of DDD data is not controlled by TPL – but is built-in.


      3 Pattern Execution on the ADDM Appliance

       

      3.1 Patterns Are Triggered

       

      The creation of the DDD nodes in the database triggers execution of the patterns.  Patterns are written in TPL (The Pattern Language).  TPL only executes on the Appliance.
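The trigger mechanism can be pictured as a simple event registry. The Python sketch below is only an analogy for the idea that creating a DDD node fires the patterns registered for that node kind; it is not TPL, and the `DiscoveredProcess` node kind, the decorator and the Apache rule are illustrative assumptions.

```python
from collections import defaultdict

# Maps a DDD node kind to the pattern callbacks that trigger on it.
_triggers = defaultdict(list)


def pattern(node_kind):
    """Decorator registering a function as a pattern triggered by node_kind."""
    def register(func):
        _triggers[node_kind].append(func)
        return func
    return register


def create_ddd_node(node_kind, attributes):
    """Store a node (omitted here) and run every pattern triggered by it."""
    return [func(attributes) for func in _triggers[node_kind]]


@pattern("DiscoveredProcess")
def apache_pattern(process):
    # Hypothetical rule: infer a software instance from a process command.
    if "httpd" in process.get("cmd", ""):
        return {"kind": "SoftwareInstance", "type": "Apache Webserver"}
    return None
```

Creating a node of a kind that no pattern is registered for simply runs nothing, which fits the observation that patterns only execute on the Appliance in response to stored data.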

      TPL never executes on the Slave or the discovered host.

       

      4 Additional Discovery

       

      Embedded within the TPL of the patterns, there may be commands such as:

       

      discovery.runCommand(host,command)

       

      The TPL executes additional commands on the target host.  The target host is revisited.  The session to the host should still be open. Yes?

       

       

      4.1 Optimization for Additional Discovery

       

      Optimization is not used for Additional Discovery.  See table below.


      | Item | Optimization | Optimization Timeout | Implemented in ADDM? |
      |---|---|---|---|
      | IP-Addresses | On each host, ADDM uses only one IP Address for discovery. All other IP Addresses are marked as NotBestIP. | Skipped IP Addresses are re-evaluated in case the IP has been reassigned to a different host. | Yes |
      | Credential | ADDM remembers the last successful credential (and associated protocol) and uses it for the next discovery. | The credential is re-evaluated, in case a different credential and protocol has been implemented on the target server. | Not sure |
      | Additional Discovery | ADDM does not execute Additional Discovery if the Standard Discovery is the same. ADDM uses the stored results from the last discovery. | If the configuration of the server changes (as determined by the Standard Discovery), ADDM re-executes Additional Discovery. | Not implemented. RFI suggested. |


      5 Consolidation

       

      5.1 When does Consolidation Run?

       

      During a Discovery Run, Consolidation is triggered.  Normally one sees the Consolidation occurring on the consolidation server slightly after the discovery run on the Discovery Appliance.  However, the two jobs normally overlap.

      I assume that after each DA has been completed for a target host, consolidation is triggered somehow.

       

      5.2 Delayed Consolidation

       

      Occasionally one may see that Consolidation is delayed. It does not start until the Discovery Run on the discovery appliance has completed for all target devices.  This problem points to a performance issue, but I do not understand the root cause or why it is an intermittent problem.


      5.3 On Hold

       

      Sometimes, the whole job will be marked as being "On Hold" and Consolidation will never get started.  I do not understand the root cause or why it is an intermittent problem.

       

      5.4 Re-Inferred

       

      BMC tells me that only the DDD Data is sent to the Consolidation Appliance.  Inferred Data (i.e. Nodes) are not consolidated.  Inferred Nodes are re-inferred on the Consolidation Appliance.  I wonder why this decision was made?


      5.5 SessionResults

       

      Session Results are not sent to the consolidator.  Session results report on the success or failure of each attempt to login to the target device.  To debug NoAccess issues, it is best to perform this activity on the Scanning Appliance (and not the Consolidation appliance).

       

      5.6 Additional Discovery

       

      I was told by BMC that the Consolidation Appliance does not establish any connections to the target host.  This makes sense because there are no credentials for the target host on the Consolidation Appliance.

      The Documentation says:

      The consolidated data is the BMC Atrium Discovery Directly Discovered Data (DDD) nodes including the data collected by the patterns. The data inferred by the scanners, for example, Software Instance nodes, is not consolidated, but the consolidator will infer it again (based on its pattern configuration).

      Additional Discovery mostly collects version information, which is stored in the SoftwareInstance and SoftwareComponent Nodes.  This is not DDD data.  So how does the information from Additional Discovery get propagated to the Consolidation Appliance?  Is it trickled back, or sent as a big chunk with the DDD data?  Is this data stored in the database?  If so, what are the Node Kinds that are used?


      5.7 OptRemote

       

      Discovery Access results that are marked as NotBestIP on the scanning appliance are marked as OptRemote on the Consolidation Appliance.  Why?  What is going on here?  Why is the attribute value changed?


      5.8 Missing information when patterns run commands on other hosts

       

      See link http://discovery.bmc.com/confluence/display/100/Consolidation.

       

      The documentation says:

      When a host is discovered and patterns are triggered which run commands on a second host, the DDD on both hosts is updated. When the original host is consolidated, the DDD on the second host is not available to the patterns that trigger on the consolidator. When the second host is consolidated, the DDD created on it when discovering the first host is not included. Consequently the consolidator will always report that the information from the second host is unavailable. The error "Request for information not part of the consolidated data" will be reported in the consolidated DiscoveryAccess. This can lead to missing nodes (licensing Detail, SoftwareComponents, and so on) and relationships on the consolidator. To work around this behavior, scan the original host from the consolidator.

      I have seen this error many times: "Request for information not part of the consolidated data", but I have no idea about the impact.  I am also none the wiser after reading this section of the documentation.  Perhaps an example would allow me to understand it.


      5.9 Direction of Communication Establishment (through Firewalls).

       

      Our security team are concerned about the direction in which the session between Consolidation Server and Scanning Appliance is established.  I believe (if I remember correctly) that the connection is configured on the scanning appliance.  However, after it is configured and when the Appliances are restarted, what is the direction of establishment of the connection?  I believe our Security guys are opposed to any communication that is initiated from the DMZ and into the Corporate zone.

       

      5.10 Errors

       

      When I look at some Discovery Runs and click on Errors, I get different results for the Scanner and Consolidator Servers.  Why do I see different errors on the scanner and the consolidator?  Are errors part of the consolidated data?


      5.11 SNMP

       

      Is SNMP data part of the DDD data?  We have defined Recognition Rules only on the Scanning Appliances (and not on the Consolidation Device).  This seems to work OK.  But if only DDD data is consolidated, how does the recognition get propagated properly?  This seems slightly contradictory, but there is probably something I am missing here.

        • 1. Re: ADDM Discovery – How does it Work (Exactly?)
          Andrew Waters

          2.1.1 - until ADDM gains access to a machine it cannot know what the OS is. Just because you know it is Windows does not mean ADDM knows. So it will try Unix credentials on Windows machines.

           

          2.2 - you would be wrong - most of getDeviceInfo is from running the script. A few additional bits of information like access method, credential used are added to the recovered information.

           

          2.5 - credentials do not time out. ADDM does not remember the best credential; it remembers the last credential. There is a significant difference. It is in some cases impossible and in many cases very time consuming to determine which is the best credential. As a special case, if something was scanned with SNMP and there is a login credential that works, then for Hosts it will attempt to make use of the login credential.

           

          2.8 - discovered data is (possibly parsed and) converted into a standard form and recorded as DDD. So DDD represents all discovered information with little translation. This includes all standard discovery and discovery performed by TPL. It makes absolutely no sense for TPL to control this.

           

          4 - The session has no control over this. If the session is open it will use it otherwise it will create a new session.
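The reuse-or-reopen behaviour described in this answer can be sketched in a few lines of Python. This is an illustration under assumptions only: `SessionPool`, `FakeSession` and the `closed` flag are hypothetical names, not part of any ADDM API.

```python
class SessionPool:
    """Reuse an open session per host; open a new one only when needed."""

    def __init__(self, open_session):
        self._open_session = open_session  # callable(host) -> session object
        self._sessions = {}

    def get(self, host):
        session = self._sessions.get(host)
        if session is None or session.closed:
            session = self._open_session(host)
            self._sessions[host] = session
        return session


class FakeSession:
    """Stand-in for a real ssh/WMI session, used only for illustration."""

    def __init__(self, host):
        self.host = host
        self.closed = False
```

The caller never needs to know whether the session it gets back is the original one or a fresh one, which is why a pattern has no control over it.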

           

          5.2 - No, this is completely incorrect. Once any endpoint has completed discovery it is available to be sent from scanner to consolidator. Things which will stop it are the consolidator having scanning disabled or the services being down. What it does is process one DiscoveryRun at a time in sequence. So it will find all completed endpoints for one scan and send those, then the next run and send those. It does this for performance reasons.
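A minimal Python sketch of that run-by-run batching, assuming completed endpoints arrive as hypothetical `(run_id, endpoint)` pairs; the real transfer mechanism is of course more involved.

```python
def batches_to_send(completed_endpoints):
    """Group completed endpoints by DiscoveryRun, keeping first-seen run order.

    completed_endpoints is an iterable of (run_id, endpoint) pairs.
    """
    batches = {}
    for run_id, endpoint in completed_endpoints:
        batches.setdefault(run_id, []).append(endpoint)
    # dicts preserve insertion order, so runs come out in first-seen order.
    return list(batches.items())
```

Each returned batch holds everything that is ready for one run, so the sender can work through the runs one after another instead of interleaving them.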

           

          5.3 - on hold on the scanner means that it is outside the scan windows. Until an endpoint is complete the scanner will not send the DDD data so there is nothing to send.

           

          5.6 - the results of additional commands are DDD data and hence are sent from scanner to consolidator.

           

          5.7 - optimisations on the scanner are changed to OptRemote on the consolidator to make it possible to distinguish what has been optimised by the scanner and what has been optimised by the consolidator.

           

          5.8 - this means that you tried to perform additional discovery on the consolidator which was not performed on the scanner. Because the consolidator is just using DDD information from the scanner there is no result and hence this error is reported. It is normally caused by having different patterns on the consolidator and scanner (something which is explicitly not supported). What the effect is will depend completely on the pattern. Basically the pattern is most likely not able to build the appropriate inferred model.
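As an example of the mechanism, the Python sketch below replays recorded command results instead of contacting a host. The class names, the lookup key and the sample commands are hypothetical; only the error text is taken from this thread.

```python
class NotInConsolidatedData(Exception):
    """Raised when a pattern asks for something the scanner never recorded."""


class ReplayDiscovery:
    """Answers discovery requests from recorded DDD instead of a live host."""

    def __init__(self, recorded_results):
        # recorded_results maps (host, command) to the output the scanner stored.
        self._recorded = dict(recorded_results)

    def run_command(self, host, command):
        try:
            return self._recorded[(host, command)]
        except KeyError:
            raise NotInConsolidatedData(
                "Request for information not part of the consolidated data"
            ) from None
```

If the scanner's pattern ran `httpd -v` but the consolidator's pattern asks for `java -version`, the lookup fails and the pattern cannot build the node that depended on that output.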

           

          5.10 - No, errors are not part of DDD and hence not consolidated. Errors are normally caused by issues in patterns and generated when a pattern is run.

           

          5.11 - SNMP is just like everything else: the results of SNMP requests are stored in DDD. The recognition does not need to be propagated because the DDD contains details about the device and hence does not need to be rerun.

          • 3. Re: ADDM Discovery – How does it Work (Exactly?)
            Doug Connell

            Good replies.  Thanks for taking the time to answer. 

             

            With regard to 5.9: can I reassure Security that the consolidation connection for each DR is always established from Scanner to Consolidator?

            • 4. Re: ADDM Discovery – How does it Work (Exactly?)
              Andrew Waters

              The scanner always initiates the connection and sending of data.

               

              The communication itself is slightly more complicated - it is likely it will not create a brand new connection but use a previously constructed channel.

              • 5. Re: ADDM Discovery – How does it Work (Exactly?)
                Doug Connell

                Re: your answer to 2.1.1.  I think we are not quite on the same page.  I can't let it go.

                 

                ADDM does know the OS is Windows.  There are some attributes in Device Info Node I think called "OS Inferred by Heuristics" (or words to that effect).  This attribute clearly shows that ADDM thinks the OS is Windows.

                 

                So using a credential (that is only configured to use ssh) on a Windows host, especially after port scanning, when ADDM knows that there is no ssh port open, seems to represent "unusual" behavior.

                • 6. Re: ADDM Discovery – How does it Work (Exactly?)
                  Andrew Waters

                  If a credential is marked for use with ssh and the appropriate port is not open then it will not be used.

                   

                  OS inferred by heuristics means that there are no credentials which work (either they do not work or are not applicable because appropriate ports are not open). At this point in time all the session failures have happened; if ports are open it would have already tried what you term Unix credentials. The heuristics try several methods, which do not use credentials, to intelligently guess the OS.