I am not sure I understand discovery properly.
I have documented the discovery procedure (as I understand it) as thoroughly as possible below, on the assumption that if I don't understand it, there are probably others who are struggling to understand it too.
I have highlighted all my queries in Yellow. (This information is based on ADDM v9.02.)
- 1 Device Recognition
- 2 Standard Discovery
- 3 Pattern Execution
- 4 Additional Discovery
- 5 Consolidation
1 Device Recognition
1.1 Port Scanning
The ADDM Appliance first scans a selected set of ports, as defined in the Discovery configuration. It scans about 10 ports by default, including 135 (Windows RPC) and 22 (SSH).
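As a rough illustration of what a port scan like this involves (this is not ADDM's code, and the function names `tcp_probe` and `scan_ports` are my own), a minimal Python sketch could look like this. The probe is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket

def tcp_probe(ip, port, timeout=1.0):
    """Return True if a TCP connection to ip:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success instead of raising
            return sock.connect_ex((ip, port)) == 0
        except OSError:
            return False

def scan_ports(ip, ports, probe=tcp_probe):
    """Return the ports from `ports` that the probe reports as open."""
    return [port for port in ports if probe(ip, port)]
```

Calling `scan_ports("192.0.2.10", [22, 135])` would then return whichever of the two ports accepted a connection.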
1.2 Header information
ADDM also interprets the header information returned by the various protocols to try to determine the type of device that has been discovered (e.g. Windows, Linux, etc.).
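I do not know which rules ADDM actually applies, but the general idea of guessing an OS from a protocol banner can be sketched as follows. The hint table and the function name `guess_os_from_banner` are assumptions made up for this example.

```python
# Hypothetical hint table: (substring to look for, OS type to report).
# The real heuristics in ADDM are not documented in this post.
BANNER_HINTS = [
    ("windows", "Windows"),
    ("microsoft", "Windows"),
    ("ubuntu", "Linux"),
    ("debian", "Linux"),
    ("linux", "Linux"),
    ("sunos", "Solaris"),
    ("solaris", "Solaris"),
]

def guess_os_from_banner(banner):
    """Return a best-guess OS type for a protocol banner, or "Unknown"."""
    text = banner.lower()
    for needle, os_type in BANNER_HINTS:
        if needle in text:
            return os_type
    return "Unknown"
```

A guess like this can obviously be wrong, which may be one reason why more than one type of credential is tried later on.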
1.3 Reverse DNS Lookup
At some point, ADDM does a reverse DNS lookup of the IP address. The FQDN is linked to both the DA Node and the Host Node via a node kind called FQDNList. The reverse DNS lookup is important for Windows Discovery (see Kerberos Issue below).
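To check whether a given IP address has a working reverse (PTR) record, something like the following Python sketch can be used. It relies only on the standard library call `socket.gethostbyaddr`; the wrapper name `reverse_lookup` and its `resolver` parameter are my own and exist only to make the example easy to try.

```python
import socket

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname for `ip`, or None if the reverse lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return None
```

If this returns None for a Windows host, that host is a candidate for the Kerberos problem described in the next section.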
1.3.1 Kerberos Issue
ADDM establishes WMI connections using DCOM/Kerberos, and I think the hostname returned by the reverse DNS lookup is used somehow as a key for the Kerberos-encrypted session. Without a hostname, this connection can fail to be established. WMI is supposed to fall back to the older protocol (NTLM), but I believe Microsoft is dropping support for this protocol (need to check). To be honest, I do not completely understand this area, but I have encountered issues here. I believe that reverse DNS lookup must be activated (this is a checkbox) on your DNS servers for all hosts in your Windows environment.
1.4 DeviceInfo Node
At some point, ADDM then creates a DeviceInfo Node. The DeviceInfo Node is created even when the DA end_state is "NoAccess". In the DeviceInfo node, the OS Type is determined by "Heuristics".
2 Standard Discovery
After completing the port scanning, ADDM tries to log in to each IP address using the credentials configured on the ADDM Appliance. For Windows AD Proxies, ADDM uses the credentials defined in the "Run As" service.
ADDM will try all credentials for all devices. For example, if you look at the SessionResults for a Windows Host, you can see that ADDM attempts to use many different types of credentials including UNIX credentials.
2.1.1 UNIX Credentials used for Windows Hosts?
On many of our Windows hosts that returned "NoAccess", you can see that ADDM has tried to log in using UNIX credentials. Why? I don't understand this. sshd can be installed on Windows, but ADDM has no ssh platform script for Windows, so why does ADDM try UNIX credentials for Windows hosts? Is this done in case the heuristics misidentified the OS?
2.1.2 SNMP Use for Windows
I have seen that some of our Windows Hosts have been discovered via SNMP.
2.1.3 Optimization of Login Credentials
ADDM remembers which credential was successful during the last scan and then retries this credential on the next discovery.
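The optimization described above amounts to trying the remembered credential first and falling back to the full list only if it fails. A minimal Python sketch of that idea follows; the function names and the `try_login` callback are hypothetical and say nothing about how ADDM implements it internally.

```python
def order_credentials(credentials, last_successful=None):
    """Return the credentials with the previously successful one first."""
    if last_successful in credentials:
        rest = [c for c in credentials if c != last_successful]
        return [last_successful] + rest
    return list(credentials)

def first_working_credential(credentials, try_login, last_successful=None):
    """Try each credential in order; return the first that logs in."""
    for cred in order_credentials(credentials, last_successful):
        if try_login(cred):
            return cred
    return None  # in this sketch, None stands for a "NoAccess" outcome
```

With a remembered credential that still works, only one login attempt is made instead of one per configured credential.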
One of the first sections (methods) in the Solaris platform script is getDeviceInfo, which runs commands such as uname and looks at files such as /etc/release and /etc/resolv.conf (for the dns_domain).
I don't believe that the getDeviceInfo method is used to populate the DeviceInfo Node at all. The getDeviceInfo method contributes to information in the Host node. This confuses me.
All the other methods in the platform scripts are executed at this point.
During standard discovery, ADDM runs the getNetworkInterfaces method. On UNIX, this is part of the platform script (e.g. solaris.sh) and it runs ifconfig. On Windows, it is a WMI query.
2.4 Scan Optimization (IP-Addresses)
Every host may have more than one IP-Address. It would be very inefficient to scan every IP for every host. At some point, ADDM will mark one IP Address on each host as the preferred IP to use for discovery. All secondary IP-Addresses will be marked as NotBestIP and the scan will be skipped.
2.5 Scan Optimization Timeout
IP addresses are recycled and changed, and hosts are retired, so IP addresses that are skipped should periodically be re-evaluated. This is controlled by the parameter Scan Optimization Timeout (see Model Maintenance).
I believe this parameter only affects the IP addresses that are marked as NotBestIP. What about credentials? ADDM also remembers the best credential to use. If a Windows server is currently being scanned by SNMP, but the customer then configures a service account for that server, will ADDM stop using SNMP and switch to WMI?
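The interaction between the NotBestIP mark and the timeout can be pictured with a small Python sketch. It assumes (my assumption, not something BMC has confirmed) that each skipped IP address carries a timestamp of when it was marked and that the skip is honoured only while that mark is younger than the timeout. The function name `should_skip` and the dictionary layout are invented for the example.

```python
from datetime import datetime, timedelta

def should_skip(ip, not_best_ip_since, now, timeout):
    """Skip an IP only while its NotBestIP mark is younger than `timeout`.

    `not_best_ip_since` maps an IP address to the time it was marked.
    """
    marked_at = not_best_ip_since.get(ip)
    if marked_at is None:
        return False  # never marked as NotBestIP: scan normally
    return now - marked_at < timeout
```

Once the mark is older than the timeout, the IP address is scanned again, which covers the case where it has been reassigned to a different host.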
2.6 FQDN Issue on Windows
At about this time, ADDM sets (or re-affirms) the Domain and FQDN of the host. This information may come from the network interfaces configured on the Windows host (even disabled interfaces). When a physical Windows host is migrated to a virtual host using the P2V process, old network interfaces from the physical server (even disabled ones) are scanned. So if a server is migrated from DEV to PROD (using the P2V process), the FQDN and Domain may not match. It is a smallish issue, but I find it confusing.
2.7 Single Session
BMC tells me that ADDM opens a single session to the target host for both Standard Discovery and Additional Discovery. The session is held open so that the credentials do not have to be re-authenticated. I guess this is done for performance reasons.
2.8 DDD Data Collection
All the information gathered by the platform scripts is shipped back to the ADDM Appliance. This information is called DDD (Directly Discovered Data). When the data arrives back at the Appliance, it is interpreted and all the DDD Nodes are created.
I am guessing that this initial interpretation of DDD data is not controlled by TPL – but is built-in.
3 Pattern Execution on the ADDM Appliance
3.1 Patterns Are Triggered
The creation of the DDD nodes in the database triggers execution of the patterns. Patterns are written in TPL (The Pattern Language). TPL only executes on the Appliance.
TPL never executes on the Slave or the discovered host.
4 Additional Discovery
Embedded within the TPL of the patterns, there may be commands such as:
The TPL executes additional commands on the target host. The target host is revisited. The session to the host should still be open. Yes?
4.1 Optimization for Additional Discovery
Optimization is not used for Additional Discovery. See table below.
Implemented In ADDM?
On each host, ADDM uses only one IP address for discovery. All other IP addresses are marked as NotBestIP.
Skipped IP addresses are re-evaluated in case the IP has been reassigned to a different host.
ADDM remembers the last successful credential (and associated protocol) and uses it for the next discovery.
The credential is re-evaluated in case a different credential and protocol has been implemented on the target server.
ADDM does not execute Additional Discovery if the Standard Discovery is the same. ADDM uses the stored results from the last discovery.
If the configuration of the server changes (as determined by the Standard Discovery), ADDM re-executes Additional Discovery.
Not Implemented. RFI Suggested.
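Since the last optimization in the table is not implemented, the following Python sketch only shows what the suggested behaviour could look like: fingerprint the Standard Discovery result and re-run Additional Discovery only when the fingerprint changes. Everything here (the function names, the cache layout, the use of a SHA-256 hash over JSON) is an assumption for illustration and not an ADDM feature.

```python
import hashlib
import json

def fingerprint(standard_result):
    """Hash a Standard Discovery result in a key-order-independent way."""
    canonical = json.dumps(standard_result, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def additional_discovery(host, standard_result, cache, run_additional):
    """Reuse stored results unless the Standard Discovery result changed."""
    key = fingerprint(standard_result)
    cached = cache.get(host)
    if cached is not None and cached[0] == key:
        return cached[1]  # unchanged: reuse results from the last discovery
    result = run_additional(host)  # changed or first scan: revisit the host
    cache[host] = (key, result)
    return result
```

The cost of the idea is that anything Additional Discovery would notice but Standard Discovery does not (for example a version change inside an application) would go undetected until the fingerprint changes.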
5.1 When does Consolidation Run?
During a Discovery Run, Consolidation is triggered. Normally one sees the Consolidation occurring on the consolidation server slightly after the discovery run on the Discovery Appliance. However, the two jobs normally overlap.
I assume that after each DA has been completed for a target host, consolidation is triggered somehow.
5.2 Delayed Consolidation
Occasionally one may see that Consolidation is delayed. It does not start until the Discovery Run on the discovery appliance has completed for all target devices. This problem points to a performance issue, but I do not understand the root cause or why it is an intermittent problem.
5.3 On Hold
Sometimes, the whole job will be marked as being "On Hold" and Consolidation will never get started. I do not understand the root cause or why it is an intermittent problem.
BMC tells me that only the DDD data is sent to the Consolidation Appliance. Inferred data (i.e. Nodes) is not consolidated. Inferred Nodes are re-inferred on the Consolidation Appliance. I wonder why this decision was made?
Session Results are not sent to the consolidator. Session Results report on the success or failure of each attempt to log in to the target device. To debug NoAccess issues, it is best to perform this activity on the Scanning Appliance (and not the Consolidation Appliance).
5.6 Additional Discovery
I was told by BMC that the Consolidation Appliance does not establish any connections to the target host. This makes sense because there are no credentials for the target host on the Consolidation Appliance.
The Documentation says:
The consolidated data is the BMC Atrium Discovery Directly Discovered Data (DDD) nodes including the data collected by the patterns. The data inferred by the scanners, for example, Software Instance nodes, is not consolidated, but the consolidator will infer it again (based on its pattern configuration).
Additional Discovery mostly collects version information, which is stored in the SoftwareInstance and SoftwareComponent Nodes. This is not DDD data. So how does the information from Additional Discovery get propagated to the Consolidation Appliance? Is it trickled back, or sent as a big chunk with the DDD data? Is this data stored in the database? If so, what are the Node Kinds that are used?
Discovery Access results that are marked as NotBestIP on the scanning appliance are marked as OptRemote on the Consolidation Appliance. Why? What is going on here? Why is the attribute value changed?
5.8 Missing information when patterns run commands on other hosts
The documentation says:
When a host is discovered and patterns are triggered which run commands on a second host, the DDD on both hosts is updated. When the original host is consolidated, the DDD on the second host is not available to the patterns that trigger on the consolidator. When the second host is consolidated, the DDD created on it when discovering the first host is not included. Consequently the consolidator will always report that the information from the second host is unavailable. The error "Request for information not part of the consolidated data" will be reported in the consolidated DiscoveryAccess. This can lead to missing nodes (licensing Detail, SoftwareComponents, and so on) and relationships on the consolidator. To work around this behavior, scan the original host from the consolidator.
I have seen this error many times: "Request for information not part of the consolidated data", but I have no idea what the impact is. I am also none the wiser after reading this section of the documentation. Perhaps an example would allow me to understand it.
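Here is my attempt at a toy model of the documentation paragraph quoted above, written in Python. It is only my reading of that paragraph, not a description of ADDM internals: the host names, command names, record layout and both functions are invented. The key assumption is that when a host is consolidated, only the DDD created on that host during its own discovery is shipped.

```python
# Toy records on the scanner. "created_during" is the host whose discovery
# produced the record; "on_host" is the host the command actually ran on.
# All names and commands here are invented for illustration.
scanner_records = [
    {"on_host": "hostA", "created_during": "hostA",
     "command": "list-processes", "output": "app-server"},
    {"on_host": "hostB", "created_during": "hostA",
     "command": "query-db-version", "output": "12.1"},
    {"on_host": "hostB", "created_during": "hostB",
     "command": "list-processes", "output": "database"},
]

def consolidate(records, host):
    """Ship only the records created on `host` during its own discovery."""
    return [r for r in records
            if r["on_host"] == host and r["created_during"] == host]

def lookup(consolidated, host, command):
    """What a pattern on the consolidator sees when it asks for a result."""
    for r in consolidated:
        if r["on_host"] == host and r["command"] == command:
            return r["output"]
    raise LookupError(
        "Request for information not part of the consolidated data")

consolidated = (consolidate(scanner_records, "hostA")
                + consolidate(scanner_records, "hostB"))
```

In this model the second record (a command run on hostB while discovering hostA) is never shipped, so a pattern on the consolidator that asks for it fails with the error, and whatever that pattern would have built from the result (for example a SoftwareComponent) would be missing on the consolidator.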
5.9 Direction of Communication Establishment (through Firewalls)
Our security team is concerned about the direction in which the session between the Consolidation Server and the Scanning Appliance is established. I believe (if I remember correctly) that the connection is configured on the scanning appliance. However, after it is configured and the Appliances are restarted, in which direction is the connection established? I believe our security guys are opposed to any communication that is initiated from the DMZ into the Corporate zone.
When I look at some Discovery Runs and click on Errors, I get different results on the Scanner and the Consolidator. Why do I see different error results from the scanner and the consolidator? Are errors part of the consolidated data?
Is SNMP data part of the DDD data? We have defined Recognition Rules only on the Scanning Appliances (and not on the Consolidation Appliance). This seems to work OK. But if only DDD data is consolidated, how does the recognition get propagated properly? This seems slightly contradictory, but there is probably something I am missing here.