I had a conversation with a customer this week who was running on CentOS 6 and was not aware of the need to start planning a migration. So, a reminder to read past my whimsy here and get to the point: you need to be on CentOS 7 by the end of November this year.
BMC Discovery provides monthly Technology Knowledge Updates (TKU). These updates include:
- Support for new software products
- Enhancements and fixes for existing software products
- New integrations for Network Devices
- Enhancements and fixes for existing Network Devices
- New and updated patterns for Cloud Discovery and Storage Discovery
- New product content and fixes
With Helix Discovery, these updates are done automatically. Therefore, the following information applies primarily to on-premise BMC Discovery, although Use Case #3 could be helpful in either environment.
Note: It is always recommended to do a backup prior to updating a TKU.
Use Case #1
Problem: During a TKU upgrade, errors like the following occur:
"Change to module <module name> failed because imported name <name> version <version1> from <name> does not match required version <version2> at line <line number>”
Note: The following message is for information only and is not an error:
"Change to module xxx.xxx had warning Deactivating xxx.xxx to use newly activated module"
Root Cause #1: The TKU uploaded was for the wrong Discovery version. For example, a TKU for Discovery 11.1 was uploaded on an 11.3 appliance.
Solution #1: Upload the correct TKU.
Root Cause #2: The modules flagged as "failed" are custom patterns. For example:
"Change to module BAI_Application_aug_sync failed because Imported name 'BAI_Application' version 2.1 from CMDB.BAI_Application does not match required version 1.4 at line 3."
"Change to module CMDB.Extension.ComputerSystem_Augment failed because Imported name 'Host_ComputerSystem' version 2.0 from CMDB.Host_ComputerSystem does not match required version 1.8 at line 15."
These custom patterns have import dependencies on TKU patterns whose version has changed.
Solution #2: Update the custom patterns to use the correct versions of the modules they are importing.
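For instance, for the second example error above, the fix is to raise the version required by the custom pattern's import statement so that it matches the newly activated TKU pattern. This is only a sketch (the exact line depends on your custom pattern), using the module and pattern names from the example:

```
// In custom pattern module CMDB.Extension.ComputerSystem_Augment:
// before (fails, TKU pattern is now at version 2.0):
//   from CMDB.Host_ComputerSystem import Host_ComputerSystem 1.8;
// after, matching the new TKU pattern version:
from CMDB.Host_ComputerSystem import Host_ComputerSystem 2.0;
```

After editing, re-upload the custom pattern module and confirm it activates without warnings.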
Root Cause #3: The modules flagged as "failed" are Storage TKU patterns. For example:
"Change to module NetApp_Storage failed because Imported name 'ConversionFunctions' version 1.1 from ConversionFunctions does not match required version 1.2 at line 12."
Solution #3: This could happen in the following cases:
a) The Storage TKU was applied before the core TKU of the same month. In this case, perform the knowledge updates in the following sequence:
-Update core TKU first (BMC Discovery Technology Knowledge Update)
-Update Storage TKU next (BMC Discovery for Storage)
-Update EDP (Extended Data Pack)
b) The appliance previously had a very old Storage TKU version. In this case, the solution is to deactivate and delete the old Storage TKU.
c) An incomplete download of the core TKU causes a number of core TKU patterns to be missing. To correct this, re-download and reapply the core TKU, then reactivate the Storage TKU modules.
Use Case #2
Problem: While upgrading a TKU, the message "Changes failed because Problem setting pattern state" appears.
Solution: The cause of these errors is difficult to determine and the problem is frequently not reproducible. The following workarounds have proven effective:
1/ Do the following:
- Restart the Discovery services
- Go to Manage > Discovery and "stop all scans"
- Go to Manage > Knowledge and make sure "Auto Cleanup" is checked
- Try to upload the TKU again.
2/ Stop the reasoning service and restart it with the option to deactivate all patterns. See KA 000104293 for details. Once this is done, the patterns for the most recent TKU can be re-activated, followed by any needed custom patterns.
Use Case #3
Problem: Attempts to deactivate some TKU or custom patterns fail because of dependency relationships
Patterns commonly have dependency relationships with other patterns. In some cases, an attempt to deactivate or delete a pattern may fail with a message that some other pattern is dependent on it. Trying to deactivate or delete these dependent patterns then may fail with similar messages for still other patterns.
Solution: It is possible to stop the reasoning service and restart it with an option to deactivate all patterns. See KA 000104293 for details. Once this is done, the desired patterns can be deleted or re-activated as appropriate. For Helix Discovery, contact Customer Support.
Use Case #4
Problem: A TKU upgrade fails with a message like the following:
Change to module xxxxxx failed because TPL version 1.xx is not supported in file at line n.
Root Cause 1: Incompatibility between versions of BMC Discovery and TKU. For example, customer has BMC Discovery 11.2 and is applying the TKU for Discovery 11.3.
Solution 1: Download and apply the correct TKU version for the Discovery version.
Root Cause 2: The TPL version defined in a custom pattern is not supported by the Discovery version being used. For example, customer has BMC Discovery 11.2 (which supports TPL 1.14) and is uploading a pattern with TPL 1.15 defined.
Solution 2: The supported TPL version can be seen in the Manage > Knowledge page. The solution is to revise the TPL version in the pattern to be equal to or less than the supported version.
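The TPL version is declared at the top of the pattern file, so this is a one-line change. For example, on an appliance that supports TPL 1.14 at most, the first line of the pattern would need to read (module name hypothetical):

```
tpl 1.14 module MyCompany.CustomPattern;
```

A pattern declaring tpl 1.15 on such an appliance would be rejected with the error above.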
Root Cause 3: The Discovery version being used does not support the TPL version in the TKU upgrade. For example, customer is using Discovery 11.0 (with no patch) and tries to upgrade to a TKU version >= March 2016.
Solution 3: The TKU release notes would document this with something like "If you are running BMC Discovery version 11.x you must upgrade to 11.x0.1 (or later)". The solution is to upgrade Discovery as directed before applying the TKU.
Use Case #5
Problem: A TKU upgrade fails with a message like the following:
Change to Module <Module_Name> failed because Pattern <Pattern_Name> is deprecated.
Root Cause: This could happen if a TKU pattern was customized (or a beta pattern was provided by BMC Support) in the past, leaving a pattern version greater than the current pattern version defined in the TKU. It could also happen if a pattern module from the wrong Discovery version was mistakenly uploaded.
Solution: Try the following steps:
- Search for the problematic pattern module. It should show a version different than the current TKU pattern.
- Deactivate / delete the incorrect module version
- Activate the new TKU module.
This vulnerability (CVE-2020-1938, "Ghostcat") affects any Tomcat application server running the AJP connector, which listens on port 8009 by default. If an attacker has access to this port, then it "can be exploited in ways that may be surprising": without authentication, they can read any file on the server or servlet container and obtain config files or source code. Further, if the server allows file uploads, they can execute arbitrary code. Yes, that would be surprising.
Versions: 9.0.0.M1 to 9.0.30, 8.5.0 to 8.5.50, and 7.0.0 to 7.0.99 are affected
You may want to start looking in your estate with a basic query such as:
search SoftwareInstance where type = 'Apache Tomcat Application Server'
show version, listening_ports, listen_tcp_sockets, #RunningSoftware:HostedSoftware:Host:Host.name as 'Running on', #RunningSoftware:HostedSoftware:Host:Host.#DeviceWithAddress:DeviceAddress::IPAddress.#DeviceOnSubnet:DeviceSubnet:Subnet:Subnet.ip_address_range as 'Subnet'
to get an idea of what servers are out there. You could then put extra conditions on the versions and listening ports/sockets, and perhaps look at what networks they are on, focusing on less trusted ones first.
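For example, conditions on the affected version ranges could be added. This is a sketch only: the exact version strings stored depend on your data, so adjust the regular expressions as needed:

```
search SoftwareInstance
where type = 'Apache Tomcat Application Server'
  and (version matches '^7\.' or version matches '^8\.5' or version matches '^9\.0')
show name, version, listening_ports
```

Any instance returned with 8009 among its listening ports deserves a closer look.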
Note that although the port may be open on a non-localhost socket, it may still be blocked by a firewall - as is the case for the Discovery appliances. So then things are more complicated: do you trust all local users? In the Discovery appliance case, the local tideway user has access to everything anyway, so accessing the AJP port gives no extra advantage - unless you have created additional local limited-rights OS users for some reason.
Keep an eye on the OSU for updates to the Discovery appliances.
BMC Discovery is capable of scanning a wide variety of SNMP devices. When successful, these usually are modeled as Network Devices, SNMP Managed Devices, or Printers. However, it’s not uncommon to encounter some problems when scanning these devices. This posting will hopefully give you some tools to troubleshoot these problems, as well as some root causes and solutions for specific use cases.
The following information applies to both Helix Discovery and on-premise BMC Discovery, although some references (for example, doing a command line snmpwalk) are relevant only to on-premise BMC Discovery.
Section 1: How to troubleshoot problems with a scan, credential test, or device capture of an SNMP-enabled device
Discovery sometimes experiences access problems when processing an SNMP-enabled device. The most common situation is when a device capture, credential test, or scan fails with "ERROR: SNMP++: SNMP request timeout" or "Device skipped - no SNMP access".
Here’s an example from a device capture:
Note that there are two ways to get a device capture (see https://docs.bmc.com/docs/discovery/113/capturing-snmp-devices-788111406.html):
- from the Discovery Access page
- from the Device Info page (which is reached from the Discovery Access page)
In some cases, doing a capture from a Device Info node may result in a blank screen. When clicking the browser "back" button, it returns to the Device Info page and briefly shows a green banner that says "Device skipped (no SNMP access)", which then fades out after a few seconds:
Here's an example of a timeout message from a credential test:
For a scan, the Discovery Access page will typically have a result of “Skipped (Device is an unsupported device)”. The session result page will show “SNMP++: SNMP request timed out”.
There are many possible reasons for this. Here are some possible causes and things to check:
- Make sure an SNMP credential is present and that its IP range includes the address being discovered (a valid credential is needed for device capture)
- Run a credential test. What is the result?
- Increase the timeout in the SNMP credential to 100 seconds and retry.
- What is the SNMP version (v1, v2c, v3) on the credential? Is the device configured to respond on that SNMP version?
- If the device supports SNMP versions other than the one specified in the credential, change the version in the credential and retry.
- For SNMP v1 and v2c, make sure the community string in the SNMP credential is correct. An invalid community string can cause a "Unable to get the deviceInfo: TRANSIENT" error.
- Ask the device administrator:
- if there might be an Access Control List (ACL) or some other configuration on the device that prevents responses to the Discovery appliance.
- if the device is configured to use the default SNMP port (161). Run nmap to confirm the port is open (see below).
- On the Discovery Access page, click on any links for "script failure" or "xx Session Results related" to look for clues.
- In the case of a timeout during a device capture, check the log in /usr/tideway/var/captures for additional clues
- In the case of a timeout during a scan, turn DEBUG logging on for Discovery, run the scan again, and check the tw_svc_discovery.log for clues. Remember to turn DEBUG logging off!
- In one case, the customer made corrections to the IP range and mask on the device, then was able to discover the device.
- From the Discovery command line, as user tideway, run the following commands and check the results:
1/ Check connectivity from the appliance to the endpoint:
2/ Check the port status of the device by running nmap. For example:
/usr/bin/nmap --privileged -sT -sU -p T:22,U:161 [device_ip_address]
The expected result is that port 161 would have a state of "open" or "open|filtered" :
3/ Do a snmpwalk to the device. For example:
/usr/tideway/snmp++/bin/snmpWalk [device_ip_address] -v2c -cpublic > /usr/tideway/snmpwalk.out
Change the SNMP version and community string as needed. If using SNMP v3, other parameters need to be specified. To see the usage notes with a list of available options, run snmpwalk with the "--help" option.
If snmpwalk also fails, please consult the device administrator.
If the problem persists, please contact Customer Support and provide the results to all the questions / checks above.
Section 2: Specific Use Cases
Use Case #1
Symptom: A scan of a supported SNMP device fails with Skipped / Unsupported device. The Discovery Access page shows a NoAccessMethod result in getMacAddresses (or other methods such as GetPortInfo).
The related Script Failure page may show the error: SNMP++: SNMP request timed out
A device capture may also fail, and the last thing written in the UI is:
Dumping range: Start of the MIB to End of the MIB
ERROR: SNMP++: SNMP request timed out
A credential test may succeed.
This problem can occur on many different devices, and has been observed on routers, load balancers, and some Lexmark printers.
By default, Discovery asks for large chunks of data at one time from the device, using the "Use GETBULK" option. Some devices may be unable to transfer so much data at one time without hitting a timeout. In other cases, the cause may be a problem with the SNMP agent on the device.
The best solution is to correct the problem with the device or its SNMP agent. As a workaround, it is possible to disable GETBULK by editing the appropriate SNMP credential and unchecking the "Use GETBULK" option.
Use case #2
Symptom: A scan of an unsupported network device returns NoAccess instead of Skipped/Unsupported.
A test of the SNMP credential is successful.
The discovery debug log shows that:
- Discovery detects that the sysobjectid is unsupported
discovery.devices: DEBUG: no SysObjectId 1.3.6.1.4.1.388.14 found in MODELS
- Discovery reports that it can get the sysdescr, but it is UNKNOWN
api.audit: DEBUG: <device_ip>: snmp.getSysDesc(): Got system description status = SUCCESS
api.classifier: DEBUG: classify(): processing 'WS5100 Wireless Switch, Revision WS.02.3.3.4.0-009R MIB=01a'
discovery.heuristics.snmp: DEBUG: identifyDevice: <device_ip> sysDescr is UNKNOWN
Root cause: The scan takes too much time trying other credentials (such as SSH) and hits the reasoning timeout of 30 minutes before trying the SNMP credentials.
This can occur when the system description does not contain a known keyword like "cisco" that indicates the endpoint is a network device.
To confirm the root cause, set the Discovery logging to DEBUG and run the scan again. In the discovery log, look for traces like this related to the device:
no SysObjectId <the sysobjectid of the device> found in MODELS
sysDescr is UNKNOWN
Solution: Open a support case to request that the device be integrated into Discovery. Once the device is supported, the SNMP credentials will be used.
Use Case #3:
Symptom: An SNMP scan fails with "Unable to get the deviceinfo: TIMEOUT" after 30 minutes. The Discovery log has "credential failed: SNMP++: SNMP request timed out".
The correct SNMP credential is at the bottom of the credential list.
The Discovery log shows that the scan had 14 SNMP credentials to try, and the first 12 failed. The 13th credential was still being tried when the scan ended.
The 14th SNMP credential (not listed in the discovery log) was actually the correct one for the device and it was at the bottom of the credential list. When this credential was moved to the top of the list, the scan was successful.
Root Cause: The 13th credential (with a uuid of b6c7a4337c564c71870a0a4a50983b4d) did not time out before the scan ended. To identify this credential, the following was run:
-> replacing <scanner> with the actual Discovery scanner hostname or IP address, and using the uuid from the Discovery log.
Looking at the credential, it was found that the timeout was set to 9000 seconds, which exceeds the default reasoning timeout of 30 minutes. This is why the scan timed out.
Solution: The timeout on this credential was lowered to the default value.
Use Case #4:
Symptom: A scan of a supported SNMP device fails with Skipped / Unsupported device. The Discovery Access page shows that getDeviceInfo, getMACAddresses, getIPAddresses, getNetworkInterfaces, and getNames all have status "OK". The getDeviceInfo method has a script failure with message "Ambiguity in determining device kind - falling back to unsupported device".
Root Cause: In the Discovery UI, on the Administration-> Discovery Configuration page, one or more of the following options have been modified to have a value of "No" :
- Use SNMP SysDescr to identify OS
- Always try "public" community when using SNMP to identify OS
- Use Open ports to identify OS
Solution: Change the above options to a value of "Yes", and the network devices will be discovered successfully.
Use Case #5:
Symptom: When scanning a Cisco Nexus device, getNetworkInterfaces fails on TIMEOUT_CallTimedOutOnClient after 30 minutes.
Root Cause #1: Cisco defect CSCtw72949. See https://quickview.cloudapps.cisco.com/quickview/bug/CSCtw72949. To work around this, Discovery always uses the getNext method instead of getBulk to scan these particular devices. This method is slower and can lead to the reported timeout.
Solution #1: Upgrade the Cisco OS. Cisco defect CSCtw72949 is fixed in Cisco NX-OS Release 5.2(1)N1(4) and above.
Root Cause #2: Same symptoms, however the device has been upgraded to a firmware version that includes the bug fix (for example "7.1(4)N1(1c)"). In this case, the root cause is a huge amount of VLANs on the device. To get edge connectivity information, Discovery is requesting info using all these VLANs and is not able to complete this before the reasoning timeout.
Solution #2: Two previous RFEs (DRDC1-10658 and DRDC1-11888) were submitted and changes were included in the September 2018 TKU. However, this may not correct the problem in all cases. An additional RFE (DRDC1-11973) has been submitted to find another way to gather this information in less time. However, as of February 2020, there is no ETA for this request.
The only known workarounds for root cause #2 are:
- Disable edge connectivity. To do this, on the Discovery Configuration page, change "Discover neighbor information when scanning network devices" to NO. Please note that on subsequent scans, all existing host-switch connections will be deleted. Reference: https://docs.bmc.com/docs/display/DISCO113/Edge+connectivity
- Increase the reasoning request timeout. The scan fails because it exceeds this timeout, so raising it can allow the scan to complete. However, use caution: doing so forces Discovery to wait longer for the end of a scan even when the scan cannot finish. This could impact the performance of some scans, and there is no way to quantify this in advance.
The reasoning timeout can be increased with the command below:
tw_options -u system REASONING_REQUEST_TIMEOUT=3600000
When prompted, provide the password for the UI 'system' account.
In this example, the timeout (30 mins by default) is increased to 1 hour. It is not recommended to increase the reasoning timeout to more than 2 hours. A restart of the Discovery services is required for this option change to take effect.
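Since REASONING_REQUEST_TIMEOUT is expressed in milliseconds, the value for a given duration is easy to sanity-check with shell arithmetic before running tw_options:

```shell
# REASONING_REQUEST_TIMEOUT is in milliseconds:
# minutes * 60 seconds * 1000 milliseconds.
echo $((60 * 60 * 1000))      # 1 hour  -> 3600000
echo $((2 * 60 * 60 * 1000))  # 2 hours -> 7200000 (the suggested maximum)
```

The first value matches the 3600000 used in the command above.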
Section 3: SNMP v3 specific use cases
Use Case #6:
Symptom: An SNMP v3 scan of a supported network device fails with “Skipped (Device is an unsupported device)”, or possibly a timeout in getMACAddresses.
A credential test fails with "SNMP request timed out". Increasing the timeout to 100s does not help.
An snmpwalk from the Discovery command line is successful.
Other SNMP devices are discovered by the appliance using the same credentials.
After a restart of the Discovery services (or all the services on all cluster members), the credential test succeeds, and the device is discovered successfully.
Root cause 1: Defect DRUD1-25505 - Two or more network devices present the same EngineID, which is supposed to be unique. Discovery scans the first one successfully, but then the second one fails until the cache is flushed (by the service restart) - at which point a rescan of the first would fail, and so on.
To confirm the root cause:
- If the scan works with SNMP v2 and fails with SNMP v3, the root cause is probable.
- If the SNMP v3 scan (or the credential test) fails and then works after an appliance restart, the root cause is confirmed.
- It is also possible to confirm the problem with the query below:
search NetworkDevice show name, type, vendor, model, #InferredElement:Inference:Associate:DiscoveryAccess.endpoint as 'Scanned via', #InferredElement:Inference:Associate:DiscoveryAccess.end_state as 'End State', #InferredElement:Inference:Associate:DiscoveryAccess.#DiscoveryAccess:DiscoveryAccessResult:DiscoveryResult:DeviceInfo.snmpv3_engine_id as 'SNMP v3 Engine Identifier'
If two Network Devices have the same snmpv3_engine_id, the problem is confirmed. Otherwise, restart the appliance, rescan the devices, then re-execute the query above. This is needed because NetworkDevice nodes are only created after a successful scan (which is not possible until the appliance is restarted in this case).
This query will only show the snmpv3_engine_id that Discovery was able to find. If workaround #1 below (service restart) was not used, the issue may occur even if the query above does not return anything wrong.
If the following command is executed from the appliance:
sudo tcpdump -i any -s0 host <ipAddress> -w /tmp/snmp_issue.cap
The dump may show the elements below. This is not enough to confirm the cause but it is compatible with it.
Discovery sends a get-request
Device sends a report with 1.3.6.1.6.3.15.1.1.4.0 (usmStatsUnknownEngineIDs)
Discovery sends a get-request with the EngineID
Device sends a report with 1.3.6.1.6.3.15.1.1.5.0 (usmStatsWrongDigests)
Workaround 1: Try one of the following:
1A- Restart the Discovery services of a standalone appliance (or the Discovery services of all members of a cluster) before scanning any of the devices that are using a duplicate engine id. Each restart will allow Discovery to scan a single one of the N devices with duplicate engineIDs. If the services are restarted once or twice a day, it could allow Discovery to scan the devices affected by this issue with a reasonable probability of success.
1B- Rescan with SNMP v2
1C- Upgrade to a version that resolves DRUD1-25505. As of January 2020, this is still in progress. When available, this change will allow the scans and credential tests to succeed even when the SNMP engine ids are duplicated.
Note that workaround 3B below (root cause 3, SNMP_USE_ENGINE_ID_CACHE) will not help if root cause 1 is confirmed.
Solution 1: Change the SNMP v3 engineID of the scanned device and make it unique. This is recommended for security reasons.
For Cisco Devices, it is possible to make it unique using MAC addresses: see https://supportforums.cisco.com/discussion/11539996/snmp-engineid-same-multiple-routers.
It may be possible to execute a similar procedure for other vendors, such as HP.
Note this solution is not suitable when pairs of master/standby devices share the same engineid (see root cause 2 below).
Root cause 2: Some devices (such as Cisco firewalls, Brocade load balancers, or Juniper devices) can be configured in an Active/Backup setup (also referred to as master/standby). This means that the active and backup devices are two different physical devices, but they share internal configurations to support failover. As they share configurations, they also share SNMP v3 engineIDs, even though the SNMP v3 security standard requires engine IDs to be unique per device.
Workaround 2: Use the workarounds provided for root cause 1.
Root cause 3: A new network device was found, then replaced (on the same IP, i.e. was installed with a new MAC address and SNMP v3 EngineID), and rescanned.
Workaround 3: See workarounds 1A and 1B above
A) Upgrade to a version that resolves defect DRUD1-25505.
B) If not already done, upgrade to Discovery version 11.2 or a patch release that includes the SNMP_USE_ENGINE_ID_CACHE option, and execute the command below:
(enter system password)
Please note that this solution could, in theory, have an impact on appliance performance.
Use Case 7:
Symptom: When using SNMP V3, a scan of a supported SNMP device fails with various “USM” error messages
Problem: SNMP V3 scan fails with "SNMPv3: USM: Authentication failure". In some cases, the device can be discovered using SNMP v2c, but fails when using SNMP v3.
Solution: Try the following suggestions:
- The error indicates an authentication problem (for example, the digest being invalid). Please check the credential, making sure it is valid for the specified device.
- Verify that the authentication and privacy passwords are correct. Check with the device administrator.
- Test the credential by running snmpGet from the appliance command line, using SNMP V3 parameters. Run "/usr/tideway/snmp++/bin/snmpGet" to see usage notes. If snmpGet also fails, please consult the device administrator.
Problem: SNMP V3 scan or credential test fails with "SNMPv3: USM: Unknown SecurityName".
Solution: This error means that the SNMPV3 security name being used is unknown to the device. To confirm, run snmpwalk using the same SNMPV3 parameters as the Discovery credential. If snmpwalk reports the same error, consult with the device owner about the correct security name to use.
It's also possible to check the security name from the command line by running 'tw_vault_control -S'. The security name will appear (when applicable) like this:
snmp.v3.securityname = '<the value that you set>'
Check for any extra spaces or unprintable characters at the end of the security name.
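One way to expose such characters is to pipe the relevant line through cat -A, which appends $ at each end-of-line so trailing spaces stand out. The entry below is simulated (the security name 'admin ' is made up) so the idea can be tried anywhere; on an appliance you would pipe the tw_vault_control -S output instead:

```shell
# Simulate a vault entry with a trailing space inside the security name,
# then expose it with cat -A ($ marks end-of-line, so the stray space
# before the closing quote becomes visible).
printf "snmp.v3.securityname = 'admin '\n" | cat -A
# -> snmp.v3.securityname = 'admin '$
```

A space sitting between the last character of the name and the $ marker is exactly the kind of invisible mismatch that causes "Unknown SecurityName".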
Problem: SNMP V3 scan fails with Skipped / Unsupported device. A device capture fails with "no SNMP access". A snmpWalk from the appliance command line reports “SNMPv3: USM: Decryption error”.
Solution: A decryption error typically indicates that there is some problem with the Network Device security configuration. Please check with the device administrator and/or network team to confirm that these SNMP V3 values on the credential are valid for the device:
- Authentication Protocol
- Authentication Key
- Privacy Protocol
- Privacy Key
Problem: SNMP V3 scan fails in getMACAddresses with "SNMPv3: USM: Message not in TimeWindow”. Running snmpWalk from the Discovery command line succeeds, and device MAC addresses can be found in the snmpWalk result. This indicates that the configured SNMPv3 credential has the required permissions to retrieve MAC addresses from the device.
Solution: The probable cause is that the device's SNMP agent is not able to properly process Discovery's GETBULK requests. As Discovery can't find the output for the getMACAddresses method within the defined time window, it reports "SNMPv3: USM: Message not in TimeWindow".
To confirm this, create a separate SNMPv3 credential just for this device and uncheck "Use GETBULK". Move this credential to the top and temporarily deactivate the original credential. Re-scan the device. If the scan completes, contact the device owner/support and ask them to check why the SNMP agent is not able to process the GETBULK requests.
While it might not quite be Spring (northern hemisphere-centric) I have already seen an odd daffodil so I am going to pretend it is.
There have been a few posts in the past discussing the importance of keeping the Discovery Access nodes under control. My previous post is from several years ago, so it is time to revisit, based on an actual recent customer experience.
First, we looked at a 2-member scanning cluster. Its performance was mainly OK from a user perspective, since no direct reporting was done on it. However, it occasionally suffered from strange symptoms - mainly stuck scans that wouldn't finish or couldn't be cancelled. I have noticed that when the datastore gets too large, it can lead to this sort of irregularity - and since we saw a large number of DDD nodes on the statistics page, we planned to make the DDD removal more aggressive (the "Directly Discovered Data removal" setting from 28 days to 14 days, in the Model Maintenance page). We had planned for a couple of days of non-scanning, to allow the removals to complete as quickly as possible.
We did this - but unfortunately we had dramatically underestimated the time these deletes would take in the datastore. After a few days, we could still see the model process chugging away: the persistence queue (/usr/tideway/var/persist/reasoning/engine/queue) filled up with hundreds of thousands of files, reduced, and filled up again.
Due to the pressure of getting scanning started again, we decided to do a model wipe, which is a very quick operation. Thankfully we did not need to worry about a root node key export as these were scanners with no direct CMDB connection. All data would be refreshed after scanning was resumed.
Once the scanners had been reconnected to the Proxies, and scanning started, service was restored with no reliability problems.
The next week, we started to observe similar performance problems on the 3-member consolidation cluster. For context, we were scanning about 50 k hosts, it had been 2 months since the last compaction, and we were running about 4 million DDD nodes. Firstly, we performed a compaction with the intention of speeding up subsequent deletes: this reduced the datastore to 66% of its original size.
We wanted to reduce the aging below the existing 14 days, but the UI option has limited granularity:
and did not have our desired value of 10 days. Moreover, having been bitten by the experience on the scanners, we wanted to change in 1-day increments to make sure deletions were finished before we reduced further. The way to do this is via the tw_options command. If run like this, it will show the current setting (in seconds):
So we changed to 13 days like this:
restarted the services, and confirmed the UI showed the new "custom" value:
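Since tw_options reports this setting in seconds, the day values we stepped through convert as follows (this is only the arithmetic; the option name itself is as shown by the tw_options output above):

```shell
# Convert retention periods in days to the seconds value tw_options expects:
# days * 24 hours * 3600 seconds.
for days in 14 13 10; do
  echo "$days days = $((days * 24 * 3600)) seconds"
done
```

So 14 days is 1209600 seconds, 13 days is 1123200, and the 10-day target is 864000.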
Then, we monitored the model process (CPU usage) and persistent queue. The next morning, we were confident it had reached a steady-state, so reduced for another day and repeated until we reached 10 days. The statistics graph looked like this:
which clearly shows that as we reduced the time (yellow), batches of deletion work were added (red/orange) which, once processed, resulted in a dramatic reduction in the total DA count (blue).
Consolidator performance is now acceptable again, and we plan to do another compaction in a few months.
An issue that BMC Discovery customers sometimes encounter is when their Discovery appliance becomes either low on disk space, or runs out of disk space completely and will not start. Below are some of the resources available to solve this problem (note that this applies to On-Premise Discovery).
- The utilization issue will usually be related to one of two partitions, and there are previous blog posts with information about both of these situations:
The /usr partition (or the /usr/tideway partition in CentOS 7)
The /mnt/addm/db_data datastore partition
- A detailed video which explains how to diagnose and solve Discovery Appliance disk space issues can be watched here:
- There are good Knowledge Articles on the subject, most of which are linked in the blog posts above. My favorite is:
It includes commands that can be used to diagnose where the utilization problem may be, common causes of partitions filling up, and methods to address the problem.
- There is documentation on a number of related subjects, including:
The built-in Disk Space monitors are described at the link below, which covers their purpose, how to configure them, and how to view and manage disk space on your appliances:
Adding new disks to your appliance or cluster:
- If the above are not sufficient to solve your disk space issue, feel free to open a Support case. Please include the following:
The output of the command df -h run from your appliance cli.
The output of the command du -h /usr/tideway | sort -rh | head -n 20 run from your appliance cli.
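When sorting du -h output, GNU sort's -h (human-numeric) flag handles the K/M/G suffixes that a plain numeric sort misreads. A quick local illustration of the idiom - any directory works, /tmp is just a convenient example:

```shell
# Sort human-readable du output largest-first; -h on sort pairs with -h on du.
du -h /tmp 2>/dev/null | sort -rh | head -n 5
```

The first line of output is the largest directory in the tree, which is usually where to start digging.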
I try not to think too much about getting old. It's a terrible thing, but it is better than the alternative. I can no longer pretend I am not at least seriously middle-aged, and although it doesn't stop me going to an occasional club night (see you with Digitalism at Village Underground in February?), hangovers seem to take longer to clear. And my 80s cultural references don't seem to resonate with customers when I am old enough to be their mother.
Similarly, software has a lifecycle - such as Discovery's, which we document here. When we released Discovery 11.3 in March 2018, the standard VM platform was based on CentOS 7, so new installations should have used that. However, we maintained an option for an in-place upgrade of existing earlier versions to 11.3 on CentOS 6. This route was taken by some customers as it was a least-effort task that did not involve procuring new CentOS 7 VMs and migrating. It should be noted that this is the last in-place upgrade; the next release should necessitate moving to CentOS 7.
Regardless of which OS version you are running, it is highly recommended to keep up to date with our OSUs to maintain the latest stability and security fixes. Patches are obtained from the CentOS feed, which in turn is based on what Red Hat releases. Their policy is that all updates (including security) for CentOS 6 will cease this year, on 2020-11-21; see here. Discovery appliances are not open to the wider Internet, so are not exposed to external attacks, but they still contain sensitive information about an organisation's estate. Sometimes OS updates are made that fix only theoretical vulnerabilities - e.g. in a library that we don't use in a way that could trigger the problem. But still, it is certainly advisable and simpler to be up-to-date.
You will see that the CentOS date is a bit earlier than the end of "full support" for 11.3 on 2021-03-21. So a reasonable question is whether BMC will offer any kind of OS updates between these two dates. I have recently learned from Product Management that all CentOS 6 updates will cease after the earlier date, in November this year. So, in order to be assured of maintaining the security of the OS, you need to plan to migrate to new CentOS 7 VMs this year (either 11.3 or the latest on-premise version at the time).
I logged Docs defect DRUD1-28495 to try and get a clarification on the main support page.
Happy New Year to all! Welcome to the roaring 20s even if we still don't have our flying cars.
Numbers are amazing things, but so ubiquitous that most of us probably don't think of them in any detail, unless you happen to be a number theorist. One of the first things we learn as a child is to distinguish one object from another, and then to give labels to different sizes of collections, and the ordering within them: the cardinals and ordinals. Then we learn the rationals, reals and the wonders of complex numbers and vectors at school. That covers what is needed for most of Science and Engineering, although you might use extensions like quaternions and tensors. In more advanced maths and theoretical physics you might encounter esoteric beasts, which I sometimes try to get my head around. My favourites are Cantor's transfinite numbers, the surreals, Grassmann anti-commuting numbers and p-adic numbers. As an aside, all natural numbers are provably interesting.
To try and drag myself back to a (more prosaic) point: numbers are also used to version software. This is, of course, a fundamental way of keeping track of what set of features and defects are bundled up in a release, what the support status is - and perhaps what the state of stability or security is. Historically, most software I encountered used a Semantic Versioning scheme, for example three or four groups of digits, like:
We currently use this scheme for Discovery; at the time of writing, the latest on-premise version is 11.3.0.5 (in our scheme the third digit group is always zero; this is patch 5 on major version 11, minor version 3). This scheme has advantages:
- Major version of 0 indicates pre-release software that is not production-ready
- Major version jumps indicate a large feature set uplift, and/or large architectural changes
- A Patch release should only contain bug fixes, not new (or retired) features.
- Compatibility, effort of upgrading and risk can be estimated: Usually it's reasonable to suppose the upgrade from 9.X to 10.0 is going to be longer/harder/more significant than, say, 10.0 to 10.1.
But even in this scheme, things are not that simple. The popular PuTTY client has been going for 20 years and is still on version 0.X - and yet it's stable and commonly used in production. In relation to the last bullet point, I have had at least one customer who asked us not to add fixes in a minor release because that would take extra effort to get through change control; they wanted *exactly the same code* to be called a patch release so that fewer testing steps would be needed. The major downside to this scheme that I can see is that it is not obvious from the version *when* the release was made, so a table like this is required.
Don Knuth's versioning schemes of TeX and Metafont are, let's say... idiosyncratic. They asymptotically approach π and Euler's number, respectively (currently 3.14159265 and 2.7182818). Cute as this might be, thankfully these are exceptional examples.
My first professional OS (I don't count writing university reports in GEOS) was SunOS 4 on SPARC (4.1.3 was a classic release). But when Sun moved from BSD-style SunOS 4 to SVR4-based SunOS 5, the latter was rebranded as Solaris 2.X, with the former series retroactively renamed as Solaris 1. But after Solaris 2.6, the "2.X" prefix was dropped, so thereafter we had Solaris 7, 8, 9, 10, 11... and then we started getting point releases again to the current 11.4. Under the covers, though, the SunOS reference is still visible:
Java versioning made a similar change; things made a jump after JDK 1.4 to "Java 5", but internally, the "1." prefix still exists for (say) Java 8:
except when it's not. For example, this Java 11 installation:
Perhaps the most public change was after Windows 8; there was no Windows 9, and Windows 10 was treated as a different product. Its versioning adopted the "modern" trend: the Calendar Versioning scheme.
I can't help thinking that these marketing changes make things more complex than necessary.
Although I perceive this to be a "modern" trend, some systems have been using it for several years. Notably Ubuntu Linux's first version in Oct 2004 was version "4.10". BMC has moved over to this scheme for Remedy and CMDB from version 9.1 to 19.05, as can be seen here. It is expected that Discovery will do something similar for the next on-premise release, although the last I heard no decision had been made as to exactly what this will be. The clear advantage is:
- The release year and month should be obvious.
Again, reality does not always coincide with the ideal. For example, OpenWrt's 18.06 release was in July 2018. Perhaps you can forgive one month difference. But OpenWrt 19.07 is still not released, at the time of writing (scheduled for Jan 2020). Windows 10 1809 only made public release in November. 1703 had a public release in April. Moreover there are some disadvantages:
- It's not clear from a date whether there is a major or minor change, and the associated benefits/risks/effort.
My understanding is that this is not supposed to be a problem once all software release cycles move closer to a continuous, "agile", model: many small releases, where the whole concept of a major release goes away. This is fine as a theoretical limiting case, but I am yet to be convinced it can always be achieved in practice. There are some large changes that just can't be broken down into a series of smaller ones.
I note that even BMC's version support documentation seems rather confused to me. It was updated recently (Dec 2019) to include the format:
- YY.YY= 4-digit year
But what does this mean? Take "CMDB 19.11": only the two digits "19" are a year, so does that mean "11" is the minor release? If so, large changes can only take place once a year (at a major version). Moreover, the wording actually asserts exactly one major architectural change per year. That can't be right. And there is no provision for service pack/patch numbers here. I have an open question with management from before Christmas; I'll let you know if I get clarification.
All this complexity makes it hard for Discovery to store consistent data too. For SoftwareInstances, we have two attributes:
- version ("full version"): the internal version, with as much detail as possible
- product_version ("Product Version"): a higher-level "marketing" version
An example would be for SQL Server:
How does Discovery record Windows 10? Not very well, IMHO: we set the version to 10 and don't record the YYMM version at all. I logged defect DRUD1-25673 back in 2019-03, and I am hopeful this will be fixed in the next on-premise release. Related is how this would be pushed to the CMDB. The BMC_OperatingSystem class has fields VersionNumber and MarketVersion, but we currently make no attempt to distinguish these in the mappings:
I think we should distinguish them; for example, for Windows:
- MarketVersion : Server 2019
- VersionNumber : 10.0.17763
or for HP-UX:
- MarketVersion : 11i v2
- VersionNumber : 11.23
I have DRUD1-26198 open for this, but so far no sign of a fix schedule. As part of a Premier Support contract, I had to resort to writing a (simple) custom pattern for my customer, who originally could only see the "11i" part of their extensive HP-UX estate.
I dedicate this desultory post to the biggest number: 32768.
A nice power of 2, of course.
Perhaps a nice green.
For me, $8000 is the start of the Commodore 64 cartridge memory space.
But if you have some Hewlett-Packard SAS Solid state drives, this number is the time (in hours) that they will live. You can't make this up, but it seems that without a firmware patch, they will die irrecoverably after 32768 hours.
I don't seem to have access to any HP drives; does anyone have any that are reported by:
search DiskDrive show vendor, model processwith countUnique(0)
Hello Discovery Community,
We have recently released some new features in BMC Helix Discovery as a Service (DaaS), as well as in the December TKU, and I am excited to share some of the details with you. Some of these items came directly from you via your interactions in the community, whether through an idea or a general discussion.
Below is a brief outline of the new features in 19.11:
Available in BMC Helix Discovery as a Service (DaaS) Only
Integration with Credential Management Systems
We have added to DaaS the ability to integrate with the following external credential management systems. You can now configure the integration with these providers using the vault providers page in the BMC Helix Discovery Outpost.
- Integrating with BeyondTrust Password Safe
- Integrating with Centrify Identity Platform
- Integrating with CyberArk Enterprise Password Vault
- Integrating with Thycotic Secret Server
ServiceNow CMDB Sync
With BMC Helix Discovery 19.11, you can now set up CMDB synchronization with ServiceNow natively within DaaS. The integration syncs your BMC Helix Discovery data to a ServiceNow CMDB with standard data mappings that can be filtered and extended. Note: This feature requires an additional BMC Helix Discovery license. If you are interested in learning more, please reach out to your account manager.
Available in the December 2019 TKU
Enhancements to Cloud Data Model
Based on feedback from multiple clients, we have introduced a change in how we model cloud data. We now separate the cloud data by the account that it belongs to. This makes it clear which cloud services belong to which team (cloud account).
If you discover more than one AWS Account, more than one Azure Subscription or more than one GCP Project, all the data from Cloud Region through to individual nodes within services will be clearly separated, where before it was intermingled.
As a result, the keys of all CloudRegion and CloudService nodes, and many contained nodes will change, even if you only discover a single account. If you synchronize to a CMDB, the identities of the corresponding CIs will also change.
More information can be found here.
Enhanced AWS Role-Switching
Based on feedback from clients who are scanning their cloud environments, we are introducing a new method for AWS credential management. You can now configure an AWS account that is given a list of AWS roles within the AWS console. Discovery then needs only that single AWS credential configured in the vault, and it will be able to discover cloud services for all roles that the AWS account has been given access to. This will streamline the setup of AWS credentials and associated scan ranges within Discovery.
**Note on role-switching**
The new configuration to support role switching cannot be added automatically to existing AWS credentials, and consequently, any existing scheduled AWS scans using those will fail. The workaround is to simply click *Edit* on the scheduled scan and then click *Apply*. Using the *Edit/Apply* workaround enables you to continue scanning AWS without interruption.
More information can be found here.
New Offering - BMC Discovery for Data Center - Red Hat Edition
Back at the beginning of November, we announced the extension of the Full Support End Date for BMC Discovery v11.1 to September 15, 2020. In that email, I mentioned the future availability of a new edition of Discovery running on Red Hat Enterprise Linux 7. That edition is now available, with the same functionality as BMC Discovery v11.3. If you are interested in migration and pricing details, please contact your account manager.
We are excited to release these new features and offerings and I welcome your feedback as we continue to introduce new features to both DaaS and on-premise. Be on the lookout for DaaS features showing up in future on-premise releases.
Lead Product Manager
Traditionally, we have supported two types of Windows proxies:
- Credential - Windows credentials are stored in the appliance and passed to the Cred Proxy during scanning as required
- Active Directory - The proxy's service runs under an AD account, and is able to scan targets that trust that domain/account without any credentials being stored in the appliance
Windows scanning has been problematic for some because of security concerns: in order to have really useful data, you have to assign Administrator permissions to the proxy, and this is considered too risky. Some customer mitigations have included:
- Running a Credential proxy, with the credentials managed by an external credential manager (currently CyberArk; other integrations are in progress)
- Running Windows scanning infrequently, in a tightly controlled time window, where the account is otherwise disabled.
An alternative which we have not hitherto documented is to use group Managed Service Accounts (standalone Managed Service Accounts were introduced in Windows Server 2008 R2; group MSAs in Windows Server 2012). I would be interested to hear your thoughts on how you manage the security implications of Windows scanning; do you think gMSA use would help manage this problem?
It is expected that full support for gMSA will be made in the next major release for proxies/outposts as part of DRUD1-27034.
Thanks to Roland Appleby for much of the material in this post.
These were tested with Discovery 18.104.22.168 and Windows Server 2019. PowerShell commands should be run under Administrator or equivalent.
Obtain KDS root key for your domain
On the Domain Controller, run the following PS command:
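A sketch of the check, using the standard ActiveDirectory module cmdlet:

```powershell
# List the KDS root keys known to the domain; empty output means none exists yet
Get-KdsRootKey
```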
If this shows that you have a KDS root key, skip the next step.
Run the following PS command to create the root key:
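The standard cmdlet for this is:

```powershell
# Create a new KDS root key; despite the parameter name, allow time for
# Active Directory replication before using it (see the note below)
Add-KdsRootKey -EffectiveImmediately
```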
You will then need to wait 10 hours before continuing.
Create a domain security group for the proxy host
On the Domain Controller, run the following PS command:
New-ADGroup "BMC Discovery Proxy" -GroupCategory Security -GroupScope Global -Path "DC=npgs,DC=bmc,DC=com"
(Modify the path as required for your domain, and here the Security Group Name has been chosen as "BMC Discovery Proxy")
Add your proxy host to this security group with this PS command:
Add-AdGroupMember -Identity "BMC Discovery Proxy" -Members PROXYSERVER$
where PROXYSERVER is the proxy server host.
Create the gMSA
On a Domain Controller, run the following PS command:
New-ADServiceAccount -Name "bmc-disco-proxy" -DnsHostName "bmc-disco-proxy.bmc.com" -PrincipalsAllowedToRetrieveManagedPassword "BMC Discovery Proxy"
(note that "BMC Discovery Proxy" here must match the Security Group Name created above)
Install the gMSA on the proxy host
Reboot the proxy host to ensure that it is up to date with respect to the group membership.
Run this PS command on the proxy host:
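Assuming the gMSA name chosen in the earlier step ("bmc-disco-proxy"), a sketch using the standard ActiveDirectory cmdlets is:

```powershell
# Install the gMSA on this host (requires the ActiveDirectory RSAT module)
Install-ADServiceAccount -Identity "bmc-disco-proxy"

# Optional check: returns True if this host can retrieve the managed password
Test-ADServiceAccount -Identity "bmc-disco-proxy"
```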
where the name of the gMSA given here is arbitrary. You should see the new gMSA:
Add the gMSA to the local administrators group on the proxy host
Run this PS command on the proxy host:
Add-LocalGroupMember -Group "Administrators" -Member "npgs\bmc-disco-proxy$"
where "npgs" is the name of the AD domain. Alternatively, use the Windows UI tools.
Configure the Discovery Proxy to run as the gMSA account
These steps assume that you already have an Active Directory proxy installed. Stop the proxy service (if running) and change how the service logs on:
The account name should be of the form "npgs\bmc-disco-proxy$", where "npgs" is the AD domain. The password fields should be left blank. I have found that once set, this tab is greyed out, and I could not change the account without deleting and re-creating the proxy service.
Grant the gMSA account permissions to discover hosts in the domain
The gMSA account needs to have the appropriate permissions to allow the Discovery Proxy access to the hosts in the domain that it is scanning. This can be done by either adding the gMSA account to an appropriate Domain Administrators group, or by adding the gMSA account to the local Administrators group on each machine individually.
It should now be possible to scan Windows hosts in the domain once a Discovery appliance has been configured to use the proxy.
Any patterns provided in the communities, provided by Users and by BMC employees are not part of the product, and are not necessarily thoroughly tested.
Use at your own risk.
NOTE: Please test all new patterns in a TEST environment!
Customers frequently ask for help with custom patterns in the Community.
Here is some general information to help you get started.
See this YouTube video: How to add or modify BMC discovered data and synchronize to CMDB? - YouTube
TPL - The Pattern Language. Some users use the terms "TPL" and "pattern" interchangeably.
A module (contained inside a TPL file) contains one or more pattern or syncmapping statements.
There are 2 types of TPL customization for the BMC Discovery product:
1) Discovery patterns typically add and/or modify Discovered information.
There are many OOTB Discovery patterns (defined by the TKU).
There may also be custom Discovery patterns.
The keyword "pattern" is used to define a pattern.
2) syncmappings define the behavior of the CMDB Sync process.
There are many OOTB syncmappings (defined by the TKU).
There may also be custom syncmappings.
The keyword "syncmapping" is used to define a syncmapping.
Should I edit the OOTB patterns and syncmappings?
If you edit the OOTB pattern, the edits will be lost with the next TKU update.
How can I change the behavior since I should not edit the OOTB patterns and syncmappings?
1) To modify the behavior of Discovery patterns, there are 2 methods:
- Preferred method: add an additional (custom) Discovery pattern
Each discovery pattern has a trigger statement. If the trigger succeeds, then the pattern body is executed.
Example of Discovery pattern:
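As a minimal sketch (the module name, pattern name, attribute, and triggering SI type below are invented for illustration; the tpl version should match your appliance):

```
tpl 1.13 module Example.Custom_SI_Augment;

pattern Example_SI_Augment 1.0
  """Illustrative only: adds a note attribute to a triggering SoftwareInstance."""
  overview
    tags example;
  end overview;

  triggers
    on si := SoftwareInstance created, confirmed where type = "Apache Webserver";
  end triggers;

  body
    si.example_note := "set by custom pattern";
    log.info("EXAMPLE 1: triggered on %si.name%");
  end body;

end pattern;
```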
- Discovery Override pattern
Sometimes, an Override pattern is required to modify the OOTB Discovery pattern behavior.
Only use this method when absolutely required.
An Override pattern is only to be used when an Extension cannot provide the desired functionality.
An Override pattern redefines the entire OOTB pattern, with special additional syntax to Override the base pattern.
2) To modify the behavior of the CMDB syncmappings, you can add one of these 2 types of custom patterns:
- Syncmapping Extension (also called "augment")
Syncmapping extensions can add new data to the CMDB, and/or modify data in the CMDB.
Example: if you wish to add the age_count information for a Host to the CMDB, use an Extension.
Features of Syncmapping Extensions:
- Extensions can add new attributes to the CMDB, or modify attribute values.
- Extensions can add new relationships to the CMDB.
Limitations of Syncmapping Extensions:
- Extensions cannot delete attributes from the CMDB.
- Extensions cannot delete or modify relationships in the CMDB.
Examples of Syncmapping Extensions:
- Syncmapping Override
Sometimes, a Syncmapping Override is required to modify the OOTB behavior because of limitations of Syncmapping Extensions.
Always use a Syncmapping Extension when possible.
An Override is only to be used when an Extension cannot provide the desired functionality.
An Override redefines the entire OOTB syncmapping, with special additional syntax to Override the base pattern.
Example of CMDB Syncmapping Override pattern:
Writing a custom pattern is like writing a small software program.
Here are some best practice steps for writing/testing a pattern:
1) Write the pattern on your laptop using a text editor.
Or, try the experimental IDE: Experimental development IDE for TPL
Hint: It is less confusing to always edit the pattern from your text editor or IDE instead of editing the pattern directly in the UI.
Hint: Add log.info statements while you are debugging your pattern. Later, you can change those from log.info to log.debug.
Hint: Put an identifier (such as your name) in each log.info statement so that you can easily identify that it is your log statement.
Hint: It may be helpful to number each log statement so you can easily find it in the pattern.
log.info("LISA 1: Pattern was triggered. Here I am in the pattern body. Host name=%host.name%");
log.info("LISA 2: Inside the 'for each fsmount' loop. fsmount=%fsmount.mount%");
2) Upload the pattern from the Manage->Knowledge page
The Upload action checks for syntax errors in the pattern, and if there are no syntax errors, then it Activates the pattern.
3) Fix syntax errors in your text editor or IDE, and then Upload again in the UI until there are no more syntax errors
4) If you added new attributes (which are needed by your CMDB syncmapping) to your CMDB,
then do this once after adding the new attributes: Restart the Discovery Services
5) Test the pattern / Fix the pattern / Test / Fix until you are happy with the pattern
2 ways to test Discovery patterns:
A) Create a Manual Group and use the "Run Pattern" feature:
With "Run Pattern", you will see the log.info and log.debug statements on the UI page easily.
The "Run Pattern" tells you if any of the nodes in the Manual Group triggered your pattern.
If your pattern triggers on a SoftwareInstance, then you must have a SoftwareInstance in your Manual Group.
If your pattern triggers on a Host, then you must have a Host node in your Manual Group.
And so on.
B) Run the Discovery of the Host/Device which should trigger your pattern.
Look for your log statements in this log file: /usr/tideway/log/tw_svc_eca_patterns.log
Make sure your Discovery pattern gets triggered
The Discovery patterns have a "triggers" statement. The trigger statement is very important to define correctly.
If the trigger statement does not succeed, then the pattern body will not be executed.
Example of a trigger statement:
on process := DiscoveredProcess created, confirmed where
cmd matches regex "(?i)CouchDB\S+\\w?erl\.exe" or cmd matches unix_cmd 'beam'
4 ways to test CMDB Syncmappings:
A) Pick a pertinent Device and choose Actions->CMDB Sync (from the Consolidator)
B) With Continuous Sync running, Scan the Device (from the Scanner)
C) Perform a Resync from the Consolidator (This is long-running. Only do this when necessary).
D) Pick a Host/Device and choose Actions->CMDB Sync Preview (from the Consolidator)
If your syncmapping syncs to a new class such as BMC_FileSystem, then check the preview visualization for the new class.
If your syncmapping only changes/adds attributes, you will not see any change in the preview visualization.
All of the above actions will log data to this log file: tw_svc_cmdbsync_transformer.log
Actions A,B,C will change the data in the CMDB.
Action D will not change the data in the CMDB. It is only a Preview. But, it logs the messages to the log file.
Hint: If the CMDB Sync Preview does not work, then there is something very wrong with your pattern.
Resources to help with custom patterns.
1) The Discovery UI:
The Discovery UI has some information about creating SoftwareInstance patterns magically through the UI:
At the top, there is specific information about modeling a SoftwareInstance in Discovery.
If you wish to have a custom pattern to create an SI, be sure to read that section, as well as the documentation that it points to.
Also, in the Discovery UI are some sample templates:
Sample SI pattern templates:
Sample "location" pattern templates:
Sample pattern template to create a SQL Integration Point:
Sample pattern template which adds additional SQL calls:
Sample Mainframe pattern templates:
Sample External Event pattern template:
Sample CMDB Syncmapping templates:
To utilize one of these pattern templates, perform the following steps:
A) Download the pattern, and save it to your laptop
B) Use an editor on your laptop to edit and save the pattern.
Names surrounded by double dollar signs like $$pattern_name$$ should all be replaced with values suitable for the pattern.
C) Upload your pattern on the Manage->Knowledge page to see if there are syntax errors.
If there are, edit and upload again until the compile errors are gone.
D) Test your Discovery pattern using a Manual Group and the "Run Pattern" feature: Executing patterns manually - Documentation for BMC Discovery 11.3 - BMC Documentation
With "Run Pattern", you will see the log.info and log.debug statements for your pattern.
If you run Discovery without "Run Pattern", you will need to look for your log statements in this log file: tw_svc_eca_patterns.log
E) To test your CMDB Syncmapping, you can add log.info statements into the pattern, upload the pattern, run CMDB Sync,
and then check for the log statements in this log file: tw_svc_cmdbsync_transformer.log
2) The Community
Visit the Discovery community link: Discovery
Click on "Patterns" as seen below:
You will find sample patterns which are made freely available by other customers, and by the BMC Discovery support team and developers.
3) Look directly at the OOTB patterns that are found in the TKU
You can look at the patterns in the UI.
And/Or, you can unzip the TKU zip file, and review the abundance of patterns.
To view the patterns in the UI, you can look on the Manage->Knowledge page.
You will find the syncmapping patterns on that page, under BMC Discovery Operation -> CMDB Sync:
4) Training
The Advanced Discovery training course has information about Discovery patterns and custom Discovery patterns.
To date, the course does not teach the CMDB syncmappings.
5) Customer Support can help with certain questions
Support will not write custom patterns or custom syncmappings for you. But, Support may be able to help with specific questions or problems.
Support has access to some samples that may not be in the community. Certain sample patterns are attached to internal KA's.
6) BMC Consulting or BMC Professional Services
This past Wednesday, if you are subscribed to TKU release announcements, you would have received an email from me announcing the latest TKU availability. As a reminder of the timing changes from this summer, the TKU and OSU releases are made available on EPD on the first Wednesday of the month. SaaS customers on BMC Helix Discovery will have the latest TKU applied to their Development environment on the first Wednesday of the month and their Production environment on the second Wednesday of the month.
We have a lot of exciting new content to announce and the details can be found on the October 2019 TKU Release page. Highlights include several new patterns for software products, enhancements to existing software patterns, and general bug fixes. In this latest release, we also introduced 38 new network devices, details can be found on the TKU October 2019 Network Devices page. In addition, we have introduced 4 new cloud services across Azure and Google Cloud. To find out more about the cloud providers and services within those providers that we can discover, visit Supported Cloud Providers.
We look forward to your feedback on the content that we are delivering via the monthly TKU releases. Drop a comment below or reach out to me directly.
Have a good weekend everyone!
Product Manager, BMC Discovery
Just a quick note on syslog forwarding from the Discovery appliance.
While the Discovery appliance (physical or virtual) is based on a fairly standard CentOS build, we are careful to control the packages and configurations to ensure the OS layer is reliable and predictable for the application. Thus, although it is tempting for an experienced Linux administrator to configure things to their liking, this urge should be resisted: changes should be limited to only those things that are explicitly documented, to avoid problems in future and potentially voiding support.
One often-requested configuration was to forward OS syslogs to a remote syslog collector. Since we hadn't officially described it in the docs, it wasn't officially supported. I am pleased to say we now have, here.
It's very simple to set up, and now, if your organisation's policies require or recommend it, you can do so while remaining fully within the appliance support rules.
As part of Premier Support, I was recently on-site at a customer for a few days, doing some "mini consultancy" work, mainly looking at extending Network Device discovery. Here, I want to make some notes to highlight some defects/surprising behaviour, and some of the things I was able to help the customer with.
Standard Network Device discovery
Many customers deploy SNMP credentials to discover Network Devices, and are quite happy with the coverage of supported devices, and/or the turnaround of adding new ones in monthly TKU updates after submitting a new device capture. Typically, Discovery is used for basic inventory recognition (being synced to CMDB) and importantly the discovery of the connection between a switch and the Host nodes it is connected to. However, the customer I was working with wanted to dig deeper into the data...
Problems and Gaps Identified
No Linkage between interfaces and IPs
In contrast to Host nodes, the interfaces and IPs of a Network Device are not shown in a unified table. Instead they are displayed separately, and by default there is no connection in the UI or data model between an IP and the interface it is connected to. It turns out that if you turn on virtual interface discovery (see Managing network device virtual interface discovery), a side effect is that you do get a link from IP to interface and vice versa. I logged defect DRUD1-25944 for this.
Further, my customer wanted a more unified UI for the network interface table, like we provide for Hosts. DRUD1-272124 is logged for this. In the meantime, I was able to provide my own "hotfix" to the core code to get a just-acceptable display.
We document how to enable virtual interfaces (Managing network device virtual interface discovery); however, IMHO this document is lacking in several ways. It only mentions how it controls virtual interface discovery. It doesn't mention interface-IP linkage as a side effect. Why does it have to be controlled on the command line rather than via a UI option? Why would you not want it on by default - are there any downsides? If yes, what are they? I created docs defect DRUD1-26743 to improve this.
Not all network interfaces discovered
By turning on virtual interface discovery, more interfaces are discovered (see above). However, core code maintains a whitelist of "interesting" interface types:
0 # unknown
6 # ethernet csmacd
7 # iso88023 csmacd
8 # iso88024TokenBus
9 # iso88025TokenRing
15 # fddi
62 # fastEther
69 # fastEtherFX
71 # ieee80211
117 # gigabitEthernet
54 # propMultiplexor
161 # IEEE 802.3ad Link Aggregate
and drops any that don't match this list. The list was added a long time ago and is, IMHO, no longer appropriate; this was logged as defect DRUD1-26655, planned for fix in the FF release, tentatively targeted for 2020-02. As part of Premier Support, I was able to provide the customer a temporary update to remove the filter until then.
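The filtering behaviour described above amounts to a simple membership test on each interface's IANA ifType value. A hedged sketch of the idea in Python - the whitelist values are taken from the list above, and the row format is illustrative rather than Discovery's internal representation:

```python
# ifType values the core code treats as "interesting" (from the list above).
INTERESTING_IFTYPES = {0, 6, 7, 8, 9, 15, 54, 62, 69, 71, 117, 161}

def filter_interfaces(rows, whitelist=INTERESTING_IFTYPES):
    """Keep only interfaces whose ifType is in the whitelist."""
    return [r for r in rows if r["ifType"] in whitelist]

rows = [
    {"ifDescr": "Gi0/1", "ifType": 6},    # ethernetCsmacd - kept
    {"ifDescr": "Vlan10", "ifType": 53},  # not whitelisted - dropped
    {"ifDescr": "Po1", "ifType": 161},    # 802.3ad link aggregate - kept
]
print([r["ifDescr"] for r in filter_interfaces(rows)])  # ['Gi0/1', 'Po1']
```

Removing the filter, as in the temporary update mentioned above, is equivalent to passing a whitelist containing every ifType - i.e. keeping all rows.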
Cisco firmware image file not discovered
A simple custom pattern was written to extract the firmware image file from OID 22.214.171.124.126.96.36.199.1.73 and populate the Network Device node, by calling the discovery.snmpGet() function. RFE DRDC1-13530 was logged to request this OOTB, and this Idea (feel free to vote on it) was raised at the request of Engineering.
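The pattern logic itself is very small: one scalar SNMP GET, then set an attribute on the device node if a value came back. The Python sketch below illustrates that shape only - snmp_get is a hypothetical stub standing in for Discovery's discovery.snmpGet() TPL function, and the OID and attribute name are illustrative:

```python
# Illustrative sketch: fetch one scalar OID and store it as an attribute
# on the device node. snmp_get is a hypothetical stand-in for Discovery's
# discovery.snmpGet() TPL function; real code runs inside a pattern body.

def snmp_get(device, oid, canned_responses):
    """Stubbed SNMP GET: look the OID up in canned responses."""
    return canned_responses.get(oid)

def enrich_device(device, oid, canned_responses):
    value = snmp_get(device, oid, canned_responses)
    if value:  # only set the attribute when the GET succeeded
        device["firmware_image"] = value
    return device

device = {"name": "core-sw-01"}
responses = {"1.2.3.4": "flash:c2960-image.bin"}  # illustrative OID/value
print(enrich_device(device, "1.2.3.4", responses))
```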
Interface statuses not discovered
Chassis and cards are only in Directly Discovered Data
As part of core discovery, we create DiscoveredCard and DiscoveredChassis nodes, but these are not visible from the main Network Device page. Also, ultimately this information will need to be consumed in the CMDB, and it is not recommended to attempt to write a sync mapping directly from DDD. So, I wrote a custom pattern to copy the data from the DDD into two lists of Detail nodes, one per type, and created links from the cards to their corresponding containing chassis:
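The chassis-card containment the pattern recreates is available in ENTITY-MIB: each entPhysicalTable row carries an entPhysicalContainedIn pointer to its parent entry. A rough sketch of the grouping logic in Python - the sample rows are illustrative, and entPhysicalClass 3 ('chassis') and 9 ('module') are the relevant class values:

```python
# Group ENTITY-MIB entPhysicalTable rows into chassis and the cards they
# contain, using entPhysicalContainedIn as the parent pointer.
CHASSIS, MODULE = 3, 9  # entPhysicalClass values

rows = {  # keyed by entPhysicalIndex (illustrative sample data)
    1: {"class": CHASSIS, "descr": "Catalyst 4506 chassis", "parent": 0},
    2: {"class": MODULE, "descr": "Supervisor Engine", "parent": 1},
    3: {"class": MODULE, "descr": "48-port line card", "parent": 1},
}

def cards_by_chassis(rows):
    """Return {chassis descr: [card descrs]} from entPhysicalTable rows."""
    result = {i: [] for i, r in rows.items() if r["class"] == CHASSIS}
    for r in rows.values():
        if r["class"] == MODULE and r["parent"] in result:
            result[r["parent"]].append(r["descr"])
    return {rows[i]["descr"]: cards for i, cards in result.items()}

print(cards_by_chassis(rows))
# {'Catalyst 4506 chassis': ['Supervisor Engine', '48-port line card']}
```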
This has been logged as an improvement, DRUD1-26654, with a tentative fix targeted around 2019-11.
DiscoveredCard nodes missing descriptions
While looking at the data for the above point it was found that most DiscoveredCard nodes have no description. We think there is more data available in the MIB than we are pulling; this was logged as improvement DRDC1-13628.
Routing protocol entries not discovered
My customer was interested in extracting specific entries for the different network protocols that may be configured: BGP, OSPF, and the Cisco-specific EIGRP. It was a fairly simple matter to write a custom pattern to pull entries from the three SNMP tables and create three lists of Detail nodes corresponding to these entries.
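Each protocol has its own table - bgpPeerTable in BGP4-MIB, ospfNbrTable in OSPF-MIB, and a Cisco-specific MIB for EIGRP - so the pattern is essentially one walk-and-copy per table. A simplified sketch of that grouping in Python, over illustrative sample rows rather than real SNMP results:

```python
# Collect per-protocol neighbour/peer rows into separate lists, mirroring
# the three Detail-node lists the custom pattern creates.

# Illustrative rows as walks of the three tables might return them.
walked = [
    ("BGP", {"peer": "192.0.2.1", "remote_as": 65001}),
    ("OSPF", {"neighbour": "192.0.2.5", "state": "full"}),
    ("EIGRP", {"peer": "192.0.2.9"}),
    ("BGP", {"peer": "192.0.2.2", "remote_as": 65002}),
]

def group_by_protocol(rows):
    """Return {protocol: [entries]} ready to become Detail nodes."""
    details = {"BGP": [], "OSPF": [], "EIGRP": []}
    for proto, entry in rows:
        details[proto].append(entry)
    return details

details = group_by_protocol(walked)
print({p: len(v) for p, v in details.items()})
# {'BGP': 2, 'OSPF': 1, 'EIGRP': 1}
```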
This additional data that is now in Discovery needs to be populated into the CMDB, so I shall need to write some custom sync mappings.