SNMP Troubleshooting and SNMP Patterns

Version 8

    This document is for troubleshooting low-level SNMP problems occurring on ADDM appliances.  I also cover ADDM SNMP customizations using the discovery.snmp* TPL functions.

     

    Motivation

    The procedures here are for finding out why low-level SNMP failures are occurring.  These problems should result in DiscoveryAccess Skips (with UnsupportedDevice) or access failures (NoAccess), but they can also result in DiscoveryAccess Errors.  The latter happens most often with non-core devices like cameras and IP phones that have poor SNMP implementations.  Postnote:  I've since expanded this document to cover SNMP Pattern development.

     

    If you find out what specifically is causing the failures, you will be in a much better position to explain the errors, to get the device owners to make configuration fixes (or disable SNMP on the devices), to raise an issue with the device vendor, or to work with ADDM support to work around or resolve the problem.

     

    These problems sometimes occur even when no SNMP credential is configured.  Except in rare cases (where custom ADDM settings have been applied), ADDM attempts public SNMP access on devices to get basic identifying information even if you have no SNMP credentials enabled, and this can trigger the problem.

     

    This document does not cover regular SNMP access failures.  These can be corrected by providing the needed network connectivity and configuring the SNMP credentials correctly.  Credential tests are useful for this.  You should definitely eliminate these easier possibilities before jumping into detailed SNMP troubleshooting.

     

    The remainder of this document assumes that the problem is not with network access to the device or with a misconfigured SNMP credential.

     

    Use the ADDM UI

    Drill down on the DiscoveryAccesses for attempts on the device to see the DeviceInfos.  If DeviceInfos are present, scrutinize them for useful information like the device type, operating system, vendor, and model.  Often ADDM will fail to generate any DeviceInfo.

     

    Perform credential tests: both the general credential test that walks through all applicable credentials in order, and credential-specific tests.  See what effect disabling and re-ordering different SNMP credentials has.  The credential that you intend to use for this device may not be the one eliciting the problem (for example, the problem could be specific to the SNMP version of a credential that you don't need for this device).  In that case, you may be able to avoid the problem by re-ordering the credentials or limiting the range of the problematic one.

     

    Search ADDM to see if you have ever scanned this device successfully.  Once you know the device model, search ADDM to see if you are able to scan any other instances of the model successfully.  Try credential tests on other instances of this model.

     

    Verify SNMP Connectivity with Nmap

    If your scanner can't connect to the SNMP agent, then you're obviously dead in the water.  From the scanner, run as root:

    nmap -sU -pU:161 <DEVICE_ADDR>

    Look at the value under STATE.  If it says "open" then you are good.  If it doesn't, then either the device doesn't have an SNMP agent running (on the expected, default port) or something (like a firewall) is blocking access to it.
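
    If you want to script this check, here is a minimal sketch.  It assumes Nmap's usual port table, where the line for the port starts with 161/udp and the second column is the state.  The function name snmp_port_state is made up for this example.  Be aware that UDP scans often report "open|filtered" when Nmap gets no reply at all, which is not the same as "open".

```shell
# Hypothetical helper: read nmap output on stdin and print the STATE
# column of the 161/udp line (prints nothing if that line is absent).
snmp_port_state() {
    awk '$1 == "161/udp" { print $2 }'
}

# Usage, as root on the scanner (DEVICE_ADDR is a placeholder):
#   nmap -sU -pU:161 "$DEVICE_ADDR" | snmp_port_state
```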

     

    Use a Web Browser

    In many cases, the device will provide a web interface for management or monitoring.  A login will usually be required before you can see details about the device.  Nevertheless, you will usually be able to determine the device type, if not the vendor and model.

     

    Use SNMP++ Utilities

    I am going to document the most useful applications of the SNMP++ utilities.  Before ADDM version 9, ADDM shipped with net-snmp-utils instead of SNMP++.  If you are using pre-9 ADDM, you will have to convert the command syntaxes to that of the corresponding net-snmp-utils commands.

     

    If you have multiple instances of the problematic device model, run SNMP++ commands against them and compare results.  If you can successfully scan at least one of them with ADDM, then you have an excellent control case.

     

    I can find no man pages for SNMP++ on the web, but you can get syntax help by running any of the SNMP++ commands (on an appliance) without any arguments.  The net-snmp-utils documentation can be useful because the SNMP++ commands emulate what the net-snmp-utils commands do; just be aware that the command syntaxes are a little different.  When looking at the documentation, be aware that much of it is concerned with trap functionality that is not relevant to ADDM.

    Generally, be very skeptical of SNMP++ because it is much less reliable than net-snmp-utils.  Known limitations:

    • The -t switch of the SNMP++ commands does not work.  If latency means that you need more than the default timeout of 1 second, the SNMP++ commands will be useless to you (net-snmp-utils does not suffer from that problem, but I am not taking the time to document how to install net-snmp-utils).
    • SNMP++ ignores community strings that contain hyphens.

    Always check the first line of output from the SNMP++ programs, which echoes many of the parameters actually used (SNMP version, community string, timeout, retries).  In many cases, this will show you that SNMP++ is ignoring some parameter that you have specified.

     

    For shell scripters, be aware that the SNMP++ commands are not very well-behaved.  The commands do not return meaningful exit statuses and do not route messages appropriately to stdout and stderr.  They also echo the community string used to the screen (a practice widely recognized as reckless for at least 20 years).
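
    If you capture SNMP++ output to pass along to device owners or support, one way to cope with both quirks is to merge the two streams and strip the community string before saving.  The following is only a sketch.  The function name snmp_redact and the <REDACTED> marker are invented for this example, and it knows nothing about the SNMP++ output layout; it simply replaces every literal occurrence of the string you give it.

```shell
# Hypothetical filter: replace every literal occurrence of the community
# string (first argument) in stdin with the text <REDACTED>.
snmp_redact() {
    # The string is passed through the environment so that awk does not
    # process backslash escapes, and matched with index() so that regex
    # metacharacters in the community string are treated literally.
    SNMP_REDACT_COMM="$1" awk '
        BEGIN { comm = ENVIRON["SNMP_REDACT_COMM"]; n = length(comm) }
        {
            line = $0; out = ""
            while (n > 0 && (i = index(line, comm)) > 0) {
                out = out substr(line, 1, i - 1) "<REDACTED>"
                line = substr(line, i + n)
            }
            print out line
        }'
}

# Usage (2>&1 because messages are not routed reliably; COMM and
# DEVICE_ADDR are placeholders):
#   snmpGet "$DEVICE_ADDR" 1.3.6.1.2.1.1.1.0 -v2 -C"$COMM" 2>&1 | snmp_redact "$COMM"
```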

     

    You need a command-line shell as user tideway, and it will be much more convenient if you add the SNMP++ binary directory to your shell search path like this:

    PATH="$PATH:$HOME/snmp++/bin"
    

    SNMP++ commands generally use syntax like:  COMMAND <DEVICE_ADDR> [OID] [-options...]

    DEVICE_ADDR may be a hostname (which must obviously resolve successfully) or an IP address.

    OID and options are optional and whether you need them depends on the specific command and what you want to do.

    SNMP version defaults to 1, so if you want to query with SNMP version 1, you don't need to specify the version.

    Community string value defaults to "public", so if you want to use that community string you don't need to specify it.

     

    • To fetch the device's sysDescr value using the public community string and SNMP version 1:
      snmpGet <IPADDR> 1.3.6.1.2.1.1.1.0
      
      To do the same thing with a community string other than public and SNMP version 2c:

      snmpGet <IPADDR> 1.3.6.1.2.1.1.1.0 -v2 -C<COMM_STRING>

      SNMP version 3 requires more authentication settings.  If you're going to run multiple queries against a v3 device, you will probably want to put the command into a script file.  For any file containing a password, protect it with a command like chmod 0600 file.name and remove the file when you are finished with it.

      snmpGet <IPADDR> 1.3.6.1.2.1.1.1.0 -v3 -sn<SEC_NAME> -sl<SEC_LVL> -auth<AUTH_PROTO> -priv<PRIV_PROTO> -ua<AUTH_PWD> -up<PRIV_PWD>

      Run just snmpGet, with no arguments, to see all of the available options and their default values.
    • To fetch the device's sysObjectID, use a snmpGet command just like the previous commands, but specify an OID of 1.3.6.1.2.1.1.2.0.  The sysObjectID value retrieved should begin with 1.3.6.1.4.1.

    • To fetch the device's sysName, use a snmpGet command just like the previous commands, but specify OID of 1.3.6.1.2.1.1.5.0.

    • Other useful generic OIDs are listed in the SNMP v2 MIB.  For fetching non-table/entry scalar values (like those listed above), append ".0" to the OID in your snmpGet command.
    • snmpWalks are like recursive gets.  Specify a non-leaf OID and the -S option to see from that OID down.  Beware that the -S option is known to also skip some nodes under the specified point, but without it the walk will definitely display entries outside of that branch.  I have had situations where snmpWalk does not show everything beneath the specified OID (with or without -S) even though the net-snmp-utils snmpwalk command does.  It is often useful to display the Sys information.  Here's an example that walks the standard mib-2 branch, which includes the Sys OIDs, using SNMP v2 and the public community string:

      snmpWalk <IPADDR> 1.3.6.1.2.1 -v2 -S

      and here's an example that fetches the vendor-specific MIB information:

      snmpWalk <IPADDR> 1.3.6.1.4.1 -v2 -S

    As stated above, sometimes I just cannot get snmpWalk to output the hits below a certain point, even though the entries are really being returned by the device.  Part of the problem seems to be with the -S switch.  It's much more reliable to omit the -S and use a grep pipe to narrow the output to what you are looking for.  The drawback to this method is that you are taxing the device to report much information that you are not actually interested in.  For example:

    snmpWalk <IPADDR> 1.3.6.1.4.1 -v2 | grep '^1\.3\.6\.1\.4\.1\.'
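
    Escaping every dot by hand gets tedious, so here is a sketch of a reusable filter that does the same narrowing for any branch.  The function name oid_subtree is made up.  It assumes, as the grep example does, that each result line begins with the numeric OID.  It matches the OID as a literal string and also rejects look-alike siblings (1.3.6.1.4.10 is not under 1.3.6.1.4.1).

```shell
# Hypothetical filter: keep only stdin lines whose leading OID is the given
# OID itself or lies beneath it.
oid_subtree() {
    OID_PREFIX="$1" awk '
        BEGIN { p = ENVIRON["OID_PREFIX"]; n = length(p) }
        # The line must start with the prefix, and the next character must
        # not be a digit (otherwise it is a different, longer number).
        (substr($0, 1, n) == p) && (substr($0, n + 1, 1) !~ /[0-9]/)'
}

# Usage (DEVICE_ADDR is a placeholder):
#   snmpWalk "$DEVICE_ADDR" 1.3.6.1.4.1 -v2 | oid_subtree 1.3.6.1.4.1
```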

    There are some other SNMP++ programs in the ~tideway/snmp++/bin/ directory, but they are rarely useful for non-trap read-only troubleshooting.

     

    Use MIBdumper

    MIBdumper is a wrapper program that BMC support may have you use to gather a bunch of details.  If they want to use it, BMC Support will provide the program and instructions on how to install and use it.

     

    MIB References

    I prefer OIDView's MIB Library.  If you can capture the target device's sysObjectID value, search for that using the MIB Library's search field.  If you get no hits and the OID ends in ".1", repeat the search with the trailing ".1" removed, and keep going until you get hits or there is no trailing ".1" left to remove.
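
    The sequence of search strings can be generated mechanically.  This is just a sketch of the trimming rule described above; the function name oid_search_candidates is invented.

```shell
# Hypothetical helper: print the OID, then each shorter form obtained by
# stripping one trailing ".1" at a time, in the order to search for them.
oid_search_candidates() {
    oid=$1
    printf '%s\n' "$oid"
    # ${oid%.1} removes one trailing ".1"; stop when nothing was removed.
    while [ "${oid%.1}" != "$oid" ]; do
        oid=${oid%.1}
        printf '%s\n' "$oid"
    done
}

# Usage:
#   oid_search_candidates 1.3.6.1.4.1.9.1.1
# prints 1.3.6.1.4.1.9.1.1, then 1.3.6.1.4.1.9.1, then 1.3.6.1.4.1.9
```

    The shorter the OID gets, the less specific the hits are, so stop at the first search that returns something useful.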

     

    When viewing search results, it's not immediately obvious when a search returned no hits.  Look for "... did not match any documents...", and skip software ads in the results listing (especially one for "SNMP Monitor Software").

     

    If you have SNMP access to the target device from your workstation or some computer that you have graphical network access to, there are graphical SNMP tools that automatically resolve OIDs using internal MIB libraries.  You can also configure SNMP++ or net-snmp-utils to add MIB libraries.

     

    SNMP Pattern Development

    By "SNMP Pattern Development", I mean development of patterns that use discovery.snmpGet or discovery.snmpGetTable calls.  See my snmpSysDescr pattern for a simple example.

     

    The binary parameters to the functions and the binary functions, both new with ADDM 10, allow you to make use of SNMP non-string values.

     

    You use these TPL functions when you need to retrieve some target device information using SNMP.  Use SNMP++ or some other SNMP tool to run test queries to check and verify that the device will provide the information that you seek. You need to know something about the target devices so you can find out what MIBs may be implemented on that device.  Fetch the generic OID values as described above, and search for the model and vendor in the MIB library.

     

    Generally, you must append ".0" to fetch non table/entry scalar OID values.

     

    The oid_table parameter for discovery.snmpGet is a simple mapping of full OIDs to arbitrary strings that will be used as keys in the key/value map that is returned.  There is no necessary correspondence between MIB OID names/labels and the strings you specify, since (unlike most SNMP APIs), TPL has no support for MIB library lookups.
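
    Before coding an oid_table in TPL, it can help to dry-run the same OID-to-key mapping from the shell, so you can see which of the OIDs the device actually answers.  The sketch below is shell, not TPL.  The function name snmp_try_oid_table and the "OID KEY" input format are invented for this example.  Because snmpGet exit statuses are not meaningful, it makes no attempt to judge success; it just labels the raw output with your key.

```shell
# Hypothetical dry run of an oid_table: read "OID KEY" pairs from stdin,
# query each OID with snmpGet, and prefix each output line with KEY.
# Arguments after the device address (for example -v2) are passed to snmpGet.
snmp_try_oid_table() {
    addr=$1
    shift
    while read -r oid key; do
        # Skip blank lines and comment lines in the mapping.
        case $oid in ''|'#'*) continue ;; esac
        # </dev/null stops snmpGet from consuming the mapping on stdin.
        snmpGet "$addr" "$oid" "$@" 2>&1 </dev/null |
            awk -v k="$key" '{ print k ": " $0 }'
    done
}

# Usage (DEVICE_ADDR is a placeholder; the keys are arbitrary strings):
#   snmp_try_oid_table "$DEVICE_ADDR" -v2 <<'EOF'
#   1.3.6.1.2.1.1.1.0 sysDescr
#   1.3.6.1.2.1.1.5.0 sysName
#   EOF
```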

     

    To fetch multiple values from a table, it usually saves a lot of code to use discovery.snmpGetTable.  I find it much more efficient and easier to specify a table OID as the table_oid parameter.  That returns a list of rows, where each row represents one object on the device.  Each row is a map (like an ADDM node) from the key strings that you specified in the column_table to the leaf values for that object.  (You could alternatively specify an entry OID, as described in the manual page, but output from that is usually messier to work with.)  Unlike discovery.snmpGet, the specified column_table maps from the final number of the leaf attribute OIDs to arbitrary strings that will be used as keys in the key/value map for each row that is returned.  As with discovery.snmpGet, the key strings need not correspond to MIB OID names/labels.
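
    To preview what rows a column_table would give you, you can regroup walk output from the shell.  This is a rough sketch in shell, not TPL, and it rests on several assumptions: each walk line looks like "<numeric OID> = <value>", the leaves sit at <table OID>.1.<column>.<row index>, and the columns are given as NUMBER=KEY arguments.  The function name snmp_table_rows is made up.

```shell
# Hypothetical filter: regroup walk output for one table into
# "row-index<TAB>key<TAB>value" lines.
#   $1     table OID, e.g. 1.3.6.1.2.1.2.2 (ifTable)
#   $2...  column mappings written as NUMBER=KEY, e.g. 2=descr
snmp_table_rows() {
    table=$1
    shift
    awk -v table="$table" -v cols="$*" '
        BEGIN {
            # Build the column-number -> key lookup from the arguments.
            n = split(cols, pair, " ")
            for (i = 1; i <= n; i++) {
                split(pair[i], kv, "=")
                key[kv[1]] = kv[2]
            }
            prefix = table ".1."
        }
        index($1, prefix) == 1 {
            rest = substr($1, length(prefix) + 1)   # "<column>.<row index>"
            dot = index(rest, ".")
            if (dot == 0) next
            col = substr(rest, 1, dot - 1)
            if (!(col in key)) next                 # column not requested
            value = $0
            sub(/^[^=]*= */, "", value)             # drop the "<OID> = " part
            print substr(rest, dot + 1) "\t" key[col] "\t" value
        }'
}

# Usage (DEVICE_ADDR is a placeholder):
#   snmpWalk "$DEVICE_ADDR" 1.3.6.1.2.1.2.2 -v2 | snmp_table_rows 1.3.6.1.2.1.2.2 2=descr 5=speed
```

    Lines that share a row index belong to the same object, which is how the rows returned by discovery.snmpGetTable are organized.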