19 Replies. Latest reply: Apr 25, 2012 4:34 AM by Marko Lahtinen

BMC Patrol: System Uptime monitoring

Przemyslaw Danysz

Hi everyone!

 

I'm looking for a way to set up "system uptime" monitoring.

I know this function exists in Patrol Express (System Performance -> System Uptime), but I need to set up the same thing in Patrol Central.

 

I need to monitor this parameter to detect systems that reboot on their own (AIX in this case).

 

Patrol Version: 3.7.40

Unix KM V: 9.7.00

 

Any ideas?

 

 

Best regards

Przemek

  • 1. BMC Patrol: System Uptime monitoring
    Oleg Protokolov

    Przemek, Hi!

     

    On Windows you can use the parameter /NT_SYSTEM/NT_SYSTEM/SYSsysSystemUpTime

     

    See my post here...

    https://communities.bmc.com/communities/message/215222#215222

     

    However, I can't find a similar parameter for AIX or other Unix systems...

     

    --

    Regards,

    Oleg

  • 2. BMC Patrol: System Uptime monitoring
    Jonathan Coop

    I just posted something I put together, but I don't have an AIX system to test it on. It seems to be queued for virus checking etc., so hopefully it will be available soon.

     

    Hope it helps

     

     

    Jon

  • 3. BMC Patrol: System Uptime monitoring
    Jonathan Coop

    I should have said that it relies on the uptime command being available and returning 0 in the third column if the system has just rebooted.
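
For anyone who wants to try the same idea outside PATROL, here is a minimal Python sketch (not the KM itself, and not PSL). It assumes the "third column" is the count that follows the word "up" in the uptime output, and that the field after it starts with "day" when the count is in days. Some uptime implementations print minutes or HH:MM instead of "0 days" right after a boot, so the sketch treats a missing "day" unit as zero days. The function names uptime_days and just_rebooted are made up for illustration.

```python
def uptime_days(uptime_line):
    """Return whole days of uptime parsed from one line of `uptime` output."""
    fields = uptime_line.split()
    # Expected layout (an assumption): <clock> up <count> <unit>, ...
    if len(fields) < 3 or fields[1] != "up":
        raise ValueError("unrecognised uptime output: %r" % uptime_line)
    # Only trust the count when the next field says "day" or "days".
    if len(fields) > 3 and fields[3].lower().startswith("day"):
        return int(fields[2])
    # Less than a day: output such as "up 4 min" or "up 22:50".
    return 0


def just_rebooted(uptime_line):
    """True when the parsed uptime is under one day."""
    return uptime_days(uptime_line) == 0


if __name__ == "__main__":
    sample = " 10:15:03 up 45 days, 22:50,  2 users,  load average: 0.00, 0.01, 0.05"
    print(uptime_days(sample), just_rebooted(sample))
```

A one-day granularity is coarse for reboot detection, so a real check would probably want minutes as well.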

     

    Jon

  • 4. Re: BMC Patrol: System Uptime monitoring
    Przemyslaw Danysz

    Thanks for the replies!

     

    I have sent this question to BMC Support; when I get an answer from them, I will share it with you.

     

    But I'm still open to ideas... Maybe use Log Management? But how do I configure it correctly?

  • 5. Re: BMC Patrol: System Uptime monitoring
    Michael Ashall

    Hi,

     

    I worked on a project where we got uptime added to the KM for AIX. I believe this may be added to the latest KM, or to the next version when it goes GA.

     

    You would have to download it and validate.

     

    Thanks

     

    MASH

  • 6. BMC Patrol: System Uptime monitoring
    Przemyslaw Danysz

    Mash,

     

    Can you tell me when it will be available for download?

     

     

    Regards

  • 7. BMC Patrol: System Uptime monitoring
    papabear075

    We wrote a KM to do that on Unix servers. If you are comfortable writing KMs, you could write one similar to this:

     

    # find the last valid boot stamp for the server in the bootlog and return it as epoch time

    function check_bootlog(logfile)

    {

       local b_data, b_lines, b_last, b_outageinfo, o_hdl, b_time ;

       b_time = time() ;

       if ( file(logfile) )

       {

          # printf("found logfile: %s\n", logfile) ;

          b_data = cat(logfile) ;

          b_lines = lines(b_data) ;

          if ( b_lines )

          {

             b_last = trim(nthargf(tail(b_data,"1"),"2",";"),"\n\r","2") ;

             # printf("b_last: >%s<\n", b_last) ;

             if ( b_last != "" )

             {

                # printf("length of b_last: %d\n", length(b_last)) ;

                if ( length(b_last) == 10 )

                {

                   return b_last ;

                }

                else

                {

                   printf("UPTIME: Corrupt boottime log, last line: %s\n", tail(b_data,"1") ) ;

                }

             }

          }

          return "";

       }

       else

       {

          # printf("the outage log doesn't exist.  Create it with a 0 time outage.\n") ;

          b_outageinfo = sprintf("%s;%s;initialization date\n", b_time, b_time) ;

     

          o_hdl = fopen(logfile,"a") ;

          write(o_hdl,b_outageinfo) ;

          close(o_hdl) ;

          # return the timestamp itself, not the literal string "b_time"
          return b_time ;

       }

    }

     

    function who_are_u(who_data)

    {

       local x, who_time, nf  ;

     

       # normal who output:    .       system boot  Jan 31 12:00

       # Suse who output:           system boot  2009-06-10 07:33

     

       x = ntharg(who_data,"1") ;

       if ( x != "system" )

       {

          # standard unix result

          who_time = ntharg(who_data, "4-", " ", " ") ;

          return who_time ;

       }

       else

       {

          # probably a linux box

          who_time = ntharg(who_data, "3-", " ", " ") ;

          nf = 0 ;

          foreach word val ( who_time )

          {

             nf++ ;

          }

          nf = int(nf) ;

          # if 3 fields in who_time, then the format is Mon da HH:MM

          # else the format is probably the SuSE special format of YYYY-mo-da HH:MM

          #printf("num fields: %s\n", nf) ;

          if ( nf == 3 )

          {

             return who_time ;

          }

          else

          {

             # suse_yr = nthargf(ntharg(who_data,"3"," "),"1","-") ;

             suse_day = nthargf(ntharg(who_data,"3"," "),"3","-") ;

             suse_mo = nthargf(ntharg(who_data,"3"," "),"2","-") ;

             if ( suse_mo == 01 ) { suse_mo = "Jan"; }

             elsif ( suse_mo == 02 ) { suse_mo = "Feb"; }

             elsif ( suse_mo == 03 ) { suse_mo = "Mar"; }

             elsif ( suse_mo == 04 ) { suse_mo = "Apr"; }

             elsif ( suse_mo == 05 ) { suse_mo = "May"; }

             elsif ( suse_mo == 06 ) { suse_mo = "Jun"; }

             elsif ( suse_mo == 07 ) { suse_mo = "Jul"; }

             elsif ( suse_mo == 08 ) { suse_mo = "Aug"; }

             elsif ( suse_mo == 09 ) { suse_mo = "Sep"; }

             elsif ( suse_mo == 10 ) { suse_mo = "Oct"; }

             elsif ( suse_mo == 11 ) { suse_mo = "Nov"; }

             elsif ( suse_mo == 12 ) { suse_mo = "Dec"; }

             # use the function argument rather than the caller's "who" variable
             suse_time = ntharg(who_data,"4"," ") ;

             boottime = sprintf("%s %s %s", suse_mo, suse_day, suse_time) ;

             return boottime ;

          }

       }

    }

     

    function main()

    {

       suse_ver = 10 ;

     

       if ( !exists("/UPTIME/UPTIME"))

       {

          create("UPTIME","","OK") ;

          apptype = get("/appType") ;

          if ( apptype == "SOLARIS" )

          {

             set("/UPTIME/UPTIME/LoadAvg_10min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_5sec/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_30sec/active","0") ;

          }

          elsif ( apptype == "HP" )

          {

             set("/UPTIME/UPTIME/LoadAvg_10min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_5sec/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_30sec/active","0") ;

          }

          elsif ( apptype == "RS6000" )

          {

             set("/UPTIME/UPTIME/LoadAvg_1min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_5sec/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_30sec/active","0") ;

          }

          elsif ( apptype == "OSF1" )

          {

             set("/UPTIME/UPTIME/LoadAvg_10min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_15min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_5min/active","0") ;

          }

          elsif ( apptype == "Linux" )

          {

             if ( file("/etc/SuSE-release") )

             {

                set("/UPTIME/UPTIME/LoadAvg_10min/active","0") ;

                set("/UPTIME/UPTIME/LoadAvg_30sec/active","0") ;

                set("/UPTIME/UPTIME/LoadAvg_5sec/active","0") ;

                apptype = "SUSE-Linux" ;

             }

             else

             {

                set("/UPTIME/UPTIME/LoadAvg_10min/active","0") ;

                set("/UPTIME/UPTIME/LoadAvg_30sec/active","0") ;

                set("/UPTIME/UPTIME/LoadAvg_5sec/active","0") ;

             }

          }

          else

          {

             set("/UPTIME/UPTIME/LoadAvg_10min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_15min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_5min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_1min/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_30sec/active","0") ;

             set("/UPTIME/UPTIME/LoadAvg_5sec/active","0") ;

             set("/UPTIME/UPTIME/LoadAvgColl/active","0") ;

          } 

       }

     

       totaltime = 0 ;

     

       hostname = get("/hostname") ;

       ostype = get("appType") ;

       if ( file("/etc/SuSE-release") )

       {

          ostype = "SUSE-Linux" ;

          suse_ver = nthargf(grep("VERSION",cat("/etc/SuSE-release")),"3") ;

          printf("SUSE Version: >%s<\n", suse_ver) ;

       }

       patrol_home = get("/patrolHome") ;

       shutdownfile = "uptime-bootlog" ;

       bootinfo = "" ;

       outagelog = "outage.log" ;

       oldbootdata = "" ;

       DEBUG = 1 ;

     

       if ( ostype == "NT" )

       {

          bootlogloc = patrol_home.shutdownfile ;

          outagelogloc = patrol_home."log\\".outagelog ;

          sysoutput = trim(ntharg(internal("GetPerformanceValue System * \"System Up Time\""),"2-","="),"\n");

          sysoutput = int(internal("CalcPerformanceValue ".sysoutput." ".sysoutput)) ;

          if( ! isnumber(sysoutput) )

          {

             exit ;

          }

          totaltime = int(sysoutput / 60) ;

          days = int(totaltime / 1440) ;

          hours = int((totaltime % 1440)  / 60) ;

          minutes = int(totaltime % 60) ;

          uptime = "This server has been up for ".days." days, ".hours." hours and ".minutes." minutes\n" ;

       }

       else

       {

          bootlogloc = patrol_home."/".shutdownfile ;

          outagelogloc = patrol_home."/log/".outagelog ;

          tzdate = ntharg(system("date"),"5");

          whocmd = "who -b" ;

          uptimecmd = "uptime" ;

          who = trim(nthlinef(system(whocmd),1),"\n") ;

     

          # what kind of who output?  year-mo-da or Mon da time?

          # Look at the output and figure it out

     

         

     

          bootinfo = who ;

          uptime = system(uptimecmd) ;

          if ( DEBUG )

          {

             printf("who output: %s, uptime output: %s\n", who, uptime) ;

          }

     

          cyear = trim(asctime(time(),"%Y"),"\n") ;

     

          boottime = who_are_u(who) ;

          # printf("Boottime after processing: %s\n", boottime) ;

     

          uptimelist = replace(uptime,",","\n") ;

          days = trim(ntharg(grep("day",uptimelist),"3"),"\n") ;

          # printf("Server has been up >%d< days\n", days) ;

          if ( days == "" ) { days = 0 ; }

     

          cnt = 3 ;

         

          while ( cnt )

          {

             esecs = convert_date("Mon ".boottime." ".int(cyear),tzdate) ;

             currentboottime = esecs ;

             nsecs = time() ;

             rsecs = nsecs - esecs ;

             rdays = int(rsecs / 86400) ;

             adays = int(rdays - 1) ;

             zdays = int(rdays + 1) ;

             if ( DEBUG )

             {

                printf("Elapsed time based on who output, CDT and the year: %s, (%s)\n", int(cyear), boottime) ;

                printf("Number of elapsed seconds since boot: %s\n", int(rsecs)) ;

                printf("Number of elapsed days since boot: %s\n", int(rdays)) ;

             }

     

             if ( ( days != adays ) && ( days != rdays ) && ( days != zdays ) )

             {

                cyear-- ;

             }

             else

             {

                # totaltime = int(rsecs / 60) ;

                totaltime = rsecs ;

                uptime = uptime.who ;

                last ;

             }

             cnt-- ;

          }

       }

     

       # lasttouch = file(bootlogloc) ;

       lasttouch = trim(pconfig("GET","/CustomApplication/UPTIME_POLLINGAGENT/polltime"),"\n","3") ;

     

       if ( DEBUG )

       {

          printf("bootlogloc: %s, lasttouch: %s, totaltime: %s\n", bootlogloc, lasttouch, totaltime) ;

       }

     

       if ( ! exists("/UPTIME_POLLINGAGENT") )

       {

          exit ;

       }

     

       if ( get("/UPTIME_POLLINGAGENT/active") != 2 )

       {

          set("/UPTIME_POLLINGAGENT/active","2") ;

       }

     

       if ( totaltime )

       {

          lasttime = get("/UPTIME/UPTIME/Uptime/value");

          lasttext = get("/UPTIME/UPTIME/UptimeInfo/value");

          set("/UPTIME/UPTIME/Uptime/value",totaltime) ;

          if ( lasttime > totaltime )

          {

             annotate("/UPTIME/UPTIME/Uptime","%Text,%Text",uptime, lasttext) ;

          }

     

          ###############################################################################################

          # Outages:

          # lasttime will be blank if the agent is restarted. 

          # If the agent is restarted because of a reboot, the boot time will have changed.

          # Only check to see if a reboot occurred if the lasttime var is blank.

          if ( lasttime == "" )

          {

             # printf("A reboot might have occurred\n") ;

             defaultinterval = trim(pconfig("GET","/ServerUptimeSetup/DefaultInterval"),"\n","3") ;

             nextinterval =  trim(pconfig("GET","/ServerUptimeSetup/NextInterval"),"\n","3") ;

             if ( isnumber(nextinterval) )

             {

                defaultinterval = nextinterval ;

                pconfig("REPLACE","/ServerUptimeSetup/NextInterval","") ;

             }

             if ( ! isnumber(defaultinterval) )

             {

                defaultinterval = 20 ;

             }

             defaultinterval = defaultinterval * 60 ;

     

             if ( bootinfo == "" )

             {

                # this is an NT server; run boottime.exe to get the current boot time.

                bootinfo = trim(nthlinef(system("boottime"),"1"),",") ;

                currentboottime = convert_date(bootinfo,"CST") ;

             }

             oldbootdata = check_bootlog(outagelogloc) ;

     

             if ( DEBUG )

             {

                printf("Verifying outage info!\nCurrent boot time: %s\nPrevious boot time: %s\n", asctime(currentboottime), asctime(oldbootdata) );

             }

             # if bootinfo (current boot time) and oldbootdata (recorded boot time) are different, the server has been rebooted

             if ( currentboottime == "" || oldbootdata == "" )

             {

                printf("Invalid boot time: UPTIME: currentboottime: %s, oldbootdata: %s\n", currentboottime, oldbootdata) ;

                exit ;

             }

            

             # this should read if currentboottime != oldbootdate

             # if ( ( currentboottime - defaultinterval ) > oldbootdata )

             if ( currentboottime != oldbootdata )

             {

                # if the server has been rebooted, then a file called PATROL-ABORT will be found in $PATROL_ROOT/tmp

                # the timestamp on the file is the actual shutdown initiation time for the server.

                # if the file does not exist, check to see if the reboot has been reported as the reboot was not "PLANNED"

                #

                if ( ostype == "NT" )

                {

                   abortfile = patrol_home."tmp\\PATROL-ABORT" ;

                   aborttime = file(abortfile) ;

                }

                else

                {

                   abortfile = patrol_home."/../tmp/PATROL-ABORT" ;

                   aborttime = file(abortfile) ;

                }

     

                startoutage = lasttouch ;

                endoutage = currentboottime ;

                outageinfo = sprintf("%s;%s;%s\n", startoutage, endoutage, time()) ;

                if ( DEBUG )

                {

                   printf("The server has been rebooted, the current boot time minus %s seconds\nis greater than the boot time in the boot time log!\n", defaultinterval) ;

                   printf("The outage started on: %s\n the outage ended on: %s\n", startoutage, endoutage ) ;

                }

                if ( ( startoutage < endoutage ) && ( isnumber(startoutage) ) )

                {

                   set("/UPTIME/UPTIME/OutageDuration/value",(endoutage - startoutage)) ;

     

                   outageloghdl = fopen(outagelogloc,"a") ;

                   write(outageloghdl,outageinfo) ;

                   close(outageloghdl) ;

     

                   if ( ! aborttime )

                   {

                      # trigger an event to notify of the unplanned reboot

                      event_trigger2("SERVER_UPTIME.SERVER_UPTIME.Uptime","STD","41","ALARM","5",totaltime) ;

                   }

                   else

                   {

                      if ( ( endoutage - startoutage ) > defaultinterval )

                      {

                         event_trigger2("SERVER_UPTIME.SERVER_UPTIME.Uptime","STD","41","ALARM","5",totaltime) ;

                      }

                      else

                      {

                         event_trigger2("REBOOT.REBOOT.ServerReboot","STD","41","WARNING","5",hostname) ;

                      }

                      remove(abortfile) ;

                   }

                }

     

             }

             set("/PATROL_INIT/active","2") ;

          }

          if ( file(outagelogloc) )

          {

             # Read the outages since the first of the month and set them in the environment.

             # 3024000 equals 35 days

             firstofmonth = time() - 3024000 ;

             filter = asctime(time(),"%b") ;

             percentup = ntharg(grep(filter."  ",get("/UPTIME/UPTIME/YearlyUptimeReport/value")),"2") ;

             if ( isnumber(percentup) )

             {

                outagerpt = "\n\nPercent up this month: ".percentup ;

             }

             else

             {

                outagerpt = "\n\nPercent up this month: N/A" ;

             }

             outagerpt = outagerpt."\n###################################################\nSERVER OUTAGES FOR THE PAST 35 DAYS:\n" ;

             outagerpt = outagerpt."Start                       End\n" ;

             foreach line outage ( cat(outagelogloc) )

             {

                start=nthargf(outage,"1",";") ;

                end = nthargf(outage,"2",";") ;

                if ( start > firstofmonth || end > firstofmonth )

                {

                   outages = outages.start.";".end."\n" ;

                   outagerpt = outagerpt . sprintf("%-28s%-28s\n",asctime(start),asctime(end)) ; 

                }

             }

             set("outages",outages) ;

          }

          set("/UPTIME/UPTIME/UptimeInfo/value",uptime.outagerpt) ;

          if ( get("/PATROL_INIT/active") != 2 )

          {

             set("/PATROL_INIT/active","2") ;

          }

     

       }

    }
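
For readers who do not write PSL, the core of the outage check above can be sketched in a few lines of Python. This is a simplified illustration, not a port of the KM: it keeps only the "compare the current boot time with the last one recorded in the log" step and the "start;end;recorded" log line format. The function names, the now parameter, and the decision to seed a missing or unreadable log with the current boot time are assumptions made for the example.

```python
import os
import time


def last_recorded_boot(log_path):
    """Return the boot time (epoch seconds) from the last log line, or None."""
    if not os.path.exists(log_path):
        return None
    with open(log_path) as log:
        lines = [line.strip() for line in log if line.strip()]
    if not lines:
        return None
    fields = lines[-1].split(";")
    # The second field holds the boot time, as in check_bootlog() above.
    if len(fields) < 2 or not fields[1].isdigit():
        return None
    return int(fields[1])


def check_reboot(log_path, current_boot, last_poll, now=None):
    """Append an outage record and return True if the boot time changed."""
    now = int(time.time()) if now is None else now
    previous = last_recorded_boot(log_path)
    if previous is None:
        # Missing or unreadable log: seed it with a zero-length outage.
        with open(log_path, "a") as log:
            log.write("%d;%d;initialization date\n" % (current_boot, current_boot))
        return False
    if previous == current_boot:
        return False
    # The outage runs from the last successful poll to the new boot time.
    with open(log_path, "a") as log:
        log.write("%d;%d;%d\n" % (last_poll, current_boot, now))
    return True
```

Calling check_reboot() once per polling cycle with the boot time taken from "who -b" (converted to epoch seconds) would give a True result only on the first cycle after a reboot.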

  • 8. BMC Patrol: System Uptime monitoring
    Marko Lahtinen

    I wonder why something like a system uptime parameter is not native to Patrol products.

  • 9. BMC Patrol: System Uptime monitoring
    Oleg Protokolov

    Marko Lahtinen, Hi!

     

    It's not as bad as it seems...

    The system uptime parameter is always available in the object tree of the PA.

    Try executing this:

     

    %PSL print(get( "/uptUptime" ));

     

    The value of the '/uptUptime' variable is available on PATROL agents for Windows, AIX, Linux, etc.:

     

    #AIX:
    uptUptime = "45
    days,
    22:50";

     

    #Windows:
    uptUptime = "0 Days
    , 00:04:38";

     

    #Linux:
    uptUptime = "97
    days,
    20:39";

     

    You can create your own KM and get the uptUptime value (for example, in days) with this simple code:

     

    # Supports: Windows, AIX, Linux, etc...

    function getSystemUpTime()

    {

      return poplines( nthargf( get( "/uptUptime" ), "1-", " " ), 1, "w" );

    }

    set( "value", getSystemUpTime() );

     

    Or you can download a ready-made module, created by joncoop2, from this link: https://communities.bmc.com/communities/docs/DOC-18518#comment-10184

     

    --

    Regards,

    Oleg

     

  • 10. BMC Patrol: System Uptime monitoring
    Marko Lahtinen

    Yes, I know, but thanks anyway. I have also written a KM that uses this parameter. However, I would like to see it built into the OS KMs.

  • 11. BMC Patrol: System Uptime monitoring
    Rahul NameToUpdate

    Hi Oleg,

     

    I copied the upTimeChecker.km that Jon made to my Linux box. When I load this KM in Patrol Central, I only get the value in days. What if I need the value as {days:minutes}, as in the output of the 'uptime' Unix command?

     

    -Rahul

  • 12. BMC Patrol: System Uptime monitoring
    Rahul NameToUpdate

    I am not good at PSL coding.

  • 13. BMC Patrol: System Uptime monitoring
    Oleg Protokolov

    Rahul,

     

    Try this code...

     

    # Supports: Windows, AIX, Linux, etc...
    function getSystemUpTime()
    {
      local nf, days, hours, mins;
      nf = days = hours = mins = "";
      nf = matchline( get( "/uptUptime" ), "\\([0-9]+\\)[^0-9]+\\([0-9]+\\):\\([0-9]+\\)", "t", days, hours, mins );
      return ( 3 != nf ) ? "" : 1440 * days + 60 * hours + mins;
    }

    set( "value", getSystemUpTime() );
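
To check the arithmetic without an agent, the same pattern can be tried in Python. This is only an illustration of what the regular expression in the PSL above captures: the first number is taken as days, and the next two colon-separated numbers as hours and minutes (a trailing seconds field, as in the Windows sample, is ignored). The function name uptime_minutes is made up, and it returns None where the PSL version returns an empty string.

```python
import re

# Same shape as the PSL pattern: <days> <non-digits> <hours>:<minutes>
UPTIME_RE = re.compile(r"(\d+)\D+(\d+):(\d+)")


def uptime_minutes(upt_uptime):
    """Convert an uptUptime-style string to total minutes, or None if no match."""
    match = UPTIME_RE.search(upt_uptime)
    if match is None:
        return None
    days, hours, mins = (int(group) for group in match.groups())
    return 1440 * days + 60 * hours + mins


if __name__ == "__main__":
    # AIX sample from earlier in the thread: 45 days, 22:50
    print(uptime_minutes("45\ndays,\n22:50"))  # 45*1440 + 22*60 + 50 = 66170
```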

     

    --

    Regards,

    Oleg

  • 14. Re: BMC Patrol: System Uptime monitoring
    Jonathan Coop

    What would you like the parameter value to be? Remember that parameters that can alarm on adjustable ranges can hold only a single numerical value, and that there is an upper limit to a parameter value. You could have the code modified so that it goes to a fixed value of, say, one if the uptime is less than a user-definable number. But please suggest what you would like.

    Jon
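
The fixed-value idea can be sketched like this in Python (again an illustration, not PSL and not the KM). The function name reboot_flag and the 30-minute default threshold are assumptions chosen for the example. The parameter is 1 while the uptime is below the threshold and 0 otherwise, so it stays a single small number that an alarm range can be set on.

```python
def reboot_flag(uptime_minutes, threshold_minutes=30):
    """Return 1 if the system has been up for less than the threshold, else 0."""
    return 1 if uptime_minutes < threshold_minutes else 0


if __name__ == "__main__":
    print(reboot_flag(4))      # recently rebooted
    print(reboot_flag(66170))  # up for about 45 days
```

With this shape, an alarm range covering the value 1 would fire only during the first threshold_minutes after a boot, and the upper limit on parameter values never comes into play.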

     

