I think you should monitor the symptoms or factors that can leave a job in the "Waiting to run" state. You could monitor:
- Number of available worker threads on the job servers
- Blocking sessions on the DB
- Whether any of the AppServers is down
Santhosh, thank you for your response.
When jobs are stuck, yes, one of the AppServers is down. So maybe I could run a blcli command to look for that?
The AppServer crash issue is known and fixed with an SP, so I would just like a way to detect when the crash occurs.
When the AppServer stops its service, it writes a line like the one below to its log. So you can monitor the AppServer log for the keyword Undeploying.
appserver.log:[11 Nov 2016 06:53:52,645] [Thread-1] [INFO] [::]  Undeploying
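As a rough illustration of that log check, here is a minimal Python sketch that scans log lines for the Undeploying keyword and pulls out the timestamp. The function names are made up for this example, and the pattern is based only on the single sample line above, so the real log layout on your AppServer may need a different expression. The month parsing also assumes English month abbreviations.

```python
import re
from datetime import datetime

# Pattern derived from the one sample line in this thread (an assumption,
# not a documented log format): "[11 Nov 2016 06:53:52,645] ... Undeploying"
UNDEPLOY_RE = re.compile(
    r"\[(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}),\d{3}\].*\bUndeploying\b"
)


def find_undeploy_events(lines):
    """Return the timestamps of all lines that report 'Undeploying'."""
    events = []
    for line in lines:
        match = UNDEPLOY_RE.search(line)
        if match:
            # Milliseconds are dropped; second precision is enough for alerting.
            events.append(datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S"))
    return events


def scan_log(path):
    """Scan one log file (hypothetical helper) and return its shutdown timestamps."""
    with open(path, errors="replace") as handle:
        return find_undeploy_events(handle)
```

You could call scan_log with the path to appserver.log from a scheduled task and raise an alert whenever it returns a timestamp newer than the previous run.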
There are also many other ways to monitor the health of the Application Servers.
Sounds like a fair idea. I will check into that.