We are running BSA 8.5 SP1 on RHEL 5 (soon to be RHEL 6). Recently, we have noticed a performance gain from periodic restarts of the BSA server processes (the Java processes, not the entire OS). I am working on a way to automate these restarts to occur during our scheduled maintenance windows. We have 5 (soon to be 6) servers (2 config/3 job), so we should be able to cascade our restarts to minimize any impact to users.
I plan to use BAO 7.8 to restart the servers in a cascaded order, keeping at least 1 config server and 1 job server online at all times. I have identified what I believe to be the appropriate BLCLI commands to use:
- AppServerShutdown pauseJobsByServer <APP_SERVER_NAME>
- AppServerShutdown shutdownByServer <APP_SERVER_NAME> <MAX_WAIT_TIME>
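To make the plan concrete, here is a rough sketch of how the cascade could be checked and the two commands above assembled per server. The host names are hypothetical, the `blcli` profile/role options are left out, and the unit of the max wait value is whatever `shutdownByServer` expects; nothing here actually runs a command.

```python
from typing import Dict, List

# Hypothetical host names; roles follow the 2 config / 3 job layout.
SERVERS: Dict[str, str] = {
    "cfg1": "config", "cfg2": "config",
    "job1": "job", "job2": "job", "job3": "job",
}


def batch_is_safe(batch: List[str], servers: Dict[str, str]) -> bool:
    """True if taking `batch` offline leaves one server of every role online."""
    online_roles = {role for name, role in servers.items() if name not in batch}
    return online_roles == set(servers.values())


def shutdown_commands(app_server: str, max_wait: int) -> List[List[str]]:
    """The two BLCLI calls listed above, as argument lists (auth options omitted)."""
    return [
        ["blcli", "AppServerShutdown", "pauseJobsByServer", app_server],
        ["blcli", "AppServerShutdown", "shutdownByServer", app_server, str(max_wait)],
    ]


def plan(batches: List[List[str]], servers: Dict[str, str], max_wait: int) -> List[List[str]]:
    """Flatten the cascade into an ordered command list, refusing unsafe batches."""
    steps: List[List[str]] = []
    for batch in batches:
        if not batch_is_safe(batch, servers):
            raise ValueError(f"batch {batch} would take a whole role offline")
        for name in batch:
            steps.extend(shutdown_commands(name, max_wait))
    return steps
```

Restarting one server per batch always passes the check with this layout; the check only matters if two or more servers are restarted at the same time.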
I would then use the BAO SSH adapter to verify that the Java processes are gone (killing any hung processes) and restart the app servers. My hope is to minimize the impact on any jobs that are running when the restart process starts. My understanding of the 'pauseJobsByServer' command is that it causes the specified server to stop accepting new job assignments. Does this also include new work item tasks? Do I even need to pause the job services explicitly if I am using the 'shutdownByServer' command? It seems that command effectively includes the pause function.
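For the process check, I am picturing something like the sketch below: run `ps -eo pid,args` over SSH and parse the output for leftover Java processes. The `marker` string used to tell the BSA app server apart from other Java processes is an assumption; it would need to match whatever actually appears on the command line in our environment.

```python
from typing import List


def leftover_java_pids(ps_output: str, marker: str) -> List[int]:
    """Return PIDs of Java processes whose command line contains `marker`.

    Expects the text produced by `ps -eo pid,args`.
    """
    pids: List[int] = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skips the header row and blank lines
        pid, args = fields
        if "java" in args and marker in args:
            pids.append(int(pid))
    return pids
```

An empty list would mean the shutdown completed cleanly; anything else would be a candidate for a kill before the restart.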
My next question is fairly subjective, but I will ask anyway. For anyone who is doing something similar with scheduled restarts, what would be considered a safe max wait time? I know it really depends on the jobs running at the time, but I am looking for a general consensus. Keep in mind that our scheduled maintenance window is 6 hrs, and we'll need to restart all servers within that window.
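As a back-of-the-envelope upper bound, assuming the servers are restarted strictly one after another and that each restart needs a fixed overhead for cleanup and startup (the 15 minutes below is a guess, not a measured figure), the per-server wait budget works out like this:

```python
def max_wait_minutes(window_min: int, server_count: int, overhead_min: int) -> int:
    """Per-server drain budget when servers are restarted one after another."""
    per_server = window_min // server_count
    budget = per_server - overhead_min
    if budget <= 0:
        raise ValueError("window too short for this many sequential restarts")
    return budget


# 6 hr window, 6 servers, assumed 15 min overhead per server
print(max_wait_minutes(360, 6, 15))  # 45
```

That gives a ceiling of roughly 45 minutes per server with 6 servers, or 57 with the current 5. A safe value in practice is presumably well below that.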
While the server is shut down, we want to take the opportunity to clear the debug log directory (BLADE_HOME/NSH/debug) and clean up the app server cache (Delete cleanupAppServerCache). This will all need to happen within the 6 hr maintenance window.
Once the app servers have been restarted, will I need to execute the 'AppServerShutdown resumeJobsByServer' command, or is that (as I am assuming) handled by the restart?
My last question is regarding the BAO BSA adapter. If I restart the config server, will the BAO adapter connected to BSA recover automatically? I believe it will, but at what interval? Is there a way to tell BAO to restart a single adapter through an API call to the BAO grid (SOAP/REST)?
I am looking for any input before I build the automation. I am hoping I am not the first to attempt doing this through scheduled automation. Perhaps someone has some examples they can share of how they accomplished this?