Can you try running the job in debug mode? There's a property on the job called 'ENABLE_JOB_DEBUG' or something like that; set it to true. Once that's set, there should be a <install dir>/NSH/tmp/debug/application_server folder on each appserver involved in running the job, and that folder will contain the debug output. Let the job run as normal, and the next time you see the behaviour we can look at the corresponding debug output to see what it shows.
Also, is there anything running at the time the USP runs that could be interrupting the network connectivity between the appserver and the agent?
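If it helps to rule the network in or out, something like the following could be run from the appserver around the time the job runs. It is only a rough sketch: the host name and port are whatever you pass in (I'm not assuming any particular agent port here), and a successful TCP connect only shows the port is reachable, not that the agent itself is healthy.

```python
import socket
import time


def agent_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe(host, port, attempts=3, interval=1.0, timeout=5.0):
    """Try the connection several times; return a list of (timestamp, reachable)."""
    results = []
    for i in range(attempts):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        results.append((stamp, agent_reachable(host, port, timeout)))
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

Calling `probe("target-host", agent_port, attempts=60, interval=10)` (both arguments being placeholders for your own values) while the job runs would give a timestamped record you can line up against the times the servers were flagged as unreachable.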
Is there anything common about the servers flagged as unreachable? Is it the same servers each time, or is it random?
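One quick way to answer the "same servers or random" question, assuming you can pull the list of unreachable servers out of each job run by hand or from the job logs, is to tally them across runs. This is just an illustrative sketch; the function names and the host names in the example are made up.

```python
from collections import Counter


def tally_unreachable(runs):
    """Count in how many runs each server was flagged as unreachable.

    `runs` is a list of lists, one list of server names per job run.
    """
    counts = Counter()
    for servers in runs:
        # Use a set so a server listed twice in one run is counted once.
        counts.update(set(servers))
    return counts


def always_unreachable(runs):
    """Return the servers that were flagged in every single run."""
    if not runs:
        return set()
    return set.intersection(*(set(servers) for servers in runs))


if __name__ == "__main__":
    # Hypothetical data: three runs of the job.
    example_runs = [
        ["srv01", "srv07", "srv12"],
        ["srv07", "srv03"],
        ["srv07", "srv12"],
    ]
    for server, count in tally_unreachable(example_runs).most_common():
        print(server, count)
    print("flagged every run:", sorted(always_unreachable(example_runs)))
```

If a handful of servers show up in nearly every run, that points at something specific to those hosts (or their network segment); if the counts are spread evenly, it looks more like an intermittent problem on the appserver side.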
Did you get to the bottom of this?
Thanks & Regards,