Steffen, you're most likely right that this is due to the load. This particular error has not been registered in BladeLogic yet (with the exception of one active ticket, which is yours), but based on a web search it seems to happen with other Cygwin builds. It appears to be some sort of race condition when work items are executed in parallel.
As a workaround, see if you can set the parallelism setting of the job to something finite and test. Try different values to see if you can find a sweet spot.
In the end, this may need to be addressed at the code level (i.e., by our development team), so I will suggest that to TSA.
We already tried limiting the number of work items run in parallel.
We are now down to 50 items, distributed across 8 Job Engines hosted on 4 physical AppServers.
Unfortunately, we still see the issue from time to time.
Hi, just an update from my side:
We are in contact with support, and they seem to believe it is indeed an issue with Cygwin.
We do not have a real workaround yet, as even limiting the number of parallel targets to 30 still produces the issue from time to time.
Just yesterday we saw it happen with 6 NSH jobs running in parallel, where each one was running against just a single target.