It seems that the user you are running the job as does not have the permission needed to run it, whereas the system user does. Also check that the proper ACLs have been pushed for the user that fails to execute the job.
The user does have the permissions, because the job will work on one application server one time and fail there the next.
When it fails, the rscd log shows the request coming through as SYSTEM rather than the user.
Can you check whether there is a nouser entry in your users.local file?
Have you gone through the initial checklist, i.e. application server to target connectivity, agent status, etc.?
Yes, all done, and yes, there is a nouser entry in users.local.
The question is what causes the job to come through as SYSTEM one time and as the user:role another.
To make things a bit clearer, I assume your environment looks like this:
- you have two application servers running BBSA (version? arch?) both connected to the same database and fileserver?
- you have ONE target host running the rscd-agent
- you have ONE nsh-script job, which runs once on appserver1 and once on appserver2
- running on appserver1 (e.g.) always succeeds
- running on appserver2 it sometimes succeeds and sometimes fails
Am I right? If not, please describe your configuration exactly.
- There are two application servers running BBSA (8.2 SP3 on Windows) both connected to the same database and file server
- The application servers themselves are the targets as it is a script to dsync the extended_objects from the fileserver to a local directory
- Sometimes it works on app1 and fails on app2, and sometimes the other way round. It always fails on one of them
- On the server that fails, the entry in the rscd.log shows the request coming through as SYSTEM rather than the role/user
I see - interesting scenario
Is either of the two application servers also the file server? That's the only thing I associate with a System account. If so, is System:System included in the users.local file of the file server/app server (see: https://docs.bmc.com/docs/display/public/bsa82/File+server+requirements)? As you mentioned that such an entry would fix the problem, maybe we could proceed with this.
No, the file server is separate.
The SYSTEM entry that I mentioned as the fix is on both of the application servers.
So I have added (exact case required) - SYSTEM rw,map=localadminaccount
to both of the application servers and the job always completes successfully.
What I am trying to understand is why I need to do that.
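To compare the two application servers quickly, the relevant users.local entries can be checked with a small script. This is only a sketch: it assumes the file has one entry per line, with the name followed by whitespace and comma-separated options such as map=..., and that # starts a comment. The function name and the sample role:user line are made up for illustration.

```python
def summarize_users_local(text):
    """Report whether users.local content has a nouser line and a SYSTEM entry.

    Assumed layout (not taken from BMC documentation): one entry per line,
    "<name> <comma-separated options>", with "#" starting a comment.
    """
    summary = {"nouser": False, "system_entry": False, "system_map": None}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split(None, 1)
        name = fields[0]
        options = fields[1] if len(fields) > 1 else ""
        if name == "nouser":
            summary["nouser"] = True
        elif name == "SYSTEM":  # exact case, as noted in the thread
            summary["system_entry"] = True
            for opt in options.split(","):
                opt = opt.strip()
                if opt.startswith("map="):
                    summary["system_map"] = opt[len("map="):]
    return summary


# Hypothetical file content; only the SYSTEM line comes from this thread.
sample = """\
# hypothetical role:user entry
SomeRole:someuser rw,map=Administrator
SYSTEM rw,map=localadminaccount
nouser
"""

print(summarize_users_local(sample))
```

Running it against the users.local of each application server should show at a glance whether both have the SYSTEM mapping.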
Just in case - please post the output of:
# blasadmin -a show fileserver name
# blasadmin -a show fileserver location
from both appservers.
The job normally runs on one of the two job servers on the corresponding app server (or do you have more than one job server configured per app server?). Is there any pattern in which job server/target combinations succeed and which fail?
Can you paste in the line from the rscd.log on the failed server that shows the failed job?
Also, you are using UPM, not AP, for the user mapping?
Back from a long Christmas break, so I can add a bit more information now.
I have looked into this a bit more, and I had forgotten that we added the following JVM arg to the application servers: -Duser.name=SYSTEM
I think we need this for the application servers to use the certificates correctly when we change all the managed servers to encryption_and_auth. Is this still the case?
So it looks like when an application server with the JVM arg talks to a remote server, it comes through as SYSTEM, but when it is running against itself it will use the role/account that is running the job.
This is the entry you see in the rscd log:
WARN rscd - IPADDRESS 2884 SYSTEM (SYSTEM): nsh: Failed to map user to local user
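To find every occurrence of this in a long rscd.log, a line like the one above can be matched with a small script. This is a rough sketch based only on the single line quoted here: the field names (client, pid, user, mapped) are my guess at what the columns mean, and real log lines may carry a timestamp prefix, which the pattern tolerates by searching rather than matching from the start.

```python
import re

# Pattern derived from the one log line quoted in this thread; the group
# names are assumptions about what each column means.
RSCD_LINE = re.compile(
    r"rscd - (?P<client>\S+) (?P<pid>\d+) (?P<user>\S+) "
    r"\((?P<mapped>[^)]*)\): (?P<message>.*)"
)


def parse_rscd_line(line):
    """Return the fields of an rscd.log line as a dict, or None if no match."""
    match = RSCD_LINE.search(line)
    return match.groupdict() if match else None


def is_system_mapping_failure(line):
    """True if the line shows SYSTEM failing to map to a local user."""
    entry = parse_rscd_line(line)
    return bool(
        entry
        and entry["user"] == "SYSTEM"
        and "Failed to map user" in entry["message"]
    )


line = "WARN rscd - IPADDRESS 2884 SYSTEM (SYSTEM): nsh: Failed to map user to local user"
print(parse_rscd_line(line))
print(is_system_mapping_failure(line))
```

Filtering the log on each application server with is_system_mapping_failure should show which runs arrived as SYSTEM and from which client address.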
I think I have answered my own question.
There was one job server without the -Duser.name=SYSTEM JVM arg configured. If the job ran on that job server, it would map through as the user running the job; otherwise it would map through as SYSTEM and fail (unless I add a SYSTEM entry to users.local).
Maybe I should start a new post, but ...
1. Is the JVM arg -Duser.name=SYSTEM required if you use certificate authentication, or is there another way?
2. If you configure 1., it seems that NSH jobs run as SYSTEM rather than the user/role the job is executing as. Is there a way round this apart from configuring the application servers to use an NSH proxy?