What's in the exports, users, and users.local files on the appservers?
Also, does each appserver access the file server via itself (localhost)? I'm assuming so, since you have /data mounted on each appserver.
Appservers (Linux) mount with these lines in /etc/hosts:
22.214.171.124 my.local.host.name blandapp001 blandfs
126.96.36.199 blandfs001.local.host.name blandfs001
and this one in /etc/fstab:
blandfs001:/var/data /data nfs defaults 1 3
rw,user=root (on unix)
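If it helps to confirm on each appserver that /data really is the NFS mount from blandfs001 and not a local directory, something like the sketch below could be run against the contents of /proc/mounts. The function name find_mount and the sample mount options are made up for illustration; only the source, mount point, and filesystem type come from the fstab line above.

```python
def find_mount(mounts_text, mount_point):
    """Return (source, fstype, options) for mount_point, or None if absent.

    Expects text in /proc/mounts format: source, mount point, fstype, options, ...
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: a later mount shadows an earlier one.
            found = (fields[0], fields[2], fields[3].split(","))
    return found


# Illustrative sample only; on a real appserver, read /proc/mounts instead.
sample = "blandfs001:/var/data /data nfs rw,relatime 0 0\n"
print(find_mount(sample, "/data"))
# ('blandfs001:/var/data', 'nfs', ['rw', 'relatime'])
```

A result of None, or a filesystem type other than nfs, on one appserver would suggest that box is writing to a local /data instead of the share.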
BL support has looked at these files before, and they don't seem problematic. It seems to depend on which (of many) appservers accesses and copies files to the fileserver first.
See, this is what I mean. This is a view, on the fileserver, of files created by an audit job run. I got it to work by going onto the fileserver, into /var/data (the partition that's served out via NFS), and setting the permissions on /var/data/tmp/ to 777, then running a job from a remote appserver. This worked, creating files with ownership reflecting their NFS origin (4294967294:4294967294). However, we have one appserver that seems to create files on the fileserver with ownership of bladmin:bladmin, which prevents the other appservers from writing there.
drwxrwxrwx 3 bladmin bladmin 4096 Jun 8 16:11 application_server
drwxr-xr-x 3 4294967294 4294967294 4096 Jun 8 16:11 2003100
drwxr-xr-x 3 4294967294 4294967294 4096 Jun 8 16:11 usr
drwxr-xr-x 3 4294967294 4294967294 4096 Jun 8 16:11 local
drwxr-xr-x 3 4294967294 4294967294 4096 Jun 8 16:11 nsh
drwxr-xr-x 3 4294967294 4294967294 4096 Jun 8 16:11 br
drwxr-xr-x 2 4294967294 4294967294 4096 Jun 8 16:11 bnp
-rwx------ 1 4294967294 4294967294 65536 Jun 8 16:11 audit_2016548.1_2000766.1_2deviceId_2022084.snp
-rwx------ 1 4294967294 4294967294 65536 Jun 8 16:11 audit_2016548.1_2000766.1_1deviceId_2022084.snp
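For what it's worth, the odd-looking 4294967294 owner in the listing above is just -2 stored as an unsigned 32-bit number, and -2 is the traditional "nobody" id that NFS uses for anonymous or squashed users. That reading is an inference from the number itself, not something confirmed on this setup. A quick sketch of the arithmetic (as_signed32 is a name made up here):

```python
def as_signed32(uid):
    """Reinterpret an unsigned 32-bit id as a signed 32-bit integer."""
    uid &= 0xFFFFFFFF
    return uid - 2**32 if uid >= 2**31 else uid


print(as_signed32(4294967294))  # -2
```

So the files that work are owned by the anonymous NFS user, while the ones that break things are owned by a real, mapped account (bladmin).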
One clarification: this /data mount point, what is in there? Is it the 'storage' location, or is it some shared space between all the appservers?
But the 'file server' in BladeLogic is blandfs, which is a local alias on each appserver, so when an appserver needs to save something to the 'file server', it's going to talk to the agent on itself and go to the /data/whatever directory on itself.
So there is no agent on blandfs001 that is accessed as the 'file server' in this process? (I'm just making sure we have the path of this operation right.)
Can you look at the exports, users.local, and users files on each of the appservers (since that's how we're accessing the 'file server') and post those? And is there anything in the appservers' rscd.log that shows any permission denials?
I think what's happening is that a user isn't getting mapped properly on the 'file server' and that's what's causing the permission denied message.
Right, there is no agent on blandfs001 - its function is to serve out /var/data on itself to be mounted via NFS as /data on the various appservers.
rscd.log on appservers doesn't show anything relevant to this problem anywhere I looked.
exports everywhere is:
users.local everywhere is:
users everywhere has these two entries, plus others:
I agree with your assessment that it's a mapping issue, but I don't see the problem anywhere. Right now everything is working, and unfortunately I'm not sure how it gets broken, though it happens fairly often. Thanks for your insights, anyway.
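One way to catch the moment it breaks might be to scan the shared tree periodically and flag anything not owned by the expected id. Below is a rough sketch, not a tested fix: the function names are invented, and using /var/data/tmp with 4294967294 as the expected owner is only an assumption based on the listing earlier in the thread.

```python
import os


def ownership_report(root):
    """Yield (path, uid, gid) for root and everything beneath it."""
    st = os.lstat(root)
    yield root, st.st_uid, st.st_gid
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: report symlinks themselves, not targets
            yield path, st.st_uid, st.st_gid


def unexpected_owners(root, expected_uid):
    """Return entries under root whose owner uid differs from expected_uid."""
    return [(p, u, g) for p, u, g in ownership_report(root) if u != expected_uid]


# Hypothetical usage on the fileserver (path and uid are assumptions):
# for path, uid, gid in unexpected_owners("/var/data/tmp", 4294967294):
#     print(uid, gid, path)
```

Comparing the timestamps of any flagged entries against the job logs could then point to the appserver that created them.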
For the failed job, can you see in the job log which appserver it was running on? It may be that only one is misconfigured.
And you're running this job as a user in the BLAdmins role that has a mapping in the users file?