(replying to own post is bad form, apologies)
This problem only occurs when your provisioning network is different from the one where your dhcp/appserv/tftp/pxe servers are. If you are local, this shouldn't come up because the default route isn't used.
Thanks for posting this. It will definitely come in handy.
I have run into an interesting multi-nic problem. We are provisioning Dell 2950s with 6 interfaces: two embedded broadcom (bnx2) and four intel (e1000).
Dell only gives you the option to pxe through the embedded NICs. So it netboots fine to the gentoo image; however, the gentoo image discovers the interfaces in the wrong order. The intel NICs (not connected) become eth0-3 and the broadcom become eth4/5. The bmi process insists on binding to eth0 and just hangs there.
How do you turn the interfaces off? My hack was to add "rmmod e1000" (unloading the intel nic driver), then have it do an "ip link set dev eth4 name eth0" (renaming eth4 to eth0). The bmi process starts up, binds, and used to get discovered. I say used to because it doesn't work anymore.
I wrote a bash script that runs through the interfaces, finds the one with an ip, and renames that one to eth0. Unfortunately, discovery is not happening anymore so I have no idea if it would work...
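For anyone who can't open the attachment, here is a rough sketch of that idea. It is not the attached extraConfig.sh. It assumes the iproute2 "ip" tool is in the boot image, that only one non-loopback interface got a DHCP lease, and that the name eth0 is free at that point. The function names are made up for illustration.

```shell
#!/bin/bash
# Sketch only, not the attached extraConfig.sh. Function names are hypothetical.

# Reads `ip -o -4 addr show` output on stdin and prints the first
# non-loopback interface name that carries an IPv4 address.
first_iface_with_ip() {
    awk '$2 != "lo" && $3 == "inet" { print $2; exit }'
}

# Renames that interface to eth0. Assumes the name eth0 is free, for
# example because the e1000 module was unloaded beforehand.
rename_to_eth0() {
    local nic
    nic=$(ip -o -4 addr show | first_iface_with_ip)
    if [ -z "$nic" ]; then
        echo "no interface has an IPv4 address" >&2
        return 1
    fi
    if [ "$nic" = "eth0" ]; then
        return 0
    fi
    # The link usually has to be down before it can be renamed. Taking it
    # down drops its routes, so DHCP may need to run again afterwards.
    ip link set dev "$nic" down
    ip link set dev "$nic" name eth0
    ip link set dev eth0 up
}

# Usage inside the boot image, before bmi starts:
#   rename_to_eth0
```

Keeping the parsing in its own function means it can be checked against saved "ip" output without touching a real interface.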
Thanks in advance..
PS - I attached the script I was using that may or may not work.
extraConfig.sh 887 bytes
The way I approached it on the LS41, which has 4 NICs, was to modify the script that generates the boot image (mkgen2img.sh I think) so that it changes the startup.sh script that's in the image to disable anything but eth0. I cheated though, and just added (without checking for existence)
echo ' ifconfig eth1 down' >> $SETUPFILE
echo ' ifconfig eth2 down' >> $SETUPFILE
echo ' ifconfig eth3 down' >> $SETUPFILE
this is done before bmi is called, so all it knows is that there is one interface.
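If you want the existence check I skipped, something along these lines could run at boot instead of the hard-coded lines. It is only a sketch: it assumes sysfs is mounted at /sys in the image and that ifconfig is available, and the function names are invented here.

```shell
#!/bin/bash
# Sketch only. Function names are hypothetical.

# Prints every interface name from the argument list except lo and the
# one to keep (first argument).
ifaces_to_disable() {
    local keep=$1
    shift
    local nic
    for nic in "$@"; do
        case $nic in
            lo|"$keep") ;;          # leave loopback and the kept NIC alone
            *) echo "$nic" ;;
        esac
    done
}

# Brings down every interface the kernel actually created, except eth0.
disable_extra_ifaces() {
    local nic
    for nic in $(ifaces_to_disable eth0 $(ls /sys/class/net)); do
        ifconfig "$nic" down
    done
}

# Usage, before bmi is called:
#   disable_extra_ifaces
```

Because the list comes from /sys/class/net, it only touches interfaces that exist, whether the box has two NICs or six.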
You may want to do this earlier .. maybe make it so the network start script (I don't know what it's called in Gentoo, but the equiv to /etc/init.d/network in redhat) doesn't even start the interfaces, so that it only does DHCP on the first interface ever.
I find gentoo very frustrating, because it's different than redhat/SLES, which are the 2 distros I know best, so I'm kind of shooting in the dark. I think the best approach is going to be to modify the boot image.
You could probably pull the code to open that boot image up from the scripts provided to make the image - once you've opened it up, it would be pretty easy to make a custom image that would work perfectly for your network configs.
Yeah, I did the same basic thing to the image creation script and added the rmmod and such. I actually created that extraConfig.sh and just added a couple lines that copy it into the /opt/BladeLogic squashfs file system and just had the setup.sh run the extraConfig.sh.
Thing is that the script works. It finds the interface and renames it. It appears that bmi is working. The DHCP discover works, it has the app server ip and port. But then drops to the command prompt as soon as it runs. No other logging or anything is there to determine what actually happened...
I also agree with your gentoo comment. I too am most familiar with RH/CentOS/SLES and even Ubuntu, to some degree.
Very frustrating. I just want it to discover the damned thing again.
At the command prompt after bmi fails, when you do an ifconfig do you just have one interface?
if you do a route -n, just one default?
When bmi runs, what step does it say it's on?
How does the WinPE image handle this system?
Is there anything in the appserver log? if you do a tcpdump/wireshark cap, can you see what it's doing? I've seen a couple cases where it looks like it's trying to do stuff, but packets aren't actually leaving the box, it's just sitting there spinning away.
for the appserver log, if you have a quiet environment, you can turn on debugging, which is ridiculous output wise, but you might see something.
I think my approach would be to wireshark it and see if you can see what's happening.
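As a rough sketch of the route check and the capture: the helper below counts default routes in "route -n" output, and the comments show a capture command. The function name, the APPSERVER variable and the /tmp/bmi.pcap path are placeholders, and it assumes tcpdump is available on whichever box you capture from.

```shell
#!/bin/bash
# Sketch only. count_default_routes is a hypothetical helper.

# Reads `route -n` output on stdin and prints how many default routes
# (destination 0.0.0.0 with genmask 0.0.0.0) it contains.
count_default_routes() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" { n++ } END { print n + 0 }'
}

# Usage at the prompt after bmi fails:
#   route -n | count_default_routes
#
# Capture traffic to and from the app server for later viewing in wireshark
# (APPSERVER and the output path are placeholders):
#   tcpdump -n -i eth0 -w /tmp/bmi.pcap host "$APPSERVER"
```

Anything other than 1 from the route check would point at the multi-nic setup rather than at bmi itself.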
I thought I'd read somewhere the broadcoms (or dell) will switch the nics around automagically. maybe try new firmware?
It is not only the interface naming. The extraConfig.sh does everything it's supposed to do. The interface eth4 becomes eth0, and everything works. A tcpdump shows activity between the appserver and the bare metal server.
I get the DHCPDISCOVER and the bl_serv_addr and port, as well as the starting comm and comm: to connect.
Then it drops to the command prompt.
So at that point everything is as it should be. Still no joy. That is until I unload the e1000 kernel module.
Then exit and it discovers.
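To do that unload automatically before bmi starts, something like the following might work. It is a sketch under the assumption that /proc/modules is readable in the image and that nothing you need is attached to an e1000 port; the function names are invented here.

```shell
#!/bin/bash
# Sketch only. Function names are hypothetical.

# Succeeds if module $1 appears in /proc/modules-format text on stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Unloads the intel driver only when it is actually loaded.
unload_e1000() {
    if module_loaded e1000 < /proc/modules; then
        rmmod e1000
    fi
}

# Usage, before bmi is called:
#   unload_e1000
```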
I hate Dell...
I am getting a similar situation for IBM Blade servers with a WinPE 2 image - any idea how to disable a nic in WinPE or change the configuration so BMI works?
firstly, all the above posts are specific to the gentoo boot image. all of the issues mentioned above have been resolved through various updates and changes to the setup scripts which are available via support.
secondly, we suggested last night that you start by updating your provisioning files with the v5 revision. have your PE images been re-generated yet using these updates?
Just a heads up, the 7.4.2 provisioning files have not been updated like the 7.4.1 files. I got them last week and they only stop and start eth0.
correct. however, the GA versions of the provisioning files in 7.4.1 and 7.4.2 are mostly the same. the external updates will apply for both releases. the changes included in those revisions will be included in the next release.
um.. ok, so which set of provisioning files is the most up to date and includes all the fixes?
pick a set of GA files, update it with the rev 5 update which you can get by kindly asking support.