21 Replies Latest reply on May 28, 2009 6:58 PM by Stephen Moss

    7.4 BMI Failure on multi-nic server

      I've run into this on 7.4: when booting into the Gentoo images, BMI hangs at "comm: to connect". It then fails with "could not connect to (appserver IP). EC: 110. ES: Connection timed outcould not connect to blademetal server!"


      After some poking, I discovered that more than one default route is being added (check "route -n"). It appears that when Gentoo brings up the interfaces, it adds 2 default routes (and 2 network routes as well).
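The "route -n" check above can be scripted. This is only a sketch: `count_default_routes` is a hypothetical helper name, and it assumes the usual `route -n` column layout, where a default route has 0.0.0.0 in both the Destination and Genmask columns.

```shell
# Count default routes in `route -n` style output read from stdin.
# Assumption: a default route has 0.0.0.0 as Destination ($1) and Genmask ($3).
count_default_routes() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" { n++ } END { print n + 0 }'
}

# Usage on a live system (not executed here):
#   route -n | count_default_routes
```

Anything greater than 1 points at the situation described in this post.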


      On an individual box, the easiest way to get around this is to break out of BMI (or wait for it to fail), shut down the other interface(s), and then run BMI again.


      For a longer-term solution, I modified my mkgen2img.sh script to include a line that shuts down the interface before BMI is run. I've tested both the 32- and 64-bit versions and my hack seems to work without problems. On servers with more NICs (e.g. the LS41, which has 4), a smarter script might be needed.


      Just wanted to share that with other forum users in case they run into the same problem.

        • 1. Re: 7.4 BMI Failure on multi-nic server

          (replying to own post is bad form, apologies)


          This problem only occurs when your provisioning network is different from the one where your dhcp/appserv/tftp/pxe are. If you are local, this shouldn't come up because the default route isn't used.

          • 2. Re: 7.4 BMI Failure on multi-nic server

            Thanks for posting this. It will definitely come in handy.

            • 3. Re: 7.4 BMI Failure on multi-nic server

              I have run into an interesting multi-nic problem. We are provisioning Dell 2950's with 6 interfaces. Two embedded broadcom (bnx2) and four intel (e1000).


              Dell only gives you the option to PXE through the embedded NICs. So it netboots fine to the Gentoo image; however, the Gentoo image discovers the interfaces in the wrong order. The Intel NICs (not connected) become eth0-3 and the Broadcom NICs become eth4/5. The bmi process insists on binding to eth0 and just hangs there.


              How do you turn the interfaces off? My hack was to add "rmmod e1000" (unloading the Intel NIC driver), then having it do an "ip link set dev eth4 name eth0" (renaming eth4 to eth0). The bmi process starts up, binds, and used to get discovered. I say used to because it doesn't work anymore.


              I wrote a bash script that runs through the interfaces, finds the one with an IP, and renames that one to eth0. Unfortunately, discovery is not happening anymore, so I have no idea if it would work...
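The attached script is not reproduced in this thread, so the following is only a rough sketch of the idea described above, not the poster's actual script. The helper names `first_iface_with_ip` and `rename_command` are made up for illustration, and the parsing assumes the one-line output format of `ip -o -4 addr show` (interface name in the second column, `inet` in the third). It prints the rename command instead of running it.

```shell
# Print the first non-loopback interface that has an IPv4 address.
# Expects `ip -o -4 addr show` style lines on stdin (assumed format:
# "<index>: <iface>    inet <addr>/<prefix> ...").
first_iface_with_ip() {
    awk '$2 != "lo" && $3 == "inet" { print $2; exit }'
}

# Print the rename command from the post for the given interface.
# Prints nothing if the interface is already called eth0.
rename_command() {
    if [ "$1" != "eth0" ]; then
        printf 'ip link set dev %s name eth0\n' "$1"
    fi
}

# Usage inside the boot image (not executed here):
#   iface="$(ip -o -4 addr show | first_iface_with_ip)"
#   rename_command "$iface"
```

The rename can only succeed if no other interface currently holds the name eth0, which is what the "rmmod e1000" step takes care of in the hack described above.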


              Any ideas?


              Thanks in advance..




              PS - I attached the script I was using that may or may not work.

              • 4. Re: 7.4 BMI Failure on multi-nic server

                The way I approached it on the LS41, which has 4 NICs, was to modify the script that generates the boot image (mkgen2img.sh, I think) so that it changes the startup.sh script that's in the image to disable anything but eth0. I cheated though, and just added (without checking for existence)


                echo ' ifconfig eth1 down' >> $SETUPFILE

                echo ' ifconfig eth2 down' >> $SETUPFILE

                echo ' ifconfig eth3 down' >> $SETUPFILE


                this is done before bmi is called, so all it knows is that there is one interface.
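A variant that does check which interfaces exist could look like the sketch below. `emit_down_commands` is a hypothetical helper, not part of the provisioning files; it assumes the interface names are fed in on stdin, for example from /sys/class/net, which in turn assumes sysfs is mounted in the boot image.

```shell
# Print an "ifconfig <iface> down" line for every interface name read from
# stdin, skipping loopback and the interface to keep (default: eth0).
emit_down_commands() {
    keep="${1:-eth0}"
    while read -r iface; do
        if [ -z "$iface" ] || [ "$iface" = "lo" ] || [ "$iface" = "$keep" ]; then
            continue
        fi
        printf 'ifconfig %s down\n' "$iface"
    done
}

# Usage inside the image, before bmi is called (not executed here):
#   ls /sys/class/net | emit_down_commands eth0 | sh
```

Because the list is built at boot time, the same lines should work on a 2-NIC box and a 4-NIC LS41 without hard-coding eth1 through eth3.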


                You may want to do this earlier .. maybe make it so the network start script (I don't know what it's called in Gentoo, but the equiv to /etc/init.d/network in redhat) doesn't even start the interfaces, so that it only does DHCP on the first interface ever.


                I find gentoo very frustrating, because it's different than redhat/SLES, which are the 2 distros I know best, so I'm kind of shooting in the dark. I think the best approach is going to be to modify the boot image.


                You could probably pull the code to open that boot image up from the scripts provided to make the image. Once you've opened it up, it would be pretty easy to make a custom image that would work perfectly for your network configs.



                • 5. Re: 7.4 BMI Failure on multi-nic server

                  Yeah, I did the same basic thing to the image creation script and added the rmmod and such. I actually created that extraConfig.sh and just added a couple lines that copy it into the /opt/BladeLogic squashfs file system and just had the setup.sh run the extraConfig.sh.


                  Thing is that the script works. It finds the interface and renames it. It appears that bmi is working. The DHCP discover works, and it has the app server IP and port. But then it drops to the command prompt as soon as it runs. There is no other logging or anything to determine what actually happened...


                  I also agree with your gentoo comment. I too am most familiar with RH/CentOS/SLES and even Ubuntu, to some degree.


                  Very frustrating. I just want it to discover the damned thing again.

                  • 6. Re: 7.4 BMI Failure on multi-nic server

                    At the command prompt after bmi fails, when you do an ifconfig do you just have one interface?


                    if you do a route -n, just one default?


                    When bmi runs, what step does it say it's on?


                    How does the WinPE image handle this system?


                    Is there anything in the appserver log? if you do a tcpdump/wireshark cap, can you see what it's doing? I've seen a couple cases where it looks like it's trying to do stuff, but packets aren't actually leaving the box, it's just sitting there spinning away.


                    for the appserver log, if you have a quiet environment, you can turn on debugging, which is ridiculous output wise, but you might see something.


                    I think my approach would be to wireshark it and see if you can see what's happening.
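For the capture, a minimal sketch of a tcpdump invocation restricted to traffic to and from the appserver is shown below. `capture_command` is a hypothetical helper that only prints the command; the interface name and IP address in the usage line are placeholders, and no port filter is included because the thread does not state which port is in use.

```shell
# Print a tcpdump command that captures traffic between this box and the
# appserver on one interface. -n disables name resolution, -i selects the
# interface, and "host <ip>" is a standard capture filter.
capture_command() {
    iface="$1"
    appserver_ip="$2"
    printf 'tcpdump -n -i %s host %s\n' "$iface" "$appserver_ip"
}

# Usage (placeholder values, not executed here):
#   $(capture_command eth0 192.0.2.10)
```

If the output stays empty while bmi claims to be connecting, the packets are probably not leaving the box, which matches the "spinning away" case mentioned above.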



                    • 7. Re: 7.4 BMI Failure on multi-nic server
                      Bill Robinson

                      I thought I'd read somewhere the broadcoms (or dell) will switch the nics around automagically. maybe try new firmware?

                      • 8. Re: 7.4 BMI Failure on multi-nic server

                        It is not only the interface naming. The extraConfig.sh does everything it's supposed to do. The interface eth4 becomes eth0, and everything works. A tcpdump shows activity between the appserver and the bare metal server.


                        I get the DHCPDISCOVER and the bl_serv_addr and port, as well as the starting comm and comm: to connect.


                        Then it drops to the command prompt.


                        So at that point everything is as it should be. Still no joy. That is until I unload the e1000 kernel module.


                        Then exit and it discovers.


                        I hate Dell...

                        • 9. Re: 7.4 BMI Failure on multi-nic server

                          I am getting a similar situation for IBM Blade servers with a WinPE 2 image - any idea how to disable a nic in WinPE or change the configuration so BMI works?

                          • 10. Re: 7.4 BMI Failure on multi-nic server

                            firstly, all the above posts are specific to the gentoo boot image. all of the issues mentioned above have been resolved through various updates and changes to the setup scripts which are available via support.


                            secondly, we suggested last night that you start by updating your provisioning files with the v5 revision. have your PE images been re-generated yet using these updates?

                            • 11. Re: 7.4 BMI Failure on multi-nic server

                              Just a heads up, the 7.4.2 provisioning files have not been updated like the 7.4.1 files. I got them last week and they only stop and start eth0.

                              • 12. Re: 7.4 BMI Failure on multi-nic server

                                correct. however, the GA versions of the provisioning files in 7.4.1 and 7.4.2 are mostly the same. the external updates apply to both releases. the changes in those revisions will be included in the next release.

                                • 13. Re: 7.4 BMI Failure on multi-nic server
                                  Bill Robinson

                                  um..ok, so which set of provisioning files are the most up to date and include all the fixes ?

                                  • 14. Re: 7.4 BMI Failure on multi-nic server

                                    pick a set of GA files, update it with the rev 5 update which you can get by kindly asking support.
