4 Replies Latest reply on Jun 1, 2010 3:43 PM by Chris NameToUpdate

    Linux Provisioning Hangs

      I was able to get Linux provisioning to work in my VM environment using patch 212.  I'm moving to the Production environment and have run into a new failure that I can't figure out.  Standard Disclaimer:  This environment is capable of provisioning Windows machines.


      Steps to reproduce:

      1. Boot target device using Windows boot files.
      2. Select the target device in PM and choose "Provision".
      3. Go through the System Package wizard and click "Finish".
      4. Target device gets the message to reboot.
      5. Target device reboots and grabs my OEL54/vmlinuz and initrd.img files.
      6. Target device displays "Ready" and hangs.





      The device appears to be stuck in some kind of loop.  I've attached a graph of the CPU usage.  NOTE:  This VM had 2 CPUs.  I think the graph pegs at 50% usage because 1 of those CPUs is maxed.  When I run this same test on a VM with 1 CPU, the graph pegs at 100%.  The dip in CPU usage you see once it flatlines is cyclic and repeats at that interval indefinitely.
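
To sanity-check the 50% reading, here is a minimal sketch of the arithmetic. It assumes the graph reports usage averaged across all virtual CPUs, which I have not confirmed for this graphing tool. The function name `aggregate_cpu_percent` is made up for illustration.

```python
def aggregate_cpu_percent(busy_cpus, total_cpus):
    """Average utilisation when `busy_cpus` cores sit at 100% and the rest are idle.

    Assumption: the graph plots the mean across all vCPUs.
    """
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")
    return 100.0 * busy_cpus / total_cpus


# One spinning core on the 2-CPU VM, then on the 1-CPU VM.
print(aggregate_cpu_percent(1, 2))
print(aggregate_cpu_percent(1, 1))
```

If that assumption holds, both readings fit a single thread spinning in a tight loop rather than a load that spreads across cores.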





      I went back to my VM and provisioned a new machine just to see what happens after the "Ready." prompt in the screenshot above.  The very next message is "Kernel is alive!" and the boot process then continues on to the HTTP access of the kickstart files.


      I took a peek at my logs and found something odd in the appserver.log.


      [27 May 2010 20:11:45,611] [SSL-Connections-Thread-5] [INFO] [Anonymous:Anonymous:] [Client] Connection disconnecting: id = 2926
      [27 May 2010 20:11:46,861] [SSL-Connections-Thread-2] [INFO] [Anonymous:Anonymous:] [Client] STATE received from remote machine: 0
      [27 May 2010 20:11:46,861] [SSL-Connections-Thread-2] [WARN] [Anonymous:Anonymous:] [Client] received a set state to discovered from device: 00-50-56-82-6F-F6. This should not be happening. Will process it, but get next state will return disk cleanup instead of biosinfo in this instance
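
In case it helps anyone searching their own logs for the same warning, here is a rough sketch of a filter that pulls WARN entries for a given MAC address. The field layout is inferred only from the three lines above, so other appserver.log entries may not match it. The names `parse_line` and `warnings_for_device` are my own, not part of any product tooling.

```python
import re

# Bracketed fields as seen in the sample lines:
# [timestamp] [thread] [level] [user] [component] message
# Assumption: every entry of interest follows this layout.
LOG_LINE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<thread>[^\]]+)\] "
    r"\[(?P<level>[^\]]+)\] "
    r"\[(?P<user>[^\]]*)\] "
    r"\[(?P<component>[^\]]+)\] "
    r"(?P<message>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def warnings_for_device(lines, mac):
    """Collect parsed WARN entries whose message mentions the given MAC address."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "WARN" and mac in entry["message"]:
            hits.append(entry)
    return hits
```

Typical use would be something like `warnings_for_device(open("appserver.log"), "00-50-56-82-6F-F6")`, where the file path is a placeholder for wherever the log lives in your install.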


      Everything else in my logs (appserver, pxe, tftp) appears to be normal.  I've pulled pertinent portions from each of those logs and can post them here if required.


      Has anyone run into this, or does anyone have an idea where to start looking for a fix?