Friday, February 18, 2011

RHEL 2.1 P2V Fails at 99%

I have a physical RHEL 2.1 box that I am trying to P2V to an ESXi host using VMware vCenter Converter Standalone 4.3, and the conversion fails at 98% and 99%. The log shows the following entries:

[#2] [2011-02-11 16:45:39.209 07840 info 'task-1'] Worker CloneTask updates, state: 1, percentage: 98, xfer rate (Bps): 8817664
[#2] [2011-02-11 16:46:04.379 05412 info 'task-2'] Remote Helper VM is reconfiguring, data clone is finished
[#2] [2011-02-11 16:46:04.379 05412 info 'task-2'] Volume-based cloning dlinux5.XX.XXX--> dlinux5 updates, state: 1, percentage: 99, xfer rate (Bps): 8817664
[#2] [2011-02-11 16:46:04.379 05412 info 'task-2'] CloneTask updates, state: 1, percentage: 99, xfer rate (Bps): 8817664
[#2] [2011-02-11 16:46:04.379 07840 info 'task-1'] WorkerCloneTask: Remote Helper VM is reconfiguring, data clone is finished
[#2] [2011-02-11 16:46:04.379 07840 info 'task-1'] Worker CloneTask updates, state: 1, percentage: 99, xfer rate (Bps): 8817664
[#2] [2011-02-11 16:46:16.489 05412 info 'task-2'] Volume-based cloning dlinux5.XX.XXX--> dlinux5 updates, state: 4, percentage: 99, xfer rate (Bps): 8817664
[#2] [2011-02-11 16:46:16.489 05412 info 'task-2'] Generating helperVM task bundle for task with id="task-1".
[#2] [2011-02-11 16:46:16.519 05412 info 'task-2'] Retrieving helper VM log bundle to "C:\Windows\TEMP\vmware-temp\vmware-SYSTEM\helperVM-task-1-sljnzvkf.zip".
[#2] [2011-02-11 16:46:16.619 05412 info 'task-2'] Bundle successfully retrieved to "C:\Windows\TEMP\vmware-temp\vmware-SYSTEM\helperVM-task-1-sljnzvkf.zip".
[#2] [2011-02-11 16:46:16.619 05412 info 'task-2'] powering off vm after linux p2v ...
[#2] [2011-02-11 16:46:16.619 05412 info 'task-2'] Reusing existing VIM connection to XXXXXXesx2.XX.XXX
[#2] [2011-02-11 16:46:18.099 05412 info 'task-2'] power off vm succeeded
[#2] [2011-02-11 16:46:18.099 05412 info 'task-2'] Reusing existing VIM connection to XXXXXXesx2.XX.XXX
[#2] [2011-02-11 16:46:18.579 05412 info 'task-2'] successfully reconfigured target vm
[#2] [2011-02-11 16:46:18.579 05412 info 'task-2'] CloneTask updates, state: 4, percentage: 99, xfer rate (Bps): 8817664
[#2] [2011-02-11 16:46:18.579 05412 info 'task-2'] CloneTask failed
[#2] [2011-02-11 16:46:18.579 05412 error 'App'] Task failed:

And earlier in the log:
[#2] [2011-02-11 15:44:34.036 05412 error 'App'] Found dangling SSL error: [0] error:00000001:lib(0):func(0):reason(1)

And again:
[#1] [2011-02-11 15:43:08.256 03284 info 'App'] [,0] Partition:Invalid sector magic number.
[#1] [2011-02-11 15:43:08.266 03284 info 'App'] [,0] Disk number 1 has been skipped because of errors while reading partition table
[#1] [2011-02-11 15:43:08.276 03284 info 'App'] [,0] Partition:Invalid sector magic number.
[#1] [2011-02-11 15:43:08.276 03284 info 'App'] [,0] Disk number 2 has been skipped because of errors while reading partition table
[#1] [2011-02-11 15:43:08.276 03284 info 'App'] [,0] Partition:Invalid sector magic number.
[#1] [2011-02-11 15:43:08.276 03284 info 'App'] [,0] Disk number 1 has been skipped because of errors while reading dynamic disks header or LDM database is corrupted
[#1] [2011-02-11 15:43:08.276 03284 info 'App'] [,0] Partition:Invalid sector magic number.
[#1] [2011-02-11 15:43:08.276 03284 info 'App'] [,0] Disk number 2 has been skipped because of errors while reading dynamic disks header or LDM database is corrupted
[#1] [2011-02-11 15:43:08.286 03284 info 'App'] [,0] Partition:Invalid sector magic number.
[#1] [2011-02-11 15:43:08.296 03284 info 'App'] [,0] Partition:Invalid sector magic number.
[#1] [2011-02-11 15:43:08.296 03284 warning 'App'] [MoveActiveDiskIfNeeded] GetFirstBootDisk failed, mntapi error: 176

The great thing about having a VM is that you can snapshot it. DO IT. I destroyed around a dozen copies of my VM experimenting with this before I got it right.

I had already run fsck manually against the physical machine before the P2V attempts, so I am not sure what went wrong. When I power up the virtual machine I get a nice “Error Loading Operating System”. That looks like a boot loader issue; possibly something went bad while the VM reconfiguration job was running. No worries: I just dropped in a Fedora 13 DVD and selected the “Rescue Installed System” option. Follow the prompts (networking is not required) and then run “chroot /mnt/sysimage”. From there I can see the data on my VM and it all looks good. Now I go to /boot/grub and notice that I only have 4 files:
1. grub.conf
2. device.map
3. menu.lst
4. splash.xpm.gz
It looks like the boot loader files are gone, so I need to recreate them. First, figure out what your partitions look like using the “fdisk -l” command. Then, using that information, open the GRUB shell by typing “grub”.

Once inside the grub shell, type “root”. If you have the same issue that I had, you should see the following returned value: “(fd0) : filesystem type unknown, partition type 0x0”. We need to change that to our existing boot partition, which in most cases is (hd0,0). GRUB counts disks and partitions from 0, so this is the first partition on the first disk. To do that, type “root (hd0,0)” and hit Enter. If you get lost, you can hit Tab to see your options. Then install the boot loader by typing “setup (hd0,0)”. You should see it run and print some output. After it completes you can “quit” and then do an “ls” on the /boot/grub directory; you should see that grub has added several files.
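If you would rather not type these at the grub prompt, the same root/setup sequence can be fed to the grub shell on standard input. This is a minimal sketch with several assumptions:

* It assumes GRUB legacy, matching the grub.conf/menu.lst layout listed above.
* It assumes the installed grub accepts the --batch option.
* It assumes you are inside the chroot.
* missing_stage_files and grub_commands are hypothetical helper names.

The optional second argument picks the install target. setup (hd0,0), as used above, writes GRUB to the boot sector of that partition. setup (hd0) would write it to the disk's MBR instead.

```shell
#!/bin/sh
# Sketch only: check for the GRUB legacy stage files and build the grub shell
# commands for a non-interactive run. Helper names are hypothetical.

# Print the core GRUB legacy stage files that are missing from a directory.
missing_stage_files() {
    dir=$1
    for f in stage1 stage2; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Print the grub shell commands. Defaults: /boot is (hd0,0) and GRUB is
# installed to that same partition's boot sector, as in the steps above.
# GRUB counts from 0, so (hd0,0) is the first partition on the first disk.
grub_commands() {
    boot_part=$1
    [ -n "$boot_part" ] || boot_part='(hd0,0)'
    install_dev=$2
    [ -n "$install_dev" ] || install_dev=$boot_part
    printf 'root %s\nsetup %s\nquit\n' "$boot_part" "$install_dev"
}

# Example (inside the chroot, assuming grub supports --batch):
#   missing_stage_files /boot/grub
#   fdisk -l
#   grub_commands '(hd0,0)' | grub --batch
```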

Now you might need to convert the Linux configuration from IDE to SCSI device names. To do this, modify all entries of the form “hdXY”, where X is a letter starting with a and Y is a number starting with 1, so that they become sdXY. For example, hda1 becomes sda1. device.map lists whole disks with no partition number (for example /dev/hda), and those change the same way. You need to do this in the following files:
1. /boot/grub/grub.conf
2. /boot/grub/device.map
3. /etc/fstab
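Note: your second drive's partitions may show up in /etc/fstab as hdc1 instead of hdb1, as I would have expected. This will cause errors on boot unless you change them to sdb1, not sdc1. The likely cause is that IDE names follow the cable position rather than the order of the disks:

* hda is the primary master.
* hdb is the primary slave.
* hdc is the secondary master.

A second disk attached as secondary master is therefore hdc. SCSI disks, by contrast, are named in the order they are detected, so the VM's second disk becomes sdb.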
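The hdXY to sdXY renaming can be scripted as well. This is a minimal sketch with these assumptions:

* The files refer to disks by /dev/hdX paths. Entries written as LABEL=... need no change.
* sed is available in the chroot.
* ide_to_scsi is a hypothetical helper. It keeps the original of each file as a .ide backup.

It avoids sed -i, which older versions of sed may not support. Watch out for any /dev/hdX entry that is really a CD-ROM drive.

```shell
#!/bin/sh
# Sketch only: rename one IDE disk (/dev/hdX) to a SCSI disk (/dev/sdY) in the
# given files. ide_to_scsi is a hypothetical helper; check the results by hand.
ide_to_scsi() {
    old=$1   # IDE drive letter, e.g. c for hdc
    new=$2   # SCSI drive letter, e.g. b for sdb
    shift 2
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping missing file: $f" >&2
            continue
        fi
        # Keep the original once; later calls on the same file do not overwrite it
        [ -e "$f.ide" ] || cp -p "$f" "$f.ide" || return 1
        tmp="$f.tmp.$$"
        # /dev/hda1 -> /dev/sda1, and a bare /dev/hda (device.map) -> /dev/sda.
        # GRUB names such as (hd0) or (hd0,0) do not match and are left alone.
        sed "s|/dev/hd$old|/dev/sd$new|g" "$f" > "$tmp" || return 1
        # Copy back over the original so its owner and mode are kept
        cat "$tmp" > "$f" && rm -f "$tmp"
    done
}

# Example for the layout in this post (first disk hda, second disk hdc):
#   ide_to_scsi a a /boot/grub/grub.conf /boot/grub/device.map /etc/fstab
#   ide_to_scsi c b /etc/fstab
```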

Now you need to add SCSI support. Open the /etc/modules.conf file and edit the following entries:
Find the ‘alias ethX module’ entries and change the module to “pcnet32”. That is the driver for the AMD PCnet (vlance) virtual network adapter that the VM will use in ESX(i).
Now find ‘alias scsi_hostadapter’ and if it does not exist add the following:
alias scsi_hostadapter BusLogic
For multiple controllers, add one numbered line per additional controller, for example:
alias scsi_hostadapter1 BusLogic
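The modules.conf edits above can also be applied with a small script. This is a minimal sketch with these assumptions:

* It assumes a Red Hat style /etc/modules.conf with one alias per line.
* fix_modules_conf is a hypothetical helper that keeps a .p2v backup.
* It also replaces the driver on an existing scsi_hostadapter line, for example one left over from the physical RAID controller. That is an assumption beyond the steps above, which only add the line when it is missing.

```shell
#!/bin/sh
# Sketch only: point every "alias ethN ..." line at pcnet32 and make sure an
# "alias scsi_hostadapter BusLogic" line exists. Hypothetical helper; replacing
# an existing scsi_hostadapter driver is an assumption, not a step from above.
fix_modules_conf() {
    f=$1
    [ -e "$f.p2v" ] || cp -p "$f" "$f.p2v" || return 1
    tmp="$f.tmp.$$"
    # Rewrite the module name on ethN and scsi_hostadapter alias lines;
    # numbered aliases such as scsi_hostadapter1 are left untouched
    sed -e 's/^\(alias[[:space:]]\{1,\}eth[0-9]\{1,\}\)[[:space:]].*$/\1 pcnet32/' \
        -e 's/^\(alias[[:space:]]\{1,\}scsi_hostadapter\)[[:space:]].*$/\1 BusLogic/' \
        "$f" > "$tmp" || return 1
    if ! grep -q '^alias[[:space:]]\{1,\}scsi_hostadapter[[:space:]]' "$tmp"; then
        # Make sure the file ends with a newline before appending a line
        if [ -n "$(tail -c 1 "$tmp")" ]; then
            echo >> "$tmp"
        fi
        echo 'alias scsi_hostadapter BusLogic' >> "$tmp"
    fi
    # Copy back over the original so its owner and mode are kept
    cat "$tmp" > "$f" && rm -f "$tmp"
}

# Example (inside the chroot):
#   fix_modules_conf /etc/modules.conf
```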

After this you need to rebuild the initial ramdisk (initrd) image. Locate the initrd .img file in the /boot directory and make a note of the file name. Now run the following command, where X and Y stand for the actual values on your system:
mkinitrd -v -f /boot/initrd-X.X.X-Y.X.X.img X.X.X-Y.X.X
The -f option overwrites the existing image, and -v prints what is being added. For mine the actual command was:
mkinitrd -v -f /boot/initrd-2.4.9.0-e.34.img 2.4.9.0-e.34
After that runs you should be able to reboot and it should work.
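You are booted from the Fedora rescue DVD, so uname -r reports the rescue kernel rather than the RHEL one. That is why the kernel version has to be passed to mkinitrd explicitly. This is a minimal sketch with these assumptions:

* Red Hat's mkinitrd is run inside the chroot.
* initrd_version and rebuild_initrd are hypothetical helpers. They take the version from the initrd file name and check that a matching /lib/modules directory exists before running the command above.

As far as I know, Red Hat's mkinitrd also reads the scsi_hostadapter aliases from /etc/modules.conf to decide which SCSI drivers to include, so run it after that edit.

```shell
#!/bin/sh
# Sketch only: rebuild an initrd using the version string from its file name.
# initrd_version and rebuild_initrd are hypothetical helpers.

# /boot/initrd-2.4.9.0-e.34.img -> 2.4.9.0-e.34
initrd_version() {
    v=$(basename "$1")
    v=${v#initrd-}
    echo "${v%.img}"
}

rebuild_initrd() {
    img=$1
    ver=$(initrd_version "$img")
    # mkinitrd looks up modules in /lib/modules/<version>, so the version
    # string must match that directory name exactly
    if [ ! -d "/lib/modules/$ver" ]; then
        echo "no /lib/modules/$ver; check the version string" >&2
        return 1
    fi
    # -f overwrites the existing image, -v lists what goes into it
    mkinitrd -v -f "$img" "$ver"
}

# Example (inside the chroot); repeat for each initrd referenced in grub.conf:
#   ls /boot/initrd-*.img
#   rebuild_initrd /boot/initrd-2.4.9.0-e.34.img
```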

Thanks to everybody who posted bits and pieces of these fixes on various blogs and in papers. I hope this helps somebody else avoid all the research and tweaking that I had to do.
