Monday, January 12, 2009

Find and Fix the Missing eth0 Device on your Ubuntu Clones

Introduction

Creating VM templates in VMware ESX/VI3+ that work immediately after you clone them can be challenging. VMware provides some built-in support for creating Windows templates, and Microsoft provides the SYSPREP utility to facilitate the process, but there are few, if any, standard tools available to help with the task in a *nix environment.

This article documents my response to an annoying snag I encountered while using a self-made ESX/VI3 VM template based on the Ubuntu Server operating system.

My template was based on the 64-bit Ubuntu 8.04.1 LTS edition downloaded from the Ubuntu site. I built a standard basic installation of the base operating system components. Nothing fancy. After installing the OS and turning the VM into a template, I deployed a clone from the template. I have since tested the fix described below on Ubuntu 9.04 and 9.10 as well, and it works as expected.

After the clone booted up, it worked as expected in all areas except networking. The eth0 interface was not visible using the "ifconfig" or "ip address show" commands, and a new eth1 interface had been created and was only visible using the "ip address show" command. The /etc/network/interfaces file only had a configuration for the eth0 interface and assigning it valid IP information did not allow me to bring up either interface. I was dead in the water with no network access.

Problem

The snag occurs on cloned VMs deployed from the template VM because the MAC address bound to the eth0 interface in each clone's udev network configuration file conflicts with the MAC address recorded in the .vmx configuration file that ESX auto-generates for each VM at the time of cloning. As a result, the cloned VMs are created with a broken, unusable networking configuration.

Analysis

Here is the sequence of events:
  1. The template VM is created and is assigned a virtual MAC address (address "A") by ESX.
  2. This address is inserted into the .vmx configuration file assigned to the VM.
  3. When the VM is booted up, it extracts the virtually assigned MAC address located in the .vmx file and inserts it into the OS's udev network configuration file that handles device to Ethernet mappings.
  4. When the VM is converted to a template, these settings are not removed from the udev file.
  5. When the template is cloned, the clone VM is assigned a new virtual MAC address (address "B") by ESX and this address is recorded in the new .vmx file assigned to the clone.
  6. When booting for the first time, the clone parses its .vmx file and locates the virtual MAC address that has been assigned to it (address "B").
  7. Once it has the new value, it tries to configure its eth0 interface with it by placing an entry into the udev file.
  8. Because the original MAC address (address "A") that was assigned to the template VM still exists in the udev file, the OS decides not to overwrite the old address (address "A") with the new one (address "B"). Instead, it creates a new eth1 interface with the virtual MAC address it extracted from its .vmx file (address "B").
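  9. When the "ip address show" command is typed, the eth1 interface appears because udev has given that name to the only network device present. No device carries address "A", so eth0 does not appear. Plain "ifconfig" lists only interfaces that are up, so it shows neither.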
  10. Because there is no eth1 entry in the /etc/network/interfaces file, the eth1 interface won't work.
  11. Because the eth0 interface is assigned an invalid virtual MAC address in the udev file, the eth0 interface is unusable on the network, even if it is assigned valid IP information.

There are a couple of ways to correct the situation on a per-clone basis. They work fine, but they both require extra manual effort. This can become quite time-consuming if you have many clones to create. Or, if you don't create many clones, you may even forget the exact fix! One solution involves replacing address "A" with address "B" in the udev file, and the other involves renaming eth0 to eth1 in the /etc/network/interfaces file.
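
For reference, the first of these per-clone fixes could be scripted roughly as follows. This is a minimal sketch, not a tested tool: rebind_eth0 is a hypothetical helper name, the MAC addresses are placeholders, and it assumes each rule in the udev file sits on a single line containing NAME="ethN". It also drops the auto-generated eth1 rule, since otherwise two rules would claim address "B".

```shell
#!/bin/sh
# Sketch only: rebind eth0 to the clone's new MAC in a udev persistent-net rules file.
# rebind_eth0 is a hypothetical helper name, not part of udev or Ubuntu.
# Usage: rebind_eth0 RULES_FILE OLD_MAC NEW_MAC
rebind_eth0() {
    rules=$1 old=$2 new=$3
    # Keep a copy of the original file next to it.
    cp "$rules" "$rules.bak" || return 1
    # Drop the eth1 rule, then swap address "A" for address "B" in what remains.
    sed -e '/NAME="eth1"/d' -e "s/$old/$new/" "$rules.bak" > "$rules"
}
```

On a clone this might be run as root, for example "rebind_eth0 /etc/udev/rules.d/70-persistent-net.rules 00:0c:29:aa:aa:aa 00:0c:29:bb:bb:bb", followed by a reboot. It still has to be repeated on every clone, which is the drawback described above.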

But there's a better solution that you only need to apply once to the original template VM!!

Solution

The solution in this article demonstrates how to bypass this problem without editing any ESX files, and with only very minor editing of the Ubuntu network configuration files on the template VM. All the other solutions to this problem that I have seen to date require you to edit each clone (manually or by script) or the .vmx file associated with the clone (not recommended). These solutions work, but in the end, they require much more effort. The solution provided in this article only needs to be applied once.

I have not tested this procedure with other Linux distributions, or with Ubuntu versions other than 8.04.1, 9.04 and 9.10, so any feedback on how this process differs in other environments would be helpful to users of those distributions or versions. For reference, however, earlier Ubuntu versions maintain the MAC address information in the /etc/iftab file.

NOTE: The remaining steps are critical to making this process work correctly for each clone. If the template is booted up after these steps are completed, the steps will need to be repeated before the template can be used again to create new fully operational clones.

  1. Build a VM to be used as a template.
  2. Boot into the VM and log on with an account that has sudo privileges.
  3. Type "sudo ifdown eth0".
  4. Type "sudo /etc/init.d/networking stop".
  5. Type "sudo vi /etc/udev/rules.d/70-persistent-net.rules".
  6. Delete every line from line 5 to the end of the file. To do this, move the cursor to the beginning of each line, then type dd. Do not delete the first four commented lines.
  7. Press ESC, then type ZZ (hold Shift and press Z twice) to save and exit.
  8. Type "vi ~/.bash_history".
  9. For each line in the file, move the cursor to the beginning of the line, then type dd.
  10. Press ESC, then type ZZ (hold Shift and press Z twice) to save and exit.
  11. Type "sudo shutdown -P now" to shut down the VM.
  12. Wait for the VM to shut down completely.
  13. Right click the VM in the Virtual Center console and convert it to a Template (or perform the equivalent action if you are not connected via Virtual Center or the VI Client).
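
The editing in steps 5 through 7 can also be done without vi. The following is a minimal sketch under the same assumption the steps make, namely that the first four lines of the rules file are the header comments and everything after them is a generated rule. strip_net_rules is a hypothetical helper name, and the commented command sequence is only an outline of steps 3 through 11, to be run as root on the template VM.

```shell
#!/bin/sh
# Sketch only: keep the four header comment lines of the udev rules file
# and discard every generated rule below them.
# strip_net_rules is a hypothetical helper name.
# Usage: strip_net_rules RULES_FILE
strip_net_rules() {
    rules=$1
    tmp=$(mktemp) || return 1
    # Write through a temp file, then copy back so the original
    # file keeps its owner and permissions.
    head -n 4 "$rules" > "$tmp" && cat "$tmp" > "$rules"
    status=$?
    rm -f "$tmp"
    return $status
}

# Outline of steps 3 through 11 on the template VM (as root):
#   ifdown eth0
#   /etc/init.d/networking stop
#   strip_net_rules /etc/udev/rules.d/70-persistent-net.rules
#   : > ~/.bash_history
#   shutdown -P now
```

As with the manual steps, the template must not be booted again after this has been run.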

Conclusion

In conclusion, creating a VM template using Ubuntu 8+ in an ESX/VI3+ environment is a relatively straightforward process. To avoid having to manually repair an artifact of the cloning process on each VM clone after it is created, this article explains how to change the template VM so that a proper configuration is applied to each clone during the cloning process. The key to making this process work correctly is in disabling the template's networking capabilities prior to deleting the entry for the eth0 interface in the udev file responsible for network interface configuration. Additionally, the template must not be booted into again once these final changes are made to it, otherwise the changes will need to be re-applied.

Tuesday, April 1, 2008

.vmdk Snapshots and the Importance of CID Chains


Introduction

I spent half of yesterday troubleshooting a problem that arose from an un-synchronized VMware VI3 snapshot. It was an unpleasant experience, but I learned about the importance of CID chains!

After several hours of poking and prodding a dead VM, my colleague Dan and I finally stumbled across CID chains.

We came to the conclusion that snapshots, and specifically the delta .vmdk files VI3 creates each time it generates a new snapshot, are connected to each other in a chain via randomly assigned dynamic CID values. VI3 assigns each new snapshot file a CID, and that value changes each time the VM reboots.

If the CID chain is broken for any reason, the VM cannot mount the virtual disks assigned to it.

Problem


Your VM’s virtual disks will not mount and you receive the following error message:

Cannot open the disk ‘/vmfs/volumes/INSERT SPECIFIC VALUE HERE.vmdk’ or one of the snapshot disks it depends on.
Reason: The parent virtual disk has been modified since the child was created.

Yikes!


What do you do?!

  1. Don’t panic (yet).
  2. Shut down the VM completely, ASAP.
  3. Don’t change anything!
  4. Read this article!

You should be able to correct the problem, assuming you haven’t made any changes to the .vmdk files.

Analysis

This problem likely arose because you made one or more snapshots of one or more of the virtual disks associated with a VM, and the snapshots have become unsynchronized; you are probably also running in an ESX/VI3 environment.

The root cause of the problem relates to how VI3 manages snapshots and snapshot delta changes to the original virtual disk (.vmdk). Most likely, the snapshot delta hierarchy has become unsynchronized and the various child snapshot .vmdk files associated with the original .vmdk are no longer referencing the correct parent .vmdk.

The key to piecing the snapshot delta files back together is in the CID chain. The CID chain is the glue behind the snapshot delta hierarchy.

Each .vmdk file in the chain is assigned a CID value. The .vmdk also points to a parent CID value and parent .vmdk file. The parent CID value must point to the parent .vmdk file created immediately before the child .vmdk.

Each .vmdk file's CID is dynamically changed to a random hex value every time the VM boots. If a VM boots and the CID chain is not in the same state it was in at the previous boot, the child CID-->parent CID relationships become unsynchronized. This is how VI3 determines the authenticity of snapshot delta files.

The only way to re-synchronize a CID chain is to edit the .vmdk files in each chain manually.

Example CID chain:

  1. Parent .vmdk (original .vmdk) = vdisk.vmdk
  2. Child .vmdk created after 1st snapshot = vdisk-000001.vmdk
  3. Child .vmdk created after 2nd snapshot = vdisk-000002.vmdk

If this chain is broken:

  1. manually point vdisk-000002.vmdk to vdisk-000001.vmdk, and
  2. manually point vdisk-000001.vmdk to vdisk.vmdk.

Three fields deserve attention in each .vmdk file, for the purposes of this discussion:

  1. The CID field,
  2. The parentCID field, and
  3. The parentFileNameHint field.

Note: The parent .vmdk does not contain a parentFileNameHint field and its parentCID field always equals "ffffffff".
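
To look at these fields without opening each descriptor in an editor, something like the following can be used from the service console. It is a minimal sketch: vmdk_field is a hypothetical helper name, and it assumes the descriptor stores each field as FIELD=value at the start of a line, with optional double quotes around the value.

```shell
#!/bin/sh
# Sketch only: print one field (CID, parentCID or parentFileNameHint)
# from a .vmdk descriptor file. vmdk_field is a hypothetical helper name.
# Usage: vmdk_field DESCRIPTOR FIELD
vmdk_field() {
    # Match lines that start with "FIELD=", take the text after the
    # equals sign, and strip any double quotes around the value.
    awk -v key="$2" 'index($0, key "=") == 1 {
        v = substr($0, length(key) + 2)
        gsub(/"/, "", v)
        print v
    }' "$1"
}
```

For example, "vmdk_field vdisk-000001.vmdk parentCID" prints the CID that the first child expects its parent to have. Because the match is anchored to the start of the line, asking for CID does not also return the parentCID line.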

Samples

Sample output of parent vdisk.vmdk file:

[root@myvi3server]# cat vdisk.vmdk
# Disk DescriptorFile
version=1
CID=7f81b951
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 50331648 VMFS "vdisk-flat.vmdk"
# The Disk Data Base
#DDB
ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "3133"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
ddb.toolsVersion = "7202"

Sample output of child vdisk-000001.vmdk file:

[root@myvi3server]# cat vdisk-000001.vmdk
# Disk DescriptorFile
version=1
CID=8eb633b8
parentCID=7f81b951
createType="vmfsSparse"
parentFileNameHint="/vmfs/volumes/478b9802-ce7ed955-96a4-0015c5fd9308/servername/vdisk.vmdk"
# Extent description
RW 50331648 VMFSSPARSE "vdisk-000001-delta.vmdk"
# The Disk Data Base
#DDB
ddb.toolsVersion = "7202"

Note: The file path in the parentFileNameHint field in this sample points to an iSCSI SAN volume. The long hex GUID is a directory pointer.

Sample output of child vdisk-000002.vmdk file:

[root@myvi3server]# cat vdisk-000002.vmdk
# Disk DescriptorFile
version=1
CID=249e6aff
parentCID=8eb633b8
createType="vmfsSparse"
parentFileNameHint="vdisk-000001.vmdk"
# Extent description
RW 50331648 VMFSSPARSE "vdisk-000002-delta.vmdk"
# The Disk Data Base
#DDB
ddb.toolsVersion = "7202"

Note: The file path in the parentFileNameHint field points to a file that resides in the same local directory as vdisk-000002.vmdk. If the file is located in a different directory, the path must indicate this accordingly.
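
The rule the samples above illustrate, that each child's parentCID must equal its parent's CID, can be checked link by link before anything is edited. The following is a minimal sketch: field and check_link are hypothetical helper names, and the check only compares the two values in the descriptor files. It does not verify the delta files themselves.

```shell
#!/bin/sh
# Sketch only: check one link of a CID chain.
# field and check_link are hypothetical helper names.

# Print the value of FIELD from a descriptor (quotes stripped).
field() {
    awk -v key="$2" 'index($0, key "=") == 1 {
        v = substr($0, length(key) + 2)
        gsub(/"/, "", v)
        print v
    }' "$1"
}

# Usage: check_link CHILD_DESCRIPTOR PARENT_DESCRIPTOR
# Succeeds only if the child's parentCID equals the parent's CID.
check_link() {
    want=$(field "$2" CID)
    have=$(field "$1" parentCID)
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "OK: $1 -> $2 ($have)"
    else
        echo "BROKEN: $1 has parentCID=$have, but $2 has CID=$want"
        return 1
    fi
}
```

For the three-file example, the chain would be checked with "check_link vdisk-000002.vmdk vdisk-000001.vmdk" and then "check_link vdisk-000001.vmdk vdisk.vmdk". Any link reported as BROKEN is a candidate for the manual edits described in the Solution section.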

Solution

The following steps synchronize the snapshot delta files so that VI3 recognizes their relationship with each other, as well as their authenticity:

Note: Perform the following steps in a text editor. Back up the original files prior to making any changes.

Edit vdisk-000002.vmdk:

  1. Take note of the CID value in vdisk-000001.vmdk (=8eb633b8)
  2. Take note of the file path of vdisk-000001.vmdk (=local directory)
  3. Point the parentCID field to the CID value of vdisk-000001.vmdk
    --> parentCID=8eb633b8
  4. Point the parentFileNameHint field to the vdisk-000001.vmdk file
    --> parentFileNameHint="vdisk-000001.vmdk"

Edit vdisk-000001.vmdk:

  1. Take note of the CID value in vdisk.vmdk (=7f81b951)
  2. Take note of the file path of vdisk.vmdk (=/vmfs/volumes/478b9802-ce7ed955-96a4-0015c5fd9308/servername/)
  3. Point the parentCID field to the CID value of vdisk.vmdk
    --> parentCID=7f81b951
  4. Point the parentFileNameHint field to the vdisk.vmdk file
    --> parentFileNameHint="/vmfs/volumes/478b9802-ce7ed955-96a4-0015c5fd9308/servername/vdisk.vmdk"
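
The same two edits can be sketched as a small shell helper, shown here only to make the steps concrete. repoint_child is a hypothetical name, the helper assumes the child descriptor already contains parentCID and parentFileNameHint lines, and the parent path must not contain the characters |, & or \ because it is passed straight into a sed replacement. It saves a .bak copy of the child descriptor before rewriting it.

```shell
#!/bin/sh
# Sketch only: point a child descriptor at its parent.
# repoint_child is a hypothetical helper name.
# Usage: repoint_child CHILD_DESCRIPTOR PARENT_DESCRIPTOR PARENT_PATH
repoint_child() {
    child=$1 parent=$2 hint=$3
    # Read the parent's current CID (the line that starts with "CID=").
    cid=$(sed -n 's/^CID=//p' "$parent")
    [ -n "$cid" ] || { echo "no CID found in $parent" >&2; return 1; }
    # Back up the child descriptor before touching it.
    cp "$child" "$child.bak" || return 1
    # Rewrite only the parentCID and parentFileNameHint lines.
    sed -e "s|^parentCID=.*|parentCID=$cid|" \
        -e "s|^parentFileNameHint=.*|parentFileNameHint=\"$hint\"|" \
        "$child.bak" > "$child"
}
```

For the example chain this would be "repoint_child vdisk-000002.vmdk vdisk-000001.vmdk vdisk-000001.vmdk", then "repoint_child vdisk-000001.vmdk vdisk.vmdk" followed by the full path to vdisk.vmdk. The child's own CID line is left untouched.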

If the virtual disks assigned to the VM were not changed or altered in any way, the solution ends here. If the virtual disks were altered, or removed and then re-added, manually edit the appropriate virtual SCSI controller’s settings in the VM’s .vmx configuration file to point to the last child .vmdk in the snapshot chain.

Note: Perform the following steps in a text editor. Back up the original file prior to making any changes.

Edit servername.vmx:

  1. Locate the SCSI controller associated with the virtual disk that must be repaired. The first virtual disk will normally be assigned to virtual SCSI controller scsi0:0, the second to scsi0:1, the third to scsi0:2, and so on.
  2. Assuming that the second virtual disk must be repaired, locate the section of the file that references scsi0:1.
  3. Point the following field to the correct location of the last child .vmdk in the snapshot chain:
    scsi0:1.fileName="/vmfs/volumes/46fd62d479749c9697e30015c5fd9308/servername/vdisk-000002.vmdk"
    Note: The file path may vary, though snapshots are stored in the same directory as the parent .vmdk (vdisk.vmdk) by default. Even so, enter the complete path to the child .vmdk (vdisk-000002.vmdk), including any iSCSI volume GUIDs.
  4. Save the file and restart the VM.
  5. If the VM does not boot, or if it complains that "The parent virtual disk has been modified since the child was created", verify the CID chain hierarchy and file paths, then retry. If the VM still does not boot, and the CID chains and file paths are all correct, the parent or child .vmdk files were probably altered or changed in an unrecoverable way, or they may not be the correct files!
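
Step 3 can be sketched as a one-line substitution wrapped in a helper. set_vmx_disk is a hypothetical name, the helper assumes the .vmx file already has a fileName entry for the device, and the path must not contain the characters |, & or \ for the same reason as any sed replacement. It writes a .bak copy first, and it should only be run while the VM is powered off.

```shell
#!/bin/sh
# Sketch only: point a virtual disk entry in a .vmx file at a given .vmdk.
# set_vmx_disk is a hypothetical helper name.
# Usage: set_vmx_disk VMX_FILE DEVICE VMDK_PATH   (DEVICE is e.g. scsi0:1)
set_vmx_disk() {
    vmx=$1 dev=$2 path=$3
    # Refuse to continue if the device has no fileName entry at all.
    grep -q "^$dev\.fileName[[:space:]]*=" "$vmx" || {
        echo "$dev.fileName not found in $vmx" >&2
        return 1
    }
    # Back up the .vmx file, then rewrite only that one entry.
    cp "$vmx" "$vmx.bak" || return 1
    sed "s|^$dev\.fileName[[:space:]]*=.*|$dev.fileName = \"$path\"|" \
        "$vmx.bak" > "$vmx"
}
```

For the example above this would be "set_vmx_disk servername.vmx scsi0:1" followed by the complete path to vdisk-000002.vmdk. Entries for other devices, such as scsi0:0, are left as they were.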