We had an interesting week. It started with a lot of calls from the desktop team complaining that many of the workstations they had either rebooted over the weekend or tried to power on would not power up. These are virtual machines running in our vSphere environment. In vCenter the affected VMs showed up as “invalid”, and we found that each of them had a zero-byte VMX file. Initially the failures appeared random, but we soon narrowed them down to two particular clusters; the issue did not occur anywhere else.
A bit of explanation on our setup: these are UCS blades built as ESXi 4.1 hosts. There are 16 hosts in each cluster, and DRS and HA are enabled for all clusters (of course). The hosts use NFS datastores served by NetApp filers. These clusters host our VDI environment, which is managed via XenDesktop, and we have a monthly reboot and patching cycle for all desktops in our firm.