Category Archives: Operations

Sharing my experiences in production operations

vROPS: resetting admin password and quirks

Recently we found out that we had to reset the admin password because it was locked and, I think, expired. VMware does have a KB article on how this can be done:

However, what it fails to mention is that an additional step may be necessary.

Now, if you just reset the admin password, you will be able to log on to the appliance via SSH as admin. However, if you try to log on as admin to the admin GUI, you may get an incorrect user name/password error. This is corrected by performing the steps in the 2nd KB article.




Posted on December 15, 2017 in Operations, vmware



ESXi host performance issues and the importance of thermal paste

An interesting read on how a team at VMware resolved an issue with an underperforming HP blade. The final takeaways are:

  • Thermal paste really can impact performance!
  • HP Active Health System logs should (but don’t) record when CPUs clock down to prevent overheating.
  • CPU clock throttling error messages don’t appear in ESXi logs.



Posted on July 1, 2016 in Operations, vmware, Windows



Algorithm that limits disk space usage in pure SSD SAN solutions

With cheaper SSDs, consumers have more choices of SSD SAN, and many traditional and upstart SAN storage vendors are competing to offer a better SSD SAN product. Features like 100,000 (32K) IOPS, well above a traditional SAN, and high compression and deduplication ratios, meaning 10 TB of SSD storage can be as effective as 30 to 40 TB, are now common. Those figures come, of course, from vendor literature.



Posted on December 4, 2015 in Operations, vmware



HP blades and vSphere ESXi compatibility matrix

The most challenging part of having a blade enclosure system like the HP c7000 series is getting the firmware to match across components. For example, you may start off with BL460c G7 blades and a year down the road decide to add Gen8 blades into the same enclosure. It is not a simple matter of just plugging them in; many times it will not work because the underlying OA firmware may not support Gen8 blades. However, one cannot just go ahead and upgrade the OA firmware without first checking the firmware versions of the G7 blades and iLOs to ensure that they are supported with the new OA firmware. This is usually not too big a problem when the blades are new and the estate is small, but if you have them deployed globally over a few years, you can be assured that firmware versions will vary widely, and any attempt to standardize even just the OA firmware can be difficult.

This is why HP has a compatibility matrix for its systems. It used to be a bit more complex (but easier to work with), as the table would state the minimum firmware version each component needed in order to work with the others. So if you wanted to upgrade the OA firmware by 3 versions but keep the rest the same, it would not be an issue. However, they have since streamlined this and force everyone to upgrade to a single version level. So if you want to upgrade the OA firmware by 3 versions, you need to upgrade all other components to the same version base.

Now if you are running ESXi hosts on these blades, you also have to consider the recommended driver versions, which work in tandem with the OS version and the HP blade firmware.
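To make the older "minimum version per component" idea concrete, here is a minimal Python sketch of that kind of check. It is not an HP tool, and the component names and version numbers are made up for illustration; real values would have to come from HP's published matrix.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '3.5' into (3, 5)."""
    return tuple(int(part) for part in text.split("."))


def find_blockers(installed: Dict[str, str], minimums: Dict[str, str]) -> List[str]:
    """Return the components whose installed firmware is below the minimum."""
    blockers = []
    for component, required in minimums.items():
        current = installed.get(component)
        # A component with no recorded version is treated as a blocker too.
        if current is None or parse_version(current) < parse_version(required):
            blockers.append(component)
    return sorted(blockers)


# Hypothetical inventory and minimums, for illustration only.
installed = {"OA": "3.2", "iLO": "1.5", "blade_bios": "2.0"}
minimums = {"OA": "3.5", "iLO": "1.5", "blade_bios": "1.8"}

print(find_blockers(installed, minimums))
```

With the made-up numbers above, only the OA falls below its minimum, so it is the one component that would need attention before the new blades go in.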



Posted on August 27, 2014 in Operations, vmware, Windows



Inaccessible iLO on HP blades after Heartbleed (AKA OpenSSL) vulnerability scans

Last week, as part of the OpenSSL Heartbleed vulnerability checks, our security department ran a vulnerability scan against all the HP iLOs in the firm. We are not sure what happened (they wouldn’t tell! lol), but after that scan, all of the HP iLOs on the c-class and p-class series blades showed communication errors and could no longer be accessed.


HP has acknowledged that this is an issue and is working on it. Luckily it only impacted blades G6 and below (which are legacy hardware for us) and iLO and iLO2. Firmware upgrade or downgrade does not fix this issue.

The only fix is to shut down the blade and reseat it; basically, the iLO needs to lose power to clear this issue. Alternatively, you can telnet into the OA and reset the blade, but you are still required to shut down your server OS first, since that will reboot the blade.

Telnet <OA IP address> --> you will need logon credentials
In telnet prompt type "reset server <slot number>"

After that, log in to the OA web GUI and re-discover the blade.




Posted on April 17, 2014 in Operations, vmware, Windows



Testing shared datastores and network for a new VMware cluster using Powershell

OK, you have built all the ESX hosts and created the clusters, and everything looks ready to go, assuming that your ESX hosts are healthy and HA/DRS is set up. What would be a good, simple test of this new cluster? Here is what I would test:

  • Ensure that all datastores are accessible by each ESX host
  • Ensure that all public VLANs are correct

The quickest test for this is to create a new VM, add all the datastores and all the networks to it (assuming DHCP, but that is not really required), and either build out a new OS on it or boot it up to WinPE. What you want to do is migrate this VM to each and every ESX host and ping the configured IP after each move.
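The migrate-then-ping loop can be sketched roughly as below. This is not the PowerShell script from the full post; it is a small Python outline in which `migrate` and `ping` are hypothetical callables you would supply yourself (for example, wrappers around your vMotion tooling and an ICMP check), and the host names are placeholders.

```python
from typing import Callable, Dict, Iterable


def smoke_test_cluster(
    hosts: Iterable[str],
    migrate: Callable[[str], None],
    ping: Callable[[], bool],
) -> Dict[str, bool]:
    """Move the test VM to each host in turn and record whether it answers ping."""
    results = {}
    for host in hosts:
        try:
            migrate(host)           # hypothetical: move the test VM onto this host
            results[host] = ping()  # hypothetical: True if the VM's IP answers
        except Exception:
            # A failed migration counts as a failed host.
            results[host] = False
    return results


# Stand-in implementations so the sketch runs without a real cluster.
state = {"current": None}
bad_hosts = {"esx02"}  # pretend this host has a wrong VLAN


def fake_migrate(host: str) -> None:
    state["current"] = host


def fake_ping() -> bool:
    return state["current"] not in bad_hosts


report = smoke_test_cluster(["esx01", "esx02", "esx03"], fake_migrate, fake_ping)
print(report)
```

Any host that comes back `False` is the one to check for a missing datastore mapping or a wrong VLAN.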


Posted on March 15, 2013 in Operations, powershell, vmware


HP Proliant Gen8 now has integrated SmartStart and Firmware updates

Looks like life will get easier for those who deploy servers using HP ProLiant Gen8. With SmartStart and the Firmware DVD integrated into the system BIOS, it sure makes life much easier!


Posted on June 24, 2012 in Operations, Windows


Redhat/SUSE kdump and HP cciss driver issue

Got this HP advisory (Document ID: c02758009). Though it’s Red Hat related, I thought it would be useful for someone out there.

Document ID: c02758009

Version: 1

Advisory: HP Smart Array Controllers – CUSTOMER ACTION REQUIRED for Certain HP Smart Array Controllers to Ensure Pending Writes to Storage Devices Complete Properly if Attempting to Use the Linux Kdump Facility
NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

 Release Date: 2011-03-16

Last Updated: 2011-03-16


IMPORTANT : Pending writes to storage devices may not complete properly as detailed below if the kdump facility is used. Using the kdump facility could leave the server in an unstable condition, which could potentially result in an inconsistent filesystem state. By disregarding this notification, the customer accepts the risk of incurring potential related errors.

The Linux kdump facility fails to execute properly when used on HP ProLiant servers running Linux and configured with certain Smart Array controllers using certain versions of the cciss device drivers.


Posted on March 22, 2011 in Operations



HP DL385 G7s and BL465 failures due to suspected AMD processor failures

You may see some of your HP DL385 G7s power off of their own accord, or BL465s crash for no apparent reason.

HP recommends upgrading the BIOS and iLO firmware and changing some BIOS power settings, but other HP customers have done this and continued to see the issues.

It is now suspected to be an issue with the AMD processors, as the issue was not confined to HP servers. HP and AMD are currently investigating, as far as I am aware.


Posted on December 9, 2010 in Operations



Psychological distance in troubleshooting

Sigh… I spent almost 2 hours today trying to troubleshoot some routing rules for my fax server dial plans. The rule only applies to a special group called TKY_OUB. After I created that rule, no matter how many times I tested it, it just wouldn’t work. This defied logic: it could not be that difficult, because there are only so many parameters in the dialing plan rules.

So I was just about to write an email to my friendly support vendor when I went to test the rules again. This time, it started to work! I could almost have pulled my hair out, except that I don’t have much hair 🙂

Guess what was staring back at me? Yes, I had entered the group name as TKY_UOB all the while, which was why the rules never worked! On my second try, I entered the correct spelling, and that was why it worked.

Lesson learnt:
Psychological distance between letters can cause you to lose hours of troubleshooting time on something that was never broken. When the parameters are really that simple and there is nothing more to troubleshoot, recheck your input values!
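In hindsight, this kind of slip is easy to catch mechanically. Below is a small Python sketch, not anything the fax server offers, that flags when an entered name differs from a known name only by two neighbouring characters being swapped. The second group name in the list is made up for illustration.

```python
from typing import Iterable, List


def is_adjacent_swap(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of neighbouring characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def near_misses(entered: str, known_names: Iterable[str]) -> List[str]:
    """List the known names that the entered value is one swap away from."""
    return [name for name in known_names if is_adjacent_swap(entered, name)]


# "NYC_OUB" is a hypothetical second group, included only for the example.
print(near_misses("TKY_UOB", ["TKY_OUB", "NYC_OUB"]))
```

Running a check like this against the list of real group names would have pointed straight at TKY_OUB.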


Posted on September 11, 2006 in General, Operations