
Category Archives: Operations

Sharing my experiences in production operations

ESXi host performance issues and the importance of thermal paste

A very interesting read on how the team at VMware resolved an issue with an underperforming HP blade. The final takeaways are:

  • Thermal paste really can impact performance!
  • HP Active Health System logs should (but don’t) record when CPUs clock down to prevent overheating.
  • CPU clock throttling error messages don’t appear in ESXi logs.

http://www.vmspot.com/esxi-host-performance-issues-and-the-importance-of-thermal-paste/

 


Posted by on July 1, 2016 in Operations, vmware, Windows

 


Algorithm that limits disk space usage in pure SSD SAN solutions

With cheaper SSDs, consumers have more choices of SSD SAN, and many traditional and upstart SAN storage vendors are competing to offer a better SSD SAN product. Claims such as 100,000 (32K) IOPS, far above a traditional SAN, and high compression and deduplication ratios, meaning 10 TB of SSD storage can be as effective as 30 to 40 TB, are now common. Those figures come, of course, from vendor literature.
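To make the capacity claim concrete: 10 TB behaving like 30 to 40 TB implies a combined compression and deduplication ratio of roughly 3:1 to 4:1. The short Python sketch below just does that arithmetic. The function names are made up for illustration, and real ratios depend heavily on the data being stored.

```python
def effective_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Effective capacity for a given raw capacity and data reduction ratio.

    reduction_ratio is the combined compression + dedup ratio,
    e.g. 3.0 for a vendor-quoted 3:1.
    """
    if raw_tb < 0 or reduction_ratio <= 0:
        raise ValueError("raw_tb must be >= 0 and reduction_ratio must be > 0")
    return raw_tb * reduction_ratio


def implied_ratio(raw_tb: float, effective_tb: float) -> float:
    """Data reduction ratio a vendor claim implies."""
    if raw_tb <= 0:
        raise ValueError("raw_tb must be > 0")
    return effective_tb / raw_tb


# The figures quoted in the post: 10 TB raw, 30 to 40 TB effective.
for claimed_tb in (30, 40):
    print(f"{claimed_tb} TB effective on 10 TB raw implies "
          f"{implied_ratio(10, claimed_tb):.1f}:1 reduction")
```

If your own data only reduces at, say, 1.5:1, the same 10 TB array would hold about 15 TB, which is why the vendor ratio is worth testing against a sample of real workloads.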

Read the rest of this entry »

 

Posted by on December 4, 2015 in Operations, vmware

 


HP blades and vSphere ESXi compatibility matrix

The most challenging part of running a blade enclosure system like the HP c7000 series is getting the firmware to match across components. For example, you may start off with BL460c G7 blades and a year down the road decide to add Gen8 blades to the same enclosure. It is not a simple matter of just plugging them in; many times it will not work because the underlying OA firmware may not support Gen8 blades. However, you cannot just go ahead and upgrade the OA firmware without first checking the firmware versions of the G7 blades and iLOs to ensure that the new OA firmware and the existing component firmware support each other. This is usually not too big a problem when the blades are new and the estate is small, but if you have them deployed globally over a few years, you can be assured that firmware versions will vary widely, and any attempt to standardize even just the OA firmware can be difficult.

This is why HP has a compatibility matrix for its systems. It used to be a bit more complex (but easier to work with), as the table stated the minimum firmware version each component needed in order to work with the others. So if you wanted to upgrade the OA firmware by three versions but keep the rest the same, that was not an issue. However, HP has since streamlined this and forces everyone to upgrade to a single version level. So if you want to move the OA firmware up three versions, you need to upgrade all other components to the same version base.
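Checking an enclosure against a single version baseline is easy to script once you have the firmware inventory in hand. The following is a minimal Python sketch, not HP tooling: the component names and version numbers are invented for illustration, and it assumes versions are plain dotted numbers such as "4.01".

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '4.01' into (4, 1) for comparison."""
    return tuple(int(part) for part in version.split("."))


def below_baseline(installed: Dict[str, str], baseline: Dict[str, str]) -> List[str]:
    """List components that are missing or older than the baseline version."""
    stale = []
    for component, required in baseline.items():
        current = installed.get(component)
        if current is None or parse_version(current) < parse_version(required):
            stale.append(component)
    return stale


# Hypothetical inventory and baseline; substitute the values from the
# compatibility matrix and from your own enclosure.
installed = {"OA": "3.60", "iLO": "1.28", "Virtual Connect": "3.70"}
baseline = {"OA": "4.01", "iLO": "1.28", "Virtual Connect": "4.10"}

print("Needs upgrading first:", below_baseline(installed, baseline))
```

Running this across every enclosure before touching the OA firmware gives a quick list of what has to be brought up to the same version base first.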

http://h18000.www1.hp.com/products/blades/components/matrix/compatibility.html

Now, if you are running ESXi hosts on these blades, you also have to consider the recommended driver versions, which work in tandem with the OS version and the HP blade firmware.

http://vibsdepot.hp.com/hpq/recipes/

 

 

Posted by on August 27, 2014 in Operations, vmware, Windows

 


Inaccessible iLO on HP blades after Heartbleed (AKA OpenSSL) vulnerability scans

Last week, as part of the OpenSSL Heartbleed vulnerability checks, our security department ran a vulnerability scan against all the HP iLOs in the firm. We are not sure what happened (they wouldn’t tell! lol), but after that scan, all of the HP iLOs on the c-class and p-class blades showed communication errors and could no longer be accessed.


HP has acknowledged that this is an issue and is working on it. Luckily it only impacted G6 and older blades (which are legacy hardware for us) running iLO and iLO2. A firmware upgrade or downgrade does not fix this issue.

The only fix is to shut down the blade and reseat it; basically, the iLO needs to lose power to clear the issue. Alternatively, you can telnet into the OA and reset the blade, but you still need to shut down your server OS first, since the reset will reboot the blade.

telnet <OA IP address>    (you will need logon credentials)
At the telnet prompt, type: reset server <slot number>

After that, log in to the OA web GUI and rediscover the blade.

 

 

 

Posted by on April 17, 2014 in Operations, vmware, Windows

 


Testing shared datastores and network for a new VMware cluster using Powershell

OK, you have built all the ESX hosts and created the clusters, and everything looks ready to go, assuming that your ESX hosts are healthy and HA/DRS is set up. What would be a good, simple test of this new cluster? Here is what I would test:

  • Ensure that all datastores are accessible by each ESX host
  • Ensure that all public VLANs are correct

The quickest test for this is to create a new VM, add all the datastores and all the networks to it (assuming DHCP, though that is not strictly required), and either build a new OS on it or boot it into WinPE. Then migrate this VM to each and every ESX host and ping the configured IP after each move. Read the rest of this entry »
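The migrate-then-ping loop described above can be sketched in a few lines. The full post uses PowerShell; the version below is Python and only illustrates the control flow. The `migrate` callable is a placeholder for whatever tooling you use to move the VM (it is not a real VMware API), and the host names and addresses are made up.

```python
import platform
import subprocess
from typing import Callable, Dict, Iterable, List


def ping_once(address: str, timeout: float = 10.0) -> bool:
    """Send one ICMP echo with the system ping command; True on exit code 0.

    Exit code semantics differ between platforms, so treat this as a
    rough reachability check only.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_cluster(
    hosts: Iterable[str],
    vm_addresses: List[str],
    migrate: Callable[[str], None],
    ping: Callable[[str], bool] = ping_once,
) -> Dict[str, Dict[str, bool]]:
    """Move the test VM to each host in turn, then ping each of its addresses."""
    results: Dict[str, Dict[str, bool]] = {}
    for host in hosts:
        migrate(host)  # hypothetical: wrap your own vMotion command here
        results[host] = {addr: ping(addr) for addr in vm_addresses}
    return results


def failures(results: Dict[str, Dict[str, bool]]) -> List[str]:
    """Flatten the results into 'host -> address' strings for failed pings."""
    return [
        f"{host} -> {addr}"
        for host, pings in results.items()
        for addr, ok in pings.items()
        if not ok
    ]
```

A call might look like `check_cluster(["esx01", "esx02"], ["192.0.2.10"], migrate=my_vmotion)`, where `my_vmotion` is your own function. Any entry returned by `failures` points at a host where a VLAN is likely misconfigured; a migration that fails outright usually means a datastore is not visible on that host.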

 

Posted by on March 15, 2013 in Operations, powershell, vmware

 

HP Proliant Gen8 now has integrated SmartStart and Firmware updates

Looks like life will get easier for those who deploy servers using HP ProLiant Gen8. With SmartStart and the Firmware DVD integrated into the system BIOS, it sure makes life much easier!

http://hpproliant.blogspot.sg/2012/06/reinstall-intelligent-provisioning-for.html

 

Posted by on June 24, 2012 in Operations, Windows

 

Redhat/SUSE kdump and HP cciss driver issue

Got this HP advisory (Document ID: c02758009). Though it’s Red Hat related, I thought it would be useful for someone out there.

SUPPORT COMMUNICATION – CUSTOMER ADVISORY
Document ID: c02758009

Version: 1

Advisory: HP Smart Array Controllers – CUSTOMER ACTION REQUIRED for Certain HP Smart Array Controllers to Ensure Pending Writes to Storage Devices Complete Properly if Attempting to Use the Linux Kdump Facility
NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

 Release Date: 2011-03-16

Last Updated: 2011-03-16

——————————————————————————–

DESCRIPTION
IMPORTANT : Pending writes to storage devices may not complete properly as detailed below if the kdump facility is used. Using the kdump facility could leave the server in an unstable condition, which could potentially result in an inconsistent filesystem state. By disregarding this notification, the customer accepts the risk of incurring potential related errors.

The Linux kdump facility fails to execute properly when used on HP ProLiant servers running Linux and configured with certain Smart Array controllers using certain versions of the cciss device drivers. Read the rest of this entry »

 

Posted by on March 22, 2011 in Operations

 
