Red Hat/SUSE kdump and HP cciss driver issue

22 Mar

Got this HP advisory (Document ID: c02758009). It covers Red Hat and SUSE, and I thought it would be useful for someone out there.

Document ID: c02758009

Version: 1

Advisory: HP Smart Array Controllers – CUSTOMER ACTION REQUIRED for Certain HP Smart Array Controllers to Ensure Pending Writes to Storage Devices Complete Properly if Attempting to Use the Linux Kdump Facility
NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

 Release Date: 2011-03-16

Last Updated: 2011-03-16


IMPORTANT : Pending writes to storage devices may not complete properly as detailed below if the kdump facility is used. Using the kdump facility could leave the server in an unstable condition, which could potentially result in an inconsistent filesystem state. By disregarding this notification, the customer accepts the risk of incurring potential related errors.

The Linux kdump facility fails to execute properly when used on HP ProLiant servers running Linux and configured with certain Smart Array controllers using certain versions of the cciss device drivers.

Kdump is a facility for capturing a system memory image when a kernel panic occurs. These memory images are useful for debugging. Kdump loads a special kernel into a reserved memory area. When a panic occurs in the main kernel, control is transferred to the kdump kernel. As the memory image dump process begins, storage device drivers reset their associated controllers to clear all pending I/O activity before starting the memory dump activity, including pending writes to storage devices.

When kdump is executing on the affected Smart Array controllers, the reset process does not work as expected, and commands that were issued to the controller prior to the kdump kernel beginning execution may complete during the kdump process, potentially disrupting the kdump kernel’s I/O, and potentially leading to a corrupt kdump image or inconsistent filesystem state.

This problem occurs on Smart Array controllers that do not support the PCI power management reset method. In these cases, the needed reset never occurs. When these controllers are used with older cciss device driver versions on Linux, this situation may go unnoticed, and the dump process will complete, if no I/O was pending at the time of the panic. A newer cciss device driver version will always detect the problem, and stop responding at the reset stage to avoid potential I/O loss, but the kdump process will not succeed.

Messages similar to the following may be displayed on some but not all controllers having this problem:

<4>cciss 0000:19:08.0: Unable to successfully reset controller. . .
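If you want to check saved kernel logs for this symptom, a minimal Python sketch follows. It assumes the message keeps the shape of the sample above (the advisory only says real messages are "similar", and that not all controllers print one), so an empty result does not prove a system is unaffected. The function name find_reset_failures is made up for this example.

```python
import re

# Pattern modelled on the single sample line in the advisory. Real messages
# may differ, so only the stable-looking parts are matched (an assumption).
RESET_FAILURE = re.compile(
    r"cciss\s+"
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):\s+"
    r"Unable to successfully reset controller"
)


def find_reset_failures(log_text):
    """Return PCI addresses of controllers that logged a failed reset."""
    return [match.group("pci") for match in RESET_FAILURE.finditer(log_text)]


sample = "<4>cciss 0000:19:08.0: Unable to successfully reset controller. . ."
print(find_reset_failures(sample))
```

Feed it the text of a console capture or saved dmesg output; how you collect that text is up to your environment.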

Scope: this advisory applies to any HP ProLiant server configured with any of the following HP Smart Array controllers:

•HP Smart Array P400 controller
•HP Smart Array P400i controller
•HP Smart Array P800 controller
•HP Smart Array E500 controller
•HP Smart Array P700m controller
•HP Smart Array E200 controller
•HP Smart Array E200i controller
And running any of the following versions of the HP Smart Array driver (cciss):

•For Red Hat Enterprise Linux 5, Version 3.6.26-5 (or earlier)
•For SUSE Linux Enterprise Server 10, Version 3.6.26-5 (or earlier)
•For Red Hat Enterprise Linux 6, Version 4.6.26-6 (or earlier)
•For SUSE Linux Enterprise Server 11, Version 4.6.26-6 (or earlier)
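The two lists above can be turned into a quick check. This is only a sketch under stated assumptions: the distro labels ("RHEL5", "SLES10" and so on) and the function names are invented for the example, and it assumes the driver version is a plain numeric string such as 3.6.26-5 with no vendor suffix. How you obtain the controller model and driver version on a given server is left out.

```python
# Controller models and version thresholds copied from the advisory lists.
AFFECTED_CONTROLLERS = {"P400", "P400i", "P800", "E500", "P700m", "E200", "E200i"}

# Hypothetical distro labels mapped to the last affected cciss version.
LAST_AFFECTED_DRIVER = {
    "RHEL5": "3.6.26-5",
    "SLES10": "3.6.26-5",
    "RHEL6": "4.6.26-6",
    "SLES11": "4.6.26-6",
}


def parse_version(version):
    """Turn '3.6.26-5' into (3, 6, 26, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.replace("-", ".").split("."))


def is_affected(controller, distro, driver_version):
    """True if the controller/driver combination is listed in the advisory."""
    if controller not in AFFECTED_CONTROLLERS:
        return False
    threshold = LAST_AFFECTED_DRIVER.get(distro)
    if threshold is None:
        return False  # distro not covered by this advisory
    return parse_version(driver_version) <= parse_version(threshold)


print(is_affected("P400", "RHEL5", "3.6.26-5"))
print(is_affected("P400", "RHEL5", "3.6.28-1"))
```

Comparing parsed tuples rather than raw strings avoids the usual trap where "3.6.9" sorts after "3.6.26" as text.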

To ensure this issue does not occur, do not use the kdump facility to dump the memory image on the affected Smart Array controllers when using the cciss drivers listed in the Scope section.

Updated cciss device drivers for Linux that prevent this issue from occurring will become available in the near future. The updated driver eliminates the potential for lost data during the kdump process on the affected Smart Array controllers; however, the kdump process will still not produce a dump file on these controllers and should not be used.

As a workaround, change the kdump configuration to use an alternate storage device for the dump, or use the net dump method. Refer to the kdump manpage for additional information.
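Before changing the dump target, it can help to see whether the current kdump configuration names a cciss device at all. The sketch below is a rough check, not a definitive one: it assumes a Red Hat style kdump.conf with one directive per line and "#" comments, and it relies on cciss block devices appearing under /dev/cciss/. It cannot see targets given by label or UUID, nor a default dump location on a root filesystem that happens to live on a Smart Array volume. The function name and the sample configuration text are invented for illustration.

```python
def cciss_dump_targets(kdump_conf_text):
    """Return non-comment config lines that mention a /dev/cciss/ device."""
    hits = []
    for raw_line in kdump_conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "/dev/cciss/" in line:
            hits.append(line)
    return hits


# Illustrative configuration text only; check the kdump manpage for the
# directives your distribution actually supports.
local_conf = "# dump to local disk\next3 /dev/cciss/c0d0p3\npath /var/crash\n"
remote_conf = "net nfs.example.com:/export/crash\npath /var/crash\n"

print(cciss_dump_targets(local_conf))
print(cciss_dump_targets(remote_conf))
```

An empty list for the second sample suggests the dump would go over the network rather than through the affected controller, which is the direction the workaround points in.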

Only the Smart Array controllers listed in the Scope section are affected by this issue. Other Smart Array controllers use an alternative reset method and do not require the PCI power management reset method to complete a kdump successfully. As a result, no other Smart Array controllers are affected.

RECEIVE PROACTIVE UPDATES : Receive support alerts (such as Customer Advisories), as well as updates on drivers, software, firmware, and customer replaceable components, proactively via e-mail through HP Subscriber’s Choice. Sign up for Subscriber’s Choice at the following URL:

Posted by on March 22, 2011 in Operations


