This week, one of our HP ProLiant servers was found to have a warning in HP web SIM. The array controller was reporting the state “Ready for Recovery” on one of our RAID 5 logical disks. We had previously replaced one of the hard disks in that array because it was faulty.
Initially, I thought that someone had hit the F1 key during POST, which stops the RAID from recovering, or that the replacement hard disk was itself faulty. So I got the guys to replace the hard disk again. The rebuild started, but less than halfway through it stopped and the logical disk went back to “Ready for Recovery”.
Then I suspected that the firmware might need to be updated, but a quick check showed it was already at the latest version.
After some searching, I found that one possible cause is a problem with another disk in the RAID, which prevents the rebuild from completing. A check in HP web SIM confirmed that another hard disk had 147 Fail Recovery Reads. A RAID 5 rebuild has to read every surviving disk to reconstruct the missing one, so with those read failures the RAID did not have enough information to complete the rebuild.
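To show why read errors on a second disk stop the rebuild, here is a minimal Python sketch of XOR parity, the general principle behind RAID 5. It is a toy model and not how the HP array controller is actually implemented; the function names (xor_blocks, rebuild_missing) and the sample bytes are made up for illustration, and an unreadable block is simply represented as None.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[Optional[bytes]]) -> bytes:
    """Reconstruct the one missing block of a stripe from all surviving blocks.

    Every surviving block (data and parity) must be readable. A block that
    could not be read is passed as None (an assumption of this toy model).
    """
    if any(block is None for block in surviving):
        raise IOError("a surviving block is unreadable; stripe cannot be rebuilt")
    return xor_blocks(surviving)


# One stripe spread over three data disks, plus its parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_blocks(data)

# The second disk is replaced: its block comes back from the other three.
recovered = rebuild_missing([data[0], data[2], parity])
print(recovered == data[1])

# If another disk also fails to return its block, the rebuild has nothing to work with.
try:
    rebuild_missing([data[0], None, parity])
except IOError as err:
    print("rebuild stopped:", err)
```

With single parity there is exactly one way to recompute a lost block, and it needs every other block in the stripe. That is why failed reads on a second disk leave the array stuck instead of finishing the rebuild.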
After this finding, the next step was to talk to my apps team about moving their files off the RAID so that we could destroy it, replace the bad disks with good ones, create a new RAID and move the files back.