CPU issue causing BSOD 7F (0x0D, 0x00, 0x00) & 9C (0x00, xxx, xx, xx)

21 Feb

We had a strange server that would regularly BSOD with stop code 7F or 9C. Towards the end, it was BSODing every 10 minutes or so.

So we had the motherboard changed, and it stopped BSODing for about an hour before starting again!

Diagnostics were run and all of them passed (how typical!), so HP was really reluctant to bring down other parts since the diagnostics showed everything was okay. In fact, this is not the first time I have seen an HP server pass its diagnostics, only for everything to be resolved once one of the parts was changed.

So we insisted on getting some other parts brought down to change. I was not really involved in this, but on that day I was on shift as the hardware engineer escort to our datacenter. So I did a quick check on the BSOD errors and found that most of the discussion online pointed to memory or CPU issues.

So when I met the HP engineer in the datacenter, I eagerly asked if he had brought spare memory and a CPU to test with. Instead, he had brought down an array controller to swap out! What a waste of time, I thought to myself, as none of the BSOD symptoms pointed to a faulty array controller.

Anyway, I did not want the controller changed and insisted that we swap the memory and CPUs around to test instead.

So the engineer took out 2 banks (out of 4) of memory and booted up the server, and 10 minutes later the BSOD occurred. Then he swapped in the other 2 banks, booted up, and still had the same problem. At this point it was unlikely to be the memory: unless we were very unlucky, we would not have had faulty modules in both halves exhibiting the same symptoms.

Next was the CPU test. He took out one of the CPUs and booted up… 10 minutes later, the BSOD occurred. Then he swapped the CPUs and booted up… 10 minutes later there was no BSOD, so we waited another 15-20 minutes and the server was still running. So it was indeed a CPU causing this issue after all.

Anyway, it was late at night, so I asked the engineer to come back the next day with a new CPU. I left the server running overnight on one CPU, and it was still running when I went back to the office the next day.

Problem solved.

Posted on February 21, 2009