Server Crash Help
I recently installed Debian on my server (thanks to hughesjr).

Today I came into my office and found that the server had an error on the screen. I opened up /var/log/syslog and found this error:

debian kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 1.
debian kernel: Bank 2: 940040000000017a

I googled that error code and found a bunch of different responses from people who said that it could be bad RAM or a bad CPU.

The machine is a dual-processor Athlon MP 1800.

The back story is that a few weeks ago it was happily running Windows 2000 Server, and then it suddenly started shutting down and rebooting. We formatted it and reinstalled up to SP4, but it kept on rebooting. We did a little research and found a lot of people reporting the same thing after installing SP4. I had decided to ditch Windows anyway, so I figured I'd just install Linux and hope for the best.

When the machine was delivered to me (it had been co-located in Detroit), I turned it on and found that one of the CPU fans was bugging out. I ordered two new CPU fans and installed them both before installing Linux.

So, it seems likely to me that the CPU with the screwed up fan is now toast. But how can I be sure that is the problem? I don't want to waste money on a new CPU and find out that it was the RAM - and I don't want to spend money on new RAM and find out it was the CPU.

Any idea how I can debug this further and find out which part is the culprit?
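
One rough way to narrow this down is to count how often each CPU shows up in these messages. If one CPU accounts for nearly all of them, that points at that CPU (or its cooling) rather than at shared RAM. The sketch below is only an illustration: the regular expression is copied from the log line quoted above, other kernel versions may word the message differently, and `count_mce_by_cpu` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern copied from the syslog line quoted in this post.
# Assumption: every MCE report in the log uses this exact wording.
MCE_LINE = re.compile(
    r"MCE: The hardware reports a non fatal, "
    r"correctable incident occurred on CPU (\d+)"
)


def count_mce_by_cpu(lines):
    """Return a Counter mapping CPU number -> number of MCE reports."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Small in-memory example using the message from this thread.
sample = [
    "debian kernel: MCE: The hardware reports a non fatal, "
    "correctable incident occurred on CPU 1.",
    "debian kernel: Bank 2: 940040000000017a",
]
for cpu, n in sorted(count_mce_by_cpu(sample).items()):
    print(f"CPU {cpu}: {n} report(s)")
```

To run it against the real log, pass an open file instead of the sample list, for example `count_mce_by_cpu(open("/var/log/syslog", errors="replace"))`.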
Update: I did some more digging and found this:

Looks like the L2 cache ECC checking spotted something going wrong,
and fixed it up. This can happen in cases where there is inadequate
cooling, power, or overclocking (or in rare circumstances, flaky CPUs)
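
For anyone curious how "L2 cache ECC" follows from the value 940040000000017a, here is a minimal sketch that picks the status word apart. It assumes the usual x86 machine-check status layout (bit 63 valid, bit 61 uncorrected, bit 60 enabled, bit 58 address valid, and a low 16-bit error code of the form 0000 0001 RRRR TTLL for cache errors). Treating bit 46 as "correctable ECC" is an AMD-specific assumption on my part, and `decode_mci_status` is just an illustrative name, so check it against the processor manual before relying on it.

```python
# Value taken from the "Bank 2" syslog line in this thread.
STATUS = 0x940040000000017A

# Field encodings for cache-hierarchy error codes (0000 0001 RRRR TTLL).
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
TYPES = {0: "instruction", 1: "data", 2: "generic"}
REQUESTS = {
    0: "generic error", 1: "generic read", 2: "generic write",
    3: "data read", 4: "data write", 5: "instruction fetch",
    6: "prefetch", 7: "evict", 8: "snoop",
}


def bit(value, position):
    """Return True if the given bit is set in value."""
    return bool((value >> position) & 1)


def decode_mci_status(status):
    """Decode a 64-bit machine-check bank status word into a dict."""
    code = status & 0xFFFF  # low 16 bits: error code
    info = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "enabled": bit(status, 60),
        "addr_valid": bit(status, 58),
        "context_corrupt": bit(status, 57),
        "correctable_ecc": bit(status, 46),  # assumption: AMD layout
        "error_code": hex(code),
    }
    # Cache-hierarchy errors have 0000 0001 in the upper byte of the code.
    if (code >> 8) == 0x01:
        info["cache_level"] = LEVELS[code & 0x3]
        info["transaction_type"] = TYPES.get((code >> 2) & 0x3, "reserved")
        info["request"] = REQUESTS.get((code >> 4) & 0xF, "unknown")
    return info


for key, value in decode_mci_status(STATUS).items():
    print(f"{key}: {value}")
```

With this layout the value decodes as a valid, corrected (not uncorrected) ECC event on an L2 cache eviction, which matches the explanation quoted above.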

I shut the machine down and took out CPU 2 (those CPU fans are HARD to get off!). I cleaned off the cooling goo and saw that it *is* pretty browned - more than you would expect.

At this point I think I'll just replace the two CPUs with newer ones.

Just thought I'd let anyone who reads this know that I've resolved the issue (sort of), or will soon.