How To Detect Hardware Errors On Linux

by lifeLinux on September 9, 2011

A Machine Check Exception (MCE) is a type of hardware error raised when a computer's central processing unit detects a hardware problem. On Linux, the kernel writes a message about the event to the kernel log and/or the console, and a tool such as mcelog decodes and logs it.

mcelog is a daemon that handles machine check events (hardware errors) on x86-64 machines running an x86 Linux kernel. It accounts for and logs CPU and memory errors, supports triggers on error thresholds, and can predictively offline memory pages and CPUs based on error trends. The daemon should run on every x86 Linux system where you want hardware errors handled. All errors are logged to /var/log/mcelog or to syslog.

Installing mcelog

To install mcelog on CentOS / RedHat, type the following command:

# yum install mcelog

On Debian / Ubuntu, type the following command:

# apt-get update && apt-get install mcelog

How to view the mcelog log

Use the tail command to follow the log as new entries arrive:

# tail -f /var/log/mcelog

To send a mail alert automatically when a hardware error is found on the system, you can write a shell script and call it from a cron job. The check counts matching log lines with grep -c (tail has no -i option and cannot search):

# [ "$(grep -ci "hardware error" /var/log/mcelog)" -gt 0 ] && echo "Hardware Error Found $(hostname) @ $(date)" | mail -s "[Alert] Hardware error" admin@domain.com
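The one-liner above can be expanded into a small script that is easier to call from cron. This is only a sketch under a few assumptions: the function names count_hw_errors and check_and_alert are made up for illustration, the search pattern is the one used above and may need adjusting to match what mcelog actually writes on your system, and a working mail command is assumed to be installed.

```shell
#!/bin/sh
# Sketch: mail an alert when the mcelog log contains "hardware error" lines.
# LOG and ADMIN are placeholders; override them in the environment if needed.
LOG="${LOG:-/var/log/mcelog}"
ADMIN="${ADMIN:-admin@domain.com}"

# Print the number of lines in file $1 that contain "hardware error"
# (case-insensitive). Prints 0 when the file is missing or unreadable.
count_hw_errors() {
    if [ -r "$1" ]; then
        # grep -c prints 0 but exits non-zero when nothing matches,
        # so "|| true" keeps the function's exit status clean.
        grep -ci "hardware error" "$1" || true
    else
        echo 0
    fi
}

# Send one mail if at least one matching line is present in $LOG.
check_and_alert() {
    count=$(count_hw_errors "$LOG")
    if [ "$count" -gt 0 ]; then
        echo "Hardware Error Found $(hostname) @ $(date) ($count lines)" |
            mail -s "[Alert] Hardware error" "$ADMIN"
    fi
}

check_and_alert
```

Saved under a path of your choosing (for example the hypothetical /usr/local/bin/mce-alert.sh) and made executable, it could be scheduled with a crontab entry such as `*/30 * * * * /usr/local/bin/mce-alert.sh`. Note that, like the one-liner, it sends a mail on every run for as long as the matching lines stay in the log.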
