A Machine Check Exception (MCE) is a type of computer hardware error that occurs when a computer’s central processing unit detects a hardware problem. On Linux, a process (such as mcelog ) writes a message to the kernel log and/or the console screen.
mcelog is a daemon to handle machine check events (hardware errors) on x86-64 machines running an x86 Linux kernel. It accounts and logs CPU and memory errors, supports triggers on error thresholds, and can predictively offline memory pages and CPUs based on error trends. This daemon should run on all x86 Linux systems that want to handle hardware errors. All errors are logged to /var/log/mcelog or syslog.
To install mcelog on CentOS / RedHat, type the following command
# yum install mcelog
Type the following command under Debian / Ubuntu
# apt-get update && apt-get install mcelog
How to view mcelog ?
Using tail command
# tail -f /var/log/mcelog
To send mail alert automatically when hardware error found on the system, you can write a shell script and call it via cron job:
# [ $(tail -i "hardware error" /var/log/mcelog) -gt 0 ] && echo "Hardware Error Found $(hostname) @ $(date)" | mail -s "[Alert] Hardware error" firstname.lastname@example.org