The Linux kernel can reset the system if serious problems are detected. This can be implemented via special watchdog hardware, or via a slightly less reliable software-only watchdog inside the kernel. Either way, there needs to be a daemon that tells the kernel the system is working fine. If the daemon stops doing that, the system is reset.
watchdog is such a daemon. It opens /dev/watchdog and keeps writing to it often enough to keep the kernel from resetting, at least once per minute. Each write delays the reboot time by another minute. If no write arrives for a minute, the watchdog hardware will cause the reset. In the case of the software watchdog, the ability to reboot will depend on the state of the machine and interrupts.
The watchdog can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.
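The keep-alive behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the code of the watchdog daemon itself. The function name keep_alive, its parameters, and the 10-second default interval are assumptions made for this example. Writing the character V just before closing is the "magic close" convention that many kernel watchdog drivers recognise as a request to stop the timer; whether a given driver honours it is also an assumption here, and it has no effect when the kernel is built with CONFIG_WATCHDOG_NOWAYOUT.

```python
import os
import time


def keep_alive(device="/dev/watchdog", interval=10.0, rounds=None, sleep=time.sleep):
    """Write to the watchdog device every `interval` seconds.

    `rounds` limits the number of writes (None means run forever).
    Returns the number of keep-alive writes performed.
    """
    fd = os.open(device, os.O_WRONLY)
    written = 0
    try:
        while rounds is None or written < rounds:
            # Any write counts as a keep-alive and postpones the reset.
            os.write(fd, b"\0")
            written += 1
            sleep(interval)
        # Orderly exit: ask the driver to disarm the timer ("magic close").
        # This line is skipped if an exception escapes the loop, so a
        # crashing script still leaves the watchdog armed.
        os.write(fd, b"V")
    finally:
        os.close(fd)
    return written
```

Calling keep_alive(interval=10, rounds=6) as root would keep the system alive for about a minute and then release the device. The interval must stay well below the one-minute limit mentioned above.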
The watchdog daemon does several tests to check the system status:
- Is the process table full?
- Is there enough free memory?
- Are some files accessible?
- Have some files changed within a given interval?
- Is the average work load too high?
- Has a file table overflow occurred?
- Is a process still running? The process is specified by a pid file.
- Do some IP addresses answer to ping?
- Do network interfaces receive traffic?
- Is the temperature too high? (Temperature data not always available.)
- Execute a user defined command to do arbitrary tests.
If any of these checks fails, watchdog will cause a shutdown. Should any of these tests, except the user-defined binary, last longer than one minute, the machine will be rebooted too.
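To make two of the checks listed above more concrete, here is a minimal Python sketch of a load-average test and a pid-file test. It is not taken from the watchdog sources. The function names load_too_high and process_alive, and the choice of the 1-minute load average, are assumptions made for this example.

```python
import os


def load_too_high(max_load, current=None):
    """Return True if the 1-minute load average exceeds `max_load`.

    `current` lets a caller supply the load value directly; by default
    it is read from the operating system.
    """
    if current is None:
        current = os.getloadavg()[0]
    return current > max_load


def process_alive(pidfile):
    """Return True if the process named in `pidfile` still exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        # Signal 0 performs only the existence and permission check.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A monitoring loop could call such functions on each pass and stop writing to /dev/watchdog, or start a shutdown, as soon as one of them reports a failure.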
To start watchdog, use this command line:
watchdog [-i <interval> [-f]] [-l <max load avg>] [-v] [-s] [-b] [-m <max temperature>]