Deadman

Version: 
0.5
Release date: 
Tuesday, 5 December, 2017

License:

Interface:

Authors/Port authors:

DEADMAN attempts to detect when a system is not operating properly and to reboot the system when this occurs.

This software is distributed as a compressed package. You have to download and install it manually; if prerequisites are required, you will have to install them manually too.

Manual installation

The program is distributed as a ZIP package: download it to a temporary directory and unpack it to the destination folder. See below for download link(s).


The following are the download links for manual installation:

Deadman v. 0.5 (21/6/2022, Steven Levine) Readme/What's new
deadman user guide v0.3

2017-12-05 SHL Baseline
2022-04-03 SHL Version 0.3
2022-06-21 SHL Version 0.5

== Introduction ==

Deadman attempts to detect when one or more of a known set of problems occurs in a running system and to take appropriate recovery actions when one of these problems is detected.

Deadman is an evolving application. New features are added when new failure modes and/or new recovery modes are discovered.

Deadman was originally written to keep apache httpd servers that I maintain up and running with minimal human interaction, so some of deadman's features are specific to httpd servers. Other features are more generic and may be useful with other applications.

Deadman logs its actions to the deadman.log log file. The log file will be written to the %LOGFILES% directory if defined. Otherwise it will be written to the %TEMP% directory. Deadman also logs its actions to STDOUT, unless it is running detached.

Deadman writes its PID to deadman.pid in the %TEMP% directory. This allows other processes to check and/or stop deadman.

== Usage ==

deadman is a VIO command line application which is typically run detached. Output is written to the standard output, if deadman is not running detached, and to the log file (%LOGFILES%\deadman.log).

The log file entries are timestamped so that they can be correlated with information from other timestamped logs. Each log file entry includes a message id of the form (#number). The id number can be used to locate the code that generated the message, if needed.

To display the help screen, enter

  deadman.exe -?

at the command line. The help screen currently displays as:

  The deadman daemon checks system health based on configuration file settings.
  See deadman.txt for a detailed description of operation and options.

  deadman [-c] [-h] [-s] [-t] [-v] [-V] [-?] [cfgfile]

    -c       Check daemon status
    -h -?    Display this message
    -s       Stop daemon
    -t       Run in TEST mode
    -v       Display verbose status
    -V       Display version
    cfgfile  Configuration file to process

  Copyright (c) 2008-2022 Steven Levine and Associates, Inc. All rights reserved.

== Theory of operation ==

Deadman attempts to monitor system health by watching the state of user selected files, as defined in the configuration file.

Deadman can:

- monitor a transaction log file for activity
- monitor an error log file for certain errors
- reboot the system on request

When monitoring a transaction log file for activity, deadman expects the file size to increase over time. If the file size fails to increase for longer than the configured interval, deadman will attempt to reboot the system after the reboot delay expires. The check interval and the reboot delay interval are both configurable. Deadman contains logic to handle log rotation, which will cause the log file size to be reduced.

When monitoring an error log file for errors, deadman will check the log file for known errors at configurable intervals. The set of known errors is currently:

- httpd cannot create child process

As deadman evolves, additional checks may be implemented.

When one of the known errors is detected, deadman will perform error specific recovery actions. If the recovery actions fail, deadman will attempt to reboot the system after the reboot delay expires. The check interval and the reboot delay interval are both configurable.

When monitoring for reboot requests, deadman checks if the reboot request file has been created. When the file is created, deadman will attempt to reboot the system. If the reboot request file is not empty, deadman will write the first line of the file to the deadman log file to record the reason for the reboot.
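The transaction log check described under Theory of operation can be sketched in a few lines of Python. This is only an illustration of the idea, not deadman's actual code: the GrowthMonitor class and its method names are hypothetical, and treating any shrinkage as a log rotation is an assumption, since the guide does not say how deadman recognizes a rotation.

```python
import os
import time


class GrowthMonitor:
    """Tracks the size of one file and reports how long it has been idle.

    Hypothetical helper for illustration; not part of deadman.
    """

    def __init__(self, path, now=None):
        self.path = path
        self.last_size = None
        self.last_change = time.time() if now is None else now

    def check(self, now=None):
        """Return the number of seconds since the file size last changed."""
        now = time.time() if now is None else now
        try:
            size = os.path.getsize(self.path)
        except OSError:
            size = None  # file missing, e.g. in the middle of a rotation
        if size is not None:
            if self.last_size is None or size != self.last_size:
                # Growth means activity. Shrinkage is assumed to be a log
                # rotation, so the idle clock restarts from the new size.
                self.last_change = now
            self.last_size = size
        return now - self.last_change
```

A caller would compare the returned idle time with the configured check interval and schedule a reboot only when the idle time exceeds it.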

== Sample configuration file ==

The Configuration File Keywords section describes the available keywords in more detail.

  ; hostname: steven, domain: www.scoug.com
  ; checks error log for child process start failures
  ; checks transaction log for lack of activity
  ; checks transaction log for lack of activity
  ; 2022-04-03 SHL Baseline - steven

  translogfile = d:\logs\apache\scoug-combined_log
  processname = httpd
  TransLogCheckIntervalSec = 60 ; 1 minute
  errlogfile = d:\logs\apache\scoug-error_log
  ErrorLogCheckIntervalSec = 30
  rebootfile = d:\apps\apache24\reboot-me-now
  SleepSec = 10
  RebootDelaySec = 3600 ; 1 hour, 0 suppresses reboots
  ForceStatusSec = 21600 ; 6 hours

== Sample command lines ==

To start deadman in VIO mode

  start "deadman" deadman d:\apps\bin\deadman.cfg

To start deadman detached

  detach "deadman" deadman d:\apps\bin\deadman.cfg

To check if deadman daemon is running:

  deadman -c

To stop the running instance of deadman:

  deadman -s

== Running multiple deadman instances ==

Currently, deadman only supports a single occurrence of each keyword. This will probably change in the future. For now, if you need to monitor multiple files of the same type, you need to run multiple instances of deadman.

To do this, make a copy of deadman.exe giving it a unique name (e.g. deadman2.exe) and run the copy with a unique configuration file. The deadman log file name, the deadman pid file name and the default configuration file name are determined by the deadman executable's name so there will be no conflict with other running deadman instances.

== Configuration file keywords ==

All keywords are optional. If a keyword enables a feature, the feature will not be enabled if the keyword is omitted. If the keyword sets a time interval, a default interval will be set if the keyword is omitted.

The translogfile keyword names a transaction log file and enables the transaction log file monitor feature. Deadman monitors this file for growth.
If the file stops growing for longer than the configured interval, deadman will schedule a reboot. There is no default for the transaction log file. If this keyword is omitted, transaction log monitoring will not be enabled.

The translogcheckintervalsec keyword defines how often the transaction log monitor feature will check the transaction log file. If this keyword is omitted, the default check interval is 600 seconds (i.e. 10 minutes).

The processname keyword names the process that is responsible for writing to the configured transaction log file. If a process name is defined, deadman monitors the processes with this name. If there are no instances with this process name running, deadman assumes that the user has stopped the processes for maintenance and suspends transaction log file monitoring until one or more instances of the process are restarted. This prevents deadman from rebooting during planned shutdowns of these processes. There is no default for the process name. If this keyword is omitted, process monitoring will not be enabled.

The errlogfile keyword names an apache httpd error log file and enables the httpd error log monitoring feature. Deadman will monitor the error log file for httpd child create failures. There is no default for the error log file. If this keyword is omitted, error log monitoring will not be enabled.

The errorlogcheckintervalsec keyword defines how often deadman will check the configured error log file. If this keyword is omitted, the default check interval is 600 seconds (i.e. 10 minutes).

The rebootfile keyword names the reboot request file and enables the reboot request feature. If this file exists, deadman will reboot the system. If this file exists when deadman is started, it will be deleted to prevent a stale reboot request file from triggering a reboot. There is no default for the reboot request file. If the keyword is omitted, reboot request monitoring will not be enabled.
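The reboot request file behavior can be illustrated with a short Python sketch. The function names below are hypothetical and the code only mirrors the behavior this guide describes (delete a stale file at startup, treat the file's appearance as a request, use its first line as the reason); it is not taken from deadman itself.

```python
import os


def clear_stale_request(path):
    """At startup, delete a leftover request file.

    Returns True if a stale file existed and was removed.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


def read_reboot_request(path):
    """Return None if no request is pending, otherwise the reason.

    The reason is the first line of the file and may be an empty
    string when the request file is empty.
    """
    try:
        with open(path, "r", errors="replace") as f:
            return f.readline().strip()
    except FileNotFoundError:
        return None
```

With a layout like this, an operator or script could request a reboot by writing a one-line reason to the configured file, for example the path given by the rebootfile keyword.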

The sleepsec keyword defines how long deadman sleeps between check cycles. If this keyword is omitted, the default interval is 30 seconds.

The rebootdelaysec keyword defines how long deadman waits after scheduling a reboot to perform the reboot. This allows for intermittent errors to be reported without forcing an unneeded reboot. If this keyword is omitted, the default delay interval is 30 seconds.

The forcestatussec keyword defines how long deadman will wait before writing a proof of life message to the deadman log file. If this keyword is omitted, the default reporting interval is 21,600 seconds (i.e. 6 hours).

== Tuning deadman ==

Every system is different. The goal of tuning the deadman timing parameters is to check often enough so that problems can be detected and effectively handled, while at the same time minimizing false positives and not checking so often as to waste system resources that could be better used elsewhere.

When tuning deadman, it is recommended that deadman be run in test mode (i.e. -t). Test mode suppresses reboots and reduces the forcestatussec check interval, which makes the tuning process more efficient.

When tuning deadman, the deadman log file can be helpful. Look for spurious reports that can be avoided by optimizing the timing parameters.

Sleepsec defines the minimum reasonable value for all the other checking intervals.

Translogcheckintervalsec should be set large enough to avoid most false positives, but small enough so that any reboot attempt occurs before the system has become so unstable that the reboot attempt will fail.

Errorlogcheckintervalsec should be set large enough to avoid wasting system resources, but small enough so that the recovery attempt has a high probability of success.

Rebootdelaysec should be set large enough to allow intermittent reboot requests to clear, but small enough so that the reboot attempt occurs before the system has become so unstable that the reboot request will fail.
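As a tuning aid, a configuration file can be sanity checked before deadman is started. The Python sketch below is a hypothetical helper, not part of deadman: it assumes the "keyword = value ; comment" layout of the sample configuration file, assumes keywords are case-insensitive (the sample file and this guide spell them with different capitalization), and uses the default values as documented above.

```python
# Documented defaults, in seconds.
DEFAULTS = {
    "translogcheckintervalsec": 600,
    "errorlogcheckintervalsec": 600,
    "sleepsec": 30,
    "rebootdelaysec": 30,
    "forcestatussec": 21600,
}


def parse_config(text):
    """Parse 'keyword = value ; comment' lines into a dict with defaults."""
    cfg = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()  # assumption: keywords are case-insensitive
        cfg[key] = int(value) if key in DEFAULTS else value
    return cfg


def tuning_warnings(cfg):
    """List check intervals that are shorter than sleepsec."""
    return [
        f"{key} ({cfg[key]}s) is shorter than sleepsec ({cfg['sleepsec']}s)"
        for key in ("translogcheckintervalsec", "errorlogcheckintervalsec")
        if cfg[key] < cfg["sleepsec"]
    ]
```

A check interval shorter than sleepsec cannot take effect any faster than the sleep cycle, which is why the sketch flags it.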

== Requirements ==

The dos.sys driver must be installed. This driver provides application level access to the DosReboot DevHlp API.

== Known issues ==

None

== Ideas for the future ==

- Enhance the error log monitor feature to detect more types of errors and provide recovery support.
- Support units of measure for numeric values
- Support multiple occurrences of keywords in a single configuration file where this simplifies deadman usage.
- Support deadmanlogfile keyword.
- Support deadmanpidfile keyword.

== Copyright and License ==

COVERED CODE IS PROVIDED UNDER THIS LICENSE ON AN "AS IS" BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE COVERED CODE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED CODE IS WITH YOU. SHOULD ANY COVERED CODE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY COVERED CODE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER.

Copyright (c) 2008-2022 Steven Levine and Associates, Inc. All rights reserved.

Deadman is provided AS-IS, WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESS, IMPLIED OR STATUTORY, not even any implied warranty of MERCHANTABILITY.

YOUR USE THIS PRODUCT IS CONDITIONED UPON YOUR ACCEPTANCE OF THIS LICENSE AGREEMENT. INSTALLING AND/OR USING THE PRODUCT INDICATES YOUR ACCEPTANCE OF THESE TERMS AND CONDITIONS. IF YOU DO NOT AGREE TO THESE TERMS AND CONDITIONS PROMPTLY DELETE THIS PRODUCT.

You are granted a non-exclusive, non-assignable, non-transferable right to use deadman.exe.

== eof ==
 www.warpcave.com/betas/deadman-20220621.zip  local copy
Deadman v. 0.3 (3/4/2022, Steven Levine)
 www.warpcave.com/betas/deadman-20220403.zip
Deadman v. 0.1 (5/12/2017, Steven Levine) Readme/What's new
deadman user guide v0.1

2017-12-05 SHL Baseline

== Introduction ==

Deadman attempts to detect when a system is not operating properly and to reboot the system when this occurs.

== Usage ==

deadman is a VIO command line application which is typically run detached. Output is written to the standard output, if deadman is not running detached, and to the log file (%LOGFILES%\deadman.log).

The log file entries are timestamped so that they can be correlated with information from other timestamped logs.

To display the help screen, enter

  deadman.exe -?

at the command line. The help screen will display as:

  Checks application flag file to monitor system health.
  Monitor mode can be file create or file write.
  Create mode deletes flag file and expects application to recreate the file.
  Write mode expects file size to increase.
  Checks activity at regular intervals (default is 5 minutes).
  Lack of activity indicates system may have a problem.
  Reboots system if lack of activity exceeds limit (default is 25 minutes).
  Logs status to deadman.log log file.
  Log file written to log directory.
  Can run in foreground or detached.

  deadman [-c] [-f] [-h] [-s] [-T] [-v] [-V] [-w] [-?] [interval [retries]] [file]

    -c        Check daemon status
    -f        Monitor in file create mode
    -h -?     Display this message
    -s        Stop daemon
    -T        Run in TEST mode
    -v        Display verbose status
    -w        Monitor in file write mode
    -V        Display version
    interval  Activity period, default is 5 minutes
              Units defaults to seconds. Optionally suffix with h, m, s.
    retries   Retry limit, defaults is 5 retries
    file      Flag file to monitor

  Copyright (c) 2008-2017 Steven Levine and Associates, Inc. All rights reserved.

== Theory of operation ==

Deadman attempts to monitor system health by watching the state of a user selected state file.

Deadman can operate in two modes - write mode and create mode.

In write mode, deadman expects the state file size to increase at regular intervals.
If the file size fails to increase for longer than the reboot delay interval, deadman will attempt to reboot the system.

In create mode, deadman expects the state file to be created at regular intervals. Deadman deletes the state file after it is detected. If the file is not created within the reboot delay interval, deadman will attempt to reboot the system.

== Sample command lines ==

  start "deadman" deadman -w 10m 3 d:\logs\apache\scoug-combined_log

Run in write mode monitoring the specified file. Check for file every 10 minutes and reboot if file does not change after 3 tries (i.e. 30 minutes).

  detach deadman -f 1m 10 d:\var\log\signoflife

Run in create mode monitoring the specified file. Check for file every minute and reboot if file does not appear after 10 tries (i.e. 10 minutes).

  deadman -c

Check if deadman daemon is running.

== Tuning deadman ==

The reboot delay interval is calculated as

  activity_interval * retry_count

The activity interval defines how often deadman checks the state file. The activity interval should be defined large enough to minimize the number of log entries created when the system is operating normally under average loads.

The retry count should be defined large enough so that a system that is under high load, but is otherwise operating normally, does not reboot. At the same time, the retry count should be defined small enough to maximize the chances for a successful reboot.

When tuning the deadman timing parameters, it is recommended that deadman be run in test mode (-T) which suppresses reboots when the reboot delay interval is exceeded.

== Requirements ==

The dos.sys driver must be installed. This driver provides application level access to the DosReboot DevHlp.
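The reboot delay calculation from the v0.1 Tuning deadman section (activity interval multiplied by retry count) can be worked through with a small Python sketch. The function names are hypothetical; the h, m and s suffixes and the seconds default come from the v0.1 help screen, while the exact parsing rules deadman applies are an assumption.

```python
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_interval(text):
    """Convert '10m', '2h', '45s' or a bare '300' to seconds."""
    text = text.strip().lower()
    if text and text[-1] in UNIT_SECONDS:
        return int(text[:-1]) * UNIT_SECONDS[text[-1]]
    return int(text)  # no suffix: the value is already in seconds


def reboot_delay(interval, retries):
    """Reboot delay in seconds: activity interval times retry count."""
    return parse_interval(interval) * retries
```

For the two sample command lines, reboot_delay("10m", 3) gives 1800 seconds (30 minutes) and reboot_delay("1m", 10) gives 600 seconds (10 minutes). The documented defaults of 5 minutes and 5 retries give 1500 seconds, which matches the 25 minute limit shown on the help screen.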

== Known Issues ==

None

== Copyright and License ==

COVERED CODE IS PROVIDED UNDER THIS LICENSE ON AN "AS IS" BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE COVERED CODE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED CODE IS WITH YOU. SHOULD ANY COVERED CODE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY COVERED CODE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER.

Copyright (c) 2008-2017 Steven Levine and Associates, Inc. All rights reserved.

Deadman is provided AS-IS, WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESS, IMPLIED OR STATUTORY, not even any implied warranty of MERCHANTABILITY.

YOUR USE THIS PRODUCT IS CONDITIONED UPON YOUR ACCEPTANCE OF THIS LICENSE AGREEMENT. INSTALLING AND/OR USING THE PRODUCT INDICATES YOUR ACCEPTANCE OF THESE TERMS AND CONDITIONS. IF YOU DO NOT AGREE TO THESE TERMS AND CONDITIONS PROMPTLY DELETE THIS PRODUCT.

You are granted a non-exclusive, non-assignable, non-transferable right to use deadman.exe.

== eof ==
 www.warpcave.com/betas/deadman-20171205.zip
Record updated last time on: 16/07/2022 - 11:11

