This page describes the SoloManager program and its usage.
The purpose of SoloManager is to start a target program locally and then continuously monitor the target program's availability. The target program responds to TCP/IP queries on a specified port if it is operating normally. If the target program becomes unresponsive for a specified period of time then SoloManager can terminate it and restart it, and/or reboot the host computer entirely. Many aspects of the SoloManager program can be configured by specifying values in the SoloManager.ini text file.
Description of components
SoloManager jar file
This contains the SoloManager program, and a sample SoloManager.ini file. It also contains all necessary Java library files.
SoloManager configuration file
Contains configuration details specifying how the SoloManager operates. See the example configuration file listed below.
Target program
This is the program we wish to monitor and to ensure is always available. It must expose a TCP port and respond to socket queries on that port.
Wrapper service (optional)
This could be another program which starts SoloManager whenever the host computer is booted up. It must have permission to start the SoloManager.
Relationships and processing sequence
Typical process flow
The SoloManager is typically started automatically when the host computer is booted up. This might be accomplished by a wrapper service for example.
The SoloManager begins by reading in values for all configurable parameters from the SoloManager.ini file. This file can be edited by the user to specify their preferred settings, but it must be located in the same directory as the SoloManager jar file. For example, this is where the user specifies the name of the target executable which SoloManager will start and monitor.
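The key = value layout of the configuration file, with "#" comment lines, is compatible with the standard java.util.Properties format, so reading it can be sketched as below. This is only an illustration: whether SoloManager itself uses java.util.Properties is an assumption, and the class name IniSketch, its methods and the UTF-8 encoding are hypothetical. Note that java.util.Properties treats a backslash as an escape character, so a value written as \" in the file is read as a plain double quote.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper for illustration; not part of SoloManager itself.
final class IniSketch {

    /** Loads key = value settings from a file such as SoloManager.ini (UTF-8 assumed). */
    static Properties load(Path iniFile) throws IOException {
        try (Reader reader = Files.newBufferedReader(iniFile, StandardCharsets.UTF_8)) {
            Properties props = new Properties();
            props.load(reader); // lines starting with '#' are treated as comments
            return props;
        }
    }

    /** Same parsing, but from text already in memory (convenient for experiments). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Reads an integer setting, falling back to a default when the key is absent or blank. */
    static int getInt(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = parse(
                "# executable details\n"
              + "executableName = solo_predictor.exe\n"
              + "serverPort = 2211\n");
        System.out.println(props.getProperty("executableName"));
        System.out.println(getInt(props, "serverPort", -1));
    }
}
```

In practice the path passed to load would point at SoloManager.ini in the directory that holds the jar file.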
SoloManager then begins its unending loop where it checks the status of the target program. SoloManager creates a socket connection to the target program and sends a query. If the target program is alive it sends a response which must match what SoloManager is expecting. SoloManager checks that the target program is alive by
1. Opening a socket on the target program's port.
2. Sending the parameter "msgToSocket" to the socket and verifying that the first line returned from the socket equals the parameter "expectedResponse".
If the response is valid the check is complete with result success. If the response is not valid SoloManager will repeat this check up to "fastFailCountLimit" times with a pause of "fastQueryIntervalSeconds" seconds between attempts.
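The target check described above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not SoloManager's actual code: the class and method names are hypothetical, the query is assumed to be terminated by a newline and encoded as UTF-8, the connect/read timeout is an added parameter, and the "ping"/"pong" strings in main are placeholders for "msgToSocket" and "expectedResponse".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the target check; names and wire format are assumptions.
final class TargetCheckSketch {

    /** Opens a socket, sends msgToSocket, and compares the first line returned. */
    static boolean queryOnce(String serverIP, int serverPort, String msgToSocket,
                             String expectedResponse, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(serverIP, serverPort), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // do not block forever waiting for a reply
            OutputStream out = socket.getOutputStream();
            out.write((msgToSocket + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String firstLine = in.readLine(); // null if the connection closed without data
            return expectedResponse.equals(firstLine);
        } catch (IOException e) {
            return false; // refused connection or timeout counts as an invalid response
        }
    }

    /** Repeats a single check up to fastFailCountLimit times, pausing between attempts. */
    static boolean checkWithRetries(BooleanSupplier singleCheck, int fastFailCountLimit,
                                    long pauseMillis) {
        for (int attempt = 1; attempt <= fastFailCountLimit; attempt++) {
            if (singleCheck.getAsBoolean()) {
                return true;
            }
            if (attempt < fastFailCountLimit) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // serverIP, serverPort, fastFailCountLimit and fastQueryIntervalSeconds
        // take the values from the example configuration file.
        boolean alive = checkWithRetries(
                () -> queryOnce("127.0.0.1", 2211, "ping", "pong", 1000), 2, 2 * 1000L);
        System.out.println("target alive: " + alive);
    }
}
```

Separating the single query from the retry loop keeps the retry logic testable without a live target program.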
If the target check was successful then the failure counter is reset to zero and the loop repeats after a specified pause period of "slowQueryIntervalSeconds" seconds. If the target check was not successful then the failure counter is incremented. The loop continues until this counter reaches the specified "nResponseRestart" counter value, whereupon SoloManager issues a command to restart the target program and continues with the loop. If the target program restarts then the next check will be successful, so the loop continues normally.
If the restart command does not succeed in restarting the target program then the target checks will continue failing and the failure counter will keep incrementing until it eventually reaches the specified "nResponseReboot" counter value. At this point SoloManager issues a command to reboot the host computer and the entire process begins again.
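The escalation logic can be pictured as a small counter-driven state machine. The sketch below is illustrative only: the class EscalationSketch and its Action enum are hypothetical, and it assumes each response fires exactly once, when the failure counter first equals its threshold. It also assumes, following the note in the example configuration file, that a threshold of zero or a negative integer suppresses that response.

```java
// Hypothetical sketch of the failure counter and escalation thresholds.
final class EscalationSketch {

    enum Action { NONE, RESTART, REBOOT }

    private final int nResponseRestart;
    private final int nResponseReboot;
    private int failureCount = 0;

    EscalationSketch(int nResponseRestart, int nResponseReboot) {
        this.nResponseRestart = nResponseRestart;
        this.nResponseReboot = nResponseReboot;
    }

    /** Updates the failure counter from one target check and returns the action to take. */
    Action onCheckResult(boolean success) {
        if (success) {
            failureCount = 0; // a successful check resets the counter
            return Action.NONE;
        }
        failureCount++;
        // A threshold of zero or less suppresses the corresponding response.
        if (nResponseReboot > 0 && failureCount == nResponseReboot) {
            return Action.REBOOT;
        }
        if (nResponseRestart > 0 && failureCount == nResponseRestart) {
            return Action.RESTART;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // Thresholds taken from the example configuration file.
        EscalationSketch escalation = new EscalationSketch(1, 3);
        boolean[] checkResults = {false, false, false};
        for (boolean result : checkResults) {
            System.out.println(escalation.onCheckResult(result));
        }
    }
}
```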
During these operations SoloManager writes status information to a log file and optionally can send e-mail to report events. The log file will be located in the directory specified by "outdir". Its size is limited to the last "logFileMsgCapacity" log messages. E-mailed alerts are optional and are enabled by setting "enableEmailing" = true. TODO: what messages can be e-mailed.
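One simple way to keep only the last "logFileMsgCapacity" messages is a bounded queue that discards the oldest entry when full. The sketch below shows that idea in memory only. How SoloManager actually trims its log file is not described here, so the class BoundedLogSketch and its methods are purely hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory illustration of a log limited to the last N messages.
final class BoundedLogSketch {

    private final int capacity;
    private final Deque<String> messages = new ArrayDeque<>();

    BoundedLogSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Appends a message, dropping the oldest one once the capacity is reached. */
    void add(String message) {
        if (messages.size() == capacity) {
            messages.removeFirst();
        }
        messages.addLast(message);
    }

    /** Returns the retained messages, oldest first. */
    List<String> snapshot() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        BoundedLogSketch log = new BoundedLogSketch(3);
        for (int i = 1; i <= 5; i++) {
            log.add("message " + i);
        }
        System.out.println(log.snapshot());
    }
}
```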
SoloManager is platform independent, provided a Java Virtual Machine is available on the host computer. It must be able to write its log file to the filesystem, and it must be able to issue a system reboot command.
--------------------------------------------------------------------------
------------ start: Example SoloMonitor.ini configuration file -----------
# default values for the SoloManager
#
# Period to pause when fast and slow polling the executable
fastQueryIntervalSeconds = 2
slowQueryIntervalSeconds = 6
#
# How many times to poll when getting fail result before escalating the response level
fastFailCountLimit = 2
# The initial fastFailCountLimit is usually larger, to allow time for target system startup
startFastFailCountLimit = 15
#
# How many fast cycles should occur with fails before applying response for level 1, 2, etc.
# Note: set to zero or a negative integer to suppress the response action from occurring
#nResponse1
nResponseRestart = 1
# nResponse2
nResponseReboot = 3
#
# executable details
executableName = solo_predictor.exe
startExecutableCommandPre = cmd /c start
startExecutableCommandPost =
stopExecutableCommandPre = taskkill /F /IM \"
stopExecutableCommandPost = \"
#
# reboot
rebootCommandPre =
rebootCommandPost =
rebootCommand = shutdown /?
#
# executable socket details
serverIP = 127.0.0.1
serverPort = 2211
#
# log file capacity
logFileMsgCapacity = 6000
#
# Output directory. DO NOT add surrounding quotes
outdir = C:/junk/solomonitor
#
# must be true or false, case insensitive:
enableEmailing = false
#
# mailserver
mailServer = mail.eigenvector.com
mailServerPort = 587
mailUsername = USERNAME@eigenvector.com
mailPassword = PASSWORD
# Note: mail Addresses cannot include spaces and must be well-formed addresses
mailRecepientAddress = SOMEONE@gmail.com
# Use something which will be a valid e-mail address:
mailSenderAddress = firstname.lastname@example.org
#
//---------- end: Example SoloMonitor.ini configuration file -----------