SoloManager: Difference between revisions

From Eigenvector Research Documentation Wiki
Latest revision as of 12:23, 21 March 2024

This page describes the SoloManager program and its usage

Introduction

The purpose of SoloManager is to start a target program locally, for example Solo_Predictor, then continuously monitor the target program's availability and restart it if necessary. The target program responds to TCP/IP queries on a specified port if it is operating normally. If the target program becomes unresponsive for a specified period of time, SoloManager can terminate and restart it, and/or optionally reboot the host computer entirely. Many aspects of the SoloManager program can be configured by specifying values in the SoloManager.ini text file.


Note: For historical reasons, the configuration files are contained in a folder called 'solomonitor'. We apologize for any confusion this may cause.

Description of components

SoloManager.jar file

This contains the SoloManager program. It also contains all necessary Java library files.

SoloManager configuration file

The SoloManager.ini file contains configuration details specifying how the SoloManager operates. See the example configuration file listed below.

Target program

This is the program we wish to monitor and keep available at all times. It will usually be Solo_Predictor. It must expose a TCP port and respond to socket queries on that port.

Wrapper service (optional)

This is an optional component which will start SoloManager whenever the host computer is booted up. It is described in the sections below about starting SoloManager as a Service or Daemon.

Relationships and processing sequence

These components are related as shown in the SoloManager flowchart.

Flowchart.png

Typical process flow

The SoloManager is typically started automatically when the host computer is booted up, usually via the Service and Daemon Wrapper.

Once started, SoloManager begins by reading in values for all configurable parameters from the SoloManager.ini file. The user can edit this file to specify their preferred settings, but it must be located in the same directory as the SoloManager jar file. This is where the user specifies, for example, the name of the target executable which SoloManager will start and monitor.

SoloManager then begins its unending loop where it checks the status of the target program. SoloManager creates a socket connection to the target program and sends a query. If the target program is alive it sends a response which must match what SoloManager is expecting.

SoloManager checks that the target program is alive by:

  1. Opening a socket on the target program's port.
  2. Sending the parameter "msgToSocket" to the socket and verifying that the first line returned from the socket equals the parameter "expectedResponse".
If the response is not valid, SoloManager repeats this check up to "fastFailCountLimit" times with a pause of "fastQueryIntervalSeconds" seconds.
If the response is valid, the check is complete and its result is success.
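
The check described above can be sketched in Python as follows. This is only an illustration, not SoloManager's own code: the function names, the newline terminator on the query, the UTF-8 encoding, the 5 second socket timeout, and the choice to pause only between attempts are all assumptions made for the example.

```python
import socket
import time


def check_once(host, port, msg_to_socket, expected_response, timeout=5.0):
    """Return True if the first line of the reply equals expected_response."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Assumption: the query is sent as one newline-terminated line
            sock.sendall((msg_to_socket + "\n").encode("utf-8"))
            with sock.makefile("rb") as reader:
                raw = reader.readline()
        first_line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        return first_line == expected_response
    except OSError:
        # Connection refused, timeout or reset all count as a failed check
        return False


def check_target(host, port, msg_to_socket, expected_response,
                 fast_fail_count_limit, fast_query_interval_seconds):
    """Repeat the check up to fast_fail_count_limit times, pausing between tries."""
    for attempt in range(fast_fail_count_limit):
        if check_once(host, port, msg_to_socket, expected_response):
            return True
        if attempt < fast_fail_count_limit - 1:
            time.sleep(fast_query_interval_seconds)
    return False
```

A call such as check_target("127.0.0.1", 2211, query, reply, 2, 2) would use the serverIP, serverPort, fastFailCountLimit and fastQueryIntervalSeconds values from the example configuration file, with query and reply standing in for your own "msgToSocket" and "expectedResponse" values.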

If the target check was successful then the failure counter is reset to zero and the loop repeats after a specified pause period of "slowQueryIntervalSeconds" seconds. If the target check was not successful then the failure counter is incremented. The loop continues until this counter reaches the specified "nResponseRestart" value, whereupon SoloManager issues a command to restart the target program and continues with the loop. If the target program restarts then the next check will be successful, so the loop continues normally.

If the restart command does not succeed in restarting the target program then the target checks will continue failing and the failure counter incrementing until it eventually attains the specified "nResponseReboot" counter value. At this point SoloManager issues a command to reboot the host computer and the entire process begins again.
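
The escalation logic in the last two paragraphs can be illustrated with a small Python sketch. It is not taken from SoloManager itself: the function names are hypothetical, and the sketch assumes that a restart is attempted again on every failed cycle at or above "nResponseRestart" until the reboot threshold is reached, which the text above does not state explicitly.

```python
def next_action(failure_count, check_ok, n_response_restart, n_response_reboot):
    """Return (new_failure_count, action) after one target check.

    action is "none", "restart" or "reboot".
    """
    if check_ok:
        return 0, "none"  # a successful check resets the failure counter
    failure_count += 1
    # Zero or negative thresholds suppress the corresponding action
    if n_response_reboot > 0 and failure_count >= n_response_reboot:
        return failure_count, "reboot"
    if n_response_restart > 0 and failure_count >= n_response_restart:
        return failure_count, "restart"
    return failure_count, "none"


def simulate(check_results, n_response_restart, n_response_reboot):
    """Replay a sequence of check results and list the action taken after each."""
    count, actions = 0, []
    for ok in check_results:
        count, action = next_action(count, ok, n_response_restart, n_response_reboot)
        actions.append(action)
    return actions


# One success, three failures, then a success again
print(simulate([True, False, False, False, True], 2, 3))
# ['none', 'none', 'restart', 'reboot', 'none']
```

With nResponseReboot = -1, as in the example configuration file, the "reboot" branch is never taken in this sketch.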

During these operations SoloManager writes status information to a log file and optionally can send e-mail to report events. The log file will be located in the directory specified by "outdir". Its size is limited to the last "logFileMsgCapacity" log messages. E-mailed alerts are optional and are enabled by setting "enableEmailing" = true. In this case e-mail messages will be sent to the specified user whenever:

  1. The SoloManager program starts.
  2. SoloManager is about to issue a restart command for the target program.
  3. SoloManager is about to issue a reboot command to the host computer's operating system.
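
As a rough illustration of how the mail settings in SoloManager.ini fit together, the Python sketch below builds and sends one alert. It is not SoloManager's implementation: the function names, the subject and body text, and the use of STARTTLS followed by a login (suggested by port 587 in the example configuration) are assumptions. The settings argument is a plain dictionary of strings keyed by the names used in the configuration file.

```python
import smtplib
from email.message import EmailMessage


def build_alert(settings, event):
    """Build an alert message from SoloManager.ini-style mail settings."""
    msg = EmailMessage()
    msg["From"] = settings["mailSenderAddress"]
    msg["To"] = settings["mailRecepientAddress"]  # key spelled as in the ini file
    msg["Subject"] = "SoloManager: " + event      # hypothetical subject format
    msg.set_content("SoloManager event: " + event)
    return msg


def send_alert(settings, event):
    """Send the alert if enableEmailing is true; return whether mail was sent."""
    if settings.get("enableEmailing", "false").strip().lower() != "true":
        return False
    msg = build_alert(settings, event)
    # Assumption: the mail server expects STARTTLS and then a login
    with smtplib.SMTP(settings["mailServer"], int(settings["mailServerPort"])) as smtp:
        smtp.starttls()
        smtp.login(settings["mailUsername"], settings["mailPassword"])
        smtp.send_message(msg)
    return True
```

Sending a test message this way, outside SoloManager, can help confirm that the mail server, port, and credentials are valid before "enableEmailing" is switched on.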

Dependencies

SoloManager requires the following:

  1. Java version 1.8 or later must be available on the host computer. Any implementation of Java 1.8 or later can be used; one can be downloaded, for example, from https://jdk.java.net/nn/, where "nn" is a version number. Expand the downloaded .zip file and copy the contained folder, for example "jdk-21.0.2", to wherever you want to keep it, such as "C:\tmp\jdk-21.0.2". Then add the location of this folder's "bin" folder to your system PATH environment variable: "C:\tmp\jdk-21.0.2\bin". Finally, confirm that Java is available by opening a new cmd terminal and running "java -version", which should return the version number.
  2. It must be able to write to a log file on the filesystem.
  3. It must be able to issue a system reboot command if SoloManager.ini allows rebooting (command can be defined within the configuration file).
  4. The operating system may be any of: Linux, Windows, or macOS.
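
The Java check in item 1 can also be scripted. The following Python sketch is one possible way to do it; the function name is hypothetical, and it only reports what "java -version" prints without parsing the version number.

```python
import shutil
import subprocess


def java_version_banner(java_command="java"):
    """Return the text printed by 'java -version', or None if java is not on PATH."""
    java_path = shutil.which(java_command)
    if java_path is None:
        return None
    result = subprocess.run([java_path, "-version"],
                            capture_output=True, text=True, check=False)
    # java -version normally writes its banner to stderr, so check both streams
    return (result.stderr or result.stdout).strip()


banner = java_version_banner()
print(banner if banner is not None else "java was not found on the PATH")
```

If the function returns None, the "bin" folder is probably not on the PATH seen by the process that ran the script.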

Configuration File

The SoloManager.ini configuration file will almost always need to be modified for the individual application and installation settings. An example file is included below, but a few key settings to modify include:

  • executableName Name of the program to run (usually either Solo.exe or Solo_Predictor.exe.)
  • startExecutableCommandPre Full path to the program listed as executableName (unless the program's folder has been added to the system path by the installer.)
  • startExecutableCommandPost Any command parameters which should be appended to the executable when it is invoked.
  • outdir Specifies the folder which should contain the log files. By default these will be written to the same folder as the configuration file, but another folder may be preferable if the user does not have read/write permissions to that folder. PLEASE ensure that every folder included in the outdir path exists.
  • maxTargetRunDurationHours The target will be stopped and restarted every maxTargetRunDurationHours if this is a positive number. It has no effect if it is not a positive number.
  • nResponseRestart and nResponseReboot indicate how many target check failures must occur before the application is restarted and/or the system is rebooted (respectively). If the Target Application fails after starting successfully, it will be detected by the next normal check, which occurs every slowQueryIntervalSeconds seconds. When a target check fails, a restart is invoked after (fastQueryIntervalSeconds+1)*fastFailCountLimit*nResponseRestart seconds. If the restart attempts fail then a system reboot is invoked after (fastQueryIntervalSeconds+1)*fastFailCountLimit*nResponseReboot seconds. Thus the worst case total elapsed time, in seconds, from the target failing until an action occurs can be roughly calculated by:
ResponseTime = slowQueryIntervalSeconds + (fastQueryIntervalSeconds+1)*fastFailCountLimit*nResponse____
where nResponse____ stands for nResponseRestart or nResponseReboot, depending on the action.


The settings in the example configuration file represent likely minimum settings. If longer delays are acceptable before a response, increase the fastQueryIntervalSeconds and/or the nResponse____ settings.
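
As a worked example of the ResponseTime formula, the short Python sketch below plugs in the values from the example configuration file on this page (slowQueryIntervalSeconds = 6, fastQueryIntervalSeconds = 2, fastFailCountLimit = 2, nResponseRestart = 1). The function name is hypothetical, and the result is the same rough estimate the formula gives, not a measured figure.

```python
def worst_case_response_seconds(slow_query_interval_seconds,
                                fast_query_interval_seconds,
                                fast_fail_count_limit,
                                n_response):
    """Rough worst-case delay from target failure to the restart or reboot action."""
    return (slow_query_interval_seconds
            + (fast_query_interval_seconds + 1) * fast_fail_count_limit * n_response)


# Restart with the example settings: 6 + (2 + 1) * 2 * 1
restart_delay = worst_case_response_seconds(6, 2, 2, 1)
print(restart_delay)  # 12
```

With these values a failed target would be restarted roughly 12 seconds after it stops responding, at worst.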

--------------------------------------------------------------------------
------------ start: Example SoloManager.ini configuration file -----------
# default values for the SoloManager
#
# Period to pause when fast and slow polling the executable
fastQueryIntervalSeconds = 2
slowQueryIntervalSeconds = 6
#
# How many times to poll when getting fail result before escalating the response level
fastFailCountLimit = 2
# The initial fastFailCountLimit is usually larger, to allow time for target system startup
startFastFailCountLimit = 15
#
# How many fast cycles should occur with fails before applying response for level 1, 2, etc.
# Note: set to zero or a negative integer to suppress the response action from occurring
#nResponse1
nResponseRestart = 1
# nResponse2. A value of -1 means the system will not be rebooted.
nResponseReboot = -1
#
# maxTargetRunDurationHours. Non-positive value disables this feature.
# Positive value must be greater than 0.05 (hours)
maxTargetRunDurationHours = -1
#
# executable details
executableName = solo_predictor.exe
startExecutableCommandPre = C:\\Program Files\\EVRI\\Solo_Predictor_43\\application\\
startExecutableCommandPost = -loadsettings \\\"C:\\Program Files\\EVRI\\Solo_Predictor_43\\application\\default.xml\\\"
stopExecutableCommandPre = taskkill /F /IM 
stopExecutableCommandPost = 
#
# reboot
rebootCommandPre =
rebootCommandPost =
rebootCommand = shutdown
#
# executable socket details
serverIP = 127.0.0.1
serverPort = 2211
#
# log file capacity
logFileMsgCapacity = 6000
#
# Output directory. DO NOT add surrounding quotes. Verify any folders used actually exist.
outdir = C:\\temp
# Linux: use path like:
# outdir = \\tmp
#
# must be true or false, case insensitive:
enableEmailing        = false
#
# mailserver

mailServer            = mail.eigenvector.com

mailServerPort        = 587
mailUsername          = USERNAME@eigenvector.com
mailPassword          = PASSWORD
# Note: mail Addresses cannot include spaces and must be well-formed addresses
mailRecepientAddress  = SOMEONE@gmail.com
# Use something which will be a valid e-mail address:
mailSenderAddress     = monitor@solopredictor.com
#
------------- end: Example SoloManager.ini configuration file ------------
--------------------------------------------------------------------------
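
Before starting SoloManager it can be useful to sanity-check an edited configuration file. The Python sketch below shows one way this might be done. It is not part of SoloManager: the function names are hypothetical, the parser only understands simple "key = value" lines and "#" comments, and it assumes that a doubled backslash in the file stands for a single backslash.

```python
import os


def read_ini(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        stripped = raw_line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = raw_line.partition("=")
        # Assumption: a doubled backslash in the file means one backslash.
        # Trailing spaces are kept because some command prefixes end with one.
        settings[key.strip()] = value.lstrip().replace("\\\\", "\\")
    return settings


def check_settings(settings):
    """Return a list of human-readable warnings about risky settings."""
    warnings = []
    outdir = settings.get("outdir", "").strip()
    if outdir and not os.path.isdir(outdir):
        warnings.append("outdir does not exist: " + outdir)
    if int(settings.get("nResponseReboot", "-1")) > 0:
        warnings.append("nResponseReboot is positive: host reboot is enabled")
    return warnings
```

For example, check_settings(read_ini(open("SoloManager.ini").read())) returns an empty list when the outdir folder exists and rebooting is disabled.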

Starting SoloManager Manually

SoloManager is described in the context of Solo_Predictor since this is where it is most commonly used. The solomanager.jar and SoloManager.ini files are supplied in the "solomonitor" folder located in the Solo_Predictor install folder which contains Solo_Predictor.exe, for example ...\Solo_Predictor_43\application\solomonitor\.

SoloManager can be started by double-clicking on the solomanager.jar file or at the command line by running "java -jar solomanager.jar" (after changing directory to the solomanager.jar file's folder). SoloManager expects to find the SoloManager.ini configuration file in the folder from which it is launched.

Starting SoloManager Automatically

SoloManager is most useful when run automatically by an operating system. This will start the Target Application in the background. The following describes how to install SoloManager as a service (Windows) or daemon (Linux).

Running SoloManager as a Windows Service

The service64 folder in the Solo_Predictor_xx\application\solomonitor\ folder contains the tools necessary to run SoloManager as a Windows service. This will automatically start the application without a user logging in. Follow these instructions to install SoloManager as a Windows service:

  1. Configure solomanager.ini as needed for the intended behavior. Please ensure that:
    • The startExecutableCommandPre and startExecutableCommandPost fields have the correct path for the Solo_Predictor target.
    • The serverIP and serverPort settings match what Solo_Predictor is using (as specified in the Solo_Predictor settings file, default.xml).
  2. Copy solomanager.ini into the "service64" folder. This copy of solomanager.ini will be used by the service.
  3. Run the Install_Service.bat file in the "service64" folder to install the service (this batch file must be run by a user with administrative privileges).
  4. Run the Start_Service.bat file in the "service64" folder to start the service (this batch file must be run by a user with administrative privileges).

Note: different versions of Windows have differing levels of user access control. To run an application in an administrative mode, you will have to right click on the application icon and select "Run as an administrator".

Alternatively, open a command window as an Administrator. To do so, click on Start and search for "cmd". Right-click on "cmd.exe" and select the option "Run as Administrator". From this command window, you will be able to run all of the necessary batch files at administrative level.

Troubleshooting Windows Service Problems

  • Several of the default configuration files expect the folder C:/temp to exist for logging purposes. This is used by default in SoloManager.ini for the "outdir" variable, and in service64\conf\wrapper.conf for the "wrapper.logfile" variable. If the folders for this path do not exist you may receive "Null Pointer Exception" errors from the SoloManager application. Either modify the paths used in the service.conf and/or solomanager.ini files or create the folder as needed.
  • An error in the log saying that Java could not be found usually means that the service was unable to locate java in the standard Solo_Predictor folder. If encountered, edit the service/conf/service.conf file and locate the "wrapper.java.command" property. The usual value for this property is:
  C:/Program Files/EVRI/Solo_Predictor/application/sys/java/jre/win64/jre/bin/java
which is the default Solo_Predictor sub-folder in which the 64-bit version of Java is located. If Solo_Predictor is installed in a location other than the default folder, or you are using the 32-bit version of Solo_Predictor, change this value to reflect the correct location. For 32-bit Solo_Predictor, replace "win64" with "win32".
An alternative solution is to run the service with the credentials of a specific user who has a full copy of Java installed. To do this, go to the Windows "Services" control panel, locate the EVRI SoloManager service, double-click the service and change the "Log On" properties to the specified user.
  • If you have problems, try running the test script:
 Test_Service
to see if the server will start when run manually. Errors from this script can be used to adjust the service.conf file.
  • To uninstall the service, run the Uninstall_Service.bat file (as an administrator.)

Running SoloManager as a Unix/Linux Daemon

The daemon_linux folder in the SoloManager main folder contains the tools necessary to run SoloManager as a Linux daemon. This will automatically start the application without a user logging in. Follow these instructions to install SoloManager as a Linux Daemon:

  1. Copy the application files onto the computer on which the application is to be run.
  2. Configure solomanager.ini as needed for the intended behavior.
  3. Copy solomanager.ini into the daemon_linux folder. This copy of solomanager.ini will be used by the daemon.
  4. Run the Install_Daemon script to install the daemon (this script must be run by a user with root privileges).
./Install_Daemon

NOTE: In order to execute this script and have the daemon operate correctly, you may have to manually set the "execute" bit on all files in the top-level daemon_linux folder to "on" using the chmod command inside the daemon_linux folder:

chmod 755 *

Errors and status messages will be reported to the log files stored in the daemon_linux/logs folder. To move logs to a different location, edit the daemon_linux/conf/wrapper.conf file. You can also modify the logging behavior in this file (maximum length, number of log backups, etc.)

To uninstall the daemon, run the Uninstall_Daemon script (as root.)

./Uninstall_Daemon

If you have problems, try running the test script:

./Test_Daemon

to see if the server will start when run manually.

Additional SoloManager Troubleshooting Steps

In case of problems it is helpful to start SoloManager manually, to verify that it can start Solo_Predictor. Check that:

  • 1. Your Solo_Predictor's solomonitor\SoloManager.ini has the correct settings:
    • A) "startExecutableCommandPre" and "executableName" identify your Solo_Predictor executable. See the steps and example SoloManager.ini file above.
    • B) "outdir" identifies an existing folder. The SoloManager log file, SoloManagerLog.txt, will be created in this folder and is very useful for identifying the cause of any problems.
    • C) "serverIP" and "serverPort" are correct for your Solo_Predictor (check its default.xml file).
    • D) "nResponseReboot" is left at -1 to prevent SoloManager from rebooting your computer until you are ready to give it that ability.
  • 2. Java is identified in your service/conf/service.conf file, as described in the bullet point about Java under "Troubleshooting Windows Service Problems" above.
  • 3. Now try starting SoloManager manually by double-clicking on the SoloManager.jar file. This should start Solo_Predictor for you and keep it running, so you should see the Solo_Predictor start-up splash screen appear and Solo_Predictor run. If it does not start successfully, examine the contents of the "SoloManagerLog.txt" file in the folder you specified for "outdir".