Watchdogs

MailCleaner Support
Added over 1 year ago

MailCleaner Enterprise Edition has a number of Watchdogs which report common errors to MailCleaner Staff. To make these errors more visible to clients and to Community Edition users, so that they can see and correct the issues themselves, the watchdog errors have been added to the web interface.

Below is a list of the current Watchdogs along with recommendations for resolving them. Note that many of the watchdogs that are not specifically checking a daemon's status operate by monitoring the current day's logs, so corrections will not be noticed until the next day. The watchdog report is generated at most once every 15 minutes, so resolved errors will not disappear immediately. If you would like to refresh all of the watchdogs immediately, you can use:

/usr/mailcleaner/bin/watchdog/watchdogs.pl All ; /usr/mailcleaner/bin/watchdog/watchdogs_report.sh

detect_bad_DKIM.pl

This indicates an issue with one of the DKIM keys on your machine.

Note: This watchdog is brand new and an exception was not made for a blank version of the 'default.pkey' file. If you have an 'invalid' message for this key file, check to see if that file is blank before reading further:

cat /var/mailcleaner/spool/tmp/mailcleaner/dkim/default.pkey

If it is, you can ignore the warning for that file. This watchdog will be patched shortly to ignore a blank version of this file if you have not yet generated one.

One or both of the following issues can appear:

  • Short DKIM key length:

This indicates that the listed keys are shorter than 1024 bits, which is the recommended standard. Some services are beginning to reject, ignore, or otherwise penalize messages signed with an outdated (short) key. To resolve this issue for 'default.pkey' you need to generate a new key with:

Configuration->SMTP->DKIM->Generate new private key...

For any domain-specific key you need to generate a new one from:

Configuration->Domains->[select domain]->Outgoing Relay->Generate new private key...

  • Invalid DKIM key:

This error indicates that the private key file cannot actually be read/decoded by OpenSSL. The files are located in the directory:

/var/mailcleaner/spool/tmp/mailcleaner/dkim/

You can see what OpenSSL sees by running:

openssl rsa -in /var/mailcleaner/spool/tmp/mailcleaner/dkim/<domain.pkey> -noout -text

The Watchdog is specifically looking for the line containing:

Private-Key: (N bit)

where N is the length. If you see this error, it is likely that OpenSSL will output an error, having not read the key at all. Otherwise, it is possible that the output does not include this line, but this is not a case that is known to us. You will probably need to generate a new key as described in the other case.
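The two DKIM cases above, plus the blank-file exception, can be approximated by hand with the sketch below. It is not the watchdog's own code: the function names are invented for illustration, and it only assumes that the OpenSSL output contains a line of the form "Private-Key: (N bit", possibly with a prefix or extra text depending on the OpenSSL version.

```shell
#!/bin/sh
# Extract the key length from `openssl rsa -noout -text` output read on stdin.
# Prints nothing when no "Private-Key: (N bit" line is present.
dkim_key_bits() {
    sed -n 's/^.*Private-Key: (\([0-9][0-9]*\) bit.*$/\1/p' | head -n 1
}

# Classify a key file as blank, invalid, short, or ok (hypothetical labels).
# "blank" covers a zero-length or missing file; 1024 is the limit quoted above.
dkim_key_status() {
    key_file="$1"
    if [ ! -s "$key_file" ]; then
        echo "blank"
        return 0
    fi
    bits="$(openssl rsa -in "$key_file" -noout -text 2>/dev/null | dkim_key_bits)"
    if [ -z "$bits" ]; then
        echo "invalid"
    elif [ "$bits" -lt 1024 ]; then
        echo "short ($bits bit)"
    else
        echo "ok ($bits bit)"
    fi
}
```

For example, `dkim_key_status /var/mailcleaner/spool/tmp/mailcleaner/dkim/default.pkey` prints one of the four labels for the default key.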

detect_bad_git.sh

This indicates that the last time the automatic update script (/root/Updater4MC/updater4mc.sh) ran, it exited with an error showing that the Git repository couldn't be synced due to local corruption. You can try to diagnose the issue by moving to the MailCleaner repository and checking the status and/or attempting to pull changes:

cd /usr/mailcleaner
git status
git pull

detect_Community (Enterprise only)

This test simply reports whether the version of the host is seen to be Community. This watchdog is only downloaded when a machine is registered for Enterprise and will report if the machine that had been registered now appears to be using Community. Some events, such as running a 'reset' on the Git tree can cause a registered appliance to revert to Community.

If you believe the appliance should still be registered, log in to the web interface of the affected host and use the Configuration->Base System->Registration form to register the host again. If this does not resolve the issue, you can contact support.

If you intentionally unregistered the appliance, it is possible that this script failed to be removed. You can simply delete the script and its configuration file:

rm /usr/mailcleaner/bin/watchdog/MC_mod_detect_Community.sh /usr/mailcleaner/etc/watchdog/MC_mod_detect_Community.conf

detect_F2B

Checks whether the Fail2Ban server is running. This feature was in development for quite some time and this watchdog was used to alert us to any hosts using the beta feature. It has since been released. The watchdog can appear for two reasons: either you have an older version of the script which erroneously reports that Fail2Ban is not running, or you have an updated script and Fail2Ban really is not running.

Some admins will choose to disable Fail2Ban since they may have other firewall protections already in place. If this is the case, you can simply disable this script by adding 'disable' to the TAGS in /usr/mailcleaner/etc/watchdog/MC_mod_detect_F2B.conf :

TAGS=all dix disable
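If you would rather script that change than edit the file by hand, something along these lines should work. It is a sketch only: the function name is made up, and it assumes the config file contains a single line starting with "TAGS=" as shown above.

```shell
#!/bin/sh
# Append 'disable' to the TAGS line of a watchdog config file.
# Does nothing if 'disable' is already one of the tags.
add_disable_tag() {
    conf="$1"
    if grep -Eq '^TAGS=(.*[[:space:]])?disable([[:space:]]|$)' "$conf"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    # Rewrite through a temporary file, then copy back so the original
    # file keeps its owner and permissions.
    sed 's/^TAGS=\(.*\)$/TAGS=\1 disable/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Usage: `add_disable_tag /usr/mailcleaner/etc/watchdog/MC_mod_detect_F2B.conf`. Running it a second time leaves the file unchanged.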

detect_license_Kaspersky

Checks whether the Kaspersky license, if installed, has expired. Contact your sales representative if your license has expired. If you trialled Kaspersky or used it for a time but no longer wish to use it, you can uninstall the package with:

apt-get remove kaspersky-64-2.0

detect_msg_sniffer_db

Searches for MessageSniffer logs without any database information, indicating that it has not been successfully downloaded.

Contact support to let us know that you need the DB to be re-downloaded.

detect_PrefTDaemon

Checks for redundant PrefTDaemon processes.

Restart the Preferences daemon to kill any orphaned processes. This can be done from Monitoring->Status (you will need to expand the Status column with "Show more..."), or with the command:

/usr/mailcleaner/etc/init.d/preftdaemon restart

NOTE: An update to the init scripts should have removed the chance of seeing this error during normal operation. If you are regularly seeing this watchdog it is likely that your system has not fully updated.

detect_pyenv_install.sh

Detects the unsuccessful installation of the new Python libraries being used to develop new tools. These libraries are not yet used for any component of the system other than Fail2Ban, so it is not critical to fix at this time. We will monitor the reports that we receive as it nears broader usage.

If you would like to try installing these libraries manually to remove the error, you can do so with:

pip install mailcleaner-library -U --trusted-host repository.mailcleaner.net --index https://repository.mailcleaner.net/python/ --extra-index https://pypi.org/simple/

detect_Spam_Handler

Checks for redundant Spam_Handler processes.

Restart the SpamHandler daemon to kill any orphaned processes. This can be done from Monitoring->Status (SpamHandler is identified as "Filtering Engine"), or with the command:

/usr/mailcleaner/etc/init.d/spamhandler restart

NOTE: An update to the init scripts should have removed the chance of seeing this error during normal operation. If you are regularly seeing this watchdog it is likely that your system has not fully updated.

detect_StatsDaemon

Checks for redundant StatsDaemon processes.

Restart the StatsDaemon to kill any orphaned processes. This can be done from Monitoring->Status, or with the command:

/usr/mailcleaner/etc/init.d/statsdaemon restart

NOTE: An update to the init scripts should have removed the chance of seeing this error during normal operation. If you are regularly seeing this watchdog it is likely that your system has not fully updated.

detect_Summaries

Reports that the summaries could not be sent on the previous day because the connection to the database failed. This normally indicates that the Summaries have conflicted with a MailCleaner restart. This will happen if the summaries are sent too soon after the nightly updates at 10:30pm, or if they are still running when the updates start. Consider changing the daily task timing from Configuration->General Settings->Periodic Tasks.

detect_unsync_git

Less severe than detect_bad_git. This simply reports that there is something in your Git tree that has diverged from origin/master. This will happen if you or a MailCleaner staff member is testing experimental commits or making other changes that have been committed but not merged upstream. You will have to navigate to /usr/mailcleaner and reset the tree to match origin/master for this to disappear.
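Before resetting, it can help to see what has actually diverged. The sketch below uses standard Git commands; the function names are invented for illustration, and it assumes the checkout tracks origin/master as described above.

```shell
#!/bin/sh
# Count commits on the current branch of repository $1 that are not on origin/master.
local_commits_ahead() {
    git -C "$1" rev-list --count origin/master..HEAD
}

# List files in repository $1 whose tracked content differs from origin/master.
files_differing_from_upstream() {
    git -C "$1" diff --name-only origin/master
}

# Typical manual sequence (not run automatically here):
#   git -C /usr/mailcleaner fetch origin
#   local_commits_ahead /usr/mailcleaner
#   files_differing_from_upstream /usr/mailcleaner
#   git -C /usr/mailcleaner reset --hard origin/master
```

Note that `git reset --hard origin/master` discards local commits and any uncommitted edits to tracked files, so save anything you want to keep first.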

disk_full

Reports if the / or /var partitions are over 85% used.

You may choose to clear space, reduce the retention time for quarantined items, or increase the size of your disk.

disk_full_inodes

Same as disk_full, except that it is based on the inode count (the number of files the filesystem can hold) instead of the actual bytes used.
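To check both disk conditions by hand, the output of df can be filtered as in the sketch below. The function name is made up, the 85% limit for inodes is assumed to match the one stated for disk_full, and `df -i` is a common extension rather than a guaranteed option on every system.

```shell
#!/bin/sh
# Read `df -P` (or `df -Pi`) output on stdin and print each mount point
# whose usage percentage (5th column) is above the limit given as $1.
mounts_over_limit() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 > limit + 0) print $6 " " pct "%"
        }
    '
}

# Typical use:
#   df -P  / /var | mounts_over_limit 85    # bytes used
#   df -Pi / /var | mounts_over_limit 85    # inodes used
```

No output means neither partition is over the limit.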

exim_4.XX

Checks to make sure that you are using the latest supported version of Exim.

Attempt to upgrade with:

apt-get update && apt-get upgrade mc-exim

git_conflicts.sh

The last automatic update was unable to pull changes to the git tree because there are conflicting changes that could not be stashed and restored. This is probably because you've modified files that have since been changed upstream. Manually resolving the merge conflict is generally the only solution. You can find which files cause the issue from the last update log (/root/Updater4MC/updater_.log) which should contain:

error: The following untracked working tree files would be overwritten by merge:

followed by the list of files. Once these files are known, you can use the 'git diff' command to find out what changes you made:

cd /usr/mailcleaner
git diff path/to/file

You can then discard your local changes to the file(s):

git checkout -- path/to/file

then try manually running the updater again:

/root/Updater4MC/updater4mc.sh

and reapply your modifications to the updated file, if possible/necessary.
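When the list of conflicting files is long, it can be pulled out of the log with a small filter such as the one below. This is a sketch: the function name is invented, the log path is passed as an argument, and it assumes Git's usual layout in which the file names follow the error line as indented lines.

```shell
#!/bin/sh
# Print the file names listed after a "would be overwritten by merge:" line
# in the log file $1. The list ends at the first line that is not indented.
overwritten_files() {
    awk '
        /would be overwritten by merge:/ { in_list = 1; next }
        in_list && /^[ \t]/ { sub(/^[ \t]+/, ""); print; next }
        { in_list = 0 }
    ' "$1"
}
```

Each printed path can then be passed to `git diff` from /usr/mailcleaner. Redirecting that diff to a file before discarding the change keeps a copy of your modifications to reapply later.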

Internal keys

Reports missing or excessive internal keys in /root/.ssh/authorized_keys

You can generate, propagate and install these with: /usr/mailcleaner/bin/internal_access -gpi

modified_exim1

Checks for modifications to: /usr/mailcleaner/etc/exim/exim_stage1.conf_template

Historically many clients have customized this template in order to change Exim settings not available in the UI. Unfortunately, this file is tracked by Git, so changes would be overwritten any time that an upstream change to the file was published. This alert allows us to know who will be impacted by an upstream change.

Currently, this customization is rarely necessary as there are dedicated files to hold customization to all of the most frequently modified settings in the /usr/mailcleaner/etc/exim/stage1 directory. These will not be overwritten since the upstream copy should never change.

nb_mails_in_queue

Reports when the number of messages in each of the Exim queues is greater than expected during normal operation. Reasons for queuing vary. Open the queued item list from Monitoring->Status by clicking on the queue count. Once the underlying issue is resolved, you can try to flush the queue directly:

https://support.mailcleaner.net/boards/3/topics/49

If you have several machines and only one with a backlog, you may wish to temporarily stop the Inbound MTA on that machine until it clears.

slave_status

Checks whether the Slave DB is in sync. It can fall out of sync for a number of reasons during normal operations. The database synchronization has been updated to automatically restore itself during the main Cron tasks. If you see this persist, you can manually run the resync command:

/usr/mailcleaner/bin/resync_db.sh

It is likely that you will be told that the DB is already synced, indicating that this was one of the transient issues mentioned.

TesseractOcr (Not yet implemented)

Checks to ensure that TesseractOcr dependencies have been installed. These will be necessary when the TesseractOcr module is added to SpamAssassin. The dependencies require about 250MB, so the upgrade is scheduled to avoid installing them if there won't be room to spare.

Check to see the available disk space:

df -h /

If you have at least 400MB, you can install the dependencies manually:

apt-get install tesseract-ocr libopencv-dev libswitch-perl
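The free-space check can also be made part of the install step, as in the sketch below. The function names are invented for illustration; the 400MB figure is the one given above.

```shell
#!/bin/sh
# Print the free space, in kilobytes, on the filesystem that holds path $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when the filesystem holding $1 has at least $2 megabytes free.
has_free_mb() {
    [ "$(free_kb "$1")" -ge "$(( $2 * 1024 ))" ]
}

# Typical use:
#   if has_free_mb / 400; then
#       apt-get install tesseract-ocr libopencv-dev libswitch-perl
#   else
#       echo "Less than 400MB free on /, skipping install"
#   fi
```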

Updater_Status

Reports trouble with the last automatic update. It will report:

  • if the update exited unsuccessfully
  • if the main /etc/mailcleaner.conf file is invalid and thus cannot provide the necessary details to do an update
  • if it started running over 1 hour ago and never finished.

You can try manually running the updater (/root/Updater4MC/updater4mc.sh) or inspect the last log (/root/Updater4MC/updater_.log).

May appear alongside other watchdogs with a better description of the issue.

Updater_TLS

Checks for TLS issues during the update process. This is likely to be an issue on our end. Contact us if you see this error.

Others

Watchdogs are added and removed regularly. Check to make sure you are on the latest version because you could be seeing a watchdog that is no longer relevant. If you do believe that you are on the latest version and see an error that isn't listed, report it to us as we may be required to update the documentation. Thanks!

Custom Watchdogs

If you are considering writing your own custom module, please consider submitting it as a Pull Request to our GitHub page. The documentation that follows will instruct you to use the 'CUSTOM_mod_' naming convention. These will be included in the WebUI but will not be reported to us, even if you use Enterprise Edition, so it is a way to have modules that only exist and report locally. This will also ensure that you don't risk having a naming conflict with watchdogs that exist in the Git tree or that would be fetched from our Enterprise servers. If you are satisfied with the results of the custom module and believe that it will be useful for others, please rename it to a 'MC_mod_' and submit a Pull Request with that name.

You can create your own watchdogs that will be displayed in the WebUI. To do this, copy either the 'MC_mod_SKELETON.pl' or the 'MC_mod_SKELETON.sh' script to a new file whose name starts with 'CUSTOM_mod_', then change this file to perform whatever check you need. The output file should contain:

A text description of the error (optional)
RC : 0
EXEC : 0

where RC is the return code (0 meaning no error, any other value meaning there was an error). If you are going to submit a module to GitHub, please ensure that there is a comment indicating the significance of the return code, or that the significance is obvious from context. This will allow us to better monitor user reports. EXEC should simply be the execution time in seconds as presented in the example code. If one is to be provided, the text description should be printed first on one line. The order of the other two does not matter.
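As an illustration of that layout only, the sketch below prints the three lines for a made-up check. The function names and the directory check are hypothetical. Where the output has to be written is not covered here, so start from the SKELETON script for that part.

```shell
#!/bin/sh
# Print a watchdog result in the layout described above.
#   $1 = return code (0 means no error)
#   $2 = execution time in seconds
#   $3 = optional one-line description, printed first when present
format_watchdog_result() {
    rc="$1"
    exec_seconds="$2"
    description="${3:-}"
    if [ -n "$description" ]; then
        printf '%s\n' "$description"
    fi
    printf 'RC : %s\n' "$rc"
    printf 'EXEC : %s\n' "$exec_seconds"
}

# Hypothetical check: report an error when directory $1 is missing.
check_directory_exists() {
    start="$(date +%s)"
    if [ -d "$1" ]; then
        rc=0
        message=""
    else
        rc=1
        message="Directory $1 is missing"
    fi
    end="$(date +%s)"
    format_watchdog_result "$rc" "$((end - start))" "$message"
}
```

Here a return code of 1 means "directory missing", which is the kind of significance a comment in a submitted module should spell out.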

Once the script is complete, create an accompanying config file in /usr/mailcleaner/etc/watchdog/ with the same name, but with the extension replaced by '.conf'. The contents of this file should be:

TAGS=oneday
EXEC_MODE=Parralel

The TAGS are the list of groups that this watchdog will be included in. The built-in tags are:

dix - Run every 10 minutes
oneday - Run 3 times a day (used to be once)
all - (Redundant) to be included with manual invocations of 'watchdogs.pl All'

You may use your own TAG value for custom modules if you do want to run on a different schedule. These existing TAGS are run by the root user's Crontab. Edit with:

crontab -e

The EXEC_MODE can be:

Parralel - All run at once and before sequential processes.
Sequence - Run one after another in alphabetical order (useful if an existing watchdog might cause an error if it has yet to complete).

The config file can also have a TIMEOUT value (in seconds) if you anticipate that it could stall.
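Putting the naming rule and the settings together, a config file for a custom module could be generated as sketched below. The function names and the module name in the usage line are hypothetical, and writing the timeout as a "TIMEOUT=seconds" line is an assumption based on the format of the other two settings.

```shell
#!/bin/sh
# Derive the config file name from a script name:
# same base name, with the extension replaced by .conf.
watchdog_conf_name() {
    script_name="$(basename "$1")"
    printf '%s.conf\n' "${script_name%.*}"
}

# Print the config body.
#   $1 = TAGS value, $2 = EXEC_MODE value, $3 = optional TIMEOUT in seconds
watchdog_conf_body() {
    printf 'TAGS=%s\n' "$1"
    printf 'EXEC_MODE=%s\n' "$2"
    if [ -n "${3:-}" ]; then
        printf 'TIMEOUT=%s\n' "$3"
    fi
}

# Typical use:
#   name="$(watchdog_conf_name CUSTOM_mod_example.sh)"
#   watchdog_conf_body oneday Sequence 30 > "/usr/mailcleaner/etc/watchdog/$name"
```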