Ajitabh Pandey.Info - Monitoring Lotus Notes/Domino Servers

Very recently I was asked to set up Nagios to monitor Lotus Notes/Domino servers. There were more than 500 servers across the globe. It was an all-Windows shop, and the existing monitoring was done using GSX, HP Systems Insight Manager and IBM Director. The client wanted a single monitoring interface to look at, and after an initial discussion they decided to go ahead with Nagios.

This document looks at monitoring Lotus Notes/Domino servers using SNMP through Nagios. I have provided some of the required OIDs and their initial warning and critical threshold values in tabular format. There are many more interesting OIDs listed in the domino.mib file. I have also attached the Nagios command definition file and service definition files at the end of the document. Certain checks require plugins, which can be downloaded from http://www.barbich.net/websvn/wsvn/nagios/nagios/plugins/check_lotus_state.pl.

Note – I recently found that the required plugins are not available on the original site anymore, so I have made my copy available with this document. You can download the scripts from the link at the bottom of the document.

To start with, I asked the Windows administrators to install the Lotus/Domino SNMP Agent on all servers. After that I got hold of a copy of the domino.mib file, which is located in C:\system32.

Next I listed all the interesting parameters from the domino.mib file and started querying a set of test servers to find out whether each one returned a value. The following is the OID list and what each OID means. Most of these checks are only valid on the active node, which is important to know if the Domino servers are in an HA cluster (active-standby pair). If there is only one Domino server, then these checks will apply to it.
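
The querying step described above can be sketched in Python as follows. This is only an illustration, not the procedure I actually used: it assumes the Net-SNMP `snmpget` tool is installed on the Nagios host, that the agents answer SNMP v2c, and that the community string is `public`. The function names are made up for this example.

```python
import subprocess

# Numeric form of iso.org.dod.internet.private.enterprises
ENTERPRISES_PREFIX = "1.3.6.1.4.1"


def to_numeric_oid(oid):
    """Expand the 'enterprises.' shorthand used in the tables to a numeric OID."""
    if oid.startswith("enterprises."):
        return ENTERPRISES_PREFIX + oid[len("enterprises"):]
    return oid


def snmpget_command(host, oid, community="public"):
    """Build the snmpget argument list (SNMP v2c and community are assumptions)."""
    # -Oqv asks Net-SNMP to print only the value, without the OID or type.
    return ["snmpget", "-v", "2c", "-c", community, "-Oqv",
            host, to_numeric_oid(oid)]


def returns_value(host, oid, community="public", timeout=10):
    """Return True if the agent on 'host' answers with a value for 'oid'."""
    try:
        result = subprocess.run(
            snmpget_command(host, oid, community),
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # snmpget is missing, or the host did not answer in time.
        return False
    output = result.stdout.strip()
    # With v2c, an unknown OID is reported as "No Such Object ..." or
    # "No Such Instance ..." rather than as a command failure.
    return (result.returncode == 0 and output != ""
            and not output.startswith("No Such"))
```

For example, `returns_value("domino-test-01", "enterprises.334.72.1.1.4.1.0")` would report whether the dead-mail OID answers on that (hypothetical) test server.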

Monitoring Checks on Active Node
Nagios Service Check	OID	Description	Thresholds (w = warning, c = critical)
dead-mail	enterprises.334.72.1.1.4.1.0	Number of dead (undeliverable) mail messages	w 80, c 100
routing-failures	enterprises.334.72.1.1.4.3.0	Total number of routing failures since the server started	w 100, c 150
pending-routing	enterprises.334.72.1.1.4.6.0	Number of mail messages waiting to be routed	w 10, c 20
pending-local	enterprises.334.72.1.1.4.7.0	Number of pending mail messages awaiting local delivery	w 10, c 20
average-hops	enterprises.334.72.1.1.4.10.0	Average number of server hops for mail delivery	w 10, c 15
max-mail-delivery-time	enterprises.334.72.1.1.4.12.0	Maximum time for mail delivery in seconds	w 300, c 600
router-unable-to-transfer	enterprises.334.72.1.1.4.19.0	Number of mail messages the router was unable to transfer	w 80, c 100
mail-held-in-queue	enterprises.334.72.1.1.4.21.0	Number of mail messages in message queue on hold	w 80, c 100
mails-pending	enterprises.334.72.1.1.4.31.0	Number of mail messages pending	w 80, c 100
mailbox-dns-pending	enterprises.334.72.1.1.4.34.0	Number of mail messages in MAIL.BOX waiting for DNS	w 10, c 20
databases-in-cache	enterprises.334.72.1.1.10.15.0	The number of databases currently in the cache. Administrators should monitor this number to see whether it approaches the NSF_DBCACHE_MAXENTRIES setting. If it does, this indicates the cache is under pressure. If this situation occurs frequently, the administrator should increase the setting for NSF_DBCACHE_MAXENTRIES	w 80, c 100
database-cache-hits	enterprises.334.72.1.1.10.17.0	The number of times an lnDBCacheInitialDbOpen is satisfied by finding a database in the cache. A high ‘hits-to-opens’ ratio indicates the database cache is working effectively, since most users are opening databases in the cache without having to wait for the usual time required by an initial (non-cache) open. If the ratio is low (in other words, more users are having to wait for databases not in the cache to open), the administrator can increase the NSF_DBCACHE_MAXENTRIES setting	w, c
database-cache-overcrowding	enterprises.334.72.1.1.10.21.0	The number of times a database is not placed into the cache when it is closed because lnDBCacheCurrentEntries equals or exceeds lnDBCacheMaxEntries*1.5. This number should stay low. If it begins to rise, you should increase the NSF_DBCACHE_MAXENTRIES setting	w 10, c 20
replicator-status	enterprises.334.72.1.1.6.1.3.0	Status of the Replicator task
router-status	enterprises.334.72.1.1.6.1.4.0	Status of the Router task
replication-failed	enterprises.334.72.1.1.5.4.0	Number of replications that generated an error
server-availability-index	enterprises.334.72.1.1.6.3.19.0	Current percentage index of server’s availability. Value range is 0-100. Zero (0) indicates no available resources; a value of 100 indicates server completely available
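
To show how the warning and critical values in the table translate into a Nagios result, here is a minimal Python sketch. It is not the logic of the downloadable plugins. It assumes the usual Nagios plugin convention that a plain threshold such as `80` alerts only when the value is greater than 80, and it uses the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The `THRESHOLDS` dictionary and function names are illustrative, and only a few rows of the table are copied in.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING",
                CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# (warning, critical) pairs copied from a few rows of the table above.
THRESHOLDS = {
    "dead-mail": (80, 100),
    "routing-failures": (100, 150),
    "pending-routing": (10, 20),
    "max-mail-delivery-time": (300, 600),
}


def evaluate(value, warning, critical):
    """Map a 'higher is worse' value to a Nagios status code.

    Assumption: a value equal to the threshold is still acceptable;
    only values above it raise an alert.
    """
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


def check(service, value):
    """Return (exit_code, status_line) for one sampled value."""
    if service not in THRESHOLDS:
        return UNKNOWN, "UNKNOWN - no thresholds defined for %s" % service
    warning, critical = THRESHOLDS[service]
    code = evaluate(value, warning, critical)
    line = "%s - %s = %s (w %s, c %s)" % (
        STATUS_NAMES[code], service, value, warning, critical)
    return code, line
```

Note that this comparison only suits checks where a higher number is worse. The server-availability-index runs the other way (0 means no available resources), so it would need the comparison inverted.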

Interesting OIDs to plot for Trend Analysis
enterprises.334.72.1.1.4.2.0	Number of messages received by router
enterprises.334.72.1.1.4.4.0	Total number of mail messages routed since the server started
enterprises.334.72.1.1.4.5.0	Number of messages router attempted to transfer
enterprises.334.72.1.1.4.8.0	Notes server’s mail domain
enterprises.334.72.1.1.4.11.0	Average size of mail messages delivered in bytes
enterprises.334.72.1.1.4.13.0	Maximum number of server hops for mail delivery
enterprises.334.72.1.1.4.14.0	Maximum size of mail delivered in bytes
enterprises.334.72.1.1.4.15.0	Minimum time for mail delivery in seconds
enterprises.334.72.1.1.4.16.0	Minimum number of server hops for mail delivery
enterprises.334.72.1.1.4.17.0	Minimum size of mail delivered in bytes
enterprises.334.72.1.1.4.18.0	Total mail transferred in kilobytes
enterprises.334.72.1.1.4.20.0	Count of actual mail items delivered (may be different from delivered which counts individual messages)
enterprises.334.72.1.1.4.26.0	Peak transfer rate
enterprises.334.72.1.1.4.27.0	Peak number of messages transferred
enterprises.334.72.1.1.4.32.0	Number of mail messages moved from MAIL.BOX via SMTP
enterprises.334.72.1.1.15.1.24.0	Cache command hit rate
enterprises.334.72.1.1.15.1.26.0	Cache database hit rate
enterprises.334.72.1.1.11.6.0	Hourly access denials
enterprises.334.72.1.1.15.1.13.0	Requests per 5 minutes
enterprises.334.72.1.1.11.9.0	Unsuccessful run
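
Several of the values above are running totals "since the server started", so a trend graph is usually more readable when it plots the rate between two samples instead of the raw number. The following Python sketch shows one way to derive that rate. It assumes that a counter going backwards means the Domino server was restarted and the counter began again from zero. The function name and the per-minute unit are my own choices for illustration.

```python
def rate_per_minute(prev_value, prev_time, curr_value, curr_time):
    """Return the per-minute rate between two samples of a cumulative counter.

    Times are in seconds (for example, from time.time()).
    Returns None when no time has passed between the samples.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: the server restarted, so the counter started again
        # from zero and everything counted so far is new.
        delta = curr_value
    return delta * 60.0 / elapsed
```

For example, if the total number of routed messages went from 1000 to 1300 between two polls taken 300 seconds apart, `rate_per_minute(1000, 0, 1300, 300)` gives 60.0 messages per minute.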

Files and Scripts

* lotus-monitoring.tar.gz
* lotus-commands.cfg
* lotus-services.cfg

18 Responses to Monitoring Lotus Notes/Domino Servers

Mike says:

March 9, 2022 at 4:06 pm

Hi there, the .gz file above downloads as a zero-byte file – could you refresh the file/link please?

Evgeny Tsyganov says:

October 14, 2015 at 11:59 am

Hello, this link “http://www.barbich.net/websvn/wsvn/nagios/nagios/plugins/check_lotus_state.pl” is unavailable. How can I download it? Thanks for the help.

mathieu says:

October 8, 2014 at 7:39 pm

Hi,
I think i have a problem with the perl file.
Can anyone upload the file please ?
Thanks

Manuel Peña says:

October 13, 2012 at 1:00 am

Excuse me, could you please send the utils.pm file to my email? I need it urgently.

Ajitabh says:

June 14, 2011 at 12:06 pm

Cristian,

I have verified the link is not broken – I am reposting direct link here.

http://ajitabhpandey.info/wp-content/uploads/2007/04/lotus-monitoringtar.gz
http://ajitabhpandey.info/wp-content/uploads/2007/04/lotus-commandscfg.txt
http://ajitabhpandey.info/wp-content/uploads/2007/04/lotus-servicescfg.txt

Cristian says:

June 8, 2011 at 12:08 pm

Hello
Can you please post the scripts again or send them to me by mail?
Thank you very much, but the link is broken.

* lotus-monitoring.tar.gz
* lotus-commands.cfg
* lotus-services.cfg

Ajitabh says:

May 31, 2011 at 11:26 am

The original site seems to be down, but if you look above I have provided a local download of the files. Download lotus-monitoring.tar.gz which has all the necessary plugins.

MaCo says:

May 30, 2011 at 10:34 pm

The download link is down:
http://www.barbich.net/websvn/wsvn/nagios/nagios/plugins/check_lotus_state.pl.

🙁

Marcelo Barbosa says:

March 17, 2011 at 1:00 am

The link to the plugin doesn’t exist anymore.

Ajitabh says:

October 25, 2010 at 4:15 pm

Tom/Andre,

I have fixed the links. Looks like file permissions got screwed up after a recent page.

Andre says:

October 25, 2010 at 11:21 am

Hi there.
Could you fix the download link? Seems like the files have gone missing again 😉

Tom Corcoran says:

September 29, 2010 at 2:19 pm

Hi

Are these scripts not available anymore?

Ajitabh says:

October 21, 2009 at 6:01 am

Sorry about that, but it's working now; it was the wp-cache plugin playing up. Somehow the cache directory went read-only and the old path was cached there. I have tested it. One thing you might want to note is that lotus-monitoring.tar.gz will be downloaded as lotus-monitoringtar.gz, but it is definitely a tar-gzipped archive, so you need to untar it after gunzipping, or you can directly run tar -xvzf lotus-monitoringtar.gz.

Wim Savenberg says:

October 20, 2009 at 8:08 pm

If you could mail them it would be wonderful … 😉

Wim Savenberg says:

October 20, 2009 at 8:04 pm

Sorry to inform you that the links are still not working … :-[

Ajitabh says:

October 5, 2009 at 5:05 am

Thanks for notifying me. It was a broken link. I have fixed it now. Please download from the above links.

Wim Savenberg says:

September 23, 2009 at 5:16 pm

I tried to download the following scripts. Unfortunately they are not available anymore. Would you, by any chance, have these scripts?
Files and Scripts

* lotus-monitoring.tar.gz
* lotus-commands.cfg
* lotus-services.cfg
If so, would you be so kind as to send them to me?