Tag Archives: monitoring

Monitoring Mac OS X Server DHCP service with Nagios

Whilst not recommended as the primary DHCP server for large networks, Mac OS X Server is quite competent at providing leases for a subnet or two, particularly when coupled with NetBoot services. In our testing lab, we use a Mac OS X Server for exactly this purpose: it serves as a NAT gateway and provides DHCP and NetBoot services to a subnet. We have had issues in the past with the underlying bootpd process locking up, and with too many clients on the subnet exhausting the available leases.

With that in mind, I have added a script to check the DHCP service on Mac OS X Server. It works on 10.6 through 10.8, checks that the DHCP service is running, and then returns performance data on the number of provided leases and active clients.
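As a rough illustration of that check (not the actual check_osx_dhcp.sh), the sketch below parses "key = value" status text of the kind printed by `serveradmin fullstatus dhcp` and emits a Nagios-style status line with performance data. The key names (dhcp:state, dhcp:numDHCPLeases, dhcp:numDHCPActiveClients) and the function names are assumptions made for the example, so confirm them against the output on your own server.

```shell
#!/bin/sh
# Sketch only: reads DHCP status text on stdin and reports in Nagios style.
# ASSUMPTION: the key names used below are illustrative and may differ
# between Mac OS X Server releases.

# Print the value stored under a key in "key = value" text, minus quotes.
# $1 = status text, $2 = key name
dhcp_value() {
    printf '%s\n' "$1" | awk -F' = ' -v key="$2" \
        '$1 == key { gsub(/"/, "", $2); print $2; exit }'
}

# Returns 0 (OK) when the service is running, 2 (CRITICAL) otherwise.
check_dhcp() {
    local text state leases clients
    text=$(cat)    # whole status report from stdin
    state=$(dhcp_value "$text" "dhcp:state")
    leases=$(dhcp_value "$text" "dhcp:numDHCPLeases")
    clients=$(dhcp_value "$text" "dhcp:numDHCPActiveClients")

    if [ "$state" != "RUNNING" ]; then
        echo "CRITICAL - DHCP service is not running"
        return 2
    fi
    # Everything after the pipe is Nagios performance data.
    echo "OK - DHCP service is running | leases=${leases:-0};clients=${clients:-0};"
    return 0
}

# Possible use on the server itself:
#   sudo serveradmin fullstatus dhcp | check_dhcp
```

Reading the status text from stdin keeps the parsing separate from the command that produces it, which makes the check easy to try against saved output before pointing it at a live server.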

Over the next few weeks, I am planning to detail some of the workflows I am using to generate graphs like the one pictured above, giving excellent visibility into the historical performance of your Mac services.

check_osx_dhcp.sh on GitHub


Monitoring launchd tasks with Nagios

Over the last four major releases of Mac OS X, launchd has cemented itself as the engine room of service control, becoming responsible for the execution of most critical processes in the operating system.

From bootstrap all the way through to iCloud and Time Machine backups, launchd is involved in almost every important task. It therefore makes sense that the overview launchd provides of its current tasks and their exit statuses should be central to monitoring a critical Mac server or client system. Particularly on Mac OS X Server, where processes such as Postgres and Apache now play such a central role in providing services, monitoring launchd can be a game changer in identifying issues quickly.

I have added a new script to my OSX-Monitoring-Tools project which queries launchctl and reports on tasks with non-zero exit codes. This is a lifesaver on Mac OS X Server, and can alert you very quickly to processes controlled by launchd that exit badly (it is very good at picking up ‘Throttling respawn’ tasks).

CRITICAL - critical daemons (org.postgresql.postgres,com.apple.devicemanager) exited with a non-zero code! | active=113;inactive=161;error=4;

In the example above, bad permissions are stopping PostgreSQL from starting, which is also causing the failure of Profile Manager. As you can see, the script returns performance data for task counts.

Because some Apple-provided launch daemons happily exit with non-zero exit codes as status identifiers (com.apple.xprotectupdater, com.apple.suhelperd), an array of exceptions is built into the script. You can also supply your own exceptions and critical identifiers as arguments.
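The general idea can be sketched as follows. This is not the actual check_osx_launchd.sh: it assumes the tab-separated PID, Status and Label columns printed by `launchctl list`, takes its exceptions as a single comma-separated argument (a hypothetical format), and treats every remaining non-zero exit code as critical, which is simpler than distinguishing critical identifiers from ordinary ones.

```shell
#!/bin/sh
# Sketch only: reads `launchctl list` style output on stdin.
# $1 = comma-separated labels whose non-zero exit codes are ignored
#      (hypothetical argument format chosen for this example)
# Returns 0 (OK) or 2 (CRITICAL), following Nagios exit code conventions.
check_launchd() {
    awk -F'\t' -v skip="$1" '
        BEGIN {
            # Build a lookup table of labels to ignore.
            n = split(skip, list, ",")
            for (i = 1; i <= n; i++) ignore[list[i]] = 1
        }
        NR == 1 && $1 == "PID" { next }    # skip the header row
        NF < 3 { next }                    # skip malformed lines
        {
            # A PID of "-" means the job is loaded but not running.
            if ($1 == "-") inactive++; else active++
        }
        $2 != 0 && !($3 in ignore) {
            sep = (failed == "" ? "" : ",")
            failed = failed sep $3
            errors++
        }
        END {
            perf = sprintf("active=%d;inactive=%d;error=%d;", active, inactive, errors)
            if (errors > 0) {
                printf "CRITICAL - daemons (%s) exited with a non-zero code! | %s\n", failed, perf
                exit 2
            }
            printf "OK - no unexpected non-zero exit codes | %s\n", perf
        }'
}

# Possible use:
#   launchctl list | check_launchd "com.apple.xprotectupdater,com.apple.suhelperd"
```

Keeping the exceptions in a lookup table means the list can grow without changing the matching logic.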

check_osx_launchd.sh on GitHub


Checking Carbon Copy Cloner with Nagios

In many Mac OS X Server deployments that I see, a file-level cloning tool such as Carbon Copy Cloner or SuperDuper is used to keep a bootable backup of the server OS on a secondary drive. Whilst this is not a catch-all backup strategy, nor effective for all services (databases and other files can be left in an inconsistent state on disk), it does provide an excellent emergency boot option that can have essential services back up and running very quickly if the primary drive fails.

Generally, I use a scheduled task with Carbon Copy Cloner (CCC) at about 1 am each morning to do a full clone of data from the boot drive to a secondary drive. It is possible for this process to provide email notifications on completion or failure, but as I prefer all my monitoring to go through Nagios, I have developed a script to track and check for successful CCC clones.

Full instructions are in the header of the script, but it has a dual purpose. On the Nagios end, the script checks for the existence and age of a hidden file on the clone destination that records when the last successful clone took place, and raises a warning if that file is older than the age threshold you supply. On the CCC end, you simply choose the script as the post-clone scripting action for your scheduled task, and it will create the hidden file on your destination if the clone was successful.
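The two halves of that design can be sketched roughly as below. This is an illustration rather than the actual check_ccc_currency.sh: the marker file name (.ccc_last_success), the function names, and the way the destination path and clone exit status reach the first function are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the marker file name is hypothetical.
MARKER_NAME=".ccc_last_success"

# CCC side: record a successful clone on the destination volume.
# $1 = destination path, $2 = exit status of the clone (0 = success)
mark_clone_success() {
    if [ "$2" -eq 0 ]; then
        touch "$1/$MARKER_NAME"
    fi
}

# Nagios side: warn when the marker is missing or too old.
# $1 = destination path, $2 = maximum age in minutes
# Returns 0 (OK) or 1 (WARNING), following Nagios exit code conventions.
check_clone_age() {
    local marker="$1/$MARKER_NAME"
    if [ ! -f "$marker" ]; then
        echo "WARNING - no successful clone recorded on $1"
        return 1
    fi
    # find prints the path only if it was modified more than $2 minutes ago.
    if [ -n "$(find "$marker" -mmin +"$2")" ]; then
        echo "WARNING - last successful clone is older than $2 minutes"
        return 1
    fi
    echo "OK - last successful clone is within $2 minutes"
    return 0
}

# Possible use from Nagios, with a 26 hour threshold for a nightly clone:
#   check_clone_age "/Volumes/Clone" 1560
```

A threshold slightly longer than the clone interval leaves room for a clone that runs long without raising a false alarm.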

It has been added to my OSX-Monitoring-Tools project on GitHub today.

check_ccc_currency.sh on GitHub