
Monitoring Mac OS X Server DHCP service with Nagios

Whilst not recommended as the primary DHCP server for large networks, Mac OS X Server is quite competent at providing leases for a subnet or two, particularly when coupled with NetBoot services. In our testing lab, we use a Mac OS X Server machine for exactly this purpose: it acts as a NAT gateway and provides DHCP and NetBoot services to a subnet. We have had issues in the past with the underlying bootpd process locking up, and with too many clients on the subnet exhausting the available leases.

To catch these problems, I have added a script that checks the DHCP service on Mac OS X Server. It works on 10.6 through 10.8, checks that the DHCP service is running, and then returns performance data on the number of provided leases and active clients.
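As a rough illustration of the kind of check described here (not the actual check_osx_dhcp.sh), the sketch below parses `key = value` status text on stdin and prints a Nagios-style status line with performance data. The function name `check_dhcp_status` is hypothetical, and the key names `dhcp:state`, `dhcp:numDHCPLeases` and `dhcp:numDHCPActiveClients` are my assumption about what `serveradmin fullstatus dhcp` prints, so verify them on your own server first.

```shell
#!/bin/sh
# Sketch only: the key names below are assumptions about serveradmin output,
# and check_dhcp_status is a hypothetical name, not the project's script.

check_dhcp_status() {
    # Read "dhcp:key = value" lines from stdin and pick out three fields.
    awk -F' = ' '
        $1 == "dhcp:state"                { gsub(/"/, "", $2); state = $2 }
        $1 == "dhcp:numDHCPLeases"        { leases = $2 + 0 }
        $1 == "dhcp:numDHCPActiveClients" { clients = $2 + 0 }
        END {
            if (state == "RUNNING") {
                # Nagios convention: exit 0 = OK, perfdata follows the pipe
                printf "OK - DHCP service is running | leases=%d clients=%d\n", leases, clients
                exit 0
            }
            # Nagios convention: exit 2 = CRITICAL
            printf "CRITICAL - DHCP service is not running\n"
            exit 2
        }'
}
```

On the server itself this would presumably be fed with something like `serveradmin fullstatus dhcp | check_dhcp_status`, run with sufficient privileges. Keeping the parsing separate from the command that produces the status text makes the logic easy to test with canned input.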

Over the next few weeks, I am planning to detail some of the workflows I am using to generate graphs like the one pictured above, giving excellent visibility into the historical performance of your Mac services.

check_osx_dhcp.sh on GitHub


Using Pushover to push Nagios notifications

With the recent outages in Boxcar's service, I have had to move to a more reliable push notification platform. Pushover seems to fit the bill perfectly. Whilst it is currently mobile only (iOS and Android clients), their FAQ states that a Mac client with Notification Center integration is planned.

I have added a script to my OSX-Monitoring-Tools project to send a notification to your Pushover account that should be easy to integrate into your existing Nagios workflow.
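For a sense of what such a notification involves, here is a minimal sketch (not the notify_by_pushover.sh from the project). It assumes Pushover's message endpoint at `https://api.pushover.net/1/messages.json` with `token`, `user`, `title` and `message` form fields. The function names and the `PUSHOVER_TOKEN` / `PUSHOVER_USER` environment variables are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch only: function names and the PUSHOVER_TOKEN / PUSHOVER_USER
# variables are hypothetical, not taken from the project's script.

# Build a one-line message from values Nagios would pass in as macros.
format_notification() {
    # $1 notification type, $2 host, $3 service, $4 state, $5 plugin output
    printf '%s: %s/%s is %s - %s' "$1" "$2" "$3" "$4" "$5"
}

# POST the message to Pushover with curl (requires network access).
send_pushover() {
    # $1 title, $2 message body
    curl -s \
        --form-string "token=${PUSHOVER_TOKEN}" \
        --form-string "user=${PUSHOVER_USER}" \
        --form-string "title=$1" \
        --form-string "message=$2" \
        https://api.pushover.net/1/messages.json
}
```

A Nagios notification command could then pass macros such as `$NOTIFICATIONTYPE$`, `$HOSTNAME$`, `$SERVICEDESC$`, `$SERVICESTATE$` and `$SERVICEOUTPUT$` to `format_notification` and hand the result to `send_pushover`.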

notify_by_pushover.sh on GitHub


Monitoring launchd tasks with Nagios

Over the last four major releases of Mac OS X, launchd has cemented itself as the virtual engine room of service control, becoming responsible for the execution of most critical processes in the operating system.

From bootstrap all the way through to iCloud and Time Machine backup, launchd is involved in almost every important task. It therefore makes sense that the comprehensive overview launchd provides of its current tasks and their exit statuses becomes central to the monitoring of a critical Mac server or client system. Particularly on Mac OS X Server, where processes such as Postgres and Apache now take such a central role in the provision of services, monitoring launchd can be a game changer in identifying issues quickly.

I have added a new script to my OSX-Monitoring-Tools project which checks launchctl and reports on non-zero exit codes for tasks. This is a lifesaver on Mac OS X Server, and can alert you very quickly to processes controlled by launchd that exit badly. It is particularly good at picking up ‘Throttling respawn’ tasks.

CRITICAL - critical daemons (org.postgresql.postgres,com.apple.devicemanager) exited with a non-zero code! | active=113;inactive=161;error=4;

In the example above, bad permissions are stopping PostgreSQL from starting, which is also causing the failure of Profile Manager. As you can see, the script returns performance data for task counts.

Because some Apple-provided launch daemons happily exit with non-zero exit codes as status identifiers (com.apple.xprotectupdater, com.apple.suhelperd), an array of exceptions is built into the script. You can also supply your own exceptions and critical identifiers as arguments.
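To make the idea concrete, here is a simplified sketch of this style of check (not the actual check_osx_launchd.sh). It assumes `launchctl list` prints tab-separated PID, Status and Label columns, with `-` in the PID column for tasks that are not running. The function name `check_launchd`, the counting rules and the single comma-separated exceptions argument are my own assumptions. The sketch treats every non-excepted failure as critical and separates the performance data with spaces, as the Nagios plugin guidelines describe.

```shell
#!/bin/sh
# Sketch only: reads `launchctl list` style output (PID, Status, Label) on
# stdin. check_launchd and its counting rules are assumptions for this example.

check_launchd() {
    # $1: comma-separated labels whose non-zero exit status is ignored
    awk -v skip="$1" '
        BEGIN { n = split(skip, s, ","); for (i = 1; i <= n; i++) ignore[s[i]] = 1 }
        NR == 1 && $1 == "PID" { next }   # skip the header row
        {
            # A "-" in the PID column means the task is not currently running
            if ($1 == "-") inactive++; else active++
            # Status is the last exit code; "-" means none has been recorded
            if ($2 != "0" && $2 != "-" && !($3 in ignore)) {
                errors++
                failed = failed (failed == "" ? "" : ",") $3
            }
        }
        END {
            perf = sprintf("active=%d inactive=%d error=%d", active, inactive, errors)
            if (errors > 0) {
                printf "CRITICAL - daemons (%s) exited with a non-zero code! | %s\n", failed, perf
                exit 2
            }
            printf "OK - no launchd tasks exited with a non-zero code | %s\n", perf
            exit 0
        }'
}
```

On a real system this would presumably be run as `launchctl list | check_launchd "com.apple.xprotectupdater,com.apple.suhelperd"`, with root privileges if you want the system-level tasks rather than those of the current user.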

check_osx_launchd.sh on GitHub