Monitoring launchd tasks with Nagios

Over the last 4 major releases of Mac OS X, launchd has cemented itself as the virtual engine room of service control, becoming responsible for the execution of most critical processes in the operating system.

From bootstrap all the way through to iCloud and Time Machine backup, launchd is somehow involved in almost every important task. It therefore makes sense that the comprehensive overview provided by launchd of it’s current tasks and exit statuses become central to the monitoring of a critical Mac server or client system. Particularly on Mac OS X server, where processes such as Postgres and Apache now take such a central role in the provision of services, monitoring of launchd can be a game changer in identifying issues quickly.

I have added a new script to my OSX-Monitoring-Tools project which checks launchctl, and reports on non-zero exit codes for tasks. This is a lifesaver on Mac OS X Server, and can alert you very quickly to processes controlled by launchd that exit badly (very good at picking up ‘Throttling respawn’ tasks).

CRITICAL - critical daemons (org.postgresql.postgres,com.apple.devicemanager) exited with a non-zero code! | active=113;inactive=161;error=4;

In the example above, bad permissions are stopping PostgreSQL from starting, which is also causing the failure of Profile Manager. As you can see, the script returns performance data for task counts.

Because some Apple provided launch daemons happily exit with non zero exit codes as status identifiers (com.apple.xprotectupdater, com.apple.suhelperd), an array of exceptions are built into the script. You can also supply your own exceptions and critical identifiers as arguments.

check_osx_launchd.sh on GitHub

Submit a comment

Your email address will not be published. Required fields are marked *