Zenoss as a replacement for Nagios
At work, I'm responsible for keeping about 60 desktop computers up and running. To keep track of their state, I use Nagios. It's not a bad tool, but its interface is ugly and quite unintuitive, and it lacks some features.
Last month, a new version of Zenoss was released. Zenoss is a full monitoring suite based on SNMP. It is rumored to scale well, track the hosts' states with more accuracy and be visually more appealing. It looked like a nice alternative, so I tried to set it up to replace Nagios.
Installation and configuration
Installing the software is quite easy: just follow the installation manual. On Debian, all you have to do is add a repository and install zenoss-stack: approximately 100MB to download and 400MB once installed. Uh-oh, first catch: Zenoss is badly integrated and comes with its own copy of MySQL and of all the tools it needs instead of using the standard Debian packages.
After starting the service (why isn't this done automatically?), Zenoss is available at http://localhost:8080/. I could have started using Zenoss immediately, but I want more control over my server: I need to restrict access to a limited set of users (Zenoss could have done that with its own user accounts) and allow it only over SSL.
The recommended way of enabling SSL is to proxy the dedicated Web server that ships with Zenoss (Zope) through Apache, so that calls to the HTTPS server are transparently forwarded to the embedded HTTP server (without having to open more ports in the firewall). The manual is probably enough if you don't have another site running on Apache, but I did, and it wasn't. So I had to guess the rewrite rules to make the server work as it should. I never managed to do exactly what I wanted, but the end result was close enough.
Actually, the manual isn't precise enough anyway, since it forgets to state you need to enable Apache's mod_proxy and mod_proxy_http. Not too hard to find through Google, but annoying.
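To make the proxying less of a guessing game, here is a minimal sketch of the URL mapping such a rewrite rule has to produce. It assumes Zope's VirtualHostMonster URL convention (VirtualHostBase/VirtualHostRoot) is usable in the Zope instance bundled with Zenoss, which I have not verified for every release. The function name and the host name monitor.example.org are made up for the illustration; only http://localhost:8080 comes from the setup described above.

```python
from urllib.parse import quote


def zope_backend_url(public_host, path, public_scheme="https",
                     public_port=443, backend="http://localhost:8080"):
    """Return the backend URL a proxying rewrite rule would target.

    Assumption: the backend understands Zope's VirtualHostMonster
    path convention, so that links it generates point back at the
    public HTTPS address instead of localhost:8080.
    """
    # Drop the leading slash so the pieces join with single slashes.
    relative = quote(path.lstrip("/"))
    return (
        f"{backend}/VirtualHostBase/{public_scheme}/"
        f"{public_host}:{public_port}/VirtualHostRoot/{relative}"
    )


if __name__ == "__main__":
    # Hypothetical public host name, for illustration only.
    print(zope_backend_url("monitor.example.org", "/zport/dmd"))
```

On Debian, the proxy modules mentioned above can be enabled with "a2enmod proxy proxy_http" before reloading Apache.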
Zenoss indeed looks much better than Nagios, and is more complete. For example, you can monitor both services and system properties (system load, memory usage...) and you can have a meaningful graph of the latest activity, instead of the bland OK/Warning/Critical history that Nagios has. Adding a node can be done directly from the Web interface, while you'd have to tweak configuration files and restart the server if you were running Nagios. Surprisingly, the OS/Hardware fields weren't populated with what should have been gathered from an SNMP request, but that might just be me misconfiguring either Zenoss or the SNMP server.
Monitoring a few more variables is not too painful either, assuming you know how SNMP works.
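For those who need a refresher, a quick way to see what an SNMP agent actually answers is to query it by hand. The sketch below wraps the snmpget command from the net-snmp tools, which is assumed to be installed. The community string "public" and the host name desk01 are placeholders, and the load-average OID only answers if the agent exposes the UCD-SNMP-MIB. Whether Zenoss fills its OS/Hardware fields from sysDescr is also an assumption on my part.

```python
import subprocess

# Standard MIB-II object describing the system (OS, hardware).
SYS_DESCR = "1.3.6.1.2.1.1.1.0"
# UCD-SNMP-MIB laLoad.1: 1-minute load average (net-snmp agents).
LOAD_1MIN = "1.3.6.1.4.1.2021.10.1.3.1"


def snmpget_command(host, oid, community="public"):
    """Build the snmpget command line for a single SNMP v2c GET."""
    # -Oqv asks net-snmp to print only the value, without OID or type.
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid]


def snmp_get(host, oid, community="public", timeout=10):
    """Run snmpget and return the value as a string.

    Raises subprocess.CalledProcessError if the agent does not answer.
    """
    result = subprocess.run(
        snmpget_command(host, oid, community),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "desk01" is a hypothetical host name.
    print(snmp_get("desk01", SYS_DESCR))
    print(snmp_get("desk01", LOAD_1MIN))
```

If the first query comes back empty or times out, the problem is on the agent side rather than in Zenoss.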
Events can easily be acknowledged, and are accessible from a configurable dashboard.
Adding some services
Okay, those were the advantages over Nagios. Now, it's time to restore what I used to have, the biggest part being services. One HTTPS server, one mail server, four printers, 50-60 SSH servers... Oh, good, the Services tab already knows about almost everything I need, I'll just set up a service group, put my hosts in the group and... Oh noes, no service groups!
That's right, Zenoss deals with everything in a clean, hierarchical manner. Everything but services. If you want to monitor a set of services on a set of hosts, you'll have to add them one by one, without forgetting any and without breaking the configuration. I don't want to know what you're supposed to do when you schedule some downtime.
Apparently, a ticket has been submitted on their issue tracker, but there's no milestone and, while they acknowledge the problem, they don't seem very eager to work on it (the ticket is 17 months old at the time of this writing).
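To give an idea of what the missing service groups cost, here is a small sketch of the expansion a group would do for you: every service applied to every host. The host and service names are invented for the example, and this has nothing to do with Zenoss's actual data model or API; it only counts the entries you end up adding by hand.

```python
from itertools import product


def expand_service_group(hosts, services):
    """Return every (host, service) pair a service group would cover."""
    return list(product(hosts, services))


if __name__ == "__main__":
    # Hypothetical inventory: 60 desktops, each with two monitored services.
    hosts = [f"desk{n:02d}" for n in range(1, 61)]
    services = ["ssh", "ntp"]

    pairs = expand_service_group(hosts, services)
    print(f"{len(pairs)} entries to add one by one")
    print(pairs[:3])
```

With a group, adding a third service is one change; without it, it is one change per host.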
Zenoss is an interesting alternative: nice interface, access control, configuration through an interface rather than through configuration files, better monitoring of usage data... However, it still falls short in some critical areas, among others:
- Bad OS integration (maintenance and security hell);
- Incomplete documentation. The available documentation is well written but you sometimes need more than the simplest example;
- Awful service management.
If these look like deal breakers to you, you'd better keep what you're using right now (for example, Nagios + Cacti).