
bytemine infrastructure - part 2

Posted by Daniel Rauer on Monday, August 12, 2013

This is the second episode of our series about reliable infrastructure. The first part is available here: bytemine infrastructure - part 1

In the previous post we focused on the foundation of our infrastructure: We mainly use Dell servers with KVM for virtualization and Icinga for monitoring purposes.

In 2010 we introduced Puppet as a means of automation and (remote) management. Wherever applicable, servers and services are set up and configured via Puppet. To this end, software gets packaged for our repositories whenever required.
There are several advantages to using an automation tool: The amount of time needed for recurring tasks is heavily reduced, whether for setting up a service or for changes that apply to several servers. Even more important is the unification of environments: Identical servers are set up identically, and configurations are always stored at the same location. This means people can find information more quickly and team-wide standards are easy to adhere to.

The tool we chose for performance analysis is collectd: Its plugin architecture, the large number of add-ons, and the various frontends give us great flexibility. Especially when used in conjunction with Puppet, new systems are integrated quickly, providing instant access to standard metrics like disk and network throughput, CPU load, file system usage and many more. Adding custom metrics for special applications is quite simple. Here is an example that retrieves statistics from an ActiveMQ broker: collectd and ActiveMQ
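To give an idea of what a custom metric can look like, here is a minimal sketch (not the ActiveMQ example linked above) of a script for collectd's Exec plugin, which reads `PUTVAL` lines from a program's standard output. The plugin name `myapp`, the type instance `queue_depth`, and the function `read_queue_depth` are hypothetical placeholders; `gauge` is one of collectd's standard types, and `COLLECTD_HOSTNAME` and `COLLECTD_INTERVAL` are the environment variables the Exec plugin passes to the script.

```python
import os
import time


def putval_line(host, plugin, type_, value, interval=10,
                plugin_instance=None, type_instance=None):
    """Format one PUTVAL command of collectd's plain text protocol."""
    plugin_part = plugin if plugin_instance is None else f"{plugin}-{plugin_instance}"
    type_part = type_ if type_instance is None else f"{type_}-{type_instance}"
    identifier = f"{host}/{plugin_part}/{type_part}"
    # "N" asks collectd to use the current time as the timestamp.
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def read_queue_depth():
    """Hypothetical placeholder: replace with a real query of the application."""
    return 42


def main(iterations=None):
    """Emit one value per interval; loops forever unless iterations is given."""
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    count = 0
    while iterations is None or count < iterations:
        line = putval_line(host, "myapp", "gauge", read_queue_depth(),
                           interval=interval, type_instance="queue_depth")
        # Flush so that collectd sees the value immediately.
        print(line, flush=True)
        count += 1
        time.sleep(interval)
```

A script registered with the Exec plugin would simply call `main()`. The formatting is kept in its own function so that it can be checked without running the loop.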


The number of administration tools we use in our team increases continuously. For our daily work we have a fairly large collection of tools for performance measurement, monitoring, centralized logging, and so on. Most of them are web-based, so the idea came up to create our own app as a dashboard and central point of entry to our toolset. We called it "Cockpit" and integrated it into our own and our customers' various workflows.
Cockpit provides, at a glance, information about Icinga, unowned tickets from our request tracker, and links to all relevant tools for troubleshooting. Furthermore, one can search and display collectd graphs: select a host from a list, choose the desired parameter (CPU, throughput, etc.) and the graph is instantly rendered.
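The host and parameter selection described above could be backed by a lookup like the following. This is only a sketch and not Cockpit's actual code. It assumes that collectd writes its data with the rrdtool plugin, which stores files as `<DataDir>/<host>/<plugin>[-<instance>]/<type>[-<instance>].rrd`. The function names are made up for illustration.

```python
from pathlib import Path


def list_hosts(data_dir):
    """Return the host names that have a directory below the collectd data dir."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())


def find_graph_sources(data_dir, host, plugin):
    """Return all RRD files of one plugin (e.g. "cpu") for the given host."""
    host_dir = Path(data_dir) / host
    matches = []
    for plugin_dir in sorted(host_dir.iterdir()):
        name = plugin_dir.name
        # Match "cpu" as well as instances such as "cpu-0", but not "cpufreq".
        if plugin_dir.is_dir() and (name == plugin or name.startswith(plugin + "-")):
            matches.extend(sorted(plugin_dir.glob("*.rrd")))
    return matches
```

The list of hosts would fill the selection box, and the files returned for the chosen parameter would then be handed to whatever renders the graph.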