Earlier this year, we noticed an increasing stability problem with our main mailserver, mail1.bytemine.net. At first we blamed it on the increased load due to the ever-increasing volume of spam, so we decided to move the biggest resource hog, SpamAssassin, away from this machine.
Since it mostly processes text and binary content in e-mails, SpamAssassin eats CPU for breakfast. For a few weeks this seemed to have helped quite a bit, but then our problems returned, even worse than before. Thanks to symon, the nice systems data graphing suite, Bernd noticed that our mailserver was constantly doing roughly 3 to 4 MB/s of disk I/O. While this does not seem like much at first, on average it meant that the disks were always busy and data meant to go to disk was waiting around in the buffer cache.
Additionally we were now watching the machine with systat(1), which lets you watch the I/O stats very nicely in realtime.
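For readers who want to do the same, something along these lines shows live disk activity (the exact views and the 5-second interval here are just illustrative; see systat(1) for what your version supports):

```shell
# Live per-disk I/O statistics, refreshed every 5 seconds:
systat iostat 5

# The vmstat view shows disk transfers alongside CPU, interrupt
# and paging activity, handy for spotting I/O-bound behavior:
systat vmstat 5
```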
Around the same time we got a new machine with an LSI MegaRAID controller (MegaRAID SAS 8408, driven in OpenBSD by the mfi(4) driver), and while debugging performance issues with the RAID, we figured out that it came with very safe, but slow, default settings:
- all disk caches were disabled
- the controller cache was disabled
- the controller was in Write-Through mode
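A crude sequential write test is enough to see the effect of such settings. This is just a sketch (file path and sizes are arbitrary, and the numbers you get will vary); with write-through mode and all caches disabled, throughput will typically come out far below what the disks themselves can sustain:

```shell
# Write 512 MB of zeros sequentially and note the throughput dd reports.
# Run it once before and once after changing the controller settings.
dd if=/dev/zero of=/tmp/ddtest bs=1m count=512

# Clean up the test file afterwards.
rm /tmp/ddtest
```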
Then we checked the manual for the LSI MegaRAID controller present in our mailserver (an LSI MegaRAID SATA 150-4, driven by the ami(4) driver), discovered that the same options were present there, and assumed they would have the very same defaults. The mailserver was initially deployed in 2005 and until November 2007 never showed any signs of I/O congestion. Yet it definitely looked to us as if we had a big I/O problem.
By mid-February we were seeing resource starvation symptoms every other day, with highly sluggish behavior during peak hours. Out of nowhere userland would freeze, while the kernel would still answer ICMP packets.
Once we went down to Frankfurt to look at the controller and saw that the very same defaults were used, it became quite clear. Bernd reconfigured the controller to use more sane settings, and within minutes we saw a totally different I/O behavior. The following graph, showing the days before and after the changed controller settings, displays this very well (click images to see the large version).
Looking at the week after we adjusted controller settings, the difference in I/O seems enormous. What happened was:
- with the default controller settings, data could not get to the disks quickly enough, so we constantly dragged pending I/O along with us. With the adjusted settings, data gets to the disks quickly, so we don't build up such a congested situation:
We then took some time to look at some of the recorded data from last year, and suddenly the picture became clear: since November 2007 we had seen an increase in I/O of roughly 700%. No wonder the “safe” (but slow) default settings didn’t hinder us before; the machine was bored, yet with more and more I/O coming in, they suddenly became a bottleneck. As the big increase in I/O is reads from the disks, we suspect that more and more of our customers have switched to using IMAP, which causes a bigger performance hit than POP3 does. The following graph shows I/O over the last year:
While updating machines at our colocation in Frankfurt last Friday, Bernd and I hit a pretty harsh bug. Because of changes in the disklabel code, the disklabel of our main customer mailserver got updated by the new OpenBSD kernel upon booting. After updating all packages and configurations on the machine, we gave the server another reboot. This time the machine got stuck in the first-stage bootloader, complaining that there was no OS to boot.
Quite baffled, we booted the bsd.rd kernel, just to see that there was no disklabel present. Usually you have backups in /var/backups, where OpenBSD keeps information like this around. Of course, these files were not in the backup (time to blush on my part). Thanks to a hint from our fellow OpenBSD developer Henning, we gave scan_ffs a try. Since scan_ffs is not present in the ramdisk kernels, we compiled it statically and placed it on a USB stick. scan_ffs scans a disk and outputs the partitions it finds. With a bit of thinking, we could then create a disklabel with the correct data. Once we had /var mounted, we could use the saved disklabel there to make sure the label was correct.
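The steps above can be sketched roughly as follows. This assumes the affected disk is sd0 and the statically linked scan_ffs binary sits on a mounted USB stick; device names and the backup file name are illustrative, so check scan_ffs(8) and disklabel(8) before trying this on a real disk:

```shell
# Scan the raw whole-disk device for FFS filesystems; -l prints
# the findings in a disklabel-like format:
/mnt/usb/scan_ffs -l /dev/rsd0c

# Using the reported offsets and sizes, recreate the label by hand
# in the interactive editor:
disklabel -E sd0

# Once /var is mountable again, the label kept there by the daily
# maintenance scripts can be restored directly:
disklabel -R sd0 /var/backups/disklabel.sd0
```

Keeping those /var/backups files in your off-machine backups would have made this whole exercise unnecessary, which was the lesson for us.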
The manpage for scan_ffs contains the following advice:
The basic operation of this program is as follows: 1. Panic. You usually do so anyways, so you might as well get it over with. Just don’t do anything stupid. Panic away from your machine. Then relax, and see if the steps below won’t help you out.
Very true. While we did not panic, the best thing to do in such situations is to think and evaluate your options and possible recovery methods. Don’t be quick to try a fix; you will likely make things worse ;)