Team bytemine will attend the Kieler Linux und Open Source Tage, which take place on the 10th and 11th of October 2008. We will have a booth where we will present some of our products and services. Felix Kronlage will give a talk titled “Commercial deployment of OpenBSD-based products”. At our booth we will also have OpenBSD CDs and T-shirt merchandise available to help fund the OpenBSD project.
So if you’d like to see the talk or just have a chat with some byteminers, we’re looking forward to seeing you at the conference and at our booth.
Finally, after waiting for more than a year, we got a couple of Liantec EMB-5740 prototypes to play with. Since we’ve been selling Soekris systems for years now and lots of customers ask us about performance data, sometimes in comparison to other, similar hardware solutions, we sat down and compared the Liantec systems to the Soekris 5501-70. This ended up as an IPsec performance comparison between the two.
In this test we wanted to determine how much IPSec traffic the Soekris and the Liantec can handle. For comparison, we also ran the tests without IPSec.
The test setup consisted of two workstations (about 1.5 GHz with 512 MB memory each) connected via two routers. The first test was done with two Soekris 5501-70 boxes, then with the two Liantec boxes. The workstations were running a default OpenBSD 4.3 installation.
To test the throughput of the boxes, we used a tool called iperf. We could have used tcpbench, but that is only available in OpenBSD current (which will become 4.4) and we also wanted to compare our results to 4.3. On the server side iperf was running with “iperf -s -B 10.1.2.2” and on the client with “iperf -c 10.1.2.2 -i1 -t100”. “-i1” makes iperf print a report every second and “-t100” limits the test to 100 seconds. For IPSec we copied the keys from /etc/isakmpd to the other machine.
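The per-second reports that the iperf client prints can be reduced to a single number per run. The following is a minimal sketch, assuming the report line layout of the classic iperf client; the function name and the sample values are made up for illustration and are not our measurements.

```python
import re
from statistics import mean

# Interval report line of the classic iperf client, e.g.
# "[  3]  0.0- 1.0 sec  5.12 MBytes  43.0 Mbits/sec"
REPORT = re.compile(
    r"\[\s*\d+\]\s+(?P<start>\d+\.\d+)-\s*(?P<end>\d+\.\d+)\s+sec\s+"
    r"\S+\s+\S+\s+(?P<rate>\d+(?:\.\d+)?)\s+(?P<unit>[KMG]?)bits/sec"
)
# Factors to convert the reported unit to Mbit/s.
SCALE = {"": 1e-6, "K": 1e-3, "M": 1.0, "G": 1e3}


def interval_rates_mbit(output, interval=1.0):
    """Return the per-interval throughput values in Mbit/s.

    Lines covering much more than one interval (the final summary line)
    are skipped so they do not distort the average.
    """
    rates = []
    for line in output.splitlines():
        m = REPORT.search(line)
        if m is None:
            continue
        if float(m["end"]) - float(m["start"]) > interval * 1.5:
            continue
        rates.append(float(m["rate"]) * SCALE[m["unit"]])
    return rates


# Hypothetical sample output, not taken from our test runs.
SAMPLE = """\
[  3]  0.0- 1.0 sec  5.12 MBytes  43.0 Mbits/sec
[  3]  1.0- 2.0 sec  5.25 MBytes  44.0 Mbits/sec
[  3]  2.0- 3.0 sec  5.37 MBytes  45.0 Mbits/sec
[  3]  0.0- 3.0 sec  15.7 MBytes  44.0 Mbits/sec
"""

if __name__ == "__main__":
    rates = interval_rates_mbit(SAMPLE)
    print(len(rates), "intervals, mean", mean(rates), "Mbit/s")
```

Skipping the summary line is a design choice: it covers the whole run, so including it would count every second twice.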
The following table is a small comparison of the technical specifications of both boxes:
| ||Soekris 5501-70||Liantec EMB-5740|
|CPU||500 MHz (AMD Geode LX single chip processor with CS5536 companion chip)||1000 MHz (VIA Embedded C7-Eden Platform with VIA PadLock Security Engine)|
|Heat||cold, even under load||very hot after some hours|
|Form factor||Slim and small, 19" cases available from Kerberos||Fat and heavy; better not stacked because of the heat|
|Stackable||yes||preferably not (see above)|
|Power consumption||8-12 W||25 W (with 2 network cables)|
|Price||266 EUR (33 EUR for case and power supply)||398 EUR|
All tests were run with three different kernels: the default 4.3 kernel, the default current (4.4) kernel, and a current kernel without pf. We also did a fourth test with changed buffer sizes, for which we used a current kernel.
Following advice from the FAQ, we also tested different buffer sizes to increase network performance, i.e. we set sendspace and recvspace to higher values.
Throughput without IPSec
The following chart is just for comparison, to see what throughput is possible without IPSec. There is just one current (4.4) test as it was not possible to put the Liantecs at their limit with our test-setup, so tuning things would have been useless.
As you can see, in this test case the Liantec box is at least twice as fast as the Soekris. But the results are not really representative, because the Soekris was at its limit whereas the Liantec was sitting at about 40% load. The reason is that the two (older) workstations were not fast enough to fill a Gbit link with enough packets to push the Liantec to its limit.
Throughput with IPSec
It is quite obvious in this chart that the Liantec is about twice as fast as the Soekris and that there is only a little difference between the kernels (see also the next chart). What troubled us is that, while the Soekris became faster from 4.3 to current, the Liantec box slowed down a bit.
Throughput increase (with IPSec)
This chart compares the increase of throughput from the Liantec box to the Soekris. It uses the same data as the graph before, just a different layout. As the chart shows, there is just a small increase in throughput by disabling features like pf and tuning some buffers.
While testing we also compared the load of the two boxes, with the following results (everything with IPSec):
The test results are based on the median of 5
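As a small illustration of why a median is a sensible summary here, the sketch below assumes that “5” refers to five repeated measurements per configuration. The numbers are invented for the example and are not our results.

```python
from statistics import mean, median


def summarize_runs(samples):
    """Return (median, mean) of a list of throughput samples in Mbit/s."""
    if not samples:
        raise ValueError("need at least one sample")
    return median(samples), mean(samples)


# Hypothetical throughput values from five runs; one run was disturbed.
runs = [41.8, 42.3, 42.1, 35.0, 42.2]

if __name__ == "__main__":
    med, avg = summarize_runs(runs)
    print("median:", med, "mean:", round(avg, 2))
```

A single disturbed run pulls the mean down noticeably, while the median stays at a value that was actually measured in a typical run.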
What is interesting about these comparisons is the fact that the interrupt load increased from 4.3 to current while system and crypto load decreased. Another remarkable thing is that there is no headroom left in throughput with the Soekris, so it is at its limit. On the other hand, it seems to be possible to squeeze a few more Mbit/s out of the Liantec with some smart tuning.
As mentioned before, the Liantec box became really hot while testing.
$ sysctl -a | grep hw.sensors.lm1.temp
hw.sensors.lm1.temp0=63.00 degC
hw.sensors.lm1.temp1=62.50 degC
Compared to a Soekris:
test@soekris1# sysctl -a | grep hw.sensors.nsclpcsio0.temp
hw.sensors.nsclpcsio0.temp0=124.00 degC (Remote)
hw.sensors.nsclpcsio0.temp1=127.00 degC (Remote)
hw.sensors.nsclpcsio0.temp2=56.00 degC (Local)
test@soekris2$ sysctl -a | grep hw.sensors.nsclpcsio0.temp
hw.sensors.nsclpcsio0.temp0=113.00 degC (Remote)
hw.sensors.nsclpcsio0.temp1=127.00 degC (Remote)
hw.sensors.nsclpcsio0.temp2=51.00 degC (Local)
It is obvious that the results of the Soekris are not correct, so either the Soekris sensor is broken or the OpenBSD driver has bugs.
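Readings like these can be filtered automatically before they end up in a monitoring graph. The following sketch parses the sysctl output shown above and flags temperatures at or above a limit. The limit of 100 degC is our own arbitrary assumption for “implausible”, not a value from any data sheet, and the function name is made up for the example.

```python
import re

# Matches lines such as "hw.sensors.lm1.temp0=63.00 degC".
SENSOR = re.compile(r"^(hw\.sensors\.\S+?)=(-?\d+(?:\.\d+)?) degC", re.MULTILINE)


def implausible_temps(sysctl_output, limit_degc=100.0):
    """Return {sensor name: value} for all readings at or above limit_degc."""
    flagged = {}
    for name, value in SENSOR.findall(sysctl_output):
        if float(value) >= limit_degc:
            flagged[name] = float(value)
    return flagged


# Output copied from the two listings above.
SOEKRIS1 = """\
hw.sensors.nsclpcsio0.temp0=124.00 degC (Remote)
hw.sensors.nsclpcsio0.temp1=127.00 degC (Remote)
hw.sensors.nsclpcsio0.temp2=56.00 degC (Local)
"""
LIANTEC = """\
hw.sensors.lm1.temp0=63.00 degC
hw.sensors.lm1.temp1=62.50 degC
"""

if __name__ == "__main__":
    print("Soekris:", implausible_temps(SOEKRIS1))
    print("Liantec:", implausible_temps(LIANTEC))
```

With this limit, both “Remote” sensors of the Soekris are flagged, while the “Local” sensor and both Liantec sensors pass.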
While testing the Soekris and Liantec boxes, some problems occurred. First of all, broken Soekris boards suck. One of the test boards was not able to warm-reboot, which can be really annoying when you keep rebooting into different kernels.
A problem with the Liantec boxes is that they get way too hot. You have to put them in a cold place and must not stack them, otherwise they will probably die because of their heat. Additionally, putting a CF card into the Liantec is really annoying. You have to disassemble the whole box, which means you have to unscrew the entire case, the CPU heat sink and the mainboard. When you are finished, you can easily put the CF card into the slot on the bottom of the mainboard and assemble the whole box again. Besides that, the test setup could have been better. For IPSec and Soekris testing the workstations are fast enough, but for Gbit throughput tests they are not.
What to do next
It would be interesting to test some different buffers and sizes (not just net.inet.tcp.sendspace=32768). Another interesting test would be to further customise the kernel, e.g. throw IPSec out and see how fast it will be. When there is time, it would also be nice to test separate crypto accelerators like the vpn1401 from Soekris Engineering, but from what we have heard, they are worthless.
Coming to the conclusion of our tests, what did we gain? The tests showed that it is a better choice to use a generic kernel with default settings instead of a crippled one, because this way you lose only a few Mbit/s of performance but have much less maintenance work.
As mentioned earlier, the Liantec box is about twice as fast as the Soekris, but also more expensive. Which one you should buy depends on your needs; both have major advantages and disadvantages.
A while ago, one of our customers had an inquiry regarding good thinclient products which can be used with NoMachine (NX) servers. We actually found a nice thinclient based on VIA Eden. This hardware is shipped with a pre-installed Linux based operating system called “2X” which offers the possibility to connect to different remote terminal services like NX, Citrix, Remote Desktop (RDP) and Virtual Network Computing (VNC).
To get to know the hardware we purchased one unit. At first sight the shipped operating system was okay, yet second looks revealed deficiencies. One shortcoming was the really old NX client version. Another problem for our customer was that this software was built with some proprietary components which couldn’t be replaced by free alternatives easily.
Our client asked us if the shipped operating system can be replaced with an OpenBSD installation and some free programs to connect to the different terminal services. Since the operating system is installed on a compact flash card, used as the harddisk, it was just a matter of replacing this card with an empty one and booting an OpenBSD kernel via the network.
The first installation of OpenBSD 4.3 went quite well. Almost all hardware was detected properly, mainly because this machine is based on standard PC hardware. Only the graphics card didn’t work with Xorg at first. The problem was that the shipped VIA driver doesn’t support the newer graphics chips. We talked to Matthieu Herrb (matthieu@) of the OpenBSD project and he told us that there’s a replacement driver called openChrome which does support these chips.
After fixing some compile problems we got the openChrome driver to work on OpenBSD. After creating a small Xorg configuration we plugged in one of our 22" displays and it just started with the correct resolution. To get some more test results for this new driver, it went into the official OpenBSD snapshots for some days so users could test it. Matthieu Herrb committed the driver to the -current CVS tree on Jul 12 and made it the default for all VIA graphics chips since it’s much better than the old driver. Thanks Matthieu!
In the meantime we developed a basic prototype for an OpenBSD thinclient. The goal was to offer the same features as the shipped operating system does, but with open source software only. The first test version that went out to the customer included a really slim window manager, jwm, and opennx to connect to NX servers. Our own tests went quite well (we’re actually using this system on a daily basis) and the initial feedback from our customer was quite good.
Here are some screenshots of our initial prototype:
The day started with a nice walk in the sun from our apartment in the Jewish quarter to the conference center just on the other side of the river.
Today one of my favourite speakers, Dinis Cruz, gave his talk on OWASP. His talk was actually split into two parts: first OWASP, then “ounce F1”, which is his enhancement to the Ounce Labs product. Dinis is an amazing speaker, full of energy and jumping around on stage.
The afternoon was filled with hanging out on the lawn outside in the sun, talking to various people about binpatchng and infosec in general, as well as giving my second interview with Tyrel from Sites Collide. We mostly covered topics like flashable devices, the loss of quality in software because of easily updateable devices and of course binpatchng. I will post a link once Tyrel has published the interview.
The interview I did with Tyrel at last year’s Confidence conference.
One of the nice things about conferences like Confidence is that there are certain people, like Dinis, that you meet every now and then at conferences and have these amazing discussions with. For me this is one of the best ways to gather new input, collect thoughts and be inspired about playing with and hacking new technology.
The evening was filled with going for food with all the speakers and of course having the obligatory beer!
’till next year at Confidence ’09!
Back in Krakow!
Like last year, I was invited to Confidence again. I arrived on Thursday (after spending hours on the tarmac of Warsaw airport) and experienced the very warm welcome of the organizers. At the airport I met up with Alesio and we were immediately taken to a pub to “recover” from the flight stress. ;)
The first day of the conference was filled with talking to many of the people that you meet at these places. It’s always really nice meeting your friends all over the place. Being at Confidence as a speaker is incredible, since the organizers do whatever they can to make you happy.
Hey, I even got a plate of fruit (thanks Anna) when I did not want the bbq. :)
Stay tuned for more, both on the second day of Confidence and on binpatchng. ;)
Yesterday the Debian Security Team announced a serious security issue in Debian’s openssl package. You probably know about it already because it has been on all the news sites.
We just want to remind people how important this is and how it can affect your systems.
The Debian announcement reads:
It is strongly recommended that all cryptographic key material which has
been generated by OpenSSL versions starting with 0.9.8c-1 on Debian
systems is recreated from scratch. Furthermore, all DSA keys ever used
on affected Debian systems for signing or authentication purposes should
be considered compromised; the Digital Signature Algorithm relies on a
secret random value used during signature generation.
The first vulnerable version, 0.9.8c-1, was uploaded to the unstable
distribution on 2006-09-17, and has since propagated to the testing and
current stable (etch) distributions.
Affected keys include SSH keys, OpenVPN keys, DNSSEC keys, and key
material for use in X.509 certificates and session keys used in SSL/TLS
This actually means that all SSH, OpenVPN, DNSSEC, SSL/TLS session and X.509 key material which has been generated on a Debian machine after Sep 17 2006 is probably vulnerable.
This is quite bad and will create a lot of work for Debian sysadmins.
So if you’re using Debian on any of your systems, we recommend reading the actual security announcement and its follow-ups.
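The rules from the announcement can be written down as a small triage helper for a key inventory. This is only a sketch under assumptions: the function and parameter names are made up, and the generation date is used as a rough proxy for “generated by a vulnerable package version”, which errs on the side of replacing too many keys.

```python
from datetime import date

# Upload date of the first vulnerable package (0.9.8c-1), as quoted above.
FIRST_VULNERABLE_UPLOAD = date(2006, 9, 17)


def must_replace(generated_on, generated_on_debian,
                 is_dsa=False, used_on_affected_debian=False):
    """Decide whether a key should be treated as compromised.

    generated_on            -- date the key was created
    generated_on_debian     -- True if it was created on a Debian system
    is_dsa                  -- True for DSA keys
    used_on_affected_debian -- True if the key was ever used for signing or
                               authentication on an affected Debian system
    """
    generated_weak = generated_on_debian and generated_on >= FIRST_VULNERABLE_UPLOAD
    dsa_exposed = is_dsa and used_on_affected_debian
    return generated_weak or dsa_exposed


if __name__ == "__main__":
    print(must_replace(date(2007, 3, 1), True))
    print(must_replace(date(2005, 1, 1), False,
                       is_dsa=True, used_on_affected_debian=True))
```

The second call shows the less obvious case: a DSA key created elsewhere long before the bug still counts as compromised once it has been used on an affected machine.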
Earlier this year, we noticed an increasing stability problem with our main mailserver, mail1.bytemine.net. At first we blamed it on the increased load due to the ever increasing volume of spam. We decided to move the biggest resource hog, SpamAssassin, away from this machine.
Since it mostly processes text and binary content in e-mails, SpamAssassin eats CPU for breakfast. For a few weeks this seemed to have helped quite a bit, but then our problems returned even worse. Thanks to the nice system data graphing suite symon, Bernd noticed that our mailserver was constantly doing roughly 3 to 4 MB/s of disk I/O. While this does not seem like much at first, on average it meant that the disks were always busy and data meant to go to disk was waiting around in the buffer cache.
Additionally we were now watching the machine with systat(1), where you can watch the I/O stats very nicely in real time.
Around the same time we got a new machine with a LSI MegaRaid controller (MegaRAID SAS 8408, in OpenBSD driven by the mfi(4) driver) and while debugging performance issues with the raid, we figured out that it came with very safe, but slow default settings:
- all disk caches were disabled
- the controller cache was disabled
- the controller was in Write-Through mode
Then we checked the manual for the LSI MegaRaid controller present in our mailserver (LSI MegaRAID SATA 150-4 driven by the ami(4) driver), discovered that the same options were present there, and assumed they would have the very same defaults. The mailserver was initially deployed in 2005 and until November 2007 never showed any signs of I/O congestion. Yet it definitely looked to us as if we were having big-time problems with I/O.
By mid-February we were having resource starvation symptoms every other day, with highly sluggish behavior during peak hours. Out of nowhere userland would freeze, while the kernel was still answering ICMP packets.
Once we went down to Frankfurt to look at the controller and saw that the very same defaults were used, it became quite clear. Bernd reconfigured the controller to use more sane settings and within minutes we saw a totally different I/O behavior. The following graph, showing the days before and after the changed controller settings, displays this very well.
Looking at the week after we adjusted controller settings, the difference in I/O seems enormous. What happened was:
- with the default controller settings, data could not get to the disks quickly enough, so we constantly dragged pending I/O along with us. With the adjusted settings, data gets to the disks quickly, so we no longer build up such a congested situation.
We then took some time to look at some of the recorded data from last year. And suddenly the picture became all clear. Since November 2007 we had seen an increase in I/O of roughly 700%. No wonder that the “safe” (but slow) default settings didn’t hinder us before: the machine was bored, yet with more and more I/O coming in, they suddenly became a bottleneck. As the big increase in I/O is reading from the disk, we suspect that more and more of our customers have switched to using IMAP, which causes a bigger performance hit than POP3. The following graph shows I/O over the last year:
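The effect can be illustrated with a toy model: as long as the disks can write faster than data arrives, nothing queues up, but as soon as the arrival rate exceeds what the controller can push to disk, the backlog grows every second. The sketch below is only an illustration with made-up rates, not a model of our actual controller.

```python
def backlog_over_time(arrival_mb_s, service_mb_s, seconds):
    """Return the pending I/O in MB at the end of each second.

    arrival_mb_s -- data arriving per second (constant, in MB)
    service_mb_s -- data the disks can absorb per second (in MB)
    """
    pending = 0.0
    history = []
    for _ in range(seconds):
        # What cannot be written in this second stays in the queue.
        pending = max(0.0, pending + arrival_mb_s - service_mb_s)
        history.append(pending)
    return history


if __name__ == "__main__":
    # Hypothetical rates: a slow write path versus a faster one.
    print("slow:", backlog_over_time(4.0, 3.0, 5))
    print("fast:", backlog_over_time(4.0, 10.0, 5))
```

In the slow case the queue grows by the difference between the two rates every second and never drains; in the fast case it stays empty.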
While updating machines at our colocation in Frankfurt last Friday, Bernd and I hit a pretty harsh bug. Because of changes in the disklabel code, the disklabel of our main customer mailserver got updated when the machine first booted the new OpenBSD kernel. After updating all packages and configurations on the machine we gave the server another reboot. This time the machine got stuck in the first-stage bootloader, complaining that there was no OS to boot.
Quite baffled, we booted the bsd.rd kernel, just to see that there was no disklabel present. Now, usually you have backups of /var/backups, where OpenBSD keeps information like this around. Of course, these files were not in the backup (time to blush on my part). Thanks to a hint from our fellow OpenBSD developer Henning, we gave scan_ffs a try. Since scan_ffs is not present in the ramdisk kernels, we compiled it statically and placed it on a USB stick. scan_ffs scans a disk and outputs the partitions it finds. With a bit of thinking, we could then create a disklabel with the correct data. Once we had /var mounted, we could use the saved disklabel there to make sure the label was correct.
The manpage for scan_ffs contains the following advice:
The basic operation of this program is as follows:
1. Panic. You usually do so anyways, so you might as well get it over with. Just don’t do anything stupid. Panic away from your machine. Then relax, and see if the steps below won’t help you out.
Very true. While we did not panic, the best thing to do in such situations is to think, evaluate your options and possible recovery methods, and not be too quick in trying to fix things, as you will likely make it worse ;)