Experience migrating from VMware ESXi to KVM in a production environment

My notes from setting up a production KVM environment, after migrating from VMware ESXi 5.1 to Ubuntu 12.04.2 64-bit with Linux Kernel 3.2, QEMU 1.4.2 and Open vSwitch.

General

Disk I/O throughput and performance characteristics

  • Always use LVM-backed storage (which is aligned), with cache='none' and io='native' (aio) for guests. Disabling the cache allows the host system to properly schedule disk reads and writes
  • Use the deadline I/O scheduler on host systems, and set vm.swappiness = 0 (or equivalent) on the host to reduce pressure on I/O resources and make use of host memory
  • Use virtio as the bus type, if supported by guest operating system drivers, to allow direct access to storage instead of going through QEMU
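
In libvirt terms, the disk settings above end up on a single <disk> element. The following is a minimal sketch, assuming a raw LVM logical volume; the helper name disk_xml, the volume path /dev/vg0/guest1 and the target name vda are made up for illustration and are not from my setup.

```shell
#!/bin/sh
# Print a libvirt <disk> element for an LVM-backed guest disk.
# $1 = logical volume path (hypothetical), $2 = target device name in the guest.
disk_xml() {
    lv_path="$1"
    target="$2"
    cat <<EOF
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='${lv_path}'/>
  <target dev='${target}' bus='virtio'/>
</disk>
EOF
}

# Example: print the element for one guest disk
disk_xml /dev/vg0/guest1 vda
```

The printed element can be pasted into the <devices> section of a domain with virsh edit.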

Processor

  • Pass through CPU flags to guest to take advantage of newer instruction sets, assuming host hardware is the same or migration is not going to be used (-cpu host)

Network

  • Use virtio Network adapters (except with Windows) to realise full throughput and lower latency on guest operating systems, where support is available (Linux 2.6+, FreeBSD)
  • Load vhost_net kernel module on host, which permits direct access to network devices skipping QEMU (libvirt will detect if vhost_net is enabled, and add vhost=on to qemu command line by default)
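
A quick way to confirm the module is present before starting guests is to look it up in the kernel's module table. This is only a sketch: the function name module_loaded is my own, and it will report "not loaded" if vhost_net happens to be compiled into the kernel rather than built as a module.

```shell
#!/bin/sh
# Return success if kernel module $1 is listed in the module table.
# $2 is the table to read; it defaults to /proc/modules on Linux.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Typical use on the host (modprobe needs root):
#   module_loaded vhost_net || modprobe vhost_net
```

On Ubuntu, adding vhost_net on its own line to /etc/modules should load it at boot.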

Software

  • Linux Kernel 3.5, distributed with Ubuntu 12.04.2, does not support building open-vswitch; you must install Kernel 3.2 for the DKMS module to build properly
  • Build QEMU from source to include new functionality and Hyper-V enhancements for Windows guests, using the 1.4 stable branch – 1.5 does not work with libvirt due to the way QEMU help parameters are parsed by the library

Guest

Linux

  • vm.swappiness = 0

FreeBSD

  • vm.defer_swapspace_pageouts = 1
  • kern.timecounter.hardware=ACPI-fast
  • kern.timecounter.smp_tsc="1"
  • kern.timecounter.invariant_tsc="1"
  • Use prebuilt virtio drivers, or compile from ports under emulators/virtio-kmod after each system upgrade

Windows

  • Disable HPET via qemu (-no-hpet) or libvirt configuration, force use of TSC to reduce time drift in guest
  • If you are using qemu 1.4 or greater, enable CPU flags (Hyper-V shims) hv_vapic,hv_relaxed,hv_spinlocks=0xffff on top of disabling HPET
  • Install the Memory Ballooning service and drivers, and the SCSI (virtio) drivers, using the stable branch from Red Hat
  • Use e1000 Ethernet for Windows 2008 R2 to avoid high latency/freezing of guest operating system

libvirt

  • Add xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' to the <domain> element, if you are going to add custom qemu command args
  • If you are using Ubuntu and want to change the version of qemu you are going to use, you will either need to disable AppArmor, or update the profile to include the directory you've installed the alternative qemu version to
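
With the namespace in place, custom arguments go into a <qemu:commandline> block placed directly under <domain>. Here is a small sketch that prints such a block; the helper name qemu_commandline_xml is hypothetical, and the arguments in the example are the Windows guest flags listed earlier. How a passed-through -cpu interacts with the one libvirt generates is worth checking on your own versions.

```shell
#!/bin/sh
# Print a <qemu:commandline> block with one <qemu:arg> per argument given.
qemu_commandline_xml() {
    echo "<qemu:commandline>"
    for arg in "$@"; do
        printf "  <qemu:arg value='%s'/>\n" "$arg"
    done
    echo "</qemu:commandline>"
}

# Example: the Hyper-V flags and HPET switch from the Windows guest notes
qemu_commandline_xml -cpu host,hv_vapic,hv_relaxed,hv_spinlocks=0xffff -no-hpet
```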

Migration

  • Make sure to uninstall VMware Tools in the guest environment after you have migrated, and install acpid on the guest to allow graceful shutdowns (if Linux)
  • To reduce downtime, take a snapshot of the virtual machine and transfer the VMDK while the VM is still running; then shut down the VM and use a utility such as vmdksync to merge the deltas into the transferred copy
  • For Windows VMs, make sure to add a dummy virtio SCSI and Ethernet device so you can install drivers and then switch the root drive to virtio

Sophos AntiVirus (SAVDI) and amavisd-new for AntiVirus on email

Update: Based on an email I received, I've updated this post with more relevant information regarding setting up SAVDI with amavisd-new.

I recently migrated to using Postfix with amavisd-new on Ubuntu Linux, and was looking at integrating Sophos AntiVirus with amavisd-new. The amavisd-new shipping with the LTS release of Ubuntu is 2.6.5, which does not include SSSP functionality for communicating with savdi, so you must use the Sophie protocol.

The following components were used for setting up this functionality with amavisd-new from MySophos Download & Updates:

This post assumes that you have already set up amavisd-new on your system and integrated it with Postfix or an equivalent MTA.

savdid.conf:

channel {
    commprotocol {
        type: UNIX
        socket: /var/run/savdid/savdid.sock
        user: amavis
        group: amavis
        requesttimeout: 120
        sendtimeout: 2
        recvtimeout: 5
    }

    scanprotocol {
        type: SOPHIE
        allowscandir: SUBDIR
        maxscandata: 500000
        maxmemorysize: 250000
        tmpfilestub: /tmp/savid_tmp
    }

    scanner {
        type: SAVI
        inprocess: YES
        maxscantime: 3
        maxrequesttime: 10
        deny: /dev
        deny: /home
        savigrp: GrpArchiveUnpack 0
        savigrp: GrpInternet 1
        savists: Xml 1
    }
}

This should permit amavisd-new to communicate with the SAVDID interface. Don't forget to create the appropriate init.d script to start savdid on boot, and make sure it creates the /var/run/savdid directory, as Debian/Ubuntu clean /var/run on system startup. Please download the init script from here and place it in /etc/init.d/savdid with the executable bit set.
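
Because /var/run is wiped at boot, the init script has to recreate the socket directory each time. A minimal sketch of that step follows; the function name ensure_run_dir is my own, and the amavis:amavis owner simply mirrors the user and group from savdid.conf above.

```shell
#!/bin/sh
# Create the runtime directory for the savdid socket if it is missing.
# $1 = directory, $2 = optional owner (user:group), applied only when run as root.
ensure_run_dir() {
    dir="$1"
    owner="$2"
    mkdir -p "$dir" || return 1
    if [ -n "$owner" ] && [ "$(id -u)" -eq 0 ]; then
        chown "$owner" "$dir" || return 1
    fi
}

# In the start) branch of /etc/init.d/savdid, before launching the daemon:
#   ensure_run_dir /var/run/savdid amavis:amavis
```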

amavisd-new communicates with SAVI using the Sophie protocol. To enable this support in amavisd-new, edit /etc/amavis/conf.d/15-av_scanners and add the following lines:

  ['Sophie',
    \&ask_daemon, ["{}/\n", '/var/run/savdid/savdid.sock'],
    qr/(?x)^ 0+ ( : | [\000\r\n]* $)/m,  qr/(?x)^ 1 ( : | [\000\r\n]* $)/m,
    qr/(?x)^ [-+]? \d+ : (.*?) [\000\r\n]* $/m ],

Please be aware that the socket line points at the place where we have SAVI listening for connections, as configured in savdid.conf above.

Jabber/XMPP via Trillian on ejabberd with multiple Android devices

If you have an ejabberd deployment, and you use Trillian and try to sign in on multiple devices you may notice that you are disconnected on one of the Android devices (usually the oldest) when the other device signs in. As one of the benefits of Trillian is to be able to use chat on multiple devices and have messages sync, this is annoying and defeats the purpose of Trillian.

The reason that Trillian has this behaviour with Jabber/XMPP is because the Resource name on all devices is the same ("Android" by default, or configurable statically.) ejabberd will disconnect the old connection with the same resource name.

In the 2.1.9 release of ejabberd, a new configuration option called resource_conflict was added to handle duplicate resource names. With the new behaviour, a second connection with the same name is assigned a random resource name, allowing both devices to stay connected.

To enable this functionality, add the following stanza to your /etc/ejabberd/ejabberd.cfg (or equivalent location):

{resource_conflict, setresource}.

Then restart ejabberd, or reload the configuration via ejabberdctl. This should resolve the issue with resource name conflicts, and allow you to use ejabberd with Trillian to provide a seamless mobile experience!

Actiontec releases removed from Open Source Download Area

Update on March 26th, 2011: Actiontec has put the files for V1000H back on their Open Source Download Area. The files are now missing the Broadcom userspace/private directory, and will not provide you with a working open source build out-of-the-box (missing a lot of userspace binary blobs.) The last known good release for the V1000H is 2011/01/18. Please note that this new release will cause your device to be put into an irreparable state if you flash it.

Update on September 24th, 2011: Actiontec has once again removed the open source downloads for a number of their products on their Open Source Download Area site. I have decided to republish the tarballs of the source in response to this action.

Actiontec has removed all traces of releases for the Actiontec V1000H, R1000H, and Q1000 from their Open Source Download Area. These routers are running Broadcom 96368VVW reference design which uses Linux and BusyBox, and thus under the terms of GPL the source code must be released. I e-mailed the group in charge of Open Source compliance at Actiontec, and received the following response:

We are in the middle of revamping all of those codes and until our engineering department is done they have asked that we take the old ones down. Sorry for any trouble this has cause. I do not have an ETA on when I will have them back up there. I can only suggest that you keep checking back periodically.

I have an archive of most files that Actiontec has released based around the BCM96368 reference platform, and have posted them below. This issue adds onto my already growing list of issues with Actiontec, such as posting source code that doesn't work on the device, or is missing critical drivers (eg. Ethernet, HPNA.) The reason for multiple releases with the same filename and version is that Actiontec re-releases files under the same name with changed content.

Release | Size | MD5 | Model | Description
4.02L.01 (2010/03/16) | 123M | 02b864cadaafb829a59754f1fc5e9329 | Q1000 | Current release
4.02L.03 (2010/12/06) | 124M | 60d4dce62b6148c3eed09b757e9cb957 | R1000H |
4.02L.03 (2011/01/18) | 125M | eb4caa8ba7d307fc2eb567e7f00e306a | R1000H | Current release
4.02L.03 (2010/12/06) | 124M | b9c1171cbb88347f2350f6d6dec2c795 | V1000H | Drivers missing for Ethernet, HPNA
4.02L.03 (2011/01/18) | 124M | 2c5f750811bfbb8f043f70095801b244 | V1000H | Last known "good" release. Drivers missing for HPNA, missing Actiontec Web GUI
4.02L.03 (2011/03/24) | 130M | b6f379e5fdebbb8edf9b480d128a0d85 | V1000H | Incomplete release. Missing complete Web GUI and "private" Broadcom files (userspace/private), missing drivers for HPNA
31.30L.55 (2011/11/01) | 124M | b3f9161ecbf7c8a56e97e1ac22b52c0a | V1000H | Unknown release. No web interface (?)
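
Since Actiontec re-releases files under the same name, it is worth checking a downloaded tarball against the MD5 listed above before building it. A minimal sketch, assuming md5sum from GNU coreutils is available (FreeBSD ships md5 instead); the function name verify_md5 and the tarball filename in the example are placeholders.

```shell
#!/bin/sh
# Compare a file's MD5 digest with an expected value (case-insensitive).
# $1 = file to check, $2 = expected digest as 32 hex characters.
verify_md5() {
    file="$1"
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    actual="$(md5sum "$file" | cut -d ' ' -f 1)"
    [ "$actual" = "$expected" ]
}

# Example with a placeholder filename and the 2011/01/18 V1000H digest:
#   verify_md5 v1000h-source.tar.gz 2c5f750811bfbb8f043f70095801b244 && echo "checksum OK"
```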

It has also come to my attention that the releases for these products have been removed from the Qwest GPL Download site as well. TELUS, which also distributes the Actiontec V1000H, does not make an offer available for source code under the terms of the GPL.

Please view the README authored by Actiontec for more information on how to build and flash the open source release.

Android's K-9 Mail battery life and Dovecot's Push-IMAP

What is Push-IMAP, and why is it useful?

In the world of mobile phones, battery life is a concern. You want to be able to maximize the battery life on your mobile phone, while still getting instant notifications of new e-mail. This is where Push-IMAP (aka IMAP PUSH, P-IMAP) comes into play, an approach that relies on the IMAP IDLE extension (RFC 2177). You no longer need to poll the IMAP server if you are using this feature, as you always have an open connection.

What's the problem?

I am using K-9 Mail on my Android phone and would like to make use of the IMAP PUSH feature, but I found it consumes far too much battery. In Dovecot 1.2, when you initiate IMAP IDLE via your IMAP client (e.g. K-9 Mail), Dovecot sends an "OK Still here" message every 2 minutes. This forces the mobile data connection to wake up and consumes excessive amounts of battery. There is no way to configure this behavior in Dovecot 1.2 except by a source edit.

The solution

You will need to upgrade your Dovecot installation to 2.0 (if you aren't running 2.0 already,) which is slightly out of the scope of this blog entry. I found the upgrade rather painless by following the Upgrading Dovecot v1.2 to v2.0 guide on the Dovecot Wiki.

Dovecot 2.0 supports a configuration option called imap_idle_notify_interval which enables you to specify the interval between "OK Still here" messages. K-9 Mail by default refreshes IDLE connections every 24 minutes, but of course Dovecot wakes up the client much more frequently than that. We are going to fix this behavior.

The configuration of Dovecot 2.0 is slightly different from Dovecot 1.2, consisting of multiple files. If you are using Linux your Dovecot configuration is most likely contained under /etc/dovecot, and on FreeBSD it is contained under /usr/local/etc/dovecot. You will want to edit the conf.d/20-imap.conf file under the respective directory based on your host operating system.

You will see a stanza similar to the one outlined below, and you will want to uncomment the imap_idle_notify_interval line and replace 2 mins with 29 mins.

protocol imap {
 # How long to wait between "OK Still here" notifications when client is
 # IDLEing.
 #imap_idle_notify_interval = 2 mins
}

When you have completed this step, you will want to restart Dovecot. This can be accomplished on Linux with /etc/init.d/dovecot restart, or /usr/local/etc/rc.d/dovecot restart on FreeBSD.
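
For repeatable setups, the edit to 20-imap.conf can also be scripted before the restart. This is a sketch only: the function name set_idle_interval is my own, and it assumes the setting appears once, commented or not, in the form shown in the stanza above. It writes through a temporary file rather than using sed -i, whose syntax differs between Linux and FreeBSD.

```shell
#!/bin/sh
# Uncomment imap_idle_notify_interval in a Dovecot config file and set its value.
# $1 = path to 20-imap.conf, $2 = new interval, e.g. "29 mins".
set_idle_interval() {
    conf="$1"
    interval="$2"
    tmp="$(mktemp)" || return 1
    # Keep the leading indentation, drop an optional '#', replace the value.
    sed "s/^\([[:space:]]*\)#\{0,1\}[[:space:]]*imap_idle_notify_interval[[:space:]]*=.*/\1imap_idle_notify_interval = ${interval}/" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (Linux path from above):
#   set_idle_interval /etc/dovecot/conf.d/20-imap.conf "29 mins"
```

After the restart, doveconf -n should list the new imap_idle_notify_interval value among the non-default settings.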

My results

By switching the Dovecot server to send the "OK Still here" notification every 29 minutes instead of every 2 minutes, the mobile client is woken up far less often: only when you receive a new e-mail, or every 29 minutes. This has greatly improved the battery life on my HTC Desire Z with K-9 Mail, and hopefully it will help with battery life on your device while still letting you receive new e-mail notifications instantly!
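
To put rough numbers on the difference, here is the arithmetic for server keepalives per day at each interval. It is an upper bound that ignores new mail and the client's own IDLE refreshes, and the function name keepalives_per_day is just for illustration.

```shell
#!/bin/sh
# Upper bound on "OK Still here" notifications per day for an interval in minutes.
keepalives_per_day() {
    echo $(( 24 * 60 / $1 ))
}

keepalives_per_day 2     # 2 minute interval: prints 720
keepalives_per_day 29    # 29 minute interval: prints 49
```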

An update, RFC style

Clint Pachl e-mailed me to inform me that the IMAP4 IDLE RFC (RFC 2177) specifies that the client should re-issue the IDLE command at least every 29 minutes. I have updated the guide to reflect this change.

Because the K-9 default "Refresh IDLE connection" is 24 minutes, that gives a buffer of 5 minutes if the IMAP server timeout is set to 29 minutes. The RFC and K-9 default times, 29 and 24 respectively, don't seem like a coincidence. I think the RFC may have been an influence on the K-9 devs when choosing a default IDLE refresh.

Consequently, setting Dovecot's imap_idle_notify_interval to 29 minutes seems most appropriate considering K-9's default. This gives K-9 ample time to respond in case of short outages or passing between cell towers (<5min window). However, beyond that window, the server can then shut down the connection.

Setting both the server and the client to the same timeout/refresh may cause some cross-talk.