Linux 3.17 KVM, qemu 2.1, libvirt 1.2.9 experiences (and how to cleanly disable TCP checksum offload in libvirt)

Update: This issue has been resolved in the 3.18.10 kernel release. The instructions below are no longer required if your distribution has updated the kernel or backported the fix.
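A rough way to tell whether a host predates that release is to compare version strings. This is only a sketch: version_lt is a helper name made up for illustration, it relies on GNU sort -V, and a distribution kernel with a backported fix will still report an older version number.

```shell
#!/bin/sh
# True (exit 0) when version $1 sorts strictly before version $2.
# Relies on GNU sort -V; version_lt is a hypothetical helper name.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: strip the distribution suffix from `uname -r` and compare the
# result with the 3.18.10 release mentioned in the update note.
#   running=$(uname -r); running=${running%%-*}
#   version_lt "$running" 3.18.10 && echo "kernel predates the fix"
```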

Because of latency issues I was having with KVM and Windows 2008 R2 on Linux 3.10, I decided to update to the Linux 3.17 series despite the TCP checksumming issue I had been encountering (e.g. virtio-net not working at all between guests due to the CHECKSUM_PARTIAL bug in 3.11 and above).

I updated to Linux 3.17.1 and kept qemu at 2.0 (included in Ubuntu 14.04) and libvirt at 1.2.2. Unfortunately, the TCP checksumming bug still exists. However, the new kernel resolved my Windows 2008 R2 latency issues: I no longer see latency jumps to 1500ms or packet loss under load. This was with SR-IOV passthrough of a NIC.

Because of the issues I was experiencing with TCP checksumming, virtio-net and openvswitch, I decided to update to libvirt 1.2.9, which includes new support for tuning guest network interfaces. This allows me to cleanly turn off TCP checksumming on an interface using the following interface definition (and thus allows all my guests to function properly):

<interface type='network'>
  <model type='virtio'/>
  <driver name='vhost'>
    <guest csum='off' tso4='off' tso6='off'/>
  </driver>
</interface>
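To compare the offload state inside a Linux guest before and after applying this definition, the output of ethtool -k can be filtered for a single feature. The sketch below assumes the guest NIC is called eth0; offload_state is a helper name made up for illustration, and which feature actually changes (I would expect rx-checksumming for the guest-side options) should be confirmed on your own guest.

```shell
#!/bin/sh
# Reads `ethtool -k <iface>` output on stdin and prints the state ("on" or
# "off") of one named offload feature.
# offload_state is a hypothetical helper name, not part of ethtool.
offload_state() {
    # $1: feature name as printed by ethtool, e.g. rx-checksumming
    awk -F': ' -v feature="$1" '
        { sub(/^[ \t]+/, "", $1) }     # strip indentation of sub-features
        $1 == feature { split($2, s, " "); print s[1]; exit }
    '
}

# Example, run inside the guest (eth0 is an assumed interface name):
#   ethtool -k eth0 | offload_state rx-checksumming
```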

Additionally, my Sophos UTM 9 guest (which is my firewall) no longer halts cleanly, so I tried updating to qemu 2.1, but this did not solve the issue. I have decided to leave the newer releases in place, as they have also improved performance with the Windows guests.

For those interested, pre-built packages for Ubuntu 14.04 amd64 are available here.

New KVM deployment bugs and recommendations (Ubuntu 14.04: qemu 2.0, libvirt 1.2.4, Linux 3.10)

This is a new Linux KVM/qemu deployment running on Ubuntu 14.04 with the Linux 3.10 kernel and openvswitch. The hardware setup is two SSDs in RAID1 and two 7200RPM HDDs in RAID1 using mdadm. bcache provides the cache for the HDD array.

Bugs

  • hv_vapic ("vapic state='on'" in libvirt) causes Windows 2008 R2 and above VMs not to boot if the CPU is an Intel Ivy Bridge or newer (check /sys/module/kvm_intel/parameters/enable_apicv) – Red Hat Bugzilla
  • Linux 3.12 or greater (Ubuntu 14.04 ships with 3.13) has issues with virtio-net NIC and TSO (RX and TX checksumming) offloading – TCP sessions can't be established across virtual machines in certain situations (think of a virtual machine acting as a firewall) – Debian bug report
  • Windows virtual machines still freeze up or show high latency if you use a virtio NIC; this is with the latest signed drivers available from the Fedora Project
  • Still have issues with "Russian roulette" of network interfaces with openvswitch – Blog post
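The enable_apicv check from the first bug can be wrapped in a small helper. This is a sketch only: apicv_state is a name made up for illustration, and it assumes the parameter file contains Y or N as boolean module parameters usually do.

```shell
#!/bin/sh
# Prints "enabled", "disabled" or "unknown" depending on the kvm_intel
# enable_apicv module parameter. The path argument defaults to the sysfs
# file named in the bug list; apicv_state is a hypothetical helper name.
apicv_state() {
    param="${1:-/sys/module/kvm_intel/parameters/enable_apicv}"
    [ -r "$param" ] || { echo unknown; return 0; }
    case "$(cat "$param")" in
        Y|1) echo enabled ;;
        N|0) echo disabled ;;
        *)   echo unknown ;;
    esac
}

# Example, run on the KVM host:
#   apicv_state
```

If this prints "enabled" on the host, the hv_vapic bug above is the one to watch for with Windows guests.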

Recommendations

Installed Packages

System
apt-get install haveged ntp sysstat irqbalance acpid
Linux KVM, openvswitch, virt-install, virt-top
apt-get install qemu-kvm libvirt-bin virtinst virt-top openvswitch-switch sysfsutils iotop gdisk iftop
bcache
apt-get install python-software-properties
add-apt-repository ppa:g2p/storage && apt-get update && apt-get install bcache-tools

Tuning memory, scheduler and I/O subsystems for Linux KVM

Taken from RHEL 6 tuned (virtual-host)

/etc/sysctl.conf
kernel.sched_min_granularity_ns=10000000
kernel.sched_wakeup_granularity_ns=15000000
vm.dirty_ratio=10
vm.dirty_background_ratio=5
vm.swappiness=10
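
After running sysctl -p, the live values can be compared with the file. The following is a minimal sketch, assuming plain key=value lines as above; sysctl_drift is a helper name made up for illustration, and both of its arguments are optional.

```shell
#!/bin/sh
# Compares key=value lines from a sysctl.conf-style file with the live
# values under a /proc/sys-style tree and prints every mismatch.
# sysctl_drift is a hypothetical helper; both arguments have defaults.
sysctl_drift() {
    conf="${1:-/etc/sysctl.conf}"
    root="${2:-/proc/sys}"
    # Skip comments and blank lines, then split each line on '='.
    grep -v -e '^[[:space:]]*[#;]' -e '^[[:space:]]*$' "$conf" |
    while IFS='=' read -r key want; do
        key=$(echo "$key" | tr -d ' ')
        want=$(echo "$want" | tr -d ' ')
        # kernel.foo maps to <root>/kernel/foo
        path="$root/$(echo "$key" | tr . /)"
        have=$(cat "$path" 2>/dev/null || echo missing)
        [ "$have" = "$want" ] || echo "$key: want $want, have $have"
    done
}

# Example:
#   sysctl -p /etc/sysctl.conf && sysctl_drift
```

No output means every listed key already has the requested value.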

Disable experimental virtio-net zero copy transmit

RHEL 7 has experimental_zcopytx disabled by default.

/etc/modprobe.d/vhost-net.conf
options vhost_net experimental_zcopytx=0

Use virtio-blk for guests, and enable Multiqueue virtio-net (except Windows)

Linux KVM page describing Multiqueue

libvirt
<devices>
  <interface type='network'>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
  </interface>
</devices>

The number of queues should equal the number of virtual processors assigned to the virtual machine. Don't forget to enable the vhost_net kernel module: edit /etc/default/qemu-kvm and set VHOST_NET_ENABLED=1.

Make sure to enable Multiqueue support in the guest

ethtool -L eth0 combined 4
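
Rather than hard-coding 4, the guest can derive the value from its vCPU count and the maximum the NIC reports. This is a sketch under assumptions: eth0 is an assumed interface name, and max_combined and queue_count are helper names made up for illustration.

```shell
#!/bin/sh
# Reads `ethtool -l <iface>` output on stdin and prints the pre-set maximum
# number of combined channels. max_combined is a hypothetical helper name.
max_combined() {
    awk '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && $1 == "Combined:"   { print $2; exit }
    '
}

# Prints the smaller of the vCPU count ($1) and the NIC maximum ($2).
queue_count() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Example, run inside the guest:
#   max=$(ethtool -l eth0 | max_combined)
#   ethtool -L eth0 combined "$(queue_count "$(nproc)" "$max")"
```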

Use deadline scheduler, and enable transparent hugepages for KVM

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="elevator=deadline transparent_hugepage=always"

Don't forget to run update-grub and reboot for the changes to take effect.
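
After the reboot, both settings can be read back from sysfs, where the active option is shown in square brackets. A minimal sketch, assuming the disk is sda; active_choice is a helper name made up for illustration.

```shell
#!/bin/sh
# Prints the bracketed (active) option from a sysfs file that lists its
# choices on one line, e.g. "noop [deadline] cfq".
# active_choice is a hypothetical helper name.
active_choice() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Example (sda is an assumed device name):
#   active_choice /sys/block/sda/queue/scheduler
#   active_choice /sys/kernel/mm/transparent_hugepage/enabled
```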

For Windows guests, take advantage of Hyper-V enlightenments and use e1000 Ethernet adapter

Linux KVM presentation on Hyper-V enlightenment (slightly outdated)

  • hv_vapic (for "supported processors") for Virtual APIC
  • hv_time (aka "hypervclock") for TSC invariant timestamps passed to guest
  • hv_relaxed to prevent BSOD under high load (when a timer can't be serviced when expected)
  • hv_spinlocks lets the guest know when a virtual processor is trying to acquire a lock on the same resource as another processor
libvirt
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='4096'/>
  </hyperv>
</features>
<clock offset='localtime'>
  <timer name='hypervclock' present='yes'/>
  <timer name='hpet' present='no'/>
</clock>

Build and install longterm Linux 3.10 kernel for stability (and working openvswitch with virtio-net)

apt-get -y install build-essential
cd /usr/local/src
wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.44.tar.xz
tar -Jxf linux-3.10.44.tar.xz
cd linux-3.10.44
cp "/boot/config-$(uname -r)" .config
make olddefconfig
make -j"$(nproc)" INSTALL_MOD_STRIP=1 deb-pkg
dpkg -i ../*.deb
apt-mark hold linux-libc-dev

Time keeping is king on FreeBSD – TSC and "how not to have time go backwards in guest"

/etc/sysctl.conf
kern.timecounter.hardware=ACPI-fast
/boot/loader.conf
virtio_load="YES"
virtio_pci_load="YES"
virtio_blk_load="YES"
if_vtnet_load="YES"
virtio_balloon_load="YES"
kern.timecounter.smp_tsc="1"
kern.timecounter.invariant_tsc="1"
libvirt
<clock offset='localtime'>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>

openvswitch and libvirt: vnet port "russian roulette" on restart (solution)

Update: This issue has been resolved in the libvirt 1.2.7 release, or commit. The instructions below are no longer required if your distribution has updated the package.

libvirt has openvswitch integration. When a virtual machine that uses openvswitch for its network port is started, libvirt creates a vnetX interface (where X is an incrementing number, starting from 0) and destroys it again on shutdown. openvswitch's configuration is persistent: the vnetX port created by libvirt is saved to a database and will still be present after the next reboot.

As outlined in my bug report submitted in September 2013, this quickly breaks down if libvirtd is shut down after openvswitch (because libvirt can't delete the ports it has created) or if the machine is restarted or shut down incorrectly. If you have virtual machines on different VLANs or interfaces, ports can quickly end up assigned to the wrong virtual machine, as libvirt doesn't error out if the interface already exists when it tries to create it (imagine swapping around the LAN and WAN ports on a firewall).

I solved this by creating an upstart job override on the Ubuntu LTS releases in /etc/init/openvswitch-switch.override:

post-start script
    ovs-vsctl show | grep 'Port "vnet[0-9]*"' | awk -F'"' '{print $2}' | xargs -I {} ovs-vsctl del-port {} || :
end script
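
Before letting the override delete anything, it may be worth listing what it would match. The sketch below extracts the vnetX port names from ovs-vsctl show output; stale_vnet_ports is a helper name made up for illustration, and it accepts port names with or without quotes on the assumption that the output format differs between openvswitch versions.

```shell
#!/bin/sh
# Reads `ovs-vsctl show` output on stdin and prints the name of every port
# called vnet<N>, one per line. stale_vnet_ports is a hypothetical helper.
stale_vnet_ports() {
    sed -n 's/^[[:space:]]*Port "\{0,1\}\(vnet[0-9][0-9]*\)"\{0,1\}$/\1/p'
}

# Example: dry run first, then delete (run before libvirtd starts guests).
#   ovs-vsctl show | stale_vnet_ports
#   ovs-vsctl show | stale_vnet_ports | while read -r port; do
#       ovs-vsctl --if-exists del-port "$port"
#   done
```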

I've tested this issue and proven its existence on the openSUSE 12.3 (Dartmouth), Debian (stable) and Ubuntu 12.04/14.04 (LTS) distributions.