Linux 3.17 KVM, qemu 2.1, libvirt 1.2.9 experiences (and how to cleanly disable TCP checksum offload in libvirt)

Update: This issue was resolved in the kernel 3.18.10 release. The instructions below are no longer required if your distribution has updated its kernel or backported the fix.

Due to latency issues I was having with KVM and Windows 2008 R2 on Linux 3.10, I decided to update to the Linux 3.17 series despite the TCP checksumming issue I had been encountering (e.g. virtio-net not working at all between guests due to the CHECKSUM_PARTIAL bug in 3.11 and above).

I updated to Linux 3.17.1 and kept qemu at 2.0 (included in Ubuntu 14.04) and libvirt at 1.2.2. Unfortunately, the TCP checksumming bug still exists. However, the kernel update did resolve my Windows 2008 R2 latency issues: I am no longer seeing latency jumps to 1500ms or packet loss under load. This was with SR-IOV passthrough of a NIC.

Due to the issues I was experiencing with TCP checksumming, virtio-net and Open vSwitch, I decided to update to libvirt 1.2.9, which adds support for tuning offloads on guest network interfaces. This lets me cleanly turn off TCP checksum offload on an interface using the following interface definition (and thus allows all my guests to function properly):

<interface type='network'>
  <model type='virtio'/>
  <driver name='vhost'>
    <guest csum='off' tso4='off' tso6='off'/>
  </driver>
</interface>
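
If you have several guests to update, the same change can be scripted rather than edited by hand. The following is only a rough sketch, assuming you already have the domain XML as a string (for example from `virsh dumpxml`). The function name `disable_guest_offloads` and the sample domain are made up for illustration; only the `csum`, `tso4` and `tso6` attributes come from the definition above.

```python
import xml.etree.ElementTree as ET

# Offload attributes taken from the interface definition above.
GUEST_OFFLOADS = {"csum": "off", "tso4": "off", "tso6": "off"}


def disable_guest_offloads(domain_xml: str) -> str:
    """Return the domain XML with guest checksum/TSO offloads set to 'off'
    on every virtio interface. Hypothetical helper, not part of libvirt."""
    root = ET.fromstring(domain_xml)
    for iface in root.iter("interface"):
        model = iface.find("model")
        if model is None or model.get("type") != "virtio":
            continue  # only virtio interfaces are touched
        driver = iface.find("driver")
        if driver is None:
            # No backend name is set here; add name='vhost' yourself if you
            # want to match the definition above exactly.
            driver = ET.SubElement(iface, "driver")
        guest = driver.find("guest")
        if guest is None:
            guest = ET.SubElement(driver, "guest")
        for name, value in GUEST_OFFLOADS.items():
            guest.set(name, value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up minimal domain; the network name 'default' is an assumption.
    sample = """<domain type='kvm'>
  <devices>
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>"""
    print(disable_guest_offloads(sample))
```

The resulting XML would still need to be loaded back into libvirt (for instance with `virsh define`) and the guest restarted before the change takes effect.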

Separately, my Sophos UTM 9 guest (my firewall) no longer halts cleanly, so I tried updating to qemu 2.1, but this did not solve the issue. I have decided to leave the newer releases in place, as they have also improved performance with the Windows guests.

For those interested, pre-built packages for Ubuntu 14.04 amd64 are available here.

4 Comments.

  1. Have you tried the Ubuntu 14.10 stack with this issue? Also does multi-queue virtio-net work properly with open-vswitch?

    • I have not tried Ubuntu 14.10, but multi-queue virtio-net works fine with all stock Ubuntu 14.04 components.

      • Thanks for your response. I wasn't able to recreate the CHECKSUM_PARTIAL bug. According to the link you provided, it seems the VLAN tag would have to be passed to the guest for this bug to occur.

        • That is exactly the situation where the bug occurs. I have a trunked port on a virtual machine that runs a firewall, and this is where I run into the issue.