QEMU agent for graceful shutdown of Windows guests under libvirt/qemu-kvm

An ACPI shutdown request from libvirt doesn't always prompt Windows guests to shut down. That's what the QEMU guest agent is for (it is also handy for freezing/thawing guest file systems). Installing the QEMU guest agent also causes libvirt to block on shutdown commands until the guest has terminated.

libvirt XML definition required

<channel type='unix'>
<target type='virtio' name='org.qemu.guest_agent.0'/>
<address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>

You may also have to create /var/lib/libvirt/qemu/channel/target on the KVM host.

mkdir -p /var/lib/libvirt/qemu/channel/target
chown -R libvirt-qemu:kvm /var/lib/libvirt/qemu/channel/target

The virtio-win drivers distributed by the Fedora Project contain the guest agent and the virtioserial driver required for communication between guest and host. They can be downloaded as an RPM package called virtio-win.

Once you have the RPM, you can either install it or convert it to a Debian package using alien. The ISO will be installed to /usr/share/virtio-win/virtio-win.iso and can be mounted in the guest.

virsh attach-disk GuestName /usr/share/virtio-win/virtio-win.iso hdc --type cdrom --mode readonly

Once you have restarted the guest with the XML definition changes, complete the following steps in the guest:

  • In Device Manager, install the virtioserial driver for the PCI Simple Communication Controller from the vioserial folder
  • Install the guest-agent located under the guest-agent folder for your architecture in the virtio-win ISO
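
Once the agent service is running in the guest, the channel can be checked from the KVM host. The following is a minimal sketch, assuming a domain called GuestName (as in the attach-disk example above) and a virsh that supports qemu-agent-command and shutdown --mode. The function names agent_ping and agent_shutdown are my own.

```shell
# Ask the guest agent to answer a ping over the virtio-serial channel.
# This only succeeds if the agent inside the guest is installed and running.
agent_ping() {
    virsh qemu-agent-command "$1" '{"execute":"guest-ping"}'
}

# Request a shutdown through the guest agent instead of ACPI.
agent_shutdown() {
    virsh shutdown "$1" --mode agent
}

# Usage (on the KVM host):
#   agent_ping GuestName && agent_shutdown GuestName
```

If agent_ping fails, the usual suspects are the channel definition, the virtioserial driver, or the agent service not running in the guest.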

Intel E1G42ET (82576 controller) SR-IOV with Windows 2008 R2 guest

I've followed the Red Hat Enterprise Linux 7 Using SR-IOV guide, with the following changes to account for Ubuntu 14.04 and for the fact that the Intel driver set (PROWinx64) doesn't install the drivers automatically.

Make sure to bring the network link state up before you start the virtual machine, or the network driver will report "Network cable unplugged" permanently. igbvf doesn't want to detach on Linux 3.10 on Ubuntu 14.04, so I have blacklisted the module.

/etc/modprobe.d/blacklist-igbvf.conf

blacklist igbvf

/etc/modprobe.d/igb.conf

options igb max_vfs=7
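
After the igb module has been reloaded with max_vfs set, the virtual functions should show up as PCI devices on the host. The sketch below counts them and brings the physical function's link up before the guest starts. The interface name eth2 and the helper names count_vfs and pf_link_up are placeholders of mine, not taken from the setup above.

```shell
# Count the virtual functions visible on the PCI bus.
# Reads `lspci` output on stdin, so it also works on saved output.
count_vfs() {
    grep -c 'Virtual Function'
}

# Bring the physical function's link up before starting the guest.
# The interface name is passed in as the first argument.
pf_link_up() {
    ip link set dev "$1" up
}

# Usage (on the KVM host, as root; eth2 is a placeholder):
#   lspci | count_vfs
#   pf_link_up eth2
```

With max_vfs=7 the count should be a multiple of 7, one set per igb port.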

Download the latest Intel Virtual Function drivers from the Intel site and extract PROWinx64 with your favourite archival program. Then run the following command in the Windows guest:

pnputil -a PRO1000\Winx64\NDIS62\v1q62x64.inf

Then you can either go to Device Manager and scan for hardware changes, or restart the virtual machine. Your guest networking should now be working.

New KVM deployment bugs and recommendations (Ubuntu 14.04: qemu 2.0, libvirt 1.2.4, Linux 3.10)

New Linux KVM/qemu deployment running on Ubuntu 14.04 with the Linux 3.10 kernel and openvswitch. The hardware setup is two SSDs in RAID1 and two 7200 RPM HDDs in RAID1 using mdadm. bcache is used with the SSD array as the cache in front of the HDD array.

Bugs

  • hv_vapic ("vapic state='on'" in libvirt) prevents Windows 2008 R2 and later VMs from booting if the CPU is an Intel Ivy Bridge or newer (check /sys/module/kvm_intel/parameters/enable_apicv) – Red Hat Bugzilla
  • Linux 3.12 or greater (Ubuntu 14.04 ships with 3.13) has issues with virtio-net NICs and TSO (RX and TX checksumming) offloading: TCP sessions can't be established across virtual machines in certain situations (think of a virtual machine acting as a firewall) – Debian bug report
  • Windows virtual machines still freeze up or show high latency with a virtio NIC, even with the latest signed drivers available from the Fedora Project
  • Still have issues with "Russian roulette" of network interfaces with openvswitch – Blog post
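
To see whether a host is likely to hit the hv_vapic bug, check the kvm_intel parameter mentioned in the first item. A minimal sketch follows. The helper name apicv_state is hypothetical, and the optional path argument exists only so the function can be pointed at another file.

```shell
# Report the APICv state of the kvm_intel module.
# Prints the raw parameter value (typically Y or N), or "unknown" when the
# parameter file is absent (non-Intel host, or kvm_intel not loaded).
apicv_state() {
    param=${1:-/sys/module/kvm_intel/parameters/enable_apicv}
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo unknown
    fi
}

# Usage (on the KVM host):
#   apicv_state
```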

Recommendations

Installed Packages

System
apt-get install haveged ntp sysstat irqbalance acpid
Linux KVM, openvswitch, virt-install, virt-top
apt-get install qemu-kvm libvirt-bin virtinst virt-top openvswitch-switch sysfsutils iotop gdisk iftop
bcache
apt-get install python-software-properties
add-apt-repository ppa:g2p/storage && apt-get update && apt-get install bcache-tools

Tuning memory, scheduler I/O subsystems for Linux KVM

Taken from RHEL 6 tuned (virtual-host)

/etc/sysctl.conf
kernel.sched_min_granularity_ns=10000000
kernel.sched_wakeup_granularity_ns=15000000
vm.dirty_ratio=10
vm.dirty_background_ratio=5
vm.swappiness=10
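
The settings can be loaded without a reboot and then read back from /proc/sys. This is a small sketch. The helper names sysctl_path and show_sysctls are made up for illustration, and the key-to-path mapping assumes plain keys like the ones above (no dots inside a component).

```shell
# Map a sysctl key to its /proc/sys path,
# e.g. vm.swappiness -> /proc/sys/vm/swappiness.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print the live value of each key given as an argument.
# A key that does not exist on this kernel prints an empty value.
show_sysctls() {
    for key in "$@"; do
        printf '%s=%s\n' "$key" "$(cat "$(sysctl_path "$key")" 2>/dev/null)"
    done
}

# Usage (on the KVM host, as root):
#   sysctl -p /etc/sysctl.conf
#   show_sysctls vm.dirty_ratio vm.dirty_background_ratio vm.swappiness
```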

Disable experimental virtio-net zero copy transmit

RHEL 7 has experimental_zcopytx disabled by default.

/etc/modprobe.d/vhost-net.conf
options vhost_net experimental_zcopytx=0

Use virtio-blk for guests, and enable Multiqueue virtio-net (except Windows)

Linux KVM page describing Multiqueue

libvirt
<devices>
  <interface type='network'>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
  </interface>
</devices>

Here the number of queues equals the number of virtual processors assigned to the virtual machine. Don't forget to enable the vhost_net kernel module: edit /etc/default/qemu-kvm and set VHOST_NET_ENABLED=1.

Make sure to enable Multiqueue support in the guest

ethtool -L eth0 combined 4
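
Rather than hard-coding the queue count, it can follow the guest's vCPU count. This is a sketch assuming a Linux guest with ethtool and nproc available. The helper name enable_multiqueue is my own.

```shell
# Set the number of combined virtio-net queues, defaulting to the number of
# vCPUs in the guest, then print the resulting channel configuration.
enable_multiqueue() {
    iface=$1
    queues=${2:-$(nproc)}
    ethtool -L "$iface" combined "$queues" && ethtool -l "$iface"
}

# Usage (inside the Linux guest, as root):
#   enable_multiqueue eth0
```

The guest cannot use more queues than the host offers in the queues attribute of the libvirt driver element, so keep the two in step.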

Use deadline scheduler, and enable transparent hugepages for KVM

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="elevator=deadline transparent_hugepage=always"

Don't forget to run update-grub to make the changes persistent.
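
After the reboot, sysfs shows which scheduler and hugepage mode are actually active: the active value is the one in square brackets. A small sketch for reading it follows. The helper name active_choice and the device name sda are placeholders.

```shell
# Extract the bracketed (active) choice from a sysfs selector line,
# e.g. "noop [deadline] cfq" -> "deadline".
active_choice() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage (on the KVM host after reboot; sda is a placeholder device):
#   active_choice < /sys/block/sda/queue/scheduler
#   active_choice < /sys/kernel/mm/transparent_hugepage/enabled
```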

For Windows guests, take advantage of Hyper-V enlightenments and use the e1000 Ethernet adapter

Linux KVM presentation on Hyper-V enlightenment (slightly outdated)

  • hv_vapic (for "supported processors") for Virtual APIC
  • hv_time (aka "hypervclock") for TSC invariant timestamps passed to guest
  • hv_relaxed to prevent BSOD under high load (when a timer can't be serviced when expected)
  • hv_spinlocks lets the guest know when a virtual processor is trying to acquire a lock on the same resource as another processor
libvirt
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='4096'/>
  </hyperv>
</features>
<clock offset='localtime'>
  <timer name='hypervclock' present='yes'/>
  <timer name='hpet' present='no'/>
</clock>
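
To confirm that the enlightenments reached QEMU, look for the hv_ flags on the running guest's command line. This is a sketch assuming the flags appear as comma-separated hv_* entries in the -cpu argument. The helper name hv_flags is hypothetical.

```shell
# List the Hyper-V related flags from a QEMU command line.
# Reads a NUL-separated /proc/<pid>/cmdline on stdin, splits it on NULs and
# commas, and keeps the entries that start with "hv_".
hv_flags() {
    tr ',\0' '\n\n' | grep '^hv_' | sort -u
}

# Usage (on the KVM host; PID is the guest's QEMU process, e.g. from ps):
#   hv_flags < /proc/PID/cmdline
```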

Build and install longterm Linux 3.10 kernel for stability (and working openvswitch with virtio-net)

apt-get -y install build-essential
cd /usr/local/src
wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.44.tar.xz
tar -Jxf linux-3.10.44.tar.xz
cd linux-3.10.44
cp /boot/config-`uname -r` .config
make olddefconfig
make -j`nproc` INSTALL_MOD_STRIP=1 deb-pkg
dpkg -i ../*.deb
apt-mark hold linux-libc-dev

Time keeping is king on FreeBSD – TSC and "how not to have time go backwards in guest"

/etc/sysctl.conf
kern.timecounter.hardware=ACPI-fast
/boot/loader.conf
virtio_load="YES"
virtio_pci_load="YES"
virtio_blk_load="YES"
if_vtnet_load="YES"
virtio_balloon_load="YES"
kern.timecounter.smp_tsc="1"
kern.timecounter.invariant_tsc="1"
libvirt
<clock offset='localtime'>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>

KVM PCI Passthrough of an AHCI SATA controller to a guest causing data corruption

I recently migrated from VMware ESXi to Linux KVM, where I was using PCI Passthrough under VMware ESXi to pass through an Intel AHCI SATA controller to a guest. I implemented the same setup by enabling IOMMU on the KVM host, and passed through the AHCI SATA controller to the guest.

After a week or two, I started seeing the following messages in /var/log/syslog on the guest:

Aug  6 13:25:28 yama kernel: [78351.258573] XFS (md0): Corruption detected. Unmount and run xfs_repair
Aug  6 13:25:28 yama kernel: [78351.259102] XFS (md0): Corruption detected. Unmount and run xfs_repair
Aug  6 13:25:28 yama kernel: [78351.259616] XFS (md0): metadata I/O error: block 0x31214bd0 ("xfs_trans_read_buf_map") error 117 numblks 16
Aug  6 13:25:28 yama kernel: [78351.260203] XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
Aug  6 13:29:10 yama kernel: [78573.533933] XFS (md0): Invalid inode number 0xfeffffffffffffff
Aug  6 13:29:10 yama kernel: [78573.533940] XFS (md0): Internal error xfs_dir_ino_validate at line 160 of file /build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_dir2.c.  Caller 0xffffffffa045cd96
Aug  6 13:29:10 yama kernel: [78573.533940]
Aug  6 13:29:10 yama kernel: [78573.538440] Pid: 1723, comm: kworker/0:1H Tainted: GF            3.8.0-27-generic #40~precise3-Ubuntu
Aug  6 13:29:10 yama kernel: [78573.538443] Call Trace:
Aug  6 13:29:10 yama kernel: [78573.538496]  [<ffffffffa042316f>] xfs_error_report+0x3f/0x50 [xfs]
Aug  6 13:29:10 yama kernel: [78573.538537]  [<ffffffffa045cd96>] ? __xfs_dir2_data_check+0x1e6/0x4a0 [xfs]
Aug  6 13:29:10 yama kernel: [78573.538560]  [<ffffffffa045a150>] xfs_dir_ino_validate+0x90/0xe0 [xfs]
Aug  6 13:29:10 yama kernel: [78573.538579]  [<ffffffffa045cd96>] __xfs_dir2_data_check+0x1e6/0x4a0 [xfs]
Aug  6 13:29:10 yama kernel: [78573.538598]  [<ffffffffa045d0ca>] xfs_dir2_data_verify+0x7a/0x90 [xfs]
Aug  6 13:29:10 yama kernel: [78573.538637]  [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0
Aug  6 13:29:10 yama kernel: [78573.538664]  [<ffffffffa045d195>] xfs_dir2_data_reada_verify+0x95/0xa0 [xfs]
Aug  6 13:29:10 yama kernel: [78573.538675]  [<ffffffff8108e2aa>] ? finish_task_switch+0x4a/0xf0
Aug  6 13:29:10 yama kernel: [78573.538697]  [<ffffffffa042133f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]
Aug  6 13:29:10 yama kernel: [78573.538706]  [<ffffffff81078c21>] process_one_work+0x141/0x490
Aug  6 13:29:10 yama kernel: [78573.538710]  [<ffffffff81079be8>] worker_thread+0x168/0x400
Aug  6 13:29:10 yama kernel: [78573.538714]  [<ffffffff81079a80>] ? manage_workers+0x120/0x120
Aug  6 13:29:10 yama kernel: [78573.538721]  [<ffffffff8107f0f0>] kthread+0xc0/0xd0
Aug  6 13:29:10 yama kernel: [78573.538726]  [<ffffffff8107f030>] ? flush_kthread_worker+0xb0/0xb0
Aug  6 13:29:10 yama kernel: [78573.538730]  [<ffffffff816fc6ac>] ret_from_fork+0x7c/0xb0
Aug  6 13:29:10 yama kernel: [78573.538735]  [<ffffffff8107f030>] ? flush_kthread_worker+0xb0/0xb0

I initially used xfs_repair on the file system, thinking that the issue was caused by a number of power failures that happened when the machine was running ESXi. However, this did not resolve the issue and made the problem worse. Eventually I decided to scrap the file system, and pulled a drive from the array to back up the data and re-create the file system.

The drive that I pulled from the array for backups started showing the same issues with XFS corruption.

After further trial-and-error investigation, I determined that KVM PCI Passthrough was causing the issue and decided to pass an array through to the guest using virtio-blk instead. This solved the corruption problem and I haven't had any issues (knock on wood) since!
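
For reference, handing a host block device to the guest over virtio-blk can look roughly like this. It is a sketch only: the domain name GuestName, the host array /dev/md0, the target vdb and the helper name attach_virtio_disk are all placeholders, not the exact setup I used.

```shell
# Attach a host block device to a guest as a virtio-blk disk.
# A target name starting with "vd" typically makes libvirt pick the virtio bus.
attach_virtio_disk() {
    domain=$1
    source=$2
    target=${3:-vdb}
    virsh attach-disk "$domain" "$source" "$target" --persistent
}

# Usage (on the KVM host; names are placeholders):
#   attach_virtio_disk GuestName /dev/md0 vdb
```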