NetGear GS108T, NetGear R7000 Nighthawk AP and Intel igb/e1000e on Linux: adapter reset

What's the issue?

At home I have a NetGear GS108T switch with a NetGear R7000 serving as an access point. Whenever the R7000 restarts, my Ubuntu 16.04 machine, which functions as a hypervisor (qemu/kvm and openvswitch), hits an "adapter reset"/"transmit queue timed out" error on its Intel igb and e1000e NICs, causing a 30-60 second outage. The kernel logs show the following:

Jun 29 16:31:26 kvm kernel: [ 145.888552] ------------[ cut here ]------------
Jun 29 16:31:26 kvm kernel: [ 145.888557] NETDEV WATCHDOG: p6p1 (igb): transmit queue 0 timed out
Jun 29 16:31:26 kvm kernel: [ 145.888595] WARNING: CPU: 6 PID: 0 at /build/linux-hwe-0EwvTm/linux-hwe-4.15.0/net/sched/sch_generic.c:323 dev_watchdog+0x222/0x230
Jun 29 16:31:26 kvm kernel: [ 145.888597] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio ebtable_filter ebtables xt_multiport openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 binfmt_misc ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ppdev intel_rapl nf_conntrack_ipv6 nf_defrag_ipv6 x86_pkg_temp_thermal intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipt_REJECT nf_reject_ipv4 intel_cstate usblp xt_limit xt_tcpudp ipmi_si ipmi_devintf xt_addrtype intel_rapl_perf ipmi_msghandler lpc_ich parport_pc mei_me mei pcbc shpchp ie31200_edac mac_hid aesni_intel crypto_simd glue_helper cryptd aes_x86_64 dm_crypt vhost_net vhost tap kvm_intel kvm irqbypass nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack
Jun 29 16:31:26 kvm kernel: [ 145.888689] libcrc32c iptable_filter ip_tables x_tables sunrpc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid coretemp lp parport autofs4 btrfs xor zstd_compress raid6_pq raid1 raid10 ast ttm drm_kms_helper syscopyarea sysfillrect igb sysimgblt fb_sys_fops hid_generic dca i2c_algo_bit usbhid ahci ptp libahci drm mpt3sas pps_core hid raid_class scsi_transport_sas video
Jun 29 16:31:26 kvm kernel: [ 145.888761] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.15.0-54-generic #58~16.04.1-Ubuntu
Jun 29 16:31:26 kvm kernel: [ 145.888763] Hardware name: ASUSTeK COMPUTER INC. P9D-C Series/P9D-C Series, BIOS 1301 01/26/2015
Jun 29 16:31:26 kvm kernel: [ 145.888770] RIP: 0010:dev_watchdog+0x222/0x230
Jun 29 16:31:26 kvm kernel: [ 145.888773] RSP: 0018:ffff8a72efd83e68 EFLAGS: 00010282
Jun 29 16:31:26 kvm kernel: [ 145.888777] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
Jun 29 16:31:26 kvm kernel: [ 145.888779] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8a72efd96490
Jun 29 16:31:26 kvm kernel: [ 145.888782] RBP: ffff8a72efd83e98 R08: 0000000000000001 R09: 000000000000049c
Jun 29 16:31:26 kvm kernel: [ 145.888785] R10: 0000000000000000 R11: 000000000000049c R12: 0000000000000008
Jun 29 16:31:26 kvm kernel: [ 145.888787] R13: ffff8a72bda80000 R14: ffff8a72bda80478 R15: ffff8a72bda79940
Jun 29 16:31:26 kvm kernel: [ 145.888791] FS: 0000000000000000(0000) GS:ffff8a72efd80000(0000) knlGS:0000000000000000
Jun 29 16:31:26 kvm kernel: [ 145.888794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 29 16:31:26 kvm kernel: [ 145.888797] CR2: 00007fa1302dbf48 CR3: 00000001f120a002 CR4: 00000000001626e0
Jun 29 16:31:26 kvm kernel: [ 145.888800] Call Trace:
Jun 29 16:31:26 kvm kernel: [ 145.888803] <IRQ>
Jun 29 16:31:26 kvm kernel: [ 145.888812] ? dev_deactivate_queue.constprop.33+0x60/0x60
Jun 29 16:31:26 kvm kernel: [ 145.888819] call_timer_fn+0x32/0x140
Jun 29 16:31:26 kvm kernel: [ 145.888824] run_timer_softirq+0x1ed/0x440
Jun 29 16:31:26 kvm kernel: [ 145.888830] ? ktime_get+0x3e/0xa0
Jun 29 16:31:26 kvm kernel: [ 145.888838] ? lapic_next_deadline+0x26/0x30
Jun 29 16:31:26 kvm kernel: [ 145.888846] __do_softirq+0xf5/0x28f
Jun 29 16:31:26 kvm kernel: [ 145.888856] irq_exit+0xb8/0xc0
Jun 29 16:31:26 kvm kernel: [ 145.888861] smp_apic_timer_interrupt+0x79/0x140
Jun 29 16:31:26 kvm kernel: [ 145.888866] apic_timer_interrupt+0x84/0x90
Jun 29 16:31:26 kvm kernel: [ 145.888868] </IRQ>
Jun 29 16:31:26 kvm kernel: [ 145.888879] RIP: 0010:cpuidle_enter_state+0xa7/0x300
Jun 29 16:31:26 kvm kernel: [ 145.888881] RSP: 0018:ffffac58031cfe60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
Jun 29 16:31:26 kvm kernel: [ 145.888886] RAX: ffff8a72efda2840 RBX: 0000000000000005 RCX: 000000000000001f
Jun 29 16:31:26 kvm kernel: [ 145.888888] RDX: 0000000000000000 RSI: 0000000025bbf79e RDI: 0000000000000000
Jun 29 16:31:26 kvm kernel: [ 145.888890] RBP: ffffac58031cfe98 R08: ffff8a72efda16a4 R09: 0000000000000018
Jun 29 16:31:26 kvm kernel: [ 145.888893] R10: ffffac58031cfe30 R11: 0000000000000391 R12: 0000000000000005
Jun 29 16:31:26 kvm kernel: [ 145.888895] R13: ffff8a72efdacf00 R14: ffffffff9f371e38 R15: 00000021f77f87f8
Jun 29 16:31:26 kvm kernel: [ 145.888905] cpuidle_enter+0x17/0x20
Jun 29 16:31:26 kvm kernel: [ 145.888914] call_cpuidle+0x23/0x40
Jun 29 16:31:26 kvm kernel: [ 145.888920] do_idle+0x197/0x200
Jun 29 16:31:26 kvm kernel: [ 145.888926] cpu_startup_entry+0x73/0x80
Jun 29 16:31:26 kvm kernel: [ 145.888931] start_secondary+0x1ab/0x200
Jun 29 16:31:26 kvm kernel: [ 145.888939] secondary_startup_64+0xa5/0xb0
Jun 29 16:31:26 kvm kernel: [ 145.888942] Code: 37 00 49 63 4e e8 eb 92 4c 89 ef c6 05 a8 63 d8 00 01 e8 52 33 fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 c0 1e fa 9e e8 3e 3c 80 ff <0f> 0b eb c0 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
Jun 29 16:31:26 kvm kernel: [ 145.889026] ---[ end trace 87c230fe7d27f115 ]---
Jun 29 16:31:26 kvm kernel: [ 145.889079] igb 0000:0a:00.0 p6p1: Reset adapter
Jun 29 16:31:30 kvm kernel: [ 149.681151] igb 0000:0a:00.0 p6p1: igb: p6p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Jun 29 16:31:40 kvm kernel: [ 159.964671] igb 0000:0a:00.0 p6p1: Reset adapter
Jun 29 16:31:44 kvm kernel: [ 163.669265] igb 0000:0a:00.0 p6p1: igb: p6p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

This only happened on my Linux hypervisor; none of the other devices on my network exhibited this behaviour.

What troubleshooting steps did I take?

  • Trying different network cables
  • Upgrading to a newer kernel (4.15; I started on 4.4)
  • Building the igb driver from Intel's website via DKMS and loading it
  • Using a different network card in a PCI Express slot (e1000e) instead of the onboard NIC (igb)
  • Trying every port on the NIC
  • Updating the BIOS on the ASUS server mainboard (P9D-C) to the latest available version
  • Turning off TSO/GSO offloading on the NIC via ethtool

None of these steps resolved the error. Every time I restarted the R7000, the adapter timed out.

How did I solve it?

Disabling RX/TX flow control on the switch stopped the issue from happening.

Since I did not want to disable RX/TX flow control globally on the switch, I applied the fix on the hypervisor instead. I disabled autonegotiation and turned off RX/TX flow control (i.e. IEEE 802.3x "PAUSE" Ethernet frames) with ethtool, using pre-up lines in /etc/network/interfaces (see interfaces(5)) on Ubuntu:

pre-up /sbin/ethtool -s $IFACE autoneg off speed 1000 duplex full
pre-up /sbin/ethtool -A $IFACE autoneg off rx off tx off
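Running ethtool -a <iface> prints the current pause settings, which is a quick way to check that the pre-up lines took effect. The Python sketch below parses that output. It assumes the usual layout of one "Key: value" pair per line: Autonegotiate, RX and TX, plus extra "negotiated" lines on some ethtool versions. The helper names parse_pause_output and pause_params are hypothetical.

```python
import subprocess
from typing import Dict


def parse_pause_output(text: str) -> Dict[str, bool]:
    """Parse `ethtool -a` output into e.g. {"Autonegotiate": False, "RX": False, ...}.

    Assumes one "Key: value" pair per line. The header line
    ("Pause parameters for p6p1:") has an empty value and is skipped.
    """
    params: Dict[str, bool] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or value not in ("on", "off"):
            continue  # header, blank line, or a value we don't interpret
        params[key.strip()] = value == "on"
    return params


def pause_params(iface: str) -> Dict[str, bool]:
    """Run `ethtool -a IFACE` (ethtool must be installed) and parse the result."""
    result = subprocess.run(
        ["ethtool", "-a", iface], capture_output=True, text=True, check=True
    )
    return parse_pause_output(result.stdout)
```

For example, pause_params("p6p1")["RX"] should be False once the interface is up with the settings above. RX pause is the direction that makes the NIC stop transmitting when it receives PAUSE frames.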

Now my R7000 can do whatever it likes, including randomly restarting, and network connectivity on my hypervisor keeps working.

prosody websocket behind nginx reverse proxy

Useful for Kaiwa and other XMPP services behind the same URL (e.g. for serving SSL traffic).

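WebSockets require HTTP/1.1, so nginx has to proxy with HTTP/1.1 and pass the Upgrade and Connection headers through. Prosody also assumes that traffic on TCP port 5280 (plain HTTP) is not secure and tries to force STARTTLS on it, which is why the configuration option below is needed.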

prosody.cfg.lua

consider_websocket_secure = true

nginx.conf

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
# ...
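  location /xmpp-websocket {
    proxy_pass http://127.0.0.1:5280;
    # WebSockets need HTTP/1.1; nginx proxies with HTTP/1.0 by default
    proxy_http_version 1.1;
    proxy_buffering off;
    proxy_set_header Host $host;
    proxy_set_header Upgrade $http_upgrade;
    # "upgrade" if the client asked for it, "close" otherwise (see map above)
    proxy_set_header Connection $connection_upgrade;
  }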
}

Formosa21 eHome Infrared Transceiver (MCE) and OpenELEC 6: buttons not working

Since upgrading to OpenELEC 6.0, this Windows Media Center remote clone sometimes behaves strangely after boot: some keys (especially OK) do not work. I suspect kernel changes around usbhid, possibly a race condition during boot. The transceiver portion of the device identifies itself as:

lsusb
Bus 001 Device 004: ID 147a:e03a Formosa Industrial Computing, Inc. eHome Infrared Receiver
Kernel buffer (dmesg)
New USB device found, idVendor=147a, idProduct=e03a
New USB device strings: Mfr=1, Product=2, SerialNumber=3
Product: eHome Infrared Transceiver
Manufacturer: Formosa21

You may see the following error messages in your kernel buffer when buttons aren't working:

hid-generic 0003:147A:E03A.0001: timeout initializing reports
mceusb  Error: urb status = -71

To resolve this, remount the /flash partition read/write (mount -o remount,rw /flash) and append the following to the single line in /flash/cmdline.txt:

usbhid.quirks=0x147A:0xE03A:0x20000000

This activates the HID_QUIRK_NO_INIT_REPORTS (0x20000000) quirk, which resolves the timeout error and allows lirc to bind properly to the USB device.
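The parameter has the form usbhid.quirks=VENDOR:PRODUCT:FLAGS, where the last field is a bitmask of HID quirk flags. 0x20000000 is simply bit 29. The Python sketch below builds the string from the IDs shown by lsusb and appends it to a one-line kernel command line only if it is not already there. Both helper names are hypothetical, and the flag constant is the value quoted above.

```python
# Value quoted above for HID_QUIRK_NO_INIT_REPORTS (0x20000000 == 1 << 29).
HID_QUIRK_NO_INIT_REPORTS = 1 << 29


def usbhid_quirk(vendor: int, product: int, quirks: int) -> str:
    """Format a usbhid.quirks=0xVID:0xPID:0xFLAGS kernel parameter."""
    return f"usbhid.quirks=0x{vendor:04X}:0x{product:04X}:0x{quirks:08X}"


def add_kernel_param(cmdline: str, param: str) -> str:
    """Append `param` to a one-line kernel command line unless already present."""
    parts = cmdline.split()
    if param not in parts:
        parts.append(param)
    return " ".join(parts)
```

In use, read /flash/cmdline.txt (after the remount), pass its contents through add_kernel_param with the generated parameter, and write the result back as a single line. Being idempotent means rerunning it does not duplicate the entry.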

radicale behind Apache reverse proxy with Dovecot authentication

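  • Requires mod_authn_dovecot for Apache 2.2 and 2.4, which can authenticate against Dovecot using either the full email address or just the username (depending on the Dovecot configuration).
  • Requires auth_basic, authn_alias, authn_default, authz_default, authz_host and authz_user to be enabled for authentication. authn_alias, authn_default and authz_default are Apache 2.2 modules; on 2.4 they were removed or folded into authn_core/authz_core. The rewrite module is also needed for the per-user access check.
  • For the reverse proxy, proxy and proxy_http must be enabled in Apache.
  • Optional AppArmor change_hat support is provided (via mod_apparmor), along with accompanying AppArmor profiles for the radicale hat in Apache and for radicale itself.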

Apache configuration (/etc/apache2/conf.d/radicale.conf or equivalent)

ProxyPassMatch ((\.(ics|vcf))|((\.well-known\/)?(cal|card)dav)/)$ http://localhost:5232
<LocationMatch ((\.(ics|vcf))|((\.well-known\/)?(cal|card)dav)/)$>
   AuthType basic
   AuthName "Dovecot Authentication"
   AuthBasicProvider dovecot
   AuthDovecotAuthSocket /var/run/dovecot/auth-client
   AuthDovecotTimeout 5
   AuthDovecotAuthoritative On
   Require valid-user

   RewriteEngine On
   RewriteCond %{REMOTE_USER}%{REQUEST_URI} !^([^/]+/)\1
   RewriteCond %{REQUEST_URI} !^/.well-known/.+
   RewriteRule .* - [Forbidden]
   <IfModule security2_module>
      SecRuleEngine On
   </IfModule>
   <IfModule apparmor_module>
     AAHatName radicale
   </IfModule>
</LocationMatch>
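The access control above hinges on three regular expressions. The RewriteCond concatenates the authenticated user with the request URI, so user "alice" is only let through for paths beginning with /alice/ (or /.well-known/). Apache evaluates these with PCRE, and for patterns this simple Python's re module should behave the same. The sketch below can therefore check offline which paths get proxied and which get a 403. It does not model Apache's per-directory rewrite details, and the helper names are hypothetical.

```python
import re

# Patterns copied from the Apache configuration above.
PROXY_RE = re.compile(r"((\.(ics|vcf))|((\.well-known\/)?(cal|card)dav)/)$")
OWNER_RE = re.compile(r"^([^/]+/)\1")
WELL_KNOWN_RE = re.compile(r"^/.well-known/.+")


def is_proxied(path: str) -> bool:
    """Mirror ProxyPassMatch/LocationMatch: unanchored search on the URL path."""
    return PROXY_RE.search(path) is not None


def is_forbidden(remote_user: str, request_uri: str) -> bool:
    """Mirror the two RewriteConds (ANDed) in front of the [Forbidden] rule."""
    # "alice" + "/alice/cal.ics" -> "alice/alice/cal.ics": first segment repeats
    not_own = OWNER_RE.search(remote_user + request_uri) is None
    not_well_known = WELL_KNOWN_RE.search(request_uri) is None
    return not_own and not_well_known
```

For example, is_forbidden("alice", "/bob/calendar.ics") is True, while is_forbidden("alice", "/alice/calendar.ics") is False. The back-reference also rejects look-alike prefixes such as /alicex/.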

Radicale configuration, relevant sections only (/etc/radicale/config)

[server]
hosts = 127.0.0.1:5232

[auth]
type = remote_user

[rights]
type = None

[storage]
filesystem_folder = /var/lib/radicale/collections

/etc/apparmor.d/usr.bin.radicale

/usr/bin/radicale {
  #include <abstractions/base>
  #include <abstractions/nameservice>
  #include <abstractions/python>



  /bin/dash rix,
  /etc/radicale/* r,
  /proc/*/mounts r,
  /run/radicale/* w,
  /sbin/ldconfig rix,
  /sbin/ldconfig.real rix,
  /usr/bin/python2.7 ix,
  /usr/bin/radicale r,
  /var/lib/radicale/** rw,
  /var/log/radicale/* w,

}

/etc/apparmor.d/apache2/radicale

^radicale {
  #include <abstractions/apache2-common>
  #include <abstractions/base>
  #include <abstractions/nameservice>

  # for log writing (could be abstracted)
  /var/log/apache2/*.log w,


}

Boot LVM on mdraid (5, and others) on Ubuntu 14.04 on newer kernels

If you build LVM on top of an mdraid RAID 5 array on Ubuntu 14.04 and then update the kernel, you may be dropped into the initramfs shell on reboot and forced to activate the logical volumes in the volume group manually. This is caused by a missing or incomplete udev rule for LVM, which needs to be included in the initramfs.

/etc/udev/rules.d/85-lvm2.rules

# This file causes block devices with LVM signatures to be automatically
# added to their volume group.
# See udev(8) for syntax

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_TYPE}=="disk", \
        RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"
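The rule fires for add and change events on block devices whose ID_TYPE property is "disk", and then rescans and activates every volume group. In udev match values, "|" separates alternative glob patterns (udev(7)). The minimal Python model below simplifies a uevent to a flat dictionary of properties and shows which events would trigger the rule. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict

# Match keys of the 85-lvm2.rules rule above (ENV{ID_TYPE} flattened to ID_TYPE).
RULE = {"SUBSYSTEM": "block", "ACTION": "add|change", "ID_TYPE": "disk"}


def value_matches(value: str, pattern: str) -> bool:
    """udev-style "==": true if `value` matches any "|"-separated glob."""
    return any(fnmatchcase(value, alt) for alt in pattern.split("|"))


def rule_fires(event: Dict[str, str]) -> bool:
    """True if every match key of RULE is satisfied by the event's properties."""
    return all(
        key in event and value_matches(event[key], pattern)
        for key, pattern in RULE.items()
    )
```

To see the real properties of a device, use udevadm info --query=property --name=/dev/sdX. A device that lacks ID_TYPE=disk will not trigger the rule.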

Once you have added this udev rule, update the initramfs on your system:

update-initramfs -u -k all

If you are dropped into the initramfs shell, run the following commands to activate the volume groups and continue booting:

lvm vgscan
lvm vgchange -a y
exit