Topic: IMPORTANT: Xen/KVM networking will not work using default hypervisor settings!

johnkeates

This is still needed with 2.4-RELEASE, so every version that uses VirtIO so far (2.2 and above) is affected.



If you are reading this, you probably have issues with your virtual network and pfSense, usually when packets need to pass through pfSense for NAT or routing.
You will be able to ping things, but TCP, and apparently sometimes UDP as well, may fail or crawl along extremely slowly.

An issue exists with the VirtIO drivers in combination with checksum offloading and the packet filter (pf) when you leave checksum offloading enabled on your virtual interfaces.

The reason is that virtual networks don't need checksums to verify the integrity of packets sent over a wire, because there is no wire in a virtual network (it uses shared memory). Packets are therefore not checksummed by the virtual interfaces and supposedly arrive at pf with an invalid checksum. Those packets get dropped! I currently don't know whether this is intended behavior, but since the packets are technically incorrect, it seems understandable that they get filtered out and dropped.


Your symptoms should include:

- Ping works with no problem, even over NAT, from LAN to LAN, from WAN to WAN and any cross-subnet combination with the correct NAT/gateway rules.
- TCP connections work in one direction or between specific hosts, but fail or are silently dropped in the other direction, to or from WAN. Really slow traffic (~0.4 Kbps) has been observed too.
- UDP sometimes works and sometimes fails at random, depending on the application (which seems to point towards those applications starting with a TCP connection).

If you are not sure, you can apply the offloading change anyway: it won't harm your network and at worst will degrade performance by a few percent. Reverting is easy.

This will not, however, fix any VLAN issues; the VirtIO drivers simply do not support VLANs. Work around that either by using VLANs in your VIF stanzas, which creates multiple interfaces on the pfSense side, or by using HVM emulated network devices.


This is currently triple-confirmed on IRC and this forum for all Xen types (XenServer, Xen source etc., on at least 3 major Linux distributions and different releases and kernels), so if you are using Xen and can't seem to get proper packet flow after upgrading, this is probably your problem. For KVM, the only confirmation is the reports on this forum, as I didn't need to research it for my own systems and didn't confirm it with people anywhere else. Since both use paravirtualised network interfaces, it seems plausible that the checksumming implementation is the same and therefore presents the same problems. In theory, any virtualisation system using virtual interfaces that don't checksum packets will have this issue.


The solution is to turn off at least tx checksum offloading for the interface on which pfSense receives its non-checksummed packets. Do this on the pfSense side for sure, and on the hypervisor side as well!


I'm collecting platform-specific settings but for now, here is a guide using ethtool for Xen hosts using XL as toolstack and a Linux control domain (dom0):

To fix the checksum problem, do the following:

1. In pfSense, make sure all forms of offloading are turned off. We don't want to offload anything to the VirtIO drivers!
2. On the hypervisor side of the pfSense interfaces (commonly called vifX.Y, where X is the instance ID and Y is the interface ID), turn off at least tx offloading.

Regarding step 2: if your hypervisor has a Linux control domain (which it will in most cases) you can use the program called ethtool to do this, for example:

Code: [Select]
$ sudo ethtool -K vif123.3 tx off
This assumes you are using sudo to gain root privileges and that vif123.3 is the pfSense netback interface you want to turn tx off for.
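
To check whether the change took effect, `ethtool -k` (lowercase k) prints the current offload settings of an interface. What follows is only a minimal sketch: the helper name `tx_csum_state` is made up for this example, and it assumes the output contains a `tx-checksumming: on|off` line, so compare it with what your own ethtool prints.

```shell
#!/bin/sh
# tx_csum_state: read `ethtool -k <iface>` output on stdin and print the
# state of tx checksumming ("on" or "off").
# Hypothetical helper; assumes a line starting with "tx-checksumming:".
tx_csum_state() {
    # Split on ": ", then keep only the first word of the value so that a
    # trailing marker such as "[fixed]" is dropped.
    awk -F': *' '/^tx-checksumming:/ { split($2, a, " "); print a[1]; exit }'
}

# Usage on dom0 (needs ethtool and an existing interface):
#   sudo ethtool -k vif123.3 | tx_csum_state
```

If this still prints "on" after running the `ethtool -K` command above, the setting did not stick on that interface.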


(Beginners guide ahead!)

If you do not know which interface belongs to the pfSense VM (domU in Xen lingo), follow these steps (on dom0, the Xen control domain):

1. List the currently running instances using your toolstack. With the current standard (XenLight, or xl):

Code: [Select]
$ sudo xl list
This will give you a list of all the VMs containing their name as well as their current running ID. Note this ID.

2. List all the interfaces for this VM by listing all the interfaces on the system and filtering for the vif ID that belongs to your pfSense vm. In this example, I use 16 as the ID, so I'll grep for vif16:

Code: [Select]
$ sudo ifconfig | grep vif16
This will give you a list of all the virtual interfaces that your pfSense interfaces are connected to using VirtIO.
It might look like:

Code: [Select]
vif16.0   Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff 
vif16.1   Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff 

If there are lines like "vif16.0-emu", ignore those.

3. Disable offloading for the interfaces to make sure that dom0 (Xen's control domain, Linux) recalculates the checksum. If one of those interfaces is bridged to an actual hardware network card, this isn't strictly needed. If you are not sure whether that is the case for you, disable offloading for all interfaces. The cost to the CPU is a lot lower than it used to be.

Code: [Select]
$ sudo ethtool -K vif16.0 tx off
$ sudo ethtool -K vif16.1 tx off
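
Steps 2 and 3 can be wrapped in a small helper so you don't have to type one command per interface. This is a sketch under assumptions, not a tested tool: both function names are invented here, and it assumes the netback interfaces show up under /sys/class/net as vif<ID>.<N>, as in the listing above.

```shell
#!/bin/sh
# list_domain_vifs: print the netback interfaces of one domain ID,
# skipping the "-emu" devices.
# $1 = domain ID from `xl list`
# $2 = directory to scan (defaults to /sys/class/net; the parameter only
#      exists so the function can be tried against a scratch directory)
list_domain_vifs() {
    domid=$1
    netdir=${2:-/sys/class/net}
    for path in "$netdir"/vif"$domid".*; do
        [ -e "$path" ] || continue        # glob matched nothing
        name=${path##*/}
        case $name in
            *-emu) continue ;;            # emulated device, ignore
        esac
        printf '%s\n' "$name"
    done
}

# disable_tx_offload: turn tx offloading off on every vif of a domain.
disable_tx_offload() {
    for ifname in $(list_domain_vifs "$1"); do
        ethtool -K "$ifname" tx off
    done
}

# Usage on dom0, as root, for domain ID 16:
#   disable_tx_offload 16
```

Unlike a plain `grep vif16`, the pattern requires the dot after the ID, so interfaces of a domain with ID 160 are not picked up by accident.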

(Beginners guide ending here!)

From this point on, all traffic should pass as it normally should and because of the VirtIO drivers it should be faster than using the HVM virtual ethernet cards!

Now, those settings are not persistent in any way, so if you want them to stick, you will have to adjust your vif-script on the hypervisor to apply the settings whenever you create the domU. It's up to you to find out how to do that; this is a pfSense forum, not a Xen forum (and I don't have the time to create a sample script right now :p).
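
For readers who want a starting point, a wrapper around the stock vif script could look roughly like the sketch below. Everything specific in it is an assumption to verify on your own system: the stock script path /etc/xen/scripts/vif-bridge, the action being passed as the first argument ("online" when the interface comes up), the interface name arriving in the environment variable `vif`, and the wrapper's file name.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /etc/xen/scripts/vif-bridge-notx
# (file name made up). Assumptions: the stock script is
# /etc/xen/scripts/vif-bridge, the toolstack passes the action as $1 and
# the interface name in the environment variable "vif".
STOCK_SCRIPT=${STOCK_SCRIPT:-/etc/xen/scripts/vif-bridge}
ETHTOOL=${ETHTOOL:-ethtool}

vif_hook() {
    action=$1
    # Let the stock script do the normal bridge setup first.
    "$STOCK_SCRIPT" "$@" || return $?
    # Once the interface is online, turn tx offloading off.
    if [ "$action" = "online" ] && [ -n "${vif:-}" ]; then
        "$ETHTOOL" -K "$vif" tx off
    fi
}

# Only act when called with an action, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    vif_hook "$@"
fi
```

The domU would then have to reference the wrapper instead of the stock script in its vif stanza. Check your toolstack's documentation for how a custom vif script is selected.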


But what if you don't want to have those bloody VirtIO interfaces? Or what if you really need the old VLAN capability?

Well, just disable PV altogether. VirtIO or Paravirtualised networking relies on communication via a virtual PCI device, which can be turned off for any domU in Xen that you don't want to load PV drivers for. For Xen 4 and above when using the XL toolstack, just add this line to your domU configuration file:

Code: [Select]
xen_platform_pci=0
And stop / start (not restart, we want to re-create the domU with the new settings) the pfSense domU.

A different option would be disabling the enlightenment interfaces in pfSense, but that's a bit hacky, and modifying your pfSense isn't something I would recommend if you can do it reliably from the outside. You can probably use loader settings, or disable the loading of the VirtIO kernel modules, but why make such a mess of things if the hypervisor can do that with just one line ;-)



I hope this is clear to everybody, from beginners to SysOps running Xen farms. You now know what to do until a better-documented fix comes along!

(and to mods/admins: pinning or sticky-ing this post might reduce duplicate threads :) )
« Last Edit: October 13, 2017, 09:14:55 pm by johnkeates »

duntuk

Are you supposed to do this at the Windows Device Manager level too?

Example:

Device Manager --> Network adapters --> Intel PRO/1000 PT Dual Port Network Connection --> Advanced --> TCP Checksum Offload (IPv4) --> Value: Disabled


johnkeates

No. This is for the Xen hypervisor side, not the domUs.

hvisage

The "better" place to do this is in pfSense: go to System -> Advanced -> Networking (tab) and check "Disable hardware checksum offload".


johnkeates

That doesn't do anything to resolve the problem. In theory it should, but in practice it doesn't. Hence the big workaround ;-)

bdube

I have had the same situation twice, with pfSense as a VM on KVM/QEMU on Linux.

What I saw from the packets (capturing with Wireshark, as any network guy would) is that, from the inside, they pass through the pfSense firewall and go out. When I look at the receiving end, another server that I manage, I receive only SYN packets with bad checksums, which are dropped at the receiving end. My physical NIC at the transmitting end has TCP checksum offload enabled.

Other machines directly on the same public virtual network, on the same host as pfSense, are all able to transmit through the physical NIC, and all those VMs, like pfSense, are configured to offload TCP checksums to the physical NIC.

So there is something specific to pfSense. The question would be: why does the physical NIC's TCP checksum offload do its job for every VM except pfSense?

HTH

Ben

johnkeates

PVHVM mode drivers in FreeBSD 10 have a problem with PV I/O using shared memory for communication. This is because shared memory doesn't need checksums. Offloading checksums to an I/O device that doesn't do any checksumming results in empty or fake checksums. When the checksums are recalculated in software outside the I/O part, they will not match, and therefore the packet gets dropped. In the case of FreeBSD 10, it's a bit worse: it silently drops packets somewhere unknown. A bug report has already been filed. This is not something pfSense is to blame for, nor is it something pfSense can fix.

Wrong or bad checksums on virtual networks are not strange and not 'bad', because transport-checksums are not needed in memory-based transports, only for actual transport over wires etc. outside the machine.
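
To see where bad checksums actually leave the machine, you can capture with `tcpdump -vv`, which verifies TCP checksums and marks failures as "incorrect". The helper below is a sketch with a made-up name that just counts those lines. Keep in mind that a capture on the sending host's virtual interface may show incorrect checksums simply because offloading is on, so the meaningful place to look is the receiving end, as in the Wireshark capture described above.

```shell
#!/bin/sh
# count_bad_cksum: read `tcpdump -vv` text output on stdin and print how
# many lines were flagged with an incorrect checksum.
# Hypothetical helper; relies on tcpdump's "cksum 0x... (incorrect" marker.
count_bad_cksum() {
    grep -c 'cksum 0x[0-9a-f]* (incorrect'
}

# Usage on the receiving host (interface name is an example):
#   sudo tcpdump -n -vv -c 100 -i eth0 tcp | count_bad_cksum
```

A count that drops to zero after turning tx offloading off suggests the workaround is doing its job.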

pablot

Just in case this may help someone this is my experience with KVM.

This drove me mad, since I have spent the last week trying to solve this.

I have an Ubuntu Server 14.04.2 LTS with KVM and libvirt. I have two guest virtual machines:

   - pfSense-2.2: virtio network interfaces, WAN1 - cable, WAN2: aDSL and LAN IP:192.168.2.13
   - Ubuntu Server 14.10: e1000 network interface, IP:192.168.2.10 (Hostname: deathstar)

The host has bridged interfaces as follows:

Code: [Select]
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#auto eth0
#iface eth0 inet manual

auto br0
iface br0 inet static
        address 192.168.2.10
        netmask 255.255.255.0
        network 192.168.2.0
        broadcast 192.168.2.255
        gateway 192.168.2.13
        bridge_ports eth0
        bridge_stp on
        bridge_fd 0
        bridge_maxwait 0
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 192.168.2.13
        dns-search localdomain

auto br1
iface br1 inet manual
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0

auto br2
iface br2 inet manual
        bridge_ports eth2
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0


The pfSense guest with virtio bridged interfaces is working perfectly, but the second guest with Ubuntu works OK only with the e1000 interface; if I use a virtio bridged interface instead, it can ping and resolve DNS perfectly, but cannot access any site with wget or apt-get.

No big deal, I can live with e1000, but the host machine now cannot access the internet. Same symptoms as before: I can ping and resolve DNS, but cannot access anything with wget or apt-get.

The funny thing is that this was working perfectly until around 10 days ago; probably some system update "broke" something on the host machine, and now it cannot be updated or access any site. (UPDATE: It now seems that pfSense 2.2, with the new version of FreeBSD, is the update that broke things.)

I've searched different forums many times, tried disabling IPv6 (many people reported this fixed the problem), switched bridge_stp on and off, included and excluded "auto eth0" in the /etc/network/interfaces file, etc., but nothing works. I'm stuck with this.

As you can see, DNS and ping work OK...

Code: [Select]
pablot@deathstar:~$ ping google.com
PING google.com (173.194.42.14) 56(84) bytes of data.
64 bytes from eze03s05-in-f14.1e100.net (173.194.42.14): icmp_seq=1 ttl=51 time=26.0 ms
64 bytes from eze03s05-in-f14.1e100.net (173.194.42.14): icmp_seq=2 ttl=51 time=27.4 ms
64 bytes from eze03s05-in-f14.1e100.net (173.194.42.14): icmp_seq=3 ttl=51 time=24.9 ms
64 bytes from eze03s05-in-f14.1e100.net (173.194.42.14): icmp_seq=4 ttl=51 time=24.7 ms
^C
--- google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3002ms
rtt min/avg/max/mdev = 24.732/25.798/27.421/1.071 ms
pablot@deathstar:~$


But everything else fails...

Code: [Select]
pablot@deathstar:~$ sudo apt-get update
0% [Connecting to ar.archive.ubuntu.com (200.236.31.4)]

This just ends like this...

Code: [Select]
Err http://ar.archive.ubuntu.com trusty InRelease

Err http://ar.archive.ubuntu.com trusty-updates InRelease

Err http://ar.archive.ubuntu.com trusty Release.gpg
  Unable to connect to ar.archive.ubuntu.com:http:
Err http://ar.archive.ubuntu.com trusty-updates Release.gpg
  Unable to connect to ar.archive.ubuntu.com:http:
Reading package lists... Done
W: Failed to fetch http://ar.archive.ubuntu.com/ubuntu/dists/trusty/InRelease

W: Failed to fetch http://ar.archive.ubuntu.com/ubuntu/dists/trusty-updates/InRelease

W: Failed to fetch http://ar.archive.ubuntu.com/ubuntu/dists/trusty/Release.gpg  Unable to connect to ar.archive.ubuntu.com:http:

W: Failed to fetch http://ar.archive.ubuntu.com/ubuntu/dists/trusty-updates/Release.gpg  Unable to connect to ar.archive.ubuntu.com:http:

W: Some index files failed to download. They have been ignored, or old ones used instead.


And this is what I get with wget...

Code: [Select]
pablot@deathstar:~$ wget google.com
--2015-03-17 10:13:20--  http://google.com/
Resolving google.com (google.com)... 173.194.42.0, 173.194.42.1, 173.194.42.9, ...
Connecting to google.com (google.com)|173.194.42.0|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.1|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.9|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.3|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.7|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.14|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.4|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.2|:80... failed: Connection timed out.
Connecting to google.com (google.com)|173.194.42.8|:80...

I've replaced my firewall with a fresh pfSense install (just in case I had blocked myself without knowing) with the same results. I've also installed the same Ubuntu version in VirtualBox on my notebook and tried it with both bridged and non-bridged interfaces, and it works perfectly well in both cases through the same firewall.

So everything made me think that my host machine had a wrong configuration that also affects only the Ubuntu installation with the bridged interface, but I could not find it.

Sooooo, I replaced pfSense with an old WRT54G running OpenWrt and voila! Everything worked as expected. So pfSense was clearly the guilty one, but then I found this post and.......... FreeBSD seems to be the real bad guy here!!!! :-)

I checked a little, and apparently ICMP packets can get in and out, but TCP and UDP packets are blocked or dropped.

Any help finding a workaround until this gets solved in FreeBSD will be greatly appreciated.

Thanks in advance,
Pablo


UPDATE: OK, found the workaround just by following this very same thread...

On the HOST machine I issued the following commands:

pablot@deathstar:~$ sudo ethtool -K br0 tx off
pablot@deathstar:~$ sudo ethtool -K br1 tx off
pablot@deathstar:~$ sudo ethtool -K br2 tx off

Probably not all of them are needed, but this worked for me. Then on the GUEST, just disable all offloading in pfSense: go to System -> Advanced -> Networking and disable:

 - Hardware Checksum Offloading
 - Hardware TCP Segmentation Offloading
 - Hardware Large Receive Offloading

and reboot.

Keep in mind that if you reboot the host you'll have to issue these commands again, so you may want to put them in a startup script.

Despite this, I keep waiting for the definitive solution.

Regards,
Pablo
« Last Edit: March 24, 2015, 04:41:30 pm by pablot »

tdslot

Hi all,

I just installed pfSense 2.2 on XenServer 6.5 and got the same problem as described, so I want to share my configuration fix for it.

Sorry if this is off topic, I couldn't find a better place for it.  :)

Find your pfSense VM's network VIF UUIDs:
Code: [Select]
[root@xen ~]# xe vif-list vm-name-label="RT-OPN-01"
uuid ( RO)            : 08fa59ac-14e5-f087-39bc-5cc2888cd5f8
         vm-uuid ( RO): 0128bdba-df81-d729-ddbc-c60575e02624
          device ( RO): 1
    network-uuid ( RO): 7af0dc44-dc05-44f2-3741-883acb937747


uuid ( RO)            : 799fa8f4-561d-1b66-4359-18000c1c179f
         vm-uuid ( RO): 0128bdba-df81-d729-ddbc-c60575e02624
          device ( RO): 0
    network-uuid ( RO): 106ad80e-9522-77fd-3cc6-4b2b6fc03ecc

Then modify those VIFs with these settings:
Code: [Select]
xe vif-param-set uuid=08fa59ac-14e5-f087-39bc-5cc2888cd5f8 other-config:ethtool-tx="off"
xe vif-param-set uuid=799fa8f4-561d-1b66-4359-18000c1c179f other-config:ethtool-tx="off"

Then shut down the VM and start it again. Do not just restart pfSense from the console.
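
With many VIFs, the two steps above can be scripted. The sketch below only prints the `xe vif-param-set` commands so they can be reviewed first. The function name is made up, and it assumes the UUIDs come as one comma-separated list, which is what `xe` prints when asked for `--minimal` output.

```shell
#!/bin/sh
# vif_tx_off_commands: turn a comma-separated UUID list into one
# `xe vif-param-set` command per VIF. Prints the commands, runs nothing.
# Hypothetical helper; $1 = comma-separated VIF UUIDs.
vif_tx_off_commands() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r uuid; do
        [ -n "$uuid" ] || continue        # skip empty entries
        printf 'xe vif-param-set uuid=%s other-config:ethtool-tx="off"\n' "$uuid"
    done
}

# Usage on the XenServer host (VM name taken from the listing above):
#   uuids=$(xe vif-list vm-name-label="RT-OPN-01" params=uuid --minimal)
#   vif_tx_off_commands "$uuids"          # review the commands
#   vif_tx_off_commands "$uuids" | sh     # apply them
```

The VM still has to be shut down and started again afterwards for the setting to take effect.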

johnkeates

Those configuration examples will be of big help for people just starting out with XenServer :)

craggy

We are having this same issue on Xen 4.4 under CentOS 6.6

However, we are unable to find the commands necessary to find the pfSense VM vif so we can disable HW checksum offloading on the hypervisor.

"xl network-list VMUUID" doesn't return the vif id.

Anyone any suggestions?

johnkeates

You can't do that using XL. Configuring network interfaces via Xen can only be done using XE on XenServer. Using XL, you have to find the vif as outlined in the first post.
To make the settings stick, edit the vif-script you are using for your domains.

tiagoratto

Hello there! I'm a newcomer and started using pfSense a few months ago. I moved the pfSense VM to another server and started having the checksum problems mentioned here. This thread confirmed my suspicions and helped solve my problem, along with the following resources:

http://wiki.xenproject.org/wiki/Xen_Networking
http://blog.feld.me/posts/2014/07/pfsense-on-citrix-xenserver/
http://cloudnull.io/2012/07/xenserver-network-tuning/
http://www-archive.xenproject.org/files/summit_3/rdd-tso-xen.pdf

Guess those links could help.

rainer_d

It seems it's now been fixed in HEAD:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=154428

No idea when it's going to show up anywhere else (i.e. FreeBSD 10 or pfSense).