Topic: CARP Master manual switch introduces packet loss

Author: girtsd
« on: December 01, 2017, 10:14:13 am »
Hello!

I am writing because I have found a scenario in which CARP failover in pfSense causes packet loss. Most of you will probably not care about this, but some may, and the question may have been asked in a different form before, so I am posting it here.
The goal is a packet-loss-free network (an ugly word, I know :)).

There are 2 general ways a CARP master host can change:
    Manual switch with either Persistent CARP Maintenance mode or a reboot (there might be other ways)
    Actual hardware or software failure, which leads to a failover
In the case of an actual failure (an unplugged network cable), the failover happens splendidly and there is no packet loss. In the case of a manual switch, however, packet loss is introduced into the network.

Testing environment/scenario:
There is a master pfSense box called Master_PF, which has 3 network interfaces: a WAN and two LANs. Both LANs have CARP redundancy set up with virtual IPs. I also have net.inet.carp.preempt set to 1.
The other pfSense box is called Slave_PF. It is practically a clone of the master and serves as the CARP backup for the two LANs.
In each of the LANs there is an Ubuntu 14.04 test box. The two can only reach one another through both pfSense CARP interfaces, since each box's gateway is set to the CARP VIP of its LAN.

Actual test:
From one Ubuntu test box I start pinging the other and monitor the sequence numbers of the pings.
If I remove the network cable on one of the interfaces of the Master_PF, both CARP VIPs switch to the Slave_PF instantly and I experience no packet loss (not a single ICMP sequence number is skipped).
After resetting the environment I try the next test case, in which I enter Persistent CARP Maintenance Mode. This causes 3 packets to be dropped (3 sequence numbers skipped) before the connection continues.
The same happens if I reboot the machine: 3 packets are skipped as well. Switching back to the master, however, does not always cause packet loss.
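
The sequence-number check used in these tests can be automated. The following is a minimal Python sketch, not part of the original test setup, that scans captured ping output for gaps. The function name and the sample output (including the 10.0.2.10 address) are made up for illustration, and it assumes the Linux ping output format with icmp_seq= (some older releases print icmp_req= instead).

```python
import re
from typing import List

# Matches the sequence number in Linux ping reply lines.
SEQ_RE = re.compile(r"icmp_(?:seq|req)=(\d+)")


def missing_sequences(ping_output: str) -> List[int]:
    """Return the sequence numbers between the first and last reply that got no reply."""
    seen = {int(m.group(1)) for m in SEQ_RE.finditer(ping_output)}
    if not seen:
        return []
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]


# Hypothetical sample, not a real capture: replies 3, 4 and 5 are missing.
SAMPLE_PING_OUTPUT = """\
64 bytes from 10.0.2.10: icmp_seq=1 ttl=63 time=0.512 ms
64 bytes from 10.0.2.10: icmp_seq=2 ttl=63 time=0.498 ms
64 bytes from 10.0.2.10: icmp_seq=6 ttl=63 time=0.530 ms
"""

print(missing_sequences(SAMPLE_PING_OUTPUT))  # prints [3, 4, 5]
```

Saving the ping output to a file during a switch and passing its contents to missing_sequences() gives the list of dropped packets without reading the sequence numbers by eye.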

Test results:
Manual CARP Master switch introduces temporary packet loss.

Further info:
From past experience I know that FreeBSD (on which pfSense is built) allows a perfect manual CARP master switch, so the problem must lie in pfSense.

Test nr2:
I repeat the ping scenario, only this time the advbase (advertisement interval base) parameter on all CARP VIPs is set to 5 instead of the default 1 used in the previous tests.
I repeat all the tests and find that after a manual CARP master switch the connection is interrupted entirely for a while, whereas in the case of an actual failure the switch happens just fine.

Test results nr2:
A manual CARP master switch on pfSense introduces an interface downtime equal to the interface's advbase parameter in seconds.
This is bad, since it will always introduce packet loss during scheduled updates of the master box, when the master enters Persistent CARP Maintenance Mode so that the slave can take care of business in the meantime.

Further research:
Next I found that the interval at which the master sends advertisements to the slave is advbase + advskew/256 seconds (FreeBSD specifies advskew in 1/256 of a second). This means the keepalive message from the master is never sent more often than once per second.
Since advbase is an 8-bit integer in the range 1 to 255, the advertisement interval can never be shorter than one second.
This advertisement interval is the window during which the 3 packets are lost in the initial test and the N packets in the repeated test (where advbase was 5).
This happens because of how pfSense handles manual CARP master switches. On a reboot or on entering Persistent CARP Maintenance Mode, pfSense raises the advskew parameter on the master so that it is higher than on the slave. The slave then has to wait for the next keepalive from the master before it notices that its own advertisement interval is now shorter than the master's, and only then does it take over as master. This waiting period is when the packet loss happens.
In the case of an actual failure, the standard FreeBSD mechanism kicks in and the master is changed via the net.inet.carp.demotion kernel parameter, which pfSense otherwise uses for some form of CARP error reporting.
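
To make the timing concrete, here is a small Python sketch of the advertisement interval calculation. It is only an illustration under assumptions: it uses the unit documented in FreeBSD ifconfig(8) (advskew in 1/256 of a second, advbase 1 to 255, advskew 0 to 254), and the function name is mine, not something from pfSense or FreeBSD.

```python
def advertisement_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements for the given advbase and advskew."""
    if not 1 <= advbase <= 255:
        raise ValueError("advbase must be between 1 and 255")
    if not 0 <= advskew <= 254:
        raise ValueError("advskew must be between 0 and 254")
    return advbase + advskew / 256


# Master in normal operation with the default advbase and no skew.
print(advertisement_interval(1, 0))    # prints 1.0
# Master in maintenance mode: pfSense sets advskew to 254.
print(advertisement_interval(1, 254))  # prints 1.9921875
# Same maintenance-mode skew with advbase raised to 5, as in test nr2.
print(advertisement_interval(5, 254))  # prints 5.9921875
```

The numbers show why raising advbase lengthens the gap: the slave cannot learn about the new skew until the master's next advertisement arrives, and that advertisement is at least advbase seconds away.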

Actual PFsense files:
The file /usr/local/www/status_carp.php handles the "Enter Persistent CARP Maintenance Mode" button:
Code: [Select]
if ($status != 0 && $_POST['carp_maintenancemode'] != "") {
        interfaces_carp_set_maintenancemode(!isset($config["virtualip_carp_maintenancemode"]));
}
The function interfaces_carp_set_maintenancemode() is found in the /etc/inc/interfaces.inc file:
Code: [Select]
function interfaces_carp_set_maintenancemode($carp_maintenancemode) {
    global $config;
    if (isset($config["virtualip_carp_maintenancemode"]) && $carp_maintenancemode == false) {
        unset($config["virtualip_carp_maintenancemode"]);
        write_config("Leave CARP maintenance mode");
    } else if (!isset($config["virtualip_carp_maintenancemode"]) && $carp_maintenancemode == true) {
        $config["virtualip_carp_maintenancemode"] = true;
        write_config(gettext("Enter CARP maintenance mode"));
    }
    $viparr = &$config['virtualip']['vip'];
    foreach ($viparr as $vip) {
        if ($vip['mode'] == "carp") {
            interface_carp_configure($vip);
        }
    }
}
This function calls interface_carp_configure() for each CARP VIP on the firewall. That function is also found in the /etc/inc/interfaces.inc file:
Code: [Select]
function interface_carp_configure(&$vip) {
    global $config, $g;
    if (isset($config['system']['developerspew'])) {
        $mt = microtime();
        echo "interface_carp_configure() being called $mt\n";
    }
    if ($vip['mode'] != "carp") {
        return;
    }
    /* NOTE: Maybe its useless nowadays */
    $realif = get_real_interface($vip['interface']);
    if (!does_interface_exist($realif)) {
        file_notice("CARP", sprintf(gettext("Interface specified for the virtual IP address %s does not exist. Skipping this VIP."), $vip['subnet']), "Firewall: Virtual IP", "");
        return;
    }
    $vip_password = $vip['password'];
    $vip_password = escapeshellarg(addslashes(str_replace(" ", "", $vip_password)));
    if ($vip['password'] != "") {
        $password = " pass {$vip_password}";
    }
    $advbase = "";
    if (!empty($vip['advbase'])) {
        $advbase = "advbase " . escapeshellarg($vip['advbase']);
    }
    $carp_maintenancemode = isset($config["virtualip_carp_maintenancemode"]);
    if ($carp_maintenancemode) {
        $advskew = "advskew 254";
    } else {
        $advskew = "advskew " . escapeshellarg($vip['advskew']);
    }
    mwexec("/sbin/ifconfig {$realif} vhid " . escapeshellarg($vip['vhid']) . " {$advskew} {$advbase} {$password}");
    if (is_ipaddrv4($vip['subnet'])) {
        mwexec("/sbin/ifconfig {$realif} " . escapeshellarg($vip['subnet']) . "/" . escapeshellarg($vip['subnet_bits']) . " alias vhid " . escapeshellarg($vip['vhid']));
    } else if (is_ipaddrv6($vip['subnet'])) {
        mwexec("/sbin/ifconfig {$realif} inet6 " . escapeshellarg($vip['subnet']) . " prefixlen " . escapeshellarg($vip['subnet_bits']) . " alias vhid " . escapeshellarg($vip['vhid']));
    }
    return $realif;
}
In maintenance mode this function sets the advskew parameter of the affected CARP VIP to 254; otherwise it applies the configured advskew.

Workaround:
I've found 2 easy ways around this:
1) Instead of entering Persistent CARP Maintenance Mode, one can simply run this command:
Code: [Select]
sysctl net.inet.carp.demotion=240
This also works if you run it before a reboot.
2) One can change the /usr/local/www/status_carp.php file code from the above quoted to:
Code: [Select]
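if ($status != 0 && $_POST['carp_maintenancemode'] != "") {
    if (!isset($config["virtualip_carp_maintenancemode"])) {
        // Entering maintenance mode: demote this node so the slave takes over.
        set_single_sysctl('net.inet.carp.demotion', '240');
    } else {
        // Leaving maintenance mode: undo the demotion. Per FreeBSD carp(4), a value
        // written to this sysctl is added to the current counter, so writing 0
        // would change nothing; -240 reverses the +240 above.
        set_single_sysctl('net.inet.carp.demotion', '-240');
    }
    interfaces_carp_set_maintenancemode(!isset($config["virtualip_carp_maintenancemode"]));
}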
This will set the same kernel parameter each time the persistent CARP maintenance mode is entered. It will, however, result in the CARP (failover) status page showing an error about CARP, because of the above mentioned use of this parameter for error reporting.
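
For anyone who wants to script workaround 1 around a maintenance window, here is a rough Python sketch. It assumes FreeBSD's sysctl name=value syntax and, per carp(4), that a value written to net.inet.carp.demotion is added to the current counter, so the demotion is undone by writing the negative value. The helper names are hypothetical, and the script only runs sysctl when dry_run is explicitly turned off.

```python
import subprocess
from typing import List

CARP_DEMOTION_OID = "net.inet.carp.demotion"


def demotion_command(delta: int) -> List[str]:
    """Build the sysctl argv that adjusts the CARP demotion counter by delta."""
    return ["sysctl", f"{CARP_DEMOTION_OID}={delta}"]


def adjust_demotion(delta: int, dry_run: bool = True) -> List[str]:
    """Return the command and, unless dry_run is set, execute it (needs root on the firewall)."""
    cmd = demotion_command(delta)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: show what would be executed before and after maintenance.
    print(" ".join(adjust_demotion(240)))   # demote this node
    print(" ".join(adjust_demotion(-240)))  # restore the counter afterwards
```

Keeping the default as a dry run is deliberate: demoting the wrong node by accident would trigger exactly the kind of failover this thread is about.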

Suggestion:
Provide a way in the WebGUI to manually switch the CARP master to the slave state without introducing packet loss.
For some administrators this will be an important feature and for some its absence a dealbreaker.

Epilogue:
I realise this might not be the right place for this, but people who care will most likely look here first (as I did).
Should I also file this as an improvement request in the appropriate ticket management system?

With respect,
Girts
Don`t assume! VERIFY!