
Topic: failover didn't fall back to tier1 after downtime


heper
« on: May 02, 2016, 07:18:32 am »
- Tier1 came back online
- Gateway group showed both gateways as 'online'
- Default route = Tier1
- Default gateway switching = enabled

still, for whatever reason, it kept sending some clients out through tier2. this was still happening 10 hours after the last gateway event.

to fix it i clicked "reset all states" in the GUI.
(the states can't have been left over from before the gateway event, because nobody was around at 2am)

Code: [Select]
May 2 13:38:35 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 40% dest_addr 195.130.130.11 bind_addr 81.82.213.131 identifier "WAN_TELENET0 "
May 2 13:38:35 dpinger send_interval 500ms loss_interval 10000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 40% dest_addr 195.238.2.21 bind_addr 192.168.5.2 identifier "WAN_SCARLETGW "
May 1 02:14:26 dpinger WAN_TELENET0 195.130.130.11: Clear latency 9788us stddev 1395us loss 29%
May 1 02:13:26 dpinger WAN_TELENET0 195.130.130.11: Alarm latency 9607us stddev 1263us loss 41%

The values set for dpinger are the ones that made apinger work somewhat reliably.
perhaps they should be reset to sane values, now that we have a good pinger?
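
For a rough sense of what the logged parameters mean, here is a back-of-the-envelope sketch. It assumes the averaging window holds about time_period / send_interval probes, which ignores how dpinger actually accounts for loss_interval, so treat the numbers as approximate.

```shell
#!/bin/sh
# Values copied from the dpinger log lines above.
send_interval_ms=500
time_period_ms=60000
loss_alarm_pct=40

# Assumption: the window holds roughly time_period / send_interval probes.
probes=$(( time_period_ms / send_interval_ms ))

# Number of lost probes in that window that corresponds to loss_alarm.
lost_at_alarm=$(( probes * loss_alarm_pct / 100 ))

echo "probes=$probes lost_at_alarm=$lost_at_alarm"
```

That is consistent with the log: the alarm was raised at 41% loss and cleared at 29%, on either side of the 40% loss_alarm threshold.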


That said, the system has been up for 20 days, and a couple of failover events took place ... this is the first time it didn't fall back.


suggestions?



heper
« Reply #1 on: May 09, 2016, 07:25:10 am »
since the first post it has happened again, 3 times to be exact.
i've reset all dpinger values to their defaults in the GUI.

dpinger clears the alarm, but pfsense keeps sending traffic to the Tier2 gateway (same as in the first post).
i'm thinking the 'clear' isn't picked up by the backend code under every circumstance.

today i changed the trigger level from 'member down' to 'packet loss or high latency'.
will update this thread in the next couple of days/weeks


maverick_slo
« Reply #2 on: May 09, 2016, 07:45:09 am »
Maybe you're hitting this: https://redmine.pfsense.org/issues/6110 ?

heper
« Reply #3 on: May 09, 2016, 08:30:21 am »
perhaps, but there's no PPP(oE) involved. it is possible that default gateway switching is still enabled (from back in the day when a transparent proxy was running). will check if disabling it makes a difference

cmb (Chris Buechler)
« Reply #4 on: May 11, 2016, 01:59:09 am »
Guessing it's already-established connections that are staying there maybe? That'd be expected.

Two things influence traffic routing. Guessing your clients are being routed via a gateway group, which you can verify on the back end with:

Code: [Select]
grep route-to /tmp/rules.debug
The other thing would be the default gateway, for traffic matching firewall rules set to "default" rather than a gateway group. Check Diagnostics > Routes to verify that.
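
A minimal sketch of checking the default gateway from the shell instead of the GUI. It assumes the usual FreeBSD `netstat -rn` column layout (destination first, gateway second); `default_gw` is a hypothetical helper name, not a pfSense command.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -rn` style output on stdin and print
# the gateway (second column) of the first "default" route.
default_gw() {
    awk '$1 == "default" { print $2; exit }'
}

# On the firewall itself this would be used as:
#   netstat -rn -f inet | default_gw
```

If the address printed is the tier2 gateway while tier1 shows online, the default route is the part that did not switch back; otherwise the route-to lines in /tmp/rules.debug are the place to look.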

heper
« Reply #5 on: May 11, 2016, 04:54:44 am »
Quote from: cmb
Guessing it's already-established connections that are staying there maybe? That'd be expected.

it kept sending new clients to tier2 for days after the gateway event ... they can't (all) have been established connections

Quote from: cmb
Two things influence traffic routing. Guessing your clients are being routed via a gateway group, which you can verify on the back end with:

Code: [Select]
grep route-to /tmp/rules.debug
The other thing would be the default gateway, for traffic matching firewall rules set to "default" rather than a gateway group. Check Diag>Routes to verify that.

will check the rules.debug when/if it happens next