
Author Topic: Unbound DNS intermittent failure  (Read 496 times)


Offline Liath.WW

  • Full Member
  • ***
  • Posts: 141
  • Karma: +1/-0
Unbound DNS intermittent failure
« on: February 02, 2018, 12:10:21 pm »
It would seem that DNS is failing intermittently, and it has really started to impact my day to day operation.

I'm using an old 2nd-gen i5. The board is fine, but the built-in NIC only runs at about 400 Mbps before bottlenecking, so I added an Intel D33682 2-port NIC. It's a genuine one, Intel logo and all, because I know there were some cheapo Chinese clones with crap capacitors and such.

The machine has had this setup since, well, 2nd-gen i5s were new. I haven't had much trouble with pfSense until this latest build, the one with the new interface and the loss of RRD graphs.  DNS has been a bit of an issue since that upgrade.  Lately it's so bad I'm pulling aggro from my family because 'the internet is broke'.

Only real hints I can think of are that I have an AT&T modem with IP-passthrough turned on, modem has all filtering off.
The logs will occasionally spam llinfo ARP resolution issues with the modem's IP, even though the link is up and passing traffic.
I also see in logs>system>DNS resolver that every 5 minutes like clockwork, it is evaluating and dropping some aliases:
Code: [Select]
.....lots of similar entries like
Feb 2 13:01:08 filterdns adding entry 54.239.172.202 to pf table Eve for host launcher.eveonline.com
Feb 2 12:56:08 filterdns IP address 52.84.128.4 already present on table Eve as address of hostname launcher.eveonline.com
...lots more of the same
Feb 2 12:56:08 filterdns adding entry 52.84.27.39 to pf table Eve for host resources.eveonline.com
Feb 2 12:51:08 filterdns clearing entry 52.84.133.166 from pf table Eve on host binaries.eveonline.com
....
Feb 2 12:51:08 filterdns adding entry 52.84.128.4 to pf table Eve for host launcher.eveonline.com
Feb 2 12:46:09 filterdns clearing entry 54.192.7.112 from pf table Eve on host resources.eveonline.com
In Status > System Logs > DHCP:
Code: [Select]
Feb 2 13:15:33 dhclient Creating resolv.conf
Feb 2 13:15:33 dhclient RENEW
Feb 2 13:10:33 dhclient Creating resolv.conf
Feb 2 13:10:33 dhclient RENEW
Feb 2 13:05:33 dhclient Creating resolv.conf
Feb 2 13:05:33 dhclient RENEW
Feb 2 13:00:33 dhclient Creating resolv.conf
Feb 2 13:00:33 dhclient RENEW
Feb 2 12:55:33 dhclient Creating resolv.conf
Feb 2 12:55:33 dhclient RENEW
In Status > System Logs > System > Gateways:
Code: [Select]
Feb 2 02:59:51 dpinger WAN_DHCP6 2001:4860:4860::8888: Clear latency 10158us stddev 1982us loss 16%
Feb 2 02:59:34 dpinger WAN_DHCP6 2001:4860:4860::8888: Alarm latency 9857us stddev 1487us loss 21%
I was up late last night trying to figure this out while family was asleep. In my tiredness I cleared logs for a fresh view since I was testing new cables, re-tipped even the factory tipped ones, etc. etc.  Wishing now I'd not done so.

When the DNS is on the fritz, connections that were already made continue passing traffic as normal. Streams keep streaming, SIP calls keep working, etc.  That rules out the connection dropping as the issue.  Only DNS seems to fail, so new connections can't be made.

Any clue what's going on and how to fix it?

Info that might be useful:
packages:
bandwidthd
darkstat
iperf   
mtr-nox11
openvpn-client-export
Service_Watchdog       << Added to *try* to resolve the DNS issues, thinking maybe the service was dying. Possibly related to the 5-minute interval with filterdns? I believe I added it because, before I did, unbound just died and stayed dead.

« Last Edit: February 02, 2018, 12:21:49 pm by Liath.WW »

Offline toluun

  • Newbie
  • *
  • Posts: 12
  • Karma: +0/-0
Re: Unbound DNS intermittent failure
« Reply #1 on: February 02, 2018, 01:22:13 pm »
Does a restart of unbound solve the issue? I have been having major issues with DNSSEC on unbound causing DNS failures. The same thing would happen to me: streams would continue, WAN gateways were shown as still up, etc.  Only new DNS lookups would fail.  Once I restarted unbound, everything would go back to normal for a short period of time, then BOOM, DNS failures. I am still trying to solve my issue (see a couple of posts down), but I did find that disabling DNSSEC stopped the DNS failures.  Not sure if this helps, but your problems seemed very similar to mine, so I thought I would comment with my temporary fix.

Offline Liath.WW

Re: Unbound DNS intermittent failure
« Reply #2 on: February 02, 2018, 02:51:25 pm »
I am going to say "yes" to this one. I'd been having issues with it dying before, and installed the watchdog package to automatically restart it.

From last night until shortly before I made this thread, the internet was generally unbrowsable due to constant DNS issues.  I rebooted the pfSense box a few hours ago and have had no more issues since. However, this is a recurring issue that seems to get worse until I get tired of it and reboot the entire network.

It really concerns me because I have business clients who I *really* want to migrate from SonicWall to pfSense, but if I replace them and DNS is going to act like this in a business production environment, I'll be looking for new clients.

Offline toluun

Re: Unbound DNS intermittent failure
« Reply #3 on: February 02, 2018, 03:06:07 pm »
Well, at least yours sounds a lot less frequent than mine.  My DNS would go down every 10 - 30 min. Do you have DNSSEC enabled on unbound?

Offline Gertjan

  • Hero Member
  • *****
  • Posts: 2435
  • Karma: +192/-9
Re: Unbound DNS intermittent failure
« Reply #4 on: February 02, 2018, 03:42:27 pm »
@Liath.WW : filterdns : Take a look at "binaries.eveonline.com" :

Code: [Select]
root@ns311465:~# host binaries.eveonline.com
binaries.eveonline.com is an alias for d17ueqc3zm9j8o.cloudfront.net.
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.137
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.7
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.11
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.177
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.52
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.156
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.186
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.181
A couple of seconds later, the list changes ! :
Code: [Select]
root@ns311465:~# host binaries.eveonline.com
binaries.eveonline.com is an alias for d17ueqc3zm9j8o.cloudfront.net.
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.11
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.137
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.52
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.181
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.7
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.177
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.156
d17ueqc3zm9j8o.cloudfront.net has address 13.32.153.186
So it's normal that filterdns is very busy every 5 minutes removing IPs and adding new ones.
filterdns is paid to do so.

Up to you to remove "binaries.eveonline.com" from your alias list, or complain to them ;)
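
As a side note on the two `host` runs quoted above, here is a minimal Python sketch that compares them. The address lists are copied from this post; the helper name `diff_answers` is made up for illustration. For these two particular samples the set of addresses turns out to be identical and only the order differs, so the adds and clears in the filterdns log (the 52.84.x.x and 54.x.x.x entries) would have to come from lookups where the answer set itself changed.

```python
# Addresses copied from the two `host binaries.eveonline.com` runs in this post.
first_run = [
    "13.32.153.137", "13.32.153.7", "13.32.153.11", "13.32.153.177",
    "13.32.153.52", "13.32.153.156", "13.32.153.186", "13.32.153.181",
]
second_run = [
    "13.32.153.11", "13.32.153.137", "13.32.153.52", "13.32.153.181",
    "13.32.153.7", "13.32.153.177", "13.32.153.156", "13.32.153.186",
]


def diff_answers(old, new):
    """Return (added, removed, reordered) between two DNS answer lists.

    Hypothetical helper: `reordered` is True only when both lists hold
    the same addresses but in a different order.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    reordered = not added and not removed and old != new
    return added, removed, reordered


added, removed, reordered = diff_answers(first_run, second_run)
print("added:", added)
print("removed:", removed)
print("same set, different order:", reordered)
```

Running the same comparison on answers collected a few minutes apart from the pfSense box itself would show whether filterdns really has something to add or clear on each pass.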

DNS: Are you using the DHCP client to obtain your WAN IP? Something is going very wrong with that. When I see it recreating "resolv.conf", I wouldn't be surprised if your local DNS server (unbound) is restarting. Every 5 minutes. Yep, you're right, consider your DNS to be in a very bad state. But this is not its fault.

Find out why your DHCP client is (forced?!) to renew every 5 minutes, just like when filterdns is running... Strange. It's time to describe your setup completely.
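
To put a number on the renewal cadence, the sketch below measures the gaps between the dhclient RENEW entries posted at the top of the thread. It is only an illustration: the log lines are copied from the first post, the function name `renew_intervals` is invented, and the year is an assumption because syslog lines do not carry one.

```python
from datetime import datetime

# RENEW lines copied from the DHCP log in the first post (newest first,
# which is how the pfSense GUI lists them).
LOG = """\
Feb 2 13:15:33 dhclient RENEW
Feb 2 13:10:33 dhclient RENEW
Feb 2 13:05:33 dhclient RENEW
Feb 2 13:00:33 dhclient RENEW
Feb 2 12:55:33 dhclient RENEW
"""


def renew_intervals(log_text, year=2018):
    """Return the seconds between consecutive dhclient RENEW entries."""
    stamps = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[3] == "dhclient" and parts[4] == "RENEW":
            # The year is supplied explicitly (assumed) so the timestamp parses fully.
            text = f"{year} {' '.join(parts[:3])}"
            stamps.append(datetime.strptime(text, "%Y %b %d %H:%M:%S"))
    stamps.sort()
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


intervals = renew_intervals(LOG)
print(intervals)
```

A constant gap like this points at a timer (for example a lease setting) rather than at random link trouble.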

Btw : unbound resolves up against the root DNS servers, and is ROCK solid as a DNS server.
Your issue is not DNSSEC related. DNSSEC activated for unbound works for thousands, if not tens of thousands, of pfSense installs, and on all the other servers that use unbound.

Offline Liath.WW

Re: Unbound DNS intermittent failure
« Reply #5 on: February 02, 2018, 06:06:15 pm »
FilterDNS runs after Unbound kicks the bucket and restarts.
Code: [Select]
Feb 2 16:30:10 filterdns adding entry 54.239.172.212 to pf table Eve for host binaries.eveonline.com
Feb 2 16:26:51 unbound 53607:0 info: start of service (unbound 1.6.6).
....
Feb 2 16:26:47 unbound 88185:0 info: service stopped (unbound 1.6.6).
Feb 2 16:25:09 filterdns clearing entry 52.84.133.127 from pf table Eve on host binaries.eveonline.com
...
Feb 2 16:25:09 filterdns adding entry 54.192.7.236 to pf table Eve for host launcher.eveonline.com
Feb 2 16:22:46 unbound 88185:0 info: start of service (unbound 1.6.6).
Feb 2 16:22:46 unbound 88185:0 notice: init module 1: iterator
Feb 2 16:22:46 unbound 88185:0 notice: init module 0: validator
...
Feb 2 16:22:32 unbound 67005:0 info: server stats for thread 0: 139 queries, 61 answers from cache, 78 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Feb 2 16:22:32 unbound 67005:0 info: service stopped (unbound 1.6.6).
Feb 2 16:21:43 unbound 67005:0 info: start of service (unbound 1.6.6).
Feb 2 16:21:43 unbound 67005:0 notice: init module 0: iterator
...
Feb 2 16:21:40 unbound 59480:0 info: server stats for thread 0: 566 queries, 183 answers from cache, 383 recursions, 6 prefetch, 0 rejected by ip ratelimiting
Feb 2 16:21:40 unbound 59480:0 info: service stopped (unbound 1.6.6).
Feb 2 16:20:12 filterdns adding entry 52.84.133.127 to pf table Eve for host binaries.eveonline.com
...
Feb 2 16:17:25 unbound 59480:0 info: start of service (unbound 1.6.6).
Feb 2 16:17:25 unbound 59480:0 notice: init module 1: iterator
Feb 2 16:17:25 unbound 59480:0 notice: init module 0: validator
Feb 2 16:17:22 unbound 18317:0 info: 4096.000000 8192.000000 1
...
If I understand you correctly, there is something happening that is causing unbound to restart.  How can I find the root cause?
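
One way to start narrowing it down is to measure how long each unbound process actually lived. The following is a rough Python sketch, not an official pfSense tool: the start/stop lines are copied from the log above, the function name `unbound_lifetimes` is made up, and the year is an assumption because the log lines do not include one.

```python
import re
from datetime import datetime

# Start/stop lines copied from the resolver log above (newest first).
LOG = """\
Feb 2 16:26:47 unbound 88185:0 info: service stopped (unbound 1.6.6).
Feb 2 16:22:46 unbound 88185:0 info: start of service (unbound 1.6.6).
Feb 2 16:22:32 unbound 67005:0 info: service stopped (unbound 1.6.6).
Feb 2 16:21:43 unbound 67005:0 info: start of service (unbound 1.6.6).
Feb 2 16:21:40 unbound 59480:0 info: service stopped (unbound 1.6.6).
Feb 2 16:17:25 unbound 59480:0 info: start of service (unbound 1.6.6).
"""

PATTERN = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) unbound (\d+):\d+ info: "
    r"(start of service|service stopped)"
)


def unbound_lifetimes(log_text, year=2018):
    """Map each unbound PID to the seconds between its start and stop lines."""
    events = {}
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        kind = "start" if match.group(3).startswith("start") else "stop"
        events.setdefault(match.group(2), {})[kind] = stamp
    # Only PIDs with both a start and a stop line can be measured.
    return {
        pid: int((e["stop"] - e["start"]).total_seconds())
        for pid, e in events.items()
        if "start" in e and "stop" in e
    }


lifetimes = unbound_lifetimes(LOG)
for pid, seconds in lifetimes.items():
    print(pid, seconds)
```

Lining up the stop times against the DHCP and system logs for the same seconds is then a reasonable next step for spotting whatever is triggering the restarts.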

One rabbit hole I fell down was because of the arp llinfo messages, but I don't have an example of them right now.  They do point to the IP of my ISP-provided modem, which I cannot get rid of. (I'm on fiber, and they said the system won't allow me to go straight from the "ONT?" (fiber<>eth bridge) to my router, but I admit I haven't tried to bypass it.)

The passthrough on the modem *is* weird. The device first hands out an address in the 192.168.1.x range, then once pass-through is handled it hands out the public facing IP.

I do see a bunch of this in the DHCP log, but I'm not 100% sure it's applicable:
Code: [Select]
Feb 2 15:44:44 dhclient Creating resolv.conf
Feb 2 15:44:44 dhclient RENEW
Feb 2 15:39:44 dhclient Creating resolv.conf
Feb 2 15:39:44 dhclient RENEW

Offline Liath.WW

Re: Unbound DNS intermittent failure
« Reply #6 on: February 02, 2018, 06:16:13 pm »
I think I may have stumbled upon something in the ISP modem config that could be causing this, though the times are different than the pfSense 5 minute issues.
In the IP-passthrough page, there is a Passthrough DHCP Lease. Default value is 10 minutes.  I changed to 1 day, hopefully this is the root cause and will fix things.
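
The timing may actually match better than it looks. By default a DHCP client tries to renew at half the lease time (T1 in RFC 2131) and falls back to rebinding at 87.5% (T2), so a 10-minute lease would produce a RENEW every 5 minutes. The sketch below just does that arithmetic; it assumes the modem does not send its own renewal or rebinding times, and `dhcp_timers` is a made-up helper name.

```python
def dhcp_timers(lease_seconds):
    """Return the RFC 2131 default (T1, T2) timers in seconds for a lease.

    Assumes the DHCP server does not override them with explicit
    renewal/rebinding time options.
    """
    t1 = lease_seconds * 0.5    # client starts renewing with its server
    t2 = lease_seconds * 0.875  # client starts rebinding with any server
    return t1, t2


for label, lease in (("10 minutes", 10 * 60), ("1 day", 24 * 60 * 60)):
    t1, t2 = dhcp_timers(lease)
    print(f"lease {label}: renew after {t1 / 60:g} min, rebind after {t2 / 60:g} min")
```

With the lease raised to 1 day, the same defaults would put renewals about 12 hours apart.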

FYI, the modem is this one:

Manufacturer   ARRIS
Model Number   BGW210-700

Offline Liath.WW

Re: Unbound DNS intermittent failure
« Reply #7 on: February 03, 2018, 11:12:58 am »
Haven't seen many more log entries about DNS/DHCP dying since I updated the thread last night.
Computers seem to be going well enough.  Phones still aren't too happy, though they're phones, no idea if there's something goofy going on with them.

Offline Liath.WW

Re: Unbound DNS intermittent failure
« Reply #8 on: February 05, 2018, 08:14:29 pm »
Forgot to update this because it was late and I was tired.  Had the services die again last night, unbound restarting itself.  Switched to just using dns forwarder and haven't had a peep from anything since.

Despite people saying that it isn't unbound DNS, that is the service with the symptom. If there are logs or configs that someone would like to have that might be able to help identify the issue, I'll be happy to provide them. I understand that some other service failure may be causing unbound to die and restart, but thus far all of the information I've seen and read doesn't solve the issue for me, and I've not seen any useful requests that yield results.

Unfortunately this means I can't pitch pfSense with dnssec as a selling point.  The rest of it works great, and I've been using pfSense as a whole for years.

I might be able to put it in production without unbound, but if I can't get a home setup stable, it makes me wonder if the underlying cause of unbound dying would end up impacting customers.

Offline johnpoz

  • Hero Member
  • *****
  • Posts: 15193
  • Karma: +1414/-206
  • Not a pfSense employee, they cannot fire me...
Re: Unbound DNS intermittent failure
« Reply #9 on: February 06, 2018, 05:05:47 am »
What aliases are you using?

Also unbound can restart when you have it set to register dhcp.
- An intelligent man is sometimes forced to be drunk to spend time with his fools.
- Please don't PM me for personal help
- if you want to say thanks applaud or https://www.freebsdfoundation.org/donate/
1x SG-2440 2.4.2-RELEASE-p1 (work)
1x SG-4860 2.4.2-RELEASE-p1 (home)

Offline Liath.WW

Re: Unbound DNS intermittent failure
« Reply #10 on: February 06, 2018, 04:57:16 pm »
Which type of aliases would you want to know about? I have a few that have FQDN in them, I have some that are IPs and some that are ports. Be happy to share if you think there may be something with them that is causing the issues, however I'm not sure I want to 'lift my skirt' in public so to speak :P

Also, I didn't have the option to register DHCP leases in the DNS resolver config, so while I wish it was that simple, it's not.  Although it does raise the question of why such an option would even be available if it causes instability?

Offline romainp

  • Full Member
  • ***
  • Posts: 139
  • Karma: +6/-0
Re: Unbound DNS intermittent failure
« Reply #11 on: February 06, 2018, 07:08:26 pm »
Hi,
I've also got a really strange DNS issue...
From time to time, DNS resolution does not work at all or takes a very long time. It can happen several times per hour, and all systems connected to pfSense are affected. It is not my ISP or the DSL router I am connected to, because if I do a nslookup google.com 8.8.8.8 it works perfectly, but if I use the internal pfSense DNS it fails.
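
The same side-by-side check can be scripted so it can be left running until the next failure. This is a minimal sketch using only the Python standard library: it sends a plain A-record query over UDP to each server and times the reply. The names `build_query`, `time_lookup` and `compare` are made up for this example, and 192.168.1.1 in the usage note is only a placeholder for the pfSense LAN address.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (recursion desired)."""
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_lookup(server, name, timeout=3.0):
    """Return seconds until `server` sends any reply, or None on timeout/error.

    The reply is not validated, so an error response such as SERVFAIL
    still counts as "answered".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        started = time.monotonic()
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return None
        return time.monotonic() - started


def compare(servers, name):
    """Time the same lookup against each server in turn."""
    return {server: time_lookup(server, name) for server in servers}
```

Calling `compare(["192.168.1.1", "8.8.8.8"], "google.com")` in a loop with a timestamp would show whether only the pfSense resolver stalls during the bad periods.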

It first happened some weeks ago. At the time I thought the root cause might be that I had done several pfSense upgrades without a real clean installation. So I made a backup, installed from scratch, and restored my config, and everything was fine until today. The only thing I changed yesterday was installing the traffic total package.
I don't see any obvious reason why I got this issue but I will try to investigate more.

Thanks.
R.

Offline johnpoz

Re: Unbound DNS intermittent failure
« Reply #12 on: February 07, 2018, 04:19:56 am »
How many entries?  FQDNs have to be looked up every so often. If you have hundreds of FQDNs and they all return lots of IPs, then sure, that could be a contributing factor.

Not sure if it's still an issue, but registering DHCP restarts unbound. So if you have hundreds of DHCP clients and/or very short lease times, you could have unbound restarting every few minutes, which would for sure cause a problem with clients actually being able to look up anything ;)

Also, if filterdns has to look up 1000s of FQDNs with very short TTLs every few minutes, that could also be a problem, depending...

Offline romainp

Re: Unbound DNS intermittent failure
« Reply #13 on: February 07, 2018, 11:53:25 am »
Hi,
It's just a home setup with max 30 fixed DNS entries. I also use pfBlockerNG, but even if I stop it I still have this strange behaviour. I understand that unbound could be restarted when a DHCP client registers itself with the DNS, but it should not take 30 sec for the DNS to work again...

The problem is that I don't see obvious reasons in the logs that could explain this...

Offline johnpoz

Re: Unbound DNS intermittent failure
« Reply #14 on: February 07, 2018, 12:24:14 pm »
Is it restarting or not?  I have been running unbound on a home setup in resolver mode in pfSense since before it was included, back when it was a package.  Have never had any such issues other than the DHCP restart thing.

I really see no point in registering DHCP in a home setup.  All the devices I care about have reservations, so I know what their IPs are, and yes, the static entries are registered.  Devices that are just going to get some random IP out of the pool are going to be guest sort of devices, and I don't give 2 shits what their name or IP is, etc.  They are only going to be on the network temporarily... If they were always going to be on the network and I wanted to resolve them, they would have reservations for an IP, etc.