
Author Topic: Round robin DNS with tinydns: changing + records to - when a gateway is down  (Read 12314 times)


Offline dpg2

  • Newbie
  • *
  • Posts: 12
  • Karma: +0/-0
    • View Profile
This post concerns the use of multiple public IP addresses from different ISPs pointing to the same internal server to provide network services over different network paths. The goals of such a setup are to keep a service available even if one ISP is down, and to balance services over multiple network paths.

This is referred to as "round robin DNS" (http://en.wikipedia.org/wiki/Round_robin_DNS). It is accomplished by publishing multiple A records that point to the same server (or server pool). The DNS server hands out these addresses in a different order with each request. The client's resolver then selects one address from the returned list and connects to it. The net effect is that inbound requests arrive via the different published addresses in a "round robin" fashion.
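To make the reordering concrete, here is a minimal sketch in Python. It assumes a strict rotation and a client that always picks the first address returned; real servers may shuffle randomly instead, and the addresses are documentation-range placeholders rather than anything from my setup.

```python
from collections import deque

def rotated_answers(addresses, queries):
    """Yield the A-record list as a strictly rotating server would order it."""
    ring = deque(addresses)
    for _ in range(queries):
        yield list(ring)
        ring.rotate(-1)  # the next query starts from the next address

# Placeholder addresses (RFC 5737 ranges) standing in for two ISPs.
ADDRESSES = ["192.0.2.10", "198.51.100.10"]

answers = list(rotated_answers(ADDRESSES, 4))
# Assumption: each client connects to the first address in its answer.
first_choices = [answer[0] for answer in answers]
```

With two addresses and four queries, the first choices alternate between the two ISPs, which is the load-spreading effect described above.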

I've seen some unanswered posts in the forum regarding the question of how multiple RRs for a given resource are handled when the inbound gateway associated with the record is down:

http://forum.pfsense.org/index.php/topic,5937.0.html
http://forum.pfsense.org/index.php/topic,5932.0.html

I've read the pfsense book but I'm still unclear on this topic.

I've read RFC 1794 (http://tools.ietf.org/html/rfc1794) and have done a fair bit of research on the topic in general, including the approaches taken with BIND 9 (sdb driver), PowerDNS (pipe backend), an LDAP-backed solution (ldapdns), Eddieware Enhanced DNS (lbdns), and an implementation in Perl (lbnamed).

I'm also aware of DJB's page on this topic (http://cr.yp.to/djbdns/balance.html).

I'm prepared to contribute code to the pfsense project that prefixes the RRs associated with a down gateway with the "-" character in the 'data' file (followed by 'make' to rebuild the binary database), as suggested by DJB, and restores the "+" prefix when the gateway returns (again followed by 'make'). These records would have a short time to live (TTL).
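A rough sketch of the prefix flip, in Python, might look like the following. The function name, the sample records, and the addresses are all hypothetical; it assumes the usual tinydns-data layout in which the second colon-separated field of a "+" line is the IP address, and it only rewrites lines in memory (writing the file and running 'make' are left out).

```python
def toggle_gateway_records(lines, gateway_ip, gateway_up):
    """Return data-file lines with the "+"/"-" records for gateway_ip flipped.

    Hypothetical helper: "+" lines are served, "-" lines are ignored
    when the data file is compiled.
    """
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(":")
        is_target = (
            line[:1] in ("+", "-")
            and len(fields) >= 2
            and fields[1] == gateway_ip
        )
        if is_target:
            line = ("+" if gateway_up else "-") + line[1:]
        out.append(line)
    return out

# Made-up records with a short TTL (60 seconds) on the balanced name.
DATA = [
    "+www.example.com:192.0.2.10:60",
    "+www.example.com:198.51.100.10:60",
    "@example.com::mail.example.com:10:3600",
]

down = toggle_gateway_records(DATA, "198.51.100.10", gateway_up=False)
restored = toggle_gateway_records(down, "198.51.100.10", gateway_up=True)
```

One caveat with this approach: when the gateway returns, every "-" line for that address is re-enabled, including any an administrator had disabled by hand, so a real implementation would probably need to track which lines it changed.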

I'm wondering if there is general interest in this feature and whether anyone has pointers regarding how best to implement this with the current tinydns server package with pfsense 1.2.3.

Of course, if the current code does this already, then accept my great thanks for implementing the feature!   :)


Offline dpg2

OK, so clearly this feature has been implemented ("a day in the library saves an hour in the lab"). I've finally started playing with the pfsense implementation and I can see that there are a number of settings in the "DNS Servers" GUI under "Services" to handle fail over and load balancing.

For others who might be perusing the forums to see what pfsense has to offer, the "Failover" section of the GUI includes the following form fields: "IP to ping to ensure service is up" and "Time in minutes before DNS switches to backup host". It then provides one or more entries for failover records, each of which has the following form fields: "Failover IP", "Load balance" check box, "Ping threshold", "Wan ping threshold", and "IP to ping to ensure service is up".

I haven't started testing the service yet, but every change I make in the GUI produces the following error:

"""
Warning: Missing argument 6 for tinydns_get_rowline_data() in /usr/local/pkg/tinydns.inc on line 607
Warning: Missing argument 6 for tinydns_get_rowline_data() in /usr/local/pkg/tinydns.inc on line 607
Warning: Cannot modify header information - headers already sent by (output started at /usr/local/pkg/tinydns.inc:607) in /usr/local/www/pkg_edit.php on line 35
"""

I'm running the 1.2.3 release. The changes I make do seem to be sticking in the GUI.

Offline dpg2

The warning from tinydns_get_rowline_data() can be resolved by giving the sixth parameter, $dist (the MX record priority), a default value. The function definition starts at line 607 in tinydns.inc, and later in its body $dist is set to "10" if it is initially unset, so "10" is already the effective default in the code.

So the change is at line 609: """ $rdns, $dist) {""" changes to """ $rdns, $dist = "10") {""".

Offline dpg2

So, the failover works great, just as expected for the most part.

The problem I'm seeing now is that tinydns seems to lock up after running for more than a day. The tinydns process is still running, but netstat shows a large receive queue (>60000) and the daemon isn't answering any queries.
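For anyone who wants to watch for this condition, here is a small Python sketch that scans netstat-style output for sockets with a large receive queue. It assumes the column order Proto, Recv-Q, Send-Q, Local Address; the sample text, the address in it, and the threshold of 1000 are made up for illustration and are not captured from my system.

```python
def backed_up_sockets(netstat_text, threshold=1000):
    """Return (local_address, recv_q) pairs whose Recv-Q exceeds threshold."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows are assumed to be: proto recv-q send-q local-address ...
        # Header and title lines fail the isdigit() check and are skipped.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        recv_q = int(fields[1])
        if recv_q > threshold:
            hits.append((fields[3], recv_q))
    return hits

# Illustrative sample in the assumed layout, not real output.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
udp4   61440      0 192.0.2.10.53          *.*
udp4       0      0 127.0.0.1.123          *.*
"""

stuck = backed_up_sockets(SAMPLE)
```

A queue that stays this full on the DNS port while the process is alive suggests the daemon has stopped reading from its socket, which matches what I was seeing.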

I've seen some earlier posts regarding this issue. Is there a cure?

Offline dpg2

OK, I found that for some reason my log file directory was owned by uid 1004 rather than by 'Gdnslog'; the same was true of the 'status' file. The default log file directory is /etc/tinydns/log. The logging process was defunct, so perhaps the tinydns service had a problem with the bad logging setup and stopped responding to requests.

I cleared up this problem with the following steps:

(1) /usr/local/etc/rc.d/svscan.sh stop
(2) make sure that all dns related processes are dead (ps awux | grep dns)
(3) mv /etc/tinydns /etc/tinydns.old
(4) /usr/local/bin/tinydns-conf Gtinydns Gdnslog /etc/tinydns 127.0.0.1
(5) cp /etc/tinydns.old/root/data* /etc/tinydns/root
(6) /usr/local/etc/rc.d/svscan.sh start
(7) cd /etc/tinydns/log && ls -l

The 'status' file and 'main' directory (and its contents) should now be owned by Gdnslog:Gdnslog.
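The ownership check in step (7) could also be scripted. This is only a sketch, assuming Python is available on the box; the helper name is hypothetical, and the self-check below runs against a throwaway directory rather than the real log directory.

```python
import os
import tempfile

def wrong_owner(paths, expected_uid):
    """Return the paths whose owning uid differs from expected_uid."""
    return [p for p in paths if os.stat(p).st_uid != expected_uid]

# Self-check using a temporary directory owned by the current user.
with tempfile.TemporaryDirectory() as tmp:
    status = os.path.join(tmp, "status")
    open(status, "w").close()
    ok = wrong_owner([tmp, status], os.getuid())
    bad = wrong_owner([tmp, status], os.getuid() + 1)
```

On the firewall itself, the expected uid would come from pwd.getpwnam("Gdnslog").pw_uid, and the paths to check would be /etc/tinydns/log/status and /etc/tinydns/log/main. An empty result means the ownership is as it should be.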

The web view of the tinydns logs now also works properly.