Author Topic: Optimizing rc.bootup  (Read 8219 times)

Offline ssheikh

  • Full Member
  • ***
  • Posts: 131
  • Karma: +2/-0
Re: Optimizing rc.bootup
« Reply #15 on: September 05, 2013, 08:10:54 pm »
The only two static routes in there are the result of two of the DNS servers being configured to use a non-default gateway.

There seem to be too many things stepping on each other in the bootup process for it to happen cleanly, especially in a CARP HA cluster environment.

I had one bootup instance in which check_reload_status didn't start in /etc/rc (because I had commented it out there). While rc.bootup was running, it wound up firing off 3 instances of check_reload_status because (I guess) the send_event functions in util.inc attempt to start check_reload_status. It attempts to start it 3 times without ever waiting long enough in each iteration to allow check_reload_status to actually start executing.
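To illustrate the "start it once, then wait" behavior that seems to be missing here, a minimal Python sketch follows. It is not the pfSense code: ensure_started, is_running and start are hypothetical names, and the real send_event logic in util.inc is PHP and may differ.

```python
import time
from typing import Callable


def ensure_started(
    is_running: Callable[[], bool],
    start: Callable[[], None],
    timeout: float = 5.0,
    poll_interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Launch a daemon at most once, then poll until it reports running.

    is_running and start are hypothetical callbacks supplied by the caller
    (for example, a pidfile check and a process launch).
    """
    if is_running():
        return True  # already up, nothing to launch

    start()  # launched exactly once per call, never in a retry loop

    waited = 0.0
    while waited < timeout:
        if is_running():
            return True
        sleep(poll_interval)
        waited += poll_interval

    # One last check after the timeout has expired.
    return is_running()
```

The point of the sketch is that the launch happens outside the polling loop, so a slow start cannot result in several instances.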

At this point I am changing my configuration around to make sure that I am not using gateway groups for anything other than switching the default route and making sure that services are bound to actual interfaces and not to CARP VIPs.

As for my full mesh OpenVPN tunnels, I am now planning to have the primary at each site connect to the primary at all the other sites, and the backup at each site connect to the backup at all the other sites, and then see if I can make my routing decisions through OSPF. For the client to reestablish a connection to the server when the server's default gateway switches, I plan on using an extra remote statement in the client. On the server side I will have two servers for each remote site, one bound to each of the WAN interfaces.

The moment I try to bind OpenVPN to a CARP VIP or a gateway group, unpredictable things start happening during the bootup process. So I am trying to keep all local services off of VIPs and gateway groups in the hope of making this setup a bit more stable.

Offline individual-it

  • Newbie
  • *
  • Posts: 11
  • Karma: +0/-0
Re: Optimizing rc.bootup
« Reply #16 on: September 05, 2013, 11:44:15 pm »
After we added VLANs to our system last week we could not bring it up correctly. After boot there was no GUI, but we could easily restart it over SSH. Then everything looked like it was working, but routing was not. Outside it got darker and darker and we got pretty hungry while we tried to find the mistake in our configuration. After a long time we found out that it wasn't a mistake in the configuration: the firewall and the routing processes didn't start. They crashed during boot.
The only quick and dirty solution that came to our tired minds was to put sleep() loops in rc.bootup so we would have a running system the next morning.

So the changes phil was talking about are an improvement on this solution. They make it more general and configurable.
I just made a pull request out of the changes: https://github.com/pfsense/pfsense/pull/798

I understand that this solution might not be the nicest, and sure enough you can call it a kludge. But I would also argue it's a bug fix, as 2.1 is supposed to run on a 256MB system but does not in every case.

We cannot start our Alix boards without these changes, so please consider taking them into 2.1.
If you don't want to have this hack in the master branch, I'm happy to rewrite it and resubmit it against 2.1. But I suspect you could just merge the changes automatically into 2.1, as every change was tested on 2.1 systems.

php_fpm should help.
The other thing I've noticed is that we have PHP memory_limit set to 128M. Do we need that much, or can we reduce it?

Offline ermal

  • Hero Member
  • *****
  • Posts: 3832
  • Karma: +85/-5
Re: Optimizing rc.bootup
« Reply #17 on: September 06, 2013, 02:35:54 am »
Usually, as it is for 2.1, you are probably doing too much on one system (just a guess).
If you want this to be made a priority on the fix list, I would rather push it through commercial support to fix the low-memory platforms.
The only way to fix this is by controlling PHP forking, and I know because I have a lot of low-memory systems myself.

Any other solution, like yours with sleep, is not sustainable since the behavior is unpredictable.
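For readers unfamiliar with the idea, "controlling forking" can be pictured as putting a hard cap on how many child processes run at the same time. The Python sketch below shows only that general idea; run_bounded and max_children are hypothetical names, and this is not how pfSense or php_fpm implements it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_bounded(commands: Sequence[List[str]], max_children: int = 1) -> List[int]:
    """Run each command as a child process, with at most max_children
    alive at any moment. Returns the exit codes in input order."""

    def run_one(cmd: List[str]) -> int:
        # subprocess.run blocks its worker thread until the child exits,
        # so the pool size is also the cap on concurrent children.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_children) as pool:
        return list(pool.map(run_one, commands))
```

With max_children=1 the commands run strictly one after another, which is the deterministic version of what the sleep approach tries to approximate with timing.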

Offline individual-it

  • Newbie
  • *
  • Posts: 11
  • Karma: +0/-0
Re: Optimizing rc.bootup
« Reply #18 on: September 08, 2013, 11:16:14 pm »
Hi ermal,
thanks for your quick answer.

Yes, we are doing quite a lot on the system, but as we are based in Nepal we have no chance at the moment of getting another one with more RAM. We stripped it down to the minimum set of services we need but still have problems. And I doubt our charity can pay for commercial support.

Could you explain why you think this fix is unpredictable? By default it does not do anything; it only becomes active if the administrator enters values in the "Maximum delay time" field.

I agree with you that the problem is PHP and not the services themselves. But for me that is even more of an argument for using the sleeping solution as a temporary fix. We basically wait until one PHP process has finished before we start a new one.

Controlling the PHP forking would of course be a much better way of solving this problem. But as you said yourself, it's too late to have that change in 2.1. That would mean there would be no solution until 2.2 arrives.
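As a rough illustration of the behavior described above (inactive by default, otherwise wait for the previous process but never longer than the configured maximum), here is a small Python sketch. It is not the code from the pull request, which changes PHP in rc.bootup; wait_for_idle, is_busy and max_delay are hypothetical names chosen for this example.

```python
import time
from typing import Callable


def wait_for_idle(
    is_busy: Callable[[], bool],
    max_delay: int = 0,
    step: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sleep in `step`-second increments while is_busy() returns True,
    giving up once max_delay seconds have been spent.

    max_delay=0 (the default) disables waiting entirely, mirroring a
    "Maximum delay time" field that was left empty.
    Returns the number of seconds waited.
    """
    waited = 0
    while waited < max_delay and is_busy():
        sleep(step)
        waited += step
    return waited
```

A boot script could call something like this between two steps, with is_busy checking whether the previous PHP process is still running.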

What about having a workaround in 2.1 and going for a proper solution in 2.2?