Netgate m1n1wall

Author Topic: Updated today to latest snap - very slow  (Read 16321 times)

0 Members and 1 Guest are viewing this topic.

Offline bardelot

  • Full Member
  • ***
  • Posts: 176
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #30 on: April 19, 2012, 04:31:46 pm »
I never did have a problem with my card that I had running 2.0.1 and firmware upgraded to 2.1. Only with a fresh image.

On my system it's exactly the other way around. The fresh image (same snapshot) is much faster. Although it might also be the CF card which I replaced from 1G to 4G. The datarate while downloading a snapshot increased from 20kbps up to 500kbps, and any saving operation only takes a few seconds again.

Offline markky

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #31 on: April 19, 2012, 09:42:49 pm »
Things are smooth/snappy on my ALIX with a snapshot from today, I've not seen this slowness myself. Saving does take about 6-7 seconds but it always has on ALIX.

Using HTTPS to get to the GUI, everything seems happy.


I have a soekris 4801 (and a couple of 4501's, but not enough ram).  Works fine with 2.0.1, but every 2.1 snapshot I've tried in the last week or two exhibits the stall problem.

A number of people have mentioned shell commands like 'ls' hanging etc.  If you type ^T (ctrl T) during one of these stalls you'll see that the process is waiting on suspfs, which is one of two functions in kern/vfs_vnops.c (both vn_start_write and vn_start_secondary_write call msleep with the same wmesg string.  usually the wait message is unique so that there is no confusion which [tm]sleep we're waiting in.  I think this is a bug in itself and the msleep in vn_start_secondary_write should be changed to "suspf2" or something similar to differentiate it ).

Anyway, seeing that processes are stalling in either of these functions leads you back to the mount updates being the cause.  If you run /etc/rc.conf_mount_ro from a shell, followed by /etc/rc.conf_mount_rw I see a stall on the second command everytime.  ^T reports that the mount process is waiting on biowr.

I put some logging into /etc/inc/config.lib.php (where conf_mount_ro and rw function reside) to determine which of the 4 mount updates was stalling (syslog conveniently timestamps the messages) and it's the read only mount update of /.  On my system it's a stall of between 107 and 110 seconds.

I can't reproduce the behaviour using the same image under virtualbox.

The CF card in this 4801 is a cheap card with a brand I had not heard of, I'm going to try a sandisk card when I can find it.

 - Mark



I forgot to say that this is likely a race condition, probably triggered by the longer flash write time.  It sounds to me very similar to
   http://www.freebsd.org/cgi/query-pr.cgi?pr=149022
which apparently was seen on normal drives under heavy load.  The bug wasn't resolved and was closed due to lack of feedback.
There was a query about whether a fix to the soft-update code fixed it.
Soft-updates aren't enabled on the flash device.  However I did try enabling soft-updates just to see if a change in the write timing might work around the issue I was seeing.

Would be much easier to debug if I could get a 2.1 build environment working, but that's the topic of another post...

"show mount" from the kernel debugger would be very interesting.

 - Mark

Offline bao

  • Newbie
  • *
  • Posts: 7
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #32 on: April 20, 2012, 10:13:27 am »
Recently we changed the builder to use conv=sparse when making image files to save time/space on the builders. Not sure if that's related, but I backed it out today. We'll have to wait and see if the snapshots from tomorrow, when freshly imaged, work better.

I noticed something different, which may be related.

In the create_nanobsd_diskimage (), a change was made to the zeroing of the nanobsd image:

        dd if=/dev/zero of=${IMG} bs=${NANO_SECTS}b \
           count=0 seek=`expr ${NANO_MEDIASIZE} / ${NANO_SECTS}

That did not really do anything.

Later on, the dd command was used to write the image, with "conv=sparse".  I don't think it matters at this time, since the image probably contains a lot of garbage.

We changed it back for our purposes, since we build 8G and 16GB nanobsd images and we can't compress them to reasonable sizes.

        dd if=/dev/zero of=${IMG} bs=${NANO_SECTS}b \
           count=`expr ${NANO_MEDIASIZE} / ${NANO_SECTS}

On another note, the last patch from ermalluci broke the build process in the last couple of days.  We made changes by hands to the headers to get it going again, i.e.

From
........
diff --git a/sys/netinet/ip_carp.c b/sys/netinet/ip_carp.c
index a4890dd..5b5fb19 100644
--- a/sys/netinet/ip_carp.c
+++ b/sys/netinet/ip_carp.c
.........
To
.........
Index: sys/netinet/ip_carp.c
===================================================================
--- sys/netinet/ip_carp.c
+++ sys/netinet/ip_carp.c
.........


Offline databeestje

  • Administrator
  • Hero Member
  • *****
  • Posts: 1048
  • Karma: +0/-0
  • It just might be your luck day, if you only knew.
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #33 on: April 20, 2012, 04:45:23 pm »
the count=0 and seek=number means we don't actually write the image out with zeros to disk.

When reading the file it will have the correct size and everything will be returned as 0.

But you don't actually write an entire 4-16GB file. So it's much faster.

Offline podilarius

  • Hero Member
  • *****
  • Posts: 1768
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #34 on: April 20, 2012, 04:52:54 pm »
Quote
On another note, the last patch from ermalluci broke the build process in the last couple of days.  We made changes by hands to the headers to get it going again, i.e.

The latest updates seems to have fixed the issue. I ran through the patches and they seem to be fine now. I am going to complete a build tonight and I am guessing the build servers will also produce a snapshot sometime later tonight.

Offline moullas

  • Newbie
  • *
  • Posts: 12
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #35 on: April 22, 2012, 04:34:46 pm »
So.. I've updated to snapshot 2.1-DEVELOPMENT (i386) built on Fri Apr 20 22:46:42 EDT 2012, however I did not re-flash, just update from the Webif

Speed is still the same (slow).

Should I do a full re-image to see if this got fixed?

Offline podilarius

  • Hero Member
  • *****
  • Posts: 1768
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #36 on: April 22, 2012, 09:33:03 pm »
I ran mine as an update, but, I had just completed a clean install. I would backup and re-image to to make sure there is nothing holding you back.

Online phil.davis

  • Hero Member
  • *****
  • Posts: 2374
  • Karma: +10/-0
    • View Profile
    • International Nepal Fellowship
Re: Updated today to latest snap - very slow
« Reply #37 on: April 24, 2012, 01:26:11 am »
Alix 2D13, 2G CF card
2.1-DEVELOPMENT (i386)
built on Mon Apr 23 08:11:16 EDT 2012
FreeBSD 8.3-RELEASE

I just flashed this current build to a CF card.
"mount -uw /" and "mount -ur /" are both <<1 second straight after a basic wizard setup.
After installing cron, pfblocker, squid and squidguard, "mount -ur /" is 10 to 15 seconds - I guess it is doing some extra thinking about all the files that are open on "/" before switching it to read-only.

I am also wondering about the mount state of "/" immediately after boot. I had assumed it would be mounted read-only, but if I use the webGUI Diagnostics, Command Prompt to do "mount -v" then this is what is displayed:

$ mount -v
/dev/ufs/pfsense0 on / (ufs, local, noatime, synchronous, fsid c050954f24ebab91)
devfs on /dev (devfs, local, fsid 00ff000303000000)
/dev/md0 on /tmp (ufs, local, fsid 8843964f53e8126e)
/dev/md1 on /var (ufs, local, fsid 8843964f5d6ec7d7)
/dev/ufs/cf on /cf (ufs, local, noatime, synchronous, fsid 0758954f63bcd04b)
devfs on /var/dhcpd/dev (devfs, local, fsid 01ff000303000000)

No read-only to be seen.

Same thing if a access the console, option 8 "Shell" and "mount -v".

Is this expected? Maybe the code that does the commands or drops to the shell enables write before giving control to the user command?
If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

Offline markky

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #38 on: April 24, 2012, 03:32:58 am »
Things are smooth/snappy on my ALIX with a snapshot from today, I've not seen this slowness myself. Saving does take about 6-7 seconds but it always has on ALIX.

Using HTTPS to get to the GUI, everything seems happy.


I have a soekris 4801 (and a couple of 4501's, but not enough ram).  Works fine with 2.0.1, but every 2.1 snapshot I've tried in the last week or two exhibits the stall problem.

A number of people have mentioned shell commands like 'ls' hanging etc.  If you type ^T (ctrl T) during one of these stalls you'll see that the process is waiting on suspfs, which is one of two functions in kern/vfs_vnops.c (both vn_start_write and vn_start_secondary_write call msleep with the same wmesg string.  usually the wait message is unique so that there is no confusion which [tm]sleep we're waiting in.  I think this is a bug in itself and the msleep in vn_start_secondary_write should be changed to "suspf2" or something similar to differentiate it ).

Anyway, seeing that processes are stalling in either of these functions leads you back to the mount updates being the cause.  If you run /etc/rc.conf_mount_ro from a shell, followed by /etc/rc.conf_mount_rw I see a stall on the second command everytime.  ^T reports that the mount process is waiting on biowr.

I put some logging into /etc/inc/config.lib.php (where conf_mount_ro and rw function reside) to determine which of the 4 mount updates was stalling (syslog conveniently timestamps the messages) and it's the read only mount update of /.  On my system it's a stall of between 107 and 110 seconds.

I can't reproduce the behaviour using the same image under virtualbox.

The CF card in this 4801 is a cheap card with a brand I had not heard of, I'm going to try a sandisk card when I can find it.

 - Mark



I forgot to say that this is likely a race condition, probably triggered by the longer flash write time.  It sounds to me very similar to
   http://www.freebsd.org/cgi/query-pr.cgi?pr=149022
which apparently was seen on normal drives under heavy load.  The bug wasn't resolved and was closed due to lack of feedback.
There was a query about whether a fix to the soft-update code fixed it.
Soft-updates aren't enabled on the flash device.  However I did try enabling soft-updates just to see if a change in the write timing might work around the issue I was seeing.

Would be much easier to debug if I could get a 2.1 build environment working, but that's the topic of another post...

"show mount" from the kernel debugger would be very interesting.

 - Mark

Well after way too much messing around, and finally abandoning the prebuilt developer ova virtual image, starting with a fresh install of freebsd 8.3 worked far simpler.

I upgraded my soekris 4801 to my first self built upgrade test image (from build_nano.sh).  That worked fine, but still exhibits the stall condition.
Then I tweaked build_nano.sh so that I had a ddb enabled kernel (i.e. call build_embedded_dev_kernel instead of build_embedded_kernel).
This kernel sees the same issues.  show mount is indeed very interesting.  There are way way more secondary writes happening than I'd expect on a quiescent filesystem.  I haven't had time to look at this in depth, but it kind of looks like every open vnode is resulting in an update write of the inode, despite the fact noasync is set...  It's been a bit problematic investigating as show mount root produces so much output that it uncovered a bug in the usb serial interface driver I use to to connect to the 4801 (generic prolific usb device, the osx driver seems to not handle overruns very well).

After poking around a little in the kernel debugger, I finally got around to trying the exact same image on a sandisk flash card (by dd'ing from the cheapo card to the sandisk card).  Interestingly enough, there were no stalls going from read only to read write when I was using the (higher spec) sandisk flash.  I speculated earlier that this seemed like a race condition, this seems to confirm at least that timing is an important factor.

Many have also suggested that there's a difference in behaviour between an upgrade an a fresh install of the flash card.  I doubt it actually has any impact, but I'll run that test tomorrow on the cheapo card to eliminate/confirm it.

 - Mark

Offline jimp

  • Administrator
  • Hero Member
  • *****
  • Posts: 14998
  • Karma: +4/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #39 on: April 24, 2012, 03:12:15 pm »
I only had the problem with a full fresh image. I haven't tried a new one yet, but I can write out the exact same CF I had problems with.

Alix 2D13, 2G CF card
2.1-DEVELOPMENT (i386)
built on Mon Apr 23 08:11:16 EDT 2012
FreeBSD 8.3-RELEASE

I just flashed this current build to a CF card.
"mount -uw /" and "mount -ur /" are both <<1 second straight after a basic wizard setup.
After installing cron, pfblocker, squid and squidguard, "mount -ur /" is 10 to 15 seconds - I guess it is doing some extra thinking about all the files that are open on "/" before switching it to read-only.

So you're saying that, on the same CF card that was slow with previous images, it's now fast with a new fresh image?

I am also wondering about the mount state of "/" immediately after boot. I had assumed it would be mounted read-only, but if I use the webGUI Diagnostics, Command Prompt to do "mount -v" then this is what is displayed:

No read-only to be seen.

Same thing if a access the console, option 8 "Shell" and "mount -v".

Is this expected? Maybe the code that does the commands or drops to the shell enables write before giving control to the user command?

Diag > Command does switch to rw. The shell does not. However, some packages can monkey with what happens there... In particular, after a firmware update that reinstalls packages, often you have to reboot after the update before it goes to ro properly.
Need help fast? Commercial Support!

Co-Author of pfSense: The Definitive Guide. - Check the Doc Wiki for FAQs.

Do not PM for help!

Offline markky

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #40 on: April 25, 2012, 12:56:29 am »
...

Many have also suggested that there's a difference in behaviour between an upgrade an a fresh install of the flash card.  I doubt it actually has any impact, but I'll run that test tomorrow on the cheapo card to eliminate/confirm it.

 - Mark

As expected, the slow CF card still exhibits stalls in mount rw -> ro irrespective of whether the image is an upgrade or a complete reimage of the flash.

As far as I can remember, my slow CF card can be written at just under 4 MB/s from the dev virtual machine.  The sandisk card by comparison writes at over 5MB/s.

 - Mark

Online phil.davis

  • Hero Member
  • *****
  • Posts: 2374
  • Karma: +10/-0
    • View Profile
    • International Nepal Fellowship
Re: Updated today to latest snap - very slow
« Reply #41 on: April 25, 2012, 11:59:48 pm »
2.1-DEVELOPMENT (i386)
built on Wed Apr 25 15:47:29 EDT 2012
FreeBSD 8.3-RELEASE

Here is some testing to observe what happens with read-only on a fresh 2.1 nanobsd system:
After initial boot, from console shell (option 8), mount -v - "/" and "/cf" are read-only - good.
During the wizard they remain read-only, after clicking "reload" to implement the wizard settings, they go to read-write, but then they stay that way. I tried logging off from the webGUI, still read-write - unexpected but most people will do more configuration at this point and probably reboot anyway.
After another reboot they are read-only again - good.
After defining an alias and applying, they are read-only - good. So a simple change like defining an alias is putting "/cf" back to read-only.

"mount -ur /" and "mount -uw /" are really quick (<<1sec) on this test box with a simple configuration.
If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

Online phil.davis

  • Hero Member
  • *****
  • Posts: 2374
  • Karma: +10/-0
    • View Profile
    • International Nepal Fellowship
Re: Updated today to latest snap - very slow
« Reply #42 on: April 26, 2012, 12:39:39 am »
I installed cron and pfblocker - the installation of those packages correctly switches to read-write then back to read-only when the installation is done.
Then I installed squid - at the end of the installation, "/" and "/cf" are both left read-write.
After a reboot, "/" and "/cf" are still both read-write.
So squid does something to the system that negates the read-only filesystems.
If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

Offline tix

  • Jr. Member
  • **
  • Posts: 45
  • Karma: +0/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #43 on: April 26, 2012, 01:23:04 am »
I would like to add that I have the slow response problem as well on my Firebox x700 (non "e") that will last for about 20 - 30 mins after booting.  After that the webif responds a what consider a normal speed.

I've also noticed that on every boot that all filesystems are RW and remain that way unless I remount them RO.  However, if I force /cf to be RO then the web interface returns to 'normal' speeds.

I do have the LCDProc-dev package and the OpenVPN Client Export Utility installed but I'm going to uninstall them to see if that corrects the RW filesystem problem at least.

Offline stephenw10

  • Hero Member
  • *****
  • Posts: 8177
  • Karma: +12/-0
    • View Profile
Re: Updated today to latest snap - very slow
« Reply #44 on: April 26, 2012, 06:31:42 am »
Try testing with the /etc/rc.conf_mount_rw and /etc/rc.conf_mount_ro commands instead of calling mount directly since this is what the package system uses.

Steve