pfSense Forum

pfSense English Support => General Questions => Topic started by: CDuv on November 02, 2016, 07:47:44 am

Title: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 02, 2016, 07:47:44 am
My 2.3.2-RELEASE-p1 appliance (Lanner FW-7551 (http://www.lannerinc.com/products/x86-network-appliances/desktop/fw-7551): Atom C2758 with 8GB of RAM) is crashing at least once per hour.

It reboots itself and resumes network access automatically.

The crash report is as follows:
Crash report begins.  Anonymous machine information:

amd64
10.3-RELEASE-p9
FreeBSD 10.3-RELEASE-p9 #1 5fc1b19(RELENG_2_3_2): Tue Sep 27 12:26:06 CDT 2016     root@ce23-amd64-builder:/builder/pfsense-232/tmp/obj/builder/pfsense-232/tmp/FreeBSD-src/sys/pfSense

Crash report details:

Filename: /var/crash/bounds
1

Filename: /var/crash/info.0
Dump header from device /dev/label/swap0
  Architecture: amd64
  Architecture Version: 1
  Dump Length: 80896B (0 MB)
  Blocksize: 512
  Dumptime: Wed Nov  2 13:21:44 2016
  Hostname: hermes.example.com
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 10.3-RELEASE-p9 #1 5fc1b19(RELENG_2_3_2): Tue Sep 27 12:26:06 CDT 2016
    root@ce23-amd64-builder:/builder/pfsense-232/tmp/obj/builder/pfsense-232/tmp/FreeBSD-src/sys/pfSense
  Panic String: sbflush_internal: cc 4294965256 || mb 0 || mbcnt 0
  Dump Parity: 916287357
  Bounds: 0
  Dump Status: good


Full crash report (https://framabin.org/?a1171938dad7724a#xxxECLn9ypv2TFBochMHvfXiaZUDKWPJcWYrs4MK5JY=).
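
A note on reading the panic string above: the `cc` value looks like a small negative byte count printed as an unsigned 32-bit integer. The Python sketch below shows the arithmetic. Treating the counter as 32 bits wide is an assumption based on the magnitude of the value, not something the crash report states.

```python
# Reinterpret an unsigned 32-bit value as a signed 32-bit integer.
# Assumption: the socket buffer byte count (cc) is a 32-bit field.

def as_signed32(value: int) -> int:
    """Return the two's-complement signed reading of a 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

cc = 4294965256  # from "Panic String: sbflush_internal: cc 4294965256 ..."
print(as_signed32(cc))  # -2040
```

Read this way, the counter went 2040 bytes below zero, which would be consistent with corrupted socket buffer accounting rather than a genuinely huge buffer.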

Googling led me to Issue #4689 (https://redmine.pfsense.org/issues/4689) and a freebsd-current mailing list thread (https://lists.freebsd.org/pipermail/freebsd-current/2007-February/069049.html), but these are old/resolved issues.

I have:


Edit: Added hardware brand and model
Title: Re: Kernel crash with Panic: sbflush_internal: cc 4294965256 || mb 0 || mbcnt 0
Post by: CDuv on November 03, 2016, 06:24:53 am
I am starting to think this is because of the OpenVPN service which uses the following settings:

I tried enabling the "AES-NI CPU-based Acceleration" Cryptographic Hardware option for the system and the "BSD cryptodev engine - RSA, DSA, DH, AES-128-CBC, AES-192-CBC, AES-256-CBC" Hardware Crypto option for OpenVPN (as advised on IRC), but it does not prevent the crashes from occurring.

The box does not seem to overheat: it is at around 34°C (I cannot seem to graph this, though).
Title: Re: Kernel crash with Panic: sbflush_internal: cc 4294 || mb 0 || mbcnt 0 [OpenVPN?]
Post by: CDuv on November 03, 2016, 12:08:21 pm
I spent the whole day with the OpenVPN server disabled, and yet it crashed once, but only once (not 4 times a day as before)...
Title: Re: Kernel crash with Panic: sbflush_internal: cc 4294 || mb 0 || mbcnt 0 [OpenVPN?]
Post by: CDuv on November 04, 2016, 01:14:10 pm
According to Tuning and Troubleshooting Network Cards wiki entry (https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Intel_igb.284.29_and_em.284.29_Cards) I added:
kern.ipc.nmbclusters=1000000
to my /boot/loader.conf.local file.

It didn't change anything: still crashing.
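
For scale, here is a small Python sketch that reads loader.conf-style `name=value` lines and estimates the memory ceiling implied by `kern.ipc.nmbclusters`. It assumes the standard 2048-byte mbuf cluster size (FreeBSD's default `MCLBYTES`); the helper names are only illustrative.

```python
# Parse loader.conf-style "name=value" lines and estimate the memory ceiling
# implied by kern.ipc.nmbclusters.
# Assumption: standard 2048-byte mbuf clusters (FreeBSD's default MCLBYTES).

MCLBYTES = 2048

def parse_tunables(text: str) -> dict:
    """Return {name: value} for each non-comment name=value line."""
    tunables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = value.strip().strip('"')
    return tunables

def cluster_ceiling_mib(nmbclusters: int) -> float:
    """Upper bound, in MiB, on memory used if every cluster were allocated."""
    return nmbclusters * MCLBYTES / (1024 * 1024)

sample = "kern.ipc.nmbclusters=1000000\n"
limit = int(parse_tunables(sample)["kern.ipc.nmbclusters"])
print(f"{limit} clusters -> up to {cluster_ceiling_mib(limit):.0f} MiB")
# 1000000 clusters -> up to 1953 MiB
```

The tunable is a limit, not a reservation, so on an 8GB box it should be affordable. Actual mbuf usage can be checked with `netstat -m`.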
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 11, 2016, 11:28:54 am
I did a full re-installation of v2.3.2 from a USB memstick (Serial), then applied the "-p1" patch.
I re-added
kern.ipc.nmbclusters=1000000
to my /boot/loader.conf.local file.

But it is still crashing (twice last night and once around noon, even though no one was using it today...).
I have a clone of this server (exact same model), and it has the same problem too (so the issue is not related to faulty hardware).


I have lots of crash reports, but I don't know how to read them: can someone help me?
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: stedwsyy on November 12, 2016, 01:59:36 am
This series of events is unlikely to happen.
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 12, 2016, 10:57:04 am
That's why I am completely lost here...


Here is a map of the network installation:

    /------►(   Internet   )◄-------------\
    |               ▲                     |
    |               |                     |
┌──────┐        ┌──────┐                ┌──────┐
│ISP B │-\      │ISP C │                │ISP A │
│router│  |     │router│                │router│
└──────┘  |     └──────┘                └──────┘
          |        |                          |
       (WAN_B)  (WAN_C)                       |
          |        |                          |
┌───────────────────────────────────┐      (WAN_A)
│        igb2     igb3              │         |
│                                   │         |
│  Lanner FW-7551 running pfSense   │         |
│                                   │         |
│                 VID3&VID4         │         |
│igb0    igb1        igb4      igb5 │         |
└───────────────────────────────────┘         |
  |       |           |         |             |
(LAN)  (WAN_A)  (LAN_GUEST1)  (SYNC)          |
  |       |           &         |             |
  |       |     (LAN_GUEST2)    |             |
  |       |           |         x             |
  |       |           |  unused yet: left     |
  |       |           |  for future HA        |
  |       |           |  pfSense setup        |
  |       |           |                       |
┌──────────────────────────────────────────┐  |
│ p7     p13         p12                   │  |
│VID1    VID2     VID3&VID4                │  |
│                                          │  |
│         D-Link DGS managed switch      p9--/
│                                      VID2│
│ VID1                     VID3  VID4      │
│  p8                      p10   p11       │
└──────────────────────────────────────────┘
   |                        |     |
   |                        |     \---► (LAN_GUEST2)
   ▼                        |
 (LAN)                      \---► (LAN_GUEST1)


Particularities:

Networks:
 

VLANs (VID<->Network mapping):


Today (a non-working day) it crashed about every two or three hours.
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: beppo on November 12, 2016, 12:43:02 pm
I have the problem you describe too, with a Supermicro board. The OpenVPN service is not enabled, but I get daily reboots after crashes. I think the issue is somehow related to the 2.3.x versions, as earlier versions ran for months without problems.
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: kpa on November 12, 2016, 12:53:15 pm
If the crash is always the same or looks very similar to the others, it's likely that the problem is a software one. Random crashes with wildly varying types of reports indicate a hardware problem instead.
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 12, 2016, 02:37:22 pm
I don't know how to read crash reports.
Sometimes the file "/var/crash/info.0" has:
Quote
Panic String: sbflush_internal: cc 0 || mb 0xfffff800643e2800 || mbcnt 2304
and sometimes it does not.
But the crash report always has:
Quote
Fatal trap 12: page fault while in kernel mode

igb driver is Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
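
A rough way to compare many reports at once is to tally the "Panic String:" lines of the /var/crash/info.N files and see whether the same signature keeps coming back. The Python sketch below does that. Masking every number and address as `<n>` is a heuristic of this sketch, not a standard tool, and the function names are illustrative.

```python
import glob
import re
from collections import Counter

PANIC_RE = re.compile(r"^[ \t]*Panic String:[ \t]*(.*\S)", re.MULTILINE)
NUMBER_RE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")

def panic_signature(info_text: str) -> str:
    """Return the panic string with hex addresses and counters masked."""
    match = PANIC_RE.search(info_text)
    if not match:
        return "(no panic string)"
    return NUMBER_RE.sub("<n>", match.group(1))

def tally(info_texts) -> Counter:
    """Count how often each masked panic string occurs."""
    return Counter(panic_signature(text) for text in info_texts)

if __name__ == "__main__":
    texts = []
    # info.last duplicates the newest numbered file, so only info.<number> is read.
    for path in sorted(glob.glob("/var/crash/info.[0-9]*")):
        with open(path, errors="replace") as handle:
            texts.append(handle.read())
    for signature, count in tally(texts).most_common():
        print(f"{count:3d}  {signature}")
```

With this masking, the two sbflush_internal strings quoted in this thread collapse into one signature, while "page fault" stays separate.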
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 13, 2016, 01:45:41 am
Try a different pfSense version, for example 2.2.6 or even 2.4-BETA. If the problem persists, it looks more like a hardware issue. The fastest way to check is to try the same config and version on other hardware, but that is only possible if you have another box.
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 13, 2016, 08:11:39 am
OK, I'll try other versions: I have other hardware.

I guess testing v2.3.3 (https://snapshots.pfsense.org/amd64/pfSense_RELENG_2_3/installer/?C=M;O=D) wouldn't do any better (since I doubt this unknown bug would get fixed).

I'd rather try v2.4 (https://snapshots.pfsense.org/amd64/pfSense_master/installer/?C=M;O=D) instead of v2.2 (to avoid losing any feature v2.3 brought): would my v2.3.2-p1 configuration file be accepted on v2.4?
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 14, 2016, 04:29:40 am
Should I disable "Flow Control" (as the Wiki says (https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Flow_Control))?
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 14, 2016, 01:39:57 pm
Yes, 2.4 can use a backup config from previous versions. You can also try any other settings you find, not only flow control.
Title: Re: Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 15, 2016, 03:22:07 am
So, I tried 2.4.0-BETA v20161113-2326 (pfSense-CE-memstick-serial-2.4.0-BETA-amd64-20161113-2326): in 15 hours it crashed twice (9h and 15h later).

When I logged in to the webConfigurator to get the first crash report, I got it fine:

Quote
Crash report begins.  Anonymous machine information:

amd64
11.0-RELEASE-p3
FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016     root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense

Crash report details:

Filename: /var/crash/bounds
1

Filename: /var/crash/info.0
Dump header from device: /dev/ada0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 580517888
  Blocksize: 512
  Dumptime: Tue Nov 15 04:00:16 2016
  Hostname: pfsensebox.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
  Panic String: page fault
  Dump Parity: 1903556642
  Bounds: 0
  Dump Status: good

Filename: /var/crash/info.last
Dump header from device: /dev/ada0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 580517888
  Blocksize: 512
  Dumptime: Tue Nov 15 04:00:16 2016
  Hostname: pfsensebox.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
  Panic String: page fault
  Dump Parity: 1903556642
  Bounds: 0
  Dump Status: good

Filename: /var/crash/minfree
2048
            

But while I was sending the report to the developers, the box crashed a second time. I don't want to send that second report (because doing so might crash the system again), but I got this on the serial console:

Quote
Enter an option:
Message from syslogd@pfsensebox at Nov 15 10:01:15 ...
pfsensebox php-fpm[84587]: /index.php: Successful login for user 'admin' from: 10.0.1.53
panic: sbsndptr: sockbuf 0xfffff8010d811518 and mbuf 0xfffff8010ddc6000 clashing
cpuid = 6
Uptime: 6h0m53s
Dumping 567 out of 8135 MB: (CTRL-C to abort) ..3%..12%..23%..32%..43%..51%..63%..71%..82%..91%
Dump complete
                                                                             99
TAB Key on Remote Keyboard To Entry Setup Menu
MB-7551 Ver.AE0 03/28/2014
Version 2.16.1242. Copyright (C) 2013 American Megatrends, Inc.
Press <DEL> or <ESC> to enter setup.


(.. many empty lines ..)


|oading /boot/defaults/loader.conf serial port                                 
/IOS drive C: is disk0        /boot/config: -S115200 -D
BIOS 619kB/2081240kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(root@buildbot2.netgate.com, Wed Aug  3 08:04:25 CDT 2016)


(.. many empty lines ..)

/boot/entropy size=0x100017b93e]a0 |       
Booting... _/ ___|  ___ _ __  ___  ___     
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  | .__/The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense amd64


I have a 567MB file "/var/crash/vmcore.0".
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 15, 2016, 07:01:49 am
Another crash:

Quote
Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 0a
fault virtual address   = 0x78
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d6632c
stack pointer           = 0x28:0xfffffe01ec7d9930
frame pointer           = 0x28:0xfffffe01ec7d9990
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq289: igb4:que 5)
trap number             = 12
panic: page fault
cpuid = 5
Uptime: 1h33m1s
Dumping 568 out of 8135 MB:..3%..12%..23%..31%..43%..51%..62%..71%..82%..91%
Dump complete
                                                                             99
TAB Key on Remote Keyboard To Entry Setup Menu
MB-7551 Ver.AE0 03/28/2014
Version 2.16.1242. Copyright (C) 2013 American Megatrends, Inc.
Press <DEL> or <ESC> to enter setup.


(.. many empty lines ..)


|oading /boot/defaults/loader.conf serial port                                 
/IOS drive C: is disk0                                                          BIOS 619kB/2081240kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(root@buildbot2.netgate.com, Wed Aug  3 08:04:25 CDT 2016)
\


(.. many empty lines ..)


syms=[0x8+0x17b620+0x8+0x17b93e]a0 data=0xaad7b8+0x4c60e8 \
/boot/entropy size=0x1000 __  ___  ___     
Booting...|_\___ \ / _ \ '_ \/ __|/ _ \   
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  |_|   The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.               
FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense amd64
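
The fields of a "Fatal trap 12" block like the one quoted above can be pulled out mechanically. The Python sketch below extracts a field by name and flags fault addresses that fall inside the first memory page, which would be consistent with a NULL pointer plus a structure offset. The 0x1000 cutoff and the function names are assumptions of this sketch.

```python
import re

def trap_field(report: str, name: str):
    """Return the text after 'name =' on the first matching line, or None."""
    pattern = rf"^\s*{re.escape(name)}\s*=\s*(.*\S)"
    match = re.search(pattern, report, re.MULTILINE)
    return match.group(1) if match else None

def looks_like_null_deref(report: str, limit: int = 0x1000) -> bool:
    """Heuristic: a fault address inside the first page suggests NULL + offset."""
    address = trap_field(report, "fault virtual address")
    return address is not None and int(address, 16) < limit

# Three lines copied from the trap block quoted above.
sample = """\
fault virtual address   = 0x78
fault code              = supervisor read data, page not present
current process         = 12 (irq289: igb4:que 5)
"""
print(trap_field(sample, "current process"))  # 12 (irq289: igb4:que 5)
print(looks_like_null_deref(sample))          # True
```

For this report the heuristic points at a read through a NULL pointer inside the interrupt thread of igb4 queue 5. That is a hint about where to look, not a diagnosis.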

I will now try v2.3.1 (I think I recall that it was not crashing that much at that time).

2.4.0-BETA crashed a few seconds after I changed the Virtual IP settings (to disable the box running v2.4.0-BETA and switch production to the box running v2.3.1).
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 15, 2016, 01:32:42 pm
I could be wrong, but it looks mostly like an ECC memory failure. Please replace your memory and test again.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 15, 2016, 03:47:50 pm
It occurs on 2 identical brand-new boxes, so I doubt faulty hardware is the cause (it is still possible, but unlikely).
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 16, 2016, 12:06:16 pm
Two boxes? Does it mean you don't use the same pieces of hardware when testing on both machines? For example the hard drive/CF or a USB stick?
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 16, 2016, 04:30:41 pm
It means I have 2 identical servers with the exact same model of each component (1x SSD, 1x RAM stick) in each one.

v2.3.2 was tested on both servers.
v2.4.0-BETA (which performs a bit better: only 2-3 crashes per day) was only tested on box 2.
v2.3.1 was only tested on box 1.

I never swapped any part (neither RAM nor SSD).
I used a couple of USB memory sticks for the installations: they are actually the only hardware that was shared between the servers.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 17, 2016, 08:29:06 am
Fun fact: when I unplug the network cables (to swap production from one server to the other for the tests), the server crashes...
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 17, 2016, 09:58:52 am
Are there any BIOS/UEFI options regarding OS installation compatibility?
Did you try to install 2.4 on ZFS with GPT-UEFI (it should work on the latest builds)? I am not sure; maybe it's related to some power-saving feature or anything else you can find in the BIOS or UEFI settings related to power saving.
It would be good to disable all CPU power-saving modes except the common C1 mode for testing purposes.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: W4RH34D on November 17, 2016, 10:11:39 am
bad cable?  faulty plug on the other end with out of spec voltages?
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 17, 2016, 10:48:10 am
Quote
Are there any BIOS/UEFI options regarding OS installation compatibility?
Did you try to install 2.4 on ZFS with GPT-UEFI (it should work on the latest builds)? I am not sure; maybe it's related to some power-saving feature or anything else you can find in the BIOS or UEFI settings related to power saving.
It would be good to disable all CPU power-saving modes except the common C1 mode for testing purposes.
pfSense 2.4 was installed on MBR: I'll try GPT...

At first I had no access to the BIOS on this appliance, which comes pre-prepared for pfSense (bought from a local appliance reseller): on the serial console I saw the "Press <Del> to enter..." prompt, but pressing the [Del] key did nothing. Update: got it working.
The BIOS says the following about CPU:
* EIST (GV3) : Disable
* P-stat Coordination : Package (cannot be modified)
* TM1 : Enable
* TM2 Mode : Adaptative Throttling (cannot be modified)
* CPU C State : Disable
* Enhanced Halt State : Disable (cannot be modified)
* ACP C2 : Disable (cannot be modified)
* Monitor/Mwait : Enable (cannot be modified)
* L1 Prefetcher : Enable
* L2 Prefetcher : Enable
* Max CPUID Value Limit : Disable
* Execute Disable Bit : Enable
* AES-NI : Enable
* Turbo : Enable (cannot be modified)
* Active Processor Core : All

In the pfSense config, however, PowerD is disabled and the AC Power, Battery Power and Unknown Power settings are all set to "Hiadaptive" (I did not touch these after the 2.4.0-BETA installation).

Is it advised to disable power-saving modes in the BIOS/UEFI?
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: beppo on November 17, 2016, 01:42:11 pm
Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x5e00000000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d80b00
stack pointer           = 0x28:0xfffffe00a1644b60
frame pointer           = 0x28:0xfffffe00a1644b80
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 75463 (pfctl)

I get the same error once a day with the same motherboard on 2.3.x. The BIOS settings haven't been changed since the first install of 2.x. I don't think it is a hardware issue, but I have no clue how to solve it other than by installing 2.2 again.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 17, 2016, 04:00:36 pm
Yeah! I am not alone!

Is your network architecture partially similar to mine? Do you have a lot of users?

If your workaround is to downgrade to 2.2 it is a lead for some debugging and a possible fix...

I see bug #4689 (Panic/Crash "sbflush_internal: cc 4294967166 || mb 0 || mbcnt 0") (https://redmine.pfsense.org/issues/4689) is similar but is marked as resolved for 2.3...
The original bug report on the FreeBSD bug tracker (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=138782) is still open, and someone reported running into the issue a month ago.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 17, 2016, 10:17:23 pm
I don't see any options you must change in the BIOS. All the options you posted are OK. Just for testing purposes, enable PowerD and set it to maximum performance. Make sure you do not have polling enabled and enable all the settings below (see picture).
If that does not help, install a 2.2.x version.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 18, 2016, 06:00:15 am
I had "Hardware Checksum Offloading" unchecked and both "Hardware TCP Segmentation Offloading" & "Hardware Large Receive Offloading" checked.

I'll check "Hardware Checksum Offloading" and set PowerD to maximum...
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: CDuv on November 21, 2016, 06:33:19 am
Update:
I tested OPNsense (v16.7.8) under the same load and configuration and it works (3 days now)...
At the same time, the other server (on pfSense 2.4.0), which is up but not used (no traffic going to it), did not crash either: this indicates the crashes are load/traffic related.

Hope this helps to pinpoint the exact cause of the issue.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 21, 2016, 01:44:42 pm
I think you should create a bug report on Redmine. The crashes I have had on different hardware also happened under heavy traffic. It could be related to the driver or to the NIC hardware revision/firmware.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: w0w on November 23, 2016, 12:42:00 pm
Any news?
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: Harvy66 on December 19, 2016, 12:35:44 pm
My car won't start, I heard yours doesn't, I bet it's the same reason.

I'd recommend starting your own threads so people can step you through stuff to check without causing massive confusion about who is talking about what.
Title: Re: [v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode
Post by: parsalog on January 02, 2018, 05:15:31 pm
I am fighting this same issue. Are there any updates? I am running a Supermicro server as well. It seems a lot of people are seeing the trap 12 when using the Intel igb driver specifically?