
Author Topic: pfSense 2.4.2 crashing on PC engines apu2 at random times


Offline /CS

pfSense 2.4.2 crashing on PC engines apu2 at random times
« on: January 08, 2018, 12:10:32 pm »
My pfSense (2.4.2) box started crashing at random times, roughly once a week. There was no recent configuration change, nothing is written to the logs, there is no console output, and there is no connectivity when it happens. It works fine after a reboot.

I already upgraded the PC Engines APU firmware/BIOS and ran a memtest, which came back clean.

Any ideas on how to troubleshoot this and identify the root cause? It seems hardware related and I'd like to find out what's happening. I tried booting from an SD card and from a USB drive, but neither was stable enough to tell whether my mSATA SSD is causing the problem.

Thanks,
CS
« Last Edit: January 08, 2018, 12:41:56 pm by /CS »

Offline jimp

« Reply #1 on: January 08, 2018, 01:16:31 pm »
If there is no console output, it is most likely hardware related. The first two things to check are cooling and power supply. It could be that your power supply is failing, which would also explain why it would have trouble booting from USB since it takes a little more power to use a USB drive on top of the base system.

If you have another power supply, swap it out and test it that way.

Offline acascianelli

« Reply #2 on: January 08, 2018, 02:49:09 pm »
I have a similar problem with mine. It seems to lock up maybe once a month. What temperature is yours running at?
PC Engines APU2C4

Offline /CS

« Reply #3 on: January 10, 2018, 08:40:27 am »
@acascianelli, it's running at around 50-55 °C, and the crash happens every 10 days or so.

@jimp, thanks for the hint. I'll look for another power supply and give it a try. I also think it's hardware related; I hope it's the drive or the power supply, which I can easily replace, and not the main board.

Offline /CS

« Reply #4 on: January 30, 2018, 06:47:32 pm »
I finally managed to capture some error output on the console; nothing was written to the system logs, though.

Code:
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs ffffe7ff tfd 40 serr 00000000 cmd 00406a17
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 01 e8 c7 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 11 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00406b17
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 12 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00001000 tfd 50 serr 00000000 cmd 00406c17
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
ahcich0: Timeout on slot 13 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00002000 tfd 50 serr 00000000 cmd 00406d17
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retry was blocked
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <TS16GMSA310 20120703> s/n 20121222A55XXXXXX detached
...
db:0:kdb.enter.default> textdump set
...
db:0:kdb.enter.default>  capture on
...
db:0:kdb.enter.default>  run lockinfo
...
db:0:kdb.enter.default>  show pcpu
...
db:0:kdb.enter.default>  bt
...
db:0:kdb.enter.default>  ps
...
db:0:kdb.enter.default>  alltrace
...
db:0:kdb.enter.default>  capture off
...
db:0:kdb.enter.default>  textdump dump
...
...
Tracing command kernel pid 0 tid 100099 td 0xfffff800144fe000
sched_switch() at sched_switch+0x4aa/frame 0xfffffe011fdb9960
mi_switch() at mi_switch+0xe5/frame 0xfffffe011fdb9990
sleepq_wait() at sleepq_wait+0x3a/frame 0xfffffe011fdb99c0
_sleep() at _sleep+0x255/frame 0xfffffe011fdb9a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x121/frame 0xfffffe011fdb9a70
fork_exit() at fork_exit+0x85/frame 0xfffffe011fdb9ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe011fdb9ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default>  capture off
db:0:kdb.enter.default>  textdump dump
textdump_writeblock: offset 801111552, error 6
Textdump: Error 6 writing dump
db:0:kdb.enter.default>  reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 2
PC Engines apu2
coreboot build 07/24/2017
BIOS version v4.6.0


This is where it gets stuck and doesn't boot until I manually reset it.
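
For anyone who wants to pick the register dump lines apart, here is a minimal Python sketch. It rests on two assumptions that are not stated anywhere in this thread: that `rs` is the driver's bitmask of running command slots, and that bits 12:8 of `cmd` hold the AHCI "current command slot" field. The function names are made up for illustration. For the lines in the output above, both decoded values line up with the slot number in the preceding "Timeout on slot" message, which is at least consistent with that reading.

```python
import re

# Matches the register dump printed after "Timeout on slot N port P".
# "cm ?d" tolerates a stray space in "cmd", as can happen in pasted console output.
DUMP_RE = re.compile(
    r"ahcich(?P<chan>\d+): is (?P<is>[0-9a-f]{8}) cs (?P<cs>[0-9a-f]{8}) "
    r"ss (?P<ss>[0-9a-f]{8}) rs (?P<rs>[0-9a-f]{8}) tfd (?P<tfd>[0-9a-f]{2}) "
    r"serr (?P<serr>[0-9a-f]{8}) cm ?d (?P<cmd>[0-9a-f]{8})"
)


def parse_dump(line):
    """Return the dump fields as integers, or None if the line is not a dump."""
    m = DUMP_RE.search(line)
    if m is None:
        return None
    fields = {k: int(v, 16) for k, v in m.groupdict().items() if k != "chan"}
    fields["chan"] = int(m.group("chan"))  # channel number is decimal, not hex
    return fields


def running_slots(rs):
    """List the bit positions set in rs (assumed: one bit per running slot)."""
    return [bit for bit in range(32) if rs & (1 << bit)]


def current_command_slot(cmd):
    """Extract bits 12:8 of cmd (assumed: AHCI current command slot)."""
    return (cmd >> 8) & 0x1F


line = ("ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000800 "
        "tfd 50 serr 00000000 cmd 00406b17")
fields = parse_dump(line)
print(running_slots(fields["rs"]))          # -> [11]
print(current_command_slot(fields["cmd"]))  # -> 11
```

The first dump is the odd one out: `rs ffffe7ff` has 30 bits set, so under the same assumption many queued writes were outstanding when the first timeout hit, while every later dump shows a single slot.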

Offline /CS

« Reply #5 on: January 30, 2018, 07:02:45 pm »
Could a moderator move this topic under "Hardware" please?

Offline software

« Reply #6 on: January 31, 2018, 07:22:56 am »
Hi,

Same issue on an APU3. I noticed the same behaviour a few days ago:

Code:
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs ffffe7ff tfd 40 serr 00000000 cmd 00406a17
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 01 e8 c7 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 11 port 0

Temperature is around 53-57 °C.

Running coreboot v4.6.0

Offline /CS

« Reply #7 on: January 31, 2018, 02:48:56 pm »
That's interesting. Let me also highlight that the crashes do NOT happen when the system is under load.

PC Engines apu2
Coreboot: build 07/24/2017
BIOS: version v4.6.0
pfSense: 2.4.2-RELEASE-p1 (amd64)
OS: FreeBSD 11.1-RELEASE-p6
mSATA SSD: Transcend TS16GMSA310 16 GB - https://www.amazon.co.uk/Transcend-TS16GMSA310-16-GB-Internal/dp/B007DIS8Y2


@software, what kind of storage do you use?
« Last Edit: January 31, 2018, 02:53:27 pm by /CS »

Offline Gil

« Reply #8 on: January 31, 2018, 03:11:37 pm »
I run a number of APU2 boards on coreboot 4.0.7 and have not seen this issue.

Offline hda

« Reply #9 on: January 31, 2018, 03:21:29 pm »
No problems:

System    PC Engines APU2B2
BIOS    Vendor: coreboot
Version: 88a4f96
Release Date: Mon Mar 7 2016
Version    2.4.2-RELEASE-p1 (amd64)
built on Tue Dec 12 13:45:26 CST 2017
FreeBSD 11.1-RELEASE-p6
&
mSATA SSD: Transcend TS16GMSA370 16 GB
« Last Edit: January 31, 2018, 03:24:45 pm by hda »

Offline software

« Reply #10 on: February 05, 2018, 02:18:47 pm »
Quote
That's interesting. Let me also highlight that the crashes do NOT happen when the system is under load.
...
@software, what kind of storage do you use?

It's a no-name Chinese mSATA SSD. I think the cheap SSD is now biting me in the ass.
Same here: it happened during the night with almost no load, just some OpenVPN ping traffic and some home automation.

Code:
Jan 23 23:33:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:33:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:33:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:33:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00008000 tfd 50 serr 00000000 cmd 00406f17
Jan 23 23:33:46 kernel ahcich0: Timeout on slot 15 port 0
Jan 23 23:33:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:33:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:33:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:33:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00004000 tfd 50 serr 00000000 cmd 00406e17
Jan 23 23:33:16 kernel ahcich0: Timeout on slot 14 port 0
Jan 23 23:32:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:32:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:32:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:32:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00406b17
Jan 23 23:32:46 kernel ahcich0: Timeout on slot 11 port 0
Jan 23 23:32:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:32:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:32:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:32:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000400 tfd 50 serr 00000000 cmd 00406a17
Jan 23 23:32:16 kernel ahcich0: Timeout on slot 10 port 0
Jan 23 23:31:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:31:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:31:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:31:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000020 tfd 50 serr 00000000 cmd 00406517
Jan 23 23:31:46 kernel ahcich0: Timeout on slot 5 port 0
Jan 23 23:31:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:31:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:31:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:31:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00406417
Jan 23 23:31:16 kernel ahcich0: Timeout on slot 4 port 0
Jan 23 23:30:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:30:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:30:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:30:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000002 tfd 50 serr 00000000 cmd 00406117
Jan 23 23:30:46 kernel ahcich0: Timeout on slot 1 port 0
Jan 23 23:30:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:30:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:30:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:30:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000001 tfd 50 serr 00000000 cmd 00406017
Jan 23 23:30:16 kernel ahcich0: Timeout on slot 0 port 0
Jan 23 23:29:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:29:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:29:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:29:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 08000000 tfd 50 serr 00000000 cmd 00407b17
Jan 23 23:29:46 kernel ahcich0: Timeout on slot 27 port 0
Jan 23 23:29:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:29:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:29:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:29:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 04000000 tfd 50 serr 00000000 cmd 00407a17
« Last Edit: February 05, 2018, 02:22:10 pm by software »
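
The log above is listed newest first, and the timeouts look evenly spaced. A small Python sketch can confirm that by pulling out the "Timeout on slot" lines and computing the gaps between them. The syslog timestamps carry no year, so the `year=2018` default is an assumption, and the function names are hypothetical. The sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Syslog-style line as pasted above: "Jan 23 23:33:46 kernel ahcich0: Timeout on slot 15 port 0"
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) kernel ahcich(?P<chan>\d+): "
    r"Timeout on slot (?P<slot>\d+) port (?P<port>\d+)"
)


def timeout_events(lines, year=2018):
    """Return (timestamp, slot) pairs in chronological order.

    The year is an assumption; the log lines do not include one.
    """
    events = []
    for line in lines:
        m = TIMEOUT_RE.match(line.strip())
        if m:
            when = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
            events.append((when, int(m.group("slot"))))
    return sorted(events)


def gaps_seconds(events):
    """Seconds between consecutive timeout events."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(events, events[1:])]


sample = [
    "Jan 23 23:33:46 kernel ahcich0: Timeout on slot 15 port 0",
    "Jan 23 23:33:16 kernel ahcich0: Timeout on slot 14 port 0",
    "Jan 23 23:32:46 kernel ahcich0: Timeout on slot 11 port 0",
    "Jan 23 23:32:16 kernel ahcich0: Timeout on slot 10 port 0",
]
events = timeout_events(sample)
print(gaps_seconds(events))           # -> [30, 30, 30]
print([slot for _, slot in events])   # -> [10, 11, 14, 15]
```

A steady 30-second spacing looks more like each command waiting out a fixed timeout than like random glitches, which would fit a drive that has stopped answering altogether. That is an interpretation, not something the log states.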

Offline KOM

« Reply #11 on: February 05, 2018, 03:01:51 pm »
Looks like crappy disks to me.

Offline /CS

« Reply #12 on: February 05, 2018, 10:34:35 pm »
Quote
Looks like crappy disks to me.

I think so too.

I have ordered a new "Transcend 32GB mSATA SSD (TS32GMSA370)" and I'll keep you posted.

Offline silentcreek

« Reply #13 on: February 06, 2018, 03:16:49 am »
Did your system crash before you upgraded the BIOS to 4.6.0? Which BIOS version were you running then? 4.0.x or 4.5.x?

I ask because the version you are currently running is not recommended for use with pfSense or FreeBSD. PC Engines has a warning on their BIOS/Howto page:
Quote
For FreeBSD based OS like OPNSense and pfSense please use the legacy versions.
Note: "legacy" versions are 4.0.x
See: http://pcengines.ch/howto.htm#bios

There have been several reports about issues with the newer coreboot releases (afaik not *this* issue though), so I wouldn't be surprised if this is caused by the firmware and not the disk itself.

That being said, if you have seen this on the older 4.0.x firmware as well, then it's likely a disk issue.
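
The rule in the quoted PC Engines note ("legacy" means 4.0.x) is simple enough to check mechanically when comparing the firmware versions people have posted here. A rough Python sketch follows; the function name is hypothetical, and it only encodes the note as quoted, which may no longer reflect the vendor's current advice.

```python
import re


def is_legacy_firmware(version):
    """Return True if a version string such as "v4.6.0" or "4.0.7" is in the 4.0.x series.

    Raises ValueError for strings that are not dotted versions
    (for example a bare git hash).
    """
    m = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if m is None:
        raise ValueError(f"not a dotted version string: {version!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    return (major, minor) == (4, 0)


print(is_legacy_firmware("v4.6.0"))  # -> False
print(is_legacy_firmware("4.0.7"))   # -> True
```

A build identified only by a commit hash, like the one reported earlier in the thread, cannot be classified this way and has to be matched against its release date instead.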

Offline /CS

« Reply #14 on: February 06, 2018, 11:01:37 am »
Quote
Did your system crash before you upgraded the BIOS to 4.6.0? Which BIOS version were you running then? 4.0.x or 4.5.x?
...
That being said, if you have seen this on the older firmware 4.0.x as well, then it's likely a disk issue.

It actually happened on the older firmware too; I don't recall the version. I just thought it was a good opportunity to upgrade to the latest one, hoping it might help.
I'll find out soon.