
Author Topic: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0  (Read 4250 times)


Offline Ventolin

  • Newbie
  • *
  • Posts: 11
  • Karma: +0/-0
    • View Profile
Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« on: December 06, 2011, 03:36:13 pm »
Hi folks,

I've been suffering with pfSense 2.0 (inc betas) dropping my LAN interface after a week or so of working properly ever since upgrading from 1.2.x.  When pfSense drops the nic I can't ping via that nic.

I have now tried 4 gigabit network cards: 3 D-Link DGE-530T and 1 SMC 9452TX gigabit nic, which looks almost the same as the D-Link cards.  They all have the Marvell Yukon chipset and all use the sk driver.  3 of the nics are brand new.  I had no issues with 1.2.x with the original D-Link nic.

I have discovered that many others have had the same issue as I have (some with DOWN/UP cycles), and that an ifconfig sk0 down followed by ifconfig sk0 up will bring the nic back to life.

So it definitely looks like there is an issue with this sk driver.

It might be fixed in FreeBSD 8.2+, the question is how can I get a new updated driver working on my pfSense 2.0 box?  Alternatively, how can I get the 1.2.x driver working instead?

I understand the newer driver might have to be compiled to get it to work, but I don't know how to do this.

Can someone please give a guide on how to try newer drivers?

Alternatively, how can I try the driver from 1.2.x?

To clarify, by driver I mean this: http://www.freebsd.org/cgi/man.cgi?query=sk&apropos=0&sektion=0&manpath=FreeBSD+8.1-RELEASE&arch=default&format=html

Others with similar issues:
http://forum.pfsense.org/index.php/topic,41215.0.html
http://forum.pfsense.org/index.php/topic,42942.0.html
http://forum.pfsense.org/index.php/topic,40147.0.html
http://forum.pfsense.org/index.php/topic,42865.0.html

Probably more, but that should illustrate the point.

While waiting for assistance to change the driver, is there a way I can get pfSense to automatically execute a script doing the ifconfig sk0 down/up cycle when the nic gets dropped (it shows up in the logs as a hotplug event)?  apinger works for my WAN interfaces, is there a way to make it work for the LAN interface too?

In summary I'm looking for:
1. Way to use a newer sk driver
2. Way to use the old sk driver from 1.2.x
3. Workaround script / apinger reconfiguration that cycles the nic automatically.

Thank you very much in advance,

Best Regards,

Vent

PS. my first post, apologies in advance if I have done something wrong or stepped on the wrong toes or posted to the wrong forum etc, please forgive me!

Online stephenw10

  • Hero Member
  • *****
  • Posts: 8176
  • Karma: +10/-0
    • View Profile
Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #1 on: December 06, 2011, 04:10:53 pm »
There's a lot more to running a newer driver than just compiling it unfortunately. This is made all the harder by this:
Quote from: FreeBSD 8.2 Release Notes
The miibus(4) has been rewritten for the generic IEEE 802.3 annex 31B full duplex flow control support. The alc(4), bge(4), bce(4), cas(4), fxp(4), gem(4), jme(4), msk(4), nfe(4), re(4), stge(4), and xl(4) drivers along with atphy(4), bmtphy(4), brgphy(4), e1000phy(4), gentbi(4), inphy(4), ip1000phy(4), jmphy(4), nsgphy(4), nsphyter(4), and rgephy(4) have been updated to support flow control via this facility

Though I note that sk(4) is not in that list.

You could wait for the first builds of 2.1 on FreeBSD 9. No time frame for that though.

Have you tried any tunables? Do your logs show any errors when it stops responding?

Steve

Offline Ventolin

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #2 on: December 06, 2011, 04:44:21 pm »
Hi Steve,

Yes, I'm beginning to realise there is a lot more to it than simply adding a new file somewhere in the system and getting it to work, I've been reading your posts!

The first indication of something going wrong is a log entry saying a hotplug event was detected for lan but is being ignored because the interface has a static ip.

When I saw that I did as others did by turning off all power management I could find in the bios, but that didn't help.

From the man page it looks like the only tunables are to disable jumbo frames and I don't use those, so don't think that would help.  The other thing would be to force full duplex gigabit with slave / master options, but honestly, I'm not holding out much hope of these fixing the issue.

I'm beginning to think the sk driver is just unusable and/or unreliable for pfSense 2.0 and the only realistic option is to just go and buy a different card altogether or revert back to 1.2.x

How about helping out with option 3 and that is to get apinger / other script to cycle the nic down/up? That might provide a workaround in the meantime?

Many thanks for your response and time,

Vent


Online stephenw10

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #3 on: December 06, 2011, 06:17:00 pm »
There is also:
Quote
dev.skc.%d.int_mod
        This variable controls interrupt moderation.  The accepted range
        is 10 to 10000.  The default value is 100 microseconds.  The
        interface has to be brought down and up again before a change
        takes effect.

Interestingly the default value under the Linux sk98lin driver is 500µs.

I have been using the sk(4) driver with 2.0 on my test box with no problems at all, unlike the msk(4) driver which freezes all the time!  ::)

Steve

Offline Ventolin

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #4 on: December 06, 2011, 11:07:16 pm »
Hi,

I think if the interface was cycling between DOWN and UP the interrupt moderation tuning parameter might be helpful, but in my case the interface goes down and stays down, so even a long time might not work.

It's worth a try though, where / how can I configure this?

Secondly, I have been looking at /etc/rc.linkup and it looks to have the exact event I'm seeing in my logs:

Code: [Select]
function handle_argument_group($iface, $argument2) {
global $config;

$ipaddr = $config['interfaces'][$iface]['ipaddr'];
if (is_ipaddr($ipaddr) || empty($ipaddr)) {
log_error("Hotplug event detected for {$iface} but ignoring since interface is configured with static IP ({$ipaddr})");
interfaces_staticarp_configure($iface);
$iface = get_real_interface($iface);
interfaces_bring_up($iface);
if ($argument2 == "start" || $argument2 == "up")
send_event("interface newip {$iface}");

There are other interesting functions in there to bring the interface down or up depending on the second variable.

I might not be reading this correctly, but it looks like it tries to bring the interface up after logging the error.  This is what I'm looking for if it does do that, but how do I get this to run?  It's almost as if the script is doing the opposite of ignoring the event as it says in the log line and goes ahead and tries to bring the interface up anyway.

I am confused!

This is the first time I'm looking at pfSense code and I'm way too new to this to try and play with the code, so can you or anyone offer any advice or suggestions?

Thanks in advance,

Vent

Online stephenw10

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #5 on: December 07, 2011, 05:43:40 am »
Quote
It's worth a try though, where / how can I configure this?

You configure it using the sysctl command:
Code: [Select]
sysctl dev.skc.0.int_mod
Should give the current value.

Code: [Select]
sysctl dev.skc.0.int_mod=500
Sets it to 500.

To set it permanently you can add dev.skc.0.int_mod with the value 500 as a new tunable in System: Advanced: System Tunables.


Re-writing parts of pfSense is beyond my usual level of tinkering ;) It shouldn't have to do this. If the interface were actually going down and then coming back up it would reload no problem. Adding code to reset the interface when it sees a hotplug event would cause flapping if you actually unplugged the cable. Perhaps if you posted some of your logs it would be helpful. Also the relevant sections from the boot log:
Code: [Select]
cat /var/log/dmesg.boot | grep sk0
And maybe:
Code: [Select]
pciconf -lv | grep -A 4 skc
Steve

Offline Ventolin

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #6 on: December 07, 2011, 02:39:34 pm »
Hi,

Thanks Steve for your help again.

Code: [Select]
sysctl dev.skc.0.int_mod
returns 100

Code: [Select]
sysctl -w dev.skc.0.int_mod=500
sets it to 500
(quick man sysctl was needed for correct syntax :P)
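
The man page excerpt quoted earlier says the interface has to be brought down and up again before a new int_mod value takes effect, so the two steps could be combined. A minimal sketch, assuming the controller is skc0 and the interface is sk0; the apply_int_mod name and the SYSCTL, IFCONFIG and BOUNCE_DELAY variables are made up for illustration (they allow a dry run with echo) and are not part of pfSense:

```shell
#!/bin/sh
# Hypothetical helper: set interrupt moderation on an sk(4) controller,
# then cycle the interface so the new value takes effect.
SYSCTL=${SYSCTL:-/sbin/sysctl}
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}

apply_int_mod() {
    unit=$1      # controller unit number, e.g. 0 for skc0
    iface=$2     # interface name, e.g. sk0
    usecs=$3     # moderation value; the quoted range is 10 to 10000

    # Reject values outside the range given in the man page excerpt.
    if [ "$usecs" -lt 10 ] || [ "$usecs" -gt 10000 ]; then
        echo "int_mod must be between 10 and 10000" >&2
        return 1
    fi

    "$SYSCTL" "dev.skc.${unit}.int_mod=${usecs}" &&
        "$IFCONFIG" "$iface" down &&
        sleep "${BOUNCE_DELAY:-2}" &&
        "$IFCONFIG" "$iface" up
}

# Example (as root on the firewall): apply_int_mod 0 sk0 500
```

Setting SYSCTL=echo and IFCONFIG=echo before calling the function prints the commands instead of running them, which is a cheap way to check the arguments first.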

From the logs, this is what I get...
Code: [Select]
Dec  1 18:51:17 myfirewall check_reload_status: Linkup starting sk0
Dec  1 18:51:17 myfirewall kernel: sk0: link state changed to DOWN
Dec  1 18:51:24 myfirewall php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (192.168.0.3)

After that, nothing until I got home and typed:
Code: [Select]
ifconfig sk0 up
and got this in the logs:
Code: [Select]
Dec  1 22:25:30 myfirewall check_reload_status: Reloading filter
Dec  1 22:26:31 myfirewall check_reload_status: Reloading filter
Dec  1 22:32:06 myfirewall check_reload_status: Linkup starting sk0
Dec  1 22:32:06 myfirewall kernel: sk0: link state changed to UP
Dec  1 22:32:17 myfirewall php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (192.168.0.3)
Dec  1 22:32:17 myfirewall check_reload_status: rc.newwanip starting sk0
Dec  1 22:32:29 myfirewall php: : rc.newwanip: Informational is starting sk0.
Dec  1 22:32:29 myfirewall php: : rc.newwanip: on (IP address: 192.168.0.3) (interface: lan) (real interface: sk0).
and then some stuff about routing, apinger restarting, dnsmasq starting etc.
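
To pull just the link transitions out of a busy log, a small filter can help. This is a sketch only: link_events is a hypothetical name, and it assumes the log lines arrive as plain text on stdin in the same format as the excerpts above. On pfSense 2.0 the system log is, as far as I know, a circular log, so it would probably need to be piped through clog first; treat that as an assumption too.

```shell
#!/bin/sh
# Hypothetical filter: read syslog lines on stdin and print only the
# kernel link-state transitions for one interface.
link_events() {
    iface=$1
    grep "kernel: ${iface}: link state changed to"
}

# Example, assuming plain-text log lines on stdin:
#   clog /var/log/system.log | link_events sk0
```

Comparing the timestamps of consecutive DOWN and UP lines shows how long the interface stayed dead each time.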

From dmesg.boot:
Code: [Select]
skc0: <Marvell Gigabit Ethernet> port 0xd400-0xd4ff mem 0xd8000000-0xd8003fff irq 10 at device 9.0 on pci0
skc0: Marvell Yukon Gigabit Ethernet rev. (0x1)
sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
miibus0: <MII bus> on sk0
e1000phy0: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
skc0: [ITHREAD]
skc1: <D-Link DGE-530T Gigabit Ethernet> port 0xe000-0xe0ff mem 0xd8004000-0xd8007fff irq 11 at device 15.0 on pci0
skc1: DGE-530T Gigabit Ethernet Adapter rev. (0x1)
sk1: <Marvell Semiconductor, Inc. Yukon> on skc1
miibus3: <MII bus> on sk1
e1000phy1: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus3
e1000phy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
skc1: [ITHREAD]
skc2: <D-Link DGE-530T Gigabit Ethernet> port 0xe400-0xe4ff mem 0xd8008000-0xd800bfff irq 10 at device 17.0 on pci0
skc2: DGE-530T Gigabit Ethernet Adapter rev. (0x1)
sk2: <Marvell Semiconductor, Inc. Yukon> on skc2
miibus4: <MII bus> on sk2
e1000phy2: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus4
e1000phy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
skc2: [ITHREAD]
I have 3 gigabit nics in there and 2 10/100 nics (not Marvell/Yukon), a total of 5 nics.
sk0 is currently a brand new SMC 9452TX gigabit card, as mentioned before.  I put the extra cards in there for quick testing of other nics to see if I could isolate the problem but all 3 cards do the same thing (even when installed 1 at a time) which led me to suspect the sk driver.

From pciconf:
Code: [Select]
skc0@pci0:0:9:0:        class=0x020000 card=0xb45210b8 chip=0x432011ab rev=0x12 hdr=0x00
    class      = network
    subclass   = ethernet
skc1@pci0:0:15:0:       class=0x020000 card=0x4c001186 chip=0x4c001186 rev=0x11 hdr=0x00
    class      = network
    subclass   = ethernet
skc2@pci0:0:17:0:       class=0x020000 card=0x4c001186 chip=0x4c001186 rev=0x11 hdr=0x00
    class      = network
    subclass   = ethernet
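
For anyone comparing cards, the chip= field in the pciconf output packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits. A small sketch to split it; decode_chip is a made-up helper name, not a pfSense or FreeBSD tool:

```shell
#!/bin/sh
# Hypothetical helper: split a pciconf "chip=0xDDDDVVVV" value into its
# device ID (upper 16 bits) and vendor ID (lower 16 bits).
decode_chip() {
    hex=${1#0x}            # drop the leading 0x
    device=${hex%????}     # first four hex digits
    vendor=${hex#????}     # last four hex digits
    echo "vendor=0x${vendor} device=0x${device}"
}

decode_chip 0x432011ab   # vendor=0x11ab device=0x4320
decode_chip 0x4c001186   # vendor=0x1186 device=0x4c00
```

Vendor 0x11ab is Marvell and 0x1186 is D-Link, so the SMC card on skc0 reports Marvell's own ID while the two DGE-530T cards report D-Link's, even though all three attach to the same sk driver.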

If a script were triggered by the hotplug event, I wouldn't mind the consequence of the interface 'flapping' when I unplug the network cable; I only ever do that when the box is off and I'm changing network cards.  Very much the lesser of 2 evils.

Thank you very much for your help,

Regards,

Vent
« Last Edit: December 07, 2011, 02:42:51 pm by Ventolin »

Online stephenw10

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #7 on: December 07, 2011, 06:52:54 pm »
Hmm, well I'm at the end of my diagnostic skills I'm afraid.
My own interfaces:
Code: [Select]
skc0: <Marvell Gigabit Ethernet> port 0xc000-0xc0ff mem 0xd042c000-0xd042ffff irq 16 at device 0.0 on pci5
skc0: Marvell Yukon Lite Gigabit Ethernet rev. (0x9)
sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
e1000phy4: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus4

skc0@pci0:5:0:0: class=0x020000 card=0x43201148 chip=0x432011ab rev=0x13 hdr=0x00

Slightly different from yours and working perfectly. Not much help to you though! 

Steve


Offline Ventolin

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #8 on: December 23, 2011, 04:18:15 am »
Hi,

Just to report sysctl -w dev.skc.0.int_mod=500 didn't work, though I got 20 days uptime this time.

Have updated to pfsense 2.0.1 now, maybe that will be better.

Would really like to know how to write that script.

Would be even better if the sk driver was updated or fixed.  How do I make a bug report?

To be honest, I'm either going to put a cheap realtek nic in or build a second pfsense box and use CARP failover between them.

Will report back if any further issues.

Online stephenw10

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #9 on: December 23, 2011, 05:10:01 am »
It's unlikely a bug report on 2.0.X would be acted on at this point since builds of 2.1 on FreeBSD 9 are now getting close. CMB recently posted 'by the end of the year'.
These will be the first builds for testing only and will probably have many bugs but will have much newer drivers. And by running these and reporting the bugs you will be helping out everyone.  :)

Steve

Edit: However now I can't find where I read that! A tweet maybe? Maybe it wasn't CMB.  ::) Fairly sure I did read it though!
« Last Edit: December 23, 2011, 05:30:03 am by stephenw10 »

Offline Ventolin

Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #10 on: December 25, 2011, 08:19:03 pm »
Hi,

Well 2.0.1 lasted 2 days of uptime, so the sk bug is still there, perhaps unsurprisingly.

So, I did some digging and learnt how to write a shell script which simply pings an ip on the LAN 3 times (with a delay of 1 second) and then does an ifconfig sk0 DOWN then ifconfig sk0 UP if the ping fails.  There's a 2 second wait between the DOWN and UP commands in case the driver needs a bit of time to work.

I then hooked the script up to cron so it could run every minute.

So, here's a quick guide on how to keep your breaking sk driver working.

1. Make a new file called ifcheck.sh in /usr/local/bin

2. copy and paste the following code in:

Code: [Select]
#!/bin/sh
#set logfile here or uncomment second line for no logging
LOGFILE=/tmp/ifcheck.log
#LOGFILE=/dev/null

#Set primary interface/ip to check here
IF1=sk0
IP1=192.168.0.10

#add more interfaces/ips here
#IF2=rl0
#IP2=192.168.1.254

#uncomment next line for debugging
#echo $(date) "pinging interfaces..." >> $LOGFILE

ping -c 3 -t 1 $IP1 > /dev/null 2>&1 || (echo $(date) "$IF1 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF1 down && sleep 2 && /sbin/ifconfig $IF1 up && echo $(date) "$IF1 set to UP" >> $LOGFILE)
#ping -c 3 -t 1 $IP2 > /dev/null 2>&1 || (echo $(date) "$IF2 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF2 down && sleep 2 && /sbin/ifconfig $IF2 up && echo $(date) "$IF2 set to UP" >> $LOGFILE)

3. reconfigure the script to your chosen interface name and ip address - you can uncomment the second ping and variables to test another interface and extend to test more.

I used the Easy Editor ee from the console shell prompt to write the script instead of the Virtually Impossible vi editor because I didn't want to take 10 years to master vi to do a simple edit when I could spend 5 seconds doing the same thing in ee :P

4. from a shell prompt type the following to make the script executable:
Code: [Select]
chmod 755 /usr/local/bin/ifcheck.sh
5. from the web interface add the cron package

6. Add a new entry to the cron table:
minute */1
hour *
mday *
month *
wday *
who root
command /usr/local/bin/ifcheck.sh

7. Save

8. To test, uncomment the debugging line like this:
Code: [Select]
#uncomment next line for debugging
echo $(date) "pinging interfaces..." >> $LOGFILE

This will write to a log file: /tmp/ifcheck.log

The cron settings mean this will fire every minute.  You'll want to comment the debug line out again once you're satisfied the script is working, so that only errors end up in the log file and to save space.
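
For reference, the seven fields entered in the cron package form correspond to the columns of a system-crontab line. A sketch that prints the equivalent line; cron_line is a hypothetical helper, and whether the package actually writes the entry to /etc/crontab in exactly this layout is an assumption I have not checked:

```shell
#!/bin/sh
# Hypothetical helper: print a system-crontab style line from the same
# seven fields the cron package form asks for
# (minute, hour, mday, month, wday, who, command).
cron_line() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$@"
}

cron_line '*/1' '*' '*' '*' '*' root /usr/local/bin/ifcheck.sh
```

Note that */1 in the minute field is equivalent to a plain *, so either form runs the script once a minute.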

You could probably use:
Code: [Select]
/usr/bin/nice -n20 /usr/local/bin/ifcheck.sh
in the cron entry which I think will make the process run with a lower priority, but I haven't tested that yet.

The script should be self explanatory.  If the ping fails, ping exits with a non-zero status, which triggers everything after the || to run.
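
The || behaviour can be tried in isolation without touching any interface. A minimal sketch using the standard true and false commands in place of ping; on_failure is just an illustrative name:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command quietly and report only when it
# exits non-zero, the same pattern ifcheck.sh uses around ping.
on_failure() {
    "$@" > /dev/null 2>&1 || echo "FAILED: $*"
}

on_failure true     # exit status 0, so nothing is printed
on_failure false    # exit status 1, so this prints: FAILED: false
```

In ifcheck.sh the part after the || is a parenthesised chain joined with &&, so each step in the bounce only runs if the one before it succeeded.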

OK, standard disclaimer, if it breaks your system sorry about that, use at your own risk.

To write the script I did quite a bit of googling and took bits of code from other slightly more complicated scripts and kept things simple.

If anyone has any suggestions to make it better, please just add your thoughts and/or improvements, it will be very welcome.  Feel free to use / modify as you wish.

Thanks everyone, esp stephenw10.

Vent
« Last Edit: December 25, 2011, 08:48:37 pm by Ventolin »

Offline Ventolin

ifcheck with syslog logging
« Reply #11 on: December 27, 2011, 05:38:33 am »
OK, this version logs to the syslog as well as the logfile:

Code: [Select]
#!/bin/sh
#set logfile here or uncomment second line for no logging
LOGFILE=/tmp/ifcheck.log
#LOGFILE=/dev/null

#Set primary interface/ip to check here
IF1=sk0
IP1=192.168.0.10

#add more interfaces/ips here
#IF2=rl0
#IP2=192.168.1.254

#uncomment next line for debugging
#echo $(date) "Pinging interfaces..." >> $LOGFILE
#logger -t ifcheck Pinging interfaces...

ping -c 3 -t 1 $IP1 > /dev/null 2>&1 || (logger -t ifcheck $IF1 DOWN, bouncing... && echo $(date) "$IF1 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF1 down && sleep 2 && /sbin/ifconfig $IF1 up && logger -t ifcheck $IF1 set to UP && echo $(date) "$IF1 set to UP" >> $LOGFILE)
#ping -c 3 -t 1 $IP2 > /dev/null 2>&1 || (logger -t ifcheck $IF2 DOWN, bouncing... && echo $(date) "$IF2 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF2 down && sleep 2 && /sbin/ifconfig $IF2 up && logger -t ifcheck $IF2 set to UP && echo $(date) "$IF2 set to UP" >> $LOGFILE)

Offline Ventolin

ifcheck with syslog logging only
« Reply #12 on: December 27, 2011, 05:39:59 am »
This version only logs to the Syslog and no log file:

Code: [Select]
#!/bin/sh

#Set primary interface/ip to check here
IF1=sk0
IP1=192.168.0.10

#add more interfaces/ips here
#IF2=rl0
#IP2=192.168.1.254

#uncomment next line for debugging
#logger -t ifcheck Pinging interfaces...

ping -c 3 -t 1 $IP1 > /dev/null 2>&1 || (logger -t ifcheck $IF1 DOWN, bouncing... && /sbin/ifconfig $IF1 down && sleep 2 && /sbin/ifconfig $IF1 up && logger -t ifcheck $IF1 set to UP)
#ping -c 3 -t 1 $IP2 > /dev/null 2>&1 || (logger -t ifcheck $IF2 DOWN, bouncing... && /sbin/ifconfig $IF2 down && sleep 2 && /sbin/ifconfig $IF2 up && logger -t ifcheck $IF2 set to UP)
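
The scripts above detect the failure by pinging a LAN host, which will also trigger a bounce if that host is simply switched off. Since the logs earlier in the thread show the kernel marking sk0 as DOWN when the problem hits, an alternative might be to test the link status that ifconfig reports instead. This is an untested sketch under that assumption; link_is_active is a hypothetical name, and it may not catch a hang where the driver still claims the link is up:

```shell
#!/bin/sh
# Hypothetical check: succeed only if ifconfig-style output on stdin
# contains a "status: active" line.
link_is_active() {
    grep -q 'status: active'
}

# Example, assuming sk0 is the interface to watch:
#   /sbin/ifconfig sk0 | link_is_active || echo "sk0 has no active link"
```

The echo in the example could be replaced by the same down, sleep, up chain used in ifcheck.sh.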

Hope that helps someone,

Regards,

Vent

Offline thestealth

  • Jr. Member
  • **
  • Posts: 45
  • Karma: +0/-0
    • View Profile
Re: Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0
« Reply #13 on: June 12, 2012, 07:41:47 am »
I found this post as I have the exact same problem. The posted solution works for my needs.

Thanks Ventolin!!! ;D ;D