
I have a physical SLES 11 SP2 server on a Sun Fire x4140 that is giving me problems with networking upon reboot. The NICs are onboard.

The networking appears successful during boot, but network services such as nfs fail hard. This is because eth0 and eth1 are both receiving the same configuration and are both ifup-ed. Once everything times out and I'm at the console, ifconfig shows that eth0 and eth1 are UP and running with the same IP. Attempting to ping anything in that subnet fails. Restarting the network service fixes the issue.
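A quick way to confirm the duplicate assignment from the console is sketched below. It assumes iproute2's `ip` is installed; `find_dup_ips` is a hypothetical helper name used only for this example.

```shell
# find_dup_ips: read `ip -o -4 addr show` output on stdin and print every
# IPv4 address that is assigned to more than one interface.
# (Hypothetical helper, not part of any SLES package.)
find_dup_ips() {
    awk '
        {
            split($4, cidr, "/")        # field 4 is ADDRESS/PREFIX
            addr = cidr[1]
            # remember which interfaces (field 2) carry this address
            ifaces[addr] = (addr in ifaces) ? ifaces[addr] " " $2 : $2
            count[addr]++
        }
        END {
            for (a in count)
                if (count[a] > 1)
                    print a ": " ifaces[a]
        }
    '
}

# Typical use on a live system:
#   ip -o -4 addr show | find_dup_ips
```

On the broken boot described here, this would be expected to list 172.21.64.25 with both eth0 and eth1; after a network restart it should print nothing.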

eth0 is the correct NIC that should be configured as primary, per the MAC address.

Question: What's causing eth1 to be brought up with the same config as eth0?

I do not have a config script set up for eth1:

banjer@harp:~> ls -la /etc/sysconfig/network/
total 104
drwxr-xr-x 6 root root  4096 Jun 11 12:21 .
drwxr-xr-x 6 root root  4096 Apr 10 09:46 ..
-rw-r--r-- 1 root root 13916 Apr 10 09:32 config
-rw-r--r-- 1 root root  9952 Apr 10 09:36 dhcp
-rw------- 1 root root   180 Jun 11 12:21 ifcfg-eth0
-rw------- 1 root root   180 Jun 11 12:21 ifcfg-eth3
-rw------- 1 root root   172 Feb  1 08:32 ifcfg-lo
-rw-r--r-- 1 root root 29333 Feb  1 08:32 ifcfg.template
drwxr-xr-x 2 root root  4096 Apr 10 09:32 if-down.d
-rw-r--r-- 1 root root   239 Feb  1 08:32 ifroute-lo
drwxr-xr-x 2 root root  4096 Apr 10 09:33 if-up.d
drwx------ 2 root root  4096 May  5  2010 providers
-rw-r--r-- 1 root root    25 Nov 16  2010 routes
drwxr-xr-x 2 root root  4096 Apr 10 09:36 scripts

On a side note, eth3 is also configured with an IP in a different subnet, but this has not posed any problems. FYI the kernel module being used is forcedeth.

banjer@harp:~> sudo cat /etc/sysconfig/network/ifcfg-eth0
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='172.21.64.25/20'
MTU=''
NAME='MCP55 Ethernet'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ONBOOT="yes"

Here's eth3 in case you need to see it:

banjer@harp:~> sudo cat /etc/sysconfig/network/ifcfg-eth3
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='172.11.200.4/24'
MTU=''
NAME='MCP55 Ethernet'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ONBOOT="yes"

Perhaps it is something related to udev? 70-persistent-net.rules looks OK to me, but I may not understand it completely.

banjer@harp:~> cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4c", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4a", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4b", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4d", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"

# PCI device 0x1077:0x3032 (qla3xxx)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:c1:dd:0e:34:6c", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
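To make the rules easier to compare against what the kernel actually reports, the MAC-to-name mapping can be pulled out of the file. This is a minimal sketch; `list_net_rules` is a hypothetical name, and it assumes each rule stays on a single line, as the file's own header requires.

```shell
# list_net_rules [FILE]
# Print "MAC NAME" for every rule in a persistent-net rules file.
# FILE defaults to the path shown in the listing above.
list_net_rules() {
    # Comment lines and blank lines do not match, so -n/p skips them.
    sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' \
        "${1:-/etc/udev/rules.d/70-persistent-net.rules}"
}
```

The output can then be checked by eye against the addresses the kernel exposes in /sys/class/net/eth*/address. In the rules above, 00:18:4f:8d:85:4a maps to eth0 and 00:18:4f:8d:85:4b to eth1, so two different MACs should show up under those two names.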

Any other thoughts on what would cause this?

UPDATE 1

Per suggestions, I gave a config to all the other NICs not being used (eth1 and eth2). For example, here is eth1:

banjer@harp:/etc/sysconfig/network> sudo cat ifcfg-eth1
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR=''
MTU=''
NAME='MCP55 Ethernet'
NETMASK='255.255.255.0'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='off'
ONBOOT='no'
USERCONTROL='no'

and added the specific HWADDR to the NICs that are actually plugged in (eth0 and eth3). During the test reboot, I see the networking come up as expected, and eth1 and eth2 say "skipped" as expected. However, eth1 is still getting brought up with eth0's config.

I set udev_log="debug" in /etc/udev/udev.conf, and now I have a bunch of debug messages in /var/log/messages. Here is a paste of grep eth1 /var/log/messages, but I don't see anything that stands out when comparing to a grep of other eth's.

UPDATE 2

Thinking this is a udev issue, I made a change to /lib/udev/rules.d/75-persistent-net-generator.rules and did rm /etc/udev/rules.d/70-persistent-net.rules.

# device name whitelist
#KERNEL!="eth*|ath*|wlan*[0-9]|msh*|ra*|sta*|ctc*|lcs*|hsi*", GOTO="persistent_net_generator_end"
KERNEL!="eth[03]|ath*|wlan*[0-9]|msh*|ra*|sta*|ctc*|lcs*|hsi*", GOTO="persistent_net_generator_end"

After rebooting, this did exactly what I wanted (generated rules for eth0, eth3) but it did not solve the problem. eth1 is still brought up. Is there a way to debug the entire boot process, e.g. strace? I have no idea where this is coming from.

As a band-aid, I'm adding an rc script to restart the network late in the boot process.

Banjer

4 Answers


You say you don't have a config script for eth1. Why not? Is it supposed to be configured or not? If it is, what IP is it supposed to have, and is that a static allocation or DHCP?

Those are questions for you to think about, btw, not necessarily to answer here.

Try creating a config for eth1, even if it's just a minimal one with ONBOOT="no". SUSE might be doing some insane default automagic crap if there's no config file.

cas
  • re: config scripts, all other physical servers in our datacenter running SLES 11 are similar: we only have config scripts for the NICs being used, and the others are non-existent. It's never posed an issue. I put in a minimal config with all of the suggestions here, but won't be able to test a reboot for a while (maybe not until the weekend). Thanks for the info. – Banjer Jun 25 '12 at 13:09

Making the config files more specific should help. Add the following directives to your ifcfg-ethX files:

DEVICE=eth0
HWADDR=00:18:4f:8d:85:4a

Rinse, lather, repeat for eth3, etc.

You could (should?) add config files for eth1, etc. as well:

DEVICE=eth1
HWADDR=00:18:4f:8d:85:4b
ONBOOT=no
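As far as I know, the key the SLES network scripts document for this is STARTMODE rather than ONBOOT (the question's own ifcfg files use it). A minimal sketch for stamping out disabled stubs for the unused NICs follows; the function name and the optional directory argument are hypothetical, and the keys simply mirror the ones already shown in the question.

```shell
# write_disabled_ifcfg IFACE [DIR]
# Write a stub ifcfg-IFACE that asks the SUSE network scripts to leave the
# interface alone. DIR defaults to /etc/sysconfig/network.
write_disabled_ifcfg() {
    iface=$1
    dir=${2:-/etc/sysconfig/network}
    # Refuse to clobber an existing configuration.
    if [ -e "$dir/ifcfg-$iface" ]; then
        echo "ifcfg-$iface already exists in $dir, not touching it" >&2
        return 1
    fi
    cat > "$dir/ifcfg-$iface" <<EOF
BOOTPROTO='static'
IPADDR=''
STARTMODE='off'
USERCONTROL='no'
EOF
    # Match the 0600 permissions of the existing ifcfg files.
    chmod 600 "$dir/ifcfg-$iface"
}
```

Run as root, something like `write_disabled_ifcfg eth1 && write_disabled_ifcfg eth2` would cover the two unused onboard ports without overwriting ifcfg-eth0 or ifcfg-eth3.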
fukawi2

Try adding:

HWADDR='00:18:4f:8d:85:4a'

to /etc/sysconfig/network/ifcfg-eth0 (on RHEL the file lives under /etc/sysconfig/network-scripts/ instead). You may also want to create an ifcfg-eth1 that contains something like this:

DEVICE='eth1'
BOOTPROTO='none'
HWADDR='00:18:4f:8d:85:4b'
USERCONTROL='no'
ONBOOT='yes'

At least on RHEL that will just bring up the interface with no IP configuration, and the networking init scripts look similar on SuSE 11. The other solution regarding SuSE networking configuration is to clear out the 70-persistent-net.rules with something like:

cat < /dev/null > /etc/udev/rules.d/70-persistent-net.rules

That will clear the udev rules and tell init to use the ifcfg-eth* files for interface identification.

d34dh0r53
  • it's simpler to just type "> /etc/udev/rules.d/70-persistent-net.rules" at the shell prompt, no need to cat /dev/null and redirect. even easier to just rm -f it. – cas Jun 25 '12 at 02:45
  • I just cut and pasted the command from the Novell docs, I'm fully versed on the inner-workings of shell redirection, thx ;) – d34dh0r53 Jun 25 '12 at 02:58
  • that's in the novell docs? awe-inspiring :) – cas Jun 25 '12 at 03:16
  • LOL, my thoughts exactly :) – d34dh0r53 Jun 25 '12 at 03:24
  • Thanks for the suggestions, I'll try them out. So glad we're getting away from SLES and moving to CentOS. The fact that they jumped from the 2.6.x kernel to 3.x from SP1 to SP2 blew my mind. We've had all sorts of issues. I'm sure we'll find our share in CentOS as well, but so far it's been smooth. – Banjer Jun 25 '12 at 13:01

I was unable to determine why two NICs were being configured with the same IP and subnet on boot.

The final solution to the problem, however, was to move the cable from the first NIC to the second NIC, i.e. from eth0 to eth1. Then I configured ifcfg-eth1 and "unconfigured" ifcfg-eth0. Now my networking and network-dependent services come up perfectly.

I get the sense that it may be a forcedeth module or perhaps a BIOS issue, but I won't be spending any more time on it, as we're building servers with totally different hardware these days and moving from SLES to CentOS, so I don't expect the problem to manifest again.

Banjer