
Here's a weird problem.

We've got a number of devices with dual-NIC mainboards. Some are Realtek NICs, which suck. Some are Intel e1000s, which don't.

I've just noticed this on two machines (one with an Intel NIC, one with a Realtek): when I put the MAC address of the machine into the dhcpd.conf file on our DHCP server so that it PXE boots into a rebuild environment, everything is fine initially.

The server gets a DHCP allocation and PXE boots into the Ubuntu preseed environment.

On one or two machines, it gets as far as Ubuntu's DHCP network configuration and fails. If I pull up a busybox shell (on tty2 on the installing machine) and run `ip link`, I can see that the UP flag is set on the other NIC.

Here are the relevant details.

  host xeon16-ghz240-gb48-node1 {
        hardware ethernet BC:AE:C5:07:1F:18;
        filename "pxelinux.0";
        next-server 192.168.123.80;
  }

That's what's in dhcpd.conf

This is what `ip link` looks like on the evil machine: [image: ip link output]

Only one NIC is actually connected (deliberately).

As you can see, the NIC that's in the dhcpd config is not marked as UP, and the link that is UP isn't the one in DHCP.

So far I've seen this on two brands of dual-NIC configuration.

Does anyone know 1) what's causing it, and 2) what we can do about it?

Tom O'Connor
  • 1. Different order of initializing PCI devices. So the BIOS uses the ":18" MAC and the OS is using the ":19" MAC first. 2. No idea =] – Chris S Jan 27 '12 at 16:11
  • I'll add this as a comment rather than an answer because it's fairly weak, but I can say that someone prior to me found this exact same problem and solved it by adding MAC and MAC+1 to the `dhcpd.conf` file when setting up a Kickstart. – Kyle Smith Jan 27 '12 at 16:12
  • What's the preseed look like? Specifically, is `netcfg/choose_interface` set? – Shane Madden Jan 27 '12 at 16:18
  • `./master/master_preseed.cfg:d-i netcfg/choose_interface select auto` – Tom O'Connor Jan 27 '12 at 16:35
  • @KyleSmith Yeah.. It's a little stochastic though. – Tom O'Connor Jan 27 '12 at 16:36
  • Are you using both NICs? If not, try disabling the Realtek one in the BIOS. I do this for HP servers that have PCI NICs when Kickstarting them so that the onboard NICs become eth[0-3] (otherwise the PCI ones usually get there first). – James O'Gorman Jan 28 '12 at 16:25

2 Answers


There's always more than one way to do anything :)

Solution 1

Motherboards with one of each?

Blacklist whichever kernel module drives the Realtek card (`ethtool -i eth0` reports the driver name).

Ubuntu supports module_name.blacklist=yes to blacklist it at boot and you should be able to change the modprobe options in the preseed environment so that it doesn't get probed later.


Solution 2

Let me rephrase the problem:

We have motherboards with two NICs and we want them to work consistently no matter which interface is plugged in. We can't always determine which interface (from the OS point of view) will be plugged in.

Set up bonding! Use an active-passive configuration (mode=active-backup miimon=100) with both interfaces as slaves. This way, it will always work no matter which interface is plugged in.
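
A minimal sketch of what that could look like in `/etc/network/interfaces`, assuming the `ifenslave` package is installed, the two NICs show up as `eth0` and `eth1`, and the bond gets its address over DHCP. None of those details come from the question, and the exact stanza layout varies between Debian and Ubuntu releases, so check it against the documentation for your release:

```
# /etc/network/interfaces (sketch; interface names and DHCP addressing are assumptions)
auto bond0
iface bond0 inet dhcp
    # Both onboard NICs join the bond, whichever one is cabled
    bond-slaves eth0 eth1
    # Only one slave carries traffic at a time
    bond-mode active-backup
    # Check link state every 100 ms
    bond-miimon 100
```

With active-backup, the bond uses whichever slave has link, so it does not matter which port the cable is in or which order the kernel enumerated the cards.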


Solution 3

Are the motherboards consistent enough that the NICs always show up on the same PCI ID? Use udev rules to always assign the card on a particular PCI address to eth0 and the card on the other address to eth1.

Note that you can have two different udev rules that assign a device to eth0 - this allows you to handle the Realtek and e1000 case at the same time.
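
As a rough illustration, rules along these lines pin names to PCI addresses. The addresses are placeholders (look up the real ones with `lspci -D`), and the file name is only a suggestion:

```
# /etc/udev/rules.d/70-persistent-net.rules (sketch; PCI addresses are hypothetical)
# Board type A: NIC at 0000:02:00.0 becomes eth0, NIC at 0000:03:00.0 becomes eth1
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:02:00.0", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:03:00.0", NAME="eth1"
# Board type B: its primary NIC sits at a different address but should also be eth0
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:04:00.0", NAME="eth0"
```

The two `eth0` rules are meant for different board models; on any single machine only one of them should match.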

MikeyB
  • They're both Realtek sadly.. Gonna get some e1000s to replace them, then will probably kill them off in the bios. – Tom O'Connor Jan 28 '12 at 17:44
  • Ooohhhh, misunderstood. Thought you had motherboards with 1 x e1000 and 1 x Realtek. – MikeyB Jan 28 '12 at 21:50
  • Good answers.. I'm not entirely sure what is supported as this problem tends to present itself between PXE loader and debian-installer's DHCP. I personally think the best option will be to disable all but one decent ***Intel*** NIC – Tom O'Connor Jan 29 '12 at 01:42
  • We ended up setting up bonding, and getting around the problem by putting both addresses into DHCP. – Tom O'Connor Feb 08 '12 at 16:15
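
For reference, "both addresses in DHCP" could look roughly like the following in `dhcpd.conf`. The second host name is made up, and the second MAC is only a guess based on the ":19" mentioned in the comments on the question; substitute the address that `ip link` actually reports for the other NIC:

```
  host xeon16-ghz240-gb48-node1 {
        hardware ethernet BC:AE:C5:07:1F:18;
        filename "pxelinux.0";
        next-server 192.168.123.80;
  }

  # Hypothetical second entry for the other onboard NIC (MAC is an assumption)
  host xeon16-ghz240-gb48-node1-nic2 {
        hardware ethernet BC:AE:C5:07:1F:19;
        filename "pxelinux.0";
        next-server 192.168.123.80;
  }
```

Each `host` declaration needs its own name, which is why the second one carries a suffix.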

You can try adding the PXELINUX `IPAPPEND 2` option to your pxelinux.cfg file to tell the init scripts to use the interface that did the PXE boot:

/var/lib/tftpboot/pxelinux.cfg/default

LABEL linux
   KERNEL /ubuntu/casper/vmlinuz 
   APPEND initrd=/ubuntu/casper/initrd.gz root=/dev/nfs boot=casper netboot=nfs nfsroot=192.168.1.1:/var/lib/tftpboot/ubuntu --
   IPAPPEND 2

see: http://www.syslinux.org/wiki/index.php/SYSLINUX#IPAPPEND_flag_val_.5BPXELINUX_only.5D
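
With `IPAPPEND 2`, PXELINUX adds a `BOOTIF=` parameter to the kernel command line containing the hardware type followed by the MAC address of the boot NIC, separated by dashes. If your init scripts don't handle `BOOTIF` themselves, a small helper along these lines can map it back to an interface name. This is only a sketch: the function names are made up, and it assumes a POSIX shell with `sed` and `tr` plus a mounted `/sys`.

```shell
#!/bin/sh
# Convert a BOOTIF value (e.g. 01-bc-ae-c5-07-1f-18) into a colon-separated MAC.
# Function names here are hypothetical helpers, not part of PXELINUX or the installer.
bootif_to_mac() {
    # Drop the leading hardware-type byte ("01-" for Ethernet),
    # swap dashes for colons, and lowercase to match /sys/class/net/*/address.
    printf '%s\n' "$1" \
        | sed -e 's/^[0-9A-Fa-f][0-9A-Fa-f]-//' -e 's/-/:/g' \
        | tr 'A-F' 'a-f'
}

# Print the name of the interface whose MAC matches $1 (prints nothing if none does).
iface_for_mac() {
    for dev in /sys/class/net/*; do
        [ -r "$dev/address" ] || continue
        if [ "$(cat "$dev/address")" = "$1" ]; then
            basename "$dev"
        fi
    done
}
```

On the booted machine, the `BOOTIF` value can be pulled out of `/proc/cmdline` (for example with `sed -n 's/.*BOOTIF=\([^ ]*\).*/\1/p' /proc/cmdline`) and passed through `bootif_to_mac` and then `iface_for_mac` to find out which interface did the PXE boot.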

panticz