I'm not sure "honor" is the right word for this, but it's the best I could come up with. I have a scenario where I have two servers on the same network. They have primary and secondary IPs, all on the same subnet. For the sake of discussion, they look like this:

server1    eth0    172.16.45.3/24
server1-A  eth0:11 172.16.45.21/27
server1-B  eth0:12 172.16.45.22/27

server2    eth0    172.16.45.4/27

Yes, server1 is set to /24, and yes it's a mistake.

I noticed this problem because connections from server1->server2 had a source IP of 172.16.45.21 instead of 172.16.45.3. Since the app originating the connection does not specify a source IP, I was shocked that it wasn't using 172.16.45.3.

That's when I noticed the incorrect netmask. Since the target IP is in a known smaller network, it uses an IP from the same /27 instead of the IP it thinks is from a /24. Oops.

So, I fixed the netmask on server1:eth0 by running the following command:

ifconfig eth0 netmask 255.255.255.224

ifconfig seemed happy now too:

eth0      Link encap:Ethernet  HWaddr 00:22:19:54:EF:11  
          inet addr:172.16.45.3  Bcast:172.16.45.31  Mask:255.255.255.224
          inet6 addr: fe80::222:19ff:fe54:ef11/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1085587580 errors:0 dropped:1355 overruns:0 frame:0
          TX packets:1208356392 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:365708046601 (340.5 GiB)  TX bytes:667099868812 (621.2 GiB)
          Interrupt:169 Memory:f8000000-f8012100 

Also, the routing table cleaned itself up.

Before:

server1 0 /home/jj33 ># route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.16.45.0     0.0.0.0         255.255.255.224 U     0      0        0 eth0
172.16.45.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         172.16.45.1     0.0.0.0         UG    0      0        0 eth0

After:

server1 0 /home/jj33 ># route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.16.45.0     0.0.0.0         255.255.255.224 U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         172.16.45.1     0.0.0.0         UG    0      0        0 eth0

The only problem is, after all of that, the OS still seems to choose 172.16.45.21 as the source address for outbound connections to the same network (SMTP doesn't directly figure into this problem, just a convenient way to show the source IP of a connection):

server1 0 /home/jj33 ># telnet server2 25
Trying 172.16.45.4...
Connected to 172.16.45.4.
Escape character is '^]'.
220 server2.example.com ESMTP mailer ready at Wed, 23 Dec 2009 12:18:28 -0600
ehlo foo
250-server2.qcommcorp.com Hello server1-A.example.com [172.16.45.21]
250 HELP

(in case it's not obvious, I would expect the mailer to say "Hello server1.example.com [172.16.45.3]" in response to my ehlo if everything were working properly).

So, now my question. How can I get my OS to take notice of the fact that the netmask on eth0 has changed such that it's a better choice for outbound connections to my local /27? I assume restarting the server or restarting networking services would do it, but I would have to wait a week until my next maintenance window and it seems like something I could do without interrupting service (this is a production system and this improper source IP is a small, tangential problem - the core app is working well).

Any help greatly appreciated. Thanks!

UPDATE 1/8/2010:

So, this problem drew more attention than I was expecting and I ended up getting permission to fail the application over to the standby silo and restart networking services on the affected server outside of our standard window, meaning I can't test any of the theories below.

In general though I believe Juliano's response covered the most detail. I didn't copy and paste it but in playing with ip it generally seemed to corroborate what he posited.

Also, enough people piled on about using ip in preference to ifconfig that I spent some time playing with it and tip my hat to you all, I certainly should be using ip. Thanks for the pointers.

jj33
  • Please, do not use `ifconfig` and `route`, they were written 15+ years ago when Linux had a very different network stack. Instead, you should use the `ip` family of commands. Also, please, post the results of `ip addr show`. – Juliano Dec 24 '09 at 13:23

5 Answers

First things first: do not use ifconfig and route. These commands are generally regarded as obsolete; they were written long ago, when Linux had a very different network stack, and have been patched ever since. The very idea of interface aliases (e.g. ethX:YY) as a way to have multiple addresses is obsolete too; they still exist mostly to please ifconfig itself. Today, the ip command should satisfy all your needs.

Now, understand your original situation: Your eth0 interface originally had two active scopes: /24 and /27. 172.16.45.3 was the primary address for the /24 scope, while 172.16.45.21 was the primary address for the /27 scope (because it is listed first). When you issued the ifconfig command to change the prefix of the first address, it deleted it and reinserted it as a secondary address in the /27 scope. So now you should have something like this:

inet 172.16.45.21/27 brd 172.16.45.31 primary   eth0:11
inet 172.16.45.22/27 brd 172.16.45.31 secondary eth0:12
inet 172.16.45.3/27  brd 172.16.45.31 secondary eth0

It doesn't matter that eth0 should be primary, or that it looks like it should be primary (another reason not to use ifconfig). It was inserted later in the /27 scope, so it is a secondary address. This also means that outbound packets will carry the source address 172.16.45.21, and that if you bring eth0:11 down using ifconfig, all your addresses will be taken down together. This is how it works.
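
To see which address the kernel currently treats as primary in each subnet, note that ip marks every non-primary address with the keyword secondary. A minimal sketch, assuming iproute2's one-line output format (ip -o); the helper name primary_addrs is made up for illustration:

```shell
# primary_addrs: hypothetical helper. Reads `ip -o -4 addr show` output on
# stdin and prints the address/prefix of every entry NOT flagged "secondary".
primary_addrs() {
    awk '{
        secondary = 0
        for (i = 1; i <= NF; i++) if ($i == "secondary") secondary = 1
        if (!secondary)
            for (i = 1; i < NF; i++) if ($i == "inet") print $(i + 1)
    }'
}

# On a live host (read-only, no root needed):
#   ip -o -4 addr show dev eth0 | primary_addrs
```

If the situation is as described above, this should list 172.16.45.21/27 rather than 172.16.45.3/27.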

The only way to fix this is to remove all addresses from the interface and reinsert them in the correct order. Then, the first address added (in the /27 scope) will be the primary address in that scope, and further addresses will all be secondaries.
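
The remove-and-reinsert sequence could look roughly like the following. This sketch only prints the commands instead of running them, because actually executing them drops every IPv4 address on eth0 for a moment (and with them the routes, including the default route), so it is not hitless. The labels mirror the aliases from the question, the gateway is taken from the routing table shown there, and rebuild_plan is a made-up name:

```shell
# rebuild_plan DEV GATEWAY ADDR[=LABEL]...
# Prints (does not execute) commands that re-add addresses in the given order.
# The first address added in a subnet becomes that subnet's primary.
rebuild_plan() {
    dev=$1; gw=$2; shift 2
    echo "ip -4 addr flush dev $dev"
    for spec in "$@"; do
        addr=${spec%%=*}     # part before "=", or the whole spec if no "="
        label=${spec#*=}     # part after "=", or the whole spec if no "="
        if [ "$label" = "$spec" ]; then
            echo "ip addr add $addr brd + dev $dev"
        else
            echo "ip addr add $addr brd + dev $dev label $label"
        fi
    done
    # Removing the addresses also removes routes that depended on them.
    echo "ip route add default via $gw dev $dev"
}

rebuild_plan eth0 172.16.45.1 \
    172.16.45.3/27 \
    172.16.45.21/27=eth0:11 \
    172.16.45.22/27=eth0:12
```

Any other static routes on the interface would need to be re-added the same way, which is part of why restarting the network service is usually the simpler option.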

The addressing was already broken from the beginning, so there wasn't much you could do in this situation. Your best solution is to just restart the network service.

One possible workaround is to set a preferred source address on the route itself. This has almost the same effect as changing the primary address. In your case:

ip route change 172.16.45.0/27 dev eth0 src 172.16.45.3

In this case, packets going to 172.16.45.0/27 will have the source address set to 172.16.45.3. You will need another command if you also want to change the source of packets passing through the gateway.
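
To check the result without opening a real connection, ip route get reports the source address the kernel would choose for a given destination. A small sketch; route_src is a hypothetical helper that just pulls the src field out of that output, and the commented gateway command is one way the "other command" might look, assuming the default route from the question's table:

```shell
# route_src: hypothetical helper. Reads `ip route get` output on stdin and
# prints the value that follows the "src" keyword.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# On a live host (read-only):
#   ip route get 172.16.45.4 | route_src
# which should print 172.16.45.3 if the src hint took effect.
#
# Same idea for traffic leaving via the gateway (needs root):
#   ip route change default via 172.16.45.1 dev eth0 src 172.16.45.3
```

Connections that are already established keep the source address they were opened with; only new connections pick up the change.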

Juliano
  • I'm accepting this because it seems to get closest to the heart of the issue, even though I can't test it anymore. Thanks for your detailed response. – jj33 Jan 08 '10 at 15:48

I had a similar issue (two servers with eth0 and eth1 in the same Ethernet segment) and couldn't figure out how to force a source in my case. However, you may try this kind of approach to force the source IP in your case:

ip route add 172.16.45.4 dev eth0 src 172.16.45.3 metric 2

It's about metric again but including the source in the equation. On my home configuration, it allowed me to pick a different ip to connect to my server compared to what the kernel would choose by default.

Zeograd

You did not say which distribution you are using. You should change the configuration files and use some kind of initscript to reload the network settings. (If you skip this step, your settings will be lost after a reboot.)

The second thing is that nowadays the ip tool is preferred over ifconfig on Linux. With ip you can add and remove IP addresses on the fly.

cstamas
  • CentOS 5.1, though I don't expect the core of this issue to be distribution specific. I did indeed modify the config files such that they will be correct at boot. I fully expect that if I ran 'service network restart' my problem would be fixed, but that would interrupt production traffic, which was the entire point of my question. ifconfig can certainly add and remove IP addresses on the fly, but I will look at ip to see if it offers some control over this situation that ifconfig does not. Thanks for the response. – jj33 Dec 23 '09 at 19:07

Have you tried setting the metrics with ifconfig?

metric n Set the routing metric of the interface to n, default 0. The routing metric is used by the routing protocol. Higher metrics have the effect of making a route less favorable; metrics are counted as additional hops to the destination network or host.

Link

Basically you want to prefer using one NIC over another. Try setting links A and B to have a metric of 1.

Joseph Kern
  • I don't think metric in and of itself would fix this, because all the IPs are using the same routing entry in the table. It's not about choosing the destination, it's about choosing the source IP, and all the source IPs are (now) in the same subnet. Thanks for the response! – jj33 Dec 23 '09 at 21:29
  • Ah. I got it backwards. – Joseph Kern Dec 24 '09 at 01:03

Not sure if I understand but maybe you can do something like

route add -net 172.16.45.0/27 dev eth0

This forces all connections to that subnet to go through eth0 which has the IP address you mentioned.

  • I already have a route for that network, via that device, in my routing table as shown in the route -n output above. The "Device" in the routing table seems to be physical device, not logical device, so they all are set to the physical device eth0 regardless of what the logical device of the chosen outbound IP is. Thanks for the response! – jj33 Dec 23 '09 at 21:31