Problem
All four production servers suddenly lost internet access, while four other servers connected to the same VLAN with the same eth0 settings can still reach it.
Figure 1: System 1 represents the four systems that can access the internet, while System 2 represents the ones that suddenly cannot since this afternoon.
Analysis
- System 1 can access System 2 and vice versa
- The default gateway (10.10.10.1) can be pinged from both System 1 and System 2
- System 1 can access the internet
- System 2 cannot access the internet
- The eth0 configuration shown by ifconfig is identical on all production servers
- The internal DNS server is the same as on the systems that can access the internet
- The IPs and names listed in /etc/resolv.conf can be reached
- The internet can be accessed from the switch
- The configuration of all 8 switch ports in Cisco IOS is identical
- tracepath from System 2 to 8.8.8.8 (Google DNS), a Google IP, or google.com hangs at the default gateway
- The systems that cannot access the internet seem to have an em1 adapter instead of eth0
- sudo arping -I eth0 ping.tweakers.net works on all 8 systems
- On one of the systems that cannot access the internet, sudo iptables-save produces output
- The output of route -n is identical on all systems
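To compare interface naming across the 8 systems (the em1 versus eth0 observation in the list above), one option is to reduce the output of ip -o link show to bare interface names on each host and compare the results. This is a minimal sketch, assuming iproute2's one-line output format and a POSIX shell with awk; the function name list_ifaces is hypothetical.

```shell
# list_ifaces (hypothetical helper): read "ip -o link show" output on stdin,
# one interface per line, e.g.
#   2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
# and print only the interface names.
list_ifaces() {
    # Splitting on ": " makes field 2 the interface name.
    # VLAN-style names such as "eth0.10@eth0" are cut at the "@".
    awk -F': ' '{ name = $2; sub(/@.*/, "", name); print name }'
}
```

Usage on each host: ip -o link show | list_ifaces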
Tracepath
[username@hostname ~]$ tracepath google.com
1: 10.10.10.10 (10.10.10.10) 0.222ms pmtu 1500
1: 10.10.10.1 (10.10.10.1) 0.662ms
1: 10.10.10.1 (10.10.10.1) 0.601ms
2: no reply
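The trace above stops after hop 1 at the default gateway 10.10.10.1. When repeating the trace from all 8 systems, it can help to extract only the last hop that answered so the results are easy to compare. A minimal sketch, assuming tracepath's "N: address ..." line format and a POSIX shell with awk; the function name last_hop is hypothetical.

```shell
# last_hop (hypothetical helper): read tracepath output on stdin and print
# the address of the last hop that answered.
last_hop() {
    awk '
        # Hop lines start with "N:"; lines reporting "no reply" are skipped,
        # so the result is the point where the trace stops progressing.
        $1 ~ /^[0-9]+:$/ && $0 !~ /no reply/ { hop = $2 }
        END { if (hop != "") print hop }
    '
}
```

Usage: tracepath 8.8.8.8 | last_hop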
ARP
System1: ? (10.10.10.1) at AA:BB:CC:DD:EE:FF [ether] on em1
System2: ? (10.10.10.1) at AA:BB:CC:DD:EE:FF [ether] on eth0
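Both systems resolve the gateway 10.10.10.1 to the same MAC address, only on different interfaces (em1 versus eth0). To check this on every host, the ARP table can be filtered down to the interface and MAC for one IP. A minimal sketch, assuming the "? (IP) at MAC [ether] on IFACE" format of arp -a and a POSIX shell with awk; the function name gateway_mac is hypothetical.

```shell
# gateway_mac (hypothetical helper): read "arp -a" style lines on stdin and
# print "<interface> <MAC>" for every entry matching the IP given as $1.
gateway_mac() {
    awk -v ip="($1)" '
        {
            for (i = 1; i <= NF; i++) {
                # "(IP) at MAC": the MAC is two fields after the IP
                if ($i == ip && $(i + 1) == "at") mac = $(i + 2)
                # the interface name follows the word "on"
                if ($i == "on" && i < NF) iface = $(i + 1)
            }
            if (mac != "") print iface, mac
            mac = ""; iface = ""
        }
    '
}
```

Usage: arp -a | gateway_mac 10.10.10.1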
Output of iptables-save on one of the systems that cannot access the internet
# Generated by iptables-save vX on Fri Aug 1 10:00:01 2014
*filter
:INPUT ACCEPT [X:Y]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [X:Y]
COMMIT
# Completed on Fri Aug 1 10:00:01 2014
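In the output above, the filter table has an ACCEPT policy on INPUT, FORWARD and OUTPUT and contains no rules. To compare this quickly across all systems, the iptables-save output can be summarized per table. A minimal sketch, assuming the standard iptables-save layout ("*table", ":CHAIN POLICY [packets:bytes]", "-A ..." rules, "COMMIT") and a POSIX shell with awk; the function name summarize_iptables is hypothetical.

```shell
# summarize_iptables (hypothetical helper): read iptables-save output on stdin
# and print each chain with its policy, then the rule count per table.
summarize_iptables() {
    awk '
        /^\*/     { table = substr($0, 2) }              # "*filter" starts a table
        /^:/      { print table, substr($1, 2), $2 }     # ":INPUT ACCEPT [X:Y]"
        /^-A /    { rules[table]++ }                     # one appended rule
        /^COMMIT/ { print table, "rules", rules[table] + 0 }
    '
}
```

Usage: sudo iptables-save | summarize_iptables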
route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.10.10.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
X.Y.0.0 0.0.0.0 255.255.0.0 U Z 0 0 eth0
0.0.0.0 10.10.10.1 0.0.0.0 UG 0 0 0 eth0
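Since route -n is reported to be identical on all systems, the part that matters most for internet access is the default route (destination 0.0.0.0 with the G flag). A minimal sketch for extracting its gateway and interface on each host, assuming the eight-column route -n layout shown above and a POSIX shell with awk; the function name default_route is hypothetical.

```shell
# default_route (hypothetical helper): read "route -n" output on stdin and
# print the gateway and interface of the default route.
default_route() {
    # Columns: Destination Gateway Genmask Flags Metric Ref Use Iface
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2, $8 }'
}
```

Usage: route -n | default_route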
It is unclear why the four production servers can no longer access the internet. Because they are running in production, restarting the network should be avoided. What further tests could be done to investigate the issue?