I would not say that I am a total newbie to server administration but obviously I have missed some key moment here...
Problem: Connectivity to the server lost (like lockout via firewall on the server) from particular subnet when a website is accessed from one specific device - Apple iPad (Version 8.4.1 (12H321) Model: MD515HC/A) running Safari.
The connectivity comes back after a short while of ipad inactivity.
If, before the lockout, there is an active SSH connection to the server - the connection is kept all right, but new connections to the server cannot be made (as if all ports are closed).
Iptables input/output policies ar set to ACCEPT. Amazon EC2 has my IP address set up to allow all traffic.
# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain f2b-sshd (0 references)
target prot opt source destination
Regular log files show absolutely no relevant information.
# apparmor_status
apparmor module is loaded.
1 profiles are loaded.
1 profiles are in enforce mode.
docker-default
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
# cat /etc/selinux/config
SELINUX=permissive
SELINUXTYPE=targeted
SETLOCALDEFS=0
Webserver running nginx 1.12.1 with php 5.6 and 7.0 Updated from nginx from 1.10 to 1.12.1 - the problem remains.
I doubt that the problem is directly connected to nginx rather than to how it is using system resources.
Instance type currently is Amazon EC2 - t2.micro but the same problem persists on c4.8xlarge
Nothing out of obvious comes out from nginx strace when webpage is accessed from iPad.
Right after connection hangs - Wireshark output on the emitter end:
13713 1413.319083 192.168.8.100 52.57.147.216 TCP 66 54046 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
...
13750 1422.319314 192.168.8.100 52.57.147.216 TCP 66 [TCP Retransmission] 54046 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
tcpdump on the server when the connection is allright:
17:10:53.188562 IP (tos 0x0, ttl 111, id 11792, offset 0, flags [DF], proto TCP (6), length 40)
XXX.XXX.XXX.XXX.55020 > 172.31.12.47.80: Flags [F.], cksum 0x3882 (correct), seq 2232, ack 1211, win 255, length 0
17:10:53.188741 IP (tos 0x0, ttl 111, id 11793, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [S], cksum 0x111a (correct), seq 2503140615, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
17:10:53.249513 IP (tos 0x0, ttl 111, id 11794, offset 0, flags [DF], proto TCP (6), length 40)
XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [.], cksum 0x984a (correct), seq 2503140616, ack 1871922116, win 260, length 0
17:10:53.252631 IP (tos 0x0, ttl 111, id 11795, offset 0, flags [none], proto TCP (6), length 784)
XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [P.], cksum 0x88f8 (correct), seq 0:744, ack 1, win 260, length 744: HTTP, length: 744
GET /wtf/2.htm HTTP/1.1
Host: www....lv
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: PHPSESSID=0hospgqmearo59saf20cfv7tt3; _hjIncludedInSample=1; _ga=GA1.2.2014778432.1499196989; _gid=GA1.2.833813339.1507378818
x-tele2-subid: XXX.XXX.XXX.XXX
17:10:53.359526 IP (tos 0x0, ttl 111, id 11796, offset 0, flags [DF], proto TCP (6), length 40)
XXX.XXX.XXX.XXX.55031 > 172.31.12.47.80: Flags [.], cksum 0x93d0 (correct), seq 744, ack 404, win 259, length 0
tcpdump on the server right after connections are dropped:
17:11:19.181562 IP (tos 0x0, ttl 47, id 38570, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [.], cksum 0xf058 (correct), seq 1157, ack 199, win 4129, options [nop,nop,TS val 323793256 ecr 4027542545], length 0
17:11:19.251976 IP (tos 0x0, ttl 47, id 8939, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51274 > 172.31.12.47.80: Flags [.], cksum 0x711b (correct), seq 1158, ack 198, win 4129, options [nop,nop,TS val 323793326 ecr 4027542547], length 0
17:11:20.212575 IP (tos 0x0, ttl 111, id 11804, offset 0, flags [DF], proto TCP (6), length 40)
XXX.XXX.XXX.XXX.55058 > 172.31.12.47.80: Flags [F.], cksum 0x3b77 (correct), seq 744, ack 405, win 259, length 0
17:11:20.212839 IP (tos 0x0, ttl 111, id 11805, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.55069 > 172.31.12.47.80: Flags [S], cksum 0xc9cb (correct), seq 4012888626, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
17:11:20.459739 IP (tos 0x0, ttl 111, id 11806, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.55070 > 172.31.12.47.80: Flags [S], cksum 0xd787 (correct), seq 1916158319, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
17:11:21.219597 IP (tos 0x0, ttl 47, id 25897, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51272 > 172.31.12.47.80: Flags [.], cksum 0x4702 (correct), seq 2220, ack 2185, win 4096, options [nop,nop,TS val 323795291 ecr 4027543025], length 0
17:11:21.221524 IP (tos 0x0, ttl 47, id 12413, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [.], cksum 0xe66f (correct), seq 1157, ack 200, win 4129, options [nop,nop,TS val 323795291 ecr 4027543046], length 0
17:11:21.221548 IP (tos 0x0, ttl 47, id 40941, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51274 > 172.31.12.47.80: Flags [.], cksum 0x6779 (correct), seq 1158, ack 199, win 4129, options [nop,nop,TS val 323795291 ecr 4027543047], length 0
17:11:22.010619 IP (tos 0x0, ttl 47, id 20698, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51272 > 172.31.12.47.80: Flags [F.], cksum 0x43ff (correct), seq 2220, ack 2185, win 4096, options [nop,nop,TS val 323796061 ecr 4027543025], length 0
17:11:22.010687 IP (tos 0x0, ttl 47, id 21278, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [F.], cksum 0xe36c (correct), seq 1157, ack 200, win 4129, options [nop,nop,TS val 323796061 ecr 4027543046], length 0
17:11:22.010780 IP (tos 0x0, ttl 47, id 37726, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51274 > 172.31.12.47.80: Flags [F.], cksum 0x6477 (correct), seq 1158, ack 199, win 4129, options [nop,nop,TS val 323796060 ecr 4027543047], length 0
17:11:22.391572 IP (tos 0x0, ttl 47, id 30595, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [F.], cksum 0xe208 (correct), seq 1125, ack 200, win 4129, options [nop,nop,TS val 323796449 ecr 4027543046], length 0
17:11:22.462590 IP (tos 0x0, ttl 47, id 9929, offset 0, flags [DF], proto TCP (6), length 40)
XXX.XXX.XXX.XXX.51273 > 172.31.12.47.80: Flags [R], cksum 0xf229 (correct), seq 3704030890, win 0, length 0
17:11:23.201564 IP (tos 0x0, ttl 111, id 11807, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.55069 > 172.31.12.47.80: Flags [S], cksum 0xc9cb (correct), seq 4012888626, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
17:11:23.459562 IP (tos 0x0, ttl 111, id 11808, offset 0, flags [DF], proto TCP (6), length 52)
XXX.XXX.XXX.XXX.55070 > 172.31.12.47.80: Flags [S], cksum 0xd787 (correct), seq 1916158319, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0
Sometimes it takes 2 or 3 page refresh on the iPad to crash connection. Connections are dropped for ~2 - 5 minutes and then everything returns back to normal ...until the iPad is used.
Any hint on how to track down this issue is highly appreciated. To be honest - I am out of ideas...
Update #1
# sysctl -p
net.ipv4.ip_forward = 1
fs.file-max = 65536
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.ip_local_port_range = 2000 65535
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
net.core.rmem_default = 31457280
net.core.rmem_max = 12582912
net.core.wmem_default = 31457280
net.core.wmem_max = 12582912
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 65536
net.core.optmem_max = 25165824
net.ipv4.tcp_mem = 65536 131072 262144
net.ipv4.udp_mem = 65536 131072 262144
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 16384
net.ipv4.tcp_wmem = 8192 65536 16777216
net.ipv4.udp_wmem_min = 16384
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
Update 2 as requested nginx.conf
user www-data;
#worker_processes 8; worker_processes 1;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
multi_accept on;
use epoll;
}
worker_rlimit_nofile 65536;
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format scripts '$document_root$fastcgi_script_name > $request';
access_log /var/log/nginx/access.log main;
server_tokens off;
sendfile off;
tcp_nopush on;
tcp_nodelay on;
client_max_body_size 400M;
client_body_buffer_size 1m;
client_header_timeout 15;
keepalive_timeout 2 2;
# open_file_cache max=10000 inactive=5m;
# open_file_cache_valid 2m;
# open_file_cache_min_uses 5;
# open_file_cache_errors off;
send_timeout 15;
fastcgi_max_temp_file_size 0;
gzip on;
gzip_disable "msie6";
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascripti application/javascript;
server {
listen 80 default_server;
server_name _;
return 444;
include /etc/nginx/sites-enabled/*;
}
Virtualhost config (does not matter - all hosts are affected. When any vhost is accessed from the iPad, connection to server gets frozen)
server {
listen 80;
listen 443 ssl http2;
server_name ds.somehost.lv;
root "/www/ds.somehost.lv/html/public";
index index.html index.htm index.php;
charset utf-8;
location / {
try_files $uri $uri/ /index.php?$query_string;
}
location = /favicon.ico { access_log off; log_not_found off; }
location = /robots.txt { access_log off; log_not_found off; }
error_log /var/log/nginx/ds.somehost.app-error.log error;
sendfile off;
client_max_body_size 1000m;
location ~ \.php$ {
fastcgi_split_path_info ^(.+\.php)(/.+)$;
fastcgi_pass unix:/var/run/php/php7.0-fpm.sock;
fastcgi_index index.php;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_intercept_errors off;
fastcgi_buffer_size 16k;
fastcgi_buffers 4 16k;
fastcgi_connect_timeout 300;
fastcgi_send_timeout 300;
fastcgi_read_timeout 300;
}
location ~ /\.ht {
deny all;
}
}
access.log: Requests from windows machine:
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:49 +0300] "GET /?asdfasd=asdfasd HTTP/1.1" 200 5430 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:52 +0300] "GET /?asdfasd=asdfasd HTTP/1.1" 200 5430 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:54 +0300] "GET /?asdfasd=asdfasd HTTP/1.1" 200 5430 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" "-"
Request from iPad
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /login HTTP/1.1" 200 967 "-" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /css/bootstrap.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /css/gentelella.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:53:57 +0300] "GET /css/font-awesome.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
Here I am trying to connect to the same page from windows machine (request times out)
Trying to refresh the page from iPad - request instantly satisfied
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:08 +0300] "GET /login HTTP/1.1" 200 967 "-" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:09 +0300] "GET /css/bootstrap.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:09 +0300] "GET /css/font-awesome.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
XXX.XXX.XXX.XXX - - [08/Oct/2017:23:54:09 +0300] "GET /css/gentelella.min.css HTTP/1.1" 304 0 "http://ds.somehost.lv/login" "Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4" "-"
There are no errors logged within error log (syslog, kernel.log, nginx error log).
Update 3 Turns out, new connections are blocked for exactly 60 seconds sharp.