I'm facing an issue with a setup I use for occasional maintenance on a number of customer servers over remote SSH.

The setup is as follows:

One control server and an arbitrary number of customer servers, each set up so that a 'service' account connects to my control server via SSH.

I've set up the clients to connect automatically to the control server (which has a fixed IP) via the service account, using autossh after boot. This is /etc/ssh/ssh_config on a customer machine:

# This is the ssh client system-wide configuration file.  See
# ssh_config(5) for more information.  This file provides defaults for
# users, and the values can be changed in per-user configuration files
# or on the command line.

# Configuration data is parsed as follows:
#  1. command line options
#  2. user-specific file
#  3. system-wide file
# Any configuration value is only changed the first time it is set.
# Thus, host-specific definitions should be at the beginning of the
# configuration file, and defaults at the end.

# Site-wide defaults for some commonly used options.  For a comprehensive
# list of available options, their meanings and defaults, please see the
# ssh_config(5) man page.

Host *
#   ForwardAgent no
#   ForwardX11 no
#   ForwardX11Trusted yes
#   PasswordAuthentication yes
#   HostbasedAuthentication no
#   GSSAPIAuthentication no
#   GSSAPIDelegateCredentials no
#   GSSAPIKeyExchange no
#   GSSAPITrustDNS no
#   BatchMode no
#   CheckHostIP yes
#   AddressFamily any
#   ConnectTimeout 0
#   StrictHostKeyChecking ask
#   IdentityFile ~/.ssh/id_rsa
#   IdentityFile ~/.ssh/id_dsa
#   IdentityFile ~/.ssh/id_ecdsa
#   IdentityFile ~/.ssh/id_ed25519
#   Port 22
#   Protocol 2
#   Ciphers aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc
#   MACs hmac-md5,hmac-sha1,umac-64@openssh.com
#   EscapeChar ~
#   Tunnel no
#   TunnelDevice any:any
#   PermitLocalCommand no
#   VisualHostKey no
#   ProxyCommand ssh -q -W %h:%p gateway.example.com
#   RekeyLimit 1G 1h
    SendEnv LANG LC_*
    HashKnownHosts yes
    GSSAPIAuthentication yes
    ServerAliveInterval 300
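
As a rough sanity check on the client side, the keepalive arithmetic can be sketched as below. ServerAliveInterval is taken from the config above; ServerAliveCountMax is not set there, so the sketch assumes the default of 3 documented in ssh_config(5). The function name is only illustrative.

```python
def unresponsive_timeout(interval_s: int, count_max: int) -> int:
    """Approximate seconds of silence from the peer before ssh gives up."""
    return interval_s * count_max


# From the ssh_config shown above.
SERVER_ALIVE_INTERVAL = 300
# Assumption: not set in the config, so the documented default is used.
SERVER_ALIVE_COUNT_MAX = 3

client_timeout = unresponsive_timeout(SERVER_ALIVE_INTERVAL, SERVER_ALIVE_COUNT_MAX)
print(client_timeout)  # 900
```

Under these assumptions a client would only give up on an unresponsive control server after roughly 15 minutes, so the client-side keepalive alone would not explain drops that happen within seconds.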

On the control server I am using the following sshd_config:

#       $OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented.  Uncommented options override the
# default value.

Port --hidden--
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
#HostKey /etc/ssh/ssh_host_ed25519_key

# Ciphers and keying
#RekeyLimit default none

# Logging
#SyslogFacility AUTH
#LogLevel INFO

# Authentication:

#LoginGraceTime 2m
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

#PubkeyAuthentication yes

# Expect .ssh/authorized_keys2 to be disregarded by default in future.
#AuthorizedKeysFile     .ssh/authorized_keys .ssh/authorized_keys2

#AuthorizedPrincipalsFile none

#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody

# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes

# To disable tunneled clear text passwords, change to no here!
#PermitEmptyPasswords no

# Change to yes to enable challenge-response passwords (beware issues with
# some PAM modules and threads)
ChallengeResponseAuthentication no

# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no

# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes
#GSSAPIStrictAcceptorCheck yes
#GSSAPIKeyExchange no

# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication.  Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
UsePAM yes

#AllowAgentForwarding yes
AllowTcpForwarding yes
GatewayPorts yes
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
PrintMotd no
#PrintLastLog yes
#TCPKeepAlive yes
#PermitUserEnvironment no
#Compression delayed
ClientAliveInterval 30
ClientAliveCountMax 99999
#UseDNS no
#PidFile /var/run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none

# no default banner path
#Banner none

# Allow client to pass locale environment variables
AcceptEnv LANG LC_*

# override default of no subsystems
Subsystem       sftp    /usr/lib/openssh/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#       X11Forwarding no
#       AllowTcpForwarding no
#       PermitTTY no
#       ForceCommand cvs server

PasswordAuthentication no
PermitRootLogin yes
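
Since most of the file is commented-out defaults, it may help to list only the directives that are actually active. The following is a minimal sketch, not a full sshd_config parser: it ignores Match blocks and keyword case-insensitivity, and the excerpt string is just a handful of lines copied from the config above. The helper name `active_directives` is hypothetical.

```python
SSHD_CONFIG_EXCERPT = """\
#MaxSessions 10
AllowTcpForwarding yes
GatewayPorts yes
#TCPKeepAlive yes
ClientAliveInterval 30
ClientAliveCountMax 99999
#MaxStartups 10:30:100
PasswordAuthentication no
PermitRootLogin yes
"""


def active_directives(text: str) -> dict:
    """Return uncommented 'Keyword value' pairs from sshd_config-style text."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out defaults
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        # sshd_config(5): the first obtained value for a keyword is used.
        result.setdefault(parts[0], value)
    return result


settings = active_directives(SSHD_CONFIG_EXCERPT)
server_timeout = int(settings["ClientAliveInterval"]) * int(settings["ClientAliveCountMax"])
print(sorted(settings))
print(server_timeout)  # 2999970 seconds, roughly 34.7 days
```

In this excerpt the keepalive values are set explicitly, while MaxSessions and MaxStartups stay commented out and therefore at their compiled-in defaults.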

I would expect both sides to simply keep the connections open, since the keepalive settings on both ends are generous. However, connections keep dropping at random. I've checked /var/log/syslog, and it looks as if sshd drops one of the active connections whenever a new connection comes in, so I'm fairly sure I'm hitting some connection limit here:

Nov 26 18:38:38 v2202102140578142103 systemd[1]: session-115234.scope: Succeeded.
Nov 26 18:38:38 v2202102140578142103 systemd[1]: Started Session 115376 of user service.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: session-115235.scope: Succeeded.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: Started Session 115377 of user service.
Nov 26 18:38:52 v2202102140578142103 systemd[1]: session-115236.scope: Succeeded.
Nov 26 18:38:53 v2202102140578142103 systemd[1]: Started Session 115378 of user service.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: session-115237.scope: Succeeded.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: Started Session 115379 of user service.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: session-115238.scope: Succeeded.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: Started Session 115380 of user service.
Nov 26 18:39:09 v2202102140578142103 systemd[1]: session-115239.scope: Succeeded.
Nov 26 18:39:09 v2202102140578142103 systemd[1]: Started Session 115381 of user service.
Nov 26 18:39:14 v2202102140578142103 systemd[1]: session-115240.scope: Succeeded.
Nov 26 18:39:15 v2202102140578142103 systemd[1]: Started Session 115382 of user service.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: session-115241.scope: Succeeded.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: session-115242.scope: Succeeded.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: Started Session 115383 of user service.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: Started Session 115384 of user service.
Nov 26 18:39:32 v2202102140578142103 systemd[1]: session-115243.scope: Succeeded.
Nov 26 18:39:33 v2202102140578142103 systemd[1]: Started Session 115385 of user service.
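
To check the "one out, one in" pattern more systematically, the log can be paired up with a short script. This is only a sketch: the regular expression is written against the exact systemd line format shown above, the excerpt is the first six lines of that log, timestamps are reduced to seconds of the day (so it assumes the excerpt does not cross midnight), and the helper names are hypothetical.

```python
import re

LOG_EXCERPT = """\
Nov 26 18:38:38 v2202102140578142103 systemd[1]: session-115234.scope: Succeeded.
Nov 26 18:38:38 v2202102140578142103 systemd[1]: Started Session 115376 of user service.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: session-115235.scope: Succeeded.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: Started Session 115377 of user service.
Nov 26 18:38:52 v2202102140578142103 systemd[1]: session-115236.scope: Succeeded.
Nov 26 18:38:53 v2202102140578142103 systemd[1]: Started Session 115378 of user service.
"""

LINE = re.compile(
    r"^\w{3} +\d+ (\d{2}):(\d{2}):(\d{2}) \S+ systemd\[\d+\]: "
    r"(?:session-(\d+)\.scope: Succeeded\.|Started Session (\d+) of user \S+\.)$"
)


def parse_events(log_text: str) -> list:
    """Return (seconds_of_day, kind, session_id) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        h, mi, s, ended_id, started_id = m.groups()
        seconds = int(h) * 3600 + int(mi) * 60 + int(s)
        if ended_id is not None:
            events.append((seconds, "ended", int(ended_id)))
        else:
            events.append((seconds, "started", int(started_id)))
    return events


events = parse_events(LOG_EXCERPT)
ended = [e for e in events if e[1] == "ended"]
started = [e for e in events if e[1] == "started"]

# Pair the n-th ended session with the n-th started session.
gaps = [s[0] - e[0] for e, s in zip(ended, started)]
id_offsets = [s[2] - e[2] for e, s in zip(ended, started)]
print(gaps)        # [0, 0, 1]
print(id_offsets)  # [142, 142, 142]
```

For this excerpt every ended session is followed by a new one within a second, and the difference between the new and the ended session ID stays at 142. That is consistent with the observation above, though it does not by itself show which side closes the connection.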

This is probably something simple to fix, but I'm not a Linux networking expert, and I wasn't able to find anything useful through my own research. Hopefully someone can point me to the limit I have to change for this behaviour to stop.

Thanks in advance!

Corsair
  • Not an answer, but I strongly suggest you consider using an actual VPN, in particular something like WireGuard, which is good at re-establishing connections after network issues. – Zoredache Nov 27 '21 at 04:52
