Wireguard not completing handshake

Question

I have two Debian GNU/Linux systems (bullseye/sid), both running wireguard on port 23456, both behind NAT. Both run a kernel version > 5.6 (wireguard mainlined).

System A is the server. Once every minute it dynamically updates a dedicated "A record" in the authoritative nameserver for its internet domain with the public IP address assigned to its internet-facing router A (a ZyWALL USG 100 firewall). In practice that address only changes when the router/firewall reboots, which basically never happens.

System B is behind VDSL router B and acts as the wireguard client, pointing to the dynamically updated "A record" and port 33456. Router B is a consumer-grade VDSL router: it allows all outbound traffic and only replies inbound.

Router/firewall A (ZyWALL USG 100) is configured to let UDP packets on port 23456 through and forward them to server A. Here is the relevant configuration screen:

Here is the server A wireguard configuration file (keys in this snippet, despite being valid, aren't the real ones):

[Interface]
Address = 10.31.33.100/24, fc00:31:33::1/64
ListenPort = 23456
PrivateKey = iJE/5Qy4uO55uUQg8nnDKQ/dFT1MEq+tDfFXrGNj3GY=
# PreUp = iptables -t nat -A POSTROUTING -s 10.31.33.0/24  -o enp1s0 -j MASQUERADE; ip6tables -t nat -A POSTROUTING -s fc00:31:33::/64 -o enp1s0 -j MASQUERADE
# PostDown = iptables -t nat -D POSTROUTING -s 10.31.33.0/24  -o enp1s0 -j MASQUERADE; ip6tables -t nat -D POSTROUTING -s fc00:31:33::/64 -o enp1s0 -j MASQUERADE

# Simon
[Peer]
PublicKey = QnkTJ+Qd9G5EybA2lAx2rPNRkxiQl1W6hHeEFWgJ0zc=
AllowedIPs = 10.31.33.211/32, fc00:31:33::3/128

And here is client B wireguard configuration (again, keys and domain aren't the real ones):

[Interface]
PrivateKey = YA9cRlF4DgfUojqz6pK89poB71UFoHPM6pdMQabWf1I=
Address = 10.31.33.211/32

[Peer]
PublicKey = p62kU3HoXLJACI4G+9jg0PyTeKAOFIIcY5eeNy31cVs=
AllowedIPs = 10.31.33.0/24, 172.31.33.0/24
Endpoint = wgsrv.example.com:33456
PersistentKeepalive = 25

Here is a dirty diagram that depicts the situation:

Client B -> LAN B -> VDSL Router B (NAT) -> the internet -> ZyWALL (NAT) -> LAN A -> Server A

Starting wireguard on both systems does not establish the VPN connection. After enabling debug messages on the client and adding an iptables LOG rule for OUTPUT packets, I get lots of these:

[414414.454367] IN= OUT=wlp4s0 SRC=10.150.44.32 DST=1.2.3.4 LEN=176 TOS=0x08 PREC=0x80 TTL=64 ID=2797 PROTO=UDP SPT=36883 DPT=33456 LEN=156 
[414419.821744] wireguard: wg0-simon: Handshake for peer 3 (1.2.3.4:33456) did not complete after 5 seconds, retrying (try 2)
[414419.821786] wireguard: wg0-simon: Sending handshake initiation to peer 3 (1.2.3.4:33456)

I've added an iptables LOG rule to the server to diagnose router configuration problems.

root@wgserver ~ # iptables -t nat -I INPUT 1 -p udp --dport 23456 -j LOG

It logs the wireguard packets received from the client (but I can't tell if they are somehow invalid or incomplete):

[ 1412.380826] IN=enp1s0 OUT= MAC=6c:62:6d:a6:5a:8e:d4:60:e3:e0:23:30:08:00 SRC=37.161.119.20 DST=10.150.44.188 LEN=176 TOS=0x08 PREC=0x00 TTL=48 ID=60479 PROTO=UDP SPT=8567 DPT=23456 LEN=156 
[ 1417.509702] IN=enp1s0 OUT= MAC=6c:62:6d:a6:5a:8e:d4:60:e3:e0:23:30:08:00 SRC=37.161.119.20 DST=10.150.44.188 LEN=176 TOS=0x08 PREC=0x00 TTL=48 ID=61002 PROTO=UDP SPT=8567 DPT=23456 LEN=156

so I'm inclined to assume that router A (ZyWALL USG 100) is correctly configured to let the packets into the server's local network. To confirm that assumption, I even tried replacing the ZyWALL with another consumer-grade router and moving the server to a different internet connection, but the problem persisted, so I'm sure the problem is neither the firewall nor its specific internet connection.

Here is the server network configuration, just in case it matters:

auto lo
iface lo inet loopback

auto enp1s0
iface enp1s0 inet static
    address 10.150.44.188/24
    gateway 10.150.44.1

On top of that, other wireguard VPN tunnels DO work correctly using the same client, the same VDSL router (client side), the same internet connection, a similar server configuration (obviously with different keys and domain) and a similar firewall configuration (server side, different firewall model).

Don't use a port number in the range 33434-33533 or so. These are used by UDP traceroute and some Internet routers drop packets to/from these port numbers. — Michael Hampton, Oct 26 '20 at 22:55
@MichaelHampton After your comment I switched to port 23456 on the client, the server, the Zyxel router and the `iptables` rules: same result. — Lucio Crusca, Oct 26 '20 at 23:03
You should check the firewall on the server. It doesn't appear to be allowing the traffic through. — Michael Hampton, Oct 26 '20 at 23:16
@MichaelHampton so how do the nmap packets manage to traverse it? — Lucio Crusca, Oct 26 '20 at 23:18
Why do you think the nmap packets traversed the firewall? They were only logged. — Michael Hampton, Oct 26 '20 at 23:20
Of course they were, you told it to log them! That doesn't mean they weren't dropped later. — Michael Hampton, Oct 26 '20 at 23:34
I'm afraid you didn't quite get it: nmap packets reach the server, so they clearly went through the ZyWALL, while wireguard packets do not even reach it. What happens to those packets AFTER they reach the server and AFTER they get logged is possibly another problem. Here I'm trying to understand why wireguard packets do not even reach the server while nmap packets do, even though all of them are generated by the same client outside the server network, which is protected by the ZyWALL. — Lucio Crusca, Oct 26 '20 at 23:39
You're right, I didn't get that detail. Now I see that you have logged output traffic at the client. But does it get past the client's own host firewall? Does it get past the upstream NAT device on the client's network? — Michael Hampton, Oct 26 '20 at 23:49
Well, nmap packets get past the client host firewall for sure, because they even reach the server. Wireguard packets get past the client host firewall for sure too, because what you see in the client dmesg snippet is them being logged by iptables. I assume the VDSL router lets all of them through, because they are all (both wireguard and nmap) UDP with destination port 33456, but I don't really know how to verify that. However, I have other wireguard VPN connections working on this same client and network, so I can assume the VDSL router behaves the same for all of them. — Lucio Crusca, Oct 26 '20 at 23:58
There can also be UDP or generic offloading, which has the NIC compute the checksum. If the UDP checksum were bad, router B should have discarded the packet (since it's doing NAT, it probably checks UDP ports), so it's quite strange for this to go undetected. Do a DNS query and check whether the checksum of the DNS UDP query packet appears good or not. — A.B, Oct 27 '20 at 13:11
@A.B I couldn't quite follow the logic behind your reasoning, but I decided to trust you blindly and tried a few DNS queries. The result: some packets have a good checksum, others have a bad checksum. Regardless, my DNS queries always worked without any error. — Lucio Crusca, Oct 27 '20 at 21:08
I'm not certain about the checksum computed by the NIC (especially a wireless NIC), but by that logic the checksum isn't a good enough reason to blame. Normally you should take measurements on the other nodes involved, starting with router B, but I understand this might not be possible. Then you should just try to reduce your setup to a smaller scale to rule out some possibilities (e.g. the checksum), like wireguard between two systems in the same LAN, then wireguard to another location than the one behind router A, etc. — A.B, Oct 27 '20 at 21:16
It might be stupid, but did you try to create new server keys, client keys, and retry? Wireguard can act exactly like this when the profiles are wrong. — setenforce 1, Nov 03 '20 at 18:38
@Sanael: thank you very much! After your comment I enabled WG debugging on the server and found cryptic error messages there, which show that the server was actually receiving packets and my iptables logging rule wasn't catching them! Searching Google for those cryptic wg log messages (`Invalid handshake initiation from :`) led me to this page https://www.the-digital-life.com/wiki/wireguard-troubleshooting/, which confirmed your suspicion. I've recreated the keys and it now works! Please turn your comment into an answer so I can accept it and award it the bounty. — Lucio Crusca, Nov 04 '20 at 08:40
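
On A.B's suggestion of checking UDP checksums: the UDP checksum (RFC 768) is the one's-complement sum of an IPv4 pseudo-header, the UDP header and the payload. The Python below is a minimal sketch for recomputing it by hand, assuming IPv4 and a payload already extracted from a capture; the helper names are hypothetical and not part of tcpdump or any other tool. Keep in mind that with transmit checksum offloading, outgoing packets captured on the sending host may still carry a checksum the NIC has not filled in yet, so a "bad" checksum seen there is not necessarily bad on the wire.

```python
import ipaddress
import struct

def ones_complement_sum16(data: bytes) -> int:
    """Add 16-bit big-endian words with end-around carry (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def _pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    # Sketch assumption: IPv4 only (source, destination, zero, protocol 17, UDP length)
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 17, udp_len))

def udp_checksum_ipv4(src_ip, dst_ip, src_port, dst_port, payload=b""):
    """Checksum a sender should put in the UDP header."""
    udp_len = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)  # checksum field zeroed
    csum = 0xFFFF - ones_complement_sum16(_pseudo_header(src_ip, dst_ip, udp_len) + header + payload)
    return csum or 0xFFFF  # 0 means "no checksum" in UDP over IPv4, so 0xFFFF is sent instead

def udp_checksum_ok(src_ip, dst_ip, src_port, dst_port, payload, received):
    """True if a received checksum is valid (0 means the sender did not compute one)."""
    if received == 0:
        return True
    udp_len = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, received)
    return ones_complement_sum16(_pseudo_header(src_ip, dst_ip, udp_len) + header + payload) == 0xFFFF
```

Summing everything including a correct checksum always gives 0xFFFF, which is why the receiver-side check is a single comparison.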

score 9 · Accepted Answer · answered Nov 04 '20 at 09:01


It might be stupid, but did you try to create new server keys, client keys, and retry? Wireguard can act exactly like this when the profiles are wrong.
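
Before regenerating anything, the mismatch can be confirmed: each side's `[Peer] PublicKey` must be the public key derived from the other side's `PrivateKey` (the client's `PublicKey` from the server's `PrivateKey`, and vice versa). With wireguard-tools, `wg pubkey` does this, reading a private key on stdin. The Python below is a minimal sketch of the same X25519 derivation following RFC 7748, assuming base64 keys as they appear in the config files; the function names are hypothetical, and the code is not constant-time, so it is only meant for checking configs, not for handling keys in production.

```python
import base64

# Illustrative sketch (assumption): pure-Python X25519 per RFC 7748, NOT constant-time.
P = 2**255 - 19                      # field prime for Curve25519
A24 = 121665                         # (486662 - 2) / 4, as in RFC 7748
BASEPOINT = (9).to_bytes(32, "little")

def _decode_scalar(k: bytes) -> int:
    b = bytearray(k)
    b[0] &= 248                      # clamp: clear the low 3 bits
    b[31] &= 127                     # clear the top bit
    b[31] |= 64                      # set the second-highest bit
    return int.from_bytes(b, "little")

def _decode_u(u: bytes) -> int:
    b = bytearray(u)
    b[31] &= 127                     # the top bit of a u-coordinate is ignored
    return int.from_bytes(b, "little") % P

def x25519(k: bytes, u: bytes) -> bytes:
    """Montgomery ladder from RFC 7748, section 5."""
    scalar, x1 = _decode_scalar(k), _decode_u(u)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")

def wg_pubkey(private_b64: str) -> str:
    """Base64 public key for a base64 private key (what `wg pubkey` prints)."""
    private = base64.b64decode(private_b64, validate=True)
    if len(private) != 32:
        raise ValueError("a WireGuard key must decode to exactly 32 bytes")
    return base64.b64encode(x25519(private, BASEPOINT)).decode("ascii")

def peer_key_matches(private_b64: str, peer_public_b64: str) -> bool:
    """True if the other side's [Peer] PublicKey fits this side's PrivateKey."""
    return wg_pubkey(private_b64) == peer_public_b64.strip()
```

Both `peer_key_matches(server_private, client_peer_public)` and `peer_key_matches(client_private, server_peer_public)` should return `True`; a `False` in either direction would explain a handshake that never completes.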

setenforce 1

As commented above, that was exactly the problem. After recreating the keys it started working. – Lucio Crusca Nov 04 '20 at 13:10
It's somewhat counter-intuitive to me, but, for me at least, it tends to be less error-prone to use a desktop Wireguard GUI to generate the keys than to work it out with the "wireguard-tools" CLI utilities. Leaving this memo here as a tip in case CLI-based configurations are leaving other folks stranded. – starlocke Dec 19 '21 at 22:19
What is the key type? – Ori Wiesel Apr 01 '22 at 09:16

score 5 · Answer 2 · answered Oct 27 '20 at 00:16


OK, you mentioned that the client is on VDSL, so I suspect you have an MTU problem.

The normal MTU of a wired (and these days, wireless) network connection is 1500 bytes, but on *DSL the PPPoE layer takes up 8 bytes, making the usable MTU actually 1492. (It's also possible your network connection has been set to an even lower MTU.)

Wireguard's packet overhead is up to 80 bytes (with an IPv6 outer header; 60 bytes with IPv4), which is why the tunnel MTU is 1420 by default. Try lowering this by the same 8 bytes, to 1412. (Or lower if you already had an MTU below 1492.)
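
The MTU arithmetic can be written out as a small helper. This is only a sketch: the 8-byte PPPoE figure comes from this answer, while the per-header sizes (20-byte IPv4 or 40-byte IPv6 outer header, 8-byte UDP header, 32 bytes of WireGuard data-message header and authentication tag) are the standard minimum values that add up to the 60/80 bytes of overhead. The function name is hypothetical.

```python
PPPOE_OVERHEAD = 8                    # PPPoE header on *DSL links
# Sketch assumption: minimum outer headers (no IP options or IPv6 extension headers)
WG_OVERHEAD = {4: 20 + 8 + 32,        # IPv4 + UDP + WireGuard header/tag = 60
               6: 40 + 8 + 32}        # IPv6 + UDP + WireGuard header/tag = 80

def wg_tunnel_mtu(link_mtu: int = 1500, pppoe: bool = False, outer_ip: int = 6) -> int:
    """Largest inner packet that fits without fragmenting the outer UDP datagram."""
    path_mtu = link_mtu - (PPPOE_OVERHEAD if pppoe else 0)
    return path_mtu - WG_OVERHEAD[outer_ip]

if __name__ == "__main__":
    print(wg_tunnel_mtu())                         # 1420, the usual default
    print(wg_tunnel_mtu(pppoe=True))               # 1412, the value suggested in this answer
    print(wg_tunnel_mtu(pppoe=True, outer_ip=4))   # 1432 if the outer header is always IPv4
```

Sizing for the IPv6 case is the conservative choice, since a tunnel MTU that fits an IPv6 outer header also fits an IPv4 one.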

You also need the client to tell the server to use a smaller TCP segment size (MSS) on tunnelled connections. This can be done with an iptables rule.

In the client-side wg0.conf you will need something like:

[Interface]
MTU = 1412
PostUp = iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
PostDown = iptables -D FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
# ...the rest
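
As a rough check of what the TCPMSS rule does: with `--clamp-mss-to-pmtu`, iptables sets the MSS in TCP SYN packets to the path MTU minus 40 bytes for IPv4 (60 for IPv6), i.e. the minimum IP and TCP header sizes. A small sketch of that arithmetic for the 1412-byte MTU in the config, with a hypothetical helper name:

```python
IP_HEADER = {4: 20, 6: 40}   # minimum IP header sizes (sketch assumption: no options)
TCP_HEADER = 20              # minimum TCP header size (no options)

def clamped_mss(mtu: int, ip_version: int = 4) -> int:
    """MSS that fits in one packet of the given MTU."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER

if __name__ == "__main__":
    print(clamped_mss(1412))      # 1372 for IPv4 inside the tunnel
    print(clamped_mss(1412, 6))   # 1352 for IPv6 inside the tunnel
```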

Michael Hampton

Just a thought: why do other wireguard VPN tunnels work on my client without that MTU workaround? Anyway: no, I've tried your solution, but it still doesn't work. Thanks regardless. – Lucio Crusca Oct 27 '20 at 00:25
OK. I will leave this here for other readers who may have a similar problem. – Michael Hampton Oct 27 '20 at 01:01
I've just edited my question with my latest findings – Lucio Crusca Oct 27 '20 at 01:08
This actually worked for me, thank you @MichaelHampton ! – Tasos Bitsios Apr 11 '22 at 18:43
