We have a small office with ~20 people, each using a MacBook and optionally connecting a mobile phone too. Previously we used ordinary Wi-Fi with a shared key, but recently I reconfigured it to WPA Enterprise, and every user received their own credentials: a login/password pair. Authentication goes through a FreeRADIUS service running on an AWS EC2 box.

The RADIUS server is not configured to use any certificates. Every user has an entry in the /etc/freeradius/users file that looks like this:

john.doe Cleartext-Password := "my_password"

The RADIUS client is configured minimally. Here is our /etc/freeradius/clients.conf:

client RADIUSClient {
  ipaddr = <our office external IP>
  secret = <secret key shared with the Access Point>
  require_message_authenticator = no
}

This setup seems to work fine with all mobile phones and most of the MacBooks. MacBooks first complain about an untrusted self-signed certificate (which is understandable), yet after setting this certificate as trusted, everything works smoothly.

Yet some MacBooks, after connecting successfully, start displaying authentication errors at random intervals (1-30 minutes):

Authentication failed on network “Network SSID”.
The authentication server is unresponsive. Contact your network administrator to check the network infrastructure.

There is a single "Disconnect" button in this dialog. Yet until the user presses this button, the MacBook stays perfectly connected. The window can be moved away from the screen, but it springs up to the center again and again, irritating the users. Clicking "Disconnect" disconnects the laptop from Wi-Fi, and then in a couple of seconds the Mac reconnects to the same network, leaving a successful login record in RADIUS server logs.

While investigating, I saw that when connected to a WPA Enterprise network, the MacBook displays an additional entry named 802.1X in its network settings. While the connection is healthy, it shows "Authenticated via EAP-PEAP (MSCHAPv2)" the whole time (see screenshot). Hitting the "Disconnect" button there immediately disconnects the laptop from Wi-Fi.

On the laptops where the authentication error window pops up, the "Authenticated via..." message disappears after some random period, and a new authentication attempt starts (see screenshot). After a while the message changes to "Authentication server is not responding". I looked at the RADIUS server logs: every time a user connects to Wi-Fi, there is a successful authentication record, yet nothing gets logged during the authentication attempts displayed in the "802.1X" section.

After several cycles between the "Authenticating..." and "Authentication server is not responding" messages, the dialog pops up.
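To line up the popups with what the server actually saw, it can help to pull the successful logins out of the FreeRADIUS log and look at the time between them for each user. Below is a minimal sketch of that idea. It assumes authentication logging is enabled on the server and that "Login OK" lines have the shape shown in the comment; the exact line format can differ between FreeRADIUS versions, so treat the regular expression as a starting point. The function names are my own.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line shape (may differ on your FreeRADIUS version):
#   Thu Dec  8 16:35:00 2016 : Auth: Login OK: [john.doe] (from client RADIUSClient port 0 cli ...)
# Some versions add a request number such as "(12)" before "Login OK" or append
# "/<via Auth-Type = EAP>" after the user name; the pattern tolerates both.
LOGIN_OK = re.compile(
    r"^(?P<ts>\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) : Auth: "
    r"(?:\(\d+\)\s+)?Login OK: \[(?P<user>[^\]/]+)"
)


def login_times(lines):
    """Map each user name to the timestamps of their successful logins."""
    times = defaultdict(list)
    for line in lines:
        match = LOGIN_OK.match(line)
        if match:
            stamp = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
            times[match.group("user")].append(stamp)
    return times


def login_gaps(lines):
    """Map each user name to the seconds between consecutive successful logins."""
    gaps = {}
    for user, stamps in login_times(lines).items():
        stamps.sort()
        gaps[user] = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return gaps
```

Feeding it the lines of the server's log file (for example `login_gaps(open(path))`, with `path` pointing at wherever your installation writes its log) gives a per-user list of gaps. Short, regular gaps for the affected users would suggest the reconnects follow a timer rather than random radio trouble.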

Since this only happens on a couple of laptops, I don't think this is a server issue, but I have no idea how to fix the problem for those who have it. I didn't have it initially, but when I started experimenting with switching networks, deleting and re-creating networks, I managed to reproduce the issue, and now can't get rid of it :)

Can anyone please suggest the right direction of investigation?

UPDATE (03.03.2017). We eventually decided to switch to an enterprise-class access point. We bought and installed a UniFi AP AC Pro, and the issue was gone.

Vlad Nikiforov
  • Your RADIUS server really should be on premise, not in the cloud, if you can avoid it. The briefest interruption in Internet connectivity (which is common enough) can cause this to happen. – Michael Hampton Dec 08 '16 at 16:35
  • I also encounter exactly the same behaviour (Internet connection still working, dialog that keeps popping up, and the same 802.1X status when the error occurs). I have a local RADIUS server, so a flaky connection between the AP and the RADIUS server can't be the cause for me. – bas Dec 09 '16 at 14:39
  • I have a dual-band AP. Originally I used a separate ESSID for each band. I started encountering this problem when I started using the same ESSID for both bands. Do you use multiple APs (or bands) on the same ESSID? – bas Dec 09 '16 at 14:49
  • Actually yes, I forgot to mention that. Initially we had the same SSID for both bands, and more users had this issue. After splitting the bands (I thought the root cause was hopping back and forth between the bands), the nagging popup disappeared for some of them, but a couple of users still experience this problem. – Vlad Nikiforov Dec 09 '16 at 15:12
  • I now also encounter the problem with separate SSIDs. :( (Although less frequently.) – bas Dec 09 '16 at 15:52
  • I noticed that when I encounter the problem, the errors occur exactly 2 minutes after connecting. – bas Dec 10 '16 at 15:07
  • Which AP do you use btw? I have two RT-AC68U – bas Dec 24 '16 at 08:50
  • I tried two different APs, the symptoms seem to be identical on both: Linksys E3200 and ASUS RT-AC68R. – Vlad Nikiforov Jan 12 '17 at 15:23
  • @VladNikiforov I've reported the issue to ASUS so they can help me debug it. If you have time, it would be great if you could do so as well. – bas Jan 21 '17 at 22:55
  • @bas - thanks, did the same, let's see if there will be any outcome. – Vlad Nikiforov Jan 27 '17 at 13:13
  • ASUS gave standard directions to look at the client configuration. We eventually decided to buy a higher-class access point, which solved the issue. Wrote an update to the question. – Vlad Nikiforov Mar 03 '17 at 12:39
  • In the past, I have had issues with OS X having trouble with certain beacon intervals for the SSID. Try setting it to 400 on your AP and see if it resolves your issue. – James Shewey Sep 24 '17 at 01:35
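One way to test the connectivity theory from the comments is to probe the RADIUS server from the office network at regular intervals and note when replies stop arriving. The sketch below builds a RADIUS Status-Server request (RFC 5997) using only the Python standard library and reports whether any matching reply came back. It rests on several assumptions: the server must have Status-Server support enabled (in FreeRADIUS, the `status_server` setting in radiusd.conf), the probing machine must reach the server from an address listed in clients.conf, and the same shared secret must be used. The function names are my own.

```python
import hashlib
import hmac
import os
import socket
import struct

STATUS_SERVER = 12          # RADIUS packet code for Status-Server (RFC 5997)
MESSAGE_AUTHENTICATOR = 80  # attribute type; mandatory in Status-Server requests


def build_status_server(secret, identifier=0, authenticator=None):
    """Build a Status-Server packet carrying only a Message-Authenticator."""
    if authenticator is None:
        authenticator = os.urandom(16)  # random Request Authenticator
    length = 20 + 18  # 20-byte header plus one 18-byte attribute
    header = struct.pack("!BBH", STATUS_SERVER, identifier, length) + authenticator
    attr_head = struct.pack("!BB", MESSAGE_AUTHENTICATOR, 18)
    # The HMAC-MD5 is computed over the whole packet with the attribute value zeroed.
    digest = hmac.new(secret, header + attr_head + bytes(16), hashlib.md5).digest()
    return header + attr_head + digest


def probe(host, secret, port=1812, timeout=3.0):
    """Return True if a reply with a matching identifier arrives before the timeout."""
    packet = build_status_server(secret)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts, DNS failures and unreachable hosts
            return False
    return len(reply) >= 20 and reply[1] == packet[1]
```

Calling `probe("radius.example.com", b"shared-secret")` in a loop (both values are placeholders) and logging every `False` with a timestamp gives something to compare against the times the dialog appeared. Note that `probe` does not verify the reply's authenticator, so it only checks reachability, not that the secret matches.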

3 Answers

This is a well-known bug on the Mac, but it was fixed a few months ago. If you already have the latest update, check the router instead, or buy a cheap second router. Select extender mode when setting it up, then rename the extender so that it acts as a bridge to the current network. That way, when you connect to it, the router keeps forwarding your connection to the enterprise network.

sysadmin1138

Another possibility is that macOS periodically scans for available networks, and each scan briefly disconnects it from your Wi-Fi. There is a setting on the Mac to connect automatically to nearby networks, and it can be turned off. There is also a Wi-Fi setting (I can't recall its name) that keeps the Mac from jumping from AP to AP too frequently. These jumps can interrupt the network connection.

Kevin Buchs

Have you run Wi-Fi diagnostics on one of your affected Macs? It might reveal something outside your network, like a nearby access point that doesn't have its country code properly configured. This happened to us when Five Guys moved in downstairs and set up an improperly configured hotspot. Your switch to a UniFi AP, while a good choice, could still be covering up the root cause.