802.1X Roaming Issue

  • Question
  • Updated 3 years ago
Bear with me; what we're seeing is complicated to explain.

When a client connects to an AP on switch A, which is stacked with the switch that hosts our firewall, DNS, DHCP, and Active Directory servers, everything works as intended.

If the client roams to another AP on switch A, it works fine.

If that client roams to an AP on switch B (or any other switch), the roam appears to succeed: the client reports "connected" and gets an IP address, but it cannot reach anything (it can't even ping the gateway). After about 5-10 minutes of waiting, the client starts working perfectly.

If the client makes its initial connection to an AP on switch B, it works fine. When it roams to an AP on switch A, it works fine. If it then roams off switch A (back to B, or to another switch), it stops working again for another 5-10 minutes.

If the client joins the same SSID as a different user, it works right away (no 5-10 minute wait). If the client joins a different SSID and then rejoins the original one as the same user, it also works right away, even if the different SSID is on the same VLAN and gives the client the same IP.

Any ideas?
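
To put a number on the "5-10 minutes" described above, one option is to time how long the gateway stays unreachable after a roam. The following is a minimal sketch, not a tool from this thread: `ping_once` and `measure_outage` are hypothetical helper names, the `ping` flags assume a Linux-style `ping`, and the gateway address in the usage note is a placeholder.

```python
import subprocess
import time


def ping_once(host):
    """Send one ICMP echo request; return True if a reply came back.

    Assumption: a Linux-style ping (-c = count, -W = reply timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def measure_outage(probe, interval=1.0, give_up_after=900.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it succeeds.

    Returns the seconds elapsed before the first success, or None if
    give_up_after seconds pass without one. clock and sleep are injectable
    so the loop can be exercised without a network.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= give_up_after:
            return None
        sleep(interval)
```

Run on the client right after it roams, for example `measure_outage(lambda: ping_once("192.0.2.1"))` with the placeholder replaced by your gateway address. Comparing the result across "roam within switch A" and "roam from A to B" gives a repeatable measurement instead of an estimate.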

Eric


Posted 3 years ago


Ruwan Indika

Hi Eric,

Please add the client's MAC address to the client monitor and reproduce the issue. The log should show what the client is doing when it roams to the AP on switch B.





Eric

-------
Initial auth
-------
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1241)IEEE802.1X auth is starting (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1242)PMK is got from local cache (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1243)Sending 1/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1244)Received 2/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1245)Sending 3/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1246)Received 4/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1247)PTK is set (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  BASIC   (1248)Authentication is successfully finished (at if=wifi1.2)
04/10/2015 07:58:41 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1249)station sent out DHCP REQUEST message
04/10/2015 07:58:42 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  INFO    (1250)DHCP server sent out DHCP ACKNOWLEDGE message to station
04/10/2015 07:58:42 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  BASIC   (1251)DHCP session completed for station
04/10/2015 07:58:42 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc  BASIC   (1252)IP 10.120.0.2 assigned for station
-------
Moving to new location
-------
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        BASIC   (1764)Rx auth <open> (frame 1, rssi 0dB)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        BASIC   (1765)Tx auth <open> (frame 2, status 0, pwr 11dBm)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        BASIC   (1766)Rx assoc req (rssi 45dB)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        BASIC   (1767)Tx assoc resp <accept> (status 0, pwr 11dBm)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1768)IEEE802.1X auth is starting (at if=wifi1.2)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1769)PMK is got from roaming cache (at if=wifi1.2)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1770)Sending 1/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1771)Received 2/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1772)Sending 3/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1773)Received 4/4 msg of 4-Way Handshake (at if=wifi1.2)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1774)PTK is set (at if=wifi1.2)
04/10/2015 08:00:47 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        BASIC   (1775)Authentication is successfully finished (at if=wifi1.2)
--------
<broadcast packets>
--------
04/10/2015 08:00:51 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc      BASIC   (2237)Sta(at if=wifi1.2) is de-authenticated because of STA roam away
04/10/2015 08:00:51 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc      INFO    (2238)roam away
04/10/2015 08:00:51 AM  9CD91758ACBE  F09CE902C9E9  Rodney_2F_TechOfc      BASIC   (2239)Sta(at if=wifi1.2) is de-authenticated because of notification of driver
--------
<broadcast packets>
--------
04/10/2015 08:00:55 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        INFO    (1816)IPv6 address fe80::41b6:d57c:5232:ab59 is snooped.
--------
<broadcast packets> 
--------
Turning off WiFi on the device
--------
04/10/2015 08:01:21 AM  9CD91758ACBE  4018B18EBBA9  Rodney_2F_BrdRm        BASIC   (1845)Sta(at if=wifi1.2) is de-authenticated because of notification of driver
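
In the capture above, the 802.1X exchange on the second AP completes at 08:00:47 with the PMK taken from the roaming cache, so the wireless side of the roam looks healthy. A rough sketch for pulling roam events out of a longer capture follows. The column layout (timestamp, client MAC, AP MAC, AP name, level, message) is inferred from the lines above and is an assumption, and `parse_line` and `find_roams` are hypothetical helper names.

```python
import re
from datetime import datetime

# Assumed layout: timestamp, client MAC, AP MAC, AP name, level, (seq)message
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<client>[0-9A-Fa-f]{12})\s+"
    r"(?P<ap_mac>[0-9A-Fa-f]{12})\s+"
    r"(?P<ap_name>\S+)\s+"
    r"(?P<level>\w+)\s+"
    r"\((?P<seq>\d+)\)(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None for separators/notes."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%m/%d/%Y %I:%M:%S %p")
    return record


def find_roams(lines):
    """List each successful authentication that landed on a different AP."""
    roams = []
    current_ap = None
    pmk_source = {}  # AP name -> most recent PMK source ("local", "roaming")
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        pmk = re.search(r"PMK is got from (\w+) cache", record["msg"])
        if pmk:
            pmk_source[record["ap_name"]] = pmk.group(1)
        elif "Authentication is successfully finished" in record["msg"]:
            ap = record["ap_name"]
            if current_ap is not None and ap != current_ap:
                roams.append({
                    "time": record["ts"],
                    "from": current_ap,
                    "to": ap,
                    "pmk": pmk_source.get(ap),
                })
            current_ap = ap
    return roams
```

Fed the lines above, this should report a single roam from Rodney_2F_TechOfc to Rodney_2F_BrdRm. A roam that authenticates cleanly yet is followed by minutes of no connectivity suggests the fault sits beyond the AP, on the wired side.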

Eric

Hmm... it looks like this IS affecting other SSIDs, not just the RADIUS-authenticated one. It looks like only VLANs are affected.

Eric

Case closed!

Three days of troubleshooting later, we finally found the culprit: ARP! The switch stack in closet A was not refreshing its ARP table properly, causing it (and the servers attached to it) to think that a wireless client was still connected to the switch A APs. When we manually cleared the ARP cache after a client roamed to a different AP, the client started working immediately. We changed the ARP cache timeout to 30 seconds, which has drastically reduced the time it takes to fully roam, from 5 minutes to a few seconds.
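
To illustrate why a shorter timeout helps, here is a deliberately simplified model of the behaviour described above. It is not how any particular switch implements ARP: `AgingCache` and `blackout_seconds` are hypothetical names, the cache is assumed to keep an entry until it ages out regardless of traffic, and the 300-second figure simply stands in for the roughly five minutes reported.

```python
class AgingCache:
    """Toy cache: trusts where a client was last learned until the entry ages out."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._entries = {}  # client -> (location, learned_at)

    def lookup(self, client, actual_location, now):
        """Return where traffic for client is sent at time now (seconds)."""
        entry = self._entries.get(client)
        if entry is not None and now - entry[1] < self.timeout_s:
            return entry[0]  # entry still trusted, even if it is stale
        # Missing or expired entry: re-learn the client's real location.
        self._entries[client] = (actual_location, now)
        return actual_location

    def clear(self, client):
        """Model a manual cache clear for one client."""
        self._entries.pop(client, None)


def blackout_seconds(timeout_s, roam_at, horizon=3600):
    """Seconds after the roam until traffic reaches the client's new location."""
    cache = AgingCache(timeout_s)
    cache.lookup("client", "switch A", now=0)  # learned on switch A at t=0
    for now in range(roam_at, horizon):
        if cache.lookup("client", "switch B", now) == "switch B":
            return now - roam_at
    return None
```

In this model, a client that roams 10 seconds after being learned is unreachable for 290 seconds with a 300-second timeout and for 20 seconds with a 30-second timeout; calling `clear` right after the roam restores reachability on the next lookup. The outage is bounded by the timeout, which matches the improvement seen here, but a short timeout only masks the stale entry rather than explaining why it was not refreshed.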

Now to figure out why ARP wasn't updating properly!

Mike Kouri, Official Rep

Thanks for coming back and letting us know the root cause!