Clients stay connected to AP but lose LAN/WAN connectivity

  • Question
  • Updated 4 years ago
  • Answered
So we recently did some switch upgrades to Cisco Catalyst 2960s and expanded our Wi-Fi coverage. We've noticed users have no problem getting connected, but over the duration of their connection they have random dropouts of connectivity to the LAN/WAN while maintaining connectivity to the AP. Anyone have similar issues?

phil stokes

  • 10 Posts
  • 1 Reply Like

Posted 5 years ago


Sarah Banks

  • 75 Posts
  • 4 Reply Likes
Phil, might I suggest you work with Support? A connection to the AP with a loss of network connectivity could be any number of things; I'd want to start a ping to something that should be reachable, or a traceroute to a known host, for example, but if you're experiencing outages, support should be able to assist you the quickest.

phil stokes

  • 10 Posts
  • 1 Reply Like
This is happening in all our facilities, I might point out (we did switch upgrades in each facility as well). I've done pings during the outage with no success, troubleshooting-wise.

Sarah Banks

  • 75 Posts
  • 4 Reply Likes
Phil, thanks for the update. I still believe Support can assist you far more expeditiously and easily than the community can. They'll most likely walk through the configuration and assess the network connectivity (pings from where? Pings from a client on the AP wouldn't be expected to work, I suppose, right? :)). I encourage you to give Support a call. I've seen no known issue or previous incident reported that would lead me to believe the scenario above shouldn't work.

Ryan Powell

  • 17 Posts
  • 3 Reply Likes
Hi Phil,

Did you ever get a resolution to this? We're having the exact same issue and as of yet our wireless supplier / Aerohive support have been unable to resolve it.

Regards,
Ryan

phil stokes

  • 10 Posts
  • 1 Reply Like
Ryan, I did, thanks to Aerohive support. What firmware version are you running on your APs? It's a bug in 6.1; I had to roll mine back to 5.1. I'm told the fix will either be a patch or be rolled into the next release; Aerohive isn't sure which.

Ryan Powell

  • 17 Posts
  • 3 Reply Likes
Hi Phil,

We were on 6.1 but have downgraded back to 5.1, which still doesn't seem to help. Ping tests show we miss 2 to 3 pings every minute, and streaming services only last a couple of minutes before dropping out. This is only about a month old and we've never had it working reliably.

Ryan Powell

  • 17 Posts
  • 3 Reply Likes
Can you confirm the full Firmware version you've downgraded to?

phil stokes

  • 10 Posts
  • 1 Reply Like
HiveManager is on 6.1r1; the APs are on 5.1r5.1089.

Ryan Powell

  • 17 Posts
  • 3 Reply Likes
Thanks, we were on 5.1R4, my colleague and I are downgrading another couple of APs to that firmware and will test.

Thanks for the fast replies. Fingers crossed you can fix something that Aerohive support have tried for 4 weeks to resolve.

phil stokes

  • 10 Posts
  • 1 Reply Like
Let me know if that does the trick for you! It seems to be holding up well for me.

Steve Folk

  • 4 Posts
  • 0 Reply Likes
This problem should only be affecting AP120s. I have the same problem as Phil. I have 3000 APs (300 AP120s and 2700 AP121s); after downgrading all my AP120s to 5.1r5.1089, the issue is gone.

phil stokes

  • 10 Posts
  • 1 Reply Like
@steve Support didn't mention that it was only with the 120s, so I was assuming all models, but now that you mention it and I look, my users reporting the issue are all on 120s.

Ryan Powell

  • 17 Posts
  • 3 Reply Likes
Hi guys,

Thanks for the messages. We downgraded all the APs over the weekend and my colleague made some changes to the auto-channel selection settings. This seemed to help, but we're still seeing dropouts in IP connectivity: lost pings, streaming services failing fairly quickly, etc.

Pinging the AP that a device is connected to is fine and it never loses a ping, even when the devices on that AP do. The physical LAN seems fine; wired hosts work flawlessly and we've not had any problems there. It's only wireless clients, of every description: laptops, desktops, iPhones, iPads, etc.

We're running out of ideas now.

Ryan Powell

  • 17 Posts
  • 3 Reply Likes
Additionally, we've had a full wireless survey performed prior to installation of the wireless network and there is very little sign of interference or other causes of signal degradation.

Nick Lowe, Official Rep

  • 2491 Posts
  • 451 Reply Likes
I suggest that you try troubleshooting with:

1) Aerohive's excellent Client Monitor.
2) Wireshark on a client that is able to enter both monitor and promiscuous mode. (Under Windows you would have to use Network Monitor instead as WinPcap does not support a recent enough revision of NDIS to do this.)
3) Wireshark (another instance) in promiscuous mode behind an access point, seeing traffic via appropriately configured port mirroring or a 100Mb/s hub that is temporarily inserted.

This will allow you to drill down and isolate exactly where the fault lies. (You may be surprised by what you find; it could have nothing to do with the access point itself.)
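
To make the comparison between the two capture points concrete, here is a rough sketch of how the results could be lined up once you have them. It assumes you have already pulled the ICMP echo sequence numbers out of each capture by hand or by export; the function name `locate_loss` and the four input sets are hypothetical, not anything Wireshark or HiveOS produces. Bear in mind that a monitor-mode capture can itself miss frames, so treat the verdicts as hints rather than proof.

```python
from collections import Counter


def locate_loss(req_air, req_wire, rep_wire, rep_air):
    """Classify each ping by the last capture point at which it was seen.

    req_air  : sequence numbers of echo requests seen in the over-the-air capture
    req_wire : sequence numbers of echo requests seen on the wired side of the AP
    rep_wire : sequence numbers of echo replies seen on the wired side of the AP
    rep_air  : sequence numbers of echo replies seen in the over-the-air capture
    """
    verdicts = Counter()
    for seq in sorted(req_air):
        if seq not in req_wire:
            # Left the client but never appeared behind the AP.
            verdicts["request dropped between air and wire"] += 1
        elif seq not in rep_wire:
            # Reached the wire, but nothing came back from the LAN/WAN.
            verdicts["no reply from upstream network"] += 1
        elif seq not in rep_air:
            # Reply arrived at the AP's wired port but was not seen on air.
            verdicts["reply dropped between wire and air"] += 1
        else:
            verdicts["ok"] += 1
    return verdicts
```

If most failures land in "no reply from upstream network", the switches or gateway deserve a closer look than the access point does.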

phil stokes

  • 10 Posts
  • 1 Reply Like
Hmm, strange stuff indeed happening there. Sounds rather similar to our issue, but the firmware downgrade solved that for us. Any slight chance it could be a switch config issue? Is your wireless on a separate VLAN?

Nick Lowe, Official Rep

  • 2438 Posts
  • 445 Reply Likes
I am curious what steps were taken to conclude that it is a fault with the access point where the more recent software is in use. Only anecdotal correlation has been mentioned in this thread but nothing that shows actual causality.

Cheers,

Nick

Brian Powers, Champ

  • 384 Posts
  • 88 Reply Likes
I was curious about that too, as I couldn't find anything in the HiveOS release notes mentioning an issue with the code for the AP120s.

Brian Powers, Champ

  • 396 Posts
  • 92 Reply Likes
You said that when you ping the AP from the client, it always works? Never loses a ping? Wired clients work fine.

If you SSH into an AP and ping your gateway, what are the results? Sounds to me like the issue is not client connectivity (based on your prior statement), but once the traffic is getting dumped on the wire, something is happening.

Maybe go so far as to get multiple pings going on a wireless client:
PC -> AP
PC -> Default Gateway
PC -> public host (8.8.8.8, etc.)

Which one loses packets, if any?

Do the same thing from an AP.
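
A minimal sketch of the three-way ping test above, for anyone who wants to leave it running on a wireless client. The addresses in `TARGETS` are placeholders you would replace with your own AP, default gateway, and a public host, and the helper names are made up for illustration. The ping flags assume Linux iputils or Windows ping; macOS/BSD ping interprets `-W` in milliseconds, so adjust accordingly.

```python
import platform
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical addresses: replace with your AP, default gateway, and a public host.
TARGETS = {
    "ap": "192.168.1.2",
    "gateway": "192.168.1.1",
    "public": "8.8.8.8",
}


def ping_once(host, timeout_s=1):
    """Send one echo request via the system ping command; True if it exited 0."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux iputils syntax (-W is a timeout in seconds).
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def loss_percent(results):
    """Percentage of probes in a list of True/False results that got no reply."""
    if not results:
        return 0.0
    return 100.0 * results.count(False) / len(results)


def first_lossy_hop(loss_by_target, order=("ap", "gateway", "public"), threshold=1.0):
    """Return the nearest target whose loss exceeds the threshold, or None."""
    for name in order:
        if loss_by_target.get(name, 0.0) > threshold:
            return name
    return None


def run(rounds=60, interval_s=1.0):
    """Ping every target in parallel once per round; return loss percent per target."""
    history = {name: [] for name in TARGETS}
    with ThreadPoolExecutor(max_workers=len(TARGETS)) as pool:
        for _ in range(rounds):
            futures = {name: pool.submit(ping_once, host) for name, host in TARGETS.items()}
            for name, future in futures.items():
                history[name].append(future.result())
            time.sleep(interval_s)
    return {name: loss_percent(results) for name, results in history.items()}

# Example use on a wireless client:
#   losses = run()
#   print(losses, "first lossy hop:", first_lossy_hop(losses))
```

If the AP target stays clean while the gateway and public host both show loss, that points at the wired side rather than the radio link, which is the distinction the test is meant to draw.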

Other things to note:
Did you create custom radio profiles when the installation was done? I've seen some rare occurrences where some settings in there caused issues with clients (these were typically older, legacy clients, not iOS devices and newer clients).

Maybe try setting things to the most basic configuration possible (even using just the default radio_g and/or radio_a as opposed to radio_ng/na profiles). Set an SSID without encryption. Try to find a scenario where things work. Then make incremental changes to the network to find the root cause.

Is the timing of the drops completely random, or does it seem to occur at specific times or in specific areas?

Nick Lowe, Official Rep

  • 2491 Posts
  • 451 Reply Likes
I think looking for evidence of root cause with appropriate tools would, perhaps, be preferable to changing settings as you suggest. It does depend on having the know-how to do this, however.

Brian Powers, Champ

  • 396 Posts
  • 92 Reply Likes
I agree with you, but I'm unaware of their Wireshark prowess and was trying to give alternatives that may at least give us more information to assist.

phil stokes

  • 10 Posts
  • 1 Reply Like
I came to this conclusion in my environment after monitoring specific clients from both the controller side and the client side. Hearing from Steve about the same thing also confirmed for me that it was firmware related.

Nick Lowe, Official Rep

  • 2438 Posts
  • 445 Reply Likes
Is there anything specific that we can go on?

GPLExpert

  • 2 Posts
  • 0 Reply Likes
Hi All,

same problems here.

I'm going to try to downgrade to 5.1r5.

Nick Lowe, Official Rep

  • 2491 Posts
  • 451 Reply Likes
Can somebody get us packet captures of this (as above) so we have something tangible to analyse to determine root cause?

I am very happy to take a look.

GPLExpert

  • 2 Posts
  • 0 Reply Likes
@Nick Lowe

I will try to get a capture.

It's not possible for me before the end of the week. I will try next Monday.

Peter Powell

  • 2 Posts
  • 2 Reply Likes
Hi

I work with Steve (comment above) and can confirm the following:
- we have about 3000 APs deployed at over 120 sites.
- 4 sites (the first in the project) use AP120
- the rest are AP121
- the building our Data Center is in uses AP120
- users started complaining of the symptoms above (associated but no throughput, connections dropped, etc.)
- we also saw APs becoming unresponsive, crashing/rebooting, showing very high CPU usage, etc.
- downgraded all the AP120 in the building to 5.1r5
- problems manifesting above stopped.
- applied same downgrade to all other AP120, stability restored.
Not saying this is a "fix" or that there may not have been other network issues involved but this got our users happy and certainly appears to point strongly to some kind of hardware/firmware interaction.

Yes we did open a support ticket but didn't get a resolution and it didn't seem to be progressing. We will revisit this with support again but maybe after the next update?

Peter Powell

Linda Robinson

  • 2 Posts
  • 0 Reply Likes
We have the same thing going on with 330 and 350 APs on HiveOS 5.1r5.1089. Anyone else having issues with these models?

Scott M., Sr. Support Engineer

  • 104 Posts
  • 8 Reply Likes
Connectivity issues can have many causes. I agree with those above who recommend that a support case be opened. Specific troubleshooting will need to be conducted as each customer is likely to have different issues and there's a good chance your issue can be resolved via refinement of your configuration.

Andy Cannarella

  • 31 Posts
  • 0 Reply Likes
I was having the same issues, and here is what we did, in this order, over a few months while troubleshooting with support:

Disabled radio rates below 11 Mb/s
Downgraded to 5.1r5 firmware
Removed WIPS from the policy - this seemed to be the kicker.

We didn't need WIPS for any compliance; we were just seeing what it would do.

Linda Robinson

  • 2 Posts
  • 0 Reply Likes
Working with support we are finding poorly coded user applications on devices that are using up all the sessions on an AP. Users all stay connected, but can't get anywhere.

We need a tool to alert us to these problems before they impact everyone. Any ideas? Right now we have to catch it in action, check packets for suspicious activity, work backward to find the device from the IP, and go through each executable to see what's causing the problem. Once we find the culprit and delete it, things start working again. This is on each of 60 APs with hundreds of devices. We've tried limiting the sessions any one device could take up, but that didn't work on UDP broadcasts. Also, the limit has to be manually added to each AP every time there's an update.
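
As a starting point for that kind of alerting, something like the sketch below could flag the heavy hitters, assuming you can get a per-AP session list into a script somehow. The `(client_ip, protocol, destination)` record format, the function name, and the threshold are all made up for illustration; this is not an Aerohive feature or output format.

```python
from collections import Counter


def heavy_session_clients(sessions, max_per_client=200):
    """Return (client_ip, session_count) pairs above the limit, busiest first.

    sessions is an iterable of (client_ip, protocol, destination) tuples,
    a hypothetical format standing in for whatever session export you have.
    """
    counts = Counter(client_ip for client_ip, _protocol, _destination in sessions)
    return [(ip, n) for ip, n in counts.most_common() if n > max_per_client]


# Example with invented data: one client hogging sessions with UDP broadcasts.
sample = [("10.0.0.5", "udp", "255.255.255.255")] * 5 + [("10.0.0.6", "tcp", "192.0.2.10")] * 2
offenders = heavy_session_clients(sample, max_per_client=3)
```

Run on a schedule against each AP's session list, a check like this could point at the device before users start reporting that they are connected but can't get anywhere.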

Nick Lowe, Official Rep

  • 2438 Posts
  • 445 Reply Likes
So I can do some digging, what command would this be? I assume via SSH?

Van Jones

  • 75 Posts
  • 4 Reply Likes
Linda, do you have WIPS enabled? We are troubleshooting a high-CPU/rebooting issue, and support analyzed the tech data and said that WIPS was causing the issue.