AP120 keeps going offline.

  • Question
  • Updated 3 years ago
  • Answered
We have an AP120 that frequently (pretty much daily) loses its connection to HiveManager. When this occurs, we can ping the AP but cannot browse to it (HTTP) or connect to it via SSH. How can I diagnose the cause? Firmware and HiveManager versions are HiveOS 6.1r6.1779. Load is about 14 users, with memory at 56% and CPU averaging 20%.
We have a number of other 120s as well as other Aerohive AP models. One AP170 exhibits similar behavior but goes offline less frequently, around once a week.
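
One way to narrow down a symptom like this is to record, with timestamps, when the management ports stop answering even though ping still works. The following is a minimal sketch in Python, not an Aerohive tool; the function names `probe_tcp` and `check_ap` are hypothetical, and it only assumes that the AP normally accepts TCP connections on ports 22 (SSH) and 80 (HTTP), as described above.

```python
import socket
from datetime import datetime, timezone


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ap(host: str, ports=(22, 80), timeout: float = 3.0) -> str:
    """Probe each port once and return a single timestamped log line."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    states = []
    for port in ports:
        ok = probe_tcp(host, port, timeout)
        states.append(f"tcp/{port}={'open' if ok else 'unreachable'}")
    return f"{stamp} {host} " + ", ".join(states)
```

Running `print(check_ap("192.0.2.50"))` from a scheduled job once a minute (192.0.2.50 is a placeholder address) builds a timeline that can be compared against switch logs and HiveManager events.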

Paul Daniel


Posted 4 years ago


Kyle Olson

Hi Paul,
It's possible that the CPU usage is spiking during these periods. Do these events generally clear themselves, or do you need to reboot the AP in order to restore functionality?

The best way to address this issue would be to open a case with us here at ATAC either via phone or our support portal, linked below.  If you do open a ticket with us, make sure to attach tech data from the relevant AP(s); you can get the data from the Monitor > Aerohive APs page, by selecting the AP and clicking Utilities > Get Tech Data.

Thank you,
Kyle O

Aerohive Technical Assistance Center
https://support.aerohive.com
(866) 365-9918 (US toll-free)
+1-408-510-6100 (International)

Travis Kaufman, Champ

Curious - What is the Bonjour Configuration for this AP?  

Sjoerd de Jong, Employee

Hello Paul,

Can you tell us if this occurs on all APs at the same time? Or just the ones on, for example, a specific switch, building or floor?

Best regards,
Sjoerd

Paul Daniel

Thank you for the responses above. The 2 APs are on different edge switches and in different buildings. One has a PoE injector, and the other is powered off an Allied Telesis 8000S PoE switch. Both power configurations are in use on other APs of the same models without seeing the issue. The times at which the issue occurs do not match across the two devices. Bonjour is using an auto-generated name and a priority of 10.

Eastman Rivai, Official Rep

Paul,

Can you disable application, WIPS and location services if they are enabled, and then monitor the APs?

Thank you,

Nick Shipway

Hi Paul,

I have a customer with the exact same problem on 110s, 120s and 170s across three sites.

HMOL 6.2r1a
HiveOS 6.2r1.1924

Apps, WIPS and Loc services are all disabled. Bonjour is not configured.  It happens randomly twice a week to different AP devices that can only be restored to service by bouncing the switch ports. There is no way to get useful tech data from the APs.  I may resort to installing a syslog server on one of the sites...

What type of data switches and internet connection do you employ? Are you using HMOL or on prem?

Regards
Nick
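
For the syslog idea mentioned above, a throwaway receiver can be enough to capture the last messages an AP sends before it stops responding. This is a minimal sketch, assuming the APs are configured elsewhere (for example in HiveManager) to send syslog over UDP to this host; `LISTEN_PORT`, `LOG_PATH`, `format_entry` and `run_server` are hypothetical names, and port 5514 is chosen only to avoid needing root for the standard port 514.

```python
import socketserver
from datetime import datetime, timezone

LISTEN_HOST = "0.0.0.0"       # assumption: listen on all interfaces
LISTEN_PORT = 5514            # assumption: unprivileged stand-in for 514
LOG_PATH = "ap-syslog.log"    # assumption: log file in the working directory


def format_entry(source_ip: str, payload: bytes) -> str:
    """Prefix a raw syslog datagram with a receive time and the sender IP."""
    text = payload.decode("utf-8", errors="replace").strip()
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {source_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_entry(self.client_address[0], data)
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def run_server() -> None:
    """Receive datagrams until interrupted (Ctrl+C)."""
    with socketserver.UDPServer((LISTEN_HOST, LISTEN_PORT), SyslogHandler) as srv:
        srv.serve_forever()
```

Calling `run_server()` starts the listener. Because each line carries the receive time, the final entries from an AP before it goes quiet can be lined up against the time the outage was noticed.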

Eastman Rivai, Official Rep

Paul,

Can you check the size of the management subnet?
Is there any captive web portal configured on the AP?

Thank you,

Eastman

Paul Daniel

Thanks Eastman.  The subnet they are managed on is not a separate management subnet but rather the default subnet that all desktops are on.  We are using a x.x.0.0/16 subnet with addresses in use from x.x.0.0 to x.x.6.254 

Nick Shipway

Paul, 

Do you have a CWP within your configuration?

Apparently the session timeout trigger can cause this issue...

If so, try setting it to the maximum of 120960 minutes (84 days, or about 12 weeks) so the trigger effectively never occurs. I have set this on my customer's site and will report back with the results.

N
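
As a quick sanity check on the timeout value, 120960 minutes works out to exactly 84 days, which is 12 weeks. The conversion can be reproduced with a short Python sketch; `describe_timeout` is a hypothetical helper name and not part of any Aerohive tooling.

```python
from datetime import timedelta


def describe_timeout(minutes: int) -> dict:
    """Convert a timeout given in minutes into hours, days and weeks."""
    span = timedelta(minutes=minutes)
    hours = span.total_seconds() / 3600
    return {
        "hours": hours,
        "days": hours / 24,
        "weeks": hours / (24 * 7),
    }


print(describe_timeout(120960))
# {'hours': 2016.0, 'days': 84.0, 'weeks': 12.0}
```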

Paul Daniel

CWP is configured on all APs. Thank you for the suggestions. I'll look at the timeout. We have a number of other APs of the same models with the same firmware that are not exhibiting this behavior. In case the issue relates to power quality, we have also tried swapping the PoE power pack on one AP and moving the other AP from the PoE switch to a PoE injector.

Eastman Rivai, Official Rep

Paul,

The issue does not always happen on every AP. The registration period fault increases CPU usage, which is added to the existing load, so if the load is not that high you may not see the issue. Heavy broadcast traffic coming from the wired side also increases CPU load. I would suggest creating a smaller, separate subnet for the APs, ideally one subnet per floor. This will also reduce the AMRP traffic between APs.

The registration period issue will be fixed in the next release 6.4r1, which should be out soon.
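
To put the subnet suggestion in numbers, the sketch below compares a single /16 broadcast domain with per-floor /24 subnets using Python's standard `ipaddress` module. The 10.20.0.0/16 network is a hypothetical stand-in for the x.x.0.0/16 range mentioned earlier in the thread, and /24 is only one possible per-floor size.

```python
import ipaddress

# Hypothetical stand-in for the x.x.0.0/16 network described in the thread.
current = ipaddress.ip_network("10.20.0.0/16")
print(current.num_addresses)              # 65536 addresses in one broadcast domain

# Splitting the /16 into /24s, e.g. one per floor.
per_floor = list(current.subnets(new_prefix=24))
print(len(per_floor))                     # 256 possible /24 subnets
print(per_floor[0].num_addresses - 2)     # 254 usable hosts in each

# Addresses in use run from x.x.0.0 to x.x.6.254, so they fall in the first /24s.
last_used = ipaddress.ip_address("10.20.6.254")
occupied = [net for net in per_floor if net.network_address <= last_used]
print(len(occupied))                      # 7
```

With this layout, broadcast traffic from the desktops would reach an AP only if it shares the AP's subnet, rather than every host in the /16.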

Nick Lowe, Official Rep

As the AP110, AP120 and AP170 are not getting 6.4r1, will this fix be backported to 6.2r1b, the release due out for these models to disable the SSLv3 protocol?

Nick Shipway

I too would like to know the answer to Nick's question. I can confirm 100% that this was the cause of my customer's issue. Will this and any other fixes be backported for these APs in the future?

Nick Lowe, Official Rep

The same applies, in principle, for the AP320/AP340 where support stops at 6.1 and 6.1r6b is due there to disable SSLv3.

The document that discloses support being discontinued for new major releases and a forthcoming change to disable SSLv3 is here: http://www.aerohive.com/support/security-center/security-bulletins/psa-cve-2014-3566-poodle

The EOL policy is set out here: http://www.aerohive.com/support/endoflifeproducts.html

It does say in the PDFs there that "Aerohive will actively maintain a software release that supports this product till End of Life".

Shane Walters

FYI - I had this same issue. One AP out of 33 APs was doing this. We replaced everything including the AP and all the way down to the PoE and patch cables involved. Ultimately it SEEMS to be an actual problem with the CAT5 cable running through the wall. I moved it to another area of the room and it has been up for over 13 days now (since the change) which is a record. You may want to try moving it to another location even just as a test to eliminate the CAT5 cabling.

Patience

We are having the same issue on AP120 and AP170 access points. We have had a case open for months, and ATAC has suggested trying different CWP registration times. If we go with the maximum time, the disconnection does not occur, but if we set a time of less than 420 minutes (7 hours) or so, we see lots of zombie APs. We use CWP for a temporary guest network, and a registration period of more than 7 hours is really not acceptable under our use policy. Hopefully the upcoming firmware, 6.2r1c, will fix this and many other issues on AP120s.