Access points (AP121) are losing CAPWAP connection randomly!

  • Question
  • Updated 3 years ago
  • Answered
Hi!

I am writing this because I haven't heard from support for almost 10 days and I need some feedback ASAP! Our customer has a network of 180 APs deployed across more than 80 locations in Central Europe.

After the Aerohive maintenance of HMOL on the 26th of December, APs started to lose their CAPWAP connection randomly at ALL locations. There are approximately 70 notifications per day saying that the CAPWAP connection has been lost. When this happens, the clients are disconnected from the AP and the AP needs to be rebooted to start working again... and this is very frustrating for the end customer.

There have been no changes to the wired network at any of the sites, and everything was working fine before this maintenance.

The customer is using HMOL version 6.2r1a.

Here is the show capwap client output from a random AP (AP1) that was disconnected:

PLKRA_GK_02#show capwap client
CAPWAP client:   Enabled
CAPWAP transport mode:  UDP
RUN state: Connected securely to the CAPWAP server
CAPWAP client IP:        10.0.35.150
CAPWAP server IP:        54.76.233.68
HiveManager Primary Name:hm-emea-045.aerohive.com
HiveManager Backup Name:
CAPWAP Default Server Name: redirector.aerohive.com
Virtual HiveManager Name: "The VHM name was correct xxxxxxxx"
Server destination Port: 12222
CAPWAP send event:       Enabled
CAPWAP DTLS state:       Enabled
CAPWAP DTLS negotiation: Disabled
     DTLS next connect status:   Enable
     DTLS always accept bootstrap passphrase: Enabled
     DTLS session status: Connected
     DTLS key type: passphrase
     DTLS session cut interval:     5 seconds
     DTLS handshake wait interval: 60 seconds
     DTLS Max retry count:          3
     DTLS authorize failed:         0
     DTLS reconnect count:          0
Discovery interval:      5 seconds
Heartbeat interval:     30 seconds
Max discovery interval: 10 seconds
Neighbor dead interval:105 seconds
Silent interval:        15 seconds
Wait join interval:     60 seconds
Discovery count:         0
Max discovery count:     3
Retransmit count:        0
Max retransmit count:    2
Primary server tries:    0
Backup server tries:     0
Keepalives lost/sent:    0/455
Event packet drop due to buffer shortage: 0
Event packet drop due to loss connection: 9

Here is the show capwap client output from another random AP (AP2) that was disconnected:

PLWAW_ZT_01#show capwap client
CAPWAP client:   Enabled
CAPWAP transport mode:  UDP
RUN state: Connected securely to the CAPWAP server
CAPWAP client IP:        10.0.3.147
CAPWAP server IP:        54.76.233.68
HiveManager Primary Name:hm-emea-045.aerohive.com
HiveManager Backup Name:
CAPWAP Default Server Name: redirector.aerohive.com
Virtual HiveManager Name: xxxxx
Server destination Port: 12222
CAPWAP send event:       Enabled
CAPWAP DTLS state:       Enabled
CAPWAP DTLS negotiation: Disabled
     DTLS next connect status:   Enable
     DTLS always accept bootstrap passphrase: Enabled
     DTLS session status: Connected
     DTLS key type: passphrase
     DTLS session cut interval:     5 seconds
     DTLS handshake wait interval: 60 seconds
     DTLS Max retry count:          3
     DTLS authorize failed:         0
     DTLS reconnect count:          0
Discovery interval:      5 seconds
Heartbeat interval:     30 seconds
Max discovery interval: 10 seconds
Neighbor dead interval:105 seconds
Silent interval:        15 seconds
Wait join interval:     60 seconds
Discovery count:         0
Max discovery count:     3
Retransmit count:        0
Max retransmit count:    2
Primary server tries:    0
Backup server tries:     0
Keepalives lost/sent:    0/1147
Event packet drop due to buffer shortage: 0
Event packet drop due to loss connection: 9
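For comparing outputs like the two above across many APs, a minimal Python sketch follows. It parses the `Key: value` lines and pulls out the counters that look most relevant here. The function names are hypothetical, and the session-age estimate assumes one keepalive per heartbeat interval and a counter that resets with each new CAPWAP session, which nothing in this thread confirms.

```python
def parse_capwap_client(text):
    """Parse 'show capwap client' output into a dict of stripped key/value strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip blank lines and the CLI prompt line (no ':' or a '#' in the key part).
        if not sep or "#" in key:
            continue
        fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Extract a few counters; the session estimate is an assumption, not a documented value."""
    lost, sent = (int(n) for n in fields["Keepalives lost/sent"].split("/"))
    heartbeat = int(fields["Heartbeat interval"].split()[0])
    return {
        "state": fields["RUN state"],
        "keepalives_lost": lost,
        "keepalives_sent": sent,
        "dtls_reconnects": int(fields["DTLS reconnect count"]),
        "events_dropped_on_disconnect": int(
            fields["Event packet drop due to loss connection"]
        ),
        # ASSUMPTION: one keepalive per heartbeat, counter reset on each new session.
        "approx_session_seconds": sent * heartbeat,
    }


# Excerpt of the AP2 output shown above.
SAMPLE = """PLWAW_ZT_01#show capwap client
RUN state: Connected securely to the CAPWAP server
     DTLS reconnect count:          0
Heartbeat interval:     30 seconds
Keepalives lost/sent:    0/1147
Event packet drop due to loss connection: 9
"""

summary = summarize(parse_capwap_client(SAMPLE))
for name, value in summary.items():
    print(f"{name}: {value}")
```

Run against both outputs, this makes it easy to see that each AP reports zero lost keepalives and zero DTLS reconnects, yet nine event packets dropped because of a lost connection.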

The IP addresses are OK, DHCP is working well, there are no FW rules or SonicWalls in the way, and all the ports are open as they should be.

I don't know why, but there has been complete radio silence from support on this ticket, so if anyone can give some advice on this, it would be much appreciated!

Thanks!

BR,

Aleš

Aleš Gošek


Posted 4 years ago


Nick Lowe, Official Rep

When APs lose connection with HiveManager, this, by design, does not affect client connectivity. Are you really sure about your observations? Honestly, I am a little suspicious/dubious! :P

As far as I know, the maintenance that was originally planned for the 26th to move from OpSource to AWS didn't take place; it is instead taking place later today (January 9, 2015, from 7:00 PM PST onwards).

BJ, Champ

What has Client Monitor revealed? As Nick pointed out, CAPWAP should have no effect on client production environments.

Best,
BJ 

Aleš Gošek

Hi!

Yes, losing the CAPWAP connection to HiveManager doesn't affect client connectivity... I am aware of that. However, the APs are going offline randomly regardless of location, and clients are being affected as well (we have already ruled out ALL the possible causes in the local networks). Recently I found out that the customer upgraded HiveManager and HiveOS from version 6.1r6 to 6.2r1 on the same day that Aerohive announced the maintenance of HMOL...

Therefore, I will suggest a downgrade of the APs until we resolve the issue. We will then upgrade the APs one location at a time... that will be simpler to troubleshoot than 80 locations at once.

What bugs me the most is that in 10+ days, nobody from Aerohive support or the CEUR team could provide any simple feedback about this issue! We do 80% of the troubleshooting by ourselves, but here we asked just for a simple opinion and I guess we are asking too much...

Best regards,

Aleš

Nick Lowe, Official Rep

Pragmatically then, you may wish to avoid upgrading HiveOS to 6.2r1 a second time after downgrading to 6.1r6, as you will be able to deploy HiveOS 6.4r1 to these APs soon. It is going to be available with HMOL 6.4r1 by the end of the month. That said, I am not aware of any specific stability issues that HiveOS 6.2r1 introduced over 6.1r6.

Any specific data/evidence that you have concerning the APs would allow us to delve into, and speculate about, the root cause. At the moment, we just have anecdotal and empirical observation which, honestly, does not help us troubleshoot the issue.
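As one rough way to turn the disconnect notifications into that kind of evidence, the Python sketch below groups disconnect events into fixed time windows and flags windows in which APs at more than one site dropped together. The event format, the sample timestamps, and the idea that the site is the AP name minus its trailing number are all assumptions made for illustration; they are not taken from HiveManager.

```python
from collections import defaultdict
from datetime import datetime


def site_of(ap_name):
    # ASSUMPTION: names look like "PLKRA_GK_02" and the part before the
    # last underscore identifies the site.
    return ap_name.rsplit("_", 1)[0]


def bucket_disconnects(events, window_minutes=5):
    """Map each window start time to the set of APs that disconnected in it."""
    buckets = defaultdict(set)
    for stamp, ap_name in events:
        t = datetime.fromisoformat(stamp)
        # Round the timestamp down to the start of its window.
        minute = t.minute - t.minute % window_minutes
        start = t.replace(minute=minute, second=0, microsecond=0)
        buckets[start].add(ap_name)
    return dict(buckets)


def multi_site_windows(buckets, min_sites=2):
    """Return window starts where APs from at least min_sites sites dropped."""
    return sorted(
        start
        for start, aps in buckets.items()
        if len({site_of(ap) for ap in aps}) >= min_sites
    )


# Invented sample events (hypothetical timestamps), using AP names from this thread.
events = [
    ("2015-01-05T10:01:12", "PLKRA_GK_02"),
    ("2015-01-05T10:03:40", "PLWAW_ZT_01"),
    ("2015-01-05T14:22:05", "PLKRA_GK_02"),
]

buckets = bucket_disconnects(events)
shared = multi_site_windows(buckets)
for start in shared:
    print(start.isoformat(), sorted(buckets[start]))
```

Windows where several unrelated sites drop at once would point towards something the sites share, while isolated drops would point back at the local network.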

Aleš Gošek

Hi!

Yes, I am now waiting for approval from the customer to upgrade HiveManager to version 6.4r1, and if it is confirmed we will do the upgrade. Meanwhile, the system has returned to its normal state after a while... rebooting all the APs helped :) Now, for one complete day, none of the APs has gone down. We will monitor it for a while and see how it goes...

Thanks for all your help!

BR,

Aleš

Aleš Gošek

Hi!

So far everything has been good since upgrading to the newest version of HM and HiveOS :) The customer is happy and so am I. In the end, this is all that matters :) Nick & BJ... thank you!

Br,

Aleš

Mark Lanham

Hi,

I have experienced this myself. However, I believe I was able to resolve the issue, as it is better now.

Increase the heartbeat to 90 seconds; this is not relevant to the connection itself, but it should help with clients getting disconnected. Clients are getting disconnected from the internet, not from the AP itself; after getting these errors I needed to verify this personally. The root cause of this issue appears to be related to the network infrastructure. I have found that the more APs you have, the more traffic they generate, and the switches and firewalls were not up to the task. For us, the core switches needed to be 10G and the connection to the internet needed to be redundant.

Happy to report that the issue appears to be resolved with the 10G core switches and redundant LAN connections to the firewall.

Thank you,

Mark