Sudden CAPWAP errors

  • Question
  • Updated 2 years ago
We have been running Aerohive for almost 4 years without issue, and as of 4/26 we are getting hit with CAPWAP errors like crazy.  APs will randomly report "The CAPWAP connection with HiveManager was lost." and stop working.  We have made no changes or upgrades.  Where do I even begin to fix this?  Is Aerohive having CAPWAP issues today?  Is there a way I can test the CAPWAP connection from behind my firewall?

We are on HiveManager 6.6r2a

Thanks!!

Joel T

  • 7 Posts
  • 0 Reply Likes
  • frustrated

Posted 2 years ago


Gary Smith, Official Rep

  • 299 Posts
  • 61 Reply Likes
Hi Joel,

Can you tell us what changed on your side around the time you started to see these messages?

Did you upgrade the HM version? (Upgrading will essentially move you from one server instance to another.)
Did you upgrade the APs?
What does the AP CPU look like when these issues are being reported?

Q - Is there a way I can test the CAPWAP connection from behind my firewall?
A - You can run a CAPWAP ping from the AP CLI (for example, "capwap ping hm-useast111.aerohive.com"). This will give you a round-trip time and could show whether there is significant delay.

Kind Regards,
Gary Smith

Joel T

  • 7 Posts
  • 0 Reply Likes
Literally nothing changed ... That is the truly bizarre part.  I work at a school, so we have the luxury of holding off on major changes until the summer months.  After the error started happening, I upgraded HiveManager to 6.6r3a and then upgraded all APs as they came online.  I ran "show capwap client" and then a CAPWAP ping and got this result:

CAPWAP ping parameters:
    Destination server: hm-useast-326.aerohive.com (52.201.171.90)
    Destination port: 12222
    Count: 5
    Size: 56(82) bytes
    Timeout: 5 seconds
--------------------------------------------------
CAPWAP ping result:
    82 bytes from 52.201.171.90 udp port 12222: seq=1 time=44.553 ms
    82 bytes from 52.201.171.90 udp port 12222: seq=2 time=44.699 ms
    82 bytes from 52.201.171.90 udp port 12222: seq=3 time=53.314 ms
    82 bytes from 52.201.171.90 udp port 12222: seq=4 time=46.42 ms
    82 bytes from 52.201.171.90 udp port 12222: seq=5 time=73.712 ms
    ------- hm-useast-326.aerohive.com CAPWAP ping statistics -------
    5 packets transmitted, 5 received, 0.00% packet loss, time 5280.204ms
    rtt min/avg/max = 44.553/52.464/73.712 ms
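
For anyone who wants to compare results across several APs or over time, output like the above can be summarized with a short script. This is only a minimal sketch in Python: it assumes the CLI output has been saved as text and that the reply and summary lines look like the ones pasted here. The function name `summarize_capwap_ping` is made up for illustration and is not part of any Aerohive tool.

```python
import re
from statistics import mean

# Matches reply lines such as:
#   82 bytes from 52.201.171.90 udp port 12222: seq=1 time=44.553 ms
REPLY_RE = re.compile(r"seq=(\d+)\s+time=([\d.]+)\s*ms")
# Matches the summary line: "5 packets transmitted, 5 received, ..."
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def summarize_capwap_ping(text):
    """Return sent/received counts, loss percentage and RTT stats (ms)."""
    rtts = [float(m.group(2)) for m in REPLY_RE.finditer(text)]
    summary = SUMMARY_RE.search(text)
    # Fall back to counting reply lines if the summary line is missing.
    sent = int(summary.group(1)) if summary else len(rtts)
    received = int(summary.group(2)) if summary else len(rtts)
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return {
        "sent": sent,
        "received": received,
        "loss_pct": loss_pct,
        "rtt_min": min(rtts) if rtts else None,
        "rtt_avg": mean(rtts) if rtts else None,
        "rtt_max": max(rtts) if rtts else None,
    }


# The reply and summary lines from the output pasted in this post.
SAMPLE = """\
82 bytes from 52.201.171.90 udp port 12222: seq=1 time=44.553 ms
82 bytes from 52.201.171.90 udp port 12222: seq=2 time=44.699 ms
82 bytes from 52.201.171.90 udp port 12222: seq=3 time=53.314 ms
82 bytes from 52.201.171.90 udp port 12222: seq=4 time=46.42 ms
82 bytes from 52.201.171.90 udp port 12222: seq=5 time=73.712 ms
5 packets transmitted, 5 received, 0.00% packet loss, time 5280.204ms
"""

if __name__ == "__main__":
    for key, value in summarize_capwap_ping(SAMPLE).items():
        print(key, value)
```

The average is recomputed from the per-reply times, so it may differ slightly from the AP's own "rtt min/avg/max" line. Zero loss and a steady round-trip time, as here, would point away from the WAN path and toward something happening on the AP itself.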


All APs are reporting as connected and fine.  Is it possible that something running on a client computer is killing connectivity?  We are behind a firewall and have virus scanning, malware detection, etc.  The problem just seems to ramp up as our number of users ramps up.

Thanks for the reply!

Gary Smith, Official Rep

  • 299 Posts
  • 61 Reply Likes
Hi Joel,

I would look at what the AP is doing at the time of the CAPWAP issues. Maybe the CPU is high? This might explain why the issue occurs when clients are connected: a particular process or traffic type could be causing high CPU.

To see whether you have had high CPU historically, you can issue "show log flash" and look for "system busy" messages.
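
If the log is long, the search can be done offline after saving the "show log flash" output to a text file. The following is a rough Python sketch only. The sample lines are hypothetical and are NOT real HiveOS output, and the grouping step assumes each line starts with a 10-character date, which may not match your logs. The function names are made up for illustration.

```python
from collections import Counter


def find_system_busy(lines, needle="system busy"):
    """Return the lines that contain the needle, ignoring case."""
    needle = needle.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


def count_by_prefix(lines, width=10):
    """Group lines by their first `width` characters.

    Assumption: each line starts with a date of that width (YYYY-MM-DD).
    Adjust or drop this step if your log uses a different layout.
    """
    return Counter(line[:width] for line in lines)


# Hypothetical sample lines for illustration, NOT real HiveOS output.
SAMPLE_LOG = [
    "2020-01-01 09:14:02 warn  system busy (hypothetical line)",
    "2020-01-01 09:15:40 info  keepalive sent (hypothetical line)",
    "2020-01-02 10:02:11 warn  System Busy (hypothetical line)",
]

if __name__ == "__main__":
    hits = find_system_busy(SAMPLE_LOG)
    print(len(hits), "matching lines")
    for day, count in sorted(count_by_prefix(hits).items()):
        print(day, count)
```

To run it against a real file, pass an open file object instead of the sample list, for example `find_system_busy(open("log_flash.txt"))`, where the file name is whatever you saved the output as. If the hits cluster around the times the CAPWAP errors were reported, that supports the high-CPU theory.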

If you see high CPU in real-time, look at "show system processes state" and "show cpu detail" to see what might be causing high CPU.

Kind Regards,
Gary Smith

Joel T

  • 7 Posts
  • 0 Reply Likes
This is great info!!  Thank you so much and I will check as the day progresses.

Joel T

  • 7 Posts
  • 0 Reply Likes
On one of them that just dropped off, this is what I get when trying to log in to the CLI:

"ah_event_start_subthread failed, rc = -1, flag = 4"

Gary Smith, Official Rep

  • 299 Posts
  • 61 Reply Likes
Hi Joel,

I suspect that the issue you see with CAPWAP connectivity is a result of high CPU, and I would suggest you first take a look at this thread, where I commented on a similar issue:
https://community.aerohive.com/aerohive/topics/high-cpu-utilization-after-upgrading-to-hiveos-6-5r3a...

I would then ask that you look at a wired packet capture to see traffic types hitting your AP's ethernet interface.

Kind Regards,
Gary Smith

C4Church IT Support

  • 7 Posts
  • 0 Reply Likes
I periodically get these CAPWAP errors too (10 APs). Sometimes it is one or two APs, sometimes it is all of them. Is there a way to troubleshoot this without using the CLI?

Joel T

  • 7 Posts
  • 0 Reply Likes
It is absolutely a CPU overload issue.  I am unsure why it would suddenly explode like this without any changes on our side.  I am looking to roll back to a previous OS version on a couple of the APs but only see the current OS version in HiveManager.  A different post mentioned being able to do it through HMOL.  Is this a CLI-only process now?

Gary Smith, Official Rep

  • 299 Posts
  • 61 Reply Likes
Hi Joel,

There may have been no obvious network changes on your side; however, the high CPU will have a root cause, and I would suspect network traffic. Are you able to take a wired capture at the AP to see what kinds of traffic, and in what volumes, are hitting the AP?
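
Once a capture exists, one quick way to get the "what kinds and what volumes" view is to export the packet list to CSV and total it per protocol. The Python sketch below assumes a Wireshark-style CSV export with "Protocol" and "Length" columns. The sample rows are hypothetical and only show the shape of the input, and `traffic_by_protocol` is a made-up helper name.

```python
import csv
import io
from collections import defaultdict


def traffic_by_protocol(csv_file):
    """Return {protocol: (packets, bytes)} from a packet-list CSV export."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(csv_file):
        entry = totals[row["Protocol"]]
        entry[0] += 1                   # packet count
        entry[1] += int(row["Length"])  # frame length in bytes
    return {proto: tuple(counts) for proto, counts in totals.items()}


# Hypothetical rows for illustration; a real export will be much larger.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000","10.0.0.5","224.0.0.251","MDNS","120","hypothetical"
"2","0.010","10.0.0.6","224.0.0.251","MDNS","130","hypothetical"
"3","0.020","10.0.0.7","255.255.255.255","ARP","60","hypothetical"
'''

if __name__ == "__main__":
    stats = traffic_by_protocol(io.StringIO(SAMPLE_CSV))
    # Print the heaviest protocols first.
    for proto, (packets, size) in sorted(
        stats.items(), key=lambda item: item[1][1], reverse=True
    ):
        print(proto, packets, "packets", size, "bytes")
```

For a real export, open the file with `open("capture.csv", newline="")` (the file name is an example) and pass it in place of the `io.StringIO` object. A single protocol dominating the packet count, broadcast or multicast traffic for instance, would be the first thing to look at.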

I would suggest that you open a support ticket and work with Aerohive support on this. I would ask that the findings be posted here as they may help others in the future.

Kind Regards,
Gary Smith

Joel T

  • 7 Posts
  • 0 Reply Likes
I am doing a capture at one of the APs now.  I have submitted a support ticket and will update this post as we go along.  Thanks Gary!

Gary Smith, Official Rep

  • 299 Posts
  • 61 Reply Likes
Hi Joel,

Did this issue get resolved as per the suggestions in the support ticket?

Kind Regards,
Gary Smith

Joel T

  • 7 Posts
  • 0 Reply Likes
We were never able to get it resolved. We have since moved on.

Lukas Lajos

  • 13 Posts
  • 0 Reply Likes
Hi guys,

Do you have any update regarding the issue?
One of our clients is experiencing a similar problem, but their APs are only losing the CAPWAP connection over UDP. I tried resetting the CAPWAP connection to get an AP back online using UDP 12222. After not even 5 minutes the connection dropped and the AP reconnected by itself, but using TCP 80 instead. The connection is stable on TCP 80, but the customer has over 700 APs and TCP traffic is not ideal.
They are using Palo Alto firewalls and were originally using the CAPWAP service, but they have since set static rules to allow CAPWAP, with no change at all. I asked them about traffic shaping for UDP traffic (as this was the issue for another customer), but they don't have any.

Any ideas?

Thanks

Gary Smith, Official Rep

  • 299 Posts
  • 61 Reply Likes
Hi Lukas,

Did you make any more progress on this? It might be worth looking at CAPWAP debugs if the issue persists. To start:

_debug capwap info
_debug capwap basic

Kind Regards,
Gary Smith

Lukas Lajos

  • 13 Posts
  • 0 Reply Likes
Hi Gary,

The customer is trying to collect some packet capture data from the APs, but I will also suggest running the debug commands on a few APs.

There is a ticket open (00176480), but the engineer suggested that there is no bug in the firmware, so we are waiting for packet captures.

Thanks