Anyone facing stability issues with AP250s?

  • Question
  • Updated 1 year ago
We recently installed 13 AP250s in our office, and 2 of them had CAPWAP issues. Initially, a manual reboot of the switch port fixed the problem, but eventually the APs became inaccessible, even from the CLI. The logs showed that the APs froze during the reboot procedure, and we received new APs as replacements. Today another AP lost its CAPWAP connection and is not coming back online. This happened at 3:48 AM, when there is no traffic at all. A separate issue is that whenever I update and upload a config to a test AP, it loses its CAPWAP connection and does not come back. From the CLI, I noticed that if I manually configure the management IP and VLAN again, it comes back online. Has anyone else seen the same stability issues?

Om Prakash Ravipati

  • 22 Posts
  • 2 Reply Likes

Posted 1 year ago


Scott Farrand

  • 7 Posts
  • 0 Reply Likes
What software version do you have installed?

I have around 290 AP250s deployed and haven't had too many issues. I initially had CAPWAP stability issues, but that was actually a remote issue. Periodically I see speed degradation, but mostly things just seem to work.

(Note - I have upwards of 4,000 unique devices on my network daily)

Om Prakash Ravipati

All my access points are running version 7.1r1.

Om Prakash Ravipati

Could you please share, if possible, what the remote issue was that caused the CAPWAP instability?


Thanks.

Scott Farrand

I experienced two issues that were greatly affecting CAPWAP. Address translation was part of it, and remote CAPWAP nodes that would stop talking was the other.

The NAT issue was partially caused by my Cisco ASA cluster assigning a translation, after which the FatPipe WARP we run would switch data streams between 3 different ISP connections. Once I locked down the address translation, many of these issues went away.

The remote CAPWAP node communication issue appears to have been resolved last fall. Honestly, I'm extrapolating from what I was told at that time.

In my opinion, fixing address translation resolved more of the problems.

I'm currently running 7.1r1.

Are you running ACLs on your wireless networks, or are you just using your access points for 802.1X pass-through?

Have you engaged tech support to get some help?

Johnny Matthews

  • 34 Posts
  • 2 Reply Likes
I've rolled out just over 100 AP250s and AP130s in the last month and have had a couple do this. I'm getting ready to have them replaced. Watching the console, they get partway through the boot sequence and just hang.

Om Prakash Ravipati

Scott,
We are not running any ACLs on the WAPs. The last time I had CAPWAP issues, I got replacement APs. I want to know how this can be fixed because we plan to recommend Aerohive to our customers and want to be 100% sure about the product's performance.

Johnny,
It looks like we both have the same issue. Tech support is providing replacement APs but not suggesting any fix. I wonder whether it is actually a hardware issue that warrants a replacement AP; I would be happier with a software fix. Why an AP would reboot in the first place and then stay frozen in that state, without any configuration changes, is the big question I have.

Scott Farrand

I haven't had any issues with access points hanging in the boot process. When I was seeing CAPWAP problems, I found I could change the CAPWAP server and see the issue vanish. (I want to say that, with our number of access points, we have roughly 10-15 CAPWAP servers that get referenced by our devices.) I thought there was an issue with the AP120s hanging, though.

Om Prakash Ravipati

Interesting. Could you please let me know how I can identify the CAPWAP servers available for our HiveManager instance and configure them manually? We are using HiveManager NG.

Thanks.

BJ, Champ

  • 374 Posts
  • 45 Reply Likes
Online or on-prem?

BJ, Champ

The basic commands are "show capwap client" to view the current CAPWAP state and "capwap client server name x.x.x.x" to set the server.
You may also need to restart the CAPWAP process: "no capwap client enable", then "capwap client enable".

Om Prakash Ravipati

It's HiveManager NG online.


Thanks.

Johnny Matthews

This is what the AP boot sequence looks like from the console. It goes no further and is unresponsive at the end.

U-Boot 2012.10 (Nov 01 2016 - 04:53:25)
I2C:   ready
Wait.
Done.
DEV ID= 0000cf12
REV ID= 00000000
SKU ID = 0
OTP status: eca00018
MEMC 0 DDR speed = 800MHz
Log: ddr40_phy_init.c: Configuring DDR Controller PLLs
Log: offset = 0x18010800
Log: VCO_FREQ is 1600 which is greater than 1Ghz.
Log: DDR Phy PLL polling for lock
Log: DDR Phy PLL locked.
Log: ddr40_phy_init::DDR PHY step size calibration complete.
Log: ddr40_phy_init:: Virtual VttSetup onm CONNECT=0x01CF7FFF, OVERRIDE=0x00077FFF
Log: ddr40_phy_init:: Virtual Vtt Enabled
Log: DDR Controller PLL Configuration Complete
PHY register dump after DDR PHY init
PHY register dump after mode register write
DRAM:  512 MiB
WARNING: Caches not enabled
GPIO Init ... Done
Power Input Detection: POE AF, Drive GPIO17(USB 5V enable) success
NAND:   NAND_FLASH_DEVICE_ID_ADDR = 18028194
Done that
(ONFI),  S34ML04G2, blocks per lun: 1000 lun count: 1
128 KiB blocks, 2 KiB pages, 16B OOB, 8-bit
NAND:   chipsize
total 0 bad blocks,LIST:
now the up level will see a good flash chip no bad block which size is 20000000
before nvram partition, there are 0 bad blocks
512 MiB
Using default environment
In:    serial
Out:   serial
Err:   serial
Unlocking L2 Cache ...Done
arm_clk=1000MHz, axi_clk=500MHz, apb_clk=250MHz, arm_periph_clk=500MHz
Net:   Registering eth
Broadcom BCM IPROC Ethernet driver 0.1
Using GMAC1 (0x18025000)
et0: ethHw_chipAttach: Chip ID: 0xcf12; phyaddr: 0x1e
bcm_robo_attach: devid: 0x53012
bcmiproc_eth-0
MAC address is c413:e248:94c0
NVRAM_MAGIC found at offset 700000
nvram_init: ret 1
Reset TPM chip...
Hit any key to stop autoboot:  0
Hit any key to interrupt boot from flash:  0
Loading kernel from device 0: nand0 (offset 0x800000) ... done
Loading rootfs from device 0: nand0 (offset 0x1600000) ... done
## Booting kernel from Legacy Image at 01005000 ...
   Image Name:   Linux-2.6.36
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2548476 Bytes = 2.4 MiB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
## Loading init Ramdisk from Legacy Image at 02005000 ...
   Image Name:   uboot initramfs rootfs
   Image Type:   ARM Linux RAMDisk Image (uncompressed)
   Data Size:    29970432 Bytes = 28.6 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
Power off two PHY...
   Loading Kernel Image ... OK
OK
boot_prep_linux commandline: root=/dev/ram console=ttyS0,9600 ramdisk_size=70000 cache-sram-size=0x10000
Starting kernel ...
Uncompressing Linux... done, booting the kernel.
...board_fixup
board_fixup: mem=512MiB
board_map_io
board_init_irq
board_init_timer
board_init
Mounting local file systems...
UBI device number 0, total 3256 LEBs (413433856 bytes, 394.3 MiB), available 123 LEBs (15618048 bytes, 14.9 MiB), LEB size 126976 bytes (124.0 KiB)
set attenuator GPIO 7b0f00->7b4f40
et_module_init: et_txq_thresh set to 0xce4
et_module_init: et_rxlazy_timeout set to 0x3e8
et_module_init: et_rxlazy_framecnt set to 0x20
et_module_init: et_rxlazy_dyn_thresh set to 0
register snif device on interface eth0.
eth0: Broadcom BCM47XX 10/100/1000 Mbps Ethernet Controller 10.10.69.74_e5.0.9.1 (r629731)
register snif device on interface eth1.
eth1: Broadcom BCM47XX 10/100/1000 Mbps Ethernet Controller 10.10.69.74_e5.0.9.1 (r629731)
PCI: Enabling device 0001:01:00.0 (0140 -> 0142)
PCI: Enabling device 0002:02:00.0 (0140 -> 0142)
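
For anyone comparing console captures from several affected units, here is a minimal Python sketch (not an Aerohive tool) that reports which boot milestones a capture reached and the last line printed before the hang. The milestone strings are copied from the capture above. The function name summarize_boot_log and the shortened sample text are made up for illustration, and the milestone list is an assumption about which lines are useful markers.

```python
# Illustrative helper, not vendor tooling. Milestone strings are copied from
# the console capture in this thread; the choice of markers is an assumption.
MILESTONES = [
    "U-Boot",
    "DRAM:",
    "Loading kernel from device",
    "Starting kernel ...",
    "Mounting local file systems...",
    "PCI: Enabling device",
]


def summarize_boot_log(text):
    """Return (milestones reached, last non-empty line) for a console capture."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    reached = [m for m in MILESTONES if any(m in ln for ln in lines)]
    last_line = lines[-1] if lines else ""
    return reached, last_line


# Shortened sample built from lines in the capture above (hypothetical excerpt).
sample = """U-Boot 2012.10 (Nov 01 2016 - 04:53:25)
DRAM:  512 MiB
Starting kernel ...
Mounting local file systems...
PCI: Enabling device 0002:02:00.0 (0140 -> 0142)
"""

reached, last_line = summarize_boot_log(sample)
print("Milestones reached:", reached)
print("Last line before hang:", last_line)
```

Running this over captures from each failed AP would show whether they all stop at the same point, which may be useful detail to include in a support case.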

Nick Lowe, Official Rep

  • 2491 Posts
  • 451 Reply Likes
Johnny,

For the apparent boot hang, have you contacted Aerohive support about this?

If not, I would suggest opening a support case.

How are you powering the AP? If via a switch, is there any difference if, for testing purposes, you power it via a PoE+ injector while it is not connected to a switch?

Thanks,

Nick

Johnny Matthews

These are being RMA'd

Kevin Whelan

  • 53 Posts
  • 2 Reply Likes
I can confirm the same problem here; all of them have to be RMA'd.