Access points still rebooting after 6.1r3

  • Question
  • Updated 4 years ago
  • Answered
We had a ticket open with Aerohive concerning our access points rebooting. We sent multiple tech dumps and we were told they found the problem and that the fix would be in 6.1r3. We upgraded HMOL and a majority of our access points over the weekend and I see that 3 access points have rebooted on their own since the upgrade. Prior to the upgrade I saw that several people were having problems with reboots. I'm curious if others that were having reboot problems prior to this version are still having the same issue after the upgrade...or is it just me?

Van Jones

  • 75 Posts
  • 4 Reply Likes

Posted 4 years ago


Erik Gunnarsson

  • 38 Posts
  • 6 Reply Likes
I'm running 6.1r3a now and the problem has just started to occur for me, with random reboots of access points.

Erik Gunnarsson

  • 38 Posts
  • 6 Reply Likes
I just managed to get my APs to stop rebooting by changing the mitigation mode to manual in the rogue AP detection settings.

Configuration -> {your profile} -> additional settings -> service settings -> WIPS policy -> optional settings -> mitigation mode

Van Jones

  • 75 Posts
  • 4 Reply Likes
We have always had WIPS turned on with mitigation mode set to manual in our network policy.  Yesterday, at the direction of support we completely removed the WIPS policy from our network policy and pushed that out.  We still had 3 access points reboot last night.

Erik Gunnarsson

  • 38 Posts
  • 6 Reply Likes
Sorry to hear that it wasn't the same problem.
If you want to, you could email me a "get tech data" from one of the APs that is rebooting, and I will be happy to have a look at it.

erik.gunnarsson [at] qls.se


Matthew Hinson

  • 6 Posts
  • 0 Reply Likes

We are getting the same. It's exclusive to our AP121s. The APs have rebooted randomly every 6-24 hours ever since going to 6.1r3.

Erik Gunnarsson

  • 38 Posts
  • 6 Reply Likes
I was a little too quick. Changing WIPS did help for a while, but some of our APs are still rebooting randomly.
I'm trying to go back to the 6.1r2 firmware on one of them now to see if it stops.

Nick Lowe, Official Rep

  • 2491 Posts
  • 451 Reply Likes
Is it possible one of you could set up a Syslog service for debugging purposes in your environment to see what, if anything, gets logged just before a reboot occurs?

Nick
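
For anyone who wants to try this without a full syslog product, here is a minimal sketch of a UDP syslog collector in Python. It assumes the APs have been configured to send syslog over UDP to the host running it. The file name `ap-syslog.log` and the helper names are hypothetical, and binding to port 514 usually needs elevated privileges.

```python
import socketserver

# Hypothetical output file; adjust to taste.
LOG_PATH = "ap-syslog.log"

# Standard syslog severity names, indexed by severity number 0..7.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def decode_priority(message):
    """Split a leading <PRI> field into (facility, severity name, rest).

    PRI is facility * 8 + severity. Messages without a valid PRI are
    returned unchanged with facility None and severity "unknown".
    """
    if message.startswith("<") and ">" in message[:6]:
        end = message.index(">")
        pri_text = message[1:end]
        if pri_text.isdigit():
            pri = int(pri_text)
            return pri // 8, SEVERITIES[pri % 8], message[end + 1:]
    return None, "unknown", message


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append every received datagram to LOG_PATH with the sender's IP."""

    def handle(self):
        data, _sock = self.request  # UDP requests arrive as (bytes, socket)
        text = data.decode("utf-8", errors="replace").strip()
        _facility, severity, body = decode_priority(text)
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {severity} {body}\n")


def serve(host="0.0.0.0", port=514):
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. Because each line is written to disk as it arrives, whatever an AP manages to send just before a watchdog reboot is kept even though the AP itself restarts.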

Erik Gunnarsson

  • 38 Posts
  • 6 Reply Likes
I have the log from the AP's NVRAM covering the time of a reboot.

Is there more info to catch with a syslog?

2014-02-16 05:00:02 notice  capwap: "scpuser" successfully http to server
2014-02-16 04:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-16 03:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-16 02:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-16 01:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-16 00:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 22:11:23 info    ntpclient: [ntpclient]Set time - Sat Feb 15 22:11:23 2014
1970-01-01 01:01:16 notice  ah_top: System is initialized
1970-01-01 01:01:12 notice  ah_scd: Daylight saving time was set and will be in effect from <03-30 01:59:59> to <10-26 02:59:59>
1970-01-01 01:01:12 notice  ah_scd: The time zone for the device was reset to <GMT+1:00>
1970-01-01 00:00:45 notice  amrp2: set default route at node 4018:b13b:7200 ip 0.0.0.0
1970-01-01 00:00:44 notice  amrp2: elect_inet_ifp set default route
1970-01-01 00:00:29 info    ah_scd: Initial AP hostname:default(AH-3b7200), cli(AH-3b7200), value(AH-3b7200)
1970-01-01 00:00:16 alert   ah_get_ktrace_bbox: last time rebooted at 2014-02-15_21-07-50, reboot reason: hardware watchdog (confirmed)
2014-02-15 22:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 21:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 20:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 19:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 18:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 17:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 16:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 15:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 14:00:03 notice  capwap: "scpuser" successfully http to server
2014-02-15 13:00:03 notice  capwap: "scpuser" successfully http to server
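
When there are many APs to check, the reboot record in a log like the one above can be pulled out with a short script. This is a rough sketch that assumes the "last time rebooted at ..., reboot reason: ..." line always has the shape shown in this log; the function name `find_reboots` is made up for illustration.

```python
import re
from datetime import datetime

# Matches e.g. "last time rebooted at 2014-02-15_21-07-50, reboot reason: hardware watchdog (confirmed)"
REBOOT_RE = re.compile(r"last time rebooted at (\S+), reboot reason: (.+)$")


def find_reboots(log_text):
    """Return a list of (reboot time, reason) tuples found in the log text."""
    reboots = []
    for line in log_text.splitlines():
        match = REBOOT_RE.search(line)
        if match:
            # The AP writes the timestamp as YYYY-MM-DD_HH-MM-SS.
            when = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
            reboots.append((when, match.group(2).strip()))
    return reboots
```

Run against the log above, this should return a single entry for the reboot at 2014-02-15 21:07:50 with the reason "hardware watchdog (confirmed)".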



Erik Gunnarsson

  • 38 Posts
  • 6 Reply Likes
I downgraded our APs and have now been running 6.1r2 for a day without reboots, so I guess we will be running that until Aerohive releases a fix

Van Jones

  • 75 Posts
  • 4 Reply Likes
Erik, I wonder how much we have in common. We are running all AP330 access points across 2 campuses. Our law campus is academic only (no dorms), running around 40 access points, and I don't recall any of those access points rebooting on their own at any time (even with WIPS enabled). On the main campus, most of the remaining 243 access points have never rebooted on their own, but we have been seeing around 2 to 3 AP reboots per day. The majority of reboots happen on access points within our dorms, but within the last couple of weeks we have seen some reboots in our academic buildings. So obviously in some environments (our law school), crashes/reboots are not a problem. Is it the physical environment (older dorms vs. office-space-type construction)? Is it the type of clients being brought into the dorm environment that you won't often find in academic buildings or office environments?

Having looked at the KDDR logs, our support engineer says they point to a bug in the Wi-Fi driver on the AP. Erik, what does your environment look like and where do you see most of your reboots?

Nick, I am investigating whether we can use our new SIEM to aggregate and access the raw syslog entries.

Mike Kouri, Official Rep

  • 1030 Posts
  • 271 Reply Likes
Folks,
We are still trying to determine the commonalities (sp?) of the customers experiencing random reboots. It does seem to be a bit more prevalent on the AP121, but as Van Jones has reported it does also occur on AP330 and other platforms.

We did find and fix one issue that we were hopeful applied to many of the customers reporting this. It was related to multiple processes adjusting the radios (i.e. WIPS background scanning, active mitigation, location services, and/or classic AP functionality) that could lead to the radios being left in an unusable state, which eventually triggers our watchdog timer to reboot the device.

Please do continue to open cases with Support - I am sure that once we discover the root cause or causes of these issues we will be able to fix them in short order.

Roberto Casula, Champ

  • 231 Posts
  • 111 Reply Likes
Hi Mike,

I've got a couple of APs (121s) at a customer that we upgraded to 6.1r3 last week that have rebooted due to hardware watchdog. This wasn't observed when running the prior release (6.1r1 in this case).

Again, we have manual mitigation set in WIPS policy, but we do have location server running. Background scanning is enabled (but set not to run if clients are connected).

So far, the reboots have occurred overnight when there are no active clients.

I will open a support case of course, but I just want to add my voice to those that are observing a potentially worse situation with 6.1r3 than with the previous release.

Regards,
Roberto

Van Jones

  • 75 Posts
  • 4 Reply Likes
Just an FYI,
I have been told by support that they have isolated the root cause of our particular reboot issue. A special build has been created and is going through QA sanity testing now. Once we receive and apply the update, I will let everyone know whether it resolves our problem.

Chad Burton

  • 1 Post
  • 0 Reply Likes
I have about 150 APs, mostly 330s, but with some 121s, 120s, 170s, and 350s mixed in. I recently upgraded HiveOS from 6.1r2 to 6.1r3a. I noticed immediately that we were having issues with reboots, so I called support. I went back to 6.1r2 and also disabled WIPS for the time being, and I feel it's back to normal. I am still running HiveManager on 6.1r3a.

Van Jones

  • 75 Posts
  • 4 Reply Likes
We are running the special build on most of our AP330s now and our reboots have dropped drastically. However, we still have 1 to 2 reboots per week out of a total of 313 access points.

Rob

  • 42 Posts
  • 5 Reply Likes
I was having the same issues on 6.1r2.
As per Aerohive support
"disable 802.11k and 802.11R to see if that has helped the reboot"

Worked for me. No more random reboots.


Jeremy Stewart

  • 47 Posts
  • 0 Reply Likes

Does that not affect roaming?

I'm having the same issue with APs randomly rebooting (reboot reason: hardware watchdog).


Scott M., Sr. Support Engineer

  • 104 Posts
  • 8 Reply Likes

Generally speaking...

If unexpected reboots occur, it’s recommended to contact your service provider to open a Support case. The AP logs will be examined to determine the nature of the reboots, and relevant data will be collected for Engineering to address the root cause. Where possible, Aerohive Support will recommend configuration changes to mitigate the rebooting while Engineering addresses the root cause.





Jeremy Stewart

  • 47 Posts
  • 0 Reply Likes
I have a support case open for another issue where APs lose their config and go into a boot cycle. After reviewing logs for other APs that mysteriously reboot, I brought up the "hardware watchdog" reason as an example, as it was rebooting many of our APs. The response was not favourable, and the behaviour was deemed to be "normal". Seeing that others have the same issue, I would conclude that it is not normal, and this thread should be taken as a prompt for Aerohive to jump on a solution without having us all open individual support cases. I have the same policy for a hundred APs, and only a percentage of them have this issue, so reviewing configuration would not be a good place to start.

Scott M., Sr. Support Engineer

  • 104 Posts
  • 8 Reply Likes

Hello Jeremy,

Reviewing the configuration is indeed a good place to start. While configuration can matter, rebooting isn't just a matter of configuration, but is also affected by load, environment and the type of traffic being passed (which equates back to load).

Generally speaking, APs should not reboot unexpectedly.

It’s normal for APs to reboot under the conditions that follow:

  • A “reboot” command is issued via the command line interface or by HiveManager
  • A complete configuration upload is pushed to an AP
  • Firmware is pushed to an AP
  • A power event occurs

If an AP reboots for reasons other than those listed above, then this is likely an unexpected reboot.

If you already have a case open for AP reboots, please reply to this thread and tell me the case number and I will take a look.