Has anyone else had a problem with SNMP response time since HiveOS 6.5r5?

Updated 10 months ago
I monitor 1,500 AP130s with an SNMP program called InterMapper. All of the access points are on HiveOS 6.5r4 and work great. I upgraded one AP130 to 6.5r5 and immediately got an error that the SNMP response time was over the 2000 ms threshold. I ran a packet capture on the AP giving me the error, and it is sending its response 2000 ms after the GET request. Does anyone know why SNMP responses take longer now?

Dennis Hamman

Posted 1 year ago

Ruwan Indika

Hi Dennis,

I tested snmpwalk against an AP130 on 6.5r4 and on 6.5r5 but couldn't see a delay in response time on 6.5r5. Does it take more than 2000 ms to respond only when you query a specific SNMP OID? Which OID does your SNMP manager poll periodically to measure the response time? Which SNMP manager are you using to get this result?
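To reproduce the kind of measurement an SNMP manager makes, one can time a single GET for one OID. The following is a minimal, hypothetical sketch using only the Python standard library: it hand-encodes an SNMPv2c GetRequest in BER, sends it over UDP port 161, and returns the round trip in milliseconds. The community string `public`, the default OID (sysUpTime.0, 1.3.6.1.2.1.1.3.0), and all function names are assumptions for illustration; this is not how InterMapper works internally, and the reply is neither decoded nor matched to the request.

```python
import os
import socket
import time


def encode_length(n: int) -> bytes:
    """BER definite-form length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Wrap a value in a BER tag-length-value triplet."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_int(n: int) -> bytes:
    """BER INTEGER for a non-negative value (minimal two's-complement length)."""
    return tlv(0x02, n.to_bytes((n.bit_length() + 8) // 8, "big", signed=True))


def encode_oid(oid: str) -> bytes:
    """BER OBJECT IDENTIFIER, e.g. '1.3.6.1.2.1.1.3.0'."""
    arcs = [int(a) for a in oid.strip(".").split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])  # first two arcs share one byte
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]  # base-128 digits; high bit set on all but the last
        arc >>= 7
        while arc:
            chunk.insert(0, (arc & 0x7F) | 0x80)
            arc >>= 7
        body.extend(chunk)
    return tlv(0x06, bytes(body))


def build_get(community: str, oid: str, request_id: int) -> bytes:
    """SNMPv2c GetRequest for one OID, with a NULL value placeholder."""
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0)
              + tlv(0x30, varbind))
    # version field 1 means SNMPv2c
    return tlv(0x30, encode_int(1) + tlv(0x04, community.encode()) + pdu)


def time_snmp_get(host: str, oid: str = "1.3.6.1.2.1.1.3.0",
                  community: str = "public", timeout_s: float = 5.0) -> float:
    """Round-trip time in ms for one GET; raises socket.timeout if no reply arrives."""
    request_id = int.from_bytes(os.urandom(3), "big")
    packet = build_get(community, oid, request_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.perf_counter()
        sock.sendto(packet, (host, 161))
        sock.recvfrom(65535)  # any datagram counts; not checked against request_id
        return (time.perf_counter() - start) * 1000.0
```

For example, `time_snmp_get("192.0.2.10")` (a placeholder address) returns a latency that can be compared with a 2000 ms threshold; running it once per OID that the poller uses would show whether only specific OIDs respond slowly.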

Dennis Hamman

I am using InterMapper for SNMP monitoring. I will attach some screenshots of my packet capture; hopefully they will help.

Dennis Hamman

[Packet capture screenshots attached; images not preserved]

Ryan M, Official Rep

Hi Dennis,

In the first screenshot you posted, the GET request in packet 17 appears to receive its response in packet 18 less than 1 ms later: packet 17 was captured at 14:13:32.092084000 and packet 18 at 14:13:32.092743000. In none of the transactions I've compared in those screenshots does the latency approach 2000 ms.
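The request-to-response gap can be checked directly from the two capture timestamps. A small sketch, assuming timestamps in the HH:MM:SS.fraction form quoted above and that both packets fall on the same day; `Decimal` is used so the nanosecond digits are kept exactly. The helper names are hypothetical.

```python
from decimal import Decimal


def capture_seconds(ts: str) -> Decimal:
    """Convert an HH:MM:SS.fraction capture timestamp to exact seconds since midnight."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + Decimal(seconds)


def latency_ms(request_ts: str, response_ts: str) -> Decimal:
    """Gap between request and response in milliseconds (assumes the same capture day)."""
    return (capture_seconds(response_ts) - capture_seconds(request_ts)) * 1000


# Packet 17 (GET request) and packet 18 (response) from the first screenshot
gap = latency_ms("14:13:32.092084000", "14:13:32.092743000")
print(f"{gap:.3f} ms")  # prints "0.659 ms"
```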

Dennis Hamman

What about between packets 3 and 4?

Dennis Hamman

Alright, after 3 months I finally got the result I expected. AEROHIVE CHANGED THE WAY SNMP WORKS BETWEEN RELEASES 6.5R4 AND 6.5R5. Sorry for the caps, but it took 2 months to convince them there was a delay in the SNMP response. After that they sent the ticket to QA, who came back with the response below. If only I could call support and get an answer like this within a day or so.

Dennis Hamman,
A new comment has been added to this case.

Case:       00195121
Subject:   SNMP error on latest golden HiveOS

Latest Case Comment:
Hi Dennis,

After some back and forth with QA, it looks like there was a change in SNMP after 6.5r4. The average response time shifted from about 0.3 seconds to about 1 second; it was a byproduct of a change made to address an issue regarding CPU utilization and snmpd. It looks like this is expected behavior.

Sincerely,
Aerohive Networks Technical Support



Nick Lowe, Official Rep

Hi Dennis,

Sorry about the delay.

Please do correct me if I am wrong, but it seems this is not service impacting and is a nuanced implementation detail? (Reading through the detail behind the change, it enhanced reliability by making the SNMP query less likely to fail under CPU resource contention, at the cost of increased latency.)

I understand this was escalated to the Tier 3 team for clarification on 2017-05-04, and it was then explored with QA and engineering at a lower priority because it was understood not to be service impacting.

Sorry we didn't meet your expectations this time.

Can you clarify whether this is causing you an operational issue or whether it is just cosmetic? Can you configure the thresholds in your SNMP poller?

Thanks,

Nick

Dennis Hamman

Nick, I agree that the change was made for a good reason, but I don't understand why support could not simply give me that answer in a day or so. In the end I changed the threshold in my SNMP poller, but I don't think this should have been such a process; I should have had this answer in a week or less. Why is this not documented as a known issue for 6.5r5 and later?
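Raising the threshold is one fix. Another common approach, sketched here as a hypothetical example rather than an InterMapper setting, is to alert only when several consecutive polls exceed the threshold, so a single slow response does not raise an error. The function name, the 2000 ms default, the count of three, and the sample values are illustrative assumptions.

```python
from typing import Sequence


def should_alert(samples_ms: Sequence[float], threshold_ms: float = 2000.0,
                 consecutive: int = 3) -> bool:
    """True only if the most recent `consecutive` samples all exceed threshold_ms."""
    if consecutive < 1 or len(samples_ms) < consecutive:
        return False
    recent = samples_ms[-consecutive:]
    return all(sample > threshold_ms for sample in recent)


# Hypothetical response times in ms, oldest first
print(should_alert([950, 2100, 1000, 2050]))  # False: the slow polls are not consecutive
print(should_alert([950, 2100, 2200, 2300]))  # True: the last three all exceed 2000 ms
```

Requiring consecutive breaches trades a slightly later alert for fewer false positives when typical latency sits closer to the threshold.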

Nick Lowe, Official Rep

Hi Dennis,

Simply because support does not have this type of information at its fingertips. It is a private implementation detail, one of many nuanced implementation details in HiveOS (as with any complex system), and it was known only to the engineers who work on this part of the software and have direct knowledge of the source code. It is not an issue: it is behaving as designed and intended in HiveOS 6.5r5 and is not service affecting. The question eventually had to be escalated up the chain to the engineering team that works on that part of HiveOS to explain why it behaves as it does and to confirm that the change was made as a considered tradeoff between competing implementation goals.

Thanks,

Nick