Bonjour Gateway stops advertising services after a time?

  • Question
  • Updated 4 years ago
  • Answered

I'm having issues with a client using BG on an AP350 on 6.1r3b.


I have to restart BG once a day to have all of the AppleTVs consistently able to be seen, and I'm trying to find a long-term solution. The threads here haven't shown a comprehensive solution yet, and a lot of them are many months old. I don't think restarting BG once a day is the final piece to the puzzle.

After restarting BG, everything's great. Then after some time passes (I'm dependent on the client notifying me, as there's nothing in HiveManager that shows me definitively that it has failed):

-Monitor->BG shows the services being shared from the correct VLAN and IP


-"show bonjour status" shows that the BDD is still attached to all denoted VLANs

-Yet the client cannot "see" some AppleTVs in the list 

VLAN300 is the shared devices VLAN, with AppleTVs and printers statically assigned. The AppleTVs are on wireless; I haven't been able to get the client to test them wired.
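Since nothing in HiveManager flags the failure, one option is to probe from a machine on a user VLAN instead of waiting for the client to report it. The sketch below is only an illustration, not an Aerohive tool. It builds a one-shot mDNS PTR query for `_airplay._tcp.local` with the Python standard library and collects the addresses that answer. The names `build_ptr_query` and `browse` are made up for this example, and it assumes the gateway or the devices answer one-shot queries sent from an ephemeral port, which may not hold in every environment.

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"  # IPv4 mDNS multicast address
MDNS_PORT = 5353


def build_ptr_query(service: str) -> bytes:
    """Build a DNS query packet with one PTR question for `service`."""
    # Header: ID 0, flags 0 (standard query), 1 question, no other records.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in service.strip(".").split(".")
    ) + b"\x00"
    # QTYPE 12 = PTR, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 12, 1)


def browse(service: str = "_airplay._tcp.local", wait: float = 3.0) -> set:
    """Send one query and return the source addresses of any replies.

    Stops once no reply has arrived for `wait` seconds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(wait)
    responders = set()
    try:
        sock.sendto(build_ptr_query(service), (MDNS_GROUP, MDNS_PORT))
        while True:
            _data, (addr, _port) = sock.recvfrom(9000)
            responders.add(addr)
    except socket.timeout:
        pass
    finally:
        sock.close()
    return responders
```

Running `browse()` on a schedule from a wired machine on a client VLAN and logging when the result goes empty would at least timestamp the failure. It does not decode the answers, so it cannot tell which AppleTVs are missing.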


Steven Bateman

  • 65 Posts
  • 12 Reply Likes

Posted 4 years ago


Andrew MacTaggart, Champ

  • 483 Posts
  • 86 Reply Likes
Hi Steve

How many services are being advertised? You could always bring down your advertised services by allowing only the VLAN group [AppleTVs, printers, etc.] to the user VLAN group, unless you want to advertise Sally's iTunes, etc.

There was a bug, but you'll have to check the release notes to see which version fixed it.
If the Bonjour Gateway didn't hear from a Bonjour device for a while, it would cleanse it from its records.

You can always try this secret command from the CLI on the BDD:

_test bgd show 7

In general, the intervals between service advertisements and between discovery queries increase over time.

From RFC 6762

"Therefore, when retransmitting Multicast DNS queries to implement
this kind of continuous monitoring, the interval between the first
two queries MUST be at least one second, the intervals between
successive queries MUST increase by at least a factor of two, and the
querier MUST implement Known-Answer Suppression, as described below in Section 7.1.

The Known-Answer Suppression mechanism tells
responders which answers are already known to the querier, thereby
allowing responders to avoid wasting network capacity with pointless
repeated transmission of those answers. A querier retransmits its
question because it wishes to receive answers it may have missed the
first time, not because it wants additional duplicate copies of
answers it already received. Failure to implement Known-Answer
Suppression can result in unacceptable levels of network traffic.
When the interval between queries reaches or exceeds 60 minutes, a
querier MAY cap the interval to a maximum of 60 minutes, and perform
subsequent queries at a steady-state rate of one query per hour. To
avoid accidental synchronization when, for some reason, multiple
clients begin querying at exactly the same moment (e.g., because of
some common external trigger event), a Multicast DNS querier SHOULD
also delay the first query of the series by a randomly chosen amount
in the range 20-120 ms."
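To make the quoted schedule concrete, here is a small Python sketch of the slowest-growing back-off the RFC text permits: a first gap of one second, each gap doubling, capped at 60 minutes. The function name `intervals_until_cap` and its parameters are made up for illustration. Real queriers may use a larger factor, and the cap is optional (MAY), so treat this as an approximation rather than what any particular client does.

```python
def intervals_until_cap(first=1, factor=2, cap=3600):
    """Return the gaps (in seconds) between successive queries,
    from the first gap up to and including the first capped gap."""
    gaps = []
    gap = first
    while gap < cap:
        gaps.append(gap)
        gap *= factor
    gaps.append(cap)  # every later gap stays at the cap
    return gaps


gaps = intervals_until_cap()
# Seconds elapsed between the first query and the start of hourly querying.
elapsed_before_cap = sum(gaps[:-1])
print(gaps)
print(elapsed_before_cap)
```

With a factor of two, the gaps run 1, 2, 4, ... 2048 seconds, so the querier settles into one query per hour 4095 seconds (about 68 minutes) after its first query. That lines up with reports of laptops losing sight of services after being connected for an hour or more, though it does not prove the back-off is the cause.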

I see more issues with our Cisco equipment; MacBooks need to kick-start the Bonjour process after being connected for several hours.

Cheers
A



Steven Bateman

Here's part of "show bonjour status" as of now, about 22 hours after I last restarted it.


Total 4 Local Attached VLANs: 154 156 200 300 
Total 0 Remote BDDs:
Bonjour VLAN range:   154       156       200       300  
Total Services: 75; Total Self-Services: 0; Published Times: 102
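If the status text is captured periodically (for example from a saved SSH session), the counters can be logged over time to see whether the gateway's own view changes before clients lose the AppleTVs. The following is a rough Python sketch that assumes the output looks exactly like the excerpt above; other HiveOS versions may format it differently, and the function names are made up for this example.

```python
import re

SUMMARY = re.compile(
    r"Total Services:\s*(\d+);\s*"
    r"Total Self-Services:\s*(\d+);\s*"
    r"Published Times:\s*(\d+)"
)
VLANS = re.compile(r"Local Attached VLANs:[ \t]*([\d \t]+)")


def parse_summary(text):
    """Return the three counters as a dict, or None if the line is absent."""
    m = SUMMARY.search(text)
    if m is None:
        return None
    services, self_services, published = map(int, m.groups())
    return {
        "services": services,
        "self_services": self_services,
        "published": published,
    }


def parse_vlans(text):
    """Return the locally attached VLAN IDs as a list of ints."""
    m = VLANS.search(text)
    return [int(v) for v in m.group(1).split()] if m else []
```

This only records what the BDD believes. Since the BDD can still list the services while clients cannot see them, it is most useful alongside a check made from the client side.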


I also ran "_test bgd show 7" and now local keepalive is enabled. Not 100% clear on that mechanism but that's OK for the moment.


I'm afraid I won't have much more to update until I hear one way or the other from the client, but I'll let you know.


Thanks for the input.


J. Goodnough, Champ

  • 266 Posts
  • 32 Reply Likes
I currently have to have _test bgd show 7 enabled on my BDD in order to keep things running properly. Note that you'll have to re-enter the command at every reboot of your BDD.
Steven Bateman
The test command seems to work. The client hasn't reported any issues over the last 4-5 days. Thanks for the help!

I'll update this thread if we encounter anything else related.

Mike Kouri, Official Rep

  • 1030 Posts
  • 271 Reply Likes

Steve Bateman, what is the current state of local keepalives in your environment? Are they now enabled? And does your environment seem more stable than before you made this change?

Other folks contributing to this thread (or just here to lurk & learn),

This particular command, like all "_test ..." commands, was designed for Aerohive internal use only. It SHOULD NOT be used EXCEPT under the direction of Aerohive technical support. The lack of documentation is intentional as is the delay in fixing the typo (hey, WE know what it means, and if you'll never see it then the urgency of fixing a typo goes way down).

Back before the release of HiveOS 6.1r2 we had a theory that the presence of local keepalives was contributing to the problems of "losing services" so we extended an internal-use-only command to allow control over whether to do local keepalives or not and tested that at several customer sites.

This appeared to address their problems, so in HiveOS 6.1r2 we changed the behavior of the Bonjour Gateway to disable local keepalives by default. That's one of the times I (prematurely) declared success here. Since the _test command is hidden, we did not remove it from the codebase. The command is a toggle; executing it twice leaves you in the original state.

So, for folks running 6.1r2 or later on their APs, executing this command does exactly the opposite of what we thought would make things better. 
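Because the command is a toggle rather than an explicit on/off, the resulting state depends on the release default and on how many times it has been run since the last reboot. A tiny Python sketch of that bookkeeping, based only on the description in this thread (the function name is made up, and the pre-6.1r2 default of "enabled" is inferred from the statement that 6.1r2 changed the default to disabled):

```python
def keepalive_enabled(default_enabled: bool, times_run: int) -> bool:
    """State of local keepalives after running the toggle `times_run`
    times since the last reboot (a reboot restores the default)."""
    toggled = times_run % 2 == 1  # an odd number of runs flips the state
    return default_enabled != toggled


# 6.1r2 or later: keepalives are off by default.
print(keepalive_enabled(default_enabled=False, times_run=1))  # True
print(keepalive_enabled(default_enabled=False, times_run=2))  # False
```

So on 6.1r2 or later, running the command once per boot turns local keepalives on, and running it a second time turns them back off.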

I am trying to find out if there is a need for us to create a persistent command to control whether local keepalives are enabled or disabled. 

Andrew MacTaggart, Champ
Mike

Feel free to redact the _test command

Cheers
A
Mike Kouri, Official Rep
There's no un-ringing that bell, my friend. No harm, no foul; I appreciate all the help and expertise shared by the champs.
Steven Bateman
Mike,

Thanks for the response. Anecdotally the issue seems to be better, but there was a point yesterday where the Apple TVs disappeared, and then showed up again after a reboot of an AP that was NOT the BDD. This was completed by the client to address a backup AP RADIUS server not joining AD properly. I'm reliant on the client feeding info to me, so I'm afraid I don't have anything more quantitative than that at this point.

I've run the command once. So presumably local keepalives are now enabled.

We have a different client that runs BG and has since last October without any need to reboot it. So tracking down the triggering behavior has been a struggle because so much of both Bonjour itself and Bonjour Gateway is intended to be relatively automatic.


Steven Bateman
Unfortunately, we're still seeing this issue.

With the upcoming changes to Bonjour Gateway and to AirPlay in iOS 8 and presumably Yosemite, this might get a whole lot better... in 3-6 months.

Does anyone have any more ideas for workarounds in the interim?

The Apple TVs are wireless currently, so there may be some benefit to wiring them via Ethernet. Other than that I'm out of ideas other than restarting their BG once a day for them.
Andrew MacTaggart, Champ
So the AppleTVs are in the BDD list, but clients cannot see them.

Can I ask what the clients are?

In the same location, do you notice a difference between an iPad that was asleep and a MacBook that stays awake for long periods of time?

In general we are seeing better results with Ethernet-connected AppleTVs.

I will be deploying additional 100+ CrappleTVs this summer, so I can't wait for all the fun to start. I call it AppleTV sprawl.

Did I mention I hate Bonjour? There might be a fan club somewhere.

Anyway, with our current setups, the wireless AppleTVs need either the client to turn Wi-Fi off and on, or AirPlay to be turned off and on on the AppleTV, and maybe a rare reboot of the AppleTV.

Our Ethernet AppleTVs are more stable, but if the MBA or MBP has been on for over an hour or so, it needs to kick-start the AirPlay process somehow.

I actually have 4 different setups:
Aerohive BDD with AppleTVs as wireless clients - unpredictable

Cisco WLCs on code 7.0 with an Aerohive AP as BDD - stable, but the AP craps out when the service count gets high, between 900 and 1000 services. Mostly on rainy days

Cisco WLCs 7.4 using the Cisco phase 2 Bonjour Gateway with wireless AppleTVs - unpredictable, and no control since all roads lead to Rome

Cisco WLCs 7.4 using the Cisco phase 2 Bonjour Gateway with wired AppleTVs - better than wireless AppleTVs, but issues with teachers who use their MBA longer than 1 hour, and no control since all roads lead to Rome

Since iPads tend to sleep more than MBAs and MBPs, they seem to get the lists more easily.

Things I will be doing this summer:
Select APs that are not busy (very few clients) for the BDDs.

Filter the services needed:
AppleTV VLAN, printer VLAN, etc., and only allow these services to be advertised to the client VLANs. Unfortunately, this does not prevent the BDD from learning about services that I might not care about. [Every environment is different; some might want to see every Bonjour device in the environment.]

Break up the Bonjour devices into smaller Bonjour domains. This requires multiple network policies and Bonjour policies. It's added work, but hopefully the lists of AirPlay devices can be limited to <25 and the service learning can be controlled. This can be difficult in multi-floor environments.

iBeacons are here, but they require the iDevices to support Bluetooth 4.0, and currently I think only iOS 7.1 devices support this feature of discovery through iBeacons. I suspect OS X might support this soon.

Maybe things will get better.

Cheers and let me know if you find any solutions
A


Steven Bateman

-Clients are Mac and PC laptops as far as I know. No major iOS presence for the use of Airplay that has been communicated to me. The PCs use AirParrot.

-I'm making an on-site visit today that may shed more light on usage, and I can test it myself as well. It seems like wiring in the Apple TVs will be the best option for now. These are used in conference rooms, so it doesn't seem like they would be using the AppleTV for more than an hour or so at a time.

-I can't imagine I'm running into a filtering issue because it's four Apple TVs total and less than 30 users in a small two-floor office, but I have seen stranger things. There are four Airplay devices total and two printers, that's really it. The BDDs do learn everyone's file sharing, iTunes, etc so total services end up around 100.  But I'll keep your smaller domain suggestion in mind for sure.

Thanks for the help, we'll see what comes out of today's visit.

Andrew MacTaggart, Champ
Hi Steve

I wanted to follow up, and clear up something.

I wasn't saying the client was connected to an AppleTV for 2 hours; I was saying that the client was connected to Wi-Fi for 2 hours and then tried to use AirPlay services.

At the end of the day, I am sure the issue resides with the mechanisms Bonjour uses to reduce overhead. My best guess is the exponential back-off, which is what I was trying to describe.

Since tablets and iPhones sleep more than laptops, they seem to see the services better, as long as the services are in the BDD. If the laptop in question turns Wi-Fi off and back on, it restarts the mDNS query. That is how I am interpreting the described behaviour.

https://developer.apple.com/library/mac/Documentation/Cocoa/Conceptual/NetServices/Articles/about.ht...

Bonjour makes use of several mechanisms for reducing zero-configuration overhead, including caching, suppression of duplicate responses, exponential back-off, and service announcement, as described in the following sections

Exponential Back-off and Service Announcement
When a host is browsing for services, it does not continually send queries to see if new services are available. Instead, the host issues an initial query and sends subsequent queries exponentially less often, for example: after 1 second, 3 seconds, 9 seconds, 27 seconds, and so on, up to a maximum interval of one hour.
This does not mean that it can take over an hour for a browser to see a new service. When a service starts up on the network, it announces its presence a few times using a similar exponential back-off algorithm. This way, network traffic for service announcement and discovery is kept to a minimum, but new services are seen very quickly.
Services running on a Bonjour-equipped host are announced automatically when they register with the mDNSResponder daemon. Services running on other hardware, such as printers, should implement service announcement with exponential back-off to take full advantage of Bonjour.

The keepalive function on the BDD would test whether the service was still alive, and remove it from the BDD if no response was received.
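As a rough mental model of that keepalive behaviour, here is a small Python sketch of a service table that drops any entry it has not heard from within a timeout. This is not Aerohive's implementation; the class name, the timeout value, and the service names are all hypothetical, and the real BDD's timers are not documented in this thread.

```python
class ServiceTable:
    """Toy model of a gateway's learned-services list with keepalive expiry."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence before an entry is dropped
        self.last_seen = {}     # service name -> time it last responded

    def heard_from(self, name, now):
        """Record an announcement or keepalive response from a service."""
        self.last_seen[name] = now

    def expire(self, now):
        """Remove and return the services that have been silent too long."""
        stale = [n for n, t in self.last_seen.items() if now - t > self.timeout]
        for name in stale:
            del self.last_seen[name]
        return stale


table = ServiceTable(timeout=120)
table.heard_from("Conference Room A", now=0)
table.heard_from("Lobby", now=100)
print(table.expire(now=130))      # ['Conference Room A']
print(sorted(table.last_seen))    # ['Lobby']
```

In a model like this, a device that stays silent past the timeout (for whatever reason) disappears from the list until it announces itself again, even though it is still on the network.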

I am interested in what you have discovered.

Cheers
A


Steven Bateman
Oh OK, that explains these mechanisms more clearly. Thanks for that.

The client chose to wait and make no major changes. Due to lack of physical jacks we would have had to bridge through an existing connection to the AppleTV or change the network architecture fairly significantly. They chose to just wait it out until changes come with iOS 8 and an updated BG. Which is fine, but I think it would have been a good empirical test.

When I visited on-site, naturally BG was working flawlessly...



Steven Bateman
While this issue isn't being reported anymore, I may have stumbled upon a possible original cause for this by accident, while troubleshooting something else.

Would this issue have anything to do with the PPSK reauthorization time being set to the default of 30 minutes?

If anything, forcing the Apple TVs to reauthenticate so aggressively might be a major driver of this behavior, especially combined with the symptom of their hostnames incrementing. This is something I missed in the original config.