Panels dropping offline

cma · July 2016

I have run into this on numerous occasions, and it happens with Wi-Fi, hardwired, and TPControl iPads. The panel drops offline, comes back online after 5 or 10 seconds, stays online for about a minute or two, then drops offline again for 5 to 10 seconds, cycling off and on like this. I have tried static IPs, DHCP, both URL and Listen connection types, and so on. What is the correct fix for this?

John Nagy · July 2016

Your network or your NetLinx is doing this. It's clearly NOT the panels: no single flaw could affect them all at the same time in the same way. It's nearly certainly the network. You have an address conflict, a loop in your wiring topology, or something is irritating your NetLinx so that it dumps connections. Monitor with telnet: see if all the panels go off at once, and see if you get "connection refused", which would mean some address may be in use by more than one device.

In two locations where I have personally seen this, it was a failing network switch. In others, it was a loop. In still others, it was hacking from outside on an exposed internet connection... not that they got in, just that the constant attempts used up the NetLinx resources for network connections... and it fell over.
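If you note the offline events during that telnet session, a small script can group them to show whether several panels drop at the same moment, which points at the network or the master rather than any one panel. A minimal Python sketch, assuming the events have already been collected as (seconds, device) pairs; the input format and the 3-second grouping window are assumptions, not something NetLinx produces directly:

```python
from typing import List, Tuple

# Assumed input: (seconds since midnight, device) for each offline event,
# noted by hand or scraped from a telnet/diagnostics session.
Event = Tuple[int, str]


def simultaneous_drops(events: List[Event], window_s: int = 3) -> List[List[Event]]:
    """Group offline events that fall within window_s seconds of the first
    event in the group; return only groups involving more than one device."""
    groups: List[List[Event]] = []
    current: List[Event] = []
    for ev in sorted(events):
        if current and ev[0] - current[0][0] > window_s:
            groups.append(current)
            current = []
        current.append(ev)
    if current:
        groups.append(current)
    # Several distinct devices dropping together suggests a shared cause
    # (switch, loop, master) rather than a fault in one panel.
    return [g for g in groups if len({dev for _, dev in g}) > 1]


events = [(100, "10008:1:1"), (101, "10009:1:1"), (102, "10004:1:1"), (400, "10008:1:1")]
print(simultaneous_drops(events))
```

An empty result over a long capture would suggest the panels are dropping independently, which shifts suspicion toward per-panel or Wi-Fi issues.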

vining · July 2016

Some networked devices like DirecTV and Sonos are notorious for causing broadcast storms on enterprise networks without STP (Spanning Tree Protocol) enabled and configured properly.

fogled@mizzou · July 2016

There's a known bug with older CV7 panels: the NICs (at a hardware level) can't handle any significant amount of broadcast packets at all; on any busy network, they choke and start dropping offline. Some Kaspersky installation processes were notorious for getting stuck on proxy discovery and spamming networks with broadcast traffic. AMX has released a hotfix for the problem, though; it effectively makes the reconnection cycle so fast nobody ever notices it. Otherwise, it's almost always been some kind of network disruption causing symptoms like that. I'd start with the switch and go out from there.

ericmedley · July 2016

It might be worth your while to put a lappy with Wireshark on the network for a while and see if there's some kind of Chatty Cathy on it. Older VoIP phones have been known to bring a network to its knees. Stuff like that can bite you in the butt without you ever knowing it.
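In Wireshark, the display filter eth.dst == ff:ff:ff:ff:ff:ff shows only Ethernet broadcast frames. To rank the chattiest senders once you have exported the source and destination MAC addresses, something like this Python sketch would do; the export step and the (source, destination) pair format are assumptions:

```python
from collections import Counter
from typing import Iterable, List, Tuple

BROADCAST = "ff:ff:ff:ff:ff:ff"


def top_broadcasters(frames: Iterable[Tuple[str, str]], n: int = 5) -> List[Tuple[str, int]]:
    """Count broadcast frames per source MAC and return the n noisiest sources."""
    # MACs are compared case-insensitively; exports may use either case.
    counts = Counter(src.lower() for src, dst in frames if dst.lower() == BROADCAST)
    return counts.most_common(n)


frames = [("00:11:22:33:44:55", "ff:ff:ff:ff:ff:ff")] * 3 + [
    ("66:77:88:99:AA:BB", "FF:FF:FF:FF:FF:FF"),
    ("66:77:88:99:aa:bb", "00:11:22:33:44:55"),  # unicast, ignored
]
print(top_broadcasters(frames))
```

A single source far ahead of the rest is the "Chatty Cathy" worth tracking down at the switch.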

cma · July 2016

Thanks for the replies. Another company handles the network at this particular job, so I guess I will need to get them involved. This is a business-class Cisco/Meraki network; basic tools like Fing don't show any issues.

ericmedley · July 2016

cma wrote:

Thanks for the replies. Another company handles the network at this particular job, so I guess I will need to get them involved. This is a business-class Cisco/Meraki network; basic tools like Fing don't show any issues.

Oh, well that's a horse of a different color...
In this case, your networking people might indeed be able to help. A common issue on commercial-grade networks is the IT department making changes to the switches for one reason or another. If you can get them to do it, you might ask them not to do any port blocking on the segment with all the AV gear on it. Another one I've seen quite a few times is IT admins pushing out mass software/firmware upgrades over the network to the various workstations on site. This kind of thing can flood the network. To the computer users it just shows up as network sluggishness, and they get annoyed. But for an AV system that is quite often looking for heartbeats, it can temporarily take things down.

A lot will depend upon the general disposition of the IT people. If they're a grumpy lot, you might have trouble getting it resolved. If, however, they're pretty "chill" they are usually pretty helpful.

cma · August 2016

Just thought I would update this. The network admins played with the spanning tree settings, and things are much better. I'm still getting some random drops, but it's hard to get good info from the client on when this happens and what else may be going on in the home at the same time. I'm on site right now, and they claim that the master bedroom iPad is going offline so much that they can't use it, yet since I've been here this morning it has only gone offline once in the last hour. Fun, fun, fun....

cma · August 2016

OK, now that I'm on site: the iPads stay online while sitting idle, but if you pick them up and start using them, they still drop randomly every half a minute or so and then reconnect quickly. What can I look at or try next? The network guy says there is very little Wi-Fi and network traffic, and I can't get much else out of him.

Anything I can look at regarding the settings in TPControl itself?

cma · August 2016

OK, I just ran across this in NetLinx Studio Diagnostics...

Line 41 (13:36:49):: Device 10008:1:1 Push on Chan:6
Line 42 (13:36:49):: Device 10008:1:1 Push on Chan:6
Line 43 (13:36:50):: Device 10008:1:1 Push on Chan:6
Line 44 (13:36:50):: Device 10008:1:1 Push on Chan:6
Line 45 (13:36:52):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 46 (13:36:58):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 47 (13:37:04):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 48 (13:37:10):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 49 (13:37:16):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 50 (13:37:19):: Device 10004:1:1 Push on Chan:31
Line 51 (13:37:19):: Sat on audio input 28 was removed from zone 4
Line 52 (13:37:23):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 53 (13:37:25):: CIpEvent::OffLine 10008:1:1
Line 54 (13:37:29):: CIpEvent::OnLine 10008:1:1

When it says "Duplicate of Device", does this mean that AMX thinks there is another device 10008 on the system? If so, there is no other device 10008...
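Tallying these messages per device and IP address makes it easier to tell whether one panel or one address accounts for most of the rejections. A rough Python sketch; the regular expressions are modeled only on the diagnostics lines quoted above, and other firmware versions may word these messages differently:

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Patterns modeled on the quoted NetLinx Studio diagnostics output (assumption:
# the wording is stable across firmware versions).
DUPLICATE = re.compile(r"closing connection from IP address (\S+) \(Duplicate of Device=([\d:]+)\)")
IP_EVENT = re.compile(r"CIpEvent::(OffLine|OnLine) ([\d:]+)")


def summarize(lines: Iterable[str]) -> Dict[str, Counter]:
    """Tally duplicate-connection rejections per (IP, device) and
    OffLine/OnLine events per device."""
    summary = {"duplicate": Counter(), "OffLine": Counter(), "OnLine": Counter()}
    for line in lines:
        m = DUPLICATE.search(line)
        if m:
            ip, device = m.groups()
            summary["duplicate"][(ip, device)] += 1
            continue
        m = IP_EVENT.search(line)
        if m:
            kind, device = m.groups()
            summary[kind][device] += 1
    return summary


log = [
    "Line 45 (13:36:52):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)",
    "Line 46 (13:36:58):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)",
    "Line 53 (13:37:25):: CIpEvent::OffLine 10008:1:1",
    "Line 54 (13:37:29):: CIpEvent::OnLine 10008:1:1",
]
print(summarize(log))
```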

cma · August 2016

One of the other iPads generates this response when it drops...

Line 61 (13:41:37):: Device 10009:1:1 Push on Chan:5
Line 62 (13:41:38):: Device 10009:1:1 Push on Chan:5
Line 63 (13:41:38):: CIpEvent::OffLine 10009:1:1
Line 64 (13:41:41):: Invalid message received @ CDMDeviceManager (00A8)
Line 65 (13:41:41):: CIpEvent::OnLine 10009:1:1
Line 66 (13:41:45):: Device 10009:1:1 Push on Chan:6
Line 67 (13:41:45):: Device 10009:1:1 Push on Chan:6
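Pairing each OffLine with the following OnLine for the same device gives the length of each outage (4 seconds for 10008 and 3 seconds for 10009 in the excerpts above). A minimal Python sketch, assuming the same line format as the quoted logs and a capture that does not cross midnight:

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Matches the "(HH:MM:SS):: CIpEvent::OffLine|OnLine D:P:S" lines quoted above.
EVENT = re.compile(r"\((\d{2}:\d{2}:\d{2})\):: CIpEvent::(OffLine|OnLine) ([\d:]+)")


def outages(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Pair each OffLine with the next OnLine for the same device and return
    (device, time it went offline, seconds offline). Times are compared as
    clock times only, so a capture spanning midnight would give wrong values."""
    went_down: Dict[str, datetime] = {}
    result: List[Tuple[str, str, int]] = []
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue  # pushes, "Invalid message" lines, etc.
        stamp, kind, device = m.groups()
        t = datetime.strptime(stamp, "%H:%M:%S")
        if kind == "OffLine":
            went_down[device] = t
        elif device in went_down:
            start = went_down.pop(device)
            result.append((device, stamp_of(start), int((t - start).total_seconds())))
    return result


def stamp_of(t: datetime) -> str:
    return t.strftime("%H:%M:%S")


log = [
    "Line 53 (13:37:25):: CIpEvent::OffLine 10008:1:1",
    "Line 54 (13:37:29):: CIpEvent::OnLine 10008:1:1",
    "Line 63 (13:41:38):: CIpEvent::OffLine 10009:1:1",
    "Line 64 (13:41:41):: Invalid message received @ CDMDeviceManager (00A8)",
    "Line 65 (13:41:41):: CIpEvent::OnLine 10009:1:1",
]
print(outages(log))
```

Run over a longer capture, the spacing between the "went offline" times would also show whether drops really recur every 30-40 seconds while a panel is in use.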

vining · August 2016

Is there any sluggishness, or any missed pushes or feedback, during these seizures?

cma · September 2016

vining wrote:

Is there any sluggishness, or any missed pushes or feedback, during these seizures?

The system seems to be responsive, then the panel drops. You wait a few seconds for it to come back up, and then it works again until the next drop.

Basically, the panels at rest never drop offline. If I pick one up and start pressing Channel Up or Down or any other function, and keep doing so, whether it's one press a second or one press every ten seconds, the panel will drop every 30-40 seconds or so and then reconnect within a second or two. Stop pressing buttons and the panel stays online.

John Nagy · September 2016

>>>Line 45 (13:36:52):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)

I've seen this a lot over the years. It means the panel abandoned its prior, still-existing connection and made a new connection request. At this point the NetLinx seems ever more determined to reject the new connection and hang on to the one that's actually gone.

Newer firmware has improved the IP robustness of the NetLinx in challenging settings (storms of hacker attempts and the like), but this still occurs. Try a cold reboot of the NetLinx and see if you get immediate relief, before whatever is challenging the connections in the first place gets ahead of the NetLinx cleanup algorithms.

This isn't a cure, but it does localize the bottleneck. If you can't figure out what's actually causing the issue, you can see how long it takes to get ugly and schedule an off-hours NetLinx reboot. Yeah, that's a hack. But it's one you can do.

We went so far as to institute drop counters (with logs) that would actively refuse reconnections if more than a set threshold of disconnects happened in a short period. It puts up a warning on the panel as the last thing it tries to say, indicating that the panel connection is causing disruption and may be out of range. Then the user has to either wait about 5 minutes and we try to start over, or they can press a TRY TO CONNECT NOW button on the warning page to try to resume. The "time away" both gives a chance for the disruption to be cured or stop on its own, and gives the NetLinx a chance to clear its head without struggling over the connection.

It helped a little. It leaves copious logs that have sometimes helped pin down the times and devices to look at.
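The drop-counter scheme can be pictured as a sliding window per device. This Python version only illustrates the logic (in practice it would live in the NetLinx program, and how the master actually refuses a reconnection is not described here); the class and method names and the 5-drops-in-2-minutes threshold are hypothetical, while the 5-minute cool-down and the manual retry come from the description above.

```python
from collections import deque
from typing import Deque, Dict


class DropLimiter:
    """Hypothetical sliding-window drop counter: after more than max_drops
    disconnects within window_s seconds, refuse reconnection for cooldown_s."""

    def __init__(self, max_drops: int = 5, window_s: float = 120.0, cooldown_s: float = 300.0):
        self.max_drops = max_drops
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.drops: Dict[str, Deque[float]] = {}
        self.locked_until: Dict[str, float] = {}

    def record_drop(self, device: str, now: float) -> bool:
        """Log a disconnect; return True if the device is now locked out."""
        q = self.drops.setdefault(device, deque())
        q.append(now)
        # Forget drops that have slid out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) > self.max_drops:
            self.locked_until[device] = now + self.cooldown_s
            q.clear()  # start counting afresh after the cool-down
            return True
        return False

    def may_connect(self, device: str, now: float) -> bool:
        return now >= self.locked_until.get(device, 0.0)

    def try_now(self, device: str) -> None:
        """Like pressing TRY TO CONNECT NOW: lift the lock-out early."""
        self.locked_until.pop(device, None)


lim = DropLimiter(max_drops=3, window_s=60, cooldown_s=300)
print([lim.record_drop("10008:1:1", t) for t in (0, 10, 20, 30)])
print(lim.may_connect("10008:1:1", 100), lim.may_connect("10008:1:1", 330))
```

Each lock-out is also a natural point to write the log entry, which is where the timing and device information mentioned above would come from.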

One place where I saw this sort of thing happening did in fact have a bad master access point. It would cut out and back on without provocation, and it certainly provoked the TPControl devices, which demand a steady connection.
