Panels dropping offline
cma
I have run into this on numerous occasions, and it happens with Wi-Fi panels, hardwired panels, and TPControl iPads. The panel drops offline, comes back online after 5 or 10 seconds, stays online for about a minute or two, then drops offline again for 5 to 10 seconds, cycling off and on like this. I have tried static IPs, DHCP, URL and Listen connection types, and so on. What is the correct fix for this?
Comments
In two locations where I have personally seen this, it was a failing network switch. In others, it was a network loop. In still others, it was hacking from outside on an exposed internet connection. Not that they got in, just that the constant attempts used up the NetLinx resources for network connections, and it fell over.
Oh, well that's a horse of a different color...
In this case, your networking people might indeed be able to help. One common issue on commercial-grade networks is the IT department making changes to the switches for one reason or another. If you can get them to do it, you might ask them not to apply any port blocking on the segment with all the AV gear on it. Another I've seen quite a few times is the IT admins pushing out mass software/firmware upgrades over the network to the various workstations on site. This kind of thing can flood the network. To computer users it just shows up as network sluggishness, and they get annoyed. But to an AV system that is constantly watching for heartbeats, it can temporarily take the connection down.
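To illustrate why a flood that only looks like sluggishness to workstation users can knock a panel offline, here is a minimal sketch of a generic heartbeat monitor. It assumes a hypothetical 5-second timeout (the real NetLinx/TPControl heartbeat interval and timeout are not stated in this thread): any gap between arriving heartbeats longer than the timeout is reported as an offline period.

```python
from typing import List, Tuple

# Hypothetical timeout; the actual value used by NetLinx/TPControl is not given here.
TIMEOUT_S = 5.0


def offline_periods(arrivals: List[float], timeout: float = TIMEOUT_S) -> List[Tuple[float, float]]:
    """Return (start, end) spans during which a monitor would mark the device offline.

    `arrivals` are the times (in seconds) at which heartbeats actually arrived.
    If the gap between two consecutive heartbeats exceeds `timeout`, the monitor
    gives up at (previous + timeout) and sees the device again at the next arrival.
    """
    periods = []
    for prev, curr in zip(arrivals, arrivals[1:]):
        if curr - prev > timeout:
            periods.append((prev + timeout, curr))
    return periods


if __name__ == "__main__":
    # Heartbeats once a second, but a network flood delays them between t=10 and t=17.
    normal = [float(t) for t in range(0, 11)]
    after_flood = [float(t) for t in range(17, 21)]
    print(offline_periods(normal + after_flood))  # [(15.0, 17.0)]
```

Users would only notice a 7-second slowdown, but the monitored device is declared offline for 2 of those seconds and then "comes back", which matches the drop-and-recover pattern described above.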
A lot will depend upon the general disposition of the IT people. If they're a grumpy lot, you might have trouble getting it resolved. If, however, they're pretty "chill" they are usually pretty helpful.
Anything I can look at regarding the settings in TPControl itself?
Line 41 (13:36:49):: Device 10008:1:1 Push on Chan:6
Line 42 (13:36:49):: Device 10008:1:1 Push on Chan:6
Line 43 (13:36:50):: Device 10008:1:1 Push on Chan:6
Line 44 (13:36:50):: Device 10008:1:1 Push on Chan:6
Line 45 (13:36:52):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 46 (13:36:58):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 47 (13:37:04):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 48 (13:37:10):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 49 (13:37:16):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 50 (13:37:19):: Device 10004:1:1 Push on Chan:31
Line 51 (13:37:19):: Sat on audio input 28 was removed from zone 4
Line 52 (13:37:23):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)
Line 53 (13:37:25):: CIpEvent::OffLine 10008:1:1
Line 54 (13:37:29):: CIpEvent::OnLine 10008:1:1
When it says "Duplicate of Device," does this mean that AMX thinks there is another device 10008 on the system? If so, there is no other device 10008...
Line 61 (13:41:37):: Device 10009:1:1 Push on Chan:5
Line 62 (13:41:38):: Device 10009:1:1 Push on Chan:5
Line 63 (13:41:38):: CIpEvent::OffLine 10009:1:1
Line 64 (13:41:41):: Invalid message received @ CDMDeviceManager (00A8)
Line 65 (13:41:41):: CIpEvent::OnLine 10009:1:1
Line 66 (13:41:45):: Device 10009:1:1 Push on Chan:6
Line 67 (13:41:45):: Device 10009:1:1 Push on Chan:6
The system seems to be responsive, then the panel drops. You wait a few seconds for it to come back up, and then it works again until the next drop.
Basically, the panels never drop offline while at rest. If I pick one up and start pressing Channel Up or Down, or any other function, and keep doing so (whether one press a second or one press every ten seconds), the panels drop every 30 to 40 seconds or so, then reconnect within a second or two. Stop pressing buttons and the panel stays online.
I've seen this a lot over the years, and it means the panel abandoned its prior connection, which still exists as far as the NetLinx is concerned, and made a new connection request. At this point the NetLinx seems all the more determined to reject the new connection and hang on to the one that's actually gone.
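To spot this pattern in a diagnostics capture like the ones posted above, a small script can group the "Duplicate of Device" rejections and CIpEvent offline/online events by device number, so the rejections leading up to each drop line up in one timeline. This is a sketch: the regular expressions match only the message wording quoted in this thread, and other firmware versions may phrase them differently.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Patterns match the diagnostic lines quoted in this thread only.
DUP_RE = re.compile(
    r"\((\d{2}:\d{2}:\d{2})\):: Ignoring/closing connection from IP address "
    r"(\S+) \(Duplicate of Device=(\d+):\d+:\d+\)"
)
EVENT_RE = re.compile(r"\((\d{2}:\d{2}:\d{2})\):: CIpEvent::(OffLine|OnLine) (\d+):\d+:\d+")

SAMPLE = [
    "Line 45 (13:36:52):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)",
    "Line 52 (13:37:23):: Ignoring/closing connection from IP address 192.168.101.105 (Duplicate of Device=10008:1:0)",
    "Line 53 (13:37:25):: CIpEvent::OffLine 10008:1:1",
    "Line 54 (13:37:29):: CIpEvent::OnLine 10008:1:1",
]


def device_timelines(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map device number -> list of (time, event) in log order; unrelated lines are skipped."""
    timelines: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        m = DUP_RE.search(line)
        if m:
            time, ip, device = m.groups()
            timelines[device].append((time, "duplicate from " + ip))
            continue
        m = EVENT_RE.search(line)
        if m:
            time, kind, device = m.groups()
            timelines[device].append((time, kind))
    return dict(timelines)


if __name__ == "__main__":
    for device, events in device_timelines(SAMPLE).items():
        for time, what in events:
            print(device, time, what)
```

Run against the excerpt above, it prints the two duplicate rejections from 192.168.101.105 followed by the OffLine at 13:37:25 and the OnLine at 13:37:29, all under device 10008. Fed a full session log, the same grouping shows which panels drop and when.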
Newer firmware has improved the IP robustness of the NetLinx in challenging settings, like storms of hacking attempts and the like, but this still occurs. Try a cold reboot of the NetLinx and see if you get immediate relief, before whatever is challenging the connections in the first place gets ahead of the NetLinx cleanup algorithms.
This isn't a cure, but it does localize the bottleneck. If you can't figure out what's actually causing the issue, you may be able to see how long it takes to get ugly and schedule an off-hours NetLinx reboot. Yeah, that's a hack. But it's one you can do.
We went so far as to institute drop counters (with logs) that actively refuse reconnections if more than a set threshold of disconnects happens in a short period. As the last thing it tries to say, it puts up a warning on the panel indicating that the panel connection is causing disruption and may be out of range. The user then either waits about 5 minutes, after which we try to start over, or presses a TRY TO CONNECT NOW button on the warning page to try to resume. The "time away" both gives the disruption a chance to be cured or stop on its own and gives the NetLinx a chance to clear its head without struggling over the connection.
It helped a little. It leaves copious logs that have sometimes helped determine which times and devices to look at.
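The drop-counter approach described above can be sketched as a sliding-window counter with a lockout. This is not the poster's NetLinx code; it is a language-neutral illustration in Python. The threshold and window values are hypothetical (the post only says "a set threshold" and "a short period"), the 300-second lockout follows the "about 5 minutes" mentioned, and `try_to_connect_now` stands in for the TRY TO CONNECT NOW button.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class DropGuard:
    """Refuse reconnections after too many disconnects in a short window (hypothetical sketch)."""

    def __init__(self, threshold: int = 3, window_s: float = 120.0, lockout_s: float = 300.0):
        self.threshold = threshold      # hypothetical: max drops tolerated in the window
        self.window_s = window_s        # hypothetical: length of the sliding window
        self.lockout_s = lockout_s      # roughly the 5 minutes mentioned in the thread
        self.drops: Deque[float] = deque()
        self.locked_until: Optional[float] = None
        self.log: List[Tuple[float, str]] = []

    def on_disconnect(self, now: float) -> None:
        self.drops.append(now)
        # Forget drops that have aged out of the sliding window.
        while self.drops and now - self.drops[0] > self.window_s:
            self.drops.popleft()
        self.log.append((now, f"drop ({len(self.drops)} in window)"))
        if len(self.drops) > self.threshold:
            # Too many drops: stop accepting reconnections for a while.
            self.locked_until = now + self.lockout_s
            self.log.append((now, "lockout until %.0f" % self.locked_until))

    def allow_reconnect(self, now: float) -> bool:
        if self.locked_until is None:
            return True
        if now >= self.locked_until:
            self._reset(now, "cooldown expired")  # start over with a clean slate
            return True
        return False

    def try_to_connect_now(self, now: float) -> None:
        """Manual override, like the TRY TO CONNECT NOW button on the warning page."""
        self._reset(now, "manual retry")

    def _reset(self, now: float, reason: str) -> None:
        self.drops.clear()
        self.locked_until = None
        self.log.append((now, reason))


if __name__ == "__main__":
    guard = DropGuard(threshold=3, window_s=120, lockout_s=300)
    for t in (0, 35, 70, 105):  # a drop roughly every 35 s, as described above
        guard.on_disconnect(t)
    print(guard.allow_reconnect(110))  # False: 4 drops within 120 s exceeds the threshold of 3
    print(guard.allow_reconnect(406))  # True: the 300 s lockout that began at t=105 has expired
```

Using a sliding window rather than a running total means a panel that drops once a day never trips the lockout, while one cycling every 30 to 40 seconds does; the `log` list plays the role of the "copious logs" for working out which times and devices to look at.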
One place where I saw this sort of thing happening did in fact have a bad master access point. It would cut out and come back on without provocation, but it certainly provoked the TPControl devices, which demand a steady connection.