AMX NI-700 and/or NXD-700(V)i Freezing Issues

Please bear with me for this long explanation.

Over the past two years we have deployed NI-700 masters with NXD-700(V)i panels (the 'V' was removed in recent iterations of the panel) in close to 100 rooms across our campus. Most of them have been put on our network and integrated with RMS.

The systems are relatively simple — one Panasonic projector, one Extron switcher, 3-5 sources, and an occasional Rane DSP with microphone inputs. They all have the same equipment, so consistency is not an issue.

Unfortunately, this summer has seen many of those rooms go offline because either the master or the panel became unresponsive. The rash of incidents has our help desk spooked; it is obviously a big problem if we need to cycle the power in two rooms per day.

I have only scratched the surface in troubleshooting the issue — as you might imagine, this is an issue that is difficult to replicate in a lab environment. Of course I called AMX and they could not help me over the phone.

One cause we have all but ruled out is power fluctuations. Some of our buildings are known for power surges, and I suspected there might be some intentional power outages because of summer usage. Every attempt to replicate the issue using spikes and outages has failed, however; the system always comes back online.

The next troubleshooting step I plan to take is to emulate a "hacker attack" on a system in my office. Our systems are actually on an open network, but I have them locked down for security purposes. I'm not sure if we have an exposed port — maybe 1319 is being targeted? — or if there is another issue that can be exploited by hackers.
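
Before emulating an attack, it may help to confirm which TCP ports on a master actually answer from elsewhere on the network. The following is only a minimal sketch in Python, not an AMX tool: the address `192.0.2.10` is a placeholder, and the port list (1319 from the question above, plus 23 and 80 as commonly open services) is an assumption to adjust for your own lockdown settings.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs a full TCP handshake, then we close it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def scan_ports(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Check each port in turn and map port number -> reachable."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example call (placeholder address and assumed port list):
#   results = scan_ports("192.0.2.10", [23, 80, 1319])
#   for port, is_open in results.items():
#       print(port, "open" if is_open else "closed/filtered")
```

Running this from a machine outside the AV subnet shows what an outsider can reach. A port reported as closed here may still be open from inside the subnet.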

Finally, there is the issue of code. The problem with this route is that I have created a code core that automates almost everything. I went about it this way because we will have 200+ rooms online by the end of this project and I wanted a way to deploy code on a room-by-room basis without needing to rewrite code each time. We have not had widespread issues like this in the past, however, so I am unsure why they would have started cropping up now.

At any rate... if you are still reading this, and you think you might be able to help, I would greatly appreciate anything you could give. Even if it's bouncing questions back and forth, every little bit will help.

I will gladly answer any questions about code or setup.

Comments

  • DHawthorne Posts: 4,584
    Are there any IP communications going on with the NetLinx masters? And I don't mean just IP devices, but any regular IP connection ... like for time servers or something like that. I've had systems that were working fine for years with no changes in the code suddenly start freezing because a DNS server entry changed, and it couldn't get online. I've had others act up for no other reason than something was going on with the network outside my control, and just as mysteriously stopped acting up ... and me never really finding out for sure what the cause of it was. Even being on a separate subnet won't help you, unless it's a VLAN, if there is a ton of sporadic traffic interfering with connections. My conclusion has been IP connections are the Achilles Heel of the NetLinx processor ... if those connections get stalled, it can bring down the whole master.
  • Hedberg Posts: 671
    These kinds of things are not new. Here's a discussion about similar things: http://www.amxforums.com/showthread.php?7903-Network-Probs-since-change-switch

    So, make sure your firmware is up to date, disable UDP if not needed, turn off zeroconfig, consider 10T/half-duplex, set your duet memory. Check your URL routing against the masters which have problems, check DNS settings and gateways and all other things IP/TCP. Maybe try to get the IT people to tell you if they've changed anything with the network (switches etc), assuming you have the top secret security clearance necessary.

    added: so, to the extent possible make the IP communications among your masters as simple and transparent as possible. Understand that this is not absolutely possible being on "their" network and using RMS and all that.
  • annuello Posts: 294
    Here are some things we have seen over the years. It may or may not apply to you, depending on your network environment. All our AMX equipment is running the most recent publicly-released firmware. We use managed Cisco switches for all network infrastructure. Our AMX gear is set up as DHCP clients with the DHCP server issuing "reserved" IPs to each unit based on MAC address. (That way the AMX item always receives the same IP address, even though it is a DHCP client. Centrally-managed "sticky" IPs are great.) Unless we are doing some network-related diagnosing, we leave all AMX gear to auto negotiate link speed, which is typically 100/Full. Panels are on the same subnet & vlan as their masters, so they locate their master using System #.

    Router configuration updates:
    In the past the network guys would push out the occasional update to their routers (a 3-second process, out of hours). I noticed that a large number of my CV7/700Vi panels would go missing in action as a result, but typically the masters would come back up okay. The "failure" rate was over 50%, but random in distribution. What was happening is that during the update the router would shut down all OSI layers except layer 1, the electrical connection (http://en.wikipedia.org/wiki/OSI_model). I.e. no traffic was routed to or from the end-points while the update was being applied. This resulted in end-points losing their IP address. The NetLinx masters are good at reacquiring an address when layers 2 to 7 come back, but the touch panels struggle with it at the best of times. They seem to get confused when they see electrical connectivity on the RJ45 jack but no link layer. (I looked at the panels' setup pages: IP = 0.0.0.0.)

    I found that rather than power-cycling the panel, all I had to do was physically disconnect the IP lead to it, wait several seconds, and plug it back in. The panel then successfully negotiated an IP address, and everything worked again. I concluded from this that the panel was not locking up but that the IP stack gets stuck when there is an electrical connection but no link layer. I suspect that it only requests a DHCP lease renewal when it sees the physical layer (layer 1) come up, as well as at the usual DHCP T1 and T2 renewal periods. I have requested that AMX consider adding a background/watchdog routine which runs every minute or so: if the panel doesn't have a valid IP, it ought to re-request one. I don't think they have taken up my suggestion. I think our network guys now shut down the ports (including layer 1) when they update the routers, but I'm not 100% sure. I haven't seen the issue for maybe a year now.

    Spanning Tree issues:
    The other issue we have seen is when spanning tree updates go a bit wonky. The Spanning Tree Protocol is basically a layer-2 protocol designed to prevent Ethernet frames going around in a loop when you have a meshed network. Meshed networks are great for fail-over redundancy, but if the spanning tree gets mucked up the switches don't know where to forward traffic - the packets can literally end up getting lost in the Ether. We recently had a new switch which was not behaving 100% upon arrival. I think it may have been responsible for mucking up the spanning tree on one particular campus. We were seeing a large percentage (> 50%) of AMX masters dropping offline and not coming back online. I was not on-site to try a disconnect/reconnect of the IP cable, but a reboot of the master would get it back online. It would potentially drop offline again anywhere from 1 hour to several weeks later. Drop-offs were random in both time and location/master. There seemed to be a bit of a correlation with DHCP lease renewal, but the one DHCP server serviced all campuses and we were only seeing the issue on the one campus. Once the faulty switch was removed the problem stopped occurring. Getting empirical evidence to diagnose a faulty spanning tree is pretty difficult, since it involves many switches and the updates to the spanning tree are somewhat random as to when they occur.

    I hope this is of some use to you.

    Roger McLean
    Swinburne University
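
For anyone wanting to picture the watchdog routine suggested in the comment above (check every minute or so, and re-request an address if there is no valid IP), here is a rough sketch of the logic in Python. It is not AMX firmware or NetLinx code. `get_ip` and `request_lease` are hypothetical callables standing in for whatever a device would use to read its current address and trigger a DHCP request, and treating a link-local (169.254.x.x) address as "no valid IP" is an added assumption beyond the 0.0.0.0 case described in the thread.

```python
import ipaddress
import time
from typing import Callable, Optional


def has_no_valid_ip(address: str) -> bool:
    """True if the address is unparsable, 0.0.0.0, or link-local (assumed invalid)."""
    try:
        ip = ipaddress.ip_address(address.strip())
    except ValueError:
        return True
    return ip.is_unspecified or ip.is_link_local


def watchdog_tick(get_ip: Callable[[], str], request_lease: Callable[[], None]) -> bool:
    """Run one check; ask for a new lease if needed. Returns True if it asked."""
    if has_no_valid_ip(get_ip()):
        request_lease()
        return True
    return False


def run_watchdog(
    get_ip: Callable[[], str],
    request_lease: Callable[[], None],
    interval_s: float = 60.0,
    max_ticks: Optional[int] = None,
) -> int:
    """Check every interval_s seconds; max_ticks=None means run forever.

    Returns the number of lease requests made (only reachable with max_ticks).
    """
    requests = 0
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        if watchdog_tick(get_ip, request_lease):
            requests += 1
        ticks += 1
        # Sleep only if another check is still to come.
        if max_ticks is None or ticks < max_ticks:
            time.sleep(interval_s)
    return requests
```

The point of the periodic check is that it does not depend on seeing layer 1 go down and come back up, which is the event the panels appear to miss when only the upper layers are interrupted.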