AMX controller locked up every couple weeks
yanbin
Posts: 86
Hi everyone, we are having a lock-up issue with an AMX system: every couple of weeks the controller locks up and needs a reboot to work again. This has been happening for the last 6 months. At the beginning we thought it was a controller issue, so we replaced the controller (NI3100), but the issue happened again. In this system, the AutoPatch, iPort dock, and Lutron lighting use RS232 control, and the rest are IR. All the AMX devices (panels, access points, controller) connect to a D-Link 8-port switch, which then connects to the house 3Com 48-port Ethernet switch. Has anyone had this kind of problem?
Thanks,
yanbin
Comments
This seemed to happen with the cheaper routers/switches (particularly the old blue Linksys WG54-whatevers). The only fix was a reboot. I still have a couple of systems out there where the owner refuses to upgrade to some kind of pro-level router/switch. My fix there is to program in a nightly reboot. That has fixed the issue.
The real way to tell if it's genuinely locked up is that the status light is frozen and not blinking.
If this is going on, there are a multitude of possible causes. Bad or corrupt firmware files, problem with the logic board, power issues. The list can be very long.
Yes, we have 2 systems rolled out just like this. There are 2 projectors that occasionally return an 'empty' string, "$FF,$FF,$FF,$FF,$FF", which eventually locks up the message queue. Or at least that's what I think. I didn't write the program myself so I'm not entirely positive, but AMX has verified that it shouldn't be the program causing it.
In the other system they do think it's the programming that's causing part of the malfunction, so we're rewriting that. At least they admit that it shouldn't be causing lock-ups though.
Edit:
Forgot to tell you it's the device controller locking up (or rather stuck in a loop) and it can't process the messages it receives.
The NetLinx IP services aren't the most robust; they've been improving in every firmware for years, but flaky networks can bring them down. I think it is not so much that the "network card" is dead, rather it seems the OS has given up talking through it.
The worst problems are those where an IP device drops offline and comes back quickly. Each time it does, it uses a new IP resource in the NetLinx, and if it happens faster than the NetLinx can clean up the dead ones, soon you have no sockets left to connect to, at which point the cleanup seems to get lost too. First you lose FTP, then IP control, then panels, and finally TELNET won't connect either. At which point a hard power reboot is the only "cure". BUT if you don't relieve the actual network cause, it's not a cure, only a treatment. It will happen again.
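To make the "faster than the NetLinx can clean up" point concrete, here is a toy model in Python. It is only a sketch of the failure mode described above, not how the NetLinx firmware actually manages sockets; the pool size and the per-tick rates are made-up numbers chosen for illustration.

```python
def ticks_until_exhausted(pool_size, reconnects_per_tick, cleaned_per_tick,
                          max_ticks=10_000):
    """Return the tick at which no free sockets remain, or None if the pool survives.

    All parameters are hypothetical: the real socket limit and cleanup rate
    of a NetLinx master are not stated in this thread.
    """
    stale = 0  # sockets still held by dead connections awaiting cleanup
    for tick in range(1, max_ticks + 1):
        stale += reconnects_per_tick              # each drop/reconnect strands a socket
        stale = max(0, stale - cleaned_per_tick)  # cleanup reclaims some of them
        if stale >= pool_size:
            return tick                           # nothing left to accept a connection
    return None


# A flapping device that outpaces cleanup eventually exhausts the pool...
print(ticks_until_exhausted(pool_size=200, reconnects_per_tick=3, cleaned_per_tick=1))
# ...while one that reconnects slower than cleanup never does.
print(ticks_until_exhausted(pool_size=200, reconnects_per_tick=1, cleaned_per_tick=2))
```

The only thing the model shows is that the outcome depends on the sign of (reconnect rate minus cleanup rate), which is why a reboot only resets the counter and the flaky network still has to be fixed.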
We also see cheap routers as a recurring issue; the customer often thinks they know IP and the hardware they put in before us should be just fine, thank you, don't go stealing my money for things I know I don't need. About the second time we roll a truck on a T&M to reboot because their $30 router lost track of DHCP or DNS long enough to mess up the NetLinx, the $200 router we recommended starts looking like a bargain.
By the way, we don't use DHCP on the NetLinx, or panels, or anything. Use STATIC, we put them in high numbers above the DHCP range of the router, and put in hard DNS on everything. And the GOOGLE public DNS at 8.8.8.8 and secondary at 8.8.4.4 work like lightning. I generally put the router as DNS 1 and GOOGLE as 2 and 3 in the DNS lists.
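A quick way to sanity-check an addressing plan like this is a few lines of Python with the standard `ipaddress` module. The subnet, DHCP range, and device addresses below are hypothetical examples, not values from anyone's system in this thread; only the Google DNS addresses come from the post above.

```python
import ipaddress


def check_static_plan(subnet, dhcp_start, dhcp_end, static_hosts):
    """Return a list of problems; an empty list means the plan looks sane."""
    net = ipaddress.ip_network(subnet)
    lo = ipaddress.ip_address(dhcp_start)
    hi = ipaddress.ip_address(dhcp_end)
    problems = []
    for name, addr in static_hosts.items():
        ip = ipaddress.ip_address(addr)
        if ip not in net:
            problems.append(f"{name}: {ip} is outside {net}")
        elif lo <= ip <= hi:
            problems.append(f"{name}: {ip} collides with the DHCP pool")
    return problems


# Hypothetical plan: router hands out .100-.199, AMX gear sits above that.
plan = {"master": "192.168.1.240", "panel-1": "192.168.1.241"}
dns_order = ["192.168.1.1", "8.8.8.8", "8.8.4.4"]  # router first, then Google

print(check_static_plan("192.168.1.0/24", "192.168.1.100", "192.168.1.199", plan))
```

Running it against the real router's DHCP range before assigning addresses catches the case where a "static" device lands inside the pool and later gets its address handed to something else.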
BTW, all the IP devices are using static IP addresses.
We use Cisco, SonicWall, and Ruckus; there is also a business-class Linksys that's actually pretty okay.
Netgear Pro line, the metal blue box ones, are a good value and pretty stable - the lowest grade I'd go for a customer. They are lifetime guaranteed, and we've had good luck with them too.
Consumer grade stuff works just fine, if you don't mind rebooting them a few times a year. Which is fine if it's your own house and you know that. Let's see, if each switch and router only locks up ONCE a year, a system will need a service call, what, six or so times a year? Just to power cycle a $20 box.
For a customer whose system just quit, they only know it quit and dammit they paid a lot for it, so get out here now and fix it! What's your time worth? Don't do it. Get better hardware.
Note, the device firmware is what (supposedly) fixes this problem; but 1.30.4 device firmware REQUIRES 3.60.447 master firmware. So upgrade the master first, then the device. Couple of caveats:
- First, turn off as much ICSP bus traffic as possible. You can send_command foo,"'RXOFF'" to every port in the system, or I guess you could load an empty program to accomplish the same thing.
- Even having done that, it is still possible to brick your NI. I did it once three weeks ago, and again on Wednesday. In addition, the AMX engineer and I each had systems we thought we'd bricked, but thankfully those two were OK. The others? PO and overnight to the hotel; not pretty.
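The upgrade order above can be written down as a small check, sketched here in Python. It assumes that "REQUIRES 3.60.447" means "3.60.447 or newer" and that versions compare numerically field by field; the starting versions in the example are invented.

```python
def parse_version(text):
    """Turn '3.60.447' into (3, 60, 447) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


REQUIRED_MASTER = parse_version("3.60.447")  # from the post above
TARGET_DEVICE = parse_version("1.30.4")      # from the post above


def upgrade_steps(master_version, device_version):
    """Return the upgrades still needed, master first, then device."""
    steps = []
    if parse_version(master_version) < REQUIRED_MASTER:
        steps.append("upgrade master to 3.60.447")
    if parse_version(device_version) < TARGET_DEVICE:
        steps.append("upgrade device to 1.30.4")
    return steps


# Hypothetical starting point: both firmwares are older than required.
print(upgrade_steps("3.50.430", "1.20.7"))
```

Comparing tuples of integers rather than raw strings matters here, since a plain string comparison would rank "1.4.0" above "1.30.4".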
Godspeed!
We've got several installations with the same customer and they all have NI3100 masters with two wired Modero panels each. From time to time the master will lose connection with the touch panels. It always seems to happen when the panels go offline and come back online a couple of times (as described above). Today it happened after loading TP files -- the master appeared to be working fine, but after loading the TP files to the TPs they didn't come back online. NetLinx Studio seemed to maintain its connection fine. Rebooted the master (soft) through Studio and all came back fine.
These are the only installations that we have had that experience with and they all use the inexpensive Linksys 5-port switch (no router on the network). I'm thinking it might be something with the switch, but maybe I should update the firmware on the master and the NI device. These are the only installs we have with the Linksys switches -- provided by the customer. I'm thinking of programming the master to reboot itself if the TPs both go offline and stay offline. Interesting, it never happens except when I'm on site loading TP files or programming or something. Maybe I need to have my chakra re-aligned.
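The "reboot if both TPs go offline and stay offline" idea boils down to a small decision rule. The Python below sketches only that logic; the real thing would live in the NetLinx program on the master, and the 300-second grace period is an arbitrary placeholder, picked so that a panel briefly dropping during a TP file load does not trigger a reboot.

```python
def should_reboot(panel_offline_since, now, grace_seconds=300):
    """Decide whether the master should reboot itself.

    panel_offline_since maps panel name -> time (seconds) it went offline,
    or None if the panel is currently online. Names and timings are
    hypothetical.
    """
    if not panel_offline_since:
        return False  # no panels known, nothing to judge by
    return all(
        since is not None and now - since >= grace_seconds
        for since in panel_offline_since.values()
    )


# Both panels have been offline for at least the grace period: reboot.
print(should_reboot({"tp1": 0, "tp2": 100}, now=400))
# One panel is still online: leave the master alone.
print(should_reboot({"tp1": 0, "tp2": None}, now=1000))
```

Requiring every panel to be offline for the full grace period keeps a single dead panel, or a short blip, from causing a reboot loop.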
I'm suspicious of the Linksys switches and am thinking about trying to get them replaced with blue-metal-box Netgear switches. We've got about 100 of the blue-metal-box Netgear switches and WAPs spread around Houston and have very few issues with them.
But, the blue box Netgear stuff is involved with the other problem we've been having and I'm going to start another thread about that as I think it's a power stability issue separate from the topic of this thread. Wouldn't want the thread Nazi to accuse me of hijacking.
So in this case the device controller is stuck in a loop: it is still sending its polling strings, but it cannot process anything that gets returned. (At least that's what it looks like if you check the diagnostics and the notifications.)
I'm not the only one having this issue; several colleagues I'm in contact with have the exact same issue. And this is happening on years-old systems. I have yet to see it in one of the systems I have personally programmed, but there's no reason for code that was working years ago to suddenly stop due to new firmware.
edit:
I've just been given the new firmware version chill was talking about; according to the release notes that came with it, it should fix this issue. The notes also explain why I haven't been having this issue: I don't tend to poll a lot, so the activity on systems I programmed is far less than what's necessary to cause these lock-ups.
I will put this firmware on the system that's been having this issue and let you know if this fixes it.