Me and my odd problems, this time an NI-3000
DHawthorne
Posts: 4,584
in AMX Hardware
I have an NI-3000 that keeps losing its network connection. There are no error messages except the touch panels falling offline and, of course, the subsequent failure of IP devices. It will restore after a reboot, but doesn't seem to run for more than a day. This system was working just fine until I was in there to update firmware; I needed to do the R4/Zigbee updates, and updated the rest while I was in there. I must also add, however, that there was some manner of power issue in the house that killed the power conditioner and shut everything down. It was when I reset and later replaced the power conditioner that I did the upgrades and started seeing this problem.
Near as I can tell, the master is not locking up. It is just unable to communicate on the network. Other network devices are fine, including the 8400's, which I can VNC into ... but they show as not connected to the master.
I have a nearly identical system in another home belonging to the same client, and with the same updates, that system is just fine. That one, however, is running on an NI-3100, not a 3000.
I had this same problem once before with another client, and again it was a 3000 ... but I never did find a real solution there. The client was so upset over having to constantly reboot the system, I didn't have any leeway to experiment; I put a timer on the thing to make it reset itself every night at 2 AM as long as it wasn't in use. It hasn't gone offline since.
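For anyone wanting to try the same band-aid, the decision behind "reset at 2 AM as long as it isn't in use" can be sketched roughly as below. This is Python for illustration only, not NetLinx code; the function name, the 30-minute idle threshold, and all parameters are assumptions rather than what was actually deployed.

```python
from datetime import datetime, timedelta


def should_reboot(now, last_activity, last_reboot=None,
                  reboot_hour=2, idle_minutes=30):
    """Decide whether a scheduled nightly reboot should fire.

    All names and thresholds here are hypothetical. The reboot fires
    only inside the configured hour, only if the system has been idle
    long enough, and only once per night.
    """
    in_window = now.hour == reboot_hour
    idle = (now - last_activity) >= timedelta(minutes=idle_minutes)
    already_done = (
        last_reboot is not None
        and last_reboot.date() == now.date()
        and last_reboot.hour == reboot_hour
    )
    return in_window and idle and not already_done
```

The caller would poll this once a minute or so and record the time whenever it triggers a reboot, so the check cannot fire twice in the same window.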
Any thoughts or similar experiences? I am leaning towards a network issue of some sort, and perhaps one that is specific to the NI-3000s. Unfortunately, I can't get to the site until tomorrow (Tuesday), and since it is a critical system (remote vacation house that has a tendency for the furnace to shut down), I can't experiment again, and will need to put a reset band-aid on it at least for the holidays. The client needs to be able to monitor the home, and the NetLinx is his gateway for that. But I want a real solution to this problem for the long term, and would appreciate any input from anyone out there with similar problems.
Tech support, I might add, is stumped. It's getting to where I can almost guarantee if I can't find an issue myself, AMX doesn't know about it either.
Comments
I had the same exact experience after upgrading firmware on an NI-3000: the NI-3000 kept dropping out of sight. Turns out the port on the Cisco switch had to be reconfigured for auto duplex instead of some sort of forced duplex. I wasn't on-site myself, nor did I talk to the network guy, but that was the feedback I received. After the Cisco switch port was reconfigured, life was good.
Interesting that the time I saw this we were forced to use the customer's network and there were managed Cisco switches involved. We tried setting the ethernet mode on the masters that were affected without success. We weren't allowed to mess with the switches -- the customer's IT guys did that. That's one reason it's tough to say what was causing the problem and what fixed it. The IT guys were messing with the switches IAW what they considered their standards to be, but they wouldn't tell us anything -- too big of a secret.
I've had several jobs where there were initial network issues with keeping touch panels online and such. Then, after somebody waved the magical Cisco wand, things cleared up.
This was the first time that I know of where a Cisco switch had to be reconfigured after an NI firmware upgrade. Things were working fine for years before that.
I wish I knew more about the networking world. I know enough to get by and sound dangerous sometimes but there is so much more to learn.
You can try using a dumb 10/100 switch in line with whatever the facilities switches are. That seemed to help in my situation.
It is just a fact that not all Ethernet PHY chips are created equal. The times I have seen the issue there were a few common causes:
1. Wiring. The termination wasn't done properly or there was an issue in the wire from the inside of the wall to the other end. In each case I have seen this a new cable run from the master to the switch (long cable run on the floor to the other end) would make it work, proving a cabling issue of some sort.
2. Too short of a cable run or too long of a cable run.
3. An incompatibility of the master with a setting on the switch.
In some cases, if an IT department is unwilling to make changes to their Cisco equipment (because Cisco is God-like and their equipment can never be mis-configured) putting a simple switch between the master and the network should resolve the issue. I hate this solution because it invariably points the finger at the AMX equipment (why does a $30 switch fix the problem?). However, in most cases it truly is a networking problem outside of the AMX equipment.
All good thoughts, except for one thing: this was a fully working system with no such issues prior to a firmware upgrade. So I cannot believe it is a wiring or network device issue. None of that was changed, or even physically touched. You will also note I have a nearly identical system for the same customer at another premises, with all the same network equipment and a nearly identical program (the biggest difference being that the working system is running on a 3100, while the bad one is a 3000).
Before the holidays, I turned off UDP broadcasts, and set the network mode to 10/half. When I came back after the new year, it was offline again. So that had no effect. I had also put a routine in the master that had it receiving timed messages from a second master in the house, so it could reboot itself on a lost connection, but that didn't work either; I'm assuming now it needs to be a cold boot.
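The detection half of that routine (timed messages from a second master, with a lost connection declared after enough of them go missing) can be sketched as below. This is Python for illustration, not the NetLinx routine described above; the class name, the 60-second interval, and the three-miss threshold are all assumptions. What to do once the link is declared lost is deliberately left out, since a soft reboot did not help in this case.

```python
class HeartbeatWatchdog:
    """Track periodic heartbeats and flag a lost link.

    Hypothetical sketch: timestamps are plain seconds supplied by the
    caller, so the logic can be exercised without a real clock.
    """

    def __init__(self, interval_s=60.0, max_missed=3):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.last_seen = None

    def beat(self, now):
        """Record that a heartbeat message arrived at time `now`."""
        self.last_seen = now

    def missed(self, now):
        """Number of whole heartbeat intervals elapsed since the last beat."""
        if self.last_seen is None:
            return 0
        return int((now - self.last_seen) // self.interval_s)

    def link_lost(self, now):
        """True once `max_missed` consecutive heartbeats have been missed."""
        return self.missed(now) >= self.max_missed
```

Requiring several missed beats rather than one avoids triggering on a single dropped message.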
My customer was livid the last time this happened. I am dreading the call I have to make very soon to get back into the job and "try" something else. It's also a 3-hour round-trip from my office, and you can't very well remote into a system that lost its network connection.
Yes, and I am more than considering that at this point, I'm looking up prices and availability. I'll get the money back in saved service calls alone even if I give it away.
Going along a totally different line of thinking, any possibility that the new power conditioner or PS is defective? Power anomalies cause all sorts of lockup issues and are just as difficult to diagnose, plus it's the only other thing that coincides with your firmware upgrade. Could also be some settings are different on the network switch after the power issue that caused the initial service call?
--John
I am happy to see that other people are having the same problem!
I have posted messages before and did not get any solution.
The problem is only with the network: it stops responding (cannot ping it), but I can access it from the console port with my laptop.
I have used Cisco (800 series), D-Link, TP-Link and Linksys routers.
I tried 10/100, half and full, without success.
I tried adding a switch and connecting the NI-3000 to either the router or the switch, without any success.
The NI-3000 is on a UPS, so forget about current variation.
No cable is longer than 50 feet or shorter than 7 feet.
I do not remember it failing after a firmware upgrade; I think it has failed since day one. I did not return it because it is my demo unit, so it gets reset every time I carry it.
On this network I have several PCs and one printer, and no other device has a problem, so to me it looks like a design problem; it might be a bad batch. Maybe we should all report the problem to AMX tech support along with our NI-3000 serial numbers.
It just did it again this weekend. I'm going to put in the daily reboot routine. I have three systems out there where I just cannot get the client to change routers/switches and the nightly reboot thing seems to be an 'okay' work around.
Too many people are having the same problem (not everybody participates in this forum, so I believe we are only seeing the tip of the iceberg).
I am sure that AMX is aware of the problem; to me it is a design flaw.
We are not talking about a cheap $29 DVD player here, and even if the warranty is out, I believe AMX should do something about it.
Putting in a reset routine is nonsense. If I buy a Ferrari (yes, I do believe that AMX is one of the best companies in this area), I expect it to be flawless. If I wanted to go cheap, I would use X10 modules and find workarounds for the problems; but putting a band-aid on expensive products like these is nonsense.
Seriously though I've never really experienced this issue but it does seem a little odd that a reboot or cold reboot of the master is the fix and not a reboot of the switch or router. That tends to make me lean towards a problem with the master but what do I know?
A little while ago we started receiving masters with a "new" firmware version from the AMX factory. I noticed that they would drop offline after two or so days on the network. A reboot was required to get the IP side working again, which would subsequently last for another two days. This was on all our masters with that firmware version (NI-700/3000/3100). The firmware was either v3.30.371 or v3.41.414; my memory is hazy. I think it must have been v3.30.371, since the latter had the Combine Levels bug, which I avoided rolling out to our fleet.
Once I updated to v3.41.422 they (all master editions) were rock-solid reliable again. I think they may have also improved the whole DHCP Renew/Rebind behaviour as well, since it seems to be working much better for me when I manually patch from one vlan to another. As part of my dispatching to our installers I now make sure that the master has at least v3.41.422, so that the master can reliably "check in" to our RMS server upon installation. I'm starting to roll out v3.50.430 and have not seen any IP issues to date (three weeks so far on approx 15 NI-3000/3100 masters).
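A pre-dispatch check like the one described ("at least v3.41.422") comes down to comparing dotted version numbers field by field rather than as strings. A minimal Python sketch follows; the function names are made up for illustration, and how the version string is actually read from a master is not shown.

```python
def parse_version(text):
    """Turn a string like 'v3.41.422' into a tuple of ints: (3, 41, 422)."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def meets_minimum(installed, minimum="v3.41.422"):
    """True if the installed firmware is at or above the minimum version.

    Tuples compare element by element, so 3.50.430 correctly ranks
    above 3.41.422 even though a plain string comparison of other
    versions (e.g. '3.9' vs '3.41') would get the order wrong.
    """
    return parse_version(installed) >= parse_version(minimum)
```

Run against the versions mentioned in this post, v3.30.371 and v3.41.414 fail the check, while v3.41.422 and v3.50.430 pass.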
Roger McLean
Swinburne University
I'm planning on swapping this particular unit out with an NI-3100 I have kicking around. I'll bench test it here and see if it does the same thing on my network. I can't rule out it's one of those truly arcane interactions between a specific model and a specific network layout. There could also be some contribution from the programming (I use several IP modules regularly, and though they are fine elsewhere, again maybe it's some arcane interaction). If the bad one acts bad here, I will be more than happy to send the whole thing out to you to test over there.
I have two masters in the field that do this consistently. When it happens the network is fine. I can unplug the master and plug it back into a different port on the switch with no results. I can swap out network cables, bla bla bla... I've beaten this issue into bloody hamburger with no success. My workaround for it is to reboot the machine each night from code. It keeps the box honest, and the clients don't know it happens as they're tucked away cozy in their beds.
I could count on it happening with any AMX NI master if the client had one of those Linksys WRT54G routers. We eventually weaned our clients off them and onto enterprise-level networking. Since then the problems have gone way down.
I never find any errors in the logs that give me any clues. My only clue is that I have my own logs that are emailed to me daily. I notice that I quit getting emailed reports. I then have the client throw the switch and manually reboot the master. The email log then comes in and I can see that all the devices that use the network (Modero TPs, etc...) fell offline at such-and-so hour. All the hard-wired things like RS-232 or AxLink are just fine.
If you hook up to the master via rs232 you can see it just fine. However, no network functionality works at all. (Ping, web page scrapes, time server, etc...)
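Since the failures take days to show up, one way to pin down when they happen is to log a periodic reachability check from another machine on the LAN and then reduce that log to outage windows. The Python sketch below covers only the second step; the function name and the (timestamp, reachable) sample format are assumptions, and the probe itself (ping, TCP connect, whatever suits the site) is not shown.

```python
def find_outages(samples):
    """Collapse (timestamp, reachable) samples into outage windows.

    `samples` is a time-ordered list of hypothetical probe results.
    Returns a list of (start, end) pairs, where `start` is the first
    failed sample and `end` is the first good sample after it. An
    outage still in progress at the end of the log has end=None.
    """
    outages = []
    start = None
    for timestamp, reachable in samples:
        if not reachable and start is None:
            start = timestamp          # outage begins
        elif reachable and start is not None:
            outages.append((start, timestamp))
            start = None               # outage ends
    if start is not None:
        outages.append((start, None))  # never came back
    return outages
```

The resulting windows can then be lined up against ISP resets, DHCP lease times, or anything else with a timestamp.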
At first I kinda thought it might have something to do with when the ISP would reset their network to reorganize the ad-hoc arrangement. That's when everybody on Comcast or AT&T DSL had to power-cycle their modems. They would then get new WAN IPs and whatnot.
This seemed to be fairly consistent with the NI network failure but not every time. Also, at no time have I ever found any evidence of a network hiccup with other devices that track such things. (Like Kaleidescape or ReQuest ARQ Link)
Those are all the hints and clues I have. To be frank, I have just written the whole thing off as a ghost in the machine. I have no more patience to buckle down and tackle the problem. The nightly reboot makes the problem go away and that's good enough for me.
I know that troubleshooting things like this can be maddening. They take too long to occur to be in any way practical to trace. I don't doubt your experience of not finding an issue.
However, in my many years on the forum, I've heard about this thing many, many times and have personally run into it myself quite often. I would challenge AMX to look into it. There is definitely a problem and it is fairly predictable and reproducible.
The problem I have as a programmer and engineer is that at our end of things it looks bad to the client to have some ghost in the machine that brings down their system. As I've said before, this issue is serious in that the client doesn't know the difference nor care. They don't call up and say, "Hey, this stupid computer network won't let me watch the game." They call and say, "Hey, your AMX S*** is broken. Come fix it now! I didn't pay $X00,000 for a pile of crap that doesn't work." Telling them that there's some minor issue between a $200 network switch/router, a $3 piece of Cat5e and a NIC that probably cost AMX a few bucks is not acceptable.
My personal opinion is that it's some kind of issue in the interaction between the router/switch ports and their management and the NIC on the NI master. But that's a W.A.G.
I see people saying that the problem went away, or went away in part, by using a Cisco managed switch.
Before somebody concludes that the problem is with cheap switches and/or routers: I was using a Cisco router (806) at the beginning, with no additional switch and an AMX access point, and I was having the same problem.
OK, I did some experiments this weekend. I dusted it off and plugged it back in.
I updated it to the latest firmware (master and device); the MVP-8400 already had the latest firmware.
The system was failing within 5 minutes!
The system was connected directly to the router. I moved the connection from the router to my switch and this time it stayed connected ... for a while. It could be that it prefers to be on a longer cable. When I was connected directly to the router, I was using a 2-meter cable (roughly 6' 6"). Minimum length, if I remember, is 1 meter.
I found that if I unplug the network cable and reconnect it, it comes back to life without any rebooting; I do not think it was coming back by itself before the firmware upgrade.
I will do some more experiments and keep you posted.
I'd be interested to know if it's a short cable issue ... because I did have that problem years ago with a D-Link switch. In that case it was the panels ... connected directly, they wouldn't work, but if I coiled up 30' or so of Cat 5 and coupled it to the lines, they were fine. Without my extensions, the lines were about 20'. The problem went away when I replaced the D-Link with a NetGear. I was told it had something to do with the auto-speed negotiating protocols.
OK Dave, here is what I have found so far:
I used the same length of cable and connected the NI-3000 directly to the router, and the first time it failed within half an hour, so cable length is not the solution.
Tech note 656 seems to imply that NI controllers work better if you do not auto-negotiate with the switch/router, so it is now set to 100 full.
And tech note 557 says that NI controllers have or had problems with Netgear routers at 100 Mb (Dave, I believe that is the one you had problems with in the past).
I will let it run and see if it fails.
Everything is on a UPS, so AC cannot be at fault.
If it fails again, I will try 10 Mbps/half.
If it fails again, I will open the NI-3000, find the chip that is responsible for network communication and replace it.
I will keep you posted
Brgds Yves Laurin
Les Systemes Umad
Cable length does not seem to do anything.
The NI-3000 stops responding earlier when connected to the router (it lasts longer on the switch).
Setting the Ethernet port to 100 Mbps/full did not change anything.
This seems to imply that some routers/switches are better made than others; but that does not explain why the NI-3000 is the only device I have that fails on the network. The firmware should just do what Windows XP does (automatically re-establish the link).
Again, removing and reconnecting the network cable on the NI-3000 was enough to re-establish the link, with no need to reboot the NI-3000.
I am now trying 10 Mbps/half, but I do not have much hope for this.