About the crash that just happened: did the processor totally lock up, or did it just stop accepting connections?
I've got two different installations that seem to lock up right around the one-month mark. I have been reworking the code to minimize feedback updates, but I still haven't been able to pin the lockups on that specifically.
I don't have anything unexpected popping up in system diags, and I know that I don't have any buffers (that I wrote) filling up. If I get a chance, I'll list all of the devices/comm modules the two jobs have in common and see if anything matches yours.
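Minimizing feedback traffic usually comes down to sending a channel or level update only when the value actually changes, rather than re-asserting it on every timer tick. A minimal Python sketch of that idea; `send` is a hypothetical callback standing in for whatever actually pushes feedback to a panel, not a NetLinx API:

```python
from typing import Callable, Dict, Hashable


class FeedbackCache:
    """Forward a feedback update only when a channel's value actually changes."""

    def __init__(self, send: Callable[[Hashable, object], None]) -> None:
        # `send` is a hypothetical stand-in for whatever pushes state to a panel.
        self._send = send
        self._last: Dict[Hashable, object] = {}

    def update(self, channel: Hashable, value: object) -> bool:
        """Send the value if it differs from the last one sent; report whether it did."""
        if channel in self._last and self._last[channel] == value:
            return False  # unchanged: suppress the redundant update
        self._last[channel] = value
        self._send(channel, value)
        return True


# Example: a timer re-asserts the same state 100 times, but only 1 update goes out.
log = []
cache = FeedbackCache(lambda ch, v: log.append((ch, v)))
for _ in range(100):
    cache.update(("panel", 1), True)
print(len(log))  # 1
```

The trade-off is memory for one cached value per channel; if a panel reboots and loses state, the cache would need to be cleared so everything is re-sent.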
Totally locked up, Jeff. Totally.
I lucked out: I was at lunch when it happened, so a junior tech got to go out (we're having a small blizzard here today) to reset the box. They just unplugged and restarted everything. Nothing is showing up in the logs, which makes this one very hard to figure out. All I can say is that it is NOT the AMX (99% sure of it), and the programming is the same as in 30 other rooms that have run without a hitch for almost a year with this particular projector model. I just ordered a UPS to try.
Just a question for everyone involved in this thread: what type of UPS are you using? I think Richard mentioned that his has tracking abilities. The one they sent me has no such capabilities, and I would like one that can create a log file if possible.
I know that various APC models have a serial port that talks to computers. I'm guessing it uses a simple communication protocol that could be coded to work with an AMX master. In fact, I'm feeling slightly inspired and will add this to my list of new developments. I would like to be able to show a client when they are having power issues, or at least email them or myself when power problems arise.
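The guess about a simple serial protocol is plausible: APC's smart-signalling protocol is commonly documented as plain ASCII (send a one-character command, read a short text reply). The Python sketch below covers only the decoding side: turning a status reply into flags and spotting new problems worth an email. The bit meanings are an assumption based on commonly published documentation of the "Q" status command; the serial I/O and the actual email step are left out.

```python
from typing import Dict, Set

# ASSUMPTION: bit meanings as commonly documented for the APC smart-signalling
# "Q" (status) reply. Verify against your own UPS model's documentation.
STATUS_BITS: Dict[int, str] = {
    0x01: "runtime calibration",
    0x02: "SmartTrim",
    0x04: "SmartBoost",
    0x08: "on line",
    0x10: "on battery",
    0x20: "output overload",
    0x40: "battery low",
    0x80: "replace battery",
}

# Flags worth telling the client (or yourself) about; this selection is a choice.
ALERT_FLAGS: Set[str] = {"on battery", "battery low", "output overload", "replace battery"}


def decode_status(reply: str) -> Set[str]:
    """Turn a two-digit hex status reply such as '08' into a set of flag names."""
    value = int(reply.strip(), 16)
    return {name for bit, name in STATUS_BITS.items() if value & bit}


def new_alerts(previous: Set[str], current: Set[str]) -> Set[str]:
    """Alert-worthy flags set on this poll that were not set on the previous one."""
    return (current - previous) & ALERT_FLAGS


# Example: the unit goes from mains to battery between two polls.
before = decode_status("08")
after = decode_status("10")
print(new_alerts(before, after))  # {'on battery'}
```

Alerting only on newly set flags keeps a sustained outage from generating an email on every poll.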
I'll research the APC product line and communication protocol a little and let you know what I find.
I use (almost exclusively) rackmount Smart-UPS units from APC. These units have a slot into which a network management card can be inserted, and I always include the network management option as part of the UPS. This gives you SNMP management of the UPS, including monitoring and the ability to send alerts to a management console; you can also monitor via the web or telnet, and you can have the UPS send email (to multiple addresses) when an event occurs (you determine the events and the severity that trigger alerts). There is also a log file if you prefer to check it periodically, but the UPS can alert you to events as they occur through a variety of means. I realize the UPS and the management card are expensive (in a relative sense), but the management capabilities are very valuable. SNMP support in firewalls (GETs for inbound and alerts for outbound) is common; it has been around for many years. As with any network service, there are some security issues to pay attention to when using SNMP, but it is possible to monitor environments remotely with reasonable security. We are actually beginning to package this capability as part of our maintenance contracts with commercial customers and some advanced residential customers.
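The management card does this event filtering itself through its own configuration. The Python sketch below only models the idea (each event carries a severity, each recipient list has a minimum severity) so the same logic could be reproduced elsewhere, for example on a master. The severity levels, rule layout, and addresses are illustrative assumptions, not APC's configuration model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class AlertRule:
    """Hypothetical rule: who gets mailed, and from which severity upward."""
    recipients: List[str]
    min_severity: Severity = Severity.WARNING


@dataclass
class UpsEvent:
    name: str
    severity: Severity


def route(event: UpsEvent, rules: List[AlertRule]) -> List[str]:
    """Collect every recipient whose threshold the event meets, without duplicates."""
    out: List[str] = []
    for rule in rules:
        if event.severity >= rule.min_severity:
            for addr in rule.recipients:
                if addr not in out:
                    out.append(addr)
    return out


# Example: the tech hears about everything; the client only about critical events.
rules = [
    AlertRule(["tech@example.com"], Severity.INFO),
    AlertRule(["client@example.com", "tech@example.com"], Severity.CRITICAL),
]
print(route(UpsEvent("on battery", Severity.CRITICAL), rules))
```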
I realize that rackmount UPSs are not an option for every installation; there are some smaller-footprint UPSs that also offer the management card option. I also realize that a managed UPS might not be an option for cost reasons in some residential installations, but they can be powerful tools for troubleshooting power issues (outages, sags, surges, overvoltage at the outlets, battery-charging problems, etc.).
One last comment: a number of people are awaiting the release of the new Master firmware version 3 for Duet support. While I am interested in Duet, another intended feature of version 3 (unless it has been dropped) is SNMP support in the Master. The more management tools we can take advantage of on the Master and in the NetLinx environment, the better we will be able to troubleshoot problems like the one(s) that started this thread. This is one of the reasons I almost exclusively use SNMP-managed network switches, wireless access points (the lack of SNMP in the WAP-200 is a negative), and UPSs. Hopefully we will soon be able to add NetLinx masters to that list. With proactive monitoring of customer sites and alerts for problem conditions, our service response to customers will become better and more efficient. To some extent, this is what AMX has been trying to accomplish with RMS. Sorry for the extended response; perhaps a new thread on management/monitoring would be in order.
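As a rough illustration of the power problems listed above, a single voltage reading can be labelled relative to the nominal supply. This Python sketch looks at one reading at a time, so it cannot tell a brief surge from sustained overvoltage (that needs timing); the tolerance and outage thresholds are placeholder assumptions, not vendor figures.

```python
from typing import Optional


def classify_voltage(volts: float, nominal: float,
                     tolerance: float = 0.10,
                     outage_fraction: float = 0.5) -> Optional[str]:
    """Label one RMS voltage reading relative to the nominal supply voltage.

    tolerance and outage_fraction are placeholder fractions of nominal,
    not thresholds taken from any UPS vendor.
    """
    if volts < nominal * outage_fraction:
        return "outage"
    if volts < nominal * (1 - tolerance):
        return "sag"
    if volts > nominal * (1 + tolerance):
        return "overvoltage"
    return None  # within tolerance


readings = [121.0, 104.0, 0.0, 135.0]
print([classify_voltage(v, nominal=120.0) for v in readings])
# [None, 'sag', 'outage', 'overvoltage']
```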
Reese
P.S. To further Jeff's point, integrating UPS information into the Master/TP for the customer is pretty simple. He noted that some UPS have serial ports although more of the newer models have switched to USB ports. However, the network management card could also be used to obtain UPS information since a telnet interface and other network interfaces are supported. You could use the i!-Equipment Monitor module to accept incoming alert emails from a UPS in which case very little code would need to be written.
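The email route means the master only has to recognize a few phrases in an incoming alert. The i!-Equipment Monitor module's interface isn't described in this thread, so this is only a Python sketch of the parsing idea using the standard `email` package; the keyword list is hypothetical and would have to match what your UPS actually sends.

```python
from email import message_from_string
from typing import Optional, Tuple

# Hypothetical keywords; real UPS alert emails differ by vendor and firmware.
KEYWORDS = {
    "on battery": "ON_BATTERY",
    "power restored": "ON_LINE",
    "battery low": "BATTERY_LOW",
}


def classify_alert_email(raw: str) -> Tuple[Optional[str], str]:
    """Return (event code or None, sender) for a raw RFC 822 message."""
    msg = message_from_string(raw)
    subject = (msg.get("Subject") or "").lower()
    body = msg.get_payload()
    # Only simple single-part bodies are searched in this sketch.
    text = subject + " " + (body.lower() if isinstance(body, str) else "")
    sender = msg.get("From", "")
    for phrase, code in KEYWORDS.items():
        if phrase in text:
            return code, sender
    return None, sender


raw = "From: ups@site.example\nSubject: UPS: On battery power\n\nSwitched to battery.\n"
print(classify_alert_email(raw))  # ('ON_BATTERY', 'ups@site.example')
```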
Thanks Reese, that's what I was looking for. The UPS that was sent to me is made by Powerware. The RS-232 connector on the box was just a multi-contact connector that offered no logging or monitoring capabilities. Not that Powerware isn't also a good company.
First, we connect to a lot of devices: amps, power control, you all know the range.
I sometimes get suspicious about what is going on with ground/earth on some devices. I once had to cut the ground of a serial cable to get a device to talk properly.
Second, on the subject of UPS monitoring, I've created a new thread here: http://www.amxforums.com/showthread.php?s=&threadid=523
Hello everyone involved in this thread! What you were talking about sounded like black magic to me until I faced the same $/i# last week. The symptoms are the same, except that the NetLinx master doesn't lock up; it's just a whole bunch of AXLink devices falling offline for no reason. It seems to me that something is wrong with the ME-260 (which is used as the master on this installation). I checked the program and don't think it is the issue. Last night I simplified the program to the "push the switch, turn on the light" level: same thing. The system works for about 8-9 hours, then the logs show something like:
Line 92 :: BE-86-FF - 19:37:00
Line 93 :: BE-0 -86 - 19:37:00
Line 94 :: BE-86-0 - 19:37:00
Line 95 :: BE-0 -86 - 19:37:00
Line 96 :: BE-BF-0 - 19:37:00
Line 97 :: FastSettleBus - 19:37:00
Line 98 :: FastSettleBus - 19:49:02
Line 99 :: CLEAR CHECK DEVICE 201 - 19:49:02
Line 100 :: CIpEvent::OffLine 201:1:1 - 19:49:02
Line 101 :: ?1 - 19:49:03
Line 102 :: ?1 - 19:49:03
Line 103 :: **POLL LOST** - 19:49:03
Line 104 :: BE-BF-86 - 19:49:03
Line 105 :: FastSettleBus - 19:49:03
Line 106 :: CLEAR PROCESS BYTE 201 - 19:49:03
Line 107 :: CIpEvent::OnLine 201:1:1 - 19:49:03
Line 108 :: BAD CARD EDGE 201 0 - 19:49:08
Then I have AXLink devices falling off-line one after another.
On the UPS thing: I have a really nice 30 kW online UPS from MGE covering the whole site's power (lights, all electronics, AMX), so it doesn't seem to be a power-line failure.
Does anyone have an idea what the "BE-BF-86" that recently showed up in my logs means?
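When a fault only shows up after 8-9 hours, it helps to pull the master's log into something that can be counted and lined up by time. A small Python sketch, assuming the log has been captured as text lines in the "Line N :: message - HH:MM:SS" shape shown above; the markers to watch are simply the ones that appear in this excerpt.

```python
import re
from collections import Counter
from typing import List, NamedTuple, Optional

# "Line <n> :: <message> - <HH:MM:SS>", as in the excerpt above.
LINE_RE = re.compile(r"^Line\s+(\d+)\s+::\s+(.*?)\s+-\s+(\d{2}:\d{2}:\d{2})\s*$")

# Markers taken from the excerpt; extend as needed.
WATCH = ("FastSettleBus", "BAD CARD EDGE", "POLL LOST", "OffLine")


class LogEntry(NamedTuple):
    number: int
    message: str
    time: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its number, message, and timestamp."""
    m = LINE_RE.match(line)
    if not m:
        return None
    return LogEntry(int(m.group(1)), m.group(2), m.group(3))


def summarize(lines: List[str]) -> Counter:
    """Count how often each watched marker appears in the parsed messages."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip anything that is not in the expected shape
        for marker in WATCH:
            if marker in entry.message:
                counts[marker] += 1
    return counts
```

Feeding it a whole capture and comparing counts per hour would show whether the bus errors build up gradually or start abruptly.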
Axlink is a form of RS-422. It only takes one flaky device to mess up the entire bus. I would suspect one of the Axlink devices has issues.
Those hex numbers look more like subnet addresses to me. Telnet (or connect via terminal) into the master, type "show database", and see if anything in there matches up; it may help narrow down the flaky device.
You might want to double-check your Axlink wiring to make sure there are no shorts or anything. I seem to remember seeing something similar on a site where the client had pulled on a wired touch panel, and the wires on the connector had come out and were shorting.
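If the dash-separated fields in entries like "BE-BF-86" are hex-encoded numbers (a guess; the log format isn't documented in this thread), converting them to decimal makes them easier to compare against the device numbers listed by "show database". A Python sketch:

```python
from typing import List


def hex_fields(token: str) -> List[int]:
    """Split a dash-separated log token like 'BE-BF-86' and read each part as hex."""
    return [int(part.strip(), 16) for part in token.split("-") if part.strip()]


print(hex_fields("BE-BF-86"))  # [190, 191, 134]
print(hex_fields("BE-0 -86"))  # [190, 0, 134]
```

For what it's worth, 0xBE is 190, which happens to be the device number of the AXD-IR+ that turns out to be the culprit later in this thread; whether that is more than coincidence is not established here.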
Originally posted by alexanbo: "You might want to double check your axlink wiring to make sure there's no shorts or anything. I seem to remember seeing something similar on a site and what had happened was the client had pulled on a wired touchpanel and the wires on the connector had come out and were shorting."
Ah, I had forgotten that. I had exactly that happen when I hurriedly uninstalled an Axlink device and forgot to unplug the wire from the bus. It shorted intermittently and caused all manner of obscure errors, eventually knocking out the Axlink bus (thankfully without damaging it; everything worked once I unplugged it, after smacking myself).
Take a look at AMX Tech Note 433, as it discusses Axlink device problems, in particular FastSettleBus and Bad Card Edge. The Tech Note discusses them in the context of a system with a large number of Axlink devices, but primarily during boot rather than after the system is running, so your situation might be different.
I would give Tech Support a call. There is a note about these messages appearing when the total capacitance of your Axlink bus exceeds the threshold, in which case AMX recommends AXB-SPE Slave Port Expanders to increase the overall Axlink bus length and eliminate the problem. This may be an issue if you have a large number of Axlink devices and/or a long Axlink bus. If the system has been running fine for an extended time and this is a new occurrence, I would instead focus on the most recently changed component, if there is one. Just out of curiosity, what kind of device is 201:1:1?
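The capacitance threshold amounts to a budget: capacitance grows with cable length and with every device hung on the bus, since devices in parallel add their capacitance. This Python sketch assumes a simple linear model (per-foot cable capacitance plus a fixed figure per device); the actual per-foot, per-device, and limit values are not given in this thread and would have to come from AMX's documentation.

```python
from typing import Sequence


def bus_capacitance_pf(cable_length_ft: float, cable_pf_per_ft: float,
                       device_pf: Sequence[float]) -> float:
    """Total bus capacitance: the cable run plus each device's input capacitance.

    All inputs are assumptions to be filled in from the manufacturer's data.
    """
    return cable_length_ft * cable_pf_per_ft + sum(device_pf)


def needs_expander(total_pf: float, limit_pf: float) -> bool:
    """True when the bus is over its capacitance budget (e.g. time to split it)."""
    return total_pf > limit_pf
```

With real figures plugged in, this shows how much headroom a bus has before adding another device or another run of cable.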
It would seem device 201 is faulty: maybe the cabling, the firmware, or simply the device itself. I had a CP4 (which I also happen to address in the low 20x range) with a problem right out of the box, so that's where I'd start.
Unfortunately, I've run into this at more than one job site. It always came down to either a bad Axlink device or shorted wiring. A lot of times techs will terminate the Axlink connector BEHIND the actual clip, so it looks like it's connected but is really only hanging on by a thread. The best way I know to troubleshoot this is simply to disconnect EVERYTHING in the field that has to do with AXLINK and hook things up one at a time until you find the culprit.
OK. Finally. It no longer locks up, and there are no more AXLink problems. The troublemaker was found, and it was an incredible thing, the last one you'd think to troubleshoot... It was an AXD-IR+ (device no. 190 on AXLink, hanging next to the MSP8, device no. 201). It was really weird to find out that it was picking up interference from fluorescent lamps (20 Philips lamps with dimmable ballasts hanging right in front of the wall where the AXD-IR was installed) and probably from the sun (well, there's sun in Russia too). In chronological order, it was: a.) shielded from all radiation with the holy help of masking tape; b.) watched for BAD behaviour for about 48 hours, which revealed no weird messages or AXLink hangups; c.) exposed to the radiation again, which set off everything I described before; d.) removed from the system (how would it function with masking tape on it?). There we go!!!! Just to make sure it really was the AXD-IR picking up the interference, it's still device no. 190 but is now installed in a really dark corner of the basement, and it has been there trouble-free for about 12 hours.
Okay, the box just locked up even with a UPS on it. The next thing I'm testing is temperature. Oddly, I was checking this box remotely earlier today and noticed that on Sunday at 21:58 everything went offline and then came back online. I'm going to check the UPS log later tonight.
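The disconnect-everything-and-reconnect-one-at-a-time procedure is a linear search, sketched below in Python. `bus_ok` is a hypothetical stand-in for the manual check after each step (watching the log and confirming devices stay online); since the fault in this thread took hours to appear, each step needs a soak time at least that long. It also assumes a single bad device: if the problem only appears with a combination of devices, it reports the device that completed the bad combination.

```python
from typing import Callable, List, Optional, Sequence


def find_culprit(devices: Sequence[str],
                 bus_ok: Callable[[List[str]], bool]) -> Optional[str]:
    """Reconnect devices one at a time; return the first whose addition breaks the bus.

    bus_ok(connected) is a hypothetical stand-in for the manual health check
    done after each device is plugged back in.
    """
    connected: List[str] = []
    for device in devices:
        connected.append(device)
        if not bus_ok(connected):
            return device  # the bus went bad right after this one was added
    return None  # everything reconnected without a fault
```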
I don't want to tell you things you probably already know, and that I've mentioned here before, but have you applied the Queue_and_Threshold_Sizes.axi patch?
This has made a difference, at least so far, to what was an unreliable system that now has two UPSs on it: one for the master (and two DMSs) and one for the 15" panel.
And yes, check the temperature. I've tested an NI-3000 at 44 °C and it was okay. The spec is 50 °C for non-display equipment and 40 °C for any equipment with a display.
I did wonder whether heat is why that particular box crashes sometimes, as I think it may have approached 50 °C on a really hot day, but we are now in autumn (fall to some of you), so I won't see a hot day for 8-9 months.
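Logging rack temperature against the limits quoted above (50 °C for non-display equipment, 40 °C for equipment with a display) is a cheap way to test the thermal theory. A Python sketch, assuming readings come from a temperature sensor you already poll; the 5 °C early-warning margin is an arbitrary choice, not an AMX figure.

```python
from typing import Dict

# Limits as quoted in the post above; confirm against the product spec sheets.
SPEC_LIMIT_C: Dict[str, float] = {"display": 40.0, "non-display": 50.0}


def thermal_status(reading_c: float, kind: str, margin_c: float = 5.0) -> str:
    """Classify a temperature reading against the spec limit for the equipment kind.

    margin_c is an arbitrary early-warning band below the limit.
    """
    limit = SPEC_LIMIT_C[kind]
    if reading_c >= limit:
        return "over limit"
    if reading_c >= limit - margin_c:
        return "warning"
    return "ok"


print(thermal_status(44.0, "non-display"))  # ok
print(thermal_status(38.0, "display"))      # warning
```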
See attached. The file is generally distributed as part of the Design XPress installation, which is where I got my copy. The attached file is the original, although I modified my own copy (not reflected in the attachment) after later AMX Tech Notes recommended higher thresholds when using a large number of DMS keypads. I am not aware of a Tech Note that links to the file, but there may well be one.
I had the same issue happen to me. In my case, I looked at the NetLinx Diagnostics and the controller was getting absolutely POUNDED with network traffic that was totally unnecessary.
In my case, I was using the AWFUL Duet module for the Sonance iPort. Everything on that controller was locking up, control was doing funky things, etc. I changed up the code, replaced the Duet module with a NetLinx module, and voila, no more lockups. It was a matter of tons and tons of network traffic bringing the controller to its knees.
If this is not the case but the problem keeps occurring, there is a stopgap you can use in the meantime. It's flaky and kind of a nonstandard thing to do, but you can force the controller to reboot every day at 2 AM (or whenever it's not in use) to reset everything. I did that for a couple of weeks while I was fixing my problem.
Forgive me if someone has already brought this up; I didn't read all 5 pages of posts.
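One way to catch this kind of flooding early is to count messages over a sliding time window and flag when the rate is abnormal. A Python sketch; the window length and threshold are hypothetical and would need tuning against a baseline from your own system.

```python
from collections import deque
from typing import Deque


class RateMonitor:
    """Sliding-window message counter that flags when a rate threshold is exceeded."""

    def __init__(self, window_s: float, max_messages: int) -> None:
        self.window_s = window_s          # hypothetical window length in seconds
        self.max_messages = max_messages  # hypothetical threshold per window
        self._times: Deque[float] = deque()

    def record(self, t: float) -> bool:
        """Record one message at time t (seconds); return True if over the limit."""
        self._times.append(t)
        # Drop timestamps that have fallen out of the window ending at t.
        while self._times and self._times[0] <= t - self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_messages
```

A deque keeps both the append and the expiry at O(1) per message, so the monitor itself adds negligible load.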
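The scheduling side of a nightly reboot comes down to "when is the next 2 AM?". On the master itself this would live in the control program; the Python sketch below only shows the date arithmetic, using naive local time (so it ignores daylight-saving transitions).

```python
from datetime import datetime, timedelta


def next_reboot(now: datetime, hour: int = 2, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after `now` (naive local time)."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return target


print(next_reboot(datetime(2024, 1, 1, 1, 30)))  # 2024-01-01 02:00:00
```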
Comments
Nice UPS you've got there.