Rebooting master
Danny Campbell
Posts: 311
in AMX Hardware
I have a customer that has asked to have an automatic reboot of the masters on a scheduled basis.
They have one system that locks up after about 3 months even after the threshold and queues code has been added in. Now they want to reboot this and all of the other systems every day.
While this may address the symptoms for the one system, I feel uneasy about doing this every day, and on all systems (100+).
Does anyone have any feeling (pro or con) about doing this?
Danny
I had a similar issue. Auto reboots are out of the question, in my opinion. What I ended up doing was putting the problematic master on a power strip with switched outlets and a front-access button that switches the power on and off. The master was the only thing plugged into it. If the system went down, locked up, etc., the client would have to go push the button. They complained at first, but now it's not an issue, given how rarely the malfunction occurs.
Just a thought . . .
Danny
But anyway, the solution I had for reboots is that I put a timeline in that reset another timeline on a second master. If the second master didn't hear from the first, it fired a relay that rebooted it. That way, it only rebooted if there was a hang of some sort, or enough sluggish behaviour that it timed out.
A single-master solution along the same lines would be to put a timer relay on the power to the master, and use a heartbeat timeline and one of the onboard relays (or an IO port) to pulse it frequently enough that it stays closed all the time. If the pulses stop for any reason, the relay times out and resets the power. In effect, a deadman switch.
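To make the deadman idea concrete, here is a minimal simulation sketch in Python (not NetLinx, and not anyone's production code). The class name, the one-second tick, and the timing values are all assumptions chosen for illustration; a real timer relay and heartbeat timeline would have their own timings.

```python
class DeadmanRelay:
    """Simulated timer relay (hypothetical): holds power on while pulses keep arriving."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_pulse = 0
        self.power_cycles = 0

    def pulse(self, now):
        # Heartbeat from the master: restart the timeout window.
        self.last_pulse = now

    def tick(self, now):
        """Return True if the relay timed out and cycled power at this instant."""
        if now - self.last_pulse >= self.timeout_s:
            self.power_cycles += 1
            self.last_pulse = now  # master restarts; begin a fresh window
            return True
        return False


def simulate(heartbeat_s, timeout_s, hang_at_s, total_s):
    """Step time in 1 s ticks; the master hangs at hang_at_s (None = never hangs).

    Returns the list of times at which the relay power-cycled the master.
    """
    relay = DeadmanRelay(timeout_s)
    reboots = []
    hung = False
    for now in range(total_s + 1):
        if now == hang_at_s:
            hung = True
        # A healthy master pulses the relay on every heartbeat interval.
        if not hung and now % heartbeat_s == 0:
            relay.pulse(now)
        if relay.tick(now):
            reboots.append(now)
            hung = False  # assume the power cycle clears the hang
    return reboots


if __name__ == "__main__":
    print(simulate(heartbeat_s=5, timeout_s=15, hang_at_s=42, total_s=120))
    print(simulate(heartbeat_s=5, timeout_s=15, hang_at_s=None, total_s=120))
```

The point of the sketch is that a healthy master is never rebooted: power is only cycled once the pulses have been missing for a full timeout period. The same logic applies to the two-master version, with the second master playing the role of the relay.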
Fred
Why? I mean, from reading your post, it's apparently not the reboot that is the issue; it's having it happen automatically. In what way is a manual switch better?
Just curious
Fred
Why reboot when not needed?
And even with scheduled reboots, the system may still lock up in between, so they don't really accomplish anything.
If the lockup is consistent then there are other issues to be resolved...
I can understand the client's expectation that the system should function every time it's put to use. But I have a hard time swallowing auto reboots. I'm not going out on any limb by saying I would much rather focus on getting to the root of the problem than covering it up with reboots. Sounds too Micro$oftish to me. I expect more from an AMX system.
I'm sure you've already been down this path, so don't feel obligated to answer any or all of these questions that are crossing my mind:
1) Are all 100 systems programmed exactly alike with the same hardware?
2) Is there any disk I/O performed on the system that locks up?
3) When you say locked up, does the output and/or input LED lock on solid?
4) Can you telnet into the locked up system?
5) What kinds of devices are in the system? Any IP? Lots of RS-232?
6) Are there many reoccurring TIMELINEs?
7) Have you tried monitoring messages to see if memory is leaking or any run-time errors occur?
If there is a 3 month pattern of failure then something must be wrong and more rocks need to be uncovered.
I've heard of dead man's curve but never dead man's switch, so naturally it had to be googled. Interesting background.
http://en.wikipedia.org/wiki/Dead_man's_switch
Another thing, now that I am thinking about it, is I like to put i!-EquipmentMonitor in my jobs when I can to fire off an e-mail whenever a system starts up. That will alert me to potential problems, and in the above case, it alerted me when there was a reboot.
Do you remember which one of these devices that was? There seems to be a bunch of them out there, ranging from the very inexpensive to the quite expensive. I've been wondering how reliable they are in general and, when the inevitable interruption occurs, how good they are at regaining connections.
1) Are all 100 systems programmed exactly alike with the same hardware?
No. This is the only one that locks up and it is a one-of-a-kind system that is naturally used by the top execs. I've given a brief description of it at the bottom of this post.
2) Is there any disk I/O performed on the system that locks up?
No.
3) When you say locked up, does the output and/or input LED lock on solid?
It has been so long that I can't remember. I do remember that the LEDs for the RS-232 ports are all dead. No RX or TX.
4) Can you telnet into the locked up system?
No.
5) What kinds of devices are in the system? Any IP? Lots of RS-232?
No IP devices, but it does use IP to get to a database for a dialing directory for the video conferencing system. Lots of RS-232. All 7 ports on the NI-4000 and three COMM-2 cards.
6) Are there many reoccurring TIMELINEs?
I use three timelines in the main program. Two begin at startup and repeat constantly. The third is used as a shutdown timer. If there are no button presses for two hours after 5:30PM, the system shuts down. If a button is pressed, the timeline is killed and restarted if it is between 5:30PM and 7:00AM. There are also 10 modules in use. A few I had written but most are from AMX, so I don't know what they are doing.
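For readers following along, the shutdown-timer rule described above can be sketched roughly as follows. This is Python rather than the actual NetLinx code, and every name in it (ShutdownTimer, arm, on_button_press) is hypothetical. It also assumes the timer is first armed at 5:30PM by some scheduled event, which the description implies but does not spell out.

```python
from datetime import datetime, time, timedelta

AFTER_HOURS_START = time(17, 30)  # 5:30PM
AFTER_HOURS_END = time(7, 0)      # 7:00AM
IDLE_LIMIT = timedelta(hours=2)   # shut down after two idle hours


def in_after_hours(t):
    """True between 5:30PM and 7:00AM (the window wraps past midnight)."""
    return t >= AFTER_HOURS_START or t < AFTER_HOURS_END


class ShutdownTimer:
    """Hypothetical stand-in for the shutdown timeline."""

    def __init__(self):
        self.deadline = None  # None means the timer is not running

    def arm(self, now):
        # Assumed to be called at 5:30PM by a scheduled event.
        self.deadline = now + IDLE_LIMIT

    def on_button_press(self, now):
        # Kill the timer, then restart it only during the after-hours window.
        self.deadline = None
        if in_after_hours(now.time()):
            self.arm(now)

    def should_shut_down(self, now):
        return self.deadline is not None and now >= self.deadline
```

A press during the day leaves the timer stopped, while a press at, say, 7:00PM pushes the shutdown out to 9:00PM.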
7) Have you tried monitoring messages to see if memory is leaking or any run-time errors occur?
I've suggested that we do this before we do any crazy rebooting. My biggest problem is that now whenever anything odd happens, they cry "lockup" and someone reboots the system without doing any diagnostic work at all. They will not leave the system in the locked state long enough for me to get there, and I'm only 15 minutes away.
This is a video conferencing room that has 8 cameras, the codec, 8 38" DLP cubes, one 60" DLP cube, a Zandar video processor, two Sierra A/V switches, a DSS receiver, a DVD player, a VHS player, and a couple of Vortex EF systems.
The design was to have an auto tracking ability built into the system so one of the four eye-level cameras would point to whoever was speaking based on the input from one of 12 table-mounted microphones. After a few seconds of quiet, or when several people are speaking at once, the view would switch to a quad-view of the entire room using the Zandar and the wall cameras. This is controlled by the Vortex, and does cause more RS-232 I/O than the standard systems. However, this same feature is used in one other unique room which does not have any lockup issues.
Did I also mention two separate speaker systems? Everything is RS-232 controlled. Basically, the table is a giant square with nothing in the center. The DLPs are mounted with two along each side (on the inside), so the participants can look down slightly to see the far side of the video conference plus the other monitor for the near side.
With the exception of the Zandar, all of the equipment used in this room is used in some of the other rooms. Much of the equipment used in this room is used in another room that uses 8 DLP cubes in a videowall arrangement, has fewer cameras, but adds a Yamaha surround receiver and a Barco into the mix. This is the same one that does the camera tracking trick that has not had a lockup.
I'm sure that there is some kind of memory leak type of issue that is killing the system, but there is no way to reproduce the problem on demand.
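If free-memory readings can be logged periodically (however they are collected), one rough way to test the leak theory without waiting three months is to fit a straight line to the readings and see whether free memory is trending toward zero. The sketch below is Python and purely illustrative: the function name and the sample format are assumptions, and it presumes the leak is roughly linear, which may not hold.

```python
def leak_trend(samples):
    """Least-squares line through (hours_since_boot, free_bytes) samples.

    Returns (slope_bytes_per_hour, hours_at_zero). hours_at_zero is the
    uptime at which the fitted line reaches zero free memory, or None if
    free memory is not shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((t - mean_t) * (m - mean_m) for t, m in samples) / var_t
    intercept = mean_m - slope * mean_t
    if slope >= 0:
        return slope, None  # flat or growing: no leak visible in this data
    return slope, -intercept / slope
```

A projected exhaustion point of about three months of uptime would at least be consistent with the observed lockups, though it would not prove the cause.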
I believe that I've talked them out of rebooting all systems on a schedule, but I also believe that every morning someone goes in and reboots this one system. In fact, the reason I started this thread was to gather ammunition on why they should not do auto-reboots.
Thanks. That's what I'm trying to get them to let me do. In fact, I downloaded your module a few days ago with this in mind.