Redundant Processors

matt95gsr · May 2007

Has anyone here ever attempted to create a system with redundant NetLinx processors, and if so, what was the best way you came across to handle it? Specifically, I have a building where I need to control lighting, shades, etc. The customer requires redundant processors with automatic failover, and I'm thinking about the best way to do this. Obviously, I will need either the ability to switch the control port wiring from one master to the other, or multiple control interfaces on the controlled devices, but I don't see much of a problem there. The real issue that I see is in maintaining the status of each processor in the other and having the panels switch from the primary to the backup in the event of a failure. I'm thinking that I should just set up a polling mechanism in each to verify the status of the other, and only have the backup process panel events if the primary is offline. Also, I don't know whether I should have the backup processor telnet to the panels to change the Master IP on them, or if there is a better way to handle that part. Any ideas?
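A rough sketch of the polling idea, written in Python only to model the logic (it is not NetLinx code). The class name `PeerMonitor` and the threshold of three missed polls are assumptions made for illustration; the point is that the backup only starts handling panel events after several consecutive polls go unanswered, not after a single dropped reply.

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    """Tracks whether the other master is considered alive."""
    max_missed: int = 3      # hypothetical threshold, tune to taste
    missed: int = 0          # consecutive polls without a reply
    peer_alive: bool = True

    def record_poll(self, replied: bool) -> bool:
        """Feed in one poll result.

        Returns True when the backup should process panel events,
        i.e. when the peer is considered offline.
        """
        if replied:
            self.missed = 0
            self.peer_alive = True
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.peer_alive = False
        return not self.peer_alive


monitor = PeerMonitor(max_missed=3)
poll_replies = (True, False, False, True, False, False, False)
results = [monitor.record_poll(r) for r in poll_replies]
print(results)
```

Only the last poll in the example flips the backup into control, because it is the first time three replies in a row were missed.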

vining · May 2007

As far as control port wiring goes: for RS232 you could use a serial server and talk IP to it. Likewise, for IR you could use something from Extron or Global Cache to send IR commands over IP from a master. Then when a processor fails you still maintain all comms, because the comms are independent of the processors or NI connections, and whichever master has the CON (control) can simply talk via IP to the remotely connected devices.

Now, I haven't given this much thought, but you could have both masters running the same program, with a little extra code in the second to query the first master and do some other stuff. Connect the main master to a controlled power source, then have the second master query the first. When the first master doesn't respond, shut down its power via the controlled power switching device (in case it comes back online, to avoid an IP conflict), then initiate a set_ip command and a set system number command on the backup to give it the same values as the first. I believe there are commands for these, although I've never used them. I believe this can be done by telnet if not by system functions. Then reboot, send yourself an email, and everything should work as before.
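The order of those steps is what keeps the two masters from fighting over one address, so here is a minimal sketch of the sequence. It is Python used purely as a model, not NetLinx, and every step name is a hypothetical placeholder for whatever actually switches the power or sets the address. Stopping at the first failed step is an assumption added here, so that the backup never adopts the primary's IP unless the power cut worked.

```python
from typing import Callable, Dict, List

# Order matters: the primary loses power before the backup takes its address.
FAILOVER_STEPS = [
    "cut_primary_power",
    "set_backup_ip_to_primary",
    "set_backup_system_number_to_primary",
    "reboot_backup",
]


def run_failover(primary_responds: bool,
                 actions: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run the steps in order and return the names of those that succeeded."""
    if primary_responds:
        return []                    # primary is fine, nothing to do
    completed = []
    for name in FAILOVER_STEPS:
        if not actions[name]():      # stop at the first failure
            break
        completed.append(name)
    return completed


# Stand-in actions that only record that they were called.
log: List[str] = []


def make_action(name: str, ok: bool = True) -> Callable[[], bool]:
    def action() -> bool:
        log.append(name)
        return ok
    return action


actions = {name: make_action(name) for name in FAILOVER_STEPS}
print(run_failover(primary_responds=False, actions=actions))
```

The notification email is left out of the sequence because it would have to go out after the reboot, from the startup code of the master that has just taken over.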

annuello · May 2007

Master 2 Master?

From the programming side of things, I'd probably take the approach of using M2M to see when the master goes offline and comes back online. I'd also put in a delay to allow for slightly erratic comms connections between the two masters.

//Secondary master code
define_device
vdvPrimaryMaster = 0:1:1234  //Assume primary has System# 1234.

define_variable
char bTakeControl  //Set while the Secondary is in control.

define_event
data_event[vdvPrimaryMaster]{
 offline:{
  //Cancel any pending hand-back straight away, then debounce the offline.
  cancel_wait 'online delay'
  wait 100 'offline delay'{  //100 x 0.1s = 10 seconds
   on[bTakeControl]
   //Gracefully take control of TPs and other devices here.
  }
 }
 online:{
  //Cancel any pending takeover straight away, then debounce the online.
  cancel_wait 'offline delay'
  wait 100 'online delay'{
   off[bTakeControl]
   //Pass back control to Primary
   //Gracefully relinquish control of TPs and other devices here.
  }
 }
}
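The intent of the two delays is that a state change only counts once it has lasted the full 10 seconds, and a newer event throws away whatever was still pending. A small model of that behaviour, in Python only for illustration (not NetLinx), with a hypothetical helper name `debounce` and made-up event times:

```python
from typing import List, Tuple

Event = Tuple[int, str]   # (time in seconds, "online" or "offline")


def debounce(events: List[Event], delay: int) -> List[Event]:
    """Return the state changes that survived the delay.

    An event is confirmed at (time + delay) only if no other event
    arrives before then; otherwise it is discarded, like a cancelled wait.
    """
    confirmed: List[Event] = []
    for i, (t, state) in enumerate(events):
        is_last = i + 1 == len(events)
        if is_last or events[i + 1][0] - t >= delay:
            # Ignore a confirmation that repeats the current state.
            if not confirmed or confirmed[-1][1] != state:
                confirmed.append((t + delay, state))
    return confirmed


# A 3-second dropout at t=0, then a real outage from t=50 to t=70.
events = [(0, "offline"), (3, "online"), (50, "offline"), (70, "online")]
confirmed = debounce(events, delay=10)
print(confirmed)
```

The brief dropout at t=0 never produces a takeover, while the 20-second outage does.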

Put the URL/IP of the Primary into the URL list of the Secondary and there you go. I use this to monitor around 25 masters at the moment. I put them in a dev array and monitor the whole array. When something happens I use get_last() to figure out which one it happened on.

As for the panels, that may be a bit more difficult. The telnet idea sounds reasonable, though I'm glad it isn't me doing it!

I'd certainly put them in a dev array so you could iterate through the array connecting them one at a time.

Roger McLean

vining · May 2007

annuello wrote:

As for the panels, that may be a bit more difficult. The telnet idea sounds reasonable, though I'm glad it isn't me doing it! I'd certainly put them in a dev array so you could iterate through the array connecting them one at a time.

My thinking was to leave the TPs alone and just change the IP and system number of the backup master to those of the primary master, but first remove the power supply to the primary so that, for whatever reason, it can't come back online and cause an IP conflict. This way all TPs will continue working without any problems other than the slight gap between the time the primary master went down and the time the backup takes over.

This may not work but I think it sounds good.

annuello · May 2007

Yes, I was pondering that approach as well. However, I thought it may be of some benefit to leave the Primary master running rather than pulling the power on it. If the CPU on the Primary locked up, for whatever reason, I'd like to be able to get as much info from it as I possibly can, to diagnose the cause of failure.

Not that I can really think of a suitable example, but say the Primary got choked down with some very delayed IP comms to various devices. If the Primary comes back online, should it then pull the power on the Secondary? What if the Secondary is servicing a moderately long request, or interacting with the end user at the time that the failure occurs?

It's just thoughts. I like the idea of a dynamic & graceful changeover. Perhaps my head is in the clouds in a <oxymoron>"perfect world" scenario where failures to the Primary occur</oxymoron>. It probably comes down to how much programming time Matt can put into it.

Roger.

vining · May 2007

annuello wrote:

I'd like to be able to get as much info from it as I possibly can, to diagnose the cause of failure.

You could use Dave's "DevTermLogger" to record pertinent error messages and whatnot to RAM, or create your own. This would require a site visit to power up the master and review the log. I guess you could also do it remotely if you re-apply power and, upon reboot, change the primary master's IP to a reserved address. Then you could FTP the log and review it.

If you wanted something a little more dynamic that keeps both running (and that's assuming that whatever crashed the primary master in the first place goes away), you could reapply power to the primary and, upon reboot, change its IP to the reserved IP address. It could then become your backup: kind of a flip/flop approach, so a master that isn't running on the primary IP queries the master that is, and the roles can be reversed.
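One way to picture the flip/flop: the role follows the address, not the box. The sketch below is Python used as a model only (not NetLinx); both addresses are made-up documentation addresses, and the boot-time check is an assumption about how a master would find out whether the primary IP is already taken.

```python
PRIMARY_IP = "192.0.2.10"    # hypothetical address the panels point at
RESERVED_IP = "192.0.2.11"   # hypothetical address for the standby master


def choose_role_on_boot(primary_ip_in_use: bool) -> dict:
    """Decide what a master should be after it reboots.

    If another master already answers on the primary IP, this one takes
    the reserved IP and starts polling the primary IP as the backup.
    Otherwise it takes the primary IP itself.
    """
    if primary_ip_in_use:
        return {"role": "backup", "ip": RESERVED_IP, "polls": PRIMARY_IP}
    return {"role": "primary", "ip": PRIMARY_IP, "polls": None}


# The old primary is powered back up while the other master holds the primary IP.
print(choose_role_on_boot(primary_ip_in_use=True))
# A master boots and nothing answers on the primary IP.
print(choose_role_on_boot(primary_ip_in_use=False))
```

If both masters booted at the same moment this check could race; the sketch ignores that case.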

This could be fun to play around with, but you're right, it depends on how much time Matt has to put into this and how far he might want to take it.

matt95gsr · May 2007

Thanks guys, very good ideas all around - that's why I wanted to post the question here. I've got some time to play around with different ideas on this, so I think I will try out a combination of the ideas you guys have brought forth. It'll be about three hours away, so the ability to diagnose remotely would be helpful. Hopefully I can have some fun with this, and by all means keep the ideas coming!

yuri · May 2007

How about this:
Make some sort of button (a real button, not a touchpanel button) that allows the end user to switch to the other master in case something goes wrong.
You could wire the button to the second master, which performs some checks when the button is pressed to ensure the primary master is indeed down and out. Then switch over the comms etc., and set a flag that the secondary master is now the primary master.
I don't know if your customer is willing to perform an action like this (some of them are lazy), but it would surely make things a lot more error-proof if you ask me.
