Redundant Processors
matt95gsr
Posts: 165
Has anyone here ever attempted to create a system with redundant NetLinx processors, and if so, what was the best way you came across to handle it? Specifically, I have a building where I need to control lighting, shades, etc. The customer requires redundant processors with automatic failover, and I'm thinking about the best way to do this. Obviously, I will need either the ability to switch the control port wiring from one master to the other, or multiple control interfaces on the controlled devices, but I don't see much of a problem there. The real issue I see is keeping each processor aware of the other's status and having the panels switch from the primary to the backup in the event of a failure. I'm thinking that I should just set up a polling mechanism in each to verify the status of the other, and only have the backup process panel events if the primary is offline. Also, I don't know whether I should have the backup processor telnet to the panels to change the Master IP on them, or if there is a better way to handle that part. Any ideas?
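To make the polling idea concrete, here is a rough sketch of the gating logic in Python rather than NetLinx, just to show the shape of it. The names `PeerMonitor` and `should_handle_panel_event` and the 5-second timeout are made up for illustration; how the heartbeat actually travels between masters is left open.

```python
import time


class PeerMonitor:
    """Tracks whether the other master has been heard from recently."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        # Start the timer at construction so the backup waits one full
        # timeout after boot before it assumes the primary is gone.
        self.last_seen = clock()

    def heartbeat(self):
        """Call whenever a poll of the peer succeeds."""
        self.last_seen = self.clock()

    def peer_alive(self):
        return (self.clock() - self.last_seen) <= self.timeout_s


def should_handle_panel_event(is_primary, monitor):
    # The primary always handles events; the backup only handles them
    # once the primary has been silent for longer than the timeout.
    return is_primary or not monitor.peer_alive()
```

On the backup you would call `heartbeat()` on every successful poll and check `should_handle_panel_event(False, monitor)` at the top of each panel event handler.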
Comments
I haven't given this much thought, but you could have both masters run the same program, with a little extra code in the second to query the first. Connect the main master to a controlled power source. When the first master stops responding, the second shuts off its power via the controlled power switching device (so that it can't come back online and cause an IP conflict), then issues a set_ip command and a set system number command on the backup to give it the same values as the first. I believe there are commands for these, although I've never used them. I believe this can be done by telnet if not by system functions. Then reboot, send yourself an email, and everything should work as before.
From the programming side of things, I'd probably take the approach of using M2M to see when the master goes offline and comes back online. I'd also put in a delay to allow for slightly erratic comms connections between the two masters.
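The delay could look something like the sketch below. It is Python rather than NetLinx, and `OfflineDebounce` and its method names are invented for the example; the idea is simply that a failure is only declared after the other master has stayed offline for the whole hold period, so a brief dropout doesn't trigger a failover.

```python
class OfflineDebounce:
    """Declares a failure only after a continuous offline period."""

    def __init__(self, hold_s):
        self.hold_s = hold_s
        self.offline_since = None  # None means the peer is online

    def on_offline(self, now):
        # Remember only the start of the current outage.
        if self.offline_since is None:
            self.offline_since = now

    def on_online(self, now):
        # Any reconnect cancels the pending failure.
        self.offline_since = None

    def failed(self, now):
        return (
            self.offline_since is not None
            and (now - self.offline_since) >= self.hold_s
        )
```

The offline and online calls would be driven from the M2M offline/online events, with `failed()` checked on a timer.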
Put the URL/IP of the Primary into the URL list of the Secondary and there you go. I use this to monitor around 25 masters at the moment. I put them in a dev array and monitor the whole array. When something happens I use get_last() to figure out which one it happened on.
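As a loose analogy for the dev array approach, here it is in Python rather than NetLinx. `MasterWatch` and the master names are hypothetical, and `last_index` only stands in for the role get_last() plays above: one handler serves the whole array and the index tells you which master the event came from.

```python
class MasterWatch:
    """One handler for a whole array of monitored masters."""

    def __init__(self, names):
        self.names = list(names)
        self.online = [False] * len(self.names)
        self.last_index = None  # index of the master behind the last event

    def on_change(self, index, is_online):
        """Record an online/offline event and return the master's name."""
        self.online[index] = is_online
        self.last_index = index
        return self.names[index]

    def offline_names(self):
        return [n for n, up in zip(self.names, self.online) if not up]
```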
As for the panels, that may be a bit more difficult. The telnet idea sounds reasonable, though I'm glad it isn't me doing it! I'd certainly put them in a dev array so you could iterate through the array connecting them one at a time.
Roger McLean
This may not work but I think it sounds good.
Not that I can really think of a suitable example, but say the Primary got choked down with some very delayed IP comms to various devices. If the Primary comes back online, should it then pull the power on the Secondary? What if the Secondary is servicing a moderately long request, or interacting with the end user, at the time that the failure occurs?
It's just thoughts. I like the idea of a dynamic & graceful changeover. Perhaps my head is in the clouds in a <oxymoron>"perfect world" scenario where failures to the Primary occur</oxymoron>. It probably comes down to how much programming time Matt can put into it.
Roger.
If you wanted something a little more dynamic that keeps both running (assuming that whatever crashed the primary master in the first place goes away), you could reapply power to the primary and, upon reboot, change its IP to the reserved IP address so that it becomes your backup. It's kind of a flip/flop approach: if a master isn't running on the primary IP, it queries the master that is, and the roles can be reversed.
This could be fun to play around with, but you're right, it depends on how much time Matt has to put into this and how far he might want to take it.
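Here is a toy model of the flip/flop idea in Python, not NetLinx. The two addresses are placeholders from the documentation range, and `Master`, `fail_over`, and `rejoin` are names invented for the sketch. Whoever holds the primary IP is active; a recovered unit always comes back on the reserved IP as the standby.

```python
PRIMARY_IP = "192.0.2.10"   # placeholder address
RESERVED_IP = "192.0.2.11"  # placeholder address


class Master:
    def __init__(self, name, ip):
        self.name = name
        self.ip = ip
        self.up = True

    @property
    def role(self):
        if not self.up:
            return "offline"
        return "active" if self.ip == PRIMARY_IP else "standby"


def fail_over(failed, standby):
    """The failed unit is powered down; the standby takes the primary IP."""
    failed.up = False
    failed.ip = None
    standby.ip = PRIMARY_IP


def rejoin(recovered):
    """A repowered unit comes back on the reserved IP as the new standby."""
    recovered.up = True
    recovered.ip = RESERVED_IP
```

Running `fail_over` and `rejoin` twice in a row swaps the roles and then swaps them back, which is the whole point: neither box is permanently "the primary".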
Make some sort of button (a real button, not a touchpanel button) that allows the end user to switch to the other master in case something goes wrong.
You could wire the button to the second master, which performs some checks when the button is pressed to ensure the primary master is indeed down and out. Then switch over the comm etc, and mark a flag that the secondary master is now the primary master.
I don't know if your customer is willing to perform an action like this (some of them are lazy), but it would surely make things a lot more error-proof if you ask me.
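The button check could be sketched like this, in Python rather than NetLinx. `Secondary`, `probe_primary`, and the three attempts are assumptions for the example; `probe_primary` stands for whatever test you trust to tell you the primary is really down.

```python
class Secondary:
    """Secondary master that takes over only after a confirmed button press."""

    def __init__(self, probe_primary, attempts=3):
        self.probe_primary = probe_primary  # callable returning True if primary answers
        self.attempts = attempts
        self.acting_primary = False  # flag: secondary is now the primary

    def on_button_press(self):
        """Return True if the secondary is (now) acting as primary."""
        if self.acting_primary:
            return True
        # Refuse to switch if the primary answers even once.
        for _ in range(self.attempts):
            if self.probe_primary():
                return False
        self.acting_primary = True
        return True
```

Switching the comms over would happen right where the flag is set, so a press while the primary is still healthy does nothing.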