A way to use a backup master? Any ideas on my way?
samos
Posts: 106
Here is an idea that I have. I have two masters: one is system 1, the other is system 2. The panels are all connected to system 1, but they are set to listen for system 1 rather than hard-set to an IP address. The code is the same in both masters, but the logic says that if you are system 2, you just monitor system 1 for failure. If you notice a failure, set your system number to 1 and reboot. That way, when you come back online, the panels connect to you. On any reboot I look at the MAC address of system 1. If it is not the MAC of the original system 1, I send an email out of RMS to notify tech support that the first master is broken. Any thoughts?
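For anyone who wants to reason about the idea away from the hardware, here is a rough sketch of the backup-side logic in Python rather than NetLinx. It is only a model: `primary_online`, `misses_allowed` and the action strings are made-up names, and the three-miss debounce is my assumption, not part of the original idea.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BackupMonitor:
    """Backup-side logic: watch system 1, take over its number after repeated misses."""
    primary_online: Callable[[], bool]  # hypothetical probe of system 1
    misses_allowed: int = 3             # assumed debounce, not from the post
    system_number: int = 2
    misses: int = 0
    actions: List[str] = field(default_factory=list)

    def poll(self) -> None:
        if self.system_number != 2:
            return  # already promoted, nothing left to monitor
        if self.primary_online():
            self.misses = 0
            return
        self.misses += 1
        if self.misses >= self.misses_allowed:
            self.system_number = 1
            self.actions.append("set system number to 1")
            self.actions.append("reboot")


def mac_changed(expected_mac: str, seen_mac: str) -> bool:
    """On reboot: True if system 1 is no longer the original hardware."""
    def normalise(mac: str) -> str:
        return mac.lower().replace("-", ":")
    return normalise(expected_mac) != normalise(seen_mac)
```

Calling `poll()` on a timer and checking `mac_changed()` once at boot covers both halves of the scheme; the email to tech support would hang off a `True` result.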
Comments
I've been down that road with tech support. That was before the EXBs were in the picture, so maybe it's more feasible now.
Didn't get much help from support because the other TP models seemed to work fine. I first tried to just accommodate having TP devices that could be on any master, but when they did run off to another master, feedback became too slow through the M2M communication.
So, the solution that worked quite well was to put a little code on the subordinate NIs that would telnet into the TPs when they showed up drunk on their porch. It would telnet in, reset them to URL mode, enter the IP address of the correct master, then reboot them. It worked great. The client never reported a problem again.
Perhaps this might spur some ideas.
Presuming you got past those limitations, you're overthinking the changeover concept. Code in System 2 would know it was the backup, as it would have to be different in order to monitor system 1. So there's no reason to have it look at the MAC address to know who it is. It could send your notice to RMS before a changeover to be system 1.
It should be noted that with protection from IP invasions, good code and a UPS, a NetLinx is likely to run forever. Failure modes are most likely I/O, which your duplication would not detect or cure. Your complicated switchover schemes might also introduce more likelihood of failure points than provide protection.
Perhaps you could just have the code "call in" regularly to RMS, and if the calls stop, trigger the trouble call.
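The "call in" idea is a plain dead man's switch. A minimal sketch of the monitoring side, in Python for illustration only: `CallInWatchdog` and `max_silence_s` are made-up names, and nothing here is an RMS API.

```python
from typing import Optional


class CallInWatchdog:
    """Raise a trouble flag when a master stops calling in."""

    def __init__(self, max_silence_s: float) -> None:
        self.max_silence_s = max_silence_s
        self.last_call_in: Optional[float] = None

    def call_in(self, now: float) -> None:
        # The master calls this on a regular schedule.
        self.last_call_in = now

    def trouble(self, now: float) -> bool:
        if self.last_call_in is None:
            return False  # never heard from it; assumed not yet commissioned
        return now - self.last_call_in > self.max_silence_s
```

The point of doing it this way is that the master needs no switchover logic at all; silence is the alarm.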
yep, we've all seen this before.
<snark> I love it when that happens! </snark>
Maybe you should use one of these
If you are talking to the Enova on IP, you are talking through that NetLinx.
Your web is getting more tangled.
If your customer ever gets another dealer in to look at the system, imagine what they will think when they are told they bought MANY thousands of dollars of devices he never needed...
And you know how new dealers love to make fun of existing systems they are called in to consult on.
More of why we have the bad reputation.
Also, you might want to avoid failing back over to the first master whenever it comes back since that would cause an additional disruption in control while all the connections move. Maybe instead have the first master become the new backup instead?
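A toy model of why failing back costs an extra disruption, assuming two units called "A" and "B" (names are mine) and counting every time the panels have to move:

```python
from dataclasses import dataclass


@dataclass
class Pair:
    active: str = "A"  # which unit currently holds system number 1
    moves: int = 0     # how many times the panels had to reconnect

    def fail(self, unit: str) -> None:
        # Losing the active unit forces the panels over to the other one.
        if unit == self.active:
            self.active = "B" if unit == "A" else "A"
            self.moves += 1

    def recover(self, unit: str, fail_back: bool) -> None:
        # With fail_back, unit A always reclaims the active role on return.
        if fail_back and unit == "A" and self.active != "A":
            self.active = "A"
            self.moves += 1
```

One failure of "A" followed by its recovery gives two moves with fail-back and one move without, since "A" simply comes back as the new backup.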
This fail-over method is rather graphically known as STONITH: Shoot The Other Node In The Head. When either node can pull the plug on the other, watch out for literal "dead"lock solutions as a result of network issues where both decide the other is dead and kill each other simultaneously...
Apropos,
A 911 call comes in, and the operator hears an excited man say "I'm out hunting with my buddy, he's fallen out of a tree and he's dead! What do I do?"
The operator says, "I understand. First, calm down. You can't help if you are in a panic."
"OK, I'm taking a moment and trying to get it together. What next?"
The operator replies, "Next, it's important to know his actual condition. Be sure he's really dead."
The operator then hears a pause, then three gunshots, and the caller returns, saying,
"OK, he's really dead. Now what?"
You could have the masters play ping pong over some serial ports to avoid that. It'll work until the caps go bad...
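The serial ping pong amounts to a second heartbeat path, so a dead LAN link is not mistaken for a dead peer. A sketch of the decision, with hypothetical names (`peer_verdict`, `lan_alive`, `serial_alive`) and the assumption that each node only shoots when both paths are silent:

```python
def peer_verdict(lan_alive: bool, serial_alive: bool) -> str:
    """Classify the peer from two independent heartbeat paths."""
    if lan_alive and serial_alive:
        return "alive"
    if not lan_alive and not serial_alive:
        return "dead"        # only now is pulling the plug justified
    return "link-fault"      # one path still sees the peer: do not shoot


def should_shoot(lan_alive: bool, serial_alive: bool) -> bool:
    return peer_verdict(lan_alive, serial_alive) == "dead"
```

In a network partition both nodes see `lan_alive=False, serial_alive=True`, so neither one kills the other.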
and now for my joke of the day.
Jesus and Satan have an argument as to who is the better programmer. This goes on for a few hours until they come to an agreement to hold a contest with God as the judge. They set themselves before their computers and begin. They type furiously, lines of code streaming up the screen, for several hours straight.
Seconds before the end of the competition, a bolt of lightning strikes, taking out the electricity. Moments later, the power is restored, and God announces that the contest is over. He asks Satan to show his work. Visibly upset, Satan cries and says, “I have nothing. I lost it all when the power went out.”
“Very well,” says God, “let us see if Jesus has fared any better.”
Jesus presses a key, and the screen comes to life in vivid display, the voices of an angelic choir pour forth from the speakers.
Satan is astonished. He stutters, “B-b-but how?! I lost everything, yet Jesus’ program is intact! How did he do it?”
God chuckles, “Everybody knows Jesus saves.”
I too have suffered the salesman's wrath! Almost the same issue but the end result is pretty much the same.
The basis of the system is 5 iPads linked to a master processor, which relays commands to 20 slave processors. There are no physical devices connected to the master, just IP-based kit.
With such a system hinging on a single master processor a backup processor was provided as a get out of jail.
The system took some head scratching but here's how we turned it out.
The master system is system 1 and the backup is system 2.
The slave (backup) system monitors the online status of the master and vice versa.
If an offline event occurs both systems wait a timeout period.
If the timeout period is reached the active processor will cycle the power of the missing processor.
If a second timeout period elapses then the fun begins.
If the master decides the backup is dead then it flags a warning....
If however the backup decides the master is dead then it switches off the power to the master from the PDU.
Knowing that the master is now permanently missing, the slave runs the following routine.
PANIC!!!!! send a message for help
Change IP Address to that of the master
Change System number to 1
Add all the other processors to the URL list
Reboot.
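The sequence above can be written down as a small decision function. This is a Python sketch rather than the NetLinx that actually runs; the function name, the role strings and the choice to start the second timeout when the first one ends are my assumptions.

```python
from typing import List

# The backup's takeover routine, in the order listed above.
TAKEOVER_STEPS = [
    "send a message for help",
    "change IP address to that of the master",
    "change system number to 1",
    "add all the other processors to the URL list",
    "reboot",
]


def actions_for(role: str, offline_s: float,
                first_timeout_s: float, second_timeout_s: float) -> List[str]:
    """Return every action due once the peer has been offline for offline_s seconds."""
    if offline_s < first_timeout_s:
        return []  # still inside the first timeout: just wait
    actions = ["cycle power of missing processor"]
    if offline_s < first_timeout_s + second_timeout_s:
        return actions  # waiting to see whether the power cycle helped
    if role == "master":
        return actions + ["flag warning"]
    return actions + ["switch off master at PDU"] + TAKEOVER_STEPS
```

For example, with a 30 s first timeout and a 120 s second timeout, `actions_for("backup", 100, 30, 120)` returns only the power cycle, and the takeover steps appear once the peer has been gone for 150 s.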
To get the code running the program has a simple flag in it called Master. Master is checked as a simple test on boot to check if the system number is 1. In the case of Master, the full code runs and the IP connections are made.
To reset, a hard button wired to an IO simply reverses the above process, turns the master back on and reboots before the master has time to come up.
There is always a risk of the above making the situation worse, so leave plenty of timeout so as not to cause unexpected events.