Home AMX User Forum AMX General Discussion

A way to use a backup master? Any ideas on my way?

Here is an idea I have. I have 2 masters: one is system 1 and the other is system 2. The panels are all connected to system 1, but they are set to look for system number 1 rather than being hard-set to an IP address. The code is the same in both masters, but the logic says: if you are system 2, just monitor system 1 for failure. If you notice a failure, set your system number to 1 and reboot. That way, when you come back online, the panels connect to you. On any reboot I look at the MAC address of system 1. If it is not the MAC of the original system 1, I send an email out of RMS to notify tech support that the first master is broken. Any thoughts?
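A minimal sketch of the boot-time decision described above. It is written in Python rather than NetLinx because it only illustrates the logic. The MAC value, the action names, and however `system1_mac` is obtained are hypothetical placeholders, not AMX API calls.

```python
from __future__ import annotations

# Hypothetical: the MAC recorded for the original system-1 master at install time.
ORIGINAL_SYSTEM1_MAC = "00:60:9F:00:00:01"


def normalize_mac(mac: str) -> str:
    """Compare MACs case-insensitively, treating '-' and ':' separators alike."""
    return mac.replace("-", ":").upper()


def boot_actions(my_system_number: int, system1_mac: str | None) -> list[str]:
    """Return what a master should do after (re)booting. Sketch only."""
    if my_system_number == 2:
        # Backup role: do nothing except watch system 1 for failure.
        return ["monitor_system_1"]
    actions = ["run_full_program"]
    # If the unit now answering as system 1 is not the original hardware,
    # the backup has taken over, so request an alert (e.g. an RMS email).
    if system1_mac is not None and normalize_mac(system1_mac) != normalize_mac(ORIGINAL_SYSTEM1_MAC):
        actions.append("notify_tech_support")
    return actions
```

Both units can run identical code. Only the system number decides which branch is taken, which matches the "same code in both masters" idea.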

Comments

  • http://www.amx.com/techsupport/techNote.asp?id=961

    I've been down that road with tech support. That was before the EXBs were in the picture, so maybe it's more feasible now.
  • ericmedley Posts: 4,177
    I guess I kind of did the inverse on a troublesome project once. There were three NI masters in the house. Of the many TPs in the home, 4 (MVP 5100s) would regularly run away from the main master and hook up with one of the other masters.

    I didn't get much help from support because the other TP models seemed to work fine. I first tried to just accommodate TP devices that could be on any master, but when they did run off, feedback became too slow through the M2M communication.

    So the solution that worked quite well was to put a little code on the subordinate NIs that would telnet into the TPs when they showed up drunk on their porch. It would telnet in, reset them to URL mode, enter the IP address of the correct master, then reboot them. It worked great. The client never reported a problem again.

    Perhaps this might spur some ideas.
  • John Nagy Posts: 1,742
    Good tech note, pointing out the problems with serial, IR, relay and sensors all hard-wired to one master... just getting the panels to switch is not enough. You need to be able to connect to all the controlled devices too. In my experience, the new external IP boxes are more likely to fail than a NetLinx.

    Presuming you got past those limitations, you're overthinking the changeover concept. Code in System 2 would know it was the backup, as it would have to be different in order to monitor system 1. So there's no reason to have it look at the MAC address to know who it is. It could send your notice to RMS before a changeover to be system 1.

    It should be noted that with protection from IP intrusions, good code and a UPS, a NetLinx is likely to run forever. Failures are most likely to be I/O failures, which your duplication would neither detect nor cure. Your complicated switchover schemes might also introduce more points of failure than they protect against.

    Perhaps you could just have the code "call in" regularly to RMS, and if the calls stop, trigger the trouble call.
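The "call in" idea amounts to a dead-man's switch on the receiving side. Here is a minimal Python sketch of that check. The class, the call-in interval and the number of missed calls tolerated are all hypothetical, not RMS features.

```python
from typing import Optional


class CallInWatchdog:
    """Receiving-side check: raise one alarm when a master stops calling in.

    interval_s and missed_allowed are hypothetical tuning values.
    """

    def __init__(self, interval_s: float = 60.0, missed_allowed: int = 3):
        self.timeout_s = interval_s * missed_allowed
        self.last_call_in: Optional[float] = None
        self.alarm_raised = False

    def call_in(self, now: float) -> None:
        """Record a check-in and re-arm the alarm."""
        self.last_call_in = now
        self.alarm_raised = False

    def check(self, now: float) -> bool:
        """Return True exactly once per outage, when the calls have stopped."""
        if self.last_call_in is None or self.alarm_raised:
            return False
        if now - self.last_call_in > self.timeout_s:
            self.alarm_raised = True
            return True
        return False
```

Allowing a few missed calls before alarming keeps one dropped packet from generating a trouble call.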
  • samos Posts: 106
    I agree that this is overkill, but the salesman has sold the client two processors and sold the whole idea of redundancy with no human interaction. I told him that masters hardly ever fail and that he should just manually replace the unit with the backup if it ever fails. He went out of his way to not wire any I/O to the masters; everything is IP-controlled or connected via RS-232 to the Enova switch. Trust me, this is not my idea of the way to do things. This is a salesman selling a feature that should not have been sold and then depending on my coding skills to get him out of a tight place. Thank you for all of your input. I will be running some test code later today and will let you know what I find out.
  • ericmedley Posts: 4,177
    samos wrote: »
    This is a salesman selling a feature that should not have been sold and then depending on my coding skills to get him out of a tight place.

    yep, we've all seen this before.
  • rfletcher Posts: 217
    samos wrote: »
    This is a salesman selling a feature that should not have been sold and then depending on my coding skills to get him out of a tight place.

    <snark> I love it when that happens! </snark>

    Maybe you should use one of these :p

  • vining Posts: 4,368
    If you have I/O, IR, relay and serial devices, I suppose it might be possible to use a cardframe with a HubCard and connect 2 masters to it via ICSNet. Then you could use a couple of NI-2100s or ME260s to run your programs, using no ports other than Ethernet and ICSNet on either master.
  • John Nagy Posts: 1,742
    Is the NetLinx in the Enova one of your "masters"?
    If you are talking to the Enova on IP, you are talking through that NetLinx.
    Your web is getting more tangled.
  • samos Posts: 106
    No, the NetLinx in the Enova is not one of the masters. This is a salesman for you. He sold them 2 NI-4100s in addition to the master that already existed in the Enova (even though he is not using any of the I/O ports on the 4100s; he could have used just 900s or 700s). I am only using the Enova as a switcher and for the com cards. I am hoping that my code in master 2 will figure out when one fails and reboot itself as #1, and then all panels and the switcher will connect back to it.
  • John Nagy Posts: 1,742
    This kind of salesmanship is what gives the industry a bad reputation.
    If your customer ever gets another dealer in to look at the system, imagine what they will think when they are told they bought MANY thousands of dollars of devices he never needed...
    And you know how new dealers love to make fun of existing systems they are called in to consult on.
    More of why we have the bad reputation.
  • samos Posts: 106
    So I seem to have most of this working. I have code that detects when the main master is offline and starts a five-minute timeline. If the main master is not back online after five minutes, it changes the backup master's system number to the main master's system number and reboots. The panels then all come back online with the new main master. The only thing I have not figured out is what to do when the original main master comes back online. Does anyone know of a way to poll for all system numbers on the network, or to find out whether there is a system number on the network that matches yours?
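The timeline logic in that post, sketched in Python as a small state machine. The class and action names are hypothetical. On the real master the offline/online calls would come from the master's online and offline events, and the takeover step (changing the system number and rebooting) is only named here, not implemented.

```python
from typing import Optional

TAKEOVER_DELAY_S = 5 * 60  # the five-minute grace period from the post


class BackupMonitor:
    """Backup-side failover timer (sketch of the logic, not NetLinx code)."""

    def __init__(self, delay_s: float = TAKEOVER_DELAY_S):
        self.delay_s = delay_s
        self.offline_since: Optional[float] = None
        self.took_over = False

    def on_primary_offline(self, now: float) -> None:
        # Start the timeline only on the first offline event.
        if self.offline_since is None:
            self.offline_since = now

    def on_primary_online(self) -> None:
        # Primary came back within the grace period: cancel the timeline.
        self.offline_since = None

    def tick(self, now: float) -> Optional[str]:
        """Call periodically; returns an action name once the delay expires."""
        if self.took_over or self.offline_since is None:
            return None
        if now - self.offline_since >= self.delay_s:
            self.took_over = True
            # On the real master this is where the system number would be
            # changed to the primary's and the master rebooted.
            return "take_over_as_system_1"
        return None
```

The `took_over` flag makes the takeover fire once, so a slow reboot cannot trigger it twice.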
  • rfletcher Posts: 217
    Maybe you should have the second master telnet to the panels and change the system connection settings instead of having the panels connect to a specific system number?

    Also, you might want to avoid failing back over to the first master whenever it comes back since that would cause an additional disruption in control while all the connections move. Maybe instead have the first master become the new backup instead?
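One way to express "the returning master becomes the new backup" is a rule applied at boot, combined with a staggered wait so that two units powering up together don't both claim system 1. This is a Python sketch under assumptions: `peer_is_primary` stands for however you detect that the other master is already serving as system 1 (for example a heartbeat link), and the wait value is hypothetical.

```python
def boot_wait_s(preferred: bool, base_s: float = 60.0) -> float:
    """Extra wait before deciding on a role at boot.

    Staggering the decision means that when both units power up together,
    the preferred one normally claims system 1 first. base_s is hypothetical.
    """
    return 0.0 if preferred else base_s


def role_after_wait(peer_is_primary: bool) -> str:
    """Role to take once the boot wait has elapsed. Sketch only."""
    if peer_is_primary:
        # Never pre-empt a working primary: moving every connection back
        # would cause a second interruption in control.
        return "backup"
    return "primary"
```

With this rule there is no automatic fail-back. Restoring the original roles becomes a deliberate step taken when convenient.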
  • John Nagy Posts: 1,742
    Or put a power cutoff on the main master that is activated to remove power from it as the last act of the slave before it changes its system number... so it CAN NOT resume without intervention.
  • jweather Posts: 320
    John Nagy wrote: »
    Or put a power cutoff on the main master that is activated to remove power from it as the last act of the slave before it changes its system number... so it CAN NOT resume without intervention.

    This fail-over method is rather graphically known as STONITH: Shoot The Other Node In The Head. When either node can pull the plug on the other, watch out for literal "dead"lock solutions as a result of network issues where both decide the other is dead and kill each other simultaneously...
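A small Python simulation of the simultaneous-kill problem. It assumes both nodes decide at the same moment that the other is dead, and each schedules a power-off of its peer after its own delay. Equal delays kill both. Unequal delays, or a node that never fences (`float("inf")`), leave exactly one survivor. Node names and delays are hypothetical.

```python
from itertools import groupby
from typing import Dict, Set


def split_brain_survivors(fence_delay_s: Dict[str, float]) -> Set[str]:
    """Simulate all nodes deciding at t=0 that the others are dead.

    Each node powers off every other node after its own delay, unless it has
    already been powered off itself. float("inf") means "never fences".
    """
    alive = set(fence_delay_s)
    ordered = sorted(fence_delay_s.items(), key=lambda kv: kv[1])
    for delay, group in groupby(ordered, key=lambda kv: kv[1]):
        if delay == float("inf"):
            break  # the remaining nodes never fire
        # Nodes firing at the same instant all fire before anyone loses power.
        shooters = [node for node, _ in group if node in alive]
        for shooter in shooters:
            alive -= {peer for peer in fence_delay_s if peer != shooter}
    return alive
```

The last test case is the one-sided arrangement, where only the backup holds the power switch. It can never deadlock, but if the network (not the primary) is what failed, it will still cut off a healthy primary.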
  • vining Posts: 4,368
    The whole purpose was a failover that would keep the system functioning and not require an emergency service call. If the primary fails, let the second master take charge, pull power on the primary and send yourself a notification to check the system when convenient. Once on site (or remotely, if possible), determine what caused the primary master to malfunction (assuming they're on battery backup), repair it and reset the roles.
  • John Nagy Posts: 1,742
    I didn't suggest a mutual murder/suicide pact. Just that the backup would have the ability to ensure the primary couldn't revive without deliberate action.

    Apropos,
    A 911 call comes in, and the operator hears an excited man say "I'm out hunting with my buddy, he's fallen out of a tree and he's dead! What do I do?"
    The operator says, "I understand. First, calm down. You can't help if you are in a panic."
    "OK, I'm taking a moment and trying to get it together. What next?"
    The operator replies, "Next, it's important to know his actual condition. Be sure he's really dead."
    The operator then hears a pause, then three gunshots, and the caller returns, saying,
    "OK, he's really dead. Now what?"
  • jweather wrote: »
    This fail-over method is rather graphically known as STONITH: Shoot The Other Node In The Head. When either node can pull the plug on the other, watch out for literal "dead"lock solutions as a result of network issues where both decide the other is dead and kill each other simultaneously...

    You could have the masters play ping pong over some serial ports to avoid that. It'll work until the caps go bad...
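The serial "ping pong" works as a second, independent heartbeat path. A peer is declared dead only when both the IP and the serial link have gone quiet. If the peer still answers on serial, the network is the problem, and it should not be fenced. This is a Python sketch, and the timeout and link names are hypothetical.

```python
from typing import Dict, Optional


class DualLinkHeartbeat:
    """Track the peer over two independent links (IP and a serial cable).

    timeout_s is a hypothetical value. Only when both links go quiet is the
    peer treated as dead; one quiet link points to a link or network fault.
    """

    def __init__(self, timeout_s: float = 10.0):
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, Optional[float]] = {"ip": None, "serial": None}

    def heard(self, link: str, now: float) -> None:
        """Record a reply (the 'pong') from the peer on the given link."""
        self.last_seen[link] = now

    def _alive(self, link: str, now: float) -> bool:
        t = self.last_seen[link]
        return t is not None and now - t <= self.timeout_s

    def verdict(self, now: float) -> str:
        ip_ok, serial_ok = self._alive("ip", now), self._alive("serial", now)
        if ip_ok and serial_ok:
            return "healthy"
        if serial_ok:
            return "network_problem"  # peer still answers on serial: don't fence it
        if ip_ok:
            return "serial_link_problem"
        return "peer_dead"
```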
  • samos Posts: 106
    Serial port ping pong, perfect. That is all I needed.

    And now for my joke of the day.

    Jesus and Satan have an argument as to who is the better programmer. This goes on for a few hours until they come to an agreement to hold a contest with God as the judge. They set themselves before their computers and begin. They type furiously, lines of code streaming up the screen, for several hours straight.

    Seconds before the end of the competition, a bolt of lightning strikes, taking out the electricity. Moments later, the power is restored, and God announces that the contest is over. He asks Satan to show his work. Visibly upset, Satan cries and says, “I have nothing. I lost it all when the power went out.”

    “Very well,” says God, “let us see if Jesus has fared any better.”

    Jesus presses a key, and the screen comes to life in vivid display, the voices of an angelic choir pour forth from the speakers.

    Satan is astonished. He stutters, “B-b-but how?! I lost everything, yet Jesus’ program is intact! How did he do it?”

    God chuckles, “Everybody knows Jesus saves.”
  • Hi Samos,
    I too have suffered the salesman's wrath! Almost the same issue but the end result is pretty much the same.

    The basis of the system is 5 iPADs linked to a master processor which relay commands to 20 slave processors. There are no physical devices connected to the master, just IP based kit.

    With such a system hinging on a single master processor, a backup processor was provided as a get-out-of-jail-free card.

    The system took some head scratching but here's how we turned it out.

    The master system is system 1 and the backup system 2
    The backup (slave) system monitors the online status of the master, and vice versa.
    If an offline event occurs both systems wait a timeout period.
    If the timeout period is reached the active processor will cycle the power of the missing processor.
    If a second timeout period elapses then the fun begins.
    If the master decides the backup is dead then it flags a warning....
    If however the backup decides the master is dead then it switches off the power to the master from the PDU.
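The two-timeout scheme above, sketched in Python as a function that reports which actions should have happened once the peer has been offline for a given time. The caller runs any action it has not already run. The timeout values and action names are hypothetical, and the PDU control itself is not shown.

```python
from typing import List

# Hypothetical timeouts: first wait before power-cycling, then a second wait.
FIRST_TIMEOUT_S = 120
SECOND_TIMEOUT_S = 600


def actions_due(role: str, offline_s: float,
                t1_s: float = FIRST_TIMEOUT_S,
                t2_s: float = SECOND_TIMEOUT_S) -> List[str]:
    """Actions due once the peer has been offline for offline_s seconds.

    role is "master" or "backup". The list is cumulative, so the caller
    should execute only the entries it has not executed yet.
    """
    if offline_s < t1_s:
        return []
    actions = ["power_cycle_peer"]
    if offline_s < t1_s + t2_s:
        return actions
    if role == "master":
        # The backup is gone: just raise a warning.
        actions.append("flag_warning")
    else:
        # The master is gone: cut its power at the PDU, then take over.
        actions += ["power_off_master_at_pdu", "run_takeover_routine"]
    return actions
```

Only the backup's branch ever removes power permanently, so the two units cannot shut each other off.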

    Knowing that the master is now permanently missing, the slave runs the following routine.
    PANIC!!!!! send a message for help
    Change IP Address to that of the master
    Change System number to 1
    Add all the other processors to the URL list
    Reboot.
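The takeover routine is order-sensitive. The alert goes first, presumably while the backup still has its own working network identity, and the reboot comes last so all the new settings take effect together. Below is a Python sketch of that sequence. `MasterConfig` and the step names are hypothetical stand-ins for whatever settings the real master stores.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class MasterConfig:
    """Hypothetical snapshot of the settings the takeover routine touches."""
    ip_address: str
    system_number: int
    url_list: Tuple[str, ...] = ()


def takeover(backup: MasterConfig, master: MasterConfig,
             other_processors: List[str]) -> Tuple[List[str], MasterConfig]:
    """Return the ordered steps and the backup's new configuration."""
    new_config = replace(
        backup,
        ip_address=master.ip_address,        # take over the master's IP
        system_number=master.system_number,  # become system 1
        url_list=tuple(other_processors),    # connect out to the other processors
    )
    steps = ["send_alert", "apply_ip_address", "apply_system_number",
             "apply_url_list", "reboot"]
    return steps, new_config
```

Building the new configuration as a separate object keeps the old one available, which is what the manual reset has to restore later.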

    To get the code running, the program has a simple flag called Master, which is set on boot by checking whether the system number is 1. If Master is set, the full code runs and the IP connections are made.

    To reset, a hard button wired to an IO simply reverses the above process, turns the master back on and reboots before the master has time to come up.

    There is always a risk of the above making the situation worse, so leave plenty of timeout so as not to cause unexpected events.