How to debug a master that gets stuck?

adys Junior Member Posts: 395
What is the best way to see what is causing one of our masters to get stuck?

I thought about leaving the Studio notifications open all night, or maybe telnet would be better, since then I could leave it open for all the masters?


thanks

Ady.

Comments

  • vincen Junior Member Posts: 526
    Just keep a telnet session open on the master, in which you have issued the "msg on" command first ;)

    Vince
  • adys Junior Member Posts: 395
    thanks :)


    But I want to see every message and not just the diagnostics.

    I want to see a full log of every message before it goes offline....
  • NMarkRoberts Junior Member Posts: 455
    adys wrote:
    But I want to see every message and not just the diagnostics.

    What other kinds of message are there?
  • adys Junior Member Posts: 395
    Push, strings, command, offline, online

    When I turn "msg on" I get only the messages that I send to 0.0.0

    How do I set the master to show me every message in the log?

    Like in the NetLinx device notification dialog in the Studio?
  • DHawthorne Junior Member Posts: 4,584
    adys wrote:
    Push, strings, command, offline, online

    When I turn "msg on" I get only the messages that I send to 0.0.0

    How do I set the master to show me every message in the log?

    Like in the NetLinx device notification dialog in the Studio?

    Telnet in, then "show log," or "show log /all." It won't help you, however, if the master locked up or if you reboot, since the log is in volatile memory.
  • adys Junior Member Posts: 395
    Thanks, now I understand...

    I must also use "set log count" to make the log longer... I set it to 10000.

    I have strange problems in this house

    Since I set up the main master's URL list with all the rest of the masters, TPs are going online and offline, buttons stop responding, masters are going offline (I can still reach them with telnet and ping, but I must restart them before they work on the AMX network again), some modules are unreachable from some TPs while on other TPs everything is OK, and the debugger is not working. It simply doesn't work anymore in my code :(

    I set it up with the network manager. We have a new, clean network used only for AMX.
    All the addresses are static.
    I checked the URL list; it is only in the main master and contains only the other 5 masters (for now only 5, it will grow to 9).
    There are no IP conflicts.

    I suspect that there are AMX modules that send messages all the time (Denon 3910 DVD, Arcam DT91 tuner) and maybe (maybe?) cause the main master that holds the URL list to lose its mind... the masters running those modules have the output LED on all the time, not even blinking...
    The DVD module sends updates all the time about the TRACK location, and the TUNER module sends the frequency. But those are AMX-written modules, so maybe this is how it should be?

    Is it right to set the URL list only on one main master? I must have everyone talk with everyone, and I didn't find another way to do it without creating a loop...

    The strange thing is that this happens from time to time and it's not consistent. Sometimes you press buttons for 5 minutes and you are offline, then you are online again; sometimes you must restart the main master, sometimes another master...

    I have been in a deep mess for the last few days without any idea of what's going on...

    (I am not sure this is the right forum to discuss it?)
  • DHawthorne Junior Member Posts: 4,584
    Just because AMX wrote the module, doesn't mean it's OK ... in fact, they are notorious for UI modules that are not optimized for large systems. You will probably have to modify every overly chatty module so that it only sends updates to the panels when it needs to, instead of all the time like the default UI modules do.

    I routinely turn on all notifications in NetLinx Diagnostics, and watch what is being passed around. When the system is idle, the only notifications you should be getting are comms on devices - no panel updates.
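
    A minimal sketch of the "only send updates when it needs to" idea, not taken from any AMX module: remember the last value sent to the panel and skip the SEND_COMMAND when nothing has changed. The names dvTP, nLastTrack and fnUpdateTrack, and text address 10, are hypothetical and chosen only for illustration.

    ```netlinx
    DEFINE_DEVICE
    dvTP = 10001:1:0              // hypothetical touchpanel

    DEFINE_VARIABLE
    VOLATILE INTEGER nLastTrack   // last track number sent to the panel

    // Call this from wherever the module reports a track number.
    DEFINE_FUNCTION fnUpdateTrack(INTEGER nTrack)
    {
        IF (nTrack <> nLastTrack)   // only talk to the panel on a real change
        {
            nLastTrack = nTrack
            SEND_COMMAND dvTP, "'^TXT-10,0,', ITOA(nTrack)"
        }
    }
    ```

    With a filter like this in place, an idle system should show no panel traffic in the notifications window even if the device keeps reporting the same value.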
  • NMarkRoberts Junior Member Posts: 455
    Ooooooo!
    adys wrote:
    I have strange problems in this house

    A poltergeist, perhaps?
  • NMarkRoberts Junior Member Posts: 455
    adys wrote:
    Since I set up the main master's URL list with all the rest of the masters, TPs are going online and offline, buttons stop responding, masters are going offline (I can still reach them with telnet and ping, but I must restart them before they work on the AMX network again), some modules are unreachable from some TPs while on other TPs everything is OK, and the debugger is not working. It simply doesn't work anymore in my code :(

    Adys, you are spinning on this one, you need to take a deep breath and stop trying to fix everything at once.

    Start by disconnecting / switching off / unplugging / commenting out everything except one controller, one touchpanel, one control module, one cable, one device and one device transport module eg DVD. Don't skip the "one cable" bit. You can leave the touchpanel as it is.

    Now thoroughly test this mini-system, watch the LEDs, switch on ALL notifications, and deal with any unnecessary traffic.

    Now add one more device module / cable / device and test, then the next, until you have a cleanly running single room. By now you will probably have found the problem(s) - it could be any of the things you had initially removed.

    Now do the next room in the same way.

    The BIG advantage of this approach is that at every moment you have something to show for your work - something that's OK and you can demo and feel good about.
    the masters running those modules have the output LED on all the time, not even blinking...

    That is a very bad sign, you need to deal with it, the best way is to unplug that module as described above.
    Is it right to set the URL list only on one main master? I must have everyone talk with everyone, and I didn't find another way to do it without creating a loop...

    I believe that you only need to set it in one master, and everything becomes visible to everything else. Others can confirm - I've never done this.

    Have you double checked that each master has a different system number?

    Do you have lots and lots of debug chatter in your control module, that you can switch off and on at runtime? You need a Debug function which does a "send_string 0" depending on the value of a boolean which you can set on the touchpanel or wherever. Switch it on to see what's going on, switch it off to reduce "send_string 0" traffic which uses up an astonishing amount of processor time. (Don't believe me? Try it.)

    I'd say "best of luck" but with planning and a methodical approach you don't need luck.
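
    A rough sketch of the kind of Debug function described above, assuming a hypothetical touchpanel dvTP and a hypothetical button channel 255 used as the toggle. The names nDebug and fnDebug are made up for illustration.

    ```netlinx
    DEFINE_DEVICE
    dvTP = 10001:1:0            // hypothetical touchpanel

    DEFINE_VARIABLE
    VOLATILE INTEGER nDebug     // 0 = quiet (default), 1 = print debug output

    // Every debug message goes through this one function,
    // so a single flag silences all "send_string 0" traffic.
    DEFINE_FUNCTION fnDebug(CHAR sMsg[])
    {
        IF (nDebug)
        {
            SEND_STRING 0, "'DEBUG: ', sMsg"
        }
    }

    DEFINE_EVENT
    BUTTON_EVENT[dvTP, 255]     // hypothetical debug toggle button
    {
        PUSH:
        {
            nDebug = !nDebug
            [dvTP, 255] = nDebug    // button feedback shows the current state
        }
    }
    ```

    Code elsewhere then calls fnDebug("'volume = ', ITOA(nVol)") instead of SEND_STRING 0 directly (nVol being whatever value you want to watch), and the chatter can be switched off at runtime without recompiling.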
  • John Gonzales Junior Member Posts: 609
    Do you have lots and lots of debug chatter in your control module, that you can switch off and on at runtime? You need a Debug function which does a "send_string 0" depending on the value of a boolean which you can set on the touchpanel or wherever. Switch it on to see what's going on, switch it off to reduce "send_string 0" traffic which uses up an astonishing amount of processor time. (Don't believe me? Try it.)

    This is very true. If you have systems updating touchpanel data and sending lots of debug info, it won't take long to crash. Then the system comes back on when the debug data and panel updates slow down or stop flowing -- hopefully. What you may see in the extended diagnostics is a ton of data starting to flow before the disconnect occurs.

    What might be easier than commenting out a lot of lines and devices at first might be to just remove all of the entries from the URL list. This should take all of about 10 seconds. Then you'll have 5 mini systems running and you can run diagnostics on each one to see which one might be the offending one. If that doesn't go anywhere, then I would start going through code and commenting stuff out.

    Hope you're able to find the cause.

    --John
  • GSLogic Original Member Posts: 562
    DHawthorne wrote:
    Just because AMX wrote the module, doesn't mean it's OK ... in fact, they are notorious for UI modules that are not optimized for large systems. You will probably have to modify every overly chatty module so that it only sends updates to the panels when it needs to, instead of all the time like the default UI modules do.

    I routinely turn on all notifications in NetLinx Diagnostics, and watch what is being passed around. When the system is idle, the only notifications you should be getting are comms on devices - no panel updates.
    Well said!
    Can we write this in stone somewhere?
  • adys Junior Member Posts: 395
    Thanks Guys!


    1. All debug messages are commented out. I didn't know they take up CPU, but it's a habit of mine to remove extra noise when it's not needed.

    2. When the URL list is removed from the main master, everything works well. I still need to test it for a while to see if it's stable, but the problems started when I connected all the systems with the URL list... there are no duplicated systems, IPs, or URL lists.

    3. I guess I was too naive... I thought AMX modules were ready for any system... the first thing I am going to do is remove those 2 modules from my system.
    Those are my main suspects for jamming the communication with the main master.

    Thanks for all the info, I will do as recommended and update as soon as I will get results.


    BTW - if a master gets stuck for any reason, will I get an offline event after some time?
    I want to listen for the offline event of my main master... but I am talking about cases like the power going out, the network being lost, etc...
    Thanks
  • REBUILD_EVENT Junior Member Posts: 127
    My teacher always said: "If you have something you don't understand, reduce it until you have something you understand, then reintroduce, step by step, all the remaining difficulties until you have the original problem." I'd cast that in stone...

    - check that you don't have two devices (e.g. touchpanels) with the same device ID, otherwise they'd go online and offline all the time.

    - start with wired TPs and leave the wireless TPs aside for the beginning.

    - check that you don't do combine_devices twice with the same things (maybe in a module?), as that causes the master to hang with the input LED ON instead of flashing, if I remember correctly....
  • adys Junior Member Posts: 395
    update 1:

    I rewrote the ARCAM module and removed (for now) the Denon DVD module.

    For now, all problems are gone!

    I think that both modules created a message queue overflow in the AMX system...

    I also made some volume control improvements to save messages.

    Thanks for all the help, I will test it in the coming days and update with results.

    2 questions:

    Is a level update different from a channel update? (for volume use - I am talking about performance)

    Is it better to use a timeline event than to put a "wait" in DEFINE_PROGRAM?

    Thanks again

    Ady
  • DHawthorne Junior Member Posts: 4,584
    Level updates and channel updates are handled mostly the same, near as I can figure. The master does track them internally if they are not virtual, and only sends updates if the data actually changes. This won't help you, however, if the data changes too frequently and that is what is causing the traffic.

    I generally prefer timelines over a wait, but sometimes I get lazy and use the wait anyway. Timelines are more precise, and can't collide with each other like waits might if they aren't named. Timelines are also much more flexible, and can be used to stagger events more easily rather than doing everything on each pass.
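
    As an illustration of the staggering idea, here is a minimal timeline sketch. The ID TL_POLL, the array lPollTimes and the three one-second steps are assumptions picked for the example, not anything from the system discussed in this thread.

    ```netlinx
    DEFINE_CONSTANT
    LONG TL_POLL = 1                            // hypothetical timeline ID

    DEFINE_VARIABLE
    LONG lPollTimes[] = {1000, 1000, 1000}      // milliseconds between steps

    DEFINE_START
    // Relative timing: each entry is measured from the previous step.
    // TIMELINE_REPEAT restarts the sequence after the last step.
    TIMELINE_CREATE(TL_POLL, lPollTimes, 3, TIMELINE_RELATIVE, TIMELINE_REPEAT)

    DEFINE_EVENT
    TIMELINE_EVENT[TL_POLL]
    {
        SWITCH (TIMELINE.SEQUENCE)
        {
            CASE 1: { /* poll the first device  */ }
            CASE 2: { /* poll the second device */ }
            CASE 3: { /* refresh panel feedback */ }
        }
    }
    ```

    Each piece of work gets its own one-second slot, so nothing runs on every mainline pass and the three jobs never fire together.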
  • adys Junior Member Posts: 395
    Thanks

    I don't fully understand how wait works... I am coming from C++ and a multi-threaded environment, and things work differently there.

    Given the following code:

    DEFINE_PROGRAM

    WAIT 10
    {
    // Do XXX
    }

    In every cycle of DEFINE_PROGRAM a new WAIT is born?
    So I have a huge amount of WAITs in the air?

    I know it works, and in reality it's doing what I need, but I don't understand how, or whether I'm wasting a lot of resources...
  • jweather Junior Member Posts: 320
    adys wrote:
    In every cycle of DEFINE_PROGRAM a new WAIT is born?
    So I have a huge amount of WAITs in the air?

    I know it works, and in reality it's doing what I need, but I don't understand how, or whether I'm wasting a lot of resources...


    See http://www.amxforums.com/showthread.php?t=2908 regarding waits. Only one instance of each wait can be active at a time, so putting one in your define_program only starts a single wait. Each time through the mainline, it checks to see if the wait is still running -- if it is, it ignores the statement, otherwise it starts it again. When the wait expires, the statement runs. So, it's a simple way to say "do this every 5 seconds" or whatever interval you want without adding a timeline.

    Jeremy
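
    For reference, a small sketch of the "every 5 seconds" case described above. The wait name 'POLL STATUS' is made up for the example, and naming the wait is optional here.

    ```netlinx
    DEFINE_PROGRAM

    // WAIT times are in tenths of a second, so 50 = 5 seconds.
    // While this wait is pending, mainline skips over the statement;
    // after it has fired, the next pass through mainline starts it again.
    WAIT 50 'POLL STATUS'
    {
        // Do XXX
    }
    ```

    Giving the wait a name keeps it distinct from other waits in the program and lets it be referenced elsewhere, for example with CANCEL_WAIT 'POLL STATUS'.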
  • adys Junior Member Posts: 395
    Thanks

    We need to write an AMX book... too many undocumented features
  • ericmedley Senior Member - 4000+ posts Posts: 4,159
    adys wrote:
    Thanks

    We need to write an AMX book... too many undocumented features

    They give you a book 'of sorts' when you go to AMX training. I very rarely ever look at it. But I'm not a reference book kinda guy. So, I'm not a very good example. I usually either bump around in the help file or come here.