
Need help/critique on a system slowdown

I have a simple system with a 3100, a Precis 8x8, and a Virtual Keypad to run the system. The client is having an issue where, every two to three weeks, the system slows to a crawl. I have been digging around and tweaking things but have been unable to isolate the issue.
Logging into the system and checking the error logs shows no errors. The last time the system was running slow, CPU usage was at 1.57% with a max of 5%.

Would anyone be willing to look over my workspace and tell me if they can see any bugs that could cause this kind of slowdown?
Please excuse the messy code; I have been tweaking it a bit over the last few weeks.


Thanks
Jason

Comments

  • Looking at it, the first thing I see is that you have a standard NetLinx module attached to a Duet virtual device. From what I understand, the Duet virtual devices (41001 to 43 something) are treated differently by the processor and so might cause issues. I would suggest changing that device to a standard virtual device, maybe 33001, and seeing if that helps.
  • jjames Posts: 2,908
    I thought the Virtual Keypad took two virtual devices, not a virtual device and a real IP device; at least, that's what I remember reading when it first came out several years ago. I'd be careful with the number of persistent variables you have, especially in the AutoPatch include. IMO, persistent variables are for storing presets and anything else that absolutely must survive program changes, etc.

    Otherwise, I'm not spotting anything out of the ordinary. Possibly the Virtual Keypad itself is the issue? That's hard to test, especially since it's your only interface to the program.
  • Thanks for the replies and your time, guys. IIRC, the device definitions (device numbers, etc.) were cut and pasted directly from the Virtual Keypad module and example file. I will try changing them.
    What about my parsing of IP data in the GET_IP module? I am currently parsing the data in the offline event. Should I move this to a string event?
  • jjames Posts: 2,908
    As a rule of thumb, keep it in the offline event. If you parse in the string event, you could *potentially* be parsing incomplete data, since a reply can arrive split across several string events.
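To illustrate why parsing on close is safer, here is a minimal Python sketch (not NetLinx, and not the poster's GET_IP module) of the same pattern: append each incoming chunk to a buffer in the "string" handler and parse only in the "offline" handler, once the server has closed the connection. The reply format (an IPv4 address somewhere in the text), the class name `BufferedReply`, and the 4096-character cap are all assumptions for illustration.

```python
import re

# Hypothetical reply format: an IPv4 address somewhere in the response text.
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


class BufferedReply:
    """Collect a TCP reply chunk by chunk and parse it only when the socket closes."""

    def __init__(self, max_len=4096):
        # Keep at most the last max_len characters so a runaway reply cannot grow unbounded.
        self.max_len = max_len
        self.buf = ""

    def on_string(self, chunk):
        # STRING-event analogue: append only; the reply may still be incomplete.
        self.buf = (self.buf + chunk)[-self.max_len:]

    def on_offline(self):
        # OFFLINE-event analogue: the server has closed, so the reply is complete.
        match = IPV4.search(self.buf)
        self.buf = ""  # reset for the next request
        return match.group(1) if match else None
```

If the reply arrives as `"Current IP: 203.0.113.7"` followed by `"5\r\n"`, parsing the first chunk alone yields the wrong address `203.0.113.7`, while parsing on close yields `203.0.113.75`.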
  • UPDATE

    Client called today: the system had slowed to a crawl.
    Site visit:
    Rebooted the managed network switch: no joy.
    Rebooted the overloaded router: no joy.
    Telnetted to the master:
    CPU usage 0.5%
    Memory was all good (I should have copied/pasted the numbers)
    Duet memory = 12 MB
    clean disk -f, then reboot

    Reloaded the code after changing some PERSISTENT variables to NON_VOLATILE or VOLATILE.
    Back to normal operation.

    I suspect it has to do with timed events, since the weekly cycle is pretty predictable.
    Any other ideas?
  • jjames Posts: 2,908
    Just curious: have you tried loading it on a different master to see if it does the same thing? You could also implement a logging feature that writes to disk or emails you the log daily. At the bare minimum, log the date and time of each button press, and ask the client to note when the system starts to slow down.
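The logging suggestion mostly comes down to a consistent line format and one file per day. The Python sketch below shows only that format logic; on the master itself this would be written with NetLinx file I/O. The function names, the `presses_YYYYMMDD.log` naming scheme, and the panel string `10001:1:0` are hypothetical.

```python
from datetime import datetime


def log_line(when, panel, button):
    """One entry per press: date and time first, so lines sort chronologically."""
    return f"{when:%Y-%m-%d %H:%M:%S} panel={panel} button={button}"


def log_file_name(when):
    """One file per calendar day, so a daily email job can pick up yesterday's file."""
    return f"presses_{when:%Y%m%d}.log"


def append_press(path, when, panel, button):
    """Append one entry; opening in 'a' mode never truncates earlier entries."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(log_line(when, panel, button) + "\n")
```

When the client reports a slowdown, the gaps or bursts in the day's file can be lined up against the time they noticed it.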
  • Joe Hebert Posts: 2,159
    PHSJason wrote: »
    I have a simple system with a 3100, a precis 8x8, and virtual keypad to run the system. Client is having an issue where every couple to few weeks, the system slows to a crawl.
    Next time the system bogs down, telnet into the master and run show buffers from the command prompt.

    When the system is *idle* all buffers should be empty and if there are messages in the buffers they should clear out quickly (within a few seconds or so.) If you have any messages permanently stuck in the buffer (sometimes it only takes 1) the system will slow down to a crawl.

    What firmware version are you running? I have a small job with an NI-3100 and code that ran fine for about a year until the NI was upgraded to the latest version (NI Master=v.3.60.453, Device=v.1.30.8). After the upgrade, the system exhibited the same type of problems you are describing.

    When I did a *show buffers*, I saw that messages were queued up in the Axlink buffer and climbing (into the hundreds), even though nothing was physically attached to the Axlink bus. AMX confirmed that there should be 0 messages in the Axlink queue if nothing is attached to the Axlink bus, even if you have Axlink device numbers defined in code.

    AMX has a hotfix for this issue (3.60.455), but after it was applied the NI-3100 was still having problems. So the NI-3100 was swapped out for an NI-3101 running the latest released firmware, and we saw the same Axlink buffer problem. The tech onsite was about to apply the hotfix to the NI-3101 when, for no apparent reason, the Axlink buffer problem disappeared. It was late in the day, so the NI-3101 was left as is without the hotfix.

    I don’t know if any of this applies to your situation so take it for what it’s worth.

    Since you are running a Duet module, I would check the Duet memory partition. I would also do a show max buffers after a reboot and confirm that nothing looks out of the ordinary.
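The rule described above (on an idle system every queue should drain within a few seconds, and a message that never clears is a red flag) can be checked mechanically if you take repeated readings of the queue counts. The Python sketch below assumes you have already turned each show buffers reading into a `{queue_name: count}` dictionary with a timestamp; that scraping step, the function name `stuck_queues`, and the 5-second threshold are assumptions, not AMX tooling.

```python
def stuck_queues(samples, max_drain_s=5.0):
    """
    samples: time-ordered list of (seconds, {queue_name: message_count})
    taken while the system is idle.
    Returns the sorted names of queues that stayed non-empty longer than max_drain_s.
    """
    nonzero_since = {}  # queue name -> time it was first seen non-empty
    stuck = set()
    for t, counts in samples:
        for name, count in counts.items():
            if count > 0:
                start = nonzero_since.setdefault(name, t)
                if t - start > max_drain_s:
                    stuck.add(name)
            else:
                # The queue drained, so restart its clock.
                nonzero_since.pop(name, None)
    return sorted(stuck)
```

With the Axlink symptom described above (counts climbing from a handful into the hundreds with nothing on the bus), the Axlink queue is flagged, while a queue that empties between readings is not.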
  • Thanks for the advice.

    SHOW BUFFERS = no messages in queues or waiting
    SHOW MAX BUFFERS =
    [screenshot: show max buffers output]

    Device info:
    [screenshot: device info]
  • DHawthorne Posts: 4,584
    Does "msg on all" from Telnet offer any insight when it's bogged down? There may be a lot of chatter from some devices or modules that you don't normally see.

    Also take a look at any IP devices (if you have them ... as usual, I'm racing through the forums before hitting the road and don't have time to look at your workspace right now); I had a similar case that turned out to be an errant IP-RS-232 device that periodically lost connection, then bogged down the master in its attempts to re-connect. When I switched it to straight RS-232, the problem went away.
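The fix above was to drop the IP link entirely. If an IP device has to stay, a common general technique (not something described in this thread) is to space out reconnect attempts with exponential backoff, so a device that keeps dropping cannot keep the master busy with reconnect attempts. A minimal Python sketch follows; the class name and the 2-second base and 300-second cap are assumptions.

```python
class ReconnectPolicy:
    """Exponential backoff for reconnect attempts: wait longer after each failure, up to a cap."""

    def __init__(self, base_s=2.0, cap_s=300.0):
        self.base_s = base_s
        self.cap_s = cap_s
        self._next = base_s

    def next_delay(self):
        """Return the wait before the next attempt, then double it for the attempt after."""
        delay = self._next
        self._next = min(self._next * 2, self.cap_s)
        return delay

    def on_connected(self):
        """A successful connect resets the schedule to the base delay."""
        self._next = self.base_s
```

The cap keeps a long outage from pushing retries out indefinitely, and the reset means a device that comes back is retried quickly the next time it drops.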
  • PHSJason Posts: 66
    Update:

    Had to do another reboot this morning.

    Here are the numbers from before the reboot (I forgot to do a show max buffers):
    >show mem
    Display Memory
    Volatile Free : 22139480/67108864 (largest free block in bytes/max physical)
    NonVolatile Free: 1039840/1047536 (bytes free/max physical)
    Disk Free :502071296/512196608 (bytes of free space/max physical)
    Duet Memory Free : 7942684 (bytes)
    Partition 1 - 7942684 (bytes)
    Total Collections - 5
    Average Time Between Collections - 38582490ms
    Partition 2 - <UNKNOWN>
    >cpu usage
    CPU usage = 0.10% (30 sec. average = 0.01%, 30 sec. max = 0.10%)


    After a reboot, it shows this:
    >show mem
    Display Memory
    Volatile Free : 22723032/67108864 (largest free block in bytes/max physical)
    NonVolatile Free: 1039840/1047536 (bytes free/max physical)
    Disk Free :502071296/512196608 (bytes of free space/max physical)
    Duet Memory Free : 0 (bytes)
    Partition 1 - <UNKNOWN>
    Partition 2 - <UNKNOWN>
    >cpu usage
    CPU usage = 0.00% (30 sec. average = 0.61%, 30 sec. max = 17.64%)
    >show max buffers
    Show Max Buffers

    Thread TX RX
    ---- ----
    Axlink 1
    UDP 1
    IPCon Mgr 1 (Total for TCP Connections TX=3)

    Con Manager 31
    Interpreter 18
    Device Mgr 34
    Diag Mgr 1
    Msg Dispatc 0
    Cfg Mgr 0
    Route Mgr 0
    Notify Mgr 0
    Java Router 0
    ---- ---- ----
    Total 2 85 GrandTotal 87
    >
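Comparing show mem snapshots over days is easier if the numbers are pulled out programmatically. The Python sketch below parses the `label : free/max` lines and the Duet line in the format pasted above, and expresses each pair as a percentage. The function names are hypothetical. Note that, per the console's own annotation, the volatile figure is the *largest free block*, not total free memory.

```python
import re

# "Volatile Free : 22139480/67108864 (...)" -> label, free, max
PAIR = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<free>\d+)/(?P<max>\d+)")
DUET = re.compile(r"Duet Memory Free\s*:\s*(\d+)")


def parse_show_mem(text):
    """Return {label: (free, max, percent)} for every 'label : free/max' line."""
    result = {}
    for line in text.splitlines():
        m = PAIR.match(line)
        if m:
            free, total = int(m["free"]), int(m["max"])
            # For "Volatile Free" this is the largest free block as a % of physical.
            result[m["label"]] = (free, total, round(100 * free / total, 1))
    return result


def duet_free(text):
    """Return the 'Duet Memory Free' byte count, or None if the line is absent."""
    m = DUET.search(text)
    return int(m.group(1)) if m else None
```

Run against the two snapshots above, this gives a largest free volatile block of about 33.0% of physical before the reboot and 33.9% after, and Duet free memory of 7942684 bytes before versus 0 after.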
  • Most of my systems (not just AMX) have a reboot(0) at 4am every day just so I know for a fact that each morning the system is starting up in a known state.
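A scheduled reboot needs a guard so that a check running every second or minute fires only once. The Python sketch below shows one way to write that decision logic. The function name, the 5-minute window, and `last_reboot_day` are assumptions. In particular, `last_reboot_day` would have to be kept in storage that survives the reboot; if it resets on startup, the master would come back up inside the window and reboot again.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional


def should_reboot(now: datetime, last_reboot_day: Optional[date],
                  at: time = time(4, 0), window_min: int = 5) -> bool:
    """
    True only inside [at, at + window_min) and only if we have not already
    rebooted today. The window stops a master that boots at 10:00 from
    rebooting immediately; last_reboot_day stops repeats within the window.
    """
    start = datetime.combine(now.date(), at)
    in_window = start <= now < start + timedelta(minutes=window_min)
    return in_window and last_reboot_day != now.date()
```
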
  • PHSJason Posts: 66
    I just logged in again, lowered the Duet memory setting, and disabled the NDP beacon. This is a network I have no control over, and guest devices are coming online all the time. I had locked the master down through security and disabled SSH, UDP broadcast, etc., but didn't know about the NDP beacon.
  • When the system "slows to a crawl" is that just perceived at the UI or is it present when you directly manipulate device control from Diagnostics or Terminal control?
  • a_riot42 Posts: 1,624
    To debug this, I would write channel events for the IP devices with a channel of 0, print a string to the console in the on and off clauses, and see whether anything is going on with the IP devices.
    Paul
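The same idea can go one step further: besides printing each on/off transition, count how often a device drops so a flapping connection stands out from the noise. The Python sketch below treats each on/off report as an online/offline transition, which is an assumption. The class name, the device label `IP-232`, and the threshold of 3 drops in 600 seconds are also hypothetical.

```python
from collections import defaultdict, deque


class LinkWatch:
    """Print every ONLINE/OFFLINE transition and flag devices that drop repeatedly."""

    def __init__(self, max_drops=3, window_s=600):
        self.max_drops = max_drops
        self.window_s = window_s
        self.drops = defaultdict(deque)  # device -> times of recent OFFLINE events

    def event(self, t, device, online):
        """Log one transition; return True if the device is flapping."""
        print(f"{t:>8.1f}s {device} {'ONLINE' if online else 'OFFLINE'}")
        if online:
            return False
        q = self.drops[device]
        q.append(t)
        # Forget drops that are older than the sliding window.
        while q and t - q[0] > self.window_s:
            q.popleft()
        return len(q) >= self.max_drops
```

A device like the errant IP-RS-232 box described earlier in the thread would trip the flag after its third drop inside the window.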
  • PHSJason Posts: 66
    icraigie wrote: »
    When the system "slows to a crawl" is that just perceived at the UI or is it present when you directly manipulate device control from Diagnostics or Terminal control?

    Both: the UI is slow, and so are responses in Telnet.