Need help/critique on a system slowdown
PHSJason
Posts: 66
I have a simple system with a 3100, a Precis 8x8, and a virtual keypad to run the system. The client is having an issue where, every few weeks, the system slows to a crawl. I have been digging around and tweaking things but have been unable to isolate the issue.
Logging into the system and checking the error logs shows no errors. The last time the system was running slow, the CPU usage was at 1.57% with a max of 5%.
Would anyone be willing to look over my workspace and tell me if they can see any bugs that could cause this kind of slowdown?
Please excuse the code mess, I have been tweaking it a bit in the last few weeks.
Thanks
Jason
Otherwise, I'm not spotting anything out of the ordinary. Possibly the Virtual Keypad might be an issue? That's kind of hard to test, especially since it's your only interface to the program.
What about my parsing of IP data in the GET_IP module? I am currently parsing the data in the offline event. Should I move this to a string event?
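To picture the difference between the two approaches, here is a rough sketch in Python (not NetLinx, and not taken from the GET_IP module) of parsing data as each chunk arrives, the way a string event would, instead of holding everything until the connection drops. The `LineBuffer` class, the CR/LF delimiter, and the sample data are all assumptions made up for illustration.

```python
class LineBuffer:
    """Accumulates incoming chunks and hands back complete lines.

    Hypothetical helper: the delimiter is an assumption, not something
    taken from the actual module.
    """

    def __init__(self, delimiter=b"\r\n"):
        self.delimiter = delimiter
        self.pending = b""  # bytes received but not yet terminated

    def feed(self, chunk):
        """Add a chunk; return the list of lines completed by it."""
        self.pending += chunk
        # Everything before the last delimiter is complete; the tail waits.
        *complete, self.pending = self.pending.split(self.delimiter)
        return complete


buf = LineBuffer()
first = buf.feed(b"IP=192.0.")        # no full line yet
second = buf.feed(b"2.10\r\nMASK=")   # completes the first line
print(first, second, buf.pending)
```

Handled this way, the pending buffer stays small and nothing depends on the offline event firing at the right moment.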
The client called today. The system had slowed to a crawl.
Site visit:
Rebooted the managed network switch, no joy.
Rebooted the overloaded router, no joy.
Telnet to master:
cpu usage 0.5%
memory was all good (I should have copied/pasted the numbers)
duet mem = 12m
clean disk -f + reboot
Reloaded the code, having changed some persistent variables to non-volatile or volatile.
Back to normal operation.
I think that it has to do with timed events as the weekly cycle is pretty predictable...
Any other ideas?
When the system is *idle*, all buffers should be empty, and if there are messages in the buffers they should clear out quickly (within a few seconds or so). If you have any messages permanently stuck in a buffer (sometimes it only takes one), the system will slow down to a crawl.
What firmware version are you running? I have a small job with an NI-3100 and code that was running fine for about a year until the NI was upgraded to the latest version (NI Master=v.3.60.453, Device=v.1.30.8). After the upgrade, the system exhibited the same type of problems you are describing.
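As a rough illustration of that check, here is a small Python sketch that takes a few readings of the queue counts, taken seconds apart while the system is idle, and reports any queue that never drained. The readings are hypothetical values typed in by hand, and `find_stuck_queues` is a made-up helper, not an AMX tool.

```python
def find_stuck_queues(samples):
    """Return the queues that were non-empty in every sample.

    `samples` is a list of dicts mapping a queue name to its pending
    message count, read a few seconds apart while the system is idle.
    """
    if not samples:
        return []
    stuck = []
    for name in samples[0]:
        counts = [sample.get(name, 0) for sample in samples]
        if all(count > 0 for count in counts):
            stuck.append(name)
    return stuck


# Hypothetical readings from three idle checks.
idle_samples = [
    {"Axlink": 0, "Interpreter": 2, "Device Mgr": 1},
    {"Axlink": 0, "Interpreter": 0, "Device Mgr": 1},
    {"Axlink": 0, "Interpreter": 0, "Device Mgr": 1},
]
print(find_stuck_queues(idle_samples))  # ['Device Mgr']
```

A queue that briefly holds messages and then empties (Interpreter here) is normal; one that never reaches zero is the one worth chasing.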
When I did a *show buffers* I saw that messages were queued up in the Axlink buffer and that they were climbing (in the hundreds) and there was nothing physically attached to the Axlink bus. AMX confirmed that there should be 0 messages in the Axlink queue if nothing is attached to the Axlink bus even if you have Axlink device numbers defined in code.
AMX has a hotfix for this issue (3.60.455), but when that hotfix was applied the NI-3100 was still having problems. So the NI-3100 was swapped out for an NI-3101 running the latest released firmware, and we saw the same Axlink buffer problem. The tech onsite was about to apply the hotfix to the NI-3101 when, all of a sudden and for no apparent reason, the Axlink buffer problem disappeared. It was late in the day, so the NI-3101 was left as is without the hotfix.
I don’t know if any of this applies to your situation so take it for what it’s worth.
Since you are running a Duet module, I would check the Duet memory partition, and I would also do a show max buffers after a reboot and confirm that nothing looks out of the ordinary.
SHOW BUFFERS = no messages in queues or waiting
SHOW MAX BUFFERS =
Device info:
Also take a look at any IP devices (if you have them ... as usual, I'm racing through the forums before hitting the road and don't have time to look at your workspace right now); I had a similar case that turned out to be an errant IP-RS-232 device that periodically lost connection, then bogged down the master in its attempts to re-connect. When I switched it to straight RS-232, the problem went away.
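If swapping to straight RS-232 isn't an option, one common way to keep reconnect attempts from hammering the master is to space them out with a capped exponential backoff. This is not what I did on that job, just a generic sketch in Python; the function names, delays, and the fake connection are all assumptions for illustration.

```python
def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=8):
    """Yield the wait in seconds before each retry, doubling up to a cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def reconnect(connect, sleep, delays):
    """Try connect() until it succeeds or the delays run out."""
    for wait in delays:
        if connect():
            return True
        sleep(wait)  # back off before the next attempt
    return False


# Hypothetical device that comes back on the fourth attempt.
results = iter([False, False, False, True])
waits = []
ok = reconnect(lambda: next(results), waits.append, backoff_delays())
print(ok, waits)  # True [1.0, 2.0, 4.0]
```

The sleep function is passed in so the retry logic can be exercised without actually waiting.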
Had to do another reboot this morning.
Here are the before-reboot numbers (forgot to do a max buffers):
>show mem
Display Memory
Volatile Free : 22139480/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
Duet Memory Free : 7942684 (bytes)
Partition 1 - 7942684 (bytes)
Total Collections - 5
Average Time Between Collections - 38582490ms
Partition 2 - <UNKNOWN>
>cpu usage
CPU usage = 0.10% (30 sec. average = 0.01%, 30 sec. max = 0.10%)
After a reboot, it shows this:
>show mem
Display Memory
Volatile Free : 22723032/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
Duet Memory Free : 0 (bytes)
Partition 1 - <UNKNOWN>
Partition 2 - <UNKNOWN>
>cpu usage
CPU usage = 0.00% (30 sec. average = 0.61%, 30 sec. max = 17.64%)
>show max buffers
Show Max Buffers
Thread TX RX
---- ----
Axlink 1
UDP 1
IPCon Mgr 1 (Total for TCP Connections TX=3)
Con Manager 31
Interpreter 18
Device Mgr 34
Diag Mgr 1
Msg Dispatc 0
Cfg Mgr 0
Route Mgr 0
Notify Mgr 0
Java Router 0
---- ---- ----
Total 2 85 GrandTotal 87
>
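For comparing the two show mem dumps above without eyeballing them, here is a small Python sketch that pulls out the free/max pairs and prints the change. It only assumes the line layout seen in the output pasted in this thread; `parse_show_mem` is a made-up helper name.

```python
import re

# Matches lines such as "Volatile Free : 22139480/67108864 (...)".
PAIR = re.compile(r"^[ \t]*([A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(\d+)/(\d+)", re.M)


def parse_show_mem(text):
    """Return {label: (free_bytes, max_bytes)} for each free/max line."""
    return {label: (int(free), int(total))
            for label, free, total in PAIR.findall(text)}


before = parse_show_mem("""\
Volatile Free : 22139480/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
""")
after = parse_show_mem("""\
Volatile Free : 22723032/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
""")

for label, (free, total) in before.items():
    delta = after[label][0] - free
    print(f"{label}: {delta:+d} bytes ({100 * delta / total:.2f}% of max)")
```

On these numbers, only the volatile figure moves (583552 bytes, under 1% of the 64 MB maximum), and that figure is the largest free block, not total free memory.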
Paul
Both the UI and slow responses in telnet.