Need help/critique on a system slowdown

PHSJason Posts: 66

December 2011 in NetLinx Studio

I have a simple system with a 3100, a precis 8x8, and virtual keypad to run the system. Client is having an issue where every couple to few weeks, the system slows to a crawl. I have been digging around and tweaking things but have been unable to isolate the issue.
Logging into the system and checking error logs shows no errors. Last time the system was running slow, the CPU usage was at 1.57% with a max 5%.

Would anyone be willing to look over my workspace and tell me if they can see any bugs that can cause this kind of slow down?
Please excuse the code mess, I have been tweaking it a bit in the last few weeks.

Thanks
Jason

Blacksmith Workspace 2.AXW 3.6M

Comments

Andrew G Welker Posts: 124

December 2011

Looking at it, the first thing I see is that you have a standard NetLinx module that is attached to a Duet virtual device. From what I understand, the Duet virtual devices, 41001 to 43 something, are treated differently by the processor and so might cause issues. I would suggest changing that device to a standard virtual device, maybe 33001, to see if that helps with anything.

jjames Posts: 2,908

December 2011

I thought the virtual keypad took two virtual devices, not a virtual and a real IP device; at least this is what I remember reading when it first came out several years ago. I'd be careful with the number of persistent variables you have, especially in the AutoPatch include. IMO, persistent variables are useful for storing presets and anything you absolutely must have stored after program changes, etc.

Otherwise, I'm not spotting anything out of the ordinary. Possibly the Virtual Keypad might be an issue? Kind of hard to test especially since your only interface to the program is that.

PHSJason Posts: 66

December 2011

Thanks for the replies and time guys. IIRC, the device definitions such as device numbers etc were cut/pasted directly from the virtual keypad module and example file. I will try changing them.
What about my parsing of IP data in the GET_IP module? I am currently parsing the data in the offline event. Should I move this to a string event?

jjames Posts: 2,908

December 2011

As a rule of thumb, keep it in the offline event. You could *potentially* be parsing incomplete data while doing it in the string event.
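
To illustrate the point about incomplete data, here is a minimal Python sketch (not NetLinx, and not the code from the GET_IP module). The class name, the reply text, and the address are all hypothetical. It assumes the server closes the connection once it has sent its whole reply, which is what makes the offline event a safe place to parse.

```python
import re

# Matches a dotted-quad IPv4 address such as 203.0.113.42.
IP_PATTERN = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def parse_chunk(chunk):
    """Try to pull an address out of a single chunk (the risky approach)."""
    match = IP_PATTERN.search(chunk)
    return match.group(1) if match else None


class ResponseCollector:
    """Buffers incoming chunks and parses only when the connection closes."""

    def __init__(self):
        self._chunks = []
        self.result = None

    def on_string(self, chunk):
        # Analogous to a string event: store the data, do not parse yet.
        self._chunks.append(chunk)

    def on_offline(self):
        # Analogous to the offline event: the socket is closed, so the
        # buffer now holds the complete reply.
        self.result = parse_chunk("".join(self._chunks))
        self._chunks.clear()


# Hypothetical reply that happens to be split in the middle of the address.
chunks = ["Current IP Address: 203.0.", "113.42\r\n"]

collector = ResponseCollector()
for chunk in chunks:
    collector.on_string(chunk)
collector.on_offline()

print([parse_chunk(c) for c in chunks])  # neither chunk alone contains a full address
print(collector.result)                  # the joined buffer does
```

Parsing each chunk on arrival finds nothing here, while parsing the joined buffer after the close recovers the address. If the remote end keeps the socket open, this approach never fires and you would need a terminator or length check instead.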

PHSJason Posts: 66

December 2011

UPDATE

Client called today. System had slowed to a crawl.
Site visit:
Rebooted the managed network switch, no joy.
Rebooted the overloaded router, no joy.
Telnet to master:
CPU usage 0.5%
Memory was all good (I should have copied/pasted the numbers)
duet mem = 12M
clean disk -f + reboot

Reloaded the code after changing some persistent variables to non-volatile or volatile.
Back to normal operation.

I think that it has to do with timed events as the weekly cycle is pretty predictable...
Any other ideas?

jjames Posts: 2,908

December 2011

Just curious - have you tried loading it on a different master to see if it does the same thing? Possibly try implementing a logging feature and write it to disk, or email you the log daily. At the bare minimum, log the button presses' date & time, and ask the client to keep a close eye as to when it starts to slow down.

Joe Hebert Posts: 2,159

December 2011

PHSJason wrote: »

I have a simple system with a 3100, a precis 8x8, and virtual keypad to run the system. Client is having an issue where every couple to few weeks, the system slows to a crawl.

Next time the system bogs down, telnet into the master and from the command prompt do a *show buffers*.

When the system is *idle* all buffers should be empty and if there are messages in the buffers they should clear out quickly (within a few seconds or so.) If you have any messages permanently stuck in the buffer (sometimes it only takes 1) the system will slow down to a crawl.

What firmware version are you running? I have a small job with an NI-3100 and code that was running fine for about a year until the NI was upgraded to the latest version (NI Master=v.3.60.453, Device=v.1.30.8). After the upgrade, the system exhibited the same type of problems you are describing.

When I did a *show buffers* I saw that messages were queued up in the Axlink buffer and that they were climbing (in the hundreds) and there was nothing physically attached to the Axlink bus. AMX confirmed that there should be 0 messages in the Axlink queue if nothing is attached to the Axlink bus even if you have Axlink device numbers defined in code.

AMX has a hotfix for this issue (3.60.455) but when that hotfix was applied the NI-3100 was still having problems. So the NI-3100 was swapped out for a NI-3101 running the latest released firmware and we saw the same Axlink buffer problem. The tech onsite was about to apply the hotfix to the NI-3101 when all of a sudden for no apparent reason the Axlink buffer problem disappeared. It was late in the day so the NI-3101 was left as is without the hotfix.

I don’t know if any of this applies to your situation so take it for what it’s worth.

Since you are running a Duet module, I would check the Duet memory partition, and I would also do a *show max buffers* after a reboot and confirm that nothing looks out of the ordinary.

PHSJason Posts: 66

December 2011

Thanks for the advice.

SHOW BUFFERS = no messages in queues or waiting
SHOW MAX BUFFERS =

Device info:

DHawthorne Posts: 4,584

December 2011

Does "msg on all" from Telnet offer any insight when it's bogged down? There may be a lot of chatter from some devices or modules that you don't normally see.

Also take a look at any IP devices (if you have them ... as usual, I'm racing through the forums before hitting the road and don't have time to look at your workspace right now); I had a similar case that turned out to be an errant IP-RS-232 device that periodically lost connection, then bogged down the master in its attempts to re-connect. When I switched it to straight RS-232, the problem went away.

PHSJason Posts: 66

August 2012

Update:

Had to do another reboot this morning.

Here are the before-reboot numbers (forgot to do a max buffers):
>show mem
Display Memory
Volatile Free : 22139480/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
Duet Memory Free : 7942684 (bytes)
Partition 1 - 7942684 (bytes)
Total Collections - 5
Average Time Between Collections - 38582490ms
Partition 2 - <UNKNOWN>
>cpu usage
CPU usage = 0.10% (30 sec. average = 0.01%, 30 sec. max = 0.10%)

After a reboot, it shows this:
>show mem
Display Memory
Volatile Free : 22723032/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
Duet Memory Free : 0 (bytes)
Partition 1 - <UNKNOWN>
Partition 2 - <UNKNOWN>
>cpu usage
CPU usage = 0.00% (30 sec. average = 0.61%, 30 sec. max = 17.64%)
>show max buffers
Show Max Buffers

Thread TX RX
---- ----
Axlink 1
UDP 1
IPCon Mgr 1 (Total for TCP Connections TX=3)

Con Manager 31
Interpreter 18
Device Mgr 34
Diag Mgr 1
Msg Dispatc 0
Cfg Mgr 0
Route Mgr 0
Notify Mgr 0
Java Router 0
---- ---- ----
Total 2 85 GrandTotal 87
>
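
For comparing the two `show mem` dumps above without eyeballing them, a small Python sketch like the following could be run on a laptop against text copied from the telnet session. The function names and the regular expression are assumptions for illustration, not part of any AMX tool, and only the three `free/max` lines are parsed.

```python
import re

# Lines copied from the "show mem" output before and after the reboot.
BEFORE = """\
Volatile Free : 22139480/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
"""

AFTER = """\
Volatile Free : 22723032/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1039840/1047536 (bytes free/max physical)
Disk Free :502071296/512196608 (bytes of free space/max physical)
"""

# Matches "<name> : <free>/<max>" with optional spaces around the colon.
LINE = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s*:\s*(?P<free>\d+)/(?P<total>\d+)",
    re.MULTILINE,
)


def parse_show_mem(text):
    """Return {name: (free_bytes, max_bytes)} for every free/max line."""
    return {
        m["name"].strip(): (int(m["free"]), int(m["total"]))
        for m in LINE.finditer(text)
    }


def free_delta(before, after):
    """Change in free bytes per memory area (after minus before)."""
    return {
        name: after[name][0] - before[name][0]
        for name in before
        if name in after
    }


delta = free_delta(parse_show_mem(BEFORE), parse_show_mem(AFTER))
for name, change in delta.items():
    print(f"{name}: {change:+d} bytes")
```

On these numbers the only change is in volatile memory, where the largest free block grew by 583552 bytes after the reboot (under 1% of the 64 MB maximum), while non-volatile and disk are unchanged. Note that the volatile figure is the largest free block, not total free memory, so it is only a rough indicator.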

regallion Posts: 95

August 2012

Most of my systems (not just AMX) have a reboot(0) at 4am every day just so I know for a fact that each morning the system is starting up in a known state.

PHSJason Posts: 66

August 2012

I just logged in again, set my duet mem lower and disabled the NDP beacon. This is a network that I have no control over and there are guest devices popping online at all times. I had locked the master down through security, disabled SSH, UDP BC, etc. Didn't know about the NDP beacon.

HARMAN_icraigie Posts: 660

August 2012

When the system "slows to a crawl" is that just perceived at the UI or is it present when you directly manipulate device control from Diagnostics or Terminal control?

a_riot42 Posts: 1,624

August 2012

To debug this I would write channel events for the IP devices with a channel of 0, and then print to the console a string in the on/off clauses and see if there is any issue going on with the IP devices.
Paul

PHSJason Posts: 66

August 2012

icraigie wrote: »

When the system "slows to a crawl" is that just perceived at the UI or is it present when you directly manipulate device control from Diagnostics or Terminal control?

Both the UI and slow responses in telnet.
