Startup trouble with the Interpreter buffer

sphere27 · May 2012

I have been going around and around for the last few days trying to figure out why I'm overrunning the interpreter buffer on startup.

I've bumped the threshold up to 2000 and it's still exceeding it... sometimes.

That's the strange part. I commented out different parts of code (subsystems, room controls, etc) and watched the performance at startup. I thought that I'd be able to isolate who was building up all the cmds in the interpreter. To an extent I could see a change as I removed sections but the results are inconclusive.

What makes it even more frustrating is that I can load the exact same program and on one load the max that the interpreter gets to is ~1500, and on the next it goes all the way to 2k. So that tells me it's environmental. I have device holdoff on.

Has anyone else wrestled with this before?

I realize it's a loaded question that's based on how my code is written and a ton of other details, but in searching the forums for 'interpreter' I couldn't find any relevant threads.

It also appears that sometimes when the interpreter overruns the ICSNet becomes locked up and there are 2 (always 2) stuck messages in the TX buffer for ICSNet.

It's a pretty big system for me (25 TPs, Duet-driven AutoPatch with 32 zones, 8 MET-ECOMs).

For everything I use a virtual device to interface with the physical device. So for the 20 cable boxes I have 20 modules passing in virtual_dev, physical_dev, tp_dev[]

Am I maybe using too many virtual devices?

sphere27 · May 2012

Just a reply to my own desperate email.

It appears the problem is somehow network related. If I take the processor off the network and reboot, then put it back on the network, it comes up fine. Since I don't really want to live behind the rack to do the unplugging every time a load or reboot happens, I investigated further with tech support.

Setting the processor to a fixed 100 full duplex mode and setting the network port on the switch to the same mode helped. But then I noticed that every once in a while the processor wouldn't come back on the network after reboot. Setting the port on the switch to auto-negotiate (but only allowing 100 full duplex connections) seems to have gotten me back in the game.

Joy

DHawthorne · May 2012

It's more than likely your physical devices bogging down on startup. ICSNet devices, especially, take forever to come online and respond properly. In most cases, once they catch up, everything is fine, but it does play heck with any operations attempted before they do. I've got a mid-sized job right now that stalls somewhat on startup because I have a half-dozen of the new EXB boxes, and it seems randomly one or more of them is slow coming up and backs up the message queue.

I don't think it's your virtual devices at all. Quick test: temporarily comment devices out and see what happens. My money is on the MET-ECOMs.

If you've done everything you can do in code (like stagger delays in intensive startup and online ops), there's nothing for it but to wait it out. In really egregious cases, I suppose you could put a page up on your panels that says "Please Wait while system initializes," then take it down after a time period you know is safe.
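A minimal sketch of the "Please Wait" idea, under stated assumptions: the panel address dvTP1, the popup name 'Please Wait', and the 60-second hold-off are all hypothetical and would need to match the actual panel file and system. It uses a popup rather than a full page, and only shows it to panels that come online before the hold-off has expired.

```netlinx
PROGRAM_NAME='startup_holdoff_sketch'

DEFINE_DEVICE
dvTP1 = 10001:1:0   // hypothetical touch panel address

DEFINE_VARIABLE
VOLATILE INTEGER nSystemReady   // 0 until the startup hold-off has expired

DEFINE_START
nSystemReady = 0
// 600 tenths of a second = 60 s; an assumed "safe" period, tune per system
WAIT 600 'StartupHoldoff'
{
    nSystemReady = 1
    SEND_COMMAND dvTP1, 'PPOF-Please Wait'   // assumed popup name in the panel file
}

DEFINE_EVENT
DATA_EVENT[dvTP1]
{
    ONLINE:
    {
        // Only cover the panel while the master is still settling
        IF (!nSystemReady)
        {
            SEND_COMMAND dvTP1, 'PPON-Please Wait'
        }
    }
}

DEFINE_PROGRAM
```

The same named WAIT could also gate any heavy online-event initialization, so that work is deferred until the queue has had a chance to drain.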

sphere27 · May 2012

Thanks for the response. The effects of this are more disruptive than a long boot up time. When the interpreter queues up to 2000 it dumps all its pending messages. When that happens the system seems to come up in a screwy way.

Devices will all show online but some things might not work. For instance, the autopatch module won't have gotten its re-init cmd and therefore won't allow any cmds through its virtual device. Stuff works, but some stuff doesn't.

I thought that I had this licked but the issue is back. I will try commenting out devices to see what happens but one of the things that's so frustrating about troubleshooting this problem is that the behavior is different even if I don't change code but just reboot and watch it through the boot cycle.

Thanks,

Jimi

Jimweir192 · May 2012

I had this exact issue recently and it near killed me trying to isolate this.

In my case the culprit was a very chatty HVAC system operating via a Canbus to serial interface, with a complex parsing routine that would kill the message queue even with code to delay updates to the 12 TPs as they came online for the first time after a master reboot.

The solution that worked was to run the power for the canbus interface via a relay and only energise this once the masters were all back online and settled. The HVAC data could then be dealt with in a normal manner.

sphere27 · May 2012

Very interesting. I have been toying with the idea of killing the network to the processor through a relay but that is obviously a desperate attempt.

I have yet to isolate what is causing this but I pulled 10 of the 25 touchpanels and am using an NI900 to manage those connections. So far it seems to have kept me within specs for the bootup. Once up, the processor runs perfectly.

It is a relief to hear that someone has seen this and it's not just me. But I would still like to know what the heck is really at the root. Pulling the MET-ECOMs and HVAC devices from code didn't fix it.

sphere27 · October 2012

A follow-up on this issue.

With the help of AMX I was able to finally get to a real solution for this problem. We determined that if I disconnected the processor from the network during startup, it came up fine.

So they suggested that I populate my URL list through code instead of NetLinx Studio. Instead of putting URL entries in the primary master (I tried it the other way too, with URLs only in the secondary masters), I put this code in for all the URLs I wanted to create on startup. Another advantage of doing it this way is that in the event of a processor failure, the URL list wouldn't have to be configured by the tech on site.

DEFINE_DEVICE
dvMaster = 0:1:0 // Self

Somewhere in code:

STACK_VAR SLONG slADDRESULT
STACK_VAR URL_STRUCT AMX_URL
AMX_URL.Flags = 17 // 17 denotes temporary, TCP connection
AMX_URL.Port = 1319
AMX_URL.URL = '192.168.240.170' // IP of remote master
slADDRESULT = ADD_URL_ENTRY(dvMaster, AMX_URL)

I never knew about this feature and was wondering where I would reference it besides tech support.

Andrew G Welker · October 2012

If you look in the NetLinx Keywords help file from NetLinx Studio, you can find a lot more information on what you can do with the URL list commands and how to use them. You can search for ADD_URL_ENTRY, DELETE_URL_ENTRY, or GET_URL_LIST in the index, or in contents, expand NetLinx Keywords, then IP Keywords. That should give you all the information you need to use these commands in the future.
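For anyone landing here later, a rough sketch of one way the ADD_URL_ENTRY call from the earlier post could be wrapped so that the result gets logged. The helper name fnAddMasterUrl is hypothetical, the Flags, Port, and IP values are copied from that post, and treating any non-zero return as a failure is an assumption that should be checked against the help file.

```netlinx
PROGRAM_NAME='url_list_sketch'

DEFINE_DEVICE
dvMaster = 0:1:0   // the local master, as in the earlier post

// Hypothetical helper: adds one URL entry and logs a non-zero result
DEFINE_FUNCTION SLONG fnAddMasterUrl(CHAR cAddress[])
{
    STACK_VAR URL_STRUCT uEntry
    STACK_VAR SLONG slResult

    uEntry.Flags = 17        // value taken from the earlier post
    uEntry.Port  = 1319      // port taken from the earlier post
    uEntry.URL   = cAddress

    slResult = ADD_URL_ENTRY(dvMaster, uEntry)
    IF (slResult <> 0)
    {
        // Assumption: non-zero means the entry was not added
        SEND_STRING 0, "'ADD_URL_ENTRY failed for ', cAddress, ' result=', ITOA(slResult)"
    }
    RETURN slResult
}

DEFINE_START
fnAddMasterUrl('192.168.240.170')   // IP of remote master from the earlier post

DEFINE_PROGRAM
```

With several remote masters, the helper would simply be called once per address, which keeps the whole URL list in source control alongside the rest of the program.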

sphere27 · October 2012

Thank you for the direction
