Startup trouble with the Interpreter buffer
sphere27
Posts: 40
I have been going around and around for the last few days trying to figure out why I'm overrunning the interpreter buffer on startup.
I've bumped the threshold up to 2000 and it's still exceeding it... sometimes.
That's the strange part. I commented out different parts of the code (subsystems, room controls, etc.) and watched performance at startup, thinking I'd be able to isolate what was building up all the commands in the interpreter. To an extent I could see a change as I removed sections, but the results were inconclusive.
What makes it even more frustrating is that I can load the exact same program, and on one load the max the interpreter reaches is ~1500, while on the next it goes all the way to 2k. That tells me it's environmental. I have device holdoff on.
Has anyone else wrestled with this before?
I realize it's a loaded question that depends on how my code is written and a ton of other details, but in searching the forums for 'interpreter' I couldn't find any relevant threads.
It also appears that sometimes when the interpreter overruns, ICSNet locks up and there are 2 (always 2) stuck messages in the ICSNet TX buffer.
It's a pretty big system for me (25 TPs, a Duet-driven autopatch with 32 zones, 8 MET-Ecoms).
For everything I use a virtual device to interface with the physical device. So for the 20 cable boxes I have 20 modules, each passing in virtual_dev, physical_dev, tp_dev[].
Am I maybe using too many virtual devices?
Comments
It appears that the problem is somehow network related. If I take the processor off the network, reboot, and then put it back on the network, it comes up fine. Since I don't really want to live behind the rack to do the unplugging every time a load or reboot happens, I investigated further with tech support.
Setting the processor to fixed 100 Mbps full duplex and setting the network port on the switch to the same mode helped. But then I noticed that every once in a while the processor wouldn't come back on the network after a reboot. Setting the switch port to auto-negotiate (but only allowing 100 full-duplex connections) seems to have gotten me back in the game.
Joy
I don't think it's your virtual devices at all. Quick test: temporarily comment the devices out and see what happens. My money is on the MET-Ecoms.
If you've done everything you can in code (like staggering delays in intensive startup and online operations), there's nothing for it but to wait it out. In really egregious cases, I suppose you could put a page up on your panels that says "Please wait while the system initializes," then take it down after a time period you know is safe.
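For what it's worth, the staggered-delay idea above can be sketched in NetLinx roughly like this. Everything here is illustrative: the device addresses, dvTP_All, the page names, and the wait times are placeholders, not anything from the original system.

DEFINE_DEVICE
dvTP_1 = 10001:1:0      // example panel addresses
dvTP_2 = 10002:1:0

DEFINE_VARIABLE
VOLATILE DEV dvTP_All[] = { dvTP_1, dvTP_2 }    // your panel array

DEFINE_START
// Park every panel on a holding page while the system settles.
SEND_COMMAND dvTP_All, "'PAGE-Initializing'"

// Stagger the heavy startup work instead of firing it all at once.
// WAIT times are in tenths of a second; the values here are guesses --
// tune them to whatever you know is safe for your system.
WAIT 300    // 30 s: kick off the first subsystem re-init
{
    // re-init subsystem A here
}
WAIT 600    // 60 s: assume startup traffic has drained by now
{
    SEND_COMMAND dvTP_All, "'PAGE-Main'"
}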
Devices will all show online, but some things might not work. For instance, the autopatch module won't have gotten its re-init command and therefore won't allow any commands through its virtual device. Some stuff works, some doesn't.
I thought I had this licked, but the issue is back. I will try commenting out devices to see what happens, but one of the things that's so frustrating about troubleshooting this problem is that the behavior differs even when I don't change any code; I just reboot and watch it through the boot cycle.
Thanks,
Jimi
In my case the culprit was a very chatty HVAC system operating via a CANbus-to-serial interface, with a complex parsing routine that would kill the message queue even with code in place to delay updates to the 12 TPs as they came online for the first time after a master reboot.
The solution that worked was to run the power for the CANbus interface through a relay and only energise it once the masters were all back online and settled. The HVAC data could then be dealt with normally.
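The relay trick described above might look something like this in NetLinx. The device address, relay channel, and settle time are assumptions for illustration, not the poster's actual values.

DEFINE_DEVICE
dvRelays = 5001:8:0     // relay card feeding the CANbus interface (example address)

DEFINE_CONSTANT
CANBUS_PWR = 1          // relay channel wired in-line with the interface's supply

DEFINE_START
OFF[dvRelays, CANBUS_PWR]   // keep the chatty interface powered down during boot

// Energise it only after the masters have had time to come online
// and settle; two minutes here is a guess -- tune to your system.
WAIT 1200
{
    ON[dvRelays, CANBUS_PWR]
}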
I have yet to isolate what is causing this, but I pulled 10 of the 25 touch panels and am using an NI-900 to manage those connections. So far that seems to have kept me within spec during bootup. Once up, the processor runs perfectly.
It is a relief to hear that someone has seen this and it's not just me, but I would still like to know what the heck is really at the root. Pulling the MET-Ecoms and HVAC devices from the code didn't fix it.
With the help of AMX I was finally able to get to a real solution for this problem. We determined that if I disconnected the processor from the network during startup, it came up fine.
So they suggested that I populate my URL list through code instead of NetLinx Studio. Instead of putting URL entries in the primary master (I tried it the other way too, with URLs only in the secondary masters), I put in this code for all the URLs I wanted to create on startup. Another advantage of doing it this way is that in the event of a processor failure, the URL list wouldn't have to be reconfigured by the tech on site.
DEFINE_DEVICE
dvMaster = 0:1:0    // the local master itself

// Somewhere in code (e.g. in DEFINE_START or an online event):
STACK_VAR SLONG slADDRESULT
STACK_VAR URL_STRUCT AMX_URL

AMX_URL.Flags = 17                  // 17 denotes a temporary, TCP connection
AMX_URL.Port  = 1319                // ICSP port
AMX_URL.URL   = '192.168.240.170'   // IP of the remote master

slADDRESULT = ADD_URL_ENTRY(dvMaster, AMX_URL)

IF (slADDRESULT < 0)                // a negative result indicates failure
{
    SEND_STRING 0, "'ADD_URL_ENTRY failed: ', ITOA(slADDRESULT)"
}
I never knew about this feature and was wondering where I could read up on it besides going through tech support.