AddInternalEvent queue blowout
NMarkRoberts
in AMX Hardware
Somewhere in my code for a sizeable system I'm overloading a NetLinx processor (I tried it with an NI700, NI2000 and NI4000, non-Duet and a quick try on Duet) causing a series of [telnet / msg on] messages as follows:
CIpInterpreter::Run - Execute Startup Code
CIpEvent::AddInternalEvent - Max Queue Count = 25
CIpEvent::AddInternalEvent - Max Queue Count = 50
CIpEvent::AddInternalEvent - Max Queue Count = 75
and when it hits 300 it crashes the queue to zero, causing random failures of my startup code.
The message log is similar but different:
(Reader=WDMTCP15 writer=tIPConMgr)- CMessagePipe::Max = 75
(Reader=WDMTCP15 writer=tIPConMgr)- CMessagePipe::Max = 50
(Reader=WDMTCP15 writer=tIPConMgr)- CMessagePipe::Max = 25
I have no idea which queue this is and it does not appear to correlate with any of those shown by "set queue size" or "set threshold". I once watched the queue climb to 200, pause for 40 seconds (!) then continue climbing from 200 (!) to 300 and crash.
I can more or less reliably keep it under control by optimising code, cutting out modules and spacing out the module startups, but I'd rather know what's causing it or how to stretch the queue max.
Any ideas?
Comments
The Reader=WDMTCP15, writer=tIPConMgr means something.
WDMTCP15 is a thread for transmitting data to a G3 Web Control (WDM means Web Device Manager). Since it is the reader, messages are being sent to it but it is not reading them, most likely because messages are not flowing over the network fast enough, or because the G3 Web Control applet is not responding to messages that have been sent.
What are your thresholds and queue sizes set to?
Good thoughts which I shall look into but it turns out G3 web control was a red herring. Chuck has generously maintained a dialogue with me all weekend (and it was a long one) with so far no magic solution.
He suggested that I should make the queue sizes long and the threshold sizes short. This would mean that the interpreter would stall and not overload the queues. So I experimented with the interpreter threshold and discovered that when set to 50 or 200 the queue will crash, but when set to 100 it scrapes through when I do a minimal installation. So my site is working, but not at "full strength". Adding one more room control module (and it's a biggie) blows the queue.
Chuck's latest idea is to set device holdoff, which makes NI devices come online after define_start has run. Good idea but no effect.
He has confirmed that the offending queue is indeed "internal" and cannot be adjusted.
It took me a while to pin this down. In summary, it seems that rebuild_event on a large button array is hard work for the processor and it relies on a queue which is too short in the following circumstances.
If you build a system as follows, the internal event queue blows out and crashes:
1) many (25+) modules with
2) a button_event on a large button array (200ish)
3) any code in the define_program
4) a call to rebuild_event at a timing of less than 5/10 of a second
By "timing" I mean the delay between each successive module's call to rebuild_event.
Here is the relevant code; first the module:
And now the mainline code:
Chuck
Thanks.
The manual says that it is module-specific, or at least that's how I read it.
My resolution was to ensure that I spaced out the rebuild_event calls. I already had this mechanism in place for my module startups but the rebuild_event had fallen through the cracks.
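For anyone wanting to try the same thing, here is a minimal sketch of one way to space out the calls. It is not the code from my system. It assumes a hypothetical module that receives a unique instance number (1, 2, 3...) as a parameter and delays its own rebuild_event by that number of seconds. The module name, parameter names and the one-second spacing are all assumptions chosen for illustration; the only figure taken from this thread is that spacing under 5/10 second caused trouble.

```netlinx
MODULE_NAME='RoomControlSketch' (DEV dvPanels[], INTEGER nButtons[], INTEGER nInstance)
(* Hypothetical module: every name and value here is an assumption. *)

DEFINE_CONSTANT
LONG REBUILD_SPACING = 10        // tenths of a second between modules (1 s)

DEFINE_VARIABLE
VOLATILE LONG lRebuildDelay      // this instance's delay, in tenths of a second

DEFINE_START
// nInstance is assumed to be unique per module instance, so successive
// instances call REBUILD_EVENT() one REBUILD_SPACING apart.
lRebuildDelay = nInstance * REBUILD_SPACING
WAIT lRebuildDelay
{
    REBUILD_EVENT()
}

DEFINE_EVENT
BUTTON_EVENT[dvPanels, nButtons]
{
    PUSH:
    {
        // room control logic would go here
    }
}
```

The mainline program would pass a different variable holding 1, 2, 3 and so on to each DEFINE_MODULE line. With 25+ modules that adds noticeably to the startup time, which is the price of keeping the rebuilds from piling up.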
BTW I asked the very helpful folk at AMX Australia when a rebuild_event is necessary and here is the reply:
The REBUILD_EVENT() function should be called in the following situations:
1) Where an Event is coded for a DEV variable and the Number, Port, and/or System portion of that DEV variable is changed in the code during runtime.
2) Where an Event is coded for an intrinsic data type variable such as an INTEGER, LONG, etc. and the value of that intrinsic data type variable is changed in the code during runtime.
3) Where an Event is coded for an array and the values in that array are changed in code during runtime.
4) Where an Event is coded for an array and the length of the array is changed in code during runtime.
If you never do any of the above then you should not call the REBUILD_EVENT() function.
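To make situations 1 and 3 concrete, here is a small sketch of what I understand them to look like in code. The device addresses, variable names and function names are hypothetical and only there for illustration. The point is that REBUILD_EVENT() follows directly after the change to the variable the event is coded against.

```netlinx
PROGRAM_NAME='RebuildEventSketch'
(* Hypothetical program: device addresses and all names are assumptions. *)

DEFINE_DEVICE
dvPanelA = 10001:1:0
dvPanelB = 10002:1:0

DEFINE_VARIABLE
VOLATILE DEV dvActivePanel                  // situation 1: DEV variable changed at runtime
VOLATILE INTEGER nButtons[] = {1, 2, 3}     // situation 3: array values changed at runtime

// Point the event at a different panel, then rebuild the event table.
DEFINE_FUNCTION fnSelectPanel(DEV dvNew)
{
    dvActivePanel = dvNew
    REBUILD_EVENT()
}

// Change one channel number in the button array, then rebuild.
DEFINE_FUNCTION fnRemapFirstButton(INTEGER nChannel)
{
    nButtons[1] = nChannel
    REBUILD_EVENT()
}

DEFINE_START
fnSelectPanel(dvPanelA)

DEFINE_EVENT
BUTTON_EVENT[dvActivePanel, nButtons]
{
    PUSH:
    {
        SEND_STRING 0, "'Button index ', ITOA(GET_LAST(nButtons)), ' pushed'"
    }
}
```

If dvActivePanel and nButtons never changed after compile time, neither function would be needed and, per the advice above, REBUILD_EVENT() should not be called at all.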
The NetLinx Keyword help file states specifically:
REBUILD_EVENT() works on a module-by-module basis (i.e. calling the function in one module does not affect the event table of another module).
Sorry again for any confusion.
Chuck
Those of you who are subscribed will already have seen this brand new technote. Download the PDF to see the last line of text and a diagram.
Reading all the way to the end (heavy going, I know!) reveals:
This version is Duet only.