AddInternalEvent queue blowout

NMarkRoberts · February 2007

Somewhere in my code for a sizeable system I'm overloading a NetLinx processor (I tried it with an NI700, NI2000 and NI4000, non-Duet and a quick try on Duet) causing a series of [telnet / msg on] messages as follows:

CIpInterpreter::Run - Execute Startup Code
CIpEvent::AddInternalEvent - Max Queue Count = 25
CIpEvent::AddInternalEvent - Max Queue Count = 50
CIpEvent::AddInternalEvent - Max Queue Count = 75

and when it hits 300 it crashes the queue to zero, causing random failures of my startup code.

The message log is similar but different:

(Reader=WDMTCP15 writer=tIPConMgr)- CMessagePipe::Max = 75
(Reader=WDMTCP15 writer=tIPConMgr)- CMessagePipe::Max = 50
(Reader=WDMTCP15 writer=tIPConMgr)- CMessagePipe::Max = 25

I have no idea which queue this is and it does not appear to correlate with any of those shown by "set queue size" or "set threshold". I once watched the queue climb to 200, pause for 40 seconds (!) then continue climbing from 200 (!) to 300 and crash.

I can more or less reliably keep it under control by optimising code, cutting out modules and spacing out the module startups, but I'd rather know what's causing it or how to stretch the queue max.

Any ideas?

cwpartridge · February 2007

The "Reader=WDMTCP15 writer=tIPConMgr" part of the log line tells you which pipe is backing up.
WDMTCP15 is a thread for transmitting data to a G3 Web Control (WDM means Web Device Manager). Since it is the reader, messages are being sent to it but it is not reading them, most likely because messages are not flowing over the network fast enough, or because the G3WebControl applet is not responding to messages that have been sent.

What are your thresholds and queue sizes set to?

DHawthorne · February 2007

It could simply be too many updates to the G3 Web Control device and it can't keep up, or it could be that it's offline and the master didn't pick up on the fact. Revisit the feedback part of your code and make sure you aren't overdoing the messages. The best tool for this is NetLinx Diagnostics; connect and turn on Device Notifications for the device in question. Check all the message types, enable notifications, and watch what happens. Anything that is happening multiple times a second is too much, especially for an inherently slow device like Web Control.

NMarkRoberts · February 2007

DHawthorne wrote:

It could simply be too many updates to the G3 Web Control device and it can't keep up

Good thoughts which I shall look into but it turns out G3 web control was a red herring. Chuck has generously maintained a dialogue with me all weekend (and it was a long one) with so far no magic solution.

He suggested that I should make the queue sizes long and the threshold sizes short. This would mean that the interpreter would stall and not overload the queues. So I experimented with the interpreter threshold and discovered that when set to 50 or 200 the queue will crash, but when set to 100 it scrapes through when I do a minimal installation. So my site is working, but not at "full strength". Adding one more room control module (and it's a biggie) blows the queue.

Chuck's latest idea is to set device holdoff, which makes NI devices come online after define_start has run. Good idea but no effect.

He has confirmed that the offending queue is indeed "internal" and cannot be adjusted.

DHawthorne · February 2007

It may be generated by the master, but it must be associated with a device or it would be happening all the time to everyone. I would still go into diagnostics, but turn on notifications for all devices to see who the offending party is. Devices don't normally have this kind of problem; I strongly suspect you have one that is marginal.

NMarkRoberts · February 2007

Problem pinned down

It took me a while to pin this down. In summary, it seems that rebuild_event on a large button array is hard work for the processor and it relies on a queue which is too short in the following circumstances.

If you build a system as follows, the internal event queue blows out and crashes:

1) many (25+) modules with

2) a button_event on a large button array (200ish)

3) any code in the define_program

4) a call to rebuild_event at timing less than 5/10 second

By "timing" I mean the delay between each successive module's call to rebuild_event.

Here is the relevant code; first the module:

module_name = 'mSandpit' (
  dev     dTouchpanel       ,
  integer mModuleNumber     )      

define_variable

(* This array has to be large *)

integer tpaVeryLarge[] = { 
                              48 ,49 ,50,
	51 ,52 ,53 ,54 ,55 ,56 ,57 ,58 ,59 ,60,
	61 ,62 ,63 ,64 ,65 ,66 ,67 ,68 ,69 ,70,
	71 ,72 ,73 ,74 ,75 ,76 ,77 ,78 ,79 ,80,
	81 ,82 ,83 ,84 ,85 ,86 ,87 ,88 ,89 ,90,
	91 ,92 ,93 ,94 ,95 ,96 ,97 ,98 ,99 ,100,
	101,102,103,104,105,106,107,108,109,110,
	111,112,113,114,115,116,117,118,119,120,
	121,122,123,124,125,126,127,128,129,130,
	131,132,133,134,135,136,137,138,139,140,
	141,142,143,144,145,146,147,148,149,150,
	151,152,153,154,155,156,157,158,159,160,
	161,162,163,164,165,166,167,168,169,170,
	171,172,173,174,175,176,177,178,179,180,
	181,182,183,184,185,186,187,188,189,190,
	191,192,193,194,195,196,197,198,199,200,
	201,202,203,204,205,206,207,208,209,210,
	211,212,213,214,215,216,217,218,219,220,
	221,222,223,224,225,226,227,228,229,230,
	231,232,233,234,235,236,237,238,239,240,
	241,242,243,244,245,246,247,248,249,250,
	251,252,253,254}

(* This button_event has to be there *)

define_event

button_event[dTouchpanel,tpaVeryLarge]                                              
  {
  push:
    {         
    }
  }

(* This code has to be there, even though it does nothing *)

define_program

if (1)
  {
  }
			
define_start

(* 
The rebuild_event has to be there
When spaced by 10 tenths little queue buildup occurs; 
when spaced by 1 tenth the queue crashes.
*)

wait (50 + (mModuleNumber * 5))
  {
  rebuild_event()  
  }

And now the mainline code:

program_name = 'Sandpit'

define_device

dTouchpanel = 128:1:0

define_variable

x1 = 1
x2 = 2
x3 = 3
x4 = 4
x5 = 5
x6 = 6
x7 = 7
x8 = 8
x9 = 9
x10 = 10
x11 = 11
x12 = 12
x13 = 13
x14 = 14
x15 = 15
x16 = 16
x17 = 17
x18 = 18
x19 = 19
x20 = 20 
x21 = 11
x22 = 12
x23 = 13
x24 = 14
x25 = 15
x26 = 16
x27 = 17
x28 = 18
x29 = 19
x30 = 20 

(* There must be 20+ modules for the queue to grow large *)

define_module 'mSandpit' modSandpit1(dTouchpanel,x1)
define_module 'mSandpit' modSandpit2(dTouchpanel,x2)
define_module 'mSandpit' modSandpit3(dTouchpanel,x3)
define_module 'mSandpit' modSandpit4(dTouchpanel,x4)
define_module 'mSandpit' modSandpit5(dTouchpanel,x5)
define_module 'mSandpit' modSandpit6(dTouchpanel,x6)
define_module 'mSandpit' modSandpit7(dTouchpanel,x7)
define_module 'mSandpit' modSandpit8(dTouchpanel,x8)
define_module 'mSandpit' modSandpit9(dTouchpanel,x9)
define_module 'mSandpit' modSandpit10(dTouchpanel,x10)
define_module 'mSandpit' modSandpit11(dTouchpanel,x11)
define_module 'mSandpit' modSandpit12(dTouchpanel,x12)
define_module 'mSandpit' modSandpit13(dTouchpanel,x13)
define_module 'mSandpit' modSandpit14(dTouchpanel,x14)
define_module 'mSandpit' modSandpit15(dTouchpanel,x15)
define_module 'mSandpit' modSandpit16(dTouchpanel,x16)
define_module 'mSandpit' modSandpit17(dTouchpanel,x17)
define_module 'mSandpit' modSandpit18(dTouchpanel,x18)
define_module 'mSandpit' modSandpit19(dTouchpanel,x19)
define_module 'mSandpit' modSandpit20(dTouchpanel,x20)  
define_module 'mSandpit' modSandpit21(dTouchpanel,x21)
define_module 'mSandpit' modSandpit22(dTouchpanel,x22)
define_module 'mSandpit' modSandpit23(dTouchpanel,x23)
define_module 'mSandpit' modSandpit24(dTouchpanel,x24)
define_module 'mSandpit' modSandpit25(dTouchpanel,x25)
define_module 'mSandpit' modSandpit26(dTouchpanel,x26)
define_module 'mSandpit' modSandpit27(dTouchpanel,x27)
define_module 'mSandpit' modSandpit28(dTouchpanel,x28)
define_module 'mSandpit' modSandpit29(dTouchpanel,x29)
define_module 'mSandpit' modSandpit30(dTouchpanel,x30)

cwpartridge · February 2007

Why not do 1 rebuild_event() in the main program, instead of 25 occurrences? I have to verify when I am in the office, but my recollection is that rebuild_event() rebuilds the entire event table, not just 1 particular module. In essence, you are doing the rebuild_event() 24 times too many, which evidently can bog things down.

Chuck

Joe Hebert · February 2007

Was there a resolution to this? Does rebuild_event() rebuild the entire table, not just 1 particular module?

Thanks.

NMarkRoberts · February 2007

Joe Hebert wrote:

Was there a resolution to this? Does rebuild_event() rebuild the entire table, not just 1 particular module?

The manual says that it is module-specific, or at least that's how I read it.

My resolution was to ensure that I spaced out the rebuild_event calls. I already had this mechanism in place for my module startups but the rebuild_event had fallen through the cracks.
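
A minimal sketch of that spacing, reusing the mSandpit module header and the mModuleNumber parameter from the test code earlier in the thread. The one-second step per module is an assumption, taken from the observation that a spacing of ten tenths caused little queue buildup; it is not a documented limit.

```
module_name = 'mSandpit' (
  dev     dTouchpanel   ,
  integer mModuleNumber )

define_start

(* Wait times are in tenths of a second.                       *)
(* Each module waits 5 seconds plus 1 second per module number *)
(* so that no two rebuild_event calls land close together.     *)
(* This assumes every instance is given a distinct number.     *)

wait (50 + (mModuleNumber * 10))
  {
  rebuild_event()
  }
```

If two instances are passed the same module number, their calls coincide again, so the numbers handed to define_module need to be unique for this to help.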

BTW I asked the very helpful folk at AMX Australia when a rebuild_event is necessary and here is the reply:

The REBUILD_EVENT() function should be called in the following
situations:

1) Where an Event is coded for a DEV variable and the Number, Port,
and/or System portion of that DEV variable is changed in the code during
runtime.

2) Where an Event is coded for an intrinsic data type variable such as
an INTEGER, LONG, etc. and the value of that intrinsic data type
variable is changed in the code during runtime.

3) Where an Event is coded for an array and the values in that array are
changed in code during runtime.

4) Where an Event is coded for an array and the length of the array is
changed in code during runtime.

If you never do any of the above then you should not call the
REBUILD_EVENT() function.
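
As an illustration of case 3 in the list above, here is a short sketch in which a value in a button array is changed at runtime and the event table is then rebuilt. The names dvPanel, nButtons and fnRemapButton are hypothetical and the values are arbitrary; this is not code from the AMX reply.

```
define_device

dvPanel = 128:1:0    (* hypothetical touch panel address *)

define_variable

(* hypothetical button array; values are illustrative only *)
volatile integer nButtons[] = { 1, 2, 3 }

(* nIndex is assumed to be within the current array length *)
define_function fnRemapButton (integer nIndex, integer nChannel)
  {
  nButtons[nIndex] = nChannel    (* an array value changes at runtime *)
  rebuild_event()                (* so this module's event table is rebuilt *)
  }

define_event

button_event[dvPanel,nButtons]
  {
  push:
    {
    (* handle the press here *)
    }
  }
```

Code that never changes nButtons after startup would not need the rebuild_event call at all.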

cwpartridge · February 2007

Joe Hebert wrote:

Was there a resolution to this? Does rebuild_event() rebuild the entire table, not just 1 particular module?

Thanks.

As usual lately, my memory fails me. I apologize, again. This was a point of discussion when we talked about implementing REBUILD_EVENT(), and what I stated in my post was not how it was finally implemented. I checked the code, and checked the manual, and NMarkRoberts is correct.

The NetLinx Keyword help file states specifically:
REBUILD_EVENT() works on a module-by-module basis (i.e. calling the function in one module does not affect the event table of another module).

Sorry again for any confusion.

Chuck

NMarkRoberts · May 2007

Technote 824

Those of you who are subscribed will already have seen this brand new technote. Download the PDF to see the last line of text and a diagram.

Reading all the way to the end (heavy going, I know!) reveals:

Log message displaying the increase of the internal* queue of the Interpreter:

Interpreter CIpEvent::AddInternalEvent - Max Queue Count = 250
...
Interpreter CIpEvent::AddInternalEvent - Max Queue Count = 25

*Note: The internal Interpreter queue is for messages being generated by the NetLinx program (waits, timelines, etc...). The upper limit of this queue cannot be modified by the user. However, the upper limit was changed from 300 to 500 in firmware version 3.21.343, to support faster masters.

This version is Duet only.
