Offline and Online messages
AlexArtist
Posts: 51
Hi, i have a situation where we have 7 wireless units connecting to an NI-4100 running firmware v3_41_422.
There are actually many masters on this jobs site and all 8 of them connect to this NI-4100 via wired Lan, all of them point to the NI-4100 expect in one case the NI-4100 points to the other master. All the master use "Route Mode Direct" when connecting to another master. The NI-4100 having the Problem is System2
Anyway i get a lot of OFFLINE/ONLINE errors happening on the job site recently. I don't know what the issue is. Even the wired LAN devices seem to be dropping offline (not just the wireless). At first it was occuring about every 3 minutes some random device would drop offline and come back online, but it seems to have stabalized to about 3 times an hours. Its odd, and i'm looking for advice here, what could be wrong, on the surface the client rarely notices a device doesnt seem to respond, but largely the system seems to work. Below are some of messages i get in telnet from the NI-4100, the MVP-8400 and the MVP-5150s
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
NI-4100 TELNET
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
Welcome to NetLinx v3.41.422 Copyright AMX LLC 2008
>ip status
IP Status
NetLinx IP Connections
TCP Client connected on IP Port 23 LocalPort=17 IP=10.2.52.40 (socket=42)
TCP Client connected on IP Port 2001 LocalPort=12 IP=10.2.52.20 (socket=46)
TCP Client connected on IP Port 2001 LocalPort=13 IP=10.2.52.24 (socket=47)
TCP Client connected on IP Port 2001 LocalPort=14 IP=10.2.52.28 (socket=48)
TCP Client connected on IP Port 2001 LocalPort=15 IP=10.2.52.32 (socket=49)
TCP Client connected on IP Port 2001 LocalPort=16 IP=10.2.52.36 (socket=50)
TCP Client connected on IP Port 1632 LocalPort=10 IP=10.2.52.49 (socket=53)
TCP Client connected on IP Port 2001 LocalPort=11 IP=10.2.52.16 (socket=45)
TCP Client connected on IP Port 23 LocalPort=19 IP=10.2.52.44 (socket=41)
>show tcp
Show TCP List
The following TCP connections exist(ed):
1: IP=10.2.52.19:1370 Socket=24 (Active)
Last[MsgsInQ=0 RecvTime=0168704413 SendTime=0168704411]
2: IP=10.2.52.41:3602 Socket=25 (Active)
Last[MsgsInQ=0 RecvTime=0168704406 SendTime=0168704517]
3: IP=10.2.52.39:3602 Socket=26 (Active)
Last[MsgsInQ=0 RecvTime=0168704406 SendTime=0168704517]
4: IP=10.2.52.52:1026 Socket=27 (Active)
Last[MsgsInQ=0 RecvTime=0168705122 SendTime=0168705122]
5: IP=10.2.52.30:34305 Socket=29 (Active)
Last[MsgsInQ=0 RecvTime=0168704407 SendTime=0168704517]
6: IP=10.2.52.43:1061 Socket=31 (Active)
Last[MsgsInQ=0 RecvTime=0168704407 SendTime=0168704518]
7: IP=10.2.52.38:34417 Socket=33 (Active)
Last[MsgsInQ=0 RecvTime=0168704413 SendTime=0168704518]
8: IP=10.2.52.46:1026 Socket=34 (Active)
Last[MsgsInQ=0 RecvTime=0168704414 SendTime=0168704519]
9: IP=10.2.52.10:1319 Socket=32 (Active)
Last[MsgsInQ=0 RecvTime=0168708111 SendTime=0168708111]
10: IP=10.2.52.22:43857 Socket=36 (Active)
Last[MsgsInQ=0 RecvTime=0168704408 SendTime=0168704519]
11: IP=10.2.52.18:60448 Socket=39 (Active)
Last[MsgsInQ=0 RecvTime=0168704409 SendTime=0168704519]
12: IP=10.2.52.34:58684 Socket=40 (Active)
Last[MsgsInQ=0 RecvTime=0168704409 SendTime=0168704520]
13: IP=10.2.52.26:35642 Socket=51 (Active)
Last[MsgsInQ=0 RecvTime=0168704408 SendTime=0168704520]
14: IP=172.16.6.197:1145 Socket=43 (Active)
Last[MsgsInQ=0 RecvTime=0168704422 SendTime=0168704520]
>msg stats
Message Stats
Messages since boot:
ICSP messages rx from <10.2.52.19> = 73922
ICSP messages rx from <10.2.52.41> = 33695
ICSP messages rx from <10.2.52.39> = 33722
ICSP messages rx from <10.2.52.52> = 34647
ICSP messages rx from <10.2.52.30> = 375
ICSP messages rx from <10.2.52.43> = 33750
ICSP messages rx from <10.2.52.38> = 8747
ICSP messages rx from <10.2.52.46> = 32250
ICSP messages rx from <10.2.52.10> = 145124
ICSP messages rx from <10.2.52.22> = 1378
ICSP messages rx from <10.2.52.18> = 10338
Interpreter messages = 368612
ICSP messages rx from <10.2.52.34> = 2725
ICSP messages rx from <10.2.52.26> = 372
ICSP messages rx from <172.16.6.197> = 513
Collecting message statistics over 10 seconds. Please wait...
Messages in last 10 seconds:
ICSP messages rx from <10.2.52.19> = 4
ICSP messages rx from <10.2.52.41> = 2
ICSP messages rx from <10.2.52.39> = 2
ICSP messages rx from <10.2.52.52> = 4
ICSP messages rx from <10.2.52.30> = 2
ICSP messages rx from <10.2.52.43> = 2
ICSP messages rx from <10.2.52.38> = 2
ICSP messages rx from <10.2.52.46> = 2
ICSP messages rx from <10.2.52.10> = 5
ICSP messages rx from <10.2.52.22> = 2
ICSP messages rx from <10.2.52.18> = 2
Interpreter messages = 0
ICSP messages rx from <10.2.52.34> = 2
ICSP messages rx from <10.2.52.26> = 2
ICSP messages rx from <172.16.6.197> = 2
>show buffers max
Show Buffers
Thread TX RX Queued
---- ---- ----
Axlink 0
ICSNet 2 0 0-Sent=NO Waiting=NO
PhastLink 0 0 0-Sent=NO Waiting=NO
UDP 0 0-Sent=NO Waiting=NO
IPCon Mgr 0
Lontalk 0
LontalkHost 0
Con Manager 0
Interpreter 0
Device Mgr 0
Diag Mgr 0
Msg Dispatc 0
Cfg Mgr 0
Route Mgr 0
Notify Mgr 0
Java Router 0
---- ---- ----
Total 2 0 0 GrandTotal 2
>show buffers
Show Buffers
Thread TX RX Queued
---- ---- ----
Axlink 0
ICSNet 0 0 0-Sent=NO Waiting=NO
PhastLink 0 0 0-Sent=NO Waiting=NO
UDP 0 0-Sent=NO Waiting=NO
IPCon Mgr 0
Lontalk 0
LontalkHost 0
Con Manager 0
Interpreter 0
Device Mgr 0
Diag Mgr 0
Msg Dispatc 0
Cfg Mgr 0
Route Mgr 0
Notify Mgr 0
Java Router 0
---- ---- ----
Total 0 0 0 GrandTotal 0
>show mem
Display Memory
Volatile Free : 25392208/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1036814/1047536 (bytes free/max physical)
Disk Free :504913920/512196608 (bytes of free space/max physical)
Duet Memory Free : 2711856 (bytes)
Partition 1 - 2711856 (bytes)
Total Collections - 8
Average Time Between Collections - 18083683ms
Partition 2 - <UNKNOWN>
>cpu usage
CPU usage = 99.36% (30 sec. average = 98.84%, 30 sec. max = 99.85%)
>show mem
Display Memory
Volatile Free : 25392208/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1036814/1047536 (bytes free/max physical)
Disk Free :504913920/512196608 (bytes of free space/max physical)
Duet Memory Free : 2711856 (bytes)
Partition 1 - 2711856 (bytes)
Total Collections - 8
Average Time Between Collections - 18083683ms
Partition 2 - <UNKNOWN>
>show log all
Message Log for System 2 Version: v3.41.422
Entry Date/Time Object
Text
"CICSPTCP Rx connection to 10.2.52.26 has been closed locally or by peer"
"Closing connection to 10.2.52.26 due to transmission failure to system 2 device 0"
"Closing connection to 10.2.52.30 due to transmission failure to system 2 device 10014"
"Closing connection to 10.2.52.26 due to transmission failure to system 2 device 10013"
"Closing connection to 10.2.52.22 due to transmission failure to system 2 device 10012"
"Closing connection to 10.2.52.34 due to transmission failure to system 2 device 0"
"Closing connection to 10.2.52.34 due to transmission failure to system 2 device 10015"
"Closing connection to 10.2.52.38 due to transmission failure to system 2 device 0"
"Closing connection to 10.2.52.38 due to transmission failure to system 2 device 10016"
"Closing connection to 10.2.52.18 due to transmission failure to system 2 device 10011"
etc... etc... There are many Offline and Online messages that go with the lines above, but they are redundant.
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
Telnet Info from one of the MVP-8400s
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
11: 01-01-1970 THU 00:01:01 SocketError
Socket connect attempt failed: No route to host (errno=0x94)
>show connection stats
Connection Statistics
Total Last 15 Minutes
===== ===============
ICSP Messages: 1040263 received 4781 received
1040257 processed 4781 processed
6 dropped 0 droppped
Blink Messages: 31735 received 158 received
9 missed 0 missed
IP Statistics:
RX packets:297736 errors:0 dropped:0 overruns:0 frame:0
TX packets:225209 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:68678575 (65.4 Mb) TX bytes:16145201 (15.3 Mb)
Wireless Statistics:
Mode:Managed Frequency:2.437 GHz Access Point: 00:1A:1E:D0:B7:81
Link Quality=70/100 Signal level=84/255 Noise level=170/152
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:1205418
Tx excessive retries:36981 Invalid misc:84154 Missed beacon:0
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
Telnet Info from one of the MVP-5150s
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
ICSP Messages: 545314 received 302 received
545193 processed 302 processed
121 dropped 0 droppped
Blink Messages: 250900 received 141 received
145 missed 0 missed
IP Statistics:
RX packets:813411 errors:0 dropped:0 overruns:0 frame:0
TX packets:825983 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:195389452 (186.3 MiB) TX bytes:103906823 (99.0 MiB)
Wireless Statistics:
Mode:Managed Frequency:2.437 GHz Access Point: 00:1A:1E:D0:B7:81
Link Quality:76/100 Signal level:-49 dBm Noise level:-122 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0
>show connection log
Connection Log
Entry Date/Time Event
Text
1: 08-07-2009 FRI 17:56:22 MasterUrlConnect
Connected to (Sys=2) Master 10.2.52.45 (URL Mode)
2: 08-07-2009 FRI 17:56:21 SocketClosure
ICSPTCP End
3: 08-07-2009 FRI 17:56:18 MasterUrlConnect
Disconnected from (Sys=0) Master 10.2.52.45 (URL Mode)
4: 08-07-2009 FRI 17:56:18 MasterUrlConnect
Disconnected from (Sys=0) Master 10.2.52.45 (URL Mode)
5: 08-07-2009 FRI 17:56:18 SocketError
ICSP TCP Socket read error: iBytesRead==0
6: 08-07-2009 FRI 17:56:18 IcspMsgTimeout
Closing connection due to ICSP Message timeout
7: 08-07-2009 FRI 17:56:15 BlinkMissTimeout
Blink Messages Missed
8: 08-07-2009 FRI 17:56:01 BlinkMissTimeout
Blink Messages Missed
9: 08-07-2009 FRI 09:14:00 MasterUrlConnect
Connected to (Sys=2) Master 10.2.52.45 (URL Mode)
10: 08-07-2009 FRI 09:14:00 SocketClosure
ICSPTCP End
There are actually many masters on this jobs site and all 8 of them connect to this NI-4100 via wired Lan, all of them point to the NI-4100 expect in one case the NI-4100 points to the other master. All the master use "Route Mode Direct" when connecting to another master. The NI-4100 having the Problem is System2
Anyway i get a lot of OFFLINE/ONLINE errors happening on the job site recently. I don't know what the issue is. Even the wired LAN devices seem to be dropping offline (not just the wireless). At first it was occuring about every 3 minutes some random device would drop offline and come back online, but it seems to have stabalized to about 3 times an hours. Its odd, and i'm looking for advice here, what could be wrong, on the surface the client rarely notices a device doesnt seem to respond, but largely the system seems to work. Below are some of messages i get in telnet from the NI-4100, the MVP-8400 and the MVP-5150s
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
NI-4100 TELNET
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
Welcome to NetLinx v3.41.422 Copyright AMX LLC 2008
>ip status
IP Status
NetLinx IP Connections
TCP Client connected on IP Port 23 LocalPort=17 IP=10.2.52.40 (socket=42)
TCP Client connected on IP Port 2001 LocalPort=12 IP=10.2.52.20 (socket=46)
TCP Client connected on IP Port 2001 LocalPort=13 IP=10.2.52.24 (socket=47)
TCP Client connected on IP Port 2001 LocalPort=14 IP=10.2.52.28 (socket=48)
TCP Client connected on IP Port 2001 LocalPort=15 IP=10.2.52.32 (socket=49)
TCP Client connected on IP Port 2001 LocalPort=16 IP=10.2.52.36 (socket=50)
TCP Client connected on IP Port 1632 LocalPort=10 IP=10.2.52.49 (socket=53)
TCP Client connected on IP Port 2001 LocalPort=11 IP=10.2.52.16 (socket=45)
TCP Client connected on IP Port 23 LocalPort=19 IP=10.2.52.44 (socket=41)
>show tcp
Show TCP List
The following TCP connections exist(ed):
1: IP=10.2.52.19:1370 Socket=24 (Active)
Last[MsgsInQ=0 RecvTime=0168704413 SendTime=0168704411]
2: IP=10.2.52.41:3602 Socket=25 (Active)
Last[MsgsInQ=0 RecvTime=0168704406 SendTime=0168704517]
3: IP=10.2.52.39:3602 Socket=26 (Active)
Last[MsgsInQ=0 RecvTime=0168704406 SendTime=0168704517]
4: IP=10.2.52.52:1026 Socket=27 (Active)
Last[MsgsInQ=0 RecvTime=0168705122 SendTime=0168705122]
5: IP=10.2.52.30:34305 Socket=29 (Active)
Last[MsgsInQ=0 RecvTime=0168704407 SendTime=0168704517]
6: IP=10.2.52.43:1061 Socket=31 (Active)
Last[MsgsInQ=0 RecvTime=0168704407 SendTime=0168704518]
7: IP=10.2.52.38:34417 Socket=33 (Active)
Last[MsgsInQ=0 RecvTime=0168704413 SendTime=0168704518]
8: IP=10.2.52.46:1026 Socket=34 (Active)
Last[MsgsInQ=0 RecvTime=0168704414 SendTime=0168704519]
9: IP=10.2.52.10:1319 Socket=32 (Active)
Last[MsgsInQ=0 RecvTime=0168708111 SendTime=0168708111]
10: IP=10.2.52.22:43857 Socket=36 (Active)
Last[MsgsInQ=0 RecvTime=0168704408 SendTime=0168704519]
11: IP=10.2.52.18:60448 Socket=39 (Active)
Last[MsgsInQ=0 RecvTime=0168704409 SendTime=0168704519]
12: IP=10.2.52.34:58684 Socket=40 (Active)
Last[MsgsInQ=0 RecvTime=0168704409 SendTime=0168704520]
13: IP=10.2.52.26:35642 Socket=51 (Active)
Last[MsgsInQ=0 RecvTime=0168704408 SendTime=0168704520]
14: IP=172.16.6.197:1145 Socket=43 (Active)
Last[MsgsInQ=0 RecvTime=0168704422 SendTime=0168704520]
>msg stats
Message Stats
Messages since boot:
ICSP messages rx from <10.2.52.19> = 73922
ICSP messages rx from <10.2.52.41> = 33695
ICSP messages rx from <10.2.52.39> = 33722
ICSP messages rx from <10.2.52.52> = 34647
ICSP messages rx from <10.2.52.30> = 375
ICSP messages rx from <10.2.52.43> = 33750
ICSP messages rx from <10.2.52.38> = 8747
ICSP messages rx from <10.2.52.46> = 32250
ICSP messages rx from <10.2.52.10> = 145124
ICSP messages rx from <10.2.52.22> = 1378
ICSP messages rx from <10.2.52.18> = 10338
Interpreter messages = 368612
ICSP messages rx from <10.2.52.34> = 2725
ICSP messages rx from <10.2.52.26> = 372
ICSP messages rx from <172.16.6.197> = 513
Collecting message statistics over 10 seconds. Please wait...
Messages in last 10 seconds:
ICSP messages rx from <10.2.52.19> = 4
ICSP messages rx from <10.2.52.41> = 2
ICSP messages rx from <10.2.52.39> = 2
ICSP messages rx from <10.2.52.52> = 4
ICSP messages rx from <10.2.52.30> = 2
ICSP messages rx from <10.2.52.43> = 2
ICSP messages rx from <10.2.52.38> = 2
ICSP messages rx from <10.2.52.46> = 2
ICSP messages rx from <10.2.52.10> = 5
ICSP messages rx from <10.2.52.22> = 2
ICSP messages rx from <10.2.52.18> = 2
Interpreter messages = 0
ICSP messages rx from <10.2.52.34> = 2
ICSP messages rx from <10.2.52.26> = 2
ICSP messages rx from <172.16.6.197> = 2
>show buffers max
Show Buffers
Thread TX RX Queued
---- ---- ----
Axlink 0
ICSNet 2 0 0-Sent=NO Waiting=NO
PhastLink 0 0 0-Sent=NO Waiting=NO
UDP 0 0-Sent=NO Waiting=NO
IPCon Mgr 0
Lontalk 0
LontalkHost 0
Con Manager 0
Interpreter 0
Device Mgr 0
Diag Mgr 0
Msg Dispatc 0
Cfg Mgr 0
Route Mgr 0
Notify Mgr 0
Java Router 0
---- ---- ----
Total 2 0 0 GrandTotal 2
>show buffers
Show Buffers
Thread TX RX Queued
---- ---- ----
Axlink 0
ICSNet 0 0 0-Sent=NO Waiting=NO
PhastLink 0 0 0-Sent=NO Waiting=NO
UDP 0 0-Sent=NO Waiting=NO
IPCon Mgr 0
Lontalk 0
LontalkHost 0
Con Manager 0
Interpreter 0
Device Mgr 0
Diag Mgr 0
Msg Dispatc 0
Cfg Mgr 0
Route Mgr 0
Notify Mgr 0
Java Router 0
---- ---- ----
Total 0 0 0 GrandTotal 0
>show mem
Display Memory
Volatile Free : 25392208/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1036814/1047536 (bytes free/max physical)
Disk Free :504913920/512196608 (bytes of free space/max physical)
Duet Memory Free : 2711856 (bytes)
Partition 1 - 2711856 (bytes)
Total Collections - 8
Average Time Between Collections - 18083683ms
Partition 2 - <UNKNOWN>
>cpu usage
CPU usage = 99.36% (30 sec. average = 98.84%, 30 sec. max = 99.85%)
>show mem
Display Memory
Volatile Free : 25392208/67108864 (largest free block in bytes/max physical)
NonVolatile Free: 1036814/1047536 (bytes free/max physical)
Disk Free :504913920/512196608 (bytes of free space/max physical)
Duet Memory Free : 2711856 (bytes)
Partition 1 - 2711856 (bytes)
Total Collections - 8
Average Time Between Collections - 18083683ms
Partition 2 - <UNKNOWN>
>show log all
Message Log for System 2 Version: v3.41.422
Entry Date/Time Object
Text
"CICSPTCP Rx connection to 10.2.52.26 has been closed locally or by peer"
"Closing connection to 10.2.52.26 due to transmission failure to system 2 device 0"
"Closing connection to 10.2.52.30 due to transmission failure to system 2 device 10014"
"Closing connection to 10.2.52.26 due to transmission failure to system 2 device 10013"
"Closing connection to 10.2.52.22 due to transmission failure to system 2 device 10012"
"Closing connection to 10.2.52.34 due to transmission failure to system 2 device 0"
"Closing connection to 10.2.52.34 due to transmission failure to system 2 device 10015"
"Closing connection to 10.2.52.38 due to transmission failure to system 2 device 0"
"Closing connection to 10.2.52.38 due to transmission failure to system 2 device 10016"
"Closing connection to 10.2.52.18 due to transmission failure to system 2 device 10011"
etc... etc... There are many Offline and Online messages that go with the lines above, but they are redundant.
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
Telnet Info from one of the MVP-8400s
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
11: 01-01-1970 THU 00:01:01 SocketError
Socket connect attempt failed: No route to host (errno=0x94)
>show connection stats
Connection Statistics
Total Last 15 Minutes
===== ===============
ICSP Messages: 1040263 received 4781 received
1040257 processed 4781 processed
6 dropped 0 droppped
Blink Messages: 31735 received 158 received
9 missed 0 missed
IP Statistics:
RX packets:297736 errors:0 dropped:0 overruns:0 frame:0
TX packets:225209 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:68678575 (65.4 Mb) TX bytes:16145201 (15.3 Mb)
Wireless Statistics:
Mode:Managed Frequency:2.437 GHz Access Point: 00:1A:1E:D0:B7:81
Link Quality=70/100 Signal level=84/255 Noise level=170/152
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:1205418
Tx excessive retries:36981 Invalid misc:84154 Missed beacon:0
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
Telnet Info from one of the MVP-5150s
*************************************************************************************************
*************************************************************************************************
*************************************************************************************************
ICSP Messages: 545314 received 302 received
545193 processed 302 processed
121 dropped 0 droppped
Blink Messages: 250900 received 141 received
145 missed 0 missed
IP Statistics:
RX packets:813411 errors:0 dropped:0 overruns:0 frame:0
TX packets:825983 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:195389452 (186.3 MiB) TX bytes:103906823 (99.0 MiB)
Wireless Statistics:
Mode:Managed Frequency:2.437 GHz Access Point: 00:1A:1E:D0:B7:81
Link Quality:76/100 Signal level:-49 dBm Noise level:-122 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0
>show connection log
Connection Log
Entry Date/Time Event
Text
1: 08-07-2009 FRI 17:56:22 MasterUrlConnect
Connected to (Sys=2) Master 10.2.52.45 (URL Mode)
2: 08-07-2009 FRI 17:56:21 SocketClosure
ICSPTCP End
3: 08-07-2009 FRI 17:56:18 MasterUrlConnect
Disconnected from (Sys=0) Master 10.2.52.45 (URL Mode)
4: 08-07-2009 FRI 17:56:18 MasterUrlConnect
Disconnected from (Sys=0) Master 10.2.52.45 (URL Mode)
5: 08-07-2009 FRI 17:56:18 SocketError
ICSP TCP Socket read error: iBytesRead==0
6: 08-07-2009 FRI 17:56:18 IcspMsgTimeout
Closing connection due to ICSP Message timeout
7: 08-07-2009 FRI 17:56:15 BlinkMissTimeout
Blink Messages Missed
8: 08-07-2009 FRI 17:56:01 BlinkMissTimeout
Blink Messages Missed
9: 08-07-2009 FRI 09:14:00 MasterUrlConnect
Connected to (Sys=2) Master 10.2.52.45 (URL Mode)
10: 08-07-2009 FRI 09:14:00 SocketClosure
ICSPTCP End
0
Comments
The first thing I would check is to make double sure there are no master-to-master circular connections. If that's fine, you have to start looking at anything else making an IP connection (panel, media server,whatever applies) and see if any of them might be dropping off too frequently. Lastly, there is a possibility of an exceptionally busy piece of code causing timeouts. The latter two can be a real pain in the nethers to find; it seems to me I am more likely to figure such issues out chasing hunches than by hard logic.
The audio guy said that AMX was supposedly connecting to his peavey program 3 times a second a month back. I never saw anything in AMX that would indicate this though, it could be the same issue of Network devices falling offline. Since the system does it even when not being used and if it's not Net devices causing the problem, I assume then it's something in mainline. Well, i'll look it over. Thanks for your help.
I had one job in which I upgraded firmware (because I needed something in the new firmware) and the job went to the crapper shortly there after. I have since removed all of the new code and that didn't fix the problem (it would lock up after a couple days, but show no signs of memory leaks and have 1 message stuck in the interpreter queue). I have since downgraded the firmware and I am in the process of waiting to see if the problem occurs again. While I was running the 422 firmware, I did see some of the errors you mentioned. At one point, a different processor on the job locked up completely and had to be hard rebooted.
On the job I am at currently, everything was working fine, then last week, everything fell apart. I am still not sure if this is a code issue or a hardware/firmware/ip conflict issue, but recently, I have been having problems with code that had been functioning properly on past jobs. I have yet to come up with concrete evidence as to the cause, but there is a small part of me that is leaning towards blaming the firmware. Unfortunately, I seemed to be the only one having issues according to tech support. Let me know if you find the cause, or if your code was running fine on a previous firmware version.
Jeff
But after taking Hawthornes suggestion i went through all my mainline code and puts waits of 1 tenth of seconds on everything. Seems to work way more stably now. CPU usage is down and i haven't gotten and disconnects yet. Something in that mainline must have been running too much i think. I'm going to wait a day or 2 before i celebrate, but seems to be working better now. (After upgrading the firmware the system did lock up the next day, even though it had been running ok for weeks. But i don't know if it's really the firmware, i think the mainline code was a little too much. But again, i'll wait and see.)
Another strange thing is the CPU usage goes from 64% at startup and gradually grows over time of a few hours of the system idle'ing. It's at like 98% now and i think that's really weird. Why would it grow? It's like a memory leak, except... in the form of a processor leak. (I've got a piece of code in the program that says the processor has never dropped below running mainline 6070 times a minute, so the cpu definetly seems to be able to keep up). Anyway, i don't know how accurate the CPU usage indicator is because it seems to only change once ever hour or so. I was thinking maybe the code isn't processing named waits correctly and it's building them up and eventually clogging up the processor, but that's just me guessing. I'll see if anything new has happened tomorrow.
And yes... the only time the system has locked up was about a day after i installed the 422 firmware, it only happened once. But i had no choice but upgrading, since the previous firmware didn't handle virtual feedback correctly, when using master to master.
I did have to scale back some feedback to the panels, but this is already handled in time lines for the most part. The original problem was caused by an RF security sensor that was on the edge of the RF range. The sensor started failing and was generating a lot of update traffic. After I determined this to be a cause, I eventually disconnected the security panel completely and removed all associated code. The system still locked up a week later tho. This was all code that had been running fine for at least 8 months.
I am wondering if they may have made some efficiency improvements in 422 firmware that allow more data to be sent which in turn could flood communication to the point of lockups? Maybe the processor can send stuff to the network faster than the network can negotiate and transmit?
Jeff
If anyone knows how to read the Spy Log, attached is what the spy log looked like when the system was having most of it's problems. I noticed that on the first pass of the spy log, almost all the Stats where low numbers but after letting it run like 7 times in a row, the numbers got bigger and bigger. Attached is the log after letting it run for about 7 times.