Keypads Offline and Online Occasionally

TurnipTruck · July 2010

Greetings,

I have a system with 14 MET7 keypads. Watching diagnostics, I notice that a few of them go offline for a few seconds and then come back online. They don't do it simultaneously. Is this to be expected for normally functioning keypads?

Thanks.

jjames · July 2010

I'm curious - is this at a regular interval? Only after refreshing your online-tree? I could see 6Ns doing it only if you forgot they are two devices in one - but not sure on the 7Ns . . . could power be too low?

Just brainstorming....

TurnipTruck · July 2010

Voltage is good. No rhyme or reason to it as far as I can tell. Here is about a three-hour snapshot of diagnostics. Devices 101-114 are the keypads (IP port 4 is iWeather):

Line 1 (11:12:23):: ClockMgr: Setting system time to - TUE JUL 27 11:12:26 2010$0A
Line 2 (11:12:52):: Connected Successfully
Line 3 (11:12:52):: CIpEvent::OnLine 0:4:103
Line 4 (11:12:52):: Exiting TCP Read thread - closing this socket for local port 4
Line 5 (11:12:52):: CIpEvent::OffLine 0:4:103
Line 6 (11:32:07):: CIpEvent::OffLine 106:1:103
Line 7 (11:32:13):: CIpEvent::OnLine 106:1:103
Line 8 (11:32:52):: Connected Successfully
Line 9 (11:32:52):: CIpEvent::OnLine 0:4:103
Line 10 (11:32:52):: Exiting TCP Read thread - closing this socket for local port 4
Line 11 (11:32:52):: CIpEvent::OffLine 0:4:103
Line 12 (11:52:52):: Connected Successfully
Line 13 (11:52:52):: CIpEvent::OnLine 0:4:103
Line 14 (11:52:52):: Exiting TCP Read thread - closing this socket for local port 4
Line 15 (11:52:52):: CIpEvent::OffLine 0:4:103
Line 16 (12:12:25):: ClockMgr: Setting system time to - TUE JUL 27 12:12:28 2010$0A
Line 17 (12:12:52):: Connected Successfully
Line 18 (12:12:52):: CIpEvent::OnLine 0:6:103
Line 19 (12:12:52):: CIpEvent::OnLine 0:4:103
Line 20 (12:12:52):: Exiting UDP SNMP Read thread - closing this socket for local port 6
Line 21 (12:12:52):: CIpEvent::OffLine 0:6:103
Line 22 (12:12:52):: Exiting TCP Read thread - closing this socket for local port 4
Line 23 (12:12:52):: CIpEvent::OffLine 0:4:103
Line 24 (12:15:53):: CIpEvent::OffLine 112:1:103
Line 25 (12:15:59):: CIpEvent::OnLine 112:1:103
Line 26 (12:28:07):: CIpEvent::OffLine 109:1:103
Line 27 (12:28:13):: CIpEvent::OnLine 109:1:103
Line 28 (12:32:52):: Connected Successfully
Line 29 (12:32:52):: CIpEvent::OnLine 0:4:103
Line 30 (12:32:52):: Exiting TCP Read thread - closing this socket for local port 4
Line 31 (12:32:52):: CIpEvent::OffLine 0:4:103
Line 32 (12:46:53):: CIpEvent::OffLine 101:1:103
Line 33 (12:46:59):: CIpEvent::OnLine 101:1:103
Line 34 (12:52:52):: Connected Successfully
Line 35 (12:52:52):: CIpEvent::OnLine 0:4:103
Line 36 (12:52:55):: Exiting TCP Read thread - closing this socket for local port 4
Line 37 (12:52:55):: CIpEvent::OffLine 0:4:103
Line 38 (13:12:28):: ClockMgr: Setting system time to - TUE JUL 27 13:12:31 2010$0A
Line 39 (13:12:52):: Connected Successfully
Line 40 (13:12:52):: CIpEvent::OnLine 0:4:103
Line 41 (13:12:53):: Exiting TCP Read thread - closing this socket for local port 4
Line 42 (13:12:53):: CIpEvent::OffLine 0:4:103
Line 43 (13:24:07):: CIpEvent::OffLine 101:1:103
Line 44 (13:24:13):: CIpEvent::OnLine 101:1:103
Line 45 (13:32:52):: Connected Successfully
Line 46 (13:32:52):: CIpEvent::OnLine 0:4:103
Line 47 (13:32:52):: Exiting TCP Read thread - closing this socket for local port 4
Line 48 (13:32:52):: CIpEvent::OffLine 0:4:103
Line 49 (13:52:07):: CIpEvent::OffLine 113:1:103
Line 50 (13:52:13):: CIpEvent::OnLine 113:1:103
Line 51 (13:52:52):: Connected Successfully
Line 52 (13:52:52):: CIpEvent::OnLine 0:4:103
Line 53 (13:52:52):: Exiting TCP Read thread - closing this socket for local port 4
Line 54 (13:52:52):: CIpEvent::OffLine 0:4:103
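
To tally the drops rather than eyeball them, here is a minimal Python sketch, meant to run on a PC against a saved copy of the diagnostics text (it is not NetLinx code). It assumes the line format shown above and that the first field of the `D:P:S` triple is the device number. The function name `keypad_outages` and its parameters are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as: Line 6 (11:32:07):: CIpEvent::OffLine 106:1:103
EVENT_RE = re.compile(
    r"\((\d{2}:\d{2}:\d{2})\)::\s*CIpEvent::(OnLine|OffLine)\s+(\d+):(\d+):(\d+)"
)

def keypad_outages(log_text, first=101, last=114):
    """Return (device, offline_time, seconds_offline) for each offline/online pair.

    Assumes all timestamps fall on the same day (no midnight rollover).
    """
    pending = {}   # device number -> time it went offline
    outages = []
    for line in log_text.splitlines():
        m = EVENT_RE.search(line)
        if not m:
            continue
        stamp, kind, device = m.group(1), m.group(2), int(m.group(3))
        if not first <= device <= last:
            continue  # skip IP ports and anything outside the keypad range
        t = datetime.strptime(stamp, "%H:%M:%S")
        if kind == "OffLine":
            pending[device] = t
        elif device in pending:
            start = pending.pop(device)
            seconds = int((t - start).total_seconds())
            outages.append((device, start.strftime("%H:%M:%S"), seconds))
    return outages

# A few lines copied from the snapshot above.
sample = """\
Line 5 (11:12:52):: CIpEvent::OffLine 0:4:103
Line 6 (11:32:07):: CIpEvent::OffLine 106:1:103
Line 7 (11:32:13):: CIpEvent::OnLine 106:1:103
Line 24 (12:15:53):: CIpEvent::OffLine 112:1:103
Line 25 (12:15:59):: CIpEvent::OnLine 112:1:103
"""

for device, start, seconds in keypad_outages(sample):
    print(f"device {device} offline at {start} for {seconds} s")
```

Run over the whole snapshot, this gives six drops across devices 101, 106, 109, 112 and 113, each lasting 6 seconds.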

TurnipTruck · July 2010

Any thoughts anyone?

jjames · July 2010

Something I just thought about while driving. First, how many Axlink devices total do you have on the system? Are any other systems using or referencing them? Take a look at the CPU data via telnet; does it seem excessive? Could you write a function that checks the CPU usage when a keypad falls offline and reports it in diagnostics? Can you monitor button presses within that system to see if a certain sequence causes them to freak out? Could iWeather be trying to send them data?
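
One rough way to test the iWeather idea offline is to measure how close each keypad drop is to the nearest iWeather connection in a saved diagnostics log. The Python sketch below does that. It assumes the log format from the snapshot earlier in the thread, that iWeather shows up as `OnLine` events on device 0, port 4, and that the keypads are devices 101-114. The names `nearest_gaps` and `seconds_of_day` are hypothetical helpers, not part of any AMX tool.

```python
import re
from datetime import datetime

EVENT_RE = re.compile(
    r"\((\d{2}:\d{2}:\d{2})\)::\s*CIpEvent::(OnLine|OffLine)\s+(\d+):(\d+):(\d+)"
)

def seconds_of_day(stamp):
    """Convert HH:MM:SS to seconds since midnight."""
    t = datetime.strptime(stamp, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second

def nearest_gaps(log_text, ip_port=4, first=101, last=114):
    """For each keypad OffLine, return (device, stamp, seconds to nearest
    OnLine event on the given IP port), or None as the gap if there is none."""
    ip_times, drops = [], []
    for line in log_text.splitlines():
        m = EVENT_RE.search(line)
        if not m:
            continue
        stamp, kind = m.group(1), m.group(2)
        device, port = int(m.group(3)), int(m.group(4))
        if device == 0 and port == ip_port and kind == "OnLine":
            ip_times.append(seconds_of_day(stamp))
        elif first <= device <= last and kind == "OffLine":
            drops.append((device, stamp))
    results = []
    for device, stamp in drops:
        t = seconds_of_day(stamp)
        gap = min((abs(t - ip) for ip in ip_times), default=None)
        results.append((device, stamp, gap))
    return results

# Lines copied from the diagnostics snapshot posted earlier.
sample = """\
Line 3 (11:12:52):: CIpEvent::OnLine 0:4:103
Line 6 (11:32:07):: CIpEvent::OffLine 106:1:103
Line 9 (11:32:52):: CIpEvent::OnLine 0:4:103
Line 13 (11:52:52):: CIpEvent::OnLine 0:4:103
Line 19 (12:12:52):: CIpEvent::OnLine 0:4:103
Line 24 (12:15:53):: CIpEvent::OffLine 112:1:103
Line 29 (12:32:52):: CIpEvent::OnLine 0:4:103
"""

for device, stamp, gap in nearest_gaps(sample):
    print(f"device {device} dropped at {stamp}, {gap} s from nearest port-4 connect")
```

On these sample lines the gaps are 45 s and 181 s. Consistently small gaps would point toward iWeather; large, scattered gaps would argue against it.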


TurnipTruck · July 2010

I never used CPU usage before. Pretty cool.

Current reading is 94.9, peak is 99.5. Seems high!?

PhreaK · July 2010

TurnipTruck wrote: »

Seems high!?

Extremely. I'd do a bit of investigating to find out the cause of that if I were you - even if it has nothing to do with the connection issues.

TurnipTruck · July 2010

Having never even known about the CPU usage test, I'm curious as to what typically contributes to high CPU usage.

Running timelines? Stuff in the define_program section?

In my case, there are no master-to-master connections.

Thanks.

PhreaK · July 2010

TurnipTruck wrote: »

I'm curious as to what typically contributes to high CPU usage.

Anything that is happening on the master can contribute. That includes both firmware and anything going on in your code. As there's not a flood of people complaining about masters suddenly having high usage, that suggests the issue likely stems from something in the NetLinx/Duet code on the box.

The NI-x100's are capable of 404 MIPS (million instructions per second) and the 700/900's 304 MIPS (I think). Under normal operation they shouldn't be pushed to anywhere near this. Generally when you see usage that high it's an indication of some code sitting in an infinite loop.

Spire_Jeff · July 2010

Have you run diagnostics with everything selected for the keypad addresses? See if you are getting a bunch of button presses that could be stressing the processor. I have had a couple of keypads with moisture problems that do funny things to the button circuitry, and weird things happen. I had one that was constantly pushing and releasing a couple of buttons 100s of times per second.

Also, remember that the 6N's use 2 addresses each. Make sure you don't have any address conflicts.
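
A quick way to sanity-check an address plan on paper is a small script like the Python sketch below. It assumes a two-address device occupies its base address and the one directly after it. That layout is an assumption for illustration, so confirm it against the keypad documentation. The device names and addresses are made up.

```python
from collections import defaultdict

def find_conflicts(devices):
    """devices: iterable of (name, base_address, addresses_used).

    Returns {address: [names]} for every address claimed more than once.
    Assumes a multi-address device uses consecutive addresses from its base.
    """
    owners = defaultdict(list)
    for name, base, count in devices:
        for address in range(base, base + count):
            owners[address].append(name)
    return {addr: names for addr, names in owners.items() if len(names) > 1}

# Hypothetical address plan.
devices = [
    ("Kitchen 6N", 101, 2),  # two-address keypad: 101 and (assumed) 102
    ("Hall 7", 102, 1),      # collides with the Kitchen keypad's second address
    ("Den 7", 103, 1),
]

print(find_conflicts(devices))
```

An empty result means no two devices in the plan claim the same address.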

Jeff

TurnipTruck · July 2010

Processor usage remains the same with keypads and two ENV-VST-C (all Axlink devices) disconnected from the master.

One of the system modules has a bunch of IF/THEN in the define_program. I'm going to try commenting out that module.

The processor here is an NI-2000.

John Nagy · July 2010

CPU USAGE USAGE

This from our head engineer:
In case you’re not already familiar with this: The CPU USAGE command only gives valid numbers when the system is started with the “reboot heap watch” command in TELNET. Otherwise it will always give the same numbers, which are a snapshot taken at boot time. This mode is turned off at the next normal reboot.

The numbers are obviously going to vary based on what your system is currently doing. If you run the CPU USAGE command several times you’ll see the numbers fluctuate. Having a max of 99% is not uncommon. Your average of 60% is pretty normal.

Here is what my test processor happened to be at when I looked: (System was handling some traffic but not much.)

CPU usage = 9.71% (30 sec. average = 51.85%, 30 sec. max = 99.96%)

Here are my numbers during a reboot:

CPU usage = 96.61% (30 sec. average = 21.10%, 30 sec. max = 96.61%)
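
If you capture several CPU USAGE readings from a telnet session into a text file, a small Python sketch like this can pull out the numbers and flag whether they ever change. It assumes the output format matches the two lines above exactly. `parse_cpu_usage` and `looks_frozen` are hypothetical names, and "frozen" here only means every captured sample was identical.

```python
import re

CPU_RE = re.compile(
    r"CPU usage = ([\d.]+)% \(30 sec\. average = ([\d.]+)%, 30 sec\. max = ([\d.]+)%\)"
)

def parse_cpu_usage(line):
    """Return the three percentages from one CPU USAGE line, or None if no match."""
    m = CPU_RE.search(line)
    if not m:
        return None
    current, average, peak = (float(g) for g in m.groups())
    return {"current": current, "average": average, "max": peak}

def looks_frozen(lines):
    """True when there are at least two parsed samples and all are identical."""
    samples = [s for s in map(parse_cpu_usage, lines) if s is not None]
    return len(samples) > 1 and all(s == samples[0] for s in samples)

# The two readings quoted above.
readings = [
    "CPU usage = 9.71% (30 sec. average = 51.85%, 30 sec. max = 99.96%)",
    "CPU usage = 96.61% (30 sec. average = 21.10%, 30 sec. max = 96.61%)",
]

for r in readings:
    print(parse_cpu_usage(r))
print("frozen:", looks_frozen(readings))
```

Identical samples taken minutes apart would be consistent with the boot-time snapshot behaviour described above; samples that move around would suggest the counter is live.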

PhreaK · July 2010

John Nagy wrote: »

This from our head engineer:
In case you’re not already familiar with this: The CPU USAGE command only gives valid numbers when the system is started with the “reboot heap watch” command in TELNET.

Is that still true for the latest firmware? Our systems report back system health and load info stats, including cpu utilization, to our RMS server. When the systems are under load it appears to be correctly reporting the increased usage. These are all booted into 'normal' operating mode.

TurnipTruck · July 2010

Maybe someone from AMX engineering could chime in and set the record straight about the CPU USAGE command? Speculation abounds right now.

TurnipTruck · August 2010

No answer yet. Keypads still go off and online as shown in the diagnostics snapshot. It seems completely random. No other errors showing anywhere, and no correlation with other events.

Hmmmm?

ericmedley · August 2010

TurnipTruck wrote: »

No answer yet. Keypads still go off and online as shown in the diagnostics snapshot. It seems completely random. No other errors showing anywhere, and no correlation with other events.

Hmmmm?

I'm still stuck on the whole CPU usage thing. AxLink is pretty rock-solid when it's set up right. Here's an out-of-left-field question: Are you using any Duet modules? If so, what's your Duet memory set to? You might try jacking it up a bit. I've had a few flaky systems mysteriously 'get better' after increasing the Duet memory.

Just a thought.

TurnipTruck · August 2010

No duet modules. I say poo poo to Duet.

Colzie · August 2010

No mention of 6Ns but I'll second this...

Spire_Jeff wrote: »

Also, remember that the 6N's use 2 addresses each. Make sure you don't have any address conflicts.

John Nagy · August 2010

PhreaK wrote: »

Is that still true for the latest firmware? Our systems report back system health and load info stats, including cpu utilization, to our RMS server. When the systems are under load it appears to be correctly reporting the increased usage. These are all booted into 'normal' operating mode.

I don't know, but RMS may manage the activation of the monitoring. My test here now looks like it isn't active on the current firmware.

Idling after normal reboot:
CPU usage = 99.97% (30 sec. average = 92.07%, 30 sec. max = 99.99%)

This would indicate it is NOT actually monitoring, the value reported is always the same.

TurnipTruck · August 2010

Definitely no 6N keypads. The Axlink devices are the 14 MET7 keypads and two ENV-VST-C thermostats.

PhreaK · August 2010

John Nagy wrote: »

I don't know, but RMS may manage the activation of the monitoring. My test here now looks like it isn't active on the current firmware.

Unless it happens inside the RMSEngineMod, it's not doing it. Everything else has been rolled internally.

*Edit
Even within the RMSEngineMod when there's a vanilla system running code which was all rolled internally it appears to function fine.

*Edit (again)
Just contacted the TS guys here and they're not aware of any requirement to reboot with the heap watch flag. They're getting in touch with some of the engineering boffins over at AMX HQ for confirmation. I'll post the results when I hear back.
