Program Info is Blank
John Gonzales
Posts: 609
in AMX Hardware
I've got an NI-3100 that's been running fine for about 2 years; the last programming update to it was done 14 months ago. All of a sudden, after the unit was unplugged for about 10 minutes, the master is no longer functioning properly. Here are the symptoms:
The unit is online, device number, system number, everything is fine. I can telnet into it, the touchpanel connects to it, it's in a master-master network and it can see and be seen by the other masters. Diagnostics, Notifications all work -- BUT, it appears as though there's no program loaded in it anymore. When I telnet in and execute the 'program info' command, I get a response as though nothing's in there. I loaded a small 2 line program to send a diagnostic message every 10 seconds and that loads and functions fine. Executing a 'program info' command returns the program name. When I load the program that's been in there for 2 years, we're back to the original symptoms as though there's no program loaded.
I've issued clean disk commands, reloaded firmware, changed the name of the program, boosted the duet memory up, commented out the duet modules that were in there... any ideas what to check next? One thing too is that I think the CMOS battery may be bad, since we've been having issues with the time and date falling behind lately.
Thanks for reading this long post.
--John
Comments
Format Disk doesn't seem to work either by the way, only clean disk. Is that command still supported?
--John
Was this code recompiled or is it the original .tkn file you're trying to download?
I don't see any obvious errors in the startup logs. Memory seems to be o.k., and using a get mem command it shows plenty of space too.
When I reloaded the program, I first reloaded the original .tkn, then I recompiled it and loaded the new .tkn, then I saved the program as a different name, recompiled and loaded that .tkn. Same issue occurred with all of those files -- no program info. The only one that showed up and functioned correctly was my little 2 line program.
Here's a copy of the log in case I didn't see something. I have to admit, I don't understand a lot of what the logs indicate unless it's glaringly obvious.
--John
When I've encountered this behaviour it's always been because I've inadvertently forgotten to declare a large variable as volatile and have run out of non-volatile memory on the controller. I suspect the same may happen if the program has variable declarations which would exceed the amount of available RAM.
A "show mem" (I think, I can never remember terminal commands unless I'm in front of the box) should indicate if this is the case - check if any of the available memory sizes are negative.
Even though this doesn't explain why the problem is occurring on a system that's been running for 14 months without any changes to the program...
Edit: What's up with the event log you posted? I've never seen it garbled like that before.
When I do a show mem, it still shows that I have memory available, but I tried commenting out the duet modules just to help reduce the size, and the problem was still there. The weird thing, like Auser pointed out, is that the program has been stable and running fine for over a year. The only thing that we can tie to the problem is that we had to remove power at that location for about 10 minutes and the CMOS battery seems to be going bad since the time/date drifts now. I'll load the program onto another master later today, but I'm confident it will work o.k.
Is there something else I should consider or can check on the software/setup side? I'll have one of our techs change out the CMOS battery too. If I recall correctly, when the CMOS battery is bad, that can cause issues with the BIOS and the memory pointers, etc.
I can't say it enough, but thanks again for all the help too!
--John
There is no non-volatile memory available. Look at variables that are not declared as VOLATILE. See if any can be set as VOLATILE.
Also, disable any modules that you didn't write.
Hmmm. This particular program is fairly small. It doesn't have any variables declared as non_volatile or persistent. Just to make sure though (since I didn't see that line you quoted until you posted it!) I declared every variable and constant specifically as volatile.
I commented those out too. And just for good measure I commented out the ones I wrote.
I reloaded the program with the variables declared volatile, all modules commented out and still the same issue. I ran a show log all again and it still says
The time has also slipped over an hour since I reset it this morning (I'm feeling like a dog with a bone regarding the CMOS thing).
--John
After we uninstalled it, replaced it, and took it to the shop to troubleshoot the problem, it naturally went away. I'm sure the switch got toggled a few times in the shop and they may have slapped it around a bit, but it did start working again.
FYI. You have to have at least one variable that is non_volatile or the code won't run.
Don't know why the system would stop all of a sudden after a reboot though.
It's going back for a new motherboard.
Maybe you have a similar problem.
As far as I know, the only reason you need to have one non-volatile variable is so that the debugger will function properly. If everything is volatile, the code will not have any problems other than those that are coded in; you will just have problems monitoring variables in debug.
Jeff
I have a test program running on a 2nd master w/o any vars and the code works fine. Doesn't do much but it does what it's supposed to. Open debug and I get a warning stating the compiled tkn doesn't match the source code, which is BS cuz it's what I just uploaded. If I continue, open debug, and drop a defined dev in, nothing happens: no D:P:S displayed.
If I put a VOLATILE var into the code the same thing happens, and the var, which was initialized to 1, shows nothing when dropped into the debug window. Now if I then change the VOLATILE var to NON_VOLATILE, the warning pop-up no longer appears, placing a defined dev into debug reveals its D:P:S, and placing the var into debug shows it has a value of 1.
So basically you do need at least one NON_VOLATILE var for debug to work but the code appears to function properly regardless.
Regarding the CMOS battery that I mentioned in every post, it doesn't seem to make any difference. As a matter of fact the NI-3100 that's been there for 2 years was shipped without the battery!
At this point the unit is with me and I'll try different things to make it come back to life before I RMA it to AMX. I feel special since I noticed several of the respondents were registered in 2004 and earlier (I still don't know how Dan C. registered more than a year before all of us) and those that were registered later are really smart, so at least I feel like I had a more "advanced" technical problem.
Thanks again everyone! I'll keep everyone apprised if I find out anything else new.
--John
Oh, there is a battery, just not a CR2032 or similar that you're used to seeing. It's the big yellow block below the CF slot. And from what you have described, I'd bet your best bet is to get an RMA.
No, it really shipped without one. I pulled the yellow snaphat off of a 4100, then went into this 3100 and found that it was never installed. The timekeeper chip had the holes exposed where the snaphat was supposed to go. I know what I was looking for, and really - it wasn't there. Thanks for the RMA advice, I'll try a few more things this weekend before I send it back.
--John
Just for the benefit of any folks out there who aren't aware, global variables in NetLinx code are non-volatile unless declared otherwise.
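For anyone who wants to audit a larger program for this, here is a rough Python sketch (not an AMX tool) that lists the lines in a DEFINE_VARIABLE section that do not start with VOLATILE. The function name, the sample source, and the simplified parsing are all assumptions made for illustration. It only looks at the first word of each line, so multi-line declarations and compiler directives will show up as false positives, and comment markers inside string literals are not handled.

```python
import re

# Any DEFINE_xxx keyword starts a new section of a NetLinx source file.
SECTION = re.compile(r"^\s*(DEFINE_\w+)", re.IGNORECASE)


def strip_comments(source):
    """Blank out (* *), /* */ and // comments while keeping line numbers stable."""
    def keep_newlines(match):
        return "\n" * match.group(0).count("\n")

    source = re.sub(r"\(\*.*?\*\)", keep_newlines, source, flags=re.DOTALL)
    source = re.sub(r"/\*.*?\*/", keep_newlines, source, flags=re.DOTALL)
    return re.sub(r"//.*", "", source)


def non_volatile_declarations(source):
    """Return (line_number, storage, text) for DEFINE_VARIABLE lines not marked VOLATILE."""
    hits = []
    in_vars = False
    for number, line in enumerate(strip_comments(source).splitlines(), start=1):
        section = SECTION.match(line)
        if section:
            # Track whether we are inside the global variable section.
            in_vars = section.group(1).upper() == "DEFINE_VARIABLE"
            continue
        text = line.strip()
        if not in_vars or not text:
            continue
        first = text.split()[0].upper()
        if first == "VOLATILE":
            continue
        # No storage keyword means the global falls back to the non-volatile default.
        storage = first if first in ("NON_VOLATILE", "PERSISTENT") else "default"
        hits.append((number, storage, text))
    return hits


# Made-up sample source, purely for illustration.
SAMPLE = """\
DEFINE_VARIABLE
VOLATILE CHAR cRxBuffer[20000]   // explicitly volatile, ignored
INTEGER nRoomState               // no keyword, so non-volatile by default
PERSISTENT CHAR cSiteName[32]
(* block
   comment *)
NON_VOLATILE INTEGER nDebugAnchor
DEFINE_START
nRoomState = 1
"""

for line_number, storage, text in non_volatile_declarations(SAMPLE):
    print(line_number, storage, text)
```

Anything it reports as "default" is a candidate for an explicit VOLATILE if the value doesn't need to survive a reboot. Large arrays and buffers are the ones worth checking first.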