Network Time Sync Anomalies
dalley
I use the i!-TimeManager module to synchronize the system clocks on a large number of NetLinx systems dispersed throughout a university LAN. Lately, I have had trouble with the system clocks randomly being skewed to a date far in the future ([MM-DD-2059]). I use [time.nist.gov] as the default time server. Anybody have any ideas as to what the problem could be?
In general, though, I wouldn't recommend having many different NetLinx systems going "external" for the time. There are multiple problems with this:
- From a network utilization perspective, this isn't a good configuration. I'd recommend setting up ONE system inside your LAN to be a time server. This can be a Windows system (running TARDIS, say), a UNIX system (which comes with a time server), or a Cisco router, or a host of other devices. This should go "outside" for the time then serve that time inside.
- For architectural reasons, the code I gave to AMX (and was put into i!-TimeManager with minimal changes) can't implement full NTP, only SNTP. This was done because it's not possible to set the NetLinx clock to fractions of a second. End result: the time that NetLinx gets is affected by packet delays (which NTP is supposed to eliminate).
To avoid these issues, my router is the time server to my LAN. Thus, my NetLinx master contacts a local system (< 1ms away) and gets extremely accurate time data. By doing this, you'll reduce (minimally) overall network traffic, you'll have a more manageable system (if you want to change your external time server, you can do so in one place), and the NetLinx time will be slightly more accurate to boot (depending on latency to your time server). Most NTP server software fully implements NTP (and thus corrects for packet delays).
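To make the packet-delay point concrete, here is a minimal sketch in Python (not NetLinx, and not code taken from i!-TimeManager) of the standard NTP on-wire arithmetic. The function name and the sample numbers are made up for illustration; the four timestamps are the usual client-send, server-receive, server-transmit, and client-receive values.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP arithmetic, all values in seconds.

    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the reply was sent
    t3: client clock when the reply arrived
    """
    # How far the client clock is behind (+) or ahead (-) of the server.
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    # Round-trip network delay, excluding server processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: client clock is 10 s slow, 50 ms each way,
# 1 ms of processing on the server.
offset, delay = ntp_offset_and_delay(90.0, 100.05, 100.051, 90.101)
print(round(offset, 3), round(delay, 3))
```

A client that simply sets its clock to the server's transmit time (t2) when the reply arrives ends up late by roughly the one-way delay, which is why a time server under a millisecond away matters more for an SNTP-style client than for a full NTP one.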
Hope this helps,
-- Jeff
Ok,
I understand, but I have also had a problem with masters' clocks not being reset for DST hours. Could this issue be related to the same problem?
NTP/SNTP only works in UTC (Universal time). It's the client's responsibility to make any DST adjustments after receiving the time.
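As a rough illustration of what "the client's responsibility" means, here is a small Python sketch (again, not the NetLinx module's code). It converts the seconds field of an NTP timestamp, which counts from 1900-01-01 UTC, into a UTC time, then applies a standard-time offset plus one hour when DST is in effect. The function names are hypothetical, and deciding whether DST is in effect is left to the caller, since those rules vary by region.

```python
from datetime import datetime, timedelta, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2208988800


def ntp_seconds_to_utc(ntp_seconds):
    """Convert the seconds field of an NTP timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ntp_seconds - NTP_UNIX_EPOCH_DELTA, tz=timezone.utc)


def utc_to_local(utc_dt, std_offset_hours, dst_in_effect):
    """Apply a fixed standard-time offset, plus one hour if DST is in effect.

    Returns a naive datetime representing local wall-clock time.
    """
    hours = std_offset_hours + (1 if dst_in_effect else 0)
    return (utc_dt + timedelta(hours=hours)).replace(tzinfo=None)


# Hypothetical example: noon UTC on a summer day, US Central (UTC-6, DST on).
noon_utc = datetime(2004, 7, 1, 12, 0, tzinfo=timezone.utc)
print(utc_to_local(noon_utc, -6, True))
```

If the DST flag is wrong or never applied, the clock is exactly one hour off, which is a different symptom from a clock jumping decades ahead.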
I first started using the i!-Schedule module, and then modified it to also include the i!-TimeManager module. I guess I did not fully understand how to use it. Now that I look more closely, it appears that they both use some of the same variables. Could there be an issue with the two modules sharing variable information and trying to perform similar operations at the same time?
The i!-Schedule code already includes all of the time management code. You *MIGHT* be able to get it to work (after all, they are modules, and should be independent). But why? There's no point in having two modules both go out to the Internet and update the time of the *ONE* master. You need one or the other, but not both.
I still recommend you change your network so that you only have one LOCAL time source, and have NetLinx hit that for the time (as per my prior message).
I have had the same problem for about six months now with a system that has 11 masters. AMX Tech Support says nobody else has complained about this issue, and I have a ticket number started for this problem. Call and place your complaint so they don't think I'm nuts!
One question, do you have a problem with all masters or just one or two? I have problems with the same three masters all the time. The problem is random, however, and doesn't happen to all three at the same time.
Unfortunately, it is up to "us" to work out this problem. I thought I had it fixed, when I found a "ground loop" on an ICSNet connection to a projector in one of the system's classrooms. But it is back again.
All masters have new firmware loaded, as well as the latest i!-TimeManager from the web site. I have tried to use individual masters to poll for the SNTP, as well as using one master serving the date/time to all the other masters in the system. Neither works better as far as I can tell. I am using the university's SNTP time server, so it's only a millisecond away. I have tried various other time servers, with no better results.
I'm now going to get the time from the system's linux firewall and see if that works any better.
Tom Goez
Antech Labs, Inc.
I have no clue what's going on. It's possible that something went wrong in the localtime adjustments (SNTP is in UTC only, not localtime, and the module must correct for that). Or it's possible that the remote server gave some totally wild response (my module would disregard that, but it's not clear if AMX changed this or not; my module wouldn't adjust the time if what came back from the time server was > 24 hours off from the current time, solely to eliminate the sort of problem you're having). Or perhaps the server gave the right response, and somehow, the response got garbled. It is UDP, after all. Come to think of it ... that's the most likely possibility here. Again, my module would protect against that, but if AMX changed that ...
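For readers who want to see the shape of the 24-hour guard described above, here is a minimal Python sketch. It is an illustration of the idea only, not the code from the SourceForge module or from i!-TimeManager, and the names are hypothetical.

```python
from datetime import datetime, timedelta

# Largest correction we are willing to apply in one step (assumed: 24 hours,
# as described in the post above).
MAX_STEP = timedelta(hours=24)


def should_apply(server_time, local_time, max_step=MAX_STEP):
    """Return True only if the server's time is within max_step of the local clock."""
    return abs(server_time - local_time) <= max_step


local = datetime(2004, 3, 1, 2, 0, 0)
print(should_apply(datetime(2004, 3, 1, 2, 0, 3), local))   # small correction
print(should_apply(datetime(2059, 3, 1, 2, 0, 0), local))   # wild response
```

One consequence of a guard like this is that a clock which is genuinely more than a day wrong would never be corrected automatically, so it needs a manual set (or an override) in that case.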
My original module, up on SourceForge, is still there (in source form). All my stuff on SourceForge has rich debugging with syslog, which can be enabled or disabled on the fly (syslog is another module that I wrote, but you can trivially configure it to log with "send_string 0" - my stuff supports that - if you don't have a syslog server somewhere). If you were running my module, I'd say "enable debugging, reproduce, and send me the log". That would tell me, with 100% certainty, what was going wrong.
Unfortunately, AMX stripped all this debugging code (or most of it). When I questioned them about it, they said "keep it simple" - fine, but that makes diagnosing this sort of thing impossible. That's why I like debugging code that can be logged to a syslog server, and that can be enabled/disabled on the fly. It's wonderful for problems just like this. Lots of my networking devices do syslog, and there's no reason that my NetLinx controller shouldn't do it as well.
I'd suggest the following.
1. Contact AMX, ask if debugging can be enabled for the module, see if that gives you the information you need. I'm honestly not sure what they left in the module or not.
2. If not, you might want to consider asking AMX for the source and adding back in some of my debugging code. Heck, it's not like they wrote it or something. I gave the thing to them, *AND* it's in public domain (on SourceForge), so it's not like they'd be releasing their intellectual property or something.
3. If that doesn't fly, you could consider simply using my module. If the problem still occurs, then I'll explain how to enable debugging, you could send me the log, and that will give me all the data I need to diagnose this.
I imagine that if this does turn out to be a core bug in my code, AMX would be happy to take a patch from me ... (I assume).
If it is a UDP garbling thing, we can likely work around that in code. But for Tom (who has a local time server), I can't see how the response could possibly be garbled unless he has lousy networking hardware (hubs rather than switches, where collisions would be common).
I guess the first thing is: I need to understand what's going on for sure. If it's a bug in my code, I will fix it. It's up to AMX to incorporate my fix into their stuff. But for all I know, my core code is fine and AMX introduced this. No way of knowing until we isolate what's happening.
If you're interested in resolving this, would you be willing to run my code off of SourceForge? With logging data, I can have this isolated in about 10 seconds flat, and a code fix would be forthcoming shortly after that ...
Let me know,
-- Jeff
This particular system has been up since 2000, with no previous problems.
The system at the University is on a dedicated AV network, and the time update takes place about 2:00 AM when there is no camera control present or other activity.
I will be glad to experiment with other code. I have full control and monitoring capabilities here at our shop, so I don't have to go to the jobsite to do this.
Let me know where I have to go to get your software.
Thanks,
Tom Goez
I'd be happy to try your code as you wrote it. Do you have a direct link to it on SourceForge? What other components do I need to have running for syslog data?
http://cvs.sourceforge.net/viewcvs.py/netlinx-modules/NetLinx-Modules/
Note that the license file in the base directory applies to all my code. But it's a pretty typical open-source license; I doubt you'll have any kind of issues with it.
The TimeSync code is in directory TimeSync. It requires the DawnDusk code, which is a separate module since several different components use it.
Unfortunately, I never made an installation package for this (like you'll see for AudiotronMod and SlimServerMod). This means it's harder for you to use it, unfortunately (sorry). Let's talk you through it and, if you really have problems, I'll make an installation package.
It's fairly time-intensive for me to do that as I'd need to write a small sample program, blow away my working system to fully test it, perhaps write some tiny touchpanel program to test stuff via button pushes, etc (and when done, load my real stuff back). Doable, sure, but likely an hour or two to do that, make the .zip distribution, and upload to SourceForge. I'd prefer to just walk you through it and only take that effort if mandatory (if you guys don't mind).
Start by downloading TimeSyncMod.axi, TimeSyncMod.axs, DawnDusk.axi, DawnDusk.axs, GetIPError.AXI, and Syslog.AXI (sorry, my stuff is all modularized). If you actually want to use Syslog for logging, then also download Syslog.AXS. Compile all that crud, and include DawnDusk.axi, syslog.axi, and timesyncmod.axi into your mainline code.
When you get that far, let me know. You'll be adding a few define_device definitions for module communications (one for timesync, one for dawn dusk), a port for timesync and, if you choose, a define_device and port for syslog.
More on syslog, and what it does: Syslog is an Internet standard for independent machines and devices to log to a common log server. Syslog servers (what you log to) ship with UNIX, and are freely available for Windows. Basically, it allows your NetLinx code to log events and have you look at them later on your PC (so you don't need to keep a TELNET session open to your NetLinx system). You can define logging levels, so only errors or critical errors (or whatever) get logged. Or, if you're debugging, you can log everything. This is runtime configurable.
You don't NEED to use syslog (but you still need syslog.axi). If you don't want to use syslog, then create a constant device (NullDevice = 0.0.0.0), and pass that to syslogmod.axi. It would then know to never use syslog, but limit itself to "send_string 0" type debugging.
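If syslog is new to you, the following Python sketch shows roughly what goes over the wire. It is not the Syslog.AXS module; the function names are hypothetical, and the message is simplified (it omits the timestamp/hostname header that the classic BSD syslog format recommends). The priority value is facility * 8 + severity, severity 7 is "debug", and the usual transport is UDP port 514.

```python
import socket

FACILITY_LOCAL0 = 16   # "local use 0" facility
SEVERITY_ERROR = 3
SEVERITY_DEBUG = 7     # most verbose level


def should_log(severity, threshold):
    """Lower numbers are more severe; log anything at or above the threshold's severity."""
    return severity <= threshold


def format_syslog(facility, severity, tag, message):
    """Build a simplified syslog line: <PRI>tag: message."""
    pri = facility * 8 + severity
    return "<%d>%s: %s" % (pri, tag, message)


def send_syslog(host, text, port=514):
    """Send one syslog line to a server over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(text.encode("ascii", "replace"), (host, port))


line = format_syslog(FACILITY_LOCAL0, SEVERITY_DEBUG, "TimeSync", "server reply received")
print(line)
```

With the threshold set to 7, everything is logged; with it set to 3, only errors and worse get through, which matches the "runtime configurable" logging levels described above.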
This should be good enough to get you started. Let me know how it goes.
Sorry for going off-topic, but can someone clarify time sync between multiple masters? I mean, do I need to run i!-TimeManager on each master, or is one enough? I usually configure the URL lists so that the masters see each other, if that helps...
You will not be able to synchronize the time of each local master. I think that what is mentioned in earlier posts simply means that you can use one master to connect to either an internal or external network time server and then use that master as a sort of relay time server where it serves network time to other masters. But even doing this, each of the other masters has to use one of the aforementioned modules to connect and process information.
Hope this helps.
Just for information, clock problems may have something to do with lithium batteries model CR2032 inside Netlinx Controller.
From the manual:
"The NI series of controllers use a combination lithium battery and clock crystal package called a Timekeeper. The battery can be
expected to have up to 3 years of usable life under very adverse conditions. Actual life is appreciably longer under normal operating conditions."
NXC-ME has 2 such batteries with lifetime 2.5 years.
This may be especially relevant for Tom Goez, who mentioned that his system has been running since 2000...
David,
I use one master with TimeManager on it to update multiple masters in one system.
Just define virtuals for all masters you want to update:
DEFINE_DEVICE
    vdevMaster101 = 0:1:101 //for TIME updates
    vdevMaster200 = 0:1:200
    vdevMaster210 = 0:1:210
    vdevMaster211 = 0:1:211
    vdevMaster220 = 0:1:220
    vdevMaster230 = 0:1:230
    vdevMaster330 = 0:1:330
    vdevMaster340 = 0:1:340
    vdevMaster440 = 0:1:440
    vdevMaster550 = 0:1:550
then in the button_event code used to update the master:
button_event[vdvTmEvents,nTmTimeChangeChannel]
{
    push:
    {
        send_string 0,"'Time adjusted by time server. New Time is ',time"
        wait 5
        {
            //LETS UPDATE ALL THE MASTERS FROM THE CORRECT time (AND DATE)...
            send_command vdevMaster101, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster200, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster210, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster211, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster220, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster230, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster330, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster340, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster440, "'CLOCK ',DATE,' ',time"
            send_command vdevMaster550, "'CLOCK ',DATE,' ',time"
        }
    }
}
I did not show the TimeManager code, just what you'll need to update additional masters from one running the module.
Let me know if you have additional comments or questions.
Tom
When the batteries go dead, the clock keeps running until the DC power is removed from the master. After power is removed, the time and date then goes haywire. These particular systems are backed up with large UPS systems.
New batteries were installed in all masters and touchpanels in the system. It's part of our maintenance contract and we do it every two years.
It didn't help this particular problem!
I spoke with Leslie Ward (AMX Tech Support) yesterday and she remembers my problem, but said nobody else has reported this problem.
Tom
Tom's correct.
The time management code uses a "SEND_COMMAND 0" command to actually update the time of the local master. There's no reason you can't use that time to update many masters in a multi-master system.
Running timesync on multiple masters would also work, but it's "better" (lower overhead) to just run one instance on one master and then use that instance to update all masters. Also, you're more likely to keep the masters in absolute sync (i.e. not a second apart due to the inability to set the NetLinx time to fractions of a second), not that this matters all that much.
To the best of my knowledge, if you set the clock on one master, this doesn't update the clock on other masters automatically. But only AMX could answer this for certain (or someone with multiple masters can try it and report back).
-- Jeff
Jeff,
One master does not automatically update the other masters on the network. You have to "send" the update to each master in the system.
I tried it both ways, only the example above works. There may be other ways to code it, but this was quick and simple.
Since this one master links to the others via the URL table, it's a snap.
Tom
You say you tried it "both ways". What other way? Using multiple instances (one per master)? I know that works.
As I said before, your way is what I did when I had multiple masters, and is the most efficient way to do this. [That is, have the one master update all the other masters directly].
You can run multiple instances (one per master), and that works (at least with my module), but it's sort of pointless except for testing purposes.
-- Jeff
I will also try to get in touch with AMX tech support. Sometimes, they can open up the source code in their module and answer questions about the raw code. In this case, I'd like to know if they kept Jeff's check for an accurate date/time (i.e. less than 24 hours from the master's current time). If anyone else talks to them, ask them that very question.
For the others that are having this problem, how large is your actual program? My program in this case is quite large, with many include files and who knows how many lines of code.
Glad to have this dialog started on this bizarre issue. If I get any answers, I will pass them along.
Sheldon Samuels
IPS Resources LLC
I had no idea this was such a common problem. It's probably unreasonable to expect 'n' different folks (where 'n' > one or two) to go through the pain of getting my module going because I don't have an installation kit.
I'll try and look at producing an installation kit this weekend. That will make it dramatically easier to get going with my module if you see the need. It'll have a tiny sample program so you can see exactly how it works.
Of course, if you use i!-Schedule (which has i!-TimeManager in it), then using my module is somewhat counterproductive (since you'd now have two different modules doing time synchronization).
BTW, another way of using my time synchronization stuff is to use NTP broadcast. In this mode, some device in your network (in my case, a Cisco router) will broadcast the time from time to time (the Cisco router does an NTP broadcast every 64 seconds). My time module would pick that up, see if the time needs an adjustment and, if so, adjust it (if it's more than a second or two off). If it's not far off, it'll disregard the update (to not signal the update, update system logs, send E-Mail, etc).
For you folks having this problem, that might be a better way to use this. That way, even if the packet got garbled, or even if some bug occurred where the time was wildly off, it would fix itself shortly.
If it's a packet garbling issue causing this (which would be a problem in the core code), I'll probably request the time three times and, if they're all very close, take one of them. If one is wildly off, I'll take the two that are close to one another. If all three are wildly off, I'll bag and issue an error. But I really really need the log messages to see if that's what's going on here. No point coding a "fix" if that's not the problem.
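The three-request idea above can be sketched in a few lines. This is Python for illustration only, not a patch to the module; the function name, the 2-second tolerance, and the choice to return the later sample of an agreeing pair are all assumptions.

```python
def pick_time(samples, tolerance=2.0):
    """Given three time samples (seconds), return one that agrees with another.

    Two samples "agree" if they differ by no more than `tolerance` seconds.
    If no two samples agree, raise ValueError instead of guessing.
    """
    a, b, c = samples
    # Check each pair in order; the first agreeing pair wins.
    for first, second in ((a, b), (a, c), (b, c)):
        if abs(first - second) <= tolerance:
            return second  # the later-requested sample of the agreeing pair
    raise ValueError("no two time samples agree; not adjusting the clock")


# All three close: take one of them.
print(pick_time([1000.0, 1000.5, 1001.0]))
# Middle sample garbled: the two sane ones outvote it.
print(pick_time([1000.0, 2800000000.0, 1001.0]))
```

The tolerance has to allow for the time that passes between the three requests, so it should be at least as large as the total polling interval.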
-- Jeff
Never really got any further in the troubleshooting than that because we ended up replacing the touchpanel that was causing problems with a Modero Viewpoint and the problem went away.
Jeff,
Both ways meaning (1) each master has TimeManager installed, and (2) only one master has TM that updates all other masters.
All the latest ideas up to this post have been tried or eliminated as the source of this problem.
Everyone needs to call AMX Tech Support and ask them "what's up" with this. They say I'm the only one with the problem.
AGAIN, thanks to all who have been responding about this bothersome problem.
Tom
Run my code, that has rich debugging (and a developer that will work with you). If you have a problem, I *WILL* find it, it's as simple as that.
Then the only issue is getting AMX to fix their code after I patch mine. Or getting everyone to move to Open Source, of course! :-)
As promised, I've created a TimeSync package and uploaded it to SourceForge.
Documentation is rather skimpy, but it works, and it'll get ya' a heck of a lot closer than producing this yourself.
The package can be downloaded from:
http://sourceforge.net/projects/netlinx-modules/
Look at the mainline program to customize. Pretty much, fix up your timezone, fix up the NTP server name or address, and you should be set.
So, for those that are having this problem: Please install this and see if you still have the problem. If so, then you need to set llTimeSync[1] and llTimeSync[2] to 7 (full debug). Index [1] controls SEND_STRING 0 output, index [2] controls Syslog output (only useful if you're running syslog).
Obviously, if you're not running syslog, then leave a TELNET session opened to the master so you can capture the debug output.
Let me know what you guys find, thanks!
-- Jeff
- Chip
See post #17 this thread.
Tom
If ANY master does a "SEND_STRING" to this device, that string will trigger a DATA_EVENT and appear in DATA.TEXT for that device in ALL of the masters.
So the master doing the time sync can get an update, put the new time in a SEND_STRING to that virtual device, and now that time will appear in a DATA_EVENT in all of the other masters. (The master that issued the SEND_STRING will get it as well)
This is based on Tech Note 435, which rumor has it was an idea supplied to the AMX PA folks by a student in one of their training classes. (Who sadly wound up getting no credit at all for it)
- Chip