Timelines and lockups

Spannertech · April 2005

I've had the most frustrating day. I was back at a house which I'm starting to believe is truly haunted.

On the advice of a member of this board, I abandoned AMX's GUI module and wrote my own for the Compool 3800 (a spa controller box), reusing only a very small part of what AMX had done and leaving the rest out. I need a mere fraction of what it can do: all I'm really doing is changing the state of 4 relays. I'm still using AMX's comm module, which seems to be fine. I have some buttons on a TP that do send_commands to the virtual device in the module, and that works fine, turning these relays on and off.

However, I also need to send these same send_commands from a long timeline. It sounds crazy, but it's 2 minutes of filling a hot tub, then 18 minutes of waiting for the undersized water heater to warm up more water, and I do that about 6 times to get the thing full. The timeline has 12 elements in it. When I start the timeline, it very reliably locks up the NetLinx processor, usually when it reaches the first 18-minute segment. Nothing else is sending commands to the virtual device or the real device. If I shorten every step to 10 seconds, it completes the whole thing without incident.
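
To make the schedule concrete, here is a small Python sketch (not NetLinx) of the 12-element times array. It assumes each entry is the length of one segment in milliseconds (the unit NetLinx timelines use), alternating the 2-minute fill with the 18-minute wait; the function names are hypothetical.

```python
FILL_MS = 2 * 60 * 1000     # 2-minute fill segment
WAIT_MS = 18 * 60 * 1000    # 18-minute reheat segment
CYCLES = 6                  # "about 6 times" to fill the tub


def segment_times(cycles=CYCLES, fill_ms=FILL_MS, wait_ms=WAIT_MS):
    """Length of each segment, in order: fill, wait, fill, wait, ..."""
    times = []
    for _ in range(cycles):
        times.extend([fill_ms, wait_ms])
    return times


def offsets_from_start(segments):
    """Running total: when each segment ends, measured from the start."""
    offsets, total = [], 0
    for length in segments:
        total += length
        offsets.append(total)
    return offsets


real = segment_times()
fast = segment_times(fill_ms=10_000, wait_ms=10_000)  # the 10-second test run
print(len(real), offsets_from_start(real)[-1] // 60_000, "minutes")
print(offsets_from_start(fast)[-1] // 1000, "seconds")
```

With the real timings the run lasts 120 minutes; the whole 10-second test run is over in 2 minutes and never contains an 18-minute segment.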

The ONLY difference between 10 seconds and 2 minutes is that the tub's fill valve doesn't actually react if you do things that fast. It has a hysteresis delay of 20 seconds or so, so if you change things every 10 seconds, the fill valve never actually moves. I've recorded and examined the strings coming back from the real device, and nothing stands out as different between the runs that lock up and the runs that don't. By "lock up" I mean that all TP presses are ignored and there's no feedback. Even if NetLinx Diagnostics were hiding what was really going on, is it really possible for a bit of incoming data to lock up the bus? The Compool is a strange thing. You can ONLY send it toggle commands for the aux relays, so you HAVE to parse its feedback to know what state things are in. And it squirts out a status string every couple of seconds whether you like it or not.
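
Because the Compool only accepts toggles, the control logic has to compare the state you want with the state the last status string reported. Below is a minimal Python sketch of that idea, not AMX's comm module. The class name is hypothetical, and it assumes the relay state has already been parsed out of the status string (the string format isn't covered here). The `pending` flag keeps a second toggle from going out before the next status report confirms the first.

```python
class ToggleRelay:
    """Drive a relay that only accepts 'toggle' commands.

    The true state is known only from the device's periodic status
    feedback, so we remember the last reported state and toggle only
    when it differs from what we want.
    """

    def __init__(self):
        self.reported = None   # unknown until the first status string is parsed
        self.desired = None    # nothing requested yet
        self.pending = False   # a toggle was sent and is not yet confirmed

    def on_status(self, is_on):
        """Call with the relay state parsed from each status string.

        Assumes the first report after a toggle already reflects it;
        a stale report could otherwise trigger a second toggle.
        """
        self.reported = is_on
        self.pending = False
        return self._maybe_toggle()

    def set(self, want_on):
        """Request a state; returns True if a toggle should be sent now."""
        self.desired = want_on
        return self._maybe_toggle()

    def _maybe_toggle(self):
        if self.desired is None or self.reported is None or self.pending:
            return False
        if self.reported != self.desired:
            self.pending = True
            return True
        return False
```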

When it crashes, the processor might still be alive, but AxLink and ICSNet are dead, and it needs a reboot to get working again. I'll be sending my code to AMX tomorrow with this story, but just in case anyone has any bright insights...

Thanks

OP

DHawthorne · April 2005

I've never used a timeline that ran for so long, but it really sounds like, for some inexplicable reason, the master is stalling its internal message queue while the timeline is running. Have you tried Telnetting into the master and watching (or at least logging) the diagnostic messages while the timeline runs? Maybe it will offer a clue or two.

If long timelines have an inherent problem, I do have an idea that may work without too much adaptation. Replace your timeline with a small repeating one containing a single event that increments a variable once a second, then test the value of that variable to fire your events. Actually, I think going the other way would work better: where your old timeline would have started, set your variable to the total time for the whole routine, then launch the repeating one-second timeline and decrement the variable every second. You can test the variable on each pass while the timeline is running, fire your events at the appropriate counts, and stop the timeline when the variable hits zero. Since the timeline itself generates an event once a second, you won't have long time periods locking anything up.
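
The countdown version can be simulated in a few lines of Python. This is a sketch of the logic only, not NetLinx code, and the class and action names are hypothetical. One repeating one-second tick decrements a counter that starts at the total run time, and each action fires when the counter reaches the value precomputed for it.

```python
class CountdownSequencer:
    """One repeating 1-second tick drives the whole sequence.

    `steps` is a list of (seconds, action) pairs: each action fires once
    that many seconds have passed since the previous action. Every step
    must be at least 1 second long, or two actions would share a count.
    """

    def __init__(self, steps):
        self.remaining = sum(sec for sec, _ in steps)  # total run time
        self.fire_at = {}  # counter value -> action to fire at that value
        left = self.remaining
        for sec, action in steps:
            left -= sec
            self.fire_at[left] = action
        self.running = self.remaining > 0

    def tick(self):
        """Call once per second (the repeating timeline's event)."""
        if not self.running:
            return None
        self.remaining -= 1
        action = self.fire_at.get(self.remaining)
        if self.remaining <= 0:
            self.running = False  # equivalent of killing the timeline
        return action
```

Driving it with `[(120, "fill done"), (1080, "wait done")] * 6` produces the same 12 events over 7,200 ticks as the original 12-element timeline.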

Spannertech · April 2005

Actually, I was logging both NetLinx Diagnostics (strings to/from the real device, and commands to/from the virtual one) and a Telnet session with MSG ON, and nothing wacky happens. I think the processor is still alive, but the buses get locked up (no TP activity). I did wonder whether the long (18-minute) time was the problem, but a LONG is big enough for a couple of days' worth of thousandths of a second (confirmed by AMX), and they say they've had timelines running that long.
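
As a quick check on the LONG claim, here is the arithmetic in Python, assuming a NetLinx LONG is a 32-bit unsigned integer and timeline times are in milliseconds:

```python
LONG_MAX = 2**32 - 1                   # assumed: NetLinx LONG is 32-bit unsigned
MS_PER_DAY = 24 * 60 * 60 * 1000       # milliseconds in one day

eighteen_min_ms = 18 * 60 * 1000       # the segment where the lockup shows up
max_days = LONG_MAX / MS_PER_DAY       # longest time a LONG can hold, in days

print(LONG_MAX, eighteen_min_ms, round(max_days, 1))
```

That works out to roughly 49.7 days, so an 18-minute (1,080,000 ms) step is nowhere near the limit.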

I quite like the idea of running a once-a-minute counter, decrementing it and using that, similar to your once-a-second suggestion. I've done something similar before with good success. I don't know why I should have to, but if it solves the problem, I'm all for it.

Thanks

OP

DHawthorne · April 2005

Another thing, if you really want to find the original problem: /show log and /show log all show more events than the diagnostic messages do (capture the text in your Telnet client; there could be a lot of it).

jeffaco · April 2005

Long timelines work fine for me ...

I don't believe that a timeline of 18 minutes is a problem ...

At midnight, my scheduling code works out all the events for the next day (until the next midnight) and lays out a timeline to fire them at the right times.

So it's routine for me to have timelines whose last entry fires 24 hours after the start. And this works fine: no "missed events", no lockups, etc.
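
The midnight layout amounts to converting each of the day's event times into a millisecond offset from midnight. Here is a minimal Python sketch with a hypothetical event list and function name; it only computes the times array and does not create a timeline.

```python
MS_PER_MINUTE = 60 * 1000
MINUTES_PER_DAY = 24 * 60


def day_timeline(events):
    """Convert (hour, minute, name) events for one day into
    millisecond offsets from midnight, in ascending order."""
    laid_out = []
    for hour, minute, name in events:
        minutes = hour * 60 + minute
        if not 0 <= minutes < MINUTES_PER_DAY:
            raise ValueError(f"{name}: {hour:02d}:{minute:02d} is not within one day")
        laid_out.append((minutes * MS_PER_MINUTE, name))
    laid_out.sort()  # earliest event first
    times = [t for t, _ in laid_out]
    names = [n for _, n in laid_out]
    return times, names


times, names = day_timeline(
    [(18, 30, "porch lights on"), (6, 0, "heater on"), (23, 59, "porch lights off")]
)
print(times)
print(names)
```

An event at 23:59 lands at 86,340,000 ms, just under 24 hours from the start, which is the kind of long entry described above.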

It's worth a shot, I guess. But I'm not at all confident that this will fix your problem.
