IP_CLIENT connection management?

shr00m-dew · September 2005

We've been using the Knox line of switchers for a while, and my code for them is pretty solid, with a few changes here and there depending on the model and I/Os.

So I decided to try their new IP option. After a few tweaks, everything is good to go. My concern is how to handle bad IP disconnects. The only other IP device I've played with is the FireBall. With the FireBall, I pretty much just flag a variable on ONLINE/OFFLINE events and reconnect as needed. It works flawlessly.

I'm not actually having any issues with disconnects; I just want to be prepared in case I do. Everything I try either causes random disconnects/reconnects or some serious loops.

Enough background: how is everyone handling IP connections? Can you run checks to see the status of a connection, or act on specific errors? I tried disconnecting and reconnecting in ONERROR, but if you try to reconnect while it's already trying to reconnect, that raises another error, which creates a nasty loop.

Thanks,

Kevin D.

Reese Jacobs · September 2005


Kevin,

I use TCP to control a wide array of devices with great success. Other than the ONLINE, OFFLINE, and ONERROR events for network devices, no other socket or network status runtime commands exist in the NetLinx language. Like you, when I connect to a server (device) and the ONLINE event triggers, I mark the device online. I generally only send diagnostic error messages when I receive an ONERROR event for a network connection, since an error does not always mean that the connection has been terminated. I have found that OFFLINE events are reported reliably when a network connection is terminated, and I use the OFFLINE event to drive my reconnect sequence.
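A minimal sketch of that policy, written in Python as a stand-in for NetLinx DATA_EVENT handlers. The class and method names are hypothetical and no real socket is opened: ONLINE marks the device up, OFFLINE marks it down and requests a reconnect, and ONERROR is only logged.

```python
class ConnectionTracker:
    """Hypothetical stand-in for the ONLINE/OFFLINE/ONERROR handlers of one IP device."""

    def __init__(self):
        self.online = False
        self.reconnect_requests = 0  # how many times OFFLINE asked for a reconnect
        self.error_log = []          # ONERROR codes, kept for diagnostics only

    def on_online(self):
        self.online = True

    def on_offline(self):
        # OFFLINE reliably means the socket is gone, so it drives the reconnect.
        self.online = False
        self.reconnect_requests += 1

    def on_error(self, code):
        # An error does not always mean the connection dropped: log it, don't reconnect.
        self.error_log.append(code)


tracker = ConnectionTracker()
tracker.on_online()
tracker.on_error(99)  # 99 is an arbitrary placeholder, not a real NetLinx error code
print(tracker.online, tracker.reconnect_requests)  # True 0
tracker.on_offline()
print(tracker.online, tracker.reconnect_requests)  # False 1
```

Keeping ONERROR out of the reconnect path is the point of the design: only one event (OFFLINE) is allowed to start a reconnect.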

For most devices, when I receive an OFFLINE event, I delay slightly and then attempt to reconnect to the device. Until Escient recently fixed their firmware to stop it from dropping TCP connections every 60 minutes, I used this strategy to reconnect and it worked well. I have found that a 3-5 second delay before reconnecting is beneficial, since it gives both NetLinx and the server device time to tear down the old socket so that the TCP client device in NetLinx can be used to reconnect successfully. Trying to reconnect too soon can result in an error indicating that the socket is already in use (an ONERROR event), and if you then try to reconnect based on that ONERROR event, you will get into a painful error/reconnect loop.
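The delayed-reconnect rule can be sketched as a small scheduler, again in Python rather than NetLinx. Assumptions: time is passed in explicitly so the logic is deterministic, the 4-second delay is simply a value inside the 3-5 second window above, and the `attempts` list stands in for calls to IP_CLIENT_OPEN. At most one attempt is ever pending, and ONERROR never schedules one.

```python
RECONNECT_DELAY_S = 4.0  # assumed value inside the 3-5 second window


class ReconnectScheduler:
    """Schedules at most one delayed reconnect; times are passed in so the logic is testable."""

    def __init__(self, delay=RECONNECT_DELAY_S):
        self.delay = delay
        self.due_at = None   # time of the pending open attempt, or None
        self.attempts = []   # times at which an open (IP_CLIENT_OPEN in NetLinx) would run

    def on_offline(self, now):
        # Never reopen immediately; give both ends time to tear down the old socket.
        if self.due_at is None:
            self.due_at = now + self.delay

    def on_error(self, code):
        # Deliberately no reconnect here: reopening from ONERROR (e.g. "socket in use")
        # is what produces the error/reconnect loop.
        pass

    def tick(self, now):
        # Call periodically (a WAIT or TIMELINE could play this role in NetLinx).
        if self.due_at is not None and now >= self.due_at:
            self.due_at = None
            self.attempts.append(now)


s = ReconnectScheduler()
s.on_offline(0.0)
s.tick(1.0)          # too early, nothing happens
s.on_offline(2.0)    # duplicate OFFLINE does not reschedule
s.on_error(99)       # placeholder error code, ignored
s.tick(4.0)
print(s.attempts)    # [4.0]
```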

I do not have any experience with the Knox switchers, so I cannot provide any info on the issues you are having with them. The other thing I do with my TCP-connected devices is ping them periodically as a keepalive strategy. For instance, with Escient devices, I send a power status request every 15 minutes just to refresh the connection status. This command does not power on the device if it is off, so in that sense it is a harmless request. With Lutron, I send an RST to the processor every 10-15 minutes to test the connection status. This also gives me a time response from the processor, indicating the last time I communicated with it. Other than handling ONLINE and OFFLINE conditions and implementing your own keepalive protocol, the options are pretty limited.
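A keepalive like that reduces to "send a side-effect-free query every N minutes while online". A Python sketch, assuming a caller-supplied `send` function standing in for SEND_STRING; the query string `POWER_STATUS?` is hypothetical and not the actual Escient or Lutron syntax.

```python
KEEPALIVE_INTERVAL_S = 15 * 60  # 15 minutes, as in the Escient example


class Keepalive:
    """Sends a harmless status query at a fixed interval while the device is online."""

    def __init__(self, send, query, interval=KEEPALIVE_INTERVAL_S):
        self.send = send          # stand-in for SEND_STRING to the IP device
        self.query = query        # device-specific, side-effect-free request
        self.interval = interval
        self.last_sent = None

    def tick(self, now, online):
        if not online:
            return  # nothing to probe; the reconnect logic handles this case
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.send(self.query)
            self.last_sent = now


sent = []
ka = Keepalive(sent.append, "POWER_STATUS?")  # hypothetical query string
for t in (0, 60, 900, 1000, 1800):
    ka.tick(t, online=True)
print(len(sent))  # 3
```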

Reese

DHawthorne · September 2005

Many IP devices deliberately drop the connection after a transaction takes place. The MAX servers are a good example of this; the Proxima projectors are another I've seen do it. The point is, dropped connections can be normal. It's only an issue if the device is slow enough in establishing the connection to hurt your control code.

Besides the ONLINE and OFFLINE data events, you should also track the ONERROR handler. Some errors imply an offline condition or a failed connection attempt, while others occur on a connection that is still up, so you have to parse the error code. But between those three events, you can tell pretty reliably whether you are connected.
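One way to combine the three events into a single "connected" flag is sketched below in Python. Which error codes mean "not connected" is not stated here, so the sketch takes that set from the caller; the codes 101, 102 and 200 in the example are placeholders, not real NetLinx values.

```python
class LinkState:
    """Tracks 'connected' from ONLINE, OFFLINE and ONERROR, splitting error codes into two groups."""

    def __init__(self, disconnect_codes):
        # Which codes imply "not connected" must come from the controller documentation;
        # they are supplied by the caller instead of being hard-coded here.
        self.disconnect_codes = frozenset(disconnect_codes)
        self.connected = False
        self.last_error = None

    def on_online(self):
        self.connected = True

    def on_offline(self):
        self.connected = False

    def on_error(self, code):
        self.last_error = code
        if code in self.disconnect_codes:
            # Failed connection attempt or dropped link.
            self.connected = False
        # Any other code: the connection is assumed to still be up.


link = LinkState(disconnect_codes={101, 102})  # placeholder codes
link.on_online()
link.on_error(200)     # placeholder "non-fatal" code
print(link.connected)  # True
link.on_error(101)
print(link.connected)  # False
```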

I don't like repeatedly connecting and having the connection dropped; it ties up resources. So I generally connect on demand, as my controller needs it, testing first, of course, whether I am already connected. I generally will not explicitly drop the connection myself, though. The exception is devices where there is a long delay between the device no longer responding and NetLinx recognizing that the connection is gone. In those cases, I send out what I need, then drop the connection, opening it again when needed.
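Connect-on-demand can be sketched as a client that queues commands, opens the link only if it is neither connected nor already connecting, and flushes the queue on ONLINE. This is a Python illustration with hypothetical names; `open_connection` stands in for IP_CLIENT_OPEN and the `sent` list for SEND_STRING.

```python
class OnDemandClient:
    """Opens the connection only when there is something to send, and reuses it if already open."""

    def __init__(self, open_connection):
        self.open_connection = open_connection  # stand-in for IP_CLIENT_OPEN
        self.connected = False
        self.connecting = False
        self.pending = []   # commands queued while the connection is being opened
        self.sent = []      # stand-in for what SEND_STRING would transmit

    def send(self, cmd):
        if self.connected:
            self.sent.append(cmd)
            return
        self.pending.append(cmd)
        if not self.connecting:  # test first so a second open is never issued
            self.connecting = True
            self.open_connection()

    def on_online(self):
        self.connected = True
        self.connecting = False
        self.sent.extend(self.pending)
        self.pending.clear()

    def on_offline(self):
        # The device (or we) dropped the link; the next send() reopens it.
        self.connected = False
        self.connecting = False


opens = []
client = OnDemandClient(lambda: opens.append("open"))
client.send("A")
client.send("B")          # queued; no second open while the first is in progress
client.on_online()
client.send("C")          # already connected, sent directly
print(len(opens), client.sent)  # 1 ['A', 'B', 'C']
```

A fuller version would also clear `connecting` when an ONERROR indicates the open attempt itself failed; otherwise the client would wait forever for an ONLINE that never comes.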

If the device sends asynchronous feedback (like the Escient), you will need to keep the connection open. I too have found the Escients much more capable of doing this reliably in recent times (the original TCP control was so flawed it was unusable).

The bottom line is there is no one-solution-fits-all answer. You are going to have to see how your device works, and what method will work not only properly, but most efficiently.

Irvine_Kyle · September 2005

I've worked extensively with TCP/IP connect issues and have found the following:

For one, IP_CLIENT_OPEN returns a value, and this is where you can monitor errors. I seem to remember that the ERROR: event handler for something like 0:6:0 doesn't really give much detail. Instead, you can set a variable to the return value of IP_CLIENT_OPEN, such as slCONNECT_STATUS=IP_CLIENT_OPEN(blah.blah.blah)

This needs to be a *SIGNED LONG* variable, because it can return negative values.
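To see why the signedness matters, here is a small Python illustration, assuming that LONG is an unsigned 32-bit type and SLONG its signed counterpart. It shows the bit pattern an unsigned 32-bit variable would hold if a negative result were simply reinterpreted; -1 is a placeholder, not a documented IP_CLIENT_OPEN return value.

```python
import struct


def store_in_unsigned_32(value):
    """Reinterpret a signed 32-bit result the way an unsigned 32-bit variable would hold it."""
    return struct.unpack("<I", struct.pack("<i", value))[0]


result = -1  # placeholder negative return code
stored = store_in_unsigned_32(result)
print(stored)       # 4294967295
print(stored < 0)   # False: the "this is an error" sign is lost
print(result < 0)   # True: a signed variable keeps it
```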

Typically, I monitor an IP connection by periodically sending what I call a heartbeat packet; usually it's just some arbitrary query, like asking for the firmware version. I then set a flag that the heartbeat packet has been sent. When the STRING: event fires, I know the device responded. If I get no response, I call IP_CLIENT_CLOSE, wait for the OFFLINE: event to occur, and then re-establish the connection.
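The heartbeat-flag approach can be sketched in Python as follows. Assumptions: `send` and `close` are stand-ins for SEND_STRING and IP_CLIENT_CLOSE, `tick()` is called once per heartbeat period, and the query `FIRMWARE?` is hypothetical. An extra `closing` flag keeps the close from being issued twice while waiting for OFFLINE.

```python
class Heartbeat:
    """Heartbeat flag: set when a query is sent, cleared by any STRING reply."""

    def __init__(self, send, close, query):
        self.send = send      # stand-in for SEND_STRING
        self.close = close    # stand-in for IP_CLIENT_CLOSE
        self.query = query    # any harmless query, e.g. a firmware version request
        self.online = False
        self.awaiting_reply = False
        self.closing = False  # close issued, waiting for OFFLINE

    def on_online(self):
        self.online = True
        self.awaiting_reply = False
        self.closing = False

    def on_string(self, data):
        self.awaiting_reply = False  # any reply proves the device is alive

    def on_offline(self):
        # Only now is it safe to start the (delayed) reconnect.
        self.online = False
        self.awaiting_reply = False
        self.closing = False

    def tick(self):
        """Call once per heartbeat period."""
        if not self.online or self.closing:
            return
        if self.awaiting_reply:
            # Previous heartbeat went unanswered: close and wait for OFFLINE.
            self.closing = True
            self.close()
            return
        self.send(self.query)
        self.awaiting_reply = True


sent, closes = [], []
hb = Heartbeat(sent.append, lambda: closes.append("close"), "FIRMWARE?")  # hypothetical query
hb.on_online()
hb.tick()               # heartbeat 1 sent
hb.on_string("v1.0")    # reply clears the flag
hb.tick()               # heartbeat 2 sent, never answered
hb.tick()               # no reply: close once
hb.tick()               # still waiting for OFFLINE: nothing happens
hb.on_offline()
print(len(sent), len(closes))  # 2 1
```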
