IP COMM?

vining · April 2013

I'm in the middle of updating one of my modules that uses IP comms. Originally I opened a socket and maintained it, but I decided it might be nicer to turn off unsolicited feedback from the device and close the socket (IP_CLIENT_CLOSE) when the device (a music server) isn't actually playing and no UIs are on that device/instance page. Simple enough, right? But once I close the connection I can't reconnect: I get the IP error "connection refused" when I attempt to reconnect (going back to the device page). My code keeps attempting to connect, first every 30 seconds, and if that doesn't work after 5 attempts it tries every 5 minutes indefinitely, but it never connects. If I do another IP_CLIENT_CLOSE it throws the error "port already closed". In order to connect I have to reboot either the server or the master, so I typically choose the master since it's faster.
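The retry schedule described above (every 30 seconds for the first five attempts, then every 5 minutes indefinitely) can be modeled as a tiny function. This is a language-neutral Python sketch, not the author's NetLinx code; the function name and its default parameters are hypothetical.

```python
def retry_delay_seconds(attempt: int, fast_tries: int = 5,
                        fast_delay: int = 30, slow_delay: int = 300) -> int:
    """Delay before retry number `attempt` (1-based).

    Assumed schedule: the first `fast_tries` retries wait `fast_delay`
    seconds each; every later retry waits `slow_delay` seconds, forever.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return fast_delay if attempt <= fast_tries else slow_delay


if __name__ == "__main__":
    print([retry_delay_seconds(n) for n in range(1, 8)])
    # [30, 30, 30, 30, 30, 300, 300]
```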

So the question is: if rebooting the master allows me to connect, is it a problem with the master or with the server? If the port is closed (which it is; if I send IP_CLIENT_CLOSE again, it tells me so), how can rebooting the master fix the issue unless the master is holding on to something? And if there's no connection to drop during the reboot, how could rebooting the master clear up an issue with the server holding on to something and refusing the reconnect? Whatever it is, it doesn't time out either.

I've been looking at IP_BOUND_CLIENT_OPEN, but I'm not sure how one would determine which local port is free to use. Besides, if the problem is in this realm, I'd tend to think it's the master trying to reuse the previously assigned local port while the device wants anything but that, and that's what the reboot clears. If so, binding to a fixed port would be the last thing I should do.
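On a general-purpose OS, one way to find out "which local port isn't in use" is simply to try binding each candidate and keep the first that succeeds. The Python sketch below shows that probing idea on a PC; it is an assumption that the same approach maps onto IP_BOUND_CLIENT_OPEN on a master, and the candidate ports are up to the caller. Note the inherent race: a port that was free during the probe can be taken before it is actually used.

```python
import socket
from typing import Iterable, Optional


def first_free_local_port(candidates: Iterable[int],
                          host: str = "0.0.0.0") -> Optional[int]:
    """Return the first TCP port in `candidates` that can be bound, or None."""
    for port in candidates:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return port          # bind succeeded, so nothing else holds this port
        except OSError:
            continue             # in use (or not permitted); try the next one
        finally:
            s.close()            # release the probe socket either way
    return None
```

On a PC, a client that never calls bind gets an ephemeral port chosen by the OS, which is the behavior the probe is trying to replicate by hand.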

Any thoughts/ideas?

a_riot42 · April 2013

It sounds to me like the AMX socket is closed, but the device's socket has yet to close (CLOSE_WAIT), and won't accept a connection until it has.
Paul
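To illustrate the CLOSE_WAIT idea: when one side closes, the other side sees end-of-file (a zero-length read), but its own socket stays in CLOSE_WAIT until that application also closes it. TCP itself has no timeout for CLOSE_WAIT, so a server that never closes its end keeps that socket around indefinitely. This loopback Python sketch is a generic TCP illustration, not a model of the AMX master or this particular music server.

```python
import socket


def demo_peer_close() -> bytes:
    """Show what the 'server' reads after the 'client' closes its end."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    server_side, _ = listener.accept()

    client.close()                       # client sends FIN: it is done talking

    # The server reads EOF (b"") and its socket is now in CLOSE_WAIT; the
    # connection is not fully torn down until the server closes it too.
    data = server_side.recv(1024)
    server_side.close()                  # server sends its own FIN, leaving CLOSE_WAIT
    listener.close()
    return data


if __name__ == "__main__":
    print(demo_peer_close())             # b''
```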

vining · April 2013

But if I do nothing, it never times out, and if it's actually closed, rebooting the master should have no effect, so methinks it's something else.

a_riot42 · April 2013

vining wrote: »

But if I do nothing, it never times out, and if it's actually closed, rebooting the master should have no effect, so methinks it's something else.

Sounds like it's time to start Wireshark. Post the results if you do.
Paul

Duncan Ellis · April 2013

What happens if you create, say, a batch of five ports and cycle through them, so that when you reconnect, you reconnect using a different local port, and when you get to the fifth one you go back to the first? Just out of interest.
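The suggestion above amounts to a round-robin over a small pool of local ports. A minimal Python sketch of that rotation follows; the class name is hypothetical and the port numbers in the usage example are arbitrary.

```python
from itertools import cycle
from typing import Iterator, Sequence


class LocalPortRotator:
    """Hand out local ports round-robin, wrapping back to the first."""

    def __init__(self, ports: Sequence[int]) -> None:
        if not ports:
            raise ValueError("need at least one port")
        self._ports: Iterator[int] = cycle(ports)

    def next_port(self) -> int:
        """Port to use for the next connection attempt."""
        return next(self._ports)


if __name__ == "__main__":
    rot = LocalPortRotator([57600, 57601, 57602, 57603, 57604])
    print([rot.next_port() for _ in range(7)])
    # [57600, 57601, 57602, 57603, 57604, 57600, 57601]
```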

I came across a similar issue when I wanted to check whether the internet was available and opened up a port to Google. If the internet was live, everything was OK. If not, there was an issue - the port wasn't open and it wasn't closed, it was somewhere in between and never timed out, so I couldn't close it or try to reconnect - very bizarre.
In your case it's a bit odd, because you would have thought it would simply cut off and allow reconnection.

Wouldn't it be easier to just turn off the feedback using a 'selection array' which only allows feedback through when the panel is active in the selection array?

Duncan
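A "selection array" in this sense presumably means tracking which panels currently have this device selected and forwarding feedback only while that set is non-empty. The Python sketch below models that idea; the class and method names are hypothetical and it does not reproduce any NetLinx API.

```python
from typing import List, Set


class FeedbackGate:
    """Pass device feedback only to panels that currently have the device selected."""

    def __init__(self) -> None:
        self._active: Set[int] = set()

    def panel_selected(self, panel: int) -> None:
        self._active.add(panel)

    def panel_left(self, panel: int) -> None:
        self._active.discard(panel)   # discard: no error if it wasn't active

    def targets(self) -> List[int]:
        """Panels that should receive feedback right now (empty means drop it)."""
        return sorted(self._active)
```

For example, after `panel_selected(3)` and `panel_selected(1)`, `targets()` returns `[1, 3]`; once both panels leave, it returns an empty list and feedback can be ignored.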

vining · April 2013

Duncan Ellis wrote: »

What happens if you create, say, a batch of five ports and cycle through them, so that when you reconnect, you reconnect using a different local port, and when you get to the fifth one you go back to the first? Just out of interest.

I had the same thought last night, but instead of five I just toggled between two. When I got up I gave it a shot:

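DEFINE_FUNCTION fnDevMod_IPBoundOpen()
{
    LOCAL_VAR INTEGER nLast_Local_Port;   // local IP port used on the previous attempt

    // Only attempt a connection when fully disconnected.
    // Earlier test: (sSBS.sPlayer.nIP_ConState < IP_CLIENT_PENDING && sSBS.sPlayer.nIP_ConState != IP_CLIENT_DISABLED)
    if(sSBS.sPlayer.nIP_ConState == IP_CLIENT_DISCO)
    {
        STACK_VAR SLONG nStatus;

        // Toggle between two local IP ports so consecutive attempts never reuse the same one
        if(nLast_Local_Port == (SBS_LOCAL_IPPORT_1STBASE + sSBS.sPlayer.nDev_Instance))
        {
            nLast_Local_Port = (SBS_LOCAL_IPPORT_2NDBASE + sSBS.sPlayer.nDev_Instance);
        }
        else
        {
            nLast_Local_Port = (SBS_LOCAL_IPPORT_1STBASE + sSBS.sPlayer.nDev_Instance);
        }
        fnDevMod_DeBug("'SBS MOD-[ ',itoa(sSBS.sPlayer.nDev_Instance),' ], host IP-[ ',sSBS.sPlayer.cIP,' ], Attempting IP_Open :DEBUG<',ITOA(__LINE__),'>'");
        sSBS.sPlayer.nIP_ConAttempts++;
        sSBS.sPlayer.nIP_ConState = IP_CLIENT_PENDING;
        // 15-second guard (WAIT units are tenths of a second): resets the state to DISCO
        WAIT 150 'WAITING_TO_CONNECT'
        {
            sSBS.sPlayer.nIP_ConState = IP_CLIENT_DISCO;
        }
        // IP_BOUND_CLIENT_OPEN(device port, local IP port, server address, server port, protocol); 1 = IP_TCP
        nStatus = ip_bound_client_open(dvSBS.Port,nLast_Local_Port,sSBS.sPlayer.cIP,sSBS.sPlayer.nPort,1);
        // ... (the rest of the function was not included in the original post)
    }
}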

SBS_LOCAL_IPPORT_1STBASE was 57600 and SBS_LOCAL_IPPORT_2NDBASE was 57700, but alas it did not work, and I really thought it might. I still get:

Line     28 (06:12:44)::  SBS MOD-[ 3 ] DEBUG:[L-1], ONERROR Client: connection refused :DEBUG<1578>

Duncan Ellis wrote: »

I came across a similar issue when I wanted to check whether the internet was available and opened up a port to Google. If the internet was live, everything was OK. If not, there was an issue - the port wasn't open and it wasn't closed, it was somewhere in between and never timed out, so I couldn't close it or try to reconnect - very bizarre.
In your case it's a bit odd, because you would have thought it would simply cut off and allow reconnection.

At least with my issue, if I try to close even though I'm already closed, I get the error "port already closed", which makes me fairly confident that the master has actually severed all ties to the server. But if that's the case, what does rebooting the master do that allows me to reconnect once it has finished rebooting? That's what made me think it was reusing the local IP port (not to be confused with the local port or dev port).

Duncan Ellis wrote: »

Wouldn't it be easier to just turn off the feedback using a 'selection array' which only allows feedback through when the panel is active in the selection array?

That's what I was doing, but I figured that if I had 5 servers and 5 module instances that most of the time aren't being used, why maintain the socket connections? Not that I'm worried about running out of available IP connections. I was already turning off unsolicited feedback too, so other than maintaining the socket I wasn't polling, parsing, or dealing with the unsolicited stuff.

So I can easily go back to the way it was before I got this dumb idea, but now there's the challenge of knowing "why" this is happening, or should I say not happening.

Duncan Ellis · April 2013

Hmm... out of ideas on that one, it's really bizarre.

vining · April 2013

I decided to try terminating the connection from the server side, thinking that maybe the server isn't letting go if I just do IP_CLIENT_CLOSE on the master, but the same thing happens. The server terminates the connection, I receive my OFFLINE notifications, and I still can't reconnect until the master or the server is rebooted.

I tried again using ip_bound_client_open with a fixed local IP port, and then again toggling between two fixed local IP ports, while terminating on the server side, and it still won't reconnect until I reboot. So what is the reboot clearing or resetting?

DHawthorne · April 2013

It sounds to me like the device you are connecting to doesn't properly release the socket. And that's an important distinction to be made as well ... the socket and the port are two different things. I had something similar with one of the new Pioneer BD players, except in that case, the connection would drop by itself, but I could not reconnect without power cycling the disc player *or* the network switch. So, as far as the AMX was concerned, the port was closed, but as far as the device was concerned, it was not. I still don't know if the trouble is on the AMX side for not closing the port properly and releasing the socket, or if it was on the device end and it simply didn't honor the request to release the socket. Either way, you wind up with a non-communicating device.

With the advent of more and more IP devices, I'm seeing a lot more IP-related errors, and I suspect it isn't just an AMX problem (though I think some AMX firmwares have issues as well). But when a shortcoming in the AMX socket handling trips over a shortcoming in the device's socket handling, this kind of thing results: a hard-to-find and illogical error situation that you can't seem to resolve. Some IP devices I have never had a problem with, ever. With others I have had so many issues that I fell back to RS-232. But I'm afraid a lot of it is under the hood and out of our hands.
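To make the socket/port distinction concrete: a TCP connection is identified by four values (local address, local port, remote address, remote port), so two connections to the same server port are still two distinct sockets. This loopback Python sketch opens two client connections to one listening port and returns each socket's local and remote endpoints; it is a generic TCP illustration, not a model of either device in this thread.

```python
import socket
from typing import List, Tuple

Endpoint = Tuple[str, int]


def two_connections_same_server_port() -> List[Tuple[Endpoint, Endpoint]]:
    """Return (local, remote) endpoints for two clients of one listener."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # OS-chosen server port
    listener.listen(2)
    port = listener.getsockname()[1]

    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]
    accepted = [listener.accept()[0] for _ in range(2)]
    try:
        # Each entry is one socket's 4-tuple, split as (local, remote).
        return [(c.getsockname(), c.getpeername()) for c in clients]
    finally:
        for s in clients + accepted + [listener]:
            s.close()
```

Both entries share the same remote endpoint (the server's port) but have different local ports, which is what keeps the two sockets apart.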

vining · April 2013

I'm leaning towards issues with AMX, since the problem doesn't exist when I telnet in from my PC, force an exit, and reconnect; plus, again, the problem goes away when I reboot the master or the server. If the device were at fault, I wouldn't think rebooting the master would have any effect, but like you said, there's a lot going on under the hood that my brain can't comprehend.

I'm going to try to analyze this a bit with Wireshark, but that's not easy to do on a switched network, and what network isn't switched? Most switches only flood broadcast and multicast packets to all ports (and therefore to the PC running Wireshark), but most of what I need to look at is unicast traffic, which the switch just passes from point A to point B. I do have a Cisco 2960 switch, so I'm going to attempt to set up a monitor (SPAN) port on it, so that traffic on specific switch ports is mirrored to my monitor port, but I haven't been able to get that to work yet. I actually did have it working for a bit, but I screwed it up while trying to tweak it and I can't get it back.

Here's a link to the SPAN and RSPAN configuration guide for the Cisco 2960:
http://www.cisco.com/en/US/docs/switches/lan/catalyst2960/software/release/12.2_37_se/configuration/guide/swspan.html

DHawthorne · April 2013

If it were strictly on the AMX side, rebooting the master would clear it ... which is why I think it's problems on both sides neatly dovetailing into each other to create a failure. Out of curiosity, what master are you using? I've had various IP issues with NI-3000's clear up by replacing them with 3100's. Been trying for a long time to isolate it. My experiences with the 3000 lead me to believe it's not releasing sockets, then running out of them on a busy IP system. If your issue is with a 3000, it might be related.

vining · April 2013

DHawthorne wrote: »

If it were strictly on the AMX side, rebooting the master would clear it ... which is why I think it's problems on both sides neatly dovetailing into each other to create a failure. Out of curiosity, what master are you using? I've had various IP issues with NI-3000's clear up by replacing them with 3100's. Been trying for a long time to isolate it. My experiences with the 3000 lead me to believe it's not releasing sockets, then running out of them on a busy IP system. If your issue is with a 3000, it might be related.

In my case it resolves by just rebooting the master, but it also resolves if I just reboot the server (device), and that doesn't make any sense at all. I think I'll try rebooting the switch to see if that resolves the issue too.

I'm using an NI-3100 running the latest firmware, and I've tried an NI-700 running an older version; both behaved the same.

Hedberg · April 2013

An AMX IP client is not compliant with the telnet protocol and will have problems interacting with some servers. Because a telnet client running on a PC works with the server but the AMX IP client doesn't, my WAG is that there is some interaction the server demands that the AMX client is not providing. I agree that Wireshark is the way to find out what to do if you want to solve this. Connect your PC to the server, watch the interaction between your telnet client and the server, and see how the client interacts with the server, terminates the connection, etc. Then watch the connection between your NI and the server. You may have to put your PC between the master and the server as an Ethernet bridge in order to see the traffic.

Something else you can do is watch the strings coming from the server back to the client on your NI. These don't show up in diagnostics, but you can watch data.text to see exactly what the server is sending to the NI. All the communications will be strings as described in the telnet protocol; you can send commands and respond to queries from the server.

All communications between the client and the server are these strings that are sent back and forth. There are no magic status connections or anything. The only way the server knows the client's status is through the strings the client sends and/or the client's responses (or lack of responses) to server queries.

What all this amounts to is learning enough about the telnet protocol so that you can solve the problem and make up for the deficiencies in the AMX IP client. You can probably do that by studying the details of the telnet protocol (start with RFC 854) but that's brutal. Much easier to just sniff out how your PC client does it and copy the details.
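As a starting point for those telnet details: RFC 854 defines an IAC byte (255) that introduces commands, and option negotiation uses WILL (251), WONT (252), DO (253), and DONT (254). A client that wants no options can refuse everything by answering each DO with WONT and each WILL with DONT, and by stripping the commands out of the data. The Python sketch below does that for one received buffer. It is a simplified assumption, not a full telnet client (it does not handle subnegotiation SB ... SE or commands split across buffers), and whether this particular server actually needs such replies is unverified.

```python
from typing import Tuple

IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


def refuse_telnet_options(buf: bytes) -> Tuple[bytes, bytes]:
    """Split one received buffer into (payload, reply).

    payload: the data with telnet commands removed.
    reply:   bytes to send back, refusing every option the server raised.
    Simplification: assumes no command is split across buffers.
    """
    payload = bytearray()
    reply = bytearray()
    i = 0
    while i < len(buf):
        b = buf[i]
        if b != IAC:
            payload.append(b)
            i += 1
            continue
        if i + 1 >= len(buf):          # lone IAC at end of buffer: drop it
            break
        cmd = buf[i + 1]
        if cmd == IAC:                 # IAC IAC is an escaped data byte 255
            payload.append(IAC)
            i += 2
        elif cmd in (DO, DONT, WILL, WONT):
            if i + 2 >= len(buf):      # option byte missing: drop the fragment
                break
            opt = buf[i + 2]
            if cmd == DO:
                reply += bytes([IAC, WONT, opt])   # "I won't enable that"
            elif cmd == WILL:
                reply += bytes([IAC, DONT, opt])   # "please don't enable that"
            # DONT / WONT match our all-off state, so they need no answer
            i += 3
        else:
            i += 2                     # other two-byte commands (NOP, GA, ...) skipped
    return bytes(payload), bytes(reply)
```

On the NetLinx side, the same byte checks would presumably live in the DATA_EVENT STRING handler, with the reply sent back via SEND_STRING; that mapping is an assumption and has not been tested here.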

feddx · August 2024

2024, and I'm having this same issue with a line of Sharp displays (4P-BXXEJ2U, where XX is the size). I have several of them on a job, so I wrote a little module and things work fine... until they don't. Powering off shuts down the connection, so a little maintenance is needed, but I often end up getting a "port is already open" error from IP_Client_Open and then a "socket is already closed" error from IP_Client_Close. There are ways to remedy this physically, but I'm not going to be onsite all the time, and frankly, why am I still dealing with this? Closing and reopening the connection just causes me strife, never mind that I am also sending WoL to the network broadcast address (UDP port 2304) to turn the display on.

I'm wondering if anyone has found a solution to this?
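For reference, a standard Wake-on-LAN magic packet is 6 bytes of 0xFF followed by the target MAC address repeated 16 times, 102 bytes in total. A Python sketch that builds one and sends it as a UDP broadcast follows; UDP port 2304 comes from the post above, while the MAC address and the default broadcast address are placeholders.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    mac_bytes = bytes.fromhex(hex_digits)   # raises ValueError on bad hex
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 2304) -> None:
    """Send the magic packet as a UDP broadcast (port 2304 as used in the post)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    pkt = magic_packet("00:11:22:33:44:55")   # placeholder MAC
    print(len(pkt))                           # 102
```

Because WoL is connectionless UDP, it is independent of the TCP control socket; the display can be woken even while the TCP connection is down.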

Marc Scheibein · August 2024

Just to make sure, here are some basic things about TCP connections in NetLinx:

- Do you check the error code numbers in the ONERROR handler?
- If using TCP, do you track the connection state? For example, calling IP_Client_Open() only after you've received an OFFLINE event, or an ONERROR that indicates a connection loss?
- It doesn't help to fire IP_Client_Open() every x seconds. TCP has a connection timeout that you must wait out, after which the corresponding ONERROR code is generated. Only after that should a new Open() be fired. You can't force it to be quicker than the TCP timeout.
- On a TCP connection where you are the client, if an established connection gets interrupted (e.g. the server device is powered off), it may take up to 1 minute before TCP recognizes the interruption and reports an ONERROR "connection timed out". Such a timeout is an error and will NOT generate an OFFLINE event!
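The checklist above can be condensed into a small state machine: issue an open only from the disconnected state, treat OFFLINE and ONERROR as "disconnected", and issue a close only when something is open or pending. Because an open is refused while an attempt is pending, a new open naturally waits for the TCP timeout to surface as an ONERROR. The Python sketch below models that logic for porting to a NetLinx DATA_EVENT; the class and state names are hypothetical, and the error-code labels are the NetLinx IP ONERROR codes as commonly documented, to be verified against the documentation for your firmware. As a simplification, it treats every ONERROR as a lost connection, whereas the checklist suggests reacting only to errors that indicate one.

```python
from enum import Enum
from typing import Optional

# NetLinx IP ONERROR codes as commonly documented (verify for your firmware).
ONERROR_TEXT = {
    2: "general failure",
    4: "unknown host",
    6: "connection refused",
    7: "connection timed out",
    8: "unknown connection error",
    9: "already closed",
    14: "local port already used",
    16: "too many open sockets",
    17: "local port not open",
}


class ConnState(Enum):
    DISCONNECTED = 0
    PENDING = 1       # open issued, waiting for ONLINE or ONERROR
    CONNECTED = 2


class TcpClientTracker:
    """Track one client socket so open/close are only issued when valid."""

    def __init__(self) -> None:
        self.state = ConnState.DISCONNECTED
        self.last_error: Optional[str] = None

    def request_open(self) -> bool:
        """Return True if an open should actually be issued now."""
        if self.state is not ConnState.DISCONNECTED:
            return False          # avoid "already open" and piled-up attempts
        self.state = ConnState.PENDING
        return True

    def request_close(self) -> bool:
        """Return True if a close should actually be issued now."""
        if self.state is ConnState.DISCONNECTED:
            return False          # avoid "already closed"
        self.state = ConnState.DISCONNECTED
        return True

    def on_online(self) -> None:
        self.state = ConnState.CONNECTED

    def on_offline(self) -> None:
        self.state = ConnState.DISCONNECTED

    def on_error(self, code: int) -> None:
        # A timeout on an established link arrives as ONERROR with no OFFLINE,
        # so (as a simplification) every error is treated as "socket is gone".
        self.last_error = ONERROR_TEXT.get(code, f"code {code}")
        self.state = ConnState.DISCONNECTED


if __name__ == "__main__":
    t = TcpClientTracker()
    print(t.request_open(), t.request_open())   # True False
    t.on_error(7)
    print(t.state.name, t.last_error)           # DISCONNECTED connection timed out
```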
