IP COMM ?
vining
Posts: 4,368
I'm in the middle of updating one of my modules which uses IP comms. Originally I opened a socket and maintained it, but I decided it might be nicer to turn off unsolicited feedback from the device and close the socket (IP_CLIENT_CLOSE) when the device (a music server) isn't actually playing and no UIs are on that device/instance page. Simple enough, right? But once I close the connection I can't re-connect: I get an IP error "connection refused" when I attempt to re-connect (going back to the device page). My code will continually attempt to connect, first every 30 seconds and, if that doesn't work after 5 attempts, every 5 minutes indefinitely, but it never connects. If I do another IP_CLIENT_CLOSE it throws an error "port already closed". In order to connect I have to reboot either the server or the master, so I typically choose the master since it's faster.
So the question is, if rebooting the master allows me to connect, is it a problem with the master or the server? If the port's closed, which it is (if I send IP_CLIENT_CLOSE again to confirm, it lets me know), how can rebooting the master fix the issue unless it is the master holding on to something? If there's no connection to drop during the reboot, how could that clear up an issue with the server holding on to something and not allowing the re-connect? Whatever it is doesn't time out either.
I've been looking at IP_BOUND_CLIENT_OPEN, but I'm not sure how one would determine which local port isn't in use. Plus, I would tend to think that if the problem is in this realm, it's the master trying to re-use the previously assigned local port while the device wants anything but that, and that's what the reboot clears, so binding would be the last thing I should do.
Any thoughts/ideas?
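For reference, the retry schedule described above (every 30 seconds for the first 5 attempts, then every 5 minutes indefinitely) can be sketched as a small generator. This is a language-neutral illustration written in Python, not the module's actual NetLinx code; the names `retry_delays`, `FAST_INTERVAL_S`, `FAST_ATTEMPTS` and `SLOW_INTERVAL_S` are made up for the sketch.

```python
from itertools import islice

FAST_INTERVAL_S = 30      # first phase: retry every 30 seconds
FAST_ATTEMPTS = 5         # number of fast retries before backing off
SLOW_INTERVAL_S = 5 * 60  # second phase: retry every 5 minutes


def retry_delays():
    """Yield the wait in seconds before each successive reconnect attempt."""
    for _ in range(FAST_ATTEMPTS):
        yield FAST_INTERVAL_S
    while True:
        # After the fast phase, keep trying at the slow rate forever.
        yield SLOW_INTERVAL_S


if __name__ == "__main__":
    # Show the first seven waits: five fast ones, then the slow ones.
    print(list(islice(retry_delays(), 7)))
```

Keeping the schedule separate from the socket calls makes it easy to confirm that the timing itself is not what keeps the connection from coming back.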
Comments
Paul
Sounds like it's time to start Wireshark. Post the results if you do.
Paul
I came across a similar issue when I wanted to check whether the internet was available and opened up a port to Google. If the internet was live, everything was OK. If not, there was an issue: the port wasn't open and it wasn't closed, it was somewhere in between and never timed out, so I couldn't close it or try to reconnect. Very bizarre.
In your case it's a bit odd, because you would have thought that it would simply cut off and allow reconnection.
Wouldn't it be easier to just turn off the feedback using a 'selection array' which only allows feedback through when the panel is active in the selection array?
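As a point of comparison for the "neither open nor closed" state described above, here is a rough sketch of how a reachability check behaves on a general-purpose TCP stack when the connect attempt is given an explicit timeout. It is plain Python on a PC, not NetLinx, and says nothing about what the master does internally; `probe_tcp` and its return strings are hypothetical names for this sketch.

```python
import errno
import socket


def probe_tcp(host, port, timeout_s=3.0):
    """Try one TCP connect and classify the result as a short string."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout_s)  # bound the wait so the attempt cannot hang forever
    try:
        sock.connect((host, port))
        return "open"
    except socket.timeout:
        # No answer at all within timeout_s (e.g. host down or packets dropped).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is accepting connections on that port.
        return "refused"
    except OSError as exc:
        # Anything else (unreachable network, name lookup failure, ...).
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        # Always release the local socket, whatever the outcome.
        sock.close()
```

Running something like `probe_tcp("192.168.1.50", 23)` from a PC on the same network (the address and port are placeholders) while the master is refusing to reconnect would show whether the device itself still accepts new connections.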
Duncan
That's what I was doing, but I figured if I had 5 servers and 5 module instances which most of the time aren't being used, then why maintain the socket connection? Not that I'm worried about running out of my available IP connections. I was already turning off unsolicited feedback too, so other than maintaining the socket I wasn't polling, parsing or dealing with the unsolicited stuff.
So I can easily go back to the way it was before I got this dumb idea, but now there's the challenge of knowing "why" this is happening, or should I say not happening.
Tried again using ip_bound_client_open with a fixed IP port and then again toggling between two fixed IP ports while terminating on the server side and it still won't re-connect until I reboot. So what is the reboot clearing or resetting?
With the advent of more and more IP devices, I'm seeing a lot more IP related errors, and I suspect it isn't just an AMX problem (though I think some AMX firmwares have issues as well). But when a shortcoming in the AMX socket handling trips over a shortcoming in the device socket handling, this kind of thing results: a difficult to find and illogical error situation that you can't seem to resolve. Some IP devices I have never had a problem with, ever. And others I have had so many issues with that I fell back to RS-232. But I'm afraid a lot of it is under the hood and out of our hands.
I'm gonna try to analyze this a bit with Wireshark, but that's not that easy to do on a switched network, and what networks aren't switched? Most switches only pass broadcast and multicast packets to all ports, and subsequently to the PC running Wireshark, while most of what I need to look at is unicast traffic, and the switches just pass those packets from point A to B. I do have a Cisco 2960 switch, so I'm gonna attempt to set up a monitor (SPAN) port on it so that a specific switch port's traffic is mirrored to my monitor port, but I haven't been able to get that to work yet. I actually did have it working for a bit, but I screwed it up trying to tweak it and I can't get it back.
Here's a link to the SPAN/RSPAN setup for a Cisco 2960:
http://www.cisco.com/en/US/docs/switches/lan/catalyst2960/software/release/12.2_37_se/configuration/guide/swspan.html
In my case it will resolve by just rebooting the master, but it will also resolve if I just reboot the server (device), and that doesn't make any sense at all. I think I'll try rebooting the switch and see if that too will resolve the issue.
I'm using an NI-3100 running the latest and I've tried an NI-700 running an older version, and both were the same.
Something else you can do is watch the strings coming from the server back to the client on your NI. These don't show up in diagnostics but you can watch data.text to see exactly what the server is sending to the NI. All the communications will be strings as described in the telnet protocol -- you can send commands and respond to queries from the server.
All communications between the client and the server are these strings that are sent back and forth. There are no magic status connections or anything. The only way the server knows what the client status is is because of the strings sent by the client and/or responses (or lack of responses) by the client to server queries.
What all this amounts to is learning enough about the telnet protocol so that you can solve the problem and make up for the deficiencies in the AMX IP client. You can probably do that by studying the details of the telnet protocol (start with RFC 854) but that's brutal. Much easier to just sniff out how your PC client does it and copy the details.
2024, and I'm having this same issue with a line of Sharp displays (4P-BXXEJ2U, where XX is the size). I have several of them on a job, so I wrote a little module and things work fine... until they don't. Powering off shuts down the connection, so a little maintenance is needed, but I often end up where IP_Client_Open gives a "port is already open" error and IP_Client_Close gives a "socket is already closed" error. There are ways to remedy this physically, but I'm not going to be onsite all the time and, frankly, why am I still dealing with this? Closing and re-opening the connection just causes me strife, never mind that I am also doing WoL to the network broadcast (UDP port 2304) to turn this display on.
I'm wondering if anyone has found a solution to this?
Just to make sure... some basic things about TCP connections in NetLinx:
Do you check connection issues in the ONERROR section, looking at the error code numbers?
If TCP, do you track and take care of the connection state? E.g. calling IP_Client_Open() only if you previously got an Offline, or an OnError that indicates a connection loss?
It doesn't help to fire IP_Client_Open() every x seconds. TCP has a connect timeout that has to run out, and that will then create the related OnError code number. Only after this should a new Open() be fired. You can't force it to be quicker than that TCP timeout.
On a TCP connection, if you are the client and a running connection gets interrupted (e.g. the server device is powered off), it may take up to 1 minute until the TCP protocol recognizes the interruption and reports an OnError "connection timeout". Such a timeout is an error and will NOT create an Offline event!
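The points above amount to a small state machine: only issue an open when the socket is known to be closed, and treat both an Offline and a connection-loss error as "closed". Below is a minimal sketch of that bookkeeping in Python, meant as a language-neutral illustration and not as NetLinx code. `ConnectionTracker`, `State`, `ERR_REFUSED` and `ERR_TIMEOUT` are hypothetical names, and a real program would map the controller's numeric error codes onto them.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"          # safe to issue an open
    CONNECTING = "connecting"  # open issued, waiting for online or an error
    ONLINE = "online"          # socket is up


# Hypothetical labels for this sketch, standing in for numeric error codes.
ERR_REFUSED = "refused"
ERR_TIMEOUT = "timeout"


class ConnectionTracker:
    def __init__(self, open_fn):
        self._open = open_fn        # callback that actually opens the socket
        self.state = State.CLOSED

    def request_open(self):
        """Issue an open only when nothing is pending or connected."""
        if self.state is not State.CLOSED:
            return False            # an attempt is in flight or already online
        self.state = State.CONNECTING
        self._open()
        return True

    def on_online(self):
        self.state = State.ONLINE

    def on_offline(self):
        self.state = State.CLOSED

    def on_error(self, code):
        # A refused or timed-out attempt ends the pending connect. A timeout
        # on a live connection also ends it, and no offline event follows.
        if code in (ERR_REFUSED, ERR_TIMEOUT):
            self.state = State.CLOSED
```

With this in place, a retry timer can call `request_open()` as often as it likes: the call is ignored until the previous attempt has ended with an online, offline or error event.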