IP Connection Issues
vining
Posts: 4,368
I just recently started having some major problems with IP comms. Up until now all connections have been RS232, with the exception of time_manager, but I recently added IP comms to a new device, which worked fine for a month. I have periodically had issues with slow, sluggish performance and then complete shutdown, and I recall seeing past posts regarding this and its link to IP comms. Now things just seem to be getting worse. I can't run diagnostics or debug, and IP connections to the panels drop out until the system's problem times out, which can take quite a long time (30 minutes one time, 5 minutes another) without any logical rhyme or reason.
I've run the new IP include file in a different master in a smaller program without issues.
When I initialize IP_Client_Open in Define_Start I put it behind a 1 minute wait just to clear the initial clutter. I send an IP_Client_Close for an offline event, and the same when I get a connection error 7, 14 or 17, then a 30 second wait, then an IP_Client_Open again. I played with multiple variations of these because sometimes, when I haven't issued an open and I know a close has been sent, I'll get an "error 9" (already disconnected), or when I'm not connected I'll get an "error 14" (port already in use), and if I get an "error 17" (port not open) and then try to open, I can't.
Define Device:
DEFINE_DEVICE

//////////////////// IP PORTS/SYSTEM ////////////////////////////
//dv010Master     = 0:1:0
dvTmTimeSync      = 0:3:0
dvRSS             = 0:7:0
dvSoundB_Client   = 0:8:0       //Client D:P:S

/////////////////// SOFTWARE APPS ///////////////////////////////
dvMediaPlayer     = 8001:1:0    //i!-PCLink/MediaPlayer Application (Default is 8001:1:0)

////////////////// TOUCH PANELS BELOW ///////////////////////////////
(******* MVP BELOW ***********)
Define Start:
DEFINE_START

create_buffer dvTPSoundBArray, cSBTPBuff
create_buffer dvSoundB_Client, cSoundBBuff
//create_level dvTPSoundBArray,81,SoundB[1].SBCurSongPlayTime
wait 600
{
    SEND_STRING 0,"'SoundB: Started TL_SoundBFBRT timeline!',crlf"
    timeline_create(TL_SoundBFB,TL_SoundBFBRT,1,timeline_absolute,timeline_repeat)
    SEND_STRING 0,"'SoundB: Attempting to open a connection!',crlf"
    IP_Client_Open(dvSoundB_Client.port,IP_AddressSoundB,SoundB_Client_Port,TCP)
}
nSBNowPlayFlag = 0
SoundB[1].SBCurListIndexLow = 0
SoundB[1].SBCurListIndexHi = 9
SoundB[1].SBCurListIndex = 1
nSBAdvSearchFlagStep = 0
nSBAdvSearchFlag = 0
SoundB[1].SBRepeat = 'none'
SoundB[1].SBShuffle = 'off'
SoundB[1].SBDoWhatToQ = 'QueueAndPlay'
SoundB[1].SBCurLevelChannel = 90
fnClearSentVariables()
Data Event:
DATA_EVENT [dvSoundB_Client]
{
    ONLINE:
    {
        fnSBOnline(0) // 0 means nothing, for future flagging?
    }
    OFFLINE:
    {
        SEND_STRING 0,"'SoundB: OffLine Event',13,10"
        fnSBOffline('offline')
    }
    ONERROR:
    {
        SEND_STRING 0,"'SoundB: on-error status code: ',ITOA(DATA.NUMBER),crlf"
        if (data.number <> 0) // data.number = 0 -> No error
        {
            SEND_STRING 0,"'Device ',DEV_TO_STRING(DATA.Device),':',GET_IP_ERROR(DATA.Number),13,10"
            SEND_COMMAND dvTPSoundBArray,"'!T',1,'Device ',DEV_TO_STRING(DATA.Device),':',GET_IP_ERROR(DATA.Number)"
        }
        switch (data.number)
        {
            case 7:{fnSBOffline('OnError7') break}
            case 9:{fnSBOffline('OnError9') break}
            case 14:{fnSBOffline('OnError14') break}
            case 17:{fnSBOffline('OnError17') break}
        }
        //fnSBCylcePower()
    }
    STRING:
    {
        stack_var integer nFBS
        stack_var char cSBCommand [SBMaxStrLength]
        stack_var char cSBReturnedStr [SBMaxStrLength]
        stack_var integer nSBIndexer
        stack_var char cCmdTemp [SBMaxCmdLength]

        if (length_string(cSoundBBuff) > 1)
        {
            send_string 0,"'SoundB: received ',itoa(length_string(cSoundBBuff)),' bytes of data from server',crlf"
        }
        if (find_string(cSoundBBuff,"13,10",1))
        {
            Select
            {
                ACTIVE (find_string (cSoundBBuff,"'SoundBridge>'",1)):
                {
                    if (!nSBRebooting)
                    {
                        SoundB[1].SBCurRcvdStr = 'SoundBridge>'
                        cSBjunk = remove_string (cSoundBBuff,"'SoundBridge>'",1)
                        wait 1
                        {
                            send_string dvSoundB_Client, "'mmc',13,10"
                        }
                    }
                }
                ACTIVE (find_string (cSoundBBuff,"'roku: ready',13,10",1)):
                (* continued *)
For Online connection:
DEFINE_FUNCTION fnSBOnline(integer iValue)
{
    SEND_STRING 0,"'SoundB: connected!',crlf"
    if (iValue == 0)
    {
        stack_var integer nSBLoop

        nSBCounter = 0
        nSBOnline = 1
        ON[vdvSBOnlineFB ,CLIENT_ONLINE]
        send_string dvSoundB_Client, "'SubscribeTransportUpdateEvents',13,10"
        wait 1
        {
            send_string dvSoundB_Client, "'SetServerFilter all',13,10"
            wait 1
            {
                send_string dvSoundB_Client, "'SetListResultType partial',13,10"
                wait 1
                {
                    fnSBResetDisplays()
                    timeline_create(TL_SoundBQuery,TL_SoundBQueryTime,1,timeline_absolute,timeline_repeat)
                }
            }
        }
        // wait 3 moved into
        // {
        //     fnSBPPOF_DisplayGroup()
        //     wait 1
        //     {
        //         fnSBPPOF_SideBarGroup()
        //         wait 1
        //         {
        //             fnSBPPON_Servers()
        //         }
        //     }
        // }
        SEND_COMMAND dvTPSoundBArray,"'!T',1,' '"
        SEND_COMMAND dvTPSoundBArray,"'!T',2,' '"
        SEND_COMMAND dvTPSoundBArray,"'!T',15,' '"
        SEND_COMMAND dvTPSoundBArray,"'!T',14,' '"
        SEND_COMMAND dvTPSoundBArray,"'!T',20,' '"
        SEND_COMMAND dvTPSoundBArray,"'!T',31,' '"
        for(nSBLoop = 1; nSBLoop <= SBMaxListDisplayed; nSBLoop ++)
        {
            SEND_COMMAND dvTPSoundBArray,"'!T',49 + nSBLoop,''"
            iValue ++
        }
    }
}
For Offline or OnError
DEFINE_FUNCTION fnSBOffline(char iFromWhere [9])
{
    local_var integer nSBSendOnce

    if (timeline_active(TL_SoundBQuery))
    {
        timeline_kill(TL_SoundBQuery)
        SEND_STRING 0,"'SounB: TimeLine TL_SoundBQuery KILLED!',crlf"
    }
    SEND_STRING 0,"'SounB: disconnected!',crlf"
    // NOTE: as written this test is always true (the string cannot equal both
    // values at once), so the close is sent for every caller, including
    // errors 7 and 9. Skipping those two cases would need && instead of ||.
    if (iFromWhere != 'OnError9' || iFromWhere != 'OnError7')
    {
        IP_CLIENT_CLOSE (dvSoundB_Client.port)
    }
    fnServerDisconnected()
    nSBOnline = 0
    OFF[vdvSBOnlineFB ,CLIENT_ONLINE]
    SEND_COMMAND dvTPSoundBArray,"'!T',1,'Offline'"
    if (nSBSendOnce == 0)
    {
        nSBSendOnce = 1
        wait 300
        {
            SEND_STRING 0,"'SounB: re-openning connection from "',iFromWhere,'" !',crlf"
            IP_Client_Open(dvSoundB_Client.port,IP_AddressSoundB,SoundB_Client_Port,TCP)
            nSBSendOnce = 0
        }
    }
}
I've tried pretty much everything I could think of, even putting the entire program on a different, identical master, so it has to be either a code-based problem, its IP socket handler, or something.
When I lose debug or diagnostics (NS2 connection to master) and the TP IP connection to the master, the master still appears to run RS232 comms just fine; at least the send/receive LEDs blink alternately for the device running a query time_line.
It's got to be something stupid. I also just started having some of those other recent debug issues of being off a few lines as well as other goofy **** happening.
If anyone can shed some light on this issue, my aching brain would be most appreciative!
Comments
Just before I did this I did have the code working upon a reboot. The code would run, and in diagnostics I would get a notification every time the device received my query string and replied. Without doing anything else, just watching the diagnostics window (no button pushes or anything), things would start to get slower and intermittent. I would periodically get a reference to memory, just as it does in the beginning, and I believe it shows me the remaining amount. Then diagnostics would stop, my NS2 connection would end, and everything would be dead from an IP perspective. Serial strings are still sent and received, but nothing is doing with IP. It doesn't appear to time out any more, or maybe it never did; I'm so burnt I don't know. I can't connect from NS2, can't send a newly compiled program or debug, and TP actions don't work unless they are self-contained, until I reboot again. Then it will run for a while and die again.
I'm sure if I connected my laptop via the comm port I would maintain comms. I guess that's my next move, and/or putting all the AMX stuff and my PC on a separate network that's clean and without the numerous other non-system network devices on it.
Has anyone else had similar or different IP issues that just seem to defy reason?
I haven't exactly pored over your code samples extensively, but what I don't see is some manner of intermediate state for the connection. Unless I just missed it, I don't see anything but a connected or disconnected flag, and it is possible for it to attempt to open the socket while another connection is pending. You have a 60 second wait between connection attempts if the connection has not been established, but if your first attempt takes longer than that for any reason, it may still be pending when the second goes in. This will result in your "port in use" error. Likewise for a disconnect. It's not instant. There is an intermediate state between when you say to drop the connection and when it is actually dropped. I don't believe you can depend on the online and offline events tracking these either. Stack up a bunch of these and you have a cycle you will never break out of.
I've had the most success using online, offline, and onerror to set a state variable that includes values for intermediate connections, and outright "something is really screwed up, so never try again" (bad host, for example). In mainline I test for this variable and only try to connect if I have a clean disconnect state. If it's in one of the intermediate, error, or already connected states, nothing happens. I will put a delay on the connect just to allow things to settle down. I also have moved away from depending on the online event to ascertain if I am really connected. I wait until I get a valid response before clearing my intermediate state. Onerror will tell me if it timed out, so that would drop the flag back to disconnected.
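For illustration, the state-variable approach described above might look roughly like the sketch below. This is not DHawthorne's actual code: the device number, server address, port, state names, delays, and the "any CR/LF-terminated reply counts as valid" test are all placeholder assumptions, and the error-code meanings are the ones quoted earlier in this thread plus 4 (unknown host), which should be checked against the NetLinx documentation.

```netlinx
PROGRAM_NAME='ConnStateSketch'

DEFINE_DEVICE
dvClient = 0:4:0                      // hypothetical local IP port

DEFINE_CONSTANT
INTEGER CONN_IDLE     = 0             // cleanly disconnected, safe to open
INTEGER CONN_OPENING  = 1             // IP_CLIENT_OPEN issued, result pending
INTEGER CONN_ONLINE   = 2             // socket up, no valid reply seen yet
INTEGER CONN_VERIFIED = 3             // valid reply received
INTEGER CONN_CLOSING  = 4             // IP_CLIENT_CLOSE issued, result pending
INTEGER CONN_FATAL    = 5             // e.g. bad host: never retry

CHAR    SERVER_ADDR[] = '192.168.1.50'  // placeholder address
LONG    SERVER_PORT   = 4444            // placeholder port
INTEGER RETRY_DELAY   = 300             // 30 seconds, in tenths
INTEGER CLOSE_GUARD   = 50              // 5 seconds, in tenths

DEFINE_VARIABLE
VOLATILE INTEGER nConnState
VOLATILE CHAR    cRxBuf[2048]

DEFINE_START
CREATE_BUFFER dvClient, cRxBuf
nConnState = CONN_IDLE

DEFINE_EVENT
DATA_EVENT[dvClient]
{
    ONLINE:
    {
        nConnState = CONN_ONLINE      // connected, but not yet trusted
    }
    OFFLINE:
    {
        nConnState = CONN_IDLE        // clean disconnect
    }
    ONERROR:
    {
        SWITCH (DATA.NUMBER)
        {
            CASE 4:                   // unknown host: give up
            {
                nConnState = CONN_FATAL
            }
            CASE 14:                  // port already in use: something is still in flight
            {
                nConnState = CONN_CLOSING
                IP_CLIENT_CLOSE(dvClient.PORT)
                // Fallback in case neither OFFLINE nor error 9 ever arrives.
                WAIT CLOSE_GUARD 'SketchCloseGuard'
                {
                    IF (nConnState == CONN_CLOSING) nConnState = CONN_IDLE
                }
            }
            DEFAULT:                  // 7 timed out, 9 already closed, 17 not open, etc.
            {
                nConnState = CONN_IDLE
            }
        }
    }
    STRING:
    {
        // Assumption: any CR/LF-terminated reply proves the link is really up.
        IF (FIND_STRING(cRxBuf, "13,10", 1))
        {
            nConnState = CONN_VERIFIED
            CLEAR_BUFFER cRxBuf
        }
    }
}

DEFINE_PROGRAM
// Only a clean disconnect schedules an open. A named WAIT that is already
// pending is not queued again, so this does not stack up retries.
IF (nConnState == CONN_IDLE)
{
    WAIT RETRY_DELAY 'SketchReconnect'
    {
        IF (nConnState == CONN_IDLE)  // re-check: state may have changed meanwhile
        {
            nConnState = CONN_OPENING
            IP_CLIENT_OPEN(dvClient.PORT, SERVER_ADDR, SERVER_PORT, IP_TCP)
        }
    }
}
```

The point of the design is that IP_CLIENT_OPEN is only ever called from one place, and only when the state says nothing else is pending; the event handlers just move the state variable around.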
So in your example, unless you have 0:4:0, 0:5:0 and 0:6:0 declared elsewhere, the RSS and SoundBridge ports should be 4 and 5.
Comments noted and will amend. I'm not sure why I skip 2, 3 and so on. I most likely pulled it from another file and just left it that way.
DHawthorne:
I understand and totally agree that I should set a flag when initiating a connection and not resend obviously if connected or until I get a time out or an error.
I've been connecting fine, and I had no issues at all when on a different system, but that different system is a small testing system on its own network with nothing else on it. I've since removed the problem system from the main network and put it on a subnet, as has been recommended (and ignored, by me). I did notice a lot of broadcast traffic to .255 being generated by the master (using Ethereal), which is a broadcast to everything on the network (some 40 devices).
While reading up on subnets (which I did after I created one, wrongly, I might add) I read that on large systems, when packets are sent by different hosts at the same time, those packets will collide and be destroyed and need to be resent. When resent they are randomly delayed to avoid another collision with the previously destroyed packet, which will also be resent, and so on. The system I removed the AMX gear from probably has 40 host devices: cameras, panels, PCs, printers, NAS, servers. That's a lot of traffic, which can't be efficient if collisions are occurring and packets are being destroyed and need to be resent, and all the more so when audio and cameras are streaming on the same network. I think from now on I'll be a firm believer in and follower of the recommendation to use subnets, at least one just for the AMX gear. It's simple and makes sense. Well, maybe not so simple; my little electrician's brain (yes, I am first and foremost an electrician) is still trying to comprehend some of this.
Since putting it on the subnet, which still isn't right but is working, the program is working as well. Beats the F out of me whether that was the problem, part of the problem, or not the problem at all, but at least with it on the subnet I can eliminate the network as the cause if the problem starts up again.
Thanks for your inputs!
my question to vining is, 'are there any DNS requests within your program' ?
does anything make a URL request instead of an IP number request ?
if you do have URLs, ping the name, and replace the name with the IP number for now (temporary work around, ie. no DNS support).
if there are no URL names, i'd like to know that too. and also, does your AMX network have an internet router attached (which may have had a connection error, you may need to check the router/modem logs) ?
you are not alone in having intermittent network errors. i (and others) have been reporting them for some time now with little action. i am asking you about your DNS setup because i know there is a fault there (when the timing is right/wrong). if you lose internet during a DNS lookup, your controller network will log-jam for at least 90 seconds, and the errors may impact other network connections.
as for the port numbering (just for the record), i use various ranges, from around 6 thru to 20, with gaps in between. any memory wasted is no big deal, there's plenty to spare (as far as i've seen so far).
Using the Ethereal network protocol analyzer (a free download if you don't already use it) shows all the network traffic down to the packets and flags, and there is a substantial amount of traffic being sent by domain name. It's mostly from my laptop attempting to find my NAS, which is mapped as a network drive and isn't getting resolved because it's not on this subnet since shifting things around. Everything is working now except for my RSS, which isn't finding the gateway. As I mentioned, my subnet is still wrong and I'm not ready to dive in and fix it, because if I do this I want to do it right, with future needs in mind. I don't want to have to reconfigure my static IPs, all the port forwarding, all the subnet masks, VPNs, etc. twice. Having to do it once is more than I want to deal with. So I need to keep reading and get a better understanding of what would be the best route. Should I stay on a class C network or shift to B? Obviously B would be better, but I could also do a combination of B and C and separate my wired network from my wireless as an invisible barrier to potential hackers.