How do you use IP_CLIENT_OPEN to get info from a website?

I am trying to go to a website to scrape a web page, be it weather.com or Yahoo Finance, to create a stock ticker. But when I use IP_CLIENT_OPEN and then a GET, I get this error:

SendString 0:0:0 too long 528 truncated to 274.

So I only get bits and pieces of the web page I want to scrape. Is this a firmware issue, since I am using firmware 2.32.148? (I am using this old firmware because the newer one has Duet, with which my Master hangs for about 45 seconds, and I didn't want to wait that long every time I compiled and uploaded.)

DEFINE_START
create_buffer dvWeather, strWeatherBuff;

DEFINE_EVENT

DATA_EVENT[dvWeather]
{
    ONLINE:
    {
        //weather
        SEND_STRING dvWeather,"'GET /forecastrss?p=48316 HTTP/1.0',13,10,13,10"
    }
    STRING:
    {
        SEND_STRING 0,"'STRING strWeatherBuff: ',strWeatherBuff";
    }//string
}


Does anyone have any ideas for getting all the info from the web page without this error?

Comments

  • Marc Scheibein Junior Member Posts: 669
    This error comes only from the SEND_STRING to the terminal. You can only send 274 bytes in one string to the terminal.
    It has nothing to do with the IP handling itself.

    The GET request seems to be incomplete.
    This is a working string from a router:
    SEND_STRING dvIP_CLIENT,"'GET /cgi-bin/[email protected]&A=D1&rd=status HTTP/1.1',13,10,13,10,'Host: 192.168.0.254',13,10,13,10"
    
    The Host is part of the GET, and it has to be the server's IP.
    This worked fine on a NXI-ME v2.31, so I don't think it is a Duet issue.
  • DHawthorne Junior Member Posts: 4,584
    Marc Scheibein wrote: »
    This error comes only from the SEND_STRING to the terminal. You can only send 274 bytes in one string to the terminal.
    It has nothing to do with the IP handling itself.

    The GET request seems to be incomplete.
    This is a working string from a router:
    SEND_STRING dvIP_CLIENT,"'GET /cgi-bin/[email protected]&A=D1&rd=status HTTP/1.1',13,10,13,10,'Host: 192.168.0.254',13,10,13,10"
    
    The Host is part of the GET, and it has to be the server's IP.
    This worked fine on a NXI-ME v2.31, so I don't think it is a Duet issue.

    I'm surprised that works; my understanding was that header data (Host, etc.) should be separated from the GET line by only a single CRLF pair, and that sending two in a row ends the request. For example, from a Weatherbug session:
    		SEND_STRING dvRSS, "'GET /rss.aspx?zipcode=', sCity.zip,'&feed=curr,fcst&units=0&zcode=z4641 HTTP/1.0',$0d,$0a" 
    		SEND_STRING dvRSS, "'Host: feeds.weatherbug.com',$0d,$0a"
    		SEND_STRING dvRSS, "'Authorization: Basic',$0d,$0a"
    		SEND_STRING dvRSS, "'User-Agent: NetLinx_WEATHERBUG_RSS/',sVersion, $0d,$0a"
    		SEND_STRING dvRSS, "$0d,$0a"
    

    I have also found in my experiments with various HTTP hosts that some require a bit of header data and some don't require any. I generally capture the packets from a browser session to see what a browser sends to the site, then start dropping headers out until it breaks.
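Dave's separate SEND_STRINGs add up to one byte stream: the request line and each header end in a single CRLF ($0d,$0a), and one extra CRLF (an empty line) closes the request. A minimal Python sketch of that framing follows. It is not NetLinx; the helper name `build_get_request` is hypothetical and the User-Agent value is a placeholder. In NetLinx you would send the same bytes with SEND_STRING as in Dave's example.

```python
CRLF = "\r\n"  # same bytes as $0d,$0a in NetLinx


def build_get_request(path, host, headers=None):
    """Return an HTTP/1.0 GET request as bytes (hypothetical helper).

    The request line and every header end with one CRLF; a final extra
    CRLF (an empty line) tells the server the header block is finished.
    """
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


if __name__ == "__main__":
    # Path and host are taken from the thread; the User-Agent is a placeholder.
    request = build_get_request(
        "/forecastrss?p=48316",
        "weather.yahooapis.com",
        {"User-Agent": "NetLinx-test"},
    )
    print(request)
```

Only the final empty line is doubled; splitting the same bytes across several SEND_STRINGs, as Dave does, makes no difference to the server.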
  • jweather Junior Member Posts: 320
    DHawthorne wrote: »
    I'm surprised that works; my understanding was that header data (HOST, etc.) were only to be separated from the GET line by a single CRLF pair, and doing two terminates the session.

    Dave is correct: the example request Marc gave is no different from the one John gave, since the CRLF+CRLF terminated the request before the Host: header was sent.

    Some sites will be happy with just a GET request; many will require a Host: header indicating the *name* of the site, like Host: www.amxforums.com (not the IP address); some need even more. Try using a Firefox extension to snoop the headers your browser is sending, and start with that set. There's no harm in leaving extra ones in even if they aren't needed.
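jweather's point can be checked by splitting a raw request the way a server does: everything up to the first CRLF CRLF is the header block, and anything after it is not a header of that request. The Python sketch below is illustrative only. `headers_seen_by_server` is a hypothetical helper, and `/status` stands in for Marc's router path, which is garbled in the post above. A real server may ignore the trailing `Host:` line or reject it as a malformed second request.

```python
def headers_seen_by_server(raw):
    """Return the lines a server treats as this request's header block.

    The block ends at the first empty line (CRLF CRLF); anything after it
    is not a header of this request.
    """
    head, _sep, _rest = raw.partition(b"\r\n\r\n")
    return head.decode("ascii").split("\r\n")


if __name__ == "__main__":
    # Marc's layout: blank line first, Host header afterwards.
    marc_style = b"GET /status HTTP/1.1\r\n\r\nHost: 192.168.0.254\r\n\r\n"
    # Dave's layout: single CRLF between lines, blank line only at the end.
    dave_style = b"GET /rss.aspx HTTP/1.0\r\nHost: feeds.weatherbug.com\r\n\r\n"
    print(headers_seen_by_server(marc_style))  # the Host line is missing
    print(headers_seen_by_server(dave_style))  # request line plus Host
```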
  • JOHNBONZ Junior Member Posts: 99
    Dave, if I try the code you added:

    SEND_STRING dvRSS, "'GET /rss.aspx?zipcode=', sCity.zip,'&feed=curr,fcst&units=0&zcode=z4641 HTTP/1.0',$0d,$0a"
    SEND_STRING dvRSS, "'Host: feeds.weatherbug.com',$0d,$0a"
    SEND_STRING dvRSS, "'Authorization: Basic',$0d,$0a"
    SEND_STRING dvRSS, "'User-Agent: NetLinx_WEATHERBUG_RSS/',sVersion, $0d,$0a"
    SEND_STRING dvRSS, "$0d,$0a"

    in my DATA_EVENT STRING handler, will I stop seeing the error I was getting? In other words, will I see the full stream of data from the WeatherBug site? I used one SEND_STRING with just the GET, but you send a GET plus several headers. So, back to my original question: why am I only allowed 274 bytes?

    Marc says this:
    This error comes only from the SEND_STRING to the terminal. You can only send 274 bytes in one string to the terminal. It has nothing to do with the IP handling itself.
  • DHawthorne Junior Member Posts: 4,584
    Marc was correct about the error: you will still see it. You simply cannot send a string longer than 274 characters to the debug terminal without getting that. It's harmless; you just can't see the whole string. What I do in such cases is write the strings to a file on the master and FTP it off to look at what happened.
  • JOHNBONZ Junior Member Posts: 99
    Ahh, that is a good idea. I will write the strings to a file on the Master and then read that file back in to parse it.

    I will let you all know how this turns out.

    Thanks again!!
  • Chip Moody Junior Member Posts: 727
    If you want to save yourself from dealing with writing files, you could wait until the web server has closed the connection, then fire a routine that spits the contents of your buffer variable to 0:0:0 bit by bit: a nice readable 60 chars at a time, maybe?

    The way you have it now, you're going to see a lot of repetition, since your buffer is being appended to each time you pass through the DATA_EVENT / STRING: handler.

    - Chip
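Chip's 60-characters-at-a-time idea, sketched in Python to show the slicing arithmetic (the helper `slices` and the 130-character stand-in buffer are hypothetical). In NetLinx the same loop could step through the buffer with MID_STRING and send each piece with SEND_STRING 0, which keeps every line well under the 274-byte terminal limit.

```python
def slices(text, width=60):
    """Split text into consecutive pieces of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


if __name__ == "__main__":
    buffer = "x" * 130  # stand-in for the contents of strWeatherBuff
    for piece in slices(buffer):
        print(piece)  # each printed line is at most 60 characters
```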
  • JOHNBONZ Junior Member Posts: 99
    OK, forgive me, but now I am a little confused. When I execute this code:

    volatile char strWeatherBuff[8192];


    1) IP_CLIENT_OPEN(dvStocks.PORT,'weather.yahooapis.com',80,IP_TCP); //establish connection

    Then the connection comes online and goes to the ONLINE: handler:

    2)
    ONLINE:
    {
    SEND_STRING dvWeather,"'GET /forecastrss?p=48316 HTTP/1.0',13,10,13,10" //get info
    }

    3)
    STRING:
    {
    SEND_STRING 0,"'STRING strWeatherBuff: ',strWeatherBuff"; //print out buffer
    }//string
    }

    If I go to this web page and there are, say, 2,000 bytes of info on it, will the GET command retrieve all of this data at once?
    If so, will I have all of this info in the buffer strWeatherBuff (max size 8K)?
    Is the only problem the SEND_STRING 0, i.e., the output to the terminal?

    If strWeatherBuff has all 2,000 bytes then I can parse it, but based on the above I am not sure it will be loaded with all the data.
  • DHawthorne Junior Member Posts: 4,584
    Yes, the GET command will retrieve it all at once: the HTTP server, at this point, will be acting just as if you were a web browser requesting a page. You have to make sure your buffer can hold it all, but if it's big enough, you will just have to parse it (which is an adventure all its own). Depending on what kind of header information the server supports, you may be able to get it in various formats or even sizes (for example, if they support a mobile browser version).
  • Hedberg Junior Member Posts: 671
    Also, don't forget to clear your buffer. Otherwise, after a while, it may start looking a little familiar. When a Netlinx buffer fills up, it ignores further input, rather than discarding data at the beginning of the buffer the way Axcess did. That's my understanding, anyway.
  • DHawthorne Junior Member Posts: 4,584
    Hedberg wrote: »
    Also, don't forget to clear your buffer. Otherwise, after a while, it may start looking a little familiar. When a Netlinx buffer fills up, it ignores further input, rather than discarding data at the beginning of the buffer the way Axcess did. That's my understanding, anyway.

    Yes, it does behave like that. I generally remove each chunk of data as I parse as a matter of course; it didn't occur to me to mention it :).
  • Chip Moody Junior Member Posts: 727
    DHawthorne wrote: »
    Yes, the GET command will retrieve it all at once

    Don't forget that this doesn't mean the retrieved data will ARRIVE via DATA_EVENT/STRING: all at once. Your DATA_EVENT/STRING: handler can and will trigger any number of times as data comes in. So my suggestion would be:

    1) Clear your buffer.
    2) Connect to the server.
    3) Issue the GET.
    4) Wait...
    5) Don't bother looking at the contents of the buffer until AFTER you get the OFFLINE event from the TCP/IP connection.

    If the server doesn't automatically disconnect after sending all the data, you could check the contents of DATA.TEXT on each DATA_EVENT/STRING: event for something like "/HTML", or something else guaranteed to be at the end of the expected data, then use that as a cue to close the TCP/IP connection from your side. If there's no tag guaranteed to sit at the end of the data (rare), you could check the initial HTTP header as it comes in for a size field (Content-Length), use that to calculate how large you expect the buffer to get, and close the connection when it reaches that size. (Don't forget a timeout to be safe.) The size counts only the bytes in the buffer PAST the end of the HTTP header, i.e., after the first occurrence of a double carriage return/linefeed pair.

    - Chip
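One way to turn Chip's size-field suggestion into a check, written as a Python sketch rather than NetLinx: find the first CRLF CRLF, read the Content-Length header from the block before it, and compare that with the number of bytes after it. The function name and the sample response are hypothetical; header names are matched case-insensitively, as HTTP requires. A response without Content-Length simply reports "not complete", so you would still rely on OFFLINE, an end tag, or a timeout as Chip describes.

```python
def response_complete(buffer):
    """Return True once the body holds at least Content-Length bytes.

    Returns False while the header block (ending in CRLF CRLF) is still
    arriving, and also when no Content-Length header is present.
    """
    head, sep, body = buffer.partition(b"\r\n\r\n")
    if not sep:
        return False  # header block not finished yet
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, colon, value = line.partition(b":")
        if colon and name.strip().lower() == b"content-length":
            return len(body) >= int(value.strip())
    return False  # no length given: wait for OFFLINE, an end tag, or a timeout


if __name__ == "__main__":
    buffer = b""
    # Hypothetical response arriving in three pieces, as STRING: events might.
    for piece in (b"HTTP/1.0 200 OK\r\nContent-Le", b"ngth: 5\r\n\r\nab", b"cde"):
        buffer += piece
        print(len(buffer), response_complete(buffer))
```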
  • vining X Member Posts: 4,354
    Chip Moody wrote:
    you could check the contents of DATA.TEXT on each DATA_EVENT/STRING: event for something like "/HTML"
    I don't think I would use DATA.TEXT for this; I'd use a created buffer instead. Isn't there some limitation on the length of strings that can be retrieved via DATA.TEXT? I like DATA.TEXT for virtual devices, but real devices should get a buffer IMHO.

    I would also suggest not using the STRING data handler at all and just putting a line in DEFINE_PROGRAM like:
    if(find_string(cBuffer,'</HTML>',1))
         {
         fnParseMyBuffer();
         }
    
    When you get your ending tag, or whatever else determines that a full string has been delivered, then parse it.
  • DHawthorne Junior Member Posts: 4,584
    vining wrote: »
    Chip Moody wrote:

    I don't think I would use DATA.TEXT for this; I'd use a created buffer instead. Isn't there some limitation on the length of strings that can be retrieved via DATA.TEXT? I like DATA.TEXT for virtual devices, but real devices should get a buffer IMHO.

    I would also suggest not using the STRING data handler at all and just putting a line in DEFINE_PROGRAM like:
    if(find_string(cBuffer,'</HTML>',1))
         {
         fnParseMyBuffer();
         }
    
    When you get your ending tag, or whatever else determines that a full string has been delivered, then parse it.

    I entirely ignore the STRING event handler for something like this. I create a buffer on the port, then parse the buffer on the OFFLINE handler, using a WHILE(FIND_STRING(sBuffer, "'>'", 1)) to break it into manageable chunks.
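Dave's OFFLINE-time loop can be modelled in Python as below. `remove_through` mimics what NetLinx REMOVE_STRING does (return everything up to and including the search string and take it out of the buffer); both helper names are hypothetical, and the sample input is made up for illustration.

```python
def remove_through(buffer, token):
    """Mimic NetLinx REMOVE_STRING: return (removed, remaining).

    `removed` runs from the start of `buffer` up to and including the first
    `token`; if the token is absent, nothing is removed.
    """
    idx = buffer.find(token)
    if idx < 0:
        return "", buffer
    end = idx + len(token)
    return buffer[:end], buffer[end:]


def split_on_tags(buffer):
    """Break a buffer into '>'-terminated chunks, as in the OFFLINE parse loop."""
    chunks = []
    while ">" in buffer:  # WHILE(FIND_STRING(sBuffer, "'>'", 1))
        chunk, buffer = remove_through(buffer, ">")
        chunks.append(chunk)  # hand each manageable chunk to the parser
    return chunks, buffer  # leftover text with no closing '>'


if __name__ == "__main__":
    print(split_on_tags("<temp>72</temp>trailing"))
```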
  • Chip Moody Junior Member Posts: 727
    Personally, I think putting a FIND_STRING test on a potentially large buffer full of characters in MAINLINE is insane, and just a plain bad idea that needlessly chews up clock cycles, but to each their own...

    Yes - normally there's not much need to do things in DATA_EVENT / STRING: besides calling your parsing routine, but here's the thinking behind what I suggested:

    If you've got a buffer (a potentially large one if it's getting data from a web server) where the STRING: handler is getting hit many times as the buffer fills, AND you're looking for something that is going to appear at the END of that buffer to indicate that you've got your complete document, WHY would you want to call, on every hit, a routine that does a FIND_STRING for that data across that potentially large buffer?

    With every hit on the STRING: handler, DATA.TEXT contains the new block of data that is being appended to the buffer - not the entire contents of the buffer itself. On subsequent passes, DATA.TEXT is going to be different every time - it gets replaced with JUST the data that caused the event.

    The point is, after the first triggering of the STRING: handler, your buffer grows larger, while DATA.TEXT will ALWAYS be smaller in size than the buffer.

    Which one will be more efficient to search for your target string in?

    - Chip
  • Hedberg Junior Member Posts: 671
    Chip Moody wrote: »

    Which one will be more efficient to search for your target string in?

    - Chip

    well, the buffer is guaranteed to contain the string (assuming it has arrived and the buffer is sufficiently large) while data.text is not. So, I guess it depends on how you define "efficient." My wag is that waiting for the connection to be closed and then initiating a search from the offline handler (as DH suggests) probably saves at least a couple ms. It doesn't take very long to search for something in RAM.
  • vining X Member Posts: 4,354
    DHawthorne wrote:
    then parse the buffer on the OFFLINE handler,
    Ya know, that's a pretty darn good idea on connections that will automatically be closed by the server when it's done sending. What's the delay like before the offline event is triggered? Obviously most connections of this type aren't time sensitive so small delay wouldn't be objectionable.

    Chip Moody wrote:
    DATA.TEXT contains the new block of data that is being appended to the buffer - not the entire contents of the buffer itself. On subsequent passes, DATA.TEXT is going to be different every time - it gets replaced with JUST the data that caused the event.
    I hear what you're saying and I like your logic.

    Question: since I'm not a big DATA.TEXT user, and I recall issues with the length of strings it can handle, is it possible that using DATA.TEXT might miss the ending tag you're looking for, or other chunks of the strings?
  • Chip Moody Junior Member Posts: 727
    Hedberg wrote: »
    well, the buffer is guaranteed to contain the string (assuming it has arrived and the buffer is sufficiently large) while data.text is not.

    DATA.TEXT will - over the course of data coming in from a source - RS232 or IP - contain everything that winds up in the buffer at one point or another. Have you been told otherwise?

    This is way oversimplifying, but if a remote device sends "ABCDEFGHIJKLMNOPQRSTUVWXYZ", it could come in smaller chunks via the NetLinx OS and cause multiple triggers of the STRING: handler. For argument's sake, let's say it comes in causing the STRING event to trigger four times.

    If you cleared your buffer before the start of data coming in, your buffer (and we're talking about a CREATE_BUFFER buffer here) will be like this when all is said and done:

    MyBuff = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    WHILE the data coming in is gathered by the OS/hardware, however - DATA.TEXT will show you what's coming in on each trigger of the STRING: handler, like this for example:

    STRING: handler hit #1 - DATA.TEXT = "ABCDEFGH"
    STRING: handler hit #2 - DATA.TEXT = "IJKLMN"
    STRING: handler hit #3 - DATA.TEXT = "OPQRSTUV"
    STRING: handler hit #4 - DATA.TEXT = "WXYZ"

    So if you want your code to look for "XYZ" as the cue for "okay, I got everything I expected", having each hit on the STRING: handler look in DATA.TEXT (which in this case is never more than 8 characters long) for "XYZ" is more efficient than doing this each time:

    STRING: handler hit #1 - MyBuff = "ABCDEFGH" No match for "XYZ".
    STRING: handler hit #2 - MyBuff = "ABCDEFGHIJKLMN" No match for "XYZ".
    STRING: handler hit #3 - MyBuff = "ABCDEFGHIJKLMNOPQRSTUV" No match for "XYZ".
    STRING: handler hit #4 - MyBuff = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" Found "XYZ" - we're done!

    Hedberg wrote: »
    So, I guess it depends on how you define "efficient." My wag is that waiting for the connection to be closed and then initiating a search from the offline handler (as DH suggests) probably saves at least a couple ms. It doesn't take very long to search for something in RAM.

    Absolutely; if you look a few posts back you'll see that's what I suggested as well. I mentioned monitoring DATA.TEXT for some kind of end tag only for the rare instances where a server may not disconnect after sending its data...

    - Chip
  • Chip Moody Junior Member Posts: 727
    vining wrote: »
    Ya know, that's a pretty darn good idea on connections that will automatically be closed by the server when it's done sending. What's the delay like before the offline event is triggered? Obviously most connections of this type aren't time sensitive so small delay wouldn't be objectionable.

    I've never set up stopwatch code to test it, but it's seemed pretty instantaneous to me...

    vining wrote: »
    I hear what you're saying and I like your logic.

    Question: since I'm not a big DATA.TEXT user, and I recall issues with the length of strings it can handle, is it possible that using DATA.TEXT might miss the ending tag you're looking for, or other chunks of the strings?

    I have always been told by AMX employees (training, tech support, programmers, etc) that whatever comes in to your buffer will be trapped by DATA.TEXT at one point during the process, and no - it doesn't drop characters. On the flip side to that, I've never experienced otherwise, and I've never been told by anyone at AMX to expect behavior other than that from DATA.TEXT.

    By the same token, remember that one method of building your buffer (if you didn't declare it with CREATE_BUFFER) is to do MyBuff = "MyBuff,DATA.TEXT" in your STRING: handler. If DATA.TEXT couldn't be relied on to see every byte of data that comes in, you wouldn't be able to do that. (And then there wouldn't be much purpose to having it in the first place!) :)

    - Chip
  • ericmedley Senior Member - 4000+ posts Posts: 4,166
    Chip Moody wrote: »
    I've never set up stopwatch code to test it, but it's seemed pretty instantaneous to me...




    I have always been told by AMX employees (training, tech support, programmers, etc) that whatever comes in to your buffer will be trapped by DATA.TEXT at one point during the process, and no - it doesn't drop characters. On the flip side to that, I've never experienced otherwise, and I've never been told by anyone at AMX to expect behavior other than that from DATA.TEXT.

    By the same token, remember that one method of building your buffer (if you didn't declare it with CREATE_BUFFER) is to do MyBuff = "MyBuff,DATA.TEXT" in your STRING: handler. If DATA.TEXT couldn't be relied on to see every byte of data that comes in, you wouldn't be able to do that. (And then there wouldn't be much purpose to having it in the first place!) :)

    - Chip

    There is one small issue that can cause problems with DATA.TEXT when it comes to web pages: DATA.TEXT is limited to 16K, so web pages that go over this size can cause problems.

    Also, the behavior of D.T is somewhat different as well.

    When a CREATE_BUFFER buffer is full, it's FIFO: the incoming data pushes the oldest data out. With DATA.TEXT, once it's full, it's just full, and it ignores anything over the 16K.

    One other thing, which I can't verify but have experienced:

    If the data coming in is a little stuttery, I've found that the DATA_EVENT will fire if it thinks the message is over. However, CREATE_BUFFER seems to be a bit more patient and seems to wait a bit longer to see if the entire message is done.
  • Hedberg Junior Member Posts: 671
    Chip Moody wrote: »
    DATA.TEXT will - over the course of data coming in from a source - RS232 or IP - contain everything that winds up in the buffer at one point or another. Have you been told otherwise?

    No, and I didn't suggest otherwise. As you go on to explain, data.text is not guaranteed to contain any particular string, and that's all I said. Look at your explanation and consider the case where the string you are looking for is 'UVWXYZ'. If the data comes in to data.text as in the example, you're screwed. I don't know of any reason why data.text is guaranteed to contain any two contiguous characters so I don't see that it's guaranteed to ever contain 'XYZ' either.
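Hedberg's objection is easy to reproduce. This Python sketch replays Chip's four STRING: events and searches for a target either in each chunk alone (the DATA.TEXT approach) or in the growing buffer. The function names are hypothetical; the chunk boundaries are the ones from Chip's example.

```python
CHUNKS = ["ABCDEFGH", "IJKLMN", "OPQRSTUV", "WXYZ"]  # Chip's four STRING: events


def found_per_chunk(chunks, target):
    """Search each DATA.TEXT-style chunk on its own, as the events arrive."""
    return any(target in chunk for chunk in chunks)


def found_in_buffer(chunks, target):
    """Append each chunk to a buffer and search the whole buffer every time."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if target in buffer:
            return True
    return False


if __name__ == "__main__":
    for target in ("XYZ", "UVWXYZ"):
        print(target, found_per_chunk(CHUNKS, target), found_in_buffer(CHUNKS, target))
```

"XYZ" happens to sit inside the last chunk, but "UVWXYZ" straddles the third and fourth chunks, so the per-chunk search never sees it.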
  • Hedberg Junior Member Posts: 671
    ericmedley wrote: »

    Also, the behavior of D.T is somewhat different as well.

    When a CREATE_BUFFER buffer is full, it's FIFO: the incoming data pushes the oldest data out. With DATA.TEXT, once it's full, it's just full, and it ignores anything over the 16K.

    My understanding is that in NetLinx, when any buffer is full, any data the program attempts to add is discarded. This is different from Axcess, where data was added to the end of the buffer and discarded from the beginning.
  • ericmedley Senior Member - 4000+ posts Posts: 4,166
    Hedberg wrote: »
    My understanding is that in Netlinx, when a any buffer is full, any data that the program attempts to add is discarded. This is different from Axcess where data was added to the end of the buffer and discarded from the beginning.

    That was one of the discussions we had at this week's PROG III class in Dallas. The information I related was from Nick. My previous understanding was the same as yours: Axcess was FIFO, NetLinx was nothing-after-full.

    Perhaps I will run a quick test of the two when I get a chance. It would be helpful to know.
  • Hedberg Junior Member Posts: 671
    ericmedley wrote: »
    That was one of the discussions we had at this week's PROG III class in Dallas. The information I related was from Nick. My previous understanding was the same as yours: Axcess was FIFO, NetLinx was nothing-after-full.

    I've experienced Netlinx buffers filling up with trash put out by a couple different devices (one a Polycom Vortex mixer and one a ClearOne mixer). It looks to me like the Netlinx buffers fill up and stop accepting new bytes. But, if Nick says otherwise, it's probably worth a re-check. He knows a heck of a lot more than I do.
  • shr00m-dew Junior Member Posts: 394
    Nick pretty much said that this was one benefit to CREATE_BUFFER, it behaves like Axcess did. He also said he didn't know why there was an aversion to CREATE_BUFFER as there's no real downside and more things get handled for you.

    Kevin D.
  • vining X Member Posts: 4,354
    shr00m-dew wrote:
    He also said he didn't know why there was an aversion to CREATE_BUFFER
    I don't know if I would say there's an aversion to creating a buffer per se. I just think most folks find it easier to use DATA.TEXT, but more so I think it's because that's what they see in AMX modules between their UI and Com programs, and if that's how AMX does it, it's got to be the better way.

    I don't know why or how this happened, but in my mind I have it that DATA.TEXT is used in modules because comms to virtuals are usually short strings and their timing is a known constant. They always send a complete command or string and always trigger a single STRING or COMMAND event per sent string or command. I also have in my head that these strings aren't reliable above, say, 256 characters, sort of like sending text to a variable text address. Like I said, I don't know how I got this stuck in my head, but it's been in there for a long time, and I think it has been reinforced over the years by someone somewhere. I either read it or heard it; maybe I just made it up, I don't know. So I've always created buffers except when talking to virtuals.

    With the exception of using DATA.TEXT as Chip suggested, to keep down the length of the string my FIND_STRING has to search through, I don't see a reason to change anything, but I am interested to see if anyone else has been brainwashed into an aversion to DATA.TEXT like I have.
  • Hedberg Junior Member Posts: 671
    shr00m-dew wrote: »
    Nick pretty much said that this was one benefit to CREATE_BUFFER, it behaves like Axcess did. He also said he didn't know why there was an aversion to CREATE_BUFFER as there's no real downside and more things get handled for you.

    Kevin D.

    Back in 2003, they told us in Programmer 2 that Netlinx buffers were different, but it sure looks like a buffer tied directly to a device with create_buffer works as it did in Axcess. I always thought that create_buffer was an anachronism without a good reason to be used, clearly, I was wrong.
  • Chip Moody Junior Member Posts: 727
    Hedberg wrote: »
    No, and I didn't suggest otherwise. As you go on to explain, data.text is not guaranteed to contain any particular string, and that's all I said. Look at your explanation and consider the case where the string you are looking for is 'UVWXYZ'. If the data comes in to data.text as in the example, you're screwed. I don't know of any reason why data.text is guaranteed to contain any two contiguous characters so I don't see that it's guaranteed to ever contain 'XYZ' either.

    Ack. What he said.

    Brainfart here: I was carrying over the idea of searching for a single-byte "delimiter", and obviously (well, apparently not obvious to me) that goes right out the window when you move to multi-character delimiters.

    My bad...

    So maybe, in an effort to make a search like this more efficient on a potentially large buffer, it could be constrained to checking just the last "x" bytes of the buffer with RIGHT_STRING. I'm just thinking out loud here...

    - Chip
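Chip's RIGHT_STRING idea works if the window is sized correctly: any match that was not already in the buffer must end inside the newest chunk, so it can start at most len(target) - 1 characters before that chunk. A Python sketch of that bound follows (helper names hypothetical, chunks from Chip's earlier example). In NetLinx the tail would presumably come from RIGHT_STRING on the buffer, with the chunk length taken from LENGTH_STRING(DATA.TEXT).

```python
CHUNKS = ["ABCDEFGH", "IJKLMN", "OPQRSTUV", "WXYZ"]  # Chip's four STRING: events


def append_and_check(buffer, chunk, target):
    """Append `chunk`, then look for `target` only where it could newly appear.

    A new match must end inside the chunk, so it starts no earlier than
    len(target) - 1 characters before it; searching that tail is enough.
    """
    buffer += chunk
    window = len(chunk) + len(target) - 1
    tail = buffer[-window:] if window > 0 else ""  # guard: buffer[-0:] is the whole buffer
    return buffer, target in tail


def first_hit(chunks, target):
    """Return the 1-based event number on which `target` is first found, else None."""
    buffer = ""
    for n, chunk in enumerate(chunks, start=1):
        buffer, hit = append_and_check(buffer, chunk, target)
        if hit:
            return n
    return None


if __name__ == "__main__":
    print(first_hit(CHUNKS, "UVWXYZ"))  # a target that spans two chunks
```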
  • vining X Member Posts: 4,354
    I talked with the instructor at the NYC P3 training today, who informed me that the max string DATA.TEXT can handle at one time is 2048 characters. He also reaffirmed that DATA.TEXT acts like an array and, once it's full, doesn't accept any more data, while, as previously mentioned, a created buffer is a FIFO: if the string is larger than the buffer, data at the beginning of the buffer is pushed out the front door as the new data loads in through the back door.

    This does indeed make DATA.TEXT a poor choice for holding "most" HTTP data for parsing but as Chip suggested would work fine for searching for the string required to trigger parsing the data contained in the created buffer in order to save a few clock cycles.
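The thread reports two different overflow behaviours: new data is dropped once the buffer is full (what Hedberg observed), or the oldest data is pushed out FIFO-style (what the P3 instructors described for CREATE_BUFFER). The Python sketch below models both so the difference is concrete. It does not settle which one a given NetLinx buffer actually uses, and the function names are hypothetical.

```python
def append_drop_newest(buffer, chunk, capacity):
    """Keep the oldest data: incoming bytes that do not fit are discarded."""
    room = max(capacity - len(buffer), 0)
    return buffer + chunk[:room]


def append_drop_oldest(buffer, chunk, capacity):
    """FIFO behaviour: new bytes push the oldest ones out of the front."""
    combined = buffer + chunk
    return combined[-capacity:] if capacity > 0 else ""


if __name__ == "__main__":
    # A 5-character buffer holding "ABCD" receives "EFG".
    print(append_drop_newest("ABCD", "EFG", 5))
    print(append_drop_oldest("ABCD", "EFG", 5))
```

With the first policy a parser looking for an end tag may never see it once the buffer fills; with the second the end tag arrives but the start of the document is gone. Either way, clearing the buffer between requests, as Hedberg suggests, avoids both.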
  • a_riot42 AMX Wizard Posts: 1,619
    vining wrote: »
    This does indeed make DATA.TEXT a poor choice for holding "most" HTTP data for parsing but as Chip suggested would work fine for searching for the string required to trigger parsing the data contained in the created buffer in order to save a few clock cycles.

    I don't follow what the problem with data.text is. It's a buffer like any other, and it's up to the programmer to manipulate it to function correctly, isn't it? If it overflows and data is lost, then that is the programmer's fault.

    Someone said that create_buffer is a FIFO buffer. But data.text is as well if you read it as it comes in. The docs say use either data.text or create_buffer so I wonder if they are implemented exactly the same and create_buffer is just there for legacy reasons.
    Paul