How do you use IP_CLIENT_OPEN to get info from a website?

JOHNBONZ · July 2008

I am trying to go to a website and scrape a page (weather.com, or Yahoo Finance to create a stock ticker), but when I use IP_CLIENT_OPEN and then a GET, I get this error:

SendString 0:0:0 too long 528 truncated to 274.

So I only get bits and pieces of the page I want to scrape. Is this a firmware issue? I am using firmware 2.32.148. (I am using this old firmware because the newer one has Duet, with which my master hangs for about 45 seconds, and I didn't want to wait that long every time I compiled and uploaded.)

DEFINE_START
create_buffer dvWeather, strWeatherBuff;

DEFINE_EVENT
DATA_EVENT[dvWeather]
{
    ONLINE:
    {
        //weather
        SEND_STRING dvWeather,"'GET /forecastrss?p=48316 HTTP/1.0',13,10,13,10"
    }
    STRING:
    {
        SEND_STRING 0,"'STRING strWeatherBuff: ',strWeatherBuff";
    }//string
}

Does anyone have any ideas for getting all the info from the web page without this error?

Marc Scheibein · July 2008

This error comes only from the SEND_STRING to the terminal. You can only send 274 bytes in one string to the terminal.
It has nothing to do with the IP handling itself.

The GET request seems not to be complete.
This is a working string from a router:

SEND_STRING dvIP_CLIENT,"'GET /cgi-bin/dial?rc=@&A=D1&rd=status HTTP/1.1',13,10,13,10,'Host: 192.168.0.254',13,10,13,10"

The Host header is part of the GET, and it has to be the server's IP.
This worked fine on a NXI-ME v2.31, so I don't think it is a Duet issue.

DHawthorne · July 2008

Marc Scheibein wrote: »
SEND_STRING dvIP_CLIENT,"'GET /cgi-bin/dial?rc=@&A=D1&rd=status HTTP/1.1',13,10,13,10,'Host: 192.168.0.254',13,10,13,10"

I'm surprised that works; my understanding was that header lines (Host, etc.) are separated from the GET line by a single CRLF pair, and sending two in a row terminates the request. For example, from a Weatherbug session:

		SEND_STRING dvRSS, "'GET /rss.aspx?zipcode=', sCity.zip,'&feed=curr,fcst&units=0&zcode=z4641 HTTP/1.0',$0d,$0a" 
		SEND_STRING dvRSS, "'Host: feeds.weatherbug.com',$0d,$0a"
		SEND_STRING dvRSS, "'Authorization: Basic',$0d,$0a"
		SEND_STRING dvRSS, "'User-Agent: NetLinx_WEATHERBUG_RSS/',sVersion, $0d,$0a"
		SEND_STRING dvRSS, "$0d,$0a"

I have also found in my experiments with various HTTP hosts that some require a bit of header data and some don't require any. I generally capture the packets from a browser session to see what a browser sends to the site, then start dropping headers until it breaks.

jweather · July 2008

DHawthorne wrote: »

I'm surprised that works; my understanding was that header data (HOST, etc.) were only to be separated from the GET line by a single CRLF pair, and doing two terminates the session.

Dave is correct: the example request Marc gave is no different from the one John gave, since the CRLF+CRLF terminated the request before the Host: header was sent.

Some sites will be happy with just a GET request; many will require a Host: header indicating the *name* of the site, like Host: www.amxforums.com (not the IP address); some need even more. Try using a Firefox extension to snoop the headers your browser is sending, and start with that set. There's no harm in leaving extra ones in even if they aren't needed.
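To make the framing concrete, here is a small Python sketch of the request bytes (Python is used only for illustration; on the master the same bytes go out via SEND_STRING with $0D,$0A). `build_get_request` and `header_lines` are hypothetical helper names and the User-Agent value is a placeholder. It shows that only the first CRLF CRLF ends the header block, which is why the Host line in Marc's string is never read as a header.

```python
def build_get_request(host, path, extra_headers=None):
    # One request line, then one header per line, each ending in a single CRLF.
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # A final empty line (CRLF CRLF overall) ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def header_lines(raw):
    # The server treats only what precedes the first CRLF CRLF as the header.
    head, _, _ = raw.partition(b"\r\n\r\n")
    return head.split(b"\r\n")


req = build_get_request("weather.yahooapis.com", "/forecastrss?p=48316",
                        {"User-Agent": "NetLinx-example"})  # placeholder value

# Marc's string: the blank line right after the GET ends the request,
# so the Host line that follows is never seen as a header.
marc = (b"GET /cgi-bin/dial?rc=@&A=D1&rd=status HTTP/1.1\r\n\r\n"
        b"Host: 192.168.0.254\r\n\r\n")

print(header_lines(req))
print(header_lines(marc))
```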

JOHNBONZ · July 2008

Dave, if I try the code you posted:

SEND_STRING dvRSS, "'GET /rss.aspx?zipcode=', sCity.zip,'&feed=curr,fcst&units=0&zcode=z4641 HTTP/1.0',$0d,$0a"
SEND_STRING dvRSS, "'Host: feeds.weatherbug.com',$0d,$0a"
SEND_STRING dvRSS, "'Authorization: Basic',$0d,$0a"
SEND_STRING dvRSS, "'User-Agent: NetLinx_WEATHERBUG_RSS/',sVersion, $0d,$0a"
SEND_STRING dvRSS, "$0d,$0a"

In the DATA_EVENT STRING handler, will I then NOT see the error I was getting? In other words, will I see a stream of data from the WeatherBug site? I used one SEND_STRING that just had the GET, but you send a GET plus other headers. So, back to my original question: why am I allowed only 274 bytes?

Marc says this:
This error comes only from the SEND_STRING to the terminal. You can only send 274 bytes in one string to the terminal. It has nothing to do with the IP handling itself.

DHawthorne · July 2008

Marc was correct about the error: you will still see it. You simply cannot send a string longer than 274 characters to the debug terminal without getting that message. It's harmless; you just can't see the whole string. What I do in such cases is write the strings to a file on the master and FTP it off to look at what happened.

JOHNBONZ · July 2008

Ahh, that is a good idea. I will write the strings to a file on the master and then read that file back in to parse it.

I will let you all know how this turns out.

Thanks again!!

Chip Moody · July 2008

If you want to save yourself from dealing with writing files, you could wait until the web server has closed the connection, then fire a routine that spits the contents of your buffer variable to 0:0:0 bit by bit, like a nice readable 60 chars at a time, maybe?

The way you have it now, you're going to see a lot of repetition: your buffer is appended to each time you pass through the DATA_EVENT / STRING: handler, and you print the whole buffer every time.
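A rough Python model of Chip's suggestion: once the connection has closed, print the buffer in fixed-size pieces that stay under the terminal's 274-byte limit. `terminal_chunks` is a hypothetical helper; on the master the same loop would walk the buffer with something like MID_STRING and send each piece with SEND_STRING 0.

```python
def terminal_chunks(text, size=60):
    # Slice the text into consecutive pieces of at most `size` characters.
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


page = "x" * 528  # same length as the string in the "too long" message
pieces = terminal_chunks(page)
for piece in pieces:
    print(piece)  # each line stays well under the 274-byte terminal limit
```

Printing only once, after OFFLINE, also avoids the repetition: each byte is shown exactly once.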

- Chip

JOHNBONZ · July 2008

OK, forgive me, but now I am a little confused. Here is what I execute:

volatile char strWeatherBuff[8192];

1) IP_CLIENT_OPEN(dvWeather.PORT,'weather.yahooapis.com',80,IP_TCP); //establish connection on the same port the DATA_EVENT watches

Then the connection comes online and goes to the ONLINE: handler:

2)
ONLINE:
{
SEND_STRING dvWeather,"'GET /forecastrss?p=48316 HTTP/1.0',13,10,13,10" //get info
}

3)
STRING:
{
SEND_STRING 0,"'STRING strWeatherBuff: ',strWeatherBuff"; //print out buffer
}//string
}

If this web page had, say, 2000 bytes of info on it, will the GET command retrieve all of that data at once?
If so, will I have all of it in the buffer strWeatherBuff (if its max size is 8K)?
Is the only problem the SEND_STRING 0, which goes to the terminal?

If strWeatherBuff has all 2000 bytes, then I can parse it out, but based on the above I am not sure it will be loaded with all the data.

DHawthorne · July 2008

Yes, the GET command will retrieve it all at once - the HTTP server, at this point, will be acting just as if you were a web browser requesting a page. You have to make sure your buffer can hold it all, but if it's big enough, you will just have to parse it (which is an adventure all its own). Depending on what kind of header information the server supports, you may be able to get it in various formats or even sizes (for example, if they support a mobile browser version).

Hedberg · July 2008

Also, don't forget to clear your buffer. Otherwise, after a while, it may start looking a little familiar. When a Netlinx buffer fills up, it ignores further input, rather than discarding data at the beginning of the buffer the way Axcess did. That's my understanding, anyway.

DHawthorne · July 2008

Hedberg wrote: »

Also, don't forget to clear your buffer. Otherwise, after a while, it may start looking a little familiar. When a Netlinx buffer fills up, it ignores further input, rather than discarding data at the beginning of the buffer the way Axcess did. That's my understanding, anyway.

Yes, it does behave like that. I generally remove each chunk of data as I parse it, as a matter of course; it didn't occur to me to mention it.

Chip Moody · August 2008

Don't forget that this doesn't mean the retrieved data will ARRIVE via DATA_EVENT/STRING: all at once. Your DATA_EVENT/STRING: handler can and will trigger any number of times as data comes in. So my suggestion would be:

Clear your buffer
Connect to the server
Issue the GET
Wait...
Don't bother looking at the contents of the buffer until AFTER you get the OFFLINE event from the TCP/IP connection.

If the server doesn't automatically disconnect after sending all the data, you could check the contents of DATA.TEXT on each DATA_EVENT/STRING: event for something like "/HTML", or anything else guaranteed to be at the end of the expected data, then use that as a cue to close the TCP/IP connection from your side. If there's no tag guaranteed to sit at the end of the data (rare), you could check the initial HTTP header as it comes in for a size field (Content-Length), use it to calculate how large you expect the buffer to get, and close the connection when it reaches that size. (Don't forget a timeout, to be safe.) The size represents the number of bytes in the buffer PAST the end of the HTTP header, i.e., after the first occurrence of a double carriage return/linefeed pair.
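Chip's size-based cue maps onto the standard HTTP Content-Length header. The Python sketch below assumes a plain HTTP/1.0 response (no chunked transfer encoding) and uses hypothetical helper names: it splits the header from the body at the first CRLF CRLF and compares the body length against Content-Length. When that header is missing it returns False, leaving OFFLINE (or the timeout) as the cue.

```python
def split_http_response(raw):
    # The header ends at the first blank line (CR LF CR LF).
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return None  # header has not fully arrived yet
    return head.decode("iso-8859-1"), body


def content_length(header_text):
    # Skip the status line, then look for Content-Length (case-insensitive).
    for line in header_text.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            return int(value.strip())
    return None


def response_complete(raw):
    parsed = split_http_response(raw)
    if parsed is None:
        return False
    head, body = parsed
    expected = content_length(head)
    # No Content-Length: fall back to waiting for OFFLINE or a timeout.
    return expected is not None and len(body) >= expected


resp = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello"
print(response_complete(resp[:-1]), response_complete(resp))  # False True
```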

- Chip


vining · August 2008

Chip Moody wrote:

you could check the contents of DATA.TEXT on each DATA_EVENT/STRING: event for something like "/HTML"

I don't think I would use DATA.TEXT for this; I'd use a created buffer instead. Isn't there some limit on the length of strings that can be retrieved via DATA.TEXT? I like DATA.TEXT for virtual devices, but real devices should get a buffer, IMHO.

I would also suggest not using the STRING data handler at all, and just putting a line in DEFINE_PROGRAM like:

IF(FIND_STRING(cBuffer,'</HTML>',1))
     {
     fnParseMyBuffer();
     }

When you get your ending tag, or whatever else determines that a full string has been delivered, then parse it.

DHawthorne · August 2008

vining wrote: »
When you get your ending tag or whatever else determines a full string has been delivered, then parse it.

I entirely ignore the STRING event handler for something like this. I create a buffer on the port, then parse the buffer on the OFFLINE handler, using a WHILE(FIND_STRING(sBuffer, "'>'", 1)) to break it into manageable chunks.
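Dave's OFFLINE-handler loop, modeled in Python for illustration. `drain_tags` is a hypothetical name; each pass removes text up to and including the next '>', which is roughly what REMOVE_STRING(sBuffer, '>', 1) does inside WHILE(FIND_STRING(sBuffer, '>', 1)). Anything after the last '>' stays in the buffer, which also keeps the buffer from filling up with already-parsed data.

```python
def drain_tags(buffer):
    # Repeatedly remove text up to and including the next '>' while one exists.
    chunks = []
    while ">" in buffer:
        end = buffer.index(">") + 1
        chunks.append(buffer[:end])
        buffer = buffer[end:]
    return chunks, buffer  # leftover text after the last '>'


chunks, rest = drain_tags("<item><title>Sunny</title></item>\r\n")
print(chunks)      # ['<item>', '<title>', 'Sunny</title>', '</item>']
print(repr(rest))  # '\r\n'
```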

Chip Moody · August 2008

Personally, I think putting a FIND_STRING test on a potentially large buffer full of characters in MAINLINE is insane and just a plain bad idea that needlessly chews up clock cycles - but to each their own...

Yes - normally there's not much need to do things in DATA_EVENT / STRING: besides calling your parsing routine, but here's the thinking behind what I suggested:

If you've got a buffer - a potentially large one if it's getting data from a web server - where the STRING: handler is getting hit many times as the buffer fills, AND you're looking for something that is going to appear at the END of that buffer to indicate that you've got your complete document - WHY would you want to call a routine every time that does a FIND_STRING for that data, and looks in that potentially large buffer for it?

With every hit on the STRING: handler, DATA.TEXT contains the new block of data that is being appended to the buffer - not the entire contents of the buffer itself. On subsequent passes, DATA.TEXT is going to be different every time - it gets replaced with JUST the data that caused the event.

The point is, after the first triggering of the STRING: handler, your buffer grows larger, while DATA.TEXT will ALWAYS be smaller in size than the buffer.

Which one will be more efficient to search for your target string in?

- Chip

Hedberg · August 2008

Chip Moody wrote: »

Which one will be more efficient to search for your target string in?

- Chip

Well, the buffer is guaranteed to contain the string (assuming it has arrived and the buffer is sufficiently large), while data.text is not. So I guess it depends on how you define "efficient." My WAG is that waiting for the connection to be closed and then initiating a search from the OFFLINE handler (as DH suggests) probably saves at least a couple of ms. It doesn't take very long to search for something in RAM.

vining · August 2008

DHawthorne wrote:

then parse the buffer on the OFFLINE handler,

Ya know, that's a pretty darn good idea on connections that will automatically be closed by the server when it's done sending. What's the delay like before the OFFLINE event is triggered? Obviously most connections of this type aren't time sensitive, so a small delay wouldn't be objectionable.

Chip Moody wrote:

DATA.TEXT contains the new block of data that is being appended to the buffer - not the entire contents of the buffer itself. On subsequent passes, DATA.TEXT is going to be different every time - it gets replaced with JUST the data that caused the event.

I hear what you're saying and I like your logic.

Question: since I'm not a big DATA.TEXT user, and I recall issues with the length of strings it can handle, is it possible that using DATA.TEXT might miss the ending tag you're looking for, or other chunks of the strings?

Chip Moody · August 2008

Hedberg wrote: »

well, the buffer is guaranteed to contain the string (assuming it has arrived and the buffer is sufficiently large) while data.text is not.

DATA.TEXT will - over the course of data coming in from a source - RS232 or IP - contain everything that winds up in the buffer at one point or another. Have you been told otherwise?

This is way over-simplifying, but if a remote device sends "ABCDEFGHIJKLMNOPQRSTUVWXYZ", it could come in in smaller chunks via the Netlinx OS and cause multiple triggers of the STRING: handler. For argument's sake, let's say it comes in causing the STRING event to trigger four times.

If you cleared your buffer before the start of data coming in, your buffer (and we're talking about a CREATE_BUFFER buffer here) will be like this when all is said and done:

MyBuff = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

WHILE the data coming in is gathered by the OS/hardware, however - DATA.TEXT will show you what's coming in on each trigger of the STRING: handler, like this for example:

STRING: handler hit #1 - DATA.TEXT = "ABCDEFGH"
STRING: handler hit #2 - DATA.TEXT = "IJKLMN"
STRING: handler hit #3 - DATA.TEXT = "OPQRSTUV"
STRING: handler hit #4 - DATA.TEXT = "WXYZ"

So if you want to set up your code to look for "XYZ" as a cue for "okay, I got everything I expected", having each hit on the STRING: handler look in DATA.TEXT (which in this case is never more than 8 characters long) for "XYZ" is more efficient than doing this each time:

STRING: handler hit #1 - MyBuff = "ABCDEFGH" No match for "XYZ".
STRING: handler hit #2 - MyBuff = "ABCDEFGHIJKLMN" No match for "XYZ".
STRING: handler hit #3 - MyBuff = "ABCDEFGHIJKLMNOPQRSTUV" No match for "XYZ".
STRING: handler hit #4 - MyBuff = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" Found "XYZ" - we're done!

Hedberg wrote: »

So, I guess it depends on how you define "efficient." My wag is that waiting for the connection to be closed and then initiating a search from the offline handler (as DH suggests) probably saves at least a couple ms. It doesn't take very long to search for something in RAM.

Absolutely - if you look a few posts back you'll see that's what I suggested as well. I mentioned monitoring DATA.TEXT for some kind of end tag only for the rare instances when a server may not disconnect after sending its data...

- Chip

Chip Moody · August 2008

vining wrote: »

Ya know, that's a pretty darn good idea on connections that will automatically be closed by the server when it's done sending. What's the delay like before the OFFLINE event is triggered? Obviously most connections of this type aren't time sensitive, so a small delay wouldn't be objectionable.

I've never set up stopwatch code to test it, but it's seemed pretty instantaneous to me...

vining wrote: »

I hear what you're saying and I like your logic.

Question: since I'm not a big DATA.TEXT user, and I recall issues with the length of strings it can handle, is it possible that using DATA.TEXT might miss the ending tag you're looking for, or other chunks of the strings?

I have always been told by AMX employees (training, tech support, programmers, etc) that whatever comes in to your buffer will be trapped by DATA.TEXT at one point during the process, and no - it doesn't drop characters. On the flip side to that, I've never experienced otherwise, and I've never been told by anyone at AMX to expect behavior other than that from DATA.TEXT.

By the same token, remember that one method of building your buffer (if you didn't declare it with CREATE_BUFFER) is to do MyBuff = "MyBuff,DATA.TEXT" in your STRING: handler. If DATA.TEXT couldn't be relied on to see every byte of data that comes in, you wouldn't be able to do that. (And then there wouldn't be much purpose to having it in the first place!)

- Chip

ericmedley · August 2008

Chip Moody wrote: »

I have always been told by AMX employees (training, tech support, programmers, etc) that whatever comes in to your buffer will be trapped by DATA.TEXT at one point during the process, and no - it doesn't drop characters.

There is one small issue that can cause problems with DATA.TEXT when it comes to web pages: there is a limit to its size. DATA.TEXT is limited to 16K, so web pages that go over this size can cause problems.

Also, the behavior of D.T is somewhat different as well.

When a CREATE_BUFFER buffer is full, it's FIFO: the data coming in pushes the oldest data out. DATA.TEXT, once it's full, is just full, and it ignores anything over the 16K.

One other thing, which I can't verify but have experienced:

If the data coming in is a little stuttery, I've found that the DATA_EVENT will fire if it thinks the message is over. However, CREATE_BUFFER seems to be a bit more patient and seems to wait a bit longer to see if the entire message is done.

Hedberg · August 2008

Chip Moody wrote: »

DATA.TEXT will - over the course of data coming in from a source - RS232 or IP - contain everything that winds up in the buffer at one point or another. Have you been told otherwise?

No, and I didn't suggest otherwise. As you go on to explain, data.text is not guaranteed to contain any particular string, and that's all I said. Look at your explanation and consider the case where the string you are looking for is 'UVWXYZ'. If the data comes in to data.text as in the example, you're screwed. I don't know of any reason why data.text is guaranteed to contain any two contiguous characters so I don't see that it's guaranteed to ever contain 'XYZ' either.
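Hedberg's point can be checked with Chip's own chunk sizes. In this Python simulation (the chunk boundaries are the made-up ones from Chip's example, one loop pass per STRING: event), searching each chunk on its own never finds 'UVWXYZ', while searching the accumulated buffer does.

```python
chunks = ["ABCDEFGH", "IJKLMN", "OPQRSTUV", "WXYZ"]  # Chip's example split
target = "UVWXYZ"

buffer = ""
found_in_chunk = False
found_in_buffer = False
for chunk in chunks:  # one iteration per STRING: event
    buffer += chunk  # what the created buffer accumulates
    found_in_chunk = found_in_chunk or target in chunk  # DATA.TEXT-only search
    found_in_buffer = found_in_buffer or target in buffer

print(found_in_chunk, found_in_buffer)  # False True
```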

Hedberg · August 2008

ericmedley wrote: »

Also, the behavior of D.T is somewhat different as well.

When data from CREATE_BUFFER is full it's FIFO. So the data coming pushes the first stuff out. In DATA.TEXT, once it's full, it's just full and it ignores anything over the 16K.

My understanding is that in Netlinx, when any buffer is full, any data that the program attempts to add is discarded. This is different from Axcess, where data was added to the end of the buffer and discarded from the beginning.
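The thread disagrees about which overflow behavior applies to which kind of buffer, so this Python sketch does not try to settle that; it only models the two policies being described (hypothetical function names) so their effect on a parser is easy to see. With drop-new, the end of a long page, where a closing tag would be, is what gets lost; with FIFO, it is the beginning, where the HTTP header is.

```python
def append_drop_new(buffer, data, capacity):
    # Policy A: keep what is already stored; discard incoming bytes that don't fit.
    room = max(capacity - len(buffer), 0)
    return buffer + data[:room]


def append_fifo(buffer, data, capacity):
    # Policy B: append, then discard the oldest bytes beyond capacity.
    combined = buffer + data
    return combined[max(len(combined) - capacity, 0):]


print(append_drop_new("ABCD", "EFGH", 6))  # ABCDEF
print(append_fifo("ABCD", "EFGH", 6))      # CDEFGH
```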

ericmedley · August 2008

Hedberg wrote: »

My understanding is that in Netlinx, when any buffer is full, any data that the program attempts to add is discarded. This is different from Axcess, where data was added to the end of the buffer and discarded from the beginning.

That's one of the discussions we had at this week's PROG III class in Dallas. The information I related was from Nick. My previous understanding was the same as yours. Axcess was FIFO / Netlinx was nothing-after-full.

Perhaps I will run a quick test of the two when I get a chance. It would be helpful to know.

Hedberg · August 2008

ericmedley wrote: »

That's one of the discussions we had at this week's PROG III class in Dallas. The information I related was from Nick. My previous understanding was the same as yours. Axcess was FIFO / Netlinx was nothing-after-full.

I've experienced Netlinx buffers filling up with trash put out by a couple different devices (one a Polycom Vortex mixer and one a ClearOne mixer). It looks to me like the Netlinx buffers fill up and stop accepting new bytes. But, if Nick says otherwise, it's probably worth a re-check. He knows a heck of a lot more than I do.

shr00m-dew · August 2008

Nick pretty much said that this was one benefit to CREATE_BUFFER, it behaves like Axcess did. He also said he didn't know why there was an aversion to CREATE_BUFFER as there's no real downside and more things get handled for you.

Kevin D.

vining · August 2008

shr00m-dew wrote:

He also said he didn't know why there was an aversion to CREATE_BUFFER

I don't know if I would say there's an aversion to creating a buffer per se. I just think most folks find it easier to use DATA.TEXT, but more so I think it's because that's what they see in AMX modules between their UI and Com programs, and if that's how AMX does it, it's got to be the better way.

I don't know why or how this happened, but in my mind I have it that DATA.TEXT is used in modules because comms to virtuals are usually short strings and their timing is a known constant. They always send a complete command or string and always trigger a single STRING or COMMAND event per sent string or command. I also have in my head that these strings aren't reliable above, say, 256 characters, sort of like sending text to a variable text address. Like I said, I don't know how I got this stuck in my head, but it's been in there for a long time, and I think it's been reinforced over the years by someone somewhere. I either read it or heard it, or maybe I just made it up; I don't know. So I've always created buffers except when talking to virtuals.

With the exception of using DATA.TEXT as Chip suggested, to keep down the length of the string my FIND_STRING has to search through, I don't see a reason to change anything. But I am interested to see if anyone else has been brainwashed into an aversion to DATA.TEXT like I have.

Hedberg · August 2008

shr00m-dew wrote: »

Nick pretty much said that this was one benefit to CREATE_BUFFER, it behaves like Axcess did. He also said he didn't know why there was an aversion to CREATE_BUFFER as there's no real downside and more things get handled for you.

Kevin D.

Back in 2003, they told us in Programmer 2 that Netlinx buffers were different, but it sure looks like a buffer tied directly to a device with create_buffer works as it did in Axcess. I always thought that create_buffer was an anachronism without a good reason to be used; clearly, I was wrong.

Chip Moody · August 2008

Hedberg wrote: »

No, and I didn't suggest otherwise. As you go on to explain, data.text is not guaranteed to contain any particular string, and that's all I said. Look at your explanation and consider the case where the string you are looking for is 'UVWXYZ'. If the data comes in to data.text as in the example, you're screwed. I don't know of any reason why data.text is guaranteed to contain any two contiguous characters so I don't see that it's guaranteed to ever contain 'XYZ' either.

Ack. What he said.

Brainfart here: I was carrying over the idea of a single-byte "delimiter" to be searched for, and obviously (well, apparently not obvious to me) that goes right out the window when you move to multi-character delimiters, since the delimiter can be split across two DATA.TEXT chunks.

My bad...

So maybe, in an effort to make a search like this more efficient on a potentially large buffer, it could be constrained to just the last "x" bytes of the buffer using RIGHT_STRING. I'm just thinking out loud here...
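Chip's RIGHT_STRING idea works as long as the window is wide enough to catch a delimiter that straddles two chunks. Below is a Python sketch of one way to size it (`terminator_arrived` is a hypothetical name; on the master the window would be something like RIGHT_STRING of the buffer, with the length derived from LENGTH_STRING(DATA.TEXT)), run against Chip's example split.

```python
def terminator_arrived(buffer, new_len, delim):
    # A match completed by the newest chunk must start no earlier than
    # len(delim) - 1 characters before that chunk, so only the last
    # new_len + len(delim) - 1 characters need to be searched.
    if not delim:
        raise ValueError("delim must not be empty")
    if new_len <= 0:
        return False
    window = new_len + len(delim) - 1
    return delim in buffer[-window:]


buffer = ""
hits = []
for chunk in ["ABCDEFGH", "IJKLMN", "OPQRSTUV", "WXYZ"]:
    buffer += chunk
    hits.append(terminator_arrived(buffer, len(chunk), "UVWXYZ"))
print(hits)  # [False, False, False, True]
```

Unlike searching DATA.TEXT alone, this window still finds 'UVWXYZ' even though it arrives split across the third and fourth chunks.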

- Chip

vining · August 2008

I talked today with the instructor at the NYC P3 training, who informed me that the max string that DATA.TEXT can handle at one time is 2048. He also reaffirmed that DATA.TEXT acts like an array: once it's full, it doesn't accept any more data, while, as previously mentioned, a created buffer is FIFO, so if the incoming data is larger than the buffer, data at the beginning of the buffer is pushed out the front door as the new data loads in through the back door.

This does indeed make DATA.TEXT a poor choice for holding "most" HTTP data for parsing but as Chip suggested would work fine for searching for the string required to trigger parsing the data contained in the created buffer in order to save a few clock cycles.

a_riot42 · August 2008

vining wrote: »

This does indeed make DATA.TEXT a poor choice for holding "most" HTTP data for parsing but as Chip suggested would work fine for searching for the string required to trigger parsing the data contained in the created buffer in order to save a few clock cycles.

I don't follow what the problem with data.text is. It's a buffer like any other, and it's up to the programmer to manipulate it to function correctly, isn't it? If it overflows and data is lost, then that is the programmer's fault.

Someone said that create_buffer is a FIFO buffer. But data.text is as well if you read it as it comes in. The docs say to use either data.text or create_buffer, so I wonder if they are implemented exactly the same and create_buffer is just there for legacy reasons.
Paul
