How do you use IP_CLIENT_OPEN to get info from a website?
JOHNBONZ
Posts: 99
I am trying to go to a website to scrape a web page, be it weather.com or Yahoo Finance to create a stock ticker, but when I use IP_CLIENT_OPEN and then a GET, I get this error:
SendString 0:0:0 too long 528 truncated to 274.
So I just get bits and pieces of the web page I want to scrape. Is this a firmware issue, since I am using firmware 2.32.148? (I am using this old firmware because the newer one has DUET, with which my Master hangs for about 45 seconds, and I didn't want to wait that long each time I compiled and uploaded.)
DEFINE_START
create_buffer dvWeather, strWeatherBuff;

DEFINE_EVENT
DATA_EVENT[dvWeather]
{
    ONLINE:
    {
        //weather
        SEND_STRING dvWeather,"'GET /forecastrss?p=48316 HTTP/1.0',13,10,13,10"
    }
    STRING:
    {
        SEND_STRING 0,"'STRING strWeatherBuff ',strWeatherBuff";
    }//string
}
Anyone have any ideas to get all the info from the webpage without this error?
Comments
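This error is from the SEND_STRING to terminal only. You can only send 274 bytes in one string to terminal. It has nothing to do with the IP handling itself.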
The GET request seems not to be complete.
This is a working string from a router: The host is part of the GET, and it has to be the server's IP.
This worked fine on a NXI-ME v2.31, so I don't think it is a Duet issue.
I'm surprised that works; my understanding was that header data (HOST, etc.) were only to be separated from the GET line by a single CRLF pair, and doing two terminates the session. For example, from a Weatherbug session:
I have also found in my experiments with various HTTP hosts that some require a bit of header data, some don't require any. I generally capture the packets from a browser session to see what a browser sends to the site, then start dropping stuff out until it breaks.
Dave is correct, the example request that Marc gave is no different from the example request that John gave, since the CRLF+CRLF terminated the request before the Host: header was sent.
Some sites will be happy with just a GET request, many will require a Host: header indicating the *name* of the site, like Host: www.amxforums.com (not the IP address), and some need even more. Try using a Firefox extension to snoop the headers your browser is sending, and start with that set. No harm in leaving extra ones in there even if they aren't needed.
SEND_STRING dvRSS, "'GET /rss.aspx?zipcode=', sCity.zip,'&feed=curr,fcst&units=0&zcode=z4641 HTTP/1.0',$0d,$0a"
SEND_STRING dvRSS, "'Host: feeds.weatherbug.com',$0d,$0a"
SEND_STRING dvRSS, "'Authorization: Basic',$0d,$0a"
SEND_STRING dvRSS, "'User-Agent: NetLinx_WEATHERBUG_RSS/',sVersion, $0d,$0a"
SEND_STRING dvRSS, "$0d,$0a"
in the data_event, STRING HANDLER, I will NOT see the error I was getting? In other words, I will see a stream of data from the WeatherBug site? I used one SEND_STRING that just had the GET, but you use a GET and others. So back to my original question: why am I allowed only 274 bytes?
Marc says this:
This error is from the SEND_STRING to terminal only. You only can send 274 bytes in one string to terminal. Has nothing to do with the IP handling itself.
I will let you all know how this turns out.
Thanks again!!
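Since the truncation comes from SEND_STRING 0 rather than from the IP connection, one way to view a long buffer on the terminal is to print it in slices that stay under the limit. This is only a minimal sketch: the helper name fnPrintLong and the 100-byte slice size are assumptions for illustration and do not come from the thread.

```netlinx
// Hypothetical helper: prints a long string to the terminal (device 0)
// in slices, so no single SEND_STRING 0 exceeds the terminal limit.
DEFINE_FUNCTION fnPrintLong(CHAR sText[])
{
    STACK_VAR INTEGER nPos
    STACK_VAR INTEGER nLen
    STACK_VAR INTEGER nCount

    nLen = LENGTH_STRING(sText)
    nPos = 1
    WHILE (nPos <= nLen)
    {
        // bytes remaining from nPos to the end, capped at the slice size
        nCount = nLen - nPos + 1
        IF (nCount > 100)
        {
            nCount = 100
        }
        // MID_STRING copies the slice; the source buffer is not modified
        SEND_STRING 0, "MID_STRING(sText, nPos, nCount)"
        nPos = nPos + nCount
    }
}
```

Calling fnPrintLong(strWeatherBuff) in place of the single SEND_STRING 0 line should show the whole buffer, just spread over several terminal lines.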
The way you have it now, you're going to see a lot of repetition, since your buffer is being appended to each time you pass through the DATA_EVENT / STRING: handler.
- Chip
volatile char strWeatherBuff[8192];
1) IP_CLIENT_OPEN(dvStocks.PORT,'weather.yahooapis.com',80,IP_TCP); //establish connection
Then the connection comes online and goes to the ONLINE: handler.
2)
ONLINE:
{
    SEND_STRING dvWeather,"'GET /forecastrss?p=48316 HTTP/1.0',13,10,13,10" //get info
}
3)
STRING:
{
    SEND_STRING 0,"'STRING strWeatherBuff ',strWeatherBuff"; //print out buffer
}//string
}
If I went to this webpage and there were, say, 2000 bytes of info on this page, will the GET command retrieve all this data at once?
If so, will I have all this info in the buffer strWeatherBuff (if max size is 8k)?
Is the only problem the SEND_STRING 0, which is the terminal?
If strWeatherBuff has all 2000 bytes then I can parse it out, but I am not sure it will be loaded with all the data based on the above.
Yes, it does behave like that. I generally remove each chunk of data as I parse as a matter of course; it didn't occur to me to mention it.
Clear your buffer
Connect to the server
Issue the GET
Wait...
Don't bother looking at the contents of the buffer until AFTER you get the OFFLINE event from the TCP/IP connection.
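The steps above could be sketched roughly as follows. This is an illustration, not tested against any particular server: the device address 0:3:0 and the function name fnRequestWeather are assumptions, and the Host: line is included only because later replies suggest many servers want one.

```netlinx
DEFINE_DEVICE
dvWeather = 0:3:0                  // assumed local IP port for this sketch

DEFINE_VARIABLE
VOLATILE CHAR strWeatherBuff[8192]

// Hypothetical entry point: call this whenever a fresh copy is wanted
DEFINE_FUNCTION fnRequestWeather()
{
    CLEAR_BUFFER strWeatherBuff    // 1) clear the buffer
    // 2) connect to the server; ONLINE fires once the socket is up
    IP_CLIENT_OPEN(dvWeather.PORT, 'weather.yahooapis.com', 80, IP_TCP)
}

DEFINE_START
CREATE_BUFFER dvWeather, strWeatherBuff

DEFINE_EVENT
DATA_EVENT[dvWeather]
{
    ONLINE:
    {
        // 3) issue the GET; the request ends at the first empty line,
        //    so headers go between the GET line and the final CRLF
        SEND_STRING dvWeather, "'GET /forecastrss?p=48316 HTTP/1.0',13,10"
        SEND_STRING dvWeather, "'Host: weather.yahooapis.com',13,10"
        SEND_STRING dvWeather, "13,10"
    }
    OFFLINE:
    {
        // 4) the server has closed the connection, so the buffer should
        //    now hold the complete response; parse it from here
        SEND_STRING 0, "'received ', ITOA(LENGTH_STRING(strWeatherBuff)), ' bytes'"
    }
}
```

Nothing is read from the buffer in the STRING: handler at all; CREATE_BUFFER fills strWeatherBuff in the background until the OFFLINE event arrives.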
If the server doesn't automatically disconnect after sending all the data, you could check the contents of DATA.TEXT on each DATA_EVENT/STRING: event for something like "/HTML" or something guaranteed to be at the end of the expected data, then use that as a cue to disconnect the TCP/IP connection from your side. If there's no absolute tag that sits at the end of the data, (rare) you could check the initial HTTP header as it comes in for a data size tag - use that to calculate how large you expect the buffer to get and close the connection when it gets to that size. (Don't forget a timeout to be safe) The size will represent the number of bytes in the buffer PAST the end of the HTTP header, I.E., after the first occurrence of a double carriage return/linefeed pair.
- Chip
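The size-tag idea could look something like the sketch below. It assumes the server sends a header spelled exactly 'Content-Length:' (FIND_STRING is case-sensitive, and some servers may spell it differently or omit it), and the function name fnBodyComplete is hypothetical.

```netlinx
// Hypothetical check: returns 1 once the buffer holds as many body bytes
// as the Content-Length header announced, otherwise 0.
DEFINE_FUNCTION INTEGER fnBodyComplete(CHAR sBuf[])
{
    STACK_VAR INTEGER nHdrEnd
    STACK_VAR INTEGER nTag
    STACK_VAR INTEGER nExpected
    STACK_VAR INTEGER nBodyLen

    // the header ends at the first CR,LF,CR,LF sequence
    nHdrEnd = FIND_STRING(sBuf, "13,10,13,10", 1)
    IF (nHdrEnd == 0)
    {
        RETURN 0                   // header not fully received yet
    }

    nTag = FIND_STRING(sBuf, 'Content-Length:', 1)
    IF ((nTag == 0) || (nTag > nHdrEnd))
    {
        RETURN 0                   // no size tag inside the header
    }

    // ATOI picks up the first number after the tag (INTEGER range only)
    nExpected = ATOI(MID_STRING(sBuf, nTag, nHdrEnd - nTag))

    // body bytes are everything past the 4-byte blank-line sequence
    nBodyLen = LENGTH_STRING(sBuf) - (nHdrEnd + 3)

    IF (nBodyLen >= nExpected)
    {
        RETURN 1
    }
    RETURN 0
}
```

In the STRING: handler, a line such as IF (fnBodyComplete(strWeatherBuff)) { IP_CLIENT_CLOSE(dvWeather.PORT) } would then close the connection from the NetLinx side; a named WAIT that closes the port after a few seconds could serve as the safety timeout.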
I would also suggest not using the String Data handler either and just put a line in define_program like: When you get your ending tag or whatever else determines a full string has been delivered, then parse it.
I entirely ignore the STRING event handler for something like this. I create a buffer on the port, then parse the buffer on the OFFLINE handler, using a WHILE(FIND_STRING(sBuffer, "'>'", 1)) to break it into manageable chunks.
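A rough sketch of that chunking loop, under the assumption that each chunk fits in 1024 bytes (longer chunks would be truncated on assignment); the function name fnParseChunks is hypothetical and the per-chunk handling is left as a placeholder.

```netlinx
// Hypothetical parser: called from the OFFLINE handler once the
// connection has closed and the buffer holds the whole response.
DEFINE_FUNCTION fnParseChunks(CHAR sBuffer[])
{
    STACK_VAR CHAR sChunk[1024]

    WHILE (FIND_STRING(sBuffer, "'>'", 1))
    {
        // REMOVE_STRING returns everything up to and including the
        // first '>' and deletes it from sBuffer, so the buffer
        // shrinks on every pass and the loop terminates
        sChunk = REMOVE_STRING(sBuffer, "'>'", 1)

        // placeholder: inspect sChunk here for the tags of interest
        SEND_STRING 0, "'chunk of ', ITOA(LENGTH_STRING(sChunk)), ' bytes'"
    }
}
```

Because each pass removes the chunk it just handled, the buffer ends up holding only whatever followed the last '>'.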
Yes - normally there's not much need to do things in DATA_EVENT / STRING: besides calling your parsing routine, but here's the thinking behind what I suggested:
If you've got a buffer - a potentially large one if it's getting data from a web server - where the STRING: handler is getting hit many times as the buffer fills, AND you're looking for something that is going to appear at the END of that buffer to indicate that you've got your complete document - WHY would you want to call a routine every time that does a FIND_STRING for that data, and looks in that potentially large buffer for it?
With every hit on the STRING: handler, DATA.TEXT contains the new block of data that is being appended to the buffer - not the entire contents of the buffer itself. On subsequent passes, DATA.TEXT is going to be different every time - it gets replaced with JUST the data that caused the event.
The point is, after the first triggering of the STRING: handler, your buffer grows larger, while DATA.TEXT will ALWAYS be smaller in size than the buffer.
Which one will be more efficient to search for your target string in?
- Chip
Well, the buffer is guaranteed to contain the string (assuming it has arrived and the buffer is sufficiently large) while data.text is not. So, I guess it depends on how you define "efficient." My WAG is that waiting for the connection to be closed and then initiating a search from the offline handler (as DH suggests) probably saves at least a couple ms. It doesn't take very long to search for something in RAM.
Chip Moody wrote: I hear what you're saying and I like your logic.
Question. Since I'm not a big DATA.TEXT user and I recall issues with the length of strings it can handle, is it possible that using DATA.TEXT might miss the ending tag you're looking for or other chunks of the strings?
DATA.TEXT will - over the course of data coming in from a source - RS232 or IP - contain everything that winds up in the buffer at one point or another. Have you been told otherwise?
This is way over-simplifying, but if a remote device sends "ABCDEFGHIJKLMNOPQRSTUVWXYZ", it could come in smaller chunks via the Netlinx OS and cause multiple triggers of the STRING: handler. For argument's sake, let's say it comes in causing the STRING event to trigger four times.
If you cleared your buffer before the start of data coming in, your buffer (and we're talking about a CREATE_BUFFER buffer here) will be like this when all is said and done:
MyBuff = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
WHILE the data coming in is gathered by the OS/hardware, however - DATA.TEXT will show you what's coming in on each trigger of the STRING: handler, like this for example:
STRING: handler hit #1 - DATA.TEXT = "ABCDEFGH"
STRING: handler hit #2 - DATA.TEXT = "IJKLMN"
STRING: handler hit #3 - DATA.TEXT = "OPQRSTUV"
STRING: handler hit #4 - DATA.TEXT = "WXYZ"
So if you want to set up your code to look for "XYZ" as a cue for "okay, I got everything I expected", having each hit on the STRING: handler look in DATA.TEXT (which in this case is never more than 8 characters long) for "XYZ" is more efficient than doing this each time:
STRING: handler hit #1 - MyBuff = "ABCDEFGH" No match for "XYZ".
STRING: handler hit #2 - MyBuff = "ABCDEFGHIJKLMN" No match for "XYZ".
STRING: handler hit #3 - MyBuff = "ABCDEFGHIJKLMNOPQRSTUV" No match for "XYZ".
STRING: handler hit #4 - MyBuff = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" Found "XYZ" - we're done!
Absolutely - if you look a few posts back you'll see that's what I suggested as well. I mentioned the monitoring of DATA.TEXT for some kind of end tag in the rare instances that a server may not disconnect after sending its data...
- Chip
I've never set up stopwatch code to test it, but it's seemed pretty instantaneous to me...
I have always been told by AMX employees (training, tech support, programmers, etc) that whatever comes in to your buffer will be trapped by DATA.TEXT at one point during the process, and no - it doesn't drop characters. On the flip side to that, I've never experienced otherwise, and I've never been told by anyone at AMX to expect behavior other than that from DATA.TEXT.
On a similar token, remember that one method of building your buffer (if you didn't declare it in CREATE_BUFFER) is to do MyBuff = "MyBuff,DATA.TEXT" in your STRING: handler. If DATA.TEXT couldn't be relied on to see every byte of data that comes in, you wouldn't be able to do that. (And then there wouldn't be much purpose to have it in the first place!)
- Chip
There is one small issue with DATA.TEXT when it comes to web pages: DATA.TEXT is limited to 16K, so web pages that go over this size can cause problems.
Also, the behavior of DATA.TEXT is somewhat different as well.
When a CREATE_BUFFER buffer is full, it's FIFO, so the data coming in pushes the first stuff out. With DATA.TEXT, once it's full, it's just full and it ignores anything over the 16K.
One other thing that is nothing I can verify, but have experienced...
If the data coming in is a little stuttery, I've found that the DATA_EVENT will fire if it thinks the message is over. However, CREATE_BUFFER seems to be a bit more patient and seems to wait a bit longer to see if the entire message is done.
No, and I didn't suggest otherwise. As you go on to explain, data.text is not guaranteed to contain any particular string, and that's all I said. Look at your explanation and consider the case where the string you are looking for is 'UVWXYZ'. If the data comes in to data.text as in the example, you're screwed. I don't know of any reason why data.text is guaranteed to contain any two contiguous characters so I don't see that it's guaranteed to ever contain 'XYZ' either.
My understanding is that in Netlinx, when any buffer is full, any data that the program attempts to add is discarded. This is different from Axcess where data was added to the end of the buffer and discarded from the beginning.
That's one of the discussions we had at this week's PROG III class in Dallas. The information I related was from Nick. My previous understanding was the same as yours. Axcess was FIFO / Netlinx was nothing-after-full.
Perhaps I will run a quick test of the two when I get a chance. It would be helpful to know.
Kevin D.
I don't know why or how this happened but in my mind I have it that DATA.TEXT is used in modules because comms to virtuals are usually short strings and their timing is a known constant. They always send a complete command or string and always trigger a single string or command event per sent string or command. I also have in my head that these strings aren't reliable above, say, 256 characters, sort of like sending text to a variable text address. Like I said, I don't know how I got this stuck in my head but it's been in there for a long time and I think reinforced over the years by someone somewhere. I either read it or heard it, maybe I just made it up, I don't know. So I've always created buffers except when talking to virtuals.
With the exception of using DATA.TEXT as Chip suggested to keep down the length of the string my find_string has to search through, I don't see a reason to change anything, but I am interested to see if anyone else has been brainwashed to have an aversion to DATA.TEXT like I have.
Back in 2003, they told us in Programmer 2 that Netlinx buffers were different, but it sure looks like a buffer tied directly to a device with create_buffer works as it did in Axcess. I always thought that create_buffer was an anachronism without a good reason to be used, clearly, I was wrong.
Ack. What he said.
Brainfart here - I was carrying over the idea of a single byte "delimiter" to be searched for, and obviously (well, apparently not obvious to me) that goes right out the window when you move to working with delimiting characters.
My bad...
So maybe in the effort to making a search like this more efficient on a potentially large buffer, it could be constrained by just checking the last "x" number of bytes in the buffer with a RIGHT_STRING. I'm just thinking out loud here...
- Chip
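Thinking that through as code: a sketch of the tail check might look like this. The function name fnTailHas, the window size, and the '</rss>' end tag in the usage note are all assumptions for illustration; the window just needs to be at least as long as the tag plus any trailing bytes the server might send after it.

```netlinx
// Hypothetical helper: looks for sTag only in the last nWindow bytes
// of the buffer. Returns 1 if found, 0 otherwise.
DEFINE_FUNCTION INTEGER fnTailHas(CHAR sBuf[], CHAR sTag[], INTEGER nWindow)
{
    STACK_VAR INTEGER nPos

    IF (LENGTH_STRING(sBuf) > nWindow)
    {
        // search only the tail; the tag is found even if it arrived
        // split across several STRING: events, because the buffer
        // (unlike DATA.TEXT) holds the reassembled data
        nPos = FIND_STRING(RIGHT_STRING(sBuf, nWindow), sTag, 1)
    }
    ELSE
    {
        // buffer is still shorter than the window: search all of it
        nPos = FIND_STRING(sBuf, sTag, 1)
    }

    IF (nPos)
    {
        RETURN 1
    }
    RETURN 0
}
```

For example, IF (fnTailHas(strWeatherBuff, '</rss>', 32)) in the STRING: handler would keep each search to 32 bytes no matter how large the buffer has grown, while still catching a multi-character tag like the 'UVWXYZ' case raised above.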
This does indeed make DATA.TEXT a poor choice for holding "most" HTTP data for parsing but as Chip suggested would work fine for searching for the string required to trigger parsing the data contained in the created buffer in order to save a few clock cycles.
I don't follow what the problem with data.text is. It's a buffer like any other and it's up to the programmer to manipulate it to function correctly, isn't it? If it overflows and data is lost then that is the programmer's fault.
Someone said that create_buffer is a FIFO buffer. But data.text is as well if you read it as it comes in. The docs say use either data.text or create_buffer so I wonder if they are implemented exactly the same and create_buffer is just there for legacy reasons.
Paul