Parsing Data from a Webpage
TurnipTruck
Greetings,
Could someone get me started on parsing data out of a webpage? I want to go to whatismyip.com to get the system's public IP. How can I get the source code of the page into a buffer for parsing?
Thanks.
This site generates a smaller response and doesn't use cookies. The HTTP headers in this code emulate a Firefox browser, which keeps the application under the radar.
I tested it using a Telnet session. Type the command:
msg on
Then send any character as a command to trigger it:
send c vdvIO,'?'
If your NetLinx master has a correct IP setup and is on the Internet, you will see the WAN IP address printed out.
The code only updates WANIP when the address has changed, so you can poll the WAN IP periodically and generate an email whenever it changes.
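The code referred to here isn't reproduced in the thread, but the logic it describes (pull the address out of the reply, store it only if it changed) can be sketched in Python. This is an illustration, not the poster's NetLinx code. It assumes the checkip.dyndns.org body reads "Current IP Address: <ip>" followed by an HTML tag, and the names parse_checkip and WanIpMonitor are hypothetical:

```python
from typing import Optional

MARKER = "Current IP Address:"  # assumed text that precedes the address in the body


def parse_checkip(response: str) -> Optional[str]:
    """Return the IP that follows MARKER, or None if the marker is missing."""
    start = response.find(MARKER)
    if start < 0:
        return None
    start += len(MARKER)
    end = response.find("<", start)  # the address ends where the next HTML tag begins
    if end < 0:
        end = len(response)
    ip = response[start:end].strip()
    return ip or None


class WanIpMonitor:
    """Keeps the last known WAN IP and reports only real changes."""

    def __init__(self) -> None:
        self.wan_ip: Optional[str] = None

    def update(self, response: str) -> bool:
        """Parse a response; return True only if the stored IP changed."""
        ip = parse_checkip(response)
        if ip is None or ip == self.wan_ip:
            return False  # unparseable reply or same address: nothing to report
        self.wan_ip = ip
        return True
```

Call update() on every poll and send the email only when it returns True. A failed parse leaves the stored address untouched, so a truncated reply never wipes out a good value.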
Hit http://www.amx.com/ip.asp and you get back a simple text message.
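Even for a plain-text page, the raw reply starts with a status line and headers, followed by an empty line (CR LF CR LF) and then the body. Here is a minimal Python sketch of separating the parts before parsing. The function name is hypothetical, and headers are decoded as ISO-8859-1 by assumption:

```python
from typing import Dict, Tuple


def split_http_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into status line, headers and body."""
    head, sep, body = raw.partition(b"\r\n\r\n")  # first empty line ends the headers
    if not sep:
        raise ValueError("header terminator not found; response incomplete")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status_line, headers, body
```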
If you need the code to initiate the conversation, let me know.
Yes, please. I have had little experience with coding for IP conversations.
Thanks.
Here you go...
I write pretty old-skool. You can change it to function calls and whatnot, but you'll get the gist.
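The NetLinx code isn't reproduced here. As a rough Python analogue of initiating the conversation, this sketch opens a TCP connection, sends a minimal HTTP/1.0 GET ending in an empty line, and reads until the server closes the socket. The host, path and User-Agent values are placeholders, not what the original code used:

```python
import socket


def build_get_request(host: str, path: str = "/", user_agent: str = "Example/1.0") -> bytes:
    """Build a minimal HTTP/1.0 GET request; the header block ends with an empty line."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",  # these two empty entries produce the final CR LF CR LF
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the request and collect everything until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

With HTTP/1.0 the server hangs up after replying, so "connection closed" also means "response complete". On a NetLinx master, the equivalent signal would be the offline event for the IP port.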
Hope that helps.
ejm
Are there any devices that have to be defined to make this work?
Oh, sorry, yes.
It can be anything from 0:03:0 upward. (To be frank, I don't know what the upper limit is...) 0:01:0 and 0:02:0 are used by the master.
On my box, I had to increase the size of sCHKIP_BUFFER to about 512. At 255, the returned data was cut off just before the actual body of the message, so nothing was updated. Once I increased the buffer size, everything worked correctly. I didn't bother to calculate exactly what size was needed; it probably only needs another 30 characters or so to hold the necessary response.
Brad
Thanks for pointing this out. I made a last-minute change to the buffer size after testing. The actual response from checkip.dyndns.org is 272 characters, so a buffer of that size or greater should be good to go.
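Instead of guessing a buffer size, one option is to check whether the whole response has arrived before parsing it. Here is a Python sketch, assuming the server sends a Content-Length header. Not every server does, and without one you have to wait for the connection to close:

```python
from typing import Optional


def response_complete(buf: bytes) -> Optional[bool]:
    """True if the headers plus the full Content-Length body are in buf,
    False if more data is still expected, None if there is no Content-Length."""
    head, sep, body = buf.partition(b"\r\n\r\n")
    if not sep:
        return False  # still inside the header block
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return len(body) >= int(value.strip())
    return None  # no length given; completion must be judged another way
```

A check like this also makes an undersized buffer obvious, because the response never reports as complete.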
You're quite welcome!
Well, this thread has opened two ideas for discussion.
1st: Parsing data from a webpage. Where can I read a complete resource on this subject? I searched the tech notes but didn't find anything.
Maybe it's something related to web programming?
2nd: Sending emails from the master (triggered by Joe).
Yeah, fine, but how is that done?
Thanks
i!-EquipmentMonitor from amx.com
Or, if you want my module adapted from it, with code that's a little neater, PM me.
Wireshark (previously Ethereal) is an excellent learning tool.
http://www.wireshark.org/
As for the HTTP protocol in general, learning a scripting language like PHP could prove invaluable. Google is your buddy.
This code has been tested and works great:
Sensiva wrote: This link may be helpful.
http://en.wikipedia.org/wiki/HTTP
Keep in mind that websites are subject to change without notice, and a parsing function that works fine one day most likely won't after the website is updated.
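One way to reduce, but not remove, that fragility is to look for any valid IPv4 address in the reply rather than depending on exact surrounding text. Here is a Python sketch; find_first_ipv4 is a hypothetical name, and a page that shows other dotted numbers before the address can still fool it:

```python
import ipaddress
import re
from typing import Optional

# Any dotted quad; validated afterwards because the pattern alone accepts 999.1.1.1
DOTTED_QUAD = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_first_ipv4(text: str) -> Optional[str]:
    """Return the first valid IPv4 address anywhere in text, ignoring page layout."""
    for match in DOTTED_QUAD.finditer(text):
        try:
            return str(ipaddress.IPv4Address(match.group()))
        except ipaddress.AddressValueError:
            continue  # looked like an address but an octet was out of range
    return None
```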
I personally like to set up accounts for customers through DYNDNS.org and set up the router to update the service. You can buy a block of 20 host services for $10 a year, and you can add this into a yearly maintenance contract if you choose. I prefer typing SomeJob.dyndns.org to looking up a current IP address. Plus, my VPN app keeps a list of the URLs, so I can connect from anywhere without checking with the office to find out the new IP.
ericmedley wrote: It's nice to know I'm not the only oddball who writes in that style.
Thanks for any help.
From DYNDNS:
"Actual HTTP request should look like following fragment. Note that there is the bare minimum set of headers. Request should be followed by sending an empty line.
Fragment base-64-authorization should be represented by Base 64 encoded username : password string.
GET /nic/update?hostname=yourhostname&myip=ipaddress&wildcard=NOCHG&mx=NOCHG&backmx=NOCHG HTTP/1.0
Host: members.dyndns.org
Authorization: Basic base-64-authorization
User-Agent: Company - Device - Version Number"
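Here is a Python sketch that assembles that request exactly as documented. Note that the method, URI and HTTP version share one line, and an empty line follows the headers. The hostname, IP and Base64 token are placeholders supplied by the caller, and the function name is hypothetical:

```python
from urllib.parse import urlencode


def build_dyndns_update(hostname: str, ip: str, auth_b64: str,
                        user_agent: str = "Company - Device - Version Number") -> bytes:
    """Assemble the raw update request in the format quoted from DynDNS."""
    query = urlencode({  # dict order is preserved, matching the documented parameter order
        "hostname": hostname,
        "myip": ip,
        "wildcard": "NOCHG",
        "mx": "NOCHG",
        "backmx": "NOCHG",
    })
    lines = [
        f"GET /nic/update?{query} HTTP/1.0",  # method, URI and version on ONE line
        "Host: members.dyndns.org",
        f"Authorization: Basic {auth_b64}",
        f"User-Agent: {user_agent}",
        "",  # the empty line that must follow the request
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```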
DEFINE_CALL 'UPDATE DYNDNS'{
LOCAL_VAR
CHAR cAUTH[256]
cAUTH = "EncrBase64Encode(cLoginAuth)"
IP_CLIENT_OPEN(dvDYNDNS.Port,'members.dyndns.org',8245,IP_TCP)
WAIT_UNTIL(CONNECTED=1){
SEND_STRING dvDYNDNS,"
'GET /nic/update?',13,10,
'hostname=mysite.dyndns.org&myip=',WANIP,'&wildcard=NOCH&mx=NOCH&backmx=NOCH&HTTP/1.0',13,10,
'Host: members.dyndns.org',13,10,
'Authorization: Basic cAUTH',13,10,
'User-Agent: RCS - AMX NetLinx NI - Rev1',13,10,13,10"
}
}
I am getting 'badauth' back when I send the update request. The manual says "Base 64 encoded username:password string". I have tried sending cUSERNAME,':',cPWORD through the encoder as one string, in the form EncrBase64Encode(user:password). I have also encoded the user name and password as two separate strings, EncrBase64Encode(user) and EncrBase64Encode(password), and put the two results in the SEND_STRING.
Did you try 'Authorization: cAUTH',13,10,?
Just a thought...
:-)
Also, "Authorization: Basic (encoded 'user:password')" should be the correct string, based on a post here:
http://www.ragestorm.net/tutorial?id=15
"GET http://www.sourceforge.net/somedocument.html HTTP/1.1
Host: www.sourceforge.net
Connection: close
Accept: */*
User-Agent: MyINetApp/0.0.0.1
Proxy-Authorization: Basic dXNlbWU6dGVzdA==
Cache-Control: no-store, no-cache
Pragma: no-cache
"
I also compared what comes out of my encoder with some results found on the net for a user and password, and they matched, so my encoded string should be good.
I assume you've tried this in your previous fragment, if it's in line with what you're doing?
Proxy-Authorization: Basic cAUTH etc
HTH
What are you using in the variable cLoginAuth that you are passing to the encoding function?
Proxy-Authorization: Basic (base64)([username]:[password])
11.1 Basic Authentication Scheme
The "basic" authentication scheme is based on the model that the user agent must authenticate itself with a user-ID and a password for each realm. The realm value should be considered an opaque string which can only be compared for equality with other realms on that server. The server will authorize the request only if it can validate the user-ID and password for the protection space of the Request-URI. There are no optional authentication parameters.
Upon receipt of an unauthorized request for a URI within the protection space, the server should respond with a challenge like the following:
WWW-Authenticate: Basic realm="WallyWorld"
where "WallyWorld" is the string assigned by the server to identify the protection space of the Request-URI.
To receive authorization, the client sends the user-ID and password, separated by a single colon (":") character, within a base64 [5] encoded string in the credentials.
basic-credentials = "Basic" SP basic-cookie
basic-cookie = <base64 [5] encoding of userid-password,
except not limited to 76 char/line>
userid-password = [ token ] ":" *TEXT
If the user agent wishes to send the user-ID "Aladdin" and password "open sesame", it would use the following header field:
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
The basic authentication scheme is a non-secure method of filtering unauthorized access to resources on an HTTP server. It is based on the assumption that the connection between the client and the server can be regarded as a trusted carrier. As this is not generally true on an open network, the basic authentication scheme should be used accordingly. In spite of this, clients should implement the scheme in order to communicate with servers that use it.
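Here is a Python sketch of building the header, checked against the Aladdin example in the quoted text. The point is that user and password form one string joined by a single colon (no space) before Base64 encoding. Encoding the two halves separately yields a different token, which the server would reject. Using UTF-8 for the credentials is an assumption; it only matters for non-ASCII characters:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Encode user and password as ONE string joined by a single colon, then Base64."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Authorization: Basic " + base64.b64encode(credentials).decode("ascii")
```

For example, basic_auth_header("Aladdin", "open sesame") produces the header shown in the quote above.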
Oh yeah, how do you stop the forum from doing this to user:password
without putting a space after the colon?
I originally had:
SEND_STRING dvDYNDNS,"
'GET /nic/update?hostname=mysite.dyndns.org&myip=',WANIP,'&wildcard=NOCH&mx=NOCH&backmx=NOCH&HTTP/1.0',13,10,
'Host: members.dyndns.org',13,10,
'Authorization: Basic cAUTH',13,10,
'User-Agent: RCS - AMX NetLinx NI - Rev1',13,10,13,10"
I needed to change the "&" between NOCH and HTTP/1.0 to a space.
It should be:
SEND_STRING dvDYNDNS,"
'GET /nic/update?hostname=mysite.dyndns.org&myip=',WANIP,'&wildcard=NOCHG&mx=NOCHG&backmx=NOCHG HTTP/1.0',13,10,
'Host: members.dyndns.org',13,10,
'Authorization: Basic ',cAUTH,13,10,
'User-Agent: RCS - AMX NetLinx NI - Rev1',13,10,13,10"
Works perfectly.
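The fix comes down to the HTTP request-line format: the method, a single space, the URI, a single space, then the HTTP version, terminated by CR LF. Here is a quick Python check of that rule. The pattern is simplified and is not a full RFC parser. It flags both the "&HTTP/1.0" version and a request line broken by an early CR LF:

```python
import re

# Request-Line = Method SP Request-URI SP HTTP-Version (simplified)
REQUEST_LINE = re.compile(r"^[A-Z]+ \S+ HTTP/\d\.\d$")


def check_request_line(request: str) -> bool:
    """True if the first line of a raw request has exactly method, URI and version."""
    first_line = request.split("\r\n", 1)[0]
    return REQUEST_LINE.match(first_line) is not None
```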
Thanks for the previous input on all of this. It really helps me on a project in a Caribbean country whose Internet provider does all it can to prevent you from using a fixed IP. Even routers with DynDNS built in will not work, because the DSL modems the provider supplies also act as a router/DHCP server. The option to turn that feature off is blocked in the modem setup, so anything plugged in behind the modem gets an internal IP address.
:-)
On topic, I'm getting an "HTTP/1.1 301 Moved Permanently" return from an XML file my NetLinx master is requesting. The address given by Apache as the location it's moved to is... the same as the location that I've requested. When I browse to this location in a browser it's fine.
Possibly Apache is discriminating against this request, though I'm not sure why. Note that the Location field and the href are the same URL.
Any ideas? Anyone willing to shove this in their own masters and see if they repeat the problem? Thanks in advance.
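Here is a Python sketch for confirming what the master actually receives. It pulls the Location out of a 3xx response, resolves it against the requested URL, and compares the two. The URLs in any usage are placeholders. If the server redirects a request to the very URL that was asked for, the difference presumably lies in the request rather than the URL. Comparing the master's raw request with a browser's (for example in Wireshark) would be one way to narrow it down, and whether a Host header matching the URL's host name is sent is one thing worth checking:

```python
from typing import Optional
from urllib.parse import urljoin


def redirect_target(raw_head: str, requested_url: str) -> Optional[str]:
    """Return the absolute Location of a 3xx response, or None if it is not a redirect."""
    lines = raw_head.split("\r\n")
    parts = lines[0].split(" ", 2)  # e.g. ["HTTP/1.1", "301", "Moved Permanently"]
    if len(parts) < 2 or not parts[1].startswith("3"):
        return None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            # Location may be relative; resolve it against the URL that was requested
            return urljoin(requested_url, value.strip())
    return None


def is_self_redirect(raw_head: str, requested_url: str) -> bool:
    """True when the server redirects to the very URL that was requested."""
    return redirect_target(raw_head, requested_url) == requested_url
```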