Parsing Data from a Webpage
Greetings,
Could someone get me started toward parsing data out from a webpage? I want to go to whatismyip.com to get the system's public IP. How can I get the source code of the page into a buffer for parsing?
Thanks.
This site (checkip.dyndns.org) generates a smaller response and doesn't use cookies. The HTTP headers in this code emulate a Firefox browser, which keeps the application under the radar.
I tested using a Telnet session. Type the command:
msg on
Then send any command string to the virtual device to trigger a check:
send c vdvIO,'?'
If your NetLinx box has a correct IP setup and is on the internet, you will see the WAN IP address printed out.
The code will only update WANIP if it has changed. So you can periodically poll WAN IP, then generate an email when it has changed.
PROGRAM_NAME='GET IP ADDRESS'
(***********************************************************)
(*          DEVICE NUMBER DEFINITIONS GO BELOW             *)
(***********************************************************)
DEFINE_DEVICE

dvCheckIP = 0:3:0       (* IP Socket *)
vdvIO     = 32768:1:0   (* Virtual device to receive test data from telnet *)

(***********************************************************)
(*             VARIABLE DEFINITIONS GO BELOW               *)
(***********************************************************)
DEFINE_VARIABLE

CHAR sCHKIP_BUFFER[300]              // incoming buffer
CHAR CONNECTED                       // semaphore to prevent errors
CHAR WANIP[15] = { '0.0.0.0' }       // the goal of the exercise!
CHAR PREV_WANIP[15] = { '0.0.0.0' }  // allows determining when IP address changes

(***********************************************************)
(*        SUBROUTINE/FUNCTION DEFINITIONS GO BELOW         *)
(***********************************************************)
DEFINE_FUNCTION CHAR[25] getSocketError (LONG err)
{
    SWITCH (err)
    {
        CASE 2:  RETURN "'General Failure'";
        CASE 4:  RETURN "'Unknown Host'";
        CASE 6:  RETURN "'Connection refused'";
        CASE 7:  RETURN "'Connection timed out'";
        CASE 8:  RETURN "'Unknown connection error'";
        CASE 14: RETURN "'Local port already used'";
        CASE 16: RETURN "'Too many open sockets'";
        CASE 17: RETURN "'Local Port Not Open'";
    }
    RETURN "'Unknown Err: ',ITOA(err)"
}

(***********************************************************)
(*                STARTUP CODE GOES BELOW                  *)
(***********************************************************)
DEFINE_START

(***********************************************************)
(*                  THE EVENTS GO BELOW                    *)
(***********************************************************)
DEFINE_EVENT

DATA_EVENT[vdvIO]
{
    COMMAND:
    {
        // Any command sent to the virtual device triggers a check
        IF(!CONNECTED)
            IP_CLIENT_OPEN(dvCheckIP.Port,'checkip.dyndns.org',80,IP_TCP)
    }
}

DATA_EVENT[dvCheckIP]
{
    ONLINE: (* Socket connected *)
    {
        ON[CONNECTED]
        sCHKIP_BUFFER = ''   // start each request with an empty buffer
        SEND_STRING dvCheckIP,"'GET / HTTP/1.1',13,10,
            'Host: checkip.dyndns.org',13,10,
            'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.5',13,10,
            'Accept: text/plain;q=0.8',13,10,
            'Accept-Language: en-us',13,10,
            'Accept-Charset: utf-8;q=0.7',13,10,13,10"
    }
    OFFLINE:
    {
        OFF[CONNECTED]
    }
    STRING:
    {
        STACK_VAR START, END
        // Accumulate the response until the closing body tag has arrived
        sCHKIP_BUFFER = "sCHKIP_BUFFER,DATA.TEXT"
        END = FIND_STRING(sCHKIP_BUFFER,'</body>',1)
        IF (END)
        {
            // The address follows the first ': ' inside the body
            START = FIND_STRING(sCHKIP_BUFFER,'<body>',1)
            START = FIND_STRING(sCHKIP_BUFFER,': ',START) + 2
            WANIP = MID_STRING(sCHKIP_BUFFER,START,END-START)
            IF(PREV_WANIP != WANIP)
            {
                PREV_WANIP = WANIP
                SEND_STRING 0,"'Updated WAN IP Address = ',WANIP"
            }
        }
    }
    ONERROR:
    {
        IF(DATA.Number <> 0)
        {
            SEND_STRING 0,"getSocketError(DATA.Number)"
        }
    }
}
hit http://www.amx.com/ip.asp
you get back a simple text message.
If you need the code to initiate the conversation, let me know.
Yes, please. I have had little experience with coding for IP conversations.
Thanks.
here you go...
DEFINE_VARIABLE

NIGHTLY_WAN_FLAG
GET_WAN_IP_FLAG
ROUTER_WAN_COUNT
ROUTER_NEW_WAN_IP[16]
PERSISTENT ROUTER_CURR_WAN_IP[16]
ROUTER_BUFF[1000]
ROUTER_WAN_FLAG

DEFINE_EVENT

DATA_EVENT[HTTP_CLIENT]
{
    ONLINE: // Once you've set up the connection, send the string to the website.
    {
        SEND_STRING HTTP_CLIENT,"'GET /ip.asp HTTP/1.1',13,10,
            'Accept: */*',13,10,
            'Accept-Language: en-us',13,10,
            'Accept-Encoding: gzip, deflate',13,10,
            'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461; .NET CLR 1.1.4322)',13,10,
            'Host: www.amx.com',13,10,13,10"
    }
    STRING:
    {
        ROUTER_BUFF=DATA.TEXT
        IF(FIND_STRING(ROUTER_BUFF,'OK',1))
        {
            WAIT 10
            {
                ROUTER_WAN_FLAG=1
            }
        }
    }
}
// END find my WAN IP

DEFINE_PROGRAM

IF(GET_WAN_IP_FLAG=1)
{
    IP_CLIENT_CLOSE(9)
    WAIT 4
    {
        IP_CLIENT_OPEN(9,'www.amx.com',80,1) // GO TO DATA_EVENT[HTTP_CLIENT] FOR NEXT STEP.
    }
    GET_WAN_IP_FLAG=0
}

IF(ROUTER_WAN_FLAG=1) // RECEIVED ip address from amx.com
{
    // Skip to the Content-Length value, then take that many characters from the end
    REMOVE_STRING(ROUTER_BUFF,'Content-Length:',1)
    ROUTER_WAN_COUNT=ATOI(ROUTER_BUFF)
    ROUTER_NEW_WAN_IP=RIGHT_STRING(ROUTER_BUFF,ROUTER_WAN_COUNT)
    IF(ROUTER_NEW_WAN_IP=ROUTER_CURR_WAN_IP)
    {
        NETWORKING_LOG="$0D,$0A,'=======================================',$0D,$0A,
            'Last known WAN IP from the router was: ',ROUTER_CURR_WAN_IP,$0D,$0A,
            '=======================================',$0D,$0A,$0D,$0A,NETWORKING_LOG,$0D,$0A"
    }
    ELSE
    {
        NETWORKING_LOG="$0D,$0A,'=======================================',$0D,$0A,
            'Router WAN IP has changed!!!',$0D,$0A,
            'Old WAN IP from the router was: ',ROUTER_CURR_WAN_IP,$0D,$0A,
            'and now is: ',ROUTER_NEW_WAN_IP,$0D,$0A,
            'Please note the change.',$0D,$0A,
            '=======================================',$0D,$0A,$0D,$0A,NETWORKING_LOG,$0D,$0A"
        ROUTER_CURR_WAN_IP=ROUTER_NEW_WAN_IP
    }
    ROUTER_WAN_FLAG=0
}
// END GET THE WAN IP OF THE ROUTER
I write pretty old-skool. You can change it to function calls and whatnot. But you'll get the gist.
Hope that helps.
ejm
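For anyone who wants to see the Content-Length trick from the code above outside NetLinx, here is a rough sketch in Python (illustration only, not something you can load on a master). It mirrors the REMOVE_STRING / ATOI / RIGHT_STRING steps. The function name and the sample response are made up for the example; the real reply from the server may carry different headers.

```python
def body_from_content_length(response: str) -> str:
    """Return the last N characters of the response, where N is the Content-Length value."""
    marker = "Content-Length:"
    idx = response.find(marker)
    if idx == -1:
        raise ValueError("no Content-Length header in response")
    # Equivalent of REMOVE_STRING: drop everything up to and including the marker
    rest = response[idx + len(marker):]
    # Equivalent of ATOI: read the number that follows, up to the end of that header line
    length = int(rest.split("\r\n", 1)[0].strip())
    # Equivalent of RIGHT_STRING: the body is assumed to be the tail of the buffer
    return response[-length:] if length else ""


# Hypothetical plain-text reply, invented for this sketch
SAMPLE_REPLY = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 11\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "203.0.113.7"
)
```

Note that this only works if the whole body has already arrived and nothing follows it in the buffer, which is why the NetLinx version waits a second before parsing.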
Are there any devices that have to be defined to make this work?
Oh sorry, yes
It can be anything from 0:3:0 upward. (I don't know what the upper limit is, to be frank.) 0:1:0 and 0:2:0 are used by the master.
On my box, I had to increase the size of sCHKIP_BUFFER to about 512. The 255 size cut the returned data off just before the actual body of the message, so nothing was updated. Once I increased the buffer size, everything worked correctly. I didn't bother to calculate exactly what buffer size was needed; it probably only needs another 30 characters or so to contain the response.
Brad
Thanks for pointing this out. I made a last-minute change to the buffer size after testing. The actual response from checkip.dyndns.org is 272 characters, so a buffer of that size or greater should be good to go.
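To make the buffer-size problem easier to see, here is a small Python sketch (illustration only, not NetLinx) of the same parsing steps: wait for '</body>', find '<body>', then take what follows the first ': '. The sample response is an assumption modelled on what checkip.dyndns.org was returning at the time; the function name is made up.

```python
def extract_wan_ip(response: str):
    """Return the address between ': ' and '</body>', or None if the body is incomplete."""
    end = response.find("</body>")
    if end == -1:
        return None  # closing tag not received yet (or cut off by a short buffer)
    start = response.find("<body>")
    if start == -1:
        return None
    colon = response.find(": ", start)  # search from <body> so header colons are skipped
    if colon == -1 or colon > end:
        return None
    return response[colon + 2:end].strip()


# Assumed sample response, using a documentation-range address
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head><title>Current IP Check</title></head>"
    "<body>Current IP Address: 203.0.113.7</body></html>\r\n"
)
```

If the buffer is too small the closing tag never shows up, so the function returns None and nothing gets updated, which matches what Brad saw.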
You're quite welcome!
Well, in this thread two ideas have been opened for discussion.
1st Parsing data from a webpage. Where can I read a complete resource about this subject? I searched the tech notes but didn't find anything.
Maybe it's something related to web programming??
2nd Sending emails from the master???
triggered by Joe
Yeah.. Fine.. and How is that??
Thanks
i!-EquipmentMonitor from amx.com
or if you want my module adapted from it with code that's a little neater, PM me.
Wireshark, previously Ethereal, is an excellent learning tool.
http://www.wireshark.org/
As for the HTTP protocol in general, learning a scripting language like PHP could prove invaluable. Google is your buddy.
This code has been tested and works great:
PROGRAM
PROGRAM_NAME='Email IP Address Test'
(***********************************************************)
(***********************************************************)
(*  FILE_LAST_MODIFIED_ON: 04/05/2006  AT: 09:00:25        *)
(***********************************************************)
(* System Type : NetLinx                                   *)
(***********************************************************)
(* REV HISTORY:                                            *)
(***********************************************************)
(* $History: $ *)
(***********************************************************)
(*          DEVICE NUMBER DEFINITIONS GO BELOW             *)
(***********************************************************)
DEFINE_DEVICE

dvCheckIP  = 0:3:0       (* IP Socket *)
vdvCheckIP = 32768:1:0   (* Virtual device to receive test data from telnet *)

(***********************************************************)
(*             CONSTANT DEFINITIONS GO BELOW               *)
(***********************************************************)
DEFINE_CONSTANT

TL_CHECKIP = 1

(***********************************************************)
(*            DATA TYPE DEFINITIONS GO BELOW               *)
(***********************************************************)
DEFINE_TYPE

(***********************************************************)
(*             VARIABLE DEFINITIONS GO BELOW               *)
(***********************************************************)
DEFINE_VARIABLE

VOLATILE LONG TL_CheckIpArray[1] = 3600000 // check every hour
// 1 hour = 3600000  // 60000 * 60
// 1 day  = 86400000 // (60000 * 60) * 24

/* URL of your outgoing mail server. Leave blank ({''}) to disable e-mail notifications. */
VOLATILE CHAR sSnmpServer[] = { 'your.smtp.host' }

/* Authentication for your e-mail server; leave blank ({{''}, {''}}) if authentication is not required */
/* (typical if the request is generated from the same network as the server). */
VOLATILE CHAR sSnmpLogin[][30] = { {'your.smtp.host.login.id'},{'your.smtp.host.login.password'} }

/* E-mail address of the NetLinx system doing the reporting - this does not necessarily have to be a valid e-mail address, */
/* depending on the requirements of the mail server host. Many *require* this to be an address that originates from their own namespace. */
VOLATILE CHAR sEmailFrom[] = { 'from@domain' }

/* E-mail address to send notifications TO (this is REQUIRED to use e-mail notifications, in case anyone needs that said :)) */
VOLATILE CHAR sEmailTo[] = { 'to@domain' }

/* Email subject text prefix */
VOLATILE CHAR sEmailSubPrefix[]= { 'Subject Prefix - ' }

/* Email message text prefix */
VOLATILE CHAR sEmailMsgPrefix[]= { 'Message Prefix - ' }

(***********************************************************)
(*             LATCHING DEFINITIONS GO BELOW               *)
(***********************************************************)
DEFINE_LATCHING

(***********************************************************)
(*        MUTUALLY EXCLUSIVE DEFINITIONS GO BELOW          *)
(***********************************************************)
DEFINE_MUTUALLY_EXCLUSIVE

(***********************************************************)
(*        SUBROUTINE/FUNCTION DEFINITIONS GO BELOW         *)
(***********************************************************)
(* EXAMPLE: DEFINE_FUNCTION <RETURN_TYPE> <NAME> (<PARAMETERS>) *)
(* EXAMPLE: DEFINE_CALL '<NAME>' (<PARAMETERS>) *)

(***********************************************************)
(*                STARTUP CODE GOES BELOW                  *)
(***********************************************************)
DEFINE_START

/* startup check */
SEND_COMMAND vdvCheckIP,'Check IP' /* this can be any string value */

/* periodic check thereafter */
TIMELINE_CREATE(TL_CHECKIP, TL_CheckIpArray, 1, TIMELINE_ABSOLUTE,TIMELINE_REPEAT)

(***********************************************************)
(*                  THE EVENTS GO BELOW                    *)
(***********************************************************)
DEFINE_EVENT

TIMELINE_EVENT[TL_CHECKIP]
{
    SEND_COMMAND vdvCheckIP,'Check IP' /* this can be any string value */
}

(***********************************************************)
(*              MODULE DEFINITIONS GO BELOW                *)
(***********************************************************)
DEFINE_MODULE 'Email IP Address' U1
(
    dvCheckIP,
    vdvCheckIP,
    sSnmpServer,
    sSnmpLogin,
    sEmailFrom,
    sEmailTo,
    sEmailSubPrefix,
    sEmailMsgPrefix
)

(***********************************************************)
(*            THE ACTUAL PROGRAM GOES BELOW                *)
(***********************************************************)
DEFINE_PROGRAM

(***********************************************************)
(*                     END OF PROGRAM                      *)
(*        DO NOT PUT ANY CODE BELOW THIS COMMENT           *)
(***********************************************************)
MODULE
MODULE_NAME='Email IP Address' (DEV dvCheckIP,
                                DEV vdvCheckIP,
                                CHAR sSmtpServer[],
                                CHAR sSmtpLogin[][30],
                                CHAR sEmailFrom[],
                                CHAR sEmailTo[],
                                CHAR sEmailSubPrefix[],
                                CHAR sEmailMsgPrefix[])

INCLUDE 'i!-EquipmentMonitorOut.axi'

(***********************************************************)
(*             VARIABLE DEFINITIONS GO BELOW               *)
(***********************************************************)
DEFINE_VARIABLE

VOLATILE CHAR sCHKIP_BUFFER[300]                  // incoming buffer
VOLATILE CHAR CONNECTED                           // semaphore to prevent errors
NON_VOLATILE CHAR WANIP[15] = { '0.0.0.0' }       // the goal of the exercise!
NON_VOLATILE CHAR PREV_WANIP[15] = { '0.0.0.0' }  // allows determining when IP address changes

(***********************************************************)
(*        SUBROUTINE/FUNCTION DEFINITIONS GO BELOW         *)
(***********************************************************)
DEFINE_FUNCTION CHAR[25] getSocketError (LONG err)
{
    SWITCH (err)
    {
        CASE 2:  RETURN "'General Failure'";
        CASE 4:  RETURN "'Unknown Host'";
        CASE 6:  RETURN "'Connection refused'";
        CASE 7:  RETURN "'Connection timed out'";
        CASE 8:  RETURN "'Unknown connection error'";
        CASE 14: RETURN "'Local port already used'";
        CASE 16: RETURN "'Too many open sockets'";
        CASE 17: RETURN "'Local Port Not Open'";
    }
    RETURN "'Unknown Err: ',ITOA(err)"
}

(***********************************************************)
(*                  THE EVENTS GO BELOW                    *)
(***********************************************************)
DEFINE_EVENT

DATA_EVENT[vdvCheckIP]
{
    COMMAND:
    {
        // Any command sent to the virtual device triggers a check
        IF(!CONNECTED)
        {
            SEND_STRING 0,'Checking WAN IP Address'
            IP_CLIENT_OPEN(dvCheckIP.Port,'checkip.dyndns.org',80,IP_TCP)
        }
    }
}

DATA_EVENT[dvCheckIP]
{
    ONLINE: (* Socket connected *)
    {
        ON[CONNECTED]
        sCHKIP_BUFFER = ''   // start each request with an empty buffer
        SEND_STRING dvCheckIP,"'GET / HTTP/1.1',13,10,
            'Host: checkip.dyndns.org',13,10,
            'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.5',13,10,
            'Accept: text/plain;q=0.8',13,10,
            'Accept-Language: en-us',13,10,
            'Accept-Charset: utf-8;q=0.7',13,10,13,10"
    }
    OFFLINE:
    {
        OFF[CONNECTED]
    }
    STRING:
    {
        STACK_VAR START, END
        // Accumulate the response until the closing body tag has arrived
        sCHKIP_BUFFER = "sCHKIP_BUFFER,DATA.TEXT"
        END = FIND_STRING(sCHKIP_BUFFER,'</body>',1)
        IF (END)
        {
            // The address follows the first ': ' inside the body
            START = FIND_STRING(sCHKIP_BUFFER,'<body>',1)
            START = FIND_STRING(sCHKIP_BUFFER,': ',START) + 2
            WANIP = MID_STRING(sCHKIP_BUFFER,START,END-START)
            IF(PREV_WANIP != WANIP)
            {
                PREV_WANIP = WANIP
                SEND_STRING 0,"'Updated WAN IP Address = ',WANIP"
                /* Attempt to send an email - no retry */
                SmtpQueMessage( sEmailFrom,
                                sEmailTo,
                                "sEmailSubPrefix,'WAN IP Address update!'",
                                "sEmailMsgPrefix,'New IP Address is ',WANIP",
                                '' )
            }
        }
    }
    ONERROR:
    {
        IF(DATA.Number <> 0)
        {
            SEND_STRING 0,"getSocketError(DATA.Number)"
        }
    }
}

(***********************************************************)
(*                STARTUP CODE GOES BELOW                  *)
(***********************************************************)
DEFINE_START

SmtpSetServer(sSmtpServer) ;
SmtpSetUser(sSmtpLogin[1], sSmtpLogin[2])
Sensiva wrote: This link may be helpful.
http://en.wikipedia.org/wiki/HTTP
Keep in mind that websites are subject to change without notice, and a parsing function that works fine one day may well stop working after the website is updated.
I personally like to set up accounts for customers through DYNDNS.org and set up the router to update the service. You can buy a block of 20 host services for $10 a year. You can add this in to a yearly maintenance contract if you choose. I prefer to type in SomeJob.dyndns.org rather than having to look up a current IP address. Plus my VPN app keeps a list of the URLs, so I can connect from anywhere without checking in with the office to find out the new IP.
ericmedley wrote: It's nice to know I'm not the only odd ball that writes in that style.
Thanks for any help..
From DYNDNS:
"Actual HTTP request should look like following fragment. Note that there is the bare minimum set of headers. Request should be followed by sending an empty line.
Fragment base-64-authorization should be represented by Base 64 encoded username : password string.
GET /nic/update?hostname=yourhostname&myip=ipaddress&wildcard=NOCHG&mx=NOCHG&backmx=NOCHG HTTP/1.0
Host: members.dyndns.org
Authorization: Basic base-64-authorization
User-Agent: Company - Device - Version Number"
DEFINE_CALL 'UPDATE DYNDNS'{
LOCAL_VAR
CHAR cAUTH[256]
cAUTH = "EncrBase64Encode(cLoginAuth)"
IP_CLIENT_OPEN(dvDYNDNS.Port,'members.dyndns.org',8245,IP_TCP)
WAIT_UNTIL(CONNECTED=1){
SEND_STRING dvDYNDNS,"
'GET /nic/update?',13,10,
'hostname=mysite.dyndns.org&myip=',WANIP,'&wildcard=NOCH&mx=NOCH&backmx=NOCH&HTTP/1.0',13,10,
'Host: members.dyndns.org',13,10,
'Authorization: Basic cAUTH',13,10,
'User-Agent: RCS - AMX NetLinx NI - Rev1',13,10,13,10"
}
}
Dyndns listed request strings:
Example 1:
Raw HTTP GET Request
Actual HTTP request should look like following fragment. Note that there is the bare minimum set of headers. Request should be followed by sending an empty line.
Fragment base-64-authorization should be represented by Base 64 encoded username: password string.
GET /nic/update?hostname=yourhostname&myip=ipaddress&wildcard=NOCHG&mx=NOCHG&backmx=NOCHG HTTP/1.0
Host: members.dyndns.org
Authorization: Basic base-64-authorization
User-Agent: Company - Device - Version Number
I am getting 'badauth' back when I send the update request. The manual says "Base 64 encoded username: password string". I have tried sending "cUSERNAME,':',cPWORD" through the encoder as one string, in the form "EncrBase64Encode(user: password)". I have also tried encoding the user name and password as two separate strings, "EncrBase64Encode(user)" and "EncrBase64Encode(password)", and putting the two results in the send_string.
Did you try 'Authorization: cAUTH',13,10,?
Just a thought...
:-)
Also, "Authorization: Basic (encoded 'user: password')" should be the correct string based on a post here:
http://www.ragestorm.net/tutorial?id=15
"GET http://www.sourceforge.net/somedocument.html HTTP/1.1
Host: www.sourceforge.net
Connection: close
Accept: */*
User-Agent: MyINetApp/0.0.0.1
Proxy-Authorization: Basic dXNlbWU6dGVzdA==
Cache-Control: no-store, no-cache
Pragma: no-cache
"
I also compared what is coming out of my encoder to some results found on the net for user and password and they matched so my encoded string should be good.
I assume you've tried that in your previous fragment assuming it is in line with what you're doing?
Proxy-Authorization: Basic cAUTH etc
HTH
sun.misc.BASE64Encoder encoder = new sun.misc.BASE64Encoder();
String encodedUserPwd = encoder.encode("mydomain\\MYUSER:MYPASSWORD".getBytes());
con.setRequestProperty ("Proxy-Authorization", "Basic " + encodedUserPwd); // PROXY
What are you using in the variable cLoginAuth that you are passing to the encoding function?
Proxy-Authorization: Basic (base64)([username]:[password])
11.1 Basic Authentication Scheme
The "basic" authentication scheme is based on the model that the user agent must authenticate itself with a user-ID and a password for each realm. The realm value should be considered an opaque string which can only be compared for equality with other realms on that server. The server will authorize the request only if it can validate the user-ID and password for the protection space of the Request-URI. There are no optional authentication parameters.
Upon receipt of an unauthorized request for a URI within the protection space, the server should respond with a challenge like the following:
WWW-Authenticate: Basic realm="WallyWorld"
where "WallyWorld" is the string assigned by the server to identify the protection space of the Request-URI.
To receive authorization, the client sends the user-ID and password, separated by a single colon (":") character, within a base64 [5] encoded string in the credentials.
basic-credentials = "Basic" SP basic-cookie
basic-cookie = <base64 [5] encoding of userid-password,
except not limited to 76 char/line>
userid-password = [ token ] ":" *TEXT
If the user agent wishes to send the user-ID "Aladdin" and password "open sesame", it would use the following header field:
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
The basic authentication scheme is a non-secure method of filtering unauthorized access to resources on an HTTP server. It is based on the assumption that the connection between the client and the server can be regarded as a trusted carrier. As this is not generally true on an open network, the basic authentication scheme should be used accordingly. In spite of this, clients should implement the scheme in order to communicate with servers that use it
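A quick way to check what your encoder should be producing is to run the spec's own example through a known-good Base64 routine. The sketch below is Python, for illustration only; the function name is made up. The point is that the user-ID, a single colon, and the password are joined first and then encoded as one string. Encoding the two parts separately gives a different result.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the header line: Base64 of 'user:password' as ONE string, no space after the colon."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


def separately_encoded(user: str, password: str) -> str:
    """What you get if the two parts are encoded on their own (not what the server expects)."""
    u = base64.b64encode(user.encode("utf-8")).decode("ascii")
    p = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {u}:{p}"
```

With the spec's "Aladdin" / "open sesame" pair, basic_auth_header gives exactly the header shown above, so you can compare your NetLinx encoder output against it.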
Oh yeah, how do you make the forum not do this: user:password
without putting a space after the :
I originally had:
SEND_STRING dvDYNDNS,"
'GET /nic/update?hostname=mysite.dyndns.org&myip=',WANIP,'&wildcard=NOCH&mx=NOCH&backmx=NOCH&HTTP/1.0',13,10,
'Host: members.dyndns.org',13,10,
'Authorization: Basic ',cAUTH,13,10,
'User-Agent: RCS - AMX NetLinx NI - Rev1',13,10,13,10"
I needed to change the "&" between NOCH and HTTP/1.0 to a space.
Should be:
SEND_STRING dvDYNDNS,"
'GET /nic/update?hostname=mysite.dyndns.org&myip=',WANIP,'&wildcard=NOCH&mx=NOCH&backmx=NOCH HTTP/1.0',13,10,
'Host: members.dyndns.org',13,10,
'Authorization: Basic ',cAUTH,13,10,
'User-Agent: RCS - AMX NetLinx NI - Rev1',13,10,13,10"
Works perfect..
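For reference, here is the shape of the whole request as described in the DynDNS text quoted earlier, sketched in Python (illustration only, not NetLinx). The function name, its parameters, and the default User-Agent text are made up for the example. The request line is a single line, with a space (not "&") before HTTP/1.0, and the request ends with an empty line.

```python
import base64


def build_update_request(hostname: str, ip: str, user: str, password: str,
                         user_agent: str = "Company - Device - Version Number") -> str:
    """Assemble a raw HTTP/1.0 update request following the quoted DynDNS fragment."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request_line = (
        f"GET /nic/update?hostname={hostname}&myip={ip}"
        "&wildcard=NOCHG&mx=NOCHG&backmx=NOCHG HTTP/1.0"  # space before the protocol version
    )
    lines = [
        request_line,
        "Host: members.dyndns.org",
        f"Authorization: Basic {token}",
        f"User-Agent: {user_agent}",
        "",  # empty line terminates the headers
        "",
    ]
    return "\r\n".join(lines)
```

Printing the result of build_update_request('mysite.dyndns.org', '203.0.113.7', 'user', 'secret') and comparing it with a Wireshark capture of what the master actually sends is a quick way to spot a stray "&" or line break.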
Thanks for the previous input on all of this. It really helps me on a project in a Caribbean country whose internet provider does all it can to prevent you from using a fixed IP. Even routers with DYNDNS built in will not work, because the DSL modems the provider supplies also act as a router/DHCP server, and the option to turn that feature off is blocked in the modem setup, so anything plugged in behind the modem gets an internal IP address.
:-)
On topic, I'm getting an "HTTP/1.1 301 Moved Permanently" return from an XML file my NetLinx master is requesting. The address given by Apache as the location it's moved to is... the same as the location that I've requested. When I browse to this location in a browser it's fine.
Possibly Apache is discriminating against this request, not sure why: Note the location field and the href are the same URL.
Any ideas? Anyone willing to shove this in their own masters and see if they repeat the problem? Thanks in advance.