connecting to a web site

samos · September 2009

How do you connect to a web site so that you can retrieve HTML from it? Say I wanted to connect to www.twitter.com. Do I open an ip_client_open to www.twitter.com, and then what? Where can I find information on making these types of connections?

ericmedley · September 2009

There's a lot of discussion on the forum about it. Just search for web page scraping or HTML and I'm sure you'll find what you need.

The basics of it are that you have to use IP_CLIENT_OPEN and, upon the online event, send a spoof of a web browser's request to the web server. The connection closes right after the reply.

Replies come back as DATA_EVENTs on whichever NetLinx port you set up for comm.

Some things to know.

You probably need to use CREATE_BUFFER instead of DATA.TEXT for this, since DATA.TEXT has a size limit that can catch you on some web pages. The built-in limit is 2K. If you use CREATE_BUFFER you can set the size up quite a bit, depending upon the web site. There are ways you can use DATA.TEXT. I'm sure someone with a burr under their saddle will mention how.
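
To illustrate why a buffer helps: a reply longer than one event's worth of text arrives in pieces, and each piece has to be appended before anything is parsed. The following is only a sketch in Python rather than NetLinx. The 2048-byte piece size mirrors the 2K figure above, and `chunks_of` and `accumulate` are made-up names, not AMX functions.

```python
def chunks_of(data: bytes, size: int = 2048):
    """Yield successive pieces of data, imitating one string event per piece."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def accumulate(chunks) -> bytes:
    """Append every piece to one buffer, roughly what a created buffer does."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
    return bytes(buffer)


# Stand-in for a long reply; a real page would come from the socket.
page = b"<html>" + b"x" * 5000 + b"</html>"
pieces = list(chunks_of(page))
whole = accumulate(pieces)
```

No single piece contains the complete page here, which is the situation that trips up code that parses each event's text on its own.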

You cannot do some of the things a web browser can, like Flash animation, DirectX, SSL, etc. Also, websites with a lot of frames can goof things up. If a website changes its format often, you'll drive yourself mad trying to keep up.

Scraping data from a website is just good old-fashioned string parsing. (Hashing can work too.)
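
As a rough idea of what that string parsing looks like, here is a sketch in Python (not NetLinx) that pulls out the text sitting between two known markers. The helper name `text_between` and the page fragment are invented for the example; a real page would need markers chosen by looking at its actual HTML.

```python
def text_between(source: str, start: str, end: str) -> str:
    """Return the text between the first `start` marker and the next `end`.

    Returns an empty string if either marker is missing.
    """
    begin = source.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    finish = source.find(end, begin)
    if finish == -1:
        return ""
    return source[begin:finish]


# Made-up page fragment, just to exercise the helper.
html = '<div id="temp">72 F</div><div id="wind">5 mph</div>'
temperature = text_between(html, '<div id="temp">', '</div>')
wind = text_between(html, '<div id="wind">', '</div>')
```

The same find-start, find-end, take-the-middle pattern carries over to whatever string functions the control system offers.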

If you're trying to get raw data from a site (like local time, temperature, stock quotes or whatever), you might look into scraping RSS feeds instead. They're built a bit more for our kind of use.
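
To show why a feed is easier to work with than a full page, here is a small Python sketch (not NetLinx) that reads the items out of an RSS 2.0 document. The feed text is made up for the example; only the `channel`/`item`/`title`/`description` element names come from the RSS format itself.

```python
import xml.etree.ElementTree as ET

# Invented feed content; a real feed would be fetched from the site.
rss = (
    '<rss version="2.0"><channel><title>Weather</title>'
    "<item><title>Temperature</title><description>72 F</description></item>"
    "<item><title>Wind</title><description>5 mph</description></item>"
    "</channel></rss>"
)

root = ET.fromstring(rss)
# Each <item> has predictable child tags, so no guessing at page layout.
items = [
    (item.findtext("title"), item.findtext("description"))
    for item in root.iter("item")
]
```

On a controller without an XML parser, the same tags can still be picked out with plain string searches, since the structure stays the same from one fetch to the next.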

Hope that helps.

samos · September 2009

Eric,

Thanks for all of the information. I know how to do just about everything you talked about except how to send a spoof of a web browser to the web server. Does anyone have some example code showing how to accomplish it?

PhreaK · September 2009

Rather than scraping pages that are presented to the great unwashed, you will find that a lot of higher-profile 'community' sites (Twitter, Flickr, Facebook, etc.) have APIs for nicer communication with computers. You will still have to use IP sockets within your AMX system to communicate, but all the unnecessary crud will already be filtered out and you will have much nicer and more efficient communication. You will also be protected against changes to the site UI, as APIs by nature (should) remain consistent.

In the case of Twitter you will probably be interested in having a look around here: http://apiwiki.twitter.com/.

samos · September 2009

I have read the Twitter API and even wrote some C++ code to get data from Twitter. I just want to know how to connect to the site with AMX and send the URL requests.

Step 1: ip_client_open to www.twitter.com.

What do I send to the web server besides the URL request? How do I spoof a browser?

PhreaK · September 2009

To 'spoof' a browser you need to set your 'User-Agent' string in your request header. Within HTTP all communication is just ASCII strings. Check out http://www.httpviewer.net to help visualize what communication actually takes place with different sites.
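
For a concrete picture of that request text, here is a sketch in Python (not NetLinx) that builds the bytes a client would send after the socket comes online. The function name `build_get_request`, the `Mozilla/5.0` User-Agent value and the `Connection: close` header are illustrative choices, not something any particular site is known to require.

```python
def build_get_request(host: str, path: str = "/",
                      user_agent: str = "Mozilla/5.0") -> bytes:
    """Build a plain HTTP/1.1 GET request as ASCII bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",                # required by HTTP/1.1
        f"User-Agent: {user_agent}",    # the 'spoof' part
        "Connection: close",            # ask the server to hang up when done
        "",                             # the two empty entries produce the
        "",                             # blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.twitter.com")
```

Every line ends in CR/LF (13,10), and the header block is closed by one empty line. The resulting string is what would be sent to the socket in the online event.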

Also http://www.amxforums.com/showthread.php?t=4406 may be of interest to you.

DHawthorne · September 2009

There isn't a simple answer to this question, because what you need to do varies with the site you are connecting to. The simplest of sites only require you to connect, then send a string with GET and two CR/LF pairs. Other sites require header information, like the aforementioned User-Agent; you will likely need login credentials as well. Your best bet is to look up the HTTP protocol and get the basics there, then run a packet sniffer (Wireshark is a decent free one) while connecting with a browser to catch what specifics your site requires. Twitter may have a published API so you can forgo the tedious packet sniffing stage (I would be surprised if they didn't, actually, but finding it may be another matter).
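
On the login credentials: one common scheme a packet capture might reveal is HTTP Basic authentication, where the user name and password are joined with a colon, Base64-encoded and sent in an `Authorization` header. Whether a given site uses it is an assumption to check against the capture. A sketch in Python (not NetLinx), with the made-up helper name `basic_auth_header`:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the header line for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


# Sample credentials only; substitute whatever the site expects.
header = basic_auth_header("Aladdin", "open sesame")
```

The header line goes in with the other request headers, before the blank line that ends them. Base64 is an encoding, not encryption, so over a plain port 80 connection the password is effectively sent in the clear.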

samos · September 2009

OK, here is my code for just a connection.

I made it so that if the connection fails it will retry. When the system starts up it fails like 15 times and then connects. Does anyone know why this is?

PROGRAM_NAME='temp'
(***********************************************************)
(*  FILE_LAST_MODIFIED_ON: 09/08/2009  AT: 13:45:56        *)
(***********************************************************)

DEFINE_DEVICE
dvTwitter     = 0:3:0
vdvTwitter    = 0:4:0

dvTP          = 10001:1:0
DEFINE_FUNCTION integer fnConnectToTwitter()
{
    SEND_STRING 0, 'GET TWITTER FEED'
    ip_client_close(dvTwitter.Port)
    wait 20
    {	
	ip_client_open(dvTwitter.Port,'www.twitter.com',80,IP_TCP)
    }
}
(* EXAMPLE: DEFINE_FUNCTION <RETURN_TYPE> <NAME> (<PARAMETERS>) *)
(* EXAMPLE: DEFINE_CALL '<NAME>' (<PARAMETERS>) *)

(***********************************************************)
(*                STARTUP CODE GOES BELOW                  *)
(***********************************************************)
DEFINE_START
send_string 0, 'START'
fnConnectToTwitter()
(***********************************************************)
(*                THE EVENTS GO BELOW                      *)
(***********************************************************)
DEFINE_EVENT
BUTTON_EVENT[dvTP,1]
{
    push:
    {
	fnConnectToTwitter()
    }
}
DATA_EVENT[dvTwitter]
{
    onerror:
    {
	send_string 0,"'error: client=',ITOA(Data.Number)"
	fnConnectToTwitter()
    }
    online:
    {
	send_string 0,"'online: client'"
    }
    offline:
    {
	send_string 0,"'offline: client'"
    }
    string:
    {    
	send_string 0,"'string: client=',Data.Text"
    }
}
(***********************************************************)
(*            THE ACTUAL PROGRAM GOES BELOW                *)
(***********************************************************)
DEFINE_PROGRAM

(***********************************************************)
(*                     END OF PROGRAM                      *)
(*        DO NOT PUT ANY CODE BELOW THIS COMMENT           *)
(***********************************************************)

Here is my telnet log:

(0000053409) CIpEvent::OnError 0:3:1
(0000053410) error: client=9
(0000053410) GET TWITTER FEED
(0000053411) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053412) CIpEvent::OnError 0:3:1
(0000053413) error: client=9
(0000053414) GET TWITTER FEED
(0000053415) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053416) CIpEvent::OnError 0:3:1
(0000053417) error: client=9
(0000053418) GET TWITTER FEED
(0000053419) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053422) CIpEvent::OnError 0:3:1
(0000053423) error: client=9
(0000053424) GET TWITTER FEED
(0000053425) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053426) CIpEvent::OnError 0:3:1
(0000053427) error: client=9
(0000053427) GET TWITTER FEED
(0000053428) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053429) CIpEvent::OnError 0:3:1
(0000053430) error: client=9
(0000053431) GET TWITTER FEED
(0000053432) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053433) CIpEvent::OnError 0:3:1
(0000053434) error: client=9
(0000053435) GET TWITTER FEED
(0000053436) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053437) CIpEvent::OnError 0:3:1
(0000053438) error: client=9
(0000053438) GET TWITTER FEED
(0000053439) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053440) CIpEvent::OnError 0:3:1
(0000053441) error: client=9
(0000053442) GET TWITTER FEED
(0000053443) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053444) CIpEvent::OnError 0:3:1
(0000053445) error: client=9
(0000053446) GET TWITTER FEED
(0000053446) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053448) CIpEvent::OnError 0:3:1
(0000053449) error: client=9
(0000053449) GET TWITTER FEED
(0000053450) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053451) CIpEvent::OnError 0:3:1
(0000053452) error: client=9
(0000053453) GET TWITTER FEED
(0000053454) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053455) CIpEvent::OnError 0:3:1
(0000053457) error: client=9
(0000053458) GET TWITTER FEED
(0000053459) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053460) CIpEvent::OnError 0:3:1
(0000053461) error: client=9
(0000053461) GET TWITTER FEED
(0000053462) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053464) CIpEvent::OnError 0:3:1
(0000053465) error: client=9
(0000053465) GET TWITTER FEED
(0000053466) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053467) CIpEvent::OnError 0:3:1
(0000053468) error: client=9
(0000053469) GET TWITTER FEED
(0000053470) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053471) CIpEvent::OnError 0:3:1
(0000053472) error: client=9
(0000053472) GET TWITTER FEED
(0000053473) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053474) CIpEvent::OnError 0:3:1
(0000053475) error: client=9
(0000053476) GET TWITTER FEED
(0000053477) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053478) CIpEvent::OnError 0:3:1
(0000053479) error: client=9
(0000053480) GET TWITTER FEED
(0000053481) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053482) CIpEvent::OnError 0:3:1
(0000053483) error: client=9
(0000053483) GET TWITTER FEED
(0000053484) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053486) CIpEvent::OnError 0:3:1
(0000053487) error: client=9
(0000053488) GET TWITTER FEED
(0000053489) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053490) CIpEvent::OnError 0:3:1
(0000053491) error: client=9
(0000053492) GET TWITTER FEED
(0000053492) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053494) CIpEvent::OnError 0:3:1
(0000053494) error: client=9
(0000053495) GET TWITTER FEED
(0000053496) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053497) CIpEvent::OnError 0:3:1
(0000053498) error: client=9
(0000053499) GET TWITTER FEED
(0000053500) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053501) CIpEvent::OnError 0:3:1
(0000053502) error: client=9
(0000053503) GET TWITTER FEED
(0000053503) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053505) CIpEvent::OnError 0:3:1
(0000053506) error: client=9
(0000053507) GET TWITTER FEED
(0000053508) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053509) CIpEvent::OnError 0:3:1
(0000053510) error: client=9
(0000053511) GET TWITTER FEED
(0000053512) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053513) CIpEvent::OnError 0:3:1
(0000053514) error: client=9
(0000053514) GET TWITTER FEED
(0000053515) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053516) CIpEvent::OnError 0:3:1
(0000053517) error: client=9
(0000053518) GET TWITTER FEED
(0000053519) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053520) CIpEvent::OnError 0:3:1
(0000053521) error: client=9
(0000053522) GET TWITTER FEED
(0000053523) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053524) CIpEvent::OnError 0:3:1
(0000053525) error: client=9
(0000053525) GET TWITTER FEED
(0000053526) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053527) CIpEvent::OnError 0:3:1
(0000053528) error: client=9
(0000053529) GET TWITTER FEED
(0000053530) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053531) CIpEvent::OnError 0:3:1
(0000053532) error: client=9
(0000053533) GET TWITTER FEED
(0000053534) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053535) CIpEvent::OnError 0:3:1
(0000053536) error: client=9
(0000053537) GET TWITTER FEED
(0000053538) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053539) CIpEvent::OnError 0:3:1
(0000053540) error: client=9
(0000053541) GET TWITTER FEED
(0000053542) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053543) CIpEvent::OnError 0:3:1
(0000053544) error: client=9
(0000053544) GET TWITTER FEED
(0000053545) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053546) CIpEvent::OnError 0:3:1
(0000053547) error: client=9
(0000053548) GET TWITTER FEED
(0000053549) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053550) CIpEvent::OnError 0:3:1
(0000053551) error: client=9
(0000053552) GET TWITTER FEED
(0000053553) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053554) CIpEvent::OnError 0:3:1
(0000053555) error: client=9
(0000053556) GET TWITTER FEED
(0000053556) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053558) CIpEvent::OnError 0:3:1
(0000053559) error: client=9
(0000053559) GET TWITTER FEED
(0000053560) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053561) CIpEvent::OnError 0:3:1
(0000053562) error: client=9
(0000053563) GET TWITTER FEED
(0000053564) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053565) CIpEvent::OnError 0:3:1
(0000053566) error: client=9
(0000053567) GET TWITTER FEED
(0000053568) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053569) CIpEvent::OnError 0:3:1
(0000053570) error: client=9
(0000053571) GET TWITTER FEED
(0000053572) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053573) CIpEvent::OnError 0:3:1
(0000053574) error: client=9
(0000053575) GET TWITTER FEED
(0000053576) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053577) CIpEvent::OnError 0:3:1
(0000053578) error: client=9
(0000053579) GET TWITTER FEED
(0000053580) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053581) CIpEvent::OnError 0:3:1
(0000053582) error: client=9
(0000053582) GET TWITTER FEED
(0000053583) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053584) CIpEvent::OnError 0:3:1
(0000053585) error: client=9
(0000053586) GET TWITTER FEED
(0000053587) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053588) CIpEvent::OnError 0:3:1
(0000053589) error: client=9
(0000053590) GET TWITTER FEED
(0000053591) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053592) CIpEvent::OnError 0:3:1
(0000053593) error: client=9
(0000053594) GET TWITTER FEED
(0000053595) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053596) CIpEvent::OnError 0:3:1
(0000053597) error: client=9
(0000053597) GET TWITTER FEED
(0000053598) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053599) CIpEvent::OnError 0:3:1
(0000053600) error: client=9
(0000053601) GET TWITTER FEED
(0000053602) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053604) CIpEvent::OnError 0:3:1
(0000053605) error: client=9
(0000053606) GET TWITTER FEED
(0000053606) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053608) CIpEvent::OnError 0:3:1
(0000053609) error: client=9
(0000053609) GET TWITTER FEED
(0000053610) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053611) CIpEvent::OnError 0:3:1
(0000053612) error: client=9
(0000053613) GET TWITTER FEED
(0000053614) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053615) CIpEvent::OnError 0:3:1
(0000053616) error: client=9
(0000053617) GET TWITTER FEED
(0000053618) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053619) CIpEvent::OnError 0:3:1
(0000053620) error: client=9
(0000053621) GET TWITTER FEED
(0000053622) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053623) CIpEvent::OnError 0:3:1
(0000053625) error: client=9
(0000053625) GET TWITTER FEED
(0000053626) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053627) CIpEvent::OnError 0:3:1
(0000053628) error: client=9
(0000053629) GET TWITTER FEED
(0000053630) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053631) CIpEvent::OnError 0:3:1
(0000053632) error: client=9
(0000053633) GET TWITTER FEED
(0000053634) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053635) CIpEvent::OnError 0:3:1
(0000053636) error: client=9
(0000053636) GET TWITTER FEED
(0000053638) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053640) CIpEvent::OnError 0:3:1
(0000053641) error: client=9
(0000053641) GET TWITTER FEED
(0000053642) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053643) CIpEvent::OnError 0:3:1
(0000053644) error: client=9
(0000053645) GET TWITTER FEED
(0000053646) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053647) CIpEvent::OnError 0:3:1
(0000053648) error: client=9
(0000053649) GET TWITTER FEED
(0000053650) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053651) CIpEvent::OnError 0:3:1
(0000053652) error: client=9
(0000053652) GET TWITTER FEED
(0000053653) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053656) CIpEvent::OnError 0:3:1
(0000053657) error: client=9
(0000053658) GET TWITTER FEED
(0000053665) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053717) CIpEvent::OnError 0:3:1
(0000053727) error: client=9
(0000053727) GET TWITTER FEED
(0000053728) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053729) CIpEvent::OnError 0:3:1
(0000053730) error: client=9
(0000053743) GET TWITTER FEED
(0000053744) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053745) CIpEvent::OnError 0:3:1
(0000053746) error: client=9
(0000053747) GET TWITTER FEED
(0000053748) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053750) CIpEvent::OnError 0:3:1
(0000053751) error: client=9
(0000053752) GET TWITTER FEED
(0000053753) CIpSocketMan::ProcessPLPacket - Socket Already Closed
(0000053755) Connected Successfully
(0000053757) CIpEvent::OnError 0:3:1
(0000053758) error: client=9
(0000053759) GET TWITTER FEED
(0000053761) Exiting TCP Read thread - closing this socket for local port 3
(0000053762) CIpEvent::OnLine 0:3:1
(0000053764) online: client
(0000053765) CIpEvent::OffLine 0:3:1
(0000053767) offline: client
(0000053914) IPDeviceDetector.run(): joined multicast group
(0000054364) Memory Available = 5572148 <18812>
(0000055764) Connected Successfully
(0000055766) CIpEvent::OnLine 0:3:1
(0000055767) online: client
(0000056364) Memory Available = 5551284 <20864>
(0000295650) Exiting TCP Read thread - closing this socket for local port 3
(0000295651) CIpEvent::OffLine 0:3:1
(0000295652) offline: client

DHawthorne · September 2009

HTTP connections are designed to open, send your data, get a response, then immediately disconnect. Remember, it was designed for browsers, where someone might open a page and let it sit for unknown periods of time before clicking a link .... you can't tie up the server for slow readers or people who left it open to answer the phone or go on vacation. You have to send all your login information on the online event, and parse what was returned in the offline event. You don't open a connection and leave it open.
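
As a sketch of the "parse it in the offline event" step: once the server has closed the connection and the whole reply is sitting in the buffer, it can be split into status code, headers and body at the first blank line. This is Python for illustration rather than NetLinx, and `split_response` and the sample reply are made up.

```python
def split_response(raw: bytes):
    """Split a complete HTTP reply into (status_code, headers, body)."""
    # Headers end at the first empty line (CR/LF CR/LF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(" ")[1])

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# Invented reply, standing in for the buffer contents at offline time.
raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
status, headers, body = split_response(raw)
```

Checking the status code first is worthwhile, since a redirect or error page would otherwise be parsed as if it were the expected content.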

samos · September 2009

Dave,

I was not trying to leave it open. I just tried to open it, but it returned an error code in the onerror event, so I wrote code to try to open it again if it failed to open.

When I ran the code, the ip_client_open function failed about 30 times (triggering the onerror event each time) before it finally fired and connected. I just wanted to know why it failed to open the connection so many times and then finally worked and fired the online event.

PhreaK · September 2009

The error you are getting (error 9) is coming from your IP_CLIENT_CLOSE statement. Error 9 is 'port already closed'. Because your onerror event calls fnConnectToTwitter(), each error calls your connect function again, which in turn creates another error. To combat this you will need to add a bit of logic to make sure you only try to reconnect on certain errors. I've also found in the past that it can help to give the master a second or two to collapse the connection before re-opening it.

You can find the error codes spread throughout the documentation in NetLinx Studio, but to help out here's a list of all of them:
-3: unable to open communication port
-2: invalid value for protocol
-1: invalid server port
2: general failure (out of memory)
4: unknown host
6: connection refused
7: connection timed out
8: unknown connection error
9: port already closed
10: binding error
11: listening error
14: local port already in use
15: UDP socket already listening
16: too many open sockets
17: local port not open

Obviously some of these will never be caused by opening a connection as a client.
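
The "only reconnect on certain errors" logic could look something like the sketch below, written in Python rather than NetLinx. The error texts are taken from the list above; which codes count as worth retrying, the attempt limit and the name `should_retry` are assumptions for illustration, not AMX recommendations.

```python
ERROR_TEXT = {
    2: "general failure (out of memory)",
    4: "unknown host",
    6: "connection refused",
    7: "connection timed out",
    8: "unknown connection error",
    9: "port already closed",
    14: "local port already in use",
    16: "too many open sockets",
    17: "local port not open",
}

# Assumption: only these look like transient network trouble.
RETRYABLE = {6, 7, 8}


def should_retry(error_code: int, attempts: int, max_attempts: int = 5) -> bool:
    """Retry only transient errors, and only a limited number of times."""
    return error_code in RETRYABLE and attempts < max_attempts


# Error 9 comes from closing an already closed port, so it is ignored.
retry_on_closed = should_retry(9, attempts=0)
retry_on_timeout = should_retry(7, attempts=1)
```

With a check like this in the onerror handler, error 9 no longer triggers another close-and-open cycle, which is what produced the long run of errors in the log.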

vining · September 2009

Ah, so this is what Dave was talking about on this other thread. http://www.amxforums.com/showthread.php?t=4406
