Parsing UTF-8 encoded web data
John Gonzales
I'm really struggling with some of the Unicode functions in NetLinx.
I'm trying to parse a webpage that is encoded in UTF-8. When I treat the incoming data as plain ASCII text and copy it into a char array buffer, everything looks fine except for special characters such as curly quotes, which arrive as multi-byte UTF-8 sequences. The quote character the website uses is U+2019 (the right single quotation mark), whose UTF-8 encoding is the three bytes 0xE28099. In a character array these are stored as $E2, $80, $99, so instead of a single ’ I get three characters: â€™
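To make the byte-level picture concrete, here is a minimal sketch in Python rather than NetLinx. The function name `decode_3byte` and the variable names are made up for illustration. It shows how the three bytes $E2, $80, $99 combine into the single code point U+2019, and how the same bytes become three separate characters when each byte is read on its own (assuming a Windows-1252 style single-byte interpretation).

```python
# Illustration only, not NetLinx. All names here are hypothetical.

# The three bytes exactly as they land in the char buffer.
raw = bytes([0xE2, 0x80, 0x99])


def decode_3byte(b0, b1, b2):
    """Combine a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
    into one code point by stripping the marker bits and joining
    the remaining 4 + 6 + 6 payload bits."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


# Manual bit arithmetic gives the code point U+2019.
code_point = decode_3byte(raw[0], raw[1], raw[2])

# Decoding the bytes as UTF-8 yields one character.
as_utf8 = raw.decode("utf-8")

# Reading each byte as its own character (assumed here: Windows-1252)
# yields three characters instead.
as_cp1252 = raw.decode("cp1252")

print(hex(code_point), len(as_utf8), len(as_cp1252))
```

The point of the sketch is only that the buffer contents are correct as bytes. What is missing is a decoding step that treats them as one sequence.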
If I want to treat this incoming text as Unicode, what's the best way to approach it? I've tried creating a wide char variable to hold the incoming data and concatenating to it as follows:
wcWebBuffer = WC_CONCAT_STRING(wcWebBuffer,_WC(DATA.TEXT))
cWebBuffer = WC_TO_CH(wcWebBuffer)
but that didn't work at all. When the data comes in, wcWebBuffer is "?????????????????" and cWebBuffer is "?,?,?,?,.."
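One thing that holds on any platform: widening each byte to a wide character is not the same as decoding UTF-8, and a multi-byte character can be split across two incoming chunks. The following is a sketch of that idea in Python, not NetLinx, using the standard library's incremental decoder. The chunk contents are invented sample data standing in for successive DATA.TEXT values, and nothing here says how the NetLinx functions above behave internally.

```python
# Illustration only, not NetLinx. Sample data and names are hypothetical.
import codecs

# Two chunks as they might arrive from the socket. The three-byte
# sequence E2 80 99 is deliberately split across the chunk boundary.
chunks = [b"It\xe2", b"\x80\x99s fine"]

# An incremental decoder holds back an incomplete trailing sequence
# until the remaining bytes arrive in a later chunk.
decoder = codecs.getincrementaldecoder("utf-8")()

text = ""
for chunk in chunks:
    text += decoder.decode(chunk)

# Flush at end of stream; raises if a sequence was left unfinished.
text += decoder.decode(b"", final=True)

# Widening byte by byte instead produces one character per byte.
widened = "".join(chr(b) for b in b"".join(chunks))

print(len(text), len(widened))
```

The same effect can be had by collecting all the raw bytes in a plain char buffer first and decoding once the whole response has arrived.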
I've tried different approaches to handling UTF-8, but I just can't figure out how to make this work. I'd appreciate any help.
Thanks,
John