Parsing UTF-8 encoded web data

I'm really struggling with some of the Unicode functions in NetLinx.

I'm trying to parse a webpage that is encoded in UTF-8. When I treat the incoming data like normal ASCII text and copy it into a char array buffer, everything looks fine except for special characters such as curly quotes, which are encoded as multi-byte UTF-8 sequences. The quote symbol the website uses is the three-byte UTF-8 sequence 0xE28099 (code point U+2019, the right single quotation mark), which gets saved in a character array as $E2, $80, $99, so instead of getting a single ’ you get three characters -- ’
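To make the byte math concrete, here is a rough sketch of what happens to those three bytes. It is written in Python rather than NetLinx, purely to illustrate the encoding; the variable names are made up for this example, and it assumes the three stray characters come from reading the bytes as Windows-1252.

```python
# Illustration only (Python, not NetLinx): how $E2,$80,$99 relate to one character.

# The three bytes exactly as they land in the char array buffer.
raw = bytes([0xE2, 0x80, 0x99])

# A three-byte UTF-8 sequence has the bit layout 1110xxxx 10xxxxxx 10xxxxxx.
# Strip the marker bits and join the remaining payload bits into one code point.
code_point = ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)

# Decoded as UTF-8, the three bytes are a single character.
as_utf8 = raw.decode("utf-8")

# Decoded one byte per character (assumed here: Windows-1252), they become three.
as_cp1252 = raw.decode("cp1252")

print(hex(code_point), len(as_utf8), len(as_cp1252))
```

So the buffer contents are correct; the problem is only that each byte is being treated as its own character instead of the three being combined into one code point.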

If I want to treat this incoming text as Unicode, what's the best way to approach this? I've tried creating a wide char variable to hold the incoming data and concatenating to it as follows:
wcWebBuffer = WC_CONCAT_STRING(wcWebBuffer,_WC(DATA.TEXT))

cWebBuffer = WC_TO_CH(wcWebBuffer)

but that didn't work at all. When the data comes in, wcWebBuffer is "?????????????????" and cWebBuffer is "?,?,?,?,.."

I've tried different approaches to handling the UTF-8, but I just can't seem to figure out how to make this work. I appreciate any help.

Thanks,
John