Decoding a \unicode-escaped string in CFML

I needed to incorporate an external news feed, which was pretty easy using the provider's API. The return format is JSON, so all looked great. Except for the "rich text" they sent back:

[Image: cfdump of the API response, showing the Unicode escapes]

As you can see, there are a lot of "\u00.." occurrences in the rich text. I searched Google for a standard CFML solution to convert these character sequences, but found nothing for CFML. So I wrote the following function, which tries to convert the string as fast as possible:

<cffunction name="unicodeEscape" returntype="string" output="no">
<cfargument name="s" type="string" required="true" />
<!--- If no unicode-escapes are present in the string: return it unchanged --->
<cfif not find('\u', arguments.s)>
<cfreturn arguments.s />
</cfif>
<!--- If % is present in the string: url-encode it. Otherwise, urlDecode would choke on it --->
<cfif find('%', arguments.s)>
<cfset arguments.s = replace(arguments.s, '%', urlEncodedFormat('%'), 'all') />
</cfif>
<!--- urlDecode also turns + into a space, so protect literal plus signs.
This must run after the % replacement above. --->
<cfset arguments.s = replace(arguments.s, '+', '%2B', 'all') />
<!--- Single-byte characters (\u0000 - \u00FF) can be translated as %00-%FF --->
<cfset arguments.s = replace(arguments.s, "\u00", "%", "all") />
<!--- Higher characters (\u0100 - \uFFFF) can be translated as %01%00 - %FF%FF.
Only do this regex if there is something to replace.
The backslash is escaped (\\u) so the regex matches it literally, and the
NoCase variant also catches lowercase hex digits. --->
<cfif find('\u', arguments.s)>
<cfset arguments.s = rereplaceNoCase(arguments.s, "\\u([0-9A-F][0-9A-F])([0-9A-F][0-9A-F])", "%\1%\2", "all") />
</cfif>
<cfreturn urldecode(arguments.s) />
</cffunction>
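
The function above depends on which character encoding urlDecode assumes when it turns the %XX bytes back into characters. As an alternative, here is a minimal sketch that converts each escape directly to its code point with inputBaseN() and chr(). The function name decodeUnicodeEscapes is my own, hypothetical choice, and the sketch assumes every backslash-u in the input starts a real escape (it does not handle an escaped backslash such as \\u0041).

```cfml
<cffunction name="decodeUnicodeEscapes" returntype="string" output="no">
    <cfargument name="s" type="string" required="true" />
    <cfset var result = "" />
    <cfset var pos = 1 />
    <cfset var match = "" />
    <cfloop condition="true">
        <!--- Find the next \uXXXX sequence (4 hex digits, any case), starting at pos --->
        <cfset match = reFindNoCase("\\u[0-9a-f]{4}", arguments.s, pos, true) />
        <cfif match.pos[1] eq 0>
            <cfbreak />
        </cfif>
        <!--- Copy the plain text between the previous escape and this one --->
        <cfif match.pos[1] gt pos>
            <cfset result = result & mid(arguments.s, pos, match.pos[1] - pos) />
        </cfif>
        <!--- Convert the 4 hex digits after "\u" to a number, then to a character --->
        <cfset result = result & chr(inputBaseN(mid(arguments.s, match.pos[1] + 2, 4), 16)) />
        <!--- Continue right after the 6-character escape --->
        <cfset pos = match.pos[1] + 6 />
    </cfloop>
    <!--- Append whatever plain text follows the last escape --->
    <cfif pos lte len(arguments.s)>
        <cfset result = result & mid(arguments.s, pos, len(arguments.s) - pos + 1) />
    </cfif>
    <cfreturn result />
</cffunction>

<!--- Usage: decoded should contain "Café €5" --->
<cfset decoded = decodeUnicodeEscapes("Caf\u00e9 \u20AC5") />
```

Because nothing passes through urlDecode here, literal % and + characters need no special treatment. The trade-off is a loop with string concatenation, which will probably be slower on very long strings than a single replace call.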

If there is a built-in way in CFML to decode unicode-escapes, then please leave a comment. I'd be happy to learn :)
  1. David Epler

    #1 by David Epler - May 13, 2013 at 5:44 PM

    If you are running Railo 4.x+ or ColdFusion 10 there is a new function Canonicalize() which will properly decode encoded strings including unicode. Both utilize OWASP ESAPI to provide the functionality.
  2. Paul Klinkenberg

    #2 by Paul Klinkenberg - May 13, 2013 at 5:48 PM

    That's awesome, thanks David!
    (a bit ashamed to notice I haven't kept up with latest developments though)
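
A minimal sketch of David's suggestion, assuming ColdFusion 10 or Railo 4.x+ and the three-argument form canonicalize(inputString, restrictMultiple, restrictMixed). The variable names are hypothetical. Keep in mind that canonicalize() is built on ESAPI and decodes other encodings too (HTML entities and percent-encoding, for example), so it may change more of the string than just the \u escapes.

```cfml
<!--- Sample input containing a unicode escape (hypothetical data) --->
<cfset raw = "Caf\u00e9" />
<!--- Passing false for both flags means multiple or mixed encodings are decoded
      rather than rejected; use true if you prefer such input to be refused --->
<cfset decoded = canonicalize(raw, false, false) />
<cfoutput>#decoded#</cfoutput>
```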