Saturday 19 August 2006

UTF Hell

Well, the posts are now imported. The following Perl one-liner was a lifesaver:
perl -C -pe 's/([^\x00-\x7f])/sprintf("&#%d;", ord($1))/ge;'
It converts non-ASCII characters to XML numeric character references. The MT XMLRPC daemon wasn't too keen on accepting files with UTF-8 chars (although that was probably the fault of the command-line poster I'm using...). The one-liner was found at: http://www.cl.cam.ac.uk/~mgk25/unicode.html#perl
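
For anyone without Perl handy, here is a rough Python sketch of the same idea. It is not what I actually ran for the import, just an equivalent under the assumption that the input is already decoded text; the function name to_numeric_entities is made up for illustration.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with an XML numeric
    character reference such as &#233;, leaving ASCII untouched."""
    # The built-in "xmlcharrefreplace" error handler emits &#NNN;
    # (decimal code point) for each character ASCII cannot encode.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Small demonstration: U+00E9 (e with acute accent) is code point 233.
print(to_numeric_entities("caf\u00e9"))  # prints caf&#233;
```

Like the Perl version, this leaves existing ampersands and angle brackets alone, so it only deals with the non-ASCII problem, not with general XML escaping.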