Richard Jones' Log: One of these things makes sense, the other doesn't

Wed, 26 May 2004

Rachel figured out today why kids eat Clag paste.

On the other hand, I'm quite confused as to why unicode(u'', 'UTF-8') should raise an error.

Comment by Fredrik on Wed, 26 May 2004

From the library reference:

unicode( [object[, encoding [, errors]]]) /.../ If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding.

u'' isn't an 8-bit string. '' is.
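A minimal sketch of that distinction, written for Python 3 on the assumption that it is easier to run today: there `unicode()` became `str()` and the 8-bit string became `bytes`, and the same rule still holds. The helper name `try_decode` is made up for illustration.

```python
# Python 3 analogue: str() plays the role of unicode(),
# and bytes plays the role of the 8-bit string.

def try_decode(obj, encoding="utf-8"):
    """Return the decoded text, or the TypeError message if obj cannot be decoded."""
    try:
        # Passing an encoding asks str() to *decode* obj.
        return str(obj, encoding)
    except TypeError as exc:
        return "TypeError: %s" % exc

decoded = try_decode(b"")   # a byte string can be decoded
rejected = try_decode("")   # already text, so there is nothing to decode

print(repr(decoded))
print(rejected)
```

The second call fails for the same reason as `unicode(u'', 'UTF-8')`: the argument is already decoded text, so asking for a decode is treated as an error rather than a no-op.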

Comment by Martijn Faassen on Wed, 26 May 2004

It's actually quite common for Python programmers not to have a clue about the basics of Unicode. I was the same way a few years ago, and I've run into a lot of others. By making Unicode easier, Python in some ways makes it more confusing, as Unicode errors may pop up deep in your application depending on the input. (For other reasons, though, the current approach is reasonably sane.)
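A rough sketch of how such an input-dependent error can surface. Python 2 decoded byte strings as ASCII implicitly whenever they were mixed with unicode strings; the code below makes that decode explicit so it also runs on Python 3. The function `greet` and the sample names are hypothetical.

```python
def greet(name_bytes):
    # Python 2 performed this ASCII decode implicitly when a byte
    # string was concatenated with a unicode string.
    return u"Hello, " + name_bytes.decode("ascii")

# Pure-ASCII input works, so the problem goes unnoticed in testing.
ok = greet(b"Rachel")

# Non-ASCII input (UTF-8 bytes for "Zoe" with a diaeresis) fails at the same line.
try:
    greet(b"Zo\xc3\xab")
    failed = False
except UnicodeDecodeError:
    failed = True

print(ok)
print(failed)
```

Nothing in the code changes between the two calls; only the data does, which is why the failure tends to appear far from where the bytes first entered the program.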

At Infrae we wrote some text about all this in the context of Silva (our CMS). I'll mail you a copy. It may be too simplistic for you, since you may already know all this, but perhaps it'll help.