September 14, 2007
Encoding an output stream in Python
Sometimes you may need to encode an output stream. Maybe you're writing to a file and you need it in a certain encoding, or to cStringIO.StringIO, which only takes byte strings. Or basically when you're dealing with the dreaded UnicodeEncodeError.
You can avoid this by wrapping your output stream in a class (i.e. a decorator) that transparently encodes Unicode strings into the expected encoding.
class OutStreamEncoder(object):
"""Wraps a stream with an encoder
"""
def __init__(self, outstream, encoding=None):
self.out = outstream
if not encoding:
self.encoding = sys.getfilesystemencoding()
else:
self.encoding = encoding
def write(self, obj):
"""Wraps the output stream, encoding Unicode
strings with the specified encoding"""
if isinstance(obj, unicode):
self.out.write(obj.encode(self.encoding))
else:
self.out.write(obj)
def __getattr__(self, attr):
"""Delegate everything but write to the stream"""
return getattr(self.out, attr)
An example use:
>>> out = si()
>>> nihongo = unicode("日本語", "sjis")
>>> print >> out, nihongo
Traceback (most recent call last):
File "<pyshell#40>", line 1, in <module>
print >> out, nihongo
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2:
ordinal not in range(128)
>>> out = OutStreamEncoder(out, "utf-8")
>>> print >> out, nihongo
>>> val = out.getvalue()
>>> print val.decode("utf-8")
日本語
>>>
The linked zip file (streamencode.zip) has a streamencode module with the OutStreamEncoder class, and a unit test module.