Answer a question

original question: i got a StringIO object, how can i convert it into BytesIO?

update: The more general question is, how to convert a binary (encoded) file-like object into decoded file-like object in python3?

the naive approach i got is:

import io
sio = io.StringIO('wello horld')
bio = io.BytesIO(sio.read().encode('utf8'))
print(bio.read())  # prints b'wello horld'

is there more efficient and elegant way of doing this? the above code just reads everything into memory, encodes it instead of streaming the data in chunks.

for example, for the reverse question (BytesIO -> StringIO) there exist a class - io.TextIOWrapper which does exactly that (see this answer)

Answers

It's interesting that though the question might seem reasonable, it's not that easy to figure out a practical reason why I would need to convert a StringIO into a BytesIO. Both are basically buffers and you usually need only one of them to make some additional manipulations either with the bytes or with the text.

I may be wrong, but I think your question is actually how to use a BytesIO instance when some code to which you want to pass it expects a text file.

In which case, it is a common question and the solution is codecs module.

The two usual cases of using it are the following:

Compose a File Object to Read

In [16]: import codecs, io

In [17]: bio = io.BytesIO(b'qwe\nasd\n')

In [18]: StreamReader = codecs.getreader('utf-8')  # here you pass the encoding

In [19]: wrapper_file = StreamReader(bio)

In [20]: print(repr(wrapper_file.readline()))
'qwe\n'

In [21]: print(repr(wrapper_file.read()))
'asd\n'

In [26]: bio.seek(0)
Out[26]: 0

In [27]: for line in wrapper_file:
    ...:     print(repr(line))
    ...:
'qwe\n'
'asd\n'

Compose a File Object to Write To

In [28]: bio = io.BytesIO()

In [29]: StreamWriter = codecs.getwriter('utf-8')  # here you pass the encoding

In [30]: wrapper_file = StreamWriter(bio)

In [31]: print('жаба', 'цап', file=wrapper_file)

In [32]: bio.getvalue()
Out[32]: b'\xd0\xb6\xd0\xb0\xd0\xb1\xd0\xb0 \xd1\x86\xd0\xb0\xd0\xbf\n'

In [33]: repr(bio.getvalue().decode('utf-8'))
Out[33]: "'жаба цап\\n'"
Logo

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐