How to decode UTF strings?


In comp.lang.python, DFS  <nospam at dfs.com> wrote:
> On 10/25/2019 10:57 PM, MRAB wrote:
>> Here's a simple example, based on your code:
>> 
>> from email.header import decode_header
>> 
>> def test(header, default_encoding='utf-8'):
>>     parts = []
>> 
>>     for data, encoding in decode_header(header):
>>         if isinstance(data, str):
>>             parts.append(data)
>>         else:
>>             parts.append(data.decode(encoding or default_encoding))
>> 
>>     print(''.join(parts))
>> 
>> test('=?iso-8859-9?b?T/B1eg==?= <oguz.ismail.uysal at gmail.com>')
>> test('=?utf-8?Q?=EB=AF=B8?= <taeyeon10006 at gmail.com>')
>> test('=?GBK?B?0Pu66A==?= <xuan.alan at 163.com>')
>> test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= <vergos.nikolas at gmail.com>')
> I don't think it's working:

It's close. Just change ''.join to ' '.join.
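
For illustration, here is a minimal sketch of that change. The function name decode_sender is made up for this example, and stripping each part before joining is my own assumption: some Python versions keep the whitespace around the undecoded part and others drop it, so stripping first should avoid doubled spaces either way. It returns the string instead of printing it.

```python
from email.header import decode_header

def decode_sender(header, default_encoding='utf-8'):
    """Decode an RFC 2047 header and join the pieces with single spaces.

    decode_sender is a hypothetical name used only for this sketch.
    """
    parts = []
    for data, encoding in decode_header(header):
        if isinstance(data, bytes):
            # Encoded words (and, on newer Pythons, the plain parts too)
            # come back as bytes; decode with the declared charset if any.
            data = data.decode(encoding or default_encoding)
        data = data.strip()  # assumption: normalize surrounding whitespace
        if data:
            parts.append(data)
    return ' '.join(parts)
```

With the first test header from the quoted code, this should return the name followed by a single space and the address in angle brackets.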

> $ python decode_utf.py
> O?uz<oguz.ismail.uysal at gmail.com>
> ???<taeyeon10006 at gmail.com>
> ????<xuan.alan at 163.com>
> ?????????? ????????????<vergos.nikolas at gmail.com>

Is your terminal UTF-8? I think not.
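
A quick way to check is to ask Python what encoding it thinks stdout uses. The sketch below also shows one way a '?' can end up in the output when the target encoding cannot represent a character. The helper name render_as is hypothetical, and this is only a simulation, not necessarily what happened in the run quoted above.

```python
import sys

def render_as(text, encoding):
    """Simulate writing `text` through `encoding`.

    Characters the encoding cannot represent are replaced with '?'.
    render_as is a hypothetical helper for this sketch.
    """
    return text.encode(encoding, errors='replace').decode(encoding)

# The encoding Python will use when printing to the terminal.
print('stdout encoding:', sys.stdout.encoding)

# U+011F (g with breve) has no ASCII equivalent, so it becomes '?'.
print(render_as('O\u011fuz', 'ascii'))  # prints O?uz
```

If the reported encoding is not UTF-8, setting the PYTHONIOENCODING environment variable to utf-8 before running the script is one thing to try.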

Elijah
------
answered with C code to do this in comp.lang.c