Jinja and non-ASCII characters (was Re: Prepare accented characters for HTML)


Peter J. Holzer wrote:

> On 2019-03-29 12:56:00 +0100, Thomas Jollans wrote:
>> On 29/03/2019 12.39, Tony van der Hoff wrote:
>> > Running in browser:
>> > http://localhost/~tony/private/home/learning/jinja/minimal/minimal.py
>> > 
>> > In apache2.access.log:
>> 
>> So it's running in apache!
>> 
>> Now the question is what apache is doing. Is it running it as a CGI
>> script? Is it doing something clever for Python files (maybe involving
>> Python 2?)
>> 
>> ... wild guess: if the script is running as CGI in an environment with an
>> ASCII-using "C" locale, with Python 3.5, you wouldn't be able to print
>> non-ASCII characters by default. I think. In any case I remember reading
>> about this problem (if this is the problem) being fixed in a newer
>> version of Python.
> 
> This is very likely correct. I also had this problem with the default
> Apache configuration on Debian, which explicitly sets LANG=C (Edit
> /etc/apache2/envvars to change this).
> 
> The behaviour can be easily reproduced on the command line:
> 
> hrunkner:~/tmp 15:27 :-) 1021% ./annee
> Content-type: text/html
> 
> 
> French: année
> 3.5.3 (default, Sep 27 2018, 17:25:39)
> [GCC 6.3.0 20170516]
> 
> hrunkner:~/tmp 15:27 :-) 1022% echo $LANG
> en_US.UTF-8
> 
> hrunkner:~/tmp 15:34 :-) 1023% LANG=C
> 
> hrunkner:~/tmp 15:34 :-) 1024% ./annee
> Content-type: text/html
> 
> 
> Traceback (most recent call last):
>   File "./annee", line 6, in <module>
>     print(Template("French: {{french}}").render({"french": "ann\xe9e"}))
> UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 11: ordinal not in range(128)
> 
> This was fixed(?) in Python 3.7.
> 
> hp

You could try specifying the output encoding with the PYTHONIOENCODING
environment variable:

$ LANG=C python3 -c "print('ann\xe9e')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
$ PYTHONIOENCODING=UTF-8 LANG=C python3 -c "print('ann\xe9e')"
année
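
If you would rather not depend on the environment Apache hands to the
script, another option is to choose the encoding inside the script itself.
What follows is only a sketch, not something taken from the original
minimal.py: the helper name utf8_text_stream and the "charset=utf-8" header
are my own assumptions. It wraps the raw byte stream behind sys.stdout in an
io.TextIOWrapper with a fixed UTF-8 encoding, which should also work on
Python 3.5:

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Return a text stream that encodes everything written to it as UTF-8.

    Hypothetical helper for illustration. newline="\n" keeps line endings
    untranslated so the output is the same on every platform.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


def main():
    # sys.stdout.buffer is the byte stream underneath the text-mode stdout,
    # so the locale-derived encoding (ASCII under LANG=C) is bypassed.
    out = utf8_text_stream(sys.stdout.buffer)
    # Assumption: the page declares the same charset the script emits.
    out.write("Content-type: text/html; charset=utf-8\n\n")
    out.write("French: ann\xe9e\n")
    out.flush()
    # Detach so the wrapper does not close sys.stdout.buffer when it is
    # garbage collected.
    out.detach()


if __name__ == "__main__":
    main()
```

With this in place the script should emit the same bytes whether LANG is C
or en_US.UTF-8, because the encoding no longer comes from the locale.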