fileinput


Pascal wrote:

> I have a small Python (3.7.4) script that should open a log file and
> display its content, but as you can see, an encoding error occurs:
> 
> -----------------------
> 
> import fileinput
> import sys
> try:
>     source = sys.argv[1:]
> except IndexError:
>     source = None
> for line in fileinput.input(source):
>     print(line.strip())
> 
> -----------------------
> 
> python3.7.4 myscript.py myfile.log
> Traceback (most recent call last):
> ...
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> invalid continuation byte
> 
> python3.7.4 myscript.py < myfile.log
> Traceback (most recent call last):
> ...
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> invalid continuation byte
> 
> -----------------------
> 
> I added the encoding hook to work around the error, but now the script
> behaves differently depending on how the input is supplied:
> 
> -----------------------
> 
> import fileinput
> import sys
> try:
>     source = sys.argv[1:]
> except IndexError:
>     source = None
> for line in fileinput.input(source,
> openhook=fileinput.hook_encoded("utf-8", "ignore")):
>     print(line.strip())
> 
> -----------------------
> 
> python3.7.4 myscript.py myfile.log
> first line of myfile.log
> ...
> last line of myfile.log
> 
> python3.7.4 myscript.py < myfile.log
> Traceback (most recent call last):
> ...
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> invalid continuation byte
> 
> python3.7.4 myscript.py /dev/stdin < myfile.log
> first line of myfile.log
> ...
> last line of myfile.log
> 
> python3.7.4 myscript.py - < myfile.log
> Traceback (most recent call last):
> ...
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> invalid continuation byte
> 
> -----------------------
> 
> Does anyone have an explanation and/or a solution?

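'-' or no argument tells fileinput to read from sys.stdin. That stream is
already open in text mode and decoded with Python's default I/O encoding, so
the open hook is never called for it. (/dev/stdin is an ordinary filename as
far as fileinput is concerned, which is why the hook does apply there.)
You can override the default encoding and error handler by setting the
environment variable

PYTHONIOENCODING=UTF8:ignore

before running the script.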
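
If you would rather not depend on the environment, one possible alternative
is to have fileinput hand back raw bytes and do the decoding yourself. The
sketch below is only an illustration: decoded_lines is a made-up helper name,
and it assumes that "ignore" is really the error handling you want for the
bad bytes. With mode="rb", fileinput reads from sys.stdin.buffer for '-' or
an empty file list, so files and stdin go through the same decoding step.

```python
import fileinput


def decoded_lines(files=None, encoding="utf-8", errors="ignore"):
    """Yield stripped text lines from the given files, or from stdin.

    Hypothetical helper, not part of fileinput itself.
    """
    # mode="rb" makes fileinput yield bytes. For "-" or an empty file list
    # it then reads sys.stdin.buffer rather than the already-decoded
    # sys.stdin, so no open hook is needed.
    with fileinput.FileInput(files=files, mode="rb") as stream:
        for raw in stream:
            # Decode each line ourselves so the same error handler applies
            # to named files and to stdin alike.
            yield raw.decode(encoding, errors).strip()
```

In the original script this would replace the loop with something like
"for line in decoded_lines(sys.argv[1:]): print(line)". On Python 3.7 and
later, calling sys.stdin.reconfigure(errors="ignore") before the loop should
be another way to get a similar effect for stdin only.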