fileinput


On Saturday, 26 October 2019 at 17:49:57 UTC+2, Peter Otten wrote:
> Pascal wrote:
> 
> > I have a small Python (3.7.4) script that should open a log file and
> > display its content, but as you can see, an encoding error occurs:
> > 
> > -----------------------
> > 
> > import fileinput
> > import sys
> > try:
> >     source = sys.argv[1:]
> > except IndexError:
> >     source = None
> > for line in fileinput.input(source):
> >     print(line.strip())
> > 
> > -----------------------
> > 
> > python3.7.4 myscript.py myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> > 
> > python3.7.4 myscript.py < myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> > 
> > -----------------------
> > 
> > I added the encoding hook to overcome the error, but now the script
> > behaves differently depending on the input used:
> > 
> > -----------------------
> > 
> > import fileinput
> > import sys
> > try:
> >     source = sys.argv[1:]
> > except IndexError:
> >     source = None
> > for line in fileinput.input(source,
> > openhook=fileinput.hook_encoded("utf-8", "ignore")):
> >     print(line.strip())
> > 
> > -----------------------
> > 
> > python3.7.4 myscript.py myfile.log
> > first line of myfile.log
> > ...
> > last line of myfile.log
> > 
> > python3.7.4 myscript.py < myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> > 
> > python3.7.4 myscript.py /dev/stdin < myfile.log
> > first line of myfile.log
> > ...
> > last line of myfile.log
> > 
> > python3.7.4 myscript.py - < myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> > 
> > -----------------------
> > 
> > Does anyone have an explanation and/or a solution?
> 
> Passing '-' or no argument tells fileinput to use sys.stdin. That stream is
> already text, decoded using Python's default I/O encoding, so the open hook
> is never called.
> You can override the default encoding by setting the environment variable
> 
> PYTHONIOENCODING=UTF8:ignore

Yes, I just found this issue about it: https://bugs.python.org/issue26756

This modified script works in all cases:

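import fileinput
import io
import sys

# Slicing never raises IndexError; an empty list makes fileinput read sys.stdin.
source = sys.argv[1:]

# fileinput does not call the open hook for sys.stdin, so re-wrap the raw
# byte stream with the same encoding and error handling as the hook.
sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8', errors='ignore')

for line in fileinput.input(source, openhook=fileinput.hook_encoded('utf-8', 'ignore')):
    print(line.strip())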
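
To see what the errors='ignore' wrapper does to a stray byte such as 0xe8, here is a minimal sketch that stands in for stdin with an in-memory byte stream. The sample bytes and the read_lines helper are made up for illustration, and the guess that the log might really be Latin-1 is only an assumption (0xe8 is "è" in that encoding). The sketch passes encoding='utf-8' explicitly; without it, TextIOWrapper falls back to the locale's preferred encoding.

```python
import io

# Hypothetical sample data: 0xe8 followed by a space is not valid UTF-8,
# which is what produces "invalid continuation byte".
raw = b"first line\ncaf\xe8 au lait\nlast line\n"


def read_lines(data, errors):
    """Decode bytes as UTF-8 through a TextIOWrapper, as done for sys.stdin."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return [line.strip() for line in stream]


# The default error handler ("strict") fails on the bad byte.
try:
    read_lines(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "ignore" silently drops the undecodable byte.
ignored = read_lines(raw, "ignore")

# "replace" keeps a visible U+FFFD marker where the byte was.
replaced = read_lines(raw, "replace")

# If the file is actually Latin-1 (an assumption), decoding it as such
# loses nothing.
latin1 = raw.decode("latin-1").splitlines()

print(strict_failed)
print(ignored)
print(replaced)
print(latin1)
```

Note that 'ignore' loses data without any warning. If the log's real encoding is known, passing that encoding to both the wrapper and hook_encoded is probably the safer choice.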

Thanks for the tip!