python3, regular expression and bytes text


12.10.19 21:08, Eko palypse wrote:
> So how can I make it work with utf8 encoded text?

You cannot. First, \w with re.LOCALE works only when the text is encoded 
with the locale encoding (cp1252 in your case). Second, re.LOCALE 
supports only 8-bit charsets, so even setting a UTF-8 locale would not 
help.

Regular expressions with re.LOCALE are also slow. It may be more 
efficient to decode the text and use a Unicode (str) regular expression.
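
A minimal sketch of the decode-then-match approach, assuming the input 
is valid UTF-8. The sample text and the variable names are made up for 
illustration and are not taken from your code:

```python
import re

# Hypothetical sample: UTF-8 encoded bytes with non-ASCII letters.
raw = "na\u00efve caf\u00e9 Gr\u00f6\u00dfe".encode("utf-8")

# A bytes pattern without re.LOCALE treats \w as ASCII-only, so
# multi-byte UTF-8 sequences split the words.
ascii_only = re.findall(rb"\w+", raw)

# Decode first, then match with a str pattern. For str patterns \w is
# Unicode-aware by default in Python 3, so no flag is needed.
text = raw.decode("utf-8")
words = re.findall(r"\w+", text)

# Encode the matches again only if bytes are needed downstream.
words_as_bytes = [w.encode("utf-8") for w in words]

print(ascii_only)
print(words)
```

If the real data may contain invalid UTF-8, decode() accepts an errors 
argument (for example errors="replace"); whether that is acceptable 
depends on your data.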