[Python-Dev] Replacement for array.array('u')?


Internally, CPython has _PyUnicodeWriter, an efficient way to create a
string by appending substrings or characters. _PyUnicodeWriter changes
its internal storage format depending on the code points of the
characters (ASCII or Latin-1: 1 byte/character, BMP: 2 bytes/character,
full UCS: 4 bytes/character). I once tried to expose it in Python, but
I wasn't convinced by the performance. The overhead of method calls was
quite significant, and I wasn't convinced by the performance of
"writer += str" either. Maybe I should try again. PyPy also has such an
object. It avoids the need for the "str += str" hack in ceval.c, which
exists to avoid very poor performance (_PyUnicodeWriter also uses
overallocation, which can be controlled with multiple parameters to
reduce the number of reallocs).
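
The same width-dependent storage can be observed on ordinary str
objects from Python. The snippet below is only a rough sketch: it
relies on sys.getsizeof, assumes CPython 3.3 or later, and the helper
name bytes_per_char is made up for illustration. On such an
interpreter it should report 1, 1, 2 and 4 bytes per character.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate the per-character storage of a str made only of `ch`.

    Hypothetical helper: compares the size of a 1-character string
    with the size of an (n + 1)-character string. CPython-specific.
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),
    ("BMP", "\u20ac"),
    ("non-BMP", "\U0001F600"),
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```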

Another alternative would be to add a "strarray" type, similar to the
bytes/bytearray pair.
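
No such type exists today. As a rough approximation, a mutable
sequence of code points can be kept in an array.array of unsigned
integers. This is only a sketch of the idea, not a proposed API: the
names to_codepoints and to_text are hypothetical, and each element is
a code point (an int), not a 1-character str.

```python
from array import array

# "I" is only guaranteed to be 2 bytes wide; fall back to "L"
# (at least 4 bytes) so that every code point up to U+10FFFF fits.
TYPECODE = "I" if array("I").itemsize >= 4 else "L"

def to_codepoints(text):
    """Return a mutable array of the code points in `text`."""
    return array(TYPECODE, map(ord, text))

def to_text(codepoints):
    """Build a str back from an array of code points."""
    return "".join(map(chr, codepoints))

buf = to_codepoints("h\xe9llo \u20ac\U0001F600")
buf[0] = ord("J")          # in-place mutation, like bytearray
print(to_text(buf))
```

Unlike _PyUnicodeWriter, this always spends the same number of bytes
per character, whatever the string contains.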

Is this what you are looking for? Or do you really need the array.array API?


On Fri, 22 Mar 2019 at 08:38, Greg Ewing <greg.ewing at> wrote:
> A poster on comp.lang.python is asking about array.array('u').
> He wants an efficient mutable collection of unicode characters
> that can be initialised from a string.
> According to the docs, the 'u' code is deprecated and will be
> removed in 4.0, but no alternative is suggested.
> Why is this being deprecated, instead of keeping it and making
> it always 32 bits? It seems like useful functionality that can't
> be easily obtained another way.
> --
> Greg

Night gathers, and now my watch begins. It shall not end until my death.