
[Python-Dev] Proposal: dict.with_values(iterable)

Hi, all.

I propose adding a new method: dict.with_values(iterable)

# Motivation

Python is often used to handle data.
While dict is not an efficient way to handle many records, it is still
a convenient one.

When creating many dicts with the same keys, dict needs to
look up its internal hash table while inserting each key.

This is a costly operation.  If we can reuse the keys of an existing dict,
we can skip this insertion cost.

Additionally, we have the "Key-Sharing Dictionary" (PEP 412).
When all keys are strings, many dicts can share a single set of keys.
This reduces memory consumption.
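
To illustrate the key sharing mentioned above: in CPython, the attribute
dicts of instances of one class are the usual place where PEP 412 applies.
The sketch below compares the reported size of such a dict with an ordinary
dict holding the same items.  The Record class is made up for this example,
and the exact sizes depend on the CPython version and platform, so no
numbers are shown.

```python
import sys


class Record:
    """Hypothetical record type; its instances get a key-sharing __dict__."""

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


# Attribute dict of an instance: keys can be shared across instances.
shared = Record(1, 2, 3).__dict__
# Ordinary dict with the same items: owns its own key table.
plain = {"a": 1, "b": 2, "c": 3}

shared_size = sys.getsizeof(shared)
plain_size = sys.getsizeof(plain)
print("instance __dict__:", shared_size, "bytes")
print("plain dict:       ", plain_size, "bytes")
```

On CPython builds with key sharing, the instance dict typically reports
the smaller size, because the key table is not counted once per dict.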

This might be usable for:

* csv.DictReader
* namedtuple._asdict()
* DB-API 2.0 implementations:  (e.g. DictCursor of mysqlclient-python)
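
As a rough illustration of the csv.DictReader-style use case, the sketch
below builds a template dict once from the header row and then creates one
dict per data row from it.  The with_values() function here is only a
hypothetical pure-Python stand-in for the proposed method; it rebuilds the
hash table for every row, so it shows the usage pattern, not the speedup.

```python
import csv
import io


def with_values(template, values):
    # Hypothetical stand-in for the proposed template.with_values(values).
    values = list(values)
    if len(values) != len(template):
        raise ValueError("row length does not match header length")
    return dict(zip(template, values))


data = io.StringIO("id,name,lang\n1,alice,py\n2,bob,c\n")
reader = csv.reader(data)

# The template is built once from the header row and reused for every row.
template = dict.fromkeys(next(reader))
rows = [with_values(template, row) for row in reader]
print(rows)
```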

# Draft implementation

pull request:

with_values(self, iterable, /)
    Create a new dictionary with keys from this dict and values from iterable.

    When the length of iterable is different from len(self), ValueError is raised.
    This method does not support dict subclasses.
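
The semantics described above can be emulated in pure Python, roughly as
follows.  This is only a sketch: with_values() is written as a hypothetical
free function, the TypeError for subclasses is an assumption (the text above
only says subclasses are not supported), and the emulation has none of the
speed or key-sharing benefits of the draft implementation.

```python
def with_values(template, iterable):
    """Emulate the described dict.with_values() semantics in pure Python."""
    if type(template) is not dict:
        # Assumption: the proposal does not say which error is raised here.
        raise TypeError("template must be an exact dict, not a subclass")
    values = tuple(iterable)
    if len(values) != len(template):
        raise ValueError(
            f"expected {len(template)} values, got {len(values)}"
        )
    # Keys come from the template (in its order), values from the iterable.
    return dict(zip(template, values))


keys = dict.fromkeys("abcdefg")
d = with_values(keys, range(7))
print(d)
```

The template dict is left unchanged, so it can be reused for any number of
new dicts.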

## Memory usage (Key-Sharing dict)

>>> import sys
>>> keys = tuple("abcdefg")
>>> keys
('a', 'b', 'c', 'd', 'e', 'f', 'g')
>>> d = dict(zip(keys, range(7)))
>>> d
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
>>> sys.getsizeof(d)

>>> keys = dict.fromkeys("abcdefg")
>>> d = keys.with_values(range(7))
>>> d
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
>>> sys.getsizeof(d)

## Speed

$ ./python -m perf timeit -o zip_dict.json \
    -s 'keys = tuple("abcdefg"); values=[*range(7)]' \
    'dict(zip(keys, values))'

$ ./python -m perf timeit -o with_values.json \
    -s 'keys = dict.fromkeys("abcdefg"); values=[*range(7)]' \
    'keys.with_values(values)'

$ ./python -m perf compare_to zip_dict.json with_values.json
Mean +- std dev: [zip_dict] 935 ns +- 9 ns -> [with_values] 109 ns +- 2 ns: 8.59x faster (-88%)

What do you think?
Any comments are appreciated.

Inada Naoki  <songofacandy at>