Fwd: keying by identity in dict and set
I'm sure that 99% of all use of 'dict' in Python is exactly this. The
vast majority of my own Python code is such, and that is as it should be.
Here I have an application where I can do something really cool and
useful, by keying on identity. The built-in Python structures are
pretty limited, but I'm making a package for use by other people, and
I strongly prefer them to use familiar Python structures with it,
rather than having to learn something new, and I strongly prefer to
use off-the-shelf, tested structures, rather than rolling my own.
I spent some days trying to make it work in more conventional ways ---
nothing worked as well, as cleanly.
I am not advocating this style of programming. I want the flexibility
to use it, when it is called for. Yes, it has its limitations, but
limitations are to be understood and worked with.
And it is now obvious to me that somebody went to great pains to make
sure the Python 'dict' does in fact support this. It works
splendidly. This is no accident.
It's instructive to compare Python's containers to the container
libraries of other languages, Java for instance. In Java, there are
many kinds of container classes, which permits finding one that is
optimal for a given application. In contrast, Python has only a
handful. But they are meant to be very flexible, and fairly well
optimized for most applications.
Yet the documentation *not only* suggests that 'dict' and 'set' cannot
be used for keying by identity, it gives no insight whatever into how
their internal hash algorithms use __hash__() and __eq__(). This
results in hundreds of postings by people who want to just do the
right thing, or who want to do something a little different, often
being answered by people who themselves scarcely understand what is
really going on. While researching this question, I found several
places where somebody asked about doing something like what I
described here, but they never got a useful answer. I also found posts
that advocate returning the value of id() in __hash__(), without
explaining how __eq__() should then be overloaded.
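
To make the pairing concrete, here is a minimal sketch of one way
__hash__() and __eq__() might be defined together for identity keying.
The wrapper class IdentityKey is hypothetical (it is not part of the
standard library, and not the package discussed here); it assumes the
wrapper keeps a reference to the wrapped object, so that the id()
value stays valid for as long as the key is in the dict.

```python
class IdentityKey:
    """Hypothetical wrapper: hashes and compares by the wrapped object's identity."""

    __slots__ = ("obj",)

    def __init__(self, obj):
        # Holding a reference keeps the object alive, so its id() is not reused.
        self.obj = obj

    def __hash__(self):
        return id(self.obj)

    def __eq__(self, other):
        if not isinstance(other, IdentityKey):
            return NotImplemented
        # Equal only when both wrappers refer to the very same object.
        return self.obj is other.obj


first = [1, 2, 3]
second = [1, 2, 3]  # equal to first by value, but a distinct object

table = {IdentityKey(first): "first", IdentityKey(second): "second"}

print(first == second)             # True
print(table[IdentityKey(first)])   # first
print(table[IdentityKey(second)])  # second
```

Because the two methods agree (equal keys always have equal hashes),
this stays within the documented contract for hashable objects, and it
also works for unhashable objects such as lists.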
A little documentation would have saved me personally days of work.
It would be helpful to know:
* under what conditions can one expect a "perfect hash", that is,
  one where __eq__() will never be called?
* is it sufficient to return the value of the key object's id()
  function to produce a perfect hash?
* when might it be useful to consider keying by identity?
* what are the limitations of programming this way?
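
The first two questions can at least be probed empirically. The sketch
below uses a hypothetical class, Node, whose __eq__() counts its own
calls. As far as I understand the CPython implementation, a dict
lookup tests identity and the stored hash value before falling back
to __eq__(), so I would expect the count to stay at zero here; that is
an implementation detail, not something the documentation promises.

```python
class Node:
    """Hypothetical identity-keyed class, instrumented to count __eq__ calls."""

    eq_calls = 0

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return id(self)

    def __eq__(self, other):
        Node.eq_calls += 1
        return self is other


a = Node("x")
b = Node("x")  # same label as a, but a distinct object

d = {a: 1, b: 2}
found = (d[a], d[b])

print(found)          # (1, 2)
print(Node.eq_calls)  # number of __eq__ calls made by the inserts and lookups
```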
On Sat, Oct 26, 2019 at 7:17 AM dieter <dieter at handshake.de> wrote:
> Steve White <stevan.white at gmail.com> writes:
> > Regarding my question
> > "Is there some fatal reason that this approach must never be
> > used, besides the lack of documentary support for it?"
> > It finally dawned on me that there is a use-case for containers that
> > would preclude using object identity for keys. That is, if the object
> > is to be serialized, or otherwise stored past the run-time of the
> > program. Of course, all the identities (all the id() values) will be
> > meaningless once the current run ends.
> One motivation for basing dict key management on equality
> (rather than identity) is literals:
> Consider a dict "d" with at some place
> `d["my example key"] = 1` and at a different place
> (maybe another function, another module) you access
> `d["my example key"]`. You would expect to get `1`
> as the result, since to your eyes the two literals are equal.
> Were the key management based on identity,
> you could get either the expected `1` or a `KeyError`.
> The reason: Python does not manage (most) literals globally;
> this means, if you use the same literal in different places
> you may (or may not) have non-identical objects.
> Basing it on equality, you are also more flexible than
> with identity, because you can change the equality
> rules for a class while you cannot change the identity rules.
> Thus, if you need identity based key management,
> define your `__eq__` accordingly.
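
The point about equal but non-identical strings can be seen in a short
sketch. The names key_a, key_b and by_id are illustrative only. The
strings are built at run time with str.join() so that they are
separate objects; whether two identical literals in source code share
one object depends on the interpreter and on where they appear.

```python
# Build two equal strings at run time so that they are separate objects.
key_a = "".join(["my ", "example ", "key"])
key_b = "".join(["my ", "example ", "key"])

print(key_a == key_b)  # True
print(key_a is key_b)  # False

# An ordinary dict keys on equality, so either string finds the entry.
d = {key_a: 1}
print(d[key_b])  # 1

# A table keyed on id() only recognizes the original object.
by_id = {id(key_a): 1}
print(id(key_b) in by_id)  # False
```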