keying by identity in dict and set


Steve White <stevan.white at gmail.com> writes:
> Yes, there are several options, but they all involve an extra layer
> that detracts between the interface I am building and my user's code.

Do the wrapping behind the application interface.

And in a private email you told me that this is for a very
special case -- which likely means "rarely used".
For such cases, a bit of "detraction" might be acceptable
(much more acceptable than a counter-intuitive behaviour
of a standard data type).

> In this situation, the objects being used as keys are conceptually the
> unique entities that the user deals with, even if their data is
> non-unique.  And I do not want to subject the user to the un-pythonic
> use of some operator other than '==' to determine their equivalence.

One possibility would be to have a special class for those
objects which implements both "__eq__" and "__hash__" via the
"id" function.
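
A minimal sketch of such a class, in case it helps. The names
"IdentityKeyed", "data" and "registry" are made up for illustration
and are not taken from your code:

```python
class IdentityKeyed:
    """Hypothetical wrapper: equality and hashing both follow identity."""

    def __init__(self, data):
        # The payload may be non-unique (and even unhashable).
        self.data = data

    def __eq__(self, other):
        # Two references compare equal only if they name the same object.
        return self is other

    def __hash__(self):
        # Consistent with __eq__: one object, one hash value.
        return id(self)


a = IdentityKeyed([1, 2])
b = IdentityKeyed([1, 2])  # same data, but a different entity

# Both objects become separate keys although their data is equal.
registry = {a: "first", b: "second"}
```

With this, the user keeps writing '==' and gets identity semantics.
Note that "id" values are only guaranteed to be unique among objects
that exist at the same time; objects held as "dict" keys stay alive,
so this does not affect the mapping itself.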

> As near as I can tell, returning the id() in __hash__() results in a
> perfect hash key.  I really want to know if that is true.

Such a "__hash__" implementation is almost surely perfect
(unless your objects' lifetime exceeds the process lifetime),
**BUT** nevertheless define "__eq__" in the same way.
This is because the hash value is not used directly to determine
a "dict" slot: to conserve space, the "dict" size determines the
number of available slots. Therefore, even with a perfect
hash, two keys (with different hash values) may end up in the same
"dict" slot. It is an implementation detail whether Python
distinguishes keys in the same slot using only "__eq__" or both
"__hash__" and "__eq__" (implementation detail means that different
Python implementations may behave differently). Thus, follow
the documentation and define "__eq__" and "__hash__" in a compatible
way.
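
To illustrate the slot argument, here is a deliberately simplified
model. It assumes a table whose size is a power of two and a slot
chosen from the low bits of the hash; "slot_index" and "TABLE_SIZE"
are made-up names, and real implementations add probing and resizing
on top of this:

```python
def slot_index(hash_value, table_size):
    # Simplified model: table_size is a power of two and the low
    # bits of the hash select the slot.
    return hash_value & (table_size - 1)


TABLE_SIZE = 8  # assumed size of a small table

# Two different hash values ...
h1, h2 = 1, 9

# ... that nevertheless map to the same slot in this model.
same_slot = slot_index(h1, TABLE_SIZE) == slot_index(h2, TABLE_SIZE)
```

So distinct hash values alone do not guarantee distinct slots; the
table still needs some comparison to tell the keys apart.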