git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

keying by identity in dict and set


Hi Peter,

Yes you are right.  In fact, I shouldn't even have mentioned the
hash() function... it came from a line of reasoning about what an
implementation might do if very large integers were returned by
__hash__(), and some remarks about the value returned by id() applied
to small integers.

The point is, I don't think __eq__() is ever called in a situation as
described in my post, yet the Python documentation states that if
instances are to be used as keys, it must not be used to determine if
non-identical instances are equivalent.  This puts developers like me
in a rather bad position.

The code seems to work as it is --- but we don't want to put out code
that contravenes the documentation.

The options for following the documentation in this situation are:
either subject users to unfamiliar, custom-made container classes, or
give up the semantics of the "==" operator.

It seems so unnecessary, given (my understanding of) how the classes
actually function.

What can be done?  I would be willing to consider a technical solution.

I would like to see the documentation revised to reflect the actual
workings of the existing container classes.  A couple of sentences in
the documentation of __hash__ could fix the problem.

Is that too much to hope for?  How would I proceed to get somebody to
look at this?

Thanks!

On Sun, Oct 20, 2019 at 5:16 PM Peter Otten <__peter__ at web.de> wrote:
>
> Steve White wrote:
>
> > Hi Chris,
> >
> > Yes, I am aware of the hash of small integers.  But I am not keying
> > with small integers here: I am keying with id() values of class
> > instances.
>
> The id() values /are/ smallish integers though.
>
> (I would guess that this is baked into the CPython source, but did not
> actually check.)
>
> > Precisely what my example shows is that the dict/set algorithms in
> > fact *never* call __eq__, when the id() of a class instance is
> > returned by __hash__ (in the implementations of Python I have tested).
> > Please try the code yourself.  Tell me what I am missing.
>
> I'd state that a bit differently:
>
> (0) Key objects in dicts/sets are only compared if they have the same hash
> value.
>
> The comparison works in two steps:
>
> (1) Are both objects the same (a is b --> True)? Then assume they are equal.
>
> (2) Are they different objects (a is b --> False)? Take the slow path and
> actually invoke __eq__
>
> With a sufficiently weird equality:
>
> >>> class A:
> ...     def __eq__(self, other): return False
> ...
> >>> a = A()
> >>> items = [a]
> >>> a in items
> True
> >>> [v for v in items if v == a]
> []
>
> As you can see the hash is not involved (not even defined).
>
> > What "other problems"?  Please provide an example!
> >
> > Thanks!
> >
> > On Sat, Oct 19, 2019 at 9:02 PM Chris Angelico <rosuav at gmail.com> wrote:
> >>
> >> On Sun, Oct 20, 2019 at 3:08 AM Steve White <stevan.white at gmail.com>
> >> wrote:
> >> > It would appear that if __hash__ returns the id, then that id is used
> >> > internally as the key, and since the id is by definition unique, no
> >> > key collision ever occurs -- at least in every Python implementation
> >> > I've tried. It also seems that, for a class instance obj,
> >> >     hash( hash( obj ) ) == hash( obj )
> >> >     hash( id( obj ) ) == id( obj )
> >> > These are very strong and useful properties.  Where are they
> >> > documented?
> >>
> >> There are two rules that come into play here. One is that smallish
> >> integers use their own value as their hash (so hash(1)==1 etc); the
> >> other is that dictionaries actually look for something that matches on
> >> identity OR equality. That's why using identity instead of equality
> >> will appear to work, even though it can cause other problems when you
> >> mismatch them.
> >>
> >> ChrisA
> >> --
> >> https://mail.python.org/mailman/listinfo/python-list
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list