Bulletproof json.dump?


On Mon, Jul 6, 2020 at 11:31 PM Jon Ribbens via Python-list
<python-list at python.org> wrote:
>
> On 2020-07-06, Chris Angelico <rosuav at gmail.com> wrote:
> > On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
> ><python-list at python.org> wrote:
> >> The 'json' module already fails to provide round-trip functionality:
> >>
> >>     >>> for data in ({True: 1}, {1: 2}, (1, 2)):
> >>     ...     if json.loads(json.dumps(data)) != data:
> >>     ...         print('oops', data, json.loads(json.dumps(data)))
> >>     ...
> >>     oops {True: 1} {'true': 1}
> >>     oops {1: 2} {'1': 2}
> >>     oops (1, 2) [1, 2]
> >
> > There's a fundamental limitation of JSON in that it requires string
> > keys, so this is an obvious transformation. I suppose you could call
> > that one a bug too, but it's very useful and not too dangerous. (And
> > then there's the tuple-to-list transformation, which I think probably
> > shouldn't happen, although I don't think that's likely to cause issues
> > either.)
>
> That's my point though - there's almost no difference between allowing
> encoding of tuples and allowing encoding of sets. Any argument against
> the latter would also apply against the former. The only possible excuse
> for the difference is "historical reasons", and given that it would be
> useful to allow it, and there would be no negative consequences, this
> hardly seems sufficient.
>
> >> No. I want a JSON encoder to output JSON to be read by a JSON decoder.
> >
> > Does it need to round-trip, though? If you stringify your datetimes,
> > you can't decode it reliably any more. What's the purpose here?
>
> It doesn't need to round trip (which as mentioned above is fortunate
> because the existing module already doesn't round trip). The main use
> I have, and I should imagine the main use anyone has, for JSON is
> interoperability - to safely store and send data in a format in which
> it can be read by non-Python code. If you need, say, date/times to
> be understood as date/times by the receiving code they'll have to
> deal with that explicitly already. Improving Python to allow sending
> them at least gets us part way there by eliminating half the work.
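
To make the interoperability point above concrete, here is a minimal sketch using the existing "default" hook of json.dumps. The helper name encode_extra and the sample payload are made up for illustration, and the choice of a sorted list for sets and an ISO 8601 string for datetimes is only one possible convention, not something the json module prescribes.

```python
import json
from datetime import datetime, timezone


def encode_extra(obj):
    """Hypothetical fallback for types json.dumps cannot encode by itself."""
    if isinstance(obj, (set, frozenset)):
        # Assumed convention: a set becomes a sorted JSON array.
        return sorted(obj)
    if isinstance(obj, datetime):
        # Assumed convention: a datetime becomes an ISO 8601 string.
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {
    "tags": {"a", "b"},
    "when": datetime(2020, 7, 6, 23, 31, tzinfo=timezone.utc),
}

text = json.dumps(payload, default=encode_extra)
decoded = json.loads(text)

# The receiver only sees a list and a string; turning the string back
# into a datetime is an explicit step on the receiving side.
when = datetime.fromisoformat(decoded["when"])
```

The sender's half of the work goes away, while the receiver still has to know which strings are date/times.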

That's fair.

Maybe what we need is to split the default JSON encoder into two,
or give it a "strict" flag (strict=True or strict=False). In non-strict
mode, round-tripping is not guaranteed, and various types will be folded
to each other - mainly, many built-in and stdlib types will be
represented as strings. In strict mode, compliance with the RFC is
ensured (so ValueError will be raised on inf/nan), and everything
should round-trip safely.
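
A rough sketch of what such a flag might look like, built only from options json.dumps already has. The wrapper name dumps_sketch and its "strict" parameter are hypothetical; allow_nan and default are real json.dumps arguments. Note that this covers only the RFC-compliance half of strict mode: it does nothing about tuples or non-string keys, so it does not by itself make everything round-trip.

```python
import json


def dumps_sketch(obj, strict=True):
    """Hypothetical wrapper illustrating a strict/non-strict split."""
    if strict:
        # allow_nan=False makes json.dumps raise ValueError on inf/nan,
        # which are not valid JSON according to the RFC.
        return json.dumps(obj, allow_nan=False)
    # Non-strict: anything the encoder does not know is folded to str().
    return json.dumps(obj, default=str)


try:
    dumps_sketch(float("inf"))
    strict_error = None
except ValueError as exc:
    strict_error = exc

loose = dumps_sketch({"s": {1}}, strict=False)
```

In this sketch the non-strict branch still emits Infinity/NaN tokens, since it leaves allow_nan at its default.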

I think that even in non-strict mode, round-tripping should be
achieved after one iteration. That is to say, anything you can
JSON-encode will JSON-decode to something that would create the same
encoded form. Not sure if there's anything that would violate that
(weak) guarantee.
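
The weak guarantee can be written down as a small check against the current json module. The function name stable_after_one_pass is made up for this sketch. The last sample is one case that appears to break the property with the stock encoder: an int key and a str key that collide once the int is stringified.

```python
import json


def stable_after_one_pass(obj):
    """True if decoding and re-encoding reproduces the first encoding."""
    first = json.dumps(obj)
    second = json.dumps(json.loads(first))
    return first == second


# The lossy cases from earlier in the thread, plus NaN.
samples = [{True: 1}, {1: 2}, (1, 2), float("nan")]
results = [stable_after_one_pass(s) for s in samples]

# Both keys are written out as "1", so the encoded object has a
# duplicate key and the decoder keeps only one of the two values.
colliding = {1: "a", "1": "b"}
collision_is_stable = stable_after_one_pass(colliding)
```

The samples from the thread all settle after one pass; the colliding-keys dict does not, because the first encoding contains two "1" members and the second only one.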

ChrisA