git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

on sorting things


On 20/12/2019 18:59, Peter Otten wrote:
> Chris Angelico wrote:
>
>> On Sat, Dec 21, 2019 at 5:03 AM Peter Otten <__peter__ at web.de> wrote:
>>> PS: If you are sorting files by size and checksum as part of a
>>> deduplication effort consider using dict-s instead:
>> Yeah, I'd agree if that's the purpose. But let's say the point is to
>> have a guaranteed-stable ordering of files that are primarily to be
>> sorted by file size - in order to ensure that two files are in the
>> same order every time you refresh the view, they get sorted by their
>> checksums.
> One thing that struck me about Eli's example is that it features two key
> functions rather than a complex comparison.
>
> If sort() would accept a sequence of key functions each function could be
> used to sort slices that compare equal when using the previous key.

You don't need a sequence of key functions : the sort algorithm used in 
Python (tim-sort) is stable - which means if two items (A &B) are in a 
given order in the sequence before the sort starts, and A & B compare 
equal during the sort, then after the sort A & B retain their ordering.

So if you want to sort by file size as the primary and then by checksum 
if file sizes are equal - you sort by checksum first, and then by file 
size: this guarantees that the items will always be in file size order - 
and if file sizes are equal then they will be ordered by checksum.

The rule to remember - is sort in the reverse order of criteria.

>
>> There ARE good reasons to do weird things with sorting, and a custom
>> key object (either with cmp_to_key or directly implemented) can do
>> that.
> Indeed.
>