Re: [Format] Pointer types / span types

It sounds like the "span" type could be implemented as a composite of
multiple Arrow arrays / schemas:

array 1 (data)
any schema

array 2 (view)
struct <
  start: int64,
  stop: int64
>

Unless I'm missing something, this feels like an application-level
concern rather than something that needs to be addressed in the
columnar format / metadata.
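Here is a minimal plain-Python sketch of that composite. It does not use the Arrow libraries. `Span` and `materialize` are hypothetical names for illustration only, and it assumes `stop` is exclusive, which the thread does not say. The data array holds the values, and the view array holds one (start, stop) struct per logical element. Resolving one against the other is entirely up to the application.

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Span:
    # One row of the hypothetical "view" array: struct<start: int64, stop: int64>.
    # Assumption: the range is half-open, i.e. [start, stop).
    start: int
    stop: int


def materialize(data: Sequence[T], view: Sequence[Span]) -> List[List[T]]:
    """Resolve every span in the view array against the shared data array."""
    result: List[List[T]] = []
    for span in view:
        if not 0 <= span.start <= span.stop <= len(data):
            raise IndexError(f"{span} is out of bounds for data of length {len(data)}")
        result.append(list(data[span.start:span.stop]))
    return result


# "array 1 (data)": any values; "array 2 (view)": spans into it.
data = [10, 20, 30, 40, 50]
view = [Span(3, 5), Span(0, 2), Span(1, 4)]  # out of order and overlapping
print(materialize(data, view))  # [[40, 50], [10, 20], [20, 30, 40]]
```

Nothing in the sketch needs support from the format. The rule that `view` indexes into `data` lives only in application code.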

On Tue, May 1, 2018 at 9:43 AM, Antoine Pitrou <antoine@xxxxxxxxxx> wrote:
> IIUC, the point is to have different logical views over the same data.
> So you could have e.g. a "sorted" view.  You could also have a view
> spanning a tiny fraction of the original data (you can probably also
> encode that with a null bitmap, but if most values are nulls that is
> less efficient).
> Regards
> Antoine.
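Both kinds of view can be made concrete with a small plain-Python sketch. The function names are hypothetical, and `stop` is again assumed to be exclusive. A sorted view here is one length-1 span per element, listed in sorted order, so the data itself is never reordered. The last two functions give a rough size comparison for a sparse view that picks 3 of 1,000,000 values. Spans cost 16 bytes each, while a validity bitmap costs one bit per slot. The comparison ignores buffer padding and alignment.

```python
from typing import List, Sequence, Tuple


def sorted_view(data: Sequence[int]) -> List[Tuple[int, int]]:
    """One length-1 span per element, ordered so the view reads the data sorted."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    return [(i, i + 1) for i in order]


def read_view(data: Sequence[int], view: Sequence[Tuple[int, int]]) -> List[int]:
    # Concatenate the spans in view order; assumes half-open [start, stop) ranges.
    return [x for start, stop in view for x in data[start:stop]]


def span_view_bytes(num_spans: int) -> int:
    # Two int64 columns (start, stop) -> 16 bytes per span.
    return 16 * num_spans


def validity_bitmap_bytes(length: int) -> int:
    # One bit per logical slot, rounded up to whole bytes (no padding/alignment).
    return (length + 7) // 8


data = [30, 10, 50, 20, 40]
view = sorted_view(data)
print(view)                   # [(1, 2), (3, 4), (0, 1), (4, 5), (2, 3)]
print(read_view(data, view))  # [10, 20, 30, 40, 50]

# Sparse view: keep 3 values out of 1,000,000.
print(span_view_bytes(3), validity_bitmap_bytes(1_000_000))  # 48 125000
```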
> On 01/05/2018 at 15:24, Brian Hulette wrote:
>> Yeah, I see that difference. I guess my question was really: is there a
>> reason not to rearrange the actual list data so that an offsets array
>> will work?
>> Perhaps they actually want to be able to specify lists with overlap? Or
>> maybe there is meaning to the original order of the list data? I suppose
>> the latter seems more likely.
>> Brian
>> On 04/30/2018 05:42 PM, Antoine Pitrou wrote:
>>> On 30/04/2018 at 23:39, Brian Hulette wrote:
>>>> Yes, my first reaction to both of these requests is:
>>>> - would dictionary-encoding work?
>>>> - would a List<T> work?
>>>> For the former, I think the analogy is clearer. For the latter, a List
>>>> technically encodes start and stop indices with a single offsets array
>>>> rather than separate arrays for start and stop indices. Is there a
>>>> reason an offsets array wouldn't work for the OAMap use case, though?
>>> With an offsets array, spans (lists) are contiguous: span N + 1 starts
>>> off where span N stops.  With separate start/stops array, they needn't
>>> be: the logical array can "walk" the physical array in any order.
>>> Regards
>>> Antoine.
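The contiguity point can be checked mechanically. This plain-Python sketch uses hypothetical helper names, and the offsets behave like the List<T> offsets described in the thread. An offsets array always converts to start/stop pairs, but the reverse only works when each span begins where the previous one stopped. `rearrange` is one way to do the rearranging Brian asks about: it copies each span into a fresh buffer. That works even for overlapping spans, but shared values get duplicated and the result no longer points into the original data in its original order.

```python
from typing import List, Optional, Sequence, Tuple


def offsets_to_spans(offsets: Sequence[int]) -> List[Tuple[int, int]]:
    """An offsets array of length n + 1 always yields n back-to-back (start, stop) pairs."""
    return list(zip(offsets[:-1], offsets[1:]))


def spans_to_offsets(spans: Sequence[Tuple[int, int]]) -> Optional[List[int]]:
    """Return an equivalent offsets array, or None if the spans are not contiguous."""
    if not spans:
        return [0]
    offsets = [spans[0][0]]
    for start, stop in spans:
        if start != offsets[-1] or stop < start:
            return None  # span N + 1 does not start where span N stopped
        offsets.append(stop)
    return offsets


def rearrange(data, spans):
    """Copy each span into a new contiguous buffer so plain offsets can describe it.

    Values shared by overlapping spans are duplicated.
    """
    values, offsets = [], [0]
    for start, stop in spans:
        values.extend(data[start:stop])
        offsets.append(len(values))
    return values, offsets


data = ["a", "b", "c", "d", "e"]
print(offsets_to_spans([0, 2, 5]))          # [(0, 2), (2, 5)]
print(spans_to_offsets([(0, 2), (2, 5)]))   # [0, 2, 5]

walk = [(3, 5), (0, 2), (1, 4)]             # out of order and overlapping
print(spans_to_offsets(walk))               # None
print(rearrange(data, walk))  # (['d', 'e', 'a', 'b', 'b', 'c', 'd'], [0, 2, 4, 7])
```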