
Re: [Format] Pointer types / span types

Yes, my first reaction to both of these requests is:
- would dictionary-encoding work?
- would a List<T> work?

I think the analogy is clearer for the former. For the latter, a List technically encodes start and stop indices with a single offset array rather than with separate arrays for start and stop indices. Is there a reason an offset array wouldn't work for the OAMap use case, though?
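
To make the question concrete, here is a minimal sketch of how the two encodings relate. It uses plain Python lists rather than Arrow itself, and the function names are hypothetical. An offset array always converts to start/stop arrays; the reverse only works when the spans are contiguous and in order, which is presumably the case to check for OAMap.

```python
def offsets_to_spans(offsets):
    """Split a List-style offset array (length n + 1) into start/stop arrays."""
    return list(offsets[:-1]), list(offsets[1:])


def spans_to_offsets(starts, stops):
    """Return a single offset array, or None if the spans cannot be
    expressed that way (overlapping, reordered, or with gaps)."""
    if len(starts) != len(stops):
        raise ValueError("starts and stops must have the same length")
    if not starts:
        return [0]
    for i in range(len(starts)):
        if starts[i] > stops[i]:
            return None  # negative-length span
        if i + 1 < len(starts) and stops[i] != starts[i + 1]:
            return None  # next span does not begin where this one ends
    return [starts[0]] + list(stops)


starts, stops = offsets_to_spans([0, 2, 2, 5])
roundtrip = spans_to_offsets(starts, stops)
not_contiguous = spans_to_offsets([0, 3, 1], [2, 5, 1])
```

If OAMap's start/stop arrays can overlap or appear out of order, `spans_to_offsets` returns None and an offset array alone would not be enough.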


On 04/30/2018 04:55 PM, Antoine Pitrou wrote:
Actually, "pointer type" might just be another name for "dictionary type".



On 30/04/2018 at 22:08, Antoine Pitrou wrote:

Today I got the opportunity to talk with Jim Pivarski, the main
developer on the OAMap project (*).  Under the hood, he is doing
something not unlike the Arrow representation of nested arrays: he
stores and processes structured data as linear arrays, allowing very
fast processing on seemingly irregular data (in Arrow parlance, think
something like lists of lists of structs).  It seems that OAMap data
requires two kinds of logical types that Arrow lacks:

- a pointer type, where a physical array of ints is used to represent
indices into another array (the logical value being of course the value
pointed to)
- a span type, where two physical arrays of ints are used to represent
start and stop indices into another array (the logical value being the
list of values delimited by the start / stop indices)
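
As a rough illustration of the two logical types described above, here is a sketch with plain Python lists standing in for the physical arrays. The function names are hypothetical, and this is neither OAMap's nor Arrow's API.

```python
def resolve_pointers(indices, target):
    """Pointer type: the logical value at position i is target[indices[i]]."""
    return [target[i] for i in indices]


def resolve_spans(starts, stops, target):
    """Span type: the logical value at position i is the list
    target[starts[i]:stops[i]]."""
    if len(starts) != len(stops):
        raise ValueError("starts and stops must have the same length")
    return [target[a:b] for a, b in zip(starts, stops)]


values = [10.0, 20.0, 30.0, 40.0, 50.0]

# Indices may repeat and appear in any order.
pointer_result = resolve_pointers([3, 0, 0, 4], values)

# Spans may be out of order, leave gaps, or be empty.
span_result = resolve_spans([0, 3, 1], [2, 5, 1], values)
```

Here `pointer_result` is `[40.0, 10.0, 10.0, 50.0]` and `span_result` is `[[10.0, 20.0], [40.0, 50.0], []]`.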

Has such a feature request come up before?  Is this something we should
add to our roadmap or future wishlist?