

[jira] [Created] (ARROW-3020) Addition of option to allow empty row groups in pyarrow


Alex Mendelson created ARROW-3020:
-------------------------------------

             Summary: Addition of option to allow empty row groups in pyarrow
                 Key: ARROW-3020
                 URL: https://issues.apache.org/jira/browse/ARROW-3020
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++, Python
            Reporter: Alex Mendelson


While our use case is uncommon, I found one related request from roughly a year ago, linked below. Could this be added as a feature?

https://issues.apache.org/jira/browse/PARQUET-1047

*Motivation*

We have an application where each row is associated with one of N contexts, though a minority of contexts may have no associated rows. When we encounter the nth context, we want to retrieve all of its associated rows. Row groups would provide a natural index into the data, because the nth context would map directly to the nth row group.
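As a plain-Python sketch of the intended layout (the function group_rows_by_context and the sample rows are hypothetical, not part of pyarrow), each bucket below stands in for one Parquet row group. Looking up bucket n returns exactly the rows of context n, and a context with no rows still occupies its position as an empty bucket.

```python
def group_rows_by_context(rows, num_contexts):
    """Bucket (context_id, value) rows into one list per context.

    Bucket n holds the rows for context n. Contexts with no rows get an
    empty list, so bucket positions always line up with context ids.
    """
    buckets = [[] for _ in range(num_contexts)]
    for context_id, value in rows:
        buckets[context_id].append(value)
    return buckets


# Hypothetical sample data: context 1 has no rows.
rows = [(0, "a"), (0, "b"), (2, "c"), (3, "d")]
groups = group_rows_by_context(rows, num_contexts=4)
print(groups)     # [['a', 'b'], [], ['c'], ['d']]
print(groups[2])  # ['c']
```

In this layout, writing each bucket as its own row group would let a reader fetch context n by row-group index alone.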

Unfortunately, this is not currently possible, because pyarrow does not support writing empty row groups. If a pyarrow.Table containing zero rows is written with pyarrow.parquet.ParquetWriter, it is omitted from the final file, which shifts the index of every subsequent row group and distorts the mapping.
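The distortion can be reproduced without pyarrow. This is a simulation, not pyarrow code: drop_empty_groups is a hypothetical stand-in for the behavior described above, where a zero-row table produces no row group. Once the empty group for context 1 is dropped, every later context is read from the wrong position.

```python
def drop_empty_groups(groups):
    """Simulate the reported writer behavior: empty groups are not written."""
    return [g for g in groups if g]


# One group per context; context 1 has no rows.
groups = [["a", "b"], [], ["c"], ["d"]]
written = drop_empty_groups(groups)

print(len(written))  # 3 row groups for 4 contexts
print(written[2])    # ['d'], although context 2's rows are ['c']
```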



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)