Re: [Py] writing 2- or 4-byte decimal columns to Parquet


We do "shrink" the input 128-bit decimals to the smallest number of
bytes that fits, though. Is that right?

https://github.com/apache/parquet-cpp/blob/c405bf36506ec584e8009a6d53349277e600467d/src/parquet/arrow/schema.cc#L635
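[Editor's note: the sizing logic Wes points at can be sketched in Python. This is a paraphrase of what parquet-cpp's schema conversion does, not its actual code; the helper name `min_bytes_for_precision` is made up.]

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest FIXED_LEN_BYTE_ARRAY width that holds a decimal of this precision."""
    # An n-byte FIXED_LEN_BYTE_ARRAY holds a signed two's-complement integer,
    # so its largest value is 2**(8*n - 1) - 1; that range must cover the
    # largest unscaled value for the precision, 10**precision - 1.
    max_unscaled = 10 ** precision - 1
    for n in range(1, 17):  # decimal128 needs at most 16 bytes
        if max_unscaled <= 2 ** (8 * n - 1) - 1:
            return n
    raise ValueError("precision > 38 does not fit in 16 bytes")
```

For example, precision 9 fits in 4 bytes and precision 18 in 8 bytes, which matches the INT32/INT64 thresholds in the Parquet spec.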

On Thu, Apr 19, 2018 at 8:09 AM, Phillip Cloud <cpcloud@xxxxxxxxx> wrote:
> Hi Colin,
>
> Only 128-bit decimal writing is supported right now. Feel free to open a
> JIRA about this.
>
> On Wed, Apr 18, 2018, 19:10 Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>
>> hi Colin,
>>
>> Phillip Cloud is the expert on this topic, but I believe we only
>> support writing decimals to FIXED_LEN_BYTE_ARRAY physical type in
>> Parquet right now
>>
>>
>> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L798
>>
>> The size of the type depends on the decimal precision, so if we can
>> write to 32- or 64-bit, then we do that. Writing to INT32 or INT64
>> would be more complicated and require some work in parquet-cpp
>>
>> - Wes
>>
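[Editor's note: Wes says an INT32/INT64 path would need work in parquet-cpp. To illustrate what such a path would store: Parquet's DECIMAL logical type always encodes the unscaled integer, whatever the physical type. A sketch, with a made-up helper name:]

```python
import decimal

def to_unscaled(value: decimal.Decimal, scale: int) -> int:
    # DECIMAL(precision, scale) stores unscaled = value * 10**scale;
    # readers reconstruct value = unscaled * 10**-scale.
    # Assumes the value has no more fractional digits than the scale.
    return int(value.scaleb(scale))

# Decimal("12.34") at scale=2 would be stored as the integer 1234,
# which fits an INT32 whenever the precision is at most 9.
```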
>> On Wed, Apr 18, 2018 at 7:04 PM, Colin Nichols <colin@xxxxxxxxxxxx> wrote:
>> > Hi all,
>> >
>> > Any thoughts on the below?  I did a little more code browsing and I'm not
>> > sure this is supported right now, should I open a Jira ticket?
>> >
>> > - Colin
>> >
>> > On Tue, Apr 17, 2018 at 11:11 PM, Colin Nichols <colin@xxxxxxxxxxxx> wrote:
>> >
>> >> Hi there,
>> >>
>> >> I know (py)arrow has the decimal128() type, and using this type it's easy
>> >> to take an array of Python Decimals, convert to a pa.array, and write out
>> >> to Parquet.
>> >>
>> >> In the absence (afaict) of decimal32 and decimal64 types, is it possible
>> >> to go from an array of Decimals (with compatible precision/scale) and write
>> >> them to a parquet column of 32- or 64-bit width?
>> >>
>> >> Relevant parquet spec -- https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
>> >>
>> >> I'm looking to add this functionality to the project Spectrify, as AWS
>> >> Redshift Spectrum will not query unnecessarily-wide DECIMAL columns --
>> >> https://github.com/hellonarrativ/spectrify/issues/14
>> >>
>> >> Thanks,
>> >> Colin
>> >>
>> >>
>>
