Re: [Py] writing 2- or 4-byte decimal columns to Parquet


That's right. Shrinking happens here:
https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L808-L809
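
To make the "shrinking" concrete, here is a minimal sketch in plain Python of how the byte width could be derived from the decimal precision. It is not the parquet-cpp code linked above. The function names (min_decimal_bytes, unscaled_value, encode_decimal) are hypothetical, and the sketch assumes the layout described in the Parquet LogicalTypes document for FIXED_LEN_BYTE_ARRAY decimals: the unscaled integer stored as big-endian two's complement. It only handles finite values.

```python
from decimal import Decimal


def min_decimal_bytes(precision: int) -> int:
    """Smallest signed two's-complement width (in bytes) that can hold
    every unscaled value with the given number of decimal digits."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    max_unscaled = 10 ** precision - 1
    n = 1
    # A signed n-byte integer holds values up to 2**(8n - 1) - 1.
    while (1 << (8 * n - 1)) - 1 < max_unscaled:
        n += 1
    return n


def unscaled_value(value: Decimal, scale: int) -> int:
    """Return value * 10**scale as an exact integer (finite values only)."""
    sign, digits, exponent = value.as_tuple()
    shift = exponent + scale
    if shift < 0:
        raise ValueError("value has more fractional digits than the scale allows")
    magnitude = int("".join(map(str, digits))) * 10 ** shift
    return -magnitude if sign else magnitude


def encode_decimal(value: Decimal, precision: int, scale: int) -> bytes:
    """Encode a decimal as fixed-width big-endian two's-complement bytes."""
    unscaled = unscaled_value(value, scale)
    if abs(unscaled) > 10 ** precision - 1:
        raise ValueError("value does not fit in the given precision")
    return unscaled.to_bytes(min_decimal_bytes(precision), "big", signed=True)


if __name__ == "__main__":
    for p in (9, 18, 38):
        print(p, min_decimal_bytes(p))
    print(encode_decimal(Decimal("123.45"), precision=5, scale=2).hex())
```

Under these assumptions, a precision of up to 9 digits fits in 4 bytes and up to 18 digits fits in 8 bytes, which matches the 32- and 64-bit widths discussed in this thread. The physical type is still a fixed-length byte array, not INT32 or INT64.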

On Thu, Apr 19, 2018 at 9:40 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> We do "shrink" the input 128-bit decimals to the smallest number of
> bytes that fits, though, is that right?
>
>
> https://github.com/apache/parquet-cpp/blob/c405bf36506ec584e8009a6d53349277e600467d/src/parquet/arrow/schema.cc#L635
>
> On Thu, Apr 19, 2018 at 8:09 AM, Phillip Cloud <cpcloud@xxxxxxxxx> wrote:
> > Hi Colin,
> >
> > Only 128-bit decimal writing is supported right now. Feel free to open a
> > JIRA about this.
> >
> > On Wed, Apr 18, 2018, 19:10 Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> >
> >> hi Colin,
> >>
> >> Phillip Cloud is the expert on this topic, but I believe we only
> >> support writing decimals to the FIXED_LEN_BYTE_ARRAY physical type in
> >> Parquet right now:
> >>
> >> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L798
> >>
> >> The size of the type depends on the decimal precision, so if we can
> >> write to 32- or 64-bit, then we do that. Writing to INT32 or INT64
> >> would be more complicated and require some work in parquet-cpp
> >>
> >> - Wes
> >>
> >> On Wed, Apr 18, 2018 at 7:04 PM, Colin Nichols <colin@xxxxxxxxxxxx> wrote:
> >> > Hi all,
> >> >
> >> > Any thoughts on the below? I did a little more code browsing and I'm not
> >> > sure this is supported right now. Should I open a Jira ticket?
> >> >
> >> > - Colin
> >> >
> >> > On Tue, Apr 17, 2018 at 11:11 PM, Colin Nichols <colin@xxxxxxxxxxxx> wrote:
> >> >
> >> >> Hi there,
> >> >>
> >> >> I know (py)arrow has the decimal128() type, and using this type it's easy
> >> >> to take an array of Python Decimals, convert to a pa.array, and write out
> >> >> to Parquet.
> >> >>
> >> >> In the absence (afaict) of decimal32 and decimal64 types, is it possible
> >> >> to go from an array of Decimals (with compatible precision/scale) and write
> >> >> them to a Parquet column of 32- or 64-bit width?
> >> >>
> >> >> Relevant Parquet spec --
> >> >> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> >> >>
> >> >> I'm looking to add this functionality to the project Spectrify, as AWS
> >> >> Redshift Spectrum will not query unnecessarily-wide DECIMAL columns --
> >> >> https://github.com/hellonarrativ/spectrify/issues/14
> >> >>
> >> >> Thanks,
> >> >> Colin
> >> >>
> >> >>
> >>
>
