Re: Peak memory usage for pyarrow.parquet.read_table
I am not. Should I be? I forgot to mention earlier that the Parquet file
came from Spark/PySpark.
On Wed, Apr 25, 2018 at 1:32 PM Uwe L. Korn <uwelk@xxxxxxxxxx> wrote:
> Hello Bryant,
> are you using any options on `pyarrow.parquet.read_table` or a possible
> `to_pandas` afterwards?
> On Wed, Apr 25, 2018, at 7:27 PM, Bryant Menn wrote:
> > I tried reading a Parquet file (<200MB, lots of text with snappy) using
> > read_table and saw the memory usage peak over 8GB before settling back
> > to ~200MB. This surprised me as I was expecting to be able to handle a
> > Parquet file of this size with much less RAM (doing some processing with
> > smaller VMs).
> > I am not sure if this is expected, but I thought I might check with
> > everyone here and learn something new. Poking around, it seems to be
> > related to ParquetReader.read_all?
> > Thanks in advance,
> > Bryant