
Re: Peak memory usage for pyarrow.parquet.read_table


Hello Bryant,

Are you using any options on `pyarrow.parquet.read_table`, or possibly a `to_pandas` call afterwards?

Uwe
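
A minimal sketch of the kind of options being asked about (the file path and column names below are hypothetical, not from the thread):

    import pyarrow.parquet as pq

    # Reading only the columns that are actually needed keeps the
    # resulting Table (and any later DataFrame) much smaller.
    table = pq.read_table("data.parquet", columns=["id", "text"])

    # Converting to pandas builds a second, pandas-backed copy of the data,
    # so the peak during this call can roughly double the footprint.
    df = table.to_pandas()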

On Wed, Apr 25, 2018, at 7:27 PM, Bryant Menn wrote:
> I tried reading a Parquet file (<200MB, lots of text with snappy) using
> read_table and saw the memory usage peak over 8GB before settling back down
> to ~200MB. This surprised me as I was expecting to be able to handle a
> Parquet file of this size with much less RAM (doing some processing with
> smaller VMs).
> 
> I am not sure if this is expected, but I thought I might check with everyone
> here and learn something new. Poking around, it seems to be related to
> ParquetReader.read_all?
> 
> Thanks in advance,
> Bryant
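
A minimal sketch of one way to keep the peak lower while investigating this, assuming a hypothetical file name: read row group by row group with `ParquetFile` instead of pulling the whole file in a single `read_table`/`read_all` call.

    import pyarrow.parquet as pq

    pf = pq.ParquetFile("big_text.parquet")  # hypothetical path

    # Each read_row_group call materializes only one row group as a Table,
    # so only that slice (plus decompression buffers) is in memory at once.
    for i in range(pf.num_row_groups):
        chunk = pf.read_row_group(i)
        # ... process `chunk`, then drop the reference before the next group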


