

[jira] [Created] (ARROW-3933) pyarrow segfault reading Parquet files from GNOMAD


David Konerding created ARROW-3933:
--------------------------------------

             Summary: pyarrow segfault reading Parquet files from GNOMAD
                 Key: ARROW-3933
                 URL: https://issues.apache.org/jira/browse/ARROW-3933
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
         Environment: Ubuntu 18.04 or Mac OS X
            Reporter: David Konerding


I am getting a segfault when running a basic program on an Ubuntu 18.04 VM (AWS). The error also occurs out of the box on Mac OS X.

$ sudo snap install --classic google-cloud-sdk
$ gsutil cp gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet .
$ conda install pyarrow
$ python test.py
Segmentation fault (core dumped)

test.py:

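import pyarrow.parquet as pq

# One part file of the gnomAD exomes release 2.0.2 sites VDS, copied locally with gsutil
path = "part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"

# Reading the whole file kills the interpreter with SIGSEGV
pq.read_table(path)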
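Because a segfault takes down the whole interpreter, a try/except around read_table cannot catch it. One way to turn the reproduction into a check that can run unattended is to execute it in a child Python process and inspect how that process ended. The sketch below is an illustration only, not part of the original report: it assumes Python 3.5+ on a POSIX system (the report itself used Python 2.7), and the names REPRO and died_from_signal are hypothetical.

```python
import signal
import subprocess
import sys
from typing import Optional

# Same body as test.py; the file name is the one from the report.
REPRO = """
import pyarrow.parquet as pq
pq.read_table("part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet")
"""


def died_from_signal(code: str) -> Optional[str]:
    """Run `code` in a fresh Python interpreter (hypothetical helper).

    Returns the name of the signal that killed the child (e.g. "SIGSEGV"),
    or None if it exited normally or with an ordinary error status.
    """
    proc = subprocess.run([sys.executable, "-c", code],
                          stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # On POSIX, a negative return code -N means the child was killed by signal N.
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return None
```

On an affected install, died_from_signal(REPRO) would be expected to report "SIGSEGV". A fixed build should return None, or None after an ordinary Python exception if the file is missing.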

gdb output:

Thread 3 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdf199700 (LWP 13703)]
0x00007fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, unsigned long*) () from /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
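The faulting frame is parquet::arrow::StructImpl::GetDefLevels. That suggests, though the trace alone does not confirm it, that the crash happens while reading a nested (struct) column. A possible way to narrow this down is to read each top-level column on its own, with each read in its own interpreter, so that one crash does not stop the scan. The column list could come from the file's schema, for example pq.read_schema(path).names. This is a hypothetical sketch assuming Python 3.5+ on POSIX; crashing_columns is an illustrative name, not a pyarrow API.

```python
import signal
import subprocess
import sys
from typing import List

# Child-process snippet: read a single top-level column of the given file.
_READ_ONE = (
    "import sys, pyarrow.parquet as pq\n"
    "pq.read_table(sys.argv[1], columns=[sys.argv[2]])\n"
)


def crashing_columns(path: str, columns: List[str]) -> List[str]:
    """Return the subset of `columns` whose individual read segfaults.

    Ordinary failures (missing file, ImportError, etc.) exit with a positive
    status and are not counted; only death by SIGSEGV is reported.
    """
    crashed = []
    for name in columns:
        # With `python -c code a b`, the child sees sys.argv == ["-c", a, b].
        proc = subprocess.run([sys.executable, "-c", _READ_ONE, path, name],
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if proc.returncode == -signal.SIGSEGV:
            crashed.append(name)
    return crashed
```

If only the struct-typed columns show up in the result, that would support the GetDefLevels reading of the trace. The result would still need to be confirmed against the actual schema.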

I also tested fastparquet, and it reads the same file without problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)