[jira] [Created] (ARROW-2966) Data type conversion error


Christopher Brooks created ARROW-2966:
-----------------------------------------

             Summary: Data type conversion error
                 Key: ARROW-2966
                 URL: https://issues.apache.org/jira/browse/ARROW-2966
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: linux
            Reporter: Christopher Brooks


I have a large pandas DataFrame. When I try to convert it to a pyarrow Table, the conversion fails with an error. I'm not sure whether this is a bug or expected behavior.

I realize the code below showing the error is pretty useless as is. *What can I do to help identify the cause in my pandas dataframe?*

Here's the error:

 
{code:python}
In [17]: pa.Table.from_pandas(df)
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-17-6eac5d0eec08> in <module>()
----> 1 pa.Table.from_pandas(df)

table.pxi in pyarrow.lib.Table.from_pandas()

~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads)
375 arrays = list(executor.map(convert_column,
376 columns_to_convert,
--> 377 convert_types))
378 
379 types = [x.type for x in arrays]

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.time())

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
423 raise CancelledError()
424 elif self._state == FINISHED:
--> 425 return self.__get_result()
426 
427 self._condition.wait(timeout)

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
54 
55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
57 except BaseException as exc:
58 self.future.set_exception(exc)

~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, ty)
364 
365 def convert_column(col, ty):
--> 366 return pa.array(col, from_pandas=True, type=ty)
367 
368 if nthreads == 1:

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Double: Got Python object of type str but can only handle these types: float

In [18]: pa.__version__
Out[18]: '0.9.0'

In [19]: pd.__version__
Out[19]: '0.23.3'

{code}
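
One way to narrow this down, given that the message says a str was found where a float was expected, is to look for object-dtype columns that hold more than one Python type. The following is only a sketch: the helper name find_mixed_type_columns and the sample columns are hypothetical and are not part of pyarrow or of the reporter's data.

```python
import pandas as pd


def find_mixed_type_columns(df):
    """Return {column: {type_name: count}} for object-dtype columns
    whose values belong to more than one Python type."""
    mixed = {}
    for name, col in df.items():
        if col.dtype != object:
            # Columns with a concrete dtype (float64, int64, ...) are uniform.
            continue
        # Count how many values of each Python type the column holds.
        counts = col.map(lambda v: type(v).__name__).value_counts()
        if len(counts) > 1:
            mixed[name] = counts.to_dict()
    return mixed


# Hypothetical data: "score" mixes floats with a stray string.
df = pd.DataFrame({
    "score": [1.5, "n/a", 2.0],
    "name": ["a", "b", "c"],
    "count": [1, 2, 3],
})
report = find_mixed_type_columns(df)
print(report)
```

Any column reported this way is a candidate for the failure. Depending on the intended meaning of the data, it could be cleaned before conversion, for example with pd.to_numeric(df[col], errors="coerce") to turn non-numeric entries into NaN, or with astype(str) to make the whole column text.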
 


