[jira] [Created] (ARROW-3909) [Python] Table.from_pandas call that seemingly should zero copy does not
Wes McKinney created ARROW-3909:
-----------------------------------
Summary: [Python] Table.from_pandas call that seemingly should zero copy does not
Key: ARROW-3909
URL: https://issues.apache.org/jira/browse/ARROW-3909
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Wes McKinney
Fix For: 0.12.0
While doing some performance testing, I noticed that a {{Table.from_pandas}} call that ought to be zero-copy (and therefore essentially free) was taking about 50 ms:
{code}
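import pandas as pd
import pyarrow as pa
import numpy as np
K = 1000
N = 50000000
# One integer column of N values with no nulls: the values 0..K-1 repeated N // K times
df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
# Expected to wrap the existing NumPy memory rather than copy it
table = pa.Table.from_pandas(df)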
{code}
Timing the call, I see:
{code}
In [14]: timeit table = pa.Table.from_pandas(df)
51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{code}
I haven't determined what's going on yet (is it counting nulls?), and my initial attempts to get a flame graph produced a bunch of "unknown" entries.