[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Sync Call Notes

Hello Antoine,

comments inline.

On Thu, Apr 19, 2018, at 2:21 PM, Antoine Pitrou wrote:
> On Thu, 19 Apr 2018 07:17:34 -0400
> Alex Hagerman <alex@xxxxxxxxxxxxxxxxx> wrote:
> > Notes from yesterdays sync call:
> > 
> > Uwe suggested adding in checks for the C++ ABI to detect breaking 
> > changes. Discussed adding this to a CI build job daily.
> > 
> > Wes asked if certain C++ symbols could be marked experimental when 
> > performing the C++ ABI checks.
> > 
> > Uwe also mentioned the potential of using PIMPLs to hide pointers and 
> > implementation to prevent future C++ ABI breakage. He mentioned Parquet 
> > C++ has a similar setup.  
> Some questions:
> 1) are we ok with paying the cost of pimpls? (mostly the indirection
> cost I guess, and the fact that we can't have inline methods/accessors
> anymore)

I'm not sure about how much of the cost we're ready to pay. There is a certain element to keeping a stable ABI (this is done fantastically by the NumPy people), you can do patch releases without consumers worrying if they need to rebuild their binaries.

The indirection on paths that call expensive functions is certainly no problem, i.e. if you have a table and select a column, this is an operation you don't do often, thus I think the overhead is acceptable. On the other hand, accessing the null_count or the length of an array is definitely an operation that is performed quite often. These should be as fast as possible.

I cannot give you a certain answer, once I have the relevant time, I'll try to implement and profile some of the possible approaches. 

> 2) how do we do for things like ArrayData, which seems publicly exposed
> by design?

ArrayData is marked as internal and thus I would feel ok to break its ABI between non-major releases. If people really depend on its usage, then we should think of a clear way to make it public / non-internal.

> More generally, is it wise to focus on ABI compatibility even before a
> 1.0 is released?

Probably not. I care about that already because of two things:

 * I want to have a stable ABI for the core features for 1.0
 * I maintain a set of binary packages that depend on Arrow C++, having a stable ABI makes their maintenance much simpler. Most notably, for each new Arrow patch release, we also need to do a patch release of turbodbc currently. This is not something I would burden on everyone that depends on Arrow.