

Re: [JAVA] Supporting zero copy arrow-vector


On Sat, Sep 8, 2018 at 8:00 AM Zhenyuan Zhao <zzymtn@xxxxxxxxx> wrote:

> After digging into it a little deeper, I have more questions:
>
> First, a vector takes an allocator. Zero copy means we should not do any
> additional allocation, which implies that a dummy allocator capable of (at
> most) allocating a zero-length ArrowBuf (getEmpty) is sufficient.
>
I don't see the downside to using the existing allocator as opposed to
creating a dummy. If it doesn't allocate anything, what's the problem?
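To make the allocator point concrete, here is a minimal sketch in plain Java (java.nio only, no Arrow dependency). CountingAllocator and IntVectorView are hypothetical stand-ins, not Arrow classes. A zero-copy load that only wraps a buffer it was handed never asks the allocator for memory, so passing the existing allocator costs nothing and a dedicated dummy allocator buys nothing extra.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ZeroCopyAllocatorDemo {

    /** Hypothetical allocator that records how many bytes it has handed out. */
    static final class CountingAllocator {
        private long allocatedBytes = 0;

        ByteBuffer allocate(int size) {
            allocatedBytes += size;
            return ByteBuffer.allocateDirect(size).order(ByteOrder.LITTLE_ENDIAN);
        }

        long allocatedBytes() {
            return allocatedBytes;
        }
    }

    /** Hypothetical fixed-width int vector that borrows its data buffer. */
    static final class IntVectorView {
        private final ByteBuffer data;
        private final int valueCount;

        private IntVectorView(ByteBuffer data, int valueCount) {
            this.data = data;
            this.valueCount = valueCount;
        }

        /** Zero-copy path: wraps the caller's buffer; the allocator is never called. */
        static IntVectorView wrap(CountingAllocator allocator, ByteBuffer source, int valueCount) {
            // duplicate() shares the same memory; its byte order resets, so set it again.
            return new IntVectorView(source.duplicate().order(ByteOrder.LITTLE_ENDIAN), valueCount);
        }

        int get(int index) {
            return data.getInt(index * Integer.BYTES);
        }

        int valueCount() {
            return valueCount;
        }
    }

    public static void main(String[] args) {
        CountingAllocator allocator = new CountingAllocator();

        // Data produced elsewhere (e.g. received over the wire), little-endian.
        ByteBuffer source = ByteBuffer.allocateDirect(3 * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        source.putInt(0, 10).putInt(4, 20).putInt(8, 30);

        IntVectorView vector = IntVectorView.wrap(allocator, source, 3);
        System.out.println(vector.get(1));               // 20
        System.out.println(allocator.allocatedBytes());  // 0
    }
}
```

The allocator stays in the signature so the same call shape works for paths that do need memory; the zero-copy path simply never uses it.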


> However, there are places in the vector code that require more allocation:
>
>
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L511
>
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BitVectorHelper.java#L180
>
> The vector will allocate in the all-null or all-non-null case. It does seem
> like an optimization that can be done, but why does it reallocate without
> checking whether the validity buffer is really empty? Take the fixed-width
> vector as an example: it does in fact check that the buffer count is two,
> and in my simple test case I saw that the validity buffer is still sent in
> the non-null case.
>

Agree that ideally we should only allocate if the source was not provided.
Seems like that could be improved.
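The "only allocate if the source was not provided" rule could look roughly like the sketch below. It is not Arrow's BaseFixedWidthVector or BitVectorHelper code: loadValidity, isSet and the static allocation counter are hypothetical. The bitmap follows Arrow's documented layout (one bit per slot, least-significant bit first, 1 = non-null).

```java
import java.nio.ByteBuffer;

public class ValidityBufferLoader {

    /** Hypothetical counter standing in for calls to the vector's allocator. */
    static int allocations = 0;

    /** Bytes needed for a validity bitmap with one bit per value. */
    static int validityBytes(int valueCount) {
        return (valueCount + 7) / 8;
    }

    static ByteBuffer allocate(int size) {
        allocations++;
        return ByteBuffer.allocateDirect(size);  // direct buffers start zero-filled
    }

    /**
     * Reuses the sender's bitmap when it is large enough (zero copy). Only when
     * no usable bitmap was provided does it allocate one: all 1 bits for the
     * all-non-null case, left as 0 bits for the all-null case.
     */
    static ByteBuffer loadValidity(ByteBuffer provided, int valueCount, boolean allNonNull) {
        int needed = validityBytes(valueCount);
        if (provided != null && provided.capacity() >= needed) {
            return provided;
        }
        ByteBuffer fresh = allocate(needed);
        if (allNonNull && needed > 0) {
            for (int i = 0; i < needed; i++) {
                fresh.put(i, (byte) 0xFF);
            }
            // Clear the padding bits past valueCount in the last byte.
            int rem = valueCount % 8;
            if (rem != 0) {
                fresh.put(needed - 1, (byte) ((1 << rem) - 1));
            }
        }
        return fresh;
    }

    /** True if slot `index` is marked non-null (LSB-first bit order). */
    static boolean isSet(ByteBuffer validity, int index) {
        int b = validity.get(index >>> 3);
        return ((b >>> (index & 7)) & 1) == 1;
    }

    public static void main(String[] args) {
        // Sender shipped a bitmap for 10 values: slots 0 and 9 are null.
        ByteBuffer sent = ByteBuffer.allocateDirect(2);
        sent.put(0, (byte) 0b1111_1110).put(1, (byte) 0b0000_0001);

        ByteBuffer reused = loadValidity(sent, 10, false);
        System.out.println(reused == sent);                              // true
        System.out.println(isSet(reused, 0) + " " + isSet(reused, 1));  // false true

        ByteBuffer filled = loadValidity(null, 10, true);
        System.out.println(isSet(filled, 9));  // true
        System.out.println(allocations);       // 1
    }
}
```

Here the provided buffer is trusted as-is; a real implementation would also have to decide whether a sender-supplied bitmap needs validation.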


> Second, Arrow made a decision to support only off-heap buffers. Why? It
> doesn't affect my use case, but it sounds like this could be more flexible.
>

Perf and GC.
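As a rough illustration of the performance and GC argument, the hypothetical OffHeapLongColumn below keeps its values in a direct java.nio.ByteBuffer instead of a long[]. Arrow Java uses its own allocator and ArrowBuf rather than ByteBuffer, so this only shows the general idea: the data bytes are not Java objects, the GC tracks just the small wrapper, and the memory is not moved by the collector, so native code or channel I/O can read it without an intermediate copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffHeapColumnDemo {

    /** Hypothetical fixed-width column of longs stored outside the Java heap. */
    static final class OffHeapLongColumn {
        private final ByteBuffer data;
        private final int length;

        OffHeapLongColumn(int length) {
            this.length = length;
            // Direct (off-heap) allocation; little-endian to match the layout Arrow commonly uses.
            this.data = ByteBuffer.allocateDirect(length * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        }

        void set(int index, long value) {
            data.putLong(index * Long.BYTES, value);
        }

        long get(int index) {
            return data.getLong(index * Long.BYTES);
        }

        long sum() {
            long total = 0;
            for (int i = 0; i < length; i++) {
                total += get(i);
            }
            return total;
        }

        boolean isOffHeap() {
            return data.isDirect();
        }
    }

    public static void main(String[] args) {
        OffHeapLongColumn column = new OffHeapLongColumn(4);
        for (int i = 0; i < 4; i++) {
            column.set(i, (i + 1) * 100L);
        }
        System.out.println(column.sum());        // 1000
        System.out.println(column.isOffHeap());  // true
    }
}
```

The trade-off is that off-heap memory must be sized and released deliberately rather than reclaimed by the GC, which is one reason Arrow routes it through an explicit allocator.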