Re: [Java] org.apache.arrow.vector.ipc.ArrowWriter.recordBlocks


From my time working on the Arrow writers, I think that would be fine. You could do the same thing with the dictionary blocks as well.

As an implementation idea, it might be cleaner to add callback hooks, e.g. onRecordBlockWritten(), and implement them in ArrowFileWriter instead of having the base ArrowWriter track the blocks.
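
A rough sketch of what I mean by the hook, to make the shape concrete. Everything here is a hypothetical stand-in, not the real Arrow classes: Block plays the role of ArrowBlock, SketchBaseWriter the role of ArrowWriter, and the byte[] argument stands in for a serialized record batch. Only the name onRecordBlockWritten() comes from the suggestion above; the rest is assumed for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ArrowBlock: where a block starts and how long it is.
final class Block {
    final long offset;
    final int length;

    Block(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical base writer: writes batches but keeps no per-block state itself.
abstract class SketchBaseWriter {
    private final OutputStream out;
    private long position = 0;

    SketchBaseWriter(OutputStream out) {
        this.out = out;
    }

    // byte[] stands in for an already-serialized record batch.
    final void writeRecordBatch(byte[] serializedBatch) throws IOException {
        long start = position;
        out.write(serializedBatch);
        position += serializedBatch.length;
        // Notify subclasses; the base class does not remember the block.
        onRecordBlockWritten(new Block(start, serializedBatch.length));
    }

    // Hook: a no-op by default, so writers that need no footer pay nothing.
    protected void onRecordBlockWritten(Block block) {
    }
}

// Stream-style writer: inherits the no-op hook, so no block list accumulates.
final class SketchStreamWriter extends SketchBaseWriter {
    SketchStreamWriter(OutputStream out) {
        super(out);
    }
}

// File-style writer: records the blocks because it needs them for a footer.
final class SketchFileWriter extends SketchBaseWriter {
    private final List<Block> recordBlocks = new ArrayList<>();

    SketchFileWriter(OutputStream out) {
        super(out);
    }

    @Override
    protected void onRecordBlockWritten(Block block) {
        recordBlocks.add(block);
    }

    List<Block> getRecordBlocks() {
        return Collections.unmodifiableList(recordBlocks);
    }
}
```

With this layout the stream writer never allocates or grows a block list, and a matching hook for dictionary blocks could follow the same pattern.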

Thanks,

Emilio

On 04/27/2018 03:19 PM, Eric Wohlstadter wrote:
Hi all,
  In the context of ArrowStreamWriter:

- It looks like the field ArrowWriter.recordBlocks is populated, and so consumes
memory, e.g. in ArrowWriter.writeRecordBatch

- But the List<ArrowBlock> is never read by ArrowStreamWriter (it is only used
by ArrowFileWriter)

Would it be safe for me to extend ArrowStreamWriter and override
writeRecordBatch with an implementation that does not populate the
recordBlocks?

This is for HIVE-19305 (if anyone has time to take a look and provide
feedback, that would be much appreciated)

Thanks for your help,

--Eric