Re: Help with Java API and RecordBatch creation


hi Alberto,

Have you looked at the relevant usage of Arrow in Apache Spark? See

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala

and related modules.

On your first question, my understanding is that

* ArrowRecordBatch represents the in-memory record batch, and
* RecordBatch (in org.apache.arrow.flatbuf) is the serialized
record batch metadata, commonly called the "data header" (defined in
Message.fbs)
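
To make the distinction concrete, here is a minimal sketch (not taken
from the Spark code) of building a batch in Java and writing it in the
streaming format. It assumes the arrow-vector and arrow-memory
artifacts are on the classpath; the class name RecordBatchSketch, the
methods writeBatch and readRowCount, and the column name "id" are
made up for illustration, and exact signatures may differ between
Arrow releases. Note that user code normally fills a VectorSchemaRoot
and lets the writer produce the flatbuf metadata, so the
org.apache.arrow.flatbuf classes are never touched directly here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Collections;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

// Hypothetical example class; none of these names come from the thread.
public class RecordBatchSketch {

    // Builds a one-column int32 batch and returns it serialized as an Arrow stream.
    public static byte[] writeBatch(int[] values) throws IOException {
        Field idField = new Field("id", FieldType.nullable(new ArrowType.Int(32, true)), null);
        Schema schema = new Schema(Collections.singletonList(idField));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {

            // Fill the column vector; the root owns the underlying buffers.
            IntVector ids = (IntVector) root.getVector("id");
            ids.allocateNew(values.length);
            for (int i = 0; i < values.length; i++) {
                ids.setSafe(i, values[i]);
            }
            root.setRowCount(values.length);

            // ArrowRecordBatch: the in-memory view (field nodes + buffers).
            // Shown only for inspection; the writer below does this step itself.
            try (ArrowRecordBatch batch = new VectorUnloader(root).getRecordBatch()) {
                System.out.println("rows=" + batch.getLength()
                        + " nodes=" + batch.getNodes().size()
                        + " buffers=" + batch.getBuffers().size());
            }

            // The writer emits the schema, then the flatbuf "data header"
            // followed by the buffer bytes for each batch.
            try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
                writer.start();
                writer.writeBatch();
                writer.end();
            }
        }
        return out.toByteArray();
    }

    // Reads the stream back and returns the row count of the first batch.
    public static int readRowCount(byte[] bytes) throws IOException {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             ArrowStreamReader reader =
                     new ArrowStreamReader(new ByteArrayInputStream(bytes), allocator)) {
            if (!reader.loadNextBatch()) {
                return 0;
            }
            return reader.getVectorSchemaRoot().getRowCount();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeBatch(new int[] {1, 2, 3});
        System.out.println("serialized bytes: " + bytes.length);
        System.out.println("rows read back: " + readRowCount(bytes));
    }
}
```

Replacing the ByteArrayOutputStream with a FileOutputStream should be
enough to write the same stream to a file. From Scala the calls are
the same, since this is just the Java API.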

- Wes

On Sun, Aug 5, 2018 at 9:13 AM, ALBERTO Bocchinfuso
<alberto_boc_94@xxxxxxxxxx> wrote:
>
> Good morning,
>
> I have to use Apache Arrow with Scala, so I’m using the Java API from Scala, but I’m confused. I hope that someone can clarify a few things for me.
>
> First of all, what is the difference between ArrowRecordBatch (in org.apache.arrow.vector.ipc.message) and RecordBatch (in org.apache.arrow.flatbuf)?
> In this regard, if a coder wants to use Arrow just for IPC, should she consider only the classes in the package org.apache.arrow.vector, or should she also learn how to use the other packages, particularly io.netty.buffer, org.apache.arrow.memory and org.apache.arrow.flatbuf?
>
> I don’t understand how to do in Java everything that is done in Python in these documentation pages:
>              http://arrow.apache.org/docs/python/data.html
>              http://arrow.apache.org/docs/python/ipc.html
>
> I’d like to understand how I can create what in Python is called a RecordBatch, and serialize it to a stream, for example to write it to a file.
> I think an ArrowRecordBatch can be created by using the constructors, once you have built a list of ArrowFieldNode (I haven’t understood what this class stands for, to be honest) and ArrowBuf (I haven’t understood how to create one; I think that I should instantiate an ArrowByteBufAllocator through alloc(), but then I wouldn’t know how to proceed...), but I’m not sure.
> I hope that my doubts can be cleared up.
>
> Thank you,
> Alberto
>


