

Re: Arrow & plasma - java sample to store complex objects


Bleeding edge is probably an understatement. Someone needs to implement
https://issues.apache.org/jira/browse/ARROW-2892 before this is really
feasible without a copy. You could do it today with glue code and a copy
from the memory used by the current Plasma Java client to the memory used
by BufferAllocator (and vice versa).
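A rough, JDK-only sketch of the copy-based glue described above, so it runs
standalone. `InMemoryStore` is a hypothetical stand-in for the Plasma Java
client's byte-array put/get (it is not the real ObjectStoreLink API), and a
direct ByteBuffer stands in for memory obtained from Arrow's
BufferAllocator. The two static methods mark where the copies happen.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class PlasmaCopyBridge {
    // Hypothetical stand-in for a byte-array object store (NOT the real Plasma client).
    // Keys are wrapped in ByteBuffer because byte[] uses identity equals/hashCode.
    static final class InMemoryStore {
        private final Map<ByteBuffer, byte[]> objects = new HashMap<>();

        void put(byte[] objectId, byte[] value) {
            // Defensive copies so later changes by the caller don't leak into the store.
            objects.put(ByteBuffer.wrap(objectId.clone()), value.clone());
        }

        byte[] get(byte[] objectId) {
            byte[] value = objects.get(ByteBuffer.wrap(objectId));
            return value == null ? null : value.clone();
        }
    }

    // Copy #1: off-heap buffer (standing in for Arrow-allocated memory) -> byte[] for the store.
    // Expects the buffer to be ready for reading (position = start, limit = end of data).
    static void putBuffer(InMemoryStore store, byte[] objectId, ByteBuffer data) {
        ByteBuffer view = data.duplicate(); // read through a view; caller's position is untouched
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        store.put(objectId, bytes);
    }

    // Copy #2: byte[] from the store -> freshly allocated off-heap buffer.
    static ByteBuffer getBuffer(InMemoryStore store, byte[] objectId) {
        byte[] bytes = store.get(objectId);
        if (bytes == null) {
            throw new IllegalArgumentException("no object stored under this id");
        }
        ByteBuffer data = ByteBuffer.allocateDirect(bytes.length);
        data.put(bytes);
        data.flip(); // make the buffer readable from the start
        return data;
    }
}
```

In real code the direct buffer would be Arrow memory and the store would be
the Plasma client; the copy in each direction is what the message above says
could be avoided once ARROW-2892 is implemented.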

On Mon, Aug 6, 2018 at 4:19 PM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> hi Gerard,
>
> This is the right place to ask questions. The Slack channel was closed
> (see prior discussions on the mailing list); few Java developers were
> on Slack anyway, so it wouldn't have been a good place to get help.
>
> Using Java with Plasma is very much bleeding-edge territory. I don't know
> if anyone has an example yet of using Plasma with end-to-end Arrow
> columnar reads and writes. I would say that building out support tooling
> for these workflows is definitely the domain of developers working on the
> Java codebase. We'd be glad to have you involved.
>
> For general Arrow workflows, I would recommend looking at the Arrow
> conversion paths in the Spark SQL codebase. There we have record
> batches being streamed to Python and then results received back on the
> JVM side.
>
> - Wes
>
> On Mon, Aug 6, 2018 at 10:19 AM, Gérard Dupont <ger.dupont@xxxxxxxxx> wrote:
> > Hi,
> > I'm not sure this is the right channel for a user-oriented question, but
> > the Slack channel on Heroku seems to be down...
> >
> > TL;DR: is there a tutorial or Java sample somewhere that shows how to
> > store complex data objects in Arrow and access them (put/get) with
> > Plasma? I'm currently exploring the unit tests in the Java part of the
> > source, but they're not really obvious...
> >
> > So, I'm just starting with Arrow, and I was drawn in by the Plasma
> > object store, which should address an issue I currently have: sharing
> > objects between processes on multiple servers to serve as the initial
> > starting point for computations.
> >
> > The computation parts are already done, since the need for distribution
> > only emerged recently, and I'm trying to see whether I can port the data
> > objects to Arrow in order to distribute them over Plasma. So far so good:
> > I can launch the plasma_store and access it through the ObjectStoreLink
> > API. But it's really at the byte-array level. Any advice or best practice
> > on how to convert an existing data model into an Arrow-compliant one?
> > Should I look into the Arrow Schema example?
> >
> > Thanks for any pointer.
> >
> > Cheers,
> > --
> > Gérard Dupont
>
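Gérard's question about moving from a row-oriented data model to a columnar
one can be illustrated with a small JDK-only sketch. `Trade` is a
hypothetical domain class, and the byte layout below is made up for
illustration; it is NOT the Arrow IPC format. With the Arrow Java library
one would instead describe the fields in a Schema, fill vectors such as
BigIntVector and Float8Vector in a VectorSchemaRoot, and serialize with
ArrowStreamWriter; the resulting bytes could then go through a byte-array
put like the one Gérard describes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ColumnarSketch {
    // Hypothetical row-oriented domain object standing in for an existing data model.
    static final class Trade {
        final long id;
        final double price;

        Trade(long id, double price) {
            this.id = id;
            this.price = price;
        }
    }

    // Transpose rows into one contiguous column per field, then pack the columns
    // into a single byte[] laid out as: [rowCount:int][id column][price column].
    // Illustrative layout only; NOT the Arrow IPC format.
    static byte[] toColumnarBytes(List<Trade> rows) {
        int n = rows.size();
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + n * (Long.BYTES + Double.BYTES));
        buf.putInt(n);
        for (Trade t : rows) buf.putLong(t.id);      // id column
        for (Trade t : rows) buf.putDouble(t.price); // price column
        return buf.array();
    }

    // Reverse the transposition: read each column back and rebuild row objects.
    static List<Trade> fromColumnarBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        long[] ids = new long[n];
        for (int i = 0; i < n; i++) ids[i] = buf.getLong();
        List<Trade> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) rows.add(new Trade(ids[i], buf.getDouble()));
        return rows;
    }

    // Read a single column without decoding the others: jump past the id column.
    static double sumPrices(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        buf.position(Integer.BYTES + n * Long.BYTES);
        double sum = 0;
        for (int i = 0; i < n; i++) sum += buf.getDouble();
        return sum;
    }
}
```

`sumPrices` shows the payoff of the columnar layout: a computation that
needs only one field can go straight to that column without materializing
row objects.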
