Re: Arrow & plasma - java sample to store complex objects
Bleeding edge is probably an understatement. We need someone to implement
https://issues.apache.org/jira/browse/ARROW-2892 before this is really
feasible without a copy. You could do it with glue code today, plus a copy
from the memory used by the current Plasma Java client to the memory managed
by a BufferAllocator (and vice versa).
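Until then, the copy-based path amounts to serializing a record batch with the Arrow IPC stream format and moving the resulting bytes through the Plasma client's byte-array API. A minimal sketch of the (de)serialization half, assuming Arrow's Java library (the `PlasmaArrowGlue` class name is mine, not anything in the codebase):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;

public class PlasmaArrowGlue {
  // Copy out: write one record batch in the Arrow IPC stream format.
  public static byte[] toBytes(VectorSchemaRoot root) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
      writer.start();
      writer.writeBatch();
      writer.end();
    }
    return out.toByteArray();
  }

  // Copy in: re-read the bytes through a BufferAllocator. The returned
  // root is owned by the reader; real code would keep a handle on the
  // reader and close it, rather than leak it as this sketch does.
  public static VectorSchemaRoot fromBytes(byte[] data, BufferAllocator alloc)
      throws Exception {
    ArrowStreamReader reader =
        new ArrowStreamReader(new ByteArrayInputStream(data), alloc);
    reader.loadNextBatch();
    return reader.getVectorSchemaRoot();
  }
}
```

With a running plasma_store, the bytes would then go through the ObjectStoreLink API, e.g. `store.put(objectId, toBytes(root), null)` and `fromBytes(store.get(objectId, timeoutMs, false), alloc)`, paying one copy in each direction.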
On Mon, Aug 6, 2018 at 4:19 PM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> hi Gerard,
> This is the right place to ask questions. The Slack channel was closed
> (see prior discussions on the mailing list); few Java developers were
> on Slack anyway so it wouldn't have been a good place to get help.
> Using Java with Plasma is very bleeding edge territory. I don't know
> if anyone has an example yet of using Plasma with end-to-end Arrow
> columnar read and write. I would say it's definitely the domain of
> developers working on the Java codebase to build out support tooling
> for these workflows. We'd be glad to have you involved.
> For general Arrow workflows, I would recommend looking at the Arrow
> conversion paths in the Spark SQL codebase. There we have record
> batches being streamed to Python and then results received back on the
> JVM side.
> - Wes
> On Mon, Aug 6, 2018 at 10:19 AM, Gérard Dupont <ger.dupont@xxxxxxxxx> wrote:
> > Hi,
> > Not sure this is the right channel for a "user"-oriented question, but the
> > Slack channel on Heroku seems to be down...
> > TL;DR: are there any hidden tutorials/Java samples showing how to store
> > complex data objects in Arrow and access them (put/get) with Plasma? I'm
> > currently working from the unit tests in the Java part of the source, but
> > it's not really obvious...
> > So, I'm starting with Arrow, and I went for the Plasma object store, which
> > should address an issue I currently have: sharing objects between processes
> > on multiple servers to serve as the starting point for computations.
> > The computation parts are already done, since the need for distribution
> > only recently emerged, and I'm trying to see whether I can port the data
> > objects to Arrow so as to distribute them through Plasma. So far so good: I
> > can launch the plasma_store and access it through the ObjectStoreLink API,
> > but only at the byte-array level. Any advice or best practices on how to
> > convert an existing data model to an Arrow-compliant one? Should I look
> > into the Arrow Schema examples?
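To that last question: defining a Schema and filling a VectorSchemaRoot is the usual way to map a domain model onto Arrow's columnar layout in Java. A rough sketch, assuming Arrow's Java library (the `Event` class and the field names are purely illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class ArrowSchemaExample {
  // A stand-in for an existing row-oriented domain object.
  public static class Event {
    final int id;
    final String label;
    public Event(int id, String label) { this.id = id; this.label = label; }
  }

  // Transpose a list of objects into Arrow's columnar layout.
  public static VectorSchemaRoot toArrow(List<Event> events, BufferAllocator alloc) {
    Schema schema = new Schema(Arrays.asList(
        Field.nullable("id", new ArrowType.Int(32, true)),
        Field.nullable("label", new ArrowType.Utf8())));
    VectorSchemaRoot root = VectorSchemaRoot.create(schema, alloc);
    IntVector ids = (IntVector) root.getVector("id");
    VarCharVector labels = (VarCharVector) root.getVector("label");
    ids.allocateNew(events.size());
    labels.allocateNew();
    for (int i = 0; i < events.size(); i++) {
      ids.set(i, events.get(i).id);
      labels.setSafe(i, events.get(i).label.getBytes(StandardCharsets.UTF_8));
    }
    root.setRowCount(events.size());
    return root;
  }
}
```

From there the VectorSchemaRoot can be written out with ArrowStreamWriter, or handed to whatever transport you end up using with Plasma.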
> > Thanks for any pointer.
> > Cheers,
> > --
> > Gérard Dupont