
Re: Arrow & plasma - java sample to store complex objects

Bleeding edge is probably an understatement. We need someone to implement before this is really
feasible without a copy. You could do it today with glue code and a copy from
the memory used by the current Plasma Java client to the memory used by
BufferAllocator (and vice versa).
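
To give a rough idea of what the glue code would look like, here is a
minimal sketch using only the JDK. Everything in it is hypothetical:
ByteStore and InMemoryStore are stand-ins for a byte-array put/get store
(they are not the Plasma client API), Measurements is an invented example
data model, and the byte layout is a hand-rolled column-by-column encoding,
not the Arrow format. The 20-byte object ID reflects my understanding of
Plasma IDs; treat that as an assumption too. The point is only where the
copies happen.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a byte-array object store (NOT the Plasma API). */
interface ByteStore {
    void put(byte[] objectId, byte[] value);
    byte[] get(byte[] objectId);
}

/** In-memory implementation, used here only so the sketch can run. */
final class InMemoryStore implements ByteStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    @Override
    public void put(byte[] objectId, byte[] value) {
        // Copy on the way in: the store keeps its own memory.
        objects.put(Arrays.toString(objectId), value.clone());
    }

    @Override
    public byte[] get(byte[] objectId) {
        byte[] stored = objects.get(Arrays.toString(objectId));
        // Copy on the way out, or null if the ID is unknown.
        return stored == null ? null : stored.clone();
    }
}

/** Invented example data model, stored as columns rather than as row objects. */
final class Measurements {
    final int[] ids;
    final double[] values;

    Measurements(int[] ids, double[] values) {
        if (ids.length != values.length) {
            throw new IllegalArgumentException("column lengths differ");
        }
        this.ids = ids;
        this.values = values;
    }

    /** Layout (assumption, not Arrow): row count, then all ids, then all values. */
    byte[] toBytes() {
        int size = Integer.BYTES + ids.length * (Integer.BYTES + Double.BYTES);
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(ids.length);
        for (int id : ids) {
            buf.putInt(id);
        }
        for (double value : values) {
            buf.putDouble(value);
        }
        return buf.array();
    }

    /** Reverses toBytes(), reading the columns back in the same order. */
    static Measurements fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int rows = buf.getInt();
        int[] ids = new int[rows];
        double[] values = new double[rows];
        for (int i = 0; i < rows; i++) {
            ids[i] = buf.getInt();
        }
        for (int i = 0; i < rows; i++) {
            values[i] = buf.getDouble();
        }
        return new Measurements(ids, values);
    }
}

final class GlueDemo {
    /** Pads or truncates a name to a fixed 20-byte ID (length is an assumption). */
    static byte[] objectId(String name) {
        return Arrays.copyOf(name.getBytes(StandardCharsets.UTF_8), 20);
    }

    public static void main(String[] args) {
        ByteStore store = new InMemoryStore();
        byte[] id = objectId("measurements-0");

        Measurements in = new Measurements(new int[] {1, 2, 3}, new double[] {0.5, 1.5, 2.5});
        store.put(id, in.toBytes());

        Measurements out = Measurements.fromBytes(store.get(id));
        System.out.println(out.ids.length + " rows, last value " + out.values[2]);
    }
}
```

With the real libraries, toBytes/fromBytes would presumably be replaced by
Arrow's own serialization into and out of buffers owned by a
BufferAllocator, and the store by the Plasma Java client; the extra copy
between the two memory regions is the part that cannot be avoided today.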

On Mon, Aug 6, 2018 at 4:19 PM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> hi Gerard,
>
> This is the right place to ask questions. The Slack channel was closed
> (see prior discussions on the mailing list); few Java developers were
> on Slack anyway so it wouldn't have been a good place to get help.
>
> Using Java with Plasma is very bleeding edge territory. I don't know
> if anyone has an example yet of using Plasma with end-to-end Arrow
> columnar read and write. I would say it's definitely the domain of
> developers working on the Java codebase to build out support tooling
> for these workflows. We'd be glad to have you involved.
>
> For general Arrow workflows, I would recommend looking at the Arrow
> conversion paths in the Spark SQL codebase. There we have record
> batches being streamed to Python and then results received back on the
> JVM side.
>
> - Wes
>
> On Mon, Aug 6, 2018 at 10:19 AM, Gérard Dupont <> wrote:
> > Hi,
> > Not sure this is the right channel for a "user"-oriented question, but the
> > Slack channel on Heroku seems to be down...
> >
> > TL;DR: are there any hidden tutorials or Java samples for storing complex
> > data objects in Arrow and accessing them (put/get) with Plasma? I'm
> > currently exploring the unit tests in the Java part of the source, but
> > it's not really obvious...
> >
> > So, I'm starting out with Arrow, and I was drawn to the Plasma object
> > store, which should address an issue I currently have: sharing objects
> > between processes on multiple servers to serve as the initial starting
> > point for computations.
> >
> > The computation parts are already done, since the need for distribution
> > only recently emerged, and I'm trying to see if I can port the data
> > objects to Arrow in order to distribute them over Plasma. So far so good:
> > I can launch the plasma_store and access it through the ObjectStoreLink
> > API. But that is really at the byte-array level. Any advice or best
> > practices on how to convert an existing data model to an Arrow-compliant
> > one? Should I look into the Arrow Schema example?
> >
> > Thanks for any pointers.
> >
> > Cheers,
> > --
> > Gérard Dupont