git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [EXTERNAL] - Re: Create PCollection<ABC> from List<ABC>in DoFn<>


And if you have a DoFn<X, List<ABC>> you can follow this with Flatten.iterables to turn the output PCollection<List<ABC>> into a PCollection<ABC>. In some cases you may want to follow this with a Reshuffle so that the outputs from a single X get distributed among multiple machines. 

On Thu, Jun 7, 2018 at 8:19 AM Marián Dvorský <mariand@xxxxxxxxxx> wrote:
If you have a function which given X returns a List<ABC>, you can use FlatMapElements transform on PCollection<X> to get a PCollection<ABC>.

On Thu, Jun 7, 2018 at 8:16 AM S. Sahayaraj <ssahayaraj@xxxxxxxxx> wrote:

In case if we could return List<ABC> from DoFn<> then we could use the code as suggested in section 3.1.2 and mentioned by you below., but the return type of DoFn<> is always PCollection<> in where I could not have the list of ABC objects which further will be fed as input for parallel computation. Is there any possibility to convert List<ABC> to PCollection<ABC> in DoFn<> itself? OR can DoFn<> return List<ABC> objects?

 

 

Cheers,

S. Sahayaraj

 

From: Robert Bradshaw [mailto:robertwb@xxxxxxxxxx]
Sent: Wednesday, June 6, 2018 9:40 PM
To: user@xxxxxxxxxxxxxxx
Subject: [EXTERNAL] - Re: Create PCollection<ABC> from List<ABC>in DoFn<>

 

You can use the Create transform to do this, e.g.

 

  Pipeline p = ...

  List<ABC> inMemoryObjects = ...

  PCollection<ABC> pcollectionOfObject = p.apply(Create.of(inMemoryObjects));

  result = pcollectionOfObject.apply(ParDo.of(SomeDoFn...));

 

 

On Wed, Jun 6, 2018 at 8:34 AM S. Sahayaraj <ssahayaraj@xxxxxxxxx> wrote:

Hello,

                I have created a java class which extends DoFn<>, there are list of objects of type ABC (List<ABC>) populated in processElement(ProcessContext c) at runtime and would like to generate respective PCollection<ABC> from List<ABC> so that the subsequent transformation can do parallel execution on each ABC object in PCollection<ABC>. How do we create PCollection from in-memory object created in DoFn<>? OR How do we get pipeline object being in DoFn<>? OR is there any SDK guidelines to refer?

 

 

Thanks,

S. Sahayaraj