
Re: Schema class in 2.5 ?


Hi,


ParquetIO needs an Avro Schema (org.apache.avro.Schema) to read and write.

Will it also be possible to avoid using an Avro Schema altogether, or to use Beam's Schema (org.apache.beam.sdk.schemas.Schema) instead?


Regards,

Akanksha


From: Akanksha Sharma B
Sent: Thursday, July 12, 2018 1:13:14 PM
To: user@xxxxxxxxxxxxxxx
Subject: Re: Schema class in 2.5 ?
 

From: Alexey Romanenko <aromanenko.dev@xxxxxxxxx>
Sent: Thursday, July 12, 2018 12:31:02 PM
To: user@xxxxxxxxxxxxxxx
Subject: Re: Schema class in 2.5 ?
 
Good catch, Akanksha!
Yes, RowType was renamed to Schema a while ago, and it seems the BeamSQL docs were not updated accordingly.
Could you create a Jira issue for that?

On 12 Jul 2018, at 11:10, Akanksha Sharma B <akanksha.b.sharma@xxxxxxxxxxxx> wrote:

Hi,

As far as I can see, BeamSQL was changed in 2.5 to work with Schema.
The sample code provided in https://beam.apache.org/documentation/dsls/sql/walkthrough/ does not compile with Beam 2.5 and needs to be updated.

    Row
        .withRowType(appType)

The line above needs to be adapted to use Schema.
Regards,

Akanksha

From: Akanksha Sharma B
Sent: Wednesday, July 11, 2018 11:02:37 AM
To: user@xxxxxxxxxxxxxxx
Subject: Re: Schema class in 2.5 ?
 
Thanks a lot!!!

From: Alexey Romanenko <aromanenko.dev@xxxxxxxxx>
Sent: Wednesday, July 11, 2018 11:01:05 AM
To: user@xxxxxxxxxxxxxxx
Subject: Re: Schema class in 2.5 ?
 
Hi Akanksha,

I believe this design document can be helpful for you:
https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc

On 11 Jul 2018, at 10:38, Akanksha Sharma B <akanksha.b.sharma@xxxxxxxxxxxx> wrote:

Hi,

Can you please share some documentation about the ongoing changes related to the Schema class?
I am trying to understand why it is being introduced and how I can use it.
I was looking for something like an RDD in Beam, i.e. Beam understanding the schema of the data internally, so that it can handle some conversions itself, e.g. to SqlRow, Parquet files, etc.

Regards,
Akanksha