Re: BigqueryIO field clustering


Any updates on this? The pull request has already been open for a month.

I think we should at least provide some basic feedback, e.g. whether we intend to merge the PR and whether there are any problems with the code or tests.

I'd like to help review it, but I feel like someone familiar with BigQuery should have a look first.

Thanks,
Max

PS: https://github.com/apache/beam/pull/7061

On 28.11.18 19:27, Chamikara Jayalath wrote:
Thanks for the contribution. I can take a look later this week.

On Wed, Nov 28, 2018 at 12:29 AM Wout Scheepers <Wout.Scheepers@xxxxxxxxxxxxxxxxxxx> wrote:

    Hey all,

    Almost two weeks ago, I created a PR to support BigQuery clustering [1].

    Can someone please have a look?

    Thanks,
    Wout

    1: https://github.com/apache/beam/pull/7061

    From: Lukasz Cwik <lcwik@xxxxxxxxxx>
    Reply-To: "user@xxxxxxxxxxxxxxx" <user@xxxxxxxxxxxxxxx>
    Date: Wednesday, 29 August 2018 at 18:32
    To: dev <dev@xxxxxxxxxxxxxxx>, "user@xxxxxxxxxxxxxxx" <user@xxxxxxxxxxxxxxx>
    Cc: Bob De Schutter <Bob.DeSchutter@xxxxxxxxxxxxxxxxxxx>
    Subject: Re: BigqueryIO field clustering

    +dev@xxxxxxxxxxxxxxx

    Wout, I assigned this task to you since it seems like you're interested in
    contributing.

    The Apache Beam contribution guide [1] is a good place to start for answering
    questions on how to contribute.

    If you need help getting changes reviewed or have questions, feel free to
    reach out on dev@xxxxxxxxxxxxxxx or on Slack.

    1: https://beam.apache.org/contribute/

    On Wed, Aug 29, 2018 at 1:28 AM Wout Scheepers
    <Wout.Scheepers@xxxxxxxxxxxxxxxxxxx> wrote:

        Hey all,

        I’m trying to use the field clustering beta feature in BigQuery [1].

        However, the current Beam/Dataflow worker BigQuery API service
        dependency is ‘com.google.apis:google-api-services-bigquery:
        v2-rev374-1.23.0’, which does not include the clustering option in the
        TimePartitioning class.

        As a result, I can’t specify the clustering field when loading or
        streaming into BigQuery. See [2] for the BigQuery API error details.

        Does anyone know a workaround for this?

        I guess that in the worst case I’ll have to wait until Beam supports a
        newer version of the BigQuery API service.

        After checking the Beam Jira, I found BEAM-5191
        <https://jira.apache.org/jira/browse/BEAM-5191>. Is there any way I can
        help to push this forward and make this feature possible in the near
        future?

        Thanks in advance,
        Wout

        [1] https://cloud.google.com/bigquery/docs/clustered-tables

        [2] "errorResult" : {
               "message" : "Incompatible table partitioning specification. Expects partitioning specification interval(type:day,field:publish_time) clustering(clustering_id), but input partitioning specification is interval(type:day,field:publish_time)",
               "reason" : "invalid"
             }
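
The error in [2] suggests that the destination table was created with both day partitioning and clustering, while the load request carried only the partitioning. As a rough sketch (plain Python, not Beam code), the BigQuery REST API expresses these two settings as separate `timePartitioning` and `clustering` objects in a load job configuration. The helper name `build_load_config` and the project, dataset, and table IDs are hypothetical; the field names `publish_time` and `clustering_id` are taken from the error message.

```python
import json


def build_load_config(table_ref, partition_field, clustering_fields=None):
    """Build the `configuration.load` part of a BigQuery load job request.

    Hypothetical helper for illustration only: it builds a plain dict and
    does not call any BigQuery or Beam API.
    """
    load = {
        "destinationTable": table_ref,
        # Day-based partitioning on a column.
        "timePartitioning": {"type": "DAY", "field": partition_field},
    }
    if clustering_fields:
        # Clustering is a separate object next to timePartitioning.
        load["clustering"] = {"fields": list(clustering_fields)}
    return {"configuration": {"load": load}}


# Placeholder identifiers, not taken from the thread.
table = {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "events"}

# Without clustering: resembles the "input partitioning specification" in [2].
without_clustering = build_load_config(table, "publish_time")

# With clustering: resembles what the table "expects" in [2].
with_clustering = build_load_config(table, "publish_time", ["clustering_id"])

print(json.dumps(with_clustering, indent=2))
```

Whether a given client library version can emit the `clustering` object depends on the API model classes it ships, which is the limitation described in the message above.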