Beam leaving temporary datasets in BigQuery


Hi,

We recently enabled two Beam batch jobs in production, each running daily, and have noticed a large number of temporary datasets being left behind in BigQuery (see attached). Both jobs read from and write to BigQuery, and we're using Beam 2.4.0. The jobs run as templates (using `withTemplateCompatibility()` when reading).

A similar issue has been reported at https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609.

The code to remove the datasets does appear to be there, but I'm not seeing its log messages in my job, so presumably it isn't being called: https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L151

There is nothing else obvious in the logs.

Any ideas or suggestions on how to track this issue down?
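
As a first step in narrowing this down, a small helper along these lines could pick the leftover datasets out of a list of dataset IDs, so they can be counted and matched against job runs. This is only a sketch: the `temp_dataset_` prefix is an assumption about how the temporary datasets are named and should be checked against the names actually seen in the project, and `TempDatasetFilter` and `leftoverCandidates` are hypothetical names. It does not call BigQuery; the IDs are assumed to have been listed separately.

```java
import java.util.ArrayList;
import java.util.List;

final class TempDatasetFilter {
    // ASSUMPTION: temporary datasets share this name prefix.
    // Verify against the dataset names in your own project.
    static final String ASSUMED_PREFIX = "temp_dataset_";

    // Returns the dataset IDs that look like leftover temporary datasets,
    // preserving input order. Null entries are skipped.
    static List<String> leftoverCandidates(List<String> datasetIds) {
        List<String> result = new ArrayList<>();
        for (String id : datasetIds) {
            if (id != null && id.startsWith(ASSUMED_PREFIX)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Comparing the number of candidates per day with the number of job runs may indicate whether every run leaves a dataset behind or only some of them.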

Thanks,
Andrew

Attachment: Screen Shot 2018-05-31 at 10.38.15.png
Description: PNG image