
Issues with self executing jar and FileSystems API


I am trying to package a Beam Dataflow pipeline as a self-executing JAR using these instructions. However, I run into an odd issue when I try to execute the JAR.

My pipeline needs to read a file (an Avro schema, .avsc) from GCS before it starts working with PCollections, so the read happens outside of any PCollection. To do that I use the FileSystems API. This works perfectly fine when I execute the pipeline via mvn compile exec:java ..

However, if I run the pipeline as a JAR, it appears to treat the GCS path as a local path and fails with a FileNotFoundException.

Exception in thread "main" /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc (No such file or directory)
at Method)

(Note that the input path I pass is correct, with the double slash, but the error message seems to strip one slash out.
e.g.: --inputPath=gs://my-gcs-bucket/schema/my-schema.avsc)
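
For what it's worth, the path in the error has the same shape you get when plain java.io.File is handed the gs:// string: the doubled slash collapses and the working directory is prepended. The sketch below uses only the JDK, not Beam; the class and method names are made up for illustration, and it does not show where in Beam this happens, only that local-file handling would produce this kind of path.

```java
import java.io.File;

public class GcsPathAsLocalFile {
    // Returns the path java.io.File reports when the string is treated as a local file name.
    static String asLocalPath(String spec) {
        return new File(spec).getPath();
    }

    public static void main(String[] args) {
        String spec = "gs://my-gcs-bucket/schema/my-schema.avsc";

        // On a Unix-like system, File collapses the duplicate separator,
        // so this prints gs:/my-gcs-bucket/schema/my-schema.avsc
        System.out.println(asLocalPath(spec));

        // The name is relative (no leading slash), so the absolute form is
        // the current working directory followed by the collapsed path.
        System.out.println(new File(spec).getAbsolutePath());
    }
}
```

So the missing slash looks like a side effect of the path being resolved as a local file rather than a problem with the argument itself.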

Any pointers on what might be causing this?

- Sameer