
Pipeline is passing on local runner and failing on Dataflow runner - help with error


I am running the following pipeline on the local runner with no issues:

p = beam.Pipeline(options=options)
samplePath = outputPath
ExploreData = (p | "Extract the rows from dataframe" >>'archs4.Debug_annotation'))
                 | "create more columns" >> beam.ParDo(CreateColForSampleFn(colListSubset,outputPath)))
(ExploreData | 'writing to TSV files' >>'gs://archs4/output/dataExploration.txt',file_name_suffix='.tsv',num_shards=1,append_trailing_newlines=True,header=colListStrHeader))

Running the same pipeline on the Dataflow runner raises the error below. I have no idea where to look for the issue: the traceback does not point to my pipeline code but to Apache Beam modules.
I will try debugging by elimination. Please let me know if you have any pointers.

Many thanks,

DataflowRuntimeException                  Traceback (most recent call last)
<ipython-input-151-1e5aeb8b7d9b> in <module>()
----> 1

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.pyc in wait_until_finish(self, duration)
    776         raise DataflowRuntimeException(
    777             'Dataflow pipeline failed. State: %s, Error:\n%s' %
--> 778             (self.state, getattr(self._runner, 'last_error_msg', None)), self)
    779     return self.state

DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/", line 581, in do_work
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/", line 166, in execute
  File "dataflow_worker/", line 283, in dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:10680)
    def start(self):
  File "dataflow_worker/", line 284, in dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:10574)
    with self.scoped_start_state:
  File "dataflow_worker/", line 289, in dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:9775)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/", line 225, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/", line 277, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/", line 266, in load
    obj = pik.load()
  File "/usr/lib/python2.7/", line 858, in load
  File "/usr/lib/python2.7/", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/", line 423, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/", line 1124, in find_class
ImportError: No module named indexes.base
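
The traceback ends while the worker is unpickling the DoFn, so one way to do the elimination is to check which modules each object handed to the DoFn would force the worker to import. The sketch below is only an illustration under assumptions: it uses the standard-library pickle as a stand-in for dill, and the candidate values are hypothetical placeholders, not the real colListSubset or outputPath. If an argument turns out to reference a third-party module (for example a dataframe library, which is a guess based on the step name), then a version mismatch between the local environment and the Dataflow workers would be a plausible cause to check.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_modules(obj):
    """Return the set of module names that unpickling `obj` would import."""
    # Protocol 2 records every class reference as a GLOBAL opcode whose
    # argument is the string "module name", which makes it easy to scan.
    payload = pickle.dumps(obj, protocol=2)
    modules = set()
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":
            module, _name = arg.split(" ", 1)
            modules.add(module)
    return modules


# Hypothetical stand-ins for the arguments passed to the DoFn constructor.
candidates = {
    "plain_list": ["colA", "colB"],          # built-in types only
    "library_object": Fraction(1, 3),        # drags in the `fractions` module
}

for label, value in candidates.items():
    # An empty list means the value can be unpickled without extra imports.
    print(label, sorted(referenced_modules(value)))
```

Any argument that reports a module which is missing or older on the workers is a candidate to convert to plain Python types (for example a list of strings) before passing it to the DoFn.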