
RE: Define folder for task of dag


Thanks for your reply!

I used BashOperator to demonstrate what I want to get. In my real case I use SparkSubmitOperator, so I can't run "cd {your_work_space}; do something" there.
When my application jar is launched by spark-submit, it looks for local files.
e.g.
> pwd
/home/user
> spark-submit --class org.Job my_app.jar
starts scanning folder /home/user
...
> pwd
/tmp/demo
> spark-submit --class org.Job /home/user/my_app.jar
starts scanning folder /tmp/demo

If I knew in advance where Airflow would run the SparkSubmitOperator, I would be able to set that folder in the config for the scan.
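One way around this (a sketch, not from the thread itself): instead of letting the Spark job scan its current working directory, pass the folder to scan explicitly as an application argument, so the job never depends on where Airflow happens to launch it. The `--scan-dir` flag and the `/data/input` path below are hypothetical; your `org.Job` class would have to parse the argument.

```python
# Sketch: make the scan folder an explicit argument instead of relying
# on the operator's (unpredictable) working directory.
SCAN_DIR = "/data/input"  # assumed path, adjust to your setup


def build_spark_args(scan_dir=SCAN_DIR):
    """Arguments handed to the Spark job so it never depends on cwd.

    "--scan-dir" is a hypothetical flag your org.Job would need to parse.
    """
    return ["--scan-dir", scan_dir]


# With SparkSubmitOperator it might look like this (requires Airflow with
# the Spark hook/operator installed, so shown here as a comment only):
# submit = SparkSubmitOperator(
#     task_id='spark_job',
#     application='/home/user/my_app.jar',
#     java_class='org.Job',
#     application_args=build_spark_args(),
#     dag=dag)
```

This sidesteps the temporary-folder question entirely: the job reads the folder from its arguments rather than from `pwd`.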

But as I understand it, that isn't possible, is it?

Best Regards,
Anton 


-----Original Message-----
From: Song Liu <songliu@xxxxxxxxxxx> 
Sent: Friday, May 11, 2018 3:36 PM
To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Subject: RE: Define folder for task of dag

It seems that this temporary folder name can't be obtained in advance.

The folder you saw is a temporary directory created by the BashOperator to hold your bash command while it runs it. I think you could have your own workspace and run your logic there by simply doing "cd {your_work_space}; do something".
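The "cd {your_work_space}; do something" idea above can be sketched as a small helper that prefixes any bash command with a `cd` into a fixed directory. The `/home/user/jobs` path is an assumption for illustration; the BashOperator usage is shown as a comment since it needs a running Airflow installation.

```python
# Sketch: wrap a shell command so it executes inside a fixed working
# directory rather than the BashOperator's temporary one.
WORKSPACE = "/home/user/jobs"  # assumed path, adjust to your setup


def with_workspace(command, workspace=WORKSPACE):
    """Prefix `command` with a cd into a known working directory."""
    return "cd {}; {}".format(workspace, command)


# With a BashOperator it might look like (requires Airflow, comment only):
# pwd_fixed = BashOperator(
#     task_id='pwd_fixed',
#     bash_command=with_workspace('pwd'),
#     dag=dag)
```

With this, `pwd` would print `/home/user/jobs` on every run instead of a fresh `/tmp/airflowtmp...` directory.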
________________________________
From: Anton Mushin <Anton_Mushin@xxxxxxxx>
Sent: May 11, 2018 11:20
To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Subject: Define folder for task of dag

Hi everyone,
I need to know the folder where a DAG task runs.
For example, I have two tasks in a DAG:
pwd1 = BashOperator(
    task_id='pwd1',
    bash_command='pwd',
    dag=dag)

pwd2 = BashOperator(
    task_id='pwd2',
    bash_command='pwd',
    dag=dag)

As a result I get for pwd1:
{bash_operator.py:97} INFO - Output:
{bash_operator.py:101} INFO - /tmp/airflowtmp3u5tdpt_

for pwd2:
{bash_operator.py:97} INFO - Output:
{bash_operator.py:101} INFO - /tmp/airflowtmphiyryxno

Can I get the folder name where a DAG task will be executed?
In my case, that would mean getting /tmp/airflowtmp3u5tdpt_ and /tmp/airflowtmphiyryxno before running the tasks.

Best Regards,
Anton