Re: StreamingFileSink causing AmazonS3Exception


Hi Steffen, 

Thanks for reporting this.

Internally, Flink does not keep any open connections to S3. It only buffers data internally until 
the buffer reaches a minimum size limit (5 MB by default) and then uploads it in one go as a part 
of a multipart upload (MPU). Given this, I will have to dig a bit deeper to see why a connection 
would time out.
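
To make the buffering behaviour concrete, here is a minimal sketch of the idea. It is not the 
actual Flink code: the class PartBufferSketch and all of its methods are hypothetical names made up 
for illustration, and the "upload" is just an in-memory list standing in for an S3 part upload.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is NOT Flink's S3RecoverableFsDataOutputStream. */
final class PartBufferSketch {
    private final int minPartSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Stand-in for S3: each entry represents one uploaded part of the MPU.
    private final List<byte[]> uploadedParts = new ArrayList<>();

    PartBufferSketch(int minPartSize) {
        this.minPartSize = minPartSize;
    }

    /** Buffers the data and uploads one part as soon as the minimum size is reached. */
    void write(byte[] data) {
        buffer.write(data, 0, data.length);
        if (buffer.size() >= minPartSize) {
            uploadPart();
        }
    }

    /** Flushes whatever is left as the final part (S3 allows the last part to be smaller). */
    void close() {
        if (buffer.size() > 0) {
            uploadPart();
        }
    }

    // Assumption: a real implementation would issue a single upload-part request here,
    // so no connection needs to stay open between two uploads.
    private void uploadPart() {
        uploadedParts.add(buffer.toByteArray());
        buffer.reset();
    }

    List<byte[]> uploadedParts() {
        return uploadedParts;
    }

    int bufferedBytes() {
        return buffer.size();
    }
}
```

The point of the sketch is that writes below the threshold only touch local memory, and each 
part goes out in a single short-lived request once the threshold is crossed.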

If you are willing to dig into the code, all interactions with S3 pass through the S3AccessHelper 
class and its implementation, the HadoopS3AccessHelper. For the buffering and uploading logic, 
you could have a look at the S3RecoverableWriter and the S3RecoverableFsDataOutputStream.

I will keep looking into it. In the meantime, if you find anything, please let us know.

Cheers,
Kostas