Re: Dealing with data latency


The common standard is to have the execution_date aligned with the
partition date in the database (say 2018-08-08), with that partition
containing data from 2018-08-08T00:00:00.000 to 2018-08-08T23:59:59.999.

The partition date and execution_date match and correspond to the left
bound of the time interval processed.
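
To make the convention concrete, here is a minimal sketch in plain Python
(no scheduler involved). The helper names partition_interval and
partition_key are made up for illustration, and a daily period is assumed.

```python
from datetime import datetime, timedelta


def partition_interval(execution_date, period=timedelta(days=1)):
    """Return the half-open interval [start, end) processed by one run.

    The execution_date is the left bound; the right bound is exclusive,
    so a daily run for 2018-08-08 covers everything before 2018-08-09.
    """
    start = execution_date
    end = start + period
    return start, end


def partition_key(execution_date):
    """Partition date string that matches the execution_date (assumed YYYY-MM-DD)."""
    return execution_date.strftime("%Y-%m-%d")


start, end = partition_interval(datetime(2018, 8, 8))
print(partition_key(datetime(2018, 8, 8)), start.isoformat(), end.isoformat())
```

Using an exclusive right bound avoids having to spell out 23:59:59.999 and
leaves no gap between consecutive partitions.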

Then you'd use sensors to make sure the task cannot run until the desired
time has arrived or the required conditions are met.
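
As a rough illustration of the check a time-based sensor would perform
(Airflow's TimeDeltaSensor plays a similar role), here is a sketch in plain
Python. The function names and the 72-hour constant are assumptions taken
from the question below, not part of any library.

```python
from datetime import datetime, timedelta

DAILY = timedelta(days=1)        # assumed schedule period
LATENCY = timedelta(hours=72)    # assumed delay before the feed is final


def earliest_pull_time(execution_date, period=DAILY, latency=LATENCY):
    """First moment the data for this execution_date is considered final.

    The period ends at execution_date + period; the feed is final once
    the latency has elapsed after that point.
    """
    return execution_date + period + latency


def ready_to_pull(execution_date, now, period=DAILY, latency=LATENCY):
    """True once the wait is over; a sensor would poll a check like this."""
    return now >= earliest_pull_time(execution_date, period, latency)


monday = datetime(2018, 6, 4)
print(earliest_pull_time(monday))                          # end of Thursday
print(ready_to_pull(monday, datetime(2018, 6, 7, 12, 0)))  # Thursday noon
```

The run keeps Monday as its execution_date and partition date; only the
moment at which the pull is allowed to start moves.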

Max

On Mon, Jun 4, 2018 at 5:46 AM Pedro Machado <pedro@xxxxxxxxxxxxxx> wrote:

> Hi. What is the recommended way to deal with data latency? For example, I
> have a feed that is not considered final until 72 hours have passed after
> the end of the daily period.
>
> For example, Monday's data would be ready by Thursday at 23:59.
>
> Should I pull data based on the execution date minus a 72 hour offset or
> use the execution date and somehow delay the data pull for 72 hours?
>
> The latter would be more intuitive (data pull date = execution date) but I
> am not sure if it's a good pattern.
>
> Thanks,
>
> Pedro
>