Re: Capturing data changes that happen after the initial data pull


This is a similar case. My idea was to rerun the whole data pull. The
current DAG is idempotent, so there is no issue with inserting duplicates.
Now I'm trying to figure out the best way to code it in Airflow. Thanks
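
For reference, a minimal sketch of one way this could be wired up, assuming a
weekly DAG (Tuesday cron) whose tasks each re-run the same pull for one day of
the previous week; pull_date and its API call are placeholders, not the actual
pull code:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def pull_date(days_back, **context):
        """Re-run the same API pull for execution_date - days_back.

        The load is idempotent, so re-inserting rows that already exist
        is harmless. The actual API client is omitted here.
        """
        target_date = context["execution_date"] - timedelta(days=days_back)
        # call the source API for target_date and load the rows
        print("Re-pulling data for %s" % target_date.date())


    default_args = {
        "owner": "airflow",
        "start_date": datetime(2018, 6, 1),
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    # Runs every Tuesday morning; each run re-pulls the 7 days of the
    # previous week instead of a single date.
    with DAG("weekly_repull",
             default_args=default_args,
             schedule_interval="0 6 * * 2",
             catchup=False) as dag:
        for offset in range(1, 8):
            PythonOperator(
                task_id="repull_day_minus_{}".format(offset),
                python_callable=pull_date,
                op_kwargs={"days_back": offset},
                provide_context=True,
            )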

On Wed, Jun 6, 2018 at 2:29 PM Ben Gregory <ben@xxxxxxxxxxxxx> wrote:

> I've seen a similar use case with DoubleClick/Google Analytics (
> https://support.google.com/ds/answer/2791195?hl=en), where the reporting
> metrics have a "lookback window" of up to 30 days for conversion
> attribution (so if a user converts on the 14th day after clicking on an ad,
> it will still be counted).
>
> What we ended up doing in that case is setting the start/stop params of the
> query to the full window (pulled daily) and then upserting in Redshift
> based on a primary key (in our case actually a composite key with multiple
> attributes). So you end up pulling a lot of redundant data, but since
> there's no way to pull only updated records (which sounds like the case
> you're in), it's the best way to ensure your reporting is up-to-date.
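
A minimal sketch of the staging-table upsert pattern described above, assuming
psycopg2 and placeholder table/key names (target_table, staging_table,
account_id, report_date); Redshift has no native upsert, so the staged rows
replace any matching target rows on the composite key within one transaction:

    import psycopg2

    # Delete-then-insert on the composite key; assumes the daily pull has
    # already loaded the full lookback window into staging_table.
    UPSERT_SQL = """
    DELETE FROM target_table
    USING staging_table
    WHERE target_table.account_id = staging_table.account_id
      AND target_table.report_date = staging_table.report_date;

    INSERT INTO target_table
    SELECT * FROM staging_table;
    """

    def upsert_window(conn_params):
        # The connection context manager commits on success, so the delete
        # and insert are applied together.
        with psycopg2.connect(**conn_params) as conn:
            with conn.cursor() as cur:
                cur.execute(UPSERT_SQL)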
>
>
>
> On Wed, Jun 6, 2018 at 1:10 PM Pedro Machado <pedro@xxxxxxxxxxxxxx> wrote:
>
> > Yes. It's literally the same API calls with the same dates, only done a
> > few days later. It's just redoing the same data pull, but instead of
> > pulling one date each DAG run, it would pull all dates for the previous
> > week on Tuesdays.
> >
> > Thanks!
> >
>
>
> --
>
> *Ben Gregory*
> Data Engineer
>
> Mobile: +1-615-483-3653 • Online: astronomer.io <https://www.astronomer.io/>
>
> Download our new ebook: From Volume to Value - A Guide to Data Engineering.
> <http://marketing.astronomer.io/guide/>
>