Re: Capturing data changes that happen after the initial data pull


I've seen a similar use case with DoubleClick/Google Analytics
(https://support.google.com/ds/answer/2791195?hl=en), where the reporting
metrics have a "lookback window" of up to 30 days for conversion
attribution (so if a user converts 14 days after clicking on an ad, the
conversion will still be counted).

What we ended up doing in that case was setting the start/stop params of the
query to the full window (pulled daily) and then upserting into Redshift
based on a primary key (in our case actually a composite key with multiple
attributes). So you end up pulling a lot of redundant data, but since
there's no way to pull only updated records (which sounds like the case
you're in), it's the best way to ensure your reporting is up-to-date.
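
The pattern above can be sketched roughly as follows. This is an
illustration only, not the actual pipeline code: `lookback_window`,
`upsert`, the 30-day default, and the field names are all assumptions, and
the in-memory dict stands in for the Redshift table.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 30  # assumption: matches the API's attribution window


def lookback_window(run_date, lookback_days=LOOKBACK_DAYS):
    """Return (start, end) dates covering the full window ending at run_date."""
    return run_date - timedelta(days=lookback_days), run_date


def upsert(table, rows, key_fields):
    """Insert rows into `table` (a dict keyed by the composite key).

    A row whose key already exists replaces the old row, so re-pulling
    the same dates refreshes the data instead of duplicating it.
    """
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        table[key] = row
    return table


# Each daily run re-requests the whole window, not just the latest day.
start, end = lookback_window(date(2018, 6, 6))

# Hypothetical rows; in practice these come from the API response.
table = {}
key = ("date", "ad_id")
upsert(table, [{"date": "2018-05-20", "ad_id": 1, "conversions": 2}], key)
# A later pull of the same date returns an updated count for the same key.
upsert(table, [{"date": "2018-05-20", "ad_id": 1, "conversions": 3}], key)
```

In Redshift itself, the same effect is commonly achieved by loading the pull
into a staging table, then deleting the matching keys from the target and
inserting the staged rows in one transaction.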



On Wed, Jun 6, 2018 at 1:10 PM Pedro Machado <pedro@xxxxxxxxxxxxxx> wrote:

> Yes. It's literally the same API calls with the same dates, only done a few
> days later. It's just redoing the same data pull but instead of pulling one
> date each dag run, it would pull all dates for the previous week on
> Tuesdays.
>
> Thanks!
>


-- 

*Ben Gregory*
Data Engineer

Mobile: +1-615-483-3653 • Online: astronomer.io <https://www.astronomer.io/>