Re: Pentaho to Airflow


Hey Arash --

We wrote this for a use case similar to yours (as I understand it). It's an
opinionated operator (it assumes the data is loaded from AWS S3), but it has a
pseudo-"upsert" (INSERT ... ON DUPLICATE KEY UPDATE) method for loading
data, so you might be able to adapt it to your needs.

https://github.com/airflow-plugins/mysql_plugin/blob/master/operators/s3_to_mysql_operator.py#L9
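
For reference, the INSERT ... ON DUPLICATE KEY UPDATE pattern can be sketched
roughly as below. This is a minimal illustration, not the code from the linked
operator: the function name build_upsert, its arguments, and the example table
are all hypothetical. It assumes the target MySQL table has a primary or unique
key on the lookup columns, and that table and column names come from trusted
code rather than user input.

```python
def build_upsert(table, columns, key_columns):
    """Build a parameterized MySQL upsert statement (hypothetical helper).

    Rows whose key already exists get their non-key columns overwritten;
    all other rows are inserted. The %s placeholders follow the DB-API
    "format" paramstyle used by common MySQL drivers.
    """
    # Only non-key columns are rewritten when a duplicate key is hit.
    update_columns = [c for c in columns if c not in key_columns]
    if not update_columns:
        raise ValueError("need at least one non-key column to update")

    column_list = ", ".join("`{}`".format(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    assignments = ", ".join(
        "`{0}` = VALUES(`{0}`)".format(c) for c in update_columns
    )
    return (
        "INSERT INTO `{}` ({}) VALUES ({}) "
        "ON DUPLICATE KEY UPDATE {}"
    ).format(table, column_list, placeholders, assignments)


# Example: upsert into a hypothetical `users` table keyed on `id`.
sql = build_upsert("users", ["id", "name", "email"], ["id"])
rows = [(1, "Ada", "ada@example.com"), (2, "Grace", "grace@example.com")]
# With a DB-API cursor this would be run as: cursor.executemany(sql, rows)
```

The statement and the row tuples are kept separate so the driver handles value
escaping; batching the rows through executemany is what makes it a bulk load.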

-Ben

On Tue, Jun 5, 2018 at 8:55 PM Arash Soheili <tonyarash@xxxxxxxxx> wrote:

> I have looked through those and didn't find what I needed, although there
> is the MySQL operator, and I have used that to implement an insert or
> update.
>
> I was looking for something like this:
>
> https://wiki.pentaho.com/plugins/servlet/mobile?contentId=8292089#content/view/8292089
>
> That is, a way to bulk insert or update based on a lookup key. What would be
> the most optimized way to do this in Airflow?
>
> On Tue, Jun 5, 2018, 9:47 PM Taylor Edmiston <tedmiston@xxxxxxxxx> wrote:
>
> > Hey Arash -
> >
> > There are some common operators built in to Airflow
> > <https://github.com/apache/incubator-airflow/tree/master/airflow/operators>
> > and some in contrib
> > <https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/operators>
> > as well.
> >
> > We also maintain a community-sourced GitHub org of Airflow plugins
> > (mostly hooks and operators) at https://github.com/airflow-plugins.
> >
> > Are there specific sources/destinations you're looking for to match what
> > you use in Pentaho?
> >
> > Best,
> > Taylor
> >
> >
> >
> > On Tue, Jun 5, 2018 at 8:57 PM, Arash Soheili <tonyarash@xxxxxxxxx> wrote:
> >
> > > Hi,
> > >
> > > I'm new to Airflow and am helping our organization set up its transition
> > > away from Pentaho Data Integration for our ETL. Although there are a lot
> > > of things I don't like about Pentaho, it does have some nice standard
> > > modules, like batch database insert/update, which cover common ETL tasks.
> > >
> > > As I'm new to Airflow, I haven't seen any standard operators for this kind
> > > of task, which I would think is a common use case in Airflow or any
> > > ETL. Am I missing this information, or is each Airflow user expected
> > > to implement their own standard operators for this kind of operation?
> > > I would think this should at some point become part of the Airflow
> > > codebase.
> > >
> > > Arash
> > >
> >
>


-- 

*Ben Gregory*
Data Engineer
