Re: Make sure your Airflow instance isn't public and isn't Google indexed


+1 for securing it by default. I also suggest creating an admin user password when running "initdb".

On 05/06/2018, 22:49, "Bolke de Bruin" <bdbruin@xxxxxxxxx> wrote:

    Tbh, I would like to move to a setup that is secure by default. Airflow is being used more and more, which also increases the attack surface. When you run “initdb” or “resetdb”, it is easy to provide a generated password.
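
A minimal sketch of what "provide a generated password" could look like, using only the Python standard library. The helper name `generate_admin_password` is hypothetical and is not part of Airflow's "initdb" or "resetdb"; how the password would be stored or shown to the operator is left open here.

```python
import secrets


def generate_admin_password(num_bytes: int = 24) -> str:
    """Return a random URL-safe password (hypothetical helper, not an Airflow API)."""
    # token_urlsafe draws from the operating system's CSPRNG;
    # 24 random bytes encode to a 32-character string.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # An init step could print this once so the operator can record it.
    print("Generated admin password:", generate_admin_password())
```

Printing the password once at initialisation is only one option; writing it to a file readable only by the operator would work as well.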
    
    I no longer see a reason for having an unsecured version.
    
    B.
    
    Sent from my iPad
    
    > On 5 Jun 2018, at 23:11, Christopher Bockman <chris@xxxxxxxxxxxxxxx> wrote:
    > 
    > +1 to being able to disable--we have authentication in place, but use a
    > separate solution that (probably?) Airflow won't realize is enabled, so
    > having a continuous giant warning banner would be rather unfortunate.
    > 
    >> On Tue, Jun 5, 2018 at 2:05 PM, Alek Storm <alek.storm@xxxxxxxxx> wrote:
    >> 
    >> This is a great idea, but we'd appreciate a setting that disables the
    >> banner even if those conditions aren't met - our instance is deployed
    >> without authentication, but is only accessible via our intranet.
    >> 
    >> Alek
    >> 
    >> 
    >> On Tue, Jun 5, 2018, 3:35 PM James Meickle <jmeickle@xxxxxxxxxxxxxx>
    >> wrote:
    >> 
    >>> I think that a banner notification would be a fair penalty if you access
    >>> Airflow without authentication, or have API authentication turned off, or
    >>> are accessing via http:// with a non-localhost `Host:`. (Are there any
    >>> other circumstances to think of?)
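
The three conditions above can be sketched as a single predicate. This is an illustration only: the function name and its parameters are hypothetical, and how Airflow would actually learn whether authentication is enabled is not covered.

```python
def should_show_security_banner(
    web_auth_enabled: bool,
    api_auth_enabled: bool,
    scheme: str,
    host_header: str,
) -> bool:
    """Return True when any of the conditions listed above holds."""
    # Strip an optional port; handle bracketed IPv6 such as "[::1]:8080".
    host = host_header.strip().lower()
    if host.startswith("["):
        host = host[1:].split("]", 1)[0]
    else:
        host = host.split(":", 1)[0]
    is_localhost = host in {"localhost", "127.0.0.1", "::1"}

    # Plain http:// is only tolerated when the request targets localhost.
    plain_http_remote = scheme.lower() == "http" and not is_localhost
    return (not web_auth_enabled) or (not api_auth_enabled) or plain_http_remote
```

For example, `should_show_security_banner(True, True, "http", "localhost:8080")` is False, while the same call with a non-localhost host is True.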
    >>> 
    >>> I would also suggest serving a default robots.txt to mitigate accidental
    >>> indexing of public instances (as most public instances will be
    >>> accidentally public, statistically speaking). If you truly want your
    >>> Airflow instance public and indexed, you should have to go out of your
    >>> way to permit that.
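
A default robots.txt of the kind suggested above could be as small as a deny-all policy. The sketch below defines such a body and checks it with the standard library's `urllib.robotparser`; the constant and helper names are hypothetical, and wiring the body into the webserver at `/robots.txt` is left out. Note that robots.txt is advisory: it only asks well-behaved crawlers to stay away and is not a substitute for authentication.

```python
from urllib.robotparser import RobotFileParser

# Deny-all policy: every crawler is asked to stay away from every path.
DEFAULT_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def crawler_may_fetch(robots_txt: str, path: str, user_agent: str = "*") -> bool:
    """Check a robots.txt body the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

An operator who really wants the instance indexed would then have to replace the default body deliberately, for example with an empty `Disallow:` rule.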
    >>> 
    >>> On Tue, Jun 5, 2018 at 1:51 PM, Maxime Beauchemin <
    >>> maximebeauchemin@xxxxxxxxx> wrote:
    >>> 
    >>>> What about a clear alert on the UI showing when auth is off? Perhaps a
    >>>> large red triangle-exclamation icon on the navbar with a tooltip
    >>>> "Authentication is off, this Airflow instance is not secure." and
    >>>> clicking it takes you to the docs' security page.
    >>>> 
    >>>> Well, and then of course people should make sure their infra isn't open
    >>>> to the Internet. We really shouldn't have to tell people to keep their
    >>>> infrastructure behind a firewall. In most environments you have to do
    >>>> quite a bit of work to open any resource up to the Internet (SSL certs,
    >>>> special security groups for load balancers/proxies, ...). Now I'm curious
    >>>> to understand how UMG managed to do this by mistake...
    >>>> 
    >>>> Also a quick reminder to use the Connection abstraction to store secrets,
    >>>> ideally using the environment variable feature.
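
As a rough illustration of the environment variable feature: Airflow can read a connection from a variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a URI, which keeps the secret out of DAG code. The sketch below does not import Airflow; it only shows, with the standard library, which parts such a URI carries. The helper `connection_from_env`, the connection id `my_db`, and the example URI are assumptions made for this example.

```python
import os
from urllib.parse import unquote, urlsplit


def connection_from_env(conn_id: str) -> dict:
    """Parse AIRFLOW_CONN_<CONN_ID> into its parts (illustrative helper only)."""
    uri = os.environ["AIRFLOW_CONN_" + conn_id.upper()]
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials are percent-encoded inside a URI, so decode them.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    # Normally set by the deployment environment, never committed to a repo.
    os.environ["AIRFLOW_CONN_MY_DB"] = (
        "postgres://etl_user:s3cr%40t@db.internal:5432/warehouse"
    )
    print(connection_from_env("my_db")["host"])
```

In a real DAG one would let Airflow's own hooks resolve the connection by its id rather than parsing the variable by hand.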
    >>>> 
    >>>> Max
    >>>> 
    >>>> On Tue, Jun 5, 2018 at 10:02 AM Taylor Edmiston <tedmiston@xxxxxxxxx>
    >>>> wrote:
    >>>> 
    >>>>> One of our engineers wrote a blog post about the UMG mistakes as well.
    >>>>> 
    >>>>> https://www.astronomer.io/blog/universal-music-group-airflow-leak/
    >>>>> 
    >>>>> I know that best practices are well known here, but I second James'
    >>>>> suggestion that we add some docs, code, or config so that the framework
    >>>>> optimizes for being (nearly) production-ready by default and not just
    >>>>> easy to start with for local dev.  Admittedly, it takes some work to do
    >>>>> this without adding friction to the local onboarding experience.
    >>>>> 
    >>>>> Do most people keep separate airflow.cfg files per environment, as is
    >>>>> considered best practice in the Django world?  e.g.
    >>>>> https://stackoverflow.com/q/10664244/149428
    >>>>> 
    >>>>> Taylor
    >>>>> 
    >>>>> *Taylor Edmiston*
    >>>>> Blog <https://blog.tedmiston.com/> | CV
    >>>>> <https://stackoverflow.com/cv/taylor> | LinkedIn
    >>>>> <https://www.linkedin.com/in/tedmiston/> | AngelList
    >>>>> <https://angel.co/taylor> | Stack Overflow
    >>>>> <https://stackoverflow.com/users/149428/taylor-edmiston>
    >>>>> 
    >>>>> 
    >>>>> On Tue, Jun 5, 2018 at 9:57 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
    >>>>> wrote:
    >>>>> 
    >>>>>> Bumping this one because now Airflow is in the news over it...
    >>>>>> 
    >>>>>> https://www.bleepingcomputer.com/news/security/contractor-exposes-credentials-for-universal-music-groups-it-infrastructure/?utm_campaign=Security%2BNewsletter&utm_medium=email&utm_source=Security_Newsletter_co_79
    >>>>>> 
    >>>>>> On Fri, Mar 23, 2018 at 9:33 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
    >>>>>> wrote:
    >>>>>> 
    >>>>>>> While Googling something Airflow-related a few weeks ago, I noticed
    >>>>>>> that someone's Airflow dashboard had been indexed by Google and was
    >>>>>>> accessible to the outside world without authentication. A little more
    >>>>>>> Googling revealed a handful of other indexed instances in various
    >>>>>>> states of security. I did my best to contact the operators, and
    >>>>>>> waited for responses before posting this.
    >>>>>>> 
    >>>>>>> Airflow is not a secure project by default
    >>>>>>> (https://issues.apache.org/jira/browse/AIRFLOW-2047), and you can do
    >>>>>>> all sorts of mean things to an instance that hasn't been
    >>>>>>> intentionally locked down. (And even then, you shouldn't rely
    >>>>>>> exclusively on your app's authentication for providing security.)
    >>>>>>> 
    >>>>>>> Having "internal" dashboards/data sources/executors exposed to the web
    >>>>>>> is dangerous, since old versions can stick around for a very long
    >>>>>>> time, help compromise unrelated deployments, and generally just
    >>>>>>> create very bad press for the overall project if there's ever a mass
    >>>>>>> compromise (see: Redis and MongoDB).
    >>>>>>> 
    >>>>>>> Shipping secure defaults is hard, but perhaps we could add best
    >>>>>>> practices like instructions for deploying a robots.txt with Airflow?
    >>>>>>> Or an impact statement about what someone could do if they access
    >>>>>>> your Airflow instance? I think that many people deploying Airflow for
    >>>>>>> the first time might not realize that it can get indexed, or how much
    >>>>>>> damage someone can cause via accessing it.
    >>>>>>> 



Kaxil Naik 

Data Reply
2nd Floor, Nova South
160 Victoria Street, Westminster
London SW1E 5LB - UK 
phone: +44 (0)20 7730 6000
k.naik@xxxxxxxxx
www.reply.com