
Re: PSA: Make sure your Airflow instance isn't public and isn't Google indexed


Agreed, secured by default is ideal. Though I wouldn't want people to get
an unreasonable sense of safety and open their instance to the web.

I like the idea of generating a temporary key/token and exposing it in the
console where the process was started. The other option is to use the
database/password mechanism by default and add an `airflow create-user
--admin` CLI command to generate a user. With the level of cluelessness
we're observing we should probably force a certain password complexity
level.
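
To make the two options concrete, here is a rough Python sketch, not
Airflow code. The function names (`generate_startup_token`,
`password_problems`) and the specific complexity rules are assumptions
made up for illustration. Only the standard-library `secrets` and
`string` modules are real.

```python
import secrets
import string
from typing import List


def generate_startup_token(nbytes: int = 32) -> str:
    """Return a random URL-safe token to print once in the console.

    Hypothetical helper: the name and default size are assumptions.
    """
    return secrets.token_urlsafe(nbytes)


def password_problems(password: str, min_length: int = 12) -> List[str]:
    """Return the complexity rules a password fails (empty list if none).

    The rules below are illustrative, not an agreed-upon policy.
    """
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c in string.punctuation for c in password):
        problems.append("no punctuation character")
    return problems


if __name__ == "__main__":
    # Option 1: show a temporary token where the process was started.
    print(f"Temporary admin token (shown once): {generate_startup_token()}")
    # Option 2: reject weak passwords when creating a user.
    print(password_problems("airflow"))
```

A create-user command could refuse to proceed while `password_problems`
returns a non-empty list, and print the failed rules back to the operator.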

We should also state clearly in the docs that Airflow is not regularly
pen-tested and should not be exposed to the Internet.

For the record we had Airflow pen-tested at Airbnb by a third party in 2016
(or was it 2017?) and found/resolved half a dozen or so vulnerabilities.
Note that there's no recurring process in place, or any mechanisms to
prevent regressions beyond code review. Also note that the new [beta in
1.10] UI has not been pen-tested (to my knowledge).

Max

On Tue, Jun 5, 2018 at 2:48 PM Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:

> Tbh I'd like to move to a setup that is secure by default. Airflow is
> getting used more and more, which also increases the attack surface. If
> you run “initdb” or “resetdb” it is easy to provide a generated password.
>
> I don’t see a reason anymore for having an unsecured version.
>
> B.
>
> Sent from my iPad
>
> > On 5 Jun 2018, at 23:11, Christopher Bockman <chris@xxxxxxxxxxxxxxx>
> > wrote:
> >
> > +1 to being able to disable--we have authentication in place, but use a
> > separate solution that (probably?) Airflow won't realize is enabled, so
> > having a continuous giant warning banner would be rather unfortunate.
> >
> >> On Tue, Jun 5, 2018 at 2:05 PM, Alek Storm <alek.storm@xxxxxxxxx>
> >> wrote:
> >>
> >> This is a great idea, but we'd appreciate a setting that disables the
> >> banner even if those conditions aren't met - our instance is deployed
> >> without authentication, but is only accessible via our intranet.
> >>
> >> Alek
> >>
> >>
> >> On Tue, Jun 5, 2018, 3:35 PM James Meickle <jmeickle@xxxxxxxxxxxxxx>
> >> wrote:
> >>
> >>> I think that a banner notification would be a fair penalty if you
> >>> access Airflow without authentication, or have API authentication
> >>> turned off, or are accessing via http:// with a non-localhost `Host:`.
> >>> (Are there any other circumstances to think of?)
> >>>
> >>> I would also suggest serving a default robots.txt to mitigate
> >>> accidental indexing of public instances (as most public instances will
> >>> be accidentally public, statistically speaking). If you truly want
> >>> your Airflow instance public and indexed, you should have to go out of
> >>> your way to permit that.
> >>>
> >>> On Tue, Jun 5, 2018 at 1:51 PM, Maxime Beauchemin <
> >>> maximebeauchemin@xxxxxxxxx> wrote:
> >>>
> >>>> What about a clear alert on the UI showing when auth is off? Perhaps a
> >>>> large red triangle-exclamation icon on the navbar with a tooltip
> >>>> "Authentication is off, this Airflow instance is not secure." and
> >>>> clicking takes you to the docs' security page.
> >>>>
> >>>> Well and then of course people should make sure their infra isn't
> >>>> open to the Internet. We really shouldn't have to tell people to keep
> >>>> their infrastructure behind a firewall. In most environments you have
> >>>> to do quite a bit of work to open any resource up to the Internet
> >>>> (SSL certs, special security groups for load balancers/proxies, ...).
> >>>> Now I'm curious to understand how UMG managed to do this by mistake...
> >>>>
> >>>> Also a quick reminder to use the Connection abstraction to store
> >>>> secrets, ideally using the environment variable feature.
> >>>>
> >>>> Max
> >>>>
> >>>> On Tue, Jun 5, 2018 at 10:02 AM Taylor Edmiston <tedmiston@xxxxxxxxx>
> >>>> wrote:
> >>>>
> >>>>> One of our engineers wrote a blog post about the UMG mistakes as
> >>>>> well.
> >>>>>
> >>>>> https://www.astronomer.io/blog/universal-music-group-airflow-leak/
> >>>>>
> >>>>> I know that best practices are well known here, but I second James'
> >>>>> suggestion that we add some docs, code, or config so that the
> >>>>> framework optimizes for being (nearly) production-ready by default
> >>>>> and not just easy to start with for local dev. Admittedly this takes
> >>>>> some work to not add friction to the local onboarding experience.
> >>>>>
> >>>>> Do most people keep separate airflow.cfg files per environment like
> >>>>> what's considered the best practice in the Django world? e.g.
> >>>>> https://stackoverflow.com/q/10664244/149428
> >>>>>
> >>>>> Taylor
> >>>>>
> >>>>> *Taylor Edmiston*
> >>>>> Blog <https://blog.tedmiston.com/> | CV
> >>>>> <https://stackoverflow.com/cv/taylor> | LinkedIn
> >>>>> <https://www.linkedin.com/in/tedmiston/> | AngelList
> >>>>> <https://angel.co/taylor> | Stack Overflow
> >>>>> <https://stackoverflow.com/users/149428/taylor-edmiston>
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 5, 2018 at 9:57 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
> >>>>> wrote:
> >>>>>
> >>>>>> Bumping this one because now Airflow is in the news over it...
> >>>>>>
> >>>>>> https://www.bleepingcomputer.com/news/security/contractor-exposes-credentials-for-universal-music-groups-it-infrastructure/?utm_campaign=Security%2BNewsletter&utm_medium=email&utm_source=Security_Newsletter_co_79
> >>>>>>
> >>>>>> On Fri, Mar 23, 2018 at 9:33 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> While Googling something Airflow-related a few weeks ago, I
> >>>>>>> noticed that someone's Airflow dashboard had been indexed by
> >>>>>>> Google and was accessible to the outside world without
> >>>>>>> authentication. A little more Googling revealed a handful of
> >>>>>>> other indexed instances in various states of security. I did my
> >>>>>>> best to contact the operators, and waited for responses before
> >>>>>>> posting this.
> >>>>>>>
> >>>>>>> Airflow is not a secure project by default
> >>>>>>> (https://issues.apache.org/jira/browse/AIRFLOW-2047), and you can
> >>>>>>> do all sorts of mean things to an instance that hasn't been
> >>>>>>> intentionally locked down. (And even then, you shouldn't rely
> >>>>>>> exclusively on your app's authentication for providing security.)
> >>>>>>>
> >>>>>>> Having "internal" dashboards/data sources/executors exposed to
> >>>>>>> the web is dangerous, since old versions can stick around for a
> >>>>>>> very long time, help compromise unrelated deployments, and
> >>>>>>> generally just create very bad press for the overall project if
> >>>>>>> there's ever a mass compromise (see: Redis and MongoDB).
> >>>>>>>
> >>>>>>> Shipping secure defaults is hard, but perhaps we could add best
> >>>>>>> practices like instructions for deploying a robots.txt with
> >>>>>>> Airflow? Or an impact statement about what someone could do if
> >>>>>>> they access your Airflow instance? I think that many people
> >>>>>>> deploying Airflow for the first time might not realize that it
> >>>>>>> can get indexed, or how much damage someone can cause via
> >>>>>>> accessing it.
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>

