Re: PSA: Make sure your Airflow instance isn't public and isn't Google indexed


This is a great idea, but we'd appreciate a setting that disables the
banner even if those conditions aren't met - our instance is deployed
without authentication, but is only accessible via our intranet.

Alek


On Tue, Jun 5, 2018, 3:35 PM James Meickle <jmeickle@xxxxxxxxxxxxxx> wrote:

> I think that a banner notification would be a fair penalty if you access
> Airflow without authentication, or have API authentication turned off, or
> are accessing via http:// with a non-localhost `Host:`. (Are there any
> other circumstances to think of?)
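
The conditions James lists, plus the opt-out requested for intranet-only deployments, could be expressed roughly as follows. This is only a sketch: `insecure_reasons`, `suppress_banner`, and `LOCAL_HOSTS` are hypothetical names, not existing Airflow settings or functions.

```python
from typing import List

# Hosts treated as "local"; an assumption for this sketch.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def _hostname(host_header: str) -> str:
    """Strip an optional port from a Host header value."""
    host = host_header.strip().lower()
    if host.startswith("["):  # bracketed IPv6 literal, e.g. [::1]:8080
        return host[1:host.find("]")]
    return host.rsplit(":", 1)[0] if ":" in host else host


def insecure_reasons(
    web_auth_enabled: bool,
    api_auth_enabled: bool,
    scheme: str,
    host_header: str,
    suppress_banner: bool = False,  # hypothetical opt-out setting
) -> List[str]:
    """Return the reasons a warning banner would be shown (empty list: no banner)."""
    if suppress_banner:
        return []
    reasons = []
    if not web_auth_enabled:
        reasons.append("web authentication is off")
    if not api_auth_enabled:
        reasons.append("API authentication is off")
    if scheme == "http" and _hostname(host_header) not in LOCAL_HOSTS:
        reasons.append("plain http:// with a non-localhost Host")
    return reasons
```

With `suppress_banner=True` the function returns an empty list regardless of the other inputs, which is the behaviour an intranet-only deployment without authentication would want.
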
>
> I would also suggest serving a default robots.txt to mitigate accidental
> indexing of public instances (as most public instances will be accidentally
> public, statistically speaking). If you truly want your Airflow instance
> public and indexed, you should have to go out of your way to permit that.
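
A default robots.txt could be served without touching the UI code by wrapping the web application in a small piece of WSGI middleware. The sketch below uses only the WSGI calling convention; `with_default_robots` and `allow_indexing` are hypothetical names, and nothing here is part of Airflow today.

```python
# Deny all well-behaved crawlers by default.
ROBOTS_TXT = b"User-agent: *\nDisallow: /\n"


def with_default_robots(app, allow_indexing=False):
    """Wrap a WSGI app so /robots.txt disallows indexing unless opted out."""
    if allow_indexing:
        # Operators who really want to be indexed get the app unchanged.
        return app

    def middleware(environ, start_response):
        if environ.get("PATH_INFO") == "/robots.txt":
            start_response(
                "200 OK",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(ROBOTS_TXT))),
                ],
            )
            return [ROBOTS_TXT]
        # Every other path is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

robots.txt only asks crawlers not to index the site; it does nothing to restrict access, so it complements authentication and network controls rather than replacing them.
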
>
> On Tue, Jun 5, 2018 at 1:51 PM, Maxime Beauchemin <
> maximebeauchemin@xxxxxxxxx> wrote:
>
> > What about a clear alert on the UI showing when auth is off? Perhaps a
> > large red triangle-exclamation icon on the navbar with a tooltip
> > "Authentication is off, this Airflow instance is not secure." and
> > clicking it takes you to the docs' security page.
> >
> > Well and then of course people should make sure their infra isn't open to
> > the Internet. We really shouldn't have to tell people to keep their
> > infrastructure behind a firewall. In most environments you have to do
> > quite a bit of work to open any resource up to the Internet (SSL certs,
> > special security groups for load balancers/proxies, ...). Now I'm curious
> > to understand how UMG managed to do this by mistake...
> >
> > Also a quick reminder to use the Connection abstraction to store secrets,
> > ideally using the environment variable feature.
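
For the environment variable feature Max mentions: Airflow reads connections from variables named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. A small helper for building such a pair might look like this; `airflow_conn_env` is a hypothetical helper, and the connection values in the usage note are made up.

```python
from typing import Tuple
from urllib.parse import quote


def airflow_conn_env(
    conn_id: str,
    conn_type: str,
    login: str,
    password: str,
    host: str,
    port: int,
    schema: str = "",
) -> Tuple[str, str]:
    """Return (variable name, URI value) for an environment-defined connection."""
    name = "AIRFLOW_CONN_" + conn_id.upper()
    # Percent-encode credentials so characters like '@' or '/' survive URI parsing.
    uri = "{}://{}:{}@{}:{}/{}".format(
        conn_type,
        quote(login, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(schema, safe=""),
    )
    return name, uri
```

For example, `airflow_conn_env("my_db", "postgres", "etl", "p@ss/word", "db.internal", 5432, "analytics")` yields the name `AIRFLOW_CONN_MY_DB` and a URI with the password percent-encoded. Setting the variable in the deployment environment keeps the secret out of DAG code and out of the metadata database.
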
> >
> > Max
> >
> > On Tue, Jun 5, 2018 at 10:02 AM Taylor Edmiston <tedmiston@xxxxxxxxx>
> > wrote:
> >
> > > One of our engineers wrote a blog post about the UMG mistakes as well.
> > >
> > > https://www.astronomer.io/blog/universal-music-group-airflow-leak/
> > >
> > > I know that best practices are well known here, but I second James'
> > > suggestion that we add some docs, code, or config so that the framework
> > > optimizes for being (nearly) production-ready by default and not just
> > > easy to start with for local dev.  Admittedly this takes some work to
> > > not add friction to the local onboarding experience.
> > >
> > > Do most people keep separate airflow.cfg files per environment, like
> > > what's considered the best practice in the Django world?  e.g.
> > > https://stackoverflow.com/q/10664244/149428
> > >
> > > Taylor
> > >
> > > *Taylor Edmiston*
> > > Blog <https://blog.tedmiston.com/> | CV
> > > <https://stackoverflow.com/cv/taylor> | LinkedIn
> > > <https://www.linkedin.com/in/tedmiston/> | AngelList
> > > <https://angel.co/taylor> | Stack Overflow
> > > <https://stackoverflow.com/users/149428/taylor-edmiston>
> > >
> > >
> > > On Tue, Jun 5, 2018 at 9:57 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
> > > wrote:
> > >
> > > > Bumping this one because now Airflow is in the news over it...
> > > >
> > > > https://www.bleepingcomputer.com/news/security/contractor-exposes-credentials-for-universal-music-groups-it-infrastructure/?utm_campaign=Security%2BNewsletter&utm_medium=email&utm_source=Security_Newsletter_co_79
> > > >
> > > > On Fri, Mar 23, 2018 at 9:33 AM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
> > > > wrote:
> > > >
> > > > > While Googling something Airflow-related a few weeks ago, I noticed
> > > > > that someone's Airflow dashboard had been indexed by Google and was
> > > > > accessible to the outside world without authentication. A little more
> > > > > Googling revealed a handful of other indexed instances in various
> > > > > states of security. I did my best to contact the operators, and waited
> > > > > for responses before posting this.
> > > > >
> > > > > Airflow is not a secure project by default
> > > > > (https://issues.apache.org/jira/browse/AIRFLOW-2047), and you can do
> > > > > all sorts of mean things to an instance that hasn't been intentionally
> > > > > locked down. (And even then, you shouldn't rely exclusively on your
> > > > > app's authentication for providing security.)
> > > > >
> > > > > Having "internal" dashboards/data sources/executors exposed to the web
> > > > > is dangerous, since old versions can stick around for a very long
> > > > > time, help compromise unrelated deployments, and generally just create
> > > > > very bad press for the overall project if there's ever a mass
> > > > > compromise (see: Redis and MongoDB).
> > > > >
> > > > > Shipping secure defaults is hard, but perhaps we could add best
> > > > > practices like instructions for deploying a robots.txt with Airflow?
> > > > > Or an impact statement about what someone could do if they access your
> > > > > Airflow instance? I think that many people deploying Airflow for the
> > > > > first time might not realize that it can get indexed, or how much
> > > > > damage someone can cause via accessing it.
> > > > >
> > > >
> > >
> >
>