Re: Recruiting more maintainers for Apache Arrow


Hey,

first of all, thanks a lot for your work, Uwe's, and that of the mergers
and contributors. Now, to the maintainer problem:

# Arrow as "a library"
One thing that makes Arrow special is that it is not a single library
but many (one for each language), and many of them are not just a
binding to a C/C++ lib but partly a complete re-implementation of the
protocol, e.g.:

- C++: one core, but also contains Python specialties
- Java: another core
- Rust: yet another core
- Python: a binding to C++ but also a lot more stuff because of Pandas
...

And you two are maintaining all of them, and I doubt that you have the
capacity and knowledge to do this at the desired level of quality
(which is natural, not a personal issue or offense). So this I would
call "pseudo-maintenance", since you're solely the gatekeepers who do
some shallow reviewing and carry the burden of the housekeeping and
the merging. So why accept these language bindings in the first
place without putting a core maintainer in place? For example, let's
say someone proposes a binding to Haskell now. That should not be
accepted as part of the official Apache implementation without a
dedicated maintainer (ideally the PR author would be that person, but
there may be others who step up).

Right now, it might be too late to remove some of the incomplete / WIP
implementations that don't have a core maintainer though.

# GitHub
Another special thing to consider is that Arrow is (ab)using GitHub
merely as a code hosting platform. Even for a contributor, this has
some obviously uncool consequences:

- you have yet another issue hosting system to log in to
- links to issues don't work in the usual magic way
- you're merging the PRs by closing them, which is not a very nice
  way because it does not reflect the contributors' work in the
  project overview and personal profiles, and exactly this is a
  large part of the GitHub community (btw: merging PRs without using
  GitHub's merge button IS possible, as bors/bors-ng prove)

So as a potential maintainer, this is already a bummer, since I know
that things are less comfortable than the system I would get from
any normal GitHub or GitLab project.

I'm not really sure how to solve this or if it should be solved (read
about the laziness aspect in "Contribution VS Maintenance" below).

# Time / Payment
Yes, this is indeed a big issue. From what I can tell from the open
source projects I was involved in, for large contributor crowds you
normally have full-time or half-time positions in place for the core
maintainers (look at the Mozilla projects, the Blender Foundation,
GNOME / Red Hat). So at some point I think maintaining isn't a
part-time / hobby thing anymore (without downgrading the hard work of
hobby contributors, on the contrary). I don't have a link at hand, but
I recall some discussion about GitHub and its importance for hiring
(since it acts as a CV) after MS bought it, and some of the responses
were "doing all this work in your free time is a privilege of wealthy,
mostly-white men", which, without my signing this statement in this
really bare form, already shows a problem of the open source world.

# Contribution VS Maintenance
The very "nice" thing about patch/PR contribution is that you do your
work and then you can walk away, and it's the maintainer's problem to
release the artifact, upgrade/migrate your code and ensure that the
tests you've written never break. It's comfortable. Being a maintainer
means the exact opposite. And in the end, you get blamed for not
supporting certain features (see the open source paragraph here
https://blog.ghost.org/5/ ) or for security disasters (remember the
OpenSSL disaster).

Together with the previous point, I think this means we have to get
companies to pay for that work, and not just dump their features into
an OSS repo.

# Path to Maintainership
So I think (from my narrow point of view!) that many people expect that
the path from "outsider" to "maintainer" takes the route over "a lot of
patch/PR contributions". If I'm reading your mail right, that is not
necessarily the case for Apache projects and I think that's great. The
"review PRs" path sounds great, but I think neither GitHub nor any other
platform I'm aware of does a good job of getting people to do so. I
mean, I see a PR and I can leave a review, but for me it is not really
clear what consequences this has (naturally, random people don't have a
veto on changes). So I can jump in when I think something is wrong, but
I cannot approve a PR. This makes sense, but it poses the question of
"how?!". I mean, it is pretty clear how to become a patch/PR
contributor, but it is not clear how to become a maintainer, at
least not in an easy way. (I'm sure it's written down somewhere.)

So, overall I think a clear call to action at the top of the README
could help. Like "Hey, we're looking for maintainers, you could start
by reviewing some PRs and after some reviews maintainers will just be
the last gatekeeper and after some more time, you can even merge PRs on
your own".

# My personal contribution
Triggered by this call for help, I'll try to get more involved in
Python, C++ and Rust reviews.

So, these are some thoughts that I hope may help.

Thanks again for addressing this issue and your time and passion,
Marco

On 2018/06/30 14:57:42, Wes McKinney <w...@xxxxxxxxx> wrote: 
> hi folks,
>
> Arrow has grown by leaps and bounds over the last 2.5 years. We are
> approaching our 2000th patch and on track to surpass 200 unique
> contributors by year end.
>
> All this contribution growth is great, but it has a hidden cost: the
> maintenance. The burden of maintaining the project: particularly
> reviewing and merging patches, has fallen on a very small number of
> people. From the commit logs, we can see how many patches each
> committer has merged:
> 
> $ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
>   1289  Wes McKinney
>    268  Uwe L. Korn
>     74  Korn, Uwe
>     54  Antoine Pitrou
>     52  Julien Le Dem
>     39  Philipp Moritz
>     18  Kouhei Sutou
>     18  Steven Phillips
>     13  Bryan Cutler
>     11  Jacques Nadeau
>     10  Phillip Cloud
>      8  Brian Hulette
>      5  Robert Nishihara
>      5  adeneche
>      4  GitHub
>      3  Sidd
>      3  siddharth
>      1  AbdelHakim Deneche
>      1  Your Name Here
> 
> So Uwe and I have merged ~84% of the patches in the project so far.
> This isn't a completely accurate reflection of the maintainer burden,
> since many others contribute to code reviews and other aspects of
> patch maintenance, and you have to be a committer to earn a place on
> this list.
>
> I'm not sure what's the best way to address this problem. The quality
> of our code review has declined at times as we struggle to keep up
> with the flow of patches -- I don't think this is good. Having the
> patch queue pile up isn't great either. Personally, I'm having a
> difficult time balancing project maintenance and patch authoring,
> particularly in the last 6 months.
>
> Unfortunately, many people believe that writing patches is the primary
> mode of contribution to an open source project. Apache projects
> explicitly state that non-patch contributions are valued in earning
> karma (committership and PMC membership). We're starting to have more
> corporate contributors come out of the woodwork, and while it's great
> for contributors to be paid to write patches for the project, they are
> rarely given the time and space to contribute meaningfully to
> maintenance.
>
> Any thoughts about how we can grow the maintainership? Somehow we need
> to reach ~5-6 core maintainers over the next year.
>
> Thanks,
> Wes
>
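For anyone who wants to re-derive the share of merges from the shortlog
counts quoted above, here is a minimal Python sketch. It is only an
illustration: the helper names (`parse_shortlog`, `merge_share`) and the
alias table are made up for this example, and treating "Korn, Uwe" and
"Uwe L. Korn" as the same committer is an assumption. The resulting
percentage depends on how such aliases are handled, so it need not match
the rounded figure in the mail exactly.

```python
# Counts copied from the `git shortlog -csn` output quoted above.
SHORTLOG = """\
1289  Wes McKinney
 268  Uwe L. Korn
  74  Korn, Uwe
  54  Antoine Pitrou
  52  Julien Le Dem
  39  Philipp Moritz
  18  Kouhei Sutou
  18  Steven Phillips
  13  Bryan Cutler
  11  Jacques Nadeau
  10  Phillip Cloud
   8  Brian Hulette
   5  Robert Nishihara
   5  adeneche
   4  GitHub
   3  Sidd
   3  siddharth
   1  AbdelHakim Deneche
   1  Your Name Here
"""

# Hypothetical alias table (an assumption, not taken from the project):
# fold the two spellings of the same committer into one entry.
ALIASES = {"Korn, Uwe": "Uwe L. Korn"}


def parse_shortlog(text, aliases=None):
    """Turn 'count  name' lines into a {name: count} dict, merging aliases."""
    aliases = aliases or {}
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        number, name = line.split(None, 1)
        name = aliases.get(name, name)
        counts[name] = counts.get(name, 0) + int(number)
    return counts


def merge_share(counts, names):
    """Fraction of all merged patches attributed to the given committers."""
    total = sum(counts.values())
    return sum(counts.get(name, 0) for name in names) / total


counts = parse_shortlog(SHORTLOG, ALIASES)
share = merge_share(counts, ["Wes McKinney", "Uwe L. Korn"])
print(f"{share:.1%} of {sum(counts.values())} merged patches")
```

Passing an empty alias table instead shows how much the result moves
when the two spellings are counted separately.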