<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Who maintains package X?</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="Who maintains package X?" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="post"><h2>Who maintains package X?</h2><span class="author">Written by hannes</span><br/><div class="tags">Classified under: <a href="/tags/package signing" class="tag">package signing</a><a href="/tags/security" class="tag">security</a></div><span class="date">Published: 2017-02-16 (last updated: 2017-03-09)</span><article><p>A very important data point for conex, the new opam signing utility, is who is authorised for a given package. We
could have written this down manually, or forced each author to create a
pull request for their packages, but this would be a long and tedious
process: the main opam repository has around 1500 unique packages and 350
contributors. Fortunately, it is a git repository with 5 years of history and
over 6900 pull requests. Each opam file may also contain a <code>maintainers</code> entry,
a list of strings (usually mail addresses).</p>
<p>The data sources we correlate are the <code>maintainers</code> entry in each opam file, and who
actually committed to the opam repository. This is inspired by <a href="https://github.com/ocaml/opam/issues/2693">a GitHub
discussion</a>.</p>
<h3>GitHub id and email address</h3>
<p>For simplicity, since conex accepts any unique identifier for authors and the opam
repository is hosted on GitHub, we use the GitHub id as author identifier.
Maintainer information is an email address, thus we need a mapping between the two.</p>
<p>We wrote a <a href="https://raw.githubusercontent.com/hannesm/conex/master/analysis/loop-prs.sh">shell
script</a>
to find all PR merges, the GitHub id of each submitter (in a brittle way: using the name of the
git remote), and the email address of the last commit. It also saves a diff of each
PR for later. This results in 6922 PRs (opam repository version 38d908dcbc58d07467fbc00698083fa4cbd94f9d).</p>
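<p>As a rough illustration of this metadata extraction, here is a minimal Python sketch. It is not the linked shell script: it assumes merge commits carry GitHub's default subject line ("Merge pull request #N from user/branch") and reads the GitHub id from it, which is brittle in a similar way. The names <code>MERGE_SUBJECT</code> and <code>parse_merge_subject</code> are made up for this example.</p>

```python
import re

# Assumption: GitHub's default merge subject, "Merge pull request #N from user/branch".
MERGE_SUBJECT = re.compile(r"^Merge pull request #(\d+) from ([^/\s]+)/\S+")


def parse_merge_subject(subject):
    """Return (pr_number, github_id) for a PR merge subject, or None otherwise."""
    match = MERGE_SUBJECT.match(subject)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

<p>Subjects that were edited by hand, or merges done outside the GitHub UI, would not match and return <code>None</code>.</p>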
<p>The metadata output is processed by
<a href="https://github.com/hannesm/conex/blob/dbdfc5337c97d62edc74f1c546023bcb5e719343/analysis/maintainer.ml#L134-L156">github_mail</a>:
we ignore PRs from GitHub organisations (<code>PR.ignore_github</code>), PRs where commits
are picked from a different author (manually; <code>PR.ignore_pr</code>), bad mail addresses,
and <a href="https://github.com/yallop">Jeremy's</a> mail address (it would otherwise be attached to too many GitHub ids). The
goal is to have a single GitHub id for each email address. 329 authors with 416 mail addresses are mapped.</p>
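<p>The filtering described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the linked OCaml code: the three ignore sets are hypothetical stand-ins (with made-up contents) for <code>PR.ignore_github</code>, <code>PR.ignore_pr</code> and the excluded mail addresses, a "bad" address is simply one without an <code>@</code>, and addresses are lower-cased before comparison.</p>

```python
from collections import defaultdict

# Hypothetical stand-ins for the ignore lists; the contents are made up.
IGNORE_GITHUB = {"ocaml", "mirage"}   # GitHub organisations
IGNORE_PR = {"1234"}                  # PRs with commits picked from another author
IGNORE_MAIL = {"shared@example.org"}  # addresses attached to too many ids


def map_mail_to_github(prs):
    """Map each mail address to a single GitHub id.

    prs is an iterable of (pr_number, github_id, mail) triples.
    Returns (mapping, conflicts); conflicts lists the addresses that were
    seen with more than one GitHub id and need manual attention.
    """
    seen = defaultdict(set)
    for pr, github_id, mail in prs:
        mail = mail.strip().lower()
        if github_id in IGNORE_GITHUB or pr in IGNORE_PR or mail in IGNORE_MAIL:
            continue
        if "@" not in mail:
            continue  # bad mail address
        seen[mail].add(github_id)
    mapping = {m: next(iter(ids)) for m, ids in seen.items() if len(ids) == 1}
    conflicts = sorted(m for m, ids in seen.items() if len(ids) > 1)
    return mapping, conflicts
```

<p>Returning the conflicts separately keeps the mapping strictly one GitHub id per address, and makes it obvious which entries would have to be added to one of the ignore lists.</p>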
<h3>Maintainer in opam</h3>
<p>As mentioned, lots of packages contain a <code>maintainers</code> entry. In
<a href="https://github.com/hannesm/conex/blob/dbdfc5337c97d62edc74f1c546023bcb5e719343/analysis/maintainer.ml#L40-L68"><code>maintainers</code></a>
we extract the mail addresses from the <a href="https://github.com/hannesm/conex/blob/dbdfc5337c97d62edc74f1c546023bcb5e719343/analysis/maintainer.ml#L70-L94">most recently released opam
file</a>.
Some matches are hardcoded for teams which do not properly maintain the maintainers
field (such as mirage and xapi-project ;). We're open to suggestions for extending
this massaging as needed. Additionally, the contact at ocamlpro mail address
was used for all packages before the maintainers entry was introduced (based on
a discussion with Louis Gesbert). 132 packages have an empty maintainers entry.</p>
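<p>To give an idea of the extraction step, here is a small Python sketch that pulls mail addresses out of a list of maintainer strings. It is only an approximation: the regular expression is a deliberately loose assumption about what a mail address looks like, the name <code>maintainer_mails</code> is invented for this example, and selecting the most recent release and the hardcoded team matches are left out.</p>

```python
import re

# Loose pattern for a mail address; an assumption, not a full RFC 5322 parser.
MAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def maintainer_mails(entries):
    """Return the lower-cased mail addresses found in a list of maintainer
    strings, in order of appearance and without duplicates."""
    mails = []
    for entry in entries:
        for mail in MAIL.findall(entry):
            mail = mail.lower()
            if mail not in mails:
                mails.append(mail)
    return mails
```

<p>An entry that contains only a name yields nothing, so a package whose entries carry no address ends up with an empty list.</p>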
<h3>Fitness</h3>
<p>Combining these two data sources, we hoped to find a strict, small set of people to
authorise for each package. It turns out that some people use different mail addresses
for git commits and opam maintainer entries, which <a href="https://github.com/hannesm/conex/blob/dbdfc5337c97d62edc74f1c546023bcb5e719343/analysis/maintainer.ml#L233-L269">can be easily
fixed</a>.</p>
<p>While <a href="https://github.com/hannesm/conex/blob/dbdfc5337c97d62edc74f1c546023bcb5e719343/analysis/maintainer.ml#L169-L205">processing the full diffs of each
PR</a>
(using the diff parser of conex), ignoring the 44% done by
<a href="https://github.com/hannesm/conex/blob/dbdfc5337c97d62edc74f1c546023bcb5e719343/analysis/maintainer.ml#L158-L165">janitors</a>
(a manually created set based on log data, please report if it is wrong), we
categorise the modifications: authorised modification (the GitHub id is
authorised for the package), modification by an author to a team-owned package
(we propose to add this author to the team), modification of a package where no
GitHub id is authorised, and unauthorised modification. We also ignore packages
which are no longer in the opam repository.</p>
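<p>The four categories can be written down as a small decision function. The Python below is a sketch under assumptions rather than the linked implementation: <code>owners</code>, <code>teams</code> and <code>janitors</code> are hypothetical data structures, and a team is represented by its name appearing among a package's owners.</p>

```python
def categorise(github_id, package, owners, teams, janitors):
    """Classify one modification of `package` made by `github_id`.

    owners maps a package name to the set of ids (authors or team names)
    authorised for it; teams maps a team name to the set of its members.
    """
    if github_id in janitors:
        return "janitor"  # ignored in the statistics
    authorised = owners.get(package, set())
    if not authorised:
        return "no maintainer"
    if github_id in authorised:
        return "authorised"
    owning_teams = [owner for owner in authorised if owner in teams]
    if any(github_id in teams[team] for team in owning_teams):
        return "authorised"
    if owning_teams:
        return "team-owned"  # candidate for being added to the team
    return "unauthorised"
```

<p>Packages that are no longer in the repository would be filtered out before this function is called.</p>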
<p>2766 modifications were authorised, 418 were to team-owned packages, 452 were to packages
with no maintainer, and 570 were unauthorised. This results in 125 unowned packages.</p>
<p>Out of the 452 modifications to packages with no maintainer, 75 form a global
one-to-one author-to-package relation, and are directly authorised.</p>
<p>Inference of team members is an over-approximation (everybody who committed
changes to their packages); additionally, the janitors are missing. We will have
to fill these in manually.</p>
<pre><code>alt-ergo -> OCamlPro-Iguernlala UnixJunkie backtracking bobot nobrowser
janestreet -> backtracking hannesm j0sh rgrinberg smondet
mirage -> MagnusS dbuenzli djs55 hannesm hnrgrgr jonludlam mato mor1 pgj pqwy pw374 rdicosmo rgrinberg ruhatch sg2342 talex5 yomimono
ocsigen -> balat benozol dbuenzli hhugo hnrgrgr jpdeplaix mfp pveber scjung slegrand45 smondet vasilisp
xapi-project -> dbuenzli djs55 euanh mcclurmc rdicosmo simonjbeaumont yomimono
</code></pre>
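<p>A listing of this shape can be produced by a very small amount of code, which also shows why it over-approximates: every non-janitor who ever touched a team-owned package is counted as a member. The Python below is a sketch; <code>infer_team_members</code>, <code>format_teams</code> and the <code>package_team</code> mapping are names invented for this example.</p>

```python
from collections import defaultdict


def infer_team_members(modifications, package_team, janitors):
    """Collect, per team, everybody who modified one of its packages.

    modifications is an iterable of (github_id, package) pairs;
    package_team maps a package name to the team owning it.
    """
    members = defaultdict(set)
    for github_id, package in modifications:
        team = package_team.get(package)
        if team is None or github_id in janitors:
            continue
        members[team].add(github_id)
    return {team: sorted(ids) for team, ids in members.items()}


def format_teams(teams):
    """Render one 'team -> member member ...' line per team."""
    return "\n".join(
        f"{team} -> {' '.join(ids)}" for team, ids in sorted(teams.items())
    )
```

<p>Because janitors are skipped entirely, a janitor who is also a genuine team member is lost, which is why the result has to be completed by hand.</p>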
<h3>Alternative approach: GitHub urls</h3>
<p>An alternative approach (attempted earlier), which works only for GitHub-hosted projects, is to authorise
<a href="https://github.com/hannesm/conex/blob/github/analysis/maintainer.ml#L37-L91">the user part of the GitHub repository
URL</a>.
The results after filtering GitHub organisations are not yet satisfactory (though only
56 packages are left with no maintainer, see the <a href="https://github.com/hannesm/opam-repository/tree/github">output repo</a>). This approach
completely ignores the manually written maintainer field.</p>
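<p>Extracting the user part of a repository URL is straightforward; a minimal Python sketch follows. It is not the linked code: <code>github_user</code> is a made-up name, only URLs with an explicit scheme are handled, and filtering out organisations would still need a separately maintained list.</p>

```python
from urllib.parse import urlparse


def github_user(url):
    """Return the user (or organisation) part of a GitHub repository URL,
    or None if the URL does not point at github.com."""
    parsed = urlparse(url)
    if parsed.hostname not in ("github.com", "www.github.com"):
        return None
    parts = [part for part in parsed.path.split("/") if part]
    return parts[0] if parts else None
```

<p>For example, <code>github_user("https://github.com/hannesm/conex")</code> gives <code>"hannesm"</code>, while a URL on any other host gives <code>None</code>.</p>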
<h3>Conclusion</h3>
<p>Manually maintained metadata is easily out of date, and not very useful on its own. But
combining automatically created metadata with manually maintained metadata, plus some manual tweaking,
leads to reasonable data.</p>
<p>The resulting authorised inference is available <a href="https://github.com/hannesm/opam-repository/tree/auth">in this branch</a>.</p>
</article></div></div></main></body></html>