Introducing pkgctl license
In Arch Linux, as part of RFC40, we recently decided to license all Arch Linux package sources under 0BSD. Previously, our package sources didn't have any license. RFC40 only specified that we want to license our package sources; it didn't specify how to ensure this. In RFC52 we therefore decided to use REUSE to achieve that.
NOTE: It might be a bit confusing that our PKGBUILD files also have a license field.
However, this field specifies the upstream license, i.e. the license of the software that we package.
It does not specify the license of the package sources.
This blog post is about tooling to manage the licenses for our package sources.
REUSE
REUSE is a tool that checks whether all files in your project are legally accounted for.
Without it, it wouldn't be quite so clear which files fall under what license exactly.
REUSE requires either per-file SPDX-License-Identifier annotations or a central REUSE.toml.
We decided we wanted a central place per repository for all licenses and went with having a REUSE.toml file.
That file has a fairly simple format and usually looks something like this:
version = 1
[[annotations]]
path = [
"foo.c",
]
SPDX-FileCopyrightText = "John Doe"
SPDX-License-Identifier = "GPL-3.0-only"
So, that's our fundamental building block. The licenses should always be in the format specified by SPDX.
Before checking the licenses of a project that we have set up with a REUSE.toml, we have to download the respective license files.
This is achieved using reuse download --all, which will download the specified licenses to the LICENSES/ directory.
Afterwards we can run reuse lint to check the project for license compliance.
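The two steps can be chained in a small wrapper. The following is only a sketch, not how devtools does it: `run_reuse_steps` and its `run` parameter are hypothetical names, and it assumes that both commands signal failure through a non-zero exit status.

```python
import subprocess
from typing import Callable

# The two commands named in the text, in the order they have to run.
REUSE_STEPS: list[list[str]] = [
    ["reuse", "download", "--all"],
    ["reuse", "lint"],
]


def run_reuse_steps(
    repo_dir: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run each step inside repo_dir; return False at the first failing step."""
    for command in REUSE_STEPS:
        result = run(command, cwd=repo_dir)
        if result.returncode != 0:
            return False
    return True
```

Passing the runner in as a parameter keeps the order of the steps testable without having reuse installed; calling `run_reuse_steps(".")` with the default runner executes the real commands in the current directory.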
Integrating into devtools
Most of our package maintainer tooling lives in the devtools repository.
The idea now was to integrate REUSE with our existing tooling to ease the pain with initial setup and to ensure that it's always run before our package maintainers push new packages.
We have roughly 12000 packages and we wanted to save everyone's time as much as possible.
Having our folks write 12000 REUSE.toml files by hand wasn't going to cut it.
Integrating into pkgctl
We use pkgctl as our universal Swiss Army knife for all kinds of packaging-related tasks.
It offers commands to conveniently interact with many relevant aspects of Arch Linux.
As such, it only made sense to integrate the reuse command with pkgctl somehow.
For this, pkgctl license was created with its two subcommands, pkgctl license setup and pkgctl license check.
There's now also an integration in our package commit process ensuring that new package sources are committed with a valid REUSE setup from the start.
pkgctl license setup
This subcommand will try to generate a valid REUSE.toml for the given package.
At the time of writing, our default base REUSE.toml looks like this:
version = 1
[[annotations]]
path = [
"PKGBUILD",
"README.md",
"keys/**",
".SRCINFO",
".nvchecker.toml",
"*.install",
"*.sysusers",
"*sysusers.conf",
"*.tmpfiles",
"*tmpfiles.conf",
"*.logrotate",
"*.pam",
"*.service",
"*.socket",
"*.timer",
"*.desktop",
"*.hook",
]
SPDX-FileCopyrightText = "Arch Linux contributors"
SPDX-License-Identifier = "0BSD"
This default works fine if the package has no patches.
If .patch files are found, we assume they have the same license as the upstream software itself and generate an additional annotation that looks like this:
[[annotations]]
path = [
"fix-something.patch",
]
SPDX-FileCopyrightText = "upstream contributors"
SPDX-License-Identifier = "MIT"
This automatic generation will fail in case the package sources have multiple licenses and patches.
pkgctl license check
This command checks whether we have a top-level LICENSE file with our expected Arch Linux-specific 0BSD license and then runs reuse lint.
If this command is happy, we're happy.
Simple enough.
Common problems
Automatic REUSE.toml generation fails in some known cases.
The most common cases are these:
Non-standard Arch-specific package source files are included in the repo
Sometimes, we have auxiliary files that are not part of the default REUSE.toml that we generate.
This could be, for instance, some shell scripts that help with packaging.
In this case, they should usually just be added to the path array of the 0BSD annotation, as they were almost certainly created by package maintainers.
Multiple licenses and patches
If the package has multiple licenses and there are patches in the repo, pkgctl license setup can't automatically decide which of the licenses should be assigned to the patches.
In that case, it will generate a dummy license annotation for the patches.
We chose this approach over failing outright and refusing to generate any REUSE.toml so that our package maintainers only have to figure out the right license and put it in the SPDX-License-Identifier line.
This is less annoying than also having to write all the annotations by hand.
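The decision described here can be sketched in a few lines. This is an illustration of the logic only, not the devtools implementation: `patch_annotation` and the placeholder value `LicenseRef-FIXME` are hypothetical, and the real dummy annotation may look different.

```python
from typing import Optional

# Hypothetical placeholder; the dummy value used by the real tooling may differ.
PLACEHOLDER_LICENSE = "LicenseRef-FIXME"


def patch_annotation(
    patches: list[str], upstream_licenses: list[str]
) -> Optional[dict]:
    """Build the annotation for patch files, or None if there are no patches."""
    if not patches:
        return None
    if len(upstream_licenses) == 1:
        # Unambiguous: patches are assumed to share the upstream license.
        license_id = upstream_licenses[0]
    else:
        # Zero or several licenses: leave a placeholder for the maintainer.
        license_id = PLACEHOLDER_LICENSE
    return {
        "path": sorted(patches),
        "SPDX-FileCopyrightText": "upstream contributors",
        "SPDX-License-Identifier": license_id,
    }


print(patch_annotation(["fix-something.patch"], ["MIT"]))
```

With a single upstream license the result matches the MIT example shown earlier; with several, only the SPDX-License-Identifier value needs a manual fix.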
In case there are problems with the REUSE.toml or any of the files in the repository, running pkgctl license check will fail and inform the package maintainer of any necessary manual intervention.
PKGBUILD doesn't specify the license in SPDX compliant format
In RFC16 we decided to use valid SPDX license identifiers in the license field of our package build scripts.
However, to this day, many of our packages still specify the upstream license in a non-compliant format (e.g. BSD instead of BSD-3-Clause).
In that case, package maintainers need to figure out what the correct license identifier is, provide it in the license field and then run pkgctl license setup -f.
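As a rough illustration of what "non-compliant" means here, the sketch below flags license strings that are not known SPDX identifiers. The set `KNOWN_SPDX_IDS` is a tiny hand-picked subset used only for this example (the real SPDX license list is far longer), `non_compliant` is a made-up helper, and compound SPDX expressions using operators such as OR or AND are not handled.

```python
# Tiny illustrative subset; the real SPDX license list is far longer.
KNOWN_SPDX_IDS = {"0BSD", "BSD-3-Clause", "GPL-3.0-only", "MIT"}


def non_compliant(licenses: list[str]) -> list[str]:
    """Return the entries that are not in the known identifier set."""
    return [entry for entry in licenses if entry not in KNOWN_SPDX_IDS]


print(non_compliant(["BSD", "MIT"]))  # ['BSD']
```

An entry such as BSD is flagged because it does not say which BSD variant is meant, which is exactly what the maintainer has to resolve by hand.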