Arch Linux Dev Blog

Introducing pkgctl license

Sven-Hendrik Haase

In Arch Linux, as part of RFC40, we recently decided to license all Arch Linux package sources as 0BSD. Previously, our package sources had no license at all. RFC40 only specified that we want to license our package sources, not how to ensure this. As such, in RFC52 we decided to use REUSE to achieve that.

NOTE: It might be a bit confusing that our PKGBUILD files also have a license field. However, this field specifies the upstream license, i.e. the license of the software that we package. It does not specify the license of the package sources. This blog post is about tooling to manage the licenses for our package sources.

REUSE

REUSE is a tool that checks whether all files in your project are legally accounted for. Without it, it would be unclear exactly which files fall under which license. REUSE requires either per-file SPDX-License-Identifier annotations or a central REUSE.toml.

We decided we wanted a central place per repository for all licenses and went with having a REUSE.toml file. That file has a fairly simple format and usually looks something like this:

version = 1

[[annotations]]
path = [
    "foo.c",
]
SPDX-FileCopyrightText = "John Doe"
SPDX-License-Identifier = "GPL-3.0-only"

So, that's our fundamental building block. The licenses should always be in the format specified by SPDX.

Before checking the licenses of a project that we have set up with a REUSE.toml, we have to download the respective license files. This is achieved using reuse download --all, which will download the specified licenses to the LICENSES/ directory. Afterwards we can run reuse lint to check the project for license compliance.
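These two steps are easy to script. The following Python sketch is only an illustration: the helper name run_reuse_check and its runner parameter are assumptions of ours, and only the two reuse invocations come from the text above.

```python
import subprocess

# The two commands described above, in the order they have to run.
REUSE_STEPS = [
    ["reuse", "download", "--all"],  # fetch missing license texts into LICENSES/
    ["reuse", "lint"],               # check the project for license compliance
]


def run_reuse_check(project_dir, runner=subprocess.run):
    """Run each step inside project_dir and stop at the first failure.

    Returns True only if every command exited with status 0.
    The runner parameter exists so the sketch can be exercised
    without reuse being installed.
    """
    for command in REUSE_STEPS:
        result = runner(command, cwd=project_dir)
        if result.returncode != 0:
            return False
    return True
```

Calling run_reuse_check(".") in a package checkout would run both commands for real, provided reuse is installed.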

Integrating into devtools

Most of our package maintainer tooling lives in the devtools repository.

The idea now was to integrate REUSE with our existing tooling to ease the pain of initial setup and to ensure that it's always run before our package maintainers push new packages.

We have roughly 12000 packages and we wanted to save everyone's time as much as possible. Having our folks write 12000 REUSE.tomls by hand wasn't going to cut it.

Integrating into pkgctl

We use pkgctl as our universal Swiss Army knife for all kinds of packaging-related tasks. It offers commands to conveniently interact with many relevant aspects of Arch Linux. As such, it only made sense to integrate the reuse command with pkgctl somehow.

For this, pkgctl license was created with two subcommands: pkgctl license setup and pkgctl license check.

There's now also an integration in our package commit process ensuring that new package sources are committed with a valid REUSE setup from the start.

pkgctl license setup

This subcommand will try to generate a valid REUSE.toml for the given package.

At the time of writing, our default base REUSE.toml looks like this:

version = 1

[[annotations]]
path = [
    "PKGBUILD",
    "README.md",
    "keys/**",
    ".SRCINFO",
    ".nvchecker.toml",
    "*.install",
    "*.sysusers",
    "*sysusers.conf",
    "*.tmpfiles",
    "*tmpfiles.conf",
    "*.logrotate",
    "*.pam",
    "*.service",
    "*.socket",
    "*.timer",
    "*.desktop",
    "*.hook",
]
SPDX-FileCopyrightText = "Arch Linux contributors"
SPDX-License-Identifier = "0BSD"

This base file works as-is if the package has no patches.

In case .patch files were found, we assume they have the same license as the upstream software itself and we'll then generate something that looks like this:

[[annotations]]
path = [
    "fix-something.patch",
]
SPDX-FileCopyrightText = "upstream contributors"
SPDX-License-Identifier = "MIT"

This automatic generation will fail in case the package sources have multiple licenses and patches.

pkgctl license check

This command checks whether we have a top-level LICENSE file with our expected Arch Linux-specific 0BSD license and then runs reuse lint. If this command is happy, we're happy. Simple enough.
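The order of operations can be sketched in a few lines of Python. This is not the actual pkgctl implementation: the function name check_package_license, the exact-match comparison, and the lint callback are all assumptions, and the expected license text has to be supplied by the caller since it is not reproduced in this post.

```python
from pathlib import Path


def check_package_license(package_dir, expected_license_text, lint):
    """Return True if LICENSE matches the expected text and lint() succeeds.

    expected_license_text and lint are supplied by the caller; how the
    real tool compares the file or invokes reuse lint may differ.
    """
    license_file = Path(package_dir) / "LICENSE"
    # Step 1: a top-level LICENSE file has to exist.
    if not license_file.is_file():
        return False
    # Step 2: its content has to be the expected text (ignoring outer whitespace).
    if license_file.read_text().strip() != expected_license_text.strip():
        return False
    # Step 3: only then hand over to the linter.
    return bool(lint(package_dir))
```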

Common problems

Automatic REUSE.toml generation fails in some known cases. The most common cases are these:

Non-standard Arch-specific package source files are included in the repo

Sometimes, we have some auxiliary files that are not part of the default REUSE.toml that we generate. This could be, for instance, some shell scripts that help with packaging. In this case, they should usually just be added in the annotations array for 0BSD as they are most certainly created by package maintainers.
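A quick way to spot such files is to compare the repository contents against the default path list. The Python sketch below does that with fnmatchcase from the standard library. Treat it as an approximation: its glob semantics may differ from REUSE's own matching, the helper name uncovered_files is hypothetical, and patches are left out because they get their own annotation.

```python
from fnmatch import fnmatchcase

# Path globs copied from the default base REUSE.toml shown earlier.
DEFAULT_PATTERNS = [
    "PKGBUILD", "README.md", "keys/**", ".SRCINFO", ".nvchecker.toml",
    "*.install", "*.sysusers", "*sysusers.conf", "*.tmpfiles",
    "*tmpfiles.conf", "*.logrotate", "*.pam", "*.service", "*.socket",
    "*.timer", "*.desktop", "*.hook",
]


def uncovered_files(files, patterns=DEFAULT_PATTERNS):
    """Return the files that none of the default globs match."""
    return [
        name for name in files
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


repo_files = ["PKGBUILD", ".SRCINFO", "foo.service", "update-sources.sh"]
print(uncovered_files(repo_files))
```

Anything this reports is a candidate for the 0BSD annotation's path array.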

Multiple licenses and patches

In case the package has multiple licenses and there are patches in the repo, pkgctl license setup can't automatically decide which of the licenses should be assigned to the patches.

In that case, it will generate a dummy license annotation for the patches. This approach, as opposed to failing outright and refusing to generate any REUSE.toml, was chosen so that our package maintainers would be able to just figure out the right license and put it in the SPDX-License-Identifier line. This is less annoying than also having to write all the annotations by hand.
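The decision logic for patches can be sketched roughly like this in Python. It is not the actual pkgctl code: the function name patch_annotation is made up, and the placeholder value LicenseRef-FIXME is purely hypothetical, since the dummy value the tool writes is not spelled out here.

```python
# Hypothetical placeholder; the real dummy value written by the tool may differ.
PLACEHOLDER_LICENSE = "LicenseRef-FIXME"


def patch_annotation(patch_files, upstream_licenses):
    """Build a REUSE.toml annotation block for the given patch files.

    With exactly one upstream license, the patches inherit it. Otherwise a
    placeholder is written that the maintainer has to replace by hand.
    """
    if not patch_files:
        return ""
    if len(upstream_licenses) == 1:
        license_id = upstream_licenses[0]
    else:
        license_id = PLACEHOLDER_LICENSE
    lines = ["[[annotations]]", "path = ["]
    lines += [f'    "{name}",' for name in sorted(patch_files)]
    lines += [
        "]",
        'SPDX-FileCopyrightText = "upstream contributors"',
        f'SPDX-License-Identifier = "{license_id}"',
    ]
    return "\n".join(lines) + "\n"


print(patch_annotation(["fix-something.patch"], ["MIT", "Apache-2.0"]))
```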

In case there are problems with the REUSE.toml or any of the files in the repository, running pkgctl license check will fail and inform the package maintainer of any necessary manual intervention.

PKGBUILD doesn't specify the license in SPDX-compliant format

In RFC16 we decided to use valid SPDX license identifiers in the license field of our package build scripts. However, to this day, many of our packages still specify the upstream license in a non-compliant format (e.g. BSD instead of BSD-3-Clause). In that case, package maintainers need to figure out what the correct license identifier is, provide it in the license field and then run pkgctl license setup -f.
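To illustrate what "non-compliant" means in practice, here is a small Python sketch that flags identifiers in a license expression that are not in a known set. The set below is a tiny hand-picked sample of SPDX identifiers, not the full list, and the helper name non_spdx_identifiers is an assumption. A real check would validate against the complete SPDX license list and handle the full expression syntax.

```python
import re

# A small sample of valid SPDX identifiers; the real list is much longer.
KNOWN_SPDX_IDS = {
    "0BSD", "MIT", "BSD-2-Clause", "BSD-3-Clause",
    "GPL-3.0-only", "Apache-2.0",
}
# Only the two basic operators are handled in this sketch.
OPERATORS = {"AND", "OR"}


def non_spdx_identifiers(license_field):
    """Return the tokens that are neither operators nor known identifiers."""
    tokens = re.findall(r"[^\s()]+", license_field)
    return [
        token for token in tokens
        if token not in OPERATORS and token not in KNOWN_SPDX_IDS
    ]


print(non_spdx_identifiers("BSD"))           # the bare "BSD" is flagged
print(non_spdx_identifiers("BSD-3-Clause"))  # nothing is flagged
```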