Specifications
In October 2024, a team of dedicated developers started work on the ALPM project. Since then, it has focused on writing new documentation for many aspects of Arch Linux Package Management that were not thoroughly documented in the past. This article provides an overview of the specifications written by this project and attempts to contextualize them for the reader.
The existing stack 📚
With its bash-based makepkg tool for package creation, the libalpm C library for interfacing with system state and the central pacman package management tool, the pacman project has defined the foundation of package management on Arch Linux for the past 20 years.
Over the years, several adjacent projects have emerged that provide functionality beyond the scope of the pacman project:
- namcap: PKGBUILD and package file linting.
- dbscripts: Binary package repository management used by Arch Linux to manage the official repositories.
- devtools: A set of scripts and configuration files that also encompass Arch Linux's canonical package build tool pkgctl, which wraps makepkg and performs builds in clean chroot environments with the help of systemd-nspawn.
Each Linux distribution has a similar stack of tools that allows for the creation of package files from some form of input, the management of binary package repositories and the installation and management of those packages on end-user systems. However, many of these tools are not used by end-users, unless they maintain their own package build scripts and binary package repositories.
On distribution documentation 🔍
The documentation of a distribution is key to its success, as it provides its members with access to details on tools, file formats and the overarching concepts. While Arch Linux's ArchWiki is a great resource for using the distribution, it lacks detailed information on developing it, as well as the concepts governing the existing tech stack.
Arguably, a wiki is not the best place for documentation of this sort, a point also made in RFC0021: Documentation on the operational side of Arch Linux is better served in a separate, dedicated place.
Similarly, central documentation on common file types, data types and concepts used in the package management stack is an important cornerstone for a shared and broad understanding of the technology. This helps package maintainers, application developers and end-users alike in collaborating on and improving the existing set of tools.
Falling through the cracks 🕳️
The projects in the existing stack document most of their own functionality and use-cases for end-users. However, when considering strict validation of artifacts between the various building blocks in the ecosystem, it became clear that large areas of these projects are underspecified and only loosely follow an overarching design. For example, APIs or file formats not considered public or important enough for a dedicated specification by one project may be integral to the safe use of another project consuming its output.
The ALPM project follows in a long tradition of tools in the Arch Linux package management ecosystem, while focusing more strongly on modularity and validation. Early on, it became clear that an extensive documentation effort would be needed as the foundation of its granular design.
ALPM specifications 📜
Based mainly on black-box tests with the existing tooling, as well as input from longtime package maintainers and developers, a growing set of specifications has been written by the ALPM project.
Currently, the documentation is split between information on file formats and concepts. Some specifications already exist in multiple versions, which document different revisions of a format that changed over the past years.
For local access to all specifications in the form of man pages, install the alpm package group.
pacman -Su alpm
Concepts 📝
- alpm: A top-level overview of Arch Linux Package Management, from package creation to consumption.
- alpm-package: Specifies what ALPM-based packages look like and what they contain.
- alpm-meta-package: Explains what meta packages are and how they are created.
- alpm-split-package: Explains how split packages work and how they are created.
- alpm-architecture: The CPU architecture identifier used in file formats and file names.
- alpm-comparison: The comparison functionality of packages (in particular versions) in various file formats.
- alpm-package-base: The use of pkgbase in the various file formats.
- alpm-package-group: Explains how package groups work in the various file formats.
- alpm-package-name: Specifies how package names are used in the various file formats and package files.
- alpm-package-relation: The relationships between packages as used in the various file formats.
- alpm-package-source: The types of package sources in use in PKGBUILD and SRCINFO files.
- alpm-package-source-checksum: The hash digests used for package sources in PKGBUILD and SRCINFO files.
- alpm-package-version: An overview of the different types of version strings in use in the various file formats. More details on specific components of version strings can be found in alpm-epoch, alpm-pkgver and alpm-pkgrel.
- alpm-soname: The handling of soname information in package relations in some of the available file formats. This concept exists in multiple versions (alpm-sonamev1 and alpm-sonamev2) and describes how soname information of ELF files is used in metadata.
- alpm-state-repo: A repository in which metadata about the state of one or more binary package repositories is maintained.
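To make the version-related concepts above more concrete, here is a minimal Python sketch that splits a full version string into the components covered by alpm-epoch, alpm-pkgver and alpm-pkgrel. It assumes the commonly used `epoch:pkgver-pkgrel` layout, in which both the epoch and the pkgrel are optional. The names `split_version` and `PackageVersion` are hypothetical and not part of any ALPM tooling, and the sketch does not validate the individual components, which the specifications define in detail.

```python
from typing import NamedTuple, Optional


class PackageVersion(NamedTuple):
    """Hypothetical container for the components of a version string."""

    epoch: Optional[str]
    pkgver: str
    pkgrel: Optional[str]


def split_version(version: str) -> PackageVersion:
    """Split an assumed [epoch:]pkgver[-pkgrel] string into its parts."""
    # The epoch, if present, is everything before the first colon.
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = None, version

    # The pkgrel, if present, is everything after the last hyphen.
    pkgver, sep, pkgrel = rest.rpartition("-")
    if not sep:
        pkgver, pkgrel = rest, None

    if not pkgver:
        raise ValueError(f"missing pkgver in version string: {version!r}")
    return PackageVersion(epoch, pkgver, pkgrel)


if __name__ == "__main__":
    for example in ("1:2.0.1-3", "2.0.1-1", "2.0.1"):
        print(example, "->", split_version(example))
```

Splitting on the first colon and the last hyphen relies on the assumption that neither character occurs inside the pkgver component itself. A real implementation would additionally check each component against the rules in the respective specification.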
File formats 📄
- SRCINFO: The format of .SRCINFO files found in the package source repositories of all official packages as well as all AUR source repositories. It provides metadata about the sources and packages defined in an enclosed PKGBUILD file while not requiring Bash to access this data.
- ALPM-MTREE: The format of .MTREE files found in all package files. This file format exists in multiple versions (ALPM-MTREEv1 and ALPM-MTREEv2) and describes all files contained in a package file.
- BUILDINFO: The format of .BUILDINFO files found in all package files. This file format exists in multiple versions (BUILDINFOv1 and BUILDINFOv2) and describes the environment used to build a package file.
- PKGINFO: The format of .PKGINFO files found in all package files. This file format exists in multiple versions (PKGINFOv1 and PKGINFOv2) and describes the metadata of a package file.
- alpm-install-scriptlet: The format of an .INSTALL file found in some package files. This script file is used to run custom commands around the installation, upgrade or uninstallation of a package.
- alpm-repo-desc: The format of desc files found in repository sync databases. This file format exists in multiple versions (alpm-repo-descv1 and alpm-repo-descv2) and describes the state of a single package in a binary package repository.
- alpm-db-desc: The format of desc files found in local libalpm databases. This file format exists in multiple versions (alpm-db-descv1 and alpm-db-descv2) and describes the state of a single package on a given system.
- alpm-files: The format of files files found in local libalpm and repository sync databases. Depending on context, the file format may be referred to as alpm-db-files or alpm-repo-files, respectively.
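As an illustration of why such formats are easy to consume without Bash, the following Python sketch reads line-based `key = value` metadata of the kind found in files such as .PKGINFO. It is a simplified sketch under stated assumptions: lines starting with `#` are treated as comments, keys may repeat, and no validation against the actual specifications is performed. The function name `parse_key_value` and the sample data are made up for this example.

```python
from collections import defaultdict


def parse_key_value(text: str) -> dict[str, list[str]]:
    """Collect `key = value` lines into a mapping of key -> list of values."""
    data: defaultdict[str, list[str]] = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and (assumed) comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as "foo>=1.0" survive.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key-value line: {raw_line!r}")
        data[key.strip()].append(value.strip())
    return dict(data)


# Hypothetical sample data, not taken from a real package.
SAMPLE = """\
# example metadata
pkgname = example
pkgver = 1:1.0.0-1
depend = glibc
depend = gcc-libs>=13
"""

if __name__ == "__main__":
    for key, values in parse_key_value(SAMPLE).items():
        print(key, values)
```

Storing every value in a list keeps repeated keys intact without the parser needing to know which keys may legitimately occur more than once. That knowledge belongs in a validation layer built on the specifications.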
In the works 🚧
Further specification documents are planned. They will describe repository sync databases and a new format for handling binary repository state.
The documents are usually accompanied by dedicated parser and writer implementations, which are validated against real data to ensure their correctness (or to find bugs in existing tooling and data).
If this article sparked your interest, consider contributing to the ALPM project!