dist-git and exploded SRPMS – demystified
In this article we address another topic that has come up in multiple discussions recently. We look at the difference between an SRPM and the so-called dist-git repository of a package, and explain why we prefer dist-git.
How do RPM packages work?
In simple terms, an RPM package needs three things:
- an archive of the original sources of the upstream application;
- a set of patches which need to be applied to the original sources;
- a recipe (the RPM spec file), which describes how to apply the patches, how to build the code and how to install it on a target system.
When developing an RPM package you treat the upstream sources as a read-only object. You cannot change the upstream sources; they should match exactly what upstream has released.
To diverge from upstream, for example to backport a fix or to integrate the software better into the system, you create and maintain patches as separate files next to your upstream sources.
Then, to build a package, the build system fetches the archive with the original sources, unpacks it, applies the patches as described in the spec, runs the build scripts (again as described in the spec), arranges the resulting files in a specific layout and packs them into an archive together with the installation recipe.
This archive is the final “binary RPM”, which you can install on your system using the rpm or dnf commands.
As we build software for multiple architectures, we can produce several binary RPMs from the same source data by building them on different workers with different architectures (one for x86_64, one for aarch64 and so on).
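As a small illustration of "one source, several binary RPMs", the sketch below derives the per-architecture file names for the main package of a build. It relies on the usual name-version-release.arch.rpm naming convention and uses the keepalived build discussed later in this article; the helper names split_nvr and binary_rpm_filename are made up for this example, and subpackages (which carry their own names) are ignored.

```python
def split_nvr(nvr: str) -> tuple[str, str, str]:
    """Split 'name-version-release' from the right, since the name may contain dashes."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def binary_rpm_filename(nvr: str, arch: str) -> str:
    """Build the file name of a binary RPM: name-version-release.arch.rpm."""
    return f"{nvr}.{arch}.rpm"


nvr = "keepalived-2.2.4-6.el9"
print(split_nvr(nvr))

# The same source data yields one binary RPM per build architecture.
for arch in ("x86_64", "aarch64"):
    print(binary_rpm_filename(nvr, arch))
```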
What is dist-git
dist-git is a git repository with a specific layout, which Fedora, CentOS Stream and RHEL use to develop RPM packages.
The very minimal dist-git repo would look like this:
.
├── my-app.spec // spec file
├── sources // reference to the sources
└── patch-for-some-feature.patch // patch to apply to the sources
An important feature of dist-git is that it doesn't store the unpacked sources of the application. It only stores a reference to the tarball of the original upstream sources, which lives in a so-called lookaside cache.
This reference is stored in a file called ./sources in the root of the git repository. See, for example, the sources file of the glibc package in Fedora Rawhide.
The lookaside caches of Fedora and CentOS (Stream or not) are public, and you can download any of their content.
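To make the idea of a "reference" more concrete, here is a minimal sketch of reading a sources file and turning an entry into a download location. It assumes the line format used by current Fedora-style dist-git repositories, HASHTYPE (filename) = hexdigest. The URL layout and the base URL https://lookaside.example.org are assumptions for illustration only (the real base URL and path scheme come from your distribution's tooling configuration), and the digest in the example is a placeholder, not a real checksum.

```python
import re
from typing import NamedTuple


class SourceEntry(NamedTuple):
    hashtype: str
    filename: str
    digest: str


# Matches lines such as: SHA512 (my-app-1.0.tar.gz) = 0123abcd
_LINE = re.compile(
    r"^(?P<hashtype>\w+) \((?P<filename>.+)\) = (?P<digest>[0-9a-fA-F]+)$"
)


def parse_sources(text: str) -> list[SourceEntry]:
    """Parse the content of a dist-git 'sources' file, one entry per non-empty line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized sources line: {line!r}")
        entries.append(
            SourceEntry(match["hashtype"].lower(), match["filename"], match["digest"].lower())
        )
    return entries


def lookaside_url(base_url: str, package: str, entry: SourceEntry) -> str:
    """Assumed layout: base/package/filename/hashtype/digest/filename."""
    return "/".join(
        [base_url.rstrip("/"), package, entry.filename, entry.hashtype, entry.digest, entry.filename]
    )


# Hypothetical package and placeholder digest.
entries = parse_sources("SHA512 (my-app-1.0.tar.gz) = 0123abcd\n")
print(lookaside_url("https://lookaside.example.org", "my-app", entries[0]))
```

Because the path includes the digest, a given reference always points at one exact tarball, which is what lets the git repository stay small while the build remains reproducible.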
Now, since dist-git is the main repository where package development happens, package maintainers often use it to store all sorts of additional things (scripts, readme files, infra configurations and so on) which help them do their work.
There is also a recommended way to write tests in dist-git (see TMT). These integration tests are not part of the RPM package, but they are used in CI workflows, and we recommend putting them in the dist-git repository so that people can contribute to the package and to test development via the same interface.
Example – keepalived dist-git
Let's take a random package build, for example keepalived-2.2.4-6.el9.
The dist-git repository for the package has the following structure:
.
├── bz2028351-fix-dbus-policy-restrictions.patch // patches
├── bz2102493-fix-variable-substitution.patch
├── bz2134749-fix-memory-leak-https-checks.patch
├── gating.yaml // * CI configuration
├── .gitignore // * standard gitignore
├── keepalived.init // additional sources
├── keepalived.service // additional sources
├── keepalived.spec // spec file
├── rpminspect.yaml // * rpminspect checks configuration
├── sources // reference to the lookaside cache
└── tests // * dist-git test scenarios, run on every merge request
    ├── keepalived.conf.in
    ├── run_tests.sh
    └── tests.yml
Here I marked with an asterisk the files that are not relevant to the RPM package build.
What is SRPM
As explained above, an RPM package build requires multiple inputs. While the inputs are stored in dist-git and the lookaside cache, the build system needs to fetch them and carry them to the build workers.
Instead of fetching data from the internet during the build process (no build system should ever do this!), we fetch all of the sources at the beginning, pack them into a single archive (the SRPM file) and then use that self-contained archive to run the builds in the isolated build environment.
The SRPM then serves as a record of what the build system received as input to produce the binary files.
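Fetching everything up front only gives a trustworthy record if the fetched tarball is exactly the one the dist-git reference points at. The sketch below shows one way such a check could look, assuming the reference records a checksum (as the SHA512 (file) = digest lines in Fedora-style sources files do). The function names and the stand-in file are hypothetical; this is not how any particular build system implements the step.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, hashtype: str = "sha512") -> str:
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    hasher = hashlib.new(hashtype)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_source(path: Path, hashtype: str, expected_digest: str) -> bool:
    """Return True when the file on disk matches the digest from the reference."""
    return file_digest(path, hashtype) == expected_digest.lower()


# Demo with a stand-in file instead of a real upstream tarball.
with tempfile.TemporaryDirectory() as tmp:
    tarball = Path(tmp) / "my-app-1.0.tar.gz"
    content = b"stand-in for the upstream archive"
    tarball.write_bytes(content)

    recorded = hashlib.sha512(content).hexdigest()
    matches = verify_source(tarball, "sha512", recorded)
    mismatch = verify_source(tarball, "sha512", "0" * 128)

print(matches, mismatch)
```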
Example – keepalived SRPM
The SRPM for the package contains the following data:
bz2028351-fix-dbus-policy-restrictions.patch 1.58 KB
bz2102493-fix-variable-substitution.patch 929.00 B
bz2134749-fix-memory-leak-https-checks.patch 1.87 KB
keepalived-2.2.4.tar.gz 1.10 MB
keepalived.service 392.00 B
keepalived.spec 20.47 KB
You can see how the SRPM was produced by the build system together with binary RPMs via the Koji build task https://kojihub.stream.centos.org/koji/buildinfo?buildID=27965
The build task used a dist-git commit as its input:
Source: git+https://gitlab.com/redhat/centos-stream/rpms/keepalived#fc07f81c047dca49df2fc9d20513a7f52005a54d
Note how the SRPM contains the full tarball of the original upstream sources (all 1.10 MB of it). This tarball was fetched from the dist-git lookaside cache during the SRPM build step.
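The Source value shown above packs two pieces of information into one string: the repository URL and, after the #, the exact commit. A minimal sketch of splitting it apart, using only the Python standard library (the function name parse_build_source is made up for this example):

```python
from urllib.parse import urlsplit


def parse_build_source(source: str) -> tuple[str, str]:
    """Split a 'git+<repo-url>#<commit>' string into (repo_url, commit)."""
    prefix = "git+"
    if not source.startswith(prefix):
        raise ValueError(f"not a git source: {source!r}")
    parts = urlsplit(source[len(prefix):])
    if not parts.fragment:
        raise ValueError(f"no commit given in: {source!r}")
    # Rebuild the URL without the '#<commit>' fragment.
    repo_url = parts._replace(fragment="").geturl()
    return repo_url, parts.fragment


source = (
    "git+https://gitlab.com/redhat/centos-stream/rpms/keepalived"
    "#fc07f81c047dca49df2fc9d20513a7f52005a54d"
)
repo_url, commit = parse_build_source(source)
print(repo_url)
print(commit)
```

With these two values you can check out the same dist-git state the build system started from.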
What is exploded SRPM
Fedora and RHEL have used dist-git repositories for a very long time. Fedora dist-git has always been public, while RHEL dist-git repositories were internal and not available to people outside of Red Hat.
So the only way for the CentOS Project to rebuild RHEL code was to take the SRPM files and use them as the source of the rebuild.
Since the CentOS Project needed to rebrand or adjust certain packages, they didn't take RHEL SRPMs as is; instead, they unpacked them and put the unpacked sources into a git repository. This way they got access to at least some history of the changes, were able to apply their own patches and generally increased the visibility of the content.
Example – keepalived exploded SRPM
The “exploded SRPM” at git.centos.org for this package looks like this:
.
├── .gitignore
├── .keepalived.metadata // same as ./sources in dist-git
├── SOURCES
│ ├── bz2028351-fix-dbus-policy-restrictions.patch // patches
│ ├── bz2102493-fix-variable-substitution.patch
│ ├── bz2134749-fix-memory-leak-https-checks.patch
│ └── keepalived.service // additional sources
└── SPECS
    └── keepalived.spec // spec file
The exploded SRPM git repository again doesn't store the upstream tarball; it references the lookaside cache via the .keepalived.metadata file.
You can see the same files as in the SRPM, though they are arranged in a different directory structure. None of the additional files (tests, scripts, configs) are available.
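The difference between the two layouts can be computed directly from the file listings shown in this article. The sketch below hard-codes both listings, maps the exploded SRPM paths back to dist-git names (dropping the SOURCES/ and SPECS/ directories and treating .keepalived.metadata as the counterpart of sources), and prints what is missing. The normalize helper is made up for this example.

```python
# File listings copied from the dist-git and exploded SRPM trees above.
DIST_GIT = {
    "bz2028351-fix-dbus-policy-restrictions.patch",
    "bz2102493-fix-variable-substitution.patch",
    "bz2134749-fix-memory-leak-https-checks.patch",
    "gating.yaml",
    ".gitignore",
    "keepalived.init",
    "keepalived.service",
    "keepalived.spec",
    "rpminspect.yaml",
    "sources",
    "tests/keepalived.conf.in",
    "tests/run_tests.sh",
    "tests/tests.yml",
}

EXPLODED_SRPM = {
    ".gitignore",
    ".keepalived.metadata",
    "SOURCES/bz2028351-fix-dbus-policy-restrictions.patch",
    "SOURCES/bz2102493-fix-variable-substitution.patch",
    "SOURCES/bz2134749-fix-memory-leak-https-checks.patch",
    "SOURCES/keepalived.service",
    "SPECS/keepalived.spec",
}


def normalize(path: str) -> str:
    """Map an exploded SRPM path to the name the same file has in dist-git."""
    for prefix in ("SOURCES/", "SPECS/"):
        if path.startswith(prefix):
            return path[len(prefix):]
    if path == ".keepalived.metadata":
        return "sources"  # both files reference the lookaside cache
    return path


missing = sorted(DIST_GIT - {normalize(path) for path in EXPLODED_SRPM})
for path in missing:
    print(path)
```

For these listings the result is gating.yaml, keepalived.init, rpminspect.yaml and the three files under tests/. Note that keepalived.init shows up as well: it is present in the dist-git listing but absent from the SRPM listing above.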
Takeaways
The dist-git repository is the original source of an RPM package build. Fedora, CentOS Stream and RHEL packages are all built directly from dist-git repositories.
An SRPM is an artifact of the build process. It is produced from a commit in dist-git and then stored alongside the binary RPMs.
An exploded SRPM is an attempt to recover the original git structure from the SRPM when there is no access to the dist-git repository. It does contain the same source files and spec as dist-git, but it cannot recover additional non-packaged data such as configuration files and tests.
We recommend using dist-git for any collaboration and development purposes.
P.S. You can also take a look at the Source Git initiative which aims to change the approach to RPM sources to make upstream source code more accessible.