Reproducible Builds with BuildKit for Software Supply Chain Security (DockerCon 2023)

Introduction

In this session at DockerCon 2023, Akihiro Suda from NTT Corporation in Japan discusses how reproducible builds strengthen software supply chain security, particularly for Docker images. The focus is on a technique that lets developers produce bit-for-bit identical image binaries from Dockerfiles using BuildKit. This addresses the problems caused by non-reproducible builds and makes software components easier to verify.

Background

Software supply chain security has become harder with the rise of Docker images, mainly because it is difficult to verify where an image came from. Obtaining a Docker image usually means pulling it from a registry and checking its SHA256 digest. However, even if you have the relevant Dockerfile from the source repository, you cannot guarantee that the image was built from that file. The image may have been created from an altered source that contains malicious code.

Although one can inspect image signatures using tools such as Cosign, Notation, or OpenPGP, the same trust problem remains. A signature proves who signed the image, not what it was built from, so users still have to trust the signer. This is where reproducible builds come in: they provide a way to validate that an image was indeed generated from its source code.

What Are Reproducible Builds?

Reproducible builds mean that given the same source code, you can produce the exact same binary. This reproducibility can be confirmed by anyone with access to the source code. Importantly, reproducibility must be maintained over time; images built today should remain identical when built again weeks, months, or even years later.

The need for reproducible builds arises because a non-reproducible build cannot show that the resulting image traces back to harmless source code. For example, suppose you pull an image from a registry and record its SHA256 digest, then build the same Dockerfile yourself and get a different digest. The mismatch does not prove tampering, but it leaves you with no way to confirm that the published image was built from that Dockerfile.
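
The digest comparison described above can be sketched in Python. This is a minimal illustration rather than anything shown in the talk: the function names oci_digest and is_reproduced are hypothetical, and the byte strings stand in for an image manifest or exported tarball.

```python
import hashlib


def oci_digest(blob: bytes) -> str:
    """Return a digest in the 'sha256:<hex>' form used by registries."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def is_reproduced(published_digest: str, rebuilt_blob: bytes) -> bool:
    """True when a local rebuild hashes to the published digest."""
    return oci_digest(rebuilt_blob) == published_digest


# Stand-in bytes; a real check would hash the rebuilt image manifest.
published = oci_digest(b"image built by the publisher")
print(is_reproduced(published, b"image built by the publisher"))
print(is_reproduced(published, b"image built from other input"))
```

A single differing byte anywhere in the rebuilt artifact changes the digest, which is why the sources of nondeterminism discussed in the following sections matter.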

In these scenarios, reproducibility assures users that the image can be recreated accurately from the source code, thereby enhancing trust in the publisher of the image. However, it is important to note that while reproducible builds confirm that an image can be created from source, they do not inherently prove that the source code itself is secure. Thus, thorough code review remains essential.

Current Status of Docker Images

Most Docker images available online are not reproducible. For instance, the golang image version 1.2 is not reproducible: in tests, repeated builds produced different SHA256 digest values. Several factors contribute, including timestamps in the metadata, the specific version of the base image, and variations in how files are ordered during the build.
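
To see why timestamps and file ordering matter, the following sketch packs the same files into an in-memory tar archive, roughly as an image layer would be, and hashes the result. It uses only the Python standard library; the helper name layer_digest and the sample file contents are made up for illustration, and a real image layer carries more metadata than this.

```python
import hashlib
import io
import tarfile


def layer_digest(files: dict[str, bytes], mtime: int, sort_names: bool) -> str:
    """Pack files into an in-memory tar and return its SHA256 hex digest."""
    names = sorted(files) if sort_names else list(files)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime  # the timestamp is part of the archive bytes
            tar.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


a = {"bin/app": b"\x7fELF", "etc/conf": b"key=value\n"}
b = {"etc/conf": b"key=value\n", "bin/app": b"\x7fELF"}  # same content, other order

# Different timestamps alone change the digest.
print(layer_digest(a, 1700000000, True) != layer_digest(a, 1700000001, True))
# Different file order alone changes the digest.
print(layer_digest(a, 0, False) != layer_digest(b, 0, False))
# Fixed timestamp plus sorted order gives identical digests.
print(layer_digest(a, 0, True) == layer_digest(b, 0, True))
```

All three comparisons print True: the file contents never change, yet the digest only stabilizes once the timestamp and the ordering are both fixed.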

Addressing Non-Reproducibility

To address timestamp issues and achieve reproducibility with BuildKit, Akihiro demonstrated setting the SOURCE_DATE_EPOCH build argument. Its value, a Unix timestamp, is used in place of the current time so that every build records the same timestamps. He also recommended using recent versions of BuildKit, which handle reproducibility better.
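
As a rough sketch of how such a build might be invoked, the Python snippet below resolves a timestamp and assembles, without running, a docker buildx build command that passes it as the SOURCE_DATE_EPOCH build argument. The helper names resolve_epoch and buildx_argv are hypothetical. The rewrite-timestamp=true exporter attribute is an assumption about the installed BuildKit: to my knowledge it exists only in newer releases, so check it against your version's documentation before relying on it.

```python
import shlex
import time


def resolve_epoch(environ: dict[str, str], fallback: int) -> int:
    """Use SOURCE_DATE_EPOCH from the environment if set, else a fallback."""
    value = environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value else fallback


def buildx_argv(context_dir: str, epoch: int, dest: str) -> list[str]:
    """Assemble (but do not run) a build command with a fixed timestamp."""
    return [
        "docker", "buildx", "build",
        "--build-arg", f"SOURCE_DATE_EPOCH={epoch}",
        # Assumption: rewrite-timestamp needs a recent BuildKit release.
        "--output", f"type=oci,dest={dest},rewrite-timestamp=true",
        context_dir,
    ]


# The last commit time is a typical choice; 1700000000 here is arbitrary.
epoch = resolve_epoch({"SOURCE_DATE_EPOCH": "1700000000"}, fallback=0)
print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(epoch)))
print(shlex.join(buildx_argv(".", epoch, "image.tar")))
```

Deriving the value from something stable, such as the timestamp of the last commit, rather than from the wall clock is what lets a rebuild months later record the same times.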

Other sources of uncertainty can be removed by pinning the version of each base image, using base images with a specific timestamp in the tag, and using scripts that point the package manager at snapshot sources, so that the installed packages do not change between builds. The talk emphasizes building from fixed snapshots rather than from whatever the latest stable versions happen to be.
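
One way to enforce base image pinning is a small lint step over the Dockerfile. The sketch below is an assumption-laden illustration, not a tool from the talk: the function name unpinned_bases is hypothetical, the digest in the sample is a placeholder of zeros, the image tags are illustrative, and image names supplied through ARG substitution are not resolved.

```python
import re

# Matches 'FROM [--platform=...] <image> [AS <name>]'.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_bases(dockerfile_text: str) -> list[str]:
    """Return base image references that are not pinned by digest."""
    stages: set[str] = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        # Earlier stage names and 'scratch' are not registry images.
        if image != "scratch" and image not in stages and not DIGEST_RE.search(image):
            unpinned.append(image)
        alias = ALIAS_RE.search(line)
        if alias:
            stages.add(alias.group(1))
    return unpinned


# Placeholder digest (64 zeros), for illustration only.
example = (
    "FROM debian:bookworm AS build\n"
    "FROM debian:bookworm-20230904-slim@sha256:" + "0" * 64 + "\n"
    "FROM build\n"
)
print(unpinned_bases(example))
```

Here only the first FROM line is reported, because its tag can be repointed to a different image at any time, whereas a digest reference always resolves to the same content.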

Future Directions

Looking ahead, there is room to improve reproducibility across more ecosystems. The community is looking forward to proposals aimed at making images based on other distributions, including Alpine and RHEL/Fedora, more reproducible.

Conclusion

Reproducible builds are a critical part of validating images in the complex landscape of software supply chain security. While reproducibility does not guarantee that the source code is harmless, it significantly reduces concerns over the trustworthiness of images. The tools and examples shared on Akihiro’s GitHub can guide developers in implementing reproducible builds effectively.

Keywords

Reproducible Builds
Docker Images
BuildKit
Software Supply Chain Security
SHA256 Digest
Version Pinning
Timestamps
Build Metadata

FAQ

Q: What are reproducible builds?
A: Reproducible builds are the ability to generate identical binary outputs from the same source code, ensuring consistency and trust in the software supply chain.

Q: Why are non-reproducible builds a concern?
A: Non-reproducible builds raise doubts about whether an image can be reliably traced back to its source code, which may lead to security risks.

Q: How does BuildKit enhance reproducibility?
A: BuildKit supports the SOURCE_DATE_EPOCH build argument, which lets developers set consistent timestamps and maintain reproducibility across builds.

Q: What steps can be taken to ensure reproducibility?
A: Developers should pin base images and dependencies to fixed versions, use snapshot repositories, and ensure that timestamps and metadata are handled consistently.

Q: Are all Docker images reproducible?
A: No. Most Docker images available online are not reproducible, primarily because of timestamps, unpinned base images, and variations in file ordering during the build.