Reproducible R Environment Using Guix
Categories: Lovely Linux Guix R
Tags: reproducibility R Guix development
Abstract
A good analytical project should be fully reproducible. Reproducibility issues in projects typically stem from three sources: 1) the data, 2) the software, and 3) random number generation. On the software side, problems can come either from the code that the analyst writes, or from the packages, libraries, and software used in the analysis. In this article I propose a simple solution for the latter that does not require changing your Linux distro.
Why should we even bother
I have been working with a friend on an all-inclusive solution for a containerized, declarative, reproducible Guix-based R development environment, but that project still has some rough edges. So in this article I'm presenting a declarative Guix-based approach that can be used with any programming language and any IDE, but without containerizing the environment. This approach uses Guix for package management and direnv for preparing the environment. You can substitute Nix for Guix, but in this article I only discuss Guix, as I like Guile Scheme more than the Nix language. :)
The reason that we need to use Guix/Nix is that they:
- allow installation of multiple versions of the same software independently along with their independent dependencies
- avoid package and dependency conflicts
- allow the user to declare the environment exactly as they wish
- provide a very high level of project reproducibility due to the way they construct derivations and profiles, and store packages in their respective stores
One of the biggest advantages of Guix compared to Nix is that it lets you pin a particular point in history, and all the software you ask for will be installed as if you had gone back in time (this is also possible in Nix, but that's a discussion for another day). In other words, if you specify a commit hash from December 2nd 2024 and ask it to install R and Python, it will install the versions that were available at that particular point in time. This feature is available through guix time-machine. We will get a little bit more into this in practice later in this article, but you can read more about it here and here.
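To make the idea a bit more concrete, here is a minimal sketch of a one-off trip back in time. The helper name r_version_at is hypothetical, and I'm assuming Guix is already installed and that the r-minimal package exists at the chosen commit. The commit used in the usage comment is the one pinned later in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the R version that a given Guix commit provides.
r_version_at() {
  local commit="$1"
  # Bail out early if Guix is not installed yet.
  if ! command -v guix > /dev/null 2>&1; then
    echo "guix is not installed" >&2
    return 127
  fi
  # Travel to the given commit, then start a temporary shell that contains
  # r-minimal and run `R --version` inside it.
  guix time-machine --commit="$commit" -- shell r-minimal -- R --version
}

# Usage (the first run has to fetch and build that Guix revision, so it can be slow):
#   r_version_at a2590694ae0350f9d7400f6f6f41fdbac2fa5340
```

Nothing is installed into your profile by this; the temporary environment goes away when the command finishes.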
Installation of required software
For the sake of this article, all you need is to have Guix and direnv installed on your computer. Everything else will be handled by Guix itself.
Guix
Just to clarify, Guix is a package manager, and although there is also a Guix operating system (Guix System), you can have Guix on any distro. In this article I assume you are on a conventional distro like Ubuntu, Arch, Debian, etc. and you want to set up the environment without installing a new distro.
You can go to the official Guix website and use the "GNU Guix Binary" option, which is suitable for having Guix on other Linux distros. I would suggest following the official Guix binary installation instructions: it takes about 1 minute to give the installer the information it needs, and then about 10 minutes to get fully installed on your computer. Just make sure that your computer has 50GB of spare disk space. Guix will not consume 50GB at first, but after a year of using it, the store will grow as you install more and more software with it. The installation procedure at the time of writing this article (2025-11-06T12:32:47+02:00) is:
# Go to the /tmp folder (generally this will be wiped when you reboot)
cd /tmp
# Download the installation script
wget -O guix-install.sh https://guix.gnu.org/install.sh
# Make the installation script executable
chmod +x guix-install.sh
# Execute the installation script
./guix-install.sh
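Before running the installer above, it may be worth checking the disk space recommendation. The following is only a sketch: the helper name has_free_gib is hypothetical, and I'm assuming that /gnu will end up on the root filesystem, which may not be true on your machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem holding path $1
# has at least $2 GiB available.
has_free_gib() {
  local path="$1" need_gib="$2" avail_kib
  # `df -Pk` prints one POSIX-format line per filesystem in 1024-byte blocks;
  # the 4th column of the 2nd line is the available space.
  avail_kib="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kib" -ge $(( need_gib * 1024 * 1024 )) ]
}

# Assumption: /gnu/store will live on the root filesystem.
if has_free_gib / 50; then
  echo "enough free space for Guix"
else
  echo "less than 50 GiB free on /"
fi
```

If you keep /gnu on a separate partition, pass that mount point instead of /.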
direnv
For direnv you have to do two things:
- Installation of direnv
  direnv already exists in the package repositories of many Linux distros. You can find more information about this on the official direnv website.
- Add the direnv hook to your shell
  For adding the hook, there is a very nice explanation page on the direnv website, but it can be simplified by adding the following to your .bashrc or .zshrc:
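# if direnv is installed, run the hook
if hash direnv 2> /dev/null; then
    # get the shell name (e.g. bash or zsh) from the path in $SHELL
    # (no `local` here, since it only works inside a function)
    tmp_shell="$(basename "$SHELL")"
    # add the hook
    eval "$(direnv hook "${tmp_shell}")"
    # clean up the temporary variable
    unset tmp_shell
fi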
Workflow
Apart from Guix and direnv, you can install whichever IDE you want. In this article I'm going to discuss both Emacs and Positron, but I think RStudio, VS Code, and other IDEs would work the same way.
Bare-minimum workflow
Let's create a toy project. In this project we will have an R file called test.R in which we use my own package: varhandle as an example. So let's create a folder to add the files:
# create the project folder
mkdir /path/to/the/project #change this based on your preference
# go to the project folder
cd /path/to/the/project #remember to change this accordingly too
# create the R file
touch test.R
Now open the test.R file with whatever text editor you want and add some code. For example:
# -*- mode: ess-r; fill-column: 80; -*-
#-- some code to demonstrate that the package is loaded ------------------------
{
# just to show we can load a package
library("varhandle")
# get some data
my_iris <- iris
# set randomly chosen cells to NA (2 random rows in 1 random column per iteration)
for(i in 1:260){
my_iris[sample(1:nrow(my_iris), 2), sample(c(1,2,3,1,3,3,3), 1)] <- NA
}
# now we can inspect the NAs
inspect.na(my_iris)
}
# get the session info
sessionInfo()
# check the path of the R executable used in this session
Sys.which("R")
We also need at least two more files:
- manifest.scm, which is the Guix package declaration file
  ;; -*- mode: scheme; -*-
  (specifications->manifest
   (list "r-minimal"
         "r-varhandle"
         ))
- .envrc
  # -*- mode: sh; -*-
  # this is required to create and load a guix profile in the current directory
  GUIX_PROFILE="$PWD/.envrc.guix-profile"
  # create guix profile
  eval $(guix package -p "$GUIX_PROFILE" --manifest=manifest.scm)
  eval $(guix package -p "$GUIX_PROFILE" --search-paths)
  # this is required to have guix and common tools (less, man etc.) in PATH
  PATH="/bin:/usr/bin:/gnu/remote/bin:$PATH"
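If you prefer to create these two files from the terminal instead of a text editor, something along these lines should work. This is just a sketch: the PROJECT_DIR variable and the temporary-folder fallback are my own assumptions for illustration, so point it at your real project folder.

```shell
#!/usr/bin/env bash
# Hypothetical location: set PROJECT_DIR to your real project folder,
# otherwise a throw-away temporary folder is used.
project_dir="${PROJECT_DIR:-$(mktemp -d)}"
mkdir -p "$project_dir"

# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > "$project_dir/manifest.scm" <<'EOF'
;; -*- mode: scheme; -*-
(specifications->manifest
 (list "r-minimal"
       "r-varhandle"
       ))
EOF

cat > "$project_dir/.envrc" <<'EOF'
# -*- mode: sh; -*-
# this is required to create and load a guix profile in the current directory
GUIX_PROFILE="$PWD/.envrc.guix-profile"
# create guix profile
eval $(guix package -p "$GUIX_PROFILE" --manifest=manifest.scm)
eval $(guix package -p "$GUIX_PROFILE" --search-paths)
# this is required to have guix and common tools (less, man etc.) in PATH
PATH="/bin:/usr/bin:/gnu/remote/bin:$PATH"
EOF

echo "wrote manifest.scm and .envrc to $project_dir"
```

Writing the files does not install anything yet; that only happens once direnv is allowed to run the .envrc.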
Now that we have all the files, we can have the most minimal workflow:
- Open a new terminal (it is important to be new so that we load a fresh environment)
- Navigate to the project folder (remember to change the path according to what you made above)
  cd /path/to/the/project
- The moment you enter the folder, direnv should ask you to allow running and reading the .envrc. You should allow it (by running direnv allow), and it will try to install the packages in the manifest.scm and create a profile link called .envrc.guix-profile. This might take a while depending on how much stuff you have added there, but it is only slow the first time; every subsequent run will be quick since the software is already installed. The folder should now look like this:
  ❯ ls -alh
  total 24K
  drwxr-xr-x 2 mehrad mehrad 4.0K Nov 12 12:52 ./
  drwxr-xr-x 4 mehrad mehrad 4.0K Nov 10 14:50 ../
  -rw-r--r-- 1 mehrad mehrad  400 Nov 10 15:21 .envrc
  lrwxrwxrwx 1 mehrad mehrad   26 Nov 11 11:46 .envrc.guix-profile -> .envrc.guix-profile-1-link/
  lrwxrwxrwx 1 mehrad mehrad   51 Nov 11 11:46 .envrc.guix-profile-1-link -> /gnu/store/ra2vyx3c213gq64sgdzpmk3nqbvsq78q-profile/
  -rw-r--r-- 1 mehrad mehrad   94 Nov 11 12:34 manifest.scm
  -rw-r--r-- 1 mehrad mehrad  201 Nov 10 14:52 README.md
  -rw-r--r-- 1 mehrad mehrad  601 Nov 11 12:33 test.R
- When guix is done, run positron, emacs, or any IDE you like in the terminal. You can also open R directly in the terminal if you don't want to include an IDE in your workflow for now.
Now, for educational purposes, let's confirm a few things in the R console:
- whether the varhandle package is installed:
  is.element("varhandle", row.names(installed.packages()))
- the R version, by running version (note that this is not a conventional function, so it does not need ()). At the time of writing this, I get this output:
  > version
                 _
  platform       x86_64-unknown-linux-gnu
  arch           x86_64
  os             linux-gnu
  system         x86_64, linux-gnu
  status
  major          4
  minor          5.0
  year           2025
  month          04
  day            11
  svn rev        88135
  language       R
  version.string R version 4.5.0 (2025-04-11)
  nickname       How About a Twenty-Six
- We can also use the equivalent of the shell's which command in R and see where R is pointed to:
  Sys.which("R")
  The output should be in your project folder.
- You can of course ask R itself for the path to its binary, which is a much more solid approach than Sys.which:
  file.path(R.home("bin"), "R")
  This should result in something like this (the hash can be different for you, but the rest should match):
  > file.path(R.home("bin"), "R")
  [1] "/gnu/store/8ahzimkjx5xhdgir21z5rbj811q972qq-r-minimal-4.5.0/lib/R/bin/R"
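The last check can also be done from the terminal, without opening an R console. This is a small sketch; the helper name is_store_path is hypothetical, and I'm assuming that Rscript (which ships alongside R) is on the PATH of the direnv-loaded shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given path lives inside the Guix store.
is_store_path() {
  case "$1" in
    /gnu/store/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only ask R for its binary path if Rscript is available in this shell.
if command -v Rscript > /dev/null 2>&1; then
  r_bin="$(Rscript -e 'cat(file.path(R.home("bin"), "R"))')"
  if is_store_path "$r_bin"; then
    echo "R comes from the Guix store: $r_bin"
  else
    echo "R does NOT come from the Guix store: $r_bin"
  fi
fi
```

If the second message shows up, the shell is most likely picking up an R installed by your distro, which usually means direnv has not loaded the profile.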
As the final step, let's also run test.R:
source("test.R")
Using Guix time-machine
Now that we have the minimum setup covered, let's improve it and use guix time-machine to make sure the software we have in the project is pinned to a specific time in history. In my opinion, guix time-machine is the most interesting and most handy tool Guix provides. Anyway, let's try something fun.
To use the time-machine and travel to a point in time, we need an additional file, channels.scm, to define the repositories (channels) and their respective commit hashes. Let's use the following as the content of channels.scm:
(list (channel
        (name 'guix)
        (url "https://codeberg.org/guix/guix.git")
        (branch "master")
        (commit
          "a2590694ae0350f9d7400f6f6f41fdbac2fa5340")
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
Note that "a2590694ae0350f9d7400f6f6f41fdbac2fa5340" is the commit hash that pins the point in time (it dates back to Oct 12, 2025, 09:09 PM GMT+3), and "9edb3f66fd807b096b48283debdcddccfea34bad" is the channel introduction commit, which was signed with the GPG key A2A06DF2A33A54FA.
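If you would rather pin the Guix revision you are currently running than pick a commit by hand, guix describe can print it in the channels format. The sketch below assumes you have already run guix pull at least once; the helper names pin_current_channels and is_full_commit are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 looks like a full 40-character
# hexadecimal Git commit hash.
is_full_commit() {
  [[ "$1" =~ ^[0-9a-f]{40}$ ]]
}

# Hypothetical helper: write the channels of the currently used Guix
# (name, url, commit, introduction) to channels.scm in the current folder.
pin_current_channels() {
  guix describe --format=channels > channels.scm
}

# Sanity check on the commit used in this article.
if is_full_commit "a2590694ae0350f9d7400f6f6f41fdbac2fa5340"; then
  echo "looks like a full commit hash"
fi

# Usage, from inside the project folder:
#   pin_current_channels
```

Pinning the full hash rather than a branch name is what makes the environment repeatable, so it is worth checking that nothing got truncated when copying it around.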
We also need to update the .envrc file to instruct Guix to use the channel file:
# -*- mode: sh; -*-
## this is required to create and load a guix profile in the current directory
export GUIX_PROFILE="$PWD/.guix-profile"
eval $(guix time-machine \
--channels='channels.scm' \
-- package \
--substitute-urls='https://ci.guix.gnu.org' \
--manifest='manifest.scm' \
--profile="${GUIX_PROFILE}")
eval $(guix package -p "${GUIX_PROFILE}" --search-paths)
## this is required to have guix and common tools (less, man etc.) in PATH
export PATH="/bin:/usr/bin:/gnu/remote/bin:$PATH"
export R_LIBS_USER="${GUIX_PROFILE}/site-library/"
Since the .envrc has changed, direnv will ask for permission again. After running direnv allow and letting Guix finish, the project folder should contain the new .guix-profile links and channels.scm:
lrwxrwxrwx 1 mehrad mehrad 20 Nov 12 12:40 .guix-profile -> .guix-profile-1-link
lrwxrwxrwx 1 mehrad mehrad 51 Nov 12 12:40 .guix-profile-1-link -> /gnu/store/5nl8mg32ab57qpa9mjqvs6h6ncxdjriz-profile
-rw-r--r-- 1 mehrad mehrad 943 Nov 12 12:29 channels.scm
-rw-r--r-- 1 mehrad mehrad 617 Nov 12 12:33 .envrc
-rw-r--r-- 1 mehrad mehrad 94 Nov 11 12:34 manifest.scm
-rw-r--r-- 1 mehrad mehrad 181 Nov 12 12:31 README.md
-rw-r--r-- 1 mehrad mehrad 601 Nov 11 12:33 test.R
Final notes
This approach can be extended in a very simple way. For example, not all R packages are packaged in the official Guix channel, so it makes sense to add other channels that provide the software you need. There is a good search tool for finding out which channel has packaged the software you want: https://toys.whereis.social
For example, the channels.scm can be extended to be:
;; -*- mode: scheme; -*-
(list
 ;; Guix main channel
 (channel (name 'guix)
          (url "https://codeberg.org/guix/guix.git")
          (branch "master")
          (commit
           "a2590694ae0350f9d7400f6f6f41fdbac2fa5340")
          (introduction
           (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
             "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
 ;; The CRAN (official R package repository)
 (channel (name 'guix-cran)
          (url "https://github.com/guix-science/guix-cran.git")
          (branch "master")
          (commit
           "8e31deefce41c4f2d4c83ac6271dda2dc553c957"))
 ;; The Guix Science channel
 (channel (name 'guix-science)
          (url "https://codeberg.org/guix-science/guix-science.git")
          (branch "master")
          (commit
           "d78e1d5763e44705ee901f8c7c47b3aedd565ed6")
          (introduction
           (make-channel-introduction
            "b1fe5aaff3ab48e798a4cce02f0212bc91f423dc"
            (openpgp-fingerprint
             "CA4F 8CF4 37D7 478F DA05 5FD4 4213 7701 1A37 8446")))))