From Ansible to NixOS

Ming
9 min read · Nov 27, 2023

I gave up; that was the sixth time I bricked my Raspberry Pi with apt dist-upgrade.

I needed a way to go back to a working state.

Photo by Praveen Thirumurugan on Unsplash

Naive solutions first

My Raspberry Pi was running Raspberry Pi OS lite (derived from Debian). I’m no expert with Linux. My best ideas were:

  1. Restore a full backup image of the SD card.
  2. Reinstall the OS.

The first option sounded dumb.

With the whole OS taking up less than 1 GB, keeping a 32 GB image (the size of my SD card) around looks stupid. Yes, I could have zero-filled the free space beforehand and gzip-ed the .img file, but that's trading disk space for lots of disk I/O. Since SD cards can only endure ~10,000 write cycles, I won’t tax my SD cards with that.

So I explored the 2nd option.

But reinstalling an OS bores me. I’m fine with the wait, but I hate the attention I have to put into the setup process. I’m talking about all those EULAs to accept, packages to install, and configuration files to copy over from my laptop.

Naturally, I tried to automate this process.

Circa 2015, I handcrafted a bash script that sets up a new OS installation to my liking. I could just scp it over and execute. Looking back, that was my first step into the idea of Infrastructure as Code (IaC).

It served its purpose just fine, until the number of computers under my name tripled.

My experience with Ansible

It was in 2017. I was doing my master’s degree in data science, and I gained access to multiple UNIX-like machines:

  • a MacBook that I brought to classes,
  • a Dell desktop at home,
  • a Mac mini in the office,
  • 5 servers in the two research groups I was affiliated with, running Ubuntu,
  • 2 high-performance clusters (HPCs) from the school, which are on CentOS (IIRC),
  • tilde.club, where I hosted my homepage at that time and was running Fedora,
  • aaaaaaand my Raspberry Pi.

To save my sanity, I needed them to provide similar environments, but they were using different OS-level package managers, so my bash script was soon plagued with if-else statements trying to catch ’em all. I ended up spending more time adding if [ $? -eq 0 ]; clauses than I would have spent reading EULAs in their entirety.

I wanted to just import someone else’s cross-platform setup script, and Ansible scratched exactly that itch.

How I use Ansible

Ansible Roles are pre-packaged setup scripts. They abstract away platform-specific details, enabling one to use them in a declarative fashion.

For my Raspberry Pi (“RPi” hereafter), its Ansible Playbook comprises these Roles:

  • mambaorg.micromamba: Installs Micromamba, which I use to manage all my virtual environments.
  • hannseman.homebridge: Sets up Homebridge, which allows me to turn my HomeKit-incompatible smart bulb on / off based on my iPhone's proximity to home. (Since I owned only one smart accessory and barely played any music on a speaker, I wasn't invested in a HomePod.)
  • bertvv.samba: Shares files over network. With an external drive connected, my Pi works as a network attached storage (NAS) and therefore as a Time Machine backup destination. That's how I satisfy (part of) the 3-2-1 Backup Strategy.
  • m4rcu5nl.zerotier-one: Adds the RPi to my personal VPN. Not the kind of VPN that unlocks geo-restricted Netflix content; it's a way that I can SSH into my other machines without exposing their corresponding ports to the internet.

The format of a Role is [maintainer].[role_name]. Notice how few Roles are maintained by the same developers of the to-be-set-up programs themselves. My gratitude goes to these maintainers and the vast Ansible community in general.

Where Ansible falls short

While saving me from re-inventing many wheels, Ansible is still far from perfect.

Essentially, we are still rebuilding, rather than rolling back to, the known good configuration. This means 3 fundamental steps are not automated:

  1. Backing up data files (documents, etc.).
  2. Re-installing the base OS (i.e., things that dist-upgrade didn't touch).
  3. Setting up SSH, which implies setting up user credentials.

Frankly, none of these steps provides any value: they don’t fix a problem. Worse, they still cost computation time. Ansible is notoriously slow already, and the RPi’s feeble CPU only aggravates the situation.

Ansible’s slowness has its design to blame: Under the hood, Ansible copies Python scripts to the remote for execution. This also means that, when Ansible fails, you end up with more garbage in /tmp than you'd have with a vanilla bash script. This isn't usually a concern, but with limited disk space on RPis, it could still manifest as an annoyance.

(One more comment on this: being declarative doesn’t guarantee atomicity. Just because Ansible hides away the exact steps taken doesn’t mean things can no longer break half-way.)

Is there a way to avoid these steps then?

Dipping my toes into NixOS

Obligatory `neofetch` screenshot from my RPi running NixOS.

What I need is an option to boot right into the last known good configuration if a system update went wrong. The idea is nothing new:

Is there a Linux distribution that snapshots the system out of the box? NixOS seems to be the answer. As a matter of fact, there’s official documentation on installing NixOS on a RPi.
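
For a sense of how that boot-time fallback is configured, here is a minimal sketch. It assumes the Pi boots through the extlinux-compatible loader that the NixOS SD card images for aarch64 typically enable, and the limit of 5 is an arbitrary illustrative value rather than one taken from the setup described in this post.

```nix
{ config, pkgs, lib, ... }:
{
  # Assumption: the board boots via extlinux rather than GRUB.
  boot.loader.grub.enable = false;
  boot.loader.generic-extlinux-compatible = {
    enable = true;
    # Keep at most 5 generations in the boot menu (illustrative value).
    configurationLimit = 5;
  };
}
```

Each successful nixos-rebuild switch adds a generation to that menu, and sudo nixos-rebuild switch --rollback returns a running system to the previous one.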

(Aside: You see, this is why I prefer RPis over other single-board computers (SBCs): there is far richer information on exactly the same hardware than on any competitor. If you want to do something with your RPi, chances are that someone has already trodden the path and written that down for you.)

Immutable OS?

You might have heard about the concept “immutable operating systems” and wondered why I hadn’t used that phrase. That’s because I don’t necessarily need immutability.

In this context, “immutable” means “users & applications can’t modify system files willy-nilly”; such changes can only be done by the OS itself (unless you abuse sudo, of course). As anyone who has messed around with the system32 folder knows, Windows isn't immutable, but that doesn't stop it from offering decent system recovery features (namely, System Restore).

Snapshots. Photo by Lisa Fotios via Pexels

In my mind, immutability motivates OS developers to incorporate “last known good state” into the boot menu. With system updates being the only events that could alter the OS, its developers may feel more responsible to promise atomicity; that is, shielding users from getting exposed to a half-configured state. But that’s it; it’s no more than a motivation.

My interpretation of the term “immutable OS” is quite literal and disentangled from the property of “atomic updates”, which is notably different from some popular ideas, such as this blog post from Ubuntu.com. That being said, Oliver Smith did a brilliant job illustrating some common implementations of atomic updates. Although the section title, the architecture of an immutable Linux OS, isn’t something I’d fully agree with, I recommend that everyone read through it nonetheless.

Equivalents of some Ansible snippets in the NixOS world

For fellow IaC enthusiasts who are considering switching from Ansible to NixOS, I think it’s helpful to share some code snippets for achieving the same goals across the two systems. They will illustrate the differences (and make the Nix language less daunting). Let’s compare them for the following cases:

  1. A simple task that uses a Role in the Ansible world and a Service in NixOS.
  2. A slightly more complex example.
  3. An OS-level configuration that uses a Role in Ansible but not a Service in NixOS.
  4. A containerized application.

The first goal is setting up ZeroTier One (“0t1” hereafter). In an Ansible Playbook, this can be written as:

  roles:
    - role: m4rcu5nl.zerotier-one
      become: true
      vars:
        zerotier_network_id: 1234567890000000

With configuration.nix, 0t1 can be declared as a service:

{ config, pkgs, lib, ... }:
{
  services = {
    zerotierone = {
      enable = true;
      joinNetworks = [ "1234567890000000" ];
    };
  };
}
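
One caveat that may apply here: nixpkgs has marked the zerotierone package as unfree (it is under the Business Source License), in which case evaluation stops until the package is explicitly allowed. A hedged sketch, assuming that is still true for the nixpkgs revision in use:

```nix
{ config, pkgs, lib, ... }:
{
  # Allow only this one unfree package instead of enabling allowUnfree globally.
  # Assumption: the package name reported by lib.getName is "zerotierone".
  nixpkgs.config.allowUnfreePredicate = pkg:
    builtins.elem (lib.getName pkg) [ "zerotierone" ];

  services.zerotierone = {
    enable = true;
    joinNetworks = [ "1234567890000000" ];
  };
}
```

A predicate keeps the exception narrow, so any other unfree package still has to be acknowledged explicitly.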

Behind the scenes, both methods invoke shell commands to get the work done (Ansible; NixOS).

How about a more complex service, such as setting up Samba as a Time Machine backup destination?

In Ansible, this is what I have:

  roles:
    - role: bertvv.samba
      become: true
      vars:
        samba_apple_extensions: yes
        samba_users:
          - name: time-traveler
            password: correct-horse-battery-staple
        samba_shares:
          - name: TimeCapsule
            vfs_objects:
              - name: fruit
                options:
                  - name: time machine
                    value: 'yes'
              - name: streams_xattr
              - name: catia
            path: /mount/Remuwabo
            valid_users: time-traveler
            write_list: time-traveler
            owner: time-traveler
            group: time-travelers
            public: no
            guest_ok: no
            browseable: yes

In NixOS, again we use a service:

{ config, pkgs, lib, ... }:
{
  services.samba = {
    enable = true;
    openFirewall = true;
    shares = {
      TimeCapsule = {
        path = "/mount/Remuwabo";
        "valid users" = "time-traveler";
        public = "no";
        writeable = "yes";
        "guest ok" = "no";
        "force user" = "time-traveler";
        "force group" = "time-travelers";
        "fruit:time machine" = "yes";
        "fruit:aapl" = "yes";
        "vfs objects" = "catia fruit streams_xattr";
      };
    };
  };
}
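
The Ansible Role also creates the Samba user through samba_users, which the NixOS snippet above does not cover. A rough counterpart might look like the sketch below; making it a system account is an assumption for illustration, not something taken from the original setup.

```nix
{ config, pkgs, lib, ... }:
{
  # The group referenced by "force group" in the share definition.
  users.groups."time-travelers" = { };

  # The account referenced by "valid users" and "force user".
  # Assumption: a system user without a login shell is sufficient here.
  users.users."time-traveler" = {
    isSystemUser = true;
    group = "time-travelers";
  };
}
```

As far as I can tell, the Samba password itself is not declared in configuration.nix: Samba keeps its own password database, so it has to be set once on the machine with sudo smbpasswd -a time-traveler.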

In both snippets, /mount/Remuwabo refers to the external hard drive attached to my RPi. How do we do something as basic as that?

(Aside: “Remuwabo” is how I would pronounce “removable” as if it were a loanword in Japanese. I name things in Romaji so as to remind myself which values were my inventions and which ones were enums. It's one of the lesser-known perks of speaking another language.)

To mount a drive with Ansible, I use a Task rather than a Role:

  tasks:
    - name: Mount up device by UUID
      mount:
        path: /mount/Remuwabo
        src: UUID=1f824a54-8cc0-11ee-b9d1-0242ac120002
        fstype: ext4
        opts: rw
        state: present
      become: true

With Nix, there’s a dedicated section called fileSystems:

  fileSystems = {
    "/mount/Remuwabo" = {
      device = "/dev/disk/by-uuid/1f824a54-8cc0-11ee-b9d1-0242ac120002";
      fsType = "ext4";
      options = [ "rw" ];
    };
  };
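
Because this is an external drive, it may be worth guarding against the case where it is unplugged at boot. The variant below is a sketch rather than the configuration used in this post: it adds the standard nofail mount option, so that a missing device should not drop the system into emergency mode.

```nix
{ config, pkgs, lib, ... }:
{
  fileSystems."/mount/Remuwabo" = {
    device = "/dev/disk/by-uuid/1f824a54-8cc0-11ee-b9d1-0242ac120002";
    fsType = "ext4";
    # "nofail" is an addition for illustration: boot continues if the drive is absent.
    options = [ "rw" "nofail" ];
  };
}
```

The dotted attribute path fileSystems."/mount/Remuwabo" is equivalent to the nested form shown earlier; Nix merges both into the same attribute set.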

The last thing I want to highlight is containerized applications. We’ll use Homebridge as an example.

In fact, I never got Docker to run properly on RPi OS. With Ansible, I was using the Role geerlingguy.docker_arm (now deprecated), which was blocked by a dependency issue back in 2020. Although there was a PR for a fix, I didn’t invest more time into it, since there was another Role that installed Homebridge as an npm package and worked just fine:

    - role: hannseman.homebridge
      become: true
      vars:
        homebridge_nodejs_version: "18.x" # Find version numbers here: https://github.com/homebridge/homebridge/releases
        homebridge_version: 1.6.0
        homebridge_pin: 123-45-678
        homebridge_name: HomuBuruizhi
        homebridge_dir: /var/lib/homebridge
        homebridge_plugins:
          - name: homebridge-tplink-smarthome
            version: 4.0.1
          - name: homebridge-config-ui-x
            version: 4.13.3

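On the other hand, NixOS supports containerization right out of the box. The virtualisation.oci-containers module used below runs OCI images through Podman by default (Docker on older releases; systemd-nspawn backs NixOS's native declarative containers instead), which, thanks to the Open Container Initiative, allows one to run images from Docker Hub with virtually no tweaks: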

  networking.firewall.interfaces."end0" = {
    allowedTCPPorts = [ 5353 8581 51789 ];
    allowedUDPPorts = [ 5353 ];
    allowedTCPPortRanges = [{ from = 52100; to = 52150; }];
  };
  systemd.tmpfiles.rules = [ "d /var/lib/homebridge 0755 root root" ];
  virtualisation.oci-containers.containers.HomuBuruizhi = {
    image = "homebridge/homebridge:latest";
    volumes = [ "/var/lib/homebridge:/homebridge" ];
    extraOptions = [ "--network=host" ];
  };

(based on nifoc/dotfiles)
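
To make the container runtime explicit rather than relying on the default, the backend can be pinned in the same module. This is a minimal sketch under the assumption that Podman is the runtime you want; autoStart is spelled out only for clarity, since it already defaults to true.

```nix
{ config, pkgs, lib, ... }:
{
  virtualisation.oci-containers = {
    # Explicit choice of runtime; "docker" is the other accepted value.
    backend = "podman";
    containers.HomuBuruizhi = {
      image = "homebridge/homebridge:latest";
      volumes = [ "/var/lib/homebridge:/homebridge" ];
      extraOptions = [ "--network=host" ];
      # Start the container at boot (this is the default, shown for clarity).
      autoStart = true;
    };
  };
}
```

The module wraps each container in a systemd unit named after the backend and the container, so with this sketch the logs should be reachable with journalctl -u podman-HomuBuruizhi.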

Conclusion

That wraps up our tour along the borderline between the Republic of Ansible and the NixOS Nation. I’m still new to the Nix world, so I’ll save the following topics for a future post:

  • Home Manager,
  • cross-compiling NixOS packages for RPi from a more powerful machine, and
  • what it looks like to actually roll back to the last known good configuration with NixOS. (Guess what: my USB keyboard doesn’t work on the boot menu, so I’m really stuck with using NixOS as just another Ansible for now.)

If you enjoyed this article, please also read the post from Michael Lynch, who also migrated from Ansible to the Nix land this year. Also I would love to hear from you if you managed to take your own jump between the systems. Cheers.
