Netboot on the Pi

In order to understand nobodd, it is useful to understand the netboot procedure on the Raspberry Pi in general. At a high level, it consists of three phases which we’ll cover in the following sections.

DHCP

The first phase is a fairly typical DHCP exchange, in which the bootloader attempts to obtain an IPv4 address from the local DHCP server. On the Pi 4 (and later models), the address obtained can be seen on the boot diagnostics screen. Near the top, the line starting with “net:” indicates the current network status. Initially this will read:

net: down ip: 0.0.0.0 sn: 0.0.0.0 gw: 0.0.0.0

Shortly before attempting netboot, this line should change to something like the following:

net: up ip: 192.168.1.137 sn: 255.255.255.0 gw: 192.168.1.1

This indicates that the Pi has obtained the address “192.168.1.137” on a class C sized subnet (“192.168.1.0/24” in CIDR form), and knows the local network gateway is at “192.168.1.1”.
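As a quick check of how the “ip”, “sn” and “gw” fields relate, the following minimal sketch parses a status line in the format shown above with Python's standard ipaddress module. The function name parse_net_line is purely illustrative and is not part of nobodd or the bootloader.

```python
import ipaddress
import re


def parse_net_line(line):
    """Split a 'net: ...' diagnostics line into status, interface and gateway.

    Hypothetical helper: assumes the 'key: value' layout shown above.
    """
    fields = dict(re.findall(r"(\w+):\s*(\S+)", line))
    # ip_interface accepts "address/netmask" and derives the network from it
    iface = ipaddress.ip_interface(f"{fields['ip']}/{fields['sn']}")
    gateway = ipaddress.ip_address(fields["gw"])
    return fields["net"], iface, gateway


status, iface, gateway = parse_net_line(
    "net: up ip: 192.168.1.137 sn: 255.255.255.0 gw: 192.168.1.1")
print(status)                    # up
print(iface.network)             # 192.168.1.0/24
print(gateway in iface.network)  # True
```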

The bootloader also inspects certain DHCP options to locate the TFTP server for the next phase. Specifically:

  • DHCP option 66 (TFTP server) can specify the address directly

  • If DHCP option 43 (vendor options) specifies the PXE string “Raspberry Pi Boot” [1], then option 54 (server identifier) will be used

  • On the Pi 4 (and later), the EEPROM can override both of these with the TFTP_IP option
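The selection rules above can be sketched as a small function. This is an illustration only, not the bootloader's actual logic: the function name, the dictionary of raw option values, and in particular the relative priority of option 66 over options 43 and 54 are assumptions. The text above only states that TFTP_IP overrides both.

```python
def pick_tftp_server(dhcp_options, eeprom_tftp_ip=None):
    """Return the TFTP server address, or None if none can be determined.

    dhcp_options maps DHCP option numbers to values (hypothetical layout).
    """
    # The EEPROM TFTP_IP setting overrides anything learned from DHCP
    if eeprom_tftp_ip:
        return eeprom_tftp_ip
    # Option 66 names the TFTP server directly
    # (ASSUMPTION: checked before options 43/54)
    if 66 in dhcp_options:
        return dhcp_options[66]
    # Option 43 carrying the PXE string means "use the server identifier"
    vendor = dhcp_options.get(43, b"")
    if b"Raspberry Pi Boot" in vendor and 54 in dhcp_options:
        return dhcp_options[54]
    return None


offer = {43: b"Raspberry Pi Boot", 54: "192.168.1.1"}
print(pick_tftp_server(offer))                   # 192.168.1.1
print(pick_tftp_server(offer, "192.168.1.50"))   # 192.168.1.50
```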

With the network configured, and the TFTP server address obtained, we move on to the TFTP phase…

TFTP

Note

Most of the notes under this section are specific, in some way, to the netboot sequence on the Pi 4. While older and newer models may broadly follow the same sequence, there will be differences.

The bootloader’s TFTP client first attempts to locate the start4.elf file. By default, it looks for this in a directory named after the Pi’s serial number. On the Pi 4 and later models, the EEPROM configuration can override this behaviour with the TFTP_PREFIX option, but we will only cover the default behaviour here.

All subsequent files will be requested from within this serial number directory prefix [2]. Hence, when we say the bootloader requests SERIAL/vmlinuz, we mean it requests the file vmlinuz from within the virtual directory named after the Pi’s serial number [3].
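To make the SERIAL/filename convention concrete, here is a minimal sketch of how a server might separate the prefix from the requested file. The helper name and the serial “1234abcd” are made up for illustration; nothing here reflects how nobodd itself parses requests.

```python
def split_request(path):
    """Split 'SERIAL/filename' into (serial, filename).

    Hypothetical helper: a path without a prefix yields an empty serial.
    """
    serial, sep, filename = path.partition("/")
    if not sep:
        return "", path
    return serial, filename


print(split_request("1234abcd/vmlinuz"))              # ('1234abcd', 'vmlinuz')
print(split_request("1234abcd/overlays/example.dtbo"))
```

Only the first path component is treated as the prefix, so files in sub-directories beneath it keep their relative path.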

The attempt to retrieve start4.elf is immediately aborted when it is located, presumably because the intent is to determine the existence of the prefix directory, rather than the file itself. Next the bootloader attempts to read SERIAL/config.txt, which will configure the rest of the boot sequence.

Once SERIAL/config.txt has been retrieved, the bootloader parses it to discover the name of the tertiary bootloader to load [4], and requests SERIAL/start.elf or SERIAL/start4.elf (depending on the model) and the corresponding fix-up file (SERIAL/fixup.dat or SERIAL/fixup4.dat respectively).

The bootloader now executes the tertiary “start.elf” bootloader which requests SERIAL/config.txt again. This is re-parsed [5] and the name of the base device-tree, kernel, kernel command line, (optional) initramfs, and any (optional) device-tree overlays are determined. These are then requested over TFTP, placed in RAM, and finally the bootloader hands over control to the kernel.

TFTP Extensions

A brief aside on the subject of TFTP extensions (as defined in RFC 2347). The basic TFTP protocol is extremely simple (as the acronym would suggest) and also rather inefficient, being limited to 512-byte blocks, in-order, synchronously (each block must be acknowledged before another can be sent), with only a simple timeout-and-retransmit scheme for lost packets. Various extensions have been proposed to the protocol over the years, including those in RFC 2347, RFC 2348, RFC 2349, and RFC 7440.

The Pi bootloader implements some of these extensions. Specifically, it uses the “blocksize” extension (RFC 2348, option name “blksize”) to negotiate a larger size of block to transfer, and the “tsize” extension (RFC 2349) to attempt to determine the size of a transfer prior to it beginning.

However, its use of “tsize” is slightly unusual in that, when it finds the server supports it, it frequently starts a transfer with “tsize=0” (requesting the size of the file), but when the server responds with, for example, “tsize=1234” in the OACK packet (indicating the file to be transferred is 1234 bytes large), the bootloader then terminates the transfer.
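The exchange described above can be illustrated at the packet level. The sketch below builds a read request (RRQ) carrying options and decodes an OACK reply, following the layout in RFC 2347 (a two-byte opcode followed by NUL-terminated strings). The function names, the file name, and the block size of 1024 are assumptions for illustration. “blksize” is the option name RFC 2348 uses for the block size extension.

```python
import struct

RRQ, OACK = 1, 6  # TFTP opcodes (RFC 1350 and RFC 2347)


def build_rrq(filename, options):
    """Build an RRQ packet in 'octet' mode with the given options."""
    fields = [filename, "octet"]
    for name, value in options.items():
        fields += [name, str(value)]
    body = b"".join(f.encode("ascii") + b"\0" for f in fields)
    return struct.pack("!H", RRQ) + body


def parse_oack(packet):
    """Return the options acknowledged by the server as a dict of strings."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OACK:
        raise ValueError("not an OACK packet")
    # Strings are NUL-terminated, so the final split element is empty
    fields = packet[2:].split(b"\0")[:-1]
    return {
        fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
        for i in range(0, len(fields), 2)
    }


# tsize=0 in a request asks the server to report the file's size
request = build_rrq("1234abcd/start4.elf", {"tsize": 0, "blksize": 1024})
reply = struct.pack("!H", OACK) + b"tsize\x001234\x00"
print(parse_oack(reply))  # {'tsize': '1234'}
```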

In the case of the initial request for start4.elf (detailed above), this is understandable as a test for the existence of a directory, rather than an actual attempt to retrieve a file. However, in later requests the bootloader also terminates the transfer after the initial packet, then immediately restarts it. My best guess is that it allocates the RAM for the transfer after the termination, then restarts it. Why it does this is a bit of a mystery: the OACK packet doesn’t contain any of the file data itself, so it could allocate the space and simply continue the transfer.

Sadly, the “windowsize” extension (RFC 7440) is not yet implemented, which means the Pi’s netboot, up to the kernel, is quite slow compared to other methods.
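A rough calculation shows why this matters. In TFTP a transfer ends with a block shorter than the negotiated block size, so a file of n bytes needs n // blksize + 1 data blocks. Without “windowsize” each block costs one acknowledgement round trip. The sketch below counts those round trips; the function names, the 10 MiB file size, and the window of 16 are illustrative assumptions, not measurements of the Pi’s bootloader.

```python
import math


def blocks_needed(file_size, block_size=512):
    """Number of DATA blocks, including the final short (possibly empty) one."""
    return file_size // block_size + 1


def acks_needed(file_size, block_size=512, window_size=1):
    """Acknowledgements sent when one ACK covers window_size blocks."""
    return math.ceil(blocks_needed(file_size, block_size) / window_size)


size = 10 * 1024 * 1024  # a hypothetical 10 MiB file
print(acks_needed(size, block_size=1024))                  # 10241
print(acks_needed(size, block_size=1024, window_size=16))  # 641
```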

Kernel

The kernel is now running with the configured command line, and (optionally) the address of an initial ramdisk (initramfs) as the root file-system. The initramfs is expected to contain the relevant kernel modules, and client binaries to talk to whatever network server will provide the root file-system.

Traditionally on the Raspberry Pi, this has meant NFS. However, it may also be NBD (as served by nbd-server(1)) or iSCSI (as served by an iSCSI target such as tgtd(8); note that iscsid(8) is the client-side initiator daemon). Typically, the init process loaded from the kernel’s initramfs will dissect the kernel’s command line to determine the location of the root file-system, and mount it using the appropriate utilities.

In the case of nbd-server(1) the following items in the kernel command line are crucial:

  • ip=dhcp tells the kernel that it should request an IP address via DHCP (the Pi’s bootloader cannot pass network state to the kernel, so this must be re-done)

  • nbdroot=HOST/SHARE tells the kernel that it should open “SHARE” on the NBD server at HOST. This will form the block device /dev/nbd0

  • root=/dev/nbd0p2 tells the kernel that the root file-system is on the second partition of the block device
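As an illustration of how an init process might dissect these items, the sketch below splits a command line into parameters and pulls out the NBD host and share. It is a simplification: real initramfs scripts are usually shell, and quoted values are not handled here. The function names, the host “server.example”, and the share “pi-root” are made up for the example.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def nbd_root(params):
    """Split nbdroot=HOST/SHARE into (host, share)."""
    host, _, share = params["nbdroot"].partition("/")
    return host, share


params = parse_cmdline(
    "ip=dhcp nbdroot=server.example/pi-root root=/dev/nbd0p2 quiet")
print(params["ip"])      # dhcp
print(nbd_root(params))  # ('server.example', 'pi-root')
print(params["root"])    # /dev/nbd0p2
```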