Netboot on the Pi
In order to understand nobodd, it is useful to understand the netboot procedure on the Raspberry Pi in general. At a high level, it consists of three phases which we’ll cover in the following sections.
DHCP
The first phase is a fairly typical DHCP exchange, in which the bootloader attempts to obtain an IPv4 address from the local DHCP server. On the Pi 4 (and later models), the address obtained can be seen on the boot diagnostics screen. Near the top, the line starting with “net:” indicates the current network status. Initially this will read:
net: down ip: 0.0.0.0 sn: 0.0.0.0 gw: 0.0.0.0
Shortly before attempting netboot, this line should change to something like the following:
net: up ip: 192.168.1.137 sn: 255.255.255.0 gw: 192.168.1.1
This indicates that the Pi has obtained the address “192.168.1.137” on a class C subnet (“192.168.1.0/24” in CIDR form), and knows the local network gateway is at “192.168.1.1”.
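As a small illustration of how the address and netmask on that line map to CIDR form, the following sketch parses the line with Python's standard ipaddress module. The parse_net_line helper is hypothetical, and it assumes the line is formatted exactly as shown above.

```python
import ipaddress

def parse_net_line(line):
    """Split a "net:" diagnostics line into a dict of its fields."""
    tokens = line.split()
    # Tokens alternate "key:" then value, e.g. ["net:", "up", "ip:", ...]
    return {key.rstrip(":"): value
            for key, value in zip(tokens[0::2], tokens[1::2])}

fields = parse_net_line(
    "net: up ip: 192.168.1.137 sn: 255.255.255.0 gw: 192.168.1.1")

# Combine the address and netmask; ipaddress accepts a dotted netmask here
iface = ipaddress.ip_interface(f"{fields['ip']}/{fields['sn']}")
print(iface.network)                                        # 192.168.1.0/24
print(ipaddress.ip_address(fields["gw"]) in iface.network)  # True
```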
The bootloader also inspects certain DHCP options to locate the TFTP server for the next phase. Specifically:
DHCP option 66 (TFTP server) can specify the address directly
If DHCP option 43 (vendor options) specifies the PXE string “Raspberry Pi Boot” [1] then option 54 (server identifier) will be used
On the Pi 4 (and later), the EEPROM can override both of these with the TFTP_IP option
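The selection rules above can be sketched as a small function. This is only an illustration: the options dictionary (DHCP option number mapped to an already decoded value) and the function name are hypothetical, the substring test on option 43 glosses over the real vendor option encoding, and the relative priority of option 66 over options 43/54 is an assumption, since only the TFTP_IP override is stated above.

```python
def select_tftp_server(options, eeprom_tftp_ip=None):
    """Pick a TFTP server address from decoded DHCP options.

    options maps DHCP option numbers to values (hypothetical shape).
    """
    # TFTP_IP in the EEPROM overrides anything DHCP provides
    if eeprom_tftp_ip:
        return eeprom_tftp_ip
    # Option 66 names the TFTP server directly (priority is an assumption)
    if 66 in options:
        return options[66]
    # Otherwise fall back to the server identifier, but only when the
    # vendor options carry the "Raspberry Pi Boot" PXE string
    if b"Raspberry Pi Boot" in options.get(43, b""):
        return options.get(54)
    return None

print(select_tftp_server({43: b"Raspberry Pi Boot", 54: "192.168.1.1"}))
```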
With the network configured, and the TFTP server address obtained, we move on to the TFTP phase…
TFTP
Note
Most of the notes under this section are specific, in some way, to the netboot sequence on the Pi 4. While older and newer models may broadly follow the same sequence, there will be differences.
The bootloader’s TFTP client first attempts to locate the start4.elf file. By default, it looks for this in a directory named after the Pi’s serial number. On the Pi 4 and later models, the EEPROM configuration can override this behaviour with the TFTP_PREFIX option, but we will only cover the default behaviour here.
All subsequent files will be requested from within this serial number directory prefix [2]. Hence, when we say the bootloader requests SERIAL/vmlinuz, we mean it requests the file vmlinuz from within the virtual directory named after the Pi’s serial number [3].
The attempt to retrieve start4.elf is immediately aborted when it is located, presumably because the intent is to determine the existence of the prefix directory, rather than the file itself. Next the bootloader attempts to read SERIAL/config.txt, which will configure the rest of the boot sequence.
Once SERIAL/config.txt has been retrieved, the bootloader parses it to discover the name of the tertiary bootloader to load [4], and requests SERIAL/start.elf or SERIAL/start4.elf (depending on the model) and the corresponding fix-up file (SERIAL/fixup.dat or SERIAL/fixup4.dat respectively).
The bootloader now executes the tertiary “start.elf” bootloader, which requests SERIAL/config.txt again. This is re-parsed [5] and the names of the base device-tree, kernel, kernel command line, (optional) initramfs, and any (optional) device-tree overlays are determined. These are then requested over TFTP, placed in RAM, and finally the bootloader hands over control to the kernel.
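To summarise the early part of this sequence, here is a rough sketch that lists the requests made before the second parse of config.txt, using the Pi 4 file names. The function name and the serial number are made up, and the relative order of the start4.elf and fixup4.dat requests is an assumption; the files requested afterwards depend on the contents of config.txt, so they are left out.

```python
def early_requests(serial, start="start4.elf", fixup="fixup4.dat"):
    """List (path, note) pairs requested up to the second config.txt read."""
    steps = [
        ("start4.elf", "probe for the prefix directory; aborted once located"),
        ("config.txt", "read and parsed by the bootloader"),
        (start, "tertiary bootloader"),
        (fixup, "corresponding fix-up file"),
        ("config.txt", "read again by the tertiary bootloader"),
    ]
    # Every file is requested from within the serial number directory
    return [(f"{serial}/{name}", note) for name, note in steps]

# "1234abcd" is a placeholder, not a real serial number
for path, note in early_requests("1234abcd"):
    print(f"{path}: {note}")
```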
TFTP Extensions
A brief aside on the subject of TFTP extensions (as defined in RFC 2347). The basic TFTP protocol is extremely simple (as the acronym would suggest) and also rather inefficient, being limited to 512-byte blocks, in-order, synchronously (each block must be acknowledged before another can be sent), with no retry mechanism. Various extensions have been proposed to the protocol over the years, including those in RFC 2347, RFC 2348, RFC 2349, and RFC 7440.
The Pi bootloader implements some of these extensions. Specifically, it uses the “blocksize” extension (RFC 2348) to negotiate a larger size of block to transfer, and the “tsize” extension (RFC 2349) to attempt to determine the size of a transfer prior to it beginning.
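To get a feel for why a larger block size matters, the following sketch counts the DATA packets a transfer needs. Because each block must be acknowledged before the next is sent, every packet costs a network round-trip. The 8 MiB file size and the block sizes other than 512 are arbitrary illustrative values, not figures taken from the Pi bootloader.

```python
def data_packets(file_size, block_size=512):
    """Number of DATA packets needed to send file_size bytes."""
    # A transfer ends with a block shorter than block_size, so a file whose
    # size is an exact multiple still needs one final, empty block.
    return file_size // block_size + 1

size = 8 * 1024 * 1024  # hypothetical 8 MiB file
for block_size in (512, 1024, 1468):
    print(block_size, data_packets(size, block_size))
```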
However, its use of “tsize” is slightly unusual in that, when it finds the server supports it, it frequently starts a transfer with “tsize=0” (requesting the size of the file), but when the server responds with, for example, “tsize=1234” in the OACK packet (indicating the file to be transferred is 1234 bytes long), the bootloader then terminates the transfer.
In the case of the initial request for start4.elf (detailed above), this is understandable as a test for the existence of a directory, rather than an actual attempt to retrieve a file. However, in later requests the bootloader terminates the transfer after the initial packet, then immediately restarts it. My best guess is that it allocates the RAM for the transfer after the termination, then restarts it (though why it does this is a bit of a mystery as it could allocate the space and continue the transfer, since the OACK packet doesn’t contain any of the file data itself).
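For readers unfamiliar with the wire format, the “tsize” exchange described above can be sketched as follows. Options are appended to the read request as NUL-terminated name and value strings (RFC 2347), and the server echoes the ones it accepts in an OACK packet. The helper names are hypothetical, the file name is a placeholder, and this is a minimal illustration rather than the bootloader's or nobodd's actual implementation.

```python
import struct

RRQ, OACK = 1, 6  # opcodes from RFC 1350 and RFC 2347

def build_rrq(filename, options, mode="octet"):
    """Build a read request carrying the given options."""
    parts = [filename, mode]
    for name, value in options.items():
        parts += [name, str(value)]
    # Every string in the packet is terminated by a NUL byte
    body = b"".join(part.encode("ascii") + b"\0" for part in parts)
    return struct.pack("!H", RRQ) + body

def parse_oack(packet):
    """Return the options acknowledged by the server as a dict."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OACK:
        raise ValueError("not an OACK packet")
    fields = packet[2:].split(b"\0")[:-1]  # drop the empty tail after the last NUL
    names = [field.decode("ascii").lower() for field in fields[0::2]]
    values = [field.decode("ascii") for field in fields[1::2]]
    return dict(zip(names, values))

# The client asks for the size by sending tsize=0 ...
request = build_rrq("SERIAL/start4.elf", {"tsize": 0})
# ... and the server replies with the real size in its OACK
reply = struct.pack("!H", OACK) + b"tsize\x001234\x00"
print(parse_oack(reply))
```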
Sadly, the “windowsize” extension (RFC 7440) is not yet implemented, which means the Pi’s netboot, up to the kernel, is quite slow compared to other methods.
Kernel
The kernel is now running with the configured command line, and (optionally) the address of an initial ramdisk (initramfs) as the root file-system. The initramfs is expected to contain the relevant kernel modules, and client binaries to talk to whatever network server will provide the root file-system.
Traditionally on the Raspberry Pi, this has meant NFS. However, it may also be NBD (as served by nbd-server(1)) or iSCSI (as served by iscsid(8)). Typically, the init process loaded from the kernel’s initramfs will dissect the kernel’s command line to determine the location of the root file-system, and mount it using the appropriate utilities.
In the case of nbd-server(1) the following items in the kernel command line are crucial:
ip=dhcp
tells the kernel that it should request an IP address via DHCP (the Pi’s bootloader cannot pass network state to the kernel, so this must be re-done)
nbdroot=HOST/SHARE tells the kernel that it should open “SHARE” on the NBD server at HOST. This will form the block device /dev/nbd0
root=/dev/nbd0p2 tells the kernel that the root file-system is on the second partition of the block device
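As an illustration of how an init script might dissect these items, here is a minimal sketch that splits a command line into its parameters. The parse_cmdline helper, the server address, and the share name are all made up for the example, and real command line parsing also handles quoting, which this ignores.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of parameters."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags (no "=") are recorded as True
        params[key] = value if sep else True
    return params

# Hypothetical values standing in for HOST and SHARE
cmdline = "ip=dhcp nbdroot=192.168.1.4/pi-root root=/dev/nbd0p2 quiet"
params = parse_cmdline(cmdline)

host, _, share = params["nbdroot"].partition("/")
print(host, share, params["root"])
```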
The tertiary bootloader operates on all [sections] in the config.txt. The secondary bootloader (bootcode.bin) only operates on some of these and doesn’t comprehend the full syntax that the tertiary bootloader does (for instance, the secondary bootloader won’t handle includes).