This just shows how slow the proprietary boot flow of the RPi is. On other platforms you can get the bootloader to hand off to the kernel in subsecond timeframes, sometimes as fast as 0.3s.
Then the kernel can be done in another 0.3s or faster, and then userspace can still be optimized more than what's described in the article, too. Instead of a shell that executes programs, which requires filesystem access, loading them, loading shared libraries, etc., you can have a specialized, tiny statically built init binary which just does all initialization/mounts via syscalls and then runs your app (which should be built statically, too).
This way you can achieve ~1-2s boot to UI on even way slower platforms than Rpi4. I did something like this on Pinephone a few years ago and recently again on Luckfox Pico Mini, which is a slow Cortex-A7 with DDR2 memory.
There's no justification for Rpi to be booting for 12s. Even unoptimized standard U-Boot on common platforms often runs for just ~1s before handing off to the kernel.
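For illustration, a minimal sketch in C of the kind of init described above: a table of early mounts done directly through mount(2), followed by an exec of the application. The mount list, the function names and the app path in the usage note are assumptions made up for this example, not taken from the article or from any of the setups mentioned.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical list of early mounts; a real system picks its own. */
struct early_mount {
    const char *source;
    const char *target;
    const char *fstype;
    unsigned long flags;
};

const struct early_mount early_mounts[] = {
    { "proc",     "/proc", "proc",     MS_NOSUID | MS_NODEV | MS_NOEXEC },
    { "sysfs",    "/sys",  "sysfs",    MS_NOSUID | MS_NODEV | MS_NOEXEC },
    { "devtmpfs", "/dev",  "devtmpfs", MS_NOSUID },
};

#define EARLY_MOUNT_COUNT (sizeof(early_mounts) / sizeof(early_mounts[0]))

/* Mount everything in the table via mount(2), no shell or mount binary
 * involved. Returns the number of mounts that failed. */
int mount_early_filesystems(void)
{
    int failures = 0;

    for (size_t i = 0; i < EARLY_MOUNT_COUNT; i++) {
        const struct early_mount *m = &early_mounts[i];

        if (mount(m->source, m->target, m->fstype, m->flags, NULL) != 0) {
            /* EBUSY typically means it is already mounted there
             * (e.g. devtmpfs mounted by the kernel); not counted. */
            if (errno != EBUSY) {
                perror(m->target);
                failures++;
            }
        }
    }
    return failures;
}

/* Do the early mounts, then replace this process with the app.
 * Only returns if exec fails; PID 1 must never exit, or the kernel panics. */
int boot(const char *app_path)
{
    char *const argv[] = { (char *)app_path, NULL };

    mount_early_filesystems();
    execv(app_path, argv);
    perror("execv");
    return 1;
}
```

A real init would call something like `boot("/usr/bin/app")` (hypothetical path) from `main()`, be linked with `-static`, and be installed where the kernel looks for init or named via the `init=` kernel parameter. Because the app is exec'ed rather than forked, it takes over as PID 1, so it has to keep running for as long as the system should stay up.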
PhilipRoman 121 days ago [-]
>On other platforms you can get bootloader to hand off to kernel in subsecond timeframes, sometimes as fast as in 0.3s.
Any good resources about this? I absolutely loathe the pre-kernel boot times we get at $WORK, even when using a supposedly minimal bootloader. Just decompressing the kernel takes like 10 seconds.
megous 121 days ago [-]
For 0.3s you need a specialized bootloader, e.g. levinboot https://gitlab.com/DeltaGem/levinboot for Rockchip RK3399 or https://xnux.eu/p-boot/ for Allwinner A64. Or make something similar for your chosen platform with U-Boot's falcon mode, and make the kernel as small as possible, so that not much needs to be loaded for the initial UI (put whatever can wait, like the wifi driver, into modules).
For around 1-2s you need a platform that has a good SD or eMMC host implementation in U-Boot, and that doesn't spend too much time initializing DRAM, or in U-Boot SPL code before the MMU/CPU data cache is enabled. If the CPU is slow or not upclocked in U-Boot, skipping decompression in the bootloader is also a tradeoff that may end up being faster. Some newer Rockchip RK35xx SoCs have pretty fast boot times in this regard. (RK3399 interestingly doesn't.)
It's all necessarily SoC specific.
On some platforms (RV1103/6) I've also experimented with using the bootrom functions that normally load U-Boot to load the kernel directly. With that approach, the DTB and kernel are loaded right away, without re-initializing the eMMC or SD card or having to load and execute a complicated DT-driven bootloader like U-Boot at all. But that required quite a bit of work and some reverse engineering of the Boot ROM code, so that's not just an optimization.
joezydeco 125 days ago [-]
"Instead of finding and optimizing tasks, we just turned systemd off". Okay.
The folks at Bootlin have way more interesting approaches and techniques to review. I'd look to them.
"The new average is 107 ms smaller, which you are likely to consider as a worthy reduction, if you have experience with boot time reduction projects."
Not really. The result is kind of a letdown, and that was to be expected: going after a single memcpy is low ROI because memcpy performance is probably what processor and library makers look at first.
What wins big is to remove useless stuff entirely. The demo boards I have seen often come with a standard Debian with all the services one might want for a workstation. The kernel is usually the same, with drivers such as PS/2 mouse enabled - thankfully just as a module, but disabling the useless drivers wins big in kernel compilation time. That makes optimizing the rest a bit more comfortable (some options such as hardening and debugging can trigger a nearly full recompilation).
While there, one can also build the drivers one always wants into the kernel (not as modules). A driver inside a compressed kernel is loaded faster than a module from somewhere in a filesystem.
That said, Bootlin has indeed a lot of good stuff.
As for turning off systemd, it is not a bad idea if one replaces it with an init system that can start stuff in parallel (to mask the waiting times). I get that desktop users and admins like it a lot, but for an embedded system with a specific mission and a static configuration, it can be overkill. There are a few more caveats than what they say, though.
ryandrake 122 days ago [-]
Most of the "make RPi Linux boot fast" guides out there use the same approach: Start with a tiny Debian and remove stuff until it's faster.
There's not a lot of good information about attacking it from the other way: Start with a bootloader, firmware, and a kernel, and add only the stuff that you need. I mean, it's probably out there on the web somewhere, but I've found it tough to find among all the "just strip down Debian" blog posts. If anyone has experience doing this and could point me to a few resources, I'd be grateful!
astrobe_ 121 days ago [-]
I'm not the best one to answer your question, but I know there's Buildroot and Yocto - the latter being a bit "heavy" in terms of setup. I think I would go for one of those distros that focus on minimalism (Damn Small Linux, Alpine, ...) which also have precompiled packages for your target (note: Debian does have a tool to build custom root filesystems [1]).
From there, get gcc and make for your platform, so you can carry on with compilation on-target, if possible; cross-compilation can be tricky. It should be ok if you don't want the heavier GUI stuff like X11/Wayland, browser etc.
> What wins big is to remove useless stuff entirely.
This. And there's even a project (targeting mainly RasPi) that does just that: https://gokrazy.org
(And yes, you can also deploy code that wasn't written in Go, although it's quite clunky.)
moffkalast 122 days ago [-]
I mean systemd-analyze blame usually shows only a few services taking up 90% of the boot time (looking at you snap daemon and cloud init taking a whole 20 seconds lmao) so turning off those yields most of the gain without any real usability hit for general use.
But in practical terms if you really need boot speed, you need a Pi 5 that only takes a few seconds at worst, not a Pi 4 which uses its slow ass GPU to run the boot process for absolute maximum sluggishness possible.
notpushkin 122 days ago [-]
This just prompted me to finally look into boot time on my Asahi Linux setup. Turns out Docker was eating 5 seconds (about half of the boot time!) for some reason, so I disabled it (and enabled docker.socket instead, so that it starts up automatically when I need it).
yjftsjthsd-h 122 days ago [-]
Is there a way to start a service but not block on it? I want to start docker with the system, but nothing should wait on it
ta988 122 days ago [-]
create a .timer file in your service directory with a [Timer] section that sets OnBootSec (for example OnBootSec=2min)
A few seconds is still very long - in automotive you’re dealing with a few hundred milliseconds from when power is applied until the system is presenting an interactive GUI.
joezydeco 122 days ago [-]
Automotive HMIs with Linux tend to boot long, then hibernate while the car is off. The only thing that needs to come up quickly (at least in the US) is the backup camera and there tends to be hardware to inject that video quicker than the HMI can.
dividuum 122 days ago [-]
Even if your userspace is pretty fast, some necessary initialization happening before the kernel is even loaded means you will probably never get lower than 6-8 seconds. There is a lengthy EEPROM issue thread about that for the Pi4: https://github.com/raspberrypi/firmware/issues/1375
cvilgan 122 days ago [-]
I wonder how this even works, as PID 1 should exit right away, no? And I thought this would lead to an immediate kernel panic?
halayli 122 days ago [-]
what do you mean pid 1 should exit right away? pid 1 is the init process and is the first and last process in the OS life cycle. All other processes run as child processes of init.
cvilgan 122 days ago [-]
Ah, sorry, that was badly written. What I meant was that it looks like the shell script exits almost immediately, i.e. pid 1 forks the app process but then exits itself...
halayli 119 days ago [-]
got it. The confusion then is spawning (no pun) from the assumption that the pid 1 script forks and moves on, while it doesn't. It calls a program and waits for it to exit. Only then does pid 1 exit and the system shut down.
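To make the "calls a program and waits" part concrete, here is a rough C sketch of what a shell does for a foreground command: fork, exec in the child, and block in waitpid in the parent until the child exits. The function name `run_and_wait` is made up for this example; it is not how any particular shell or the article's script is implemented.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a child and block until it exits, the way a shell runs
 * a foreground command. Returns the child's exit status, or -1 on error
 * or if the child was killed by a signal. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become the program. */
        execvp(argv[0], argv);
        _exit(127); /* exec failed; 127 is what shells report for that */
    }

    /* Parent (the would-be pid 1): stay here until the child is done. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

So the parent is still alive for the whole lifetime of the app. Only when `run_and_wait` returns can the script reach its end, which is the point where pid 1 would exit.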
megous 122 days ago [-]
Yeah, I'd say it simply doesn't work as described.
https://bootlin.com/blog/tag/boot-time/
[1] https://wiki.debian.org/Debootstrap
[Timer]
OnBootSec=2min
https://wiki.archlinux.org/title/Systemd/Timers
Edit: set delay to one minute, working just fine!