
Services with ConditionNeedsUpdate= are still executed at first boot #3338

Open
septatrix opened this issue Jan 9, 2025 · 15 comments

@septatrix
Contributor

mkosi commit the issue has been seen with

main

Used host distribution

Fedora 41

Used target distribution

Arch

Linux kernel version used

6.12.7-200.fc41.x86_64

CPU architectures issue was seen on

x86_64

Unexpected behaviour you saw

On first boot, services that have ConditionNeedsUpdate=/var (or /etc) set are still started, even though the image is fresh. This delays boot unnecessarily. For images with an immutable /etc, these services might even be started on every boot (since the /etc/.updated file cannot be created), though I did not explicitly test that.
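To make the comparison concrete, here is a rough Python model of how I understand ConditionNeedsUpdate=<dir> to be evaluated: the condition holds when /usr is strictly newer than <dir>/.updated, or when that stamp is missing. It compares whole seconds first, then nanoseconds, and finally falls back to a TIMESTAMP_NSEC= line stored in the stamp for filesystems without nanosecond timestamps. This is a sketch, not systemd's code: the function name and the usr_dir parameter are made up for illustration, and other checks systemd performs (such as the kernel command-line override and initrd handling) are omitted.

```python
import os
from pathlib import Path


def needs_update(target_dir: str, usr_dir: str = "/usr") -> bool:
    """Rough model of ConditionNeedsUpdate=<target_dir> (hypothetical helper)."""
    stamp = Path(target_dir) / ".updated"
    try:
        usr = os.lstat(usr_dir)
        other = os.lstat(stamp)
    except OSError:
        # Missing stamp or /usr: prefer running an update tool too often.
        return True

    usr_sec, usr_nsec = divmod(usr.st_mtime_ns, 1_000_000_000)
    oth_sec, oth_nsec = divmod(other.st_mtime_ns, 1_000_000_000)

    # Whole seconds are reliable on every filesystem, so compare them first.
    if usr_sec != oth_sec:
        return usr_sec > oth_sec

    # If /usr has no sub-second part it cannot be strictly newer; if the stamp
    # has one, the filesystem clearly stores nanoseconds. Compare directly.
    if usr_nsec == 0 or oth_nsec > 0:
        return usr_nsec > oth_nsec

    # Otherwise the stamp's nanoseconds may have been truncated, so use the
    # TIMESTAMP_NSEC= value recorded inside the stamp file instead.
    try:
        for line in stamp.read_text().splitlines():
            if line.startswith("TIMESTAMP_NSEC="):
                return usr.st_mtime_ns > int(line.split("=", 1)[1])
    except (OSError, ValueError):
        pass
    return True
```

On a fresh image without /etc/.updated, this returns True for /etc, which is exactly why the services run on first boot.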

One example of such a service is systemd-journal-catalog-update, which runs even though the catalog is freshly installed. systemd-hwdb-update would be a similar candidate; however, it has additional checks that do not trigger, because mkosi manually invokes systemd-hwdb update --usr. Other services have no such optimizations.
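The "additional checks" work through triggering conditions. As far as I recall, systemd-hwdb-update.service pairs ConditionNeedsUpdate=/etc with triggering conditions such as ConditionPathExists=|!/usr/lib/udev/hwdb.bin, so once mkosi has compiled hwdb.bin into /usr, none of the triggers hold. The sketch below models the rule documented in systemd.unit(5): all regular conditions must hold and, if any triggering "|" conditions exist, at least one of them must hold. The class and function names are made up, and the condition lists are illustrative rather than copied from shipped unit files.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    spec: str                 # the line as written in the unit, for readability only
    holds: bool               # result of the check, with any "!" negation applied
    triggering: bool = False  # True for conditions written with the "|" prefix


def unit_should_start(conditions: list[Condition]) -> bool:
    regular = [c for c in conditions if not c.triggering]
    triggers = [c for c in conditions if c.triggering]
    # Every regular condition must hold (logical AND).
    if not all(c.holds for c in regular):
        return False
    # If triggering conditions exist, at least one of them must hold (logical OR).
    return not triggers or any(c.holds for c in triggers)


# First boot of a fresh image: /etc/.updated is missing, so NeedsUpdate holds.
hwdb = [
    Condition("ConditionNeedsUpdate=/etc", True),
    Condition("ConditionPathExists=|!/usr/lib/udev/hwdb.bin", False, True),
    Condition("ConditionPathExists=|/etc/udev/hwdb.bin", False, True),
]
catalog = [Condition("ConditionNeedsUpdate=/var", True)]

print(unit_should_start(hwdb))     # False: hwdb.bin already exists under /usr
print(unit_should_start(catalog))  # True: nothing else gates the catalog update
```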

mkosi should probably start all services that use ConditionNeedsUpdate= as part of the build process and call /usr/lib/systemd/systemd-update-done at the very end. This should be safe, because either the /etc and/or /var directories are included in the image, in which case their generated files are included too, or they are not included, in which case the .updated files are absent as well and the services run again as expected.
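For reference, my understanding of systemd-update-done is that it writes a .updated stamp into /etc and /var whose mtime is copied from /usr, plus a TIMESTAMP_NSEC= line as a backup for filesystems that drop nanoseconds. A rough Python equivalent, assuming that behaviour (the function name, the comment line and the non-atomic write are simplifications, not the real tool):

```python
import os
from pathlib import Path


def write_update_stamp(target_dir: str, usr_dir: str = "/usr") -> Path:
    """Approximation of systemd-update-done for one directory (hypothetical helper)."""
    usr = os.stat(usr_dir)
    stamp = Path(target_dir) / ".updated"
    # Record the full-precision /usr timestamp in the file body...
    stamp.write_text(
        "# Stamp file; only its timestamp matters.\n"
        f"TIMESTAMP_NSEC={usr.st_mtime_ns}\n"
    )
    # ...and copy /usr's atime/mtime onto the stamp, so /usr is no longer newer.
    os.utime(stamp, ns=(usr.st_atime_ns, usr.st_mtime_ns))
    return stamp
```

In the proposed build flow, this would run once for each of /etc and /var after all update services have finished.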


(I also noticed another issue: if a device reboots from v1 to v2 after v3 has been built but before it is installed, the .updated files will have timestamps later than v3's /usr, so ConditionNeedsUpdate= will not trigger after a subsequent update from v2 to v3. This, however, is a technical problem with how these .updated files currently work, not a problem in mkosi.)

Used mkosi config

Default config from this repo

mkosi output

No response

@septatrix septatrix added the bug label Jan 9, 2025
@DaanDeMeyer
Contributor

I'll certainly review a PR to run systemd-journal-catalog-update as part of the image build process. I'm not sure we should be calling systemd-update-done though, that seems like something to fix in systemd.

@septatrix
Contributor Author

> I'll certainly review a PR to run systemd-journal-catalog-update as part of the image build process.

Just running the catalog update manually won't improve things, because it would simply run again on first boot. This would improve if it had a mechanism similar to hwdb/udev, where it does not run if a precompiled catalog exists under /usr. Even if that were implemented, there are still several other services which will get invoked on first boot unnecessarily.

> I'm not sure we should be calling systemd-update-done though, that seems like something to fix in systemd.

How do you envision fixing this? I thought about maybe letting sd-firstboot take care of this.

@septatrix
Contributor Author

> Even if that were implemented there are still several other services which will get invoked on first boot unnecessarily.

One example is ldconfig, which populates /etc/ld.so.cache. Here it will also not be possible to support the hwdb/udev mechanism of checking whether /usr/lib/ld.so.cache exists, because changing the logic of ld.so would be very hard.

One possible solution could be to add support on the systemd side for a more fine-grained, per-file needs-update condition like ConditionNeedsUpdate=/etc/ld.so.cache, which compares that file's timestamp with the one of /usr. In that case, the service would be responsible for updating the timestamp of that file on its own. The drawback is that we would lose the TIMESTAMP_NSEC fallback for filesystems that do not support nanosecond accuracy.
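A sketch of how the proposed per-file condition could look from the service's side, assuming a hypothetical ConditionNeedsUpdate=<file> that compares the file's own mtime with /usr's. run_if_stale and the regenerate callback are made-up names. The point is that the service, not systemd, must copy /usr's timestamp onto the file after regenerating it, and that a plain mtime comparison has no TIMESTAMP_NSEC-style fallback.

```python
import os
from pathlib import Path
from typing import Callable


def run_if_stale(target: Path, regenerate: Callable[[Path], None],
                 usr: Path = Path("/usr")) -> bool:
    """Hypothetical per-file NeedsUpdate check plus the service's own bookkeeping.

    Returns True if the target was regenerated.
    """
    usr_st = os.stat(usr)
    try:
        if usr_st.st_mtime_ns <= os.stat(target).st_mtime_ns:
            return False  # condition false: the unit would be skipped
    except FileNotFoundError:
        pass  # no target yet: always regenerate
    regenerate(target)
    # The service has to stamp the file with /usr's mtime itself; the
    # comparison above only looks at mtimes, with no TIMESTAMP_NSEC fallback.
    os.utime(target, ns=(usr_st.st_atime_ns, usr_st.st_mtime_ns))
    return True
```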

@septatrix
Contributor Author

I have now created systemd/systemd#36046 and systemd/systemd#36045 to track possible improvements and solutions in systemd.

@keszybz
Member

keszybz commented Mar 19, 2025

https://bugzilla.redhat.com/show_bug.cgi?id=2348669 is a downstream issue where a lot of pain is created by ldconfig.service being run on a live image.

@septatrix
Contributor Author

> https://bugzilla.redhat.com/show_bug.cgi?id=2348669 is a downstream issue where a lot of pain is created by ldconfig.service being run on a live image.

I suggest you also comment on this under the systemd issue/RFE I filed (systemd/systemd#36046). While mkosi could in theory work around some of these issues, this needs a proper resolution in systemd itself (and the Fedora KDE live images are, to my knowledge, created using Kiwi, not mkosi).

@keszybz
Member

keszybz commented Mar 19, 2025

So… I think both of the downstream systemd issues are valid, but they are not the complete solution, or maybe not the only solution. In the case of images built from rpms, the rpms have scriptlets that do all those updates. So on a normal system installed with mkosi, we can skip all the update services that run at boot; there is nothing for them to do unless a mistake was made in the packaging. This is not true in general for all images: additional non-distro content may be added which does in fact depend on update services during boot. But I think the builder of the image should be able to assert "my config implements updates during build, I don't need to rely on boot-time updates". And mkosi should then update /etc/.updated and /var/.updated.

For example, on my f42 image built from the config in the mkosi sources, I see this during boot:

Mar 19 12:20:50 main systemd[1]: Starting systemd-sysusers.service - Create System Users...
Mar 19 12:20:51 main systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 19 12:20:51 main systemd[1]: Starting systemd-journal-catalog-update.service - Rebuild Journal Catalog...

All those are unnecessary.

Maybe this should just be done by downstream config, e.g. in mkosi.finalize as the last step.
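A downstream mkosi.finalize along those lines could be as small as the following Python script. It assumes mkosi exposes the image root to finalize scripts as $BUILDROOT, and it copies /usr's timestamps onto etc/.updated and var/.updated (the equivalent of touch -r /usr), but only for directories the image actually ships, so an image without /etc or /var still gets its update services on first boot. mark_image_updated is a hypothetical name, and running this script is exactly the "my config implements updates during build" assertion: it is only correct if the build really did all the updates.

```python
#!/usr/bin/env python3
# Hypothetical mkosi.finalize: mark /etc and /var as already updated.
import os
import sys
from pathlib import Path


def mark_image_updated(root: Path) -> list[Path]:
    usr = os.stat(root / "usr")
    stamped = []
    for sub in ("etc", "var"):
        directory = root / sub
        if not directory.is_dir():
            # Not shipped in the image: leave it, so the services run at boot.
            continue
        stamp = directory / ".updated"
        stamp.touch()
        # Same effect as `touch -r /usr`: copy /usr's atime and mtime.
        os.utime(stamp, ns=(usr.st_atime_ns, usr.st_mtime_ns))
        stamped.append(stamp)
    return stamped


def main() -> None:
    buildroot = os.environ.get("BUILDROOT")
    if buildroot is None:
        print("BUILDROOT is not set; nothing to do", file=sys.stderr)
        return
    for path in mark_image_updated(Path(buildroot)):
        print(f"stamped {path}")


if __name__ == "__main__":
    main()
```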

@septatrix
Contributor Author

> So… I think both of the downstream systemd issues are valid, but they are not the complete solution, or maybe not the only solution

I agree, though they can certainly be part of the solution. If all the update services were able to specify more precise NeedsUpdate conditions, one could run them manually during mkosi.finalize, and any that were missed would still run at boot. So there is no way to end up with old/invalid state: either you updated it manually and it uses a fine-grained condition, or systemd will just be conservative and still run it.

Unless systemd has some way to run all the update services offline (which would be quite hard, if not impossible), this is not something mkosi could ever assume. Installing e.g. libraries as part of mkosi.build would invalidate the ldconfig cache.

> Maybe this should just be done by downstream config, e.g. in mkosi.finalize as the last step.

Yes, this conservative way is likely the safest way for now.

@keszybz
Member

keszybz commented Mar 19, 2025

I filed systemd/systemd#36803 to help with the "conservative way".

keszybz added a commit to keszybz/mkosi that referenced this issue Mar 19, 2025
This implements the "conservative approach" discussed in
systemd#3338. If we know that
/etc/ and /var/ were populated during the installation, we can
opt out of running early boot services, making the first boot
quicker.

C.f. https://bugzilla.redhat.com/show_bug.cgi?id=2348669.
@keszybz
Member

keszybz commented Mar 19, 2025

#3602 adds a mkosi.finalize with touch -r /usr …. It seems to work as expected: ldconfig.service, systemd-sysusers.service and systemd-journal-catalog-update.service were started before, but are not started with that patch.

@DaanDeMeyer
Contributor

Can we just handle all the common cases and then mark stuff as "updated"? I think we can kind of assume that if you're shipping an image with /etc and /var, they should be considered updated.

@septatrix
Contributor Author

Then we will have people left and right running into the uncommon cases. I would not be comfortable with such a change as it is overly optimistic.

Instead, services/systemd should learn to specify exact dependencies not on /etc or /var, but on specific files or subdirectories. ldconfig could then declare ConditionNeedsUpdate=/etc/ld.so.cache (or maybe an ld.so.cache.updated file to ensure nanosecond precision). Journald could reference the compiled message catalog. sysusers could reference the passwd/shadow and group files (or just /etc/sysusers.updated).

This way we could run the tools to the best of our knowledge, and they would not be started on a fresh boot. And if something pulls in more exotic services with a ConditionNeedsUpdate= that we do not cover, it will have to ensure that it invokes them during its own build process.
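Under that proposal, deciding which units still need to run on first boot becomes a per-file comparison. The mapping below is purely hypothetical (no unit declares these paths today; the catalog path is where I believe journalctl --update-catalog writes its database), and units_to_run is an illustrative helper that treats a missing target as stale, matching the conservative behaviour described above.

```python
import os
from pathlib import Path

# Hypothetical per-unit targets for a file-granular ConditionNeedsUpdate=.
# None of these are real unit settings today.
HYPOTHETICAL_TARGETS = {
    "ldconfig.service": "etc/ld.so.cache",
    "systemd-journal-catalog-update.service": "var/lib/systemd/catalog/database",
    "systemd-sysusers.service": "etc/sysusers.updated",
}


def units_to_run(root: Path, targets: dict[str, str] = HYPOTHETICAL_TARGETS) -> list[str]:
    usr_mtime = os.stat(root / "usr").st_mtime_ns
    pending = []
    for unit, rel in targets.items():
        try:
            target_mtime = os.stat(root / rel).st_mtime_ns
        except OSError:
            # Target missing: be conservative and run the unit.
            pending.append(unit)
            continue
        if usr_mtime > target_mtime:
            pending.append(unit)
    return sorted(pending)
```

A build that regenerated ld.so.cache and stamped it with /usr's mtime would see only the other two units left to run.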

@keszybz
Member

keszybz commented Mar 20, 2025

I think we can ignore the question of precision. For a live system, one could theoretically install packages in a loop, do that a few times within a second, and use a shitty file system. But for images, which is what mkosi is concerned with, this doesn't seem to be a realistic issue. In other words, if you want to deliver images with sub-second frequency, don't use a file system (or fs config) which doesn't support proper timestamps.

@septatrix
Contributor Author

> I think we can ignore the question of precision.

I agree regarding the precision. The larger issue I see with simply using a normal file as the target of NeedsUpdate= is that applications need to explicitly set its mtime to that of /usr after an update. ldconfig and other tools do not know that they should do this; sure, they could be taught, but it's an easy thing to forget.

@keszybz
Member

keszybz commented Mar 20, 2025

Hmm, you're right. Normally, it's enough if the mtime is set in a natural way, because then it's guaranteed to be >= the mtime of /usr. But then we are back to the problem described in systemd/systemd#36045, i.e. we might erroneously skip the update. I don't think we want to teach random unrelated tools to play with timestamps like that. Maybe this should be done at the level of systemd, i.e. systemd should update the timestamp file itself.
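The trade-off can be worked through with plain integers standing in for mtimes. The timeline below is made up, along the lines of the one in the original report: v2 and v3 are built at t=100 and t=200, and the device first boots v2 at t=300. A stamp carrying its natural write time ends up newer than v3's /usr and hides the v2 to v3 update, while a stamp that copies /usr's mtime does not. The helper names are hypothetical.

```python
def needs_update(usr_mtime: int, stamp_mtime: int) -> bool:
    # ConditionNeedsUpdate= semantics: run only if /usr is strictly newer.
    return usr_mtime > stamp_mtime


def stamp_after_update(usr_mtime: int, now: int, copy_usr_mtime: bool) -> int:
    # Either copy /usr's mtime or keep the "natural" time the file was written.
    return usr_mtime if copy_usr_mtime else now


V2_USR, V3_USR, FIRST_V2_BOOT = 100, 200, 300

for copy in (False, True):
    stamp = stamp_after_update(V2_USR, FIRST_V2_BOOT, copy)
    runs = needs_update(V3_USR, stamp)
    print(f"copy_usr_mtime={copy}: stamp={stamp}, update runs after v3 install: {runs}")
# copy_usr_mtime=False: stamp=300, update runs after v3 install: False
# copy_usr_mtime=True: stamp=100, update runs after v3 install: True
```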


3 participants