tag name | bcachefs-2025-05-24 (642a3d66bc6d67e1d297fb12f7ac99990bdef752) |
tag date | 2025-05-24 20:18:44 -0400 |
tagged by | Kent Overstreet <kent.overstreet@linux.dev> |
tagged object | commit 9caea9208f... |
bcachefs updates for 6.16
Lots of changes:
- Poisoned extents can now be moved: this lets us handle bitrotted data
without deleting it. For now, reading from poisoned extents only
returns -EIO: in the future we'll have an API for specifying "read
this data even if there were bitflips".
- Incompatible features may now be enabled at runtime, via
"opts/version_upgrade" in sysfs. Toggle it to incompatible, and then
toggle it back - option changes via the sysfs interface are
persistent.
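As a concrete sketch, toggling the option from a shell might look like the following. The UUID and the exact sysfs layout here are assumptions (the tag message only names "opts/version_upgrade"); check /sys/fs/bcachefs/ on your system for the real paths:

```shell
# Hypothetical filesystem UUID - substitute the directory name that
# actually appears under /sys/fs/bcachefs/ for your mounted filesystem.
UUID=00000000-0000-0000-0000-000000000000
OPTS=/sys/fs/bcachefs/$UUID/options

if [ -d "$OPTS" ]; then
    # Toggle to incompatible, then back; since option changes made via
    # sysfs are persistent, this permanently enables incompat features.
    echo incompatible > "$OPTS/version_upgrade"
    echo compatible   > "$OPTS/version_upgrade"
else
    echo "no bcachefs filesystem mounted at $OPTS"
fi
```

The guard keeps the sketch safe to paste on a machine without a mounted bcachefs filesystem.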
- Various changes to support deployable disk images:
- RO mounts now use less memory
- Images may be stripped of alloc info, particularly useful for
slimming them down if they will primarily be mounted RO. Alloc info
will be automatically regenerated on first RW mount, and this is
quite fast.
- Filesystem images generated with 'bcachefs image' will be
automatically resized the first time they're mounted on a larger
device.
The images 'bcachefs image' generates with compression enabled have
been comparable in size to those generated by squashfs and erofs -
but you get a full RW capable filesystem.
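A hypothetical invocation is sketched below. The 'bcachefs image' subcommand is confirmed by the notes above, but the flag names here are assumptions, not the verified CLI; consult `bcachefs image --help` from bcachefs-tools for the actual interface:

```shell
# Build a compressed filesystem image from a source directory for RO
# deployment. Flag names below are illustrative assumptions only.
SRC=./rootfs
IMG=rootfs.img

if command -v bcachefs >/dev/null 2>&1; then
    bcachefs image create --source="$SRC" --compression=zstd "$IMG"
else
    echo "bcachefs-tools not installed; skipping image build"
fi
```

On first mount of such an image on a larger device, the filesystem is resized automatically, per the notes above.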
- Major error message improvements for btree node reads, data reads,
and elsewhere. We now build up a single error message that lists all
the errors encountered, actions taken to repair, and success/failure
of the IO. This extends to other error paths that may kick off other
actions, e.g. scheduling recovery passes: actions we took because of
an error are included in that error message, with grouping/indentation
so we can see what caused what.
- Repair/self healing:
- We can now kick off recovery passes and run them in the background
if we detect errors. Currently, this is just used by code that walks
backpointers; we now also check for missing backpointers at runtime
and run check_extents_to_backpointers if required. The messy 6.14
upgrade left missing backpointers for some users, and this will
correct that automatically instead of requiring a manual fsck - some
users noticed this as copygc spinning and not making progress.
In the future, as more recovery passes come online, we'll be able to
repair and recover from nearly anything - except for unreadable
btree nodes, and that's why you're using replication, of course -
without shutting down the filesystem.
- There's a new recovery pass, for checking the rebalance_work btree,
which tracks extents that rebalance will process later.
- Hardening:
- Close the last known hole in btree iterator/btree locking
assertions: path->should_be_locked paths must stay locked until the
end of the transaction. This shook out a few bugs, including a
performance issue that was causing unnecessary path_upgrade
transaction restarts.
- Performance:
  - Faster snapshot deletion: this is an incompatible feature, as it
    requires new sentinel values, for safety. Snapshot deletion no
    longer has to do a full metadata scan; it now just scans the inodes
    btree: if an extent/dirent/xattr is present for a given snapshot ID,
    we already require that an inode be present with that same snapshot
    ID.
If/when users hit scalability limits again (ridiculously huge
filesystems with lots of inodes, and many sparse snapshots), let me
know - the next step will be to add an index from snapshot ID ->
inode number, which won't be too hard.
  - Faster device removal: the "scan for pointers to this device" no
    longer does a full metadata scan; instead it walks backpointers.
    Like fast snapshot deletion, this is another incompat feature: it
    also requires a new sentinel value, because we don't want to reuse
    these device IDs until after a fsck.
  - We're now coalescing redundant accounting updates prior to
    transaction commit, taking some pressure off the journal. Shortly
    we'll also be doing multiple extent updates per transaction in the
    main write path, which, combined with the previous, should
    drastically cut down on the number of metadata updates we have to
    journal.
  - Stack usage improvements: all allocator state has been moved off
    the stack.
- Debug improvements:
- enumerated refcounts: The debug code previously used for filesystem
write refs is now a small library, and used for other heavily used
refcounts. Different users of a refcount are enumerated, making it
much easier to debug refcount issues.
- Async object debugging: There's a new kconfig option that makes
various async objects (different types of bios, data updates, write
ops, etc.) visible in debugfs, and it should be fast enough to leave
on in production.
- Various sets of assertions no longer require CONFIG_BCACHEFS_DEBUG,
instead they're controlled by module parameters and static keys,
meaning users won't need to compile custom kernels as often to help
debug issues.
- bch2_trans_kmalloc() calls can be tracked (there's a new kconfig
option); with it on you can check the btree_transaction_stats in
debugfs to see the bch2_trans_kmalloc() calls a transaction did when
it used the most memory.
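Assuming debugfs is mounted at its conventional location, inspecting those per-transaction stats could look like this (the per-UUID directory layout is an assumption based on standard debugfs conventions):

```shell
# bcachefs exposes one directory per filesystem under debugfs;
# btree_transaction_stats reports per-transaction statistics, including
# peak bch2_trans_kmalloc() usage when the new kconfig option is on.
DEBUGFS=/sys/kernel/debug/bcachefs

if [ -d "$DEBUGFS" ]; then
    for fs in "$DEBUGFS"/*; do
        cat "$fs/btree_transaction_stats"
    done
else
    echo "debugfs not mounted or bcachefs debug dir absent: $DEBUGFS"
fi
```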
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmgyaC0ACgkQE6szbY3K
bnYcGQ//ZOCe34wjVFub+dNn9os0llaIFaShTC9Baoi+Ly8qmMBkiVR8h0XZWJ6I
Xue8FaPksEDUF+pXSPjI+L/WA2uW/qNm2Q2RxEfxigSMSzUUZvHs/jU3ZkpZ1JQb
l327tun1XNNY2JagcTj09X+VoasLuhQtvBKXM6gAWozXNszLesd1vaFexPsk13bV
GwqSxlfayYt5DwzEf7OCL9CXWfW86qs8snLYAPpv/pyoVNKw+iuPFlhDA1AD1ZMG
s+syQ5R7u5ikcfpYnaakDsn3KhxsX+jLk5PoSHk/6kGy/5BdJ1AUYQEsSNfdcxHy
pxNht12Nuoo2q2qI0gL4oegnz36cndtveCf9vs6K0Vg24ZRylhh8uz3v/ZcAu0Ne
CwFvpxMn5jtIgqh75i9R1/W6aiuKffkE29D4Me5RJxEqoM8yKKhKx6tHHzZftT3a
QSvbgsfBghetfTqcajBvDDN5GQM2Z8pz2iLrIw/EHuAh15hAhzf+7ULHprIh6IDz
m/Px72xrh39CAKI8IdsjD7QLT9a7xN3WKQXbSvFMEPjnJtGL3JGARZfsKB2gL7ZO
551ONexueFkilQmGQfy20VYGF1Mu9mWTUqyVnNaQUMbgKKDcAivy71UyFe/n3GOB
xJyEKTfrJg8Qn+vEJvlhXevVnz5FO/hiOAMIrMPKQq8XT0iNdAA=
=srxl
-----END PGP SIGNATURE-----