bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
3 hours	bcachefs: Improved btree node tracepoints	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
14 days	bcachefs: Fix restart handling in btree_node_scrub_work()	Kent Overstreet
	btree node scrub was sometimes failing to rewrite nodes with errors; bch2_btree_node_rewrite() can return a transaction restart and we weren't checking - the lockrestart_do() needs to wrap the entire operation. And there's a better helper it should've been using, bch2_btree_node_rewrite_key(), which makes all this more convenient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: don't return fsck_fix for unfixable node errors in __btree_err	Bharadwaj Raju
	After cd3cdb1ef706 ("Single err message for btree node reads"), all errors caused __btree_err to return -BCH_ERR_fsck_fix no matter what the actual error type was if the recovery pass was scanning for btree nodes. This lead to the code continuing despite things like bad node formats when they earlier would have caused a jump to fsck_err, because btree_err only jumps when the return from __btree_err does not match fsck_fix. Ultimately this lead to undefined behavior by attempting to unpack a key based on an invalid format. Make only errors of type -BCH_ERR_btree_node_read_err_fixable cause __btree_err to return -BCH_ERR_fsck_fix when scanning for btree nodes. Reported-by: syzbot+cfd994b9cdf00446fd54@syzkaller.appspotmail.com Fixes: cd3cdb1ef706 ("bcachefs: Single err message for btree node reads") Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Move bset size check before csum check	Alan Huang
	In syzbot's crash, the bset's u64s is larger than the btree node. Reported-by: syzbot+bfaeaa8e26281970158d@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Fix version checks in validate_bset()	Kent Overstreet
	It seems btree node scan picked up a partially overwritten btree node, and corrected the "bset version older than sb version_min" error - resulting in an invalid superblock with a bad version_min field. Don't run this check at all when we're in btree node scan, and when we do run it, do something saner if the bset version is totally crazy. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Add more flags to btree nodes for rewrite reason	Kent Overstreet
	It seems excessive forced btree node rewrites can cause interior btree updates to become wedged during recovery, before we're using the write buffer for backpointer updates. Add more flags so we can determine where these are coming from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02	bcachefs: bch_err_throw()	Kent Overstreet
	Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01	bcachefs: Replace rcu_read_lock() with guards	Kent Overstreet
	The new guard(), scoped_guard() allow for more natural code. Some of the uses with creative flow control have been left. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31	bcachefs: Use bch2_err_matches() for BCH_ERR_fsck_(fix\|ignore)	Kent Overstreet
	We'll be adding subtypes of these errors, and new error code tracing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Reduce usage of recovery.curr_pass	Kent Overstreet
	We want recovery.curr_pass to be private to the recovery passes code, for better showing recovery pass status; also, it may rewind and is generally not the correct member to use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: struct bch_fs_recovery	Kent Overstreet
	bch_fs has gotten obnoxiously big, let's start organizing thins a bit better. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: bch2_fs_emergency_read_only2()	Kent Overstreet
	More error message cleanup: instead of multiple printk()s per error, we want to be building up a single error message in a printbuf, so that it can be printed with indenting that shows grouping and avoid errors getting interspersed or lost in the log. This gets rid of most calls to bch2_fs_emergency_read_only(). We still have calls to - bch2_fatal_error() - bch2_fs_fatal_error() - bch2_fs_fatal_err_on() that need work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Rename fsck_running, recovery_running flags	Kent Overstreet
	Slightly more readable. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Debug params are now static_keys	Kent Overstreet
	We'd like users to be able to debug without building custom kernels, so this will help us get rid of CONFIG_BCACHEFS_DEBUG, at least for most things. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Single err message for btree node reads	Kent Overstreet
	Like we just did with the data read path, emit a single error message per btree node reads, nicely formatted, with all the actions we took grouped together. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Plumb printbuf through bch2_btree_lost_data()	Kent Overstreet
	Part of the ongoing project to improve error messages by building them up in printbufs and emitting them all at once, so that we can easily see what events are related in the log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Make various async objs visible in debugfs	Kent Overstreet
	Add async objs list for - promote_op - bch_read_bio - btree_read_bio - btree_write_bio This gets us introspection on in-flight async ops, and because under the hood it uses fast_lists (percpu slot buffer on top of a radix tree), it'll be fast enough to enable in production. This will be very helpful for debugging "something got stuck" issues, which have been cropping up from time to time (in the CI, especially with folio writeback). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: bch_dev.io_ref -> enumerated_ref	Kent Overstreet
	Convert device IO refs to enumerated_refs, for easier debugging of refcount issues. Simple conversion: enumerate all users and convert to the new helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: bch_fs.writes -> enumerated_refs	Kent Overstreet
	Drop the single-purpose write ref code in bcachefs.h, and convert to enumarated refs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Plumb target parameter through btree_node_rewrite_pos()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: print_str_as_lines() -> print_str()	Kent Overstreet
	bch2_print_string_as_lines() is a low level helper that allows messages longer than 1k to be printed without truncation. But we should always be printing with the helpers that take a filesystem object, if we're in fsck they direct output to the userspace process controlling fsck instead of the dmesg log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Simplify logic	Alan Huang
	Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: btree_io_complete_wq -> btree_write_complete_wq	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05	bcachefs: Add missing barriers before wake_up_bit()	Kent Overstreet
	wake_up() doesn't require a barrier - but wake_up_bit() does. This only affected non x86, and primarily lead to lost wakeups after btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	bcachefs: Fix scheduling while atomic from logging changes	Kent Overstreet
	Two fixes from the recent logging changes: bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt context, or with rcu_read_lock() held. The one syzbot found is in bch2_bkey_pick_read_device bch2_dev_rcu bch2_fs_inconsistent We're starting to switch to lift the printbufs up to higher levels so we can emit better log messages and print them all in one go (avoid garbling), so that conversion will help with spotting these in the future; when we declare a printbuf it must be flagged if we're in an atomic context. Secondly, in btree_node_write_endio: 00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321 00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6 00085 preempt_count: 10001, expected: 0 00085 RCU nest depth: 0, expected: 0 00085 4 locks held by bch-reclaim/fa6/618: 00085 #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198 00085 #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440 00085 #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440 00085 #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130 00085 irq event stamp: 328 00085 hardirqs last enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0 00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60 00085 softirqs last enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118 00085 softirqs last disabled at (0): [<0000000000000000>] 0x0 00085 Preemption disabled at: 00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90 00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798 00085 Hardware name: linux,dummy-virt (DT) 00085 Call trace: 00085 show_stack+0x1c/0x30 (C) 00085 dump_stack_lvl+0x84/0xc0 00085 dump_stack+0x14/0x20 00085 __might_resched+0x180/0x288 00085 __might_sleep+0x4c/0x88 00085 __kmalloc_node_track_caller_noprof+0x34c/0x3e0 00085 krealloc_noprof+0x1a0/0x2d8 00085 bch2_printbuf_make_room+0x9c/0x120 00085 bch2_prt_printf+0x60/0x1b8 00085 btree_node_write_endio+0x1b0/0x2d8 00085 bio_endio+0x138/0x1f0 00085 btree_node_write_endio+0xe8/0x2d8 00085 bio_endio+0x138/0x1f0 00085 blk_update_request+0x220/0x4c0 00085 blk_mq_end_request+0x28/0x148 00085 virtblk_request_done+0x64/0xe8 00085 blk_mq_complete_request+0x34/0x40 00085 virtblk_done+0x78/0x130 00085 vring_interrupt+0x6c/0xb0 00085 __handle_irq_event_percpu+0x8c/0x2e0 00085 handle_irq_event+0x50/0xb0 00085 handle_fasteoi_irq+0xc4/0x250 00085 handle_irq_desc+0x44/0x60 00085 generic_handle_domain_irq+0x20/0x30 00085 gic_handle_irq+0x54/0xc8 00085 call_on_irq_stack+0x24/0x40 Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02	bcachefs: Kill btree_iter.trans	Kent Overstreet
	This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02	bcachefs: Split up bch_dev.io_ref	Kent Overstreet
	We now have separate per device io_refs for read and write access. This fixes a device removal bug where the discard workers were still running while we're removing alloc info for that device. It's also a bit of hardening; we no longer allow writes to devices that are read-only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-01	bcachefs: fix ref leak in btree_node_read_all_replicas	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-29	bcachefs: Better printing of inconsistency errors	Kent Overstreet
	Build up and emit the error message for an inconsistency error all at once, instead of spread over multiple printk calls, so they're not jumbled in the dmesg log. Also, add better indenting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28	bcachefs: Consistent indentation of multiline fsck errors	Kent Overstreet
	Add the new helper printbuf_indent_add_nextline(), and use it in __bch2_fsck_err() to centralize setting the indentation of multiline fsck errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24	bcachefs: btree node write errors now print btree node	Kent Overstreet
	It turned out a user was wondering why we were going read-only after a write error, and he didn't realize he didn't have replication enabled - this will make that more obvious, and we should be printing it anyways. Link: https://www.reddit.com/r/bcachefs/comments/1jf9akl/large_data_transfers_switched_bcachefs_to_readonly/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16	bcachefs: Checksum errors get additional retries	Kent Overstreet
	It's possible for checksum errors to be transient - e.g. flakey controller or cable, thus we need additional retries (besides retrying from different replicas) before we can definitely return an error. This is particularly important for the next patch, which will allow the data move path to move extents with checksum errors - we don't want to accidentally introduce bitrot due to a transient error! - bch2_bkey_pick_read_device() is substantially reworked, and bch2_dev_io_failures is expanded to record more information about the type of failure (i.e. number of checksum errors). It now returns an error code that describes more precisely the reason for the failure - checksum error, io error, or offline device, instead of the previous generic "insufficient devices". This is important for the next patches that add poisoning, as we only want to poison extents when we've got real checksum errors (or perhaps IO errors?) - not because a device was offline. - Add a new option and superblock field for the number of checksum retries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: Finish bch2_account_io_completion() conversions	Kent Overstreet
	More prep work for automatically kicking devices out after too many IO errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: Add time_stat for btree writes	Kent Overstreet
	We have other metadata IO types covered, this was missing. Note: this includes the time until completion, i.e. including parent pointer update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: bch2_btree_node_scrub()	Kent Overstreet
	Add a function for scrubbing btree nodes - reading them in, and kicking off a rewrite if there's an error. The btree_node_read_done() checks have to be duplicated because we're not using a pointer to a struct btree - the btree node might already be in cache, and we need to check a specific replica, which might not be the one we previously read from. This will be used in the next patch implementing high-level scrub. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: bch2_bkey_pick_read_device() can now specify a device	Kent Overstreet
	To be used for scrub, where we want the read to come from a specific device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-11	bcachefs: Fix b->written overflow	Alan Huang
	When bset past end of btree node, we should not add sectors to b->written, which will overflow b->written. Reported-by: syzbot+3cb3d9e8c3f197754825@syzkaller.appspotmail.com Tested-by: syzbot+3cb3d9e8c3f197754825@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-20	bcachefs: Fix memmove when move keys down	Alan Huang
	The fix alone doesn't fix [1], but should be applied before debugging that. [1] https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09	bcachefs: bch2_btree_node_write_trans()	Kent Overstreet
	Avoiding screwing up path->lock_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: fix bch2_btree_node_header_to_text() format string	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: If we did repair on a btree node, make sure we rewrite it	Kent Overstreet
	Ensure that "invalid bkey" repair gets persisted, so that it doesn't repeatedly spam the logs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: struct bkey_validate_context	Kent Overstreet
	Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it came from the journal, or is a btree root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Don't try to en/decrypt when encryption not available	Kent Overstreet
	If a btree node says it's encrypted, but the superblock never had an encryptino key - whoops, that needs to be handled. Reported-by: syzbot+026f1857b12f5eb3f9e9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Fix dup/misordered check in btree node read	Kent Overstreet
	We were checking for out of order keys, but not duplicate keys. Reported-by: syzbot+dedbd67513939979f84f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Avoid bch2_btree_id_str()	Kent Overstreet
	Prefer bch2_btree_id_to_text() - it prints out the integer ID when unknown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-08	bcachefs: Fix validate_bset() repair path	Kent Overstreet
	When we truncate a bset (due to it extending past the end of the btree node), we can't skip the rest of the validation for e.g. the packed format (if it's the first bset in the node). Reported-by: syzbot+4d722d3c539d77c7bc82@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18	bcachefs: Don't use commit_do() unnecessarily	Kent Overstreet
	Using commit_do() to call alloc_sectors_start_trans() breaks when we're randomly injecting transaction restarts - the restart in the commit causes us to leak the lock that alloc_sectorS_start_trans() takes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-09	bcachefs: Release transaction before wake up	Alan Huang
	We will get this if we wake up first: Kernel panic - not syncing: btree_node_write_done leaked btree_trans since there are still transactions waiting for cycle detectors after BTREE_NODE_write_in_flight is cleared. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27	bcachefs: rename version -> bversion	Kent Overstreet
	give bversions a more distinct name, to aid in grepping Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27	bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping	Piotr Zalewski
	Zero-initialize part of allocated bounce buffer which wasn't touched by subsequent bch2_key_sort_fix_overlapping to mitigate later uinit-value use KMSAN bug[1]. After applying the patch reproducer still triggers stack overflow[2] but it seems unrelated to the uninit-value use warning. After further investigation it was found that stack overflow occurs because KMSAN adds too many function calls[3]. Backtrace of where the stack magic number gets smashed was added as a reply to syzkaller thread[3]. It was confirmed that task's stack magic number gets smashed after the code path where KSMAN detects uninit-value use is executed, so it can be assumed that it doesn't contribute in any way to uninit-value use detection. [1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718 [2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com [3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718 Fixes: ec4edd7b9d20 ("bcachefs: Prep work for variable size btree node buffers") Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>