2022-07-16bcachefs: for_each_btree_key2()bcachefs-fsck-workKent Overstreet
This introduces two new macros for iterating through the btree, with transaction restart handling: for_each_btree_key2() and for_each_btree_key_commit(). Every iteration is now in an implicit transaction, and, as with lockrestart_do() and commit_do(), returning -EINTR will cause the transaction to be restarted at the same key. This patch converts a bunch of code that was open coding this to these new macros, saving a substantial amount of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
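The restart behaviour described above can be illustrated with a small userspace sketch. This is not the bcachefs macro itself: `for_each_key_retry()`, `key_fn`, `struct demo_ctx` and `demo_fn()` are hypothetical names invented for illustration, and a plain integer array stands in for the btree. The only point it makes is that a callback returning -EINTR is re-run on the same key rather than skipping it or aborting the walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-key callback: returns 0 on success, -EINTR to request a
 * restart, or another negative errno to abort the walk. */
typedef int (*key_fn)(int key, void *ctx);

/* Toy stand-in for the iteration macros: walk keys[0..nr) and re-run the
 * callback on the same key whenever it returns -EINTR. */
static int for_each_key_retry(const int *keys, size_t nr, key_fn fn, void *ctx)
{
	size_t i = 0;

	assert(keys || !nr);

	while (i < nr) {
		int ret = fn(keys[i], ctx);

		if (ret == -EINTR)
			continue;	/* "transaction restart": same key again */
		if (ret)
			return ret;	/* real error: stop iterating */
		i++;			/* success: advance to the next key */
	}
	return 0;
}

/* Example callback state: inject a single -EINTR on one chosen key. */
struct demo_ctx {
	int restart_on;	/* key that triggers one restart */
	int restarted;	/* set once the restart has been injected */
	int calls;	/* total callback invocations */
	int sum;	/* sum of keys processed successfully */
};

static int demo_fn(int key, void *p)
{
	struct demo_ctx *c = p;

	c->calls++;
	if (key == c->restart_on && !c->restarted) {
		c->restarted = 1;
		return -EINTR;
	}
	c->sum += key;
	return 0;
}
```

Walking three keys with one injected restart invokes the callback four times, and every key is still processed exactly once.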
2022-07-16bcachefs: Fix repair for extent past end of inodeKent Overstreet
When we find an extent past an inode's i_size, we need to do the deletion in the inode's snapshot (which will emit a whiteout if necessary); and we also need to note that we now have a key at that position and snapshot, so that we don't go into an infinite loop. Also, switch to walking inodes in reverse order, oldest snapshot to newest, so that we emit the fewest whiteouts possible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-16bcachefs: When fsck finds redundant snapshot keys, trigger snapshots cleanupKent Overstreet
Fsck now checks for keys in different snapshot IDs that are now redundant due to other snapshots being deleted - it needs to for its own algorithms to not get confused. When it detects this it should re-run the post snapshot deletion cleanup - this patch does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-16fixup! bcachefs: Improve snapshots_seenKent Overstreet
2022-07-15bcachefs: Improve fsck for subvols/snapshotsKent Overstreet
- Bunch of refactoring, and move some code out of bch2_snapshots_start() and into bch2_snapshots_check(), for consistency with the rest of fsck - Interior snapshot nodes no longer point to a subvolume; this is so we don't end up with dangling subvol references when deleting or require scanning the full snapshots btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-15bcachefs: Improve snapshots_seenKent Overstreet
This makes the snapshots_seen data structure fsck private and improves it; we now also track the equivalence class for each snapshot id we've seen, which means we can detect when snapshot deletion hasn't finished or run correctly (which will otherwise confuse fsck). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-15bcachefs: Fix subvol/snapshot deleting in recoveryKent Overstreet
fsck doesn't want to run while we're cleaning up deleted snapshots - if that work needs to be done, we want it to have finished before fsck runs, otherwise fsck will get confused when it finds multiple keys in the same snapshot ID equivalence class (i.e. the mechanism that snapshot deletion uses for cleaning up redundant keys). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-14bcachefs: fsck_inode_rm() shouldn't delete subvolsKent Overstreet
We should never see an inode marked as unlinked that's a subvolume root (or a directory) in fsck, but even if we do it's not correct for fsck to delete the subvolume: subvolumes are owned by dirents, and if we find a dangling subvolume (not marked as unlinked) we want fsck to reattach it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-14bcachefs: Switch data_update path to snapshot_id_listKent Overstreet
snapshots_seen is becoming private to fsck, and snapshot_id_list is actually what the data update path needs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-14bcachefs: Fix snapshot deletionKent Overstreet
Snapshots being deleted won't in general have a corresponding subvolume: this fixes a spurious fsck error where we'd complain about a snapshot pointing to a missing subvolume - but the subvolume had been deleted, and the snapshot was pending deletion as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-13bcachefs: Rename __bch2_trans_do() -> commit_do()Kent Overstreet
Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Put some repair messages behind opts->verboseKent Overstreet
These messages log the updates we're doing in bch2_check_fix_ptrs(), which is useful when debugging but not usually needed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Silence some fsck errors when reconstructing alloc infoKent Overstreet
There's no need to print fsck errors for errors that are expected, and the user has already opted to repair. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Delete a faulty assertionKent Overstreet
A lock ordering transaction restart can rarely happen in bch2_btree_path_traverse_all() due to btree_key_cache_fill() creating new paths at a lower lock order than the current path being traversed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Silence unimportant tracepointsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-129p: Add mempools for RPCsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Dominique Martinet <asmadeus@codewreck.org>
2022-07-129p: Add client parameter to p9_req_put()Kent Overstreet
This is to aid in adding mempools, in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Dominique Martinet <asmadeus@codewreck.org>
2022-07-129p: Drop kref usageKent Overstreet
An upcoming patch is going to require passing the client through p9_req_put() -> p9_req_free(), but that's awkward with the kref indirection - so this patch switches to using refcount_t directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Dominique Martinet <asmadeus@codewreck.org>
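The awkwardness mentioned above comes from kref_put() invoking a release callback that receives only the `struct kref *`, which leaves no natural way to hand the client through to the free function. A minimal userspace sketch of the alternative, using C11 atomics in place of the kernel's refcount_t, might look like the following. `struct client`, `struct req`, `req_free()` and `req_put()` are simplified, hypothetical stand-ins and not the 9p code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical client; counts how many requests were released through it. */
struct client {
	int freed;
};

/* Hypothetical request with an embedded reference count. */
struct req {
	atomic_int refcount;
};

/* The free path can take the client directly, because no callback
 * signature constrains its arguments. */
static void req_free(struct client *c, struct req *r)
{
	(void)r;	/* a real implementation would release r's memory here */
	c->freed++;
}

/* Drop one reference; returns true if this was the last one. */
static bool req_put(struct client *c, struct req *r)
{
	int old = atomic_fetch_sub(&r->refcount, 1);

	assert(old > 0);	/* catch puts without a matching get */
	if (old == 1) {
		req_free(c, r);
		return true;
	}
	return false;
}
```

With the count managed directly, the extra parameter is an ordinary function argument rather than something that has to be recovered from the object inside a release callback.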
2022-07-12bcachefs: Fix move path when move_stats == NULLKent Overstreet
This isn't done very often, but it is legitimate. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Fix bch2_check_alloc_key()Kent Overstreet
bch2_check_alloc_key() was failing to check buckets that didn't have alloc keys yet (because they'd never been used) - they still need to be added to the freespace btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Improve bch2_check_alloc_infoKent Overstreet
- In check_alloc_key(), previously we were re-initializing iterators for the need_discard and freespace btrees for every alloc key we checked. But this was causing us to redo lookups into the journal keys every time, since those lookups are cached in struct btree_iter. This initializes the iterators in bch2_check_alloc_info and passes them into check_alloc_key(). - Make the looping more consistent/efficient in bch2_check_alloc_info() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info()Kent Overstreet
This runs before we go rw for journal replay, but after we're allowed to go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW, though. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Make "failed to evacuate bucket" a fatal errorKent Overstreet
We need to track these down, so let's make them noisier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Silence spurious "failed to evacuate bucket" errorsKent Overstreet
We'd like to make these errors fatal and more noisy, but first we need to silence the ones that aren't actually errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Bucket invalidate path improvementsKent Overstreet
- invalidate_one_bucket() now returns 1 when we don't have any buckets on this device to invalidate, ensuring we don't spin - the tracepoint invocation is moved to after the transaction commit, and we now include the number of cached sectors in the tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
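The "return 1 so we don't spin" convention from the first item above can be sketched as follows. This is a toy model under stated assumptions: `struct dev`, `invalidate_one()` and `invalidate_up_to()` are hypothetical names, and a counter stands in for the set of buckets holding only cached data. A positive return value means "nothing left to do", which lets the caller stop without treating it as an error.

```c
#include <assert.h>

/* Hypothetical device state. */
struct dev {
	unsigned cached_buckets;	/* buckets that could be invalidated */
	unsigned invalidated;		/* buckets invalidated so far */
};

/* Returns 0 after invalidating one bucket, or 1 when the device has no
 * buckets left to invalidate. */
static int invalidate_one(struct dev *d)
{
	assert(d);

	if (!d->cached_buckets)
		return 1;

	d->cached_buckets--;
	d->invalidated++;
	return 0;
}

/* Try to invalidate up to 'want' buckets; stops early instead of looping
 * forever when invalidate_one() reports there is nothing left. */
static int invalidate_up_to(struct dev *d, unsigned want)
{
	while (want) {
		int ret = invalidate_one(d);

		if (ret < 0)
			return ret;	/* real error */
		if (ret)
			break;		/* out of buckets: not an error */
		want--;
	}
	return 0;
}
```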
2022-07-12bcachefs: Don't BUG_ON() inode link count underflowKent Overstreet
This switches that assertion to a bch2_trans_inconsistent() call, as it should be. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: We can handle missing btree roots for all alloc btreesKent Overstreet
We can rebuild alloc info if these btree roots are missing - no need to bail out and say the filesystem is unrecoverable Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Always descend to leaf nodes in btree_gcKent Overstreet
If a btree node is unreadable, it's the topology repair that fixes that and it's kicked off by btree_gc, so btree_gc needs to touch every node and verify that they can be read. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: fix __dev_available().Daniel Hill
__dev_available() now calculates available buckets correctly. Previously it would almost always return 0 when we have cached data. Signed-off-by: Daniel Hill <daniel@gluo.nz>
2022-07-12bcachefs: Fix assertion in topology repairKent Overstreet
If we were at the end of the node, when breaking out of the loop we'd pop the assertion on line 446 when cur wasn't NULL. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Make verbose option settable at runtimeKent Overstreet
-o verbose is very useful, and we're starting to use it more for runtime debug statements - making it possible to enable at runtime is a no brainer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Fix refcount leak in bch2_do_invalidates()Kent Overstreet
If we fail to queue the work item because it's already in process, we need to drop the ref we just took. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
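The leak pattern described above can be sketched in userspace. The kernel's queue_work() returns false when the work item is already queued; `queue_work_sim()` below imitates only that contract. `struct fs`, `write_refs` and the function names are hypothetical, and which reference the real code takes is not stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work item: 'pending' is set while it sits on a queue. */
struct work_item {
	bool pending;
};

/* Hypothetical filesystem object holding a plain reference counter. */
struct fs {
	int write_refs;
	struct work_item invalidate_work;
};

/* Imitates queue_work()'s return contract: false if already queued. */
static bool queue_work_sim(struct work_item *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Take a ref on behalf of the work item, and give it back if the item
 * was already queued (the earlier queuer's ref covers that run). */
static void do_invalidates(struct fs *c)
{
	c->write_refs++;
	if (!queue_work_sim(&c->invalidate_work))
		c->write_refs--;	/* without this, the ref would leak */
}

/* The work function drops the single ref when it finishes. */
static void invalidate_work_done(struct fs *c)
{
	assert(c->invalidate_work.pending);
	c->invalidate_work.pending = false;
	c->write_refs--;
}
```

Calling `do_invalidates()` twice before the work runs leaves exactly one reference outstanding, which the work function then drops.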
2022-07-12bcachefs: Get ref on c->writes in move.cKent Overstreet
There's no point reading an extent in order to move it if the write is going to fail because we're shutting down. This patch changes the move path so that moving_io now owns a ref on c->writes - as a bonus, rebalance and copygc will now notice that we're shutting down and exit quicker. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Pipeline copygc across multiple bucketsKent Overstreet
This tweaks the copygc path to keep the same moving_ctxt across multiple evacuate bucket calls, meaning we can pipeline across buckets - should be a nice performance boost. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: move.c refactoringKent Overstreet
- add bch2_moving_ctxt_(init|exit) - split out __bch2_evacuate_bucket() which takes an existing moving_ctxt, this will be used for improving copygc performance by pipelining across multiple buckets Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: btree key cache pcpu freedlistKent Overstreet
Originally, the btree key cache code would always allocate new entries by reusing from the recently-freed list, if that list wasn't empty. But that behaviour was dropped, for lock contention reasons. But it seems that entries stranded on the freed list have been contributing to some of our oom issues, because long running btree transactions will prevent them from being freed. This patch re-adds allocating from the freed list, but it also adds percpu buffers to solve the lock contention issues - and the new percpu freed lists will improve the evict paths, too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12fixup! lib/printbuf: New data structure for printing stringsKent Overstreet
2022-07-12bcachefs: Make IO in flight by copygc/rebalance configurableKent Overstreet
This adds a new option, move_bytes_in_flight, for configuring the amount of IO in flight by copygc/rebalance - users with many devices in their filesystem will want to increase this. In the future we should be smarter about this, but this is an easy improvement. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Check for extents with too many ptrsKent Overstreet
We have a hardcoded maximum on number of pointers in an extent that's used by some other data structures - notably bch_devs_list - but we weren't actually checking for it. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Always use percpu_ref_tryget_live() on c->writesKent Overstreet
If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: data jobs, including rebalance wait for copygc.Daniel Hill
move_ratelimit() now has a bool that specifies whether we want to wait for copygc to finish. When copygc is running, we're probably low on free buckets; instead of consuming the remaining buckets, we want to wait for copygc to finish. This should help with performance and with runaway bucket fragmentation. Signed-off-by: Daniel Hill <daniel@gluo.nz>
2022-07-12bcachefs: Redo data_update interfaceKent Overstreet
This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer by device to rewrite, we're now able to specify any or all of the pointers in the original extent to be rewritten, as a bitmask. data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas, or add new ones. This fixes a bug with background compression on replicated filesystems, where rebalance -> data_update would incorrectly drop the wrong old replica, and keep trying to recompress an extent pointer, each time failing to drop the right replica. Oops. Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop - it only goes off of the data_update_options passed to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
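Selecting pointers with a bitmask, as described above, can be sketched like this. It is an illustration only: `struct ptr`, `ptrs_on_dev_mask()` and `nr_ptrs_to_rewrite()` are hypothetical, and the real data_update_options layout is not shown in the commit message. Bit i of the mask corresponds to the i-th pointer of the extent, so "rewrite the pointer on device N" becomes one special case of "rewrite any subset".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified extent pointer: only the device index matters. */
struct ptr {
	unsigned dev;
};

/* Build a bitmask with bit i set when ptrs[i] lives on device 'dev'. */
static unsigned ptrs_on_dev_mask(const struct ptr *ptrs, size_t nr, unsigned dev)
{
	unsigned mask = 0;

	assert(nr <= sizeof(mask) * 8);	/* every pointer needs its own bit */

	for (size_t i = 0; i < nr; i++)
		if (ptrs[i].dev == dev)
			mask |= 1U << i;
	return mask;
}

/* Count how many replicas a mask asks the update path to rewrite. */
static unsigned nr_ptrs_to_rewrite(unsigned mask)
{
	unsigned nr = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		nr++;
	return nr;
}
```

A caller that previously named a single device can build the equivalent mask, while other callers can OR together bits chosen by any other criterion.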
2022-07-12bcachefs: Improve checksum error messagesKent Overstreet
We're seeing checksum errors in the bch2_rechecksum_bio() path - give it a better error message to help track this down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Move 'btree_transactions' debug to debugfsKent Overstreet
This moves btree_transactions from sysfs to debugfs, and makes it more verbose: now we also include the backtrace of each task, since we generally need this for debugging deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Improve an error messageKent Overstreet
When inserting a key type that's not valid for a given btree, we should print out which btree we were inserting into. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Fix assertion in bch2_dev_list_add_dev()Kent Overstreet
We were only allowing 4 devices in a dev_list, not 16. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Improve "copygc requested to run" error messageKent Overstreet
This improves the "copygc requested to run but no buckets found" error message to show the device that copygc needs to run on - we'll definitely need to improve this more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Pull out data_update.cKent Overstreet
This is the start of reorganizing the data IO paths. The plan is to also break apart io.c into data_read.c and data_write.c, and migrate_write will be renamed to the data_update path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Split out dev_buckets_free()Kent Overstreet
Previously, dev_buckets_available() only counted buckets that are eligible to be allocated right now - i.e. buckets that don't have cached data, or need discard, or need gc gens, etc. But most users of this function (copygc, allocator striping) want to know how many buckets are eligible to be allocated from without moving data around, which means we should be including cached data buckets etc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-07-12bcachefs: Copygc now uses backpointersKent Overstreet
Previously, copygc needed to walk the entire extents & reflink btrees to find extents that needed to be moved. Now that we have backpointers, this patch implements bch2_evacuate_bucket() in the move code, which copygc now uses for evacuating mostly empty buckets. Also, thanks to the new backpointers code, copygc can now move btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>