path: root/fs/bcachefs/btree_key_cache.c
Age  Commit message (Author)
2022-06-09  bcachefs: Convert to lib/printbuf.c [printbuf_v3_bcachefs] (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-05  bcachefs: Btree key cache coherency (Kent Overstreet)
This is the last piece for btree key cache coherency: We already have:
- btree iterator code checks the key cache when iterating over a cached btree
- update path ensures that updates go to the key cache when updating a cached btree
But for iterating over a cached btree to work, we need to ensure that if a key exists in the key cache, it also exists in the btree - otherwise the iterator code will skip past it and not check the key cache.
This patch implements that last piece: on a key cache update, if creating a new key, we now also update the underlying btree.
This fixes a device removal bug, where deleting alloc info wasn't correctly deleting all keys associated with a given device. It also means we should be able to re-enable the key cache for inodes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
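The invariant in this commit message can be illustrated with a toy model. The sketch below is not bcachefs code: the fixed-size arrays and the names `toy_fs`, `toy_cache_update()` and `toy_iter_peek()` are hypothetical, and positions are assumed to be small integers below `TOY_NR_POS`. It shows why creating a key in the cache must also create it in the btree: the iterator only consults the cache at positions where the btree already has a key.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_POS 16

/* Toy stand-ins for a btree and its key cache, both indexed by position. */
struct toy_fs {
	bool btree_present[TOY_NR_POS];
	int  btree_val[TOY_NR_POS];
	bool cache_present[TOY_NR_POS];
	int  cache_val[TOY_NR_POS];
};

/*
 * Update a key through the key cache. If the key does not exist in the
 * btree yet, insert it there too, so that btree iteration will stop at
 * this position and get a chance to check the cache.
 * The caller must pass pos < TOY_NR_POS.
 */
static void toy_cache_update(struct toy_fs *fs, size_t pos, int val)
{
	if (!fs->btree_present[pos]) {
		fs->btree_present[pos] = true;
		fs->btree_val[pos] = val;
	}
	fs->cache_present[pos] = true;
	fs->cache_val[pos] = val;
}

/*
 * Find the first key at or after start by walking the btree. Positions
 * that are absent from the btree are skipped without looking at the
 * cache; for positions that exist, the cached value wins.
 */
static bool toy_iter_peek(const struct toy_fs *fs, size_t start,
			  size_t *pos, int *val)
{
	for (size_t i = start; i < TOY_NR_POS; i++) {
		if (!fs->btree_present[i])
			continue;
		*pos = i;
		*val = fs->cache_present[i] ? fs->cache_val[i]
					    : fs->btree_val[i];
		return true;
	}
	return false;
}
```

In this model the btree copy may go stale after later cache updates, which is harmless because the iterator prefers the cached value whenever one exists.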
2022-05-30  bcachefs: Allocate some extra room in btree_key_cache_fill() (Kent Overstreet)
If we allocate a buffer that's a bit bigger than necessary the transaction commit path will be much less likely to have to reallocate - which requires a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
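A minimal sketch of the idea, assuming a simple heap-allocated buffer. The `key_buf` type, the `KEY_BUF_SLACK` value and both helper functions are hypothetical and are not taken from bcachefs; the return value of `key_buf_update()` stands in for "a reallocation was needed", which in the real code path would mean a transaction restart.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slack, in bytes, added on top of what the key needs now. */
#define KEY_BUF_SLACK 32

struct key_buf {
	unsigned char *data;
	size_t len;		/* bytes in use */
	size_t cap;		/* bytes allocated */
	unsigned reallocs;	/* how often an update had to reallocate */
};

/* Initial fill: allocate a bit more than the key currently needs. */
static int key_buf_fill(struct key_buf *b, const void *src, size_t len)
{
	size_t cap = len + KEY_BUF_SLACK;
	unsigned char *p = malloc(cap);

	if (!p)
		return -1;
	memcpy(p, src, len);
	b->data = p;
	b->len = len;
	b->cap = cap;
	b->reallocs = 0;
	return 0;
}

/*
 * Later update: returns 0 if the new value fit in place, 1 if the buffer
 * had to be reallocated, -1 on allocation failure.
 */
static int key_buf_update(struct key_buf *b, const void *src, size_t len)
{
	int grew = 0;

	if (len > b->cap) {
		size_t cap = len + KEY_BUF_SLACK;
		unsigned char *p = realloc(b->data, cap);

		if (!p)
			return -1;
		b->data = p;
		b->cap = cap;
		b->reallocs++;
		grew = 1;
	}
	memcpy(b->data, src, len);
	b->len = len;
	return grew;
}
```

With a 40-byte initial key, an update to 64 bytes fits in the slack and returns 0; only an update beyond 72 bytes takes the slow path.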
2022-05-30  bcachefs: Fix a few warnings on 32 bit (Kent Overstreet)
These showed up when building for mips. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Introduce a separate journal watermark for copygc (Kent Overstreet)
Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
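The difference between a single "may get unreserved" bit and a set of watermarks can be sketched as follows. Everything here is an assumption made for illustration: the enum names, the percentage thresholds and `toy_journal_may_get_res()` are hypothetical and do not reflect the actual bcachefs journal code.

```c
#include <stdbool.h>

/* Hypothetical watermark levels, ordered from least to most privileged. */
enum toy_watermark {
	TOY_WATERMARK_any,
	TOY_WATERMARK_copygc,
	TOY_WATERMARK_reserved,
};

/*
 * Assumed thresholds: a request at a given level is only allowed while
 * more than this percentage of the journal is still free.
 */
static const unsigned toy_min_free_pct[] = {
	[TOY_WATERMARK_any]      = 25,
	[TOY_WATERMARK_copygc]   = 10,
	[TOY_WATERMARK_reserved] = 0,
};

/* May a reservation at watermark w proceed with free_pct percent free? */
static bool toy_journal_may_get_res(unsigned free_pct, enum toy_watermark w)
{
	return free_pct > toy_min_free_pct[w];
}
```

Under these assumed numbers, with 15% of the journal free, ordinary requests block while copygc still proceeds, which is the behaviour the commit message asks for.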
2022-05-30  bcachefs: Lock ordering asserts for traverse_all() (Kent Overstreet)
This adds some new assertions that we always take locks in the correct order while running under traverse_all() - we've been seeing some livelocks and a bit of strange behaviour, this helps ensure that everything is working the way we expect. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix usage of six lock's percpu mode (Kent Overstreet)
Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes. This patch fixes this by never switching 'struct btree' between the two modes, and instead segregating them between two different freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Delete redundant tracepoint (Kent Overstreet)
We were emitting two trace events on transaction restart in this code path - delete the redundant one. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Btree key cache coherency (Kent Overstreet)
Updates to non key cache iterators will now be transparently redirected to the key cache for cached btrees. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: BTREE_ITER_WITH_KEY_CACHE (Kent Overstreet)
This is the start of cache coherency with the btree key cache - this adds a btree iterator flag that causes lookups to also check the key cache when we're iterating over the btree (not iterating over the key cache). Note that we could still race with another thread creating an item in the key cache and updating it, since we aren't holding the key cache locked if it wasn't found. The next patch for the update path will address this by causing the transaction to restart if the key cache is found to be dirty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Improve btree_key_cache_flush_pos() (Kent Overstreet)
btree_key_cache_flush_pos() uses BTREE_ITER_CACHED_NOFILL - but it wasn't checking for !ck->valid. It does check for the entry being dirty, so it shouldn't matter, but this refactors it a bit and adds an assertion. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Tracepoint improvements (Kent Overstreet)
This improves the transaction restart tracepoints - adding distinct tracepoints for all the locations and reasons a transaction might have been restarted, and ensures that there's a tracepoint for every transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Log & error message improvements (Kent Overstreet)
- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace
- We don't need to print the bcachefs: or the filesystem name prefix in userspace
- Improve a few error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Switch to __func__ for recording where btree_trans was initialized (Kent Overstreet)
Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Add error messages for memory allocation failures (Kent Overstreet)
This adds some missing diagnostics from rare but annoying to debug runtime allocation failure paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix some shutdown path bugs (Kent Overstreet)
This fixes some bugs when we hit an error very early in the filesystem startup path, before most things have been initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: BTREE_ITER_FILTER_SNAPSHOTS (Kent Overstreet)
For snapshots, we need to implement btree lookups that return the first key that's an ancestor of the snapshot ID the lookup is being done in - and filter out keys in unrelated snapshots. This patch adds the btree iterator flag BTREE_ITER_FILTER_SNAPSHOTS which does that filtering. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
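The filtering rule can be sketched with a toy snapshot tree. This is an illustration under stated assumptions, not the bcachefs implementation: the parent-array representation, the `toy_` names, and the convention that a snapshot counts as its own ancestor are all hypothetical. Keys are assumed to be sorted by position, with keys at the same position ordered so that the most specific snapshot comes first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_SNAPSHOTS 8

/* parent[id] is the parent of snapshot id; 0 means "no parent". */
struct toy_snapshot_tree {
	uint32_t parent[TOY_NR_SNAPSHOTS];
};

struct toy_key {
	uint64_t pos;
	uint32_t snapshot;
	int val;
};

/* True if ancestor is id itself or lies on the path from id to the root. */
static bool toy_is_ancestor(const struct toy_snapshot_tree *t,
			    uint32_t id, uint32_t ancestor)
{
	while (id && id != ancestor)
		id = t->parent[id];
	return id != 0 && id == ancestor;
}

/*
 * Return the first key at or after pos that is visible from snapshot,
 * skipping keys that belong to unrelated snapshots. NULL if none.
 */
static const struct toy_key *
toy_peek_filtered(const struct toy_snapshot_tree *t,
		  const struct toy_key *keys, size_t nr,
		  uint64_t pos, uint32_t snapshot)
{
	for (size_t i = 0; i < nr; i++)
		if (keys[i].pos >= pos &&
		    toy_is_ancestor(t, snapshot, keys[i].snapshot))
			return &keys[i];
	return NULL;
}
```

For a tree where snapshots 2 and 3 are children of 1 and snapshot 4 is a child of 2, a lookup in snapshot 4 skips a key written in snapshot 3 and returns the one written in snapshot 2.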
2022-05-30  bcachefs: Subvolumes, snapshots (Kent Overstreet)
This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  Revert "bcachefs: Add more assertions for locking btree iterators out of order" (Kent Overstreet)
Figured out the bug we were chasing, and it had nothing to do with locking btree iterators/paths out of order. This reverts commit ff08733dd298c969aec7c7828095458f73fd5374. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Add more assertions for locking btree iterators out of order (Kent Overstreet)
btree_path_traverse_all() traverses btree iterators in sorted order, and thus shouldn't see transaction restarts due to potential deadlocks - but sometimes we do. This patch adds some more assertions and tracks some more state to help track this down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix initialization of bch_write_op.nonce (Kent Overstreet)
If an extent ends up with a replica that is encrypted and a replica that isn't encrypted (due to the user changing options), and then copygc/rebalance moves one of the replicas by reading from the unencrypted replica, we had a bug where we wouldn't correctly initialize op->nonce - for each crc field in an extent, crc.offset + crc.nonce must be equal. This patch fixes that by moving op.nonce initialization to bch2_migrate_write_init. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Kill BTREE_ITER_NEED_PEEK (Kent Overstreet)
This was used for an optimization that hasn't existed in quite a while - iter->uptodate will probably be going away as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Further reduce iter->trans usage (Kent Overstreet)
This is prep work for splitting btree_path out from btree_iter - btree_path will not have a pointer to btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Reduce iter->trans usage (Kent Overstreet)
Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Kill BTREE_INSERT_NOUNLOCK (Kent Overstreet)
With the recent transaction restart changes, it's no longer needed - all transaction commits have BTREE_INSERT_NOUNLOCK semantics. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: trans->restarted (Kent Overstreet)
Start tracking when btree transactions have been restarted - and assert that we're always calling bch2_trans_begin() immediately after transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Use bch2_trans_do() in bch2_btree_key_cache_journal_flush() (Kent Overstreet)
We're working to standardize handling of transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Don't downgrade in traverse() (Kent Overstreet)
Downgrading of btree iterators is something that should only happen explicitly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Tighten up btree_iter locking assertions (Kent Overstreet)
We weren't correctly verifying that we had interior node intent locks - this patch also fixes bugs uncovered by the new assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE (Kent Overstreet)
Add a new flag to control assertions about updating to internal snapshot nodes, that normally should not be written to - to be used in an upcoming patch. Also do some renaming - trigger_flags is now update_flags. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keys (Kent Overstreet)
We're seeing livelocks that appear to be due to bch2_btree_key_cache_scan repeatedly scanning and blocking other tasks from using the key cache lock - we probably shouldn't be reporting objects that can't actually be freed yet. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
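A minimal sketch of a shrinker-style count that reports only what can actually be freed. The struct and field names are hypothetical and only model the idea; the real function operates on the filesystem's key cache state.

```c
#include <stddef.h>

/* Hypothetical counters kept by a key cache. */
struct toy_key_cache {
	size_t nr_keys;		/* all cached keys */
	size_t nr_dirty;	/* keys that must be flushed before freeing */
};

/*
 * Report only clean keys as reclaimable, so the caller does not keep
 * rescanning a cache whose entries cannot be freed yet.
 */
static size_t toy_key_cache_count(const struct toy_key_cache *c)
{
	return c->nr_keys > c->nr_dirty ? c->nr_keys - c->nr_dirty : 0;
}
```

The clamp to zero guards against the two counters being read at slightly different times, which is an assumption about how such counters might be sampled rather than something the commit message states.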
2022-05-30  bcachefs: Fix key cache assertion (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix an out of bounds read (Kent Overstreet)
bch2_varint_decode() can read up to 7 bytes past the end of the buffer, which means we need to allocate slightly larger key cache buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
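The general pattern, a decoder that always performs a fixed-width load and therefore needs padded buffers, can be sketched as below. The encoding here is a toy format invented for this example (one length byte followed by 1 to 8 little-endian payload bytes); it is not the bcachefs varint format, and all `toy_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The decoder loads 8 bytes even for a 1-byte payload: up to 7 extra. */
#define TOY_DECODE_PAD 7

/* Allocate room for encoded_len bytes plus padding for the wide load. */
static uint8_t *toy_buf_alloc(size_t encoded_len)
{
	return calloc(1, encoded_len + TOY_DECODE_PAD);
}

/* Encode v as a length byte plus the minimal number of payload bytes. */
static size_t toy_encode(uint8_t *out, uint64_t v)
{
	unsigned n = 1;

	while (n < 8 && (v >> (8 * n)))
		n++;
	out[0] = (uint8_t)n;
	for (unsigned i = 0; i < n; i++)
		out[1 + i] = (uint8_t)(v >> (8 * i));
	return 1 + n;
}

/*
 * Decode one value. The unconditional 8-byte copy is the point of the
 * example: it touches 8 - n bytes beyond the encoded value, which is
 * only safe because toy_buf_alloc() padded the buffer.
 */
static uint64_t toy_decode(const uint8_t *in, size_t *consumed)
{
	uint8_t raw[8];
	unsigned n = in[0];
	uint64_t v = 0;

	if (n > 8)
		n = 8;	/* defensive clamp for malformed input */
	memcpy(raw, in + 1, sizeof(raw));
	for (unsigned i = 0; i < n; i++)
		v |= (uint64_t)raw[i] << (8 * i);
	*consumed = 1 + (size_t)n;
	return v;
}
```

A value such as 42 encodes to two bytes, so decoding it from an unpadded two-byte buffer would read seven bytes out of bounds; allocating through `toy_buf_alloc()` makes the same read safe.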
2022-05-30  bcachefs: Fix a deadlock on journal reclaim (Kent Overstreet)
Flushing the btree key cache needs to use allocation reserves - journal reclaim depends on flushing the btree key cache for making forward progress, and the allocator and copygc depend on journal reclaim making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Don't flush btree writes more aggressively because of btree key cache (Kent Overstreet)
We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix a startup race (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED (Kent Overstreet)
JOURNAL_RES_GET_RESERVED should only be used for updates that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Split out bpos_cmp() and bkey_cmp() (Kent Overstreet)
With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
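The distinction between the two comparisons can be sketched on a simplified position type. The struct layout and the `toy_` function names are assumptions for illustration and are not copied from the bcachefs headers; only the rule itself (one comparison includes the snapshot field, the other ignores it) comes from the commit message.

```c
#include <stdint.h>

/* Simplified position: the snapshot field is the last tie-breaker. */
struct toy_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Three-way compare that cannot overflow, unlike plain subtraction. */
static int toy_cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Filesystem-level compare: ignores the snapshot field. */
static int toy_bkey_cmp(struct toy_bpos l, struct toy_bpos r)
{
	int c = toy_cmp_u64(l.inode, r.inode);

	return c ? c : toy_cmp_u64(l.offset, r.offset);
}

/* Core btree compare: same ordering, then breaks ties on snapshot. */
static int toy_bpos_cmp(struct toy_bpos l, struct toy_bpos r)
{
	int c = toy_bkey_cmp(l, r);

	return c ? c : toy_cmp_u64(l.snapshot, r.snapshot);
}
```

Two positions that differ only in their snapshot compare as equal under `toy_bkey_cmp()` but as ordered under `toy_bpos_cmp()`.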
2022-05-30  bcachefs: btree key cache locking improvements (Kent Overstreet)
The btree key cache mutex was becoming a significant bottleneck - it was mainly used to protect the lists of dirty, clean and freed cached keys. This patch eliminates the dirty and clean lists - instead, when we need to scan for keys to drop from the cache we iterate over the rhashtable, and thus we're able to remove most uses of that lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
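The approach of scanning the hash table itself, rather than maintaining separate clean and dirty lists, can be sketched with a small chained hash table. This is a single-threaded toy under stated assumptions: the kernel's rhashtable has a different API and its own concurrency rules, and every `toy_` name below is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_BUCKETS 8

/* One cached key, chained into a hash bucket. */
struct toy_ck {
	struct toy_ck *next;
	unsigned long pos;
	bool dirty;
};

struct toy_cache {
	struct toy_ck *buckets[TOY_NR_BUCKETS];
	size_t nr_keys;
};

/* Insert a key; there is no separate clean or dirty list to maintain. */
static struct toy_ck *toy_cache_insert(struct toy_cache *c,
				       unsigned long pos, bool dirty)
{
	struct toy_ck *ck = calloc(1, sizeof(*ck));

	if (!ck)
		return NULL;
	ck->pos = pos;
	ck->dirty = dirty;
	ck->next = c->buckets[pos % TOY_NR_BUCKETS];
	c->buckets[pos % TOY_NR_BUCKETS] = ck;
	c->nr_keys++;
	return ck;
}

/*
 * Walk the table bucket by bucket and free up to nr clean entries.
 * Dirty entries are skipped because they still need to be flushed.
 * Returns the number of entries freed.
 */
static size_t toy_cache_scan(struct toy_cache *c, size_t nr)
{
	size_t freed = 0;

	for (size_t b = 0; b < TOY_NR_BUCKETS && freed < nr; b++) {
		struct toy_ck **p = &c->buckets[b];

		while (*p && freed < nr) {
			struct toy_ck *ck = *p;

			if (ck->dirty) {
				p = &ck->next;
				continue;
			}
			*p = ck->next;	/* unlink, then free */
			free(ck);
			c->nr_keys--;
			freed++;
		}
	}
	return freed;
}
```

The trade-off in this model is that a scan may visit dirty entries it cannot free, in exchange for inserts and updates that no longer touch any shared list.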
2022-05-30  bcachefs: btree_iter_set_dontneed() (Kent Overstreet)
This is a bit clearer than using bch2_btree_iter_free(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix locking in bch2_btree_iter_traverse_cached() (Kent Overstreet)
bch2_btree_iter_traverse() is supposed to ensure we have the correct type of lock - it was downgrading if necessary, but if we entered with a read lock it wasn't upgrading to an intent lock, oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much (Kent Overstreet)
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense.
- we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore.
- We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress.
This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Add some cond_rescheds() in shutdown path (Kent Overstreet)
Particularly on emergency shutdown we can end up having to clean up a lot of dirty cached btree keys here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix some spurious gcc warnings (Kent Overstreet)
These only come up when building in userspace, for some reason. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: bch2_trans_get_iter() no longer returns errors (Kent Overstreet)
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Change a BUG_ON() to a fatal error (Kent Overstreet)
In the btree key cache code, failing to flush a dirty key is a serious error, but it doesn't need to be a BUG_ON(), we can stop the filesystem instead. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Fix error in filesystem initialization (Kent Overstreet)
The rhashtable code doesn't like when we destroy an rhashtable that was never initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Move journal reclaim to a kthread (Kent Overstreet)
This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Ensure journal reclaim runs when btree key cache is too dirty (Kent Overstreet)
Ensuring the key cache isn't too dirty is critical for ensuring that the shrinker can reclaim memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-30  bcachefs: Improve btree key cache shrinker (Kent Overstreet)
The shrinker should start scanning for entries that can be freed oldest to newest - this way, we can avoid scanning a lot of entries that are too new to be freed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>