path: root/fs/bcachefs/btree_update_leaf.c
Age | Commit message | Author
2021-06-04 | bcachefs: Improve btree iterator tracepoints | Kent Overstreet
This patch adds some new tracepoints to the btree iterator code, and adds new fields to the existing tracepoints - primarily for the iterator position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-06-02 | bcachefs: Check for errors from bch2_trans_update() | Kent Overstreet
Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-20 | bcachefs: Fix for buffered writes getting -ENOSPC | Kent Overstreet
Buffered writes may have to increase their disk reservation at btree update time, due to compression and erasure coding being unpredictable: O_DIRECT writes should be checking for -ENOSPC, but buffered writes have already been accepted and should not. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Split extents if necessary in bch2_trans_update() | Kent Overstreet
Currently, we handle multiple overlapping extents in the same transaction commit by doing fixups in bch2_trans_update() - this patch extends that to split updates when necessary. The next patch that changes the reflink code to not fragment extents when making them indirect will require this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Fix bch2_extent_can_insert() call | Kent Overstreet
It was being skipped when hole punching, leading to problems when splitting compressed extents. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Add a tracepoint for when we block on journal reclaim | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Fix an out of bounds read | Kent Overstreet
bch2_varint_decode() can read up to 7 bytes past the end of the buffer, which means we need to allocate slightly larger key cache buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
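To illustrate why a decoder might read past the end of its input, here is a minimal sketch using a toy varint format. It is not the bcachefs encoding: `toy_varint_encode()`, `toy_varint_decode()` and `TOY_VARINT_PAD` are hypothetical names, and the format (low 3 bits of the first byte hold the byte count minus one, values below 2^61) is an assumption made for the example. The decoder always loads 8 bytes, so a 1-byte varint at the very end of a buffer touches 7 bytes beyond it, and the buffer has to be allocated with that much padding.

```c
#include <stdint.h>

/* The toy decoder below always loads 8 bytes, so a buffer needs 7 spare bytes. */
#define TOY_VARINT_PAD 7

/* Encode v (assumed < 2^61) into out; returns the number of bytes written (1..8). */
int toy_varint_encode(uint8_t *out, uint64_t v)
{
	int bytes = 1;

	/* n encoded bytes carry 8n - 3 value bits; the low 3 bits store n - 1. */
	while (bytes < 8 && (v >> (8 * bytes - 3)) != 0)
		bytes++;

	uint64_t raw = (v << 3) | (uint64_t)(bytes - 1);

	for (int i = 0; i < bytes; i++)
		out[i] = (uint8_t)(raw >> (8 * i));
	return bytes;
}

/* Decode one varint from in; returns the number of bytes it occupied. */
int toy_varint_decode(const uint8_t *in, uint64_t *v)
{
	uint64_t raw = 0;

	/* Unconditional 8-byte load: this is the read past the encoded value. */
	for (int i = 0; i < 8; i++)
		raw |= (uint64_t)in[i] << (8 * i);

	int bytes = (int)(raw & 7) + 1;

	/* Discard whatever was loaded beyond the encoded length. */
	if (bytes < 8)
		raw &= (UINT64_C(1) << (8 * bytes)) - 1;

	*v = raw >> 3;
	return bytes;
}
```

A caller holding `n` bytes of encoded data would allocate `n + TOY_VARINT_PAD` bytes so that the final load stays inside the allocation.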
2021-05-19 | bcachefs: Fix for btree_gc repairing interior btree ptrs | Kent Overstreet
Using the normal transaction commit path to insert and journal updates to interior nodes hadn't been done before this repair code was written, not surprising that there was a bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Always check for invalid bkeys in trans commit path | Kent Overstreet
We check for this prior to metadata being written, but we're seeing some strange bugs lately, and this will help catch those closer to where they occur. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Fix journal_reclaim_wait_done() | Kent Overstreet
Can't run arbitrary code inside a wait_event() conditional, due to task state being weird... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Don't call bch2_btree_iter_traverse() unnecessarily | Kent Overstreet
If we let bch2_trans_commit() do it, it'll traverse iterators in sorted order which means we'll get fewer lock restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Make sure to kick journal reclaim when we're waiting on it | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Kill bch2_fs_usage_scratch_get() | Kent Overstreet
This is an important cleanup, eliminating an unnecessary copy in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Fix livelock calling bch2_mark_bkey_replicas() | Kent Overstreet
The bug was that we were trying to find a replicas entry that wasn't sorted - but, we can also simplify the code by not using bch2_mark_bkey_replicas and instead ensuring the list of replicas entries exists directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED | Kent Overstreet
JOURNAL_RES_GET_RESERVED should only be used for updates that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Fix journal deadlock | Kent Overstreet
After we get a journal reservation, we need to use it - if we error out of a transaction commit, we'll be eating into space in the journal and if our transaction needs to make forward progress in order to reclaim space in the journal, we'll deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Drop trans->nounlock | Kent Overstreet
Since we're no longer doing btree node merging post commit, we can now delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Move btree node merging to before transaction commit | Kent Overstreet
Currently, BTREE_INSERT_NOUNLOCK makes it hard to ensure btree node merging happens reliably - since btree node merging happens after transaction commit, we can't drop btree locks and block when starting the btree update. This patch moves it to before transaction commit - and failure to do a merge that we wanted to do just restarts the transaction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Don't make foreground writes wait behind journal reclaim too long | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Don't use write side of mark_lock in journal write path | Kent Overstreet
The write side of percpu rwsemaphores is really expensive, and we shouldn't be taking it at all in steady state operation. Fortunately, in bch2_journal_super_entries_add_common(), we don't need to - we have a seqlock, usage_lock, for accumulating percpu usage counters to the base counters. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Free iterator in bch2_btree_delete_range_trans() | Kent Overstreet
This is specifically to speed up bch2_inode_rm(), so that we're not traversing iterators we're done with. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Start using bpos.snapshot field | Kent Overstreet
This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and always U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
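The difference between advancing a position with and without the snapshot field can be sketched as follows. This is a simplified illustration, not the kernel code: `struct pos` and both function names are hypothetical stand-ins, overflow of the top field simply wraps, and leaving the snapshot untouched in the second function is an assumption (the caller is expected to set it).

```c
#include <stdint.h>

/* Simplified stand-in for a btree position; field order is most to least significant. */
struct pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Successor when the snapshot field is part of the key: bump the lowest field and carry. */
struct pos pos_successor(struct pos p)
{
	if (++p.snapshot == 0 && ++p.offset == 0)
		++p.inode;
	return p;
}

/* Successor when snapshots are not being iterated: only the higher fields advance. */
struct pos pos_successor_nosnap(struct pos p)
{
	if (++p.offset == 0)
		++p.inode;
	/* Assumption: the caller overwrites p.snapshot with the iterator's snapshot. */
	return p;
}
```

With the snapshot at its maximum value, `pos_successor()` wraps it to 0 and carries into `offset`, which is the behaviour one would expect when the snapshot is the least significant part of the key.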
2021-05-19 | bcachefs: Split out bpos_cmp() and bkey_cmp() | Kent Overstreet
With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
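A minimal sketch of the two kinds of comparison, assuming a simplified position type. `struct pos`, `pos_cmp_full()` and `pos_cmp_nosnap()` are hypothetical names chosen for the example and are not the kernel's definitions; the first plays the role described for bpos_cmp, the second the snapshot-agnostic comparison that upper level code wants.

```c
#include <stdint.h>

/* Simplified stand-in for a btree position. */
struct pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Three-way compare of two integers: returns -1, 0 or 1. */
int cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Full comparison, snapshot included: the ordering core btree code would sort by. */
int pos_cmp_full(struct pos l, struct pos r)
{
	int c = cmp_u64(l.inode, r.inode);

	if (!c)
		c = cmp_u64(l.offset, r.offset);
	if (!c)
		c = cmp_u64(l.snapshot, r.snapshot);
	return c;
}

/* Comparison that ignores the snapshot field, for code that wants keys in
 * different snapshots to compare as equal. */
int pos_cmp_nosnap(struct pos l, struct pos r)
{
	int c = cmp_u64(l.inode, r.inode);

	if (!c)
		c = cmp_u64(l.offset, r.offset);
	return c;
}
```

Two positions that differ only in their snapshot compare as equal under `pos_cmp_nosnap()` but remain ordered under `pos_cmp_full()`.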
2021-05-19 | bcachefs: Add a mechanism for running callbacks at trans commit time | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to | Kent Overstreet
When we pass BTREE_INSERT_NOUNLOCK bch2_trans_commit isn't supposed to unlock after a successful commit, but it was calling bch2_trans_cond_resched() - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19 | bcachefs: Switch extent_handle_overwrites() to one key at a time | Kent Overstreet
Prep work for snapshots Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Fix btree iterator leak in extent_handle_overwrites() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: btree_iter_live() | Kent Overstreet
New helper to clean things up a bit - also, improve iter->flags handling. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Improve handling of extents in bch2_trans_update() | Kent Overstreet
The transaction update/commit path cares about whether it's inserting extents or regular keys; extents require extra passes (handling of overlapping extents) but sometimes we want to skip all that. This clarifies things by adding a new member to btree_insert_entry specifying whether the key being inserted is an extent, instead of overloading BTREE_ITER_IS_EXTENTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Rename BTREE_ID enums for consistency with other enums | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: KEY_TYPE_discard is no longer used | Kent Overstreet
KEY_TYPE_discard used to be used for extent whiteouts, but when handling of overlapping extents was lifted above the core btree code it became unused. This patch updates various code to reflect that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Don't call into journal reclaim when we're not supposed to | Kent Overstreet
This was causing a deadlock when btree_update_nodes_written() invokes journal reclaim because of the btree cache being too dirty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Extents may now cross btree node boundaries | Kent Overstreet
When snapshots arrive, we won't necessarily be able to arbitrarily split existing extents - when we need to split an existing extent, we'll have to check if the extent was overwritten in child snapshots and if so emit a whiteout for the split in the child snapshot. Because extents couldn't span btree nodes previously, journal replay would sometimes have to split existing extents. That's no good anymore, but fortunately since extent handling has already been lifted above most of the btree code there's no real need for that rule anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: iter->real_pos | Kent Overstreet
We need to differentiate between the search position of a btree iterator, vs. what it actually points at (what we found). This matters for extents, where iter->pos will typically be the start of the key we found and iter->real_pos will be the end of the key we found (which soon won't necessarily be in the same btree node!) and it will also matter for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Ensure btree iterators are traversed in bch2_trans_commit() | Kent Overstreet
The upcoming patch to allow extents to span btree nodes will require this... and this assertion seems to be popping, and it's not a very good assertion anyways. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Kill bch2_btree_iter_set_pos_same_leaf() | Kent Overstreet
The only reason we were keeping this around was for BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances to the next leaf node, it'll drop the lock on the node that we just inserted to. But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents btree, just the inodes btree, and if we do need it for the extents btree in the future we can do it more cleanly by cloning the iterator - this lets us delete some special cases in the btree iterator code, which is complicated enough as it is. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Ensure __bch2_trans_commit() always calls bch2_trans_reset() | Kent Overstreet
This was leading to a very strange bug in bch2_bucket_io_time_reset(), where we'd retry without clearing out the list of updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Verify transaction updates are sorted | Kent Overstreet
A user reported a bug that implies they might not be correctly sorted, this should help track that down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
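A check of this kind can be sketched as a single pass over the pending updates. The following is an illustration only: `struct update`, its sort key (btree ID, then inode, then offset) and the requirement that entries be strictly ascending are assumptions made for the example, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending update, reduced to an assumed sort key. */
struct update {
	unsigned btree_id;
	uint64_t inode;
	uint64_t offset;
};

/* Three-way compare on (btree_id, inode, offset): returns -1, 0 or 1. */
int update_cmp(const struct update *l, const struct update *r)
{
	if (l->btree_id != r->btree_id)
		return l->btree_id < r->btree_id ? -1 : 1;
	if (l->inode != r->inode)
		return l->inode < r->inode ? -1 : 1;
	if (l->offset != r->offset)
		return l->offset < r->offset ? -1 : 1;
	return 0;
}

/* Returns 1 if every entry sorts strictly after its predecessor, 0 otherwise. */
int updates_sorted(const struct update *u, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (update_cmp(&u[i - 1], &u[i]) >= 0)
			return 0;
	return 1;
}
```

In a debugging build, a commit path could assert on `updates_sorted()` before applying the list, so that a misordered or duplicated entry is caught where it is introduced rather than later.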
2021-04-27 | bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much | Kent Overstreet
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Fix btree lock being incorrectly dropped | Kent Overstreet
__btree_trans_get_iter() was using bch2_btree_iter_upgrade, but it shouldn't have been because on failure bch2_btree_iter_upgrade may drop locks in other iterators, expecting the transaction to be restarted. But __btree_trans_get_iter can't return an error to indicate that we need to restart the transaction - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Check for errors in bch2_journal_reclaim() | Kent Overstreet
If the journal is halted, journal reclaim won't necessarily be able to make any forward progress, and won't accomplish anything anyways - we should bail out so that we don't get stuck looping in reclaim when the caches are too dirty and we should be shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: bch2_trans_get_iter() no longer returns errors | Kent Overstreet
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: bch2_btree_delete_range_trans() | Kent Overstreet
This helps reduce stack usage by avoiding multiple btree_trans on the stack. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Throttle updates when btree key cache is too dirty | Kent Overstreet
This is needed to ensure we don't deadlock because journal reclaim and thus memory reclaim isn't making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Simplify transaction commit error path | Kent Overstreet
The transaction restart path traverses all iterators, we don't need to do it here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Add accounting for dirty btree nodes/keys | Kent Overstreet
This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
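The 3/4 threshold can be expressed with integer arithmetic alone. This is a sketch under assumptions: `cache_too_dirty()` is a hypothetical helper, and the real accounting and the exact comparison used by journal reclaim may differ.

```c
#include <stddef.h>

/*
 * Returns 1 when more than three quarters of the cache entries are dirty.
 * Cross-multiplying avoids division and rounding; the counts are assumed
 * small enough that multiplying by 4 does not overflow size_t.
 */
int cache_too_dirty(size_t nr_dirty, size_t nr_total)
{
	return nr_dirty * 4 > nr_total * 3;
}
```

Under this definition a cache with exactly 75 dirty entries out of 100 is still within the limit, and reclaim would only be asked to flush once the 76th entry becomes dirty.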
2021-04-27 | bcachefs: Drop typechecking from bkey_cmp_packed() | Kent Overstreet
This only did anything in two places, and those can just be replaced with bkey_cmp_left_packed(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Drop sysfs interface to debug parameters | Kent Overstreet
It's not used much anymore; the module parameter interface is better. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27 | bcachefs: Fix btree updates when mixing cached and non cached iterators | Kent Overstreet
There was a bug where bch2_trans_update() would incorrectly delete a pending update where the new update did not actually overwrite the existing update, because we were incorrectly using BTREE_ITER_TYPE when sorting pending btree updates. This affects the pending patch to use cached iterators for inode updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>