Age | Commit message (Collapse) | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
|
|
Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We're seeing occasional firings of the assertion in the key cache
shutdown code that nr_dirty == 0, which means we must sometimes be doing
transaction commits after we've gone read only.
Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if
it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree
writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the
shutdown loop: btree write completion does a transaction update, to
update the pointer in the parent node
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
When many journal replay keys have been overwritten,
bch2_journal_keys_peek() was taking excessively long to scan before it
found a key to return.
Fix this by introducing bch2_journal_keys_peek_upto() which takes a
parameter for the end of the range we want, so that we can terminate the
search much sooner, and replace all uses of bch2_journal_keys_peek()
with peek_upto() or peek_slot().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We now pass a rw argument to .key_invalid methods so they can trigger
assertions for updates but not on existing keys. We shouldn't trigger
these extra assertions in journal replay - this patch changes the
transaction commit path accordingly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
New helper, for deleting extents.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This gets us better error messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks when a key is newly created and being written, when we
wouldn't want to delete the key because of those checks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
For backpointers, we'll need to delete old backpointers before adding
new backpointers - otherwise we'll run into spurious duplicate
backpointer errors.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Add a new helper for logging messages to the journal - a new debugging
tool, an alternative to trace_printk().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Previously, we'd go into an infinite loop when attempting to cache a
bkey in the key cache larger than 128 u64s - since we were only using a
u8 for the size field, it'd get rounded up to 256 then truncated to 0.
Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.
This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
BTREE_TRIGGER_WANTS_OLD_AND_NEW didn't work correctly when the old and
new key were both alloc keys, but different versions - it required old
and new key type to be identical, and this bug is a problem for the new
allocator rewrite.
This patch fixes it by checking if the old and new key have the same
trigger functions - the different versions of alloc (and inode) keys
have the same trigger functions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We recently started stashing a copy of the key being overwritten in
btree_insert_entry: this is helpful for avoiding multiple calls to
bch2_btree_path_peek_slot() and bch2_journal_keys_peek() in the
transaction commit path.
But it turns out this has a problem - when we run mem/atomic triggers,
we've done a couple things that can invalidate the pointer to the old
key's value. This makes the optimization of stashing a pointer to the
old value questionable, but for now this patch revalidates that pointer
before running mem triggers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
As we've already reserved space in the journal this optimization doesn't
actually buy us anything, and when doing list_journal debugging it
deletes information we want.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
In BTREE_ITER_FILTER_SNAPHOTS mode, we skip over keys in unrelated
snapshots. When we hit the end of an inode, if the next inode(s) are in
a different subvolume, we could potentially have to skip past many keys
before finding a key we can return to the caller, so they can terminate
the iteration.
This adds a peek_upto() variant to solve this problem, to be used when
we know the range we're searching within.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
It wasn't used as iter_flags (excepting the unit tests, which this patch
fixes), and the next patch is going to need to pass in
BTREE_TRIGGER_NORUN.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This fixes a regression from "bcachefs: Stash a copy of key being
overwritten in btree_insert_entry". In btree_key_can_insert_cached(), we
may reallocate the key cache key, invalidating pointers previously
returned by peek() - fix it by issuing a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_btree_node_write_cond() was only used in one place - this inlines
it into __btree_node_flush() and makes the cmpxchg loop actually
correct.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This is for adding an array of strings for btree node flag names.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.
The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Triggers can generate additional btree updates - we need to run alloc
triggers after all other triggers have run, because they generate
updates for the alloc btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Previously, when doing updates and running triggers before journal
replay completes, triggers would see the incorrect key for the old key
being overwritten - this patch updates the trigger code to check the
journal keys when necessary, needed for the upcoming allocator rewrite.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We currently need to call bch2_btree_path_peek_slot() multiple times in
the transaction commit path - and some of those need to be updated to
also check the keys from journal replay, too. Let's consolidate this and
stash the key being overwritten in btree_insert_entry.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Upcoming patches are doing more work on the triggers code, this patch
just moves code around.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We're now coming up with triggers that modify the update being done. A
bkey_s_c is const - bkey_i is the correct type to be using here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This is prep work for the next patch, which is going to change
__bch2_trans_commit() to use bch2_journal_key_insert() when very early
in the recovery process, so that we have a unified interface for doing
btree updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This consolidates some of the btree node lock path, so that when we're
blocked taking a write lock on a node it shows up in
bch2_btree_trans_to_text(), along with intent and read locks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Updates to non key cache iterators will now be transparently redirected
to the key cache for cached btrees.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Change the error messages in bch2_inconsistent_error() and
bch2_fatal_error() so we can distinguish them.
Also, prefer bch2_fs_fatal_error() (which also logs an error message) to
bch2_fatal_error(), and change a call to bch2_inconsistent_error() to
bch2_fatal_error() when we can't continue.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
BTREE_INSERT_LAZY_RW shouldn't do anything after the filesystem has
finished starting up - otherwise, it might interfere with going
read-only as part of shutting down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
With BTREE_ITER_FILTER_SNAPSHOTS, we have to distinguish between the
path where the key was found, and the path for inserting into the
current snapshot. This adds a new field to struct btree_iter for saving
a path for the current snapshot, and plumbs it through
bch2_trans_update().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This breaks bch2_trans_commit_run_triggers() up into multiple functions,
and deletes a bit of duplication - prep work for triggers on alloc keys,
which will need to run last.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
It shouldn't run if the btree being checked doesn't have snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.
This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in memory bucket array.
We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_trans_commit() can legitimately return -ENOSPC with
BTREE_INSERT_NOFAIL set if BTREE_INSERT_NOWAIT was also set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the
order we have to replay keys from the journal in, and we can also start
up journal reclaim right away - and delete a bunch of code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Add a flag to indicate whether a journal replay key has been
overwritten, and set/test it with appropriate btree locks held.
This fixes a race between the allocator - invalidating buckets, and
doing btree updates - and journal replay, which before this patch could
clobber the allocator thread's update with an older version of the key
from the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Add a journal entry type for logging messages, and add an option to use
it to log the transaction name - this makes for a very handy debugging
tool, as with it we can use the 'bcachefs list_journal' command to see
not only what updates were done, but what was doing them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This adds some missing diagnostics from rare but annoying to debug
runtime allocation failure paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Will be used by the new snapshot tests, to pass in
BTREE_ITER_ALL_SNAPSHOTS.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This adds a flag to not mark the initial btree_path as preserve, for
paths that we expect to be cheap to reconstitute if necessary - this
solves a btree_path overflow caused by need_whiteout_for_snapshot().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_btree_delete_range() can split compressed extents, thus needs to
pass in a disk reservation when we're operating on extents btrees.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
With snapshots, bch2_trans_update() has to check if we need a whitout,
which can cause a transaction restart, so this is important now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|