Age | Commit message (Collapse) | Author |
|
Now that we can reliably designate and find the master subvolume out of
a tree of snapshots, we can finally make quotas work with snapshots:
That is - quotas will now _ignore_ snapshot subvolumes, and only be in
effect for the master (non snapshot) subvolume.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add two new fields to bch_subvolume:
- otime: creation time
- parent: For snapshots, this is the id of the subvolume the snapshot
was created from
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new btree which gets us a persistent per-snapshot-tree
identifier.
- BTREE_ID_snapshot_trees
- KEY_TYPE_snapshot_tree
- bch_snapshot now has a field that points to a snapshot_tree
This is going to be used to designate one snapshot ID/subvolume out of a
given tree of snapshots as the "main" subvolume, so that we can do quota
accounting in that subvolume and not the rest.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new helper for allocating a new slot in a btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's safe to call bch2_trans_update with a k/v pair where the value
hasn't been filled out, as long as the key part has been and the value
is filled out by transaction commit time.
This patch folds the bch2_trans_update() call into bch2_bkey_get_mut(),
eliminating a bit of boilerplate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's safe to call bch2_trans_update with a k/v pair where the value
hasn't been filled out, as long as the key part has been and the value
is filled out by transaction commit time.
This patch folds the bch2_trans_update() call into bch2_bkey_alloc(),
eliminating a bit of boilerplate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- bch2_bkey_get_mut() now handles types increasing in size, allocating
a buffer for the type's current size when necessary
- bch2_bkey_make_mut_typed()
- bch2_bkey_get_mut() now initializes the iterator, like
bch2_bkey_get_iter()
Also, refactor so that most of the code is in functions - now macros are
only used for wrappers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Introduce new helpers for a common pattern:
bch2_trans_iter_init();
bch2_btree_iter_peek_slot();
- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a
(typically stack allocated) variable; it handles the case where the
value in the btree is smaller than the current version of the type,
zeroing out the remainder.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new field to bkey_ops for the minimum size of the value,
which standardizes that check and also enforces the new rule (previously
done somewhat ad-hoc) that we can extend value types by adding new
fields on to the end.
To make that work we do _not_ initialize min_val_size with sizeof,
instead we initialize it to the size of the first version of those
values.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
A workqueue resource deadlock has been observed when running fsck
on a filesystem with a full/stuck journal. fsck is not currently
able to repair the fs due to fairly rapid emergency shutdown, but
rather than exit gracefully the fsck process hangs during the
shutdown sequence. Fortunately this is easily recoverable from
userspace, but the root cause involves code shared between the
kernel and userspace and so should be addressed.
The deadlock scenario involves the main task in the bch2_fs_stop()
-> bch2_fs_read_only() path waiting on write references to drain
with the fs state lock held. A bch2_read_only_work() workqueue task
is scheduled on the system_long_wq, blocked on the state lock.
Finally, various other write ref holding workqueue tasks are
scheduled to run on the same workqueue and must complete in order to
release references that the initial task is waiting on.
To avoid this problem, we can split the dependent workqueue tasks
across different workqueues. It's a bit of a waste to create a
dedicated wq for the read-only worker, but there are several tasks
throughout the fs that follow the pattern of acquiring a write
reference and then scheduling to the system wq. Use a local wq
for such tasks to break the subtle dependency between these and the
read-only worker.
Signed-off-by: Brian Foster <bfoster@redhat.com>
|
|
This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a null ptr deref when creating new snapshots:
bch2_create_trans() will lookup the subvolume and find the _new_
snapshot in the BCH_CREATE_SUBVOL path that's being created in that
transaction.
We have to call bch2_mark_snapshot() earlier so that it's properly
initialized, instead of leaving it for transaction commit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree & level are passed to trans_mark - for backpointers -
bch2_mark_key() should take them as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a bug where bch2_mark_snapshot() wasn't called for existing
snapshot nodes being updated when child nodes were added.
This led to the data update path thinking the key being updated was for
a snapshot that didn't have children, causing it to fail to insert
whiteouts when splitting existing extents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.
This will make it easier to debug shutdown hangs due to refcount leaks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This introduces some new conveniences, to help cut down on boilerplate:
- bch2_trans_kmalloc_nomemzero() - performance optimiation
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We shouldn't be overloading standard error codes now that we have
provisions for bcachefs-specific errorcodes: this patch converts super.c
and super-io.c to per error site errcodes, with a bit of cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp().
Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This provides an inlined version of bch2_subvolume_get() and uses it in
bch2_subvolume_get_snapshot(), since this is the version that's used all
over the place and in fast paths (e.g. IO paths).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fixes for various checkpatch errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We were iterating starting at BCACHEFS_ROOT_INO, but snapshots start at
POS_MIN - meaning this code was never getting run.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Olexa Bilaniuk <obilaniu@gmail.com>
|
|
This fixes an assertion when the transaction has been unexpectedly
restarted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This introduces two new macros for iterating through the btree, with
transaction restart handling
- for_each_btree_key2()
- for_each_btree_key_commit()
Every iteration is now in an implicit transaction, and - as with
lockrestart_do() and commit_do() - returning -EINTR will cause the
transaction to be restarted, at the same key.
This patch converts a bunch of code that was open coding this to these
new macros, saving a substantial amount of code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- Bunch of refactoring, and move some code out of
bch2_snapshots_start() and into bch2_snapshots_check(), for constency
with the rest of fsck
- Interior snapshot nodes no longer point to a subvolume; this is so we
don't end up with dangling subvol references when deleting or require
scanning the full snapshots btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
fsck doesn't want to run while we're cleaning up deleted snapshots - if
that work needs to be done, we want it to have finished before fsck
runs, otherwise fsck will get confused when it finds multiple keys in
the same snapshot ID equivalence class (i.e. the mechanism that
snapshot deletion uses for cleaning up redundant keys).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
snapshots_seen is becoming private to fsck, and snapshot_id_list is
actually what the data update path needs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Snapshots being deleted won't in general have a corresponding subvolume:
this fixes a spurious fsck error where we'd complain about a snapshot
pointing to a missing subvolume - but the subvolume had been deleted,
and the snapshot was pending deletion as well.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks when a key is newly created and being written, when we
wouldn't want to delete the key because of those checks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We weren't properly catching errors from snapshot_live() - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Inspired by CCAN darray - simple, stupid resizable (dynamic) arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This still needs to be expanded more, but this adds a basic test for
BTREE_ITER_FILTER_SNAPSHOTS.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
With snapshots, bch2_trans_update() has to check if we need a whitout,
which can cause a transaction restart, so this is important now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Quota support was disabled when snapshots were released, because of some
tricky interactions with snpashots. We're sidestepping that for now -
we're simply disabling quota accounting on snapshot subvolumes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This fixes building in userspace - code that's coupled to the kernel VFS
interface should live in fs.c
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Snapshot deletion needs to become a multi step process, where we unlink,
then tear down the page cache, then delete the subvolume - the deleting
flag is equivalent to an inode with i_nlink = 0.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Subvolume deletion doesn't flush & evict the btree key cache - ideally
it would, but that's tricky, so instead bch2_subvolume_create() needs to
make sure the slot doesn't exist in the key cache to avoid triggering
assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Factor out a little helper.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|