Age | Commit message | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
By default the maximum per-device journal is 8GB (and this should
probably be revised for large memory machines).
When debugging we often want as much history as possible so that we can
reconstruct a full sequence of events: enabling this option raises the
limit to 32GB, at the expense of slower recovery from unclean shutdown.
Users with mixed ssd/hdd setups may want to ensure that only the ssds
are used for journalling, via the data_allowed option (note that we
don't yet have an easy way to remove a journal from an existing device).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add an option to skip rewinding extents when rewinding the journal.
If the only extent operations that have occurred are data move
operations, then this will give a higher probability of successful
recovery - until we implement the "buffer discards by a percentage of
device size" functionality.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Case folding is often applied to subtrees and not on an entire
filesystem.
Disallowing layers from filesystems that support case folding is overly
limiting.
Replace the rule that case-folding capable filesystems are not allowed
as layers with a rule that case folded directories are not allowed in a
merged directory stack.
Should case folding be enabled on an underlying directory while
overlayfs is mounted the outcome is generally undefined.
Specifically, in ovl_lookup() we check the base underlying directory,
and if case folding is enabled on it we fail with -ESTALE and write a
warning to kmsg.
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/linux-fsdevel/20250520051600.1903319-1-kent.overstreet@linux.dev/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
REQ_FUA means "skip the drive cache", and it can be used with reads too.
If there was a checksum error, we want to retry the whole read path, not
read it from cache again.
Suggested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a sysfs attribute for checking whether read fua appears to behave
properly on a device.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds shrinker.to_text() methods for our shrinkers and hooks them up
to our existing to_text() functions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
exec_update_lock is used to check permissions; it is not needed here.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Differentiate between pointers to invalid devices and pointers to
removed devices in log messages and superblock error counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a bitmask of device slots that have been marked as removed: this
will be used in the next patch for differentiating, in error messages
and counters, between references to invalid devices and references to
removed devices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
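A minimal sketch of the idea, not the bcachefs implementation: one bit
per device slot records that the slot was removed, so a pointer to a
device that is no longer present can be reported as "removed" rather
than "invalid". All names here (MAX_DEVS, struct dev_table,
dev_mark_removed(), classify_dev_ptr()) are hypothetical.

```c
#include <assert.h>

#define MAX_DEVS 64 /* hypothetical slot count, chosen to fit one 64-bit word */

enum dev_ptr_state { DEV_PTR_OK, DEV_PTR_REMOVED, DEV_PTR_INVALID };

struct dev_table {
	unsigned long long present; /* bit set: slot holds a live device */
	unsigned long long removed; /* bit set: slot was explicitly removed */
};

/* Record that a slot was removed, so later references can be told apart. */
static void dev_mark_removed(struct dev_table *t, unsigned slot)
{
	assert(slot < MAX_DEVS);
	t->present &= ~(1ULL << slot);
	t->removed |= 1ULL << slot;
}

/* Classify a device index found in a pointer. */
static enum dev_ptr_state classify_dev_ptr(const struct dev_table *t, unsigned slot)
{
	if (slot >= MAX_DEVS)
		return DEV_PTR_INVALID;
	if (t->present & (1ULL << slot))
		return DEV_PTR_OK;
	if (t->removed & (1ULL << slot))
		return DEV_PTR_REMOVED;
	return DEV_PTR_INVALID;
}
```

A caller could then pick a different log message and error counter for
DEV_PTR_REMOVED than for DEV_PTR_INVALID.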
|
|
For regular files: reconstruct if more than three extents are found.
For directories: reconstruct if a single dirent is found.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
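The two thresholds above can be sketched as a single predicate. This is
an illustration only: the names (enum inode_kind,
should_reconstruct_inode()) are hypothetical, and "a single dirent" is
read here as "at least one".

```c
#include <stdbool.h>

enum inode_kind { INODE_REGULAR, INODE_DIRECTORY };

/*
 * nr_found is the number of extents (regular files) or dirents
 * (directories) found that reference the missing inode number.
 */
static bool should_reconstruct_inode(enum inode_kind kind, unsigned nr_found)
{
	switch (kind) {
	case INODE_REGULAR:
		return nr_found > 3;  /* more than three extents */
	case INODE_DIRECTORY:
		return nr_found >= 1; /* one dirent is enough (assumed reading) */
	}
	return false;
}
```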
|
|
If we find a dirent pointing to a missing inode, check for
dirents/extents associated with that inode number: if present,
reconstruct the inode instead of deleting the dirent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
On filesystems that predate persistent cursors, inode_generation keys
will be present (no longer needed because the cursor tracks the current
generation).
But the new inode allocation code skipped allocating slots with
inode_generation keys, which led to a lot of unnecessary scanning. This
patch fixes that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Replace bio_add_folio() with bio_add_folio_nofail() where the add
cannot fail.
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
For addresses in the kernel direct mapping, there is no need to add
pages to the bio one at a time; bio_add_virt_nofail() can be used
directly.
For addresses in the vmalloc region, the physical pages are not
contiguous and must be added to the bio page by page. The helper
function bio_add_vmalloc() can be used to simplify the implementation of
bch2_bio_map().
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506261026.ZxtJ7yeV-lkp@intel.com/
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a performance bug when doing many unlinks.
The btree has optimizations to ensure we don't have too many whiteouts
to scan in peek() before we find a real key to return, but unflushed key
cache deletions break this.
To fix this, tweak the existing code for redirecting updates that create
a key to the underlying btree so that we can use it for deletions as
well.
Reported-by: John Schoenick <johns@valvesoftware.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We require that if a key exists in the key cache it also be present in
the underlying btree, for cache coherency reasons.
So checking the key cache on whiteout is unnecessary. This is part of
fixing a major performance bug when doing many unlinks all in a row -
we end up scanning through a ton of key cache whiteouts before peek()
can return a real key.
Reported-by: John Schoenick <johns@valvesoftware.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Don't delete dirents or extents if it's the wrong type of inode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
buf->u64s is all we used.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new fsck flag, FSCK_ERR_SILENT, to suppress logging the error in
dmesg.
Use this for allocator async repair.
Also, make sure that we _do_ still log silent error correction in the
journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The 'ip' parameter to bch2_trans_kmalloc() is used for bump allocator
tracing: when we exceed the bump allocator limit, it dumps a list of
allocations and what function did them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert standard errcodes to private error codes, and return them with
bch_err_throw(), for better debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
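A rough sketch of why private error codes help debugging: each private
code names the specific failure site, yet can still be mapped back to a
standard errno class before it leaves the filesystem. The enum values,
table and err_class() below are hypothetical and are not the bcachefs
definitions.

```c
#include <assert.h>
#include <errno.h>

enum {
	PRIV_ERR_START = 2048, /* hypothetical: above the standard errno range */
	PRIV_ERR_ENOMEM_journal_buf = PRIV_ERR_START,
	PRIV_ERR_EIO_read_retry,
	PRIV_ERR_MAX,
};

/* Parent (standard) errno for each private code, indexed from PRIV_ERR_START. */
static const int priv_err_parent[] = {
	[PRIV_ERR_ENOMEM_journal_buf - PRIV_ERR_START] = ENOMEM,
	[PRIV_ERR_EIO_read_retry - PRIV_ERR_START]     = EIO,
};

/*
 * Map a negative private code to its negative standard errno;
 * standard codes and 0 pass through unchanged.
 */
static int err_class(int err)
{
	assert(err <= 0);

	int e = -err;

	if (e >= PRIV_ERR_START && e < PRIV_ERR_MAX)
		return -priv_err_parent[e - PRIV_ERR_START];
	return err;
}
```

Internally the code returns -PRIV_ERR_ENOMEM_journal_buf, which tells a
developer exactly where the allocation failed; callers outside the
filesystem only ever see -ENOMEM.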
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix the CONFIG_UNICODE checks - IS_ENABLED() is required when unicode is
built as a module.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
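The difference can be shown in userspace. With CONFIG_UNICODE=m, Kconfig
defines CONFIG_UNICODE_MODULE rather than CONFIG_UNICODE, so a plain
#ifdef CONFIG_UNICODE is false while IS_ENABLED(CONFIG_UNICODE) is 1.
The macros below are modeled on the helpers in include/linux/kconfig.h,
with the internal names changed (kc_ prefix) to avoid reserved
identifiers; the two test functions are hypothetical.

```c
/* Simulate a CONFIG_UNICODE=m build: only the _MODULE symbol is defined. */
#define CONFIG_UNICODE_MODULE 1

/* If the option expands to 1, the placeholder inserts an extra argument. */
#define KC_ARG_PLACEHOLDER_1 0,
#define kc_take_second_arg(ignored, val, ...) val

#define kc_is_defined(x)      kc_is_defined_(x)
#define kc_is_defined_(val)   kc_is_defined__(KC_ARG_PLACEHOLDER_##val)
#define kc_is_defined__(junk) kc_take_second_arg(junk 1, 0)

#define kc_or(x, y)      kc_or_(x, y)
#define kc_or_(x, y)     kc_or__(KC_ARG_PLACEHOLDER_##x, y)
#define kc_or__(junk, y) kc_take_second_arg(junk 1, y)

#define IS_BUILTIN(option) kc_is_defined(option)
#define IS_MODULE(option)  kc_is_defined(option##_MODULE)
#define IS_ENABLED(option) kc_or(IS_BUILTIN(option), IS_MODULE(option))

/* What a plain #ifdef sees in a =m build. */
static int unicode_seen_by_ifdef(void)
{
#ifdef CONFIG_UNICODE
	return 1;
#else
	return 0;
#endif
}

/* What IS_ENABLED() sees in the same build. */
static int unicode_seen_by_is_enabled(void)
{
	return IS_ENABLED(CONFIG_UNICODE);
}
```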
|
|
Kill some temporaries on the stack that were completely unnecessary.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add an option for completely disabling casefolding on a filesystem, as a
workaround for overlayfs.
This should only be needed as a temporary workaround, until the
overlayfs fix arrives.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+cc7567f096079cb4146f@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Checking for invalid IDs was introduced in 9e7cfb35e266 ("bcachefs: Check for invalid btree IDs")
to prevent an invalid shift later, but since 141526548052 ("bcachefs: Bad btree roots are now autofix")
which made btree_root_bkey_invalid autofix, the fsck_err_on call didn't
do anything.
We can mark this err type (invalid_btree_id) autofix as well, so it gets
handled.
Reported-by: syzbot+029d1989099aa5ae3e89@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=029d1989099aa5ae3e89
Fixes: 141526548052 ("bcachefs: Bad btree roots are now autofix")
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a 6.16 regression from the recovery pass rework, which introduced a
bug where calling bch2_run_explicit_recovery_pass() would only return
the error code to rewind recovery for the first call that scheduled that
recovery pass.
If the error code from the first call was swallowed (because it was
called by an asynchronous codepath), subsequent calls would go "ok, this
pass is already marked as needing to run" and return 0.
Fixing this ensures that check_topology bails out to run btree_node_scan
before doing any repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
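A simplified model of the bug and the fix, under stated assumptions: the
struct, the error value and the rule "rewind if the requested pass is
earlier than the current one" are all hypothetical stand-ins, not the
real bch2_run_explicit_recovery_pass() logic.

```c
#include <assert.h>

#define NR_PASSES 8
#define ERR_RESTART_RECOVERY (-1000) /* hypothetical private error code */

struct recovery_state {
	unsigned long scheduled; /* one bit per pass that still needs to run */
	unsigned cur_pass;       /* pass currently being run */
};

/* Buggy: only the call that first schedules a pass reports the rewind. */
static int run_explicit_pass_buggy(struct recovery_state *r, unsigned pass)
{
	assert(pass < NR_PASSES);

	if (r->scheduled & (1UL << pass))
		return 0; /* "already marked as needing to run" */

	r->scheduled |= 1UL << pass;
	return pass < r->cur_pass ? ERR_RESTART_RECOVERY : 0;
}

/* Fixed: every caller is told when recovery has to rewind. */
static int run_explicit_pass_fixed(struct recovery_state *r, unsigned pass)
{
	assert(pass < NR_PASSES);

	r->scheduled |= 1UL << pass;
	return pass < r->cur_pass ? ERR_RESTART_RECOVERY : 0;
}
```

In the buggy version, if an asynchronous caller drops the first return
value, a later synchronous caller gets 0 and carries on with repair
instead of bailing out.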
|
|
Previously, calling bch2_btree_has_scanned_nodes() when btree node
scan hadn't actually run would erroneously return false - causing us to
think a btree was entirely gone.
This fixes a 6.16 regression from moving the scheduling of btree node
scan out of bch2_btree_lost_data() (fixing the bug where we'd schedule
it persistently in the superblock) and only scheduling it when
check_topology() is asking for scanned btree nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Autofix is specified in btree_gc.c if it's not an important btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
wait_on_allocator() emits debug info when we hang trying to allocate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Before calling bch2_indirect_extent_missing_error(), we have to
calculate the missing range, which is the intersection of the reflink
pointer and the non-indirect-extent we found.
The calculation didn't take into account that the returned extent may
span the iter position, leading to an infinite loop when we
(unnecessarily) resized the extent we were returning to one that didn't
extend past the offset we were looking up.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Make sure we return a standard error code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|