summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-02bcachefs: new subvol ioctlsbcachefs_subvol_ioctlsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-02bcachefs: bch2_inum_to_path()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-02statx: stx_volKent Overstreet
Add a new statx field for (sub)volume identifiers. This includes bcachefs support; we'll definitely want btrfs support as well. Link: https://lore.kernel.org/linux-fsdevel/2uvhm6gweyl7iyyp2xpfryvcu2g3padagaeqcbiavjyiis6prl@yjm725bizncq/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com>
2024-03-02bcachefs: copy_(to|from)_user_errcode()Kent Overstreet
we've got some helpers that return errors sanely, move them to a more common location for use in fs-ioctl.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-01bcachefs: Split out bkey_types.hKent Overstreet
We're going to need bkey_types.h in bcachefs_ioctl.h in a future patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-01bcachefs: fix lost journal buf wakeup due to improved pipeliningBrian Foster
The journal_write_done() handler was reworked into a loop in commit 746a33c96b7a ("bcachefs: better journal pipelining"). As part of this, the journal buffer wake was factored into a post-loop branch that executes if at least one journal buffer has completed. The journal buffer processing loop iterates on the journal buffer pointer, however. This means that w refers to the last buffer processed by the loop, which may or may not be done. This also means that if multiple buffers are processed by the loop, only the last is awoken. This lost wakeup behavior has lead to stalling problems in various CI and fstests, such as generic/703. Lift the wake into the loop so each done buffer sees a wake call as it is processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-01bcachefs: intercept mountoption value for bool typeHongbo Li
For mount option with bool type, the value must be 0 or 1 (See bch2_opt_parse). But this seems does not well intercepted cause for other value(like 2...), it returns the unexpect return code with error message printed. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-29bcachefs: avoid returning private error code in bch2_xattr_bcachefs_setHongbo Li
Avoid the private error code return to caller. The error code should be transformed into genernal error code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-29bcachefs: Buffered write path now can avoid the inode lockKent Overstreet
Non append, non extending buffered writes can now avoid taking the inode lock. To ensure atomicity of writes w.r.t. other writes, we lock every folio that we'll be writing to, and if this fails we fall back to taking the inode lock. Extensive comments are provided as to corner cases. Link: https://lore.kernel.org/linux-fsdevel/Zdkxfspq3urnrM6I@bombadil.infradead.org/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-29fs: file_remove_privs_flags()Kent Overstreet
Rename and export __file_remove_privs(); for a buffered write path that doesn't take the inode lock we need to be able to check if the operation needs to do work first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org>
2024-02-28bcachefs: Fix bch2_journal_noflush_seq()Kent Overstreet
Improved journal pipelining broke journal_noflush_seq(); it implicitly assumed only the oldest outstanding journal buf could be in flight, but that's no longer true. Make this more straightforward by just setting buf->must_flush whenever we know a journal buf is going to be flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-28bcachefs: fix the error code when mounting with incorrect options.Hongbo Li
When mount with incorrect options such as: "mount -t bcachefs -o errors=back /dev/loop1 /mnt/bcachefs/". It rebacks the error "mount: /mnt/bcachefs: permission denied." cause bch2_parse_mount_opts returns -1 and bch2_mount throws it up. This is unreasonable. The real error message should be like this: "mount: /mnt/bcachefs: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error." Adding three private error codes for mounting error. Here are: - BCH_ERR_mount_option as the parent class for option error. - BCH_ERR_option_name represents the invalid option name. - BCH_ERR_option_value represents the invalid option value. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: split out ignore_blacklisted, ignore_not_dirtyKent Overstreet
prep work for replaying the journal backwards Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: fix check_inode_deleted_list()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: improve move_gap()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: journal_keys now uses darray helpersKent Overstreet
nice bit of code cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Rename journal_keys.d -> journal_keys.dataKent Overstreet
This will let us use some darray helpers in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: jset_entry for loops declare loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Errcode tracepoint, documentationKent Overstreet
Add a tracepoint for downcasting private errors to standard errors, so they can be recovered even when not logged; also, add some documentation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: remove redundant assignment to variable retColin Ian King
Variable ret is being assigned a value that is never read, it is being re-assigned a couple of statements later on. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/bcachefs/super-io.c:806:2: warning: Value stored to 'ret' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Silence gcc warnings about arm arch ABI driftCalvin Owens
32-bit arm builds emit a lot of spam like this: fs/bcachefs/backpointers.c: In function ‘extent_matches_bp’: fs/bcachefs/backpointers.c:15:13: note: parameter passing for argument of type ‘struct bch_backpointer’ changed in GCC 9.1 Apply the change from commit ebcc5928c5d9 ("arm64: Silence gcc warnings about arch ABI drift") to fs/bcachefs/ to silence them. Signed-off-by: Calvin Owens <jcalvinowens@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27thread_with_file: add f_ops.flushKent Overstreet
Add a flush op, to return the exit code via close(). Also update bcachefs usage to use this to return fsck exit codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Add journal.blocked to journal_debug_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Fix journal_buf bitfield accessesKent Overstreet
All jounal_buf bitfield updates must happen under the journal lock - perhaps we should just switch these to atomic bit flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Split out discard fastpathKent Overstreet
Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal. Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache eans those writes only cost bus bandwidth. So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: improve bch2_journal_buf_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Drop redundant btree_path_downgrade()sKent Overstreet
If a path doesn't have any active references, we shouldn't downgrade it; it'll either be reused, possibly with intent refs again, or dropped at bch2_trans_begin() time. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: rebalance_status now shows correct unitsDaniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: more informative write path error messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27MAINTAINERS: repair file entries in THREAD WITH FILELukas Bulwahn
Commit ead021e0fe5b ("thread_with_file: Lift from bcachefs") adds the section THREAD WITH FILE with file entries to the relevant header files thread_with_file.h and thread_with_file_types.h in include/linux/. The commit however unintentionally refers to files with a .c extension, but the header files are of course with .h extension. Fortunately, the script './scripts/get_maintainer.pl --self-test=patterns' notices that. Adjust the file entries to use the right extension. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: check_path() now only needs to walk up to subvolume rootKent Overstreet
Now that checking subvolume structure is a separate pass, the main check_directory_connectivity() pass only needs to walk up to a given inode's subvolume root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: bch2_check_subvolume_structure()Kent Overstreet
Now that we've got bch_subvolume.fs_path_parent, it's easy to write subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: omit alignment attribute on big endian struct bkeyThomas Bertschinger
This is needed for building Rust bindings on big endian architectures like s390x. Currently this is only done in userspace, but it might happen in-kernel in the future. When creating a Rust binding for struct bkey, the "packed" attribute is needed to get a type with the correct member offsets in the big endian case. However, rustc does not allow types to have both a "packed" and "align" attribute. Thus, in order to get a Rust type compatible with the C type, we must omit the "aligned" attribute in C. This does not affect the struct's size or member offsets, only its toplevel alignment, which should be an acceptable impact. The little endian version can have the "align" attribute because the "packed" attr is redundant, and rust-bindgen will omit the "packed" attr when an "align" attr is present and it can do so without changing a type's layout Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: bch2_trigger_alloc() handles state changes betterKent Overstreet
bch2_trigger_alloc() kicks off certain tasks on bucket state changes; e.g. triggering the bucket discard worker and the invalidate worker. We've observed the discard worker running too often - most runs it doesn't do any work, according to the tracepoint - so clearly, we're kicking it off too often. This adds an explicit statechange() macro to make these checks more precise. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: skip invisible entries in empty subvolume checkingGuoyu Ou
When we are checking whether a subvolume is empty in the specified snapshot, entries that do not belong to this subvolume should be skipped. This fixes the following case: $ bcachefs subvolume create ./sub $ cd sub $ bcachefs subvolume create ./sub2 $ bcachefs subvolume snapshot . ./snap $ ls -a snap . .. $ rmdir snap rmdir: failed to remove 'snap': Directory not empty As Kent suggested, we pass 0 in may_delete_deleted_inode() to ignore subvols in the subvol we are checking, because inode.bi_subvol is only set on subvolume roots, and we can't go through every inode in the subvolume and change bi_subvol when taking a snapshot. It makes the check less strict, but that's ok, the rest of fsck will still catch it. Signed-off-by: Guoyu Ou <benogy@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27thread_with_file: Fix missing va_end()Kent Overstreet
Fixes: https://lore.kernel.org/linux-bcachefs/202402131603.E953E2CF@keescook/T/#u Reported-by: coverity scan Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27thread_with_file: allow ioctls against these filesDarrick J. Wong
Make it so that a thread_with_stdio user can handle ioctls against the file descriptor. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27thread_with_file: create ops structure for thread_with_stdioDarrick J. Wong
Create an ops structure so we can add more file-based functionality in the next few patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27thread_with_file: fix various printf problemsDarrick J. Wong
Experimentally fix some problems with stdio_redirect_vprintf by creating a MOO variant with which we can experiment. We can't do a GFP_KERNEL allocation while holding the spinlock, and I don't like how the printf function can silently truncate the output if memory allocation fails. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27thread_with_file: allow creation of readonly filesDarrick J. Wong
Create a new run_thread_with_stdout function that opens a file in O_RDONLY mode so that the kernel can write things to userspace but userspace cannot write to the kernel. This will be used to convey xfs health event information to userspace. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: bch2_print_opts()Kent Overstreet
Make sure early error messages get redirected, for kernel-fsck-from-userland. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Improve error messages in device remove pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Use kvzalloc() when dynamically allocating btree pathsKent Overstreet
THis silences a mm/page_alloc.c warning about allocating more than a page with GFP_NOFAIL - and there's no reason for this to not have a vmalloc fallback anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Save key_cache_path in peek_slot()Kent Overstreet
When bch2_btree_iter_peek_slot() clones the iterator to search for the next key, and then discovers that the key from the cloned iterator is the key we want to return - we also want to save the iter->key_cache_path as well, for the update path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Pin btree cache in ram for random access in fsckKent Overstreet
Various phases of fsck involve checking references from one btree to another: this means doing a sequential scan of one btree, and then mostly random access into the second. This is particularly painful for checking extents <-> backpointers; we can prefetch btree node access on the sequential scan, but not on the random access portion, and this is particularly painful on spinning rust, where we'd like to keep the pipeline fairly full of btree node reads so that the elevator can reduce seeking. This patch implements prefetching and pinning of the portion of the btree that we'll be doing random access to. We already calculate how much of the random access btree will fit in memory so it's a fairly straightforward change. This will put more pressure on system memory usage, so we introduce a new option, fsck_memory_usage_percent, which is the percentage of total system ram that fsck is allowed to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: Check for subvolume children when deleting subvolumesKent Overstreet
Recursively destroying subvolumes isn't allowed yet. Fixes: https://github.com/koverstreet/bcachefs/issues/634 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: BTREE_ID_subvolume_childrenKent Overstreet
Add a btree to record a parent -> child subvolume relationships, according to the filesystem heirarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: bch_subvolume::fs_path_parentKent Overstreet
Record the filesystem path heirarchy for subvolumes in bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-27bcachefs: bch2_btree_bit_mod()Kent Overstreet
Provide a non-write buffer version of bch2_btree_bit_mod_buffered(), for the subvolume children btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>