summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-04-06bcachefs: Improve opts.degradedbcachefs-testing-rebasedKent Overstreet
Kill 'opts.very_degraded', and make 'opts.degraded' a persistent option, stored in the superblock. It's now an enum, with available choices ask/yes/very/no. "ask" mode will be handled by the mount helper, for prompting the user (on a machine used interactively) for whether to do a degraded mount. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Fix duplicate "ro,read_only" in opts at startupKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: split error messages of invalid compression into two linesIntegral
When an invalid compression type or level is passed as an argument to `--compression`, two error messages are squashed into one line: > bcachefs format --compression=lzo bcachefs-comp.img invalid option: invalid compression typecompression: parse error > bcachefs format --compression=lz4:16 bcachefs-comp.img invalid option: invalid compression levelcompression: parse error To resolve this issue, add a newline character at the end of the first error message to separate them into two lines. Signed-off-by: Integral <integral@archlinuxcn.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: early return for negative values when parsing BCH_OPT_UINTIntegral
Currently, when passing a negative integer as argument, the error message is "too big" due to casting to an unsigned integer: > bcachefs format --block_size=-1 bcachefs.img invalid option: block_size: too big (max 65536) When negative value in argument detected, return early before calling bch2_opt_validate(). A new error code `BCH_ERR_option_negative` is added. Signed-off-by: Integral <integral@archlinuxcn.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Add a feature bit for indicating fs has no alloc info WIPKent Overstreet
If a filesystem is going to only be used read-only, and will be a deployable image, we can strip out alloc info for a substantial reduction in metadata size - around half, due to backpointers. Alloc info will be regenerated on first read-write mount. Remounting RW is disallowed for now, since we don't yet have check_allocations running in RW mode. XXX make scrub work (no repair) XXX look at mount wall clock time/profile XXX we also really don't want to be pinning btree roots in memory anymore, since a lot of them aren't used Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Use sort_nonatomic() instead of sort()Kent Overstreet
Fixes "task out to lunch" warnings during recovery on large machines with lots of dirty data in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Read retries are after checksum errors now REQ_FUAKent Overstreet
REQ_FUA means "skip the drive cache", and it can be used with reads to. If there was a checksum error, we want to retry the whole read path, not read it from cache again. Suggested-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: read_fua_testKent Overstreet
Add a sysfs attribute for checking whether read fua appears to behave properly on a device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: RO mounts now use less memoryKent Overstreet
Defer memory allocations only needed in RW mode until we actually go RW. This is part of improved support for RO images. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Move various init code to _init_early()Kent Overstreet
_init_early() is for initialization that cannot fail, and often must happen for teardown partway through initialization to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alphabetize init function callsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: simplify journal pin initializationKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: btree_io_complete_wq -> btree_write_complete_wqKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Single device modeKent Overstreet
This provides a new option for single device mode - which allows multiple filesystems with the same UUID to be mounted simultaneously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Initialize c->name earlier on single dev filesystemsKent Overstreet
On single device filesystems, c->name contains the block device name, not the UUID. Initialize this earlier, so that single device mode can use it for initializing sysfs/debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: bch2_kvmalloc() mem alloc profilingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: add missing includeKent Overstreet
Hygeine, and fix build in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: bch2_snapshot_table_make_room()Kent Overstreet
Add a better helper for check_snapshot_exists(). create_snapids() can't be changed to use this, unfortunately, because the transaction that creates new snapshot will also be inserting other keys (e.g. root inode) that reference that snapshot ID, and they expect the snapshot table to already be updated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: darray: provide typedefs for primitive typesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: reduce new_stripe_alloc_buckets() stack usageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alloc_request no longer on stackKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alloc_request.ptrs2Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alloc_request.caKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alloc_request.countersKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alloc_request.usageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alloc_request: deallocate_extra_replicas()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: new_stripe_alloc_buckets() takes alloc_requestKent Overstreet
More stack usage improvements: instead of creating a new alloc_request (currently on the stack), save/restore just the fields we need to reuse. This is a bit tricky, because we're doing a normal alloc_foreground.c allocation, which calls into ec.c to get a stripe, which then does more normal allocations - some of the fields get reused, and used differently. So we have to save and restore them - but the stack usage improvements will be well worth it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: bch2_ec_stripe_head_get() takes alloc_requestKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: bch2_bucket_alloc_trans() takes alloc_requestKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: alloc_request.data_typeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: struct alloc_requestKent Overstreet
Add a struct for common state for satisfying an on disk allocation, instead of passing the same long list of items to every function. This will help with stack usage, performance, and perhaps enable some code cleanups. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: trace bch2_trans_kmalloc()Kent Overstreet
We're occasionally seeing the WARN_ON() for bump allocator usage exceeding BTREE_TRANS_MEM_MAX; add some tracing so we can see what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: replace memcpy with memcpy_and_pad for jset_entry_log->d buffRoxana Nicolescu
This was achieved before by zero-ing out the source buffer and then copying the bytes into the destination buffer. This can also be done with memcpy_and_pad which will zero out only the destination buffer if its size is bigger than the size of the source buffer. This is already used in the same way in journal_transaction_name(). Moreover, zero-ing the source buffer was done twice, first in __bch2_fs_log_msg() and then in bch2_trans_log_msg(). And this method may also require allocating some extra memory for the source buffer. In conclusion, using memcpy_and_pad is better even tough the result is the same because it brings uniformity with what's already used in journal_transaction_name, it avoids code duplication and reallocating extra memory. Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: replace strncpy() with memcpy_and_pad in journal_transaction_nameRoxana Nicolescu
Strncpy is now deprecated. The buffer destination is not required to be NULL-terminated, but we also want to zero out the rest of the buffer as it is already done in other places. Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Rebalance now skips poisoned extentsKent Overstreet
Let's not move poisoned extents unnecessarily, since we can't guard against introducing more bitrot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Data move can read from poisoned extentsKent Overstreet
Now, if an extent is poisoned we can move it even if there was a checksum error. We'll have to give it a new checksum, but the poison bit means that userspace will still see the appropriate error when they try to read it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Poison extents that can't be read due to checksum errorsKent Overstreet
Copygc needs to be able to move extents that have bitrotted. We don't want to delete them - in the future we'll have an API for "read me the data even if there's checksum errors", and in general we don't want to delete anything unless the user asks us to. That will require writing it with a new checksum, which means we can't forget that there was a checksum error so we return the correct error to userspace. Rebalance also wants to skip bad extents; we can now use the poison flag for that. This is currently disabled by default, as we want read fua support so that we can distinguish between transient and permanent errors from the device. It may be enabled with the module parameter: poison_extents_on_checksum_error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Be precise about bch_io_failuresKent Overstreet
If the extent we're reading from changes, due to be being overwritten or moved (possibly partially) - we need to reset bch_io_failures so that we don't accidentally mark a new extent as poisoned prematurely. This means we have to separately track (in the retry path) the extent we previously read from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: bch2_subvolume_wait_for_pagecache_and_delete() cleanupKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Fix UAF in bchfs_read()Kent Overstreet
Commit 3ba0240a8789 fixed a bug in the read retry path in __bch2_read(), and changed bchfs_read() to match - to avoid a landmine if bch2_read_extent() ever starts returning transaction restarts. But that was incorrect, because bchfs_read() doesn't use a separate stack allocated bvec_iter, it uses the one in the rbio being submitted. Add a comment explaining the issue, and revert the buggy change. Fixes: 3ba0240a8789 ("bcachefs: Fix silent short reads in data read retry path") Reported-by: syzbot+2deb10b8dc9aae6fab67@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Use cpu_to_le16 for dirent lengthsGabriel Shahrouzi
Prevent incorrect byte ordering for big-endian systems. Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Fix type for parameter in journal_advance_devs_to_next_bucketGabriel Shahrouzi
Replace u64 with __le64 to match the expected parameter type. Ensure consistency both in function calls and within the function itself. Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06bcachefs: Fix escape sequence in prt_printfGabriel Shahrouzi
Remove backslash before format specifier. Ensure correct output. Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-06Merge tag 'sched-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a nonsensical Kconfig combination - Remove an unnecessary rseq-notification * tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Eliminate useless task_work on execve sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-04Merge tag '6.15-rc-part2-smb3-client-fixes' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - reconnect fixes: three for updating rsize/wsize and an SMB1 reconnect fix - RFC1001 fixes: fixing connections to nonstandard ports, and negprot retries - fix mfsymlinks to old servers - make mapping of open flags for SMB1 more accurate - permission fixes: adding retry on open for write, and one for stat to workaround unexpected access denied - add two new xattrs, one for retrieving SACL and one for retrieving owner (without having to retrieve the whole ACL) - fix mount parm validation for echo_interval - minor cleanup (including removing now unneeded cifs_truncate_page) * tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal version number cifs: Implement is_network_name_deleted for SMB1 cifs: Remove cifs_truncate_page() as it should be superfluous cifs: Do not add FILE_READ_ATTRIBUTES when using GENERIC_READ/EXECUTE/ALL cifs: Improve SMB2+ stat() to work also without FILE_READ_ATTRIBUTES cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES cifs: Fix querying and creating MF symlinks over SMB1 cifs: Fix access_flags_to_smbopen_mode cifs: Fix negotiate retry functionality cifs: Improve handling of NetBIOS packets cifs: Allow to disable or force initialization of NetBIOS session cifs: Add a new xattr system.smb3_ntsd_owner for getting or setting owner cifs: Add a new xattr system.smb3_ntsd_sacl for getting or setting SACLs smb: client: Update IO sizes after reconnection smb: client: Store original IO parameters and prevent zero IO sizes smb:client: smb: client: Add reverse mapping from tcon to superblocks cifs: remove unreachable code in cifs_get_tcp_session() cifs: fix integer overflow in match_server()
2025-04-03Merge tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Four ksmbd SMB3 server fixes, all also for stable" * tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix null pointer dereference in alloc_preauth_hash() ksmbd: validate zero num_subauth before sub_auth is accessed ksmbd: fix overflow in dacloffset bounds check ksmbd: fix session use-after-free in multichannel connection
2025-04-03fs: actually hold the namespace semaphoreChristian Brauner
Don't use a scoped guard that only protects the next statement. Use a regular guard to make sure that the namespace semaphore is held across the whole function. Signed-off-by: Christian Brauner <brauner@kernel.org> Reported-by: Leon Romanovsky <leon@kernel.org> Link: https://lore.kernel.org/all/20250401170715.GA112019@unreal/ Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-03Merge tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull more bcachefs updates from Kent Overstreet: "More notable fixes: - Fix for striping behaviour on tiering filesystems where replicas exceeds durability on destination target - Fix a race in device removal where deleting alloc info races with the discard worker - Some small stack usage improvements: this is just enough for KMSAN builds to not blow the stack, more is queued up for 6.16" * tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefs: bcachefs: Fix "journal stuck" during recovery bcachefs: backpointer_get_key: check for null from peek_slot() bcachefs: Fix null ptr deref in invalidate_one_bucket() bcachefs: Fix check_snapshot_exists() restart handling bcachefs: use nonblocking variant of print_string_as_lines in error path bcachefs: Fix scheduling while atomic from logging changes bcachefs: Add error handling for zlib_deflateInit2() bcachefs: add missing selection of XARRAY_MULTI bcachefs: bch_dev_usage_full bcachefs: Kill btree_iter.trans bcachefs: do_trace_key_cache_fill() bcachefs: Split up bch_dev.io_ref bcachefs: fix ref leak in btree_node_read_all_replicas bcachefs: Fix null ptr deref in bch2_write_endio() bcachefs: Fix field spanning write warning bcachefs: Fix striping behaviour
2025-04-03Merge tag '9p-for-6.15-rc1' of https://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: - fix handling of bogus (negative/too long) replies - fix crash on mkdir with ACLs (... looks like nobody is using ACLs with semi-recent kernels...) - ipv6 support for trans=tcp - minor concurrency fix to make syzbot happy - minor cleanup * tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux: docs: fs/9p: Add missing "not" in cache documentation 9p: Use hashtable.h for hash_errmap Documentation/fs/9p: fix broken link 9p/trans_fd: mark concurrent read and writes to p9_conn->err 9p/net: return error on bogus (longer than requested) replies 9p/net: fix improper handling of bogus negative read/write replies fs/9p: fix NULL pointer dereference on mkdir net/9p/fd: support ipv6 for trans=tcp