summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-09-08bcachefs: six locks: Guard against wakee exiting in __six_lock_wakeup()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-09-08bcachefs: Don't open code closure_nr_remaining()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-09-08bcachefs: Fix lifetime in bch2_write_done(), add assertionKent Overstreet
We're hunting for an open_bucket leak, add an assertion to help track it down: also, we can't use the bch_fs after dropping our write ref to it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-09-08bcachefs: Add a comment for should_drop_open_bucket()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-09-08bcachefs: six locks: Fix missing barrier on wait->lock_acquiredKent Overstreet
Six locks do lock handoff via the wakeup path: the thread doing the wakeup also takes the lock on behalf of the waiter, which means the waiter only has to look at its waitlist entry, and doesn't have to touch the lock cacheline while another thread is using it. Linus noticed that this needs a real barrier, which this patch fixes. Also add a comment for the should_sleep_fn() error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: linux-bcachefs@vger.kernel.org Cc: linux-kernel@vger.kernel.org
2023-09-08bcachefs: Check for directories in deleted inodes btreeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-09-08bcachefs: Add btree_trans* to inode_set_fnJoshua Ashton
This will be used when we need to re-hash a directory tree when setting flags. It is not possible to have concurrent btree_trans on a thread. Signed-off-by: Joshua Ashton <joshua@froggi.es>
2023-09-08bcachefs: Improve bch2_write_points_to_text()Kent Overstreet
Now we also print the open_buckets owned by each write_point - this is to help with debugging a shutdown hang. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-09-08bcachefs: Fix check_version_upgrade()Kent Overstreet
We were failing to upgrade to the latest compatible version - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-09-08bcachefs: Fix 'journal not marked as containing replicas'Kent Overstreet
This fixes the replicas_write_errors test: the patch bcachefs: mark journal replicas before journal write submission partially fixed replicas marking for the journal, but it broke the case where one replica failed - this patch re-adds marking after the journal write completes, when we know how many replicas succeeded. Additionally, we do not consider it a fsck error when the very last journal entry is not correctly marked, since there is an inherent race there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28fs/aio: obey min_nr when doing wakeupsKent Overstreet
I've been observing workloads where IPIs due to wakeups in aio_complete() are ~15% of total CPU time in the profile. Most of those wakeups are unnecessary when completion batching is in use in io_getevents(). This plumbs min_nr through via the wait eventry, so that aio_complete() can avoid doing unnecessary wakeups. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Benjamin LaHaise <bcrl@kvack.org Cc: linux-aio@kvack.org Cc: linux-fsdevel@vger.kernel.org
2023-08-28bcachefs: add counters for failed shrinker reclaimDaniel Hill
These counters should help us debug OOM issues. Signed-off-by: Daniel Hill <daniel@gluo.nz>
2023-08-28bcachefs: shrinker.to_text() methodsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28xfs: add nodataio mount option to skip all data I/OBrian Foster
When mounted with nodataio, add the NOSUBMIT iomap flag to all data mappings passed into the iomap layer. This causes iomap to skip all data I/O submission and thus facilitates metadata only performance testing. For experimental use only. Only tested insofar as fsstress runs for a few minutes without blowing up. Signed-off-by: Brian Foster <bfoster@redhat.com>
2023-08-28iomap: add nosubmit flag to skip data I/O on iomap mappingBrian Foster
Implement a quick and dirty hack to skip data I/O submission on a specified mapping. The iomap layer will still perform every step up through constructing the bio as if it will be submitted, but instead invokes completion on the bio directly from submit context. The purpose of this is to facilitate filesystem metadata performance testing without the overhead of actual data I/O. Note that this may be dangerous in current form in that folios are not explicitly zeroed where they otherwise wouldn't be, so whatever previous data exists in a folio prior to being added to a read bio is mapped into pagecache for the file. Signed-off-by: Brian Foster <bfoster@redhat.com>
2023-08-28vfs: inode cache conversion to hash-blDave Chinner
Because scalability of the global inode_hash_lock really, really sucks. 32-way concurrent create on a couple of different filesystems before: - 52.13% 0.04% [kernel] [k] ext4_create - 52.09% ext4_create - 41.03% __ext4_new_inode - 29.92% insert_inode_locked - 25.35% _raw_spin_lock - do_raw_spin_lock - 24.97% __pv_queued_spin_lock_slowpath - 72.33% 0.02% [kernel] [k] do_filp_open - 72.31% do_filp_open - 72.28% path_openat - 57.03% bch2_create - 56.46% __bch2_create - 40.43% inode_insert5 - 36.07% _raw_spin_lock - do_raw_spin_lock 35.86% __pv_queued_spin_lock_slowpath 4.02% find_inode Convert the inode hash table to a RCU-aware hash-bl table just like the dentry cache. Note that we need to store a pointer to the hlist_bl_head the inode has been added to in the inode so that when it comes to unhash the inode we know what list to lock. We need to do this because the hash value that is used to hash the inode is generated from the inode itself - filesystems can provide this themselves so we have to either store the hash or the head pointer in the inode to be able to find the right list head for removal... Same workload after: Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org
2023-08-28vfs: factor out inode hash head calculationDave Chinner
In preparation for changing the inode hash table implementation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org
2023-08-28six locks: Lock contended tracepointsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28mean and variance: Promote to lib/mathKent Overstreet
This promotes mean_and_variance from bcachefs to lib/math, so it can be used by other things. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28Increase MAX_LOCK_DEPTH, bcachefs BTREE_ITER_MAX (do not upstream)Kent Overstreet
2023-08-28bcachefs: Switch to lockdep_set_no_check_recursion()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Check for btree locks held on transaction initKent Overstreet
Ideally we would disallow multiple btree_trans being initialized within the same process - and hopefully we will at some point, the stack usage is excessive - but for now there are a couple places where we do this: - transaction commit error path -> journal reclaim - btree key cache flush - move data path -> do_pending_writes -> write path -> bucket allocation (in the near future when bucket allocation switches to using a freespace btree) In order to avoid deadlocking the first btree_trans must have been unlocked with bch2_trans_unlock() before using the second btree_trans - this patch adds an assertion to bch2_trans_init() that verifies that this has been done when lockdep is enabled. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-08-28bcachefs: Use lock_class_is_held() for btree locking assertsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Assert that btree node locks aren't being leakedKent Overstreet
This asserts (when lockdep is enabled) that btree locks aren't held when exiting a btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-08-28bcachefs: btree_journal_iter.cKent Overstreet
Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: sb-clean.cKent Overstreet
Pull code for bch_sb_field_clean out into its own file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Move bch_sb_field_crypt code to checksum.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: sb-members.cKent Overstreet
Split out a new file for bch_sb_field_members - we'll likely want to move more code here in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Split up btree_update_leaf.cKent Overstreet
We now have btree_trans_commit.c btree_update.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Split up fs-io.[ch]Kent Overstreet
fs-io.c is too big - time for some reorganization - fs-dio.c: direct io - fs-pagecache.c: pagecache data structures (bch_folio), utility code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix assorted checkpatch nitsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix for sb buffer being misalignedKent Overstreet
On old kernels, kmalloc() may return an allocation that's not naturally aligned - this resulted in a bug where we allocated a bio with not enough biovecs. Fix this by using buf_pages(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Convert journal validation to bkey_invalid_flagsKent Overstreet
This fixes a bug where we were already passing bkey_invalid_flags around, but treating the parameter as just read/write - so the compat code wasn't being run correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Improve journal_entry_err_msg()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: BCH_COMPAT_bformat_overflow_done no longer requiredKent Overstreet
Awhile back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: kill EBUG_ON() redefinition in bkey.cKent Overstreet
our debug mode assertions in bkey.c haven't been getting run, whoops Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Add logging to bch2_inode_peek() & relatedKent Overstreet
Add error messages when we fail to lookup an inode, and also add a few missing bch2_err_class() calls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix lock thrashing in __bchfs_fallocate()Kent Overstreet
We've observed significant lock thrashing on fstests generic/083 in fallocate, due to dropping and retaking btree locks when checking the pagecache for data. This adds a nonblocking mode to bch2_clamp_data_hole(), where we only use folio_trylock(), and can thus be used safely while btree locks are held - thus we only have to drop btree locks as a fallback, on actual lock contention. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix for bch2_copygc() spuriously returning -EEXISTKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Convert btree_err_type to normal error codesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix btree_err() macroKent Overstreet
Error code wasn't being propagated correctly, change it to match fsck_err() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Ensure topology repair runsKent Overstreet
This fixes should_restart_for_topology_repair() - previously it was returning false if the btree io path had already seleceted topology repair to run, even if it hadn't run yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Log a message when running an explicit recovery passKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Print out required recovery passes on version upgradeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix shift by 64 in set_inc_field()Kent Overstreet
UBSAN was complaining about a shift by 64 in set_inc_field(). This only happened when the value being shifted was 0, so in theory should be harmless - a shift by 64 (or register width) should logically give a result of 0, but CPUs will in practice leave the input unchanged when the number of bits to shift by wraps - and since our input here is 0, the output is still what we want. But, it's still undefined behaviour and we need our UBSAN output to be clean, so it needs to be fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: bkey_format helper improvementsKent Overstreet
- add a to_text() method for bkey_format - convert bch2_bkey_format_validate() to modern error message style, where we pass a printbuf for the error string instead of returning a static string Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: bcachefs_metadata_version_deleted_inodesKent Overstreet
Add a new bitset btree for inodes pending deletion; this means we no longer have to scan the full inodes btree after an unclean shutdown. Specifically, this adds: - a trigger to update the deleted_inodes btree based on changes to the inodes btree - a new recovery pass - and check_inodes is now only a fsck pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix folio leak in folio_hole_offset()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: Fix overlapping extent repairKent Overstreet
A number of smallish fixes for overlapping extent repair, and (part of) a new unit test. This fixes all the issues turned up by bhzhu203, in his filesystem image from running mongodb + snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-08-28bcachefs: In debug mode, run fsck again after fixing errorsKent Overstreet
We want to ensure that fsck actually fixed all the errors it found - the second fsck run should be clean. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>