We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.
But we have to do a heap allocation to initialize it anyway, so
there's no real downside to heap allocating the entire thing.
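A minimal sketch of the resulting calling convention, assuming the
bch2_trans_get()/bch2_trans_put() helper names (illustrative, not
necessarily the exact upstream API):

  static int example_op(struct bch_fs *c)
  {
  	struct btree_trans *trans = bch2_trans_get(c);	/* heap allocated */
  	int ret = 0;

  	/* ... do btree iteration/updates through trans ... */

  	bch2_trans_put(trans);	/* frees the transaction */
  	return ret;
  }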
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_btree_key_cache_to_text() due to use of an incorrect format
specifier:
fs/bcachefs/btree_key_cache.c:1060:36: error: format specifies type 'size_t' (aka 'unsigned int') but the argument has type 'long' [-Werror,-Wformat]
1060 | prt_printf(out, "nr_freed:\t%zu", atomic_long_read(&c->nr_freed));
| ~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| %ld
fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__)
| ^~~~~~~~~~~
1 error generated.
On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when using %zu, but on 32-bit architectures, size_t is
'unsigned int'. Use '%lu' to match the other format specifiers used in
this function for printing values returned from atomic_long_read().
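The fix is the one-line format specifier change:

  prt_printf(out, "nr_freed:\t%lu", atomic_long_read(&c->nr_freed));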
Fixes: 6d799930ce0f ("bcachefs: btree key cache pcpu freedlist")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This changes mark_btree_node_locked() to take an enum
btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED
is -1, which may cause problems converting back and forth to
six_lock_type if short enums are in use.
With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type
enum.
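A sketch of the enum: BTREE_NODE_UNLOCKED = -1 is per the above; the
other members are assumptions for illustration:

  enum btree_node_locked_type {
  	BTREE_NODE_UNLOCKED		= -1,
  	BTREE_NODE_READ_LOCKED		= SIX_LOCK_read,
  	BTREE_NODE_INTENT_LOCKED	= SIX_LOCK_intent,
  };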
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Mark these caches as reclaimable, so that available memory is correctly
reported when there are a lot of cached inodes.
Note that more work is needed - you should add __GFP_RECLAIMABLE to some
of the kmalloc calls, so that they are allocated from the "kmalloc-rcl-*"
caches.
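For example (a sketch; the cache and struct names are illustrative):

  /* slab caches accounted as reclaimable: */
  cache = KMEM_CACHE(bch_inode_info, SLAB_RECLAIM_ACCOUNT);

  /* kmallocs that should come from the "kmalloc-rcl-*" caches: */
  buf = kmalloc(size, GFP_KERNEL|__GFP_RECLAIMABLE);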
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
We've been seeing assertions pop that indicate the btree node cache or
key cache have dirty items when we just did a clean shutdown.
Add some more assertions so we can catch this when we're dirtying items.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we have journal watermarks and alloc watermarks unified,
BTREE_INSERT_USE_RESERVE is redundant and can be deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards
specifying watermarks once in the transaction commit path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add two new helpers for allocating memory with btree locks held: the
idea is to first try the allocation with GFP_NOWAIT|__GFP_NOWARN; if
that fails, unlock, retry with GFP_KERNEL, and then call
trans_relock().
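A sketch of the pattern the helpers encapsulate (the helper name and
error conventions here are assumptions, not the exact bcachefs API):

  static void *alloc_locks_held(struct btree_trans *trans, size_t size)
  {
  	void *p;
  	int ret;

  	/* first try without blocking, since btree locks are held */
  	p = kmalloc(size, GFP_NOWAIT|__GFP_NOWARN);
  	if (p)
  		return p;

  	/* drop locks so the allocation may block, then retake them */
  	bch2_trans_unlock(trans);
  	p = kmalloc(size, GFP_KERNEL);
  	if (!p)
  		return ERR_PTR(-ENOMEM);

  	ret = bch2_trans_relock(trans);
  	if (ret) {
  		kfree(p);
  		return ERR_PTR(ret);
  	}
  	return p;
  }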
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.
On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.
More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.
The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.
On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.
Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.
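An illustrative layout - the masks and widths here are made up to show
the idea, not the exact upstream values:

  #define SIX_STATE_SEQ_OFFSET	32
  #define SIX_STATE_SEQ		(~0ULL << SIX_STATE_SEQ_OFFSET)

  /* the sequence number lives in the high bits: incrementing it can
   * wrap off the top of the word, but never carry into the lock
   * state bits below it */
  static inline u32 six_state_seq(u64 state)
  {
  	return (state & SIX_STATE_SEQ) >> SIX_STATE_SEQ_OFFSET;
  }

  /* all state updates go through atomic64_*() */
  static inline void six_bump_seq(struct six_lock *lock)
  {
  	atomic64_add(1ULL << SIX_STATE_SEQ_OFFSET, &lock->state);
  }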
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate
or free the percpu reader count on an existing lock while it's in use;
the only safe time to allocate percpu readers is when the lock is first
being initialized.
This patch adds a flags parameter to six_lock_init(), and instead of
six_lock_pcpu_free() we now expose six_lock_exit(), which does the same
thing but is less likely to be misused.
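Usage then looks like this (sketch; the flag name is an assumption):

  six_lock_init(&b->c.lock, SIX_LOCK_INIT_PCPU);

  /* ... lock is used ... */

  six_lock_exit(&b->c.lock);	/* frees the percpu reader count */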
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Introduce new helpers for a common pattern:

  bch2_trans_iter_init();
  bch2_btree_iter_peek_slot();
- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a
(typically stack allocated) variable; it handles the case where the
value in the btree is smaller than the current version of the type,
zeroing out the remainder.
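A hedged usage sketch - the exact signature and the inode type here are
assumptions for illustration:

  struct bch_inode_v3 inode;	/* stack allocated */
  int ret = bch2_bkey_get_val_typed(trans, BTREE_ID_inodes, pos, 0,
  				    inode_v3, &inode);
  if (ret)	/* -ENOENT if there's no inode_v3 key at pos */
  	return ret;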
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.
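Roughly (the error name here is illustrative):

  /* before: every allocation failure looks the same */
  return -ENOMEM;

  /* after: a site-specific private code, still in the ENOMEM class,
   * so callers can check bch2_err_matches(ret, ENOMEM) */
  return -BCH_ERR_ENOMEM_btree_key_cache_create;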
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
fstest generic/388 occasionally reproduces corruptions where an
inode has extents beyond i_size. This is a deliberate crash and
recovery test, and the post crash+recovery characteristics are
usually the same: the inode exists on disk in an early (i.e. just
allocated) state based on the journal sequence number associated
with the inode. Subsequent inode updates exist in the journal at
higher sequence numbers, but the inode hadn't been written back
before the associated crash and the post-crash recovery processes a
set of journal sequence numbers that doesn't include updates to the
inode. In fact, the sequence with the most recent inode key update
always happens to be the sequence just before the front of the
journal processed by recovery.
This last bit is a significant hint that the problem relates to an
on-disk journal update of the front of the journal. The root cause
of this problem is basically that the inode is updated (multiple
times) in-core and in the key cache, each time bumping the key cache
sequence number used to control the cache flush. The cache flush
skips one or more times, bumping the associated key cache journal
pin to the key cache seq value. This has a side effect of holding
the inode in memory a bit longer than normal, which helps exacerbate
this problem, but is also unsafe in certain cases where the key
cache seq may have been updated by a transaction commit that didn't
journal the associated key.
For example, consider an inode that has been allocated, updated
several times in the key cache, journaled, but not yet written back.
At this stage, everything should be consistent if the fs happens to
crash because the latest update has been journaled. Now consider a key
update via bch2_extent_update_i_size_sectors() that uses the
BTREE_UPDATE_NOJOURNAL flag. While this update may not change inode
state, it can have the side effect of bumping ck->seq in
bch2_btree_insert_key_cached(). In turn, if a subsequent key cache
flush skips due to seq not matching the former, the ck->journal pin
is updated to ck->seq even though the most recent key update was not
journaled. If this pin happens to reside at the front (tail) of the
journal, this means a subsequent journal write can update last_seq
to a value beyond that which includes the most recent update to the
inode. If this occurs and the fs happens to crash before the inode
happens to flush, recovery will see the latest last_seq, fail to
recover the inode and leave the inode in the inconsistent state
described above.
To avoid this problem, skip the key cache seq update on NOJOURNAL
commits, except on initial pin add. Pass the insert entry directly
to bch2_btree_insert_key_cached() to make the associated flag
available and be consistent with btree_insert_key_leaf().
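A sketch of the guard (the exact condition in the real patch may
differ):

  /* only bump ck->seq when the update was journaled, or when the
   * journal pin is first added */
  if (!(flags & BTREE_UPDATE_NOJOURNAL) ||
      !journal_pin_active(&ck->journal))
  	ck->seq = trans->journal_res.seq;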
Signed-off-by: Brian Foster <bfoster@redhat.com>
Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenience.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Recursive transaction commits are occasionally necessary - in
particular, for the upcoming btree write buffer's flush path.
This avoids bugs due to trans->flags being accidentally mutated
mid-commit, which can cause c->writes refcount leaks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a (harmless) lockdep splat, due to a lock order violation in
the key cache exit path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This uses the new _ip() interface to six locks and hooks it up to
btree_path->ip_allocated, when available.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is something we need to do more widely: instead of bothering with
GFP_NOIO/GFP_NOFS, if we need to allocate memory while holding locks:
- first attempt the allocation with GFP_NOWAIT
- if that fails, drop btree locks with bch2_trans_unlock(), then
retry with GFP_KERNEL.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't need a write lock to check if a key is dirty.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We weren't setting path->uptodate before calling
bch2_btree_key_cache_fill() - which caused __bch2_btree_path_upgrade()
to fail.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This switches btree_key_cache_fill() to use a btree iterator, not a
btree path, so that it can search for keys in previous snapshots.
We also add another iterator flag, BTREE_ITER_KEY_CACHE_FILL, to avoid
recursion back into the key cache.
This will allow us to re-enable the key cache for inodes in the next
patch.
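A sketch of the fill path; aside from BTREE_ITER_KEY_CACHE_FILL, the
flags shown are illustrative:

  struct btree_iter iter;
  struct bkey_s_c k;

  /* a real iterator can find keys in ancestor snapshots; the new
   * flag prevents the lookup from recursing into the key cache */
  bch2_trans_iter_init(trans, &iter, ck->key.btree_id, ck->key.pos,
  		       BTREE_ITER_KEY_CACHE_FILL|BTREE_ITER_SLOTS);
  k = bch2_btree_iter_peek_slot(&iter);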
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp().
Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.
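The helpers are plain field-by-field comparisons, e.g. (a sketch
consistent with struct bpos's inode/offset/snapshot fields):

  static __always_inline bool bpos_eq(struct bpos l, struct bpos r)
  {
  	return l.inode    == r.inode &&
  	       l.offset   == r.offset &&
  	       l.snapshot == r.snapshot;
  }

  static __always_inline bool bpos_lt(struct bpos l, struct bpos r)
  {
  	return  l.inode  != r.inode  ? l.inode  < r.inode
  	      : l.offset != r.offset ? l.offset < r.offset
  	      : l.snapshot < r.snapshot;
  }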
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a regression from percpu freedlists in the btree key cache
code: in a rare error path, we were immediately freeing a bkey_cached
that had been used before and should've waited for an SRCU barrier.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:
ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this
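For example, a .checkpatch.conf along these lines (sketch):

  --ignore ASSIGN_IN_IF
  --ignore UNSPECIFIED_INT
  --ignore NEW_TYPEDEFS
  --ignore FUNCTION_ARGUMENTS
  --ignore MULTISTATEMENT_MACRO_USE_DO_WHILE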
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The shrinker assumes freed key cache items are ordered by age, so that
it doesn't have to scan the full list to find items that are old enough
(according to the srcu code) to be freed.
But percpu freelists broke this ordering; this patch fixes this by
ensuring we insert items into the proper position.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- In userspace, we don't have real percpu variables; this patch
disables the percpu freelists in userspace
- add some error messages for the asserts in
bch2_fs_btree_key_cache_exit(); we've been hitting this (only in
userspace, oddly), perhaps this will help us track down the error.
- bkey_cached_reuse() should likely be taking the key cache lock, and
  it's a slowpath, so it doesn't hurt to take it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't actually allocate memory under the btree key cache lock - so
there's no recursion concerns, and the shrinker can just use
mutex_lock().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Prep work for further refactoring.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This deletes our old lock ordering based deadlock avoidance code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Locks must be correctly marked for the cycle detector to work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the upcoming cycle detector, we have to be careful about using
btree_node_lock_nopath - in particular, using it to take write locks can
cause deadlocks.
All held locks need to be tracked in a btree_path, so that the cycle
detector knows about them - unless we know that we cannot cause
deadlocks for other reasons: e.g. we are only taking read locks, or
we're in very early fsck (topology repair).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Similar to "bcachefs: Fix usage of six lock's percpu mode", six locks
have a percpu mode, but we can't switch between percpu and non percpu
modes while a lock is in use: threads attempting to take a read lock may
race, and we'll end up with the read count permanently off.
Fixing this the "correct" way, in six_lock_pcpu_(alloc|free), would
require an RCU barrier, and we don't want to do that - instead, we have
to permanently segregate percpu and non percpu objects, including when
on freelists.
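In the key cache that means something like this sketch (a
freed_pcpu/freed_nonpcpu split; the lock->readers test is an
assumption):

  static void bkey_cached_free(struct btree_key_cache *bc,
  			       struct bkey_cached *ck)
  {
  	/* an object whose six lock has percpu readers must never
  	 * migrate to the non-percpu freelist, and vice versa */
  	if (ck->c.lock.readers)
  		list_move_tail(&ck->list, &bc->freed_pcpu);
  	else
  		list_move_tail(&ck->list, &bc->freed_nonpcpu);
  }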
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Clean up the arguments passed and make them more consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ideally, all the code in btree_locking.c should be converted, but then
we'd want to convert btree_path to point to btree_bkey_cached_common too,
and then we'd be in for a much bigger cleanup - but a bit of incremental
cleanup will still be helpful for the next patches.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a type descriptor to btree_bkey_cached_common - there's no reason
not to since we've got padding that was otherwise unused, and this is a
nice cleanup (and helpful in later patches).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the new cycle detector, taking a write lock will be able to fail -
unless we pass it nofail, which is possible but not preferred.
In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.
This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).
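Their shapes, roughly (sketch; exact parameter lists may differ):

  /* may fail, returning a transaction restart from the cycle
   * detector: */
  int btree_node_lock_nopath(struct btree_trans *trans,
  			     struct btree_bkey_cached_common *b,
  			     enum six_lock_type type);

  /* for callers that hold no other locks and thus can't deadlock: */
  void btree_node_lock_nopath_nofail(struct btree_trans *trans,
  				     struct btree_bkey_cached_common *b,
  				     enum six_lock_type type);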
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This fixes a small memory leak.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We need to use the right class for some assertions to work correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Turns out this assertion was something we could legitimately hit - add a
comment describing what's going on, and handle it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These were used more prior to getting rid of the in-memory bucket arrays
- they don't serve much purpose anymore, and deleting them lets us write
better assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>