bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2022-10-03	bcachefs: Fix large key cache keys	Kent Overstreet
	Previously, we'd go into an infinite loop when attempting to cache a bkey in the key cache larger than 128 u64s - since we were only using a u8 for the size field, it'd get rounded up to 256 then truncated to 0. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Improve bucket_alloc tracepoints	Kent Overstreet
	- bucket_alloc_fail now indicates whether allocation was nonblocking - we now return strings, not integers, for alloc reserve. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Improve bucket_alloc_fail tracepoint	Kent Overstreet
	Also include the number of buckets available, and the number of buckets awaiting journal commit - and add a sysfs counter, too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Fix a use after free	Kent Overstreet
	This fixes a regression from "bcachefs: Stash a copy of key being overwritten in btree_insert_entry". In btree_key_can_insert_cached(), we may reallocate the key cache key, invalidating pointers previously returned by peek() - fix it by issuing a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Fix failure to allocate btree node in cache	Kent Overstreet
	The error code when we fail to allocate a node in the btree node cache doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to stash a flag in btree_trans so we know we have to take the cannibalize lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Fix slow tracepoints	Kent Overstreet
	Some of our tracepoints were calling snprintf("pS") - which does symbol table lookups - in TP_fast_assign(), which turns out to be a really bad idea. This was done because perf trace wasn't correctly printing tracepoints that use %pS anymore - but it turns out trace-cmd does handle it correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Btree key cache coherency	Kent Overstreet
	Updates to non key cache iterators will now be transparently redirected to the key cache for cached btrees. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Tracepoint improvements	Kent Overstreet
	This improves the transaction restart tracepoints - adding distinct tracepoints for all the locations and reasons a transaction might have been restarted, and ensures that there's a tracepoint for every transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Switch to __func__for recording where btree_trans was initialized	Kent Overstreet
	Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: bch_dev->dev	Kent Overstreet
	Add a field to bch_dev for the dev_t of the underlying block device - this fixes a null ptr deref in tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Add a tracepoint for the btree cache shrinker	Kent Overstreet
	This is to help with diagnosing why the btree node can doesn't seem to be shrinking - we've had issues in the past with granularity/batch size, since btree nodes are so big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Update export_operations for snapshots	Kent Overstreet
	When support for snapshots was merged, export operations weren't updated yet. This patch adds new filehandle types for bcachefs that include the subvolume ID and updates export operations for subvolumes - and also .get_parent, support for which was added just prior to snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Better approach to write vs. read lock deadlocks	Kent Overstreet
	Instead of unconditionally upgrading read locks to intent locks in do_bch2_trans_commit(), this patch changes the path that takes write locks to first trylock, and then if trylock fails check if we have a conflicting read lock, and restart the transaction if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Drop some fast path tracepoints	Kent Overstreet
	These haven't turned out to be useful Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Fix initialization of bch_write_op.nonce	Kent Overstreet
	If an extent ends up with a replica that is encrypted an a replica that isn't encrypted (due the user changing options), and then copygc/rebalance moves one of the replicas by reading from the unencrypted replica, we had a bug where we wouldn't correctly initialize op->nonce - for each crc field in an extent, crc.offset + crc.nonce must be equal. This patch fixes that by moving op.nonce initialization to bch2_migrate_write_init. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Minor tracepoint improvements	Kent Overstreet
	Btree iterator tracepoints should print whether they're for the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Improve btree iterator tracepoints	Kent Overstreet
	This patch adds some new tracepoints to the btree iterator code, and adds new fields to the existing tracepoints - primarily for the iterator position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Don't use uuid in tracepoints	Kent Overstreet
	%pU for printing out pointers to uuids doesn't work in perf trace Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Add a tracepoint for copygc waiting	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Add a tracepoint for when we block on journal reclaim	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: New tracepoint for bch2_trans_get_iter()	Kent Overstreet
	Trying to debug an issue where after traverse_all() we shouldn't have to traverse any iterators... yet we are Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Allocator refactoring	Kent Overstreet
	This uses the kthread_wait_freezable() macro to simplify a lot of the allocator thread code, along with cleaning up bch2_invalidate_bucket2(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Improve trans_restart_mem_realloced tracepoint	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Improve bch2_btree_iter_traverse_all()	Kent Overstreet
	By changing it to upgrade iterators to intent locks to avoid lock restarts we can simplify __bch2_btree_node_lock() quite a bit - this fixes a probable bug where it could potentially drop a lock on an unrelated error but still succeed instead of causing a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Drop old style btree node coalescing	Kent Overstreet
	We have foreground btree node merging now, and any future btree node merging improvements are going to be based off of that code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Don't flush btree writes more aggressively because of btree key cache	Kent Overstreet
	We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Throttle updates when btree key cache is too dirty	Kent Overstreet
	This is needed to ensure we don't deadlock because journal reclaim and thus memory reclaim isn't making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Simplify transaction commit error path	Kent Overstreet
	The transaction restart path traverses all iterators, we don't need to do it here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Ensure journal reclaim runs when btree key cache is too dirty	Kent Overstreet
	Ensuring the key cache isn't too dirty is critical for ensuring that the shrinker can reclaim memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Convert tracepoints to use %ps, not %pf	Kent Overstreet
	Symbol decoding was changed from %pf to %ps Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Improve tracing for transaction restarts	Kent Overstreet
	We have a bug where we can get stuck with a process spinning in transaction restarts - need more information. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Make copygc thread global	Kent Overstreet
	Per device copygc threads don't move data to different devices and they make fragmentation works - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: btree_bkey_cached_common	Kent Overstreet
	This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Add a few tracepoints	Kent Overstreet
	Transaction restart tracing should probably be overhaulled at some point. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	bcachefs: Initial commit	Kent Overstreet
	Forked from drivers/md/bcache, now a full blown COW multi device filesystem with a long list of features - https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	Revert "block: remove zero_fill_bio_iter"	Kent Overstreet
	Bring this helper back for bcachefs. This reverts commit 6f822e1b5d9dda3d20e87365de138046e3baa03a. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	lib/generic-radix-tree.c: Add peek_prev()	Kent Overstreet
	This patch adds genradix_peek_prev(), genradix_iter_rewind(), and genradix_for_each_reverse(), for iterating backwards over a generic radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	lib/generic-radix-tree.c: Add a missing include	Kent Overstreet
	We now need linux/limits.h for SIZE_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	lib/generic-radix-tree.c: Don't overflow in peek()	Kent Overstreet
	When we started spreading new inode numbers throughout most of the 64 bit inode space, that triggered some corner case bugs, in particular some integer overflows related to the radix tree code. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	vfs: inode cache conversion to hash-bl	Dave Chinner
	Because scalability of the global inode_hash_lock really, really sucks. 32-way concurrent create on a couple of different filesystems before: - 52.13% 0.04% [kernel] [k] ext4_create - 52.09% ext4_create - 41.03% __ext4_new_inode - 29.92% insert_inode_locked - 25.35% _raw_spin_lock - do_raw_spin_lock - 24.97% __pv_queued_spin_lock_slowpath - 72.33% 0.02% [kernel] [k] do_filp_open - 72.31% do_filp_open - 72.28% path_openat - 57.03% bch2_create - 56.46% __bch2_create - 40.43% inode_insert5 - 36.07% _raw_spin_lock - do_raw_spin_lock 35.86% __pv_queued_spin_lock_slowpath 4.02% find_inode Convert the inode hash table to a RCU-aware hash-bl table just like the dentry cache. Note that we need to store a pointer to the hlist_bl_head the inode has been added to in the inode so that when it comes to unhash the inode we know what list to lock. We need to do this because the hash value that is used to hash the inode is generated from the inode itself - filesystems can provide this themselves so we have to either store the hash or the head pointer in the inode to be able to find the right list head for removal... Same workload after: Signed-off-by: Dave Chinner <dchinner@redhat.com>
2022-10-03	hlist-bl: add hlist_bl_fake()	Dave Chinner
	in preparation for switching the VFS inode cache over the hlist_bl lists, we nee dto be able to fake a list node that looks like it is hased for correct operation of filesystems that don't directly use the VFS indoe cache. Signed-off-by: Dave Chinner <dchinner@redhat.com>
2022-10-03	closures: closure_wait_event()	Kent Overstreet
	Like wait_event() - except, because it uses closures and closure waitlists it doesn't have the restriction on modifying task state inside the condition check, like wait_event() does. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Coly Li <colyli@suse.de>
2022-10-03	bcache: move closures to lib/	Kent Overstreet
	Prep work for bcachefs - being a fork of bcache it also uses closures Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Coly Li <colyli@suse.de>
2022-10-03	block: Export blk_status_to_str()	Kent Overstreet
	Bcachefs uses this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	fs: factor out d_mark_tmpfile()	Kent Overstreet
	New helper for bcachefs - bcachefs doesn't want the inode_dec_link_count() call that d_tmpfile does, it handles i_nlink on its own atomically with other btree updates Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	mm: Bring back vmalloc_exec	Kent Overstreet
	This is needed for bcachefs, which dynamically generates per-btree node unpack functions. This reverts commit 7a0e27b2a0ce2735e27e21ebc8b777550fe0ed81. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	sched: Add task_struct->faults_disabled_mapping	Kent Overstreet
	This is needed to fix a page cache coherency issue with O_DIRECT writes. O_DIRECT writes (and other filesystem operations that modify file data while bypassing the page cache) need to shoot down ranges of the page cache - and additionally, need locking to prevent those pages from pulled back in. But O_DIRECT writes invoke the page fault handler (via get_user_pages), and the page fault handler will need to take that same lock - this is a classic recursive deadlock if userspace has mmaped the file they're DIO writing to and uses those pages for the buffer to write from, and it's a lock ordering deadlock in general. Thus we need a way to signal from the dio code to the page fault handler when we already are holding the pagecache add lock on an address space - this patch just adds a member to task_struct for this purpose. For now only bcachefs is implementing this locking, though it may be moved out of bcachefs and made available to other filesystems in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	locking: SIX locks (shared/intent/exclusive)	Kent Overstreet
	New lock for bcachefs, like read/write locks but with a third state, intent. Intent locks conflict with each other, but not with read locks; taking a write lock requires first holding an intent lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	locking/lockdep: lock_class_is_held()	Kent Overstreet
	This patch adds lock_class_is_held(), which can be used to verify that a particular type of lock is _not_ held. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03	Compiler Attributes: add __flatten	Kent Overstreet
	Prep work for bcachefs Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>