Age | Commit message | Author |
|
It now includes more info - whether the bucket was for metadata or
data - and it's now called in the same place as the bucket_alloc_fail
tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This deletes our old lock-ordering-based deadlock avoidance code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We've outgrown our own deadlock avoidance strategy.
The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.
Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.
That approach had some issues, though.
- Sometimes we'd issue transaction restarts unnecessarily, when no
deadlock would have actually occurred. Lock ordering restarts have
become our primary cause of transaction restarts, on some workloads
totaling 20% of actual transaction commits.
- To avoid deadlock or livelock, we'd often have to take intent locks
when we only wanted a read lock: with the lock ordering approach, it
is actually illegal to hold _any_ read lock while blocking on an intent
lock, and this has been causing us unnecessary lock contention.
- It was getting fragile - the various lock ordering rules are not
trivial, and we'd been seeing occasional livelock issues related to
this machinery.
So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.
When we block taking a btree lock, after adding ourselves to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.
If we find a cycle, we emit a transaction restart. Occasionally (e.g. in
the btree split path) we cannot allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.
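In outline, the cycle check looks something like this - a minimal
sketch with simplified, hypothetical types (the real detector walks
btree_trans structures iteratively with an explicit stack, and picks a
victim transaction to restart when lock() isn't allowed to fail):

    #include <linux/list.h>
    #include <linux/types.h>

    struct held_lock {
            struct list_head        held_list;      /* on xact->held_locks */
            struct list_head        waiters;        /* xacts blocked on this lock */
    };

    struct xact {
            struct list_head        held_locks;
            struct list_head        wait_list;      /* on held_lock->waiters */
            bool                    on_dfs_path;
    };

    /*
     * Is there a cycle reachable from @t, following the locks it holds
     * and the transactions blocked on those locks?
     */
    static bool waits_for_cycle(struct xact *t)
    {
            struct held_lock *l;
            struct xact *w;

            if (t->on_dfs_path)
                    return true;    /* back on our own path: deadlock */

            t->on_dfs_path = true;
            list_for_each_entry(l, &t->held_locks, held_list)
                    list_for_each_entry(w, &l->waiters, wait_list)
                            if (waits_for_cycle(w)) {
                                    t->on_dfs_path = false;
                                    return true;
                            }
            t->on_dfs_path = false;
            return false;
    }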
Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It now includes journal_flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It now prints the error name when the btree node is an error pointer;
also, don't trace failures when the btree node is
BCH_ERR_no_btree_node_up.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is just some type safety cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
six_lock_count now counts whether a write lock is held, and this patch
also correctly counts six_lock->intent_lock_recurse.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This message fires when the data update path races with a foreground
write that overwrote the data that was being moved - this isn't a
concerning event as long as it's not happening too often, so switch it
to a tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This is to fix a livelock in the btree split path described by the
previous patch.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().
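As an illustration of the pattern (the event and the
trans_restart_count() helper here are hypothetical, not actual bcachefs
tracepoints), a tracepoint can now take a btree_trans pointer directly
and compute its fields with function calls in TP_fast_assign():

    TRACE_EVENT(trans_example,
            TP_PROTO(struct btree_trans *trans),
            TP_ARGS(trans),

            TP_STRUCT__entry(
                    __array(char,   fn, 32  )
                    __field(u32,    restarts)
            ),

            TP_fast_assign(
                    /* function calls, not just scalar copies: */
                    strlcpy(__entry->fn, trans->fn, sizeof(__entry->fn));
                    __entry->restarts = trans_restart_count(trans);
            ),

            TP_printk("%s restarts %u", __entry->fn, __entry->restarts)
    );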
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- use strlcpy(), not strncpy()
- add tracepoints for btree_path alloc and free
- give the tracepoint for key cache upgrade fail a proper name
- add a tracepoint for btree_node_upgrade_fail
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
In CONFIG_BCACHEFS_DEBUG mode, we'll now randomly issue transaction
restarts - with a decaying probability based on the number of restarts
we've already had, to ensure that transactions eventually make forward
progress. This should help shake out some bugs.
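The mechanism, roughly (the field and helper names here are made up for
illustration): the injection probability halves with each restart
already injected, so the expected number of injections per transaction
is bounded:

    #include <linux/minmax.h>
    #include <linux/random.h>
    #include <linux/types.h>

    static bool trans_should_inject_restart(struct btree_trans *trans)
    {
            /* probability 1/(64 << n) after n injected restarts */
            u32 odds = 64U << min_t(u32, trans->injected_restarts, 20);

            return get_random_u32() % odds == 0;
    }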
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
All transaction restarts need a tracepoint - this is essential for
debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Instead of overloading standard error codes (EINTR/EAGAIN), and defining
short lists of error codes in multiple places that potentially end up
overlapping & conflicting, we're now going to have one master list of
error codes.
Error codes are defined with an x-macro: thus we also have
bch2_err_str() now.
Also, error codes have a class field. Now, instead of checking for
errors with ==, code should use bch2_err_matches(), which returns true
if the error is equal to or a sub-error of the error class.
This means we can define unique errors for every source location where
an error is generated, which will help improve our error messages.
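The shape of the x-macro, trimmed down for illustration (the real list
is longer and lives in errcode.h): one master list expands into both
the enum and the string table behind bch2_err_str().

    #include <linux/kernel.h>       /* abs() */

    #define BCH_ERRCODES()                                                  \
            x(EINTR,                       transaction_restart)             \
            x(BCH_ERR_transaction_restart, transaction_restart_relock)      \
            x(BCH_ERR_transaction_restart, transaction_restart_would_deadlock)

    enum bch_errcode {
            BCH_ERR_START = 2048,
    #define x(class, err) BCH_ERR_##err,
            BCH_ERRCODES()
    #undef x
            BCH_ERR_MAX
    };

    static const char * const bch_err_strs[] = {
    #define x(class, err) [BCH_ERR_##err - BCH_ERR_START] = #err,
            BCH_ERRCODES()
    #undef x
    };

    const char *bch2_err_str(int err)
    {
            err = abs(err);

            return err > BCH_ERR_START && err < BCH_ERR_MAX
                    ? bch_err_strs[err - BCH_ERR_START]
                    : "(standard errno)";
    }

With the class column above, bch2_err_matches(ret,
BCH_ERR_transaction_restart) matches both sub-errors, where comparing
with == would match only one.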
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We should be printing the number of free buckets, not just the number of
available buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- invalidate_one_bucket() now returns 1 when we don't have any buckets
on this device to invalidate, ensuring we don't spin
- the tracepoint invocation is moved to after the transaction commit,
and we now include the number of cached sectors in the tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Now that the bucket_alloc_fail tracepoint includes the error code, the
open_bucket_alloc_fail tracepoint is redundant.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This behavior dates from the early, early days of bcache, and upon
further delving appears to not make any sense. The shrinker only works
in terms of 'objects' of unknown size; normalizing to pages only had the
effect of changing the batch size, which we could do directly - if we
wanted; we probably don't. Normalizing to pages meant our batch size was
very small, which seems to have been keeping us from doing as much
shrinking as we should be under heavy memory pressure; this patch
appears to alleviate some OOMs we've been seeing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- bch2_clear_need_discard() was using bch2_trans_relock() incorrectly,
and always bailing out before doing any work - ouch.
- Add a tracepoint that fires every time bch2_do_discards() runs, and
tells us about the work it did
- When too many buckets aren't able to be discarded because they need a
journal commit, bch2_do_discards now flushes the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
For backpointers, we'll need to delete old backpointers before adding
new backpointers - otherwise we'll run into spurious duplicate
backpointer errors.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This adds counters for each of the reasons we may skip allocating a
bucket - we're seeing a bug where we loop endlessly trying to allocate
when we should have plenty of buckets available, so hopefully this will
help us track down why.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Previously, we'd go into an infinite loop when attempting to cache a
bkey in the key cache larger than 128 u64s - since we were only using a
u8 for the size field, it'd get rounded up to 256 then truncated to 0.
Oops.
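The failure mode in miniature (names are illustrative, not the real key
cache fields):

    #include <linux/log2.h>
    #include <linux/types.h>

    static u8 cached_key_u64s(unsigned key_u64s)
    {
            unsigned new_u64s = roundup_pow_of_two(key_u64s);   /* 130 -> 256 */

            return new_u64s;    /* 256 truncates to 0 in a u8 */
    }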
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- bucket_alloc_fail now indicates whether allocation was nonblocking
- we now return strings, not integers, for alloc reserve.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Also include the number of buckets available, and the number of buckets
awaiting journal commit - and add a sysfs counter, too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This fixes a regression from "bcachefs: Stash a copy of key being
overwritten in btree_insert_entry". In btree_key_can_insert_cached(), we
may reallocate the key cache key, invalidating pointers previously
returned by peek() - fix it by issuing a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
The error code when we fail to allocate a node in the btree node cache
doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to
stash a flag in btree_trans so we know we have to take the cannibalize
lock.
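Sketched out (the flag and allocation helper are stand-ins, not the
real interfaces):

    #include <linux/err.h>

    static struct btree *node_alloc(struct btree_trans *trans)
    {
            struct btree *b = node_cache_alloc(trans->c);   /* may return ERR_PTR */

            if (IS_ERR(b))
                    /*
                     * The caller swallows the error, so note the failure
                     * here; traverse_all() checks this flag and takes the
                     * cannibalize lock.
                     */
                    trans->memory_allocation_failure = true;

            return b;
    }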
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Some of our tracepoints were calling snprintf("%pS") - which does
symbol table lookups - in TP_fast_assign(), which turns out to be a
really bad idea.
This was done because perf trace wasn't correctly printing tracepoints
that use %pS anymore - but it turns out trace-cmd does handle it
correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Updates to non key cache iterators will now be transparently redirected
to the key cache for cached btrees.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This improves the transaction restart tracepoints - adding distinct
tracepoints for all the locations and reasons a transaction might have
been restarted, and ensuring that there's a tracepoint for every
transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Add a field to bch_dev for the dev_t of the underlying block device -
this fixes a null ptr deref in tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This is to help with diagnosing why the btree node cache doesn't seem
to be shrinking - we've had issues in the past with granularity/batch
size, since btree nodes are so big.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Instead of unconditionally upgrading read locks to intent locks in
do_bch2_trans_commit(), this patch changes the path that takes write
locks to first trylock and then, if trylock fails, check whether we hold
a conflicting read lock, restarting the transaction if necessary.
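The new shape of that path, schematically (six_trylock_write() and
six_lock_write() are the six lock API; the conflict check and restart
call are simplified stand-ins):

    static int trans_lock_for_write(struct btree_trans *trans, struct btree *b)
    {
            if (six_trylock_write(&b->c.lock))
                    return 0;

            /*
             * Blocking while we ourselves hold a read lock on this node
             * would deadlock, since the write lock waits for readers to
             * drain - restart the transaction instead.
             */
            if (trans_holds_read_lock(trans, b))
                    return btree_trans_restart(trans);

            six_lock_write(&b->c.lock, NULL, NULL);
            return 0;
    }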
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
These haven't turned out to be useful.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
If an extent ends up with a replica that is encrypted and a replica
that isn't encrypted (due to the user changing options), and then
copygc/rebalance moves one of the replicas by reading from the
unencrypted replica, we had a bug where we wouldn't correctly initialize
op->nonce: crc.offset + crc.nonce must be equal for every crc field in
an extent.
This patch fixes that by moving op.nonce initialization to
bch2_migrate_write_init.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Btree iterator tracepoints should print whether they're for the key
cache.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
%pU for printing out pointers to UUIDs doesn't work in perf trace.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Trying to debug an issue where, after traverse_all(), we shouldn't have
to traverse any iterators... yet we are.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
By changing it to upgrade iterators to intent locks to avoid lock
restarts, we can simplify __bch2_btree_node_lock() quite a bit - this
fixes a probable bug where it could drop a lock on an unrelated error
but still succeed, instead of causing a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We have foreground btree node merging now, and any future btree node
merging improvements are going to be based off of that code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We need to flush the btree key cache when it's too dirty, because
otherwise the shrinker won't be able to reclaim memory - this is done by
journal reclaim. But journal reclaim also kicks btree node writes: this
meant that btree node writes were getting kicked much too often just
because we needed to flush btree key cache keys.
This patch splits journal pins into two different lists, and teaches
journal reclaim to not flush btree node writes when it only needs to
flush key cache keys.
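Schematically (the structs are trimmed down; treat the field use as
illustrative):

    #include <linux/list.h>

    struct entry_pin_list {
            struct list_head        list;           /* btree node writes etc. */
            struct list_head        key_cache_list; /* key cache flush pins */
    };

    /*
     * Journal reclaim can now flush key cache pins without touching
     * ->list, i.e. without kicking btree node writes:
     */
    static struct journal_entry_pin *
    next_pin(struct entry_pin_list *pins, bool key_cache_only)
    {
            struct list_head *head = key_cache_only
                    ? &pins->key_cache_list
                    : &pins->list;

            return list_first_entry_or_null(head,
                                            struct journal_entry_pin, list);
    }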
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|