Age Commit message (Author)
2022-03-13 bcachefs: Repair code for read_time=0 [new_allocator] (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: BCH_ALLOC_NEED_INC_GEN() (Kent Overstreet)
This makes incrementing of bucket gens rigorous, and ensures that it always happens before issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: Improve bucket_alloc_fail tracepoint (Kent Overstreet)
Also include the number of buckets available, and the number of buckets awaiting journal commit - and add a sysfs counter, too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: Kill struct bucket_mark (Kent Overstreet)
This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
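The trade-off described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: the field layout, the names packed_mark and locked_bucket, and the atomic_flag spinlock are all assumptions made for the example. The packed variant has to fit every protected field into one 64-bit word so it can be updated with compare-and-exchange; the locked variant has no such limit, so its sector counts can be 32 bits wide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Old style (hypothetical layout): everything must fit in one u64. */
union packed_mark {
	uint64_t v;
	struct {
		uint8_t  gen;
		uint8_t  data_type;
		uint16_t dirty_sectors;
		uint16_t cached_sectors;
		uint16_t pad;
	};
};
static_assert(sizeof(union packed_mark) == sizeof(uint64_t),
	      "packed mark must fit in a single 64-bit word");

/* Lock-free update: retry the compare-and-exchange until it succeeds. */
static void packed_add_dirty(_Atomic uint64_t *mark, uint16_t sectors)
{
	union packed_mark old, new;

	old.v = atomic_load(mark);
	do {
		new = old;
		new.dirty_sectors += sectors;
	} while (!atomic_compare_exchange_weak(mark, &old.v, new.v));
}

/* New style (hypothetical layout): a lock protects the fields. */
struct locked_bucket {
	atomic_flag lock;
	uint8_t     gen;
	uint8_t     data_type;
	uint32_t    dirty_sectors;	/* widened to 32 bits */
	uint32_t    cached_sectors;	/* widened to 32 bits */
};

static void locked_bucket_init(struct locked_bucket *b)
{
	atomic_flag_clear(&b->lock);
	b->gen = 0;
	b->data_type = 0;
	b->dirty_sectors = 0;
	b->cached_sectors = 0;
}

static void locked_add_dirty(struct locked_bucket *b, uint32_t sectors)
{
	while (atomic_flag_test_and_set(&b->lock))
		;	/* spin until the lock is ours */
	b->dirty_sectors += sectors;
	atomic_flag_clear(&b->lock);
}
```

With the lock in place, a count such as 100000 sectors fits in dirty_sectors, which the 16-bit packed field in this sketch could not hold.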
2022-03-13 bcachefs: Kill main in-memory bucket array (Kent Overstreet)
All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: bch2_dev_usage_update() no longer depends on bucket_mark (Kent Overstreet)
This is one of the last steps in getting rid of the main in-memory bucket array. This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead of bucket_mark, and for the places where we are in fact working with bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that takes bucket_mark and converts to bkey_alloc_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: Fix LRU repair code (Daniel Hill)
Don't run triggers when repairing incorrect/missing lru entries. Triggers create a conflicting call to lru_change() with the incorrect lru ptr; lru_change() attempts to delete this incorrect lru entry and fails because the back ptr doesn't match the original bucket, causing fsck to error. Signed-off-by: Daniel Hill <daniel@gluo.nz>
2022-03-13 bcachefs: Fsck for need_discard & freespace btrees (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: New bucket invalidate path (Kent Overstreet)
In the old allocator code, preparing an existing empty bucket was part of the same code path that invalidated buckets containing cached data. In the new allocator code this is no longer the case: the main allocator path finds empty buckets (via the new freespace btree), and can't allocate buckets that contain cached data. We now need a separate code path to invalidate buckets containing cached data when we're low on empty buckets, which this patch implements. When the number of free buckets decreases, the new invalidate path is triggered; it uses the LRU btree to pick cached data buckets to invalidate until we're above our watermark. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
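A rough sketch of that loop, assuming a plain array sorted oldest-first as a stand-in for the LRU btree. The struct, the function name and the rule that an invalidated bucket immediately counts as free are simplifications invented for this example; in the real allocator an invalidated bucket still has to pass through the discard and freespace steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LRU entry for a bucket holding only cached data. */
struct cached_bucket {
	uint64_t bucket;
	uint64_t read_time;	/* older (smaller) entries come first */
	bool     invalidated;
};

/*
 * Walk the LRU from the oldest entry and invalidate cached buckets
 * until the free count reaches the watermark. Returns the number of
 * buckets invalidated by this call.
 */
static size_t invalidate_until_watermark(struct cached_bucket *lru, size_t nr,
					 uint64_t *nr_free, uint64_t watermark)
{
	size_t done = 0;

	assert(lru && nr_free);

	for (size_t i = 0; i < nr && *nr_free < watermark; i++) {
		if (lru[i].invalidated)
			continue;
		lru[i].invalidated = true;
		(*nr_free)++;	/* simplification: counts as free right away */
		done++;
	}
	return done;
}
```

The loop stops as soon as the watermark is reached, so recently read cached data at the tail of the LRU is left alone.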
2022-03-13 bcachefs: New discard implementation (Kent Overstreet)
In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: Kill allocator threads & freelists (Kent Overstreet)
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: Freespace, need_discard btrees (Kent Overstreet)
This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
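To illustrate why an extents-style index suits free space, here is a small sketch in which each entry covers a run of free buckets and allocation takes the first bucket of the first non-empty run. The array stands in for the btree, and free_extent and alloc_bucket are hypothetical names, not the bcachefs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A run of free buckets [start, end); stand-in for one freespace extent. */
struct free_extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Allocate one bucket from the first non-empty extent, shrinking that
 * extent from the front. Returns true and stores the bucket number in
 * *bucket on success, false when no free buckets remain.
 */
static bool alloc_bucket(struct free_extent *ext, size_t nr, uint64_t *bucket)
{
	assert(ext && bucket);

	for (size_t i = 0; i < nr; i++) {
		if (ext[i].start < ext[i].end) {
			*bucket = ext[i].start++;
			return true;
		}
	}
	return false;
}
```

A long run of free buckets costs one entry here, where a per-bucket freelist would need one slot per bucket.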
2022-03-13 bcachefs: LRU btree (Kent Overstreet)
This implements new persistent LRUs, to be used for buckets containing cached data, as well as stripes ordered by time when a block became empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: KEY_TYPE_set (Kent Overstreet)
A new empty key type, to be used when using a btree as a set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 bcachefs: bch_sb_field_journal_v2 (Kent Overstreet)
Add a new superblock field which represents journal buckets as ranges: also move code for the superblock journal fields to journal_sb.c. This also reworks the code for resizing the journal to write the new superblock before using the new journal buckets, and thus be a bit safer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
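The range representation can be pictured with a short sketch that collapses a sorted list of bucket numbers into (start, nr) pairs. The struct and function names are made up for the example and are not the on-disk format of the new superblock field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry: nr consecutive buckets beginning at start. */
struct bucket_range {
	uint64_t start;
	uint64_t nr;
};

/*
 * Collapse a sorted list of bucket numbers into ranges. out must have
 * room for nr entries (the worst case, when no two buckets are adjacent).
 * Returns the number of ranges written.
 */
static size_t buckets_to_ranges(const uint64_t *buckets, size_t nr,
				struct bucket_range *out)
{
	size_t n = 0;

	assert(buckets && out);

	for (size_t i = 0; i < nr; i++) {
		if (n && out[n - 1].start + out[n - 1].nr == buckets[i])
			out[n - 1].nr++;	/* extends the previous range */
		else
			out[n++] = (struct bucket_range) {
				.start = buckets[i],
				.nr    = 1,
			};
	}
	return n;
}
```

A journal whose buckets are mostly contiguous then needs only a handful of entries in the superblock, however many buckets it has.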
2022-03-13 bcachefs: Fix BTREE_TRIGGER_WANTS_OLD_AND_NEW (Kent Overstreet)
BTREE_TRIGGER_WANTS_OLD_AND_NEW didn't work correctly when the old and new key were both alloc keys, but different versions - it required old and new key type to be identical, and this bug is a problem for the new allocator rewrite. This patch fixes it by checking if the old and new key have the same trigger functions - the different versions of alloc (and inode) keys have the same trigger functions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
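The difference between the two checks can be shown with a toy method table. Everything here (the enum, the table, the function names) is hypothetical and only mirrors the idea: two versions of a key type share one trigger function, so comparing function pointers accepts a pair that comparing key types rejects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical key types: two on-disk versions of alloc keys, plus inodes. */
enum key_type { KEY_ALLOC_V1, KEY_ALLOC_V2, KEY_INODE, KEY_TYPE_NR };

typedef int (*trigger_fn)(void);

/* Stand-in triggers; distinct return values keep them distinct functions. */
static int alloc_trigger(void) { return 1; }
static int inode_trigger(void) { return 2; }

/* Per-key-type method table, in the spirit of bkey_ops. */
static const trigger_fn key_trigger[KEY_TYPE_NR] = {
	[KEY_ALLOC_V1] = alloc_trigger,
	[KEY_ALLOC_V2] = alloc_trigger,
	[KEY_INODE]    = inode_trigger,
};

/* Old check: old and new key must be exactly the same type. */
static bool same_key_type(enum key_type old_type, enum key_type new_type)
{
	return old_type == new_type;
}

/* New check: old and new key only need to share a trigger function. */
static bool same_trigger(enum key_type old_type, enum key_type new_type)
{
	assert(old_type < KEY_TYPE_NR && new_type < KEY_TYPE_NR);
	return key_trigger[old_type] == key_trigger[new_type];
}
```

Overwriting a KEY_ALLOC_V1 key with a KEY_ALLOC_V2 key fails same_key_type() but passes same_trigger(), which is the case the fix is about.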
2022-03-13 bcachefs: Move trigger fns to bkey_ops (Kent Overstreet)
This replaces the switch statements in bch2_mark_key(), bch2_trans_mark_key() with new bkey methods - prep work for the next patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-13 fixup! bcachefs: Initial commit (Kent Overstreet)
2022-03-12 bcachefs: Revalidate pointer to old bkey val before calling mem triggers (Kent Overstreet)
We recently started stashing a copy of the key being overwritten in btree_insert_entry: this is helpful for avoiding multiple calls to bch2_btree_path_peek_slot() and bch2_journal_keys_peek() in the transaction commit path. But it turns out this has a problem - when we run mem/atomic triggers, we've done a couple things that can invalidate the pointer to the old key's value. This makes the optimization of stashing a pointer to the old value questionable, but for now this patch revalidates that pointer before running mem triggers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: bch2_trans_updates_to_text() (Kent Overstreet)
This turns bch2_dump_trans_updates() into a to_text() method - this way it can be used by debug tracing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: bch2_trans_inconsistent() (Kent Overstreet)
Add a new error macro that also dumps transaction updates in addition to doing an emergency shutdown - when a transaction update discovers or is causing a fs inconsistency, it's helpful to see what updates it was doing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Drop !did_work path from do_btree_insert_one() (Kent Overstreet)
As we've already reserved space in the journal this optimization doesn't actually buy us anything, and when doing list_journal debugging it deletes information we want. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: bch2_btree_iter_peek_upto() (Kent Overstreet)
In BTREE_ITER_FILTER_SNAPSHOTS mode, we skip over keys in unrelated snapshots. When we hit the end of an inode, if the next inode(s) are in a different subvolume, we could potentially have to skip past many keys before finding a key we can return to the caller, so they can terminate the iteration. This adds a peek_upto() variant to solve this problem, to be used when we know the range we're searching within. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
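A simplified model of the problem and the fix: keys live in a sorted array, snapshot filtering is reduced to an equality test (the real check is about snapshot ancestry), and the names below are invented for the sketch. The point is only that an upper bound lets the search stop at the first key past the range, instead of skipping every unrelated key that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: a position plus the snapshot it belongs to. */
struct key {
	uint64_t pos;
	uint32_t snapshot;
};

/*
 * Return the first key at index >= start that is visible in snapshot
 * and has pos <= end, or NULL. *skipped counts the filtered-out keys
 * that had to be stepped over.
 */
static const struct key *peek_upto(const struct key *keys, size_t nr,
				   size_t start, uint64_t end,
				   uint32_t snapshot, size_t *skipped)
{
	assert(keys && skipped);

	*skipped = 0;
	for (size_t i = start; i < nr; i++) {
		if (keys[i].pos > end)
			return NULL;	/* past the range: stop right away */
		if (keys[i].snapshot == snapshot)
			return &keys[i];
		(*skipped)++;
	}
	return NULL;
}
```

Passing UINT64_MAX as end reproduces the unbounded behaviour, where the search may walk over many keys from other snapshots before it finds anything to return.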
2022-03-12 bcachefs: Delay setting path->should_be_locked (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Add a missing wakeup (Kent Overstreet)
This fixes a rare bug with bch2_btree_flush_all_writes() getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Allocate journal buckets sequentially (Kent Overstreet)
This tweaks __bch2_set_nr_journal_buckets() so that we aren't reversing their order in the journal anymore - nice for rotating disks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: bch2_journal_log_msg() (Kent Overstreet)
This adds bch2_journal_log_msg(), which just logs a message to the journal, and uses it to mark startup and when journal replay finishes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Change flags param to bch2_btree_delete_range to update_flags (Kent Overstreet)
It wasn't used as iter_flags (excepting the unit tests, which this patch fixes), and the next patch is going to need to pass in BTREE_TRIGGER_NORUN. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Lock ordering asserts for traverse_all() (Kent Overstreet)
This adds some new assertions that we always take locks in the correct order while running under traverse_all() - we've been seeing some livelocks and a bit of strange behaviour, this helps ensure that everything is working the way we expect. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Fix lock ordering under traverse_all() (Kent Overstreet)
traverse_all() traverses btree paths in sorted order, so it should never see transaction restarts due to lock ordering violations. But some code in __bch2_btree_path_upgrade(), while necessary when not running under traverse_all(), was causing some confusing lock ordering violations - disabling this code under traverse_all() will let us put in some more assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Fix error handling in traverse_all() (Kent Overstreet)
In btree_path_traverse_all() we were failing to check for -EIO in the retry loop, and after btree node read error we'd go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
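The shape of the bug can be sketched as a retry loop that must tell "restart and try again" apart from a hard error. This is an illustration under assumptions, not the actual btree_path_traverse_all() code: ERR_RESTART, the callback signature and the fake traverse function are all invented here, and the sketch stops on every non-restart result, which includes -EIO.

```c
#include <assert.h>
#include <errno.h>

/* Assumption for this sketch: -EINTR means "transaction restart, retry". */
#define ERR_RESTART (-EINTR)

/* Fake traversal state: restart a few times, then return a final result. */
struct fake_path {
	int restarts_left;
	int final_ret;
};

static int fake_traverse(void *ctx)
{
	struct fake_path *p = ctx;

	if (p->restarts_left > 0) {
		p->restarts_left--;
		return ERR_RESTART;
	}
	return p->final_ret;
}

/*
 * Retry only while the callback asks for a restart. Any other result,
 * success or a hard error such as -EIO, ends the loop and is returned.
 */
static int traverse_with_retry(int (*traverse)(void *), void *ctx)
{
	int ret;

	assert(traverse);

	do {
		ret = traverse(ctx);
	} while (ret == ERR_RESTART);

	return ret;
}
```

A loop that retried on every nonzero return would spin forever once a node read started failing with -EIO, which matches the symptom described above.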
2022-03-12 bcachefs: Fix dio write path with loopback dio mode (Kent Overstreet)
When the iov_iter is a bvec iter, it's possible the IO was submitted from a kthread that didn't have an mm to switch to. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Use bio_iov_vecs_to_alloc() (Kent Overstreet)
This fixes a bug in the DIO read path where, when using a loopback device in DIO mode, we'd allocate a biovec that would get overwritten and leaked in bio_iov_iter_get_pages() -> bio_iov_bvec_set(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Revert UUID format-specifier change (Kent Overstreet)
"bcachefs: Log & error message improvements" accidentally changed the format specifier we use for converting UUIDs to strings, which broke mounting of encrypted filesystems - this patch reverts that change. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Skip periodic wakeup of journal reclaim when journal empty (Kent Overstreet)
Less system noise. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Check for rw before setting opts via sysfs (Kent Overstreet)
This isn't a correctness issue, it just eliminates errors in the dmesg log when we're RO. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Fix bch2_journal_flush_device_pins() (Kent Overstreet)
It's now legal for the pin fifo to be empty, which means this code needs to be updated in order to not hit an assert. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Fix pr_tab_rjust() (Kent Overstreet)
pr_tab_rjust() was broken and leaving a null somewhere in the output string - this patch fixes it and simplifies it a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Don't keep around btree_paths unnecessarily (Kent Overstreet)
When bch2_trans_begin() is called and there hasn't been a transaction restart, we presume that we're now doing something new - iterating over different keys, and we now shouldn't keep around paths related to the previous transaction, excepting the subvolumes btree. This should fix some of our "transaction path overflow" bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Don't arm journal->write_work when journal entry !open (Kent Overstreet)
This fixes a shutdown race where we were rearming journal->write_work after the journal has already shut down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Convert bch2_sb_to_text to master option list (Kent Overstreet)
Options no longer have to be manually added to bch2_sb_to_text() - it now uses the master list of options in opts.h. Also, improve some of the formatting by converting it to tabstops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Fix transaction path overflow in fiemap (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: respect superblock discard flag. (Daniel Hill)
We were accidentally using default mount options and overwriting the discard flag. Signed-off-by: Daniel Hill <daniel@gluo.nz>
2022-03-12 bcachefs: Fix usage of six lock's percpu mode (Kent Overstreet)
Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non-percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes. This patch fixes this by never switching 'struct btree' between the two modes, and instead segregating them into two different freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Refactor bch2_btree_node_mem_alloc() (Kent Overstreet)
This is prep work for the next patch, which is going to fix our usage of the percpu mode of six locks by never switching struct btree between the two modes - which means we need separate freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Simplify parameters to bch2_btree_update_start() (Kent Overstreet)
We don't need to pass the number of nodes required to bch2_btree_update_start, just whether we're doing a split at @level. This is prep work for a fix to our usage of six lock's percpu mode, which is going to require us to count up and allocate interior nodes and leaf nodes separately. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Make bch2_btree_cache_scan() try harder (Kent Overstreet)
Previously, when bch2_btree_cache_scan() attempted to reclaim a node but failed (because trylock failed, because it was dirty, etc.), it would count that against the number of nodes it was scanning and attempting to free. This patch changes that behaviour, so that nodes we fail to free are now only counted if they have the accessed bit set (which we also clear). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Finish writing journal after journal error (Kent Overstreet)
After emergency shutdown, all journal entries will be written as noflush entries, meaning they will never be used - but they'll still exist for debugging tools to examine. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: __journal_entry_close() never fails (Kent Overstreet)
Previous patch just moved responsibility for incrementing the journal sequence number and initializing the new journal entry from __journal_entry_close() to journal_entry_open(); this patch makes the analogous change for journal reservation state, incrementing the index into the array of journal_bufs at open time. This means that __journal_entry_close() never fails to close an open journal entry, which is important for the next patch that will change our emergency shutdown behaviour. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-03-12 bcachefs: Refactor journal code to not use unwritten_idx (Kent Overstreet)
It makes the code more readable if we work off of sequence numbers, instead of direct indexes into the array of journal buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
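As a sketch of what working off sequence numbers can look like: with a power-of-two ring of buffers, the buffer for a given sequence number is found by masking, and the number of unwritten entries is a difference of two sequence numbers, so no separate index has to be tracked. The struct layout, the ring size and the helper names below are assumptions for the example and may not match the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ring size for the sketch; must be a power of two. */
#define JOURNAL_BUF_NR   4
#define JOURNAL_BUF_MASK (JOURNAL_BUF_NR - 1)

static_assert((JOURNAL_BUF_NR & JOURNAL_BUF_MASK) == 0,
	      "ring size must be a power of two");

struct journal_buf {
	uint64_t seq;
	bool     written;
};

struct journal {
	struct journal_buf buf[JOURNAL_BUF_NR];
	uint64_t seq;		/* newest journal entry */
	uint64_t seq_ondisk;	/* newest entry fully written */
};

/* Map a sequence number to its slot in the ring of buffers. */
static struct journal_buf *journal_seq_to_buf(struct journal *j, uint64_t seq)
{
	return &j->buf[seq & JOURNAL_BUF_MASK];
}

/* Entries opened but not yet written, derived purely from sequence numbers. */
static unsigned journal_nr_unwritten(const struct journal *j)
{
	return (unsigned) (j->seq - j->seq_ondisk);
}
```

Code that needs "the oldest unwritten buffer" can then ask for journal_seq_to_buf(j, j->seq_ondisk + 1) rather than maintaining an unwritten index alongside the sequence numbers.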