Age  Commit message  Author
2023-01-06  bcachefs: bucket_gens btree  Kent Overstreet
To improve mount times, add a btree for just bucket gens, 256 of them per key: this means we'll have to scan drastically less metadata at startup. This adds:
- a trigger for keeping it in sync with the alloc btree
- initialization code, for filesystems from previous versions
- a new path for reading bucket gens
- new fsck code
And a new on-disk format version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
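The 256-gens-per-key layout can be pictured with a small sketch. It assumes the key index is the bucket number shifted right by 8 bits and the slot within the key is the low 8 bits; `gens_key`, `gens_key_index()`, `gens_slot()` and `gens_lookup()` are hypothetical names for illustration, not the bcachefs API.

```c
#include <stdint.h>

#define GENS_BITS    8u
#define GENS_PER_KEY (1u << GENS_BITS)   /* 256 gens per key, per the commit message */
#define GENS_MASK    (GENS_PER_KEY - 1u)

/* Hypothetical value of one key: the generation numbers of 256 consecutive buckets. */
struct gens_key {
	uint8_t gens[GENS_PER_KEY];
};

/* Which key holds the gen for this bucket (assumed mapping). */
static inline uint64_t gens_key_index(uint64_t bucket)
{
	return bucket >> GENS_BITS;
}

/* Which slot inside that key. */
static inline unsigned gens_slot(uint64_t bucket)
{
	return (unsigned)(bucket & GENS_MASK);
}

/* Look up a bucket's gen; buckets past the last key read as gen 0. */
static inline uint8_t gens_lookup(const struct gens_key *keys, uint64_t nr_keys,
				  uint64_t bucket)
{
	uint64_t k = gens_key_index(bucket);

	return k < nr_keys ? keys[k].gens[gens_slot(bucket)] : 0;
}
```

With this mapping a scan at startup touches one key per 256 buckets rather than one key per bucket.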
2023-01-06  bcachefs: New magic number  Kent Overstreet
Add a new bcachefs-specific magic number for the superblock, instead of continuing to use the old bcache magic number. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix a btree iter assertion pop  Kent Overstreet
This fixes a (harmless) broken invariant in __bch2_btree_path_set_pos(): iterators to interior nodes should point to the first non whiteout. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Simplify journal read path  Kent Overstreet
This just cleans up and simplifies the code that decides where to resume writing in the journal - when the code was originally written we weren't saving the precise location of every journal write found. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix a "no journal entries found" bug  Kent Overstreet
On startup, we need to ensure the first journal entry written is a flush write: after a clean shutdown we generally don't read the journal, which means we might be overwriting whatever was there previously, and there must always be at least one flush entry in the journal or recovery will fail. Found by fstests generic/388. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Don't error out when just reading the journal  Kent Overstreet
This tweaks the recovery and journal paths so that we don't error out before we need to: the list_journal command should work, even if we wouldn't be able to replay successfully. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Improve bch2_check_alloc_info()  Kent Overstreet
This factors out a new helper from bch2_dev_freespace_init(), bch2_get_key_or_hole(), and uses it in bch2_check_alloc_info(): we're now able to process holes in the alloc btree as ranges, instead of one bucket at a time. This will improve fsck performance on new filesystems, or filesystems where not every bucket has been used yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: bch2_trans_revalidate_updates_in_node()  Kent Overstreet
When we started stashing the key being overwritten in btree_insert_entry, this introduced a typical iterator invalidation problem, triggered by btree node splits or resorts. Previously, we dealt with this by unconditionally re-validating those stashed pointers in the transaction commit path. This patch gets rid of that by doing it only when needed, in bch2_trans_node_add() or bch2_trans_node_reinit_iter(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Improve bch2_dev_freespace_init()  Kent Overstreet
This makes bch2_dev_freespace_init() much faster: instead of processing every bucket on the device one at a time, we handle ranges of missing keys all at once: the freespace btree is an extents style btree, so we only have to insert one freespace key for every range of missing keys in the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
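The idea of handling missing keys as ranges, rather than one bucket at a time, can be sketched as below. This is an illustration under stated assumptions, not the bcachefs code: `find_holes()` and `struct range` are hypothetical, and the input is assumed to be a sorted, duplicate-free array of bucket numbers that do have keys.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open range of bucket numbers [start, end) with no key present. */
struct range {
	uint64_t start, end;
};

/*
 * Walk the sorted list of present buckets once and emit one range per hole,
 * instead of visiting every bucket in [0, nr_buckets).
 * Returns the number of ranges written to out (at most max_out).
 */
static size_t find_holes(const uint64_t *present, size_t nr_present,
			 uint64_t nr_buckets, struct range *out, size_t max_out)
{
	size_t n = 0;
	uint64_t next = 0;	/* first bucket not yet accounted for */

	for (size_t i = 0; i < nr_present; i++) {
		uint64_t b = present[i];

		if (b >= nr_buckets)
			break;
		if (b > next && n < max_out)
			out[n++] = (struct range){ next, b };
		next = b + 1;
	}

	/* Trailing hole after the last present bucket. */
	if (next < nr_buckets && n < max_out)
		out[n++] = (struct range){ next, nr_buckets };

	return n;
}
```

On a new or mostly empty device the number of ranges is small, so the work scales with the number of existing keys rather than the number of buckets.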
2023-01-06  bcachefs: Use for_each_btree_key_upto() more consistently  Kent Overstreet
It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use peek_upto() and provide an end for the interval we're searching for - otherwise, when we hit the end of the inode the next inode might be in a different subvolume and not have any keys in the current snapshot, and we'd iterate over arbitrarily many keys before returning one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: extents no longer require special handling for packing  Kent Overstreet
Extent overwrite used to be handled differently, underneath the journaling layer and within the core btree code. This imposed restrictions on bkey packing/packed formats, which no longer apply. This patch deletes those restrictions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Optimize bch2_alloc_to_v4()  Kent Overstreet
Inline fastpath - and also avoid a copy in the fastpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: New btree helpers  Kent Overstreet
This introduces some new conveniences, to help cut down on boilerplate:
- bch2_trans_kmalloc_nomemzero() - performance optimization
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix bch2_get_alloc_in_memory_pos()  Kent Overstreet
The code that determines how much alloc/backpointers we can fit in memory iterates keys in level 1 that point to leaf nodes: therefore there's no guarantee that the keys it sees correspond to valid devices. This fixes a bug where after device removal we'd call bucket_pos_to_bp() on a pos for an invalid device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: bch2_inode_opts_get()  Kent Overstreet
This improves io_opts() and makes it a non-inline function - it's big enough that it probably shouldn't be. Also, bch_io_opts no longer needs fields for whether options are defined, so we can slim it down a bit. We'd like to stop passing around the full bch_io_opts, but that'll be tricky because of bch2_rebalance_add_key(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix BCH_IOCTL_DISK_SET_STATE  Kent Overstreet
- Ensure we print an error message if necessary. Ideally we'd return the precise error code to userspace and leave printing the error message to the userspace tool, but we haven't decided to make our private error codes ABI-stable yet.
- Return standard error code to userspace
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Don't set accessed bit on btree node fill  Kent Overstreet
Btree nodes shouldn't have their accessed bit set when entering the btree cache by being read in from disk - this fixes linear scans thrashing the cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix an include  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Kill BCH_FEATURE_incompressible  Kent Overstreet
This isn't needed anymore, we only support metadata versions that have this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: bkey_min(), bkey_max()  Kent Overstreet
Parallel to bpos_min(), bpos_max() - trivial refactoring. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Better inlining in bch2_time_stats_update()  Kent Overstreet
Move the actual slowpath off into a new function - bch2_time_stats_clear_buffer() - and inline bch2_time_stats_update_one(). Also, use the new inlined update functions from mean_and_variance. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Optimize bch2_trans_iter_init()  Kent Overstreet
When flags & btree_id are constants, we can constant fold the entire calculation of the actual iterator flags - and the whole thing becomes small enough to inline. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: More dio inlining  Kent Overstreet
Eliminate another function call in the O_DIRECT write path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Kill some unneeded references to c->flags  Kent Overstreet
This drops some unneeded references to JOURNAL_REPLAY_DONE in c->flags: we're already mirroring it in btree_trans, we just weren't using it consistently. We may want to do this with more flags:
    btree_iter.c: unsigned nr = test_bit(BCH_FS_STARTED, &c->flags)
    btree_update_leaf.c: if (unlikely(!test_bit(BCH_FS_MAY_GO_RW, &c->flags))) {
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Optimize bch2_nocow_write()  Kent Overstreet
Previously, bch2_nocow_write() had to walk the pointers in the extent being written to three times - this patch deletes one of them, and saves some moderately expensive intermediate results: PTR_BUCKET() requires a divide, and this also saves the nocow lock hash computation instead of redoing it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: New bpos_cmp(), bkey_cmp() replacements  Kent Overstreet
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
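To show why dedicated predicates can generate less code than a three-way compare, here is a minimal sketch. It uses a hypothetical `struct pos` with inode, offset and snapshot fields compared in that order; the `pos_*()` names are stand-ins and the bodies are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position type, ordered by inode, then offset, then snapshot. */
struct pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Three-way compare: callers then test the result against 0. */
static inline int pos_cmp(struct pos l, struct pos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode ? -1 : 1;
	if (l.offset != r.offset)
		return l.offset < r.offset ? -1 : 1;
	if (l.snapshot != r.snapshot)
		return l.snapshot < r.snapshot ? -1 : 1;
	return 0;
}

/* Equality needs no ordering at all, only three field comparisons. */
static inline bool pos_eq(struct pos l, struct pos r)
{
	return l.inode == r.inode &&
	       l.offset == r.offset &&
	       l.snapshot == r.snapshot;
}

/* Strict ordering returns a boolean directly, with no -1/0/1 to materialize. */
static inline bool pos_lt(struct pos l, struct pos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode;
	if (l.offset != r.offset)
		return l.offset < r.offset;
	return l.snapshot < r.snapshot;
}

/* The remaining predicates follow from pos_lt(). */
static inline bool pos_le(struct pos l, struct pos r) { return !pos_lt(r, l); }
static inline bool pos_gt(struct pos l, struct pos r) { return pos_lt(r, l); }
static inline bool pos_ge(struct pos l, struct pos r) { return !pos_lt(l, r); }
```

A call site such as `if (pos_eq(a, b))` states the intent directly, where `if (!pos_cmp(a, b))` leaves the compiler to simplify a three-way result.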
2023-01-06  bcachefs: Improve bch2_inode_opts_to_opts()  Kent Overstreet
It turns out the *_defined entries of bch_io_opts are only used in one place - in the xattr get path - and there we immediately convert to a bch_opts struct, which also has the *_defined entries. This patch changes bch2_inode_opts_to_opts() to go directly from bch_inode_unpacked to bch_opts, which is a minor simplification and will also let us slim down struct bch_io_opts in another patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Add some unlikely() annotations  Kent Overstreet
Add a few easy unlikely() optimizations. These are mainly worthwhile because the compiler will (usually) put the branch-not-taken path at the end of the function, meaning better icache utilization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
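For readers outside the kernel, `unlikely()` is a thin wrapper around the GCC/Clang builtin `__builtin_expect`. The sketch below defines it the same way and uses it in a hypothetical function, `sum_nonneg()`, which is not from the patch.

```c
#include <stddef.h>

/* Hint that the condition is expected to be false (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Sum an array that is expected to hold only non-negative values.
 * The error branch is marked unlikely, so the compiler is free to move it
 * out of the hot loop body.
 */
static int sum_nonneg(const int *v, size_t n, long *out)
{
	long sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (unlikely(v[i] < 0))
			return -1;	/* cold path */
		sum += v[i];
	}

	*out = sum;
	return 0;
}
```

The annotation never changes the result; it only influences code layout and branch prediction hints.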
2023-01-06  bcachefs: Better inlining in core write path  Kent Overstreet
Provide inline versions of some allocation functions
- bch2_alloc_sectors_done_inlined()
- bch2_alloc_sectors_append_ptrs_inlined()
and use them in the core IO path. Also, inline bch2_extent_update_i_size_sectors() and bch2_bkey_append_ptr(). In the core write path, function call overhead matters - every function call is a jump to a new location and a potential cache miss. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Switch to golden ratio hash for nocow locks  Kent Overstreet
Siphash turned out to be too expensive, after profiling. jhash is terrible, but a standard golden ratio hash may have the right properties to do well enough here. We'll want to watch for lock contention due to excessive hash collisions, but we have a time_stats for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
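A golden ratio hash is a single multiply and shift, which is why it is so much cheaper than siphash. The sketch below follows the shape of the kernel's `hash_64()` and uses the same 64-bit constant; `hash_u64()` is a hypothetical userspace stand-in, and how the bucket is turned into the 64-bit input is not shown.

```c
#include <stdint.h>

/* 64-bit golden ratio constant, as used by the Linux kernel's hash_64(). */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Multiplicative hash: multiply by the constant and keep the top `bits` bits.
 * `bits` must be between 1 and 32.
 */
static inline uint32_t hash_u64(uint64_t val, unsigned bits)
{
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}
```

For a lock table of `1 << bits` entries, the result is used directly as the table index.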
2023-01-06  bcachefs: Inline bch2_two_state_(trylock|unlock)  Kent Overstreet
Standard inlining of fast paths - these locks are now used by our new nocow mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
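A two-state lock can be held by any number of holders in the same state, but never in both states at once. The sketch below shows only the trylock and unlock fast paths, using C11 atomics and a signed counter; the `two_state_*` names are hypothetical, and the blocking and wakeup side of a real lock is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Counter encoding (an assumption of this sketch):
 *   v > 0  held v times in state 1
 *   v < 0  held -v times in state 0
 *   v == 0 unlocked
 */
struct two_state_lock {
	atomic_long v;
};

/* Try to take the lock in `state` (0 or 1) without blocking. */
static inline bool two_state_trylock(struct two_state_lock *lock, int state)
{
	long step = state ? 1 : -1;
	long old = atomic_load(&lock->v);

	do {
		/* Held in the opposite state: fail rather than wait. */
		if (step > 0 ? old < 0 : old > 0)
			return false;
	} while (!atomic_compare_exchange_weak(&lock->v, &old, old + step));

	return true;
}

/* Drop one hold taken in `state`; a full lock would also wake waiters here. */
static inline void two_state_unlock(struct two_state_lock *lock, int state)
{
	long step = state ? 1 : -1;

	atomic_fetch_sub(&lock->v, step);
}
```

Because both operations are a short compare-and-swap or a single atomic subtract, they are natural candidates for inlining.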
2023-01-06  bcachefs: Better inlining in bch2_subvolume_get_snapshot()  Kent Overstreet
This provides an inlined version of bch2_subvolume_get() and uses it in bch2_subvolume_get_snapshot(), since this is the version that's used all over the place and in fast paths (e.g. IO paths). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Inline bch2_bkey_format_add_key()  Kent Overstreet
This is only called in two places, and when it's used we use it in a tight loop - it's definitely worth inlining. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Tiny bch2_trans_update_by_path_trace() optimization  Kent Overstreet
This just removes a redundant comparison - there's more work we could do here to remove some redundant copying. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Move some asserts behind CONFIG_BCACHEFS_DEBUG  Kent Overstreet
Convert some non-critical asserts in long-stable code to debug asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Split out __bch2_btree_node_get()  Kent Overstreet
Standard splitting out of the slow path from the fast path of a function. We may follow this up in another patch with inlining the fast path into btree_iter.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
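The fast path/slow path split follows a common pattern: a small inlinable function handles the hit case and tail-calls a separate, non-inlined function for everything else. The sketch below uses a hypothetical direct-mapped cache (`node_get()`, `node_get_slowpath()`); it is not the btree node cache, and filling a slot stands in for reading a node from disk.

```c
#include <stddef.h>

#define CACHE_SLOTS 16

/* Hypothetical cached object. */
struct node {
	unsigned long id;
	int valid;
};

static struct node cache[CACHE_SLOTS];
static unsigned long slow_fills;	/* counts slow path invocations */

/* Slow path: kept out of line so it does not bloat callers. */
__attribute__((noinline))
static struct node *node_get_slowpath(unsigned long id)
{
	struct node *n = &cache[id % CACHE_SLOTS];

	/* Stand-in for the expensive work (e.g. reading from disk). */
	n->id = id;
	n->valid = 1;
	slow_fills++;
	return n;
}

/* Fast path: a cache hit is a couple of loads and compares. */
static inline struct node *node_get(unsigned long id)
{
	struct node *n = &cache[id % CACHE_SLOTS];

	if (n->valid && n->id == id)
		return n;

	return node_get_slowpath(id);
}
```

Keeping the slow path out of line leaves the fast path small enough to inline into its callers later.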
2023-01-06  bcachefs: More errcode cleanup  Kent Overstreet
We shouldn't be overloading standard error codes now that we have provisions for bcachefs-specific errorcodes: this patch converts super.c and super-io.c to per error site errcodes, with a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Handle last journal write being torn  Kent Overstreet
If the last journal write didn't complete successfully due to a torn write, we'll detect it as a checksum error. In that case, we should just pretend that journal entry was never written. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
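The policy can be sketched with a toy journal. Everything here is hypothetical: `struct jentry`, the trivial checksum and `journal_usable_entries()` are illustrations only, and treating a bad checksum on an earlier entry as a hard error is an assumption of the sketch, not something the commit message states.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy journal entry with a stored checksum. */
struct jentry {
	uint64_t seq;
	uint32_t data;
	uint32_t csum;
};

/* Toy checksum, for illustration only. */
static uint32_t jentry_csum(const struct jentry *e)
{
	return (uint32_t)(e->seq * 2654435761u) ^ e->data;
}

/*
 * Returns how many leading entries are usable:
 *   - all n if every checksum matches
 *   - n - 1 if only the last entry fails (torn write: treat it as never written)
 *   - -1 if an earlier entry fails (assumed here to be real corruption)
 */
static long journal_usable_entries(const struct jentry *e, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (jentry_csum(&e[i]) == e[i].csum)
			continue;
		if (i == n - 1)
			return (long)i;
		return -1;
	}
	return (long)n;
}
```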
2023-01-06  bcachefs: Improve journal_read() logging  Kent Overstreet
Print out the journal entries we read and will replay as soon as possible - if we get an error validating keys it's helpful to know where it was in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Nocow support  Kent Overstreet
This adds support for nocow mode, where we do writes in-place when possible. Patch components:
- New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled.
- To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight), we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue.
- The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this.
- Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes; we may want to replace this with a sequence number.
- New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path.
XXX: switch to sequence numbers instead of bitmask for devs needing journal flush
XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Data update support for unwritten extents  Kent Overstreet
The data update path requires special support for unwritten extents - we still need to be able to move them, but there's no need to read or write anything. This patch adds a new error code to tell bch2_move_extent() that we're short circuiting the read, and adds bch2_update_unwritten_extent() to create a reservation then call __bch2_data_update_index_update(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Unwritten extents support  Kent Overstreet
- bch2_extent_merge checks unwritten bit
- read path returns 0s for unwritten extents without actually reading
- reflink path skips over unwritten extents
- bch2_bkey_ptrs_invalid() checks for extents with both written and unwritten extents, and non-normal extents (stripes, btree ptrs) with unwritten ptrs
- fiemap checks for unwritten extents and returns FIEMAP_EXTENT_UNWRITTEN
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: bch2_extent_update_i_size_sectors()  Kent Overstreet
In the io path, when we do the extent update we also have to update the inode - for i_size and i_sectors updates, as well as for bi_journal_seq for fsync. This factors that out into a new helper which will be used in the new nocow mode, in the unwritten extent conversion path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: bch2_extent_fallocate()  Kent Overstreet
This factors out part of __bchfs_fallocate() in fs-io.c into a new, lower level io.c helper, which creates a single extent reservation. This is prep work for nocow support - the new helper will shortly gain the ability to create unwritten extents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Improve verify_bucket_evacuated()  Kent Overstreet
On failure to evacuate bucket this now prints out the extents in that bucket, not just the alloc key: with unwritten extents, it's important to know what we failed to move as we have different codepaths for unwritten and written extents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix a transaction path overflow  Kent Overstreet
It turns out we need bch2_extent_trim_atomic() even when we're deleting extents one at a time because it's possible for one reflink_p to reference arbitrarily many reflink_v extents. This doesn't normally happen, but the data move path can fragment existing extents in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix a race with b->write_type  Kent Overstreet
b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
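Packing a small field into the same word as a flag bit lets both be updated in one atomic operation, so no reader can observe the flag without the matching type. The sketch below uses C11 atomics; the bit layout and the names `NODE_NEED_WRITE`, `node_set_need_write()` and `node_write_type()` are hypothetical, not the actual b->flags layout.

```c
#include <stdatomic.h>

/* Hypothetical layout: bit 0 is the need-write flag, bits 1-2 hold the write type. */
#define NODE_NEED_WRITE  (1ul << 0)
#define WRITE_TYPE_SHIFT 1
#define WRITE_TYPE_MASK  (3ul << WRITE_TYPE_SHIFT)

/* Set the need-write flag and the write type in a single compare-and-swap. */
static inline void node_set_need_write(atomic_ulong *flags, unsigned long type)
{
	unsigned long old = atomic_load(flags);
	unsigned long next;

	do {
		next = (old & ~WRITE_TYPE_MASK) |
		       NODE_NEED_WRITE |
		       ((type << WRITE_TYPE_SHIFT) & WRITE_TYPE_MASK);
	} while (!atomic_compare_exchange_weak(flags, &old, next));
}

/* Extract the write type from a snapshot of the flags word. */
static inline unsigned long node_write_type(unsigned long flags)
{
	return (flags & WRITE_TYPE_MASK) >> WRITE_TYPE_SHIFT;
}
```

With two separate variables, a concurrent writer could see the flag set while the type still held a stale value; a single word removes that window.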
2023-01-06  bcachefs: Error message improvement  Kent Overstreet
- Centralize format strings in bcachefs.h
- Add bch2_fmt_inum_offset() and related helpers
- Switch error messages for inodes to also print out the offset, in bytes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Improve a few warnings  Kent Overstreet
Warnings ought to always have a format string/log message - makes them considerably more useful. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-06  bcachefs: Fix for_each_btree_key2()  Kent Overstreet
Previously, when we exited from the loop body with a break statement _ret wouldn't have been assigned to yet, and we could spuriously return a transaction restart error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>