summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_io.c
AgeCommit message (Collapse)Author
2021-05-30bcachefs: Fix a deadlockKent Overstreet
Waiting on a btree node write with btree locks held can deadlock, if the write errors: the write error path has to do do a btree update to drop the pointer to the replica that errored. The interior update path has to wait on in flight btree writes before freeing nodes on disk. Previously, this was done in bch2_btree_interior_update_will_free_node(), and could deadlock; now, we just stash a pointer to the node and do it in btree_update_nodes_written(), just prior to the transactional part of the update.
2021-05-30bcachefs: Split out btree_error_wqKent Overstreet
We can't use btree_update_wq becuase btree updates may be waiting on btree writes to complete. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-25fixup! bcachefs: Add a debug mode that always reads from every btree replicaKent Overstreet
2021-05-24fixup! bcachefs: Add a debug mode that always reads from every btree replicaKent Overstreet
2021-05-22bcachefs: Fix an issue with inconsistent btree writes after unclean shutdownKent Overstreet
After unclean shutdown, btree writes may have completed on one device and not others - and this inconsistency could lead us to writing new bsets with a gap in our btree node in one of our replicas. Fortunately, this is only an issue with bsets that are newer than the most recent journal flush, and we already have a mechanism for detecting and blacklisting those. We just need to make sure to start new btree writes after the most recent _non_ blacklisted bset. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-22bcachefs: Add a workqueue for btree io completionsKent Overstreet
Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-22fixup! bcachefs: Add a debug mode that always reads from every btree replicaKent Overstreet
2021-05-22bcachefs: Add a debug mode that always reads from every btree replicaKent Overstreet
There's a new module parameter, verify_all_btree_replicas, that enables reading from every btree replica when reading in btree nodes and comparing them against each other. We've been seeing some strange btree corruption - this will hopefully aid in tracking it down and catching it more often. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Fix oob write in __bch2_btree_node_writeDan Robertson
Fix a possible out of bounds write in __bch2_btree_node_write when the data buffer padding is cleared up to the block size. The out of bounds write is possible if the data buffers size is not a multiple of the block size. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
2021-05-19bcachefs: New and improved topology repair codeKent Overstreet
This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Rewrite btree nodes with errorsKent Overstreet
This patch adds self healing functionality for btree nodes - if we notice a problem when reading a btree node, we just rewrite it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Punt btree writes to workqueue to submitKent Overstreet
We don't want to be submitting IO with btree locks held, and btree writes usually aren't latency sensitive. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Improve bset compactionKent Overstreet
The previous patch that fixed btree nodes being written too aggressively now meant that we weren't sorting btree node bsets optimally - this patch fixes that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Add a sysfs var for average btree write sizeKent Overstreet
Useful number for performance tuning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Add repair code for out of order keys in a btree node.Kent Overstreet
This just drops the offending key - in the bug report where this was seen, it was clearly a single bit memory error, and fsck will fix the missing key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Start using bpos.snapshot fieldKent Overstreet
This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Split out bpos_cmp() and bkey_cmp()Kent Overstreet
With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Drop bkey noopsKent Overstreet
Bkey noops were introduced to deal with trimming inline data extents in place in the btree: if the u64s field of a bkey was 0, that u64 was a noop and we'd start looking for the next bkey immediately after it. But extent handling has been lifted above the btree - we no longer modify existing extents in place in the btree, and the compatibilty code for old style extent btree nodes is gone, so we can completely drop this code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Validate bset version field against sb version fieldsKent Overstreet
The superblock version fields need to be accurate to know whether a filesystem is supported, thus we should be verifying them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Require all btree iterators to be freedKent Overstreet
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Use bch2_bpos_to_text() more consistentlyKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Use x-macros for more enumsKent Overstreet
This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: KEY_TYPE_discard is no longer usedKent Overstreet
KEY_TYPE_discard used to be used for extent whiteouts, but when handling over overlapping extents was lifted above the core btree code it became unused. This patch updates various code to reflect that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE()Kent Overstreet
bcachefs has been aggressively migrating filesystems and btree nodes to the new format for quite some time - this shouldn't affect anyone anymore, and lets us delete a _lot_ of code. Also, it frees up KEY_TYPE_discard for a new whiteout key type for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't drop ptrs to btree nodesKent Overstreet
If a ptr gen doesn't match the bucket gen, the bucket likely doesn't contain the data we want - but it's still possible the data we want might have been overwritten, and for btree node pointers we can verify whether or not the node is the one we wanted with the node's sequence number, so it's better to keep the pointer and try reading from it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Add code to scan for/rewite old btree nodesKent Overstreet
This adds a new data job type to scan for btree nodes in the old extent format, and rewrite them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Include device in btree IO error messagesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Add BTREE_PTR_RANGE_UPDATEDKent Overstreet
This is so that when we discover btree topology issues, we can just update the pointer to a btree node and signal btree read path that the min/max keys in the node header should be updated from the node pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix an assertion popKent Overstreet
There was a race: btree node writes drop their reference on journal pins before clearing the btree_node_write_in_flight flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Run jset_validate in write path as wellKent Overstreet
This is because we had a bug where we were writing out journal entries with garbage last_seq, and not catching it. Also, completely ignore jset->last_seq when JSET_NO_FLUSH is true, because of aforementioned bug, but change the write path to set last_seq to 0 when JSET_NO_FLUSH is true. Minor other cleanups and comments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Reduce/kill BKEY_PADDED useKent Overstreet
With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Try to print full btree error messageKent Overstreet
Metadata corruption bugs are hard to debug if we can't see exactly what went wrong - try to allocate a bigger buffer so we can print out everything we have. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't issue btree writes that weren't journalledKent Overstreet
If we have an error in the btree interior update path that prevents us from journalling the update, we can't issue the corresponding btree node write - we didn't get a journal sequence number that would cause it to be ignored in recovery. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Improve some IO error messagesKent Overstreet
it's useful to know whether an error was for a read or a write - this also standardizes error messages a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Add more debug checksKent Overstreet
tracking down a bug where we see a btree node pointer in the wrong node Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Hack around bch2_varint_decode invalid readsKent Overstreet
bch2_varint_decode can do reads up to 7 bytes past the end ptr, for the sake of performance - these extra bytes are always masked off. This won't be a problem in practice if we make sure to burn 8 bytes in any buffer that has bkeys in it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Add accounting for dirty btree nodes/keysKent Overstreet
This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Drop typechecking from bkey_cmp_packed()Kent Overstreet
This only did anything in two places, and those can just be replaced wiht bkey_cmp_left_packed()). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Drop sysfs interface to debug parametersKent Overstreet
It's not used much anymore, the module paramter interface is better. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Improve some error messagesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix a bug with the journal_seq_blacklist mechanismKent Overstreet
Previously, we would start doing btree updates before writing the first journal entry; if this was after an unclean shutdown, this could cause those btree updates to not be blacklisted. Also, move some code to headers for userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Convert various code to printbufKent Overstreet
printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Remove some uses of PAGE_SIZE in the btree codeKent Overstreet
For portability to userspace, we should try to avoid working in kernel pages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Add bch2_blk_status_to_str()Kent Overstreet
We define our own BLK_STS_REMOVED, so we need our own to_str helper too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Use x-macros for data typesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Mark btree nodes as needing rewrite when not all replicas are RWKent Overstreet
This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Use blk_status_to_str()Kent Overstreet
Improved error messages are always a good thing Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Change bch2_dump_bset() to also print key valuesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: btree_bkey_cached_commonKent Overstreet
This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>