summaryrefslogtreecommitdiff
path: root/fs/bcachefs/io.c
AgeCommit message (Collapse)Author
2021-06-02bcachefs: Check for errors from bch2_trans_update()Kent Overstreet
Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-22bcachefs: Add a workqueue for btree io completionsKent Overstreet
Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-20bcachefs: Fix for buffered writes getting -ENOSPCKent Overstreet
Buffered writes may have to increase their disk reservation at btree update time, due to compression and erasure coding being unpredictable: O_DIRECT writes should be checking for -ENOSPC, but buffered writes have already been accepted and should not. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Split extents if necessary in bch2_trans_update()Kent Overstreet
Currently, we handle multiple overlapping extents in the same transaction commit by doing fixups in bch2_trans_update() - this patch extents that to split updates when necessary. The next patch that changes the reflink code to not fragment extents when making them indirect will require this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Move io_in_flight ratelimiting to fs-io.cKent Overstreet
This fixes a bug where an async O_DIRECT write that required multiple bch2_write calls could deadlock, because bch2_write runs out of the same workqueue used for index updates and would block on the io_in_flight semaphore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: made changes to support clang, fixed a couple bugsBrett Holman
fs/bcachefs/bset.c edited prefetch macro to add clang support fs/bcachefs/btree_iter.c bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use fs/bcachefs/io.c bugfix: eliminated undefined behavior (negative bitshift) fs/bcachefs/buckets.c bugfix: invert sign to handle 64bit abs()
2021-05-19bcachefs: Fix reflink triggerKent Overstreet
The trigger for reflink pointers wasn't always incrementing/decrementing the refcounts correctly - this patch fixes that logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extentKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Start using bpos.snapshot fieldKent Overstreet
This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Get disk reservation when overwriting data in old snapshotKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-05-19bcachefs: Require all btree iterators to be freedKent Overstreet
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix read retry path for indirect extentsKent Overstreet
In the read path, for retry of indirect extents to work we need to differentiate between the location in the btree the read was for, vs. the location where we found the data. This patch adds that plumbing to bch_read_bio. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Consolidate bch2_read_retry and bch2_read()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Correctly order flushes and journal writes on multi device filesystemsKent Overstreet
All writes prior to a journal write need to be flushed before the journal write itself happens. On single device filesystems, it suffices to mark the write with REQ_PREFLUSH|REQ_FUA, but on multi device filesystems we need to issue flushes to every device - and wait for them to complete - before issuing the journal writes. Previously, we were issuing flushes to every device, but we weren't waiting for them to complete before issuing the journal writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Reduce/kill BKEY_PADDED useKent Overstreet
With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Change when we allow overwritesKent Overstreet
Originally, we'd check for -ENOSPC when getting a disk reservation whenever the new extent took up more space on disk than the old extent. Erasure coding screwed this up, because with erasure coding writes are initially replicated, and then in the background the extra replicas are dropped when the stripe is created. This means that with erasure coding enabled, writes will always take up more space on disk than the data they're overwriting - but, according to posix, overwrites aren't supposed to return ENOSPC. So, in this patch we fudge things: if the new extent has more replicas than the _effective_ replicas of the old extent, or if the old extent is compressed and the new one isn't, we check for ENOSPC when getting the disk reservation - otherwise, we don't. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't use BTREE_INSERT_USE_RESERVE so muchKent Overstreet
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix iterator overflow in move pathKent Overstreet
The move path was calling bch2_bucket_io_time_reset() for cached pointers (which it shouldn't have been), and then not calling bch2_trans_reset() when it got -EINTR (indicating transaction restart). Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't write bucket IO time lazilyKent Overstreet
With the btree key cache code, we don't need to update the alloc btree lazily - and this will mean we can remove the bch2_alloc_write() call in the shutdown path. Future work: we really need to expend the bucket IO clocks from 16 to 64 bits, so that we don't have to rescale them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Always check if we need disk res in extent update pathKent Overstreet
With erasure coding, we now have processes in the background that compact data, causing it to take up less space on disk than when it was written, or potentially when it was read. This means that we can't trust the page cache when it says "we have data on disk taking up x amount of space here" - there's always the potential to race with background compaction. To fix this, just check if we need to add to our disk reservation in the bch2_extent_update() path, in the transaction that will do the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Improve some IO error messagesKent Overstreet
it's useful to know whether an error was for a read or a write - this also standardizes error messages a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: bch2_trans_get_iter() no longer returns errorsKent Overstreet
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: New varintsKent Overstreet
Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Build fixes for 32bit x86Kent Overstreet
PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying... also switch to atomic64_cmpxchg instead of cmpxchg() for journal_seq_copy, as atomic64_cmpxchg has a fallback that uses spinlocks for when it's not supported. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Indirect inline data extentsKent Overstreet
When inline data extents were added, reflink was forgotten about - we need indirect inline data extents for reflink + inline data to work correctly. This patch adds them, and a new feature bit that's flipped when they're used. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix rare use after free in read pathKent Overstreet
If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent() reallocates the buffer, k in bch2_read - which we pointed at the bkey_on_stack buffer - will now point to a stale buffer. Whoops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't drop replicas when copygcing ec dataKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix a couple null ptr derefs when no disk groups existKent Overstreet
Normally successfully parsing a target means disk groups should exist, but we don't want a BUG() or null ptr deref if we end up with an invalid target. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't block on allocations when only writing to specific deviceKent Overstreet
Since the copygc thread is now global and not per device, we're not freeing up space on any one device in bounded time - and indeed we never really were, since rebalance wasn't moving data around between devices with that objective. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't disallow btree writes to RO devicesKent Overstreet
There's an inherent race with setting devices RO when they have dirty btree nodes on them. We already check if a btree node is on an RO device before we dirty it, so this patch just allows those writes so that we don't have errors forcing the entire filesystem read only when trying to remove a device. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix a race with BCH_WRITE_SKIP_CLOSURE_PUTKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Add bch2_blk_status_to_str()Kent Overstreet
We define our own BLK_STS_REMOVED, so we need our own to_str helper too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Wrap write path in memalloc_nofs_save()Kent Overstreet
This fixes a lockdep splat where we're allocating memory with vmalloc in the compression bounce path, which doesn't always obey GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Use x-macros for data typesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Use blk_status_to_str()Kent Overstreet
Improved error messages are always a good thing Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Refactor dio write code to reinit bch_write_opKent Overstreet
This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT was set incorrectly, causing the completion to be delivered multiple times. oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Hacky io-in-flight throttlingKent Overstreet
We've been seeing btree updates get stuck, due to some sort of bug; when this happens, buffered writeback will keep queueing up writes that lead to the system running out of memory. Not sure if this kind of throttling is something we'll want to keep and improve, or get rid of when the bug with btree updates getting stuck is fixed. For now it should make debugging easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: fix stack corruptionYuxuan Shui
When a bkey_on_stack is passed to bch_read_indirect_extent, there is no guarantee that it will be big enough to hold the bkey. And bch_read_indirect_extent is not aware of bkey_on_stack to call realloc on it. This cause a stack corruption. This commit makes bch_read_indirect_extent aware of bkey_on_stack so it can call realloc when appropriate. Tested-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
2021-04-27bcachefs: Fix a workqueue deadlockKent Overstreet
writes running out of a workqueue (via dio path) could block and prevent other writes from calling bch2_write_index() and completing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Shut down quickerKent Overstreet
Internal writes (i.e. copygc/rebalance operations) shouldn't be blocking on the allocator when we're going RO. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix another iterator leakKent Overstreet
This updates bch2_rbio_narrow_crcs() to the current style for transactional btree code, and fixes a rare panic on iterator overflow. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Don't log errors that are expected during shutdownKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERSKent Overstreet
All iterators should be released now with bch2_trans_iter_put(), so TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is always used. Also convert more code to __bch2_trans_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Check for bad key version numberKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Track incompressible dataKent Overstreet
This fixes the background_compression option: wihout some way of marking data as incompressible, rebalance will keep rewriting incompressible data over and over. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Make sure bch2_read_extent obeys BCH_READ_MUST_CLONEKent Overstreet
This fixes the bch2_read_retry_nodecode() path, we were resubmitting a bio without properly reinitializing it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Split out btree_trigger_flagsKent Overstreet
The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Convert some enums to x-macrosKent Overstreet
Helps for preventing things from getting out of sync. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2021-04-27bcachefs: Fix a use after freeKent Overstreet
op->end_io may free the op struct Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>