Age | Commit message | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This behavior dates from the early, early days of bcache, and upon
further delving appears to not make any sense. The shrinker only works
in terms of 'objects' of unknown size; normalizing to pages only had the
effect of changing the batch size, which we could do directly - if we
wanted; we probably don't. Normalizing to pages meant our batch size was
very small, which seems to have been keeping us from doing as much
shrinking as we should be under heavy memory pressure; this patch
appears to alleviate some OOMs we've been seeing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
|
|
mark_stripe_bucket() was busted; it was using @new uninitialized.
Also, clean up all the gc mark functions, and convert them to the same
style.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This neatly avoids bugs where we fail partway through initializing a new
filesystem, if we just don't write out partly-initialized state.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
With printbufs, it's now easy to build up multi-line log messages and
emit them with one call, which is good because it prevents multiple
multi-line log messages from getting interspersed in the log buffer;
this patch also improves the formatting and converts it to latest style.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
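
As an illustration of the idea in the commit above, here is a minimal userspace sketch of building a multi-line message in a buffer and emitting it with one call. The names struct msgbuf, msgbuf_printf() and msgbuf_emit() are hypothetical and are not the kernel's printbuf API; the fixed 256-byte size is also an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical fixed-size buffer; the kernel's struct printbuf differs. */
struct msgbuf {
	char	buf[256];
	size_t	pos;
};

/* Append formatted text; output that does not fit is silently truncated. */
static void msgbuf_printf(struct msgbuf *m, const char *fmt, ...)
{
	va_list args;
	size_t room;
	int n;

	assert(m->pos < sizeof(m->buf));
	room = sizeof(m->buf) - m->pos;

	va_start(args, fmt);
	n = vsnprintf(m->buf + m->pos, room, fmt, args);
	va_end(args);

	if (n > 0)
		m->pos += (size_t) n < room ? (size_t) n : room - 1;
}

/* Emit the whole message with a single call, so lines stay together. */
static void msgbuf_emit(const struct msgbuf *m, FILE *f)
{
	fputs(m->buf, f);
}
```

A caller would zero-initialize a struct msgbuf, append each line with msgbuf_printf(), and finish with one msgbuf_emit(&m, stderr).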
|
|
Trivial cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
In a few places we were passing a variable to pr_buf() for the format
string - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
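
The class of bug described above can be sketched in plain C. This is not the pr_buf() code itself; log_text() is a hypothetical helper that stands in for any printf-style call.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy arbitrary text into out via a printf-style call. */
static int log_text(char *out, size_t size, const char *text)
{
	assert(out && size);

	/*
	 * Buggy form: snprintf(out, size, text) would treat any '%' in
	 * text as the start of a conversion. Passing text as an argument
	 * to a constant "%s" format copies it verbatim.
	 */
	return snprintf(out, size, "%s", text);
}
```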
|
|
- bch2_clear_need_discard() was using bch2_trans_relock() incorrectly,
and always bailing out before doing any work - ouch.
- Add a tracepoint that fires every time bch2_do_discards() runs, and
tells us about the work it did
- When too many buckets aren't able to be discarded because they need a
journal commit, bch2_do_discards now flushes the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This patch introduces bch2_alloc_to_v4_mut() which returns a
bkey_i_alloc_v4 *, which then can be passed to bch2_trans_update()
directly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.
Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
For backpointers, we'll need to delete old backpointers before adding
new backpointers - otherwise we'll run into spurious duplicate
backpointer errors.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
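
To see why the ordering matters, consider a toy model in which a bucket keeps a small set of backpointers and rejects duplicates. Everything here (struct bp_set, bp_add(), bp_del(), bp_update(), integer backpointers) is a hypothetical simplification, not the bcachefs data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BP 8

/* Hypothetical per-bucket set of backpointers, each identified by an int. */
struct bp_set {
	int	bp[MAX_BP];
	size_t	nr;
};

/* Returns -1 if v is already present (the "duplicate backpointer" case). */
static int bp_add(struct bp_set *s, int v)
{
	for (size_t i = 0; i < s->nr; i++)
		if (s->bp[i] == v)
			return -1;
	assert(s->nr < MAX_BP);
	s->bp[s->nr++] = v;
	return 0;
}

/* Returns -1 if v was not present. */
static int bp_del(struct bp_set *s, int v)
{
	for (size_t i = 0; i < s->nr; i++)
		if (s->bp[i] == v) {
			s->bp[i] = s->bp[--s->nr];
			return 0;
		}
	return -1;
}

/* Delete the old backpointer first, then add the new one. */
static int bp_update(struct bp_set *s, int old_bp, int new_bp)
{
	int ret = bp_del(s, old_bp);

	return ret ? ret : bp_add(s, new_bp);
}
```

If the add ran first, rewriting a key whose old and new backpointers are identical would fail with a duplicate error even though the final state is valid.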
|
|
For backpointers, we need to switch the order triggers are run in: we
need to run triggers for deletions/overwrites before triggers for
inserts.
To avoid breaking the reflink triggers, this patch moves deleting of
indirect extents with refcount=0 to their triggers, instead of doing it
when we update those keys.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Print bucket:offset when the filesystem is online; this makes debugging
easier when correlating with alloc updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Add a new helper for logging messages to the journal - a new debugging
tool, an alternative to trace_printk().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Inspired by CCAN darray - simple, stupid resizable (dynamic) arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
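
For readers unfamiliar with the pattern, a resizable array of this kind can be sketched as follows. This version is int-only and uses hypothetical names (struct int_darray, int_darray_push(), int_darray_exit()); the darray added by the commit is presumably type-generic and its API may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical int-only dynamic array. */
struct int_darray {
	int	*data;
	size_t	nr;	/* elements in use */
	size_t	size;	/* elements allocated */
};

/* Append v, doubling capacity when full. Returns 0, or -1 if out of memory. */
static int int_darray_push(struct int_darray *d, int v)
{
	if (d->nr == d->size) {
		size_t new_size = d->size ? d->size * 2 : 8;
		int *p = realloc(d->data, new_size * sizeof(*p));

		if (!p)
			return -1;
		d->data = p;
		d->size = new_size;
	}

	assert(d->nr < d->size);
	d->data[d->nr++] = v;
	return 0;
}

/* Free the backing storage and reset to the empty state. */
static void int_darray_exit(struct int_darray *d)
{
	free(d->data);
	d->data = NULL;
	d->nr = d->size = 0;
}
```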
|
|
This fixes a bug where __bch2_btree_node_update_key() wasn't clearing
should_be_locked, leading to bch2_btree_path_traverse() always failing -
all callers of btree_path_make_mut() want should_be_locked cleared, so
do it there.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_btree_iter_next_node() was mucking with other btree_path state
without setting path->uptodate to be consistent with the fact that the
path is very much no longer uptodate - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Since the bucket invalidate and discard paths are required for other
allocations to make forward progress, they at a minimum need
BTREE_INSERT_USE_RESERVE. Watermarks may need further work.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This adds counters for each of the reasons we may skip allocating a
bucket - we're seeing a bug where we loop endlessly trying to allocate
when we should have plenty of buckets available, so hopefully this will
help us track down why.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
It's currently possible to end up in a half-upgraded state where we
haven't set the superblock to the new version, but we have run the
freespace initialization path. Previously, this meant when running fsck
on such a filesystem we wouldn't check the freespace btrees - which is a
problem, if they have been initialized and there's something fsck needs
to check and fix.
Fix this by making bch2_check_alloc_info() check if freespace has been
initialized on each device, not by making it run conditionally on the
superblock version.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We weren't properly catching errors from snapshot_live() - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch2_journal_space_available -> bch2_journal_halt() self deadlocks on
journal lock; work around this by dropping/retaking journal lock before
we call bch2_fatal_error().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
When deleting an entry from a heap that was at entry h->used - 1, we'd
end up calling heap_sift() on an entry outside the heap - the entry we
just removed - which would end up re-adding it to the heap and deleting
something we didn't want to delete. Oops...
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
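
The bug above can be shown with a small array-backed min-heap. This is a sketch with hypothetical names (struct int_heap, heap_add(), heap_del()) and a fixed capacity, not the bcachefs heap macros; the relevant part is the early return in heap_del().

```c
#include <assert.h>
#include <stddef.h>

struct int_heap {
	int	data[16];
	size_t	used;
};

static void heap_swap(struct int_heap *h, size_t a, size_t b)
{
	int t = h->data[a];

	h->data[a] = h->data[b];
	h->data[b] = t;
}

static void heap_sift_up(struct int_heap *h, size_t i)
{
	while (i && h->data[i] < h->data[(i - 1) / 2]) {
		heap_swap(h, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

static void heap_sift_down(struct int_heap *h, size_t i)
{
	for (;;) {
		size_t c = 2 * i + 1;

		if (c >= h->used)
			break;
		if (c + 1 < h->used && h->data[c + 1] < h->data[c])
			c++;
		if (h->data[i] <= h->data[c])
			break;
		heap_swap(h, i, c);
		i = c;
	}
}

static void heap_add(struct int_heap *h, int v)
{
	size_t i = h->used++;

	assert(i < 16);
	h->data[i] = v;
	heap_sift_up(h, i);
}

static void heap_del(struct int_heap *h, size_t i)
{
	assert(i < h->used);
	h->used--;

	/*
	 * Deleting the last entry: there is nothing to move into slot i,
	 * and slot i is now outside the heap, so it must not be sifted.
	 */
	if (i == h->used)
		return;

	h->data[i] = h->data[h->used];
	heap_sift_up(h, i);
	heap_sift_down(h, i);
}
```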
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This tells the compiler to check printf format strings, and catches a
few bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
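
With GCC and Clang, this kind of checking is requested through the format function attribute. The sketch below uses a hypothetical helper, buf_printf(), rather than the function the commit actually annotated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper. format(printf, 3, 4) tells the compiler that
 * argument 3 is a printf-style format string and that the variadic
 * arguments to check against it start at argument 4.
 */
__attribute__((format(printf, 3, 4)))
static int buf_printf(char *out, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(out && size);

	va_start(args, fmt);
	n = vsnprintf(out, size, fmt, args);
	va_end(args);
	return n;
}
```

With the attribute in place, a call such as buf_printf(out, size, "%s", 42) draws a -Wformat warning at compile time instead of misbehaving at run time.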
|
|
We've been seeing a very strange bug where journal flush & reclaim delay
end up getting inexplicably zeroed in the superblock. We're now
validating all the options in bch2_validate_super(), and 0 is no longer
a valid value for those options, but we need to be careful not to
prevent people's filesystems from mounting because of the new
validation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Something funny is going on with the new code for restoring the journal
write point, and it's hard to reproduce.
We do want to debug this because resuming writing to the journal in the
wrong spot could be something serious. For now, replace the assertion
with an error message and revert to old behaviour when it happens.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
We're seeing a very strange bug where journal_flush_delay sometimes gets
set to 0 in the superblock. Together with the preceding patch, this
should help us track it down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This moves validation of superblock options to bch2_sb_validate(), so
they'll be checked in the write path as well.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Now we've got strings for metadata versions - this changes
bch2_sb_to_text() and our mount log message to use it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Previously, we'd go into an infinite loop when attempting to cache a
bkey in the key cache larger than 128 u64s - since we were only using a
u8 for the size field, it'd get rounded up to 256 then truncated to 0.
Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
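
The arithmetic behind this bug is easy to reproduce in isolation. The helper names below are hypothetical, and the wider field is only one possible remedy; the commit message does not say how the fix was made.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (v >= 1). */
static unsigned roundup_pow2(unsigned v)
{
	unsigned r = 1;

	assert(v >= 1);
	while (r < v)
		r <<= 1;
	return r;
}

/* Buggy variant: the size is stored in 8 bits, so 256 wraps to 0. */
static uint8_t alloc_u64s_u8(unsigned key_u64s)
{
	return (uint8_t) roundup_pow2(key_u64s);
}

/* With a 16-bit field, 256 is representable. */
static uint16_t alloc_u64s_u16(unsigned key_u64s)
{
	return (uint16_t) roundup_pow2(key_u64s);
}
```

Any key larger than 128 u64s rounds up to 256, which does not fit in a u8, so the stored size is 0 and never appears large enough for the key.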
|
|
We don't want to run triggers when repairing inside bch2_gc(): these
triggers cause ERO due to errors that will later be fixed by
bch2_check_alloc_info().
Signed-off-by: Daniel Hill <daniel@gluo.nz>
|
|
These warnings are symptomatic of something else going wrong; we don't
want them spamming up the logs as that'll make it harder to find the
real issue.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.
This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
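
The difference between a single "may get unreserved" bit and ordered watermarks can be sketched as below. The enum names, the thresholds (1/4 and 1/8 free) and both functions are hypothetical and chosen only for illustration; they are not taken from the journal code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical levels, ordered from least to most privileged. */
enum wm {
	WM_any,		/* ordinary reservations */
	WM_copygc,	/* copygc may proceed while ordinary callers block */
	WM_reserved,	/* last-resort reservations only */
};

/* Illustrative thresholds: the required level rises as the journal fills. */
static enum wm journal_cur_watermark(unsigned free, unsigned total)
{
	assert(total && free <= total);

	if (free * 8 < total)		/* under 1/8 free */
		return WM_reserved;
	if (free * 4 < total)		/* under 1/4 free */
		return WM_copygc;
	return WM_any;
}

/* A caller may reserve if its level is at least the current watermark. */
static bool journal_may_reserve(enum wm cur, enum wm caller)
{
	return caller >= cur;
}
```

With levels ordered this way, copygc keeps making progress after ordinary callers start blocking, and is itself blocked only once the journal is very close to full.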
|
|
We don't actually want copygc allocations to be nowait - an allocation
for copygc might fail and then later succeed due to a bucket needing to
wait on journal commit, or to be discarded.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This patch tweaks the journal recovery path so that we start writing
right after where we left off, instead of the next empty bucket. This is
partly prep work for supporting zoned devices, but it's also good to do
in general to avoid the journal completely filling up and getting stuck.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
- bucket_alloc_fail now indicates whether allocation was nonblocking
- we now return strings, not integers, for alloc reserve.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Also include the number of buckets available, and the number of buckets
awaiting journal commit - and add a sysfs counter, too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This switches struct bucket to using a lock, instead of cmpxchg. And now
that the protected members no longer need to fit into a u64, we can
expand the sector counts to 32 bits.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This is one of the last steps in getting rid of the main in-memory
bucket array.
This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead
of bucket_mark, and for the places where we are in fact working with
bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that
takes bucket_mark and converts to bkey_alloc_unpacked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Don't run triggers when repairing incorrect/missing lru entries.
Triggers create a conflicting call to lru_change() with the incorrect
lru ptr: lru_change() attempts to delete this incorrect lru entry and
fails because the back ptr doesn't match the original bucket, causing
fsck to error.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
In the old allocator code, preparing an existing empty bucket was part
of the same code path that invalidated buckets containing cached data.
In the new allocator code this is no longer the case: the main allocator
path finds empty buckets (via the new freespace btree), and can't
allocate buckets that contain cached data.
We now need a separate code path to invalidate buckets containing cached
data when we're low on empty buckets, which this patch implements. When
the number of free buckets drops too low, the new invalidate path is
triggered; it uses the LRU btree to pick cached data buckets to
invalidate until we're above our watermark.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
In the old allocator code, buckets would be discarded just prior to
being used - this made sense in bcache where we were discarding buckets
just after invalidating the cached data they contain, but in a
filesystem where we typically have more free space we want to be
discarding buckets when they become empty.
This patch implements the new behaviour - it checks the need_discard
btree for buckets awaiting discards, and then clears the appropriate
bit in the alloc btree, which moves the buckets to the freespace btree.
Additionally, discards are now enabled by default.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.
Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.
The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This adds two new btrees for the upcoming allocator rewrite: an extents
btree of free buckets, and a btree for buckets awaiting discards.
We also add a new trigger for alloc keys to keep the new btrees up to
date, and a compatibility path to initialize them on existing
filesystems.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|