Age | Commit message | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
This reverts commit 1079b0ee3256715c5eb72aa15be22e4cd39c4b05.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Rework the way mark and sweep gc and the allocator threads are kicked:
Now, mark and sweep gc is only kicked from the allocator thread when the
allocator thread can't find free buckets, and when there are buckets with
saturated counters _and_ we have tried to subtract one allocator batch's worth
of sectors from saturated buckets since the last gc.
The allocator thread now explicitly waits on the number of available buckets;
it is kicked from buckets.c when we make buckets available, and from garbage
collection when it resets the bucket counts at the end.
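As a rough userspace sketch of that wait/wake pattern (illustrative names
only, not the actual bcache code):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buckets_available = PTHREAD_COND_INITIALIZER;
static unsigned long nr_available_buckets;

/* Allocator side: explicitly wait on the number of available buckets. */
static void alloc_wait_for_buckets(void)
{
	pthread_mutex_lock(&lock);
	while (!nr_available_buckets)
		pthread_cond_wait(&buckets_available, &lock);
	nr_available_buckets--;
	pthread_mutex_unlock(&lock);
}

/* buckets.c side - and gc, when it resets the bucket counts: kick the
 * allocator whenever buckets become available. */
static void make_buckets_available(unsigned long nr)
{
	pthread_mutex_lock(&lock);
	nr_available_buckets += nr;
	pthread_cond_broadcast(&buckets_available);
	pthread_mutex_unlock(&lock);
}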
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Superblock marking needs to be pulled out of bch_bucket_alloc(), because of
locking changes coming in the next patch.
This patch also correctly marks devices that have journal entries as containing
metadata.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Also add an explicit struct write_point, which the caller of bch_data_insert()
specifies: the idea is that the write_point specifies how the allocation should
happen.
This makes a lot of things cleaner - copygc can specify the correct write point
before calling bch_data_move(), which means that it can strip off the pointer to
the old copy instead of having gc_alloc_sectors() do everything - the old way
with gc_alloc_sectors() was fairly sketchy and difficult to follow.
Also, since bch_alloc_sectors() doesn't modify any of the old pointers, we can
get rid of the old ptrs_to_write() bitfield - bch_alloc_sectors() is only
appending new pointers, so we just write out the new pointers starting from
wherever the existing ones ended.
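A minimal sketch of the idea, with hypothetical field names (the real
struct write_point is defined in the bcache source and differs):

struct open_bucket;

struct write_point {
	struct open_bucket *ob;	/* bucket currently being appended to */
	unsigned reserve;	/* which allocation reserve to draw from */
	unsigned tier;		/* target tier for the data */
};

Copygc, tiering and foreground writes can then each pass their own
write_point describing how their allocation should happen.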
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Need to drop old stale pointers before adding new ones
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
This fixes a performance issue where e.g. a tiering write may block the shared
workqueue that foreground writes are also using, because bch_data_insert_start()
-> generic_make_request() is blocking in a driver's make request fn.
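A userspace analog of the fix - hand the potentially-blocking submission
off to a dedicated thread so the shared workqueue is never held up
(sketch only, hypothetical names):

#include <pthread.h>
#include <stddef.h>

struct punt_item {
	struct punt_item *next;
	void (*fn)(struct punt_item *);	/* the potentially-blocking submission */
};

static pthread_mutex_t punt_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t punt_wakeup = PTHREAD_COND_INITIALIZER;
static struct punt_item *punt_list;

/* Called from the shared workqueue: queue the work and return at once. */
static void punt(struct punt_item *item)
{
	pthread_mutex_lock(&punt_lock);
	item->next = punt_list;
	punt_list = item;
	pthread_cond_signal(&punt_wakeup);
	pthread_mutex_unlock(&punt_lock);
}

/* Dedicated thread: it alone absorbs the time spent blocked in the
 * driver's make request fn. */
static void *punt_thread(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&punt_lock);
		while (!punt_list)
			pthread_cond_wait(&punt_wakeup, &punt_lock);
		struct punt_item *item = punt_list;
		punt_list = item->next;
		pthread_mutex_unlock(&punt_lock);

		item->fn(item);	/* may block; only this thread waits */
	}
	return NULL;
}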
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
When we need to do a sync write (i.e. wait on the journal write to hit disk), we
do that by passing a closure to the journalling code to go on a waitlist.
bch_btree_insert_node() was using the wrong closure - a closure can only be on
one waitlist at a time, and the closure in struct btree_op can be on other
waitlists. Make the closure an explicit argument so the caller can pass one
that's safe to use.
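The constraint, sketched with hypothetical names - the wait link is
embedded in the closure itself, so there is exactly one of it:

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct closure_ish {
	struct closure_ish *next;	/* the single embedded wait link */
	bool queued;
};

struct waitlist {
	struct closure_ish *head;
};

static void waitlist_add(struct waitlist *list, struct closure_ish *cl)
{
	/* Parking a closure that is already on another waitlist would
	 * corrupt both lists - hence the explicit closure argument, so the
	 * caller can pass one it knows to be idle. */
	assert(!cl->queued);
	cl->queued = true;
	cl->next = list->head;
	list->head = cl;
}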
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Previously there was a chance this would return wrong data. Now, we
use a custom endio function for bch_read()'s leaf bios. If the ptr
is stale upon completion of the read, requeue the bio on a
per-cache set list and kick off a work item to feed it back through
bch_bio().
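The shape of that completion path, with hypothetical types and helpers
standing in for the real bbio/cache-set machinery:

struct read_ctx {
	unsigned ptr_gen;	/* bucket generation the read was issued against */
};

/* stand-ins for the sketch */
extern unsigned bucket_gen_now(struct read_ctx *);
extern void requeue_for_retry(struct read_ctx *);	/* per-cache-set list + work item */
extern void complete_read(struct read_ctx *);

static void read_endio(struct read_ctx *ctx)
{
	/* If the pointer went stale while the read was in flight, the
	 * bucket may have been reused - don't return the data, feed the
	 * request back through the lookup from scratch. */
	if (ctx->ptr_gen != bucket_gen_now(ctx))
		requeue_for_retry(ctx);
	else
		complete_read(ctx);
}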
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
- add punt parameter to bch_submit_bbio() and use it from
bch_submit_bbio_replicas()
- open-code bch_bbio_prep() now that it's only called in one
place
- rename __bch_bbio_prep() to bch_bbio_prep()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
This eliminates the requirement to zero out the entire bio first, which
should save CPU time. It also simplifies bch_read_fn().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
These are being introduced for locking btree nodes, replacing the rw semaphores
that were used previously (and also the write_lock mutex).
This both simplifies the old contorted locking scheme and will greatly improve
concurrency.
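"Six" here is shared/intent/exclusive - sketching the three states (the
enum below matches the scheme as it later appears in bcachefs):

enum six_lock_type {
	SIX_LOCK_read,		/* shared: many concurrent readers */
	SIX_LOCK_intent,	/* intent to modify: excludes other intent
				 * holders, but readers may still come and go */
	SIX_LOCK_write,		/* exclusive: held only for the actual
				 * modification, and only by the intent holder */
};

The win over rw semaphores is that an update can hold intent locks on
several nodes at once without blocking readers, taking the exclusive
state only briefly.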
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
This is prep work for introducing six locks - with six locks, we're going to
have to do some funny things with btree locking in order to avoid deadlocks. It
ought to allow some other refactorings too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Otherwise, we needlessly run btree GC every 30 seconds. Also, change gc_count
into an atomic and get rid of gc_lock.
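The counter change, in miniature (C11 atomics as a userspace stand-in
for the kernel's atomic_t):

#include <stdatomic.h>

static atomic_uint gc_count;

/* No gc_lock needed just to bump or read a counter. */
static void gc_done(void)
{
	atomic_fetch_add(&gc_count, 1);
}

static unsigned gc_count_read(void)
{
	return atomic_load(&gc_count);
}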
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
No need to trim the replace key, which means we don't need a temporary key
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
There was no point in the mark_*() functions not being inlined, so move them to
request.c and inline them.
Also add a stat for discards, and make all the sector counts print out in sysfs
in human-readable units.
Reformat stuff to match normal bcache style.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Mostly renaming tracepoints for consistency - also add a tiering_copy tracepoint
to match the gc_copy tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
This was incorrect, at least as added - dirty pointers are never going to go
stale out from under us, so it was leading to spurious data corruption errors
with fault injection enabled.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
The old code was wrong because cache_promote() assumes the bio is a
bbio with a valid key. Also, soon bch_read() will want to set its own
endio function. This, too, will require a bbio with a valid key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
- search->bypass was not being initialized, so we would randomly
bypass the insert
- cache_promote_op wasn't properly handling read race; it must set
a flag to skip the cache insert step, but it must not fail the
original bio from struct search with -EINTR, or else we fail the
overall IO
- we would chop off all but one ptrs from the replace_key, which
would fail with a collision if the key had multiple ptrs
- actually update cache_hit, cache_miss stats for flash-only volumes
bch_read() is still totally broken.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Previously, it wasn't being called in the ioctl IO paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
With the recent work to make allocation fully asynchronous, we now shouldn't
need different workqueues for tiering, copy gc, etc. - so consolidate down to a
single workqueue.
This ought to help performance a bit: previously, a write request would run out
of two different workqueues for different parts of the write path, whereas now
we'll only be using a single worker thread.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Performance optimization - the punting is needed for btree node writes, but it
isn't for normal data writes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
If a closure is not passed in, we will wait on the btree_op's closure
when the map function returns -EAGAIN.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
This closure is used to wait on btree node allocation. The code around
the bch_data_insert_keys() function has been redone to use this closure.
Also, the cache_lookup() function in request.c is now called by s->op.cl
and not s->iop.cl. This continues the effort to decouple the read path
from the data_insert_op, which should only be used for writes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
bch_btree_insert() now blocks on bucket allocation, and request.c
directly uses bch_btree_map_leaf_nodes() / bch_btree_insert_node().
This actually simplifies some things and removes code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
If the source tier is full, btree_check_reserve() would block. This could
prevent tiering from making progress in copying to the destination tier.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Previously functions that operated on btrees would take both a btree_op and
a btree_id. Adding the id to btree_op simplifies parameter lists. It also
allows the caller to pass in the btree node reserve to use for allocations.
This is cleaner than having a moving_gc bit inside btree_op, and with a
future patch that adds a tiering reserve it eliminates the need to have a
separate tiering bit.
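Roughly, with illustrative names (the real definitions differ):

enum btree_id {
	BTREE_ID_EXTENTS,
	BTREE_ID_INODES,
	/* ... */
};

struct btree_op {
	enum btree_id id;	/* which btree this op operates on */
	unsigned reserve;	/* btree node reserve for allocations:
				 * replaces the old moving_gc bit, and a
				 * future tiering reserve needs no new bit */
	/* ... */
};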
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Everywhere else it runs in btree_insert_wq; here it was using op->wq for
some reason.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
We'd like to eventually stop touching data_insert_op on read, so
add a new bypass bit to 'struct search', and rename data_insert_op's
'bypass' field to 'discard' to reflect what it's doing (discarding
the key range).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
- split off cache_set->mca_wait from btree_cache_wait
- change btree_cache_wait to a closure_waitlist
- btree_check_reserve() now takes a closure and will return
-EAGAIN if the allocation should be retried once the closure
completes, instead of the old behavior of adding the op to a
wait queue and returning -EINTR
- bch_btree_insert_node() and bch_btree_insert() now use the
parent closure for waiting on btree allocation; a new
flush parameter indicates if this closure should be used
for journal writes too
- callers of bch_btree_insert() which expect it to block on
allocation now call a new bch_btree_insert_sync() helper (see the
sketch below)
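A sketch of how such a sync wrapper can sit on top of the -EAGAIN
convention (stub types and a simplified signature; the real
bch_btree_insert() takes more arguments):

#include <errno.h>
#include <stdbool.h>

/* Stubs so the sketch stands alone; the real definitions live in
 * bcache's closure.h and btree code. */
struct closure { int stub; };
extern void closure_init_stack(struct closure *);
extern void closure_sync(struct closure *);
extern int bch_btree_insert(struct closure *parent, bool flush);

static int btree_insert_sync_sketch(void)
{
	struct closure cl;
	int ret;

	closure_init_stack(&cl);

	/* -EAGAIN means "retry once the closure you passed in completes",
	 * so a synchronous caller just waits and loops. */
	while ((ret = bch_btree_insert(&cl, true)) == -EAGAIN)
		closure_sync(&cl);

	return ret;
}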
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
This just preserves the original meaning of bypass (don't write to or
promote to the cache).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|