path: root/drivers/md/bcache/request.c
Age          Commit message                                                 Author
2017-01-18  bcache: move top level read/write code to io.c  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: implement -ENOSPC  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: add real locking for cache_tier  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  Revert "bcache: use bch_read in cached_dev_make_request()" (SQUASH)  (Kent Overstreet)
This reverts commit 1079b0ee3256715c5eb72aa15be22e4cd39c4b05.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Mark data from cache misses as cached  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: rework alloc_failed()  (Kent Overstreet)
Rework the way mark and sweep gc and the allocator threads are kicked: now, mark and sweep gc is only kicked from the allocator thread when the allocator thread can't find free buckets, and when there are buckets with saturated counters _and_ we have tried to subtract one allocator batch's worth of sectors from saturated buckets since the last gc.

The allocator thread now explicitly waits on the number of available buckets, and is kicked from buckets.c when we make buckets available, and from garbage collection when we reset the bucket counts at the end.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
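
A minimal userspace sketch of the wait/kick relationship described in this entry: the allocator sleeps until buckets become available and is woken from the bucket-freeing path and from gc when it finishes. The names and the pthread-based waiting are assumptions for illustration only, not the kernel code, which uses its own primitives:

/*
 * Illustrative model: the allocator thread sleeps until buckets are
 * available; the bucket-freeing path and end-of-gc wake it up.
 * Hypothetical userspace sketch using pthreads, not the bcache code.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  buckets_available = PTHREAD_COND_INITIALIZER;
static unsigned nr_available_buckets;

/* Allocator thread: block until at least one bucket is available. */
static void allocator_wait_for_buckets(void)
{
        pthread_mutex_lock(&lock);
        while (!nr_available_buckets)
                pthread_cond_wait(&buckets_available, &lock);
        nr_available_buckets--;
        pthread_mutex_unlock(&lock);
}

/*
 * Called from the bucket-freeing path, and from gc after it resets the
 * bucket counts: make buckets available and kick the allocator.
 */
static void make_buckets_available(unsigned nr)
{
        pthread_mutex_lock(&lock);
        nr_available_buckets += nr;
        pthread_cond_broadcast(&buckets_available);
        pthread_mutex_unlock(&lock);
}

int main(void)
{
        make_buckets_available(1);
        allocator_wait_for_buckets();
        printf("allocator got a bucket\n");
        return 0;
}
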
2017-01-18  bcache: Add bch_check_mark_super()  (Kent Overstreet)
Superblock marking needs to be pulled out of bch_bucket_alloc(), because of locking changes coming in the next patch.

This patch also correctly marks devices that have journal entries as containing metadata.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: rip out list of open buckets for a simple hash by write point  (Kent Overstreet)
Also add an explicit struct write_point, that the caller of bch_data_insert() specifies: the idea is that write_point specifies how the allocation should happen.

This makes a lot of things cleaner - copygc can specify the correct write point before calling bch_data_move(), which means that it can strip off the pointer to the old copy instead of having gc_alloc_sectors() do everything - the old way with gc_alloc_sectors() was fairly sketchy and difficult to follow.

Also, since bch_alloc_sectors() doesn't modify any of the old pointers, we can get rid of the old ptrs_to_write() bitfield - bch_alloc_sectors() is only appending new pointers, so we just write pointers starting from wherever they ended when we started.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
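
As a rough illustration of the "hash by write point" idea: each logical write stream (foreground writes, copygc, tiering) hashes to its own open bucket slot instead of scanning one shared list. This is a hypothetical userspace sketch; the structures, sizes, and names are invented and are not the allocator's actual code:

/*
 * Hypothetical sketch of "hash by write point": each write stream maps
 * to its own open bucket slot.  Not the bcache allocator code.
 */
#include <stdio.h>

#define WRITE_POINT_HASH_NR 16

struct write_point {
        unsigned long id;               /* identifies a write stream */
};

struct open_bucket {
        unsigned long wp_id;            /* write point using this bucket */
        unsigned sectors_free;
};

static struct open_bucket open_buckets[WRITE_POINT_HASH_NR];

static unsigned write_point_hash(const struct write_point *wp)
{
        return (wp->id * 2654435761u) % WRITE_POINT_HASH_NR;
}

/*
 * Foreground writes, copygc and tiering each pass their own write_point,
 * so they end up appending to different open buckets.
 */
static struct open_bucket *pick_open_bucket(const struct write_point *wp)
{
        struct open_bucket *b = &open_buckets[write_point_hash(wp)];

        if (b->wp_id != wp->id || !b->sectors_free) {
                b->wp_id = wp->id;
                b->sectors_free = 128;  /* pretend a fresh bucket was allocated */
        }
        return b;
}

int main(void)
{
        struct write_point foreground = { .id = 1 }, copygc = { .id = 2 };

        printf("foreground slot %u, copygc slot %u\n",
               write_point_hash(&foreground), write_point_hash(&copygc));
        pick_open_bucket(&foreground);
        return 0;
}
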
2017-01-18  bcache: don't overrun buffer in bch_data_insert()  (Kent Overstreet)
Need to drop old stale pointers before adding new ones.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: selectable checksum type  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: add struct cache_member to superblock  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: BITMASK() macro improvements  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: fix typos in comments  (Slava Pestov)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Give tiering, copygc their own workqueues for bcache write  (Kent Overstreet)
This fixes a performance issue where e.g. a tiering write may block the shared workqueue that foreground writes are also using, because bch_data_insert_start() -> generic_make_request() is blocking in a driver's make request fn.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: use bch_read() in flash_dev_make_request()  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: return IO error when data's missing that's not supposed to be  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: fix overflow with sectors_until_gc  (Slava Pestov)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Remove devices at runtime  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: refactor prio clock stuff a bit  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Wait on journal write with correct closure  (Kent Overstreet)
When we need to do a sync write (i.e. wait on the journal write to hit disk), we do that by passing a closure to the journalling code to go on a waitlist.

bch_btree_insert_node() was using the wrong closure - a closure can only be on one waitlist at a time, and the closure in struct btree_op can be on other waitlists. Make the closure an explicit argument so the caller can pass one that's safe to use.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
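
The "one waitlist at a time" constraint is easiest to see with an intrusive list node: the same node cannot be linked into two lists without corrupting one of them. A hypothetical sketch of that constraint (not bcache's closure implementation):

/*
 * Hypothetical sketch: an intrusive wait-list entry can only sit on one
 * list at a time.  All names here are invented for illustration.
 */
#include <stddef.h>
#include <stdio.h>

struct waiter {
        struct waiter *next;            /* intrusive link: one list only */
};

struct waitlist {
        struct waiter *head;
};

static void waitlist_add(struct waitlist *wl, struct waiter *w)
{
        /*
         * If w were already on another list, that list would now chain
         * through w into this one - which is why the journal wait gets
         * its own closure from the caller instead of reusing the one
         * embedded in btree_op, which may already be on a waitlist.
         */
        w->next = wl->head;
        wl->head = w;
}

int main(void)
{
        struct waitlist journal_waiters = { NULL }, alloc_waiters = { NULL };
        struct waiter op_waiter = { NULL };      /* lives in a btree_op */
        struct waiter journal_waiter = { NULL }; /* supplied by the caller */

        waitlist_add(&alloc_waiters, &op_waiter);
        waitlist_add(&journal_waiters, &journal_waiter);  /* safe */
        /* waitlist_add(&journal_waiters, &op_waiter) would be the bug */
        printf("journal waiter queued on its own\n");
        return 0;
}
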
2017-01-18  bcache: handle read bucket invalidate race in bch_read()  (Slava Pestov)
Previously there was a chance this would return wrong data. Now, we use a custom endio function for bch_read()'s leaf bios. If the ptr is stale upon completion of the read, requeue the bio on a per-cache-set list and kick off a work item to feed it back through bch_bio().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
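
The race above comes down to a generation check at read completion: a pointer is stale once its bucket has been invalidated and reused while the read was in flight. A simplified sketch of that check and the requeue decision, with hypothetical field and function names:

/*
 * Simplified sketch: a bucket carries a generation that is bumped when
 * it is invalidated and reused; an extent pointer records the
 * generation it was created against.  Names are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct bucket {
        uint8_t gen;            /* bumped each time the bucket is reused */
};

struct extent_ptr {
        uint8_t gen;            /* generation the pointer was written in */
        struct bucket *bucket;
};

static bool ptr_stale(const struct extent_ptr *ptr)
{
        return ptr->bucket->gen != ptr->gen;
}

/*
 * Endio-style completion: if the pointer went stale while the read was
 * in flight, the data may have been overwritten - requeue the request
 * and retry the lookup instead of completing it.
 */
static void read_endio(const struct extent_ptr *ptr)
{
        if (ptr_stale(ptr))
                printf("stale read: requeue and retry the lookup\n");
        else
                printf("read completed against a live pointer\n");
}

int main(void)
{
        struct bucket b = { .gen = 3 };
        struct extent_ptr ptr = { .gen = 3, .bucket = &b };

        read_endio(&ptr);       /* live */
        b.gen++;                /* bucket invalidated while read in flight */
        read_endio(&ptr);       /* stale: would be requeued */
        return 0;
}
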
2017-01-18  bcache: clean up bbio code a bit  (Slava Pestov)
- add punt parameter to bch_submit_bbio() and use it from bch_submit_bbio_replicas()
- open-code bch_bbio_prep() now that it's only called in one place
- rename __bch_bbio_prep() to bch_bbio_prep()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: better comments in request.c  (Slava Pestov)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: bch_read() uses MAP_HOLES just like cache_lookup()  (Slava Pestov)
This eliminates the requirement to zero out the entire bio first, which should save CPU time. It also simplifies bch_read_fn().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: handle deadlock introduced by six locks  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: SIX locks (shared/intent/exclusive)  (Kent Overstreet)
These are being introduced for locking btree nodes, replacing the rw semaphores that were used previously (and also the write_lock mutex). This both simplifies the old contorted locking scheme, and will greatly improve concurrency.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
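
A sketch of the shared/intent/exclusive idea: shared holders coexist with each other and with a single intent holder, intent holders exclude other intent holders, and exclusive excludes everything, so a would-be writer can hold intent while traversing without blocking readers and without racing other writers. This is a compatibility model only, with invented names - not the kernel's six lock implementation:

/*
 * Compatibility model for shared/intent/exclusive.  Invented names;
 * not the actual six lock code.
 */
#include <stdbool.h>
#include <stdio.h>

enum six_lock_type { SIX_SHARED, SIX_INTENT, SIX_EXCL };

struct six_state {
        unsigned shared;        /* number of shared holders */
        bool     intent;        /* at most one intent holder */
        bool     excl;          /* at most one exclusive holder */
};

static bool six_may_grant(const struct six_state *s, enum six_lock_type t)
{
        if (s->excl)
                return false;   /* exclusive excludes everything */

        switch (t) {
        case SIX_SHARED:
                return true;    /* coexists with shared and intent */
        case SIX_INTENT:
                return !s->intent;
        case SIX_EXCL:
                /*
                 * In practice exclusive is typically taken as an upgrade
                 * by the intent holder; this model only asks whether a
                 * different thread could take it right now.
                 */
                return !s->intent && !s->shared;
        }
        return false;
}

int main(void)
{
        struct six_state s = { .shared = 2, .intent = true };

        printf("shared while shared+intent held: %d\n",
               six_may_grant(&s, SIX_SHARED));          /* 1 */
        printf("second intent holder:            %d\n",
               six_may_grant(&s, SIX_INTENT));          /* 0 */
        printf("exclusive while readers present: %d\n",
               six_may_grant(&s, SIX_EXCL));            /* 0 */
        return 0;
}
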
2017-01-18  bcache: Track locks held in btree_op  (Kent Overstreet)
This is prep work for introducing six locks - with six locks, we're going to have to do some funny things with btree locking in order to avoid deadlocks. It ought to allow some other refactorings too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: tiering and moving GC only trigger btree GC if allocator is waiting  (Slava Pestov)
Otherwise, we needlessly run btree GC every 30 seconds. Also, change gc_count into an atomic and get rid of gc_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: fix __cache_promote()  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Pass bbios as bbios  (Kent Overstreet)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: simplify bch_read_fn()  (Kent Overstreet)
No need to trim the replace key, which means we don't need a temporary key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: stats cleanup  (Kent Overstreet)
There was no point in the mark_*() functions not being inlined, so move them to request.c and inline them. Also add a stat for discards, and make all the sector counts print out in sysfs in human readable units. Reformat stuff to match normal bcache style.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: More tracepoints  (Kent Overstreet)
Mostly renaming tracepoints for consistency - also add a tiering_copy tracepoint to match the gc_copy tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: drop bch_bbio_stale()  (Kent Overstreet)
This was incorrect, at least as added - dirty pointers are never going to go stale out from under us, so this was leading to spurious data corruption errors with fault injection enabled.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: always clone bio in bch_read()  (Slava Pestov)
The old code was wrong because cache_promote() assumes the bio is a bbio with a valid key. Also, soon bch_read() will want to set its own endio function. This, too, will require a bbio with a valid key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: cache promotion fixes  (Slava Pestov)
- search->bypass was not being initialized, so we would randomly bypass the insert
- cache_promote_op wasn't properly handling read race; it must set a flag to skip the cache insert step, but it must not fail the original bio from struct search with -EINTR, or else we fail the overall IO
- we would chop off all but one ptr from the replace_key, which would fail with a collision if the key had multiple ptrs
- actually update cache_hit, cache_miss stats for flash-only volumes

bch_read() is still totally broken.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Call bch_increment_clock() from read/write code  (Kent Overstreet)
Previously, it wasn't being called in the ioctl io paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Use fewer workqueues  (Kent Overstreet)
With the recent stuff to make allocation all asynchronous, we now shouldn't need different workqueues for tiering, copy gc, etc. - so consolidate down to a single workqueue.

This ought to help performance a bit, as previously a write request would run out of two different workqueues for different parts of the write path; now we'll only be using a single worker thread.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: Don't punt bio submits to wq in bch_data_insert()  (Kent Overstreet)
Performance optimization - the punting is needed for btree node writes, but it isn't for normal data writes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: add MAP_ASYNC flag to map_nodes / map_keys  (Slava Pestov)
If not passed in, we will wait on the btree_op's closure if the map function returns -EAGAIN.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: put a closure inside btree_op  (Slava Pestov)
This closure is used to wait on btree node allocation. The code around the bch_data_insert_keys() function has been redone to use this closure.

Also, the cache_lookup() function in request.c is now called by s->op.cl and not s->iop.cl. This continues the effort to decouple the read path from the data_insert_op, which should only be used for writes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: refactor bch_btree_insert()  (Slava Pestov)
bch_btree_insert() now blocks on bucket allocation, and request.c directly uses bch_btree_map_leaf_nodes() / bch_btree_insert_node(). This actually simplifies some things and removes code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: btree node reserve for tiering  (Slava Pestov)
If the source tier is full, btree_check_reserve() would block. This could prevent tiering from making progress in copying to the destination tier.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: add id and reserve fields to btree_op  (Slava Pestov)
Previously functions that operated on btrees would take both a btree_op and a btree_id. Adding the id to btree_op simplifies parameter lists. It also allows the caller to pass in the btree node reserve to use for allocations. This is cleaner than having a moving_gc bit inside btree_op, and with a future patch that adds a tiering reserve it eliminates the need to have a separate tiering bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: bch_data_insert_keys() used wrong workqueue  (Slava Pestov)
Everywhere else it runs in btree_insert_wq; here it was using op->wq for some reason.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: clean up data_insert_op->bypass  (Slava Pestov)
We'd like to eventually stop touching data_insert_op on read, so add a new bypass bit to 'struct search', and rename data_insert_op's 'bypass' field to 'discard' to reflect what it's doing (discarding the key range).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: updating comments in request.c and friends  (Slava Pestov)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: asynchronous btree node allocation  (Slava Pestov)
- split off cache_set->mca_wait from btree_cache_wait
- change btree_cache_wait to a closure_waitlist
- btree_check_reserve() now takes a closure and will return -EAGAIN if the allocation should be retried once the closure completes, instead of the old behavior of adding the op to a wait queue and returning -EINTR
- bch_btree_insert_node() and bch_btree_insert() now use the parent closure for waiting on btree allocation; a new flush parameter indicates if this closure should be used for journal writes too
- callers of bch_btree_insert() which expect it to block on allocation now call a new bch_btree_insert_sync() method
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
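
The shape of the -EAGAIN-plus-closure pattern in the list above: the async path registers the caller's closure and returns -EAGAIN when the btree node reserve is empty, and a synchronous wrapper simply waits on that closure and retries. A hypothetical userspace sketch; the names and the simulated wait are assumptions, not the actual bch_btree_insert_sync() code:

/*
 * Hypothetical sketch of a synchronous wrapper around an async insert
 * that returns -EAGAIN while a reserve is empty.  Invented names.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct closure {
        bool done;              /* stand-in for a real closure */
};

static int attempts;

/* Pretend the reserve is empty on the first attempt only. */
static int btree_insert_async(struct closure *cl)
{
        if (attempts++ == 0) {
                cl->done = false;       /* caller will be woken later */
                return -EAGAIN;
        }
        return 0;
}

static void wait_on_closure(struct closure *cl)
{
        /* In the kernel this would block until the reserve is refilled. */
        if (!cl->done)
                printf("waiting for btree node reserve...\n");
        cl->done = true;
}

/* Callers that are allowed to block loop until the retry succeeds. */
static int btree_insert_sync(void)
{
        struct closure cl = { .done = true };
        int ret;

        while ((ret = btree_insert_async(&cl)) == -EAGAIN)
                wait_on_closure(&cl);

        return ret;
}

int main(void)
{
        printf("insert returned %d after %d attempts\n",
               btree_insert_sync(), attempts);
        return 0;
}
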
2017-01-18  bcache: log errors from cache lookup  (Slava Pestov)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2017-01-18  bcache: obey bypass in cache_lookup_fn  (Kent Overstreet)
This is just preserving the original meaning of bypass (don't write to or promote to the cache).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>