Age | Commit message | Author |
|
MAP_END_KEY was created for cases (like cache_lookup_fn) where, even if we
_don't_ find the key we're looking for, we still need to do something with that
btree node.
This has since come up in various places (inode/dirent creation in particular),
but it was only just now that I realized that in all these cases what we really
want to do is iterate over the _keyspace_, not just the keys that happen to be
present.
So, that's what MAP_HOLES does now - the code that implements it is gross, but
it _considerably_ simplifies all the users.
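To illustrate the difference, here's a minimal userspace sketch (the types,
names and hole synthesis are purely illustrative - this is not the bcache
iterator API): with MAP_HOLES the map fn also gets called for the gaps between
the keys that are present, so it sees the whole keyspace.

#include <stdio.h>

/* Illustrative model only, not the bcache btree iterator. */
struct extent { unsigned start, end; };		/* [start, end) */

typedef void (*map_fn)(const struct extent *e, int is_hole);

/*
 * Walk the whole keyspace [0, keyspace_end), synthesizing hole extents for
 * the gaps between the keys that are actually present.
 */
static void map_keyspace(const struct extent *keys, unsigned nr,
			 unsigned keyspace_end, map_fn fn)
{
	unsigned pos = 0;

	for (unsigned i = 0; i < nr; i++) {
		if (keys[i].start > pos) {
			struct extent hole = { pos, keys[i].start };
			fn(&hole, 1);		/* MAP_HOLES-style callback */
		}
		fn(&keys[i], 0);
		pos = keys[i].end;
	}

	if (pos < keyspace_end) {
		struct extent hole = { pos, keyspace_end };
		fn(&hole, 1);
	}
}

static void print_fn(const struct extent *e, int is_hole)
{
	printf("%s [%u, %u)\n", is_hole ? "hole" : "key ", e->start, e->end);
}

int main(void)
{
	struct extent keys[] = { { 8, 16 }, { 32, 40 } };

	map_keyspace(keys, 2, 64, print_fn);
	return 0;
}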
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Previously, if a btree map fn invalidated an iterator, it would force
bch_btree_map_keys()/bch_btree_map_nodes() to redo the lookup by returning
-EINTR.
But we're going to rework bch_btree_map_keys() to make sure it passes every key
to the map fn precisely once, so we need to distinguish cases where the map fn
does need to be called again (for which it will still return -EINTR) from cases
where it's done with that key but the iterator was still invalidated.
This also means the map fn doesn't need to know whether bch_btree_insert_node(),
or something else it calls, invalidated the iterator.
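A toy userspace model of the distinction (MAP_RETRY/MAP_CONTINUE and the driver
loop are made up for illustration; only -EINTR comes from the real code):
returning -EINTR means "call me again for this key", while merely invalidating
the iterator means "redo the lookup, but move on to the next key".

#include <errno.h>
#include <stdio.h>

#define MAP_CONTINUE	0		/* done with this key		*/
#define MAP_RETRY	(-EINTR)	/* call again for the same key	*/

static int map_fn(int key, int *iter_valid)
{
	static int prepared;	/* stand-in for "insert not prepared yet" */

	if (!prepared) {
		prepared = 1;
		*iter_valid = 0;	/* we invalidated the iterator...     */
		return MAP_RETRY;	/* ...and still need this key again   */
	}

	printf("handled key %d\n", key);
	prepared = 0;
	*iter_valid = 0;		/* invalidated, but done with the key */
	return MAP_CONTINUE;
}

int main(void)
{
	int keys[] = { 10, 20, 30 };
	int i = 0, iter_valid = 1;

	while (i < 3) {
		int ret = map_fn(keys[i], &iter_valid);

		if (!iter_valid)
			iter_valid = 1;	/* stand-in for redoing the lookup */

		if (ret != MAP_RETRY)
			i++;		/* each key is passed precisely once */
	}
	return 0;
}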
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Bucket allocation in bcache used to be asynchronous, ages and ages ago, but for
a while I was making various asynchronous stuff synchronous.
With moving gc and now tiering becoming more important, and also the addition of
refcounted struct open_bucket to get rid of the bucket refcount, this has turned
out to be impractical - to avoid deadlocks we'd need a ridiculous number of
workqueues.
We can avoid the deadlocks by going back to making various things asynchronous -
the conversion is pretty straightforward.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
If moving GC runs out of buckets to write btree nodes to, it will block
waiting for the allocator to produce more buckets. But the allocator
might be waiting for btree GC to finish, which will not yield any more
buckets until moving GC is done, etc.
To avoid this scenario, we reserve an equal number of buckets for btree
node writes from moving GC as the size of the free_inc list. This
ensures that for every btree node that's re-written from moving GC, a
bucket is returned to free_inc. So if moving GC uses up its entire
reserve, it will free up free_inc, allowing the allocator to write out
prios and gens and re-distribute free_inc among the various free lists.
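The sizing rule boils down to something like the sketch below (the field names
are illustrative, not the actual struct layout):

/*
 * One reserved bucket per free_inc slot: every btree node that moving GC
 * rewrites out of the reserve frees its old bucket back to free_inc, so even
 * if the reserve is completely used up, free_inc fills and the allocator can
 * write prios/gens and refill the free lists.
 */
struct gc_reserve_example {
	unsigned free_inc_size;			/* slots in the free_inc fifo */
	unsigned moving_gc_btree_reserve;	/* buckets for moving GC btree writes */
};

static void size_moving_gc_reserve(struct gc_reserve_example *r)
{
	r->moving_gc_btree_reserve = r->free_inc_size;
}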
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This allows us to have a per-cache device PD controller, which is closer
to the behavior that we want here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
The previous algorithm always set read_prio to the max, and then decremented
all bucket prios every so many bytes. This required looping over all the
buckets, which could take a while as cache sizes get larger. The new algorithm
instead uses a circular clock, a u16 that's incremented every so often, and
sets a bucket's read_prio to this clock when the bucket is read from. This
doesn't require looping over all the buckets to increment the clock every time.
Occasionally, because data may be in the cache for an arbitrary amount of time,
we may need to loop over all the buckets to rescale the prios and create more
space for new prios.
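A self-contained model of the scheme (constants and names are illustrative,
not the actual bcache code):

#include <stdint.h>
#include <stdio.h>

#define NBUCKETS	8
#define CLOCK_SECTORS	1024	/* bump the clock hand every N sectors read */

struct bucket { uint16_t read_prio; };

static struct bucket buckets[NBUCKETS];
static uint16_t clock_hand;
static unsigned sectors_until_bump = CLOCK_SECTORS;

static void read_from_bucket(unsigned b, unsigned sectors)
{
	buckets[b].read_prio = clock_hand;	/* O(1), no loop over buckets */

	if (sectors >= sectors_until_bump) {
		clock_hand++;
		sectors_until_bump = CLOCK_SECTORS;

		/*
		 * Rare slow path: when the u16 clock is about to wrap we do
		 * loop over all the buckets, rescaling prios to make space.
		 */
		if (clock_hand == UINT16_MAX) {
			for (unsigned i = 0; i < NBUCKETS; i++)
				buckets[i].read_prio /= 2;
			clock_hand /= 2;
		}
	} else {
		sectors_until_bump -= sectors;
	}
}

int main(void)
{
	for (unsigned i = 0; i < 10000; i++)
		read_from_bucket(i % NBUCKETS, 200);

	printf("clock=%u prio[0]=%u\n", (unsigned) clock_hand,
	       (unsigned) buckets[0].read_prio);
	return 0;
}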
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Need to redo readahead.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Generational grouping is a technique that attempts to keep hot and cold data
separate. It essentially just groups data based on age, so that data of a
similar age ends up together. Previous patches sorted the data we are
compacting into generations, stored in GC_GEN. This patch does the actual
writing to different buckets.
To allow for generation sorting, moving gc needed its own sector allocation,
as the standard allocator sticks new writes into whatever bucket pops out of
the lru. gc_alloc_sectors is a new function created to handle this allocation
for moving gc. It takes the GC_GEN and allocates sectors in the appropriate
bucket for that gen. Each cache keeps an array of gc_open_buckets to allocate
sectors from.
To mark the use of the gc sector allocator, data_insert_op received a new
field, moving_gc, to denote such a write.
note: removed write_prio from bch_alloc_sectors because it was only used to
distinguish gc writes from others. This is not needed because gc writes use a
different allocator.
note: this doesn't support multiple cache devices. It will only move ptr[0]
and ignore the rest.
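A rough userspace model of the per-generation allocation (names and sizes are
illustrative, not the actual code):

#include <stdio.h>

#define NR_GC_GENS	4
#define BUCKET_SECTORS	128

struct gc_open_bucket {
	unsigned bucket_nr;		/* which bucket we're filling	*/
	unsigned sectors_free;		/* sectors left in that bucket	*/
};

static struct gc_open_bucket gc_buckets[NR_GC_GENS];
static unsigned next_bucket = 1;

/*
 * Allocate @sectors for data whose GC_GEN is @gen: data of the same
 * generation gets packed into the same bucket, keeping hot and cold apart.
 */
static unsigned gc_alloc_sectors(unsigned gen, unsigned sectors)
{
	struct gc_open_bucket *b = &gc_buckets[gen % NR_GC_GENS];

	if (b->sectors_free < sectors) {
		b->bucket_nr	= next_bucket++;	/* open a fresh bucket */
		b->sectors_free	= BUCKET_SECTORS;
	}

	b->sectors_free -= sectors;
	return b->bucket_nr;
}

int main(void)
{
	printf("gen 0 -> bucket %u\n", gc_alloc_sectors(0, 32));
	printf("gen 3 -> bucket %u\n", gc_alloc_sectors(3, 32));
	printf("gen 0 -> bucket %u\n", gc_alloc_sectors(0, 32));
	return 0;
}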
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This mostly involves a new function to handle the submission of bios
to multiple ptrs in a key. It simply clones the original bio, and submits
it to all the devs a key points to.
The moving_gc write path needed changes to handle moving a single key
while preserving ptrs to other devices, and to handle the possibility of
moving multiple keys. To do this, I detached the allocation paths of
foreground and gc writes. Perhaps later they can be realigned, but for
now this is simplest.
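The submission path is roughly the sketch below (written against a 4.x-era
block layer for illustration, not the exact kernel or the out-of-tree code
this sits on; KEY_PTRS()/PTR_CACHE()/PTR_OFFSET() are the upstream bcache
macros):

static void submit_to_all_ptrs(struct cache_set *c, struct bkey *k,
			       struct bio *orig)
{
	unsigned i;

	for (i = 0; i < KEY_PTRS(k); i++) {
		struct bio *clone = bio_clone_fast(orig, GFP_NOIO, c->bio_split);

		clone->bi_bdev		 = PTR_CACHE(c, k, i)->bdev;
		clone->bi_iter.bi_sector = PTR_OFFSET(k, i);

		/* orig's completion now also waits for this clone: */
		bio_chain(clone, orig);
		generic_make_request(clone);
	}

	/* Drop the submitter's count; orig completes once all clones do. */
	bio_endio(orig);
}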
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bucket->pin has been a _continual_ source of consternation and bugs; it prevents
buckets from being garbage collected (originally it was for preventing buckets
from being reused while we were reading from them too), but the ownership
semantics were always... hazy, at best. But it's finally gone!
Now, struct open_bucket is the primary mechanism for preventing something from
being garbage collected until a pointer to it ends up in the btree or wherever
garbage collection will eventually find it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
More prep work for killing bucket->pin...
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Prep work for finally getting rid of bucket->pin!
The way we're going to get rid of bucket->pin is by having buckets owned by
something that the garbage collector can find until after we've inserted the new
keys that point to that bucket into the btree.
This adds code for allocating struct open_buckets, and freeing them when their
refcount goes to 0, and reworks bch_alloc_sectors() for the new way of
allocating buckets and to pass the pointer to the struct open_bucket back to
bch_data_insert_start().
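The shape of it is roughly this sketch (struct layout and helper names are
illustrative, locking is elided): the refcount pins the open_bucket on a list
the garbage collector can walk, and the final put - which happens only after
the index update - frees it.

struct open_bucket {
	atomic_t		ref;	/* writers + unfinished index updates */
	struct list_head	list;	/* per-cache-set list, so GC can find us */
	BKEY_PADDED(key);		/* bucket pointer + sectors allocated */
};

static void open_bucket_get(struct open_bucket *b)
{
	atomic_inc(&b->ref);
}

/*
 * Called once the keys pointing into this bucket are in the btree, i.e. once
 * garbage collection has another way of finding the bucket.
 */
static void open_bucket_put(struct open_bucket *b)
{
	if (atomic_dec_and_test(&b->ref)) {
		list_del(&b->list);
		kfree(b);
	}
}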
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
The next patch is going to make struct open_bucket something with a refcount
that gets allocated, and the refs will be dropped after we do the index update
(in bch_data_insert_keys()); we can't drop the ref (i.e. free it) from the
same workqueue we do the allocation from - that would block frees - so add a new
workqueue for bch_data_insert_keys().
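Something along these lines (the workqueue name and the op->work field are
illustrative):

static struct workqueue_struct *bch_index_update_wq;

static int bch_index_update_wq_init(void)
{
	bch_index_update_wq = alloc_workqueue("bch_index_update",
					      WQ_MEM_RECLAIM, 0);
	return bch_index_update_wq ? 0 : -ENOMEM;
}

/*
 * bch_data_insert_keys() then runs from its own workqueue, so dropping
 * open_bucket refs never queues behind the allocation-side work:
 *
 *	queue_work(bch_index_update_wq, &op->work);
 */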
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bch_keybuf_check_overlapping() is used as an optimization to keep copygc, when
possible, from moving around data that we're about to overwrite (note that it's
not an optimization when used on the writeback keybuf! There it's critical for
cache coherency).
Anyways, it goes with writing data to the cache, so stick it there.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Having it implicitly set by whether or not the key is dirty is terrible,
and not even what we want for other users of bch_data_insert().
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Changed bch_data_insert_start to use a key for header info tracking, so it can
bkey_copy into the new key; also added bch_data_insert_op_init() to ensure
required fields are initialized.
We're ending up with various places that need to write some data to the cache,
but already have the key that it should be inserted with (e.g. copygc, the
upcoming tiering code, potentially various other fun stuff) - so instead of
taking a key, breaking the fields out to set up data_insert_op, and having
data_insert() reassemble them... just use a damn bkey.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Added foreground and gc write tracking to the ewma cache
stats. Also outputs a percentage of foreground writes.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Trying to pull more stuff out of bcache.h
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
struct cache_set is way too big, but we can at least _attempt_ to organize it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Now, if you ask for MAP_END_KEY you get passed NULL for the key at the end of
each btree node - this seemed uglier to me at the time than what was done before
this patch, but the old behaviour led to a bunch of bugs in the new inode/dirent
code that's still out of tree. Now, if MAP_END_KEY is misused it'll be caught
right away with a null pointer deref.
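So a map fn used with MAP_END_KEY now looks roughly like this (the helpers are
hypothetical):

static int example_map_fn(struct btree_op *op, struct btree *b, struct bkey *k)
{
	if (!k)				/* end of this btree node */
		return finish_node(op, b);

	return handle_key(op, b, k);
}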
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
The logic for not asking for too much journal space at a time isn't needed
anymore because of the journalling rework in the last patch - for leaf nodes,
bch_btree_insert_keys() only asks for space in the journal for one key at a
time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Previously, journalling an index update and doing the index update were two
different operations. This was problematic even in the current code, and was
going to be a major issue for future work; basically, any index update that
required traversing and locking the btree before we know what's actually being
done (which currently includes replace operations) couldn't make use of the
journal.
Now, any index update that uses bch_btree_insert_node() gets journalled - i.e.
everything (at least to leaf nodes, for now).
This also means the order that index updates happen in is preserved in the
journal, which was a bug waiting to happen (if it wasn't a bug already).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Prep work for moving where keys are journalled to within the btree insertion
code: This adds bch_journal_write_get() and bch_journal_write_put(), which in
the next patch will be used from the btree code to journal keys immediately
after they've been added to a btree node, while that node is still locked.
This also does some general refactoring, and changes the journalling code to
not require a workqueue for anything important (in particular, the next journal
write, if one needed to go out, previously had to be kicked off from the
system_wq).
This will help to avoid deadlocks when the journalling code is being used from
more interesting contexts.
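The intended usage pattern is roughly the sketch below (the signatures are
assumptions, since the real helpers are out of tree; journal_write_add_keys()
is a hypothetical helper):

static void journal_keys_after_insert(struct cache_set *c, struct btree *b,
				      struct keylist *keys)
{
	struct journal_write *w;

	/*
	 * b is still write-locked here, so keys hit the journal in exactly
	 * the order the index updates were applied.
	 */
	w = bch_journal_write_get(c, bch_keylist_bytes(keys));
	if (w) {
		journal_write_add_keys(w, keys);
		bch_journal_write_put(c, w);	/* may kick off the write */
	}
}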
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Remove unneeded NULL test.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@ expression x; @@
-if (x != NULL)
\(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This reverts commit 77b5a08427e87514c33730afc18cd02c9475e2c3 - the patch was
never mailed out to any mailing lists or the maintainer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
bio_free_pages() was introduced in commit 1dfa0f68c040
("block: add a helper to free bio bounce buffer pages");
now that it is exported, we can reuse it in other modules.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.
No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Separate the op from the rq_flag_bits and have bcache
set/get the bio using bio_set_op_attrs/bio_op.
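For example (illustrative only):

static void prep_write_bio(struct bio *bio)
{
	/* Set the op (and any op flags) through the helper... */
	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);

	/* ...and read it back with bio_op() instead of testing bi_rw bits: */
	WARN_ON(bio_op(bio) != REQ_OP_WRITE);
	WARN_ON(!op_is_write(bio_op(bio)));
}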
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We currently set REQ_WRITE/WRITE for all non-READ IOs
like discard, flush, writesame, etc. In the next patches, where we
no longer set up the op as a bitmap, we will not be able to
detect the direction of an operation like writesame by testing if REQ_WRITE is
set.
This has bcache use the op_is_write helper which will do the right
thing.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
|
|
The bcache driver has always accepted arbitrarily large bios and split
them internally. Now that every driver must accept arbitrarily large
bios this code isn't necessary anymore.
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.
So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
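For example, a completion handler now looks like this (the handler name is
illustrative):

static void example_endio(struct bio *bio)
{
	/* The errno lives in the bio itself rather than being passed in: */
	if (bio->bi_error)
		pr_err("IO error: %d\n", bio->bi_error);

	bio_put(bio);
}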
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This is horribly confusing; it breaks the flow of the code without
being apparent in the caller.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>
|
|
With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes cyclic include dependency quite likely.
This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it. C files
which need access to more backing-dev details now include
backing-dev.h directly. This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.
v2: fs/fat build failure fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Struct bio has a reference count that controls when it can be freed.
Most use cases allocate the bio, which then returns with a single reference to
it, do the IO, and then drop that single reference. We can remove this
atomic_dec_and_test() in the completion path if nobody else is holding a
reference to the bio.
If someone does call bio_get() on the bio, then we flag the bio as now having a
valid reference count, and we must properly honor that count when the bio is
being put.
Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Use generic io stats accounting helper functions (generic_{start,end}_io_acct)
to simplify io stat accounting.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Kent Overstreet <kmo@datera.io>
Signed-off-by: Jens Axboe <axboe@fb.com>
|