author    Kent Overstreet <kent.overstreet@gmail.com>    2015-04-26 00:50:36 -0700
committer Kent Overstreet <kent.overstreet@gmail.com>    2015-04-26 00:50:36 -0700
commit    8ab8c6c64b9cf56ab35f7f8c32c8b84c5ce7ba83 (patch)
tree      bd60c8da1cbdb55b2c7727ca93f6abd9ea10a507
parent    61f96c65462dcf0b97f8c5d2e62fd9bcf25224a3 (diff)
-rw-r--r--  Todo.mdwn  50
1 file changed, 46 insertions, 4 deletions
diff --git a/Todo.mdwn b/Todo.mdwn
index b38f483..e6d2d72 100644
--- a/Todo.mdwn
+++ b/Todo.mdwn
@@ -1,9 +1,51 @@
bcache/bcachefs todo list:
- * asynchronous btree node splits
+ * Asynchronous btree node splits
- * lockless btree node lookups?
+ The last source of tail latency in the index update path is btree node
+ splits/compacts - a btree node split has to write out the new node(s) and
+ then update the parent node, all synchronously.
- * bcachefs: add a mount option to disable journal_push_seq() - so userspace is never waiting on metadata to be synced. Instead, metadata will be persisted after journal_delay_ms (and ordering will be preserved as usual).
+ This is particularly painful with bcachefs, as we end up doing a lot more
+ index updates (dirents/inodes/xattrs) as a proportion of total IO.
- The idea here is that a lot of the time users really don't care about losing the past 10-100 ms of work (client machines, build servers) and would prefer the performance improvement on fsync heavy workloads.
+ Need to get the design for this written down. Slava and I had it worked out,
+ but we didn't write anything down so we'll have to go over it again.
+
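+ No design written down yet, but the general shape - making the parent
+ update a continuation of the node writes rather than something the updater
+ blocks on - might look like the following userspace sketch (all names here
+ are hypothetical, not bcache code):
+
+     #include <stdatomic.h>
+     #include <stdio.h>
+
+     struct split_op {
+         atomic_int pending;                /* node writes still in flight */
+         void (*update_parent)(struct split_op *);
+     };
+
+     /* Runs from each node write's completion (think bio endio), not
+      * from the thread that did the index update. */
+     static void split_write_endio(struct split_op *op)
+     {
+         /* the last write to complete kicks off the parent update */
+         if (atomic_fetch_sub(&op->pending, 1) == 1)
+             op->update_parent(op);
+     }
+
+     static void update_parent(struct split_op *op)
+     {
+         printf("parent updated, and nobody blocked on it\n");
+     }
+
+     int main(void)
+     {
+         /* a split that wrote two new nodes */
+         struct split_op op = { .pending = 2, .update_parent = update_parent };
+
+         /* simulate the two write completions arriving */
+         split_write_endio(&op);
+         split_write_endio(&op);
+         return 0;
+     }
+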
+ * Lockless btree lookups
+
+ The idea is to use the technique from seqlocks - instead of taking a read
+ lock on a btree node, we'll check a sequence number for that node, do the
+ lookup, then check the sequence number again: if they don't match, we raced
+ with a writer and have to retry.
+
+ This will let us do lookups without writing to any shared cachelines, which
+ should be a fairly significant performance win - right now the root node's
+ lock is a bottleneck on multithreaded workloads simply because all lookups
+ have to write to that cacheline, just to take a read lock on it.
+
+ We already have the sequence number, from SIX locks. The main thing that has
+ to be done is auditing and modifying the lookup code in bset.c to handle
+ reading garbage and gracefully return an error indicating the lookup raced
+ with a writer. This hopefully won't be too difficult, since we don't really
+ have pointers to deal with.
+
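+ As a sketch of the read side - hypothetical names, C11 atomics standing in
+ for the kernel primitives, and a flat array standing in for the bset.c code:
+
+     #include <stdatomic.h>
+     #include <stdbool.h>
+     #include <stddef.h>
+
+     struct node {
+         atomic_uint seq;            /* odd while a writer is mid-update */
+         unsigned    keys[8], vals[8];
+         size_t      nr;
+     };
+
+     /* Raw search: may observe a half-written node, so it only reads -
+      * a real version also wants READ_ONCE()-style annotations. */
+     static bool search(const struct node *n, unsigned key, unsigned *val)
+     {
+         for (size_t i = 0; i < n->nr && i < 8; i++)
+             if (n->keys[i] == key) {
+                 *val = n->vals[i];
+                 return true;
+             }
+         return false;
+     }
+
+     static bool lockless_lookup(struct node *n, unsigned key, unsigned *val)
+     {
+         unsigned seq;
+         bool found;
+
+         do {
+             do {                    /* wait out any in-progress writer */
+                 seq = atomic_load_explicit(&n->seq, memory_order_acquire);
+             } while (seq & 1);
+
+             found = search(n, key, val);
+
+             atomic_thread_fence(memory_order_acquire);
+             /* seq changed => we raced with a writer: retry */
+         } while (atomic_load_explicit(&n->seq, memory_order_relaxed) != seq);
+
+         return found;
+     }
+
+     /* Writer side, for completeness: bump seq odd, mutate, bump even. */
+     static void node_insert(struct node *n, unsigned key, unsigned val)
+     {
+         atomic_fetch_add_explicit(&n->seq, 1, memory_order_acquire);
+         n->keys[n->nr] = key;
+         n->vals[n->nr] = val;
+         n->nr++;
+         atomic_fetch_add_explicit(&n->seq, 1, memory_order_release);
+     }
+
+     int main(void)
+     {
+         struct node n = {0};
+         unsigned val;
+
+         node_insert(&n, 1, 100);
+         return lockless_lookup(&n, 1, &val) && val == 100 ? 0 : 1;
+     }
+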
+ * Scalability to large cache sets:
+
+ Mark and sweep GC is an issue - it runs concurrently with everything _except_
+ the allocator invalidating buckets. So as long as gc can finish before the
+ freelist is used up, everything is fine - if not, all writes are going to
+ stall until gc finishes.
+
+ Additionally, we _almost_ don't need mark and sweep anymore. It would be nice
+ to rip it out entirely (which Slava was working on). This is going to expose
+ subtle races, though - leaks of sector counts that aren't an issue today
+ because mark and sweep eventually fixes them.
+
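+ The stall, sketched - hypothetical names, with the freelist reduced to a
+ counter:
+
+     #include <pthread.h>
+     #include <stdbool.h>
+
+     struct cache_set {
+         pthread_mutex_t lock;
+         pthread_cond_t  gc_done;
+         bool            gc_running;
+         unsigned        free_buckets;  /* the freelist, as a count */
+     };
+
+     /* Allocate a bucket for a write. Invalidating more buckets has to
+      * wait for mark and sweep, so if gc outlasts the freelist every
+      * writer ends up parked here. */
+     static unsigned alloc_bucket(struct cache_set *c)
+     {
+         pthread_mutex_lock(&c->lock);
+         while (!c->free_buckets) {
+             while (c->gc_running)
+                 pthread_cond_wait(&c->gc_done, &c->lock);
+             c->free_buckets = 8;       /* pretend: invalidate a batch */
+         }
+         unsigned bucket = c->free_buckets--;
+         pthread_mutex_unlock(&c->lock);
+         return bucket;
+     }
+
+     int main(void)
+     {
+         struct cache_set c = {
+             .lock    = PTHREAD_MUTEX_INITIALIZER,
+             .gc_done = PTHREAD_COND_INITIALIZER,
+         };
+         return alloc_bucket(&c) ? 0 : 1;
+     }
+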
+ * bcachefs: add a mount option to disable journal_push_seq() - so userspace is
+ never waiting on metadata to be synced. Instead, metadata will be persisted
+ after journal_delay_ms (and ordering will be preserved as usual).
+
+ The idea here is that a lot of the time users really don't care about losing
+ the past 10-100 ms of work (client machines, build servers) and would prefer
+ the performance improvement on fsync-heavy workloads.
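+
+ Roughly - the option name below is made up; journal_push_seq() and
+ journal_delay_ms are the ones named above:
+
+     #include <stdbool.h>
+     #include <stdio.h>
+
+     struct journal {
+         unsigned long seq;
+         bool          no_push;     /* the proposed mount option */
+         unsigned      delay_ms;    /* journal_delay_ms */
+     };
+
+     /* stand-in for journal_push_seq(): flush + wait for seq on disk */
+     static int journal_push_seq(struct journal *j, unsigned long seq)
+     {
+         printf("blocking until journal seq %lu is on disk\n", seq);
+         return 0;
+     }
+
+     /* fsync tail: with the option set userspace never waits - the
+      * entry still goes out, in order, within journal_delay_ms */
+     static int fsync_journal(struct journal *j, unsigned long seq)
+     {
+         if (j->no_push)
+             return 0;              /* persisted later, ordering kept */
+         return journal_push_seq(j, seq);
+     }
+
+     int main(void)
+     {
+         struct journal j = { .seq = 42, .no_push = true, .delay_ms = 100 };
+         return fsync_journal(&j, j.seq);
+     }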