author    Kent Overstreet <kent.overstreet@gmail.com>    2015-04-26 00:50:36 -0700
committer Kent Overstreet <kent.overstreet@gmail.com>    2015-04-26 00:50:36 -0700
commit    8ab8c6c64b9cf56ab35f7f8c32c8b84c5ce7ba83 (patch)
tree      bd60c8da1cbdb55b2c7727ca93f6abd9ea10a507
parent    61f96c65462dcf0b97f8c5d2e62fd9bcf25224a3 (diff)
-rw-r--r--  Todo.mdwn  50
1 file changed, 46 insertions(+), 4 deletions(-)
@@ -1,9 +1,51 @@
 bcache/bcachefs todo list:
 
- * asynchronous btree node splits
+ * Asynchronous btree node splits
 
- * lockless btree node lookus?
+   The last source of tail latency in the index update path is btree node
+   splits/compacts - a btree node split has to write out the new node(s) and
+   then update the parent node, all synchronously.
 
- * bcachefs: add a mount option to disable journal_push_seq() - so userspace is never waiting on metadata to be synced. Instead, metadata will be persisted after journal_delay_ms (and ordering will be preserved as usual).
+   This is particularly painful with bcachefs, as we end up doing a lot more
+   index updates (dirents/inodes/xattrs) as a proportion of total IO.
 
- The idea here is that a lot of the time users really don't care about losing the past 10-100 ms of work (client machines, build servers) and would prefer the performance improvement on fsync heavy workloads.
+   Need to get the design for this written down. Slava and I had it worked
+   out, but we didn't write anything down so we'll have to go over it again.
+
+ * Lockless btree lookups
+
+   The idea is to use the technique from seqlocks - instead of taking a read
+   lock on a btree node, we'll check a sequence number for that node, do the
+   lookup, then check the sequence number again: if they don't match we raced
+   and have to retry.
+
+   This will let us do lookups without writing to any shared cachelines, which
+   should be a fairly significant performance win - right now the root node's
+   lock is a bottleneck on multithreaded workloads simply because all lookups
+   have to write to that cacheline, just to take a read lock on it.
+
+   We already have the sequence number, from SIX locks. The main thing that
+   has to be done is the lookup code in bset.c has to be audited and modified
+   to handle reading garbage and gracefully return an error indicating the
+   lookup raced with a writer. This hopefully won't be too difficult, as we
+   don't really have pointers to deal with.
+
+ * Scalability to large cache sets:
+
+   Mark and sweep GC is an issue - it runs concurrently with everything
+   _except_ the allocator invalidating buckets. So as long as gc can finish
+   before the freelist is used up, everything is fine - if not, all writes
+   are going to stall until gc finishes.
+
+   Additionally, we _almost_ don't need mark and sweep anymore. It would be
+   nice to rip it out entirely (which Slava was working on). This is going to
+   expose subtle races though - leaks of sector counts that aren't an issue
+   today because mark and sweep eventually fixes them.
+
+ * bcachefs: add a mount option to disable journal_push_seq() - so userspace
+   is never waiting on metadata to be synced. Instead, metadata will be
+   persisted after journal_delay_ms (and ordering will be preserved as usual).
+
+   The idea here is that a lot of the time users really don't care about
+   losing the past 10-100 ms of work (client machines, build servers) and
+   would prefer the performance improvement on fsync heavy workloads.