author    Kent Overstreet <kent.overstreet@gmail.com>    2015-04-26 00:50:36 -0700
committer Kent Overstreet <kent.overstreet@gmail.com>    2015-04-26 00:50:36 -0700
commit    8ab8c6c64b9cf56ab35f7f8c32c8b84c5ce7ba83 (patch)
tree      bd60c8da1cbdb55b2c7727ca93f6abd9ea10a507
parent    61f96c65462dcf0b97f8c5d2e62fd9bcf25224a3 (diff)
-rw-r--r--  Todo.mdwn  50
1 file changed, 46 insertions(+), 4 deletions(-)
@@ -1,9 +1,51 @@
 bcache/bcachefs todo list:
 
- * asynchronous btree node splits
+ * Asynchronous btree node splits
 
- * lockless btree node lookus?
+   The last source of tail latency in the index update path is btree node
+   splits/compacts - a btree node split has to write out the new node(s) and
+   then update the parent node, all synchronously.
 
- * bcachefs: add a mount option to disable journal_push_seq() - so userspace is never waiting on metadata to be synced. Instead, metadata will be persisted after journal_delay_ms (and ordering will be preserved as usual).
+   This is particularly painful with bcachefs, as we end up doing a lot more
+   index updates (dirents/inodes/xattrs) as a proportion of total IO.
 
- The idea here is that a lot of the time users really don't care about losing the past 10-100 ms of work (client machines, build servers) and would prefer the performance improvement on fsync heavy workloads.
+   Need to get the design for this written down. Slava and I had it worked
+   out, but we didn't write anything down so we'll have to go over it again.
+
+ * Lockless btree lookups
+
+   The idea is to use the technique from seqlocks - instead of taking a read
+   lock on a btree node, we'll check a sequence number for that node, do the
+   lookup, then check the sequence number again: if they don't match we raced
+   and have to retry.
+
+   This will let us do lookups without writing to any shared cachelines, which
+   should be a fairly significant performance win - right now the root node's
+   lock is a bottleneck on multithreaded workloads simply because all lookups
+   have to write to that cacheline, just to take a read lock on it.
+
+   We already have the sequence number, from SIX locks. The main thing that
+   has to be done is the lookup code in bset.c has to be audited and modified
+   to handle reading garbage and gracefully return an error indicating the
+   lookup raced with a writer. This hopefully won't be too difficult, as we
+   don't really have pointers to deal with.
+
+ * Scalability to large cache sets:
+
+   Mark and sweep GC is an issue - it runs concurrently with everything
+   _except_ the allocator invalidating buckets. So as long as gc can finish
+   before the freelist is used up, everything is fine - if not, all writes
+   are going to stall until gc finishes.
+
+   Additionally, we _almost_ don't need mark and sweep anymore. It would be
+   nice to rip it out entirely (which Slava was working on). This is going to
+   expose subtle races though - leaks of sector counts that aren't an issue
+   today because mark and sweep eventually fixes them.
+
+ * bcachefs: add a mount option to disable journal_push_seq() - so userspace
+   is never waiting on metadata to be synced. Instead, metadata will be
+   persisted after journal_delay_ms (and ordering will be preserved as usual).
+
+   The idea here is that a lot of the time users really don't care about
+   losing the past 10-100 ms of work (client machines, build servers) and
+   would prefer the performance improvement on fsync heavy workloads.