path: root/fs/bcachefs/super.c
Age  Commit message  Author
2022-04-17  bcachefs: Fix key cache assertion  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix bch2_trans_mark_dev_sb()  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Eliminate more PAGE_SIZE uses  Kent Overstreet
In userspace, we don't really have a well-defined PAGE_SIZE and shouldn't be relying on it. This is some more incremental work to remove references to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Don't use write side of mark_lock in journal write path  Kent Overstreet
The write side of percpu rwsemaphores is really expensive, and we shouldn't be taking it at all in steady state operation. Fortunately, in bch2_journal_super_entries_add_common(), we don't need to - we have a seqlock, usage_lock, for accumulating percpu usage counters to the base counters. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
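The seqlock pattern mentioned above can be sketched in portable C11 as follows. This is only an illustration of the general technique, not the kernel's seqcount API or the bcachefs code: the struct, field, and function names are hypothetical, and writers are assumed to be serialized by some other means.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical base counters protected by a sequence counter. */
struct usage_base {
	atomic_uint   seq;      /* odd while the writer is mid-update */
	atomic_ullong sectors;
	atomic_ullong buckets;
};

struct usage_snapshot {
	uint64_t sectors;
	uint64_t buckets;
};

/* Writer side: fold deltas into the base counters.
 * Assumption: only one writer runs at a time. */
static void usage_acc(struct usage_base *u, uint64_t sectors, uint64_t buckets)
{
	unsigned s = atomic_load_explicit(&u->seq, memory_order_relaxed);

	atomic_store_explicit(&u->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&u->sectors,
		atomic_load_explicit(&u->sectors, memory_order_relaxed) + sectors,
		memory_order_relaxed);
	atomic_store_explicit(&u->buckets,
		atomic_load_explicit(&u->buckets, memory_order_relaxed) + buckets,
		memory_order_relaxed);

	atomic_store_explicit(&u->seq, s + 2, memory_order_release);
}

/* Reader side: takes no lock, retries if a writer was active. */
static struct usage_snapshot usage_read(struct usage_base *u)
{
	struct usage_snapshot ret;
	unsigned begin, end;

	do {
		begin = atomic_load_explicit(&u->seq, memory_order_acquire);
		ret.sectors = atomic_load_explicit(&u->sectors, memory_order_relaxed);
		ret.buckets = atomic_load_explicit(&u->buckets, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&u->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	return ret;
}
```

Readers never block the writer here; they only pay for a retry when an update overlaps the read, which is why this is cheaper than taking the write side of a percpu rwsemaphore.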
2022-04-17  bcachefs: Add a print statement for when we go read-write  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Use x-macros for more enums  Kent Overstreet
This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
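For readers unfamiliar with the technique, a minimal x-macro sketch follows. The list and its names are hypothetical, not the real bcachefs enums; the point is only that the enum and its string table are generated from one list and so cannot drift apart.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of (name, value) pairs; the single source of truth. */
#define EXAMPLE_DATA_TYPES()    \
	x(none,    0)           \
	x(sb,      1)           \
	x(journal, 2)           \
	x(btree,   3)           \
	x(user,    4)

/* First expansion: the enum itself. */
enum example_data_type {
#define x(name, nr) EXAMPLE_DATA_##name = nr,
	EXAMPLE_DATA_TYPES()
#undef x
	EXAMPLE_DATA_NR
};

/* Second expansion: the matching, NULL-terminated string table. */
static const char * const example_data_types[] = {
#define x(name, nr) [nr] = #name,
	EXAMPLE_DATA_TYPES()
#undef x
	NULL
};
```

Adding a new entry to `EXAMPLE_DATA_TYPES()` updates both the enum and the string table at once.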
2022-04-17  bcachefs: Rename BTREE_ID enums for consistency with other enums  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Add a mempool for the replicas delta list  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Start journal reclaim thread earlier  Kent Overstreet
Especially in userspace, we sometimes run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix for copygc getting stuck waiting for reserve to be filled  Kent Overstreet
This fixes a regression from the patch "bcachefs: Fix copygc dying on startup". In general, only the allocator thread itself should be updating ca->allocator_state; having the thread that wakes up the allocator set it is an ugly hack, only needed to avoid racing with the copygc threads when we're first starting up. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Rip out copygc pd controller  Kent Overstreet
We have a separate mechanism for ratelimiting copygc now - the pd controller has only been causing problems. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix an allocator startup race  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Create allocator threads when allocating filesystem  Kent Overstreet
We're seeing failures to mount because of a failure to start the allocator threads, which currently happens fairly late in the mount process, after walking all metadata, and kthread_create() fails if something has tried to kill the mount process, which is probably not what we want. This patch avoids this issue by creating, but not starting, the allocator threads when we preallocate all of our other in memory data structures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Redo checks for sufficient devices  Kent Overstreet
When the replicas mechanism was added, for tracking data by which drives it's replicated on, the check for whether we have sufficient devices was never updated to make use of it. This patch finally does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fixes/improvements for journal entry reservations  Kent Overstreet
This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Journal updates to dev usage  Kent Overstreet
This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Persist 64 bit io clocks  Kent Overstreet
Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd periodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestamps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
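The wraparound problem can be illustrated with a small sketch. It assumes, purely for illustration, that a bucket's age is computed as the difference between a clock hand and the bucket's last-read stamp; the function names are hypothetical and this is not the bcache or bcachefs code.

```c
#include <stdint.h>

/* 16 bit scheme: the difference is taken modulo 65536, so a bucket that
 * has been idle for more than 65535 ticks looks recently used again
 * unless every bucket is periodically rescaled. */
static uint16_t bucket_age16(uint16_t clock_hand, uint16_t last_read)
{
	return (uint16_t)(clock_hand - last_read);
}

/* 64 bit scheme: the clock does not wrap in practice, so no rescaling
 * pass over every bucket is needed. */
static uint64_t bucket_age64(uint64_t clock_now, uint64_t last_read)
{
	return clock_now - last_read;
}
```

For a bucket last read at tick 100 and a clock that has since advanced by 65541 ticks, the 16 bit version reports an age of 5 while the 64 bit version reports 65541.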
2022-04-17  bcachefs: Add support for doing btree updates prior to journal replay  Kent Overstreet
Some errors may need to be fixed in order for GC to successfully run - walk and mark all metadata. But we can't start the allocators and do normal btree updates until after GC has completed, and allocation information is known to be consistent, so we need a different method of doing btree updates. Fortunately, we already have code for walking the btree while overlaying keys from the journal to be replayed. This patch adds an update path that adds keys to the list of keys to be replayed by journal replay, and also fixes up iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Mark superblocks transactionally  Kent Overstreet
More work towards getting rid of the in memory struct bucket: this patch adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Refactor dev usage  Kent Overstreet
This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix an assertion pop  Kent Overstreet
There was a race: btree node writes drop their reference on journal pins before clearing the btree_node_write_in_flight flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Avoid write lock on mark_lock  Kent Overstreet
mark_lock is a frequently taken lock, and there's also potential for deadlocks since currently bch2_clear_page_bits which is called from memory reclaim has to take it to drop disk reservations. The disk reservation get path takes it when it recalculates the number of sectors known to be available, but it's not really needed for consistency. We just want to make sure we only have one thread updating the sectors_available count, which we can do with a dedicated mutex. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Refactor filesystem usage accounting  Kent Overstreet
Various filesystem usage counters are kept in percpu counters, with one set per in flight journal buffer. Right now all the code that deals with them assumes that there are only two buffers/sets of counters, but the number of journal bufs is getting increased to 4 in the next patch - so refactor that code to not assume a constant. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Move journal reclaim to a kthread  Kent Overstreet
This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Add a kmem_cache for btree_key_cache objects  Kent Overstreet
We allocate a lot of these, and we're seeing sporadic OOMs - this will help with tracking those down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: New varints  Kent Overstreet
Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
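The idea of capping a varint at 64 bits and splitting wider values in two can be sketched as below. The actual bcachefs varint layout is not described in this log, so a generic LEB128-style encoding is swapped in here; all names are hypothetical, and the input to the decoder is assumed to be well formed.

```c
#include <stddef.h>
#include <stdint.h>

/* LEB128-style: 7 payload bits per byte, high bit set on all but the
 * last byte. Writes at most 10 bytes; returns the number written. */
static size_t varint_encode(uint8_t *out, uint64_t v)
{
	size_t n = 0;

	while (v >= 0x80) {
		out[n++] = (uint8_t)(v | 0x80);
		v >>= 7;
	}
	out[n++] = (uint8_t)v;
	return n;
}

/* Returns the number of bytes consumed. */
static size_t varint_decode(const uint8_t *in, uint64_t *v)
{
	size_t n = 0;
	unsigned shift;

	*v = 0;
	for (shift = 0; shift < 64; shift += 7) {
		uint8_t b = in[n++];

		*v |= (uint64_t)(b & 0x7f) << shift;
		if (!(b & 0x80))
			break;
	}
	return n;
}

/* Hypothetical 96 bit timestamp, stored as two separate varints. */
struct wide_ts {
	uint64_t lo;
	uint32_t hi;
};

static size_t wide_ts_encode(uint8_t *out, struct wide_ts t)
{
	size_t n = varint_encode(out, t.lo);

	return n + varint_encode(out + n, t.hi);
}

static size_t wide_ts_decode(const uint8_t *in, struct wide_ts *t)
{
	uint64_t hi;
	size_t n = varint_decode(in, &t->lo);

	n += varint_decode(in + n, &hi);
	t->hi = (uint32_t)hi;
	return n;
}
```

Because no single varint exceeds 64 bits, the encoder and decoder stay simple loops over machine words, and the common case of a small high half costs only one extra byte.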
2022-04-17  bcachefs: Add a single slot percpu buf for btree iters  Kent Overstreet
Allocating our array of btree iters is a big enough allocation that it hits the buddy allocator, and we're seeing lots of lock contention. Sticking a single element buffer in front of it should help. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
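A single-slot buffer in front of an allocator can be sketched in userspace C as follows. This is an assumption-laden illustration: the kernel version would keep one slot per CPU, whereas this sketch uses a single global slot, `malloc()` stands in for the real allocator, and the names and size are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define ITERS_BYTES 4096	/* hypothetical size of the iterator array */

/* One cached buffer, or NULL if the slot is empty. */
static _Atomic(void *) iters_slot;

/* Take the cached buffer if there is one, else fall back to malloc(). */
static void *iters_get(void)
{
	void *p = atomic_exchange(&iters_slot, NULL);

	return p ? p : malloc(ITERS_BYTES);
}

/* Park the buffer in the slot if it is empty, else really free it. */
static void iters_put(void *p)
{
	void *expected = NULL;

	if (!atomic_compare_exchange_strong(&iters_slot, &expected, p))
		free(p);
}
```

In the common get/put/get pattern the second `iters_get()` reuses the parked buffer and never reaches the underlying allocator, which is where the lock contention was.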
2022-04-17  bcachefs: Improved inode create optimization  Kent Overstreet
This shards new inodes into different btree nodes by using the processor ID for the high bits of the new inode number. Much faster than the previous inode create optimization - this also helps with sharding in the other btrees that index by inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
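The sharding arithmetic can be sketched as below. The function name and parameters are hypothetical, and the choice of 63 usable inode number bits in the usage note is an assumption; the sketch only shows how placing the processor ID in the high bits gives each CPU a disjoint range of inode numbers.

```c
#include <stdint.h>

struct inum_range {
	uint64_t min;
	uint64_t max;
};

/* Range of inode numbers owned by one CPU.
 * inum_bits:  usable bits in an inode number
 * shard_bits: high bits reserved for the CPU id
 * Assumes 0 < inum_bits - shard_bits < 64 and cpu < (1 << shard_bits). */
static struct inum_range inum_shard(unsigned cpu, unsigned shard_bits,
				    unsigned inum_bits)
{
	unsigned low_bits = inum_bits - shard_bits;
	struct inum_range r;

	r.min = (uint64_t)cpu << low_bits;
	r.max = r.min | ~(UINT64_MAX << low_bits);
	return r;
}
```

With `inum_shard(cpu, 4, 63)`, sixteen CPUs each get a contiguous range of 2^59 inode numbers, so concurrent creates on different CPUs land in different parts of the inodes btree.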
2022-04-17  bcachefs: Minor journal reclaim improvement  Kent Overstreet
With the btree key cache code, journal reclaim now has a lot more work to do. It could be the case that after journal reclaim has finished one iteration there's already more work to do, so put it in a loop to check for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Inode create optimization  Kent Overstreet
On workloads that do a lot of multithreaded creates all at once, lock contention on the inodes btree turns out to still be an issue. This patch adds a small buffer of inode numbers that are known to be free, so that we can avoid touching the btree on every create. Also, this changes inode creates to update via the btree key cache for the initial create. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Start/stop io clock hands in read/write paths  Kent Overstreet
This fixes a bug where the clock hands in the journal and superblock didn't match, because we were still incrementing the read clock hand while read-only. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Improvements to writing alloc info  Kent Overstreet
Now that we've got transactional alloc info updates (and have for a while), we don't need to write it out on shutdown, and we don't need to write it out on startup except when GC found errors - this is a big improvement to mount/unmount performance. This patch also fixes a few bugs where we weren't writing out alloc info (on new filesystems, and new devices) and should have been. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix copygc dying on startup  Kent Overstreet
The copygc thread errors out and makes the filesystem go RO if it ever tries to run and discovers it has no reserve allocated - which is a problem if it races with the allocator thread and its reserve hasn't been filled yet. The allocator thread doesn't start filling the copygc reserve until after BCH_FS_STARTED has been set, so make sure to wake up the allocator threads after setting that and before starting copygc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix errors early in the fs init process  Kent Overstreet
At some point bch2_fs_alloc() was changed to always call bch2_fs_free() in the error path, which means we need c->cl to always be initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix unmount path  Kent Overstreet
There was a long-standing race in the mount/unmount code - the VFS intends for mount/unmount synchronization to be handled by the list of superblocks, but we were still holding devices open after tearing down our superblock in the unmount path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Don't fail mount if device has been removed  Kent Overstreet
Also - make sure to show the devices we actually have open in /proc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fix a bug with the journal_seq_blacklist mechanism  Kent Overstreet
Previously, we would start doing btree updates before writing the first journal entry; if this was after an unclean shutdown, this could cause those btree updates to not be blacklisted. Also, move some code to headers for userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Don't let copygc buckets be stolen by other threads  Kent Overstreet
And assorted other copygc fixes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Make copygc thread global  Kent Overstreet
Per device copygc threads don't move data to different devices and they make fragmentation worse - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Use x-macros for data types  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Move stripe creation to workqueue  Kent Overstreet
This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Improve stripe triggers/heap code  Kent Overstreet
Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Use cached iterators for alloc btree  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Btree key cache  Kent Overstreet
This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Turn c->state_lock into an rwsem  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Hacky io-in-flight throttling  Kent Overstreet
We've been seeing btree updates get stuck, due to some sort of bug; when this happens, buffered writeback will keep queueing up writes that lead to the system running out of memory. Not sure if this kind of throttling is something we'll want to keep and improve, or get rid of when the bug with btree updates getting stuck is fixed. For now it should make debugging easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Kill old allocator startup code  Kent Overstreet
It's not needed anymore since we can now write to buckets before updating the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Fixes for going RO  Kent Overstreet
Now that interior btree updates are fully transactional, we don't need to write out alloc info in a loop. However, interior btree updates do put more things in the journal, so we still need a loop in the RO sequence. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Interior btree updates are now fully transactional  Kent Overstreet
We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17  bcachefs: Factor out bch2_fs_btree_interior_update_init()  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>