Age  Commit message  Author
2024-08-22  bcachefs: bch2_btree_path_upgrade_norestart()  [bcachefs-testing-2]  Kent Overstreet
btree_path_get_locks() leaves a path unlocked (and with error pointers for nodes) on failure - this used to be necessary for path_traverse() to correctly lock all needed levels. But it's not needed anymore, and we need a new helper that doesn't break future locking invariants and leave us in a bad state (unlocked) when used on a should_be_locked path, because bch2_path_get() has no way of returning a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  rcu: Switch kvfree_rcu() to new rcu_pending  Kent Overstreet
This nets us a slight performance increase, and converts to common code. Todo - re-add the shrinker, so that memory reclaim can free expired objects and expedite a grace period when necessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  rcu: delete lockdep_assert_irqs_enabled() assert in start_poll_synchronize_rcu_common()  Kent Overstreet
This assertion appears to have been entirely unnecessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  rcu: lift rcu_pending from bcachefs  Kent Overstreet
Generic data structure for explicitly tracking pending RCU items, allowing items to be dequeued (i.e. allocate from items pending freeing). Works with conventional RCU and SRCU, and possibly other RCU flavors in the future, meaning this can serve as a more generic replacement for SLAB_TYPESAFE_BY_RCU.
Pending items are tracked in radix trees; if memory allocation fails, we fall back to linked lists.
An rcu_pending is initialized with a callback, which is invoked when pending items' grace periods have expired. Two types of callback processing are handled specially:
- RCU_PENDING_KVFREE_FN
  New backend for kvfree_rcu(). Slightly faster, and eliminates the synchronize_rcu() slowpath in kvfree_rcu_mightsleep() - instead, an rcu_head is allocated if we don't have one and can't use the radix tree.
  TODO:
  - add a shrinker (as in the existing kvfree_rcu implementation) so that memory reclaim can free expired objects if callback processing isn't keeping up, and to expedite a grace period if we're under memory pressure and too much memory is stranded by RCU
  - add a counter for amount of memory pending
- RCU_PENDING_CALL_RCU_FN
  Accelerated backend for call_rcu() - pending callbacks are tracked in a radix tree to eliminate linked list overhead.
These serve as replacement backends for kvfree_rcu() and call_rcu(); they may be of interest to other users (e.g. SLAB_TYPESAFE_BY_RCU users).
Note: Internally, we're using a single rearming call_rcu() callback for notifications from the core RCU subsystem when objects are ready to be processed. Ideally we would be getting a callback every time a grace period completes for which we have objects, but that would require multiple rcu_heads in flight, and since the number of gp sequence numbers with uncompleted callbacks is not bounded, we can't do that yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  darray: lift from bcachefs  Kent Overstreet
Dynamic arrays - inspired by CCAN darrays, basically C++ STL vectors. Used by thread_with_stdio, which is also being lifted from bcachefs for xfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
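As a rough illustration of the "vector-like" behaviour described above, here is a minimal userspace sketch of a growable array. It is not the kernel darray code: the names int_darray, int_darray_push and int_darray_exit are hypothetical, the element type is fixed to int for brevity, and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-type dynamic array; a real implementation would be type-generic. */
struct int_darray {
	int	*data;
	size_t	nr;	/* elements in use */
	size_t	size;	/* elements allocated */
};

/* Append v, growing the allocation when full. Returns 0, or -1 if allocation fails. */
static inline int int_darray_push(struct int_darray *d, int v)
{
	if (d->nr == d->size) {
		/* Assumed policy: start at 8 elements, then double. */
		size_t new_size = d->size ? d->size * 2 : 8;
		int *p = realloc(d->data, new_size * sizeof(*p));

		if (!p)
			return -1;	/* the old buffer is still valid */
		d->data = p;
		d->size = new_size;
	}

	d->data[d->nr++] = v;
	return 0;
}

/* Free the backing storage and reset the array to its empty state. */
static inline void int_darray_exit(struct int_darray *d)
{
	free(d->data);
	d->data = NULL;
	d->nr = d->size = 0;
}
```

A zero-initialized struct int_darray is a valid empty array, so callers can declare one with `= {0}`, push elements, and call int_darray_exit() when done.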
2024-08-22  vmalloc: is_vmalloc_addr_inlined()  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  fs: super_cache_to_text()  Kent Overstreet
Implement shrinker.to_text() for the superblock shrinker: print out nr of dentries and inodes, total and shrinkable. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  fs/dcache: Add per-sb accounting for nr dentries  Kent Overstreet
Like the previous patch, add a counter for total dentries, so we can print total vs. reclaimable. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  fs: Add super_block->s_inodes_nr  Kent Overstreet
Upcoming shrinker debugging patchset is going to give us a callback for reporting on all memory owned by a shrinker. This adds a counter for total number of inodes allocated for a given superblock, so we can compare with the number of reclaimable inodes we already have. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  percpu: per_cpu_sum()  Kent Overstreet
Add a little helper to replace open coded versions. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
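To show the kind of open-coded loop such a helper replaces, here is a minimal userspace sketch. It models per-CPU storage as a plain array; the names percpu_u64, percpu_u64_add, percpu_u64_sum and NR_CPUS_SKETCH are hypothetical and are not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH	4	/* hypothetical fixed CPU count for the sketch */

/* One slot per CPU; in the kernel this would be a per-CPU allocation. */
struct percpu_u64 {
	uint64_t slot[NR_CPUS_SKETCH];
};

/* Each CPU only updates its own slot, which is what makes updates cheap. */
static inline void percpu_u64_add(struct percpu_u64 *c, unsigned cpu, uint64_t v)
{
	assert(cpu < NR_CPUS_SKETCH);
	c->slot[cpu] += v;
}

/* The loop that tends to get open coded: walk every CPU's slot and add them up. */
static inline uint64_t percpu_u64_sum(const struct percpu_u64 *c)
{
	uint64_t sum = 0;

	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += c->slot[cpu];
	return sum;
}
```

In this model, adding 5 on CPU 0 and 7 on CPU 3 makes percpu_u64_sum() return 12.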
2024-08-22  bcachefs: shrinker.to_text() methods  Kent Overstreet
This adds shrinker.to_text() methods for our shrinkers and hooks them up to our existing to_text() functions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  mm: shrinker: Add shrinker_to_text() to debugfs interface  Kent Overstreet
Previously, we added shrinker_to_text() and hooked it up to the OOM report - now, the same report is available via debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  mm: Centralize & improve oom reporting in show_mem.c  Kent Overstreet
This patch:
- Changes show_mem() to always report on slab usage
- Instead of reporting on all slabs, we only report on top 10 slabs, and in sorted order
- Also reports on shrinkers, with the new shrinkers_to_text(). Shrinkers need to be included in OOM/allocation failure reporting because they're responsible for memory reclaim - if a shrinker isn't giving up its memory, we need to know which one and why.
More OOM reporting can be moved to show_mem.c and improved, this patch is only a start.
New example output on OOM/memory allocation failure:
00177 Mem-Info:
00177 active_anon:13706 inactive_anon:32266 isolated_anon:16
00177 active_file:1653 inactive_file:1822 isolated_file:0
00177 unevictable:0 dirty:0 writeback:0
00177 slab_reclaimable:6242 slab_unreclaimable:11168
00177 mapped:3824 shmem:3 pagetables:1266 bounce:0
00177 kernel_misc_reclaimable:0
00177 free:4362 free_pcp:35 free_cma:0
00177 Node 0 active_anon:54824kB inactive_anon:129064kB active_file:6612kB inactive_file:7288kB unevictable:0kB isolated(anon):64kB isolated(file):0kB mapped:15296kB dirty:0kB writeback:0kB shmem:12kB writeback_tmp:0kB kernel_stack:3392kB pagetables:5064kB all_unreclaimable? no
00177 DMA free:2232kB boost:0kB min:88kB low:108kB high:128kB reserved_highatomic:0KB active_anon:2924kB inactive_anon:6596kB active_file:428kB inactive_file:384kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
00177 lowmem_reserve[]: 0 426 426 426
00177 DMA32 free:15092kB boost:5836kB min:8432kB low:9080kB high:9728kB reserved_highatomic:0KB active_anon:52196kB inactive_anon:122392kB active_file:6176kB inactive_file:7068kB unevictable:0kB writepending:0kB present:507760kB managed:441816kB mlocked:0kB bounce:0kB free_pcp:72kB local_pcp:0kB free_cma:0kB
00177 lowmem_reserve[]: 0 0 0 0
00177 DMA: 284*4kB (UM) 53*8kB (UM) 21*16kB (U) 11*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2248kB
00177 DMA32: 2765*4kB (UME) 375*8kB (UME) 57*16kB (UM) 5*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15132kB
00177 4656 total pagecache pages
00177 1031 pages in swap cache
00177 Swap cache stats: add 6572399, delete 6572173, find 488603/3286476
00177 Free swap = 509112kB
00177 Total swap = 2097148kB
00177 130938 pages RAM
00177 0 pages HighMem/MovableOnly
00177 16644 pages reserved
00177 Unreclaimable slab info:
00177 9p-fcall-cache total: 8.25 MiB active: 8.25 MiB
00177 kernfs_node_cache total: 2.15 MiB active: 2.15 MiB
00177 kmalloc-64 total: 2.08 MiB active: 2.07 MiB
00177 task_struct total: 1.95 MiB active: 1.95 MiB
00177 kmalloc-4k total: 1.50 MiB active: 1.50 MiB
00177 signal_cache total: 1.34 MiB active: 1.34 MiB
00177 kmalloc-2k total: 1.16 MiB active: 1.16 MiB
00177 bch_inode_info total: 1.02 MiB active: 922 KiB
00177 perf_event total: 1.02 MiB active: 1.02 MiB
00177 biovec-max total: 992 KiB active: 960 KiB
00177 Shrinkers:
00177 super_cache_scan: objects: 127
00177 super_cache_scan: objects: 106
00177 jbd2_journal_shrink_scan: objects: 32
00177 ext4_es_scan: objects: 32
00177 bch2_btree_cache_scan: objects: 8
00177 nr nodes: 24
00177 nr dirty: 0
00177 cannibalize lock: 0000000000000000
00177
00177 super_cache_scan: objects: 8
00177 super_cache_scan: objects: 1
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: linux-mm@kvack.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  mm: shrinker: Add new stats for .to_text()  Kent Overstreet
Add a few new shrinker stats.
- number of objects requested to free, number of objects freed: Shrinkers won't necessarily free all objects requested for a variety of reasons, but if the two counts are wildly different something is likely amiss.
- .scan_objects runtime: If one shrinker is taking an excessive amount of time to free objects that will block kswapd from running other shrinkers.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: linux-mm@kvack.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  mm: shrinker: Add a .to_text() method for shrinkers  Kent Overstreet
This adds a new callback method to shrinkers which they can use to describe anything relevant to memory reclaim about their internal state, for example object dirtiness. This patch also adds shrinkers_to_text(), which reports on the top 10 shrinkers - by object count - in sorted order, to be used in OOM reporting. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: linux-mm@kvack.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
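The "top N by object count, with an optional per-shrinker description" idea can be sketched in userspace roughly as follows. This is an illustration under stated assumptions, not the kernel implementation: struct shrinker_sketch, example_to_text and shrinkers_report are hypothetical names, and the callback contract (write a NUL-terminated string of at most len bytes) is invented for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shrinker; not the kernel's struct shrinker. */
struct shrinker_sketch {
	const char	*name;
	unsigned long	objects;
	/* Optional: write a NUL-terminated description into buf (at most len bytes). */
	void (*to_text)(const struct shrinker_sketch *s, char *buf, size_t len);
};

/* Example callback that reports one made-up piece of internal state. */
static void example_to_text(const struct shrinker_sketch *s, char *buf, size_t len)
{
	(void) s;
	snprintf(buf, len, "  nr dirty: 0\n");
}

/* qsort comparator: larger object counts sort first. */
static int cmp_objects_desc(const void *l, const void *r)
{
	const struct shrinker_sketch *a = l, *b = r;

	return (a->objects < b->objects) - (a->objects > b->objects);
}

/*
 * Sort shrinkers by object count (descending) and describe at most top_n of
 * them in buf. Output is silently truncated if buf is too small.
 * Returns the length of the resulting string.
 */
static size_t shrinkers_report(struct shrinker_sketch *s, size_t nr, size_t top_n,
			       char *buf, size_t len)
{
	size_t pos = 0;

	assert(len > 0);
	buf[0] = '\0';

	qsort(s, nr, sizeof(*s), cmp_objects_desc);

	for (size_t i = 0; i < nr && i < top_n; i++) {
		snprintf(buf + pos, len - pos, "%s: objects: %lu\n",
			 s[i].name, s[i].objects);
		pos += strlen(buf + pos);

		if (s[i].to_text) {
			/* buf[pos] is already '\0', so a callback that writes nothing is fine. */
			s[i].to_text(&s[i], buf + pos, len - pos);
			pos += strlen(buf + pos);
		}
	}
	return pos;
}
```

Bounding the report to the largest entries keeps the output short in an out-of-memory situation, where the shrinkers holding the most objects are the ones worth reading about first.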
2024-08-22  seq_buf: seq_buf_human_readable_u64()  Kent Overstreet
This adds a seq_buf wrapper for string_get_size(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
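For readers unfamiliar with this kind of formatting, the sketch below shows one simplified way to turn a byte count into a binary-prefixed string such as "8.25 MiB". It is a userspace illustration only: human_readable_u64 is a hypothetical name, and the truncating (non-rounding) arithmetic and the "two decimals only below 10" rule are assumptions that will not match string_get_size() in every case.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format v (a byte count) into buf using binary prefixes.
 * Simplified: fractions are truncated, not rounded.
 * Returns what snprintf() returns.
 */
static int human_readable_u64(char *buf, size_t len, uint64_t v)
{
	static const char *const units[] = {
		"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	unsigned i = 0;
	uint64_t rem = 0;

	/* Divide down by 1024, remembering the last remainder for the fraction. */
	while (v >= 1024 && i + 1 < sizeof(units) / sizeof(units[0])) {
		rem = v % 1024;
		v /= 1024;
		i++;
	}

	/* Assumed rule: show two decimals only when the leading value is small. */
	if (i && v < 10)
		return snprintf(buf, len, "%" PRIu64 ".%02" PRIu64 " %s",
				v, rem * 100 / 1024, units[i]);

	return snprintf(buf, len, "%" PRIu64 " %s", v, units[i]);
}
```

With this sketch, 8650752 bytes formats as "8.25 MiB" and 944128 bytes as "922 KiB".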
2024-08-22  bcachefs: Convert to use jiffies macros  Chen Yufan
Use jiffies macros instead of using jiffies directly to handle wraparound. Signed-off-by: Chen Yufan <chenyufan@vivo.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
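Why a direct comparison breaks at wraparound can be shown with a small userspace model. The sketch uses a 32-bit tick counter instead of the kernel's unsigned long jiffies, and naive_after and tick_after are hypothetical names; the signed-difference trick is the same idea the kernel's time_after() macro relies on.

```c
#include <assert.h>	/* for the assert()-based checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Naive comparison: gives the wrong answer once the counter wraps past zero. */
static inline bool naive_after(uint32_t a, uint32_t b)
{
	return a > b;
}

/*
 * Wrap-safe comparison: true if a is later than b, provided the two values
 * are less than 2^31 ticks apart. The unsigned subtraction wraps modulo 2^32,
 * and reinterpreting the result as signed recovers the direction.
 */
static inline bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(b - a) < 0;
}
```

For example, with now = UINT32_MAX - 5 and a deadline of 5 (eleven ticks later, after the counter wrapped), tick_after(deadline, now) is true while naive_after(deadline, now) is false.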
2024-08-22  bcachefs: Refactor bch2_bset_fix_lookup_table  Alan Huang
bch2_bset_fix_lookup_table is too complicated to be easily understood, and the comment "l now > where" there is incorrect when where == t->end_offset. This patch therefore refactors the function; the idea is that when where >= rw_aux_tree(b, t)[t->size - 1].offset, we don't need to adjust the rw aux tree. Signed-off-by: Alan Huang <mmpgouride@gmail.com>
2024-08-22  bcachefs: Assert that we don't lock nodes when !trans->locked  Kent Overstreet
We rely on trans->locked to know if a trans has nodes locked, for assertions about deadlocks; there can't be more than one trans in the same process that is locked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Do not check folio_has_private()  Matthew Wilcox (Oracle)
folio_has_private() is an attractive nuisance; filesystem authors generally don't realise that it actually checks two flags (one of which is never set by bcachefs). There's no need to check the private flag at all; for folios owned by bcachefs, we know that folio->private is NULL when the private flag is clear and non-NULL when the private flag is set. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_time_stats_reset()  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Drop memalloc_nofs_save() in bch2_btree_node_mem_alloc()  Kent Overstreet
It's really not needed: the only locks used here are the btree cache lock, which we drop for GFP_WAIT allocations, and btree node locks - but we also drop those for GFP_WAIT allocations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Simplify bch2_xattr_emit() implementation  Youling Tang
Use helper functions to make code more readable. Similar to commit a5488f29835c ("fs: simplify ->listxattr() implementation") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: drop unused posix acl handlers  Youling Tang
Remove struct nop_posix_acl_{access,default} for bcachefs, which doesn't depend on the xattr handler in its inode->i_op->listxattr() method in any way. There's nothing more to do than to simply remove the handler. It's been effectively unused ever since we introduced the new posix acl api. See [1] for details. Link [1]: https://patchwork.kernel.org/project/linux-fsdevel/cover/20230125-fs-acl-remove-generic-xattr-handlers-v3-0-f760cc58967d@kernel.org/ Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Remove unused parameter  Alan Huang
iter here is unused, remove it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Remove the prev array stuff  Alan Huang
After reducing the search range when building the aux tree, the prev array stuff is no longer useful, so remove it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Minimize the search range used to calculate the mantissa  Alan Huang
When the search key's mantissa is larger than node i's, we know that the search key is larger than the first key of the cacheline corresponding to node i, so when we calculate the mantissa of the nodes to the right of node i, the left side of the search range can be the first key of node i. Once the search range is minimized, the mantissa we are calculating can have more useful bits, thus reducing slow path comparisons. Besides, we can now remove all the prev array stuff. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Convert open-coded extra computation to helper  Alan Huang
This patch replaces the open-coded extra computation with eytzinger1_extra. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Remove dead code in __build_ro_aux_tree  Alan Huang
This logic is no longer useful since commit 3ce8b463e3e0 ("bcachefs: kill bset_tree->max_key"), so remove it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Remove unused parameter of bkey_mantissa_bits_dropped  Alan Huang
The idx parameter of bkey_mantissa_bits_dropped is unused, remove it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Remove unused parameter of bkey_mantissa  Alan Huang
The idx parameter of bkey_mantissa has been unused since commit b904a7991802 ("bcachefs: Go back to 16 bit mantissa bkey floats"), so remove it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_sb_nr_devices()  Kent Overstreet
factoring out a helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: trivial open_bucket_add_buckets() cleanup  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Fix a spelling error in docs  Xiaxi Shen
Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: promote_whole_extents is now a normal option  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: Move rebalance_status out of sysfs/internal  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: remove the unused parameter in macro bkey_crc_next  Julian Sun
In the macro definition of bkey_crc_next, five parameters were accepted, but only four of them were used. Let's remove the unused one. The patch has only passed compilation tests, but it should be fine. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: fix macro definition allocate_dropping_locks  Julian Sun
The macro allocate_dropping_locks accepts a parameter _trans, but it was not used, rather the variable trans was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: fix macro definition allocate_dropping_locks_errcode  Julian Sun
The macro allocate_dropping_locks_errcode accepts a parameter _trans, but it was not used, rather the variable trans was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
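The class of bug fixed by the two macro commits above can be reproduced in a few lines. This is a made-up illustration, not the bcachefs macros: struct txn, note_restart_buggy, note_restart_fixed and demo are hypothetical names chosen for the sketch.

```c
#include <assert.h>	/* for the assert()-based checks in the usage note */

/* Hypothetical stand-in for a transaction object. */
struct txn {
	int restarts;
};

/*
 * Buggy: the body ignores the _trans parameter and instead captures whatever
 * variable named `trans` happens to be in scope at the call site.
 */
#define note_restart_buggy(_trans)	((trans)->restarts++)

/* Fixed: the body only touches the argument it was given. */
#define note_restart_fixed(_trans)	((_trans)->restarts++)

static inline void demo(struct txn *trans, struct txn *other)
{
	note_restart_buggy(other);	/* silently bumps trans, not other */
	note_restart_fixed(other);	/* bumps other, as the caller intended */
}
```

After demo(&a, &b) on two zeroed objects, both a.restarts and b.restarts are 1: the buggy macro modified a even though it was passed b. The buggy form also fails to compile in any caller that has no variable called trans, which is often how such bugs are eventually noticed.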
2024-08-22  bcachefs: remove the unused macro definition  Julian Sun
macro bch2_kthread_wait_event_ioclock_timeout is no longer used, let's remove it. The patch has passed compilation test. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: quota_reserve_range() -> for_each_btree_key_in_subvolume_upto  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_folio_set() -> for_each_btree_key_in_subvolume_upto  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: range_has_data() -> for_each_btree_key_in_subvolume_upto  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_seek_hole() -> for_each_btree_key_in_subvolume_upto  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_seek_data() -> for_each_btree_key_in_subvolume_upto  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_xattr_list() -> for_each_btree_key_in_subvolume_upto  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_readdir() -> for_each_btree_key_in_subvolume_upto  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: for_each_btree_key_in_subvolume_upto()  Kent Overstreet
New helper for looping over keys in a given subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bch2_fiemap(): call trans_begin() on every loop iter  Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22  bcachefs: bchfs_read(): call trans_begin() on every loop iter  Kent Overstreet
Same as the recent change for __bch2_read(); also, kill now unnecessary btree_trans_too_many_iters() calls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>