Age | Commit message (Collapse) | Author |
|
btree_path_get_locks() leaves a path unlocked (and with error pointers
for nodes) on failure - this used to be necessary for path_traverse() to
correctly lock all needed levels.
But it's not needed anymore, and we need a new helper that doesn't break
future locking invariants and leave us in a bad state (unlocked) when
used on a should_be_locked path, because bch2_path_get() has no way of
returning a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This nets us a slight performance increase, and converts to common
code.
Todo - re-add the shrinker, so that memory reclaim can free expired
objects and expedite a grace period when necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
start_poll_synchronize_rcu_common()
this assertion appears to have been entirely unnecessary
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Generic data structure for explicitly tracking pending RCU items,
allowing items to be dequeued (i.e. allocate from items pending
freeing). Works with conventional RCU and SRCU, and possibly other RCU
flavors in the future, meaning this can serve as a more generic
replacement for SLAB_TYPESAFE_BY_RCU.
Pending items are tracked in radix trees; if memory allocation fails, we
fall back to linked lists.
A rcu_pending is initialized with a callback, which is invoked when
pending items's grace periods have expired. Two types of callback
processing are handled specially:
- RCU_PENDING_KVFREE_FN
New backend for kvfree_rcu(). Slightly faster, and eliminates the
synchronize_rcu() slowpath in kvfree_rcu_mightsleep() - instead, an
rcu_head is allocated if we don't have one and can't use the radix
tree
TODO:
- add a shrinker (as in the existing kvfree_rcu implementation) so that
memory reclaim can free expired objects if callback processing isn't
keeping up, and to expedite a grace period if we're under memory
pressure and too much memory is stranded by RCU
- add a counter for amount of memory pending
- RCU_PENDING_CALL_RCU_FN
Accelerated backend for call_rcu() - pending callbacks are tracked in
a radix tree to eliminate linked list overhead.
to serve as replacement backends for kvfree_rcu() and call_rcu(); these
may be of interest to other uses (e.g. SLAB_TYPESAFE_BY_RCU users).
Note:
Internally, we're using a single rearming call_rcu() callback for
notifications from the core RCU subsystem for notifications when objects
are ready to be processed.
Ideally we would be getting a callback every time a grace period
completes for which we have objects, but that would require multiple
rcu_heads in flight, and since the number of gp sequence numbers with
uncompleted callbacks is not bounded, we can't do that yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
dynamic arrays - inspired from CCAN darrays, basically c++ stl vectors.
Used by thread_with_stdio, which is also being lifted from bcachefs for
xfs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Implement shrinker.to_text() for the superblock shrinker: print out nr
of dentries and inodes, total and shrinkable.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Like the previous patch, add a counter for total dentries, so we can
print total vs. reclaimable.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Upcoming shrinker debugging patchset is going to give us a callback for
reporting on all memory owned by a shrinker.
This adds a counter for total number of inodes allocated for a given
superblock, so we can compare with the number of reclaimable inodes we
already have.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a little helper to replace open coded versions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.krenel.org
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds shrinker.to_text() methods for our shrinkers and hooks them up
to our existing to_text() functions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we added shrinker_to_text() and hooked it up to the OOM
report - now, the same report is available via debugfs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch:
- Changes show_mem() to always report on slab usage
- Instead of reporting on all slabs, we only report on top 10 slabs,
and in sorted order
- Also reports on shrinkers, with the new shrinkers_to_text().
Shrinkers need to be included in OOM/allocation failure reporting
because they're responsible for memory reclaim - if a shrinker isn't
giving up its memory, we need to know which one and why.
More OOM reporting can be moved to show_mem.c and improved, this patch
is only a start.
New example output on OOM/memory allocation failure:
00177 Mem-Info:
00177 active_anon:13706 inactive_anon:32266 isolated_anon:16
00177 active_file:1653 inactive_file:1822 isolated_file:0
00177 unevictable:0 dirty:0 writeback:0
00177 slab_reclaimable:6242 slab_unreclaimable:11168
00177 mapped:3824 shmem:3 pagetables:1266 bounce:0
00177 kernel_misc_reclaimable:0
00177 free:4362 free_pcp:35 free_cma:0
00177 Node 0 active_anon:54824kB inactive_anon:129064kB active_file:6612kB inactive_file:7288kB unevictable:0kB isolated(anon):64kB isolated(file):0kB mapped:15296kB dirty:0kB writeback:0kB shmem:12kB writeback_tmp:0kB kernel_stack:3392kB pagetables:5064kB all_unreclaimable? no
00177 DMA free:2232kB boost:0kB min:88kB low:108kB high:128kB reserved_highatomic:0KB active_anon:2924kB inactive_anon:6596kB active_file:428kB inactive_file:384kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
00177 lowmem_reserve[]: 0 426 426 426
00177 DMA32 free:15092kB boost:5836kB min:8432kB low:9080kB high:9728kB reserved_highatomic:0KB active_anon:52196kB inactive_anon:122392kB active_file:6176kB inactive_file:7068kB unevictable:0kB writepending:0kB present:507760kB managed:441816kB mlocked:0kB bounce:0kB free_pcp:72kB local_pcp:0kB free_cma:0kB
00177 lowmem_reserve[]: 0 0 0 0
00177 DMA: 284*4kB (UM) 53*8kB (UM) 21*16kB (U) 11*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2248kB
00177 DMA32: 2765*4kB (UME) 375*8kB (UME) 57*16kB (UM) 5*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15132kB
00177 4656 total pagecache pages
00177 1031 pages in swap cache
00177 Swap cache stats: add 6572399, delete 6572173, find 488603/3286476
00177 Free swap = 509112kB
00177 Total swap = 2097148kB
00177 130938 pages RAM
00177 0 pages HighMem/MovableOnly
00177 16644 pages reserved
00177 Unreclaimable slab info:
00177 9p-fcall-cache total: 8.25 MiB active: 8.25 MiB
00177 kernfs_node_cache total: 2.15 MiB active: 2.15 MiB
00177 kmalloc-64 total: 2.08 MiB active: 2.07 MiB
00177 task_struct total: 1.95 MiB active: 1.95 MiB
00177 kmalloc-4k total: 1.50 MiB active: 1.50 MiB
00177 signal_cache total: 1.34 MiB active: 1.34 MiB
00177 kmalloc-2k total: 1.16 MiB active: 1.16 MiB
00177 bch_inode_info total: 1.02 MiB active: 922 KiB
00177 perf_event total: 1.02 MiB active: 1.02 MiB
00177 biovec-max total: 992 KiB active: 960 KiB
00177 Shrinkers:
00177 super_cache_scan: objects: 127
00177 super_cache_scan: objects: 106
00177 jbd2_journal_shrink_scan: objects: 32
00177 ext4_es_scan: objects: 32
00177 bch2_btree_cache_scan: objects: 8
00177 nr nodes: 24
00177 nr dirty: 0
00177 cannibalize lock: 0000000000000000
00177
00177 super_cache_scan: objects: 8
00177 super_cache_scan: objects: 1
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: linux-mm@kvack.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a few new shrinker stats.
number of objects requested to free, number of objects freed:
Shrinkers won't necessarily free all objects requested for a variety of
reasons, but if the two counts are wildly different something is likely
amiss.
.scan_objects runtime:
If one shrinker is taking an excessive amount of time to free
objects that will block kswapd from running other shrinkers.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: linux-mm@kvack.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new callback method to shrinkers which they can use to
describe anything relevant to memory reclaim about their internal state,
for example object dirtyness.
This patch also adds shrinkers_to_text(), which reports on the top 10
shrinkers - by object count - in sorted order, to be used in OOM
reporting.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: linux-mm@kvack.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a seq_buf wrapper for string_get_size().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Use jiffies macros instead of using jiffies directly to handle wraparound.
Signed-off-by: Chen Yufan <chenyufan@vivo.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_bset_fix_lookup_table is too complicated to be easily understood,
the comment "l now > where" there is also incorrect when where ==
t->end_offset. This patch therefore refactor the function, the idea is
that when where >= rw_aux_tree(b, t)[t->size - 1].offset, we don't need
to adjust the rw aux tree.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
|
|
We rely on the trans->locked to know if a trans has nodes locked for
assertions about deadlocks; there can't be more than one trans in the
same process that is locked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
folio_has_private() is an attractive nuisance; filesystem authors
generally don't realise that it actually checks two flags (one of which
is never set by bcachefs). There's no need to check the private flag at
all; for folios owned by bcachefs, we know that folio->private is NULL
when the private flag is clear and non-NULL when the private flag is set.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's really not needed: the only locks used here are the btree cache
lock, which we drop for GFP_WAIT allocations, and btree node locks - but
we also drop those for GFP_WAIT allocations.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Use helper functions to make code more readable.
Similar to commit a5488f29835c ("fs: simplify ->listxattr() implementation")
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Remove struct nop_posix_acl_{access,default} for bcachefs filesystem
that don't depend on the xattr handler in their inode->i_op->listxattr()
method in any way. There's nothing more to do than to simply remove the
handler. It's been effectively unused ever since we introduced the new
posix acl api. See [1] for details.
Link [1]: https://patchwork.kernel.org/project/linux-fsdevel/cover/20230125-fs-acl-remove-generic-xattr-handlers-v3-0-f760cc58967d@kernel.org/
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
iter here is unused, remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
After reducing the search range when building the aux tree, the prev array
stuff is no longer useful, so remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When the search key's mantissa is larger than the node i's, we know that
the search key is larger than the first key of the cacheline corresponding
to node i, so that when we are calculating the mantissa of right side
nodes of node i, the left side of the search range can be the first key
of node i. Once the search range is minimized, the mantissa we are
calculating can have more useful bits, thus reduce the slow path
comparison. Besides, we can now remove all the prev array stuff.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch replaces open-coded extra computation to eytzinger1_extra.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This logic is no longer useful since commit
3ce8b463e3e0 ("bcachefs: kill bset_tree->max_key"), so remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The idx parameter of bkey_mantissa_bits_dropped is unused, remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The idx parameter of bkey_mantissa became unused since commit
b904a7991802 ("bcachefs: Go back to 16 bit mantissa bkey floats"),
so remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
factoring out a helper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In the macro definition of bkey_crc_next, five parameters
were accepted, but only four of them were used. Let's remove
the unused one.
The patch has only passed compilation tests, but it should be fine.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The macro allocate_dropping_locks accepts a parameter _trans,
but it was not used, rather the variable trans was directly used,
which may be a local variable inside a function that calls the macros.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The macro allocate_dropping_locks_errocode accepts a parameter _trans,
but it was not used, rather the variable trans was directly used,
which may be a local variable inside a function that calls the macros.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
macro bch2_kthread_wait_event_ioclock_timeout is no longer used,
let's remove it.
The patch has passed compilation test.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New helper for looping over keys in a given subvolume
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Same as the recent change for __bch2_read(); also, kill now unnecessary
btree_trans_too_many_iters() calls.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|