summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-25Start of %q(%p)shrinker_to_textKent Overstreet
2022-04-22bcachefs: Convert to lib/printbuf.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22bcachefs: shrinker.to_text() methodsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22mm: Centralize & improve oom reporting in show_mem.cKent Overstreet
This patch: - Changes show_mem() to always report on slab usage - Instead of reporting on all slabs, we only report on top 10 slabs, and in sorted order - Also reports on shrinkers, with the new shrinkers_to_text(). Shrinkers need to be included in OOM/allocation failure reporting because they're responsible for memory reclaim - if a shrinker isn't giving up its memory, we need to know which one and why. More OOM reporting can be moved to show_mem.c and improved, this patch is only a start. New example output on OOM/memory allocation failure: 00177 Mem-Info: 00177 active_anon:13706 inactive_anon:32266 isolated_anon:16 00177 active_file:1653 inactive_file:1822 isolated_file:0 00177 unevictable:0 dirty:0 writeback:0 00177 slab_reclaimable:6242 slab_unreclaimable:11168 00177 mapped:3824 shmem:3 pagetables:1266 bounce:0 00177 kernel_misc_reclaimable:0 00177 free:4362 free_pcp:35 free_cma:0 00177 Node 0 active_anon:54824kB inactive_anon:129064kB active_file:6612kB inactive_file:7288kB unevictable:0kB isolated(anon):64kB isolated(file):0kB mapped:15296kB dirty:0kB writeback:0kB shmem:12kB writeback_tmp:0kB kernel_stack:3392kB pagetables:5064kB all_unreclaimable? no 00177 DMA free:2232kB boost:0kB min:88kB low:108kB high:128kB reserved_highatomic:0KB active_anon:2924kB inactive_anon:6596kB active_file:428kB inactive_file:384kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB 00177 lowmem_reserve[]: 0 426 426 426 00177 DMA32 free:15092kB boost:5836kB min:8432kB low:9080kB high:9728kB reserved_highatomic:0KB active_anon:52196kB inactive_anon:122392kB active_file:6176kB inactive_file:7068kB unevictable:0kB writepending:0kB present:507760kB managed:441816kB mlocked:0kB bounce:0kB free_pcp:72kB local_pcp:0kB free_cma:0kB 00177 lowmem_reserve[]: 0 0 0 0 00177 DMA: 284*4kB (UM) 53*8kB (UM) 21*16kB (U) 11*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2248kB 00177 DMA32: 2765*4kB (UME) 375*8kB (UME) 57*16kB (UM) 5*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15132kB 00177 4656 total pagecache pages 00177 1031 pages in swap cache 00177 Swap cache stats: add 6572399, delete 6572173, find 488603/3286476 00177 Free swap = 509112kB 00177 Total swap = 2097148kB 00177 130938 pages RAM 00177 0 pages HighMem/MovableOnly 00177 16644 pages reserved 00177 Unreclaimable slab info: 00177 9p-fcall-cache total: 8.25 MiB active: 8.25 MiB 00177 kernfs_node_cache total: 2.15 MiB active: 2.15 MiB 00177 kmalloc-64 total: 2.08 MiB active: 2.07 MiB 00177 task_struct total: 1.95 MiB active: 1.95 MiB 00177 kmalloc-4k total: 1.50 MiB active: 1.50 MiB 00177 signal_cache total: 1.34 MiB active: 1.34 MiB 00177 kmalloc-2k total: 1.16 MiB active: 1.16 MiB 00177 bch_inode_info total: 1.02 MiB active: 922 KiB 00177 perf_event total: 1.02 MiB active: 1.02 MiB 00177 biovec-max total: 992 KiB active: 960 KiB 00177 Shrinkers: 00177 super_cache_scan: objects: 127 00177 super_cache_scan: objects: 106 00177 jbd2_journal_shrink_scan: objects: 32 00177 ext4_es_scan: objects: 32 00177 bch2_btree_cache_scan: objects: 8 00177 nr nodes: 24 00177 nr dirty: 0 00177 cannibalize lock: 0000000000000000 00177 00177 super_cache_scan: objects: 8 00177 super_cache_scan: objects: 1 Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22mm: Move lib/show_mem.c to mm/Kent Overstreet
show_mem.c is really mm specific, and the next patch in the series is going to require mm/slab.h, so let's move it before doing more work on it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22mm: Count requests to free & nr freed per shrinkerKent Overstreet
The next step in this patch series for improving debugging of shrinker related issues: keep counts of number of objects we request to free vs. actually freed, and prints them in shrinker_to_text(). Shrinkers won't necessarily free all objects requested for a variety of reasons, but if the two counts are wildly different something is likely amiss. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22mm: Add a .to_text() method for shrinkersKent Overstreet
This adds a new callback method to shrinkers which they can use to describe anything relevant to memory reclaim about their internal state, for example object dirtyness. This uses the new printbufs to output to heap allocated strings, so that the .to_text() methods can be used both for messages logged to the console, and also sysfs/debugfs. This patch also adds shrinkers_to_text(), which reports on the top 10 shrinkers - by object count - in sorted order, to be used in OOM reporting. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22clk: tegra: bpmp: Convert to printbufKent Overstreet
This converts from seq_buf to printbuf, which is similar but heap allocates the string buffer. Previously in this code the string buffer was allocated on the stack; this means we've added a new potential memory allocation failure. This is fine though since it's only for a dev_printk() message. Memory allocation context: printbuf doesn't take gfp flags, instead we prefer the new memalloc_no*_(save|restore) interfaces to be used. Here the surrounding code is already allocating with GFP_KERNEL, so everything is fine. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22mm/memcontrol.c: Convert to printbufKent Overstreet
This converts memory_stat_format() from seq_buf to printbuf. Printbuf is simalar to seq_buf except that it heap allocates the string buffer: here, we were already heap allocating the buffer with kmalloc() so the conversion is trivial. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22Input/joystick/analog: Convert from seq_buf -> printbufKent Overstreet
seq_buf is being deprecated, this converts to printbuf which is similar but heap allocates the string buffer. This means we have to consider memory allocation context & failure: Here we're in device initialization so GFP_KERNEL should be fine, and also as we're in device initialization returning -ENOMEM is fine. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22lib/printbuf: New data structure for heap-allocated stringsKent Overstreet
This adds printbufs: simple heap-allocated strings meant for building up structured messages, for logging/procfs/sysfs and elsewhere. They've been heavily used in bcachefs for writing .to_text() functions/methods - pretty printers, which has in turn greatly improved the overall quality of error messages. Basic usage is documented in include/linux/printbuf.h. The next patches in the series are going to be using printbufs to implement a .to_text() method for shrinkers, and improving OOM reporting. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22fixup! bcachefs: Go RW before bch2_check_lrus()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-22fixup! bcachefs: bch_sb_field_journal_v2Kent Overstreet
2022-04-22fixup! bcachefs: Shutdown path improvementsKent Overstreet
2022-04-21fixup! bcachefs: Go RW before bch2_check_lrus()Kent Overstreet
2022-04-21bcachefs: Improve invalid bkey error messageKent Overstreet
Bkeys have gotten a lot bigger since this code was written and now are often formatted across multiple lines - while the reason a bkey is invalid will still be short and fit on a single line. This patch prints the error bfore the bkey, making it a bit more readable. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-21fixup! bcachefs: bch2_btree_iter_peek_all_levels()Kent Overstreet
2022-04-21bcachefs: Fix journal_iters_fix()Kent Overstreet
journal_iters_fix() was incorrectly rewinding iterators past keys they had already returned, leading to those keys being double counted in the bch2_gc() path - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-21bcachefs: Go RW before bch2_check_lrus()Kent Overstreet
btree updates before going RW are expensive if they're in random order, since they use the list of keys for journal replay to insert, which is just a gap buffer. This patch improves the bucket invalidate path so that if bch2_check_lrus() hasn't finished it only prints warnings instead of doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW before bch2_check_lrus(). Also, the filesystem state bits are reorganized a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-21bcachefs: Add persistent countersDaniel Hill
This adds a new superblock field for persisting counters and adds a sysfs interface in counters/ exposing these counters. The superblock field is ignored by older versions letting us avoid an on disk version bump. Each sysfs file outputs a counter that tracks since filesystem creation and a counter for the current mount session. Signed-off-by: Daniel Hill <daniel@gluo.nz>
2022-04-18bcachefs: Tracepoint improvementsKent Overstreet
Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-18bcachefs: Don't kick journal reclaim unless low on spaceKent Overstreet
We shouldn't kick journal reclaim unnecessarily, it's got its own timer for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-18bcachefs: Lock ordering fixKent Overstreet
Can't take btree node locks while holding btree_reserve_cache_lock - it would be nice if we could check this with lockdep. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Shutdown path improvementsKent Overstreet
We're seeing occasional firings of the assertion in the key cache shutdown code that nr_dirty == 0, which means we must sometimes be doing transaction commits after we've gone read only. Cleanups & changes: - BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN - new helper bch2_btree_interior_updates_flush(), which returns true if it had to wait - bch2_btree_flush_writes() now also returns true if there were btree writes in flight - __bch2_fs_read_only now checks if btree writes were in flight in the shutdown loop: btree write completion does a transaction update, to update the pointer in the parent node - assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Fix hash_check_key()Kent Overstreet
hash_check_key() was incorrectly handling transaction restarts - switch it to for_each_btree_key_norestart(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Allocate some extra room in btree_key_cache_fill()Kent Overstreet
If we allocate a buffer that's a bit bigger than necessary the transaction commit path will be much less likely to have to reallocate - which requires a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17fixup! bcachefs: Fix usage of six lock's percpu modeKent Overstreet
2022-04-17bcachefs: bch2_btree_iter_peek_all_levels()Kent Overstreet
This adds bch2_btree_iter_peek_all_levels(), which returns keys from every level of the btree - interior nodes included - in monotonically increasing order, soon to be used by the backpointers check & repair code. - BTREE_ITER_ALL_LEVELS can now be passed to for_each_btree_key() to iterate thusly, much like BTREE_ITER_SLOTS - The existing algorithm in bch2_btree_iter_advance() doesn't work with peek_all_levels(): we have to defer the actual advancing until the next time we call peek, where we have the btree path traversed and uptodate. So, we add an advanced bit to btree_iter; when BTREE_ITER_ALL_LEVELS is set bch2_btree_iter_advanced() just marks the iterator as advanced. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: btree_path_set_level_(up|down)Kent Overstreet
This adds two new helpers to btree_iter.c for changing the level of a path up or down - to be used by the new bch2_btree_iter_peek_all_levels(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: bch2_btree_iter_peek_slot() now works on interior nodesKent Overstreet
The new backpointers code will be using bch2_btree_iter_peek_slot() on interior nodes - this patch updates peek_slot() to make that work. - Pass the correct level to bch2_journal_keys_peek_slot() - We should only set BTREE_ITER_CACHED or BTREE_ITER_WITH_KEY_CACHE when using bch2_trans_iter_init(), not bch2_trans_node_iter_init() - Update assertions Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: btree_update_interior.c prep for backpointersKent Overstreet
Previously, btree_update_interior.c passed keys to bch2_trans_mark_* that hadn't been fully initialized - they didn't have the key field filled out, just the value. With backpointers, we need to make sure keys are fully initialized before marking them - because the backpointer points back to the original key. This patch tweaks the interior update paths to fix this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Plumb btree_id & level to trans_markKent Overstreet
For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Improve some fsck error messagesKent Overstreet
We have string names for d_type; use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Go emergency RO when i_blocks underflowsKent Overstreet
This improves some of our warnings and assertions - they imply possible filesystem inconsistencies, so they should be calling bch2_fs_inconsistent(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Ensure sysfs show fns print a newlineKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Improve try_alloc_bucket()Kent Overstreet
This patch primarily improves the error reporting in try_alloc_bucket(): we now print out the actual freespace extent we were allocating from, which makes it easier to grep through the logs when something goes wrong. Also: - switch to for_each_btree_key_norestart(); bch2_bucket_alloc() already handles transaction restarts, and in general transaction restarts should only be handled in one place - switch to < instead of != when looping over buckets represented by a freespace extent Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Kill old rebuild_replicas optionKent Overstreet
This option was useful when the replicas mechism was new and still being debugged, but hasn't been used in ages - let's delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: In fsck, pass BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE when deleting ↵Kent Overstreet
dirents A user reported an error where we hit an assertion due to deleting a key in an internal snapshot node, when deleting a dirent that points to a nonexisting inode. We try to avoid doing updates to keys for internal snapshot nodes, but upon inspection of the places where we remove dirents in fsck it appears BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE is correct for all of them: either the target dirent doesn't exist, or it's a directory with multiple dirents pointing to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Fix for getting stuck in journal replayKent Overstreet
In journal replay, we weren't immediately dropping journal pins when we start doing updates that ewern't from journal replay - leading to journal reclaim getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Delete a redundant tracepointKent Overstreet
Now that the bucket_alloc_fail tracepoint includes the error code, the open_bucket_alloc_fail tracepoint is redundant. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Improve error logging in fsck.cKent Overstreet
This adds error logging to a bunch of functions in fsck.c - in fsck, reduntant error messages is probably better than not enough. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Fix inode_backpointer_exists()Kent Overstreet
If the dirent an inode points to doesn't exist, we shouldn't be returning an error - just 0/false. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Improve bch2_lru_delete() error messagesKent Overstreet
When we detect a filesystem inconsistency, we should include the relevent keys in the error message. This patch adds a parameter to pass the key with the lru entry to bch2_lru_delete(), so that it can be printed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Introduce bch2_journal_keys_peek_(upto|slot)()Kent Overstreet
When many journal replay keys have been overwritten, bch2_journal_keys_peek() was taking excessively long to scan before it found a key to return. Fix this by introducing bch2_journal_keys_peek_upto() which takes a parameter for the end of the range we want, so that we can terminate the search much sooner, and replace all uses of bch2_journal_keys_peek() with peek_upto() or peek_slot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Improve error message when alloc key doesn't match lru entryKent Overstreet
Error messages should always print out the full key when available - this gives us a starting point when looking through the journal to debug what went wrong. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Ensure buckets have io_time[READ] setKent Overstreet
It's an error if a bucket is in state BCH_DATA_cached but not on the LRU btree - i.e io_time[READ] == 0 - so, make sure it's set before adding it. Also, make some of the LRU code a bit clearer and more direct. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Use bch2_trans_inconsistent_on() in more placesKent Overstreet
This gets us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Improve bch2_open_buckets_to_text()Kent Overstreet
This patch updates bch2_open_buckets_to_text() to include the device and bucket the open_bucket owns. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Fix CPU usage in journal read pathKent Overstreet
In journal_entry_add(), we were repeatedly scanning the journal entries radix tree to scan for old entries that can be freed, with O(n^2) behaviour. This patch tweaks things to remember the previous last_seq, so we don't have to scan for entries to free from the start. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-04-17bcachefs: Fix a null ptr derefKent Overstreet
We start doing allocations before the GC thread is created, which means we need to check for that to avoid a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>