path: root/include
Age / Commit message / Author
2022-10-03bcachefs: Improve bucket_alloc tracepointKent Overstreet
It now includes more info - whether the bucket was for metadata or data - and is now called in the same place as the bucket_alloc_fail tracepoint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03bcachefs: Delete old deadlock avoidance codeKent Overstreet
This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Deadlock cycle detectorKent Overstreet
We've outgrown our own deadlock avoidance strategy.

The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks.

Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order.

That approach had some issues, though.

- Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occurred. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totalling 20% of actual transaction commits.
- To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention.
- It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery.

So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks.

When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart.

Occasionally (e.g. the btree split path) we cannot allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail.

Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
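The DFS described in this commit message can be illustrated with a small userspace sketch. This is not the bcachefs implementation: the lock_graph structure, the fixed-size tables and the function names are all hypothetical, and the model assumes each transaction blocks on at most one lock at a time. The caller is expected to have recorded itself as a waiter before running the check, mirroring the "after adding ourself to the waitlist but before sleeping" ordering.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TXNS    8
#define MAX_LOCKS   8
#define NOT_WAITING (-1)

/* Hypothetical, simplified model of who holds and who waits on each lock. */
struct lock_graph {
	bool holds[MAX_TXNS][MAX_LOCKS]; /* holds[t][l]: txn t holds lock l */
	int  waits_on[MAX_TXNS];         /* lock txn t is blocked on, or NOT_WAITING */
};

void lock_graph_init(struct lock_graph *g)
{
	for (int t = 0; t < MAX_TXNS; t++) {
		g->waits_on[t] = NOT_WAITING;
		for (int l = 0; l < MAX_LOCKS; l++)
			g->holds[t][l] = false;
	}
}

/*
 * Walk the locks held by @cur and the transactions blocked on those locks.
 * Reaching @start again means @start is (transitively) waiting on itself.
 */
static bool dfs(const struct lock_graph *g, int start, int cur, bool *visited)
{
	visited[cur] = true;

	for (int l = 0; l < MAX_LOCKS; l++) {
		if (!g->holds[cur][l])
			continue;

		for (int t = 0; t < MAX_TXNS; t++) {
			if (g->waits_on[t] != l)
				continue;
			if (t == start)
				return true;
			if (!visited[t] && dfs(g, start, t, visited))
				return true;
		}
	}
	return false;
}

/* Returns true if blocking @txn on its recorded lock would close a cycle. */
bool would_deadlock(const struct lock_graph *g, int txn)
{
	bool visited[MAX_TXNS] = { false };

	assert(txn >= 0 && txn < MAX_TXNS);
	return dfs(g, txn, txn, visited);
}
```

In this model a caller that gets true back would restart its own transaction; choosing a different transaction to abort, as the commit describes for the btree split path, is left out of the sketch.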
2022-10-03six locks: Add start_time to six_lock_waiterKent Overstreet
This is needed by the cycle detector in bcachefs - we need a way to iterate over waitlist entries while dropping and retaking the waitlist lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03six locks: Simplify six_optimistic_spin()Kent Overstreet
With the new method where the thread doing the wakeup after unlock takes the lock on behalf of the thread waiting for the lock, we don't want to spin calling trylock() anymore - we can instead spin on wait->lock_acquired, and not have to touch the lock's cacheline. Also, osq_lock doesn't make much sense for six locks; multiple readers may be waiting on a single thread to drop the write lock, and dropping it simplifies the code a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03six locks: six_lock_waiter()Kent Overstreet
This allows passing in the wait list entry - to be used for a deadlock cycle detector. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03six locks: Wakeup now takes lock on behalf of waiterKent Overstreet
This brings back an important optimization, to avoid touching the wait lists an extra time, while preserving the property that a thread is on a lock waitlist iff it is waiting - it is never removed from the waitlist until it has the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03six locks: Simplify wait listsKent Overstreet
This switches to a single list of waiters, instead of separate lists for read and intent, and switches write locks to also use the wait lists instead of being handled differently. Also, removal from the wait list is now done by the process waiting on the lock, not the process doing the wakeup. This is needed for the new deadlock cycle detector - we need tasks to stay on the waitlist until they've successfully acquired the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03locking/lockdep: lockdep_set_no_check_recursion()Kent Overstreet
This adds a method to tell lockdep not to check lock ordering within a lock class - but to still check lock ordering w.r.t. other lock types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03bcachefs: bch2_btree_path_upgrade() now emits transaction restartKent Overstreet
Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03six locks: Delete six_lock_pcpu_free_rcu()Kent Overstreet
Didn't have any users, and wasn't a good idea to begin with - delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03bcachefs: Add persistent counters for all tracepointsKent Overstreet
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03bcachefs: Improve trans_restart_journal_preres_get tracepointKent Overstreet
It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03bcachefs: Improve btree_node_relock_fail tracepointKent Overstreet
It now prints the error name when the btree node is an error pointer; also, don't trace failures when the btree node is BCH_ERR_no_btree_node_up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2022-10-03bcachefs: Switch btree locking code to struct btree_bkey_cached_commonKent Overstreet
This is just some type safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03six locks: Improve six_lock_countKent Overstreet
six_lock_count now counts whether a write lock is held, and this patch now also correctly counts six_lock->intent_lock_recurse. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Switch "no match for" message to a tracepointKent Overstreet
This message fires when the data update path races with a foreground write that overwrote the data that was being moved - this isn't a concerning event as long as it's not happening too often, so switch it to a tracepoint. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Don't drop locks unnecessarily in bch2_btree_update_start()Kent Overstreet
This is to fix a livelock in the btree split path described by the previous patch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/printbuf: Tabstop improvementsKent Overstreet
- Add a flag, has_indent_or_tabstops, that is set if indent level or tabstops are set.
- Tabstops can no longer be set by modifying the tabstop array directly; instead, new functions are provided:

    printbuf_tabstop_push()   - add a new tabstop, n spaces after previous tabstop
    printbuf_tabstop_pop()    - remove previous tabstop
    printbuf_tabstops_reset() - remove all tabstops

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/printbuf: prt_str_indented()Kent Overstreet
This adds a new helper, prt_str_indented(), which handles embedded control characters by calling prt_newline(), prt_tab(), and prt_tab_rjust() as needed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Tracepoint improvementsKent Overstreet
Our types are exported to the tracepoint code, so it's not necessary to break things out individually when passing them to tracepoints - we can also call other functions from TP_fast_assign(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Tracepoint improvementsKent Overstreet
- use strlcpy(), not strncpy()
- add tracepoints for btree_path alloc and free
- give the tracepoint for key cache upgrade fail a proper name
- add a tracepoint for btree_node_upgrade_fail

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Inject transaction restarts in debug modeKent Overstreet
In CONFIG_BCACHEFS_DEBUG mode, we'll now randomly issue transaction restarts - with a decaying probability based on the number of restarts we've already had, to ensure that transactions eventually make forward progress. This should help shake out some bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
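One way to get a "decaying probability" is to double the injection period on every restart. The sketch below is only an illustration of that idea: the commit message does not say which decay function bcachefs uses, and the constants, function names and the caller-supplied random value are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning constants, not taken from bcachefs. */
#define INJECT_BASE_PERIOD 16u /* first attempt: 1 in 16 chance */
#define INJECT_MAX_SHIFT   20u /* cap the shift so the period fits in 32 bits */

/* The injection period doubles with every restart already taken. */
uint32_t inject_period(unsigned restart_count)
{
	unsigned shift = restart_count < INJECT_MAX_SHIFT
		? restart_count : INJECT_MAX_SHIFT;
	uint32_t period = (uint32_t)INJECT_BASE_PERIOD << shift;

	assert(period >= INJECT_BASE_PERIOD); /* no overflow */
	return period;
}

/*
 * Decide whether to inject a restart. @random_value is supplied by the
 * caller so that the decision is easy to test deterministically.
 */
bool should_inject_restart(unsigned restart_count, uint32_t random_value)
{
	return random_value % inject_period(restart_count) == 0;
}
```

Because the chance of injection halves after each restart, the expected number of injected restarts per transaction stays bounded, which is the forward-progress property the commit message asks for.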
2022-10-03bcachefs: btree_trans_too_many_iters() is now a transaction restartKent Overstreet
All transaction restarts need a tracepoint - this is essential for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Improved errcodesKent Overstreet
Instead of overloading standard error codes (EINTR/EAGAIN), and defining short lists of error codes in multiple places that potentially end up overlapping & conflicting, we're now going to have one master list of error codes. Error codes are defined with an x-macro: thus we also have bch2_err_str() now. Also, error codes have a class field. Now, instead of checking for errors with ==, code should use bch2_err_matches(), which returns true if the error is equal to or a sub-error of the error class. This means we can define unique errors for every source location where an error is generated, which will help improve our error messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
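The x-macro scheme described here can be sketched in plain C as follows. The error names, the EX_ prefix, the starting value and the helper names are hypothetical stand-ins chosen for the example; only the general technique (one master list expanded several times, plus a class lookup that walks up to the parent) follows the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Master list: x(parent class, error name). A parent of 0 means "no class". */
#define EX_ERRCODES()                                          \
	x(0,              EX_ERR_restart)                      \
	x(EX_ERR_restart, EX_ERR_restart_would_deadlock)       \
	x(EX_ERR_restart, EX_ERR_restart_too_many_iters)       \
	x(0,              EX_ERR_no_buckets)

enum ex_errcode {
	EX_ERR_START = 2048, /* keeps private codes clear of errno values */
#define x(parent, err) err,
	EX_ERRCODES()
#undef x
	EX_ERR_MAX
};

/* Second expansion of the same list: error names as strings. */
static const char *const ex_err_names[] = {
#define x(parent, err) [err - EX_ERR_START - 1] = #err,
	EX_ERRCODES()
#undef x
};

/* Third expansion: the parent class of each error. */
static const int ex_err_parents[] = {
#define x(parent, err) [err - EX_ERR_START - 1] = parent,
	EX_ERRCODES()
#undef x
};

const char *ex_err_str(int err)
{
	if (err < 0)
		err = -err;
	if (err <= EX_ERR_START || err >= EX_ERR_MAX)
		return "(unknown error)";
	return ex_err_names[err - EX_ERR_START - 1];
}

/* True if @err equals @err_class or is a sub-error of it. */
bool ex_err_matches(int err, int err_class)
{
	assert(err_class > EX_ERR_START && err_class < EX_ERR_MAX);

	if (err < 0)
		err = -err;
	while (err > EX_ERR_START && err < EX_ERR_MAX) {
		if (err == err_class)
			return true;
		err = ex_err_parents[err - EX_ERR_START - 1];
	}
	return false;
}
```

With this layout a caller can test for the whole restart class, e.g. ex_err_matches(ret, EX_ERR_restart), while each call site still returns its own specific code.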
2022-10-03bcachefs: Improve bucket_alloc_fail tracepointKent Overstreet
We should be printing the number of free buckets, not just the number of available buckets. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Bucket invalidate path improvementsKent Overstreet
- invalidate_one_bucket() now returns 1 when we don't have any buckets on this device to invalidate, ensuring we don't spin
- the tracepoint invocation is moved to after the transaction commit, and we now include the number of cached sectors in the tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03mm: Count requests to free & nr freed per shrinkerKent Overstreet
The next step in this patch series for improving debugging of shrinker related issues: keep counts of number of objects we request to free vs. actually freed, and print them in shrinker_to_text(). Shrinkers won't necessarily free all objects requested for a variety of reasons, but if the two counts are wildly different something is likely amiss. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03mm: Add a .to_text() method for shrinkersKent Overstreet
This adds a new callback method to shrinkers which they can use to describe anything relevant to memory reclaim about their internal state, for example object dirtiness. This uses the new printbufs to output to heap allocated strings, so that the .to_text() methods can be used both for messages logged to the console, and also sysfs/debugfs. This patch also adds shrinkers_to_text(), which reports on the top 10 shrinkers - by object count - in sorted order, to be used in OOM reporting. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03vsprintf: %pf(%p)Kent Overstreet
This implements a new %p format string extension for passing a pretty printer and its arguments to printk, which will then be inserted into the formatted output.

A pretty-printer is a function that takes as its first argument a pointer to a struct printbuf, and then zero or more additional pointer arguments - these being the objects to format and print.

The arguments to the pretty-printer function are denoted in the format string by %p, i.e.

    %pf()       foo0_to_text(struct printbuf *out)
    %pf(%p)     foo1_to_text(struct printbuf *out, struct foo *)
    %pf(%p,%p)  foo2_to_text(struct printbuf *out, struct foo *, struct foo *)

We'd also like to eventually support non pointer arguments - in particular, integers - but this will probably require libffi.

Typechecking is accomplished with the CALL_PP macro, which verifies that the arguments passed to sprintf match the types of the pp-function arguments, and passes a struct with a cookie to sprintf so that sprintf can verify that the CALL_PP() macro was used.

Full example:

    static void foo_to_text(struct printbuf *out, struct foo *foo)
    {
            prt_printf(out, "bar=%u baz=%u", foo->bar, foo->baz);
    }

    printf("%pf(%p)", CALL_PP(foo_to_text, foo));

The goal is to replace most of our %p format extensions with this interface, and to move pretty-printers out of the core vsprintf.c code - this will get us better organization and better discoverability (you'll be able to cscope to pretty printer calls!), as well as eliminate a lot of dispatch code in vsprintf.c.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-10-03Delete seq_bufKent Overstreet
No longer has any users, so delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03tracing: Convert to printbufKent Overstreet
This converts the seq_bufs in dynevent_cmd and trace_seq to printbufs.

- read_pos in seq_buf doesn't exist in printbuf, so is added to trace_seq
- seq_buf_to_user doesn't have a printbuf equivalent, so is inlined into trace_seq_to_user
- seq_buf_putmem_hex currently swabs bytes on little endian, hardcoded to 8 byte units. This patch switches it to prt_hex_bytes(), which does _not_ swab.

Otherwise this is largely a direct conversion, with a few slight refactorings and cleanups.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03d_path: prt_path()Kent Overstreet
This implements a new printbuf version of d_path()/mangle_path(), which will replace the seq_buf version. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03vsprintf: prt_u64_minwidth(), prt_u64()Kent Overstreet
This adds two new-style printbuf helpers for printing simple u64s, and converts num_to_str() to be a simple wrapper around prt_u64_minwidth(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/pretty-printers: prt_string_option(), prt_bitflags()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/printbuf: Unit specifiersKent Overstreet
This adds options to printbuf for specifying whether units should be printed raw (default) or with human readable units, and for controlling whether human-readable units should be base 2 (default), or base 10.

This also adds new helpers that obey these options:

- pr_human_readable_u64
- pr_human_readable_s64
  These obey printbuf->si_units

- pr_units_u64
- pr_units_s64
  These obey both printbuf->human_readable_units and printbuf->si_units

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/printbuf: Tabstops, indentingKent Overstreet
This patch adds two new features to printbuf for structured formatting:

- Indent level: the indent level, as a number of spaces, may be increased with pr_indent_add() and decreased with pr_indent_sub(). Subsequent lines, when started with pr_newline() (not "\n", although that may change) will then be indented according to the current indent level. This helps with pretty-printers that structure a large amount of data across multiple lines and multiple functions.
- Tabstops: Tabstops may be set by assigning to the printbuf->tabstops array. Then, pr_tab() may be used to advance to the next tabstop, printing as many spaces as required - leaving previous output left justified to the previous tabstop. pr_tab_rjust() advances to the next tabstop but inserts the spaces just after the previous tabstop - right justifying the previously-outputted text to the next tabstop.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/printbuf: Heap allocationKent Overstreet
This makes printbufs optionally heap allocated: a printbuf initialized with the PRINTBUF initializer will automatically heap allocate and resize as needed.

Allocations are done with GFP_KERNEL: code should use e.g. memalloc_nofs_save()/restore() as needed. Since we do not currently have memalloc_nowait_save()/restore(), in contexts where it is not safe to block we provide the helpers

    printbuf_atomic_inc()
    printbuf_atomic_dec()

When the atomic count is nonzero, memory allocations will be done with GFP_NOWAIT.

On memory allocation failure, output will be truncated. Code that wishes to check for memory allocation failure (in contexts where we should return -ENOMEM) should check if printbuf->allocation_failure is set. Since printbufs are expected to be typically used for log messages and on a best effort basis, we don't return errors directly.

Other helpers provided by this patch:

- printbuf_make_room(buf, extra)
  Reallocates if necessary to make room for @extra bytes (not including terminating null).
- printbuf_str(buf)
  Returns a null terminated string equivalent to the contents of @buf. If @buf was never allocated (or allocation failed), returns a constant empty string.
- printbuf_exit(buf)
  Releases memory allocated by a printbuf.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
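The heap-allocation behaviour described in this commit can be approximated in userspace. The following is a rough analogue rather than the kernel's printbuf: struct sketch_buf and every function name here are hypothetical, realloc() stands in for the kernel allocator, and on allocation failure the sketch drops the whole write instead of truncating it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a heap-allocated printbuf. */
struct sketch_buf {
	char   *buf;
	size_t  size; /* bytes allocated */
	size_t  pos;  /* bytes written, not counting the terminating NUL */
	bool    allocation_failure;
};

#define SKETCH_BUF_INIT { NULL, 0, 0, false }

/* Ensure room for @extra more bytes plus the terminating NUL. */
int sketch_buf_make_room(struct sketch_buf *out, size_t extra)
{
	size_t need = out->pos + extra + 1;
	size_t new_size = out->size ? out->size : 32;
	char *p;

	if (need <= out->size)
		return 0;
	while (new_size < need)
		new_size *= 2;

	p = realloc(out->buf, new_size);
	if (!p) {
		out->allocation_failure = true; /* recorded, not returned upward */
		return -1;
	}
	p[out->pos] = '\0';
	out->buf = p;
	out->size = new_size;
	return 0;
}

void sketch_buf_printf(struct sketch_buf *out, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args); /* measure first */
	va_end(args);

	/* Simplification: drop the write entirely if we cannot grow. */
	if (len < 0 || sketch_buf_make_room(out, (size_t)len))
		return;
	assert(out->pos + (size_t)len < out->size);

	va_start(args, fmt);
	vsnprintf(out->buf + out->pos, out->size - out->pos, fmt, args);
	va_end(args);
	out->pos += (size_t)len;
}

/* Never returns NULL, so callers can always print the result. */
const char *sketch_buf_str(const struct sketch_buf *out)
{
	return out->buf ? out->buf : "";
}

void sketch_buf_exit(struct sketch_buf *out)
{
	free(out->buf);
	out->buf = NULL;
	out->size = out->pos = 0;
}
```

As in the commit's design, the print call itself reports nothing; a caller that cares about -ENOMEM style handling would inspect allocation_failure after formatting.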
2022-10-03lib/string_helpers: string_get_size() now returns characters wroteKent Overstreet
printbuf now needs to know the number of characters that would have been written if the buffer was too small, like snprintf(); this changes string_get_size() to return the return value of snprintf(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/hexdump: Convert to printbufKent Overstreet
This converts most of the hexdump code to printbufs, along with some significant cleanups and a bit of reorganization. The old non-printbuf functions are mostly left as wrappers around the new printbuf versions.

Big note: byte swabbing behaviour

Previously, hex_dump_to_buffer() would byte swab the groups of bytes being printed on little endian machines. This behaviour is... not standard or typical for a hex dumper, and this behaviour was silently added/changed without documentation (in 2007).

Given that the hex dumpers are just used for debugging output, nothing is likely to break, and hopefully by reverting to more standard behaviour the end result will be _less_ confusion, modulo a few kernel developers who will certainly be annoyed by their tools changing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
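To make the byte swabbing difference concrete, here is a small sketch that formats the same bytes both ways. It is not the kernel's hexdump code: sketch_hex_groups() and its parameters are hypothetical, and the swab flag simply reverses each group, which is what printing a group loaded as a native integer on a little endian machine would show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format @n bytes of @data as hex, in space-separated groups of
 * @groupsize bytes. With @swab set, the bytes inside each group are
 * emitted in reverse order. Returns the number of characters written.
 */
size_t sketch_hex_groups(char *out, size_t outlen,
			 const uint8_t *data, size_t n,
			 size_t groupsize, bool swab)
{
	size_t pos = 0;

	assert(groupsize && n % groupsize == 0);
	if (outlen)
		out[0] = '\0';

	for (size_t g = 0; g < n; g += groupsize) {
		if (g && pos + 1 < outlen) {
			out[pos++] = ' ';
			out[pos] = '\0';
		}
		for (size_t i = 0; i < groupsize; i++) {
			size_t idx = swab ? g + groupsize - 1 - i : g + i;

			/* Need room for two hex digits and the NUL. */
			if (pos + 2 < outlen) {
				snprintf(out + pos, outlen - pos, "%02x",
					 (unsigned)data[idx]);
				pos += 2;
			}
		}
	}
	return pos;
}
```

For the bytes 00 01 02 03 with a group size of 4, memory order gives 00010203 while the swabbed form gives 03020100; the commit moves the hex dumpers to the former.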
2022-10-03vsprintf: Convert to printbufKent Overstreet
This converts vsnprintf() to printbufs: instead of passing around raw char * pointers for current buf position and end of buf, we have a real type!

This makes the calling convention for our existing pretty printers a lot saner and less error prone, plus printbufs add some new helpers that make the code smaller and more readable, with a lot less crazy pointer arithmetic.

There are a lot more refactorings to be done: this patch tries to stick to just converting the calling conventions, as that needs to be done all at once in order to avoid introducing a ton of wrappers that will just be deleted.

Thankfully we have good unit tests for printf, and they have been run and are all passing with this patch.

We have two new exported functions with this patch:

- prt_printf(), which is like snprintf() but outputs to a printbuf
- prt_vprintf(), which is like vsnprintf()

These are the actual core print routines now - vsnprintf() is a wrapper around prt_vprintf().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/string_helpers: Convert string_escape_mem() to printbufKent Overstreet
Like the upcoming vsprintf.c conversion, this converts string_escape_mem to prt_escaped_string(), which uses and outputs to a printbuf, and makes string_escape_mem() a smaller wrapper to support existing users. The new printbuf helpers greatly simplify the code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03lib/printbuf: New data structure for printing stringsKent Overstreet
This adds printbufs: a printbuf points to a char * buffer and knows the size of the output buffer as well as the current output position. Future patches will be adding more features to printbuf, but initially printbufs are targeted at refactoring and improving our existing code in lib/vsprintf.c - so this initial printbuf patch has the features required for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-10-03bcachefs: Tracepoint improvementsKent Overstreet
Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Delete a redundant tracepointKent Overstreet
Now that the bucket_alloc_fail tracepoint includes the error code, the open_bucket_alloc_fail tracepoint is redundant. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Don't normalize to pages in btree cache shrinkerKent Overstreet
This behavior dates from the early, early days of bcache, and upon further delving appears to not make any sense. The shrinker only works in terms of 'objects' of unknown size; normalizing to pages only had the effect of changing the batch size, which we could do directly - if we wanted; we probably don't. Normalizing to pages meant our batch size was very small, which seems to have been keeping us from doing as much shrinking as we should be under heavy memory pressure; this patch appears to alleviate some OOMs we've been seeing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Add a tracepoint for superblock writesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Discard path fixes/improvementsKent Overstreet
- bch2_clear_need_discard() was using bch2_trans_relock() incorrectly, and always bailing out before doing any work - ouch.
- Add a tracepoint that fires every time bch2_do_discards() runs, and tells us about the work it did.
- When too many buckets aren't able to be discarded because they need a journal commit, bch2_do_discards() now flushes the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Run overwrite triggers before insertKent Overstreet
For backpointers, we'll need to delete old backpointers before adding new backpointers - otherwise we'll run into spurious duplicate backpointer errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-10-03bcachefs: Improve bucket_alloc_fail() tracepointKent Overstreet
This adds counters for each of the reasons we may skip allocating a bucket - we're seeing a bug where we loop endlessly trying to allocate when we should have plenty of buckets available, so hopefully this will help us track down why. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>