summaryrefslogtreecommitdiff
tag nameremove-freeze-weirdness-5.11_2020-11-09 (c17e4bbac878ac2d433c4241f5a825a656355051)
tag date2020-11-09 10:07:38 -0800
tagged byDarrick J. Wong <darrick.wong@oracle.com>
tagged objectcommit 455994625a...
vfs: remove lockdep fs freeze weirdness
A long time ago, XFS got tangled up with lockdep because the last thing it does during a fs freeze is use a transaction to flush the superblock to disk. Transactions require int(ernal) write freeze protection, which implies a recursive grab of the intwrite lock and makes lockdep all sad. This was "solved" in lockdep back in 2012 as commit 5accdf82ba25c by adding a "convert XFS' blocking fsfreeze lock attempts into a trylock" behavior in v6 of a patch[1] that Jan Kara sent to try to fix fs freeze handling. The behavior was not in the v5[0] patch, nor was there any discussion for any of the v5 patches that would suggest why things changed from v5 to v6. Commit f4b554af9931 in 2015 created the current weird logic in __sb_start_write, which converts recursive freeze lock grabs into trylocks whose return values are ignored(!!!). XFS solved the problem by creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't grab intwrite freeze protection, thus making lockdep's solution unnecessary. The commit claims that Dave Chinner explained that the trylock hack + comment could be removed, but the patch author never did that, and lore doesn't seem to know where or when Dave actually said that? Now it's 2020, and still nobody removed this from __sb_start_write. Worse yet, nowadays lock_is_held returns 1 if lockdep is built-in but offline. This causes attempts to grab the pagefaults freeze lock synchronously to turn into unchecked trylocks! Hilarity ensues if a page fault races with fsfreeze and loses, which causes us to break the locking model. This finally came to a head in 5.10-rc1 because the new lockdep bugs introduced during the merge window caused this maintainer to hit the weird case where sb_start_pagefault can return without having taken the freeze lock, leading to test failures and memory corruption. Since this insanity is dangerous and hasn't been needed by xfs since the late 2.6(???) days, kill it with fire. v2: refactor helpers to be more cohesive v3: move more cohesive helpers to header files