author     Filipe Manana <fdmanana@suse.com>    2025-02-24 15:40:26 +0000
committer  David Sterba <dsterba@suse.com>      2025-03-18 20:35:47 +0100
commit     ba5d06440cae63edc4f49465baf78f1f43e55c77 (patch)
tree       87fef0d0d6327d1e21267dbfdc9d88aa9083165c
parent     1283b8c125a83bf7a7dbe90c33d3472b6d7bf612 (diff)
btrfs: get used bytes while holding lock at btrfs_reclaim_bgs_work()
At btrfs_reclaim_bgs_work(), we read the block group's used bytes counter twice without holding the block group's spinlock. This can result in races, reported by KCSAN and similar tools, since a concurrent task can be updating that counter from btrfs_update_block_group().

Avoid these races by reading the counter in a critical section delimited by the block group's spinlock, after setting the block group to RO mode. This also avoids using two different values of the counter in case it changes between the two reads. This silences KCSAN and is required for the next patch in the series too.

Fixes: 243192b67649 ("btrfs: report reclaim stats in sysfs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
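For readers outside the kernel tree, the sketch below is a minimal userspace analogue of the pattern the patch applies: take one snapshot of the shared counter inside a critical section and derive every later use of the value (the percentage in the log message and the reclaim accounting) from that single snapshot, rather than reading the counter twice without the lock. The struct and function names here are hypothetical, and a pthread mutex stands in for the block group's spinlock; this is an illustration of the idea, not the kernel code itself.

#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_block_group. */
struct block_group {
        pthread_mutex_t lock;  /* plays the role of bg->lock (a spinlock in the kernel) */
        uint64_t used;         /* bytes used; updated concurrently by allocation/free paths */
        uint64_t length;       /* total size of the block group */
};

/*
 * Read the used counter once, under the lock, so a concurrent update
 * cannot race with the read and both consumers of the value see the
 * same number.
 */
static uint64_t snapshot_used(struct block_group *bg)
{
        uint64_t used;

        pthread_mutex_lock(&bg->lock);
        used = bg->used;
        pthread_mutex_unlock(&bg->lock);

        return used;
}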
 fs/btrfs/block-group.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index cbea2bf25714..cbfcdd65e15e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1823,7 +1823,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	list_sort(NULL, &fs_info->reclaim_bgs, reclaim_bgs_cmp);
 	while (!list_empty(&fs_info->reclaim_bgs)) {
 		u64 zone_unusable;
-		u64 reclaimed;
+		u64 used;
 		int ret = 0;
 
 		bg = list_first_entry(&fs_info->reclaim_bgs,
@@ -1919,19 +1919,30 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		if (ret < 0)
 			goto next;
 
+		/*
+		 * Grab the used bytes counter while holding the block group's
+		 * spinlock to prevent races with tasks concurrently updating it
+		 * due to extent allocation and deallocation (running
+		 * btrfs_update_block_group()) - we have set the block group to
+		 * RO but that only prevents extent reservation, allocation
+		 * happens after reservation.
+		 */
+		spin_lock(&bg->lock);
+		used = bg->used;
+		spin_unlock(&bg->lock);
+
 		btrfs_info(fs_info,
 			"reclaiming chunk %llu with %llu%% used %llu%% unusable",
 				bg->start,
-				div64_u64(bg->used * 100, bg->length),
+				div64_u64(used * 100, bg->length),
 				div64_u64(zone_unusable * 100, bg->length));
 		trace_btrfs_reclaim_block_group(bg);
-		reclaimed = bg->used;
 		ret = btrfs_relocate_chunk(fs_info, bg->start);
 		if (ret) {
 			btrfs_dec_block_group_ro(bg);
 			btrfs_err(fs_info, "error relocating chunk %llu",
 				  bg->start);
-			reclaimed = 0;
+			used = 0;
 			spin_lock(&space_info->lock);
 			space_info->reclaim_errors++;
 			if (READ_ONCE(space_info->periodic_reclaim))
@@ -1940,7 +1951,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		}
 		spin_lock(&space_info->lock);
 		space_info->reclaim_count++;
-		space_info->reclaim_bytes += reclaimed;
+		space_info->reclaim_bytes += used;
 		spin_unlock(&space_info->lock);
 next: