From 5ccc944dce3df5fd2fd683a7df4fd49d1068eba2 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)"
Date: Fri, 10 Jun 2022 14:44:41 -0400
Subject: filemap: Correct the conditions for marking a folio as accessed

We had an off-by-one error which meant that we never marked the first
page in a read as accessed.  This was visible as a slowdown when
re-reading a file as pages were being evicted from cache too soon.

In reviewing this code, we noticed a second bug where a multi-page
folio would be marked as accessed multiple times when doing reads
that were less than the size of the folio.

Abstract the comparison of whether two file positions are in the same
folio into a new function, fixing both of these bugs.

Reported-by: Yu Kuai
Reviewed-by: Kent Overstreet
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/filemap.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ac3775c1ce4c..577068868449 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2629,6 +2629,13 @@ err:
 	return err;
 }
 
+static inline bool pos_same_folio(loff_t pos1, loff_t pos2, struct folio *folio)
+{
+	unsigned int shift = folio_shift(folio);
+
+	return (pos1 >> shift == pos2 >> shift);
+}
+
 /**
  * filemap_read - Read data from the page cache.
  * @iocb: The iocb to read.
@@ -2700,11 +2707,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		writably_mapped = mapping_writably_mapped(mapping);
 
 		/*
-		 * When a sequential read accesses a page several times, only
+		 * When a read accesses the same folio several times, only
 		 * mark it as accessed the first time.
 		 */
-		if (iocb->ki_pos >> PAGE_SHIFT !=
-		    ra->prev_pos >> PAGE_SHIFT)
+		if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
+				    fbatch.folios[0]))
 			folio_mark_accessed(fbatch.folios[0]);
 
 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
-- 
cgit v1.2.3
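The subtlety behind the "- 1" in the second hunk is that ra->prev_pos
records the file position one byte past the end of the previous read.
When a read ends exactly on a folio boundary, prev_pos already points
into the next folio, so the old page-index comparison wrongly concluded
that the first folio of the following read had already been seen.  The
following is a minimal userspace sketch of that arithmetic, not kernel
code: it assumes a 4KiB folio and replaces the struct folio argument
with a raw shift so it compiles stand-alone.

#include <stdbool.h>
#include <stdio.h>

static bool pos_same_folio(long long pos1, long long pos2,
			   unsigned int shift)
{
	return (pos1 >> shift == pos2 >> shift);
}

int main(void)
{
	unsigned int shift = 12;	/* assume a 4KiB folio */
	long long prev_pos = 4096;	/* previous read covered bytes 0-4095 */
	long long ki_pos = 4096;	/* this read starts in the next folio */

	/* Old check: both positions land in folio index 1, so the first
	 * folio of this read was wrongly treated as already accessed. */
	printf("old check marks folio: %d\n",
	       ki_pos >> shift != prev_pos >> shift);		/* 0 */

	/* Fixed check compares against the last byte actually read
	 * (prev_pos - 1), which is still in folio index 0. */
	printf("new check marks folio: %d\n",
	       !pos_same_folio(ki_pos, prev_pos - 1, shift));	/* 1 */
	return 0;
}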
From cb995f4eeba9d268fd4b56c2423ad6c1d1ea1b82 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)"
Date: Fri, 17 Jun 2022 20:00:17 -0400
Subject: filemap: Handle sibling entries in filemap_get_read_batch()

If a read races with an invalidation followed by another read, it is
possible for a folio to be replaced with a higher-order folio.  If
that happens, we'll see a sibling entry for the new folio in the next
iteration of the loop.  This manifests as a NULL pointer dereference
while holding the RCU read lock.

Handle this by simply returning.  The next call will find the new
folio and handle it correctly.  The other ways of handling this rare
race are more complex and it's just not worth it.

Reported-by: Dave Chinner
Reported-by: Brian Foster
Debugged-by: Brian Foster
Tested-by: Brian Foster
Reviewed-by: Brian Foster
Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/filemap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 577068868449..ffdfbc8b0e3c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2385,6 +2385,8 @@ static void filemap_get_read_batch(struct address_space *mapping,
 			continue;
 		if (xas.xa_index > max || xa_is_value(folio))
 			break;
+		if (xa_is_sibling(folio))
+			break;
 		if (!folio_try_get_rcu(folio))
 			goto retry;
 
-- 
cgit v1.2.3

From b653db77350c7307a513b81856fe53e94cf42446 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)"
Date: Sun, 19 Jun 2022 10:37:32 -0400
Subject: mm: Clear page->private when splitting or migrating a page

In our efforts to remove uses of PG_private, we have found folios with
the private flag clear and folio->private not-NULL.  That is the root
cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead
of folio->private").  It can also affect a few other filesystems that
haven't yet reported a problem.

compaction_alloc() can return a page with uninitialised page->private,
and rather than checking all the callers of migrate_pages(), just zero
page->private after calling get_new_page().  Similarly, the tail pages
from split_huge_page() may also have an uninitialised page->private.

Reported-by: Xiubo Li
Tested-by: Xiubo Li
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/huge_memory.c | 1 +
 mm/migrate.c     | 1 +
 2 files changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f7248002dad9..834f288b3769 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2377,6 +2377,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			page_tail);
 	page_tail->mapping = head->mapping;
 	page_tail->index = head->index + tail;
+	page_tail->private = 0;
 
 	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..6c1ea61f39d8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1106,6 +1106,7 @@ static int unmap_and_move(new_page_t get_new_page,
 	if (!newpage)
 		return -ENOMEM;
 
+	newpage->private = 0;
 	rc = __unmap_and_move(page, newpage, force, mode);
 	if (rc == MIGRATEPAGE_SUCCESS)
 		set_page_owner_migrate_reason(newpage, reason);
-- 
cgit v1.2.3
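The invariant the two hunks above restore is that page->private only
carries meaning while PG_private is set: a freshly allocated migration
target or a new split tail page may be recycled memory whose private
field still holds whatever its previous owner left there.  A small
userspace model follows; struct fake_page is an invented stand-in, not
a kernel structure, and it only illustrates why a filesystem that
consults private without checking the flag (as ceph once did) can read
garbage.

#include <stdbool.h>
#include <stdio.h>

/* Invented stand-in for struct page: just the two fields at issue. */
struct fake_page {
	bool pg_private;	/* models the PG_private flag */
	unsigned long private;	/* models page->private */
};

/* A newly allocated destination page may be recycled memory whose
 * private field still holds a stale value, with the flag clear. */
static struct fake_page get_new_page(void)
{
	return (struct fake_page){ .pg_private = false, .private = 0xdead };
}

int main(void)
{
	struct fake_page newpage = get_new_page();

	/* Code that trusts private alone sees the stale value... */
	printf("stale private: %#lx (flag=%d)\n",
	       newpage.private, newpage.pg_private);

	/* ...so the patch zeroes it once, at the common choke point,
	 * rather than auditing every migrate_pages() caller. */
	newpage.private = 0;
	printf("after fix:     %#lx (flag=%d)\n",
	       newpage.private, newpage.pg_private);
	return 0;
}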
From 00fa15e0d56482e32d8ca1f51d76b0ee00afb16b Mon Sep 17 00:00:00 2001
From: Alistair Popple
Date: Mon, 20 Jun 2022 19:05:36 +1000
Subject: filemap: Fix serialization adding transparent huge pages to page cache

Commit 793917d997df ("mm/readahead: Add large folio readahead")
introduced support for using large folios for filebacked pages if the
filesystem supports it.

page_cache_ra_order() was introduced to allocate and add these large
folios to the page cache.  However adding pages to the page cache
should be serialized against truncation and hole punching by taking
invalidate_lock.  Not doing so can lead to data races resulting in
stale data getting added to the page cache and marked up-to-date.  See
commit 730633f0b7f9 ("mm: Protect operations adding pages to page
cache with invalidate_lock") for more details.

This issue was found by inspection but a testcase revealed it was
possible to observe in practice on XFS.  Fix this by taking
invalidate_lock in page_cache_ra_order(), to mirror what is done for
the non-thp case in page_cache_ra_unbounded().

Signed-off-by: Alistair Popple
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Reviewed-by: Jan Kara
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/readahead.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/readahead.c b/mm/readahead.c
index 57a015108254..fdcd28cbd92d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -510,6 +510,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		new_order--;
 	}
 
+	filemap_invalidate_lock_shared(mapping);
 	while (index <= limit) {
 		unsigned int order = new_order;
 
@@ -536,6 +537,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	}
 
 	read_pages(ractl);
+	filemap_invalidate_unlock_shared(mapping);
 
 	/*
 	 * If there were already pages in the page cache, then we may have
-- 
cgit v1.2.3
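The protocol being mirrored here is a reader-writer one: paths that add
pages to the page cache take the mapping's invalidate_lock shared,
while truncation and hole punching take it exclusive.  Below is a
userspace sketch of that bracketing using a pthread rwlock as a
stand-in for the kernel's rw-semaphore; the function names only echo
the kernel's, and nothing in it is kernel code.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t invalidate_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Models page_cache_ra_order(): the whole loop that allocates and
 * inserts folios is bracketed by the shared lock, so it cannot run
 * concurrently with an invalidation. */
static void page_cache_ra_order(void)
{
	pthread_rwlock_rdlock(&invalidate_lock);	/* lock_shared */
	printf("adding large folios to the page cache\n");
	pthread_rwlock_unlock(&invalidate_lock);	/* unlock_shared */
}

/* Models truncation/hole punching, which takes the lock exclusive so
 * no reader can add stale pages while it removes them. */
static void truncate_pagecache_range(void)
{
	pthread_rwlock_wrlock(&invalidate_lock);
	printf("removing folios; readahead is excluded\n");
	pthread_rwlock_unlock(&invalidate_lock);
}

int main(void)
{
	page_cache_ra_order();
	truncate_pagecache_range();
	return 0;
}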