From 5ccc944dce3df5fd2fd683a7df4fd49d1068eba2 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)"
Date: Fri, 10 Jun 2022 14:44:41 -0400
Subject: filemap: Correct the conditions for marking a folio as accessed

We had an off-by-one error which meant that we never marked the first
page in a read as accessed.  This was visible as a slowdown when
re-reading a file as pages were being evicted from cache too soon.

In reviewing this code, we noticed a second bug where a multi-page
folio would be marked as accessed multiple times when doing reads
that were less than the size of the folio.

Abstract the comparison of whether two file positions are in the same
folio into a new function, fixing both of these bugs.

Reported-by: Yu Kuai
Reviewed-by: Kent Overstreet
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/filemap.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ac3775c1ce4c..577068868449 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2629,6 +2629,13 @@ err:
 	return err;
 }
 
+static inline bool pos_same_folio(loff_t pos1, loff_t pos2, struct folio *folio)
+{
+	unsigned int shift = folio_shift(folio);
+
+	return (pos1 >> shift == pos2 >> shift);
+}
+
 /**
  * filemap_read - Read data from the page cache.
  * @iocb: The iocb to read.
@@ -2700,11 +2707,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		writably_mapped = mapping_writably_mapped(mapping);
 
 		/*
-		 * When a sequential read accesses a page several times, only
+		 * When a read accesses the same folio several times, only
 		 * mark it as accessed the first time.
 		 */
-		if (iocb->ki_pos >> PAGE_SHIFT !=
-		    ra->prev_pos >> PAGE_SHIFT)
+		if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
+				    fbatch.folios[0]))
 			folio_mark_accessed(fbatch.folios[0]);
 
 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
-- 
cgit v1.2.3
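The subtlety behind the "- 1" in the second hunk is that ra->prev_pos
records the file position one byte past the end of the previous read.
When a read ends exactly on a folio boundary, prev_pos already points
into the next folio, so the old page-index comparison wrongly concluded
that the first folio of the following read had already been seen.  The
following is a minimal userspace sketch of that arithmetic, not kernel
code: it assumes a 4KiB folio and replaces the struct folio argument
with a raw shift so it compiles stand-alone.

#include <stdbool.h>
#include <stdio.h>

static bool pos_same_folio(long long pos1, long long pos2,
			   unsigned int shift)
{
	return (pos1 >> shift == pos2 >> shift);
}

int main(void)
{
	unsigned int shift = 12;	/* assume a 4KiB folio */
	long long prev_pos = 4096;	/* previous read covered bytes 0-4095 */
	long long ki_pos = 4096;	/* this read starts in the next folio */

	/* Old check: both positions land in folio index 1, so the first
	 * folio of this read was wrongly treated as already accessed. */
	printf("old check marks folio: %d\n",
	       ki_pos >> shift != prev_pos >> shift);		/* 0 */

	/* Fixed check compares against the last byte actually read
	 * (prev_pos - 1), which is still in folio index 0. */
	printf("new check marks folio: %d\n",
	       !pos_same_folio(ki_pos, prev_pos - 1, shift));	/* 1 */
	return 0;
}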
From cb995f4eeba9d268fd4b56c2423ad6c1d1ea1b82 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)"
Date: Fri, 17 Jun 2022 20:00:17 -0400
Subject: filemap: Handle sibling entries in filemap_get_read_batch()

If a read races with an invalidation followed by another read, it is
possible for a folio to be replaced with a higher-order folio.  If
that happens, we'll see a sibling entry for the new folio in the next
iteration of the loop.  This manifests as a NULL pointer dereference
while holding the RCU read lock.

Handle this by simply returning.  The next call will find the new
folio and handle it correctly.  The other ways of handling this rare
race are more complex and it's just not worth it.

Reported-by: Dave Chinner
Reported-by: Brian Foster
Debugged-by: Brian Foster
Tested-by: Brian Foster
Reviewed-by: Brian Foster
Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/filemap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 577068868449..ffdfbc8b0e3c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2385,6 +2385,8 @@ static void filemap_get_read_batch(struct address_space *mapping,
 			continue;
 		if (xas.xa_index > max || xa_is_value(folio))
 			break;
+		if (xa_is_sibling(folio))
+			break;
 		if (!folio_try_get_rcu(folio))
 			goto retry;
 
-- 
cgit v1.2.3

From b653db77350c7307a513b81856fe53e94cf42446 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)"
Date: Sun, 19 Jun 2022 10:37:32 -0400
Subject: mm: Clear page->private when splitting or migrating a page

In our efforts to remove uses of PG_private, we have found folios with
the private flag clear and folio->private not-NULL.  That is the root
cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead
of folio->private").  It can also affect a few other filesystems that
haven't yet reported a problem.

compaction_alloc() can return a page with uninitialised page->private,
and rather than checking all the callers of migrate_pages(), just zero
page->private after calling get_new_page().  Similarly, the tail pages
from split_huge_page() may also have an uninitialised page->private.

Reported-by: Xiubo Li
Tested-by: Xiubo Li
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/huge_memory.c | 1 +
 mm/migrate.c     | 1 +
 2 files changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f7248002dad9..834f288b3769 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2377,6 +2377,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			page_tail);
 	page_tail->mapping = head->mapping;
 	page_tail->index = head->index + tail;
+	page_tail->private = 0;
 
 	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..6c1ea61f39d8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1106,6 +1106,7 @@ static int unmap_and_move(new_page_t get_new_page,
 	if (!newpage)
 		return -ENOMEM;
 
+	newpage->private = 0;
 	rc = __unmap_and_move(page, newpage, force, mode);
 	if (rc == MIGRATEPAGE_SUCCESS)
 		set_page_owner_migrate_reason(newpage, reason);
-- 
cgit v1.2.3
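The invariant the two hunks above restore is that page->private only
carries meaning while PG_private is set: a freshly allocated migration
target or a new split tail page may be recycled memory whose private
field still holds whatever its previous owner left there.  A small
userspace model follows; struct fake_page is an invented stand-in, not
a kernel structure, and it only illustrates why a filesystem that
consults private without checking the flag (as ceph once did) can read
garbage.

#include <stdbool.h>
#include <stdio.h>

/* Invented stand-in for struct page: just the two fields at issue. */
struct fake_page {
	bool pg_private;	/* models the PG_private flag */
	unsigned long private;	/* models page->private */
};

/* A newly allocated destination page may be recycled memory whose
 * private field still holds a stale value, with the flag clear. */
static struct fake_page get_new_page(void)
{
	return (struct fake_page){ .pg_private = false, .private = 0xdead };
}

int main(void)
{
	struct fake_page newpage = get_new_page();

	/* Code that trusts private alone sees the stale value... */
	printf("stale private: %#lx (flag=%d)\n",
	       newpage.private, newpage.pg_private);

	/* ...so the patch zeroes it once, at the common choke point,
	 * rather than auditing every migrate_pages() caller. */
	newpage.private = 0;
	printf("after fix:     %#lx (flag=%d)\n",
	       newpage.private, newpage.pg_private);
	return 0;
}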
From 00fa15e0d56482e32d8ca1f51d76b0ee00afb16b Mon Sep 17 00:00:00 2001
From: Alistair Popple
Date: Mon, 20 Jun 2022 19:05:36 +1000
Subject: filemap: Fix serialization adding transparent huge pages to page cache

Commit 793917d997df ("mm/readahead: Add large folio readahead")
introduced support for using large folios for filebacked pages if the
filesystem supports it.

page_cache_ra_order() was introduced to allocate and add these large
folios to the page cache.  However adding pages to the page cache
should be serialized against truncation and hole punching by taking
invalidate_lock.  Not doing so can lead to data races resulting in
stale data getting added to the page cache and marked up-to-date.  See
commit 730633f0b7f9 ("mm: Protect operations adding pages to page
cache with invalidate_lock") for more details.

This issue was found by inspection but a testcase revealed it was
possible to observe in practice on XFS.  Fix this by taking
invalidate_lock in page_cache_ra_order(), to mirror what is done for
the non-thp case in page_cache_ra_unbounded().

Signed-off-by: Alistair Popple
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Reviewed-by: Jan Kara
Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/readahead.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/readahead.c b/mm/readahead.c
index 57a015108254..fdcd28cbd92d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -510,6 +510,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		new_order--;
 	}
 
+	filemap_invalidate_lock_shared(mapping);
 	while (index <= limit) {
 		unsigned int order = new_order;
 
@@ -536,6 +537,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	}
 
 	read_pages(ractl);
+	filemap_invalidate_unlock_shared(mapping);
 
 	/*
 	 * If there were already pages in the page cache, then we may have
-- 
cgit v1.2.3
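The protocol being mirrored here is a reader-writer one: paths that add
pages to the page cache take the mapping's invalidate_lock shared,
while truncation and hole punching take it exclusive.  Below is a
userspace sketch of that bracketing using a pthread rwlock as a
stand-in for the kernel's rw-semaphore; the function names only echo
the kernel's, and nothing in it is kernel code.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t invalidate_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Models page_cache_ra_order(): the whole loop that allocates and
 * inserts folios is bracketed by the shared lock, so it cannot run
 * concurrently with an invalidation. */
static void page_cache_ra_order(void)
{
	pthread_rwlock_rdlock(&invalidate_lock);	/* lock_shared */
	printf("adding large folios to the page cache\n");
	pthread_rwlock_unlock(&invalidate_lock);	/* unlock_shared */
}

/* Models truncation/hole punching, which takes the lock exclusive so
 * no reader can add stale pages while it removes them. */
static void truncate_pagecache_range(void)
{
	pthread_rwlock_wrlock(&invalidate_lock);
	printf("removing folios; readahead is excluded\n");
	pthread_rwlock_unlock(&invalidate_lock);
}

int main(void)
{
	page_cache_ra_order();
	truncate_pagecache_range();
	return 0;
}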