Age  Commit message  Author
2023-09-02  xfs/559: adapt to kernels that use large folios for writes  [v2023.09.03]  Darrick J. Wong
The write invalidation code in iomap can only be triggered for writes that span multiple folios. If the kernel reports a huge page size, scale up the write size. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  common: rename get_page_size to _get_page_size  Darrick J. Wong
This function does not follow the naming convention that common helpers must start with an underscore. Fix this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  common: split _get_hugepagesize into detection and actual query  Darrick J. Wong
This helper has two parts -- querying the value, and _notrun'ing the test if huge pages aren't turned on. Break these into the usual _require_hugepages and _get_hugepagesize predicates so that we can adapt xfs/559 to large folios being used for writes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  btrfs/282: skip test if /var/lib/btrfs isn't writable  Darrick J. Wong
I run fstests in a readonly container, and accidentally uninstalled the btrfsprogs package. When I did, this test started failing:
    --- btrfs/282.out
    +++ btrfs/282.out.bad
    @@ -1,3 +1,7 @@
     QA output created by 282
     wrote 2147483648/2147483648 bytes at offset 0
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    +WARNING: cannot create scrub data file, mkdir /var/lib/btrfs failed: Read-only file system. Status recording disabled
    +WARNING: failed to open the progress status socket at /var/lib/btrfs/scrub.progress.3e1cf8c6-8f8f-4b51-982c-d6783b8b8825: No such file or directory. Progress cannot be queried
    +WARNING: cannot create scrub data file, mkdir /var/lib/btrfs failed: Read-only file system. Status recording disabled
    +WARNING: failed to open the progress status socket at /var/lib/btrfs/scrub.progress.3e1cf8c6-8f8f-4b51-982c-d6783b8b8825: No such file or directory. Progress cannot be queried
Skip the test if /var/lib/btrfs isn't writable, or if /var/lib isn't writable, which means we cannot create /var/lib/btrfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
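The writability condition described in the btrfs/282 message can be sketched as a small shell predicate. This is only an illustration under assumptions: the function name `can_use_dir_sketch` is hypothetical and is not the helper the commit added. A real test would call `_notrun` rather than `echo` when the check fails.

```shell
# Hypothetical sketch, not the actual fstests helper.
# Succeeds if the directory is writable, or if it is missing but its
# parent is writable (so the directory could still be created).
can_use_dir_sketch() {
	local dir="$1"

	if [ -d "$dir" ]; then
		[ -w "$dir" ]
	else
		[ -w "$(dirname "$dir")" ]
	fi
}

# Usage: a test would skip itself here instead of printing a message.
can_use_dir_sketch /var/lib/btrfs || echo "would skip: /var/lib/btrfs not usable"
```

On a read-only filesystem the `-w` test fails even for root, which is the case the commit message describes.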
2023-09-02  generic/187: don't run this test on NFS  Jeff Layton
This test is unreliable on NFS. It fails consistently when run vs. a server exporting btrfs, but passes when the server exports xfs. Since we don't have any sort of attribute that we can require to test this, just skip this one on NFS. Also, subsume the check for btrfs into the _supported_fs check, and add a comment for it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/357: don't run this test on NFS  Jeff Layton
NFS doesn't keep track of whether a file is reflinked or not, so it doesn't prevent this behavior. It shouldn't be a problem for NFS anyway, so just skip this test there. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/294: don't run this test on NFS  Jeff Layton
When creating a new dentry (of any type), NFS will optimize away any on-the-wire lookups prior to the create since that means an extra round trip to the server. Because of that, it consistently fails this test. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/*: add a check for security attrs  Jeff Layton
There are several generic tests that require "setcap", but don't check whether the underlying fs supports security attrs. Add the appropriate checks. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/578: add a check to ensure that fiemap is supported  Jeff Layton
This test requires FIEMAP support. Suggested-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  common/attr: fix the _require_acl test  Jeff Layton
_require_acl tests whether you're able to fetch the ACL from a file using chacl, and then tests for an -EOPNOTSUPP error return. Unfortunately, filesystems that don't support them (like NFSv4) just return -ENODATA when someone calls getxattr for the POSIX ACL, so the test doesn't work. Fix the test to have chacl set an ACL on the file instead, which should reliably fail on filesystems that don't support them. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/61[67]: support SOAK_DURATION  Darrick J. Wong
Now that I've finally gotten liburing installed on my test machine, I can actually test io_uring. Adapt these two tests to support SOAK_DURATION so I can add it to that too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/650: race mount and unmount with cpu hotplug too  Darrick J. Wong
Ritesh Harjani reported that mount and unmount can race with the xfs cpu hotplug notifier hooks and crash the kernel, which is fixed by: https://lore.kernel.org/linux-xfs/ZO6J4W9msOixUk05@dread.disaster.area/T/#t Extend this test to include that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/650: add SOAK_DURATION controls  Darrick J. Wong
Make this test controllable via SOAK_DURATION, for anyone who wants to perform a long soak test of filesystem vs. cpu hotplug. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  btrfs/237: kick reclaim process with a small filesystem  Naohiro Aota
Since commit 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive"), the reclaim process won't run unless 75% (by default) of the filesystem volume is allocated as block groups. As a result, btrfs/237 won't succeed when it is run with a large volume. To run the reclaim process, we need to either fill the FS to the desired level, or make a small FS so that the test write can go over the level. Since the current test code expects the FS to have only one data block group, filling the FS is cumbersome and would require rewriting the test code. So, we take the latter method. We create a small (16 * zone size) FS. The size is chosen to hold a minimal FS with DUP metadata setup. However, creating a small FS is not enough. With SINGLE metadata setup, we allocate 3 zones (one each for DATA, METADATA and SYSTEM), which is less than 75% of 16 zones. We can tweak the threshold to 51% on a regular btrfs kernel config (!CONFIG_BTRFS_DEBUG), but that is still not enough to start the reclaim process. So, this test requires CONFIG_BTRFS_DEBUG to set the threshold to 1%. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
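The threshold arithmetic in the btrfs/237 message can be checked with a few lines of shell. This is a simplified model taken from the wording of the message (allocated zones compared against a percentage of all zones); it is an assumption and not the kernel's actual reclaim logic. The function name is hypothetical.

```shell
# Simplified model, assumed from the commit message: reclaim is considered
# only once the allocated zones reach threshold_pct percent of all zones.
reclaim_threshold_reached_sketch() {
	local allocated="$1" total="$2" threshold_pct="$3"

	# Compare allocated/total >= threshold_pct/100 without division.
	[ $((allocated * 100)) -ge $((threshold_pct * total)) ]
}

# 3 allocated zones (DATA, METADATA, SYSTEM) on a 16-zone filesystem.
for pct in 75 51 1; do
	if reclaim_threshold_reached_sketch 3 16 "$pct"; then
		echo "threshold ${pct}%: reached"
	else
		echo "threshold ${pct}%: not reached"
	fi
done
```

Three of sixteen zones is 18.75%, so under this model only the 1% threshold is reached, which matches the message's conclusion that CONFIG_BTRFS_DEBUG is required.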
2023-09-02  fstests: generic/352 should accommodate other pwrite behaviors  Bill O'Donnell
xfs_io pwrite issues a series of block size writes, but there is no guarantee that the resulting extent(s) will be singular or contiguous. This behavior is acceptable, but the test is flawed in that it expects a single extent for a pwrite. Modify test to limit pwrite and reflink to a single block. Signed-off-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  fstests: test fix for an agbno overflow in __xfs_getfsmap_datadev  Darrick J. Wong
Dave Chinner reported that xfs/273 fails if the AG size happens to be an exact power of two. I traced this to an agbno integer overflow when the current GETFSMAP call is a continuation of a previous GETFSMAP call, and the last record returned was non-shareable space at the end of an AG. This is the regression test for that bug. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  generic/551: bail out test if aio-dio-write-verify failed  Naohiro Aota
When the AIO program failed, it is better to bail out the test to keep the failed state intact. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  aio-dio-write-verify: print more info on the error case  Naohiro Aota
When short read or corruption happened, it is difficult to locate which IO event failed. Print the address to make it identifiable. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-09-02  aio-dio-write-verify: check for the IO errors  Naohiro Aota
The async write IOs can return some errors, which may lead to a short read or corruption in io_verify() stage. Catch an error early to identify the root cause easily. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-25  generic/471: Remove this broken case  [v2023.08.27]  Yang Xu
I remember this case started failing last year because of kernel commit cae2de69 ("iomap: Add async buffered write support") and kernel commit 1aa91d9 ("xfs: Add async buffered write support"), as below:
     pwrite: Resource temporarily unavailable
     wrote 8388608/8388608 bytes at offset 0
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    -RWF_NOWAIT time is within limits.
    +pwrite: Resource temporarily unavailable
    +(standard_in) 1: syntax error
    +RWF_NOWAIT took seconds
So for async buffered write requests, the request will return -EAGAIN if the ilock cannot be obtained immediately. There is also a discussion [1] suggesting that generic/471 has been broken. I have now hit this problem on my Linux distribution, and then found the above discussion. IMO, removing this case is OK, and it avoids hitting this false report again.
[Additional information from Dave Chinner] We changed how timestamps are updated so that they are aware of IOCB_NOWAIT. If the IOCB_NOWAIT DIO write now needs to update the inode timestamps, it will return -EAGAIN instead of doing potentially blocking operations that require IO to complete (i.e. taking a transaction reservation). Hence the first time we go to do a DIO read on an inode, it's going to do an atime update, which now occurs from an IOCB_NOWAIT context and we return -EAGAIN.... Yes, we added non-blocking timestamp updates as part of the async buffered write support, but this was a general XFS IO path change of behaviour to address a potential blocking point in *all* IOCB_NOWAIT reads and writes, buffered or direct. The test is not validating that RWF_NOWAIT is behaving correctly - it just was a simple operation that kinda exercised RWF_NOWAIT semantics when we had no other way to test this code. It has outlived its original purpose, so it should be removed...
[1] https://lore.kernel.org/linux-xfs/b2865bd6-2346-8f4d-168b-17f06bbedbed@kernel.dk/ Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-25  fstests: fsstress: wait interrupted aio to finish  Qu Wenruo
[BUG]
There is a very low chance to hit data csum mismatch (caught by scrub) during test case btrfs/06[234567]. After some extra digging, it turns out that plain fsstress itself is enough to cause the problem:
```
workload()
{
	mkfs.btrfs -f -m single -d single --csum sha256 $dev1 > /dev/null
	mount $dev1 $mnt
	#$fsstress -p 10 -n 1000 -w -d $mnt
	umount $mnt
	btrfs check --check-data-csum $dev1 || fail
}
runtime=1024
for (( i = 0; i < $runtime; i++ )); do
	echo "=== $i / $runtime ==="
	workload
done
```
Inside a VM which has only 6 cores, the above script can trigger the problem with a probability of about 1/20.
[CAUSE]
Locally I got a much smaller workload to reproduce:
    $fsstress -p 7 -n 50 -s 1691396493 -w -d $mnt -v > /tmp/fsstress
With extra kernel trace_printk() calls on the buffered/direct writes, it turns out that the following direct write is always the cause:
    btrfs_do_write_iter: r/i=5/283 buffered fileoff=708608(709121) len=12288(7712)
    btrfs_do_write_iter: r/i=5/283 direct fileoff=8192(8192) len=73728(73728) <<<<<
    btrfs_do_write_iter: r/i=5/283 direct fileoff=589824(589824) len=16384(16384)
With the involved byte numbers, it's easy to pin down the fsstress operation:
    0/31: writev d0/f3[285 2 0 0 296 1457078] [709121,8,964] 0
    0/32: chown d0/f2 308134/1763236 0
    0/33: do_aio_rw - xfsctl(XFS_IOC_DIOINFO) d0/f2[285 2 308134 1763236 320 1457078] return 25, fallback to stat()
    0/33: awrite - io_getevents failed -4 <<<<
    0/34: dwrite - xfsctl(XFS_IOC_DIOINFO) d0/f3[285 2 308134 1763236 320 1457078] return 25, fallback to stat()
Note the 0/33: when the data csum mismatch is triggered, it always fails with -4 (-EINTR). It looks like, with lucky enough concurrency, we can get to the following situation inside fsstress:
    Process A                          | Process B
    -----------------------------------+---------------------------------------
    do_aio_rw()                        |
    |- io_submit();                    |
    |- io_getevents();                 |
    |  Returned -EINTR, but IO hasn't  |
    |  finished.                       |
    `- free(buf);                      |
                                       | malloc();
                                       |  Got the same memory of @buf from
                                       |  process A.
                                       | Modify the memory
                                       |  Now the buffer is changed while
                                       |  still under IO
This is the typical buffer modification during direct IO, which is going to cause csum mismatch for btrfs, and btrfs properly detects it. This is the direct cause of the problem. The root cause is that io_uring would use signals to handle submission/completion of IOs. Thus io_uring operations would interrupt AIO operations, causing the above problem.
[FIX]
To fix the problem, we can just retry io_getevents() so that we can properly wait for the IO. This prevents us from modifying the IO buffer before writeback really finishes. With this fix, I can no longer reproduce the data corruption. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-25  btrfs/004: use shuf to shuffle the file lines  Naohiro Aota
"sort -R" is slower than "shuf" even with the full output, because "sort -R" actually sorts the lines to group identical keys.
    $ time bash -c "seq 1000000 | shuf >/dev/null"
    bash -c "seq 1000000 | shuf >/dev/null" 0.18s user 0.03s system 104% cpu 0.196 total
    $ time bash -c "seq 1000000 | sort -R >/dev/null"
    bash -c "seq 1000000 | sort -R >/dev/null" 19.61s user 0.03s system 99% cpu 19.739 total
Since the lines output by "find" are never identical, we can just use "shuf" to optimize the selection. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-25  fstests/btrfs: use _random_file() helper  Naohiro Aota
Use _random_file() helper to choose a random file in a directory. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-25  common/rc: introduce _random_file() helper  Naohiro Aota
Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a random file in a directory. It sorts the files with "ls", sorts them again randomly, and picks the first line, which wastes the "ls" sort. Also, using "sort -R | head -n1" is inefficient. For example, in a directory with 1000000 files, it takes more than 15 seconds to pick a file.
    $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
    bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total
    $ time bash -c "ls -U | shuf -n 1 >/dev/null"
    bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total
So, we should just use "ls -U" and "shuf -n 1" to choose a random file. Introduce _random_file() helper to do it properly. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
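A minimal sketch of such a helper, built only from the "ls -U" and "shuf -n 1" commands named in the message. The function name `random_file_sketch` and its output format (directory plus file name) are assumptions; the real `_random_file()` in common/rc may differ.

```shell
# Hypothetical sketch; the real _random_file() in common/rc may differ.
# Prints the path of one randomly chosen entry in the given directory.
random_file_sketch() {
	local basedir="$1"

	# "ls -U" lists entries unsorted; "shuf -n 1" picks a single line
	# without sorting the whole input the way "sort -R" does.
	echo "$basedir/$(ls -U "$basedir" | shuf -n 1)"
}
```

A caller would use it as `f=$(random_file_sketch "$dir")`. Like the original pipeline, this sketch assumes file names without embedded newlines.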
2023-08-19  fstests: Verify dir permissions when creating a stub subvolume  Lee Trager
btrfs supports creating nested subvolumes; however, snapshots are not recursive. When a snapshot is taken of a volume which contains a subvolume, the subvolume is replaced with a stub subvolume which has the same name and uses inode number 2. This test validates that the stub volume copies permissions of the original volume. Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-19  btrfs/220: do not run async discard test on zoned device  Naohiro Aota
The mount option "discard=async" is not meant to be used on the zoned mode. Skip it from the test. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-19  common/rc: drop 'fsck -f' parameter from _repair_test_fs  David Disseldorp
The '-f' parameter is fsck.ext# specific, where it's documented to: Force checking even if filesystem is marked clean _repair_test_fs() is only called on _check_test_fs() failure, so dropping the parameter should be possible without changing ext# behaviour. Doing so fixes _repair_test_fs() on exfat, where fsck.exfat doesn't support '-f'. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-19  generic/{175,297,298}: fix use of uninitialized var  Amir Goldstein
The truncate command in those tests uses an uninitialized variable i. In kdevops, i must contain some leftover value, so we get errors like: /data/fstests-install/xfstests/tests/generic/298: line 45: /dev/loop12): syntax error: operand expected (error token is "/dev/loop12)") Apparently, no one, including the author of the tests, knows why this truncate command is in the test, so remove the wrong truncate command. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-19  check: fix parsing expunge file with comments  Amir Goldstein
Commit 60054d51 ("check: fix excluded tests are only expunged in the first iteration") changed the code to use the exclude_tests array instead of a file. The check for whether a test is in the expunge file used grep -q $TEST_ID FILE, so it checked whether the test was a non-exact match for one of the lines. For a common example, "generic/001 # exclude this test" would match test generic/001. The commit regressed this example, because the new code checks for an exact match of [ "generic/001" == "generic/001 " ]. Change the code to match a regular expression to deal with this case and any other suffix correctly. NOTE that the original code would have matched test generic/100 with lines like "generic/1000" when we get to 4 digit seqnum, so the regular expression does an exact match to the first word of the line. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
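The "exact match to the first word of the line" rule can be illustrated with an anchored regular expression. This sketch is an assumption for illustration only: it reads lines from a file with `grep -E` and uses a hypothetical function name, whereas the fix described above works on the exclude_tests array inside the check script.

```shell
# Hypothetical sketch, not the code from the check script.
# Succeeds if test_id is the first word of any line in expunge_file.
# Note: test_id is interpolated into the regex, so IDs containing regex
# metacharacters would need escaping.
is_expunged_sketch() {
	local test_id="$1" expunge_file="$2"

	# The ID must start the line and be followed by whitespace or end of line.
	grep -Eq "^${test_id}([[:space:]]|\$)" "$expunge_file"
}
```

With a file containing the lines `generic/001 # exclude this test` and `generic/1000`, the sketch matches generic/001 and generic/1000 but not generic/100.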
2023-08-19  fsx: tidy options usage and format  Shiyang Ruan
1. Add missing options and wrap the cli example line. 2. Cleanup and also add missing "-K" operation for options description part. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-19  t_ofd_locks: fix sem initialization sequence  Stas Sergeev
The locker was waiting for sem_otime on sem0 to become non-zero after incrementing sem0 itself. So sem_otime was never 0 at the time of checking it, and the check was redundant/wrong. This patch:
- moves the increment of sem1 to the lock-tester site
- makes the lock-setter wait for that sem1 event, replacing the wait loop on sem_otime with a GETVAL loop and adding a small sleep
- moves the increment of sem0 to 2 past that sem1 event. That sem0 event is currently not used/waited for.
This guarantees that the lock-setter is working only after the lock-getter is fully initialized. CC: fstests@vger.kernel.org CC: Murphy Zhou <xzhou@redhat.com> CC: Jeff Layton <jlayton@kernel.org> CC: Zorro Lang <zlang@redhat.com> Signed-off-by: Stas Sergeev <stsp2@yandex.ru> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-19  t_ofd_locks: fix stalled semaphore handling  Stas Sergeev
Currently IPC_RMID was attempted on a semid returned after a failed semget() with flags=IPC_CREAT|IPC_EXCL. So nothing was actually removed. This patch introduces a much more reliable scheme where the wrapper script creates and removes semaphores, passing a sem key to the test binary via the new -K option. This patch speeds up the test ~5 times by removing the sem-awaiting loop in the lock-getter process. As the semaphore is now created before the test process starts, there is no need to wait for anything. CC: fstests@vger.kernel.org CC: Murphy Zhou <xzhou@redhat.com> CC: Jeff Layton <jlayton@kernel.org> CC: Zorro Lang <zlang@redhat.com> Signed-off-by: Stas Sergeev <stsp2@yandex.ru> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-19  btrfs/213: fix failure due to misspelled function name  Filipe Manana
The test is calling _not_run but it should be _notrun, so fix that. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  xfs: skip fragmentation tests when alwayscow mode is enabled, part 2  [v2023.08.06]  Darrick J. Wong
If the always_cow debugging flag is enabled, all file writes turn into copy writes. This dramatically ramps up fragmentation in the filesystem (intentionally!) so there's no point in complaining about fragmentation. I missed these two in the original commit because readahead for md5sum would create large folios at the start of the file. This resulted in the fdatasync after the random writes issuing writeback for the whole large folio, which reduced file fragmentation to the point where this test started passing. With Ritesh's patchset implementing sub-folio dirty tracking, this test goes back to failing due to high fragmentation (as it did before large folios) so we need to mask these off too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  generic/642: fix SOAK_DURATION usage in generic/642  Darrick J. Wong
Misspelled variable name. Yay bash. Fixes: 3e85dd4fe4 ("misc: add duration for long soak tests") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  fstests: add helper to canonicalize devices used to enable persistent disks  Luis Chamberlain
The filesystem configuration file does not allow you to use symlinks to real devices, because the existing sanity checks verify that the target end device matches the source. Device mapper links work, but symlinks to real drives do not. Using a symlink is desirable if you want to enable persistent tests across reboots. For example, you may want to use /dev/disk/by-id/nvme-eui.* to ensure that the same drives are used even after reboot. This is very useful if you are testing, for example, in a virtualized environment and are using PCIe passthrough with other qemu NVMe drives with one or many NVMe drives. To enable support, just add a helper to canonicalize devices prior to running the tests. This helps one test runner, kdevops, which I just extended with support for real NVMe drives: it can now use NVMe EUI symlinks, and falls back to NVMe model + serial symlinks since not all NVMe drives support EUIs. The drives it uses for the filesystem configuration can optionally be given as NVMe EUI symlinks, so that the same drives are used across reboots. For instance, this works today with real NVMe drives:
    mkfs.xfs -f /dev/nvme0n1
    mount /dev/nvme0n1 /mnt
    TEST_DIR=/mnt TEST_DEV=/dev/nvme0n1 FSTYP=xfs ./check generic/110
    FSTYP -- xfs (debug)
    PLATFORM -- Linux/x86_64 flax-mtr01 6.5.0-rc3-djwx #rc3 SMP PREEMPT_DYNAMIC Wed Jul 26 14:26:48 PDT 2023
    generic/110 2s
    Ran: generic/110
    Passed all 1 tests
But this does not:
    TEST_DIR=/mnt TEST_DEV=/dev/disk/by-id/nvme-eui.0035385411904c1e FSTYP=xfs ./check generic/110
    mount: /mnt: /dev/disk/by-id/nvme-eui.0035385411904c1e already mounted on /mnt.
    common/rc: retrying test device mount with external set
    mount: /mnt: /dev/disk/by-id/nvme-eui.0035385411904c1e already mounted on /mnt.
    common/rc: could not mount /dev/disk/by-id/nvme-eui.0035385411904c1e on /mnt
    umount /mnt
    TEST_DIR=/mnt TEST_DEV=/dev/disk/by-id/nvme-eui.0035385411904c1e FSTYP=xfs ./check generic/110
    TEST_DEV=/dev/disk/by-id/nvme-eui.0035385411904c1e is mounted but not on TEST_DIR=/mnt - aborting
    Already mounted result: /dev/disk/by-id/nvme-eui.0035385411904c1e /mnt
This patch fixes that, allowing the same real drives to be used for a test over and over after reboots. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
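The idea of canonicalizing a device path before the sanity checks run can be sketched as follows. This is a minimal illustration under assumptions: the function name is hypothetical and `readlink -f` is one possible way to resolve the link; the helper actually added to fstests may be implemented differently.

```shell
# Hypothetical sketch, not the helper added by this commit.
# Resolves a symlinked device path (e.g. under /dev/disk/by-id) to the
# real device node; other paths are printed unchanged.
canonicalize_dev_sketch() {
	local dev="$1"

	if [ -L "$dev" ]; then
		readlink -f "$dev"
	else
		echo "$dev"
	fi
}

# Usage: resolve TEST_DEV once, before mount checks compare device names.
# TEST_DEV=$(canonicalize_dev_sketch "$TEST_DEV")
```

Resolving the link up front means the later comparison against the mounted device sees a name like /dev/nvme0n1 rather than the by-id alias.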
2023-08-05  check: generate gcov code coverage reports at the end of each section  Darrick J. Wong
Support collecting kernel code coverage information as reported in debugfs. At the start of each section, we reset the gcov counters; during the section wrapup, we'll collect the kernel gcov data. If lcov is installed and the kernel source code is available, it will also generate a nice html report. If a CLI web browser is available, it will also format the html report into text for easy grepping. This requires the test runner to set REPORT_GCOV=1 explicitly and gcov to be enabled in the kernel. Cc: tytso@mit.edu Cc: kent.overstreet@linux.dev Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  btrfs/276: make test accurate regarding number of expected extents  Filipe Manana
btrfs/276 creates a 16G file with compression enabled in order to quickly and efficiently create a file with many extents and have a fs tree with a height of 3 (root node at level 2), so that it can test that fiemap is correctly reporting extent sharedness when we have shared subtrees of the fs tree due to a snapshot. Compression results in extents with a maximum size of 128K and the test is expecting only extents of 128K, which normally happens if the machine has a large amount of RAM and writeback is not triggered before the xfs_io command finishes. However, if writeback is triggered in the meanwhile, due to memory pressure for example, then we can end up with some extents that are smaller than 128K, therefore increasing the total number of extents in the test file and making the test fail. This seems to happen often on test machines with small amounts of RAM, such as 4G, as reported by Qu in the following thread: https://lore.kernel.org/linux-btrfs/20230801065529.50122-1-wqu@suse.com/
So to address this, create a file with holes and direct IO to make sure we always get a specific number of extents in the test file. To speed up the test, create 2000 64K extents, with holes in between them, so that it works on a fs with any sector size, and then create a bunch of files with large xattrs to quickly bump the fs tree height to 3 for any node size (4K to 64K). This also guarantees that the file extent items are spread over multiple leaves, in order to exercise fiemap's correctness when reporting shared extents due to shared subtrees. Reported-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  fstests: add smoketest group  Zorro Lang
Darrick suggests that fstests can provide a simple smoketest, by running several generic filesystem smoke testing for five minutes apiece (SOAK_DURATION="5m"). Since there are only five smoke tests, this is effectively a 20min super-quick test. With gcov enabled, running these tests yields about ~75% coverage for iomap and ~60% for xfs; or ~50% for ext4 and ~75% for ext4; and ~45% for btrfs. Coverage was about ~65% for the pagecache. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  xfs/122: adjust test for flexarray conversions in 6.5  Darrick J. Wong
Adjust the output of this test to handle the flexarray declaration conversions in Linux v6.5, commit a49bbce58ea9 ("xfs: convert flex-array declarations in xfs attr leaf blocks") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  generic: add a test for device removal without dirty data  Christoph Hellwig
Test the removal of the underlying device when the file system still does not have dirty data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  generic: add a test for device removal with dirty data  Christoph Hellwig
Test the removal of the underlying device when the file system still has dirty data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  btrfs: add a test case to make sure scrub can repair parity corruption  Qu Wenruo
There is a kernel regression caused by commit 75b470332965 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap"), which leads to scrub not repairing corrupted parity stripes. So here we add a test case to verify the P/Q stripe scrub behavior by:
- Create a RAID5 or RAID6 btrfs with the minimal number of devices
  This means 2 devices for RAID5, and 3 devices for RAID6. This makes the parity stripe a mirror of the only data stripe. And since we have control of the content of data stripes, the content of the P stripe is also fixed.
- Create a 64K file
  The file would cover one data stripe.
- Corrupt the P stripe
- Scrub the fs
  If scrub is working, the P stripe would be repaired. Unfortunately scrub cannot report any P/Q corruption, limited by its reporting structure. So we cannot use the return value of scrub to determine if we repaired anything.
- Verify the content of the P stripe
- Use "btrfs check --check-data-csum" to double check
By the above steps, we can verify if the P stripe is properly fixed. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-05  btrfs/294: reject zoned devices for now  Qu Wenruo
The test case itself is utilizing RAID5/6, which is not yet supported on zoned device. In the future we would use raid-stripe-tree (RST) feature, but for now just reject zoned devices completely. And since we're here, also update the _fixed_by_kernel_commit lines, as the proper fix is already merged upstream. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-08-04  fstests: install soak_duration.awk  Theodore Ts'o
Commit 3e85dd4fe423 ("misc: add duration for long soak tests") added a helper executable, soak_duration.awk, which is used by the check script if SOAK_DURATION is set. This script translates a "human-friendly" time duration specifier, such as 4m or 2d, into an integer number of seconds. We need to make sure that this script is installed or the check script will bomb out if SOAK_DURATION is set (and if the fstests installation doesn't include a full set of fstests source, but just those files installed by "make install"). Fixes: 3e85dd4fe423 ("misc: add duration for long soak tests") Cc: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
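To make the translation concrete, here is a minimal sketch of turning a specifier such as 4m or 2d into seconds. It is not the soak_duration.awk script itself: the function name `duration_to_seconds`, the accepted unit letters (s, m, h, d, w), and the integer-only input are all assumptions made for illustration.

```shell
# Hypothetical helper, not the real soak_duration.awk.
# Converts "<integer><unit>" to whole seconds. Assumed units: s, m, h, d, w.
# A bare integer is treated as seconds.
duration_to_seconds() {
	spec=$1
	num=${spec%[smhdw]}      # strip one trailing unit letter, if present
	unit=${spec#"$num"}      # whatever was stripped is the unit
	case "$num" in
	''|*[!0-9]*) echo "bad duration: $spec" >&2; return 1 ;;
	esac
	case "$unit" in
	''|s) mult=1 ;;
	m) mult=60 ;;
	h) mult=3600 ;;
	d) mult=86400 ;;
	w) mult=604800 ;;
	*) echo "bad duration: $spec" >&2; return 1 ;;
	esac
	echo $((num * mult))
}

duration_to_seconds 4m
duration_to_seconds 2d
```

If the check script cannot find its helper at run time, a conversion like this never happens, which is why the helper has to be part of "make install".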
2023-07-23  btrfs: add a test case to verify that per-fs features directory gets updated  v2023.07.23  Qu Wenruo
Although btrfs has a per-fs features directory, it's not properly refreshed after new features are enabled. We had some attempts to do that properly, like commit 14e46e04958d ("btrfs: synchronize incompat feature bits with sysfs files"). But unfortunately that commit was later reverted, as some call sites are not safe places to update sysfs files.

Now we have a new commit b7625f461da6 ("btrfs: sysfs: update fs features directory asynchronously") to properly refresh that per-fs features directory. So it's time to add a test case for it.

The test case itself is pretty straightforward:

- Make a very basic 3-disk btrfs
  Only use the very basic profiles (DUP/SINGLE) so that even older mkfs.btrfs can support it.
- Make sure the per-fs features directory doesn't contain the "raid1c34" file
- Balance the metadata to the RAID1C3 profile
- Verify the per-fs features directory contains the "raid1c34" feature file

Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> [ Update commit log. Remove commented code. Add _fixed_by_kernel_commit. Check mkfs status. Add sync. ] Signed-off-by: Anand Jain <anand.jain@oracle.com>
2023-07-23  btrfs: add a test case to check btrfs won't crash on certain corruption  Qu Wenruo
The test case reproduces the situation by creating an empty fs with the SINGLE metadata profile, then corrupting the tree root manually. Finally, it tries mounting the corrupted fs; the mount should fail, but the kernel should not crash. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> [ Update commit log. Fix a line gt 80 chars. Use append to $seqres.full. Fix comment ] Signed-off-by: Anand Jain <anand.jain@oracle.com>
2023-07-23  btrfs: add a test case to verify the write behavior of large RAID5 data chunks  Qu Wenruo
There is a recent regression from the v6.4 merge window, where a u32 left shift overflow can cause problems with large data chunks (over 4G in size). This is especially nasty for RAID56, where it can lead to an ASSERT() during regular writes, or to memory corruption if CONFIG_BTRFS_ASSERT is not enabled.

This is the regression test case for it. Unlike btrfs/292, btrfs doesn't support trim inside RAID56 chunks, thus the workflow is simplified:

- Create a RAID5 or RAID6 data chunk during mkfs
- Fill the fs with 5G data and sync
  For an unpatched kernel, the sync would crash the kernel.
- Make sure everything is fine

Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
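The kind of overflow described above can be illustrated with plain shell arithmetic. This is only a sketch: the commit message says a u32 left shift overflows for chunks over 4G, but the 16-bit shift (a 64 KiB stripe length) and the helper names `u32`, `stripe_offset_u32` and `stripe_offset_u64` are assumptions made here, not taken from the kernel source.

```shell
# Emulate storing a value in an unsigned 32-bit variable by masking.
u32() {
	echo $(( $1 & 0xFFFFFFFF ))
}

# Assumption for illustration: 64 KiB stripes, i.e. a shift of 16 bits.
STRIPE_SHIFT=16

# Byte offset of a stripe number, computed in 32 bits (truncating)
# and in 64 bits (correct).
stripe_offset_u32() { u32 $(( $1 << STRIPE_SHIFT )); }
stripe_offset_u64() { echo $(( $1 << STRIPE_SHIFT )); }

# Stripe 65535 is still below 4 GiB: both widths agree.
stripe_offset_u32 65535
stripe_offset_u64 65535

# Stripe 65536 sits exactly at 4 GiB: the 32-bit result wraps around.
stripe_offset_u32 65536
stripe_offset_u64 65536
```

Under these assumptions, any offset at or beyond 4 GiB wraps in the 32-bit computation, which is consistent with the test needing to write 5G of data to trigger the problem.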
2023-07-23  generic/558: avoid forkbombs on filesystems with many free inodes  Darrick J. Wong
Mikulas reported that this test became a forkbomb on his system when he tested it with bcachefs. Unlike XFS and ext4, which have large inodes consuming hundreds of bytes, bcachefs has very tiny ones. Therefore, it reports a large number of free inodes on a freshly mounted 1GB fs (~15 million), which causes this test to try to create 15000 processes. There's really no reason to do that -- all this test wanted to do was to exhaust the number of inodes as quickly as possible using all available CPUs, and then it ran xfs_repair to try to reproduce a bug. Set the number of subshells to 4x the CPU count and spread the work among them instead of forking thousands of processes. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
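The idea of a fixed worker count with the work spread among the workers can be sketched as follows. This is not the code from generic/558: `split_work` is a hypothetical helper, and using `getconf _NPROCESSORS_ONLN` to count CPUs is an assumption (fstests may obtain the CPU count differently).

```shell
# Hypothetical helper: divide <total> items among <workers> workers.
# Prints one line per worker: "<worker index> <item count>".
split_work() {
	total=$1
	workers=$2
	base=$((total / workers))
	extra=$((total % workers))
	i=0
	while [ "$i" -lt "$workers" ]; do
		n=$base
		# The first <extra> workers take one additional item each.
		if [ "$i" -lt "$extra" ]; then
			n=$((n + 1))
		fi
		echo "$i $n"
		i=$((i + 1))
	done
}

# Assumption: count online CPUs with getconf, falling back to 1.
ncpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
nworkers=$((ncpus * 4))

# Example: spread 15 million inodes over 4x the CPU count.
split_work 15000000 "$nworkers"
```

With this shape the number of processes depends only on the CPU count, so a filesystem that reports millions of free inodes just gives each worker a longer loop rather than more processes.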
2023-07-23  xfs: add a couple more tests for ascii-ci problems  Darrick J. Wong
Add some tests to make sure that userspace and the kernel actually agree on how to do ascii case-insensitive directory lookups, and that metadump can actually obfuscate such filesystems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>