path: root/check
Age | Commit message | Author
2024-02-16 | Add ktest style markers for test starting and finishing | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-16 | Dump seqres.full on test failure | Kent Overstreet
In ktest, we try to keep all essential information on test failure in a single log file - dumping seqres.full to stdout will end up in that log file. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2024-02-16 | check: Add -f: failfast mode | Kent Overstreet
This adds a new flag to check which exits immediately after the first test failure, so as to leave test/scratch devices untouched and make it easier to debug rare test failures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-08-19 | check: fix parsing expunge file with comments | Amir Goldstein
Commit 60054d51 ("check: fix excluded tests are only expunged in the first iteration") changed check to use the exclude_tests array instead of a file. The check for whether a test is in the expunge file used grep -q $TEST_ID FILE, so it tested for a non-exact match against the lines of the file. In a common example, "generic/001 # exclude this test" would match test generic/001. The commit regressed this example, because the new code checks for an exact match: [ "generic/001" == "generic/001 " ]. Change the code to match a regular expression, to deal with this case and any other suffix correctly. NOTE that the original code would have matched test generic/100 against lines like "generic/1000" once we get to 4-digit seqnums, so the regular expression does an exact match on the first word of the line. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
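The first-word match described in this message can be illustrated with a short bash sketch. This is not the code from the commit: the function name `_is_expunged` and the way the expunge lines are passed in are assumptions made for illustration.

```bash
# Hypothetical helper (not the actual check code): return 0 if test id $1
# equals the first word of any remaining argument. Each remaining argument
# stands for one line of an expunge file.
_is_expunged()
{
	local id="$1" line
	shift
	# Anchor the id at the start of the line and require whitespace or
	# end of line right after it. Trailing comments are then ignored and
	# generic/100 does not match generic/1000.
	local re="^${id}([[:space:]]|\$)"
	for line in "$@"; do
		if [[ "$line" =~ $re ]]; then
			return 0
		fi
	done
	return 1
}
```

With this sketch, `_is_expunged generic/001 "generic/001 # exclude this test"` succeeds, while `_is_expunged generic/100 "generic/1000"` does not.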
2023-08-05 | fstests: add helper to canonicalize devices used to enable persistent disks | Luis Chamberlain
The filesystem configuration file does not allow you to use symlinks to real devices, because the existing sanity checks verify that the target end device matches the source. Device mapper links work, but symlinks for real drives do not.

Using a symlink is desirable if you want to enable persistent tests across reboots. For example, you may want to use /dev/disk/by-id/nvme-eui.* to ensure that the same drives are used even after a reboot. This is very useful if you are testing, for example, in a virtualized environment and are using PCIe passthrough with other qemu NVMe drives with one or many NVMe drives.

To enable support, just add a helper to canonicalize devices prior to running the tests.

This allows one test runner, kdevops, which I just extended with support for real NVMe drives, to use NVMe EUI symlinks and to fall back to NVMe model + serial symlinks, as not all NVMe drives support EUIs. The drives it uses for the filesystem configuration can optionally be NVMe EUI symlinks, so that the same drives are used across reboots.

For instance, this works today with real NVMe drives:

  mkfs.xfs -f /dev/nvme0n1
  mount /dev/nvme0n1 /mnt
  TEST_DIR=/mnt TEST_DEV=/dev/nvme0n1 FSTYP=xfs ./check generic/110
  FSTYP -- xfs (debug)
  PLATFORM -- Linux/x86_64 flax-mtr01 6.5.0-rc3-djwx #rc3 SMP PREEMPT_DYNAMIC Wed Jul 26 14:26:48 PDT 2023

  generic/110 2s
  Ran: generic/110
  Passed all 1 tests

But this does not:

  TEST_DIR=/mnt TEST_DEV=/dev/disk/by-id/nvme-eui.0035385411904c1e FSTYP=xfs ./check generic/110
  mount: /mnt: /dev/disk/by-id/nvme-eui.0035385411904c1e already mounted on /mnt.
  common/rc: retrying test device mount with external set
  mount: /mnt: /dev/disk/by-id/nvme-eui.0035385411904c1e already mounted on /mnt.
  common/rc: could not mount /dev/disk/by-id/nvme-eui.0035385411904c1e on /mnt

  umount /mnt
  TEST_DIR=/mnt TEST_DEV=/dev/disk/by-id/nvme-eui.0035385411904c1e FSTYP=xfs ./check generic/110
  TEST_DEV=/dev/disk/by-id/nvme-eui.0035385411904c1e is mounted but not on TEST_DIR=/mnt - aborting
  Already mounted result:
  /dev/disk/by-id/nvme-eui.0035385411904c1e /mnt

This fixes this. This allows the same real drives for a test to be used over and over after reboots.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>
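A canonicalization helper of the kind described above could look roughly like the following. This is a minimal sketch and not the helper added by the commit: the name `_canonicalize_dev` is hypothetical, and it assumes a `readlink` that supports `-f` (as GNU coreutils does).

```bash
# Hypothetical helper: resolve a device path to its canonical form if it is
# a symlink; leave anything else (for example an NFS export such as
# server:/export, or an empty value) untouched.
_canonicalize_dev()
{
	local dev="$1"
	if [ -L "$dev" ]; then
		# e.g. /dev/disk/by-id/nvme-eui.* resolves to the real block device
		readlink -f "$dev"
	else
		echo "$dev"
	fi
}
```

A caller would then rewrite its configuration before the sanity checks run, along the lines of `TEST_DEV=$(_canonicalize_dev "$TEST_DEV")`.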
2023-08-05 | check: generate gcov code coverage reports at the end of each section | Darrick J. Wong
Support collecting kernel code coverage information as reported in debugfs. At the start of each section, we reset the gcov counters; during the section wrapup, we'll collect the kernel gcov data. If lcov is installed and the kernel source code is available, it will also generate a nice html report. If a CLI web browser is available, it will also format the html report into text for easy grepping. This requires the test runner to set REPORT_GCOV=1 explicitly and gcov to be enabled in the kernel. Cc: tytso@mit.edu Cc: kent.overstreet@linux.dev Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-07-07 | check: fix excluded tests are only expunged in the first iteration | Yuezhang Mo
If iterating more than once and excluding some tests, the excluded tests are expunged in the first iteration, but run in subsequent iterations. This is not expected. The problem was caused by the temporary file saving the excluded tests being deleted by `rm -f $tmp.*` in _wrapup() at the end of the first iteration. This commit saves the excluded tests into a variable instead of a temporary file. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@foxmail.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-06-23 | fstests: reduce runtime of check -n | Amir Goldstein
kvm-xfstests invokes check -n twice to pre-process and generate the tests-to-run list, which is then passed as a long list of tests when invoking check on the command line. check invokes dirname, basename and sed several times per test just for basic string prefix/suffix trimming. Use bash string pattern matching instead, which is much faster.

Note that the following pattern matching expression change:

  < test_dir=${test_dir#$SRC_DIR/*}
  > t=${t#$SRC_DIR/}

does not change the meaning of the expression, because the shortest match of "$SRC_DIR/*" that is being trimmed is "$SRC_DIR/", and removing the tests/ prefix is what this code intended to do.

With check -n, there is no need to clean up the results dir, but check -n does that for every single listed test. Move the cleanup of the results dir to just before actually running the test.

These improvements to the check pre-test code cut several minutes from the time until tests actually start to run with kvm-xfstests.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>
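To illustrate the kind of trimming this message refers to, here is a small sketch that splits a test path using only parameter expansion, with no dirname, basename or sed processes. The helper name `_split_test` and its output format are assumptions for illustration and are not taken from check.

```bash
# Hypothetical helper: print "<test dir> <test name>" for a test path,
# using only shell parameter expansion (no forks).
_split_test()
{
	local src_dir="$1" t="$2"
	t=${t#"$src_dir"/}         # strip the leading "$src_dir/" prefix
	echo "${t%/*} ${t##*/}"    # dirname-like part, then basename-like part
}
```

For example, `_split_test tests tests/generic/110` prints `generic 110`. Each expansion runs inside the shell itself, which is where the saving comes from when the list contains thousands of tests.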
2023-05-01 | misc: add duration for long soak tests | Darrick J. Wong
Make it so that test runners can schedule long soak stress test programs for an exact number of seconds by setting the SOAK_DURATION config variable. Change the definition of the 'soak' test to specify that these tests can be controlled via SOAK_DURATION. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-05-01 | fstests: add AFS support | David Howells
Add support for the AFS filesystem. AFS is a network filesystem and there are a number of features it doesn't support.

- No mkfs. (Kind of. An AFS volume server can be asked to create a new volume, but that's probably best left to AFS-specific test suites. Further, a volume would need to be destroyed before another of the same name could be created; it's not simply a matter of overwriting the old one as it is on a blockdev with a block-based filesystem.)
- No fsck. (Kind of - the server can be asked to salvage a volume, but it may involve taking the server offline).
- No richacls. AFS has its own ACL system.
- No atimes.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-04-24 | check: _check_filesystems for errors even if test failed | Leah Rumancik
Previously, we would only run _check_filesystems to ensure that a test that appeared to pass did not have any filesystem corruption. However, in _check_filesystems, we also repair any errors found in the filesystem. Let's do this even if we already know the test failed so that subsequent tests aren't affected. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-04-24 | check: try to fix the test device if it gets corrupted | Theodore Ts'o
If the test device gets corrupted, all subsequent tests will fail. To prevent this from making all subsequent tests useless, try to repair the file system on TEST_DEV if possible. We don't need to do this with the scratch device since that file system gets recreated each time anyway. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-03-26 | report: record fstests start and report generation timestamps | Darrick J. Wong
Report two new timestamps in the xml report: the time that ./check was started, and the time that the report was generated. We introduce new timestamps to minimize breakage with parsing scripts. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-03-26 | report: clarify the meaning of the timestamp attribute | Darrick J. Wong
We've never specified what the timestamp attribute of the testsuite element actually means, and its history is rather murky. Prior to the introduction of the xml report format in commit f9fde7db2f, the "date_time" variable was used only to scrape dmesg via the /dev/kmsg device after each test. If /dev/kmsg was not a writable path, the variable was not set at all. In this case, the report timestamp would be blank. In commit ffdecf7498a1, Ted changed the xunit report code to handle empty date_time values by setting date_time to the time of report generation. This change was done to handle the case where no tests are run at all. However, it did not change the behavior that date_time is not set if /dev/kmsg is not writable. Clear up all this confusion by defining the timestamp attribute to reflect the start time of the most recent test, regardless of the state of /dev/kmsg. If no tests are run, then define the attribute to be the time of report generation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-03-25 | check: generate section reports between tests | Darrick J. Wong
Generate the section report between tests so that the summary report always reflects the outcome of the most recent test. Two usecases are envisioned here -- if a cluster-based test runner anticipates that the testrun could crash the VM, they can set REPORT_DIR to (say) an NFS mount to preserve the intermediate results. If the VM does indeed crash, the scheduler can examine the state of the crashed VM and move the tests to another VM. The second usecase is a reporting agent that runs in the VM to upload live results to a test dashboard. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Leah Rumancik <leah.rumancik@gmail.com> Tested-by: Leah Rumancik <leah.rumancik@gmail.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-01-22 | xfstests: add fuse support [v2023.01.22] | Miklos Szeredi
This allows using any fuse filesystem that can be mounted with mount -t fuse$FUSE_SUBTYP ... Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Jakob Unterwurzacher <jakobunt@gmail.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2023-01-14 | overlay: avoid to use NULL OVL_BASE_FSTYP for mounting | Baokun Li
Generally, FSTYP is used to specify OVL_BASE_FSTYP. When we specify FSTYP through an environment variable, it is not converted to OVL_BASE_FSTYP. In addition, sometimes we do not even specify the file type. For example, we only use `./check -n -overlay -g auto` to list overlay-related cases. If OVL_BASE_FSTYP is NULL, mounting fails and the test fails. To solve this problem, try to assign a value to OVL_BASE_FSTYP when specifying -overlay. In addition, in the _overlay_base_mount function, the basic file system type of the overlay is specified only when OVL_BASE_FSTYP is not NULL. Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-12-16 | check: wipe tmp.arglist | Leah Rumancik
Make sure tmp.arglist is wiped before each run to avoid accidentally rerunning tests. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-12-15 | check: ensure sect_stop is initialized if interrupted | David Disseldorp
sect_stop is normally set immediately prior to calling _wrapup() via run_section(). However, when called via a trap signal handler, sect_stop may be uninitialized, leading to a negative section time (sect_stop - sect_start) in the xunit report. E.g.

  Interrupted!
  Passed all 1 tests
  Xunit report: /home/david/xfstests/results//result.xml

  rapido1:/# head /home/david/xfstests/results//result.xml
  <?xml version="1.0" encoding="UTF-8"?>
  <testsuite name="xfstests" failures="0" skipped="0" tests="1" time="-1670885797" ... >

This commit uses the existing $interrupt flag to determine when sect_stop needs to be initialised.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>
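The fix can be pictured with the following sketch. The variable names sect_start, sect_stop and interrupt come from the message above; the function name `_section_time`, and the assumption that `interrupt` holds the string `true` or `false`, are illustrative only and may differ from the real script.

```bash
# Sketch only: compute the section runtime, filling in sect_stop when we
# arrive here from a signal trap instead of from run_section().
_section_time()
{
	# Assumption: $interrupt is "true" or "false", so it can be run as a
	# command in the condition.
	if $interrupt; then
		sect_stop=$(date +%s)
	fi
	echo $((sect_stop - sect_start))
}
```

Without the `if`, an unset sect_stop evaluates to 0 in the arithmetic expansion, which yields the large negative time shown in the report excerpt.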
2022-12-14 | check: call _check_dmesg even if the test case failed | Qu Wenruo
[BUG]
When KEEP_DMESG=yes is specified, passed test cases will also keep their $seqres.dmesg files. However, for failed test cases (caused by _fail calls), the dmesg files are not saved at all:

  # rm -rf results/btrfs/219*
  # ./check btrfs/219
  # ls results/btrfs/219*
  results/btrfs/219.full results/btrfs/219.out.bad

[CAUSE]
$seqres.dmesg is created (and later deleted, depending on config) by the _check_dmesg() function. But if a test case fails by calling _fail, we no longer call _check_dmesg(), so no dmesg is saved no matter what the config is.

[FIX]
If the test case itself failed, still call _check_dmesg() to either save the dmesg unconditionally (the KEEP_DMESG=yes case), or save the dmesg if there is something wrong (the default). The dmesg can be a pretty handy debugging clue in both cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-10-16 | fstests: get section config after RUN_SECTION checks [v2022.10.16] | Josef Bacik
While trying to do ./check -s <some section> I was failing because I had a section defined higher than <some section> that had TEST_DEV=/some/nonexistent/device, since I was using the other section to test an experimental drive. This appears to be because we run through all of the sections, and when getting the section config we check to see if it's valid, and in this case the section wasn't valid. The section I was actually trying to use was valid however. Fix check to see if the section we're trying to run is in our list of sections to run first, and then if it is get the config at that point. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-10-15 | check: detect and preserve all coredumps made by a test | Darrick J. Wong
If someone sets kernel.core_uses_pid (or kernel.core_pattern), any coredumps generated by fstests might have names that are longer than just "core". Since the pid isn't all that useful by itself, let's record the coredumps by hash when we save them, so that we don't waste space storing identical crash dumps. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
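Saving coredumps by content hash, as described above, might look roughly like this. It is a sketch under assumptions: the helper name `_save_coredumps`, the `<prefix>.core.<hash>` naming and the use of `sha1sum` are illustrative choices and not necessarily what the commit does.

```bash
# Hypothetical helper: move every core* file found in directory $1 to
# "$2.core.<hash>", so identical dumps are stored only once.
_save_coredumps()
{
	local dir="$1" prefix="$2" core hash
	for core in "$dir"/core*; do
		[ -f "$core" ] || continue
		hash=$(sha1sum "$core" | cut -d' ' -f1)
		if [ -e "$prefix.core.$hash" ]; then
			rm -f "$core"    # same content was already saved
		else
			mv "$core" "$prefix.core.$hash"
		fi
	done
}
```

The glob `core*` is what picks up names such as `core.1234` that appear when kernel.core_uses_pid is set.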
2022-09-20 | egrep, fgrep: deprecated | Murphy Zhou
Since this grep commit:

  commit a9515624709865d480e3142fd959bccd1c9372d1
  Author: Paul Eggert <eggert@cs.ucla.edu>
  Date: Sun Aug 15 10:52:13 2021 -0700

  egrep, fgrep: now obsolete

egrep will trigger a warning like:

  +egrep: warning: egrep is obsolescent; using grep -E

This will break many golden outputs.

Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-09-04 | common: filter internal errors during io error testing | Darrick J. Wong
The goal of an EIO shutdown test is to examine the shutdown and recovery behavior if we make the underlying storage device return EIO. On XFS, it's possible that the shutdown will come from a thread that cancels a dirty transaction due to the EIO. This is expected behavior, but _check_dmesg will flag it as a test failure. Make it so that we can add simple regexps to the default check_dmesg filter function, then add the "Internal error" string to the filter function when we invoke an EIO test. This fixes periodic regressions in generic/019 and generic/475. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-07-24 | report: add support for the xunit-quiet format [v2022.07.24] | Theodore Ts'o
The xunit-quiet format excludes the NNN.{full,dmesg,bad} files in <system-out> and <system-err> nodes which are included in the xunit report format. For test runners that save the entire results directory to preserve all of the test artifacts, capturing the NNN.{full,dmesg,bad} in the results.xml file is redundant. In addition, if the NNN.bad is too large, it can cause the junitparser python library to refuse to parse the XML file to prevent potential denial of service attacks[1]. A simple way to avoid this problem is to omit the <system-out> and <system-err> nodes in the results.xml file. [1] https://gitlab.com/gitlab-org/gitlab/-/issues/268035 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-07-09 | check: add -L <n> parameter to rerun failed tests | David Disseldorp
If check is run with -L <n>, then a failed test will be rerun <n> times before proceeding to the next test. Following completion of the rerun loop, aggregate pass/fail statistics are printed. Rerun tests will be tracked as a single failure in overall pass/fail metrics (via @try and @bad), with .out.bad, .dmesg, .core, .hints, .notrun and .full saved using a .rerun# suffix. Suggested-by: Theodore Ts'o <tytso@mit.edu> Link: https://lwn.net/Articles/897061/ Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
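The rerun behaviour described here can be sketched as a small loop. This is an illustration, not the implementation in check: `_run_with_reruns` and its output format are hypothetical, and the handling of the .rerun# result files is left out.

```bash
# Hypothetical sketch: run "$@" once; if it fails, rerun it $1 more times
# and print aggregate pass/fail counts. The whole thing still counts as a
# single failure for the caller.
_run_with_reruns()
{
	local max_reruns="$1"
	shift
	local pass=0 fail=0 i

	# First attempt: a pass needs no reruns.
	if "$@"; then
		return 0
	fi
	fail=1

	for ((i = 0; i < max_reruns; i++)); do
		if "$@"; then
			pass=$((pass + 1))
		else
			fail=$((fail + 1))
		fi
	done
	echo "pass=$pass fail=$fail"
	return 1
}
```

A flaky test that fails once and then passes twice would print `pass=2 fail=1` for two reruns, which is the kind of statistic that helps separate rare failures from hard ones.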
2022-07-09 | check: append bad / notrun arrays in helper function | David Disseldorp
Currently the @try, @bad and @notrun arrays are appended with seqnum at different points in the main run_section() loop:

- @try: shortly prior to test script execution
- @notrun: on list (check -n), or after .notrun flagged test completion
- @bad: at the start of subsequent test loop and loop exit

For future loop-test-following-failure functionality it makes sense to combine some of these steps. This change moves both @notrun and @bad appends into a helper function which is called at the end of each loop iteration.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-07-09 | check: make a few variables local | David Disseldorp
The variables aren't used outside of function scope. Also convert one timestamp output to use the helper. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-07-09 | report: pass through most details as function parameters | David Disseldorp
Report generation currently involves reaching into a whole bunch of globals for things like section name and start/end times. Pass these through as explicit function parameters to avoid unintentional breakage. One minor fix included is the default xunit error message, which used $sequm instead of $seqnum. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-07-02 | check: document mkfs.xfs reliance on fstests exports | Darrick J. Wong
There are a number of fstests that employ special (and now unsupported) XFS filesystem configurations to perform testing in a controlled environment. The presence of the QA_CHECK_FS and MSGVERB variables is used by mkfs.xfs to detect that it's running inside fstests, which enables the unsupported configurations. Nobody else should be using filesystems with tiny logs, non-redundant superblocks, or smaller than the (new) minimum supported size. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-06-24 | check: remove err and first_test variables | David Disseldorp
tc_status can be used for both of these. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-06-24 | check: use arrays instead of separate n_ counters | David Disseldorp
The separate n_try, n_bad and n_notrun counters are unnecessary when the corresponding lists are switched to bash arrays. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
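As a brief illustration of why the counters become redundant: once the lists are bash arrays, the element count is available as `${#array[@]}`. The array names below follow the message; the function `_print_summary` and its exact output lines are a hypothetical sketch.

```bash
# Lists of test ids kept as bash arrays; no separate n_* counters needed.
try=()
bad=()
notrun=()

# Hypothetical summary printer: counts come straight from the arrays.
_print_summary()
{
	echo "Ran: ${try[*]}"
	if [ ${#notrun[@]} -gt 0 ]; then
		echo "Not run: ${notrun[*]}"
	fi
	if [ ${#bad[@]} -gt 0 ]; then
		echo "Failures: ${bad[*]}"
		echo "Failed ${#bad[@]} of ${#try[@]} tests"
	else
		echo "Passed all ${#try[@]} tests"
	fi
}
```

Appending a test is then a single operation, for example `bad+=("generic/002")`, and the count can never drift out of sync with the list.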
2022-06-24 | report: fix xunit tests count | David Disseldorp
The xunit "section report" provides a tests attribute, which according to https://llg.cubic.org/docs/junit/ represents:

  tests="" <!-- The total number of tests in the suite, required. -->

The current value is generated as a sum of the $n_try and $n_notrun counters. This is incorrect as the $n_try counter already includes tests which are run but complete with _notrun. One special case exists for $showme (check -n), where $n_try remains zero, so $n_notrun can be used as-is.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-06-24 | check: simplify check.time parsing | David Disseldorp
There's no need to use grep and awk when the latter can do all that's needed, including the pretty printing. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
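As an illustration of letting awk do both the matching and the formatting, consider the sketch below. It assumes a timing file with one `<test id> <seconds>` pair per line; the helper name `_last_runtime` and the output format are hypothetical.

```bash
# Hypothetical helper: print the recorded runtime of test $1 from timing
# file $2 (assumed format: "<test id> <seconds>" per line). awk selects the
# line and formats the value, so no separate grep is needed.
_last_runtime()
{
	awk -v id="$1" '$1 == id { printf " %ss", $2; exit }' "$2"
}
```

Comparing `$1 == id` is an exact field match, so generic/00 does not pick up the line for generic/001 the way an unanchored grep pattern could.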
2022-06-24 | check: avoid FSTYP=<fstyp parameter> repetition | David Disseldorp
Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>
2022-05-02 | common: print hints for reasons of test failures | Amir Goldstein
Introduce helpers _fixed_by_{kernel,git}_commit() and _fixed_in_{kernel_,}version() that can be used to hint testers why a test might be failing and aid in auto-generating expunge lists for downstream kernel testing. A test may be annotated with multiple hints, for example:

  _fixed_by_kernel_commit 09889695864 xfs: foo
  _fixed_by_kernel_commit 46464565465 ext4: bar
  _fixed_in_version xfsprogs v5.15

Annotate fix kernel commits for some overlayfs tests. Annotate fix kernel version for some overlayfs tests testing for legacy behavior whose fixes are not likely to be backported to stable kernels.

This is modeled after LTP's 'make filter-known-fails' and print_failure_hints() using struct tst_tag annotations.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>
2021-08-01 | check: back off the OOM score adjustment to -500 | Darrick J. Wong
Dave Chinner complained that fstests really shouldn't be running at -1000 oom score adjustment because that makes it more "important" than certain system daemons (e.g. journald, udev). That's true, so increase it to -500. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-08-01 | check: don't leave the scratch filesystem mounted after _notrun | Darrick J. Wong
Unmount the scratch filesystem if a test decides to _notrun itself because _try_wipe_scratch_devs will not be able to wipe the scratch device prior to the next test run. We don't want to let scratch state from one test leak into subsequent tests if we can help it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-07-18 | check: run _check_filesystems in an OOM-happy subshell | Darrick J. Wong
While running fstests one night, I observed that fstests stopped abruptly because ./check ran _check_filesystems to run xfs_repair. In turn, repair (which inherited oom_score_adj=-1000 from ./check) consumed so much memory that the OOM killer ran around killing other daemons, rendering the system nonfunctional. This is silly -- we set an OOM score adjustment of -1000 on the ./check process so that the test framework itself wouldn't get OOM-killed, because that aborts the entire run. Everything else is fair game for that, including subprocesses started by _check_filesystems. Therefore, adapt _check_filesystems (and its children) to run in a subshell with a much higher oom score adjustment. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
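The subshell approach can be sketched as follows. This is not the code from the commit: the wrapper name `_run_oom_happy` and the `OOM_ADJ_FILE` override are assumptions made so the sketch can be tried without touching a real process; on Linux the per-process knob is /proc/self/oom_score_adj.

```bash
# Hypothetical wrapper: run "$@" in a subshell whose OOM score adjustment
# is raised first, so the OOM killer prefers the subshell and its children
# over the harness itself. The parent's setting is left untouched.
_run_oom_happy()
{
	local adj="$1"
	shift
	(
		# Assumption: OOM_ADJ_FILE exists only for experimentation.
		local f="${OOM_ADJ_FILE:-/proc/self/oom_score_adj}"
		if [ -w "$f" ]; then
			echo "$adj" > "$f"
		fi
		"$@"
	)
}
```

Because the write happens inside the parentheses, /proc/self refers to the forked subshell, and every process it starts inherits the higher value.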
2021-07-04 | check: exit with exit code 1 after printing the usage message | Theodore Ts'o
If check is passed an invalid command line option, exit with a non-zero exit code so that a script calling check can detect the failure. The check script already performs an "exit 1" if a valid option has an invalid argument, so this is consistent with existing practice. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-06-27 | check: use generated group files | Darrick J. Wong
Convert the ./check script to use the automatically generated group list membership files, as the transition is now complete. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 | check: add CLI option to repeat and stop tests in case of failure | Ritesh Harjani
Currently, with the -i <n> option, a test can run for many iterations. If we want to stop iterating on the first failure, it is much easier to have an option that checks the failed status and stops the test from proceeding further. This patch adds such an option (-I <n>), thereby extending the -i <n> option functionality. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 | check: run tests in exactly the order specified | Darrick J. Wong
Introduce a new --exact-order switch to disable all sorting, filtering of repeated lines, and shuffling of test order. The goal of this is to be able to run tests in a specific order, namely to try to reproduce test failures that could be the result of a -r(andomize) run getting lucky. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 | check: don't abort on non-existent excluded groups | Darrick J. Wong
Don't abort the whole test run if we asked to exclude groups that aren't included in the candidate group list, since we actually /are/ satisfying the user's request. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2021-03-07 | check: allow '-e testid' to exclude a single test | Darrick J. Wong
This enables us to mask off specific tests. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-12-06 | check: source common/rc again if TEST_DEV was recreated | Eryu Guan
If TEST_DEV is recreated by check, the FSTYP previously derived from TEST_DEV could change too and might no longer reflect reality. So source common/rc again with the correct FSTYP to get fs-specific configs, e.g. common/xfs.

For example, using this config file with sections, and running section ext4 first and then xfs, you can see:

  our local _scratch_mkfs routine ...
  ./common/rc: line 825: _scratch_mkfs_xfs: command not found
  check: failed to mkfs $SCRATCH_DEV using specified options

local.config:

  [default]
  RECREATE_TEST_DEV=true
  TEST_DEV=/dev/sda5
  SCRATCH_DEV=/dev/sda6
  TEST_DIR=/mnt/test
  SCRATCH_MNT=/mnt/scratch

  [ext4]
  MKFS_OPTIONS="-b 4096"
  FSTYP=ext4

  [xfs]
  FSTYP=xfs
  MKFS_OPTIONS="-f -b size=4k"

Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
2020-11-22 | check: run tests in a systemd scope for mandatory test cleanup | Darrick J. Wong
TLDR: If systemd is available, run each test in its own temporary systemd scope. This enables the test harness to forcibly clean up all of the test's child processes (if it does not do so itself) so that we can move into the post-test unmount and check cleanly.

I frequently run fstests in "low" memory situations (2GB!) to force the kernel to do interesting things. There are a few tests like generic/224 and generic/561 that put processes in the background and occasionally trigger the OOM killer. Most of the time the OOM killer correctly shoots down fsstress or duperemove, but once in a while it's stupid enough to shoot down the test control process (i.e. tests/generic/224) instead. fsstress is still running in the background, and the one process that knew about that is dead.

When the control process dies, ./check moves on to the post-test fsck, which fails because fsstress is still running and we can't unmount. After fsck fails, ./check moves on to the next test, which fails because fsstress is /still/ writing to the filesystem and we can't unmount or format.

The end result is that that one OOM kill causes cascading test failures, and I have to re-start fstests to see if I get a clean(er) run.

So, the solution I present in this patch is to teach ./check to try to run the test script in a systemd scope. If that succeeds, ./check will tell systemd to kill the scope when the test script exits and returns control to ./check. Concretely, this means that systemd creates a new cgroup, stuffs the processes in that cgroup, and when we kill the scope, systemd kills all the processes in that cgroup and deletes the cgroup.

The end result is that fstests now has an easy way to ensure that /all/ child processes of a test are dead before we try to unmount the test and scratch devices. I've designed this to be optional, because not everyone does or wants or likes to run systemd, but it makes QA easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
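The scope mechanism described above can be sketched with `systemd-run --scope` and `systemctl stop`. Treat this as a rough illustration rather than the patch itself: the function name `_run_seq`, the `HAVE_SYSTEMD_SCOPES` switch and the unit naming are assumptions.

```bash
# Hypothetical sketch: run test script ./$1 in its own transient systemd
# scope when HAVE_SYSTEMD_SCOPES is set, otherwise run it directly.
_run_seq()
{
	local seq="$1" unit res

	if [ -n "$HAVE_SYSTEMD_SCOPES" ]; then
		# Build a valid unit name from the test name.
		unit="$(systemd-escape "fs$seq").scope"
		systemd-run --quiet --unit "$unit" --scope "./$seq"
		res=$?
		# Stopping the scope kills every process still in its cgroup,
		# including background children the test forgot about.
		systemctl stop "$unit" &> /dev/null
		return $res
	fi

	"./$seq"
}
```

Keeping the direct-run fallback mirrors the message's point that the feature is optional for systems without systemd.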
2020-10-21 | check: fix misspelled variable name for sections | Filipe Manana
We have some places that refer to the variable OPTIONS_HAVE_SECTIONS as OPTIONS_HAVE_SECIONS, obviously a typo. So fix them. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-10-18 | fstests: drop check.log and check.time into section specific results dir | Josef Bacik
Right now we only track check.log and check.time globally, it would be nice to do it per-section as well. This makes it easier to parse results from systems that run a bunch of different configurations at once. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
2020-09-21 | check: try reloading modules | Darrick J. Wong
Optionally reload the module between each test to try to pinpoint slab cache errors and whatnot. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com>