summaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)Author
2018-02-05Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "As promised, here's a (much smaller) second pull request for the second week of the merge cycle. This time around we have a couple patches shutting off unsupported fs configurations, and a couple of cleanups. Last, we turn off EXPERIMENTAL for the reverse mapping btree, since the primary downstream user of that information (online fsck) is now upstream and I haven't seen any major failures in a few kernel releases. Summary: - Print scrub build status in the xfs build info. - Explicitly call out the remaining two scenarios where we don't support reflink and never have. - Remove EXPERIMENTAL tag from reverse mapping btree!" * tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove experimental tag for reverse mapping xfs: don't allow reflink + realtime filesystems xfs: don't allow DAX on reflink filesystems xfs: add scrub to XFS_BUILD_OPTIONS xfs: fix u32 type usage in sb validation function
2018-01-31xfs: fix u32 type usage in sb validation functionDarrick J. Wong
Don't use u32, use uint32_t, because this won't work in xfsprogs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-01-31Merge tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "This merge cycle, we're again some substantive changes to XFS. Metadata verifiers have been restructured to provide more detail about which part of a metadata structure failed checks, and we've enhanced the new online fsck feature to cross-reference extent allocation information with the other metadata structures. With this pull, the metadata verification part of online fsck is more or less finished, though the feature is still experimental and still disabled by default. We're also preparing to remove the EXPERIMENTAL tag from a couple of features this cycle. This week we're committing a bunch of space accounting fixes for reflink and removing the EXPERIMENTAL tag from reflink; I anticipate that we'll be ready to do the same for the reverse mapping feature next week. (I don't have any pending fixes for rmap; however I wish to remove the tags one at a time.) This giant pile of patches has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Let me know if there's any merge problems -- git merge reported that one of our patches touched the same function as the i_version series, but it resolved things cleanly. Summary: - Log faulting code locations when verifiers fail, for improved diagnosis of corrupt filesystems. - Implement metadata verifiers for local format inode fork data. - Online scrub now cross-references metadata records with other metadata. - Refactor the fs geometry ioctl generation functions. - Harden various metadata verifiers. - Fix various accounting problems. - Fix uncancelled transactions leaking when xattr functions fail. - Prevent the copy-on-write speculative preallocation garbage collector from racing with writeback. - Emit log reservation type information as trace data so that we can compare against xfsprogs. - Fix some erroneous asserts in the online scrub code. - Clean up the transaction reservation calculations. - Fix various minor bugs in online scrub. - Log complaints about mixed dio/buffered writes once per day and less noisily than before. - Refactor buffer log item lists to use list_head. - Break PNFS leases before reflinking blocks. - Reduce lock contention on reflink source files. - Fix some quota accounting problems with reflink. - Fix a serious corruption problem in the direct cow write code where we fed bad iomaps to the vfs iomap consumers. - Various other refactorings. - Remove EXPERIMENTAL tag from reflink!" * tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (94 commits) xfs: remove experimental tag for reflinks xfs: don't screw up direct writes when freesp is fragmented xfs: check reflink allocation mappings iomap: warn on zero-length mappings xfs: treat CoW fork operations as delalloc for quota accounting xfs: only grab shared inode locks for source file during reflink xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes xfs: reflink should break pnfs leases before sharing blocks xfs: don't clobber inobt/finobt cursors when xref with rmap xfs: skip CoW writes past EOF when writeback races with truncate xfs: preserve i_rdev when recycling a reclaimable inode xfs: refactor accounting updates out of xfs_bmap_btalloc xfs: refactor inode verifier corruption error printing xfs: make tracepoint inode number format consistent xfs: always zero di_flags2 when we free the inode xfs: call xfs_qm_dqattach before performing reflink operations xfs: bmap code cleanup Use list_head infra-structure for buffer's log items list Split buffer's b_fspriv field Get rid of xfs_buf_log_item_t typedef ...
2018-01-29Merge tag 'iversion-v4.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull inode->i_version rework from Jeff Layton: "This pile of patches is a rework of the inode->i_version field. We have traditionally incremented that field on every inode data or metadata change. Typically this increment needs to be logged on disk even when nothing else has changed, which is rather expensive. It turns out though that none of the consumers of that field actually require this behavior. The only real requirement for all of them is that it be different iff the inode has changed since the last time the field was checked. Given that, we can optimize away most of the i_version increments and avoid dirtying inode metadata when the only change is to the i_version and no one is querying it. Queries of the i_version field are rather rare, so we can help write performance under many common workloads. This patch series converts existing accesses of the i_version field to a new API, and then converts all of the in-kernel filesystems to use it. The last patch in the series then converts the backend implementation to a scheme that optimizes away a large portion of the metadata updates when no one is looking at it. In my own testing this series significantly helps performance with small I/O sizes. I also got this email for Christmas this year from the kernel test robot (a 244% r/w bandwidth improvement with XFS over DAX, with 4k writes): https://lkml.org/lkml/2017/12/25/8 A few of the earlier patches in this pile are also flowing to you via other trees (mm, integrity, and nfsd trees in particular)". * tag 'iversion-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: (22 commits) fs: handle inode->i_version more efficiently btrfs: only dirty the inode in btrfs_update_time if something was changed xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing fs: only set S_VERSION when updating times if necessary IMA: switch IMA over to new i_version API xfs: convert to new i_version API ufs: use new i_version API ocfs2: convert to new i_version API nfsd: convert to new i_version API nfs: convert to new i_version API ext4: convert to new i_version API ext2: convert to new i_version API exofs: switch to new i_version API btrfs: convert to new i_version API afs: convert to new i_version API affs: convert to new i_version API fat: convert to new i_version API fs: don't take the i_lock in inode_inc_iversion fs: new API for handling inode->i_version ntfs: remove i_version handling ...
2018-01-29xfs: don't screw up direct writes when freesp is fragmentedDarrick J. Wong
xfs_bmap_btalloc is given a range of file offset blocks that must be allocated to some data/attr/cow fork. If the fork has an extent size hint associated with it, the request will be enlarged on both ends to try to satisfy the alignment hint. If free space is fragmentated, sometimes we can allocate some blocks but not enough to fulfill any of the requested range. Since bmapi_allocate always trims the new extent mapping to match the originally requested range, this results in bmapi_write returning zero and no mapping. The consequences of this vary -- buffered writes will simply re-call bmapi_write until it can satisfy at least one block from the original request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC through the dio mechanism out to userspace with the weird result that writes fail even when we have enough space because the ENOSPC return overrides any partial write status. For direct CoW writes the situation was disastrous because nobody notices us returning an invalid zero-length wrong-offset mapping to iomap and the write goes off into space. Therefore, if free space is so fragmented that we managed to allocate some space but not enough to map into even a single block of the original allocation request range, we should break the alignment hint in order to guarantee at least some forward progress for the direct write. If we return a short allocation to iomap_apply it'll call back about the remaining blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: treat CoW fork operations as delalloc for quota accountingDarrick J. Wong
Since the CoW fork only exists in memory, it is incorrect to update the on-disk quota block counts when we modify the CoW fork. Unlike the data fork, even real extents in the CoW fork are only delalloc-style reservations (on-disk they're owned by the refcountbt) so they must not be tracked in the on disk quota info. Ensure the i_delayed_blks accounting reflects this too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: refactor accounting updates out of xfs_bmap_btallocDarrick J. Wong
Move all the inode and quota accounting updates out of xfs_bmap_btalloc in preparation for fixing some quota accounting problems with copy on write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-29xfs: refactor inode verifier corruption error printingDarrick J. Wong
Refactor inode verifier error reporting into a non-libxfs function so that we aren't encoding the message format in libxfs. This also changes the kernel dmesg output to resemble buffer verifier errors more closely. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: bmap code cleanupShan Hai
Remove the extent size hint and realtime inode relevant code from the xfs_bmapi_reserve_delalloc since it is not called on the inode with extent size hint set or on a realtime inode. Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29Split buffer's b_fspriv fieldCarlos Maiolino
By splitting the b_fspriv field into two different fields (b_log_item and b_li_list). It's possible to get rid of an old ABI workaround, by using the new b_log_item field to store xfs_buf_log_item separated from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, simplifying a bit the logic to handle buffer IO. This also opens the possibility to change buffer's log items list into a proper list_head. b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style changes] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29xfs: convert to new i_version APIJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: check sb_agblocks and sb_agblklog when validating superblockDarrick J. Wong
Currently, we don't check sb_agblocks or sb_agblklog when we validate the superblock, which means that we can fuzz garbage values into those values and the mount succeeds. This leads to all sorts of UBSAN warnings in xfs/350 since we can then coerce other parts of xfs into shifting by ridiculously large values. Once we've validated agblocks, make sure the agcount makes sense. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: recheck reflink / dirty page status before freeing CoW reservationsDarrick J. Wong
Eryu Guan reported seeing occasional hangs when running generic/269 with a new fsstress that supports clonerange/deduperange. The cause of this hang is an infinite loop when we convert the CoW fork extents from unwritten to real just prior to writing the pages out; the infinite loop happens because there's nothing in the CoW fork to convert, and so it spins forever. The fundamental issue here is that when we go to perform these CoW fork conversions, we're supposed to have an extent waiting for us, but the low space CoW reaper has snuck in and blown them away! There are four conditions that can dissuade the reaper from touching our file -- no reflink iflag; dirty page cache; writeback in progress; or directio in progress. We check the four conditions prior to taking the locks, but we neglect to recheck them once we have the locks, which is how we end up whacking the writeback that's in progress. Therefore, refactor the four checks into a helper function and call it once again once we have the locks to make sure we really want to reap the inode. While we're at it, add an ASSERT for this weird condition so that we'll fail noisily if we ever screw this up again. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: btree format ifork loader should check for zero numrecsDarrick J. Wong
A btree format inode fork with zero records makes no sense, so reject it if we see it, or else we can miscalculate memory allocations. Found by zeroes fuzzing {a,u3}.bmbt.numrecs in xfs/{374,378,412} with KASAN. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: attr leaf verifier needs to check for obviously bad countDarrick J. Wong
In the attribute leaf verifier, we can check for obviously bad values of firstused and count so that later attempts at lasthash don't run off the end of the memory buffer. Found by ones fuzzing hdr.count in xfs/400 with KASAN. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: directory scrubber must walk through data block to offsetDarrick J. Wong
In xfs_scrub_dir_rec, we must walk through the directory block entries to arrive at the offset given by the hash structure. If we blindly trust the hash address, we can end up midway into a directory entry and stray outside the block. Found by lastbit fuzzing lents[3].address in xfs/390 with KASAN enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference the realtime bitmapDarrick J. Wong
While we're scrubbing various btrees, cross-reference the records with the other metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: add scrub cross-referencing helpers for the refcount btreesDarrick J. Wong
Add a couple of functions to the refcount btrees that will be used to cross-reference metadata against the refcountbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: add scrub cross-referencing helpers for the rmap btreesDarrick J. Wong
Add a couple of functions to the rmap btrees that will be used to cross-reference metadata against the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: add scrub cross-referencing helpers for the inode btreesDarrick J. Wong
Add a couple of functions to the inode btrees that will be used to cross-reference metadata against the inobt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: add scrub cross-referencing helpers for the free space btreesDarrick J. Wong
Add a couple of functions to the free space btrees that will be used to cross-reference metadata against the bnobt/cntbt, and a generic btree function that provides the real implementation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-16xfs: cancel tx on xfs_defer_finish() error during xattr set/removeBrian Foster
Chris Dunlop reports a problem where an xattr operation fails, reports the following error to syslog and hangs during unmount: ================================================ [ BUG: lock held when returning to user space! ] ... ------------------------------------------------ <PID> is leaving the kernel with locks still held! 1 lock held by <PID>: #0: (sb_internal){......}, at: [<ffffffffa07692a3>] xfs_trans_alloc+0xe3/0x130 [xfs] The failure/shutdown occurs during deferred ops processing which leads to an error return from xfs_defer_finish() via xfs_attr_leaf_addname(). While the root cause of the failure is unknown corruption, the cause of the subsequent BUG above and unmount hang is failure to cancel the transaction before returning to userspace. The transaction is not cancelled because the out_defer_cancel error handling paths in the xfs_attr_[leaf|node]_[add|remove]name() functions clear args.trans without releasing the transaction. The callers therefore lose the reference to the transaction and fail to cancel it. Since xfs_attr_[set|remove]() always cancel args.trans when != NULL and xfs_defer_finish()->...->xfs_trans_roll() should always return with a valid transaction, update the leaf/node xattr functions to not reset args.trans in the error path responsible for cancelling deferred ops. Reported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-12xfs: account finobt blocks properly in perag reservationBrian Foster
XFS started using the perag metadata reservation pool for free inode btree blocks in commit 76d771b4cbe33 ("xfs: use per-AG reservations for the finobt"). To handle backwards compatibility, finobt blocks are accounted against the pool so long as the full reservation is available at mount time. Otherwise the ->m_inotbt_nores flag is set and the filesystem falls back to the traditional per-transaction finobt reservation. This commit has two problems: - finobt blocks are always accounted against the metadata reservation on allocation, regardless of ->m_inotbt_nores state - finobt blocks are never returned to the reservation pool on free The first problem affects reflink+finobt filesystems where the full finobt reservation is not available at mount time. finobt blocks are essentially stolen from the reflink reservation, putting refcountbt management at risk of allocation failure. The second problem is an unconditional leak of metadata reservation whenever finobt is enabled. Update the finobt block allocation callouts to consider ->m_inotbt_nores and account blocks appropriately. Blocks should be consistently accounted against the metadata pool when ->m_inotbt_nores is false and otherwise tagged as RESV_NONE. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-12xfs: fix check on struct_version for versions 4 or greaterColin Ian King
It appears that the check for versions 4 or more is incorrect and is off-by-one. Fix this. Detected by CoverityScan, CID#1463775 ("Logically dead code") Fixes: ac503a4cc9e8 ("xfs: refactor the geometry structure filling function") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-12xfs: use %px for data pointers when debuggingDarrick J. Wong
Starting with commit 57e734423ad ("vsprintf: refactor %pK code out of pointer"), the behavior of the raw '%p' printk format specifier was changed to print a 32-bit hash of the pointer value to avoid leaking kernel pointers into dmesg. For most situations that's good. This is /undesirable/ behavior when we're trying to debug XFS, however, so define a PTR_FMT that prints the actual pointer when we're in debug mode. Note that %p for tracepoints still prints the raw pointer, so in the long run we could consider rewriting some of these messages as tracepoints. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-12xfs: change 0x%p -> %p in print messagesDarrick J. Wong
Since %p prepends "0x" to the outputted string, we can drop the prefix. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-09xfs: harden directory integrity checks some moreDarrick J. Wong
If a malicious filesystem image contains a block+ format directory wherein the directory inode's core.mode is set such that S_ISDIR(core.mode) == 0, and if there are subdirectories of the corrupted directory, an attempt to traverse up the directory tree will crash the kernel in __xfs_dir3_data_check. Running the online scrub's parent checks will tend to do this. The crash occurs because the directory inode's d_ops get set to xfs_dir[23]_nondir_ops (it's not a directory) but the parent pointer scrubber's indiscriminate call to xfs_readdir proceeds past the ASSERT if we have non fatal asserts configured. Fix the null pointer dereference crash in __xfs_dir3_data_check by looking for S_ISDIR or wrong d_ops; and teach the parent scrubber to bail out if it is fed a non-directory "parent". Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-08xfs: refactor the geometry structure filling functionDarrick J. Wong
Refactor the geometry structure filling function to use the superblock to fill the fields. While we're at it, make the function less indenty and use some whitespace to make the function easier to read. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-08xfs: hoist xfs_fs_geometry to libxfsDarrick J. Wong
Move xfs_fs_geometry to libxfs so that we can clean up the fs geometry reporting in xfsprogs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-08xfs: trace log reservations at mount timeDarrick J. Wong
At each mount, emit the transaction reservation type information via tracepoints. This makes it easier to compare the log reservation info calculated by the kernel and xfsprogs so that we can more easily diagnose minimum log size failures on freshly formatted filesystems. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-08xfs: standardize quota verification function outputsDarrick J. Wong
Rename xfs_dqcheck to xfs_dquot_verify and make it return an xfs_failaddr_t like every other structure verifier function. This enables us to check on-disk quotas in the same way that we check everything else. Callers are now responsible for logging errors, as XFS_QMOPT_DOWARN goes away. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: separate dquot repair into a separate functionDarrick J. Wong
Move the dquot repair code into a separate function and remove XFS_QMOPT_DQREPAIR in favor of calling the helper directly. Remove other dead code because quotacheck is the only caller of DQREPAIR. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: create a new buf_ops pointer to verify structure metadataDarrick J. Wong
Expose all metadata structure buffer verifier functions via buf_ops. These will be used by the online scrub mechanism to look for problems with buffers that are already sitting around in memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: fail out of xfs_attr3_leaf_lookup_int if it looks corruptDarrick J. Wong
If the xattr leaf block looks corrupt, return -EFSCORRUPTED to userspace instead of ASSERTing on debug kernels or running off the end of the buffer on regular kernels. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: provide a centralized method for verifying inline fork dataDarrick J. Wong
Replace the current haphazard dir2 shortform verifier callsites with a centralized verifier function that can be called either with the default verifier functions or with a custom set. This helps us strengthen integrity checking while providing us with flexibility for repair tools. xfs_repair wants this to be able to supply its own verifier functions when trying to fix possibly corrupt metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: refactor short form directory structure verifier functionDarrick J. Wong
Change the short form directory structure verifier function to return the instruction pointer of a failing check or NULL if everything's ok. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: create structure verifier function for short form symlinksDarrick J. Wong
Create a function to check the structure of short form symlink targets. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: create structure verifier function for shortform xattrsDarrick J. Wong
Create a function to perform structure verification for short form extended attributes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: move inode fork verifiers to xfs_dinode_verifyDarrick J. Wong
Consolidate the fork size and format verifiers to xfs_dinode_verify so that we can reject bad inodes earlier and in a single place. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: verify dinode header firstDarrick J. Wong
Move the v3 inode integrity information (crc, owner, metauuid) before we look at anything else in the inode so that we don't waste time on a torn write or a totally garbled block. This makes xfs_dinode_verify more consistent with the other verifiers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: refactor verifier callers to print address of failing checkDarrick J. Wong
Refactor the callers of verifiers to print the instruction address of a failing check. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: have buffer verifier functions report failing addressDarrick J. Wong
Modify each function that checks the contents of a metadata buffer to return the instruction address of the failing test so that we can report more precise failure errors to the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: refactor xfs_verifier_error and xfs_buf_ioerrorDarrick J. Wong
Since all verification errors also mark the buffer as having an error, we can combine these two calls. Later we'll add a xfs_failaddr_t parameter to promote the idea of reporting corruption errors and the address of the failing check to enable better debugging reports. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: remove XFS_WANT_CORRUPTED_RETURN from dir3 data verifiersDarrick J. Wong
Since __xfs_dir3_data_check verifies on-disk metadata, we can't have it noisily blowing asserts and hanging the system on corrupt data coming in off the disk. Instead, have it return a boolean like all the other checker functions, and only have it noisily fail if we fail in debug mode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: refactor short form btree pointer verificationDarrick J. Wong
Now that we have xfs_verify_agbno, use it to verify short form btree pointers instead of open-coding them. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: refactor long-format btree header verification routinesDarrick J. Wong
Create two helper functions to verify the headers of a long format btree block. We'll use this later for the realtime rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: remove XFS_FSB_SANITY_CHECKDarrick J. Wong
We already have a function to verify fsb pointers, so get rid of the last users of the (less robust) macro. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08xfs: eliminate duplicate icreate tx reservation functionsBrian Foster
The create transaction reservation calculation has two different branches of code depending on whether the filesystem is a v5 format fs or older. Each branch considers the max reservation between the allocation case (new chunk allocation + record insert) and the modify case (chunk exists, record modification) of inode allocation. The modify case is the same for both superblock versions with the exception of the finobt. The finobt helper checks the feature bit, however, and so the modify case already shares the same code. Now that inode chunk allocation has been refactored into a helper that checks the superblock version to calculate the appropriate reservation for the create transaction, the only remaining difference between the create and icreate branches is the call to the finobt helper. As noted above, the finobt helper is a no-op when the feature is not enabled. Therefore, these branches are effectively duplicate and can be condensed. Remove the xfs_calc_create_*() branch of functions and update the various callers to use the xfs_calc_icreate_*() variant. The latter creates the same reservation size for v4 create transactions as the removed branch. As such, this patch does not result in transaction reservation changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08xfs: refactor inode chunk alloc/free tx reservationBrian Foster
The reservation for the various forms of inode allocation is scattered across several different functions. This includes two variants of chunk allocation (v5 icreate transactions vs. older create transactions) and the inode free transaction. To clean up some of this code and clarify the purpose of specific allocfree reservations, continue the pattern of defining helper functions for smaller operational units of broader transactions. Refactor the reservation into an inode chunk alloc/free helper that considers the various conditions based on filesystem format. An inode chunk free involves an extent free and buffer invalidations. The latter requires reservation for log headers only. An inode chunk allocation modifies the free space btrees and logs the chunk on v4 supers. v5 supers initialize the inode chunk using ordered buffers and so do not log the chunk. As a side effect of this refactoring, add one more allocfree res to the ifree transaction. Technically this does not serve a specific purpose because inode chunks are freed via deferred operations and thus occur after a transaction roll. tr_ifree has a bit of a history of tx overruns caused by too many agfl fixups during sustained file deletion workloads, so add this extra reservation as a form of padding nonetheless. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08xfs: include an allocfree res for inobt modificationsBrian Foster
Analysis of recent reports of log reservation overruns and code inspection has uncovered that the reservations associated with inode operations may not cover the worst case scenarios. In particular, many cases only include one allocfree res. for a particular operation even though said operations may also entail AGFL fixups and inode btree block allocations in addition to the actual inode chunk allocation. This can easily turn into two or three block allocations (or frees) per operation. In theory, the only way to define the worst case reservation is to include an allocfree res for each individual allocation in a transaction. Since that is impractical (we can perform multiple agfl fixups per tx and not every allocation results in a full tree operation), we need to find a reasonable compromise that addresses the deficiency in practice without blowing out the size of the transactions. Since the inode btrees are not filled by the AGFL, record insertion and removal can directly result in block allocations and frees depending on the shape of the tree. These allocations and frees occur in the same transaction context as the inobt update itself, but are separate from the allocation/free that might be required for an inode chunk. Therefore, it makes sense to assume that an [f]inobt insert/remove can directly result in one or more block allocations on behalf of the tree. Refactor the inode transaction reservations to include one allocfree res. per inode btree modification to cover allocations required by the tree itself. This separates the reservation required to allocate the inode chunk from the reservation required for inobt record insertion/removal. Apply the same logic to the finobt. This results in killing off the finobt modify condition because we no longer assume that the broader transaction reservation will cover finobt block allocations and finobt shape changes can occur in either of the inobt allocation or modify situations. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>