summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-05-07Merge tag 'driver-core-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core/kobject updates from Greg KH: "Here is the "big" set of driver core patches for 5.2-rc1 There are a number of ACPI patches in here as well, as Rafael said they should go through this tree due to the driver core changes they required. They have all been acked by the ACPI developers. There are also a number of small subsystem-specific changes in here, due to some changes to the kobject core code. Those too have all been acked by the various subsystem maintainers. As for content, it's pretty boring outside of the ACPI changes: - spdx cleanups - kobject documentation updates - default attribute groups for kobjects - other minor kobject/driver core fixes All have been in linux-next for a while with no reported issues" * tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits) kobject: clean up the kobject add documentation a bit more kobject: Fix kernel-doc comment first line kobject: Remove docstring reference to kset firmware_loader: Fix a typo ("syfs" -> "sysfs") kobject: fix dereference before null check on kobj Revert "driver core: platform: Fix the usage of platform device name(pdev->name)" init/config: Do not select BUILD_BIN2C for IKCONFIG Provide in-kernel headers to make extending kernel easier kobject: Improve doc clarity kobject_init_and_add() kobject: Improve docs for kobject_add/del driver core: platform: Fix the usage of platform device name(pdev->name) livepatch: Replace klp_ktype_patch's default_attrs with groups cpufreq: schedutil: Replace default_attrs field with groups padata: Replace padata_attr_type default_attrs field with groups irqdesc: Replace irq_kobj_type's default_attrs field with groups net-sysfs: Replace ktype default_attrs field with groups block: Replace all ktype default_attrs with groups samples/kobject: Replace foo_ktype's default_attrs field with groups kobject: Add support for default attribute groups to kobj_type driver core: Postpone DMA tear-down until after devres release for probe failure ...
2019-05-07ubifs: Drop unnecessary setting of zbr->znodeSascha Hauer
in dbg_walk_index ubifs_load_znode is used to load the znode behind a zbranch. ubifs_load_znode links the new child znode to the zbranch, so doing it again is unnecessary. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: Remove ifdefs around CONFIG_UBIFS_ATIME_SUPPORTSascha Hauer
ifdefs reduce readability and compile coverage. This removes the ifdefs around CONFIG_UBIFS_ATIME_SUPPORT by replacing them with IS_ENABLED() where applicable. The fs layer would fall back to generic_update_time() when .update_time doesn't exist. We do this fallback explicitly now. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: Remove #ifdef around CONFIG_FS_ENCRYPTIONSascha Hauer
ifdefs reduce readablity and compile coverage. This removes the ifdefs around CONFIG_FS_ENCRYPTION by using IS_ENABLED and relying on static inline wrappers. A new static inline wrapper for setting sb->s_cop is introduced to allow filesystems to unconditionally compile in their s_cop operations. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: Limit number of xattrs per inodeRichard Weinberger
Since we have to write one deletion inode per xattr into the journal, limit the max number of xattrs. In theory UBIFS supported up to 65535 xattrs per inode. But this never worked correctly, expect no powercuts happened. Now we support only as many xattrs as we can store in 50% of a LEB. Even for tiny flashes this allows dozens of xattrs per inode, which is for an embedded filesystem still fine. In case someone has existing inodes with much more xattrs, it is still possible to delete them. UBIFS will fall back to an non-atomic deletion mode. Reported-by: Stefan Agner <stefan@agner.ch> Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: orphan: Handle xattrs like filesRichard Weinberger
Like for the journal case, make sure that we track all xattr inodes. Otherwise UBIFS might not be able to locate stale xattr inodes upon recovery. Reported-by: Stefan Agner <stefan@agner.ch> Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: journal: Handle xattrs like filesRichard Weinberger
If an inode hosts xattrs, create deletion entries for each inode. That way we can make sure that upon journal replay UBIFS can find find all xattr inodes. Otherwise it can happen that GC consumed already a LEB which contained parts of the TNC that pointed to the xattrs and we no longer find all xattr inodes, which will confuse the LPT and cause space allocation issues. Reported-by: Stefan Agner <stefan@agner.ch> Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: find.c: replace swap function with built-in oneAndrey Abramov
Replace swap_dirty_idx function with built-in one, because swap_dirty_idx does only a simple byte to byte swap. Since Spectre mitigations have made indirect function calls more expensive, and the default simple byte copies swap is implemented without them, an "optimized" custom swap function is now a waste of time as well as code. Signed-off-by: Andrey Abramov <st5pub@yandex.ru> Reviewed by: George Spelvin <lkml@sdf.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: Do not skip hash checking in data nodesSascha Hauer
UBIFS bails out early from try_read_node() when it doesn't have to check the CRC. Still the node hash has to be checked, otherwise wrong data could be sneaked into the FS. Fix this by not bailing out early and always checking the node hash. Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07Merge tag 'Wimplicit-fallthrough-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva: "Mark switch cases where we are expecting to fall through. This is part of the ongoing efforts to enable -Wimplicit-fallthrough. Most of them have been baking in linux-next for a whole development cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails going out for newly introduced code that triggers -Wimplicit-fallthrough to avoid gaining more of these cases while we work to remove the ones that are already present. We are getting close to completing this work. Currently, there are only 32 of 2311 of these cases left to be addressed in linux-next. I'm auditing every case; I take a look into the code and analyze it in order to determine if I'm dealing with an actual bug or a false positive, as explained here: https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/ While working on this, I've found and fixed the several missing break/return bugs, some of them introduced more than 5 years ago. Once this work is finished, we'll be able to universally enable "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from entering the kernel again" * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) memstick: mark expected switch fall-throughs drm/nouveau/nvkm: mark expected switch fall-throughs NFC: st21nfca: Fix fall-through warnings NFC: pn533: mark expected switch fall-throughs block: Mark expected switch fall-throughs ASN.1: mark expected switch fall-through lib/cmdline.c: mark expected switch fall-throughs lib: zstd: Mark expected switch fall-throughs scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs scsi: ppa: mark expected switch fall-through scsi: osst: mark expected switch fall-throughs scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs scsi: lpfc: lpfc_nvme: Mark expected switch fall-through scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs scsi: lpfc: lpfc_els: Mark expected switch fall-throughs scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs scsi: imm: mark expected switch fall-throughs scsi: csiostor: csio_wr: mark expected switch fall-through ...
2019-05-07ubifs: work around high stack usage with clangArnd Bergmann
Building this file with clang can result in large stack usage as seen from this warning: fs/ubifs/auth.c:78:5: error: stack frame size of 1152 bytes in function 'ubifs_prepare_auth_node' The problem is that inlining ubifs_hash_calc_hmac() leads to two SHASH_DESC_ON_STACK() blocks in the same function, and clang for some reason does not reuse the stack space as it should. Putting the first declaration into a separate basic block avoids this problem and reduces the stack allocation to 640 bytes. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: remove unused function __ubifs_shash_finalYueHaibing
There is no callers in tree, and can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: remove unnecessary #ifdef around fscrypt_ioctl_get_policy()Eric Biggers
When !CONFIG_FS_ENCRYPTION, fscrypt_ioctl_get_policy() is already stubbed out to return -EOPNOTSUPP, so the extra #ifdef is not needed. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07ubifs: remove unnecessary calls to set up directory keyEric Biggers
In ubifs_unlink() and ubifs_rmdir(), remove the call to fscrypt_get_encryption_info() that precedes fscrypt_setup_filename(). This call was unnecessary, because fscrypt_setup_filename() already tries to set up the directory's encryption key. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-07Merge tag 'pidfd-v5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "This patchset makes it possible to retrieve pidfds at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. After a thorough review from Oleg CLONE_PIDFD returns pidfds in the parent_tidptr argument. This means we can give back the associated pid and the pidfd at the same time. Access to process metadata information thus becomes rather trivial. As has been agreed, CLONE_PIDFD creates file descriptors based on anonymous inodes similar to the new mount api. They are made unconditional by this patchset as they are now needed by core kernel code (vfs, pidfd) even more than they already were before (timerfd, signalfd, io_uring, epoll etc.). The core patchset is rather small. The bulky looking changelist is caused by David's very simple changes to Kconfig to make anon inodes unconditional. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". To remove worries about missing metadata access this patchset comes with a sample/test program that illustrates how a combination of CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. Further work based on this patchset has been done by Joel. His work makes pidfds pollable. It finished too late for this merge window. I would prefer to have it sitting in linux-next for a while and send it for inclusion during the 5.3 merge window" * tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: samples: show race-free pidfd metadata access signal: support CLONE_PIDFD with pidfd_send_signal clone: add CLONE_PIDFD Make anon_inodes unconditional
2019-05-07Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linuxLinus Torvalds
Pull stream_open conversion from Kirill Smelkov: - remove unnecessary double nonseekable_open from drivers/char/dtlk.c as noticed by Pavel Machek while reviewing nonseekable_open -> stream_open mass conversion. - the mass conversion patch promised in commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") and is automatically generated by running $ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. More details on this in the patch. - finally, change VFS to pass ppos=NULL into .read/.write for files that declare themselves streams. It was suggested by Rasmus Villemoes and makes sure that if ppos starts to be erroneously used in a stream file, such bug won't go unnoticed and will produce an oops instead of creating illusion of position change being taken into account. Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to use stream_open()" that will be hopefully coming via FUSE tree, because fs/fuse/ uses new-style .read_iter/.write_iter, and for these accessors position is still passed as non-pointer kiocb.ki_pos . * tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux: vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files *: convert stream-like files from nonseekable_open -> stream_open dtlk: remove double call to nonseekable_open
2019-05-07Merge tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "Here's a big pile of new stuff for XFS for 5.2. XFS has grown the ability to report metadata health status to userspace after online fsck checks the filesystem. The online metadata checking code is (I really hope) feature complete with the addition of checks for the global fs counters, though it'll remain EXPERIMENTAL for now. There are also fixes for thundering herds of writeback completions and some other deadlocks, fixes for theoretical integer overflow attacks on space accounting, and removal of the long-defunct 'mntpt' option which was deprecated in the mid-2000s and (it turns out) totally broken since 2011 (and nobody complained...). Summary: - Fix some more buffer deadlocks when performing an unmount after a hard shutdown. - Fix some minor space accounting issues. - Fix some use after free problems. - Make the (undocumented) FITRIM behavior consistent with other filesystems. - Embiggen the xfs geometry ioctl's data structure. - Introduce a new AG geometry ioctl. - Introduce a new online health reporting infrastructure and ioctl for userspace to query a filesystem's health status. - Enhance online scrub and repair to update the health reports. - Reduce thundering herd problems when writeback io completes. - Fix some transaction reservation type errors. - Fix integer overflow problems with delayed alloc reservation counters. - Fix some problems where we would exit to userspace without unlocking. - Fix inconsistent behavior when finishing deferred ops fails. - Strengthen scrub to check incore data against ondisk metadata. - Remove long-broken mntpt mount option. - Add an online scrub function for the filesystem summary counters, which should make online metadata scrub more or less feature complete for now. - Various cleanups" * tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (38 commits) xfs: change some error-less functions to void types xfs: add online scrub for superblock counters xfs: don't parse the mtpt mount option xfs: always rejoin held resources during defer roll xfs: add missing error check in xfs_prepare_shift() xfs: scrub should check incore counters against ondisk headers xfs: allow scrubbers to pause background reclaim xfs: rename the speculative block allocation reclaim toggle functions xfs: track delayed allocation reservations across the filesystem xfs: fix broken bhold behavior in xrep_roll_ag_trans xfs: unlock inode when xfs_ioctl_setattr_get_trans can't get transaction xfs: kill the xfs_dqtrx_t typedef xfs: widen inode delalloc block counter to 64-bits xfs: widen quota block counters to 64-bit integers xfs: abort unaligned nowait directio early xfs: assert that we don't enter agfl freeing with a non-permanent transaction xfs: make tr_growdata a permanent transaction xfs: merge adjacent io completions of the same type xfs: remove unused m_data_workqueue xfs: implement per-inode writeback completion queues ...
2019-05-07Merge tag 'iomap-5.2-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap updates from Darrick Wong: "Nothing particularly exciting here, just adding some callouts for gfs2 and cleaning a few things. Summary: - Add some extra hooks to the iomap buffered write path to enable gfs2 journalled writes - SPDX conversion - Various refactoring" * tag 'iomap-5.2-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: move iomap_read_inline_data around iomap: Add a page_prepare callback iomap: Fix use-after-free error in page_done callback fs: Turn __generic_write_end into a void function iomap: Clean up __generic_write_end calling iomap: convert to SPDX identifier
2019-05-07Merge tag 'jfs-5.2' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs updates from Dave Kleikamp: "Several minor jfs fixes" * tag 'jfs-5.2' of git://github.com/kleikamp/linux-shaggy: jfs: fix bogus variable self-initialization fs/jfs: Switch to use new generic UUID API jfs: compare old and new mode before setting update_mode flag jfs: remove incorrect comment in jfs_superblock jfs: fix spelling mistake, EACCESS -> EACCES
2019-05-07Merge tag 'for-5.2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This time the majority of changes are cleanups, though there's still a number of changes of user interest. User visible changes: - better read time and write checks to catch errors early and before writing data to disk (to catch potential memory corruption on data that get checksummed) - qgroups + metadata relocation: last speed up patch int the series to address the slowness, there should be no overhead comparing balance with and without qgroups - FIEMAP ioctl does not start a transaction unnecessarily, this can result in a speed up and less blocking due to IO - LOGICAL_INO (v1, v2) does not start transaction unnecessarily, this can speed up the mentioned ioctl and scrub as well - fsync on files with many (but not too many) hardlinks is faster, finer decision if the links should be fsynced individually or completely - send tries harder to find ranges to clone - trim/discard will skip unallocated chunks that haven't been touched since the last mount Fixes: - send flushes delayed allocation before start, otherwise it could miss some changes in case of a very recent rw->ro switch of a subvolume - fix fallocate with qgroups that could lead to space accounting underflow, reported as a warning - trim/discard ioctl honours the requested range - starting send and dedupe on a subvolume at the same time will let only one of them succeed, this is to prevent changes that send could miss due to dedupe; both operations are restartable Core changes: - more tree-checker validations, errors reported by fuzzing tools: - device item - inode item - block group profiles - tracepoints for extent buffer locking - async cow preallocates memory to avoid errors happening too deep in the call chain - metadata reservations for delalloc reworked to better adapt in many-writers/low-space scenarios - improved space flushing logic for intense DIO vs buffered workloads - lots of cleanups - removed unused struct members - redundant argument removal - properties and xattrs - extent buffer locking - selftests - use common file type conversions - many-argument functions reduction" * tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (227 commits) btrfs: Use kvmalloc for allocating compressed path context btrfs: Factor out common extent locking code in submit_compressed_extents btrfs: Set io_tree only once in submit_compressed_extents btrfs: Replace clear_extent_bit with unlock_extent btrfs: Make compress_file_range take only struct async_chunk btrfs: Remove fs_info from struct async_chunk btrfs: Rename async_cow to async_chunk btrfs: Preallocate chunks in cow_file_range_async btrfs: reserve delalloc metadata differently btrfs: track DIO bytes in flight btrfs: merge calls of btrfs_setxattr and btrfs_setxattr_trans in btrfs_set_prop btrfs: delete unused function btrfs_set_prop_trans btrfs: start transaction in xattr_handler_set_prop btrfs: drop local copy of inode i_mode btrfs: drop old_fsflags in btrfs_ioctl_setflags btrfs: modify local copy of btrfs_inode flags btrfs: drop useless inode i_flags copy and restore btrfs: start transaction in btrfs_ioctl_setflags() btrfs: export btrfs_set_prop btrfs: refactor btrfs_set_props to validate externally ...
2019-05-07Merge branch 'stable-fodder' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs stable fodder fixes from Al Viro: - acct_on() fix for deadlock caught by overlayfs folks - autofs RCU use-after-free SNAFU (->d_manage() can be called locklessly, so we need to RCU-delay freeing the objects it looks at) - (hopefully) the end of "do we need freeing this dentry RCU-delayed" whack-a-mole. * 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: autofs: fix use-after-free in lockless ->d_manage() dcache: sort the freeing-without-RCU-delay mess for good. acct_on(): don't mess with freeze protection
2019-05-07Merge branch 'work.icache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs inode freeing updates from Al Viro: "Introduction of separate method for RCU-delayed part of ->destroy_inode() (if any). Pretty much as posted, except that destroy_inode() stashes ->free_inode into the victim (anon-unioned with ->i_fops) before scheduling i_callback() and the last two patches (sockfs conversion and folding struct socket_wq into struct socket) are excluded - that pair should go through netdev once davem reopens his tree" * 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits) orangefs: make use of ->free_inode() shmem: make use of ->free_inode() hugetlb: make use of ->free_inode() overlayfs: make use of ->free_inode() jfs: switch to ->free_inode() fuse: switch to ->free_inode() ext4: make use of ->free_inode() ecryptfs: make use of ->free_inode() ceph: use ->free_inode() btrfs: use ->free_inode() afs: switch to use of ->free_inode() dax: make use of ->free_inode() ntfs: switch to ->free_inode() securityfs: switch to ->free_inode() apparmor: switch to ->free_inode() rpcpipe: switch to ->free_inode() bpf: switch to ->free_inode() mqueue: switch to ->free_inode() ufs: switch to ->free_inode() coda: switch to ->free_inode() ...
2019-05-07ceph: flush dirty inodes before proceeding with remountJeff Layton
xfstest generic/452 was triggering a "Busy inodes after umount" warning. ceph was allowing the mount to go read-only without first flushing out dirty inodes in the cache. Ensure we sync out the filesystem before allowing a remount to proceed. Cc: stable@vger.kernel.org Link: http://tracker.ceph.com/issues/39571 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: fix unaligned access in ceph_send_cap_releasesJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07libceph: make ceph_pr_addr take an struct ceph_entity_addr pointerJeff Layton
GCC9 is throwing a lot of warnings about unaligned accesses by callers of ceph_pr_addr. All of the current callers are passing a pointer to the sockaddr inside struct ceph_entity_addr. Fix it to take a pointer to a struct ceph_entity_addr instead, and then have the function make a copy of the sockaddr before printing it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: print inode number in __caps_issued_mask debugging messagesJeff Layton
To make it easier to correlate with MDS logs. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: just call get_session in __ceph_lookup_mds_sessionJeff Layton
I originally thought there was a potential race here, but the fact that this is called with the mdsc->mutex held, ensures that the last reference to the session can't be put here. Still, it's clearer to just return the value from get_session here, and may prevent a bug later if we ever rework this code to be less reliant on mutexes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: simplify arguments and return semantics of try_get_cap_refsJeff Layton
The return of this function is rather complex. It can return 0 or 1, and in the case of a 1 return, the "err" pointer will be filled out. This necessitates a lot of copying of values. We can achieve the same effect by just returning 0, 1 or a negative error code, and drop the "err" argument from this function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: fix comment over ceph_drop_caps_for_unlinkJeff Layton
It's not clear what AUTH_RDCACHE means in this context, and we're clearly just dropping LINK caps here. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: move wait for mds request into helper functionJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_requestJeff Layton
Nothing calls ceph_mdsc_submit_request today, but in later patches we'll need to be able to call this separately. Have the helper return an int so we can check the r_err under the mutex, and have the caller just check the error code from the submit. Also move the acquisition of CEPH_CAP_PIN references into the same function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: after an MDS request, do callback and completionsJeff Layton
No MDS requests use r_callback today, but that will change in the future. The OSD client always does r_callback and then completes r_completion. Let's have the MDS client do the same. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: use pathlen values returned by set_request_path_attrJeff Layton
We make copies of the dentry name in set_request_path_attr, but then create_request_message re-fetches the lengths out of the dentry. While we don't currently set the *_drop fields unless the parents are locked, it's still better not to rely on that sort of implicit assumption. Use the pathlen values that set_request_path_attr returned instead, as they will always be correct for the returned paths themselves. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: use __getname/__putname in ceph_mdsc_build_pathJeff Layton
Al suggested we get rid of the kmalloc here and just use __getname and __putname to get a full PATH_MAX pathname buffer. Since we build the path in reverse, we continue to return a pointer to the beginning of the string and the length, and add a new helper to free the thing at the end. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: use ceph_mdsc_build_path instead of clone_dentry_nameJeff Layton
While it may be slightly more efficient, it's probably not worthwhile to optimize for the case that clone_dentry_name handles. We can get the same result by just calling ceph_mdsc_build_path when the parent isn't locked, with less code duplication. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: fix potential use-after-free in ceph_mdsc_build_pathJeff Layton
temp is not defined outside of the RCU critical section here. Ensure we grab that value before we drop the rcu_read_lock. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: dump granular cap info in "caps" debugfs fileJeff Layton
We have a "caps" file already that gives statistics on the caps cache as a whole. Add another section to that output and dump a line for each individual cap record. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: make iterate_session_caps a public symbolJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: fix NULL pointer deref when debugging is enabledJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: properly handle granular statx requestsJeff Layton
cephfs can benefit from statx. We can have the client just request caps sufficient for the needed attributes and leave off the rest. Also, recognize when AT_STATX_DONT_SYNC is set, and just scrape the inode without doing any call in that case. Force a call to the MDS in the event that AT_STATX_FORCE_SYNC is set. Link: http://tracker.ceph.com/issues/39258 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: remove superfluous inode_lock in ceph_fsyncJeff Layton
Originally, filemap_write_and_wait took the i_mutex internally, but commit 02c24a82187d pushed the mutex acquisition into the individual fsync routines, leaving it up to the subsystem maintainers to remove it if it wasn't needed. For ceph, I see no reason to take the inode_lock here. All of the operations inside that lock are protected by their own locking. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: snapshot nfs re-exportYan, Zheng
To support snapshot nfs re-export, we need a way to lookup snapped inode by file handle. For directory inode, snapped metadata are always stored together with head inode. Client just need to pass vinodeno_t to MDS. For non-directory inode, there can be multiple version of snapped inodes and they can be stored in different dirfrags. Besides vinodeno_t, client also need to tell mds from which dirfrag it got the snapped inode. Another problem of supporting snapshot nfs re-export is that there can be multiple paths to access a snapped inode. For example: mkdir -p d1/d2/d3 mkdir d1/.snap/s1 Paths 'd1/.snap/s1/d2/d3', 'd1/d2/.snap/_s1_<inode number of d1>/d3' and 'd1/d2/d3/.snap/_s1_<inode number of d1>' are all reference to the same snapped inode. For a given snapped inode, There is no convenient way to get the first form and the second form paths. For simplicity, ceph_get_parent() return snapdir for snapped directory inode. Furthermore, client may access snapshot of deleted directory. For example: mkdir -p d1/d2 mkdir d1/.snap/s1 open d1/.snap/s1/d2 rm -rf d1/d2 <nfs server restart> The path constucted by ceph_get_parent() and ceph_get_name() is '<inode of d2>/.snap/_s1_<inode number of d1>'. Futher lookup parent of <inode of d2> will fail. To workaround this case, this patch uses d_obtain_root() to get dentry for snapdir of deleted directory. snapdir dentry has no DCACHE_DISCONNECTED flag set, reconnect_path() stops when it reaches snapdir dentry. Link: http://tracker.ceph.com/issues/22105 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: quota: fix quota subdir mountsLuis Henriques
The CephFS kernel client does not enforce quotas set in a directory that isn't visible from the mount point. For example, given the path '/dir1/dir2', if quotas are set in 'dir1' and the filesystem is mounted with mount -t ceph <server>:<port>:/dir1/ /mnt then the client won't be able to access 'dir1' inode, even if 'dir2' belongs to a quota realm that points to it. This patch fixes this issue by simply doing an MDS LOOKUPINO operation for unknown inodes. Any inode reference obtained this way will be added to a list in ceph_mds_client, and will only be released when the filesystem is umounted. Link: https://tracker.ceph.com/issues/38482 Reported-by: Hendrik Peyerl <hpeyerl@plusline.net> Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: factor out ceph_lookup_inode()Luis Henriques
This function will be used by __fh_to_dentry and by the quotas code, to find quota realm inodes that are not visible in the mountpoint. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: remove duplicated filelock ref increaseZhi Zhang
Inode i_filelock_ref is increased in ceph_lock or ceph_flock, but it is increased again in ceph_lock_message. This results in this ref won't become zero. If CEPH_I_ERROR_FILELOCK flag is set in remove_session_caps once, this flag can't be cleared even if client is back to normal. So further file lock will return EIO. Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07Merge tag 'printk-for-5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()
2019-05-07afs: Implement YFS ACL settingDavid Howells
Implement the setting of YFS ACLs in AFS through the interface of setting the afs.yfs.acl extended attribute on the file. Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07afs: Get YFS ACLs and information through xattrsDavid Howells
The YFS/AuriStor variant of AFS provides more capable ACLs and provides per-volume ACLs and per-file ACLs as well as per-directory ACLs. It also provides some extra information that can be retrieved through four ACLs: (1) afs.yfs.acl The YFS file ACL (not the same format as afs.acl). (2) afs.yfs.vol_acl The YFS volume ACL. (3) afs.yfs.acl_inherited "1" if a file's ACL is inherited from its parent directory, "0" otherwise. (4) afs.yfs.acl_num_cleaned The number of of ACEs removed from the ACL by the server because the PT entries were removed from the PTS database (ie. the subject is no longer known). Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07afs: implement acl settingJoe Gorse
Implements the setting of ACLs in AFS by means of setting the afs.acl extended attribute on the file. Signed-off-by: Joe Gorse <jhgorse@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07afs: Get an AFS3 ACL as an xattrDavid Howells
Implement an xattr on AFS files called "afs.acl" that retrieves a file's ACL. It returns the raw AFS3 ACL from the result of calling FS.FetchACL, leaving any interpretation to userspace. Note that whilst YFS servers will respond to FS.FetchACL, this will render a more-advanced YFS ACL down. Use "afs.yfs.acl" instead for that. Signed-off-by: David Howells <dhowells@redhat.com>