path: root/fs
Age | Commit message | Author
2019-04-03 | bcachefs: delete some debug code | Kent Overstreet
2019-04-03 | bcachefs: add missing include | Kent Overstreet
2019-04-03 | bcachefs: BCH_NAME_MAX | Kent Overstreet
also fix some dirent bugs
2019-04-03 | bcachefs: optimize __bch2_btree_iter_relock() | Kent Overstreet
bch2_btree_node_relock() and __bch2_btree_iter_relock() are now only used for relocking, not upgrading or downgrading locks, so we can split out bch2_btree_node_upgrade() and slim down the fast path.
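To illustrate why a relock-only path can be cheap, here is a minimal single-threaded sketch. It assumes, as a simplification, that each node lock carries a sequence number that changes whenever the node is write-locked or unlocked; `struct node_lock`, `node_relock_read()` and `node_upgrade()` are hypothetical names, not the bcachefs API, and the model has no real atomicity or blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, single-threaded model of a sequence-counted node lock.
 * There is no atomicity or blocking here; it only shows the bookkeeping. */
struct node_lock {
	uint32_t seq;		/* bumped on write lock and on write unlock */
	unsigned int readers;	/* current read holders */
};

static bool node_write_locked(const struct node_lock *l)
{
	return l->seq & 1;	/* odd seq means write-locked */
}

static void node_write_unlock(struct node_lock *l)
{
	assert(node_write_locked(l));
	l->seq++;		/* even again: node may have changed */
}

/* Relock fast path: re-take a read lock only if the node is unchanged
 * since saved_seq was recorded.  No upgrade or downgrade logic here. */
static bool node_relock_read(struct node_lock *l, uint32_t saved_seq)
{
	if (l->seq != saved_seq || node_write_locked(l))
		return false;
	l->readers++;
	return true;
}

/* Upgrade is split out as its own, slower path.  In this model it only
 * succeeds when the caller is the sole reader. */
static bool node_upgrade(struct node_lock *l)
{
	if (l->readers != 1 || node_write_locked(l))
		return false;
	l->readers = 0;
	l->seq++;		/* now odd: write-locked */
	return true;
}
```

In this model the relock check is a single comparison plus an increment; anything involving a change of lock type lives in the separate upgrade function, which is the kind of split the commit message describes.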
2019-04-03 | bcachefs: btree_node_lock_increment() | Kent Overstreet
2019-04-03 | bcachefs: btree_iter_get_locks() | Kent Overstreet
2019-04-03 | bcachefs: bch2_btree_iter_upgrade()/downgrade() | Kent Overstreet
Replaces bch2_btree_iter_set_locks_want() - also add more assertions
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | bcachefs: Fix a minor memory leak | Kent Overstreet
2019-04-03 | bcachefs: Fix a bug in the str_hash code | Kent Overstreet
fixes b0f3e786995cb3b12975503f963e469db5a4f09b
2019-04-03 | bcachefs: bch_sb_field_clean | Kent Overstreet
Implement a superblock field so we don't have to read the journal after a clean shutdown (and, more importantly, so we can verify what we find in the journal after a clean shutdown).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | bcachefs: Make some improvements to the journal shutdown code | Kent Overstreet
2019-04-03 | bcachefs: split out recovery.c | Kent Overstreet
2019-04-03 | bcachefs: btree gc refactoring | Kent Overstreet
2019-04-03 | bcachefs: fix a btree iter traverse error path | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | bcachefs: btree perf/unit tests | Kent Overstreet
The sysfs interface is crap and will be changed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | bcachefs: fix missing bch_crc_bytes entries | Kent Overstreet
2019-04-03 | bcachefs: add a discard mount option | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | bcachefs: fix a minor fsync bug | Kent Overstreet
2019-04-03 | bcachefs: drop locks when needed in bch2_btree_node_get_sibling() | Kent Overstreet
2019-04-03 | bcachefs: btree iter refactoring | Kent Overstreet
2019-04-03 | bcachefs: fix a fun truncate bug | Kent Overstreet
truncate was leaving extents past the end of i_size. It turns out it was doing so because it thought it wasn't shrinking the file when it was, and it thought it wasn't shrinking because i_size had gotten screwed up: the in-memory i_size was smaller than the on-disk i_size, which is never supposed to happen. It also turns out that the thing screwing up i_size was truncate itself, specifically the error path taken when the filemap_write_and_wait_range() call fails. Besides fixing truncate itself, this patch also fixes and makes rigorous a lot of the locking pertaining to i_size and ei_inode (the cached on-disk inode in bch_inode_info).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | bcachefs: add a journal_seq_verify debug option | Kent Overstreet
2019-04-03 | bcachefs: implement BTREE_INSERT_NOUNLOCK | Kent Overstreet
BTREE_INSERT_NOUNLOCK means that after a successful btree update, no locks are dropped (e.g. while merging nodes). This is going to be used to fix some locking, primarily related to bi_size in bch_inode_info.
2019-04-03 | bcachefs: use BTREE_ITER_END more consistently | Kent Overstreet
2019-04-03 | bcachefs: better bch2_strtoh() | Kent Overstreet
2019-04-03 | bcachefs: Fix a spurious error in fsck | Kent Overstreet
If fsck finds an unreachable directory, it could just be because we crashed between deleting the dirent and deleting the inode, since that isn't done atomically yet. It's only a real error if the directory isn't empty.
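The rule in this commit reduces to a single test on the directory's contents. The sketch below is hypothetical (`classify_unreachable_dir()` and the enum are invented names) and only classifies the directory; what fsck then does with a benign one is not stated in the message, so it is left out.

```c
#include <assert.h>

/* Hypothetical classification of an unreachable directory found by fsck. */
enum unreachable_dir_verdict {
	UNREACHABLE_DIR_BENIGN,	/* empty: consistent with a crash between
				 * deleting the dirent and deleting the inode */
	UNREACHABLE_DIR_ERROR,	/* not empty: a real inconsistency */
};

static enum unreachable_dir_verdict
classify_unreachable_dir(unsigned long nr_children)
{
	return nr_children == 0 ? UNREACHABLE_DIR_BENIGN : UNREACHABLE_DIR_ERROR;
}
```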
2019-04-03 | bcachefs: don't use BTREE_INSERT_NOWAIT when we're not supposed to | Kent Overstreet
This was causing spurious journal replay failures.
2019-04-03 | bcachefs: fix missing btree_iter_set_dirty() call | Kent Overstreet
2019-04-03 | bcachefs: btree allocation deadlock fix | Kent Overstreet
2019-04-03 | bcachefs: fix another minor locking bug | Kent Overstreet
2019-04-03 | bcachefs: fix error path in fallocate | Kent Overstreet
2019-04-03 | bcachefs: tighten up reserve sizes | Kent Overstreet
2019-04-03 | bcachefs: fix device sysfs links | Kent Overstreet
2019-04-03 | bcachefs: fcollapse works on block granularity, not page | Kent Overstreet
2019-04-03 | bcachefs: fix an error path in fcollapse | Kent Overstreet
2019-04-03 | bcachefs: fix SGID + acls | Kent Overstreet
2019-04-03 | bcachefs: fix dio write when faulting in from file we're writing to | Kent Overstreet
2019-04-03 | bcachefs: drop some dead code | Kent Overstreet
2019-04-03 | bcachefs: kill bch2_read_string_list() | Kent Overstreet
2019-04-03 | bcachefs: Initial commit | Kent Overstreet
Fork of drivers/md/bcache
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | cifs: convert to add_to_page_cache() | Kent Overstreet
2019-04-03 | fs: factor out d_mark_tmpfile() | Kent Overstreet
New helper for bcachefs: bcachefs doesn't want the inode_dec_link_count() call that d_tmpfile does, because it handles i_nlink on its own, atomically with other btree updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2019-04-03 | fs: insert_inode_locked2() | Kent Overstreet
New helper for bcachefs, so that when we race inserting an inode we can atomically grab a ref to the inode already in the inode cache.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
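The "insert, or take a reference on whoever won the race" behaviour can be sketched with a toy fixed-size cache. Everything here (`struct toy_inode`, `insert_or_get()`, the linear table) is hypothetical and is not how the VFS inode hash is implemented; in a real cache the lookup and the insert would have to happen under one lock for the operation to be atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16

/* Hypothetical miniature inode cache; not the VFS data structures. */
struct toy_inode {
	uint64_t ino;
	int refcount;
	int in_use;
};

static struct toy_inode table[TABLE_SIZE];

/* Insert @new, or, if an inode with the same number is already cached,
 * take a reference on that one and return it instead.  A real cache must
 * do the lookup and the insert under a single lock so that a racing
 * inserter cannot slip in between them. */
static struct toy_inode *insert_or_get(const struct toy_inode *new)
{
	struct toy_inode *free_slot = NULL;

	for (size_t i = 0; i < TABLE_SIZE; i++) {
		struct toy_inode *p = &table[i];

		if (p->in_use && p->ino == new->ino) {
			p->refcount++;
			return p;	/* lost the race: use the existing inode */
		}
		if (!p->in_use && !free_slot)
			free_slot = p;
	}
	assert(free_slot);	/* toy table: assume it never fills up */
	*free_slot = *new;
	free_slot->in_use = 1;
	free_slot->refcount = 1;
	return free_slot;
}
```

A caller that gets back a pointer other than the one it tried to insert knows it lost the race and already holds a reference to the winner, so there is no window in which the existing inode could disappear between lookup and grab.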
2019-04-03 | mm: pagecache add lock | Kent Overstreet
Add a per address space lock around adding pages to the pagecache, making it possible for fallocate INSERT_RANGE/COLLAPSE_RANGE to work correctly, and also hopefully making truncate and dio a bit saner.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
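The message does not describe the lock's internals. One plausible shape, assumed here rather than taken from the patch, is a two-state shared lock: any number of page-adding paths may hold it together, and any number of "blocking" operations (such as an INSERT_RANGE/COLLAPSE_RANGE) may hold it together, but never both kinds at once. The try-lock sketch below uses C11 atomics; all names are hypothetical and a real lock would also need a way to sleep until the other state drains.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-state shared lock: v > 0 counts "add" holders,
 * v < 0 counts "block" holders.  Each state is shared with itself
 * but excludes the other. */
struct two_state_lock {
	atomic_long v;
};

static bool tsl_try_get(struct two_state_lock *l, long delta)
{
	long old = atomic_load(&l->v);

	do {
		/* refuse if the lock is currently held in the opposite state */
		if (old != 0 && (old > 0) != (delta > 0))
			return false;
	} while (!atomic_compare_exchange_weak(&l->v, &old, old + delta));
	return true;
}

static bool tsl_add_tryget(struct two_state_lock *l)   { return tsl_try_get(l, 1); }
static bool tsl_block_tryget(struct two_state_lock *l) { return tsl_try_get(l, -1); }

static void tsl_add_put(struct two_state_lock *l)
{
	long old = atomic_fetch_sub(&l->v, 1);
	assert(old > 0);
}

static void tsl_block_put(struct two_state_lock *l)
{
	long old = atomic_fetch_add(&l->v, 1);
	assert(old < 0);
}
```

Under this design, ordinary buffered I/O and faults never contend with each other on the lock; only a range operation that must keep new pages out of the page cache forces them to wait.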
2019-03-01 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds
Merge misc fixes from Andrew Morton: "2 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hugetlbfs: fix races and page leaks during migration
  kasan: turn off asan-stack for clang-8 and earlier
2019-03-01 | hugetlbfs: fix races and page leaks during migration | Mike Kravetz
hugetlb pages should only be migrated if they are 'active'. The routines set/clear_page_huge_active() modify the active state of hugetlb pages. When a new hugetlb page is allocated at fault time, set_page_huge_active is called before the page is locked. Therefore, another thread could race and migrate the page while it is being added to the page table by the fault code. This race is somewhat hard to trigger, but can be seen by strategically adding udelay to simulate worst-case scheduling behavior. Depending on 'how' the code races, various BUG()s could be triggered. To address this issue, simply delay the set_page_huge_active call until after the page is successfully added to the page table.
Hugetlb pages can also be leaked at migration time if the pages are associated with a file in an explicitly mounted hugetlbfs filesystem. For example, consider a two node system with 4GB worth of huge pages available. A program mmaps a 2G file in a hugetlbfs filesystem. It then migrates the pages associated with the file from one node to another. When the program exits, huge page counts are as follows:
  node0  1024 free_hugepages  1024 nr_hugepages
  node1     0 free_hugepages  1024 nr_hugepages
  Filesystem  Size  Used  Avail  Use%  Mounted on
  nodev       4.0G  2.0G  2.0G   50%   /var/opt/hugepool
That is as expected. 2G of huge pages are taken from the free_hugepages counts, and 2G is the size of the file in the explicitly mounted filesystem. If the file is then removed, the counts become:
  node0  1024 free_hugepages  1024 nr_hugepages
  node1  1024 free_hugepages  1024 nr_hugepages
  Filesystem  Size  Used  Avail  Use%  Mounted on
  nodev       4.0G  2.0G  2.0G   50%   /var/opt/hugepool
Note that the filesystem still shows 2G of pages used, while there actually are no huge pages in use. The only way to 'fix' the filesystem accounting is to unmount the filesystem.
If a hugetlb page is associated with an explicitly mounted filesystem, this information is contained in the page_private field.
At migration time, this information is not preserved. To fix, simply transfer page_private from old to new page at migration time if necessary.
There is a related race with removing a huge page from a file and migration. When a huge page is removed from the pagecache, the page_mapping() field is cleared, yet page_private remains set until the page is actually freed by free_huge_page(). A page could be migrated while in this state. However, since page_mapping() is not set, the hugetlbfs-specific routine to transfer page_private is not called and we leak the page count in the filesystem. To fix that, check for this condition before migrating a huge page. If the condition is detected, return EBUSY for the page.
Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
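The two page_private rules described in this changelog (transfer it when the page is still in the page cache, refuse with EBUSY when it has been removed but not yet freed) can be modelled in a few lines. `struct toy_hpage` and `toy_migrate_huge_page()` are hypothetical stand-ins for the fields the message names, not the kernel's `struct page` or migration code, and the first fix (delaying set_page_huge_active) is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the changelog mentions. */
struct toy_hpage {
	void *mapping;		/* NULL once removed from the page cache */
	unsigned long private;	/* nonzero: charged to a mounted hugetlbfs */
};

/* Returns 0 on success or -EBUSY when the page must not be migrated. */
static int toy_migrate_huge_page(struct toy_hpage *src, struct toy_hpage *dst)
{
	assert(src && dst);

	/* Removed from the file but not yet freed: migrating now would
	 * lose the filesystem accounting, so refuse. */
	if (!src->mapping && src->private)
		return -EBUSY;

	dst->mapping = src->mapping;
	if (src->mapping) {
		/* carry the filesystem accounting over to the new page */
		dst->private = src->private;
		src->private = 0;
	}
	return 0;
}
```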
2019-02-28 | Merge tag 'for-linus-5.0-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux | Linus Torvalds
Pull orangefs fixlet from Mike Marshall: "Remove two un-needed BUG_ONs"
* tag 'for-linus-5.0-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: remove two un-needed BUG_ONs...
2019-02-25 | afs: Fix manually set volume location server list | David Howells
When a cell with a volume location server list is added manually by echoing the details into /proc/net/afs/cells, a record is added but the flag saying it has been looked up isn't set. This causes the VL server rotation code to wait forever, with the top of /proc/pid/stack looking like:
  afs_select_vlserver+0x3a6/0x6f3
  afs_vl_lookup_vldb+0x4b/0x92
  afs_create_volume+0x25/0x1b9
  ...
with the thread stuck in afs_start_vl_iteration() waiting for AFS_CELL_FL_NO_LOOKUP_YET to be cleared. Fix this by clearing AFS_CELL_FL_NO_LOOKUP_YET when setting up a record if that record's details were supplied manually.
Fixes: 0a5143f2f89c ("afs: Implement VL server rotation")
Reported-by: Dave Botsch <dwb7@cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
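The shape of this fix can be shown with a toy cell record. The flag name, its bit value, `struct toy_cell` and both functions are hypothetical simplifications of the AFS_CELL_FL_NO_LOOKUP_YET logic the message describes: waiters proceed only once the "no lookup yet" bit is clear, so a manually configured cell must clear it at setup time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CELL_FL_NO_LOOKUP_YET (1u << 0)	/* hypothetical bit value */

struct toy_cell {
	unsigned int flags;
	const char *vl_servers;	/* manually supplied server list, or NULL */
};

static void toy_cell_init(struct toy_cell *cell, const char *manual_vl_servers)
{
	cell->vl_servers = manual_vl_servers;
	cell->flags = TOY_CELL_FL_NO_LOOKUP_YET;
	/* The fix: servers supplied by hand need no lookup, so clear the
	 * flag now instead of leaving waiters blocked on it forever. */
	if (manual_vl_servers)
		cell->flags &= ~TOY_CELL_FL_NO_LOOKUP_YET;
}

/* A VL server iterator may proceed only once the flag is clear. */
static bool toy_cell_ready(const struct toy_cell *cell)
{
	return !(cell->flags & TOY_CELL_FL_NO_LOOKUP_YET);
}
```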
2019-02-25 | Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses" | Linus Torvalds
This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b.
It was well-intentioned, but wrong. Overriding the exception tables for instructions for random reasons is just wrong, and that is what the new code did. It caused problems for tracing, and it caused problems for strncpy_from_user(), because the new checks made perfectly valid use cases break, rather than catch things that did bad things.
Unchecked user space accesses are a problem, but that's not a reason to add invalid checks that then people have to work around with silly flags (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an odd way to say "this commit was wrong" and was sprinkled into random places to hide the wrongness).
The real fix to unchecked user space accesses is to get rid of the special "let's not check __get_user() and __put_user() at all" logic. Make __{get|put}_user() be just aliases to the regular {get|put}_user() functions, and make it impossible to access user space without having the proper checks in place.
The raison d'être of the special double-underscore versions used to be that the range check was expensive, and if you did multiple user accesses, you'd do the range check up front (like the signal frame handling code, for example). But SMAP (on x86) and PAN (on ARM) have made that optimization pointless, because the _real_ expense is the "set CPU flag to allow user space access".
Do let's not break the valid cases to catch invalid cases that shouldn't even exist.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tobin C. Harding <tobin@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
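The proposal to make the double-underscore accessors plain aliases of the checked ones can be illustrated with a user-space model. Everything below is hypothetical: "user memory" is a local buffer, addresses are offsets into it, and `toy_access_ok()` and `toy_get_user_u32()` only mimic the idea of access_ok()/get_user(); the real kernel macros have different signatures and work on real pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: "user space" is this buffer and a user address is
 * an offset into it. */
static unsigned char toy_user_mem[4096];

/* Range check written so that addr + size can never overflow. */
static bool toy_access_ok(size_t addr, size_t size)
{
	return size <= sizeof(toy_user_mem) &&
	       addr <= sizeof(toy_user_mem) - size;
}

/* Checked read of a 32-bit value; returns 0 or -EFAULT. */
static int toy_get_user_u32(uint32_t *dst, size_t addr)
{
	if (!toy_access_ok(addr, sizeof(*dst)))
		return -EFAULT;
	memcpy(dst, &toy_user_mem[addr], sizeof(*dst));
	return 0;
}

/* The "unchecked" variant is just an alias for the checked one, so no
 * caller can skip the range check. */
#define toy_get_user_u32_unchecked(dst, addr) toy_get_user_u32(dst, addr)
```

The range check here is a couple of comparisons, which is the point of the commit's argument: once the check is this cheap relative to enabling user access, keeping a separate unchecked variant buys little.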
2019-02-21 | Merge tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client | Linus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two bug fixes for old issues, both marked for stable"
* tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client:
  ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
  libceph: handle an empty authorize reply