summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2013-02-24pnfs: fix resend_to_mds for directioBenny Halevy
Pass the directio request on pageio_init to clean up the API. Percolate pg_dreq from original nfs_pageio_descriptor to the pnfs_{read,write}_done_resend_to_mds and use it on respective call to nfs_pageio_init_{read,write} on the newly created nfs_pageio_descriptor. Reproduced by command: mount -o vers=4.1 server:/ /mnt dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] PGD 34786067 PUD 34794067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4 CPU 1 Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs RIP: 0010:[<ffffffffa021a3a8>] [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP: 0018:ffff880038f8fa68 EFLAGS: 00010206 RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000 RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028 RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510 R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40 R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878 FS: 0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480) Stack: ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48 Call Trace: [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs] [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs] [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs] [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs] [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs] [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4] [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files] [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files] [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4] [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs] [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs] [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files] [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc] [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc] [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc] [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff8105f8c1>] process_one_work+0x226/0x422 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422 [<ffffffff81094757>] ? lock_acquired+0x210/0x249 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff810600d8>] worker_thread+0x126/0x1c4 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240 [<ffffffff81064ef8>] kthread+0xb1/0xb9 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f RIP [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP <ffff880038f8fa68> CR2: 0000000000000028 Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@kernel.org [>= 3.6] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-22new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22NFS: Don't allow NFS silly-renamed files to be deleted, no signalTrond Myklebust
Commit 73ca100 broke the code that prevents the client from deleting a silly renamed dentry. This affected "delete on last close" semantics as after that commit, nothing prevented removal of silly-renamed files. As a result, a process holding a file open could easily get an ESTALE on the file in a directory where some other process issued 'rm -rf some_dir_containing_the_file' twice. Before the commit, any attempt at unlinking silly renamed files would fail inside may_delete() with -EBUSY because of the DCACHE_NFSFS_RENAMED flag. The following testcase demonstrates the problem: tail -f /nfsmnt/dir/file & rm -rf /nfsmnt/dir rm -rf /nfsmnt/dir # second removal does not fail, 'tail' process receives ESTALE The problem with the above commit is that it unhashes the old and new dentries from the lookup path, even in the normal case when a signal is not encountered and it would have been safe to call d_move. Unfortunately the old dentry has the special DCACHE_NFSFS_RENAMED flag set on it. Unhashing has the side-effect that future lookups call d_alloc(), allocating a new dentry without the special flag for any silly-renamed files. As a result, subsequent calls to unlink silly renamed files do not fail but allow the removal to go through. This will result in ESTALE errors for any other process doing operations on the file. To fix this, go back to using d_move on success. For the signal case, it's unclear what we may safely do beyond d_drop. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org
2013-02-17umount oops when remove blocklayoutdriver firstfanchaoting
now pnfs client uses block layout, maybe we can remove blocklayoutdriver first. if we umount later, it can cause oops in unset_pnfs_layoutdriver. because nfss->pnfs_curr_ld->clear_layoutdriver is invalid. reproduce it: modprobe blocklayoutdriver mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/ rmmod blocklayoutdriver umount /mnt then you can see following CPU 0 Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa04cfe6d>] [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286 RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001 RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800 RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400 R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550 FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0) Stack: ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7 ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400 Call Trace: [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4] [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs] [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs] [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70 [<ffffffff8117986a>] deactivate_super+0x4a/0x70 [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130 [<ffffffff81194d62>] sys_umount+0x72/0xe0 [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2 RIP [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP <ffff8800022d9e48> CR2: ffffffffa04a1b38 ---[ end trace 29f75aaedda058bf ]--- Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-02-17nfs: remove kfree() redundant null checksTim Gardner
smatch analysis: fs/nfs/getroot.c:130 nfs_get_root() info: redundant null check on name calling kfree() fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null check on devname_garbage calling kfree() Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-17NFSv4.1: Don't decode skipped layoutgetsWeston Andros Adamson
layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0). Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean that the RPC was successfully sent, received and parsed. To fix this, use the result's len member to see if parsing took place. This fixes the following OOPS -- calling xdr_init_decode() with a buffer length 0 doesn't set the stream's 'p' member and ends up using uninitialized memory in filelayout_decode_layout. BUG: unable to handle kernel paging request at 0000000000008050 IP: [<ffffffff81282e78>] memcpy+0x18/0x120 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq CPU 1 Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib] Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffff81282e78>] [<ffffffff81282e78>] memcpy+0x18/0x120 RSP: 0018:ffff88003dfab588 EFLAGS: 00010206 RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009 RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000 RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024 R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0 FS: 0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540) Stack: ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0 <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70 <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc] [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files] [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files] [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-02-15SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() functionStanislav Kinsbursky
Passing this pointer is redundant since it's stored on cache_detail structure, which is also passed to sunrpc_cache_pipe_upcall () function. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15SUNRPC: introduce cache_detail->cache_request callbackStanislav Kinsbursky
This callback will allow to simplify upcalls in further patches in this series. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15NFS: simplify and clean cache libraryStanislav Kinsbursky
This is a cleanup patch. Such helpers like nfs_cache_init() and nfs_cache_destroy() are redundant, because they are just a wrappers around sunrpc_init_cache_detail() and sunrpc_destroy_cache_detail() respectively. So let's remove them completely and move corresponding logic to nfs_cache_register_net() and nfs_cache_unregister_net() respectively (since they are called together anyway). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15NFS: use SUNRPC cache creation and destruction helper for DNS cacheStanislav Kinsbursky
This cache was the first containerized and doesn't use net-aware cache creation and destruction helpers. This is a cleanup patch which just makes code looks clearer and reduce amount of lines of code. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-14NFSv4.1: Fix bulk recall and destroy of layoutsTrond Myklebust
The current code in pnfs_destroy_all_layouts() assumes that removing the layout from the server->layouts list is sufficient to make it invisible to other processes. This ignores the fact that most users access the layout through the nfs_inode->layout... There is further breakage due to lack of reference counting of the layouts, meaning that the whole thing Oopses at the drop of a hat. The code in initiate_bulk_draining() is almost correct, and can be used as a model for pnfs_destroy_all_layouts(), so move that code to pnfs.c, and refactor the code to allow us to choose between a single filesystem bulk recall, and a recall of all layouts. Also note that initiate_bulk_draining() currently calls iput() while holding locks. Fix that too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-02-13nfs: kuid and kgid conversions for nfs/inode.cEric W. Biederman
- Use uid_eq and gid_eq when comparing kuids and kgids. - Use make_kuid(&init_user_ns, -2) and make_kgid(&init_user_ns, -2) as the initial uid and gid on nfs inodes, instead of using the typeunsafe value of -2. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfs: Convert nfs4xdr to use kuids and kgidsEric W. Biederman
When reading uids and gids off the wire convert them to kuids and kgids. When putting kuids and kgids onto the wire first convert them to uids and gids the other side will understand. When printing kuids and kgids convert them to values in the initial user namespace then use normal printf formats. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfs: Convert nfs3xdr to use kuids and kgidsEric W. Biederman
When reading uids and gids off the wire convert them to kuids and kgids. When putting kuids and kgids onto the wire first convert them to uids and gids the other side will understand. Add an additional failure mode incoming for uids or gids that are invalid. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfs: Convert nfs2xdr to use kuids and kgidsEric W. Biederman
When reading uids and gids off the wire convert them to kuids and kgids. When putting kuids and kgids onto the wire first convert them to uids and gids the other side will understand. Add an additional failure mode for incoming uid or gids that are invalid. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfs: Convert idmap to use kuids and kgidsEric W. Biederman
Convert nfs_map_name_to_uid to return a kuid_t value. Convert nfs_map_name_to_gid to return a kgid_t value. Convert nfs_map_uid_to_name to take a kuid_t paramater. Convert nfs_map_gid_to_name to take a kgid_t paramater. Tweak nfs_fattr_map_owner_to_name to use a kuid_t intermediate value. Tweak nfs_fattr_map_group_to_name to use a kgid_t intermediate value. Which makes these functions properly handle kuids and kgids, including erroring of the generated kuid or kgid is invalid. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring allocEric W. Biederman
Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-11NFSv4.1: Fix an ABBA locking issue with session and state serialisationTrond Myklebust
Ensure that if nfs_wait_on_sequence() causes our rpc task to wait for an NFSv4 state serialisation lock, then we also drop the session slot. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-02-11NFSv4: Fix a reboot recovery race when opening a fileTrond Myklebust
If the server reboots after it has replied to our OPEN, but before we call nfs4_opendata_to_nfs4_state(), then the reboot recovery thread will not see a stateid for this open, and so will fail to recover it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11NFSv4: Ensure delegation recall and byte range lock removal don't conflictTrond Myklebust
Add a mutex to the struct nfs4_state_owner to ensure that delegation recall doesn't conflict with byte range lock removal. Note that we nest the new mutex _outside_ the state manager reclaim protection (nfsi->rwsem) in order to avoid deadlocks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11NFSv4: Fix up the return values of nfs4_open_delegation_recallTrond Myklebust
Adjust the return values so that they return EAGAIN to the caller in cases where we might want to retry the delegation recall after the state recovery has run. Note that we can't wait and retry in this routine, because the caller may be the state manager thread. If delegation recall fails due to a session or reboot related issue, also ensure that we mark the stateid as delegated so that nfs_delegation_claim_opens can find it again later. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11NFSv4.1: Don't lose locks when a server reboots during delegation returnTrond Myklebust
If the server reboots while we are converting a delegation into OPEN/LOCK stateids as part of a delegation return, the current code will simply exit with an error. This causes us to lose both delegation state and locking state (i.e. locking atomicity). Deal with this by exposing the delegation stateid during delegation return, so that we can recover the delegation, and then resume open/lock recovery. Note that not having to hold the nfs_inode->rwsem across the calls to nfs_delegation_claim_opens() also fixes a deadlock against the NFSv4.1 reboot recovery code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11NFSv4.1: Prevent deadlocks between state recovery and file lockingTrond Myklebust
We currently have a deadlock in which the state recovery thread ends up blocking due to one of the locks which it is trying to recover holding the nfs_inode->rwsem. The situation is as follows: the state recovery thread is scheduled in order to recover from a reboot. It immediately drains the session, forcing all ordinary NFSv4.1 calls to nfs41_setup_sequence() to be put to sleep. This includes the file locking process that holds the nfs_inode->rwsem. When the thread gets to nfs4_reclaim_locks(), it tries to grab a write lock on nfs_inode->rwsem, and boom... Fix is to have the lock drop the nfs_inode->rwsem while it is doing RPC calls. We use a sequence lock in order to signal to the locking process whether or not a state recovery thread has run on that inode, in which case it should retry the lock. Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11NFSv4: Allow the state manager to mark an open_owner as being recoveredTrond Myklebust
This patch adds a seqcount_t lock for use by the state manager to signal that an open owner has been recovered. This mechanism will be used by the delegation, open and byte range lock code in order to figure out if they need to replay requests due to collisions with lock recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-05sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to ↵Jeff Layton
addr.h These routines are used by server and client code, so having them in a separate header would be best. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-01Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"Trond Myklebust
This reverts commit 324d003b0cd82151adbaecefef57b73f7959a469. The deadlock turned out to be caused by a workqueue limitation that has now been worked around in the RPC code (see comment in rpc_free_task). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-01-30NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 sessionTrond Myklebust
NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It usually means that the server is busy handling an unfinished RPC request. Just sleep for a second and then retry. We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return value. If the NFS server has outstanding callbacks, we just want to similarly sleep & retry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-01-30NFS: Don't silently fail setattr() requests on mountpointsTrond Myklebust
Ensure that any setattr and getattr requests for junctions and/or mountpoints are sent to the server. Ever since commit 0ec26fd0698 (vfs: automount should ignore LOOKUP_FOLLOW), we have silently dropped any setattr requests to a server-side mountpoint. For referrals, we have silently dropped both getattr and setattr requests. This patch restores the original behaviour for setattr on mountpoints, and tries to do the same for referrals, provided that we have a filehandle... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-01-27NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recoveryTrond Myklebust
We do need to start the lease recovery thread prior to waiting for the client initialisation to complete in NFSv4.1. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ben Greear <greearb@candelatech.com> Cc: stable@vger.kernel.org [>=3.7]
2013-01-27NFSv4: Fix NFSv4 trunking discoveryTrond Myklebust
If walking the list in nfs4[01]_walk_client_list fails, then the most likely explanation is that the server dropped the clientid before we actually managed to confirm it. As long as our nfs_client is the very last one in the list to be tested, the caller can be assured that this is the case when the final return value is NFS4ERR_STALE_CLIENTID. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org [>=3.7] Tested-by: Ben Greear <greearb@candelatech.com>
2013-01-27NFSv4: Fix NFSv4 reference counting for trunked sessionsTrond Myklebust
The reference counting in nfs4_init_client assumes wongly that it is safe for nfs4_discover_server_trunking() to return a pointer to a nfs_client prior to bumping the reference count. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ben Greear <greearb@candelatech.com> Cc: stable@vger.kernel.org [>=3.7]
2013-01-27NFS: Fix error reporting in nfs_xdev_mountTrond Myklebust
Currently, nfs_xdev_mount converts all errors from clone_server() to ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that. Also ensure that if nfs_fs_mount_common() returns an error, we don't dprintk(0)... The regression originated in commit 3d176e3fe4f6dc379b252bf43e2e146a8f7caf01 (NFS: Use nfs_fs_mount_common() for xdev mounts) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.5]
2013-01-05nfs: avoid dereferencing null pointer in initiate_bulk_drainingNickolai Zeldovich
Fix an inverted null pointer check in initiate_bulk_draining(). Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]
2013-01-04NFS: Ensure that we free the rpc_task after read and write cleanups are doneTrond Myklebust
This patch ensures that we free the rpc_task after the cleanup callbacks are done in order to avoid a deadlock problem that can be triggered if the callback needs to wait for another workqueue item to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org [>= 3.5]
2013-01-04nfs: fix null checking in nfs_get_option_str()Xi Wang
The following null pointer check is broken. *option = match_strdup(args); return !option; The pointer `option' must be non-null, and thus `!option' is always false. Use `!*option' instead. The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some cut-and-paste code."). Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-01-04pnfs: Increase the refcount when LAYOUTGET fails the first timeYanchuan Nian
The layout will be set unusable if LAYOUTGET fails. Is it reasonable to increase the refcount iff LAYOUTGET fails the first time? Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]
2013-01-03NFS: Fix access to suid/sgid executablesWeston Andros Adamson
nfs_open_permission_mask() should only check MAY_EXEC for files that are opened with __FMODE_EXEC. Also fix NFSv4 access-in-open path in a similar way -- openflags must be used because fmode will not always have FMODE_EXEC set. This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=49101 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-12-21NFS: Kill fscache warnings when mounting without -ofscTrond Myklebust
The fscache code will currently bleat a "non-unique superblock keys" warning even if the user is mounting without the 'fsc' option. There should be no reason to even initialise the superblock cache cookie unless we're planning on using fscache for something, so ensure that we check for the NFS_OPTION_FSCACHE flag before calling into the fscache code. Reported-by: Paweł Sikora <pawel.sikora@agmk.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David Howells <dhowells@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21NFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=nDavid Howells
Provide a stub nfs_fscache_wait_on_invalidate() function for when CONFIG_NFS_FSCACHE=n lest the following error appear: fs/nfs/inode.c: In function 'nfs_invalidate_mapping': fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-20NFS4: Open files for fscachingDavid Howells
nfs4_file_open() should open files for fscaching. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a pageDavid Howells
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably leading to the following bad-page-state: BUG: Bad page state in process python-bin pfn:17d39b page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null) index:38686 (Tainted: G B ---------------- ) Pid: 31053, comm: python-bin Tainted: G B ---------------- 2.6.32-71.24.1.el6.x86_64 #1 Call Trace: [<ffffffff8111bfe7>] bad_page+0x107/0x160 [<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220 [<ffffffff8111ef19>] __pagevec_free+0x59/0xb0 [<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130 [<ffffffff8112230c>] release_pages+0x21c/0x250 [<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0 [<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70 [<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180 [<ffffffff811226f8>] __lru_cache_add+0x58/0x70 [<ffffffff81122731>] lru_cache_add_lru+0x21/0x40 [<ffffffff81123f49>] putback_lru_page+0x69/0x100 [<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0 [<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180 [<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370 [<ffffffff8115255c>] compact_zone+0x4cc/0x600 [<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820 [<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0 [<ffffffff8115290e>] compact_zone_order+0x7e/0xb0 [<ffffffff81152a49>] try_to_compact_pages+0x109/0x170 [<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850 [<ffffffff814c9136>] ? thread_return+0x4e/0x778 [<ffffffff81150d43>] alloc_pages_vma+0x93/0x150 [<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340 [<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30 [<ffffffff81136755>] handle_mm_fault+0x245/0x2b0 [<ffffffff814ce383>] do_page_fault+0x123/0x3a0 [<ffffffff814cbdf5>] page_fault+0x25/0x30 nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait - even if __GFP_WAIT is set. The reason that doesn't wait is that fscache_maybe_release_page() might deadlock the allocator as the work threads writing to the cache may all end up sleeping on memory allocation. However, I wonder if that is actually a problem. There are a number of things I can do to deal with this: (1) Make nfs_migrate_page() wait. (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag. (3) Set a timeout around the wait. (4) Make nfs_migrate_page() return an error if the page is still busy. For the moment, I'll select (2) and (4). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2012-12-20NFS: Use FS-Cache invalidationDavid Howells
Use the new FS-Cache invalidation facility from NFS to deal with foreign changes being detected on the server rather than attempting to retire the old cookie and get a new one. The problem with the old method was that NFS did not wait for all outstanding storage and retrieval ops on the cache to complete. There was no automatic wait between the calls to ->readpages() and calls to invalidate_inode_pages2() as the latter can only wait on locked pages that have been added to the pagecache (which they haven't yet on entry to ->readpages()). This was leading to oopses like the one below when an outstanding read got cut off from its cookie by a premature release. BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] PGD 15889067 PUD 15890067 PMD 0 Oops: 0000 [#1] SMP CPU 0 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY RIP: 0010:[<ffffffffa0075118>] [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] RSP: 0018:ffff8800158799e8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0 RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90 RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002 R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4 R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8 FS: 00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040) Stack: ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508 ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8 ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040 Call Trace: [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs] [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs] [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662 [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs] [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b Reported-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-18Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Features include: - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code. Remove altogether where possible, and replace with WARN_ON_ONCE and appropriate error returns where not. - NFSv4.1 client adds session dynamic slot table management. There is matching server side code that has been submitted to Bruce for consideration. Together, this code allows the server to dynamically manage the amount of memory it allocates to the duplicate request cache for each client. It will constantly resize those caches to reserve more memory for clients that are hot while shrinking caches for those that are quiescent. In addition, there are assorted bugfixes for the generic NFS write code, fixes to deal with the drop_nlink() warnings, and yet another fix for NFSv4 getacl." * tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits) SUNRPC: continue run over clients list on PipeFS event instead of break NFS: Don't use SetPageError in the NFS writeback code SUNRPC: variable 'svsk' is unused in function bc_send_request SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket NFSv4.1: Deal effectively with interrupted RPC calls. NFSv4.1: Move the RPC timestamp out of the slot. NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED. NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0 NFS: Fix calls to drop_nlink() NFS: Ensure that we always drop inodes that have been marked as stale nfs: Remove unused list nfs4_clientid_list nfs: Remove duplicate function declaration in internal.h NFS: avoid NULL dereference in nfs_destroy_server SUNRPC handle EKEYEXPIRED in call_refreshresult SUNRPC set gss gc_expiry to full lifetime nfs: fix page dirtying in NFS DIO read codepath nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ NFSv4.1: Be conservative about the client highest slotid NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly nfs: don't extend writes to cover entire page if pagecache is invalid ...
2012-12-17lseek: the "whence" argument is called "whence"Andrew Morton
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "A quiet cycle for the security subsystem with just a few maintenance updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: create a sysfs mount point for smackfs Smack: use select not depends in Kconfig Yama: remove locking from delete path Yama: add RCU to drop read locking drivers/char/tpm: remove tasklet and cleanup KEYS: Use keyring_alloc() to create special keyrings KEYS: Reduce initial permissions on keys KEYS: Make the session and process keyrings per-thread seccomp: Make syscall skipping and nr changes more consistent key: Fix resource leak keys: Fix unreachable code KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-15NFS: Don't use SetPageError in the NFS writeback codeTrond Myklebust
The writeback code is already capable of passing errors back to user space by means of the open_context->error. In the case of ENOSPC, Neil Brown is reporting seeing 2 errors being returned. Neil writes: "e.g. if /mnt2/ if an nfs mounted filesystem that has no space then strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1 reported Input/output error and the relevant parts of the strace output are: write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 fsync(1) = -1 EIO (Input/output error) close(1) = -1 ENOSPC (No space left on device)" Neil then shows that the duplication of error messages appears to be due to the use of the PageError() mechanism, which causes filemap_fdatawait_range to return the extra EIO. The regression was introduced by commit 7b281ee026552f10862b617a2a51acf49c829554 (NFS: fsync() must exit with an error if page writeback failed). Fix this by removing the call to SetPageError(), and just relying on open_context->error reporting the ENOSPC back to fsync(). Reported-by: Neil Brown <neilb@suse.de> Tested-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [3.6+]
2012-12-15NFSv4.1: Deal effectively with interrupted RPC calls.Trond Myklebust
If an RPC call is interrupted, assume that the server hasn't processed the RPC call so that the next time we use the slot, we know that if we get a NFS4ERR_SEQ_MISORDERED or NFS4ERR_SEQ_FALSE_RETRY, we just have to bump the sequence number. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15NFSv4.1: Move the RPC timestamp out of the slot.Trond Myklebust
Shave a few bytes off the slot table size by moving the RPC timestamp into the sequence results. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.Trond Myklebust
If the server returns NFS4ERR_SEQ_MISORDERED, it could be a sign that the slot was retired at some point. Retry the attempt after reinitialising the slot sequence number to 1. Also add a handler for NFS4ERR_SEQ_FALSE_RETRY. Just bump the slot sequence number and retry... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0Trond Myklebust
If the inode has no links, then we should force a new lookup. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>