summaryrefslogtreecommitdiff
path: root/fs/proc
AgeCommit message (Collapse)Author
2018-05-17fs/proc: re-factor proc_pid_cmdline_read() a bitLinus Torvalds
This is a pure refactoring of the function, preparing for some further cleanups. The thing was pretty illegible, and the core functionality still is, but now the core loop is a bit more isolated from the thing that goes on around it. This was "inspired" by the confluence of kworker workqueue name cleanups by Tejun, currently scheduled for 4.18, and commit 7f7ccc2ccc2e ("proc: do not access cmdline nor environ from file-backed areas"). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-17proc: do not access cmdline nor environ from file-backed areasWilly Tarreau
proc_pid_cmdline_read() and environ_read() directly access the target process' VM to retrieve the command line and environment. If this process remaps these areas onto a file via mmap(), the requesting process may experience various issues such as extra delays if the underlying device is slow to respond. Let's simply refuse to access file-backed areas in these functions. For this we add a new FOLL_ANON gup flag that is passed to all calls to access_remote_vm(). The code already takes care of such failures (including unmapped areas). Accesses via /proc/pid/mem were not changed though. This was assigned CVE-2018-1120. Note for stable backports: the patch may apply to kernels prior to 4.11 but silently miss one location; it must be checked that no call to access_remote_vm() keeps zero as the last argument. Reported-by: Qualys Security Advisory <qsa@qualys.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-16proc: update SIZEOF_PDE_INLINE_NAME for the new pde fieldsChristoph Hellwig
This makes Alexey happy and Al groan. Based on a patch from Alexey Dobriyan. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16tty: replace ->proc_fops with ->proc_showChristoph Hellwig
Just set up the show callback in the tty_operations, and use proc_create_single_data to create the file without additional boilerplace code. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: introduce proc_create_net_singleChristoph Hellwig
Variant of proc_create_data that directly take a seq_file show callback and deals with network namespaces in ->open and ->release. All callers of proc_create + single_open_net converted over, and single_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: introduce proc_create_net{,_data}Christoph Hellwig
Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: introduce proc_create_single{,_data}Christoph Hellwig
Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: introduce proc_create_seq_privateChristoph Hellwig
Variant of proc_create_data that directly take a struct seq_operations argument + a private state size and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: introduce proc_create_seq{,_data}Christoph Hellwig
Variants of proc_create{,_data} that directly take a struct seq_operations argument and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: add a proc_create_reg helperChristoph Hellwig
Common code for creating a regular file. Factor out of proc_create_data, to be reused by other functions soon. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: simplify proc_register calling conventionsChristoph Hellwig
Return registered entry on success, return NULL on failure and free the passed in entry. Also expose it in internal.h as we'll start using it in proc_net.c soon. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: don't detour through seq->private to get the inodeChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: introduce a proc_pid_ns helperChristoph Hellwig
Factor out retrieving the per-sb pid namespaces from the sb private data into an easier to understand helper. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-14vmcore: append device dumps to vmcore as elf notesRahul Lakkireddy
Update read and mmap logic to append device dumps as additional notes before the other elf notes. We add device dumps before other elf notes because the other elf notes may not fill the elf notes buffer completely and we will end up with zero-filled data between the elf notes and the device dumps. Tools will then try to decode this zero-filled data as valid notes and we don't want that. Hence, adding device dumps before the other elf notes ensure that zero-filled data can be avoided. This also ensures that the device dumps and the other elf notes can be properly mmaped at page aligned address. Incorporate device dump size into the total vmcore size. Also update offsets for other program headers after the device dumps are added. Suggested-by: Eric Biederman <ebiederm@xmission.com>. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14vmcore: add API to collect hardware dump in second kernelRahul Lakkireddy
The sequence of actions done by device drivers to append their device specific hardware/firmware logs to /proc/vmcore are as follows: 1. During probe (before hardware is initialized), device drivers register to the vmcore module (via vmcore_add_device_dump()), with callback function, along with buffer size and log name needed for firmware/hardware log collection. 2. vmcore module allocates the buffer with requested size. It adds an Elf note and invokes the device driver's registered callback function. 3. Device driver collects all hardware/firmware logs into the buffer and returns control back to vmcore module. Ensure that the device dump buffer size is always aligned to page size so that it can be mmaped. Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more generic and reserve NT_VMCOREDD note type to indicate vmcore device dump. Suggested-by: Eric Biederman <ebiederm@xmission.com>. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11proc/kcore: don't bounds check against address 0Laura Abbott
The existing kcore code checks for bad addresses against __va(0) with the assumption that this is the lowest address on the system. This may not hold true on some systems (e.g. arm64) and produce overflows and crashes. Switch to using other functions to validate the address range. It's currently only seen on arm64 and it's not clear if anyone wants to use that particular combination on a stable release. So this is not urgent for stable. Link: http://lkml.kernel.org/r/20180501201143.15121-1-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Tested-by: Dave Anderson <anderson@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexey Dobriyan <adobriyan@gmail.com>a Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-09proc: Use underscores for SSBD in 'status'Konrad Rzeszutek Wilk
The style for the 'status' file is CamelCase or this. _. Fixes: fae1fa0fc ("proc: Provide details on speculation flaw mitigations") Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-09mm/pkeys, x86, powerpc: Display pkey in smaps if arch supports pkeysRam Pai
Currently the architecture specific code is expected to display the protection keys in smap for a given vma. This can lead to redundant code and possibly to divergent formats in which the key gets displayed. This patch changes the implementation. It displays the pkey only if the architecture support pkeys, i.e arch_pkeys_enabled() returns true. x86 arch_show_smap() function is not needed anymore, delete it. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Split out from larger patch, rebased on header changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
2018-05-09mm, powerpc, x86: introduce an additional vma bit for powerpc pkeyRam Pai
Currently only 4bits are allocated in the vma flags to hold 16 keys. This is sufficient for x86. PowerPC supports 32 keys, which needs 5bits. This patch allocates an additional bit. Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Fold in #if VM_PKEY_BIT4 as noticed by Dave Hansen] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-09mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabledRam Pai
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is enabled. Powerpc also needs these bits. Hence lets define the VM_PKEY_BITx bits for any architecture that enables CONFIG_ARCH_HAS_PKEYS. Reviewed-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-05prctl: Add force disable speculationThomas Gleixner
For certain use cases it is desired to enforce mitigations so they cannot be undone afterwards. That's important for loader stubs which want to prevent a child from disabling the mitigation again. Will also be used for seccomp(). The extra state preserving of the prctl state for SSB is a preparatory step for EBPF dymanic speculation control. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-03proc: Provide details on speculation flaw mitigationsKees Cook
As done with seccomp and no_new_privs, also show speculation flaw mitigation state in /proc/$pid/status. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-04-27Merge tag 'v4.17-rc2' into docs-nextJonathan Corbet
Merge -rc2 to pick up the changes to Documentation/core-api/kernel-api.rst that hit mainline via the networking tree. In their absence, subsequent patches cannot be applied.
2018-04-27docs/admin-guide/mm: start moving here files from Documentation/vmMike Rapoport
Several documents in Documentation/vm fit quite well into the "admin/user guide" category. The documents that don't overload the reader with lots of implementation details and provide coherent description of certain feature can be moved to Documentation/admin-guide/mm. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-04-20proc: fix /proc/loadavg regressionAlexey Dobriyan
Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") changed last field of /proc/loadavg (last pid allocated) to be off by one: # unshare -p -f --mount-proc cat /proc/loadavg 0.00 0.00 0.00 1/60 2 <=== It should be 1 after first fork into pid namespace. This is formally a regression but given how useless this field is I don't think anyone is affected. Bug was found by /proc testsuite! Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2 Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20proc: revalidate kernel thread inodes to root:rootAlexey Dobriyan
task_dump_owner() has the following code: mm = task->mm; if (mm) { if (get_dumpable(mm) != SUID_DUMP_USER) { uid = ... } } Check for ->mm is buggy -- kernel thread might be borrowing mm and inode will go to some random uid:gid pair. Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20mm, pagemap: fix swap offset value for PMD migration entryHuang Ying
The swap offset reported by /proc/<pid>/pagemap may be not correct for PMD migration entries. If addr passed into pagemap_pmd_range() isn't aligned with PMD start address, the swap offset reported doesn't reflect this. And in the loop to report information of each sub-page, the swap offset isn't increased accordingly as that for PFN. This may happen after opening /proc/<pid>/pagemap and seeking to a page whose address doesn't align with a PMD start address. I have verified this with a simple test program. BTW: migration swap entries have PFN information, do we need to restrict whether to show them? [akpm@linux-foundation.org: fix typo, per Huang, Ying] Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Jerome Glisse" <jglisse@redhat.com> Cc: Daniel Colascione <dancol@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-16Merge branch 'mm-rst' into docs-nextJonathan Corbet
Mike Rapoport says: These patches convert files in Documentation/vm to ReST format, add an initial index and link it to the top level documentation. There are no contents changes in the documentation, except few spelling fixes. The relatively large diffstat stems from the indentation and paragraph wrapping changes. I've tried to keep the formatting as consistent as possible, but I could miss some places that needed markup and add some markup where it was not necessary. [jc: significant conflicts in vm/hmm.rst]
2018-04-16docs/vm: rename documentation files to .rstMike Rapoport
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-04-13proc: revalidate misc dentriesAlexey Dobriyan
If module removes proc directory while another process pins it by chdir'ing to it, then subsequent recreation of proc entry and all entries down the tree will not be visible to any process until pinning process unchdir from directory and unpins everything. Steps to reproduce: proc_mkdir("aaa", NULL); proc_create("aaa/bbb", ...); chdir("/proc/aaa"); remove_proc_entry("aaa/bbb", NULL); remove_proc_entry("aaa", NULL); proc_mkdir("aaa", NULL); # inaccessible because "aaa" dentry still points # to the original "aaa". proc_create("aaa/bbb", ...); Fix is to implement ->d_revalidate and ->d_delete. Link: http://lkml.kernel.org/r/20180312201938.GA4871@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11fs/proc/proc_sysctl.c: fix typo in sysctl_check_table_array()Waiman Long
Patch series "ipc: Clamp *mni to the real IPCMNI limit", v3. The sysctl parameters msgmni, shmmni and semmni have an inherent limit of IPC_MNI (32k). However, users may not be aware of that because they can write a value much higher than that without getting any error or notification. Reading the parameters back will show the newly written values which are not real. Enforcing the limit by failing sysctl parameter write, however, can break existing user applications. To address this delemma, a new flags field is introduced into the ctl_table. The value CTL_FLAGS_CLAMP_RANGE can be added to any ctl_table entries to enable a looser range clamping without returning any error. For example, .flags = CTL_FLAGS_CLAMP_RANGE, This flags value are now used for the range checking of shmmni, msgmni and semmni without breaking existing applications. If any out of range value is written to those sysctl parameters, the following warning will be printed instead. Kernel parameter "shmmni" was set out of range [0, 32768], clamped to 32768. Reading the values back will show 32768 instead of some fake values. This patch (of 6): Fix a typo. Link: http://lkml.kernel.org/r/1519926220-7453-2-git-send-email-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: use slower rb_first()Alexey Dobriyan
In a typical for /proc "open+read+close" usecase, dentry is looked up successfully on open only to be killed in dput() on close. In fact dentries which aren't /proc/*/... and /proc/sys/* were almost NEVER CACHED. Simple printk in proc_lookup_de() shows that. Now that ->delete hook intelligently picks which dentries should live in dcache and which should not, rbtree caching is not necessary as dcache does it job, at last! As a side effect, struct proc_dir_entry shrinks by one pointer which can go into inline name. Link: http://lkml.kernel.org/r/20180314231032.GA15854@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: switch struct proc_dir_entry::count to refcountAlexey Dobriyan
->count is honest reference count unlike ->in_use. Link: http://lkml.kernel.org/r/20180313174550.GA4332@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: reject "." and ".." as filenamesAlexey Dobriyan
Various subsystems can create files and directories in /proc with names directly controlled by userspace. Which means "/", "." and ".." are no-no. "/" split is already taken care of, do the other 2 prohibited names. Link: http://lkml.kernel.org/r/20180310001223.GB12443@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: do mmput ASAP for /proc/*/map_filesAlexey Dobriyan
mm_struct is not needed while printing as all the data was already extracted. Link: http://lkml.kernel.org/r/20180309223120.GC3843@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: faster /proc/cmdlineAlexey Dobriyan
Use seq_puts() and skip format string processing. Link: http://lkml.kernel.org/r/20180309222948.GB3843@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: register filesystem lastAlexey Dobriyan
As soon as register_filesystem() exits, filesystem can be mounted. It is better to present fully operational /proc. Of course it doesn't matter because /proc is not modular but do it anyway. Drop error check, it should be handled by panicking. Link: http://lkml.kernel.org/r/20180309222709.GA3843@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: fix /proc/*/map_files lookup some moreAlexey Dobriyan
I totally forgot that _parse_integer() accepts arbitrary amount of leading zeroes leading to the following lookups: OK # readlink /proc/1/map_files/56427ecba000-56427eddc000 /lib/systemd/systemd bogus # readlink /proc/1/map_files/00000000000056427ecba000-56427eddc000 /lib/systemd/systemd # readlink /proc/1/map_files/56427ecba000-00000000000056427eddc000 /lib/systemd/systemd Link: http://lkml.kernel.org/r/20180303215130.GA23480@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: move "struct proc_dir_entry" into kmem cacheAlexey Dobriyan
"struct proc_dir_entry" is variable sized because of 0-length trailing array for name, however, because of SLAB padding allocations it is possible to make "struct proc_dir_entry" fixed sized and allocate same amount of memory. It buys fine-grained debugging with poisoning and usercopy protection which is not possible with kmalloc-* caches. Currently, on 32-bit 91+ byte allocations go into kmalloc-128 and on 64-bit 147+ byte allocations go to kmalloc-192 anyway. Additional memory is allocated only for 38/46+ byte long names which are rare or may not even exist in the wild. Link: http://lkml.kernel.org/r/20180223205504.GA17139@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11fs/proc/proc_sysctl.c: remove redundant link check in proc_sys_link_fill_cache()Danilo Krummrich
proc_sys_link_fill_cache() does not need to check whether we're called for a link - it's already done by scan(). Link: http://lkml.kernel.org/r/20180228013506.4915-2-danilokrummrich@dk-develop.de Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Luis R . Rodriguez" <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl tableDanilo Krummrich
proc_sys_link_fill_cache() does not take currently unregistering sysctl tables into account, which might result into a page fault in sysctl_follow_link() - add a check to fix it. This bug has been present since v3.4. Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Luis R . Rodriguez" <mcgrof@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: use set_puts() at /proc/*/wchanAlexey Dobriyan
Link: http://lkml.kernel.org/r/20180217072011.GB16074@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: check permissions earlier for /proc/*/wchanAlexey Dobriyan
get_wchan() accesses stack page before permissions are checked, let's not play this game. Link: http://lkml.kernel.org/r/20180217071923.GA16074@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: replace seq_printf by seq_put_smth to speed up /proc/pid/statusAndrei Vagin
seq_printf() works slower than seq_puts, seq_puts, etc. == test_proc.c int main(int argc, char **argv) { int n, i, fd; char buf[16384]; n = atoi(argv[1]); for (i = 0; i < n; i++) { fd = open(argv[2], O_RDONLY); if (fd < 0) return 1; if (read(fd, buf, sizeof(buf)) <= 0) return 1; close(fd); } return 0; } == $ time ./test_proc 1000000 /proc/1/status == Before path == real 0m5.171s user 0m0.328s sys 0m4.783s == After patch == real 0m4.761s user 0m0.334s sys 0m4.366s Link: http://lkml.kernel.org/r/20180212074931.7227-4-avagin@openvz.org Signed-off-by: Andrei Vagin <avagin@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: replace seq_printf on seq_putc to speed up /proc/pid/smapsAndrei Vagin
seq_putc() works much faster than seq_printf() == Before patch == $ time python test_smaps.py real 0m3.828s user 0m0.413s sys 0m3.408s == After patch == $ time python test_smaps.py real 0m3.405s user 0m0.401s sys 0m3.003s == Before patch == - 75.51% 4.62% python [kernel.kallsyms] [k] show_smap.isra.33 - 70.88% show_smap.isra.33 + 24.82% seq_put_decimal_ull_aligned + 19.78% __walk_page_range + 12.74% seq_printf + 11.08% show_map_vma.isra.23 + 1.68% seq_puts == After patch == - 69.16% 5.70% python [kernel.kallsyms] [k] show_smap.isra.33 - 63.46% show_smap.isra.33 + 25.98% seq_put_decimal_ull_aligned + 20.90% __walk_page_range + 12.60% show_map_vma.isra.23 1.56% seq_putc + 1.55% seq_puts Link: http://lkml.kernel.org/r/20180212074931.7227-2-avagin@openvz.org Signed-off-by: Andrei Vagin <avagin@openvz.org> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: add seq_put_decimal_ull_width to speed up /proc/pid/smapsAndrei Vagin
seq_put_decimal_ull_w(m, str, val, width) prints a decimal number with a specified minimal field width. It is equivalent of seq_printf(m, "%s%*d", str, width, val), but it works much faster. == test_smaps.py num = 0 with open("/proc/1/smaps") as f: for x in xrange(10000): data = f.read() f.seek(0, 0) == == Before patch == $ time python test_smaps.py real 0m4.593s user 0m0.398s sys 0m4.158s == After patch == $ time python test_smaps.py real 0m3.828s user 0m0.413s sys 0m3.408s $ perf -g record python test_smaps.py == Before patch == - 79.01% 3.36% python [kernel.kallsyms] [k] show_smap.isra.33 - 75.65% show_smap.isra.33 + 48.85% seq_printf + 15.75% __walk_page_range + 9.70% show_map_vma.isra.23 0.61% seq_puts == After patch == - 75.51% 4.62% python [kernel.kallsyms] [k] show_smap.isra.33 - 70.88% show_smap.isra.33 + 24.82% seq_put_decimal_ull_w + 19.78% __walk_page_range + 12.74% seq_printf + 11.08% show_map_vma.isra.23 + 1.68% seq_puts [akpm@linux-foundation.org: fix drivers/of/unittest.c build] Link: http://lkml.kernel.org/r/20180212074931.7227-1-avagin@openvz.org Signed-off-by: Andrei Vagin <avagin@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: account "struct pde_opener"Alexey Dobriyan
The allocation is persistent in fact as any fool can open a file in /proc and sit on it. Link: http://lkml.kernel.org/r/20180214082409.GC17157@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: move "struct pde_opener" to kmem cacheAlexey Dobriyan
"struct pde_opener" is fixed size and we can have more granular approach to debugging. For those who don't know, per cache SLUB poisoning and red zoning don't work if there is at least one object allocated which is hopeless in case of kmalloc-64 but not in case of standalone cache. Although systemd opens 2 files from the get go, so it is hopeless after all. Link: http://lkml.kernel.org/r/20180214082306.GB17157@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: randomize "struct pde_opener"Alexey Dobriyan
The more the merrier. Link: http://lkml.kernel.org/r/20180214081935.GA17157@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11proc: faster open/close of files without ->release hookAlexey Dobriyan
The whole point of code in fs/proc/inode.c is to make sure ->release hook is called either at close() or at rmmod time. All if it is unnecessary if there is no ->release hook. Save allocation+list manipulations under spinlock in that case. Link: http://lkml.kernel.org/r/20180214063033.GA15579@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>