summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-07-30ARM: dma-mapping: fix error path for memory allocation failureMarek Szyprowski
This patch fixes incorrect check in error path. When the allocation of first page fails, the kernel ops appears due to accessing -1 element of the pages array. Reported-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2012-07-30ARM: dma-mapping: add more sanity checks in arm_dma_mmap()Marek Szyprowski
Add some sanity checks and forbid mmaping of buffers into vma areas larger than allocated dma buffer. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2012-07-30ARM: dma-mapping: remove custom consistent dma regionMarek Szyprowski
This patch changes dma-mapping subsystem to use generic vmalloc areas for all consistent dma allocations. This increases the total size limit of the consistent allocations and removes platform hacks and a lot of duplicated code. Atomic allocations are served from special pool preallocated on boot, because vmalloc areas cannot be reliably created in atomic context. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Minchan Kim <minchan@kernel.org>
2012-07-30mm: vmalloc: use const void * for caller argumentMarek Szyprowski
'const void *' is a safer type for caller function type. This patch updates all references to caller function type. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Minchan Kim <minchan@kernel.org>
2012-07-30scatterlist: add sg_alloc_table_from_pages functionTomasz Stanislawski
This patch adds a new constructor for an sg table. The table is constructed from an array of struct pages. All contiguous chunks of the pages are merged into a single sg nodes. A user may provide an offset and a size of a buffer if the buffer is not page-aligned. The function is dedicated for DMABUF exporters which often perform conversion from an page array to a scatterlist. Moreover the scatterlist should be squashed in order to save memory and to speed-up the process of DMA mapping using dma_map_sg. The code is based on the patch 'v4l: vb2-dma-contig: add support for scatterlist in userptr mode' and hints from Laurent Pinchart. Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> CC: Andrew Morton <akpm@linux-foundation.org>
2012-07-30mm: Fix build warning in kmem_cache_create()Shuah Khan
The label oops is used in CONFIG_DEBUG_VM ifdef block and is defined outside ifdef CONFIG_DEBUG_VM block. This results in the following build warning when built with CONFIG_DEBUG_VM disabled. Fix to move label oops definition to inside a CONFIG_DEBUG_VM block. mm/slab_common.c: In function ‘kmem_cache_create’: mm/slab_common.c:101:1: warning: label ‘oops’ defined but not used [-Wunused-label] Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-30hwmon: struct x86_cpu_id arrays can be __initconstJan Beulich
... as being referenced from __init code only. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-30uprobes: __replace_page() needs munlock_vma_page()Oleg Nesterov
Like do_wp_page(), __replace_page() should do munlock_vma_page() for the case when the old page still has other !VM_LOCKED mappings. Unfortunately this needs mm/internal.h. Also, move put_page() outside of ptl lock. This doesn't really matter but looks a bit better. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182249.GA20372@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Rename vma_address() and make it return "unsigned long"Oleg Nesterov
1. vma_address() returns loff_t, this looks confusing and this is unnecessary after the previous change. Make it return "ulong", all callers truncate the result anyway. 2. Its name conflicts with mm/rmap.c:vma_address(), rename it to offset_to_vaddr(), this matches vaddr_to_offset(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182247.GA20365@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Fix register_for_each_vma()->vma_address() checkOleg Nesterov
1. register_for_each_vma() checks that vma_address() == vaddr, but this is not enough. We should also ensure that vaddr >= vm_start, find_vma() guarantees "vaddr < vm_end" only. 2. After the prevous changes, register_for_each_vma() is the only reason why vma_address() has to return loff_t, all other users know that we have the valid mapping at this offset and thus the overflow is not possible. Change the code to use vaddr_to_offset() instead, imho this looks more clean/understandable and now we can change vma_address(). 3. While at it, remove the unnecessary type-cast. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182244.GA20362@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Introduce vaddr_to_offset(vma, vaddr)Oleg Nesterov
Add the new helper, vaddr_to_offset(vma, vaddr) which returns the offset in vma->vm_file this vaddr is mapped at. Change build_probe_list() and find_active_uprobe() to use the new helper, the next patch adds another user. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182242.GA20355@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Teach build_probe_list() to consider the rangeOleg Nesterov
Currently build_probe_list() builds the list of all uprobes attached to the given inode, and the caller should filter out those who don't fall into the [start,end) range, this is sub-optimal. This patch turns find_least_offset_node() into find_node_in_range() which returns the first node inside the [min,max] range, and changes build_probe_list() to use this node as a starting point for rb_prev() and rb_next() to find all other nodes the caller needs. The resulting list is no longer sorted but we do not care. This can speed up both build_probe_list() and the callers, but there is another reason to introduce find_node_in_range(). It can be used to figure out whether the given vma has uprobes or not, this will be needed soon. While at it, shift INIT_LIST_HEAD(tmp_list) into build_probe_list(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182240.GA20352@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Remove insert_vm_struct()->uprobe_mmap()Oleg Nesterov
Remove insert_vm_struct()->uprobe_mmap(). It is not needed, nobody except arch/ia64/kernel/perfmon.c uses insert_vm_struct(vma) with vma->vm_file != NULL. And it is wrong. Again, get_user_pages() can not succeed before vma_link(vma) makes is visible to find_vma(). And even if this worked, we must not insert the new bp before this mapping is visible to vma_prio_tree_foreach() for uprobe_unregister(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182238.GA20349@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Remove copy_vma()->uprobe_mmap()Oleg Nesterov
Remove copy_vma()->uprobe_mmap(new_vma), it is absolutely wrong. This new_vma was just initialized to represent the new unmapped area, [vm_start, vm_end) was returned by get_unmapped_area() in the caller. This means that uprobe_mmap()->get_user_pages() will fail for sure, simply because find_vma() can never succeed. And I verified that sys_mremap()->mremap_to() indeed always fails with the wrong ENOMEM code if [addr, addr+old_len] is probed. And why this uprobe_mmap() was added? I believe the intent was wrong. Note that the caller is going to do move_page_tables(), all registered uprobes are already faulted in, we only change the virtual addresses. NOTE: However, somehow we need to close the race with uprobe_register() which relies on map_info->vaddr. This needs another fix I'll try to do later. Probably we need uprobe_mmap() in move_vma() but we can not do this right now, this can confuse uprobes_state.counter (which I still hope we are going to kill). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182236.GA20342@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Fix overflow in vma_address()/find_active_uprobe()Oleg Nesterov
vma->vm_pgoff is "unsigned long", it should be promoted to loff_t before the multiplication to avoid the overflow. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182233.GA20339@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Suppress uprobe_munmap() from mmput()Oleg Nesterov
uprobe_munmap() does get_user_pages() and it is also called from the final mmput()->exit_mmap() path. This slows down exit/mmput() for no reason, and I think it is simply dangerous/wrong to try to fault-in a page into the dying mm. If nothing else, this happens after the last sync_mm_rss(), afaics handle_mm_fault() can change the task->rss_stat and make the subsequent check_mm() unhappy. Change uprobe_munmap() to check mm->mm_users != 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182231.GA20336@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()Oleg Nesterov
The bug was introduced by me in 449d0d7c ("uprobes: Simplify the usage of uprobe->pending_list"). Yes, we do not care about uprobe->pending_list after return and nobody can remove the current list entry, but put_uprobe(uprobe) can actually free it and thus we need list_for_each_safe(). Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Link: http://lkml.kernel.org/r/20120729182229.GA20329@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Clean up and document write_opcode()->lock_page(old_page)Oleg Nesterov
The comment above write_opcode()->lock_page(old_page) tells about the race with do_wp_page(). I don't really understand which exactly race it means, but afaics this lock_page() was not enough to close all races with do_wp_page(). Anyway, since: 77fc4af1b59d uprobes: Change register_for_each_vma() to take mm->mmap_sem for writing this code is always called with ->mmap_sem held for writing, so we can forget about do_wp_page(). However, we can't simply remove this lock_page(), and the only (afaics) reason is __replace_page()->try_to_free_swap(). Nothing in write_opcode() needs it, move it into __replace_page() and fix the comment. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182220.GA20322@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Kill write_opcode()->lock_page(new_page)Oleg Nesterov
write_opcode() does lock_page(new_page) for no reason. Nobody can see this page until __replace_page() exposes it under ptl lock, and we do nothing with this page after pte_unmap_unlock(). If nothing else, the similar code in do_wp_page() doesn't lock the new page for page_add_new_anon_rmap/set_pte_at_notify. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182218.GA20315@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: __replace_page() should not use page_address_in_vma()Oleg Nesterov
page_address_in_vma(old_page) in __replace_page() is ugly and wrong. The caller already knows the correct virtual address, this page was found by get_user_pages(vaddr). However, page_address_in_vma() can actually fail if page->mapping was cleared by __delete_from_page_cache() after get_user_pages() returns. But this means the race with page reclaim, write_opcode() should not fail, it should retry and read this page again. Probably the race with remove_mapping() is not possible due to page_freeze_refs() logic, but afaics at least shmem_writepage()->shmem_delete_from_page_cache() can clear ->mapping. We could change __replace_page() to return -EAGAIN in this case, but it would be better to simply use the caller's vaddr and rely on page_check_address(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182216.GA20311@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Don't recheck vma/f_mapping in write_opcode()Oleg Nesterov
write_opcode() rechecks valid_vma() and ->f_mapping, this is pointless. The caller, register_for_each_vma() or uprobe_mmap(), has already done these checks under mmap_sem. To clarify, uprobe_mmap() checks valid_vma() only, but we can rely on build_probe_list(vm_file->f_mapping->host). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182212.GA20304@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30s390: make use of user_mode() macro where possibleHeiko Carstens
We use the user_mode() helper already at several places but also have the open coded variant at other places. Convert the code to always use the helper function. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30s390/mm: rename user_mode variable to addressing_modeHeiko Carstens
Fix name clash with user_mode() define which is also used in common code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30s390/mm: fix fault handling for page table walk caseHeiko Carstens
Make sure the kernel does not incorrectly create a SIGBUS signal during user space accesses: For user space accesses in the switched addressing mode case the kernel may walk page tables and access user address space via the kernel mapping. If a page table entry is invalid the function __handle_fault() gets called in order to emulate a page fault and trigger all the usual actions like paging in a missing page etc. by calling handle_mm_fault(). If handle_mm_fault() returns with an error fixup handling is necessary. For the switched addressing mode case all errors need to be mapped to -EFAULT, so that the calling uaccess function can return -EFAULT to user space. Unfortunately the __handle_fault() incorrectly calls do_sigbus() if VM_FAULT_SIGBUS is set. This however should only happen if a page fault was triggered by a user space instruction. For kernel mode uaccesses the correct action is to only return -EFAULT. So user space may incorrectly see SIGBUS signals because of this bug. For current machines this would only be possible for the switched addressing mode case in conjunction with futex operations. Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30s390/mm: make page faults killableHeiko Carstens
This is the s390 variant of 37b23e05 "x86,mm: make pagefault killable". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30ALSA: es1688 - freeup resources on init failureFengguang Wu
This will fix the following oops: [ 6.169981] genirq: Flags mismatch irq 5. 00000000 (ES1688) vs. 00000000 (ES1688) [ 6.170851] Pid: 1, comm: swapper Not tainted 3.5.0-00004-gceee0e9 #14 [ 6.170851] Call Trace: [ 6.170851] [<c1062237>] ? __setup_irq+0x3c7/0x420 [ 6.170851] [<c1062486>] ? request_threaded_irq+0x76/0x140 [ 6.170851] [<c1290220>] ? snd_es1688_ioctl+0x10/0x10 [ 6.170851] [<c10624c2>] ? request_threaded_irq+0xb2/0x140 [ 6.170851] [<c1291196>] ? snd_es1688_create+0x96/0x330 [ 6.170851] [<c138365d>] ? snd_gusextreme_probe+0x18d/0x5a2 [ 6.170851] [<c11c9d80>] ? __driver_attach+0x80/0x80 [ 6.170851] [<c10db22f>] ? sysfs_create_link+0xf/0x20 [ 6.170851] [<c11c9d80>] ? __driver_attach+0x80/0x80 [ 6.170851] [<c11d1502>] ? isa_bus_probe+0x12/0x20 [ 6.170851] [<c11c9b95>] ? driver_probe_device+0x55/0x1c0 [ 6.170851] [<c13ae04f>] ? _raw_spin_unlock+0xf/0x30 [ 6.170851] [<c13705ea>] ? klist_next+0x6a/0xe0 [ 6.170851] [<c11d15c1>] ? isa_bus_match+0x21/0x40 [ 6.170851] [<c11c8a24>] ? bus_for_each_drv+0x34/0x70 [ 6.170851] [<c11c9e4b>] ? device_attach+0x7b/0x90 [ 6.170851] [<c11c9d80>] ? __driver_attach+0x80/0x80 [ 6.170851] [<c11c8bff>] ? bus_probe_device+0x5f/0x80 [ 6.170851] [<c11c7493>] ? device_add+0x573/0x620 [ 6.170851] [<c1042820>] ? complete_all+0x40/0x60 [ 6.170851] [<c13ae08a>] ? _raw_spin_unlock_irqrestore+0x1a/0x30 [ 6.170851] [<c11d16c6>] ? isa_register_driver+0xb6/0x150 [ 6.170851] [<c15c9002>] ? alsa_card_gusmax_init+0xf/0xf [ 6.170851] [<c15a99bc>] ? do_one_initcall+0x7f/0x12b [ 6.170851] [<c15a9b7a>] ? kernel_init+0x112/0x1a9 [ 6.170851] [<c15a9423>] ? do_early_param+0x77/0x77 [ 6.170851] [<c15a9a68>] ? do_one_initcall+0x12b/0x12b [ 6.170851] [<c13aefbe>] ? kernel_thread_helper+0x6/0xd [ 6.190170] es1688: can't grab IRQ 5 [ 6.190613] genirq: Flags mismatch irq 5. 00000000 (ES1688) vs. 00000000 (ES1688) [ 6.191566] Pid: 1, comm: swapper Not tainted 3.5.0-00004-gceee0e9 #14 [ 6.192394] Call Trace: [ 6.192685] [<c1062237>] ? __setup_irq+0x3c7/0x420 [ 6.193342] [<c1062486>] ? request_threaded_irq+0x76/0x140 [ 6.194081] [<c1290220>] ? snd_es1688_ioctl+0x10/0x10 [ 6.194607] [<c10624c2>] ? request_threaded_irq+0xb2/0x140 [ 6.194607] [<c1291196>] ? snd_es1688_create+0x96/0x330 [ 6.194607] [<c138365d>] ? snd_gusextreme_probe+0x18d/0x5a2 [ 6.194607] [<c11c9d80>] ? __driver_attach+0x80/0x80 [ 6.194607] [<c10db22f>] ? sysfs_create_link+0xf/0x20 [ 6.194607] [<c11c9d80>] ? __driver_attach+0x80/0x80 [ 6.194607] [<c11d1502>] ? isa_bus_probe+0x12/0x20 [ 6.194607] [<c11c9b95>] ? driver_probe_device+0x55/0x1c0 [ 6.194607] [<c13ae04f>] ? _raw_spin_unlock+0xf/0x30 [ 6.194607] [<c13705ea>] ? klist_next+0x6a/0xe0 [ 6.194607] [<c11d15c1>] ? isa_bus_match+0x21/0x40 [ 6.194607] [<c11c8a24>] ? bus_for_each_drv+0x34/0x70 [ 6.194607] [<c11c9e4b>] ? device_attach+0x7b/0x90 [ 6.194607] [<c11c9d80>] ? __driver_attach+0x80/0x80 [ 6.194607] [<c11c8bff>] ? bus_probe_device+0x5f/0x80 [ 6.194607] [<c11c7493>] ? device_add+0x573/0x620 [ 6.194607] [<c1042820>] ? complete_all+0x40/0x60 [ 6.194607] [<c13ae08a>] ? _raw_spin_unlock_irqrestore+0x1a/0x30 [ 6.194607] [<c11d16c6>] ? isa_register_driver+0xb6/0x150 [ 6.194607] [<c15c9002>] ? alsa_card_gusmax_init+0xf/0xf [ 6.194607] [<c15a99bc>] ? do_one_initcall+0x7f/0x12b [ 6.194607] [<c15a9b7a>] ? kernel_init+0x112/0x1a9 [ 6.194607] [<c15a9423>] ? do_early_param+0x77/0x77 [ 6.194607] [<c15a9a68>] ? do_one_initcall+0x12b/0x12b [ 6.194607] [<c13aefbe>] ? kernel_thread_helper+0x6/0xd [ 6.210779] es1688: can't grab IRQ 5 [ 6.211305] gusextreme: probe of gusextreme.0 failed with error -16 Signed-off-by: Daniel Mack <zonque@gmail.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-07-30fix O_EXCL handling for devicesAl Viro
O_EXCL without O_CREAT has different semantics; it's "fail if already opened", not "fail if already exists". commit 71574865 broke that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-30Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-3.6/driversJens Axboe
2012-07-29net/tun: fix ioctl() based info leaksMathias Krause
The tun module leaks up to 36 bytes of memory by not fully initializing a structure located on the stack that gets copied to user memory by the TUNGETIFF and SIOCGIFHWADDR ioctl()s. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29tg3: Update version to 3.124Michael Chan
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29tg3: Fix race condition in tg3_get_stats64()Michael Chan
Spinlock should be taken before checking for tp->hw_stats. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29tg3: Add New 5719 Read DMA workaroundMichael Chan
After Power-on-reset, the 5719's TX DMA length registers may contain uninitialized values and cause TX DMA to stall. Check for invalid values and set a register bit to flush the TX channels. The bit needs to be turned off after the DMA channels have been flushed. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29tg3: Fix Read DMA workaround for 5719 A0.Michael Chan
The workaround was mis-applied to all 5719 and 5720 chips. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29tg3: Request APE_LOCK_PHY before PHY accessMichael Chan
to prevent PHY access conflict with APE firmware. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29ipv6: fix incorrect route 'expires' value passed to userspaceLi Wei
When userspace use RTM_GETROUTE to dump route table, with an already expired route entry, we always got an 'expires' value(2147157) calculated base on INT_MAX. The reason of this problem is in the following satement: rt->dst.expires - jiffies < INT_MAX gcc promoted the type of both sides of '<' to unsigned long, thus a small negative value would be considered greater than INT_MAX. With the help of Eric Dumazet, do the out of bound checks in rtnl_put_cacheinfo(), _after_ conversion to clock_t. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29mISDN: Bugfix only few bytes are transfered on a connectionKarsten Keil
The test for the fillempty condition was wrong in one place. Changed the variable to the right boolean type. Signed-off-by: Karsten Keil <keil@b1-systems.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29seeq: use PTR_RET at init_module of driverDevendra Naga
the driver sees wether the dev_seeq pointer is having a error that can be read by using the PTR_ERR, and returns it at error case, other wise 0 at success case. the PTR_RET does the same thing, and use PTR_RET instead of redoing the code of PTR_RET Signed-off-by: Devendra Naga <develkernel412222@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29bnx2x: remove cast around the kmalloc in bnx2x_prev_mark_pathDevendra Naga
casting the void pointer is redundant (Documentation/CodingStyle) Signed-off-by: Devendra Naga <develkernel412222@gmail.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29ipv4: clean up put_childLin Ming
The first parameter struct trie *t is not used anymore. Remove it. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29ipv4: fix debug info in tnode_newLin Ming
It should print size of struct rt_trie_node * allocated instead of size of struct rt_trie_node. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29qlge: Add offload features to vlan interfacesbrenohl@br.ibm.com
This patch fills the net_device vlan_features with the proper hardware features, thus, improving the vlan interface performance. With the patch applied, I can see around 148% improvement on a TCP_STREAM test, from 3.5 Gb/s to 8.7 Gb/s. On TCP_RR, I see a 11% improvement, from 18k to 20. The CPU utilization is almost the same on both cases, from the comparison above. Signed-off-by: Breno Leitao <brenohl@br.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30selinux: fix selinux_inode_setxattr oopsAl Viro
OK, what we have so far is e.g. setxattr(path, name, whatever, 0, XATTR_REPLACE) with name being good enough to get through xattr_permission(). Then we reach security_inode_setxattr() with the desired value and size. Aha. name should begin with "security.selinux", or we won't get that far in selinux_inode_setxattr(). Suppose we got there and have enough permissions to relabel that sucker. We call security_context_to_sid() with value == NULL, size == 0. OK, we want ss_initialized to be non-zero. I.e. after everything had been set up and running. No problem... We do 1-byte kmalloc(), zero-length memcpy() (which doesn't oops, even thought the source is NULL) and put a NUL there. I.e. form an empty string. string_to_context_struct() is called and looks for the first ':' in there. Not found, -EINVAL we get. OK, security_context_to_sid_core() has rc == -EINVAL, force == 0, so it silently returns -EINVAL. All it takes now is not having CAP_MAC_ADMIN and we are fucked. All right, it might be a different bug (modulo strange code quoted in the report), but it's real. Easily fixed, AFAICS: Deal with size == 0, value == NULL case in selinux_inode_setxattr() Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Dave Jones <davej@redhat.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-29Merge branch 'next' into for-linusDmitry Torokhov
Prepare second set of changes for 3.6 merge window.
2012-07-30KEYS: linux/key-type.h needs linux/errno.hDavid Howells
linux/key-type.h needs to #include linux/errno.h as it refers to ENOKEY. Without this, with sparc's allmodconfig in one of my test trees, the following error occurs: include/linux/key-type.h: In function 'key_negate_and_link': include/linux/key-type.h:122:43: error: 'ENOKEY' undeclared (first use in this function) include/linux/key-type.h:122:43: note: each undeclared identifier is reported only once for each fun Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-30smack: off by one errorAlan Cox
Consider the input case of a rule that consists entirely of non space symbols followed by a \0. Say 64 + \0 In this case strlen(data) = 64 kzalloc of subject and object are 64 byte objects sscanfdata, "%s %s %s", subject, ...) will put 65 bytes into subject. Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-30virtio-blk: return VIRTIO_BLK_F_FLUSH to header.Rusty Russell
This got renamed and clarified, but let's not break any userspace out there. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30virtio-blk: allow toggling host cache between writeback and writethroughPaolo Bonzini
This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature, which exposes the cache mode in the configuration space and lets the driver modify it. The cache mode is exposed via sysfs. Even if the host does not support the new feature, the cache mode is visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30virtio-blk: Use block layer provided spinlockAsias He
Block layer will allocate a spinlock for the queue if the driver does not provide one in blk_init_queue(). The reason to use the internal spinlock is that blk_cleanup_queue() will switch to use the internal spinlock in the cleanup code path. if (q->queue_lock != &q->__queue_lock) q->queue_lock = &q->__queue_lock; However, processes which are in D state might have taken the driver provided spinlock, when the processes wake up, they would release the block provided spinlock. ===================================== [ BUG: bad unlock balance detected! ] 3.4.0-rc7+ #238 Not tainted ------------------------------------- fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at: [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380 but there are no more locks to release! other info that might help us debug this: 1 lock held by fio/3587: #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff8132661a>] get_request_wait+0x19a/0x250 Other drivers use block layer provided spinlock as well, e.g. SCSI. Switching to the block layer provided spinlock saves a bit of memory and does not increase lock contention. Performance test shows no real difference is observed before and after this patch. Changes in v2: Improve commit log as Michael suggested. Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30virtio-blk: Reset device after blk_cleanup_queue()Asias He
blk_cleanup_queue() will call blk_drian_queue() to drain all the requests before queue DEAD marking. If we reset the device before blk_cleanup_queue() the drain would fail. 1) if the queue is stopped in do_virtblk_request() because device is full, the q->request_fn() will not be called. blk_drain_queue() { while(true) { ... if (!list_empty(&q->queue_head)) __blk_run_queue(q) { if (queue is not stoped) q->request_fn() } ... } } Do no reset the device before blk_cleanup_queue() gives the chance to start the queue in interrupt handler blk_done(). 2) In commit b79d866c8b7014a51f611a64c40546109beaf24a, We abort requests dispatched to driver before blk_cleanup_queue(). There is a race if requests are dispatched to driver after the abort and before the queue DEAD mark. To fix this, instead of aborting the requests explicitly, we can just reset the device after after blk_cleanup_queue so that the device can complete all the requests before queue DEAD marking in the drain process. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-07-30virtio-blk: Call del_gendisk() before disable guest kickAsias He
del_gendisk() might not return due to failing to remove the /sys/block/vda/serial sysfs entry when another thread (udev) is trying to read it. virtblk_remove() vdev->config->reset() : guest will not kick us through interrupt del_gendisk() device_del() kobject_del(): got stuck, sysfs entry ref count non zero sysfs_open_file(): user space process read /sys/block/vda/serial sysfs_get_active() : got sysfs entry ref count dev_attr_show() virtblk_serial_show() blk_execute_rq() : got stuck, interrupt is disabled request cannot be finished This patch fixes it by calling del_gendisk() before we disable guest's interrupt so that the request sent in virtblk_serial_show() will be finished and del_gendisk() will success. This fixes another race in hot-unplug process. It is save to call del_gendisk(vblk->disk) before flush_work(&vblk->config_work) which might access vblk->disk, because vblk->disk is not freed until put_disk(vblk->disk). Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>