summaryrefslogtreecommitdiff
path: root/drivers/net/bonding/bond_main.c
AgeCommit message (Collapse)Author
2013-10-25net: fix rtnl notification in atomic contextAlexei Starovoitov
commit 991fb3f74c "dev: always advertise rx_flags changes via netlink" introduced rtnl notification from __dev_set_promiscuity(), which can be called in atomic context. Steps to reproduce: ip tuntap add dev tap1 mode tap ifconfig tap1 up tcpdump -nei tap1 & ip tuntap del dev tap1 mode tap [ 271.627994] device tap1 left promiscuous mode [ 271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940 [ 271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip [ 271.677525] INFO: lockdep is turned off. [ 271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G W 3.12.0-rc3+ #73 [ 271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 271.731254] ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428 [ 271.760261] ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8 [ 271.790683] 0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8 [ 271.822332] Call Trace: [ 271.838234] [<ffffffff817544e5>] dump_stack+0x55/0x76 [ 271.854446] [<ffffffff8108bad1>] __might_sleep+0x181/0x240 [ 271.870836] [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0 [ 271.887076] [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0 [ 271.903368] [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0 [ 271.919716] [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0 [ 271.936088] [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0 [ 271.952504] [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0 [ 271.968902] [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100 [ 271.985302] [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0 [ 272.001642] [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0 [ 272.017917] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380 [ 272.033961] [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50 [ 272.049855] [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0 [ 272.065494] [<ffffffff81732052>] packet_notifier+0x1b2/0x380 [ 272.080915] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380 [ 272.096009] [<ffffffff81761c66>] notifier_call_chain+0x66/0x150 [ 272.110803] [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10 [ 272.125468] [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20 [ 272.139984] [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70 [ 272.154523] [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20 [ 272.168552] [<ffffffff816224c5>] rollback_registered_many+0x145/0x240 [ 272.182263] [<ffffffff81622641>] rollback_registered+0x31/0x40 [ 272.195369] [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90 [ 272.208230] [<ffffffff81547ca0>] __tun_detach+0x140/0x340 [ 272.220686] [<ffffffff81547ed6>] tun_chr_close+0x36/0x60 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22bonding: move bond-specific init after enslave happensVeaceslav Falico
As Jiri noted, currently we first do all bonding-specific initialization (specifically - bond_select_active_slave(bond)) before we actually attach the slave (so that it becomes visible through bond_for_each_slave() and friends). This might result in bond_select_active_slave() not seeing the first/new slave and, thus, not actually selecting an active slave. Fix this by moving all the bond-related init part after we've actually completely initialized and linked (via bond_master_upper_dev_link()) the new slave. Also, remove the bond_(de/a)ttach_slave(), it's useless to have functions to ++/-- one int. After this we have all the initialization of the new slave *before* linking, and all the stuff that needs to be done on bonding *after* it. It has also a bonus effect - we can remove the locking on the new slave init completely, and only use it for bond_select_active_slave(). Reported-by: Jiri Pirko <jiri@resnulli.us> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Ding Tianhong@huawei.com Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19bonding: remove bond_ioctl_change_active()Jiri Pirko
no longer needed since bond_option_active_slave_set() can be used instead. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19bonding: push Netlink bits into separate fileJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03bonding: modify the old and add new xmit hash policiesNikolay Aleksandrov
This patch adds two new hash policy modes which use skb_flow_dissect: 3 - Encapsulated layer 2+3 4 - Encapsulated layer 3+4 There should be a good improvement for tunnel users in those modes. It also changes the old hash functions to: hash ^= (__force u32)flow.dst ^ (__force u32)flow.src; hash ^= (hash >> 16); hash ^= (hash >> 8); Where hash will be initialized either to L2 hash, that is SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted from the upper layer. Flow's dst and src are also extracted based on the xmit policy either directly from the buffer or by using skb_flow_dissect, but in both cases if the protocol is IPv6 then dst and src are obtained by ipv6_addr_hash() on the real addresses. In case of a non-dissectable packet, the algorithms fall back to L2 hashing. The bond_set_mode_ops() function is now obsolete and thus deleted because it was used only to set the proper hash policy. Also we trim a pointer from struct bonding because we no longer need to keep the hash function, now there's only a single hash function - bond_xmit_hash that works based on bond->params.xmit_policy. The hash function and skb_flow_dissect were suggested by Eric Dumazet. The layer names were suggested by Andy Gospodarek, because I suck at semantics. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/usb/qmi_wwan.c drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h include/net/netfilter/nf_conntrack_synproxy.h include/net/secure_seq.h The conflicts are of two varieties: 1) Conflicts with Joe Perches's 'extern' removal from header file function declarations. Usually it's an argument signature change or a function being added/removed. The resolutions are trivial. 2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds a new value, another changes an existing value. That sort of thing. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30bonding: RCUify bond_set_rx_mode()Veaceslav Falico
Currently we rely on rtnl locking in bond_set_rx_mode(), however it's not always the case: RTNL: assertion failed at drivers/net/bonding/bond_main.c (3391) ... [<ffffffff81651ca5>] dump_stack+0x54/0x74 [<ffffffffa029e717>] bond_set_rx_mode+0xc7/0xd0 [bonding] [<ffffffff81553af7>] __dev_set_rx_mode+0x57/0xa0 [<ffffffff81557ff8>] __dev_mc_add+0x58/0x70 [<ffffffff81558020>] dev_mc_add+0x10/0x20 [<ffffffff8161e26e>] igmp6_group_added+0x18e/0x1d0 [<ffffffff81186f76>] ? kmem_cache_alloc_trace+0x236/0x260 [<ffffffff8161f80f>] ipv6_dev_mc_inc+0x29f/0x320 [<ffffffff8161f9e7>] ipv6_sock_mc_join+0x157/0x260 ... Fix this by using RCU primitives. Reported-by: Joe Lawrence <joe.lawrence@stratus.com> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30bonding: Fix broken promiscuity reference counting issueNeil Horman
Recently grabbed this report: https://bugzilla.redhat.com/show_bug.cgi?id=1005567 Of an issue in which the bonding driver, with an attached vlan encountered the following errors when bond0 was taken down and back up: dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken. The error occurs because, during __bond_release_one, if we release our last slave, we take on a random mac address and issue a NETDEV_CHANGEADDR notification. With an attached vlan, the vlan may see that the vlan and bond mac address were in sync, but no longer are. This triggers a call to dev_uc_add and dev_set_rx_mode, which enables IFF_PROMISC on the bond device. Then, when we complete __bond_release_one, we use the current state of the bond flags to determine if we should decrement the promiscuity of the releasing slave. But since the bond changed promiscuity state during the release operation, we incorrectly decrement the slave promisc count when it wasn't in promiscuous mode to begin with, causing the above error Fix is pretty simple, just cache the bonding flags at the start of the function and use those when determining the need to set promiscuity. This is also needed for the ALLMULTI flag CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Mark Wu <wudxw@linux.vnet.ibm.com> CC: "David S. Miller" <davem@davemloft.net> Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28bonding: correctly verify for the first slave in bond_enslaveVeaceslav Falico
After commit 1f718f0f4f97145f4072d2d72dcf85069ca7226d ("bonding: populate neighbour's private on enslave"), we've moved the actual 'linking' in the end of the function - so that, once linked, the slave is ready to be used, and is not still in the process of enslaving. However, 802.3ad verified if it's the first slave by looking at the if (bond_first_slave(bond) == new_slave) which, because we've moved the linking to the end, became broken - on the first slave bond_first_slave(bond) returns NULL. Fix this by verifying if the prev_slave, that equals bond_last_slave(), is actually populated - if it is - then it's not the first slave, and vice versa. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26net: create sysfs symlinks for neighbour devicesVeaceslav Falico
Also, remove the same functionality from bonding - it will be already done for any device that links to its lower/upper neighbour. The links will be created for dev's kobject, and will look like lower_eth0 for lower device eth0 and upper_bridge0 for upper device bridge0. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: remove slave listsVeaceslav Falico
And all the initialization. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: remove bond_prev_slave()Veaceslav Falico
We don't really need it, and it's really hard to RCUify the list->prev. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: add bond_has_slaves() and use itVeaceslav Falico
Currently we verify if we have slaves by checking if bond->slave_list is empty. Create a define bond_has_slaves() and use it, a bit more readable and easier to change in the future. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: rework bond_ab_arp_probe() to use bond_for_each_slave()Veaceslav Falico
Currently it uses the hard-to-rcuify bond_for_each_slave_from(), and also it doesn't check every slave for disrepencies between the actual IS_UP(slave) and the slave->link == BOND_LINK_UP, but only till we find the next suitable slave. Fix this by using bond_for_each_slave() and storing the first good slave in *before till we find the current_arp_slave, after that we store the first good slave in new_slave. If new_slave is empty - use the slave stored in before, and if it's also empty - then we didn't find any suitable slave. Also, in the meanwhile, check for each slave status. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: rework bond_find_best_slave() to use bond_for_each_slave()Veaceslav Falico
bond_find_best_slave() does not have to be balanced - i.e. return the slave that is *after* some other slave, but rather return the best slave that suits, except of bond->primary_slave - in which case we just return it if it's suitable. After that we just look through all the slaves and return either first up slave or the slave whose link came back earliest. We also don't care about curr_active_slave lock cause we use it in bond_should_change_active() only and there we take it right away - i.e. it won't go away. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: use bond_for_each_slave() in bond_uninit()Veaceslav Falico
We're safe agains removal there, cause we use neighbours primitives. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: make bond_for_each_slave() use lower neighbour's privateVeaceslav Falico
It needs a list_head *iter, so add it wherever needed. Use both non-rcu and rcu variants. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: remove bond_for_each_slave_continue_reverse()Veaceslav Falico
We only use it in rollback scenarios and can easily use the standart bond_for_each_dev() instead. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26bonding: populate neighbour's private on enslaveVeaceslav Falico
Use the new provided function when attaching the lower slave to populate its ->private with struct slave *new_slave. Also, move it to the end to be able to 'find' it only after it was completely initialized, and deinitialize in the first place on release. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26net: add adj_list to save only neighboursVeaceslav Falico
Currently, we distinguish neighbours (first-level linked devices) from non-neighbours by the neighbour bool in the netdev_adjacent. This could be quite time-consuming in case we would like to traverse *only* through neighbours - cause we'd have to traverse through all devices and check for this flag, and in a (quite common) scenario where we have lots of vlans on top of bridge, which is on top of a bond - the bonding would have to go through all those vlans to get its upper neighbour linked devices. This situation is really unpleasant, cause there are already a lot of cases when a device with slaves needs to go through them in hot path. To fix this, introduce a new upper/lower device lists structure - adj_list, which contains only the neighbours. It works always in pair with the all_adj_list structure (renamed from upper/lower_dev_list), i.e. both of them contain the same links, only that all_adj_list contains also non-neighbour device links. It's really a small change visible, currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't change the main linked logic at all. Also, add some comments a fix a name collision in netdev_for_each_upper_dev_rcu() and rework the naming by the following rules: netdev_(all_)(upper|lower)_* If "all_" is present, then we work with the whole list of upper/lower devices, otherwise - only with direct neighbours. Uninline functions - to get better stack traces. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-15bonding: Make alb learning packet interval configurableNeil Horman
running bonding in ALB mode requires that learning packets be sent periodically, so that the switch knows where to send responding traffic. However, depending on switch configuration, there may not be any need to send traffic at the default rate of 3 packets per second, which represents little more than wasted data. Allow the ALB learning packet interval to be made configurable via sysfs Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Acked-by: Veaceslav Falico <vfalico@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11bonding: fix bond_arp_rcv setting and arp validate desync statenikolay@redhat.com
We make bond_arp_rcv global so it can be used in bond_sysfs if the bond interface is up and arp_interval is being changed to a positive value and cleared otherwise as per Jay's suggestion. This also fixes a problem where bond_arp_rcv was set even though arp_validate was disabled while the bond was up by unsetting recv_probe in bond_store_arp_validate and respectively setting it if enabled. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04bonding: drop read_lock in bond_compute_featuresnikolay@redhat.com
bond_compute_features is always called with RTNL held, so we can safely drop the read bond->lock. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04bonding: drop read_lock in bond_fix_featuresnikolay@redhat.com
We're protected by RTNL so nothing can happen and we can safely drop the read bond->lock. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04bonding: trivial: remove outdated comment and bracesnikolay@redhat.com
We don't have to release all slaves when closing the bond dev, so remove the outdated comment and the braces around the left single statement. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04bonding: simplify and fix peer notificationnikolay@redhat.com
This patch aims to remove a use of the bond->lock for mutual exclusion which will later allow easier migration to RCU of the users of this functionality. We use RTNL as a synchronizing mechanism since it's always held when send_peer_notif is set, and when it is decremented from the notifier function. We can also drop some locking, and fix the leakage of the send_peer_notif counter. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29bonding: pr_debug instead of pr_warn in bond_arp_send_allVeaceslav Falico
They're simply annoying and will spam dmesg constantly if we hit them, so convert to pr_debug so that we still can access them in case of debugging. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29bonding: remove vlan_list/current_alb_vlanVeaceslav Falico
Currently there are no real users of vlan_list/current_alb_vlan, only the helpers which maintain them, so remove them. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29bonding: use vlan_uses_dev() in __bond_release_one()Veaceslav Falico
We always hold the rtnl_lock() in __bond_release_one(), so use vlan_uses_dev() instead of bond_vlan_used(). CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29bonding: convert bond_has_this_ip() to use upper devicesVeaceslav Falico
Currently, bond_has_this_ip() is aware only of vlan upper devices, and thus will return false if the address is associated with the upper bridge or any other device, and thus will break the arp logic. Fix this by using the upper device list. For every upper device we verify if the address associated with it is our address, and if yes - return true. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29bonding: make bond_arp_send_all use upper device listVeaceslav Falico
Currently, bond_arp_send_all() is aware only of vlans, which breaks configurations like bond <- bridge (or any other 'upper' device) with IP (which is quite a common scenario for virt setups). To fix this we convert the bond_arp_send_all() to first verify if the rt device is the bond itself, and if not - to go through its list of upper vlans and their respectiv upper devices (if the vlan's upper device matches - tag the packet), if still not found - go through all of our upper list devices to see if any of them match the route device for the target. If the match is a vlan device - we also save its vlan_id and tag it in bond_arp_send(). Also, clean the function a bit to be more readable. CC: Vlad Yasevich <vyasevic@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-25bonding: fix error return code in bond_enslave()Wei Yongjun
Fix to return a negative error code in the add bond vlan ids error handling case instead of 0, as done elsewhere in this function. Introduced by commit 1ff412ad7714f6952f76ffd77f0a7f2f563288a1. (bonding: change the bond's vlan syncing functions with the standard ones) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08bonding: unwind on bond_add_vlan failurenikolay@redhat.com
In case of bond_add_vlan() failure currently we'll have the vlan's refcnt bumped up in all slaves, but it will never go down because it failed to get added to the bond, so properly unwind the added vlan if bond_add_vlan fails. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08bonding: change the bond's vlan syncing functions with the standard onesnikolay@redhat.com
Now we have vlan_vids_add/del_by_dev() which serve the same purpose as bond's bond_add/del_vlans_on_slave() with the good side effect of reverting the changes if one of the additions fails. There's only 1 change in the behaviour of enslave: if adding of the vlans to the slave fails, we'll fail the enslaving because otherwise we might delete some vlan that wasn't added by the bonding. The only way this may happen is with ENOMEM currently, so we're in trouble anyway. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05bonding: remove locking from bond_set_rx_mode()Veaceslav Falico
We're already protected by RTNL lock, so nothing can happen to bond/its slaves, and thus the locking is useless here (both bond->lock and bond->curr_active_slave). Also, add ASSERT_RTNL() both to bond_set_rx_mode() and bond_hw_addr_swap() to catch possible uses of it without RTNL locking. This patch also saves us from a lockdep false-positive in bond_set_rx_mode() vs bond_hw_addr_swap(). CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05bonding: add bond_time_in_interval() and use it for time comparisonVeaceslav Falico
Currently we use a lot of time comparison math for arp_interval comparisons, which are sometimes quite hard to read and understand. All the time comparisons have one pattern: (time - arp_interval_jiffies) <= jiffies <= (time + mod * arp_interval_jiffies + arp_interval_jiffies/2) Introduce a new helper - bond_time_in_interval(), which will do the math in one place and, thus, will clean up the logical code. This helper introduces a bit of overhead (by always calculating the jiffies from arp_interval), however it's really not visible, considering that functions using it usually run once in arp_interval milliseconds. There are several lines slightly over 80 chars, however breaking them would result in more hard-to-read code than several character after the 80 mark. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05bonding: call slave_last_rx() only once per slaveVeaceslav Falico
Simple cleanup to not call slave_last_rx() on every time function. It won't give any measurable boost - but looks cleaner and easier to understand. There are no time-consuming functions in between these calls, so it's safe to call it in the beginning only once. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02bonding: modify only neigh_parms owned by usVeaceslav Falico
Otherwise, on neighbour creation, bond_neigh_init() will be called with a foreign netdev. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01bonding: initial RCU conversionnikolay@redhat.com
This patch does the initial bonding conversion to RCU. After it the following modes are protected by RCU alone: roundrobin, active-backup, broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for reading, and will be dealt with later. curr_active_slave needs to be dereferenced via rcu in the converted modes because the only thing protecting the slave after this patch is rcu_read_lock, so we need the proper barrier for weakly ordered archs and to make sure we don't have stale pointer. It's not tagged with __rcu yet because there's still work to be done to remove the curr_slave_lock, so sparse will complain when rcu_assign_pointer and rcu_dereference are used, but the alternative to use rcu_dereference_protected would've created much bigger code churn which is more difficult to test and review. That will be converted in time. 1. Active-backup mode 1.1 Perf recording while doing iperf -P 4 - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU in bonding - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU in bonding 1.2. Bandwidth measurements - old bonding: 16.1 gbps consistently - new bonding: 17.5 gbps consistently 2. Round-robin mode 2.1 Perf recording while doing iperf -P 4 - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU in bonding - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU in bonding 2.2 Bandwidth measurements - old bonding: 8 gbps (variable due to packet reorderings) - new bonding: 10 gbps (variable due to packet reorderings) Of course the latency has improved in all converted modes, and moreover while doing enslave/release (since it doesn't affect tx anymore). Also I've stress tested all modes doing enslave/release in a loop while transmitting traffic. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01bonding: factor out slave id tx code and simplify xmit pathsNikolay Aleksandrov
I factored out the tx xmit code which relies on slave id in bond_xmit_slave_id. It is global because later it can be used also in 3ad mode xmit. Unnecessary obvious comments are removed. Active-backup mode is simplified because bond_dev_queue_xmit always consumes the skb. bond_xmit_xor becomes one line because of bond_xmit_slave_id. bond_for_each_slave_from is not used in bond_xmit_slave_id because later when RCU is used we can avoid important race condition by using standard rculist routines. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01bonding: simplify broadcast_xmit functionNikolay Aleksandrov
We don't need to start from the curr_active_slave as the frame will be sent to all eligible slaves anyway, so we remove the unnecessary local variables, checks and comments, and make it use the standard list API. This has the nice side-effect that later when it's converted to RCU a race condition will be avoided which could lead to double packet tx. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01bonding: remove unnecessary read_locks of curr_slave_locknikolay@redhat.com
In all the cases we already hold bond->lock for reading, so the slave can't get away and the check != NULL is sufficient. curr_active_slave can still change after the read_lock is unlocked prior to use of the dereferenced value, so there's no need for it. It either contains a valid slave which we use (and can't get away), or it is NULL which is checked. In some places the read_lock of curr_slave_lock was left because we need it not to change while performing some action (e.g. syncing current active slave's addresses, sending ARP requests through the active slave) such cases will be dealt with individually while converting to RCU. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01bonding: convert to list API and replace bond's custom listnikolay@redhat.com
This patch aims to remove struct bonding's first_slave and struct slave's next and prev pointers, and replace them with the standard Linux list API. The old macros are converted to list API as well and some new primitives are available now. The checks if there're slaves that used slave_cnt have been replaced by the list_empty macro. Also a few small style fixes, changing longest -> shortest line in local variable declarations, leaving an empty line before return and removing unnecessary brackets. This is the first step to gradual RCU conversion. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01bonding: fix system hang due to fast igmp timer reschedulingNikolay Aleksandrov
After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event") we try to acquire rtnl in bond_resend_igmp_join_requests but it can be scheduled with rtnl already held (e.g. when bond_change_active_slave is called with rtnl) causing a loop of immediate reschedules + calls because rtnl_trylock fails each time since it's being already held. For me this issue leads to system hangs very easy: modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod bonding; The fix is to introduce a small (1 jiffy) delay which is enough for the sections holding rtnl to finish without putting any strain on the system. Also adjust the timer in bond_change_active_slave to be 1 jiffy, since most of the time it's called with rtnl already held. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-28bonding: remove bond_resend_igmp_join_requests read_unlock leftovernikolay@redhat.com
After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event") we have 1 read_unlock in bond_resend_igmp_join_requests which isn't paired with a read_lock because it's removed by that commit. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-26bond: cleanup netpoll codestephen hemminger
This started out with fixing a sparse warning, then I realized that the wrapper function bond_netpoll_info could just be removed by rolling it into the enable code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-26bonding: use pre-defined macro in bond_mode_name instead of magic number 0Wang Sheng-Hui
We have BOND_MODE_ROUNDROBIN pre-defined as 0, and it's the lowest mode number. Use it to check the arg lower bound instead of magic number 0 in bond_mode_name. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24bonding: Fixed up a error "do not initialise statics to 0 or NULL" in ↵dingtianhong
bond_main.c The error is found by the checkpatch.pl tools. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24bonding: don't call slave_xxx_netpoll under spinlocksdingtianhong
The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-23net: convert resend IGMP to notifier eventJiri Pirko
Until now, bond_resend_igmp_join_requests() looks for vlans attached to bonding device, bridge where bonding act as port manually. It does not care of other scenarios, like stacked bonds or team device above. Make this more generic and use netdev notifier to propagate the event to upper devices and to actually call ip_mc_rejoin_groups(). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>