summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2011-11-29net: Add queue state xoff flag for stackTom Herbert
Create separate queue state flags so that either the stack or drivers can turn on XOFF. Added a set of functions used in the stack to determine if a queue is really stopped (either by stack or driver) Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller
2011-11-29tcp: do not scale TSO segment size with reordering degreeNeal Cardwell
Since 2005 (c1b4a7e69576d65efc31a8cea0714173c2841244) tcp_tso_should_defer has been using tcp_max_burst() as a target limit for deciding how large to make outgoing TSO packets when not using sysctl_tcp_tso_win_divisor. But since 2008 (dd9e0dda66ba38a2ddd1405ac279894260dc5c36) tcp_max_burst() returns the reordering degree. We should not have tcp_tso_should_defer attempt to build larger segments just because there is more reordering. This commit splits the notion of deferral size used in TSO from the notion of burst size used in cwnd moderation, and returns the TSO deferral limit to its original value. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29atm: br2684: Avoid alignment issuesPascal Hambourg
Use memcmp() instead of cast to u16 when checking the PAD field. Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29atm: br2684: Make headroom and hard_header_len depend on the payload typePascal Hambourg
Routed payload requires less headroom than bridged payload. So do not reallocate headroom if not needed. Also, add worst case AAL5 overhead to netdev->hard_header_len. Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29net: optimize socket timestampingEric Dumazet
We can test/set multiple bits from sk_flags at once, to shorten a bit socket setup/dismantle phase. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29net: dont call jump_label_dec from irq contextEric Dumazet
Igor Maravic reported an error caused by jump_label_dec() being called from IRQ context : BUG: sleeping function called from invalid context at kernel/mutex.c:271 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper 1 lock held by swapper/0: #0: (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340 Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1 Call Trace: <IRQ> [<ffffffff8104f417>] __might_sleep+0x137/0x1f0 [<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370 [<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8109a37f>] ? local_clock+0x6f/0x80 [<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0 [<ffffffff81557929>] ? sock_def_write_space+0x59/0x160 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90 [<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80 [<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50 [<ffffffff81566525>] net_disable_timestamp+0x15/0x20 [<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50 [<ffffffff81557b00>] __sk_free+0x80/0x200 [<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90 [<ffffffff81557cba>] sock_wfree+0x3a/0x70 [<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120 [<ffffffff8155c0b6>] __kfree_skb+0x16/0x30 [<ffffffff8155c119>] kfree_skb+0x49/0x170 [<ffffffff815e936e>] arp_error_report+0x3e/0x90 [<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0 [<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0 [<ffffffff81578d20>] ? neigh_update+0x640/0x640 [<ffffffff81073558>] __do_softirq+0xc8/0x3a0 Since jump_label_{inc|dec} must be called from process context only, we must defer jump_label_dec() if net_disable_timestamp() is called from interrupt context. Reported-by: Igor Maravic <igorm@etf.rs> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29NET: NETROM: Fix formatting.Ralf Baechle
The Linux coding style wants the return statement on its own line. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29NET: NETROM: Cleanup argument SIOCADDRT ioctl argument checking.Ralf Baechle
nr_route.ndigis is unsigned int so the nr_route.ndigis < 0 expression is never true and can be dropped. Doing the nr_ax25_dev_get call later allows the nr_route.ndigis test to bail out without having to dev_put. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Osterried <thomas@osterried.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29NET: NETROM: When adding a route verify length of mnemonic string.Ralf Baechle
struct nr_route_struct's mnemonic permits a string of up to 7 bytes to be used. If userland passes a not zero terminated string to the kernel adding a node to the routing table might result in the kernel attempting to read copy a too long string. Mnemonic is part of the NET/ROM routing protocol; NET/ROM routing table updates only broadcast 6 bytes. The 7th byte in the mnemonic array exists only as a \0 termination character for the kernel code's convenience. Fixed by rejecting mnemonic strings that have no terminating \0 in the first 7 characters. Do this test only NETROM_NODE to avoid breaking NETROM_NEIGH where userland might passing an uninitialized mnemonic field. Initial patch by Dan Carpenter <dan.carpenter@oracle.com>. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Walter Harms <wharms@bfs.de> Cc: Thomas Osterried <thomas@osterried.de> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29NET: AX.25: Check ioctl arguments to avoid overflows further down the road.Ralf Baechle
Very large, nonsenical arguments or use in very extreme conditions could result in integer overflows. Check ioctls arguments to avoid such overflows and return -EINVAL for too large arguments. To allow the use of AX.25 for even the most extreme setup (think packet radio to the Phase 5E mars probe) we make no further attempt to clamp the argument range. Originally reported by Fan Long <longfancn@gmail.com> and a first patch was sent by Xi Wang <xi.wang@gmail.com>. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Xi Wang <xi.wang@gmail.com> Cc: Joerg Reuter <jreuter@yaina.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Thomas Osterried <thomas@osterried.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29dsa: Move switch drivers to new directory drivers/net/dsaBen Hutchings
Support for specific hardware belongs under drivers/net/ not net/. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29dsa: Move all definitions needed by drivers into <net/dsa.h>Ben Hutchings
Any headers included by drivers should be under include/, and any definitions they use are not really private to the core as the name "dsa_priv.h" suggests. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29dsa: Remove unnecessary exportsBen Hutchings
I mistakenly exported functions from slave.c that are only called from dsa.c, part of the same module. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
2011-11-28sch_sfb: use skb_flow_dissect()Eric Dumazet
Current SFB double hashing is not fulfilling SFB theory, if two flows share same rxhash value. Using skb_flow_dissect() permits to really have better hash dispersion, and get tunnelling support as well. Double hashing point was mentioned by Florian Westphal Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28cls_flow: use skb_flow_dissect()Eric Dumazet
Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. This lack of tunnelling support was mentioned by Dan Siemon. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28net: use skb_flow_dissect() in __skb_get_rxhash()Eric Dumazet
No functional changes. This uses the code we factorized in skb_flow_dissect() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28net: introduce skb_flow_dissect()Eric Dumazet
We use at least two flow dissectors in network stack, with known limitations and code duplication. Introduce skb_flow_dissect() to factorize this, highly inspired from existing dissector from __skb_get_rxhash() Note : We extensively use skb_header_pointer(), this permits us to not touch skb at all. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28tcp: tcp_sendmsg() wrong access to sk_route_capsEric Dumazet
Now sk_route_caps is u64, its dangerous to use an integer to store result of an AND operator. It wont work if NETIF_F_SG is moved on the upper part of u64. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28ipv6: Set mcast_hops to IPV6_DEFAULT_MCASTHOPS when -1 was given.Li Wei
We need to set np->mcast_hops to it's default value at this moment otherwise when we use it and found it's value is -1, the logic to get default hop limit doesn't take multicast into account and will return wrong hop limit(IPV6_DEFAULT_HOPLIMIT) which is for unicast. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28net: Fix corruption in /proc/*/net/dev_mcastAnton Blanchard
I just hit this during my testing. Isn't there another bug lurking? BUG kmalloc-8: Redzone overwritten INFO: 0xc0000000de9dec48-0xc0000000de9dec4b. First byte 0x0 instead of 0xcc INFO: Allocated in .__seq_open_private+0x30/0xa0 age=0 cpu=5 pid=3896 .__kmalloc+0x1e0/0x2d0 .__seq_open_private+0x30/0xa0 .seq_open_net+0x60/0xe0 .dev_mc_seq_open+0x4c/0x70 .proc_reg_open+0xd8/0x260 .__dentry_open.clone.11+0x2b8/0x400 .do_last+0xf4/0x950 .path_openat+0xf8/0x480 .do_filp_open+0x48/0xc0 .do_sys_open+0x140/0x250 syscall_exit+0x0/0x40 dev_mc_seq_ops uses dev_seq_start/next/stop but only allocates sizeof(struct seq_net_private) of private data, whereas it expects sizeof(struct dev_iter_state): struct dev_iter_state { struct seq_net_private p; unsigned int pos; /* bucket << BUCKET_SPACE + offset */ }; Create dev_seq_open_ops and use it so we don't have to expose struct dev_iter_state. [ Problem added by commit f04565ddf52e4 (dev: use name hash for dev_seq_ops) -Eric ] Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-28mac80211: remove tracing config symbolJohannes Berg
There's little point in this config symbol, if tracing is disabled the overhead is negligible and if you think it's too bad you can always turn off tracing completely. Also remove the part where we don't have sparse check the tracing code -- it seems that it can now deal with it (or the code changed). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: clean up rx_h_mesh_fwdingThomas Pedersen
Lose about two levels of unnecessary indentation. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: don't initiate path discovery when forwarding frame with unknown DAThomas Pedersen
We used to initiate a path discovery when receiving a frame for which there is no forwarding information. To cut down on PREQ spam, just send a (gated) PERR in response. Also separate path discovery logic from nexthop querying. This patch means we no longer queue frames when forwarding, so kill the PERR TX stuff in discard_frame(). Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28{nl,cfg,mac}80211: implement dot11MeshHWMPperrMinIntervalThomas Pedersen
As per 802.11mb 13.9.11.3 Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: fix forwarded mesh frame queue mappingThomas Pedersen
We can't rely on ieee80211_select_queue() to do its job at this point since the skb->protocol is not yet known. Instead, factor out and reuse the queue mapping logic for injected frames. Also, to mitigate congestion, forwarded frames should be dropped if the outgoing queue was stopped. This was not correctly implemented as we were not checking the right queue. Furthermore, we were dropping frames that had arrived to their destination if that queue was stopped. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: fix switched HWMP frame addressesThomas Pedersen
HWMP originator and target addresses were switched on the air but also on reception, which is why path selection still worked. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: failed forwarded mesh frame addressingThomas Pedersen
Don't write the TA until next hop is actually known, since we might need the original TA for sending a PERR. Previously we would send a PERR to ourself if path resolution for a forwarded frame failed. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28{nl,cfg,mac}80211: Allow Setting Multicast Rate in MeshChun-Yeow Yeoh
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: Make __check_htcap_disable static.Ben Greear
Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: call skb_put() before copying the data (trivial)Eliad Peller
It doesn't have any actual effect here, but we should skb_put() *before* copying the data. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: fix TX warningJohannes Berg
Emmanuel reported that my previous patches to enable handing all fragments to drivers at once triggered the warning that the SKB queue wasn't empty. This is happening when we actually queue up some frames and don't hand them to the driver (queues are stopped). The reason for it is that my code that splices the frame(s) over to the pending queue didn't re-init the local queue, so skb_queue_empty() was false. Fix this by using the _init versions of the splicing. Also, convert the warning to WARN_ON_ONCE. Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Tested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: do not pass AP VLAN vif pointers to driversFelix Fietkau
This fixes frequent WARN_ONs when using AP VLAN + aggregation, as these vifs are virtual and not registered with drivers. Use sta_info_get_bss instead of sta_info_get in aggregation callbacks, so that these callbacks can find the station entry when called with the AP vif. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: log reason and initiator when rx agg is stoppedNikolay Martynov
Add additional debug logging of initiator and reason when rx aggregation session is stopped Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: trivial: use WLAN_BACK_RECIPIENT instead of hardcoded 0Nikolay Martynov
Use WLAN_BACK_RECIPIENT instead of hardcoded 0 for clarity Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: timeout tx agg sessions in way similar to rx agg sessionsNikolay Martynov
Currently tx aggregation is not being timed out even if timeout is specified when aggregation is opened. Tx tid stays active until delba arrives from recipient (i.e. recipient times out tid when it is inactive). The problem with this approach is that delba can get lost in the air and tx tid will stay perpetually opened on the originator while closed on recipient thus all data sent via this tid will be lost. This patch implements tx tid timeouting in way very similar to rx tid timeouting. Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: don't indicate probe resp change in IBSS modeArik Nemtsov
Due the a fall-through in the switch statement, the IBSS mode got a report for AP_RPOBE_RESPONSE change on reconfig. Change this to an AP only notification. Signed-off-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: dereference RCU protected probe_resp pointer correctlyArik Nemtsov
This fixes a sparse warning: cfg.c:502:13: warning: incorrect type in assignment (different address spaces) cfg.c:502:13: expected struct sk_buff *old cfg.c:502:13: got struct sk_buff [noderef] <asn:4>*probe_resp Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: fix duration calculation for QoS NOACK framesSimon Wunderlich
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: Add NoAck per tid supportSimon Wunderlich
This patch contains the processing changes in mac80211. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28wireless: Add NoAck per tid supportSimon Wunderlich
This patch contains the configuration changes in nl80211/cfg80211. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: remove debugfs noack testSimon Wunderlich
This feature has been superseded by the NoAck per Queue feature. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2011-11-28mac80211: fix race between the AGG SM and the Tx data pathEmmanuel Grumbach
When a packet is supposed to sent be as an a-MPDU, mac80211 sets IEEE80211_TX_CTL_AMPDU to let the driver know. On the other hand, mac80211 configures the driver for aggregration with the ampdu_action callback. There is race between these two mechanisms since the following scenario can occur when the BA agreement is torn down: Tx softIRQ drv configuration ========== ================= check OPERATIONAL bit Set the TX_CTL_AMPDU bit in the packet clear OPERATIONAL bit stop Tx AGG Pass Tx packet to the driver. In that case the driver would get a packet with TX_CTL_AMPDU set although it has already been notified that the BA session has been torn down. To fix this, we need to synchronize all the Qdisc activity after we cleared the OPERATIONAL bit. After that step, all the following packets will be buffered until the driver reports it is ready to get new packets for this RA / TID. This buffering allows not to run into another race that would send packets with TX_CTL_AMPDU unset while the driver hasn't been requested to tear down the BA session yet. This race occurs in practice and iwlwifi complains with a WARN_ON when it happens. Cc: stable@kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: fix race condition caused by late addBA responseNikolay Martynov
If addBA responses comes in just after addba_resp_timer has expired mac80211 will still accept it and try to open the aggregation session. This causes drivers to be confused and in some cases even crash. This patch fixes the race condition and makes sure that if addba_resp_timer has expired addBA response is not longer accepted and we do not try to open half-closed session. Cc: stable@vger.kernel.org Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com> [some adjustments] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28mac80211: don't stop a single aggregation session twiceJohannes Berg
Nikolay noticed (by code review) that mac80211 can attempt to stop an aggregation session while it is already being stopped. So to fix it, check whether stop is already being done and bail out if so. Also move setting the STOPPING state into the lock so things are properly atomic. Cc: stable@vger.kernel.org Reported-by: Nikolay Martynov <mar.kolya@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-28nl80211: fix MAC address validationEliad Peller
MAC addresses have a fixed length. The current policy allows passing < ETH_ALEN bytes, which might result in reading beyond the buffer. Cc: stable@vger.kernel.org Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-27tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_openNeal Cardwell
The problem: Senders were overriding cwnd values picked during an undo by calling tcp_moderate_cwnd() in tcp_try_to_open(). The fix: Don't moderate cwnd in tcp_try_to_open() if we're in TCP_CA_Open, since doing so is generally unnecessary and specifically would override a DSACK-based undo of a cwnd reduction made in fast recovery. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-27tcp: allow undo from reordered DSACKsNeal Cardwell
Previously, SACK-enabled connections hung around in TCP_CA_Disorder state while snd_una==high_seq, just waiting to accumulate DSACKs and hopefully undo a cwnd reduction. This could and did lead to the following unfortunate scenario: if some incoming ACKs advance snd_una beyond high_seq then we were setting undo_marker to 0 and moving to TCP_CA_Open, so if (due to reordering in the ACK return path) we shortly thereafter received a DSACK then we were no longer able to undo the cwnd reduction. The change: Simplify the congestion avoidance state machine by removing the behavior where SACK-enabled connections hung around in the TCP_CA_Disorder state just waiting for DSACKs. Instead, when snd_una advances to high_seq or beyond we typically move to TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or TCP_CA_Disorder if we later receive enough DSACKs. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>