summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2017-04-11Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-04-11 1) Remove unused field from struct xfrm_mgr. 2) Code size optimizations for the xfrm prefix hash and address match. 3) Branch optimization for addr4_match. All patches from Alexey Dobriyan. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-09netfilter: udplite: Remove duplicated udplite4/6 declarationGao Feng
There are two nf_conntrack_l4proto_udp4 declarations in the head file nf_conntrack_ipv4/6.h. Now remove one which is not enbraced by the macro CONFIG_NF_CT_PROTO_UDPLITE. Signed-off-by: Gao Feng <fgao@ikuai8.com>
2017-04-08net: dsa: Factor bottom tag receive functionsFlorian Fainelli
All DSA tag receive functions do strictly the same thing after they have located the originating source port from their tag specific protocol: - push ETH_HLEN bytes - set pkt_type to PACKET_HOST - call eth_type_trans() - bump up counters - call netif_receive_skb() Factor all of that into dsa_switch_rcv(). This also makes us return a pointer to a sk_buff, which makes us symetric with the xmit function. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07net-next: dsa: add Mediatek tag RX/TX handlerSean Wang
Add the support for the 4-bytes tag for DSA port distinguishing inserted allowing receiving and transmitting the packet via the particular port. The tag is being added after the source MAC address in the ethernet header. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Landen Chao <Landen.Chao@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06Merge tag 'rxrpc-rewrite-20170406' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Miscellany Here's a set of patches that make some minor changes to AF_RXRPC: (1) Store error codes in struct rxrpc_call::error as negative codes and only convert to positive in recvmsg() to avoid confusion inside the kernel. (2) Note the result of trying to abort a call (this fails if the call is already 'completed'). (3) Don't abort on temporary errors whilst processing challenge and response packets, but rather drop the packet and wait for retransmission. And also adds some more tracing: (4) Protocol errors. (5) Received abort packets. (6) Changes in the Rx window size due to ACK packet information. (7) Client call initiation (to allow the rxrpc_call struct pointer, the wire call ID and the user ID/afs_call pointer to be cross-referenced). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06net/sched: Removed unused vlan actions definitionOr Gerlitz
Commit c7e2b9689ef "sched: introduce vlan action" added both the UAPI values for the vlan actions (TCA_VLAN_ACT_) and these two in-kernel ones which are not used, remove them. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06netfilter: nat: nf_nat_mangle_{udp,tcp}_packet returns booleanGao Feng
nf_nat_mangle_{udp,tcp}_packet() returns int. However, it is used as bool type in many spots. Fix this by consistently handle this return value as a boolean. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-06netfilter: nf_ct_expect: Add nf_ct_remove_expect()Gao Feng
When remove one expect, it needs three statements. And there are multiple duplicated codes in current code. So add one common function nf_ct_remove_expect to consolidate this. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-06netfilter: expect: Make sure the max_expected limit is effectiveGao Feng
Because the type of expecting, the member of nf_conn_help, is u8, it would overflow after reach U8_MAX(255). So it doesn't work when we configure the max_expected exceeds 255 with expect policy. Now add the check for max_expected. Return the -EINVAL when it exceeds the limit. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-06netfilter: nf_tables: add nft_is_base_chain() helperPablo Neira Ayuso
This new helper function allows us to check if this is a basechain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Mostly simple cases of overlapping changes (adding code nearby, a function whose name changes, for example). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06rxrpc: Note a successfully aborted kernel operationDavid Howells
Make rxrpc_kernel_abort_call() return an indication as to whether it actually aborted the operation or not so that kafs can trace the failure of the operation. Note that 'success' in this context means changing the state of the call, not necessarily successfully transmitting an ABORT packet. Signed-off-by: David Howells <dhowells@redhat.com>
2017-04-05bonding: attempt to better support longer hw addressesJarod Wilson
People are using bonding over Infiniband IPoIB connections, and who knows what else. Infiniband has a hardware address length of 20 octets (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32. Various places in the bonding code are currently hard-wired to 6 octets (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides, only alb is currently possible on Infiniband links right now anyway, due to commit 1533e7731522, so the alb code is where most of the changes are. One major component of this change is the addition of a bond_hw_addr_copy function that takes a length argument, instead of using ether_addr_copy everywhere that hardware addresses need to be copied about. The other major component of this change is converting the bonding code from using struct sockaddr for address storage to struct sockaddr_storage, as the former has an address storage space of only 14, while the latter is 128 minus a few, which is necessary to support bonding over device with up to MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes up some memory corruption issues with the current code, where it's possible to write an infiniband hardware address into a sockaddr declared on the stack. Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet hardware address now: $ cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active) Primary Slave: mlx4_ib0 (primary_reselect always) Currently Active Slave: mlx4_ib0 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 100 Down Delay (ms): 100 Slave Interface: mlx4_ib0 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01 Slave queue ID: 0 Slave Interface: mlx4_ib1 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02 Slave queue ID: 0 Also tested with a standard 1Gbps NIC bonding setup (with a mix of e1000 and e1000e cards), running LNST's bonding tests. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05Merge tag 'linux-can-next-for-4.13-20170404' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2017-03-03 this is a pull request of 5 patches for net-next/master. There are two patches by Yegor Yefremov which convert the ti_hecc driver into a DT only driver, as there is no in-tree user of the old platform driver interface anymore. The next patch by Mario Kicherer adds network namespace support to the can subsystem. The last two patches by Akshay Bhat add support for the holt_hi311x SPI CAN driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05net: tcp: Define the TCP_MAX_WSCALE instead of literal number 14Gao Feng
Define one new macro TCP_MAX_WSCALE instead of literal number '14', and use U16_MAX instead of 65535 as the max value of TCP window. There is another minor change, use rounddown(space, mss) instead of (space / mss) * mss; Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05sctp: get sock from transport in sctp_transport_update_pmtuXin Long
This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU updates to accomodate route invalidation."). As t->asoc can't be NULL in sctp_transport_update_pmtu, it could get sk from asoc, and no need to pass sk into that function. It is also to remove some duplicated codes from that function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05NFC: Add nfc_dbg() macroAndy Shevchenko
In some cases nfc_dbg() is useful. Add such macro to a header. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-04-04can: initial support for network namespacesMario Kicherer
This patch adds initial support for network namespaces. The changes only enable support in the CAN raw, proc and af_can code. GW and BCM still have their checks that ensure that they are used only from the main namespace. The patch boils down to moving the global structures, i.e. the global filter list and their /proc stats, into a per-namespace structure and passing around the corresponding "struct net" in a lot of different places. Changes since v1: - rebased on current HEAD (2bfe01e) - fixed overlong line Signed-off-by: Mario Kicherer <dev@kicherer.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-03flowcache: more "unsigned int"Alexey Dobriyan
Make ->hash_count, ->low_watermark and ->high_watermark unsigned int and propagate unsignedness to other variables. This change doesn't change code generation because these fields aren't used in 64-bit contexts but make it anyway: these fields can't be negative numbers. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03flowcache: make flow_key_size() return "unsigned int"Alexey Dobriyan
Flow keys aren't 4GB+ numbers so 64-bit arithmetic is excessive. Space savings (I'm not sure what CSWTCH is): add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-48 (-48) function old new delta flow_cache_lookup 1163 1159 -4 CSWTCH 75997 75953 -44 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03sctp: check for dst and pathmtu update in sctp_packet_configXin Long
This patch is to move sctp_transport_dst_check into sctp_packet_config from sctp_packet_transmit and add pathmtu check in sctp_packet_config. With this fix, sctp can update dst or pathmtu before appending chunks, which can void dropping packets in sctp_packet_transmit when dst is obsolete or dst's mtu is changed. This patch is also to improve some other codes in sctp_packet_config. It updates packet max_size with gso_max_size, checks for dst and pathmtu, and appends ecne chunk only when packet is empty and asoc is not NULL. It makes sctp flush work better, as we only need to set up them once for one flush schedule. It's also safe, since asoc is NULL only when the packet is created by sctp_ootb_pkt_new in which it just gets the new dst, no need to do more things for it other than set packet with transport's pathmtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctpXin Long
Before when implementing sctp prsctp, SCTP_PR_STREAM_STATUS wasn't added, as it needs to save abandoned_(un)sent for every stream. After sctp stream reconf is added in sctp, assoc has structure sctp_stream_out to save per stream info. This patch is to add SCTP_PR_STREAM_STATUS by putting the prsctp per stream statistics into sctp_stream_out. v1->v2: fix an indent issue. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-02sock: correctly test SOCK_TIMESTAMP in sock_recv_ts_and_drops()Eric Dumazet
It seems the code does not match the intent. This broke packetdrill, and probably other programs. Fixes: 6c7c98bad488 ("sock: avoid dirtying sk_stamp, if possible") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01net: mpls: Increase max number of labels for lwt encapDavid Ahern
Alow users to push down more labels per MPLS encap. Similar to LSR case, move label array to the end of mpls_iptunnel_encap and allocate based on the number of labels for the route. For consistency with the LSR case, re-use the same maximum number of labels. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01net: dsa: add cross-chip bridging operationsVivien Didelot
Introduce crosschip_bridge_{join,leave} operations in the dsa_switch_ops structure, which can be used by switches supporting interconnection. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-31cfg80211: add documentation for cfg80211_get_bss()Johannes Berg
This was missing, but is referenced a lot in the documentation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-31cfg80211: Add support for FILS shared key authentication offloadVidyullatha Kanchanapally
Enhance nl80211 and cfg80211 connect request and response APIs to support FILS shared key authentication offload. The new nl80211 attributes can be used to provide additional information to the driver to establish a FILS connection. Also enhance the set/del PMKSA to allow support for adding and deleting PMKSA based on FILS cache identifier. Add a new feature flag that drivers can use to advertize support for FILS shared key authentication and association in station mode when using their own SME. Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-31cfg80211: Use a structure to pass connect response paramsVidyullatha Kanchanapally
Currently the connect event from driver takes all the connection response parameters as arguments. With support for new features these response parameters can grow. Use a structure to pass these parameters rather than passing them as function arguments. Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> [add to documentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-30sock: avoid dirtying sk_stamp, if possiblePaolo Abeni
sock_recv_ts_and_drops() unconditionally set sk->sk_stamp for every packet, even if the SOCK_TIMESTAMP flag is not set in the related socket. If selinux is enabled, this cause a cache miss for every packet since sk->sk_stamp and sk->sk_security share the same cacheline. With this change sk_stamp is set only if the SOCK_TIMESTAMP flag is set, and is cleared for the first packet, so that the user perceived behavior is unchanged. This gives up to 5% speed-up under udp-flood with small packets. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30sctp: alloc stream info when initializing asocXin Long
When sending a msg without asoc established, sctp will send INIT packet first and then enqueue chunks. Before receiving INIT_ACK, stream info is not yet alloced. But enqueuing chunks needs to access stream info, like out stream state and out stream cnt. This patch is to fix it by allocing out stream info when initializing an asoc, allocing in stream and re-allocing out stream when processing init. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29Merge tag 'mlx5e-pedit' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Or Gerlitz says: ==================== mlx5e-pedit 2017-03-28 This series adds support for offloading modifications of packet headers using ConnectX-5 HW header re-write as an action applied during packet steering. The offloaded SW mechanism is TC's pedit action. The offloading is supported for E-Switch steering of VF traffic in the SRIOV switchdev mode and for NIC (non eswitch) RX. One use-case for this offload on virtual networks, is when the hypervisor implements flow based router such as Open-Stack's DVR, where L2 headers of guest packets re-written with routers' MAC addresses and the IP TTL is decremented. Another use case (which can be applied in parallel with routing) is stateless NAT where guest L3/L4 headers are re-written. The series is built as follows: the 1st six patches are preperations which don't yet add new functionality, patches 7-8 add the FW APIs (data-structures and commands) for header re-write, and patch nine allows offloading driver to access pedit keys. The 10th patch is somehow the core of the series, where we translate from the pedit way to represent set of header modification elements to the FW API for that same matter. Once a set of HW modification is established, we register it with the FW and get a modify header ID. When this ID is used with an action during packet steering, the HW applies the header modification on the packet. Patches 11 and 12 implement the above logic as an offload for pedit action for the NIC and E-Switch use-cases. I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing and helping me testing this functionality on HW simulator, before it could be done with FW. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28net: dsa: dsa2: Add basic support of devlinkAndrew Lunn
Register the switch and its ports with devlink. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28net: break include loop netdevice.h, dsa.h, devlink.hAndrew Lunn
There is an include loop between netdevice.h, dsa.h, devlink.h because of NETDEV_ALIGN, making it impossible to use devlink structures in dsa.h. Break this loop by taking dsa.h out of netdevice.h, add a forward declaration of dsa_switch_tree and netdev_set_default_ethtool_ops() function, which is what netdevice.h requires. No longer having dsa.h in netdevice.h means the includes in dsa.h no longer get included. This breaks a few other files which depend on these includes. Add these directly in the affected file. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28net: ipv6: Refactor inet6_netconf_notify_devconf to take eventDavid Ahern
Refactor inet6_netconf_notify_devconf to take the event as an input arg. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28ipv6: add support for NETDEV_RESEND_IGMP eventVlad Yasevich
This patch adds support for NETDEV_RESEND_IGMP event similar to how it works for IPv4. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28sctp: change to save MSG_MORE flag into assocXin Long
David Laight noticed the support for MSG_MORE with datamsg->force_delay didn't really work as we expected, as the first msg with MSG_MORE set would always block the following chunks' dequeuing. This Patch is to rewrite it by saving the MSG_MORE flag into assoc as David Laight suggested. asoc->force_delay is used to save MSG_MORE flag before a msg is sent. All chunks in queue would not be sent out if asoc->force_delay is set by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag clears asoc->force_delay. Note that this change would not affect the flush is generated by other triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc. v1->v2: Not clear asoc->force_delay after sending the msg with MSG_MORE flag. Fixes: 4ea0c32f5f42 ("sctp: add support for MSG_MORE") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Laight <david.laight@aculab.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28devlink: Support for pipeline debug (dpipe)Arkadi Sharshevsky
The pipeline debug is used to export the pipeline abstractions for the main objects - tables, headers and entries. The only support for set is for changing the counter parameter on specific table. The basic structures: Header - can represent a real protocol header information or internal metadata. Generic protocol headers like IPv4 can be shared between drivers. Each driver can add local headers. Field - part of a header. Can represent protocol field or specific ASIC metadata field. Hardware special metadata fields can be mapped to different resources, for example switch ASIC ports can have internal number which from the systems point of view is mapped to netdeivce ifindex. Match - represent specific match rule. Can describe match on specific field or header. The header index should be specified as well in order to support several header instances of the same type (tunneling). Action - represents specific action rule. Actions can describe operations on specific field values for example like set, increment, etc. And header operation like add and delete. Value - represents value which can be associated with specific match or action. Table - represents a hardware block which can be described with match/ action behavior. The match/action can be done on the packets data or on the internal metadata that it gathered along the packets traversal throw the pipeline which is vendor specific and should be exported in order to provide understanding of ASICs behavior. Entry - represents single record in a specific table. The entry is identified by specific combination of values for match/action. Prior to accessing the tables/entries the drivers provide the header/ field data base which is used by driver to user-space. The data base is split between the shared headers and unique headers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28net/sched: Add accessor functions to pedit keys for offloading driversOr Gerlitz
HW drivers will use the header-type and command fields from the extended keys, and some fields (e.g mask, val, offset) from the legacy keys. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-27bonding: split bond_set_slave_link_state into two partsMahesh Bandewar
Split the function into two (a) propose (b) commit phase without changing the semantics for the original API. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27xfrm: branchless addr4_match() on 64-bitAlexey Dobriyan
Current addr4_match() code has special test for /0 prefixes because of standard required undefined behaviour. However, it is possible to omit it on 64-bit because shifting can be done within a 64-bit register and then truncated to the expected value (which is 0 mask). Implicit truncation by htonl() fits nicely into R32-within-R64 model on x86-64. Space savings: none (coincidence) Branch savings: 1 Before: movzx eax,BYTE PTR [rdi+0x2a] # ->prefixlen_d test al,al jne xfrm_selector_match + 0x23f ... movzx eax,BYTE PTR [rbx+0x2b] # ->prefixlen_s test al,al je xfrm_selector_match + 0x1c7 After (no branches): mov r8d,0x20 mov rdx,0xffffffffffffffff mov esi,DWORD PTR [rsi+0x2c] mov ecx,r8d sub cl,BYTE PTR [rdi+0x2a] xor esi,DWORD PTR [rbx] mov rdi,rdx xor eax,eax shl rdi,cl bswap edi Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-24net: Commonize busy polling code to focus on napi_id instead of socketSridhar Samudrala
Move the core functionality in sk_busy_loop() to napi_busy_loop() and make it independent of sk. This enables re-using this function in epoll busy loop implementation. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Track start of busy loop instead of when it should endAlexander Duyck
This patch flips the logic we were using to determine if the busy polling has timed out. The main motivation for this is that we will need to support two different possible timeout values in the future and by recording the start time rather than when we would want to end we can focus on making the end_time specific to the task be it epoll or socket based polling. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Change return type of sk_busy_loop from bool to voidAlexander Duyck
checking the return value of sk_busy_loop. As there are only a few consumers of that data, and the data being checked for can be replaced with a check for !skb_queue_empty() we might as well just pull the code out of sk_busy_loop and place it in the spots that actually need it. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Only define skb_mark_napi_id in one spot instead of twoAlexander Duyck
Instead of defining two versions of skb_mark_napi_id I think it is more readable to just match the format of the sk_mark_napi_id functions and just wrap the contents of the function instead of defining two versions of the function. This way we can save a few lines of code since we only need 2 of the ifdef/endif but needed 5 for the extra function declaration. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Busy polling should ignore sender CPUsAlexander Duyck
This patch is a cleanup/fix for NAPI IDs following the changes that made it so that sender_cpu and napi_id were doing a better job of sharing the same location in the sk_buff. One issue I found is that we weren't validating the napi_id as being valid before we started trying to setup the busy polling. This change corrects that by using the MIN_NAPI_ID value that is now used in both allocating the NAPI IDs, as well as validating them. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24tcp: sysctl: Fix a race to avoid unexpected 0 window from spaceGao Feng
Because sysctl_tcp_adv_win_scale could be changed any time, so there is one race in tcp_win_from_space. For example, 1.sysctl_tcp_adv_win_scale<=0 (sysctl_tcp_adv_win_scale is negative now) 2.space>>(-sysctl_tcp_adv_win_scale) (sysctl_tcp_adv_win_scale is postive now) As a result, tcp_win_from_space returns 0. It is unexpected. Certainly if the compiler put the sysctl_tcp_adv_win_scale into one register firstly, then use the register directly, it would be ok. But we could not depend on the compiler behavior. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Add sysctl to toggle early demux for tcp and udpsubashab@codeaurora.org
Certain system process significant unconnected UDP workload. It would be preferrable to disable UDP early demux for those systems and enable it for TCP only. By disabling UDP demux, we see these slight gains on an ARM64 system- 782 -> 788Mbps unconnected single stream UDPv4 633 -> 654Mbps unconnected UDPv4 different sources The performance impact can change based on CPU architecure and cache sizes. There will not much difference seen if entire UDP hash table is in cache. Both sysctls are enabled by default to preserve existing behavior. v1->v2: Change function pointer instead of adding conditional as suggested by Stephen. v2->v3: Read once in callers to avoid issues due to compiler optimizations. Also update commit message with the tests. v3->v4: Store and use read once result instead of querying pointer again incorrectly. v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n} Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Tom Herbert <tom@herbertland.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24xfrm: use "unsigned int" in addr_match()Alexey Dobriyan
x86_64 is zero-extending arch so "unsigned int" is preferred over "int" for address calculations and extending to size_t. Space savings: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-24 (-24) function old new delta xfrm_state_walk 708 696 -12 xfrm_selector_match 918 906 -12 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-24xfrm: remove unused struct xfrm_mgr::idAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/broadcom/genet/bcmmii.c drivers/net/hyperv/netvsc.c kernel/bpf/hashtab.c Almost entirely overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>