summaryrefslogtreecommitdiff
path: root/include/net/sctp/structs.h
AgeCommit message (Collapse)Author
2017-07-04net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_init_chunk_tXin Long
This patch is to remove the typedef sctp_init_chunk_t, and replace with struct sctp_init_chunk in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_paramhdr_tXin Long
This patch is to remove the typedef sctp_paramhdr_t, and replace with struct sctp_paramhdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_cid_tXin Long
This patch is to remove the typedef sctp_cid_t, and replace with struct sctp_cid in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20sctp: handle errors when updating asocXin Long
It's a bad thing not to handle errors when updating asoc. The memory allocation failure in any of the functions called in sctp_assoc_update() would cause sctp to work unexpectedly. This patch is to fix it by aborting the asoc and reporting the error when any of these functions fails. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02sctp: merge sctp_stream_new and sctp_stream_initXin Long
Since last patch, sctp doesn't need to alloc memory for asoc->stream any more. sctp_stream_new and sctp_stream_init both are used to alloc memory for stream.in or stream.out, and their names are also confusing. This patch is to merge them into sctp_stream_init, and only pass stream and streamcnt parameters into it, instead of the whole asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02sctp: define the member stream as an object instead of pointer in asocXin Long
As Marcelo's suggestion, stream is a fixed size member of asoc and would not grow with more streams. To avoid an allocation for it, this patch is to define it as an object instead of pointer and update the places using it, also create sctp_stream_update() called in sctp_assoc_update() to migrate the stream info from one stream to another. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-18sctp: process duplicated strreset out and addstrm out requests correctlyXin Long
Now sctp stream reconf will process a request again even if it's seqno is less than asoc->strreset_inseq. If one request has been done successfully and some data chunks have been accepted and then a duplicated strreset out request comes, the streamin's ssn will be cleared. It will cause that stream will never receive chunks any more because of unsynchronized ssn. It allows a replay attack. A similar issue also exists when processing addstrm out requests. It will cause more extra streams being added. This patch is to fix it by saving the last 2 results into asoc. When a duplicated strreset out or addstrm out request is received, reply it with bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the result saved in asoc if it's seqno >= asoc->strreset_inseq - 2. Note that it saves last 2 results instead of only last 1 result, because two requests can be sent together in one chunk. And note that when receiving a duplicated request, the receiver side will still reply it even if the peer has received the response. It's safe, As the response will be dropped by the peer. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Mostly simple cases of overlapping changes (adding code nearby, a function whose name changes, for example). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05sctp: get sock from transport in sctp_transport_update_pmtuXin Long
This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU updates to accomodate route invalidation."). As t->asoc can't be NULL in sctp_transport_update_pmtu, it could get sk from asoc, and no need to pass sk into that function. It is also to remove some duplicated codes from that function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-03sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctpXin Long
Before when implementing sctp prsctp, SCTP_PR_STREAM_STATUS wasn't added, as it needs to save abandoned_(un)sent for every stream. After sctp stream reconf is added in sctp, assoc has structure sctp_stream_out to save per stream info. This patch is to add SCTP_PR_STREAM_STATUS by putting the prsctp per stream statistics into sctp_stream_out. v1->v2: fix an indent issue. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30sctp: alloc stream info when initializing asocXin Long
When sending a msg without asoc established, sctp will send INIT packet first and then enqueue chunks. Before receiving INIT_ACK, stream info is not yet alloced. But enqueuing chunks needs to access stream info, like out stream state and out stream cnt. This patch is to fix it by allocing out stream info when initializing an asoc, allocing in stream and re-allocing out stream when processing init. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28sctp: change to save MSG_MORE flag into assocXin Long
David Laight noticed the support for MSG_MORE with datamsg->force_delay didn't really work as we expected, as the first msg with MSG_MORE set would always block the following chunks' dequeuing. This Patch is to rewrite it by saving the MSG_MORE flag into assoc as David Laight suggested. asoc->force_delay is used to save MSG_MORE flag before a msg is sent. All chunks in queue would not be sent out if asoc->force_delay is set by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag clears asoc->force_delay. Note that this change would not affect the flush is generated by other triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc. v1->v2: Not clear asoc->force_delay after sending the msg with MSG_MORE flag. Fixes: 4ea0c32f5f42 ("sctp: add support for MSG_MORE") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Laight <david.laight@aculab.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22sctp: declare struct sctp_stream before using itXin Long
sctp_stream_free uses struct sctp_stream as a param, but struct sctp_stream is defined after it's declaration. This patch is to declare struct sctp_stream before sctp_stream_free. Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21sctp: define dst_pending_confirm as a bit in sctp_transportXin Long
As tp->dst_pending_confirm's value can only be set 0 or 1, this patch is to change to define it as a bit instead of __u32. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09net: Work around lockdep limitation in sockets that use socketsDavid Howells
Lockdep issues a circular dependency warning when AFS issues an operation through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem. The theory lockdep comes up with is as follows: (1) If the pagefault handler decides it needs to read pages from AFS, it calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but creating a call requires the socket lock: mmap_sem must be taken before sk_lock-AF_RXRPC (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind() binds the underlying UDP socket whilst holding its socket lock. inet_bind() takes its own socket lock: sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET (3) Reading from a TCP socket into a userspace buffer might cause a fault and thus cause the kernel to take the mmap_sem, but the TCP socket is locked whilst doing this: sk_lock-AF_INET must be taken before mmap_sem However, lockdep's theory is wrong in this instance because it deals only with lock classes and not individual locks. The AF_INET lock in (2) isn't really equivalent to the AF_INET lock in (3) as the former deals with a socket entirely internal to the kernel that never sees userspace. This is a limitation in the design of lockdep. Fix the general case by: (1) Double up all the locking keys used in sockets so that one set are used if the socket is created by userspace and the other set is used if the socket is created by the kernel. (2) Store the kern parameter passed to sk_alloc() in a variable in the sock struct (sk_kern_sock). This informs sock_lock_init(), sock_init_data() and sk_clone_lock() as to the lock keys to be used. Note that the child created by sk_clone_lock() inherits the parent's kern setting. (3) Add a 'kern' parameter to ->accept() that is analogous to the one passed in to ->create() that distinguishes whether kernel_accept() or sys_accept4() was the caller and can be passed to sk_alloc(). Note that a lot of accept functions merely dequeue an already allocated socket. I haven't touched these as the new socket already exists before we get the parameter. Note also that there are a couple of places where I've made the accepted socket unconditionally kernel-based: irda_accept() rds_rcp_accept_one() tcp_accept_from_sock() because they follow a sock_create_kern() and accept off of that. Whilst creating this, I noticed that lustre and ocfs don't create sockets through sock_create_kern() and thus they aren't marked as for-kernel, though they appear to be internal. I wonder if these should do that so that they use the new set of lock keys. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20sctp: add support for MSG_MOREXin Long
This patch is to add support for MSG_MORE on sctp. It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after creating datamsg according to the send flag. sctp_packet_can_append_data then uses it to decide if the chunks of this msg will be sent at once or delay it. Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of in assoc. As sctp enqueues the chunks first, then dequeue them one by one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may affect other chunks' bundling. Since last patch, sctp flush out queue once assoc state falls into SHUTDOWN_PENDING, the close block problem mentioned in [1] has been solved as well. [1] https://patchwork.ozlabs.org/patch/372404/ Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07sctp: drop __packed from almost all SCTP structuresMarcelo Ricardo Leitner
__packed is considered harmful as it potentially generates code that doesn't perform well and its usage should be avoided as much as possible. This patch drops __packed from all SCTP structures except one, which is sctp_signed_cookie. In there it's required, as per changelog on commit 9834a2bb4970 ("[SCTP]: Fix sctp_cookie alignment in the packet."). After this patch, no alignment changes neither in x86 or x86_64 and no exceptions were noticed during testing on both archs. Code size for SCTP module also didn't change with this patch. Cc: David Miller <davem@davemloft.net> Cc: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07sctp: add dst_pending_confirm flagJulian Anastasov
Add new transport flag to allow sockets to confirm neighbour. When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. The flag is propagated from transport to every packet. It is reset when cached dst is reset. Reported-by: YueHaibing <yuehaibing@huawei.com> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.") Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18sctp: implement sender-side procedures for SSN Reset Request ParameterXin Long
This patch is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3. It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2 for users. Note that the new asoc member strreset_outstanding is to make sure only one reconf request chunk on the fly as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18sctp: add sockopt SCTP_ENABLE_STREAM_RESETXin Long
This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set strreset_enable to indicate which reconf request type it supports, which is described in rfc6525 section 6.3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18sctp: add reconf_enable in asoc ep and netnsXin Long
This patch is to add reconf_enable field in all of asoc ep and netns to indicate if they support stream reset. When initializing, asoc reconf_enable get the default value from ep reconf_enable which is from netns netns reconf_enable by default. It is also to add reconf_capable in asoc peer part to know if peer supports reconf_enable, the value is set if ext params have reconf chunk support when processing init chunk, just as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18sctp: add stream reconf timerXin Long
This patch is to add a per transport timer based on sctp timer frame for stream reconf chunk retransmission. It would start after sending a reconf request chunk, and stop after receiving the response chunk. If the timer expires, besides retransmitting the reconf request chunk, it would also do the same thing with data RTO timer. like to increase the appropriate error counts, and perform threshold management, possibly destroying the asoc if sctp retransmission thresholds are exceeded, just as section 5.1.1 describes. This patch is also to add asoc strreset_chunk, it is used to save the reconf request chunk, so that it can be retransmitted, and to check if the response is really for this request by comparing the information inside with the response chunk as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18sctp: add support for generating stream reconf ssn reset request chunkXin Long
This patch is to add asoc strreset_outseq and strreset_inseq for saving the reconf request sequence, initialize them when create assoc and process init, and also to define Incoming and Outgoing SSN Reset Request Parameter described in rfc6525 section 4.1 and 4.2, As they can be in one same chunk as section rfc6525 3.1-3 describes, it makes them in one function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06sctp: prepare asoc stream for stream reconfXin Long
sctp stream reconf, described in RFC 6525, needs a structure to save per stream information in assoc, like stream state. In the future, sctp stream scheduler also needs it to save some stream scheduler params and queues. This patchset is to prepare the stream array in assoc for stream reconf. It defines sctp_stream that includes stream arrays inside to replace ssnmap. Note that we use different structures for IN and OUT streams, as the members in per OUT stream will get more and more different from per IN stream. v1->v2: - put these patches into a smaller group. v2->v3: - define sctp_stream to contain stream arrays, and create stream.c to put stream-related functions. - merge 3 patches into 1, as new sctp_stream has the same name with before. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28sctp: remove return value from sctp_packet_init/configMarcelo Ricardo Leitner
There is no reason to use this cascading. It doesn't add anything. Let's remove it and simplify. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16sctp: use new rhlist interface on sctp transport rhashtableXin Long
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13sctp: remove the old ttl expires policyXin Long
The prsctp polices include ttl expires policy already, we should remove the old ttl expires codes, and just adjust the new polices' codes to be compatible with the old one for users. This patch is to remove all the old expires codes, and if prsctp polices are not set, it will still set msg's expires_at and check the expires in sctp_check_abandoned. Note that asoc->prsctp_enable is set by default, so users can't feel any difference even if they use the old expires api in userspace. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13sctp: reuse sent_count to avoid retransmitted chunks for RTT measurementsXin Long
Now sctp uses chunk->resent to record if a chunk is retransmitted, for RTT measurements with retransmitted DATA chunks. chunk->sent_count was introduced to record how many times one chunk has been sent for prsctp RTX policy before. We actually can know if one chunk is retransmitted by checking chunk->sent_count is greater than 1. This patch is to remove resent from sctp_chunk and reuse sent_count to avoid retransmitted chunks for RTT measurements. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30sctp: remove prsctp_param from sctp_chunkXin Long
Now sctp uses chunk->prsctp_param to save the prsctp param for all the prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk. We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices, and reuse msg->expires_at for TTL policy, as the prsctp polices and old expires policy are mutual exclusive. This patch is to remove prsctp_param from sctp_chunk, and reuse msg's expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF polices. Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy, as it needs a u64 variables to save the expires_at time. This one also fixes the "netperf-Throughput_Mbps -37.2% regression" issue. Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30sctp: move sent_count to the memory hole in sctp_chunkXin Long
Now pahole sctp_chunk, it has 2 memory holes: struct sctp_chunk { struct list_head list; atomic_t refcnt; /* XXX 4 bytes hole, try to pack */ ... long unsigned int prsctp_param; int sent_count; /* XXX 4 bytes hole, try to pack */ This patch is to move up sent_count to fill the 1st one and eliminate the 2nd one. It's not just another struct compaction, it also fixes the "netperf- Throughput_Mbps -37.2% regression" issue when overloading the CPU. Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18sctp: make sctp_outq_flush/tail/uncork return voidXin Long
sctp_outq_flush return value is meaningless now, this patch is to make sctp_outq_flush return void, as well as sctp_outq_fail and sctp_outq_uncork. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18sctp: free msg->chunks when sctp_primitive_SEND return errXin Long
Last patch "sctp: do not return the transmit err back to sctp_sendmsg" made sctp_primitive_SEND return err only when asoc state is unavailable. In this case, chunks are not enqueued, they have no chance to be freed if we don't take care of them later. This Patch is actually to revert commit 1cd4d5c4326a ("sctp: remove the unused sctp_datamsg_free()"), commit 69b5777f2e57 ("sctp: hold the chunks only after the chunk is enqueued in outq") and commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg"), to use sctp_datamsg_free to free the chunks of current msg. Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: avoid identifying address family many times for a chunkMarcelo Ricardo Leitner
Identifying address family operations during rx path is not something expensive but it's ugly to the eye to have it done multiple times, specially when we already validated it during initial rx processing. This patch takes advantage of the now shared sctp_input_cb and make the pointer to the operations readily available. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: allow GSO frags to access the chunk tooMarcelo Ricardo Leitner
SCTP will try to access original IP headers on sctp_recvmsg in order to copy the addresses used. There are also other places that do similar access to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO support") they aren't always there because they are only present in the header skb. SCTP handles the queueing of incoming data by cloning the incoming skb and limiting to only the relevant payload. This clone has its cb updated to something different and it's then queued on socket rx queue. Thus we need to fix this in two moments. For rx path, not related to socket queue yet, this patch uses a partially copied sctp_input_cb to such GSO frags. This restores the ability to access the headers for this part of the code. Regarding the socket rx queue, it removes iif member from sctp_event and also add a chunk pointer on it. With these changes we're always able to reach the headers again. The biggest change here is that now the sctp_chunk struct and the original skb are only freed after the application consumed the buffer. Note however that the original payload was already like this due to the skb cloning. For iif, SCTP's IPv4 code doesn't use it, so no change is necessary. IPv6 now can fetch it directly from original's IPv6 CB as the original skb is still accessible. In the future we probably can simplify sctp_v*_skb_iif() stuff, as sctp_v4_skb_iif() was called but it's return value not used, and now it's not even called, but such cleanup is out of scope for this change. Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: allow others to use sctp_input_cbMarcelo Ricardo Leitner
We process input path in other files too and having access to it is nice, so move it to a header where it's shared. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: implement prsctp PRIO policyXin Long
prsctp PRIO policy is a policy to abandon lower priority chunks when asoc doesn't have enough snd buffer, so that the current chunk with higher priority can be queued successfully. Similar to TTL/RTX policy, we will set the priority of the chunk to prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So if PRIO policy is enabled, msg->expire_at won't work. asoc->sent_cnt_removable will record how many chunks can be checked to remove. If priority policy is enabled, when the chunk is queued into the out_queue, we will increase sent_cnt_removable. When the chunk is moved to abandon_queue or dequeue and free, we will decrease sent_cnt_removable. In sctp_sendmsg, we will check if there is enough snd buffer for current msg and if sent_cnt_removable is not 0. Then try to abandon chunks in sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and free chunks from out_queue in right order until the abandon+free size > msg_len - sctp_wfree. For the abandon size, we have to wait until it sends FORWARD TSN, receives the sack and the chunks are really freed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: implement prsctp TTL policyXin Long
prsctp TTL policy is a policy to abandon chunks when they expire at the specific time in local stack. It's similar with expires_at in struct sctp_datamsg. This patch uses sinfo->sinfo_timetolive to set the specific time for TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at. So if prsctp_enable or TTL policy is not enabled, msg->expires_at still works as before. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: add SCTP_PR_ASSOC_STATUS on sctp sockoptXin Long
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used to dump the prsctp statistics info from the asoc. The prsctp statistics includes abandoned_sent/unsent from the asoc. abandoned_sent is the count of the packets we drop packets from retransmit/transmited queue, and abandoned_unsent is the count of the packets we drop from out_queue according to the policy. Note: another option for prsctp statistics dump described in rfc is SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics info from each stream. But by now, linux doesn't yet have per stream statistics info, it needs rfc6525 to be implemented. As the prsctp statistics for each stream has to be based on per stream statistics, we will delay it until rfc6525 is done in linux. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: add SCTP_PR_SUPPORTED on sctp sockoptXin Long
According to section 4.5 of rfc7496, prsctp_enable should be per asoc. We will add prsctp_enable to both asoc and ep, and replace the places where it used net.sctp->prsctp_enable with asoc->prsctp_enable. ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can also modify it's value through sockopt SCTP_PR_SUPPORTED. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03sctp: Add GSO supportMarcelo Ricardo Leitner
SCTP has this pecualiarity that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to IP layer. This patch takes a different approach. SCTP will now build a skb as it would be if it was received using GRO. That is, there will be a cover skb with protocol headers and children ones containing the actual segments, already segmented to a way that respects SCTP RFCs. With that, we can tell skb_segment() to just split based on frag_list, trusting its sizes are already in accordance. This way SCTP can benefit from GSO and instead of passing several packets through the stack, it can pass a single large packet. v2: - Added support for receiving GSO frames, as requested by Dave Miller. - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP) - Added heuristics similar to what we have in TCP for not generating single GSO packets that fills cwnd. v3: - consider sctphdr size in skb_gso_transport_seglen() - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-01sctp: signal sk_data_ready earlier on data chunks receptionMarcelo Ricardo Leitner
Dave Miller pointed out that fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") may insert latency specially if the receiving application is running on another CPU and that it would be better if we signalled as early as possible. This patch thus basically inverts the logic on fb586f25300f and signals it as early as possible, similar to what we had before. Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") Reported-by: Dave Miller <davem@davemloft.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13sctp: delay calls to sk_data_ready() as much as possibleMarcelo Ricardo Leitner
Currently processing of multiple chunks in a single SCTP packet leads to multiple calls to sk_data_ready, causing multiple wake up signals which are costy and doesn't make it wake up any faster. With this patch it will note that the wake up is pending and will do it before leaving the state machine interpreter, latest place possible to do it realiably and cleanly. Note that sk_data_ready events are not dependent on asocs, unlike waking up writers. v2: series re-checked v3: use local vars to cleanup the code, suggested by Jakub Sitnicki Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13sctp: compress bit-wide flags to a bitfield on sctp_sockMarcelo Ricardo Leitner
It wastes space and gets worse as we add new flags, so convert bit-wide flags to a bitfield. Currently it already saves 4 bytes in sctp_sock, which are left as holes in it for now. The whole struct needs packing, which should be done in another patch. Note that do_auto_asconf cannot be merged, as explained in the comment before it. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10sctp: avoid refreshing heartbeat timer too oftenMarcelo Ricardo Leitner
Currently on high rate SCTP streams the heartbeat timer refresh can consume quite a lot of resources as timer updates are costly and it contains a random factor, which a) is also costly and b) invalidates mod_timer() optimization for not editing a timer to the same value. It may even cause the timer to be slightly advanced, for no good reason. As suggested by David Laight this patch now removes this timer update from hot path by leaving the timer on and re-evaluating upon its expiration if the heartbeat is still needed or not, similarly to what is done for TCP. If it's not needed anymore the timer is re-scheduled to the new timeout, considering the time already elapsed. For this, we now record the last tx timestamp per transport, updated in the same spots as hb timer was restarted on tx. Also split up sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the heartbeat one. On loopback with MTU of 65535 and data chunks with 1636, so that we have a considerable amount of chunks without stressing system calls, netperf -t SCTP_STREAM -l 30, perf looked like this before: Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000 Overhead Command Shared Object Symbol + 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore - _raw_write_unlock_irqrestore - 96,54% _raw_spin_unlock_irqrestore - 36,14% mod_timer + 97,24% sctp_transport_reset_timers + 2,76% sctp_do_sm + 33,65% __wake_up_sync_key + 28,77% sctp_ulpq_tail_event + 1,40% del_timer - 1,84% mod_timer + 99,03% sctp_transport_reset_timers + 0,97% sctp_do_sm + 1,50% sctp_ulpq_tail_event And after this patch, now with netperf -l 60: Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000 Overhead Command Shared Object Symbol + 5,65% netperf [kernel.vmlinux] [k] memcpy_erms + 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore - _raw_spin_unlock_irqrestore + 49,89% __wake_up_sync_key + 45,68% sctp_ulpq_tail_event - 2,85% mod_timer + 76,51% sctp_transport_reset_t3_rtx + 23,49% sctp_do_sm + 1,55% del_timer + 2,50% netperf [sctp] [k] sctp_datamsg_from_user + 2,26% netperf [sctp] [k] sctp_sendmsg Throughput-wise, from 6800mbps without the patch to 7050mbps with it, ~3.7%. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...