Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, WiFi and netfilter.
Current release - regressions:
- ipv6: fix address dump when IPv6 is disabled on an interface
Current release - new code bugs:
- bpf: temporarily disable atomic operations in BPF arena
- nexthop: fix uninitialized variable in nla_put_nh_group_stats()
Previous releases - regressions:
- bpf: protect against int overflow for stack access size
- hsr: fix the promiscuous mode in offload mode
- wifi: don't always use FW dump trig
- tls: adjust recv return with async crypto and failed copy to
userspace
- tcp: properly terminate timers for kernel sockets
- ice: fix memory corruption bug with suspend and rebuild
- at803x: fix kernel panic with at8031_probe
- qeth: handle deferred cc1
Previous releases - always broken:
- bpf: fix bug in BPF_LDX_MEMSX
- netfilter: reject table flag and netdev basechain updates
- inet_defrag: prevent sk release while still in use
- wifi: pick the version of SESSION_PROTECTION_NOTIF
- wwan: t7xx: split 64bit accesses to fix alignment issues
- mlxbf_gige: call request_irq() after NAPI initialized
- hns3: fix kernel crash when devlink reload during pf
initialization"
* tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
inet: inet_defrag: prevent sk release while still in use
Octeontx2-af: fix pause frame configuration in GMP mode
net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
net: bcmasp: Remove phy_{suspend/resume}
net: bcmasp: Bring up unimac after PHY link up
net: phy: qcom: at803x: fix kernel panic with at8031_probe
netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
netfilter: nf_tables: skip netdev hook unregistration if table is dormant
netfilter: nf_tables: reject table flag and netdev basechain updates
netfilter: nf_tables: reject destroy command to remove basechain hooks
bpf: update BPF LSM designated reviewer list
bpf: Protect against int overflow for stack access size
bpf: Check bloom filter map value size
bpf: fix warning for crash_kexec
selftests: netdevsim: set test timeout to 10 minutes
net: wan: framer: Add missing static inline qualifiers
mlxbf_gige: call request_irq() after NAPI initialized
tls: get psock ref after taking rxlock to avoid leak
selftests: tls: add test with a partially invalid iov
tls: adjust recv return with async crypto and failed copy to userspace
...
|
|
The initial igc Tx timestamping implementation used only one register for
retrieving Tx timestamps. Commit 3ed247e78911 ("igc: Add support for
multiple in-flight TX timestamps") added support for utilizing all four of
them e.g., for multiple domain support. Remove the stale comment/FIXME.
Fixes: 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to
avoid sleeping in IRQ context.
Dan Carpenter, with the help of Smatch, has found following issue:
The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13,
2018 (linux-next), leads to the following Smatch static checker
warning: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa()
warn: sleeping in IRQ context
The call tree that Smatch is worried about is:
ixgbe_msix_other() <- IRQ handler
-> ixgbe_msg_task()
-> ixgbe_rcv_msg_from_vf()
-> ixgbe_ipsec_vf_add_sa()
Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/intel-wired-lan/db31a0b0-4d9f-4e6b-aed8-88266eb5665c@moroto.mountain
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice driver would previously panic after suspend. This is caused
from the driver *only* calling the ice_vsi_free_q_vectors() function by
itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent
NULL pointer deref during reload") the driver has zeroed out
num_q_vectors, and only restored it in ice_vsi_cfg_def().
This further causes the ice_rebuild() function to allocate a zero length
buffer, after which num_q_vectors is updated, and then the new value of
num_q_vectors is used to index into the zero length buffer, which
corrupts memory.
The fix entails making sure all the code referencing num_q_vectors only
does so after it has been reset via ice_vsi_cfg_def().
I didn't perform a full bisect, but I was able to test against 6.1.77
kernel and that ice driver works fine for suspend/resume with no panic,
so sometime since then, this problem was introduced.
Also clean up an un-needed init of a local variable in the function
being modified.
PANIC from 6.8.0-rc1:
[1026674.915596] PM: suspend exit
[1026675.664697] ice 0000:17:00.1: PTP reset successful
[1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
[1026675.667660] ice 0000:b1:00.0: PTP reset successful
[1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
[1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
[1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
[1026677.192753] ice 0000:17:00.0: PTP reset successful
[1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
[1026677.197928] #PF: supervisor read access in kernel mode
[1026677.197933] #PF: error_code(0x0000) - not-present page
[1026677.197937] PGD 1557a7067 P4D 0
[1026677.212133] ice 0000:b1:00.1: PTP reset successful
[1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
[1026677.212575]
[1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1
[1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[1026677.269367] Workqueue: ice ice_service_task [ice]
[1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
[1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
[1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
[1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
[1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
[1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
[1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
[1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
[1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1026677.381952] PKRU: 55555554
[1026677.385116] Call Trace:
[1026677.388023] <TASK>
[1026677.390589] ? __die+0x20/0x70
[1026677.394105] ? page_fault_oops+0x82/0x160
[1026677.398576] ? do_user_addr_fault+0x65/0x6a0
[1026677.403307] ? exc_page_fault+0x6a/0x150
[1026677.407694] ? asm_exc_page_fault+0x22/0x30
[1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice]
[1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[1026677.429147] ice_rebuild+0x18b/0x520 [ice]
[1026677.433746] ? delay_tsc+0x8f/0xc0
[1026677.437630] ice_do_reset+0xa3/0x190 [ice]
[1026677.442231] ice_service_task+0x26/0x440 [ice]
[1026677.447180] process_one_work+0x174/0x340
[1026677.451669] worker_thread+0x27e/0x390
[1026677.455890] ? __pfx_worker_thread+0x10/0x10
[1026677.460627] kthread+0xee/0x120
[1026677.464235] ? __pfx_kthread+0x10/0x10
[1026677.468445] ret_from_fork+0x2d/0x50
[1026677.472476] ? __pfx_kthread+0x10/0x10
[1026677.476671] ret_from_fork_asm+0x1b/0x30
[1026677.481050] </TASK>
Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
According to the datasheet, the recipe association data is an 8-byte
little-endian value. It is described as 'Bitmap of the recipe indexes
associated with this profile', it is from 24 to 31 byte area in FW.
Therefore, it is defined to '__le64 recipe_assoc' in struct
ice_aqc_recipe_to_profile. And then fix the bitmap casting issue, as we
must never ever use castings for bitmap type.
Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Andrii Staikov <andrii.staikov@intel.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steven Zou <steven.zou@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The norm should be flexible array structures with __counted_by
annotations, so DEFINE_FLEX() is updated to expect that. Rename
the non-annotated version to DEFINE_RAW_FLEX(), and update the
few existing users. Additionally add selftests for the macros.
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Merge in late fixes to prepare for the 6.9 net-next PR.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ethtool: ice: Support for RSS settings to GTP
Takeru Hayasaka enables RSS functionality for GTP packets on ice driver
with ethtool.
A user can include TEID and make RSS work for GTP-U over IPv4 by doing the
following:`ethtool -N ens3 rx-flow-hash gtpu4 sde`
In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e,
gtpu(4|6)u, and gtpu(4|6)d.
gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does
not include a TEID.
gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that
includes a TEID.
gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios.
gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6.
gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended
header includes Uplink, applicable to both IPv4 and IPv6.
gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink,
for both IPv4 and IPv6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-06 (iavf, i40e, ixgbe)
This series contains updates to iavf, i40e, and ixgbe drivers.
Alexey Kodanev removes duplicate calls related to cloud filters on iavf
and unnecessary null checks on i40e.
Maciej adds helper functions for common code relating to updating
statistics for ixgbe.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move RPS related structures and helpers from include/linux/netdevice.h
and include/net/sock.h to a new include file.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/page_pool_user.c
0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors")
429679dcf7d9 ("page_pool: fix netlink dump stop/resume")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce ixgbe_update_{r,t}x_ring_stats() that will be used by both
standard and ZC datapath.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The "qv_info" ptr cannot be NULL when it gets the address of
an element of the flexible array "qvlist_info->qv_info".
Detected using the static analysis tool - Svace.
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
There are currently two pairs of identical checks and calls
to iavf_{add|del}_cloud_filter().
Detected using the static analysis tool - Svace.
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Following the addition of new GTP RSS hash options to ethtool.h, this patch
implements the corresponding RSS settings for GTP packets in the Intel ice
driver. It enables users to configure RSS for GTP-U and GTP-C traffic over IPv4
and IPv6, utilizing the newly defined hash options.
The implementation covers the handling of gtpu(4|6), gtpc(4|6), gtpc(4|6)t,
gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d traffic, providing enhanced load
distribution for GTP traffic across multiple processing units.
Signed-off-by: Takeru Hayasaka <hayatake396@gmail.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Simplify stats accumulation logic to fix the case where we don't take
previous stat value into account, we should always respect it.
Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
magnitude too big during OpenStack reconfiguration events, possibly other
reconfiguration cases too.
The regression was reported to be between 6.1 and 6.2, so I was almost
certain that on of the two "preserve stats over reset" commits were the
culprit. While reading the code, it was found that in some cases we will
increase the stats by arbitrarily large number (thanks to ignoring "-prev"
part of condition, after zeroing it).
Note that this fixes also the case where we were around limits of u64, but
that was not the regression reported.
Full disclosure: I remember suggesting this particular piece of code to
Ben a few years ago, so blame on me.
Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset")
Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>
Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
Reported-by: Christian Rohmann <christian.rohmann@inovex.de>
Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix "double" clearing of interrupts, which can cause external events
or timestamps to be missed.
The E1000_TSIRC Time Sync Interrupt Cause register can be cleared in two
ways, by either reading it or by writing '1' into the specific cause
bit. This is documented in section 8.16.1.
The following flow was used:
1. read E1000_TSIRC into 'tsicr';
2. handle the interrupts present into 'tsirc' and mark them in 'ack';
3. write 'ack' into E1000_TSICR;
As both (1) and (3) will clear the interrupt cause, if the same
interrupt happens again between (1) and (3) it will be ignored,
causing events to be missed.
Remove the extra clear in (3).
Fixes: 00c65578b47b ("igb: enable internal PPS for the i210")
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix "double" clearing of interrupts, which can cause external events
or timestamps to be missed.
The IGC_TSIRC Time Sync Interrupt Cause register can be cleared in two
ways, by either reading it or by writing '1' into the specific cause
bit. This is documented in section 8.16.1.
The following flow was used:
1. read IGC_TSIRC into 'tsicr';
2. handle the interrupts present in 'tsirc' and mark them in 'ack';
3. write 'ack' into IGC_TSICR;
As both (1) and (3) will clear the interrupt cause, if the same
interrupt happens again between (1) and (3) it will be ignored,
causing events to be missed.
Remove the extra clear in (3).
Fixes: 2c344ae24501 ("igc: Add support for TX timestamping")
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-04 (ice)
This series contains updates to ice driver only.
Jake changes the driver to use relative VSI index for VF VSIs as the VF
driver has no direct use of the VSI number on ice hardware. He also
reworks some Tx/Rx functions to clarify their uses, cleans up some style
issues, and utilizes kernel helper functions.
Maciej removes a redundant call to disable Tx queues on ifdown and
removes some unnecessary devm usages.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
From: Tony Nguyen <anthony.l.nguyen@intel.com>
To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
edumazet@google.com, netdev@vger.kernel.org
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>, alan.brady@intel.com
Tony Nguyen says:
====================
idpf: refactor virtchnl messages
Alan Brady says:
The motivation for this series has two primary goals. We want to enable
support of multiple simultaneous messages and make the channel more
robust. The way it works right now, the driver can only send and receive
a single message at a time and if something goes really wrong, it can
lead to data corruption and strange bugs.
To start the series, we introduce an idpf_virtchnl.h file. This reduces
the burden on idpf.h which is overloaded with struct and function
declarations.
The conversion works by conceptualizing a send and receive as a
"virtchnl transaction" (idpf_vc_xn) and introducing a "transaction
manager" (idpf_vc_xn_manager). The vcxn_mngr will init a ring of
transactions from which the driver will pop from a bitmap of free
transactions to track in-flight messages. Instead of needing to handle a
complicated send/recv for every a message, the driver now just needs to
fill out a xn_params struct and hand it over to idpf_vc_xn_exec which
will take care of all the messy bits. Once a message is sent and
receives a reply, we leverage the completion API to signal the received
buffer is ready to be used (assuming success, or an error code
otherwise).
At a low-level, this implements the "sw cookie" field of the virtchnl
message descriptor to enable this. We have 16 bits we can put whatever
we want and the recipient is required to apply the same cookie to the
reply for that message. We use the first 8 bits as an index into the
array of transactions to enable fast lookups and we use the second 8
bits as a salt to make sure each cookie is unique for that message. As
transactions are received in arbitrary order, it's possible to reuse a
transaction index and the salt guards against index conflicts to make
certain the lookup is correct. As a primitive example, say index 1 is
used with salt 1. The message times out without receiving a reply so
index 1 is renewed to be ready for a new transaction, we report the
timeout, and send the message again. Since index 1 is free to be used
again now, index 1 is again sent but now salt is 2. This time we do get
a reply, however it could be that the reply is _actually_ for the
previous send index 1 with salt 1. Without the salt we would have no
way of knowing for sure if it's the correct reply, but with we will know
for certain.
Through this conversion we also get several other benefits. We can now
more appropriately handle asynchronously sent messages by providing
space for a callback to be defined. This notably allows us to handle MAC
filter failures better; previously we could potentially have stale,
failed filters in our list, which shouldn't really have a major impact
but is obviously not correct. I also managed to remove fairly
significant more lines than I added which is a win in my book.
Additionally, this converts some variables to use auto-variables where
appropriate. This makes the alloc paths much cleaner and less prone to
memory leaks. We also fix a few virtchnl related bugs while we're here.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-05 (idpf, ice, i40e, igc, e1000e)
This series contains updates to idpf, ice, i40e, igc and e1000e drivers.
Emil disables local BH on NAPI schedule for proper handling of softirqs
on idpf.
Jake stops reporting of virtchannel RSS option which in unsupported on
ice.
Rand Deeb adds null check to prevent possible null pointer dereference
on ice.
Michal Schmidt moves DPLL mutex initialization to resolve uninitialized
mutex usage for ice.
Jesse fixes incorrect variable usage for calculating Tx stats on ice.
Ivan Vecera corrects logic for firmware equals check on i40e.
Florian Kauer prevents memory corruption for XDP_REDIRECT on igc.
Sasha reverts an incorrect use of FIELD_GET which caused a regression
for Wake on LAN on e1000e.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Older versions of GCC really want to know the full definition
of the type involved in rcu_assign_pointer().
struct dpll_pin is defined in a local header, net/core can't
reach it. Move all the netdev <> dpll code into dpll, where
the type is known. Otherwise we'd need multiple function calls
to jump between the compilation units.
This is the same problem the commit under fixes was trying to address,
but with rcu_assign_pointer() not rcu_dereference().
Some of the exports are not needed, networking core can't
be a module, we only need exports for the helpers used by
drivers.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/all/35a869c8-52e8-177-1d4d-e57578b99b6@linux-m68k.org/
Fixes: 640f41ed33b5 ("dpll: fix build failure due to rcu_dereference_check() on unknown type")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240305013532.694866-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactoring of the field get conversion introduced a regression in the
legacy Wake On Lan from a magic packet with i219 devices. Rx address
copied not correctly from MAC to PHY with FIELD_GET macro.
Fixes: b9a452545075 ("intel: legacy: field get conversion")
Suggested-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When a frame can not be transmitted in XDP_REDIRECT
(e.g. due to a full queue), it is necessary to free
it by calling xdp_return_frame_rx_napi.
However, this is the responsibility of the caller of
the ndo_xdp_xmit (see for example bq_xmit_all in
kernel/bpf/devmap.c) and thus calling it inside
igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
driver) as well will lead to memory corruption.
In fact, bq_xmit_all expects that it can return all
frames after the last successfully transmitted one.
Therefore, break for the first not transmitted frame,
but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
This is equally implemented in other Intel drivers
such as the igb.
There are two alternatives to this that were rejected:
1. Return num_frames as all the frames would have been
transmitted and release them inside igc_xdp_xmit.
While it might work technically, it is not what
the return value is meant to represent (i.e. the
number of SUCCESSFULLY transmitted packets).
2. Rework kernel/bpf/devmap.c and all drivers to
support non-consecutively dropped packets.
Besides being complex, it likely has a negative
performance impact without a significant gain
since it is anyway unlikely that the next frame
can be transmitted if the previous one was dropped.
The memory corruption can be reproduced with
the following script which leads to a kernel panic
after a few seconds. It basically generates more
traffic than a i225 NIC can transmit and pushes it
via XDP_REDIRECT from a virtual interface to the
physical interface where frames get dropped.
#!/bin/bash
INTERFACE=enp4s0
INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`
sudo ip link add dev veth1 type veth peer name veth2
sudo ip link set up $INTERFACE
sudo ip link set up veth1
sudo ip link set up veth2
cat << EOF > redirect.bpf.c
SEC("prog")
int redirect(struct xdp_md *ctx)
{
return bpf_redirect($INTERFACE_IDX, 0);
}
char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
sudo ip link set veth2 xdp obj redirect.bpf.o
cat << EOF > pass.bpf.c
SEC("prog")
int pass(struct xdp_md *ctx)
{
return XDP_PASS;
}
char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
sudo ip link set $INTERFACE xdp obj pass.bpf.o
cat << EOF > trafgen.cfg
{
/* Ethernet Header */
0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 1, 1, # IP Src - adapt as needed
10, 0, 1, 2, # IP Dest - adapt as needed
const16(6666), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
EOF
sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp
Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Helper i40e_is_fw_ver_eq() compares incorrectly given firmware version
as it returns true when the major version of running firmware is
greater than the given major version that is wrong and results in
failure during getting of DCB configuration where this helper is used.
Fix the check and return true only if the running FW version is exactly
equals to the given version.
Reproducer:
1. Load i40e driver
2. Check dmesg output
[root@host ~]# modprobe i40e
[root@host ~]# dmesg | grep 'i40e.*DCB'
[ 74.750642] i40e 0000:02:00.0: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL
[ 74.759770] i40e 0000:02:00.0: DCB init failed -5, disabled
[ 74.966550] i40e 0000:02:00.1: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL
[ 74.975683] i40e 0000:02:00.1: DCB init failed -5, disabled
Fixes: cf488e13221f ("i40e: Add other helpers to check version of running firmware and AQ API")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix an obviously incorrect assignment, created with a typo or cut-n-paste
error.
Fixes: 5995ef88e3a8 ("ice: realloc VSI stats arrays")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The pf->dplls.lock mutex is initialized too late, after its first use.
Move it to the top of ice_dpll_init.
Note that the "err_exit" error path destroys the mutex. And the mutex is
the last thing destroyed in ice_dpll_deinit.
This fixes the following warning with CONFIG_DEBUG_MUTEXES:
ice 0000:10:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.36.0
ice 0000:10:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
ice 0000:10:00.0: PTP init successful
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 0 PID: 410 at kernel/locking/mutex.c:587 __mutex_lock+0x773/0xd40
Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ice(+) nvme nvme_c>
CPU: 0 PID: 410 Comm: kworker/0:4 Not tainted 6.8.0-rc5+ #3
Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 10/19/2023
Workqueue: events work_for_cpu_fn
RIP: 0010:__mutex_lock+0x773/0xd40
Code: c0 0f 84 1d f9 ff ff 44 8b 35 0d 9c 69 01 45 85 f6 0f 85 0d f9 ff ff 48 c7 c6 12 a2 a9 85 48 c7 c7 12 f1 a>
RSP: 0018:ff7eb1a3417a7ae0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff85ac2bff RDI: 00000000ffffffff
RBP: ff7eb1a3417a7b80 R08: 0000000000000000 R09: 00000000ffffbfff
R10: ff7eb1a3417a7978 R11: ff32b80f7fd2e568 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ff32b7f02c50e0d8
FS: 0000000000000000(0000) GS:ff32b80efe800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b5852cc000 CR3: 000000003c43a004 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x84/0x170
? __mutex_lock+0x773/0xd40
? report_bug+0x1c7/0x1d0
? prb_read_valid+0x1b/0x30
? handle_bug+0x42/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? __mutex_lock+0x773/0xd40
? rcu_is_watching+0x11/0x50
? __kmalloc_node_track_caller+0x346/0x490
? ice_dpll_lock_status_get+0x28/0x50 [ice]
? __pfx_ice_dpll_lock_status_get+0x10/0x10 [ice]
? ice_dpll_lock_status_get+0x28/0x50 [ice]
ice_dpll_lock_status_get+0x28/0x50 [ice]
dpll_device_get_one+0x14f/0x2e0
dpll_device_event_send+0x7d/0x150
dpll_device_register+0x124/0x180
ice_dpll_init_dpll+0x7b/0xd0 [ice]
ice_dpll_init+0x224/0xa40 [ice]
? _dev_info+0x70/0x90
ice_load+0x468/0x690 [ice]
ice_probe+0x75b/0xa10 [ice]
? _raw_spin_unlock_irqrestore+0x4f/0x80
? process_one_work+0x1a3/0x500
local_pci_probe+0x47/0xa0
work_for_cpu_fn+0x17/0x30
process_one_work+0x20d/0x500
worker_thread+0x1df/0x3e0
? __pfx_worker_thread+0x10/0x10
kthread+0x103/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
irq event stamp: 125197
hardirqs last enabled at (125197): [<ffffffff8416409d>] finish_task_switch.isra.0+0x12d/0x3d0
hardirqs last disabled at (125196): [<ffffffff85134044>] __schedule+0xea4/0x19f0
softirqs last enabled at (105334): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60
softirqs last disabled at (105332): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60
---[ end trace 0000000000000000 ]---
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The function ice_bridge_setlink() may encounter a NULL pointer dereference
if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently
in nla_for_each_nested(). To address this issue, add a check to ensure that
br_spec is not NULL before proceeding with the nested attribute iteration.
Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The E800 series hardware uses the same iAVF driver as older devices,
including the virtchnl negotiation scheme.
This negotiation scheme includes a mechanism to determine what type of RSS
should be supported, including RSS over PF virtchnl messages, RSS over
firmware AdminQ messages, and RSS via direct register access.
The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if its
supported by the VF driver. However, if an older VF driver is loaded, it
may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ.
The ice driver happily agrees to support these methods. Unfortunately, the
underlying hardware does not support these mechanisms. The E800 series VFs
don't have the appropriate registers for RSS_REG. The mailbox queue used by
VFs for VF to PF communication blocks messages which do not have the
VF-to-PF opcode.
Stop lying to the VF that it could support RSS over AdminQ or registers, as
these interfaces do not work when the hardware is operating on an E800
series device.
In practice this is unlikely to be hit by any normal user. The iAVF driver
has supported RSS over PF virtchnl commands since 2016, and always defaults
to using RSS_PF if possible.
In principle, nothing actually stops the existing VF from attempting to
access the registers or send an AQ command. However a properly coded VF
will check the capability flags and will report a more useful error if it
detects a case where the driver does not support the RSS offloads that it
does.
Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix softirq's not being handled during napi_schedule() call when
receiving marker packets for queue disable by disabling local bottom
half.
The issue can be seen on ifdown:
NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
Using ftrace to catch the failing scenario:
ifconfig [003] d.... 22739.830624: softirq_raise: vec=3 [action=NET_RX]
<idle>-0 [003] ..s.. 22739.831357: softirq_entry: vec=3 [action=NET_RX]
No interrupt and CPU is idle.
After the patch when disabling local BH before calling napi_schedule:
ifconfig [003] d.... 22993.928336: softirq_raise: vec=3 [action=NET_RX]
ifconfig [003] ..s1. 22993.928337: softirq_entry: vec=3 [action=NET_RX]
Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support")
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add curly braces to avoid entering to an if statement where it is not
always required in e1000_shutdown function.
This improves code readability and might prevent non-deterministic
behaviour in the future.
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240301184806.2634508-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When IGC is built-in but LEDS_CLASS is a loadable module, there is
a link failure:
x86_64-linux-ld: drivers/net/ethernet/intel/igc/igc_leds.o: in function `igc_led_setup':
igc_leds.c:(.text+0x75c): undefined reference to `devm_led_classdev_register_ext'
Add another dependency that prevents this combination.
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240301184806.2634508-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Added support for 1000BASE-BX, i.e. Gigabit Ethernet over single strand
of single-mode fiber.
The initialization of a 1000BASE-BX SFP is the same as 1000BASE-SX/LX
with the only difference that the Bit Rate Nominal Value must be
checked to make sure it is a Gigabit Ethernet transceiver, as described
by the SFF-8472 specification.
This was tested with the FS.com SFP-GE-BX 1310/1490nm 10km transceiver:
$ ethtool -m eth4
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x00 0x00 0x00 0x40 0x00 0x00 0x00 0x00 0x00
Transceiver type : Ethernet: BASE-BX10
Encoding : 0x01 (8B/10B)
BR, Nominal : 1300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 10km
Length (SMF) : 10000m
Length (50um) : 0m
Length (62.5um) : 0m
Length (Copper) : 0m
Length (OM3) : 0m
Laser wavelength : 1310nm
Vendor name : FS
Vendor OUI : 64:9d:99
Vendor PN : SFP-GE-BX
Vendor rev :
Option values : 0x20 0x0a
Option : RX_LOS implemented
Option : TX_FAULT implemented
Option : Power level 3 requirement
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : S2202359108
Date code : 220307
Optical diagnostics support : Yes
Laser bias current : 17.650 mA
Laser output power : 0.2132 mW / -6.71 dBm
Receiver signal average optical power : 0.2740 mW / -5.62 dBm
Module temperature : 47.30 degrees C / 117.13 degrees F
Module voltage : 3.2576 V
Alarm/warning flags implemented : Yes
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser output power high alarm : Off
Laser output power low alarm : Off
Laser output power high warning : Off
Laser output power low warning : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser rx power high alarm : Off
Laser rx power low alarm : Off
Laser rx power high warning : Off
Laser rx power low warning : Off
Laser bias current high alarm threshold : 110.000 mA
Laser bias current low alarm threshold : 1.000 mA
Laser bias current high warning threshold : 100.000 mA
Laser bias current low warning threshold : 1.000 mA
Laser output power high alarm threshold : 0.7079 mW / -1.50 dBm
Laser output power low alarm threshold : 0.0891 mW / -10.50 dBm
Laser output power high warning threshold : 0.6310 mW / -2.00 dBm
Laser output power low warning threshold : 0.1000 mW / -10.00 dBm
Module temperature high alarm threshold : 90.00 degrees C / 194.00 degrees F
Module temperature low alarm threshold : -45.00 degrees C / -49.00 degrees F
Module temperature high warning threshold : 85.00 degrees C / 185.00 degrees F
Module temperature low warning threshold : -40.00 degrees C / -40.00 degrees F
Module voltage high alarm threshold : 3.7950 V
Module voltage low alarm threshold : 2.8050 V
Module voltage high warning threshold : 3.4650 V
Module voltage low warning threshold : 3.1350 V
Laser rx power high alarm threshold : 0.7079 mW / -1.50 dBm
Laser rx power low alarm threshold : 0.0028 mW / -25.53 dBm
Laser rx power high warning threshold : 0.6310 mW / -2.00 dBm
Laser rx power low warning threshold : 0.0032 mW / -24.95 dBm
Signed-off-by: Ernesto Castellotti <ernesto@castellotti.net>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240301184806.2634508-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Linux users sometimes need an easy way to check current values of module
parameters. For example the module may be manually reloaded with different
parameters. Make these visible and readable in the /sys filesystem to allow
that. But don't make the "debug" module parameter visible as debugging is
enabled via ethtool msglvl.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240301184806.2634508-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
1. pcaps are free'd right after AQ routines are done, no need for
devm_'s
2. a test frame for loopback test in ethtool -t is destroyed at the end
of the test so we don't need devm_ here either.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_down() clears QINT_TQCTL_CAUSE_ENA_M bit twice, which is not
necessary. First clearing happens in ice_vsi_dis_irq() and second in
ice_vsi_stop_tx_ring() - remove the first one.
While at it, make ice_vsi_dis_irq() static as ice_down() is the only
current caller of it.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The indentation for ice_set_ctx and ice_write_rxq_ctx breaks the function
name after the return type. This style of breaking is used a lot throughout
the ice driver, even in cases where its not actually helpful for
readability. We no longer prefer this style of line splitting in the
driver, and new code is avoiding it.
Normally, I would leave this alone unless the actual function contents or
description needed updating. However, a future change is going to add
inverse functions for converting packed context to unpacked context
structures. To keep this code uniform with the existing set functions, fix
up the style to the modern format of keeping the type on the same line.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The functions used to pack the Tx and Rx context into the hardware format
rely on using BIT() and then subtracting 1 to get a bitmask. These
functions even have a comment about how x86 machines can't use this method
for certain widths because the SHL instructions will not work properly.
The Linux kernel already provides the GENMASK macro for generating a
suitable bitmask. Further, GENMASK is capable of generating the mask
including the shift_width. Since width is the total field width, take care
to subtract one to get the final bit position.
Since we now include the shifted bits as part of the mask, shift the source
value first before applying the mask.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In ice_common.c there are 4 functions used for converting the unpacked
software Tx and Rx context structure data into the packed format used by
hardware. These functions have extremely generic names:
* ice_write_byte
* ice_write_word
* ice_write_dword
* ice_write_qword
When I saw these function names my first thought was "write what? to
where?". Understanding what these functions do requires looking at the
implementation details. The functions take bits from an unpacked structure
and copy them into the packed layout used by hardware.
As part of live migration, we will want functions which perform the inverse
operation of reading bits from the packed layout and copying them into the
unpacked format. Naming these as "ice_read_byte", etc would be very
confusing since they appear to write data.
In preparation for adding this new inverse operation, rename the existing
functions to use the prefix "ice_pack_ctx_". This makes it clear that they
perform the bit packing while copying from the unpacked software context
structure to the packed hardware context.
The inverse operations can then neatly be named ice_unpack_ctx_*, clearly
indicating they perform the bit unpacking while copying from the packed
hardware context to the unpacked software context structure.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The lan_vsi_num field of the VF structure is no longer used for any
purpose. Remove it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When initializing over virtchnl, the PF is required to pass a VSI ID to the
VF as part of its capabilities exchange. The VF driver reports this value
back to the PF in a variety of commands. The PF driver validates that this
value matches the value it sent to the VF.
Some hardware families such as the E700 series could use this value when
reading RSS registers or communicating directly with firmware over the
Admin Queue.
However, E800 series hardware does not support any of these interfaces and
the VF's only use for this value is to report it back to the PF. Thus,
there is no requirement that this value be an actual VSI ID value of any
kind.
The PF driver already does not trust that the VF sends it a real VSI ID.
The VSI structure is always looked up from the VF structure. The PF does
validate that the VSI ID provided matches a VSI associated with the VF, but
otherwise does not use the VSI ID for any purpose.
Instead of reporting the VSI number relative to the PF space, report a
fixed value of 1. When communicating with the VF over virtchnl, validate
that the VSI number is returned appropriately.
This avoids leaking information about the firmware of the PF state.
Currently the ice driver only supplies a VF with a single VSI. However, it
appears that virtchnl has some support for allowing multiple VSIs. I did
not attempt to implement this. However, space is left open to allow further
relative indexes if additional VSIs are provided in future feature
development. For this reason, keep the ice_vc_isvalid_vsi_id function in
place to allow extending it for multiple VSIs in the future.
This change will also simplify handling of live migration in a future
series. Since we no longer will provide a real VSI number to the VF, there
will be no need to keep track of this number when migrating to a new host.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_vc_fdir_param_check() function validates that the VSI ID of the
virtchnl flow director command matches the VSI number of the VF. This is
already checked by the call to ice_vc_isvalid_vsi_id() immediately
following this.
This check is unnecessary since ice_vc_isvalid_vsi_id() already confirms
this by checking that the VSI ID can locate the VSI associated with the VF
structure.
Furthermore, a following change is going to refactor the ice driver to
report VSI IDs using a relative index for each VF instead of reporting the
PF VSI number. This additional check would break that logic since it
enforces that the VSI ID matches the VSI number.
Since this check duplicates the logic in ice_vc_isvalid_vsi_id() and gets
in the way of refactoring that logic, remove it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_vc_isvalid_q_id() function takes a VSI index and a queue ID. It
looks up the VSI from its index, and then validates that the queue number
is valid for that VSI.
The VSI ID passed is typically a VSI index from the VF. This VSI number is
validated by the PF to ensure that it matches the VSI associated with the
VF already.
In every flow where ice_vc_isvalid_q_id() is called, the PF driver already
has a pointer to the VSI associated with the VF. This pointer is obtained
using ice_get_vf_vsi(), rather than looking up the VSI using the index sent
by the VF.
Since we already know which VSI to operate on, we can modify
ice_vc_isvalid_q_id() to take a VSI pointer instead of a VSI index. Pass
the VSI we found from ice_get_vf_vsi() instead of re-doing the lookup. This
removes some unnecessary computation and scanning of the VSI list.
It also removes the last place where the driver directly used the VSI
number from the VF. This will pave the way for refactoring to communicate
relative VSI numbers to the VF instead of absolute numbers from the PF
space.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This error message is at best not really helpful and at worst
misleading. If we're here in idpf_intr_rel we're likely trying to do
remove or reset. If we're in reset, this message will fail because we
lose the virtchnl on reset and HW is going to clean up those resources
regardless in that case. If we're in remove and we get an error here,
we're going to reset the device at the end of remove anyway so not a big
deal. Just remove this message it's not useful.
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
While we're here improving virtchnl we can include two minor fixes for
the lower level ctrlq flow.
This adds a memory barrier to idpf_post_rx_buffs before we update tail
on the controlq. We should make sure our writes have had a chance to
finish before we tell HW it can touch them.
This also removes some defensive programming in idpf_ctrlq_recv. The
caller should not be using a num_q_msg value of zero or more than the
ring size and it's their responsibility to call functions sanely.
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In idpf_remove we need to tear down the virtchnl core with
idpf_vc_core_deinit so we can free up resources and leave things in a
good state. However, in the case where we failed to establish VC
communications we may not have ever actually successfully initialized
the virtchnl core.
This fixes it by setting a bit once we successfully init the virtchnl
core. Then, in deinit, we'll check for it before going on further,
otherwise we just return. Also clear the bit at the end of deinit so we
know it's gone now.
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
We can now remove a bunch of gross code we don't need anymore like the
vc state bits and vc_buf_lock since everything is using transaction API
now.
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Now that all the messages are using the transaction API, we can rework
idpf_recv_mb_msg quite a lot to simplify it. Due to this, we remove
idpf_find_vport as no longer used and alter idpf_recv_event_msg
slightly.
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
There are situations where the driver needs to add a MAC filter but
we're explicitly not allowed to sleep so we can wait for a virtchnl
message to complete.
This adds an async_handler for asynchronously sent messages for MAC
filters so that we can better handle if there's an error of some kind.
If success we don't need to do anything else, but if we failed to
program the new filter we really should remove it from our list of MAC
filters. If we don't remove bad filters, what I expect to happen is
after a reset of some kind we try to program the MAC filter again and it
fails again. This is clearly wrong and I would expect to be confusing
for the user.
It could also be the failure is for a delete MAC filter message but
those filters get deleted regardless. Not much we can do about a delete
failure.
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This takes care of RSS/SRIOV/MAC and other misc virtchnl messages. This
again is mostly mechanical.
In absence of an async_handler for MAC filters, this will simply
generically report any errors from idpf_vc_xn_forward_async. This
maintains the existing behavior. Follow up patch will add an async
handler for MAC filters to remove bad filters from our list.
While we're here we can also make the code much nicer by converting some
variables to auto-variables where appropriate. This makes it cleaner and
less prone to memory leaking.
There's still a bit more cleanup we can do here to remove stuff that's
not being used anymore now; follow-up patches will take care of loose
ends.
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|