summaryrefslogtreecommitdiff
path: root/drivers/net/dsa/ocelot/felix_vsc9959.c
AgeCommit message (Collapse)Author
2022-09-07net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in ↵Vladimir Oltean
vsc9959_sched_speed_set The read-modify-write of QSYS_TAG_CONFIG from vsc9959_sched_speed_set() runs unlocked with respect to the other functions that access it, which are vsc9959_tas_guard_bands_update(), vsc9959_qos_port_tas_set() and vsc9959_tas_clock_adjust(). All the others are under ocelot->tas_lock, so move the vsc9959_sched_speed_set() access under that lock as well, to resolve the concurrency. Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07net: dsa: felix: disable cut-through forwarding for frames oversized for ↵Vladimir Oltean
tc-taprio Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605, frames even way larger than 601 octets are transmitted even though these should be considered as oversized, according to the documentation, and dropped. Since oversized frame dropping depends on frame size, which is only known at the EOF stage, and therefore not at SOF when cut-through forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_* into consideration for traffic classes that are cut-through. Since cut-through forwarding has no UAPI to control it, and the driver enables it based on the mantra "if we can, then why not", the strategy is to alter vsc9959_cut_through_fwd() to take into consideration which tc's have oversize frame dropping enabled, and disable cut-through for them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the cut-through determination process. There are 2 strategies for vsc9959_cut_through_fwd() to determine whether a tc has oversized dropping enabled or not. One is to keep a bit mask of traffic classes per port, and the other is to read back from the hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the feature is enabled). We choose reading back from registers, because struct ocelot_port is shared with drivers (ocelot, seville) that don't support either cut-through nor tc-taprio, and we don't have a felix specific extension of struct ocelot_port. Furthermore, reading registers from the Felix hardware is quite cheap, since they are memory-mapped. Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07net: dsa: felix: tc-taprio intervals smaller than MTU should send at least ↵Vladimir Oltean
one packet The blamed commit broke tc-taprio schedules such as this one: tc qdisc replace dev $swp1 root taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 \ sched-entry S 0x7f 990000 \ sched-entry S 0x80 10000 \ flags 0x2 because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard band added earlier than its 'gate close' event, such that packet overruns won't occur in the worst case of the largest packet possible. Since guard bands are statically determined based on the per-tc QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU, we need to discuss what happens with TC 7 depending on kernel version, since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port"), did not touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU. 1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to 1518, and at gigabit this introduces a static guard band (independent of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit time of 20 octets => 160 ns). But this is larger than the time window itself, of 10000 ns. So, the queue system never considers a frame with TC 7 as eligible for transmission, since the gate practically never opens, and these frames are forever stuck in the TX queues and hang the port. 2 (after vsc9959_tas_guard_bands_update): Under the sole goal of enabling oversized frame dropping, we make an effort to set QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays one more role, which we did not take into account: per-tc static guard band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead). There is a discrepancy between what the driver thinks (that there is no guard band, and 100% of min_gate_len[tc] is available for egress scheduling) and what the hardware actually does (crops the equivalent of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this means that the hardware thinks it has exactly 0 ns for scheduling tc 7. In both cases, even minimum sized Ethernet frames are stuck on egress rather than being considered for scheduling on TC 7, even if they would fit given a proper configuration. Considering the current situation, with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230 octets in size are not eligible for oversized dropping (because they are smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible for scheduling either, because the min_gate_len[7] (10000 ns) minus the guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per octet == 9840 ns) minus the guard band auto-added for L1 overhead by QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 octets) leaves 0 ns for scheduling in the queue system proper. Investigating the hardware behavior, it becomes apparent that the queue system needs precisely 33 ns of 'gate open' time in order to consider a frame as eligible for scheduling to a tc. So the solution to this problem is to amend vsc9959_tas_guard_bands_update(), by giving the per-tc guard bands less space by exactly 33 ns, just enough for one frame to be scheduled in that interval. This allows the queue system to make forward progress for that port-tc, and prevents it from hanging. Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode") Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offsetVladimir Oltean
With so many counter addresses recently discovered as being wrong, it is desirable to at least have a central database of information, rather than two: one through the SYS_COUNT_* registers (used for ndo_get_stats64), and the other through the offset field of struct ocelot_stat_layout elements (used for ethtool -S). The strategy will be to keep the SYS_COUNT_* definitions as the single source of truth, but for that we need to expand our current definitions to cover all registers. Then we need to convert the ocelot region creation logic, and stats worker, to the read semantics imposed by going through SYS_COUNT_* absolute register addresses, rather than offsets of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been SYS_CNT, by the way). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: make struct ocelot_stat_layout array indexableVladimir Oltean
The ocelot counters are 32-bit and require periodic reading, every 2 seconds, by ocelot_port_update_stats(), so that wraparounds are detected. Currently, the counters reported by ocelot_get_stats64() come from the 32-bit hardware counters directly, rather than from the 64-bit accumulated ocelot->stats, and this is a problem for their integrity. The strategy is to make ocelot_get_stats64() able to cherry-pick individual stats from ocelot->stats the way in which it currently reads them out from SYS_COUNT_* registers. But currently it can't, because ocelot->stats is an opaque u64 array that's used only to feed data into ethtool -S. To solve that problem, we need to make ocelot->stats indexable, and associate each element with an element of struct ocelot_stat_layout used by ethtool -S. This makes ocelot_stat_layout a fat (and possibly sparse) array, so we need to change the way in which we access it. We no longer need OCELOT_STAT_END as a sentinel, because we know the array's size (OCELOT_NUM_STATS). We just need to skip the array elements that were left unpopulated for the switch revision (ocelot, felix, seville). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: turn stats_lock into a spinlockVladimir Oltean
ocelot_get_stats64() currently runs unlocked and therefore may collide with ocelot_port_update_stats() which indirectly accesses the same counters. However, ocelot_get_stats64() runs in atomic context, and we cannot simply take the sleepable ocelot->stats_lock mutex. We need to convert it to an atomic spinlock first. Do that as a preparatory change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: fix incorrect ndo_get_stats64 packet countersVladimir Oltean
Reading stats using the SYS_COUNT_* register definitions is only used by ocelot_get_stats64() from the ocelot switchdev driver, however, currently the bucket definitions are incorrect. Separately, on both RX and TX, we have the following problems: - a 256-1023 bucket which actually tracks the 256-511 packets - the 1024-1526 bucket actually tracks the 512-1023 packets - the 1527-max bucket actually tracks the 1024-1526 packets => nobody tracks the packets from the real 1527-max bucket Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS all track the wrong thing. However this doesn't seem to have any consequence, since ocelot_get_stats64() doesn't use these. Even though this problem only manifests itself for the switchdev driver, we cannot split the fix for ocelot and for DSA, since it requires fixing the bucket definitions from enum ocelot_reg, which makes us necessarily adapt the structures from felix and seville as well. Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet countersVladimir Oltean
What the driver actually reports as 256-511 is in fact 512-1023, and the TX packets in the 256-511 bucket are not reported. Fix that. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08net: dsa: felix: fix min gate len calculation for tc when its first gate is ↵Vladimir Oltean
closed min_gate_len[tc] is supposed to track the shortest interval of continuously open gates for a traffic class. For example, in the following case: TC 76543210 t0 00000001b 200000 ns t1 00000010b 200000 ns min_gate_len[0] and min_gate_len[1] should be 200000, while min_gate_len[2-7] should be 0. However what happens is that min_gate_len[0] is 200000, but min_gate_len[1] ends up being 0 (despite gate_len[1] being 200000 at the point where the logic detects the gate close event for TC 1). The problem is that the code considers a "gate close" event whenever it sees that there is a 0 for that TC (essentially it's level rather than edge triggered). By doing that, any time a gate is seen as closed without having been open prior, gate_len, which is 0, will be written into min_gate_len. Once min_gate_len becomes 0, it's impossible for it to track anything higher than that (the length of actually open intervals). To fix this, we make the writing to min_gate_len[tc] be edge-triggered, which avoids writes for gates that are closed in consecutive intervals. However what this does is it makes us need to special-case the permanently closed gates at the end. Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220804202817.1677572-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30time64.h: consolidate uses of PSEC_PER_NSECVladimir Oltean
Time-sensitive networking code needs to work with PTP times expressed in nanoseconds, and with packet transmission times expressed in picoseconds, since those would be fractional at higher than gigabit speed when expressed in nanoseconds. Convert the existing uses in tc-taprio and the ocelot/felix DSA driver to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h because the vDSO library does not (yet) need/use it. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the ↵Vladimir Oltean
port Currently, sending a packet into a time gate too small for it (or always closed) causes the queue system to hold the frame forever. Even worse, this frame isn't subject to aging either, because for that to happen, it needs to be scheduled for transmission in the first place. But the frame will consume buffer memory and frame references while it is forever held in the queue system. Before commit a4ae997adcbd ("net: mscc: ocelot: initialize watermarks to sane defaults"), this behavior was somewhat subtle, as the switch had a more intricately tuned default watermark configuration out of reset, which did not allow any single port and tc to consume the entire switch buffer space. Nonetheless, the held frames are still there, and they reduce the total backplane capacity of the switch. However, after the aforementioned commit, the behavior can be very clearly seen, since we deliberately allow each {port, tc} to consume the entire shared buffer of the switch minus the reservations (and we disable all reservations by default). That is to say, we allow a permanently closed tc-taprio gate to hang the entire switch. A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU per-port-tc registers serve 2 purposes: one is for guard band calculation (when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to enable oversized frame dropping (when non-zero). Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame dropping is disabled. The goal of the change is to enable it seamlessly. For that, we need to hook into the MTU change, tc-taprio change, and port link speed change procedures, since we depend on these variables. Frames are not dropped on egress due to a queue system oversize condition, instead that egress port is simply excluded from the mask of valid destination ports for the packet. If there are no destination ports at all, the ingress counter that increments is the generic "drop_tail" in ethtool -S. The issue exists in various forms since the tc-taprio offload was introduced. Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Reported-by: Richie Pearn <richard.pearn@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30net: dsa: felix: keep QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) out of rmwVladimir Oltean
In vsc9959_tas_clock_adjust(), the INIT_GATE_STATE field is not changed, only the ENABLE field. Similarly for the disabling of the time-aware shaper in vsc9959_qos_port_tas_set(). To reflect this, keep the QSYS_TAG_CONFIG_INIT_GATE_STATE_M mask out of the read-modify-write procedure to make it clearer what is the intention of the code. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30net: dsa: felix: keep reference on entire tc-taprio configVladimir Oltean
In a future change we will need to remember the entire tc-taprio config on all ports rather than just the base time, so use the taprio_offload_get() helper function to replace ocelot_port->base_time with ocelot_port->taprio. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c 9c5de246c1db ("net: sparx5: mdb add/del handle non-sparx5 devices") fbb89d02e33a ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30net: dsa: felix: fix race between reading PSFP stats and port statsVladimir Oltean
Both PSFP stats and the port stats read by ocelot_check_stats_work() are indirectly read through the same mechanism - write to STAT_CFG:STAT_VIEW, read from SYS:STAT:CNT[n]. It's just that for port stats, we write STAT_VIEW with the index of the port, and for PSFP stats, we write STAT_VIEW with the filter index. So if we allow them to run concurrently, ocelot_check_stats_work() may change the view from vsc9959_psfp_counters_get(), and vice versa. Fixes: 7d4b564d6add ("net: dsa: felix: support psfp filter on vsc9959") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220629183007.3808130-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-19net: dsa: felix: update base time of time-aware shaper when adjusting PTP timeXiaoliang Yang
When adjusting the PTP clock, the base time of the TAS configuration will become unreliable. We need reset the TAS configuration by using a new base time. For example, if the driver gets a base time 0 of Qbv configuration from user, and current time is 20000. The driver will set the TAS base time to be 20000. After the PTP clock adjustment, the current time becomes 10000. If the TAS base time is still 20000, it will be a future time, and TAS entry list will stop running. Another example, if the current time becomes to be 10000000 after PTP clock adjust, a large time offset can cause the hardware to hang. This patch introduces a tas_clock_adjust() function to reset the TAS module by using a new base time after the PTP clock adjustment. This can avoid issues above. Due to PTP clock adjustment can occur at any time, it may conflict with the TAS configuration. We introduce a new TAS lock to serialize the access to the TAS registers. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-23net: mscc: ocelot: switch from {,un}set to {,un}assign for tag_8021q CPU portsVladimir Oltean
There is a desire for the felix driver to gain support for multiple tag_8021q CPU ports, but the current model prevents it. This is because ocelot_apply_bridge_fwd_mask() only takes into consideration whether a port is a tag_8021q CPU port, but not whose CPU port it is. We need a model where we can have a direct affinity between an ocelot port and a tag_8021q CPU port. This serves as the basis for multiple CPU ports. Declare a "dsa_8021q_cpu" backpointer in struct ocelot_port which encodes that affinity. Repurpose the "ocelot_set_dsa_8021q_cpu" API to "ocelot_assign_dsa_8021q_cpu" to express the change of paradigm. Note that this change makes the first practical use of the new ocelot_port->index field in ocelot_port_unassign_dsa_8021q_cpu(), where we need to remove the old tag_8021q CPU port from the reserved VLAN range. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11net: dsa: ocelot: accept 1000base-X for VSC9959 and VSC9953Vladimir Oltean
Switches using the Lynx PCS driver support 1000base-X optical SFP modules. Accept this interface type on a port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220510164320.10313-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-30net: ethernet: ocelot: remove the need for num_stats initializerColin Foster
There is a desire to share the oclot_stats_layout struct outside of the current vsc7514 driver. In order to do so, the length of the array needs to be known at compile time, and defined in the struct ocelot and struct felix_info. Since the array is defined in a .c file and would be declared in the header file via: extern struct ocelot_stat_layout[]; the size of the array will not be known at compile time to outside modules. To fix this, remove the need for defining the number of stats at compile time and allow this number to be determined at initialization. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net: dsa: felix: suppress -EPROBE_DEFER errorsMichael Walle
The DSA master might not have been probed yet in which case the probe of the felix switch fails with -EPROBE_DEFER: [ 4.435305] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517 It is not an error. Use dev_err_probe() to demote this particular error to a debug message. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220408101521.281886-1-michael@walle.cc Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-30net: dsa: felix: fix possible NULL pointer dereferenceZheng Yongjun
As the possible failure of the allocation, kzalloc() may return NULL pointer. Therefore, it should be better to check the 'sgi' in order to prevent the dereference of NULL pointer. Fixes: 23ae3a7877718 ("net: dsa: felix: add stream gate settings for psfp"). Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220329090800.130106-1-zhengyongjun3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-21net: dsa: felix: allow PHY_INTERFACE_MODE_INTERNAL on port 5Vladimir Oltean
The Felix switch has 6 ports, 2 of which are internal. Due to some misunderstanding, my initial suggestion for vsc9959_port_modes[]: https://patchwork.kernel.org/project/netdevbpf/patch/20220129220221.2823127-10-colin.foster@in-advantage.com/#24718277 got translated by Colin into a 5-port array, leading to an all-zero port mode mask for port 5. Fixes: acf242fc739e ("net: dsa: felix: remove prevalidate_phy_mode interface") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220318195812.276276-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-28net: dsa: felix: remove prevalidate_phy_mode interfaceColin Foster
All users of the felix driver were creating their own prevalidate_phy_mode function. The same logic can be performed in a more general way by using a simple array of bit fields. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-26net: dsa: ocelot: remove interface checksRussell King (Oracle)
When the supported interfaces bitmap is populated, phylink will itself check that the interface mode is present in this bitmap. Drivers no longer need to perform this check themselves. Remove these checks. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-08net: dsa: felix: don't use devres for mdiobusVladimir Oltean
As explained in commits: 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The Felix VSC9959 switch is a PCI device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the felix switch driver on shutdown. So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The felix driver has the code structure in place for orderly mdiobus removal, so just replace devm_mdiobus_alloc_size() with the non-devres variant, and add manual free where necessary, to ensure that we don't let devres free a still-registered bus. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-02net: dsa: felix: name change for clarity from pcs to mdio_deviceColin Foster
Simple rename of a variable to make things more logical. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-02net: phy: lynx: refactor Lynx PCS module to use generic phylink_pcsColin Foster
Remove references to lynx_pcs structures so drivers like the Felix DSA can reference alternate PCS drivers. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add index to flow_action_entry structureBaowen Zheng
Add index to flow_action_entry structure and delete index from police and gate child structure. We make this change to offload tc action for driver to identify a tc action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-07net: dsa: felix: use kmemdup() to replace kmalloc + memcpyYihao Han
Fix following coccicheck warning: /drivers/net/dsa/ocelot/felix_vsc9959.c:1627:13-20: WARNING opportunity for kmemdup /drivers/net/dsa/ocelot/felix_vsc9959.c:1506:16-23: WARNING opportunity for kmemdup Signed-off-by: Yihao Han <hanyihao@vivo.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211207064419.38632-1-hanyihao@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: dsa: ocelot: felix: add interface for custom regmapsColin Foster
Add an interface so that non-mmio regmaps can be used Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: dsa: ocelot: remove unnecessary pci_bar variablesColin Foster
The pci_bar variables for the switch and imdio don't make sense for the generic felix driver. Moving them to felix_vsc9959 to limit scope and simplify the felix_info struct. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29net: dsa: felix: fix flexible_array.cocci warningskernel test robot
Zero-length and one-element arrays are deprecated, see Documentation/process/deprecated.rst Flexible-array members should be used instead. Generated by: scripts/coccinelle/misc/flexible_array.cocci Fixes: 23ae3a787771 ("net: dsa: felix: add stream gate settings for psfp") CC: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-25net: dsa: felix: enable cut-through forwarding between ports by defaultVladimir Oltean
The VSC9959 switch embedded within NXP LS1028A (and that version of Ocelot switches only) supports cut-through forwarding - meaning it can start the process of looking up the destination ports for a packet, and forward towards those ports, before the entire packet has been received (as opposed to the store-and-forward mode). The up side is having lower forwarding latency for large packets. The down side is that frames with FCS errors are forwarded instead of being dropped. However, erroneous frames do not result in incorrect updates of the FDB or incorrect policer updates, since these processes are deferred inside the switch to the end of frame. Since the switch starts the cut-through forwarding process after all packet headers (including IP, if any) have been processed, packets with large headers and small payload do not see the benefit of lower forwarding latency. There are two cases that need special attention. The first is when a packet is multicast (or flooded) to multiple destinations, one of which doesn't have cut-through forwarding enabled. The switch deals with this automatically by disabling cut-through forwarding for the frame towards all destination ports. The second is when a packet is forwarded from a port of lower link speed towards a port of higher link speed. This is not handled by the hardware and needs software intervention. Since we practically need to update the cut-through forwarding domain from paths that aren't serialized by the rtnl_mutex (phylink mac_link_down/mac_link_up ops), this means we need to serialize physical link events with user space updates of bonding/bridging domains. Enabling cut-through forwarding is done per {egress port, traffic class}. I don't see any reason why this would be a configurable option as long as it works without issues, and there doesn't appear to be any user space configuration tool to toggle this on/off, so this patch enables cut-through forwarding on all eligible ports and traffic classes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18net: dsa: felix: restrict psfp rules on ingress portXiaoliang Yang
PSFP rules take effect on the streams from any port of VSC9959 switch. This patch use ingress port to limit the rule only active on this port. Each stream can only match two ingress source ports in VSC9959. Streams from lowest port gets the configuration of SFID pointed by MAC Table lookup and streams from highest port gets the configuration of (SFID+1) pointed by MAC Table lookup. This patch defines the PSFP rule on highest port as dummy rule, which means that it does not modify the MAC table. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: dsa: felix: use vcap policer to set flow meter for psfpXiaoliang Yang
This patch add police action to set flow meter table which is defined in IEEE802.1Qci. Flow metering is two rates two buckets and three color marker to policing the frames, we only enable one rate one bucket in this patch. Flow metering shares a same policer pool with VCAP policers, so the PSFP policer calls ocelot_vcap_policer_add() and ocelot_vcap_policer_del() to set flow meter police. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: use index to set vcap policerXiaoliang Yang
Policer was previously automatically assigned from the highest index to the lowest index from policer pool. But police action of tc flower now uses index to set an police entry. This patch uses the police index to set vcap policers, so that one policer can be shared by multiple rules. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: dsa: felix: add stream gate settings for psfpXiaoliang Yang
This patch adds stream gate settings for PSFP. Use SGI table to store stream gate entries. Disable the gate entry when it is not used by any stream. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: dsa: felix: support psfp filter on vsc9959Xiaoliang Yang
VSC9959 supports Per-Stream Filtering and Policing(PSFP) that complies with the IEEE 802.1Qci standard. The stream is identified by Null stream identification(DMAC and VLAN ID) defined in IEEE802.1CB. For PSFP, four tables need to be set up: stream table, stream filter table, stream gate table, and flow meter table. Identify the stream by parsing the tc flower keys and add it to the stream table. The stream filter table is automatically maintained, and its index is determined by SGID(flow gate index) and FMID(flow meter index). Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24net: convert users of bitmap_foo() to linkmode_foo()Sean Anderson
This converts instances of bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS) to linkmode_foo(args...) I manually fixed up some lines to prevent them from being excessively long. Otherwise, this change was generated with the following semantic patch: // Generated with // echo linux/linkmode.h > includes // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \ // | sort | uniq | tee new_includes | wc -l && mv new_includes includes // and repeating until the number stopped going up @i@ @@ ( #include <linux/acpi_mdio.h> | #include <linux/brcmphy.h> | #include <linux/dsa/loop.h> | #include <linux/dsa/sja1105.h> | #include <linux/ethtool.h> | #include <linux/ethtool_netlink.h> | #include <linux/fec.h> | #include <linux/fs_enet_pd.h> | #include <linux/fsl/enetc_mdio.h> | #include <linux/fwnode_mdio.h> | #include <linux/linkmode.h> | #include <linux/lsm_audit.h> | #include <linux/mdio-bitbang.h> | #include <linux/mdio.h> | #include <linux/mdio-mux.h> | #include <linux/mii.h> | #include <linux/mii_timestamper.h> | #include <linux/mlx5/accel.h> | #include <linux/mlx5/cq.h> | #include <linux/mlx5/device.h> | #include <linux/mlx5/driver.h> | #include <linux/mlx5/eswitch.h> | #include <linux/mlx5/fs.h> | #include <linux/mlx5/port.h> | #include <linux/mlx5/qp.h> | #include <linux/mlx5/rsc_dump.h> | #include <linux/mlx5/transobj.h> | #include <linux/mlx5/vport.h> | #include <linux/of_mdio.h> | #include <linux/of_net.h> | #include <linux/pcs-lynx.h> | #include <linux/pcs/pcs-xpcs.h> | #include <linux/phy.h> | #include <linux/phy_led_triggers.h> | #include <linux/phylink.h> | #include <linux/platform_data/bcmgenet.h> | #include <linux/platform_data/xilinx-ll-temac.h> | #include <linux/pxa168_eth.h> | #include <linux/qed/qed_eth_if.h> | #include <linux/qed/qed_fcoe_if.h> | #include <linux/qed/qed_if.h> | #include <linux/qed/qed_iov_if.h> | #include <linux/qed/qed_iscsi_if.h> | #include <linux/qed/qed_ll2_if.h> | #include <linux/qed/qed_nvmetcp_if.h> | #include <linux/qed/qed_rdma_if.h> | #include <linux/sfp.h> | #include <linux/sh_eth.h> | #include <linux/smsc911x.h> | #include <linux/soc/nxp/lpc32xx-misc.h> | #include <linux/stmmac.h> | #include <linux/sunrpc/svc_rdma.h> | #include <linux/sxgbe_platform.h> | #include <net/cfg80211.h> | #include <net/dsa.h> | #include <net/mac80211.h> | #include <net/selftests.h> | #include <rdma/ib_addr.h> | #include <rdma/ib_cache.h> | #include <rdma/ib_cm.h> | #include <rdma/ib_hdrs.h> | #include <rdma/ib_mad.h> | #include <rdma/ib_marshall.h> | #include <rdma/ib_pack.h> | #include <rdma/ib_pma.h> | #include <rdma/ib_sa.h> | #include <rdma/ib_smi.h> | #include <rdma/ib_umem.h> | #include <rdma/ib_umem_odp.h> | #include <rdma/ib_verbs.h> | #include <rdma/iw_cm.h> | #include <rdma/mr_pool.h> | #include <rdma/opa_addr.h> | #include <rdma/opa_port_info.h> | #include <rdma/opa_smi.h> | #include <rdma/opa_vnic.h> | #include <rdma/rdma_cm.h> | #include <rdma/rdma_cm_ib.h> | #include <rdma/rdmavt_cq.h> | #include <rdma/rdma_vt.h> | #include <rdma/rdmavt_qp.h> | #include <rdma/rw.h> | #include <rdma/tid_rdma_defs.h> | #include <rdma/uverbs_ioctl.h> | #include <rdma/uverbs_named_ioctl.h> | #include <rdma/uverbs_std_types.h> | #include <rdma/uverbs_types.h> | #include <soc/mscc/ocelot.h> | #include <soc/mscc/ocelot_ptp.h> | #include <soc/mscc/ocelot_vcap.h> | #include <trace/events/ib_mad.h> | #include <trace/events/rdma_core.h> | #include <trace/events/rdma.h> | #include <trace/events/rpcrdma.h> | #include <uapi/linux/ethtool.h> | #include <uapi/linux/ethtool_netlink.h> | #include <uapi/linux/mdio.h> | #include <uapi/linux/mii.h> ) @depends on i@ expression list args; @@ ( - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_zero(args) | - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_copy(args) | - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_and(args) | - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_or(args) | - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_empty(args) | - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_andnot(args) | - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_equal(args) | - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_intersects(args) | - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_subset(args) ) Add missing linux/mii.h include to mellanox. -DaveM Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19net: dsa: be compatible with masters which unregister on shutdownVladimir Oltean
Lino reports that on his system with bcmgenet as DSA master and KSZ9897 as a switch, rebooting or shutting down never works properly. What does the bcmgenet driver have special to trigger this, that other DSA masters do not? It has an implementation of ->shutdown which simply calls its ->remove implementation. Otherwise said, it unregisters its network interface on shutdown. This message can be seen in a loop, and it hangs the reboot process there: unregister_netdevice: waiting for eth0 to become free. Usage count = 3 So why 3? A usage count of 1 is normal for a registered network interface, and any virtual interface which links itself as an upper of that will increment it via dev_hold. In the case of DSA, this is the call path: dsa_slave_create -> netdev_upper_dev_link -> __netdev_upper_dev_link -> __netdev_adjacent_dev_insert -> dev_hold So a DSA switch with 3 interfaces will result in a usage count elevated by two, and netdev_wait_allrefs will wait until they have gone away. Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and delete themselves, but DSA cannot just vanish and go poof, at most it can unbind itself from the switch devices, but that must happen strictly earlier compared to when the DSA master unregisters its net_device, so reacting on the NETDEV_UNREGISTER event is way too late. It seems that it is a pretty established pattern to have a driver's ->shutdown hook redirect to its ->remove hook, so the same code is executed regardless of whether the driver is unbound from the device, or the system is just shutting down. As Florian puts it, it is quite a big hammer for bcmgenet to unregister its net_device during shutdown, but having a common code path with the driver unbind helps ensure it is well tested. So DSA, for better or for worse, has to live with that and engage in an arms race of implementing the ->shutdown hook too, from all individual drivers, and do something sane when paired with masters that unregister their net_device there. The only sane thing to do, of course, is to unlink from the master. However, complications arise really quickly. The pattern of redirecting ->shutdown to ->remove is not unique to bcmgenet or even to net_device drivers. In fact, SPI controllers do it too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers and MDIO controllers do it too (this is something I have not researched too deeply, but even if this is not the case today, it is certainly plausible to happen in the future, and must be taken into consideration). Since DSA switches might be SPI devices, I2C devices, MDIO devices, the insane implication is that for the exact same DSA switch device, we might have both ->shutdown and ->remove getting called. So we need to do something with that insane environment. The pattern I've come up with is "if this, then not that", so if either ->shutdown or ->remove gets called, we set the device's drvdata to NULL, and in the other hook, we check whether the drvdata is NULL and just do nothing. This is probably not necessary for platform devices, just for devices on buses, but I would really insist for consistency among drivers, because when code is copy-pasted, it is not always copy-pasted from the best sources. So depending on whether the DSA switch's ->remove or ->shutdown will get called first, we cannot really guarantee even for the same driver if rebooting will result in the same code path on all platforms. But nonetheless, we need to do something minimally reasonable on ->shutdown too to fix the bug. Of course, the ->remove will do more (a full teardown of the tree, with all data structures freed, and this is why the bug was not caught for so long). The new ->shutdown method is kept separate from dsa_unregister_switch not because we couldn't have unregistered the switch, but simply in the interest of doing something quick and to the point. The big question is: does the DSA switch's ->shutdown get called earlier than the DSA master's ->shutdown? If not, there is still a risk that we might still trigger the WARN_ON in unregister_netdevice that says we are attempting to unregister a net_device which has uppers. That's no good. Although the reference to the master net_device won't physically go away even if DSA's ->shutdown comes afterwards, remember we have a dev_hold on it. The answer to that question lies in this comment above device_link_add: * A side effect of the link creation is re-ordering of dpm_list and the * devices_kset list by moving the consumer device and all devices depending * on it to the ends of these lists (that does not happen to devices that have * not been registered when this function is called). so the fact that DSA uses device_link_add towards its master is not exactly for nothing. device_shutdown() walks devices_kset from the back, so this is our guarantee that DSA's shutdown happens before the master's shutdown. Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings") Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/ Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-17net: update NXP copyright textVladimir Oltean
NXP Legal insists that the following are not fine: - Saying "NXP Semiconductors" instead of "NXP", since the company's registered name is "NXP" - Putting a "(c)" sign in the copyright string - Putting a comma in the copyright string The only accepted copyright string format is "Copyright <year-range> NXP". This patch changes the copyright headers in the networking files that were sent by me, or derived from code sent by me. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-10net: dsa: felix: re-enable TAS guard band modeMichael Walle
Commit 316bcffe4479 ("net: dsa: felix: disable always guard band bit for TAS config") disabled the guard band and broke 802.3Qbv compliance. There are two issues here: (1) Without the guard band the end of the scheduling window could be overrun by a frame in transit. (2) Frames that don't fit into a configured window will still be sent. The reason for both issues is that the switch will schedule the _start_ of a frame transmission inside the predefined window without taking the length of the frame into account. Thus, we'll need the guard band which will close the gate early, so that a complete frame can still be sent. Revert the commit and add a note. For a lengthy discussion see [1]. [1] https://lore.kernel.org/netdev/c7618025da6723418c56a54fe4683bd7@walle.cc/ Fixes: 316bcffe4479 ("net: dsa: felix: disable always guard band bit for TAS config") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20net: dsa: felix: disable always guard band bit for TAS configXiaoliang Yang
ALWAYS_GUARD_BAND_SCH_Q bit in TAS config register is descripted as this: 0: Guard band is implemented for nonschedule queues to schedule queues transition. 1: Guard band is implemented for any queue to schedule queue transition. The driver set guard band be implemented for any queue to schedule queue transition before, which will make each GCL time slot reserve a guard band time that can pass the max SDU frame. Because guard band time could not be set in tc-taprio now, it will use about 12000ns to pass 1500B max SDU. This limits each GCL time interval to be more than 12000ns. This patch change the guard band to be only implemented for nonschedule queues to schedule queues transition, so that there is no need to reserve guard band on each GCL. Users can manually add guard band time for each schedule queues in their configuration if they want. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-29net: mscc: ocelot: remove redundant dev_err call in vsc9959_mdio_bus_alloc()Guobin Huang
There is a error message within devm_ioremap_resource already, so remove the dev_err call to avoid redundant error message. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Guobin Huang <huangguobin4@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021qVladimir Oltean
Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: tag_ocelot: create separate tagger for SevilleVladimir Oltean
The ocelot tagger is a hot mess currently, it relies on memory initialized by the attached driver for basic frame transmission. This is against all that DSA tagging protocols stand for, which is that the transmission and reception of a DSA-tagged frame, the data path, should be independent from the switch control path, because the tag protocol is in principle hot-pluggable and reusable across switches (even if in practice it wasn't until very recently). But if another driver like dsa_loop wants to make use of tag_ocelot, it couldn't. This was done to have common code between Felix and Ocelot, which have one bit difference in the frame header format. Quoting from commit 67c2404922c2 ("net: dsa: felix: create a template for the DSA tags on xmit"): Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. The really interesting part is that Seville should have had its own tagging protocol defined - it is not compatible on the wire with Ocelot, even for that single bit. In principle, a packet generated by DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain way, but when booted on NXP T1040 it would look differently. The reverse is also true: a packet generated by a Seville switch would be interpreted incorrectly by Wireshark if it was told it was generated by an Ocelot switch. Actually things are a bit more nuanced. If we concentrate only on the DSA tag, what I said above is true, but Ocelot/Seville also support an optional DSA tag prefix, which can be short or long, and it is possible to distinguish the two taggers based on an integer constant put in that prefix. Nonetheless, creating a separate tagger is still justified, since the tag prefix is optional, and without it, there is again no way to distinguish. Claiming backwards binary compatibility is a bit more tough, since I've already changed the format of tag_ocelot once, in commit 5124197ce58b ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress"). Therefore I am not very concerned with treating this as a bugfix and backporting it to stable kernels (which would be another mess due to the fact that there would be lots of conflicts with the other DSA_TAG_PROTO* definitions). It's just simpler to say that the string values of the taggers have ABI value starting with kernel 5.12, which will be when the changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging goes live. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: mscc: ocelot: use common tag parsing code with DSAVladimir Oltean
The Injection Frame Header and Extraction Frame Header that the switch prepends to frames over the NPI port is also prepended to frames delivered over the CPU port module's queues. Let's unify the handling of the frame headers by making the ocelot driver call some helpers exported by the DSA tagger. Among other things, this allows us to get rid of the strange cpu_to_be32 when transmitting the Injection Frame Header on ocelot, since the packing API uses network byte order natively (when "quirks" is 0). The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and the cpu extraction queue mask to something, but the code doesn't do it, so we don't do it either. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-01-29net: dsa: felix: convert to the new .change_tag_protocol DSA APIVladimir Oltean
In expectation of the new tag_ocelot_8021q tagger implementation, we need to be able to do runtime switchover between one tagger and another. So we must structure the existing code for the current NPI-based tagger in a certain way. We move the felix_npi_port_init function in expectation of the future driver configuration necessary for tag_ocelot_8021q: we would like to not have the NPI-related bits interspersed with the tag_8021q bits. The conversion from this: ocelot_write_rix(ocelot, ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)), ANA_PGID_PGID, PGID_UC); to this: cpu_flood = ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)); ocelot_rmw_rix(ocelot, cpu_flood, cpu_flood, ANA_PGID_PGID, PGID_UC); is perhaps non-trivial, but is nonetheless non-functional. The PGID_UC (replicator for unknown unicast) is already configured out of hardware reset to flood to all ports except ocelot->num_phys_ports (the CPU port module). All we change is that we use a read-modify-write to only add the CPU port module to the unknown unicast replicator, as opposed to doing a full write to the register. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: mscc: ocelot: export NUM_TC constant from felix to common switch libVladimir Oltean
We should be moving anything that isn't DSA-specific or SoC-specific out of the felix DSA driver, and into the common mscc_ocelot switch library. The number of traffic classes is one of the aspects that is common between all ocelot switches, so it belongs in the library. This patch also makes seville use 8 TX queues, and therefore enables prioritization via the QOS_CLASS field in the NPI injection header. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: mscc: ocelot: add ops for decoding watermark threshold and occupancyVladimir Oltean
We'll need to read back the watermark thresholds and occupancy from hardware (for devlink-sb integration), not only to write them as we did so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct ocelot_ops, similar to wm_enc, and implement them for the 3 supported mscc_ocelot switches. Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT register, because that doesn't scale with the number of switches that mscc_ocelot supports now. They have different bit widths for the watermarks, and we need function pointers to abstract that difference away. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>