summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mscc
AgeCommit message (Collapse)Author
2021-12-10net: ocelot: add support for ndo_change_mtuClément Léger
This commit adds support for changing MTU for the ocelot register based interface. For ocelot, JUMBO frame size can be set up to 25000 bytes but has been set to 9000 which is a saner value and allows for maximum gain of performance with FDMA. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: ocelot: add and export ocelot_ptp_rx_timestamp()Clément Léger
In order to support PTP in FDMA, PTP handling code is needed. Since this is the same as for register-based extraction, export it with a new ocelot_ptp_rx_timestamp() function. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: ocelot: export ocelot_ifh_port_set() to setup IFHClément Léger
FDMA will need this code to prepare the injection frame header when sending SKBs. Move this code into ocelot_ifh_port_set() and add conditional IFH setting for vlan and rew op if they are not set. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: mscc: ocelot: split register definitions to a separate fileColin Foster
Move these to a separate file will allow them to be shared to other drivers. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-30net: mscc: ocelot: fix mutex_lock not releasedLv Ruyi
If err is true, the function will be returned, but mutex_lock isn't released. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ipa/ipa_main.c 8afc7e471ad3 ("net: ipa: separate disabling setup from modem stop") 76b5fbcd6b47 ("net: ipa: kill ipa_modem_init()") Duplicated include, drop one. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26net: mscc: ocelot: correctly report the timestamping RX filters in ethtoolVladimir Oltean
The driver doesn't support RX timestamping for non-PTP packets, but it declares that it does. Restrict the reported RX filters to PTP v2 over L2 and over L4. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26net: mscc: ocelot: set up traps for PTP packetsVladimir Oltean
IEEE 1588 support was declared too soon for the Ocelot switch. Out of reset, this switch does not apply any special treatment for PTP packets, i.e. when an event message is received, the natural tendency is to forward it by MAC DA/VLAN ID. This poses a problem when the ingress port is under a bridge, since user space application stacks (written primarily for endpoint ports, not switches) like ptp4l expect that PTP messages are always received on AF_PACKET / AF_INET sockets (depending on the PTP transport being used), and never being autonomously forwarded. Any forwarding, if necessary (for example in Transparent Clock mode) is handled in software by ptp4l. Having the hardware forward these packets too will cause duplicates which will confuse endpoints connected to these switches. So PTP over L2 barely works, in the sense that PTP packets reach the CPU port, but they reach it via flooding, and therefore reach lots of other unwanted destinations too. But PTP over IPv4/IPv6 does not work at all. This is because the Ocelot switch have a separate destination port mask for unknown IP multicast (which PTP over IP is) flooding compared to unknown non-IP multicast (which PTP over L2 is) flooding. Specifically, the driver allows the CPU port to be in the PGID_MC port group, but not in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from Allan Nielsen which explain that the embedded MIPS CPU on Ocelot switches is not very powerful at all, so every penny they could save by not allowing flooding to the CPU port module matters. Unknown IP multicast did not make it. The de facto consensus is that when a switch is PTP-aware and an application stack for PTP is running, switches should have some sort of trapping mechanism for PTP packets, to extract them from the hardware data path. This avoids both problems: (a) PTP packets are no longer flooded to unwanted destinations (b) PTP over IP packets are no longer denied from reaching the CPU since they arrive there via a trap and not via flooding It is not the first time when this change is attempted. Last time, the feedback from Allan Nielsen and Andrew Lunn was that the traps should not be installed by default, and that PTP-unaware switching may be desired for some use cases: https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/ To address that feedback, the present patch adds the necessary packet traps according to the RX filter configuration transmitted by user space through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we keep 5 filters, which are amended each time RX timestamping is enabled or disabled on a port: - 1 for PTP over L2 - 2 for PTP over IPv4 (UDP ports 319 and 320) - 2 for PTP over IPv6 (UDP ports 319 and 320) The cookie by which these filters (invisible to tc) are identified is strategically chosen such that it does not collide with the filters used for the ocelot-8021q tagging protocol by the Felix driver, or with the MRP traps set up by the Ocelot library. Other alternatives were considered, like patching user space to do something, but there are so many ways in which PTP packets could be made to reach the CPU, generically speaking, that "do what?" is a very valid question. The ptp4l program from the linuxptp stack already attempts to do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into a dev_mc_add() on the interface, in the kernel: https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73 https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c Reality shows that this is not sufficient in case the interface belongs to a switchdev driver, as dev_mc_add() does not show the intention to trap a packet to the CPU, but rather the intention to not drop it (it is strictly for RX filtering, same as promiscuous does not mean to send all traffic to the CPU, but to not drop traffic with unknown MAC DA). This topic is a can of worms in itself, and it would be great if user space could just stay out of it. On the other hand, setting up PTP traps privately within the driver is not new by any stretch of the imagination: https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833 https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050 https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21 So this is the approach taken here as well. The difference here being that we prepare and destroy the traps per port, dynamically at runtime, as opposed to driver init time, because apparently, PTP-unaware forwarding is a use case. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Reported-by: Po Liu <po.liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26net: mscc: ocelot: create a function that replaces an existing VCAP filterVladimir Oltean
VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind tc flower offload on ocelot, among other things. The ingress port mask on which VCAP rules match is present as a bit field in the actual key of the rule. This means that it is possible for a rule to be shared among multiple source ports. When the rule is added one by one on each desired port, that the ingress port mask of the key must be edited and rewritten to hardware. But the API in ocelot_vcap.c does not allow for this. For one thing, ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric, because ocelot_vcap_filter_add() works with a preallocated and prepopulated filter and programs it to hardware, and ocelot_vcap_filter_del() does both the job of removing the specified filter from hardware, as well as kfreeing it. That is to say, the only option of editing a filter in place, which is to delete it, modify the structure and add it back, does not work because it results in use-after-free. This patch introduces ocelot_vcap_filter_replace, which trivially reprograms a VCAP entry to hardware, at the exact same index at which it existed before, without modifying any list or allocating any memory. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMPVladimir Oltean
The ocelot driver, when asked to timestamp all receiving packets, 1588 v1 or NTP, says "nah, here's 1588 v2 for you". According to this discussion: https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647 drivers that downgrade from a wider request to a narrower response (or even a response where the intersection with the request is empty) are buggy, and should return -ERANGE instead. This patch fixes that. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Suggested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25net: dsa: felix: enable cut-through forwarding between ports by defaultVladimir Oltean
The VSC9959 switch embedded within NXP LS1028A (and that version of Ocelot switches only) supports cut-through forwarding - meaning it can start the process of looking up the destination ports for a packet, and forward towards those ports, before the entire packet has been received (as opposed to the store-and-forward mode). The up side is having lower forwarding latency for large packets. The down side is that frames with FCS errors are forwarded instead of being dropped. However, erroneous frames do not result in incorrect updates of the FDB or incorrect policer updates, since these processes are deferred inside the switch to the end of frame. Since the switch starts the cut-through forwarding process after all packet headers (including IP, if any) have been processed, packets with large headers and small payload do not see the benefit of lower forwarding latency. There are two cases that need special attention. The first is when a packet is multicast (or flooded) to multiple destinations, one of which doesn't have cut-through forwarding enabled. The switch deals with this automatically by disabling cut-through forwarding for the frame towards all destination ports. The second is when a packet is forwarded from a port of lower link speed towards a port of higher link speed. This is not handled by the hardware and needs software intervention. Since we practically need to update the cut-through forwarding domain from paths that aren't serialized by the rtnl_mutex (phylink mac_link_down/mac_link_up ops), this means we need to serialize physical link events with user space updates of bonding/bridging domains. Enabling cut-through forwarding is done per {egress port, traffic class}. I don't see any reason why this would be a configurable option as long as it works without issues, and there doesn't appear to be any user space configuration tool to toggle this on/off, so this patch enables cut-through forwarding on all eligible ports and traffic classes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25net: ocelot: remove "bridge" argument from ocelot_get_bridge_fwd_maskVladimir Oltean
The only called takes ocelot_port->bridge and passes it as the "bridge" argument to this function, which then compares it with ocelot_port->bridge. This is not useful. Instead, we would like this function to return 0 if ocelot_port->bridge is not present, which is what this patch does. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211125125808.2383984-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18net: dsa: felix: restrict psfp rules on ingress portXiaoliang Yang
PSFP rules take effect on the streams from any port of VSC9959 switch. This patch use ingress port to limit the rule only active on this port. Each stream can only match two ingress source ports in VSC9959. Streams from lowest port gets the configuration of SFID pointed by MAC Table lookup and streams from highest port gets the configuration of (SFID+1) pointed by MAC Table lookup. This patch defines the PSFP rule on highest port as dummy rule, which means that it does not modify the MAC table. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: use index to set vcap policerXiaoliang Yang
Policer was previously automatically assigned from the highest index to the lowest index from policer pool. But police action of tc flower now uses index to set an police entry. This patch uses the police index to set vcap policers, so that one policer can be shared by multiple rules. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: add gate and police action offload to PSFPXiaoliang Yang
PSFP support gate and police action. This patch add the gate and police action to flower parse action, check chain ID to determine which block to offload. Adding psfp callback functions to add, delete and update gate and police in PSFP table if hardware supports it. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: set vcap IS2 chain to goto PSFP chainXiaoliang Yang
Some chips in the ocelot series such as VSC9959 support Per-Stream Filtering and Policing(PSFP), which is processing after VCAP blocks. We set this block on chain 30000 and set vcap IS2 chain to goto PSFP chain if hardware support. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: add MAC table stream learn and lookup operationsXiaoliang Yang
ocelot_mact_learn_streamdata() can be used in VSC9959 to overwrite an FDB entry with stream data. The stream data includes SFID and SSID which can be used for PSFP and FRER set. ocelot_mact_lookup() can be used to check if the given {DMAC, VID} FDB entry is exist, and also can retrieve the DEST_IDX and entry type for the FDB entry. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: ocelot_net: use phylink_generic_validate()Russell King (Oracle)
ocelot_net has no special behaviour in its validation implementation, so can be switched to phylink_generic_validate(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: ocelot_net: remove interface checks in macb_validate()Russell King (Oracle)
As phylink checks the interface mode against the supported_interfaces bitmap, we no longer need to validate the interface mode in the validation function. Remove this to simplify it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: ocelot_net: populate supported_interfaces memberRussell King (Oracle)
Populate the phy interface mode bitmap for the MSCC Ocelot driver with the interface modes supported by the MAC. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: mscc: ocelot: serialize access to the MAC tableVladimir Oltean
DSA would like to remove the rtnl_lock from its SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handlers, and the felix driver uses the same MAC table functions as ocelot. This means that the MAC table functions will no longer be implicitly serialized with respect to each other by the rtnl_mutex, we need to add a dedicated lock in ocelot for the non-atomic operations of selecting a MAC table row, reading/writing what we want and polling for completion. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25Revert "Merge branch 'dsa-rtnl'"David S. Miller
This reverts commit 965e6b262f48257dbdb51b565ecfd84877a0ab5f, reversing changes made to 4d98bb0d7ec2d0b417df6207b0bafe1868bad9f8.
2021-10-24net: convert users of bitmap_foo() to linkmode_foo()Sean Anderson
This converts instances of bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS) to linkmode_foo(args...) I manually fixed up some lines to prevent them from being excessively long. Otherwise, this change was generated with the following semantic patch: // Generated with // echo linux/linkmode.h > includes // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \ // | sort | uniq | tee new_includes | wc -l && mv new_includes includes // and repeating until the number stopped going up @i@ @@ ( #include <linux/acpi_mdio.h> | #include <linux/brcmphy.h> | #include <linux/dsa/loop.h> | #include <linux/dsa/sja1105.h> | #include <linux/ethtool.h> | #include <linux/ethtool_netlink.h> | #include <linux/fec.h> | #include <linux/fs_enet_pd.h> | #include <linux/fsl/enetc_mdio.h> | #include <linux/fwnode_mdio.h> | #include <linux/linkmode.h> | #include <linux/lsm_audit.h> | #include <linux/mdio-bitbang.h> | #include <linux/mdio.h> | #include <linux/mdio-mux.h> | #include <linux/mii.h> | #include <linux/mii_timestamper.h> | #include <linux/mlx5/accel.h> | #include <linux/mlx5/cq.h> | #include <linux/mlx5/device.h> | #include <linux/mlx5/driver.h> | #include <linux/mlx5/eswitch.h> | #include <linux/mlx5/fs.h> | #include <linux/mlx5/port.h> | #include <linux/mlx5/qp.h> | #include <linux/mlx5/rsc_dump.h> | #include <linux/mlx5/transobj.h> | #include <linux/mlx5/vport.h> | #include <linux/of_mdio.h> | #include <linux/of_net.h> | #include <linux/pcs-lynx.h> | #include <linux/pcs/pcs-xpcs.h> | #include <linux/phy.h> | #include <linux/phy_led_triggers.h> | #include <linux/phylink.h> | #include <linux/platform_data/bcmgenet.h> | #include <linux/platform_data/xilinx-ll-temac.h> | #include <linux/pxa168_eth.h> | #include <linux/qed/qed_eth_if.h> | #include <linux/qed/qed_fcoe_if.h> | #include <linux/qed/qed_if.h> | #include <linux/qed/qed_iov_if.h> | #include <linux/qed/qed_iscsi_if.h> | #include <linux/qed/qed_ll2_if.h> | #include <linux/qed/qed_nvmetcp_if.h> | #include <linux/qed/qed_rdma_if.h> | #include <linux/sfp.h> | #include <linux/sh_eth.h> | #include <linux/smsc911x.h> | #include <linux/soc/nxp/lpc32xx-misc.h> | #include <linux/stmmac.h> | #include <linux/sunrpc/svc_rdma.h> | #include <linux/sxgbe_platform.h> | #include <net/cfg80211.h> | #include <net/dsa.h> | #include <net/mac80211.h> | #include <net/selftests.h> | #include <rdma/ib_addr.h> | #include <rdma/ib_cache.h> | #include <rdma/ib_cm.h> | #include <rdma/ib_hdrs.h> | #include <rdma/ib_mad.h> | #include <rdma/ib_marshall.h> | #include <rdma/ib_pack.h> | #include <rdma/ib_pma.h> | #include <rdma/ib_sa.h> | #include <rdma/ib_smi.h> | #include <rdma/ib_umem.h> | #include <rdma/ib_umem_odp.h> | #include <rdma/ib_verbs.h> | #include <rdma/iw_cm.h> | #include <rdma/mr_pool.h> | #include <rdma/opa_addr.h> | #include <rdma/opa_port_info.h> | #include <rdma/opa_smi.h> | #include <rdma/opa_vnic.h> | #include <rdma/rdma_cm.h> | #include <rdma/rdma_cm_ib.h> | #include <rdma/rdmavt_cq.h> | #include <rdma/rdma_vt.h> | #include <rdma/rdmavt_qp.h> | #include <rdma/rw.h> | #include <rdma/tid_rdma_defs.h> | #include <rdma/uverbs_ioctl.h> | #include <rdma/uverbs_named_ioctl.h> | #include <rdma/uverbs_std_types.h> | #include <rdma/uverbs_types.h> | #include <soc/mscc/ocelot.h> | #include <soc/mscc/ocelot_ptp.h> | #include <soc/mscc/ocelot_vcap.h> | #include <trace/events/ib_mad.h> | #include <trace/events/rdma_core.h> | #include <trace/events/rdma.h> | #include <trace/events/rpcrdma.h> | #include <uapi/linux/ethtool.h> | #include <uapi/linux/ethtool_netlink.h> | #include <uapi/linux/mdio.h> | #include <uapi/linux/mii.h> ) @depends on i@ expression list args; @@ ( - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_zero(args) | - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_copy(args) | - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_and(args) | - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_or(args) | - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_empty(args) | - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_andnot(args) | - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_equal(args) | - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_intersects(args) | - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_subset(args) ) Add missing linux/mii.h include to mellanox. -DaveM Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24net: mscc: ocelot: serialize access to the MAC tableVladimir Oltean
DSA would like to remove the rtnl_lock from its SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handlers, and the felix driver uses the same MAC table functions as ocelot. This means that the MAC table functions will no longer be implicitly serialized with respect to each other by the rtnl_mutex, we need to add a dedicated lock in ocelot for the non-atomic operations of selecting a MAC table row, reading/writing what we want and polling for completion. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: track the port pvid using a pointerVladimir Oltean
Now that we have a list of struct ocelot_bridge_vlan entries, we can rewrite the pvid logic to simply point to one of those structures, instead of having a separate structure with a "bool valid". The NULL pointer will represent the lack of a bridge pvid (not to be confused with the lack of a hardware pvid on the port, that is present at all times). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: add the local station MAC addresses in VID 0Vladimir Oltean
The ocelot switchdev driver does not include the CPU port in the list of flooding destinations for unknown traffic, instead that traffic is supposed to match FDB entries to reach the CPU. The addresses it installs are: (a) the station MAC address, in ocelot_probe_port() and later during runtime in ocelot_port_set_mac_address(). These are the VLAN-unaware addresses. The VLAN-aware addresses are in ocelot_vlan_vid_add(). (b) multicast addresses added with dev_mc_add() (not bridge host MDB entries) in ocelot_mc_sync() (c) multicast destination MAC addresses for MRP in ocelot_mrp_save_mac(), to make sure those are dropped (not forwarded) by the bridging service, just trapped to the CPU So we can see that the logic is slightly buggy ever since the initial commit a556c76adc05 ("net: mscc: Add initial Ocelot switch support"). This is because, when ocelot_probe_port() runs, the port pvid is 0. Then we join a VLAN-aware bridge, the pvid becomes 1, we call ocelot_port_set_mac_address(), this learns the new MAC address in VID 1 (also fails to forget the old one, since it thinks it's in VID 1, but that's not so important). Then when we leave the VLAN-aware bridge, outside world is unable to ping our new MAC address because it isn't learned in VID 0, the VLAN-unaware pvid. [ note: this is strictly based on static analysis, I don't have hardware to test. But there are also many more corner cases ] The basic idea is that we should have a separation of concerns, and the FDB entries used for standalone operation should be managed by the driver, and the FDB entries used by the bridging service should be managed by the bridge. So the standalone and VLAN-unaware bridge FDB entries should not follow the bridge PVID, because that will only be active when the bridge is VLAN-aware. So since the port pvid is coincidentally zero during probe time, just make those entries statically go to VID 0. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: allow a config where all bridge VLANs are egress-untaggedVladimir Oltean
At present, the ocelot driver accepts a single egress-untagged bridge VLAN, meaning that this sequence of operations: ip link add br0 type bridge vlan_filtering 1 ip link set swp0 master br0 bridge vlan add dev swp0 vid 2 pvid untagged fails because the bridge automatically installs VID 1 as a pvid & untagged VLAN, and vid 2 would be the second untagged VLAN on this port. It is necessary to delete VID 1 before proceeding to add VID 2. This limitation comes from the fact that we operate the port tag, when it has an egress-untagged VID, in the OCELOT_PORT_TAG_NATIVE mode. The ocelot switches do not have full flexibility and can either have one single VID as egress-untagged, or all of them. There are use cases for having all VLANs as egress-untagged as well, and this patch adds support for that. The change rewrites ocelot_port_set_native_vlan() into a more generic ocelot_port_manage_port_tag() function. Because the software bridge's state, transmitted to us via switchdev, can become very complex, we don't attempt to track all possible state transitions, but instead take a more declarative approach and just make ocelot_port_manage_port_tag() figure out which more to operate in: - port is VLAN-unaware: the classified VLAN (internal, unrelated to the 802.1Q header) is not inserted into packets on egress - port is VLAN-aware: - port has tagged VLANs: -> port has no untagged VLAN: set up as pure trunk -> port has one untagged VLAN: set up as trunk port + native VLAN -> port has more than one untagged VLAN: this is an invalid config which is rejected by ocelot_vlan_prepare - port has no tagged VLANs -> set up as pure egress-untagged port We don't keep the number of tagged and untagged VLANs, we just count the structures we keep. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: convert the VLAN masks to a listVladimir Oltean
First and foremost, the driver currently allocates a constant sized 4K * u32 (16KB memory) array for the VLAN masks. However, a typical application might not need so many VLANs, so if we dynamically allocate the memory as needed, we might actually save some space. Secondly, we'll need to keep more advanced bookkeeping of the VLANs we have, notably we'll have to check how many untagged and how many tagged VLANs we have. This will have to stay in a structure, and allocating another 16 KB array for that is again a bit too much. So refactor the bridge VLANs in a linked list of structures. The hook points inside the driver are ocelot_vlan_member_add() and ocelot_vlan_member_del(), which previously used to operate on the ocelot->vlan_mask[vid] array element. ocelot_vlan_member_add() and ocelot_vlan_member_del() used to call ocelot_vlan_member_set() to commit to the ocelot->vlan_mask. Additionally, we had two calls to ocelot_vlan_member_set() from outside those callers, and those were directly from ocelot_vlan_init(). Those calls do not set up bridging service VLANs, instead they: - clear the VLAN table on reset - set the port pvid to the value used by this driver for VLAN-unaware standalone port operation (VID 0) So now, when we have a structure which represents actual bridge VLANs, VID 0 doesn't belong in that structure, since it is not part of the bridging layer. So delete the middle man, ocelot_vlan_member_set(), and let ocelot_vlan_init() call directly ocelot_vlant_set_mask() which forgoes any data structure and writes directly to hardware, which is all that we need. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: mscc: ocelot: add a type definition for REW_TAG_CFG_TAG_CFGVladimir Oltean
This is a cosmetic patch which clarifies what are the port tagging options for Ocelot switches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19ethernet: ocelot: use eth_hw_addr_gen()Jakub Kicinski
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: mscc: ocelot: Add of_node_put() before gotoWan Jiabing
Fix following coccicheck warning: ./drivers/net/ethernet/mscc/ocelot_vsc7514.c:946:1-33: WARNING: Function for_each_available_child_of_node should have of_node_put() before goto. Early exits from for_each_available_child_of_node should decrement the node reference counter. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driverVladimir Oltean
As explained here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ DSA tagging protocol drivers cannot depend on symbols exported by switch drivers, because this creates a circular dependency that breaks module autoloading. The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function exported by the common ocelot switch lib. This function looks at OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the DSA tag, for PTP timestamping (the command: one-step/two-step, and the TX timestamp identifier). None of that requires deep insight into the driver, it is quite stateless, as it only depends upon the skb->cb. So let's make it a static inline function and put it in include/linux/dsa/ocelot.h, a file that despite its name is used by the ocelot switch driver for populating the injection header too - since commit 40d3f295b5fe ("net: mscc: ocelot: use common tag parsing code with DSA"). With that function declared as static inline, its body is expanded inside each call site, so the dependency is broken and the DSA tagger can be built without the switch library, upon which the felix driver depends. Fixes: 39e5308b3250 ("net: mscc: ocelot: support PTP Sync one-step timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with ↵Vladimir Oltean
the skb PTP header The sad reality is that when a PTP frame with a TX timestamping request is transmitted, it isn't guaranteed that it will make it all the way to the wire (due to congestion inside the switch), and that a timestamp will be taken by the hardware and placed in the timestamp FIFO where an IRQ will be raised for it. The implication is that if enough PTP frames are silently dropped by the hardware such that the timestamp ID has rolled over, it is possible to match a timestamp to an old skb. Furthermore, nobody will match on the real skb corresponding to this timestamp, since we stupidly matched on a previous one that was stale in the queue, and stopped there. So PTP timestamping will be broken and there will be no way to recover. It looks like the hardware parses the sequenceID from the PTP header, and also provides that metadata for each timestamp. The driver currently ignores this, but it shouldn't. As an extra resiliency measure, do the following: - check whether the PTP sequenceID also matches between the skb and the timestamp, treat the skb as stale otherwise and free it - if we see a stale skb, don't stop there and try to match an skb one more time, chances are there's one more skb in the queue with the same timestamp ID, otherwise we wouldn't have ever found the stale one (it is by timestamp ID that we matched it). While this does not prevent PTP packet drops, it at least prevents the catastrophic consequences of incorrect timestamp matching. Since we already call ptp_classify_raw in the TX path, save the result in the skb->cb of the clone, and just use that result in the interrupt code path. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: deny TX timestamping of non-PTP packetsVladimir Oltean
It appears that Ocelot switches cannot timestamp non-PTP frames, I tested this using the isochron program at: https://github.com/vladimiroltean/tsn-scripts with the result that the driver increments the ocelot_port->ts_id counter as expected, puts it in the REW_OP, but the hardware seems to not timestamp these packets at all, since no IRQ is emitted. Therefore check whether we are sending PTP frames, and refuse to populate REW_OP otherwise. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skbVladimir Oltean
When skb_match is NULL, it means we received a PTP IRQ for a timestamp ID that the kernel has no idea about, since there is no skb in the timestamping queue with that timestamp ID. This is a grave error and not something to just "continue" over. So print a big warning in case this happens. Also, move the check above ocelot_get_hwtimestamp(), there is no point in reading the full 64-bit current PTP time if we're not going to do anything with it anyway for this skb. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: avoid overflowing the PTP timestamp FIFOVladimir Oltean
PTP packets with 2-step TX timestamp requests are matched to packets based on the egress port number and a 6-bit timestamp identifier. All PTP timestamps are held in a common FIFO that is 128 entry deep. This patch ensures that back-to-back timestamping requests cannot exceed the hardware FIFO capacity. If that happens, simply send the packets without requesting a TX timestamp to be taken (in the case of felix, since the DSA API has a void return code in ds->ops->port_txtstamp) or drop them (in the case of ocelot). I've moved the ts_id_lock from a per-port basis to a per-switch basis, because we need separate accounting for both numbers of PTP frames in flight. And since we need locking to inc/dec the per-switch counter, that also offers protection for the per-port counter and hence there is no reason to have a per-port counter anymore. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: make use of all 63 PTP timestamp identifiersVladimir Oltean
At present, there is a problem when user space bombards a port with PTP event frames which have TX timestamping requests (or when a tc-taprio offload is installed on a port, which delays the TX timestamps by a significant amount of time). The driver will happily roll over the 2-bit timestamp ID and this will cause incorrect matches between an skb and the TX timestamp collected from the FIFO. The Ocelot switches have a 6-bit PTP timestamp identifier, and the value 63 is reserved, so that leaves identifiers 0-62 to be used. The timestamp identifiers are selected by the REW_OP packet field, and are actually shared between CPU-injected frames and frames which match a VCAP IS2 rule that modifies the REW_OP. The hardware supports partitioning between the two uses of the REW_OP field through the PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2. The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping, and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in ocelot_init_timestamp() which makes all timestamp identifiers available to CPU injection. Therefore, we can make use of all 63 timestamp identifiers, which should allow more timestampable packets to be in flight on each port. This is only part of the solution, more issues will be addressed in future changes. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: Fix dumplicated argument in ocelotWan Jiabing
Fix the following coccicheck warning: drivers/net/ethernet/mscc/ocelot.c:474:duplicated argument to & or | drivers/net/ethernet/mscc/ocelot.c:476:duplicated argument to & or | drivers/net/ethernet/mscc/ocelot_net.c:1627:duplicated argument to & or | These DEV_CLOCK_CFG_MAC_TX_RST are duplicate here. Here should be DEV_CLOCK_CFG_MAC_RX_RST. Fixes: e6e12df625f2 ("net: mscc: ocelot: convert to phylink") Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07of: net: move of_net under net/Jakub Kicinski
Rob suggests to move of_net.c from under drivers/of/ somewhere to the networking code. Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02ethernet: use eth_hw_addr_set() instead of ether_addr_copy()Jakub Kicinski
Convert Ethernet from ether_addr_copy() to eth_hw_addr_set(): @@ expression dev, np; @@ - ether_addr_copy(dev->dev_addr, np) + eth_hw_addr_set(dev, np) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02ethernet: use eth_hw_addr_set()Jakub Kicinski
Convert all Ethernet drivers from memcpy(... ETH_ADDR) to eth_hw_addr_set(): @@ expression dev, np; @@ - memcpy(dev->dev_addr, np, ETH_ALEN) + eth_hw_addr_set(dev, np) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02net: mscc: ocelot: write full VLAN TCI in the injection headerVladimir Oltean
The VLAN TCI contains more than the VLAN ID, it also has the VLAN PCP and Drop Eligibility Indicator. If the ocelot driver is going to write the VLAN header inside the DSA tag, it could just as well write the entire TCI. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02net: mscc: ocelot: support egress VLAN rewriting via VCAP ES0Vladimir Oltean
Currently the ocelot driver does support the 'vlan modify' action, but in the ingress chain, and it is offloaded to VCAP IS1. This action changes the classified VLAN before the packet enters the bridging service, and the bridging works with the classified VLAN modified by VCAP IS1. That is good for some use cases, but there are others where the VLAN must be modified at the stage of the egress port, after the packet has exited the bridging service. One example is simulating IEEE 802.1CB active stream identification filters ("active" means that not only the rule matches on a packet flow, but it is also able to change some headers). For example, a stream is replicated on two egress ports, but they must have different VLAN IDs on egress ports A and B. This seems like a task for the VCAP ES0, but that currently only supports pushing the ES0 tag A, which is specified in the rule. Pushing another VLAN header is not what we want, but rather overwriting the existing one. It looks like when we push the ES0 tag A, it is actually possible to not only take the ES0 tag A's value from the rule itself (VID_A_VAL), but derive it from the following formula: ES0_TAG_A = Classified VID + VID_A_VAL Otherwise said, ES0_TAG_A can be used to increment with a given value the VLAN ID that the packet was already classified to, and the packet will have this value as an outer VLAN tag. This new VLAN ID value then gets stripped on egress (or not) according to the value of the native VLAN from the bridging service. While the hardware will happily increment the classified VLAN ID for all packets that match the ES0 rule, in practice this would be rather insane, so we only allow this kind of ES0 action if the ES0 filter contains a VLAN ID too, so as to restrict the matching on a known classified VLAN. If we program VID_A_VAL with the delta between the desired final VLAN (ES0_TAG_A) and the classified VLAN, we obtain the desired behavior. It doesn't look like it is possible with the tc-vlan action to modify the VLAN ID but not the PCP. In hardware it is possible to leave the PCP to the classified value, but we unconditionally program it to overwrite it with the PCP value from the rule. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-01net: mscc: ocelot: fix VCAP filters remaining active after being deletedVladimir Oltean
When ocelot_flower.c calls ocelot_vcap_filter_add(), the filter has a given filter->id.cookie. This filter is added to the block->rules list. However, when ocelot_flower.c calls ocelot_vcap_block_find_filter_by_id() which passes the cookie as argument, the filter is never found by filter->id.cookie when searching through the block->rules list. This is unsurprising, since the filter->id.cookie is an unsigned long, but the cookie argument provided to ocelot_vcap_block_find_filter_by_id() is a signed int, and the comparison fails. Fixes: 50c6cc5b9283 ("net: mscc: ocelot: store a namespaced VCAP filter ID") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210930125330.2078625-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-27net: mscc: ocelot: delay devlink registration to the endLeon Romanovsky
Open access to the devlink interface when the driver fully initialized. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/mptcp/protocol.c 977d293e23b4 ("mptcp: ensure tx skbs always have the MPTCP ext") efe686ffce01 ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabledVladimir Oltean
The blamed commit made the fatally incorrect assumption that ports which aren't in the FORWARDING STP state should not have packets forwarded towards them, and that is all that needs to be done. However, that logic alone permits BLOCKING ports to forward to FORWARDING ports, which of course allows packet storms to occur when there is an L2 loop. The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge do for you", but "what can you do for the bridge". This way, only FORWARDING ports forward to the other FORWARDING ports from the same bridging domain, and we are still compatible with the idea of multiple bridges. Fixes: df291e54ccca ("net: ocelot: support multiple bridges") Suggested-by: Colin Foster <colin.foster@in-advantage.com> Reported-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>