Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "Just the usual assortment of small'ish fixes:

   1) Conntrack timeout is sometimes not initialized properly, from
      Alexander Potapenko.

   2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid
      undefined behavior. From ZhangXiaoxu.

   3) des1 field of descriptor in stmmac driver is initialized with the
      wrong variable. From Yue Haibing.

   4) Increase mlxsw pci sw reset timeout a little bit more, from Ido
      Schimmel.

   5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng.

   6) Fallback refcount fix in TLS code, from Jakub Kicinski.

   7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy.

   8) Fix recursive locking in team driver, from Hangbin Liu.

   9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski.

  10) Don't use napi_alloc_frag() outside of softiq context of socionext
      driver, from Ilias Apalodimas.

  11) MAC address increment overflow in ncsi, from Tao Ren.

  12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun.

  13) ipv4_link_failure has to validate the headers that are actually
      there because RAW sockets can pass in arbitrary garbage, from Eric
      Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  ipv4: add sanity checks in ipv4_link_failure()
  net/rose: fix unbound loop in rose_loopback_timer()
  rxrpc: fix race condition in rxrpc_input_packet()
  net: rds: exchange of 8K and 1M pool
  net: vrf: Fix operation not supported when set vrf mac
  net/ncsi: handle overflow when incrementing mac address
  net: socionext: replace napi_alloc_frag with the netdev variant on init
  net: atheros: fix spelling mistake "underun" -> "underrun"
  spi: ST ST95HF NFC: declare missing of table
  spi: Micrel eth switch: declare missing of table
  net: stmmac: move stmmac_check_ether_addr() to driver probe
  netfilter: fix nf_l4proto_log_invalid to log invalid packets
  netfilter: never get/set skb->tstamp
  netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
  Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
  dt-bindings: add an explanation for internal phy-mode
  net/tls: don't leak IV and record seq when offload fails
  net/tls: avoid potential deadlock in tls_set_device_offload_rx()
  selftests/net: correct the return value for run_afpackettests
  team: fix possible recursive locking when add slaves
  ...
This commit is contained in:
Linus Torvalds 2019-04-24 16:18:59 -07:00
commit cd8dead0c3
61 changed files with 690 additions and 187 deletions

View File

@ -20,6 +20,8 @@ Required properties:
Optional properties:
- phy-handle: See ethernet.txt file in the same directory.
If absent, davinci_emac driver defaults to 100/FULL.
- nvmem-cells: phandle, reference to an nvmem node for the MAC address
- nvmem-cell-names: string, should be "mac-address" if nvmem is to be used
- ti,davinci-rmii-en: 1 byte, 1 means use RMII
- ti,davinci-no-bd-ram: boolean, does EMAC have BD RAM?

View File

@ -10,15 +10,14 @@ Documentation/devicetree/bindings/phy/phy-bindings.txt.
the boot program; should be used in cases where the MAC address assigned to
the device by the boot program is different from the "local-mac-address"
property;
- nvmem-cells: phandle, reference to an nvmem node for the MAC address;
- nvmem-cell-names: string, should be "mac-address" if nvmem is to be used;
- max-speed: number, specifies maximum speed in Mbit/s supported by the device;
- max-frame-size: number, maximum transfer unit (IEEE defined MTU), rather than
the maximum frame size (there's contradiction in the Devicetree
Specification).
- phy-mode: string, operation mode of the PHY interface. This is now a de-facto
standard property; supported values are:
* "internal"
* "internal" (Internal means there is not a standard bus between the MAC and
the PHY, something proprietary is being used to embed the PHY in the MAC.)
* "mii"
* "gmii"
* "sgmii"

View File

@ -26,6 +26,10 @@ Required properties:
Optional elements: 'tsu_clk'
- clocks: Phandles to input clocks.
Optional properties:
- nvmem-cells: phandle, reference to an nvmem node for the MAC address
- nvmem-cell-names: string, should be "mac-address" if nvmem is to be used
Optional properties for PHY child node:
- reset-gpios : Should specify the gpio for phy reset
- magic-packet : If present, indicates that the hardware supports waking

View File

@ -22,8 +22,6 @@ you'll need the following options as well...
CONFIG_DECNET_ROUTER (to be able to add/delete routes)
CONFIG_NETFILTER (will be required for the DECnet routing daemon)
CONFIG_DECNET_ROUTE_FWMARK is optional
Don't turn on SIOCGIFCONF support for DECnet unless you are really sure
that you need it, in general you won't and it can cause ifconfig to
malfunction.

View File

@ -422,6 +422,7 @@ tcp_min_rtt_wlen - INTEGER
minimum RTT when it is moved to a longer path (e.g., due to traffic
engineering). A longer window makes the filter more resistant to RTT
inflations such as transient congestion. The unit is seconds.
Possible values: 0 - 86400 (1 day)
Default: 300
tcp_moderate_rcvbuf - BOOLEAN

View File

@ -1646,7 +1646,7 @@ static irqreturn_t fs_irq (int irq, void *dev_id)
}
if (status & ISR_TBRQ_W) {
fs_dprintk (FS_DEBUG_IRQ, "Data tramsitted!\n");
fs_dprintk (FS_DEBUG_IRQ, "Data transmitted!\n");
process_txdone_queue (dev, &dev->tx_relq);
}

View File

@ -1721,7 +1721,7 @@ static void atl1_inc_smb(struct atl1_adapter *adapter)
adapter->soft_stats.scc += smb->tx_1_col;
adapter->soft_stats.mcc += smb->tx_2_col;
adapter->soft_stats.latecol += smb->tx_late_col;
adapter->soft_stats.tx_underun += smb->tx_underrun;
adapter->soft_stats.tx_underrun += smb->tx_underrun;
adapter->soft_stats.tx_trunc += smb->tx_trunc;
adapter->soft_stats.tx_pause += smb->tx_pause;
@ -3179,7 +3179,7 @@ static struct atl1_stats atl1_gstrings_stats[] = {
{"tx_deferred_ok", ATL1_STAT(soft_stats.deffer)},
{"tx_single_coll_ok", ATL1_STAT(soft_stats.scc)},
{"tx_multi_coll_ok", ATL1_STAT(soft_stats.mcc)},
{"tx_underun", ATL1_STAT(soft_stats.tx_underun)},
{"tx_underrun", ATL1_STAT(soft_stats.tx_underrun)},
{"tx_trunc", ATL1_STAT(soft_stats.tx_trunc)},
{"tx_pause", ATL1_STAT(soft_stats.tx_pause)},
{"rx_pause", ATL1_STAT(soft_stats.rx_pause)},

View File

@ -681,7 +681,7 @@ struct atl1_sft_stats {
u64 scc; /* packets TX after a single collision */
u64 mcc; /* packets TX after multiple collisions */
u64 latecol; /* TX packets w/ late collisions */
u64 tx_underun; /* TX packets aborted due to TX FIFO underrun
u64 tx_underrun; /* TX packets aborted due to TX FIFO underrun
* or TRD FIFO underrun */
u64 tx_trunc; /* TX packets truncated due to size > MTU */
u64 rx_pause; /* num Pause packets received. */

View File

@ -553,7 +553,7 @@ static void atl2_intr_tx(struct atl2_adapter *adapter)
netdev->stats.tx_aborted_errors++;
if (txs->late_col)
netdev->stats.tx_window_errors++;
if (txs->underun)
if (txs->underrun)
netdev->stats.tx_fifo_errors++;
} while (1);

View File

@ -260,7 +260,7 @@ struct tx_pkt_status {
unsigned multi_col:1;
unsigned late_col:1;
unsigned abort_col:1;
unsigned underun:1; /* current packet is aborted
unsigned underrun:1; /* current packet is aborted
* due to txram underrun */
unsigned:3; /* reserved */
unsigned update:1; /* always 1'b1 in tx_status_buf */

View File

@ -33,6 +33,26 @@
#include <linux/bpf_trace.h>
#include "en/xdp.h"
int mlx5e_xdp_max_mtu(struct mlx5e_params *params)
{
int hr = NET_IP_ALIGN + XDP_PACKET_HEADROOM;
/* Let S := SKB_DATA_ALIGN(sizeof(struct skb_shared_info)).
* The condition checked in mlx5e_rx_is_linear_skb is:
* SKB_DATA_ALIGN(sw_mtu + hard_mtu + hr) + S <= PAGE_SIZE (1)
* (Note that hw_mtu == sw_mtu + hard_mtu.)
* What is returned from this function is:
* max_mtu = PAGE_SIZE - S - hr - hard_mtu (2)
* After assigning sw_mtu := max_mtu, the left side of (1) turns to
* SKB_DATA_ALIGN(PAGE_SIZE - S) + S, which is equal to PAGE_SIZE,
* because both PAGE_SIZE and S are already aligned. Any number greater
* than max_mtu would make the left side of (1) greater than PAGE_SIZE,
* so max_mtu is the maximum MTU allowed.
*/
return MLX5E_HW2SW_MTU(params, SKB_MAX_HEAD(hr));
}
static inline bool
mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_dma_info *di,
struct xdp_buff *xdp)
@ -304,9 +324,9 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq, struct mlx5e_rq *rq)
mlx5e_xdpi_fifo_pop(xdpi_fifo);
if (is_redirect) {
xdp_return_frame(xdpi.xdpf);
dma_unmap_single(sq->pdev, xdpi.dma_addr,
xdpi.xdpf->len, DMA_TO_DEVICE);
xdp_return_frame(xdpi.xdpf);
} else {
/* Recycle RX page */
mlx5e_page_release(rq, &xdpi.di, true);
@ -345,9 +365,9 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq)
mlx5e_xdpi_fifo_pop(xdpi_fifo);
if (is_redirect) {
xdp_return_frame(xdpi.xdpf);
dma_unmap_single(sq->pdev, xdpi.dma_addr,
xdpi.xdpf->len, DMA_TO_DEVICE);
xdp_return_frame(xdpi.xdpf);
} else {
/* Recycle RX page */
mlx5e_page_release(rq, &xdpi.di, false);

View File

@ -34,13 +34,12 @@
#include "en.h"
#define MLX5E_XDP_MAX_MTU ((int)(PAGE_SIZE - \
MLX5_SKB_FRAG_SZ(XDP_PACKET_HEADROOM)))
#define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
#define MLX5E_XDP_TX_EMPTY_DS_COUNT \
(sizeof(struct mlx5e_tx_wqe) / MLX5_SEND_WQE_DS)
#define MLX5E_XDP_TX_DS_COUNT (MLX5E_XDP_TX_EMPTY_DS_COUNT + 1 /* SG DS */)
int mlx5e_xdp_max_mtu(struct mlx5e_params *params);
bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
void *va, u16 *rx_headroom, u32 *len);
bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq, struct mlx5e_rq *rq);

View File

@ -1586,7 +1586,7 @@ static int mlx5e_get_module_info(struct net_device *netdev,
break;
case MLX5_MODULE_ID_SFP:
modinfo->type = ETH_MODULE_SFF_8472;
modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
modinfo->eeprom_len = MLX5_EEPROM_PAGE_LENGTH;
break;
default:
netdev_err(priv->netdev, "%s: cable type not recognized:0x%x\n",

View File

@ -3777,7 +3777,7 @@ int mlx5e_change_mtu(struct net_device *netdev, int new_mtu,
if (params->xdp_prog &&
!mlx5e_rx_is_linear_skb(priv->mdev, &new_channels.params)) {
netdev_err(netdev, "MTU(%d) > %d is not allowed while XDP enabled\n",
new_mtu, MLX5E_XDP_MAX_MTU);
new_mtu, mlx5e_xdp_max_mtu(params));
err = -EINVAL;
goto out;
}
@ -4212,7 +4212,8 @@ static int mlx5e_xdp_allowed(struct mlx5e_priv *priv, struct bpf_prog *prog)
if (!mlx5e_rx_is_linear_skb(priv->mdev, &new_channels.params)) {
netdev_warn(netdev, "XDP is not allowed with MTU(%d) > %d\n",
new_channels.params.sw_mtu, MLX5E_XDP_MAX_MTU);
new_channels.params.sw_mtu,
mlx5e_xdp_max_mtu(&new_channels.params));
return -EINVAL;
}

View File

@ -317,10 +317,6 @@ int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
size -= offset + size - MLX5_EEPROM_PAGE_LENGTH;
i2c_addr = MLX5_I2C_ADDR_LOW;
if (offset >= MLX5_EEPROM_PAGE_LENGTH) {
i2c_addr = MLX5_I2C_ADDR_HIGH;
offset -= MLX5_EEPROM_PAGE_LENGTH;
}
MLX5_SET(mcia_reg, in, l, 0);
MLX5_SET(mcia_reg, in, module, module_num);

View File

@ -27,7 +27,7 @@
#define MLXSW_PCI_SW_RESET 0xF0010
#define MLXSW_PCI_SW_RESET_RST_BIT BIT(0)
#define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS 13000
#define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS 20000
#define MLXSW_PCI_SW_RESET_WAIT_MSECS 100
#define MLXSW_PCI_FW_READY 0xA1844
#define MLXSW_PCI_FW_READY_MASK 0xFFFF

View File

@ -3126,11 +3126,11 @@ mlxsw_sp_port_set_link_ksettings(struct net_device *dev,
if (err)
return err;
mlxsw_sp_port->link.autoneg = autoneg;
if (!netif_running(dev))
return 0;
mlxsw_sp_port->link.autoneg = autoneg;
mlxsw_sp_port_admin_status_set(mlxsw_sp_port, false);
mlxsw_sp_port_admin_status_set(mlxsw_sp_port, true);
@ -3316,7 +3316,7 @@ static int mlxsw_sp_port_ets_init(struct mlxsw_sp_port *mlxsw_sp_port)
err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
MLXSW_REG_QEEC_HIERARCY_TC,
i + 8, i,
false, 0);
true, 100);
if (err)
return err;
}

View File

@ -39,7 +39,7 @@ nfp_abm_u32_check_knode(struct nfp_abm *abm, struct tc_cls_u32_knode *knode,
}
if (knode->sel->off || knode->sel->offshift || knode->sel->offmask ||
knode->sel->offoff || knode->fshift) {
NL_SET_ERR_MSG_MOD(extack, "variable offseting not supported");
NL_SET_ERR_MSG_MOD(extack, "variable offsetting not supported");
return false;
}
if (knode->sel->hoff || knode->sel->hmask) {
@ -78,7 +78,7 @@ nfp_abm_u32_check_knode(struct nfp_abm *abm, struct tc_cls_u32_knode *knode,
k = &knode->sel->keys[0];
if (k->offmask) {
NL_SET_ERR_MSG_MOD(extack, "offset mask - variable offseting not supported");
NL_SET_ERR_MSG_MOD(extack, "offset mask - variable offsetting not supported");
return false;
}
if (k->off) {

View File

@ -673,7 +673,8 @@ static void netsec_process_tx(struct netsec_priv *priv)
}
static void *netsec_alloc_rx_data(struct netsec_priv *priv,
dma_addr_t *dma_handle, u16 *desc_len)
dma_addr_t *dma_handle, u16 *desc_len,
bool napi)
{
size_t total_len = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
size_t payload_len = NETSEC_RX_BUF_SZ;
@ -682,7 +683,7 @@ static void *netsec_alloc_rx_data(struct netsec_priv *priv,
total_len += SKB_DATA_ALIGN(payload_len + NETSEC_SKB_PAD);
buf = napi_alloc_frag(total_len);
buf = napi ? napi_alloc_frag(total_len) : netdev_alloc_frag(total_len);
if (!buf)
return NULL;
@ -765,7 +766,8 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
/* allocate a fresh buffer and map it to the hardware.
* This will eventually replace the old buffer in the hardware
*/
buf_addr = netsec_alloc_rx_data(priv, &dma_handle, &desc_len);
buf_addr = netsec_alloc_rx_data(priv, &dma_handle, &desc_len,
true);
if (unlikely(!buf_addr))
break;
@ -1069,7 +1071,8 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
void *buf;
u16 len;
buf = netsec_alloc_rx_data(priv, &dma_handle, &len);
buf = netsec_alloc_rx_data(priv, &dma_handle, &len,
false);
if (!buf) {
netsec_uninit_pkt_dring(priv, NETSEC_RING_RX);
goto err_out;

View File

@ -140,7 +140,7 @@ static void ndesc_init_rx_desc(struct dma_desc *p, int disable_rx_ic, int mode,
p->des0 |= cpu_to_le32(RDES0_OWN);
bfsize1 = min(bfsize, BUF_SIZE_2KiB - 1);
p->des1 |= cpu_to_le32(bfsize & RDES1_BUFFER1_SIZE_MASK);
p->des1 |= cpu_to_le32(bfsize1 & RDES1_BUFFER1_SIZE_MASK);
if (mode == STMMAC_CHAIN_MODE)
ndesc_rx_set_on_chain(p, end);

View File

@ -2616,8 +2616,6 @@ static int stmmac_open(struct net_device *dev)
u32 chan;
int ret;
stmmac_check_ether_addr(priv);
if (priv->hw->pcs != STMMAC_PCS_RGMII &&
priv->hw->pcs != STMMAC_PCS_TBI &&
priv->hw->pcs != STMMAC_PCS_RTBI) {
@ -4303,6 +4301,8 @@ int stmmac_dvr_probe(struct device *device,
if (ret)
goto error_hw_init;
stmmac_check_ether_addr(priv);
/* Configure real RX and TX queues */
netif_set_real_num_rx_queues(ndev, priv->plat->rx_queues_to_use);
netif_set_real_num_tx_queues(ndev, priv->plat->tx_queues_to_use);

View File

@ -159,6 +159,12 @@ static const struct dmi_system_id quark_pci_dmi[] = {
},
.driver_data = (void *)&galileo_stmmac_dmi_data,
},
/*
* There are 2 types of SIMATIC IOT2000: IOT20202 and IOT2040.
* The asset tag "6ES7647-0AA00-0YA2" is only for IOT2020 which
* has only one pci network device while other asset tags are
* for IOT2040 which has two.
*/
{
.matches = {
DMI_EXACT_MATCH(DMI_BOARD_NAME, "SIMATIC IOT2000"),
@ -170,8 +176,6 @@ static const struct dmi_system_id quark_pci_dmi[] = {
{
.matches = {
DMI_EXACT_MATCH(DMI_BOARD_NAME, "SIMATIC IOT2000"),
DMI_EXACT_MATCH(DMI_BOARD_ASSET_TAG,
"6ES7647-0AA00-1YA2"),
},
.driver_data = (void *)&iot2040_stmmac_dmi_data,
},

View File

@ -159,6 +159,14 @@ static const struct spi_device_id ks8995_id[] = {
};
MODULE_DEVICE_TABLE(spi, ks8995_id);
static const struct of_device_id ks8895_spi_of_match[] = {
{ .compatible = "micrel,ks8995" },
{ .compatible = "micrel,ksz8864" },
{ .compatible = "micrel,ksz8795" },
{ },
};
MODULE_DEVICE_TABLE(of, ks8895_spi_of_match);
static inline u8 get_chip_id(u8 val)
{
return (val >> ID1_CHIPID_S) & ID1_CHIPID_M;
@ -526,6 +534,7 @@ static int ks8995_remove(struct spi_device *spi)
static struct spi_driver ks8995_driver = {
.driver = {
.name = "spi-ks8995",
.of_match_table = of_match_ptr(ks8895_spi_of_match),
},
.probe = ks8995_probe,
.remove = ks8995_remove,

View File

@ -1156,6 +1156,13 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
return -EINVAL;
}
if (netdev_has_upper_dev(dev, port_dev)) {
NL_SET_ERR_MSG(extack, "Device is already an upper device of the team interface");
netdev_err(dev, "Device %s is already an upper device of the team interface\n",
portname);
return -EBUSY;
}
if (port_dev->features & NETIF_F_VLAN_CHALLENGED &&
vlan_uses_dev(dev)) {
NL_SET_ERR_MSG(extack, "Device is VLAN challenged and team device has VLAN set up");

View File

@ -875,6 +875,7 @@ static const struct net_device_ops vrf_netdev_ops = {
.ndo_init = vrf_dev_init,
.ndo_uninit = vrf_dev_uninit,
.ndo_start_xmit = vrf_xmit,
.ndo_set_mac_address = eth_mac_addr,
.ndo_get_stats64 = vrf_get_stats64,
.ndo_add_slave = vrf_add_slave,
.ndo_del_slave = vrf_del_slave,
@ -1274,6 +1275,7 @@ static void vrf_setup(struct net_device *dev)
/* default to no qdisc; user can add if desired */
dev->priv_flags |= IFF_NO_QUEUE;
dev->priv_flags |= IFF_NO_RX_HANDLER;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
/* VRF devices do not care about MTU, but if the MTU is set
* too low then the ipv4 and ipv6 protocols are disabled

View File

@ -1074,6 +1074,12 @@ static const struct spi_device_id st95hf_id[] = {
};
MODULE_DEVICE_TABLE(spi, st95hf_id);
static const struct of_device_id st95hf_spi_of_match[] = {
{ .compatible = "st,st95hf" },
{ },
};
MODULE_DEVICE_TABLE(of, st95hf_spi_of_match);
static int st95hf_probe(struct spi_device *nfc_spi_dev)
{
int ret;
@ -1260,6 +1266,7 @@ static struct spi_driver st95hf_driver = {
.driver = {
.name = "st95hf",
.owner = THIS_MODULE,
.of_match_table = of_match_ptr(st95hf_spi_of_match),
},
.id_table = st95hf_id,
.probe = st95hf_probe,

View File

@ -7,7 +7,6 @@
*/
#include <linux/etherdevice.h>
#include <linux/kernel.h>
#include <linux/nvmem-consumer.h>
#include <linux/of_net.h>
#include <linux/phy.h>
#include <linux/export.h>

View File

@ -1595,6 +1595,7 @@ static int ctcm_new_device(struct ccwgroup_device *cgdev)
if (priv->channel[direction] == NULL) {
if (direction == CTCM_WRITE)
channel_free(priv->channel[CTCM_READ]);
result = -ENODEV;
goto out_dev;
}
priv->channel[direction]->netdev = dev;

View File

@ -448,6 +448,18 @@ static inline void eth_addr_dec(u8 *addr)
u64_to_ether_addr(u, addr);
}
/**
* eth_addr_inc() - Increment the given MAC address.
* @addr: Pointer to a six-byte array containing Ethernet address to increment.
*/
static inline void eth_addr_inc(u8 *addr)
{
u64 u = ether_addr_to_u64(addr);
u++;
u64_to_ether_addr(u, addr);
}
/**
* is_etherdev_addr - Tell if given Ethernet address belongs to the device.
* @dev: Pointer to a device structure

View File

@ -316,6 +316,8 @@ struct nf_conn *nf_ct_tmpl_alloc(struct net *net,
gfp_t flags);
void nf_ct_tmpl_free(struct nf_conn *tmpl);
u32 nf_ct_get_id(const struct nf_conn *ct);
static inline void
nf_ct_set(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info info)
{

View File

@ -75,6 +75,12 @@ bool nf_conntrack_invert_icmp_tuple(struct nf_conntrack_tuple *tuple,
bool nf_conntrack_invert_icmpv6_tuple(struct nf_conntrack_tuple *tuple,
const struct nf_conntrack_tuple *orig);
int nf_conntrack_inet_error(struct nf_conn *tmpl, struct sk_buff *skb,
unsigned int dataoff,
const struct nf_hook_state *state,
u8 l4proto,
union nf_inet_addr *outer_daddr);
int nf_conntrack_icmpv4_error(struct nf_conn *tmpl,
struct sk_buff *skb,
unsigned int dataoff,

View File

@ -2032,7 +2032,8 @@ static int ebt_size_mwt(struct compat_ebt_entry_mwt *match32,
if (match_kern)
match_kern->match_size = ret;
if (WARN_ON(type == EBT_COMPAT_TARGET && size_left))
/* rule should have no remaining data after target */
if (type == EBT_COMPAT_TARGET && size_left)
return -EINVAL;
match32 = (struct compat_ebt_entry_mwt *) buf;

View File

@ -1183,16 +1183,23 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
return dst;
}
static void ipv4_link_failure(struct sk_buff *skb)
static void ipv4_send_dest_unreach(struct sk_buff *skb)
{
struct ip_options opt;
struct rtable *rt;
int res;
/* Recompile ip options since IPCB may not be valid anymore.
* Also check we have a reasonable ipv4 header.
*/
if (!pskb_network_may_pull(skb, sizeof(struct iphdr)) ||
ip_hdr(skb)->version != 4 || ip_hdr(skb)->ihl < 5)
return;
memset(&opt, 0, sizeof(opt));
opt.optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);
if (ip_hdr(skb)->ihl > 5) {
if (!pskb_network_may_pull(skb, ip_hdr(skb)->ihl * 4))
return;
opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr);
rcu_read_lock();
res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL);
@ -1200,8 +1207,15 @@ static void ipv4_link_failure(struct sk_buff *skb)
if (res)
return;
}
__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &opt);
}
static void ipv4_link_failure(struct sk_buff *skb)
{
struct rtable *rt;
ipv4_send_dest_unreach(skb);
rt = skb_rtable(skb);
if (rt)

View File

@ -49,6 +49,7 @@ static int ip_ping_group_range_min[] = { 0, 0 };
static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
static int comp_sack_nr_max = 255;
static u32 u32_max_div_HZ = UINT_MAX / HZ;
static int one_day_secs = 24 * 3600;
/* obsolete */
static int sysctl_tcp_low_latency __read_mostly;
@ -1151,7 +1152,9 @@ static struct ctl_table ipv4_net_table[] = {
.data = &init_net.ipv4.sysctl_tcp_min_rtt_wlen,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec
.proc_handler = proc_dointvec_minmax,
.extra1 = &zero,
.extra2 = &one_day_secs
},
{
.procname = "tcp_autocorking",

View File

@ -476,7 +476,7 @@ static int ip6addrlbl_valid_dump_req(const struct nlmsghdr *nlh,
}
if (nlmsg_attrlen(nlh, sizeof(*ifal))) {
NL_SET_ERR_MSG_MOD(extack, "Invalid data after header for address label dump requewst");
NL_SET_ERR_MSG_MOD(extack, "Invalid data after header for address label dump request");
return -EINVAL;
}

View File

@ -11,6 +11,7 @@
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <net/ncsi.h>
@ -667,7 +668,10 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct ncsi_request *nr)
ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
memcpy(saddr.sa_data, &rsp->data[BCM_MAC_ADDR_OFFSET], ETH_ALEN);
/* Increase mac address by 1 for BMC's address */
saddr.sa_data[ETH_ALEN - 1]++;
eth_addr_inc((u8 *)saddr.sa_data);
if (!is_valid_ether_addr((const u8 *)saddr.sa_data))
return -ENXIO;
ret = ops->ndo_set_mac_address(ndev, &saddr);
if (ret < 0)
netdev_warn(ndev, "NCSI: 'Writing mac address to device failed\n");

View File

@ -1678,7 +1678,7 @@ ip_vs_in_icmp(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related,
if (!cp) {
int v;
if (!sysctl_schedule_icmp(ipvs))
if (ipip || !sysctl_schedule_icmp(ipvs))
return NF_ACCEPT;
if (!ip_vs_try_to_schedule(ipvs, AF_INET, skb, pd, &v, &cp, &ciph))

View File

@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <linux/random.h>
#include <linux/jhash.h>
#include <linux/siphash.h>
#include <linux/err.h>
#include <linux/percpu.h>
#include <linux/moduleparam.h>
@ -449,6 +450,40 @@ nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
}
EXPORT_SYMBOL_GPL(nf_ct_invert_tuple);
/* Generate a almost-unique pseudo-id for a given conntrack.
*
* intentionally doesn't re-use any of the seeds used for hash
* table location, we assume id gets exposed to userspace.
*
* Following nf_conn items do not change throughout lifetime
* of the nf_conn after it has been committed to main hash table:
*
* 1. nf_conn address
* 2. nf_conn->ext address
* 3. nf_conn->master address (normally NULL)
* 4. tuple
* 5. the associated net namespace
*/
u32 nf_ct_get_id(const struct nf_conn *ct)
{
static __read_mostly siphash_key_t ct_id_seed;
unsigned long a, b, c, d;
net_get_random_once(&ct_id_seed, sizeof(ct_id_seed));
a = (unsigned long)ct;
b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct));
c = (unsigned long)ct->ext;
d = (unsigned long)siphash(&ct->tuplehash, sizeof(ct->tuplehash),
&ct_id_seed);
#ifdef CONFIG_64BIT
return siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &ct_id_seed);
#else
return siphash_4u32((u32)a, (u32)b, (u32)c, (u32)d, &ct_id_seed);
#endif
}
EXPORT_SYMBOL_GPL(nf_ct_get_id);
static void
clean_from_lists(struct nf_conn *ct)
{
@ -982,12 +1017,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
/* set conntrack timestamp, if enabled. */
tstamp = nf_conn_tstamp_find(ct);
if (tstamp) {
if (skb->tstamp == 0)
__net_timestamp(skb);
if (tstamp)
tstamp->start = ktime_get_real_ns();
tstamp->start = ktime_to_ns(skb->tstamp);
}
/* Since the lookup is lockless, hash insertion must be done after
* starting the timer and setting the CONFIRMED bit. The RCU barriers
* guarantee that no other CPU can find the conntrack before the above
@ -1350,6 +1382,7 @@ __nf_conntrack_alloc(struct net *net,
/* save hash for reusing when confirming */
*(unsigned long *)(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev) = hash;
ct->status = 0;
ct->timeout = 0;
write_pnet(&ct->ct_net, net);
memset(&ct->__nfct_init_offset[0], 0,
offsetof(struct nf_conn, proto) -

View File

@ -29,6 +29,7 @@
#include <linux/spinlock.h>
#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/siphash.h>
#include <linux/netfilter.h>
#include <net/netlink.h>
@ -485,7 +486,9 @@ nla_put_failure:
static int ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct)
{
if (nla_put_be32(skb, CTA_ID, htonl((unsigned long)ct)))
__be32 id = (__force __be32)nf_ct_get_id(ct);
if (nla_put_be32(skb, CTA_ID, id))
goto nla_put_failure;
return 0;
@ -1286,8 +1289,9 @@ static int ctnetlink_del_conntrack(struct net *net, struct sock *ctnl,
}
if (cda[CTA_ID]) {
u_int32_t id = ntohl(nla_get_be32(cda[CTA_ID]));
if (id != (u32)(unsigned long)ct) {
__be32 id = nla_get_be32(cda[CTA_ID]);
if (id != (__force __be32)nf_ct_get_id(ct)) {
nf_ct_put(ct);
return -ENOENT;
}
@ -2692,6 +2696,25 @@ nla_put_failure:
static const union nf_inet_addr any_addr;
static __be32 nf_expect_get_id(const struct nf_conntrack_expect *exp)
{
static __read_mostly siphash_key_t exp_id_seed;
unsigned long a, b, c, d;
net_get_random_once(&exp_id_seed, sizeof(exp_id_seed));
a = (unsigned long)exp;
b = (unsigned long)exp->helper;
c = (unsigned long)exp->master;
d = (unsigned long)siphash(&exp->tuple, sizeof(exp->tuple), &exp_id_seed);
#ifdef CONFIG_64BIT
return (__force __be32)siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &exp_id_seed);
#else
return (__force __be32)siphash_4u32((u32)a, (u32)b, (u32)c, (u32)d, &exp_id_seed);
#endif
}
static int
ctnetlink_exp_dump_expect(struct sk_buff *skb,
const struct nf_conntrack_expect *exp)
@ -2739,7 +2762,7 @@ ctnetlink_exp_dump_expect(struct sk_buff *skb,
}
#endif
if (nla_put_be32(skb, CTA_EXPECT_TIMEOUT, htonl(timeout)) ||
nla_put_be32(skb, CTA_EXPECT_ID, htonl((unsigned long)exp)) ||
nla_put_be32(skb, CTA_EXPECT_ID, nf_expect_get_id(exp)) ||
nla_put_be32(skb, CTA_EXPECT_FLAGS, htonl(exp->flags)) ||
nla_put_be32(skb, CTA_EXPECT_CLASS, htonl(exp->class)))
goto nla_put_failure;
@ -3044,7 +3067,8 @@ static int ctnetlink_get_expect(struct net *net, struct sock *ctnl,
if (cda[CTA_EXPECT_ID]) {
__be32 id = nla_get_be32(cda[CTA_EXPECT_ID]);
if (ntohl(id) != (u32)(unsigned long)exp) {
if (id != nf_expect_get_id(exp)) {
nf_ct_expect_put(exp);
return -ENOENT;
}

View File

@ -55,7 +55,7 @@ void nf_l4proto_log_invalid(const struct sk_buff *skb,
struct va_format vaf;
va_list args;
if (net->ct.sysctl_log_invalid != protonum ||
if (net->ct.sysctl_log_invalid != protonum &&
net->ct.sysctl_log_invalid != IPPROTO_RAW)
return;

View File

@ -103,49 +103,94 @@ int nf_conntrack_icmp_packet(struct nf_conn *ct,
return NF_ACCEPT;
}
/* Returns conntrack if it dealt with ICMP, and filled in skb fields */
static int
icmp_error_message(struct nf_conn *tmpl, struct sk_buff *skb,
const struct nf_hook_state *state)
/* Check inner header is related to any of the existing connections */
int nf_conntrack_inet_error(struct nf_conn *tmpl, struct sk_buff *skb,
unsigned int dataoff,
const struct nf_hook_state *state,
u8 l4proto, union nf_inet_addr *outer_daddr)
{
struct nf_conntrack_tuple innertuple, origtuple;
const struct nf_conntrack_tuple_hash *h;
const struct nf_conntrack_zone *zone;
enum ip_conntrack_info ctinfo;
struct nf_conntrack_zone tmp;
union nf_inet_addr *ct_daddr;
enum ip_conntrack_dir dir;
struct nf_conn *ct;
WARN_ON(skb_nfct(skb));
zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
/* Are they talking about one of our connections? */
if (!nf_ct_get_tuplepr(skb,
skb_network_offset(skb) + ip_hdrlen(skb)
+ sizeof(struct icmphdr),
PF_INET, state->net, &origtuple)) {
pr_debug("icmp_error_message: failed to get tuple\n");
if (!nf_ct_get_tuplepr(skb, dataoff,
state->pf, state->net, &origtuple))
return -NF_ACCEPT;
}
/* Ordinarily, we'd expect the inverted tupleproto, but it's
been preserved inside the ICMP. */
if (!nf_ct_invert_tuple(&innertuple, &origtuple)) {
pr_debug("icmp_error_message: no match\n");
if (!nf_ct_invert_tuple(&innertuple, &origtuple))
return -NF_ACCEPT;
h = nf_conntrack_find_get(state->net, zone, &innertuple);
if (!h)
return -NF_ACCEPT;
/* Consider: A -> T (=This machine) -> B
* Conntrack entry will look like this:
* Original: A->B
* Reply: B->T (SNAT case) OR A
*
* When this function runs, we got packet that looks like this:
* iphdr|icmphdr|inner_iphdr|l4header (tcp, udp, ..).
*
* Above nf_conntrack_find_get() makes lookup based on inner_hdr,
* so we should expect that destination of the found connection
* matches outer header destination address.
*
* In above example, we can consider these two cases:
* 1. Error coming in reply direction from B or M (middle box) to
* T (SNAT case) or A.
* Inner saddr will be B, dst will be T or A.
* The found conntrack will be reply tuple (B->T/A).
* 2. Error coming in original direction from A or M to B.
* Inner saddr will be A, inner daddr will be B.
* The found conntrack will be original tuple (A->B).
*
* In both cases, conntrack[dir].dst == inner.dst.
*
* A bogus packet could look like this:
* Inner: B->T
* Outer: B->X (other machine reachable by T).
*
* In this case, lookup yields connection A->B and will
* set packet from B->X as *RELATED*, even though no connection
* from X was ever seen.
*/
ct = nf_ct_tuplehash_to_ctrack(h);
dir = NF_CT_DIRECTION(h);
ct_daddr = &ct->tuplehash[dir].tuple.dst.u3;
if (!nf_inet_addr_cmp(outer_daddr, ct_daddr)) {
if (state->pf == AF_INET) {
nf_l4proto_log_invalid(skb, state->net, state->pf,
l4proto,
"outer daddr %pI4 != inner %pI4",
&outer_daddr->ip, &ct_daddr->ip);
} else if (state->pf == AF_INET6) {
nf_l4proto_log_invalid(skb, state->net, state->pf,
l4proto,
"outer daddr %pI6 != inner %pI6",
&outer_daddr->ip6, &ct_daddr->ip6);
}
nf_ct_put(ct);
return -NF_ACCEPT;
}
ctinfo = IP_CT_RELATED;
h = nf_conntrack_find_get(state->net, zone, &innertuple);
if (!h) {
pr_debug("icmp_error_message: no match\n");
return -NF_ACCEPT;
}
if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY)
if (dir == IP_CT_DIR_REPLY)
ctinfo += IP_CT_IS_REPLY;
/* Update skb to refer to this connection */
nf_ct_set(skb, nf_ct_tuplehash_to_ctrack(h), ctinfo);
nf_ct_set(skb, ct, ctinfo);
return NF_ACCEPT;
}
@ -162,11 +207,12 @@ int nf_conntrack_icmpv4_error(struct nf_conn *tmpl,
struct sk_buff *skb, unsigned int dataoff,
const struct nf_hook_state *state)
{
union nf_inet_addr outer_daddr;
const struct icmphdr *icmph;
struct icmphdr _ih;
/* Not enough header? */
icmph = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(_ih), &_ih);
icmph = skb_header_pointer(skb, dataoff, sizeof(_ih), &_ih);
if (icmph == NULL) {
icmp_error_log(skb, state, "short packet");
return -NF_ACCEPT;
@ -199,7 +245,12 @@ int nf_conntrack_icmpv4_error(struct nf_conn *tmpl,
icmph->type != ICMP_REDIRECT)
return NF_ACCEPT;
return icmp_error_message(tmpl, skb, state);
memset(&outer_daddr, 0, sizeof(outer_daddr));
outer_daddr.ip = ip_hdr(skb)->daddr;
dataoff += sizeof(*icmph);
return nf_conntrack_inet_error(tmpl, skb, dataoff, state,
IPPROTO_ICMP, &outer_daddr);
}
#if IS_ENABLED(CONFIG_NF_CT_NETLINK)

View File

@ -123,51 +123,6 @@ int nf_conntrack_icmpv6_packet(struct nf_conn *ct,
return NF_ACCEPT;
}
static int
icmpv6_error_message(struct net *net, struct nf_conn *tmpl,
struct sk_buff *skb,
unsigned int icmp6off)
{
struct nf_conntrack_tuple intuple, origtuple;
const struct nf_conntrack_tuple_hash *h;
enum ip_conntrack_info ctinfo;
struct nf_conntrack_zone tmp;
WARN_ON(skb_nfct(skb));
/* Are they talking about one of our connections? */
if (!nf_ct_get_tuplepr(skb,
skb_network_offset(skb)
+ sizeof(struct ipv6hdr)
+ sizeof(struct icmp6hdr),
PF_INET6, net, &origtuple)) {
pr_debug("icmpv6_error: Can't get tuple\n");
return -NF_ACCEPT;
}
/* Ordinarily, we'd expect the inverted tupleproto, but it's
been preserved inside the ICMP. */
if (!nf_ct_invert_tuple(&intuple, &origtuple)) {
pr_debug("icmpv6_error: Can't invert tuple\n");
return -NF_ACCEPT;
}
ctinfo = IP_CT_RELATED;
h = nf_conntrack_find_get(net, nf_ct_zone_tmpl(tmpl, skb, &tmp),
&intuple);
if (!h) {
pr_debug("icmpv6_error: no match\n");
return -NF_ACCEPT;
} else {
if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY)
ctinfo += IP_CT_IS_REPLY;
}
/* Update skb to refer to this connection */
nf_ct_set(skb, nf_ct_tuplehash_to_ctrack(h), ctinfo);
return NF_ACCEPT;
}
static void icmpv6_error_log(const struct sk_buff *skb,
const struct nf_hook_state *state,
@ -182,6 +137,7 @@ int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
unsigned int dataoff,
const struct nf_hook_state *state)
{
union nf_inet_addr outer_daddr;
const struct icmp6hdr *icmp6h;
struct icmp6hdr _ih;
int type;
@ -210,7 +166,11 @@ int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
if (icmp6h->icmp6_type >= 128)
return NF_ACCEPT;
return icmpv6_error_message(state->net, tmpl, skb, dataoff);
memcpy(&outer_daddr.ip6, &ipv6_hdr(skb)->daddr,
sizeof(outer_daddr.ip6));
dataoff += sizeof(*icmp6h);
return nf_conntrack_inet_error(tmpl, skb, dataoff, state,
IPPROTO_ICMPV6, &outer_daddr);
}
#if IS_ENABLED(CONFIG_NF_CT_NETLINK)

View File

@ -415,9 +415,14 @@ static void nf_nat_l4proto_unique_tuple(struct nf_conntrack_tuple *tuple,
case IPPROTO_ICMPV6:
/* id is same for either direction... */
keyptr = &tuple->src.u.icmp.id;
min = range->min_proto.icmp.id;
if (!(range->flags & NF_NAT_RANGE_PROTO_SPECIFIED)) {
min = 0;
range_size = 65536;
} else {
min = ntohs(range->min_proto.icmp.id);
range_size = ntohs(range->max_proto.icmp.id) -
ntohs(range->min_proto.icmp.id) + 1;
}
goto find_free_id;
#if IS_ENABLED(CONFIG_NF_CT_PROTO_GRE)
case IPPROTO_GRE:

View File

@ -1545,7 +1545,7 @@ static int nft_chain_parse_hook(struct net *net,
if (IS_ERR(type))
return PTR_ERR(type);
}
if (!(type->hook_mask & (1 << hook->num)))
if (hook->num > NF_MAX_HOOKS || !(type->hook_mask & (1 << hook->num)))
return -EOPNOTSUPP;
if (type->type == NFT_CHAIN_T_NAT &&

View File

@ -540,7 +540,7 @@ __build_packet_message(struct nfnl_log_net *log,
goto nla_put_failure;
}
if (skb->tstamp) {
if (hooknum <= NF_INET_FORWARD && skb->tstamp) {
struct nfulnl_msg_packet_timestamp ts;
struct timespec64 kts = ktime_to_timespec64(skb->tstamp);
ts.sec = cpu_to_be64(kts.tv_sec);

View File

@ -582,7 +582,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
if (nfqnl_put_bridge(entry, skb) < 0)
goto nla_put_failure;
if (entskb->tstamp) {
if (entry->state.hook <= NF_INET_FORWARD && entskb->tstamp) {
struct nfqnl_msg_packet_timestamp ts;
struct timespec64 kts = ktime_to_timespec64(entskb->tstamp);

View File

@ -163,19 +163,24 @@ time_mt(const struct sk_buff *skb, struct xt_action_param *par)
s64 stamp;
/*
* We cannot use get_seconds() instead of __net_timestamp() here.
* We need real time here, but we can neither use skb->tstamp
* nor __net_timestamp().
*
* skb->tstamp and skb->skb_mstamp_ns overlap, however, they
* use different clock types (real vs monotonic).
*
* Suppose you have two rules:
* 1. match before 13:00
* 2. match after 13:00
*
* If you match against processing time (get_seconds) it
* may happen that the same packet matches both rules if
* it arrived at the right moment before 13:00.
* it arrived at the right moment before 13:00, so it would be
* better to check skb->tstamp and set it via __net_timestamp()
* if needed. This however breaks outgoing packets tx timestamp,
* and causes them to get delayed forever by fq packet scheduler.
*/
if (skb->tstamp == 0)
__net_timestamp((struct sk_buff *)skb);
stamp = ktime_to_ns(skb->tstamp);
stamp = div_s64(stamp, NSEC_PER_SEC);
stamp = get_seconds();
if (info->flags & XT_TIME_LOCAL_TZ)
/* Adjust for local timezone */

View File

@ -44,6 +44,17 @@ struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev, int npages)
else
pool = rds_ibdev->mr_1m_pool;
if (atomic_read(&pool->dirty_count) >= pool->max_items / 10)
queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
/* Switch pools if one of the pool is reaching upper limit */
if (atomic_read(&pool->dirty_count) >= pool->max_items * 9 / 10) {
if (pool->pool_type == RDS_IB_MR_8K_POOL)
pool = rds_ibdev->mr_1m_pool;
else
pool = rds_ibdev->mr_8k_pool;
}
ibmr = rds_ib_try_reuse_ibmr(pool);
if (ibmr)
return ibmr;

View File

@ -454,9 +454,6 @@ struct rds_ib_mr *rds_ib_try_reuse_ibmr(struct rds_ib_mr_pool *pool)
struct rds_ib_mr *ibmr = NULL;
int iter = 0;
if (atomic_read(&pool->dirty_count) >= pool->max_items_soft / 10)
queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
while (1) {
ibmr = rds_ib_reuse_mr(pool);
if (ibmr)

View File

@ -16,6 +16,7 @@
#include <linux/init.h>
static struct sk_buff_head loopback_queue;
#define ROSE_LOOPBACK_LIMIT 1000
static struct timer_list loopback_timer;
static void rose_set_loopback_timer(void);
@ -35,29 +36,27 @@ static int rose_loopback_running(void)
int rose_loopback_queue(struct sk_buff *skb, struct rose_neigh *neigh)
{
struct sk_buff *skbn;
struct sk_buff *skbn = NULL;
if (skb_queue_len(&loopback_queue) < ROSE_LOOPBACK_LIMIT)
skbn = skb_clone(skb, GFP_ATOMIC);
kfree_skb(skb);
if (skbn != NULL) {
if (skbn) {
consume_skb(skb);
skb_queue_tail(&loopback_queue, skbn);
if (!rose_loopback_running())
rose_set_loopback_timer();
} else {
kfree_skb(skb);
}
return 1;
}
static void rose_set_loopback_timer(void)
{
del_timer(&loopback_timer);
loopback_timer.expires = jiffies + 10;
add_timer(&loopback_timer);
mod_timer(&loopback_timer, jiffies + 10);
}
static void rose_loopback_timer(struct timer_list *unused)
@ -68,8 +67,12 @@ static void rose_loopback_timer(struct timer_list *unused)
struct sock *sk;
unsigned short frametype;
unsigned int lci_i, lci_o;
int count;
while ((skb = skb_dequeue(&loopback_queue)) != NULL) {
for (count = 0; count < ROSE_LOOPBACK_LIMIT; count++) {
skb = skb_dequeue(&loopback_queue);
if (!skb)
return;
if (skb->len < ROSE_MIN_LEN) {
kfree_skb(skb);
continue;
@ -106,6 +109,8 @@ static void rose_loopback_timer(struct timer_list *unused)
kfree_skb(skb);
}
}
if (!skb_queue_empty(&loopback_queue))
mod_timer(&loopback_timer, jiffies + 1);
}
void __exit rose_loopback_clear(void)

View File

@ -1161,19 +1161,19 @@ int rxrpc_extract_header(struct rxrpc_skb_priv *sp, struct sk_buff *skb)
* handle data received on the local endpoint
* - may be called in interrupt context
*
* The socket is locked by the caller and this prevents the socket from being
* shut down and the local endpoint from going away, thus sk_user_data will not
* be cleared until this function returns.
* [!] Note that as this is called from the encap_rcv hook, the socket is not
* held locked by the caller and nothing prevents sk_user_data on the UDP from
* being cleared in the middle of processing this function.
*
* Called with the RCU read lock held from the IP layer via UDP.
*/
int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
{
struct rxrpc_local *local = rcu_dereference_sk_user_data(udp_sk);
struct rxrpc_connection *conn;
struct rxrpc_channel *chan;
struct rxrpc_call *call = NULL;
struct rxrpc_skb_priv *sp;
struct rxrpc_local *local = udp_sk->sk_user_data;
struct rxrpc_peer *peer = NULL;
struct rxrpc_sock *rx = NULL;
unsigned int channel;
@ -1181,6 +1181,10 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
_enter("%p", udp_sk);
if (unlikely(!local)) {
kfree_skb(skb);
return 0;
}
if (skb->tstamp == 0)
skb->tstamp = ktime_get_real();

View File

@ -304,7 +304,8 @@ nomem:
ret = -ENOMEM;
sock_error:
mutex_unlock(&rxnet->local_mutex);
kfree(local);
if (local)
call_rcu(&local->rcu, rxrpc_local_rcu);
_leave(" = %d", ret);
return ERR_PTR(ret);

View File

@ -904,7 +904,9 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
goto release_netdev;
free_sw_resources:
up_read(&device_offload_lock);
tls_sw_free_resources_rx(sk);
down_read(&device_offload_lock);
release_ctx:
ctx->priv_ctx_rx = NULL;
release_netdev:
@ -939,8 +941,6 @@ void tls_device_offload_cleanup_rx(struct sock *sk)
}
out:
up_read(&device_offload_lock);
kfree(tls_ctx->rx.rec_seq);
kfree(tls_ctx->rx.iv);
tls_sw_release_resources_rx(sk);
}

View File

@ -194,6 +194,9 @@ static void update_chksum(struct sk_buff *skb, int headln)
static void complete_skb(struct sk_buff *nskb, struct sk_buff *skb, int headln)
{
struct sock *sk = skb->sk;
int delta;
skb_copy_header(nskb, skb);
skb_put(nskb, skb->len);
@ -201,11 +204,15 @@ static void complete_skb(struct sk_buff *nskb, struct sk_buff *skb, int headln)
update_chksum(nskb, headln);
nskb->destructor = skb->destructor;
nskb->sk = skb->sk;
nskb->sk = sk;
skb->destructor = NULL;
skb->sk = NULL;
refcount_add(nskb->truesize - skb->truesize,
&nskb->sk->sk_wmem_alloc);
delta = nskb->truesize - skb->truesize;
if (likely(delta < 0))
WARN_ON_ONCE(refcount_sub_and_test(-delta, &sk->sk_wmem_alloc));
else if (delta)
refcount_add(delta, &sk->sk_wmem_alloc);
}
/* This function may be called after the user socket is already

View File

@ -293,11 +293,8 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
#endif
}
if (ctx->rx_conf == TLS_SW) {
kfree(ctx->rx.rec_seq);
kfree(ctx->rx.iv);
if (ctx->rx_conf == TLS_SW)
tls_sw_free_resources_rx(sk);
}
#ifdef CONFIG_TLS_DEVICE
if (ctx->rx_conf == TLS_HW)

View File

@ -2078,6 +2078,9 @@ void tls_sw_release_resources_rx(struct sock *sk)
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
kfree(tls_ctx->rx.rec_seq);
kfree(tls_ctx->rx.iv);
if (ctx->aead_recv) {
kfree_skb(ctx->recv_pkt);
ctx->recv_pkt = NULL;

View File

@ -6,12 +6,14 @@ if [ $(id -u) != 0 ]; then
exit 0
fi
ret=0
echo "--------------------"
echo "running psock_fanout test"
echo "--------------------"
./in_netns.sh ./psock_fanout
if [ $? -ne 0 ]; then
echo "[FAIL]"
ret=1
else
echo "[PASS]"
fi
@ -22,6 +24,7 @@ echo "--------------------"
./in_netns.sh ./psock_tpacket
if [ $? -ne 0 ]; then
echo "[FAIL]"
ret=1
else
echo "[PASS]"
fi
@ -32,6 +35,8 @@ echo "--------------------"
./in_netns.sh ./txring_overwrite
if [ $? -ne 0 ]; then
echo "[FAIL]"
ret=1
else
echo "[PASS]"
fi
exit $ret

View File

@ -7,7 +7,7 @@ echo "--------------------"
./socket
if [ $? -ne 0 ]; then
echo "[FAIL]"
exit 1
else
echo "[PASS]"
fi

View File

@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for netfilter selftests
TEST_PROGS := nft_trans_stress.sh nft_nat.sh
TEST_PROGS := nft_trans_stress.sh nft_nat.sh conntrack_icmp_related.sh
include ../lib.mk

View File

@ -0,0 +1,283 @@
#!/bin/bash
#
# check that ICMP df-needed/pkttoobig icmp are set are set as related
# state
#
# Setup is:
#
# nsclient1 -> nsrouter1 -> nsrouter2 -> nsclient2
# MTU 1500, except for nsrouter2 <-> nsclient2 link (1280).
# ping nsclient2 from nsclient1, checking that conntrack did set RELATED
# 'fragmentation needed' icmp packet.
#
# In addition, nsrouter1 will perform IP masquerading, i.e. also
# check the icmp errors are propagated to the correct host as per
# nat of "established" icmp-echo "connection".
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
ret=0
nft --version > /dev/null 2>&1
if [ $? -ne 0 ];then
echo "SKIP: Could not run test without nft tool"
exit $ksft_skip
fi
ip -Version > /dev/null 2>&1
if [ $? -ne 0 ];then
echo "SKIP: Could not run test without ip tool"
exit $ksft_skip
fi
cleanup() {
for i in 1 2;do ip netns del nsclient$i;done
for i in 1 2;do ip netns del nsrouter$i;done
}
ipv4() {
echo -n 192.168.$1.2
}
ipv6 () {
echo -n dead:$1::2
}
check_counter()
{
ns=$1
name=$2
expect=$3
local lret=0
cnt=$(ip netns exec $ns nft list counter inet filter "$name" | grep -q "$expect")
if [ $? -ne 0 ]; then
echo "ERROR: counter $name in $ns has unexpected value (expected $expect)" 1>&2
ip netns exec $ns nft list counter inet filter "$name" 1>&2
lret=1
fi
return $lret
}
check_unknown()
{
expect="packets 0 bytes 0"
for n in nsclient1 nsclient2 nsrouter1 nsrouter2; do
check_counter $n "unknown" "$expect"
if [ $? -ne 0 ] ;then
return 1
fi
done
return 0
}
for n in nsclient1 nsclient2 nsrouter1 nsrouter2; do
ip netns add $n
ip -net $n link set lo up
done
DEV=veth0
ip link add $DEV netns nsclient1 type veth peer name eth1 netns nsrouter1
DEV=veth0
ip link add $DEV netns nsclient2 type veth peer name eth1 netns nsrouter2
DEV=veth0
ip link add $DEV netns nsrouter1 type veth peer name eth2 netns nsrouter2
DEV=veth0
for i in 1 2; do
ip -net nsclient$i link set $DEV up
ip -net nsclient$i addr add $(ipv4 $i)/24 dev $DEV
ip -net nsclient$i addr add $(ipv6 $i)/64 dev $DEV
done
ip -net nsrouter1 link set eth1 up
ip -net nsrouter1 link set veth0 up
ip -net nsrouter2 link set eth1 up
ip -net nsrouter2 link set eth2 up
ip -net nsclient1 route add default via 192.168.1.1
ip -net nsclient1 -6 route add default via dead:1::1
ip -net nsclient2 route add default via 192.168.2.1
ip -net nsclient2 route add default via dead:2::1
i=3
ip -net nsrouter1 addr add 192.168.1.1/24 dev eth1
ip -net nsrouter1 addr add 192.168.3.1/24 dev veth0
ip -net nsrouter1 addr add dead:1::1/64 dev eth1
ip -net nsrouter1 addr add dead:3::1/64 dev veth0
ip -net nsrouter1 route add default via 192.168.3.10
ip -net nsrouter1 -6 route add default via dead:3::10
ip -net nsrouter2 addr add 192.168.2.1/24 dev eth1
ip -net nsrouter2 addr add 192.168.3.10/24 dev eth2
ip -net nsrouter2 addr add dead:2::1/64 dev eth1
ip -net nsrouter2 addr add dead:3::10/64 dev eth2
ip -net nsrouter2 route add default via 192.168.3.1
ip -net nsrouter2 route add default via dead:3::1
sleep 2
for i in 4 6; do
ip netns exec nsrouter1 sysctl -q net.ipv$i.conf.all.forwarding=1
ip netns exec nsrouter2 sysctl -q net.ipv$i.conf.all.forwarding=1
done
for netns in nsrouter1 nsrouter2; do
ip netns exec $netns nft -f - <<EOF
table inet filter {
counter unknown { }
counter related { }
chain forward {
type filter hook forward priority 0; policy accept;
meta l4proto icmpv6 icmpv6 type "packet-too-big" ct state "related" counter name "related" accept
meta l4proto icmp icmp type "destination-unreachable" ct state "related" counter name "related" accept
meta l4proto { icmp, icmpv6 } ct state new,established accept
counter name "unknown" drop
}
}
EOF
done
ip netns exec nsclient1 nft -f - <<EOF
table inet filter {
counter unknown { }
counter related { }
chain input {
type filter hook input priority 0; policy accept;
meta l4proto { icmp, icmpv6 } ct state established,untracked accept
meta l4proto { icmp, icmpv6 } ct state "related" counter name "related" accept
counter name "unknown" drop
}
}
EOF
ip netns exec nsclient2 nft -f - <<EOF
table inet filter {
counter unknown { }
counter new { }
counter established { }
chain input {
type filter hook input priority 0; policy accept;
meta l4proto { icmp, icmpv6 } ct state established,untracked accept
meta l4proto { icmp, icmpv6 } ct state "new" counter name "new" accept
meta l4proto { icmp, icmpv6 } ct state "established" counter name "established" accept
counter name "unknown" drop
}
chain output {
type filter hook output priority 0; policy accept;
meta l4proto { icmp, icmpv6 } ct state established,untracked accept
meta l4proto { icmp, icmpv6 } ct state "new" counter name "new"
meta l4proto { icmp, icmpv6 } ct state "established" counter name "established"
counter name "unknown" drop
}
}
EOF
# make sure NAT core rewrites adress of icmp error if nat is used according to
# conntrack nat information (icmp error will be directed at nsrouter1 address,
# but it needs to be routed to nsclient1 address).
ip netns exec nsrouter1 nft -f - <<EOF
table ip nat {
chain postrouting {
type nat hook postrouting priority 0; policy accept;
ip protocol icmp oifname "veth0" counter masquerade
}
}
table ip6 nat {
chain postrouting {
type nat hook postrouting priority 0; policy accept;
ip6 nexthdr icmpv6 oifname "veth0" counter masquerade
}
}
EOF
ip netns exec nsrouter2 ip link set eth1 mtu 1280
ip netns exec nsclient2 ip link set veth0 mtu 1280
sleep 1
ip netns exec nsclient1 ping -c 1 -s 1000 -q -M do 192.168.2.2 >/dev/null
if [ $? -ne 0 ]; then
echo "ERROR: netns ip routing/connectivity broken" 1>&2
cleanup
exit 1
fi
ip netns exec nsclient1 ping6 -q -c 1 -s 1000 dead:2::2 >/dev/null
if [ $? -ne 0 ]; then
echo "ERROR: netns ipv6 routing/connectivity broken" 1>&2
cleanup
exit 1
fi
check_unknown
if [ $? -ne 0 ]; then
ret=1
fi
expect="packets 0 bytes 0"
for netns in nsrouter1 nsrouter2 nsclient1;do
check_counter "$netns" "related" "$expect"
if [ $? -ne 0 ]; then
ret=1
fi
done
expect="packets 2 bytes 2076"
check_counter nsclient2 "new" "$expect"
if [ $? -ne 0 ]; then
ret=1
fi
ip netns exec nsclient1 ping -q -c 1 -s 1300 -M do 192.168.2.2 > /dev/null
if [ $? -eq 0 ]; then
echo "ERROR: ping should have failed with PMTU too big error" 1>&2
ret=1
fi
# nsrouter2 should have generated the icmp error, so
# related counter should be 0 (its in forward).
expect="packets 0 bytes 0"
check_counter "nsrouter2" "related" "$expect"
if [ $? -ne 0 ]; then
ret=1
fi
# but nsrouter1 should have seen it, same for nsclient1.
expect="packets 1 bytes 576"
for netns in nsrouter1 nsclient1;do
check_counter "$netns" "related" "$expect"
if [ $? -ne 0 ]; then
ret=1
fi
done
ip netns exec nsclient1 ping6 -c 1 -s 1300 dead:2::2 > /dev/null
if [ $? -eq 0 ]; then
echo "ERROR: ping6 should have failed with PMTU too big error" 1>&2
ret=1
fi
expect="packets 2 bytes 1856"
for netns in nsrouter1 nsclient1;do
check_counter "$netns" "related" "$expect"
if [ $? -ne 0 ]; then
ret=1
fi
done
if [ $ret -eq 0 ];then
echo "PASS: icmp mtu error had RELATED state"
else
echo "ERROR: icmp error RELATED state test has failed"
fi
cleanup
exit $ret

View File

@ -321,6 +321,7 @@ EOF
test_masquerade6()
{
local natflags=$1
local lret=0
ip netns exec ns0 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
@ -354,13 +355,13 @@ ip netns exec ns0 nft -f - <<EOF
table ip6 nat {
chain postrouting {
type nat hook postrouting priority 0; policy accept;
meta oif veth0 masquerade
meta oif veth0 masquerade $natflags
}
}
EOF
ip netns exec ns2 ping -q -c 1 dead:1::99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
echo "ERROR: cannot ping ns1 from ns2 with active ipv6 masquerading"
echo "ERROR: cannot ping ns1 from ns2 with active ipv6 masquerade $natflags"
lret=1
fi
@ -397,19 +398,26 @@ EOF
fi
done
ip netns exec ns2 ping -q -c 1 dead:1::99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
echo "ERROR: cannot ping ns1 from ns2 with active ipv6 masquerade $natflags (attempt 2)"
lret=1
fi
ip netns exec ns0 nft flush chain ip6 nat postrouting
if [ $? -ne 0 ]; then
echo "ERROR: Could not flush ip6 nat postrouting" 1>&2
lret=1
fi
test $lret -eq 0 && echo "PASS: IPv6 masquerade for ns2"
test $lret -eq 0 && echo "PASS: IPv6 masquerade $natflags for ns2"
return $lret
}
test_masquerade()
{
local natflags=$1
local lret=0
ip netns exec ns0 sysctl net.ipv4.conf.veth0.forwarding=1 > /dev/null
@ -417,7 +425,7 @@ test_masquerade()
ip netns exec ns2 ping -q -c 1 10.0.1.99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
echo "ERROR: canot ping ns1 from ns2"
echo "ERROR: cannot ping ns1 from ns2 $natflags"
lret=1
fi
@ -443,13 +451,13 @@ ip netns exec ns0 nft -f - <<EOF
table ip nat {
chain postrouting {
type nat hook postrouting priority 0; policy accept;
meta oif veth0 masquerade
meta oif veth0 masquerade $natflags
}
}
EOF
ip netns exec ns2 ping -q -c 1 10.0.1.99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
echo "ERROR: cannot ping ns1 from ns2 with active ip masquerading"
echo "ERROR: cannot ping ns1 from ns2 with active ip masquere $natflags"
lret=1
fi
@ -485,13 +493,19 @@ EOF
fi
done
ip netns exec ns2 ping -q -c 1 10.0.1.99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
echo "ERROR: cannot ping ns1 from ns2 with active ip masquerade $natflags (attempt 2)"
lret=1
fi
ip netns exec ns0 nft flush chain ip nat postrouting
if [ $? -ne 0 ]; then
echo "ERROR: Could not flush nat postrouting" 1>&2
lret=1
fi
test $lret -eq 0 && echo "PASS: IP masquerade for ns2"
test $lret -eq 0 && echo "PASS: IP masquerade $natflags for ns2"
return $lret
}
@ -750,8 +764,12 @@ test_local_dnat
test_local_dnat6
reset_counters
test_masquerade
test_masquerade6
test_masquerade ""
test_masquerade6 ""
reset_counters
test_masquerade "fully-random"
test_masquerade6 "fully-random"
reset_counters
test_redirect