2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-25 13:43:55 +08:00
Commit Graph

2574 Commits

Author SHA1 Message Date
Vlad Buslov
87750d173c net: sched: act_tunnel_key: fix metadata handling
Tunnel key action params->tcft_enc_metadata is only set when action is
TCA_TUNNEL_KEY_ACT_SET. However, metadata pointer is incorrectly
dereferenced during tunnel key init and release without verifying that
action is if correct type, which causes NULL pointer dereference. Metadata
tunnel dst_cache is also leaked on action overwrite.

Fix metadata handling:
- Verify that metadata pointer is not NULL before dereferencing it in
  tunnel_key_init error handling code.
- Move dst_cache destroy code into tunnel_key_release_params() function
  that is called in both action overwrite and release cases (fixes resource
  leak) and verifies that actions has correct type before dereferencing
  metadata pointer (fixes NULL pointer dereference).

Oops with KASAN enabled during tdc tests execution:

[  261.080482] ==================================================================
[  261.088049] BUG: KASAN: null-ptr-deref in dst_cache_destroy+0x21/0xa0
[  261.094613] Read of size 8 at addr 00000000000000b0 by task tc/2976
[  261.102524] CPU: 14 PID: 2976 Comm: tc Not tainted 5.0.0-rc7+ #157
[  261.108844] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  261.116726] Call Trace:
[  261.119234]  dump_stack+0x9a/0xeb
[  261.122625]  ? dst_cache_destroy+0x21/0xa0
[  261.126818]  ? dst_cache_destroy+0x21/0xa0
[  261.131004]  kasan_report+0x176/0x192
[  261.134752]  ? idr_get_next+0xd0/0x120
[  261.138578]  ? dst_cache_destroy+0x21/0xa0
[  261.142768]  dst_cache_destroy+0x21/0xa0
[  261.146799]  tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[  261.152392]  tcf_action_cleanup+0x2c/0xc0
[  261.156490]  tcf_generic_walker+0x4c2/0x5c0
[  261.160794]  ? tcf_action_dump_1+0x390/0x390
[  261.165163]  ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[  261.170865]  ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[  261.176641]  tca_action_gd+0x600/0xa40
[  261.180482]  ? tca_get_fill.constprop.17+0x200/0x200
[  261.185548]  ? __lock_acquire+0x588/0x1d20
[  261.189741]  ? __lock_acquire+0x588/0x1d20
[  261.193922]  ? mark_held_locks+0x90/0x90
[  261.197944]  ? mark_held_locks+0x90/0x90
[  261.202018]  ? __nla_parse+0xfe/0x190
[  261.205774]  tc_ctl_action+0x218/0x230
[  261.209614]  ? tcf_action_add+0x230/0x230
[  261.213726]  rtnetlink_rcv_msg+0x3a5/0x600
[  261.217910]  ? lock_downgrade+0x2d0/0x2d0
[  261.222006]  ? validate_linkmsg+0x400/0x400
[  261.226278]  ? find_held_lock+0x6d/0xd0
[  261.230200]  ? match_held_lock+0x1b/0x210
[  261.234296]  ? validate_linkmsg+0x400/0x400
[  261.238567]  netlink_rcv_skb+0xc7/0x1f0
[  261.242489]  ? netlink_ack+0x470/0x470
[  261.246319]  ? netlink_deliver_tap+0x1f3/0x5a0
[  261.250874]  netlink_unicast+0x2ae/0x350
[  261.254884]  ? netlink_attachskb+0x340/0x340
[  261.261647]  ? _copy_from_iter_full+0xdd/0x380
[  261.268576]  ? __virt_addr_valid+0xb6/0xf0
[  261.275227]  ? __check_object_size+0x159/0x240
[  261.282184]  netlink_sendmsg+0x4d3/0x630
[  261.288572]  ? netlink_unicast+0x350/0x350
[  261.295132]  ? netlink_unicast+0x350/0x350
[  261.301608]  sock_sendmsg+0x6d/0x80
[  261.307467]  ___sys_sendmsg+0x48e/0x540
[  261.313633]  ? copy_msghdr_from_user+0x210/0x210
[  261.320545]  ? save_stack+0x89/0xb0
[  261.326289]  ? __lock_acquire+0x588/0x1d20
[  261.332605]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  261.340063]  ? mark_held_locks+0x90/0x90
[  261.346162]  ? do_filp_open+0x138/0x1d0
[  261.352108]  ? may_open_dev+0x50/0x50
[  261.357897]  ? match_held_lock+0x1b/0x210
[  261.364016]  ? __fget_light+0xa6/0xe0
[  261.369840]  ? __sys_sendmsg+0xd2/0x150
[  261.375814]  __sys_sendmsg+0xd2/0x150
[  261.381610]  ? __ia32_sys_shutdown+0x30/0x30
[  261.388026]  ? lock_downgrade+0x2d0/0x2d0
[  261.394182]  ? mark_held_locks+0x1c/0x90
[  261.400230]  ? do_syscall_64+0x1e/0x280
[  261.406172]  do_syscall_64+0x78/0x280
[  261.411932]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  261.419103] RIP: 0033:0x7f28e91a8b87
[  261.424791] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[  261.448226] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  261.458183] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[  261.467728] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[  261.477342] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[  261.486970] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[  261.496599] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[  261.506281] ==================================================================
[  261.516076] Disabling lock debugging due to kernel taint
[  261.523979] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[  261.534413] #PF error: [normal kernel read fault]
[  261.541730] PGD 8000000317400067 P4D 8000000317400067 PUD 316878067 PMD 0
[  261.551294] Oops: 0000 [#1] SMP KASAN PTI
[  261.557985] CPU: 14 PID: 2976 Comm: tc Tainted: G    B             5.0.0-rc7+ #157
[  261.568306] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  261.578874] RIP: 0010:dst_cache_destroy+0x21/0xa0
[  261.586413] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[  261.611247] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[  261.619564] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[  261.629862] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[  261.640149] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[  261.650467] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[  261.660785] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[  261.671110] FS:  00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[  261.682447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.691491] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0
[  261.701283] Call Trace:
[  261.706374]  tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[  261.714522]  tcf_action_cleanup+0x2c/0xc0
[  261.721208]  tcf_generic_walker+0x4c2/0x5c0
[  261.728074]  ? tcf_action_dump_1+0x390/0x390
[  261.734996]  ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[  261.743247]  ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[  261.751557]  tca_action_gd+0x600/0xa40
[  261.757991]  ? tca_get_fill.constprop.17+0x200/0x200
[  261.765644]  ? __lock_acquire+0x588/0x1d20
[  261.772461]  ? __lock_acquire+0x588/0x1d20
[  261.779266]  ? mark_held_locks+0x90/0x90
[  261.785880]  ? mark_held_locks+0x90/0x90
[  261.792470]  ? __nla_parse+0xfe/0x190
[  261.798738]  tc_ctl_action+0x218/0x230
[  261.805145]  ? tcf_action_add+0x230/0x230
[  261.811760]  rtnetlink_rcv_msg+0x3a5/0x600
[  261.818564]  ? lock_downgrade+0x2d0/0x2d0
[  261.825433]  ? validate_linkmsg+0x400/0x400
[  261.832256]  ? find_held_lock+0x6d/0xd0
[  261.838624]  ? match_held_lock+0x1b/0x210
[  261.845142]  ? validate_linkmsg+0x400/0x400
[  261.851729]  netlink_rcv_skb+0xc7/0x1f0
[  261.857976]  ? netlink_ack+0x470/0x470
[  261.864132]  ? netlink_deliver_tap+0x1f3/0x5a0
[  261.870969]  netlink_unicast+0x2ae/0x350
[  261.877294]  ? netlink_attachskb+0x340/0x340
[  261.883962]  ? _copy_from_iter_full+0xdd/0x380
[  261.890750]  ? __virt_addr_valid+0xb6/0xf0
[  261.897188]  ? __check_object_size+0x159/0x240
[  261.903928]  netlink_sendmsg+0x4d3/0x630
[  261.910112]  ? netlink_unicast+0x350/0x350
[  261.916410]  ? netlink_unicast+0x350/0x350
[  261.922656]  sock_sendmsg+0x6d/0x80
[  261.928257]  ___sys_sendmsg+0x48e/0x540
[  261.934183]  ? copy_msghdr_from_user+0x210/0x210
[  261.940865]  ? save_stack+0x89/0xb0
[  261.946355]  ? __lock_acquire+0x588/0x1d20
[  261.952358]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  261.959468]  ? mark_held_locks+0x90/0x90
[  261.965248]  ? do_filp_open+0x138/0x1d0
[  261.970910]  ? may_open_dev+0x50/0x50
[  261.976386]  ? match_held_lock+0x1b/0x210
[  261.982210]  ? __fget_light+0xa6/0xe0
[  261.987648]  ? __sys_sendmsg+0xd2/0x150
[  261.993263]  __sys_sendmsg+0xd2/0x150
[  261.998613]  ? __ia32_sys_shutdown+0x30/0x30
[  262.004555]  ? lock_downgrade+0x2d0/0x2d0
[  262.010236]  ? mark_held_locks+0x1c/0x90
[  262.015758]  ? do_syscall_64+0x1e/0x280
[  262.021234]  do_syscall_64+0x78/0x280
[  262.026500]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  262.033207] RIP: 0033:0x7f28e91a8b87
[  262.038421] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[  262.060708] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  262.070112] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[  262.079087] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[  262.088122] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[  262.097157] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[  262.106207] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[  262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm mpt3sas raid_class scsi_transport_sas
[  262.204393] CR2: 00000000000000b0
[  262.210390] ---[ end trace 2e41d786f2c7901a ]---
[  262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0
[  262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[  262.258311] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[  262.266304] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[  262.276251] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[  262.286208] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[  262.296183] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[  262.306157] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[  262.316139] FS:  00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[  262.327146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  262.335815] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0

Fixes: 41411e2fd6 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27 21:31:15 -08:00
Vlad Buslov
1d9978757d Revert "net: sched: fw: don't set arg->stop in fw_walk() when empty"
This reverts commit 31a9984876 ("net: sched: fw: don't set arg->stop in
fw_walk() when empty")

Cls API function tcf_proto_is_empty() was changed in commit
6676d5e416 ("net: sched: set dedicated tcf_walker flag when tp is empty")
to no longer depend on arg->stop to determine that classifier instance is
empty. Instead, it adds dedicated arg->nonempty field, which makes the fix
in fw classifier no longer necessary.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27 10:12:19 -08:00
Leslie Monis
ff8285f818 net: sched: pie: fix 64-bit division
Use div_u64() to resolve build failures on 32-bit platforms.

Fixes: 3f7ae5f3dc ("net: sched: pie: add more cases to auto-tune alpha and beta")
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26 18:55:38 -08:00
Li RongQing
3b40bf4e24 net: Use RCU_POINTER_INITIALIZER() to init static variable
This pointer is RCU protected, so proper primitives should be used.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26 14:50:22 -08:00
Vlad Buslov
268a351d4a net: sched: fix typo in walker_check_empty()
Function walker_check_empty() incorrectly verifies that tp pointer is not
NULL, instead of actual filter pointer. Fix conditional to check the right
pointer. Adjust filter pointer naming accordingly to other cls API
functions.

Fixes: 6676d5e416 ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26 09:17:49 -08:00
Leslie Monis
24ed49002c net: sched: pie: fix mistake in reference link
Fix the incorrect reference link to RFC 8033

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26 09:11:33 -08:00
Mohit P. Tahiliani
c9d2ac5e6b net: sched: pie: update references
RFC 8033 replaces the IETF draft for PIE

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 14:21:03 -08:00
Mohit P. Tahiliani
95400b975d net: sched: pie: add derandomization mechanism
Random dropping of packets to achieve latency control may
introduce outlier situations where packets are dropped too
close to each other or too far from each other. This can
cause the real drop percentage to temporarily deviate from
the intended drop probability. In certain scenarios, such
as a small number of simultaneous TCP flows, these
deviations can cause significant deviations in link
utilization and queuing latency.

RFC 8033 suggests using a derandomization mechanism to avoid
these deviations.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 14:21:03 -08:00
Mohit P. Tahiliani
3f7ae5f3dc net: sched: pie: add more cases to auto-tune alpha and beta
The current implementation scales the local alpha and beta
variables in the calculate_probability function by the same
amount for all values of drop probability below 1%.

RFC 8033 suggests using additional cases for auto-tuning
alpha and beta when the drop probability is less than 1%.

In order to add more auto-tuning cases, MAX_PROB must be
scaled by u64 instead of u32 to prevent underflow when
scaling the local alpha and beta variables in the
calculate_probability function.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 14:21:03 -08:00
Mohit P. Tahiliani
30a92ad703 net: sched: pie: change initial value of pie_vars->burst_time
RFC 8033 suggests an initial value of 150 milliseconds for
the maximum time allowed for a burst of packets.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 14:21:03 -08:00
Mohit P. Tahiliani
29daa85538 net: sched: pie: change default value of pie_params->tupdate
RFC 8033 suggests a default value of 15 milliseconds for the
update interval.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 14:21:03 -08:00
Mohit P. Tahiliani
abde7920de net: sched: pie: change default value of pie_params->target
RFC 8033 suggests a default value of 15 milliseconds for the
target queue delay.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 14:21:03 -08:00
Mohit P. Tahiliani
575090036c net: sched: pie: change value of QUEUE_THRESHOLD
RFC 8033 recommends a value of 16384 bytes for the queue
threshold.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 14:21:03 -08:00
Vlad Buslov
ace4a267e8 net: sched: don't release block->lock when dumping chains
Function tc_dump_chain() obtains and releases block->lock on each iteration
of its inner loop that dumps all chains on block. Outputting chain template
info is fast operation so locking/unlocking mutex multiple times is an
overhead when lock is highly contested. Modify tc_dump_chain() to only
obtain block->lock once and dump all chains without releasing it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 10:25:02 -08:00
Vlad Buslov
6676d5e416 net: sched: set dedicated tcf_walker flag when tp is empty
Using tcf_walker->stop flag to determine when tcf_walker->fn() was called
at least once is unreliable. Some classifiers set 'stop' flag on error
before calling walker callback, other classifiers used to call it with NULL
filter pointer when empty. In order to prevent further regressions, extend
tcf_walker structure with dedicated 'nonempty' flag. Set this flag in
tcf_walker->fn() implementation that is used to check if classifier has
filters configured.

Fixes: 8b64678e0a ("net: sched: refactor tp insert/delete for concurrent execution")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 10:18:17 -08:00
wenxu
41411e2fd6 net/sched: act_tunnel_key: Add dst_cache support
The metadata_dst is not init the dst_cache which make the
ip_md_tunnel_xmit can't use the dst_cache. It will lookup
route table every packets.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24 21:51:55 -08:00
Cong Wang
14215108a1 net_sched: initialize net pointer inside tcf_exts_init()
For tcindex filter, it is too late to initialize the
net pointer in tcf_exts_validate(), as tcf_exts_get_net()
requires a non-NULL net pointer. We can just move its
initialization into tcf_exts_init(), which just requires
an additional parameter.

This makes the code in tcindex_alloc_perfect_hash()
prettier.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22 15:26:51 -08:00
Dan Carpenter
af736bf071 net: sched: potential NULL dereference in tcf_block_find()
The error code isn't set on this path so it would result in returning
ERR_PTR(0) and a NULL dereference in the caller.

Fixes: 18d3eefb17 ("net: sched: refactor tcf_block_find() into standalone functions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21 13:11:07 -08:00
Cong Wang
51dcb69de6 net_sched: fix a memory leak in cls_tcindex
(cherry picked from commit 033b228e7f)

When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.

This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.

As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20 20:11:10 -08:00
Cong Wang
3d210534cc net_sched: fix a race condition in tcindex_destroy()
(cherry picked from commit 8015d93ebd)

tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.

Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.

Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.

Fixes: 27ce4f05e2 ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20 20:11:10 -08:00
Wei Yongjun
6e07902f56 net: sched: using kfree_rcu() to simplify the code
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:34:51 -08:00
Ivan Vecera
aaeb1dea51 net: sched: sch_api: set an error msg when qdisc_alloc_handle() fails
This patch sets an error message in extack when the number of qdisc
handles exceeds the maximum. Also the error-code ENOSPC is more
appropriate than ENOMEM in this situation.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reported-by: Li Shuang <shuali@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:39:08 -08:00
Vlad Buslov
8b58d12f4a net: sched: cgroup: verify that filter is not NULL during walk
Check that filter is not NULL before passing it to tcf_walker->fn()
callback in cls_cgroup_walk(). This can happen when cls_cgroup_change()
failed to set first filter.

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:26:57 -08:00
Vlad Buslov
d66022cd16 net: sched: matchall: verify that filter is not NULL in mall_walk()
Check that filter is not NULL before passing it to tcf_walker->fn()
callback. This can happen when mall_change() failed to offload filter to
hardware.

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:26:39 -08:00
Vlad Buslov
3027ff41f6 net: sched: route: don't set arg->stop in route4_walk() when empty
Some classifiers set arg->stop in their implementation of tp->walk() API
when empty. Most of classifiers do not adhere to that convention. Do not
set arg->stop in route4_walk() to unify tp->walk() behavior among
classifier implementations.

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:24:39 -08:00
Vlad Buslov
31a9984876 net: sched: fw: don't set arg->stop in fw_walk() when empty
Some classifiers set arg->stop in their implementation of tp->walk() API
when empty. Most of classifiers do not adhere to that convention. Do not
set arg->stop in fw_walk() to unify tp->walk() behavior among classifier
implementations.

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:24:18 -08:00
David S. Miller
3313da8188 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The netfilter conflicts were rather simple overlapping
changes.

However, the cls_tcindex.c stuff was a bit more complex.

On the 'net' side, Cong is fixing several races and memory
leaks.  Whilst on the 'net-next' side we have Vlad adding
the rtnl-ness support.

What I've decided to do, in order to resolve this, is revert the
conversion over to using a workqueue that Cong did, bringing us back
to pure RCU.  I did it this way because I believe that either Cong's
races don't apply with have Vlad did things, or Cong will have to
implement the race fix slightly differently.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15 12:38:38 -08:00
YueHaibing
fb14b09635 net: sched: remove duplicated include from cls_api.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 21:09:06 -08:00
Vlad Buslov
1f15bb4f39 net: sched: flower: only return error from hw offload if skip_sw
Recently introduced tc_setup_flow_action() can fail when parsing tcf_exts
on some unsupported action commands. However, this should not affect the
case when user did not explicitly request hw offload by setting skip_sw
flag. Modify tc_setup_flow_action() callers to only propagate the error if
skip_sw flag is set for filter that is being offloaded, and set extack
error message in that case.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Fixes: 3a7b68617d ("cls_api: add translator to flow_action representation")
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 21:06:43 -08:00
Cong Wang
1db817e75f net_sched: fix two more memory leaks in cls_tcindex
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.

For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.

For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Cong Wang
033b228e7f net_sched: fix a memory leak in cls_tcindex
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.

This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.

As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Cong Wang
8015d93ebd net_sched: fix a race condition in tcindex_destroy()
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.

Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.

Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.

Fixes: 27ce4f05e2 ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Vlad Buslov
470502de5b net: sched: unlock rules update API
Register netlink protocol handlers for message types RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
tracks rtnl mutex state to be false by default.

Introduce tcf_proto_is_unlocked() helper that is used to check
tcf_proto_ops->flag to determine if ops can be called without taking rtnl
lock. Manually lookup Qdisc, class and block in rule update handlers.
Verify that both Qdisc ops and proto ops are unlocked before using any of
their callbacks, and obtain rtnl lock otherwise.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
18d3eefb17 net: sched: refactor tcf_block_find() into standalone functions
Refactor tcf_block_find() code into three standalone functions:
- __tcf_qdisc_find() to lookup Qdisc and increment its reference counter.
- __tcf_qdisc_cl_find() to lookup class.
- __tcf_block_find() to lookup block and increment its reference counter.

This change is necessary to allow netlink tc rule update handlers to call
these functions directly in order to conditionally take rtnl lock
according to Qdisc class ops flags before calling any of class ops
functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
12db03b65c net: sched: extend proto ops to support unlocked classifiers
Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk
functions to track rtnl lock status. Extend users of these function in cls
API to propagate rtnl lock status to them. This allows classifiers to
obtain rtnl lock when necessary and to pass rtnl lock status to extensions
and driver offload callbacks.

Add flags field to tcf proto ops. Add flag value to indicate that
classifier doesn't require rtnl lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
7d5509fa0d net: sched: extend proto ops with 'put' callback
Add optional tp->ops->put() API to be implemented for filter reference
counting. This new function is called by cls API to release filter
reference for filters returned by tp->ops->change() or tp->ops->get()
functions. Implement tfilter_put() helper to call tp->ops->put() only for
classifiers that implement it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
ec6743a109 net: sched: track rtnl lock status when validating extensions
Actions API is already updated to not rely on rtnl lock for
synchronization. However, it need to be provided with rtnl status when
called from classifiers API in order to be able to correctly release the
lock when loading kernel module.

Extend extension validation function with 'rtnl_held' flag which is passed
to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls
API. No classifier is currently updated to support unlocked execution, so
pass hardcoded 'true' flag parameter value.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
726d061286 net: sched: prevent insertion of new classifiers during chain flush
Extend tcf_chain with 'flushing' flag. Use the flag to prevent insertion of
new classifier instances when chain flushing is in progress in order to
prevent resource leak when tcf_proto is created by unlocked users
concurrently.

Return EAGAIN error from tcf_chain_tp_insert_unique() to restart
tc_new_tfilter() and lookup the chain/proto again.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
8b64678e0a net: sched: refactor tp insert/delete for concurrent execution
Implement unique insertion function to atomically attach tcf_proto to chain
after verifying that no other tcf proto with specified priority exists.
Implement delete function that verifies that tp is actually empty before
deleting it. Use these functions to refactor cls API to account for
concurrent tp and rule update instead of relying on rtnl lock. Add new
'deleting' flag to tcf proto. Use it to restart search when iterating over
tp's on chain to prevent accessing potentially inval tp->next pointer.

Extend tcf proto with spinlock that is intended to be used to protect its
data from concurrent modification instead of relying on rtnl mutex. Use it
to protect 'deleting' flag. Add lockdep macros to validate that lock is
held when accessing protected fields.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
fe2923afc1 net: sched: traverse classifiers in chain with tcf_get_next_proto()
All users of chain->filters_chain rely on rtnl lock and assume that no new
classifier instances are added when traversing the list. Use
tcf_get_next_proto() to traverse filters list without relying on rtnl
mutex. This function iterates over classifiers by taking reference to
current iterator classifier only and doesn't assume external
synchronization of filters list.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:33 -05:00
Vlad Buslov
4dbfa76644 net: sched: introduce reference counting for tcf_proto
In order to remove dependency on rtnl lock and allow concurrent tcf_proto
modification, extend tcf_proto with reference counter. Implement helper
get/put functions for tcf proto and use them to modify cls API to always
take reference to tcf_proto while using it. Only release reference to
parent chain after releasing last reference to tp.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Vlad Buslov
ed76f5edcc net: sched: protect filter_chain list with filter_chain_lock mutex
Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
when accessing filter_chain list, instead of relying on rtnl lock.
Dereference filter_chain with tcf_chain_dereference() lockdep macro to
verify that all users of chain_list have the lock taken.

Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
all necessary code while holding chain lock in order to prevent
invalidation of chain_info structure by potential concurrent change. This
also serializes calls to tcf_chain0_head_change(), which allows head change
callbacks to rely on filter_chain_lock for synchronization instead of rtnl
mutex.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Vlad Buslov
a5654820bb net: sched: protect chain template accesses with block lock
When cls API is called without protection of rtnl lock, parallel
modification of chain is possible, which means that chain template can be
changed concurrently in certain circumstances. For example, when chain is
'deleted' by new user-space chain API, the chain might continue to be used
if it is referenced by actions, and can be 're-created' again by user. In
such case same chain structure is reused and its template is changed. To
protect from described scenario, cache chain template while holding block
lock. Introduce standalone tc_chain_notify_delete() function that works
with cached template values, instead of chains themselves.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Vlad Buslov
bbf73830cd net: sched: traverse chains in block with tcf_get_next_chain()
All users of block->chain_list rely on rtnl lock and assume that no new
chains are added when traversing the list. Use tcf_get_next_chain() to
traverse chain list without relying on rtnl mutex. This function iterates
over chains by taking reference to current iterator chain only and doesn't
assume external synchronization of chain list.

Don't take reference to all chains in block when flushing and use
tcf_get_next_chain() to safely iterate over chain list instead. Remove
tcf_block_put_all_chains() that is no longer used.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Vlad Buslov
165f01354c net: sched: protect block->chain0 with block->lock
In order to remove dependency on rtnl lock, use block->lock to protect
chain0 struct from concurrent modification. Rearrange code in chain0
callback add and del functions to only access chain0 when block->lock is
held.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Vlad Buslov
2cbfab07c6 net: sched: refactor tc_ctl_chain() to use block->lock
In order to remove dependency on rtnl lock, modify chain API to use
block->lock to protect chain from concurrent modification. Rearrange
tc_ctl_chain() code to call tcf_chain_hold() while holding block->lock to
prevent concurrent chain removal.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Vlad Buslov
91052fa1c6 net: sched: protect chain->explicitly_created with block->lock
In order to remove dependency on rtnl lock, protect
tcf_chain->explicitly_created flag with block->lock. Consolidate code that
checks and resets 'explicitly_created' flag into __tcf_chain_put() to
execute it atomically with rest of code that puts chain reference.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Vlad Buslov
c266f64dbf net: sched: protect block state with mutex
Currently, tcf_block doesn't use any synchronization mechanisms to protect
critical sections that manage lifetime of its chains. block->chain_list and
multiple variables in tcf_chain that control its lifetime assume external
synchronization provided by global rtnl lock. Converting chain reference
counting to atomic reference counters is not possible because cls API uses
multiple counters and flags to control chain lifetime, so all of them must
be synchronized in chain get/put code.

Use single per-block lock to protect block data and manage lifetime of all
chains on the block. Always take block->lock when accessing chain_list.
Chain get and put modify chain lifetime-management data and parent block's
chain_list, so take the lock in these functions. Verify block->lock state
with assertions in functions that expect to be called with the lock taken
and are called from multiple places. Take block->lock when accessing
filter_chain_list.

In order to allow parallel update of rules on single block, move all calls
to classifiers outside of critical sections protected by new block->lock.
Rearrange chain get and put functions code to only access protected chain
data while holding block lock:
- Rearrange code to only access chain reference counter and chain action
  reference counter while holding block lock.
- Extract code that requires block->lock from tcf_chain_destroy() into
  standalone tcf_chain_destroy() function that is called by
  __tcf_chain_put() in same critical section that changes chain reference
  counters.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:41:32 -05:00
Jouke Witteveen
989723b00b Documentation: bring operstate documentation up-to-date
Netlink has moved from bitmasks to group numbers long ago.

Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11 12:38:51 -08:00
Eli Cohen
eddd2cf195 net: Change TCA_ACT_* to TCA_ID_* to match that of TCA_ID_POLICE
Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For
example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with
TCA_ID_POLICE and also differentiates these identifier, used in struct
tc_action_ops type field, from other macros starting with TCA_ACT_.

To make things clearer, we name the enum defining the TCA_ID_*
identifiers and also change the "type" field of struct tc_action to
id.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-10 09:28:43 -08:00