RDS removes a datagram (rds_message) from the retransmit queue when
an ACK is received. The ACK indicates that the receiver has queued
the RDS datagram, so that the sender can safely forget the datagram.
When all references to the rds_message are quiesced, rds_message_purge
is called to release resources used by the rds_message
If the datagram to be removed had pinned pages set up, add
an entry to the rs->rs_znotify_queue so that the notifcation
will be sent up via rds_rm_zerocopy_callback() when the
rds_message is eventually freed by rds_message_purge.
rds_rm_zerocopy_callback() attempts to batch the number of cookies
sent with each notification to a max of SO_EE_ORIGIN_MAX_ZCOOKIES.
This is achieved by checking the tail skb in the sk_error_queue:
if this has room for one more cookie, the cookie from the
current notification is added; else a new skb is added to the
sk_error_queue. Every invocation of rds_rm_zerocopy_callback() will
trigger a ->sk_error_report to notify the application.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
allow the application to set SO_ZEROCOPY on the underlying sk
of a PF_RDS socket
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing model holds a reference from the rds_sock to the
rds_message, but the rds_message does not itself hold a sock_put()
on the rds_sock. Instead the m_rs field in the rds_message is
assigned when the message is queued on the sock, and nulled when
the message is dequeued from the sock.
We want to be able to notify userspace when the rds_message
is actually freed (from rds_message_purge(), after the refcounts
to the rds_message go to 0). At the time that rds_message_purge()
is called, the message is no longer on the rds_sock retransmit
queue. Thus the explicit reference for the m_rs is needed to
send a notification that will signal to userspace that
it is now safe to free/reuse any pages that may have
been pinned down for zerocopy.
This patch manages the m_rs assignment in the rds_message with
the necessary refcount book-keeping.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS would like to use the helper functions for managing pinned pages
added by Commit a91dbff551 ("sock: ulimit on MSG_ZEROCOPY pages")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring says:
====================
net: sched: act: add extack support
this patch series adds extack support for the TC action subsystem.
As example I for the extack support in a TC action I choosed mirred
action.
- Alex
Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- adapt recommended changes from Davide Caratti, please check if
I catch everything. Thanks.
changes since v2:
- remove newline in extack of generic walker handling
Thanks to Davide Caratti
- add kernel@mojatatu.com in cc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds extack to tcf_action_init and tcf_action_init_1
functions. These are necessary to make individual extack handling in
each act implementation.
Based on work by David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is used by subsequent patches. It fixes code style issues
caught by checkpatch.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tca_action_flush() calls the action walk() and gets an error,
a successful call to nla_nest_start() is not followed by a call to
nla_nest_cancel(). It's harmless, as the skb is freed in the error
path - but it's worth to fix this unbalance.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
net: dsa: mv88e6xxx: Improve PTP access latency
PTP needs to retrieve the hardware timestamps from the switch device
in a low latency manor. However ethtool -S and bridge fdb show can
hold the switch register access mutex for a long time. These patches
changes the reading the statistics and the ATU so that the mutex is
released and taken again between each statistic or ATU entry. The PTP
code can then interleave its access to the hardware, keeping its
latency low.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP code needs low latency access to the PTP hardware timestamps.
Reading all the ATU entries in one go adds a lot of latency to the PTP
code. So take and release the reg_lock mutex for each individual MAC
address in the ATU, allowing the PTP thread jump in between.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP code needs low latency access to the PTP hardware timestamps.
Reading all the statistics in one go adds a lot of latency to the PTP
code. So take and release the reg_lock mutex for each individual
statistics, allowing the PTP thread jump in between.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy says:
====================
tipc: de-generealize topology server
The topology server is partially based on a template that is much
more generic than what we need. This results in a code that is
unnecessarily hard to follow and keeping bug free.
We now take the consequence of the fact that we only have one such
server in TIPC, - with no prospects for introducing any more, and
adapt the code to the specialized task is really is doing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We rename struct tipc_server to struct tipc_topsrv. This reflect its now
specialized role as topology server. Accoringly, we change or add function
prefixes to make it clearer which functionality those belong to.
There are no functional changes in this commit.
Acked-by: Ying.Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move the listener socket to struct tipc_server and give it its own
work item. This makes it easier to follow the code, and entails some
simplifications in the reception code in subscriber sockets.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to narrow the interface and dependencies between the topology
server and the subscription/binding table functionality we move struct
tipc_server inside the file server.c. This requires some code
adaptations in other files, but those are mostly minor.
The most important change is that we have to move the start/stop
functions for the topology server to server.c, where they logically
belong anyway.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we now have removed struct tipc_subscriber from the code, and
only struct tipc_subscription remains, there is no longer need for long
and awkward prefixes to distinguish between their pertaining functions.
We now change all tipc_subscrp_* prefixes to tipc_sub_*. This is
a purely cosmetic change.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the previous changes it becomes logical to collapse the two-level
creation of subscription instances into one. We do that here.
We also rename the creation and deletion functions for more consistency.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the requirement for total distribution transparency, users
send subscriptions and receive topology events in their own host format.
It is up to the topology server to determine this format and do the
correct conversions to and from its own host format when needed.
Until now, this has been handled in a rather non-transparent way inside
the topology server and subscriber code, leading to unnecessary
complexity when creating subscriptions and issuing events.
We now improve this situation by adding two new macros, tipc_sub_read()
and tipc_evt_write(). Both those functions calculate the need for
conversion internally before performing their respective operations.
Hence, all handling of such conversions become transparent to the rest
of the code.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The message transmission and reception in the topology server is more
generic than is currently necessary. By basing the funtionality on the
fact that we only send items of type struct tipc_event and always
receive items of struct tipc_subcr we can make several simplifications,
and also get rid of some unnecessary dynamic memory allocations.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is unnecessary to keep two structures, struct tipc_conn and struct
tipc_subscriber, with a one-to-one relationship and still with different
life cycles. The fact that the two often run in different contexts, and
still may access each other via direct pointers constitutes an additional
hazard, something we have experienced at several occasions, and still
see happening.
We have identified at least two remaining problems that are easier to
fix if we simplify the topology server data structure somewhat.
- When there is a race between a subscription up/down event and a
timeout event, it is fully possible that the former might be delivered
after the latter, leading to confusion for the receiver.
- The function tipc_subcrp_timeout() is executing in interrupt context,
while the following call chain is at least theoretically possible:
tipc_subscrp_timeout()
tipc_subscrp_send_event()
tipc_conn_sendmsg()
conn_put()
tipc_conn_kref_release()
sock_release(sock)
I.e., we end up calling a function that might try to sleep in
interrupt context. To eliminate this, we need to ensure that the
tipc_conn structure and the socket, as well as the subscription
instances, only are deleted in work queue context, i.e., after the
timeout event really has been sent out.
We now remove this unnecessary complexity, by merging data and
functionality of the subscriber structure into struct tipc_conn
and the associated file server.c. We thereafter add a spinlock and
a new 'inactive' state to the subscription structure. Using those,
both problems described above can be easily solved.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interaction between the functionality in server.c and subscr.c is
done via function pointers installed in struct server. This makes
the code harder to follow, and doesn't serve any obvious purpose.
Here, we replace the function pointers with direct function calls.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The socket handling in the topology server is unnecessarily generic.
It is prepared to handle both SOCK_RDM, SOCK_DGRAM and SOCK_STREAM
type sockets, as well as the only socket type which is really used,
SOCK_SEQPACKET.
We now remove this redundant code to make the code more readable.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2018-02-15
Here's the first bluetooth-next pull request targetting the 4.17 kernel
release.
- Fixes & cleanups to Atheros and Marvell drivers
- Support for two new Realtek controllers
- Support for new Intel Bluetooth controller
- Fix for supporting multiple slave-role Bluetooth LE connections
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
eBPF test fails due to verifier failure because log_buf is too small.
Fixed by increasing log_buf size
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove rt_table_id from rtable. It was added for getroute to return the
table id that was hit in the lookup. With the changes for fibmatch the
table id can be extracted from the fib_info returned in the fib_result
so it no longer needs to be in rtable directly.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler says:
====================
tools: tc-testing: Plugin Architecture
To make tdc.py more general, we are introducing a plugin architecture.
This patch set first organizes the command line parameters, then
introduces the plugin architecture and some example plugins.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Run the command under test under valgrind. Produce an extra set of
tap output for the memory check on each test.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the functionality of creating a namespace before the test suite
and destroying it afterwards to a plugin.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the functionality that checks for root permissions into a plugin.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should be a general test architecture, and yet allow specific
tests to be done. Introduce a plugin architecture.
An individual test has 4 stages, setup/execute/verify/teardown. Each
plugin gets a chance to run a function at each stage, plus one call
before all the tests are called ("pre" suite) and one after all the
tests are called ("post" suite). In addition, just before each
command is executed, the plugin gets a chance to modify the command
using the "adjust_command" hook. This makes the test suite quite
flexible.
Future patches will take some functionality out of the tdc.py script and
place it in plugins.
To use the plugins, place the implementation in the plugins directory
and run tdc.py. It will notice the plugins and use them.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the test_runner function into the loop part (test_runner)
and the contents (run_one_test) for maintainability.
It makes it a little easier to catch exceptions
in an individual test, and keep going (and flush a bunch
of tap results for the skipped tests).
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the functionality of the command line parameters into "selection"
parameters, "action" parameters and other parameters.
"Selection" parameters are for choosing which tests on which to act.
"Action" parameters are for choosing what to do with the selected tests.
"Other" parameters are for global effect (like "help" or "verbose").
With this commit, we add the ability to name a directory as another
selection mechanism. We can accumulate a number of tests by directory,
file, category, or even by test id, instead of being constrained to
run all tests in one collection or just one test.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai says:
====================
net: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
Currently, it's not possible to get or check net namespace,
which was used to create tun socket. User may have two tun
devices with the same names in different nets, and there
is no way to differ them each other.
The patchset adds support for ioctl() cmd SIOCGSKNS for tun
devices. It will allow people to obtain net namespace file
descriptor like we allow to do that for sockets in general.
v2: Add new patch [2/3] to export open_related_ns().
====================
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds possibility to get tun device's net namespace fd
in the same way we allow to do that for sockets.
Socket ioctl numbers do not intersect with tun-specific, and there
is already SIOCSIFHWADDR used in tun code. So, SIOCGSKNS number
is choosen instead of custom-made for this functionality.
Note, that open_related_ns() uses plain get_net_ns() and it's safe
(net can't be already dead at this moment):
tun socket is allocated via sk_alloc() with zero last arg (kern = 0).
So, each alive socket increments net::count, and the socket is definitely
alive during ioctl syscall.
Also, common variable net is introduced, so small cleanup in TUNSETIFF
is made.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function will be used to obtain net of tun device.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function will be used to obtain net of tun device.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-02-14
This patch series enables the new mqprio hardware offload mechanism
creating traffic classes on VFs for XL710 devices. The parameters
needed to configure these traffic classes/queue channels are provides
by the user via the tc tool. A maximum of four traffic classes can be
created on each VF. This patch series also enables application of cloud
filters to each of these traffic classes. The cloud filters are applied
using the tc-flower classifier.
Example:
1. tc qdisc add dev vf0 root mqprio num_tc 4 map 0 0 0 0 1 2 2 3\
queues 2@0 2@2 1@4 1@5 hw 1 mode channel
2. tc qdisc add dev vf0 ingress
3. ethtool -K vf0 hw-tc-offload on
4. ip link set eth0 vf 0 spoofchk off
5. tc filter add dev vf0 protocol ip parent ffff: prio 1 flower dst_ip\
192.168.3.5/32 ip_proto udp dst_port 25 skip_sw hw_tc 2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In kcm_attach strp_done is called when sk_user_data is already
set to fail the attach. strp_done needs the strp to be stopped and
warns if it isn't. Call strp_stop in this case to eliminate the
warning message.
Reported-by: syzbot+88dfb55e4c8b770d86e3@syzkaller.appspotmail.com
Fixes: e557124023 ("kcm: Check if sk_user_data already set in kcm_attach"
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add documentation of ti,clk-output-sel which can be used to select
a specific clock for CLK_OUT.
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Daniel Schultz <d.schultz@phytec.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DP83867 has a muxing option for the CLK_OUT pin. It is possible
to set CLK_OUT for different channels.
Create a binding to select a specific clock for CLK_OUT pin.
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Daniel Schultz <d.schultz@phytec.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the default link tolerance set in struct tipc_bearer only
has effect on links going up after that moment. I.e., a user has to
reset all the node's links across that bearer to have the new value
applied. This is too limiting and disturbing on a running cluster to
be useful.
We now change this so that also already existing links are updated
dynamically, without any need for a reset, when the bearer value is
changed. We leverage the already existing per-link functionality
for this to achieve the wanted effect.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy says:
====================
cxgb4: speed up reading on-chip memory
This series of patches speed up reading on-chip memory (EDC and MC)
by reading 64-bits at a time.
Patch 1 reworks logic to read EDC and MC.
Patch 2 adds logic to read EDC and MC 64-bits at a time.
v2:
- Dropped AVX CPU intrinsic instructions.
- Use readq() to read 64-bits at a time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use readq() (via t4_read_reg64()) to read 64-bits at a time.
Read residual in 32-bit multiples.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rework logic to read EDC and MC. Do 32-bit reads at a time.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 uses set_lwt_redirect to set the lwtunnel redirect functions as
needed. Move it to lwtunnel.h as lwtunnel_set_redirect and change
IPv6 to also use it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
PTP support for DSA and mv88e6xxx driver.
This patchset adds support for using the PTP hardware in switches
supported by the mv88e6xxx driver. The code was produces in
collaboration with Brandon Streiff doing the initial implementation,
and then Richard Cochran and Andrew Lunn making further changes and
cleanups.
The code is sufficient to use ptp4l on a single DSA interface, either
as a master or a slave. Due to the use of an MDIO bus to access the
switch, reading hardware timestamps is slower than what ptp4l
expects. Thus it is necessary to use the option
--tx_timestamp_timeout=32. Heavy use of ethtool -S, or bridge fdb show
can also upset ptp4l. Patches to address this will follow.
Further work is requires to support bridges using Boundary Clock or
Transparent Clock mode.
Since the RFC, an overflow bug has been fixed. Brandon Streiff
has also Acked-by: the updates to his initial patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
88E6341 devices default to timestamping at the PHY, but due to a
hardware issue, timestamps via this component are unreliable. For
this family, configure the PTP hardware to force the timestamping
to occur at the MAC.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements RX/TX timestamping support.
The Marvell PTP hardware supports RX timestamping individual message
types, but for simplicity we only support the EVENT receive filter since
few if any clients bother with the more specific filter types.
checkpatch and reverse Christmas tree changes by Andrew Lunn.
Re-factor duplicated code paths and avoid IfOk anti-pattern, use the
common ptp worker thread from the class layer and time stamp UDP/IPv4
frames as well as Layer-2 frame by Richard Cochran.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>