Rather than allocating/freeing a piece of memory every time
we try to figure out how long a CCW chain is, let's use a piece
of memory allocated for each device.
The io_mutex added with commit 4f76617378 ("vfio-ccw: protect
the I/O region") is held for the duration of the VFIO_CCW_EVENT_IO_REQ
event that accesses/uses this space, so there should be no race
concerns with another CPU attempting an (unexpected) SSCH for the
same device.
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190618202352.39702-2-farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.
To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.
Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.
[mkp: clarified commit message]
Cc: Steffen Maier <maier@linux.ibm.com>
Cc: Benjamin Block <bblock@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This allows device drivers (eg. qeth) to use the struct when processing
information retrieved via RCD.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This feature has never been used, so remove it.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
With both the direct-addressed and indirect-addressed CCW paths
simplified to this point, the amount of shared code between them is
(hopefully) more easily visible. Move the processing of IDA-specific
bits into the direct-addressed path, and add some useful commentary of
what the individual pieces are doing. This allows us to remove the
entire ccwchain_fetch_idal() routine and maintain a single function
for any non-TIC CCW.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-10-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This is purely deck furniture, to help understand the merge of the
direct and indirect handlers.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-9-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Now that both CCW codepaths build this nested array:
ccwchain->pfn_array_table[1]->pfn_array[#idaws/#pages]
We can collapse this into simply:
ccwchain->pfn_array[#idaws/#pages]
Let's do that, so that we don't have to continually navigate two
nested arrays when the first array always has a count of one.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-8-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Now that pfn_array_table[] is always an array of 1, it seems silly to
check for the very first entry in an array in the middle of two nested
loops, since we know it'll only ever happen once.
Let's move this outside the loops to simplify things, even though
the "k" variable is still necessary.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-7-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
While processing a channel program, we currently have two nested
arrays that carry a slightly different structure. The direct CCW
path creates this:
ccwchain->pfn_array_table[1]->pfn_array[#pages]
while an IDA CCW creates:
ccwchain->pfn_array_table[#idaws]->pfn_array[1]
The distinction appears to state that each pfn_array_table entry
points to an array of contiguous pages, represented by a pfn_array,
um, array. Since the direct-addressed scenario can ONLY represent
contiguous pages, it makes the intermediate array necessary but
difficult to recognize. Meanwhile, since an IDAL can contain
non-contiguous pages and there is no logic in vfio-ccw to detect
adjacent IDAWs, it is the second array that is necessary but appearing
to be superfluous.
I am not aware of any documentation that states the pfn_array[] needs
to be of contiguous pages; it is just what the code does today.
I don't see any reason for this either, let's just flip the IDA
codepath around so that it generates:
ch_pat->pfn_array_table[1]->pfn_array[#idaws]
This will bring it in line with the direct-addressed codepath,
so that we can understand the behavior of this memory regardless
of what type of CCW is being processed. And it means the casual
observer does not need to know/care whether the pfn_array[]
represents contiguous pages or not.
NB: The existing vfio-ccw code only supports 4K-block Format-2 IDAs,
so that "#pages" == "#idaws" in this area. This means that we will
have difficulty with this overlap in terminology if support for
Format-1 or 2K-block Format-2 IDAs is ever added. I don't think that
this patch changes our ability to make that distinction.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-6-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
It is now pretty apparent that ccwchain_handle_ccw()
(nee ccwchain_handle_tic()) does everything that cp_init()
wants to do.
Let's remove that duplicated code from cp_init() and let
ccwchain_handle_ccw() handle it itself.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-5-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Refactor ccwchain_handle_tic() into a routine that handles a channel
program address (which itself is a CCW pointer), rather than a CCW pointer
that is only a TIC CCW. This will make it easier to reuse this code for
other CCW commands.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-4-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Extract the "does the target of this TIC already exist?" check from
ccwchain_handle_tic(), so that it's easier to refactor that function
into one that cp_init() is able to use.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-3-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The routine cp_free() does nothing but call cp_unpin_free(), and while
most places call cp_free() there is one caller of cp_unpin_free() used
when the cp is guaranteed to have not been marked initialized.
This seems like a dubious way to make a distinction, so let's combine
these routines and make cp_free() do all the work.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-2-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The hypervisor needs to interact with the summary indicators, so these
need to be DMA memory as well (at least for protected virtualization
guests).
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Before virtio-ccw could get away with not using DMA API for the pieces of
memory it does ccw I/O with. With protected virtualization this has to
change, since the hypervisor needs to read and sometimes also write these
pieces of memory.
The hypervisor is supposed to poke the classic notifiers, if these are
used, out of band with regards to ccw I/O. So these need to be allocated
as DMA memory (which is shared memory for protected virtualization
guests).
Let us factor out everything from struct virtio_ccw_device that needs to
be DMA memory in a satellite that is allocated as such.
Note: The control blocks of I/O instructions do not need to be shared.
These are marshalled by the ultravisor.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This will come in handy soon when we pull out the indicators from
virtio_ccw_device to a memory area that is shared with the hypervisor
(in particular for protected virtualization guests).
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The flag AIRQ_IV_CACHELINE was recently added to airq_iv_create(). Let
us use it! We actually wanted the vector to span a cacheline all along.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Protected virtualization guests have to use shared pages for airq
notifier bit vectors, because the hypervisor needs to write these bits.
Let us make sure we allocate DMA memory for the notifier bit vectors by
replacing the kmem_cache with a dma_cache and kalloc() with
cio_dma_zalloc().
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
As virtio-ccw devices are channel devices, we need to use the
dma area within the common I/O layer for any communication with
the hypervisor.
Note that we do not need to use that area for control blocks
directly referenced by instructions, e.g. the orb.
It handles neither QDIO in the common code, nor any device type specific
stuff (like channel programs constructed by the DASD driver).
An interesting side effect is that virtio structures are now going to
get allocated in 31 bit addressable storage.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
To support protected virtualization cio will need to make sure the
memory used for communication with the hypervisor is DMA memory.
Let us introduce one global pool for cio.
Our DMA pools are implemented as a gen_pool backed with DMA pages. The
idea is to avoid each allocation effectively wasting a page, as we
typically allocate much less than PAGE_SIZE.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
systemd-modules-load.service automatically tries to load the pkey module
on systems that have MSA.
Pkey also requires the MSA3 facility and a bunch of subfunctions.
Failing with -EOPNOTSUPP makes "systemd-modules-load.service" fail on
any system that does not have all needed subfunctions. For example,
when running under QEMU TCG (but also on systems where protected keys
are disabled via the HMC).
Let's use -ENODEV, so systemd-modules-load.service properly ignores
failing to load the pkey module because of missing HW functionality.
While at it, also convert the -EOPNOTSUPP in pkey_clr2protkey() to -ENODEV.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
We statically allocate 8 cmd buffers on the read channel, when the only
IO left that's still using them is the long-running READ.
Replace this with a single allocated cmd, that gets restarted whenever
the READ completed.
This introduces refcounting for allocated cmds, so that the READ cmd can
survive the IO completion.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current IDX sequence first sends one WRITE cmd to activate the
device, and then sends a second cmd that READs the response.
Using qeth_alloc_cmd(), we can combine this into a single IO with two
command-chained CCWs.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RCD code is the last remaining IO path that doesn't use the
qeth_send_control_data() infrastructure. Doing so allows us to remove
all sorts of custom state machinery and logic in the IRQ handler.
Instead of introducing statically allocated cmd buffers for this single
IO on the data channel, use the new qeth_alloc_cmd() helper.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth currently uses a fixed set of statically allocated cmd buffers for
the read and write IO channels. This (1) doesn't play well with the single
RCD cmd we need to issue on the data channel, (2) doesn't provide the
necessary flexibility for certain IDX improvements, and (3) is also rather
wasteful since the buffers are idle most of the time.
Add a new type of cmd buffer that is dynamically allocated, and keeps
its ccw chain in the DMA data area. Since this touches most callers of
qeth_setup_ccw(), also add a new CCW flags parameter for future usage.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each cmd buffer maintains a pointer to the IO channel that it was/will
be issued on. So when dealing with cmd buffers, we don't need to pass
around a separate channel pointer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vast majority of SETUP-classified trace entries can be moved to
their device-specific trace file. This reduces pollution of the global
SETUP file, and provides a consistent trace view of all activity on the
device.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
OSN currently provides a custom code path to submit IPA cmds, without
waiting for the cmd response. Replace it with qeth_send_ipa_cmd(), which
uses the common qeth_send_control_data() IO infrastructure.
By setting a custom iob->callback, we can now provide feedback to the
caller about whether the cmd has been successfully submitted to HW.
Since the callback then immediately wakes up the reply-waiter object, we
maintain the old behaviour of returning early without waiting for the
response.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The basic MPC initialization sequence is strictly sequential, and
waiting for an available cmd buffer should never be necessary.
So this change only affects the OSN path, where dangling waiters on an
unbounded wait_event() are not desirable. Switch to qeth_get_buffers(),
and let OSN callers deal with -ENOMEM.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When called from qeth_core_probe_device(), qeth_determine_capabilities()
initializes the device's BLKT defaults. From all other callers, the
ccw_device has already been set online and the BLKT setting is skipped.
Clean this up by extracting the BLKT setting into a separate helper that
gets called from the right place.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The completion of a pending READ cmd is processed via
qeth_issue_next_read_cb(). Let this callback also start the next READ
cmd, instead of hardcoding that step into the IRQ handler.
While at it remove the check of the channel state,
__qeth_issue_next_read() already does this.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the tear down sequence in qeth_l?_stop_card() has finished, the
card is guaranteed to be in DOWN state and we don't have to check for
it again.
With this insight we can also remove the redundant setting of
card->state in qeth_l?_set_online()'s error path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Slightly reduce the complexity of the core xmit path, by replacing some
open-coded logic with the corresponding helpers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code suppresses debug entries when an TX buffer completes in
ERROR state with no error indication set in SBALF15.
This was introduced back with
commit 58490f1807 ("qeth: HiperSockets SIGA retry support on CC=2.").
But qeth no longer retries after CC=2, and this sort of suppression
make no sense anymore. Remove it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert all text files with s390 documentation to ReST format.
Tried to preserve as much as possible the original document
format. Still, some of the files required some work in order
for it to be visible on both plain text and after converted
to html.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Pull networking fixes from David Miller:
1) Free AF_PACKET po->rollover properly, from Willem de Bruijn.
2) Read SFP eeprom in max 16 byte increments to avoid problems with
some SFP modules, from Russell King.
3) Fix UDP socket lookup wrt. VRF, from Tim Beale.
4) Handle route invalidation properly in s390 qeth driver, from Julian
Wiedmann.
5) Memory leak on unload in RDS, from Zhu Yanjun.
6) sctp_process_init leak, from Neil HOrman.
7) Fix fib_rules rule insertion semantic change that broke Android,
from Hangbin Liu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
pktgen: do not sleep with the thread lock held.
net: mvpp2: Use strscpy to handle stat strings
net: rds: fix memory leak in rds_ib_flush_mr_pool
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
net: aquantia: fix wol configuration not applied sometimes
ethtool: fix potential userspace buffer overflow
Fix memory leak in sctp_process_init
net: rds: fix memory leak when unload rds_rdma
ipv6: fix the check before getting the cookie in rt6_get_cookie
ipv4: not do cache for local delivery if bc_forwarding is enabled
s390/qeth: handle error when updating TX queue count
s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
s390/qeth: check dst entry before use
s390/qeth: handle limited IPv4 broadcast in L3 TX path
net: fix indirect calls helpers for ptype list hooks.
net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
udp: only choose unbound UDP socket for multicast when not in a VRF
net/tls: replace the sleeping lock around RX resync with a bit lock
...
When a CQ-enabled device uses QEBSM for SBAL state inspection,
get_buf_states() can return the PENDING state for an Output Queue.
get_outbound_buffer_frontier() isn't prepared for this, and any PENDING
buffer will permanently stall all further completion processing on this
Queue.
This isn't a concern for non-QEBSM devices, as get_buf_states() for such
devices will manually turn PENDING buffers into EMPTY ones.
Fixes: 104ea556ee ("qdio: support asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add missing parameter description to fix the following warning:
drivers/s390/cio/qdio_thinint.c:183: warning:
Function parameter or member 'floating' not described in 'tiqdio_thinint_handler'
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Within an EP11 cprb there exists a byte field flags. Bit 0x20
of this field indicates a special cprb. A special cprb triggers
special handling in the firmware below the OS layer.
However, a special cprb also needs to have the S bit in GPR0
set when NQAP is called. This was not the case for EP11 cprbs
and this patch now introduces the code to support this.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
netif_set_real_num_tx_queues() can return an error, deal with it.
Fixes: 73dc2daf11 ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabling sysfs attribute bridge_hostnotify triggers a series of udev events
for the MAC addresses of all currently connected peers. In case no VLAN is
set for a peer, the device reports the corresponding MAC addresses with
VLAN ID 4096. This currently results in attribute VLAN=4096 for all
non-VLAN interfaces in the initial series of events after host-notify is
enabled.
Instead, no VLAN attribute should be reported in the udev event for
non-VLAN interfaces.
Only the initial events face this issue. For dynamic changes that are
reported later, the device uses a validity flag.
This also changes the code so that it now sets the VLAN attribute for
MAC addresses with VID 0. On Linux, no qeth interface will ever be
registered with VID 0: Linux kernel registers VID 0 on all network
interfaces initially, but qeth will drop .ndo_vlan_rx_add_vid for VID 0.
Peers with other OSs could register MACs with VID 0.
Fixes: 9f48b9db9a ("qeth: bridgeport support - address notifications")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While qeth_l3 uses netif_keep_dst() to hold onto the dst, a skb's dst
may still have been obsoleted (via dst_dev_put()) by the time that we
end up using it. The dst then points to the loopback interface, which
means the neighbour lookup in qeth_l3_get_cast_type() determines a bogus
cast type of RTN_BROADCAST.
For IQD interfaces this causes us to place such skbs on the wrong
HW queue, resulting in TX errors.
Fix-up the various call sites to first validate the dst entry with
dst_check(), and fall back accordingly.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When selecting the cast type of a neighbourless IPv4 skb (eg. on a raw
socket), qeth_l3 falls back to the packet's destination IP address.
For this case we should classify traffic sent to 255.255.255.255 as
broadcast.
This fixes DHCP requests, which were misclassified as unicast
(and for IQD interfaces thus ended up on the wrong HW queue).
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Formatting of Kconfig files doesn't look so pretty, so just
take damp cloth and clean it up.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If the CCW being processed is a No-Operation, then by definition no
data is being transferred. Let's fold those checks into the normal
CCW processors, rather than skipping out early.
Likewise, if the CCW being processed is a "test" (a category defined
here as an opcode that contains zero in the lowest four bits) then no
special processing is necessary as far as vfio-ccw is concerned.
These command codes have not been valid since the S/370 days, meaning
they are invalid in the same way as one that ends in an eight [1] or
an otherwise valid command code that is undefined for the device type
in question. Considering that, let's just process "test" CCWs like
any other CCW, and send everything to the hardware.
[1] POPS states that a x08 is a TIC CCW, and that having any high-order
bits enabled is invalid for format-1 CCWs. For format-0 CCWs, the
high-order bits are ignored.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190516161403.79053-4-farman@linux.ibm.com>
Acked-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
It is possible that a guest might issue a CCW with a length of zero,
and will expect a particular response. Consider this chain:
Address Format-1 CCW
-------- -----------------
0 33110EC0 346022CC 33177468
1 33110EC8 CF200000 3318300C
CCW[0] moves a little more than two pages, but also has the
Suppress Length Indication (SLI) bit set to handle the expectation
that considerably less data will be moved. CCW[1] also has the SLI
bit set, and has a length of zero. Once vfio-ccw does its magic,
the kernel issues a start subchannel on behalf of the guest with this:
Address Format-1 CCW
-------- -----------------
0 021EDED0 346422CC 021F0000
1 021EDED8 CF240000 3318300C
Both CCWs were converted to an IDAL and have the corresponding flags
set (which is by design), but only the address of the first data
address is converted to something the host is aware of. The second
CCW still has the address used by the guest, which happens to be (A)
(probably) an invalid address for the host, and (B) an invalid IDAW
address (doubleword boundary, etc.).
While the I/O fails, it doesn't fail correctly. In this example, we
would receive a program check for an invalid IDAW address, instead of
a unit check for an invalid command.
To fix this, revert commit 4cebc5d6a6 ("vfio: ccw: validate the
count field of a ccw before pinning") and allow the individual fetch
routines to process them like anything else. We'll make a slight
adjustment to our allocation of the pfn_array (for direct CCWs) or
IDAL (for IDAL CCWs) memory, so that we have room for at least one
address even though no guest memory will be pinned and thus the
IDAW will not be populated with a host address.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190516161403.79053-3-farman@linux.ibm.com>
Acked-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The skip flag of a CCW offers the possibility of data not being
transferred, but is only meaningful for certain commands.
Specifically, it is only applicable for a read, read backward, sense,
or sense ID CCW and will be ignored for any other command code
(SA22-7832-11 page 15-64, and figure 15-30 on page 15-75).
(A sense ID is xE4, while a sense is x04 with possible modifiers in the
upper four bits. So we will cover the whole "family" of sense CCWs.)
For those scenarios, since there is no requirement for the target
address to be valid, we should skip the call to vfio_pin_pages() and
rely on the IDAL address we have allocated/built for the channel
program. The fact that the individual IDAWs within the IDAL are
invalid is fine, since they aren't actually checked in these cases.
Set pa_nr to zero when skipping the pfn_array_pin() call, since it is
defined as the number of pages pinned and is used to determine
whether to call vfio_unpin_pages() upon cleanup.
The pfn_array_pin() routine returns the number of pages that were
pinned, but now might be skipped for some CCWs. Thus we need to
calculate the expected number of pages ourselves such that we are
guaranteed to allocate a reasonable number of IDAWs, which will
provide a valid address in CCW.CDA regardless of whether the IDAWs
are filled in with pinned/translated addresses or not.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190516161403.79053-2-farman@linux.ibm.com>
Acked-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Let's initialize the host address to something that is invalid,
rather than letting it default to zero. This just makes it easier
to notice when a pin operation has failed or been skipped.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190514234248.36203-5-farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The pfn_array_alloc_pin routine is doing too much. Today, it does the
alloc of the pfn_array struct and its member arrays, builds the iova
address lists out of a contiguous piece of guest memory, and asks vfio
to pin the resulting pages.
Let's effectively revert a significant portion of commit 5c1cfb1c39
("vfio: ccw: refactor and improve pfn_array_alloc_pin()") such that we
break pfn_array_alloc_pin() into its component pieces, and have one
routine that allocates/populates the pfn_array structs, and another
that actually pins the memory. In the future, we will be able to
handle scenarios where pinning memory isn't actually appropriate.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190514234248.36203-4-farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Otherwise, the guest can believe it's okay to start another I/O
and bump into the non-idle state. This results in a cc=2 (with
the asynchronous CSCH/HSCH code) returned to the guest, which is
unfortunate since everything is otherwise working normally.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20190514234248.36203-3-farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Per the POPs [1], when processing an interrupt the SCSW.CPA field of an
IRB generally points to 8 bytes after the last CCW that was executed
(there are exceptions, but this is the most common behavior).
In the case of an error, this points us to the first un-executed CCW
in the chain. But in the case of normal I/O, the address points beyond
the end of the chain. While the guest generally only cares about this
when possibly restarting a channel program after error recovery, we
should convert the address even in the good scenario so that we provide
a consistent, valid, response upon I/O completion.
[1] Figure 16-6 in SA22-7832-11. The footnotes in that table also state
that this is true even if the resulting address is invalid or protected,
but moving to the end of the guest chain should not be a surprise.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190514234248.36203-2-farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Six minor fixes to device drivers and one to the multipath alua
handler. The most extensive fix is the zfcp port remove prevention
one, but it's impact is only s390.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXPJlASYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSElAP9A/qo+
mLomrWOPMGqkQKOX7Mp3sP5lMllbvPndrR+9KQD/eDzFp3HCyUO2OFd6aR7/X90H
IZ+cd/wJCSHenKMJOX8=
=NFnR
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six minor fixes to device drivers and one to the multipath alua
handler.
The most extensive fix is the zfcp port remove prevention one, but
it's impact is only s390"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: delete sas port if expander discover failed
scsi: libsas: only clear phy->in_shutdown after shutdown event done
scsi: scsi_dh_alua: Fix possible null-ptr-deref
scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
- Farewell Martin Schwidefsky: add Martin to CREDITS and remove him
from MAINTAINERS
- Vasily Gorbik and Christian Borntraeger join as maintainers for s390
- Fix locking bug in ctr(aes) and ctr(des) s390 specific ciphers
- A rather large patch which fixes gcm-aes-s390 scatter gather handling
- Fix zcrypt wrong dispatching for control domain CPRBs
- Fix assignment of bus resources in PCI code
- Fix structure definition for set PCI function
- Fix one compile error and one compile warning seen when
CONFIG_OPTIMIZE_INLINING is enabled
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJc8QIzAAoJECIOw3kbKW7C8+cP/iEuFF/YFKq896Zmd50wV1fL
0ASkqKfrwWNzQz+Y9c7WuIoGNghptr4zPPANkaRLkUCIZ2SsmvSLrggtAD0Ls4ut
vAoCXLH10rDdGWmiXHuDHOsmeQ/1RdbqW6ZfZEDhLNY7vYtCFfpOyAN0QEhGa7/F
NcAemkD9Q3uerATCr37mVcK3GrzzhcbGd9mVN5uqdq0ZLRDrI4K+JxWVisdBvzve
bnWCwUsgDYOhc1C1pMDD8IsWd+F3a7V+caDNWFhMgbRCPA9adNzf9fYuEH5ftI7U
W7bS45ZR5BX3pAHtOIg6s/l2W+cu3vGKuCYIutA2JpRqGM8IASEoUJwMZ54Fu/nF
Eh+zcfizxwREX7VjXKHd9oXZ41cocYdBIt8kCMe+gbj9zKzD716wzhiBVh3WCG6v
uBEx0nHMkHlKaTaNY3oUU5HP6t+zARw+uApkpA9EdNiM1pV6T/n9ySTnJHrU0NNo
3nYlCHzm1W9RLknbp+vmc9SbbEzWhhUpCDUi/5Ny8YsFwsddMixWMmg9DI8zJpJB
Qr6OSD05dThPPH7bWNcBbbt/NU0p/BaxgbVYewcHM/cln1tw2lDuOY0jjiIB3SV/
twhMoFx7fEZCpSsP8t27f33NA3UTEpHhr2KSNZZ2w2DGu6QE/QawgUHDT+VlFs5x
WiUkvArmRqlWL3Aaz6W8
=e7cv
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Farewell Martin Schwidefsky: add Martin to CREDITS and remove him
from MAINTAINERS
- Vasily Gorbik and Christian Borntraeger join as maintainers for s390
- Fix locking bug in ctr(aes) and ctr(des) s390 specific ciphers
- A rather large patch which fixes gcm-aes-s390 scatter gather handling
- Fix zcrypt wrong dispatching for control domain CPRBs
- Fix assignment of bus resources in PCI code
- Fix structure definition for set PCI function
- Fix one compile error and one compile warning seen when
CONFIG_OPTIMIZE_INLINING is enabled
* tag 's390-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
MAINTAINERS: add Vasily Gorbik and Christian Borntraeger for s390
MAINTAINERS: Farewell Martin Schwidefsky
s390/crypto: fix possible sleep during spinlock aquired
s390/crypto: fix gcm-aes-s390 selftest failures
s390/zcrypt: Fix wrong dispatching for control domain CPRBs
s390/pci: fix assignment of bus resources
s390/pci: fix struct definition for set PCI function
s390: mark __cpacf_check_opcode() and cpacf_query_func() as __always_inline
s390: add unreachable() to dump_fault_info() to fix -Wmaybe-uninitialized
When the user tries to remove a zfcp port via sysfs, we only rejected it if
there are zfcp unit children under the port. With purely automatically
scanned LUNs there are no zfcp units but only SCSI devices. In such cases,
the port_remove erroneously continued. We close the port and this
implicitly closes all LUNs under the port. The SCSI devices survive with
their private zfcp_scsi_dev still holding a reference to the "removed"
zfcp_port (still allocated but invisible in sysfs) [zfcp_get_port_by_wwpn
in zfcp_scsi_slave_alloc]. This is not a problem as long as the fc_rport
stays blocked. Once (auto) port scan brings back the removed port, we
unblock its fc_rport again by design. However, there is no mechanism that
would recover (open) the LUNs under the port (no "ersfs_3" without
zfcp_unit [zfcp_erp_strategy_followup_success]). Any pending or new I/O to
such LUN leads to repeated:
Done: NEEDS_RETRY Result: hostbyte=DID_IMM_RETRY driverbyte=DRIVER_OK
See also v4.10 commit 6f2ce1c6af ("scsi: zfcp: fix rport unblock race
with LUN recovery"). Even a manual LUN recovery
(echo 0 > /sys/bus/scsi/devices/H:C:T:L/zfcp_failed)
does not help, as the LUN links to the old "removed" port which remains
to lack ZFCP_STATUS_COMMON_RUNNING [zfcp_erp_required_act].
The only workaround is to first ensure that the fc_rport is blocked
(e.g. port_remove again in case it was re-discovered by (auto) port scan),
then delete the SCSI devices, and finally re-discover by (auto) port scan.
The port scan includes an fc_rport unblock, which in turn triggers
a new scan on the scsi target to freshly get new pure auto scan LUNs.
Fix this by rejecting port_remove also if there are SCSI devices
(even without any zfcp_unit) under this port. Re-use mechanics from v3.7
commit d99b601b63 ("[SCSI] zfcp: restore refcount check on port_remove").
However, we have to give up zfcp_sysfs_port_units_mutex earlier in unit_add
to prevent a deadlock with scsi_host scan taking shost->scan_mutex first
and then zfcp_sysfs_port_units_mutex now in our zfcp_scsi_slave_alloc().
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: b62a8d9b45 ("[SCSI] zfcp: Use SCSI device data zfcp scsi dev instead of zfcp unit")
Fixes: f8210e3488 ("[SCSI] zfcp: Allow midlayer to scan for LUNs when running in NPIV mode")
Cc: <stable@vger.kernel.org> #2.6.37+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With this early return due to zfcp_unit child(ren), we don't use the
zfcp_port reference from the earlier zfcp_get_port_by_wwpn() anymore and
need to put it.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: d99b601b63 ("[SCSI] zfcp: restore refcount check on port_remove")
Cc: <stable@vger.kernel.org> #3.7+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The zcrypt device driver does not handle CPRBs which address
a control domain correctly. This fix introduces a workaround:
The domain field of the request CPRB is checked if there is
a valid domain value in there. If this is true and the value
is a control only domain (a domain which is enabled in the
crypto config ADM mask but disabled in the AQM mask) the
CPRB is forwarded to the default usage domain. If there is
no default domain, the request is rejected with an ENODEV.
This fix is important for maintaining crypto adapters. For
example one LPAR can use a crypto adapter domain ('Control
and Usage') but another LPAR needs to be able to maintain
this adapter domain ('Control'). Scenarios like this did
not work properly and the patch enables this.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Pankaj reports that starting with commit ad428cdb52 "dax: Check the
end of the block-device capacity with dax_direct_access()" device-mapper
no longer allows dax operation. This results from the stricter checks in
__bdev_dax_supported() that validate that the start and end of a
block-device map to the same 'pagemap' instance.
Teach the dax-core and device-mapper to validate the 'pagemap' on a
per-target basis. This is accomplished by refactoring the
bdev_dax_supported() internals into generic_fsdax_supported() which
takes a sector range to validate. Consequently generic_fsdax_supported()
is suitable to be used in a device-mapper ->iterate_devices() callback.
A new ->dax_supported() operation is added to allow composite devices to
split and route upper-level bdev_dax_supported() requests.
Fixes: ad428cdb52 ("dax: Check the end of the block-device...")
Cc: <stable@vger.kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
- Enhancements for the QDIO layer
- Remove the RCP trace event
- Avoid three build issues
- Move the defconfig to the configs directory
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc3lk4AAoJEDjwexyKj9rgPXQH/06jspiNAiz2aZBvAxcfUku+
b1yDNg2EivWYSbUtTPqvI3H5skANvc3N4Dnv6mYUcO/lpIWo6MdXW9lepxEP1yNb
AdvVY/c1yMZjMdi2WYrhC8+mLtoFmgjIPSIBcmSsZ2j+5IwI0bB5UcOjzKlujhAY
E1QGdlVB5hxCI7RZnCGdggqkjAPsVKYNl1zVgX2GO8UwprZbt1IzuWK4NgdmE/bh
8SV1AGOjp/BGXNY+uj06b7DI5K19wwlv9yPdO+cv22HPqLvEvZU9xyIjBYTQy+/0
LmtcZ91DLS7fWM45XE/2T1UPUDTophGsI1jVBLGlTOBYjYSMD2ijYN8MJd0+IlY=
=/28O
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
- Enhancements for the QDIO layer
- Remove the RCP trace event
- Avoid three build issues
- Move the defconfig to the configs directory
* tag 's390-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: move arch/s390/defconfig to arch/s390/configs/defconfig
s390/qdio: optimize state inspection of HW-owned SBALs
s390/qdio: use get_buf_state() in debug_get_buf_state()
s390/qdio: allow to scan all Output SBALs in one go
s390/cio: Remove tracing for rchp instruction
s390/kasan: adapt disabled_wait usage to avoid build error
latent_entropy: avoid build error when plugin cflags are not set
s390/boot: fix compiler error due to missing awk strtonum
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlzd7PYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpggWD/46Hmn6FuiXQ30HTJd9WKtJzenAAIdUpjq8
+U985q7vvcqIUotMcG9VUOlCaxk79D5XbptInzLo5CRSn9vMv0sXmAHIFkoj201K
gW3sHqajnWFFj60Eq5IVdHBZekvD8+bBZMvnX+S53QHOfwY+D1Nx/CtjkxNeq+48
98kMA/Q1d87Ied6oMW6Nyc7UEN3SanTnntYRIeSrXOJPiwxVWT6SsPUC01VZcwrt
NSt6IVoW2vFgU0sg8VetzCSfJyTzI0YytjTj/WKGQzuBiKFAvChWrrYZiZ/Z4587
6W4SFR94nYkW5U1BKgrMp64KUEn20m+jk0IHRYApsFwutSBHJCeB9m2sddxur/GQ
G/IyXZxv5jKFNBhUEiSedfml9OF+nBbwJGJCKF64Wnybk/gqFgxM1gzyw4fMAXr+
qYQdETv02W0rDqUG9i3/CaXlN4Lf1IvLR8al4ao0LfDJ0TSXw+UviNsuHEHAv8ey
sioREF8JacSj1q42TsRGckn3k4HVmaGyFwI3ceLT5bRq8VAhJ+cp7WqML1lUEmY0
2iIz+PKPDSyigqrh1wvo8ZqhqHifo+0TbRkCOCi5j+PRX6GiYlrvShGevZXEZPqC
lOFNDgCH3VBTvrcx3j05jJK1qvL4QWAwb/rDUsHZVbsnSVTEHxs/3BsIFQNZpE9/
AoXCH/ye0Q==
=ZKv1
-----END PGP SIGNATURE-----
Merge tag 'for-5.2/block-post-20190516' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"This is mainly some late lightnvm changes that came in just before the
merge window, as well as fixes that have been queued up since the
initial pull request was frozen.
This contains:
- lightnvm changes, fixing race conditions, improving memory
utilization, and improving pblk compatability (Chansol, Igor,
Marcin)
- NVMe pull request with minor fixes all over the map (via Christoph)
- remove redundant error print in sata_rcar (Geert)
- struct_size() cleanup (Jackie)
- dasd CONFIG_LBADF warning fix (Ming)
- brd cond_resched() improvement (Mikulas)"
* tag 'for-5.2/block-post-20190516' of git://git.kernel.dk/linux-block: (41 commits)
block/bio-integrity: use struct_size() in kmalloc()
nvme: validate cntlid during controller initialisation
nvme: change locking for the per-subsystem controller list
nvme: trace all async notice events
nvme: fix typos in nvme status code values
nvme-fabrics: remove unused argument
nvme-multipath: avoid crash on invalid subsystem cntlid enumeration
nvme-fc: use separate work queue to avoid warning
nvme-rdma: remove redundant reference between ib_device and tagset
nvme-pci: mark expected switch fall-through
nvme-pci: add known admin effects to augument admin effects log page
nvme-pci: init shadow doorbell after each reset
brd: add cond_resched to brd_free_pages
sata_rcar: Remove ata_host_alloc() error printing
s390/dasd: fix build warning in dasd_eckd_build_cp_raw
lightnvm: pblk: use nvm_rq_to_ppa_list()
lightnvm: pblk: simplify partial read path
lightnvm: do not remove instance under global lock
lightnvm: track inflight target creations
lightnvm: pblk: recover only written metadata
...
s390 has packed ring support.
several fixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJc2y5qAAoJECgfDbjSjVRpfqUIAJfKKzwNm3YQ8zAQuI1dR5FN
xCTO13R+20rFPiDYCmhJVc+zodHlzbdvu+DqqithNJ7ZnwovDkY3YTq6hm8pVtLW
vpVuXVHap1nE8Hztw9/kTDrr4iKs1rV/tlMs57dSvdOBnovoT8VqhQ0qemLY/lI8
CIhOrykO/BYmv2tC4cRUMR5QBpOrm1NyotkWqCrL7Y+3WW21pB0kJp01umLzeGjb
9zhab1VaMxH6m1wQPoYumzduTRdaNJBzHJYnLh7KR+6DTNEgjhn7Kz6ijQbyDOmv
+X+7pe7M8yJMelc/CEjyqbdt0JxEZ6tpgfGvtfzL2BMkAs9Byqonc4mRd/j3Unk=
=StAI
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- enable packed ring support for s390
- several fixes
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio/s390: enable packed ring
virtio/s390: DMA support for virtio-ccw
virtio/s390: use vring_create_virtqueue
virtio/virtio_ring: do some comment fixes
vhost-scsi: remove incorrect memory barrier
tools/virtio/ringtest: Remove bogus definition of BUG_ON()
virtio_ring: Fix potential mem leak in virtqueue_add_indirect_packed
Nothing precludes to accepting VIRTIO_F_RING_PACKED any more.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Currently virtio-ccw devices do not work if the device has
VIRTIO_F_IOMMU_PLATFORM. In future we do want to support DMA API with
virtio-ccw.
Let us do the plumbing, so the feature VIRTIO_F_IOMMU_PLATFORM works
with virtio-ccw.
Let us also switch from legacy avail/used accessors to the DMA aware
ones (even if it isn't strictly necessary), and remove the legacy
accessors (we were the last users).
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The commit 2a2d1382fe ("virtio: Add improved queue allocation API")
establishes a new way of allocating virtqueues (as a part of the effort
that taught DMA to virtio rings).
In the future we will want virtio-ccw to use the DMA API as well.
Let us switch from the legacy method of allocating virtqueues to
vring_create_virtqueue() as the first step into that direction.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
When get_buf_states() gets called with count > 1, it scans the
corresponding number of SBAL states until it encounters a mismatch.
But when these SBALs are in a HW-owned state, the callers don't actually
care _how many_ such SBALs are on the queue. If we can't process the
first SBAL, we can't process any of the following SBALs either. So when
the first SBAL is HW-owned, skip the scan of the remaining SBALs and
thus save some CPU time.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For a 1-SBAL state inspection, use the corresponding helper.
No functional change, just reducing the number of immediate callers to
get_buf_states().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Old code restricted the number of inspected SBALs to
QDIO_MAX_BUFFERS_PER_Q - 1, as otherwise the first_to_check and
first_to_kick cursors could overlap. Subsequent code would then assume that
there was no progress on the queue, when in fact _all_ SBALs on the queue
were ready-to-process.
This limitation no longer applies, so allow the queue-scan code to inspect
all SBALs on the queue. Note that qeth requires an additional patch
("s390/qeth: stop/wake TX queues based on their fill level"), to avoid
potential queue stalls when all 128 SBALs are reported as ready-to-process.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since commit d485235b00 "s390: assume diag308 set always works",
the kernel does not use the rchp instruction anymore. So let's
remove the tracing for it.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull networking updates from David Miller:
"Highlights:
1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.
2) Add fib_sync_mem to control the amount of dirty memory we allow to
queue up between synchronize RCU calls, from David Ahern.
3) Make flow classifier more lockless, from Vlad Buslov.
4) Add PHY downshift support to aquantia driver, from Heiner
Kallweit.
5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
contention on SLAB spinlocks in heavy RPC workloads.
6) Partial GSO offload support in XFRM, from Boris Pismenny.
7) Add fast link down support to ethtool, from Heiner Kallweit.
8) Use siphash for IP ID generator, from Eric Dumazet.
9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
entries, from David Ahern.
10) Move skb->xmit_more into a per-cpu variable, from Florian
Westphal.
11) Improve eBPF verifier speed and increase maximum program size,
from Alexei Starovoitov.
12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
spinlocks. From Neil Brown.
13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.
14) Improve link partner cap detection in generic PHY code, from
Heiner Kallweit.
15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
Maguire.
16) Remove SKB list implementation assumptions in SCTP, your's truly.
17) Various cleanups, optimizations, and simplifications in r8169
driver. From Heiner Kallweit.
18) Add memory accounting on TX and RX path of SCTP, from Xin Long.
19) Switch PHY drivers over to use dynamic featue detection, from
Heiner Kallweit.
20) Support flow steering without masking in dpaa2-eth, from Ioana
Ciocoi.
21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
Pirko.
22) Increase the strict parsing of current and future netlink
attributes, also export such policies to userspace. From Johannes
Berg.
23) Allow DSA tag drivers to be modular, from Andrew Lunn.
24) Remove legacy DSA probing support, also from Andrew Lunn.
25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
Haabendal.
26) Add a generic tracepoint for TX queue timeouts to ease debugging,
from Cong Wang.
27) More indirect call optimizations, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
cxgb4: Fix error path in cxgb4_init_module
net: phy: improve pause mode reporting in phy_print_status
dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
net: macb: Change interrupt and napi enable order in open
net: ll_temac: Improve error message on error IRQ
net/sched: remove block pointer from common offload structure
net: ethernet: support of_get_mac_address new ERR_PTR error
net: usb: smsc: fix warning reported by kbuild test robot
staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
net: dsa: support of_get_mac_address new ERR_PTR error
net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
vrf: sit mtu should not be updated when vrf netdev is the link
net: dsa: Fix error cleanup path in dsa_init_module
l2tp: Fix possible NULL pointer dereference
taprio: add null check on sched_nest to avoid potential null pointer dereference
net: mvpp2: cls: fix less than zero check on a u32 variable
net_sched: sch_fq: handle non connected flows
net_sched: sch_fq: do not assume EDT packets are ordered
net: hns3: use devm_kcalloc when allocating desc_cb
net: hns3: some cleanup for struct hns3_enet_ring
...
https://lore.kernel.org/linux-fsdevel/CAHk-=wg1tFzcaX2v9Z91vPJiBR486ddW5MtgDL02-fOen2F0Aw@mail.gmail.com/T/#m5b2d9ad3aeacea4bd6aa1964468ac074bf3aa5bf
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEECVWwJCUO7/z+QjZbZsp4hBP2dUkFAlzR1UgQHGtpcnJAbmV4
ZWRpLmNvbQAKCRBmyniEE/Z1SZBiEACGw1LzUmjV9eBYFjqaUkgX/Zfcu42D4Ek2
8MuWnNdRabtpGQq0LccYlfoL3yH5xECp14IkCgJvkjqoZ3CcqWcv6uDxf0WtnUqZ
wPx1RYZykb4RZj2A6/ndhInReP4AlXICyTVulKb+BquVkemMvmXX8k+bkr/msKfT
9jdKWFIn+ANNABt3y2D7ywZvs9mkxIx+Fti+tVV4BFBeGfUuj4ArZBOHnngRnIk/
XYlQ7FVzENSPSB+3GvL34jTGEzo8suPHKhHQlIhtcd5hwzVRZKE2sdVXsCc6/WbY
YnT32gmT1/+cUuDl1mZSiQY5R4Xkb07k6/jNrdmjQpwmWbZu90cuRhb+JBXwnmjZ
2Wgy3sfwYISDxtePukg1iYePlHlVlGTYqMo3AQrTBs/gEwCKWrsKQb98mRxlf1YK
e2mdtmq6upYoorLFQesfRgrCg4GTBiPkrR3amXsFgJ2O5fhV6R98ZdGSv4kip19f
ZNoc/t1EtKGwyAJwjINduv36E3RSHODWwSPtSnmSS1ieCGToY1SI3bVUkFM4C0tO
5GMdSugHgXRGGVbTd/VftndJm6Wtj8b1j8c/1Vh04Q8qbKKJDRTDzAbK1v8oLaDh
UXAKMIc8uY4caZy3/bTAB2Ou9dibrSi8Oc+LwZqJlwIcbkwn/IGNvmwtWv4ehorE
N7EhCFZsFQ==
=Mavg
-----END PGP SIGNATURE-----
Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux
Pull stream_open conversion from Kirill Smelkov:
- remove unnecessary double nonseekable_open from drivers/char/dtlk.c
as noticed by Pavel Machek while reviewing nonseekable_open ->
stream_open mass conversion.
- the mass conversion patch promised in commit 10dce8af34 ("fs:
stream_open - opener for stream-like files so that read and write can
run simultaneously without deadlock") and is automatically generated
by running
$ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci
I've verified each generated change manually - that it is correct to
convert - and each other nonseekable_open instance left - that it is
either not correct to convert there, or that it is not converted due
to current stream_open.cocci limitations. More details on this in the
patch.
- finally, change VFS to pass ppos=NULL into .read/.write for files
that declare themselves streams. It was suggested by Rasmus Villemoes
and makes sure that if ppos starts to be erroneously used in a stream
file, such bug won't go unnoticed and will produce an oops instead of
creating illusion of position change being taken into account.
Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to
use stream_open()" that will be hopefully coming via FUSE tree,
because fs/fuse/ uses new-style .read_iter/.write_iter, and for these
accessors position is still passed as non-pointer kiocb.ki_pos .
* tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux:
vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
*: convert stream-like files from nonseekable_open -> stream_open
dtlk: remove double call to nonseekable_open
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this
will allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc0CCEAAoJEDjwexyKj9rgjmkH/A3e2drvuP/hSF3xfCKTQFdx
/PoLHQVCqENB3HU3FA/ljoXuG6jMgwj61looqlxBNumXFpIfTg0E1JC5S4wRGJ+K
cOVhIKV53gcuZkRcCJQp0WMnGzpk1Daf7iYXYmAl+7e+mREUPxOuJ0Ei6vXvRGZS
8cQrUCGrtPgkAeLlndypHI2M2TDDGJIMczOGbOZau8+8Lo7Wq9zt5y0h/v0ew37g
ogA0eGh6koU1435dt2pclZRiZ1XOcar3Uin9ioT+RnSgJ4pr1Pza/F6IGO0RdQa+
rva990lqGFp5r9lE4rMCwK9LWb/rfHdVPd35t9XPwphnQ/ORoWUwLk3uc5XOHow=
=dbuy
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this will
allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
* tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (98 commits)
s390/vdso: drop unnecessary cc-ldoption
s390: fix clang -Wpointer-sign warnigns in boot code
s390: drop CONFIG_VIRT_TO_BUS
s390: boot, purgatory: pass $(CLANG_FLAGS) where needed
s390: only build for new CPUs with clang
s390: simplify disabled_wait
s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
s390/unwind: introduce stack unwind API
s390/opcodes: add missing instructions to the disassembler
s390/bug: add entry size to the __bug_table section
s390: use proper expoline sections for .dma code
s390/nospec: rename assembler generated expoline thunks
s390: add missing ENDPROC statements to assembler functions
locking/lockdep: check for freed initmem in static_obj()
s390/kernel: add support for kernel address space layout randomization (KASLR)
s390/kernel: introduce .dma sections
s390/sclp: do not use static sccbs
s390/kprobes: use static buffer for insn_page
s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad
s390/kernel: build a relocatable kernel
...
Using scripts/coccinelle/api/stream_open.cocci added in 10dce8af34
("fs: stream_open - opener for stream-like files so that read and write
can run simultaneously without deadlock"), search and convert to
stream_open all in-kernel nonseekable_open users for which read and
write actually do not depend on ppos and where there is no other methods
in file_operations which assume @offset access.
I've verified each generated change manually - that it is correct to convert -
and each other nonseekable_open instance left - that it is either not correct
to convert there, or that it is not converted due to current stream_open.cocci
limitations. The script also does not convert files that should be valid to
convert, but that currently have .llseek = noop_llseek or generic_file_llseek
for unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)
Among cases converted 14 were potentially vulnerable to read vs write deadlock
(see details in 10dce8af34):
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/infiniband/core/user_mad.c:988:1-17: ERROR: umad_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/input/misc/uinput.c:401:1-17: ERROR: uinput_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
and the rest were just safe to convert to stream_open because their read and
write do not use ppos at all and corresponding file_operations do not
have methods that assume @offset file access(*):
arch/powerpc/platforms/52xx/mpc52xx_gpt.c:631:8-24: WARNING: mpc52xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/um/drivers/harddog_kern.c:88:8-24: WARNING: harddog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/x86/kernel/cpu/microcode/core.c:430:33-49: WARNING: microcode_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/ds1620.c:215:8-24: WARNING: ds1620_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/dtlk.c:301:1-17: WARNING: dtlk_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/ipmi/ipmi_watchdog.c:840:9-25: WARNING: ipmi_wdog_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/pcmcia/scr24x_cs.c:95:8-24: WARNING: scr24x_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/tb0219.c:246:9-25: WARNING: tb0219_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/firewire/nosy.c:306:8-24: WARNING: nosy_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/hwmon/fschmd.c:840:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/hwmon/w83793.c:1344:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/infiniband/core/ucma.c:1747:8-24: WARNING: ucma_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/infiniband/core/ucm.c:1178:8-24: WARNING: ucm_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/infiniband/core/uverbs_main.c:1086:8-24: WARNING: uverbs_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/input/joydev.c:282:1-17: WARNING: joydev_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/pci/switch/switchtec.c:393:1-17: WARNING: switchtec_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/platform/chrome/cros_ec_debugfs.c:135:8-24: WARNING: cros_ec_console_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/rtc/rtc-ds1374.c:470:9-25: WARNING: ds1374_wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/rtc/rtc-m41t80.c:805:9-25: WARNING: wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/s390/char/tape_char.c:293:2-18: WARNING: tape_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/s390/char/zcore.c:194:8-24: WARNING: zcore_reipl_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/s390/crypto/zcrypt_api.c:528:8-24: WARNING: zcrypt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/spi/spidev.c:594:1-17: WARNING: spidev_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/staging/pi433/pi433_if.c:974:1-17: WARNING: pi433_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/acquirewdt.c:203:8-24: WARNING: acq_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/advantechwdt.c:202:8-24: WARNING: advwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/alim1535_wdt.c:252:8-24: WARNING: ali_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/alim7101_wdt.c:217:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ar7_wdt.c:166:8-24: WARNING: ar7_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/at91rm9200_wdt.c:113:8-24: WARNING: at91wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ath79_wdt.c:135:8-24: WARNING: ath79_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/bcm63xx_wdt.c:119:8-24: WARNING: bcm63xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/cpu5wdt.c:143:8-24: WARNING: cpu5wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/cpwd.c:397:8-24: WARNING: cpwd_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/eurotechwdt.c:319:8-24: WARNING: eurwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/f71808e_wdt.c:528:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/gef_wdt.c:232:8-24: WARNING: gef_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/geodewdt.c:95:8-24: WARNING: geodewdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ib700wdt.c:241:8-24: WARNING: ibwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ibmasr.c:326:8-24: WARNING: asr_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/indydog.c:80:8-24: WARNING: indydog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/intel_scu_watchdog.c:307:8-24: WARNING: intel_scu_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/iop_wdt.c:104:8-24: WARNING: iop_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/it8712f_wdt.c:330:8-24: WARNING: it8712f_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ixp4xx_wdt.c:68:8-24: WARNING: ixp4xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ks8695_wdt.c:145:8-24: WARNING: ks8695wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/m54xx_wdt.c:88:8-24: WARNING: m54xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/machzwd.c:336:8-24: WARNING: zf_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/mixcomwd.c:153:8-24: WARNING: mixcomwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/mtx-1_wdt.c:121:8-24: WARNING: mtx1_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/mv64x60_wdt.c:136:8-24: WARNING: mv64x60_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/nuc900_wdt.c:134:8-24: WARNING: nuc900wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/nv_tco.c:164:8-24: WARNING: nv_tco_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pc87413_wdt.c:289:8-24: WARNING: pc87413_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd.c:698:8-24: WARNING: pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd.c:737:8-24: WARNING: pcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_pci.c:581:8-24: WARNING: pcipcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_pci.c:623:8-24: WARNING: pcipcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_usb.c:488:8-24: WARNING: usb_pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_usb.c:527:8-24: WARNING: usb_pcwd_temperature_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pika_wdt.c:121:8-24: WARNING: pikawdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pnx833x_wdt.c:119:8-24: WARNING: pnx833x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/rc32434_wdt.c:153:8-24: WARNING: rc32434_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/rdc321x_wdt.c:145:8-24: WARNING: rdc321x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/riowd.c:79:1-17: WARNING: riowd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sa1100_wdt.c:62:8-24: WARNING: sa1100dog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc60xxwdt.c:211:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc7240_wdt.c:139:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc8360.c:274:8-24: WARNING: sbc8360_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc_epx_c3.c:81:8-24: WARNING: epx_c3_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc_fitpc2_wdt.c:78:8-24: WARNING: fitpc2_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sb_wdog.c:108:1-17: WARNING: sbwdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sc1200wdt.c:181:8-24: WARNING: sc1200wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sc520_wdt.c:261:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sch311x_wdt.c:319:8-24: WARNING: sch311x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/scx200_wdt.c:105:8-24: WARNING: scx200_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/smsc37b787_wdt.c:369:8-24: WARNING: wb_smsc_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/w83877f_wdt.c:227:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/w83977f_wdt.c:301:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wafer5823wdt.c:200:8-24: WARNING: wafwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/watchdog_dev.c:828:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdrtas.c:379:8-24: WARNING: wdrtas_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdrtas.c:445:8-24: WARNING: wdrtas_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt285.c:104:1-17: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt977.c:276:8-24: WARNING: wdt977_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt.c:424:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt.c:484:8-24: WARNING: wdt_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt_pci.c:464:8-24: WARNING: wdtpci_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt_pci.c:527:8-24: WARNING: wdtpci_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
net/batman-adv/log.c:105:1-17: WARNING: batadv_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/control.c:57:7-23: WARNING: snd_ctl_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/rawmidi.c:385:7-23: WARNING: snd_rawmidi_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/seq/seq_clientmgr.c:310:7-23: WARNING: snd_seq_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/timer.c:1428:7-23: WARNING: snd_timer_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
One can also recheck/review the patch via generating it with explanation comments included via
$ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci SPFLAGS="-D explain"
(*) This second group also contains cases with read/write deadlocks that
stream_open.cocci don't yet detect, but which are still valid to convert to
stream_open since ppos is not used. For example drivers/pci/switch/switchtec.c
calls wait_for_completion_interruptible() in its .read, but stream_open.cocci
currently detects only "wait_event*" as blocking.
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James R. Van Zandt" <jrv@vanzandt.mv.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Harald Welte <laforge@gnumonks.org>
Acked-by: Lubomir Rintel <lkundrak@v3.sk> [scr24x_cs]
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: David Herrmann <dh.herrmann@googlemail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Jean Delvare <jdelvare@suse.com>
Acked-by: Guenter Roeck <linux@roeck-us.net> [watchdog/* hwmon/*]
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Logan Gunthorpe <logang@deltatee.com> [drivers/pci/switch/switchtec]
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [drivers/pci/switch/switchtec]
Cc: Benson Leung <bleung@chromium.org>
Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> [platform/chrome]
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> [rtc/*]
Cc: Mark Brown <broonie@kernel.org>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: Wan ZongShun <mcuos.com@gmail.com>
Cc: Zwane Mwaikambo <zwanem@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
The arch/s390/boot directory is built with its own set of compiler
options that does not include -Wno-pointer-sign like the rest of
the kernel does, this causes a lot of harmless but correct warnings
when building with clang.
For the atomics, we can add type casts to avoid the warnings, for
everything else the easiest way is to slightly adapt the types
to be more consistent.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The sccbs for init/read/sdias/early have to be located below 2 GB, and
they are currently defined as a static buffer.
With a relocatable kernel that could reside at any place in memory, this
will no longer guarantee the location below 2 GB, so use a dynamic
GFP_DMA allocation instead.
The sclp_early_sccb buffer needs special handling, as it can be used
very early, and by both the decompressor and also the decompressed
kernel. Therefore, a fixed 4 KB buffer is introduced at 0x11000, the
former PARMAREA_END. The new PARMAREA_END is now 0x12000, and it is
renamed to HEAD_END, as it is rather the end of head.S and not the end
of the parmarea.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
ISM devices are special in how they access PCI memory space. Provide
wrappers for handling commands to the device. No functional change.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is a preparation patch for usage of new pci instructions.
No functional change.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Improve /proc/interrupts on s390 to show statistics for individual
MSI interrupts.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide the ability to create cachesize aligned interrupt vectors.
These will be used for per-CPU interrupt vectors.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add an extra parameter for airq handlers to recognize
floating vs. directed interrupts.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Detect the adapter CPU directed interruption facility.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide an interface for userspace so it can find out if a machine is
capeable of doing secure boot. The interface is, for example, needed for
zipl so it can find out which file format it can/should write to disk.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When building the L3 HW header for non-IP packets, trust the cast type
that was passed as parameter. qeth_l3_get_cast_type() has most likely
also used h_dest to determine the cast type, so we get the same
result, and can remove that duplicated code.
In the unlikely case that we would get a _different_ cast type, then
that's based off a route lookup and should be considered authoritative.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This de-duplicates the L2 and L3 cast-type code, and makes the L2 code
a bit more robust by removing the fragile assumption that skb->data
always points to the Ethernet Header. This would break in code paths
where we pushed the HW header onto the skb.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QETH_MAX_BUFFER_ELEMENTS() macro effectively returns a constant
value. To avoid some redundant pointer chasing and computations in the
xmit hot path, cache this value in the queue struct.
Take this as opportunity to shrink some of the queue struct's fields to
their appropriate value range, slightly reducing its total size.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the first initialization of a queue, its Output Buffers are in a
clean state with no attached resources. On every subsequent
initialization, qeth_l?_stop_card() has previously put them in a clean
state via qeth_drain_output_queues(). So the call to
qeth_clear_output_buffer() is redundant and can be removed.
While at it, move the initialization of the queue's card pointer into
the queue allocation. It never changes afterwards.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have helper macros for all possible device types, replace all
remaining open-coded accesses to the type fields.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't keep track of Input Buffer states, so remove the comments that
make it sound like the qeth_qdio_buffer_states enum applies to
Input Buffers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's unclear what exact purpose this seqno may have served in the past.
But it's certainly no longer used anymore, as the following
napi_gro_receive() will straight away clear this part of the cb again.
Suggested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang produces a harmless warning for each use for the qeth_adp_supported
macro:
drivers/s390/net/qeth_l2_main.c:559:31: warning: implicit conversion from enumeration type 'enum qeth_ipa_setadp_cmd' to
different enumeration type 'enum qeth_ipa_funcs' [-Wenum-conversion]
if (qeth_adp_supported(card, IPA_SETADP_SET_PROMISC_MODE))
~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/s390/net/qeth_core.h:179:41: note: expanded from macro 'qeth_adp_supported'
qeth_is_ipa_supported(&c->options.adp, f)
~~~~~~~~~~~~~~~~~~~~~ ^
Add a version of this macro that uses the correct types, and
remove the unused qeth_adp_enabled() macro that has the same
problem.
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With git commit 1e941d3949
"s390: move ipl block to .boot.preserved.data section" the earl_ipl_block
got renamed to ipl_block and became publicly available via boot_data.h.
This might cause problems with zcore, which has it's own ipl_block
variable. Thus rename the ipl_block in zcore to prevent name collision
and highlight that it's only used locally.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Fixes: 1e941d3949 ("s390: move ipl block to .boot.preserved.data section")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull networking fixes from David Miller:
"Just the usual assortment of small'ish fixes:
1) Conntrack timeout is sometimes not initialized properly, from
Alexander Potapenko.
2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid
undefined behavior. From ZhangXiaoxu.
3) des1 field of descriptor in stmmac driver is initialized with the
wrong variable. From Yue Haibing.
4) Increase mlxsw pci sw reset timeout a little bit more, from Ido
Schimmel.
5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng.
6) Fallback refcount fix in TLS code, from Jakub Kicinski.
7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy.
8) Fix recursive locking in team driver, from Hangbin Liu.
9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski.
10) Don't use napi_alloc_frag() outside of softiq context of socionext
driver, from Ilias Apalodimas.
11) MAC address increment overflow in ncsi, from Tao Ren.
12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun.
13) ipv4_link_failure has to validate the headers that are actually
there because RAW sockets can pass in arbitrary garbage, from Eric
Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
ipv4: add sanity checks in ipv4_link_failure()
net/rose: fix unbound loop in rose_loopback_timer()
rxrpc: fix race condition in rxrpc_input_packet()
net: rds: exchange of 8K and 1M pool
net: vrf: Fix operation not supported when set vrf mac
net/ncsi: handle overflow when incrementing mac address
net: socionext: replace napi_alloc_frag with the netdev variant on init
net: atheros: fix spelling mistake "underun" -> "underrun"
spi: ST ST95HF NFC: declare missing of table
spi: Micrel eth switch: declare missing of table
net: stmmac: move stmmac_check_ether_addr() to driver probe
netfilter: fix nf_l4proto_log_invalid to log invalid packets
netfilter: never get/set skb->tstamp
netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
dt-bindings: add an explanation for internal phy-mode
net/tls: don't leak IV and record seq when offload fails
net/tls: avoid potential deadlock in tls_set_device_offload_rx()
selftests/net: correct the return value for run_afpackettests
team: fix possible recursive locking when add slaves
...
The quiesce function calls cio_cancel_halt_clear() and if we
get an -EBUSY we go into a loop where we:
- wait for any interrupts
- flush all I/O in the workqueue
- retry cio_cancel_halt_clear
During the period where we are waiting for interrupts or
flushing all I/O, the channel subsystem could have completed
a halt/clear action and turned off the corresponding activity
control bits in the subchannel status word. This means the next
time we call cio_cancel_halt_clear(), we will again start by
calling cancel subchannel and so we can be stuck between calling
cancel and halt forever.
Rather than calling cio_cancel_halt_clear() immediately after
waiting, let's try to disable the subchannel. If we succeed in
disabling the subchannel then we know nothing else can happen
with the device.
Suggested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <4d5a4b98ab1b41ac6131b5c36de18b76c5d66898.1555449329.git.alifm@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
When releasing the vfio-ccw mdev, we currently do not release
any existing channel program and its pinned pages. This can
lead to the following warning:
[1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1]
....
1038876.561921] Call Trace:
[1038876.561935] ([<00000009897fb870>] 0x9897fb870)
[1038876.561949] [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1]
[1038876.561965] [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio]
[1038876.561978] [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio]
[1038876.562024] [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm]
[1038876.562045] [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm]
[1038876.562065] [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm]
[1038876.562083] [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm]
[1038876.562098] [<00000000003c2dc4>] __fput+0x144/0x228
[1038876.562113] [<000000000016ee82>] task_work_run+0x8a/0xd8
[1038876.562125] [<000000000014c7a8>] do_exit+0x5d8/0xd90
[1038876.562140] [<000000000014d084>] do_group_exit+0xc4/0xc8
[1038876.562155] [<000000000015c046>] get_signal+0x9ae/0xa68
[1038876.562169] [<0000000000108d66>] do_signal+0x66/0x768
[1038876.562185] [<0000000000b9e37e>] system_call+0x1ea/0x2d8
[1038876.562195] 2 locks held by qemu-system-s39/144727:
[1038876.562205] #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio]
[1038876.562230] #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1]
[1038876.562250] Last Breaking-Event-Address:
[1038876.562262] [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1]
[1038876.562272] irq event stamp: 4236481
[1038876.562287] hardirqs last enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740
[1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740
[1038876.562311] softirqs last enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598
[1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108
[1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]---
Similarly we do not free the channel program when we are removing
the vfio-ccw device. Let's fix this by resetting the device and freeing
the channel program and pinned pages in the release path. For the remove
path we can just quiesce the device, since in the remove path the mediated
device is going away for good and so we don't need to do a full reset.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <ae9f20dc8873f2027f7b3c5d2aaa0bdfe06850b8.1554756534.git.alifm@linux.ibm.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Add a region to the vfio-ccw device that can be used to submit
asynchronous I/O instructions. ssch continues to be handled by the
existing I/O region; the new region handles hsch and csch.
Interrupt status continues to be reported through the same channels
as for ssch.
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The vfio-ccw code will need this, and it matches treatment of ssch
and csch.
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Allow to extend the regions used by vfio-ccw. The first user will be
handling of halt and clear subchannel.
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Introduce a mutex to disallow concurrent reads or writes to the
I/O region. This makes sure that the data the kernel or user
space see is always consistent.
The same mutex will be used to protect the async region as well.
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The flow for processing ssch requests can be improved by splitting
the BUSY state:
- CP_PROCESSING: We reject any user space requests while we are in
the process of translating a channel program and submitting it to
the hardware. Use -EAGAIN to signal user space that it should
retry the request.
- CP_PENDING: We have successfully submitted a request with ssch and
are now expecting an interrupt. As we can't handle more than one
channel program being processed, reject any further requests with
-EBUSY. A final interrupt will move us out of this state.
By making this a separate state, we make it possible to issue a
halt or a clear while we're still waiting for the final interrupt
for the ssch (in a follow-on patch).
It also makes a lot of sense not to preemptively filter out writes to
the io_region if we're in an incorrect state: the state machine will
handle this correctly.
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
When we get a solicited interrupt, the start function may have
been cleared by a csch, but we still have a channel program
structure allocated. Make it safe to call the cp accessors in
any case, so we can call them unconditionally.
While at it, also make sure that functions called from other parts
of the code return gracefully if the channel program structure
has not been initialized (even though that is a bug in the caller).
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
- Fix overwrite of the initial ramdisk due to misuse of IS_ENABLED
- Fix integer overflow in the dasd driver resulting in incorrect number
of blocks for large devices
- Fix a lockdep false positive in the 3270 driver
- Fix a deadlock in the zcrypt driver
- Fix incorrect debug feature entries in the pkey api
- Fix inline assembly constraints fallout with CONFIG_KASAN=y
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcuIwlAAoJEDjwexyKj9rgCJEH/Agbx2sJlxApuKC6BMAjPZCy
dJUu7hmumgtL+QI7IqQMRgfuxiC16xpJQd14GvfNoyjX1NVajepYI2+OUZHj7F6s
La0fq/16fRriVByLiL+pqjV2y0c5kQ8Uhe15vwEvSmr3T/NgPLqUbIe5pwY5KS59
O6B3WpVKeIb8UOUlEAQnezTYj70jWMT783NTyzhPcYLkiN7JJ7vI21FKPRM88ElS
Q7f9qxGPv0hGsOqBXCBWztxgx+ezEHsnlH4t+he0tLOGXuuiy4BrgGtgbJx17frj
xNNe8gOXV83bUbYTBygITnjMjS0m/1viJ4XTBgv/HLGgcKG06n3v6U4+zefB4N4=
=+41g
-----END PGP SIGNATURE-----
Merge tag 's390-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 bug fixes from Martin Schwidefsky:
- Fix overwrite of the initial ramdisk due to misuse of IS_ENABLED
- Fix integer overflow in the dasd driver resulting in incorrect number
of blocks for large devices
- Fix a lockdep false positive in the 3270 driver
- Fix a deadlock in the zcrypt driver
- Fix incorrect debug feature entries in the pkey api
- Fix inline assembly constraints fallout with CONFIG_KASAN=y
* tag 's390-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: correct some inline assembly constraints
s390/pkey: add one more argument space for debug feature entry
s390/zcrypt: fix possible deadlock situation on ap queue remove
s390/3270: fix lockdep false positive on view->lock
s390/dasd: Fix capacity calculation for large volumes
s390/mem_detect: Use IS_ENABLED(CONFIG_BLK_DEV_INITRD)
qdio.ko offers a small number of high-level functions to drive the
scanning of a QDIO queue for ready-to-process SBALs:
qdio_get_next_buffers(), __[ti]qdio_inbound_processing() and
__qdio_outbound_processing().
Let each of those functions maintain the 'start' index for their current
scan, and pass it to lower-level helpers as needed. This improves the
code's overall layering, and allows us to eliminate the additional
first_to_kick cursor with a follow-on patch.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Refactor all the low-level helpers to take the first_to_check cursor as
parameter, rather than accessing it directly.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
clang points out that the return code from this function is
undefined for one of the error paths:
../drivers/s390/net/ctcm_main.c:1595:7: warning: variable 'result' is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1638:9: note: uninitialized use occurs here
return result;
^~~~~~
../drivers/s390/net/ctcm_main.c:1595:3: note: remove the 'if' if its condition is always false
if (priv->channel[direction] == NULL) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/s390/net/ctcm_main.c:1539:12: note: initialize the variable 'result' to silence this warning
int result;
^
Make it return -ENODEV here, as in the related failure cases.
gcc has a known bug in underreporting some of these warnings
when it has already eliminated the assignment of the return code
based on some earlier optimization step.
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current xmit code only stops the txq after attempting to fill an
IO buffer that hasn't been TX-completed yet. In many-connection
scenarios, this can result in frequent rejected TX attempts, requeuing
of skbs with NETDEV_TX_BUSY and extra overhead.
Now that we have a proper 1-to-1 relation between stack-side txqs and
our HW Queues, overhaul the stop/wake logic so that the xmit code
stops the txq as needed.
Given that we might map multiple skbs into a single buffer, it's crucial
to ensure that the queue always provides an _entirely_ empty IO buffer.
Otherwise large skbs (eg TSO) might not fit into the last available
buffer. So whenever qeth_do_send_packet() first utilizes an _empty_
buffer, it updates & checks the used_buffers count.
This now ensures that an skb passed to qeth_xmit() can always be mapped
into an IO buffer, so remove all of the -EBUSY roll-back handling in the
TX path. We preserve the minimal safety-checks ("Is this IO buffer
really available?"), just in case some nasty future bug ever attempts to
corrupt an in-use buffer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_get_priority_queue() is no longer used for IQD devices, remove the
special-casing of their mcast queue.
This effectively reverts
commit 70deb01662 ("qeth: omit outbound queue 3 for unicast packets in Priority Queuing on HiperSockets").
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds trivial support for multiple TX queues on OSA-style devices
(both real HW and z/VM NICs). For now we expose the driver's existing
QoS mechanism via .ndo_select_queue, and adjust the number of available
TX queues when qeth_update_from_chp_desc() detects that the
HW configuration has changed.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth has been supporting multiple HW Output Queues for a long time. But
rather than exposing those queues to the stack, it uses its own queue
selection logic in .ndo_start_xmit... with all the drawbacks that
entails.
Start off by switching IQD devices over to a proper mqs net_device,
and converting all the netdev_queue management code.
One oddity with IQD devices is the requirement to place all mcast
traffic on the _highest_ established HW queue. Doing so via
.ndo_select_queue seems straight-forward - but that won't work if only
some of the HW queues are active
(ie. when dev->real_num_tx_queues < dev->num_tx_queues), since
netdev_cap_txqueue() will not allow us to put skbs on the higher queues.
To make this work, we
1. let .ndo_select_queue() map all mcast traffic to netdev_queue 0, and
2. later re-map the netdev_queue and HW queue indices in
.ndo_start_xmit and the TX completion handler.
With this patch we default to a fixed set of 1 ucast and 1 mcast queue.
Support for dynamic reconfiguration is added at a later time.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct netdev_queue contains a counter for tx timeouts, which gets
updated by dev_watchdog(). So let's not attempt to maintain our own
statistics, in particular not by overloading the skb-error counter.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the documentation for netif_trans_update() says, netdev_start_xmit()
already updates the last-tx time after every good xmit. So don't
duplicate that effort.
One odd case is that qeth_flush_buffers() also gets called from our
TX completion handler, to flush out any partially filled buffer when
we switch the queue to non-packing mode. But as the TX completion
handler will _always_ wake the txq, we don't have to worry about
the TX watchdog there.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent code relies on the values that qeth_update_from_chp_desc()
reads from the CHP descriptor. Rather than dealing with weird errors
later on, just handle it properly here.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The naming of several QDIO helpers doesn't match their actual
functionality, or the structures they operate on. Clean this up.
s/qeth_alloc_qdio_buffers/qeth_alloc_qdio_queues
s/qeth_free_qdio_buffers/qeth_free_qdio_queues
s/qeth_alloc_qdio_out_buf/qeth_alloc_output_queue
s/qeth_clear_outq_buffers/qeth_drain_output_queue
s/qeth_clear_qdio_buffers/qeth_drain_output_queues
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The debug feature entries have been used with up to 5 arguents
(including the pointer to the format string) but there was only
space reserved for 4 arguemnts. So now the registration does
reserve space for 5 times a long value.
This fixes a sometime appearing weired value as the last
value of an debug feature entry like this:
... pkey_sec2protkey zcrypt_send_cprb (cardnr=10 domain=12)
failed with errno -2143346254
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reported-by: Christian Rund <Christian.Rund@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The 'func_code' variable gets printed in debug statements without
a prior initialization in multiple functions, as reported when building
with clang:
drivers/s390/crypto/zcrypt_api.c:659:6: warning: variable 'func_code' is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (mex->outputdatalength < mex->inputdatalength) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/s390/crypto/zcrypt_api.c:725:29: note: uninitialized use occurs here
trace_s390_zcrypt_rep(mex, func_code, rc,
^~~~~~~~~
drivers/s390/crypto/zcrypt_api.c:659:2: note: remove the 'if' if its condition is always false
if (mex->outputdatalength < mex->inputdatalength) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/s390/crypto/zcrypt_api.c:654:24: note: initialize the variable 'func_code' to silence this warning
unsigned int func_code;
^
Add initializations to all affected code paths to shut up the warning
and make the warning output consistent.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
clang points out that the declaration of cio_irb does not match the
definition exactly, it is missing the alignment attribute:
../drivers/s390/cio/cio.c:50:1: warning: section does not match previous declaration [-Wsection]
DEFINE_PER_CPU_ALIGNED(struct irb, cio_irb);
^
../include/linux/percpu-defs.h:150:2: note: expanded from macro 'DEFINE_PER_CPU_ALIGNED'
DEFINE_PER_CPU_SECTION(type, name, PER_CPU_ALIGNED_SECTION) \
^
../include/linux/percpu-defs.h:93:9: note: expanded from macro 'DEFINE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name; \
^
../include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
__percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
^
../drivers/s390/cio/cio.h:118:1: note: previous attribute is here
DECLARE_PER_CPU(struct irb, cio_irb);
^
../include/linux/percpu-defs.h:111:2: note: expanded from macro 'DECLARE_PER_CPU'
DECLARE_PER_CPU_SECTION(type, name, "")
^
../include/linux/percpu-defs.h:87:9: note: expanded from macro 'DECLARE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name
^
../include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
__percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
^
Use DECLARE_PER_CPU_ALIGNED() here, to make the two match.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This cursor is used for debugging only. But since
commit "s390/qdio: pass up count of ready-to-process SBALs" it effectively
duplicates the first_to_check cursor, diverging for just a short moment
when get_*_buffer_frontier() updates q->first_to_check.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When passing a range of ready-to-process SBALs to the upper-layer
driver, use the available 'count' instead of calculating the distance
between the first_to_check and first_to_kick cursors.
This simplifies the logic of the queue-scan path, and opens up the
possibility of scanning all 128 SBALs in one go (as determining the
reported count no longer requires wrap-around safe arithmetic on the
queue's cursors).
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When qdio_{in,out}bound_q_moved() scans a queue for pending work, it
currently only returns a boolean to its caller. The interface to the
upper-layer-drivers (qdio_kick_handler() and qdio_get_next_buffers())
then re-calculates the number of pending SBALs from the
q->first_to_check and q->first_to_kick cursors.
Refactor this so that whenever get_{in,out}bound_buffer_frontier()
adjusted the queue's first_to_check cursor, it also returns the
corresponding count of ready-to-process SBALs (and 0 else).
A subsequent patch will then make use of this additional information.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The DSCI is a 1-byte field, placed at the start of an u32. So when
printing it to a queue's debug state, limit the output to the part
that's actually occupied by the DSCI.
When the DSCI is set this gives us the expected output of '1', rather
than the current (obscure) value of '16777216'.
Suggested-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With commit 01396a374c ("s390/zcrypt: revisit ap device remove
procedure") the ap queue remove is now a two stage process. However,
a del_timer_sync() call may trigger the timer function which may
try to lock the very same spinlock as is held by the function
just initiating the del_timer_sync() call. This could end up in
a deadlock situation. Very unlikely but possible as you need to
remove an ap queue at the exact sime time when a timeout of a
request occurs.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reported-by: Pierre Morel <pmorel@linux.ibm.com>
Fixes: commit 01396a374c ("s390/zcrypt: revisit ap device remove procedure")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The spinlock in the raw3270_view structure is used by con3270, tty3270
and fs3270 in different ways. For con3270 the lock can be acquired in
irq context, for tty3270 and fs3270 the highest context is bh.
Lockdep sees the view->lock as a single class and if the 3270 driver
is used for the console the following message is generated:
WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d #12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330
Introduce a lockdep subclass for the view lock to distinguish bh from
irq locks.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Minor comment merge conflict in mlx5.
Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
13 Fixes, 7 of which are for IBM fibre channel and three additional
for fairly serious bugs in drivers (qla2xxx, mpt3sas, aacraid). Of
the three core fixes, the most significant is probably the missed run
queue causing an indefinite hang. The others are fixing a potential
use after free on device close and silencing an incorrect warning.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXJ5iYiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQ4GAP99uutL
UuDJ4pLfcl7N3PgUy1/HtvZ5CXcNGjK3Tu1V7wD9FJ/rC0EKSmc+s01/w51iSytt
/9QaDbK+R/RV6Rg/QJc=
=4AwN
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Thirteen fixes, seven of which are for IBM fibre channel and three
additional for fairly serious bugs in drivers (qla2xxx, mpt3sas,
aacraid).
Of the three core fixes, the most significant is probably the missed
run queue causing an indefinite hang. The others are fixing a
potential use after free on device close and silencing an incorrect
warning"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ibmvfc: Clean up transport events
scsi: ibmvfc: Byte swap status and error codes when logging
scsi: ibmvfc: Add failed PRLI to cmd_status lookup array
scsi: ibmvfc: Remove "failed" from logged errors
scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host
scsi: sd: Quiesce warning if device does not report optimal I/O size
scsi: sd: Fix a race between closing an sd device and sd I/O
scsi: core: Run queue when state is set to running after being blocked
scsi: qla4xxx: fix a potential NULL pointer dereference
scsi: aacraid: Insure we don't access PCIe space during AER/EEH
scsi: mpt3sas: Fix kernel panic during expander reset
This helper is not thinint-specific, qdio_get_next_buffers() also calls it
for non-thinint devices. So give it a more fitting name, and while at it
adjust its parameter.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
pci_out_supported() currently takes a single queue as parameter, even
though Output IRQ support is a per-device feature. Adjust the parameter,
so that the macro can also be used in code paths with no access to a queue
struct. This allows us to remove the remaining open-coded checks for
QIB_AC_OUTBOUND_PCI_SUPPORTED.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The DASD driver incorrectly limits the maximum number of blocks of ECKD
DASD volumes to 32 bit numbers. Volumes with a capacity greater than
2^32-1 blocks are incorrectly recognized as smaller volumes.
This results in the following volume capacity limits depending on the
formatted block size:
BLKSIZE MAX_GB MAX_CYL
512 2047 5843492
1024 4095 8676701
2048 8191 13634816
4096 16383 23860929
The same problem occurs when a volume with more than 17895697 cylinders
is accessed in raw-track-access mode.
Fix this problem by adding an explicit type cast when calculating the
maximum number of blocks.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This converts the IDX code to use qeth_send_control_data(), replacing
a bunch of duplicated IO code and unbounded waits. It also allows the
IDX sequence to benefit from the improved timeout & notify
infrastructure, so that we can eliminate the DOWN -> ACTIVATING -> UP
transition in the channel state machine.
The patch looks rather big, but most of it is a straight-forward
conversion of the old IDX cmd setup & callbacks to the new model.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid concurrency issues, some parts of the cmd setup are delayed
until qeth_send_control_data() holds the IO channel's irq_pending
"lock". Rather than hard-coding those setup steps for each cmd type,
have the cmd provide a callback. This will make it easier to also issue
IDX commands via qeth_send_control_data().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As trivial cleanup before adding more users to qeth_notify_reply(),
move the setup of reply->rc from the caller into the helper.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code makes it look like qeth_send_control_data_cb() is some
sort of default callback for all cmds. But in practice, it is only used
for half of the cmd buffers we issue.
Reduce the confusion by only setting this callback for cmds that
actually want it, and while at it give the callback a name that matches
the established naming scheme.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers are running in process context now, so we can safely sleep
in qeth_send_control_data() while waiting for a cmd to complete.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users of the lock are running in process context now.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inet6addr_chain is atomic. So instead of starting the cmd IO for
SETIP / DELIP straight from the notifier callback, run it from a
workqueue. This is the last step towards removal of cmd IO completion
polling.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract a little helper, so that high-level callers can manipulate the
IP table without worrying about the locking. This will make it easier
to convert the code to a different locking primitive later on.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The L2 and L3 .ndo_set_rx_mode callbacks maintain an address cache
to decide which addresses have changed since the last modeset.
When the card is set offline, qeth_l?_stop_card() drains this cache.
This happens only after 1) the net_device has been detached, and
2) any pending RX modeset has completed. Consequently we can access the
cache lock-free.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.ndo_set_rx_mode gets called in process context, but while holding the
addr_list spinlock. Which means we currently can't sleep while
re-programming the HW, and need to poll for IO completion. That's bad,
in particular since receiving the cmd response can fail silently and
we're then polling until the timeout hits.
As a first step towards eliminating the IO completion polling, run the
RX modeset from a work element and only take the addr_list lock while
updating the RX mode address cache.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed
with the driver still attached that queued the message
- Limit brk randomization to 32MB to reduce the chance that the
heap of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU, this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcnIq7AAoJEDjwexyKj9rgddUH/3VQP6BMvq2fwAsLqx8JeYgT
082xzP2nHli3tO6m8fFHmtqrSg5KTEDfuQVafqp92LeEMKUNWQI6kRu7rXeAVBct
M6hx21mqkm9VNjAlAjSq8IAUXP2K6/K0BMD5mYInYYYVRvJm3on4sHnkEj0kvXbm
OGxwnNBd9UnH5g6ti2vW4cyDvs0aqj1eDbSudy5KedumQz5J2XdFPn4f4Ej6p2+t
nuvlZFDnZ2Z4rliE3RFCuKExZR+YFZgS1urm6pcklncfvbJRsqFJ+nvhurskDUI3
4gOp1Yv1tvGNv/cNVEtnz8g/Kg8/sI7evjQBtxhtEsV/W0sbZPnjCt+28Cf1DN4=
=4nL7
-----END PGP SIGNATURE-----
Merge tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Improvements and bug fixes for 5.1-rc2:
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed with
the driver still attached that queued the message
- Limit brk randomization to 32MB to reduce the chance that the heap
of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU, this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices"
* tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpumf: Fix warning from check_processor_id
zcrypt: handle AP Info notification from CHSC SEI command
vfio: ccw: only free cp on final interrupt
s390/vtime: steal time exponential moving average
s390/zcrypt: revisit ap device remove procedure
s390: limit brk randomization to 32MB
If an incoming ELS of type RSCN contains more than one element, zfcp
suboptimally causes repeated erp trigger NOP trace records for each
previously failed port. These could be ports that went away. It loops over
each RSCN element, and for each of those in an inner loop over all
zfcp_ports.
The trigger to recover failed ports should be just the reception of some
RSCN, no matter how many elements it has. So we can loop over failed ports
separately, and only then loop over each RSCN element to handle the
non-failed ports.
The call chain was:
zfcp_fc_incoming_rscn
for (i = 1; i < no_entries; i++)
_zfcp_fc_incoming_rscn
list_for_each_entry(port, &adapter->port_list, list)
if (masked port->d_id match) zfcp_fc_test_link
if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <===
In order the reduce the "flooding" of the REC trace area in such cases, we
factor out handling the failed ports to be outside of the entries loop:
zfcp_fc_incoming_rscn
if (no_entries > 1) <===
list_for_each_entry(port, &adapter->port_list, list) <===
if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <===
for (i = 1; i < no_entries; i++)
_zfcp_fc_incoming_rscn
list_for_each_entry(port, &adapter->port_list, list)
if (masked port->d_id match) zfcp_fc_test_link
Abbreviated example trace records before this code change:
Tag : fcrscn1
WWPN : 0x500507630310d327
ERP want : 0x02
ERP need : 0x02
Tag : fcrscn1
WWPN : 0x500507630310d327
ERP want : 0x02
ERP need : 0x00 NOP => superfluous trace record
The last trace entry repeats if there are more than 2 RSCN elements.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Suppose more than one non-NPIV FCP device is active on the same channel.
Send I/O to storage and have some of the pending I/O run into a SCSI
command timeout, e.g. due to bit errors on the fibre. Now the error
situation stops. However, we saw FCP requests continue to timeout in the
channel. The abort will be successful, but the subsequent TUR fails.
Scsi_eh starts. The LUN reset fails. The target reset fails. The host
reset only did an FCP device recovery. However, for non-NPIV FCP devices,
this does not close and reopen ports on the SAN-side if other non-NPIV FCP
device(s) share the same open ports.
In order to resolve the continuing FCP request timeouts, we need to
explicitly close and reopen ports on the SAN-side.
This was missing since the beginning of zfcp in v2.6.0 history commit
ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.").
Note: The FSF requests for forced port reopen could run into FSF request
timeouts due to other reasons. This would trigger an internal FCP device
recovery. Pending forced port reopen recoveries would get dismissed. So
some ports might not get fully reopened during this host reset handler.
However, subsequent I/O would trigger the above described escalation and
eventually all ports would be forced reopen to resolve any continuing FCP
request timeouts due to earlier bit errors.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #3.0+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An already deleted SCSI device can exist on the Scsi_Host and remain there
because something still holds a reference. A new SCSI device with the same
H:C:T:L and FCP device, target port WWPN, and FCP LUN can be created. When
we try to unblock an rport, we still find the deleted SCSI device and
return early because the zfcp_scsi_dev of that SCSI device is not
ZFCP_STATUS_COMMON_UNBLOCKED. Hence we miss to unblock the rport, even if
the new proper SCSI device would be in good state.
Therefore, skip deleted SCSI devices when iterating the sdevs of the shost.
[cf. __scsi_device_lookup{_by_target}() or scsi_device_get()]
The following abbreviated trace sequence can indicate such problem:
Area : REC
Tag : ersfs_3
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x40000000 not ZFCP_STATUS_COMMON_UNBLOCKED
Ready count : n not incremented yet
Running count : 0x00000000
ERP want : 0x01
ERP need : 0xc1 ZFCP_ERP_ACTION_NONE
Area : REC
Tag : ersfs_3
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x41000000
Ready count : n+1
Running count : 0x00000000
ERP want : 0x01
ERP need : 0x01
...
Area : REC
Level : 4 only with increased trace level
Tag : ertru_l
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x40000000
Request ID : 0x0000000000000000
ERP status : 0x01800000
ERP step : 0x1000
ERP action : 0x01
ERP count : 0x00
NOT followed by a trace record with tag "scpaddy"
for WWPN 0x50050763031bd327.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 6f2ce1c6af ("scsi: zfcp: fix rport unblock race with LUN recovery")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull networking fixes from David Miller:
"Fixes here and there, a couple new device IDs, as usual:
1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.
2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.
3) Fix documentation for some eBPF helpers, from Quentin Monnet.
4) Some UAPI bpf header sync with tools, also from Quentin Monnet.
5) Set descriptor ownership bit at the right time for jumbo frames in
stmmac driver, from Aaro Koskinen.
6) Set IFF_UP properly in tun driver, from Eric Dumazet.
7) Fix load/store doubleword instruction generation in powerpc eBPF
JIT, from Naveen N. Rao.
8) nla_nest_start() return value checks all over, from Kangjie Lu.
9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
merge window. From Marcelo Ricardo Leitner and Xin Long.
10) Fix memory corruption with large MTUs in stmmac, from Aaro
Koskinen.
11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
Dumazet.
12) Fix topology subscription cancellation in tipc, from Erik Hugne.
13) Memory leak in genetlink error path, from Yue Haibing.
14) Valid control actions properly in packet scheduler, from Davide
Caratti.
15) Even if we get EEXIST, we still need to rehash if a shrink was
delayed. From Herbert Xu.
16) Fix interrupt mask handling in interrupt handler of r8169, from
Heiner Kallweit.
17) Fix leak in ehea driver, from Wen Yang"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
dpaa2-eth: fix race condition with bql frame accounting
chelsio: use BUG() instead of BUG_ON(1)
net: devlink: skip info_get op call if it is not defined in dumpit
net: phy: bcm54xx: Encode link speed and activity into LEDs
tipc: change to check tipc_own_id to return in tipc_net_stop
net: usb: aqc111: Extend HWID table by QNAP device
net: sched: Kconfig: update reference link for PIE
net: dsa: qca8k: extend slave-bus implementations
net: dsa: qca8k: remove leftover phy accessors
dt-bindings: net: dsa: qca8k: support internal mdio-bus
dt-bindings: net: dsa: qca8k: fix example
net: phy: don't clear BMCR in genphy_soft_reset
bpf, libbpf: clarify bump in libbpf version info
bpf, libbpf: fix version info and add it to shared object
rxrpc: avoid clang -Wuninitialized warning
tipc: tipc clang warning
net: sched: fix cleanup NULL pointer exception in act_mirr
r8169: fix cable re-plugging issue
net: ethernet: ti: fix possible object reference leak
net: ibm: fix possible object reference leak
...
As part of the TX completion path, qeth_release_skbs() frees the completed
skbs with __skb_queue_purge(). This ends in kfree_skb(), reporting every
completed skb as dropped.
On the other hand when dropping an skb in .ndo_start_xmit, we end up
calling consume_skb()... where we should be using kfree_skb() so that
drop monitors get notified.
Switch the drop/consume logic around, and also don't accumulate dropped
packets in the tx_errors statistics.
Fixes: dc149e3764 ("s390/qeth: replace open-coded skb_queue_walk()")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ucast IP table is utilized by some of the L3-specific sysfs attributes
that qeth_l3_create_device_attributes() provides. So initialize the table
_before_ registering the attributes.
Fixes: ebccc7397e ("s390/qeth: add missing hash table initializations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HW trap and VNICC configuration is exposed via sysfs, and may have
already been modified when qeth_l?_probe_device() attempts to initialize
them. So (1) initialize the VNICC values a little earlier, and (2) don't
bother about the HW trap mode, it was already initialized before.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
for 32-bit guests
s390: interrupt cleanup, introduction of the Guest Information Block,
preparation for processor subfunctions in cpu models
PPC: bug fixes and improvements, especially related to machine checks
and protection keys
x86: many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations; plus AVIC fixes.
Generic: memcg accounting
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
=XIzU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- some cleanups
- direct physical timer assignment
- cache sanitization for 32-bit guests
s390:
- interrupt cleanup
- introduction of the Guest Information Block
- preparation for processor subfunctions in cpu models
PPC:
- bug fixes and improvements, especially related to machine checks
and protection keys
x86:
- many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations
- AVIC fixes
Generic:
- memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
kvm: vmx: fix formatting of a comment
KVM: doc: Document the life cycle of a VM and its resources
MAINTAINERS: Add KVM selftests to existing KVM entry
Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
KVM: PPC: Fix compilation when KVM is not enabled
KVM: Minor cleanups for kvm_main.c
KVM: s390: add debug logging for cpu model subfunctions
KVM: s390: implement subfunction processor calls
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
KVM: arm/arm64: Remove unused timer variable
KVM: PPC: Book3S: Improve KVM reference counting
KVM: PPC: Book3S HV: Fix build failure without IOMMU support
Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
x86: kvmguest: use TSC clocksource if invariant TSC is exposed
KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
...
The current AP bus implementation periodically polls the AP configuration
to detect changes. When the AP configuration is dynamically changed via the
SE or an SCLP instruction, the changes will not be reflected to sysfs until
the next time the AP configuration is polled. The CHSC architecture
provides a Store Event Information (SEI) command to make notification of an
AP configuration change. This patch introduces a handler to process
notification from the CHSC SEI command by immediately kicking off an AP bus
scan-after-event.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Harald Freudenberger <FREUDE@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When we get an interrupt for a channel program, it is not
necessarily the final interrupt; for example, the issuing
guest may request an intermediate interrupt by specifying
the program-controlled-interrupt flag on a ccw.
We must not switch the state to idle if the interrupt is not
yet final; even more importantly, we must not free the translated
channel program if the interrupt is not yet final, or the host
can crash during cp rewind.
Fixes: e5f84dbaea ("vfio: ccw: return I/O results asynchronously")
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Several fixes, most notably fix for virtio on swiotlb systems.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcf/Y0AAoJECgfDbjSjVRpzC8H/RG46PnIpTe69jcuaM3zv7es
Tr2GLl65wPV5AZBGMlRjXEoOt6JknWamROhZL7hJ0/17XX4x1mmEQb9mxweE/TDy
yDiNueni+NdFEptzQOoVjZahPXDaGYjuXH+wCvmCscg6N7iSXWqpKG08m+yr3ATF
NBNvB693FLy7B60v4IIHlsYTqoKFeWPYRvE+HIaapTpENodTAjetGpXDIYJhCTRc
6Yh6uNOYlF7XV8gbYzh4U9IcptrLO4Wv1xcEFMbgUoBeHwEMMpO6pLUFgDZttq0v
eT7lxu5Wg73hACOEdS1fb9HREXa4jm3Iu4qgLxEDeze8Y/AqlUdd8CJGBSFC32A=
=1bSe
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several fixes, most notably fix for virtio on swiotlb systems"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: silence an unused-variable warning
virtio: hint if callbacks surprisingly might sleep
virtio-ccw: wire up ->bus_name callback
s390/virtio: handle find on invalid queue gracefully
virtio-ccw: diag 500 may return a negative cookie
virtio_balloon: remove the unnecessary 0-initialization
virtio-balloon: improve update_balloon_size_func
virtio-blk: Consider virtio_max_dma_size() for maximum segment size
virtio: Introduce virtio_max_dma_size()
dma: Introduce dma_max_mapping_size()
swiotlb: Add is_swiotlb_active() function
swiotlb: Introduce swiotlb_max_mapping_size()
Return the bus id of the ccw proxy device. This makes 'ethtool -i'
show a more useful value than 'virtio' in the bus-info field.
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A queue with a capacity of zero is clearly not a valid virtio queue.
Some emulators report zero queue size if queried with an invalid queue
index. Instead of crashing in this case let us just return -ENOENT. To
make that work properly, let us fix the notifier cleanup logic as well.
Cc: stable@vger.kernel.org
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Working with the vfio-ap driver let to some revisit of the way
how an ap (queue) device is removed from the driver.
With the current implementation all the cleanup was done before
the driver even got notified about the removal. Now the ap
queue removal is done in 3 steps:
1) A preparation step, all ap messages within the queue
are flushed and so the driver does 'receive' them.
Also a new state AP_STATE_REMOVE assigned to the queue
makes sure there are no new messages queued in.
2) Now the driver's remove function is invoked and the
driver should do the job of cleaning up it's internal
administration lists or whatever. After 2) is done
it is guaranteed, that the driver is not invoked any
more. On the other hand the driver has to make sure
that the APQN is not accessed any more after step 2
is complete.
3) Now the ap bus code does the job of total cleanup of the
APQN. A reset with zero is triggered and the state of
the queue goes to AP_STATE_UNBOUND.
After step 3) is complete, the ap queue has no pending
messages and the APQN is cleared and so there are no
requests and replies lingering around in the firmware
queue for this APQN. Also the interrupts are disabled.
After these remove steps the ap queue device may be assigned
to another driver.
Stress testing this remove/probe procedure showed a problem with the
correct module reference counting. The actual receive of an reply in
the driver is done asynchronous with completions. So with a driver
change on an ap queue the message flush triggers completions but the
threads waiting for the completions may run at a time where the queue
already has the new driver assigned. So the module_put() at receive
time needs to be done on the driver module which queued the ap
message. This change is also part of this patch.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- A copy of Arnds compat wrapper generation series
- Pass information about the KVM guest to the host in form the control
program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels working again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the
CPU measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and
store thenn as event raw data.
- Bug fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcfh4QAAoJEDjwexyKj9rgXVAH/RzVbi3vznldujSNfCFTZKPu
EmFFAZIfbhifW3szfylyOJL52pFhxjcWzY0hkFEkbs2t90sn8l1BNkDscYZtfNHC
XvN3N9LsHyxOeyxvQuWLSio58qm+Lr1L0UrIhbMvqyAVkOLmIHvybFwi83OkMptm
djoL8NbuNsAA2s26y2bZLNtU7FmOW5smJIlnt7H4dmK4SFylqZKS/EnUZxGDgn+7
UrrTTOQUir0QZ8vraANsP1M0/LqPcd2YusLmj4jOdZ5Muc2Ch2AA991FofqdKShO
/8cGlsIzwHWGgdnP/YDea5gbetvonayYduixKy3EnYpWQ9iogiBjH4G7QNxcncs=
=v26J
-----END PGP SIGNATURE-----
Merge tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- A copy of Arnds compat wrapper generation series
- Pass information about the KVM guest to the host in form the control
program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels working again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the CPU
measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and store
thenn as event raw data.
- Bug fixes and cleanups
* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390/cpum_cf: Add kernel message exaplanations"
s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
s390/suspend: fix prefix register reset in swsusp_arch_resume
s390: warn about clearing als implied facilities
s390: allow overriding facilities via command line
s390: clean up redundant facilities list setup
s390/als: remove duplicated in-place implementation of stfle
s390/cio: Use cpa range elsewhere within vfio-ccw
s390/cio: Fix vfio-ccw handling of recursive TICs
s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
s390/cpum_cf: Add kernel message exaplanations
s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
s390/cpum_cf: add ctr_stcctm() function
s390/cpum_cf: move common functions into a separate file
s390/cpum_cf: introduce kernel_cpumcf_avail() function
s390/cpu_mf: replace stcctm5() with the stcctm() function
s390/cpu_mf: add store cpu counter multiple instruction support
s390/cpum_cf: Add minimal in-kernel interface for counter measurements
s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
...
The dasd_eckd_restore_device() function calls dasd_generic_read_dev_chars
with a temporary buffer on the stack. With CONFIG_VMAP_STACK=y this is
a vmalloc address but dasd_generic_restore_device() uses the address of
the buffer as I/O address. Circumvent this by using the already allocated
cqr->data buffer for the RDC data.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that qeth always uses dev_close() to shutdown the interface, we can
trust the locking and remove some custom state checks.
qeth_l?_stop_card() is no longer called for a card in UP state, so remove
the checks there too. This basically makes the UP state obsolete, so rip
out the whole thing (except for the sysfs-visible string).
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It makes no difference whether we
1. manually disarm the HW trap and call the offline code with
recovery_mode == 1, or
2. call the offline code with recovery_mode == 0, and let it disarm the
HW trap for us.
So consolidate the two code paths in the suspend callback.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qeth-wide workqueue is now only used by a single caller to schedule
close_dev work. Just put it on a system queue instead.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recovery code already runs in a kthread, we don't have to defer the
offlining further.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
smatch complains that __qeth_l3_set_offline() first accesses card->dev,
and then later checks whether the pointer is valid.
Since commit d3d1b205e8 ("s390/qeth: allocate netdevice early"), the
pointer is _always_ valid - that patch merely missed to remove this one
check.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When resetting an interface ("recovery"), qeth currently attempts to
elide the call to dev_close(). We initially only call .ndo_close to
quiesce the data path, and then offline & online the ccwgroup device.
If the reset succeeded, a call to .ndo_open then resumes the data path
along with some internal setup (dev_addr validation, RX modeset) that
dev_open() would have usually triggered.
dev_close() only gets called (via the close_dev worker) if the reset
action fails.
It's unclear whether this was initially done due to locking concerns, or
rather to execute the reset transparently. Either way, temporarily
closing the interface without dev_close() is fragile, and means we're
susceptible to various races and unexpected behaviour. For instance:
- Bypassing dev_deactivate_many() means that the qdiscs are not set to
__QDISC_STATE_DEACTIVATED. Consequently any intermittent TX completion
can wake up the txq, resulting in calls to .ndo_start_xmit while the
data path is down. We have custom state checking to detect this case
and drop such packets.
- Because the IFF_UP flag doesn't reflect the interface's actual state
during a reset, we have custom state checking in .ndo_open and
.ndo_close to guard against invalid calls.
- Considering that the reset might take a considerable amount of time
(in particular if an IO fails and we end up waiting for its timeout), we
_do_ want NETDEV_GOING_DOWN and NETDEV_DOWN events so that components
like bonding, team, bridge, macvlan, vlan, ... can take appropriate
action.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In its attempt to run only the minimal amount of tear down steps,
qeth_l2_stop_card() fails to reset the "is dev_addr registered?" flag
in some rare scenarios. But a future change to the tear down sequence
would cause us to _always_ hit this issue, so patch it up before that
code lands.
Fix it by unconditionally clearing the flag bit. This also allows us to
remove the additional cleanup step in qeth_dev_layer2_store().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting a L2 qeth device online, enable the HW trap as soon as the
control plane is available. This allows us to catch any error that
occurs during the very first commands.
In the same spirit, the offline code should disable the HW trap as the
very first step of its processing.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The offline code uses a specific RECOVER state to indicate that the
interface should be brought up when a qeth device is set online again.
Rather than having a specific card-state for this, just put it in an
internal flag bit and set the state to DOWN. When working with the
card's state transitions, this reduces the complexity quite a bit.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have a little function to see whether a channel
program address falls within a range of CCWs, let's use
it in the other places of code that make these checks.
(Why isn't ccw_head fully removed? Well, because this
way some longs lines don't have to be reflowed.)
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190222183941.29596-3-farman@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The routine ccwchain_calc_length() is tasked with looking at a
channel program, seeing how many CCWs are chained together by
the presence of the Chain-Command flag, and returning a count
to the caller.
Previously, it also considered a Transfer-in-Channel CCW as being
an appropriate mechanism for chaining. The problem at the time
was that the TIC CCW will almost certainly not go to the next CCW
in memory (because the CC flag would be sufficient), and so
advancing to the next 8 bytes will cause us to read potentially
invalid memory. So that comparison was removed, and the target
of the TIC is processed as a new chain.
This is fine when a TIC goes to a new chain (consider a NOP+TIC to
a channel program that is being redriven), but there is another
scenario where this falls apart. A TIC can be used to "rewind"
a channel program, for example to find a particular record on a
disk with various orientation CCWs. In this case, we DO want to
consider the memory after the TIC since the TIC will be skipped
once the requested criteria is met. This is due to the Status
Modifier presented by the device, though software doesn't need to
operate on it beyond understanding the behavior change of how the
channel program is executed.
So to handle this, we will re-introduce the check for a TIC CCW
but limit it by examining the target of the TIC. If the TIC
doesn't go back into the current chain, then current behavior
applies; we should stop counting CCWs and let the target of the
TIC be handled as a new chain. But, if the TIC DOES go back into
the current chain, then we need to keep looking at the memory after
the TIC for when the channel breaks out of the TIC loop. We can't
use tic_target_chain_exists() because the chain in question hasn't
been built yet, so we will redefine that comparison with some small
functions to make it more readable and to permit refactoring later.
Fixes: 405d566f98 ("vfio-ccw: Don't assume there are more ccws after a TIC")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190222183941.29596-2-farman@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
- Clarify KVM related kernel messages
- Interrupt cleanup
- Introduction of the Guest Information Block (GIB)
- Preparation for processor subfunctions in cpu model
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJcb9iJAAoJEONU5rjiOLn4mI8P/1J3IyN3imnZUkTM5JOMWhNM
TSwh5obvn7dT6URZ6BgM1DWIz9E/FKrEb2kU4xr1hwf/a69Q1cYKVmHSnzzIpxHQ
ZNjr7QbcBCsVJ8LtasOoMmgGnVvtBYKKHr4J8UcqeW9raP3YfPJmqyETufiE2lFy
G50r8EBFr9rPh7nK7ImAabKC/7Q/qxZ0729m71cu729/uBb/Wf6frqaDmFlA8362
YZC7KY+xEHZbWqKQqAt/x1TWAOb7nA5dCzemeRckNrs5+FN7rSBrje6SbWApZPfn
weteCVbJMLCoRMUTFjRy3YNz1x0gAC9VQT6Qz5Kz7dColVfJjTPWdYuKpbRsj+n1
PEv1uuDBNbDqdS29KG3Dk9cfzUgAU12g+Xsb+3168HsQbU7XU1v6gCoRaR8ccaoq
3k8Em0xusHa+uGI6K4knKmWboRrCA6FWHIaink4B2K7qIaVdWqTebhHaDiDx8qB8
JRNjxQDho92FpRzxHyajHtamFKPjGT/Guc0yWMIrPHBn97GktUnDD6E5AdhTRVxs
aXTZv7XFq5j307lc3qWsdAf4zGEaPbi9f2nHgFK8hJf+z560CmNbye9Rw6L96Lil
gy0rvSQgN+3xBtSKvq3DNrgoouupOS6kFu5iyYLBS8UUOztXttKEzTCs+M87/3AP
fphwixKEEXsMRWR2SJvG
=YeIb
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
KVM: s390: Features for 5.1
- Clarify KVM related kernel messages
- Interrupt cleanup
- Introduction of the Guest Information Block (GIB)
- Preparation for processor subfunctions in cpu model
Libudev relies on having a subsystem link for non-root devices. To
avoid libudev (and potentially other userspace tools) choking on the
matrix device let us introduce a matrix bus and with it the matrix
bus subsytem. Also make the matrix device reside within the matrix
bus.
Doing this we remove the forced link from the matrix device to the
vfio_ap driver and the device_type we do not need anymore.
Since the associated matrix driver is not the vfio_ap driver any more,
we have to change the search for the devices on the vfio_ap driver in
the function vfio_ap_verify_queue_reserved.
Fixes: 1fde573413 ("s390: vfio-ap: base implementation of VFIO AP device driver")
Cc: stable@vger.kernel.org
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the CCA master key is set twice with the same master key,
then the old and the current master key are the same and thus the
verification patterns are the same, too. The check to report if a
secure key is currently wrapped by the old master key erroneously
reports old mkvp in this case.
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Prior to dma unmap/free operations the ism driver tries to ensure
that the memory is no longer accessed by the HW. When errors
during deregistration of memory regions from the HW occur the ism
driver will not unmap/free this memory.
When we receive notification from the hypervisor that a PCI function
has been detached we can no longer access the device and would never
unmap/free these memory regions which led to complaints by the DMA
debug API.
Treat this kind of errors during the deregistration of memory regions
from the HW as success since it is already ensured that the memory
is no longer accessed by HW.
Reported-by: Karsten Graul <kgraul@linux.ibm.com>
Reported-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rather than special-casing OSN in a number of places, just give this
device type its own netdev_ops structure.
When setting up the OSN net_device, also skip the handling of the
various HW offloads (eg TSO). The device shouldn't be advertising any of
them, and the OSN code paths in qeth don't have support for them.
In particular RX VLAN filtering is not supported, so don't hook up those
callbacks in the netdev_ops.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement a trivial callback that exposes the queue sizes.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Accumulate per-TX queue statistics, and increase their size to 64 bit.
Don't bother with enabling/disabling the statistics, the overhead is
negligible.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Counting the number of function calls and the time spent in functions
is best left to proper tracing facilities.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth dynamically allocates an array for storing pointers to its
Output Queue structures. Switch this to a static array - we are
currently limited to 4 Output Queues, so shrinking the qeth_qdio_info
struct by just a few bytes doesn't justify the additional complexity.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once a qeth ccwgroup device is set online, it's also armed for internal
recovery. So allow for testing that code path via sysfs, regardless of
whether the interface is up or down.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netfilter conflicts were rather simple overlapping
changes.
However, the cls_tcindex.c stuff was a bit more complex.
On the 'net' side, Cong is fixing several races and memory
leaks. Whilst on the 'net-next' side we have Vlad adding
the rtnl-ness support.
What I've decided to do, in order to resolve this, is revert the
conversion over to using a workqueue that Cong did, bringing us back
to pure RCU. I did it this way because I believe that either Cong's
races don't apply with have Vlad did things, or Cong will have to
implement the race fix slightly differently.
Signed-off-by: David S. Miller <davem@davemloft.net>
When an alternate driver (vfio-ap) has bound an ap queue and this
binding is revised the ap queue device is in an intermittent
state not bound to any driver. The internal state variable
covered this with the state AP_STATE_BORKED which is also used to
reflect broken devices. When now an ap bus scan runs such a
device is destroyed and on the next scan reconstructed.
So a stress test with high frequency switching the queue driver
between the default and the vfio-ap driver hit this gap and the
queue was removed until the next ap bus scan. This fix now
introduces another state for the in-between condition for a queue
momentary not bound to a driver and so the ap bus scan function
skips this device instead of removing it.
Also some very slight but maybe helpful debug feature messages
come with this patch - in particular a message showing that a
broken card/queue device will get removed.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This calls the existing errno translation helpers from the callbacks,
adding trivial wrappers where necessary. For cmds that have no
sophisticated errno translation, default to -EIO.
For IPA cmds with no callback, fall back to a minimal default. This is
currently being used by qeth_l3_send_setrouting().
Thus having all converted all callbacks, remove the legacy path in
qeth_send_control_data_cb().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By letting the callbacks deal with error translation, we no longer need
to pass the raw error codes back to the originator. This allows us to
slim down the callback's private data, and nicely simplifies the code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Error propagation from cmd callbacks currently works in a way where
qeth_send_control_data_cb() picks the raw HW code from the response,
and the cmd's originator later translates this into an errno.
The callback itself only returns 0 ("done") or 1 ("expect more data").
This is
1. limiting, as the only means for the callback to report an internal
error is to invent pseudo HW codes (such as IPA_RC_ENOMEM), that
the originator then needs to understand. For non-IPA callbacks, we
even provide a separate field in the IO buffer metadata (iob->rc) so
the callback can pass back a return value.
2. fragile, as the originator must take care to not translate any errno
that is returned by qeth's own IO code paths (eg -ENOMEM). Also, any
originator that forgets to translate the HW codes potentially passes
garbage back to its caller. For instance, see
commit 2aa4867198 ("s390/qeth: translate SETVLAN/DELVLAN errors").
Introduce a new model where all HW error translation is done within the
callback, and the callback returns
> 0, if it expects more data (as before)
== 0, on success
< 0, with an errno
Start off with converting all callbacks to the new model that either
a) pass back pseudo HW codes, or b) have a dependency on a specific
HW error code. Also convert c) the one callback that uses iob->rc, and
d) qeth_setadpparms_change_macaddr_cb() so that it can pass back an
error back to qeth_l2_request_initial_mac() even when the cmd itself
was successful.
The old model remains supported: if the callback returns 0, we still
propagate the response's HW error code back to the originator.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending cmds via qeth_send_control_data(), qeth puts the request
on the IO channel and then blocks on the reply object until the response
has been received.
If the IO completes with error, there will never be a response and we
block until the reply-wait hits its timeout. For this case, connect the
request buffer to its reply object, so that we can immediately cancel
the wait.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code enqueues & dequeues a reply object from the waiter list
in various places. In particular, the dequeue & enqueue in
qeth_send_control_data_cb() looks fragile - this can cause
qeth_clear_ipacmd_list() to skip the active object.
Add some helpers, and boil the logic down by giving
qeth_send_control_data() the sole responsibility to add and remove
objects.
qeth_send_control_data_cb() and qeth_clear_ipacmd_list() will now only
notify the reply object to interrupt its wait cycle. This can cause
a slight delay in the removal, but that's no concern.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'len' specifies how much data we send to the HW, don't dump beyond this
boundary.
As of today this is no big concern - commands are built in full, zeroed
pages.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
csum offload and TSO have similar programming requirements. The TSO code
was reworked with commit "s390/qeth: enhance TSO control sequence",
adjust the csum control flow accordingly. Primarily this means replacing
custom helpers with more generic infrastructure.
Also, change the LP2LP check so that it warns on TX offload (not RX).
This is where reduced csum capability actually matters.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code attempts to enable all advertised HW csum offload features.
Future-proof this by enabling only those features that we actually use.
Also, the IPv4 header csum feature is only needed for TX on L3 devices.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code to fill the IPA length fields is duplicated three times across
the driver:
1. qeth_send_ipa_cmd() sets IPA_CMD_LENGTH, which matches the defaults
in the IPA_PDU_HEADER template.
2. for OSN, qeth_osn_send_ipa_cmd() bypasses this logic and inserts the
length passed by the caller.
3. SNMP commands (that can outgrow IPA_CMD_LENGTH) have their own way
of setting the length fields, via qeth_send_ipa_snmp_cmd().
Consolidate this into qeth_prepare_ipa_cmd(), which all originators of
IPA cmds already call during setup of their cmd. Let qeth_send_ipa_cmd()
pull the length from the cmd instead of hard-coding IPA_CMD_LENGTH.
For now, the SNMP code still needs to fix-up its length fields manually.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_l3_query_arp_cache_info() indicates a data length that's much
larger than the actual length of its request (ie. the value passed to
qeth_get_setassparms_cmd()). The confusion presumably comes from the
fact that the cmd _response_ can be quite large - but that's no concern
for the initial request IO.
Fixing this up allows us to use the generic qeth_send_ipa_cmd()
infrastructure.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix specification exception on z196 during ap probe
- A fix for suspend-to-disk, the VMAP stack patch broke the
swsusp_arch_suspend function
- The EMC CKD ioctl of the dasd driver needs an additional size
check for user space data
- Revert an incorrect patch for the PCI base code that removed
a bit lock that turned out to be required after all
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcYRUOAAoJEDjwexyKj9rgsj0H/0podjAiHUMw2F6ccgN6ZULp
g1zqUbRS4AoYX6v/sqMdCczbCSKwQdRR16juLaZHz1tfTP02KkR9KxVLg1/EmNNT
1NqeGVHrStR5MLFES+DNsxmJEVhye82aeT6Mj4NFvsYJvGr68ha90hMzB9ljc6jf
1BFTWgNOea+iQSZix0D9ALbmsoj4cFuSXG+U4N0D4jf+ZPCYLFXUsvsojtMdallS
WIdKTEj2MPMvmC+WpUY91ClkmXf27dmkoz7v/sRMNyW7n62sfmlA+34gRnAO4HVB
1bbIxJ1ss2ZyYtFL5mYI9ikYKsBL+WStmxl5HkhTXl7cdAtwgSbid9gcXqabi7A=
=k0u1
-----END PGP SIGNATURE-----
Merge tag 's390-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 bug fixes from Martin Schwidefsky:
- Fix specification exception on z196 during ap probe
- A fix for suspend-to-disk, the VMAP stack patch broke the
swsusp_arch_suspend function
- The EMC CKD ioctl of the dasd driver needs an additional size check
for user space data
- Revert an incorrect patch for the PCI base code that removed a bit
lock that turned out to be required after all
* tag 's390-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
Revert "s390/pci: remove bit_lock usage in interrupt handler"
s390/zcrypt: fix specification exception on z196 during ap probe
s390/dasd: fix using offset into zero size array error
s390/suspend: fix stack setup in swsusp_arch_suspend
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.
Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"This pull request is dedicated to the upcoming snowpocalypse parts 2
and 3 in the Pacific Northwest:
1) Drop profiles are broken because some drivers use dev_kfree_skb*
instead of dev_consume_skb*, from Yang Wei.
2) Fix IWLWIFI kconfig deps, from Luca Coelho.
3) Fix percpu maps updating in bpftool, from Paolo Abeni.
4) Missing station release in batman-adv, from Felix Fietkau.
5) Fix some networking compat ioctl bugs, from Johannes Berg.
6) ucc_geth must reset the BQL queue state when stopping the device,
from Mathias Thore.
7) Several XDP bug fixes in virtio_net from Toshiaki Makita.
8) TSO packets must be sent always on queue 0 in stmmac, from Jose
Abreu.
9) Fix socket refcounting bug in RDS, from Eric Dumazet.
10) Handle sparse cpu allocations in bpf selftests, from Martynas
Pumputis.
11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
Feitkau.
12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
Greg Kroah-Hartman.
13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
ccid, from Eric Dumazet.
14) Need to reload WoL password into bcmsysport device after deep
sleeps, from Florian Fainelli.
15) Remove filter from mask before freeing in cls_flower, from Petr
Machata.
16) Missing release and use after free in error paths of s390 qeth
code, from Julian Wiedmann.
17) Fix lockdep false positive in dsa code, from Marc Zyngier.
18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.
19) Fix EQ firmware assert in qed driver, from Manish Chopra.
20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: dsa: b53: Fix for failure when irq is not defined in dt
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
net: Don't default Cavium PTP driver to 'y'
net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net/mlx5e: Don't overwrite pedit action when multiple pedit used
net/mlx5e: Update hw flows when encap source mac changed
qed*: Advance drivers version to 8.37.0.20
qed: Change verbosity for coalescing message.
qede: Fix system crash on configuring channels.
qed: Consider TX tcs while deriving the max num_queues for PF.
...
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAlxYboYSHGNvaHVja0By
ZWRoYXQuY29tAAoJEN7Pa5PG8C+vNa0P/0XfHrBmrbfUUDLKiHje20fv8uGFiLNy
OxiCeXCqpjmYJD69UQyBtBgHksViYCgbQ1TA8Ia7BEbGrXDt8yixr2jZXQ0LMFfS
ZX1A2uvHnKhEY/yTt6gvH3bX8qB43EEZJBbrXmJGRNcZhV6TVoy51hMpaLaETeLO
3fJ9G2mzV/n/wBrPlyWcbrNW4iRzGzIdA6bWTYSgeKE3S/srMqLnCv8EPwDPhQV0
jfg0RYRgwphPi9OJ2bBuqoWExin2ScVFjW2ld+q3Lxuz5TlIu7zCVttIKqV57ubL
MXDabPREMUI/lkp3TAb7q265Bv6iOkwXyFWDQIkBkwxqlK0tgU1mzXeYlTKCVDnz
BOI1zPdU7ilzHClm9LJ9JZ5QKJCtSIwrxXlshdrItoF0dIpmhoyvQU2JltCTXumM
yCF5iUAvK0NQQPzK1pR/EwGd7ZzqFAWglKoTjN8wHgVxusLpJCCd8j/N9UZ0sjT0
djIdgbredP58xKEOGY1Tpa3C/P6lQTmy5a4xI8e8AmTHW2LuvtzKU9AO4xCswzsb
WRVDhTFtsj5LDBZmpKRHOwMbN755mSDXpUnDKGhXsm63WhaGoezTgq9XaW4UpCb7
FcowLgtw2xlWPil8rhNc6738BKWi5KCtVyOT2jlgH3vhRK/H0QGM5pPRCNY2Mtx7
VtQyUVc70JRH
=8LtS
-----END PGP SIGNATURE-----
Merge tag 'vfio-ccw-20190204' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features
Pull vfio-ccw from Cornelia Huck with the following changes:
- A fix in ccw chain processing.
There is no need to use void pointers, all drivers are in agreement
about the underlying data structure of the SBAL arrays.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch implements the Set Guest Information Block operation
to request association or disassociation of a Guest Information
Block (GIB) with the Adapter Interruption Facility. The operation
is required to receive GIB alert interrupts for guest adapters
in conjunction with AIV and GISA.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-9-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Work for Bridgeport events is currently placed on a driver-wide
workqueue. If the card is removed and freed while any such work is still
active, this causes a use-after-free.
So put the events on a per-card queue, where we can control their
lifetime. As we also don't want stale events to last beyond an
offline & online cycle, flush this queue when setting the card offline.
Fixes: b4d72c08b3 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A card's close_dev work is scheduled on a driver-wide workqueue. If the
card is removed and freed while the work is still active, this causes a
use-after-free.
So make sure that the work is completed before freeing the card.
Fixes: 0f54761d16 ("qeth: Support VEPA mode")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error path in qeth_alloc_qdio_buffers() that takes care of
cleaning up the Output Queues is buggy. It first frees the queue, but
then calls qeth_clear_outq_buffers() with that very queue struct.
Make the call to qeth_clear_outq_buffers() part of the free action
(in the correct order), and while at it fix the naming of the helper.
Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever we fail before/while starting an IO, make sure to release the
IO buffer. Usually qeth_irq() would do this for us, but if the IO
doesn't even start we obviously won't get an interrupt for it either.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When trying to calculate the length of a ccw chain, we assume
there are ccws after a TIC. This can lead to overcounting and
copying garbage data from guest memory.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <d63748c1f1b03147bcbf401596638627a5e35ef7.1548082107.git.alifm@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Five minor bug fixes. The libfc one is a tiny memory leak, the zfcp
one is an incorrect user visible parameter and the rest are on error
legs or obscure features.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXFTUrCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishe/hAQCDic6S
FcD8TPv4jAt3qWun5vVCFAGCEuGet73p89sxpQEA51KcGrFjR9oCtZkd9nIzJL7M
enGaQ0RtT8wPMSUNklE=
=t6p4
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five minor bug fixes.
The libfc one is a tiny memory leak, the zfcp one is an incorrect user
visible parameter and the rest are on error legs or obscure features"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: 53c700: pass correct "dev" to dma_alloc_attrs()
scsi: bnx2fc: Fix error handling in probe()
scsi: scsi_debug: fix write_same with virtual_gb problem
scsi: libfc: free skb when receiving invalid flogi resp
scsi: zfcp: fix sysfs block queue limit output for max_segment_size
Since v2.6.35 commit 683229845f ("[SCSI] zfcp: Report scatter-gather
limits to SCSI and block layer"), zfcp set dma_parms.max_segment_size ==
PAGE_SIZE (but without using the setter dma_set_max_seg_size()) and
scsi_host_template.dma_boundary == PAGE_SIZE - 1.
v5.0-rc1 commit 50c2e9107f ("scsi: introduce a max_segment_size
host_template parameters") introduced a new field
scsi_host_template.max_segment_size. If an LLDD such as zfcp does not set
it, scsi_host_alloc() uses BLK_MAX_SEGMENT_SIZE = 65536 for
Scsi_Host.max_segment_size. __scsi_init_queue() announced the minimum of
Scsi_Host.max_segment_size and dma_parms.max_segment_size to the block
layer. For zfcp: min(65536, 4096) == 4096 which was still good.
v5.0 commit a8cf59a669 ("scsi: communicate max segment size to the DMA
mapping code") announces Scsi_Host.max_segment_size to the block layer and
overwrites dma_parms.max_segment_size with Scsi_Host.max_segment_size. For
zfcp dma_parms.max_segment_size == Scsi_Host.max_segment_size == 65536
which is also reflected in block queue limits.
$ cd /sys/bus/ccw/drivers/zfcp
$ cd 0.0.3c40/host5/rport-5:0-4/target5:0:4/5:0:4:10/block/sdi/queue
$ cat max_segment_size
65536
Zfcp I/O still works because dma_boundary implicitly still keeps the
effective max segment size <= PAGE_SIZE. However, dma_boundary does not
seem visible to user space, but max_segment_size is visible and shows a
misleading wrong value. Fix it and inherit the stable tag of a8cf59a669.
Devices on our bus ccw support DMA but no DMA mapping. Of multiple device
types on the ccw bus, only zfcp needs dma_parms for SCSI limits. So, leave
dma_parms setup in zfcp and do not move it to the bus.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 50c2e9107f ("scsi: introduce a max_segment_size host_template parameters")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The s390x diagnose 318 instruction sets the control program name code (CPNC)
and control program version code (CPVC) to provide useful information
regarding the OS during debugging. The CPNC is explicitly set to 4 to
indicate a Linux/KVM environment.
The CPVC is a 7-byte value containing:
- 3-byte Linux version code, currently set to 0
- 3-byte unique value, currently set to 0
- 1-byte trailing null
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <1544135405-22385-2-git-send-email-walling@linux.ibm.com>
[set version code to 0 until the structure is fully defined]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The older machines don't have the QCI instruction available.
With support for up to 256 crypto cards the probing of each
card has been extended to check card ids from 0 up to 255.
For machines with QCI support there is a filter limiting the
range of probed cards. The older machines (z196 and older)
don't have this filter and so since support for 256 cards is
in the driver all cards are probed. However, these machines
also require to have the card id fit into 6 bits. Exceeding
this limit results in a specification exception which happens
on every kernel startup even when there is no crypto configured
and used at all.
This fix limits the range of probed crypto cards to 64 if
there is no QCI instruction available to obey to the older
ap architecture and so fixes the specification exceptions
on z196 machines.
Cc: stable@vger.kernel.org # v4.17+
Fixes: af4a72276d ("s390/zcrypt: Support up to 256 crypto adapters.")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Dan Carpenter reported the following:
The patch 52898025cf: "[S390] dasd: security and PSF update patch
for EMC CKD ioctl" from Mar 8, 2010, leads to the following static
checker warning:
drivers/s390/block/dasd_eckd.c:4486 dasd_symm_io()
error: using offset into zero size array 'psf_data[]'
drivers/s390/block/dasd_eckd.c
4458 /* Copy parms from caller */
4459 rc = -EFAULT;
4460 if (copy_from_user(&usrparm, argp, sizeof(usrparm)))
^^^^^^^
The user can specify any "usrparm.psf_data_len". They choose zero by
mistake.
4461 goto out;
4462 if (is_compat_task()) {
4463 /* Make sure pointers are sane even on 31 bit. */
4464 rc = -EINVAL;
4465 if ((usrparm.psf_data >> 32) != 0)
4466 goto out;
4467 if ((usrparm.rssd_result >> 32) != 0)
4468 goto out;
4469 usrparm.psf_data &= 0x7fffffffULL;
4470 usrparm.rssd_result &= 0x7fffffffULL;
4471 }
4472 /* alloc I/O data area */
4473 psf_data = kzalloc(usrparm.psf_data_len, GFP_KERNEL
| GFP_DMA);
4474 rssd_result = kzalloc(usrparm.rssd_result_len, GFP_KERNEL
| GFP_DMA);
4475 if (!psf_data || !rssd_result) {
kzalloc() returns a ZERO_SIZE_PTR (0x16).
4476 rc = -ENOMEM;
4477 goto out_free;
4478 }
4479
4480 /* get syscall header from user space */
4481 rc = -EFAULT;
4482 if (copy_from_user(psf_data,
4483 (void __user *)(unsigned long)
usrparm.psf_data,
4484 usrparm.psf_data_len))
That all works great.
4485 goto out_free;
4486 psf0 = psf_data[0];
4487 psf1 = psf_data[1];
But now we're assuming that "->psf_data_len" was at least 2 bytes.
Fix this by checking the user specified length psf_data_len.
Fixes: 52898025cf ("[S390] dasd: security and PSF update patch for EMC CKD ioctl")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For recovery purposes, qeth keeps track of all registered VIDs. Replace
this by using the infrastructure introduced in
commit 9daae9bd47 ("net: Call add/kill vid ndo on vlan filter feature toggling").
By managing NETIF_F_HW_VLAN_CTAG_FILTER as a hw_feature,
netdev_update_features() will select it from dev->wanted_features
and replay all of the netdevice's VIDs to its ndo_vlan_rx_add_vid()
callback.
z/VM NICs strictly require VLAN registration, so don't expose it as
hw_feature there but add a little hack in qeth_enable_hw_features()
to make things work regardless.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a qeth card is offline, it has no connection to the HW. So none of
our control callbacks can run IO against it, and we can only cache the
input (eg a new MAC address) without providing proper feedback to the
caller. In this context, it seems much more reasonable to simply detach
the netdevice and let the kernel reject any interaction with it.
This also makes all sorts of internal state checks and locking obsolete.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Re-order the code flow a bit so that all initial HW setup is done before
putting the netdevice into play. For a netdevice that hasn't been
registered before, we also don't need to re-enable its HW features or
check for recovery actions.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At best this is redundant, at worst it papers over a race in the
offline / online code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 4789a21880 ("s390/qeth: fix race when setting MAC address")
resolved a race where our initial programming of dev_addr into the HW
and a call to ndo_set_mac_address() could run concurrently. In this
case, we could end up getting confused about which address was actually
set in the HW.
The quick fix was to introduce additional locking that blocks any
ndo_set_mac_address() while the device is being set online. But the race
primarily originated from the fact that we first register the netdevice,
and only then program its dev_addr. By re-ordering this sequence,
userspace will only be able to change the MAC address _after_ we have
finished with setting the initial dev_addr.
Still, the same MAC address race can also occur during a subsequent call
to qeth_l2_set_online(). So keep around the locking for now, until a
follow-up patch fully resolves this.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The L2 and L3 code for these ops is almost identical, we only need to
provide a custom ndo_validate_addr() for L2 that checks whether
programming the MAC address succeeded.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_qdio_cq_handler() doesn't replenish the Output Queue(s), and thus
has no reason to wake the txq.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate the code that marks the current buffer to be flushed, and
let qeth_fill_buffer() advance the Output Queue's buffer cursor.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Do not claim to run under z/VM if the hypervisor can not be identified
- Fix crashes due to outdated ASCEs in CR1
- Avoid a deadlock in regard to CPU hotplug
- Really fix the vdso mapping issue for compat tasks
- Avoid crash on restart due to an incorrect stack address
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcSFbpAAoJEDjwexyKj9rg7jgH/17R+vUTBQpK+5EGEJVU6n+D
fKm3kiQA8LRxvEgLCXn7HLVLOt0EdVesTHhk9C7dm8Vya7eCCbNIvnf61b+9CuVt
ezqdDAaQDxO5165WwP7RJU3vVfBHZsTGv7ysI/kjHdlWE4DSeDdHx1C0yE67YSCp
Btl+H3KZx98Ga0Uo0yn8Nmo3D4HFcg9T6KF7OA/3D5jILagbwktKI3+wJcL90LFx
EomvckGc60MOfvP550wIYu3izpCCbFC00Gir/RX0pmstHOwvITKMss7tnizIEvPu
NBKO+CenV0MMDzP8xjhWNcziuq9/OdzSeA1Q2vbnP5GVSe5sIrU6c7+QRXkA6uU=
=EvJV
-----END PGP SIGNATURE-----
Merge tag 's390-5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
- Do not claim to run under z/VM if the hypervisor can not be
identified
- Fix crashes due to outdated ASCEs in CR1
- Avoid a deadlock in regard to CPU hotplug
- Really fix the vdso mapping issue for compat tasks
- Avoid crash on restart due to an incorrect stack address
* tag 's390-5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
s390/vdso: correct vdso mapping for compat tasks
s390/smp: fix CPU hotplug deadlock with CPU rescan
s390/mm: always force a load of the primary ASCE on context switch
s390/early: improve machine detection
Some vqs may not need to be allocated when their related feature bits
are disabled. So callers may pass in such vqs with "names = NULL".
Then we skip such vq allocations.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 86a559787e ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
smp_rescan_cpus() is called without the device_hotplug_lock, which can lead
to a dedlock when a new CPU is found and immediately set online by a udev
rule.
This was observed on an older kernel version, where the cpu_hotplug_begin()
loop was still present, and it resulted in hanging chcpu and systemd-udev
processes. This specific deadlock will not show on current kernels. However,
there may be other possible deadlocks, and since smp_rescan_cpus() can still
trigger a CPU hotplug operation, the device_hotplug_lock should be held.
For reference, this was the deadlock with the old cpu_hotplug_begin() loop:
chcpu (rescan) systemd-udevd
echo 1 > /sys/../rescan
-> smp_rescan_cpus()
-> (*) get_online_cpus()
(increases refcount)
-> smp_add_present_cpu()
(new CPU found)
-> register_cpu()
-> device_add()
-> udev "add" event triggered -----------> udev rule sets CPU online
-> echo 1 > /sys/.../online
-> lock_device_hotplug_sysfs()
(this is missing in rescan path)
-> device_online()
-> (**) device_lock(new CPU dev)
-> cpu_up()
-> cpu_hotplug_begin()
(loops until refcount == 0)
-> deadlock with (*)
-> bus_probe_device()
-> device_attach()
-> device_lock(new CPU dev)
-> deadlock with (**)
Fix this by taking the device_hotplug_lock in the CPU rescan path.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.
This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
- A larger update for the zcrypt / AP bus code
+ Update two inline assemblies in the zcrypt driver to make gcc happy
+ Add a missing reply code for invalid special commands for zcrypt
+ Allow AP device reset to be triggered from user space
+ Split the AP scan function into smaller, more readable functions
- Updates for vfio-ccw and vfio-ap
+ Add maintainers and reviewer for vfio-ccw
+ Include facility.h in vfio_ap_drv.c to avoid fragile include chain
+ Simplicy vfio-ccw state machine
- Use the common code version of bust_spinlocks
- Make use of the DEFINE_SHOW_ATTRIBUTE
- Fix three incorrect file permissions in the DASD driver
- Remove bit spin-lock from the PCI interrupt handler
- Fix GFP_ATOMIC vs GFP_KERNEL in the PCI code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcLGoIAAoJEDjwexyKj9rgyN8IANaQvHbVBA3vz/Ssb6ZiR/K6
rTBoXjJQqyJ/cf6RZeFi1b4Douv4QWJw3s06KXbrdmK/ONm5rypXVfXlAhY71pg5
40BUb92MGXhJw6JFDQ50Udd6Z5r7r6RYR1puyg4tzHmBuNVL7FB5RqFm92UOkMOD
ZI03G1sfA6/1XUKhNfCfNBB6Jt6V+iAAex8bgrp09wAeoGnAO20oFuis9u7pLlNm
a5Cp9n7faXEN+qes1iBtVDr5o7opuhanwWKnhvsYTAbpOo7jGJ/47IPKT2Wfmurd
wkMZBEC+Ntk/IfkaBzp7azeISZD5EbucTcgo/I9nzq/aWeflfXXeYl7My0aQB48=
=Lqrh
-----END PGP SIGNATURE-----
Merge tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- A larger update for the zcrypt / AP bus code:
+ Update two inline assemblies in the zcrypt driver to make gcc happy
+ Add a missing reply code for invalid special commands for zcrypt
+ Allow AP device reset to be triggered from user space
+ Split the AP scan function into smaller, more readable functions
- Updates for vfio-ccw and vfio-ap
+ Add maintainers and reviewer for vfio-ccw
+ Include facility.h in vfio_ap_drv.c to avoid fragile include chain
+ Simplicy vfio-ccw state machine
- Use the common code version of bust_spinlocks
- Make use of the DEFINE_SHOW_ATTRIBUTE
- Fix three incorrect file permissions in the DASD driver
- Remove bit spin-lock from the PCI interrupt handler
- Fix GFP_ATOMIC vs GFP_KERNEL in the PCI code
* tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: rework ap scan bus code
s390/zcrypt: make sysfs reset attribute trigger queue reset
s390/pci: fix sleeping in atomic during hotplug
s390/pci: remove bit_lock usage in interrupt handler
s390/drivers: fix proc/debugfs file permissions
s390: convert to DEFINE_SHOW_ATTRIBUTE
MAINTAINERS/vfio-ccw: add Farhan and Eric, make Halil Reviewer
vfio: ccw: Merge BUSY and BOXED states
s390: use common bust_spinlocks()
s390/zcrypt: improve special ap message cmd handling
s390/ap: rework assembler functions to use unions for in/out register variables
s390: vfio-ap: include <asm/facility> for test_facility()
This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas. Additionally, we have
a pile of annotation, unused variable and minor updates. The big API
change is the updates for Christoph's DMA rework which include
removing the DISABLE_CLUSTERING flag. And finally there are a couple
of target tree updates.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXCEUNiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdjKAP9vrTTv
qFaYmAoRSbPq9ZiixaXLMy0K/6o76Uay0gnBqgD/fgn3jg/KQ6alNaCjmfeV3wAj
u1j3H7tha9j1it+4pUw=
=GDa+
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.
Additionally, we have a pile of annotation, unused variable and minor
updates.
The big API change is the updates for Christoph's DMA rework which
include removing the DISABLE_CLUSTERING flag.
And finally there are a couple of target tree updates"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
scsi: isci: request: mark expected switch fall-through
scsi: isci: remote_node_context: mark expected switch fall-throughs
scsi: isci: remote_device: Mark expected switch fall-throughs
scsi: isci: phy: Mark expected switch fall-through
scsi: iscsi: Capture iscsi debug messages using tracepoints
scsi: myrb: Mark expected switch fall-throughs
scsi: megaraid: fix out-of-bound array accesses
scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
scsi: fcoe: remove set but not used variable 'port'
scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
scsi: smartpqi: fix build warnings
scsi: smartpqi: update driver version
scsi: smartpqi: add ofa support
scsi: smartpqi: increase fw status register read timeout
scsi: smartpqi: bump driver version
scsi: smartpqi: add smp_utils support
scsi: smartpqi: correct lun reset issues
scsi: smartpqi: correct volume status
scsi: smartpqi: do not offline disks for transient did no connect conditions
scsi: smartpqi: allow for larger raid maps
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlwb7R8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjiID/97oDjMhNT7rwpuMbHw855h62j1hEN/m+N3
FI0uxivYoYZLD+eJRnMcBwHlKjrCX8iJQAcv9ffI3ThtFW7dnZT3atUacaZVR/Dt
IrxdymdBP3qsmuaId5NYBug7rJ+AiqFJKjEvCcSPu5X397J4I3SEbzhfvYLJ/aZX
16o0HJlVVIrcbmq1IP4HwiIIOaKXvPaw04L4z4fpeynRSWG7EAi8NLSnhlR4Rxbb
BTiMkCTsjRCFdyO6da4fvNQKWmPGPa3bJkYy3qR99cvJCeIbQjRyCloQlWNJRRgi
3eJpCHVxqFmN0/+DNTJVQEEr4H8o0AVucrLVct1Jc4pessenkpoUniP8vELqwlng
Z2VHLkhTfCEmvFlk82grrYdNvGATRsrbswt/PlP4T7rBfr1IpDk8kXDWF59EL2dy
ly35Sk3wJGHBl8qa+vEPXOAnaWdqJXuVGpwB4ifOIatOls8mOxwfZjiRc7x05/fC
1O4rR2IfLwRqwoYHs0AJ+h6ohOSn1mkGezl2Tch1VSFcJUOHmuYvraTaUi6hblpA
SslaAoEhO39hRBL0HsvsMeqVWM9uzqvFkLDCfNPdiA81H1258CIbo4vF8z6czCIS
eeXnTJxVhPVbZgb3a1a93SPwM6KIDZFoIijyd+NqjpU94thlnhYD0QEcKJIKH7os
2p4aHs6ktw==
=TRdW
-----END PGP SIGNATURE-----
Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"This is the main pull request for block/storage for 4.21.
Larger than usual, it was a busy round with lots of goodies queued up.
Most notable is the removal of the old IO stack, which has been a long
time coming. No new features for a while, everything coming in this
week has all been fixes for things that were previously merged.
This contains:
- Use atomic counters instead of semaphores for mtip32xx (Arnd)
- Cleanup of the mtip32xx request setup (Christoph)
- Fix for circular locking dependency in loop (Jan, Tetsuo)
- bcache (Coly, Guoju, Shenghui)
* Optimizations for writeback caching
* Various fixes and improvements
- nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
* host and target support for NVMe over TCP
* Error log page support
* Support for separate read/write/poll queues
* Much improved polling
* discard OOM fallback
* Tracepoint improvements
- lightnvm (Hans, Hua, Igor, Matias, Javier)
* Igor added packed metadata to pblk. Now drives without metadata
per LBA can be used as well.
* Fix from Geert on uninitialized value on chunk metadata reads.
* Fixes from Hans and Javier to pblk recovery and write path.
* Fix from Hua Su to fix a race condition in the pblk recovery
code.
* Scan optimization added to pblk recovery from Zhoujie.
* Small geometry cleanup from me.
- Conversion of the last few drivers that used the legacy path to
blk-mq (me)
- Removal of legacy IO path in SCSI (me, Christoph)
- Removal of legacy IO stack and schedulers (me)
- Support for much better polling, now without interrupts at all.
blk-mq adds support for multiple queue maps, which enables us to
have a map per type. This in turn enables nvme to have separate
completion queues for polling, which can then be interrupt-less.
Also means we're ready for async polled IO, which is hopefully
coming in the next release.
- Killing of (now) unused block exports (Christoph)
- Unification of the blk-rq-qos and blk-wbt wait handling (Josef)
- Support for zoned testing with null_blk (Masato)
- sx8 conversion to per-host tag sets (Christoph)
- IO priority improvements (Damien)
- mq-deadline zoned fix (Damien)
- Ref count blkcg series (Dennis)
- Lots of blk-mq improvements and speedups (me)
- sbitmap scalability improvements (me)
- Make core inflight IO accounting per-cpu (Mikulas)
- Export timeout setting in sysfs (Weiping)
- Cleanup the direct issue path (Jianchao)
- Export blk-wbt internals in block debugfs for easier debugging
(Ming)
- Lots of other fixes and improvements"
* tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
kyber: use sbitmap add_wait_queue/list_del wait helpers
sbitmap: add helpers for add/del wait queue handling
block: save irq state in blkg_lookup_create()
dm: don't reuse bio for flushes
nvme-pci: trace SQ status on completions
nvme-rdma: implement polling queue map
nvme-fabrics: allow user to pass in nr_poll_queues
nvme-fabrics: allow nvmf_connect_io_queue to poll
nvme-core: optionally poll sync commands
block: make request_to_qc_t public
nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
nvme-tcp: fix endianess annotations
nvmet-tcp: fix endianess annotations
nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
nvme-pci: only set nr_maps to 2 if poll queues are supported
nvmet: use a macro for default error location
nvmet: fix comparison of a u16 with -1
blk-mq: enable IO poll if .nr_queues of type poll > 0
blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
blk-mq: skip zero-queue maps in blk_mq_map_swqueue
...
Most SCSI drivers want to enable "clustering", that is merging of
segments so that they might span more than a single page. Remove the
ENABLE_CLUSTERING define, and require drivers to explicitly set
DISABLE_CLUSTERING to disable this feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rework of the AP bus scan code. The ap_scan_bus() function
is large, so this patch splits the code by introducing a new
new function _ap_scan_bus_adapter() which deals with just
one adapter and thus reduces the scan function code complexity.
Now the AP bus scan can handle a type change of an crypto
adapter on the fly (e.g. from CEX5 to CEX6). This may be
the case with newer versions of zVM where the card may
be pure virtual and a type change is just one click.
However a type or function change requires to unregister
all queue devices and the card device and re-register them.
Comments around the AP bus scan code have been added and/or
improved to provide some hopefully useful hints about what
the code is actually doing.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Until now there is no way to reset a AP queue or card. Driving a card
or queue offline and online again does only toggle the 'software'
online state. The only way to trigger a (hardware) reset is by running
hot-unplug/hot-plug for example on the HMC.
This patch makes the queue reset attribute in sysfs writable.
Writing into this attribute triggers a reset on the AP queue's state
machine. So the AP queue is flushed and state machine runs through the
initial states which cause a reset (PQAP(RAPQ)) and a re-registration
to interrupts (PQAP(AQIC)) if available.
The reset sysfs attribute is writable by root only. So only an
administrator is allowed to initiate a reset of AP queues. Please note
that the queue's counter values are left untouched by the reset.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove write permissions for fops without a write callback.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
VFIO_CCW_STATE_BOXED and VFIO_CCW_STATE_BUSY have
identical actions for the same events.
Let's merge both into a single state to simplify the code.
We choose to keep VFIO_CCW_STATE_BUSY.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <1539767923-10539-2-git-send-email-pmorel@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Several conflicts, seemingly all over the place.
I used Stephen Rothwell's sample resolutions for many of these, if not
just to double check my own work, so definitely the credit largely
goes to him.
The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) while chaning the initial
argument in the function call in the moved code.
The net/dsa/master.c conflict had to do with a bug fix intermixing of
making dsa_master_set_mtu() static with the fixing of the tagging
attribute location.
cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classifiction.
__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy. Or at least I think it was :-)
Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes on how the sdif and caller_net are calculated
in these code paths in net-next.
The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwNpb0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwGwH/00UHnXfxww3ixxz
zwTVDzptA6SPm6s84yJOWatM5fXhPiAltZaHSYF9lzRzNU71NCq7Frhq3fQUIXKM
OxqDn9nfSTWcjWTk2q5n2keyRV/KIn67YX7UgqFc1bO/mqtVjEgNWaMyblhI+e9E
giu1ZXayHr43jK1cDOmGExZubXUq7Vsc9TOlrd+d2SwIqeEP7TCMrPhnHDwCNvX2
UU5dtANpVzGtHaBcr37wJj+L8kODCc0f+PQ3g2ar5jTHst5SLlHp2u0AMRnUmgdi
VkGx+mu/uk8mtwUqMIMqhplklVoqK6LTeLqsY5Xt32SKruw9UqyJGdphLjW2QP/g
MkmA1lI=
=7kaD
-----END PGP SIGNATURE-----
Merge tag 'v4.20-rc6' into for-4.21/block
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the
two corruption fixes. We're going to be overhauling the direct dispatch
path, and we need to do that on top of the changes we made for that
in mainline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Complements
v2.6.35 commit 64deb6efdc ("[SCSI] zfcp: Use status_read_buf_num
provided by FCP channel") which replaced the hardcoded 16 with a
variable value
Also complements already existing fixups for above commit
v2.6.35 commit 8d88cf3f3b ("[SCSI] zfcp: Update status read mempool")
v3.10 commit 9edf7d75ee ("[SCSI] zfcp: status read buffers on first adapter open with link down")
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Suppose adapter (open) recovery is between opened QDIO queues and before
(the end of) initial posting of status read buffers (SRBs). This time
window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
causing by design looping with exponential increase sleeps in the function
performing exchange config data during recovery
[zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.
Suppose an event occurs for which the FCP channel would send an unsolicited
notification to zfcp by means of a previously posted SRB. We saw it with
local cable pull (link down) in multi-initiator zoning with multiple
NPIV-enabled subchannels of the same shared FCP channel.
As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial
status read buffers from within the adapter's ERP thread, the channel does
send an unsolicited notification.
Since v2.6.27 commit d26ab06ede ("[SCSI] zfcp: receiving an unsolicted
status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
adapter->stat_work to re-fill the just consumed SRB from a work item.
Now the ERP thread and the work item post SRBs in parallel. Both contexts
call the helper function zfcp_status_read_refill(). The tracking of
missing (to be posted / re-filled) SRBs is not thread-safe due to separate
atomic_read() and atomic_dec(), in order to depend on posting
success. Hence, both contexts can see
atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
(trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
zfcp_qdio_handler_error().
An obvious and seemingly clean fix would be to schedule stat_work from the
ERP thread and wait for it to finish. This would serialize all SRB
re-fills. However, we already have another work item wait on the ERP
thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the
open port recoveries during zfcp auto port scan, but in fact it waits for
any pending recovery including an adapter recovery. This approach leads to
a deadlock. [see also v3.19 commit 18f87a67e6 ("zfcp: auto port scan
resiliency"); v2.6.37 commit d3e1088d68
("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
v2.6.28 commit fca55b6fb5
("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
fixing v2.6.27 commit c57a39a45a
("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
v2.6.27 commit cc8c282963
("[SCSI] zfcp: Automatically attach remote ports")]
Instead make the accounting of missing SRBs atomic for parallel execution
in both the ERP thread and adapter->stat_work.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: d26ab06ede ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
Cc: <stable@vger.kernel.org> #2.6.27+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce separate zfcp module parameters to individually select support
for: DIF which should work (zfcp.dif, which used to be DIF+DIX, disabled)
or DIX+DIF which can cause trouble (zfcp.dix, new, disabled).
If DIX is enabled, we warn on zfcp driver initialization. As before, this
also reduces the maximum I/O request size to half, to support the worst
case of merged single sector requests with one protection data scatter
gather element per sector. This can impact the maximum throughput.
In DIF-only mode (zfcp.dif=1 zfcp.dix=0), we can use the full maximum I/O
request size as there is no protection data for zfcp.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_open().
Therefore extend dev_open() with and extra extack argument and update
all users. Most of the calls end up just encoding NULL, but bond and
team drivers have the extack readily available.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While ccw_io_helper() seems like intended to be exclusive in a sense that
it is supposed to facilitate I/O for at most one thread at any given
time, there is actually nothing ensuring that threads won't pile up at
vcdev->wait_q. If they do, all threads get woken up and see the status
that belongs to some other request than their own. This can lead to bugs.
For an example see:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788432
This race normally does not cause any problems. The operations provided
by struct virtio_config_ops are usually invoked in a well defined
sequence, normally don't fail, and are normally used quite infrequent
too.
Yet, if some of the these operations are directly triggered via sysfs
attributes, like in the case described by the referenced bug, userspace
is given an opportunity to force races by increasing the frequency of the
given operations.
Let us fix the problem by ensuring, that for each device, we finish
processing the previous request before starting with a new one.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Cc: stable@vger.kernel.org
Message-Id: <20180925121309.58524-3-pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently we have a race on vcdev->config in virtio_ccw_get_config() and
in virtio_ccw_set_config().
This normally does not cause problems, as these are usually infrequent
operations. However, for some devices writing to/reading from the config
space can be triggered through sysfs attributes. For these, userspace can
force the race by increasing the frequency.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Cc: stable@vger.kernel.org
Message-Id: <20180925121309.58524-2-pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwEZdIeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAlQH/19oax2Za3IPqF4X
DM3lal5M6zlUVkoYstqzpbR3MqUwgEnMfvoeMDC6mI9N4/+r2LkV7cRR8HzqQCCS
jDfD69IzRGb52VSeJmbOrkxBWsR1Nn0t4Z3rEeLPxwaOoNpRc8H973MbAQ2FKMpY
S4Y3jIK1dNiRRxdh52NupVkQF+djAUwkBuVk/rrvRJmTDij4la03cuCDAO+Di9lt
GHlVvygKw2SJhDR+z3ArwZNmE0ceCcE6+W7zPHzj2KeWuKrZg22kfUD454f2YEIw
FG0hu9qecgtpYCkLSm2vr4jQzmpsDoyq3ZfwhjGrP4qtvPC3Db3vL3dbQnkzUcJu
JtwhVCE=
=O1q1
-----END PGP SIGNATURE-----
Merge tag 'v4.20-rc5' into for-4.21/block
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and
also getting the merge fix that went into mainline that users are
hitting testing for-4.21/block and/or for-next.
* tag 'v4.20-rc5': (664 commits)
Linux 4.20-rc5
PCI: Fix incorrect value returned from pcie_get_speed_cap()
MAINTAINERS: Update linux-mips mailing list address
ocfs2: fix potential use after free
mm/khugepaged: fix the xas_create_range() error path
mm/khugepaged: collapse_shmem() do not crash on Compound
mm/khugepaged: collapse_shmem() without freezing new_page
mm/khugepaged: minor reorderings in collapse_shmem()
mm/khugepaged: collapse_shmem() remember to clear holes
mm/khugepaged: fix crashes due to misaccounted holes
mm/khugepaged: collapse_shmem() stop if punched or truncated
mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
mm/huge_memory: splitting set mapping+index before unfreeze
mm/huge_memory: rename freeze_page() to unmap_page()
initramfs: clean old path before creating a hardlink
kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
psi: make disabling/enabling easier for vendor kernels
proc: fixup map_files test on arm
debugobjects: avoid recursive calls with kmemleak
userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
...
There exist very few ap messages which need to have the 'special' flag
enabled. This flag tells the firmware layer to do some pre- and maybe
postprocessing. However, it may happen that this special flag is
enabled but the firmware is unable to deal with this kind of message
and thus returns with reply code 0x41. For example older firmware may
not know the newest messages triggered by the zcrypt device driver and
thus react with reject and the named reply code. Unfortunately this
reply code is not known to the zcrypt error routines and thus default
behavior is to switch the ap queue offline.
This patch now makes the ap error routine aware of the reply code and
so userspace is informed about the bad processing result but the queue
is not switched to offline state any more.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The driver uses test_facility(), but does not include the
corresponding include file explicitly. The driver currently builds
only thanks to the following include chain:
vfio_ap_drv.c
<linux/module.h>
<linux/elf.h>
<asm/elf.h>
<linux/compat.h>
<asm/uaccess.h>
<asm/facility.h>
Files should not rely on such fragile implicit includes.
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- Add two missing kfree calls on error paths in the vfio-ccw code
- Make sure that all data structures of a mediated vfio-ccw device are
initialized before registering it
- Fix a sparse warning in vfio-ccw
- A followup patch for the pgtable_bytes accounting, the page table
downgrade for compat processes missed a mm_dec_nr_pmds()
- Reject sampling requests in the PMU init function of the CPU
measurement counter facility
- With the vfio AP driver an AP queue needs to be reset on every device
probe as the alternative driver could have modified the device state
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJb/9clAAoJEDjwexyKj9rgDFIIAMifOGVivY5Dq+PbS2YfVJAS
QVMN6PZEyOZaLHk9rTxIFTAwaZ6i4i0vw01UOiE91VJkDz49wHenDiwaNELtVQrs
57FkzbEuAhjKJ6DaWIr7bWgeaBFUlC+Z4mNxXFaNI2olzNncHhDJME7lPI/XoBvu
zPlWBvliiNiaHAzLBDLw+zx1wyxyYjV+B8ZZ9HtH0xzHm3EOiRWpVsr2TbwePEmd
yC6fTYoDUSJHZF9eoVZaJ0p1wSa8d6NfTE7a5TZOIv7yZQHWo0/tyN0wzyoWb0Oo
7volx9/4BjbP7fBIkBN6LMPDNH8itTmrx6wdhekf+tElBj1iOOi27l8rcyTby00=
=AgxV
-----END PGP SIGNATURE-----
Merge tag 's390-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Add two missing kfree calls on error paths in the vfio-ccw code
- Make sure that all data structures of a mediated vfio-ccw device are
initialized before registering it
- Fix a sparse warning in vfio-ccw
- A followup patch for the pgtable_bytes accounting, the page table
downgrade for compat processes missed a mm_dec_nr_pmds()
- Reject sampling requests in the PMU init function of the CPU
measurement counter facility
- With the vfio AP driver an AP queue needs to be reset on every device
probe as the alternative driver could have modified the device state
* tag 's390-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: correct pgtable_bytes on page table downgrade
s390/zcrypt: reinit ap queue state machine during device probe
s390/cpum_cf: Reject request for sampling in event initialization
s390/cio: Fix cleanup when unsupported IDA format is used
s390/cio: Fix cleanup of pfn_array alloc failure
vfio: ccw: Register mediated device once all structures are initialized
s390/cio: make vfio_ccw_io_region static
Trivial conflict in net/core/filter.c, a locally computed
'sdif' is now an argument to the function.
Signed-off-by: David S. Miller <davem@davemloft.net>
The response for a SNMP request can consist of multiple parts, which
the cmd callback stages into a kernel buffer until all parts have been
received. If the callback detects that the staging buffer provides
insufficient space, it bails out with error.
This processing is buggy for the first part of the response - while it
initially checks for a length of 'data_len', it later copies an
additional amount of 'offsetof(struct qeth_snmp_cmd, data)' bytes.
Fix the calculation of 'data_len' for the first part of the response.
This also nicely cleans up the memcpy code.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until the vfio-ap driver came into live there was a well known
agreement about the way how ap devices are initialized and their
states when the driver's probe function is called.
However, the vfio device driver when receiving an ap queue device does
additional resets thereby removing the registration for interrupts for
the ap device done by the ap bus core code. So when later the vfio
driver releases the device and one of the default zcrypt drivers takes
care of the device the interrupt registration needs to get
renewed. The current code does no renew and result is that requests
send into such a queue will never see a reply processed - the
application hangs.
This patch adds a function which resets the aq queue state machine for
the ap queue device and triggers the walk through the initial states
(which are reset and registration for interrupts). This function is
now called before the driver's probe function is invoked.
When the association between driver and device is released, the
driver's remove function is called. The current implementation calls a
ap queue function ap_queue_remove(). This invokation has been moved to
the ap bus function to make the probe / remove pair for ap bus and
drivers more symmetric.
Fixes: 7e0bdbe5c2 ("s390/zcrypt: AP bus support for alternate driver(s)")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently, ccw, vop and remoteproc need some legacy virtio
APIs to create or access virtio rings, which are not supported
by packed ring. So disable packed ring on these transports
for now.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was introduced with v2.6.27 commit 287ac01acf ("[SCSI] zfcp: Cleanup
code in zfcp_erp.c") but would now suppress helpful -Wswitch compiler
warnings when building with W=1 such as the following forced example:
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_setup_act':
drivers/s390/scsi/zfcp_erp.c:220:2: warning: enumeration value 'ZFCP_ERP_ACTION_REOPEN_PORT' not handled in switch [-Wswitch]
switch (need) {
^~~~~~
But then again, only with W=1 we would notice unhandled enum cases.
Without the default cases and a missed unhandled enum case, the code might
perform unforeseen things we might not want...
As of today, we never run through the removed default case, so removing it
is no functional change. In the future, we never should run through a
default case but introduce the necessary specific case(s) to handle new
functionality.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This was introduced with v4.18 commit 8c3d20aada ("scsi: zfcp: fix
missing REC trigger trace for all objects in ERP_FAILED") but would now
suppress helpful -Wswitch compiler warnings when building with W=1 such as
the following forced example:
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_handle_failed':
drivers/s390/scsi/zfcp_erp.c:126:2: warning: enumeration value 'ZFCP_ERP_ACTION_REOPEN_PORT_FORCED' not handled in switch [-Wswitch]
switch (want) {
^~~~~~
But then again, only with W=1 we would notice unhandled enum cases.
Without the default cases and a missed unhandled enum case, the code might
perform unforeseen things we might not want...
As of today, we never run through the removed default case, so removing it
is no functional change. In the future, we never should run through a
default case but introduce the necessary specific case(s) to handle new
functionality.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For some reason the already existing substring "fall through" in the
comment is not sufficient for GCC to silence -Wimplicit-fallthrough.
CC [M] drivers/s390/scsi/zfcp_erp.o
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_lun_strategy':
drivers/s390/scsi/zfcp_erp.c:1065:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_OPEN)
^
drivers/s390/scsi/zfcp_erp.c:1068:2: note: here
case ZFCP_ERP_STEP_LUN_CLOSING:
^~~~
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Improve whatever the following simple invocation reported:
$ ./scripts/kernel-doc -none drivers/s390/scsi/*.h
While at it, improve some related kdoc,
including struct zfcp_fsf_ct_els in zfcp_fsf.h.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>