Each bnxt_napi structure may no longer be having both an rx ring and
a tx ring. Check for a valid ring before using it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, an rx and a tx ring are always paired with a completion ring.
We want to restructure it so that it is possible to have a dedicated
completion ring for tx or rx only.
The bnxt hardware uses a completion ring for rx and tx events. The driver
has to process the completion ring entries sequentially for the rx and tx
events. Using a dedicated completion ring for rx only or tx only has these
benefits:
1. A burst of rx packets can cause delay in processing tx events if the
completion ring is shared. If tx queue is stopped by BQL, this can cause
delay in re-starting the tx queue.
2. A completion ring is sized according to the rx and tx ring size rounded
up to the nearest power of 2. When the completion ring is shared, it is
sized by adding the rx and tx ring sizes and then rounded to the next power
of 2, often with a lot of wasted space.
3. Using dedicated completion ring, we can adjust the tx and rx coalescing
parameters independently for rx and tx.
The first step is to separate the rx and tx ring structures from the
bnxt_napi struct.
In this patch, an rx ring and a tx ring will point to the same bnxt_napi
struct to share the same completion ring. No change in ring assignment
and mapping yet.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By adding 3 separate functions to dump the different ring states.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HiSilicon host bridge driver
Fix 32-bit config reads (Dongdong Liu)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWhZsJAAoJEFmIoMA60/r8JO8P/2zpQEewcPoFqtPFoAXXNVCg
vRnUlSVs5kz3cj3gUjbSjannWJZXFvsFgz/V+f3nxaOSLel550ccphhdS+oyh3L+
NVeQka8nnIbsVmGvNmebxNteBXa2CTGlZB4snRHQw+n1XjacqPQOMeccN09jCXmK
GBdJvs1Xs2rphGHq52cLkkqUdSCEayUiYK/4WgAzcBe8EFy5kWvbObcoBuX9/3Lm
fjnoWPXYSZFr+uyW8Q5+MztrpXJeOZV/krRZjcH2NxnLr1Xs+PnrC/NNu3BZvKnH
qGyLc3vMIpeYS2VGiwJDKzmahyKm4Elh1iJNoywHIGPf3o0WzjJgnsiryZWomytd
nVueiL8Oy0wUxoLupnFGdBIgbNvBeQSdeqcrXzjRfYHdHn3iakQTarUpVjqDUOEW
4iO4R+Xohq6X4Yhdr9RFxg2tCLk4dJebvwRNSGwTPmDnPZqzoQmg5uK84R1QrlD7
BM/ggHPryOogmeCqr7wCifkl73pMcvlK7maKUZcTgBz1E9aCeaGbz7Nc3KhUxSYV
jvP84dEBx0QN5M3523sn/TNRZsAztUaBgGJLwuLetPazOgGORZD2msMqpTCr1a3J
4TQjadvc5RWG4MBOeU9r2WdQeZdSwj/X41XVLVh3qCZaYQCz8aBGMb/PGdgWz2kZ
cBunX4VY1+S/EHu4abuw
=5MNY
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI bugfix from Bjorn Helgaas:
"Here's another fix for v4.4.
This fixes 32-bit config reads for the HiSilicon driver. Obviously
the driver is completely broken without this fix (apparently it
actually was tested internally, but got broken somehow in the process
of upstreaming it).
Summary:
HiSilicon host bridge driver
Fix 32-bit config reads (Dongdong Liu)"
* tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: hisi: Fix hisi_pcie_cfg_read() 32-bit reads
Pull sparc fixes from David Miller:
"Just some missing syscall wire ups"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Wire up mlock2 system call.
sparc: Add all necessary direct socket system calls.
Pull networking fixes from David Miller:
1) Prevent XFRM per-cpu counter updates for one namespace from being
applied to another namespace. Fix from DanS treetman.
2) Fix RCU de-reference in iwl_mvm_get_key_sta_id(), from Johannes
Berg.
3) Remove ethernet header assumption in nft_do_chain_netdev(), from
Pablo Neira Ayuso.
4) Fix cpsw PHY ident with multiple slaves and fixed-phy, from Pascal
Speck.
5) Fix use after free in sixpack_close and mkiss_close.
6) Fix VXLAN fw assertion on bnx2x, from Yuval Mintz.
7) natsemi doesn't check for DMA mapping errors, from Alexey
Khoroshilov.
8) Fix inverted test in ip6addrlbl_get(), from ANdrey Ryabinin.
9) Missing initialization of needed_headroom in geneve tunnel driver,
from Paolo Abeni.
10) Fix conntrack template leak in openvswitch, from Joe Stringer.
11) Mission initialization of wq->flags in sock_alloc_inode(), from
Nicolai Stange.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close
net, socket, socket_wq: fix missing initialization of flags
drivers: net: cpsw: fix error return code
openvswitch: Fix template leak in error cases.
sctp: label accepted/peeled off sockets
sctp: use GFP_USER for user-controlled kmalloc
qlcnic: fix a loop exit condition better
net: cdc_ncm: avoid changing RX/TX buffers on MTU changes
geneve: initialize needed_headroom
ipv6: honor ifindex in case we receive ll addresses in router advertisements
addrconf: always initialize sysctl table data
ipv6/addrlabel: fix ip6addrlbl_get()
switchdev: bridge: Pass ageing time as clock_t instead of jiffies
sh_eth: fix 16-bit descriptor field access endianness too
veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
net: usb: cdc_ncm: Adding Dell DW5813 LTE AT&T Mobile Broadband Card
net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband Card
natsemi: add checks for dma mapping errors
rhashtable: Kill harmless RCU warning in rhashtable_walk_init
openvswitch: correct encoding of set tunnel action attributes
...
The GLIBC folks would like to eliminate socketcall support
eventually, and this makes sense regardless so wire them
all up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-12-31
Here's (probably) the last bluetooth-next pull request for the 4.5
kernel:
- Add support for BCM2E65 ACPI ID
- Minor fixes/cleanups in the bcm203x & bfusb drivers
- Minor debugfs related fix in 6lowpan code
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
Ethtool support for phy stats
This patchset add ethtool support for reading statistics from the PHY.
The Marvell and Micrel Phys are then extended to report receiver
packet errors and idle errors.
v2:
Fix linking when phylib is not enabled.
v3:
Inline helpers into ethtool.c, so fixing when phylib is a module.
v4:
Add missing static
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY counters receiver errors and errors while idle.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY counters receiver errors and errors while idle.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethernet PHYs can maintain statistics, for example errors while idle
and receive errors. Add an ethtool mechanism to retrieve these
statistics, using the same model as MAC statistics.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sctp_close, sctp_make_abort_user may return NULL because of memory
allocation failure. If this happens, it will bypass any state change
and never free the assoc. The assoc has no chance to be freed and it
will be kept in memory with the state it had even after the socket is
closed by sctp_close().
So if sctp_make_abort_user fails to allocate memory, we should abort
the asoc via sctp_primitive_ABORT as well. Just like the annotation in
sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
"Even if we can't send the ABORT due to low memory delete the TCB.
This is a departure from our typical NOMEM handling".
But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
dereference the chunk pointer, and system crash. So we should add
SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
places where it adds SCTP_CMD_REPLY cmd.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ceb5d58b21 ("net: fix sock_wake_async() rcu protection") from
the current 4.4 release cycle introduced a new flags member in
struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
from struct socket's flags member into that new place.
Unfortunately, the new flags field is never initialized properly, at least
not for the struct socket_wq instance created in sock_alloc_inode().
One particular issue I encountered because of this is that my GNU Emacs
failed to draw anything on my desktop -- i.e. what I got is a transparent
window, including the title bar. Bisection lead to the commit mentioned
above and further investigation by means of strace told me that Emacs
is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
reproducible 100% of times and the fact that properly initializing the
struct socket_wq ->flags fixes the issue leads me to the conclusion that
somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
preventing my Emacs from receiving any SIGIO's due to data becoming
available and it got stuck.
Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
member to zero.
Fixes: ceb5d58b21 ("net: fix sock_wake_async() rcu protection")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2015-12-29
This series contains updates to ixgbe and ixgbevf.
William Dauchy provides a fix for ixgbevf that was implemented for ixgbe,
commit 5d6002b7b8 ("ixgbe: Fix handling of NAPI budget when multiple
queues are enabled per vector"). The issue was that the polling routine
would increase the budget for receive to at least 1 per queue if multiple
queues were present, which resulted in receive packets being processed
when the budget was 0.
Emil provides minor cleanups for ixgbevf, one being that we need to
check rx_itr_setting with == and not &, since it is not a mask. Added
QSFP PHY support in ixgbe to allow for more accurate reporting of port
settings. Fixed the max RSS limit for X550 which is 63, not 64.
Veola fixes ixgbe ethtool reporting of backplane type interfaces as
1000/10000baseT link modes, instead, report the media as KR, KX or KX4
based on the backplane interface present.
Mark cleans up redundancy in the setting of hw_enc_features that makes
it appear that X550 has more encapsulation features than other devices.
Also do not set NETIF_F_SG any longer since that is set by the
register_netdev() call. Also fixed the X550EM_x revision check, which
needs to check a value, not just a bit.
Alex Duyck fixes additional bugs in ixgbe_clear_vf_vlans(), one being
that the mask was using a divide instead of a modulus, which resulted
in the mask bit being incorrectly set to 0 or 1 based on the value of
the VF being tested. Alex also found that he was not consistent in
using the "word" argument as an offset or as a register offset, so
made the code consistently use word as the offset in the array.
v2: dropped patch 8 of the original series, as it was undoing a part of
the fix Alex Duyck was doing in patch 9 of the original series.
Dropped based on feedback from Emil (the author).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla says:
====================
be2net: patch set
The following patch set contains some feature additions, code re-organization
and cleanup and a few non-critical fixes. Pls consider applying this to
the net-next tree. Thanks.
v3 changes: add a default case to the switch statement in patch 5 to
satisfy the compiler (-Wswitch).
v2 changes: replaced an if/else block that checks for error values with a
switch/case statement in patch 5.
Patch 1 fixes VF link state transition from disabled to auto that did
not work due to an issue in the FW. This issue could not be fixed in FW due
to some backward compatibility issues it causes with released drivers.
The issue has been fixed by introducing a new version (v2) of the cmd
from 10.6 FW onwards. This patch adds support for v2 version of this cmd.
Patch 2 reports a EOPNOTSUPP status to ethtool when the user tries to
configure BE3/SRIOV in VEPA mode as it is not supported by the chip.
Patch 3 cleansup FW flash image related constant definitions. Many of these
definitions (such as section offset values) were defined in decimal format
rather than hexa-decimal. This makes this part of the code un-readable.
Also some defines related to BE2 are labeld "g2" and defines related to BE3
are labeled "g3". This patch cleans up all of this to make this code more
readable.
Patch 4 moves the FW cmd code to be_cmds.c. All code relating to FW cmds
has been in be_cmds.[ch], excepting FW flash cmd related code.
This patch moves these routines from be_main.c to be_cmds.c.
Patch 5 adds a log message to report digital signature errors while
flashing a FW image. From FW version 11.0 onwards, the FW supports a new
"secure mode" feature (based on a jumper setting on the adapter.) In this
mode, the FW image when flashed is authenticated with a digital signature.
Patch 6 removes a line of code that has no effect.
Patch 7 removes some unused variables.
Patch 8 fixes port resource descriptor query via the GET_PROFILE FW cmd.
An earlier commit passed a specific pf_num while issuing this cmd as FW
returns descriptors for all functions when pf_num is zero. But, when pf_num
is set to a non-zero value, FW does not return the port resource descriptor.
This patch fixes this by setting pf_num to 0 while issuing the query cmd
and adds code to pick the correct NIC resource descriptor from the list of
descriptors returned by FW.
Patch 9 adds support for ethtool get-dump feature. In the past when this
option was not yet available, this feature was supported via the
--register-dump option as a workaround. This patch removes support for
FW-dump via --register-dump option as it is now available via --get-dump
option. Even though the "ethtool --register-dump" cmd which used to work
earlier, will now fail with ENOTSUPP error, we feel it is not an issue as
this is used only for diagnostics purpose.
Patch 10 bumps up the driver version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for ethtool's --get-dump option in be2net,
to retrieve FW dump. In the past when this option was not yet available,
this feature was supported via the --register-dump option as a workaround.
This patch removes support for FW-dump via --register-dump option as it is
now available via --get-dump option. Even though the
"ethtool --register-dump" cmd which used to work earlier, will now fail
with ENOTSUPP error, we feel it is not an issue as this is used only
for diagnostics purpose.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 72ef3a88fa ("be2net: set pci_func_num while issuing
GET_PROFILE_CONFIG cmd") passed a specific pf_num while issuing a
GET_PROFILE_CONFIG cmd as FW returns descriptors for all functions when
pf_num is zero. But, when pf_num is set to a non-zero value, FW does not
return the Port resource descriptor.
This patch fixes this by setting pf_num to 0 while issuing the query cmd
and adds code to pick the correct NIC resource descriptor from the list of
descriptors returned by FW.
Fixes: 72ef3a88fa ("be2net: set pci_func_num while issuing
GET_PROFILE_CONFIG cmd")
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eeh_error, fw_timeout, hw_error variables in the be_adapter structure are
not used anymore. An earlier patch that introduced adapter->err_flags to
store this information missed removing these variables.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a line of code that changes adapter->recommended_prio
value followed by yet another assignment.
Also, the variable is used to store the vlan priority value that is already
shifted to the PCP bits position in the vlan tag format. Hence, the name of
this variable is changed to recommended_prio_bits.
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(based on a jumper setting on the adapter.) In this mode, the FW image when
flashed is authenticated with a digital signature. This patch logs
appropriate error messages and return a status to ethtool when errors
relating to FW image authentication occur.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All code relating to FW cmds is in be_cmds.[ch] excepting FW flash cmd
related code. This patch moves these routines from be_main.c to be_cmds.c
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many constant definitions relating to the FW-image layout
(such as section offset values) were defined in decimal format rather than
hexa-decimal. This makes this part of the code un-readable. Also some
defines related to BE2 are labeld "g2" and defines related to BE3 are
labeled "g3". This patch cleans up all of this to make this code more
readable.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BE3 chip doesn't support VEPA mode.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VF link state setting transition from "disable" to "auto" does not work
due to a bug in SET_LOGICAL_LINK_CONFIG_V1 cmd in FW. This issue could not
be fixed in FW due to some backward compatibility issues it causes with
some released drivers. The issue has been fixed by introducing a new
version (v2) of the cmd from 10.6 FW onwards. In v2, to set the VF link
state to auto, both PLINK_ENABLE and PLINK_TRACK bits have to be set to 1.
The VF link state setting feature now works on Lancer chips too from
FW ver 10.6.315.0 onwards.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block fixes from Jens Axboe:
"Make the block layer great again.
Basically three amazing fixes in this pull request, split into 4
patches. Believe me, they should go into 4.4. Two of them fix a
regression, the third and last fixes an easy-to-trigger bug.
- Fix a bad irq enable through null_blk, for queue_mode=1 and using
timer completions. Add a block helper to restart a queue
asynchronously, and use that from null_blk. From me.
- Fix a performance issue in NVMe. Some devices (Intel Pxxxx) expose
a stripe boundary, and performance suffers if we cross it. We took
that into account for merging, but not for the newer splitting
code. Fix from Keith.
- Fix a kernel oops in lightnvm with multiple channels. From Matias"
* 'for-linus' of git://git.kernel.dk/linux-block:
lightnvm: wrong offset in bad blk lun calculation
null_blk: use async queue restart helper
block: add blk_start_queue_async()
block: Split bios on chunk boundaries
When I had rewritten the code for ixgbe_clear_vf_vlans() it looks like I
had transitioned back and forth between using word as an offset and using
word as a register offset. As a result I honestly don't see how the code
was working before other than the fact that resetting the VLANs on the VF
like didn't do much to clear them.
Another issue found is that the mask was using a divide instead of a
modulus. As a result the mask bit was incorrectly being set to either bit
0 or 1 based on the value of the VF being tested. As a result the wrong
VFs were having their VLANs cleared if they were enabled.
I have updated the code so that word represents the offset in the array.
This way we can use the modulus and xor operations and they will make sense
instead of being performed on a 4 byte aligned value.
I replaced the statement "(word % 2) ^ 1" with "~word % 2" in order to
reduce the line length as the line exceeded 80 characters with the register
name inserted. The two should be equivalent so the change should be safe.
Reported-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The X550EM_x revision check needs to check a value, not just a bit.
Use a mask and check the value. Also remove the redundant check
inside the ixgbe_enter_lplu_t_x550em, because it can only be called
when both the mac type and revision check pass.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
X550 allows for up to 64 RSS queues, but the driver can have max
of 63 (-1 MSIX vector for link).
On systems with >= 64 CPUs the driver will set the redirection table
for all 64 queues which will result in packets being dropped.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Clean up minor redundancy in the setting of hw_enc_features that
makes it appears that X550 uniquely has more encapsulation features
than other devices. The driver only supports one more feature, so
make it look that way. No longer set NETIF_F_SG since that is set
by the register_netdev call. Thanks to Alex Duyck for noticing this
slight confusion.
Reported-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ethtool reports backplane type interfaces as 1000/10000baseT link modes.
This has been corrected to report the media as KR, KX or KX4 based on the
backplane interface present.
Signed-off-by: Veola Nazareth <veola.nazareth@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add missing QSFP PHY types to allow for more accurate reporting of
port settings.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
adapter->rx_itr_setting is not a mask so check it with == instead of &
do not default to 12K interrupts in ixgbevf_set_itr()
There should be no functional effect from these changes.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is the same patch as for ixgbe but applied differently according to
busy polling. See commit 5d6002b7b8 ("ixgbe: Fix handling of NAPI
budget when multiple queues are enabled per vector")
Signed-off-by: William Dauchy <william@gandi.net>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Merge misc fixes from Andrew Morton:
"9 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/vmstat: fix overflow in mod_zone_page_state()
ocfs2/dlm: clear migration_pending when migration target goes down
mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
ocfs2: fix flock panic issue
m32r: add io*_rep helpers
m32r: fix build failure
arch/x86/xen/suspend.c: include xen/xen.h
mm: memcontrol: fix possible memcg leak due to interrupted reclaim
ocfs2: fix BUG when calculate new backup super
Pull vfs fix from Al Viro:
"Fix for 3.15 breakage of fcntl64() in arm OABI compat. -stable
fodder"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[PATCH] arm: fix handling of F_OFD_... in oabi_fcntl64()
mod_zone_page_state() takes a "delta" integer argument. delta contains
the number of pages that should be added or subtracted from a struct
zone's vm_stat field.
If a zone is larger than 8TB this will cause overflows. E.g. for a
zone with a size slightly larger than 8TB the line
mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
in mm/page_alloc.c:free_area_init_core() will result in a negative
result for the NR_ALLOC_BATCH entry within the zone's vm_stat, since 8TB
contain 0x8xxxxxxx pages which will be sign extended to a negative
value.
Fix this by changing the delta argument to long type.
This could fix an early boot problem seen on s390, where we have a 9TB
system with only one node. ZONE_DMA contains 2GB and ZONE_NORMAL the
rest. The system is trying to allocate a GFP_DMA page but ZONE_DMA is
completely empty, so it tries to reclaim pages in an endless loop.
This was seen on a heavily patched 3.10 kernel. One possible
explaination seem to be the overflows caused by mod_zone_page_state().
Unfortunately I did not have the chance to verify that this patch
actually fixes the problem, since I don't have access to the system
right now. However the overflow problem does exist anyway.
Given the description that a system with slightly less than 8TB does
work, this seems to be a candidate for the observed problem.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have found a BUG on res->migration_pending when migrating lock
resources. The situation is as follows.
dlm_mark_lockres_migration
res->migration_pending = 1;
__dlm_lockres_reserve_ast
dlm_lockres_release_ast returns with res->migration_pending remains
because other threads reserve asts
wait dlm_migration_can_proceed returns 1
>>>>>>> o2hb found that target goes down and remove target
from domain_map
dlm_migration_can_proceed returns 1
dlm_mark_lockres_migrating returns -ESHOTDOWN with
res->migration_pending still remains.
When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
with res->migration_pending. So clear migration_pending when target is
down.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
test_pages_in_a_zone() does not account for the possibility of missing
sections in the given pfn range. pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
sections to pass the test, leading to a kernel oops.
Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
for missing sections before proceeding into the zone-check code.
This also prevents a crash from offlining memory devices with missing
sections. Despite this, it may be a good idea to keep the related patch
'[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
missing sections' because missing sections in a memory block may lead to
other problems not covered by the scope of this fix.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
m32r allmodconfig was failing with the error:
error: implicit declaration of function 'read'
On checking io.h it turned out that 'read' is not defined but 'readb' is
defined and 'ioread8' will then obviously mean 'readb'.
At the same time some of the helper functions ioreadN_rep() and
iowriteN_rep() were missing which also led to the build failure.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
m32r allmodconfig is failing with:
In file included from ../include/linux/kvm_para.h:4:0,
from ../kernel/watchdog.c:26:
../include/uapi/linux/kvm_para.h:30:26: fatal error: asm/kvm_para.h: No such file or directory
kvm_para.h was not included in the build.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the build warning:
arch/x86/xen/suspend.c: In function 'xen_arch_pre_suspend':
arch/x86/xen/suspend.c:70:9: error: implicit declaration of function 'xen_pv_domain' [-Werror=implicit-function-declaration]
if (xen_pv_domain())
^
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory cgroup reclaim can be interrupted with mem_cgroup_iter_break()
once enough pages have been reclaimed, in which case, in contrast to a
full round-trip over a cgroup sub-tree, the current position stored in
mem_cgroup_reclaim_iter of the target cgroup does not get invalidated
and so is left holding the reference to the last scanned cgroup. If the
target cgroup does not get scanned again (we might have just reclaimed
the last page or all processes might exit and free their memory
voluntary), we will leak it, because there is nobody to put the
reference held by the iterator.
The problem is easy to reproduce by running the following command
sequence in a loop:
mkdir /sys/fs/cgroup/memory/test
echo 100M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs
memhog 150M
echo $$ > /sys/fs/cgroup/memory/cgroup.procs
rmdir test
The cgroups generated by it will never get freed.
This patch fixes this issue by making mem_cgroup_iter avoid taking
reference to the current position. In order not to hit use-after-free
bug while running reclaim in parallel with cgroup deletion, we make use
of ->css_released cgroup callback to clear references to the dying
cgroup in all reclaim iterators that might refer to it. This callback
is called right before scheduling rcu work which will free css, so if we
access iter->position from rcu read section, we might be sure it won't
go away under us.
[hannes@cmpxchg.org: clean up css ref handling]
Fixes: 5ac8fb31ad ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When resizing, it firstly extends the last gd. Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.
But it currently doesn't consider the situation that the backup super is
already done. And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:
BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);
So check whether the backup super is done and then do the updates.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use to_platform_device() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>