The nfsd duplicate reply cache should not be shared between network
namespaces.
The most straightforward way to fix this is just to move every global in
the code to per-net-namespace memory, so that's what we do.
Still todo: sort out which members of nfsd_stats should be global and
which per-net-namespace.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Since commit 10a68cdf10 (nfsd: fix performance-limiting session
calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with
1 TB of memory cannot be mounted anymore. The mount just hangs on the
client.
The gist of commit 10a68cdf10 is the change below.
-avail = clamp_t(int, avail, slotsize, avail/3);
+avail = clamp_t(int, avail, slotsize, total_avail/3);
Here are the macros.
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
`total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts
the values to `int`, which for 32-bit integers can only hold values
−2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1).
`avail` (in the function signature) is just 65536, so that no overflow
was happening. Before the commit the assignment would result in 21845,
and `num = 4`.
When using `total_avail`, it is causing the assignment to be
18446744072226137429 (printed as %lu), and `num` is then 4164608182.
My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the
server thinks there is no memory available any more for this client.
Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long`
fixes the issue.
Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but
`num = 4` remains the same.
Fixes: c54f24e338 (nfsd: fix performance-limiting session calculation)
Cc: stable@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The DRC appears to be effectively empty after an RPC/RDMA transport
reconnect. The problem is that each connection uses a different
source port, which defeats the DRC hash.
Clients always have to disconnect before they send retransmissions
to reset the connection's credit accounting, thus every retransmit
on NFS/RDMA will miss the DRC.
An NFS/RDMA client's IP source port is meaningless for RDMA
transports. The transport layer typically sets the source port value
on the connection to a random ephemeral port. The server already
ignores it for the "secure port" check. See commit 16e4d93f6d
("NFSD: Ignore client's source port on RDMA transports").
The Linux NFS server's DRC resolves XID collisions from the same
source IP address by using the checksum of the first 200 bytes of
the RPC call header.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts most of commit b8eee0e90f ("lockd: Show pid of lockd for
remote locks"), which caused remote locks to not be differentiated between
remote processes for NLM.
We retain the fixup for setting the client's fl_pid to a negative value.
Fixes: b8eee0e90f ("lockd: Show pid of lockd for remote locks")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: XueWei Zhang <xueweiz@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
GCC 9 now warns about calling memset() on partial structures when it
goes across multiple fields. This adds a helper for the place in
tracing that does this type of clearing of a structure.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXOrlfhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qoDhAP4mogBm0JjJ1LWr8RX2/X7qFm0x1zLz
5Mk0QKfeRP3MYgEAl2mV/HeFp7aMxEY2CKy0LslmaXPhamPx1r0LlfMgIws=
=drP3
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing warning fix from Steven Rostedt:
"Make the GCC 9 warning for sub struct memset go away.
GCC 9 now warns about calling memset() on partial structures when it
goes across multiple fields. This adds a helper for the place in
tracing that does this type of clearing of a structure"
* tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Silence GCC 9 array bounds warning
merge window, but should not wait four months before they appear in
a release. I also travelled a bit more than usual in the first part
of May, which didn't help with picking up patches and reports promptly.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJc6RkmAAoJEL/70l94x66DhEAH/ijCkibV9vOUu8n/lSxMjAzi
I/Y1VEaVRFuQ6u0QSjWBBg22tVsWuWiVbonJ63w3JMRwi5Q5zW9REE7EaKRAa/eC
FiFE7vTesYh6sGVwdMCwoinjMDyCp7hybvtBc608+MWhVmrdzTYtPm5N85wxIDtW
xH5Kr2mVeLC43X3vfegolmXZ1obAbZEToJvOgJrYFhnzsmVYYl182kfGtrppBoO0
XXDPuDRGpTrm6A2oADMdOv+mT9p51pHsedmHQaDGXwAGEC/BkOGKdIdBfwppEwy7
QP2NGqwkHIyghV1aCPacT6O6G6xL0i2rfvlJ7+e6o7deU4uMXAqIdQ2DbIcHy3g=
=5IW2
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"The usual smattering of fixes and tunings that came in too late for
the merge window, but should not wait four months before they appear
in a release.
I also travelled a bit more than usual in the first part of May, which
didn't help with picking up patches and reports promptly"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (33 commits)
KVM: x86: fix return value for reserved EFER
tools/kvm_stat: fix fields filter for child events
KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard
kvm: selftests: aarch64: compile with warnings on
kvm: selftests: aarch64: fix default vm mode
kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size
KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
KVM: x86/pmu: do not mask the value that is written to fixed PMUs
KVM: x86/pmu: mask the result of rdpmc according to the width of the counters
x86/kvm/pmu: Set AMD's virt PMU version to 1
KVM: x86: do not spam dmesg with VMCS/VMCB dumps
kvm: Check irqchip mode before assign irqfd
kvm: svm/avic: fix off-by-one in checking host APIC ID
KVM: selftests: do not blindly clobber registers in guest asm
KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c
KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
kvm: vmx: Fix -Wmissing-prototypes warnings
KVM: nVMX: Fix using __this_cpu_read() in preemptible context
kvm: fix compilation on s390
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlzqGSoACgkQ8vlZVpUN
gaNhqQgAiUHwKalYrZ82NwBQGnHcKcWv3JEE9vt8Bsu4fPUzirrEqYiSudvj6nHv
8uYFKHmGx7+GEWxLfwlVZzRjLlgZqa0kpyfNFEL01KFdbFsKQN4gTYvvky+OVftr
nRZ7tp66Y5hErwn/Y0wWn9WHFOykhxGi+kv5m5CFZ7MNec/b+1H2U1hXkhSt6oug
IO2wLZYLFSPXlrqfJLV7HYJ/OX1mO7g1viCNGvpRmrvLmjmO09q0/6DF3QNAvGmj
sXXu0eV+N/Ir0so0RbeN60ZeDXaoyeOZbXFlH9zfJEgkoFv+adZjT65bQEvSUWQ2
J/v4rLXd8gmCiVwOuEbCoLKebT/nbg==
=Wf1M
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random fix from Ted Ts'o:
"Fix a soft lockup regression when reading from /dev/random in early
boot"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: fix soft lockup when trying to read from an uninitialized blocking pool
Fixes: eb9d1bf079: "random: only read from /dev/random after its pool has received 128 bits"
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Starting with GCC 9, -Warray-bounds detects cases when memset is called
starting on a member of a struct but the size to be cleared ends up
writing over further members.
Such a call happens in the trace code to clear, at once, all members
after and including `seq` on struct trace_iterator:
In function 'memset',
inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
./include/linux/string.h:344:9: warning: '__builtin_memset' offset
[8505, 8560] from the object at 'iter' is out of the bounds of
referenced subobject 'seq' with type 'struct trace_seq' at offset
4368 [-Warray-bounds]
344 | return __builtin_memset(p, c, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to avoid GCC complaining about it, we compute the address
ourselves by adding the offsetof distance instead of referring
directly to the member.
Since there are two places doing this clear (trace.c and trace_kdb.c),
take the chance to move the workaround into a single place in
the internal header.
Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
[ Removed unnecessary parenthesis around "iter" ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlzppnIACgkQ8vlZVpUN
gaOWcwf/YmIeCi7HHuOJG5STYhMZjbAoK7eCNSjmP0HBIpyZSBaSZg1/ZEmtTVA6
SyGWxYD2xymphkEcRQ20pF8h2CYurHsjYl9RH+Im2iaCzdeFKvgfYxSSsqsaZixM
ejQK22W6mVULd1RqFGNPeo+5v7Fxn6fK0zw2k5JrLjFnIRq/XIA7qMdjblPOcfi+
QT/K9a2DZ5vHBGDKjEiVA+a0HX6bxdGTiiT4LW+uiHUJUESBWNQJqOHJqno9VdFh
J97/3XJHMGPAbjD4AiINAL0x8IZ2FXx1H+QgVDnrxy8lVrYaMVvWMEokMQ7HvkFr
SmYddgBPUHO+kk4u34nznZNuesvOqQ==
=dFk1
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Bug fixes (including a regression fix) for ext4"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix dcache lookup of !casefolded directories
ext4: do not delete unlinked inode from orphan list on failed truncate
ext4: wait for outstanding dio during truncate in nojournal mode
ext4: don't perform block validity checks on the journal inode
code and I forgot to incorporate them.
I also added a small clean up patch that was sent to me a while ago
and I just noticed it.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXOixqRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qlPsAQCmNzno+SJMXLLIojZ80pqs9PqLqIrW
iBopFNihRBAkxgEAle8Pvr53S4R/Hjn6j++kxBm/uWaEfICOzyqcSyqxlQA=
=Sin+
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Tom Zanussi sent me some small fixes and cleanups to the histogram
code and I forgot to incorporate them.
I also added a small clean up patch that was sent to me a while ago
and I just noticed it"
* tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kernel/trace/trace.h: Remove duplicate header of trace_seq.h
tracing: Add a check_val() check before updating cond_snapshot() track_val
tracing: Check keys for variable references in expressions too
tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
Found by visual inspection, this wasn't caught by my xfstest, since it's
effect is ignoring positive dentries in the cache the fallback just goes
to the disk. it was introduced in the last iteration of the
case-insensitive patch.
d_compare should return 0 when the entries match, so make sure we are
correctly comparing the entire string if the encoding feature is set and
we are on a case-INsensitive directory.
Fixes: b886ee3e77 ("ext4: Support case-insensitive file name lookups")
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This is the same set of patches sent in the merge window as the final
pull except that Martin's read only rework is replaced with a simple
revert of the original change that caused the regression. Everything
else is an obvious fix or small cleanup.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXOh2RyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishe+mAP9PtAon
IUlSEcJaMhej3VSyjxWYxble0pbCkBYnuH220gEAk7eCISK3xwAdkWYD0wVLLqxo
9t8qgzKbZSPZVRRD8Tk=
=9p0X
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is the same set of patches sent in the merge window as the final
pull except that Martin's read only rework is replaced with a simple
revert of the original change that caused the regression.
Everything else is an obvious fix or small cleanup"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
Revert "scsi: sd: Keep disk read-only when re-reading partition"
scsi: bnx2fc: fix incorrect cast to u64 on shift operation
scsi: smartpqi: Reporting unhandled SCSI errors
scsi: myrs: Fix uninitialized variable
scsi: lpfc: Update lpfc version to 12.2.0.2
scsi: lpfc: add check for loss of ndlp when sending RRQ
scsi: lpfc: correct rcu unlock issue in lpfc_nvme_info_show
scsi: lpfc: resolve lockdep warnings
scsi: qedi: remove set but not used variables 'cdev' and 'udev'
scsi: qedi: remove memset/memcpy to nfunc and use func instead
scsi: qla2xxx: Add cleanup for PCI EEH recovery
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlzobRYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptwcD/99hOkZWNqX0FKjkrofywXBjX//UqBb2OQS
/7vBoWgSMN+SXDI08YdePCjreviDs4VjbP1V1EgBTbb0HpEApbAuTqx7fszbsJLi
Ld6pMkDpRp6RKttmaDW6iT39gZC3w9wOYusbC8pfrVbvhXm9CRLum78Q8h2rdl0c
HzIMopvGvvJazTYj/ZD8L/83Z6oqHPWojnXPIK1CNw6PQ4+A1frD85WitW4Fragp
T5lx0ZBPLHe+1VPoIQg3Rq2ZZcQW2Kfm5mytw9sDG6KbG5/Vj7+jtF6X36QvuFhZ
fU2zWAN7zFVE0FvXxS/ze5lFI8/efkwIAa2xYvkkFWJ+FNBkOrNrhN1JgNyMQgTe
2r4dLPp3XGcfvCCndTnQdwNAGuc878X+bGwlxb1wjTRcElJRpflE1wBx2kzzdnjl
zD2dmUgxURJvY8clKbq/bpgoxLKtqGCsJy7mHOyCUTpflP7YrpvJnUcc14PARnDt
V2JlnTVNO2r9oZ7IBHPWtNLmFjZhba5BaQDD1EtUUgO3fId4wL1rJ52j5K9/2eg7
yC4qdKGZLQoHGTnn8qBY+BS8/bMeMxu6Lx4RqtgVa8r+dkKFhblIdOmYZnyevxSf
B5rtt8CJUU7d3edxZHp9jFiYVbmrc6CjIhRLYZyrLfQGCL3F6qFzozYd0Lwiwxhz
gx2TTsDfFg==
=lGyw
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190524' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Keith, with fixes from a few folks.
- bio and sbitmap before atomic barrier fixes (Andrea)
- Hang fix for blk-mq freeze and unfreeze (Bob)
- Single segment count regression fix (Christoph)
- AoE now has a new maintainer
- tools/io_uring/ Makefile fix, and sync with liburing (me)
* tag 'for-linus-20190524' of git://git.kernel.dk/linux-block: (23 commits)
tools/io_uring: sync with liburing
tools/io_uring: fix Makefile for pthread library link
blk-mq: fix hang caused by freeze/unfreeze sequence
block: remove the bi_seg_{front,back}_size fields in struct bio
block: remove the segment size check in bio_will_gap
block: force an unlimited segment size on queues with a virt boundary
block: don't decrement nr_phys_segments for physically contigous segments
sbitmap: fix improper use of smp_mb__before_atomic()
bio: fix improper use of smp_mb__before_atomic()
aoe: list new maintainer for aoe driver
nvme-pci: use blk-mq mapping for unmanaged irqs
nvme: update MAINTAINERS
nvme: copy MTFA field from identify controller
nvme: fix memory leak for power latency tolerance
nvme: release namespace SRCU protection before performing controller ioctls
nvme: merge nvme_ns_ioctl into nvme_ioctl
nvme: remove the ifdef around nvme_nvm_ioctl
nvme: fix srcu locking on error return in nvme_get_ns_from_disk
nvme: Fix known effects
nvme-pci: Sync queues on reset
...
This Kselftest fixes update for Linux 5.2-rc2 consists of:
- 2 fixes to regressions introduced in kselftest Makefile test run output
refactoring work from Kees Cook.
- Adding Atom support to syscall_arg_fault test from Tong Bo.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAlzoUmUACgkQCwJExA0N
QxzwxhAAnwCnP1Z5WXTet3GO4tUbzqtoXC05j1Z+zRbxnfA72RSoBu/by8uG4gZf
OR9YQ1F7SR+K2V4CkwO8j5BarHun992HuOAQuFXHFIx3oVxWMDprIQr4hVNm7hRF
14mLDsFWe/lHQFOmcZ2vFwCG3uyr1VukWLW0FPknpTaqb5NcaHyYwgheukmUyw5v
CSUZ/OKetxD5DmYoxoIONvM0nIk6iLZtnblSCfiNe9XTzpTD9ngs9Gwn5m1ILN8F
HetnOXcjq52SnN5VR6oJt75CvTlqmj9EmuLhhd8PJ4Jgq2TCmRQFxc6WjnBh03IZ
8AMq5w4JPd+XEIf1QeNmfV2w9T4/ftFvAlMz46aUPDkeBfKLRCcTAlfkhotru6a9
Vxkk1PD/+wxFH22ktA8asnXhyfQAegpUWlR2e8Efr7uE1PCl55KBkU9BZYjla9GP
Av6wo03ENcuy/eJ/i61H8eygOXRw9pGdJut/P/tEPwBlgdfz8V8/oelX/WQvfUIz
fff6lxZgtzoStICOYJFHNCsNZIdIeOtdU+27U2BHRIUQkrkhctcFrn0xXe4ETls8
ZH1SvkIwFnD7Ib+Iztf3ZSyF3Bi4G3tqPfc1WNSbHbJ2LOa5HGZecyd+zJOvnng1
6smviLK3FcwkJn+Pl0/CRkzEzP/dwpqgE1eJPSR7iy++ypv7E7o=
=BYfS
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- Two fixes to regressions introduced in kselftest Makefile test run
output refactoring work (Kees Cook)
- Adding Atom support to syscall_arg_fault test (Tong Bo)
* tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/timers: Add missing fflush(stdout) calls
selftests: Remove forced unbuffering for test running
selftests/x86: Support Atom for syscall_arg_fault test
- Update checkpatch.pl to use DT vendor-prefixes.yaml
- Fix DT binding references to files converted to DT schema
- Clean-up Arm CPU binding examples to match schema
- Add Sifive block versioning scheme documentation
- Pass binding directory base to validation tools for reference lookups
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlzoTXoQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw2sBD/9GN9RJWPlNhfwEtQrkv8RplcKUaSWdT9m1
HrHDeU8IkKVFbjIv44qq6fGmoPJSGJf87kmPh38n42IvrL17dB+QaUKBOYqGzvLU
GXAwYVOAFSnW/Xt72dQz/8M9rnf9hd8uTTjlknZ0ssUwMHkORRUrQDUvhgrcdP+g
1sBJRrqgJhHtCtooeXZsDwK0J+Asw2PQA9dBgP+anROvVr7hR9w0dFyOW1cNeZo7
l1Eu8RRAH3yyOooCGM1B+Vy/9xmytBx1pHQm5EwOxptRVxZbHrSEZvO7pG5EMCb5
K+nUz7CaKgdNVF2bzOkkjpVrF3+qA+zhmSAji5sxDRsx5nhq1OuZKfqMW+6V4cyJ
lolBXQC/9IaFckZOrctlIfuB/slCxwmkO9frZ4Uv6co1fHCdAmq8FbU7ooEsKE40
uKAnr6TkkadkdLkkr8cW7DdEk769LA5Y4LMeUzxgEGz3dOz0C7GyU7wnMKCr2zep
Xs5KccNVXWZfxV4hFsNncqgSJi02ogRtORr7zzcD7Z/eoBT6ATNsCq1BMDzcdQbd
cPJ61521HUrO0PB0m92gPTLrUcF+PilFE8O19tlzM749gUDiKosuHgWTvEmmmHA6
lX1+d2cNWX+usnYfNtu+R7WufwGvXceoPje/LICxG74LEwXDZYBasl5tqhrt6V/J
EN+rL8jMZQ==
=7oNi
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree fixes from Rob Herring:
- Update checkpatch.pl to use DT vendor-prefixes.yaml
- Fix DT binding references to files converted to DT schema
- Clean-up Arm CPU binding examples to match schema
- Add Sifive block versioning scheme documentation
- Pass binding directory base to validation tools for reference lookups
* tag 'devicetree-fixes-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
checkpatch.pl: Update DT vendor prefix check
dt: bindings: mtd: replace references to nand.txt with nand-controller.yaml
dt-bindings: interrupt-controller: arm,gic: Fix schema errors in example
dt-bindings: arm: Clean up CPU binding examples
dt: fix refs that were renamed to json with the same file name
dt-bindings: Pass binding directory to validation tools
dt-bindings: sifive: describe sifive-blocks versioning
Here is another set of reviewed patches that adds SPDX tags to different
kernel files, based on a set of rules that are being used to parse the
comments to try to determine that the license of the file is
"GPL-2.0-or-later". Only the "obvious" versions of these matches are
included here, a number of "non-obvious" variants of text have been
found but those have been postponed for later review and analysis.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on the
patches are reviewers.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXOgmlw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk4rACfRqxGOGVLR/t6E9dDzOZRAdEz/mYAoJLZmziY
0YlSSSPtP5HI6JDh65Ng
=HXQb
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pule more SPDX updates from Greg KH:
"Here is another set of reviewed patches that adds SPDX tags to
different kernel files, based on a set of rules that are being used to
parse the comments to try to determine that the license of the file is
"GPL-2.0-or-later".
Only the "obvious" versions of these matches are included here, a
number of "non-obvious" variants of text have been found but those
have been postponed for later review and analysis.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on
the patches are reviewers"
* tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (85 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 125
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 123
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 122
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 121
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 120
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 119
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 116
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 114
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 113
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 111
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 110
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 106
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 105
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 103
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 101
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98
...
The kernel test robot has reported that the use of __this_cpu_add()
causes bug messages like:
BUG: using __this_cpu_add() in preemptible [00000000] code: ...
Given the imprecise nature of the count and the possibility of resetting
the count and doing the measurement again, this is not really a big
problem to use the unprotected __this_cpu_*() functions.
To make the preemption checking code happy, the this_cpu_*() functions
will be used if CONFIG_DEBUG_PREEMPT is defined.
The imprecise nature of the locking counts are also documented with
the suggestion that we should run the measurement a few times with the
counts reset in between to get a better picture of what is going on
under the hood.
Fixes: a8654596f0 ("locking/rwsem: Enable lock event counting")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 11988499e6 ("KVM: x86: Skip EFER vs. guest CPUID checks for
host-initiated writes", 2019-04-02) introduced a "return false" in a
function returning int, and anyway set_efer has a "nonzero on error"
conventon so it should be returning 1.
Reported-by: Pavel Machek <pavel@denx.de>
Fixes: 11988499e6 ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The fields filter would not work with child fields, as the respective
parents would not be included. No parents displayed == no childs displayed.
To reproduce, run on s390 (would work on other platforms, too, but would
require a different filter name):
- Run 'kvm_stat -d'
- Press 'f'
- Enter 'instruct'
Notice that events like instruction_diag_44 or instruction_diag_500 are not
displayed - the output remains empty.
With this patch, we will filter by matching events and their parents.
However, consider the following example where we filter by
instruction_diag_44:
kvm statistics - summary
regex filter: instruction_diag_44
Event Total %Total CurAvg/s
exit_instruction 276 100.0 12
instruction_diag_44 256 92.8 11
Total 276 12
Note that the parent ('exit_instruction') displays the total events, but
the childs listed do not match its total (256 instead of 276). This is
intended (since we're filtering all but one child), but might be confusing
on first sight.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_nested_state is only available on x86 so far. To be able
to compile the code on other architectures as well, we need to wrap
the related code with #ifdefs.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aarch64 fixups needed to compile with warnings as errors.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VM_MODE_P52V48_4K is not a valid mode for AArch64. Replace its
use in vm_create_default() with a mode that works and represents
a good AArch64 default. (We didn't ever see a problem with this
because we don't have any unit tests using vm_create_default(),
but it's good to get it fixed in advance.)
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The memory slot size must be aligned to the host's page size. When
testing a guest with a 4k page size on a host with a 64k page size,
then 3 guest pages are not host page size aligned. Since we just need
a nearly arbitrary number of extra pages to ensure the memslot is not
aligned to a 64 host-page boundary for this test, then we can use
16, as that's 64k aligned, but not 64 * 64k aligned.
Fixes: 76d58e0f07 ("KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size", 2019-04-17)
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kselftests exposed a problem in the s390 handling for memory slots.
Right now we only do proper memory slot handling for creation of new
memory slots. Neither MOVE, nor DELETION are handled properly. Let us
implement those.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to the SDM, for MSR_IA32_PERFCTR0/1 "the lower-order 32 bits of
each MSR may be written with any value, and the high-order 8 bits are
sign-extended according to the value of bit 31", but the fixed counters
in real hardware are limited to the width of the fixed counters ("bits
beyond the width of the fixed-function counter are reserved and must be
written as zeros"). Fix KVM to do the same.
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch will simplify the changes in the next, by enforcing the
masking of the counters to RDPMC and RDMSR.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After commit:
672ff6cff8 ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
my AMD guests started #GPing like this:
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 4355 Comm: bash Not tainted 5.1.0-rc6+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:x86_perf_event_update+0x3b/0xa0
with Code: pointing to RDPMC. It is RDPMC because the guest has the
hardware watchdog CONFIG_HARDLOCKUP_DETECTOR_PERF enabled which uses
perf. Instrumenting kvm_pmu_rdpmc() some, showed that it fails due to:
if (!pmu->version)
return 1;
which the above commit added. Since AMD's PMU leaves the version at 0,
that causes the #GP injection into the guest.
Set pmu->version arbitrarily to 1 and move it above the non-applicable
struct kvm_pmu members.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: kvm@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Mihai Carabas <mihai.carabas@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Fixes: 672ff6cff8 ("KVM: x86: Raise #GP when guest vCPU do not support PMU")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Userspace can easily set up invalid processor state in such a way that
dmesg will be filled with VMCS or VMCB dumps. Disable this by default
using a module parameter.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When assigning kvm irqfd we didn't check the irqchip mode but we allow
KVM_IRQFD to succeed with all the irqchip modes. However it does not
make much sense to create irqfd even without the kernel chips. Let's
provide a arch-dependent helper to check whether a specific irqfd is
allowed by the arch. At least for x86, it should make sense to check:
- when irqchip mode is NONE, all irqfds should be disallowed, and,
- when irqchip mode is SPLIT, irqfds that are with resamplefd should
be disallowed.
For either of the case, previously we'll silently ignore the irq or
the irq ack event if the irqchip mode is incorrect. However that can
cause misterious guest behaviors and it can be hard to triage. Let's
fail KVM_IRQFD even earlier to detect these incorrect configurations.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Radim Krčmář <rkrcmar@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current logic does not allow VCPU to be loaded onto CPU with
APIC ID 255. This should be allowed since the host physical APIC ID
field in the AVIC Physical APIC table entry is an 8-bit value,
and APIC ID 255 is valid in system with x2APIC enabled.
Instead, do not allow VCPU load if the host APIC ID cannot be
represented by an 8-bit value.
Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK
instead of AVIC_MAX_PHYSICAL_ID_COUNT.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The guest_code of sync_regs_test is assuming that the compiler will not
touch %r11 outside the asm that increments it, which is a bit brittle.
Instead, we can increment a variable and use a dummy asm to ensure the
increment is not optimized away. However, we also need to use a
callee-save register or the compiler will insert a save/restore around
the vmexit, breaking the whole idea behind the test.
(Yes, "if it ain't broken...", but I would like the test to be clean
before it is copied into the upcoming s390 selftests).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The check for entry->index == 0 is done twice. One time should
be sufficient.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose per-vCPU timer_advance_ns to userspace, so it is able to
query the auto-adjusted value.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of
timer advancement), '-1' enables adaptive tuning starting from default
advancment of 1000ns. However, we should expose an int instead of an overflow
uint module parameter.
Before patch:
/sys/module/kvm/parameters/lapic_timer_advance_ns:4294967295
After patch:
/sys/module/kvm/parameters/lapic_timer_advance_ns:-1
Fixes: c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We get a warning when build kernel W=1:
arch/x86/kvm/vmx/vmx.c:6365:6: warning: no previous prototype for ‘vmx_update_host_rsp’ [-Wmissing-prototypes]
void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
Add the missing declaration to fix this.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kvm now supports extended CPUID functions through 0x8000001f. CPUID
leaf 0x8000001e is AMD's Processor Topology Information leaf. This
contains similar information to CPUID leaf 0xb (Intel's Extended
Topology Enumeration leaf), and should be included in the output of
KVM_GET_SUPPORTED_CPUID, even though userspace is likely to override
some of this information based upon the configuration of the
particular VM.
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Fixes: 8765d75329 ("KVM: X86: Extend CPUID range to include new leaf")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Per the APM, "CPUID Fn8000_001D_E[D,C,B,A]X reports cache topology
information for the cache enumerated by the value passed to the
instruction in ECX, referred to as Cache n in the following
description. To gather information for all cache levels, software must
repeatedly execute CPUID with 8000_001Dh in EAX and ECX set to
increasing values beginning with 0 until a value of 00h is returned in
the field CacheType (EAX[4:0]) indicating no more cache descriptions
are available for this processor."
The termination condition is the same as leaf 4, so we can reuse that
code block for leaf 0x8000001d.
Fixes: 8765d75329 ("KVM: X86: Extend CPUID range to include new leaf")
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
So far the KVM selftests are compiled without any compiler warnings
enabled. That's quite bad, since we miss a lot of possible bugs this
way. Let's enable at least "-Wall" and some other useful warning flags
now, and fix at least the trivial problems in the code (like unused
variables).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The code is trying to check that all the padding is zeroed out and it
does this:
entry->padding[0] == entry->padding[1] == entry->padding[2] == 0
Assume everything is zeroed correctly, then the first comparison is
true, the next comparison is false and false is equal to zero so the
overall condition is true. This bug doesn't affect run time very
badly, but the code should instead just check that all three paddings
are zero individually.
Also the error message was copy and pasted from an earlier error and it
wasn't correct.
Fixes: 7edcb73433 ("KVM: selftests: Add hyperv_cpuid test")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
WARNING: CPU: 0 PID: 13554 at kvm/arch/x86/kvm//../../../virt/kvm/kvm_main.c:4183 kvm_resume+0x3c/0x40 [kvm]
CPU: 0 PID: 13554 Comm: step_after_susp Tainted: G OE 5.1.0-rc4+ #1
RIP: 0010:kvm_resume+0x3c/0x40 [kvm]
Call Trace:
syscore_resume+0x63/0x2d0
suspend_devices_and_enter+0x9d1/0xa40
pm_suspend+0x33a/0x3b0
state_store+0x82/0xf0
kobj_attr_store+0x12/0x20
sysfs_kf_write+0x4b/0x60
kernfs_fop_write+0x120/0x1a0
__vfs_write+0x1b/0x40
vfs_write+0xcd/0x1d0
ksys_write+0x5f/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x6f/0x6c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Commit ca84d1a24 (KVM: x86: Add clock sync request to hardware enable) mentioned
that "we always hold kvm_lock when hardware_enable is called. The one place that
doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock
won't be taken." However, commit 6706dae9 (virt/kvm: Replace spin_is_locked() with
lockdep) introduces a bug, it asserts when the lock is not held which is contrary
to the original goal.
This patch fixes it by WARN_ON when the lock is held.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Fixes: 6706dae9 ("virt/kvm: Replace spin_is_locked() with lockdep")
[Wrap with #ifdef CONFIG_LOCKDEP - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VMX's nested_run_pending flag is subtly consumed when stuffing state to
enter guest mode, i.e. needs to be set according before KVM knows if
setting guest state is successful. If setting guest state fails, clear
the flag as a nested run is obviously not pending.
Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The offset for reading the shadow VMCS is sizeof(*kvm_state)+VMCS12_SIZE,
so the correct size must be that plus sizeof(*vmcs12). This could lead
to KVM reading garbage data from userspace and not reporting an error,
but is otherwise not sensitive.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Correctly annotate HYP-callable code to be non-traceable
- Remove Christoffer from the MAINTAINERS file as his request
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlzn+B4VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpD4aQQALuikHA7JTN2LUedPM8Klf7Hm0qO
cqCdI1VmMxMJfzV2l/73JIJaB7wRPYugfsoGtrJMgpvs6nD8YMq8bpvDjwDIA2Wy
SGMR0dNL+bhe+tWsYU9pR/9pBI9t7gSkWZY1Qv7SU4yNFOy1MV0dGtkCW5J6qhLh
xoAUOmWdAj/KkaTRh4J9PtQeToiAtgFTlrnx/qC4iZGYT/gDOaXfPZTHerPeAEt6
MMKU74YoV5mX6Y55qw+dspDpSrxH44IkO0IF0ml/DvoQ+PaWjn6PFsx/3ZItdRg4
iiZefBm1dsvDGTIq1hzoEPJWahVmPad9cgsKSKhJPjHH79pWEsvRcExBmbku0Llj
n1mmqMaKw64FLE5x/Vbbd8vHUoIdCpdBz+qWmeldHsXYHGge+c5HsoLbgHJqOFQD
RidO+S+C9imaNYHnbehrAudjvav2bY/wbQCb+SKU2ZOXiuxXmdzlMNgiA61ylicK
jqOvuNWISjhmVMJVK3GpkIs6pRwYeFwalNux809LqJTT9U9XnZARys/IdE16KlX/
5FMM60r+B1aX6ge+w8MfyPw28xm8xwYnX4Mor7UfmDLCPB70w7MKCAIsHF12pCUf
ygSnruajTgVic/2ARzum1FmY0boSdFcBRESN+eQpLriaJK3x7mHxS3qBrvd5Yj9j
w1xfuKSCN+zGbHzq
=v3an
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 5.2-rc2
- Correctly annotate HYP-callable code to be non-traceable
- Remove Christoffer from the MAINTAINERS file as his request