Commit Graph

1154278 Commits

Author SHA1 Message Date
Maximilian Luz
c965daac37 platform/surface: aggregator: Add missing call to ssam_request_sync_free()
Although rare, ssam_request_sync_init() can fail. In that case, the
request should be freed via ssam_request_sync_free(). Currently it is
leaked instead. Fix this.

Fixes: c167b9c7e3 ("platform/surface: Add Surface Aggregator subsystem")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20221220175608.1436273-1-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-01-12 19:12:02 +01:00
Maximilian Luz
ae0fa0a312 platform/surface: aggregator: Ignore command messages not intended for us
It is possible that we (the host/kernel driver) receive command messages
that are not intended for us. Ignore those for now.

The whole story is a bit more complicated: It is possible to enable
debug output on SAM, which is sent via SSH command messages. By default
this output is sent to a debug connector, with its own target ID
(TID=0x03). It is possible to override the target of the debug output
and set it to the host/kernel driver. This, however, does not change the
original target ID of the message. Meaning, we receive messages with
TID=0x03 (debug) but expect to only receive messages with TID=0x00
(host).

The problem is that the different target ID also comes with a different
scope of request IDs. In particular, these do not follow the standard
event rules (i.e. do not fall into a set of small reserved values).
Therefore, current message handling interprets them as responses to
pending requests and tries to match them up via the request ID. However,
these debug output messages are not in fact responses, and therefore
this will at best fail to find the request and at worst pass on the
wrong data as response for a request.

Therefore ignore any command messages not intended for us (host) for
now. We can implement support for the debug messages once we have a
better understanding of them.

Note that this may also provide a bit more stability and avoid some
driver confusion in case any other targets want to talk to us in the
future, since we don't yet know what to do with those as well. A warning
for the dropped messages should suffice for now and also give us a
chance of discovering new targets if they come along without any
potential for bugs/instabilities.

Fixes: c167b9c7e3 ("platform/surface: Add Surface Aggregator subsystem")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20221202223327.690880-2-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-01-12 19:11:52 +01:00
Jens Axboe
3d25b1e836 nvme fixes for Linux 6.2
- Identify quirks for Apple controllers (Hector Martin)
  - fix error handling in nvme_pci_enable (Tong Zhang)
  - refuse unprivileged passthrough on partitions (Christoph Hellwig)
  - fix MAINTAINERS to not match nvmem subsystem headers (Russell King)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmO/w30LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPS0xAAkQ9TVeEI11lc4ySdhSDibiKzIK7hWLEjUvXywshc
 0eP31QxfTS4EGqWP+YPwtzPjqDZJOvZK1kkAI6H7whT5c0dD7ASJCLNPC/nIcXkO
 Rib9WsJvxlBCqQ5daU7X2N6yzdLT/LMTpmyBNLG/mLhLAPtq7KM2SorfxLMRkjCB
 Z77cgz/XBWnPFYBBFpAHD0FLdbL5xNwUnA09wpNnzjJnsM4EzQDFr6rsNJTuEd04
 FEuiy60YErD9TV5zDNQoqu3IcBuXJcujBOKP1QPEQxda7qjQlxSLIWKXiycFH2j2
 o+7AL2Pabka2jlkZJcI57NcDbxPfyoBmDXlGW+qb1CeZhRCXDNvRibaaE0wR2ZUI
 FRV9kwtOaiJg0deUyFDgOtZtHdJOEK0JXA/J0t5gp/DBm9VfaTA/RY9O9ZqU1phY
 8PYvvUbj3g6cfxzfZAmNh7H/kBEova1ib3f7exAXXprmkZP9etvaT/GV0fx3DDhp
 V2/TGLW2FbzmTwFWXaGIu3bHsNUfAPXriHyF+6PPvUHU4fCeK7sBDNT/+k/yYS8D
 1xF6+WP6HXwYrHMVE5hjdgZ8Fo3bRsG7vW5e91esIukgvOYLYDE2l3Grd8vB8+vW
 CqTW47hKr7r1Dn9l+eCZ/J16a8WLmoEExW5ypH8ICyl3smwe/rR0iax565Fo/Wwr
 n2Y=
 =llj+
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.2-2023-01-12' of git://git.infradead.org/nvme into block-6.2

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.2

 - Identify quirks for Apple controllers (Hector Martin)
 - fix error handling in nvme_pci_enable (Tong Zhang)
 - refuse unprivileged passthrough on partitions (Christoph Hellwig)
 - fix MAINTAINERS to not match nvmem subsystem headers (Russell King)"

* tag 'nvme-6.2-2023-01-12' of git://git.infradead.org/nvme:
  MAINTAINERS: stop nvme matching for nvmem files
  nvme: don't allow unprivileged passthrough on partitions
  nvme: replace the "bool vec" arguments with flags in the ioctl path
  nvme: remove __nvme_ioctl
  nvme-pci: fix error handling in nvme_pci_enable()
  nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
  nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
2023-01-12 10:36:35 -07:00
Jens Axboe
6e5aedb932 io_uring/poll: attempt request issue after racy poll wakeup
If we have multiple requests waiting on the same target poll waitqueue,
then it's quite possible to get a request triggered and get disappointed
in not being able to make any progress with it. If we race in doing so,
we'll potentially leave the poll request on the internal tables, but
removed from the waitqueue. That means that any subsequent trigger of
the poll waitqueue will not kick that request into action, causing an
application to potentially wait for completion of a request that will
never happen.

Fix this by adding a new poll return state, IOU_POLL_REISSUE. Rather
than have complicated logic for how to re-arm a given type of request,
just punt it for a reissue.

While in there, move the 'ret' variable to the only section where it
gets used. This avoids confusion the scope of it.

Cc: stable@vger.kernel.org
Fixes: eb0089d629 ("io_uring: single shot poll removal optimisation")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-12 10:35:51 -07:00
Michael Klein
36c2b9d671 platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD
Add touchscreen info for the CSL Panther Tab HD.

Signed-off-by: Michael Klein <m.klein@mvz-labor-lb.de>
Link: https://lore.kernel.org/r/20221220121103.uiwn5l7fii2iggct@LLGMVZLB-0037
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-01-12 17:45:04 +01:00
Volker Lendecke
a152d05ae4 cifs: Fix uninitialized memory read for smb311 posix symlink create
If smb311 posix is enabled, we send the intended mode for file
creation in the posix create context. Instead of using what's there on
the stack, create the mfsymlink file with 0644.

Fixes: ce558b0e17 ("smb3: Add posix create context for smb3.11 posix mounts")
Cc: stable@vger.kernel.org
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-12 09:51:48 -06:00
Andre Przywara
be53771c87 r8152: add vendor/device ID pair for Microsoft Devkit
The Microsoft Devkit 2023 is a an ARM64 based machine featuring a
Realtek 8153 USB3.0-to-GBit Ethernet adapter. As in their other
machines, Microsoft uses a custom USB device ID.

Add the respective ID values to the driver. This makes Ethernet work on
the MS Devkit device. The chip has been visually confirmed to be a
RTL8153.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20230111133228.190801-1-andre.przywara@arm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-12 14:26:04 +01:00
Arunpravin Paneer Selvam
5640e81607 drm: Optimize drm buddy top-down allocation method
We are observing performance drop in many usecases which include
games, 3D benchmark applications,etc.. To solve this problem, We
are strictly not allowing top down flag enabled allocations to
steal the memory space from cpu visible region.

The idea is, we are sorting each order list entries in
ascending order and compare the last entry of each order
list in the freelist and return the max block.

This patch improves the 3D benchmark scores and solves
fragmentation issues.

All drm buddy selftests are verfied.
drm_buddy: pass:6 fail:0 skip:0 total:6

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230112120027.3072-1-Arunpravin.PaneerSelvam@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
CC: Cc: stable@vger.kernel.org # 5.18+
2023-01-12 13:50:28 +01:00
Zack Rusin
040b35c19b drm/ttm: Fix a regression causing kernel oops'es
The branch is explicitly taken if ttm == NULL which means that to avoid
a null pointer reference the ttm object can not be used inside. Switch
back to dst_mem to avoid kernel oops'es.

This fixes kernel oops'es with any buffer objects which don't have ttm_tt,
e.g. with vram based screen objects on vmwgfx.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: e3c92eb4a8 ("drm/ttm: rework on ttm_resource to use size_t type")
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230111175015.1134923-1-zack@kde.org
Signed-off-by: Christian König <christian.koenig@amd.com>
2023-01-12 13:28:51 +01:00
Linus Torvalds
c757fc92a3 spi: Fixes for v6.2
There's several things in this pull request:
 
  - Fixes for long standing issues with accesses to spidev->spi during
    teardown in the spidev userspace driver.
  - Rename the newly added spi-cs-setup-ns DT property to be more in line
    with our other delay properties before it becomes ABI.
  - A few driver specific fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmO+xWEACgkQJNaLcl1U
 h9C3OAf/Zuyjft9vo0U6nxLLFEpEHCFVkhUZ+oO82iIeHPi7pfWq+9dNGut1s28H
 mCP8EobJ0eU2VFcOAwvW4MVPnYw2wC1uLKzBvZOzfE9jEPM/TvpfCaKWM9Rwzfir
 o172dWTNnGGxKYcyC9JHfcLW/1au9XPwYWjGcX1T3I7c4vQyVTwxPV6fC6x9gVe2
 5eQTxntIfS9uu2tGYzUAVhklJWLIB3pjU/NUMAqFt3wsKVFl217MquMt3RHnr44M
 bJSOqi0GsNuGg9gsSV4epgT8wY9oW3Ch7PoBs+VyfwHY2byZdYLTCUBXaZOcwUDT
 89CxuJ7OqVorQJsRcdfzFumdZjXtUA==
 =jcZ7
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:

 - Fixes for long standing issues with accesses to spidev->spi during
   teardown in the spidev userspace driver.

 - Rename the newly added spi-cs-setup-ns DT property to be more in line
   with our other delay properties before it becomes ABI.

 - A few driver specific fixes.

* tag 'spi-fix-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spidev: remove debug messages that access spidev->spi without locking
  spi: spidev: fix a race condition when accessing spidev->spi
  spi: Rename spi-cs-setup-ns property to spi-cs-setup-delay-ns
  spi: dt-bindings: Rename spi-cs-setup-ns to spi-cs-setup-delay-ns
  spi: cadence: Fix busy cycles calculation
  spi: mediatek: Enable irq before the spi registration
2023-01-12 06:10:45 -06:00
Linus Torvalds
cf9668a2f2 regulator: Fixes for 6.2
A couple of small driver specific fixes, one of which I queued for 6.1
 but didn't actually send out so has had *plenty* of testing in -next.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmO+xC8ACgkQJNaLcl1U
 h9DxZQf/dEZreqhL10cQGXAcyC9M9ZnESYF08mDvs5Q5L/lbeOufz+7FX9E7955x
 fC21CYRvNIu6iVOBJdpdjEWZAVsYAn7UslsEeusCH72XVAwRrMgBoiXSEHC4Z7F4
 hoGxsYnJcnVCT8czAl5nHu4qfwHJ+cZrvKznZ9z8KSlTy9iE5DzhuWWp80mwWqpY
 tjfYkNT3G5QJ0LoaMKgdFwvDSYdiwsFHBFeG/uSpH8JfTJ0b/ZjTsL4PsKLiEWNO
 d6nEnwKewzsjKgYtB3wJ5W8jEcnAx3P01/n/b7jag8hW5ALKSpSjZ3pfUZv50M17
 M0mZTzyPr0B7UBFsadW9GH05mwz1eQ==
 =Gd9D
 -----END PGP SIGNATURE-----

Merge tag 'regulator-fix-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A couple of small driver specific fixes, one of which I queued for 6.1
  but didn't actually send out so has had *plenty* of testing in -next"

* tag 'regulator-fix-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: qcom-rpmh: PM8550 ldo11 regulator is an nldo
  regulator: da9211: Use irq handler when ready
2023-01-12 05:59:37 -06:00
Linus Torvalds
e58f087e9c MTD changes:
* cfi: Allow building spi-intel standalone to avoid build issues
 * parsers: scpart: Fix __udivdi3 undefined on mips
 * parsers: tplink_safeloader: Fix potential memory leak during parsing
 
 MAINTAINERS change:
 * Update email of Tudor Ambarus
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmO/1kYACgkQJWrqGEe9
 VoTnewgAoLj1jxAaRQm7Qg9tk2wGfWVp+WGJ+nsqhZyXo5ohW99sp2vWbvYmr9ZI
 g2KKdiYR/pmRNtwBkWQFNPGpztp5CYkijWVfYl7uEYvJRRHnn8mRhZbd5vQmdV+V
 kFManEZucajNGSn5PsqaJEh28dY2Hw4WhAIVgWIcqMV8hD3CKD7Bn+pDHjPSpcXK
 AkkPthBSJdhmuv2CzQ8ZFboI1dy+FH8rriPI9U5Xqm65bfMcrgh1zGQ1FbyYulBn
 26dinpl72iSdzBUXvkNSOUbKxUwP4gBmXW6bz6U5NAa2t9YJD/VKAjR+D5gRVppo
 cKIA6plbPMJRdqVMyyfJNKGRTZv5FQ==
 =WxrV
 -----END PGP SIGNATURE-----

Merge tag 'mtd/fixes-for-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fixes from Miquel Raynal:

 - cfi: Allow building spi-intel standalone to avoid build issues

 - parsers: scpart: Fix __udivdi3 undefined on mips

 - parsers: tplink_safeloader: Fix potential memory leak during parsing

 - Update email of Tudor Ambarus

* tag 'mtd/fixes-for-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  MAINTAINERS: Update email of Tudor Ambarus
  mtd: cfi: allow building spi-intel standalone
  mtd: parsers: scpart: fix __udivdi3 undefined on mips
  mtd: parsers: Fix potential memory leak in mtd_parser_tplink_safeloader_parse()
2023-01-12 05:56:06 -06:00
Takashi Iwai
84aa3059f4 ASoC: Fixes for v6.2
There's quite a few fixes here, mostly board specific apart from the SOF
 power management ones.  We also have some new quirks and Kconfig tweaks
 to enable existing code on new platforms, and a one liner which exposes
 the SOF firmware state in debugfs to aid with debugging.
 
 There's also a SPI fix that I mistakenly put in the wrong queue and
 did some merges on top of before I noticed, it seemed more trouble than
 it was worth to unpick things.  A copy of the same patch is also in the
 spi tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmO/BTgACgkQJNaLcl1U
 h9A8CAf/TKr2Q81PSGqgkAW5JJIMr2JAvRyIQohU7WoVGPhqe9/NwhqGI3g0s6JK
 2uWbkpStlvUPiZZvPKsynKTo4qSeNPO6HTCZL95LAsIdXB4FVdqd4NCB1vGFD5bW
 rqefwvXNnzdgbsTHIVf4mSulAQcXMVH6ivzZM6cesni+sI3JbJpUBCf0GULIiSqC
 aix2CzKkwow3xWx3r61oF7KtiCkuQQD8I+MFXlvLXTaloImnYkvOscq/CgXXi/Y8
 mmPcUy3QRo+oKx6eRwPUId/+xMLBlPW/Qemof/YXYQXgph57fMttQZG0Y+Jz/J+m
 Xpu3+NjpVoqMrpGINXqxsThjZYC+TQ==
 =brSz
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v6.2-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.2

There's quite a few fixes here, mostly board specific apart from the SOF
power management ones.  We also have some new quirks and Kconfig tweaks
to enable existing code on new platforms, and a one liner which exposes
the SOF firmware state in debugfs to aid with debugging.

There's also a SPI fix that I mistakenly put in the wrong queue and
did some merges on top of before I noticed, it seemed more trouble than
it was worth to unpick things.  A copy of the same patch is also in the
spi tree.
2023-01-12 12:52:08 +01:00
Linus Torvalds
23025cbcca SCSI fixes on 20230111
10 small fixes (less the one that cleaned up a reverted removal), 9 in
 drivers of which the ufs one is the most critical. The single core
 patch is a minor speedup to error handling.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCY77DTiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbE4AQDhMCqJ
 k59nF9+6ll0kbY7pNPG6TtaX9bK6HZ9caZ5kNAEApw/aZahHnl3kKS3xL3XeK8Yy
 N8m/eda0hlFy5aL0XNg=
 =VJVR
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Ten small fixes (less the one that cleaned up a reverted removal),
  nine in drivers of which the ufs one is the most critical.

  The single core patch is a minor speedup to error handling"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: libsas: Grab the ATA port lock in sas_ata_device_link_abort()
  scsi: hisi_sas: Fix tag freeing for reserved tags
  scsi: ufs: core: WLUN suspend SSU/enter hibern8 fail recovery
  scsi: scsi_debug: Delete unreachable code in inquiry_vpd_b0()
  scsi: mpi3mr: Refer CONFIG_SCSI_MPI3MR in Makefile
  scsi: core: scsi_error: Do not queue pointless abort workqueue functions
  scsi: storvsc: Fix swiotlb bounce buffer leak in confidential VM
  scsi: iscsi: Fix multiple iSCSI session unbind events sent to userspace
  scsi: mpi3mr: Remove usage of dma_get_required_mask() API
  scsi: mpt3sas: Remove usage of dma_get_required_mask() API
2023-01-12 05:50:56 -06:00
Luka Guzenko
ca88eeb308 ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx
The HP Spectre x360 13-aw0xxx devices use the ALC285 codec with GPIO 0x04
controlling the micmute LED and COEF 0x0b index 8 controlling the mute LED.
A quirk was added to make these work as well as a fixup.

Signed-off-by: Luka Guzenko <l.guzenko@web.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230110202514.2792-1-l.guzenko@web.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-01-12 12:10:45 +01:00
Juergen Gross
26ce6ec364 x86/mm: fix poking_init() for Xen PV guests
Commit 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()") broke
the kernel for running as Xen PV guest.

It seems as if the new address space is never activated before being
used, resulting in Xen rejecting to accept the new CR3 value (the PGD
isn't pinned).

Fix that by adding the now missing call of paravirt_arch_dup_mmap() to
poking_init(). That call was previously done by dup_mm()->dup_mmap() and
it is a NOP for all cases but for Xen PV, where it is just doing the
pinning of the PGD.

Fixes: 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230109150922.10578-1-jgross@suse.com
2023-01-12 11:22:20 +01:00
Noor Azura Ahmad Tarmizi
ae9dcb91c6 net: stmmac: add aux timestamps fifo clearance wait
Add timeout polling wait for auxiliary timestamps snapshot FIFO clear bit
(ATSFC) to clear. This is to ensure no residue fifo value is being read
erroneously.

Fixes: f4da56529d ("net: stmmac: Add support for external trigger timestamping")
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Noor Azura Ahmad Tarmizi <noor.azura.ahmad.tarmizi@intel.com>
Link: https://lore.kernel.org/r/20230111050200.2130-1-noor.azura.ahmad.tarmizi@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11 21:10:06 -08:00
Yihang Li
f58c897006 scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id
Currently the driver sets the port invalid if one phy in the port is not
enabled, which may cause issues in expander situation. In directly attached
situation, if phy up doesn't occur in time when refreshing port id, the
port is incorrectly set to invalid which will also cause disk lost.

Therefore set a port invalid only if there are no devices attached to the
port.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1672805000-141102-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12 00:08:03 -05:00
Xingui Yang
037b48057e scsi: hisi_sas: Use abort task set to reset SAS disks when discovered
Currently clear task set is used to abort all commands remaining in the
disk when the SAS disk is discovered, and if the disk is discovered by two
initiators, other I_T nexuses are also affected. So use abort task set
instead and take effect only on the specified I_T nexus.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1672805000-141102-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12 00:08:03 -05:00
Jakub Kicinski
eb25df8828 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-10 (ixgbe, igc, iavf)

This series contains updates to ixgbe, igc, and iavf drivers.

Yang Yingliang adds calls to pci_dev_put() for proper ref count tracking
on ixgbe.

Christopher adds setting of Toggle on Target Time bits for proper
pulse per second (PPS) synchronization for igc.

Daniil Tatianin fixes, likely, copy/paste issue that misreported
destination instead of source for IP mask for iavf error.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf/iavf_main: actually log ->src mask when talking about it
  igc: Fix PPS delta between two synchronized end-points
  ixgbe: fix pci device refcount leak
====================

Link: https://lore.kernel.org/r/20230110223825.648544-1-anthony.l.nguyen@intel.com
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11 20:50:42 -08:00
Jakub Kicinski
97f5e03a4a bnxt: make sure we return pages to the pool
Before the commit under Fixes the page would have been released
from the pool before the napi_alloc_skb() call, so normal page
freeing was fine (released page == no longer in the pool).

After the change we just mark the page for recycling so it's still
in the pool if the skb alloc fails, we need to recycle.

Same commit added the same bug in the new bnxt_rx_multi_page_skb().

Fixes: 1dc4c557bf ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Link: https://lore.kernel.org/r/20230111042547.987749-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11 20:48:37 -08:00
Jie Wang
ae9f29fdfd net: hns3: fix wrong use of rss size during VF rss config
Currently, it used old rss size to get current tc mode. As a result, the
rss size is updated, but the tc mode is still configured based on the old
rss size.

So this patch fixes it by using the new rss size in both process.

Fixes: 93969dc14f ("net: hns3: refactor VF rss init APIs with new common rss init APIs")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230110115359.10163-1-lanhao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11 20:20:59 -08:00
Lizzy Fleckenstein
19fa92fb72 init/Kconfig: fix typo (usafe -> unsafe)
Fix the help text for the PRINTK_SAFE_LOG_BUF_SHIFT setting.

Link: https://lkml.kernel.org/r/20230109201837.23873-1-eliasfleckenstein@web.de
Signed-off-by: Lizzy Fleckenstein <eliasfleckenstein@web.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:23 -08:00
Liam Howlett
fd9edbdbdc nommu: fix split_vma() map_count error
During the maple tree conversion of nommu, an error in counting the VMAs
was introduced by counting the existing VMA again.  The counting used to
be decremented by one and incremented by two, but now it only increments
by two.  Fix the counting error by moving the increment outside the
setup_vma_to_mm() function to the callers.

Link: https://lkml.kernel.org/r/20230109205809.956325-1-Liam.Howlett@oracle.com
Fixes: 8220543df1 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:23 -08:00
Liam Howlett
80be727ec8 nommu: fix do_munmap() error path
When removing a VMA from the tree fails due to no memory, do not free the
VMA since a reference still exists.

Link: https://lkml.kernel.org/r/20230109205708.956103-1-Liam.Howlett@oracle.com
Fixes: 8220543df1 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:22 -08:00
Liam Howlett
7f31cced57 nommu: fix memory leak in do_mmap() error path
The preallocation of the maple tree nodes may leak if the error path to
"error_just_free" is taken.  Fix this by moving the freeing of the maple
tree nodes to a shared location for all error paths.

Link: https://lkml.kernel.org/r/20230109205507.955577-1-Liam.Howlett@oracle.com
Fixes: 8220543df1 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:22 -08:00
Robert Foss
bf61acbed8 MAINTAINERS: update Robert Foss' email address
Update the email address for Robert's maintainer entries and fill in
.mailmap accordingly.

Link: https://lkml.kernel.org/r/20230106152151.115648-1-robert.foss@linaro.org
Signed-off-by: Robert Foss <rfoss@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Vasily Averin <vasily.averin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:22 -08:00
Alexey Dobriyan
5316a017d0 proc: fix PIE proc-empty-vm, proc-pid-vm tests
vsyscall detection code uses direct call to the beginning of
the vsyscall page:

	asm ("call %P0" :: "i" (0xffffffffff600000))

It generates "call rel32" instruction but it is not relocated if binary
is PIE, so binary segfaults into random userspace address and vsyscall
page status is detected incorrectly.

Do more direct:

	asm ("call *%rax")

which doesn't do need any relocaltions.

Mark g_vsyscall as volatile for a good measure, I didn't find instruction
setting it to 0. Now the code is obviously correct:

	xor	eax, eax
	mov	rdi, rbp
	mov	rsi, rbp
	mov	DWORD PTR [rip+0x2d15], eax      # g_vsyscall = 0
	mov	rax, 0xffffffffff600000
	call	rax
	mov	DWORD PTR [rip+0x2d02], 1        # g_vsyscall = 1
	mov	eax, DWORD PTR ds:0xffffffffff600000
	mov	DWORD PTR [rip+0x2cf1], 2        # g_vsyscall = 2
	mov	edi, [rip+0x2ceb]                # exit(g_vsyscall)
	call	exit

Note: fixed proc-empty-vm test oopses 5.19.0-28-generic kernel
	but this is separate story.

Link: https://lkml.kernel.org/r/Y7h2xvzKLg36DSq8@p183
Fixes: 5bc73bb345 ("proc: test how it holds up with mapping'less process")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:22 -08:00
Lorenzo Stoakes
8651a137e6 mm: update mmap_sem comments to refer to mmap_lock
The rename from mm->mmap_sem to mm->mmap_lock was performed in commit
da1c55f1b2 ("mmap locking API: rename mmap_sem to mmap_lock") and commit
c1e8d7c6a7 ("map locking API: convert mmap_sem comments"), however some
incorrect comments remain.

This patch simply corrects those comments which are obviously incorrect
within mm itself.

Link: https://lkml.kernel.org/r/33fba04389ab63fc4980e7ba5442f521df6dc657.1673048927.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:22 -08:00
SeongJae Park
0411d6ee50 include/linux/mm: fix release_pages_arg kernel doc comment
Commit 449c796768 ("mm: teach release_pages() to take an array of
encoded page pointers too") added the kernel doc comment for
release_pages() on top of 'union release_pages_arg', so making 'make
htmldocs' complains as below:

    ./include/linux/mm.h:1268: warning: cannot understand function prototype: 'typedef union '

The kernel doc comment for the function is already on top of the
function's definition in mm/swap.c, and the new comment is actually not
for the function but indeed release_pages_arg.  Fixing the comment to
reflect the intent would be one option.  But, kernel doc cannot parse
the union as below due to the attribute.

    ./include/linux/mm.h:1272: error: Cannot parse struct or union!

Modify the comment to reflect the intent but do not mark it as a kernel
doc comment.

Link: https://lkml.kernel.org/r/20230106203331.127532-1-sj@kernel.org
Fixes: 449c796768 ("mm: teach release_pages() to take an array of encoded page pointers too")
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:21 -08:00
Randy Dunlap
d09dce1fff lib/win_minmax: use /* notation for regular comments
Don't use kernel-doc "/**" notation for non-kernel-doc comments.
Prevents a kernel-doc warning:

lib/win_minmax.c:31: warning: expecting prototype for lib/minmax.c(). Prototype was for minmax_subwin_update() instead

Link: https://lkml.kernel.org/r/20230102211614.26343-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:21 -08:00
Andrey Konovalov
0de4a7f5ba kasan: mark kasan_kunit_executing as static
Mark kasan_kunit_executing as static, as it is only used within
mm/kasan/report.c.

Link: https://lkml.kernel.org/r/f64778a4683b16a73bba72576f73bf4a2b45a82f.1672794398.git.andreyknvl@google.com
Fixes: c8c7016f50 ("kasan: fail non-kasan KUnit tests on KASAN reports")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:21 -08:00
Ryusuke Konishi
7633355e5c nilfs2: fix general protection fault in nilfs_btree_insert()
If nilfs2 reads a corrupted disk image and tries to reads a b-tree node
block by calling __nilfs_btree_get_block() against an invalid virtual
block address, it returns -ENOENT because conversion of the virtual block
address to a disk block address fails.  However, this return value is the
same as the internal code that b-tree lookup routines return to indicate
that the block being searched does not exist, so functions that operate on
that b-tree may misbehave.

When nilfs_btree_insert() receives this spurious 'not found' code from
nilfs_btree_do_lookup(), it misunderstands that the 'not found' check was
successful and continues the insert operation using incomplete lookup path
data, causing the following crash:

 general protection fault, probably for non-canonical address
 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
 ...
 RIP: 0010:nilfs_btree_get_nonroot_node fs/nilfs2/btree.c:418 [inline]
 RIP: 0010:nilfs_btree_prepare_insert fs/nilfs2/btree.c:1077 [inline]
 RIP: 0010:nilfs_btree_insert+0x6d3/0x1c10 fs/nilfs2/btree.c:1238
 Code: bc 24 80 00 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 28 00 74 08 4c 89
 ff e8 4b 02 92 fe 4d 8b 3f 49 83 c7 28 4c 89 f8 48 c1 e8 03 <42> 80 3c
 28 00 74 08 4c 89 ff e8 2e 02 92 fe 4d 8b 3f 49 83 c7 02
 ...
 Call Trace:
 <TASK>
  nilfs_bmap_do_insert fs/nilfs2/bmap.c:121 [inline]
  nilfs_bmap_insert+0x20d/0x360 fs/nilfs2/bmap.c:147
  nilfs_get_block+0x414/0x8d0 fs/nilfs2/inode.c:101
  __block_write_begin_int+0x54c/0x1a80 fs/buffer.c:1991
  __block_write_begin fs/buffer.c:2041 [inline]
  block_write_begin+0x93/0x1e0 fs/buffer.c:2102
  nilfs_write_begin+0x9c/0x110 fs/nilfs2/inode.c:261
  generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772
  __generic_file_write_iter+0x176/0x400 mm/filemap.c:3900
  generic_file_write_iter+0xab/0x310 mm/filemap.c:3932
  call_write_iter include/linux/fs.h:2186 [inline]
  new_sync_write fs/read_write.c:491 [inline]
  vfs_write+0x7dc/0xc50 fs/read_write.c:584
  ksys_write+0x177/0x2a0 fs/read_write.c:637
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 ...
 </TASK>

This patch fixes the root cause of this problem by replacing the error
code that __nilfs_btree_get_block() returns on block address conversion
failure from -ENOENT to another internal code -EINVAL which means that the
b-tree metadata is corrupted.

By returning -EINVAL, it propagates without glitches, and for all relevant
b-tree operations, functions in the upper bmap layer output an error
message indicating corrupted b-tree metadata via
nilfs_bmap_convert_error(), and code -EIO will be eventually returned as
it should be.

Link: https://lkml.kernel.org/r/000000000000bd89e205f0e38355@google.com
Link: https://lkml.kernel.org/r/20230105055356.8811-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+ede796cecd5296353515@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:21 -08:00
Nhat Pham
1beb8ae302 Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning
Writeback has been implemented for zsmalloc, so this warning no longer
holds.

Link: https://lkml.kernel.org/r/20230106220016.172303-1-nphamcs@gmail.com
Fixes: 9997bc0175 ("zsmalloc: implement writeback mechanism for zsmalloc")
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:21 -08:00
Peter Xu
fed15f1345 mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects
Userfaultfd-wp uses pte markers to mark wr-protected pages for both shmem
and hugetlb.  Shmem has pre-allocation ready for markers, but hugetlb path
was overlooked.

Doing so by calling huge_pte_alloc() if the initial pgtable walk fails to
find the huge ptep.  It's possible that huge_pte_alloc() can fail with
high memory pressure, in that case stop the loop immediately and fail
silently.  This is not the most ideal solution but it matches with what we
do with shmem meanwhile it avoids the splat in dmesg.

Link: https://lkml.kernel.org/r/20230104225207.1066932-2-peterx@redhat.com
Fixes: 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>	[5.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:21 -08:00
James Houghton
b30c14cd61 hugetlb: unshare some PMDs when splitting VMAs
PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs; however,
it is possible that HugeTLB VMAs are split without unsharing the PMDs
first.

Without this fix, it is possible to hit the uffd-wp-related WARN_ON_ONCE
in hugetlb_change_protection [1].  The key there is that
hugetlb_unshare_all_pmds will not attempt to unshare PMDs in
non-PUD_SIZE-aligned sections of the VMA.

It might seem ideal to unshare in hugetlb_vm_op_open, but we need to
unshare in both the new and old VMAs, so unsharing in hugetlb_vm_op_split
seems natural.

[1]: https://lore.kernel.org/linux-mm/CADrL8HVeOkj0QH5VZZbRzybNE8CG-tEGFshnA+bG9nMgcWtBSg@mail.gmail.com/

Link: https://lkml.kernel.org/r/20230104231910.1464197-1-jthoughton@google.com
Fixes: 6dfeaff93b ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:20 -08:00
Suren Baghdasaryan
a1193de562 mm: fix vma->anon_name memory leak for anonymous shmem VMAs
free_anon_vma_name() is missing a check for anonymous shmem VMA which
leads to a memory leak due to refcount not being dropped.  Fix this by
calling anon_vma_name_put() unconditionally.  It will free vma->anon_name
whenever it's non-NULL.

Link: https://lkml.kernel.org/r/20230105000241.1450843-1-surenb@google.com
Fixes: d09e8ca6cb ("mm: anonymous shared memory naming")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+91edf9178386a07d06a7@syzkaller.appspotmail.com
Cc: Hugh Dickins <hughd@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:20 -08:00
Zach O'Keefe
3de0c269ad mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
SHMEM_HUGE_DENY is for emergency use by the admin, to disable allocation
of shmem huge pages if, for example, a dangerous bug is found in their
usage: see "deny" in Documentation/mm/transhuge.rst.  An app using
madvise(,,MADV_COLLAPSE) should not be allowed to override it: restore its
precedence over shmem_huge_force.

Restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE.

Link: https://lkml.kernel.org/r/20221224082035.3197140-2-zokeefe@google.com
Fixes: 7c6c6cc4d3 ("mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:20 -08:00
Zach O'Keefe
52dc031088 mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds supplied
by the user.

At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might have
changed while mmap_lock was dropped.  One thing that might occur is that
the VMA could be resized, and as such, we refetch vma->vm_end to make sure
we don't collapse past the end of the VMA's new end.

However, it's possible that when refetching vma->vm_end that we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.

The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false failure"
user-visible results.  An example of the former is if we MADV_COLLAPSE the
first 4MiB of a 2TiB mmap()'d file, the incorrect refetch would cause the
operation to block for much longer than anticipated as we attempt to
collapse the entire TiB region.  An example of the latter is that applying
MADV_COLLPSE to a 4MiB file mapped to the start of a 6MiB VMA will
successfully collapse the first 4MiB, then incorrectly attempt to collapse
the last hugepage-aligned/sized region -- fail (since readahead/page cache
lookup will fail) -- and report a failure to the user.

I don't believe there is a kernel stability concern here as we always
(re)validate the VMA / region accordingly.  Also as Hugh mentions, the
user-visible effects are: we try to collapse more memory than requested
by the user, and/or failing an operation that should have otherwise
succeeded.  An example is trying to collapse a 4MiB file contained
within a 12MiB VMA.

Don't expand the acted-on region when refetching vma->vm_end.

Link: https://lkml.kernel.org/r/20221224082035.3197140-1-zokeefe@google.com
Fixes: 4d24de9425 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:20 -08:00
David Hildenbrand
51d3d5eb74 mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
Currently, we don't enable writenotify when enabling userfaultfd-wp on a
shared writable mapping (for now only shmem and hugetlb).  The consequence
is that vma->vm_page_prot will still include write permissions, to be set
as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
page migration, ...).

So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
wrprotect).  For example, when enabling softdirty tracking, we enable
writenotify.  With uffd-wp on shared mappings, that changed.  More details
on vma->vm_page_prot semantics were summarized in [1].

This is problematic for uffd-wp: we'd have to manually check for a uffd-wp
PTEs/PMDs and manually write-protect PTEs/PMDs, which is error prone. 
Prone to such issues is any code that uses vma->vm_page_prot to set PTE
permissions: primarily pte_modify() and mk_pte().

Instead, let's enable writenotify such that PTEs/PMDs/...  will be mapped
write-protected as default and we will only allow selected PTEs that are
definitely safe to be mapped without write-protection (see
can_change_pte_writable()) to be writable.  In the future, we might want
to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
locations, for example, also when removing uffd-wp protection.

This fixes two known cases:

(a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
    in uffd-wp not triggering on write access.
(b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
    writable, resulting in uffd-wp not triggering on write access.

Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
without NUMA hinting (which currently doesn't seem to be applicable to
shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.  On
such a VMA, userfaultfd-wp is currently non-functional.

Note that when enabling userfaultfd-wp, there is no need to walk page
tables to enforce the new default protection for the PTEs: we know that
they cannot be uffd-wp'ed yet, because that can only happen after enabling
uffd-wp for the VMA in general.

Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
accidentally set the write bit -- which would result in uffd-wp not
triggering on later write access.  This commit makes uffd-wp on shmem
behave just like uffd-wp on anonymous memory in that regard, even though,
mixing mprotect with uffd-wp is controversial.

[1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com

Link: https://lkml.kernel.org/r/20221209080912.7968-1-david@redhat.com
Fixes: b1f9e87686 ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Ives van Hoorne <ives@codesandbox.io>
Debugged-by: Peter Xu <peterx@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:20 -08:00
Hugh Dickins
ab0c3f1251 mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd,
when removing a breakpoint from hugepage text: vma->anon_vma is always set
in that case, so undo the prohibition.  And MADV_COLLAPSE ought to be able
to collapse some page tables in a vma which happens to have anon_vma set
from CoWing elsewhere.

Is anon_vma lock required?  Almost not: if any page other than expected
subpage of the non-anon huge page is found in the page table, collapse is
aborted without making any change.  However, it is possible that an anon
page was CoWed from this extent in another mm or vma, in which case a
concurrent lookup might look here: so keep it away while clearing pmd (but
perhaps we shall go back to using pmd_lock() there in future).

Note that collapse_pte_mapped_thp() is exceptional in freeing a page table
without having cleared its ptes: I'm uneasy about that, and had thought
pte_clear()ing appropriate; but exclusive i_mmap lock does fix the
problem, and we would have to move the mmu_notification if clearing those
ptes.

What this fixes is not a dangerous instability.  But I suggest Cc stable
because uprobes "healing" has regressed in that way, so this should follow
8d3c106e19 into those stable releases where it was backported (and may
want adjustment there - I'll supply backports as needed).

Link: https://lkml.kernel.org/r/b740c9fb-edba-92ba-59fb-7a5592e5dfc@google.com
Fixes: 8d3c106e19 ("mm/khugepaged: take the right locks for page table retraction")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>    [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:19 -08:00
David Hildenbrand
44f86392bd mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()
We have to update the uffd-wp SWP PTE bit independent of the type of
migration entry.  Currently, if we're unlucky and we want to install/clear
the uffd-wp bit just while we're migrating a read-only mapped hugetlb
page, we would miss to set/clear the uffd-wp bit.

Further, if we're processing a readable-exclusive migration entry and
neither want to set or clear the uffd-wp bit, we could currently end up
losing the uffd-wp bit.  Note that the same would hold for writable
migrating entries, however, having a writable migration entry with the
uffd-wp bit set would already mean that something went wrong.

Note that the change from !is_readable_migration_entry ->
writable_migration_entry is harmless and actually cleaner, as raised by
Miaohe Lin and discussed in [1].

[1] https://lkml.kernel.org/r/90dd6a93-4500-e0de-2bf0-bf522c311b0c@huawei.com

Link: https://lkml.kernel.org/r/20221222205511.675832-3-david@redhat.com
Fixes: 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:19 -08:00
David Hildenbrand
0e678153f5 mm/hugetlb: fix PTE marker handling in hugetlb_change_protection()
Patch series "mm/hugetlb: uffd-wp fixes for hugetlb_change_protection()".

Playing with virtio-mem and background snapshots (using uffd-wp) on
hugetlb in QEMU, I managed to trigger a VM_BUG_ON().  Looking into the
details, hugetlb_change_protection() seems to not handle uffd-wp correctly
in all cases.

Patch #1 fixes my test case.  I don't have reproducers for patch #2, as it
requires running into migration entries.

I did not yet check in detail yet if !hugetlb code requires similar care.


This patch (of 2):

There are two problematic cases when stumbling over a PTE marker in
hugetlb_change_protection():

(1) We protect an uffd-wp PTE marker a second time using uffd-wp: we will
    end up in the "!huge_pte_none(pte)" case and mess up the PTE marker.

(2) We unprotect a uffd-wp PTE marker: we will similarly end up in the
    "!huge_pte_none(pte)" case even though we cleared the PTE, because
    the "pte" variable is stale. We'll mess up the PTE marker.

For example, if we later stumble over such a "wrongly modified" PTE marker,
we'll treat it like a present PTE that maps some garbage page.

This can, for example, be triggered by mapping a memfd backed by huge
pages, registering uffd-wp, uffd-wp'ing an unmapped page and (a)
uffd-wp'ing it a second time; or (b) uffd-unprotecting it; or (c)
unregistering uffd-wp. Then, ff we trigger fallocate(FALLOC_FL_PUNCH_HOLE)
on that file range, we will run into a VM_BUG_ON:

[  195.039560] page:00000000ba1f2987 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
[  195.039565] flags: 0x7ffffc0001000(reserved|node=0|zone=0|lastcpupid=0x1fffff)
[  195.039568] raw: 0007ffffc0001000 ffffe742c0000008 ffffe742c0000008 0000000000000000
[  195.039569] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[  195.039569] page dumped because: VM_BUG_ON_PAGE(compound && !PageHead(page))
[  195.039573] ------------[ cut here ]------------
[  195.039574] kernel BUG at mm/rmap.c:1346!
[  195.039579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  195.039581] CPU: 7 PID: 4777 Comm: qemu-system-x86 Not tainted 6.0.12-200.fc36.x86_64 #1
[  195.039583] Hardware name: LENOVO 20WNS1F81N/20WNS1F81N, BIOS N35ET50W (1.50 ) 09/15/2022
[  195.039584] RIP: 0010:page_remove_rmap+0x45b/0x550
[  195.039588] Code: [...]
[  195.039589] RSP: 0018:ffffbc03c3633ba8 EFLAGS: 00010292
[  195.039591] RAX: 0000000000000040 RBX: ffffe742c0000000 RCX: 0000000000000000
[  195.039592] RDX: 0000000000000002 RSI: ffffffff8e7aac1a RDI: 00000000ffffffff
[  195.039592] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffbc03c3633a08
[  195.039593] R10: 0000000000000003 R11: ffffffff8f146328 R12: ffff9b04c42754b0
[  195.039594] R13: ffffffff8fcc6328 R14: ffffbc03c3633c80 R15: ffff9b0484ab9100
[  195.039595] FS:  00007fc7aaf68640(0000) GS:ffff9b0bbf7c0000(0000) knlGS:0000000000000000
[  195.039596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  195.039597] CR2: 000055d402c49110 CR3: 0000000159392003 CR4: 0000000000772ee0
[  195.039598] PKRU: 55555554
[  195.039599] Call Trace:
[  195.039600]  <TASK>
[  195.039602]  __unmap_hugepage_range+0x33b/0x7d0
[  195.039605]  unmap_hugepage_range+0x55/0x70
[  195.039608]  hugetlb_vmdelete_list+0x77/0xa0
[  195.039611]  hugetlbfs_fallocate+0x410/0x550
[  195.039612]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[  195.039616]  vfs_fallocate+0x12e/0x360
[  195.039618]  __x64_sys_fallocate+0x40/0x70
[  195.039620]  do_syscall_64+0x58/0x80
[  195.039623]  ? syscall_exit_to_user_mode+0x17/0x40
[  195.039624]  ? do_syscall_64+0x67/0x80
[  195.039626]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  195.039628] RIP: 0033:0x7fc7b590651f
[  195.039653] Code: [...]
[  195.039654] RSP: 002b:00007fc7aaf66e70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[  195.039655] RAX: ffffffffffffffda RBX: 0000558ef4b7f370 RCX: 00007fc7b590651f
[  195.039656] RDX: 0000000018000000 RSI: 0000000000000003 RDI: 000000000000000c
[  195.039657] RBP: 0000000008000000 R08: 0000000000000000 R09: 0000000000000073
[  195.039658] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000018000000
[  195.039658] R13: 00007fb8bbe00000 R14: 000000000000000c R15: 0000000000001000
[  195.039661]  </TASK>

Fix it by not going into the "!huge_pte_none(pte)" case if we stumble over
an exclusive marker.  spin_unlock() + continue would get the job done.

However, instead, make it clearer that there are no fall-through
statements: we process each case (hwpoison, migration, marker, !none,
none) and then unlock the page table to continue with the next PTE.  Let's
avoid "continue" statements and use a single spin_unlock() at the end.

Link: https://lkml.kernel.org/r/20221222205511.675832-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221222205511.675832-2-david@redhat.com
Fixes: 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11 16:14:19 -08:00
Yang Yingliang
f12cd06109 powerpc/64s/hash: Make stress_hpt_timer_fn() static
stress_hpt_timer_fn() is only used in hash_utils.c, make it static.

Fixes: 6b34a099fa ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221228093603.3166599-1-yangyingliang@huawei.com
2023-01-12 10:53:37 +11:00
Linus Torvalds
e8f60cd7db perf tools fixes for v6.2: 2nd batch
- Make 'perf kmem' cope with the removal of some kmem:kmem_cache_alloc_node and
   kmem:kmalloc_node in the 11e9734bcb ("mm/slab_common: unify NUMA and
   UMA version of tracepoints") commit, making sure it works with Linux >= 6.2 as well
   as with older kernels where those tracepoints are present.
 
 - Also make it handle the new "node" kmem:kmalloc and kmem:kmem_cache_alloc tracepoint
   field introduced in that same commit.
 
 - Fix hardware tracing PMU address filter duplicate symbol selection, that was
   preventing to match with static functions with the same name present in different
   object files.
 
 - Fix regression on what linux/types.h file gets used to build the "BPF prologue"
   'perf test' entry, the system one lacks the fmode_t definition used in this test,
   so provide that type in the test itself.
 
 - Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1. If the user asks for
   linking with the libbpf package provided by the distro, then it has to be >= 0.8.0.
   Using the libbpf supplied with the kernel would be a fallback in that case.
 
 - Fix the build when libbpf isn't available or explicitly disabled via NO_LIBBPF=1.
 
 - Don't try to install libtraceevent plugins as its not anymore in the kernel sources
   and will thus always fail.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY77xBAAKCRCyPKLppCJ+
 J9B5AQCjCu18vsulBd0nRcfBZDGdw2P53OJl10slaAdU4t6RLgD/fwhRZ5PlmDiY
 TD+rARyZYsOsZ+9bVpJTZKh3nuCnewo=
 =T8iN
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Make 'perf kmem' cope with the removal of some
   kmem:kmem_cache_alloc_node and kmem:kmalloc_node in the
   11e9734bcb ("mm/slab_common: unify NUMA and UMA version of
   tracepoints") commit, making sure it works with Linux >= 6.2 as well
   as with older kernels where those tracepoints are present.

 - Also make it handle the new "node" kmem:kmalloc and
   kmem:kmem_cache_alloc tracepoint field introduced in that same
   commit.

 - Fix hardware tracing PMU address filter duplicate symbol selection,
   that was preventing to match with static functions with the same name
   present in different object files.

 - Fix regression on what linux/types.h file gets used to build the "BPF
   prologue" 'perf test' entry, the system one lacks the fmode_t
   definition used in this test, so provide that type in the test
   itself.

 - Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1. If the
   user asks for linking with the libbpf package provided by the distro,
   then it has to be >= 0.8.0. Using the libbpf supplied with the kernel
   would be a fallback in that case.

 - Fix the build when libbpf isn't available or explicitly disabled via
   NO_LIBBPF=1.

 - Don't try to install libtraceevent plugins as its not anymore in the
   kernel sources and will thus always fail.

* tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf auxtrace: Fix address filter duplicate symbol selection
  perf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1
  perf build: Fix build error when NO_LIBBPF=1
  perf tools: Don't install libtraceevent plugins as its not anymore in the kernel sources
  perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring
  perf kmem: Support legacy tracepoints
  perf build: Properly guard libbpf includes
  perf tests bpf prologue: Fix bpf-script-test-prologue test compile issue with clang
2023-01-11 17:12:14 -06:00
David Woodhouse
310bc39546 KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
In commit 14243b3871 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
and event channel delivery") the clever version of me left some helpful
notes for those who would come after him:

       /*
        * For the irqfd workqueue, using the main kvm->lock mutex is
        * fine since this function is invoked from kvm_set_irq() with
        * no other lock held, no srcu. In future if it will be called
        * directly from a vCPU thread (e.g. on hypercall for an IPI)
        * then it may need to switch to using a leaf-node mutex for
        * serializing the shared_info mapping.
        */
       mutex_lock(&kvm->lock);

In commit 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
in the wrong order.

Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.

Fixes: 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-4-dwmw2@infradead.org>
[Rebase, add docs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 17:45:58 -05:00
Bjorn Helgaas
a48fe63769 x86/pci: Simplify is_mmconf_reserved() messages
is_mmconf_reserved() takes a "with_e820" parameter that only determines the
message logged if it finds the MMCONFIG region is reserved.  Pass the
message directly, which will simplify a future patch that adds a new way of
looking for that reservation.  No functional change intended.

Link: https://lore.kernel.org/r/20230110180243.1590045-2-helgaas@kernel.org
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
2023-01-11 16:28:09 -06:00
Akira Yokosawa
a33ae832bf docs/conf.py: Use about.html only in sidebar of alabaster theme
"about.html" is available only for the alabaster theme [1].
Unconditionally putting it to html_sidebars prevents us from
using other themes which respect html_sidebars.

Remove about.html from the initialization and insert it at the
front for the alabaster theme.

Link: [1] https://alabaster.readthedocs.io/en/latest/installation.html#sidebars
Fixes: d5389d3145 ("docs: Switch the default HTML theme to alabaster")
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/4b162dbe-2a7f-1710-93e0-754cf8680aae@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-11 15:06:50 -07:00
Andrew Morton
12b98f333f Merge branch 'master' into mm-stable 2023-01-11 14:01:38 -08:00
Heiko Carstens
1ecf7bd9c2 s390: update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11 21:26:40 +01:00