2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-17 17:53:56 +08:00
Commit Graph

1014917 Commits

Author SHA1 Message Date
Sven Schnelle
17e89e1340 s390/facilities: move stfl information from lowcore to global data
With gcc-11, there are a lot of warnings because the facility functions
are accessing lowcore through a null pointer. Fix this by moving the
facility arrays away from lowcore.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Sven Schnelle
af9ad82290 s390/entry: use assignment to read intcode / asm to copy gprs
arch/s390/kernel/syscall.c: In function __do_syscall:
arch/s390/kernel/syscall.c:147:9: warning: memcpy reading 64 bytes from a region of size 0 [-Wstringop-overread]
  147 |         memcpy(&regs->gprs[8], S390_lowcore.save_area_sync, 8 * sizeof(unsigned long));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/s390/kernel/syscall.c:148:9: warning: memcpy reading 4 bytes from a region of size 0 [-Wstringop-overread]
  148 |         memcpy(&regs->int_code, &S390_lowcore.svc_ilc, sizeof(regs->int_code));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by moving the gprs restore from C to assembly, and use a assignment
for int_code instead of memcpy.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Niklas Schnelle
d460bb6c64 s390: enable HAVE_IOREMAP_PROT
In commit b02002cc4c ("s390/pci: Implement ioremap_wc/prot() with
MIO") we implemented both ioremap_wc() and ioremap_prot() however until
now we had not set HAVE_IOREMAP_PROT in Kconfig, do so now.

This also requires implementing pte_pgprot() as this is used in the
generic_access_phys() code enabled by CONFIG_HAVE_IOREMAP_PROT. As with
ioremap_wc() we need to take the MMIO Write Back bit index into account.

Moreover since the pgprot value returned from pte_pgprot() is to be used
for mappings into kernel address space we must make sure that it uses
appropriate kernel page table protection bits. In particular a pgprot
value originally coming from userspace could have the _PAGE_PROTECT
bit set to enable fault based dirty bit accounting which would then make
the mapping inaccessible when used in kernel address space.

Fixes: b02002cc4c ("s390/pci: Implement ioremap_wc/prot() with MIO")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Thomas Richter
15e5b53ff4 s390/cpumf: remove WARN_ON_ONCE in counter start handler
Remove some WARN_ON_ONCE() warnings when a counter is started. Each
counter is installed function calls
event_sched_in() --> cpumf_pmu_add(..., PERF_EF_START).

This is done after the event has been created using
perf_pmu_event_init() which verifies the counter is valid.
Member hwc->config must be valid at this point.

Function cpumf_pmu_start(..., PERF_EF_RELOAD) is called from
function cpumf_pmu_add() for counter events. All other invocations of
cpumf_pmu_start(..., PERF_EF_RELOAD) are from the performance subsystem
for sampling events.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Thomas Richter
d552a58d70 s390/cpumf: remove counter transaction call backs
The command 'perf stat -e cycles ...' triggers the following function
sequence in the CPU Measurement Facility counter device driver:

perf_pmu_event_init()
  __hw_perf_event_init()
    validate_ctr_auth()
    validate_ctr_version()

During event creation, the counter number is checked in functions
validate_ctr_auth() and validate_ctr_version() to verify it is a valid
counter and supported by the hardware. If this is not the case, both
functions return an error and the event is not created. System call
perf_event_open() returns an error in this case.

Later on the event is installed in the kernel event subsystem and the
driver functions cpumf_pmu_add() and cpumf_pmu_commit_txn() are called
to install the counter event by the hardware.

Since both events have been verified at event creation, there is no need
to re-evaluate the authorization state. This can not change since on
 * LPARs the authorization change requires a restart of the LPAR (and
   thus a reboot of the kernel)
 * DPMs can not take resources away, just add them.

Also the sequence of CPU Measurement facility counter device driver
calls is
  cpumf_pmu_start_txn
  cpumf_pmu_add
  cpumf_pmu_start
  cpumf_pmu_commit_txn
for every single event. Which means the condition in cpumf_pmu_add()
is never met and validate_ctr_auth() is never called.

This leaves the counter device driver transaction functions with
just one task:
start_txn: Verify a transaction is not in flight and call
	perf_pmu_disable()
cancel_txn, commit_txn: Verify a transaction is in flight and call
	perf_pmu_enable()

The same functionality is provided by the default transaction handling
functions in kernel/events/core.c. Use those by removing the
counter device driver private call back functions.

Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-07 17:06:58 +02:00
Linus Torvalds
614124bea7 Linux 5.13-rc5 2021-06-06 15:47:27 -07:00
Linus Torvalds
90d56a3d6e SCSI fixes on 20210606
Five small and fairly minor fixes, all in drivers.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYLzsbiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRkoAQD07kLp
 JSHpsn97DOdDpCYu+GoLtHz9uJ9Keh+61hbv+gEAoruwy+STPC3MiKP6IW4b1i/R
 U66kS0NWYkGqOITA2Xs=
 =Wqvj
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Five small and fairly minor fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
  scsi: ufs: ufs-mediatek: Fix HCI version in some platforms
  scsi: qedf: Do not put host in qedf_vport_create() unconditionally
  scsi: lpfc: Fix failure to transmit ABTS on FC link
  scsi: target: core: Fix warning on realtime kernels
2021-06-06 15:39:56 -07:00
Linus Torvalds
20e41d9bc8 Miscellaneous ext4 bug fixes for v5.13
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmC82AQACgkQ8vlZVpUN
 gaOkAgf+KH57P/P0sB6aVBHpAzqa9jTKJWMA5kpCqYUDkYlfF7n2hwsjMzWpJ5MY
 ZvFpKAflmRnve/ULUZQX6+zrcbieNs3e+6VFZrZ0PmxN0dupyISLY7jnvCRDleA7
 BFO34AcH+QEst9zXJmgta9eoy3LA8sawhQ/d7ujVY+IRFk40m26fuAMiaGznlQJ5
 dmrx7pHZWKFIDFIg2TdFlP+Voqbxs2VTT16gmWpGBdTyWYHKjbSOLKJFc9DwYeE9
 aANf6iIzwXz7y9pZiOnTrGuKDEJcIZNESkbIqw62YgqsoObLbsbCZNmNcqxyHpYQ
 Mh3L59KtmjANW3iOxQfyxkNTugxchw==
 =BSnf
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
  ext4: fix no-key deletion for encrypt+casefold
  ext4: fix memory leak in ext4_fill_super
  ext4: fix fast commit alignment issues
  ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  ext4: fix accessing uninit percpu counter variable with fast_commit
  ext4: fix memory leak in ext4_mb_init_backend on error path.
2021-06-06 14:24:13 -07:00
Linus Torvalds
decad3e1d1 ARM: SoC fixes for 5.13
A set of fixes that have been coming in over the last few weeks, the
 usual mix of fixes:
 
  - DT fixups for TI K3
  - SATA drive detection fix for TI DRA7
  - Power management fixes and a few build warning removals for OMAP
  - OP-TEE fix to use standard API for UUID exporting
  - DT fixes for a handful of i.MX boards
  ... plus a few other smaller items
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAmC9IRYPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx3m/sQAI8JaXHnUBbXAAn0ugMLaTfXA1FKI+in+O43
 xobyBwe6R4XNAr1GQvLYXF6vY3eaMHrcS/a1dU96uH7h3B8aq9ZR9DqrpJ5VNu37
 q55t+brJkEI8+t5u08u6jrxfVARu4FbyZB4jIkbIwxnyEynx/8Bl7sLYImOhspkb
 ipEkgPOSGcIDUcsI1eXIb6urQYkE0yssy20qNYFQ4neYS4Su4gy+LA2OTSqRHk/s
 uQtfJZGN9LbxUFJ4mho349hebjM3rjXw2ox9Znk4bIhBbW38sppjstQ8lhi5YhbW
 RlKDEIYcc/Fo/Hy1xoiAY2MZ7lMMUnXOEhMQmzQd1AZVNH0ysr8vuSEBy9mxF36r
 Jx3eXWDZulDnhM/3eA8GJg3o5kcCZcymFHxo2X0bMeiMlXZ77QP8XV5Y5/C6LQf3
 wJ4lncq0Q4wc9q7W2uGCq/JrMbJxOTjh6nbLlOIENHliRgwk48aoHY4RlOtKpws7
 82S/vCAAj/08mpV2AzZtTtUxIrvgJcdKm9freXZOoUp53yvHbltLZXm2bBrfLdh1
 qM0vT9+skUajxWXG0HpTNJSig3DeAMmfwC3As4kjibA8Jtr8y7249Z/xKPXJrZI+
 5lvi4S2AT/QJbER0jUe6nIamS9RIFnDy0+J0BPvfJ6VpDMIM18rn6M4iev0am9v7
 MDI6Mm2+
 =TbGI
 -----END PGP SIGNATURE-----

Merge tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Olof Johansson:
 "A set of fixes that have been coming in over the last few weeks, the
  usual mix of fixes:

   - DT fixups for TI K3

   - SATA drive detection fix for TI DRA7

   - Power management fixes and a few build warning removals for OMAP

   - OP-TEE fix to use standard API for UUID exporting

   - DT fixes for a handful of i.MX boards

  And a few other smaller items"

* tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
  arm64: meson: select COMMON_CLK
  soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
  ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
  bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
  ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
  ARM: dts: imx7d-pico: Fix the 'tuning-step' property
  ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
  arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
  arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
  ARM: imx: pm-imx27: Include "common.h"
  arm64: dts: zii-ultra: fix 12V_MAIN voltage
  arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
  arm64: dts: ls1028a: fix memory node
  bus: ti-sysc: Fix am335x resume hang for usb otg module
  ARM: OMAP2+: Fix build warning when mmc_omap is not built
  ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
  ARM: OMAP1: Fix use of possibly uninitialized irq variable
  optee: use export_uuid() to copy client UUID
  arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
  arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
  ...
2021-06-06 13:00:36 -07:00
Linus Torvalds
bd7b12aa60 powerpc fixes for 5.13 #5
Fix our KVM reverse map real-mode handling since we enabled huge vmalloc (in some
 configurations).
 
 Revert a recent change to our IOMMU code which broke some devices.
 
 Fix KVM handling of FSCR on P7/P8, which could have possibly let a guest crash it's Qemu.
 
 Fix kprobes validation of prefixed instructions across page boundary.
 
 Thanks to: Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas, Frederic Barrat, Naveen
 N. Rao, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmC8wi8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgN42D/4vHCHX4T0CZ/5bwh1RMOoGKM+PFyLe
 BoA2i8lvUILG1+LOiRJuBnVZiWwKYBqfkkfY4BmQpU3Oe3gjbJJwc9QGGHUDarWn
 NmMPqVgaO5qXObObKXzBU1Ihq4UQwMhK044srzXcgMYyTnSFNgWQAsvO0+0Cl4K4
 uT100AFV4tps8dLCHCq2XVHuQALnHzZah4yQ8i6u1TMN/TK+kXyONrMSCgsQ1mrM
 dDsT1zVeegj8EuW/n9kXkLNp2YZeatptZB7cPDtojlhCQTsZBcKnYtDq5ScASuwy
 7hGjzA2SyWsa6l0Iejoj8tr/ZS8Nutftz3izuhDNLEf4foz0tOWqxbXJayOA5J7w
 vzs9OSFbT6z/svELSIkRCvfePqUdDdC2MthWoShgv0SoIXj+Y7ABKQRW9B5rLeF5
 RiB2kCB+7S/03qjDtn57IlJC6aVoHzglTAdYXuj7guUEsZQrmtsdm1IM4eB0XYyx
 A9/AMCGSbswT0/IUriO4b9FtWGOJJf1vWv3WeqE63gPxqhyTz1ACqMT/0HLrARJZ
 /QLZrbuOSMBSGDnmJxy3vzb+3fxGxSGrUcoYc6MiSODuRgf7zHuRJsSDwoftnOTW
 PXVWPVz9ef0OEmuBJyEgTrO+/g9jjCPw8UJz9EaFzkMHbaoHRuZdo2m8X6zrXQLh
 AUVlDkkSmblY9w==
 =KkfQ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix our KVM reverse map real-mode handling since we enabled huge
  vmalloc (in some configurations).

  Revert a recent change to our IOMMU code which broke some devices.

  Fix KVM handling of FSCR on P7/P8, which could have possibly let a
  guest crash it's Qemu.

  Fix kprobes validation of prefixed instructions across page boundary.

  Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
  Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"

* tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
  KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
  powerpc: Fix reverse map real-mode address lookup with huge vmalloc
  powerpc/kprobes: Fix validation of prefixed instructions across page boundary
2021-06-06 12:39:36 -07:00
Linus Torvalds
773ac53bbf - Fix out-of-spec hardware (1st gen Hygon) which does not implement
MSR_AMD64_SEV even though the spec clearly states so, and check CPUID
 bits first.
 
 - Send only one signal to a task when it is a SEGV_PKUERR si_code type.
 
 - Do away with all the wankery of reserving X amount of memory in
 the first megabyte to prevent BIOS corrupting it and simply and
 unconditionally reserve the whole first megabyte.
 
 - Make alternatives NOP optimization work at an arbitrary position
 within the patched sequence because the compiler can put single-byte
 NOPs for alignment anywhere in the sequence (32-bit retpoline), vs our
 previous assumption that the NOPs are only appended.
 
 - Force-disable ENQCMD[S] instructions support and remove update_pasid()
 because of insufficient protection against FPU state modification in an
 interrupt context, among other xstate horrors which are being addressed
 at the moment. This one limits the fallout until proper enablement.
 
 - Use cpu_feature_enabled() in the idxd driver so that it can be
 build-time disabled through the defines in .../asm/disabled-features.h.
 
 - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
 LVT value is read before APIC initialization so that softlockups during
 boot do not happen at least on one machine.
 
 - Mark all legacy interrupts as legacy vectors when the IO-APIC is
 disabled and when all legacy interrupts are routed through the PIC.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmC8fdEACgkQEsHwGGHe
 VUqO5A/+IbIo8myl8VPjw6HRnHgY8rsYRjxdtmVhbaMi5XOmTMfVA9zJ6QALxseo
 Mar8bmWcezEs0/FmNvk1vEOtIgZvRVy5RqXbu3W2EgWICuzRWbj822q+KrkbY0tH
 1GWjcZQO8VlgeuQsukyj5QHaBLffpn3Fh1XB8r0cktZvwciM+LRNMnK8d6QjqxNM
 ctTX4wdI6kc076pOi7MhKxSe+/xo5Wnf27lClLMOcsO/SS42KqgeRM5psWqxihhL
 j6Y3Oe+Nm+7GKF8y841PUSlwjgWmlZa6UkR6DBTP7DGnHDa5hMpzxYvHOquq/SbA
 leV9OLqI0iWs56kSzbEcXo7do1kld62KjsA2KtUhJfVAtm+igQLh5G0jESBwrWca
 TBWaE5kt6s8wP7LXeg26o4U8XD8vqEH88Tmsjlgqb/t/PKDV9PMGvNpF00dPZFo6
 Jhj2yntJYjLQYoAQLuQm5pfnKhZy3KKvk7ViGcnp3iN9i4eU9HzawIiXnliNOrTI
 ohQ9KoRhy1Cx0UfLkR+cdK4ks0u26DC2/Ewt0CE5AP/CQ1rX6Zbv2gFLjSpy7yQo
 6A99HEpbaLuy3kDt5vn91viPNUlOveuIXIdHp6u+zgFfx88eLUoEvfR135aV/Gyh
 p5PJm/BO99KByQzFCnilkp7nBeKtnKYSmUojA6JsZKjzJimSPYo=
 =zRI1
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "A bunch of x86/urgent stuff accumulated for the last two weeks so
  lemme unload it to you.

  It should be all totally risk-free, of course. :-)

   - Fix out-of-spec hardware (1st gen Hygon) which does not implement
     MSR_AMD64_SEV even though the spec clearly states so, and check
     CPUID bits first.

   - Send only one signal to a task when it is a SEGV_PKUERR si_code
     type.

   - Do away with all the wankery of reserving X amount of memory in the
     first megabyte to prevent BIOS corrupting it and simply and
     unconditionally reserve the whole first megabyte.

   - Make alternatives NOP optimization work at an arbitrary position
     within the patched sequence because the compiler can put
     single-byte NOPs for alignment anywhere in the sequence (32-bit
     retpoline), vs our previous assumption that the NOPs are only
     appended.

   - Force-disable ENQCMD[S] instructions support and remove
     update_pasid() because of insufficient protection against FPU state
     modification in an interrupt context, among other xstate horrors
     which are being addressed at the moment. This one limits the
     fallout until proper enablement.

   - Use cpu_feature_enabled() in the idxd driver so that it can be
     build-time disabled through the defines in disabled-features.h.

   - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
     LVT value is read before APIC initialization so that softlockups
     during boot do not happen at least on one machine.

   - Mark all legacy interrupts as legacy vectors when the IO-APIC is
     disabled and when all legacy interrupts are routed through the PIC"

* tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Check SME/SEV support in CPUID first
  x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
  x86/setup: Always reserve the first 1M of RAM
  x86/alternative: Optimize single-byte NOPs at an arbitrary position
  x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
  dmaengine: idxd: Use cpu_feature_enabled()
  x86/thermal: Fix LVT thermal setup for SMI delivery mode
  x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
2021-06-06 12:25:43 -07:00
Daniel Rosenberg
e71f99f2df ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
Encrypted casefolding is only supported when both encryption and
casefolding are both enabled in the config.

Fixes: 471fbbea7f ("ext4: handle casefolding with encryption")
Cc: stable@vger.kernel.org # 5.13+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Daniel Rosenberg
63e7f12893 ext4: fix no-key deletion for encrypt+casefold
commit 471fbbea7f ("ext4: handle casefolding with encryption") is
missing a few checks for the encryption key which are needed to
support deleting enrypted casefolded files when the key is not
present.

This bug made it impossible to delete encrypted+casefolded directories
without the encryption key, due to errors like:

    W         : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key

Repro steps in kvm-xfstests test appliance:
      mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc
      mount /vdc
      mkdir /vdc/dir
      chattr +F /vdc/dir
      keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}')
      xfs_io -c "set_encpolicy $keyid" /vdc/dir
      for i in `seq 1 100`; do
          mkdir /vdc/dir/$i
      done
      xfs_io -c "rm_enckey $keyid" /vdc
      rm -rf /vdc/dir # fails with the bug

Fixes: 471fbbea7f ("ext4: handle casefolding with encryption")
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Alexey Makhalov
afd09b617d ext4: fix memory leak in ext4_fill_super
Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill current buffers and
page cache by using kill_bdev(). And then super block will be reread
again but using correct blocksize this time. sb_set_blocksize() didn't
fully free superblock page and buffer head, and being busy, they were
not freed and instead leaked.

This can easily be reproduced by calling an infinite loop of:

  systemctl start <ext4_on_lvm>.mount, and
  systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, and
the bh leak get amplified by a dying memory cgroup that also never
gets freed, and memory consumption is much more easily noticed.

Fixes: ce40733ce9 ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec11 ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2021-06-06 10:10:23 -04:00
Harshad Shirwadkar
a7ba36bc94 ext4: fix fast commit alignment issues
Fast commit recovery data on disk may not be aligned. So, when the
recovery code reads it, this patch makes sure that fast commit info
found on-disk is first memcpy-ed into an aligned variable before
accessing it. As a consequence of it, we also remove some macros that
could resulted in unaligned accesses.

Cc: stable@kernel.org
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Ye Bin
082cd4ec24 ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
We got follow bug_on when run fsstress with injecting IO fault:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
[130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995]  ext4_readpage+0x54/0x100 [ext4]
[130747.337359]  generic_file_buffered_read+0x410/0xae8
[130747.337767]  generic_file_read_iter+0x114/0x190
[130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556]  __vfs_read+0x11c/0x188
[130747.338851]  vfs_read+0x94/0x150
[130747.339110]  ksys_read+0x74/0xf0

This patch's modification is according to Jan Kara's suggestion in:
https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
"I see. Now I understand your patch. Honestly, seeing how fragile is trying
to fix extent tree after split has failed in the middle, I would probably
go even further and make sure we fix the tree properly in case of ENOSPC
and EDQUOT (those are easily user triggerable).  Anything else indicates a
HW problem or fs corruption so I'd rather leave the extent tree as is and
don't try to fix it (which also means we will not create overlapping
extents)."

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:09:55 -04:00
Linus Torvalds
f5b6eb1e01 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Some more bugfixes from I2C for v5.13. Usual stuff"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
  i2c: qcom-geni: Add shutdown callback for i2c
  i2c: tegra-bpmp: Demote kernel-doc abuses
  i2c: altera: Fix formatting issue in struct and demote unworthy kernel-doc headers
2021-06-05 15:45:11 -07:00
Olof Johansson
b9c112f2c2 Devicetree fixes for TI K3 platforms for v5.13 merge window:
These minor fixes include:
 * Fixups for device tree discovered during yaml conversion
 * Fixups for missing dma-coherent property in j7200
 * Removal of camera sensor node from am65 evm dts to overlay
   as camera sensor boards are variable.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE+KKGk1TrgjIXoxo03bWEnRc2JJ0FAmCjqiAACgkQ3bWEnRc2
 JJ0BvQ//VZl9llR0VWY9mADZBxIZ4tKwEGK9tK1fvQx44oWfjy30YkoN5UNDVbOO
 RsSNPOiaUZbGUBjmVXJlxsjgDgpFsA3jGYaNwOiPHsN6CcglNug8W2roytmPO7j2
 +csfGvSXCgFZX/7fnaqyv5lTjZE2xoBH4btacSPkil34Na7sgZhEVJ1aZQ3gWYzl
 occPgPPqR+8AK3mVVDk7C43eVGNUAs2VhxM7BoIUYNJrT2UZt4eWziFOBLq8ULSa
 O3K6YqUNm59qSe5Y3Vy3oGItVtEHJi+6wsYLKOeYPL+DuomyO9GVNH5Qa9IiD/AU
 E7zNaG3Chx1YygRNvaIZ5g/DD3iLVFxa5nOigdqu/VqzO80NJnw4OWt6/F8U1czX
 TZscs1u1x9LmtWLhdfPBJ6O7qwuKKobVBPo+CqrviHbXebjJJdwRXbj8Bkf2IIKB
 612V6kXwLJ4Q/2XWaDSZHoYw1GMjHE/qQg05o0GltkrHtes40LnAHNe1aB9yKCIJ
 hMWK9Hp+asr5Jic7aJOXcFPWn6P10hp6jdEoV3Ihs5lygLdqHhrcQvjpT804z4Hz
 4aWbHWTLODvLQwI9OBiiT3sr+H0s6HOoBQ6svzqoWsEBwRpukxtGCl+A19vj+hHg
 far/47XRJBtDW+e8EV1pVhxQ7NqDjYjyBY4irgn2+FgH9WAPZJc=
 =veTm
 -----END PGP SIGNATURE-----

Merge tag 'ti-k3-dt-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux into arm/fixes

Devicetree fixes for TI K3 platforms for v5.13 merge window:

These minor fixes include:
* Fixups for device tree discovered during yaml conversion
* Fixups for missing dma-coherent property in j7200
* Removal of camera sensor node from am65 evm dts to overlay
  as camera sensor boards are variable.

* tag 'ti-k3-dt-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux:
  arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
  arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
  arm64: dts: ti: k3-*: Rename the TI-SCI node
  arm64: dts: ti: k3-am65-wakeup: Drop un-necessary properties from dmsc node
  arm64: dts: ti: k3-am65-wakeup: Add debug region to TI-SCI node
  arm64: dts: ti: k3-*: Rename the TI-SCI clocks node name
  arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
  arm64: dts: ti: k3-am654-base-board: remove ov5640

Link: https://lore.kernel.org/r/20210518115634.467vgpbzplal5kou@obituary
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:43:48 -07:00
Olof Johansson
7468bed8f8 OP-TEE use export_uuid() to copy UUID
-----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCgA4FiEEFV+gSSXZJY9ZyuB5LinzTIcAHJcFAmCjhswaHGplbnMud2lr
 bGFuZGVyQGxpbmFyby5vcmcACgkQLinzTIcAHJfM7Q//f4hdxcAb+iLiFwJVj7IL
 0kJ5MUj+4NyxdzxWBsqHE80Vf0MxumAaxLLKLl1SrV1os3nKUQENG7+XAdC/5ozT
 m3Qcw0qTQRQ3eC3wMDM150fmnpadTkeOmwReQrsHBdczP+fuWbw9q+JN9dN3t2y8
 kaym43DtGoIbUfmyqmac5YbO80mE0QAq73wcqYqRcNQk56fS2nsPFEUTqDA9iQTp
 lAVtMoYC65SdHFHsxMe46TMme0vnG99sN2MMLwNDfJdYjQI1Dzn3NRqgVMEmGvNP
 ZShtvePSoaF/GS3axzlE2dxJcHPyC7bLSoTYS4s0FjWQ8AcJMqQrL2YeCJieny4Y
 2SILHN+25xDVyAimW8dQQNJBnDP/zJlnL6FBv6b7I1rynQYy733NbZ8TedHlHptF
 WEV47IyYVvaQs18wg24MD4wjtVDzK8s/r+frpBOrTiRr/AYJqqu8JtKYJHKNRpqZ
 EGN9o54fe4Yo97fkcyqQkE/vYdsRhJfz29gPEMaITSwOEu11xIiM2ia7KyARWbjo
 wJzZJY4AH2Gs0LeoEfMwTyKCH4KGvziBVzd/Dbc8ri1mzpgoRuDN1IkMcbBoZghQ
 1xjlRwZKk4yMKO5Sj4jOd6bp03qpFL+9lD0MKSD70Z86DN8Ods0G2QwgdhdHBe9o
 7gQE8NyvTmHWZ93aS1Fer1k=
 =1uWY
 -----END PGP SIGNATURE-----

Merge tag 'optee-fix-for-v5.13' of git://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes

OP-TEE use export_uuid() to copy UUID

* tag 'optee-fix-for-v5.13' of git://git.linaro.org/people/jens.wiklander/linux-tee:
  optee: use export_uuid() to copy client UUID

Link: https://lore.kernel.org/r/20210518100712.GA449561@jade
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:43:11 -07:00
Olof Johansson
2f3e4eb179 PM and build warning fixes for omaps
While chasing system suspend related regressions, I noticed few other
 issues related to PM would be good to have fixed:
 
 - UART idling does not always work for hardware autoidle features
 - am335x resume works only the first time unless musb module is loaded
 
 Then there are three patches for omap1 related warnings caused by the gpio
 changes, and one build warning fix for legacy mmc platform code when mmc
 is built as a loadable module.
 
 These can all be merged whenever suitable naturally. I've sent the more
 urgent SATA regression fix separately although it appears in this pull
 request too because of the branches merged.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAmC3IlsRHHRvbnlAYXRv
 bWlkZS5jb20ACgkQG9Q+yVyrpXOQww/+KyrCAWN2184qd4LinTVoBnh0Uuh74MIb
 c3y1oXUD6srbCn1ZQZh33dw6ek/lMAqRSZYtxS81/ByfsPxiwV1M5TKiLM/HHfGD
 qb9cxQ4pldWM85bN18QiDsWpN7VngtluqD61nSk5fFGWFJ7FB3Gkaq0pwFY69XR2
 pxi9jdiJzGZmfrth/1k10K/oosYZemtGMtbCJ7kaeXTHpi0n6cycK3HYdWpSNaG6
 ZJOYc0YQtHjgMImMZC2b7cpcg8N5bXVoC0znBoePnHAMHr3K3ucCz+7j+jOXeAjU
 ADMoUCm5l4Yp1ibPeC1b7zaLqOOMXwiRLoOypTqpUTfCR2yWsaxsw7rmt44me0y0
 XRjFfrMKRBzWnD2a3zPYK6OHzErV8A1rzJnFLqQ2cDW0GGZe6bI/mlwfTSqMu1Ih
 dFY4J66gkIdNtEfivKF0aOqwWUytoJEKI4KTnWdJuLFk/FYvv40FOrLGeiYb1jpT
 ijnkdmn7UEgqUrOwDdaIa90YIVf6SiENerrBCiDUuMD2tB+KYZxYxWuv7wTOcJGK
 Nz3KMwzD8nttIzPXKDWUuOQofZwkPb61rtyoEnlXAEsRxOBg9QRkU/wzZMDzoIok
 i2T2ubL1ifrby7KN46zdO+Nw58c37Z1y0TCz8IAiSIrKpjQP5TMySHpfvVh2odjc
 g0gRsRi3j5M=
 =/LY8
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v5.13/fixes-pm' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

PM and build warning fixes for omaps

While chasing system suspend related regressions, I noticed few other
issues related to PM would be good to have fixed:

- UART idling does not always work for hardware autoidle features
- am335x resume works only the first time unless musb module is loaded

Then there are three patches for omap1 related warnings caused by the gpio
changes, and one build warning fix for legacy mmc platform code when mmc
is built as a loadable module.

These can all be merged whenever suitable naturally. I've sent the more
urgent SATA regression fix separately although it appears in this pull
request too because of the branches merged.

* tag 'omap-for-v5.13/fixes-pm' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
  bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
  bus: ti-sysc: Fix am335x resume hang for usb otg module
  ARM: OMAP2+: Fix build warning when mmc_omap is not built
  ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
  ARM: OMAP1: Fix use of possibly uninitialized irq variable

Link: https://lore.kernel.org/r/pull-1622614772-543196@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:41:44 -07:00
Olof Johansson
94277cb5b4 Regression fix for TI dra7 SATA not detecting drives
The SATA quirk flags are no missing With recent removal of legacy
 platform data and we need to add the quirk flags to detect drives.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAmC3HPcRHHRvbnlAYXRv
 bWlkZS5jb20ACgkQG9Q+yVyrpXOm5BAAyx2d3sgsredffn9njQw75oEARUiwt7kB
 uMVvvgqKQw8hluixHgxwrCQwQURBVBmQqZ6yHVPRhDCRqeYAZibYxdkfCllS/fa5
 0/b8dHLccXehaX6TYuhyqVMy+2atHSYE9blc4s+/ijMfrytIqORjz2yfIhILItgu
 nBaHR0HvYqvwESOYL48wg00Plbf4PYhxOrwJiI+9njuuQntQJ66EbK4+OWIVzxS8
 jxoqQvAqdbuXLKd7ZHahGhBus4nK3ePVBkDHgHEKDeufbGt77euueRAR5IaC3liR
 Y2u2q9Fn73Ae8sGAYy379O/CWXpdQr3Fy5gMtH36cWhEjqowUIftkeRu7yovHHjS
 WFT2c5fOIYrFoNPnw9Sm4TQgNvBleUNK6DmDeZ/Z1DrtjQyuZKUJ2tWkNm+m4O7n
 DE/HOuH6AAAkW0cvZvhJK4vZQNagN7CGy/fEARo2/hR/nMjZFRFD/NO7tUtuocwy
 /7uuPyX1oH0xbRJpAjSMLQVk0LU8Heb84pLU3PGk0jsDL2BfRyUUpOdfFWgdXeLS
 9jRKQT/nupaqDYFi/j10KIOuy1owanmnm8pW5qzspVMFnWSsbcmYDP2QcAdbk4tF
 1EbfoIFp/OKp5t2Efqkl2rozmpdKqowX4Ww/xQMeGlBqZXYkEP/lpr3uUlirP65U
 bpnEztC0v7g=
 =Ssh+
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v5.13/fixes-sata' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

Regression fix for TI dra7 SATA not detecting drives

The SATA quirk flags are no missing With recent removal of legacy
platform data and we need to add the quirk flags to detect drives.

* tag 'omap-for-v5.13/fixes-sata' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  bus: ti-sysc: Fix missing quirk flags for sata

Link: https://lore.kernel.org/r/pull-1622613578-121536@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:39:58 -07:00
Olof Johansson
3091a9e742 Amlogic fixes for v5.13-rc1
- arm64: meson: select COMMON_CLK to select a proper implementation of the clock API
 - soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPVPGJshWBf4d9CyLd9zb2sjISdEFAmC0kP8ACgkQd9zb2sjI
 SdHINRAAl/YoblNGN8pbVjICgaPA+vwKJHcI6iHr6Fp3ErXpmqSRKQsLPYpfx6LO
 HdqItR02fL8KbGjazRVldowy1yd+XWlXaZzqilX92ICyTIVHW/oh9IqpR9PBF718
 e+x9BYDqMxJDj2qc7h3KjbHOYXneXrScd8NchWxI6PjI11+294jQHJ0kOYJeDYQB
 JRIsJq2BsxoUpncodT0GK6FRnDpqIxFTUrKa3Rs7yKSQRaw+XTO7KTFXwDQTTgE8
 YuY47If4UVM+qWBZbP3qAmazxPpBj2NR/zecK8w07A3470vVA9NvB5pdLkK4I9Ax
 qWg8L+uq6CbndYOfNAOGHC420LmVa+95oiN/5/iakXvBCKRhmLLZ230nM08vJxqb
 USLGAEBRx9w/x1y/N3jqE27TyQo4mYWJbeAiXniu3t4M7+o16hpocKVgH8pFMlV7
 hHvj8X+8F8dCeYUtTAY2h/upOVXgYsX9D15NsJCg5uJW+mQ4qW/5XvCuFFstfrlL
 XauNUNyyYLOWlmrvRcoSivCmykBBgkCx5vGydlwZIPf/wa1/tJ0VINxOC2qlGj/l
 rX2BUoLUx0hUoDsmnPxLQGn4ArONhpNwySFTZavnG2fKMO0p2VOluvoLkw+ovepm
 iToxEKU/c0CHLfx74MEqAmiK+K3/EM6RX4EodX4ZdshpRJYojcI=
 =z6p+
 -----END PGP SIGNATURE-----

Merge tag 'amlogic-fixes-v5.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into arm/fixes

Amlogic fixes for v5.13-rc1
- arm64: meson: select COMMON_CLK to select a proper implementation of the clock API
- soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()

* tag 'amlogic-fixes-v5.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
  arm64: meson: select COMMON_CLK
  soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()

Link: https://lore.kernel.org/r/73e76706-f3f4-bebf-10dd-d2ec9106a234@baylibre.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:39:23 -07:00
Olof Johansson
3a2d3ae067 i.MX fixes for 5.13:
- Fix missing-prototypes warning of 'imx27_pm_init' in i.MX27 platform
   pm code.
 - A couple of patches from Fabio Estevam to fix 'tuning-step' property
   in imx7d-meerkat96 and imx7d-pico DT.
 - Fix '#gpio-cells' of nxp,pca8574 device in imx6qdl-emcon-avari DT.
 - A couple of patches from Lucas Stach to fix regulator and voltage for
   imx8mq-zii-ultra board.
 - Add missing regulators for imx6q-dhcom to avoid possible instability
   issues.
 - Fix memory-controller settings for fsl-ls1028a DT.
 - Fix RGMII clock and voltage for a couple of fsl-ls1028a-kontron-sl28
   boards.
 - Fix RGMII connection to QCA8334 switch for imx6dl-yapp4 board.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEEFmJXigPl4LoGSz08UFdYWoewfM4FAmCu8p4UHHNoYXduZ3Vv
 QGtlcm5lbC5vcmcACgkQUFdYWoewfM7k+Qf9Ek/atJBjaaVzhbGFQ8HRnDOk4U5J
 dR0GbXKSN1zXjDLSBOBRyNcVn2T6lp4LwwYoleC2qG6MNoe6+y7fk1oRdvjkGjRO
 ZYfZT/JJl187IN6JsxR8VhRmmcI5/C3bihMbpKoLdHcPGbwC4relfSy99Io94Xfk
 CVn3PTxBA4KfoDq/+EiA9HkQ/HwQh/y7KD5yVd6OtLkt9awT1hYW1ttVefp95nAK
 5+zTm0Z14U8aGm+wo8jywvSyeXiuMeWDUrweo05XcaHWfcv3T0Wni7tgS2pluxqa
 lCMsNQBsZDTk6fUWxqPjjAEl6ks8N1H9L7YIGpq1dOCSYFUqI3jCmkhrFQ==
 =5vQk
 -----END PGP SIGNATURE-----

Merge tag 'imx-fixes-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 5.13:

- Fix missing-prototypes warning of 'imx27_pm_init' in i.MX27 platform
  pm code.
- A couple of patches from Fabio Estevam to fix 'tuning-step' property
  in imx7d-meerkat96 and imx7d-pico DT.
- Fix '#gpio-cells' of nxp,pca8574 device in imx6qdl-emcon-avari DT.
- A couple of patches from Lucas Stach to fix regulator and voltage for
  imx8mq-zii-ultra board.
- Add missing regulators for imx6q-dhcom to avoid possible instability
  issues.
- Fix memory-controller settings for fsl-ls1028a DT.
- Fix RGMII clock and voltage for a couple of fsl-ls1028a-kontron-sl28
  boards.
- Fix RGMII connection to QCA8334 switch for imx6dl-yapp4 board.

* tag 'imx-fixes-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
  ARM: dts: imx7d-pico: Fix the 'tuning-step' property
  ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
  arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
  arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
  ARM: imx: pm-imx27: Include "common.h"
  arm64: dts: zii-ultra: fix 12V_MAIN voltage
  arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
  arm64: dts: ls1028a: fix memory node
  ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
  ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch

Link: https://lore.kernel.org/r/20210527011758.GD8194@dragon
Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-05 15:24:12 -07:00
Linus Torvalds
e5220dd167 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "13 patches.

  Subsystems affected by this patch series: mips, mm (kfence, debug,
  pagealloc, memory-hotplug, hugetlb, kasan, and hugetlb), init, proc,
  lib, ocfs2, and mailmap"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mailmap: use private address for Michel Lespinasse
  ocfs2: fix data corruption by fallocate
  lib: crc64: fix kernel-doc warning
  mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
  mm/kasan/init.c: fix doc warning
  proc: add .gitignore for proc-subset-pid selftest
  hugetlb: pass head page to remove_hugetlb_page()
  drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
  mm/page_alloc: fix counting of free pages after take off from buddy
  mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
  pid: take a reference when initializing `cad_pid`
  kfence: use TASK_IDLE when awaiting allocation
  Revert "MIPS: make userspace mapping young by default"
2021-06-05 10:55:41 -07:00
Linus Torvalds
af8d9eb840 RISC-V Fixes for 5.13-rc5
* Build with '-mno-relax' when using LLVM's linker, which doesn't
   support linker relaxation.
 * A fix to build without SiFive's errata.
 * A fix to use PAs during init_resources()
 * A fix to avoid W+X mappings during boot.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmC7i5ATHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiaQVEAC5/S8N9LdmN6ObGTjmuitfi2YMAnQ/
 rNPbZNU/rF9Xf09QmJYHzLh4K6rdLRhd1ZL/voYSEzjZqq9vF60f6gQGgGQ9tULM
 OEMfVPsUdqHte+8qG9HkseeRTjpqOlV8iSYIvDNOPdXaT8O4OSIx7LZQDpF1nvbg
 0KU0BE6BtL6NfAL2yF9HB2jlB5hiN9IZRWj9IXl84Rd8838/b09BUY7KPOMv6W3v
 ENkMkWsWuqxnZFsWdlNErveujULZTETpRuGp01/vqweTMfMGPkhQV73qGMbj1w42
 jLPPFJL62B7v7y24c+MdWdNMnKSGkjXoKXX4UjK0e96zIwq9nRx+346Jx+zZSO0Y
 4zeUTPUCOXycU+p/kB6odLq6dXmjx0RiG2hxjCZawpHGiyTYZ3j7rl5dzQXFN4Ad
 Onc54gQ/w0+yIvasOi7uQnBHtIRpkr2Lu3kK91ZzFF5W0qlio7GbvIgD09ny098c
 2j9x/Mp8qUdPSqtIBZ4djB8tGcMfPPW3fcaCDtcjBID+RdMSHzIkyWqCMQYhCEoJ
 Ke/B04Of0ExM5SmFE/LP63zpTYmGy8TNWyjKvC1cuegE3yUsEAoNve8d922wiAtO
 8exiHjhc+eRO127EXWQNJU6A9HJ7maO+a8o22fgLdQNZkR829wLaJmo3PQ6//L/5
 s2tj7h30zAvxOw==
 =vuVu
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - Build with '-mno-relax' when using LLVM's linker, which doesn't
   support linker relaxation.

 - A fix to build without SiFive's errata.

 - A fix to use PAs during init_resources()

 - A fix to avoid W+X mappings during boot.

* tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix memblock_free() usages in init_resources()
  riscv: skip errata_cip_453.o if CONFIG_ERRATA_SIFIVE_CIP_453 is disabled
  riscv: mm: Fix W+X mappings at boot
  riscv: Use -mno-relax when using lld linker
2021-06-05 10:45:13 -07:00
Michel Lespinasse
2eff0573e0 mailmap: use private address for Michel Lespinasse
Link: https://lkml.kernel.org/r/20210602221225.49446-1-michel@lespinasse.org
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
Junxiao Bi
6bba4471f0 ocfs2: fix data corruption by fallocate
When fallocate punches holes out of inode size, if original isize is in
the middle of last cluster, then the part from isize to the end of the
cluster will be zeroed with buffer write, at that time isize is not yet
updated to match the new size, if writeback is kicked in, it will invoke
ocfs2_writepage()->block_write_full_page() where the pages out of inode
size will be dropped.  That will cause file corruption.  Fix this by
zero out eof blocks when extending the inode size.

Running the following command with qemu-image 4.2.1 can get a corrupted
coverted image file easily.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
             -O qcow2 -o compat=1.1 $qcow_image.conv

The usage of fallocate in qemu is like this, it first punches holes out
of inode size, then extend the inode size.

    fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
    fallocate(11, 0, 2276196352, 65536) = 0

v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/

Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
YueHaibing
415f0c835b lib: crc64: fix kernel-doc warning
Fix W=1 kernel build warning:

  lib/crc64.c:40: warning:
   bad line:         or the previous crc64 value if computing incrementally.

Link: https://lkml.kernel.org/r/20210601135851.15444-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Coly Li <colyli@suse.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
Mina Almasry
d84cf06e3d mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
The userfaultfd hugetlb tests cause a resv_huge_pages underflow.  This
happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
an index for which we already have a page in the cache.  When this
happens, we allocate a second page, double consuming the reservation,
and then fail to insert the page into the cache and return -EEXIST.

To fix this, we first check if there is a page in the cache which
already consumed the reservation, and return -EEXIST immediately if so.

There is still a rare condition where we fail to copy the page contents
AND race with a call for hugetlb_no_page() for this index and again we
will underflow resv_huge_pages.  That is fixed in a more complicated
patch not targeted for -stable.

Test:

  Hacked the code locally such that resv_huge_pages underflows produce a
  warning, then:

  ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
  ./tools/testing/selftests/vm/userfaultfd hugetlb 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

Both tests succeed and produce no warnings.  After the test runs number
of free/resv hugepages is correct.

[mike.kravetz@oracle.com: changelog fixes]

Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
Fixes: 8fb5debc5f ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:12 -07:00
Yu Kuai
7b6889f54a mm/kasan/init.c: fix doc warning
Fix gcc W=1 warning:

  mm/kasan/init.c:228: warning: Function parameter or member 'shadow_start' not described in 'kasan_populate_early_shadow'
  mm/kasan/init.c:228: warning: Function parameter or member 'shadow_end' not described in 'kasan_populate_early_shadow'

Link: https://lkml.kernel.org/r/20210603140700.3045298-1-yukuai3@huawei.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
David Matlack
263e88d678 proc: add .gitignore for proc-subset-pid selftest
This new selftest needs an entry in the .gitignore file otherwise git
will try to track the binary.

Link: https://lkml.kernel.org/r/20210601164305.11776-1-dmatlack@google.com
Fixes: 268af17ada ("selftests: proc: test subset=pid")
Signed-off-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Naoya Horiguchi
0c5da35723 hugetlb: pass head page to remove_hugetlb_page()
When memory_failure() or soft_offline_page() is called on a tail page of
some hugetlb page, "BUG: unable to handle page fault" error can be
triggered.

remove_hugetlb_page() dereferences page->lru, so it's assumed that the
page points to a head page, but one of the caller,
dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
which could be a tail page.  So pass 'head' to it, instead.

Link: https://lkml.kernel.org/r/20210526235257.2769473-1-nao.horiguchi@gmail.com
Fixes: 6eb4e88a6d ("hugetlb: create remove_hugetlb_page() to separate functionality")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
David Hildenbrand
928130532e drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
offline_pages() properly checks for memory holes and bails out.
However, we do a page_zone(pfn_to_page(start_pfn)) before calling
offline_pages() when offlining a memory block.

We should not unconditionally call page_zone(pfn_to_page(start_pfn)) on
aarch64 in offlining code, otherwise we can trigger a BUG when hitting a
memory hole:

   kernel BUG at include/linux/mm.h:1383!
   Internal error: Oops - BUG: 0 [#1] SMP
   Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class
   CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4
   Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
   pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
   pc : memory_subsys_offline+0x1f8/0x250
   lr : memory_subsys_offline+0x1f8/0x250
   Call trace:
     memory_subsys_offline+0x1f8/0x250
     device_offline+0x154/0x1d8
     online_store+0xa4/0x118
     dev_attr_store+0x44/0x78
     sysfs_kf_write+0xe8/0x138
     kernfs_fop_write_iter+0x26c/0x3d0
     new_sync_write+0x2bc/0x4f8
     vfs_write+0x718/0xc88
     ksys_write+0xf8/0x1e0
     __arm64_sys_write+0x74/0xa8
     invoke_syscall.constprop.0+0x78/0x1e8
     do_el0_svc+0xe4/0x298
     el0_svc+0x20/0x30
     el0_sync_handler+0xb0/0xb8
     el0_sync+0x178/0x180
   Kernel panic - not syncing: Oops - BUG: Fatal exception
   SMP: stopping secondary CPUs
   Kernel Offset: disabled
   CPU features: 0x00000251,20000846
   Memory Limit: none

If nr_vmemmap_pages is set, we know that we are dealing with hotplugged
memory that doesn't have any holes.  So call
page_zone(pfn_to_page(start_pfn)) only when really necessary -- when
nr_vmemmap_pages is set and we actually adjust the present pages.

Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com
Fixes: a08a2ae346 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai (QUIC) <quic_qiancai@quicinc.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Ding Hui
bac9c6fa1f mm/page_alloc: fix counting of free pages after take off from buddy
Recently we found that there is a lot MemFree left in /proc/meminfo
after do a lot of pages soft offline, it's not quite correct.

Before Oscar's rework of soft offline for free pages [1], if we soft
offline free pages, these pages are left in buddy with HWPoison flag,
and NR_FREE_PAGES is not updated immediately.  So the difference between
NR_FREE_PAGES and real number of available free pages is also even big
at the beginning.

However, with the workload running, when we catch HWPoison page in any
alloc functions subsequently, we will remove it from buddy, meanwhile
update the NR_FREE_PAGES and try again, so the NR_FREE_PAGES will get
more and more closer to the real number of available free pages.
(regardless of unpoison_memory())

Now, for offline free pages, after a successful call
take_page_off_buddy(), the page is no longer belong to buddy allocator,
and will not be used any more, but we missed accounting NR_FREE_PAGES in
this situation, and there is no chance to be updated later.

Do update in take_page_off_buddy() like rmqueue() does, but avoid double
counting if some one already set_migratetype_isolate() on the page.

[1]: commit 06be6ff3d2 ("mm,hwpoison: rework soft offline for free pages")

Link: https://lkml.kernel.org/r/20210526075247.11130-1-dinghui@sangfor.com.cn
Fixes: 06be6ff3d2 ("mm,hwpoison: rework soft offline for free pages")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Gerald Schaefer
04f7ce3f07 mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud
entry, and so it does not match the given pmdp/pudp and (aligned down)
pfn any more.

For s390, this results in memory corruption, because the IDTE
instruction used e.g.  in xxx_get_and_clear() will take the vaddr for
some calculations, in combination with the given pmdp.  It will then end
up with a wrong table origin, ending on ...ff8, and some of those
wrongly set low-order bits will also select a wrong pagetable level for
the index addition.  IDTE could therefore invalidate (or 0x20) something
outside of the page tables, depending on the wrongly picked index, which
in turn depends on the random vaddr.

As result, we sometimes see "BUG task_struct (Not tainted): Padding
overwritten" on s390, where one 0x5a padding value got overwritten with
0x7a.

Fix this by aligning down, similar to how the pmd/pud_aligned pfns are
calculated.

Link: https://lkml.kernel.org/r/20210525130043.186290-2-gerald.schaefer@linux.ibm.com
Fixes: a5c3b9ffb0 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Mark Rutland
0711f0d705 pid: take a reference when initializing cad_pid
During boot, kernel_init_freeable() initializes `cad_pid` to the init
task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
when this happens proc_do_cad_pid() will increment the refcount on the
new pid via get_pid(), and will decrement the refcount on the old pid
via put_pid().  As we never called get_pid() when we initialized
`cad_pid`, we decrement a reference we never incremented, can therefore
free the init task's struct pid early.  As there can be dangling
references to the struct pid, we can later encounter a use-after-free
(e.g.  when delivering signals).

This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to
have been around since the conversion of `cad_pid` to struct pid in
commit 9ec52099e4 ("[PATCH] replace cad_pid by a struct pid") from the
pre-KASAN stone age of v2.6.19.

Fix this by getting a reference to the init task's struct pid when we
assign it to `cad_pid`.

Full KASAN splat below.

   ==================================================================
   BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
   BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
   Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273

   CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
   Hardware name: linux,dummy-virt (DT)
   Call trace:
    ns_of_pid include/linux/pid.h:153 [inline]
    task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
    do_notify_parent+0x308/0xe60 kernel/signal.c:1950
    exit_notify kernel/exit.c:682 [inline]
    do_exit+0x2334/0x2bd0 kernel/exit.c:845
    do_group_exit+0x108/0x2c8 kernel/exit.c:922
    get_signal+0x4e4/0x2a88 kernel/signal.c:2781
    do_signal arch/arm64/kernel/signal.c:882 [inline]
    do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
    work_pending+0xc/0x2dc

   Allocated by task 0:
    slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
    slab_alloc_node mm/slub.c:2907 [inline]
    slab_alloc mm/slub.c:2915 [inline]
    kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
    alloc_pid+0xdc/0xc00 kernel/pid.c:180
    copy_process+0x2794/0x5e18 kernel/fork.c:2129
    kernel_clone+0x194/0x13c8 kernel/fork.c:2500
    kernel_thread+0xd4/0x110 kernel/fork.c:2552
    rest_init+0x44/0x4a0 init/main.c:687
    arch_call_rest_init+0x1c/0x28
    start_kernel+0x520/0x554 init/main.c:1064
    0x0

   Freed by task 270:
    slab_free_hook mm/slub.c:1562 [inline]
    slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
    slab_free mm/slub.c:3161 [inline]
    kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
    put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
    put_pid+0x30/0x48 kernel/pid.c:109
    proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
    proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
    proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
    call_write_iter include/linux/fs.h:1977 [inline]
    new_sync_write+0x3ac/0x510 fs/read_write.c:518
    vfs_write fs/read_write.c:605 [inline]
    vfs_write+0x9c4/0x1018 fs/read_write.c:585
    ksys_write+0x124/0x240 fs/read_write.c:658
    __do_sys_write fs/read_write.c:670 [inline]
    __se_sys_write fs/read_write.c:667 [inline]
    __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
    __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
    invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
    el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
    do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
    el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
    el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
    el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701

   The buggy address belongs to the object at ffff23794dda0000
    which belongs to the cache pid of size 224
   The buggy address is located 4 bytes inside of
    224-byte region [ffff23794dda0000, ffff23794dda00e0)
   The buggy address belongs to the page:
   page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
   head:(____ptrval____) order:1 compound_mapcount:0
   flags: 0x3fffc0000010200(slab|head)
   raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
   raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
   page dumped because: kasan: bad access detected

   Memory state around the buggy address:
    ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
    ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
   ==================================================================

Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
Fixes: 9ec52099e4 ("[PATCH] replace cad_pid by a struct pid")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Marco Elver
8fd0e995cc kfence: use TASK_IDLE when awaiting allocation
Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
allocation counts towards load.  However, for KFENCE, this does not make
any sense, since there is no busy work we're awaiting.

Instead, use TASK_IDLE via wait_event_idle() to not count towards load.

BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com
Fixes: 407f1d8c1b ("kfence: await for allocation using wait_event")
Signed-off-by: Marco Elver <elver@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Hillf Danton <hdanton@sina.com>
Cc: <stable@vger.kernel.org>	[5.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Thomas Bogendoerfer
50c25ee97c Revert "MIPS: make userspace mapping young by default"
This reverts commit f685a533a7.

The MIPS cache flush logic needs to know whether the mapping was already
established to decide how to flush caches.  This is done by checking the
valid bit in the PTE.  The commit above breaks this logic by setting the
valid in the PTE in new mappings, which causes kernel crashes.

Link: https://lkml.kernel.org/r/20210526094335.92948-1-tsbogend@alpha.franken.de
Fixes: f685a533a7 ("MIPS: make userspace mapping young by default")
Reported-by: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Huang Pei <huangpei@loongson.cn>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 08:58:11 -07:00
Linus Torvalds
9d32fa5d74 Networking fixes for 5.13-rc5, including fixes from bpf, wireless,
netfilter and wireguard trees.
 
 The bpf vs lockdown+audit fix is the most notable.
 
 Current release - regressions:
 
  - virtio-net: fix page faults and crashes when XDP is enabled
 
  - mlx5e: fix HW timestamping with CQE compression, and make sure they
           are only allowed to coexist with capable devices
 
  - stmmac:
         - fix kernel panic due to NULL pointer dereference of mdio_bus_data
         - fix double clk unprepare when no PHY device is connected
 
 Current release - new code bugs:
 
  - mt76: a few fixes for the recent MT7921 devices and runtime
          power management
 
 Previous releases - regressions:
 
  - ice: - track AF_XDP ZC enabled queues in bitmap to fix copy mode Tx
         - fix allowing VF to request more/less queues via virtchnl
 	- correct supported and advertised autoneg by using PHY capabilities
         - allow all LLDP packets from PF to Tx
 
  - kbuild: quote OBJCOPY var to avoid a pahole call break the build
 
 Previous releases - always broken:
 
  - bpf, lockdown, audit: fix buggy SELinux lockdown permission checks
 
  - mt76: address the recent FragAttack vulnerabilities not covered
          by generic fixes
 
  - ipv6: fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
 
  - Bluetooth:
  	 - fix the erroneous flush_work() order, to avoid double free
          - use correct lock to prevent UAF of hdev object
 
  - nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
 
  - ieee802154: multiple fixes to error checking and return values
 
  - igb: fix XDP with PTP enabled
 
  - intel: add correct exception tracing for XDP
 
  - tls: fix use-after-free when TLS offload device goes down and back up
 
  - ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
 
  - netfilter: nft_ct: skip expectations for confirmed conntrack
 
  - mptcp: fix falling back to TCP in presence of out of order packets
           early in connection lifetime
 
  - wireguard: switch from O(n) to a O(1) algorithm for maintaining peers,
           fixing stalls and a large memory leak in the process
 
 Misc:
 
  - devlink: correct VIRTUAL port to not have phys_port attributes
 
  - Bluetooth: fix VIRTIO_ID_BT assigned number
 
  - net: return the correct errno code ENOBUF -> ENOMEM
 
  - wireguard:
          - peer: allocate in kmem_cache saving 25% on peer memory
          - do not use -O3
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmC6yGMACgkQMUZtbf5S
 Irv67w//ZpT4+KHETUIS+CgeUIgjAQD0FTmO4iboHFGG7BadWEZpEVswUU0xBfY/
 RJrSWAEqTga8zbjWqRaLRx5Qii99F2hHPZ502VR6x6NbPu1mNdS5rUOa61YbtGCv
 v4sC45eOvG7T/y5mceq4rQaPsQKEUUAIgYzIOpjSiDoMfgFCT3UUF/UrBhgLzybj
 aMXd12rg17dN+RJeNOZjQKligNENX9A0tBtSGXxs9hhYYbY25O+uECOsESrA1RKt
 uHeh003iqApT5x8hmJsdMDtis05n7S/Bq1/4RZfAdbTcgJngepw570bQ999tbXqE
 HeB3Ls9k3Vi9W6svfUkYjFGt3GYygsVGPjFAVhC+g0TZXAgdsh5w2SPQAgcIrzIr
 WOfDL9hu7OJp/XRsPiB9pg8cul7a4Q5Yhp29bvN33u43AMij2TWD0CpKCQt9UQdi
 8V0KOLAGC8bzXx35VTP/pbbwAI21PIYxVKfe/0cOJKShTMtfPePx1a2cuYRWoQSP
 PYYbQaY6WhfUniV3DEmvL1Z+dgL0yyaJKIV2IdBHR8MPKKy+5kD+6HDaNo2lO75J
 wWSN1LtoVKrc5msCD375epGmkbjatpWdfzOE+pljWHz5LnW+2cGwFhCo7+UJhAG5
 XwE8+G9YUyYH51PjFpGBsoPBWEmYmIMnY34p20A1Pz1M7/HFfXc=
 =sNP5
 -----END PGP SIGNATURE-----

Merge tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from bpf, wireless, netfilter and
  wireguard trees.

  The bpf vs lockdown+audit fix is the most notable.

  Things haven't slowed down just yet, both in terms of regressions in
  current release and largish fixes for older code, but we usually see a
  slowdown only after -rc5.

  Current release - regressions:

   - virtio-net: fix page faults and crashes when XDP is enabled

   - mlx5e: fix HW timestamping with CQE compression, and make sure they
     are only allowed to coexist with capable devices

   - stmmac:
      - fix kernel panic due to NULL pointer dereference of
        mdio_bus_data
      - fix double clk unprepare when no PHY device is connected

  Current release - new code bugs:

   - mt76: a few fixes for the recent MT7921 devices and runtime power
     management

  Previous releases - regressions:

   - ice:
      - track AF_XDP ZC enabled queues in bitmap to fix copy mode Tx
      - fix allowing VF to request more/less queues via virtchnl
      - correct supported and advertised autoneg by using PHY
        capabilities
      - allow all LLDP packets from PF to Tx

   - kbuild: quote OBJCOPY var to avoid a pahole call break the build

  Previous releases - always broken:

   - bpf, lockdown, audit: fix buggy SELinux lockdown permission checks

   - mt76: address the recent FragAttack vulnerabilities not covered by
     generic fixes

   - ipv6: fix KASAN: slab-out-of-bounds Read in
     fib6_nh_flush_exceptions

   - Bluetooth:
      - fix the erroneous flush_work() order, to avoid double free
      - use correct lock to prevent UAF of hdev object

   - nfc: fix NULL ptr dereference in llcp_sock_getname() after failed
     connect

   - ieee802154: multiple fixes to error checking and return values

   - igb: fix XDP with PTP enabled

   - intel: add correct exception tracing for XDP

   - tls: fix use-after-free when TLS offload device goes down and back
     up

   - ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service

   - netfilter: nft_ct: skip expectations for confirmed conntrack

   - mptcp: fix falling back to TCP in presence of out of order packets
     early in connection lifetime

   - wireguard: switch from O(n) to a O(1) algorithm for maintaining
     peers, fixing stalls and a large memory leak in the process

  Misc:

   - devlink: correct VIRTUAL port to not have phys_port attributes

   - Bluetooth: fix VIRTIO_ID_BT assigned number

   - net: return the correct errno code ENOBUF -> ENOMEM

   - wireguard:
      - peer: allocate in kmem_cache saving 25% on peer memory
      - do not use -O3"

* tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
  cxgb4: avoid link re-train during TC-MQPRIO configuration
  sch_htb: fix refcount leak in htb_parent_to_leaf_offload
  wireguard: allowedips: free empty intermediate nodes when removing single node
  wireguard: allowedips: allocate nodes in kmem_cache
  wireguard: allowedips: remove nodes in O(1)
  wireguard: allowedips: initialize list head in selftest
  wireguard: peer: allocate in kmem_cache
  wireguard: use synchronize_net rather than synchronize_rcu
  wireguard: do not use -O3
  wireguard: selftests: make sure rp_filter is disabled on vethc
  wireguard: selftests: remove old conntrack kconfig value
  virtchnl: Add missing padding to virtchnl_proto_hdrs
  ice: Allow all LLDP packets from PF to Tx
  ice: report supported and advertised autoneg using PHY capabilities
  ice: handle the VF VSI rebuild failure
  ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
  ice: Fix allowing VF to request more/less queues via virtchnl
  virtio-net: fix for skb_over_panic inside big mode
  ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
  fib: Return the correct errno code
  ...
2021-06-04 18:25:39 -07:00
Linus Torvalds
2cb26c15a2 perf tools fixes for v5.13: 4th batch
- Fix NULL pointer dereference in 'perf probe' when handling
   DW_AT_const_value when looking for a variable, which is valid.
 
 - Fix for capability querying of perf_event_attr.cgroup support in older
   kernels.
 
 - Add missing cloning of evsel->use_config_name.
 
 - Honor event config name on --no-merge in 'perf stat'.
 
 - Fix some memory leaks found using ASAN.
 
 - Fix the perf entry for perf_event_attr setup with make LIBPFM4=1 on
   s390 z/VM.
 
 - Update MIPS UAPI perf_regs.h file.
 
 - Fix 'perf stat' BPF counter load return check.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYLqEvgAKCRCyPKLppCJ+
 J8nOAPwNmtXNGu2a/aV23go751d6s69oiEjhA9tmJfKrVnLdOwEA5r9MviUjml1+
 li7TGcNo74dNaXgcuaZ9Oi854xEbvA4=
 =8qmn
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v5.13-2021-06-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix NULL pointer dereference in 'perf probe' when handling
   DW_AT_const_value when looking for a variable, which is valid.

 - Fix for capability querying of perf_event_attr.cgroup support in
   older kernels.

 - Add missing cloning of evsel->use_config_name.

 - Honor event config name on --no-merge in 'perf stat'.

 - Fix some memory leaks found using ASAN.

 - Fix the perf entry for perf_event_attr setup with make LIBPFM4=1 on
   s390 z/VM.

 - Update MIPS UAPI perf_regs.h file.

 - Fix 'perf stat' BPF counter load return check.

* tag 'perf-tools-fixes-for-v5.13-2021-06-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf env: Fix memory leak of bpf_prog_info_linear member
  perf symbol-elf: Fix memory leak by freeing sdt_note.args
  perf stat: Honor event config name on --no-merge
  perf evsel: Add missing cloning of evsel->use_config_name
  perf test: Test 17 fails with make LIBPFM4=1 on s390 z/VM
  perf stat: Fix error return code in bperf__load()
  perf record: Move probing cgroup sampling support
  perf probe: Fix NULL pointer dereference in convert_variable_location()
  perf tools: Copy uapi/asm/perf_regs.h from the kernel for MIPS
2021-06-04 18:15:33 -07:00
Linus Torvalds
ff6091075a pci-v5.13-fixes-1
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmC6h7YUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyuOw//cJICxrEqA6gGUgvEJbDyVQfkb0tx
 pFuTRVVbxAa6nGM3Ay2JRMHlcilvgre0vGPMu4ZtK/F8Qbcx6uqN0ARC5jSSbFVn
 ynExJZzlJPWWyZGaUNSuIKtXCUSYbfCbelXYxm9Fl70zldRR6zy05E9qToKULyj9
 tmqmIHCfRZnAyMileFULE4Iz7BlT5UO8738CMYcDIzW148cEiUpnjIQgvrRcRZ4E
 C7nXV6CSaHjQdqCqxfUFR5Ozw2QkNGIbF5Zguc54IT7TnHLBuXkszltEWRAgXLJf
 bWpItDpyEZe0rrqUQE4MxPiGDU0NtgweF0fsIf6NjahgKF2rLzRUKP4ftoCTg1my
 FbDfqxtre1mS/1/joQIygQGy7ghlT7QxjHgIgTv1ttcxez8VaETYC2CUymmdFSeP
 AFmmGyROJQMMWxZ5eJgtw45KAoiYgme2ZjVZLrzjrvLRo0RspwcIqd3kVQNBx9ck
 dIZanC1kAtoYtSnZxa7QhpvQPTsGjzAzUnFCKCeDymxPWgfEHsTGCepJ+bWk0TmS
 2BgAK26NK62pxVUeGgeHI8Ci9JAgAdEWlMnXX79F8G83mKngMLyqdDo22T0cFKio
 8yU08LvHKDotF3x0nkge8OfceJ5uNUwWHuku8wLc1UwAXRKnb8SKWNbdsvZ2hFrx
 kHMc/UyM4iAf830=
 =exOH
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Fix MSIs for platforms with "msi-map" device-tree property, which we
   broke in v5.13-rc1 (Jean-Philippe Brucker)

 - Add Krzysztof Wilczyński as PCI reviewer (Lorenzo Pieralisi)

* tag 'pci-v5.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/MSI: Fix MSIs for generic hosts that use device-tree's "msi-map"
  MAINTAINERS: Add Krzysztof as PCI host/endpoint controllers reviewer
2021-06-04 15:19:45 -07:00
Rahul Lakkireddy
3822d0670c cxgb4: avoid link re-train during TC-MQPRIO configuration
When configuring TC-MQPRIO offload, only turn off netdev carrier and
don't bring physical link down in hardware. Otherwise, when the
physical link is brought up again after configuration, it gets
re-trained and stalls ongoing traffic.

Also, when firmware is no longer accessible or crashed, avoid sending
FLOWC and waiting for reply that will never come.

Fix following hung_task_timeout_secs trace seen in these cases.

INFO: task tc:20807 blocked for more than 122 seconds.
      Tainted: G S                5.13.0-rc3+ #122
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:tc   state:D stack:14768 pid:20807 ppid: 19366 flags:0x00000000
Call Trace:
 __schedule+0x27b/0x6a0
 schedule+0x37/0xa0
 schedule_preempt_disabled+0x5/0x10
 __mutex_lock.isra.14+0x2a0/0x4a0
 ? netlink_lookup+0x120/0x1a0
 ? rtnl_fill_ifinfo+0x10f0/0x10f0
 __netlink_dump_start+0x70/0x250
 rtnetlink_rcv_msg+0x28b/0x380
 ? rtnl_fill_ifinfo+0x10f0/0x10f0
 ? rtnl_calcit.isra.42+0x120/0x120
 netlink_rcv_skb+0x4b/0xf0
 netlink_unicast+0x1a0/0x280
 netlink_sendmsg+0x216/0x440
 sock_sendmsg+0x56/0x60
 __sys_sendto+0xe9/0x150
 ? handle_mm_fault+0x6d/0x1b0
 ? do_user_addr_fault+0x1c5/0x620
 __x64_sys_sendto+0x1f/0x30
 do_syscall_64+0x3c/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7f73218321
RSP: 002b:00007ffd19626208 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000055b7c0a8b240 RCX: 00007f7f73218321
RDX: 0000000000000028 RSI: 00007ffd19626210 RDI: 0000000000000003
RBP: 000055b7c08680ff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000055b7c085f5f6
R13: 000055b7c085f60a R14: 00007ffd19636470 R15: 00007ffd196262a0

Fixes: b1396c2bd6 ("cxgb4: parse and configure TC-MQPRIO offload")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:45:13 -07:00
Yunjian Wang
944d671d5f sch_htb: fix refcount leak in htb_parent_to_leaf_offload
The commit ae81feb733 ("sch_htb: fix null pointer dereference
on a null new_q") fixes a NULL pointer dereference bug, but it
is not correct.

Because htb_graft_helper properly handles the case when new_q
is NULL, and after the previous patch by skipping this call
which creates an inconsistency : dev_queue->qdisc will still
point to the old qdisc, but cl->parent->leaf.q will point to
the new one (which will be noop_qdisc, because new_q was NULL).
The code is based on an assumption that these two pointers are
the same, so it can lead to refcount leaks.

The correct fix is to add a NULL pointer check to protect
qdisc_refcount_inc inside htb_parent_to_leaf_offload.

Fixes: ae81feb733 ("sch_htb: fix null pointer dereference on a null new_q")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Suggested-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:44:18 -07:00
David S. Miller
26821ecd3b Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-06-04

This series contains updates to virtchnl header file and ice driver.

Brett fixes VF being unable to request a different number of queues then
allocated and adds clearing of VF_MBX_ATQLEN register for VF reset.

Haiyue handles error of rebuilding VF VSI during reset.

Paul fixes reporting of autoneg to use the PHY capabilities.

Dave allows LLDP packets without priority of TC_PRIO_CONTROL to be
transmitted.

Geert Uytterhoeven adds explicit padding to virtchnl_proto_hdrs
structure in the virtchnl header file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:27:07 -07:00
David S. Miller
6fd815bb1e Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:

====================
wireguard fixes for 5.13-rc5

Here are bug fixes to WireGuard for 5.13-rc5:

1-2,6) These are small, trivial tweaks to our test harness.

3) Linus thinks -O3 is still dangerous to enable. The code gen wasn't so
   much different with -O2 either.

4) We were accidentally calling synchronize_rcu instead of
   synchronize_net while holding the rtnl_lock, resulting in some rather
   large stalls that hit production machines.

5) Peer allocation was wasting literally hundreds of megabytes on real
   world deployments, due to oddly sized large objects not fitting
   nicely into a kmalloc slab.

7-9) We move from an insanely expensive O(n) algorithm to a fast O(1)
     algorithm, and cleanup a massive memory leak in the process, in
     which allowed ips churn would leave danging nodes hanging around
     without cleanup until the interface was removed. The O(1) algorithm
     eliminates packet stalls and high latency issues, in addition to
     bringing operations that took as much as 10 minutes down to less
     than a second.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:25:14 -07:00
Jason A. Donenfeld
bf7b042dc6 wireguard: allowedips: free empty intermediate nodes when removing single node
When removing single nodes, it's possible that that node's parent is an
empty intermediate node, in which case, it too should be removed.
Otherwise the trie fills up and never is fully emptied, leading to
gradual memory leaks over time for tries that are modified often. There
was originally code to do this, but was removed during refactoring in
2016 and never reworked. Now that we have proper parent pointers from
the previous commits, we can implement this properly.

In order to reduce branching and expensive comparisons, we want to keep
the double pointer for parent assignment (which lets us easily chain up
to the root), but we still need to actually get the parent's base
address. So encode the bit number into the last two bits of the pointer,
and pack and unpack it as needed. This is a little bit clumsy but is the
fastest and less memory wasteful of the compromises. Note that we align
the root struct here to a minimum of 4, because it's embedded into a
larger struct, and we're relying on having the bottom two bits for our
flag, which would only be 16-bit aligned on m68k.

The existing macro-based helpers were a bit unwieldy for adding the bit
packing to, so this commit replaces them with safer and clearer ordinary
functions.

We add a test to the randomized/fuzzer part of the selftests, to free
the randomized tries by-peer, refuzz it, and repeat, until it's supposed
to be empty, and then then see if that actually resulted in the whole
thing being emptied. That combined with kmemcheck should hopefully make
sure this commit is doing what it should. Along the way this resulted in
various other cleanups of the tests and fixes for recent graphviz.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:25:14 -07:00
Jason A. Donenfeld
dc680de28c wireguard: allowedips: allocate nodes in kmem_cache
The previous commit moved from O(n) to O(1) for removal, but in the
process introduced an additional pointer member to a struct that
increased the size from 60 to 68 bytes, putting nodes in the 128-byte
slab. With deployed systems having as many as 2 million nodes, this
represents a significant doubling in memory usage (128 MiB -> 256 MiB).
Fix this by using our own kmem_cache, that's sized exactly right. This
also makes wireguard's memory usage more transparent in tools like
slabtop and /proc/slabinfo.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:25:14 -07:00
Jason A. Donenfeld
f634f418c2 wireguard: allowedips: remove nodes in O(1)
Previously, deleting peers would require traversing the entire trie in
order to rebalance nodes and safely free them. This meant that removing
1000 peers from a trie with a half million nodes would take an extremely
long time, during which we're holding the rtnl lock. Large-scale users
were reporting 200ms latencies added to the networking stack as a whole
every time their userspace software would queue up significant removals.
That's a serious situation.

This commit fixes that by maintaining a double pointer to the parent's
bit pointer for each node, and then using the already existing node list
belonging to each peer to go directly to the node, fix up its pointers,
and free it with RCU. This means removal is O(1) instead of O(n), and we
don't use gobs of stack.

The removal algorithm has the same downside as the code that it fixes:
it won't collapse needlessly long runs of fillers.  We can enhance that
in the future if it ever becomes a problem. This commit documents that
limitation with a TODO comment in code, a small but meaningful
improvement over the prior situation.

Currently the biggest flaw, which the next commit addresses, is that
because this increases the node size on 64-bit machines from 60 bytes to
68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
using twice as much memory per node, because of power-of-two
allocations, which is a big bummer. We'll need to figure something out
there.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:25:14 -07:00
Jason A. Donenfeld
46cfe8eee2 wireguard: allowedips: initialize list head in selftest
The randomized trie tests weren't initializing the dummy peer list head,
resulting in a NULL pointer dereference when used. Fix this by
initializing it in the randomized trie test, just like we do for the
static unit test.

While we're at it, all of the other strings like this have the word
"self-test", so add it to the missing place here.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:25:14 -07:00
Jason A. Donenfeld
a4e9f8e328 wireguard: peer: allocate in kmem_cache
With deployments having upwards of 600k peers now, this somewhat heavy
structure could benefit from more fine-grained allocations.
Specifically, instead of using a 2048-byte slab for a 1544-byte object,
we can now use 1544-byte objects directly, thus saving almost 25%
per-peer, or with 600k peers, that's a savings of 303 MiB. This also
makes wireguard's memory usage more transparent in tools like slabtop
and /proc/slabinfo.

Fixes: 8b5553ace8 ("wireguard: queueing: get rid of per-peer ring buffers")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04 14:25:14 -07:00