From 00a99339f0a30b593d06f31499c4bbd965979567 Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Fri, 7 Sep 2018 15:13:10 -0700 Subject: [PATCH 001/134] ARCv2: build: use mcpu=hs38 iso generic mcpu=archs helps gcc with better instruction selections such as 64-bit multiply MPYD before ------ 82c34b58 : 82c34b58: ld r2,[0x83068d00] 82c34b60: add_s r2,r2,0x7530 82c34b66: mov_s r0,0x989680 82c34b6c: mpymu r5,r2,r0 82c34b70: mpy r4,r2,r0 82c34b74: mov_s r0,r4 82c34b76: j_s.d [blink] 82c34b78: mov_s r1,r5 82c34b7a: nop_s after ------ 82c34b7c : 82c34b7c: ld r0,[0x83064d00] 82c34b84: add_s r0,r0,0x7530 82c34b8a: mpydu r0,r0,0x989680 82c34b92: j_s [blink] Signed-off-by: Vineet Gupta --- arch/arc/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 99cce77ab98f..dcc0f0cb0ca5 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -18,7 +18,7 @@ KBUILD_DEFCONFIG := nsim_700_defconfig cflags-y += -fno-common -pipe -fno-builtin -mmedium-calls -D__linux__ cflags-$(CONFIG_ISA_ARCOMPACT) += -mA7 -cflags-$(CONFIG_ISA_ARCV2) += -mcpu=archs +cflags-$(CONFIG_ISA_ARCV2) += -mcpu=hs38 is_700 = $(shell $(CC) -dM -E - < /dev/null | grep -q "ARC700" && echo 1 || echo 0) From eea96566c189c77e5272585984eb2729881a2f1d Mon Sep 17 00:00:00 2001 From: Sascha Hauer Date: Wed, 12 Sep 2018 08:23:01 +0200 Subject: [PATCH 002/134] ARM: dts: imx53-qsb: disable 1.2GHz OPP The maximum CPU frequency for the i.MX53 QSB is 1GHz, so disable the 1.2GHz OPP. This makes the board work again with configs that have cpufreq enabled like imx_v6_v7_defconfig on which the board stopped working with the addition of cpufreq-dt support. Fixes: 791f416608 ("ARM: dts: imx53: add cpufreq-dt support") Signed-off-by: Sascha Hauer Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx53-qsb-common.dtsi | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/arm/boot/dts/imx53-qsb-common.dtsi b/arch/arm/boot/dts/imx53-qsb-common.dtsi index 7423d462d1e4..50dde84b72ed 100644 --- a/arch/arm/boot/dts/imx53-qsb-common.dtsi +++ b/arch/arm/boot/dts/imx53-qsb-common.dtsi @@ -123,6 +123,17 @@ }; }; +&cpu0 { + /* CPU rated to 1GHz, not 1.2GHz as per the default settings */ + operating-points = < + /* kHz uV */ + 166666 850000 + 400000 900000 + 800000 1050000 + 1000000 1200000 + >; +}; + &esdhc1 { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_esdhc1>; From 615f64458ad890ef94abc879a66d8b27236e733a Mon Sep 17 00:00:00 2001 From: Alexey Brodkin Date: Thu, 13 Sep 2018 23:24:28 +0300 Subject: [PATCH 003/134] ARC: build: Get rid of toolchain check This check is very naive: we simply test if GCC invoked without "-mcpu=XXX" has ARC700 define set. In that case we think that GCC was built with "--with-cpu=arc700" and has libgcc built for ARC700. Otherwise if ARC700 is not defined we think that everythng was built for ARCv2. But in reality our life is much more interesting. 1. Regardless of GCC configuration (i.e. what we pass in "--with-cpu" it may generate code for any ARC core). 2. libgcc might be built with explicitly specified "--mcpu=YYY" That's exactly what happens in case of multilibbed toolchains: - GCC is configured with default settings - All the libs built for many different CPU flavors I.e. that check gets in the way of usage of multilibbed toolchains. And even non-multilibbed toolchains are affected. OpenEmbedded also builds GCC without "--with-cpu" because each and every target component later is compiled with explicitly set "-mcpu=ZZZ". Acked-by: Rob Herring Signed-off-by: Alexey Brodkin Signed-off-by: Vineet Gupta --- arch/arc/Makefile | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/arch/arc/Makefile b/arch/arc/Makefile index dcc0f0cb0ca5..0dbce0235617 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -20,20 +20,6 @@ cflags-y += -fno-common -pipe -fno-builtin -mmedium-calls -D__linux__ cflags-$(CONFIG_ISA_ARCOMPACT) += -mA7 cflags-$(CONFIG_ISA_ARCV2) += -mcpu=hs38 -is_700 = $(shell $(CC) -dM -E - < /dev/null | grep -q "ARC700" && echo 1 || echo 0) - -ifdef CONFIG_ISA_ARCOMPACT -ifeq ($(is_700), 0) - $(error Toolchain not configured for ARCompact builds) -endif -endif - -ifdef CONFIG_ISA_ARCV2 -ifeq ($(is_700), 1) - $(error Toolchain not configured for ARCv2 builds) -endif -endif - ifdef CONFIG_ARC_CURR_IN_REG # For a global register defintion, make sure it gets passed to every file # We had a customer reported bug where some code built in kernel was NOT using From 36cae568404a298a19a6e8a3f18641075d4cab04 Mon Sep 17 00:00:00 2001 From: Kristian Evensen Date: Thu, 13 Sep 2018 11:21:49 +0200 Subject: [PATCH 004/134] USB: serial: option: improve Quectel EP06 detection The Quectel EP06 (and EM06/EG06) LTE modem supports updating the USB configuration, without the VID/PID or configuration number changing. When the configuration is updated and interfaces are added/removed, the interface numbers are updated. This causes our current code for matching EP06 not to work as intended, as the assumption about reserved interfaces no longer holds. If for example the diagnostic (first) interface is removed, option will (try to) bind to the QMI interface. This patch improves EP06 detection by replacing the current match with two matches, and those matches check class, subclass and protocol as well as VID and PID. The diag interface exports class, subclass and protocol as 0xff. For the other serial interfaces, class is 0xff and subclass and protocol are both 0x0. The modem can export the following devices and always in this order: diag, nmea, at, ppp. qmi and adb. This means that diag can only ever be interface 0, and interface numbers 1-5 should be marked as reserved. The three other serial devices can have interface numbers 0-3, but I have not marked any interfaces as reserved. The reason is that the serial devices are the only interfaces exported by the device where subclass and protocol is 0x0. QMI exports the same class, subclass and protocol values as the diag interface. However, the two interfaces have different number of endpoints, QMI has three and diag two. I have added a check for number of interfaces if VID/PID matches the EP06, and we ignore the device if number of interfaces equals three (and subclass is set). Signed-off-by: Kristian Evensen Acked-by: Dan Williams [ johan: drop uneeded RSVD(5) for ADB ] Cc: stable Signed-off-by: Johan Hovold --- drivers/usb/serial/option.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 0215b70c4efc..382feafbd127 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1081,8 +1081,9 @@ static const struct usb_device_id option_ids[] = { .driver_info = RSVD(4) }, { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_BG96), .driver_info = RSVD(4) }, - { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EP06), - .driver_info = RSVD(4) | RSVD(5) }, + { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EP06, 0xff, 0xff, 0xff), + .driver_info = RSVD(1) | RSVD(2) | RSVD(3) | RSVD(4) }, + { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EP06, 0xff, 0, 0) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_CMU_300) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6003), @@ -1985,6 +1986,7 @@ static int option_probe(struct usb_serial *serial, { struct usb_interface_descriptor *iface_desc = &serial->interface->cur_altsetting->desc; + struct usb_device_descriptor *dev_desc = &serial->dev->descriptor; unsigned long device_flags = id->driver_info; /* Never bind to the CD-Rom emulation interface */ @@ -1999,6 +2001,18 @@ static int option_probe(struct usb_serial *serial, if (device_flags & RSVD(iface_desc->bInterfaceNumber)) return -ENODEV; + /* + * Don't bind to the QMI device of the Quectel EP06/EG06/EM06. Class, + * subclass and protocol is 0xff for both the diagnostic port and the + * QMI interface, but the diagnostic port only has two endpoints (QMI + * has three). + */ + if (dev_desc->idVendor == cpu_to_le16(QUECTEL_VENDOR_ID) && + dev_desc->idProduct == cpu_to_le16(QUECTEL_PRODUCT_EP06) && + iface_desc->bInterfaceSubClass && iface_desc->bNumEndpoints == 3) { + return -ENODEV; + } + /* Store the device flags so we can use them during attach. */ usb_set_serial_data(serial, (void *)device_flags); From 35aecc02b5b621782111f64cbb032c7f6a90bb32 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 13 Sep 2018 11:21:50 +0200 Subject: [PATCH 005/134] USB: serial: option: add two-endpoints device-id flag Allow matching on interfaces having two endpoints by adding a new device-id flag. This allows for the handling of devices whose interface numbers can change (e.g. Quectel EP06) to be contained in the device-id table. Tested-by: Kristian Evensen Cc: stable Signed-off-by: Johan Hovold --- drivers/usb/serial/option.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 382feafbd127..e72ad9f81c73 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -561,6 +561,9 @@ static void option_instat_callback(struct urb *urb); /* Interface is reserved */ #define RSVD(ifnum) ((BIT(ifnum) & 0xff) << 0) +/* Interface must have two endpoints */ +#define NUMEP2 BIT(16) + static const struct usb_device_id option_ids[] = { { USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_COLT) }, @@ -1082,7 +1085,7 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_BG96), .driver_info = RSVD(4) }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EP06, 0xff, 0xff, 0xff), - .driver_info = RSVD(1) | RSVD(2) | RSVD(3) | RSVD(4) }, + .driver_info = RSVD(1) | RSVD(2) | RSVD(3) | RSVD(4) | NUMEP2 }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EP06, 0xff, 0, 0) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_CMU_300) }, @@ -1986,7 +1989,6 @@ static int option_probe(struct usb_serial *serial, { struct usb_interface_descriptor *iface_desc = &serial->interface->cur_altsetting->desc; - struct usb_device_descriptor *dev_desc = &serial->dev->descriptor; unsigned long device_flags = id->driver_info; /* Never bind to the CD-Rom emulation interface */ @@ -2002,16 +2004,11 @@ static int option_probe(struct usb_serial *serial, return -ENODEV; /* - * Don't bind to the QMI device of the Quectel EP06/EG06/EM06. Class, - * subclass and protocol is 0xff for both the diagnostic port and the - * QMI interface, but the diagnostic port only has two endpoints (QMI - * has three). + * Allow matching on bNumEndpoints for devices whose interface numbers + * can change (e.g. Quectel EP06). */ - if (dev_desc->idVendor == cpu_to_le16(QUECTEL_VENDOR_ID) && - dev_desc->idProduct == cpu_to_le16(QUECTEL_PRODUCT_EP06) && - iface_desc->bInterfaceSubClass && iface_desc->bNumEndpoints == 3) { + if (device_flags & NUMEP2 && iface_desc->bNumEndpoints != 2) return -ENODEV; - } /* Store the device flags so we can use them during attach. */ usb_set_serial_data(serial, (void *)device_flags); From 7c2020c3022dfa6801d420b55ef0c507322d2c60 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 14 Sep 2018 12:27:27 +0100 Subject: [PATCH 006/134] ARC: fix spelling mistake "entires" -> "entries" Trivial fix to spelling mistake in Kconfig Signed-off-by: Colin Ian King Signed-off-by: Vineet Gupta --- arch/arc/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index b4441b0764d7..a045f3086047 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -149,7 +149,7 @@ config ARC_CPU_770 Support for ARC770 core introduced with Rel 4.10 (Summer 2011) This core has a bunch of cool new features: -MMU-v3: Variable Page Sz (4k, 8k, 16k), bigger J-TLB (128x4) - Shared Address Spaces (for sharing TLB entires in MMU) + Shared Address Spaces (for sharing TLB entries in MMU) -Caches: New Prog Model, Region Flush -Insns: endian swap, load-locked/store-conditional, time-stamp-ctr From 40660f1fcee8d524a60b5101538e42b1f39f106d Mon Sep 17 00:00:00 2001 From: Alexey Brodkin Date: Sun, 16 Sep 2018 23:47:57 +0300 Subject: [PATCH 007/134] ARC: build: Don't set CROSS_COMPILE in arch's Makefile There's not much sense in doing that because if user or his build-system didn't set CROSS_COMPILE we still may very well make incorrect guess. But as it turned out setting CROSS_COMPILE is not as harmless as one may think: with recent changes that implemented automatic discovery of __host__ gcc features unconditional setup of CROSS_COMPILE leads to failures on execution of "make xxx_defconfig" with absent cross-compiler, for more info see [1]. Set CROSS_COMPILE as well gets in the way if we want only to build .dtb's (again with absent cross-compiler which is not really needed for building .dtb's), see [2]. Note, we had to change LIBGCC assignment type from ":=" to "=" so that is is resolved on its usage, otherwise if it is resolved at declaration time with missing CROSS_COMPILE we're getting this error message from host GCC: | gcc: error: unrecognized command line option -mmedium-calls | gcc: error: unrecognized command line option -mno-sdata [1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-September/004308.html [2] http://lists.infradead.org/pipermail/linux-snps-arc/2018-September/004320.html Signed-off-by: Alexey Brodkin Cc: Masahiro Yamada Cc: Rob Herring Signed-off-by: Vineet Gupta --- arch/arc/Makefile | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 0dbce0235617..644815c0516e 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -6,14 +6,6 @@ # published by the Free Software Foundation. # -ifeq ($(CROSS_COMPILE),) -ifndef CONFIG_CPU_BIG_ENDIAN -CROSS_COMPILE := arc-linux- -else -CROSS_COMPILE := arceb-linux- -endif -endif - KBUILD_DEFCONFIG := nsim_700_defconfig cflags-y += -fno-common -pipe -fno-builtin -mmedium-calls -D__linux__ @@ -65,7 +57,7 @@ cflags-$(disable_small_data) += -mno-sdata -fcall-used-gp cflags-$(CONFIG_CPU_BIG_ENDIAN) += -mbig-endian ldflags-$(CONFIG_CPU_BIG_ENDIAN) += -EB -LIBGCC := $(shell $(CC) $(cflags-y) --print-libgcc-file-name) +LIBGCC = $(shell $(CC) $(cflags-y) --print-libgcc-file-name) # Modules with short calls might break for calls into builtin-kernel KBUILD_CFLAGS_MODULE += -mlong-calls -mno-millicode From 5a4630aadb9a9525474e9ac92965829f990cb5c4 Mon Sep 17 00:00:00 2001 From: Joel Stanley Date: Mon, 17 Sep 2018 17:07:54 +0930 Subject: [PATCH 008/134] ftrace: Build with CPPFLAGS to get -Qunused-arguments When building to record the mcount locations the kernel uses KBUILD_CFLAGS but not KBUILD_CPPFLAGS. This means it lacks -Qunused-arguments when building with clang, resulting in a lot of noisy warnings. Signed-off-by: Joel Stanley Reviewed-by: Nick Desaulniers Signed-off-by: Masahiro Yamada --- scripts/Makefile.build | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 5a2d1c9578a0..54da4b070db3 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -219,7 +219,7 @@ else sub_cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \ "$(if $(CONFIG_CPU_BIG_ENDIAN),big,little)" \ "$(if $(CONFIG_64BIT),64,32)" \ - "$(OBJDUMP)" "$(OBJCOPY)" "$(CC) $(KBUILD_CFLAGS)" \ + "$(OBJDUMP)" "$(OBJCOPY)" "$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS)" \ "$(LD) $(KBUILD_LDFLAGS)" "$(NM)" "$(RM)" "$(MV)" \ "$(if $(part-of-module),1,0)" "$(@)"; recordmcount_source := $(srctree)/scripts/recordmcount.pl From ef8c4ed9db80261f397f0c0bf723684601ae3b52 Mon Sep 17 00:00:00 2001 From: Stefan Agner Date: Mon, 17 Sep 2018 19:31:57 -0700 Subject: [PATCH 009/134] kbuild: allow to use GCC toolchain not in Clang search path When using a GCC cross toolchain which is not in a compiled in Clang search path, Clang reverts to the system assembler and linker. This leads to assembler or linker errors, depending on which tool is first used for a given architecture. It seems that Clang is not searching $PATH for a matching assembler or linker. Make sure that Clang picks up the correct assembler or linker by passing the cross compilers bin directory as search path. This allows to use Clang provided by distributions with GCC toolchains not in /usr/bin. Link: https://github.com/ClangBuiltLinux/linux/issues/78 Signed-off-by: Stefan Agner Reviewed-and-tested-by: Nick Desaulniers Signed-off-by: Masahiro Yamada --- Makefile | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index 4d5c883a98e5..d5de2db4b549 100644 --- a/Makefile +++ b/Makefile @@ -495,13 +495,15 @@ endif ifeq ($(cc-name),clang) ifneq ($(CROSS_COMPILE),) CLANG_TARGET := --target=$(notdir $(CROSS_COMPILE:%-=%)) -GCC_TOOLCHAIN := $(realpath $(dir $(shell which $(LD)))/..) +GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD))) +CLANG_PREFIX := --prefix=$(GCC_TOOLCHAIN_DIR) +GCC_TOOLCHAIN := $(realpath $(GCC_TOOLCHAIN_DIR)/..) endif ifneq ($(GCC_TOOLCHAIN),) CLANG_GCC_TC := --gcc-toolchain=$(GCC_TOOLCHAIN) endif -KBUILD_CFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) -KBUILD_AFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) +KBUILD_CFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) $(CLANG_PREFIX) +KBUILD_AFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) $(CLANG_PREFIX) KBUILD_CFLAGS += $(call cc-option, -no-integrated-as) KBUILD_AFLAGS += $(call cc-option, -no-integrated-as) endif From 55a5542a546238354d1f209f794414168cf8c71d Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Mon, 10 Sep 2018 18:03:29 +0200 Subject: [PATCH 010/134] s390/hibernate: fix error handling when suspend cpu != resume cpu The resume code checks if the resume cpu is the same as the suspend cpu. If not, and if it is also not possible to switch to the suspend cpu, an error message should be printed and the resume process should be stopped by loading a disabled wait psw. The current logic is broken in multiple ways, the message is never printed, and the disabled wait psw never loaded because the kernel panics before that: - sam31 and SIGP_SET_ARCHITECTURE to ESA mode is wrong, this will break on the first 64bit instruction in sclp_early_printk(). - The init stack should be used, but the stack pointer is not set up correctly (missing aghi %r15,-STACK_FRAME_OVERHEAD). - __sclp_early_printk() checks the sclp_init_state. If it is not sclp_init_state_uninitialized, it simply returns w/o printing anything. In the resumed kernel however, sclp_init_state will never be uninitialized. This patch fixes those issues by removing the sam31/ESA logic, adding a correct init stack pointer, and also introducing sclp_early_printk_force() to allow using sclp_early_printk() even when sclp_init_state is not uninitialized. Reviewed-by: Heiko Carstens Signed-off-by: Gerald Schaefer Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/sclp.h | 3 ++- arch/s390/kernel/early_printk.c | 2 +- arch/s390/kernel/swsusp.S | 8 +++----- drivers/s390/char/sclp_early_core.c | 11 ++++++++--- 4 files changed, 14 insertions(+), 10 deletions(-) diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h index 3cae9168f63c..e44a8d7959f5 100644 --- a/arch/s390/include/asm/sclp.h +++ b/arch/s390/include/asm/sclp.h @@ -108,7 +108,8 @@ int sclp_early_get_core_info(struct sclp_core_info *info); void sclp_early_get_ipl_info(struct sclp_ipl_info *info); void sclp_early_detect(void); void sclp_early_printk(const char *s); -void __sclp_early_printk(const char *s, unsigned int len); +void sclp_early_printk_force(const char *s); +void __sclp_early_printk(const char *s, unsigned int len, unsigned int force); int _sclp_get_core_info(struct sclp_core_info *info); int sclp_core_configure(u8 core); diff --git a/arch/s390/kernel/early_printk.c b/arch/s390/kernel/early_printk.c index 9431784d7796..40c1dfec944e 100644 --- a/arch/s390/kernel/early_printk.c +++ b/arch/s390/kernel/early_printk.c @@ -10,7 +10,7 @@ static void sclp_early_write(struct console *con, const char *s, unsigned int len) { - __sclp_early_printk(s, len); + __sclp_early_printk(s, len, 0); } static struct console sclp_early_console = { diff --git a/arch/s390/kernel/swsusp.S b/arch/s390/kernel/swsusp.S index a049a7b9d6e8..c1a080b11ae9 100644 --- a/arch/s390/kernel/swsusp.S +++ b/arch/s390/kernel/swsusp.S @@ -198,12 +198,10 @@ pgm_check_entry: /* Suspend CPU not available -> panic */ larl %r15,init_thread_union - ahi %r15,1<<(PAGE_SHIFT+THREAD_SIZE_ORDER) + aghi %r15,1<<(PAGE_SHIFT+THREAD_SIZE_ORDER) + aghi %r15,-STACK_FRAME_OVERHEAD larl %r2,.Lpanic_string - lghi %r1,0 - sam31 - sigp %r1,%r0,SIGP_SET_ARCHITECTURE - brasl %r14,sclp_early_printk + brasl %r14,sclp_early_printk_force larl %r3,.Ldisabled_wait_31 lpsw 0(%r3) 4: diff --git a/drivers/s390/char/sclp_early_core.c b/drivers/s390/char/sclp_early_core.c index eceba3858cef..2f61f5579aa5 100644 --- a/drivers/s390/char/sclp_early_core.c +++ b/drivers/s390/char/sclp_early_core.c @@ -210,11 +210,11 @@ static int sclp_early_setup(int disable, int *have_linemode, int *have_vt220) * Output one or more lines of text on the SCLP console (VT220 and / * or line-mode). */ -void __sclp_early_printk(const char *str, unsigned int len) +void __sclp_early_printk(const char *str, unsigned int len, unsigned int force) { int have_linemode, have_vt220; - if (sclp_init_state != sclp_init_state_uninitialized) + if (!force && sclp_init_state != sclp_init_state_uninitialized) return; if (sclp_early_setup(0, &have_linemode, &have_vt220) != 0) return; @@ -227,5 +227,10 @@ void __sclp_early_printk(const char *str, unsigned int len) void sclp_early_printk(const char *str) { - __sclp_early_printk(str, strlen(str)); + __sclp_early_printk(str, strlen(str), 0); +} + +void sclp_early_printk_force(const char *str) +{ + __sclp_early_printk(str, strlen(str), 1); } From f5fad711c06e652f90f581fc7c2caee327c33d31 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 24 Sep 2018 15:28:10 +0200 Subject: [PATCH 011/134] USB: serial: simple: add Motorola Tetra MTP6550 id Add device-id for the Motorola Tetra radio MTP6550. Bus 001 Device 004: ID 0cad:9012 Motorola CGISS Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x0cad Motorola CGISS idProduct 0x9012 bcdDevice 24.16 iManufacturer 1 Motorola Solutions, Inc. iProduct 2 TETRA PEI interface iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 55 bNumInterfaces 2 bConfigurationValue 1 iConfiguration 3 Generic Serial config bmAttributes 0x80 (Bus Powered) MaxPower 500mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 0 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 0 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Device Qualifier (for other device speed): bLength 10 bDescriptorType 6 bcdUSB 2.00 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 bNumConfigurations 1 Device Status: 0x0000 (Bus Powered) Reported-by: Hans Hult Cc: stable Signed-off-by: Johan Hovold --- drivers/usb/serial/usb-serial-simple.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/usb-serial-simple.c b/drivers/usb/serial/usb-serial-simple.c index 40864c2bd9dc..4d0273508043 100644 --- a/drivers/usb/serial/usb-serial-simple.c +++ b/drivers/usb/serial/usb-serial-simple.c @@ -84,7 +84,8 @@ DEVICE(moto_modem, MOTO_IDS); /* Motorola Tetra driver */ #define MOTOROLA_TETRA_IDS() \ - { USB_DEVICE(0x0cad, 0x9011) } /* Motorola Solutions TETRA PEI */ + { USB_DEVICE(0x0cad, 0x9011) }, /* Motorola Solutions TETRA PEI */ \ + { USB_DEVICE(0x0cad, 0x9012) } /* MTP6550 */ DEVICE(motorola_tetra, MOTOROLA_TETRA_IDS); /* Novatel Wireless GPS driver */ From 6697576788816985f9c79da190abfaad7c9e1738 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Thu, 20 Sep 2018 11:03:22 -0700 Subject: [PATCH 012/134] i2c: i2c-qcom-geni: Properly handle DMA safe buffers We shouldn't attempt to DMA map the message buffers passed into this driver from the i2c core unless the message we're mapping have been properly setup for DMA. The i2c core indicates such a situation by setting the I2C_M_DMA_SAFE flag, so check for that flag before using DMA mode. We can also bounce the buffer if it isn't already mapped properly by using the i2c_get_dma_safe_msg_buf() APIs, so do that when we want to use DMA for a message. This fixes a problem where the kernel oopses cleaning pages for a buffer that's mapped into the vmalloc space. The pages are returned from request_firmware() and passed down directly to the i2c master to write to the i2c touchscreen device. Mapping vmalloc buffers with dma_map_single() won't work reliably, causing an oops like below: Unable to handle kernel paging request at virtual address ffffffc01391d000 ... Reported-by: Philip Chen Signed-off-by: Stephen Boyd Reviewed-by: Douglas Anderson Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-qcom-geni.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c index 36732eb688a4..9f2eb02481d3 100644 --- a/drivers/i2c/busses/i2c-qcom-geni.c +++ b/drivers/i2c/busses/i2c-qcom-geni.c @@ -367,20 +367,26 @@ static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg, dma_addr_t rx_dma; enum geni_se_xfer_mode mode; unsigned long time_left = XFER_TIMEOUT; + void *dma_buf; gi2c->cur = msg; - mode = msg->len > 32 ? GENI_SE_DMA : GENI_SE_FIFO; + mode = GENI_SE_FIFO; + dma_buf = i2c_get_dma_safe_msg_buf(msg, 32); + if (dma_buf) + mode = GENI_SE_DMA; + geni_se_select_mode(&gi2c->se, mode); writel_relaxed(msg->len, gi2c->se.base + SE_I2C_RX_TRANS_LEN); geni_se_setup_m_cmd(&gi2c->se, I2C_READ, m_param); if (mode == GENI_SE_DMA) { int ret; - ret = geni_se_rx_dma_prep(&gi2c->se, msg->buf, msg->len, + ret = geni_se_rx_dma_prep(&gi2c->se, dma_buf, msg->len, &rx_dma); if (ret) { mode = GENI_SE_FIFO; geni_se_select_mode(&gi2c->se, mode); + i2c_put_dma_safe_msg_buf(dma_buf, msg, false); } } @@ -393,6 +399,7 @@ static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg, if (gi2c->err) geni_i2c_rx_fsm_rst(gi2c); geni_se_rx_dma_unprep(&gi2c->se, rx_dma, msg->len); + i2c_put_dma_safe_msg_buf(dma_buf, msg, !gi2c->err); } return gi2c->err; } @@ -403,20 +410,26 @@ static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg, dma_addr_t tx_dma; enum geni_se_xfer_mode mode; unsigned long time_left; + void *dma_buf; gi2c->cur = msg; - mode = msg->len > 32 ? GENI_SE_DMA : GENI_SE_FIFO; + mode = GENI_SE_FIFO; + dma_buf = i2c_get_dma_safe_msg_buf(msg, 32); + if (dma_buf) + mode = GENI_SE_DMA; + geni_se_select_mode(&gi2c->se, mode); writel_relaxed(msg->len, gi2c->se.base + SE_I2C_TX_TRANS_LEN); geni_se_setup_m_cmd(&gi2c->se, I2C_WRITE, m_param); if (mode == GENI_SE_DMA) { int ret; - ret = geni_se_tx_dma_prep(&gi2c->se, msg->buf, msg->len, + ret = geni_se_tx_dma_prep(&gi2c->se, dma_buf, msg->len, &tx_dma); if (ret) { mode = GENI_SE_FIFO; geni_se_select_mode(&gi2c->se, mode); + i2c_put_dma_safe_msg_buf(dma_buf, msg, false); } } @@ -432,6 +445,7 @@ static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg, if (gi2c->err) geni_i2c_tx_fsm_rst(gi2c); geni_se_tx_dma_unprep(&gi2c->se, tx_dma, msg->len); + i2c_put_dma_safe_msg_buf(dma_buf, msg, !gi2c->err); } return gi2c->err; } From ea51e17b953f391c0a7d6684141e76a5ab209fef Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Mon, 24 Sep 2018 18:14:37 +0100 Subject: [PATCH 013/134] i2c: i2c-isch: fix spelling mistake "unitialized" -> "uninitialized" Trivial fix to spelling mistake in dev_notice message. Signed-off-by: Colin Ian King Reviewed-by: Jean Delvare Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-isch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-isch.c b/drivers/i2c/busses/i2c-isch.c index 0cf1379f4e80..5c754bf659e2 100644 --- a/drivers/i2c/busses/i2c-isch.c +++ b/drivers/i2c/busses/i2c-isch.c @@ -164,7 +164,7 @@ static s32 sch_access(struct i2c_adapter *adap, u16 addr, * run ~75 kHz instead which should do no harm. */ dev_notice(&sch_adapter.dev, - "Clock divider unitialized. Setting defaults\n"); + "Clock divider uninitialized. Setting defaults\n"); outw(backbone_speed / (4 * 100), SMBHSTCLK); } From 25e11700b54c7b6b5ebfc4361981dae12299557b Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 11 Sep 2018 14:45:03 +0300 Subject: [PATCH 014/134] perf script python: Fix export-to-postgresql.py occasional failure Occasional export failures were found to be caused by truncating 64-bit pointers to 32-bits. Fix by explicitly setting types for all ctype arguments and results. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180911114504.28516-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/scripts/python/export-to-postgresql.py | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py index efcaf6cac2eb..e46f51b17513 100644 --- a/tools/perf/scripts/python/export-to-postgresql.py +++ b/tools/perf/scripts/python/export-to-postgresql.py @@ -204,14 +204,23 @@ from ctypes import * libpq = CDLL("libpq.so.5") PQconnectdb = libpq.PQconnectdb PQconnectdb.restype = c_void_p +PQconnectdb.argtypes = [ c_char_p ] PQfinish = libpq.PQfinish +PQfinish.argtypes = [ c_void_p ] PQstatus = libpq.PQstatus +PQstatus.restype = c_int +PQstatus.argtypes = [ c_void_p ] PQexec = libpq.PQexec PQexec.restype = c_void_p +PQexec.argtypes = [ c_void_p, c_char_p ] PQresultStatus = libpq.PQresultStatus +PQresultStatus.restype = c_int +PQresultStatus.argtypes = [ c_void_p ] PQputCopyData = libpq.PQputCopyData +PQputCopyData.restype = c_int PQputCopyData.argtypes = [ c_void_p, c_void_p, c_int ] PQputCopyEnd = libpq.PQputCopyEnd +PQputCopyEnd.restype = c_int PQputCopyEnd.argtypes = [ c_void_p, c_void_p ] sys.path.append(os.environ['PERF_EXEC_PATH'] + \ From d005efe18db0b4a123dd92ea8e77e27aee8f99fd Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 11 Sep 2018 14:45:04 +0300 Subject: [PATCH 015/134] perf script python: Fix export-to-sqlite.py sample columns With the "branches" export option, not all sample columns are exported. However the unwanted columns are not at the end of the tuple, as assumed by the code. Fix by taking the first 15 and last 3 values, instead of the first 18. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180911114504.28516-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/scripts/python/export-to-sqlite.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/perf/scripts/python/export-to-sqlite.py b/tools/perf/scripts/python/export-to-sqlite.py index f827bf77e9d2..e4bb82c8aba9 100644 --- a/tools/perf/scripts/python/export-to-sqlite.py +++ b/tools/perf/scripts/python/export-to-sqlite.py @@ -440,7 +440,11 @@ def branch_type_table(*x): def sample_table(*x): if branches: - bind_exec(sample_query, 18, x) + for xx in x[0:15]: + sample_query.addBindValue(str(xx)) + for xx in x[19:22]: + sample_query.addBindValue(str(xx)) + do_query_(sample_query) else: bind_exec(sample_query, 22, x) From c98e16b2fa1202dd8c66900823591cd110a1a213 Mon Sep 17 00:00:00 2001 From: Eric Farman Date: Fri, 21 Sep 2018 22:40:12 +0200 Subject: [PATCH 016/134] s390/cio: Convert ccw_io_region to pointer In the event that we want to change the layout of the ccw_io_region in the future[1], it might be easier to work with it as a pointer within the vfio_ccw_private struct rather than an embedded struct. [1] https://patchwork.kernel.org/comment/22228541/ Signed-off-by: Eric Farman Message-Id: <20180921204013.95804-2-farman@linux.ibm.com> Signed-off-by: Cornelia Huck --- drivers/s390/cio/vfio_ccw_drv.c | 12 +++++++++++- drivers/s390/cio/vfio_ccw_fsm.c | 6 +++--- drivers/s390/cio/vfio_ccw_ops.c | 4 ++-- drivers/s390/cio/vfio_ccw_private.h | 2 +- 4 files changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c index 770fa9cfc310..f48e6f84eefe 100644 --- a/drivers/s390/cio/vfio_ccw_drv.c +++ b/drivers/s390/cio/vfio_ccw_drv.c @@ -79,7 +79,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work) cp_update_scsw(&private->cp, &irb->scsw); cp_free(&private->cp); } - memcpy(private->io_region.irb_area, irb, sizeof(*irb)); + memcpy(private->io_region->irb_area, irb, sizeof(*irb)); if (private->io_trigger) eventfd_signal(private->io_trigger, 1); @@ -114,6 +114,14 @@ static int vfio_ccw_sch_probe(struct subchannel *sch) private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA); if (!private) return -ENOMEM; + + private->io_region = kzalloc(sizeof(*private->io_region), + GFP_KERNEL | GFP_DMA); + if (!private->io_region) { + kfree(private); + return -ENOMEM; + } + private->sch = sch; dev_set_drvdata(&sch->dev, private); @@ -139,6 +147,7 @@ out_disable: cio_disable_subchannel(sch); out_free: dev_set_drvdata(&sch->dev, NULL); + kfree(private->io_region); kfree(private); return ret; } @@ -153,6 +162,7 @@ static int vfio_ccw_sch_remove(struct subchannel *sch) dev_set_drvdata(&sch->dev, NULL); + kfree(private->io_region); kfree(private); return 0; diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c index 797a82731159..f94aa01f9c36 100644 --- a/drivers/s390/cio/vfio_ccw_fsm.c +++ b/drivers/s390/cio/vfio_ccw_fsm.c @@ -93,13 +93,13 @@ static void fsm_io_error(struct vfio_ccw_private *private, enum vfio_ccw_event event) { pr_err("vfio-ccw: FSM: I/O request from state:%d\n", private->state); - private->io_region.ret_code = -EIO; + private->io_region->ret_code = -EIO; } static void fsm_io_busy(struct vfio_ccw_private *private, enum vfio_ccw_event event) { - private->io_region.ret_code = -EBUSY; + private->io_region->ret_code = -EBUSY; } static void fsm_disabled_irq(struct vfio_ccw_private *private, @@ -126,7 +126,7 @@ static void fsm_io_request(struct vfio_ccw_private *private, { union orb *orb; union scsw *scsw = &private->scsw; - struct ccw_io_region *io_region = &private->io_region; + struct ccw_io_region *io_region = private->io_region; struct mdev_device *mdev = private->mdev; char *errstr = "request"; diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c index 41eeb57d68a3..f673e106c041 100644 --- a/drivers/s390/cio/vfio_ccw_ops.c +++ b/drivers/s390/cio/vfio_ccw_ops.c @@ -174,7 +174,7 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev, return -EINVAL; private = dev_get_drvdata(mdev_parent_dev(mdev)); - region = &private->io_region; + region = private->io_region; if (copy_to_user(buf, (void *)region + *ppos, count)) return -EFAULT; @@ -196,7 +196,7 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, if (private->state != VFIO_CCW_STATE_IDLE) return -EACCES; - region = &private->io_region; + region = private->io_region; if (copy_from_user((void *)region + *ppos, buf, count)) return -EFAULT; diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h index 78a66d96756b..078e46f9623d 100644 --- a/drivers/s390/cio/vfio_ccw_private.h +++ b/drivers/s390/cio/vfio_ccw_private.h @@ -41,7 +41,7 @@ struct vfio_ccw_private { atomic_t avail; struct mdev_device *mdev; struct notifier_block nb; - struct ccw_io_region io_region; + struct ccw_io_region *io_region; struct channel_program cp; struct irb irb; From bf42daed6bd136774415ae6d26c8475152f92b54 Mon Sep 17 00:00:00 2001 From: Eric Farman Date: Fri, 21 Sep 2018 22:40:13 +0200 Subject: [PATCH 017/134] s390/cio: Refactor alloc of ccw_io_region If I attach a vfio-ccw device to my guest, I get the following warning on the host when the host kernel is CONFIG_HARDENED_USERCOPY=y [250757.595325] Bad or missing usercopy whitelist? Kernel memory overwrite attempt detected to SLUB object 'dma-kmalloc-512' (offset 64, size 124)! [250757.595365] WARNING: CPU: 2 PID: 10958 at mm/usercopy.c:81 usercopy_warn+0xac/0xd8 [250757.595369] Modules linked in: kvm vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c devlink tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables sunrpc dm_multipath s390_trng crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha1_s390 eadm_sch tape_3590 tape tape_class qeth_l2 qeth ccwgroup vfio_ccw vfio_mdev zcrypt_cex4 mdev vfio_iommu_type1 zcrypt vfio sha256_s390 sha_common zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod [250757.595424] CPU: 2 PID: 10958 Comm: CPU 2/KVM Not tainted 4.18.0-derp #2 [250757.595426] Hardware name: IBM 3906 M05 780 (LPAR) ...snip regs... [250757.595523] Call Trace: [250757.595529] ([<0000000000349210>] usercopy_warn+0xa8/0xd8) [250757.595535] [<000000000032daaa>] __check_heap_object+0xfa/0x160 [250757.595540] [<0000000000349396>] __check_object_size+0x156/0x1d0 [250757.595547] [<000003ff80332d04>] vfio_ccw_mdev_write+0x74/0x148 [vfio_ccw] [250757.595552] [<000000000034ed12>] __vfs_write+0x3a/0x188 [250757.595556] [<000000000034f040>] vfs_write+0xa8/0x1b8 [250757.595559] [<000000000034f4e6>] ksys_pwrite64+0x86/0xc0 [250757.595568] [<00000000008959a0>] system_call+0xdc/0x2b0 [250757.595570] Last Breaking-Event-Address: [250757.595573] [<0000000000349210>] usercopy_warn+0xa8/0xd8 While vfio_ccw_mdev_{write|read} validates that the input position/count does not run over the ccw_io_region struct, the usercopy code that does copy_{to|from}_user doesn't necessarily know this. It sees the variable length and gets worried that it's affecting a normal kmalloc'd struct, and generates the above warning. Adjust how the ccw_io_region is alloc'd with a whitelist to remove this warning. The boundary checking will continue to do its thing. Signed-off-by: Eric Farman Message-Id: <20180921204013.95804-3-farman@linux.ibm.com> Signed-off-by: Cornelia Huck --- drivers/s390/cio/vfio_ccw_drv.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c index f48e6f84eefe..f47d16b5810b 100644 --- a/drivers/s390/cio/vfio_ccw_drv.c +++ b/drivers/s390/cio/vfio_ccw_drv.c @@ -22,6 +22,7 @@ #include "vfio_ccw_private.h" struct workqueue_struct *vfio_ccw_work_q; +struct kmem_cache *vfio_ccw_io_region; /* * Helpers @@ -115,8 +116,8 @@ static int vfio_ccw_sch_probe(struct subchannel *sch) if (!private) return -ENOMEM; - private->io_region = kzalloc(sizeof(*private->io_region), - GFP_KERNEL | GFP_DMA); + private->io_region = kmem_cache_zalloc(vfio_ccw_io_region, + GFP_KERNEL | GFP_DMA); if (!private->io_region) { kfree(private); return -ENOMEM; @@ -147,7 +148,7 @@ out_disable: cio_disable_subchannel(sch); out_free: dev_set_drvdata(&sch->dev, NULL); - kfree(private->io_region); + kmem_cache_free(vfio_ccw_io_region, private->io_region); kfree(private); return ret; } @@ -162,7 +163,7 @@ static int vfio_ccw_sch_remove(struct subchannel *sch) dev_set_drvdata(&sch->dev, NULL); - kfree(private->io_region); + kmem_cache_free(vfio_ccw_io_region, private->io_region); kfree(private); return 0; @@ -242,10 +243,20 @@ static int __init vfio_ccw_sch_init(void) if (!vfio_ccw_work_q) return -ENOMEM; + vfio_ccw_io_region = kmem_cache_create_usercopy("vfio_ccw_io_region", + sizeof(struct ccw_io_region), 0, + SLAB_ACCOUNT, 0, + sizeof(struct ccw_io_region), NULL); + if (!vfio_ccw_io_region) { + destroy_workqueue(vfio_ccw_work_q); + return -ENOMEM; + } + isc_register(VFIO_CCW_ISC); ret = css_driver_register(&vfio_ccw_sch_driver); if (ret) { isc_unregister(VFIO_CCW_ISC); + kmem_cache_destroy(vfio_ccw_io_region); destroy_workqueue(vfio_ccw_work_q); } @@ -256,6 +267,7 @@ static void __exit vfio_ccw_sch_exit(void) { css_driver_unregister(&vfio_ccw_sch_driver); isc_unregister(VFIO_CCW_ISC); + kmem_cache_destroy(vfio_ccw_io_region); destroy_workqueue(vfio_ccw_work_q); } module_init(vfio_ccw_sch_init); From ff4ce2885af8f9e8e99864d78dbeb4673f089c76 Mon Sep 17 00:00:00 2001 From: Milian Wolff Date: Wed, 26 Sep 2018 15:52:05 +0200 Subject: [PATCH 018/134] perf report: Don't try to map ip to invalid map Fixes a crash when the report encounters an address that could not be associated with an mmaped region: #0 0x00005555557bdc4a in callchain_srcline (ip=, sym=0x0, map=0x0) at util/machine.c:2329 #1 unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329 #2 0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 , thread=, ip=18446744073709551615) at util/unwind-libunwind-local.c:586 #3 get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 , arg=0x7ffff5642498, max_stack=) at util/unwind-libunwind-local.c:703 #4 0x0000555555837192 in _unwind__get_entries (cb=, arg=, thread=, data=, max_stack=) at util/unwind-libunwind-local.c:725 #5 0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351 #6 thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378 #7 0x00005555557ba4ee in sample__resolve_callchain (sample=, cursor=, parent=parent@entry=0x7fffffff97b8, evsel=, al=al@entry=0x7fffffff9750, max_stack=) at util/callchain.c:1085 Signed-off-by: Milian Wolff Tested-by: Sandipan Das Acked-by: Jiri Olsa Cc: Jin Yao Cc: Namhyung Kim Fixes: 2a9d5050dc84 ("perf script: Show correct offsets for DWARF-based unwinding") Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/machine.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index c4acd2001db0..0cb4f8bf3ca7 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2312,7 +2312,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg) { struct callchain_cursor *cursor = arg; const char *srcline = NULL; - u64 addr; + u64 addr = entry->ip; if (symbol_conf.hide_unresolved && entry->sym == NULL) return 0; @@ -2324,7 +2324,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg) * Convert entry->ip from a virtual address to an offset in * its corresponding binary. */ - addr = map__map_ip(entry->map, entry->ip); + if (entry->map) + addr = map__map_ip(entry->map, entry->ip); srcline = callchain_srcline(entry->map, entry->sym, addr); return callchain_cursor_append(cursor, entry->ip, From 853dc104e6a43cba9a7a87542bc6d8e3b74f0bc9 Mon Sep 17 00:00:00 2001 From: Laurentiu Tudor Date: Wed, 26 Sep 2018 16:22:30 +0300 Subject: [PATCH 019/134] soc: fsl: qbman: add APIs to retrieve the probing status Add a couple of new APIs to check the probing status of qman and bman: 'int bman_is_probed()' and 'int qman_is_probed()'. They return the following values. * 1 if qman/bman were probed correctly * 0 if qman/bman were not yet probed * -1 if probing of qman/bman failed Drivers that use qman/bman driver services are required to use these APIs before calling any functions exported by qman or bman drivers or otherwise they will crash the kernel. The APIs will be used in the following couple of qbman portal patches and later in the series in the dpaa1 ethernet driver. Signed-off-by: Laurentiu Tudor Signed-off-by: Li Yang --- drivers/soc/fsl/qbman/bman_ccsr.c | 11 +++++++++++ drivers/soc/fsl/qbman/qman_ccsr.c | 11 +++++++++++ include/soc/fsl/bman.h | 8 ++++++++ include/soc/fsl/qman.h | 8 ++++++++ 4 files changed, 38 insertions(+) diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c index 05c42235dd41..7c3cc968053c 100644 --- a/drivers/soc/fsl/qbman/bman_ccsr.c +++ b/drivers/soc/fsl/qbman/bman_ccsr.c @@ -120,6 +120,7 @@ static void bm_set_memory(u64 ba, u32 size) */ static dma_addr_t fbpr_a; static size_t fbpr_sz; +static int __bman_probed; static int bman_fbpr(struct reserved_mem *rmem) { @@ -166,6 +167,12 @@ static irqreturn_t bman_isr(int irq, void *ptr) return IRQ_HANDLED; } +int bman_is_probed(void) +{ + return __bman_probed; +} +EXPORT_SYMBOL_GPL(bman_is_probed); + static int fsl_bman_probe(struct platform_device *pdev) { int ret, err_irq; @@ -175,6 +182,8 @@ static int fsl_bman_probe(struct platform_device *pdev) u16 id, bm_pool_cnt; u8 major, minor; + __bman_probed = -1; + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (!res) { dev_err(dev, "Can't get %pOF property 'IORESOURCE_MEM'\n", @@ -255,6 +264,8 @@ static int fsl_bman_probe(struct platform_device *pdev) return ret; } + __bman_probed = 1; + return 0; }; diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c index 79cba58387a5..6fd5fef5f39b 100644 --- a/drivers/soc/fsl/qbman/qman_ccsr.c +++ b/drivers/soc/fsl/qbman/qman_ccsr.c @@ -273,6 +273,7 @@ static const struct qman_error_info_mdata error_mdata[] = { static u32 __iomem *qm_ccsr_start; /* A SDQCR mask comprising all the available/visible pool channels */ static u32 qm_pools_sdqcr; +static int __qman_probed; static inline u32 qm_ccsr_in(u32 offset) { @@ -686,6 +687,12 @@ static int qman_resource_init(struct device *dev) return 0; } +int qman_is_probed(void) +{ + return __qman_probed; +} +EXPORT_SYMBOL_GPL(qman_is_probed); + static int fsl_qman_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -695,6 +702,8 @@ static int fsl_qman_probe(struct platform_device *pdev) u16 id; u8 major, minor; + __qman_probed = -1; + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (!res) { dev_err(dev, "Can't get %pOF property 'IORESOURCE_MEM'\n", @@ -828,6 +837,8 @@ static int fsl_qman_probe(struct platform_device *pdev) if (ret) return ret; + __qman_probed = 1; + return 0; } diff --git a/include/soc/fsl/bman.h b/include/soc/fsl/bman.h index eaaf56df4086..5b99cb2ea5ef 100644 --- a/include/soc/fsl/bman.h +++ b/include/soc/fsl/bman.h @@ -126,4 +126,12 @@ int bman_release(struct bman_pool *pool, const struct bm_buffer *bufs, u8 num); */ int bman_acquire(struct bman_pool *pool, struct bm_buffer *bufs, u8 num); +/** + * bman_is_probed - Check if bman is probed + * + * Returns 1 if the bman driver successfully probed, -1 if the bman driver + * failed to probe or 0 if the bman driver did not probed yet. + */ +int bman_is_probed(void); + #endif /* __FSL_BMAN_H */ diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h index d4dfefdee6c1..597783b8a3a0 100644 --- a/include/soc/fsl/qman.h +++ b/include/soc/fsl/qman.h @@ -1186,4 +1186,12 @@ int qman_alloc_cgrid_range(u32 *result, u32 count); */ int qman_release_cgrid(u32 id); +/** + * qman_is_probed - Check if qman is probed + * + * Returns 1 if the qman driver successfully probed, -1 if the qman driver + * failed to probe or 0 if the qman driver did not probed yet. + */ +int qman_is_probed(void); + #endif /* __FSL_QMAN_H */ From 3cc5746e5ad7688e274e193fa71278d98aa52759 Mon Sep 17 00:00:00 2001 From: Nilesh Javali Date: Thu, 27 Sep 2018 05:15:35 -0700 Subject: [PATCH 020/134] scsi: qedi: Initialize the stats mutex lock Fix kernel NULL pointer dereference, Call Trace: [] __mutex_lock_slowpath+0xa6/0x1d0 [] mutex_lock+0x1f/0x2f [] qedi_get_protocol_tlv_data+0x61/0x450 [qedi] [] ? map_vm_area+0x2e/0x40 [] ? __vmalloc_node_range+0x170/0x280 [] ? qed_mfw_process_tlv_req+0x27d/0xbd0 [qed] [] qed_mfw_fill_tlv_data+0x4b/0xb0 [qed] [] qed_mfw_process_tlv_req+0x299/0xbd0 [qed] [] ? __switch_to+0xce/0x580 [] qed_slowpath_task+0x5b/0x80 [qed] Signed-off-by: Nilesh Javali Signed-off-by: Martin K. Petersen --- drivers/scsi/qedi/qedi_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c index cc8e64dc65ad..e5bd035ebad0 100644 --- a/drivers/scsi/qedi/qedi_main.c +++ b/drivers/scsi/qedi/qedi_main.c @@ -2472,6 +2472,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode) /* start qedi context */ spin_lock_init(&qedi->hba_lock); spin_lock_init(&qedi->task_idx_lock); + mutex_init(&qedi->stats_lock); } qedi_ops->ll2->register_cb_ops(qedi->cdev, &qedi_ll2_cb_ops, qedi); qedi_ops->ll2->start(qedi->cdev, ¶ms); From ea7e0480a4b695d0aa6b3fa99bd658a003122113 Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Tue, 25 Sep 2018 15:51:26 -0700 Subject: [PATCH 021/134] MIPS: VDSO: Always map near top of user memory When using the legacy mmap layout, for example triggered using ulimit -s unlimited, get_unmapped_area() fills memory from bottom to top starting from a fairly low address near TASK_UNMAPPED_BASE. This placement is suboptimal if the user application wishes to allocate large amounts of heap memory using the brk syscall. With the VDSO being located low in the user's virtual address space, the amount of space available for access using brk is limited much more than it was prior to the introduction of the VDSO. For example: # ulimit -s unlimited; cat /proc/self/maps 00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils 004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils 004fd000-0050f000 rwxp 00000000 00:00 0 00cc3000-00ce4000 rwxp 00000000 00:00 0 [heap] 2ab96000-2ab98000 r--p 00000000 00:00 0 [vvar] 2ab98000-2ab99000 r-xp 00000000 00:00 0 [vdso] 2ab99000-2ab9d000 rwxp 00000000 00:00 0 ... Resolve this by adjusting STACK_TOP to reserve space for the VDSO & providing an address hint to get_unmapped_area() causing it to use this space even when using the legacy mmap layout. We reserve enough space for the VDSO, plus 1MB or 256MB for 32 bit & 64 bit systems respectively within which we randomize the VDSO base address. Previously this randomization was taken care of by the mmap base address randomization performed by arch_mmap_rnd(). The 1MB & 256MB sizes are somewhat arbitrary but chosen such that we have some randomization without taking up too much of the user's virtual address space, which is often in short supply for 32 bit systems. With this the VDSO is always mapped at a high address, leaving lots of space for statically linked programs to make use of brk: # ulimit -s unlimited; cat /proc/self/maps 00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils 004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils 004fd000-0050f000 rwxp 00000000 00:00 0 00c28000-00c49000 rwxp 00000000 00:00 0 [heap] ... 7f67c000-7f69d000 rwxp 00000000 00:00 0 [stack] 7f7fc000-7f7fd000 rwxp 00000000 00:00 0 7fcf1000-7fcf3000 r--p 00000000 00:00 0 [vvar] 7fcf3000-7fcf4000 r-xp 00000000 00:00 0 [vdso] Signed-off-by: Paul Burton Reported-by: Huacai Chen Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Cc: Huacai Chen Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.4+ --- arch/mips/include/asm/processor.h | 10 +++++----- arch/mips/kernel/process.c | 25 +++++++++++++++++++++++++ arch/mips/kernel/vdso.c | 18 +++++++++++++++++- 3 files changed, 47 insertions(+), 6 deletions(-) diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index b2fa62922d88..49d6046ca1d0 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -80,11 +81,10 @@ extern unsigned int vced_count, vcei_count; #endif -/* - * One page above the stack is used for branch delay slot "emulation". - * See dsemul.c for details. - */ -#define STACK_TOP ((TASK_SIZE & PAGE_MASK) - PAGE_SIZE) +#define VDSO_RANDOMIZE_SIZE (TASK_IS_32BIT_ADDR ? SZ_1M : SZ_256M) + +extern unsigned long mips_stack_top(void); +#define STACK_TOP mips_stack_top() /* * This decides where the kernel will search for a free chunk of vm diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c index 8fc69891e117..d4f7fd4550e1 100644 --- a/arch/mips/kernel/process.c +++ b/arch/mips/kernel/process.c @@ -32,6 +32,7 @@ #include #include +#include #include #include #include @@ -39,6 +40,7 @@ #include #include #include +#include #include #include #include @@ -645,6 +647,29 @@ out: return pc; } +unsigned long mips_stack_top(void) +{ + unsigned long top = TASK_SIZE & PAGE_MASK; + + /* One page for branch delay slot "emulation" */ + top -= PAGE_SIZE; + + /* Space for the VDSO, data page & GIC user page */ + top -= PAGE_ALIGN(current->thread.abi->vdso->size); + top -= PAGE_SIZE; + top -= mips_gic_present() ? PAGE_SIZE : 0; + + /* Space for cache colour alignment */ + if (cpu_has_dc_aliases) + top -= shm_align_mask + 1; + + /* Space to randomize the VDSO base */ + if (current->flags & PF_RANDOMIZE) + top -= VDSO_RANDOMIZE_SIZE; + + return top; +} + /* * Don't forget that the stack pointer must be aligned on a 8 bytes * boundary for 32-bits ABI and 16 bytes for 64-bits ABI. diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index 8f845f6e5f42..48a9c6b90e07 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -97,6 +98,21 @@ void update_vsyscall_tz(void) } } +static unsigned long vdso_base(void) +{ + unsigned long base; + + /* Skip the delay slot emulation page */ + base = STACK_TOP + PAGE_SIZE; + + if (current->flags & PF_RANDOMIZE) { + base += get_random_int() & (VDSO_RANDOMIZE_SIZE - 1); + base = PAGE_ALIGN(base); + } + + return base; +} + int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) { struct mips_vdso_image *image = current->thread.abi->vdso; @@ -137,7 +153,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) if (cpu_has_dc_aliases) size += shm_align_mask + 1; - base = get_unmapped_area(NULL, 0, size, 0, 0); + base = get_unmapped_area(NULL, vdso_base(), size, 0, 0); if (IS_ERR_VALUE(base)) { ret = base; goto out; From 951d223c6c16ed5d2a71a4d1f13c1e65d6882156 Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Thu, 27 Sep 2018 22:59:18 +0000 Subject: [PATCH 022/134] MIPS: Fix CONFIG_CMDLINE handling Commit 8ce355cf2e38 ("MIPS: Setup boot_command_line before plat_mem_setup") fixed a problem for systems which have CONFIG_CMDLINE_BOOL=y & use a DT with a chosen node that has either no bootargs property or an empty one. In this configuration early_init_dt_scan_chosen() copies CONFIG_CMDLINE into boot_command_line, but the MIPS code doesn't know this so it appends CONFIG_CMDLINE (via builtin_cmdline) to boot_command_line again. The result is that boot_command_line contains the arguments from CONFIG_CMDLINE twice. That commit took the approach of simply setting up boot_command_line from the MIPS code before early_init_dt_scan_chosen() runs, causing it not to copy CONFIG_CMDLINE to boot_command_line if a chosen node with no bootargs property is found. Unfortunately this is problematic for systems which do have a non-empty bootargs property & CONFIG_CMDLINE_BOOL=y. There early_init_dt_scan_chosen() will overwrite boot_command_line with the arguments from DT, which means we lose those from CONFIG_CMDLINE entirely. This breaks CONFIG_MIPS_CMDLINE_DTB_EXTEND. If we have CONFIG_MIPS_CMDLINE_FROM_BOOTLOADER or CONFIG_MIPS_CMDLINE_BUILTIN_EXTEND selected and the DT has a bootargs property which we should ignore, it will instead be honoured breaking those configurations too. Fix this by reverting commit 8ce355cf2e38 ("MIPS: Setup boot_command_line before plat_mem_setup") to restore the former behaviour, and fixing the CONFIG_CMDLINE duplication issue by initializing boot_command_line to a non-empty string that early_init_dt_scan_chosen() will not overwrite with CONFIG_CMDLINE. This is a little ugly, but cleanup in this area is on its way. In the meantime this is at least easy to backport & contains the ugliness within arch/mips/. Signed-off-by: Paul Burton Fixes: 8ce355cf2e38 ("MIPS: Setup boot_command_line before plat_mem_setup") References: https://patchwork.linux-mips.org/patch/18804/ Patchwork: https://patchwork.linux-mips.org/patch/20813/ Cc: Frank Rowand Cc: Jaedon Shin Cc: Mathieu Malaterre Cc: Rob Herring Cc: devicetree@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.16+ --- arch/mips/kernel/setup.c | 48 +++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index c71d1eb7da59..8aaaa42f91ed 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -846,6 +846,34 @@ static void __init arch_mem_init(char **cmdline_p) struct memblock_region *reg; extern void plat_mem_setup(void); + /* + * Initialize boot_command_line to an innocuous but non-empty string in + * order to prevent early_init_dt_scan_chosen() from copying + * CONFIG_CMDLINE into it without our knowledge. We handle + * CONFIG_CMDLINE ourselves below & don't want to duplicate its + * content because repeating arguments can be problematic. + */ + strlcpy(boot_command_line, " ", COMMAND_LINE_SIZE); + + /* call board setup routine */ + plat_mem_setup(); + + /* + * Make sure all kernel memory is in the maps. The "UP" and + * "DOWN" are opposite for initdata since if it crosses over + * into another memory section you don't want that to be + * freed when the initdata is freed. + */ + arch_mem_addpart(PFN_DOWN(__pa_symbol(&_text)) << PAGE_SHIFT, + PFN_UP(__pa_symbol(&_edata)) << PAGE_SHIFT, + BOOT_MEM_RAM); + arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT, + PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT, + BOOT_MEM_INIT_RAM); + + pr_info("Determined physical RAM map:\n"); + print_memory_map(); + #if defined(CONFIG_CMDLINE_BOOL) && defined(CONFIG_CMDLINE_OVERRIDE) strlcpy(boot_command_line, builtin_cmdline, COMMAND_LINE_SIZE); #else @@ -873,26 +901,6 @@ static void __init arch_mem_init(char **cmdline_p) } #endif #endif - - /* call board setup routine */ - plat_mem_setup(); - - /* - * Make sure all kernel memory is in the maps. The "UP" and - * "DOWN" are opposite for initdata since if it crosses over - * into another memory section you don't want that to be - * freed when the initdata is freed. - */ - arch_mem_addpart(PFN_DOWN(__pa_symbol(&_text)) << PAGE_SHIFT, - PFN_UP(__pa_symbol(&_edata)) << PAGE_SHIFT, - BOOT_MEM_RAM); - arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT, - PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT, - BOOT_MEM_INIT_RAM); - - pr_info("Determined physical RAM map:\n"); - print_memory_map(); - strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE); *cmdline_p = command_line; From 41e270f6898e7502be9fd6920ee0a108ca259d36 Mon Sep 17 00:00:00 2001 From: Dexuan Cui Date: Mon, 17 Sep 2018 04:14:54 +0000 Subject: [PATCH 023/134] Drivers: hv: vmbus: Use get/put_cpu() in vmbus_connect() With CONFIG_DEBUG_PREEMPT=y, I always see this warning: BUG: using smp_processor_id() in preemptible [00000000] Fix the false warning by using get/put_cpu(). Here vmbus_connect() sends a message to the host and waits for the host's response. The host will deliver the response message and an interrupt on CPU msg->target_vcpu, and later the interrupt handler will wake up vmbus_connect(). vmbus_connect() doesn't really have to run on the same cpu as CPU msg->target_vcpu, so it's safe to call put_cpu() just here. Signed-off-by: Dexuan Cui Cc: stable@vger.kernel.org Cc: K. Y. Srinivasan Cc: Haiyang Zhang Cc: Stephen Hemminger Signed-off-by: K. Y. Srinivasan Signed-off-by: Greg Kroah-Hartman --- drivers/hv/connection.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index ced041899456..f4d08c8ac7f8 100644 --- a/drivers/hv/connection.c +++ b/drivers/hv/connection.c @@ -76,6 +76,7 @@ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, __u32 version) { int ret = 0; + unsigned int cur_cpu; struct vmbus_channel_initiate_contact *msg; unsigned long flags; @@ -118,9 +119,10 @@ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, * the CPU attempting to connect may not be CPU 0. */ if (version >= VERSION_WIN8_1) { - msg->target_vcpu = - hv_cpu_number_to_vp_number(smp_processor_id()); - vmbus_connection.connect_cpu = smp_processor_id(); + cur_cpu = get_cpu(); + msg->target_vcpu = hv_cpu_number_to_vp_number(cur_cpu); + vmbus_connection.connect_cpu = cur_cpu; + put_cpu(); } else { msg->target_vcpu = 0; vmbus_connection.connect_cpu = 0; From 34bd283396af1f05c43fa64d51378d38affb1cf1 Mon Sep 17 00:00:00 2001 From: Alan Tull Date: Wed, 12 Sep 2018 09:43:23 -0500 Subject: [PATCH 024/134] fpga: do not access region struct after fpga_region_unregister A couple drivers were accessing the region struct after it had been freed. Save off the pointer to the mgr before the region struct gets freed. Signed-off-by: Alan Tull Acked-by: Moritz Fischer Signed-off-by: Greg Kroah-Hartman --- drivers/fpga/dfl-fme-region.c | 4 +++- drivers/fpga/of-fpga-region.c | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/fpga/dfl-fme-region.c b/drivers/fpga/dfl-fme-region.c index 0b7e19c27c6d..51a5ac2293a7 100644 --- a/drivers/fpga/dfl-fme-region.c +++ b/drivers/fpga/dfl-fme-region.c @@ -14,6 +14,7 @@ */ #include +#include #include #include "dfl-fme-pr.h" @@ -66,9 +67,10 @@ eprobe_mgr_put: static int fme_region_remove(struct platform_device *pdev) { struct fpga_region *region = dev_get_drvdata(&pdev->dev); + struct fpga_manager *mgr = region->mgr; fpga_region_unregister(region); - fpga_mgr_put(region->mgr); + fpga_mgr_put(mgr); return 0; } diff --git a/drivers/fpga/of-fpga-region.c b/drivers/fpga/of-fpga-region.c index 35fabb8083fb..052a1342ab7e 100644 --- a/drivers/fpga/of-fpga-region.c +++ b/drivers/fpga/of-fpga-region.c @@ -437,9 +437,10 @@ eprobe_mgr_put: static int of_fpga_region_remove(struct platform_device *pdev) { struct fpga_region *region = platform_get_drvdata(pdev); + struct fpga_manager *mgr = region->mgr; fpga_region_unregister(region); - fpga_mgr_put(region->mgr); + fpga_mgr_put(mgr); return 0; } From c2d68afba86d1ff01e7300c68bc16a9234dcd8e9 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Mon, 17 Sep 2018 04:14:55 +0000 Subject: [PATCH 025/134] tools: hv: fcopy: set 'error' in case an unknown operation was requested 'error' variable is left uninitialized in case we see an unknown operation. As we don't immediately return and proceed to pwrite() we need to set it to something, HV_E_FAIL sounds good enough. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Cc: stable Signed-off-by: Greg Kroah-Hartman --- tools/hv/hv_fcopy_daemon.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/hv/hv_fcopy_daemon.c b/tools/hv/hv_fcopy_daemon.c index d78aed86af09..8ff8cb1a11f4 100644 --- a/tools/hv/hv_fcopy_daemon.c +++ b/tools/hv/hv_fcopy_daemon.c @@ -234,6 +234,7 @@ int main(int argc, char *argv[]) break; default: + error = HV_E_FAIL; syslog(LOG_ERR, "Unknown operation: %d", buffer.hdr.operation); From b4d9a0e5ca139640a45bcda87bc718f85d8008af Mon Sep 17 00:00:00 2001 From: Alan Tull Date: Wed, 12 Sep 2018 09:43:24 -0500 Subject: [PATCH 026/134] fpga: bridge: fix obvious function documentation error fpga_bridge_dev_match() returns a FPGA bridge struct, not a FPGA manager struct so s/manager/bridge/. Signed-off-by: Alan Tull Acked-by: Moritz Fischer Signed-off-by: Greg Kroah-Hartman --- drivers/fpga/fpga-bridge.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/fpga/fpga-bridge.c b/drivers/fpga/fpga-bridge.c index 24b8f98b73ec..c983dac97501 100644 --- a/drivers/fpga/fpga-bridge.c +++ b/drivers/fpga/fpga-bridge.c @@ -125,7 +125,7 @@ static int fpga_bridge_dev_match(struct device *dev, const void *data) * * Given a device, get an exclusive reference to a fpga bridge. * - * Return: fpga manager struct or IS_ERR() condition containing error code. + * Return: fpga bridge struct or IS_ERR() condition containing error code. */ struct fpga_bridge *fpga_bridge_get(struct device *dev, struct fpga_image_info *info) From 492ecf6d6598e7c3b0b1372653ed21da50ddfb09 Mon Sep 17 00:00:00 2001 From: Alan Tull Date: Wed, 12 Sep 2018 09:43:25 -0500 Subject: [PATCH 027/134] docs: fpga: document fpga manager flags Add flags #defines to kerneldoc documentation in a useful place. Signed-off-by: Alan Tull Acked-by: Moritz Fischer Signed-off-by: Greg Kroah-Hartman --- Documentation/driver-api/fpga/fpga-mgr.rst | 5 +++++ include/linux/fpga/fpga-mgr.h | 20 ++++++++++++++------ 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/Documentation/driver-api/fpga/fpga-mgr.rst b/Documentation/driver-api/fpga/fpga-mgr.rst index 4b3825da48d9..82b6dbbd31cd 100644 --- a/Documentation/driver-api/fpga/fpga-mgr.rst +++ b/Documentation/driver-api/fpga/fpga-mgr.rst @@ -184,6 +184,11 @@ API for implementing a new FPGA Manager driver API for programming an FPGA --------------------------- +FPGA Manager flags + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :doc: FPGA Manager flags + .. kernel-doc:: include/linux/fpga/fpga-mgr.h :functions: fpga_image_info diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h index 8942e61f0028..8ab5df769923 100644 --- a/include/linux/fpga/fpga-mgr.h +++ b/include/linux/fpga/fpga-mgr.h @@ -53,12 +53,20 @@ enum fpga_mgr_states { FPGA_MGR_STATE_OPERATING, }; -/* - * FPGA Manager flags - * FPGA_MGR_PARTIAL_RECONFIG: do partial reconfiguration if supported - * FPGA_MGR_EXTERNAL_CONFIG: FPGA has been configured prior to Linux booting - * FPGA_MGR_BITSTREAM_LSB_FIRST: SPI bitstream bit order is LSB first - * FPGA_MGR_COMPRESSED_BITSTREAM: FPGA bitstream is compressed +/** + * DOC: FPGA Manager flags + * + * Flags used in the &fpga_image_info->flags field + * + * %FPGA_MGR_PARTIAL_RECONFIG: do partial reconfiguration if supported + * + * %FPGA_MGR_EXTERNAL_CONFIG: FPGA has been configured prior to Linux booting + * + * %FPGA_MGR_ENCRYPTED_BITSTREAM: indicates bitstream is encrypted + * + * %FPGA_MGR_BITSTREAM_LSB_FIRST: SPI bitstream bit order is LSB first + * + * %FPGA_MGR_COMPRESSED_BITSTREAM: FPGA bitstream is compressed */ #define FPGA_MGR_PARTIAL_RECONFIG BIT(0) #define FPGA_MGR_EXTERNAL_CONFIG BIT(1) From 7012040576c6ae25a47035659ee48673612c2c27 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Wed, 19 Sep 2018 18:09:38 -0700 Subject: [PATCH 028/134] firmware: Always initialize the fw_priv list object When freeing the fw_priv the item is taken off the list. This causes an oops in the FW_OPT_NOCACHE case as the list object is not initialized. Make sure to initialize the list object regardless of this flag. Fixes: 422b3db2a503 ("firmware: Fix security issue with request_firmware_into_buf()") Cc: stable@vger.kernel.org Cc: Rishabh Bhatnagar Signed-off-by: Bjorn Andersson Reviewed-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/base/firmware_loader/main.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c index b3c0498ee433..8e9213b36e31 100644 --- a/drivers/base/firmware_loader/main.c +++ b/drivers/base/firmware_loader/main.c @@ -226,8 +226,11 @@ static int alloc_lookup_fw_priv(const char *fw_name, } tmp = __allocate_fw_priv(fw_name, fwc, dbuf, size); - if (tmp && !(opt_flags & FW_OPT_NOCACHE)) - list_add(&tmp->list, &fwc->head); + if (tmp) { + INIT_LIST_HEAD(&tmp->list); + if (!(opt_flags & FW_OPT_NOCACHE)) + list_add(&tmp->list, &fwc->head); + } spin_unlock(&fwc->lock); *fw_priv = tmp; From 08d9db00fe0e300d6df976e6c294f974988226dd Mon Sep 17 00:00:00 2001 From: Edgar Cherkasov Date: Thu, 27 Sep 2018 11:56:03 +0300 Subject: [PATCH 029/134] i2c: i2c-scmi: fix for i2c_smbus_write_block_data The i2c-scmi driver crashes when the SMBus Write Block transaction is executed: WARNING: CPU: 9 PID: 2194 at mm/page_alloc.c:3931 __alloc_pages_slowpath+0x9db/0xec0 Call Trace: ? get_page_from_freelist+0x49d/0x11f0 ? alloc_pages_current+0x6a/0xe0 ? new_slab+0x499/0x690 __alloc_pages_nodemask+0x265/0x280 alloc_pages_current+0x6a/0xe0 kmalloc_order+0x18/0x40 kmalloc_order_trace+0x24/0xb0 ? acpi_ut_allocate_object_desc_dbg+0x62/0x10c __kmalloc+0x203/0x220 acpi_os_allocate_zeroed+0x34/0x36 acpi_ut_copy_eobject_to_iobject+0x266/0x31e acpi_evaluate_object+0x166/0x3b2 acpi_smbus_cmi_access+0x144/0x530 [i2c_scmi] i2c_smbus_xfer+0xda/0x370 i2cdev_ioctl_smbus+0x1bd/0x270 i2cdev_ioctl+0xaa/0x250 do_vfs_ioctl+0xa4/0x600 SyS_ioctl+0x79/0x90 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 ACPI Error: Evaluating _SBW: 4 (20170831/smbus_cmi-185) This problem occurs because the length of ACPI Buffer object is not defined/initialized in the code before a corresponding ACPI method is called. The obvious patch below fixes this issue. Signed-off-by: Edgar Cherkasov Acked-by: Viktor Krasnov Acked-by: Michael Brunner Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-scmi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/i2c/busses/i2c-scmi.c b/drivers/i2c/busses/i2c-scmi.c index a01389b85f13..7e9a2bbf5ddc 100644 --- a/drivers/i2c/busses/i2c-scmi.c +++ b/drivers/i2c/busses/i2c-scmi.c @@ -152,6 +152,7 @@ acpi_smbus_cmi_access(struct i2c_adapter *adap, u16 addr, unsigned short flags, mt_params[3].type = ACPI_TYPE_INTEGER; mt_params[3].integer.value = len; mt_params[4].type = ACPI_TYPE_BUFFER; + mt_params[4].buffer.length = len; mt_params[4].buffer.pointer = data->block + 1; } break; From e7e86f42fa68c70cab4e964017273da32a77af07 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Sun, 30 Sep 2018 09:21:34 -0700 Subject: [PATCH 030/134] MAINTAINERS: MIPS/LOONGSON2 ARCHITECTURE - Use the normal wildcard style Neither git nor get_maintainer understands the curly brace style. Signed-off-by: Joe Perches Signed-off-by: Paul Burton Patchwork: https://patchwork.linux-mips.org/patch/20821/ Cc: Huacai Chen Cc: linux-mips Cc: LKML --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 02a39617ec82..ba90171786d6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9658,7 +9658,8 @@ MIPS/LOONGSON2 ARCHITECTURE M: Jiaxun Yang L: linux-mips@linux-mips.org S: Maintained -F: arch/mips/loongson64/*{2e/2f}* +F: arch/mips/loongson64/fuloong-2e/ +F: arch/mips/loongson64/lemote-2f/ F: arch/mips/include/asm/mach-loongson64/ F: drivers/*/*loongson2* F: drivers/*/*/*loongson2* From 242cdad873a75652f97c35aad61270581e0f3a2e Mon Sep 17 00:00:00 2001 From: Joel Stanley Date: Fri, 21 Sep 2018 12:24:31 +0930 Subject: [PATCH 031/134] lib/xz: Put CRC32_POLY_LE in xz_private.h This fixes a regression introduced by faa16bc404d72a5 ("lib: Use existing define with polynomial"). The cleanup added a dependency on include/linux, which broke the PowerPC boot wrapper/decompresser when KERNEL_XZ is enabled: BOOTCC arch/powerpc/boot/decompress.o In file included from arch/powerpc/boot/../../../lib/decompress_unxz.c:233, from arch/powerpc/boot/decompress.c:42: arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error: linux/crc32poly.h: No such file or directory #include ^~~~~~~~~~~~~~~~~~~ The powerpc decompresser is a hairy corner of the kernel. Even while building a 64-bit kernel it needs to build a 32-bit binary and therefore avoid including files from include/linux. This allows users of the xz library to avoid including headers from 'include/linux/' while still achieving the cleanup of the magic number. Fixes: faa16bc404d72a5 ("lib: Use existing define with polynomial") Reported-by: Meelis Roos Reported-by: kbuild test robot Suggested-by: Christophe LEROY Signed-off-by: Joel Stanley Tested-by: Meelis Roos Signed-off-by: Michael Ellerman --- lib/xz/xz_crc32.c | 1 - lib/xz/xz_private.h | 4 ++++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/lib/xz/xz_crc32.c b/lib/xz/xz_crc32.c index 25a5d87e2e4c..912aae5fa09e 100644 --- a/lib/xz/xz_crc32.c +++ b/lib/xz/xz_crc32.c @@ -15,7 +15,6 @@ * but they are bigger and use more memory for the lookup table. */ -#include #include "xz_private.h" /* diff --git a/lib/xz/xz_private.h b/lib/xz/xz_private.h index 482b90f363fe..09360ebb510e 100644 --- a/lib/xz/xz_private.h +++ b/lib/xz/xz_private.h @@ -102,6 +102,10 @@ # endif #endif +#ifndef CRC32_POLY_LE +#define CRC32_POLY_LE 0xedb88320 +#endif + /* * Allocate memory for LZMA2 decoder. xz_dec_lzma2_reset() must be used * before calling xz_dec_lzma2_run(). From 5a1eb8b9542884592a018829bb1ff20c9695d925 Mon Sep 17 00:00:00 2001 From: Laurentiu Tudor Date: Wed, 26 Sep 2018 16:22:31 +0300 Subject: [PATCH 032/134] soc: fsl: qman_portals: defer probe after qman's probe Defer probe of qman portals after qman probing. This fixes the crash below, seen on NXP LS1043A SoCs: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [0000000000000004] user address but active_mm is swapper Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc1-next-20180622-00200-g986f5c179185 #9 Hardware name: LS1043A RDB Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO) pc : qman_set_sdest+0x74/0xa0 lr : qman_portal_probe+0x22c/0x470 sp : ffff00000803bbc0 x29: ffff00000803bbc0 x28: 0000000000000000 x27: ffff0000090c1b88 x26: ffff00000927cb68 x25: ffff00000927c000 x24: ffff00000927cb60 x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000090e9000 x20: ffff800073b5c810 x19: ffff800027401298 x18: ffffffffffffffff x17: 0000000000000001 x16: 0000000000000000 x15: ffff0000090e96c8 x14: ffff80002740138a x13: ffff0000090f2000 x12: 0000000000000030 x11: ffff000008f25000 x10: 0000000000000000 x9 : ffff80007bdfd2c0 x8 : 0000000000004000 x7 : ffff80007393cc18 x6 : 0040000000000001 x5 : 0000000000000000 x4 : ffffffffffffffff x3 : 0000000000000004 x2 : ffff00000927c900 x1 : 0000000000000000 x0 : 0000000000000004 Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____)) Call trace: qman_set_sdest+0x74/0xa0 platform_drv_probe+0x50/0xa8 driver_probe_device+0x214/0x2f8 __driver_attach+0xd8/0xe0 bus_for_each_dev+0x68/0xc8 driver_attach+0x20/0x28 bus_add_driver+0x108/0x228 driver_register+0x60/0x110 __platform_driver_register+0x40/0x48 qman_portal_driver_init+0x20/0x84 do_one_initcall+0x58/0x168 kernel_init_freeable+0x184/0x22c kernel_init+0x10/0x108 ret_from_fork+0x10/0x18 Code: f9400443 11001000 927e4800 8b000063 (b9400063) ---[ end trace 4f6d50489ecfb930 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Signed-off-by: Laurentiu Tudor Signed-off-by: Li Yang --- drivers/soc/fsl/qbman/qman_portal.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c index a120002b630e..3e9391d117c5 100644 --- a/drivers/soc/fsl/qbman/qman_portal.c +++ b/drivers/soc/fsl/qbman/qman_portal.c @@ -227,6 +227,14 @@ static int qman_portal_probe(struct platform_device *pdev) int irq, cpu, err; u32 val; + err = qman_is_probed(); + if (!err) + return -EPROBE_DEFER; + if (err < 0) { + dev_err(&pdev->dev, "failing probe due to qman probe error\n"); + return -ENODEV; + } + pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL); if (!pcfg) return -ENOMEM; From 9735082a7cbae572c2eabdc45acecc8c9fa0759b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ramses=20Ram=C3=ADrez?= Date: Fri, 28 Sep 2018 16:59:26 -0700 Subject: [PATCH 033/134] Input: xpad - add support for Xbox1 PDP Camo series gamepad MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The "Xbox One PDP Wired Controller - Camo series" has a different product-id than the regular PDP controller and the PDP stealth series, but it uses the same initialization sequence. This patch adds the product-id of the camo series to the structures that handle the other PDP Xbox One controllers. Signed-off-by: Ramses Ramírez Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov --- drivers/input/joystick/xpad.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/input/joystick/xpad.c b/drivers/input/joystick/xpad.c index cd620e009bad..d4b9db487b16 100644 --- a/drivers/input/joystick/xpad.c +++ b/drivers/input/joystick/xpad.c @@ -231,6 +231,7 @@ static const struct xpad_device { { 0x0e6f, 0x0246, "Rock Candy Gamepad for Xbox One 2015", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02ab, "PDP Controller for Xbox One", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a4, "PDP Wired Controller for Xbox One - Stealth Series", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a6, "PDP Wired Controller for Xbox One - Camo Series", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0301, "Logic3 Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0346, "Rock Candy Gamepad for Xbox One 2016", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0401, "Logic3 Controller", 0, XTYPE_XBOX360 }, @@ -530,6 +531,8 @@ static const struct xboxone_init_packet xboxone_init_packets[] = { XBOXONE_INIT_PKT(0x0e6f, 0x02ab, xboxone_pdp_init2), XBOXONE_INIT_PKT(0x0e6f, 0x02a4, xboxone_pdp_init1), XBOXONE_INIT_PKT(0x0e6f, 0x02a4, xboxone_pdp_init2), + XBOXONE_INIT_PKT(0x0e6f, 0x02a6, xboxone_pdp_init1), + XBOXONE_INIT_PKT(0x0e6f, 0x02a6, xboxone_pdp_init2), XBOXONE_INIT_PKT(0x24c6, 0x541a, xboxone_rumblebegin_init), XBOXONE_INIT_PKT(0x24c6, 0x542a, xboxone_rumblebegin_init), XBOXONE_INIT_PKT(0x24c6, 0x543a, xboxone_rumblebegin_init), From 684bec1092b6991ff2a7751e8a763898576eb5c2 Mon Sep 17 00:00:00 2001 From: Daniel Drake Date: Mon, 1 Oct 2018 15:55:22 -0700 Subject: [PATCH 034/134] Input: i8042 - enable keyboard wakeups by default when s2idle is used Previously, on typical consumer laptops, pressing a key on the keyboard when the system is in suspend would cause it to wake up (default or unconditional behaviour). This happens because the EC generates a SCI interrupt in this scenario. That is no longer true on modern laptops based on Intel WhiskeyLake, including Acer Swift SF314-55G, Asus UX333FA, Asus UX433FN and Asus UX533FD. We confirmed with Asus EC engineers that the "Modern Standby" design has been modified so that the EC no longer generates a SCI in this case; the keyboard controller itself should be used for wakeup. In order to retain the standard behaviour of being able to use the keyboard to wake up the system, enable serio wakeups by default on platforms that are using s2idle. Link: https://lkml.kernel.org/r/CAB4CAwfQ0mPMqCLp95TVjw4J0r5zKPWkSvvkK4cpZUGE--w8bQ@mail.gmail.com Reviewed-by: Rafael J. Wysocki Signed-off-by: Daniel Drake Signed-off-by: Dmitry Torokhov --- drivers/input/serio/i8042.c | 29 ++++++++++++++++++++--------- include/linux/suspend.h | 2 ++ kernel/power/suspend.c | 6 ++++++ 3 files changed, 28 insertions(+), 9 deletions(-) diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c index b8bc71569349..95a78ccbd847 100644 --- a/drivers/input/serio/i8042.c +++ b/drivers/input/serio/i8042.c @@ -1395,15 +1395,26 @@ static void __init i8042_register_ports(void) for (i = 0; i < I8042_NUM_PORTS; i++) { struct serio *serio = i8042_ports[i].serio; - if (serio) { - printk(KERN_INFO "serio: %s at %#lx,%#lx irq %d\n", - serio->name, - (unsigned long) I8042_DATA_REG, - (unsigned long) I8042_COMMAND_REG, - i8042_ports[i].irq); - serio_register_port(serio); - device_set_wakeup_capable(&serio->dev, true); - } + if (!serio) + continue; + + printk(KERN_INFO "serio: %s at %#lx,%#lx irq %d\n", + serio->name, + (unsigned long) I8042_DATA_REG, + (unsigned long) I8042_COMMAND_REG, + i8042_ports[i].irq); + serio_register_port(serio); + device_set_wakeup_capable(&serio->dev, true); + + /* + * On platforms using suspend-to-idle, allow the keyboard to + * wake up the system from sleep by enabling keyboard wakeups + * by default. This is consistent with keyboard wakeup + * behavior on many platforms using suspend-to-RAM (ACPI S3) + * by default. + */ + if (pm_suspend_via_s2idle() && i == I8042_KBD_PORT_NO) + device_set_wakeup_enable(&serio->dev, true); } } diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 440b62f7502e..206b735f383f 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -251,6 +251,7 @@ static inline bool idle_should_enter_s2idle(void) return unlikely(s2idle_state == S2IDLE_STATE_ENTER); } +extern bool pm_suspend_via_s2idle(void); extern void __init pm_states_init(void); extern void s2idle_set_ops(const struct platform_s2idle_ops *ops); extern void s2idle_wake(void); @@ -282,6 +283,7 @@ static inline void pm_set_suspend_via_firmware(void) {} static inline void pm_set_resume_via_firmware(void) {} static inline bool pm_suspend_via_firmware(void) { return false; } static inline bool pm_resume_via_firmware(void) { return false; } +static inline bool pm_suspend_via_s2idle(void) { return false; } static inline void suspend_set_ops(const struct platform_suspend_ops *ops) {} static inline int pm_suspend(suspend_state_t state) { return -ENOSYS; } diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 4c10be0f4843..be3d0d477661 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -62,6 +62,12 @@ static DECLARE_WAIT_QUEUE_HEAD(s2idle_wait_head); enum s2idle_states __read_mostly s2idle_state; static DEFINE_SPINLOCK(s2idle_lock); +bool pm_suspend_via_s2idle(void) +{ + return mem_sleep_current == PM_SUSPEND_TO_IDLE; +} +EXPORT_SYMBOL_GPL(pm_suspend_via_s2idle); + void s2idle_set_ops(const struct platform_s2idle_ops *ops) { lock_system_sleep(); From f2924d4b16ae138c2de6a0e73f526fb638330858 Mon Sep 17 00:00:00 2001 From: Romain Izard Date: Thu, 20 Sep 2018 16:49:04 +0200 Subject: [PATCH 035/134] usb: cdc_acm: Do not leak URB buffers When the ACM TTY port is disconnected, the URBs it uses must be killed, and then the buffers must be freed. Unfortunately a previous refactor removed the code freeing the buffers because it looked extremely similar to the code killing the URBs. As a result, there were many new leaks for each plug/unplug cycle of a CDC-ACM device, that were detected by kmemleak. Restore the missing code, and the memory leak is removed. Fixes: ba8c931ded8d ("cdc-acm: refactor killing urbs") Signed-off-by: Romain Izard Acked-by: Oliver Neukum Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-acm.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index f9b40a9dc4d3..bc03b0a690b4 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1514,6 +1514,7 @@ static void acm_disconnect(struct usb_interface *intf) { struct acm *acm = usb_get_intfdata(intf); struct tty_struct *tty; + int i; /* sibling interface is already cleaning up */ if (!acm) @@ -1544,6 +1545,11 @@ static void acm_disconnect(struct usb_interface *intf) tty_unregister_device(acm_tty_driver, acm->minor); + usb_free_urb(acm->ctrlurb); + for (i = 0; i < ACM_NW; i++) + usb_free_urb(acm->wb[i].urb); + for (i = 0; i < acm->rx_buflimit; i++) + usb_free_urb(acm->read_urbs[i]); acm_write_buffers_free(acm); usb_free_coherent(acm->dev, acm->ctrlsize, acm->ctrl_buffer, acm->ctrl_dma); acm_read_buffers_free(acm); From ffe84e01bb1b38c7eb9c6b6da127a6c136d251df Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Mon, 1 Oct 2018 18:36:07 +0300 Subject: [PATCH 036/134] xhci: Add missing CAS workaround for Intel Sunrise Point xHCI The workaround for missing CAS bit is also needed for xHC on Intel sunrisepoint PCH. For more details see: Intel 100/c230 series PCH specification update Doc #332692-006 Errata #8 Cc: Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-pci.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 6372edf339d9..722860eb5a91 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -185,6 +185,8 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) } if (pdev->vendor == PCI_VENDOR_ID_INTEL && (pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI || + pdev->device == PCI_DEVICE_ID_INTEL_SUNRISEPOINT_LP_XHCI || + pdev->device == PCI_DEVICE_ID_INTEL_SUNRISEPOINT_H_XHCI || pdev->device == PCI_DEVICE_ID_INTEL_APL_XHCI || pdev->device == PCI_DEVICE_ID_INTEL_DNV_XHCI)) xhci->quirks |= XHCI_MISSING_CAS; From 555df5820e733cded7eb8d0bf78b2a791be51d75 Mon Sep 17 00:00:00 2001 From: Chunfeng Yun Date: Mon, 1 Oct 2018 18:36:08 +0300 Subject: [PATCH 037/134] usb: xhci-mtk: resume USB3 roothub first Give USB3 devices a better chance to enumerate at USB3 speeds if they are connected to a suspended host. Porting from "671ffdff5b13 xhci: resume USB 3 roothub first" Cc: Signed-off-by: Chunfeng Yun Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mtk.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c index 7334da9e9779..71d0d33c3286 100644 --- a/drivers/usb/host/xhci-mtk.c +++ b/drivers/usb/host/xhci-mtk.c @@ -642,10 +642,10 @@ static int __maybe_unused xhci_mtk_resume(struct device *dev) xhci_mtk_host_enable(mtk); xhci_dbg(xhci, "%s: restart port polling\n", __func__); - set_bit(HCD_FLAG_POLL_RH, &hcd->flags); - usb_hcd_poll_rh_status(hcd); set_bit(HCD_FLAG_POLL_RH, &xhci->shared_hcd->flags); usb_hcd_poll_rh_status(xhci->shared_hcd); + set_bit(HCD_FLAG_POLL_RH, &hcd->flags); + usb_hcd_poll_rh_status(hcd); return 0; } From 24abf2901b18bf941b9f21ea2ce5791f61097ae4 Mon Sep 17 00:00:00 2001 From: Eric Farman Date: Tue, 2 Oct 2018 03:02:35 +0200 Subject: [PATCH 038/134] s390/cio: Fix how vfio-ccw checks pinned pages We have two nested loops to check the entries within the pfn_array_table arrays. But we mistakenly use the outer array as an index in our check, and completely ignore the indexing performed by the inner loop. Cc: stable@vger.kernel.org Signed-off-by: Eric Farman Message-Id: <20181002010235.42483-1-farman@linux.ibm.com> Signed-off-by: Cornelia Huck --- drivers/s390/cio/vfio_ccw_cp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c index dbe7c7ac9ac8..fd77e46eb3b2 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c +++ b/drivers/s390/cio/vfio_ccw_cp.c @@ -163,7 +163,7 @@ static bool pfn_array_table_iova_pinned(struct pfn_array_table *pat, for (i = 0; i < pat->pat_nr; i++, pa++) for (j = 0; j < pa->pa_nr; j++) - if (pa->pa_iova_pfn[i] == iova_pfn) + if (pa->pa_iova_pfn[j] == iova_pfn) return true; return false; From b45ba4a51cde29b2939365ef0c07ad34c8321789 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 1 Oct 2018 12:21:10 +0000 Subject: [PATCH 039/134] powerpc/lib: fix book3s/32 boot failure due to code patching Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") accesses 'init_mem_is_free' flag too early, before the kernel is relocated. This provokes early boot failure (before the console is active). As it is not necessary to do this verification that early, this patch moves the test into patch_instruction() instead of __patch_instruction(). This modification also has the advantage of avoiding unnecessary remappings. Fixes: 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") Cc: stable@vger.kernel.org # 4.13+ Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman --- arch/powerpc/lib/code-patching.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 6ae2777c220d..5ffee298745f 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -28,12 +28,6 @@ static int __patch_instruction(unsigned int *exec_addr, unsigned int instr, { int err; - /* Make sure we aren't patching a freed init section */ - if (init_mem_is_free && init_section_contains(exec_addr, 4)) { - pr_debug("Skipping init section patching addr: 0x%px\n", exec_addr); - return 0; - } - __put_user_size(instr, patch_addr, 4, err); if (err) return err; @@ -148,7 +142,7 @@ static inline int unmap_patch_area(unsigned long addr) return 0; } -int patch_instruction(unsigned int *addr, unsigned int instr) +static int do_patch_instruction(unsigned int *addr, unsigned int instr) { int err; unsigned int *patch_addr = NULL; @@ -188,12 +182,22 @@ out: } #else /* !CONFIG_STRICT_KERNEL_RWX */ -int patch_instruction(unsigned int *addr, unsigned int instr) +static int do_patch_instruction(unsigned int *addr, unsigned int instr) { return raw_patch_instruction(addr, instr); } #endif /* CONFIG_STRICT_KERNEL_RWX */ + +int patch_instruction(unsigned int *addr, unsigned int instr) +{ + /* Make sure we aren't patching a freed init section */ + if (init_mem_is_free && init_section_contains(addr, 4)) { + pr_debug("Skipping init section patching addr: 0x%px\n", addr); + return 0; + } + return do_patch_instruction(addr, instr); +} NOKPROBE_SYMBOL(patch_instruction); int patch_branch(unsigned int *addr, unsigned long target, int flags) From 86da809dda64a63fc27e05a215475325c3aaae92 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Mon, 24 Sep 2018 13:20:44 +0300 Subject: [PATCH 040/134] thunderbolt: Do not handle ICM events after domain is stopped If there is a long chain of devices connected when the driver is loaded ICM sends device connected event for each and those are put to tb->wq for later processing. Now if the driver gets unloaded in the middle, so that the work queue is not yet empty it gets flushed by tb_domain_stop(). However, by that time the root switch is already removed so the driver crashes when it tries to dereference it in ICM event handling callbacks. Fix this by checking whether the root switch is already removed. If it is we know that the domain is stopped and we should merely skip handling the event. Signed-off-by: Mika Westerberg Signed-off-by: Greg Kroah-Hartman --- drivers/thunderbolt/icm.c | 49 ++++++++++++++++----------------------- 1 file changed, 20 insertions(+), 29 deletions(-) diff --git a/drivers/thunderbolt/icm.c b/drivers/thunderbolt/icm.c index e1e264a9a4c7..28fc4ce75edb 100644 --- a/drivers/thunderbolt/icm.c +++ b/drivers/thunderbolt/icm.c @@ -738,14 +738,6 @@ icm_fr_xdomain_connected(struct tb *tb, const struct icm_pkg_header *hdr) u8 link, depth; u64 route; - /* - * After NVM upgrade adding root switch device fails because we - * initiated reset. During that time ICM might still send - * XDomain connected message which we ignore here. - */ - if (!tb->root_switch) - return; - link = pkg->link_info & ICM_LINK_INFO_LINK_MASK; depth = (pkg->link_info & ICM_LINK_INFO_DEPTH_MASK) >> ICM_LINK_INFO_DEPTH_SHIFT; @@ -1037,14 +1029,6 @@ icm_tr_device_connected(struct tb *tb, const struct icm_pkg_header *hdr) if (pkg->hdr.packet_id) return; - /* - * After NVM upgrade adding root switch device fails because we - * initiated reset. During that time ICM might still send device - * connected message which we ignore here. - */ - if (!tb->root_switch) - return; - route = get_route(pkg->route_hi, pkg->route_lo); authorized = pkg->link_info & ICM_LINK_INFO_APPROVED; security_level = (pkg->hdr.flags & ICM_FLAGS_SLEVEL_MASK) >> @@ -1408,19 +1392,26 @@ static void icm_handle_notification(struct work_struct *work) mutex_lock(&tb->lock); - switch (n->pkg->code) { - case ICM_EVENT_DEVICE_CONNECTED: - icm->device_connected(tb, n->pkg); - break; - case ICM_EVENT_DEVICE_DISCONNECTED: - icm->device_disconnected(tb, n->pkg); - break; - case ICM_EVENT_XDOMAIN_CONNECTED: - icm->xdomain_connected(tb, n->pkg); - break; - case ICM_EVENT_XDOMAIN_DISCONNECTED: - icm->xdomain_disconnected(tb, n->pkg); - break; + /* + * When the domain is stopped we flush its workqueue but before + * that the root switch is removed. In that case we should treat + * the queued events as being canceled. + */ + if (tb->root_switch) { + switch (n->pkg->code) { + case ICM_EVENT_DEVICE_CONNECTED: + icm->device_connected(tb, n->pkg); + break; + case ICM_EVENT_DEVICE_DISCONNECTED: + icm->device_disconnected(tb, n->pkg); + break; + case ICM_EVENT_XDOMAIN_CONNECTED: + icm->xdomain_connected(tb, n->pkg); + break; + case ICM_EVENT_XDOMAIN_DISCONNECTED: + icm->xdomain_disconnected(tb, n->pkg); + break; + } } mutex_unlock(&tb->lock); From eafa717bc145963c944bb0a64d16add683861b35 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Mon, 24 Sep 2018 13:20:45 +0300 Subject: [PATCH 041/134] thunderbolt: Initialize after IOMMUs If IOMMU is enabled and Thunderbolt driver is built into the kernel image, it will be probed before IOMMUs are attached to the PCI bus. Because of this DMA mappings the driver does will not go through IOMMU and start failing right after IOMMUs are enabled. For this reason move the Thunderbolt driver initialization happen at rootfs level. Signed-off-by: Mika Westerberg Signed-off-by: Greg Kroah-Hartman --- drivers/thunderbolt/nhi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c index 88cff05a1808..5cd6bdfa068f 100644 --- a/drivers/thunderbolt/nhi.c +++ b/drivers/thunderbolt/nhi.c @@ -1191,5 +1191,5 @@ static void __exit nhi_unload(void) tb_domain_exit(); } -fs_initcall(nhi_init); +rootfs_initcall(nhi_init); module_exit(nhi_unload); From beeeac43b6fae5f5eaf707b6fcc2bf1e09deb785 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Mon, 1 Oct 2018 21:42:37 -0700 Subject: [PATCH 042/134] Revert "serial: 8250_dw: Fix runtime PM handling" This reverts commit d76c74387e1c978b6c5524a146ab0f3f72206f98. While commit d76c74387e1c ("serial: 8250_dw: Fix runtime PM handling") fixes runtime PM handling when using kgdb, it introduces a traceback for everyone else. BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/next/drivers/base/power/runtime.c:1034 in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0 7 locks held by swapper/0/1: #0: 000000005ec5bc72 (&dev->mutex){....}, at: __driver_attach+0xb5/0x12b #1: 000000005d5fa9e5 (&dev->mutex){....}, at: __device_attach+0x3e/0x15b #2: 0000000047e93286 (serial_mutex){+.+.}, at: serial8250_register_8250_port+0x51/0x8bb #3: 000000003b328f07 (port_mutex){+.+.}, at: uart_add_one_port+0xab/0x8b0 #4: 00000000fa313d4d (&port->mutex){+.+.}, at: uart_add_one_port+0xcc/0x8b0 #5: 00000000090983ca (console_lock){+.+.}, at: vprintk_emit+0xdb/0x217 #6: 00000000c743e583 (console_owner){-...}, at: console_unlock+0x211/0x60f irq event stamp: 735222 __down_trylock_console_sem+0x4a/0x84 console_unlock+0x338/0x60f __do_softirq+0x4a4/0x50d irq_exit+0x64/0xe2 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc5 #6 Hardware name: Google Caroline/Caroline, BIOS Google_Caroline.7820.286.0 03/15/2017 Call Trace: dump_stack+0x7d/0xbd ___might_sleep+0x238/0x259 __pm_runtime_resume+0x4e/0xa4 ? serial8250_rpm_get+0x2e/0x44 serial8250_console_write+0x44/0x301 ? lock_acquire+0x1b8/0x1fa console_unlock+0x577/0x60f vprintk_emit+0x1f0/0x217 printk+0x52/0x6e register_console+0x43b/0x524 uart_add_one_port+0x672/0x8b0 ? set_io_from_upio+0x150/0x162 serial8250_register_8250_port+0x825/0x8bb dw8250_probe+0x80c/0x8b0 ? dw8250_serial_inq+0x8e/0x8e ? dw8250_check_lcr+0x108/0x108 ? dw8250_runtime_resume+0x5b/0x5b ? dw8250_serial_outq+0xa1/0xa1 ? dw8250_remove+0x115/0x115 platform_drv_probe+0x76/0xc5 really_probe+0x1f1/0x3ee ? driver_allows_async_probing+0x5d/0x5d driver_probe_device+0xd6/0x112 ? driver_allows_async_probing+0x5d/0x5d bus_for_each_drv+0xbe/0xe5 __device_attach+0xdd/0x15b bus_probe_device+0x5a/0x10b device_add+0x501/0x894 ? _raw_write_unlock+0x27/0x3a platform_device_add+0x224/0x2b7 mfd_add_device+0x718/0x75b ? __kmalloc+0x144/0x16a ? mfd_add_devices+0x38/0xdb mfd_add_devices+0x9b/0xdb intel_lpss_probe+0x7d4/0x8ee intel_lpss_pci_probe+0xac/0xd4 pci_device_probe+0x101/0x18e ... Revert the offending patch until a more comprehensive solution is available. Cc: Tony Lindgren Cc: Andy Shevchenko Cc: Phil Edworthy Fixes: d76c74387e1c ("serial: 8250_dw: Fix runtime PM handling") Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_dw.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c index fa8dcb470640..d31b975dd3fd 100644 --- a/drivers/tty/serial/8250/8250_dw.c +++ b/drivers/tty/serial/8250/8250_dw.c @@ -630,10 +630,6 @@ static int dw8250_probe(struct platform_device *pdev) if (!data->skip_autocfg) dw8250_setup_port(p); -#ifdef CONFIG_PM - uart.capabilities |= UART_CAP_RPM; -#endif - /* If we have a valid fifosize, try hooking up DMA */ if (p->fifosize) { data->dma.rxconf.src_maxburst = p->fifosize / 4; From 10653022456dc77b398777fd8e95126c77954b49 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 30 Aug 2018 14:54:03 +0200 Subject: [PATCH 043/134] Revert "serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE" This reverts commit 7acece71a517cad83a0842a94d94c13f271b680c. Signed-off-by: Geert Uytterhoeven Acked-by: Chris Brandt Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sh-sci.c | 31 +++++++++++++++++++++++++++++++ include/linux/serial_sci.h | 1 + 2 files changed, 32 insertions(+) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index ac4424bf6b13..5d42c9a63001 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -291,6 +291,33 @@ static const struct sci_port_params sci_port_params[SCIx_NR_REGTYPES] = { .error_clear = SCIF_ERROR_CLEAR, }, + /* + * The "SCIFA" that is in RZ/T and RZ/A2. + * It looks like a normal SCIF with FIFO data, but with a + * compressed address space. Also, the break out of interrupts + * are different: ERI/BRI, RXI, TXI, TEI, DRI. + */ + [SCIx_RZ_SCIFA_REGTYPE] = { + .regs = { + [SCSMR] = { 0x00, 16 }, + [SCBRR] = { 0x02, 8 }, + [SCSCR] = { 0x04, 16 }, + [SCxTDR] = { 0x06, 8 }, + [SCxSR] = { 0x08, 16 }, + [SCxRDR] = { 0x0A, 8 }, + [SCFCR] = { 0x0C, 16 }, + [SCFDR] = { 0x0E, 16 }, + [SCSPTR] = { 0x10, 16 }, + [SCLSR] = { 0x12, 16 }, + }, + .fifosize = 16, + .overrun_reg = SCLSR, + .overrun_mask = SCLSR_ORER, + .sampling_rate_mask = SCI_SR(32), + .error_mask = SCIF_DEFAULT_ERROR_MASK, + .error_clear = SCIF_ERROR_CLEAR, + }, + /* * Common SH-3 SCIF definitions. */ @@ -3110,6 +3137,10 @@ static const struct of_device_id of_sci_match[] = { .compatible = "renesas,scif-r7s72100", .data = SCI_OF_DATA(PORT_SCIF, SCIx_SH2_SCIF_FIFODATA_REGTYPE), }, + { + .compatible = "renesas,scif-r7s9210", + .data = SCI_OF_DATA(PORT_SCIF, SCIx_RZ_SCIFA_REGTYPE), + }, /* Family-specific types */ { .compatible = "renesas,rcar-gen1-scif", diff --git a/include/linux/serial_sci.h b/include/linux/serial_sci.h index c0e795d95477..1c89611e0e06 100644 --- a/include/linux/serial_sci.h +++ b/include/linux/serial_sci.h @@ -36,6 +36,7 @@ enum { SCIx_SH4_SCIF_FIFODATA_REGTYPE, SCIx_SH7705_SCIF_REGTYPE, SCIx_HSCIF_REGTYPE, + SCIx_RZ_SCIFA_REGTYPE, SCIx_NR_REGTYPES, }; From 5b162cc4ac27ba76e576abc9090c54cf90a17980 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 30 Aug 2018 14:54:04 +0200 Subject: [PATCH 044/134] Revert "serial: sh-sci: Allow for compressed SCIF address" This reverts commit 2d4dd0da45401c7ae7332b4d1eb7bbb1348edde9. This broke earlycon on all Renesas ARM platforms using a SCIF port for the serial console (R-Car, RZ/A1, RZ/G1, RZ/G2 SoCs), due to an incorrect value of port->regshift. Signed-off-by: Geert Uytterhoeven Acked-by: Chris Brandt Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sh-sci.c | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index 5d42c9a63001..ab3f6e91853d 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -346,15 +346,15 @@ static const struct sci_port_params sci_port_params[SCIx_NR_REGTYPES] = { [SCIx_SH4_SCIF_REGTYPE] = { .regs = { [SCSMR] = { 0x00, 16 }, - [SCBRR] = { 0x02, 8 }, - [SCSCR] = { 0x04, 16 }, - [SCxTDR] = { 0x06, 8 }, - [SCxSR] = { 0x08, 16 }, - [SCxRDR] = { 0x0a, 8 }, - [SCFCR] = { 0x0c, 16 }, - [SCFDR] = { 0x0e, 16 }, - [SCSPTR] = { 0x10, 16 }, - [SCLSR] = { 0x12, 16 }, + [SCBRR] = { 0x04, 8 }, + [SCSCR] = { 0x08, 16 }, + [SCxTDR] = { 0x0c, 8 }, + [SCxSR] = { 0x10, 16 }, + [SCxRDR] = { 0x14, 8 }, + [SCFCR] = { 0x18, 16 }, + [SCFDR] = { 0x1c, 16 }, + [SCSPTR] = { 0x20, 16 }, + [SCLSR] = { 0x24, 16 }, }, .fifosize = 16, .overrun_reg = SCLSR, @@ -2837,7 +2837,7 @@ static int sci_init_single(struct platform_device *dev, { struct uart_port *port = &sci_port->port; const struct resource *res; - unsigned int i, regtype; + unsigned int i; int ret; sci_port->cfg = p; @@ -2874,7 +2874,6 @@ static int sci_init_single(struct platform_device *dev, if (unlikely(sci_port->params == NULL)) return -EINVAL; - regtype = sci_port->params - sci_port_params; switch (p->type) { case PORT_SCIFB: sci_port->rx_trigger = 48; @@ -2929,10 +2928,6 @@ static int sci_init_single(struct platform_device *dev, port->regshift = 1; } - if (regtype == SCIx_SH4_SCIF_REGTYPE) - if (sci_port->reg_size >= 0x20) - port->regshift = 1; - /* * The UART port needs an IRQ value, so we peg this to the RX IRQ * for the multi-IRQ ports, which is where we are primarily From 479adb89a97b0a33e5a9d702119872cc82ca21aa Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 4 Oct 2018 13:28:08 -0700 Subject: [PATCH 045/134] cgroup: Fix dom_cgrp propagation when enabling threaded mode A cgroup which is already a threaded domain may be converted into a threaded cgroup if the prerequisite conditions are met. When this happens, all threaded descendant should also have their ->dom_cgrp updated to the new threaded domain cgroup. Unfortunately, this propagation was missing leading to the following failure. # cd /sys/fs/cgroup/unified # cat cgroup.subtree_control # show that no controllers are enabled # mkdir -p mycgrp/a/b/c # echo threaded > mycgrp/a/b/cgroup.type At this point, the hierarchy looks as follows: mycgrp [d] a [dt] b [t] c [inv] Now let's make node "a" threaded (and thus "mycgrp" s made "domain threaded"): # echo threaded > mycgrp/a/cgroup.type By this point, we now have a hierarchy that looks as follows: mycgrp [dt] a [t] b [t] c [inv] But, when we try to convert the node "c" from "domain invalid" to "threaded", we get ENOTSUP on the write(): # echo threaded > mycgrp/a/b/c/cgroup.type sh: echo: write error: Operation not supported This patch fixes the problem by * Moving the opencoded ->dom_cgrp save and restoration in cgroup_enable_threaded() into cgroup_{save|restore}_control() so that mulitple cgroups can be handled. * Updating all threaded descendants' ->dom_cgrp to point to the new dom_cgrp when enabling threaded mode. Signed-off-by: Tejun Heo Reported-and-tested-by: "Michael Kerrisk (man-pages)" Reported-by: Amin Jamali Reported-by: Joao De Almeida Pereira Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com Fixes: 454000adaa2a ("cgroup: introduce cgroup->dom_cgrp and threaded css_set handling") Cc: stable@vger.kernel.org # v4.14+ --- include/linux/cgroup-defs.h | 1 + kernel/cgroup/cgroup.c | 25 ++++++++++++++++--------- 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index ff20b677fb9f..22254c1fe1c5 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -412,6 +412,7 @@ struct cgroup { * specific task are charged to the dom_cgrp. */ struct cgroup *dom_cgrp; + struct cgroup *old_dom_cgrp; /* used while enabling threaded */ /* per-cpu recursive resource statistics */ struct cgroup_rstat_cpu __percpu *rstat_cpu; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index aae10baf1902..4a3dae2a8283 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -2836,11 +2836,12 @@ restart: } /** - * cgroup_save_control - save control masks of a subtree + * cgroup_save_control - save control masks and dom_cgrp of a subtree * @cgrp: root of the target subtree * - * Save ->subtree_control and ->subtree_ss_mask to the respective old_ - * prefixed fields for @cgrp's subtree including @cgrp itself. + * Save ->subtree_control, ->subtree_ss_mask and ->dom_cgrp to the + * respective old_ prefixed fields for @cgrp's subtree including @cgrp + * itself. */ static void cgroup_save_control(struct cgroup *cgrp) { @@ -2850,6 +2851,7 @@ static void cgroup_save_control(struct cgroup *cgrp) cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) { dsct->old_subtree_control = dsct->subtree_control; dsct->old_subtree_ss_mask = dsct->subtree_ss_mask; + dsct->old_dom_cgrp = dsct->dom_cgrp; } } @@ -2875,11 +2877,12 @@ static void cgroup_propagate_control(struct cgroup *cgrp) } /** - * cgroup_restore_control - restore control masks of a subtree + * cgroup_restore_control - restore control masks and dom_cgrp of a subtree * @cgrp: root of the target subtree * - * Restore ->subtree_control and ->subtree_ss_mask from the respective old_ - * prefixed fields for @cgrp's subtree including @cgrp itself. + * Restore ->subtree_control, ->subtree_ss_mask and ->dom_cgrp from the + * respective old_ prefixed fields for @cgrp's subtree including @cgrp + * itself. */ static void cgroup_restore_control(struct cgroup *cgrp) { @@ -2889,6 +2892,7 @@ static void cgroup_restore_control(struct cgroup *cgrp) cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) { dsct->subtree_control = dsct->old_subtree_control; dsct->subtree_ss_mask = dsct->old_subtree_ss_mask; + dsct->dom_cgrp = dsct->old_dom_cgrp; } } @@ -3196,6 +3200,8 @@ static int cgroup_enable_threaded(struct cgroup *cgrp) { struct cgroup *parent = cgroup_parent(cgrp); struct cgroup *dom_cgrp = parent->dom_cgrp; + struct cgroup *dsct; + struct cgroup_subsys_state *d_css; int ret; lockdep_assert_held(&cgroup_mutex); @@ -3225,12 +3231,13 @@ static int cgroup_enable_threaded(struct cgroup *cgrp) */ cgroup_save_control(cgrp); - cgrp->dom_cgrp = dom_cgrp; + cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) + if (dsct == cgrp || cgroup_is_threaded(dsct)) + dsct->dom_cgrp = dom_cgrp; + ret = cgroup_apply_control(cgrp); if (!ret) parent->nr_threaded_children++; - else - cgrp->dom_cgrp = cgrp; cgroup_finalize_control(cgrp, ret); return ret; From f74c371fe72a4f820d287db8067683fb533e4ede Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 4 Oct 2018 17:41:37 -0700 Subject: [PATCH 046/134] Input: mousedev - add a schedule point in mousedev_write() syzbot was able to trigger rcu stalls by calling write() with large number of bytes. Add a cond_resched() in the loop to avoid this. Link: https://lkml.org/lkml/2018/8/23/1106 Signed-off-by: Eric Dumazet Reported-by: syzbot+9436b02171ac0894d33e@syzkaller.appspotmail.com Reviewed-by: Paul E. McKenney Signed-off-by: Dmitry Torokhov --- drivers/input/mousedev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/mousedev.c b/drivers/input/mousedev.c index e08228061bcd..412fa71245af 100644 --- a/drivers/input/mousedev.c +++ b/drivers/input/mousedev.c @@ -707,6 +707,7 @@ static ssize_t mousedev_write(struct file *file, const char __user *buffer, mousedev_generate_response(client, c); spin_unlock_irq(&client->packet_lock); + cond_resched(); } kill_fasync(&client->fasync, SIGIO, POLL_IN); From e46368cf77f2cb6304c51d9ff5f147cfb7dc0074 Mon Sep 17 00:00:00 2001 From: Lyude Paul Date: Fri, 14 Sep 2018 16:44:03 -0400 Subject: [PATCH 047/134] drm/nouveau/drm/nouveau: Grab runtime PM ref in nv50_mstc_detect() While we currently grab a runtime PM ref in nouveau's normal connector detection code, we apparently don't do this for MST. This means if we're in a scenario where the GPU is suspended and userspace attempts to do a connector probe on an MSTC connector, the probe will fail entirely due to the DP aux channel and GPU not being woken up: [ 316.633489] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff [ 316.635713] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff [ 316.637785] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff ... So, grab a runtime PM ref here. Signed-off-by: Lyude Paul Cc: stable@vger.kernel.org Reviewed-by: Karol Herbst Signed-off-by: Ben Skeggs --- drivers/gpu/drm/nouveau/dispnv50/disp.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index 5691dfa1db6f..041e7daf8a33 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -900,9 +900,22 @@ static enum drm_connector_status nv50_mstc_detect(struct drm_connector *connector, bool force) { struct nv50_mstc *mstc = nv50_mstc(connector); + enum drm_connector_status conn_status; + int ret; + if (!mstc->port) return connector_status_disconnected; - return drm_dp_mst_detect_port(connector, mstc->port->mgr, mstc->port); + + ret = pm_runtime_get_sync(connector->dev->dev); + if (ret < 0 && ret != -EACCES) + return connector_status_disconnected; + + conn_status = drm_dp_mst_detect_port(connector, mstc->port->mgr, + mstc->port); + + pm_runtime_mark_last_busy(connector->dev->dev); + pm_runtime_put_autosuspend(connector->dev->dev); + return conn_status; } static void From 0d41e1d28c2e969094ef7933b8521f1e08d30251 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Fri, 5 Oct 2018 19:04:22 +1000 Subject: [PATCH 048/134] xfs: refactor clonerange preparation into a separate helper Refactor all the reflink preparation steps into a separate helper that we'll use to land all the upcoming fixes for insufficient input checks. This rework also moves the invalidation of the destination range to the prep function so that it is done before the range is remapped. This ensures that nobody can access the data in range being remapped until the remap is complete. [dgc: fix xfs_reflink_remap_prep() return value and caller check to handle vfs_clone_file_prep_inodes() returning 0 to mean "nothing to do". ] [dgc: make sure length changed by vfs_clone_file_prep_inodes() gets propagated back to XFS code that does the remapping. ] Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/xfs_reflink.c | 100 +++++++++++++++++++++++++++++++------------ 1 file changed, 73 insertions(+), 27 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 5289e22cb081..da1a447bef51 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1220,35 +1220,47 @@ retry: return 0; } +/* Unlock both inodes after they've been prepped for a range clone. */ +STATIC void +xfs_reflink_remap_unlock( + struct file *file_in, + struct file *file_out) +{ + struct inode *inode_in = file_inode(file_in); + struct xfs_inode *src = XFS_I(inode_in); + struct inode *inode_out = file_inode(file_out); + struct xfs_inode *dest = XFS_I(inode_out); + bool same_inode = (inode_in == inode_out); + + xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); + if (!same_inode) + xfs_iunlock(src, XFS_MMAPLOCK_SHARED); + inode_unlock(inode_out); + if (!same_inode) + inode_unlock_shared(inode_in); +} + /* - * Link a range of blocks from one file to another. + * Prepare two files for range cloning. Upon a successful return both inodes + * will have the iolock and mmaplock held, the page cache of the out file + * will be truncated, and any leases on the out file will have been broken. */ -int -xfs_reflink_remap_range( +STATIC int +xfs_reflink_remap_prep( struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, - u64 len, + u64 *len, bool is_dedupe) { struct inode *inode_in = file_inode(file_in); struct xfs_inode *src = XFS_I(inode_in); struct inode *inode_out = file_inode(file_out); struct xfs_inode *dest = XFS_I(inode_out); - struct xfs_mount *mp = src->i_mount; bool same_inode = (inode_in == inode_out); - xfs_fileoff_t sfsbno, dfsbno; - xfs_filblks_t fsblen; - xfs_extlen_t cowextsize; ssize_t ret; - if (!xfs_sb_version_hasreflink(&mp->m_sb)) - return -EOPNOTSUPP; - - if (XFS_FORCED_SHUTDOWN(mp)) - return -EIO; - /* Lock both files against IO */ ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out); if (ret) @@ -1270,7 +1282,7 @@ xfs_reflink_remap_range( goto out_unlock; ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out, - &len, is_dedupe); + len, is_dedupe); if (ret <= 0) goto out_unlock; @@ -1279,8 +1291,6 @@ xfs_reflink_remap_range( if (ret) goto out_unlock; - trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out); - /* * Clear out post-eof preallocations because we don't have page cache * backing the delayed allocations and they'll never get freed on @@ -1297,6 +1307,51 @@ xfs_reflink_remap_range( if (ret) goto out_unlock; + /* Zap any page cache for the destination file's range. */ + truncate_inode_pages_range(&inode_out->i_data, pos_out, + PAGE_ALIGN(pos_out + *len) - 1); + return 1; +out_unlock: + xfs_reflink_remap_unlock(file_in, file_out); + return ret; +} + +/* + * Link a range of blocks from one file to another. + */ +int +xfs_reflink_remap_range( + struct file *file_in, + loff_t pos_in, + struct file *file_out, + loff_t pos_out, + u64 len, + bool is_dedupe) +{ + struct inode *inode_in = file_inode(file_in); + struct xfs_inode *src = XFS_I(inode_in); + struct inode *inode_out = file_inode(file_out); + struct xfs_inode *dest = XFS_I(inode_out); + struct xfs_mount *mp = src->i_mount; + xfs_fileoff_t sfsbno, dfsbno; + xfs_filblks_t fsblen; + xfs_extlen_t cowextsize; + ssize_t ret; + + if (!xfs_sb_version_hasreflink(&mp->m_sb)) + return -EOPNOTSUPP; + + if (XFS_FORCED_SHUTDOWN(mp)) + return -EIO; + + /* Prepare and then clone file data. */ + ret = xfs_reflink_remap_prep(file_in, pos_in, file_out, pos_out, + &len, is_dedupe); + if (ret <= 0) + return ret; + + trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out); + dfsbno = XFS_B_TO_FSBT(mp, pos_out); sfsbno = XFS_B_TO_FSBT(mp, pos_in); fsblen = XFS_B_TO_FSB(mp, len); @@ -1305,10 +1360,6 @@ xfs_reflink_remap_range( if (ret) goto out_unlock; - /* Zap any page cache for the destination file's range. */ - truncate_inode_pages_range(&inode_out->i_data, pos_out, - PAGE_ALIGN(pos_out + len) - 1); - /* * Carry the cowextsize hint from src to dest if we're sharing the * entire source file to the entire destination file, the source file @@ -1325,12 +1376,7 @@ xfs_reflink_remap_range( is_dedupe); out_unlock: - xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); - if (!same_inode) - xfs_iunlock(src, XFS_MMAPLOCK_SHARED); - inode_unlock(inode_out); - if (!same_inode) - inode_unlock_shared(inode_in); + xfs_reflink_remap_unlock(file_in, file_out); if (ret) trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_); return ret; From 410fdc72b05afabef3afb51167085799dcc7b3cf Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Fri, 5 Oct 2018 19:04:27 +1000 Subject: [PATCH 049/134] xfs: zero posteof blocks when cloning above eof When we're reflinking between two files and the destination file range is well beyond the destination file's EOF marker, zero any posteof speculative preallocations in the destination file so that we don't expose stale disk contents. The previous strategy of trying to clear the preallocations does not work if the destination file has the PREALLOC flag set. Uncovered by shared/010. Reported-by: Zorro Lang Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/xfs_reflink.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index da1a447bef51..f135748d8282 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1240,6 +1240,26 @@ xfs_reflink_remap_unlock( inode_unlock_shared(inode_in); } +/* + * If we're reflinking to a point past the destination file's EOF, we must + * zero any speculative post-EOF preallocations that sit between the old EOF + * and the destination file offset. + */ +static int +xfs_reflink_zero_posteof( + struct xfs_inode *ip, + loff_t pos) +{ + loff_t isize = i_size_read(VFS_I(ip)); + + if (pos <= isize) + return 0; + + trace_xfs_zero_eof(ip, isize, pos - isize); + return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL, + &xfs_iomap_ops); +} + /* * Prepare two files for range cloning. Upon a successful return both inodes * will have the iolock and mmaplock held, the page cache of the out file @@ -1292,15 +1312,12 @@ xfs_reflink_remap_prep( goto out_unlock; /* - * Clear out post-eof preallocations because we don't have page cache - * backing the delayed allocations and they'll never get freed on - * their own. + * Zero existing post-eof speculative preallocations in the destination + * file. */ - if (xfs_can_free_eofblocks(dest, true)) { - ret = xfs_free_eofblocks(dest); - if (ret) - goto out_unlock; - } + ret = xfs_reflink_zero_posteof(dest, pos_out); + if (ret) + goto out_unlock; /* Set flags and remap blocks. */ ret = xfs_reflink_set_inode_flag(src, dest); From 7debbf015f580693680f3d2a3cef0cf99dcef688 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Fri, 5 Oct 2018 19:05:41 +1000 Subject: [PATCH 050/134] xfs: update ctime and remove suid before cloning files Before cloning into a file, update the ctime and remove sensitive attributes like suid, just like we'd do for a regular file write. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/xfs_reflink.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index f135748d8282..59da9708e9c1 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1264,6 +1264,7 @@ xfs_reflink_zero_posteof( * Prepare two files for range cloning. Upon a successful return both inodes * will have the iolock and mmaplock held, the page cache of the out file * will be truncated, and any leases on the out file will have been broken. + * This function borrows heavily from xfs_file_aio_write_checks. */ STATIC int xfs_reflink_remap_prep( @@ -1327,6 +1328,30 @@ xfs_reflink_remap_prep( /* Zap any page cache for the destination file's range. */ truncate_inode_pages_range(&inode_out->i_data, pos_out, PAGE_ALIGN(pos_out + *len) - 1); + + /* If we're altering the file contents... */ + if (!is_dedupe) { + /* + * ...update the timestamps (which will grab the ilock again + * from xfs_fs_dirty_inode, so we have to call it before we + * take the ilock). + */ + if (!(file_out->f_mode & FMODE_NOCMTIME)) { + ret = file_update_time(file_out); + if (ret) + goto out_unlock; + } + + /* + * ...clear the security bits if the process is not being run + * by root. This keeps people from modifying setuid and setgid + * binaries. + */ + ret = file_remove_privs(file_out); + if (ret) + goto out_unlock; + } + return 1; out_unlock: xfs_reflink_remap_unlock(file_in, file_out); From 9ce7610e6d201e3923c0b2f454f2e1d54f5da49e Mon Sep 17 00:00:00 2001 From: Jarkko Nikula Date: Mon, 1 Oct 2018 14:49:05 +0300 Subject: [PATCH 051/134] i2c: designware: Call i2c_dw_clk_rate() only when calculating timings There are platforms which don't provide input clock rate but provide I2C timing parameters. Commit 3bd4f277274b ("i2c: designware: Call i2c_dw_clk_rate() only once in i2c_dw_init_master()") causes needless warning during probe on those platforms since i2c_dw_clk_rate(), which causes the warning when input clock is unknown, is called even when there is no need to calculate timing parameters. Fixes: 3bd4f277274b ("i2c: designware: Call i2c_dw_clk_rate() only once in i2c_dw_init_master()") Reported-by: Ard Biesheuvel Cc: # 4.19 Signed-off-by: Jarkko Nikula Tested-by: Ard Biesheuvel Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-designware-master.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c index 94d94b4a9a0d..18cc324f3ca9 100644 --- a/drivers/i2c/busses/i2c-designware-master.c +++ b/drivers/i2c/busses/i2c-designware-master.c @@ -34,11 +34,11 @@ static void i2c_dw_configure_fifo_master(struct dw_i2c_dev *dev) static int i2c_dw_set_timings_master(struct dw_i2c_dev *dev) { - u32 ic_clk = i2c_dw_clk_rate(dev); const char *mode_str, *fp_str = ""; u32 comp_param1; u32 sda_falling_time, scl_falling_time; struct i2c_timings *t = &dev->timings; + u32 ic_clk; int ret; ret = i2c_dw_acquire_lock(dev); @@ -53,6 +53,7 @@ static int i2c_dw_set_timings_master(struct dw_i2c_dev *dev) /* Calculate SCL timing parameters for standard mode if not set */ if (!dev->ss_hcnt || !dev->ss_lcnt) { + ic_clk = i2c_dw_clk_rate(dev); dev->ss_hcnt = i2c_dw_scl_hcnt(ic_clk, 4000, /* tHD;STA = tHIGH = 4.0 us */ @@ -89,6 +90,7 @@ static int i2c_dw_set_timings_master(struct dw_i2c_dev *dev) * needed also in high speed mode. */ if (!dev->fs_hcnt || !dev->fs_lcnt) { + ic_clk = i2c_dw_clk_rate(dev); dev->fs_hcnt = i2c_dw_scl_hcnt(ic_clk, 600, /* tHD;STA = tHIGH = 0.6 us */ From a932ed3b718147c6537da290b7a91e990fdedb43 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 5 Oct 2018 16:43:55 +1000 Subject: [PATCH 052/134] powerpc: Don't print kernel instructions in show_user_instructions() Recently we implemented show_user_instructions() which dumps the code around the NIP when a user space process dies with an unhandled signal. This was modelled on the x86 code, and we even went so far as to implement the exact same bug, namely that if the user process crashed with its NIP pointing into the kernel we will dump kernel text to dmesg. eg: bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1 bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6 bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048 This was discovered on x86 by Jann Horn and fixed in commit 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP"). Fix it by checking the adjusted NIP value (pc) and number of instructions against USER_DS, and bail if we fail the check, eg: bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1 bad-bctr[2969]: Bad NIP, not dumping instructions. Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()") Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/process.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 913c5725cdb2..bb6ac471a784 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1306,6 +1306,16 @@ void show_user_instructions(struct pt_regs *regs) pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int)); + /* + * Make sure the NIP points at userspace, not kernel text/data or + * elsewhere. + */ + if (!__access_ok(pc, instructions_to_print * sizeof(int), USER_DS)) { + pr_info("%s[%d]: Bad NIP, not dumping instructions.\n", + current->comm, current->pid); + return; + } + pr_info("%s[%d]: code: ", current->comm, current->pid); for (i = 0; i < instructions_to_print; i++) { From ac1788cc7da4ce54edcfd2e499afdb0a23d5c41d Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Fri, 28 Sep 2018 09:17:32 +0530 Subject: [PATCH 053/134] powerpc/numa: Skip onlining a offline node in kdump path With commit 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot"), kdump kernel on shared LPAR may crash. The necessary conditions are - Shared LPAR with at least 2 nodes having memory and CPUs. - Memory requirement for kdump kernel must be met by the first N-1 nodes where there are at least N nodes with memory and CPUs. Example numactl of such a machine. $ numactl -H available: 5 nodes (0,2,5-7) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: node 2 size: 255 MB node 2 free: 189 MB node 5 cpus: 24 25 26 27 28 29 30 31 node 5 size: 4095 MB node 5 free: 4024 MB node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23 node 6 size: 6353 MB node 6 free: 5998 MB node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 node 7 size: 7640 MB node 7 free: 7164 MB node distances: node 0 2 5 6 7 0: 10 40 40 40 40 2: 40 10 40 40 40 5: 40 40 10 40 40 6: 40 40 40 10 20 7: 40 40 40 20 10 Steps to reproduce. 1. Load / start kdump service. 2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger) When booting a kdump kernel with 2048M: kexec: Starting switchover sequence. I'm in purgatory Using 1TB segments hash-mmu: Initializing hash mmu with SLB Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018 Found initrd at 0xc000000009e70000:0xc00000000ae554b4 Using pSeries machine description ----------------------------------------------------- ppc64_pft_size = 0x1e phys_mem_size = 0x88000000 dcache_bsize = 0x80 icache_bsize = 0x80 cpu_features = 0x000000ff8f5d91a7 possible = 0x0000fbffcf5fb1a7 always = 0x0000006f8b5c91a1 cpu_user_features = 0xdc0065c2 0xef000000 mmu_features = 0x7c006001 firmware_features = 0x00000007c45bfc57 htab_hash_mask = 0x7fffff physical_start = 0x8000000 ----------------------------------------------------- numa: NODE_DATA [mem 0x87d5e300-0x87d67fff] numa: NODE_DATA(0) on node 6 numa: NODE_DATA [mem 0x87d54600-0x87d5e2ff] Top of RAM: 0x88000000, Total RAM: 0x88000000 Memory hole size: 0MB Zone ranges: DMA [mem 0x0000000000000000-0x0000000087ffffff] DMA32 empty Normal empty Movable zone start for each node Early memory node ranges node 6: [mem 0x0000000000000000-0x0000000087ffffff] Could not find start_pfn for node 0 Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000] On node 0 totalpages: 0 Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff] On node 6 totalpages: 34816 Unable to handle kernel paging request for data at address 0x00000060 Faulting instruction address: 0xc000000008703a54 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1 NIP: c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000 REGS: c00000000b673440 TRAP: 0380 Not tainted (4.19.0-rc5-master+) MSR: 8000000002009033 CR: 24022022 XER: 20000002 CFAR: c0000000086fc238 IRQMASK: 0 GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000 GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220 GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008 GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000 GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008 GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78 GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400 NIP [c000000008703a54] bus_add_device+0x84/0x1e0 LR [c000000008703a38] bus_add_device+0x68/0x1e0 Call Trace: [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable) [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0 [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240 [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180 [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90 [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190 [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580 [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108 [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450 [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160 [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80 Instruction dump: 60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c e8bf0050 e93e0098 3b9f0010 2fa50000 38630018 419e0114 7f84e378 ---[ end trace 593577668c2daa65 ]--- However a regular kernel with 4096M (2048 gets reserved for crash kernel) boots properly. Unlike regular kernels, which mark all available nodes as online, kdump kernel only marks just enough nodes as online and marks the rest as offline at boot. However kdump kernel boots with all available CPUs. With Commit 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot"), all CPUs are onlined on their respective nodes at boot time. try_online_node() tries to online the offline nodes but fails as all needed subsystems are not yet initialized. As part of fix, detect and skip early onlining of a offline node. Fixes: 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot") Reported-by: Pavithra Prakash Signed-off-by: Srikar Dronamraju Tested-by: Hari Bathini Signed-off-by: Michael Ellerman --- arch/powerpc/mm/numa.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 59d07bd5374a..055b211b7126 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1217,9 +1217,10 @@ int find_and_online_cpu_nid(int cpu) * Need to ensure that NODE_DATA is initialized for a node from * available memory (see memblock_alloc_try_nid). If unable to * init the node, then default to nearest node that has memory - * installed. + * installed. Skip onlining a node if the subsystems are not + * yet initialized. */ - if (try_online_node(new_nid)) + if (!topology_inited || try_online_node(new_nid)) new_nid = first_online_node; #else /* From 5e33a23ba4b56c109b732d57a0a76558a37d9ec5 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 5 Oct 2018 14:05:34 +0100 Subject: [PATCH 054/134] rxrpc: Fix some missed refs to init_net Fix some refs to init_net that should've been changed to the appropriate network namespace. Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing") Signed-off-by: David Howells Acked-by: Paolo Abeni --- net/rxrpc/ar-internal.h | 10 ++++++---- net/rxrpc/call_accept.c | 2 +- net/rxrpc/call_object.c | 4 ++-- net/rxrpc/conn_client.c | 10 ++++++---- net/rxrpc/input.c | 4 ++-- net/rxrpc/peer_object.c | 28 +++++++++++++++++----------- 6 files changed, 34 insertions(+), 24 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index ef9554131434..63c43b3a2096 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -891,8 +891,9 @@ extern unsigned long rxrpc_conn_idle_client_fast_expiry; extern struct idr rxrpc_client_conn_ids; void rxrpc_destroy_client_conn_ids(void); -int rxrpc_connect_call(struct rxrpc_call *, struct rxrpc_conn_parameters *, - struct sockaddr_rxrpc *, gfp_t); +int rxrpc_connect_call(struct rxrpc_sock *, struct rxrpc_call *, + struct rxrpc_conn_parameters *, struct sockaddr_rxrpc *, + gfp_t); void rxrpc_expose_client_call(struct rxrpc_call *); void rxrpc_disconnect_client_call(struct rxrpc_call *); void rxrpc_put_client_conn(struct rxrpc_connection *); @@ -1045,10 +1046,11 @@ void rxrpc_peer_keepalive_worker(struct work_struct *); */ struct rxrpc_peer *rxrpc_lookup_peer_rcu(struct rxrpc_local *, const struct sockaddr_rxrpc *); -struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *, +struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_sock *, struct rxrpc_local *, struct sockaddr_rxrpc *, gfp_t); struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *, gfp_t); -void rxrpc_new_incoming_peer(struct rxrpc_local *, struct rxrpc_peer *); +void rxrpc_new_incoming_peer(struct rxrpc_sock *, struct rxrpc_local *, + struct rxrpc_peer *); void rxrpc_destroy_all_peers(struct rxrpc_net *); struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *); struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *); diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 9c7f26d06a52..f55f67894465 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -287,7 +287,7 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct rxrpc_sock *rx, (peer_tail + 1) & (RXRPC_BACKLOG_MAX - 1)); - rxrpc_new_incoming_peer(local, peer); + rxrpc_new_incoming_peer(rx, local, peer); } /* Now allocate and set up the connection */ diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 799f75b6900d..0ca2c2dfd196 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -287,7 +287,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, /* Set up or get a connection record and set the protocol parameters, * including channel number and call ID. */ - ret = rxrpc_connect_call(call, cp, srx, gfp); + ret = rxrpc_connect_call(rx, call, cp, srx, gfp); if (ret < 0) goto error; @@ -339,7 +339,7 @@ int rxrpc_retry_client_call(struct rxrpc_sock *rx, /* Set up or get a connection record and set the protocol parameters, * including channel number and call ID. */ - ret = rxrpc_connect_call(call, cp, srx, gfp); + ret = rxrpc_connect_call(rx, call, cp, srx, gfp); if (ret < 0) goto error; diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 8acf74fe24c0..521189f4b666 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -276,7 +276,8 @@ dont_reuse: * If we return with a connection, the call will be on its waiting list. It's * left to the caller to assign a channel and wake up the call. */ -static int rxrpc_get_client_conn(struct rxrpc_call *call, +static int rxrpc_get_client_conn(struct rxrpc_sock *rx, + struct rxrpc_call *call, struct rxrpc_conn_parameters *cp, struct sockaddr_rxrpc *srx, gfp_t gfp) @@ -289,7 +290,7 @@ static int rxrpc_get_client_conn(struct rxrpc_call *call, _enter("{%d,%lx},", call->debug_id, call->user_call_ID); - cp->peer = rxrpc_lookup_peer(cp->local, srx, gfp); + cp->peer = rxrpc_lookup_peer(rx, cp->local, srx, gfp); if (!cp->peer) goto error; @@ -683,7 +684,8 @@ out: * find a connection for a call * - called in process context with IRQs enabled */ -int rxrpc_connect_call(struct rxrpc_call *call, +int rxrpc_connect_call(struct rxrpc_sock *rx, + struct rxrpc_call *call, struct rxrpc_conn_parameters *cp, struct sockaddr_rxrpc *srx, gfp_t gfp) @@ -696,7 +698,7 @@ int rxrpc_connect_call(struct rxrpc_call *call, rxrpc_discard_expired_client_conns(&rxnet->client_conn_reaper); rxrpc_cull_active_client_conns(rxnet); - ret = rxrpc_get_client_conn(call, cp, srx, gfp); + ret = rxrpc_get_client_conn(rx, call, cp, srx, gfp); if (ret < 0) goto out; diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 800f5b8a1baa..c5af9955665b 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1156,12 +1156,12 @@ void rxrpc_data_ready(struct sock *udp_sk) /* we'll probably need to checksum it (didn't call sock_recvmsg) */ if (skb_checksum_complete(skb)) { rxrpc_free_skb(skb, rxrpc_skb_rx_freed); - __UDP_INC_STATS(&init_net, UDP_MIB_INERRORS, 0); + __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INERRORS, 0); _leave(" [CSUM failed]"); return; } - __UDP_INC_STATS(&init_net, UDP_MIB_INDATAGRAMS, 0); + __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INDATAGRAMS, 0); /* The UDP protocol already released all skb resources; * we are free to add our own data there. diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c index 01a9febfa367..2d39eaf19620 100644 --- a/net/rxrpc/peer_object.c +++ b/net/rxrpc/peer_object.c @@ -153,8 +153,10 @@ struct rxrpc_peer *rxrpc_lookup_peer_rcu(struct rxrpc_local *local, * assess the MTU size for the network interface through which this peer is * reached */ -static void rxrpc_assess_MTU_size(struct rxrpc_peer *peer) +static void rxrpc_assess_MTU_size(struct rxrpc_sock *rx, + struct rxrpc_peer *peer) { + struct net *net = sock_net(&rx->sk); struct dst_entry *dst; struct rtable *rt; struct flowi fl; @@ -169,7 +171,7 @@ static void rxrpc_assess_MTU_size(struct rxrpc_peer *peer) switch (peer->srx.transport.family) { case AF_INET: rt = ip_route_output_ports( - &init_net, fl4, NULL, + net, fl4, NULL, peer->srx.transport.sin.sin_addr.s_addr, 0, htons(7000), htons(7001), IPPROTO_UDP, 0, 0); if (IS_ERR(rt)) { @@ -188,7 +190,7 @@ static void rxrpc_assess_MTU_size(struct rxrpc_peer *peer) sizeof(struct in6_addr)); fl6->fl6_dport = htons(7001); fl6->fl6_sport = htons(7000); - dst = ip6_route_output(&init_net, NULL, fl6); + dst = ip6_route_output(net, NULL, fl6); if (dst->error) { _leave(" [route err %d]", dst->error); return; @@ -240,10 +242,11 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp) /* * Initialise peer record. */ -static void rxrpc_init_peer(struct rxrpc_peer *peer, unsigned long hash_key) +static void rxrpc_init_peer(struct rxrpc_sock *rx, struct rxrpc_peer *peer, + unsigned long hash_key) { peer->hash_key = hash_key; - rxrpc_assess_MTU_size(peer); + rxrpc_assess_MTU_size(rx, peer); peer->mtu = peer->if_mtu; peer->rtt_last_req = ktime_get_real(); @@ -275,7 +278,8 @@ static void rxrpc_init_peer(struct rxrpc_peer *peer, unsigned long hash_key) /* * Set up a new peer. */ -static struct rxrpc_peer *rxrpc_create_peer(struct rxrpc_local *local, +static struct rxrpc_peer *rxrpc_create_peer(struct rxrpc_sock *rx, + struct rxrpc_local *local, struct sockaddr_rxrpc *srx, unsigned long hash_key, gfp_t gfp) @@ -287,7 +291,7 @@ static struct rxrpc_peer *rxrpc_create_peer(struct rxrpc_local *local, peer = rxrpc_alloc_peer(local, gfp); if (peer) { memcpy(&peer->srx, srx, sizeof(*srx)); - rxrpc_init_peer(peer, hash_key); + rxrpc_init_peer(rx, peer, hash_key); } _leave(" = %p", peer); @@ -299,14 +303,15 @@ static struct rxrpc_peer *rxrpc_create_peer(struct rxrpc_local *local, * since we've already done a search in the list from the non-reentrant context * (the data_ready handler) that is the only place we can add new peers. */ -void rxrpc_new_incoming_peer(struct rxrpc_local *local, struct rxrpc_peer *peer) +void rxrpc_new_incoming_peer(struct rxrpc_sock *rx, struct rxrpc_local *local, + struct rxrpc_peer *peer) { struct rxrpc_net *rxnet = local->rxnet; unsigned long hash_key; hash_key = rxrpc_peer_hash_key(local, &peer->srx); peer->local = local; - rxrpc_init_peer(peer, hash_key); + rxrpc_init_peer(rx, peer, hash_key); spin_lock(&rxnet->peer_hash_lock); hash_add_rcu(rxnet->peer_hash, &peer->hash_link, hash_key); @@ -317,7 +322,8 @@ void rxrpc_new_incoming_peer(struct rxrpc_local *local, struct rxrpc_peer *peer) /* * obtain a remote transport endpoint for the specified address */ -struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local, +struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_sock *rx, + struct rxrpc_local *local, struct sockaddr_rxrpc *srx, gfp_t gfp) { struct rxrpc_peer *peer, *candidate; @@ -337,7 +343,7 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local, /* The peer is not yet present in hash - create a candidate * for a new record and then redo the search. */ - candidate = rxrpc_create_peer(local, srx, hash_key, gfp); + candidate = rxrpc_create_peer(rx, local, srx, hash_key, gfp); if (!candidate) { _leave(" = NULL [nomem]"); return NULL; From 2cfa2271604bb26e75b828d38f357ed084464795 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 5 Oct 2018 14:05:35 +0100 Subject: [PATCH 055/134] rxrpc: Fix the data_ready handler Fix the rxrpc_data_ready() function to pick up all packets and to not miss any. There are two problems: (1) The sk_data_ready pointer on the UDP socket is set *after* it is bound. This means that it's open for business before we're ready to dequeue packets and there's a tiny window exists in which a packet can sneak onto the receive queue, but we never know about it. Fix this by setting the pointers on the socket prior to binding it. (2) skb_recv_udp() will return an error (such as ENETUNREACH) if there was an error on the transmission side, even though we set the sk_error_report hook. Because rxrpc_data_ready() returns immediately in such a case, it never actually removes its packet from the receive queue. Fix this by abstracting out the UDP dequeuing and checksumming into a separate function that keeps hammering on skb_recv_udp() until it returns -EAGAIN, passing the packets extracted to the remainder of the function. and two potential problems: (3) It might be possible in some circumstances or in the future for packets to be being added to the UDP receive queue whilst rxrpc is running consuming them, so the data_ready() handler might get called less often than once per packet. Allow for this by fully draining the queue on each call as (2). (4) If a packet fails the checksum check, the code currently returns after discarding the packet without checking for more. Allow for this by fully draining the queue on each call as (2). Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells Acked-by: Paolo Abeni --- net/rxrpc/input.c | 68 ++++++++++++++++++++++------------------ net/rxrpc/local_object.c | 11 ++++--- 2 files changed, 44 insertions(+), 35 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index c5af9955665b..c3114fa66c92 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1121,7 +1121,7 @@ int rxrpc_extract_header(struct rxrpc_skb_priv *sp, struct sk_buff *skb) * shut down and the local endpoint from going away, thus sk_user_data will not * be cleared until this function returns. */ -void rxrpc_data_ready(struct sock *udp_sk) +void rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) { struct rxrpc_connection *conn; struct rxrpc_channel *chan; @@ -1130,39 +1130,11 @@ void rxrpc_data_ready(struct sock *udp_sk) struct rxrpc_local *local = udp_sk->sk_user_data; struct rxrpc_peer *peer = NULL; struct rxrpc_sock *rx = NULL; - struct sk_buff *skb; unsigned int channel; - int ret, skew = 0; + int skew = 0; _enter("%p", udp_sk); - ASSERT(!irqs_disabled()); - - skb = skb_recv_udp(udp_sk, 0, 1, &ret); - if (!skb) { - if (ret == -EAGAIN) - return; - _debug("UDP socket error %d", ret); - return; - } - - if (skb->tstamp == 0) - skb->tstamp = ktime_get_real(); - - rxrpc_new_skb(skb, rxrpc_skb_rx_received); - - _net("recv skb %p", skb); - - /* we'll probably need to checksum it (didn't call sock_recvmsg) */ - if (skb_checksum_complete(skb)) { - rxrpc_free_skb(skb, rxrpc_skb_rx_freed); - __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INERRORS, 0); - _leave(" [CSUM failed]"); - return; - } - - __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INDATAGRAMS, 0); - /* The UDP protocol already released all skb resources; * we are free to add our own data there. */ @@ -1181,6 +1153,8 @@ void rxrpc_data_ready(struct sock *udp_sk) } } + if (skb->tstamp == 0) + skb->tstamp = ktime_get_real(); trace_rxrpc_rx_packet(sp); switch (sp->hdr.type) { @@ -1398,3 +1372,37 @@ reject_packet: rxrpc_reject_packet(local, skb); _leave(" [badmsg]"); } + +void rxrpc_data_ready(struct sock *udp_sk) +{ + struct sk_buff *skb; + int ret; + + for (;;) { + skb = skb_recv_udp(udp_sk, 0, 1, &ret); + if (!skb) { + if (ret == -EAGAIN) + return; + + /* If there was a transmission failure, we get an error + * here that we need to ignore. + */ + _debug("UDP socket error %d", ret); + continue; + } + + rxrpc_new_skb(skb, rxrpc_skb_rx_received); + + /* we'll probably need to checksum it (didn't call sock_recvmsg) */ + if (skb_checksum_complete(skb)) { + rxrpc_free_skb(skb, rxrpc_skb_rx_freed); + __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INERRORS, 0); + _debug("csum failed"); + continue; + } + + __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INDATAGRAMS, 0); + + rxrpc_input_packet(udp_sk, skb); + } +} diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index 94d234e9c685..30862f44c9f1 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -122,6 +122,12 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net) return ret; } + /* set the socket up */ + sock = local->socket->sk; + sock->sk_user_data = local; + sock->sk_data_ready = rxrpc_data_ready; + sock->sk_error_report = rxrpc_error_report; + /* if a local address was supplied then bind it */ if (local->srx.transport_len > sizeof(sa_family_t)) { _debug("bind"); @@ -191,11 +197,6 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net) BUG(); } - /* set the socket up */ - sock = local->socket->sk; - sock->sk_user_data = local; - sock->sk_data_ready = rxrpc_data_ready; - sock->sk_error_report = rxrpc_error_report; _leave(" = 0"); return 0; From 05a2f54679861deb188750ba2a70187000b2c71f Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 18 Sep 2018 16:08:02 -0300 Subject: [PATCH 056/134] perf python: Use -Wno-redundant-decls to build with PYTHON=python3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When building in ClearLinux using 'make PYTHON=python3' with gcc 8.2.1 it fails with: GEN /tmp/build/perf/python/perf.so In file included from /usr/include/python3.7m/Python.h:126, from /git/linux/tools/perf/util/python.c:2: /usr/include/python3.7m/import.h:58:24: error: redundant redeclaration of ‘_PyImport_AddModuleObject’ [-Werror=redundant-decls] PyAPI_FUNC(PyObject *) _PyImport_AddModuleObject(PyObject *, PyObject *); ^~~~~~~~~~~~~~~~~~~~~~~~~ /usr/include/python3.7m/import.h:47:24: note: previous declaration of ‘_PyImport_AddModuleObject’ was here PyAPI_FUNC(PyObject *) _PyImport_AddModuleObject(PyObject *name, ^~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors error: command 'gcc' failed with exit status 1 And indeed there is a redundant declaration in that Python.h file, one with parameter names and the other without, so just add -Wno-error=redundant-decls to the python setup instructions. Now perf builds with gcc in ClearLinux with the following Dockerfile: # docker.io/acmel/linux-perf-tools-build-clearlinux:latest FROM docker.io/clearlinux:latest MAINTAINER Arnaldo Carvalho de Melo RUN swupd update && \ swupd bundle-add sysadmin-basic-dev RUN mkdir -m 777 -p /git /tmp/build/perf /tmp/build/objtool /tmp/build/linux && \ groupadd -r perfbuilder && \ useradd -m -r -g perfbuilder perfbuilder && \ chown -R perfbuilder.perfbuilder /tmp/build/ /git/ USER perfbuilder COPY rx_and_build.sh / ENV EXTRA_MAKE_ARGS=PYTHON=python3 ENTRYPOINT ["/rx_and_build.sh"] Now to figure out why the build fails with clang, that is present in the above container as detected by the rx_and_build.sh script: clang version 6.0.1 (tags/RELEASE_601/final) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/sbin make: Entering directory '/git/linux/tools/perf' BUILD: Doing 'make -j4' parallel build HOSTCC /tmp/build/perf/fixdep.o HOSTLD /tmp/build/perf/fixdep-in.o LINK /tmp/build/perf/fixdep Auto-detecting system features: ... dwarf: [ OFF ] ... dwarf_getlocations: [ OFF ] ... glibc: [ OFF ] ... gtk2: [ OFF ] ... libaudit: [ OFF ] ... libbfd: [ OFF ] ... libelf: [ OFF ] ... libnuma: [ OFF ] ... numa_num_possible_cpus: [ OFF ] ... libperl: [ OFF ] ... libpython: [ OFF ] ... libslang: [ OFF ] ... libcrypto: [ OFF ] ... libunwind: [ OFF ] ... libdw-dwarf-unwind: [ OFF ] ... zlib: [ OFF ] ... lzma: [ OFF ] ... get_cpuid: [ OFF ] ... bpf: [ OFF ] Makefile.config:331: *** No gnu/libc-version.h found, please install glibc-dev[el]. Stop. make[1]: *** [Makefile.perf:206: sub-make] Error 2 make: *** [Makefile:70: all] Error 2 make: Leaving directory '/git/linux/tools/perf' Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Thiago Macieira Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-c3khb9ac86s00qxzjrueomme@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/setup.py b/tools/perf/util/setup.py index 97efbcad076e..1942f6dd24f6 100644 --- a/tools/perf/util/setup.py +++ b/tools/perf/util/setup.py @@ -35,7 +35,7 @@ class install_lib(_install_lib): cflags = getenv('CFLAGS', '').split() # switch off several checks (need to be at the end of cflags list) -cflags += ['-fno-strict-aliasing', '-Wno-write-strings', '-Wno-unused-parameter' ] +cflags += ['-fno-strict-aliasing', '-Wno-write-strings', '-Wno-unused-parameter', '-Wno-redundant-decls' ] if cc != "clang": cflags += ['-Wno-cast-function-type' ] From 62165600ae73ebd76e2d9b992b36360408d570d8 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 5 Oct 2018 10:08:03 -0400 Subject: [PATCH 057/134] vsprintf: Fix off-by-one bug in bstr_printf() processing dereferenced pointers The functions vbin_printf() and bstr_printf() are used by trace_printk() to try to keep the overhead down during printing. trace_printk() uses vbin_printf() at the time of execution, as it only scans the fmt string to record the printf values into the buffer, and then uses vbin_printf() to do the conversions to print the string based on the format and the saved values in the buffer. This is an issue for dereferenced pointers, as before commit 841a915d20c7b, the processing of the pointer could happen some time after the pointer value was recorded (reading the trace buffer). This means the processing of the value at a later time could show different results, or even crash the system, if the pointer no longer existed. Commit 841a915d20c7b addressed this by processing dereferenced pointers at the time of execution and save the result in the ring buffer as a string. The bstr_printf() would then treat these pointers as normal strings, and print the value. But there was an off-by-one bug here, where after processing the argument, it move the pointer only "strlen(arg)" which made the arg pointer not point to the next argument in the ring buffer, but instead point to the nul character of the last argument. This causes any values after a dereferenced pointer to be corrupted. Cc: stable@vger.kernel.org Fixes: 841a915d20c7b ("vsprintf: Do not have bprintf dereference pointers") Reported-by: Nikolay Borisov Tested-by: Nikolay Borisov Signed-off-by: Steven Rostedt (VMware) --- lib/vsprintf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/vsprintf.c b/lib/vsprintf.c index d5b3a3f95c01..812e59e13fe6 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -2794,7 +2794,7 @@ int bstr_printf(char *buf, size_t size, const char *fmt, const u32 *bin_buf) copy = end - str; memcpy(str, args, copy); str += len; - args += len; + args += len + 1; } } if (process) From 7a8a8fcf7b860e4b2d4edc787c844d41cad9dfcf Mon Sep 17 00:00:00 2001 From: Milian Wolff Date: Wed, 26 Sep 2018 15:52:06 +0200 Subject: [PATCH 058/134] perf record: Use unmapped IP for inline callchain cursors Only use the mapped IP to find inline frames, but keep using the unmapped IP for the callchain cursor. This ensures we properly show the unmapped IP when displaying a frame we received via the dso__parse_addr_inlines API for a module which does not contain sufficient debug symbols to show the srcline. This is another follow-up to commit 19610184693c ("perf script: Show virtual addresses instead of offsets"). Signed-off-by: Milian Wolff Acked-by: Jiri Olsa Tested-by: Ravi Bangoria Tested-by: Arnaldo Carvalho de Melo Cc: Jin Yao Cc: Namhyung Kim Cc: Sandipan Das Fixes: 19610184693c ("perf script: Show virtual addresses instead of offsets") Link: http://lkml.kernel.org/r/20180926135207.30263-2-milian.wolff@kdab.com Link: http://lkml.kernel.org/r/20181002073949.3297-1-milian.wolff@kdab.com [ Squashed a fix from Milian for a problem reported by Ravi, fixed up space damage ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/machine.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 0cb4f8bf3ca7..111ae858cbcb 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2286,7 +2286,8 @@ static int append_inlines(struct callchain_cursor *cursor, if (!symbol_conf.inline_name || !map || !sym) return ret; - addr = map__rip_2objdump(map, ip); + addr = map__map_ip(map, ip); + addr = map__rip_2objdump(map, addr); inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr); if (!inline_node) { From 148b9aba99e0bbadf361747d21456e1589015f74 Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Tue, 2 Oct 2018 12:50:11 +0100 Subject: [PATCH 059/134] MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression Fix a commit 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset") regression and remove assembly warnings: arch/mips/lib/memset.S: Assembler messages: arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot triggering with the CPU_DADDI_WORKAROUNDS option set and this code: PTR_SUBU a2, t1, a0 jr ra PTR_ADDIU a2, 1 This is because with that option in place the DADDIU instruction, which the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which in turn expands to an LI/DADDU (or actually ADDIU/DADDU) sequence: 13c: 01a4302f dsubu a2,t1,a0 140: 03e00008 jr ra 144: 24010001 li at,1 148: 00c1302d daddu a2,a2,at ... Correct this by switching off the `noreorder' assembly mode and letting GAS schedule this jump's delay slot, as there is nothing special about it that would require manual scheduling. With this change in place correct code is produced: 13c: 01a4302f dsubu a2,t1,a0 140: 24010001 li at,1 144: 03e00008 jr ra 148: 00c1302d daddu a2,a2,at ... Signed-off-by: Maciej W. Rozycki Signed-off-by: Paul Burton Fixes: 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset") Patchwork: https://patchwork.linux-mips.org/patch/20833/ Cc: Ralf Baechle Cc: stable@vger.kernel.org # 4.17+ --- arch/mips/lib/memset.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S index 3a6f34ef5ffc..069acec3df9f 100644 --- a/arch/mips/lib/memset.S +++ b/arch/mips/lib/memset.S @@ -280,9 +280,11 @@ * unset_bytes = end_addr - current_addr + 1 * a2 = t1 - a0 + 1 */ + .set reorder PTR_SUBU a2, t1, a0 + PTR_ADDIU a2, 1 jr ra - PTR_ADDIU a2, 1 + .set noreorder .endm From 36d2582ff235b4e01ad64a734c877a52dc762d9c Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Thu, 4 Oct 2018 17:45:54 -0700 Subject: [PATCH 060/134] Input: evdev - add a schedule point in evdev_write() Large writes to evdev interface may cause rcu stalls. Let's add cond_resched() to the loop to avoid this. Reviewed-by: Paul E. McKenney Signed-off-by: Dmitry Torokhov --- drivers/input/evdev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c index 370206f987f9..f48369d6f3a0 100644 --- a/drivers/input/evdev.c +++ b/drivers/input/evdev.c @@ -564,6 +564,7 @@ static ssize_t evdev_write(struct file *file, const char __user *buffer, input_inject_event(&evdev->handle, event.type, event.code, event.value); + cond_resched(); } out: From cecf10704899467a787975e3d94a1f0129b9688e Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Thu, 4 Oct 2018 17:50:48 -0700 Subject: [PATCH 061/134] Input: uinput - add a schedule point in uinput_inject_events() Large writes to uinput interface may cause rcu stalls. Let's add cond_resched() to the loop to avoid this. Reviewed-by: Paul E. McKenney Signed-off-by: Dmitry Torokhov --- drivers/input/misc/uinput.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c index eb14ddf69346..8ec483e8688b 100644 --- a/drivers/input/misc/uinput.c +++ b/drivers/input/misc/uinput.c @@ -598,6 +598,7 @@ static ssize_t uinput_inject_events(struct uinput_device *udev, input_event(udev->dev, ev.type, ev.code, ev.value); bytes += input_event_size(); + cond_resched(); } return bytes; From c58a584f05e35d1d4342923cd7aac07d9c3d3d16 Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Fri, 5 Oct 2018 12:48:48 -0700 Subject: [PATCH 062/134] ARC: clone syscall to setp r25 as thread pointer Per ARC TLS ABI, r25 is designated TP (thread pointer register). However so far kernel didn't do any special treatment, like setting up usermode r25, even for CLONE_SETTLS. We instead relied on libc runtime to do this, in say clone libc wrapper [1]. This was deliberate to keep kernel ABI agnostic (userspace could potentially change TP, specially for different ARC ISA say ARCompact vs. ARCv2 with different spare registers etc) However userspace setting up r25, after clone syscall opens a race, if child is not scheduled and gets a signal instead. It starts off in userspace not in clone but in a signal handler and anything TP sepcific there such as pthread_self() fails which showed up with uClibc testsuite nptl/tst-kill6 [2] Fix this by having kernel populate r25 to TP value. So this locks in ABI, but it was not going to change anyways, and fwiw is same for both ARCompact (arc700 core) and ARCvs (HS3x cores) [1] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/arc/clone.S [2] https://github.com/wbx-github/uclibc-ng-test/blob/master/test/nptl/tst-kill6.c Fixes: ARC STAR 9001378481 Cc: stable@vger.kernel.org Reported-by: Nikita Sobolev Signed-off-by: Vineet Gupta --- arch/arc/kernel/process.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 4674541eba3f..8ce6e7235915 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -241,6 +241,26 @@ int copy_thread(unsigned long clone_flags, task_thread_info(current)->thr_ptr; } + + /* + * setup usermode thread pointer #1: + * when child is picked by scheduler, __switch_to() uses @c_callee to + * populate usermode callee regs: this works (despite being in a kernel + * function) since special return path for child @ret_from_fork() + * ensures those regs are not clobbered all the way to RTIE to usermode + */ + c_callee->r25 = task_thread_info(p)->thr_ptr; + +#ifdef CONFIG_ARC_CURR_IN_REG + /* + * setup usermode thread pointer #2: + * however for this special use of r25 in kernel, __switch_to() sets + * r25 for kernel needs and only in the final return path is usermode + * r25 setup, from pt_regs->user_r25. So set that up as well + */ + c_regs->user_r25 = c_callee->r25; +#endif + return 0; } From 329e09893909d409039f6a79757d9b80b67efe39 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 5 Oct 2018 16:21:46 -0700 Subject: [PATCH 063/134] treewide: Replace more open-coded allocation size multiplications As done treewide earlier, this catches several more open-coded allocation size calculations that were added to the kernel during the merge window. This performs the following mechanical transformations using Coccinelle: kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...) kvzalloc(a * b, ...) -> kvcalloc(a, b, ...) devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...) Signed-off-by: Kees Cook --- drivers/bluetooth/hci_qca.c | 2 +- drivers/crypto/inside-secure/safexcel.c | 8 +++++--- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 2 +- drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c | 4 ++-- drivers/hwmon/npcm750-pwm-fan.c | 2 +- drivers/md/dm-integrity.c | 3 ++- drivers/net/wireless/mediatek/mt76/usb.c | 10 +++++----- drivers/pci/controller/pcie-cadence.c | 4 ++-- drivers/tty/serial/qcom_geni_serial.c | 4 ++-- net/sched/sch_cake.c | 2 +- 10 files changed, 22 insertions(+), 19 deletions(-) diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index e182f6019f68..2fee65886d50 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -1322,7 +1322,7 @@ static int qca_init_regulators(struct qca_power *qca, { int i; - qca->vreg_bulk = devm_kzalloc(qca->dev, num_vregs * + qca->vreg_bulk = devm_kcalloc(qca->dev, num_vregs, sizeof(struct regulator_bulk_data), GFP_KERNEL); if (!qca->vreg_bulk) diff --git a/drivers/crypto/inside-secure/safexcel.c b/drivers/crypto/inside-secure/safexcel.c index 7e71043457a6..86c699c14f84 100644 --- a/drivers/crypto/inside-secure/safexcel.c +++ b/drivers/crypto/inside-secure/safexcel.c @@ -1044,7 +1044,8 @@ static int safexcel_probe(struct platform_device *pdev) safexcel_configure(priv); - priv->ring = devm_kzalloc(dev, priv->config.rings * sizeof(*priv->ring), + priv->ring = devm_kcalloc(dev, priv->config.rings, + sizeof(*priv->ring), GFP_KERNEL); if (!priv->ring) { ret = -ENOMEM; @@ -1063,8 +1064,9 @@ static int safexcel_probe(struct platform_device *pdev) if (ret) goto err_reg_clk; - priv->ring[i].rdr_req = devm_kzalloc(dev, - sizeof(priv->ring[i].rdr_req) * EIP197_DEFAULT_RING_SIZE, + priv->ring[i].rdr_req = devm_kcalloc(dev, + EIP197_DEFAULT_RING_SIZE, + sizeof(priv->ring[i].rdr_req), GFP_KERNEL); if (!priv->ring[i].rdr_req) { ret = -ENOMEM; diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c index 0b976dfd04df..92ecb9bf982c 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c @@ -600,7 +600,7 @@ int mtk_drm_crtc_create(struct drm_device *drm_dev, } mtk_crtc->layer_nr = mtk_ddp_comp_layer_nr(mtk_crtc->ddp_comp[0]); - mtk_crtc->planes = devm_kzalloc(dev, mtk_crtc->layer_nr * + mtk_crtc->planes = devm_kcalloc(dev, mtk_crtc->layer_nr, sizeof(struct drm_plane), GFP_KERNEL); diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c index 790d39f816dc..b557687b1964 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c @@ -153,8 +153,8 @@ int msm_dss_parse_clock(struct platform_device *pdev, return 0; } - mp->clk_config = devm_kzalloc(&pdev->dev, - sizeof(struct dss_clk) * num_clk, + mp->clk_config = devm_kcalloc(&pdev->dev, + num_clk, sizeof(struct dss_clk), GFP_KERNEL); if (!mp->clk_config) return -ENOMEM; diff --git a/drivers/hwmon/npcm750-pwm-fan.c b/drivers/hwmon/npcm750-pwm-fan.c index 8474d601aa63..b998f9fbed41 100644 --- a/drivers/hwmon/npcm750-pwm-fan.c +++ b/drivers/hwmon/npcm750-pwm-fan.c @@ -908,7 +908,7 @@ static int npcm7xx_en_pwm_fan(struct device *dev, if (fan_cnt < 1) return -EINVAL; - fan_ch = devm_kzalloc(dev, sizeof(*fan_ch) * fan_cnt, GFP_KERNEL); + fan_ch = devm_kcalloc(dev, fan_cnt, sizeof(*fan_ch), GFP_KERNEL); if (!fan_ch) return -ENOMEM; diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c index 378878599466..244ff5580f82 100644 --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -3462,7 +3462,8 @@ try_smaller_buffer: r = -ENOMEM; goto bad; } - ic->recalc_tags = kvmalloc((RECALC_SECTORS >> ic->sb->log2_sectors_per_block) * ic->tag_size, GFP_KERNEL); + ic->recalc_tags = kvmalloc_array(RECALC_SECTORS >> ic->sb->log2_sectors_per_block, + ic->tag_size, GFP_KERNEL); if (!ic->recalc_tags) { ti->error = "Cannot allocate tags for recalculating"; r = -ENOMEM; diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c index 7780b07543bb..79e59f2379a2 100644 --- a/drivers/net/wireless/mediatek/mt76/usb.c +++ b/drivers/net/wireless/mediatek/mt76/usb.c @@ -258,7 +258,7 @@ int mt76u_buf_alloc(struct mt76_dev *dev, struct mt76u_buf *buf, if (!buf->urb) return -ENOMEM; - buf->urb->sg = devm_kzalloc(dev->dev, nsgs * sizeof(*buf->urb->sg), + buf->urb->sg = devm_kcalloc(dev->dev, nsgs, sizeof(*buf->urb->sg), gfp); if (!buf->urb->sg) return -ENOMEM; @@ -464,8 +464,8 @@ static int mt76u_alloc_rx(struct mt76_dev *dev) int i, err, nsgs; spin_lock_init(&q->lock); - q->entry = devm_kzalloc(dev->dev, - MT_NUM_RX_ENTRIES * sizeof(*q->entry), + q->entry = devm_kcalloc(dev->dev, + MT_NUM_RX_ENTRIES, sizeof(*q->entry), GFP_KERNEL); if (!q->entry) return -ENOMEM; @@ -717,8 +717,8 @@ static int mt76u_alloc_tx(struct mt76_dev *dev) INIT_LIST_HEAD(&q->swq); q->hw_idx = q2hwq(i); - q->entry = devm_kzalloc(dev->dev, - MT_NUM_TX_ENTRIES * sizeof(*q->entry), + q->entry = devm_kcalloc(dev->dev, + MT_NUM_TX_ENTRIES, sizeof(*q->entry), GFP_KERNEL); if (!q->entry) return -ENOMEM; diff --git a/drivers/pci/controller/pcie-cadence.c b/drivers/pci/controller/pcie-cadence.c index 86f1b002c846..975bcdd6b5c0 100644 --- a/drivers/pci/controller/pcie-cadence.c +++ b/drivers/pci/controller/pcie-cadence.c @@ -180,11 +180,11 @@ int cdns_pcie_init_phy(struct device *dev, struct cdns_pcie *pcie) return 0; } - phy = devm_kzalloc(dev, sizeof(*phy) * phy_count, GFP_KERNEL); + phy = devm_kcalloc(dev, phy_count, sizeof(*phy), GFP_KERNEL); if (!phy) return -ENOMEM; - link = devm_kzalloc(dev, sizeof(*link) * phy_count, GFP_KERNEL); + link = devm_kcalloc(dev, phy_count, sizeof(*link), GFP_KERNEL); if (!link) return -ENOMEM; diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c index 29ec34387246..1515074e18fb 100644 --- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -868,8 +868,8 @@ static int qcom_geni_serial_port_setup(struct uart_port *uport) geni_se_init(&port->se, port->rx_wm, port->rx_rfr); geni_se_select_mode(&port->se, port->xfer_mode); if (!uart_console(uport)) { - port->rx_fifo = devm_kzalloc(uport->dev, - port->rx_fifo_depth * sizeof(u32), GFP_KERNEL); + port->rx_fifo = devm_kcalloc(uport->dev, + port->rx_fifo_depth, sizeof(u32), GFP_KERNEL); if (!port->rx_fifo) return -ENOMEM; } diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index c07c30b916d5..793016d722ec 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -2644,7 +2644,7 @@ static int cake_init(struct Qdisc *sch, struct nlattr *opt, for (i = 1; i <= CAKE_QUEUES; i++) quantum_div[i] = 65535 / i; - q->tins = kvzalloc(CAKE_MAX_TINS * sizeof(struct cake_tin_data), + q->tins = kvcalloc(CAKE_MAX_TINS, sizeof(struct cake_tin_data), GFP_KERNEL); if (!q->tins) goto nomem; From dceeb47b0ed65e14de53507a8a9c32a90831cfa1 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Sat, 6 Oct 2018 11:44:19 +1000 Subject: [PATCH 064/134] xfs: fix data corruption w/ unaligned dedupe ranges A deduplication data corruption is Exposed by fstests generic/505 on XFS. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only made to the end of the source file. This corrupts the destination file when the source extent is shared with it. XFS only supports whole block dedupe, but we still need to appear to support whole file dedupe correctly. Hence if the dedupe request includes the last block of the souce file, don't include it in the actual XFS dedupe operation. If the rest of the range dedupes successfully, then report the partial last block as deduped, too, so that userspace sees it as a successful dedupe rather than return EINVAL because we can't dedupe unaligned blocks. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner --- fs/xfs/xfs_reflink.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 59da9708e9c1..f889398e25d6 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1265,6 +1265,19 @@ xfs_reflink_zero_posteof( * will have the iolock and mmaplock held, the page cache of the out file * will be truncated, and any leases on the out file will have been broken. * This function borrows heavily from xfs_file_aio_write_checks. + * + * The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't + * checked that the bytes beyond EOF physically match. Hence we cannot use the + * EOF block in the source dedupe range because it's not a complete block match, + * hence can introduce a corruption into the file that has it's + * block replaced. + * + * Despite this issue, we still need to report that range as successfully + * deduped to avoid confusing userspace with EINVAL errors on completely + * matching file data. The only time that an unaligned length will be passed to + * us is when it spans the EOF block of the source file, so if we simply mask it + * down to be block aligned here the we will dedupe everything but that partial + * EOF block. */ STATIC int xfs_reflink_remap_prep( @@ -1307,6 +1320,14 @@ xfs_reflink_remap_prep( if (ret <= 0) goto out_unlock; + /* + * If the dedupe data matches, chop off the partial EOF block + * from the source file so we don't try to dedupe the partial + * EOF block. + */ + if (is_dedupe) + *len &= ~((u64)i_blocksize(inode_in) - 1); + /* Attach dquots to dest inode before changing block map */ ret = xfs_qm_dqattach(dest); if (ret) From b39989009bdb84992915c9869f58094ed5becf10 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Sat, 6 Oct 2018 11:44:39 +1000 Subject: [PATCH 065/134] xfs: fix data corruption w/ unaligned reflink ranges When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. XFS only supports whole block sharing, but we still need to support whole file reflink correctly. Hence if the reflink request includes the last block of the souce file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. This avoids the data corruption vector, but also avoids disruption of returning EINVAL to userspace for the common case of whole file cloning. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner --- fs/xfs/xfs_reflink.c | 47 ++++++++++++++++++++++++++++++++------------ 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index f889398e25d6..42ea7bab9144 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1262,22 +1262,32 @@ xfs_reflink_zero_posteof( /* * Prepare two files for range cloning. Upon a successful return both inodes - * will have the iolock and mmaplock held, the page cache of the out file - * will be truncated, and any leases on the out file will have been broken. - * This function borrows heavily from xfs_file_aio_write_checks. + * will have the iolock and mmaplock held, the page cache of the out file will + * be truncated, and any leases on the out file will have been broken. This + * function borrows heavily from xfs_file_aio_write_checks. * * The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't * checked that the bytes beyond EOF physically match. Hence we cannot use the * EOF block in the source dedupe range because it's not a complete block match, - * hence can introduce a corruption into the file that has it's - * block replaced. + * hence can introduce a corruption into the file that has it's block replaced. * - * Despite this issue, we still need to report that range as successfully - * deduped to avoid confusing userspace with EINVAL errors on completely - * matching file data. The only time that an unaligned length will be passed to - * us is when it spans the EOF block of the source file, so if we simply mask it - * down to be block aligned here the we will dedupe everything but that partial - * EOF block. + * In similar fashion, the VFS file cloning also allows partial EOF blocks to be + * "block aligned" for the purposes of cloning entire files. However, if the + * source file range includes the EOF block and it lands within the existing EOF + * of the destination file, then we can expose stale data from beyond the source + * file EOF in the destination file. + * + * XFS doesn't support partial block sharing, so in both cases we have check + * these cases ourselves. For dedupe, we can simply round the length to dedupe + * down to the previous whole block and ignore the partial EOF block. While this + * means we can't dedupe the last block of a file, this is an acceptible + * tradeoff for simplicity on implementation. + * + * For cloning, we want to share the partial EOF block if it is also the new EOF + * block of the destination file. If the partial EOF block lies inside the + * existing destination EOF, then we have to abort the clone to avoid exposing + * stale data in the destination file. Hence we reject these clone attempts with + * -EINVAL in this case. */ STATIC int xfs_reflink_remap_prep( @@ -1293,6 +1303,7 @@ xfs_reflink_remap_prep( struct inode *inode_out = file_inode(file_out); struct xfs_inode *dest = XFS_I(inode_out); bool same_inode = (inode_in == inode_out); + u64 blkmask = i_blocksize(inode_in) - 1; ssize_t ret; /* Lock both files against IO */ @@ -1325,8 +1336,18 @@ xfs_reflink_remap_prep( * from the source file so we don't try to dedupe the partial * EOF block. */ - if (is_dedupe) - *len &= ~((u64)i_blocksize(inode_in) - 1); + if (is_dedupe) { + *len &= ~blkmask; + } else if (*len & blkmask) { + /* + * The user is attempting to share a partial EOF block, + * if it's inside the destination EOF then reject it. + */ + if (pos_out + *len < i_size_read(inode_out)) { + ret = -EINVAL; + goto out_unlock; + } + } /* Attach dquots to dest inode before changing block map */ ret = xfs_qm_dqattach(dest); From 0238df646e6224016a45505d2c111a24669ebe21 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 7 Oct 2018 17:26:02 +0200 Subject: [PATCH 066/134] Linux 4.19-rc7 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 6c3da3e10f07..9b2df076885a 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 4 PATCHLEVEL = 19 SUBLEVEL = 0 -EXTRAVERSION = -rc6 +EXTRAVERSION = -rc7 NAME = Merciless Moray # *DOCUMENTATION* From 6685b357363bfe295e3ae73665014db4aed62c58 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Sun, 7 Oct 2018 11:31:51 +0300 Subject: [PATCH 067/134] percpu: stop leaking bitmap metadata blocks The commit ca460b3c9627 ("percpu: introduce bitmap metadata blocks") introduced bitmap metadata blocks. These metadata blocks are allocated whenever a new chunk is created, but they are never freed. Fix it. Fixes: ca460b3c9627 ("percpu: introduce bitmap metadata blocks") Signed-off-by: Mike Rapoport Cc: stable@vger.kernel.org Signed-off-by: Dennis Zhou --- mm/percpu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/percpu.c b/mm/percpu.c index a749d4d96e3e..4b90682623e9 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1212,6 +1212,7 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk) { if (!chunk) return; + pcpu_mem_free(chunk->md_blocks); pcpu_mem_free(chunk->bound_map); pcpu_mem_free(chunk->alloc_map); pcpu_mem_free(chunk); From 7e823644b60555f70f241274b8d0120dd919269a Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Thu, 4 Oct 2018 13:37:32 +0200 Subject: [PATCH 068/134] udp: Unbreak modules that rely on external __skb_recv_udp() availability Commit 2276f58ac589 ("udp: use a separate rx queue for packet reception") turned static inline __skb_recv_udp() from being a trivial helper around __skb_recv_datagram() into a UDP specific implementaion, making it EXPORT_SYMBOL_GPL() at the same time. There are external modules that got broken by __skb_recv_udp() not being visible to them. Let's unbreak them by making __skb_recv_udp EXPORT_SYMBOL(). Rationale (one of those) why this is actually "technically correct" thing to do: __skb_recv_udp() used to be an inline wrapper around __skb_recv_datagram(), which itself (still, and correctly so, I believe) is EXPORT_SYMBOL(). Cc: Paolo Abeni Cc: Eric Dumazet Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Signed-off-by: Jiri Kosina Signed-off-by: David S. Miller --- net/ipv4/udp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 7d69dd6fa7e8..c32a4c16b7ff 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1627,7 +1627,7 @@ busy_check: *err = error; return NULL; } -EXPORT_SYMBOL_GPL(__skb_recv_udp); +EXPORT_SYMBOL(__skb_recv_udp); /* * This should be easy, if there is something there we From 6d4c407744dd0338da5d5d76f40dce5adabfb30a Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 7 Oct 2018 07:40:17 -0400 Subject: [PATCH 069/134] net: sched: cls_u32: fix hnode refcounting cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references via ->hlist and via ->tp_root together. u32_destroy() drops the former and, in case when there had been links, leaves the sucker on the list. As the result, there's nothing to protect it from getting freed once links are dropped. That also makes the "is it busy" check incapable of catching the root hnode - it *is* busy (there's a reference from tp), but we don't see it as something separate. "Is it our root?" check partially covers that, but the problem exists for others' roots as well. AFAICS, the minimal fix preserving the existing behaviour (where it doesn't include oopsen, that is) would be this: * count tp->root and tp_c->hlist as separate references. I.e. have u32_init() set refcount to 2, not 1. * in u32_destroy() we always drop the former; in u32_destroy_hnode() - the latter. That way we have *all* references contributing to refcount. List removal happens in u32_destroy_hnode() (called only when ->refcnt is 1) an in u32_destroy() in case of tc_u_common going away, along with everything reachable from it. IOW, that way we know that u32_destroy_key() won't free something still on the list (or pointed to by someone's ->root). Reproducer: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: \ u32 divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: \ u32 divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 100 \ handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 \ plus 0 eat match ip protocol 6 ff tc filter delete dev eth0 parent ffff: protocol ip prio 200 tc filter change dev eth0 parent ffff: protocol ip prio 100 \ handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 \ eat match ip protocol 6 ff tc filter delete dev eth0 parent ffff: protocol ip prio 100 Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller --- net/sched/cls_u32.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index f218ccf1e2d9..b2c3406a2cf2 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -398,6 +398,7 @@ static int u32_init(struct tcf_proto *tp) rcu_assign_pointer(tp_c->hlist, root_ht); root_ht->tp_c = tp_c; + root_ht->refcnt++; rcu_assign_pointer(tp->root, root_ht); tp->data = tp_c; return 0; @@ -610,7 +611,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, struct tc_u_hnode __rcu **hn; struct tc_u_hnode *phn; - WARN_ON(ht->refcnt); + WARN_ON(--ht->refcnt); u32_clear_hnode(tp, ht, extack); @@ -649,7 +650,7 @@ static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack) WARN_ON(root_ht == NULL); - if (root_ht && --root_ht->refcnt == 0) + if (root_ht && --root_ht->refcnt == 1) u32_destroy_hnode(tp, root_ht, extack); if (--tp_c->refcnt == 0) { @@ -698,7 +699,6 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last, } if (ht->refcnt == 1) { - ht->refcnt--; u32_destroy_hnode(tp, ht, extack); } else { NL_SET_ERR_MSG_MOD(extack, "Can not delete in-use filter"); @@ -708,11 +708,11 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last, out: *last = true; if (root_ht) { - if (root_ht->refcnt > 1) { + if (root_ht->refcnt > 2) { *last = false; goto ret; } - if (root_ht->refcnt == 1) { + if (root_ht->refcnt == 2) { if (!ht_empty(root_ht)) { *last = false; goto ret; From a21048c8ec7caf4def353b00b75bf75535deba80 Mon Sep 17 00:00:00 2001 From: Eugene Syromiatnikov Date: Sun, 7 Oct 2018 16:57:31 +0200 Subject: [PATCH 070/134] net/smc: use __aligned_u64 for 64-bit smc_diag fields Commit 4b1b7d3b30a6 ("net/smc: add SMC-D diag support") introduced new UAPI-exposed structure, struct smcd_diag_dmbinfo. However, it's not usable by compat binaries, as it has different layout there. Probably, the most straightforward fix that will avoid similar issues in the future is to use __aligned_u64 for 64-bit fields. Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support") Signed-off-by: Eugene Syromiatnikov Signed-off-by: David S. Miller --- include/uapi/linux/smc_diag.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index ac9e8c96d9bd..6180c6d95309 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -18,14 +18,14 @@ struct smc_diag_req { * on the internal clcsock, and more SMC-related socket data */ struct smc_diag_msg { - __u8 diag_family; - __u8 diag_state; - __u8 diag_mode; - __u8 diag_shutdown; + __u8 diag_family; + __u8 diag_state; + __u8 diag_mode; + __u8 diag_shutdown; struct inet_diag_sockid id; - __u32 diag_uid; - __u64 diag_inode; + __u32 diag_uid; + __aligned_u64 diag_inode; }; /* Mode of a connection */ @@ -99,11 +99,11 @@ struct smc_diag_fallback { }; struct smcd_diag_dmbinfo { /* SMC-D Socket internals */ - __u32 linkid; /* Link identifier */ - __u64 peer_gid; /* Peer GID */ - __u64 my_gid; /* My GID */ - __u64 token; /* Token of DMB */ - __u64 peer_token; /* Token of remote DMBE */ + __u32 linkid; /* Link identifier */ + __aligned_u64 peer_gid; /* Peer GID */ + __aligned_u64 my_gid; /* My GID */ + __aligned_u64 token; /* Token of DMB */ + __aligned_u64 peer_token; /* Token of remote DMBE */ }; #endif /* _UAPI_SMC_DIAG_H_ */ From d4f0006a08f52b5320f038780286ef312535fc64 Mon Sep 17 00:00:00 2001 From: Eugene Syromiatnikov Date: Sun, 7 Oct 2018 16:57:37 +0200 Subject: [PATCH 071/134] net/smc: retain old name for diag_mode field Commit c601171d7a60 ("net/smc: provide smc mode in smc_diag.c") changed the name of diag_fallback field of struct smc_diag_msg structure to diag_mode. However, this structure is a part of UAPI, and this change breaks user space applications that use it ([1], for example). Since the new name is more suitable, convert the field to a union that provides access to the data via both the new and the old name. [1] https://gitlab.com/strace/strace/blob/v4.24/netlink_smc_diag.c#L165 Fixes: c601171d7a60 ("net/smc: provide smc mode in smc_diag.c") Signed-off-by: Eugene Syromiatnikov Signed-off-by: David S. Miller --- include/uapi/linux/smc_diag.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/smc_diag.h b/include/uapi/linux/smc_diag.h index 6180c6d95309..8cb3a6fef553 100644 --- a/include/uapi/linux/smc_diag.h +++ b/include/uapi/linux/smc_diag.h @@ -20,7 +20,10 @@ struct smc_diag_req { struct smc_diag_msg { __u8 diag_family; __u8 diag_state; - __u8 diag_mode; + union { + __u8 diag_mode; + __u8 diag_fallback; /* the old name of the field */ + }; __u8 diag_shutdown; struct inet_diag_sockid id; From 76ebebd2464c5c8a4453c98b6dbf9c95a599e810 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Fri, 17 Aug 2018 15:19:37 -0400 Subject: [PATCH 072/134] mach64: detect the dot clock divider correctly on sparc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On Sun Ultra 5, it happens that the dot clock is not set up properly for some videomodes. For example, if we set the videomode "r1024x768x60" in the firmware, Linux would incorrectly set a videomode with refresh rate 180Hz when booting (suprisingly, my LCD monitor can display it, although display quality is very low). The reason is this: Older mach64 cards set the divider in the register VCLK_POST_DIV. The register has four 2-bit fields (the field that is actually used is specified in the lowest two bits of the register CLOCK_CNTL). The 2 bits select divider "1, 2, 4, 8". On newer mach64 cards, there's another bit added - the top four bits of PLL_EXT_CNTL extend the divider selection, so we have possible dividers "1, 2, 4, 8, 3, 5, 6, 12". The Linux driver clears the top four bits of PLL_EXT_CNTL and never sets them, so it can work regardless if the card supports them. However, the sparc64 firmware may set these extended dividers during boot - and the mach64 driver detects incorrect dot clock in this case. This patch makes the driver read the additional divider bit from PLL_EXT_CNTL and calculate the initial refresh rate properly. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Acked-by: David S. Miller Reviewed-by: Ville Syrjälä Signed-off-by: David S. Miller --- drivers/video/fbdev/aty/atyfb.h | 3 ++- drivers/video/fbdev/aty/atyfb_base.c | 7 ++++--- drivers/video/fbdev/aty/mach64_ct.c | 10 +++++----- 3 files changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h index 8235b285dbb2..d09bab3bf224 100644 --- a/drivers/video/fbdev/aty/atyfb.h +++ b/drivers/video/fbdev/aty/atyfb.h @@ -333,6 +333,8 @@ extern const struct aty_pll_ops aty_pll_ct; /* Integrated */ extern void aty_set_pll_ct(const struct fb_info *info, const union aty_pll *pll); extern u8 aty_ld_pll_ct(int offset, const struct atyfb_par *par); +extern const u8 aty_postdividers[8]; + /* * Hardware cursor support @@ -359,7 +361,6 @@ static inline void wait_for_idle(struct atyfb_par *par) extern void aty_reset_engine(const struct atyfb_par *par); extern void aty_init_engine(struct atyfb_par *par, struct fb_info *info); -extern u8 aty_ld_pll_ct(int offset, const struct atyfb_par *par); void atyfb_copyarea(struct fb_info *info, const struct fb_copyarea *area); void atyfb_fillrect(struct fb_info *info, const struct fb_fillrect *rect); diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c index a9a8272f7a6e..05111e90f168 100644 --- a/drivers/video/fbdev/aty/atyfb_base.c +++ b/drivers/video/fbdev/aty/atyfb_base.c @@ -3087,17 +3087,18 @@ static int atyfb_setup_sparc(struct pci_dev *pdev, struct fb_info *info, /* * PLL Reference Divider M: */ - M = pll_regs[2]; + M = pll_regs[PLL_REF_DIV]; /* * PLL Feedback Divider N (Dependent on CLOCK_CNTL): */ - N = pll_regs[7 + (clock_cntl & 3)]; + N = pll_regs[VCLK0_FB_DIV + (clock_cntl & 3)]; /* * PLL Post Divider P (Dependent on CLOCK_CNTL): */ - P = 1 << (pll_regs[6] >> ((clock_cntl & 3) << 1)); + P = aty_postdividers[((pll_regs[VCLK_POST_DIV] >> ((clock_cntl & 3) << 1)) & 3) | + ((pll_regs[PLL_EXT_CNTL] >> (2 + (clock_cntl & 3))) & 4)]; /* * PLL Divider Q: diff --git a/drivers/video/fbdev/aty/mach64_ct.c b/drivers/video/fbdev/aty/mach64_ct.c index 74a62aa193c0..f87cc81f4fa2 100644 --- a/drivers/video/fbdev/aty/mach64_ct.c +++ b/drivers/video/fbdev/aty/mach64_ct.c @@ -115,7 +115,7 @@ static void aty_st_pll_ct(int offset, u8 val, const struct atyfb_par *par) */ #define Maximum_DSP_PRECISION 7 -static u8 postdividers[] = {1,2,4,8,3}; +const u8 aty_postdividers[8] = {1,2,4,8,3,5,6,12}; static int aty_dsp_gt(const struct fb_info *info, u32 bpp, struct pll_ct *pll) { @@ -222,7 +222,7 @@ static int aty_valid_pll_ct(const struct fb_info *info, u32 vclk_per, struct pll pll->vclk_post_div += (q < 64*8); pll->vclk_post_div += (q < 32*8); } - pll->vclk_post_div_real = postdividers[pll->vclk_post_div]; + pll->vclk_post_div_real = aty_postdividers[pll->vclk_post_div]; // pll->vclk_post_div <<= 6; pll->vclk_fb_div = q * pll->vclk_post_div_real / 8; pllvclk = (1000000 * 2 * pll->vclk_fb_div) / @@ -513,7 +513,7 @@ static int aty_init_pll_ct(const struct fb_info *info, union aty_pll *pll) u8 mclk_fb_div, pll_ext_cntl; pll->ct.pll_ref_div = aty_ld_pll_ct(PLL_REF_DIV, par); pll_ext_cntl = aty_ld_pll_ct(PLL_EXT_CNTL, par); - pll->ct.xclk_post_div_real = postdividers[pll_ext_cntl & 0x07]; + pll->ct.xclk_post_div_real = aty_postdividers[pll_ext_cntl & 0x07]; mclk_fb_div = aty_ld_pll_ct(MCLK_FB_DIV, par); if (pll_ext_cntl & PLL_MFB_TIMES_4_2B) mclk_fb_div <<= 1; @@ -535,7 +535,7 @@ static int aty_init_pll_ct(const struct fb_info *info, union aty_pll *pll) xpost_div += (q < 64*8); xpost_div += (q < 32*8); } - pll->ct.xclk_post_div_real = postdividers[xpost_div]; + pll->ct.xclk_post_div_real = aty_postdividers[xpost_div]; pll->ct.mclk_fb_div = q * pll->ct.xclk_post_div_real / 8; #ifdef CONFIG_PPC @@ -584,7 +584,7 @@ static int aty_init_pll_ct(const struct fb_info *info, union aty_pll *pll) mpost_div += (q < 64*8); mpost_div += (q < 32*8); } - sclk_post_div_real = postdividers[mpost_div]; + sclk_post_div_real = aty_postdividers[mpost_div]; pll->ct.sclk_fb_div = q * sclk_post_div_real / 8; pll->ct.spll_cntl2 = mpost_div << 4; #ifdef DEBUG From 0b9871a3a8cc7234c285b5d9bf66cc6712cfee7c Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Tue, 28 Aug 2018 10:44:32 -0500 Subject: [PATCH 073/134] sparc: Convert to using %pOFn instead of device_node.name In preparation to remove the node name pointer from struct device_node, convert printf users to use the %pOFn format specifier. Cc: "David S. Miller" Cc: sparclinux@vger.kernel.org Signed-off-by: Rob Herring Signed-off-by: David S. Miller --- arch/sparc/kernel/auxio_64.c | 4 +-- arch/sparc/kernel/power.c | 4 +-- arch/sparc/kernel/prom_32.c | 26 +++++++------- arch/sparc/kernel/prom_64.c | 68 ++++++++++++++++++------------------ 4 files changed, 51 insertions(+), 51 deletions(-) diff --git a/arch/sparc/kernel/auxio_64.c b/arch/sparc/kernel/auxio_64.c index 4e8f56c3793c..cc42225c20f3 100644 --- a/arch/sparc/kernel/auxio_64.c +++ b/arch/sparc/kernel/auxio_64.c @@ -115,8 +115,8 @@ static int auxio_probe(struct platform_device *dev) auxio_devtype = AUXIO_TYPE_SBUS; size = 1; } else { - printk("auxio: Unknown parent bus type [%s]\n", - dp->parent->name); + printk("auxio: Unknown parent bus type [%pOFn]\n", + dp->parent); return -ENODEV; } auxio_register = of_ioremap(&dev->resource[0], 0, size, "auxio"); diff --git a/arch/sparc/kernel/power.c b/arch/sparc/kernel/power.c index 92627abce311..d941875dd718 100644 --- a/arch/sparc/kernel/power.c +++ b/arch/sparc/kernel/power.c @@ -41,8 +41,8 @@ static int power_probe(struct platform_device *op) power_reg = of_ioremap(res, 0, 0x4, "power"); - printk(KERN_INFO "%s: Control reg at %llx\n", - op->dev.of_node->name, res->start); + printk(KERN_INFO "%pOFn: Control reg at %llx\n", + op->dev.of_node, res->start); if (has_button_interrupt(irq, op->dev.of_node)) { if (request_irq(irq, diff --git a/arch/sparc/kernel/prom_32.c b/arch/sparc/kernel/prom_32.c index b51cbb9e87dc..17c87d29ff20 100644 --- a/arch/sparc/kernel/prom_32.c +++ b/arch/sparc/kernel/prom_32.c @@ -68,8 +68,8 @@ static void __init sparc32_path_component(struct device_node *dp, char *tmp_buf) return; regs = rprop->value; - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, regs->which_io, regs->phys_addr); } @@ -84,8 +84,8 @@ static void __init sbus_path_component(struct device_node *dp, char *tmp_buf) return; regs = prop->value; - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, regs->which_io, regs->phys_addr); } @@ -104,13 +104,13 @@ static void __init pci_path_component(struct device_node *dp, char *tmp_buf) regs = prop->value; devfn = (regs->phys_hi >> 8) & 0xff; if (devfn & 0x07) { - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, devfn >> 3, devfn & 0x07); } else { - sprintf(tmp_buf, "%s@%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x", + dp, devfn >> 3); } } @@ -127,8 +127,8 @@ static void __init ebus_path_component(struct device_node *dp, char *tmp_buf) regs = prop->value; - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, regs->which_io, regs->phys_addr); } @@ -167,8 +167,8 @@ static void __init ambapp_path_component(struct device_node *dp, char *tmp_buf) return; device = prop->value; - sprintf(tmp_buf, "%s:%d:%d@%x,%x", - dp->name, *vendor, *device, + sprintf(tmp_buf, "%pOFn:%d:%d@%x,%x", + dp, *vendor, *device, *intr, reg0); } @@ -201,7 +201,7 @@ char * __init build_path_component(struct device_node *dp) tmp_buf[0] = '\0'; __build_path_component(dp, tmp_buf); if (tmp_buf[0] == '\0') - strcpy(tmp_buf, dp->name); + snprintf(tmp_buf, sizeof(tmp_buf), "%pOFn", dp); n = prom_early_alloc(strlen(tmp_buf) + 1); strcpy(n, tmp_buf); diff --git a/arch/sparc/kernel/prom_64.c b/arch/sparc/kernel/prom_64.c index baeaeed64993..6220411ce8fc 100644 --- a/arch/sparc/kernel/prom_64.c +++ b/arch/sparc/kernel/prom_64.c @@ -82,8 +82,8 @@ static void __init sun4v_path_component(struct device_node *dp, char *tmp_buf) regs = rprop->value; if (!of_node_is_root(dp->parent)) { - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, (unsigned int) (regs->phys_addr >> 32UL), (unsigned int) (regs->phys_addr & 0xffffffffUL)); return; @@ -97,17 +97,17 @@ static void __init sun4v_path_component(struct device_node *dp, char *tmp_buf) const char *prefix = (type == 0) ? "m" : "i"; if (low_bits) - sprintf(tmp_buf, "%s@%s%x,%x", - dp->name, prefix, + sprintf(tmp_buf, "%pOFn@%s%x,%x", + dp, prefix, high_bits, low_bits); else - sprintf(tmp_buf, "%s@%s%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%s%x", + dp, prefix, high_bits); } else if (type == 12) { - sprintf(tmp_buf, "%s@%x", - dp->name, high_bits); + sprintf(tmp_buf, "%pOFn@%x", + dp, high_bits); } } @@ -122,8 +122,8 @@ static void __init sun4u_path_component(struct device_node *dp, char *tmp_buf) regs = prop->value; if (!of_node_is_root(dp->parent)) { - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, (unsigned int) (regs->phys_addr >> 32UL), (unsigned int) (regs->phys_addr & 0xffffffffUL)); return; @@ -138,8 +138,8 @@ static void __init sun4u_path_component(struct device_node *dp, char *tmp_buf) if (tlb_type >= cheetah) mask = 0x7fffff; - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, *(u32 *)prop->value, (unsigned int) (regs->phys_addr & mask)); } @@ -156,8 +156,8 @@ static void __init sbus_path_component(struct device_node *dp, char *tmp_buf) return; regs = prop->value; - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, regs->which_io, regs->phys_addr); } @@ -176,13 +176,13 @@ static void __init pci_path_component(struct device_node *dp, char *tmp_buf) regs = prop->value; devfn = (regs->phys_hi >> 8) & 0xff; if (devfn & 0x07) { - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, devfn >> 3, devfn & 0x07); } else { - sprintf(tmp_buf, "%s@%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x", + dp, devfn >> 3); } } @@ -203,8 +203,8 @@ static void __init upa_path_component(struct device_node *dp, char *tmp_buf) if (!prop) return; - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, *(u32 *) prop->value, (unsigned int) (regs->phys_addr & 0xffffffffUL)); } @@ -221,7 +221,7 @@ static void __init vdev_path_component(struct device_node *dp, char *tmp_buf) regs = prop->value; - sprintf(tmp_buf, "%s@%x", dp->name, *regs); + sprintf(tmp_buf, "%pOFn@%x", dp, *regs); } /* "name@addrhi,addrlo" */ @@ -236,8 +236,8 @@ static void __init ebus_path_component(struct device_node *dp, char *tmp_buf) regs = prop->value; - sprintf(tmp_buf, "%s@%x,%x", - dp->name, + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, (unsigned int) (regs->phys_addr >> 32UL), (unsigned int) (regs->phys_addr & 0xffffffffUL)); } @@ -257,8 +257,8 @@ static void __init i2c_path_component(struct device_node *dp, char *tmp_buf) /* This actually isn't right... should look at the #address-cells * property of the i2c bus node etc. etc. */ - sprintf(tmp_buf, "%s@%x,%x", - dp->name, regs[0], regs[1]); + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, regs[0], regs[1]); } /* "name@reg0[,reg1]" */ @@ -274,11 +274,11 @@ static void __init usb_path_component(struct device_node *dp, char *tmp_buf) regs = prop->value; if (prop->length == sizeof(u32) || regs[1] == 1) { - sprintf(tmp_buf, "%s@%x", - dp->name, regs[0]); + sprintf(tmp_buf, "%pOFn@%x", + dp, regs[0]); } else { - sprintf(tmp_buf, "%s@%x,%x", - dp->name, regs[0], regs[1]); + sprintf(tmp_buf, "%pOFn@%x,%x", + dp, regs[0], regs[1]); } } @@ -295,11 +295,11 @@ static void __init ieee1394_path_component(struct device_node *dp, char *tmp_buf regs = prop->value; if (regs[2] || regs[3]) { - sprintf(tmp_buf, "%s@%08x%08x,%04x%08x", - dp->name, regs[0], regs[1], regs[2], regs[3]); + sprintf(tmp_buf, "%pOFn@%08x%08x,%04x%08x", + dp, regs[0], regs[1], regs[2], regs[3]); } else { - sprintf(tmp_buf, "%s@%08x%08x", - dp->name, regs[0], regs[1]); + sprintf(tmp_buf, "%pOFn@%08x%08x", + dp, regs[0], regs[1]); } } @@ -361,7 +361,7 @@ char * __init build_path_component(struct device_node *dp) tmp_buf[0] = '\0'; __build_path_component(dp, tmp_buf); if (tmp_buf[0] == '\0') - strcpy(tmp_buf, dp->name); + snprintf(tmp_buf, sizeof(tmp_buf), "%pOFn", dp); n = prom_early_alloc(strlen(tmp_buf) + 1); strcpy(n, tmp_buf); From df58f37b5d8f883fb18ba85e8d9a624154879832 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Wed, 29 Aug 2018 15:03:37 -0500 Subject: [PATCH 074/134] sbus: Use of_get_child_by_name helper Use the of_get_child_by_name() helper instead of open coding searching for the '/options' node. This removes directly accessing the name pointer as well. Cc: "David S. Miller" Cc: sparclinux@vger.kernel.org Signed-off-by: Rob Herring Signed-off-by: David S. Miller --- drivers/sbus/char/openprom.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/sbus/char/openprom.c b/drivers/sbus/char/openprom.c index 7b31f19ade83..050879a2ddef 100644 --- a/drivers/sbus/char/openprom.c +++ b/drivers/sbus/char/openprom.c @@ -715,22 +715,13 @@ static struct miscdevice openprom_dev = { static int __init openprom_init(void) { - struct device_node *dp; int err; err = misc_register(&openprom_dev); if (err) return err; - dp = of_find_node_by_path("/"); - dp = dp->child; - while (dp) { - if (!strcmp(dp->name, "options")) - break; - dp = dp->sibling; - } - options_node = dp; - + options_node = of_get_child_by_name(of_find_node_by_path("/"), "options"); if (!options_node) { misc_deregister(&openprom_dev); return -EIO; From 31a43fa7945a1de8c550b35c211be5670e32f12b Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 5 Sep 2018 15:03:51 -0700 Subject: [PATCH 075/134] sparc64: viohs: Remove VLA usage In the quest to remove all stack VLA usage from the kernel[1], this allocates a fixed size array for the maximum number of cookies and adds a runtime sanity check. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1 RqZWA@mail.gmail.com Signed-off-by: Kees Cook Signed-off-by: David S. Miller --- arch/sparc/kernel/viohs.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/sparc/kernel/viohs.c b/arch/sparc/kernel/viohs.c index 635d67ffc9a3..7db5aabe9708 100644 --- a/arch/sparc/kernel/viohs.c +++ b/arch/sparc/kernel/viohs.c @@ -180,11 +180,17 @@ static int send_dreg(struct vio_driver_state *vio) struct vio_dring_register pkt; char all[sizeof(struct vio_dring_register) + (sizeof(struct ldc_trans_cookie) * - dr->ncookies)]; + VIO_MAX_RING_COOKIES)]; } u; + size_t bytes = sizeof(struct vio_dring_register) + + (sizeof(struct ldc_trans_cookie) * + dr->ncookies); int i; - memset(&u, 0, sizeof(u)); + if (WARN_ON(bytes > sizeof(u))) + return -EINVAL; + + memset(&u, 0, bytes); init_tag(&u.pkt.tag, VIO_TYPE_CTRL, VIO_SUBTYPE_INFO, VIO_DRING_REG); u.pkt.dring_ident = 0; u.pkt.num_descr = dr->num_entries; @@ -206,7 +212,7 @@ static int send_dreg(struct vio_driver_state *vio) (unsigned long long) u.pkt.cookies[i].cookie_size); } - return send_ctrl(vio, &u.pkt.tag, sizeof(u)); + return send_ctrl(vio, &u.pkt.tag, bytes); } static int send_rdx(struct vio_driver_state *vio) From 16e2a9d396c140d9a47d6b22e2a631f36c31e844 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 7 Sep 2018 11:35:00 +0100 Subject: [PATCH 076/134] oradax: remove redundant null check before kfree A null check before a kfree is redundant, so remove it. Signed-off-by: Colin Ian King Signed-off-by: David S. Miller --- drivers/sbus/char/oradax.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/sbus/char/oradax.c b/drivers/sbus/char/oradax.c index 524f9ea62e52..6516bc3cb58b 100644 --- a/drivers/sbus/char/oradax.c +++ b/drivers/sbus/char/oradax.c @@ -689,8 +689,7 @@ static int dax_open(struct inode *inode, struct file *f) alloc_error: kfree(ctx->ccb_buf); done: - if (ctx != NULL) - kfree(ctx); + kfree(ctx); return -ENOMEM; } From 8cf7765d33ae894cd7722502fbb737efad0eaa9b Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 12 Sep 2018 12:39:13 +0900 Subject: [PATCH 077/134] sparc: vdso: clean-up vdso Makefile arch/sparc/vdso/Makefile is a replica of arch/x86/entry/vdso/Makefile. Clean-up the Makefile in the same way as I did for x86: - Remove unnecessary export - Put the generated linker script to $(obj)/ instead of $(src)/ - Simplify cmd_vdso2c The corresponding x86 commits are: - 61615faf0a89 ("x86/build/vdso: Remove unnecessary export in Makefile") - 1742ed2088cc ("x86/build/vdso: Put generated linker scripts to $(obj)/") - c5fcdbf15523 ("x86/build/vdso: Simplify 'cmd_vdso2c'") Signed-off-by: Masahiro Yamada Signed-off-by: David S. Miller --- arch/sparc/vdso/Makefile | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile index dd0b5a92ffd0..dc85570d8839 100644 --- a/arch/sparc/vdso/Makefile +++ b/arch/sparc/vdso/Makefile @@ -31,23 +31,21 @@ obj-y += $(vdso_img_objs) targets += $(vdso_img_cfiles) targets += $(vdso_img_sodbg) $(vdso_img-y:%=vdso%.so) -export CPPFLAGS_vdso.lds += -P -C +CPPFLAGS_vdso.lds += -P -C VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \ -Wl,--no-undefined \ -Wl,-z,max-page-size=8192 -Wl,-z,common-page-size=8192 \ $(DISABLE_LTO) -$(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE +$(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE $(call if_changed,vdso) HOST_EXTRACFLAGS += -I$(srctree)/tools/include hostprogs-y += vdso2c quiet_cmd_vdso2c = VDSO2C $@ -define cmd_vdso2c - $(obj)/vdso2c $< $(<:%.dbg=%) $@ -endef + cmd_vdso2c = $(obj)/vdso2c $< $(<:%.dbg=%) $@ $(obj)/vdso-image-%.c: $(obj)/vdso%.so.dbg $(obj)/vdso%.so $(obj)/vdso2c FORCE $(call if_changed,vdso2c) From c4beb225f93a2fdaffbe2a610eccb0f6c86b3e45 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Tue, 2 Oct 2018 12:15:17 +0200 Subject: [PATCH 078/134] sparc32: fix fall-through annotation Replace "fallthru" with a proper "fall through" annotation. This fix is part of the ongoing efforts to enabling -Wimplicit-fallthrough Signed-off-by: Gustavo A. R. Silva Signed-off-by: David S. Miller --- arch/sparc/kernel/kgdb_32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sparc/kernel/kgdb_32.c b/arch/sparc/kernel/kgdb_32.c index 5868fc333ea8..639c8e54530a 100644 --- a/arch/sparc/kernel/kgdb_32.c +++ b/arch/sparc/kernel/kgdb_32.c @@ -122,7 +122,7 @@ int kgdb_arch_handle_exception(int e_vector, int signo, int err_code, linux_regs->pc = addr; linux_regs->npc = addr + 4; } - /* fallthru */ + /* fall through */ case 'D': case 'k': From b7dc10b64f6190a008f05baf697d4d8fa9b8ed51 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Tue, 2 Oct 2018 12:19:54 +0200 Subject: [PATCH 079/134] sparc64: fix fall-through annotation Replace "fallthru" with a proper "fall through" annotation. This fix is part of the ongoing efforts to enabling -Wimplicit-fallthrough Signed-off-by: Gustavo A. R. Silva Signed-off-by: David S. Miller --- arch/sparc/kernel/kgdb_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sparc/kernel/kgdb_64.c b/arch/sparc/kernel/kgdb_64.c index d5f7dc6323d5..a68bbddbdba4 100644 --- a/arch/sparc/kernel/kgdb_64.c +++ b/arch/sparc/kernel/kgdb_64.c @@ -148,7 +148,7 @@ int kgdb_arch_handle_exception(int e_vector, int signo, int err_code, linux_regs->tpc = addr; linux_regs->tnpc = addr + 4; } - /* fallthru */ + /* fall through */ case 'D': case 'k': From 5271953cad31b97dea80f848c16e96ad66401199 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 4 Oct 2018 11:10:51 +0100 Subject: [PATCH 080/134] rxrpc: Use the UDP encap_rcv hook Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception in which a packet is placed onto the UDP receive queue and then immediately removed again by rxrpc. Going via the queue in this manner seems like it should be unnecessary. This does, however, require the invention of a value to place in encap_type as that's one of the conditions to switch packets out to the encap_rcv hook. Possibly the value doesn't actually matter for anything other than sockopts on the UDP socket, which aren't accessible outside of rxrpc anyway. This seems to cut a bit of time out of the time elapsed between each sk_buff being timestamped and turning up in rxrpc (the final number in the following trace excerpts). I measured this by making the rxrpc_rx_packet trace point print the time elapsed between the skb being timestamped and the current time (in ns), e.g.: ... 424.278721: rxrpc_rx_packet: ... ACK 25026 So doing a 512MiB DIO read from my test server, with an unmodified kernel: N min max sum mean stddev 27605 2626 7581 7.83992e+07 2840.04 181.029 and with the patch applied: N min max sum mean stddev 27547 1895 12165 6.77461e+07 2459.29 255.02 Signed-off-by: David Howells --- include/uapi/linux/udp.h | 1 + net/rxrpc/ar-internal.h | 2 +- net/rxrpc/input.c | 50 ++++++++++------------------------------ net/rxrpc/local_object.c | 27 ++++++++++++++++++---- 4 files changed, 36 insertions(+), 44 deletions(-) diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h index 09d00f8c442b..09502de447f5 100644 --- a/include/uapi/linux/udp.h +++ b/include/uapi/linux/udp.h @@ -40,5 +40,6 @@ struct udphdr { #define UDP_ENCAP_L2TPINUDP 3 /* rfc2661 */ #define UDP_ENCAP_GTP0 4 /* GSM TS 09.60 */ #define UDP_ENCAP_GTP1U 5 /* 3GPP TS 29.060 */ +#define UDP_ENCAP_RXRPC 6 #endif /* _UAPI_LINUX_UDP_H */ diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 63c43b3a2096..ab60c0313fd4 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -966,7 +966,7 @@ void rxrpc_unpublish_service_conn(struct rxrpc_connection *); /* * input.c */ -void rxrpc_data_ready(struct sock *); +int rxrpc_input_packet(struct sock *, struct sk_buff *); /* * insecure.c diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index c3114fa66c92..1866aeef2284 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1121,7 +1121,7 @@ int rxrpc_extract_header(struct rxrpc_skb_priv *sp, struct sk_buff *skb) * shut down and the local endpoint from going away, thus sk_user_data will not * be cleared until this function returns. */ -void rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) +int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) { struct rxrpc_connection *conn; struct rxrpc_channel *chan; @@ -1135,6 +1135,13 @@ void rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) _enter("%p", udp_sk); + if (skb->tstamp == 0) + skb->tstamp = ktime_get_real(); + + rxrpc_new_skb(skb, rxrpc_skb_rx_received); + + skb_pull(skb, sizeof(struct udphdr)); + /* The UDP protocol already released all skb resources; * we are free to add our own data there. */ @@ -1148,8 +1155,8 @@ void rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) static int lose; if ((lose++ & 7) == 7) { trace_rxrpc_rx_lose(sp); - rxrpc_lose_skb(skb, rxrpc_skb_rx_lost); - return; + rxrpc_free_skb(skb, rxrpc_skb_rx_lost); + return 0; } } @@ -1332,7 +1339,7 @@ discard: rxrpc_free_skb(skb, rxrpc_skb_rx_freed); out: trace_rxrpc_rx_done(0, 0); - return; + return 0; out_unlock: rcu_read_unlock(); @@ -1371,38 +1378,5 @@ reject_packet: trace_rxrpc_rx_done(skb->mark, skb->priority); rxrpc_reject_packet(local, skb); _leave(" [badmsg]"); -} - -void rxrpc_data_ready(struct sock *udp_sk) -{ - struct sk_buff *skb; - int ret; - - for (;;) { - skb = skb_recv_udp(udp_sk, 0, 1, &ret); - if (!skb) { - if (ret == -EAGAIN) - return; - - /* If there was a transmission failure, we get an error - * here that we need to ignore. - */ - _debug("UDP socket error %d", ret); - continue; - } - - rxrpc_new_skb(skb, rxrpc_skb_rx_received); - - /* we'll probably need to checksum it (didn't call sock_recvmsg) */ - if (skb_checksum_complete(skb)) { - rxrpc_free_skb(skb, rxrpc_skb_rx_freed); - __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INERRORS, 0); - _debug("csum failed"); - continue; - } - - __UDP_INC_STATS(sock_net(udp_sk), UDP_MIB_INDATAGRAMS, 0); - - rxrpc_input_packet(udp_sk, skb); - } + return 0; } diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index 30862f44c9f1..cad0691c2bb4 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include "ar-internal.h" @@ -108,7 +109,7 @@ static struct rxrpc_local *rxrpc_alloc_local(struct rxrpc_net *rxnet, */ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net) { - struct sock *sock; + struct sock *usk; int ret, opt; _enter("%p{%d,%d}", @@ -123,10 +124,26 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net) } /* set the socket up */ - sock = local->socket->sk; - sock->sk_user_data = local; - sock->sk_data_ready = rxrpc_data_ready; - sock->sk_error_report = rxrpc_error_report; + usk = local->socket->sk; + inet_sk(usk)->mc_loop = 0; + + /* Enable CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion */ + inet_inc_convert_csum(usk); + + rcu_assign_sk_user_data(usk, local); + + udp_sk(usk)->encap_type = UDP_ENCAP_RXRPC; + udp_sk(usk)->encap_rcv = rxrpc_input_packet; + udp_sk(usk)->encap_destroy = NULL; + udp_sk(usk)->gro_receive = NULL; + udp_sk(usk)->gro_complete = NULL; + + udp_encap_enable(); +#if IS_ENABLED(CONFIG_IPV6) + if (local->srx.transport.family == AF_INET6) + udpv6_encap_enable(); +#endif + usk->sk_error_report = rxrpc_error_report; /* if a local address was supplied then bind it */ if (local->srx.transport_len > sizeof(sa_family_t)) { From bfd2821117a751763718f1b5e57216c0d9b19a49 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 8 Oct 2018 15:45:56 +0100 Subject: [PATCH 081/134] rxrpc: Don't need to take the RCU read lock in the packet receiver We don't need to take the RCU read lock in the rxrpc packet receive function because it's held further up the stack in the IP input routine around the UDP receive routines. Fix this by dropping the RCU read lock calls from rxrpc_input_packet(). This simplifies the code. Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer") Signed-off-by: David Howells --- net/rxrpc/input.c | 41 +++++++++++++---------------------------- 1 file changed, 13 insertions(+), 28 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 1866aeef2284..2dcef482acf2 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1120,6 +1120,8 @@ int rxrpc_extract_header(struct rxrpc_skb_priv *sp, struct sk_buff *skb) * The socket is locked by the caller and this prevents the socket from being * shut down and the local endpoint from going away, thus sk_user_data will not * be cleared until this function returns. + * + * Called with the RCU read lock held from the IP layer via UDP. */ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) { @@ -1215,8 +1217,6 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) if (sp->hdr.serviceId == 0) goto bad_message; - rcu_read_lock(); - if (rxrpc_to_server(sp)) { /* Weed out packets to services we're not offering. Packets * that would begin a call are explicitly rejected and the rest @@ -1228,7 +1228,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA && sp->hdr.seq == 1) goto unsupported_service; - goto discard_unlock; + goto discard; } } @@ -1248,7 +1248,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) /* Connection-level packet */ _debug("CONN %p {%d}", conn, conn->debug_id); rxrpc_post_packet_to_conn(conn, skb); - goto out_unlock; + goto out; } /* Note the serial number skew here */ @@ -1267,19 +1267,19 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) /* Ignore really old calls */ if (sp->hdr.callNumber < chan->last_call) - goto discard_unlock; + goto discard; if (sp->hdr.callNumber == chan->last_call) { if (chan->call || sp->hdr.type == RXRPC_PACKET_TYPE_ABORT) - goto discard_unlock; + goto discard; /* For the previous service call, if completed * successfully, we discard all further packets. */ if (rxrpc_conn_is_service(conn) && chan->last_type == RXRPC_PACKET_TYPE_ACK) - goto discard_unlock; + goto discard; /* But otherwise we need to retransmit the final packet * from data cached in the connection record. @@ -1290,16 +1290,14 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) sp->hdr.serial, sp->hdr.flags, 0); rxrpc_post_packet_to_conn(conn, skb); - goto out_unlock; + goto out; } call = rcu_dereference(chan->call); if (sp->hdr.callNumber > chan->call_id) { - if (rxrpc_to_client(sp)) { - rcu_read_unlock(); + if (rxrpc_to_client(sp)) goto reject_packet; - } if (call) rxrpc_input_implicit_end_call(conn, call); call = NULL; @@ -1318,55 +1316,42 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) if (!call || atomic_read(&call->usage) == 0) { if (rxrpc_to_client(sp) || sp->hdr.type != RXRPC_PACKET_TYPE_DATA) - goto bad_message_unlock; + goto bad_message; if (sp->hdr.seq != 1) - goto discard_unlock; + goto discard; call = rxrpc_new_incoming_call(local, rx, peer, conn, skb); - if (!call) { - rcu_read_unlock(); + if (!call) goto reject_packet; - } rxrpc_send_ping(call, skb, skew); mutex_unlock(&call->user_mutex); } rxrpc_input_call_packet(call, skb, skew); - goto discard_unlock; + goto discard; -discard_unlock: - rcu_read_unlock(); discard: rxrpc_free_skb(skb, rxrpc_skb_rx_freed); out: trace_rxrpc_rx_done(0, 0); return 0; -out_unlock: - rcu_read_unlock(); - goto out; - wrong_security: - rcu_read_unlock(); trace_rxrpc_abort(0, "SEC", sp->hdr.cid, sp->hdr.callNumber, sp->hdr.seq, RXKADINCONSISTENCY, EBADMSG); skb->priority = RXKADINCONSISTENCY; goto post_abort; unsupported_service: - rcu_read_unlock(); trace_rxrpc_abort(0, "INV", sp->hdr.cid, sp->hdr.callNumber, sp->hdr.seq, RX_INVALID_OPERATION, EOPNOTSUPP); skb->priority = RX_INVALID_OPERATION; goto post_abort; reupgrade: - rcu_read_unlock(); trace_rxrpc_abort(0, "UPG", sp->hdr.cid, sp->hdr.callNumber, sp->hdr.seq, RX_PROTOCOL_ERROR, EBADMSG); goto protocol_error; -bad_message_unlock: - rcu_read_unlock(); bad_message: trace_rxrpc_abort(0, "BAD", sp->hdr.cid, sp->hdr.callNumber, sp->hdr.seq, RX_PROTOCOL_ERROR, EBADMSG); From c479d5f2c2e1ce609da08c075054440d97ddff52 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 8 Oct 2018 15:46:01 +0100 Subject: [PATCH 082/134] rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window() We should only call the function to end a call's Tx phase if we rotated the marked-last packet out of the transmission buffer. Make rxrpc_rotate_tx_window() return an indication of whether it just rotated the packet marked as the last out of the transmit buffer, carrying the information out of the locked section in that function. We can then check the return value instead of examining RXRPC_CALL_TX_LAST. Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer") Signed-off-by: David Howells --- net/rxrpc/input.c | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 2dcef482acf2..8834aa627371 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -216,10 +216,11 @@ static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb, /* * Apply a hard ACK by advancing the Tx window. */ -static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, +static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, struct rxrpc_ack_summary *summary) { struct sk_buff *skb, *list = NULL; + bool rot_last = false; int ix; u8 annotation; @@ -243,15 +244,17 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, skb->next = list; list = skb; - if (annotation & RXRPC_TX_ANNO_LAST) + if (annotation & RXRPC_TX_ANNO_LAST) { set_bit(RXRPC_CALL_TX_LAST, &call->flags); + rot_last = true; + } if ((annotation & RXRPC_TX_ANNO_MASK) != RXRPC_TX_ANNO_ACK) summary->nr_rot_new_acks++; } spin_unlock(&call->lock); - trace_rxrpc_transmit(call, (test_bit(RXRPC_CALL_TX_LAST, &call->flags) ? + trace_rxrpc_transmit(call, (rot_last ? rxrpc_transmit_rotate_last : rxrpc_transmit_rotate)); wake_up(&call->waitq); @@ -262,6 +265,8 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, skb->next = NULL; rxrpc_free_skb(skb, rxrpc_skb_tx_freed); } + + return rot_last; } /* @@ -332,11 +337,11 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call) trace_rxrpc_timer(call, rxrpc_timer_init_for_reply, now); } - if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) - rxrpc_rotate_tx_window(call, top, &summary); if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) { - rxrpc_proto_abort("TXL", call, top); - return false; + if (!rxrpc_rotate_tx_window(call, top, &summary)) { + rxrpc_proto_abort("TXL", call, top); + return false; + } } if (!rxrpc_end_tx_phase(call, true, "ETD")) return false; @@ -897,8 +902,12 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb, if (nr_acks > call->tx_top - hard_ack) return rxrpc_proto_abort("AKN", call, 0); - if (after(hard_ack, call->tx_hard_ack)) - rxrpc_rotate_tx_window(call, hard_ack, &summary); + if (after(hard_ack, call->tx_hard_ack)) { + if (rxrpc_rotate_tx_window(call, hard_ack, &summary)) { + rxrpc_end_tx_phase(call, false, "ETA"); + return; + } + } if (nr_acks > 0) { if (skb_copy_bits(skb, offset, buf.acks, nr_acks) < 0) @@ -907,11 +916,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb, &summary); } - if (test_bit(RXRPC_CALL_TX_LAST, &call->flags)) { - rxrpc_end_tx_phase(call, false, "ETA"); - return; - } - if (call->rxtx_annotations[call->tx_top & RXRPC_RXTX_BUFF_MASK] & RXRPC_TX_ANNO_LAST && summary.nr_acks == call->tx_top - hard_ack && @@ -933,8 +937,7 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb) _proto("Rx ACKALL %%%u", sp->hdr.serial); - rxrpc_rotate_tx_window(call, call->tx_top, &summary); - if (test_bit(RXRPC_CALL_TX_LAST, &call->flags)) + if (rxrpc_rotate_tx_window(call, call->tx_top, &summary)) rxrpc_end_tx_phase(call, false, "ETL"); } From dfe995224693798e554ab4770f6d8a096afc60cd Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 8 Oct 2018 15:46:05 +0100 Subject: [PATCH 083/134] rxrpc: Carry call state out of locked section in rxrpc_rotate_tx_window() Carry the call state out of the locked section in rxrpc_rotate_tx_window() rather than sampling it afterwards. This is only used to select tracepoint data, but could have changed by the time we do the tracepoint. Signed-off-by: David Howells --- net/rxrpc/input.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 8834aa627371..af8ce64f4162 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -278,23 +278,26 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, static bool rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun, const char *abort_why) { + unsigned int state; ASSERT(test_bit(RXRPC_CALL_TX_LAST, &call->flags)); write_lock(&call->state_lock); - switch (call->state) { + state = call->state; + switch (state) { case RXRPC_CALL_CLIENT_SEND_REQUEST: case RXRPC_CALL_CLIENT_AWAIT_REPLY: if (reply_begun) - call->state = RXRPC_CALL_CLIENT_RECV_REPLY; + call->state = state = RXRPC_CALL_CLIENT_RECV_REPLY; else - call->state = RXRPC_CALL_CLIENT_AWAIT_REPLY; + call->state = state = RXRPC_CALL_CLIENT_AWAIT_REPLY; break; case RXRPC_CALL_SERVER_AWAIT_ACK: __rxrpc_call_completed(call); rxrpc_notify_socket(call); + state = call->state; break; default: @@ -302,11 +305,10 @@ static bool rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun, } write_unlock(&call->state_lock); - if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) { + if (state == RXRPC_CALL_CLIENT_AWAIT_REPLY) trace_rxrpc_transmit(call, rxrpc_transmit_await_reply); - } else { + else trace_rxrpc_transmit(call, rxrpc_transmit_end); - } _leave(" = ok"); return true; From 298bc15b2079c324e82d0a6fda39c3d762af7282 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 8 Oct 2018 15:46:11 +0100 Subject: [PATCH 084/134] rxrpc: Only take the rwind and mtu values from latest ACK Move the out-of-order and duplicate ACK packet check to before the call to rxrpc_input_ackinfo() so that the receive window size and MTU size are only checked in the latest ACK packet and don't regress. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells --- net/rxrpc/input.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index af8ce64f4162..04213a65c1ac 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -868,6 +868,16 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb, rxrpc_propose_ack_respond_to_ack); } + /* Discard any out-of-order or duplicate ACKs. */ + if (before_eq(sp->hdr.serial, call->acks_latest)) { + _debug("discard ACK %d <= %d", + sp->hdr.serial, call->acks_latest); + return; + } + call->acks_latest_ts = skb->tstamp; + call->acks_latest = sp->hdr.serial; + + /* Parse rwind and mtu sizes if provided. */ ioffset = offset + nr_acks + 3; if (skb->len >= ioffset + sizeof(buf.info)) { if (skb_copy_bits(skb, ioffset, &buf.info, sizeof(buf.info)) < 0) @@ -889,15 +899,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb, return; } - /* Discard any out-of-order or duplicate ACKs. */ - if (before_eq(sp->hdr.serial, call->acks_latest)) { - _debug("discard ACK %d <= %d", - sp->hdr.serial, call->acks_latest); - return; - } - call->acks_latest_ts = skb->tstamp; - call->acks_latest = sp->hdr.serial; - if (before(hard_ack, call->tx_hard_ack) || after(hard_ack, call->tx_top)) return rxrpc_proto_abort("AKW", call, 0); From 647530924f47c93db472ee3cf43b7ef1425581b6 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 8 Oct 2018 15:46:17 +0100 Subject: [PATCH 085/134] rxrpc: Fix connection-level abort handling Fix connection-level abort handling to cache the abort and error codes properly so that a new incoming call can be properly aborted if it races with the parent connection being aborted by another CPU. The abort_code and error parameters can then be dropped from rxrpc_abort_calls(). Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state") Signed-off-by: David Howells --- net/rxrpc/ar-internal.h | 4 ++-- net/rxrpc/call_accept.c | 4 ++-- net/rxrpc/conn_event.c | 26 +++++++++++++++----------- 3 files changed, 19 insertions(+), 15 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index ab60c0313fd4..45307463b7dd 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -442,8 +442,7 @@ struct rxrpc_connection { spinlock_t state_lock; /* state-change lock */ enum rxrpc_conn_cache_state cache_state; enum rxrpc_conn_proto_state state; /* current state of connection */ - u32 local_abort; /* local abort code */ - u32 remote_abort; /* remote abort code */ + u32 abort_code; /* Abort code of connection abort */ int debug_id; /* debug ID for printks */ atomic_t serial; /* packet serial number counter */ unsigned int hi_serial; /* highest serial number received */ @@ -453,6 +452,7 @@ struct rxrpc_connection { u8 security_size; /* security header size */ u8 security_ix; /* security type */ u8 out_clientflag; /* RXRPC_CLIENT_INITIATED if we are client */ + short error; /* Local error code */ }; static inline bool rxrpc_to_server(const struct rxrpc_skb_priv *sp) diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index f55f67894465..1c4ebc0cb25b 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -405,11 +405,11 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local, case RXRPC_CONN_REMOTELY_ABORTED: rxrpc_set_call_completion(call, RXRPC_CALL_REMOTELY_ABORTED, - conn->remote_abort, -ECONNABORTED); + conn->abort_code, conn->error); break; case RXRPC_CONN_LOCALLY_ABORTED: rxrpc_abort_call("CON", call, sp->hdr.seq, - conn->local_abort, -ECONNABORTED); + conn->abort_code, conn->error); break; default: BUG(); diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 6df56ce68861..b6fca8ebb117 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -126,7 +126,7 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, switch (chan->last_type) { case RXRPC_PACKET_TYPE_ABORT: - _proto("Tx ABORT %%%u { %d } [re]", serial, conn->local_abort); + _proto("Tx ABORT %%%u { %d } [re]", serial, conn->abort_code); break; case RXRPC_PACKET_TYPE_ACK: trace_rxrpc_tx_ack(chan->call_debug_id, serial, @@ -153,13 +153,12 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, * pass a connection-level abort onto all calls on that connection */ static void rxrpc_abort_calls(struct rxrpc_connection *conn, - enum rxrpc_call_completion compl, - u32 abort_code, int error) + enum rxrpc_call_completion compl) { struct rxrpc_call *call; int i; - _enter("{%d},%x", conn->debug_id, abort_code); + _enter("{%d},%x", conn->debug_id, conn->abort_code); spin_lock(&conn->channel_lock); @@ -172,9 +171,11 @@ static void rxrpc_abort_calls(struct rxrpc_connection *conn, trace_rxrpc_abort(call->debug_id, "CON", call->cid, call->call_id, 0, - abort_code, error); + conn->abort_code, + conn->error); if (rxrpc_set_call_completion(call, compl, - abort_code, error)) + conn->abort_code, + conn->error)) rxrpc_notify_socket(call); } } @@ -207,10 +208,12 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn, return 0; } + conn->error = error; + conn->abort_code = abort_code; conn->state = RXRPC_CONN_LOCALLY_ABORTED; spin_unlock_bh(&conn->state_lock); - rxrpc_abort_calls(conn, RXRPC_CALL_LOCALLY_ABORTED, abort_code, error); + rxrpc_abort_calls(conn, RXRPC_CALL_LOCALLY_ABORTED); msg.msg_name = &conn->params.peer->srx.transport; msg.msg_namelen = conn->params.peer->srx.transport_len; @@ -229,7 +232,7 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn, whdr._rsvd = 0; whdr.serviceId = htons(conn->service_id); - word = htonl(conn->local_abort); + word = htonl(conn->abort_code); iov[0].iov_base = &whdr; iov[0].iov_len = sizeof(whdr); @@ -240,7 +243,7 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn, serial = atomic_inc_return(&conn->serial); whdr.serial = htonl(serial); - _proto("Tx CONN ABORT %%%u { %d }", serial, conn->local_abort); + _proto("Tx CONN ABORT %%%u { %d }", serial, conn->abort_code); ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, 2, len); if (ret < 0) { @@ -315,9 +318,10 @@ static int rxrpc_process_event(struct rxrpc_connection *conn, abort_code = ntohl(wtmp); _proto("Rx ABORT %%%u { ac=%d }", sp->hdr.serial, abort_code); + conn->error = -ECONNABORTED; + conn->abort_code = abort_code; conn->state = RXRPC_CONN_REMOTELY_ABORTED; - rxrpc_abort_calls(conn, RXRPC_CALL_REMOTELY_ABORTED, - abort_code, -ECONNABORTED); + rxrpc_abort_calls(conn, RXRPC_CALL_REMOTELY_ABORTED); return -ECONNABORTED; case RXRPC_PACKET_TYPE_CHALLENGE: From 4e2abd3c051830a0d4189340fe79f2549bdf36de Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 8 Oct 2018 19:44:39 +0100 Subject: [PATCH 086/134] rxrpc: Fix the rxrpc_tx_packet trace line Fix the rxrpc_tx_packet trace line by storing the where parameter. Fixes: 4764c0da69dc ("rxrpc: Trace packet transmission") Signed-off-by: David Howells --- include/trace/events/rxrpc.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 837393fa897b..573d5b901fb1 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -931,6 +931,7 @@ TRACE_EVENT(rxrpc_tx_packet, TP_fast_assign( __entry->call = call_id; memcpy(&__entry->whdr, whdr, sizeof(__entry->whdr)); + __entry->where = where; ), TP_printk("c=%08x %08x:%08x:%08x:%04x %08x %08x %02x %02x %s %s", From c1e15b4944c9fa7fbbb74f7a5920a1e31b4b965a Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 8 Oct 2018 15:46:25 +0100 Subject: [PATCH 087/134] rxrpc: Fix the packet reception routine The rxrpc_input_packet() function and its call tree was built around the assumption that data_ready() handler called from UDP to inform a kernel service that there is data to be had was non-reentrant. This means that certain locking could be dispensed with. This, however, turns out not to be the case with a multi-queue network card that can deliver packets to multiple cpus simultaneously. Each of those cpus can be in the rxrpc_input_packet() function at the same time. Fix by adding or changing some structure members: (1) Add peer->rtt_input_lock to serialise access to the RTT buffer. (2) Make conn->service_id into a 32-bit variable so that it can be cmpxchg'd on all arches. (3) Add call->input_lock to serialise access to the Rx/Tx state. Note that although the Rx and Tx states are (almost) entirely separate, there's no point completing the separation and having separate locks since it's a bi-phasal RPC protocol rather than a bi-direction streaming protocol. Data transmission and data reception do not take place simultaneously on any particular call. and making the following functional changes: (1) In rxrpc_input_data(), hold call->input_lock around the core to prevent simultaneous producing of packets into the Rx ring and updating of tracking state for a particular call. (2) In rxrpc_input_ping_response(), only read call->ping_serial once, and check it before checking RXRPC_CALL_PINGING as that's a cheaper test. The bit test and bit clear can then be combined. No further locking is needed here. (3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of the ACK packet. The superseded ACK check is then done both before and after the lock is taken. The handing of ackinfo data is split, parsing before the lock is taken and processing with it held. This is keyed on rxMTU being non-zero. Congestion management is also done within the locked section. (4) In rxrpc_input_ackall(), take call->input_lock around the Tx window rotation. The ACKALL packet carries no information and is only really useful after all packets have been transmitted since it's imprecise. (5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to prevent calls being simultaneously implicitly ended on two cpus and also to prevent any races with incoming call setup. (6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade on a connection. It is only permitted to happen once for a connection. (7) In rxrpc_new_incoming_call(), we have to recheck the routing inside rx->incoming_lock to see if someone else set up the call, connection or peer whilst we were getting there. We can't trust the values from the earlier routing check unless we pin refs on them - which we want to avoid. Further, we need to allow for an incoming call to have its state changed on another CPU between us making it live and us adjusting it because the conn is now in the RXRPC_CONN_SERVICE state. (8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access to the RTT buffer. Don't need to lock around setting peer->rtt. For reference, the inventory of state-accessing or state-altering functions used by the packet input procedure is: > rxrpc_input_packet() * PACKET CHECKING * ROUTING > rxrpc_post_packet_to_local() > rxrpc_find_connection_rcu() - uses RCU > rxrpc_lookup_peer_rcu() - uses RCU > rxrpc_find_service_conn_rcu() - uses RCU > idr_find() - uses RCU * CONNECTION-LEVEL PROCESSING - Service upgrade - Can only happen once per conn ! Changed to use cmpxchg > rxrpc_post_packet_to_conn() - Setting conn->hi_serial - Probably safe not using locks - Maybe use cmpxchg * CALL-LEVEL PROCESSING > Old-call checking > rxrpc_input_implicit_end_call() > rxrpc_call_completed() > rxrpc_queue_call() ! Need to take rx->incoming_lock > __rxrpc_disconnect_call() > rxrpc_notify_socket() > rxrpc_new_incoming_call() - Uses rx->incoming_lock for the entire process - Might be able to drop this earlier in favour of the call lock > rxrpc_incoming_call() ! Conflicts with rxrpc_input_implicit_end_call() > rxrpc_send_ping() - Don't need locks to check rtt state > rxrpc_propose_ACK * PACKET DISTRIBUTION > rxrpc_input_call_packet() > rxrpc_input_data() * QUEUE DATA PACKET ON CALL > rxrpc_reduce_call_timer() - Uses timer_reduce() ! Needs call->input_lock() > rxrpc_receiving_reply() ! Needs locking around ack state > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_proto_abort() > rxrpc_input_dup_data() - Fills the Rx buffer - rxrpc_propose_ACK() - rxrpc_notify_socket() > rxrpc_input_ack() * APPLY ACK PACKET TO CALL AND DISCARD PACKET > rxrpc_input_ping_response() - Probably doesn't need any extra locking ! Need READ_ONCE() on call->ping_serial > rxrpc_input_check_for_lost_ack() - Takes call->lock to consult Tx buffer > rxrpc_peer_add_rtt() ! Needs to take a lock (peer->rtt_input_lock) ! Could perhaps manage with cmpxchg() and xadd() instead > rxrpc_input_requested_ack - Consults Tx buffer ! Probably needs a lock > rxrpc_peer_add_rtt() > rxrpc_propose_ack() > rxrpc_input_ackinfo() - Changes call->tx_winsize ! Use cmpxchg to handle change ! Should perhaps track serial number - Uses peer->lock to record MTU specification changes > rxrpc_proto_abort() ! Need to take call->input_lock > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_input_soft_acks() - Consults the Tx buffer > rxrpc_congestion_management() - Modifies the Tx annotations ! Needs call->input_lock() > rxrpc_queue_call() > rxrpc_input_abort() * APPLY ABORT PACKET TO CALL AND DISCARD PACKET > rxrpc_set_call_completion() > rxrpc_notify_socket() > rxrpc_input_ackall() * APPLY ACKALL PACKET TO CALL AND DISCARD PACKET ! Need to take call->input_lock > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_reject_packet() There are some functions used by the above that queue the packet, after which the procedure is terminated: - rxrpc_post_packet_to_local() - local->event_queue is an sk_buff_head - local->processor is a work_struct - rxrpc_post_packet_to_conn() - conn->rx_queue is an sk_buff_head - conn->processor is a work_struct - rxrpc_reject_packet() - local->reject_queue is an sk_buff_head - local->processor is a work_struct And some that offload processing to process context: - rxrpc_notify_socket() - Uses RCU lock - Uses call->notify_lock to call call->notify_rx - Uses call->recvmsg_lock to queue recvmsg side - rxrpc_queue_call() - call->processor is a work_struct - rxrpc_propose_ACK() - Uses call->lock to wrap __rxrpc_propose_ACK() And a bunch that complete a call, all of which use call->state_lock to protect the call state: - rxrpc_call_completed() - rxrpc_set_call_completion() - rxrpc_abort_call() - rxrpc_proto_abort() - Also uses rxrpc_queue_call() Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells --- net/rxrpc/ar-internal.h | 7 ++- net/rxrpc/call_accept.c | 21 +++++-- net/rxrpc/call_object.c | 1 + net/rxrpc/input.c | 120 ++++++++++++++++++++++++++-------------- net/rxrpc/peer_event.c | 5 ++ net/rxrpc/peer_object.c | 1 + 6 files changed, 105 insertions(+), 50 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 45307463b7dd..a6e6cae82c30 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -302,6 +302,7 @@ struct rxrpc_peer { /* calculated RTT cache */ #define RXRPC_RTT_CACHE_SIZE 32 + spinlock_t rtt_input_lock; /* RTT lock for input routine */ ktime_t rtt_last_req; /* Time of last RTT request */ u64 rtt; /* Current RTT estimate (in nS) */ u64 rtt_sum; /* Sum of cache contents */ @@ -447,7 +448,7 @@ struct rxrpc_connection { atomic_t serial; /* packet serial number counter */ unsigned int hi_serial; /* highest serial number received */ u32 security_nonce; /* response re-use preventer */ - u16 service_id; /* Service ID, possibly upgraded */ + u32 service_id; /* Service ID, possibly upgraded */ u8 size_align; /* data size alignment (for security) */ u8 security_size; /* security header size */ u8 security_ix; /* security type */ @@ -635,6 +636,8 @@ struct rxrpc_call { bool tx_phase; /* T if transmission phase, F if receive phase */ u8 nr_jumbo_bad; /* Number of jumbo dups/exceeds-windows */ + spinlock_t input_lock; /* Lock for packet input to this call */ + /* receive-phase ACK management */ u8 ackr_reason; /* reason to ACK */ u16 ackr_skew; /* skew on packet being ACK'd */ @@ -720,8 +723,6 @@ int rxrpc_service_prealloc(struct rxrpc_sock *, gfp_t); void rxrpc_discard_prealloc(struct rxrpc_sock *); struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *, struct rxrpc_sock *, - struct rxrpc_peer *, - struct rxrpc_connection *, struct sk_buff *); void rxrpc_accept_incoming_calls(struct rxrpc_local *); struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *, unsigned long, diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 1c4ebc0cb25b..652e314de38e 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -333,11 +333,11 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct rxrpc_sock *rx, */ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local, struct rxrpc_sock *rx, - struct rxrpc_peer *peer, - struct rxrpc_connection *conn, struct sk_buff *skb) { struct rxrpc_skb_priv *sp = rxrpc_skb(skb); + struct rxrpc_connection *conn; + struct rxrpc_peer *peer; struct rxrpc_call *call; _enter(""); @@ -354,6 +354,13 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local, goto out; } + /* The peer, connection and call may all have sprung into existence due + * to a duplicate packet being handled on another CPU in parallel, so + * we have to recheck the routing. However, we're now holding + * rx->incoming_lock, so the values should remain stable. + */ + conn = rxrpc_find_connection_rcu(local, skb, &peer); + call = rxrpc_alloc_incoming_call(rx, local, peer, conn, skb); if (!call) { skb->mark = RXRPC_SKB_MARK_REJECT_BUSY; @@ -396,10 +403,12 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local, case RXRPC_CONN_SERVICE: write_lock(&call->state_lock); - if (rx->discard_new_call) - call->state = RXRPC_CALL_SERVER_RECV_REQUEST; - else - call->state = RXRPC_CALL_SERVER_ACCEPTING; + if (call->state < RXRPC_CALL_COMPLETE) { + if (rx->discard_new_call) + call->state = RXRPC_CALL_SERVER_RECV_REQUEST; + else + call->state = RXRPC_CALL_SERVER_ACCEPTING; + } write_unlock(&call->state_lock); break; diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 0ca2c2dfd196..8f1a8f85b1f9 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -138,6 +138,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp, init_waitqueue_head(&call->waitq); spin_lock_init(&call->lock); spin_lock_init(&call->notify_lock); + spin_lock_init(&call->input_lock); rwlock_init(&call->state_lock); atomic_set(&call->usage, 1); call->debug_id = debug_id; diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 04213a65c1ac..570b49d2da42 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -459,13 +459,15 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb, } } + spin_lock(&call->input_lock); + /* Received data implicitly ACKs all of the request packets we sent * when we're acting as a client. */ if ((state == RXRPC_CALL_CLIENT_SEND_REQUEST || state == RXRPC_CALL_CLIENT_AWAIT_REPLY) && !rxrpc_receiving_reply(call)) - return; + goto unlock; call->ackr_prev_seq = seq; @@ -495,12 +497,16 @@ next_subpacket: if (flags & RXRPC_LAST_PACKET) { if (test_bit(RXRPC_CALL_RX_LAST, &call->flags) && - seq != call->rx_top) - return rxrpc_proto_abort("LSN", call, seq); + seq != call->rx_top) { + rxrpc_proto_abort("LSN", call, seq); + goto unlock; + } } else { if (test_bit(RXRPC_CALL_RX_LAST, &call->flags) && - after_eq(seq, call->rx_top)) - return rxrpc_proto_abort("LSA", call, seq); + after_eq(seq, call->rx_top)) { + rxrpc_proto_abort("LSA", call, seq); + goto unlock; + } } trace_rxrpc_rx_data(call->debug_id, seq, serial, flags, annotation); @@ -567,8 +573,10 @@ next_subpacket: skip: offset += len; if (flags & RXRPC_JUMBO_PACKET) { - if (skb_copy_bits(skb, offset, &flags, 1) < 0) - return rxrpc_proto_abort("XJF", call, seq); + if (skb_copy_bits(skb, offset, &flags, 1) < 0) { + rxrpc_proto_abort("XJF", call, seq); + goto unlock; + } offset += sizeof(struct rxrpc_jumbo_header); seq++; serial++; @@ -608,6 +616,9 @@ ack: trace_rxrpc_notify_socket(call->debug_id, serial); rxrpc_notify_socket(call); } + +unlock: + spin_unlock(&call->input_lock); _leave(" [queued]"); } @@ -694,15 +705,14 @@ static void rxrpc_input_ping_response(struct rxrpc_call *call, ping_time = call->ping_time; smp_rmb(); - ping_serial = call->ping_serial; + ping_serial = READ_ONCE(call->ping_serial); if (orig_serial == call->acks_lost_ping) rxrpc_input_check_for_lost_ack(call); - if (!test_bit(RXRPC_CALL_PINGING, &call->flags) || - before(orig_serial, ping_serial)) + if (before(orig_serial, ping_serial) || + !test_and_clear_bit(RXRPC_CALL_PINGING, &call->flags)) return; - clear_bit(RXRPC_CALL_PINGING, &call->flags); if (after(orig_serial, ping_serial)) return; @@ -869,24 +879,31 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb, } /* Discard any out-of-order or duplicate ACKs. */ - if (before_eq(sp->hdr.serial, call->acks_latest)) { - _debug("discard ACK %d <= %d", - sp->hdr.serial, call->acks_latest); + if (before_eq(sp->hdr.serial, call->acks_latest)) return; - } + + buf.info.rxMTU = 0; + ioffset = offset + nr_acks + 3; + if (skb->len >= ioffset + sizeof(buf.info) && + skb_copy_bits(skb, ioffset, &buf.info, sizeof(buf.info)) < 0) + return rxrpc_proto_abort("XAI", call, 0); + + spin_lock(&call->input_lock); + + /* Discard any out-of-order or duplicate ACKs. */ + if (before_eq(sp->hdr.serial, call->acks_latest)) + goto out; call->acks_latest_ts = skb->tstamp; call->acks_latest = sp->hdr.serial; /* Parse rwind and mtu sizes if provided. */ - ioffset = offset + nr_acks + 3; - if (skb->len >= ioffset + sizeof(buf.info)) { - if (skb_copy_bits(skb, ioffset, &buf.info, sizeof(buf.info)) < 0) - return rxrpc_proto_abort("XAI", call, 0); + if (buf.info.rxMTU) rxrpc_input_ackinfo(call, skb, &buf.info); - } - if (first_soft_ack == 0) - return rxrpc_proto_abort("AK0", call, 0); + if (first_soft_ack == 0) { + rxrpc_proto_abort("AK0", call, 0); + goto out; + } /* Ignore ACKs unless we are or have just been transmitting. */ switch (READ_ONCE(call->state)) { @@ -896,25 +913,31 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb, case RXRPC_CALL_SERVER_AWAIT_ACK: break; default: - return; + goto out; } if (before(hard_ack, call->tx_hard_ack) || - after(hard_ack, call->tx_top)) - return rxrpc_proto_abort("AKW", call, 0); - if (nr_acks > call->tx_top - hard_ack) - return rxrpc_proto_abort("AKN", call, 0); + after(hard_ack, call->tx_top)) { + rxrpc_proto_abort("AKW", call, 0); + goto out; + } + if (nr_acks > call->tx_top - hard_ack) { + rxrpc_proto_abort("AKN", call, 0); + goto out; + } if (after(hard_ack, call->tx_hard_ack)) { if (rxrpc_rotate_tx_window(call, hard_ack, &summary)) { rxrpc_end_tx_phase(call, false, "ETA"); - return; + goto out; } } if (nr_acks > 0) { - if (skb_copy_bits(skb, offset, buf.acks, nr_acks) < 0) - return rxrpc_proto_abort("XSA", call, 0); + if (skb_copy_bits(skb, offset, buf.acks, nr_acks) < 0) { + rxrpc_proto_abort("XSA", call, 0); + goto out; + } rxrpc_input_soft_acks(call, buf.acks, first_soft_ack, nr_acks, &summary); } @@ -927,7 +950,9 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb, false, true, rxrpc_propose_ack_ping_for_lost_reply); - return rxrpc_congestion_management(call, skb, &summary, acked_serial); + rxrpc_congestion_management(call, skb, &summary, acked_serial); +out: + spin_unlock(&call->input_lock); } /* @@ -940,8 +965,12 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb) _proto("Rx ACKALL %%%u", sp->hdr.serial); + spin_lock(&call->input_lock); + if (rxrpc_rotate_tx_window(call, call->tx_top, &summary)) rxrpc_end_tx_phase(call, false, "ETL"); + + spin_unlock(&call->input_lock); } /* @@ -1024,18 +1053,19 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call, } /* - * Handle a new call on a channel implicitly completing the preceding call on - * that channel. + * Handle a new service call on a channel implicitly completing the preceding + * call on that channel. This does not apply to client conns. * * TODO: If callNumber > call_id + 1, renegotiate security. */ -static void rxrpc_input_implicit_end_call(struct rxrpc_connection *conn, +static void rxrpc_input_implicit_end_call(struct rxrpc_sock *rx, + struct rxrpc_connection *conn, struct rxrpc_call *call) { switch (READ_ONCE(call->state)) { case RXRPC_CALL_SERVER_AWAIT_ACK: rxrpc_call_completed(call); - break; + /* Fall through */ case RXRPC_CALL_COMPLETE: break; default: @@ -1043,11 +1073,13 @@ static void rxrpc_input_implicit_end_call(struct rxrpc_connection *conn, set_bit(RXRPC_CALL_EV_ABORT, &call->events); rxrpc_queue_call(call); } + trace_rxrpc_improper_term(call); break; } - trace_rxrpc_improper_term(call); + spin_lock(&rx->incoming_lock); __rxrpc_disconnect_call(conn, call); + spin_unlock(&rx->incoming_lock); rxrpc_notify_socket(call); } @@ -1244,10 +1276,16 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) goto wrong_security; if (sp->hdr.serviceId != conn->service_id) { - if (!test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags) || - conn->service_id != conn->params.service_id) + int old_id; + + if (!test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags)) + goto reupgrade; + old_id = cmpxchg(&conn->service_id, conn->params.service_id, + sp->hdr.serviceId); + + if (old_id != conn->params.service_id && + old_id != sp->hdr.serviceId) goto reupgrade; - conn->service_id = sp->hdr.serviceId; } if (sp->hdr.callNumber == 0) { @@ -1305,7 +1343,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) if (rxrpc_to_client(sp)) goto reject_packet; if (call) - rxrpc_input_implicit_end_call(conn, call); + rxrpc_input_implicit_end_call(rx, conn, call); call = NULL; } @@ -1325,7 +1363,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb) goto bad_message; if (sp->hdr.seq != 1) goto discard; - call = rxrpc_new_incoming_call(local, rx, peer, conn, skb); + call = rxrpc_new_incoming_call(local, rx, skb); if (!call) goto reject_packet; rxrpc_send_ping(call, skb, skew); diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c index f3e6fc670da2..05b51bdbdd41 100644 --- a/net/rxrpc/peer_event.c +++ b/net/rxrpc/peer_event.c @@ -301,6 +301,8 @@ void rxrpc_peer_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, if (rtt < 0) return; + spin_lock(&peer->rtt_input_lock); + /* Replace the oldest datum in the RTT buffer */ sum -= peer->rtt_cache[cursor]; sum += rtt; @@ -312,6 +314,8 @@ void rxrpc_peer_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, peer->rtt_usage = usage; } + spin_unlock(&peer->rtt_input_lock); + /* Now recalculate the average */ if (usage == RXRPC_RTT_CACHE_SIZE) { avg = sum / RXRPC_RTT_CACHE_SIZE; @@ -320,6 +324,7 @@ void rxrpc_peer_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, do_div(avg, usage); } + /* Don't need to update this under lock */ peer->rtt = avg; trace_rxrpc_rtt_rx(call, why, send_serial, resp_serial, rtt, usage, avg); diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c index 2d39eaf19620..5691b7d266ca 100644 --- a/net/rxrpc/peer_object.c +++ b/net/rxrpc/peer_object.c @@ -225,6 +225,7 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp) peer->service_conns = RB_ROOT; seqlock_init(&peer->service_conn_lock); spin_lock_init(&peer->lock); + spin_lock_init(&peer->rtt_input_lock); peer->debug_id = atomic_inc_return(&rxrpc_debug_id); if (RXRPC_TX_SMSS > 2190) From 49e00eee00612b1357596fed8a88b621a7648c14 Mon Sep 17 00:00:00 2001 From: Reinette Chatre Date: Thu, 4 Oct 2018 14:05:23 -0700 Subject: [PATCH 088/134] x86/intel_rdt: Fix out-of-bounds memory access in CBM tests While the DOC at the beginning of lib/bitmap.c explicitly states that "The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.", some of the bitmap operations do indeed access BITS_PER_LONG portions of the provided bitmap no matter the size of the provided bitmap. For example, if bitmap_intersects() is provided with an 8 bit bitmap the operation will access BITS_PER_LONG bits from the provided bitmap. While the operation ensures that these extra bits do not affect the result, the memory is still accessed. The capacity bitmasks (CBMs) are typically stored in u32 since they can never exceed 32 bits. A few instances exist where a bitmap_* operation is performed on a CBM by simply pointing the bitmap operation to the stored u32 value. The consequence of this pattern is that some bitmap_* operations will access out-of-bounds memory when interacting with the provided CBM. This is confirmed with a KASAN test that reports: BUG: KASAN: stack-out-of-bounds in __bitmap_intersects+0xa2/0x100 and BUG: KASAN: stack-out-of-bounds in __bitmap_weight+0x58/0x90 Fix this by moving any CBM provided to a bitmap operation needing BITS_PER_LONG to an 'unsigned long' variable. [ tglx: Changed related function arguments to unsigned long and got rid of the _cbm extra step ] Fixes: 72d505056604 ("x86/intel_rdt: Add utilities to test pseudo-locked region possibility") Fixes: 49f7b4efa110 ("x86/intel_rdt: Enable setting of exclusive mode") Fixes: d9b48c86eb38 ("x86/intel_rdt: Display resource groups' allocations' size in bytes") Fixes: 95f0b77efa57 ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre Signed-off-by: Thomas Gleixner Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/69a428613a53f10e80594679ac726246020ff94f.1538686926.git.reinette.chatre@intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/intel_rdt.h | 6 ++-- arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 20 ++++++------ arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 36 +++++++++++++-------- 3 files changed, 37 insertions(+), 25 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h index 285eb3ec4200..3736f6dc9545 100644 --- a/arch/x86/kernel/cpu/intel_rdt.h +++ b/arch/x86/kernel/cpu/intel_rdt.h @@ -529,14 +529,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of, int rdtgroup_schemata_show(struct kernfs_open_file *of, struct seq_file *s, void *v); bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d, - u32 _cbm, int closid, bool exclusive); + unsigned long cbm, int closid, bool exclusive); unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d, - u32 cbm); + unsigned long cbm); enum rdtgrp_mode rdtgroup_mode_by_closid(int closid); int rdtgroup_tasks_assigned(struct rdtgroup *r); int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp); int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp); -bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm); +bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm); bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d); int rdt_pseudo_lock_init(void); void rdt_pseudo_lock_release(void); diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c index 40f3903ae5d9..f8c260d522ca 100644 --- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c +++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c @@ -797,25 +797,27 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp) /** * rdtgroup_cbm_overlaps_pseudo_locked - Test if CBM or portion is pseudo-locked * @d: RDT domain - * @_cbm: CBM to test + * @cbm: CBM to test * - * @d represents a cache instance and @_cbm a capacity bitmask that is - * considered for it. Determine if @_cbm overlaps with any existing + * @d represents a cache instance and @cbm a capacity bitmask that is + * considered for it. Determine if @cbm overlaps with any existing * pseudo-locked region on @d. * - * Return: true if @_cbm overlaps with pseudo-locked region on @d, false + * @cbm is unsigned long, even if only 32 bits are used, to make the + * bitmap functions work correctly. + * + * Return: true if @cbm overlaps with pseudo-locked region on @d, false * otherwise. */ -bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm) +bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm) { - unsigned long *cbm = (unsigned long *)&_cbm; - unsigned long *cbm_b; unsigned int cbm_len; + unsigned long cbm_b; if (d->plr) { cbm_len = d->plr->r->cache.cbm_len; - cbm_b = (unsigned long *)&d->plr->cbm; - if (bitmap_intersects(cbm, cbm_b, cbm_len)) + cbm_b = d->plr->cbm; + if (bitmap_intersects(&cbm, &cbm_b, cbm_len)) return true; } return false; diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 1b8e86a5d5e1..b140c68bc14b 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -975,33 +975,34 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of, * is false then overlaps with any resource group or hardware entities * will be considered. * + * @cbm is unsigned long, even if only 32 bits are used, to make the + * bitmap functions work correctly. + * * Return: false if CBM does not overlap, true if it does. */ bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d, - u32 _cbm, int closid, bool exclusive) + unsigned long cbm, int closid, bool exclusive) { - unsigned long *cbm = (unsigned long *)&_cbm; - unsigned long *ctrl_b; enum rdtgrp_mode mode; + unsigned long ctrl_b; u32 *ctrl; int i; /* Check for any overlap with regions used by hardware directly */ if (!exclusive) { - if (bitmap_intersects(cbm, - (unsigned long *)&r->cache.shareable_bits, - r->cache.cbm_len)) + ctrl_b = r->cache.shareable_bits; + if (bitmap_intersects(&cbm, &ctrl_b, r->cache.cbm_len)) return true; } /* Check for overlap with other resource groups */ ctrl = d->ctrl_val; for (i = 0; i < closids_supported(); i++, ctrl++) { - ctrl_b = (unsigned long *)ctrl; + ctrl_b = *ctrl; mode = rdtgroup_mode_by_closid(i); if (closid_allocated(i) && i != closid && mode != RDT_MODE_PSEUDO_LOCKSETUP) { - if (bitmap_intersects(cbm, ctrl_b, r->cache.cbm_len)) { + if (bitmap_intersects(&cbm, &ctrl_b, r->cache.cbm_len)) { if (exclusive) { if (mode == RDT_MODE_EXCLUSIVE) return true; @@ -1138,15 +1139,18 @@ out: * computed by first dividing the total cache size by the CBM length to * determine how many bytes each bit in the bitmask represents. The result * is multiplied with the number of bits set in the bitmask. + * + * @cbm is unsigned long, even if only 32 bits are used to make the + * bitmap functions work correctly. */ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, - struct rdt_domain *d, u32 cbm) + struct rdt_domain *d, unsigned long cbm) { struct cpu_cacheinfo *ci; unsigned int size = 0; int num_b, i; - num_b = bitmap_weight((unsigned long *)&cbm, r->cache.cbm_len); + num_b = bitmap_weight(&cbm, r->cache.cbm_len); ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask)); for (i = 0; i < ci->num_leaves; i++) { if (ci->info_list[i].level == r->cache_level) { @@ -2353,6 +2357,7 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp) u32 used_b = 0, unused_b = 0; u32 closid = rdtgrp->closid; struct rdt_resource *r; + unsigned long tmp_cbm; enum rdtgrp_mode mode; struct rdt_domain *d; int i, ret; @@ -2390,9 +2395,14 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp) * modify the CBM based on system availability. */ cbm_ensure_valid(&d->new_ctrl, r); - if (bitmap_weight((unsigned long *) &d->new_ctrl, - r->cache.cbm_len) < - r->cache.min_cbm_bits) { + /* + * Assign the u32 CBM to an unsigned long to ensure + * that bitmap_weight() does not access out-of-bound + * memory. + */ + tmp_cbm = d->new_ctrl; + if (bitmap_weight(&tmp_cbm, r->cache.cbm_len) < + r->cache.min_cbm_bits) { rdt_last_cmd_printf("no space on %s:%d\n", r->name, d->id); return -ENOSPC; From e054637597ba36d3729ba6a3a3dd7aad8e2a3003 Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Sat, 6 Oct 2018 16:53:19 +0530 Subject: [PATCH 089/134] mm, sched/numa: Remove remaining traces of NUMA rate-limiting Remove the leftover pglist_data::numabalancing_migrate_lock and its initialization, we stopped using this lock with: efaffc5e40ae ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration") [ mingo: Rewrote the changelog. ] Signed-off-by: Srikar Dronamraju Acked-by: Mel Gorman Cc: Linus Torvalds Cc: Linux-MM Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1538824999-31230-1-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar --- include/linux/mmzone.h | 4 ---- mm/page_alloc.c | 10 ---------- 2 files changed, 14 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 3f4c0b167333..d4b0c79d2924 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -667,10 +667,6 @@ typedef struct pglist_data { enum zone_type kcompactd_classzone_idx; wait_queue_head_t kcompactd_wait; struct task_struct *kcompactd; -#endif -#ifdef CONFIG_NUMA_BALANCING - /* Lock serializing the migrate rate limiting window */ - spinlock_t numabalancing_migrate_lock; #endif /* * This is a per-node reserve of pages that are not available diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 706a738c0aee..e2ef1c17942f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6193,15 +6193,6 @@ static unsigned long __init calc_memmap_size(unsigned long spanned_pages, return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT; } -#ifdef CONFIG_NUMA_BALANCING -static void pgdat_init_numabalancing(struct pglist_data *pgdat) -{ - spin_lock_init(&pgdat->numabalancing_migrate_lock); -} -#else -static void pgdat_init_numabalancing(struct pglist_data *pgdat) {} -#endif - #ifdef CONFIG_TRANSPARENT_HUGEPAGE static void pgdat_init_split_queue(struct pglist_data *pgdat) { @@ -6226,7 +6217,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat) { pgdat_resize_init(pgdat); - pgdat_init_numabalancing(pgdat); pgdat_init_split_queue(pgdat); pgdat_init_kcompactd(pgdat); From 184d47f0fd365108bd06ab26cdb3450b716269fd Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 8 Oct 2018 16:54:34 -0700 Subject: [PATCH 090/134] x86/mm: Avoid VLA in pgd_alloc() Arnd Bergmann reported that turning on -Wvla found a new (unintended) VLA usage: arch/x86/mm/pgtable.c: In function 'pgd_alloc': include/linux/build_bug.h:29:45: error: ISO C90 forbids variable length array 'u_pmds' [-Werror=vla] arch/x86/mm/pgtable.c:190:34: note: in expansion of macro 'static_cpu_has' #define PREALLOCATED_USER_PMDS (static_cpu_has(X86_FEATURE_PTI) ? \ ^~~~~~~~~~~~~~ arch/x86/mm/pgtable.c:431:16: note: in expansion of macro 'PREALLOCATED_USER_PMDS' pmd_t *u_pmds[PREALLOCATED_USER_PMDS]; ^~~~~~~~~~~~~~~~~~~~~~ Use the actual size of the array that is used for X86_FEATURE_PTI, which is known at build time, instead of the variable size. [ mingo: Squashed original fix with followup fix to avoid bisection breakage, wrote new changelog. ] Reported-by: Arnd Bergmann Original-written-by: Arnd Bergmann Reported-by: Borislav Petkov Signed-off-by: Kees Cook Cc: Andrew Morton Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Dave Hansen Cc: Joerg Roedel Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Toshi Kani Fixes: 1be3f247c288 ("x86/mm: Avoid VLA in pgd_alloc()") Link: http://lkml.kernel.org/r/20181008235434.GA35035@beast Signed-off-by: Ingo Molnar --- arch/x86/mm/pgtable.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 089e78c4effd..59274e2c1ac4 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -115,6 +115,8 @@ static inline void pgd_list_del(pgd_t *pgd) #define UNSHARED_PTRS_PER_PGD \ (SHARED_KERNEL_PMD ? KERNEL_PGD_BOUNDARY : PTRS_PER_PGD) +#define MAX_UNSHARED_PTRS_PER_PGD \ + max_t(size_t, KERNEL_PGD_BOUNDARY, PTRS_PER_PGD) static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm) @@ -181,6 +183,7 @@ static void pgd_dtor(pgd_t *pgd) * and initialize the kernel pmds here. */ #define PREALLOCATED_PMDS UNSHARED_PTRS_PER_PGD +#define MAX_PREALLOCATED_PMDS MAX_UNSHARED_PTRS_PER_PGD /* * We allocate separate PMDs for the kernel part of the user page-table @@ -189,6 +192,7 @@ static void pgd_dtor(pgd_t *pgd) */ #define PREALLOCATED_USER_PMDS (static_cpu_has(X86_FEATURE_PTI) ? \ KERNEL_PGD_PTRS : 0) +#define MAX_PREALLOCATED_USER_PMDS KERNEL_PGD_PTRS void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd) { @@ -210,7 +214,9 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd) /* No need to prepopulate any pagetable entries in non-PAE modes. */ #define PREALLOCATED_PMDS 0 +#define MAX_PREALLOCATED_PMDS 0 #define PREALLOCATED_USER_PMDS 0 +#define MAX_PREALLOCATED_USER_PMDS 0 #endif /* CONFIG_X86_PAE */ static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count) @@ -428,8 +434,8 @@ static inline void _pgd_free(pgd_t *pgd) pgd_t *pgd_alloc(struct mm_struct *mm) { pgd_t *pgd; - pmd_t *u_pmds[PREALLOCATED_USER_PMDS]; - pmd_t *pmds[PREALLOCATED_PMDS]; + pmd_t *u_pmds[MAX_PREALLOCATED_USER_PMDS]; + pmd_t *pmds[MAX_PREALLOCATED_PMDS]; pgd = _pgd_alloc(); From 41591b38f5f8f78344954b68582b5f00e56ffe61 Mon Sep 17 00:00:00 2001 From: Chris Boot Date: Mon, 8 Oct 2018 17:07:30 +0200 Subject: [PATCH 091/134] mmc: block: avoid multiblock reads for the last sector in SPI mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On some SD cards over SPI, reading with the multiblock read command the last sector will leave the card in a bad state. Remove last sectors from the multiblock reading cmd. Signed-off-by: Chris Boot Signed-off-by: Clément Péron Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Ulf Hansson --- drivers/mmc/core/block.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index a0b9102c4c6e..e201ccb3fda4 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1370,6 +1370,16 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq, brq->data.blocks = card->host->max_blk_count; if (brq->data.blocks > 1) { + /* + * Some SD cards in SPI mode return a CRC error or even lock up + * completely when trying to read the last block using a + * multiblock read command. + */ + if (mmc_host_is_spi(card->host) && (rq_data_dir(req) == READ) && + (blk_rq_pos(req) + blk_rq_sectors(req) == + get_capacity(md->disk))) + brq->data.blocks--; + /* * After a read error, we redo the request one sector * at a time in order to accurately determine which From dc480feb454a975b7ee2c18a2f98fb34e04d3baf Mon Sep 17 00:00:00 2001 From: Andreas Gruenbacher Date: Tue, 9 Oct 2018 13:20:05 +0200 Subject: [PATCH 092/134] gfs2: Fix iomap buffered write support for journaled files Commit 64bc06bb32ee broke buffered writes to journaled files (chattr +j): we'll try to journal the buffer heads of the page being written to in gfs2_iomap_journaled_page_done. However, the iomap code no longer creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix that by creating buffer heads ourself when needed. Signed-off-by: Andreas Gruenbacher --- fs/gfs2/bmap.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index 03128ed1f34e..3c159a7f9a9e 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -975,6 +975,10 @@ static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos, { struct gfs2_inode *ip = GFS2_I(inode); + if (!page_has_buffers(page)) { + create_empty_buffers(page, inode->i_sb->s_blocksize, + (1 << BH_Dirty)|(1 << BH_Uptodate)); + } gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied); } From d79c3888bde6581da7ff9f9d6f581900ecb5e632 Mon Sep 17 00:00:00 2001 From: Arthur Kiyanovski Date: Tue, 9 Oct 2018 11:21:27 +0300 Subject: [PATCH 093/134] net: ena: fix warning in rmmod caused by double iounmap Memory mapped with devm_ioremap is automatically freed when the driver is disconnected from the device. Therefore there is no need to explicitly call devm_iounmap. Fixes: 0857d92f71b6 ("net: ena: add missing unmap bars on device removal") Fixes: 411838e7b41c ("net: ena: fix rare kernel crash when bar memory remap fails") Signed-off-by: Arthur Kiyanovski Signed-off-by: David S. Miller --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 25621a218f20..78d84ca0a378 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -3099,15 +3099,8 @@ err_rss_init: static void ena_release_bars(struct ena_com_dev *ena_dev, struct pci_dev *pdev) { - int release_bars; + int release_bars = pci_select_bars(pdev, IORESOURCE_MEM) & ENA_BAR_MASK; - if (ena_dev->mem_bar) - devm_iounmap(&pdev->dev, ena_dev->mem_bar); - - if (ena_dev->reg_bar) - devm_iounmap(&pdev->dev, ena_dev->reg_bar); - - release_bars = pci_select_bars(pdev, IORESOURCE_MEM) & ENA_BAR_MASK; pci_release_selected_regions(pdev, release_bars); } From d7703ddbd7c9cb1ab7c08e1b85b314ff8cea38e9 Mon Sep 17 00:00:00 2001 From: Arthur Kiyanovski Date: Tue, 9 Oct 2018 11:21:28 +0300 Subject: [PATCH 094/134] net: ena: fix rare bug when failed restart/resume is followed by driver removal In a rare scenario when ena_device_restore() fails, followed by device remove, an FLR will not be issued. In this case, the device will keep sending asynchronous AENQ keep-alive events, even after driver removal, leading to memory corruption. Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver") Signed-off-by: Arthur Kiyanovski Signed-off-by: David S. Miller --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 78d84ca0a378..ce18e07ff5ad 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -2619,7 +2619,11 @@ err_disable_msix: ena_free_mgmnt_irq(adapter); ena_disable_msix(adapter); err_device_destroy: + ena_com_abort_admin_commands(ena_dev); + ena_com_wait_for_abort_completion(ena_dev); ena_com_admin_destroy(ena_dev); + ena_com_mmio_reg_read_request_destroy(ena_dev); + ena_com_dev_reset(ena_dev, ENA_REGS_RESET_DRIVER_INVALID_STATE); err: clear_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags); clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags); From 78a55d05def95144ca5fa9a64c49b2a0636a9866 Mon Sep 17 00:00:00 2001 From: Arthur Kiyanovski Date: Tue, 9 Oct 2018 11:21:29 +0300 Subject: [PATCH 095/134] net: ena: fix NULL dereference due to untimely napi initialization napi poll functions should be initialized before running request_irq(), to handle a rare condition where there is a pending interrupt, causing the ISR to fire immediately while the poll function wasn't set yet, causing a NULL dereference. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski Signed-off-by: David S. Miller --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index ce18e07ff5ad..d906293ce07d 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1575,8 +1575,6 @@ static int ena_up_complete(struct ena_adapter *adapter) if (rc) return rc; - ena_init_napi(adapter); - ena_change_mtu(adapter->netdev, adapter->netdev->mtu); ena_refill_all_rx_bufs(adapter); @@ -1730,6 +1728,13 @@ static int ena_up(struct ena_adapter *adapter) ena_setup_io_intr(adapter); + /* napi poll functions should be initialized before running + * request_irq(), to handle a rare condition where there is a pending + * interrupt, causing the ISR to fire immediately while the poll + * function wasn't set yet, causing a null dereference + */ + ena_init_napi(adapter); + rc = ena_request_io_irq(adapter); if (rc) goto err_req_irq; From 248ab77342d0453f067b666b36f0f517ea66c361 Mon Sep 17 00:00:00 2001 From: Arthur Kiyanovski Date: Tue, 9 Oct 2018 11:21:30 +0300 Subject: [PATCH 096/134] net: ena: fix auto casting to boolean Eliminate potential auto casting compilation error. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski Signed-off-by: David S. Miller --- drivers/net/ethernet/amazon/ena/ena_eth_com.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.c b/drivers/net/ethernet/amazon/ena/ena_eth_com.c index 1c682b76190f..2b3ff0c20155 100644 --- a/drivers/net/ethernet/amazon/ena/ena_eth_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.c @@ -245,11 +245,11 @@ static inline void ena_com_rx_set_flags(struct ena_com_rx_ctx *ena_rx_ctx, (cdesc->status & ENA_ETH_IO_RX_CDESC_BASE_L4_PROTO_IDX_MASK) >> ENA_ETH_IO_RX_CDESC_BASE_L4_PROTO_IDX_SHIFT; ena_rx_ctx->l3_csum_err = - (cdesc->status & ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM_ERR_MASK) >> - ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM_ERR_SHIFT; + !!((cdesc->status & ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM_ERR_MASK) >> + ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM_ERR_SHIFT); ena_rx_ctx->l4_csum_err = - (cdesc->status & ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_ERR_MASK) >> - ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_ERR_SHIFT; + !!((cdesc->status & ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_ERR_MASK) >> + ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_ERR_SHIFT); ena_rx_ctx->hash = cdesc->hash; ena_rx_ctx->frag = (cdesc->status & ENA_ETH_IO_RX_CDESC_BASE_IPV4_FRAG_MASK) >> From c7cd55504a5b0fc826a2cd9540845979d24ae542 Mon Sep 17 00:00:00 2001 From: Shenghui Wang Date: Sun, 7 Oct 2018 14:45:41 +0800 Subject: [PATCH 097/134] dm cache: destroy migration_cache if cache target registration failed Commit 7e6358d244e47 ("dm: fix various targets to dm_register_target after module __init resources created") inadvertently introduced this bug when it moved dm_register_target() after the call to KMEM_CACHE(). Fixes: 7e6358d244e47 ("dm: fix various targets to dm_register_target after module __init resources created") Cc: stable@vger.kernel.org Signed-off-by: Shenghui Wang Signed-off-by: Mike Snitzer --- drivers/md/dm-cache-target.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index e13d991e9fb5..b29a8327eed1 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -3484,14 +3484,13 @@ static int __init dm_cache_init(void) int r; migration_cache = KMEM_CACHE(dm_cache_migration, 0); - if (!migration_cache) { - dm_unregister_target(&cache_target); + if (!migration_cache) return -ENOMEM; - } r = dm_register_target(&cache_target); if (r) { DMERR("cache target registration failed: %d", r); + kmem_cache_destroy(migration_cache); return r; } From 9864cd5dc54cade89fd4b0954c2e522841aa247c Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Tue, 9 Oct 2018 14:24:31 +0900 Subject: [PATCH 098/134] dm: fix report zone remapping to account for partition offset If dm-linear or dm-flakey are layered on top of a partition of a zoned block device, remapping of the start sector and write pointer position of the zones reported by a report zones BIO must be modified to account for the target table entry mapping (start offset within the device and entry mapping with the dm device). If the target's backing device is a partition of a whole disk, the start sector on the physical device of the partition must also be accounted for when modifying the zone information. However, dm_remap_zone_report() was not considering this last case, resulting in incorrect zone information remapping with targets using disk partitions. Fix this by calculating the target backing device start sector using the position of the completed report zones BIO and the unchanged position and size of the original report zone BIO. With this value calculated, the start sector and write pointer position of the target zones can be correctly remapped. Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Signed-off-by: Mike Snitzer --- drivers/md/dm.c | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 20f7e4ef5342..45abb54037fc 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1155,12 +1155,14 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors) EXPORT_SYMBOL_GPL(dm_accept_partial_bio); /* - * The zone descriptors obtained with a zone report indicate - * zone positions within the target device. The zone descriptors - * must be remapped to match their position within the dm device. - * A target may call dm_remap_zone_report after completion of a - * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained - * from the target device mapping to the dm device. + * The zone descriptors obtained with a zone report indicate zone positions + * within the target backing device, regardless of that device is a partition + * and regardless of the target mapping start sector on the device or partition. + * The zone descriptors start sector and write pointer position must be adjusted + * to match their relative position within the dm device. + * A target may call dm_remap_zone_report() after completion of a + * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained from the + * backing device. */ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start) { @@ -1171,6 +1173,7 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start) struct blk_zone *zone; unsigned int nr_rep = 0; unsigned int ofst; + sector_t part_offset; struct bio_vec bvec; struct bvec_iter iter; void *addr; @@ -1178,6 +1181,15 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start) if (bio->bi_status) return; + /* + * bio sector was incremented by the request size on completion. Taking + * into account the original request sector, the target start offset on + * the backing device and the target mapping offset (ti->begin), the + * start sector of the backing device. The partition offset is always 0 + * if the target uses a whole device. + */ + part_offset = bio->bi_iter.bi_sector + ti->begin - (start + bio_end_sector(report_bio)); + /* * Remap the start sector of the reported zones. For sequential zones, * also remap the write pointer position. @@ -1195,6 +1207,7 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start) /* Set zones start sector */ while (hdr->nr_zones && ofst < bvec.bv_len) { zone = addr + ofst; + zone->start -= part_offset; if (zone->start >= start + ti->len) { hdr->nr_zones = 0; break; @@ -1206,7 +1219,7 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start) else if (zone->cond == BLK_ZONE_COND_EMPTY) zone->wp = zone->start; else - zone->wp = zone->wp + ti->begin - start; + zone->wp = zone->wp + ti->begin - start - part_offset; } ofst += sizeof(struct blk_zone); hdr->nr_zones--; From d4d2313a3c17eff4aef9a544023c2df5b9f5bedc Mon Sep 17 00:00:00 2001 From: Emil Karlson Date: Wed, 3 Oct 2018 21:43:18 +0300 Subject: [PATCH 099/134] mfd: cros-ec: copy the whole event in get_next_event_xfer Commit 57e94c8b974db2d83c60e1139c89a70806abbea0 caused cros-ec keyboard events be truncated on many chromebooks so that Left and Right keys on Column 12 were always 0. Use ret as memcpy len to fix this. The old code was using ec_dev->event_size, which is the event payload/data size excluding event_type header, for the length of the memcpy operation. Use ret as memcpy length to avoid the off by one and copy the whole msg->data. Fixes: 57e94c8b974d ("mfd: cros-ec: Increase maximum mkbp event size") Acked-by: Enric Balletbo i Serra Tested-by: Emil Renner Berthing Signed-off-by: Emil Karlson Signed-off-by: Benson Leung --- drivers/platform/chrome/cros_ec_proto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c index 398393ab5df8..b6fd4838f60f 100644 --- a/drivers/platform/chrome/cros_ec_proto.c +++ b/drivers/platform/chrome/cros_ec_proto.c @@ -520,7 +520,7 @@ static int get_next_event_xfer(struct cros_ec_device *ec_dev, ret = cros_ec_cmd_xfer(ec_dev, msg); if (ret > 0) { ec_dev->event_size = ret - 1; - memcpy(&ec_dev->event_data, msg->data, ec_dev->event_size); + memcpy(&ec_dev->event_data, msg->data, ret); } return ret; From 8894891446c9380709451b99ab45c5c53adfd2fc Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Tue, 25 Sep 2018 21:06:24 -0700 Subject: [PATCH 100/134] of: unittest: Disable interrupt node tests for old world MAC systems On systems with OF_IMAP_OLDWORLD_MAC set in of_irq_workarounds, the devicetree interrupt parsing code is different, causing unit tests of devicetree interrupt nodes to fail. Due to a bug in unittest code, which tries to dereference an uninitialized pointer, this results in a crash. OF: /testcase-data/phandle-tests/consumer-a: arguments longer than property Unable to handle kernel paging request for data at address 0x00bc616e Faulting instruction address: 0xc08e9468 Oops: Kernel access of bad area, sig: 11 [#1] BE PREEMPT PowerMac Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.72-rc1-yocto-standard+ #1 task: cf8e0000 task.stack: cf8da000 NIP: c08e9468 LR: c08ea5bc CTR: c08ea5ac REGS: cf8dbb50 TRAP: 0300 Not tainted (4.14.72-rc1-yocto-standard+) MSR: 00001032 CR: 82004044 XER: 00000000 DAR: 00bc616e DSISR: 40000000 GPR00: c08ea5bc cf8dbc00 cf8e0000 c13ca517 c13ca517 c13ca8a0 00000066 00000002 GPR08: 00000063 00bc614e c0b05865 000affff 82004048 00000000 c00047f0 00000000 GPR16: c0a80000 c0a9cc34 c13ca517 c0ad1134 05ffffff 000affff c0b05860 c0abeef8 GPR24: cecec278 cecec278 c0a8c4d0 c0a885e0 c13ca8a0 05ffffff c13ca8a0 c13ca517 NIP [c08e9468] device_node_gen_full_name+0x30/0x15c LR [c08ea5bc] device_node_string+0x190/0x3c8 Call Trace: [cf8dbc00] [c007f670] trace_hardirqs_on_caller+0x118/0x1fc (unreliable) [cf8dbc40] [c08ea5bc] device_node_string+0x190/0x3c8 [cf8dbcb0] [c08eb794] pointer+0x25c/0x4d0 [cf8dbd00] [c08ebcbc] vsnprintf+0x2b4/0x5ec [cf8dbd60] [c08ec00c] vscnprintf+0x18/0x48 [cf8dbd70] [c008e268] vprintk_store+0x4c/0x22c [cf8dbda0] [c008ecac] vprintk_emit+0x94/0x130 [cf8dbdd0] [c008ff54] printk+0x5c/0x6c [cf8dbe10] [c0b8ddd4] of_unittest+0x2220/0x26f8 [cf8dbea0] [c0004434] do_one_initcall+0x4c/0x184 [cf8dbf00] [c0b4534c] kernel_init_freeable+0x13c/0x1d8 [cf8dbf30] [c0004814] kernel_init+0x24/0x118 [cf8dbf40] [c0013398] ret_from_kernel_thread+0x5c/0x64 The problem was observed when running a qemu test for the g3beige machine with devicetree unittests enabled. Disable interrupt node tests on affected systems to avoid both false unittest failures and the crash. With this patch in place, unittest on the affected system passes with the following message. dt-test ### end of unittest - 144 passed, 0 failed Fixes: 53a42093d96ef ("of: Add device tree selftests") Signed-off-by: Guenter Roeck Reviewed-by: Frank Rowand Signed-off-by: Rob Herring --- drivers/of/unittest.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 722537e14848..41b49716ac75 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -771,6 +771,9 @@ static void __init of_unittest_parse_interrupts(void) struct of_phandle_args args; int i, rc; + if (of_irq_workarounds & OF_IMAP_OLDWORLD_MAC) + return; + np = of_find_node_by_path("/testcase-data/interrupts/interrupts0"); if (!np) { pr_err("missing testcase data\n"); @@ -845,6 +848,9 @@ static void __init of_unittest_parse_interrupts_extended(void) struct of_phandle_args args; int i, rc; + if (of_irq_workarounds & OF_IMAP_OLDWORLD_MAC) + return; + np = of_find_node_by_path("/testcase-data/interrupts/interrupts-extended0"); if (!np) { pr_err("missing testcase data\n"); @@ -1001,15 +1007,19 @@ static void __init of_unittest_platform_populate(void) pdev = of_find_device_by_node(np); unittest(pdev, "device 1 creation failed\n"); - irq = platform_get_irq(pdev, 0); - unittest(irq == -EPROBE_DEFER, "device deferred probe failed - %d\n", irq); + if (!(of_irq_workarounds & OF_IMAP_OLDWORLD_MAC)) { + irq = platform_get_irq(pdev, 0); + unittest(irq == -EPROBE_DEFER, + "device deferred probe failed - %d\n", irq); - /* Test that a parsing failure does not return -EPROBE_DEFER */ - np = of_find_node_by_path("/testcase-data/testcase-device2"); - pdev = of_find_device_by_node(np); - unittest(pdev, "device 2 creation failed\n"); - irq = platform_get_irq(pdev, 0); - unittest(irq < 0 && irq != -EPROBE_DEFER, "device parsing error failed - %d\n", irq); + /* Test that a parsing failure does not return -EPROBE_DEFER */ + np = of_find_node_by_path("/testcase-data/testcase-device2"); + pdev = of_find_device_by_node(np); + unittest(pdev, "device 2 creation failed\n"); + irq = platform_get_irq(pdev, 0); + unittest(irq < 0 && irq != -EPROBE_DEFER, + "device parsing error failed - %d\n", irq); + } np = of_find_node_by_path("/testcase-data/platform-tests"); unittest(np, "No testcase data in device tree\n"); From 4f666675cdff0b986195413215eb062b7da6586f Mon Sep 17 00:00:00 2001 From: Daniel Mack Date: Mon, 8 Oct 2018 22:03:57 +0200 Subject: [PATCH 101/134] libertas: call into generic suspend code before turning off power When powering down a SDIO connected card during suspend, make sure to call into the generic lbs_suspend() function before pulling the plug. This will make sure the card is successfully deregistered from the system to avoid communication to the card starving out. Fixes: 7444a8092906 ("libertas: fix suspend and resume for SDIO connected cards") Signed-off-by: Daniel Mack Reviewed-by: Ulf Hansson Acked-by: Kalle Valo Signed-off-by: Ulf Hansson --- drivers/net/wireless/marvell/libertas/if_sdio.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/marvell/libertas/if_sdio.c b/drivers/net/wireless/marvell/libertas/if_sdio.c index 43743c26c071..39bf85d0ade0 100644 --- a/drivers/net/wireless/marvell/libertas/if_sdio.c +++ b/drivers/net/wireless/marvell/libertas/if_sdio.c @@ -1317,6 +1317,10 @@ static int if_sdio_suspend(struct device *dev) if (priv->wol_criteria == EHS_REMOVE_WAKEUP) { dev_info(dev, "Suspend without wake params -- powering down card\n"); if (priv->fw_ready) { + ret = lbs_suspend(priv); + if (ret) + return ret; + priv->power_up_on_resume = true; if_sdio_power_off(card); } From 3e779a2e7f909015f21428b66834127496110b6d Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Mon, 8 Oct 2018 09:32:13 -0700 Subject: [PATCH 102/134] gpio: Assign gpio_irq_chip::parents to non-stack pointer gpiochip_set_cascaded_irqchip() is passed 'parent_irq' as an argument and then the address of that argument is assigned to the gpio chips gpio_irq_chip 'parents' pointer shortly thereafter. This can't ever work, because we've just assigned some stack address to a pointer that we plan to dereference later in gpiochip_irq_map(). I ran into this issue with the KASAN report below when gpiochip_irq_map() tried to setup the parent irq with a total junk pointer for the 'parents' array. BUG: KASAN: stack-out-of-bounds in gpiochip_irq_map+0x228/0x248 Read of size 4 at addr ffffffc0dde472e0 by task swapper/0/1 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 4.14.72 #34 Call trace: [] dump_backtrace+0x0/0x718 [] show_stack+0x20/0x2c [] __dump_stack+0x20/0x28 [] dump_stack+0x80/0xbc [] print_address_description+0x70/0x238 [] kasan_report+0x1cc/0x260 [] __asan_report_load4_noabort+0x2c/0x38 [] gpiochip_irq_map+0x228/0x248 [] irq_domain_associate+0x114/0x2ec [] irq_create_mapping+0x120/0x234 [] irq_create_fwspec_mapping+0x4c8/0x88c [] irq_create_of_mapping+0x180/0x210 [] of_irq_get+0x138/0x198 [] spi_drv_probe+0x94/0x178 [] driver_probe_device+0x51c/0x824 [] __device_attach_driver+0x148/0x20c [] bus_for_each_drv+0x120/0x188 [] __device_attach+0x19c/0x2dc [] device_initial_probe+0x20/0x2c [] bus_probe_device+0x80/0x154 [] device_add+0x9b8/0xbdc [] spi_add_device+0x1b8/0x380 [] spi_register_controller+0x111c/0x1378 [] spi_geni_probe+0x4dc/0x6f8 [] platform_drv_probe+0xdc/0x130 [] driver_probe_device+0x51c/0x824 [] __driver_attach+0x100/0x194 [] bus_for_each_dev+0x104/0x16c [] driver_attach+0x48/0x54 [] bus_add_driver+0x274/0x498 [] driver_register+0x1ac/0x230 [] __platform_driver_register+0xcc/0xdc [] spi_geni_driver_init+0x1c/0x24 [] do_one_initcall+0x240/0x3dc [] kernel_init_freeable+0x378/0x468 [] kernel_init+0x14/0x110 [] ret_from_fork+0x10/0x18 The buggy address belongs to the page: page:ffffffbf037791c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff raw: ffffffbf037791e0 ffffffbf037791e0 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffc0dde47180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47200: f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 >ffffffc0dde47280: f2 f2 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 ^ ffffffc0dde47300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Let's leave around one unsigned int in the gpio_irq_chip struct for the single parent irq case and repoint the 'parents' array at it. This way code is left mostly intact to setup parents and we waste an extra few bytes per structure of which there should be only a handful in a system. Cc: Evan Green Cc: Thierry Reding Cc: Grygorii Strashko Fixes: e0d897289813 ("gpio: Implement tighter IRQ chip integration") Signed-off-by: Stephen Boyd Signed-off-by: Linus Walleij --- drivers/gpio/gpiolib.c | 3 ++- include/linux/gpio/driver.h | 7 +++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index a57300c1d649..25187403e3ac 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1682,7 +1682,8 @@ static void gpiochip_set_cascaded_irqchip(struct gpio_chip *gpiochip, irq_set_chained_handler_and_data(parent_irq, parent_handler, gpiochip); - gpiochip->irq.parents = &parent_irq; + gpiochip->irq.parent_irq = parent_irq; + gpiochip->irq.parents = &gpiochip->irq.parent_irq; gpiochip->irq.num_parents = 1; } diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 0ea328e71ec9..a4d5eb37744a 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -94,6 +94,13 @@ struct gpio_irq_chip { */ unsigned int num_parents; + /** + * @parent_irq: + * + * For use by gpiochip_set_cascaded_irqchip() + */ + unsigned int parent_irq; + /** * @parents: * From f259f896f2348f0302f6f88d4382378cf9d23a7e Mon Sep 17 00:00:00 2001 From: Marco Felsch Date: Tue, 2 Oct 2018 10:06:46 +0200 Subject: [PATCH 103/134] pinctrl: mcp23s08: fix irq and irqchip setup order MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since 'commit 02e389e63e35 ("pinctrl: mcp23s08: fix irq setup order")' the irq request isn't the last devm_* allocation. Without a deeper look at the irq and testing this isn't a good solution. Since this driver relies on the devm mechanism, requesting a interrupt should be the last thing to avoid memory corruptions during unbinding. 'Commit 02e389e63e35 ("pinctrl: mcp23s08: fix irq setup order")' fixed the order for the interrupt-controller use case only. The mcp23s08_irq_setup() must be split into two to fix it for the interrupt-controller use case and to register the irq at last. So the irq will be freed first during unbind. Cc: stable@vger.kernel.org Cc: Jan Kundrát Cc: Dmitry Mastykin Cc: Sebastian Reichel Fixes: 82039d244f87 ("pinctrl: mcp23s08: add pinconf support") Fixes: 02e389e63e35 ("pinctrl: mcp23s08: fix irq setup order") Signed-off-by: Marco Felsch Tested-by: Phil Reid Signed-off-by: Linus Walleij --- drivers/pinctrl/pinctrl-mcp23s08.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/pinctrl/pinctrl-mcp23s08.c b/drivers/pinctrl/pinctrl-mcp23s08.c index 4a8a8efadefa..cf73a403d22d 100644 --- a/drivers/pinctrl/pinctrl-mcp23s08.c +++ b/drivers/pinctrl/pinctrl-mcp23s08.c @@ -636,6 +636,14 @@ static int mcp23s08_irq_setup(struct mcp23s08 *mcp) return err; } + return 0; +} + +static int mcp23s08_irqchip_setup(struct mcp23s08 *mcp) +{ + struct gpio_chip *chip = &mcp->chip; + int err; + err = gpiochip_irqchip_add_nested(chip, &mcp23s08_irq_chip, 0, @@ -912,7 +920,7 @@ static int mcp23s08_probe_one(struct mcp23s08 *mcp, struct device *dev, } if (mcp->irq && mcp->irq_controller) { - ret = mcp23s08_irq_setup(mcp); + ret = mcp23s08_irqchip_setup(mcp); if (ret) goto fail; } @@ -944,6 +952,9 @@ static int mcp23s08_probe_one(struct mcp23s08 *mcp, struct device *dev, goto fail; } + if (mcp->irq) + ret = mcp23s08_irq_setup(mcp); + fail: if (ret < 0) dev_dbg(dev, "can't setup chip %d, --> %d\n", addr, ret); From beb9caac211c1be1bc118bb62d5cf09c4107e6a5 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Wed, 10 Oct 2018 12:01:55 -0400 Subject: [PATCH 104/134] dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled It is best to avoid any extra overhead associated with bio completion. DM core will indirectly call a DM target's .end_io if it is defined. In the case of DM linear, there is no need to do so (for every bio that completes) if CONFIG_DM_ZONED is not enabled. Avoiding an extra indirect call for every bio completion is very important for ensuring DM linear doesn't incur more overhead that further widens the performance gap between dm-linear and raw block devices. Fixes: 0be12c1c7fce7 ("dm linear: add support for zoned block devices") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer --- drivers/md/dm-linear.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index d10964d41fd7..172f6fa83c0d 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c @@ -102,6 +102,7 @@ static int linear_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_REMAPPED; } +#ifdef CONFIG_DM_ZONED static int linear_end_io(struct dm_target *ti, struct bio *bio, blk_status_t *error) { @@ -112,6 +113,7 @@ static int linear_end_io(struct dm_target *ti, struct bio *bio, return DM_ENDIO_DONE; } +#endif static void linear_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen) @@ -208,12 +210,16 @@ static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff, static struct target_type linear_target = { .name = "linear", .version = {1, 4, 0}, +#ifdef CONFIG_DM_ZONED + .end_io = linear_end_io, .features = DM_TARGET_PASSES_INTEGRITY | DM_TARGET_ZONED_HM, +#else + .features = DM_TARGET_PASSES_INTEGRITY, +#endif .module = THIS_MODULE, .ctr = linear_ctr, .dtr = linear_dtr, .map = linear_map, - .end_io = linear_end_io, .status = linear_status, .prepare_ioctl = linear_prepare_ioctl, .iterate_devices = linear_iterate_devices, From 5318321d367c21ca1fe9eb00caefc239c938750a Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 18 Sep 2018 12:58:33 +0900 Subject: [PATCH 105/134] samples: disable CONFIG_SAMPLES for UML Some samples require headers installation, so commit 3fca1700c4c3 ("kbuild: make samples really depend on headers_install") added such dependency in the top Makefile. However, UML fails to build with CONFIG_SAMPLES=y because UML does not support headers_install. Fixes: 3fca1700c4c3 ("kbuild: make samples really depend on headers_install") Reported-by: Kees Cook Cc: David Howells Signed-off-by: Masahiro Yamada --- samples/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/samples/Kconfig b/samples/Kconfig index bd133efc1a56..ad1ec7016d4c 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -1,5 +1,6 @@ menuconfig SAMPLES bool "Sample kernel code" + depends on !UML help You can build and test sample kernel code here. From f355cfcdb251e22b9dfb78c0eef4005a9d902a35 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 10 Oct 2018 16:09:25 +0300 Subject: [PATCH 106/134] devlink: Fix param set handling for string type In case devlink param type is string, it needs to copy the string value it got from the input to devlink_param_value. Fixes: e3b7ca18ad7b ("devlink: Add param set command") Signed-off-by: Moshe Shemesh Signed-off-by: David S. Miller --- include/net/devlink.h | 2 +- net/core/devlink.c | 11 ++++++++--- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index b9b89d6604d4..b0e17c025fdc 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -311,7 +311,7 @@ union devlink_param_value { u8 vu8; u16 vu16; u32 vu32; - const char *vstr; + char vstr[DEVLINK_PARAM_MAX_STRING_VALUE]; bool vbool; }; diff --git a/net/core/devlink.c b/net/core/devlink.c index 8c0ed225e280..d808af7a5c52 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -2995,6 +2995,8 @@ devlink_param_value_get_from_info(const struct devlink_param *param, struct genl_info *info, union devlink_param_value *value) { + int len; + if (param->type != DEVLINK_PARAM_TYPE_BOOL && !info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) return -EINVAL; @@ -3010,10 +3012,13 @@ devlink_param_value_get_from_info(const struct devlink_param *param, value->vu32 = nla_get_u32(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]); break; case DEVLINK_PARAM_TYPE_STRING: - if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) > - DEVLINK_PARAM_MAX_STRING_VALUE) + len = strnlen(nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]), + nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])); + if (len == nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) || + len >= DEVLINK_PARAM_MAX_STRING_VALUE) return -EINVAL; - value->vstr = nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]); + strcpy(value->vstr, + nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])); break; case DEVLINK_PARAM_TYPE_BOOL: value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ? From 1276534c988ba752fa01bf090412a877ee783829 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 10 Oct 2018 16:09:26 +0300 Subject: [PATCH 107/134] devlink: Fix param cmode driverinit for string type Driverinit configuration mode value is held by devlink to enable the driver fetch the value after reload command. In case the param type is string devlink should copy the value from driver string buffer to devlink string buffer on devlink_param_driverinit_value_set() and vice-versa on devlink_param_driverinit_value_get(). Fixes: ec01aeb1803e ("devlink: Add support for get/set driverinit value") Signed-off-by: Moshe Shemesh Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- net/core/devlink.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/core/devlink.c b/net/core/devlink.c index d808af7a5c52..1a0de1677197 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3105,7 +3105,10 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb, return -EOPNOTSUPP; if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) { - param_item->driverinit_value = value; + if (param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(param_item->driverinit_value.vstr, value.vstr); + else + param_item->driverinit_value = value; param_item->driverinit_value_valid = true; } else { if (!param->set) @@ -4545,7 +4548,10 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, DEVLINK_PARAM_CMODE_DRIVERINIT)) return -EOPNOTSUPP; - *init_val = param_item->driverinit_value; + if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(init_val->vstr, param_item->driverinit_value.vstr); + else + *init_val = param_item->driverinit_value; return 0; } @@ -4576,7 +4582,10 @@ int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, DEVLINK_PARAM_CMODE_DRIVERINIT)) return -EOPNOTSUPP; - param_item->driverinit_value = init_val; + if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(param_item->driverinit_value.vstr, init_val.vstr); + else + param_item->driverinit_value = init_val; param_item->driverinit_value_valid = true; devlink_param_notify(devlink, param_item, DEVLINK_CMD_PARAM_NEW); From bde74ad10eb55aaa472c37b107934e6b8563c25e Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 10 Oct 2018 16:09:27 +0300 Subject: [PATCH 108/134] devlink: Add helper function for safely copy string param Devlink string param buffer is allocated at the size of DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure this size is not exceeded. Renamed DEVLINK_PARAM_MAX_STRING_VALUE to __DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by devlink only. The driver should use the helper function instead to verify it doesn't exceed the allowed length. Signed-off-by: Moshe Shemesh Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- include/net/devlink.h | 12 ++++++++++-- net/core/devlink.c | 19 ++++++++++++++++++- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index b0e17c025fdc..99efc156a309 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -298,7 +298,7 @@ struct devlink_resource { #define DEVLINK_RESOURCE_ID_PARENT_TOP 0 -#define DEVLINK_PARAM_MAX_STRING_VALUE 32 +#define __DEVLINK_PARAM_MAX_STRING_VALUE 32 enum devlink_param_type { DEVLINK_PARAM_TYPE_U8, DEVLINK_PARAM_TYPE_U16, @@ -311,7 +311,7 @@ union devlink_param_value { u8 vu8; u16 vu16; u32 vu32; - char vstr[DEVLINK_PARAM_MAX_STRING_VALUE]; + char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE]; bool vbool; }; @@ -553,6 +553,8 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, union devlink_param_value init_val); void devlink_param_value_changed(struct devlink *devlink, u32 param_id); +void devlink_param_value_str_fill(union devlink_param_value *dst_val, + const char *src); struct devlink_region *devlink_region_create(struct devlink *devlink, const char *region_name, u32 region_max_snapshots, @@ -789,6 +791,12 @@ devlink_param_value_changed(struct devlink *devlink, u32 param_id) { } +static inline void +devlink_param_value_str_fill(union devlink_param_value *dst_val, + const char *src) +{ +} + static inline struct devlink_region * devlink_region_create(struct devlink *devlink, const char *region_name, diff --git a/net/core/devlink.c b/net/core/devlink.c index 1a0de1677197..6bc42933be4a 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3015,7 +3015,7 @@ devlink_param_value_get_from_info(const struct devlink_param *param, len = strnlen(nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]), nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])); if (len == nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) || - len >= DEVLINK_PARAM_MAX_STRING_VALUE) + len >= __DEVLINK_PARAM_MAX_STRING_VALUE) return -EINVAL; strcpy(value->vstr, nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])); @@ -4617,6 +4617,23 @@ void devlink_param_value_changed(struct devlink *devlink, u32 param_id) } EXPORT_SYMBOL_GPL(devlink_param_value_changed); +/** + * devlink_param_value_str_fill - Safely fill-up the string preventing + * from overflow of the preallocated buffer + * + * @dst_val: destination devlink_param_value + * @src: source buffer + */ +void devlink_param_value_str_fill(union devlink_param_value *dst_val, + const char *src) +{ + size_t len; + + len = strlcpy(dst_val->vstr, src, __DEVLINK_PARAM_MAX_STRING_VALUE); + WARN_ON(len >= __DEVLINK_PARAM_MAX_STRING_VALUE); +} +EXPORT_SYMBOL_GPL(devlink_param_value_str_fill); + /** * devlink_region_create - create a new address region * From 52b5d6f5dcf0e5201392f7d417148ccb537dbf6f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 10 Oct 2018 06:59:35 -0700 Subject: [PATCH 109/134] net: make skb_partial_csum_set() more robust against overflows syzbot managed to crash in skb_checksum_help() [1] : BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb)); Root cause is the following check in skb_partial_csum_set() if (unlikely(start > skb_headlen(skb)) || unlikely((int)start + off > skb_headlen(skb) - 2)) return false; If skb_headlen(skb) is 1, then (skb_headlen(skb) - 2) becomes 0xffffffff and the check fails to detect that ((int)start + off) is off the limit, since the compare is unsigned. When we fix that, then the first condition (start > skb_headlen(skb)) becomes obsolete. Then we should also check that (skb_headroom(skb) + start) wont overflow 16bit field. [1] kernel BUG at net/core/dev.c:2880! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 7330 Comm: syz-executor4 Not tainted 4.19.0-rc6+ #253 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_checksum_help+0x9e3/0xbb0 net/core/dev.c:2880 Code: 85 00 ff ff ff 48 c1 e8 03 42 80 3c 28 00 0f 84 09 fb ff ff 48 8b bd 00 ff ff ff e8 97 a8 b9 fb e9 f8 fa ff ff e8 2d 09 76 fb <0f> 0b 48 8b bd 28 ff ff ff e8 1f a8 b9 fb e9 b1 f6 ff ff 48 89 cf RSP: 0018:ffff8801d83a6f60 EFLAGS: 00010293 RAX: ffff8801b9834380 RBX: ffff8801b9f8d8c0 RCX: ffffffff8608c6d7 RDX: 0000000000000000 RSI: ffffffff8608cc63 RDI: 0000000000000006 RBP: ffff8801d83a7068 R08: ffff8801b9834380 R09: 0000000000000000 R10: ffff8801d83a76d8 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000010001 R14: 000000000000ffff R15: 00000000000000a8 FS: 00007f1a66db5700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7d77f091b0 CR3: 00000001ba252000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_csum_hwoffload_help+0x8f/0xe0 net/core/dev.c:3269 validate_xmit_skb+0xa2a/0xf30 net/core/dev.c:3312 __dev_queue_xmit+0xc2f/0x3950 net/core/dev.c:3797 dev_queue_xmit+0x17/0x20 net/core/dev.c:3838 packet_snd net/packet/af_packet.c:2928 [inline] packet_sendmsg+0x422d/0x64c0 net/packet/af_packet.c:2953 Fixes: 5ff8dda3035d ("net: Ensure partial checksum offset is inside the skb head") Signed-off-by: Eric Dumazet Cc: Herbert Xu Reported-by: syzbot Signed-off-by: David S. Miller --- net/core/skbuff.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index b2c807f67aba..428094b577fc 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4452,14 +4452,16 @@ EXPORT_SYMBOL_GPL(skb_complete_wifi_ack); */ bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off) { - if (unlikely(start > skb_headlen(skb)) || - unlikely((int)start + off > skb_headlen(skb) - 2)) { - net_warn_ratelimited("bad partial csum: csum=%u/%u len=%u\n", - start, off, skb_headlen(skb)); + u32 csum_end = (u32)start + (u32)off + sizeof(__sum16); + u32 csum_start = skb_headroom(skb) + (u32)start; + + if (unlikely(csum_start > U16_MAX || csum_end > skb_headlen(skb))) { + net_warn_ratelimited("bad partial csum: csum=%u/%u headroom=%u headlen=%u\n", + start, off, skb_headroom(skb), skb_headlen(skb)); return false; } skb->ip_summed = CHECKSUM_PARTIAL; - skb->csum_start = skb_headroom(skb) + start; + skb->csum_start = csum_start; skb->csum_offset = off; skb_set_transport_header(skb, start); return true; From dd9a403495704fc80fb9f399003013ef2be2ee23 Mon Sep 17 00:00:00 2001 From: Valentine Fatiev Date: Wed, 10 Oct 2018 09:56:25 +0300 Subject: [PATCH 110/134] IB/mlx5: Unmap DMA addr from HCA before IOMMU The function that puts back the MR in cache also removes the DMA address from the HCA. Therefore we need to call this function before we remove the DMA mapping from MMU. Otherwise the HCA may access a memory that is no longer DMA mapped. Call trace: NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0. CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc6+ #4 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/20/2012 RIP: 0010:intel_idle+0x73/0x120 Code: 80 5c 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 80 5c 01 00 48 89 d1 0f 60 02 RSP: 0018:ffffffff9a403e38 EFLAGS: 00000046 RAX: 0000000000000030 RBX: 0000000000000005 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffff9a5790c0 RDI: 0000000000000000 RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000007cf9 R10: 000000000000030a R11: 0000000000000018 R12: 0000000000000000 R13: ffffffff9a5792b8 R14: ffffffff9a5790c0 R15: 0000002b48471e4d FS: 0000000000000000(0000) GS:ffff9c6caf400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5737185000 CR3: 0000000590c0a002 CR4: 00000000000606f0 Call Trace: cpuidle_enter_state+0x7e/0x2e0 do_idle+0x1ed/0x290 cpu_startup_entry+0x6f/0x80 start_kernel+0x524/0x544 ? set_init_arg+0x55/0x55 secondary_startup_64+0xa4/0xb0 DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read] Request device [04:00.0] fault addr b34d2000 [fault reason 06] PTE Read access is not set DMAR: [DMA Read] Request device [01:00.2] fault addr bff8b000 [fault reason 06] PTE Read access is not set Fixes: f3f134f5260a ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory") Signed-off-by: Valentine Fatiev Reviewed-by: Moni Shoua Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford --- drivers/infiniband/hw/mlx5/mr.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 9fb1d9cb9401..e22314837645 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -544,6 +544,9 @@ void mlx5_mr_cache_free(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr) int shrink = 0; int c; + if (!mr->allocated_from_cache) + return; + c = order2idx(dev, mr->order); if (c < 0 || c >= MAX_MR_CACHE_ENTRIES) { mlx5_ib_warn(dev, "order %d, cache index %d\n", mr->order, c); @@ -1647,18 +1650,19 @@ static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr) umem = NULL; } #endif - clean_mr(dev, mr); + /* + * We should unregister the DMA address from the HCA before + * remove the DMA mapping. + */ + mlx5_mr_cache_free(dev, mr); if (umem) { ib_umem_release(umem); atomic_sub(npages, &dev->mdev->priv.reg_pages); } - if (!mr->allocated_from_cache) kfree(mr); - else - mlx5_mr_cache_free(dev, mr); } int mlx5_ib_dereg_mr(struct ib_mr *ibmr) From 118aa47c7072bce05fc39bd40a1c0a90caed72ab Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Thu, 11 Oct 2018 11:45:30 +0900 Subject: [PATCH 111/134] dm linear: fix linear_end_io conditional definition The dm-linear target is independent of the dm-zoned target. For code requiring support for zoned block devices, use CONFIG_BLK_DEV_ZONED instead of CONFIG_DM_ZONED. While at it, similarly to dm linear, also enable the DM_TARGET_ZONED_HM feature in dm-flakey only if CONFIG_BLK_DEV_ZONED is defined. Fixes: beb9caac211c1 ("dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled") Fixes: 0be12c1c7fce7 ("dm linear: add support for zoned block devices") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Signed-off-by: Mike Snitzer --- drivers/md/dm-flakey.c | 2 ++ drivers/md/dm-linear.c | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index 21d126a5078c..32aabe27b37c 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -467,7 +467,9 @@ static int flakey_iterate_devices(struct dm_target *ti, iterate_devices_callout_ static struct target_type flakey_target = { .name = "flakey", .version = {1, 5, 0}, +#ifdef CONFIG_BLK_DEV_ZONED .features = DM_TARGET_ZONED_HM, +#endif .module = THIS_MODULE, .ctr = flakey_ctr, .dtr = flakey_dtr, diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index 172f6fa83c0d..2f7c44a006c4 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c @@ -102,7 +102,7 @@ static int linear_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_REMAPPED; } -#ifdef CONFIG_DM_ZONED +#ifdef CONFIG_BLK_DEV_ZONED static int linear_end_io(struct dm_target *ti, struct bio *bio, blk_status_t *error) { @@ -210,7 +210,7 @@ static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff, static struct target_type linear_target = { .name = "linear", .version = {1, 4, 0}, -#ifdef CONFIG_DM_ZONED +#ifdef CONFIG_BLK_DEV_ZONED .end_io = linear_end_io, .features = DM_TARGET_PASSES_INTEGRITY | DM_TARGET_ZONED_HM, #else From 9a4890bd6d6325a1c88564a20ab310b2d56f6094 Mon Sep 17 00:00:00 2001 From: Ka-Cheong Poon Date: Mon, 8 Oct 2018 09:17:11 -0700 Subject: [PATCH 112/134] rds: RDS (tcp) hangs on sendto() to unresponding address In rds_send_mprds_hash(), if the calculated hash value is non-zero and the MPRDS connections are not yet up, it will wait. But it should not wait if the send is non-blocking. In this case, it should just use the base c_path for sending the message. Signed-off-by: Ka-Cheong Poon Acked-by: Santosh Shilimkar Signed-off-by: David S. Miller --- net/rds/send.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/rds/send.c b/net/rds/send.c index 57b3d5a8b2db..fe785ee819dd 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -1007,7 +1007,8 @@ static int rds_cmsg_send(struct rds_sock *rs, struct rds_message *rm, return ret; } -static int rds_send_mprds_hash(struct rds_sock *rs, struct rds_connection *conn) +static int rds_send_mprds_hash(struct rds_sock *rs, + struct rds_connection *conn, int nonblock) { int hash; @@ -1023,10 +1024,16 @@ static int rds_send_mprds_hash(struct rds_sock *rs, struct rds_connection *conn) * used. But if we are interrupted, we have to use the zero * c_path in case the connection ends up being non-MP capable. */ - if (conn->c_npaths == 0) + if (conn->c_npaths == 0) { + /* Cannot wait for the connection be made, so just use + * the base c_path. + */ + if (nonblock) + return 0; if (wait_event_interruptible(conn->c_hs_waitq, conn->c_npaths != 0)) hash = 0; + } if (conn->c_npaths == 1) hash = 0; } @@ -1256,7 +1263,7 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len) } if (conn->c_trans->t_mp_capable) - cpath = &conn->c_path[rds_send_mprds_hash(rs, conn)]; + cpath = &conn->c_path[rds_send_mprds_hash(rs, conn, nonblock)]; else cpath = &conn->c_path[0]; From 7abab7b9b498650404800a08765f44929fee8f31 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Tue, 9 Oct 2018 07:02:01 +0300 Subject: [PATCH 113/134] net/ipv6: stop leaking percpu memory in fib6 info The fib6_info_alloc() function allocates percpu memory to hold per CPU pointers to rt6_info, but this memory is never freed. Fix it. Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: Mike Rapoport Reviewed-by: David Ahern Signed-off-by: David S. Miller --- net/ipv6/ip6_fib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 5516f55e214b..cbe46175bb59 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -196,6 +196,8 @@ void fib6_info_destroy_rcu(struct rcu_head *head) *ppcpu_rt = NULL; } } + + free_percpu(f6i->rt6i_pcpu); } lwtstate_put(f6i->fib6_nh.nh_lwtstate); From af7d6cce53694a88d6a1bb60c9a239a6a5144459 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Tue, 9 Oct 2018 17:48:14 +0200 Subject: [PATCH 114/134] net: ipv4: update fnhe_pmtu when first hop's MTU changes Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca Reviewed-by: Stefano Brivio Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/netdevice.h | 7 ++++++ include/net/ip_fib.h | 1 + net/core/dev.c | 28 ++++++++++++++++++++-- net/ipv4/fib_frontend.c | 12 ++++++---- net/ipv4/fib_semantics.c | 50 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 92 insertions(+), 6 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c7861e4b402c..d837dad24b4c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2458,6 +2458,13 @@ struct netdev_notifier_info { struct netlink_ext_ack *extack; }; +struct netdev_notifier_info_ext { + struct netdev_notifier_info info; /* must be first */ + union { + u32 mtu; + } ext; +}; + struct netdev_notifier_change_info { struct netdev_notifier_info info; /* must be first */ unsigned int flags_changed; diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 69c91d1934c1..c9b7b136939d 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -394,6 +394,7 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev); int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force); int fib_sync_down_addr(struct net_device *dev, __be32 local); int fib_sync_up(struct net_device *dev, unsigned int nh_flags); +void fib_sync_mtu(struct net_device *dev, u32 orig_mtu); #ifdef CONFIG_IP_ROUTE_MULTIPATH int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4, diff --git a/net/core/dev.c b/net/core/dev.c index 82114e1111e6..93243479085f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1752,6 +1752,28 @@ int call_netdevice_notifiers(unsigned long val, struct net_device *dev) } EXPORT_SYMBOL(call_netdevice_notifiers); +/** + * call_netdevice_notifiers_mtu - call all network notifier blocks + * @val: value passed unmodified to notifier function + * @dev: net_device pointer passed unmodified to notifier function + * @arg: additional u32 argument passed to the notifier function + * + * Call all network notifier blocks. Parameters and return value + * are as for raw_notifier_call_chain(). + */ +static int call_netdevice_notifiers_mtu(unsigned long val, + struct net_device *dev, u32 arg) +{ + struct netdev_notifier_info_ext info = { + .info.dev = dev, + .ext.mtu = arg, + }; + + BUILD_BUG_ON(offsetof(struct netdev_notifier_info_ext, info) != 0); + + return call_netdevice_notifiers_info(val, &info.info); +} + #ifdef CONFIG_NET_INGRESS static DEFINE_STATIC_KEY_FALSE(ingress_needed_key); @@ -7574,14 +7596,16 @@ int dev_set_mtu_ext(struct net_device *dev, int new_mtu, err = __dev_set_mtu(dev, new_mtu); if (!err) { - err = call_netdevice_notifiers(NETDEV_CHANGEMTU, dev); + err = call_netdevice_notifiers_mtu(NETDEV_CHANGEMTU, dev, + orig_mtu); err = notifier_to_errno(err); if (err) { /* setting mtu back and notifying everyone again, * so that they have a chance to revert changes. */ __dev_set_mtu(dev, orig_mtu); - call_netdevice_notifiers(NETDEV_CHANGEMTU, dev); + call_netdevice_notifiers_mtu(NETDEV_CHANGEMTU, dev, + new_mtu); } } return err; diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 2998b0e47d4b..0113993e9b2c 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1243,7 +1243,8 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event, static int fib_netdev_event(struct notifier_block *this, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); - struct netdev_notifier_changeupper_info *info; + struct netdev_notifier_changeupper_info *upper_info = ptr; + struct netdev_notifier_info_ext *info_ext = ptr; struct in_device *in_dev; struct net *net = dev_net(dev); unsigned int flags; @@ -1278,16 +1279,19 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo fib_sync_up(dev, RTNH_F_LINKDOWN); else fib_sync_down_dev(dev, event, false); - /* fall through */ + rt_cache_flush(net); + break; case NETDEV_CHANGEMTU: + fib_sync_mtu(dev, info_ext->ext.mtu); rt_cache_flush(net); break; case NETDEV_CHANGEUPPER: - info = ptr; + upper_info = ptr; /* flush all routes if dev is linked to or unlinked from * an L3 master device (e.g., VRF) */ - if (info->upper_dev && netif_is_l3_master(info->upper_dev)) + if (upper_info->upper_dev && + netif_is_l3_master(upper_info->upper_dev)) fib_disable_ip(dev, NETDEV_DOWN, true); break; } diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index f3c89ccf14c5..446204ca7406 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -1470,6 +1470,56 @@ static int call_fib_nh_notifiers(struct fib_nh *fib_nh, return NOTIFY_DONE; } +/* Update the PMTU of exceptions when: + * - the new MTU of the first hop becomes smaller than the PMTU + * - the old MTU was the same as the PMTU, and it limited discovery of + * larger MTUs on the path. With that limit raised, we can now + * discover larger MTUs + * A special case is locked exceptions, for which the PMTU is smaller + * than the minimal accepted PMTU: + * - if the new MTU is greater than the PMTU, don't make any change + * - otherwise, unlock and set PMTU + */ +static void nh_update_mtu(struct fib_nh *nh, u32 new, u32 orig) +{ + struct fnhe_hash_bucket *bucket; + int i; + + bucket = rcu_dereference_protected(nh->nh_exceptions, 1); + if (!bucket) + return; + + for (i = 0; i < FNHE_HASH_SIZE; i++) { + struct fib_nh_exception *fnhe; + + for (fnhe = rcu_dereference_protected(bucket[i].chain, 1); + fnhe; + fnhe = rcu_dereference_protected(fnhe->fnhe_next, 1)) { + if (fnhe->fnhe_mtu_locked) { + if (new <= fnhe->fnhe_pmtu) { + fnhe->fnhe_pmtu = new; + fnhe->fnhe_mtu_locked = false; + } + } else if (new < fnhe->fnhe_pmtu || + orig == fnhe->fnhe_pmtu) { + fnhe->fnhe_pmtu = new; + } + } + } +} + +void fib_sync_mtu(struct net_device *dev, u32 orig_mtu) +{ + unsigned int hash = fib_devindex_hashfn(dev->ifindex); + struct hlist_head *head = &fib_info_devhash[hash]; + struct fib_nh *nh; + + hlist_for_each_entry(nh, head, nh_hash) { + if (nh->nh_dev == dev) + nh_update_mtu(nh, dev->mtu, orig_mtu); + } +} + /* Event force Flags Description * NETDEV_CHANGE 0 LINKDOWN Carrier OFF, not for scope host * NETDEV_DOWN 0 LINKDOWN|DEAD Link down, not for scope host From 28d35bcdd3925e7293408cdb8aa5f2aac5f0d6e3 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Tue, 9 Oct 2018 17:48:15 +0200 Subject: [PATCH 115/134] net: ipv4: don't let PMTU updates increase route MTU When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is received, we must clamp its value. However, we can receive a PMTU exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an increase in PMTU. To fix this, take the smallest of the old MTU and ip_rt_min_pmtu. Before this patch, in case of an update, the exception's MTU would always change. Now, an exception can have only its lock flag updated, but not the MTU, so we need to add a check on locking to the following "is this exception getting updated, or close to expiring?" test. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") Signed-off-by: Sabrina Dubroca Reviewed-by: Stefano Brivio Signed-off-by: David S. Miller --- net/ipv4/route.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index b678466da451..8501554e96a4 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1001,21 +1001,22 @@ out: kfree_skb(skb); static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) { struct dst_entry *dst = &rt->dst; + u32 old_mtu = ipv4_mtu(dst); struct fib_result res; bool lock = false; if (ip_mtu_locked(dst)) return; - if (ipv4_mtu(dst) < mtu) + if (old_mtu < mtu) return; if (mtu < ip_rt_min_pmtu) { lock = true; - mtu = ip_rt_min_pmtu; + mtu = min(old_mtu, ip_rt_min_pmtu); } - if (rt->rt_pmtu == mtu && + if (rt->rt_pmtu == mtu && !lock && time_before(jiffies, dst->expires - ip_rt_mtu_expires / 2)) return; From 047491ea334a454fa0647ec99dadcc6dd38417e0 Mon Sep 17 00:00:00 2001 From: Jon Maloy Date: Wed, 10 Oct 2018 17:34:01 +0200 Subject: [PATCH 116/134] tipc: set link tolerance correctly in broadcast link In the patch referred to below we added link tolerance as an additional criteria for declaring broadcast transmission "stale" and resetting the affected links. However, the 'tolerance' field of the broadcast link is never set, and remains at zero. This renders the whole commit without the intended improving effect, but luckily also with no negative effect. In this commit we add the missing initialization. Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission") Signed-off-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/link.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/net/tipc/link.c b/net/tipc/link.c index fb886b525d95..d229a36968da 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -477,6 +477,8 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id, l->in_session = false; l->bearer_id = bearer_id; l->tolerance = tolerance; + if (bc_rcvlink) + bc_rcvlink->tolerance = tolerance; l->net_plane = net_plane; l->advertised_mtu = mtu; l->mtu = mtu; @@ -1031,7 +1033,7 @@ static int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r, /* Detect repeated retransmit failures on same packet */ if (r->last_retransm != buf_seqno(skb)) { r->last_retransm = buf_seqno(skb); - r->stale_limit = jiffies + msecs_to_jiffies(l->tolerance); + r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance); } else if (++r->stale_cnt > 99 && time_after(jiffies, r->stale_limit)) { link_retransmit_failure(l, skb); if (link_is_bc_sndlink(l)) @@ -1576,9 +1578,10 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, strncpy(if_name, data, TIPC_MAX_IF_NAME); /* Update own tolerance if peer indicates a non-zero value */ - if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL)) + if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL)) { l->tolerance = peers_tol; - + l->bc_rcvlink->tolerance = peers_tol; + } /* Update own priority if peer's priority is higher */ if (in_range(peers_prio, l->priority + 1, TIPC_MAX_LINK_PRI)) l->priority = peers_prio; @@ -1604,9 +1607,10 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, l->rcv_nxt_state = msg_seqno(hdr) + 1; /* Update own tolerance if peer indicates a non-zero value */ - if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL)) + if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL)) { l->tolerance = peers_tol; - + l->bc_rcvlink->tolerance = peers_tol; + } /* Update own prio if peer indicates a different value */ if ((peers_prio != l->priority) && in_range(peers_prio, 1, TIPC_MAX_LINK_PRI)) { @@ -2223,6 +2227,8 @@ void tipc_link_set_tolerance(struct tipc_link *l, u32 tol, struct sk_buff_head *xmitq) { l->tolerance = tol; + if (l->bc_rcvlink) + l->bc_rcvlink->tolerance = tol; if (link_is_up(l)) tipc_link_build_proto_msg(l, STATE_MSG, 0, 0, 0, tol, 0, xmitq); } From e7eb05823806502747eadc31039cecfd7836ddeb Mon Sep 17 00:00:00 2001 From: Parthasarathy Bhuvaragan Date: Wed, 10 Oct 2018 17:50:23 +0200 Subject: [PATCH 117/134] tipc: queue socket protocol error messages into socket receive buffer In tipc_sk_filter_rcv(), when we detect protocol messages with error we call tipc_sk_conn_proto_rcv() and let it reset the connection and notify the socket by calling sk->sk_state_change(). However, tipc_sk_filter_rcv() may have been called from the function tipc_backlog_rcv(), in which case the socket lock is held and the socket already awake. This means that the sk_state_change() call is ignored and the error notification lost. Now the receive queue will remain empty and the socket sleeps forever. In this commit, we convert the protocol message into a connection abort message and enqueue it into the socket's receive queue. By this addition to the above state change we cover all conditions. Acked-by: Ying Xue Signed-off-by: Parthasarathy Bhuvaragan Signed-off-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/socket.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index b6f99b021d09..49810fdff4c5 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1196,6 +1196,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq, * @skb: pointer to message buffer. */ static void tipc_sk_conn_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb, + struct sk_buff_head *inputq, struct sk_buff_head *xmitq) { struct tipc_msg *hdr = buf_msg(skb); @@ -1213,7 +1214,16 @@ static void tipc_sk_conn_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb, tipc_node_remove_conn(sock_net(sk), tsk_peer_node(tsk), tsk_peer_port(tsk)); sk->sk_state_change(sk); - goto exit; + + /* State change is ignored if socket already awake, + * - convert msg to abort msg and add to inqueue + */ + msg_set_user(hdr, TIPC_CRITICAL_IMPORTANCE); + msg_set_type(hdr, TIPC_CONN_MSG); + msg_set_size(hdr, BASIC_H_SIZE); + msg_set_hdr_sz(hdr, BASIC_H_SIZE); + __skb_queue_tail(inputq, skb); + return; } tsk->probe_unacked = false; @@ -1936,7 +1946,7 @@ static void tipc_sk_proto_rcv(struct sock *sk, switch (msg_user(hdr)) { case CONN_MANAGER: - tipc_sk_conn_proto_rcv(tsk, skb, xmitq); + tipc_sk_conn_proto_rcv(tsk, skb, inputq, xmitq); return; case SOCK_WAKEUP: tipc_dest_del(&tsk->cong_links, msg_orignode(hdr), 0); From 4f7617705bfff84d756fe4401a1f4f032f374984 Mon Sep 17 00:00:00 2001 From: Giacinto Cifelli Date: Wed, 10 Oct 2018 20:05:53 +0200 Subject: [PATCH 118/134] qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added support for Gemalto's Cinterion ALASxx WWAN interfaces by adding QMI_FIXED_INTF with Cinterion's VID and PID. Signed-off-by: Giacinto Cifelli Acked-by: Bjørn Mork Signed-off-by: David S. Miller --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 533b6fb8d923..72a55b6b4211 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1241,6 +1241,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x0b3c, 0xc00b, 4)}, /* Olivetti Olicard 500 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ {QMI_FIXED_INTF(0x1e2d, 0x0053, 4)}, /* Cinterion PHxx,PXxx */ + {QMI_FIXED_INTF(0x1e2d, 0x0063, 10)}, /* Cinterion ALASxx (1 RmNet) */ {QMI_FIXED_INTF(0x1e2d, 0x0082, 4)}, /* Cinterion PHxx,PXxx (2 RmNet) */ {QMI_FIXED_INTF(0x1e2d, 0x0082, 5)}, /* Cinterion PHxx,PXxx (2 RmNet) */ {QMI_FIXED_INTF(0x1e2d, 0x0083, 4)}, /* Cinterion PHxx,PXxx (1 RmNet + USB Audio)*/ From 3c718e677c2b35b449992adc36ecce883c467e98 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Thu, 11 Oct 2018 10:54:52 +0200 Subject: [PATCH 119/134] selftests: rtnetlink.sh explicitly requires bash. the script rtnetlink.sh requires a bash-only features (sleep with sub-second precision). This may cause random test failure if the default shell is not bash. Address the above explicitly requiring bash as the script interpreter. Fixes: 33b01b7b4f19 ("selftests: add rtnetlink test script") Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller --- tools/testing/selftests/net/rtnetlink.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 08c341b49760..e101af52d1d6 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # # This test is for checking rtnetlink callpaths, and get as much coverage as possible. # From 12a2ea962c06efb30764c47b140d2ec9d3cd7cb0 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Thu, 11 Oct 2018 10:54:53 +0200 Subject: [PATCH 120/134] selftests: udpgso_bench.sh explicitly requires bash The udpgso_bench.sh script requires several bash-only features. This may cause random failures if the default shell is not bash. Address the above explicitly requiring bash as the script interpreter Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Paolo Abeni Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- tools/testing/selftests/net/udpgso_bench.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh index 850767befa47..99e537ab5ad9 100755 --- a/tools/testing/selftests/net/udpgso_bench.sh +++ b/tools/testing/selftests/net/udpgso_bench.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # SPDX-License-Identifier: GPL-2.0 # # Run a series of udpgso benchmarks From a1f8dd34e64af689e95122921fb2ca83dedd4c4e Mon Sep 17 00:00:00 2001 From: Ying Xue Date: Thu, 11 Oct 2018 19:57:56 +0800 Subject: [PATCH 121/134] tipc: eliminate possible recursive locking detected by LOCKDEP When booting kernel with LOCKDEP option, below warning info was found: WARNING: possible recursive locking detected 4.19.0-rc7+ #14 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 but task is already holding lock: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&list->lock)->rlock#4); lock(&(&list->lock)->rlock#4); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by swapper/0/1: #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at: register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051 #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1af/0x295 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1759 [inline] check_deadlock kernel/locking/lockdep.c:1803 [inline] validate_chain kernel/locking/lockdep.c:2399 [inline] __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411 lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526 tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521 tipc_init_net+0x472/0x610 net/tipc/core.c:82 ops_init+0xf7/0x520 net/core/net_namespace.c:129 __register_pernet_operations net/core/net_namespace.c:940 [inline] register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011 register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052 tipc_init+0x83/0x104 net/tipc/core.c:140 do_one_initcall+0x109/0x70a init/main.c:885 do_initcall_level init/main.c:953 [inline] do_initcalls init/main.c:961 [inline] do_basic_setup init/main.c:979 [inline] kernel_init_freeable+0x4bd/0x57f init/main.c:1144 kernel_init+0x13/0x180 init/main.c:1063 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413 The reason why the noise above was complained by LOCKDEP is because we nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset function. In fact it's unnecessary to move skb buffer from l->wakeupq queue to l->inputq queue while holding the two locks at the same time. Instead, we can move skb buffers in l->wakeupq queue to a temporary list first and then move the buffers of the temporary list to l->inputq queue, which is also safe for us. Fixes: 3f32d0be6c16 ("tipc: lock wakeup & inputq at tipc_link_reset()") Reported-by: Dmitry Vyukov Signed-off-by: Ying Xue Acked-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/link.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/tipc/link.c b/net/tipc/link.c index d229a36968da..f6552e4f4b43 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -845,15 +845,22 @@ static void link_prepare_wakeup(struct tipc_link *l) void tipc_link_reset(struct tipc_link *l) { + struct sk_buff_head list; + + __skb_queue_head_init(&list); + l->in_session = false; l->session++; l->mtu = l->advertised_mtu; + spin_lock_bh(&l->wakeupq.lock); - spin_lock_bh(&l->inputq->lock); - skb_queue_splice_init(&l->wakeupq, l->inputq); - spin_unlock_bh(&l->inputq->lock); + skb_queue_splice_init(&l->wakeupq, &list); spin_unlock_bh(&l->wakeupq.lock); + spin_lock_bh(&l->inputq->lock); + skb_queue_splice_init(&list, l->inputq); + spin_unlock_bh(&l->inputq->lock); + __skb_queue_purge(&l->transmq); __skb_queue_purge(&l->deferdq); __skb_queue_purge(&l->backlogq); From 26450608348e91a782691dbfcc836478f4381071 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 11 Oct 2018 15:01:19 +0300 Subject: [PATCH 122/134] net/mlx4_core: Fix warnings during boot on driverinit param set failures During boot, mlx4_core sets the driverinit configuration parameters and updates the devlink module on the initial values calling devlink_param_driverinit_value_set(). If devlink_param_driverinit_value_set() returns an error mlx4_core reports kernel module warning. This caused false alarm during boot in case kernel was compiled with CONFIG_NET_DEVLINK off. Fix by removing warning reported in case devlink_param_driverinit_value_set() fails. This actually makes the function mlx4_devlink_set_init_value() redundant to using directly devlink_param_driverinit_value_set() and so removed. It fixes the following kernel trace: mlx4_core 0000:00:06.0: devlink set parameter 0 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 1 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 4 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 5 value failed (err = -95) mlx4_core 0000:00:06.0: devlink set parameter 3 value failed (err = -95) Fixes: bd1b51dc66df ("mlx4: Add mlx4 initial parameters table and register it") Signed-off-by: Moshe Shemesh Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlx4/main.c | 43 ++++++++--------------- 1 file changed, 15 insertions(+), 28 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index d2d59444f562..6a046030e873 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -260,47 +260,34 @@ static const struct devlink_param mlx4_devlink_params[] = { NULL, NULL, NULL), }; -static void mlx4_devlink_set_init_value(struct devlink *devlink, u32 param_id, - union devlink_param_value init_val) -{ - struct mlx4_priv *priv = devlink_priv(devlink); - struct mlx4_dev *dev = &priv->dev; - int err; - - err = devlink_param_driverinit_value_set(devlink, param_id, init_val); - if (err) - mlx4_warn(dev, - "devlink set parameter %u value failed (err = %d)", - param_id, err); -} - static void mlx4_devlink_set_params_init_values(struct devlink *devlink) { union devlink_param_value value; value.vbool = !!mlx4_internal_err_reset; - mlx4_devlink_set_init_value(devlink, - DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, - value); + devlink_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, + value); value.vu32 = 1UL << log_num_mac; - mlx4_devlink_set_init_value(devlink, - DEVLINK_PARAM_GENERIC_ID_MAX_MACS, value); + devlink_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + value); value.vbool = enable_64b_cqe_eqe; - mlx4_devlink_set_init_value(devlink, - MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE, - value); + devlink_param_driverinit_value_set(devlink, + MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE, + value); value.vbool = enable_4k_uar; - mlx4_devlink_set_init_value(devlink, - MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR, - value); + devlink_param_driverinit_value_set(devlink, + MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR, + value); value.vbool = false; - mlx4_devlink_set_init_value(devlink, - DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, - value); + devlink_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, + value); } static inline void mlx4_set_num_reserved_uars(struct mlx4_dev *dev, From 2a1e89df785082a0fd7264ca6d3d834abe84fa25 Mon Sep 17 00:00:00 2001 From: Ilias Apalodimas Date: Thu, 11 Oct 2018 15:28:26 +0300 Subject: [PATCH 123/134] net: socionext: clear rx irq correctly commit 63ae7949e94a ("net: socionext: Use descriptor info instead of MMIO reads on Rx") removed constant mmio reads from the driver and started using a descriptor field to check if packet should be processed. This lead the napi rx handler being constantly called while no packets needed processing and ksoftirq getting 100% cpu usage. Issue one mmio read to clear the irq correcty after processing packets Signed-off-by: Ilias Apalodimas Reported-by: Ard Biesheuvel Tested-by: Ard Biesheuvel Acked-by: Ard Biesheuvel Signed-off-by: David S. Miller --- drivers/net/ethernet/socionext/netsec.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c index 7aa5ebb6766c..4289ccb26e4e 100644 --- a/drivers/net/ethernet/socionext/netsec.c +++ b/drivers/net/ethernet/socionext/netsec.c @@ -735,8 +735,11 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget) u16 idx = dring->tail; struct netsec_de *de = dring->vaddr + (DESC_SZ * idx); - if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) + if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) { + /* reading the register clears the irq */ + netsec_read(priv, NETSEC_REG_NRM_RX_PKTCNT); break; + } /* This barrier is needed to keep us from reading * any other fields out of the netsec_de until we have From 511cfd580f23b0e0fcd5659931ef14c6e2c062b0 Mon Sep 17 00:00:00 2001 From: "Maciej S. Szmigiero" Date: Thu, 11 Oct 2018 16:02:10 +0200 Subject: [PATCH 124/134] r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips It has been reported that since commit 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices") at least RTL_GIGA_MAC_VER_38 NICs work erratically after a resume from suspend. The problem has been traced to a missing RX_MULTI_EN bit in the RxConfig register. We already set this bit for RTL_GIGA_MAC_VER_35 NICs of the same 8168F chip family so let's do it also for its other siblings: RTL_GIGA_MAC_VER_36 and RTL_GIGA_MAC_VER_38. Curiously, the NIC seems to work fine after a system boot without having this bit set as long as the system isn't suspended and resumed. Fixes: 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices") Reported-by: Chris Clayton Signed-off-by: Maciej S. Szmigiero Reviewed-by: Heiner Kallweit Tested-by: Chris Clayton Signed-off-by: David S. Miller --- drivers/net/ethernet/realtek/r8169.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 9a5e2969df61..3a5e6160bf0d 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -4282,8 +4282,8 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp) RTL_W32(tp, RxConfig, RX_FIFO_THRESH | RX_DMA_BURST); break; case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24: - case RTL_GIGA_MAC_VER_34: - case RTL_GIGA_MAC_VER_35: + case RTL_GIGA_MAC_VER_34 ... RTL_GIGA_MAC_VER_36: + case RTL_GIGA_MAC_VER_38: RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST); break; case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51: From 052858663db31bd1ead76744df5d39d8bb703c77 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Thu, 11 Oct 2018 17:06:21 +0200 Subject: [PATCH 125/134] net: phy: sfp: remove sfp_mutex's definition The sfp_mutex variable is defined but never used in this file. Not even in the commit that introduced that variable. Remove sfp_mutex, it has no purpose. Cc: Andrew Lunn Cc: Florian Fainelli Cc: "David S. Miller" Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- drivers/net/phy/sfp.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 6e13b8832bc7..fd8bb998ae52 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -163,8 +163,6 @@ static const enum gpiod_flags gpio_flags[] = { /* Give this long for the PHY to reset. */ #define T_PHY_RESET_MS 50 -static DEFINE_MUTEX(sfp_mutex); - struct sff_data { unsigned int gpios; bool (*module_supported)(const struct sfp_eeprom_id *id); From 8dcf86caa1e3daf4a6ccf38e97f4f752b411f829 Mon Sep 17 00:00:00 2001 From: Peter Oberparleiter Date: Thu, 13 Sep 2018 12:59:59 +0200 Subject: [PATCH 126/134] vmlinux.lds.h: Fix incomplete .text.exit discards Enabling CONFIG_GCOV_PROFILE_ALL=y causes linker errors on ARM: `.text.exit' referenced in section `.ARM.exidx.text.exit': defined in discarded section `.text.exit' `.text.exit' referenced in section `.fini_array.00100': defined in discarded section `.text.exit' And related errors on NDS32: `.text.exit' referenced in section `.dtors.65435': defined in discarded section `.text.exit' The gcov compiler flags cause certain compiler versions to generate additional destructor-related sections that are not yet handled by the linker script, resulting in references between discarded and non-discarded sections. Since destructors are not used in the Linux kernel, fix this by discarding these additional sections. Reported-by: Arnd Bergmann Tested-by: Arnd Bergmann Acked-by: Arnd Bergmann Reported-by: Greentime Hu Tested-by: Masami Hiramatsu Signed-off-by: Peter Oberparleiter Signed-off-by: Stephen Rothwell --- arch/arm/kernel/vmlinux.lds.h | 2 ++ include/asm-generic/vmlinux.lds.h | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/arm/kernel/vmlinux.lds.h b/arch/arm/kernel/vmlinux.lds.h index ae5fdff18406..8247bc15addc 100644 --- a/arch/arm/kernel/vmlinux.lds.h +++ b/arch/arm/kernel/vmlinux.lds.h @@ -49,6 +49,8 @@ #define ARM_DISCARD \ *(.ARM.exidx.exit.text) \ *(.ARM.extab.exit.text) \ + *(.ARM.exidx.text.exit) \ + *(.ARM.extab.text.exit) \ ARM_CPU_DISCARD(*(.ARM.exidx.cpuexit.text)) \ ARM_CPU_DISCARD(*(.ARM.extab.cpuexit.text)) \ ARM_EXIT_DISCARD(EXIT_TEXT) \ diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 7b75ff6e2fce..b4d74b1c1e1d 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -613,8 +613,8 @@ #define EXIT_DATA \ *(.exit.data .exit.data.*) \ - *(.fini_array) \ - *(.dtors) \ + *(.fini_array .fini_array.*) \ + *(.dtors .dtors.*) \ MEM_DISCARD(exit.data*) \ MEM_DISCARD(exit.rodata*) From 52c8ee5bad8f33d02c567f6609f43d69303fc48d Mon Sep 17 00:00:00 2001 From: Peter Oberparleiter Date: Thu, 13 Sep 2018 13:00:00 +0200 Subject: [PATCH 127/134] vmlinux.lds.h: Fix linker warnings about orphan .LPBX sections Enabling both CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y and CONFIG_GCOV_PROFILE_ALL=y results in linker warnings: warning: orphan section `.data..LPBX1' being placed in section `.data..LPBX1'. LD_DEAD_CODE_DATA_ELIMINATION adds compiler flag -fdata-sections. This option causes GCC to create separate data sections for data objects, including those generated by GCC internally for gcov profiling. The names of these objects start with a dot (.LPBX0, .LPBX1), resulting in section names starting with 'data..'. As section names starting with 'data..' are used for specific purposes in the Linux kernel, the linker script does not automatically include them in the output data section, resulting in the "orphan section" linker warnings. Fix this by specifically including sections named "data..LPBX*" in the data section. Reported-by: Stephen Rothwell Tested-by: Stephen Rothwell Tested-by: Arnd Bergmann Acked-by: Arnd Bergmann Signed-off-by: Peter Oberparleiter Signed-off-by: Stephen Rothwell --- include/asm-generic/vmlinux.lds.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index b4d74b1c1e1d..d7701d466b60 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -68,7 +68,7 @@ */ #ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION #define TEXT_MAIN .text .text.[0-9a-zA-Z_]* -#define DATA_MAIN .data .data.[0-9a-zA-Z_]* +#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX* #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]* #define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* #define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* From bf3b452b7af787b8bf27de6490dc4eedf6f97599 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 9 Oct 2018 16:48:57 -0700 Subject: [PATCH 128/134] net: dsa: bcm_sf2: Fix unbind ordering The order in which we release resources is unfortunately leading to bus errors while dismantling the port. This is because we set priv->wol_ports_mask to 0 to tell bcm_sf2_sw_suspend() that it is now permissible to clock gate the switch. Later on, when dsa_slave_destroy() comes in from dsa_unregister_switch() and calls dsa_switch_ops::port_disable, we perform the same dismantling again, and this time we hit registers that are clock gated. Make sure that dsa_unregister_switch() is the first thing that happens, which takes care of releasing all user visible resources, then proceed with clock gating hardware. We still need to set priv->wol_ports_mask to 0 to make sure that an enabled port properly gets disabled in case it was previously used as part of Wake-on-LAN. Fixes: d9338023fb8e ("net: dsa: bcm_sf2: Make it a real platform device driver") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/dsa/bcm_sf2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index e0066adcd2f3..b6d8e849a949 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -1173,10 +1173,10 @@ static int bcm_sf2_sw_remove(struct platform_device *pdev) { struct bcm_sf2_priv *priv = platform_get_drvdata(pdev); - /* Disable all ports and interrupts */ priv->wol_ports_mask = 0; - bcm_sf2_sw_suspend(priv->dev->ds); dsa_unregister_switch(priv->dev->ds); + /* Disable all ports and interrupts */ + bcm_sf2_sw_suspend(priv->dev->ds); bcm_sf2_mdio_unregister(priv); return 0; From 54baca096386d862d19c10f58f34bf787c6b3cbe Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 9 Oct 2018 16:48:58 -0700 Subject: [PATCH 129/134] net: dsa: bcm_sf2: Call setup during switch resume There is no reason to open code what the switch setup function does, in fact, because we just issued a switch reset, we would make all the register get their default values, including for instance, having unused port be enabled again and wasting power and leading to an inappropriate switch core clock being selected. Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/dsa/bcm_sf2.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index b6d8e849a949..fc8b48adf38b 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -703,7 +703,6 @@ static int bcm_sf2_sw_suspend(struct dsa_switch *ds) static int bcm_sf2_sw_resume(struct dsa_switch *ds) { struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); - unsigned int port; int ret; ret = bcm_sf2_sw_rst(priv); @@ -715,14 +714,7 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds) if (priv->hw_params.num_gphy == 1) bcm_sf2_gphy_enable_set(ds, true); - for (port = 0; port < DSA_MAX_PORTS; port++) { - if (dsa_is_user_port(ds, port)) - bcm_sf2_port_setup(ds, port, NULL); - else if (dsa_is_cpu_port(ds, port)) - bcm_sf2_imp_setup(ds, port); - } - - bcm_sf2_enable_acb(ds); + ds->ops->setup(ds); return 0; } From f0fe77f601c3d6a821198f88f7adb0a05b8fe03e Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 11 Oct 2018 13:06:17 +0200 Subject: [PATCH 130/134] lib/bch: fix possible stack overrun The previous patch introduced very large kernel stack usage and a Makefile change to hide the warning about it. From what I can tell, a number of things went wrong here: - The BCH_MAX_T constant was set to the maximum value for 'n', not the maximum for 't', which is much smaller. - The stack usage is actually larger than the entire kernel stack on some architectures that can use 4KB stacks (m68k, sh, c6x), which leads to an immediate overrun. - The justification in the patch description claimed that nothing changed, however that is not the case even without the two points above: the configuration is machine specific, and most boards never use the maximum BCH_ECC_WORDS() length but instead have something much smaller. That maximum would only apply to machines that use both the maximum block size and the maximum ECC strength. The largest value for 't' that I could find is '32', which in turn leads to a 60 byte array instead of 2048 bytes. Making it '64' for future extension seems also worthwhile, with 120 bytes for the array. Anything larger won't fit into the OOB area on NAND flash. With that changed, the warning can be enabled again. Only linux-4.19+ contains the breakage, so this is only needed as a stable backport if it does not make it into the release. Fixes: 02361bc77888 ("lib/bch: Remove VLA usage") Reported-by: Ard Biesheuvel Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann Signed-off-by: Boris Brezillon --- lib/Makefile | 1 - lib/bch.c | 17 +++++++++++++---- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/lib/Makefile b/lib/Makefile index ca3f7ebb900d..423876446810 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -119,7 +119,6 @@ obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/ obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/ obj-$(CONFIG_REED_SOLOMON) += reed_solomon/ obj-$(CONFIG_BCH) += bch.o -CFLAGS_bch.o := $(call cc-option,-Wframe-larger-than=4500) obj-$(CONFIG_LZO_COMPRESS) += lzo/ obj-$(CONFIG_LZO_DECOMPRESS) += lzo/ obj-$(CONFIG_LZ4_COMPRESS) += lz4/ diff --git a/lib/bch.c b/lib/bch.c index 7b0f2006698b..5db6d3a4c8a6 100644 --- a/lib/bch.c +++ b/lib/bch.c @@ -79,20 +79,19 @@ #define GF_T(_p) (CONFIG_BCH_CONST_T) #define GF_N(_p) ((1 << (CONFIG_BCH_CONST_M))-1) #define BCH_MAX_M (CONFIG_BCH_CONST_M) +#define BCH_MAX_T (CONFIG_BCH_CONST_T) #else #define GF_M(_p) ((_p)->m) #define GF_T(_p) ((_p)->t) #define GF_N(_p) ((_p)->n) -#define BCH_MAX_M 15 +#define BCH_MAX_M 15 /* 2KB */ +#define BCH_MAX_T 64 /* 64 bit correction */ #endif -#define BCH_MAX_T (((1 << BCH_MAX_M) - 1) / BCH_MAX_M) - #define BCH_ECC_WORDS(_p) DIV_ROUND_UP(GF_M(_p)*GF_T(_p), 32) #define BCH_ECC_BYTES(_p) DIV_ROUND_UP(GF_M(_p)*GF_T(_p), 8) #define BCH_ECC_MAX_WORDS DIV_ROUND_UP(BCH_MAX_M * BCH_MAX_T, 32) -#define BCH_ECC_MAX_BYTES DIV_ROUND_UP(BCH_MAX_M * BCH_MAX_T, 8) #ifndef dbg #define dbg(_fmt, args...) do {} while (0) @@ -202,6 +201,9 @@ void encode_bch(struct bch_control *bch, const uint8_t *data, const uint32_t * const tab3 = tab2 + 256*(l+1); const uint32_t *pdata, *p0, *p1, *p2, *p3; + if (WARN_ON(r_bytes > sizeof(r))) + return; + if (ecc) { /* load ecc parity bytes into internal 32-bit buffer */ load_ecc8(bch, bch->ecc_buf, ecc); @@ -1285,6 +1287,13 @@ struct bch_control *init_bch(int m, int t, unsigned int prim_poly) */ goto fail; + if (t > BCH_MAX_T) + /* + * we can support larger than 64 bits if necessary, at the + * cost of higher stack usage. + */ + goto fail; + /* sanity checks */ if ((t < 1) || (m*t >= ((1 << m)-1))) /* invalid t value */ From 6b3944e42e2e554aa5a4be681ecd70dccd459114 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 11 Oct 2018 22:45:49 +0100 Subject: [PATCH 131/134] afs: Fix cell proc list Access to the list of cells by /proc/net/afs/cells has a couple of problems: (1) It should be checking against SEQ_START_TOKEN for the keying the header line. (2) It's only holding the RCU read lock, so it can't just walk over the list without following the proper RCU methods. Fix these by using an hlist instead of an ordinary list and using the appropriate accessor functions to follow it with RCU. Since the code that adds a cell to the list must also necessarily change, sort the list on insertion whilst we're at it. Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- fs/afs/cell.c | 17 +++++++++++++++-- fs/afs/dynroot.c | 2 +- fs/afs/internal.h | 4 ++-- fs/afs/main.c | 2 +- fs/afs/proc.c | 7 +++---- 5 files changed, 22 insertions(+), 10 deletions(-) diff --git a/fs/afs/cell.c b/fs/afs/cell.c index f3d0bef16d78..6127f0fcd62c 100644 --- a/fs/afs/cell.c +++ b/fs/afs/cell.c @@ -514,6 +514,8 @@ static int afs_alloc_anon_key(struct afs_cell *cell) */ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell) { + struct hlist_node **p; + struct afs_cell *pcell; int ret; if (!cell->anonymous_key) { @@ -534,7 +536,18 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell) return ret; mutex_lock(&net->proc_cells_lock); - list_add_tail(&cell->proc_link, &net->proc_cells); + for (p = &net->proc_cells.first; *p; p = &(*p)->next) { + pcell = hlist_entry(*p, struct afs_cell, proc_link); + if (strcmp(cell->name, pcell->name) < 0) + break; + } + + cell->proc_link.pprev = p; + cell->proc_link.next = *p; + rcu_assign_pointer(*p, &cell->proc_link.next); + if (cell->proc_link.next) + cell->proc_link.next->pprev = &cell->proc_link.next; + afs_dynroot_mkdir(net, cell); mutex_unlock(&net->proc_cells_lock); return 0; @@ -550,7 +563,7 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell) afs_proc_cell_remove(cell); mutex_lock(&net->proc_cells_lock); - list_del_init(&cell->proc_link); + hlist_del_rcu(&cell->proc_link); afs_dynroot_rmdir(net, cell); mutex_unlock(&net->proc_cells_lock); diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c index 1cde710a8013..f29c6dade7f6 100644 --- a/fs/afs/dynroot.c +++ b/fs/afs/dynroot.c @@ -265,7 +265,7 @@ int afs_dynroot_populate(struct super_block *sb) return -ERESTARTSYS; net->dynroot_sb = sb; - list_for_each_entry(cell, &net->proc_cells, proc_link) { + hlist_for_each_entry(cell, &net->proc_cells, proc_link) { ret = afs_dynroot_mkdir(net, cell); if (ret < 0) goto error; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 871a228d7f37..34c02fdcc25f 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -242,7 +242,7 @@ struct afs_net { seqlock_t cells_lock; struct mutex proc_cells_lock; - struct list_head proc_cells; + struct hlist_head proc_cells; /* Known servers. Theoretically each fileserver can only be in one * cell, but in practice, people create aliases and subsets and there's @@ -320,7 +320,7 @@ struct afs_cell { struct afs_net *net; struct key *anonymous_key; /* anonymous user key for this cell */ struct work_struct manager; /* Manager for init/deinit/dns */ - struct list_head proc_link; /* /proc cell list link */ + struct hlist_node proc_link; /* /proc cell list link */ #ifdef CONFIG_AFS_FSCACHE struct fscache_cookie *cache; /* caching cookie */ #endif diff --git a/fs/afs/main.c b/fs/afs/main.c index e84fe822a960..107427688edd 100644 --- a/fs/afs/main.c +++ b/fs/afs/main.c @@ -87,7 +87,7 @@ static int __net_init afs_net_init(struct net *net_ns) timer_setup(&net->cells_timer, afs_cells_timer, 0); mutex_init(&net->proc_cells_lock); - INIT_LIST_HEAD(&net->proc_cells); + INIT_HLIST_HEAD(&net->proc_cells); seqlock_init(&net->fs_lock); net->fs_servers = RB_ROOT; diff --git a/fs/afs/proc.c b/fs/afs/proc.c index 476dcbb79713..9101f62707af 100644 --- a/fs/afs/proc.c +++ b/fs/afs/proc.c @@ -33,9 +33,8 @@ static inline struct afs_net *afs_seq2net_single(struct seq_file *m) static int afs_proc_cells_show(struct seq_file *m, void *v) { struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link); - struct afs_net *net = afs_seq2net(m); - if (v == &net->proc_cells) { + if (v == SEQ_START_TOKEN) { /* display header on line 1 */ seq_puts(m, "USE NAME\n"); return 0; @@ -50,12 +49,12 @@ static void *afs_proc_cells_start(struct seq_file *m, loff_t *_pos) __acquires(rcu) { rcu_read_lock(); - return seq_list_start_head(&afs_seq2net(m)->proc_cells, *_pos); + return seq_hlist_start_head_rcu(&afs_seq2net(m)->proc_cells, *_pos); } static void *afs_proc_cells_next(struct seq_file *m, void *v, loff_t *pos) { - return seq_list_next(v, &afs_seq2net(m)->proc_cells, pos); + return seq_hlist_next_rcu(v, &afs_seq2net(m)->proc_cells, pos); } static void afs_proc_cells_stop(struct seq_file *m, void *v) From 38a12607a82f3cba4149d36cc54695bd33f5ce04 Mon Sep 17 00:00:00 2001 From: Peter Rosin Date: Fri, 12 Oct 2018 14:46:40 +0000 Subject: [PATCH 132/134] mux: adgs1408: use the correct MODULE_LICENSE The file is GPL v2 or later. Acked-by: Mircea Caprioru Signed-off-by: Peter Rosin Signed-off-by: Greg Kroah-Hartman --- drivers/mux/adgs1408.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mux/adgs1408.c b/drivers/mux/adgs1408.c index 0f7cf54e3234..89096f10f4c4 100644 --- a/drivers/mux/adgs1408.c +++ b/drivers/mux/adgs1408.c @@ -128,4 +128,4 @@ module_spi_driver(adgs1408_driver); MODULE_AUTHOR("Mircea Caprioru "); MODULE_DESCRIPTION("Analog Devices ADGS1408 MUX driver"); -MODULE_LICENSE("GPL v2"); +MODULE_LICENSE("GPL"); From b40afc0066405e5ecfce73949b56ddd6bb65bd10 Mon Sep 17 00:00:00 2001 From: Peter Rosin Date: Fri, 12 Oct 2018 14:46:46 +0000 Subject: [PATCH 133/134] MAINTAINERS: use the correct location for dt-bindings includes for mux Just drop the "linux" part of the path, it was never correct. Reported-by: Joe Perches Fixes: 256ac0375098 ("dt-bindings: document devicetree bindings for mux-controllers and gpio-mux") Signed-off-by: Peter Rosin Signed-off-by: Greg Kroah-Hartman --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 40082e45b3da..6ac000cc006d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9865,7 +9865,7 @@ M: Peter Rosin S: Maintained F: Documentation/ABI/testing/sysfs-class-mux* F: Documentation/devicetree/bindings/mux/ -F: include/linux/dt-bindings/mux/ +F: include/dt-bindings/mux/ F: include/linux/mux/ F: drivers/mux/ From f014ffb025c159fd51d19af8af0022a991aaa4f8 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 12 Oct 2018 14:00:57 +0100 Subject: [PATCH 134/134] afs: Fix afs_server struct leak Fix a leak of afs_server structs. The routine that installs them in the various lookup lists and trees gets a ref on leaving the function, whether it added the server or a server already exists. It shouldn't increment the refcount if it added the server. The effect of this that "rmmod kafs" will hang waiting for the leaked server to become unused. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- fs/afs/server.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/afs/server.c b/fs/afs/server.c index 1d329e6981d5..2f306c0cc4ee 100644 --- a/fs/afs/server.c +++ b/fs/afs/server.c @@ -199,9 +199,11 @@ static struct afs_server *afs_install_server(struct afs_net *net, write_sequnlock(&net->fs_addr_lock); ret = 0; + goto out; exists: afs_get_server(server); +out: write_sequnlock(&net->fs_lock); return server; }