linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 12:28:41 +08:00

Author	SHA1	Message	Date
Christophe Leroy	5c35a02c54	powerpc: clean the inclusion of stringify.h Only include linux/stringify.h is files using __stringify() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-30 22:48:17 +10:00
Christophe Leroy	ec0c464cdb	powerpc: move ASM_CONST and stringify_in_c() into asm-const.h This patch moves ASM_CONST() and stringify_in_c() into dedicated asm-const.h, then cleans all related inclusions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: asm-compat.h should include asm-const.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-30 22:48:16 +10:00
Christophe Leroy	36a7eeaff7	powerpc/405: move PPC405_ERR77 in asm-405.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-30 22:48:13 +10:00
Christophe Leroy	8c58259bba	powerpc: remove unneeded inclusions of cpu_has_feature.h Files not using cpu_has_feature() don't need cpu_has_feature.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-30 22:47:54 +10:00
Christophe Leroy	db0a2b633d	powerpc: remove kdump.h from page.h page.h doesn't need kdump.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-30 22:47:53 +10:00
Nicholas Piggin	17cc1dd492	powerpc/powernv: implement opal_put_chars_atomic The RAW console does not need writes to be atomic, so relax opal_put_chars to be able to do partial writes, and implement an _atomic variant which does not take a spinlock. This API is used in xmon, so the less locking that is used, the better chance there is that a crash can be debugged. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:57 +10:00
Nicholas Piggin	ac4ac788fd	powerpc/powernv: move opal console flushing to udbg OPAL console writes do not have to synchronously flush firmware / hardware buffers unless they are going through the udbg path. Remove the unconditional flushing from opal_put_chars. Flush if there was no space in the buffer as an optimisation (callers loop waiting for success in that case). udbg flushing is moved to udbg_opal_putc. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:57 +10:00
Nicholas Piggin	b74d2807ae	powerpc/powernv: Remove OPALv1 support from opal console driver opal_put_chars deals with partial writes because in OPALv1, opal_console_write_buffer_space did not work correctly. That firmware is not supported. This reworks the opal_put_chars code to no longer deal with partial writes by turning them into full writes. Partial write handling is still supported in terms of what gets returned to the caller, but it may not go to the console atomically. A warning message is printed in this case. This allows console flushing to be moved out of the opal_write_lock spinlock. That could cause the lock to be held for long periods if the console is busy (especially if it was being spammed by firmware), which is dangerous because the lock is taken by xmon to debug the system. Flushing outside the lock improves the situation a bit. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:56 +10:00
Nicholas Piggin	d2a2262e68	powerpc/powernv: Implement and use opal_flush_console A new console flushing firmware API was introduced to replace event polling loops, and implemented in opal-kmsg with `affddff69c` ("powerpc/powernv: Add a kmsg_dumper that flushes console output on panic"), to flush the console in the panic path. The OPAL console driver has other situations where interrupts are off and it needs to flush the console synchronously. These still use a polling loop. So move the opal-kmsg flush code to opal_flush_console, and use the new function in opal-kmsg and opal_put_chars. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:56 +10:00
Nicholas Piggin	e00da0f2db	powerpc/powernv: opal-kmsg use flush fallback from console code Use the more refined and tested event polling loop from opal_put_chars as the fallback console flush in the opal-kmsg path. This loop is used by the console driver today, whereas the opal-kmsg fallback is not likely to have been used for years. Use WARN_ONCE rather than a printk when the fallback is invoked to prepare for moving the console flush into a common function. Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:56 +10:00
Nicholas Piggin	3a80bfc7ea	powerpc/powernv: opal-kmsg standardise OPAL_BUSY handling OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY, so implement the standard OPAL_BUSY handling for it. Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:55 +10:00
Nicholas Piggin	36d2dabc87	powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops The OPAL console driver does not delay in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware. It can't yet be made to sleep because it is called under spinlock, but it can be changed to the standard OPAL_BUSY loop form, and a delay added to keep it from hitting the firmware too frequently. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:55 +10:00
Nicholas Piggin	bd90284cc6	powerpc/powernv: opal_put_chars partial write fix The intention here is to consume and discard the remaining buffer upon error. This works if there has not been a previous partial write. If there has been, then total_len is no longer total number of bytes to copy. total_len is always "bytes left to copy", so it should be added to written bytes. This code may not be exercised any more if partial writes will not be hit, but this is a small bugfix before a larger change. Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:09:54 +10:00
Mukesh Ojha	b29336c0e1	powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler Fixes: `8034f715f` ("powernv/opal-dump: Convert to irq domain") Converts all the return explicit number to a more proper IRQ_HANDLED, which looks proper incase of interrupt handler returning case. Here, It also removes error message like "nobody cared" which was getting unveiled while returning -1 or 0 from handler. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:24 +10:00
Mukesh Ojha	a5bbe8fd29	powerpc/powernv/opal-dump : Handles opal_dump_info properly Moves the return value check of 'opal_dump_info' to a proper place which was previously unnecessarily filling all the dump info even on failure. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:23 +10:00
Cyril Bur	edd00b8307	powerpc/tm: Remove struct thread_info param from tm_reclaim_thread() Since commit `dc3106690b` ("powerpc: tm: Always use fp_state and vr_state to store live registers") tm_reclaim_thread() doesn't use the parameter anymore, both callers have to bother getting it as they have no need for a struct thread_info either. Just remove it and adjust the callers. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:23 +10:00
Cyril Bur	a596a7e917	powerpc/tm: Update function prototype comment In commit `eb5c3f1c86` ("powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint") __tm_recheckpoint was modified to no longer take the second parameter 'unsigned long orig_msr' as part of a TM rewrite to simplify the reclaiming/recheckpointing process. There is a comment in the asm file where the function is delcared which has an incorrect prototype with the 'orig_msr' parameter. This patch corrects the comment. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:22 +10:00
Simon Guo	c2a4e54e8b	powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() This patch is based on the previous VMX patch on memcmp(). To optimize ppc64 memcmp() with VMX instruction, we need to think about the VMX penalty brought with: If kernel uses VMX instruction, it needs to save/restore current thread's VMX registers. There are 32 x 128 bits VMX registers in PPC, which means 32 x 16 = 512 bytes for load and store. The major concern regarding the memcmp() performance in kernel is KSM, who will use memcmp() frequently to merge identical pages. So it will make sense to take some measures/enhancement on KSM to see whether any improvement can be done here. Cyril Bur indicates that the memcmp() for KSM has a higher possibility to fail (unmatch) early in previous bytes in following mail. https://patchwork.ozlabs.org/patch/817322/#1773629 And I am taking a follow-up on this with this patch. Per some testing, it shows KSM memcmp() will fail early at previous 32 bytes. More specifically: - 76% cases will fail/unmatch before 16 bytes; - 83% cases will fail/unmatch before 32 bytes; - 84% cases will fail/unmatch before 64 bytes; So 32 bytes looks a better choice than other bytes for pre-checking. The early failure is also true for memcmp() for non-KSM case. With a non-typical call load, it shows ~73% cases fail before first 32 bytes. This patch adds a 32 bytes pre-checking firstly before jumping into VMX operations, to avoid the unnecessary VMX penalty. It is not limited to KSM case. And the testing shows ~20% improvement on memcmp() average execution time with this patch. And note the 32B pre-checking is only performed when the compare size is long enough (>=4K currently) to allow VMX operation. The detail data and analysis is at: https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:21 +10:00
Simon Guo	d58badfb7c	powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision This patch add VMX primitives to do memcmp() in case the compare size is equal or greater than 4K bytes. KSM feature can benefit from this. Test result with following test program(replace the "^>" with ""): ------ ># cat tools/testing/selftests/powerpc/stringloops/memcmp.c >#include <malloc.h> >#include <stdlib.h> >#include <string.h> >#include <time.h> >#include "utils.h" >#define SIZE (1024 * 1024 * 900) >#define ITERATIONS 40 int test_memcmp(const void s1, const void s2, size_t n); static int testcase(void) { char s1; char s2; unsigned long i; s1 = memalign(128, SIZE); if (!s1) { perror("memalign"); exit(1); } s2 = memalign(128, SIZE); if (!s2) { perror("memalign"); exit(1); } for (i = 0; i < SIZE; i++) { s1[i] = i & 0xff; s2[i] = i & 0xff; } for (i = 0; i < ITERATIONS; i++) { int ret = test_memcmp(s1, s2, SIZE); if (ret) { printf("return %d at[%ld]! should have returned zero\n", ret, i); abort(); } } return 0; } int main(void) { return test_harness(testcase, "memcmp"); } ------ Without this patch (but with the first patch "powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()." in the series): 4.726728762 seconds time elapsed ( +- 3.54%) With VMX patch: 4.234335473 seconds time elapsed ( +- 2.63%) There is ~+10% improvement. Testing with unaligned and different offset version (make s1 and s2 shift random offset within 16 bytes) can archieve higher improvement than 10%.. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:21 +10:00
Simon Guo	f1ecbaf466	powerpc: add vcmpequd/vcmpequb ppc instruction macro Some old tool chains don't know about instructions like vcmpequd. This patch adds .long macro for vcmpequd and vcmpequb, which is a preparation to optimize ppc64 memcmp with VMX instructions. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:20 +10:00
Simon Guo	2d9ee327ad	powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp() Currently memcmp() 64bytes version in powerpc will fall back to .Lshort (compare per byte mode) if either src or dst address is not 8 bytes aligned. It can be opmitized in 2 situations: 1) if both addresses are with the same offset with 8 bytes boundary: memcmp() can compare the unaligned bytes within 8 bytes boundary firstly and then compare the rest 8-bytes-aligned content with .Llong mode. 2) If src/dst addrs are not with the same offset of 8 bytes boundary: memcmp() can align src addr with 8 bytes, increment dst addr accordingly, then load src with aligned mode and load dst with unaligned mode. This patch optmizes memcmp() behavior in the above 2 situations. Tested with both little/big endian. Performance result below is based on little endian. Following is the test result with src/dst having the same offset case: (a similar result was observed when src/dst having different offset): (1) 256 bytes Test with the existing tools/testing/selftests/powerpc/stringloops/memcmp: - without patch 29.773018302 seconds time elapsed ( +- 0.09% ) - with patch 16.485568173 seconds time elapsed ( +- 0.02% ) -> There is ~+80% percent improvement (2) 32 bytes To observe performance impact on < 32 bytes, modify tools/testing/selftests/powerpc/stringloops/memcmp.c with following: ------- #include <string.h> #include "utils.h" -#define SIZE 256 +#define SIZE 32 #define ITERATIONS 10000 int test_memcmp(const void s1, const void s2, size_t n); -------- - Without patch 0.244746482 seconds time elapsed ( +- 0.36%) - with patch 0.215069477 seconds time elapsed ( +- 0.51%) -> There is ～+13% improvement (3) 0~8 bytes To observe <8 bytes performance impact, modify tools/testing/selftests/powerpc/stringloops/memcmp.c with following: ------- #include <string.h> #include "utils.h" -#define SIZE 256 -#define ITERATIONS 10000 +#define SIZE 8 +#define ITERATIONS 1000000 int test_memcmp(const void s1, const void s2, size_t n); ------- - Without patch 1.845642503 seconds time elapsed ( +- 0.12% ) - With patch 1.849767135 seconds time elapsed ( +- 0.26% ) -> They are nearly the same. (-0.2%) Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:20 +10:00
Aneesh Kumar K.V	ca42d8d2d6	powerpc/pseries/mm: Improve error reporting on HCALL failures This patch adds error reporting to H_ENTER and H_READ hcalls. A failure for both these hcalls are mostly fatal and it would be good to log the failure reason. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:19 +10:00
Aneesh Kumar K.V	65471d763e	powerpc/pseries: Use pr_xxx() in lpar.c Switch from printk to pr_fmt() / pr_xxx(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:19 +10:00
Aneesh Kumar K.V	27d8959da7	powerpc/mm/hash: Reduce contention on hpte lock We do this in some part. This patch make sure we always try to search for hpte without holding lock and redo the compare with lock held once match found. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:18 +10:00
Aneesh Kumar K.V	a833280b4a	powerpc/mm/hash: Add hpte_get_old_v and use that instead of opencoding No functional change Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:18 +10:00
Aneesh Kumar K.V	1531cff44b	powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group When computing the starting slot number for a hash page table group we used to do this hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; Multiplying with 8 (HPTES_PER_GROUP) imply the last three bits are 0. Hence we really don't need to clear then separately. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:17 +10:00
Aneesh Kumar K.V	7d4340bb92	powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config We do this only with VMEMMAP config so that our page_to_[nid/section] etc are not impacted. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:17 +10:00
Aneesh Kumar K.V	6aba0c84ec	powerpc/mm: Check memblock_add against MAX_PHYSMEM_BITS range With SPARSEMEM config enabled, we make sure that we don't add sections beyond MAX_PHYSMEM_BITS range. This results in not building vmemmap mapping for range beyond max range. But our memblock layer looks the device tree and create mapping for the full memory range. Prevent this by checking against MAX_PHSYSMEM_BITS when doing memblock_add. We don't do similar check for memeblock_reserve_range. If reserve range is beyond MAX_PHYSMEM_BITS we expect that to be configured with 'nomap'. Any other reserved range should come from existing memblock ranges which we already filtered while adding. This avoids crash as below when running on a system with system ram config above MAX_PHSYSMEM_BITS Unable to handle kernel paging request for data at address 0xc00a001000000440 Faulting instruction address: 0xc000000001034118 cpu 0x0: Vector: 300 (Data Access) at [c00000000124fb30] pc: c000000001034118: __free_pages_bootmem+0xc0/0x1c0 lr: c00000000103b258: free_all_bootmem+0x19c/0x22c sp: c00000000124fdb0 msr: 9000000002001033 dar: c00a001000000440 dsisr: 40000000 current = 0xc00000000120dd00 paca = 0xc000000001f60000^I irqmask: 0x03^I irq_happened: 0x01 pid = 0, comm = swapper [c00000000124fe20] c00000000103b258 free_all_bootmem+0x19c/0x22c [c00000000124fee0] c000000001010a68 mem_init+0x3c/0x5c [c00000000124ff00] c00000000100401c start_kernel+0x298/0x5e4 [c00000000124ff90] c00000000000b57c start_here_common+0x1c/0x520 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:16 +10:00
Michael Ellerman	64de5d8d04	powerpc: Add ppc64le and ppc64_book3e allmodconfig targets Similarly as we just did for 32-bit, add phony targets for generating a little endian and Book3E allmodconfig. These aren't covered by the regular allmodconfig, which is big endian and Book3S due to the way the Kconfig symbols are structured. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:16 +10:00
Michael Ellerman	8db0c9d416	powerpc: Add ppc32_allmodconfig defconfig target Because the allmodconfig logic just sets every symbol to M or Y, it has the effect of always generating a 64-bit config, because CONFIG_PPC64 becomes Y. So to make it easier for folks to test 32-bit code, provide a phony defconfig target that generates a 32-bit allmodconfig. The 32-bit port has several mutually exclusive CPU types, we choose the Book3S variants as that's what the help text in Kconfig says is most common. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:15 +10:00
Michael Ellerman	6d44acae19	powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2 When I added the spectre_v2 information in sysfs, I included the availability of the ori31 speculation barrier. Although the ori31 barrier can be used to mitigate v2, it's primarily intended as a spectre v1 mitigation. Spectre v2 is mitigated by hardware changes. So rework the sysfs files to show the ori31 information in the spectre_v1 file, rather than v2. Currently we display eg: $ grep . spectre_v* spectre_v1:Mitigation: __user pointer sanitization spectre_v2:Mitigation: Indirect branch cache disabled, ori31 speculation barrier enabled After: $ grep . spectre_v* spectre_v1:Mitigation: __user pointer sanitization, ori31 speculation barrier enabled spectre_v2:Mitigation: Indirect branch cache disabled Fixes: `d6fbe1c55c` ("powerpc/64s: Wire up cpu_show_spectre_v2()") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:15 +10:00
Nicholas Piggin	5b73151fff	powerpc: NMI IPI make NMI IPIs fully sychronous There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits for all CPUs to call in to the handler, but it does not wait for completion of the handler. This is a needless complication, so remove it and always wait synchronously. The synchronous wait allows the caller to easily time out and clear the wait for completion (zero nmi_ipi_busy_count) in the case of badly behaved handlers. This would have prevented the recent smp_send_stop NMI IPI bug from causing the system to hang. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:14 +10:00
Nicholas Piggin	9b81c0211c	powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely When the masked interrupt handler clears MSR[EE] for an interrupt in the PACA_IRQ_MUST_HARD_MASK set, it does not set PACA_IRQ_HARD_DIS. This makes them get out of synch. With that taken into account, it's only low level irq manipulation (and interrupt entry before reconcile) where they can be out of synch. This makes the code less surprising. It also allows the IRQ replay code to rely on the IRQ_HARD_DIS value and not have to mtmsrd again in this case (e.g., for an external interrupt that has been masked). The bigger benefit might just be that there is not such an element of surprise in these two bits of state. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 22:03:14 +10:00
Ram Pai	07f522d203	powerpc/pkeys: make protection key 0 less special Applications need the ability to associate an address-range with some key and latter revert to its initial default key. Pkey-0 comes close to providing this function but falls short, because the current implementation disallows applications to explicitly associate pkey-0 to the address range. Lets make pkey-0 less special and treat it almost like any other key. Thus it can be explicitly associated with any address range, and can be freed. This gives the application more flexibility and power. The ability to free pkey-0 must be used responsibily, since pkey-0 is associated with almost all address-range by default. Even with this change pkey-0 continues to be slightly more special from the following point of view. (a) it is implicitly allocated. (b) it is the default key assigned to any address-range. (c) its permissions cannot be modified by userspace. NOTE: (c) is specific to powerpc only. pkey-0 is associated by default with all pages including kernel pages, and pkeys are also active in kernel mode. If any permission is denied on pkey-0, the kernel running in the context of the application will be unable to operate. Tested on powerpc. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Drop #define PKEY_0 0 in favour of plain old 0] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 21:43:24 +10:00
Ram Pai	a4fcc877d4	powerpc/pkeys: Preallocate execute-only key execute-only key is allocated dynamically. This is a problem. When a thread implicitly creates an execute-only key, and resets the UAMOR for that key, the UAMOR value does not percolate to all the other threads. Any other thread may ignorantly change the permissions on the key. This can cause the key to be not execute-only for that thread. Preallocate the execute-only key and ensure that no thread can change the permission of the key, by resetting the corresponding bit in UAMOR. Fixes: `5586cf61e1` ("powerpc: introduce execute-only pkey") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 21:38:42 +10:00
Ram Pai	fe6a2804e6	powerpc/pkeys: Fix calculation of total pkeys. Total number of pkeys calculation is off by 1. Fix it. Fixes: `4fb158f65a` ("powerpc: track allocation status of all pkeys") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 21:35:09 +10:00
Ram Pai	c76662e825	powerpc/pkeys: Save the pkey registers before fork When a thread forks the contents of AMR, IAMR, UAMOR registers in the newly forked thread are not inherited. Save the registers before forking, for content of those registers to be automatically copied into the new thread. Fixes: `cf43d3b264` ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 21:34:47 +10:00
Ram Pai	4a4a5e5d2a	powerpc/pkeys: key allocation/deallocation must not change pkey registers Key allocation and deallocation has the side effect of programming the UAMOR/AMR/IAMR registers. This is wrong, since its the responsibility of the application and not that of the kernel, to modify the permission on the key. Do not modify the pkey registers at key allocation/deallocation. This patch also fixes a bug where a sys_pkey_free() resets the UAMOR bits of the key, thus making its permissions unmodifiable from user space. Later if the same key gets reallocated from a different thread this thread will no longer be able to change the permissions on the key. Fixes: `cf43d3b264` ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 21:34:08 +10:00
Ram Pai	de113256f8	powerpc/pkeys: Deny read/write/execute by default Deny all permissions on all keys, with some exceptions. pkey-0 must allow all permissions, or else everything comes to a screaching halt. Execute-only key must allow execute permission. Fixes: `cf43d3b264` ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 21:33:41 +10:00
Ram Pai	a57a04c76e	powerpc/pkeys: Give all threads control of their key permissions Currently in a multithreaded application, a key allocated by one thread is not usable by other threads. By "not usable" we mean that other threads are unable to change the access permissions for that key for themselves. When a new key is allocated in one thread, the corresponding UAMOR bits for that thread get enabled, however the UAMOR bits for that key for all other threads remain disabled. Other threads have no way to set permissions on the key, and the current default permissions are that read/write is enabled for all keys, which means the key has no effect for other threads. Although that may be the desired behaviour in some circumstances, having all threads able to control their permissions for the key is more flexible. The current behaviour also differs from the x86 behaviour, which is problematic for users. To fix this, enable the UAMOR bits for all keys, at process creation (in start_thread(), ie exec time). Since the contents of UAMOR are inherited at fork, all threads are capable of modifying the permissions on any key. This is technically an ABI break on powerpc, but pkey support is fairly new on powerpc and not widely used, and this brings us into line with x86. Fixes: `cf43d3b264` ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Tested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Reword some of the changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-24 21:32:33 +10:00
Murilo Opsfelder Araujo	ec9336396a	powerpc/prom_init: Remove linux,stdout-package property This property was added in 2004 and the only use of it, which was already inside `#if 0`, was removed a month later. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-20 12:50:51 +10:00
Alistair Popple	99c3ce33a0	powerpc/powernv/npu: Add a debugfs setting to change ATSD threshold The threshold at which it becomes more efficient to coalesce a range of ATSDs into a single per-PID ATSD is currently not well understood due to a lack of real-world work loads. This patch adds a debugfs parameter allowing the threshold to be altered at runtime in order to aid future development and refinement of the value. Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-19 21:58:10 +10:00
Bharat Bhushan	fca7bf946e	powerpc/mpic: Pass first free vector number to mpic_setup_error_int() Update the comment to account for the spurious interrupt number. The code was already accounting for it, but that was unclear because it was achieved by mpic_setup_error_int() knowing that the number it was passed was the last used vector, rather than the first free vector. So change the meaning of the argument to the first free vector and update the caller to pass 13, instead of 12, to achieve the same result. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> [mpe: Rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-19 21:58:09 +10:00
David Gibson	fdf743c5c5	powerpc/hugetlbpage: Rmove unhelpful HUGEPD__SHIFT macros The HUGEPD__SHIFT macros are always defined to be PGDIR_SHIFT and PUD_SHIFT, and have to have those values to work properly. They once used to have different values, but that was really only because they were used to mean different things in different contexts. `6fa50483` "powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb" removed that double meaning, but left the now useless constants. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-19 14:38:46 +10:00
Randy Dunlap	a8bf9e504a	chrp/nvram.c: add MODULE_LICENSE() Add MODULE_LICENSE() to the chrp nvram.c driver to fix the build warning message: WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/chrp/nvram.o Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-19 14:38:46 +10:00
Christophe Leroy	8c8c10b90d	powerpc/8xx: fix handling of early NULL pointer dereference NULL pointers are pointers to user memory space. So user pagetable has to be set in order to avoid random behaviour in case of NULL pointer dereference, otherwise we may encounter random memory access hence Machine Check Exception from TLB Miss handlers. Set user pagetable as early as possible in order to properly catch early kernel NULL pointer dereference. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-19 14:38:45 +10:00
Michael Ellerman	ce57c6610c	Merge branch 'topic/ppc-kvm' into next Merge in some commits we're sharing with the KVM tree. I manually propagated the change from commit `d3d4ffaae4` ("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into pci-ioda-tce.c. Conflicts: arch/powerpc/include/asm/cputable.h arch/powerpc/platforms/powernv/pci-ioda.c arch/powerpc/platforms/powernv/pci.h	2018-07-19 14:37:57 +10:00
Alexey Kardashevskiy	a68bd1267b	powerpc/powernv/ioda: Allocate indirect TCE levels on demand At the moment we allocate the entire TCE table, twice (hardware part and userspace translation cache). This normally works as we normally have contigous memory and the guest will map entire RAM for 64bit DMA. However if we have sparse RAM (one example is a memory device), then we will allocate TCEs which will never be used as the guest only maps actual memory for DMA. If it is a single level TCE table, there is nothing we can really do but if it a multilevel table, we can skip allocating TCEs we know we won't need. This adds ability to allocate only first level, saving memory. This changes iommu_table::free() to avoid allocating of an extra level; iommu_table::set() will do this when needed. This adds @alloc parameter to iommu_table::exchange() to tell the callback if it can allocate an extra level; the flag is set to "false" for the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns H_TOO_HARD. This still requires the entire table to be counted in mm::locked_vm. To be conservative, this only does on-demand allocation when the usespace cache table is requested which is the case of VFIO. The example math for a system replicating a powernv setup with NVLink2 in a guest: 16GB RAM mapped at 0x0 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000 the table to cover that all with 64K pages takes: (((0x244000000000 + 0x2000000000) >> 16)8)>>20 = 4556MB If we allocate only necessary TCE levels, we will only need: (((0x400000000 + 0x400000000) >> 16)8)>>20 = 4MB (plus some for indirect levels). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-16 22:53:11 +10:00
Alexey Kardashevskiy	9bc98c8a43	powerpc/powernv: Rework TCE level allocation This moves actual pages allocation to a separate function which is going to be reused later in on-demand TCE allocation. While we are at it, remove unnecessary level size round up as the caller does this already. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy	090bad39b2	powerpc/powernv: Add indirect levels to it_userspace We want to support sparse memory and therefore huge chunks of DMA windows do not need to be mapped. If a DMA window big enough to require 2 or more indirect levels, and a DMA window is used to map all RAM (which is a default case for 64bit window), we can actually save some memory by not allocation TCE for regions which we are not going to map anyway. The hardware tables alreary support indirect levels but we also keep host-physical-to-userspace translation array which is allocated by vmalloc() and is a flat array which might use quite some memory. This converts it_userspace from vmalloc'ed array to a multi level table. As the format becomes platform dependend, this replaces the direct access to it_usespace with a iommu_table_ops::useraddrptr hook which returns a pointer to the userspace copy of a TCE; future extension will return NULL if the level was not allocated. This should not change non-KVM handling of TCE tables and it_userspace will not be allocated for non-KVM tables. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-07-16 22:53:10 +10:00

1 2 3 4 5 ...

18577 Commits