The precompute functions differ only by the sub-macros
they call, merge them to a single macro. Later diffs
add more code to fill in the gcm_context_data structure,
this allows changes in a single place.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for 192/256-bit keys using the avx gcm/aes routines.
The sse routines were previously updated in e31ac32d3b (Add support
for 192 & 256 bit keys to AESNI RFC4106).
Instead of adding an additional loop in the hotpath as in e31ac32d3b,
this diff instead generates separate versions of the code using macros,
and the entry routines choose which version once. This results
in a 5% performance improvement vs. adding a loop to the hot path.
This is the same strategy chosen by the intel isa-l_crypto library.
The key size checks are removed from the c code where appropriate.
Note that this diff depends on using gcm_context_data - 256 bit keys
require 16 HashKeys + 15 expanded keys, which is larger than
struct crypto_aes_ctx, so they are stored in struct gcm_context_data.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Macro-ify function save and restore. These will be used in new functions
added for scatter/gather update operations.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add the gcm_context_data structure to the avx asm routines.
This will be necessary to support both 256 bit keys and
scatter/gather.
The pre-computed HashKeys are now stored in the gcm_context_data
struct, which is expanded to hold the greater number of hashkeys
necessary for avx.
Loads and stores to the new struct are always done unlaligned to
avoid compiler issues, see e5b954e8 "Use unaligned loads from
gcm_context_data"
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The GCM_ENC_DEC routines for AVX and AVX2 are identical, except they
call separate sub-macros. Pass the macros as arguments, and merge them.
This facilitates additional refactoring, by requiring changes in only
one place.
The GCM_ENC_DEC macro was moved above the CONFIG_AS_AVX* ifdefs,
since it will be used by both AVX and AVX2.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
protocol is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/can/af_can.c:115 can_get_proto() warn: potential spectre issue 'proto_tab' [w]
Fix this by sanitizing protocol before using it to index proto_tab.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validate packet socket address length if a length is given. Zero
length is equivalent to not setting an address.
Fixes: 99137b7888 ("packet: validate address length")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
proto is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/nfc/af_nfc.c:42 nfc_sock_create() warn: potential spectre issue 'proto_tab' [w] (local cap)
Fix this by sanitizing proto before using it to index proto_tab.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
protocol is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/phonet/af_phonet.c:48 phonet_proto_get() warn: potential spectre issue 'proto_tab' [w] (local cap)
Fix this by sanitizing protocol before using it to index proto_tab.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flen is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/core/filter.c:1101 bpf_check_classic() warn: potential spectre issue 'filter' [w]
Fix this by sanitizing flen before using it to index filter at line 1101:
switch (filter[flen - 1].code) {
and through pc at line 1040:
const struct sock_filter *ftest = &filter[pc];
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is two simple target fixes and one discard related I/O starvation
problem in sd. The discard problem occurs because the discard page
doesn't have a mempool backing so if the allocation fails due to
memory pressure, we then lose the forward progress we require if the
writeout is on the same device. The fix is to back it with a mempool.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXB2mCiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSJmAP9E8ItG
tSgUyIfRRcn/ZxYdfOg1EWxGgDq17Fq2TgQU3gEAolSLwol7eKl1hQnDpOKPVMmC
//j4JyKpCl3EEvNs6DQ=
=3Hmt
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is two simple target fixes and one discard related I/O starvation
problem in sd.
The discard problem occurs because the discard page doesn't have a
mempool backing so if the allocation fails due to memory pressure, we
then lose the forward progress we require if the writeout is on the
same device. The fix is to back it with a mempool"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: use mempool for discard special page
scsi: target: iscsi: cxgbit: add missing spin_lock_init()
scsi: target: iscsi: cxgbit: fix csk leak
- don't pollute userspace with macro definitions
From Xiaozhou Liu
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAlwdVEQACgkQGXyLc2ht
IW0RSxAAo3yZBrkMxN6FIrBeEfENFs8TL3iDq5GoCPShJNWHRpRbEBhi06/k6dnA
ePFmgXL/FEio+f47aUj/pEh2NQv5QcwkLRpizREmGtHjVBngJNARFyHxveZyqE52
ArySpu5/WPswQdu73cQLAwqtk505Gi8jNLRKVqr4CiBJZB/WO7rsINWDOhUulpwG
9b8Kmct4al/3mhDOhnn1ppgAIauzj2xoyXxYMLZx95h7oycfssUvbNfJtALnxCJs
eIWAxGebr3ni85q9J69gMfIOwiSn6HtaLAuv8Q7AOKuCBd1+/ymX79gCwH68dQVl
tdDhIE7vEAWZHbVHy7fdnNIUbPfAMk/QonLStbdd2nYVeblD/luSe91NShCo7Jg5
ZVJHdA+eD9IjypGz4mMzjOlvhCWZBGtOdnby4tD6YxV+S9fDQPvE+9Ws1JaAIzpH
kpnj1tmi5YwqN6T5pLWQwVs/HCuoCXI89pv0tSQwip2/txxorAIhhJvzo94lLdv/
nOABNb6/eszVj7IGOxKLXW/djFluwt/0SzlaD3A8pIjQWNXolfpmBu/9xMcJVZuj
070Vfn60bjH9q2qitBvlYLCX4eXaGBcfybRi7oe5WnOlKVCl1idQSPcn+2dhiaOO
JpInO/XLQUrieHT4f/AW6prWJ4AiQJd1lpKw76acOjcm/iXMJGM=
=e9gb
-----END PGP SIGNATURE-----
Merge tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux
Pull compiler_types.h fix from Miguel Ojeda:
"A cleanup for userspace in compiler_types.h: don't pollute userspace
with macro definitions (Xiaozhou Liu)
This is harmless for the kernel, but v4.19 was released with a few
macros exposed to userspace as the patch explains; which this removes,
so it *could* happen that we break something for someone (although
leaving inline redefined is probably worse)"
* tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux:
include/linux/compiler_types.h: don't pollute userspace with macro definitions
This reverts commit 55956b59df.
commit 55956b59df ("vfs: Allow userns root to call mknod on owned filesystems.")
enabled mknod() in user namespaces for userns root if CAP_MKNOD is
available. However, these device nodes are useless since any filesystem
mounted from a non-initial user namespace will set the SB_I_NODEV flag on
the filesystem. Now, when a device node s created in a non-initial user
namespace a call to open() on said device node will fail due to:
bool may_open_dev(const struct path *path)
{
return !(path->mnt->mnt_flags & MNT_NODEV) &&
!(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
}
The problem with this is that as of the aforementioned commit mknod()
creates partially functional device nodes in non-initial user namespaces.
In particular, it has the consequence that as of the aforementioned commit
open() will be more privileged with respect to device nodes than mknod().
Before it was the other way around. Specifically, if mknod() succeeded
then it was transparent for any userspace application that a fatal error
must have occured when open() failed.
All of this breaks multiple userspace workloads and a widespread assumption
about how to handle mknod(). Basically, all container runtimes and systemd
live by the slogan "ask for forgiveness not permission" when running user
namespace workloads. For mknod() the assumption is that if the syscall
succeeds the device nodes are useable irrespective of whether it succeeds
in a non-initial user namespace or not. This logic was chosen explicitly
to allow for the glorious day when mknod() will actually be able to create
fully functional device nodes in user namespaces.
A specific problem people are already running into when running 4.18 rc
kernels are failing systemd services. For any distro that is run in a
container systemd services started with the PrivateDevices= property set
will fail to start since the device nodes in question cannot be
opened (cf. the arguments in [1]).
Full disclosure, Seth made the very sound argument that it is already
possible to end up with partially functional device nodes. Any filesystem
mounted with MS_NODEV set will allow mknod() to succeed but will not allow
open() to succeed. The difference to the case here is that the MS_NODEV
case is transparent to userspace since it is an explicitly set mount option
while the SB_I_NODEV case is an implicit property enforced by the kernel
and hence opaque to userspace.
[1]: https://github.com/systemd/systemd/pull/9483
Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following commit:
d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions from efi_pgd")
forgets to take two EFI modes into consideration, namely EFI_OLD_MEMMAP and
EFI_MIXED_MODE:
- EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir
using ioremap() and init_memory_mapping(). This feature can be enabled by
passing "efi=old_map" as kernel command line argument. But,
efi_unmap_pages() unmaps EFI boot services code/data regions *only* from
efi_pgd and hence cannot be used for unmapping EFI boot services code/data
regions from swapper_pg_dir.
Introduce a temporary fix to not unmap EFI boot services code/data regions
when EFI_OLD_MEMMAP is enabled while working on a real fix.
- EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a
64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE,
all RAM (i.e. namely EFI regions like EFI_CONVENTIONAL_MEMORY,
EFI_LOADER_<CODE/DATA>, EFI_BOOT_SERVICES_<CODE/DATA> and
EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to
facilitate EFI runtime calls access it's arguments in 1:1 mode.
Hence, don't unmap EFI boot services code/data regions when booted in mixed mode.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181222022234.7573-1-sai.praneeth.prakhya@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We really need the writecombine flag in dma_alloc_wc, fix a stupid
oversight.
Fixes: 7ed1d91a9e ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The /chosen/linux,stdout-path is "deprecated" in favour of
/chosen/stdout-path so we should be checking for both.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
HMIs will crash the kernel due to
BRANCH_LINK_TO_FAR(hmi_exception_realmode)
Calling into the OPD instead of the actual code.
Fixes: 2337d20728 ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Use DOTSYM() rather than #ifdef]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Convert string compares of DT node names to use of_node_name_{eq,prefix}
helpers instead. This removes direct access to the node name pointer.
This changes a single case insensitive node name comparison to case
sensitive for "ata4". This is the only instance of a case insensitive
comparison for all the open coded node name comparisons on powerpc.
Searching the commit history, there doesn't appear to be any reason for
it to be case insensitive.
A couple of open coded iterating thru the child node names are converted
to use for_each_child_of_node() instead.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-ide@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.
A couple of open coded iterating thru the child node names are converted
to use for_each_child_of_node() instead.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation to remove the node name pointer from struct
device_node, convert printf users to use the %pOFn format specifier.
pmem.c was recently added and missed the initial conversion.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This comment talks about PTEs being 64-bits and PMD/PGD being 32-bits,
but that hasn't been true since 2005 when David Gibson implemented
4-level page tables in the commit titled "Four level pagetables for
ppc64".
Remove it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In update_lmb_associativity_index() we lookup dr_node using
of_find_node_by_path() which takes a reference for us. In the
non-error case we forget to drop the reference. Note that
find_aa_index() does modify properties of the node, but doesn't need
an extra reference held once it's returned.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mpc8641_hpcn was updated to 4-cell interrupt specifiers, but
PCI interrupt-map was not updated. It was also missing #interrupt-cells
on the outer PCI buses.
p1020rdb-pc was updated to 4-cell interrupt specifiers, but
the ethernet-phy nodes weren't updated.
mpc832x_rdb had an invalid "interrupts = <0>" on the ethernet-phy nodes.
Besides being the wrong number of cells, 0 is not a valid IPIC interrupt
according to ipic.c. Presumably it was meant to indicate that these
PHYs are not connected to an interrupt.
Signed-off-by: Scott Wood <oss@buserror.net>
Add more SoC compatible strings to support more chips.
Signed-off-by: Yuantian Tang <andy.tang@nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Scott Wood <oss@buserror.net>
The driver retains compatibility with old device trees, but we don't
want the old nodes lying around to be copied, or used as a reference
(some of the mux options are incorrect), or even just being clutter.
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Tang Yuantian <andy.tang@nxp.com>
[scottwood: removed sysclk node added by Andy]
Signed-off-by: Scott Wood <oss@buserror.net>
When the watchdog timer is set in interrupt mode, it causes a
machine check when it times out. The purpose of this mode is to
ease debugging, not to crash the kernel and reboot the machine.
This patch implements a special handling for that, in order to not
crash the kernel if the watchdog times out while in interrupt or
within the idle task.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: added missing #include]
Signed-off-by: Scott Wood <oss@buserror.net>
Fix a spelling mistake in a register description.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Scott Wood <oss@buserror.net>
swiotlb will only bounce buffer when the effective dma address for the
device is smaller than the actual DMA range. Instead of flipping between
the swiotlb and nommu ops for FSL SOCs that have the second outbound
window just don't set the bus dma_mask in this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Wood <oss@buserror.net>
Merge misc fixes from Andrew Morton:
"4 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_alloc: fix has_unmovable_pages for HugePages
fork,memcg: fix crash in free_thread_stack on memcg charge fail
mm: thp: fix flags for pmd migration when split
mm, memory_hotplug: initialize struct pages for the full memory section
While playing with gigantic hugepages and memory_hotplug, I triggered
the following #PF when "cat memoryX/removable":
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 1481 Comm: cat Tainted: G E 4.20.0-rc6-mm1-1-default+ #18
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:has_unmovable_pages+0x154/0x210
Call Trace:
is_mem_section_removable+0x7d/0x100
removable_show+0x90/0xb0
dev_attr_show+0x1c/0x50
sysfs_kf_seq_show+0xca/0x1b0
seq_read+0x133/0x380
__vfs_read+0x26/0x180
vfs_read+0x89/0x140
ksys_read+0x42/0x90
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is we do not pass the Head to page_hstate(), and so, the call
to compound_order() in page_hstate() returns 0, so we end up checking
all hstates's size to match PAGE_SIZE.
Obviously, we do not find any hstate matching that size, and we return
NULL. Then, we dereference that NULL pointer in
hugepage_migration_supported() and we got the #PF from above.
Fix that by getting the head page before calling page_hstate().
Also, since gigantic pages span several pageblocks, re-adjust the logic
for skipping pages. While are it, we can also get rid of the
round_up().
[osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal]
Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9b6f7e163c ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.
Unfortunately, it also results in a crash.
This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.
This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:
#5 [ffffc900244efc88] die at ffffffff8101f0ab
#6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
#7 [ffffc900244efce0] general_protection at ffffffff818ff082
[exception RIP: llist_add_batch+7]
RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000
RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a
RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0
R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
#9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948
RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421
RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040
R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
The simplest fix is to assign tsk->stack right where it is allocated.
Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163c ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs. However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or
pmd_yound() while actually they don't make sense at all when it's a
migration entry. Fix them up. Since at it, drop the ifdef together as
not needed.
Note that if my understanding is correct about the problem then if
without the patch there is chance to lose some of the dirty bits in the
migrating pmd pages (on x86_64 we're fetching bit 11 which is part of
swap offset instead of bit 2) and it could potentially corrupt the
memory of an userspace program which depends on the dirty bit.
Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized. This may lead
to VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered
by memory_hotplug sysfs handlers:
Here are the the panic examples:
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=2050M
--------------------------
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
( test_pages_in_a_zone+0xde/0x160)
show_valid_zones+0x5c/0x190
dev_attr_show+0x34/0x70
sysfs_kf_seq_show+0xc8/0x148
seq_read+0x204/0x480
__vfs_read+0x32/0x178
vfs_read+0x82/0x138
ksys_read+0x5a/0xb0
system_call+0xdc/0x2d8
Last Breaking-Event-Address:
test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops
kernel parameter mem=3075M
--------------------------
page:000003d08300c000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
( is_mem_section_removable+0xb4/0x190)
show_mem_removable+0x9a/0xd8
dev_attr_show+0x34/0x70
sysfs_kf_seq_show+0xc8/0x148
seq_read+0x204/0x480
__vfs_read+0x32/0x178
vfs_read+0x82/0x138
ksys_read+0x5a/0xb0
system_call+0xdc/0x2d8
Last Breaking-Event-Address:
is_mem_section_removable+0xb4/0x190
Kernel panic - not syncing: Fatal exception: panic_on_oops
Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.
Michal said:
: This has alwways been problem AFAIU. It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100d8 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provided a
: garbage.
:
: So I guess we do care for post f7f99100d8 kernels mostly and
: therefore Fixes: f7f99100d8 ("mm: stop zeroing memory during
: allocation in vmemmap")
Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sparc fixes from David Miller:
"Just some small fixes here and there, and a refcount leak in a serial
driver, nothing serious"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
serial/sunsu: fix refcount leak
sparc: Set "ARCH: sunxx" information on the same line
sparc: vdso: Drop implicit common-page-size linker flag
Pull more networking fixes from David Miller:
"Some more bug fixes have trickled in, we have:
1) Local MAC entries properly in mscc driver, from Allan W. Nielsen.
2) Eric Dumazet found some more of the typical "pskb_may_pull() -->
oops forgot to reload the header pointer" bugs in ipv6 tunnel
handling.
3) Bad SKB socket pointer in ipv6 fragmentation handling, from Herbert
Xu.
4) Overflow fix in sk_msg_clone(), from Vakul Garg.
5) Validate address lengths in AF_PACKET, from Willem de Bruijn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
qmi_wwan: Add support for Fibocom NL678 series
tls: Do not call sk_memcopy_from_iter with zero length
ipv6: tunnels: fix two use-after-free
Prevent overflow of sk_msg in sk_msg_clone()
packet: validate address length
net: netxen: fix a missing check and an uninitialized use
tcp: fix a race in inet_diag_dump_icsk()
MAINTAINERS: update cxgb4 and cxgb3 maintainer
ipv6: frags: Fix bogus skb->sk in reassembled packets
mscc: Configured MAC entries should be locked.
The x/y command parsing has been broken since commit 129957069e
("staging: panel: Fixed checkpatch warning about simple_strtoul()").
Commit b34050fadb ("auxdisplay: charlcd: Fix and clean up handling of
x/y commands") fixed some problems by rewriting the parsing code,
but also broke things further by removing the check for a complete
command before attempting to parse it. As a result, parsing is
terminated at the first x or y character.
This reinstates the check for a final semicolon. Whereas the original
code use strchr(), this is wasteful seeing as the semicolon is always
at the end of the buffer. Thus check this character directly instead.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
The function of_find_node_by_path() acquires a reference to the node
returned by it and that reference needs to be dropped by its caller.
su_get_type() doesn't do that. The match node are used as an identifier
to compare against the current node, so we can directly drop the refcount
after getting the node from the path as it is not used as pointer.
Fix this by use a single variable and drop the refcount right after
of_find_node_by_path().
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While checking boot log from SPARC qemu, I saw that the "ARCH: sunxx"
information was split on two different line.
This patchs merge both line together.
In the meantime, thoses information need to be printed via pr_info
since printk print them by default via the warning loglevel.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GNU linker's -z common-page-size's default value is based on the target
architecture. arch/sparc/vdso/Makefile sets it to the architecture
default, which is implicit and redundant. Drop it.
Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcHOSgAAoJEL/70l94x66DAw4H/jQjdRjT1DAf4vswXwMD6lpJ
qHcSyAYL4d/PFbcovfAm2ca8F0HJylVWDeZcqQRP3zdX53diqJ4gyYMaNuuY0niX
zKvzNhFw1oaZK93rwrF6BX1jl4Virw2uC4qL9bhgV/OfkmvTPvIFkP8gJGVDt9YY
Kn5yhWnJOpHOCQs3GW8zOy2LWtiuCrp7epSrMGjGsWrp50ccW1tTioxYyDmBr3mF
GizAIgDD2xMwIeOlj4IngQhDTahwekOA9XzhSMKjm0/GMcZ33TXPcnUdoa0Yxguj
Uu3cXLfcEUfakZdefi3FB5eDB2knDe3kbmKviok2giAAY1hBvEO5b6bHrn+5W2g=
=l4oP
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fix from Paolo Bonzini:
"A simple patch for a pretty bad bug: Unbreak AMD nested
virtualization."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: nSVM: fix switch to guest mmu
This patch fixes qmap header retrieval when modem is configured for
dl data aggregation.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer fix from Ingo Molnar:
"Fix a division by zero crash in the posix-timers code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Fix division by zero bug
Added support for Fibocom NL678 series cellular module QMI interface.
Using QMI_QUIRK_SET_DTR required for Qualcomm MDM9x40 series chipsets.
Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>