The ndctl unit tests discovered that the dax enabling omitted updates to
nd_detach_and_reset(). This routine clears device the configuration
when the namespace is detached. Without this clearing userspace may
assume that the device is in the process of being configured by another
agent in the system.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Testing the dax-device autodetect support revealed a probe failure with
the following result:
dax0.1: bad offset: 0x8200000 dax disabled
The original pfn-device implementation inferred the alignment from
ilog2(offset), now that the alignment is explicit the is_power_of_2()
needs replacing with a real sanity check against the recorded alignment.
Otherwise the alignment check is useless in the implicit case and only
the minimum size of the offset matters.
This self-consistency check is further validated by the probe path that
will re-check that the offset is large enough to contain all the
metadata required to enable the device.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
For autodetecting a previously established dax configuration we need the
info block to indicate block-device vs device-dax mode, and we need to
have the default namespace probe hand-off the configuration to the
dax_pmem driver.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ida instances allocate some internal memory for ->free_bitmap in
addition to the base 'struct ida'. Use ida_destroy() to release that
memory at module_exit().
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The dax_pmem driver was implementing an empty ->remove() method to
satisfy the nvdimm bus driver that unconditionally calls ->remove().
Teach the core bus driver to check if ->remove() is NULL to remove that
requirement.
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We want to use the alignment as the allocation and mapping unit.
Previously this information was only useful for establishing the data
offset, but now it is important to remember the granularity for the
later use.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We may want to subdivide a device-dax range into multiple devices so
that each can have separate permissions or naming. Reserve 128K of
label space by default so we have the capability of making allocation
decisions persistent. This reservation is not something we can add
later since it would result in the default size of a device-dax range
changing between kernel versions.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Device DAX is the device-centric analogue of Filesystem DAX
(CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and
mapped without need of an intervening file system. This initial
infrastructure arranges for a libnvdimm pfn-device to be represented as
a different device-type so that it can be attached to a driver other
than the pmem driver.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
I had relied on the kbuild robot for cross build coverage, however it
only builds alpha_defconfig. Switch from HPAGE_SIZE to PMD_SIZE, which
is more widely defined.
Fixes: 658922e57b ("libnvdimm, pfn: fix memmap reservation sizing")
Cc: <stable@vger.kernel.org>
Reported-by: Guenter Roeck <guenter@roeck-us.net>
Tested-by: Guenter Roeck <guenter@roeck-us.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When configuring a pfn-device instance to allocate the memmap array it
needs to account for the fact that vmemmap_populate_hugepages()
allocates struct page blocks in HPAGE_SIZE chunks. We need to align the
reserved area size to 2MB otherwise arch_add_memory() runs out of memory
while establishing the memmap:
WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0
[..]
Call Trace:
[<ffffffff8148bdb3>] dump_stack+0x85/0xc2
[<ffffffff810a749b>] __warn+0xcb/0xf0
[<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20
[<ffffffff8106a497>] arch_add_memory+0xe7/0xf0
[<ffffffff811d2097>] devm_memremap_pages+0x287/0x450
[<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450
[<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
[<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem]
[<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem]
[<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm]
[..]
ndbus0: nd_pmem.probe(pfn3.0) = -12
nd_pmem: probe of pfn3.0 failed with error -12
libndctl: ndctl_pfn_enable: pfn3.0: failed to enable
Reported-by: Namratha Kothapalli <namratha.n.kothapalli@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
There are currently 4 known similar but incompatible definitions of the
command sets that can be sent to an NVDIMM through ACPI. It is also
clear that future platform generations (ACPI or not) will continue to
revise and extend the DIMM command set as new devices and use cases
arrive.
It is obviously untenable to continue to proliferate divergence
of these command definitions, and to that end a standardization process
has begun to provide for a unified specification. However, that leaves a
problem about what to do with this first generation where vendors are
already shipping divergence.
The Linux kernel can support these initial diverged platforms without
giving platform-firmware free reign to continue to diverge and compound
kernel maintenance overhead. The kernel implementation can encourage
standardization in two ways:
1/ Require that any function code that userspace wants to send be
explicitly white-listed in the implementation. For ACPI this means
function codes marked as supported by acpi_check_dsm() may
only be invoked if they appear in the white-list. A function must be
publicly documented before it is added to the white-list.
2/ The above restrictions can be trivially bypassed by using the
"vendor-specific" payload command. However, since vendor-specific
commands are by definition not publicly documented and have the
potential to corrupt the kernel's view of the dimm state, we provide a
toggle to disable vendor-specific operations. Enabling undefined
behavior is a policy decision that can be made by the platform owner
and encourages firmware implementations to choose public over
private command implementations.
Based on an initial patch from Jerry Hoemann
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Clarify the distinction between "commands", the ioctls userspace calls
to request the kernel take some action on a given dimm device, and
"_DSMs", the actual function numbers used in the firmware interface to
the DIMM. _DSMs are ACPI specific whereas commands are Linux kernel
generic.
This is in preparation for breaking the 1:1 implicit relationship
between the kernel ioctl number space and the firmware specific function
numbers.
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The 'host' variable can be killed as it is always the same as the passed
in device.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The devm conversion obviates the need to continue to remember the queue
and disk locally in the driver.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that pmem internals have been disentangled from pfn setup, that code
can move to the core. This is in preparation for adding another user of
the pfn-device capabilities.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for providing an alternative (to block device) access
mechanism to persistent memory, convert pmem_rw_bytes() to
nsio_rw_bytes(). This allows ->rw_bytes() functionality without
requiring a 'struct pmem_device' to be instantiated.
In other words, when ->rw_bytes() is in use i/o is driven through
'struct nd_namespace_io', otherwise it is driven through 'struct
pmem_device' and the block layer. This consolidates the disjoint calls
to devm_exit_badblocks() and devm_memunmap() into a common
devm_nsio_disable() and cleans up the init path to use a unified
pmem_attach_disk() implementation.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The leading '0x' in front of %pa is redundant, also we can just use %pR
to simplify the print statement. The request parameters can be directly
taken from the resource as well.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Register a callback to clean up the request_queue and put the gendisk at
driver disable time.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Consolidate the information for issuing i/o to a blk-namespace, and
eliminate some pointer chasing.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
I/O errors events have the potential to be a high frequency and a log
message for each event can swamp the system. This message is also
redundant with upper layer error reporting.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Save a pointer chase by storing the driver private data in the
request_queue rather than the gendisk.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Save a pointer chase by storing the driver private data in the
request_queue rather than the gendisk.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Register a callback to clean up the request_queue and put the gendisk at
driver disable time.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pass the device performing the probe so we can use a devm allocation for
the btt superblock.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pass the device performing the probe so we can use a devm allocation for
the pfn superblock.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We can derive the common namespace from other information. We also do
not need to cache it because all the usages are in slow paths.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The ACPI specification does not specify the state of data after a clear
poison operation. Potential future libnvdimm bus implementations for
other architectures also might not specify or disagree on the state of
data after clear poison. Clarify why we write twice.
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Provide simulated SMART data to enable the ndctl implementation of SMART
data retrieval and parsing.
The payload is defined here, "Section 4.1 SMART and Health Info
(Function Index 1)":
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull libnvdimm fixes from Dan Williams:
"Three fixes, the first two are tagged for -stable:
- The ndctl utility/library gained expanded unit tests illuminating a
long standing bug in the libnvdimm SMART data retrieval
implementation.
It has been broken since its initial implementation, now fixed.
- Another one line fix for the detection of stale info blocks.
Without this change userspace can get into a situation where it is
unable to reconfigure a namespace.
- Fix the badblock initialization path in the presence of the new (in
v4.6-rc1) section alignment workarounds.
Without this change badblocks will be reported at the wrong offset.
These have received a build success report from the kbuild robot and
have appeared in -next with no reported issues"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment
libnvdimm, pfn: fix uuid validation
libnvdimm: fix smart data retrieval
When section alignment padding is in effect we need to shift / truncate
the range that is queried for poison by the 'start_pad' or 'end_trunc'
reservations.
It's easiest if we just pass in an adjusted resource range rather than
deriving it from the passed in namespace. With the resource range
resolution pushed out to the caller we can also push the
namespace-to-region lookup to the caller and drop the implicit pmem-type
assumption about the passed in namespace object.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If we detect a namespace has a stale info block in the init path, we
should overwrite with the latest configuration. In fact, we already
return -ENODEV when the parent uuid is invalid, the same should be done
for the 'self' uuid. Otherwise we can get into a condition where
userspace is unable to reconfigure the pfn-device without directly /
manually invalidating the info block.
Cc: <stable@vger.kernel.org>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
It appears that smart data retrieval has been broken the since the
initial implementation. Fix the payload size to be 128-bytes per the
specification.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the definition of memcpy_from_pmem() to return 0 or a negative
error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull ARM updates from Russell King:
"Another mixture of changes this time around:
- Split XIP linker file from main linker file to make it more
maintainable, and various XIP fixes, and clean up a resulting
macro.
- Decompressor cleanups from Masahiro Yamada
- Avoid printing an error for a missing L2 cache
- Remove some duplicated symbols in System.map, and move
vectors/stubs back into kernel VMA
- Various low priority fixes from Arnd
- Updates to allow bus match functions to return negative errno
values, touching some drivers and the driver core. Greg has acked
these changes.
- Virtualisation platform udpates form Jean-Philippe Brucker.
- Security enhancements from Kees Cook
- Rework some Kconfig dependencies and move PSCI idle management code
out of arch/arm into drivers/firmware/psci.c
- ARM DMA mapping updates, touching media, acked by Mauro.
- Fix places in ARM code which should be using virt_to_idmap() so
that Keystone2 can work.
- Fix Marvell Tauros2 to work again with non-DT boots.
- Provide a delay timer for ARM Orion platforms"
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (45 commits)
ARM: 8546/1: dma-mapping: refactor to fix coherent+cma+gfp=0
ARM: 8547/1: dma-mapping: store buffer information
ARM: 8543/1: decompressor: rename suffix_y to compress-y
ARM: 8542/1: decompressor: merge piggy.*.S and simplify Makefile
ARM: 8541/1: decompressor: drop redundant FORCE in Makefile
ARM: 8540/1: decompressor: use clean-files instead of extra-y to clean files
ARM: 8539/1: decompressor: drop more unneeded assignments to "targets"
ARM: 8538/1: decompressor: drop unneeded assignments to "targets"
ARM: 8532/1: uncompress: mark putc as inline
ARM: 8531/1: turn init_new_context into an inline function
ARM: 8530/1: remove VIRT_TO_BUS
ARM: 8537/1: drop unused DEBUG_RODATA from XIP_KERNEL
ARM: 8536/1: mm: hide __start_rodata_section_aligned for non-debug builds
ARM: 8535/1: mm: DEBUG_RODATA makes no sense with XIP_KERNEL
ARM: 8534/1: virt: fix hyp-stub build for pre-ARMv7 CPUs
ARM: make the physical-relative calculation more obvious
ARM: 8512/1: proc-v7.S: Adjust stack address when XIP_KERNEL
ARM: 8411/1: Add default SPARSEMEM settings
ARM: 8503/1: clk_register_clkdev: remove format string interface
ARM: 8529/1: remove 'i' and 'zi' targets
...
If a write is directed at a known bad block perform the following:
1/ write the data
2/ send a clear poison command
3/ invalidate the poison out of the cache hierarchy
Cc: <x86@kernel.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When we enounter a bad block we need to kunmap_atomic() before
returning.
Cc: <stable@vger.kernel.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
alloc_disk(0) does not require or use a ->major number,
all devices are allocated with a major of BLOCK_EXT_MAJOR.
So don't allocate btt_major.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When alloc_disk(0) is used ->major is completely ignored, all devices
are allocated with a "major" of BLOCK_EXT_MAJOR.
So don't allocate nd_blk_major
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When alloc_disk(0) or alloc_disk-node(0, XX) is used, the ->major
number is completely ignored: all devices are allocated with a
major of BLOCK_EXT_MAJOR.
So there is no point allocating pmem_major.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
drivers/nvdimm/pmem.c: In function 'nvdimm_namespace_attach_pfn':
drivers/nvdimm/pmem.c:367:3: error: implicit declaration of function
'__phys_to_pfn' [-Werror=implicit-function-declaration]
.base_pfn = __phys_to_pfn(nsio->res.start),
ia64 does not provide __phys_to_pfn(), just use the PHYS_PFN() alias.
Cc: Guenter Roeck <linux@roeck-us.net>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add the boiler-plate for a 'clear error' command based on section
9.20.7.6 "Function Index 4 - Clear Uncorrectable Error" from the ACPI
6.1 specification, and add a reference implementation in nfit_test.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currenty with a raw mode pmem namespace the physical memory address range for
the device can be obtained via /sys/block/pmemX/device/{resource|size}. Add
similar attributes for pfn instances that takes the struct page memmap and
section padding into account.
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
On a platform where 'Persistent Memory' and 'System RAM' are mixed
within a given sparsemem section, trim the namespace and notify about the
sub-optimal alignment.
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>