mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-11-15 16:24:13 +08:00
e15ab05a6b
10 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Dan Williams
|
09d09e04d2 |
cxl/dax: Create dax devices for CXL RAM regions
While platform firmware takes some responsibility for mapping the RAM capacity of CXL devices present at boot, the OS is responsible for mapping the remainder and hot-added devices. Platform firmware is also responsible for identifying the platform general purpose memory pool, typically DDR attached DRAM, and arranging for the remainder to be 'Soft Reserved'. That reservation allows the CXL subsystem to route the memory to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated access (device-dax). The new 'struct cxl_dax_region' object allows for a CXL memory resource (region) to be published, but also allow for udev and module policy to act on that event. It also prevents cxl_core.ko from having a module loading dependency on any drivers/dax/ modules. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
Dan Williams
|
83762cb5c7 |
dax: Kill DEV_DAX_PMEM_COMPAT
The /sys/class/dax compatibility option has shipped in the kernel for 4 years now which should be sufficient time for tools to abandon the old ABI in favor of the /sys/bus/dax device-model. Delete it now and see if anyone screams. Since this compatibility option shipped there has been more reports of users being surprised by the compat ABI than surprised by the "new", so the compat infrastructure has outlived its usefulness. Recall that /sys/bus/dax device-model is required for the dax kmem driver which allows PMEM to be used as "System RAM". The following projects were known to have a dependency on /sys/class/dax and have dropped their dependency as of the listed version: - ndctl (including libndctl, daxctl, and libdaxctl): v64+ - fio: v3.13+ - pmdk: v1.5.2+ As further evidence this option is no longer needed some distributions have already stopped enabling CONFIG_DEV_DAX_PMEM_COMPAT. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/163701116195.3784476.726128179293466337.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
Dan Williams
|
c01044cc81 |
ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device
In preparation for exposing "Soft Reserved" memory ranges without an HMAT, move the hmem device registration to its own compilation unit and make the implementation generic. The generic implementation drops usage acpi_map_pxm_to_online_node() that was translating ACPI proximity domain values and instead relies on numa_map_to_online_node() to determine the numa node for the device. [joao.m.martins@oracle.com: CONFIG_DEV_DAX_HMEM_DEVICES should depend on CONFIG_DAX=y] Link: https://lkml.kernel.org/r/8f34727f-ec2d-9395-cb18-969ec8a5d0d4@oracle.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/159643096584.4062302.5035370788475153738.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/158318761484.2216124.2049322072599482736.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Dan Williams
|
a6c7f4c6ae |
device-dax: Add a driver for "hmem" devices
Platform firmware like EFI/ACPI may publish "hmem" platform devices. Such a device is a performance differentiated memory range likely reserved for an application specific use case. The driver gives access to 100% of the capacity via a device-dax mmap instance by default. However, if over-subscription and other kernel memory management is desired the resulting dax device can be assigned to the core-mm via the kmem driver. This consumes "hmem" devices the producer of "hmem" devices is saved for a follow-on patch so that it can reference the new CONFIG_DEV_DAX_HMEM symbol to gate performing the enumeration work. Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Dave Hansen
|
c221c0b030 |
device-dax: "Hotplug" persistent memory for use like normal RAM
This is intended for use with NVDIMMs that are physically persistent (physically like flash) so that they can be used as a cost-effective RAM replacement. Intel Optane DC persistent memory is one implementation of this kind of NVDIMM. Currently, a persistent memory region is "owned" by a device driver, either the "Direct DAX" or "Filesystem DAX" drivers. These drivers allow applications to explicitly use persistent memory, generally by being modified to use special, new libraries. (DIMM-based persistent memory hardware/software is described in great detail here: Documentation/nvdimm/nvdimm.txt). However, this limits persistent memory use to applications which *have* been modified. To make it more broadly usable, this driver "hotplugs" memory into the kernel, to be managed and used just like normal RAM would be. To make this work, management software must remove the device from being controlled by the "Device DAX" infrastructure: echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind and then tell the new driver that it can bind to the device: echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id After this, there will be a number of new memory sections visible in sysfs that can be onlined, or that may get onlined by existing udev-initiated memory hotplug rules. This rebinding procedure is currently a one-way trip. Once memory is bound to "kmem", it's there permanently and can not be unbound and assigned back to device_dax. The kmem driver will never bind to a dax device unless the device is *explicitly* bound to the driver. There are two reasons for this: One, since it is a one-way trip, it can not be undone if bound incorrectly. Two, the kmem driver destroys data on the device. Think of if you had good data on a pmem device. It would be catastrophic if you compile-in "kmem", but leave out the "device_dax" driver. kmem would take over the device and write volatile data all over your good data. This inherits any existing NUMA information for the newly-added memory from the persistent memory device that came from the firmware. On Intel platforms, the firmware has guarantees that require each socket's persistent memory to be in a separate memory-only NUMA node. That means that this patch is not expected to create NUMA nodes, but will simply hotplug memory into existing nodes. Because NUMA nodes are created, the existing NUMA APIs and tools are sufficient to create policies for applications or memory areas to have affinity for or an aversion to using this memory. There is currently some metadata at the beginning of pmem regions. The section-size memory hotplug restrictions, plus this small reserved area can cause the "loss" of a section or two of capacity. This should be fixable in follow-on patches. But, as a first step, losing 256MB of memory (worst case) out of hundreds of gigabytes is a good tradeoff vs. the required code to fix this up precisely. This calculation is also the reason we export memory_block_size_bytes(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-nvdimm@lists.01.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: Huang Ying <ying.huang@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
Dan Williams
|
730926c3b0 |
device-dax: Add /sys/class/dax backwards compatibility
On the expectation that some environments may not upgrade libdaxctl (userspace component that depends on the /sys/class/dax hierarchy), provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat driver implements the original /sys/class/dax sysfs layout rather than /sys/bus/dax. When userspace is upgraded it can blacklist this module and switch to the dax_pmem driver going forward. CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according to the dax_pmem entry in Documentation/ABI/obsolete/. Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
Dan Williams
|
51cf784c42 |
device-dax: Start defining a dax bus model
Towards eliminating the dax_class, move the dax-device-attribute enabling to a new bus.c file in the core. The amount of code thrash of sub-sequent patches is reduced as no logic changes are made, just pure code movement. A temporary export of unregister_dex_dax() and dax_attribute_groups is needed to preserve compilation, but those symbols become static again in a follow-on patch. Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
Greg Kroah-Hartman
|
b24413180f |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Dan Williams
|
7b6be8444e |
dax: refactor dax-fs into a generic provider of 'struct dax_device' instances
We want dax capable drivers to be able to publish a set of dax operations [1]. However, we do not want to further abuse block_devices to advertise these operations. Instead we will attach these operations to a dax device and add a lookup mechanism to go from block device path to a dax device. A dax capable driver like pmem or brd is responsible for registering a dax device, alongside a block device, and then a dax capable filesystem is responsible for retrieving the dax device by path name if it wants to call dax_operations. For now, we refactor the dax pseudo-fs to be a generic facility, rather than an implementation detail, of the device-dax use case. Where a "dax device" is just an inode + dax infrastructure, and "Device DAX" is a mapping service layered on top of that base 'struct dax_device'. "Filesystem DAX" is then a mapping service that layers a filesystem on top of that same base device. Filesystem DAX is associated with a block_device for now, but perhaps directly to a dax device in the future, or for new pmem-only filesystems. [1]: https://lkml.org/lkml/2017/1/19/880 Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
Dan Williams
|
ab68f26221 |
/dev/dax, pmem: direct access to persistent memory
Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory ranges to be allocated and mapped without need of an intervening file system. Device DAX is strict, precise and predictable. Specifically this interface: 1/ Guarantees fault granularity with respect to a given page size (pte, pmd, or pud) set at configuration time. 2/ Enforces deterministic behavior by being strict about what fault scenarios are supported. For example, by forcing MADV_DONTFORK semantics and omitting MAP_PRIVATE support device-dax guarantees that a mapping always behaves/performs the same once established. It is the "what you see is what you get" access mechanism to differentiated memory vs filesystem DAX which has filesystem specific implementation semantics. Persistent memory is the first target, but the mechanism is also targeted for exclusive allocations of performance differentiated memory ranges. This commit is limited to the base device driver infrastructure to associate a dax device with pmem range. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |