Creating a new cache for kernfs_iattrs.
Currently, memory is allocated with kzalloc() which
always gives aligned memory. On ARM, this is 64 byte aligned.
To avoid the wastage of memory in aligning the size requested,
a new cache for kernfs_iattrs is created.
Size of struct kernfs_iattrs is 80 Bytes.
On ARM, it will come in kmalloc-128 slab.
and it will come in kmalloc-192 slab if debug info is enabled.
Extra bytes taken 48 bytes.
Total number of objects created : 4096
Total saving = 48*4096 = 192 KB
After creating new slab(When debug info is enabled) :
sh-3.2# cat /proc/slabinfo
...
kernfs_iattrs_cache 4069 4096 128 32 1 : tunables 0 0 0 : slabdata 128 128 0
...
All testing has been done on ARM target.
Signed-off-by: Ayush Mittal <ayush.m@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This include is not needed (fs/sysfs/file.c builds just fine without
it). Remove it.
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When unbinding the (IOMMU-enabled) R-Car SATA device on Salvator-XS
(R-Car H3 ES2.0), in preparation of rebinding against vfio-platform for
device pass-through for virtualization:
echo ee300000.sata > /sys/bus/platform/drivers/sata_rcar/unbind
the kernel crashes with:
Unable to handle kernel paging request at virtual address ffffffbf029ffffc
Mem abort info:
ESR = 0x96000006
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007e8c586c
[ffffffbf029ffffc] pgd=000000073bfc6003, pud=000000073bfc6003, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 0 PID: 1098 Comm: bash Not tainted 5.0.0-rc5-salvator-x-00452-g37596f884f4318ef #287
Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
pstate: 60400005 (nZCv daif +PAN -UAO)
pc : __free_pages+0x8/0x58
lr : __dma_direct_free_pages+0x50/0x5c
sp : ffffff801268baa0
x29: ffffff801268baa0 x28: 0000000000000000
x27: ffffffc6f9c60bf0 x26: ffffffc6f9c60bf0
x25: ffffffc6f9c60810 x24: 0000000000000000
x23: 00000000fffff000 x22: ffffff8012145000
x21: 0000000000000800 x20: ffffffbf029fffc8
x19: 0000000000000000 x18: ffffffc6f86c42c8
x17: 0000000000000000 x16: 0000000000000070
x15: 0000000000000003 x14: 0000000000000000
x13: ffffff801103d7f8 x12: 0000000000000028
x11: ffffff8011117604 x10: 0000000000009ad8
x9 : ffffff80110126d0 x8 : ffffffc6f7563000
x7 : 6b6b6b6b6b6b6b6b x6 : 0000000000000018
x5 : ffffff8011cf3cc8 x4 : 0000000000004000
x3 : 0000000000080000 x2 : 0000000000000001
x1 : 0000000000000000 x0 : ffffffbf029fffc8
Process bash (pid: 1098, stack limit = 0x00000000c38e3e32)
Call trace:
__free_pages+0x8/0x58
__dma_direct_free_pages+0x50/0x5c
arch_dma_free+0x1c/0x98
dma_direct_free+0x14/0x24
dma_free_attrs+0x9c/0xdc
dmam_release+0x18/0x20
release_nodes+0x25c/0x28c
devres_release_all+0x48/0x4c
device_release_driver_internal+0x184/0x1f0
device_release_driver+0x14/0x1c
unbind_store+0x70/0xb8
drv_attr_store+0x24/0x34
sysfs_kf_write+0x4c/0x64
kernfs_fop_write+0x154/0x1c4
__vfs_write+0x34/0x164
vfs_write+0xb4/0x16c
ksys_write+0x5c/0xbc
__arm64_sys_write+0x14/0x1c
el0_svc_common+0x98/0x114
el0_svc_handler+0x1c/0x24
el0_svc+0x8/0xc
Code: d51b4234 17fffffa a9bf7bfd 910003fd (b9403404)
---[ end trace 8c564cdd3a1a840f ]---
While I've bisected this to commit e8e683ae9a ("iommu/of: Fix
probe-deferral"), and reverting that commit on post-v5.0-rc4 kernels
does fix the problem, this turned out to be a red herring.
On arm64, arch_teardown_dma_ops() resets dev->dma_ops to NULL.
Hence if a driver has used a managed DMA allocation API, the allocated
DMA memory will be freed using the direct DMA ops, while it may have
been allocated using a custom DMA ops (iommu_dma_ops in this case).
Fix this by reversing the order of the calls to devres_release_all() and
arch_teardown_dma_ops().
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If device_link_add() is called twice in a row to create a stateless
device link for the same consumer-supplier pair without an attempt
to delete the link between these calls, and the second caller passes
DL_FLAG_RPM_ACTIVE to it in flags, calling either device_link_del()
or device_link_remove() immediately after that will leave the link's
supplier device with nonzero PM-runtime usage counter, which may
prevent the supplier from being runtime-suspended going forward
until the link is deleted by another invocation of device_link_del()
or device_link_remove() for it.
Even though this is confusing and may lead to subtle issues, trying
to avoid it in the framework also may cause problems to appear, so
document it as a known limitation.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the target device has any suppliers, as reflected by device links
to them, __pm_runtime_set_status() does not take them into account,
which is not consistent with the other parts of the PM-runtime
framework and may lead to programming mistakes.
Modify __pm_runtime_set_status() to take suppliers into account by
activating them upfront if the new status is RPM_ACTIVE and
deactivating them on exit if the new status is RPM_SUSPENDED.
If the activation of one of the suppliers fails, the new status
will be RPM_SUSPENDED and the (remaining) suppliers will be
deactivated on exit (the child count of the device's parent
will be dropped too then).
Of course, adding device links locking to __pm_runtime_set_status()
means that it cannot be run fron interrupt context, so make it use
spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
and spin_unlock_irqrestore(), respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add __cold to the dev_<level> logging functions similar to
the use of __cold in the generic printk function.
Using __cold moves all the dev_<level> logging functions
out-of-line possibly improving code locality and runtime
performance.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This does not build yet ...
Cc: Richard Gong <richard.gong@intel.com>
Fixes: 095ff29d2b ("firmware: intel_stratix10_service: add hardware dependency")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a new device link flag, DL_FLAG_AUTOPROBE_CONSUMER, to request the
driver core to probe for a consumer driver automatically after binding
a driver to the supplier device on a persistent managed device link.
As unbinding the supplier driver on a managed device link causes the
consumer driver to be detached from its device automatically, this
flag provides a complementary mechanism which is needed to address
some "composite device" use cases.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Even though stateful device links are managed by the driver core in
principle, their creators are allowed and sometimes even expected
to drop references to them via device_link_del() or
device_link_remove(), but that doesn't really play well with the
"persistent" link concept.
If "persistent" managed device links are created from driver
probe callbacks, device_link_add() called to do that will take a
new reference on the link each time the callback runs and those
references will never be dropped, which kind of isn't nice.
This issues arises because of the link reference counting carried
out by device_link_add() for existing links, but that is only done to
avoid deleting device links that may still be necessary, which
shouldn't be a concern for managed (stateful) links. These device
links are managed by the driver core and whoever creates one of them
will need it at least as long as until the consumer driver is detached
from its device and deleting it may be left to the driver core just
fine.
For this reason, rework device_link_add() to apply the reference
counting to stateless links only and make device_link_del() and
device_link_remove() drop references to stateless links only too.
After this change, if called to add a stateful device link for
a consumer-supplier pair for which a stateful device link is
present already, device_link_add() will return the existing link
without incrementing its reference counter. Accordingly,
device_link_del() and device_link_remove() will WARN() and do
nothing when called to drop a reference to a stateful link. Thus,
effectively, all stateful device links will be owned by the driver
core.
In addition, clean up the handling of the link management flags,
DL_FLAG_AUTOREMOVE_CONSUMER and DL_FLAG_AUTOREMOVE_SUPPLIER, so that
(a) they are never set at the same time and (b) if device_link_add()
is called for a consumer-supplier pair with an existing stateful link
between them, the flags of that link will be combined with the flags
passed to device_link_add() to ensure that the life time of the link
is sufficient for all of the callers of device_link_add() for the
same consumer-supplier pair.
Update the device_link_add() kerneldoc comment to reflect the
above changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The device links used by rockchip-iommu and exynos-iommu are
completely managed by these drivers within the IOMMU framework,
so there is no reason to involve the driver core in the management
of these links.
For this reason, make rockchip-iommu and exynos-iommu pass
DL_FLAG_STATELESS in flags to device_link_add(), so that the device
links used by them are stateless.
[Note that this change is requisite for a subsequent one that will
rework the management of stateful device links in the driver core
and it will not be compatible with the two drivers in question any
more.]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Calling rpm_put_suppliers() from pm_runtime_drop_link() is excessive
as it affects all suppliers of the consumer device and not just the
one pointed to by the device link being dropped. Worst case it may
cause the consumer device to stop working unexpectedly. Moreover, in
principle it is racy with respect to runtime PM of the consumer
device.
To avoid these problems drop runtime PM references on the particular
supplier pointed to by the link in question only and do that after
the link has been dropped from the consumer device's list of links to
suppliers, which is in device_link_free().
Fixes: a0504aecba ("PM / runtime: Drop usage count for suppliers at device link removal")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, it is not valid to add a device link from a consumer
driver ->probe callback to a supplier that is still probing too, but
generally this is a valid use case. For example, if the consumer has
just acquired a resource that can only be available if the supplier
is functional, adding a device link to that supplier right away
should be safe (and even desirable arguably), but device_link_add()
doesn't handle that case correctly and the initial state of the link
created by it is wrong then.
To address this problem, change the initial state of device links
added between a probing supplier and a probing consumer to
DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to
skip such links on the supplier side.
With this change, if the supplier probe completes first,
device_links_driver_bound() called for it will skip the link state
update and when it is called for the consumer, the link state will
be updated to "active". In turn, if the consumer probe completes
first, device_links_driver_bound() called for it will change the
state of the link to "active" and when it is called for the
supplier, the link status update will be skipped.
However, in principle the supplier or consumer probe may still fail
after the link has been added, so modify device_links_no_driver() to
change device links in the "active" or "consumer probe" state to
"dormant" on the supplier side and update __device_links_no_driver()
to change the link state to "available" only if it is "consumer
probe" or "active".
Then, if the supplier probe fails first, the leftover link to the
probing consumer will become "dormant" and device_links_no_driver()
called for the consumer (when its probe fails) will clean it up.
In turn, if the consumer probe fails first, it will either drop the
link, or change its state to "available" and, in the latter case,
when device_links_no_driver() is called for the supplier, it will
update the link state to "dormant". [If the supplier probe fails,
but the consumer probe succeeds, which should not happen as long as
the consumer driver is correct, the link still will be around, but
it will be "dormant" until the supplier is probed again.]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit ead18c23c2 ("driver core: Introduce device links
reference counting"), if there is a link between the given supplier
and the given consumer already, device_link_add() will refcount it
and return it unconditionally without updating its flags. It is
possible, however, that the second (or any subsequent) caller of
device_link_add() for the same consumer-supplier pair will pass
DL_FLAG_PM_RUNTIME, possibly along with DL_FLAG_RPM_ACTIVE, in flags
to it and the existing link may not behave as expected then.
First, if DL_FLAG_PM_RUNTIME is not set in the existing link's flags
at all, it needs to be set like during the original initialization of
the link.
Second, if DL_FLAG_RPM_ACTIVE is passed to device_link_add() in flags
(in addition to DL_FLAG_PM_RUNTIME), the existing link should to be
updated to reflect the "active" runtime PM configuration of the
consumer-supplier pair and extra care must be taken here to avoid
possible destructive races with runtime PM of the consumer.
To that end, redefine the rpm_active field in struct device_link
as a refcount, initialize it to 1 and make rpm_resume() (for the
consumer) and device_link_add() increment it whenever they acquire
a runtime PM reference on the supplier device. Accordingly, make
rpm_suspend() (for the consumer) and pm_runtime_clean_up_links()
decrement it and drop runtime PM references to the supplier
device in a loop until rpm_active becones 1 again.
Fixes: ead18c23c2 ("driver core: Introduce device links reference counting")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is incorrect to call pm_runtime_get_sync() under
device_links_write_lock(), because it may end up trying to take
device_links_read_lock() while resuming the target device and that
will deadlock in the non-SRCU case, so avoid that by resuming the
supplier device in device_link_add() before calling
device_links_write_lock().
Fixes: 21d5c57b37 ("PM / runtime: Use device links")
Fixes: baa8809f60 ("PM / runtime: Optimize the use of device links")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit ead18c23c2 ("driver core: Introduce device links
reference counting"), if there is a link between the given supplier
and the given consumer already, device_link_add() will refcount it
and return it unconditionally. However, if the flags passed to
it on the second (or any subsequent) attempt to create a device
link between the same consumer-supplier pair are not compatible with
the existing link's flags, that is incorrect.
First off, if the existing link is stateless and the next caller of
device_link_add() for the same consumer-supplier pair wants a
stateful one, or the other way around, the existing link cannot be
returned, because it will not match the expected behavior, so make
device_link_add() dump the stack and return NULL in that case.
Moreover, if the DL_FLAG_AUTOREMOVE_CONSUMER flag is passed to
device_link_add(), its caller will expect its reference to the link
to be dropped automatically on consumer driver removal, which will
not happen if that flag is not set in the link's flags (and
analogously for DL_FLAG_AUTOREMOVE_SUPPLIER). For this reason, make
device_link_add() update the existing link's flags accordingly
before returning it to the caller.
Fixes: ead18c23c2 ("driver core: Introduce device links reference counting")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the list walk in device_links_driver_cleanup() to a safe one
to avoid use-after-free when dropping a link from the list during the
walk.
Also, while at it, fix device_link_add() to refuse to create
stateless device links with DL_FLAG_AUTOREMOVE_SUPPLIER set, which is
an invalid combination (setting that flag means that the driver core
should manage the link, so it cannot be stateless), and extend the
kerneldoc comment of device_link_add() to cover the
DL_FLAG_AUTOREMOVE_SUPPLIER flag properly too.
Fixes: 1689cac5b3 ("driver core: Add flag to autoremove device link on supplier unbind")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
devm_ioremap_resource() prefers calling devm_request_mem_region() with a
resource name instead of a device name -- this looks pretty iff a resource
name isn't specified via a device tree with a "reg-names" property (in this
case, a resource name is set to a device node's full name), but if it is,
it doesn't really scale since these names are only unique to a given device
node, not globally; so, looking at the output of 'cat /proc/iomem', you do
not have an idea which memory region belongs to which device (see "dirmap",
"regs", and "wbuf" lines below):
08000000-0bffffff : dirmap
48000000-bfffffff : System RAM
48000000-48007fff : reserved
48080000-48b0ffff : Kernel code
48b10000-48b8ffff : reserved
48b90000-48c7afff : Kernel data
bc6a4000-bcbfffff : reserved
bcc0f000-bebfffff : reserved
bec0e000-bec0efff : reserved
bec11000-bec11fff : reserved
bec12000-bec14fff : reserved
bec15000-bfffffff : reserved
e6050000-e605004f : gpio@e6050000
e6051000-e605104f : gpio@e6051000
e6052000-e605204f : gpio@e6052000
e6053000-e605304f : gpio@e6053000
e6054000-e605404f : gpio@e6054000
e6055000-e605504f : gpio@e6055000
e6060000-e606050b : pin-controller@e6060000
e6e60000-e6e6003f : e6e60000.serial
e7400000-e7400fff : ethernet@e7400000
ee200000-ee2001ff : regs
ee208000-ee2080ff : wbuf
I think that devm_request_mem_region() should be called with dev_name()
despite the region names won't look as pretty as before (however, we gain
more consistency with e.g. the serial driver:
08000000-0bffffff : ee200000.rpc
48000000-bfffffff : System RAM
48000000-48007fff : reserved
48080000-48b0ffff : Kernel code
48b10000-48b8ffff : reserved
48b90000-48c7afff : Kernel data
bc6a4000-bcbfffff : reserved
bcc0f000-bebfffff : reserved
bec0e000-bec0efff : reserved
bec11000-bec11fff : reserved
bec12000-bec14fff : reserved
bec15000-bfffffff : reserved
e6050000-e605004f : e6050000.gpio
e6051000-e605104f : e6051000.gpio
e6052000-e605204f : e6052000.gpio
e6053000-e605304f : e6053000.gpio
e6054000-e605404f : e6054000.gpio
e6055000-e605504f : e6055000.gpio
e6060000-e606050b : e6060000.pin-controller
e6e60000-e6e6003f : e6e60000.serial
e7400000-e7400fff : e7400000.ethernet
ee200000-ee2001ff : ee200000.rpc
ee208000-ee2080ff : ee200000.rpc
Fixes: 72f8c0bfa0 ("lib: devres: add convenience function to remap a resource")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Silence warnings (triggered at W=1) by adding relevant __printf
attributes.
drivers/base/cpu.c:432:2: warning: function '__cpu_device_create' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a Kconfig dependency to ensure Intel Stratix10 service layer driver
can be built only on the platform that supports it.
Signed-off-by: Richard Gong <richard.gong@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current async_probe test code is only testing one device allocated
prior to driver load and only loading one device afterwards. Instead of
doing things this way it makes much more sense to load one device per CPU
in order to actually stress the async infrastructure. By doing this we
should see delays significantly increase in the event of devices being
serialized.
In addition I have updated the test to verify that we are trying to place
the work on the correct NUMA node when we are running in async mode. By
doing this we can verify the best possible outcome for device and driver
load times.
I have added a timeout value that is used to disable the sleep and instead
cause the probe routine to report an error indicating it timed out. By
doing this we limit the maximum runtime for the test to 20 seconds or less.
The last major change in this set is that I have gone through and tuned it
for handling the massive number of possible events that will be scheduled.
Instead of reporting the sleep for each individual device it is moved to
only being displayed if we enable debugging.
With this patch applied below are what a failing test and a passing test
should look like. I elided a few hundred lines in the failing test that
were duplicated since the system I was testing on had a massive number of
CPU cores:
-- Failing --
[ 243.524697] test_async_driver_probe: registering first set of asynchronous devices...
[ 243.535625] test_async_driver_probe: registering asynchronous driver...
[ 243.543038] test_async_driver_probe: registration took 0 msecs
[ 243.549559] test_async_driver_probe: registering second set of asynchronous devices...
[ 243.568350] platform test_async_driver.447: registration took 9 msecs
[ 243.575544] test_async_driver_probe: registering first synchronous device...
[ 243.583454] test_async_driver_probe: registering synchronous driver...
[ 248.825920] test_async_driver_probe: registration took 5235 msecs
[ 248.825922] test_async_driver_probe: registering second synchronous device...
[ 248.825928] test_async_driver test_async_driver.443: NUMA node mismatch 3 != 1
[ 248.825932] test_async_driver test_async_driver.445: NUMA node mismatch 3 != 1
[ 248.825935] test_async_driver test_async_driver.446: NUMA node mismatch 3 != 1
[ 248.825939] test_async_driver test_async_driver.440: NUMA node mismatch 3 != 1
[ 248.825943] test_async_driver test_async_driver.441: NUMA node mismatch 3 != 1
...
[ 248.827150] test_async_driver test_async_driver.229: NUMA node mismatch 0 != 1
[ 248.827158] test_async_driver test_async_driver.228: NUMA node mismatch 0 != 1
[ 248.827220] test_async_driver test_async_driver.281: NUMA node mismatch 2 != 1
[ 248.827229] test_async_driver test_async_driver.282: NUMA node mismatch 2 != 1
[ 248.827240] test_async_driver test_async_driver.280: NUMA node mismatch 2 != 1
[ 253.945834] test_async_driver test_async_driver.1: NUMA node mismatch 0 != 1
[ 253.945878] test_sync_driver test_sync_driver.1: registration took 5119 msecs
[ 253.961693] test_async_driver_probe: async events still pending, forcing timeout and synchronize
[ 259.065839] test_async_driver test_async_driver.2: NUMA node mismatch 0 != 1
[ 259.073786] test_async_driver test_async_driver.3: async probe took too long
[ 259.081669] test_async_driver test_async_driver.3: NUMA node mismatch 0 != 1
[ 259.089569] test_async_driver test_async_driver.4: async probe took too long
[ 259.097451] test_async_driver test_async_driver.4: NUMA node mismatch 0 != 1
[ 259.105338] test_async_driver test_async_driver.5: async probe took too long
[ 259.113204] test_async_driver test_async_driver.5: NUMA node mismatch 0 != 1
[ 259.121089] test_async_driver test_async_driver.6: async probe took too long
[ 259.128961] test_async_driver test_async_driver.6: NUMA node mismatch 0 != 1
[ 259.136850] test_async_driver test_async_driver.7: async probe took too long
...
[ 262.124062] test_async_driver test_async_driver.221: async probe took too long
[ 262.132130] test_async_driver test_async_driver.221: NUMA node mismatch 3 != 1
[ 262.140206] test_async_driver test_async_driver.222: async probe took too long
[ 262.148277] test_async_driver test_async_driver.222: NUMA node mismatch 3 != 1
[ 262.156351] test_async_driver test_async_driver.223: async probe took too long
[ 262.164419] test_async_driver test_async_driver.223: NUMA node mismatch 3 != 1
[ 262.172630] test_async_driver_probe: Test failed with 222 errors and 336 warnings
-- Passing --
[ 105.419247] test_async_driver_probe: registering first set of asynchronous devices...
[ 105.432040] test_async_driver_probe: registering asynchronous driver...
[ 105.439718] test_async_driver_probe: registration took 0 msecs
[ 105.446239] test_async_driver_probe: registering second set of asynchronous devices...
[ 105.477986] platform test_async_driver.447: registration took 22 msecs
[ 105.485276] test_async_driver_probe: registering first synchronous device...
[ 105.493169] test_async_driver_probe: registering synchronous driver...
[ 110.597981] test_async_driver_probe: registration took 5097 msecs
[ 110.604806] test_async_driver_probe: registering second synchronous device...
[ 115.707490] test_sync_driver test_sync_driver.1: registration took 5094 msecs
[ 115.715478] test_async_driver_probe: completed successfully
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Force the device registration for nvdimm devices to be closer to the actual
device. This is achieved by using either the NUMA node ID of the region, or
of the parent. By doing this we can have everything above the region based
on the region, and everything below the region based on the nvdimm bus.
By guaranteeing NUMA locality I see an improvement of as high as 25% for
per-node init of a system with 12TB of persistent memory.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use the device specific version of the async_schedule commands to defer
various tasks related to power management. By doing this we should see a
slight improvement in performance as any device that is sensitive to
latency/locality in the setup will now be initializing on the node closest
to the device.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Call the asynchronous probe routines on a CPU local to the device node. By
doing this we should be able to improve our initialization time
significantly as we can avoid having to access the device from a remote
node which may introduce higher latency.
For example, in the case of initializing memory for NVDIMM this can have a
significant impact as initialing 3TB on remote node can take up to 39
seconds while initialing it on a local node only takes 23 seconds. It is
situations like this where we will see the biggest improvement.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce four new variants of the async_schedule_ functions that allow
scheduling on a specific NUMA node.
The first two functions are async_schedule_near and
async_schedule_near_domain end up mapping to async_schedule and
async_schedule_domain, but provide NUMA node specific functionality. They
replace the original functions which were moved to inline function
definitions that call the new functions while passing NUMA_NO_NODE.
The second two functions are async_schedule_dev and
async_schedule_dev_domain which provide NUMA specific functionality when
passing a device as the data member and that device has a NUMA node other
than NUMA_NO_NODE.
The main motivation behind this is to address the need to be able to
schedule device specific init work on specific NUMA nodes in order to
improve performance of memory initialization.
I have seen a significant improvement in initialziation time for persistent
memory as a result of this approach. In the case of 3TB of memory on a
single node the initialization time in the worst case went from 36s down to
about 26s for a 10s improvement. As such the data shows a general benefit
for affinitizing the async work to the node local to the device.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Provide a new function, queue_work_node, which is meant to schedule work on
a "random" CPU of the requested NUMA node. The main motivation for this is
to help assist asynchronous init to better improve boot times for devices
that are local to a specific node.
For now we just default to the first CPU that is in the intersection of the
cpumask of the node and the online cpumask. The only exception is if the
CPU is local to the node we will just use the current CPU. This should work
for our purposes as we are currently only using this for unbound work so
the CPU will be translated to a node anyway instead of being directly used.
As we are only using the first CPU to represent the NUMA node for now I am
limiting the scope of the function so that it can only be used with unbound
workqueues.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Probe devices asynchronously instead of the driver. This results in us
seeing the same behavior if the device is registered before the driver or
after. This way we can avoid serializing the initialization should the
driver not be loaded until after the devices have already been added.
The motivation behind this is that if we have a set of devices that
take a significant amount of time to load we can greatly reduce the time to
load by processing them in parallel instead of one at a time. In addition,
each device can exist on a different node so placing a single thread on one
CPU to initialize all of the devices for a given driver can result in poor
performance on a system with multiple nodes.
This approach can reduce the time needed to scan SCSI LUNs significantly.
The only way to realize that speedup is by enabling more concurrency which
is what is achieved with this patch.
To achieve this it was necessary to add a new member "async_driver" to the
device_private structure to store the driver pointer while we wait on the
deferred probe call.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Try to consolidate all of the locking and unlocking of both the parent and
device when attaching or removing a driver from a given device.
To do that I first consolidated the lock pattern into two functions
__device_driver_lock and __device_driver_unlock. After doing that I then
created functions specific to attaching and detaching the driver while
acquiring these locks. By doing this I was able to reduce the number of
spots where we touch need_parent_lock from 12 down to 4.
This patch should produce no functional changes, it is meant to be a code
clean-up/consolidation only.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add an additional bit flag to the device_private struct named "dead".
This additional flag provides a guarantee that when a device_del is
executed on a given interface an async worker will not attempt to attach
the driver following the earlier device_del call. Previously this
guarantee was not present and could result in the device_del call
attempting to remove a driver from an interface only to have the async
worker attempt to probe the driver later when it finally completes the
asynchronous probe call.
One additional change added was that I pulled the check for dev->driver
out of the __device_attach_driver call and instead placed it in the
__device_attach_async_helper call. This was motivated by the fact that the
only other caller of this, __device_attach, had already taken the
device_lock() and checked for dev->driver. Instead of testing for this
twice in this path it makes more sense to just consolidate the dev->dead
and dev->driver checks together into one set of checks.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DL_FLAG_AUTOREMOVE_CONSUMER/SUPPLIER means "Remove the link
automatically on consumer/supplier driver unbind", that means we should
remove whole the device_link when there is no this driver no matter what
the ref_count of the link is.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On platforms making a fair use of regulators, the dev_info() messages
coming from the device link function are a bit too verbose. The amount
of message will increase further with the clock framework joining the
device link party.
These messages looks valuable for people debugging device link related
issues, so dev_dbg() looks more appropriate than dev_info().
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is currently a missing terminating newline in non-switch case
match when msg == NULL
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Repalce printk with pr_warn in kobject_synth_uevent and replace
printk with pr_err in uevent_net_init to make both consistent with
other code in kobject_uevent.c
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kset_get_ownership() is only used in lib/kobject.c, so make it 'static'.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are trying to get rid of BUS_ATTR() and the usage of that in rbd.c
can be trivially converted to use BUS_ATTR_WO and RO, so use those
macros instead.
Cc: Sage Weil <sage@redhat.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are trying to get rid of BUS_ATTR() and the usage of that in
rio-sysfs.c can be trivially converted to use BUS_ATTR_WO(), so use that
instead.
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are trying to get rid of BUS_ATTR() and the usage of that in
ibmebus.c can be trivially converted to use BUS_ATTR_WO(), so use that
instead.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are trying to get rid of BUS_ATTR() and the usage of that in
pci-sysfs.c can be trivially converted to use BUS_ATTR_WO(), so use that
instead.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are trying to get rid of BUS_ATTR() and the usage of that in pci.c
can be trivially converted to use BUS_ATTR_RW(), so use that instead.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
debugfs_use_file_start() and debugfs_use_file_finish() do not exist
since commit c9afbec270 ("debugfs: purge obsolete SRCU based removal
protection"); tweak debugfs_create_file_unsafe() comment.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the 'firmware' directory only contains a single Makefile
to embed extra firmware into the kernel.
Move it to the more relevant place.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The whole code of fallback_table.c is surrounded by #ifdef of
CONFIG_FW_LOADER_USER_HELPER.
Move the CONFIG_FW_LOADER_USER_HELPER switch to Makefile so that
it is not compiled at all when this CONFIG option is disabled.
I also removed the confusing comment, "Module or buit-in [sic]".
CONFIG_FW_LOADER_USER_HELPER is a boolean option.
(If it were a module, CONFIG_FW_LOADER_USER_HELPER_MODULE would
be defined instead.)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(X + 3) & ~3 is the same as ALIGN(X, 4), so replace all of the
instances of the formwer in the code with the latter. While at it,
introduce a helper variable 'record_size' to avoid duplicating length
calculatin code. No functional change intended.
Cc: Chris Healy <cphealy@gmail.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can convert original expression for 'writelen" to use ALIGN as
follows:
(p->len + 9) & ~3 => (p->len + 6 + 3) & ~3 => ALIGN(p->len + 6, 4)
Now, subsituting "p->len + 6" with "p->len + sizeof(p->addr) +
sizeof(p->len)" we end up with the same expression as used by kernel
couterpart in linux/ihex.h:
ALIGN(p->len + sizeof(p->addr) + sizeof(p->len), 4)
That is a full size of the record, aligned to 4 bytes. No functional
change intended.
Cc: Chris Healy <cphealy@gmail.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Next record calucaltion can be reduced to a much more tivial ALIGN
operation as follows:
1. Splitting 5 into 2 + 3 we get
next = ((be16_to_cpu(rec->len) + 2 + 3) & ~3) - 2 (1)
2. Using ALIGN macro we reduce (1) to:
ALIGN(be16_to_cpu(rec->len) + 2, 4) - 2 (2)
3. Subsituting 'next' in original next record calucation we get:
(void *)&rec->data[ALIGN(be16_to_cpu(rec->len) + 2, 4) - 2] (3)
4. Converting array index to pointer arithmetic we convert (3) into:
(void *)rec + sizeof(*rec) +
ALIGN(be16_to_cpu(rec->len) + 2, 4) - 2 (4)
5. Subsituting sizeof(*rec) with its value, 6, and substracting 2,
in (4) we get:
(void *)rec + ALIGN(be16_to_cpu(rec->len) + 2, 4) + 4 (5)
6. Since ALIGN(X, 4) + 4 == ALIGN(X + 4, 4), (5) can be converted to:
(void *)rec + ALIGN(be16_to_cpu(rec->len) + 6, 4) (6)
5. Subsituting 6 in (6) to sizeof(*rec) we get:
(void *)rec + ALIGN(be16_to_cpu(rec->len) + sizeof(*rec), 4) (7)
Using expression (7) should make it more clear that next record is
located by adding full size of the current record (payload + auxiliary
data) aligned to 4 bytes, to the location of the current one. No
functional change intended.
Cc: Chris Healy <cphealy@gmail.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When verifying the validity of IHEX file we need to make sure that
zero-length record we found is located at the end of the file. Not
doing that could result in an invalid file with a bogus zero-length in
the middle short-circuiting the check and being reported as valid.
Cc: Chris Healy <cphealy@gmail.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Convert both ihex_validate_fw() and ihex_next_binrec() to use a helper
function to calculate next record offest. This way we only have one
place implementing next record offset calculation logic. No functional
change intended.
Cc: Chris Healy <cphealy@gmail.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As the description of struct device_private says, it stores data which
is private to driver core. And it already has similar fields like:
knode_parent, knode_driver, knode_driver and knode_bus. This look it is
more proper to put knode_class together with those fields to make it
private to driver core.
This patch move device->knode_class to device_private to make it comply
with code convention.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>