-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmY8mxAACgkQu+CwddJF
iJru7AgAmBfolYwYjm9fCkH+px40smQQF08W+ygJaKF4+6e+b5ijfI8H3AG7QtuE
5FmdCjSvu56lr15sjeUy7giYWRfeEwxC/ztJ0FJ+RCzSEQVKCo2wWGYxDneelwdH
/v0Of5ENbIiH/svK4TArY9AemZw+nowNrwa4TI1QAEcp47T7x52r0GFOs1pnduep
eV6uSwHSx00myiF3fuMGQ7P4aUDLNTGn5LSHNI4sykObesGPx4Kvr0zZvhQT41me
c6Sc0GwV5M9sqBFwjujIeD7CB98wVPju4SDqNiEL+R1u+pnIA0kkefO4D4VyKvpr
7R/WXmqZI4Ae/HEtcRd8+5Z4FvapPw==
=7ez3
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"This time it's mostly random cleanups and fixes, with two performance
fixes that might have significant impact, but limited to systems
experiencing particular bad corner case scenarios rather than general
performance improvements.
The memcg hook changes are going through the mm tree due to
dependencies.
- Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)
This fixes the long-standing problem that can happen with workloads
that have alloc/free patterns resulting in many partially used
slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse
the long partial slab list under spinlock with disabled irqs and
thus can stall other processes or even trigger the lockup
detection. The traversal is only done to count free objects so that
<active_objs> column can be reported along with <num_objs>.
To avoid affecting fast paths with another shared counter
(attempted in the past) or complex partial list traversal schemes
that allow rescheduling, the chosen solution resorts to
approximation - when the partial list is over 10000 slabs long, we
will only traverse first 5000 slabs from head and tail each and use
the average of those to estimate the whole list. Both head and tail
are used as the slabs near head to tend to have more free objects
than the slabs towards the tail.
It is expected the approximation should not break existing
/proc/slabinfo consumers. The <num_objs> field is still accurate
and reflects the overall kmem_cache footprint. The <active_objs>
was already imprecise due to cpu and percpu-partial slabs, so can't
be relied upon to determine exact cache usage. The difference
between <active_objs> and <num_objs> is mainly useful to determine
the slab fragmentation, and that will be possible even with the
approximation in place.
- Prevent allocating many slabs when a NUMA node is full (Chen Jun)
Currently, on NUMA systems with a node under significantly bigger
pressure than other nodes, the fallback strategy may result in each
kmalloc_node() that can't be safisfied from the preferred node, to
allocate a new slab on a fallback node, and not reuse the slabs
already on that node's partial list.
This is now fixed and partial lists of fallback nodes are checked
even for kmalloc_node() allocations. It's still preferred to
allocate a new slab on the requested node before a fallback, but
only with a GFP_NOWAIT attempt, which will fail quickly when the
node is under a significant memory pressure.
- More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)
- Fix slub_kunit self-test with hardened freelists (Guenter Roeck)
- Mark racy accesses for KCSAN (linke li)
- Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)"
* tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: remove the check for NULL kmalloc_caches
mm/slub: create kmalloc 96 and 192 caches regardless cache size order
mm/slub: mark racy access on slab->freelist
slub: use count_partial_free_approx() in slab_out_of_memory()
slub: introduce count_partial_free_approx()
slub: Set __GFP_COMP in kmem_cache by default
mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()
mm/slub: correct comment in do_slab_free()
mm/slub, kunit: Use inverted data to corrupt kmem cache
mm/slub: simplify get_partial_node()
mm/slub: add slub_get_cpu_partial() helper
mm/slub: remove the check of !kmem_cache_has_cpu_partial()
mm/slub: Reduce memory consumption in extreme scenarios
mm/slub: mark racy accesses on slab->slabs
mm/slub: remove dummy slabinfo functions
In deactivate_slab(), slab->freelist can be changed concurrently. Mark
data race on slab->freelist as benign using READ_ONCE.
This patch is aimed at reducing the number of benign races reported by
KCSAN in order to focus future debugging effort on harmful races.
Signed-off-by: linke li <lilinke99@qq.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Commit 284f17ac13 ("mm/slub: handle bulk and single object freeing
separately") splits single and bulk object freeing in two functions
slab_free() and slab_free_bulk() which leads slab_free() to call
slab_free_hook() directly instead of slab_free_freelist_hook().
If `init_on_free` is set, slab_free_hook() zeroes the object.
Afterward, if `slub_debug=F` and `CONFIG_SLAB_FREELIST_HARDENED` are
set, the do_slab_free() slowpath executes freelist consistency
checks and try to decode a zeroed freepointer which leads to a
"Freepointer corrupt" detection in check_object().
During bulk free, slab_free_freelist_hook() isn't affected as it always
sets it objects freepointer using set_freepointer() to maintain its
reconstructed freelist after `init_on_free`.
For single free, object's freepointer thus needs to be avoided when
stored outside the object if `init_on_free` is set. The freepointer left
as is, check_object() may later detect an invalid pointer value due to
objects overflow.
To reproduce, set `slub_debug=FU init_on_free=1 log_level=7` on the
command line of a kernel build with `CONFIG_SLAB_FREELIST_HARDENED=y`.
dmesg sample log:
[ 10.708715] =============================================================================
[ 10.710323] BUG kmalloc-rnd-05-32 (Tainted: G B T ): Freepointer corrupt
[ 10.712695] -----------------------------------------------------------------------------
[ 10.712695]
[ 10.712695] Slab 0xffffd8bdc400d580 objects=32 used=4 fp=0xffff9d9a80356f80 flags=0x200000000000a00(workingset|slab|node=0|zone=2)
[ 10.716698] Object 0xffff9d9a80356600 @offset=1536 fp=0x7ee4f480ce0ecd7c
[ 10.716698]
[ 10.716698] Bytes b4 ffff9d9a803565f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 10.720703] Object ffff9d9a80356600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 10.720703] Object ffff9d9a80356610: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 10.724696] Padding ffff9d9a8035666c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 10.724696] Padding ffff9d9a8035667c: 00 00 00 00 ....
[ 10.724696] FIX kmalloc-rnd-05-32: Object at 0xffff9d9a80356600 not freed
Fixes: 284f17ac13 ("mm/slub: handle bulk and single object freeing separately")
Cc: <stable@vger.kernel.org>
Co-developed-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
slab_out_of_memory() uses count_partial() to get the exact count
of free objects for each node. As it may get called in the slab
allocation path, count_partial_free_approx() can be used to avoid
the risk and overhead of traversing a long partial slab list.
At the same time, show_slab_objects() still uses count_partial().
Thus, slub users can still have the option to access the exact
count of objects via sysfs if the overhead is acceptable to them.
Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
When reading "/proc/slabinfo", the kernel needs to report the number
of free objects for each kmem_cache. The current implementation uses
count_partial() to get it by scanning each kmem_cache_node's partial
slab list and summing free objects from every partial slab. This
process must hold per-kmem_cache_node spinlock and disable IRQ, and
may take a long time. Consequently, it can block slab allocations on
other CPUs and cause timeouts for network devices, when the partial
list is long. In production, even NMI watchdog can be triggered due
to this matter: e.g., for "buffer_head", the number of partial slabs
was observed to be ~1M in one kmem_cache_node. This problem was also
confirmed by others [1-3].
Iterating a partial list to get the exact count of objects can cause
soft lockups for a long list with or without the lock (e.g., if
preemption is disabled), and may not be very useful: the object count
can change after the lock is released. The approach of maintaining
free-object counters requires atomic operations on the fast path [3].
So, the fix is to introduce count_partial_free_approx(). This function
can be used for getting the free object count in a kmem_cache_node's
partial list. It limits the number of slabs to scan and avoids scanning
the whole list by giving an approximation for a long list. Suppose the
limit is N. If the list's length is not greater than N, output the exact
count by traversing the list; if its length is greater than N, output an
approximated count by traversing a subset of the list. The proposed
method is to scan N/2 slabs from the list's head and N/2 slabs from
the tail. For a partial list with ~280K slabs, benchmarks show that
it performs better than just counting from the list's head, after slabs
get sorted by kmem_cache_shrink(). Default the limit to 10000, as it
produces an approximation within 1% of the exact count for both
scenarios. Then, use count_partial_free_approx() in get_slabinfo().
Benchmarks: Diff = (exact - approximated) / exact
* Normal case (w/o kmem_cache_shrink()):
| MAX_TO_SCAN | Diff (count from head)| Diff (count head+tail)|
| 1000 | 0.43 % | 1.09 % |
| 5000 | 0.06 % | 0.37 % |
| 10000 | 0.02 % | 0.16 % |
| 20000 | 0.009 % | -0.003 % |
* Skewed case (w/ kmem_cache_shrink()):
| MAX_TO_SCAN | Diff (count from head)| Diff (count head+tail)|
| 1000 | 12.46 % | 6.75 % |
| 5000 | 5.38 % | 1.27 % |
| 10000 | 4.99 % | 0.22 % |
| 20000 | 4.86 % | -0.06 % |
[1] https://lore.kernel.org/linux-mm/alpine.DEB.2.21.2003031602460.1537@www.lameter.com/T/
[2] https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2008071258020.55871@www.lameter.com/T/
[3] https://lore.kernel.org/lkml/1e01092b-140d-2bab-aeba-321a74a194ee@linux.com/T/
Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Now the __GFP_COMP is set only if the higher-order is not 0. However,
__GFP_COMP flag can be set unconditionally because compound page can
not be created in the order-0 case. And this can also simplify the code
a bit (no need to check the order is 0 or not).
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The struct track for every object in a new slab is already set up by
new_slab(), so remove the duplicate initialization in
early_kmem_cache_node_alloc().
Co-developed-by: Hyunmin Lee <hyunminlr@gmail.com>
Signed-off-by: Hyunmin Lee <hyunminlr@gmail.com>
Co-developed-by: Jeungwoo Yoo <casionwoo@gmail.com>
Signed-off-by: Jeungwoo Yoo <casionwoo@gmail.com>
Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The break conditions for filling cpu partial can be more readable and
simple.
If slub_get_cpu_partial() returns 0, we can confirm that we don't need
to fill cpu partial, then we should break from the loop. On the other
hand, we also should break from the loop if we have added enough cpu
partial slabs.
Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
corner case that if we set cpu_partial_slabs to 0 from sysfs, we still
allocate at least one here.
Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Add slub_get_cpu_partial() and dummy function to help improve
get_partial_node(). It can help remove #ifdef of CONFIG_SLUB_CPU_PARTIAL
and improve filling cpu partial logic.
Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The check of !kmem_cache_has_cpu_partial(s) with
CONFIG_SLUB_CPU_PARTIAL enabled here is always false.
We have already checked kmem_cache_debug() earlier and if it was true,
then we either continued or broke from the loop so we can't reach this
code in that case and don't need to check kmem_cache_debug() as part of
kmem_cache_has_cpu_partial() again. Here we can remove it.
Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
When kmalloc_node() is called without __GFP_THISNODE and the target node
lacks sufficient memory, SLUB allocates a folio from a different node
other than the requested node, instead of taking a partial slab from it.
However, since the allocated folio does not belong to the requested
node, on the following allocation it is deactivated and added to the
partial slab list of the node it belongs to.
This behavior can result in excessive memory usage when the requested
node has insufficient memory, as SLUB will repeatedly allocate folios
from other nodes without reusing the previously allocated ones.
To prevent memory wastage, when a preferred node is indicated (not
NUMA_NO_NODE) but without a prior __GFP_THISNODE constraint:
1) try to get a partial slab from target node only by having
__GFP_THISNODE in pc.flags for get_partial()
2) if 1) failed, try to allocate a new slab from target node with
GFP_NOWAIT | __GFP_THISNODE opportunistically.
3) if 2) failed, retry with original gfpflags which will allow
get_partial() try partial lists of other nodes before potentially
allocating new page from other nodes
Without a preferred node, or with __GFP_THISNODE constraint, the
behavior remains unchanged.
On qemu with 4 numa nodes and each numa has 1G memory. Write a test ko
to call kmalloc_node(196, GFP_KERNEL, 3) for (4 * 1024 + 4) * 1024 times.
cat /proc/slabinfo shows:
kmalloc-256 4200530 13519712 256 32 2 : tunables..
after this patch,
cat /proc/slabinfo shows:
kmalloc-256 4200558 4200768 256 32 2 : tunables..
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The reads of slab->slabs are racy because it may be changed by
put_cpu_partial concurrently. In slabs_cpu_partial_show() and
show_slab_objects(), slab->slabs is only used for showing information.
Data-racy reads from shared variables that are used only for diagnostic
purposes should typically use data_race(), since it is normally not a
problem if the values are off by a little.
This patch is aimed at reducing the number of benign races reported by
KCSAN in order to focus future debugging effort on harmful races.
Signed-off-by: linke li <lilinke99@qq.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The SLAB implementation has been removed since 6.8, so there is no
other version of slabinfo_show_stats() and slabinfo_write(), then we
can remove these two dummy functions.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Merge a series from myself that replaces hardcoded SLAB_ cache flag
values with an enum, and explicitly deprecates the SLAB_MEM_SPREAD flag
that is a no-op sine SLAB removal.
This empty wrapped exists only for !CONFIG_MEMCG_KMEM and seems it was
never used. Probably a leftover from development of a series.
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
We already have the inc_slabs_node() after kmem_cache_node->node[node]
initialized in early_kmem_cache_node_alloc(), this special case of
inc_slabs_node() can be removed. Then we don't need to consider the
existence of kmem_cache_node in inc_slabs_node() anymore.
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The values of SLAB_ cache creation flags are defined by hand, which is
tedious and error-prone. Use an enum to assign the bit number and a
__SLAB_FLAG_BIT() macro to #define the final flags.
This renumbers the flag values, which is OK as they are only used
internally.
Also define a __SLAB_FLAG_UNUSED macro to assign value to flags disabled
by their respective config options in a unified and sparse-friendly way.
Reviewed-and-tested-by: Xiongwei Song <xiongwei.song@windriver.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The partial slabs on cpu partial list are not frozen after the commit
8cd3fa428b ("slub: Delay freezing of partial slabs") merged. So fix
the comment.
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
We don't use the object_size parameter in kmem_cache_flags(), so just
remove it.
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
After commit 16a1d96835 ("mm/slab: remove mm/slab.c and slab_def.h"),
parameter 'flags' is only passed as 0 in create_kmalloc_caches(), and
then it is only passed to new_kmalloc_cache().
So we can change parameter 'flags' to be a local variable with
initial value 0 in new_kmalloc_cache() and remove parameter 'flags'
in create_kmalloc_caches(). Also make new_kmalloc_cache() static
due to it is only used in mm/slab_common.c.
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The parameter "struct slab *slab" is unused in next_freelist_entry(),
so just remove it.
Acked-by: Christoph Lameter (Ampere) <cl@linux.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Since debug slab is processed by free_to_partial_list(), and only debug
slab which has SLAB_STORE_USER flag would care about the full list, we
can remove these unrelated full list manipulations from __slab_free().
Acked-by: Christoph Lameter (Ampere) <cl@linux.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The likely case is that we get a usable slab from the cpu partial list,
we can directly load freelist from it and return back, instead of going
the other way that need more work, like reenable interrupt and recheck.
But we need to remove the "VM_BUG_ON(!new.frozen)" in get_freelist()
for reusing it, since cpu partial slab is not frozen. It seems
acceptable since it's only for debug purpose.
And get_freelist() also assumes it can return NULL if the freelist is
empty, which is not possible for the cpu partial slab case, so we
add "VM_BUG_ON(!freelist)" after get_freelist() to make it explicit.
There is some small performance improvement too, which shows by:
perf bench sched messaging -g 5 -t -l 100000
mm-stable slub-optimize
Total time 7.473 7.209
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Since the SLAB allocator has been removed, so we can clean up the
sl[au]b_$params. With only one slab allocator left, it's better to use the
generic "slab" term instead of "slub" which is an implementation detail,
which is pointed out by Vlastimil Babka. For more information please see
[1]. Hence, we are going to use "slab_$param" as the primary prefix.
This patch is changing the following slab parameters
- slub_max_order
- slub_min_order
- slub_min_objects
- slub_debug
to
- slab_max_order
- slab_min_order
- slab_min_objects
- slab_debug
as the primary slab parameters for all references of them in docs and
comments. But this patch won't change variables and functions inside
slub as we will have wider slub/slab change.
Meanwhile, "slub_$params" can also be passed by command line, which is
to keep backward compatibility. Also mark all "slub_$params" as legacy.
Remove the separate descriptions for slub_[no]merge, append legacy tip
for them at the end of descriptions of slab_[no]merge.
[1] https://lore.kernel.org/linux-mm/7512b350-4317-21a0-fab3-4101bc4d8f7a@suse.cz/
Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
are included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the
series
"maple_tree: add mt_free_one() and mt_attr() helpers"
"Some cleanups of maple tree"
- In the series "mm: use memmap_on_memory semantics for dax/kmem"
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few
fixes) in the patch series
"Add folio_zero_tail() and folio_fill_tail()"
"Make folio_start_writeback return void"
"Fix fault handler's handling of poisoned tail pages"
"Convert aops->error_remove_page to ->error_remove_folio"
"Finish two folio conversions"
"More swap folio conversions"
- Kefeng Wang has also contributed folio-related work in the series
"mm: cleanup and use more folio in page fault"
- Jim Cromie has improved the kmemleak reporting output in the
series "tweak kmemleak report format".
- In the series "stackdepot: allow evicting stack traces" Andrey
Konovalov to permits clients (in this case KASAN) to cause
eviction of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series "mm:
page_alloc: fixes for high atomic reserve caluculations".
- Dmitry Rokosov has added to the samples/ dorectory some sample
code for a userspace memcg event listener application. See the
series "samples: introduce cgroup events listeners".
- Some mapletree maintanance work from Liam Howlett in the series
"maple_tree: iterator state changes".
- Nhat Pham has improved zswap's approach to writeback in the
series "workload-specific and memory pressure-driven zswap
writeback".
- DAMON/DAMOS feature and maintenance work from SeongJae Park in
the series
"mm/damon: let users feed and tame/auto-tune DAMOS"
"selftests/damon: add Python-written DAMON functionality tests"
"mm/damon: misc updates for 6.8"
- Yosry Ahmed has improved memcg's stats flushing in the series
"mm: memcg: subtree stats flushing and thresholds".
- In the series "Multi-size THP for anonymous memory" Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series "More buffer_head
cleanups".
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
"userfaultfd move option". UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a "KSM Advisor", in the series
"mm/ksm: Add ksm advisor". This is a governor which tunes KSM's
scanning aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory
use in the series "mm/zswap: dstmem reuse optimizations and
cleanups".
- Matthew Wilcox has performed some maintenance work on the
writeback code, both code and within filesystems. The series is
"Clean up the writeback paths".
- Andrey Konovalov has optimized KASAN's handling of alloc and
free stack traces for secondary-level allocators, in the series
"kasan: save mempool stack traces".
- Andrey also performed some KASAN maintenance work in the series
"kasan: assorted clean-ups".
- David Hildenbrand has gone to town on the rmap code. Cleanups,
more pte batching, folio conversions and more. See the series
"mm/rmap: interface overhaul".
- Kinsey Ho has contributed some maintenance work on the MGLRU
code in the series "mm/mglru: Kconfig cleanup".
- Matthew Wilcox has contributed lruvec page accounting code
cleanups in the series "Remove some lruvec page accounting
functions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
=0NHs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
commit 23baf831a3 ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For no apparent reason, we were open-coding alloc_pages_node() in this
function.
Link: https://lkml.kernel.org/r/20231228085748.1083901-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rename kasan_unpoison_object_data to kasan_unpoison_new_object and add a
documentation comment. Do the same for kasan_poison_object_data.
The new names and the comments should suggest the users that these hooks
are intended for internal use by the slab allocator.
The following patch will remove non-slab-internal uses of these hooks.
No functional changes.
[andreyknvl@google.com: update references to renamed functions in comments]
Link: https://lkml.kernel.org/r/20231221180637.105098-1-andrey.konovalov@linux.dev
Link: https://lkml.kernel.org/r/eab156ebbd635f9635ef67d1a4271f716994e628.1703024586.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When freeing an object that was allocated from KFENCE, we do that in the
slowpath __slab_free(), relying on the fact that KFENCE "slab" cannot be
the cpu slab, so the fastpath has to fallback to the slowpath.
This optimization doesn't help much though, because is_kfence_address()
is checked earlier anyway during the free hook processing or detached
freelist building. Thus we can simplify the code by making the
slab_free_hook() free the KFENCE object immediately, similarly to KASAN
quarantine.
In slab_free_hook() we can place kfence_free() above init processing, as
callers have been making sure to set init to false for KFENCE objects.
This simplifies slab_free(). This places it also above kasan_slab_free()
which is ok as that skips KFENCE objects anyway.
While at it also determine the init value in slab_free_freelist_hook()
outside of the loop.
This change will also make introducing per cpu array caches easier.
Tested-by: Marco Elver <elver@google.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
When both KASAN and slub_debug are enabled, when a free object is being
prepared in setup_object, slub_debug poisons the object data before KASAN
initializes its per-object metadata.
Right now, in setup_object, KASAN only initializes the alloc metadata,
which is always stored outside of the object. slub_debug is aware of this
and it skips poisoning and checking that memory area.
However, with the following patch in this series, KASAN also starts
initializing its free medata in setup_object. As this metadata might be
stored within the object, this initialization might overwrite the
slub_debug poisoning. This leads to slub_debug reports.
Thus, skip checking slub_debug poisoning of the object data area that
overlaps with the in-object KASAN free metadata.
Also make slub_debug poisoning of tail kmalloc redzones more precise when
KASAN is enabled: slub_debug can still poison and check the tail kmalloc
allocation area that comes after the KASAN free metadata.
Link: https://lkml.kernel.org/r/20231122231202.121277-1-andrey.konovalov@linux.dev
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Marco Elver <elver@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently we have a single function slab_free() handling both single
object freeing and bulk freeing with necessary hooks, the latter case
requiring slab_free_freelist_hook(). It should be however better to
distinguish the two use cases for the following reasons:
- code simpler to follow for the single object case
- better code generation - although inlining should eliminate the
slab_free_freelist_hook() for single object freeing in case no
debugging options are enabled, it seems it's not perfect. When e.g.
KASAN is enabled, we're imposing additional unnecessary overhead for
single object freeing.
- preparation to add percpu array caches in near future
Therefore, simplify slab_free() for the single object case by dropping
unnecessary parameters and calling only slab_free_hook() instead of
slab_free_freelist_hook(). Rename the bulk variant to slab_free_bulk()
and adjust callers accordingly.
While at it, flip (and document) slab_free_hook() return value so that
it returns true when the freeing can proceed, which matches the logic of
slab_free_freelist_hook() and is not confusingly the opposite.
Additionally we can simplify a bit by changing the tail parameter of
do_slab_free() when freeing a single object - instead of NULL we can set
it equal to head.
bloat-o-meter shows small code reduction with a .config that has KASAN
etc disabled:
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-118 (-118)
Function old new delta
kmem_cache_alloc_bulk 1203 1196 -7
kmem_cache_free 861 835 -26
__kmem_cache_free 741 704 -37
kmem_cache_free_bulk 911 863 -48
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Currently, when __kmem_cache_alloc_bulk() fails, it frees back the
objects that were allocated before the failure, using
kmem_cache_free_bulk(). Because kmem_cache_free_bulk() calls the free
hooks (KASAN etc.) and those expect objects that were processed by the
post alloc hooks, slab_post_alloc_hook() is called before
kmem_cache_free_bulk().
This is wasteful, although not a big concern in practice for the rare
error path. But in order to efficiently handle percpu array batch refill
and free in the near future, we will also need a variant of
kmem_cache_free_bulk() that avoids the free hooks. So introduce it now
and use it for the failure path.
In case of failure we however still need to perform memcg uncharge so
handle that in a new memcg_slab_alloc_error_hook(). Thanks to Chengming
Zhou for noticing the missing uncharge.
As a consequence, __kmem_cache_alloc_bulk() no longer needs the objcg
parameter, remove it.
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The SLUB sysfs stats enabled CONFIG_SLUB_STATS have two deficiencies
identified wrt bulk alloc/free operations:
- Bulk allocations from cpu freelist are not counted. Add the
ALLOC_FASTPATH counter there.
- Bulk fastpath freeing will count a list of multiple objects with a
single FREE_FASTPATH inc. Add a stat_add() variant to count them all.
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Inspection of kmem_cache_free() disassembly showed we could make the
fast path smaller by providing few more hints to the compiler, and
splitting the memcg_slab_free_hook() into an inline part that only
checks if there's work to do, and an out of line part doing the actual
uncharge.
bloat-o-meter results:
add/remove: 2/0 grow/shrink: 0/3 up/down: 286/-554 (-268)
Function old new delta
__memcg_slab_free_hook - 270 +270
__pfx___memcg_slab_free_hook - 16 +16
kfree 828 665 -163
kmem_cache_free 1116 948 -168
kmem_cache_free_bulk.part 1701 1478 -223
Checking kmem_cache_free() disassembly now shows the non-fastpath
cases are handled out of line, which should reduce instruction cache
usage.
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
With allocation fastpaths no longer divided between two .c files, we
have better inlining, however checking the disassembly of
kmem_cache_alloc() reveals we can do better to make the fastpaths
smaller and move the less common situations out of line or to separate
functions, to reduce instruction cache pressure.
- split memcg pre/post alloc hooks to inlined checks that use likely()
to assume there will be no objcg handling necessary, and non-inline
functions doing the actual handling
- add some more likely/unlikely() to pre/post alloc hooks to indicate
which scenarios should be out of line
- change gfp_allowed_mask handling in slab_post_alloc_hook() so the
code can be optimized away when kasan/kmsan/kmemleak is configured out
bloat-o-meter shows:
add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403)
Function old new delta
__memcg_slab_post_alloc_hook - 461 +461
kmem_cache_alloc_bulk 775 791 +16
__pfx_should_failslab.constprop - 16 +16
__pfx___memcg_slab_post_alloc_hook - 16 +16
should_failslab.constprop - 12 +12
__pfx_memcg_slab_post_alloc_hook 16 - -16
kmem_cache_alloc_lru 1295 1023 -272
kmem_cache_alloc_node 1118 817 -301
kmem_cache_alloc 1076 772 -304
kmalloc_node_trace 1149 838 -311
kmalloc_trace 1102 789 -313
__kmalloc_node_track_caller 1393 1080 -313
__kmalloc_node 1397 1082 -315
__kmalloc 1374 1059 -315
memcg_slab_post_alloc_hook 464 - -464
Note that gcc still decided to inline __memcg_pre_alloc_hook(), but the
code is out of line. Forcing noinline did not improve the results. As a
result the fastpaths are shorter and overal code size is reduced.
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
slab_alloc() is a thin wrapper around slab_alloc_node() with only one
caller. Replace with direct call of slab_alloc_node().
__kmem_cache_alloc_lru() itself is a thin wrapper with two callers,
so replace it with direct calls of slab_alloc_node() and
trace_kmem_cache_alloc().
This also makes sure _RET_IP_ has always the expected value and not
depending on inlining decisions.
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
This will eliminate a call between compilation units through
__kmem_cache_alloc_node() and allow better inlining of the allocation
fast path.
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
This should result in better code. Currently kfree() makes a function
call between compilation units to __kmem_cache_free() which does its own
virt_to_slab(), throwing away the struct slab pointer we already had in
kfree(). Now it can be reused. Additionally kfree() can now inline the
whole SLUB freeing fastpath.
Also move over free_large_kmalloc() as the only callsites are now in
slub.c, and make it static.
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The declaration and associated helpers are not used anywhere else
anymore.
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
We don't share those between SLAB and SLUB anymore, so most memcg
related functions can be moved to slub.c proper.
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
We don't share the hooks between two slab implementations anymore so
they can be moved away from the header. As part of the move, also move
should_failslab() from slab_common.c as the pre_alloc hook uses it.
This means slab.h can stop including fault-inject.h and kmemleak.h.
Fix up some files that were depending on the includes transitively.
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Nothing outside SLUB itself accesses the struct kmem_cache_cpu fields so
it does not need to be declared in slub_def.h. This allows also to move
enum stat_item.
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The SLAB implementation is going to be removed, and mm-api.rst currently
uses mm/slab.c to obtain kerneldocs for some API functions. Switch it to
mm/slub.c and move the relevant kerneldocs of exported functions from
one to the other. The rest of kerneldocs in slab.c is for static SLAB
implementation-specific functions that don't have counterparts in slub.c
and thus can be simply removed with the implementation.
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The current updated scheme (which this series implemented) is:
- node partial slabs: PG_Workingset && !frozen
- cpu partial slabs: !PG_Workingset && !frozen
- cpu slabs: !PG_Workingset && frozen
- full slabs: !PG_Workingset && !frozen
The most important change is that "frozen" bit is not set for the
cpu partial slabs anymore, __slab_free() will grab node list_lock
then check by !PG_Workingset that it's not on a node partial list.
And the "frozen" bit is still kept for the cpu slabs for performance,
since we don't need to grab node list_lock to check whether the
PG_Workingset is set or not if the "frozen" bit is set in __slab_free().
Update related documentations and comments in the source.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Christoph Lameter (Ampere) <cl@linux.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Since all partial slabs on the CPU partial list are not frozen anymore,
we don't unfreeze when moving cpu partial slabs to node partial list,
it's better to rename these functions.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Since the introduce of unfrozen slabs on cpu partial list, we don't
need to synchronize the slab frozen state under the node list_lock.
The caller of deactivate_slab() and the caller of __slab_free() won't
manipulate the slab list concurrently.
So we can get node list_lock in the last stage if we really need to
manipulate the slab list in this path.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>