linux/lib/percpu_counter.c

408 lines
11 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
// SPDX-License-Identifier: GPL-2.0
/*
* Fast batching percpu counters.
*/
#include <linux/percpu_counter.h>
#include <linux/mutex.h>
#include <linux/init.h>
#include <linux/cpu.h>
#include <linux/module.h>
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
#include <linux/debugobjects.h>
#ifdef CONFIG_HOTPLUG_CPU
static LIST_HEAD(percpu_counters);
static DEFINE_SPINLOCK(percpu_counters_lock);
#endif
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
#ifdef CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER
static const struct debug_obj_descr percpu_counter_debug_descr;
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
static bool percpu_counter_fixup_free(void *addr, enum debug_obj_state state)
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
{
struct percpu_counter *fbc = addr;
switch (state) {
case ODEBUG_STATE_ACTIVE:
percpu_counter_destroy(fbc);
debug_object_free(fbc, &percpu_counter_debug_descr);
return true;
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
default:
return false;
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
}
}
static const struct debug_obj_descr percpu_counter_debug_descr = {
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
.name = "percpu_counter",
.fixup_free = percpu_counter_fixup_free,
};
static inline void debug_percpu_counter_activate(struct percpu_counter *fbc)
{
debug_object_init(fbc, &percpu_counter_debug_descr);
debug_object_activate(fbc, &percpu_counter_debug_descr);
}
static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc)
{
debug_object_deactivate(fbc, &percpu_counter_debug_descr);
debug_object_free(fbc, &percpu_counter_debug_descr);
}
#else /* CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER */
static inline void debug_percpu_counter_activate(struct percpu_counter *fbc)
{ }
static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc)
{ }
#endif /* CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER */
void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
{
int cpu;
unsigned long flags;
raw_spin_lock_irqsave(&fbc->lock, flags);
for_each_possible_cpu(cpu) {
s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
*pcount = 0;
}
fbc->count = amount;
raw_spin_unlock_irqrestore(&fbc->lock, flags);
}
EXPORT_SYMBOL(percpu_counter_set);
/*
* Add to a counter while respecting batch size.
*
* There are 2 implementations, both dealing with the following problem:
*
* The decision slow path/fast path and the actual update must be atomic.
* Otherwise a call in process context could check the current values and
* decide that the fast path can be used. If now an interrupt occurs before
* the this_cpu_add(), and the interrupt updates this_cpu(*fbc->counters),
* then the this_cpu_add() that is executed after the interrupt has completed
* can produce values larger than "batch" or even overflows.
*/
#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
/*
* Safety against interrupts is achieved in 2 ways:
* 1. the fast path uses local cmpxchg (note: no lock prefix)
* 2. the slow path operates with interrupts disabled
*/
void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
{
s64 count;
unsigned long flags;
count = this_cpu_read(*fbc->counters);
do {
if (unlikely(abs(count + amount) >= batch)) {
raw_spin_lock_irqsave(&fbc->lock, flags);
/*
* Note: by now we might have migrated to another CPU
* or the value might have changed.
*/
count = __this_cpu_read(*fbc->counters);
fbc->count += count + amount;
__this_cpu_sub(*fbc->counters, count);
raw_spin_unlock_irqrestore(&fbc->lock, flags);
return;
}
} while (!this_cpu_try_cmpxchg(*fbc->counters, &count, count + amount));
}
#else
/*
* local_irq_save() is used to make the function irq safe:
* - The slow path would be ok as protected by an irq-safe spinlock.
* - this_cpu_add would be ok as it is irq-safe by definition.
*/
void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
{
s64 count;
unsigned long flags;
local_irq_save(flags);
percpucounter: Optimize __percpu_counter_add a bit through the use of this_cpu() options. The this_cpu_* options can be used to optimize __percpu_counter_add a bit. Avoids some address arithmetic and saves 12 bytes. Before: 00000000000001d3 <__percpu_counter_add>: 1d3: 55 push %rbp 1d4: 48 89 e5 mov %rsp,%rbp 1d7: 41 55 push %r13 1d9: 41 54 push %r12 1db: 53 push %rbx 1dc: 48 89 fb mov %rdi,%rbx 1df: 48 83 ec 08 sub $0x8,%rsp 1e3: 4c 8b 67 30 mov 0x30(%rdi),%r12 1e7: 65 4c 03 24 25 00 00 add %gs:0x0,%r12 1ee: 00 00 1f0: 4d 63 2c 24 movslq (%r12),%r13 1f4: 48 63 c2 movslq %edx,%rax 1f7: 49 01 f5 add %rsi,%r13 1fa: 49 39 c5 cmp %rax,%r13 1fd: 7d 0a jge 209 <__percpu_counter_add+0x36> 1ff: f7 da neg %edx 201: 48 63 d2 movslq %edx,%rdx 204: 49 39 d5 cmp %rdx,%r13 207: 7f 1e jg 227 <__percpu_counter_add+0x54> 209: 48 89 df mov %rbx,%rdi 20c: e8 00 00 00 00 callq 211 <__percpu_counter_add+0x3e> 211: 4c 01 6b 18 add %r13,0x18(%rbx) 215: 48 89 df mov %rbx,%rdi 218: 41 c7 04 24 00 00 00 movl $0x0,(%r12) 21f: 00 220: e8 00 00 00 00 callq 225 <__percpu_counter_add+0x52> 225: eb 04 jmp 22b <__percpu_counter_add+0x58> 227: 45 89 2c 24 mov %r13d,(%r12) 22b: 5b pop %rbx 22c: 5b pop %rbx 22d: 41 5c pop %r12 22f: 41 5d pop %r13 231: c9 leaveq 232: c3 retq After: 00000000000001d3 <__percpu_counter_add>: 1d3: 55 push %rbp 1d4: 48 63 ca movslq %edx,%rcx 1d7: 48 89 e5 mov %rsp,%rbp 1da: 41 54 push %r12 1dc: 53 push %rbx 1dd: 48 89 fb mov %rdi,%rbx 1e0: 48 8b 47 30 mov 0x30(%rdi),%rax 1e4: 65 44 8b 20 mov %gs:(%rax),%r12d 1e8: 4d 63 e4 movslq %r12d,%r12 1eb: 49 01 f4 add %rsi,%r12 1ee: 49 39 cc cmp %rcx,%r12 1f1: 7d 0a jge 1fd <__percpu_counter_add+0x2a> 1f3: f7 da neg %edx 1f5: 48 63 d2 movslq %edx,%rdx 1f8: 49 39 d4 cmp %rdx,%r12 1fb: 7f 21 jg 21e <__percpu_counter_add+0x4b> 1fd: 48 89 df mov %rbx,%rdi 200: e8 00 00 00 00 callq 205 <__percpu_counter_add+0x32> 205: 4c 01 63 18 add %r12,0x18(%rbx) 209: 48 8b 43 30 mov 0x30(%rbx),%rax 20d: 48 89 df mov %rbx,%rdi 210: 65 c7 00 00 00 00 00 movl $0x0,%gs:(%rax) 217: e8 00 00 00 00 callq 21c <__percpu_counter_add+0x49> 21c: eb 04 jmp 222 <__percpu_counter_add+0x4f> 21e: 65 44 89 20 mov %r12d,%gs:(%rax) 222: 5b pop %rbx 223: 41 5c pop %r12 225: c9 leaveq 226: c3 retq Reviewed-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-07 01:16:19 +08:00
count = __this_cpu_read(*fbc->counters) + amount;
if (abs(count) >= batch) {
raw_spin_lock(&fbc->lock);
fbc->count += count;
__this_cpu_sub(*fbc->counters, count - amount);
raw_spin_unlock(&fbc->lock);
} else {
this_cpu_add(*fbc->counters, amount);
}
local_irq_restore(flags);
}
#endif
EXPORT_SYMBOL(percpu_counter_add_batch);
/*
* For percpu_counter with a big batch, the devication of its count could
* be big, and there is requirement to reduce the deviation, like when the
* counter's batch could be runtime decreased to get a better accuracy,
* which can be achieved by running this sync function on each CPU.
*/
void percpu_counter_sync(struct percpu_counter *fbc)
{
unsigned long flags;
s64 count;
raw_spin_lock_irqsave(&fbc->lock, flags);
count = __this_cpu_read(*fbc->counters);
fbc->count += count;
__this_cpu_sub(*fbc->counters, count);
raw_spin_unlock_irqrestore(&fbc->lock, flags);
}
EXPORT_SYMBOL(percpu_counter_sync);
percpu_counter: add percpu_counter_sum_all interface The percpu_counter is used for scenarios where performance is more important than the accuracy. For percpu_counter users, who want more accurate information in their slowpath, percpu_counter_sum is provided which traverses all the online CPUs to accumulate the data. The reason it only needs to traverse online CPUs is because percpu_counter does implement CPU offline callback which syncs the local data of the offlined CPU. However there is a small race window between the online CPUs traversal of percpu_counter_sum and the CPU offline callback. The offline callback has to traverse all the percpu_counters on the system to flush the CPU local data which can be a lot. During that time, the CPU which is going offline has already been published as offline to all the readers. So, as the offline callback is running, percpu_counter_sum can be called for one counter which has some state on the CPU going offline. Since percpu_counter_sum only traverses online CPUs, it will skip that specific CPU and the offline callback might not have flushed the state for that specific percpu_counter on that offlined CPU. Normally this is not an issue because percpu_counter users can deal with some inaccuracy for small time window. However a new user i.e. mm_struct on the cleanup path wants to check the exact state of the percpu_counter through check_mm(). For such users, this patch introduces percpu_counter_sum_all() which traverses all possible CPUs and it is used in fork.c:check_mm() to avoid the potential race. This issue is exposed by the later patch "mm: convert mm's rss stats into percpu_counter". Link: https://lkml.kernel.org/r/20221109012011.881058-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09 09:20:11 +08:00
/*
* Add up all the per-cpu counts, return the result. This is a more accurate
pcpcntrs: fix dying cpu summation race In commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface") a race condition between a cpu dying and percpu_counter_sum() iterating online CPUs was identified. The solution was to iterate all possible CPUs for summation via percpu_counter_sum_all(). We recently had a percpu_counter_sum() call in XFS trip over this same race condition and it fired a debug assert because the filesystem was unmounting and the counter *should* be zero just before we destroy it. That was reported here: https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/ likely as a result of running generic/648 which exercises filesystems in the presence of CPU online/offline events. The solution to use percpu_counter_sum_all() is an awful one. We use percpu counters and percpu_counter_sum() for accurate and reliable threshold detection for space management, so a summation race condition during these operations can result in overcommit of available space and that may result in filesystem shutdowns. As percpu_counter_sum_all() iterates all possible CPUs rather than just those online or even those present, the mask can include CPUs that aren't even installed in the machine, or in the case of machines that can hot-plug CPU capable nodes, even have physical sockets present in the machine. Fundamentally, this race condition is caused by the CPU being offlined being removed from the cpu_online_mask before the notifier that cleans up per-cpu state is run. Hence percpu_counter_sum() will not sum the count for a cpu currently being taken offline, regardless of whether the notifier has run or not. This is the root cause of the bug. The percpu counter notifier iterates all the registered counters, locks the counter and moves the percpu count to the global sum. This is serialised against other operations that move the percpu counter to the global sum as well as percpu_counter_sum() operations that sum the percpu counts while holding the counter lock. Hence the notifier is safe to run concurrently with sum operations, and the only thing we actually need to care about is that percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when there are no CPUs dying, it has no addition overhead except for a cpumask_or() operation. This change makes percpu_counter_sum() always do the right thing in the presence of CPU hot unplug events and makes percpu_counter_sum_all() unnecessary. This, in turn, means that filesystems like XFS, ext4, and btrfs don't have to work out when they should use percpu_counter_sum() vs percpu_counter_sum_all() in their space accounting algorithms Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-16 08:31:02 +08:00
* but much slower version of percpu_counter_read_positive().
*
* We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
* from CPUs that are in the process of being taken offline. Dying cpus have
* been removed from the online mask, but may not have had the hotplug dead
* notifier called to fold the percpu count back into the global counter sum.
* By including dying CPUs in the iteration mask, we avoid this race condition
* so __percpu_counter_sum() just does the right thing when CPUs are being taken
* offline.
percpu_counter: add percpu_counter_sum_all interface The percpu_counter is used for scenarios where performance is more important than the accuracy. For percpu_counter users, who want more accurate information in their slowpath, percpu_counter_sum is provided which traverses all the online CPUs to accumulate the data. The reason it only needs to traverse online CPUs is because percpu_counter does implement CPU offline callback which syncs the local data of the offlined CPU. However there is a small race window between the online CPUs traversal of percpu_counter_sum and the CPU offline callback. The offline callback has to traverse all the percpu_counters on the system to flush the CPU local data which can be a lot. During that time, the CPU which is going offline has already been published as offline to all the readers. So, as the offline callback is running, percpu_counter_sum can be called for one counter which has some state on the CPU going offline. Since percpu_counter_sum only traverses online CPUs, it will skip that specific CPU and the offline callback might not have flushed the state for that specific percpu_counter on that offlined CPU. Normally this is not an issue because percpu_counter users can deal with some inaccuracy for small time window. However a new user i.e. mm_struct on the cleanup path wants to check the exact state of the percpu_counter through check_mm(). For such users, this patch introduces percpu_counter_sum_all() which traverses all possible CPUs and it is used in fork.c:check_mm() to avoid the potential race. This issue is exposed by the later patch "mm: convert mm's rss stats into percpu_counter". Link: https://lkml.kernel.org/r/20221109012011.881058-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09 09:20:11 +08:00
*/
s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
s64 ret;
int cpu;
unsigned long flags;
pcpcntrs: fix dying cpu summation race In commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface") a race condition between a cpu dying and percpu_counter_sum() iterating online CPUs was identified. The solution was to iterate all possible CPUs for summation via percpu_counter_sum_all(). We recently had a percpu_counter_sum() call in XFS trip over this same race condition and it fired a debug assert because the filesystem was unmounting and the counter *should* be zero just before we destroy it. That was reported here: https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/ likely as a result of running generic/648 which exercises filesystems in the presence of CPU online/offline events. The solution to use percpu_counter_sum_all() is an awful one. We use percpu counters and percpu_counter_sum() for accurate and reliable threshold detection for space management, so a summation race condition during these operations can result in overcommit of available space and that may result in filesystem shutdowns. As percpu_counter_sum_all() iterates all possible CPUs rather than just those online or even those present, the mask can include CPUs that aren't even installed in the machine, or in the case of machines that can hot-plug CPU capable nodes, even have physical sockets present in the machine. Fundamentally, this race condition is caused by the CPU being offlined being removed from the cpu_online_mask before the notifier that cleans up per-cpu state is run. Hence percpu_counter_sum() will not sum the count for a cpu currently being taken offline, regardless of whether the notifier has run or not. This is the root cause of the bug. The percpu counter notifier iterates all the registered counters, locks the counter and moves the percpu count to the global sum. This is serialised against other operations that move the percpu counter to the global sum as well as percpu_counter_sum() operations that sum the percpu counts while holding the counter lock. Hence the notifier is safe to run concurrently with sum operations, and the only thing we actually need to care about is that percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when there are no CPUs dying, it has no addition overhead except for a cpumask_or() operation. This change makes percpu_counter_sum() always do the right thing in the presence of CPU hot unplug events and makes percpu_counter_sum_all() unnecessary. This, in turn, means that filesystems like XFS, ext4, and btrfs don't have to work out when they should use percpu_counter_sum() vs percpu_counter_sum_all() in their space accounting algorithms Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-16 08:31:02 +08:00
raw_spin_lock_irqsave(&fbc->lock, flags);
ret = fbc->count;
for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
ret += *pcount;
}
raw_spin_unlock_irqrestore(&fbc->lock, flags);
return ret;
percpu_counter: add percpu_counter_sum_all interface The percpu_counter is used for scenarios where performance is more important than the accuracy. For percpu_counter users, who want more accurate information in their slowpath, percpu_counter_sum is provided which traverses all the online CPUs to accumulate the data. The reason it only needs to traverse online CPUs is because percpu_counter does implement CPU offline callback which syncs the local data of the offlined CPU. However there is a small race window between the online CPUs traversal of percpu_counter_sum and the CPU offline callback. The offline callback has to traverse all the percpu_counters on the system to flush the CPU local data which can be a lot. During that time, the CPU which is going offline has already been published as offline to all the readers. So, as the offline callback is running, percpu_counter_sum can be called for one counter which has some state on the CPU going offline. Since percpu_counter_sum only traverses online CPUs, it will skip that specific CPU and the offline callback might not have flushed the state for that specific percpu_counter on that offlined CPU. Normally this is not an issue because percpu_counter users can deal with some inaccuracy for small time window. However a new user i.e. mm_struct on the cleanup path wants to check the exact state of the percpu_counter through check_mm(). For such users, this patch introduces percpu_counter_sum_all() which traverses all possible CPUs and it is used in fork.c:check_mm() to avoid the potential race. This issue is exposed by the later patch "mm: convert mm's rss stats into percpu_counter". Link: https://lkml.kernel.org/r/20221109012011.881058-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09 09:20:11 +08:00
}
EXPORT_SYMBOL(__percpu_counter_sum);
int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount,
gfp_t gfp, u32 nr_counters,
struct lock_class_key *key)
{
unsigned long flags __maybe_unused;
size_t counter_size;
s32 __percpu *counters;
u32 i;
counter_size = ALIGN(sizeof(*counters), __alignof__(*counters));
counters = __alloc_percpu_gfp(nr_counters * counter_size,
__alignof__(*counters), gfp);
if (!counters) {
fbc[0].counters = NULL;
return -ENOMEM;
}
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
for (i = 0; i < nr_counters; i++) {
raw_spin_lock_init(&fbc[i].lock);
lockdep_set_class(&fbc[i].lock, key);
#ifdef CONFIG_HOTPLUG_CPU
INIT_LIST_HEAD(&fbc[i].list);
#endif
fbc[i].count = amount;
fbc[i].counters = (void __percpu *)counters + i * counter_size;
debug_percpu_counter_activate(&fbc[i]);
}
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
#ifdef CONFIG_HOTPLUG_CPU
spin_lock_irqsave(&percpu_counters_lock, flags);
for (i = 0; i < nr_counters; i++)
list_add(&fbc[i].list, &percpu_counters);
spin_unlock_irqrestore(&percpu_counters_lock, flags);
#endif
return 0;
}
EXPORT_SYMBOL(__percpu_counter_init_many);
void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters)
{
unsigned long flags __maybe_unused;
u32 i;
if (WARN_ON_ONCE(!fbc))
return;
if (!fbc[0].counters)
return;
for (i = 0; i < nr_counters; i++)
debug_percpu_counter_deactivate(&fbc[i]);
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
#ifdef CONFIG_HOTPLUG_CPU
spin_lock_irqsave(&percpu_counters_lock, flags);
for (i = 0; i < nr_counters; i++)
list_del(&fbc[i].list);
spin_unlock_irqrestore(&percpu_counters_lock, flags);
#endif
free_percpu(fbc[0].counters);
for (i = 0; i < nr_counters; i++)
fbc[i].counters = NULL;
}
EXPORT_SYMBOL(percpu_counter_destroy_many);
int percpu_counter_batch __read_mostly = 32;
EXPORT_SYMBOL(percpu_counter_batch);
static int compute_batch_value(unsigned int cpu)
{
int nr = num_online_cpus();
percpu_counter_batch = max(32, nr*2);
return 0;
}
static int percpu_counter_cpu_dead(unsigned int cpu)
{
#ifdef CONFIG_HOTPLUG_CPU
struct percpu_counter *fbc;
compute_batch_value(cpu);
spin_lock_irq(&percpu_counters_lock);
list_for_each_entry(fbc, &percpu_counters, list) {
s32 *pcount;
raw_spin_lock(&fbc->lock);
pcount = per_cpu_ptr(fbc->counters, cpu);
fbc->count += *pcount;
*pcount = 0;
raw_spin_unlock(&fbc->lock);
}
spin_unlock_irq(&percpu_counters_lock);
#endif
return 0;
}
/*
* Compare counter against given value.
* Return 1 if greater, 0 if equal and -1 if less
*/
int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
{
s64 count;
count = percpu_counter_read(fbc);
/* Check to see if rough count will be sufficient for comparison */
if (abs(count - rhs) > (batch * num_online_cpus())) {
if (count > rhs)
return 1;
else
return -1;
}
/* Need to use precise count */
count = percpu_counter_sum(fbc);
if (count > rhs)
return 1;
else if (count < rhs)
return -1;
else
return 0;
}
EXPORT_SYMBOL(__percpu_counter_compare);
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
/*
* Compare counter, and add amount if total is: less than or equal to limit if
* amount is positive, or greater than or equal to limit if amount is negative.
* Return true if amount is added, or false if total would be beyond the limit.
*
* Negative limit is allowed, but unusual.
* When negative amounts (subs) are given to percpu_counter_limited_add(),
* the limit would most naturally be 0 - but other limits are also allowed.
*
* Overflow beyond S64_MAX is not allowed for: counter, limit and amount
* are all assumed to be sane (far from S64_MIN and S64_MAX).
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
*/
bool __percpu_counter_limited_add(struct percpu_counter *fbc,
s64 limit, s64 amount, s32 batch)
{
s64 count;
s64 unknown;
unsigned long flags;
bool good = false;
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
if (amount == 0)
return true;
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
local_irq_save(flags);
unknown = batch * num_online_cpus();
count = __this_cpu_read(*fbc->counters);
/* Skip taking the lock when safe */
if (abs(count + amount) <= batch &&
((amount > 0 && fbc->count + unknown <= limit) ||
(amount < 0 && fbc->count - unknown >= limit))) {
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
this_cpu_add(*fbc->counters, amount);
local_irq_restore(flags);
return true;
}
raw_spin_lock(&fbc->lock);
count = fbc->count + amount;
/* Skip percpu_counter_sum() when safe */
if (amount > 0) {
if (count - unknown > limit)
goto out;
if (count + unknown <= limit)
good = true;
} else {
if (count + unknown < limit)
goto out;
if (count - unknown >= limit)
good = true;
}
if (!good) {
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
s32 *pcount;
int cpu;
for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
pcount = per_cpu_ptr(fbc->counters, cpu);
count += *pcount;
}
if (amount > 0) {
if (count > limit)
goto out;
} else {
if (count < limit)
goto out;
}
good = true;
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
}
count = __this_cpu_read(*fbc->counters);
fbc->count += count + amount;
__this_cpu_sub(*fbc->counters, count);
out:
shmem,percpu_counter: add _limited_add(fbc, limit, amount) Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add. I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed. Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked. Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare. Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30 11:42:45 +08:00
raw_spin_unlock(&fbc->lock);
local_irq_restore(flags);
return good;
}
static int __init percpu_counter_startup(void)
{
int ret;
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "lib/percpu_cnt:online",
compute_batch_value, NULL);
WARN_ON(ret < 0);
ret = cpuhp_setup_state_nocalls(CPUHP_PERCPU_CNT_DEAD,
"lib/percpu_cnt:dead", NULL,
percpu_counter_cpu_dead);
WARN_ON(ret < 0);
return 0;
}
module_init(percpu_counter_startup);