linux-next/arch/unicore32/include/asm/bitops.h

/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * linux/arch/unicore32/include/asm/bitops.h
 *
 * Code specific to PKUnity SoC and UniCore ISA
 *
 * Copyright (C) 2001-2010 GUAN Xue-tao
 */

#ifndef __UNICORE_BITOPS_H__
#define __UNICORE_BITOPS_H__

#define _ASM_GENERIC_BITOPS_FLS_H_
#define _ASM_GENERIC_BITOPS___FLS_H_
#define _ASM_GENERIC_BITOPS_FFS_H_
#define _ASM_GENERIC_BITOPS___FFS_H_
/*
 * On UNICORE, those functions can be implemented around
 * the cntlz instruction for much better code efficiency.
 */

static inline int fls(unsigned int x)
{
	int ret;

	asm("cntlz\t%0, %1" : "=r" (ret) : "r" (x) : "cc");
	ret = 32 - ret;

	return ret;
}

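/* 0-based index of the most significant set bit */
#define __fls(x) (fls(x) - 1)
/* __t & -__t isolates the lowest set bit; fls() of that is the 1-based ffs() result */
#define ffs(x) ({ unsigned long __t = (x); fls(__t & -__t); })
/* 0-based index of the least significant set bit */
#define __ffs(x) (ffs(x) - 1)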

#include <asm-generic/bitops.h>

/* following definitions: to avoid using codes in lib/find_*.c */
#define find_next_bit		find_next_bit
#define find_next_zero_bit	find_next_zero_bit
#define find_first_bit		find_first_bit
#define find_first_zero_bit	find_first_zero_bit

#include <asm-generic/bitops/find.h>

#endif /* __UNICORE_BITOPS_H__ */
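
The header derives `fls()`, `__fls()`, `ffs()` and `__ffs()` from a single count-leading-zeros instruction. The sketch below mirrors that derivation in portable C so the arithmetic can be checked off-target. It is not part of the kernel source: the `sketch_*` names are hypothetical, `__builtin_clz` (a GCC/Clang builtin) stands in for `cntlz`, and it assumes a 32-bit `unsigned int` and that `cntlz` yields 32 for a zero input, which is what `fls(0) == 0` would require.

```c
/*
 * Hypothetical stand-in for the UniCore cntlz instruction.
 * Assumption: cntlz returns 32 when the input is zero.
 * __builtin_clz(0) is undefined, so zero is handled explicitly.
 */
static inline int sketch_cntlz(unsigned int x)
{
	return x ? __builtin_clz(x) : 32;
}

/* 1-based position of the most significant set bit; 0 if x == 0 */
static inline int sketch_fls(unsigned int x)
{
	return 32 - sketch_cntlz(x);
}

/*
 * 1-based position of the least significant set bit; 0 if x == 0.
 * x & -x keeps only the lowest set bit (unsigned negation wraps,
 * so this is well defined), and fls() of a single bit is its position.
 */
static inline int sketch_ffs(unsigned int x)
{
	return sketch_fls(x & -x);
}

/* 0-based variants, meaningful only for non-zero input */
static inline int sketch___fls(unsigned int x)
{
	return sketch_fls(x) - 1;
}

static inline int sketch___ffs(unsigned int x)
{
	return sketch_ffs(x) - 1;
}
```

For example, under these assumptions `sketch_fls(0x10)` is 5 and `sketch_ffs(0x18)` is 4, because `0x18 & -0x18` reduces to `0x08`.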
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2019-06-04 16:11:33 +08:00			`/* SPDX-License-Identifier: GPL-2.0-only */`
unicore32 additional architecture files: low-level lib: misc This patch implements the rest low-level libraries. Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Arnd Bergmann <arnd@arndb.de> 2011-01-15 18:23:09 +08:00			`/*`
			`* linux/arch/unicore32/include/asm/bitops.h`
			`*`
			`* Code specific to PKUnity SoC and UniCore ISA`
			`*`
			`* Copyright (C) 2001-2010 GUAN Xue-tao`
			`*/`

			`#ifndef __UNICORE_BITOPS_H__`
			`#define __UNICORE_BITOPS_H__`

			`#define _ASM_GENERIC_BITOPS_FLS_H_`
			`#define _ASM_GENERIC_BITOPS___FLS_H_`
			`#define _ASM_GENERIC_BITOPS_FFS_H_`
			`#define _ASM_GENERIC_BITOPS___FFS_H_`
			`/*`
			`* On UNICORE, those functions can be implemented around`
			`* the cntlz instruction for much better code efficiency.`
			`*/`

fls: change parameter to unsigned int When testing in userspace, UBSAN pointed out that shifting into the sign bit is undefined behaviour. It doesn't really make sense to ask for the highest set bit of a negative value, so just turn the argument type into an unsigned int. Some architectures (eg ppc) already had it declared as an unsigned int, so I don't expect too many problems. Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2019-01-04 07:26:41 +08:00			`static inline int fls(unsigned int x)`
unicore32 additional architecture files: low-level lib: misc This patch implements the rest low-level libraries. Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Arnd Bergmann <arnd@arndb.de> 2011-01-15 18:23:09 +08:00			`{`
			`int ret;`

			`asm("cntlz\t%0, %1" : "=r" (ret) : "r" (x) : "cc");`
			`ret = 32 - ret;`

			`return ret;`
			`}`

			`#define __fls(x) (fls(x) - 1)`
			`#define ffs(x) ({ unsigned long __t = (x); fls(__t & -__t); })`
			`#define __ffs(x) (ffs(x) - 1)`

			`#include <asm-generic/bitops.h>`

unicore32: fix build error for find bitops Remove the __uc32_ prefix in find bitops functions. Move find_* macros behind asm-generic/bitops.h inclusion. see commit <19de85ef574c3a2182e3ccad9581805052f14946> bitops: add #ifndef for each of find bitops also see commit <63e424c84429903c92a0f1e9654c31ccaf6694d0> arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT} Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> 2011-06-13 11:44:49 +08:00			`/* following definitions: to avoid using codes in lib/find_*.c */`
			`#define find_next_bit find_next_bit`
			`#define find_next_zero_bit find_next_zero_bit`
			`#define find_first_bit find_first_bit`
			`#define find_first_zero_bit find_first_zero_bit`

lib: optimize cpumask_next_and() We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and(). It's essentially a joined iteration in search for a non-zero bit, which is currently implemented as a lookup join (find a nonzero bit on the lhs, lookup the rhs to see if it's set there). Implement a direct join (find a nonzero bit on the incrementally built join). Also add generic bitmap benchmarks in the new `test_find_bit` module for new function (see `find_next_and_bit` in [2] and [3] below). For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory usage. Note that on Arm, the new pure-C implementation still outperforms the old one that uses a mix of C and asm (`find_next_bit`) [3]. [1] Approximate benchmark code: ``` unsigned long src1p[nr_cpumask_longs] = {pattern1}; unsigned long src2p[nr_cpumask_longs] = {pattern2}; for (/a bunch of repetitions/) { for (int n = -1; n <= nr_cpu_ids; ++n) { asm volatile("" : "+rm"(src1p)); // prevent any optimization asm volatile("" : "+rm"(src2p)); unsigned long result = cpumask_next_and(n, src1p, src2p); asm volatile("" : "+rm"(result)); } } ``` Results: pattern1 pattern2 time_before/time_after 0x0000ffff 0x0000ffff 1.65 0x0000ffff 0x00005555 2.24 0x0000ffff 0x00001111 2.94 0x0000ffff 0x00000000 14.0 0x00005555 0x0000ffff 1.67 0x00005555 0x00005555 1.71 0x00005555 0x00001111 1.90 0x00005555 0x00000000 6.58 0x00001111 0x0000ffff 1.46 0x00001111 0x00005555 1.49 0x00001111 0x00001111 1.45 0x00001111 0x00000000 3.10 0x00000000 0x0000ffff 1.18 0x00000000 0x00005555 1.18 0x00000000 0x00001111 1.17 0x00000000 0x00000000 1.25 ----------------------------- geo.mean 2.06 [2] test_find_next_bit, X86 (skylake) [ 3913.477422] Start testing find_bit() with random-filled bitmap [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations [ 3913.478036] find_last_bit: 201638 cycles, 
16483 iterations [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations [ 3913.480216] Start testing find_next_and_bit() with random-filled bitmap [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations [ 3913.481075] Start testing find_bit() with sparse bitmap [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations [3] test_find_next_bit, arm (v7 odroid XU3). [ 267.206928] Start testing find_bit() with random-filled bitmap [ 267.214752] find_next_bit: 4474 cycles, 16419 iterations [ 267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations [ 267.229294] find_last_bit: 4209 cycles, 16419 iterations [ 267.279131] find_first_bit: 1032991 cycles, 16420 iterations [ 267.286265] Start testing find_next_and_bit() with random-filled bitmap [ 267.302386] find_next_and_bit: 2290 cycles, 8140 iterations [ 267.309422] Start testing find_bit() with sparse bitmap [ 267.316054] find_next_bit: 191 cycles, 66 iterations [ 267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations [ 267.329803] find_last_bit: 84 cycles, 66 iterations [ 267.336169] find_first_bit: 4118 cycles, 66 iterations [ 267.342627] Start testing find_next_and_bit() with sparse bitmap [ 267.356919] find_next_and_bit: 91 cycles, 1 iterations [courbet@google.com: v6] Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com [geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>] Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com Signed-off-by: Clement Courbet <courbet@google.com> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 
Yury Norov <ynorov@caviumnetworks.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2018-02-07 07:38:34 +08:00			`#include <asm-generic/bitops/find.h>`

unicore32 additional architecture files: low-level lib: misc This patch implements the rest low-level libraries. Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Arnd Bergmann <arnd@arndb.de> 2011-01-15 18:23:09 +08:00			`#endif /* __UNICORE_BITOPS_H__ */`