The A2DP spec allows bitpool changes midstream, which is why the SBC
configuration has a range of bitpool values that the encoder can use
and the decoder must support.
Bitpool changes do not affect the state of the encoder/decoder, so
neither needs to be reinitialized when this happens and the impact is
fairly small. What does change is the frame length, which is why
encoders may adjust the bitpool to use the link more efficiently.
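As a hedged illustration (not code from the library), the frame length
formula from the A2DP specification shows that only the last term
depends on the bitpool, so a midstream change affects the frame size
and nothing else:

/* Sketch of the SBC frame length calculation per the A2DP spec;
 * dual channel mode is omitted for brevity. */
static int sbc_frame_length(int subbands, int blocks, int channels,
                            int joint, int bitpool)
{
        int len = 4 + (4 * subbands * channels) / 8;

        if (channels == 2 && joint)
                len += (subbands + blocks * bitpool + 7) / 8;
        else
                len += (blocks * bitpool + 7) / 8;

        return len;
}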
In the case of the scale factor calculation optimizations, the inline
assembly code has instructions that update the flags register, but
"cc" was not mentioned in the clobber list. When optimizing code,
gcc is theoretically allowed to do a comparison before the inline
assembly block and a conditional branch after it, which would lead
to a problem if the flags register gets clobbered. While this
apparently does not happen in practice with current versions of
gcc, the clobber list needs to be corrected.
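A minimal sketch of the problem (the instruction sequence is
illustrative, not taken from the actual filter code):

static int is_nonzero(int value)
{
        int result;

        /* "cmp" updates the flags register, so "cc" must be in the
         * clobber list; otherwise gcc may keep its own comparison
         * results live in the flags across this block. */
        asm ("cmp   %[v], #0\n\t"
             "movne %[r], #1\n\t"
             "moveq %[r], #0"
             : [r] "=r" (result)
             : [v] "r" (value)
             : "cc");
        return result;
}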
Regarding the other inline assembly blocks: while a quick review
suggests it is most likely unnecessary, "cc" is also added to their
clobber lists, because it should have no impact on performance in
practice. It is kind of cargo cult, but it relieves us from the need
to track potential updates of the flags register in all these places.
The optimized filter gets enabled when the code is compiled with
the -mcpu=/-march options set to target processors that support
ARMv6 instructions. This code is also disabled when NEON is used
(which is a much better alternative). For additional safety, ARM
EABI is required and Thumb mode must not be used.
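A sketch of the kind of preprocessor guard this implies; the exact
macro set below is an assumption based on the standard gcc ARM
predefines, not a copy of the actual condition:

/* Use the ARMv6 filter only for ARMv6+ targets with EABI, in ARM
 * (not Thumb) mode, and only when NEON is not available. */
#if !defined(__ARM_NEON__) && defined(__ARM_EABI__) && \
    !defined(__thumb__) && \
    (defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
     defined(__ARM_ARCH_7A__))
#define SBC_BUILD_WITH_ARMV6_ASM 1
#endif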
Benchmarks from ARM11:
== 8 subbands ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m 35.65s
user 0m 34.17s
sys 0m 1.28s
$ time ./sbcenc.armv6 -b53 -s8 -j test.au > /dev/null
real 0m 17.29s
user 0m 15.47s
sys 0m 0.67s
== 4 subbands ==
$ time ./sbcenc -b53 -s4 -j test.au > /dev/null
real 0m 25.28s
user 0m 23.76s
sys 0m 1.32s
$ time ./sbcenc.armv6 -b53 -s4 -j test.au > /dev/null
real 0m 18.64s
user 0m 15.78s
sys 0m 2.22s
By using the SBC_ALWAYS_INLINE trick, the implementation of the
'sbc_calculate_bits' function is split into two branches, each having
the value of the 'subbands' variable known at compile time. This helps
the compiler generate better code by saving at least one extra
register, and also provides more obvious opportunities for loop
unrolling.
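The pattern looks roughly like this (a sketch; the internal helper
name is illustrative and the bit allocation body is elided):

static SBC_ALWAYS_INLINE void sbc_calculate_bits_internal(
        const struct sbc_frame *frame, int (*bits)[8], int subbands)
{
        /* full bit allocation; 'subbands' is a compile-time
         * constant in each instantiation below */
}

static void sbc_calculate_bits(const struct sbc_frame *frame,
                               int (*bits)[8])
{
        if (frame->subbands == 4)
                sbc_calculate_bits_internal(frame, bits, 4);
        else
                sbc_calculate_bits_internal(frame, bits, 8);
}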
Benchmarked on ARM Cortex-A8:
== Before: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m3.989s
user 0m3.602s
sys 0m0.391s
samples % image name symbol name
26057 32.6128 sbcenc sbc_pack_frame
20003 25.0357 sbcenc sbc_analyze_4b_8s_neon
14220 17.7977 sbcenc sbc_calculate_bits
8498 10.6361 no-vmlinux /no-vmlinux
5300 6.6335 sbcenc sbc_calc_scalefactors_j_neon
3235 4.0489 sbcenc sbc_enc_process_input_8s_be_neon
2172 2.7185 sbcenc sbc_encode
== After: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m3.652s
user 0m3.195s
sys 0m0.445s
samples % image name symbol name
26207 36.0095 sbcenc sbc_pack_frame
19820 27.2335 sbcenc sbc_analyze_4b_8s_neon
8629 11.8566 no-vmlinux /no-vmlinux
6988 9.6018 sbcenc sbc_calculate_bits
5094 6.9994 sbcenc sbc_calc_scalefactors_j_neon
3351 4.6044 sbcenc sbc_enc_process_input_8s_be_neon
2182 2.9982 sbcenc sbc_encode
The previous variant was basically derived from the C and MMX
implementations. The new variant makes use of the 'vmax' instruction,
which is available in NEON and can do this job faster. The same method
of calculating scale factors is also used in
'sbc_calc_scalefactors_j_neon'.
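The idea, expressed with NEON intrinsics rather than the hand-written
assembly actually used (a hedged sketch; the buffer layout and names
are assumptions):

#include <arm_neon.h>
#include <stdint.h>

/* Reduce 'blocks' samples to one maximum magnitude per subband with
 * vmax, four subbands at a time; 'in' holds blocks * 4 samples laid
 * out as in[blk * 4 + sb].  The scale factors then come from the bit
 * position of each maximum. */
static void max_magnitudes(const int32_t *in, int blocks, int32_t out[4])
{
        int32x4_t m = vdupq_n_s32(0);
        int blk;

        for (blk = 0; blk < blocks; blk++)
                m = vmaxq_s32(m, vabsq_s32(vld1q_s32(in + blk * 4)));
        vst1q_s32(out, m);
}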
Benchmarked without joint stereo on ARM Cortex-A8:
== Before: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m3.851s
user 0m3.375s
sys 0m0.469s
samples % image name symbol name
26260 34.2672 sbcenc sbc_pack_frame
20013 26.1154 sbcenc sbc_analyze_4b_8s_neon
13796 18.0027 sbcenc sbc_calculate_bits
8388 10.9457 no-vmlinux /no-vmlinux
3229 4.2136 sbcenc sbc_enc_process_input_8s_be_neon
2408 3.1422 sbcenc sbc_calc_scalefactors_neon
2093 2.7312 sbcenc sbc_encode
== After: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m3.796s
user 0m3.344s
sys 0m0.438s
samples % image name symbol name
26582 34.8726 sbcenc sbc_pack_frame
20032 26.2797 sbcenc sbc_analyze_4b_8s_neon
13808 18.1146 sbcenc sbc_calculate_bits
8374 10.9858 no-vmlinux /no-vmlinux
3187 4.1810 sbcenc sbc_enc_process_input_8s_be_neon
2027 2.6592 sbcenc sbc_encode
1766 2.3168 sbcenc sbc_calc_scalefactors_neon
The code for scale factor calculation with joint stereo support has
been moved to a separate function. It can get platform-specific
SIMD optimizations later for the best possible performance.
But even this change in the C code improves performance because of
the use of __builtin_clz() instead of loops, similar to what was done
to sbc_calc_scalefactors earlier. Technically it also does loop
unrolling by processing two channels at once, which might be either
good or bad for performance (if register pressure is increased
and more data is spilled to memory). But a benchmark on a 32-bit
x86 system (Pentium M) shows that it got clearly faster:
$ time ./sbcenc.old -b53 -s8 -j test.au > /dev/null
real 0m1.868s
user 0m1.808s
sys 0m0.048s
$ time ./sbcenc.new -b53 -s8 -j test.au > /dev/null
real 0m1.742s
user 0m1.668s
sys 0m0.064s
This prevents overflows and audible artefacts for audio files that
originally had their loudness maximized. Music from audio CDs is an
example of such files, see http://en.wikipedia.org/wiki/Loudness_war
The buffer position in the X array was not always 16-byte aligned.
Strict 16-byte alignment is required for the PowerPC AltiVec SIMD
optimizations because AltiVec has no support for unaligned vector
loads at all.
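For illustration, one way to guarantee this is to align the array
itself and only move the position in whole 16-byte steps (a sketch;
the array name follows the description, the sizes are assumptions):

/* base address 16-byte aligned for AltiVec vector loads */
int16_t X[2][256] __attribute__((aligned(16)));

/* advance the position only in multiples of 8 int16_t samples
 * (16 bytes), so every vector access stays aligned */
position = (position - 8) & 255;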
The count leading zeros operation is often implemented with a
dedicated instruction on various architectures (at least this is true
for ARM and x86). Using the __builtin_clz() gcc intrinsic makes it
possible to eliminate the innermost loop in the scale factor
calculation and improve performance. The scale factor calculation can
also be optimized further using SIMD instructions.
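The resulting inner loop is roughly the following (a sketch modelled
on this description; SCALE_OUT_BITS and the layout are assumptions):

#include <stdint.h>
#include <stdlib.h>

#define SCALE_OUT_BITS 15 /* assumed output precision of the filter */

static int calc_scalefactor(const int32_t *samples, int blocks)
{
        uint32_t x = 1 << SCALE_OUT_BITS;
        int blk;

        for (blk = 0; blk < blocks; blk++) {
                uint32_t tmp = abs(samples[blk]);
                if (tmp != 0)
                        x |= tmp - 1; /* accumulate the high bits */
        }
        /* a single clz replaces the former innermost shift loop */
        return (31 - SCALE_OUT_BITS) - __builtin_clz(x);
}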
Channel deinterleaving, endianness conversion and sample reordering
are done in one pass, avoiding the use of an intermediate buffer. This
code is also implemented as a new "performance primitive", which
allows further platform-specific optimizations (ARMv6 and ARM NEON
should gain quite a lot from assembly optimizations here).
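In spirit, the combined pass is something like this (a rough sketch;
the reversed write order and the names are assumptions about the
input layout the analysis filter expects):

#include <stdint.h>

/* One pass over interleaved big-endian stereo PCM: swap bytes,
 * split the channels and store them in (assumed) reversed order,
 * with no intermediate buffer. */
static void input_s16_be(const uint8_t *pcm, int16_t x[2][8], int nsamples)
{
        int i;

        for (i = 0; i < nsamples; i++) {
                x[0][nsamples - 1 - i] = (int16_t) ((pcm[0] << 8) | pcm[1]);
                x[1][nsamples - 1 - i] = (int16_t) ((pcm[2] << 8) | pcm[3]);
                pcm += 4;
        }
}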
Added the use of the -funroll-loops gcc option for SBC. Also, to get
a better effect, the 'sbc_pack_frame' function body is moved to an
inline function, which gets instantiated for 4 different
subbands/channels combinations, so that the 'frame_subbands' and
'frame_channels' arguments become compile-time constants and can be
better optimized by the compiler.
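Inside the wrapper, the dispatch then reduces to picking one of four
specializations (a sketch of the pattern; the internal helper name is
illustrative):

/* Each branch instantiates the inline body with both parameters
 * known at compile time, so -funroll-loops can fully unroll the
 * packing loops. */
if (frame->subbands == 4) {
        if (frame->channels == 1)
                return sbc_pack_frame_internal(data, frame, len, 4, 1);
        else
                return sbc_pack_frame_internal(data, frame, len, 4, 2);
} else {
        if (frame->channels == 1)
                return sbc_pack_frame_internal(data, frame, len, 8, 1);
        else
                return sbc_pack_frame_internal(data, frame, len, 8, 2);
}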
Multiplying the first part of the analysis filter constant tables
by some coefficients and dividing the second part by the same
coefficients is a transformation that should produce identical
results if rounding errors are not taken into account. These
additional C0/C1/... coefficients can be varied within a certain
range (the requirement is that we still do not get overflows).
The 'magic' values for these coefficients are selected in such
a way that the rounding errors are minimized (rounding errors
are unavoidable when putting all the floating point constants into
16-bit tables and losing part of the fractional precision).
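In LaTeX notation the invariance is simply the following, with $A_i$
and $B_j$ standing for the first and second parts of the tables and
$c_k$ for one of the extra coefficients:

\sum_j \frac{B_j}{c_k} \sum_i (c_k A_i)\, x_{ij}
        = \sum_j B_j \sum_i A_i\, x_{ij}

The freedom in choosing each $c_k$ is what allows minimizing the
rounding error after quantizing both tables to 16 bits.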
The non-SIMD variant of the analysis filter is also dropped, because
keeping it would require applying a similar change to its tables,
which is a bit tricky and just increases the maintenance overhead.
Read and write buffer sizes are increased and the memmove overhead is
eliminated. The nonportable cast from 'unsigned char *' to
'struct au_header *' is also resolved as part of these changes.
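The usual portable replacement for such a cast is to copy the bytes
into a properly typed object (a sketch; the field list is indicative,
not the exact header layout):

#include <string.h>
#include <stdint.h>

struct au_header {
        uint32_t magic, hdr_size, data_size, encoding,
                 sample_rate, channels;
};

/* memcpy avoids the alignment and aliasing assumptions that the
 * direct pointer cast relied on. */
static void read_au_header(const unsigned char *buf, struct au_header *au)
{
        memcpy(au, buf, sizeof(*au));
}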
Most SIMD instruction sets benefit from data being naturally aligned.
Even when alignment is not strictly required, performance is usually
better with aligned data. ARM NEON and SSE2 have different instruction
variants for aligned and unaligned memory accesses.
Added a SIMD-friendly C implementation of the SBC analysis filter
(the structure of the code had to be changed a bit and the constants
in the tables reordered). This code can be used as a reference for
developing platform-specific SIMD optimizations. These functions
are put into a new file, 'sbc_primitives.c', which is going to
contain all the basic stuff for the SBC codec.
The result of a 32x32->64 unsigned multiplication is returned
in two registers (the high and low 32-bit parts) on many 32-bit
architectures. On these architectures a constant right shift by
32 bits is optimized by the compiler into just taking the high
32-bit part. Also, some data needed at the quantization stage is
precalculated beforehand to improve performance.
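A hedged sketch of the idiom (the helper name is illustrative):

#include <stdint.h>

/* On 32-bit targets the 64-bit product already lives in a register
 * pair, so the shift compiles down to just using the high half. */
static inline uint32_t mul_hi(uint32_t a, uint32_t b)
{
        return (uint32_t) (((uint64_t) a * b) >> 32);
}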
This change is needed for the SIMD optimizations which will follow
shortly. Even for platforms without SIMD support it may still be
useful to have the possibility of merging several analysis functions
into one for better code scheduling or reuse of loaded constants.
Also, the analysis filter functions are now called through function
pointers, which allows the default implementation to be overridden
at runtime (with a high precision variant or MMX/SSE2/NEON optimized
code).
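The override pattern might look roughly like this (a sketch; the
struct layout and symbol names are assumptions rather than the
library's exact API):

struct sbc_encoder_state {
        /* analysis filter entry point, replaceable at init time */
        void (*sbc_analyze_4b_8s)(struct sbc_encoder_state *state);
};

void sbc_analyze_4b_8s_c(struct sbc_encoder_state *state);
void sbc_analyze_4b_8s_neon(struct sbc_encoder_state *state);

static void sbc_init_primitives(struct sbc_encoder_state *state)
{
        state->sbc_analyze_4b_8s = sbc_analyze_4b_8s_c; /* default */
#ifdef __ARM_NEON__
        state->sbc_analyze_4b_8s = sbc_analyze_4b_8s_neon;
#endif
}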
This code is heavily based on the patch submitted by Jaska Uimonen.
Additional changes include preserving extra bits in the output of the
filter function for better precision, and support for both 16-bit and
32-bit fixed point implementations. The sign of some table values was
changed in order to preserve a regular code structure and use
multiply-accumulate operations only. No additional optimizations
were applied, as this code is intended to be a kind of "reference"
implementation. Platform-specific optimizations may require
different tricks and can be branched off from this implementation.
Some extra information about this code can be found in the
linux-bluetooth mailing list archive for December 2008.