Commit Graph

84 Commits

Author SHA1 Message Date
Siarhei Siamashka
a563c8ed5a SBC encoder scale factors calculation optimized with __builtin_clz
Count leading zeros operation is often implemented using a special
instruction for it on various architectures (at least this is true
for ARM and x86). Using the __builtin_clz gcc intrinsic makes it possible
to eliminate the innermost loop in scale factors calculation and improve
performance. Also scale factors calculation can be optimized even
more using SIMD instructions.
2009-01-29 08:25:50 +01:00
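As a rough illustration of the idea in a563c8ed5a, the sketch below compares a bit-scanning loop with a single count-leading-zeros operation. It is not the code from the commit: the function names and the exact scale factor definition (the smallest sf with max_abs < 2^(sf + 1)) are assumptions chosen to keep the example short, and a 32-bit unsigned int is assumed.

```c
#include <stdint.h>

/* Loop version: find the smallest sf with max_abs < 2^(sf + 1). */
static uint32_t scale_factor_loop(uint32_t max_abs)
{
	uint32_t sf = 0;

	while (sf < 31 && (UINT32_C(2) << sf) <= max_abs)
		sf++;
	return sf;
}

/* Same result without a loop: index of the highest set bit.
 * OR-ing with 1 avoids __builtin_clz(0), whose result is undefined.
 * Assumes a 32-bit unsigned int, as on typical gcc targets. */
static uint32_t scale_factor_clz(uint32_t max_abs)
{
	return 31 - (uint32_t)__builtin_clz(max_abs | 1);
}
```

Both functions return 2 for an input of 5, for example; the second one typically compiles down to a single instruction plus a subtraction on ARM and x86.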
Siarhei Siamashka
19af3c49e6 Performance optimizations for input data processing in SBC encoder
Channel deinterleaving, endian conversion and sample reordering
are done in one pass, avoiding the use of an intermediate buffer. Also
this code is implemented as a new "performance primitive", which
allows further platform specific optimizations (ARMv6 and ARM NEON
should gain quite a lot from assembly optimizations here).
2009-01-28 06:42:10 +01:00
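The single-pass approach described in 19af3c49e6 might look roughly like the following. This is a simplified sketch, not the commit's code: the helper name is hypothetical, the input is assumed to be big-endian 16-bit stereo, and the sample reordering that the real primitive also performs is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split big-endian interleaved stereo PCM into
 * native-endian per-channel arrays in a single pass. */
static void deinterleave_be16_stereo(const uint8_t *pcm, size_t frames,
					int16_t *left, int16_t *right)
{
	size_t i;

	for (i = 0; i < frames; i++) {
		const uint8_t *p = pcm + 4 * i;	/* 2 channels x 2 bytes */

		/* Assemble each sample from its bytes, so the host byte
		 * order does not matter and no temporary buffer is needed.
		 * The final cast to int16_t relies on the usual two's
		 * complement conversion (as gcc implements it). */
		left[i]  = (int16_t)(uint16_t)((p[0] << 8) | p[1]);
		right[i] = (int16_t)(uint16_t)((p[2] << 8) | p[3]);
	}
}
```

Each input byte is read once and each output sample is written once, which is the property that lets a platform-specific version replace the loop body with a few wide loads and shuffles.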
Siarhei Siamashka
c90eb5ba7e Use of -funroll-loops option to improve SBC encoder performance
Added the use of -funroll-loops gcc option for SBC. Also in
order to gain better effect, 'sbc_pack_frame' function
body moved to an inline function, which gets instantiated
for 4 different subbands/channels combinations. So that
'frame_subbands' and 'frame_channels' arguments become compile
time constants and can be better optimized by the compiler.
2009-01-23 21:15:38 +02:00
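The pattern described in c90eb5ba7e can be sketched as follows. The function names and the trivial summing body are placeholders, not the actual 'sbc_pack_frame' code; only the structure (one inline body, four call sites with literal arguments) reflects the commit message.

```c
#include <stdint.h>

/* Generic body; 'subbands' and 'channels' are ordinary arguments here. */
static inline uint32_t sum_frame_internal(const uint32_t *samples,
					int subbands, int channels)
{
	uint32_t sum = 0;
	int ch, sb;

	for (ch = 0; ch < channels; ch++)
		for (sb = 0; sb < subbands; sb++)
			sum += samples[ch * subbands + sb];
	return sum;
}

/* Each call site passes literal constants, so once the body is inlined
 * the loop bounds are known at compile time and can be fully unrolled. */
static uint32_t sum_frame(const uint32_t *samples, int subbands, int channels)
{
	if (subbands == 4) {
		if (channels == 1)
			return sum_frame_internal(samples, 4, 1);
		return sum_frame_internal(samples, 4, 2);
	}
	if (channels == 1)
		return sum_frame_internal(samples, 8, 1);
	return sum_frame_internal(samples, 8, 2);
}
```

Whether the compiler actually inlines and unrolls each instance depends on the optimization level and flags such as -funroll-loops; the source-level change only makes that possible.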
Siarhei Siamashka
d48c175f70 Coding style fixes 2009-01-18 16:09:00 +01:00
Siarhei Siamashka
82d00972c9 SBC arrays and constant tables aligned at 16 byte boundary for SIMD
Most SIMD instruction sets benefit from data being naturally aligned.
And even if it is not strictly required, performance is usually better
with the aligned data. ARM NEON and SSE2 have different instruction
variants for aligned/unaligned memory accesses.
2009-01-16 08:23:19 +01:00
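A minimal sketch of the kind of alignment 82d00972c9 refers to, using the gcc 'aligned' attribute. The macro name, the table and its contents are placeholders invented for this example, not the names or coefficients used in the SBC sources.

```c
#include <stdint.h>

/* Hypothetical macro wrapping the gcc 'aligned' attribute. */
#define ALIGNED16 __attribute__((aligned(16)))

/* Placeholder values, not real SBC coefficients. */
static const int16_t example_table[8] ALIGNED16 = {
	1, 2, 3, 4, 5, 6, 7, 8
};

/* Returns nonzero if 'p' sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}
```

With the attribute in place, is_aligned16(example_table) holds, so a 128-bit SIMD load from the start of the table can use the aligned instruction variant.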
Siarhei Siamashka
9e31e7dde6 SIMD-friendly variant of SBC encoder analysis filter
Added SIMD-friendly C implementation of SBC analysis filter (the
structure of code had to be changed a bit and constants in the
tables reordered). This code can be used as a reference for
developing platform specific SIMD optimizations. These functions
are put into a new file 'sbc_primitives.c', which is going to
contain all the basic stuff for SBC codec.
2009-01-16 00:28:32 +01:00
Siarhei Siamashka
8bbfdf782d Fix for big endian problems in SBC codec 2009-01-07 14:41:46 +01:00
Christian Hoene
2cada66773 Fixed correct handling of frame sizes in the encoder 2009-01-06 03:41:57 +01:00
Siarhei Siamashka
365f92ed45 Use of constant shift in SBC quantization code to make it faster
The result of 32x32->64 unsigned long multiplication is returned
in two registers (high and low 32-bit parts) for many 32-bit
architectures. For these architectures constant right shift by
32 bits is optimized out by the compiler to just taking the high
32-bit part. Also some data needed at the quantization stage is
precalculated beforehand to improve performance.
2009-01-06 03:39:27 +01:00
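The constant shift mentioned in 365f92ed45 can be illustrated with a small helper. The function name is hypothetical and this is not the quantization code itself, only the multiply-and-take-high-word pattern the message describes.

```c
#include <stdint.h>

/* 32x32->64 multiply followed by a constant shift by 32. On many 32-bit
 * targets the product already lives in two registers, so the shift
 * reduces to reading the register that holds the high word. */
static uint32_t mul_high32(uint32_t a, uint32_t b)
{
	return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

For instance, mul_high32(0x80000000, 2) yields 1, because the full product is exactly 2^32. A variable shift amount would instead require a real 64-bit shift sequence on such targets.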
Marcel Holtmann
fb333f1c88 Update copyright information 2009-01-01 19:33:20 +01:00
Siarhei Siamashka
8a206b8115 Added possibility to analyze 4 blocks at once in SBC encoder
This change is needed for SIMD optimizations which will follow
shortly. And even for non-SIMD capable platforms it may still
be useful to be able to merge several analyzing functions
together into one for better code scheduling or reusing loaded
constants. Also analysis filter functions are now called using
function pointers, which allows the default implementation to be
overridden at runtime (with high precision variant or MMX/SSE2/NEON
optimized code).
2009-01-01 09:52:37 +01:00
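The function pointer dispatch described in 8a206b8115 could be sketched like this. All names are hypothetical, and the summing routines merely stand in for real analysis filters; the point is only that the caller goes through a pointer that can be replaced at runtime.

```c
#include <stdint.h>

/* Hypothetical encoder state holding the analysis routine to call. */
struct analyzer {
	int32_t (*analyze)(const int16_t *in, int n);
};

/* Default implementation: plain sum, standing in for the real filter. */
static int32_t analyze_default(const int16_t *in, int n)
{
	int32_t acc = 0;
	int i;

	for (i = 0; i < n; i++)
		acc += in[i];
	return acc;
}

/* Stand-in for an optimized variant that handles 4 inputs per step. */
static int32_t analyze_unrolled(const int16_t *in, int n)
{
	int32_t acc = 0;
	int i;

	for (i = 0; i + 4 <= n; i += 4)
		acc += in[i] + in[i + 1] + in[i + 2] + in[i + 3];
	for (; i < n; i++)
		acc += in[i];
	return acc;
}

/* Install the default, then override it if the caller asks for the
 * alternative (in practice this could follow a CPU feature check). */
static void analyzer_init(struct analyzer *a, int use_unrolled)
{
	a->analyze = analyze_default;
	if (use_unrolled)
		a->analyze = analyze_unrolled;
}
```

Callers then invoke a->analyze(in, n) without knowing which variant was installed, which keeps the encoder loop unchanged when a new optimized implementation is added.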
Siarhei Siamashka
a6cb57cd01 New SBC analysis filter function to replace current broken code
This code is heavily based on the patch submitted by Jaska Uimonen.
Additional changes include preserving extra bits in the output of
filter function for better precision, support for both 16-bit and
32-bit fixed point implementation. Sign of some table values was
changed in order to preserve a regular code structure and have
multiply-accumulate operations only. No additional optimizations
were applied as this code is intended to be some kind of "reference"
implementation. Platform specific optimizations may require
different tricks and can be branched off from this implementation.
Some extra information about this code can be found in linux-bluetooth
mailing list archive for December 2008.
2008-12-29 12:52:20 +01:00
Siarhei Siamashka
635e9348a9 Fixed subbands selection for joint-stereo in SBC encoder 2008-12-29 11:31:59 +01:00
Marcel Holtmann
ce633965e1 Don't decode a frame if it is too small 2008-12-23 23:41:38 +01:00
Luiz Augusto von Dentz
50374ec694 Remove unnecessary code and fix a coding style. 2008-12-18 23:46:43 +01:00
Siarhei Siamashka
91a3fc0c35 Fix for overflow bug in SBC quantization code
The result of multiplication does not always fit into 32-bits. Using 64-bit
calculations helps to avoid overflows and sound quality problems in encoded
audio. Overflows are more likely to show up when using high values for
bitpool setting.
2008-12-18 23:46:05 +01:00
Siarhei Siamashka
037a47214c Bitstream writing optimization for SBC encoder
SBC encoder performance improvement up to 1.5x for ARM11
and almost twice faster for Intel Core2 in some cases.
2008-12-18 23:45:36 +01:00
Marcel Holtmann
40e63b5f54 Fix SBC gain mismatch 2008-10-31 23:55:13 +01:00
Marcel Holtmann
45c36dbd27 Avoid direct inclusion of malloc.h 2008-06-11 13:20:50 +00:00
Brad Midgley
9e446dba51 Cidorvan found another place where the spec had us saving a bunch of values
that were used immediately. Just compute and use instead of saving. In the decoder.
2008-03-08 05:21:26 +00:00
Brad Midgley
51a5483169 decoder optimization, now using nested multiply calls 2008-03-06 14:04:43 +00:00
Brad Midgley
7a68b05bea Cidorvan's 4-subband overflow fixes 2008-02-29 03:56:22 +00:00
Johan Hedberg
4170955ad1 Replace 64bits multiplies by 32bits to further optimize the code 2008-02-22 13:41:02 +00:00
Luiz Augusto von Dentz
ce342bf252 Introduce sbc new API. 2008-02-19 19:47:25 +00:00
Brad Midgley
ff51f4b0b2 fix for decoder noise at high bitpools 2008-02-15 18:06:32 +00:00
Marcel Holtmann
e823c15e43 Update copyright information 2008-02-02 03:37:05 +00:00
Brad Midgley
82540ead6b change MUL/MULA semantics 2008-01-30 20:37:49 +00:00
Brad Midgley
4b8bfb24c7 one more .X 32-bitism 2008-01-29 19:47:49 +00:00
Brad Midgley
c6d9a4373d revert 16-bit state.X change (bad on arm) 2008-01-29 18:56:13 +00:00
Brad Midgley
6d205fda03 revert arm conditional code 2008-01-28 18:00:51 +00:00
Brad Midgley
358888c523 change function signature so the arm optimization will work 2008-01-28 17:48:21 +00:00
Brad Midgley
38158dc5dd remove 16x16 mult optimization--gcc actually generates more costly code 2008-01-28 17:26:22 +00:00
Johan Hedberg
ba255beb79 Whitespace cleanup 2008-01-28 10:38:40 +00:00
Brad Midgley
d352bd0438 avoid an (unlikely) overflow 2008-01-27 04:29:08 +00:00
Brad Midgley
c2ce7c2d41 get 32-bit products whenever we're sure the multiplicands are both 16 bits 2008-01-27 03:35:53 +00:00
Brad Midgley
c9b5101059 shorten the encoder tables to 16 bits, take out mula32/mul32 for now for simplicity 2008-01-26 19:45:54 +00:00
Brad Midgley
dae52eacba pcm input array should be 16 not 32 bits
use 32-bit product when multiplying two values limited to 16 bits each
2008-01-26 05:24:50 +00:00
Brad Midgley
66fe637352 update copyrights 2008-01-19 15:56:52 +00:00
Brad Midgley
910d620357 coding style 2008-01-14 22:15:17 +00:00
Brad Midgley
8c72b28a0c comment typo 2008-01-14 20:03:42 +00:00
Brad Midgley
23f3e84ba4 fix initialization 2008-01-14 15:03:23 +00:00
Brad Midgley
d4d085c9a0 take out memmove in sbc analyze 2008-01-14 14:40:57 +00:00
Brad Midgley
0dda8d7230 tweak to the memmove for 4 subbands 2008-01-11 20:28:18 +00:00
Brad Midgley
40d383bb1e optimizations: use memmove instead of a loop, unroll short loop 2008-01-08 20:56:17 +00:00
Brad Midgley
1da4920f71 smooth out last shift-in-place wrinkle 2007-12-14 06:54:19 +00:00
Brad Midgley
9a14d423e7 push in-place-shift optimization up into scalefactors section 2007-12-14 06:43:06 +00:00
Brad Midgley
c1ce3b25c4 shift-in-place opt is back in, with a bugfix for the 4-subband case 2007-12-14 06:07:52 +00:00
Brad Midgley
5acc7043cf coding style on ?: 2007-12-14 00:46:09 +00:00
Brad Midgley
356a8b48cd be more strict about calculating from joint since client may set it to
a funky value other than 0/1
2007-12-14 00:15:33 +00:00
Brad Midgley
c5e43eefcc roll back the shift-in-place bitpack optimization while we figure out if
it tickles a bug or creates a bug for 4 subbands
2007-12-14 00:13:07 +00:00