Commit message (Author, Age, Files, Lines)
...
* sbc: better compatibility with ARM thumb/thumb2 (Siarhei Siamashka, 2012-07-29, 2 files, -3/+3)

    ARM assembly optimizations fail to compile in thumb mode, but are fine for thumb2. Update ifdefs in the code to make use of ARM assembly only when it is safe and also make sure that no optimizations are missed when compiling for thumb2.

    The problem was reported by Paul Menzel:
    https://tango.0pointer.de/pipermail/pulseaudio-discuss/2011-February/009022.html
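The kind of guard described above could look roughly like the following. This is only a sketch: `EXAMPLE_ARM_ASM_OK` and `example_arm_asm_usable` are hypothetical names, and the actual ifdefs in the SBC sources may test additional conditions. It relies on GCC defining `__thumb__` for both Thumb variants and `__thumb2__` only for Thumb-2.

```c
/*
 * Hypothetical guard, not the actual SBC ifdef.
 * ARM inline assembly is considered safe when targeting ARM and either
 * not compiling in Thumb mode at all, or compiling for Thumb-2.
 */
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
#define EXAMPLE_ARM_ASM_OK 1
#else
#define EXAMPLE_ARM_ASM_OK 0
#endif

/* Lets callers (or tests) see which path was selected at compile time. */
int example_arm_asm_usable(void)
{
	return EXAMPLE_ARM_ASM_OK;
}
```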
* sbc: detect when bitpool has changed (Luiz Augusto von Dentz, 2012-07-29, 1 file, -1/+7)

    The A2DP spec allows bitpool changes midstream, which is why the SBC configuration has a range of bitpool values that the encoder can use and the decoder must support.

    Bitpool changes do not affect the state of the encoder/decoder, so they do not need to be reinitialized when this happens and the impact is fairly small. What does change is the frame length, so encoders may change the bitpool to use the link more efficiently.
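To illustrate why only the frame length depends on the bitpool, here is a small sketch of the frame length formula from the A2DP specification. The enum and function names are hypothetical and are not the libsbc API.

```c
#include <stddef.h>

/* Hypothetical names for this sketch only. */
enum example_mode {
	EX_MONO, EX_DUAL_CHANNEL, EX_STEREO, EX_JOINT_STEREO
};

/*
 * Frame length in bytes: 4 header bytes, 4 bits of scale factor per
 * subband and channel, then the audio bits rounded up to whole bytes.
 */
size_t example_sbc_frame_length(int subbands, int blocks,
				enum example_mode mode, int bitpool)
{
	int channels = (mode == EX_MONO) ? 1 : 2;
	size_t len = 4 + (size_t)(4 * subbands * channels) / 8;
	size_t bits;

	if (mode == EX_MONO || mode == EX_DUAL_CHANNEL)
		bits = (size_t)blocks * channels * bitpool;
	else	/* joint stereo adds one "join" bit per subband */
		bits = (size_t)(mode == EX_JOINT_STEREO ? subbands : 0)
			+ (size_t)blocks * bitpool;

	return len + (bits + 7) / 8;
}
```

With 8 subbands, 16 blocks and joint stereo, lowering the bitpool from 53 to 32 shrinks each frame from 119 to 77 bytes while the rest of the configuration stays the same.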
* sbc: Add iwmmxt optimization for sbc for pxa series cpu (Keith Mok, 2012-07-29, 3 files, -0/+350)

    Add iwmmxt optimization for sbc for pxa series cpu.

    Benchmarked on ARM PXA platform:

    === Before (4 bands) ====
    $ time ./sbcenc_orig -s 4 long.au > /dev/null
    real 0m 2.44s
    user 0m 2.39s
    sys 0m 0.05s

    === After (4 bands) ====
    $ time ./sbcenc -s 4 long.au > /dev/null
    real 0m 1.59s
    user 0m 1.49s
    sys 0m 0.10s

    === Before (8 bands) ====
    $ time ./sbcenc_orig -s 8 long.au > /dev/null
    real 0m 4.05s
    user 0m 3.98s
    sys 0m 0.07s

    === After (8 bands) ====
    $ time ./sbcenc -s 8 long.au > /dev/null
    real 0m 1.48s
    user 0m 1.41s
    sys 0m 0.06s

    === Before (a2dp usage) ====
    $ time ./sbcenc_orig -b53 -s8 -j long.au > /dev/null
    real 0m 4.51s
    user 0m 4.41s
    sys 0m 0.10s

    === After (a2dp usage) ====
    $ time ./sbcenc -b53 -s8 -j long.au > /dev/null
    real 0m 2.05s
    user 0m 1.99s
    sys 0m 0.06s
* sbc: added "cc" to the clobber list of mmx inline assembly (Siarhei Siamashka, 2012-07-29, 1 file, -3/+3)

    In the case of scale factors calculation optimizations, the inline assembly code has instructions which update the flags register, but "cc" was not mentioned in the clobber list. When optimizing code, gcc is theoretically allowed to do a comparison before the inline assembly block and a conditional branch after it, which would lead to a problem if the flags register gets clobbered. While this is apparently not happening in practice with the current versions of gcc, the clobber list needs to be corrected.

    As for the other inline assembly blocks: although a quick review suggests it is most likely unnecessary, "cc" is also added to their clobber lists, because it should have no impact on performance in practice. It's kind of cargo cult, but it relieves us of the need to track potential updates of the flags register in all these places.
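For readers unfamiliar with clobber lists, a minimal sketch of an inline assembly block that modifies the flags register and declares it is shown below. This is not the MMX code from the commit; `example_add_u32` is a hypothetical helper, and the assembly path is only taken on x86 with a GCC-compatible compiler, with a plain C fallback elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: adds two 32-bit values, wrapping on overflow. */
uint32_t example_add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	/*
	 * "addl" updates the flags register, so "cc" is listed as
	 * clobbered. The compiler then knows it cannot keep a pending
	 * comparison result alive across this block.
	 */
	__asm__ ("addl %1, %0"
		 : "+r" (a)
		 : "r" (b)
		 : "cc");
	return a;
#else
	/* Portable fallback with the same result. */
	return a + b;
#endif
}
```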
* sbc: ARMv6 optimized version of analysis filter for SBC encoder (Siarhei Siamashka, 2012-07-29, 3 files, -0/+355)

    The optimized filter gets enabled when the code is compiled with -mcpu=/-march options set to target the processors which support ARMv6 instructions. This code is also disabled when NEON is used (which is a much better alternative). For additional safety ARM EABI is required and thumb mode should not be used.

    Benchmarks from ARM11:

    == 8 subbands ==

    $ time ./sbcenc -b53 -s8 -j test.au > /dev/null
    real 0m 35.65s
    user 0m 34.17s
    sys 0m 1.28s

    $ time ./sbcenc.armv6 -b53 -s8 -j test.au > /dev/null
    real 0m 17.29s
    user 0m 15.47s
    sys 0m 0.67s

    == 4 subbands ==

    $ time ./sbcenc -b53 -s4 -j test.au > /dev/null
    real 0m 25.28s
    user 0m 23.76s
    sys 0m 1.32s

    $ time ./sbcenc.armv6 -b53 -s4 -j test.au > /dev/null
    real 0m 18.64s
    user 0m 15.78s
    sys 0m 2.22s
* sbc: faster 'sbc_calculate_bits' function (Siarhei Siamashka, 2012-07-29, 1 file, -15/+28)

    By using the SBC_ALWAYS_INLINE trick, the implementation of the 'sbc_calculate_bits' function is split into two branches, each having the 'subband' variable value known at compile time. It helps the compiler to generate more optimal code by saving at least one extra register, and also provides more obvious opportunities for loop unrolling.

    Benchmarked on ARM Cortex-A8:

    == Before: ==

    $ time ./sbcenc -b53 -s8 -j test.au > /dev/null
    real 0m3.989s
    user 0m3.602s
    sys 0m0.391s

    samples  %        image name      symbol name
    26057    32.6128  sbcenc          sbc_pack_frame
    20003    25.0357  sbcenc          sbc_analyze_4b_8s_neon
    14220    17.7977  sbcenc          sbc_calculate_bits
    8498     10.6361  no-vmlinux      /no-vmlinux
    5300      6.6335  sbcenc          sbc_calc_scalefactors_j_neon
    3235      4.0489  sbcenc          sbc_enc_process_input_8s_be_neon
    2172      2.7185  sbcenc          sbc_encode

    == After: ==

    $ time ./sbcenc -b53 -s8 -j test.au > /dev/null
    real 0m3.652s
    user 0m3.195s
    sys 0m0.445s

    samples  %        image name      symbol name
    26207    36.0095  sbcenc          sbc_pack_frame
    19820    27.2335  sbcenc          sbc_analyze_4b_8s_neon
    8629     11.8566  no-vmlinux      /no-vmlinux
    6988      9.6018  sbcenc          sbc_calculate_bits
    5094      6.9994  sbcenc          sbc_calc_scalefactors_j_neon
    3351      4.6044  sbcenc          sbc_enc_process_input_8s_be_neon
    2182      2.9982  sbcenc          sbc_encode
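The always-inline trick can be sketched as follows. This is not the real 'sbc_calculate_bits'; the macro and function names are hypothetical and the loop body is a stand-in. The point is that each call site passes a literal, so the compiler sees a fixed trip count in each inlined copy.

```c
/* Hypothetical stand-in for SBC_ALWAYS_INLINE. */
#if defined(__GNUC__)
#define EXAMPLE_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define EXAMPLE_ALWAYS_INLINE inline
#endif

/* Generic body; 'subbands' becomes a constant after inlining. */
static EXAMPLE_ALWAYS_INLINE int example_sum_internal(const int *v,
						      int subbands)
{
	int sum = 0;

	for (int sb = 0; sb < subbands; sb++)
		sum += v[sb];
	return sum;
}

/* Public entry point: one branch per supported subband count. */
int example_sum(const int *v, int subbands)
{
	if (subbands == 4)
		return example_sum_internal(v, 4);
	else
		return example_sum_internal(v, 8);
}
```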
* sbc: slightly faster 'sbc_calc_scalefactors_neon' (Siarhei Siamashka, 2012-07-29, 1 file, -15/+10)

    The previous variant was basically derived from the C and MMX implementations. The new variant makes use of the 'vmax' instruction, which is available in NEON and can do this job faster. The same method for calculating scale factors is also used in 'sbc_calc_scalefactors_j_neon'.

    Benchmarked without joint stereo on ARM Cortex-A8:

    == Before: ==

    $ time ./sbcenc -b53 -s8 test.au > /dev/null
    real 0m3.851s
    user 0m3.375s
    sys 0m0.469s

    samples  %        image name      symbol name
    26260    34.2672  sbcenc          sbc_pack_frame
    20013    26.1154  sbcenc          sbc_analyze_4b_8s_neon
    13796    18.0027  sbcenc          sbc_calculate_bits
    8388     10.9457  no-vmlinux      /no-vmlinux
    3229      4.2136  sbcenc          sbc_enc_process_input_8s_be_neon
    2408      3.1422  sbcenc          sbc_calc_scalefactors_neon
    2093      2.7312  sbcenc          sbc_encode

    == After: ==

    $ time ./sbcenc -b53 -s8 test.au > /dev/null
    real 0m3.796s
    user 0m3.344s
    sys 0m0.438s

    samples  %        image name      symbol name
    26582    34.8726  sbcenc          sbc_pack_frame
    20032    26.2797  sbcenc          sbc_analyze_4b_8s_neon
    13808    18.1146  sbcenc          sbc_calculate_bits
    8374     10.9858  no-vmlinux      /no-vmlinux
    3187      4.1810  sbcenc          sbc_enc_process_input_8s_be_neon
    2027      2.6592  sbcenc          sbc_encode
    1766      2.3168  sbcenc          sbc_calc_scalefactors_neon
* sbc: ARM NEON optimizations for input permutation in SBC encoder (Siarhei Siamashka, 2012-07-29, 1 file, -0/+350)

    Using SIMD optimizations for 'sbc_enc_process_input_*' functions provides a modest, but consistent speedup in all SBC encoding cases.

    Benchmarked on ARM Cortex-A8:

    == Before: ==

    $ time ./sbcenc -b53 -s8 -j test.au > /dev/null
    real 0m4.389s
    user 0m3.969s
    sys 0m0.422s

    samples  %        image name      symbol name
    26234    29.9625  sbcenc          sbc_pack_frame
    20057    22.9076  sbcenc          sbc_analyze_4b_8s_neon
    14306    16.3393  sbcenc          sbc_calculate_bits
    9866     11.2682  sbcenc          sbc_enc_process_input_8s_be
    8506      9.7149  no-vmlinux      /no-vmlinux
    5219      5.9608  sbcenc          sbc_calc_scalefactors_j_neon
    2280      2.6040  sbcenc          sbc_encode
    661       0.7549  libc-2.10.1.so  memcpy

    == After: ==

    $ time ./sbcenc -b53 -s8 -j test.au > /dev/null
    real 0m3.989s
    user 0m3.602s
    sys 0m0.391s

    samples  %        image name      symbol name
    26057    32.6128  sbcenc          sbc_pack_frame
    20003    25.0357  sbcenc          sbc_analyze_4b_8s_neon
    14220    17.7977  sbcenc          sbc_calculate_bits
    8498     10.6361  no-vmlinux      /no-vmlinux
    5300      6.6335  sbcenc          sbc_calc_scalefactors_j_neon
    3235      4.0489  sbcenc          sbc_enc_process_input_8s_be_neon
    2172      2.7185  sbcenc          sbc_encode
* sbc: ARM NEON optimized joint stereo processing in SBC encoder (Siarhei Siamashka, 2012-07-29, 1 file, -0/+243)

    Improves SBC encoding performance when joint stereo is used, which is a typical A2DP configuration.

    Benchmarked on ARM Cortex-A8:

    == Before: ==

    $ time ./sbcenc -b53 -s8 -j test.au > /dev/null
    real 0m5.239s
    user 0m4.805s
    sys 0m0.430s

    samples  %        image name      symbol name
    26083    25.0856  sbcenc          sbc_pack_frame
    21548    20.7240  sbcenc          sbc_calc_scalefactors_j
    19910    19.1486  sbcenc          sbc_analyze_4b_8s_neon
    14377    13.8272  sbcenc          sbc_calculate_bits
    9990      9.6080  sbcenc          sbc_enc_process_input_8s_be
    8667      8.3356  no-vmlinux      /no-vmlinux
    2263      2.1765  sbcenc          sbc_encode
    696       0.6694  libc-2.10.1.so  memcpy

    == After: ==

    $ time ./sbcenc -b53 -s8 -j test.au > /dev/null
    real 0m4.389s
    user 0m3.969s
    sys 0m0.422s

    samples  %        image name      symbol name
    26234    29.9625  sbcenc          sbc_pack_frame
    20057    22.9076  sbcenc          sbc_analyze_4b_8s_neon
    14306    16.3393  sbcenc          sbc_calculate_bits
    9866     11.2682  sbcenc          sbc_enc_process_input_8s_be
    8506      9.7149  no-vmlinux      /no-vmlinux
    5219      5.9608  sbcenc          sbc_calc_scalefactors_j_neon
    2280      2.6040  sbcenc          sbc_encode
    661       0.7549  libc-2.10.1.so  memcpy
* sbc: Fix signedness of libsbc parameters (Johan Hedberg, 2012-07-29, 3 files, -6/+7)

    The written parameter of sbc_encode can be negative so it should be ssize_t instead of size_t.
* sbc: ARM NEON optimization for scale factors calculation (Siarhei Siamashka, 2012-07-29, 2 files, -1/+59)

    Improves SBC encoding performance when joint stereo is not used.

    Benchmarked on ARM Cortex-A8:

    == Before: ==

    $ time ./sbcenc -b53 -s8 test.au > /dev/null
    real 0m4.756s
    user 0m4.313s
    sys 0m0.438s

    samples  %        image name      symbol name
    2569     27.6296  sbcenc          sbc_pack_frame
    1934     20.8002  sbcenc          sbc_analyze_4b_8s_neon
    1386     14.9064  sbcenc          sbc_calculate_bits
    1221     13.1319  sbcenc          sbc_calc_scalefactors
    996      10.7120  sbcenc          sbc_enc_process_input_8s_be
    878       9.4429  no-vmlinux      /no-vmlinux
    204       2.1940  sbcenc          sbc_encode
    56        0.6023  libc-2.10.1.so  memcpy

    == After: ==

    $ time ./sbcenc -b53 -s8 test.au > /dev/null
    real 0m4.220s
    user 0m3.797s
    sys 0m0.422s

    samples  %        image name      symbol name
    2563     31.3249  sbcenc          sbc_pack_frame
    1892     23.1239  sbcenc          sbc_analyze_4b_8s_neon
    1368     16.7196  sbcenc          sbc_calculate_bits
    961      11.7453  sbcenc          sbc_enc_process_input_8s_be
    836      10.2176  no-vmlinux      /no-vmlinux
    262       3.2022  sbcenc          sbc_calc_scalefactors_neon
    199       2.4322  sbcenc          sbc_encode
    49        0.5989  libc-2.10.1.so  memcpy
* sbc: MMX optimization for scale factors calculation (Siarhei Siamashka, 2012-07-29, 1 file, -0/+54)

    Improves SBC encoding performance when joint stereo is not used.

    Benchmarked on Pentium-M:

    == Before: ==

    $ time ./sbcenc -b53 -s8 test.au > /dev/null
    real 0m1.439s
    user 0m1.336s
    sys 0m0.104s

    samples  %        image name      symbol name
    8642     33.7473  sbcenc          sbc_pack_frame
    5873     22.9342  sbcenc          sbc_analyze_4b_8s_mmx
    4435     17.3188  sbcenc          sbc_calc_scalefactors
    4285     16.7331  sbcenc          sbc_calculate_bits
    1942      7.5836  sbcenc          sbc_enc_process_input_8s_be
    322       1.2574  sbcenc          sbc_encode

    == After: ==

    $ time ./sbcenc -b53 -s8 test.au > /dev/null
    real 0m1.319s
    user 0m1.220s
    sys 0m0.084s

    samples  %        image name      symbol name
    8706     37.9959  sbcenc          sbc_pack_frame
    5740     25.0513  sbcenc          sbc_analyze_4b_8s_mmx
    4307     18.7972  sbcenc          sbc_calculate_bits
    1937      8.4537  sbcenc          sbc_enc_process_input_8s_be
    1801      7.8602  sbcenc          sbc_calc_scalefactors_mmx
    307       1.3399  sbcenc          sbc_encode
* sbc: new 'sbc_calc_scalefactors_j' function added to sbc primitives (Siarhei Siamashka, 2012-07-29, 3 files, -68/+103)

    The code for scale factors calculation with joint stereo support has been moved to a separate function. It can get platform-specific SIMD optimizations later for best possible performance.

    But even this change in C code improves performance because of the use of __builtin_clz() instead of loops, similar to what was done to sbc_calc_scalefactors earlier. Also, technically it does loop unrolling by processing two channels at once, which might be either good or bad for performance (if the register pressure is increased and more data is spilled to memory). But the benchmark from a 32-bit x86 system (pentium-m) shows that it got clearly faster:

    $ time ./sbcenc.old -b53 -s8 -j test.au > /dev/null
    real 0m1.868s
    user 0m1.808s
    sys 0m0.048s

    $ time ./sbcenc.new -b53 -s8 -j test.au > /dev/null
    real 0m1.742s
    user 0m1.668s
    sys 0m0.064s
* sbc: Fix redundant null check on calling free() (Gustavo F. Padovan, 2012-07-29, 2 files, -8/+4)

    Issues found by smatch static check: http://smatch.sourceforge.net/
* sbc: Update Nokia copyrights (Johan Hedberg, 2012-07-29, 14 files, -0/+14)
* sbc: Update copyright information (Marcel Holtmann, 2012-07-29, 15 files, -18/+18)
* sbc: added saturated clipping of decoder output to 16-bit (Siarhei Siamashka, 2012-07-29, 1 file, -5/+15)

    This prevents overflows and audible artefacts for audio files which originally had loudness maximized. Music from audio CDs is an example of such files, see http://en.wikipedia.org/wiki/Loudness_war
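Saturated clipping in general can be sketched as below. `example_clip16` is a hypothetical name and the decoder's actual implementation may differ; the idea is simply to clamp a wider intermediate value instead of letting it wrap around when it is narrowed to 16 bits.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate sample to the signed 16-bit range. */
int16_t example_clip16(int32_t s)
{
	if (s > INT16_MAX)
		return INT16_MAX;
	if (s < INT16_MIN)
		return INT16_MIN;
	return (int16_t)s;
}
```

Without the clamp, a value such as 40000 would wrap to a large negative number when truncated, which is heard as a click.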
* sbc: Do some coding style cleanups (Marcel Holtmann, 2012-07-29, 2 files, -24/+16)
* sbc: fix up sbc.h prototypes to use const/size_t wherever applicable (Lennart Poettering, 2012-07-29, 4 files, -22/+45)
* sbc: Remove unused variable. (Luiz Augusto von Dentz, 2012-07-29, 1 file, -2/+0)
* sbc: ensure 16-byte buffer position alignment for 4 subbands encoding (Siarhei Siamashka, 2012-07-29, 2 files, -4/+4)

    Buffer position in the X array was not always 16-byte aligned. 16-byte alignment is strictly required for powerpc altivec simd optimizations because altivec does not have support for unaligned vector loads at all.
* sbc: Fix misuse of 'frame.joint' when estimating the frame length. (Luiz Augusto von Dentz, 2012-07-29, 1 file, -17/+12)

    'frame.joint' is not the flag for joint stereo mode, it is a set of bits which show for which subbands channel joining was actually used.
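The distinction can be sketched as follows. The struct, the constant and the bit layout (bit `sb` for subband `sb`) are assumptions made for this illustration and are not copied from the libsbc sources.

```c
#include <stdint.h>

/* Hypothetical constant and layout, for illustration only. */
#define EXAMPLE_JOINT_STEREO 3

struct example_frame {
	int mode;	/* channel mode chosen for the stream */
	uint8_t joint;	/* per-subband bits: channel joining was used */
};

/* Correct test for "is this stream joint stereo?": look at the mode. */
int example_is_joint_stereo(const struct example_frame *f)
{
	return f->mode == EXAMPLE_JOINT_STEREO;
}

/* 'joint' only says whether joining was applied to a given subband. */
int example_subband_is_joint(const struct example_frame *f, int sb)
{
	return (f->joint >> sb) & 1;
}
```

A joint stereo frame in which no subband happened to be joined has `joint == 0`, so treating `joint` as the mode flag would misjudge the frame length for such frames.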
* sbc: Fix a couple of other places that should use size_t and ssize_t (Johan Hedberg, 2012-07-29, 3 files, -8/+11)
* sbc: don't dereference sbc pointer if NULL (Marc-André Lureau, 2012-07-29, 1 file, -2/+2)
* sbc: provide implementation info as a readable string (Marc-André Lureau, 2012-07-29, 6 files, -0/+19)

    This is mainly useful for logging and debugging.
* sbc: make check_mmx_support() a proper C function (Lennart Poettering, 2012-07-29, 1 file, -1/+1)

    Signed-off-by: Lennart Poettering <lennart@poettering.net>
* sbc: Fix SBC to compile cleanly with -Wsign-compare (Marcel Holtmann, 2012-07-29, 1 file, -4/+7)
* sbc: Fix for SBC encoding with block sizes other than 16 (Siarhei Siamashka, 2012-07-29, 1 file, -6/+13)

    Thanks to Christian Hoene for finding and reporting the problem. This regression was introduced in commit 19af3c49e61aa046375497108e05a3a0605da158
* sbc: Add -Wno-sign-compare for the library and fix the other warnings (Marcel Holtmann, 2012-07-29, 3 files, -4/+5)
* sbc: SBC encoder scale factors calculation optimized with __builtin_clz (Siarhei Siamashka, 2012-07-29, 3 files, -16/+50)

    The count leading zeros operation is often implemented with a dedicated instruction on various architectures (at least this is true for ARM and x86). Using the __builtin_clz gcc intrinsic makes it possible to eliminate the innermost loop in scale factors calculation and improve performance. Scale factors calculation can also be optimized even more using SIMD instructions.
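The general idea of replacing a bit-counting loop with count leading zeros can be sketched like this. The function names are hypothetical and this is not the scale factor code itself; it only shows that the loop and the intrinsic compute the same bit width. The sketch assumes `unsigned int` is 32 bits wide, and it guards against zero because `__builtin_clz(0)` is undefined.

```c
#include <stdint.h>

/* Loop variant: number of bits needed to represent x. */
int example_bits_loop(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Same result without the loop where the gcc intrinsic is available. */
int example_bits_clz(uint32_t x)
{
#if defined(__GNUC__)
	return x ? 32 - __builtin_clz(x) : 0;
#else
	return example_bits_loop(x);
#endif
}
```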
* sbc: Performance optimizations for input data processing in SBC encoder (Siarhei Siamashka, 2012-07-29, 5 files, -203/+258)

    Channel deinterleaving, endian conversion and sample reordering are done in one pass, avoiding the use of an intermediate buffer. Also this code is implemented as a new "performance primitive", which allows further platform specific optimizations (ARMv6 and ARM NEON should gain quite a lot from assembly optimizations here).
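A simplified sketch of doing deinterleaving and endian conversion in a single pass is given below. It is an illustration under assumptions, not the 'sbc_enc_process_input_*' code: the function name is hypothetical, the input is taken to be interleaved big-endian 16-bit stereo, and the sample reordering step mentioned above is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read interleaved big-endian 16-bit stereo PCM and write native-endian
 * samples straight into per-channel arrays, with no intermediate buffer.
 */
void example_deinterleave_be16(const uint8_t *pcm, size_t frames,
			       int16_t *left, int16_t *right)
{
	for (size_t i = 0; i < frames; i++) {
		const uint8_t *p = pcm + 4 * i;

		/* Assemble each sample from bytes: endian independent. */
		left[i] = (int16_t)(uint16_t)((p[0] << 8) | p[1]);
		right[i] = (int16_t)(uint16_t)((p[2] << 8) | p[3]);
	}
}
```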
* sbc: Use of -funroll-loops option to improve SBC encoder performance (Siarhei Siamashka, 2012-07-29, 2 files, -16/+39)

    Added the use of the -funroll-loops gcc option for SBC. Also, to get a better effect, the 'sbc_pack_frame' function body was moved to an inline function, which gets instantiated for 4 different subbands/channels combinations. That way the 'frame_subbands' and 'frame_channels' arguments become compile time constants and can be better optimized by the compiler.
* sbc: Audio quality improvement for 16-bit fixed point SBC encoder (Siarhei Siamashka, 2012-07-29, 2 files, -347/+270)

    Multiplying the first part of the analysis filter constant tables by some coefficients and dividing the second part by the same coefficients is a transformation which should produce the same results if rounding errors are not taken into account. These additional C0/C1/... coefficients can be varied in a certain range (the requirement is that we still do not get overflows).

    The 'magic' values for these coefficients are selected in such a way that the rounding errors are minimized (rounding errors are unavoidable when putting all the floating constants into 16-bit tables and losing some of the fractional part).

    Also non-SIMD variant of the analysis filter is dropped because keeping it would require applying a similar change to its tables, which is a bit tricky and just increases maintenance overhead.
* sbc: Fix sbcenc breakage when au file header size is larger than 24 bytes (Siarhei Siamashka, 2012-07-29, 1 file, -7/+17)
* sbc: Performance optimizations for sbcenc utility (Siarhei Siamashka, 2012-07-29, 1 file, -72/+50)

    Read and write buffer sizes increased, memmove overhead eliminated. The nonportable cast from 'unsigned char *' to 'struct au_header *' is now also resolved as part of the changes.
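One portable alternative to casting a byte buffer to a header struct is to assemble each field from bytes, as sketched below. The struct and function names are hypothetical and this is not necessarily how sbcenc does it; the only format fact relied on is that the basic au header consists of six big-endian 32-bit words.

```c
#include <stdint.h>

/* Hypothetical mirror of the six-word au header. */
struct example_au_header {
	uint32_t magic;		/* ".snd" */
	uint32_t hdr_size;
	uint32_t data_size;
	uint32_t encoding;
	uint32_t sample_rate;
	uint32_t channels;
};

/* Read a big-endian 32-bit word without alignment assumptions. */
static uint32_t example_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* 'buf' must hold at least 24 bytes. */
void example_parse_au(const unsigned char *buf,
		      struct example_au_header *h)
{
	h->magic = example_be32(buf);
	h->hdr_size = example_be32(buf + 4);
	h->data_size = example_be32(buf + 8);
	h->encoding = example_be32(buf + 12);
	h->sample_rate = example_be32(buf + 16);
	h->channels = example_be32(buf + 20);
}
```

This avoids both the alignment and the host byte order assumptions that a pointer cast would make.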
* sbc: Coding style fixes (Siarhei Siamashka, 2012-07-29, 1 file, -21/+32)
* sbc: Fix indentation to use only tabs (Johan Hedberg, 2012-07-29, 5 files, -219/+219)
* sbc: MMX and ARM NEON optimized versions of analysis filter for SBC encoder (Siarhei Siamashka, 2012-07-29, 5 files, -0/+764)
* sbc: SBC arrays and constant tables aligned at 16 byte boundary for SIMD (Siarhei Siamashka, 2012-07-29, 4 files, -15/+36)

    Most SIMD instruction sets benefit from data being naturally aligned. And even if it is not strictly required, performance is usually better with the aligned data. ARM NEON and SSE2 have different instruction variants for aligned/unaligned memory accesses.
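As an illustration of requesting and checking 16-byte alignment, a minimal sketch follows. The macro, table contents and function names are hypothetical; the attribute syntax is the gcc one, and on other compilers the macro expands to nothing, so the alignment is then not guaranteed.

```c
#include <stdint.h>

/* Hypothetical alignment macro using the gcc attribute syntax. */
#if defined(__GNUC__)
#define EXAMPLE_ALIGNED __attribute__((aligned(16)))
#else
#define EXAMPLE_ALIGNED
#endif

/* Placeholder values; a real table would hold filter constants. */
static const int16_t example_table[8] EXAMPLE_ALIGNED = {
	1, 2, 3, 4, 5, 6, 7, 8
};

const int16_t *example_get_table(void)
{
	return example_table;
}

/* Returns nonzero when p sits on a 16-byte boundary. */
int example_is_aligned16(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}
```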
* sbc: SIMD-friendly variant of SBC encoder analysis filter (Siarhei Siamashka, 2012-07-29, 5 files, -159/+701)

    Added SIMD-friendly C implementation of SBC analysis filter (the structure of code had to be changed a bit and constants in the tables reordered). This code can be used as a reference for developing platform specific SIMD optimizations. These functions are put into a new file 'sbc_primitives.c', which is going to contain all the basic stuff for SBC codec.
* sbc: Fix for big endian problems in SBC codec (Siarhei Siamashka, 2012-07-29, 1 file, -12/+0)
* sbc: Fixed correct handling of frame sizes in the encoder (Christian Hoene, 2012-07-29, 3 files, -5/+7)
* sbc: Use of constant shift in SBC quantization code to make it faster (Siarhei Siamashka, 2012-07-29, 1 file, -10/+13)

    The result of 32x32->64 unsigned long multiplication is returned in two registers (high and low 32-bit parts) for many 32-bit architectures. For these architectures constant right shift by 32 bits is optimized out by the compiler to just taking the high 32-bit part.

    Also some data needed at the quantization stage is precalculated beforehand to improve performance.
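The multiply-then-shift pattern described above looks like this in C. `example_mul_hi` is a hypothetical helper, not a function from the SBC sources; it only shows the expression shape that lets the compiler pick the high result register instead of emitting a shift.

```c
#include <stdint.h>

/* High 32 bits of the full 64-bit product of two 32-bit values. */
uint32_t example_mul_hi(uint32_t a, uint32_t b)
{
	return (uint32_t)(((uint64_t)a * b) >> 32);
}
```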
* sbc: Update copyright information (Marcel Holtmann, 2012-07-29, 9 files, -13/+13)
* sbc: Added possibility to analyze 4 blocks at once in SBC encoder (Siarhei Siamashka, 2012-07-29, 1 file, -49/+82)

    This change is needed for SIMD optimizations which will follow shortly. And even for non-SIMD capable platforms it may still be useful to be able to merge several analyzing functions together into one for better code scheduling or reuse of loaded constants.

    Also analysis filter functions are now called using function pointers, which allows the default implementation to be overridden at runtime (with high precision variant or MMX/SSE2/NEON optimized code).
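Runtime selection through a function pointer can be sketched as follows. All names here are hypothetical and the "filter" is just a sum; the sketch only shows the dispatch pattern, where a default C routine is installed first and replaced when an optimized variant is available.

```c
#include <stdint.h>

/* Hypothetical encoder state holding the selected routine. */
struct example_encoder {
	int32_t (*analyze)(const int16_t *in, int n);
};

/* Default implementation in plain C. */
static int32_t example_analyze_c(const int16_t *in, int n)
{
	int32_t acc = 0;

	for (int i = 0; i < n; i++)
		acc += in[i];
	return acc;
}

/* Stand-in for an optimized variant; must give identical results. */
static int32_t example_analyze_fast(const int16_t *in, int n)
{
	int32_t acc0 = 0, acc1 = 0;
	int i;

	for (i = 0; i + 1 < n; i += 2) {
		acc0 += in[i];
		acc1 += in[i + 1];
	}
	if (i < n)
		acc0 += in[i];
	return acc0 + acc1;
}

/* Install the default, then override it if the fast path is usable. */
void example_init(struct example_encoder *e, int have_simd)
{
	e->analyze = example_analyze_c;
	if (have_simd)
		e->analyze = example_analyze_fast;
}
```

Callers always go through `e->analyze`, so the rest of the encoder does not need to know which variant was chosen.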
* sbc: New SBC analysis filter function to replace current broken code (Siarhei Siamashka, 2012-07-29, 3 files, -244/+323)

    This code is heavily based on the patch submitted by Jaska Uimonen. Additional changes include preserving extra bits in the output of the filter function for better precision, and support for both 16-bit and 32-bit fixed point implementations. The sign of some table values was changed in order to preserve a regular code structure and have multiply-accumulate operations only.

    No additional optimizations were applied as this code is intended to be some kind of "reference" implementation. Platform specific optimizations may require different tricks and can be branched off from this implementation.

    Some extra information about this code can be found in linux-bluetooth mailing list archive for December 2008.
* sbc: Fixed subbands selection for joint-stereo in SBC encoder (Siarhei Siamashka, 2012-07-29, 1 file, -4/+4)
* sbc: Add more options to control encoding methods (Marcel Holtmann, 2012-07-29, 1 file, -16/+51)
* sbc: Don't decode a frame if it is too small (Marcel Holtmann, 2012-07-29, 1 file, -0/+3)
* sbc: Remove unnecessary code and fix a coding style. (Luiz Augusto von Dentz, 2012-07-29, 1 file, -14/+11)