lavc/aacenc_utils: replace powf(x,y) by expf(logf(x) * y)

This is ~2x faster for non-integer y on Haswell+GCC, and should generally be faster because powf essentially does this under the hood anyway. An inline function was added to lavu/internal.h for this purpose. Note that there are some accuracy differences, which should generally be negligible. In particular, FATE still passes on this platform.

Results in a ~7% speedup in AAC encoding with -march=native, Haswell+GCC.

before:
ffmpeg -i sin.flac -acodec aac -y sin_new.aac  6.05s user 0.06s system 104% cpu 5.821 total

after:
ffmpeg -i sin.flac -acodec aac -y sin_new.aac  5.67s user 0.03s system 105% cpu 5.416 total

This is also faster than an alternative approach that pulls in powf, gets rid of the crufty NaN checks and other special cases, exploits knowledge about the intervals, etc. This of course does not exclude smarter approaches; it just suggests that significant work would be needed on this front, and that such work would be of lower utility than searching for hotspots elsewhere.

Reviewed-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanag@gmail.com>
(cherry picked from commit bccc81dfa08e6561df6ed37860e3a08f7d983825)
author: Ganesh Ajjanagadde <gajjanag@gmail.com> 2016-03-07 21:16:29 -0500
committer: Rostislav Pehlivanov <atomnuker@gmail.com> 2016-03-28 16:57:41 +0100
commit: f281cb4ea93dc4c27ce93870eafffbe490b25247 (patch)
tree: fef62f24a6b0f1e579fbae2eb6d4e8ffb56cdf0a
parent: b176ab0556914a734932e934a5e904dad091ad71 (diff)
2 files changed, 21 insertions, 1 deletions
diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
index 41a62961e1..07f733746b 100644
--- a/libavcodec/aacenc_utils.h
+++ b/libavcodec/aacenc_utils.h
@@ -28,6 +28,7 @@
 #ifndef AVCODEC_AACENC_UTILS_H
 #define AVCODEC_AACENC_UTILS_H
 
+#include "libavutil/internal.h"
 #include "aac.h"
 #include "aacenctab.h"
 #include "aactab.h"
@@ -122,7 +123,10 @@ static inline float find_form_factor(int group_len, int swb_size, float thresh,
             if (s >= ethresh) {
                 nzl += 1.0f;
             } else {
-                nzl += powf(s / ethresh, nzslope);
+                if (nzslope == 2.f)
+                    nzl += (s / ethresh) * (s / ethresh);
+                else
+                    nzl += ff_fast_powf(s / ethresh, nzslope);
             }
         }
         if (e2 > thresh) {
diff --git a/libavutil/internal.h b/libavutil/internal.h
index c4bcf37ab8..44f8c1ee47 100644
--- a/libavutil/internal.h
+++ b/libavutil/internal.h
@@ -314,6 +314,22 @@ static av_always_inline float ff_exp10f(float x)
 }
 
 /**
+ * Compute x^y for floating point x, y. Note: this function is faster than the
+ * libm variant due to mainly 2 reasons:
+ * 1. It does not handle any edge cases. In particular, this is only guaranteed
+ * to work correctly for x > 0.
+ * 2. It is not as accurate as a standard nearly "correctly rounded" libm variant.
+ * @param x base
+ * @param y exponent
+ * @return x^y
+ */
+static av_always_inline float ff_fast_powf(float x, float y)
+{
+    return expf(logf(x) * y);
+}
+
+
+/**
  * A wrapper for open() setting O_CLOEXEC.
  */
 av_warn_unused_result