sw_scale: Add specializations for hscale 8 to 19

Add arm64 neon implementations for hscale 8 to 19 with filter sizes 4, 4X and 8. Both implementations are based on very similar ones dedicated to hscale 8 to 15. The major changes refer to saving the data - instead of writing the result as int16_t it is done with int32_t. These functions are heavily inspired on patches provided by J. Swinney and M. Storsjö for hscale8to15 which were slightly adapted for hscale8to19. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool shown below. hscale_8_to_19__fs_4_dstW_512_c: 5663.2 hscale_8_to_19__fs_4_dstW_512_neon: 1259.7 hscale_8_to_19__fs_8_dstW_512_c: 9306.0 hscale_8_to_19__fs_8_dstW_512_neon: 2020.2 hscale_8_to_19__fs_12_dstW_512_c: 12932.7 hscale_8_to_19__fs_12_dstW_512_neon: 2462.5 hscale_8_to_19__fs_16_dstW_512_c: 16844.2 hscale_8_to_19__fs_16_dstW_512_neon: 4671.2 hscale_8_to_19__fs_32_dstW_512_c: 32803.7 hscale_8_to_19__fs_32_dstW_512_neon: 5474.2 hscale_8_to_19__fs_40_dstW_512_c: 40948.0 hscale_8_to_19__fs_40_dstW_512_neon: 6669.7 Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>
author: Hubert Mazur <hum@semihalf.com> 2022-10-28 11:34:37 +0000
committer: Martin Storsjö <martin@martin.st> 2022-11-01 15:24:43 +0200
commit: 1e9cfa5bb0253784798029bda377823ab5d1a529 (patch)
tree: 4a2e3df1746a882819b45dfecaacffb2aa2493eb /libswscale/aarch64/swscale.c
parent: 16af424bf983409e00a84af1c98bbe7171ddd79f (diff)
download: ffmpeg-1e9cfa5bb0253784798029bda377823ab5d1a529.tar.gz
1 files changed, 9 insertions, 4 deletions
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index d1312c6658..171fb51329 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -29,7 +29,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
                                                 const int16_t *filter, \
                                                 const int32_t *filterPos, int filterSize)
 #define SCALE_FUNCS(filter_n, opt) \
-    SCALE_FUNC(filter_n,  8, 15, opt);
+    SCALE_FUNC(filter_n, 8, 15, opt); \
+    SCALE_FUNC(filter_n, 8, 19, opt);
 #define ALL_SCALE_FUNCS(opt) \
     SCALE_FUNCS(4, opt); \
     SCALE_FUNCS(X8, opt); \
@@ -48,9 +49,13 @@ void ff_yuv2plane1_8_neon(
         int offset);
 
 #define ASSIGN_SCALE_FUNC2(hscalefn, filtersize, opt) do {              \
-    if (c->srcBpc == 8 && c->dstBpc <= 14) {                            \
-      hscalefn =                                                        \
-        ff_hscale8to15_ ## filtersize ## _ ## opt;                      \
+    if (c->srcBpc == 8) {                                               \
+        if(c->dstBpc <= 14) {                                           \
+            hscalefn =                                                  \
+                ff_hscale8to15_ ## filtersize ## _ ## opt;              \
+        } else                                                          \
+            hscalefn =                                                  \
+                ff_hscale8to19_ ## filtersize ## _ ## opt;              \
     }                                                                   \
 } while (0)
author	Hubert Mazur <hum@semihalf.com>	2022-10-28 11:34:37 +0000
committer	Martin Storsjö <martin@martin.st>	2022-11-01 15:24:43 +0200
commit	1e9cfa5bb0253784798029bda377823ab5d1a529 (patch)
tree	4a2e3df1746a882819b45dfecaacffb2aa2493eb /libswscale/aarch64/swscale.c
parent	16af424bf983409e00a84af1c98bbe7171ddd79f (diff)
download	ffmpeg-1e9cfa5bb0253784798029bda377823ab5d1a529.tar.gz