author | Wu Jianhua <jianhua.wu@intel.com> | 2021-08-04 10:06:15 +0800 |
---|---|---|
committer | Paul B Mahol <onemda@gmail.com> | 2021-08-29 19:58:33 +0200 |
commit | 4041c1029b93162faacda9e3f3cd083d1fbca7ce (patch) | |
tree | 0c7693b822eb51ccbef214df0be4e91a56f734f5 /libavcodec/cljrdec.c | |
parent | 0c54ab20c254bf26c33a5cceb83862d3a59b3db7 (diff) | |
download | ffmpeg-4041c1029b93162faacda9e3f3cd083d1fbca7ce.tar.gz |
libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512()
This introduces ff_horiz_slice_avx2/512(), implemented with a new algorithm.
In a nutshell, the new algorithm does three things: it gathers data from
8/16 rows, blurs the data, and scatters the data back to the image buffer.
A customized 8x8/16x16 transpose is used to avoid the huge overhead of
gather and scatter instructions. The transpose relies on a newly added
temporary buffer called localbuf.
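
The three steps above can be illustrated with a scalar C sketch. This is not the code from the patch: the function names, the `ROWS` constant, and the `nu`/`bscale` parameters are hypothetical, the first-order recursive filter is only a stand-in for the blur, and plain loops stand in for the SIMD transposes. It shows only why transposing into a temporary buffer helps: once the samples of one column from 8 rows sit next to each other, the inner loop over rows has no dependencies and maps onto a single vector register.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 8 /* rows processed together; 8 mirrors AVX2, 16 would mirror AVX-512 */

/* Reference: blur one row in place with a first-order recursive filter
 * (forward pass, then backward pass). Assumed filter, for illustration. */
static void blur_row_ref(float *row, int width, float nu, float bscale)
{
    assert(width > 0);
    row[0] *= bscale;
    for (int x = 1; x < width; x++)
        row[x] += nu * row[x - 1];
    row[width - 1] *= bscale;
    for (int x = width - 1; x > 0; x--)
        row[x - 1] += nu * row[x];
}

/* Blur ROWS rows at once through a transposed temporary buffer.
 * localbuf must hold at least width * ROWS floats. */
static void blur_rows_transposed(float *img, int width, ptrdiff_t stride,
                                 float *localbuf, float nu, float bscale)
{
    assert(width > 0);

    /* 1. Gather: transpose so the ROWS samples of column x are contiguous. */
    for (int r = 0; r < ROWS; r++)
        for (int x = 0; x < width; x++)
            localbuf[x * ROWS + r] = img[r * stride + x];

    /* 2. Blur: the dependency runs along x only, so each inner loop over r
     *    is independent work that a SIMD version would do in one register. */
    for (int r = 0; r < ROWS; r++)
        localbuf[r] *= bscale;
    for (int x = 1; x < width; x++)
        for (int r = 0; r < ROWS; r++)
            localbuf[x * ROWS + r] += nu * localbuf[(x - 1) * ROWS + r];
    for (int r = 0; r < ROWS; r++)
        localbuf[(width - 1) * ROWS + r] *= bscale;
    for (int x = width - 1; x > 0; x--)
        for (int r = 0; r < ROWS; r++)
            localbuf[(x - 1) * ROWS + r] += nu * localbuf[x * ROWS + r];

    /* 3. Scatter: transpose the blurred data back into the image rows. */
    for (int r = 0; r < ROWS; r++)
        for (int x = 0; x < width; x++)
            img[r * stride + x] = localbuf[x * ROWS + r];
}
```

Under these assumptions, calling `blur_rows_transposed()` on a block of 8 rows should give the same result as calling `blur_row_ref()` on each row separately; only the memory access pattern differs.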
Performance data:
ff_horiz_slice_avx2(old): 109.89
ff_horiz_slice_avx2(new): 666.67
ff_horiz_slice_avx512: 1000
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
Diffstat (limited to 'libavcodec/cljrdec.c')
0 files changed, 0 insertions, 0 deletions