FFmpeg devs boast of up to 94x performance boost after implementing handwritten AVX-512 assembly code

Gork@lemm.ee · 1 day ago

Avid Amoeba@lemmy.ca · edit-2 23 hours ago

This is the right way to optimize performance. Write everything in a decent higher-level language to achieve good maintainability. Then profile for hotspots, separate them into well-defined modules and optimize the shit out of them, even if it takes inline assembly. The ugly stays in its own box and you don’t spend time optimizing stuff that doesn’t need optimization.
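
For anyone who wants to try the "profile for hotspots" step, here is a minimal sketch using Python's standard-library cProfile and pstats. The functions `cheap_setup`, `hot_loop` and `profile_top` are made up for illustration and have nothing to do with FFmpeg's actual code; the point is only that the report tells you which function deserves the hand-tuning.

```python
import cProfile
import io
import pstats


def cheap_setup(n):
    # Hypothetical cold path: runs once and is not worth optimizing.
    return list(range(n))


def hot_loop(values):
    # Hypothetical hot path: deliberately does most of the work.
    total = 0
    for v in values:
        for k in range(50):
            total += (v * k) % 7
    return total


def main():
    values = cheap_setup(20_000)
    return hot_loop(values)


def profile_top(func, limit=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    result, report = profile_top(main)
    print(report)
```

In the report, `hot_loop` should account for nearly all of the cumulative time, so that is the one piece you would isolate and rewrite in something faster.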

andyburke@fedia.io · 22 hours ago

This person programs. ☝️ 🤝

chellomere@lemmy.world · edit-2 18 hours ago

This is great, but the context is that this is for specific inner loops, and it is compared to the C version of that specific inner loop. Typically, what was used before this on a computer with AVX-512 was the AVX2 version of the inner loop, and the speedup compared to that version appears to be up to 60%: https://x.com/FFmpeg/status/1852542388851601913 . Then, as a specific inner loop isn’t run all the time, the overall speedup is probably much less than 60%. This is still sizeable, but the actual speedup in practice with this implementation is far, far from 94x.
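
The "inner loop isn't run all the time" point is basically Amdahl's law. A quick sketch, assuming the loop takes some fraction of total runtime; the fractions below are made-up illustrations, and only the 1.6x (60%) figure comes from the linked post.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when only `fraction` of the
    runtime is accelerated by a factor of `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical shares of total runtime spent in the optimized loop.
    for fraction in (0.1, 0.3, 0.5, 1.0):
        gain = overall_speedup(fraction, 1.6)
        print(f"{fraction:.0%} of runtime in the loop: {gain:.3f}x overall")
```

Only when the loop is 100% of the runtime do you get the full 1.6x; any smaller share pulls the overall number toward 1x.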

lol@discuss.tchncs.de · 1 day ago

AMD’s Ryzen 9000-series CPUs feature a fully-enabled AVX-512 FPU so the owners of these processors can take advantage of the FFmpeg achievement.

I’ve got a Ryzen 7800X3D and can see a bunch of AVX-512 feature flags in /proc/cpuinfo: avx512f avx512dq avx512ifma avx512cd avx512bw avx512vl avx512_bf16 avx512vbmi avx512_vbmi2 avx512_vnni avx512_bitalg avx512_vpopcntdq.

Does that mean it would improve performance for me as well or is some more specific feature required?
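
For checking this on another machine, a small sketch that pulls the AVX-512 flags out of /proc/cpuinfo-style text (Linux only). The helper name `avx512_flags` is made up here, and this only shows what the CPU advertises; which flags a given FFmpeg code path actually needs isn't stated in this thread.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; every core normally reports the same set.
        if sep and key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            flags = avx512_flags(fh.read())
    except OSError:
        flags = []  # not Linux, or /proc is not readable
    print(" ".join(flags) if flags else "no AVX-512 flags reported")
```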

MorphiusFaydal@lemmy.world · 1 day ago

The 7000 series runs AVX-512 as two 256-bit data paths, while the 9000 series has a native 512-bit data path for AVX-512.

Decipher0771@lemmy.ca · 22 hours ago

Yes, but it’ll likely still be faster, just not as dramatically. Half of 4-94x is still 2-47x faster.

InverseParallax@lemmy.world · 23 hours ago

I mean why not, that worked out perfectly fine for bulldozer…

chellomere@lemmy.world · 21 hours ago

Yeah, 7000-series Ryzen benefits from the AVX-512 code paths in FFmpeg. I’ve benchmarked a 5900X vs a 7900X specifically for software H.265 decoding and there was a sizeable difference.

0x0@programming.dev · 24 hours ago

Unsung heroes.
