Last Updated: 2026-02-02

Hardware: AMD Ryzen AI Max+ 395, Radeon 8060S (40 CU, gfx1151), 128GB unified memory

Test Audio: 11s JFK “Ask not” speech (test-audio.wav)

Model: ggml-base.en.bin (74M parameters)

| Image | Backend | Total Time | Realtime | Status |
|---|---|---|---|---|
| whisper-ablation-sdma-1 | ROCm 7.2 HIP | 447 ms | 24.6x | ✅ PASS |
| whisper-hipblaslt-0 | ROCm 7.2 HIP | 459 ms | 24.0x | ✅ PASS |
| whisper-ablation-sdma-0 | ROCm 7.2 HIP | 461 ms | 23.9x | ✅ PASS |
| whisper-hip-rocm72-gfx1151 | ROCm 7.2 HIP | 487 ms | 22.6x | ✅ PASS |
| whisper-vulkan-amdvlk | Vulkan AMDVLK | 533 ms | 20.6x | ✅ PASS |
| whisper-vulkan-radv | Vulkan RADV | 547 ms | 20.1x | ✅ PASS |
| softab-toolbox:whisper-vulkan-radv | Vulkan RADV | 548 ms | 20.1x | ✅ PASS |
| whisper-hip-rocm62-gfx1151 | ROCm 6.2 HIP | - | - | ❌ HANG |
| whisper-hip-rocm644-gfx1151 | ROCm 6.4.4 HIP | - | - | ❌ HANG |
| whisper-therock-gfx1151 | TheRock | - | - | ❌ NO CLI |
| whisper-hipblaslt-1 | ROCm HIP | - | - | ❌ NO CLI |
Realtime = audio_duration / transcription_time (higher = faster)
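
As a worked check of the formula, the factor can be recomputed from the table. This is a minimal sketch: `realtime_factor` is a hypothetical helper name, and it assumes the clip is exactly 11,000 ms long, which is what the table values imply.

```bash
# Realtime factor = audio duration / transcription time (both in ms).
# realtime_factor is a hypothetical helper, not part of the benchmark scripts.
realtime_factor() {
  awk -v audio_ms="$1" -v run_ms="$2" \
    'BEGIN { printf "%.1fx\n", audio_ms / run_ms }'
}

realtime_factor 11000 447   # whisper-ablation-sdma-1 -> 24.6x
realtime_factor 11000 548   # toolbox Vulkan RADV     -> 20.1x
```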
| Setting | PyTorch | Whisper |
|---|---|---|
| SDMA=0 | Recommended | Slightly slower |
| SDMA=1 | May cause artifacts | Slightly faster |

For whisper, SDMA=1 appears safe and is about 3% faster (447 ms vs. 461 ms).
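Unlike PyTorch, where hipBLASLt=0 gives +20%, whisper shows minimal difference: whisper-hipblaslt-0 (459 ms) is only about 6% ahead of the plain whisper-hip-rocm72-gfx1151 image (487 ms). The whisper-hipblaslt-1 image has no CLI, so there is no direct 0-vs-1 comparison.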
Both ROCm 6.2 and 6.4.4 hang after loading the model. Use ROCm 7.2+.
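
Because the older ROCm images hang rather than fail, it can help to wrap each run in a time limit so a benchmark loop keeps moving. The sketch below is one way to do that, not the method used to produce the table. `classify_run` is a hypothetical helper and the 120-second limit is an arbitrary assumption. It relies on coreutils `timeout`, which exits with 124 when the limit is hit and 127 when the command cannot be found.

```bash
# classify_run LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD under a time limit and prints a status
# similar to the Status column above.
classify_run() {
  local limit="$1"; shift
  timeout "$limit" "$@" >/dev/null 2>&1
  local rc=$?
  case "$rc" in
    0)   echo "PASS" ;;
    124) echo "HANG" ;;          # timeout expired
    127) echo "NO CLI" ;;        # command not found
    *)   echo "FAIL($rc)" ;;
  esac
}

# Example (assumed limit of 120 s):
# classify_run 120 podman run --rm --device=/dev/kfd --device=/dev/dri \
#   -v /data/models:/models:ro softab:whisper-hip-rocm62-gfx1151 \
#   whisper-cli -m /models/ggml-base.en.bin -f /models/test-audio.wav
```

Note that `timeout` stops the `podman` client; a hung container may still need to be removed separately.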
| Status | Images |
|---|---|
| ✅ Working | `whisper-hip-rocm72-*`, `whisper-ablation-*`, `whisper-hipblaslt-0`, `whisper-vulkan-*` |
| ❌ Hang | `whisper-hip-rocm62-*`, `whisper-hip-rocm644-*` |
| ❌ Missing CLI | `whisper-therock-*`, `whisper-hipblaslt-1` |
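
For the "Missing CLI" images, one possible probe is to look for the binary under more than one name. The sketch below assumes those images ship the tool under the older whisper.cpp name `main` rather than `whisper-cli`; that is an assumption, not something verified for each image. `first_available` is a hypothetical helper.

```bash
# first_available NAME...
# Hypothetical helper: prints the first name that resolves on PATH,
# or returns non-zero if none do.
first_available() {
  local name
  for name in "$@"; do
    if command -v "$name" >/dev/null 2>&1; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# Inside a container, the same probe can be run with a plain shell:
# podman run --rm softab:whisper-hipblaslt-1 \
#   sh -c 'command -v whisper-cli || command -v main'
```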
For production (stability):

```
softab:whisper-vulkan-radv          # Vulkan, most stable
softab-toolbox:whisper-vulkan-radv  # Toolbox variant
```

For performance (fastest):

```
softab:whisper-ablation-sdma-1  # ROCm 7.2, SDMA=1, fastest
softab:whisper-hipblaslt-0      # ROCm 7.2, hipBLASLt disabled
```

```bash
# Quick test (11s audio)
./benchmarks/whisper-simple.sh softab:whisper-vulkan-radv /data/models/test-audio.wav

# With timing output
podman run --rm \
  --device=/dev/kfd --device=/dev/dri \
  --ipc=host \
  --security-opt seccomp=unconfined \
  --security-opt label=disable \
  -v /data/models:/models:ro \
  softab:whisper-hip-rocm72-gfx1151 \
  whisper-cli \
    -m /models/ggml-base.en.bin \
    -f /models/test-audio.wav \
    --no-timestamps
```

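
To pull the total time out of a run, the timing summary can be filtered. This sketch assumes the CLI prints a line of the form `whisper_print_timings:    total time =   447.00 ms` on stderr. That line format is an assumption about the whisper.cpp build in these images, and `total_ms` is a hypothetical helper.

```bash
# total_ms: reads whisper-cli output on stdin and prints the number that
# follows "total time =" (assumed log format, in milliseconds).
total_ms() {
  awk '/total time *=/ {
         for (i = 1; i < NF; i++)
           if ($i == "=") { print $(i + 1); exit }
       }'
}

# Usage: merge stderr into stdout so the timing lines reach the filter.
# podman run --rm ... whisper-cli -m /models/ggml-base.en.bin \
#   -f /models/test-audio.wav --no-timestamps 2>&1 | total_ms
```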
| Setting | PyTorch | Whisper | Notes |
|---|---|---|---|
| ROCBLAS_USE_HIPBLASLT | 0 (disable) | 0 or 1 | Critical for PyTorch, minimal for Whisper |
| HSA_ENABLE_SDMA | 0 (disable) | 1 (enable) | Opposite recommendations! |
| ROCm version | TheRock 7.11 | ROCm 7.2 | Both work, whisper less picky |
| Vulkan driver | RADV | AMDVLK slightly faster | Different optimal drivers |
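
The two environment settings in the table can be passed at container launch with `podman run -e`. The sketch below only builds the flags. `env_flags_for` is a hypothetical helper, it picks `ROCBLAS_USE_HIPBLASLT=0` for whisper even though the table lists either value, and it assumes the image does not already override these variables.

```bash
# env_flags_for WORKLOAD
# Hypothetical helper: prints podman -e flags matching the table above.
env_flags_for() {
  case "$1" in
    whisper) echo "-e HSA_ENABLE_SDMA=1 -e ROCBLAS_USE_HIPBLASLT=0" ;;
    pytorch) echo "-e HSA_ENABLE_SDMA=0 -e ROCBLAS_USE_HIPBLASLT=0" ;;
    *)       echo "unknown workload: $1" >&2; return 1 ;;
  esac
}

# Usage (flags are intentionally word-split, so leave them unquoted):
# podman run --rm $(env_flags_for whisper) \
#   --device=/dev/kfd --device=/dev/dri \
#   -v /data/models:/models:ro \
#   softab:whisper-hip-rocm72-gfx1151 \
#   whisper-cli -m /models/ggml-base.en.bin -f /models/test-audio.wav
```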

`main` instead of `whisper-cli`

See also: