Last Updated: 2026-02-02
Hardware: AMD Ryzen AI Max+ 395, Radeon 8060S (40 CU, gfx1151), 128GB unified memory
Test Audio: 11s JFK speech (test-audio.wav)
Model: pyannote/speaker-diarization-3.1
| Image | PyTorch | GFX | Time | Realtime | Status |
|---|---|---|---|---|---|
| pyannote-rocm62-gfx1151 | 2.5.1+rocm6.2 | gfx1100 | 2.54s | 4.3x | ✅ PASS |
| pyannote-rocm62-working-gfx1151 | 2.5.1+rocm6.2 | gfx1100 | 2.59s | 4.2x | ✅ PASS |
| pyannote-rocm72-amd-gfx1151 | - | - | - | - | ❌ Import error |
| pyannote-rocm644-gfx1151 | - | - | - | - | ❌ Import error |
| pyannote-therock-gfx1151 | - | - | - | - | ❌ Import error |
Realtime = audio_duration / processing_time (higher = faster)
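The realtime factors above can be sanity-checked directly from the measured numbers (the 11s clip duration and the 2.54s time come from the table):

```python
# Realtime factor = audio duration / processing time (higher = faster).
audio_s = 11.0        # test-audio.wav duration
processing_s = 2.54   # measured on pyannote-rocm62-gfx1151

realtime = audio_s / processing_s
print(f"{realtime:.1f}x")  # → 4.3x
```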
The remaining images fail with `lightning.pytorch` import errors.

| Status | Images |
|---|---|
| ✅ Working | pyannote-rocm62-gfx1151, pyannote-rocm62-working-gfx1151 |
| ❌ Import Error | `pyannote-rocm72-*`, `pyannote-rocm644-*`, `pyannote-therock-*` |
| ❓ Untested | pyannote-rocm62-minimal-gfx1151 |
```
ImportError: cannot import name 'is_oom_error' from 'lightning.pytorch.utilities.memory'
```

This is a pyannote-audio / pytorch-lightning version mismatch: pyannote imports `is_oom_error`, which the installed pytorch-lightning no longer exports.
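A hedged workaround sketch for the mismatch (untested against the failing images): re-export a minimal `is_oom_error` before pyannote imports it. A stand-in module is used below so the pattern runs without lightning installed; against a real image you would import `lightning.pytorch.utilities.memory` itself.

```python
import sys
import types

# Stand-in for lightning.pytorch.utilities.memory so the pattern is
# runnable here; in the container, import the real module instead.
mem = types.ModuleType("lightning.pytorch.utilities.memory")
sys.modules["lightning.pytorch.utilities.memory"] = mem

if not hasattr(mem, "is_oom_error"):
    def is_oom_error(exc: BaseException) -> bool:
        # Minimal check; Lightning's original also matched CUDA/cuDNN
        # specific messages, "out of memory" covers the common case.
        return isinstance(exc, RuntimeError) and "out of memory" in str(exc)
    mem.is_oom_error = is_oom_error

print(mem.is_oom_error(RuntimeError("CUDA out of memory")))  # True
print(mem.is_oom_error(ValueError("bad input")))             # False
```

Whether this is sufficient depends on what else pyannote pulls from that module; pinning compatible pyannote-audio and pytorch-lightning versions is the cleaner fix.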
Pyannote models are gated: you must accept the license on Hugging Face and supply a token via the `HF_TOKEN` environment variable.

Older pyannote passes `use_auth_token`, while newer huggingface_hub expects `token`; a small monkeypatch bridges the two:
```python
import huggingface_hub

_orig = huggingface_hub.hf_hub_download

def _patched(*args, **kwargs):
    # Translate the deprecated kwarg to the one newer hubs expect.
    if 'use_auth_token' in kwargs:
        kwargs['token'] = kwargs.pop('use_auth_token')
    return _orig(*args, **kwargs)

huggingface_hub.hf_hub_download = _patched
```
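The same kwarg translation can be exercised on a local stand-in function, without touching the Hub (`fake_download` is a hypothetical placeholder for `hf_hub_download`):

```python
# Stand-in for hf_hub_download: accepts only the modern `token` kwarg.
def fake_download(repo_id, token=None):
    return repo_id, token

_orig = fake_download

def fake_download(*args, **kwargs):
    # Translate the deprecated kwarg, exactly as the shim above does.
    if 'use_auth_token' in kwargs:
        kwargs['token'] = kwargs.pop('use_auth_token')
    return _orig(*args, **kwargs)

print(fake_download("pyannote/speaker-diarization-3.1",
                    use_auth_token="hf_xxx"))
# → ('pyannote/speaker-diarization-3.1', 'hf_xxx')
```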
```bash
# Using benchmark script
export HF_TOKEN="your_token_here"
./benchmarks/pyannote-simple.sh softab:pyannote-rocm62-gfx1151 /data/models/test-audio.wav

# Direct run
podman run --rm \
  --device=/dev/kfd --device=/dev/dri \
  --ipc=host \
  --security-opt seccomp=unconfined \
  --security-opt label=disable \
  -e HF_TOKEN="$HF_TOKEN" \
  -v /data/models:/audio:ro \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  softab:pyannote-rocm62-gfx1151 \
  python3 -c "
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-3.1')
pipeline.to(torch.device('cuda'))
diarization = pipeline('/audio/test-audio.wav')
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f'[{turn.start:.1f}s - {turn.end:.1f}s] {speaker}')
"
```
| Workload | Best Backend | Realtime Factor | Notes |
|---|---|---|---|
| Whisper (11s) | ROCm 7.2 HIP | 24.6x | Transcription |
| Pyannote (11s) | ROCm 6.2 | 4.3x | Diarization |
| PyTorch ViT | TheRock 7.11 | - | 725 img/s |
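The realtime factors translate directly into wall-clock estimates for longer recordings; a rough extrapolation, assuming the measured factors hold at longer durations:

```python
# Estimated processing time = audio duration / realtime factor.
audio_min = 60.0        # a one-hour recording
whisper_factor = 24.6   # ROCm 7.2 HIP, from the table
pyannote_factor = 4.3   # ROCm 6.2, from the table

print(f"Whisper:  ~{audio_min / whisper_factor:.1f} min")   # ~2.4 min
print(f"Pyannote: ~{audio_min / pyannote_factor:.1f} min")  # ~14.0 min
```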
Pyannote is slower than Whisper because diarization runs multiple stages: voice-activity segmentation, per-segment speaker embedding extraction, and clustering of the embeddings.
For speaker diarization on Strix Halo:

```bash
# Use the working ROCm 6.2 image
softab:pyannote-rocm62-gfx1151

# Or the explicitly named "working" variant
softab:pyannote-rocm62-working-gfx1151
```
See also: