Stop fighting dependency hell. Test what works automatically.
AMD Strix Halo’s software stack changes constantly. Instead of manually troubleshooting why ROCm broke after a kernel upgrade, test all configurations systematically and document what works.
Run a quick test with a pre-built image:

```shell
# Download a small test model
mkdir -p /data/models
cd /data/models
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Build a test image (or pull pre-built)
podman build -t softab:llama-vulkan-radv \
  -f docker/llama-cpp/Dockerfile.vulkan-radv .

# Run simple test
./benchmarks/llama-simple.sh \
  softab:llama-vulkan-radv \
  /data/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Expected: "Hello! How can I assist you today?"
```
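A wrapper like `llama-simple.sh` presumably grades the captured reply; a minimal sketch of such a pass/fail check (the function name and log path are illustrative assumptions, not the script's actual contents):

```shell
# Hypothetical helper: decide pass/fail from a captured model reply.
# The expected phrase matches the sample output shown above.
check_simple_output() {
  # $1: path to a log file with the model's reply
  if grep -qi "how can I assist" "$1"; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

# Example: a log containing the expected greeting passes.
printf 'Hello! How can I assist you today?\n' > /tmp/softab-simple.log
check_simple_output /tmp/softab-simple.log   # prints PASS
```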
Create your first ablation experiment:

```shell
# 1. Create experiment from template
cp -r experiments/TEMPLATE experiments/$(date +%Y-%m-%d)_first-test
cd experiments/$(date +%Y-%m-%d)_first-test

# 2. Record system environment (non-mutable variables)
./record-environment.sh > ENVIRONMENT.txt

# 3. Choose images to test (edit run-all-benchmarks.sh)
#    Example: compare Vulkan RADV vs AMDVLK
nano run-all-benchmarks.sh
#    Set the IMAGES array:
#    IMAGES=(
#      "softab:llama-vulkan-radv"
#      "softab:llama-vulkan-amdvlk"
#    )

# 4. Run benchmarks
./run-all-benchmarks.sh
# Results saved to raw_results/
#   - llama-simple.log (does it work?)
#   - llama-bench.log  (how fast?)

# 5. Analyze with an LLM
#    "Analyze these benchmark logs and tell me which driver performs better"
```
```shell
# Use the canonical builder
cd /home/tc/softab
./docker/build-matrix.sh list         # Show all available images
./docker/build-matrix.sh build-llama  # Build all llama.cpp variants

# Or build manually
podman build -t softab:llama-hip-rocm72 \
  -f docker/llama-cpp/Dockerfile.hip-rocm72 \
  --build-arg GFX_TARGET=gfx1151 .

# Verify the images exist
podman images | grep softab
```
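The variants follow a `docker/llama-cpp/Dockerfile.<variant>` naming convention; build-matrix.sh presumably loops over it. A sketch of that pattern (the helper and loop are illustrations, not build-matrix.sh's actual code):

```shell
# Hypothetical helper mirroring the Dockerfile.<variant> naming convention.
build_cmd() {
  # $1: variant suffix, e.g. hip-rocm72
  echo "podman build -t softab:llama-$1 -f docker/llama-cpp/Dockerfile.$1 ."
}

# Print the build command for each variant; pipe to sh to actually build.
for variant in vulkan-radv vulkan-amdvlk hip-rocm72; do
  build_cmd "$variant"
done
```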
```shell
# Simple test (does it work?)
./benchmarks/llama-simple.sh IMAGE MODEL_PATH

# Performance test (how fast?)
./benchmarks/llama-bench.sh IMAGE MODEL_PATH

# PyTorch GEMM benchmark
./benchmarks/pytorch-gemm.sh IMAGE
```
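Once a bench log exists, throughput numbers can be pulled out for comparison. A sketch, assuming the log contains "NN.NN t/s" figures (adjust the pattern to the log format you actually get):

```shell
# Hypothetical post-processing: extract throughput values from a bench log.
extract_tps() {
  # $1: log file; prints each numeric value that precedes "t/s"
  grep -oE '[0-9]+\.[0-9]+ ?t/s' "$1" | grep -oE '[0-9]+\.[0-9]+'
}

# Example with a fabricated two-line log:
printf 'pp512: 512.34 t/s\ntg128: 45.67 t/s\n' > /tmp/softab-bench.log
extract_tps /tmp/softab-bench.log   # prints 512.34 then 45.67
```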
```shell
# Use the helper script
./scripts/download-models.sh

# Or manually with huggingface-cli (quote the extra to protect the brackets)
pip install "huggingface-hub[cli]"
huggingface-cli download REPO_NAME FILENAME --local-dir /data/models
```
```shell
# Test Flash Attention impact
./scripts/ablation-flash-attention.sh /data/models/model.gguf

# Test Vulkan batch size tuning
./scripts/ablation-vulkan-batch-size.sh RADV /data/models/model.gguf

# Test container runtime flags
./scripts/ablation-container-flags.sh softab:llama-hip-rocm72 /data/models/model.gguf
```
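Each ablation script presumably runs the same benchmark with one variable toggled and labels the results. A sketch of that A/B pattern (`-fa` is llama.cpp's flash-attention switch; the helper and file naming here are assumptions, not the scripts' actual contents):

```shell
# Hypothetical A/B ablation pattern: one flag toggled, results labeled
# by the flag value so logs can be diffed afterwards.
ablate_label() {
  # $1: flag value -> result file name
  echo "raw_results/flash-attention-$1.log"
}

for fa in 0 1; do
  out=$(ablate_label "$fa")
  echo "would run: llama-bench -fa ${fa} -> ${out}"
done
```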
```shell
# Check GPU visibility
rocminfo | grep gfx
lspci | grep VGA

# Ensure kernel 6.16.9+ for VRAM visibility
uname -r

# Check firmware version
rpm -q linux-firmware
# Avoid linux-firmware-20251125 (broken)
```
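The 6.16.9 kernel floor can be checked mechanically with `sort -V`; a small sketch (the helper is an illustration, the threshold comes from the note above):

```shell
# Succeeds if the given kernel version string is >= 6.16.9.
kernel_ok() {
  required="6.16.9"
  version="${1%%-*}"   # drop the distro suffix, e.g. -200.fc42.x86_64
  [ "$(printf '%s\n%s\n' "$required" "$version" | sort -V | head -n1)" = "$required" ]
}

kernel_ok "$(uname -r)" && echo "kernel OK" || echo "kernel too old for full VRAM visibility"
```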
```shell
# Likely cause: builds missing gfx1151 kernels
# Solution: use TheRock nightlies, or set HSA_OVERRIDE (not recommended):
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HSA_ENABLE_SDMA=0
```
```shell
# Ensure correct device access
podman run --device=/dev/kfd --device=/dev/dri \
  --ipc=host \
  --security-opt label=disable \
  IMAGE_NAME

# On Fedora, SELinux may block GPU access;
# the workaround is --security-opt label=disable, as shown above
```
```shell
# Check GPU clock speed
cat /sys/class/drm/card0/device/pp_dpm_sclk

# Force high performance mode
echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level

# Verify SDMA is disabled
echo "$HSA_ENABLE_SDMA"   # Should be 0
```
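These read-back checks can be wrapped in a tiny verification helper (the sysfs path is the one used above; the helper itself is an illustration):

```shell
# Check that a sysfs attribute holds the expected value, e.g. that
# power_dpm_force_performance_level really reads back "high".
sysfs_is() {
  # $1: sysfs path, $2: expected value
  [ "$(cat "$1" 2>/dev/null)" = "$2" ]
}

sysfs_is /sys/class/drm/card0/device/power_dpm_force_performance_level high \
  && echo "performance level: high" \
  || echo "performance level not forced to high"
```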
Use SoftAb to answer questions about software stack configuration: which driver, runtime, kernel, or container setup works and performs best on your hardware. Don't use it for model comparisons or inference tuning; SoftAb focuses on the software stack, not the models running on it.