# Predator trio bench — 09d8fbde-0008-49bb-99da-03eeaca72be1

- Started: 2026-05-04T16:01:52Z
- Host (driver): Slobodans-MacBook-Air.local
- Node: predator (http://10.8.0.7:11436)
- Engine: llamacpp
- Harness: predator-trio-1

## Models

| Key | Cell | GGUF |
|---|---|---|
| granite | `predator:llamacpp:granite-4.1:8b-q4km` | `granite-4.1-8b-Q4_K_M.gguf` |
| gemma | `predator:llamacpp:gemma-4:e4b-it-q4km` | `gemma-4-E4B-it-Q4_K_M.gguf` |
| qwen | `predator:llamacpp:qwen3.5:9b-q4km` | `Qwen3.5-9B-Q4_K_M.gguf` |

## llama-bench synthetic throughput (pp512 + tg128)

| Model | pp512 (t/s) | tg128 (t/s) |
|---|---|---|
| granite | 142.89 | 12.19 |
| gemma | 498.70 | 24.53 |
| qwen | 63.64 | 7.43 |

## VRAM snapshot (used / free / total MB, post-load)

| Model | Used | Free | Total |
|---|---|---|---|
| granite | 5990 | 40 | 6144 |
| gemma | 4408 | 1622 | 6144 |
| qwen | 5954 | 76 | 6144 |

## Per-prompt latency + tokens/sec

| Model | Prompt | Cold dur (ms) | Cold tok/s | Warm tok/s mean | Warm dur p50 (ms) | Errors |
|---|---|---|---|---|---|---|
| granite | hello | 1782 | 5.6 | 11.1 | 917 | 0 |
| granite | P-MEDIUM | 6095 | 14.4 | 15.3 | 6602 | 0 |
| granite | P-HARD | 20275 | 15.6 | 15.7 | 18761 | 0 |
| gemma | hello | 3519 | 18.2 | 19.5 | 2850 | 0 |
| gemma | P-MEDIUM | 3858 | 22.3 | 22.9 | 12258 | 0 |
| gemma | P-HARD | 28440 | 23.6 | 23.5 | 15838 | 0 |
| qwen | hello | 5514 | 11.6 | 13.8 | 4633 | 0 |
| qwen | P-MEDIUM | 35678 | 14.4 | 14.5 | 35311 | 0 |
| qwen | P-HARD | 73737 | 13.9 | 14.6 | 70291 | 0 |

## Notes

- `phase=cold` is the FIRST request after llama-server boot; subsequent runs are `warm`.
- `tokens_per_sec` here is `completion_tokens / (total_duration_ms / 1000)` — includes prefill time, so true generation rate is slightly higher. Use llama-bench's `tg128` for prefill-free generation throughput.
- For Qwen3.5-9B Q4_K_M (~5.4 GB) on GTX 1060 6 GB: KV cache + model exceeds VRAM with ctx 4096, so partial CPU offload is expected. Compare against granite-4.1-8B (5.0 GB) which fits more comfortably and gemma-4-E4B (4.6 GB) which fits with room to spare.
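
The `tokens_per_sec` definition in the notes can be sketched as below. This is a minimal illustration, not the harness's actual code: the function names, the `prefill_ms` parameter, and the example numbers are all assumptions made here to show why a rate that includes prefill time understates the pure generation rate.

```python
def tokens_per_sec(completion_tokens: int, total_duration_ms: float) -> float:
    """End-to-end rate as defined in the notes (prefill time included)."""
    if total_duration_ms <= 0:
        raise ValueError("total_duration_ms must be positive")
    return completion_tokens / (total_duration_ms / 1000)


def decode_only_tokens_per_sec(
    completion_tokens: int, total_duration_ms: float, prefill_ms: float
) -> float:
    """Rate with prefill excluded.

    prefill_ms is a hypothetical, separately measured value; the report
    above does not record it.
    """
    decode_ms = total_duration_ms - prefill_ms
    if decode_ms <= 0:
        raise ValueError("prefill_ms must be smaller than total_duration_ms")
    return completion_tokens / (decode_ms / 1000)


# Hypothetical example values, not taken from the benchmark run.
end_to_end = tokens_per_sec(completion_tokens=300, total_duration_ms=20_000)
decode_only = decode_only_tokens_per_sec(300, 20_000, prefill_ms=5_000)
print(f"{end_to_end:.1f} tok/s end-to-end, {decode_only:.1f} tok/s decode-only")
```

With these made-up inputs the end-to-end figure is lower than the decode-only figure, which is the effect the note describes; the size of the gap depends on how large prefill is relative to the whole request.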