weeyuga-benchmarks-public/runs/ff1131ca-d021-4e06-8616-4b4cdb54e97e/run.jsonl
slobodanmargetic988 a18db6a3da B3 staging seed — 21 runs + catalogue v1.0-draft + methodology + README
Initial population of the weeyuga-benchmarks-public archive (PRIVATE
staging visibility — flips public after Miljan + Stevan security audit
sign-off per Sloba's 17:34Z dispatch).

Contents:
- README.md       — public-facing intro (warns staging state, schema overview, citation pattern, license split)
- LICENSE         — CC-BY-4.0 default (auto-init from Gitea)
- catalogue.json  — schema_version=1.0-draft (locked once Tomas ratifies); 21 benchmarks indexed, 13 complete + 8 meta-only
- methodology.md  — mirror of WeeyugaWeb docs/BENCHMARKS/HARNESS.md (canonical methodology)
- runs/<id>/run.jsonl|run.log|run.md|metadata.json — packaged copies of every run in WeeyugaWeb docs/BENCHMARKS/runs/*

Run set covers:
- Mission 1 (2026-04-28/29): pavilion-weeyuga-v1 + reconstructed v3 (96 calls, 16 models routed via weeyuga :11435)
- Predator trio (2026-05-04): granite-4.1-8B + gemma-4-E4B-it + qwen3.5-9B
- Predator qwen rerun (2026-05-04): qwen3.5-9B think500/nothink + qwen3-14B feasibility
- A3B campaign (2026-05-04/05): pavilion-a3b + predator-a3b NGL matrix + ctx sweep + NGL+ctx 2D + NGL=6 deep dive
- VPS50 CPU matrix + gemma-e4b CPU lane (2026-05-04/05)

Visibility GATE: this repo stays private until Miljan's G1-G4 audit and
Stevan's G3 credential audit are both green. After sign-off, a single API
call flips visibility=public: anonymous read on, push-protection requires
auth, issues moderated by default.

No raw IPs, no SSH user@host strings, no /Users/ paths, no whisper
transcripts in any of these files. Hardware names (pavilion, predator,
vps50) are intentional and fine to share.
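
The exclusions listed above (raw IPs, SSH user@host strings, /Users/ paths) lend themselves to a mechanical pre-publish check. The following is a minimal sketch, assuming simple regex heuristics. `LEAK_PATTERNS` and `scan_text` are hypothetical names, not part of the WeeyugaWeb scripts, and the patterns are illustrative rather than exhaustive.

```python
import re

# Hypothetical patterns mirroring the exclusions named in the commit message.
# These are heuristics (assumptions), not the audit's actual rules.
LEAK_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "ssh_user_host": re.compile(r"\b[\w.-]+@[\w.-]+\b"),
    "users_path": re.compile(r"/Users/"),
}


def scan_text(text):
    """Return (line_number, pattern_name, matched_text) for every hit."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in LEAK_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((line_no, name, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = '{"url": "http://10.8.0.3:11435"}\n{"path": "/Users/x/q.json"}\nclean line'
    for hit in scan_text(sample):
        print(hit)
```

Applied to the meta record of this run.jsonl, a scan like this would flag the `weeyuga_url` value and the two `suite_*_path` values. The `ssh_user_host` heuristic also matches e-mail addresses, so hits need manual review.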

Builder: WeeyugaWeb/scripts/benchmarks/build_catalogue.py (deterministic,
idempotent, ~5s wall on 21 runs).
Publish flow: WeeyugaWeb/scripts/benchmarks/publish_bench_run.py
(builds packaged dirs, regenerates catalogue, optional --push to mirror
into this repo, optional --deploy stub for cicd rsync).

Owner: mac/benchmark-tester-ben (Ben).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:46:01 +02:00
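
For readers consuming the packaged `run.jsonl` files, here is a minimal sketch of reading one run. It assumes only the fields visible in the records of this file (`type`, `error`, `usable_answer`, `completion_tokens`, `duration_seconds`, `tokens_per_second`, `cell_id`). `summarize_run` is a hypothetical helper, not part of the harness. The sketch also cross-checks the observation that, in these records, `tokens_per_second` matches `completion_tokens / duration_seconds` to two decimals. That relation is inferred from the sample, not taken from the methodology.

```python
import json
from collections import Counter


def summarize_run(lines):
    """Summarize an iterable of JSONL lines from a run.jsonl file.

    Returns (meta_record, outcome_counts, cell_ids_with_tps_mismatch).
    """
    meta = None
    outcomes = Counter()
    tps_mismatches = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        rec = json.loads(raw)
        if rec.get("type") == "meta":
            meta = rec
            continue
        if rec.get("type") != "call":
            continue  # ignore record types this sketch does not know about
        if rec.get("error"):
            outcomes["error"] += 1
        elif rec.get("usable_answer"):
            outcomes["usable"] += 1
        else:
            outcomes["unusable"] += 1
        tokens = rec.get("completion_tokens")
        secs = rec.get("duration_seconds")
        tps = rec.get("tokens_per_second")
        # Assumed relation: tps is tokens / wall seconds, rounded to 2 decimals.
        if tokens is not None and secs and tps is not None:
            if abs(tokens / secs - tps) > 0.01:
                tps_mismatches.append(rec.get("cell_id"))
    return meta, outcomes, tps_mismatches


if __name__ == "__main__":
    # The path is an assumption based on the runs/<id>/run.jsonl layout.
    path = "runs/ff1131ca-d021-4e06-8616-4b4cdb54e97e/run.jsonl"
    with open(path, encoding="utf-8") as fh:
        meta, outcomes, mismatches = summarize_run(fh)
    print(meta["benchmark_run_id"], dict(outcomes), mismatches)
```

Timed-out calls carry `null` token counts, so they are counted under `error` and skipped by the throughput check.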

19 lines
12 KiB
JSON
{"type": "meta", "benchmark_run_id": "ff1131ca-d021-4e06-8616-4b4cdb54e97e", "harness_version": "1", "harness_path": "scripts/benchmarks/run_pavilion_weeyuga.py", "git_sha": "9934892784228748586130d8abbacd82a919aee2", "git_dirty": true, "started_at_utc": "2026-04-28T21:03:46Z", "host": "Slobodans-MacBook-Air.local", "load_avg_start": [2.5576171875, 2.47900390625, 2.16552734375], "weeyuga_url": "http://10.8.0.3:11435", "phase_plan": "hello+5q", "models_planned": ["qwen3.5:4b", "qwen3.5:35b-a3b-uncensored-iq1m", "qwen3.5:35b-a3b-iq2s", "qwen3.5:9b-q6k", "qwen3.5:9b-q4km", "qwen3.5:2b", "qwen3.5:0.8b", "qwen3.5:9b", "qwen2.5-coder:14b", "qwen2.5-coder:3b", "qwen3:14b", "qwen3:8b", "qwen3:4b", "qwen2.5:3b", "qwen2.5-coder:1.5b", "qwen2.5-coder:0.5b"], "canonical_options": {"temperature": 0.1, "num_ctx": 4096, "num_predict": 2048}, "timeout_seconds": 360, "suite_5q_path": "/Users/slobodan/Documents/MyServers/instances/vps-81-17-99-14/telemetry/small_model_eval_questions.json", "suite_20q_path": "/Users/slobodan/Documents/MyServers/instances/vps-81-17-99-14/telemetry/python_task_suite_questions.json", "env_inference_route": null, "env_llamacpp_url": null}
{"type": "call", "ts_utc": "2026-04-28T21:03:50Z", "cell_id": "pavilion:weeyuga:qwen3.5:4b", "model": "qwen3.5:4b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 4.039, "prompt_tokens": 16, "completion_tokens": 128, "tokens_per_second": 31.69, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 57, "response_preview": "Hello! I'm glad to help you. What would you like to do? 😊", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:03:56Z", "cell_id": "pavilion:weeyuga:qwen3.5:35b-a3b-uncensored-iq1m", "model": "qwen3.5:35b-a3b-uncensored-iq1m", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 5.989, "prompt_tokens": 16, "completion_tokens": 226, "tokens_per_second": 37.74, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 91, "response_preview": "Hello! I'd be happy to help you with anything you need. What can I assist you with today? 😊", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:04:00Z", "cell_id": "pavilion:weeyuga:qwen3.5:35b-a3b-iq2s", "model": "qwen3.5:35b-a3b-iq2s", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 4.033, "prompt_tokens": 16, "completion_tokens": 126, "tokens_per_second": 31.24, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 57, "response_preview": "Hello! I'd love to help you. What would you like to do? 😊", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:04:03Z", "cell_id": "pavilion:weeyuga:qwen3.5:9b-q6k", "model": "qwen3.5:9b-q6k", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 3.754, "prompt_tokens": 16, "completion_tokens": 139, "tokens_per_second": 37.03, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 159, "response_preview": "Hello! I'd love to help you. What would you like to work on today? Whether it's writing, coding, problem-solving, or just chatting, feel free to let me know! 😊", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:04:07Z", "cell_id": "pavilion:weeyuga:qwen3.5:9b-q4km", "model": "qwen3.5:9b-q4km", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 3.684, "prompt_tokens": 16, "completion_tokens": 143, "tokens_per_second": 38.82, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 61, "response_preview": "Hello! I'd be happy to help you. What would you like to do? 😊", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:04:12Z", "cell_id": "pavilion:weeyuga:qwen3.5:2b", "model": "qwen3.5:2b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 4.749, "prompt_tokens": 16, "completion_tokens": 180, "tokens_per_second": 37.9, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 187, "response_preview": "Hello! I'd love to help you. What would you like to work on today? Whether it's learning a new skill, solving a problem, brainstorming ideas, or just chatting, feel free to let me know! 😊", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:04:15Z", "cell_id": "pavilion:weeyuga:qwen3.5:0.8b", "model": "qwen3.5:0.8b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 3.479, "prompt_tokens": 16, "completion_tokens": 133, "tokens_per_second": 38.23, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 63, "response_preview": "Hello! I'd love to help you. What would you like to do today? 😊", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:04:42Z", "cell_id": "pavilion:weeyuga:qwen3.5:9b", "model": "qwen3.5:9b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 26.517, "prompt_tokens": 16, "completion_tokens": 1081, "tokens_per_second": 40.77, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 179, "response_preview": "Hi! I'd be happy to help you. What would you like to do today? I can assist with a wide range of tasks, from writing code to answering questions. Please let me know what you need.", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:10:42Z", "cell_id": "pavilion:weeyuga:qwen2.5-coder:14b", "model": "qwen2.5-coder:14b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 360.061, "prompt_tokens": null, "completion_tokens": null, "tokens_per_second": null, "finish_reason": null, "weeyuga_meta": null, "status_code": null, "response_chars": 0, "response_preview": "", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": false, "error": "TimeoutError('timed out')"}
{"type": "call", "ts_utc": "2026-04-28T21:15:12Z", "cell_id": "pavilion:weeyuga:qwen2.5-coder:3b", "model": "qwen2.5-coder:3b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 269.603, "prompt_tokens": 35, "completion_tokens": 17, "tokens_per_second": 0.06, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 62, "response_preview": "Of course! I'm here to help. What do you need assistance with?", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:21:12Z", "cell_id": "pavilion:weeyuga:qwen3:14b", "model": "qwen3:14b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 360.083, "prompt_tokens": null, "completion_tokens": null, "tokens_per_second": null, "finish_reason": null, "weeyuga_meta": null, "status_code": null, "response_chars": 0, "response_preview": "", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": false, "error": "TimeoutError('timed out')"}
{"type": "call", "ts_utc": "2026-04-28T21:27:12Z", "cell_id": "pavilion:weeyuga:qwen3:8b", "model": "qwen3:8b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 360.084, "prompt_tokens": null, "completion_tokens": null, "tokens_per_second": null, "finish_reason": null, "weeyuga_meta": null, "status_code": null, "response_chars": 0, "response_preview": "", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": false, "error": "TimeoutError('timed out')"}
{"type": "call", "ts_utc": "2026-04-28T21:31:48Z", "cell_id": "pavilion:weeyuga:qwen3:4b", "model": "qwen3:4b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 276.449, "prompt_tokens": 16, "completion_tokens": 457, "tokens_per_second": 1.65, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 502, "response_preview": "Of course! 😊 I'm here to help with **anything** you need — whether it's homework, tech issues, writing, math, science, life advice, or just brainstorming ideas. \n\n**Just tell me:** \n- Whats *specifically* on your mind? \n- Whats stuck? ", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:32:19Z", "cell_id": "pavilion:weeyuga:qwen2.5:3b", "model": "qwen2.5:3b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 31.33, "prompt_tokens": 35, "completion_tokens": 50, "tokens_per_second": 1.6, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 208, "response_preview": "Of course! I'd be happy to help. What can I assist you with today? Whether it's information on cloud computing services, general advice, or any other query, feel free to ask and I'll do my best to assist you.", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:32:40Z", "cell_id": "pavilion:weeyuga:qwen2.5-coder:1.5b", "model": "qwen2.5-coder:1.5b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 20.572, "prompt_tokens": 35, "completion_tokens": 11, "tokens_per_second": 0.53, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 38, "response_preview": "Of course! How may I assist you today?", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:32:53Z", "cell_id": "pavilion:weeyuga:qwen2.5-coder:0.5b", "model": "qwen2.5-coder:0.5b", "phase": "hello", "question_id": "hello_check", "run_idx": 0, "duration_seconds": 13.367, "prompt_tokens": 35, "completion_tokens": 11, "tokens_per_second": 0.82, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 38, "response_preview": "Of course! How can I assist you today?", "required_markers": [], "markers_hit": [], "marker_hit_rate": null, "format_rule": "", "format_ok": null, "usable_answer": true, "error": null}
{"type": "call", "ts_utc": "2026-04-28T21:33:26Z", "cell_id": "pavilion:weeyuga:qwen3.5:4b", "model": "qwen3.5:4b", "phase": "5q", "question_id": "disk_guard_bash", "run_idx": 0, "duration_seconds": 32.117, "prompt_tokens": 71, "completion_tokens": 1125, "tokens_per_second": 35.03, "finish_reason": "stop", "weeyuga_meta": null, "status_code": 200, "response_chars": 247, "response_preview": "#!/bin/bash\n\n# Check disk usage for /\nif [ -d \"/\" ]; then\n disk_usage=$(df -P / | awk '{print $5}')\n if [ \"$disk_usage\" -gt 85 ]; then\n echo \"WARNING: Disk usage for / is at ${disk_usage}% (above 85%)\"\n exit 1\n fi\nfi\n", "required_markers": ["#!/usr/bin/env bash", "df -P /", "85", "exit 1"], "markers_hit": ["df -P /", "85", "exit 1"], "marker_hit_rate": 0.75, "format_rule": "bash_code", "format_ok": false, "usable_answer": true, "error": null}