7 Commits

Author SHA1 Message Date
64c3b48e06 chore: add .github/FUNDING.yml — enables Sponsor button on GitHub mirror
Sloba 2026-05-06T17:30Z chat: "i want to copy this public repo to my
github and make it a clone there as a public github repo with
sponsorship enabled".

GitHub mirror created at https://github.com/slobodanmargetic988/weeyuga-benchmarks-public
(public, with description + homepage pointing at
https://benchmarks.weeyuga.com); both `main` and the
`feature/runner-and-agent-instructions` branch pushed.

This commit lights up the "Sponsor" button on the GitHub repo's
landing page via .github/FUNDING.yml. The button currently routes to
github.com/sponsors/slobodanmargetic988 — Sloba's GitHub Sponsors
profile needs separate one-time enrollment at that URL before the
button does anything useful (see GitHub's "Set up GitHub Sponsors"
flow). Once enrolled, every public repo of his with this FUNDING.yml
gets a Sponsor button automatically.

The file also leaves commented-out placeholders for `custom`, `patreon`,
`ko_fi`, `tidelift`, `liberapay`, `polar` etc.; uncomment and edit each
one once the corresponding account exists.

Lives in `.github/FUNDING.yml` (the canonical GitHub-side path);
Gitea ignores the file harmlessly since it has no equivalent
sponsorship integration today, so the file is identical on both
mirrors with no Gitea-side noise.
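
As a rough illustration of the file shape described in this message, the
Python sketch below renders a FUNDING.yml body with one active `github`
entry and the other platforms commented out. The function name
`render_funding_yml` and the exact placeholder layout are assumptions; the
committed file's contents are not shown in this log.

```python
def render_funding_yml(github_user, placeholders):
    """Return FUNDING.yml text: one active `github` key, the rest commented out.

    Hypothetical helper for illustration; not part of the repo.
    """
    lines = [f"github: [{github_user}]"]
    # Placeholder platforms stay commented until an account exists.
    lines += [f"# {key}:" for key in placeholders]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_funding_yml(
        "slobodanmargetic988",
        ["patreon", "ko_fi", "tidelift", "liberapay", "polar", "custom"],
    )
    print(text, end="")
```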
2026-05-06 19:21:09 +02:00
98c3d4fa5b chore: flip repo to public — drop PRIVATE STAGING banner
Sloba 2026-05-06T16:55Z chat: "lets flip the switch i want the repo public".

Pre-launch security audit signed off:
  - G2-P0-1 (/Users/slobodan/ paths)        cleared end-to-end
  - G2-P0-2 (10.8.0.x WG mesh IPs)          cleared end-to-end
  - G2-P0-3 (Slobodans-MacBook-Air mDNS)    cleared end-to-end
  - G2-P0-4 (MyBoard / TruthGraph names)    cleared end-to-end

All four verified clean on the live https://benchmarks.weeyuga.com/
deploy at SHA 0ba4451 by Miljan's pen-test re-grep on
/index.html, /catalog.html, /benchmarks/09d8fbde.html (0 hits per
pattern per page). Full live security probe earlier the same day
also cleared: path traversal blocked, methods 405-locked, autoindex
OFF, hidden files (.git/.DS_Store/.env/.htaccess) all 404,
_template.html 404 via vhost SPA-fallback fix, all 6 security
headers + HSTS holding.
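
A re-grep of this kind can be sketched in Python as below. The four labels
come from the audit items above; the regular expressions, and the names
`PATTERNS`, `count_hits` and `is_clean`, are assumptions made for
illustration and are not the audit's actual tooling.

```python
import re

# One pattern per audit item; the regexes are illustrative assumptions.
PATTERNS = {
    "G2-P0-1": re.compile(r"/Users/slobodan/"),
    "G2-P0-2": re.compile(r"\b10\.8\.0\.\d{1,3}\b"),
    "G2-P0-3": re.compile(r"Slobodans-MacBook-Air"),
    "G2-P0-4": re.compile(r"MyBoard|TruthGraph"),
}


def count_hits(text):
    """Return the number of matches per audit label in one page's text."""
    return {label: len(rx.findall(text)) for label, rx in PATTERNS.items()}


def is_clean(text):
    """True when every pattern has zero hits in the given text."""
    return all(n == 0 for n in count_hits(text).values())
```

Fetching the pages is deliberately left out; the functions take page
content as a string, so they work on downloaded HTML or local files alike.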

Updated:
  - README banner: "PRIVATE STAGING — pre-launch audit pending" →
    "PUBLIC — anonymous-readable since 2026-05-06"
  - Status table: "Pre-launch security audit | scheduled" →
    "cleared 2026-05-06 (G2 P0-1/2/3/4 all verified clean on live
    benchmarks.weeyuga.com at SHA 0ba4451)"
  - Status table: "Visibility flipped to public | pending audit
    sign-off" → "2026-05-06 ✓"

This commit lands the README copy update inside the repo. The
Gitea-side visibility flip (Settings → Visibility → Make Public)
is a UI click Sloba does himself, since no Gitea API token is
available locally to drive it from this session.
2026-05-06 19:06:16 +02:00
Slobodan Margetic
97a9245d9e feat: harness + agent runbook — flip repo from archive-only to
crowdsourced runner

Sloba's chat directive 2026-05-06: "this project is preparation for
going public ... ship the harness along so others can join in."

The repo's original purpose (Ben's catalogue + 21 reference run
ledgers, shipped 2026-05-05) stays intact. This commit ADDS a second
purpose: a portable harness + agent runbook so a friend's coding agent
can clone, read CLAUDE.md, run the same suite on the friend's hardware,
and submit results back as a PR.

What landed:

CLAUDE.md + AGENTS.md (byte-identical, ~520 lines)
  Full agent runbook: hardware probe, runtime + model selection,
  canonical knob reference (Sloba's Pavilion methodology values),
  hardware-adaptation decision rules, run-instructions, output-schema
  templates for hardware.json + metadata.json + run.md, PR submission
  flow (fork → branch → push → PR; nothing auto-merges), privacy
  guardrails, methodology lineage. Per Sloba's Q3 directive: the
  runbook explicitly tells the friend's agent to ADAPT to hardware
  reality and document deviations rather than blindly run defaults.

CONTRIBUTING.md (~110 lines)
  Human-readable companion for the friend (not the agent). What you
  need, how it works, what we ask, what maintainers commit to,
  license, code-of-conduct short version.

harness/
  ├── README.md        Technical readme for the harness folder
  ├── run_benchmark.py ~520 LOC runner. Stdlib-only. Adapted from
  │                    WeeyugaWeb/scripts/benchmarks/run_pavilion_weeyuga.py
  │                    v3 with the cluster-internal IP defaults
  │                    (10.8.0.x) replaced by 127.0.0.1:11434, the
  │                    cluster /v1/cluster/* endpoints removed, the
  │                    canonical-suite paths under ~/Documents/MyServers
  │                    replaced by harness/suites/ paths, the git-sha
  │                    enforcement on WeeyugaWeb dropped, and the
  │                    output written under submissions/<handle>/<tag>/
  │                    instead of docs/BENCHMARKS/runs/. Supports all
  │                    six suite phases via --phases, plus 'all'.
  ├── prompts.py       Verbatim copy of the canonical 3 frozen prompts
  │                    (P-EASY/P-MEDIUM/P-HARD) from
  │                    WeeyugaWeb/scripts/benchmarks/prompts.py.
  ├── requirements.txt Empty by intent (stdlib-only); placeholder for
  │                    pip-tools / agent auto-install patterns.
  ├── .gitignore       __pycache__/ etc.
  └── suites/          Six bundled JSON suites copied verbatim from
       Sloba's MyServers/instances/vps-81-17-99-14/telemetry/:
       small_model_eval_questions.json, python_task_suite_questions.json,
       parallel_qwen_same_model_20q_suite.json,
       parallel_qwen_mixed_model_20q_suite.json,
       python_context_edge_append_questions.json,
       python_context_edge_suite_only.json.

submissions/
  README.md            Folder convention + naming + reviewability rules
  EXAMPLE/mac-m1-8gb/run-00000000-...-000000000000/
       Synthetic-but-shape-complete contribution template:
       manifest.json, hardware.json, run.jsonl (5 example lines),
       metadata.json, run.md (with privacy attestation, methodology
       deviations, reproducibility command). Marked as synthetic at
       the top so future analysis doesn't accidentally cite it.
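
A completeness check for a contribution folder could look like the sketch
below. The five file names come from the template listed above; the helper
names `missing_files` and `missing_in_dir` are hypothetical, and the
harness may well enforce this differently or not at all.

```python
import os

# File names taken from the EXAMPLE template described above.
REQUIRED_FILES = (
    "manifest.json",
    "hardware.json",
    "run.jsonl",
    "metadata.json",
    "run.md",
)


def missing_files(present_names):
    """Return required file names absent from the given names, in template order."""
    present = set(present_names)
    return [name for name in REQUIRED_FILES if name not in present]


def missing_in_dir(run_dir):
    """Same check against a directory on disk (hypothetical convenience wrapper)."""
    return missing_files(os.listdir(run_dir))
```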

LICENSE-MIT
  MIT for harness/*.py and future helper code. Existing LICENSE
  (CC-BY-4.0) covers data files.

README.md (modified)
  Updated to reflect dual purpose. Layout diagram updated.
  Maintainer credits: Ben for catalogue/methodology + Bane for harness.
  Contributor quick-start added. Status table extended.

Privacy posture:
  - All 6 suite JSON files privacy-scanned for cluster IPs / hostnames /
    paths / tokens. Two prompts were flagged in chat for Sloba's review:
    20Q-Q14 ("MyBoard" auth debugging, a project name) and 5Q-Q03
    (generic SSH troubleshooting). Otherwise clean.
  - run_benchmark.py default target_url is 127.0.0.1:11434 (no internal
    IPs leaked).
  - manifest.json captures host_hostname_short via socket.gethostname()
    .split('.')[0] — agent should review before PR if hostname is
    sensitive.
  - CLAUDE.md §8 spells out the privacy-grep before push.
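
For reference, the hostname capture mentioned above boils down to the
expression in this sketch. The wrapper `short_hostname` and its optional
argument are assumptions added so the behaviour can be checked without
depending on the local machine's name.

```python
import socket


def short_hostname(fqdn=None):
    """Return the first dot-separated label of a hostname.

    With no argument, uses socket.gethostname(), matching the expression
    quoted in the commit message. The parameter is a hypothetical addition.
    """
    name = socket.gethostname() if fqdn is None else fqdn
    return name.split(".")[0]
```

Note that the short label can still be identifying (for example when it
contains a person's name), which is why a review before opening a PR is
suggested.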

Verification:
  - py_compile run_benchmark.py: OK
  - --help renders cleanly
  - All 6 suite JSON files: valid
  - All 4 example JSON files: valid
  - Example run.jsonl (5 lines): valid
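
The JSON and JSONL validity checks listed above can be reproduced with a
stdlib-only sketch such as the following. The function names
`is_valid_json` and `bad_jsonl_lines` are hypothetical; the commit does not
say which commands were actually used.

```python
import json


def is_valid_json(text):
    """True when the whole string parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def bad_jsonl_lines(text):
    """Return 1-based numbers of non-blank lines that are not valid JSON."""
    return [
        n
        for n, line in enumerate(text.splitlines(), start=1)
        if line.strip() and not is_valid_json(line)
    ]
```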

This commit lands on branch feature/runner-and-agent-instructions.
NOT pushed to main; staying on the feature branch until Sloba reviews
on Gitea and merges. Bus dispatch to Ben + Sam announcing the
architectural pivot lives in the WeeyugaWeb coordination repo.
2026-05-06 19:05:22 +02:00
ddc9626136 publish: 21 run(s) — 09d8fbde-0008-49bb-99da-03eeaca72be1, 1bf57c9a-fd7a-49aa-90de-cd1907b15ddd, 212d6278-1b9b-45e9-8aae-7eed4d4ec822… 2026-05-06 14:28:25 +02:00
947c361b9c publish: 21 run(s) — 09d8fbde-0008-49bb-99da-03eeaca72be1, 1bf57c9a-fd7a-49aa-90de-cd1907b15ddd, 212d6278-1b9b-45e9-8aae-7eed4d4ec822… 2026-05-06 10:04:27 +02:00
a18db6a3da B3 staging seed — 21 runs + catalogue v1.0-draft + methodology + README
Initial population of the weeyuga-benchmarks-public archive (PRIVATE
staging visibility — flips public after Miljan + Stevan security audit
sign-off per Sloba's 17:34Z dispatch).

Contents:
- README.md       — public-facing intro (warns staging state, schema overview, citation pattern, license split)
- LICENSE         — CC-BY-4.0 default (auto-init from Gitea)
- catalogue.json  — schema_version=1.0-draft (locked once Tomas ratifies); 21 benchmarks indexed, 13 complete + 8 meta-only
- methodology.md  — mirror of WeeyugaWeb docs/BENCHMARKS/HARNESS.md (canonical methodology)
- runs/<id>/run.jsonl|run.log|run.md|metadata.json — packaged copies of every run in WeeyugaWeb docs/BENCHMARKS/runs/*

Run set covers:
- Mission 1 (2026-04-28/29): pavilion-weeyuga-v1 + reconstructed v3 (96 calls, 16 models routed via weeyuga :11435)
- Predator trio (2026-05-04): granite-4.1-8B + gemma-4-E4B-it + qwen3.5-9B
- Predator qwen rerun (2026-05-04): qwen3.5-9B think500/nothink + qwen3-14B feasibility
- A3B campaign (2026-05-04/05): pavilion-a3b + predator-a3b NGL matrix + ctx sweep + NGL+ctx 2D + NGL=6 deep dive
- VPS50 CPU matrix + gemma-e4b CPU lane (2026-05-04/05)

Visibility GATE: this repo stays private until Miljan's G1-G4 audit and
Stevan's G3 credential audit are both green. After sign-off, a single API
call flips visibility=public: anonymous read on, push-protection requires
auth, issues moderated by default.
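
A visibility flip through the Gitea API could be prepared roughly as in the
sketch below, which only builds the request and does not send it. It
assumes Gitea's repository edit endpoint (PATCH on
/api/v1/repos/{owner}/{repo} with a `private` field) and token
authentication; the base URL, owner, repo and token values are
placeholders, and the function name is hypothetical.

```python
import json
import urllib.request


def build_make_public_request(base_url, owner, repo, token):
    """Build (but do not send) a PATCH request that sets private=false.

    Assumes the Gitea repo-edit endpoint; all argument values are supplied
    by the caller and nothing here is specific to this repository.
    """
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}"
    body = json.dumps({"private": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"token {token}",
        },
    )
```

Sending it would be a matter of passing the result to
`urllib.request.urlopen`, which is left out here on purpose.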

No raw IPs, no SSH user@host strings, no /Users/ paths, no whisper
transcripts in any of these files. Hardware names (pavilion, predator,
vps50) are intentional and fine to share.

Builder: WeeyugaWeb/scripts/benchmarks/build_catalogue.py (deterministic,
idempotent, ~5s wall on 21 runs).
Publish flow: WeeyugaWeb/scripts/benchmarks/publish_bench_run.py
(builds packaged dirs, regenerates catalogue, optional --push to mirror
into this repo, optional --deploy stub for cicd rsync).

Owner: mac/benchmark-tester-ben (Ben).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:46:01 +02:00
5c726cf585 Initial commit 2026-05-05 19:44:29 +02:00