slobodanmargetic988/weeyuga-benchmarks-public

slobodanmargetic988 64c3b48e06 chore: add .github/FUNDING.yml — enables Sponsor button on GitHub mirror

Sloba 2026-05-06T17:30Z chat: "i want to copy this public repo to my
github and make it a clone there as a public github repo with
sponsorship enabled".

GitHub mirror created at https://github.com/slobodanmargetic988/weeyuga-benchmarks-public
(public, with description + homepage pointing at
https://benchmarks.weeyuga.com); both `main` and the
`feature/runner-and-agent-instructions` branch pushed.

This commit lights up the "Sponsor" button on the GitHub repo's
landing page via .github/FUNDING.yml. The button currently routes to
github.com/sponsors/slobodanmargetic988 — Sloba's GitHub Sponsors
profile needs separate one-time enrollment at that URL before the
button does anything useful (see GitHub's "Set up GitHub Sponsors"
flow). Once enrolled, every public repo of his with this FUNDING.yml
gets a Sponsor button automatically.

The file also leaves placeholders for `custom`, `patreon`, `ko_fi`,
`tidelift`, `liberapay`, `polar` etc., commented out — uncomment +
edit when corresponding accounts exist.

Lives in `.github/FUNDING.yml` (the canonical GitHub-side path);
Gitea ignores the file harmlessly since it has no equivalent
sponsorship integration today, so the file is identical on both
mirrors with no Gitea-side noise.

2026-05-06 19:21:09 +02:00

.github	chore: add .github/FUNDING.yml — enables Sponsor button on GitHub mirror	2026-05-06 19:21:09 +02:00
harness	feat: harness + agent runbook — flip repo from archive-only to	2026-05-06 19:05:22 +02:00
runs	publish: 21 run(s) — 09d8fbde-0008-49bb-99da-03eeaca72be1, 1bf57c9a-fd7a-49aa-90de-cd1907b15ddd, 212d6278-1b9b-45e9-8aae-7eed4d4ec822…	2026-05-06 14:28:25 +02:00
submissions	feat: harness + agent runbook — flip repo from archive-only to	2026-05-06 19:05:22 +02:00
AGENTS.md	feat: harness + agent runbook — flip repo from archive-only to	2026-05-06 19:05:22 +02:00
catalogue.json	publish: 21 run(s) — 09d8fbde-0008-49bb-99da-03eeaca72be1, 1bf57c9a-fd7a-49aa-90de-cd1907b15ddd, 212d6278-1b9b-45e9-8aae-7eed4d4ec822…	2026-05-06 14:28:25 +02:00
CLAUDE.md	feat: harness + agent runbook — flip repo from archive-only to	2026-05-06 19:05:22 +02:00
CONTRIBUTING.md	feat: harness + agent runbook — flip repo from archive-only to	2026-05-06 19:05:22 +02:00
LICENSE	Initial commit	2026-05-05 19:44:29 +02:00
LICENSE-MIT	feat: harness + agent runbook — flip repo from archive-only to	2026-05-06 19:05:22 +02:00
methodology.md	B3 staging seed — 21 runs + catalogue v1.0-draft + methodology + README	2026-05-05 19:46:01 +02:00
README.md	chore: flip repo to public — drop PRIVATE STAGING banner	2026-05-06 19:06:16 +02:00

README.md

weeyuga-benchmarks-public

Status: PUBLIC — anonymous-readable since 2026-05-06. Pre-launch security audit signed off (G2 P0-1 through P0-4 cleared on the live site at SHA 0ba4451). Welcome — see CLAUDE.md / AGENTS.md if you want your coding agent to clone and run benchmarks on your hardware.

Open benchmarks for local LLMs — same prompts, same suites, run on whatever hardware you've got, results compose into one ladder.

This repo is two things in one:

Canonical archive — every benchmark Sloba's team publishes on benchmarks.weeyuga.com lives here as raw JSONL + computed metadata + human summary, so anyone can clone, re-analyse, or cite. This is the original purpose; see runs/.
Crowdsourced runner — a portable harness + agent runbook so a friend's coding agent (Claude Code, Codex, Aider, …) can clone this repo, read CLAUDE.md / AGENTS.md, run the same suite on the friend's hardware, and submit the result back as a PR. This is the newer purpose; see harness/ + submissions/.

Both purposes share one schema, one prompt set, one methodology.

Quick start — for friends contributing a benchmark

git clone https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public.git
cd weeyuga-benchmarks-public
# Point your coding agent at CLAUDE.md in the cloned repo and say:
#   "Read CLAUDE.md, then run the Weeyuga benchmark on my hardware."

Then read CONTRIBUTING.md for the human-side flow (what to expect, how reviews work, license, code of conduct).

Layout

.
├── README.md                — this file
├── CLAUDE.md                — full agent runbook (read this if you're an LLM-driven agent)
├── AGENTS.md                — byte-identical to CLAUDE.md (Codex / other tools that prefer this name)
├── CONTRIBUTING.md          — human-readable contribution guide
├── LICENSE                  — CC-BY-4.0 for data
├── LICENSE-MIT              — MIT for harness/runner code
├── catalogue.json           — index of every published benchmark (canonical archive)
├── methodology.md           — how we benchmark + fairness rules + reproducibility notes
├── harness/                 — portable runner + suites + prompts (the crowdsourced piece)
│   ├── README.md
│   ├── run_benchmark.py
│   ├── prompts.py
│   ├── requirements.txt
│   └── suites/              — six bundled JSON suites (5Q, 20Q, parallel × 2, edge × 2)
├── runs/                    — canonical archive: 21 reference runs from Sloba's cluster
│   └── <run-id>/
│       ├── run.jsonl
│       ├── run.log          — when captured
│       ├── run.md           — when synthesis exists
│       └── metadata.json
└── submissions/             — community contributions land here
    ├── README.md
    ├── EXAMPLE/             — one fully-filled-out template you can read
    │   └── mac-m1-8gb/
    │       └── run-<uuid>/
    │           ├── manifest.json
    │           ├── hardware.json
    │           ├── run.jsonl
    │           ├── metadata.json
    │           └── run.md
    └── <handle>/            — your contributions
        └── <device-tag>/
            └── run-<uuid>/...
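
As a rough illustration of the submissions/ layout above, a contributor might sanity-check a run directory before opening a PR. This is a minimal sketch and not part of the harness: the function name `missing_submission_files` is hypothetical, and treating all five files shown under EXAMPLE/ as required is an assumption inferred from the tree, not something this README confirms.

```python
from pathlib import Path

# File names taken from the EXAMPLE/ tree in this README.
# Assumption: every one of them is mandatory for a submission.
EXPECTED_FILES = (
    "manifest.json",
    "hardware.json",
    "run.jsonl",
    "metadata.json",
    "run.md",
)


def missing_submission_files(run_dir):
    """Return the expected file names that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]
```

An empty list means every file from the example layout is present; CONTRIBUTING.md and submissions/README.md remain the authority on what a submission must actually contain.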

Run-ID format

Every run gets a UUID v4 <run-id> assigned at harness startup. Run IDs are stable across re-runs of synthesis — the same run-id always points to the same raw run.jsonl event stream. Synthesis docs (run.md) and computed metadata (metadata.json) can be regenerated from the canonical jsonl at any time.
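
For readers scripting against the archive, the run-ID format can be illustrated with Python's standard `uuid` module. This is a sketch under assumptions: the helper names `new_run_id` and `is_uuid4` are hypothetical, and the harness may generate or validate IDs differently.

```python
import uuid


def new_run_id():
    """Generate a fresh random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def is_uuid4(value):
    """Return True if value is a version-4 UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # Compare against the canonical rendering so shorthand forms are rejected.
    return parsed.version == 4 and str(parsed) == value.lower()
```

A check like this can catch a mistyped directory name under runs/ or submissions/ before any file is read.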

Schema

The catalogue.json index follows schema_version = "1.0-draft" (or later — check the value at the top of the file). Per-benchmark entries include:

id — run-id
title, headline, date
hardware (pavilion / predator / mac / vps50 / runpod / community-<device-tag>)
engine (llamacpp / ollama / vllm / mlx / cpu)
harness (which harness produced this — see methodology.md)
model_family, model_sizes
cells[] — per-(machine × engine × model) summary
synthesis_doc — filename of the synthesis prose, if one exists
tags, status, visibility, site_grade

Per-run metadata.json adds cells_full[] with the full call list inline.
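
The field list above can be turned into a quick completeness check when consuming the catalogue. This is a minimal sketch, assuming catalogue.json is a JSON object with `schema_version` as a top-level key; the helper names are hypothetical, and the key under which the per-benchmark entries are stored is not stated here, so the sketch works on a single entry dict that you extract yourself.

```python
import json

# Field names copied from the list in this README.
# Assumption: all are expected on every entry; synthesis_doc may
# legitimately be absent or null when no synthesis exists.
ENTRY_FIELDS = (
    "id", "title", "headline", "date", "hardware", "engine", "harness",
    "model_family", "model_sizes", "cells", "synthesis_doc",
    "tags", "status", "visibility", "site_grade",
)


def read_schema_version(catalogue_text):
    """Return the top-level schema_version value, or None if it is missing."""
    return json.loads(catalogue_text).get("schema_version")


def missing_entry_fields(entry):
    """Return the documented field names that are not keys of entry."""
    return [field for field in ENTRY_FIELDS if field not in entry]
```

Checking `schema_version` first lets a consumer refuse a catalogue whose format it does not know before it touches any entry.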

How to consume the archive

Single run

curl -O https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public/raw/branch/main/runs/<run-id>/run.jsonl
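
The same download can be scripted. The sketch below builds the raw URL used in the curl command and parses JSON Lines text into Python objects; the helper names are hypothetical, and the fields inside each event are not documented in this README, so the parser makes no assumptions about them.

```python
import json

# Raw-file base URL taken from the curl command above.
BASE_URL = (
    "https://git.weeyuga.com/slobodanmargetic988/"
    "weeyuga-benchmarks-public/raw/branch/main"
)


def run_jsonl_url(run_id):
    """Build the raw URL of run.jsonl for the given run-id."""
    return f"{BASE_URL}/runs/{run_id}/run.jsonl"


def parse_jsonl(text):
    """Parse JSON Lines text into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

To fetch a run, pass the result of `run_jsonl_url(run_id)` to `urllib.request.urlopen`, decode the response body as UTF-8, and hand the text to `parse_jsonl`.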

Whole archive

git clone https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public.git

Re-build catalogue from raw

The canonical builder lives in WeeyugaWeb/scripts/benchmarks/build_catalogue.py and runs against runs/*/run.jsonl.
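
The builder script itself is not reproduced here. As a hedged illustration of its stated input, `runs/*/run.jsonl`, the sketch below only discovers which run directories in a local clone contain a raw event stream; `list_run_ids` is a hypothetical helper, not a function from build_catalogue.py.

```python
from pathlib import Path


def list_run_ids(repo_root="."):
    """Return sorted names of directories under runs/ that contain run.jsonl."""
    runs_dir = Path(repo_root) / "runs"
    return sorted(path.parent.name for path in runs_dir.glob("*/run.jsonl"))
```

Comparing this list against the `id` values in catalogue.json is one way to spot a run that was archived but never indexed.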

How to contribute a benchmark

See CLAUDE.md (or AGENTS.md — same content) for the full agent runbook, and CONTRIBUTING.md for the human-side flow. Short version:

1. Clone repo
2. Hand CLAUDE.md to your coding agent
3. Agent probes hardware, picks a model, runs benchmark, writes results
4. You review what the agent produced
5. You fork → push → open PR
6. Maintainers review and merge

Read access is open. Write access is via PR only — nothing auto-merges.

Citation

Margetić, S. & contributors. (2026). Weeyuga local-LLM benchmarks.
https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public

License

Data (runs/, submissions/, catalogue.json, methodology.md, harness/suites/): Creative Commons Attribution 4.0 International (CC-BY-4.0).
Helper code (harness/*.py, future scripts): MIT.

You're free to share, re-host, re-analyse, and remix the data with attribution.

Reporting issues

If you spot a bench number that looks wrong, a methodology gap, or a privacy slip in published metadata: open an issue on this repo, or email the team at slobodan@weeyuga.com. We'd rather know.

Status

What	State
Repo created	2026-05-05
Canonical archive landed (21 runs)	2026-05-05
Harness + agent runbook landed	2026-05-06
Pre-launch security audit	cleared 2026-05-06 (G2 P0-1/2/3/4 all verified clean on live `benchmarks.weeyuga.com` at SHA `0ba4451`)
Visibility flipped to public	2026-05-06 ✓
First friend's submission merged	pending

Maintainers: mac/benchmark-tester-ben (catalogue + methodology), mac/devops-bane (harness + runner + this README change). For coordination, see the WeeyugaWeb coordination bus.

Description

Canonical raw-data archive for benchmarks.weeyuga.com — Weeyuga cluster benchmark runs (Pavilion, Predator, Mac, VPS50, RunPod). Currently PRIVATE STAGING; flips public after Miljan + Stevan security audit sign-off.

https://benchmarks.weeyuga.com

Readme CC-BY-4.0 502 KiB
