B3 staging seed — 21 runs + catalogue v1.0-draft + methodology + README
Initial population of the weeyuga-benchmarks-public archive (PRIVATE staging visibility; flips public after Miljan + Stevan security audit sign-off per Sloba's 17:34Z dispatch).

Contents:
- README.md: public-facing intro (warns of staging state, schema overview, citation pattern, license split)
- LICENSE: CC-BY-4.0 default (auto-init from Gitea)
- catalogue.json: schema_version=1.0-draft (locked once Tomas ratifies); 21 benchmarks indexed, 13 complete + 8 meta-only
- methodology.md: mirror of WeeyugaWeb docs/BENCHMARKS/HARNESS.md (canonical methodology)
- runs/<id>/run.jsonl|run.log|run.md|metadata.json: packaged copies of every run in WeeyugaWeb docs/BENCHMARKS/runs/*

Run set covers:
- Mission 1 (2026-04-28/29): pavilion-weeyuga-v1 + reconstructed v3 (96 calls, 16 models routed via weeyuga :11435)
- Predator trio (2026-05-04): granite-4.1-8B + gemma-4-E4B-it + qwen3.5-9B
- Predator qwen rerun (2026-05-04): qwen3.5-9B think500/nothink + qwen3-14B feasibility
- A3B campaign (2026-05-04/05): pavilion-a3b + predator-a3b NGL matrix + ctx sweep + NGL+ctx 2D + NGL=6 deep dive
- VPS50 CPU matrix + gemma-e4b CPU lane (2026-05-04/05)

Visibility GATE: this repo stays private until the Miljan G1-G4 audit and the Stevan G3 credential audit are both green. After sign-off, a single API call flips visibility=public: anonymous read on, push-protection requires auth, issues moderated by default.

No raw IPs, no SSH user@host strings, no /Users/ paths, no whisper transcripts in any of these files. Hardware names (pavilion, predator, vps50) are intentional and fine to share.

Builder: WeeyugaWeb/scripts/benchmarks/build_catalogue.py (deterministic, idempotent, ~5s wall on 21 runs).

Publish flow: WeeyugaWeb/scripts/benchmarks/publish_bench_run.py (builds packaged dirs, regenerates catalogue, optional --push to mirror into this repo, optional --deploy stub for cicd rsync).

Owner: mac/benchmark-tester-ben (Ben).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
102
README.md
@@ -1,3 +1,103 @@
# weeyuga-benchmarks-public

Canonical raw-data archive for benchmarks.weeyuga.com: Weeyuga cluster benchmark runs (Pavilion, Predator, Mac, VPS50, RunPod). Currently PRIVATE STAGING; flips public after Miljan + Stevan security audit sign-off.

> **Status: PRIVATE STAGING.** This repo is not yet public. Flips to anonymous-read after [Miljan + Stevan's pre-launch security audit](https://git.weeyuga.com/slobodanmargetic988/WeeyugaWeb/src/branch/main/coordination/messages) signs off. If you're reading this and you're not on the Weeyuga team, you got here too early.

Canonical raw-data archive for **[benchmarks.weeyuga.com](https://benchmarks.weeyuga.com)**: every benchmark run we publish on the site is mirrored here as raw JSONL + log + human summary so anyone can clone, re-analyse, or cite.

## Layout

```
.
├── README.md          # this file
├── LICENSE            # CC-BY-4.0 (data) + MIT (helper code)
├── catalogue.json     # index of every published benchmark (mirror of the site catalogue)
├── methodology.md     # how we benchmark + fairness rules + reproducibility notes
└── runs/
    └── <run-id>/
        ├── run.jsonl      # canonical raw event stream (one JSON object per line)
        ├── run.log        # tee'd stdout/stderr from the harness (when captured)
        ├── run.md         # human-readable summary (when synthesis exists)
        └── metadata.json  # computed snapshot: meta record + per-cell aggregates + status
```
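
As a rough illustration of the layout above, the sketch below reads one run directory in Python. It relies only on what the tree states: `run.jsonl` holds one JSON object per line, and `run.log` / `run.md` may be absent. Treating `metadata.json` as optional too is an assumption made here, and `iter_events` / `load_run` are hypothetical helper names, not part of the Weeyuga tooling.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def iter_events(path: Path) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between events
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc


def load_run(run_dir: Path) -> dict:
    """Collect the per-run files; everything except run.jsonl is treated as optional."""

    def read_optional(name: str) -> Optional[str]:
        path = run_dir / name
        return path.read_text(encoding="utf-8") if path.exists() else None

    metadata_text = read_optional("metadata.json")  # assumption: may be missing
    return {
        "events": list(iter_events(run_dir / "run.jsonl")),
        "metadata": json.loads(metadata_text) if metadata_text is not None else None,
        "log": read_optional("run.log"),
        "summary": read_optional("run.md"),
    }
```

For example, `load_run(Path("runs") / run_id)["events"]` would give the raw event list for one run in a local clone.
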

## Run-ID format

Every run gets a UUID v4 `<run-id>` assigned at harness startup. Run IDs are stable across re-runs of synthesis: the same run-id always points to the same raw `run.jsonl` event stream. Synthesis docs (`run.md`) and computed metadata (`metadata.json`) can be regenerated from the canonical JSONL at any time.
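
If you script against the archive, a quick sanity check on run IDs can catch typos before any file access. The sketch below uses Python's standard `uuid` module. Requiring the canonical lowercase hyphenated form is an assumption made here, and `is_run_id` is a hypothetical helper name.

```python
import uuid


def is_run_id(value: str) -> bool:
    """Return True if value is a canonically formatted UUID v4 string."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, URN prefixes and un-hyphenated hex,
    # so compare against the canonical rendering (assumed directory-name form).
    return parsed.version == 4 and str(parsed) == value.lower()
```
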

## Schema

The `catalogue.json` index follows `schema_version = "1.0-draft"` (or later; check the value at the top of the file). Per-benchmark entries include:

- `id`: run-id
- `title`, `headline`, `date`
- `hardware` (pavilion / predator / mac / vps50 / runpod)
- `engine` (llamacpp / ollama / vllm / mlx / cpu)
- `harness` (which harness produced this; see `methodology.md` for the matrix)
- `model_family`, `model_sizes`
- `cells[]`: per-(machine × engine × model) summary with `n_calls`, `n_errors`, `duration_ms` (mean + p50), `tokens_per_sec` (mean + max)
- `synthesis_doc`: filename of the synthesis prose for this run, if one exists
- `tags`, `status`, `visibility`

Per-run `metadata.json` adds `cells_full[]` with the full call list inline.
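
To make the per-cell aggregates concrete, here is a minimal sketch of how the summary fields named above could be derived from a list of calls. It is not the canonical builder. The per-call keys (`duration_ms`, `tokens_per_sec`, `error`), the nested output shape, and the choice to exclude errored calls from the timing figures are all assumptions for illustration; check `methodology.md` and `build_catalogue.py` for the real rules.

```python
from statistics import mean, median
from typing import List, Optional


def schema_version(catalogue: dict) -> str:
    """Return the catalogue's schema_version, or raise if it is missing."""
    version = catalogue.get("schema_version")
    if not isinstance(version, str):
        raise ValueError("catalogue has no schema_version string")
    return version


def summarise_cell(calls: List[dict]) -> dict:
    """Aggregate one (machine x engine x model) cell from its calls."""
    # Assumption: a call failed if it carries a truthy "error" key.
    ok = [c for c in calls if not c.get("error")]
    durations = [c["duration_ms"] for c in ok]
    rates = [c["tokens_per_sec"] for c in ok]

    def stats(values: list, second_key: str, second_fn) -> Optional[dict]:
        if not values:
            return None  # no successful calls, nothing to aggregate
        return {"mean": mean(values), second_key: second_fn(values)}

    return {
        "n_calls": len(calls),
        "n_errors": len(calls) - len(ok),
        "duration_ms": stats(durations, "p50", median),  # p50 is the median
        "tokens_per_sec": stats(rates, "max", max),
    }
```
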

## How to consume

### Just download a single run

```bash
curl -O https://benchmarks.weeyuga.com/data/runs/<run-id>/run.jsonl
```
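
The same download can be scripted. The sketch below builds the URL from the pattern in the `curl` command and fetches the file with the Python standard library. Only the `run.jsonl` path is confirmed by this README; that the other per-run files are served under the same pattern is an assumption, and `run_file_url` / `download_run_file` are hypothetical helper names.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote

BASE_URL = "https://benchmarks.weeyuga.com/data/runs"
# Assumption: all four per-run files share the run.jsonl URL pattern.
RUN_FILES = ("run.jsonl", "run.log", "run.md", "metadata.json")


def run_file_url(run_id: str, filename: str = "run.jsonl") -> str:
    """Build the download URL for one file of one run."""
    if filename not in RUN_FILES:
        raise ValueError(f"unknown run file: {filename!r}")
    # Percent-encode the run-id so stray characters cannot alter the path.
    return f"{BASE_URL}/{quote(run_id, safe='')}/{filename}"


def download_run_file(run_id: str, filename: str = "run.jsonl",
                      dest_dir: Path = Path(".")) -> Path:
    """Stream one run file to dest_dir and return the local path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / filename
    url = run_file_url(run_id, filename)
    with urllib.request.urlopen(url, timeout=30) as resp, target.open("wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```
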

### Clone the whole archive

```bash
git clone https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public.git
```

### Re-build catalogue from raw

The canonical builder lives in [WeeyugaWeb/scripts/benchmarks/build_catalogue.py](https://git.weeyuga.com/slobodanmargetic988/WeeyugaWeb/src/branch/main/scripts/benchmarks/build_catalogue.py) and runs against `runs/*.jsonl`. If you want to regenerate the catalogue from your own clone of this repo:

```bash
git clone https://git.weeyuga.com/slobodanmargetic988/WeeyugaWeb.git
cd WeeyugaWeb
python3 scripts/benchmarks/build_catalogue.py
```

## Citation

If you use this data, please cite as:

```
Margetić, S. & contributors. (2026). Weeyuga cluster benchmarks (raw data archive).
https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public
```

(A more formal citation form will land here once Mila weighs in on academic-attribution conventions.)

## License

- **Data** (`runs/`, `catalogue.json`, `methodology.md`): [Creative Commons Attribution 4.0 International (CC-BY-4.0)](LICENSE)
- **Helper code** (any future scripts inside this repo): [MIT](LICENSE-MIT) (separate file added if/when code lands here)

You are free to share, re-host, re-analyse, and remix the data with attribution.

## What's in here vs what's NOT

This repo contains **bench-run output only**. No source code. No infrastructure config. No application internals. Reproducing a run requires the [WeeyugaWeb](https://git.weeyuga.com/slobodanmargetic988/WeeyugaWeb) main repo (also Gitea-hosted; visibility separate).

## Reporting an issue with the data

If you spot a bench number that looks wrong, a methodology gap, or a privacy slip in published metadata: open an issue on this repo, or email the Weeyuga team. We'd rather know.

## Status

| What | State |
|---|---|
| Repo created | 2026-05-05 |
| First 21 runs landed | 2026-05-05 |
| Miljan + Stevan security audit | scheduled |
| Visibility flipped to public | pending audit sign-off |
| Site `benchmarks.weeyuga.com` live | pending Bane DNS + nginx + Tomas site |

Owner: `mac/benchmark-tester-ben` (Ben). For coordination, see the [WeeyugaWeb coordination bus](https://git.weeyuga.com/slobodanmargetic988/WeeyugaWeb/src/branch/main/coordination).