Status: PRIVATE STAGING — this repo is not yet public. Flips to anonymous-read after Miljan + Stevan's pre-launch security audit signs off. If you're reading this and you're not on the Weeyuga team, you got here too early.

Canonical raw-data archive for benchmarks.weeyuga.com — every benchmark run we publish on the site is mirrored here as raw JSONL + log + human summary so anyone can clone, re-analyse, or cite.

Layout

.
├── README.md                       — this file
├── LICENSE                         — CC-BY-4.0 (data) + MIT (helper code)
├── catalogue.json                  — index of every published benchmark (mirror of the site catalogue)
├── methodology.md                  — how we benchmark + fairness rules + reproducibility notes
└── runs/
    └── <run-id>/
        ├── run.jsonl               — canonical raw event stream (one JSON object per line)
        ├── run.log                 — tee'd stdout/stderr from the harness (when captured)
        ├── run.md                  — human-readable summary (when synthesis exists)
        └── metadata.json           — computed snapshot: meta record + per-cell aggregates + status
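
To see which of these files a given clone actually contains, a small inventory helper can be useful. This is a minimal sketch that assumes only the directory layout shown above; the function name `inventory` is hypothetical and not part of any Weeyuga tooling.

```python
from pathlib import Path

# File names taken from the layout above. run.log and run.md are
# documented as optional, so a run may legitimately lack them.
RUN_FILES = ("run.jsonl", "run.log", "run.md", "metadata.json")


def inventory(archive_root):
    """Map each run directory name to the set of layout files it contains."""
    runs_dir = Path(archive_root) / "runs"
    result = {}
    if not runs_dir.is_dir():
        return result  # not a clone of the archive, or no runs yet
    for run_dir in sorted(p for p in runs_dir.iterdir() if p.is_dir()):
        result[run_dir.name] = {
            name for name in RUN_FILES if (run_dir / name).is_file()
        }
    return result
```

Calling `inventory(".")` from the root of a clone returns a dict keyed by run-id, which makes it easy to spot runs that have no run.md or run.log.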

Run-ID format

Every run gets a UUID v4 <run-id> assigned at harness startup. Run IDs are stable across re-runs of synthesis — the same run-id always points to the same raw run.jsonl event stream. Synthesis docs (run.md) and computed metadata (metadata.json) can be regenerated from the canonical jsonl at any time.
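
If you script against the archive, it can help to check that a directory name really is a UUID v4 before treating it as a run-id. The sketch below uses only the Python standard library; the helper name `is_uuid4` is hypothetical, and requiring the hyphenated form is an assumption about how run-ids are written on disk.

```python
import uuid


def is_uuid4(run_id: str) -> bool:
    """Return True if run_id is a hyphenated UUID version 4 string."""
    try:
        parsed = uuid.UUID(run_id)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, a urn: prefix, and un-hyphenated hex,
    # so compare against the canonical form (case-insensitively).
    return parsed.version == 4 and str(parsed) == run_id.lower()
```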

Schema

The catalogue.json index follows schema_version = "1.0-draft" (or later — check the value at the top of the file). Per-benchmark entries include:

id — run-id
title, headline, date
hardware (pavilion / predator / mac / vps50 / runpod)
engine (llamacpp / ollama / vllm / mlx / cpu)
harness (which harness produced this — see methodology.md for the matrix)
model_family, model_sizes
cells[] — per-(machine × engine × model) summary: n_calls, n_errors, duration_ms (mean + p50), tokens_per_sec (mean + max)
synthesis_doc — filename of the synthesis prose for this run, if one exists
tags, status, visibility

Per-run metadata.json adds cells_full[] with the full call list inline.
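
As an illustration of what one cells[] summary holds, here is a sketch that computes the same kinds of aggregates from a list of call records. The per-call field names (`duration_ms`, `tokens_per_sec`, `error`) and the choice to exclude failed calls from the timing statistics are assumptions made for this example; the canonical builder may differ, so check methodology.md before relying on it.

```python
from statistics import mean, median


def summarise_cell(calls):
    """Aggregate one cell's calls into n_calls, n_errors, and timing stats."""
    # Assumption: a call counts as failed when it has a truthy "error" field.
    ok = [c for c in calls if not c.get("error")]
    durations = [c["duration_ms"] for c in ok]
    rates = [c["tokens_per_sec"] for c in ok]
    return {
        "n_calls": len(calls),
        "n_errors": len(calls) - len(ok),
        # p50 is the median of the successful calls' durations.
        "duration_ms": (
            {"mean": mean(durations), "p50": median(durations)}
            if durations else None
        ),
        "tokens_per_sec": (
            {"mean": mean(rates), "max": max(rates)} if rates else None
        ),
    }
```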

How to consume

Download a single run

curl -O https://benchmarks.weeyuga.com/data/runs/<run-id>/run.jsonl
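
Once downloaded, run.jsonl can be read one line at a time, since each line is a single JSON object. A minimal reader sketch follows; the function name `read_events` is hypothetical, and it makes no assumptions about which fields each event carries.

```python
import json


def read_events(lines):
    """Yield one parsed object per non-blank line of a JSONL stream."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the line number so a damaged download is easy to locate.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
```

Typical use is `with open("run.jsonl", encoding="utf-8") as fh:` followed by `for event in read_events(fh): ...`, which streams the file without loading it all into memory.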

Clone the whole archive

git clone https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public.git

Rebuild the catalogue from raw data

The canonical builder lives in WeeyugaWeb/scripts/benchmarks/build_catalogue.py and runs against runs/*.jsonl. If you want to regenerate the catalogue from your own clone of this repo:

git clone https://git.weeyuga.com/slobodanmargetic988/WeeyugaWeb.git
cd WeeyugaWeb
python3 scripts/benchmarks/build_catalogue.py

Citation

If you use this data, please cite as:

Margetić, S. & contributors. (2026). Weeyuga cluster benchmarks (raw data archive).
https://git.weeyuga.com/slobodanmargetic988/weeyuga-benchmarks-public

(A more formal citation form will land here once Mila weighs in on academic-attribution conventions.)

License

Data (runs/, catalogue.json, methodology.md): Creative Commons Attribution 4.0 International (CC-BY-4.0)
Helper code (any future scripts inside this repo): MIT (separate file added if/when code lands here)

You are free to share, re-host, re-analyse, and remix the data with attribution.

What's in here vs what's NOT

This repo contains bench-run output only. No source code. No infrastructure config. No application internals. Reproducing a run requires the WeeyugaWeb main repo (also Gitea-hosted; visibility separate).

Reporting an issue with the data

If you spot a bench number that looks wrong, a methodology gap, or a privacy slip in published metadata: open an issue on this repo, or email the Weeyuga team. We'd rather know.

Status

What                                  State
Repo created                          2026-05-05
First 21 runs landed                  2026-05-05
Miljan + Stevan security audit        scheduled
Visibility flipped to public          pending audit sign-off
Site benchmarks.weeyuga.com live      pending Bane DNS + nginx + Tomas site

Owner: mac/benchmark-tester-ben (Ben). For coordination, see the WeeyugaWeb coordination bus.
