# Albedo

> Trajectory-distillation king-of-the-hill subnet on Bittensor (netuid 97). Miners submit
> Qwen3-4B challengers locked to the king's size class; the reigning king and challenger duel
> on SWE-ZERO coding trajectories judged by an ensemble of LLMs. Full duel traces are
> published for downstream distillation.

**Subnet:** `ALBEDO_NETUID=97` (finney mainnet)

---

## Repository layout

```
├── albedo/                      ← main package (all shared logic)
│   ├── config.py                    Single source of truth — reads chain.toml at import time
│   ├── stats.py                     Pure math: paired_bootstrap_lcb, aggregate_duel
│   │
│   ├── storage/
│   │   ├── r2.py                    R2 (Cloudflare) client — validator-private state
│   │   ├── hippius.py               Hippius S3 client — public dashboard writes
│   │   ├── store.py                 ObjectStore — composes r2 + hippius
│   │   └── state.py                 State — all mutable validator state + close_eval()
│   │
│   ├── models/
│   │   ├── ref.py                   ModelRef dataclass (repo@digest)
│   │   ├── reveal.py                build_reveal_v4 / parse_reveal_v4 ("v4|repo|digest|hotkey")
│   │   ├── download.py              materialize_model — download to local cache
│   │   ├── upload.py                upload_model_folder — push weights to Hippius
│   │   └── template.py              ensure_chat_template + scrub_tokenizer_config
│   │
│   ├── judge/
│   │   ├── rubric.py                RUBRIC_SYSTEM + PROBE_SYSTEM prompt strings (only)
│   │   ├── verdict.py               Verdict dataclass + parse_verdict()
│   │   └── client.py                ChutesJudge — OpenRouter/Chutes transport, concurrency, retry/deadline handling
│   │
│   ├── preeval/
│   │   ├── fingerprint.py           L2-norm near-duplicate detection (cosine sim ≥ 0.95)
│   │   └── injection.py             LLM injection probe (fast probe profile, separate seed) — probe_batch, fail-closed → untested
│   │
│   ├── duel/
│   │   ├── sampler.py               TrajectoryDataset + Sample — deterministic SWE-ZERO sampling
│   │   ├── turn.py                  generate_turn / judge_generated_turn — separated staged eval units
│   │   └── stream.py                run_duel — staged SSE eval: generate all → judge fanout → aggregate
│   │
│   ├── validator/
│   │   ├── chain.py                 scan_reveals — decode v4 on-chain commitments
│   │   ├── admission.py             validate_challenger_config — 9 checks before GPU duel
│   │   ├── faults.py                is_infra_failure / is_miner_fault — classify eval failures
│   │   ├── weights.py               maybe_set_weights — emission split with uid_map snapshot
│   │   ├── challenge.py             process_challenge — one challenger end-to-end + /prune_cache after crown
│   │   ├── website.py               upload_website — push dashboard HTML to Hippius
│   │   ├── reset.py                 run_reset — automated one-shot competition reset (ALBEDO_RESET)
│   │   └── loop.py                  main(), _startup(), _main_loop()
│   │
│   └── eval_server/
│       ├── vllm.py                  VLLMProcess — manages one vLLM subprocess
│       ├── server_state.py          EvalState singleton (king_proc, chal_proc, eval_lock)
│       ├── endpoints.py             FastAPI: /health, /set_king, /eval, /prune_cache
│       └── sink.py                  DatasetSink — publish eval traces to S3
│
├── eval.py                      Thin entrypoint → albedo.eval_server (PM2 target)
├── validator.py                 Thin entrypoint → albedo.validator (PM2 target)
├── miner.py                     Standalone miner: discover king → train → upload → reveal
├── chain.toml                   Active chain config (single source of truth for all params)
├── archs/qwen3/__init__.py      Arch shim — imports transformers so HF Auto* resolves without trust_remote_code
│
├── scripts/
│   ├── collect_traces.py        Step 1: download public duel traces, format for training
│   ├── inspect_dataset.py       Step 2: validate dataset health before GPU hours
│   ├── train_sft.py             Step 3: full SFT fine-tune (single/multi-GPU)
│   ├── sanity_check.py          Step 4: run 3 SWE prompts, check format before upload
│   ├── upload_challenger.py     Step 5: upload checkpoint to Hippius + print reveal string
│   ├── seed_genesis.py          Operator: upload a new genesis king, print chain.toml snippet
│   └── reset_state.py           Operator: wipe all R2 validator state for fresh competition
│
└── configs/ds_zero2.json        DeepSpeed ZeRO-2 config for multi-GPU training
```

---

## chain.toml — configuration reference

All runtime parameters come from here. `albedo/config.py` loads it at import time.

```toml
[chain]
name        = "Albedo"
seed_repo   = "yourorg/albedo-qwen3-4b-genesis"   # genesis king Hippius repo
repo_pattern = "^[^/]+/albedo-qwen3-4b-.+$"       # challenger naming — must include "qwen3-4b"

[arch]
module = "archs.qwen3"              # HF shim — no trust_remote_code needed
extra_lock_keys = [
  # Architecture identity
  "max_position_embeddings",        # 40960 for Qwen3-4B
  "tie_word_embeddings",            # false
  "rope_theta",                     # 1000000.0
  # Capacity — locks competition to the 4B size class
  "hidden_size",
  "num_hidden_layers",
  "num_attention_heads",
  "num_key_value_heads",
  "intermediate_size",
  "head_dim",
]
# vocab_size and model_type are always locked (from albedo/config.py COMPAT_KEYS)

[seed]
tokenizer_repo = "Qwen/Qwen3-4B"   # HF tokenizer for eval server
seed_digest    = "sha256:..."       # fill after: python scripts/seed_genesis.py

[judge]
models = [                          # T=3 independent OpenRouter judges
  "minimax/minimax-m3",
  "deepseek/deepseek-v4-flash",
  "z-ai/glm-5.1",
]
base_url_env       = "CHUTES_BASE_URL"   # legacy Chutes base URL; disabled in production eval via ALBEDO_JUDGE_CHUTES_ENABLED=0
api_key_env        = "CHUTES_API_KEY"    # only required when Chutes is enabled
tie_band           = 0.01
call_timeout_s     = 300.0
max_429_wait_s     = 60.0
max_429_retries    = 8
retry_max          = 3
retry_initial_backoff_s = 1.5
max_concurrency_per_model = 6
min_interval_s_per_model  = 0.0
# --- Unified transport v2: OpenRouter-only in production, with Chutes still supported as a fallback path. ---
chutes_try_s        = 2.0
chutes_max_s        = 150.0
or_timeout_s        = 60.0
or_retries          = 3
judge_total_s       = 180.0
chutes_giveup_tasks = 2

[judge.rate_limits]                 # high fanout for staged judge phase
"minimax/minimax-m3"           = { max_concurrency = 64, min_interval_s = 0.0 }
"deepseek/deepseek-v4-flash"   = { max_concurrency = 64, min_interval_s = 0.0 }
"z-ai/glm-5.1"                 = { max_concurrency = 64, min_interval_s = 0.0 }

[judge.fallback]                    # OpenRouter provider
enabled          = true
base_url         = "https://openrouter.ai/api"   # client appends /v1/chat/completions
api_key_env      = "OPENROUTER_API_KEY"
chutes_429_grace_s = 4.0
reasoning_models = []
[judge.fallback.model_map]          # judge id → OpenRouter model id
"minimax/minimax-m3"           = "minimax/minimax-m3"
"deepseek/deepseek-v4-flash"   = "deepseek/deepseek-v4-flash"
"z-ai/glm-5.1"                 = "z-ai/glm-5.1"

[duel]
n_samples            = 128          # SWE-ZERO trajectories per duel
max_turns_per_sample = 10           # max turns per trajectory
gate_alpha           = 0.05         # bootstrap LCB gate threshold
bootstrap_resamples  = 10000
gen_temperature      = 1.0
gen_max_tokens       = 1024
gen_max_model_len    = 32768
king_chain_depth     = 5            # rolling payout: current + 4 past kings
duel_budget_s        = 11500.0      # soft wall-budget — stop launching new turns past this
win_margin           = 2.0          # challenger must beat king by ≥2 points (0–100 scale)

[preeval]
similarity_threshold = 0.95         # cosine sim ≥ this → duplicate rejection
injection_probes = 2                # fast pre-duel injection probe count
injection_judges = ["deepseek/deepseek-v4-flash"]  # fast probe judge; duel still uses [judge].models

[dataset]
repo        = "AlienKevin/SWE-ZERO-12M-trajectories"
shard_glob  = "data/train-*.parquet"
manifest_sha256 = "982a92bd..."     # dataset integrity pin
```

---

## Environment variables

### Validator
| Variable | Default | Purpose |
|---|---|---|
| `ALBEDO_NETUID` | **0** (aborts) | Must set to `97` |
| `ALBEDO_NETWORK` | `finney` | Bittensor network |
| `BT_WALLET_NAME` | `default` | Validator wallet name |
| `BT_WALLET_HOTKEY` | `default` | Validator hotkey name |
| `ALBEDO_EVAL_SERVER` | `http://localhost:9001` | Eval server endpoint (legacy alias: `ALBEDO_EVAL_SERVER_URL`) |
| `ALBEDO_R2_ENDPOINT` | — | Cloudflare R2 endpoint (private state) |
| `ALBEDO_R2_BUCKET` | — | R2 bucket name |
| `ALBEDO_R2_ACCESS_KEY` | — | R2 access key |
| `ALBEDO_R2_SECRET_KEY` | — | R2 secret key |
| `ALBEDO_DS_BUCKET` | — | Hippius public dashboard bucket |
| `ALBEDO_DS_ACCESS_KEY` | — | Hippius S3 access key |
| `ALBEDO_DS_SECRET_KEY` | — | Hippius S3 secret key |
| `ALBEDO_WEIGHT_INTERVAL` | `300` | Blocks between weight sets |
| `ALBEDO_BURN_UID` | `0` | UID to burn weights to when no registered king |
| `ALBEDO_RESET` | `0` | `1`/`true`/`yes` = one-shot reset on startup (guarded by `reset_marker.json`); `force` bypasses the guard |
| `ALBEDO_EVAL_HARD_TIMEOUT_S` | `14000` | Hard cap (s) on one eval's SSE stream before the validator kills it |
| `ALBEDO_STREAM_IDLE_WARN_S` | `600` | Warn after this many seconds of stream silence |
| `ALBEDO_STREAM_IDLE_KILL_S` | `1500` | Kill the eval after this many seconds of stream silence |

### Eval server
| Variable | Default | Purpose |
|---|---|---|
| `CHUTES_API_KEY` | required only if Chutes enabled | Legacy/optional Chutes judge auth |
| `CHUTES_BASE_URL` | `https://llm.chutes.ai/v1` | Judge API base URL |
| `OPENROUTER_API_KEY` | required | OpenRouter judge auth |
| `ALBEDO_JUDGE_FALLBACK` | `1` (from chain.toml `[judge.fallback].enabled`) | Toggle OpenRouter transport |
| `ALBEDO_JUDGE_CHUTES_ENABLED` | `1` | Set `0` on eval host for OpenRouter-only judging |
| `ALBEDO_JUDGE_MODEL_MAX_CONCURRENCY` | `6` | Default per-model in-flight cap override |
| `ALBEDO_JUDGE_MODEL_MIN_INTERVAL_S` | `0.0` | Default per-model min spacing override |
| `ALBEDO_DUEL_BUDGET_S` | `11500.0` | Soft duel wall-budget (s) |
| `ALBEDO_MAX_PARALLEL_TURNS` | `8` | Generation concurrency; eval host uses `64` for staged full-fanout tests |
| `HIPPIUS_HUB_TOKEN` | required | Model download auth (or `HIPPIUS_HUB_USERNAME`+`HIPPIUS_HUB_PASSWORD`) |
| `ALBEDO_DATASET_DIR` | `/root/albedo/dataset` | Local SWE-ZERO parquet directory |
| `ALBEDO_KING_GPUS` | `0` | CUDA_VISIBLE_DEVICES for king vLLM |
| `ALBEDO_CHAL_GPUS` | `1` | CUDA_VISIBLE_DEVICES for challenger vLLM |
| `ALBEDO_GPU_MEMORY_UTILIZATION` | `0.85` | vLLM memory fraction |
| `ALBEDO_MODEL_CACHE_DIR` | `/root/albedo/hippius_models` | Model weight cache |
| `ALBEDO_EVALS_S3_BUCKET` | — | Bucket for published eval traces |
| `ALBEDO_EVALS_S3_ENDPOINT` | — | S3 endpoint for eval traces |

### Miner
| Variable | Default | Purpose |
|---|---|---|
| `ALBEDO_NETUID` | **0** (aborts) | Must set to `97` |
| `ALBEDO_NETWORK` | `finney` | Bittensor network |
| `BT_WALLET_NAME` | `albedo` | Miner wallet name |
| `HIPPIUS_HUB_TOKEN` | required | Upload auth |
| `ALBEDO_CHALLENGER_NAMESPACE` | `miner` | Your Hippius namespace |

---

## Quick push: miner.py (all-in-one)

`miner.py` is a standalone one-shot path that runs the full miner pipeline in a single
command: discover king (from the dashboard) → download king weights → `train_or_perturb()`
→ validate locally → upload to Hippius → submit the v4 reveal on-chain. Use it for a
smoke test of the round trip; for a real submission, replace `train_or_perturb()` with
actual training (see the "Complete flow" below, or `scripts/train_sft.py`).

```bash
ALBEDO_NETUID=97 \
HIPPIUS_HUB_TOKEN=your_token \
BT_WALLET_NAME=albedo \
ALBEDO_CHALLENGER_NAMESPACE=youruser \
python miner.py --hotkey h0
```

**Flags:**
| Flag | Default | Purpose |
|---|---|---|
| `--hotkey` | `h0` | Miner hotkey (under `BT_WALLET_NAME`); also the default repo suffix |
| `--suffix` | `<hotkey>` | Repo suffix → repo is `{namespace}/albedo-qwen3-4b-{suffix}` |
| `--noise` | `0.001` | Gaussian-noise stddev for the `train_or_perturb()` stub |

Required env: `ALBEDO_NETUID=97` (aborts if `0`), `HIPPIUS_HUB_TOKEN` (upload auth).
The repo name must match `repo_pattern` (`^[^/]+/albedo-qwen3-4b-.+$`) or the run aborts.
On success: `reveal committed — validator picks up within ~30 s`.

> ⚠️ The bundled `train_or_perturb()` only copies the king and adds Gaussian noise, so it
> will **not** beat a trained king. It exists to validate the upload/reveal mechanics.

The reveal is committed via `subtensor.set_commitment(wallet, netuid, payload)` where
`payload = build_reveal_v4(ref)` → `v4|{repo}|{digest}` (3 fields; the hotkey is the
on-chain commit signer, not embedded in the payload — see `albedo/models/reveal.py`).

---

## Verify your reveal is on-chain

After pushing (via `miner.py` or `scripts/upload_challenger.py` + a manual reveal), confirm
the commitment actually landed on chain with `scripts/check_commits.py`. It's a read-only
inspector — it scans `get_all_commitments(netuid)`, decodes every `v4|...` reveal, and
prints repo / digest / reveal-block per hotkey. It does **not** touch validator state, the
king, or the dashboard.

```bash
cd /path/to/albedo
source .venv/bin/activate

# All v4 commits currently on chain
python scripts/check_commits.py

# Just your hotkey
python scripts/check_commits.py --hotkey 5GcD3Pk5XscAESmb...
```

Output is a table (`HOTKEY  REPO  DIGEST [block status]`) plus a summary line:
`scanned N commitments — v4=… non_v4=… spoofed=… invalid=…`.

- **status `ok`** — valid reveal; the validator will pick it up on its next chain scan (~20–30 s).
- **status `SPOOFED`** — 4-part reveal whose embedded author ≠ the committing chain hotkey. The validator burns these (`spoof_rejected`); re-commit with your own hotkey as the author.
- **no rows for your hotkey** — the commit didn't land (or hasn't propagated yet); re-run after a few seconds, and check the reveal string / wallet used.

Env: `ALBEDO_NETUID` (default `97`), `ALBEDO_NETWORK` (default `finney`) — same names the
validator and miner use.

---

## Complete flow: new challenger model

### Step 1 — Collect training data

Downloads all public `eval-*.jsonl.gz` traces from Hippius, extracts turns where
the challenger beat the king (`delta_avg > min_delta`), and formats them for SFT.

```bash
python scripts/collect_traces.py \
  --out       data/traces.jsonl \
  --min-delta 0.05 \
  --cache-dir data/cache
```

Each output line: `{"text": "<Qwen3 chat-formatted conversation>", "delta_avg": 0.12, "eval_id": "eval-0042"}`

The `text` field is the full conversation including the winning challenger reply,
formatted with the Qwen3 chat template. Train the model to predict this reply.

### Step 2 — Inspect dataset

```bash
python scripts/inspect_dataset.py data/traces.jsonl
```

Shows: example count, length distribution, delta histogram, and warnings for
too few examples (<200), dominant single eval (>40%), or missing `text` fields.

### Step 3 — Fine-tune

**Single A100:**
```bash
python scripts/train_sft.py \
  --base    Qwen/Qwen3-4B \
  --data    data/traces.jsonl \
  --output  checkpoints/v1 \
  --epochs  3
```

**4×A100 with DeepSpeed:**
```bash
torchrun --nproc_per_node=4 scripts/train_sft.py \
  --base       Qwen/Qwen3-4B \
  --data       data/traces.jsonl \
  --output     checkpoints/v1 \
  --deepspeed  configs/ds_zero2.json \
  --batch-size 4 \
  --grad-accum 2
```

Checkpoint saved to `checkpoints/v1/final/`.

### Step 4 — Sanity check

```bash
python scripts/sanity_check.py checkpoints/v1/final
```

Runs 3 SWE-style prompts and verifies three things: the reply is non-empty, it contains
exactly one bash block, and it contains no injected verdict JSON. A model that fails
these format checks gets a score of 0.0 from every judge on every turn.
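
The checks described above could be approximated as follows. This is a minimal sketch, not the logic of `scripts/sanity_check.py`: `check_reply_format` is a hypothetical helper, and treating any JSON object with a `"winner"` key as injected verdict JSON is an assumption made for illustration.

```python
import re

FENCE = "`" * 3  # built programmatically so the example nests cleanly in Markdown
BASH_BLOCK = re.compile(FENCE + r"bash\n.*?" + FENCE, re.DOTALL)
# Assumption: a JSON object carrying a "winner" key looks like a judge verdict.
VERDICT_LIKE = re.compile(r'\{[^{}]*"winner"\s*:')

def check_reply_format(reply: str) -> list[str]:
    """Return a list of format problems; an empty list means the reply passes."""
    problems = []
    if not reply.strip():
        problems.append("empty reply")
    if len(BASH_BLOCK.findall(reply)) != 1:
        problems.append("expected exactly one bash block")
    if VERDICT_LIKE.search(reply):
        problems.append("verdict-like JSON in reply")
    return problems

good_reply = "Let me list the files.\n" + FENCE + "bash\nls -la\n" + FENCE
print(check_reply_format(good_reply))
```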

### Step 5 — Upload to Hippius

```bash
HIPPIUS_HUB_TOKEN=your_token python scripts/upload_challenger.py \
  --model  checkpoints/v1/final \
  --repo   youruser/albedo-qwen3-4b-v1 \
  --hotkey 5GcD3Pk5XscAESmb...
```

Repo name must match `repo_pattern` from `chain.toml`:
`^[^/]+/albedo-qwen3-4b-.+$`
Valid: `alice/albedo-qwen3-4b-v1`, `bob/albedo-qwen3-4b-sft`
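
To check a repo name locally before uploading, the pattern can be applied directly with Python's `re` module. The `repo_name_ok` helper is hypothetical; only the pattern string comes from `chain.toml`.

```python
import re

# Pattern copied from chain.toml [chain].repo_pattern.
REPO_PATTERN = re.compile(r"^[^/]+/albedo-qwen3-4b-.+$")

def repo_name_ok(repo: str) -> bool:
    """Return True when the repo name matches the challenger naming pattern."""
    return REPO_PATTERN.match(repo) is not None

for name in ("alice/albedo-qwen3-4b-v1", "alice/albedo-qwen3-8b-v1", "albedo-qwen3-4b-v1"):
    print(name, repo_name_ok(name))
```

The second name fails because it does not contain `qwen3-4b`, and the third fails because it has no namespace before the slash.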

### Step 6 — Submit reveal on-chain

The script prints the reveal string. Submit it:

```python
import bittensor as bt
wallet    = bt.Wallet(name="default", hotkey="default")
subtensor = bt.Subtensor(network="finney")
result    = subtensor.commit(wallet, 97, "v4|youruser/albedo-qwen3-4b-v1|sha256:...|5GcD3...")
print("Committed:", result)
```

### Step 7 — Monitor

```bash
# Watch for your eval in the validator log
ssh templar "tail -f ~/.pm2/logs/albedo-validator-error.log | grep --line-buffered 'youruser\|albedo-qwen3'"

# Dashboard
python3 -c "
import json, urllib.request
d = json.load(urllib.request.urlopen('https://us-east-1.hippius.com/albedo/dashboard.json'))
print('King:', d['king'].get('model_repo'), '| Queue:', len(d.get('queue', [])))
"
```

---

## Validator flow (internal)

```
chain scan (every ~20s)
  scan_reveals()
    → accept 4-part (v4|repo|digest|author_hotkey) and legacy 3-part (v4|repo|digest)
    → filter seen hotkeys, completed repos, king hotkeys
    → spoof detection (4-part only): author_hotkey ≠ chain_hotkey → burn + record_failure("spoof_rejected")
  enqueue()
    → 1-hotkey-1-eval enforcement
    → same repo from two hotkeys → second blocked at enqueue
  process_challenge()
    → validate_challenger_config() (9 checks — repo pattern, sha256, arch, locks, no .py, etc.)
    → _compute_seed(reveal_block)  ← uses commit block, not current block
    → POST /eval  (SSE stream)
    →   eval server: check_fingerprint (cosine sim ≥ 0.95 → duplicate_model; same-hotkey skipped)
    →   eval server: probe_injection (fast profile: 2 turns × deepseek/deepseek-v4-flash; unresolved → untested, fail-closed)
    →   eval server: run_duel (staged: sample 128 → generate all king/challenger replies → judge fanout → aggregate)
    →   eval server: gate-1 (challenger_score − king_score ≥ win_margin=2.0)
    →   eval server: gate-2 (paired bootstrap LCB > 0 at α=0.05)
    →   verdict SSE event
    → accepted → state.set_king() + /set_king on eval + /prune_cache + set_weights (wait_for_inclusion=False)
    → always: state.close_eval() in finally
```

**Key guarantees:**
- `close_eval()` is always called (finally block) even if `set_king()` raises
- `uid_map` is snapshotted before first `await` in `maybe_set_weights` (TOCTOU fix)
- King hotkeys excluded from `scan_reveals` via explicit `king_hotkeys` set (genesis king fix)
- `seen_hotkeys` S3 failure → `RuntimeError` raised, validator refuses to start
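
The two acceptance gates in the flow above (win margin, then bootstrap LCB) can be sketched as below. This is an illustration under assumptions, not the code in `albedo/stats.py`: it uses a simple one-sided percentile bootstrap over per-sample deltas, and both `percentile_bootstrap_lcb` and `passes_gates` are hypothetical names.

```python
import random
from statistics import fmean

def percentile_bootstrap_lcb(deltas, alpha=0.05, resamples=10_000, seed=0):
    """One-sided percentile lower bound on the mean of paired deltas."""
    rng = random.Random(seed)
    n = len(deltas)
    # Resample the paired deltas with replacement and record each mean.
    means = sorted(fmean(rng.choices(deltas, k=n)) for _ in range(resamples))
    return means[int(alpha * resamples)]

def passes_gates(chal_score, king_score, deltas,
                 win_margin=2.0, alpha=0.05, resamples=10_000):
    """Return (accepted, lcb): both the margin and the LCB gate must pass."""
    margin_ok = (chal_score - king_score) >= win_margin
    lcb = percentile_bootstrap_lcb(deltas, alpha=alpha, resamples=resamples)
    return (margin_ok and lcb > 0), lcb

# Illustrative per-sample deltas (challenger minus king), all positive.
example_deltas = [0.05, 0.10, 0.02, 0.08] * 8
accepted, lcb = passes_gates(56.2, 43.8, example_deltas, resamples=2000)
print(accepted, lcb > 0)
```

Requiring both gates means a challenger cannot win on a large but noisy average alone: the margin gate checks the size of the improvement, and the LCB gate checks that it is consistent across samples.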

---

## Eval server flow (internal)

```
POST /eval (EvalRequest)
  ├─ 409 if eval_lock.locked()
  ├─ parse king/challenger ModelRef from req (repo + digest keys)
  └─ _stream() generator:
       acquire eval_lock
       Gate 1: materialize challenger → start vLLM → wait_healthy()
               (emits SSE phase:"materialize" keepalive) — on failure → error verdict SSE, return
       Gate 2: check_fingerprint(chal_dir, STATE.fingerprints, hotkey)
               same-hotkey entries skipped (own model not a dup); is_dup=True → duplicate verdict SSE, return
       Gate 3: probe_injection(challenger_url, eval_id, dataset_dir)
               (emits SSE phase:"probe" keepalive) — not is_clean → injection verdict SSE, return
       resolve_model_names() once (not per turn)
       run_duel(samples, king_client, chal_client, ...)
         emit phase:"generate" and generate all king/challenger replies first
           (max_parallel controls vLLM generation fanout; eval host uses 64)
         emit phase:"judge" and fan out all generated turn judge prompts
           using per-model OpenRouter semaphores from [judge.rate_limits]
         turn SSE events stream back as judged turns finish
         aggregate_duel(): metric-first mean → per-judge score → challenger_score vs king_score
         gate-1: (challenger_score - king_score) >= win_margin (2.0 points, 0–100 scale)
         gate-2: paired_bootstrap_lcb() LCB > 0 at gate_alpha=0.05
       emit verdict SSE
       add challenger fingerprint (background task, tagged with hotkey)
       sink.flush() → publish trace to S3

POST /set_king ({"king": {"repo": ..., "digest": ...}})
  ├─ idempotency: same model already alive → return ok
  ├─ materialize_model() (await — yield point)
  ├─ eval_lock.locked() check AFTER materialize (TOCTOU fix)
  ├─ king_proc.start() + wait_healthy()
  └─ fingerprint king in background
```
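
The duplicate check in Gate 2 can be pictured as a cosine-similarity comparison that skips entries from the submitting hotkey. The sketch below treats fingerprints as opaque float vectors; how `albedo/preeval/fingerprint.py` actually builds them is not shown here, and `is_duplicate` with its `(hotkey, vector)` list is a hypothetical shape.

```python
import math

SIMILARITY_THRESHOLD = 0.95  # [preeval].similarity_threshold

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_duplicate(candidate, known, hotkey):
    """known: list of (hotkey, vector). Same-hotkey entries are skipped."""
    for owner, vector in known:
        if owner == hotkey:
            continue  # a miner's own earlier model is not counted as a duplicate
        if cosine_similarity(candidate, vector) >= SIMILARITY_THRESHOLD:
            return True
    return False

known = [("hk_a", [1.0, 0.0, 0.0])]
print(is_duplicate([1.0, 0.01, 0.0], known, "hk_b"))
```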

---

## Judge calls: OpenRouter-first (`albedo/judge/client.py`)

`ChutesJudge.query_judges()` is the shared judge transport for both duel scoring and the
injection probe. Production eval is OpenRouter-only by setting
`ALBEDO_JUDGE_CHUTES_ENABLED=0`; the Chutes stream-gated path remains available but is
not used on the eval host.

**Current duel panel (`[judge].models`):**
- `minimax/minimax-m3`
- `deepseek/deepseek-v4-flash`
- `z-ai/glm-5.1`

**Probe panel (`[preeval]`):** two injection probe turns judged only by
`deepseek/deepseek-v4-flash` for fast fail-closed preflight. Duel scoring still uses the
full 3-judge panel.

**OpenRouter resolution (`_openrouter`):** judge ids are remapped through
`[judge.fallback.model_map]`; `or_timeout_s=60.0s`, `or_retries=3`, and
`judge_total_s=180.0s` cap one judge resolution. Exhaustion leaves that judge
**unresolved** (`None` → parse failure for scoring; fail-closed `untested` for probes).

**Verdict parsing:** `parse_metric_verdict()` accepts raw JSON, fenced `json` code blocks,
common metric-key aliases (e.g. `Groundedness`, `Task Progress`), and common winner labels
(`model 2`, `candidate B`, `challenger`, `draw`, etc.). The parser scores only final
assistant content, not reasoning text.
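
The first step of that parsing, accepting either raw or fenced JSON, might look like the sketch below. It is a simplified stand-in rather than the real parser: `extract_verdict_json` is a hypothetical name, and alias handling for metric keys and winner labels is left out.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the example nests cleanly in Markdown
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)

def extract_verdict_json(content: str):
    """Return the verdict dict from raw or fenced JSON, or None if unparseable."""
    match = FENCED_JSON.search(content)
    candidate = match.group(1) if match else content.strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    # Only a JSON object is a usable verdict; lists and scalars are rejected.
    return parsed if isinstance(parsed, dict) else None

fenced = "Here is my verdict:\n" + FENCE + 'json\n{"winner": "challenger"}\n' + FENCE
print(extract_verdict_json(fenced))
```

Returning `None` instead of raising keeps an unparseable reply distinguishable from a valid verdict, which matches the unresolved-judge handling described above.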

**Per-model request shaping** (`[judge.rate_limits]`): each judge gets its own
OpenRouter semaphore. The active eval config gives all three judges `max_concurrency=64`
for staged fanout.
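
A per-model concurrency cap of this kind is commonly built from one `asyncio.Semaphore` per judge. The sketch below shows the pattern with a fake transport; `call_judge`, `fan_out`, and `fake_send` are hypothetical names, and the limit of 2 is chosen only to make the cap visible.

```python
import asyncio

async def call_judge(model, prompt, semaphores, send):
    """Hold the model's semaphore for the duration of one request."""
    async with semaphores[model]:
        return await send(model, prompt)

async def fan_out(prompts, limits, send):
    """Send every prompt to every judge, capped per model by `limits`."""
    semaphores = {m: asyncio.Semaphore(n) for m, n in limits.items()}
    tasks = [call_judge(m, p, semaphores, send) for p in prompts for m in limits]
    return await asyncio.gather(*tasks)

async def demo():
    live, peak = 0, 0

    async def fake_send(model, prompt):  # stand-in for a real HTTP call
        nonlocal live, peak
        live += 1
        peak = max(peak, live)
        await asyncio.sleep(0.01)
        live -= 1
        return f"{model}:{prompt}"

    results = await fan_out(["t1", "t2", "t3", "t4"], {"judge-a": 2}, fake_send)
    return results, peak

results, peak = asyncio.run(demo())
print(len(results), "replies, peak concurrency", peak)
```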

---

## Admission gates (what gets a challenger rejected)

`validate_challenger_config()` in `albedo/validator/admission.py` runs these checks
in order before dispatching to the GPU eval box:

1. Repo name matches `REPO_PATTERN` from `chain.toml`
2. Digest starts with `sha256:` (not `hf:`)
3. `config.json` downloadable from Hippius (config-only fetch — fast)
4. `architectures` field matches king (`["Qwen3ForCausalLM"]`)
5. All lock keys match king: `vocab_size`, `model_type`, `max_position_embeddings`, `tie_word_embeddings`, `rope_theta`, `hidden_size`, `num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `intermediate_size`, `head_dim`
6. No `auto_map` key in `config.json`
7. No `quantization_config` key in `config.json` — quantized models rejected
8. No `*.py` files in the repo
9. At least one `.safetensors` file

The capacity keys (`hidden_size`, `num_hidden_layers`, etc.) lock the competition to the **Qwen3-4B** size class — challengers must match the king's architecture exactly.
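
Checks 4 to 7 reduce to comparing two `config.json` dicts. The sketch below is a hedged approximation, not the body of `validate_challenger_config()`: `config_violations` is a hypothetical helper, and the sample configs contain only a few illustrative keys.

```python
LOCK_KEYS = (
    "vocab_size", "model_type", "max_position_embeddings", "tie_word_embeddings",
    "rope_theta", "hidden_size", "num_hidden_layers", "num_attention_heads",
    "num_key_value_heads", "intermediate_size", "head_dim",
)
BANNED_KEYS = ("auto_map", "quantization_config")

def config_violations(king_cfg: dict, chal_cfg: dict) -> list[str]:
    """Return reasons the challenger config would be rejected (empty = pass)."""
    problems = []
    if chal_cfg.get("architectures") != king_cfg.get("architectures"):
        problems.append("architectures mismatch")
    for key in LOCK_KEYS:
        if chal_cfg.get(key) != king_cfg.get(key):
            problems.append(f"lock key mismatch: {key}")
    for key in BANNED_KEYS:
        if key in chal_cfg:
            problems.append(f"banned key present: {key}")
    return problems

# Illustrative, partial configs; a real config.json carries every lock key.
king = {"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3",
        "max_position_embeddings": 40960, "rope_theta": 1000000.0}
challenger = dict(king, rope_theta=500000.0)
print(config_violations(king, challenger))
```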

---

## Eval data (public duel traces)

```
https://us-east-1.hippius.com/albedo/evals/YYYY-MM-DD/eval-NNNN.jsonl.gz
```

Each file is gzip JSONL. Per-turn fields relevant for training:
- `messages_prefix` — conversation history up to this turn
- `messages_prompt` — current user turn
- `chal_reply` — the challenger's response (training target)
- `king_reply` — the king's response (for DPO preference pairs)
- `delta_avg` — `chal_score - king_score` averaged across judges
- `parse_ok` — False if any judge failed to parse a verdict
- `vllm_error` — set if vLLM failed to generate (turn not valid for training)

Filter: `delta_avg > 0.05` and `parse_ok=True` and `vllm_error is None` for clean SFT data.
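
That filter could be applied to a downloaded trace file roughly as follows. The helper names `is_clean` and `load_clean_turns` are hypothetical, and the sketch assumes one JSON object per line with the fields listed above.

```python
import gzip
import json

def is_clean(turn: dict, min_delta: float = 0.05) -> bool:
    """True when a turn passes the delta, parse, and vLLM-error filters."""
    return (
        (turn.get("delta_avg") or 0.0) > min_delta
        and turn.get("parse_ok") is True
        and turn.get("vllm_error") is None
    )

def load_clean_turns(path: str, min_delta: float = 0.05) -> list[dict]:
    """Read a gzip JSONL trace file and keep only the clean turns."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        turns = (json.loads(line) for line in fh if line.strip())
        return [t for t in turns if is_clean(t, min_delta)]

sample = {"delta_avg": 0.12, "parse_ok": True, "vllm_error": None}
print(is_clean(sample))
```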

---

## Launching a fresh competition (genesis reset)

```bash
# 1. Train or choose genesis model, upload to Hippius
python scripts/seed_genesis.py --hf-model Qwen/Qwen3-4B --repo yourorg/albedo-qwen3-4b-genesis
# — or for custom fine-tune:
python scripts/seed_genesis.py --local-dir checkpoints/genesis/final --repo yourorg/albedo-qwen3-4b-genesis

# 2. Paste printed digest into chain.toml [seed].seed_digest
# 3. Deploy updated code, then trigger an automated one-shot reset on next start:
git push
ssh templar "cd /path/to/albedo && git pull && ALBEDO_RESET=1 pm2 restart albedo-validator --update-env"

# 4. Verify — logs should show: "crowned new king: reign=#0"
```

---

## Key API contracts

**Reveal format:** `v4|{repo}|{digest}|{author_hotkey_ss58}` (4-part, spoof-checked).
Legacy 3-part `v4|{repo}|{digest}` is also accepted — the chain hotkey is treated as the author.
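
The two accepted shapes and the spoof rule can be sketched as a small parser. This is an illustration only: `parse_reveal` is a hypothetical function, and the real decoding lives in `albedo/models/reveal.py` and `albedo/validator/chain.py`.

```python
def parse_reveal(payload: str, chain_hotkey: str):
    """Parse a v4 reveal; return None if malformed, else a dict with a spoof flag."""
    parts = payload.split("|")
    if parts[0] != "v4" or len(parts) not in (3, 4):
        return None
    repo, digest = parts[1], parts[2]
    # Legacy 3-part reveals carry no author, so the committing hotkey is used.
    author = parts[3] if len(parts) == 4 else chain_hotkey
    return {
        "repo": repo,
        "digest": digest,
        "author": author,
        "spoofed": author != chain_hotkey,
    }

legacy = parse_reveal("v4|alice/albedo-qwen3-4b-v1|sha256:abc", "hk_alice")
print(legacy["author"], legacy["spoofed"])
```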

**EvalRequest body (POST /eval):**
```json
{
  "eval_id":    "eval-000042",
  "hotkey":     "5GcD3...",
  "seed_hex":   "a3f9c2...",
  "king":       {"repo": "org/albedo-qwen3-4b-v1", "digest": "sha256:..."},
  "challenger": {"repo": "org/albedo-qwen3-4b-v2", "digest": "sha256:..."}
}
```

**Verdict SSE event (event: verdict):**
```json
{
  "eval_id":          "eval-000042",
  "accepted":         true,
  "n_done":           128,
  "n_valid":          61,
  "vllm_errors":      3,
  "challenger_score": 56.2,
  "king_score":       43.8,
  "winner":           "challenger",
  "by_judge":         {"minimax/minimax-m3": 57.4, "deepseek/deepseek-v4-flash": 55.1, ...},
  "by_metric":        {"correctness": 55.1, "grounding": 57.0, "progress": 56.8, "protocol": 55.9, "efficiency": 56.2},
  "win_margin":       2.0,
  "margin_ok":        true,
  "mean_delta":       0.062,
  "lcb":              0.031,
  "se":               0.018,
  "gate_alpha":       0.05,
  "gate_lcb":         true,
  "judge_models":     ["minimax/minimax-m3", "deepseek/deepseek-v4-flash", "z-ai/glm-5.1"]
}
```

**set_king body (POST /set_king):**
```json
{"king": {"repo": "org/albedo-qwen3-4b-v1", "digest": "sha256:..."}}
```

---

## Links

- **Dashboard JSON:** https://us-east-1.hippius.com/albedo/dashboard.json
- **Dashboard UI:** https://us-east-1.hippius.com/albedo/index.html
- **Eval traces:** https://us-east-1.hippius.com/albedo/evals/
- **SWE-ZERO dataset:** https://huggingface.co/datasets/AlienKevin/SWE-ZERO-12M-trajectories
- **Hippius registry:** https://registry.hippius.com