# Albedo

> Trajectory-distillation king-of-the-hill subnet on Bittensor (netuid 97). Miners submit
> Qwen3-4B challengers locked to the king's size class; the reigning king and challenger duel
> on SWE-ZERO coding trajectories judged by an ensemble of LLMs. Full duel traces are
> published for downstream distillation.

**Subnet:** `ALBEDO_NETUID=97` (finney mainnet)

---

## Repository layout

```
├── albedo/                   ← main package (all shared logic)
│   ├── config.py             Single source of truth; reads chain.toml at import time
│   ├── stats.py              Pure math: paired_bootstrap_lcb, aggregate_duel
│   │
│   ├── storage/
│   │   ├── r2.py             R2 (Cloudflare) client: validator-private state
│   │   ├── hippius.py        Hippius S3 client: public dashboard writes
│   │   ├── store.py          ObjectStore: composes r2 + hippius
│   │   └── state.py          State: all mutable validator state + close_eval()
│   │
│   ├── models/
│   │   ├── ref.py            ModelRef dataclass (repo@digest)
│   │   ├── reveal.py         build_reveal_v4 / parse_reveal_v4 ("v4|repo|digest|hotkey")
│   │   ├── download.py       materialize_model: download to local cache
│   │   ├── upload.py         upload_model_folder: push weights to Hippius
│   │   └── template.py       ensure_chat_template + scrub_tokenizer_config
│   │
│   ├── judge/
│   │   ├── rubric.py         RUBRIC_SYSTEM + PROBE_SYSTEM prompt strings (only)
│   │   ├── verdict.py        Verdict dataclass + parse_verdict()
│   │   └── client.py         ChutesJudge: OpenRouter/Chutes transport, concurrency, retry/deadline handling
│   │
│   ├── preeval/
│   │   ├── fingerprint.py    L2-norm near-duplicate detection (cosine sim ≥ 0.95)
│   │   └── injection.py      LLM injection probe (fast probe profile, separate seed): probe_batch, fail-closed → untested
│   │
│   ├── duel/
│   │   ├── sampler.py        TrajectoryDataset + Sample: deterministic SWE-ZERO sampling
│   │   ├── turn.py           generate_turn / judge_generated_turn: separated staged eval units
│   │   └── stream.py         run_duel: staged SSE eval (generate all → judge fanout → aggregate)
│   │
│   ├── validator/
│   │   ├── chain.py          scan_reveals: decode v4 on-chain commitments
│   │   ├── admission.py      validate_challenger_config: 9 checks before GPU duel
│   │   ├── faults.py         is_infra_failure / is_miner_fault: classify eval failures
│   │   ├── weights.py        maybe_set_weights: emission split with uid_map snapshot
│   │   ├── challenge.py      process_challenge: one challenger end-to-end + /prune_cache after crown
│   │   ├── website.py        upload_website: push dashboard HTML to Hippius
│   │   ├── reset.py          run_reset: automated one-shot competition reset (ALBEDO_RESET)
│   │   └── loop.py           main(), _startup(), _main_loop()
│   │
│   └── eval_server/
│       ├── vllm.py           VLLMProcess: manages one vLLM subprocess
│       ├── server_state.py   EvalState singleton (king_proc, chal_proc, eval_lock)
│       ├── endpoints.py      FastAPI: /health, /set_king, /eval, /prune_cache
│       └── sink.py           DatasetSink: publish eval traces to S3
│
├── eval.py                   Thin entrypoint → albedo.eval_server (PM2 target)
├── validator.py              Thin entrypoint → albedo.validator (PM2 target)
├── miner.py                  Standalone miner: discover king → train → upload → reveal
├── chain.toml                Active chain config (single source of truth for all params)
├── archs/qwen3/__init__.py   Arch shim: imports transformers so HF Auto* resolves without trust_remote_code
│
├── scripts/
│   ├── collect_traces.py     Step 1: download public duel traces, format for training
│   ├── inspect_dataset.py    Step 2: validate dataset health before GPU hours
│   ├── train_sft.py          Step 3: full SFT fine-tune (single/multi-GPU)
│   ├── sanity_check.py       Step 4: run 3 SWE prompts, check format before upload
│   ├── upload_challenger.py  Step 5: upload checkpoint to Hippius + print reveal string
│   ├── seed_genesis.py       Operator: upload a new genesis king, print chain.toml snippet
│   └── reset_state.py        Operator: wipe all R2 validator state for fresh competition
│
└── configs/ds_zero2.json     DeepSpeed ZeRO-2 config for multi-GPU training
```

---

## chain.toml: configuration reference

All runtime parameters come from here. `albedo/config.py` loads it at import time.
```toml
[chain]
name         = "Albedo"
seed_repo    = "yourorg/albedo-qwen3-4b-genesis"   # genesis king Hippius repo
repo_pattern = "^[^/]+/albedo-qwen3-4b-.+$"        # challenger naming: must include "qwen3-4b"

[arch]
module = "archs.qwen3"                 # HF shim: no trust_remote_code needed
extra_lock_keys = [
    # Architecture identity
    "max_position_embeddings",         # 40960 for Qwen3-4B
    "tie_word_embeddings",             # false
    "rope_theta",                      # 1000000.0
    # Capacity: locks competition to the 4B size class
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "intermediate_size",
    "head_dim",
]
# vocab_size and model_type are always locked (from albedo/config.py COMPAT_KEYS)

[seed]
tokenizer_repo = "Qwen/Qwen3-4B"       # HF tokenizer for eval server
seed_digest    = "sha256:..."          # fill after: python scripts/seed_genesis.py

[judge]
models = [                             # T=3 independent OpenRouter judges
    "minimax/minimax-m3",
    "deepseek/deepseek-v4-flash",
    "z-ai/glm-5.1",
]
base_url_env = "CHUTES_BASE_URL"       # legacy Chutes base URL; disabled in production eval via ALBEDO_JUDGE_CHUTES_ENABLED=0
api_key_env  = "CHUTES_API_KEY"        # only required when Chutes is enabled
tie_band = 0.01
call_timeout_s = 300.0
max_429_wait_s = 60.0
max_429_retries = 8
retry_max = 3
retry_initial_backoff_s = 1.5
max_concurrency_per_model = 6
min_interval_s_per_model = 0.0
# --- Unified transport v2: OpenRouter-only in production, with Chutes still supported as a fallback path. ---
chutes_try_s = 2.0
chutes_max_s = 150.0
or_timeout_s = 60.0
or_retries = 3
judge_total_s = 180.0
chutes_giveup_tasks = 2

[judge.rate_limits]                    # high fanout for staged judge phase
"minimax/minimax-m3"         = { max_concurrency = 64, min_interval_s = 0.0 }
"deepseek/deepseek-v4-flash" = { max_concurrency = 64, min_interval_s = 0.0 }
"z-ai/glm-5.1"               = { max_concurrency = 64, min_interval_s = 0.0 }

[judge.fallback]                       # OpenRouter provider
enabled = true
base_url = "https://openrouter.ai/api" # client appends /v1/chat/completions
api_key_env = "OPENROUTER_API_KEY"
chutes_429_grace_s = 4.0
reasoning_models = []

[judge.fallback.model_map]             # judge id → OpenRouter model id
"minimax/minimax-m3"         = "minimax/minimax-m3"
"deepseek/deepseek-v4-flash" = "deepseek/deepseek-v4-flash"
"z-ai/glm-5.1"               = "z-ai/glm-5.1"

[duel]
n_samples = 128                        # SWE-ZERO trajectories per duel
max_turns_per_sample = 10              # max turns per trajectory
gate_alpha = 0.05                      # bootstrap LCB gate threshold
bootstrap_resamples = 10000
gen_temperature = 1.0
gen_max_tokens = 1024
gen_max_model_len = 32768
king_chain_depth = 5                   # rolling payout: current + 4 past kings
duel_budget_s = 11500.0                # soft wall-budget: stop launching new turns past this
win_margin = 2.0                       # challenger must beat king by ≥ 2 points (0–100 scale)

[preeval]
similarity_threshold = 0.95            # cosine sim ≥ this → duplicate rejection
injection_probes = 2                   # fast pre-duel injection probe count
injection_judges = ["deepseek/deepseek-v4-flash"]  # fast probe judge; duel still uses [judge].models

[dataset]
repo = "AlienKevin/SWE-ZERO-12M-trajectories"
shard_glob = "data/train-*.parquet"
manifest_sha256 = "982a92bd..."        # dataset integrity pin
```

---

## Environment variables

### Validator

| Variable | Default | Purpose |
|---|---|---|
| `ALBEDO_NETUID` | **0** (aborts) | Must set to `97` |
| `ALBEDO_NETWORK` | `finney` | Bittensor network |
| `BT_WALLET_NAME` | `default` | Validator wallet name |
| `BT_WALLET_HOTKEY` | `default` | Validator hotkey name |
| `ALBEDO_EVAL_SERVER` | `http://localhost:9001` | Eval server endpoint (legacy alias: `ALBEDO_EVAL_SERVER_URL`) |
| `ALBEDO_R2_ENDPOINT` | (none) | Cloudflare R2 endpoint (private state) |
| `ALBEDO_R2_BUCKET` | (none) | R2 bucket name |
| `ALBEDO_R2_ACCESS_KEY` | (none) | R2 access key |
| `ALBEDO_R2_SECRET_KEY` | (none) | R2 secret key |
| `ALBEDO_DS_BUCKET` | (none) | Hippius public dashboard bucket |
| `ALBEDO_DS_ACCESS_KEY` | (none) | Hippius S3 access key |
| `ALBEDO_DS_SECRET_KEY` | (none) | Hippius S3 secret key |
| `ALBEDO_WEIGHT_INTERVAL` | `300` | Blocks between weight sets |
| `ALBEDO_BURN_UID` | `0` | UID to burn weights to when no registered king |
| `ALBEDO_RESET` | `0` | `1`/`true`/`yes` = one-shot reset on startup (guarded by `reset_marker.json`); `force` bypasses the guard |
| `ALBEDO_EVAL_HARD_TIMEOUT_S` | `14000` | Hard cap (s) on one eval's SSE stream before the validator kills it |
| `ALBEDO_STREAM_IDLE_WARN_S` | `600` | Warn after this many seconds of stream silence |
| `ALBEDO_STREAM_IDLE_KILL_S` | `1500` | Kill the eval after this many seconds of stream silence |

### Eval server

| Variable | Default | Purpose |
|---|---|---|
| `CHUTES_API_KEY` | required only if Chutes enabled | Legacy/optional Chutes judge auth |
| `CHUTES_BASE_URL` | `https://llm.chutes.ai/v1` | Judge API base URL |
| `OPENROUTER_API_KEY` | required | OpenRouter judge auth |
| `ALBEDO_JUDGE_FALLBACK` | `1` (from chain.toml `[judge.fallback].enabled`) | Toggle OpenRouter transport |
| `ALBEDO_JUDGE_CHUTES_ENABLED` | `1` | Set `0` on eval host for OpenRouter-only judging |
| `ALBEDO_JUDGE_MODEL_MAX_CONCURRENCY` | `6` | Default per-model in-flight cap override |
| `ALBEDO_JUDGE_MODEL_MIN_INTERVAL_S` | `0.0` | Default per-model min spacing override |
| `ALBEDO_DUEL_BUDGET_S` | `11500.0` | Soft duel wall-budget (s) |
| `ALBEDO_MAX_PARALLEL_TURNS` | `8` | Generation concurrency; eval host uses `64` for staged full-fanout tests |
| `HIPPIUS_HUB_TOKEN` | required | Model download auth (or `HIPPIUS_HUB_USERNAME`+`HIPPIUS_HUB_PASSWORD`) |
| `ALBEDO_DATASET_DIR` | `/root/albedo/dataset` | Local SWE-ZERO parquet directory |
| `ALBEDO_KING_GPUS` | `0` | CUDA_VISIBLE_DEVICES for king vLLM |
| `ALBEDO_CHAL_GPUS` | `1` | CUDA_VISIBLE_DEVICES for challenger vLLM |
| `ALBEDO_GPU_MEMORY_UTILIZATION` | `0.85` | vLLM memory fraction |
| `ALBEDO_MODEL_CACHE_DIR` | `/root/albedo/hippius_models` | Model weight cache |
| `ALBEDO_EVALS_S3_BUCKET` | (none) | Bucket for published eval traces |
| `ALBEDO_EVALS_S3_ENDPOINT` | (none) | S3 endpoint for eval traces |

### Miner

| Variable | Default | Purpose |
|---|---|---|
| `ALBEDO_NETUID` | **0** (aborts) | Must set to `97` |
| `ALBEDO_NETWORK` | `finney` | Bittensor network |
| `BT_WALLET_NAME` | `albedo` | Miner wallet name |
| `HIPPIUS_HUB_TOKEN` | required | Upload auth |
| `ALBEDO_CHALLENGER_NAMESPACE` | `miner` | Your Hippius namespace |

---

## Quick push: miner.py (all-in-one)

`miner.py` is a standalone one-shot path that runs the full miner pipeline in a single command: discover king (from the dashboard) → download king weights → `train_or_perturb()` → validate locally → upload to Hippius → submit the v4 reveal on-chain. Use it for a smoke test of the round trip; for a real submission, replace `train_or_perturb()` with actual training (see the "Complete flow" below, or `scripts/train_sft.py`).
```bash
ALBEDO_NETUID=97 \
HIPPIUS_HUB_TOKEN=your_token \
BT_WALLET_NAME=albedo \
ALBEDO_CHALLENGER_NAMESPACE=youruser \
python miner.py --hotkey h0
```

**Flags:**

| Flag | Default | Purpose |
|---|---|---|
| `--hotkey` | `h0` | Miner hotkey (under `BT_WALLET_NAME`); also the default repo suffix |
| `--suffix` | `""` | Repo suffix → repo is `{namespace}/albedo-qwen3-4b-{suffix}` |
| `--noise` | `0.001` | Gaussian-noise stddev for the `train_or_perturb()` stub |

Required env: `ALBEDO_NETUID=97` (aborts if `0`), `HIPPIUS_HUB_TOKEN` (upload auth). The repo name must match `repo_pattern` (`^[^/]+/albedo-qwen3-4b-.+$`) or the run aborts. On success: `reveal committed — validator picks up within ~30 s`.

> ⚠️ The bundled `train_or_perturb()` only copies the king and adds Gaussian noise, so it
> will **not** beat a trained king. It exists to validate the upload/reveal mechanics.

The reveal is committed via `subtensor.set_commitment(wallet, netuid, payload)` where `payload = build_reveal_v4(ref)` → `v4|{repo}|{digest}` (3 fields; the hotkey is the on-chain commit signer, not embedded in the payload; see `albedo/models/reveal.py`).

---

## Verify your reveal is on-chain

After pushing (via `miner.py` or `scripts/upload_challenger.py` + a manual reveal), confirm the commitment actually landed on chain with `scripts/check_commits.py`. It is a read-only inspector: it scans `get_all_commitments(netuid)`, decodes every `v4|...` reveal, and prints repo / digest / reveal-block per hotkey. It does **not** touch validator state, the king, or the dashboard.

```bash
cd /path/to/albedo
source .venv/bin/activate

# All v4 commits currently on chain
python scripts/check_commits.py

# Just your hotkey
python scripts/check_commits.py --hotkey 5GcD3Pk5XscAESmb...
```

Output is a table (`HOTKEY REPO DIGEST [block status]`) plus a summary line: `scanned N commitments — v4=… non_v4=… spoofed=… invalid=…`.
- **status `ok`**: valid reveal; the validator will pick it up on its next chain scan (~20–30 s).
- **status `SPOOFED`**: 4-part reveal whose embedded author ≠ the committing chain hotkey. The validator burns these (`spoof_rejected`); re-commit with your own hotkey as the author.
- **no rows for your hotkey**: the commit didn't land (or hasn't propagated yet); re-run after a few seconds, and check the reveal string and the wallet used.

Env: `ALBEDO_NETUID` (default `97`), `ALBEDO_NETWORK` (default `finney`). These are the same names the validator and miner use.

---

## Complete flow: new challenger model

### Step 1: Collect training data

Downloads all public `eval-*.jsonl.gz` traces from Hippius, extracts turns where the challenger beat the king (`delta_avg > min_delta`), and formats them for SFT.

```bash
python scripts/collect_traces.py \
    --out data/traces.jsonl \
    --min-delta 0.05 \
    --cache-dir data/cache
```

Each output line: `{"text": "...", "delta_avg": 0.12, "eval_id": "eval-0042"}`

The `text` field is the full conversation including the winning challenger reply, formatted with the Qwen3 chat template. Train the model to predict this reply.

### Step 2: Inspect dataset

```bash
python scripts/inspect_dataset.py data/traces.jsonl
```

Shows: example count, length distribution, delta histogram, and warnings for too few examples (<200), a dominant single eval (>40%), or missing `text` fields.

### Step 3: Fine-tune

**Single A100:**

```bash
python scripts/train_sft.py \
    --base Qwen/Qwen3-4B \
    --data data/traces.jsonl \
    --output checkpoints/v1 \
    --epochs 3
```

**4×A100 with DeepSpeed:**

```bash
torchrun --nproc_per_node=4 scripts/train_sft.py \
    --base Qwen/Qwen3-4B \
    --data data/traces.jsonl \
    --output checkpoints/v1 \
    --deepspeed configs/ds_zero2.json \
    --batch-size 4 \
    --grad-accum 2
```

Checkpoint saved to `checkpoints/v1/final/`.
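When comparing the single-GPU and 4×A100 runs, it helps to know how many sequences feed one optimizer step. The sketch below works through that arithmetic for the multi-GPU command above. It assumes `--batch-size` is a per-device value and that `train_sft.py` uses plain data parallelism; neither is confirmed by this README, and `effective_batch_size` is a hypothetical helper, not part of the repo.

```python
def effective_batch_size(n_gpus: int, per_device_batch: int, grad_accum: int) -> int:
    """Sequences contributing to one optimizer step under data parallelism.

    Assumption: the batch size flag is per device (not global).
    """
    return n_gpus * per_device_batch * grad_accum


# Values taken from the 4xA100 command: --nproc_per_node=4 --batch-size 4 --grad-accum 2
multi_gpu = effective_batch_size(n_gpus=4, per_device_batch=4, grad_accum=2)
print(multi_gpu)  # 32
```

If the single-GPU run should see the same effective batch, one option is to raise `--grad-accum` there until the product matches.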
### Step 4: Sanity check

```bash
python scripts/sanity_check.py checkpoints/v1/final
```

Runs 3 SWE-style prompts and verifies: non-empty reply, exactly one bash block, no injected verdict JSON. A model that fails the format check gets a score of 0.0 from every judge on every turn.

### Step 5: Upload to Hippius

```bash
HIPPIUS_HUB_TOKEN=your_token python scripts/upload_challenger.py \
    --model checkpoints/v1/final \
    --repo youruser/albedo-qwen3-4b-v1 \
    --hotkey 5GcD3Pk5XscAESmb...
```

Repo name must match `repo_pattern` from `chain.toml`: `^[^/]+/albedo-qwen3-4b-.+$`

Valid: `alice/albedo-qwen3-4b-v1`, `bob/albedo-qwen3-4b-sft`

### Step 6: Submit reveal on-chain

The script prints the reveal string. Submit it:

```python
import bittensor as bt

# Use the wallet whose hotkey appears as the last field of the reveal string;
# a 4-part reveal whose author differs from the committing hotkey is rejected as spoofed.
wallet = bt.Wallet(name="default", hotkey="default")
subtensor = bt.Subtensor(network="finney")

# 97 is the Albedo netuid; paste the reveal string printed in Step 5.
result = subtensor.commit(wallet, 97, "v4|youruser/albedo-qwen3-4b-v1|sha256:...|5GcD3...")
print("Committed:", result)
```

### Step 7: Monitor

```bash
# Watch for your eval in the validator log
ssh templar "tail -f ~/.pm2/logs/albedo-validator-error.log | grep --line-buffered 'youruser\|albedo-qwen3'"

# Dashboard
python3 -c "
import json, urllib.request
d = json.load(urllib.request.urlopen('https://us-east-1.hippius.com/albedo/dashboard.json'))
print('King:', d['king'].get('model_repo'), '| Queue:', len(d.get('queue', [])))
"
```

---

## Validator flow (internal)

```
chain scan (every ~20s)
  scan_reveals()
    → accept legacy 3-part (v4|repo|digest)
    → filter seen hotkeys, completed repos, king hotkeys
    → spoof detection (4-part only): author_hotkey ≠ chain_hotkey → burn + record_failure("spoof_rejected")

enqueue()
  → 1-hotkey-1-eval enforcement
  → same repo from two hotkeys → second blocked at enqueue

process_challenge()
  → validate_challenger_config() (9 checks: repo pattern, sha256, arch, locks, no .py, etc.)
  → _compute_seed(reveal_block)   ← uses commit block, not current block
  → POST /eval (SSE stream)
      → eval server: check_fingerprint (cosine sim ≥ 0.95 → duplicate_model; same-hotkey skipped)
      → eval server: probe_injection (fast profile: 2 turns × deepseek/deepseek-v4-flash; unresolved → untested, fail-closed)
      → eval server: run_duel (staged: sample 128 → generate all king/challenger replies → judge fanout → aggregate)
      → eval server: gate-1 (challenger_score − king_score ≥ win_margin=2.0)
      → eval server: gate-2 (paired bootstrap LCB > 0 at α=0.05)
      → verdict SSE event
  → accepted → state.set_king() + /set_king on eval + /prune_cache + set_weights (wait_for_inclusion=False)
  → always: state.close_eval() in finally
```

**Key guarantees:**

- `close_eval()` is always called (finally block) even if `set_king()` raises
- `uid_map` is snapshotted before first `await` in `maybe_set_weights` (TOCTOU fix)
- King hotkeys excluded from `scan_reveals` via explicit `king_hotkeys` set (genesis king fix)
- `seen_hotkeys` S3 failure → `RuntimeError` raised, validator refuses to start

---

## Eval server flow (internal)

```
POST /eval (EvalRequest)
  ├─ 409 if eval_lock.locked()
  ├─ parse king/challenger ModelRef from req (repo + digest keys)
  └─ _stream() generator:
       acquire eval_lock
       Gate 1: materialize challenger → start vLLM → wait_healthy()
               (emits SSE phase:"materialize" keepalive); on failure → error verdict SSE, return
       Gate 2: check_fingerprint(chal_dir, STATE.fingerprints, hotkey)
               same-hotkey entries skipped (own model not a dup); is_dup=True → duplicate verdict SSE, return
       Gate 3: probe_injection(challenger_url, eval_id, dataset_dir)
               (emits SSE phase:"probe" keepalive); not is_clean → injection verdict SSE, return
       resolve_model_names() once (not per turn)
       run_duel(samples, king_client, chal_client, ...)
         emit phase:"generate" and generate all king/challenger replies first
           (max_parallel controls vLLM generation fanout; eval host uses 64)
         emit phase:"judge" and fan out all generated turn judge prompts
           using per-model OpenRouter semaphores from [judge.rate_limits]
         turn SSE events stream back as judged turns finish
         aggregate_duel(): metric-first mean → per-judge score → challenger_score vs king_score
         gate-1: (challenger_score - king_score) >= win_margin (2.0 points, 0–100 scale)
         gate-2: paired_bootstrap_lcb() LCB > 0 at gate_alpha=0.05
       emit verdict SSE
       add challenger fingerprint (background task, tagged with hotkey)
       sink.flush() → publish trace to S3

POST /set_king ({"king": {"repo": ..., "digest": ...}})
  ├─ idempotency: same model already alive → return ok
  ├─ materialize_model() (await: yield point)
  ├─ eval_lock.locked() check AFTER materialize (TOCTOU fix)
  ├─ king_proc.start() + wait_healthy()
  └─ fingerprint king in background
```

---

## Judge calls: OpenRouter-first (`albedo/judge/client.py`)

`ChutesJudge.query_judges()` is the shared judge transport for both duel scoring and the injection probe. Production eval is OpenRouter-only by setting `ALBEDO_JUDGE_CHUTES_ENABLED=0`; the Chutes stream-gated path remains available but is not used on the eval host.

**Current duel panel (`[judge].models`):**

- `minimax/minimax-m3`
- `deepseek/deepseek-v4-flash`
- `z-ai/glm-5.1`

**Probe panel (`[preeval]`):** two injection probe turns judged only by `deepseek/deepseek-v4-flash` for fast fail-closed preflight. Duel scoring still uses the full 3-judge panel.

**OpenRouter resolution (`_openrouter`):** judge ids are remapped through `[judge.fallback.model_map]`; `or_timeout_s=60.0`, `or_retries=3`, and `judge_total_s=180.0` (all in seconds, except the retry count) cap one judge resolution. Exhaustion leaves that judge **unresolved** (`None` → parse failure for scoring; fail-closed `untested` for probes).

**Verdict parsing:** `parse_metric_verdict()` accepts raw JSON, fenced `json` code blocks, common metric-key aliases (e.g. `Groundedness`, `Task Progress`), and common winner labels (`model 2`, `candidate B`, `challenger`, `draw`, etc.). The parser scores only final assistant content, not reasoning text.

**Per-model request shaping** (`[judge.rate_limits]`): each judge gets its own OpenRouter semaphore. The active eval config gives all three judges `max_concurrency=64` for staged fanout.

---

## Admission gates (what gets a challenger rejected)

`validate_challenger_config()` in `albedo/validator/admission.py` runs these checks in order before dispatching to the GPU eval box:

1. Repo name matches `REPO_PATTERN` from `chain.toml`
2. Digest starts with `sha256:` (not `hf:`)
3. `config.json` downloadable from Hippius (config-only fetch, so it is fast)
4. `architectures` field matches king (`["Qwen3ForCausalLM"]`)
5. All lock keys match king: `vocab_size`, `model_type`, `max_position_embeddings`, `tie_word_embeddings`, `rope_theta`, `hidden_size`, `num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `intermediate_size`, `head_dim`
6. No `auto_map` key in `config.json`
7. No `quantization_config` key in `config.json` (quantized models are rejected)
8. No `*.py` files in the repo
9. At least one `.safetensors` file

The capacity keys (`hidden_size`, `num_hidden_layers`, etc.) lock the competition to the **Qwen3-4B** size class: challengers must match the king's architecture exactly.

---

## Eval data (public duel traces)

```
https://us-east-1.hippius.com/albedo/evals/YYYY-MM-DD/eval-NNNN.jsonl.gz
```

Each file is gzip JSONL.
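A minimal sketch of reading one such trace file and applying the clean-SFT filter this section describes (`delta_avg > 0.05`, `parse_ok` true, no `vllm_error`). The helper names `iter_turns` and `is_clean_sft_turn` and the two sample records are made up for illustration; this is not the logic of `scripts/collect_traces.py`, and real traces carry more fields than shown.

```python
import gzip
import io
import json
from typing import Iterator

MIN_DELTA = 0.05  # threshold quoted in this section


def iter_turns(raw_gz: bytes) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line of a gzip-compressed trace file."""
    with gzip.open(io.BytesIO(raw_gz), mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def is_clean_sft_turn(turn: dict, min_delta: float = MIN_DELTA) -> bool:
    """Keep turns where the challenger won, every judge parsed, and vLLM did not fail."""
    delta = turn.get("delta_avg")
    return (
        delta is not None
        and delta > min_delta
        and turn.get("parse_ok") is True
        and turn.get("vllm_error") is None
    )


# Synthetic records standing in for the bytes of a downloaded eval-NNNN.jsonl.gz.
sample = [
    {"chal_reply": "ls src", "delta_avg": 0.12, "parse_ok": True, "vllm_error": None},
    {"chal_reply": "", "delta_avg": 0.30, "parse_ok": True, "vllm_error": "timeout"},
    {"chal_reply": "cat a.py", "delta_avg": 0.01, "parse_ok": True, "vllm_error": None},
]
raw = gzip.compress("\n".join(json.dumps(r) for r in sample).encode("utf-8"))

clean = [t for t in iter_turns(raw) if is_clean_sft_turn(t)]
print(len(clean))  # 1
```

To use it on real data, pass the downloaded file's bytes to `iter_turns` in place of `raw`.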
Per-turn fields relevant for training:

- `messages_prefix`: conversation history up to this turn
- `messages_prompt`: current user turn
- `chal_reply`: the challenger's response (training target)
- `king_reply`: the king's response (for DPO preference pairs)
- `delta_avg`: `chal_score - king_score` averaged across judges
- `parse_ok`: `False` if any judge failed to parse a verdict
- `vllm_error`: set if vLLM failed to generate (turn not valid for training)

Filter: `delta_avg > 0.05` and `parse_ok=True` and `vllm_error is None` for clean SFT data.

---

## Launching a fresh competition (genesis reset)

```bash
# 1. Train or choose genesis model, upload to Hippius
python scripts/seed_genesis.py --hf-model Qwen/Qwen3-4B --repo yourorg/albedo-qwen3-4b-genesis
# or, for a custom fine-tune:
python scripts/seed_genesis.py --local-dir checkpoints/genesis/final --repo yourorg/albedo-qwen3-4b-genesis

# 2. Paste printed digest into chain.toml [seed].seed_digest

# 3. Deploy updated code, then trigger an automated one-shot reset on next start:
git push
ssh templar "cd /path/to/albedo && git pull && ALBEDO_RESET=1 pm2 restart albedo-validator --update-env"

# 4. Verify: logs should show "crowned new king: reign=#0"
```

---

## Key API contracts

**Reveal format:** `v4|{repo}|{digest}|{author_hotkey_ss58}` (4-part, spoof-checked). Legacy 3-part `v4|{repo}|{digest}` is also accepted; the chain hotkey is then treated as the author.
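The two accepted shapes and the spoof rule can be sketched as a small parser. This is an illustrative reimplementation under the rules stated in this README, not the repo's `parse_reveal_v4`; the `Reveal` dataclass, the function name `parse_reveal_sketch`, and the example hotkeys are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reveal:
    repo: str
    digest: str
    author: str    # ss58 hotkey credited as the author
    spoofed: bool  # True when an embedded author differs from the committing hotkey


def parse_reveal_sketch(payload: str, chain_hotkey: str) -> Optional[Reveal]:
    """Parse a v4 reveal string; return None if it is not a well-formed v4 reveal."""
    parts = payload.split("|")
    if parts[0] != "v4" or len(parts) not in (3, 4):
        return None
    repo, digest = parts[1], parts[2]
    if not repo or not digest:
        return None
    if len(parts) == 3:
        # Legacy 3-part form: the committing chain hotkey is treated as the author.
        return Reveal(repo, digest, author=chain_hotkey, spoofed=False)
    # 4-part form: the embedded author must equal the committing chain hotkey.
    author = parts[3]
    return Reveal(repo, digest, author=author, spoofed=author != chain_hotkey)


legacy = parse_reveal_sketch("v4|alice/albedo-qwen3-4b-v1|sha256:abc", "5Alice")
spoof = parse_reveal_sketch("v4|alice/albedo-qwen3-4b-v1|sha256:abc|5Alice", "5Mallory")
print(legacy.author, legacy.spoofed)  # 5Alice False
print(spoof.author, spoof.spoofed)    # 5Alice True
```

A caller would pass the ss58 address of the hotkey that signed the commitment as `chain_hotkey` and reject any result with `spoofed=True`.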
**EvalRequest body (POST /eval):**

```json
{
  "eval_id": "eval-000042",
  "hotkey": "5GcD3...",
  "seed_hex": "a3f9c2...",
  "king": {"repo": "org/albedo-qwen3-4b-v1", "digest": "sha256:..."},
  "challenger": {"repo": "org/albedo-qwen3-4b-v2", "digest": "sha256:..."}
}
```

**Verdict SSE event (event: verdict):**

```json
{
  "eval_id": "eval-000042",
  "accepted": true,
  "n_done": 128,
  "n_valid": 61,
  "vllm_errors": 3,
  "challenger_score": 56.2,
  "king_score": 43.8,
  "winner": "challenger",
  "by_judge": {"minimax/minimax-m3": 57.4, "deepseek/deepseek-v4-flash": 55.1, ...},
  "by_metric": {"correctness": 55.1, "grounding": 57.0, "progress": 56.8, "protocol": 55.9, "efficiency": 56.2},
  "win_margin": 2.0,
  "margin_ok": true,
  "mean_delta": 0.062,
  "lcb": 0.031,
  "se": 0.018,
  "gate_alpha": 0.05,
  "gate_lcb": true,
  "judge_models": ["minimax/minimax-m3", "deepseek/deepseek-v4-flash", "z-ai/glm-5.1"]
}
```

**set_king body (POST /set_king):**

```json
{"king": {"repo": "org/albedo-qwen3-4b-v1", "digest": "sha256:..."}}
```

---

## Links

- **Dashboard JSON:** https://us-east-1.hippius.com/albedo/dashboard.json
- **Dashboard UI:** https://us-east-1.hippius.com/albedo/index.html
- **Eval traces:** https://us-east-1.hippius.com/albedo/evals/
- **SWE-ZERO dataset:** https://huggingface.co/datasets/AlienKevin/SWE-ZERO-12M-trajectories
- **Hippius registry:** https://registry.hippius.com