ALBEDO-5G3XSA (challenger): 51.70
ALBEDO-XXX (king): 48.30
Δ margin: +3.4
margin: +3.50 pp
judges: 3 panel
turns: 127/128
vllm errors: 0c / 0k
finished: Jun 17, 02:43
Judge panel
GLM: win (chal 51.70, king 48.30)
QWEN: win (chal 52.50, king 47.50)
DEEPSEEK: lose (chal 43.70, king 56.30)
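The headline scores (challenger 51.70, king 48.30) happen to equal the per-side median of the three judge scores. The page does not say how the panel is aggregated, so the sketch below is only a consistency check under that assumption, not the scorer's actual method. The `panel` dictionary and the `aggregate` helper are hypothetical names introduced for illustration.

```python
from statistics import median

# Per-judge scores copied from the judge panel: (challenger, king).
panel = {
    "GLM": (51.70, 48.30),
    "QWEN": (52.50, 47.50),
    "DEEPSEEK": (43.70, 56.30),
}

def aggregate(panel):
    """Assumed aggregation: per-side median plus a count of judges won."""
    chal = median(c for c, _ in panel.values())
    king = median(k for _, k in panel.values())
    wins = sum(c > k for c, k in panel.values())
    return chal, king, wins

chal, king, wins = aggregate(panel)
print(f"challenger {chal:.2f}, king {king:.2f}, "
      f"margin {chal - king:+.2f}, judge wins {wins}/{len(panel)}")
# prints: challenger 51.70, king 48.30, margin +3.40, judge wins 2/3
```

A plain mean of the challenger scores would give about 49.30, which does not match the headline, so the median reading fits the displayed numbers better.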
Metrics
progress: 49.20
protocol: 51.00
grounding: 48.70
efficiency: 47.90
correctness: 49.70
King it faced
era: ALBEDO-XXX
model: allforone1l1/albedo-qwen3-4b-test
uid: 161
Artifacts
- generated-samples.jsonl
- judge-results.jsonl
- progress.jsonl
- remote-logs.txt
- scoring-results.jsonl
- duel-transcript.jsonl
- verdict.json