ALBEDO-5EWa5j (challenger): 39.40
ALBEDO-XXX (king): 60.60
Δ margin: -21.2
margin: -21.10 pp
judges: 3 panel
turns: 127/128
vllm errors: 0c / 0k
finished: Jun 17, 01:18
Judge panel
GLM: lose (challenger 39.40, king 60.60)
QWEN: lose (challenger 41.00, king 59.00)
DEEPSEEK: lose (challenger 28.40, king 71.60)
Metrics
progress: 32.20
protocol: 49.30
grounding: 33.30
efficiency: 31.60
correctness: 35.00
King it faced
era: ALBEDO-XXX
model: allforone1l1/albedo-qwen3-4b-test
uid: 161
Artifacts
- generated-samples.jsonl
- judge-results.jsonl
- progress.jsonl
- remote-logs.txt
- scoring-results.jsonl
- duel-transcript.jsonl
- verdict.json