ALBEDO-5G3wVy (challenger): 48.20
ALBEDO-XXX (king): 51.80
Δ margin: -3.6
margin: -3.60 pp
judges: 3 panel
turns: 128/128
vllm errors: 0c / 0k
finished: Jun 17, 00:50
Judge panel
judge     verdict  chal   king
GLM       lose     48.20  51.80
QWEN      lose     45.20  54.80
DEEPSEEK  lose     48.60  51.40
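The per-judge verdicts can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the margin is the challenger score minus the king score, in percentage points, and that a negative margin means "lose". The names `panel`, `margin_pp`, `margins`, and `verdicts` are hypothetical and not part of the dashboard. How the three judges are combined into the headline margin is not stated on this page, so the sketch does not attempt it.

```python
# Scores copied from the judge panel: (challenger, king) per judge.
panel = {
    "GLM": (48.20, 51.80),
    "QWEN": (45.20, 54.80),
    "DEEPSEEK": (48.60, 51.40),
}


def margin_pp(chal: float, king: float) -> float:
    """Challenger minus king, in percentage points (assumed definition)."""
    return round(chal - king, 2)


# Per-judge margin; negative means the challenger scored below the king.
margins = {judge: margin_pp(c, k) for judge, (c, k) in panel.items()}

# Assumed rule: a strictly positive margin is a win, anything else a loss.
verdicts = {judge: "win" if m > 0 else "lose" for judge, m in margins.items()}

for judge, m in margins.items():
    print(f"{judge:<9} {m:+.2f} pp  {verdicts[judge]}")
```

Under these assumptions all three judges come out as "lose", which agrees with the panel.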
Metrics
progress: 48.20
protocol: 49.20
grounding: 47.50
efficiency: 45.30
correctness: 46.50
King it faced
era: ALBEDO-XXX
model: allforone1l1/albedo-qwen3-4b-test
uid: 161
Artifacts
- generated-samples.jsonl
- judge-results.jsonl
- progress.jsonl
- remote-logs.txt
- scoring-results.jsonl
- duel-transcript.jsonl
- verdict.json