1185µs
Best kernel latency
1.265
Peak reward (step 48)
90.6%
Peak correctness
63 steps
Kernel latency & reward over training steps
Correctness per step
Model
gpt-oss-120b (Tinker)
Learning rate
4e-5 (constant)
PUCT buffer
82 → 216
Episodes / step
32 (16 failed)
Reference latency
1500µs (reward = 1.0)
Hardware
H100 SXM5, CUDA 12.4
Checkpoint store
Tinker / HF Hub
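
The panel states only that a latency of 1500µs corresponds to a reward of 1.0; it does not give the reward formula. A minimal sketch, assuming the reward is a plain speedup ratio (reference latency divided by measured latency) and that incorrect kernels score zero. Both assumptions, and the names `speedup_reward` and `correctness_rate`, are illustrative and not taken from the run's code.

```python
REFERENCE_LATENCY_US = 1500.0  # from the panel: 1500µs maps to reward = 1.0
EPISODES_PER_STEP = 32         # from the panel: 32 episodes per step


def speedup_reward(latency_us: float, correct: bool = True) -> float:
    """Hypothetical reward: reference latency over measured latency.

    Assumption: kernels that fail the correctness check get 0.0.
    """
    if not correct or latency_us <= 0:
        return 0.0
    return REFERENCE_LATENCY_US / latency_us


def correctness_rate(passed: int, episodes: int = EPISODES_PER_STEP) -> float:
    """Fraction of episodes in a step whose kernel passed the correctness check."""
    return passed / episodes


if __name__ == "__main__":
    print(f"reward at 1500 us: {speedup_reward(1500.0):.3f}")
    print(f"reward at 1185 us: {speedup_reward(1185.0):.3f}")
    print(f"29 of 32 correct:  {correctness_rate(29):.1%}")
```

Under this assumption the best latency of 1185µs gives a reward of about 1.266, close to the reported peak of 1.265; the small gap may come from truncation, or from the peak reward at step 48 belonging to a slightly different latency. Likewise, 29 passing episodes out of 32 gives 90.6%, which matches the reported peak correctness.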