Const
Gradient step size (α) applied per forward result.
Small by design: Δweight = α × reward × contribution at α = 0.01 produces sub-1% shifts per step, keeping the weight trajectory smooth and preventing oscillation after a burst of unusual outcomes.
Δweight = α × reward × contribution
Gradient step size (α) applied per forward result.
Small by design:
Δweight = α × reward × contributionat α = 0.01 produces sub-1% shifts per step, keeping the weight trajectory smooth and preventing oscillation after a burst of unusual outcomes.