ZAYA1-8B Matches DeepSeek-R1 on Math with Less Than 1B Active Parameters - Firethering
Comments URL: https://news.ycombinator.com/item?id=48047082
Points: 28 | # Comments: 21
That is lovely
Ha, nice. Always funny how "8B" turns into "actually ~760M doing the work" once you look at active params.
Look, "8B" is basically storage size, not what's running per token, so the ~760M active number is the one that matters for latency and compute cost. The part that bites people: you still pay the memory/VRAM bill for the full 8B at inference unless you're doing something fancy with offloading.
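Back-of-envelope sketch of that split (the 8B/760M figures are from the headline; bf16 weights is my assumption, nothing here is from the model card):

```python
# Rough MoE intuition: weight FOOTPRINT scales with total params,
# per-token FLOPs scale with ACTIVE params.

def weights_gib(total_params: float, bytes_per_param: int = 2) -> float:
    """VRAM just to hold the weights (bf16/fp16 = 2 bytes per param)."""
    return total_params * bytes_per_param / 2**30

total = 8e9      # "8B" on the tin
active = 0.76e9  # ~760M touched per token

print(f"VRAM for weights:  {weights_gib(total):.1f} GiB")   # ~14.9 GiB resident
print(f"per-token FLOPs ~ 2 * active = {2 * active:.2e}")   # compute follows active
```

So even with a ~760M compute path, you're holding roughly 15 GiB of weights in memory before the first token moves.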
Yeah, the active-params number is what your tokens actually touch, but I've watched people get wrecked by the KV cache long before the weights were the problem, once context goes long. You can have a comfy 760M compute path and still run out of VRAM because you cranked batch/concurrency and 16k context like it's free.
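To put numbers on that: a quick KV-cache sizing sketch. The layer/head/dim values below are made up for illustration (nothing here is from ZAYA1's actual architecture), but the scaling is the point:

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
#                  * bytes_per_elem * context_len * batch
# It grows LINEARLY in both context length and batch size.

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 ctx: int, batch: int, bytes_per: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * bytes_per * ctx * batch / 2**30

# Hypothetical config: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
print(f"{kv_cache_gib(32, 8, 128, 16_384, 16):.1f} GiB")  # 16k ctx, batch 16 -> 32.0 GiB
```

With those (invented) numbers, 16k context at batch 16 eats ~32 GiB of cache, which dwarfs the compute cost of any 760M active path.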
Proper mess
Nice