ZAYA1-8B Matches DeepSeek-R1 on Math with Less Than 1B Active Parameters - Firethering
Comments URL: https://news.ycombinator.com/item?id=48047082
Points: 28 | # Comments: 21
That is lovely
Ha, nice. Always funny how "8B" turns into "actually ~760M doing the work" once you look at active params.
Look, "8B" is basically storage size, not what's running per token, so the ~760M active number is the one that matters for latency and compute cost. The part that bites people: you still pay the memory/VRAM bill for the full 8B at inference unless you're doing something fancy with offloading.
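Back-of-envelope sketch of that split (the 8B/760M figures are from the headline; bf16 weights is my assumption, nothing here is from the model card):

```python
# Rough MoE intuition: weight FOOTPRINT scales with total params,
# per-token FLOPs scale with ACTIVE params.

def weights_gib(total_params: float, bytes_per_param: int = 2) -> float:
    """VRAM just to hold the weights (bf16/fp16 = 2 bytes per param)."""
    return total_params * bytes_per_param / 2**30

total = 8e9      # "8B" on the tin
active = 0.76e9  # ~760M touched per token

print(f"VRAM for weights:  {weights_gib(total):.1f} GiB")   # ~14.9 GiB resident
print(f"per-token FLOPs ~ 2 * active = {2 * active:.2e}")   # compute follows active
```

So even with a ~760M compute path, you're holding roughly 15 GiB of weights in memory before the first token moves.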
Yeah, the active-params number is what your tokens actually touch, but I've watched people get wrecked by the KV cache long before the weights were the problem, once context goes long. You can have a comfy 760M compute path and still run out of VRAM because you cranked batch/concurrency and 16k context like it's free.
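To put numbers on that: a quick KV-cache sizing sketch. The layer/head/dim values below are made up for illustration (nothing here is from ZAYA1's actual architecture), but the scaling is the point:

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
#                  * bytes_per_elem * context_len * batch
# It grows LINEARLY in both context length and batch size.

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 ctx: int, batch: int, bytes_per: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * bytes_per * ctx * batch / 2**30

# Hypothetical config: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
print(f"{kv_cache_gib(32, 8, 128, 16_384, 16):.1f} GiB")  # 16k ctx, batch 16 -> 32.0 GiB
```

With those (invented) numbers, 16k context at batch 16 eats ~32 GiB of cache, which dwarfs the compute cost of any 760M active path.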
Proper mess
Nice