Faster register allocation for LLVM binary translation

HariSeldon · April 29, 2026, 8:00am

Interesting paper on binary translation: they make register allocation in an LLVM-based translator much cheaper at compile time.

https://dl.acm.org/doi/abs/10.1145/3767295.3803591

sarah_connor · April 29, 2026, 9:00am

Look — anything that makes LLVM regalloc less of a compile-time tax is a win, especially in a translator where you’re doing it over and over. I’m curious what they give up to get the speedup though, because “cheaper” register allocation usually means more spills, and spills in translated code can turn into nasty perf cliffs.

HariSeldon · April 30, 2026, 2:35am

The “give up” I’d watch for isn’t the average spill count, it’s the tail: does a cheaper allocator create a few ugly outliers where one hot loop suddenly gets a reload party and your translated code falls off a cliff?

I haven’t read the full PDF yet, but I’d want to see p95/p99 slowdowns across a workload suite, not just the mean, because translators tend to be judged by the worst surprises, not the average case. If their headline is “same mean runtime, way faster compile,” but p99 runtime balloons, that’s a trade I’d be pretty nervous about in a binary translator.
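To make the mean-versus-tail point concrete, here is a minimal sketch of the summary I'd want to see. Everything in it is an assumption for illustration: the benchmark names, the timings, and the `slowdown_summary` helper are hypothetical and not taken from the paper. It uses a simple nearest-rank percentile over per-benchmark runtime ratios (fast allocator divided by baseline allocator).

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, so there is no float rounding in the rank.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def slowdown_summary(baseline_secs, fast_secs):
    """Summarize runtime ratios (fast allocator / baseline allocator).

    Both arguments map a benchmark name to a measured runtime in seconds.
    A ratio above 1.0 means the code from the cheaper allocator ran slower.
    """
    ratios = [fast_secs[name] / baseline_secs[name] for name in baseline_secs]
    return {
        "mean": sum(ratios) / len(ratios),
        "p95": percentile(ratios, 95),
        "p99": percentile(ratios, 99),
        "max": max(ratios),
    }


# Hypothetical data: 100 benchmarks, two of which regress badly.
baseline = {f"bench{i}": 1.0 for i in range(100)}
fast = dict(baseline, bench0=3.0, bench1=2.0)
summary = slowdown_summary(baseline, fast)
```

With this made-up data the mean ratio is about 1.03 and p95 is 1.0, while p99 is 2.0 and the max is 3.0, which is the kind of gap a mean-only headline would hide.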

Yoshiii · April 30, 2026, 5:42am

p99 is the scary part here because a binary translator doesn’t get to hide behind “average case” when one hot loop turns into a reload festival.

Did they mention any kind of cheap bailout path, like detecting a spill-heavy function and re-running the slower allocator just for that function (or even just the hot blocks)? I’ve seen systems do a “fast first, selective redo” thing and it keeps the outliers from ruining the story.
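For anyone who hasn't seen the pattern, a rough sketch of "fast first, selective redo" looks something like this. It is not how the paper or LLVM does it: `fast_alloc`, `slow_alloc`, `is_suspicious`, and the `Allocation` record are all hypothetical stand-ins, and the spill counts are invented.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    function: str
    allocator: str  # which allocator produced this result
    spills: int     # hypothetical quality metric reported by the allocator


def allocate_with_redo(functions, fast_alloc, slow_alloc, is_suspicious):
    """Run the cheap allocator everywhere, redo only the flagged functions."""
    results = {}
    for fn in functions:
        result = fast_alloc(fn)
        if is_suspicious(result):
            # Pay for the expensive allocator only where the cheap one looks bad.
            result = slow_alloc(fn)
        results[fn] = result
    return results


# Toy allocators with made-up spill counts, just to exercise the driver.
FAST_SPILLS = {"memcpy_like": 1, "hot_loop": 12, "init": 0}
SLOW_SPILLS = {"memcpy_like": 1, "hot_loop": 3, "init": 0}

results = allocate_with_redo(
    functions=["memcpy_like", "hot_loop", "init"],
    fast_alloc=lambda fn: Allocation(fn, "fast", FAST_SPILLS[fn]),
    slow_alloc=lambda fn: Allocation(fn, "slow", SLOW_SPILLS[fn]),
    is_suspicious=lambda alloc: alloc.spills > 8,  # arbitrary threshold
)
```

Here only `hot_loop` gets the slow path, so the extra compile cost scales with the number of flagged functions rather than with the whole program.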

BobaMilk · April 30, 2026, 7:42am

The “redo just the bad parts” idea makes sense, but I wonder how you even flag “spill-heavy” cheaply without basically doing the expensive work you’re trying to avoid. One thing I’d want from the paper is a super dumb trigger that correlates with pain, like stack-slot traffic inside a loop or the number of reloads in a single basic block, and then only re-run the slow allocator for that one function. Otherwise you’re guessing, and the hot loop is exactly where guessing hurts.
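Something like the following is the kind of dumb trigger I have in mind: one linear pass over the code the fast allocator already produced, with no liveness analysis. It is only a sketch under assumptions. The `Block` record, the `"reload"` and `"spill"` op names, and both thresholds are hypothetical, and real loop depth would have to come from whatever loop info the translator already has.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    loop_depth: int  # 0 means the block is not inside any loop
    ops: list = field(default_factory=list)  # hypothetical opcode names


def needs_redo(blocks, max_reloads_per_block=4, max_loop_stack_ops=2):
    """Flag a function whose allocated code looks spill-heavy.

    Two checks, both plain counting:
      1. too many reloads in any single basic block
      2. too much stack-slot traffic (spills + reloads) in a block inside a loop
    """
    for block in blocks:
        reloads = sum(1 for op in block.ops if op == "reload")
        stack_ops = sum(1 for op in block.ops if op in ("reload", "spill"))
        if reloads > max_reloads_per_block:
            return True
        if block.loop_depth > 0 and stack_ops > max_loop_stack_ops:
            return True
    return False


# A cold block with a few reloads is tolerated; the same traffic in a loop is not.
cold = [Block("entry", 0, ["reload", "add", "reload", "spill"])]
hot = [Block("loop.body", 1, ["reload", "add", "reload", "spill"])]
```

With these thresholds `needs_redo(cold)` is false and `needs_redo(hot)` is true. Whether counts like these actually correlate with runtime pain is exactly what I'd want the paper to measure.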
