Most SaaS stacks were built for tidy CRUD flows and fixed schemas, so once you try to push AI through them the whole thing gets awkward fast.
AI jobs don’t fit the “click button, write row, return 200” shape most SaaS apps were built around. They’re long-running, flaky by nature, want retries, stream partial output, and sometimes need to resume halfway through without duplicating work.
Look, the second you force that into a normal request/response controller, you’ve quietly signed up to run a queue, a state machine, and an idempotency/“did we already do this?” layer. And if you didn’t design for that up front, it turns into a pile of background workers, mystery timeouts, and support tickets at 3am.
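To make the "state machine" part concrete, here's a rough sketch of the smallest version I'd start from. The state names, the `TRANSITIONS` table, and `advance` are all made up for illustration, not from any particular framework, and a real system would persist the state next to the job row.

```python
from enum import Enum


class RunState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Allowed moves between states (hypothetical policy); anything not listed is rejected.
TRANSITIONS = {
    RunState.QUEUED: {RunState.RUNNING, RunState.CANCELLED},
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED, RunState.CANCELLED},
    RunState.FAILED: {RunState.QUEUED},  # a retry re-queues the same run
    RunState.SUCCEEDED: set(),           # terminal: never re-run, never re-bill
    RunState.CANCELLED: set(),           # terminal
}


def advance(current: RunState, target: RunState) -> RunState:
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

The useful property is that terminal states have no outgoing edges, so a late retry against a finished run fails loudly instead of silently kicking off another model call.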
the billing part is where this stops being a clean software problem and turns into a policy problem.
a retry on a normal SaaS endpoint is annoying. a retry on an AI workflow can mean you just paid for the same model call twice, or three times, because the user hit refresh and the job wasn’t clearly marked done yet. i’ve seen teams discover this the hard way after a few “harmless” duplicate runs quietly ate through a month’s budget.
and once that happens, the product starts changing around cost control. limits, cooldowns, cancellations, “are you sure?” prompts… all the stuff users hate, but finance absolutely loves.
The “user hit refresh and we paid twice” thing is brutal, and UI is sneakily part of the fix here. If the frontend doesn’t mint a client-generated idempotency key per run (and then reuse it on refresh/retry), you’re basically guaranteeing duplicate jobs no matter how good your backend queue is. I’ve seen teams dodge a ton of this just by treating “start job” as creating a durable run record first, then everything else (polling, streaming, retries) hangs off that run ID. Do y’all put that run ID in the URL so a refresh lands back on the same run, or is it stuck in local state?
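For anyone who wants the shape of it, here's a minimal sketch of "create the durable run record first, keyed by a client-minted idempotency key". Everything in it is an assumption for illustration: `Run`, `RunStore`, and `start_run` are hypothetical names, and the in-memory dicts stand in for a real table with a unique index on the key.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Run:
    run_id: str
    idempotency_key: str
    prompt: str
    status: str = "queued"


class RunStore:
    """In-memory stand-in for a durable table with a unique index on idempotency_key."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Run] = {}
        self._by_id: Dict[str, Run] = {}

    def start_run(self, idempotency_key: str, prompt: str) -> Tuple[Run, bool]:
        """Return (run, created). A repeated key returns the existing run unchanged."""
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            return existing, False  # refresh or retry: no new job, no second model call
        run = Run(run_id=uuid.uuid4().hex, idempotency_key=idempotency_key, prompt=prompt)
        self._by_key[idempotency_key] = run
        self._by_id[run.run_id] = run
        return run, True

    def get(self, run_id: str) -> Optional[Run]:
        """Look up a run by ID, e.g. when a refreshed page carries the ID in its URL."""
        return self._by_id.get(run_id)
```

The idea is that the client mints the key once per intended run and resends the same key on refresh or retry, so the worker only gets enqueued when `created` is true. Polling, streaming, and retries then all go through `get(run_id)`.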