One OpenAI-compatible gateway can cover chat, embeddings, rerank, image, and audio without making you juggle a bunch of separate SDKs, and this post uses ChinaLLM as a concrete example of how that setup works in practice.
Quick walkthrough of using a single OpenAI-style gateway (ChinaLLM) to route the same chat/embeddings/rerank/image/audio API calls to different model providers without.
That “one OpenAI-style gateway for everything” sounds nice until you hit the annoying mismatch stuff — like embeddings dimension differences, or rerank score scales changing between providers and quietly breaking your thresholds. I found a related kirupa. com article that can help you go deeper into this topic:
Lol same reaction — “one gateway” sounds clean until you’re the one debugging why image inputs suddenly started timing out.
“One gateway” usually turns into “one queue” with a nicer name, and the image/audio stuff is always what gets weird first under load.
I’d want per‑modality limits and tracing right at the edge, otherwise you’re stuck guessing when image requests start timing out.