One gateway for AI models and multimodal tasks

One OpenAI-compatible gateway can cover chat, embeddings, rerank, image, and audio without making you juggle a bunch of separate SDKs, and this post uses ChinaLLM as a concrete example of how that setup works in practice.

Quick walkthrough of using a single OpenAI-style gateway (ChinaLLM) to route the same chat, embeddings, rerank, image, and audio API calls to different model providers.
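To make the "same calls, one gateway" idea concrete, here is a minimal sketch of how every task type can hang off one base URL. The base URL, API key, and model name are placeholders, not ChinaLLM's real values, and the `/rerank` path is an assumption, since rerank is not part of the OpenAI API itself. The code only builds the requests and does not send them.

```python
import json
from urllib.request import Request

# Placeholder values: substitute whatever your gateway actually documents.
BASE_URL = "https://gateway.example.com/v1"
API_KEY = "sk-placeholder"

# OpenAI-style paths per task. "/rerank" is an assumed path, not an OpenAI one.
PATHS = {
    "chat": "/chat/completions",
    "embeddings": "/embeddings",
    "image": "/images/generations",
    "audio": "/audio/speech",
    "rerank": "/rerank",
}


def build_request(task: str, payload: dict) -> Request:
    """Build (but do not send) a POST request for one task type."""
    if task not in PATHS:
        raise ValueError(f"unknown task: {task}")
    return Request(
        BASE_URL + PATHS[task],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same helper, different task: only the path and payload change.
req = build_request("embeddings", {"model": "some-embedding-model", "input": ["hello"]})
print(req.get_method(), req.full_url)
```

The point is that switching providers becomes a change to the `model` field rather than a change of SDK, assuming the gateway really does keep the request shapes identical.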

That “one OpenAI-style gateway for everything” sounds nice until you hit the annoying mismatch stuff, like embedding dimension differences, or rerank score scales changing between providers and quietly breaking your thresholds. I found a related kirupa.com article that can help you go deeper into this topic.
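One way to stop those mismatches from failing silently is to pin the expected embedding dimension and the rerank threshold to each model name, then check them on every response. This is only a sketch: the model names, dimensions, and threshold values below are made up for illustration, and the assumption that one reranker returns 0 to 1 scores while another returns unbounded scores is just an example of how scales can differ.

```python
# Hypothetical per-model settings; real values come from each provider's docs.
EXPECTED_DIMS = {"embed-model-a": 1024, "embed-model-b": 1536}
RERANK_THRESHOLDS = {
    "rerank-model-a": 0.5,  # assumed to return scores in the 0..1 range
    "rerank-model-b": 2.0,  # assumed to return unbounded raw scores
}


def check_dimension(model: str, vectors: list) -> list:
    """Raise if any vector does not match the dimension pinned for this model."""
    expected = EXPECTED_DIMS[model]
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(
                f"{model}: vector {i} has {len(vec)} dims, expected {expected}"
            )
    return vectors


def keep_relevant(model: str, scored_docs: list) -> list:
    """Keep documents whose score meets the threshold pinned for this model."""
    threshold = RERANK_THRESHOLDS[model]
    return [doc for doc, score in scored_docs if score >= threshold]


# The same raw score can pass on one model and fail on another.
docs = [("doc-1", 1.2), ("doc-2", 0.4)]
print(keep_relevant("rerank-model-a", docs))
print(keep_relevant("rerank-model-b", docs))
```

Keying thresholds by model means a provider swap behind the gateway fails with a missing key or a dimension error instead of quietly changing what counts as relevant.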

Oh nice

Lol same reaction — “one gateway” sounds clean until you’re the one debugging why image inputs suddenly started timing out.

“One gateway” usually turns into “one queue” with a nicer name, and the image/audio stuff is always what gets weird first under load.

I’d want per‑modality limits and tracing right at the edge, otherwise you’re stuck guessing when image requests start timing out.
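A rough sketch of what per-modality limits plus tracing could look like, using only asyncio. The limit values, the `EdgeGuard` class, and the trace fields are all made up for illustration and do not reflect how any particular gateway does it. Each modality gets its own concurrency cap and timeout, and every call records how long it waited in the queue versus how long it actually ran.

```python
import asyncio
import time
import uuid


class EdgeGuard:
    """Per-modality concurrency cap and timeout, with a trace record per call."""

    def __init__(self, limits):
        self.limits = limits      # modality -> (max concurrent, timeout seconds)
        self.semaphores = {}      # created lazily, inside the running event loop
        self.traces = []          # in-memory only; a real setup would export these

    async def call(self, modality, handler):
        max_concurrent, timeout = self.limits[modality]
        if modality not in self.semaphores:
            self.semaphores[modality] = asyncio.Semaphore(max_concurrent)
        queued_at = time.monotonic()
        async with self.semaphores[modality]:
            started_at = time.monotonic()
            status = "ok"
            try:
                # The timeout covers execution only, not time spent queued.
                return await asyncio.wait_for(handler(), timeout)
            except asyncio.TimeoutError:
                status = "timeout"
                raise
            except Exception:
                status = "error"
                raise
            finally:
                self.traces.append({
                    "trace_id": uuid.uuid4().hex,
                    "modality": modality,
                    "status": status,
                    "queue_wait_s": started_at - queued_at,
                    "run_s": time.monotonic() - started_at,
                })


async def demo():
    # Hypothetical limits: image is capped tighter and times out sooner here.
    guard = EdgeGuard({"chat": (4, 1.0), "image": (1, 0.05)})

    async def fast_chat():
        await asyncio.sleep(0.01)
        return "done"

    async def slow_image():
        await asyncio.sleep(1.0)
        return "never returned"

    results = await asyncio.gather(
        guard.call("chat", fast_chat),
        guard.call("image", slow_image),
        return_exceptions=True,
    )
    return guard, results


guard, results = asyncio.run(demo())
for trace in guard.traces:
    print(trace["modality"], trace["status"])
```

Recording queue wait separately from run time is what lets you tell "image requests are stuck behind each other" apart from "the upstream image model is slow", which is the guessing game described above.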