You are not the first person who has asked about it.
It looks like a useful feature to have. Therefore, I'll dig into this topic more broadly over the next few days and let you know here whether, and possibly when, we plan to add it.
Curious how the semantic caching layer works.. are you embedding requests on the gateway side and doing a vector similarity lookup before proxying? And if so, how do you handle cache invalidation when the underlying model changes or gets updated?
Hey, contributor here. That's right, GoModel embeds requests and does vector similarity lookup before proxying. Regarding the cache invalidation, there is no "purging" involved – the model is part of the namespace (params_hash includes the LLM model, path, guardrails hash, etc). TTL takes care of the cleanup later.
This is really useful. I've been building an AI platform (HOCKS AI) where I route different tasks to different providers — free OpenRouter models for chat/code gen, Gemini for vision tasks. The biggest pain point has been exactly what you describe: switching models without changing app code.
One thing I'd love to see is built-in cost tracking per model/route. When you're mixing free and paid models, knowing exactly where your spend goes is critical. Do you have plans for that in the dashboard?
However kudos for the project, we need more alternatives in compiled languages.
It looks like a useful feature to have. Therefore, I'll dig into this topic more broadly over the next few days and let you know here whether, and possibly when, we plan to add it.
One thing I'd love to see is built-in cost tracking per model/route. When you're mixing free and paid models, knowing exactly where your spend goes is critical. Do you have plans for that in the dashboard?
However IIUC what you're asking for - it's already in the dashboard! Check the Usage page.
Are there even any benchmarks?
It's more lightweight and simpler. The Bifrost docker image looks 4x larger, at least for now.
IMO GoModel is more convenient for debugging and for seeing how your request flows through different layers of AI Gateways in the Audit Logs.