A few months ago I was deep in building AI agents. Not prototypes — real workflows that ran daily, made API calls, processed data, and actually did things. I was excited about what they could do. I was not excited about my API bill.
Every week it grew. Not because my agents were doing more useful work. Not because I was scaling. Just because every single call defaulted to Sonnet or GPT-4o. Classification tasks. Simple lookups. Yes/no decisions. All routed to the most expensive model available — because that was the default, and nobody had told my agent there was another option.
I started looking for a solution. I knew the space. LiteLLM, Portkey, OpenRouter — I had used all of them. They're solid tools. Good documentation, active communities, serious engineering behind them.
But none of them solved my problem.
The Reason Is Structural
At first I thought I was missing something. Maybe there was a setting I hadn't found. A config option buried in the docs. Some routing rule I could write.
There wasn't.
And when I thought about it longer, I realized why.
Every major LLM router earns a margin on paid API calls. That's their business. When you route a call through OpenRouter to GPT-4o, they take a small cut. When you route to Claude Sonnet via LiteLLM, same thing. The more you spend, the more they earn.
Routing you to Gemini Flash for free — or Llama via Groq for free — literally destroys their revenue on that call. They have no incentive to build that. In fact, they have the opposite incentive.
This isn't a criticism. It's just how their business model works. But it means that no margin-based router will ever prioritize free alternatives. Not because they can't build it, but because structurally they can't afford to.
Once I understood that, I stopped looking for a setting and started building.
What I Built
The idea is simple: a routing layer that sits between my agents and every API call, and asks one question before each request — does this task actually need a paid model?
If the answer is no, it routes to the best free alternative. If the answer is yes, it pays.
The key word is "conservative." When the router is unsure, it defaults to paid. No silent quality drops. No gambling with important tasks. The goal isn't to route everything to free — it's to route to free exactly when it's safe to do so, and pay exactly when it isn't.
In practice that means:
- A classification task — urgent or not urgent, positive or negative — goes to Gemini Flash or Llama via Groq. Fast, free, good enough.
- A geocoding request for an internal data pipeline goes to Nominatim. Free, accurate, no API key required.
- A complex reasoning task, or anything where the output directly affects something important — that gets the model it needs, paid, no shortcuts.
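The conservative rule above can be sketched in a few lines. Everything here is illustrative, not Hebline's actual API: the task types, model names, and the 0.9 confidence threshold are assumptions standing in for whatever the real router uses. The one property the sketch does preserve is the default: when in doubt, pay.

```python
# Hypothetical sketch of the conservative routing rule.
# Task types, model names, and the threshold are illustrative only.

FREE_ROUTES = {
    "classification": "gemini-flash",  # urgent/not-urgent, sentiment, yes/no
    "geocoding": "nominatim",          # free, accurate, no API key required
}

PAID_DEFAULT = "claude-sonnet"

def route(task_type: str, confidence: float) -> str:
    """Return a free alternative only when the router is confident it is
    safe; otherwise fall back to the paid model (no silent quality drops)."""
    free_model = FREE_ROUTES.get(task_type)
    if free_model is not None and confidence >= 0.9:
        return free_model
    return PAID_DEFAULT
```

Note the asymmetry: an unknown task type or a low confidence score both land on the paid default, so the failure mode is "spent money unnecessarily," never "quietly degraded an important task."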
The routing isn't static. It learns. Every call is a data point. Over time the system starts to build a picture of which task types need a strong model and which don't. It gets more precise the more it runs. Not through manual configuration — through observation.
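One way to picture the learning layer is a per-task-type success tracker: only route to free once enough calls have been observed and the free model's success rate clears a conservative bar. This is a minimal sketch under assumed thresholds (`min_samples`, `min_success` are made up for illustration), not Hebline's implementation.

```python
from collections import defaultdict

class RouteLearner:
    """Illustrative learn-by-observation sketch: track, per task type, how
    often the free model's output was acceptable, and declare free routing
    safe only once the evidence clears a conservative bar."""

    def __init__(self, min_samples: int = 20, min_success: float = 0.95):
        # task_type -> [successes, total observations]
        self.stats = defaultdict(lambda: [0, 0])
        self.min_samples = min_samples
        self.min_success = min_success

    def record(self, task_type: str, free_model_succeeded: bool) -> None:
        """Every call is a data point."""
        s = self.stats[task_type]
        s[0] += int(free_model_succeeded)
        s[1] += 1

    def free_is_safe(self, task_type: str) -> bool:
        successes, total = self.stats[task_type]
        if total < self.min_samples:
            return False  # not enough evidence yet: stay on paid
        return successes / total >= self.min_success
```

The shape matches the claim in the text: precision comes from accumulated observations, not from anyone hand-editing a config.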
The Business Model Is the Moat
Here's what I find most interesting about this: the reason this works as a product is the same reason nobody else has built it.
Hebline earns from subscriptions — not from margins on API calls. That one structural difference changes everything. I have no incentive to route you to paid models. In fact, the more I save you, the more obvious the value of the subscription becomes.
That means I can make a promise that no margin-based router can honestly make:
We are the only router built to actually save you money. Not as a feature. As the entire point.
Ship Once, Stay Current
There's one more thing I didn't expect to matter as much as it does.
The AI landscape changes constantly. New models drop every few weeks. Free tiers appear and disappear. Pricing changes overnight. Models get deprecated.
Every time something changes, developers have to react. Update configs. Test new models. Rewrite routing logic. It's invisible maintenance work that adds up.
With Hebline, that stops being your problem. When a better free model appears, the router finds it. When a provider changes pricing, the routing adjusts. Your agent code stays the same.
You ship once. The router keeps up.
Where It Stands Now
Hebline is live as an MCP server — installable in about 30 seconds in Claude Desktop, Claude Code, Cursor, or any MCP-compatible client.
What's routing today:
- LLMs: Gemini Flash, DeepSeek, Groq (Llama), local Ollama
- APIs: Geocoding, Translation, Web Search, OCR, Weather, Currency, Web Scraping
The core is open source. The routing logic, the adapter system, the learning layer — all of it is on GitHub and open for contributions.
I built this because I was frustrated. I kept looking for something that already existed and didn't find it. Now it exists.
If you're building agents and watching your bill grow for no good reason — this is for you.
Install the MCP server: github.com/Hebline/mcp-server
Learn more: hebline.ai