Conclusion
- Per-customer LLM cost is a gross-margin metric, not only an engineering metric.
- Track feature, route, fallback, and outcome; token count alone cannot explain profitability.
- Retries, invalid JSON, failed agent loops, and fallback calls must be attributed to the customer and workspace whose task triggered them.
- Budget alerts should fire before a customer, plan, or feature becomes unprofitable.
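As a rough illustration of the alerting point, the sketch below classifies a customer's AI spend against the margin expected from their plan. The function name `alert_level`, the level labels, and the default thresholds are assumptions made for this example; the 50 and 80 percent defaults echo the soft-alert levels suggested in the next section.

```python
def alert_level(ai_cost_usd, expected_margin_usd, soft_thresholds=(0.5, 0.8)):
    """Classify spend as a share of expected plan margin (hypothetical helper)."""
    # With no expected margin, any spend is already unprofitable.
    if expected_margin_usd <= 0:
        return "over_budget"

    used = ai_cost_usd / expected_margin_usd
    if used >= 1.0:
        return "over_budget"

    # Report the highest soft threshold that has been crossed.
    level = "ok"
    for threshold in sorted(soft_thresholds):
        if used >= threshold:
            level = f"soft_{round(threshold * 100)}"
    return level


print(alert_level(85.0, 100.0))
```

Because the soft levels trigger below 100 percent, the alert fires while the customer is still profitable, which leaves time to adjust routing, limits, or pricing.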
What to do next
- Define a usage event schema with customer, workspace, user, feature, task id, provider, route, model, tokens, retries, latency, status, outcome, and fallback route.
- Normalize provider prices into one internal cost table and refresh it when pricing or cache/off-peak rules change.
- Compute cost per successful task, cost per active customer, AI gross margin by plan, and top runaway workspaces.
- Set soft alerts at 50 and 80 percent of expected plan margin, and hard caps for abnormal loops or abusive usage.
- Attach fallback and retry costs to the original customer task, not to a generic infrastructure bucket.
- Use OpenLLMAPI or shared server middleware so every provider call emits the same log shape and budget policy.
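The first three steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the field names mirror the schema list, but the types, the `PRICES` table, the provider and model names, and the prices themselves are placeholders invented for the example. It also assumes one event per provider call, so retries and fallbacks appear as extra events that share a task id.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical internal cost table: USD per 1M tokens, keyed by (provider, model).
# Names and prices are placeholders, not real provider pricing.
PRICES = {
    ("provider_a", "model-small"): {"input": 1.00, "output": 2.00},
    ("provider_b", "model-large"): {"input": 5.00, "output": 10.00},
}


@dataclass
class UsageEvent:
    customer_id: str
    workspace_id: str
    user_id: str
    feature: str
    task_id: str
    provider: str
    route: str
    model: str
    input_tokens: int
    output_tokens: int
    retries: int
    latency_ms: int
    status: str                       # e.g. "ok" or "error"
    outcome: str                      # e.g. "accepted" or "failed"
    fallback_route: Optional[str] = None


def event_cost(event: UsageEvent) -> float:
    """Cost of one provider call in USD, using the normalized price table."""
    price = PRICES[(event.provider, event.model)]
    return (
        event.input_tokens * price["input"]
        + event.output_tokens * price["output"]
    ) / 1_000_000


def cost_per_successful_task(events) -> Optional[float]:
    """Total spend (failed calls included) divided by the number of accepted tasks."""
    total = sum(event_cost(e) for e in events)
    successful = {e.task_id for e in events if e.outcome == "accepted"}
    return total / len(successful) if successful else None
```

Dividing total spend by accepted tasks, rather than by all calls, is what lets failed attempts show up as a higher cost per successful task.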
Recommended paths
| Path | Cost / free tier | Best for |
|---|---|---|
| Application middleware | Build in-house | Single-provider apps with custom billing logic |
| OpenLLMAPI | Trial varies | Multi-provider logs, routing, fallback traces, and customer-level budgets |
| LLM cost calculator | Free tool | Estimating plan margins before launch |
| Provider dashboards | Included | Account-level spend, not customer or feature margin |
| Pricing exports | Free data | Refreshing internal cost tables and provider comparisons |
Global developer checklist
- Confirm whether signup, billing, and API keys work from your country before writing production code.
- Prefer OpenAI-compatible endpoints when you may need to switch models, regions, or providers later.
- Test free credits with a real smoke prompt and record latency, error shape, streaming behavior, and quota burn.
- Keep at least one fallback route for provider outages, model deprecations, and regional access changes.
Production handoff
Make every LLM call accountable
Route model calls through one compatible endpoint that provides customer-level logs, fallback traces, budget caps, and SaaS margin attribution.
FAQ
What fields are mandatory?
At minimum: customer/workspace, feature, route, provider, model, tokens, unit price, retries, fallback route, latency, status, and outcome.
Should fallback cost count against the customer?
Yes. If fallback completed that customer task, attribute it to the same task and also record original route versus final route for tuning.
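One way to picture this attribution is to roll up every attempt under its customer and task id while keeping both the first and the last route. The sketch below is an assumption-laden example: `summarize_tasks` and the dictionary keys (`customer_id`, `task_id`, `route`, `cost_usd`, `status`) are hypothetical names, and the attempts are assumed to arrive in chronological order.

```python
def summarize_tasks(attempts):
    """Roll up per-attempt records into one summary per (customer, task)."""
    tasks = {}
    for attempt in attempts:
        key = (attempt["customer_id"], attempt["task_id"])
        summary = tasks.setdefault(key, {
            "original_route": attempt["route"],  # route of the first attempt
            "final_route": attempt["route"],
            "cost_usd": 0.0,
            "attempts": 0,
            "status": attempt["status"],
        })
        # Every attempt, failed or not, is charged to the same customer task.
        summary["cost_usd"] += attempt["cost_usd"]
        summary["attempts"] += 1
        summary["final_route"] = attempt["route"]
        summary["status"] = attempt["status"]
    for summary in tasks.values():
        summary["used_fallback"] = summary["original_route"] != summary["final_route"]
    return tasks


attempts = [
    {"customer_id": "c1", "task_id": "t1", "route": "primary", "cost_usd": 0.01, "status": "error"},
    {"customer_id": "c1", "task_id": "t1", "route": "fallback", "cost_usd": 0.03, "status": "ok"},
]
report = summarize_tasks(attempts)
```

Keeping `original_route` next to `final_route` makes it possible to compute fallback rate by route later without re-reading raw logs.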
How do I handle streaming responses?
Write the usage event at request start, then update token counts, status, latency, and final cost when the stream closes or errors.
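A minimal sketch of that pattern, assuming an in-memory `EVENTS` dictionary in place of a real event store and chunks that carry a hypothetical `tokens` count (real providers may report usage differently, for example only at the end of the stream):

```python
import time

EVENTS = {}  # hypothetical in-memory store; stands in for a database table


def tracked_stream(event_id, chunks, price_per_output_token):
    """Record the event immediately, then return a generator that finalizes it."""
    start = time.monotonic()
    event = {"status": "started", "output_tokens": 0,
             "latency_ms": None, "cost_usd": None}
    EVENTS[event_id] = event  # written at request start, before any chunk arrives

    def relay():
        try:
            for chunk in chunks:
                event["output_tokens"] += chunk["tokens"]
                yield chunk["text"]
            event["status"] = "ok"
        except Exception:
            event["status"] = "error"
            raise
        finally:
            # Runs on normal close, on error, and when the consumer stops early.
            if event["status"] == "started":
                event["status"] = "cancelled"
            event["latency_ms"] = (time.monotonic() - start) * 1000
            event["cost_usd"] = event["output_tokens"] * price_per_output_token

    return relay()


chunks = [{"text": "Hel", "tokens": 2}, {"text": "lo", "tokens": 3}]
text = "".join(tracked_stream("evt-1", chunks, 0.00001))
```

Because the final update sits in `finally`, tokens streamed before a failure are still costed and attributed rather than lost.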
Can provider dashboards solve customer attribution?
Usually not. They show account-level spend, but not SaaS-level margin by customer, workspace, plan, feature, or accepted outcome.
What report should founders look at weekly?
Top customers by AI cost, cost per successful task by feature, fallback rate by route, gross margin by plan, and workspaces close to budget caps.