Conclusion
- Measure cost per resolved ticket, including retries and escalations.
- Cheap models work best for FAQ, order-status, and classification flows with strong guardrails.
- Use a stronger fallback model for ambiguous complaints, refunds, policy-sensitive answers, and long-context conversations.
- Track cost by customer, workspace, and conversation before scaling.
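Cost per resolved ticket can be computed with a small aggregation. The sketch below assumes a hypothetical `Attempt` record for each model call, with the cost already converted to USD from your provider's token prices. The field names are illustrative and do not come from any provider API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    """One model call made while handling a ticket (hypothetical schema)."""
    customer: str
    ticket_id: str
    cost_usd: float   # cost of this call, derived from your provider's token prices
    resolved: bool    # True only if this attempt closed the ticket


def cost_per_resolved_ticket(attempts: list[Attempt]) -> dict[str, float]:
    """Total spend per customer divided by that customer's resolved tickets.

    Retries and fallback (escalation) calls stay in the numerator even when they
    resolve nothing, so a cheap model that needs many retries is not flattered
    by its low per-call price.
    """
    spend: dict[str, float] = defaultdict(float)
    resolved: dict[str, set[str]] = defaultdict(set)
    for a in attempts:
        spend[a.customer] += a.cost_usd
        if a.resolved:
            resolved[a.customer].add(a.ticket_id)
    # A customer with spend but no resolved tickets gets infinity, not zero.
    return {
        customer: total / len(resolved[customer]) if resolved[customer] else float("inf")
        for customer, total in spend.items()
    }
```

Here is an illustrative example. Suppose a customer has two tickets. On the first, a cheap call fails ($0.002) and a fallback call resolves it ($0.010). On the second, a cheap call resolves it ($0.002). That customer comes out at about $0.007 per resolved ticket. A customer who has spend but no resolutions shows up as infinity, which flags the problem instead of hiding it.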
What to do next
- Collect 30 anonymized support questions across simple, medium, and hard cases.
- Run them through two low-cost providers plus one fallback model.
- Measure resolution rate, hallucination risk, escalation rate, latency, and total tokens.
- Set routing rules: send simple intents to the cheap model first, and use the fallback for low-confidence, refund, regulated, or policy-sensitive cases.
- Use OpenLLMAPI when you need one endpoint with per-conversation logs and budgets.
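The routing rules above can be sketched as a single function. This sketch assumes an upstream intent classifier that returns an intent label and a confidence score. The intent names, route labels, and the 0.8 threshold are placeholders that you should tune against your own test set.

```python
from dataclasses import dataclass

# Hypothetical route labels; map them to whatever model IDs your providers expose.
CHEAP_ROUTE = "cheap"
FALLBACK_ROUTE = "fallback"

# Placeholder intent labels (assumed to come from your own classifier).
SENSITIVE_INTENTS = {"refund", "regulated", "policy"}  # always go to the stronger model
SIMPLE_INTENTS = {"faq", "order_status"}               # eligible for the cheap model


@dataclass
class Classification:
    intent: str
    confidence: float  # 0.0 to 1.0, as reported by the classifier (assumed)


def choose_route(c: Classification, min_confidence: float = 0.8) -> str:
    """Cheap first for confident simple intents; fallback for everything else."""
    if c.intent in SENSITIVE_INTENTS:
        return FALLBACK_ROUTE
    if c.confidence < min_confidence:
        return FALLBACK_ROUTE
    if c.intent in SIMPLE_INTENTS:
        return CHEAP_ROUTE
    # Unknown intents are treated as hard cases rather than guessed at.
    return FALLBACK_ROUTE
```

Unknown intents default to the fallback route. This trades some cost for safety. Once your test set shows that the cheap model handles a new intent well, add that intent to `SIMPLE_INTENTS`.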
Recommended paths
| Provider | Free tier / credits | Best for |
|---|---|---|
| DeepSeek | Verify current pricing | Low-cost reasoning for support workflows |
| Qwen DashScope | Signup credits vary | China-friendly bilingual support bots |
| Zhipu GLM | Signup tokens vary | Domestic fallback and GLM experiments |
| SiliconFlow | Free/open routes vary | China-direct multi-model testing |
| OpenLLMAPI | Trial varies | Routing, cost attribution, and fallback |
Global developer checklist
- Confirm whether signup, billing, and API keys work from your country before writing production code.
- Prefer OpenAI-compatible endpoints when you may need to switch models, regions, or providers later.
- Test free credits with a real smoke prompt and record latency, error shape, streaming behavior, and quota burn.
- Keep at least one fallback route for provider outages, model deprecations, and regional access changes.
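A fallback route can be as simple as trying providers in order and recording what happens at each step. This also captures the latency and error-shape data the smoke test asks for. Here, each provider is modeled as a plain Python callable, which is an assumption made for illustration. In practice, each callable would wrap a client for one endpoint, such as an OpenAI-compatible base URL.

```python
import time
from typing import Callable

# A provider is modeled as a function from prompt to reply text (illustrative assumption).
Provider = Callable[[str], str]


def call_with_fallback(prompt: str, routes: list[tuple[str, Provider]]) -> dict:
    """Try each (name, provider) route in order; record latency and errors per attempt."""
    attempts = []
    for name, provider in routes:
        start = time.perf_counter()
        try:
            reply = provider(prompt)
        except Exception as exc:  # outage, quota exhaustion, deprecated model, etc.
            attempts.append({
                "route": name,
                "ok": False,
                "latency_s": time.perf_counter() - start,
                "error": f"{type(exc).__name__}: {exc}",
            })
            continue  # move on to the next route
        attempts.append({
            "route": name,
            "ok": True,
            "latency_s": time.perf_counter() - start,
            "error": None,
        })
        return {"reply": reply, "route": name, "attempts": attempts}
    # Every route failed; the caller should escalate to a human.
    return {"reply": None, "route": None, "attempts": attempts}
```

Suppose a stub that raises `TimeoutError` is followed by a stub that returns text. The result then has `route == "fallback"` and two entries in `attempts`, and the first entry carries the error string. This is the kind of record worth keeping from each smoke test.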
Production handoff
Track support cost per resolved conversation
Route simple tickets to a cheap model, send hard cases to a fallback, and attribute AI spend to customers before margins disappear.
FAQ
Which provider is cheapest for support bots?
It depends on your ticket mix. DeepSeek, Qwen, GLM, and SiliconFlow are common low-cost options to test, but cost per resolved conversation is the metric that matters.
Can I use only one cheap model?
Not safely in production. Keep a fallback model for ambiguous, policy-heavy, or high-value customer cases.
What should I log?
Customer/workspace, route, model, tokens, latency, retries, confidence, escalation, and final resolution outcome.
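A structured log record for these fields might look like the sketch below. The field names are hypothetical. `conversation_id` is an addition so that spend can be rolled up per conversation as well as per customer and workspace.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SupportCallLog:
    """One log line per model call (illustrative field names)."""
    customer_id: str
    workspace_id: str
    conversation_id: str         # added so cost can be grouped per conversation
    route: str                   # e.g. "cheap" or "fallback"
    model: str                   # model ID as sent to the provider
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    retries: int
    confidence: Optional[float]  # classifier or self-reported confidence, if any
    escalated: bool
    resolution: Optional[str]    # e.g. "resolved", "handed_to_human"; None until known

    def to_json(self) -> str:
        """Serialize as one JSON line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```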
How do I reduce cost without hurting quality?
Use intent classification, retrieval snippets, and short system prompts; cache answers to repeated FAQs; and fall back to the stronger model only when confidence is low.
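Caching repeated FAQs can start as an exact-match lookup on a normalized question. This is a minimal sketch, and `answer_fn` is a hypothetical stand-in for the model call. It is not semantic caching, so paraphrased questions will still miss the cache.

```python
import re
from typing import Callable, Optional


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so trivial variants share a key."""
    text = re.sub(r"[^\w\s]", "", question.lower())
    return " ".join(text.split())


class FaqCache:
    """Exact-match cache for repeated FAQ answers (deliberately simple)."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_answer(self, question: str, answer_fn: Callable[[str], str]) -> str:
        key = normalize(question)
        cached: Optional[str] = self._answers.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        # Cache miss: pay for one model call, then reuse the answer.
        self.misses += 1
        answer = answer_fn(question)
        self._answers[key] = answer
        return answer
```

Only cache answers that are the same for every customer. Order-status or other account-specific replies should bypass the cache.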