What gets tracked
Token usage
Input tokens, output tokens, cache-read tokens, and cache-write tokens — tracked per call and rolled up per agent per month.
Estimated cost
Calculated from known model pricing. If a model’s pricing isn’t in HQ’s list, usage is recorded as “unmetered” — never silently invented or guessed.
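The cost rule above can be sketched as follows. This is a minimal illustration, not HQ's implementation: the `Usage` record, the `PRICING` table, its per-million-token rates, the model name, and `estimate_cost` are all hypothetical placeholders. The point it shows is that a model missing from the pricing table yields no cost at all rather than a guessed one.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Usage:
    """Token counts for one LLM call (the four tracked categories)."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


# Hypothetical prices in USD per million tokens; placeholder values only.
PRICING = {
    "example-model": {
        "input_tokens": Decimal("3.00"),
        "output_tokens": Decimal("15.00"),
        "cache_read_tokens": Decimal("0.30"),
        "cache_write_tokens": Decimal("3.75"),
    },
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, usage: Usage) -> Optional[Decimal]:
    """Return the estimated USD cost, or None when the model is unmetered."""
    rates = PRICING.get(model)
    if rates is None:
        return None  # pricing unknown: record as unmetered, never guess
    total = Decimal(0)
    for field, rate in rates.items():
        total += Decimal(getattr(usage, field)) * rate / MILLION
    return total
```

Using `Decimal` rather than floats keeps per-call amounts exact when many small costs are summed into a monthly total.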
Setting a budget
Set a monthly limit
Enter a dollar amount (e.g. $10). This is the hard ceiling for the calendar month. Leave blank for no limit.
Set a warning threshold
Enter a percentage (e.g. 80). A notification fires when the agent crosses this percentage of the monthly limit, before the agent is stopped.
What happens at the limit
| Setting | Behavior |
|---|---|
| Warning threshold crossed | A notification is created. The agent keeps working. |
| Monthly limit reached + hard cutoff ON | The runtime blocks further LLM replies. The dispatcher stops waking the agent for background work. The agent is effectively paused until the next month or until you raise the limit. |
| Monthly limit reached + hard cutoff OFF | A notification fires, but the agent keeps working. No enforcement. |
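The table can be read as a small decision function. The sketch below is an assumption-laden illustration: the `Budget` fields, the `BudgetAction` names, and `evaluate` are hypothetical, and it assumes "reached" and "crossed" both mean spend at or above the value. It is also stateless, so it does not model firing each notification only once.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class BudgetAction(Enum):
    OK = "ok"                    # under every threshold
    WARN = "warn"                # warning threshold crossed, agent keeps working
    NOTIFY_ONLY = "notify_only"  # limit reached, hard cutoff OFF
    BLOCK = "block"              # limit reached, hard cutoff ON


@dataclass(frozen=True)
class Budget:
    monthly_limit: Optional[Decimal]  # None means no limit (field left blank)
    warning_pct: Optional[int]        # e.g. 80; None means no warning
    hard_cutoff: bool


def evaluate(budget: Budget, spent: Decimal) -> BudgetAction:
    """Map this month's spend to the behavior listed in the table."""
    if budget.monthly_limit is None:
        return BudgetAction.OK
    if spent >= budget.monthly_limit:
        if budget.hard_cutoff:
            return BudgetAction.BLOCK
        return BudgetAction.NOTIFY_ONLY
    if budget.warning_pct is not None:
        threshold = budget.monthly_limit * budget.warning_pct / 100
        if spent >= threshold:
            return BudgetAction.WARN
    return BudgetAction.OK
```

With a $10 limit and an 80 percent threshold, spend of $8 maps to `WARN`, and spend of $10 maps to `BLOCK` or `NOTIFY_ONLY` depending on the hard cutoff setting.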
Budget periods
Budgets roll over automatically on the first of each calendar month (UTC). Usage from the previous month is preserved in `agent_usage` for historical reporting; it’s never deleted.
If an agent hits its hard cutoff mid-month, you can:
- Raise the limit in the agent detail page — the agent becomes active again immediately
- Wait for the reset — the agent resumes automatically at the start of next month
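Because periods are defined in UTC, a call made late on the last day of a month in a western time zone can already count toward the next month. The sketch below shows one way to derive a period key and the next reset time. The helper names `period_key` and `next_reset` and the `YYYY-MM` key format are assumptions for illustration, not HQ's actual schema.

```python
from datetime import datetime, timedelta, timezone


def _to_utc(ts: datetime) -> datetime:
    """Convert an aware timestamp to UTC; reject naive ones to avoid guessing."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc)


def period_key(ts: datetime) -> str:
    """Return the UTC calendar-month key (YYYY-MM) a call is billed to."""
    return _to_utc(ts).strftime("%Y-%m")


def next_reset(ts: datetime) -> datetime:
    """Return the first instant of the next UTC calendar month."""
    utc = _to_utc(ts)
    if utc.month == 12:
        return datetime(utc.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(utc.year, utc.month + 1, 1, tzinfo=timezone.utc)


# 22:00 on 31 January at UTC-5 is 03:00 on 1 February in UTC.
local = datetime(2025, 1, 31, 22, 0, tzinfo=timezone(timedelta(hours=-5)))
```

Here `period_key(local)` falls in the February period, so an agent paused by a hard cutoff would resume at `next_reset`, not at local midnight.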
Model selection and cost
Different models have very different costs per token. Per-agent model selection lets you optimize spend across your fleet:
- Assign expensive reasoning models (Claude Opus, GPT-4.1 with high thinking) only to agents doing complex work
- Use cheaper fast models (Gemini Flash, Claude Haiku, GPT-4.1 mini) for agents doing simple lookups or repetitive tasks
- Use per-task thinking overrides to temporarily boost reasoning for one-off complex tasks without increasing the agent’s baseline cost
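To see why model assignment matters for a budget, the following sketch compares monthly spend for the same workload on two price tiers. The tier names and prices are made-up placeholders chosen only to show the arithmetic; they are not real pricing for any model named above.

```python
from decimal import Decimal

# Hypothetical tiers and prices (USD per million tokens); placeholders only.
TIER_PRICES = {
    "reasoning": {"input": Decimal("15.00"), "output": Decimal("75.00")},
    "fast": {"input": Decimal("0.25"), "output": Decimal("1.25")},
}


def monthly_cost(tier: str, calls: int, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate monthly spend for an agent making `calls` similar calls."""
    price = TIER_PRICES[tier]
    per_call = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / Decimal(1_000_000)
    return per_call * calls
```

Under these placeholder prices, 1,000 calls of 2,000 input and 500 output tokens each would fit easily inside a $10 monthly limit on the fast tier but exceed it several times over on the reasoning tier.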

