Usage budgets

HQ records every LLM call an agent makes and lets you set per-agent monthly budgets. Budgets reset on the first of each calendar month.

What gets tracked

Token usage

HQ tracks input tokens, output tokens, cache-read tokens, and cache-write tokens for every call, and rolls them up per agent per month.
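As a rough illustration of the per-call tracking and monthly rollup described above, the sketch below sums the four token counters per agent and month. The `CallUsage` record, its field names, and the `roll_up` helper are assumptions made for this example; they are not HQ's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CallUsage:
    """One recorded LLM call (hypothetical shape, not HQ's schema)."""
    agent_id: str
    at: datetime  # timezone-aware timestamp of the call
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_tokens",
    "cache_write_tokens",
)


def roll_up(calls):
    """Sum the four token counters per (agent_id, 'YYYY-MM') bucket, in UTC."""
    totals = defaultdict(lambda: dict.fromkeys(TOKEN_FIELDS, 0))
    for call in calls:
        month = call.at.astimezone(timezone.utc).strftime("%Y-%m")
        bucket = totals[(call.agent_id, month)]
        for field in TOKEN_FIELDS:
            bucket[field] += getattr(call, field)
    return dict(totals)
```

Keying the buckets by UTC month keeps the rollup aligned with the monthly budget reset.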

Estimated cost

Calculated from known model pricing. If a model’s pricing isn’t in HQ’s list, usage is recorded as “unmetered” — never silently invented or guessed.
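The "unmetered" rule can be pictured with a minimal sketch like the one below. The price table, the model name `example-model`, the per-million-token rates, and the `estimate_cost` function are all placeholders invented for illustration; they do not reflect HQ's pricing list or any real model's prices.

```python
from decimal import Decimal

# Hypothetical price table: USD per one million tokens. The model name and
# the numbers are placeholders for illustration, not real pricing.
PRICES = {
    "example-model": {
        "input_tokens": Decimal("3.00"),
        "output_tokens": Decimal("15.00"),
        "cache_read_tokens": Decimal("0.30"),
        "cache_write_tokens": Decimal("3.75"),
    },
}

PER_MILLION = Decimal(1_000_000)


def estimate_cost(model, tokens):
    """Return the estimated USD cost, or None when the model is unmetered."""
    rates = PRICES.get(model)
    if rates is None:
        # Unknown pricing: keep the token counts, but do not guess a price.
        return None
    return sum(
        (Decimal(tokens.get(kind, 0)) * rate / PER_MILLION for kind, rate in rates.items()),
        Decimal("0"),
    )
```

Returning `None` rather than `0` keeps "no known price" distinct from "cost nothing", which is the point of marking usage as unmetered.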

Usage is visible on each agent’s detail page and, in aggregate, under Settings → System → Usage.

Setting a budget

Open the agent detail page

Go to Agents → select an agent → scroll to the Usage & Budget section.

Set a monthly limit

Enter a dollar amount (e.g. $10). This is the agent’s spending ceiling for the calendar month; whether it is enforced depends on the hard cutoff setting below. Leave blank for no limit.

Set a warning threshold

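Enter a percentage (e.g. 80). A notification fires when the agent’s spend crosses this percentage of the monthly limit, before the limit itself is reached. For example, with a $10 limit and a threshold of 80, the notification fires at $8 of spend.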

Enable or disable hard cutoff

Toggle Stop agent at limit on or off. This controls what happens when the monthly limit is reached (see below).
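The three settings above can be modeled as a small record, sketched here under the assumption that the fields map one-to-one to the form. The class name `BudgetSettings` and its field names are hypothetical and are not taken from HQ.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class BudgetSettings:
    """Hypothetical mirror of the three form fields; not HQ's data model."""
    monthly_limit_usd: Optional[Decimal] = None  # None means "left blank": no limit
    warning_percent: Optional[int] = None        # e.g. 80
    stop_at_limit: bool = False                  # the "Stop agent at limit" toggle

    def warning_amount(self) -> Optional[Decimal]:
        """Dollar amount at which the warning fires, or None if it cannot fire."""
        if self.monthly_limit_usd is None or self.warning_percent is None:
            return None
        return self.monthly_limit_usd * self.warning_percent / Decimal(100)


# A $10 monthly limit with an 80% warning threshold warns at $8.
settings = BudgetSettings(Decimal("10"), 80, stop_at_limit=True)
warn_at = settings.warning_amount()
```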

What happens at the limit

Condition	Behavior
Warning threshold crossed	A notification is created. The agent keeps working.
Monthly limit reached + hard cutoff ON	The runtime blocks further LLM replies. The dispatcher stops waking the agent for background work. The agent is effectively paused until the next month or until you raise the limit.
Monthly limit reached + hard cutoff OFF	A notification fires, but the agent keeps working. No enforcement.
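The table can be read as a small decision function. The sketch below is one possible encoding, not HQ's runtime logic: the `BudgetState` names and the `evaluate` signature are invented for illustration, and treating "crossed" and "reached" as greater-than-or-equal comparisons is an assumption.

```python
from decimal import Decimal
from enum import Enum
from typing import Optional


class BudgetState(Enum):
    OK = "ok"                  # below every threshold
    WARNING = "warning"        # warning threshold crossed: notify, keep working
    OVER_LIMIT = "over_limit"  # limit reached, hard cutoff OFF: notify, keep working
    BLOCKED = "blocked"        # limit reached, hard cutoff ON: agent is paused


def evaluate(
    spend: Decimal,
    limit: Optional[Decimal],
    warning_percent: Optional[int],
    stop_at_limit: bool,
) -> BudgetState:
    """Map current-month spend and budget settings to one row of the table."""
    if limit is None:
        return BudgetState.OK  # no limit set, so nothing to enforce
    if spend >= limit:
        return BudgetState.BLOCKED if stop_at_limit else BudgetState.OVER_LIMIT
    # Compare without division: spend / limit >= warning_percent / 100.
    if warning_percent is not None and spend * 100 >= limit * warning_percent:
        return BudgetState.WARNING
    return BudgetState.OK
```

Checking the limit before the warning threshold means an agent that is over its limit is never reported as merely "warning".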

Start without a hard cutoff to see how much an agent actually uses over a few weeks, then set limits based on real data. It’s easy to underestimate how much an active agent spends.

Budget periods

Budgets roll over automatically on the first of each calendar month (UTC). Usage from the previous month is preserved in agent_usage for historical reporting — it’s never deleted. If an agent hits its hard cutoff mid-month, you can:

Raise the limit on the agent detail page — the agent becomes active again immediately
Wait for the reset — the agent resumes automatically at the start of next month
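Because periods are defined in UTC, a call made late on the last day of a month in a western time zone can already belong to the next budget period. The following sketch shows one way to compute the period boundaries; `period_start` and `next_reset` are illustrative helper names, not HQ functions.

```python
from datetime import datetime, timedelta, timezone


def period_start(now: datetime) -> datetime:
    """First instant (UTC) of the calendar month that contains `now`."""
    utc = now.astimezone(timezone.utc)
    return utc.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def next_reset(now: datetime) -> datetime:
    """First instant (UTC) of the following calendar month."""
    start = period_start(now)
    if start.month == 12:
        return start.replace(year=start.year + 1, month=1)
    return start.replace(month=start.month + 1)


# 20:00 on 31 January at UTC-5 is 01:00 on 1 February in UTC,
# so this call already counts against the February budget.
local = datetime(2025, 1, 31, 20, 0, tzinfo=timezone(timedelta(hours=-5)))
february = period_start(local)
```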

Model selection and cost

Different models have very different costs per token. Per-agent model selection lets you optimize spend across your fleet:

Assign expensive reasoning models (Claude Opus, GPT-4.1 with high thinking) only to agents doing complex work
Use cheaper fast models (Gemini Flash, Claude Haiku, GPT-4.1 mini) for agents doing simple lookups or repetitive tasks
Use per-task thinking overrides to temporarily boost reasoning for one-off complex tasks without increasing the agent’s baseline cost

Combined with budgets, this gives you precise control: set a generous budget for an agent running an expensive model, and a tight budget for agents on cheap models that shouldn’t be doing much work.

Checking spend across all agents

Go to Settings → System → Usage to see a summary table of all agents with their current-period spend, monthly limit, and how close each one is to its threshold. This view is useful for identifying which agents are consuming the most model budget before costs compound.
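If you export or copy these figures, the same ranking can be reproduced with a few lines of code. This is only a sketch: the `(agent, spend, limit)` row shape and the `summarize` function are assumptions for this example, not an HQ export format or API.

```python
from decimal import Decimal


def summarize(rows):
    """Rank agents by share of monthly limit used, highest first.

    `rows` is an iterable of (agent_name, spend, limit_or_None) tuples.
    Agents with no limit get a share of None and are listed last.
    """
    out = []
    for name, spend, limit in rows:
        share = None if not limit else spend / limit * 100
        out.append((name, spend, limit, share))
    out.sort(key=lambda r: (r[3] is None, -(r[3] or 0)))
    return out


rows = [
    ("research-agent", Decimal("2"), Decimal("10")),
    ("support-agent", Decimal("9"), Decimal("10")),
    ("scratch-agent", Decimal("50"), None),  # no limit set
]
ranked = summarize(rows)
```

The agent names and amounts here are made-up sample data.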
