HQ is model-agnostic with a bring-your-own-key approach. You connect your own API keys and pay the model provider directly. You can connect multiple providers and pick different ones per agent.
Your first provider is connected during the onboarding wizard. This page covers adding additional providers or managing existing ones from Settings → Connections.
Where to go: Settings → Connections → Add connection

Supported providers

These providers are the simplest to connect: paste a key and you're done.
| Provider | Get your key | Typical cost |
| --- | --- | --- |
| OpenAI (GPT-4.1, o3, o4-mini) | platform.openai.com/api-keys | ~$0.01–0.06 per 1K tokens |
| Anthropic (Claude Opus, Sonnet, Haiku) | console.anthropic.com/settings/keys | ~$0.003–0.075 per 1K tokens |
| Google Gemini | aistudio.google.com/app/apikey | ~$0.007 per 1K tokens |
| DeepSeek | platform.deepseek.com | Very low cost |
| Mistral | console.mistral.ai | Competitive pricing |
| Groq | console.groq.com/keys | Fast inference, usage-based |
| OpenRouter | openrouter.ai/keys | Routes to 100+ models via one key |
| Together AI | api.together.ai | Open-source model hosting |
| Fireworks AI | fireworks.ai | Fast, cheap open-source models |
| Perplexity | perplexity.ai | Search-augmented models |
| xAI (Grok) | console.x.ai | Grok models |
| Cohere | dashboard.cohere.com | Command R and enterprise models |
Not sure which to start with? Anthropic Claude Sonnet and OpenAI GPT-4.1 are both strong general-purpose choices. OpenRouter is useful if you want to experiment with many models without managing multiple keys.
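
To turn the per-1K-token prices in the table above into a rough budget, multiply your expected token volume by the listed rate. The sketch below assumes a single blended price; most providers bill input and output tokens at different rates, so treat the result as a ballpark. `estimate_cost` is a hypothetical helper, not part of HQ.

```shell
# Rough cost estimate: tokens / 1000 * price per 1K tokens.
# estimate_cost is a hypothetical helper, not an HQ command.
estimate_cost() {
  tokens="$1"        # total tokens you expect to use
  price_per_1k="$2"  # USD per 1K tokens, taken from the table above
  awk -v t="$tokens" -v p="$price_per_1k" 'BEGIN { printf "%.2f\n", t / 1000 * p }'
}

# Example: 250,000 tokens at $0.01 per 1K tokens
estimate_cost 250000 0.01   # prints 2.50
```

Running the same calculation with the low and high end of a provider's range gives a useful lower and upper bound.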

Step-by-step: adding a provider

1. Open Connections

Go to Settings → Connections in the HQ sidebar.

2. Click Add connection

Select your provider from the catalog. Each card shows the auth method.

3. Complete auth

  • API key: paste your key, click Connect
  • OAuth: click Start, follow the browser flow, return to HQ
  • Local URL: enter the endpoint URL (see Ollama/LM Studio guides below)

4. Set a default model

After connecting, you’ll be prompted to pick a default model. This applies to all agents unless overridden per-agent.
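
If a connection fails, it can help to confirm the key works outside HQ first. The sketch below asks the provider to list its models and reports only the HTTP status. It covers OpenAI and Anthropic as examples and assumes `curl` is installed. The function names and the status wording are illustrative, and how HQ itself validates keys is not documented here.

```shell
# Map an HTTP status code to a short verdict.
explain_status() {
  case "$1" in
    200)     echo "key accepted" ;;
    401|403) echo "key rejected" ;;
    429)     echo "rate limited" ;;
    *)       echo "unexpected status $1" ;;
  esac
}

# Call the provider's "list models" endpoint and keep only the status code.
# Requires curl and network access.
check_key() {
  provider="$1"; key="$2"
  case "$provider" in
    openai)
      status=$(curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bearer $key" \
        https://api.openai.com/v1/models) ;;
    anthropic)
      status=$(curl -s -o /dev/null -w '%{http_code}' \
        -H "x-api-key: $key" -H "anthropic-version: 2023-06-01" \
        https://api.anthropic.com/v1/models) ;;
    *)
      echo "unsupported provider: $provider" >&2
      return 2 ;;
  esac
  explain_status "$status"
}

# Usage (read the key from the environment rather than hard-coding it):
#   check_key openai "$OPENAI_API_KEY"
```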

Ollama (free, runs on your machine)

Ollama lets you run models like Llama 3, Mistral, Gemma, and others locally with no API costs. Everything stays on your machine.
1. Install Ollama

Download from ollama.com and install it.

2. Pull a model

    ollama pull llama3.2        # fast, general purpose
    ollama pull mistral         # good at instruction following
    ollama pull qwen2.5-coder   # strong at code

Check available models at ollama.com/library.

3. Connect in HQ

Go to Settings → Connections → Add connection → Ollama. Enter this URL:

    http://host.docker.internal:11434

host.docker.internal is how Docker reaches your Mac/Windows machine. On Linux, use your host IP (e.g. http://172.17.0.1:11434).

4. Verify

Click Connect. HQ probes the endpoint and lists available models. Pick one as the default.
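
If the probe fails, checking Ollama directly from the host narrows down the cause. The sketch below picks the URL to enter in HQ based on the operating system, following the guidance above, and defines a helper that lists pulled models through Ollama's `/api/tags` endpoint. The helper names are hypothetical, and `172.17.0.1` is only the example host IP from this page; yours may differ.

```shell
# Choose the Ollama base URL to enter in HQ.
# Argument: OS name as printed by `uname -s` (Linux, Darwin, ...).
ollama_url_for() {
  case "$1" in
    Linux) echo "http://172.17.0.1:11434" ;;            # example host IP from this page
    *)     echo "http://host.docker.internal:11434" ;;  # macOS and Windows
  esac
}

# List locally pulled models from the host itself.
# Requires curl and a running Ollama.
list_ollama_models() {
  curl -s http://localhost:11434/api/tags
}

echo "Enter in HQ: $(ollama_url_for "$(uname -s)")"
```

If `list_ollama_models` returns JSON on the host but HQ still cannot connect on Linux, one possible cause is that Ollama listens only on 127.0.0.1 by default. Setting the `OLLAMA_HOST` environment variable before starting Ollama changes the bind address.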
Local models need enough RAM to load. Llama 3.2 (3B) needs ~2 GB; Llama 3.1 (8B) needs ~6 GB; Llama 3.1 (70B) needs ~48 GB. If Ollama is slow or crashes, try a smaller model.
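
As a quick way to apply the RAM figures above, the sketch below prints the largest of the three listed models that fits in a given amount of free memory. `pick_model` is a hypothetical helper, the thresholds are the approximate numbers from this page, and it accepts whole gigabytes only. Leave extra headroom for the operating system and other applications.

```shell
# Print the largest listed model that fits in the given free RAM (whole GB).
# Thresholds are the approximate figures quoted on this page.
pick_model() {
  free_gb="$1"
  if   [ "$free_gb" -ge 48 ]; then echo "llama3.1:70b"
  elif [ "$free_gb" -ge 6 ];  then echo "llama3.1:8b"
  elif [ "$free_gb" -ge 2 ];  then echo "llama3.2:3b"
  else echo "none"
  fi
}

pick_model 16   # prints llama3.1:8b
```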

LM Studio

LM Studio is a Mac/Windows/Linux app with a GUI for downloading and running local models.
1. Install and load a model

Download from lmstudio.ai, open it, browse the model library, and download a model.

2. Start the local server

In LM Studio, go to Local Server (left sidebar) → click Start Server. Note the port (default is 1234).

3. Connect in HQ

Go to Settings → Connections → Add connection → LM Studio (or use “OpenAI-compatible”). Enter:

    http://host.docker.internal:1234
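
If you changed the port in LM Studio, the URL changes with it. The sketch below builds the URL to enter in HQ from the port and defines a helper that confirms the server is up by listing models through LM Studio's OpenAI-compatible `/v1/models` endpoint. The helper names are hypothetical, and the check runs from the host, so it uses `localhost` rather than `host.docker.internal`.

```shell
# Build the URL to enter in HQ from the port LM Studio reports (default 1234).
lmstudio_url() {
  port="${1:-1234}"
  echo "http://host.docker.internal:${port}"
}

# From the host, list the models the local server exposes.
# Requires curl and a started LM Studio server.
list_lmstudio_models() {
  port="${1:-1234}"
  curl -s "http://localhost:${port}/v1/models"
}

lmstudio_url   # prints http://host.docker.internal:1234
```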

Per-agent model overrides

Each agent can use a different provider and model. This lets you run a cheap local model for a background researcher while a cofounder agent uses Claude Opus. To override for one agent: open the agent’s detail page → look for the Model section in the right rail → pick a model and thinking level.

Model selection

The model picker shows models grouped by provider, filtered to only the providers you’ve connected. If a provider can be connected in more than one way (e.g. an OpenAI API key and an OpenAI subscription), the picker still shows one unified “OpenAI” group. With only one of those connections in place, models route through it. With both connected, each model appears with “Subscription” and “API” route labels so you can choose which billing path to use.

Thinking level

For models that support extended thinking (Claude, o-series), you can set a thinking level per agent:
| Level | Behavior |
| --- | --- |
| None | No extended thinking (fastest, cheapest) |
| Low | Brief internal reasoning |
| Medium | Moderate reasoning depth |
| High | Maximum reasoning depth (most capable, highest cost) |

Per-task overrides

When creating or editing a task assigned to an agent, you can override the thinking level for that specific task. This is useful for one-off complex tasks that need deeper reasoning without changing the agent’s default. Task-level overrides are passed to the agent session at wake time via the inbox dispatch mechanism.

Resolution order

The model used for any given agent session is resolved in this order; the first entry that is set wins:
  1. Per-task override (if the wake was triggered by a task with model_override or thinking_override)
  2. Agent default (set in the agent detail sidebar)
  3. Gateway default (the first connected model, or the workspace default from Settings → Connections)
Mix and match freely. A common setup: one expensive model for agents doing complex reasoning, a cheap/fast model for agents doing simple lookups, and Ollama for agents that handle private data.
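
The resolution order above amounts to "take the first value that is set". The sketch below illustrates that rule for both the model and the thinking level. `resolve_setting` is illustrative and not HQ's actual code, and the model names are placeholders.

```shell
# Return the first non-empty argument, in the order:
# task override, agent default, gateway default.
# Illustrative only; not HQ's implementation.
resolve_setting() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1   # nothing was set at any level
}

# No task override, agent default set: the agent default wins.
resolve_setting "" "claude-sonnet" "llama3.2"   # prints claude-sonnet

# Task-level thinking override present: it wins over the agent default.
resolve_setting "high" "low" "none"             # prints high
```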

Rotating or removing a provider

To remove a provider or rotate keys:
  1. Go to Settings → Connections
  2. Find the provider → click Remove or Update key
  3. For a key rotation: click Remove, then re-add with the new key
If you remove a provider that’s set as the default for some agents, those agents will fail until you set a new default.
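
Before removing a provider, it helps to know which agents depend on it. This page does not describe an export of agent settings, so the sketch below assumes a hand-maintained list of `agent provider` pairs, one per line. Both the input format and the `agents_using` helper are hypothetical.

```shell
# Read "agent provider" pairs on stdin and print the agents
# whose default model comes from the given provider.
agents_using() {
  awk -v p="$1" '$2 == p { print $1 }'
}

printf '%s\n' "researcher ollama" "cofounder anthropic" "writer anthropic" |
  agents_using anthropic
# prints:
# cofounder
# writer
```

Any agent printed here needs a new default before the provider is removed.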