Connect a model provider

HQ is model-agnostic with a bring-your-own-key approach. You connect your own API keys and pay the model provider directly. You can connect multiple providers and pick different ones per agent.

Your first provider is connected during the onboarding wizard. This page covers adding more providers and managing existing ones from Settings → Connections.

Where to go: Settings → Connections → Add connection

Supported providers

Providers fall into three groups: API key providers, OAuth / interactive providers, and local / self-hosted servers.

API key providers

These are the simplest to connect: paste a key and you’re done.

Provider	Get your key	Typical cost
OpenAI (GPT-4.1, o3, o4-mini)	platform.openai.com/api-keys	~$0.01–0.06 per 1K tokens
Anthropic (Claude Opus, Sonnet, Haiku)	console.anthropic.com/settings/keys	~$0.003–0.075 per 1K tokens
Google Gemini	aistudio.google.com/app/apikey	~$0.007 per 1K tokens
DeepSeek	platform.deepseek.com	Very low cost
Mistral	console.mistral.ai	Competitive pricing
Groq	console.groq.com/keys	Fast inference, usage-based
OpenRouter	openrouter.ai/keys	Routes to 100+ models via one key
Together AI	api.together.ai	Open-source model hosting
Fireworks AI	fireworks.ai	Fast, cheap open-source models
Perplexity	perplexity.ai	Search-augmented models
xAI (Grok)	console.x.ai	Grok models
Cohere	dashboard.cohere.com	Command R and enterprise models

Not sure which to start with? Anthropic Claude Sonnet or OpenAI GPT-4.1 are the most capable general-purpose choices. OpenRouter is useful if you want to experiment with many models without managing multiple keys.

OAuth / interactive providers

These require a browser login flow. HQ walks you through it in the UI.

Provider	Notes
OpenAI (Subscription)	OAuth flow — log in with your OpenAI account. Uses your ChatGPT/Codex subscription credits. Same models as the API key, different billing path
GitHub Copilot	Device-code flow — HQ shows a code, you enter it on GitHub

For these providers, HQ opens an interactive auth flow in the Connections UI. Follow the on-screen prompts.

OpenAI API key vs subscription: Both serve the same models (GPT-4.1, o3, o4-mini, etc.). The API key bills per token to your OpenAI platform account. The subscription route draws on your ChatGPT Plus/Pro/Team credits, so it is “free at the margin” if you already pay for a subscription. If you connect both, the model picker shows each model with “Subscription” and “API” labels so you can choose. Codex Mini is subscription-exclusive and appears only when the subscription connection is active.

Local / self-hosted

Run models on your own machine: no API costs, fully private.

Provider	What it is	Setup
Ollama	Easy local model runner	See walkthrough below
LM Studio	GUI app for local models	See walkthrough below
vLLM	High-performance serving	Set URL to your vLLM endpoint
Any OpenAI-compatible server	LiteLLM, LocalAI, etc.	Set base URL in HQ

Step-by-step: adding a provider

Open Connections

Go to Settings → Connections in the HQ sidebar.

Click Add connection

Select your provider from the catalog. Each card shows the auth method.

Complete auth

API key: paste your key, click Connect
OAuth: click Start, follow the browser flow, return to HQ
Local URL: enter the endpoint URL (see Ollama/LM Studio guides below)

Set a default model

After connecting, you’ll be prompted to pick a default model. This applies to all agents unless overridden per-agent.
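
Before pasting an API key in the Complete auth step, it can help to confirm that the provider accepts it. The Python below is a minimal sketch, assuming an OpenAI key and OpenAI's public `GET /v1/models` endpoint. Other providers use different URLs and headers, and the helper names (`build_key_check_request`, `key_is_valid`) are illustrative, not part of HQ.

```python
import urllib.error
import urllib.request

# OpenAI's model-listing endpoint; other providers differ.
OPENAI_MODELS_URL = "https://api.openai.com/v1/models"


def build_key_check_request(api_key: str) -> urllib.request.Request:
    """Build a GET request that lists models, authenticated with the key."""
    return urllib.request.Request(
        OPENAI_MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def key_is_valid(api_key: str, timeout: float = 10.0) -> bool:
    """Return True if the key is accepted, False if the provider answers 401."""
    request = build_key_check_request(api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:  # unauthorized: the key was rejected
            return False
        raise  # other errors (rate limits, outages) are not a verdict on the key
```

Calling `key_is_valid(...)` with a key read from an environment variable keeps the key out of shell history and source files.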

Ollama (free, runs on your machine)

Ollama lets you run models like Llama 3, Mistral, Gemma, and others locally with no API costs. Everything stays on your machine.

Install Ollama

Download from ollama.com and install it.

Pull a model

ollama pull llama3.2        # fast, general purpose
ollama pull mistral         # good at instruction following
ollama pull qwen2.5-coder   # strong at code

Check available models at ollama.com/library.

Connect in HQ

Go to Settings → Connections → Add connection → Ollama. Enter this URL:

http://host.docker.internal:11434

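This URL assumes HQ runs in Docker, where localhost points at the container rather than your machine. host.docker.internal is the hostname Docker provides for reaching the host on Mac and Windows. On Linux, use your host IP instead (e.g. http://172.17.0.1:11434, the default Docker bridge address). Ollama listens on 127.0.0.1 by default, so on Linux you may also need to start it with OLLAMA_HOST=0.0.0.0 before the container can reach it.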

Verify

Click Connect. HQ probes the endpoint and lists available models. Pick one as the default.
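
If the probe fails, a comparable check can be run by hand. How HQ probes the endpoint is not documented here, so the Python below is only a sketch. It assumes Ollama's `GET /api/tags` endpoint, which lists locally pulled models. Run it on the host machine, where Ollama is reachable at localhost rather than host.docker.internal. The function names are illustrative.

```python
import json
import urllib.request


def parse_ollama_models(payload: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return parse_ollama_models(json.load(response))
```

An empty list from `list_ollama_models()` means the server is up but no model has been pulled yet. A connection error means Ollama is not running or not listening on that address.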

Local models need enough RAM to load. Llama 3.2 (3B) needs ~2 GB; Llama 3.1 (8B) needs ~6 GB; Llama 3.1 (70B) needs ~48 GB. If Ollama is slow or crashes, try a smaller model.
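
The RAM figures above can be turned into a quick fit check. This is a minimal sketch that uses only the three approximate figures quoted on this page. The `headroom_gb` parameter, which reserves memory for everything else on the machine, is an assumption added for illustration.

```python
# Approximate RAM needs quoted on this page, in GB, smallest model first.
MODEL_RAM_GB = [
    ("Llama 3.2 (3B)", 2.0),
    ("Llama 3.1 (8B)", 6.0),
    ("Llama 3.1 (70B)", 48.0),
]


def largest_model_that_fits(available_gb: float, headroom_gb: float = 0.0):
    """Return the largest listed model that fits in the RAM budget, or None."""
    budget = available_gb - headroom_gb
    fitting = [name for name, need_gb in MODEL_RAM_GB if need_gb <= budget]
    # The list is ordered smallest first, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None
```

For example, `largest_model_that_fits(16, headroom_gb=4)` picks the 8B model, since 12 GB covers its ~6 GB need but not the 70B model's ~48 GB.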

LM Studio

LM Studio is a Mac/Windows/Linux app with a GUI for downloading and running local models.

Install and load a model

Download from lmstudio.ai, open it, browse the model library, and download a model.

Start the local server

In LM Studio, go to Local Server (left sidebar) → click Start Server. Note the port (default is 1234).

Connect in HQ

Go to Settings → Connections → Add connection → LM Studio (or use “OpenAI-compatible”). Enter:

http://host.docker.internal:1234
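
To confirm the server is up before connecting, you can query it from the host machine. A minimal Python sketch, assuming the LM Studio server exposes the OpenAI-style `GET /v1/models` endpoint on the default port 1234. The same check should apply to other OpenAI-compatible servers at their own base URL. The function names are illustrative.

```python
import json
import urllib.request


def parse_openai_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [model["id"] for model in payload.get("data", [])]


def list_openai_compatible_models(base_url: str = "http://localhost:1234") -> list[str]:
    """List the models served by an OpenAI-compatible server."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_openai_model_ids(json.load(response))
```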

Per-agent model overrides

Each agent can use a different provider and model. This lets you run a cheap local model for a background researcher while a cofounder agent uses Claude Opus. To override for one agent: open the agent’s detail page → look for the Model section in the right rail → pick a model and thinking level.

Model selection

The model picker groups models by provider and shows only providers you’ve connected. Connections that serve the same models (e.g. an OpenAI API key and an OpenAI subscription) share one unified “OpenAI” group. With only one of them connected, models route through that connection. With both connected, each model appears with “Subscription” and “API” route labels so you can choose which billing path to use.
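
The grouping can be pictured with a small sketch. Everything here is hypothetical: the `Connection` type, the `build_picker` function, and the model ids are invented for illustration and do not describe HQ's internals. The sketch also assumes that a model served by only one route is shown without a label.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    provider: str            # group name, e.g. "OpenAI"
    route: str               # billing path: "API" or "Subscription"
    models: tuple[str, ...]  # model ids this connection can serve


def build_picker(connections: list[Connection]) -> dict[str, list[str]]:
    """Group models by provider; label routes only where two routes compete."""
    routes_by_model: dict[tuple[str, str], list[str]] = defaultdict(list)
    for conn in connections:
        for model in conn.models:
            routes_by_model[(conn.provider, model)].append(conn.route)

    picker: dict[str, list[str]] = defaultdict(list)
    for (provider, model), routes in routes_by_model.items():
        if len(routes) == 1:
            picker[provider].append(model)  # single route: no label needed
        else:
            picker[provider].extend(f"{model} ({route})" for route in routes)
    return dict(picker)
```

With only an API connection, the result is a plain "OpenAI" group. Adding a subscription connection that serves the same models turns each shared entry into an "(API)" and a "(Subscription)" variant.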

Thinking level

For models that support extended thinking (Claude, o-series), you can set a thinking level per agent:

Level	Behavior
None	No extended thinking (fastest, cheapest)
Low	Brief internal reasoning
Medium	Moderate reasoning depth
High	Maximum reasoning depth (most capable, highest cost)

Per-task overrides

When creating or editing a task assigned to an agent, you can override the thinking level for that specific task. This is useful for one-off complex tasks that need deeper reasoning without changing the agent’s default. Task-level overrides are passed to the agent session at wake time via the inbox dispatch mechanism.

Resolution order

The model used for any given agent session follows this cascade, highest priority first:

Per-task override (if the wake was triggered by a task with model_override or thinking_override)
Agent default (set in the agent detail sidebar)
Gateway default (the first connected model, or the workspace default from Settings → Connections)
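
The cascade above can be sketched as a first-match lookup. This is an illustration under stated assumptions, not HQ's implementation. The `Settings` type and `resolve` function are hypothetical, and the sketch assumes the model and the thinking level are resolved independently and that the gateway layer can carry a thinking level. The task layer stands in for the task's model_override and thinking_override fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Python None means "not set at this layer". It is distinct from the
    # thinking level named "None", written here as the string "none".
    model: Optional[str] = None
    thinking: Optional[str] = None


def resolve(task: Settings, agent: Settings, gateway: Settings) -> Settings:
    """Take each field from the first layer that sets it: task, agent, gateway."""

    def first_set(*values: Optional[str]) -> Optional[str]:
        return next((value for value in values if value is not None), None)

    return Settings(
        model=first_set(task.model, agent.model, gateway.model),
        thinking=first_set(task.thinking, agent.thinking, gateway.thinking),
    )
```

For example, a task that sets only a thinking override keeps the agent's model and changes just the reasoning depth for that session.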

Mix and match freely. A common setup: one expensive model for agents doing complex reasoning, a cheap/fast model for agents doing simple lookups, and Ollama for agents that handle private data.

Rotating or removing a provider

To remove a provider or rotate keys:

Go to Settings → Connections
Find the provider → click Remove or Update key
For a key rotation: click Remove, then re-add with the new key

If you remove a provider that’s set as the default for some agents, those agents will fail until you set a new default.
