Knowledge system

HQ uses a single knowledge system — knowledge_items — to store all workspace knowledge: pages, skills, files, and external sources. This replaces the earlier split between separate documents and assets tables.

Kinds

Every knowledge item has a kind that determines how it’s stored and rendered:

Kind	What it is	Content model
`page`	Rich-text document (company briefs, style guides, meeting notes)	JSON content + plain_text
`skill`	Structured procedure or SOP	JSON content + plain_text
`file`	Uploaded file (PDF, image, spreadsheet, audio, video)	`file_url` + `mime_type` + `file_size`
`source`	Externally synced content from a connected integration	`source_connection_id` + `source_external_id` + sync metadata

Pages and skills support rich editing in the UI. Files go through a processing pipeline (upload, extract text, embed). Sources sync from external integrations and track their sync status.
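As a rough illustration of the content model in the table above, the sketch below checks that an item carries the fields its kind expects. The function name `missing_fields` and the dict-based item shape are assumptions made for this example, and the unspecified "sync metadata" columns for sources are left out.

```python
from typing import Any

# Fields each kind is expected to carry, taken from the Kinds table.
# Sync metadata for sources is omitted because its columns are not listed.
REQUIRED_FIELDS = {
    "page": ("content", "plain_text"),
    "skill": ("content", "plain_text"),
    "file": ("file_url", "mime_type", "file_size"),
    "source": ("source_connection_id", "source_external_id"),
}


def missing_fields(item: dict[str, Any]) -> list[str]:
    """Return the content-model fields that are absent or None for the item's kind."""
    kind = item.get("kind")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return [name for name in REQUIRED_FIELDS[kind] if item.get(name) is None]
```

For example, a `file` item that only has a `file_url` would report `mime_type` and `file_size` as missing.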

How indexing works

When you create or edit a knowledge item, HQ automatically indexes it so your agents can find it through search. Indexing converts the document into a format optimized for semantic search, so agents can find relevant knowledge even when the exact words don’t match.

For pages and skills, indexing starts immediately after you save and typically completes within a few seconds. For files (PDF, DOCX, XLSX, CSV, PPTX, TXT), the system first extracts text from the file and then indexes it, which may take a little longer depending on file size. Editing a document automatically re-triggers indexing.

You’ll see a status indicator on each knowledge item:

Search ready (green) — fully indexed and searchable by agents.
Text ready (amber) — text has been processed; search embedding is still completing.
Indexing… (spinner) — indexing is in progress.
Index failed (red) — something went wrong. Use the retry button to re-process.
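One way to read the four indicators is as a function of two pieces of state: whether text processing has finished and where the embedding stands. The sketch below is only an interpretation. The `text_ready` flag and the `"indexed"` status value are assumptions; the document itself only names the pending and failed embedding statuses.

```python
def status_label(embedding_status: str, text_ready: bool) -> str:
    """Map assumed item state to the indicator label shown in the UI."""
    if embedding_status == "failed":
        return "Index failed"   # red, retry available
    if embedding_status == "indexed":
        return "Search ready"   # green
    if text_ready:
        return "Text ready"     # amber, embedding still completing
    return "Indexing…"          # spinner
```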

The embedder service handles indexing using a local embedding model that ships pre-loaded in the Docker image — no external API calls or additional setup required.

Source connectors

Source connections use a plugin-based connector architecture. Each provider is a self-contained folder under gateway/connectors/<provider>/ containing:

manifest.json — declarative config for auth, UI metadata, setup steps, and capabilities.
read.py — BaseConnector subclass implementing validate, browse, list, fetch, and change detection.
api.py — HTTP helpers for the provider’s API.
transforms.py — response-to-markdown conversion logic.
write.py (optional) — BaseActionProvider subclass for write-back operations.
__init__.py — exports CONNECTOR (and optionally ACTION_PROVIDER).
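To make the folder layout more concrete, here is a minimal sketch of what a read connector and its exported CONNECTOR might look like. The `BaseConnector` class below is an illustrative stand-in defined for this example, not the platform's real base class, and its method signatures are assumptions. Browse and change detection are omitted for brevity, and `ExampleConnector` serves in-memory data instead of calling a provider API.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class BaseConnector(ABC):
    """Illustrative stand-in only; the platform's real base class may differ."""

    @abstractmethod
    def validate(self, creds: dict) -> bool: ...

    @abstractmethod
    def list(self, creds: dict) -> Iterable[dict]: ...

    @abstractmethod
    def fetch(self, creds: dict, external_id: str) -> str: ...


class ExampleConnector(BaseConnector):
    """Hypothetical provider backed by an in-memory dict of markdown documents."""

    def __init__(self) -> None:
        self._docs = {"doc-1": "# Hello"}

    def validate(self, creds: dict) -> bool:
        # A real connector would probe the provider API with these credentials.
        return bool(creds.get("api_key"))

    def list(self, creds: dict) -> Iterable[dict]:
        return [{"external_id": key} for key in self._docs]

    def fetch(self, creds: dict, external_id: str) -> str:
        # Returns markdown, mirroring the role of transforms.py.
        return self._docs[external_id]


# What __init__.py would export for the registry to discover.
CONNECTOR = ExampleConnector()
```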

The manifest is the contract between the connector and the platform. A build script (scripts/build-source-manifests.mjs) generates a TypeScript module from all manifests so the UI renders provider setup forms, icons, and labels without hardcoded constants.

Auto-discovery: gateway/connectors/registry.py scans subdirectories for exported CONNECTOR instances. Adding a new provider requires only the provider folder, with no changes to existing platform code.

Credential handling: The manifest declares what credentials are needed. The platform encrypts and stores them in the secrets table, secrets_sync.py decrypts them to the gateway filesystem, and source_sync.py assembles a creds dict for the connector. Single-key credentials use {PROVIDER}_SOURCE_{ID_PREFIX}; multi-key credentials use {PROVIDER}_SOURCE_{ID_PREFIX}__{FIELD}.

Write support: Providers that support writes set supports_write: true in their manifest and implement a BaseActionProvider. The UI shows a “Write access” toggle on the connection detail page. Write operations flow through the source_write command action in the existing command queue.

Browse and validate: The UI proxies browse and validate requests to the gateway’s files API (/sources/browse and /sources/validate), so provider-specific API calls happen on the gateway side where credentials are local.

See CONTRIBUTING-SOURCES.md in the repository root for the full contributor guide.

Skills can also be created autonomously by agents during work via hq_skill_upsert.py. When an agent discovers a reusable method, it codifies the procedure as an agent-scoped skill. These appear on the agent detail page with edit reasons and recency indicators.
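The credential naming convention for source connections can be sketched as a small helper. This is a sketch under assumptions: the function name `secret_key` is hypothetical, upper-casing every part is inferred from the placeholder style rather than confirmed, and how the ID prefix is derived from a connection ID is not specified, so it is passed in as-is.

```python
from typing import Optional


def secret_key(provider: str, id_prefix: str, field: Optional[str] = None) -> str:
    """Build a secret name following the {PROVIDER}_SOURCE_{ID_PREFIX}[__{FIELD}] pattern."""
    base = f"{provider.upper()}_SOURCE_{id_prefix.upper()}"
    # Single-key credentials use the base name; multi-key ones append __FIELD.
    return base if field is None else f"{base}__{field.upper()}"
```

With a hypothetical provider named `example` and prefix `ab12cd34`, a single-key credential would be `EXAMPLE_SOURCE_AB12CD34`, and a multi-key field `api_token` would be `EXAMPLE_SOURCE_AB12CD34__API_TOKEN`.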

Scope and agent access

Every item has a scope that controls who can access it:

Workspace scope (scope = 'workspace'): visible to all agents. When pinned, the item is included in every agent’s boot context automatically.
Agent scope (scope = 'agent'): visible only to agents explicitly assigned via the knowledge_item_agents junction table. Each row links one item to one agent. This replaces the old boot:all / boot:<slug> tag convention with explicit, queryable relationships.

The scope is a column, not a tag, so it can be filtered, indexed, and enforced at the database level.
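The access rule for the two scopes can be expressed as a short predicate. The sketch below models the knowledge_item_agents junction table as a set of (item ID, agent ID) pairs; the function name `can_access` and the in-memory representation are assumptions for illustration, since the real check happens in the database.

```python
def can_access(item: dict, agent_id: str, assignments: set) -> bool:
    """Return True if the agent may read the item under the scope rules.

    `assignments` stands in for knowledge_item_agents rows as (item_id, agent_id) pairs.
    """
    if item["scope"] == "workspace":
        return True  # visible to all agents
    if item["scope"] == "agent":
        return (item["id"], agent_id) in assignments
    return False  # unknown scope values are denied in this sketch
```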

How agents receive knowledge at boot

When an agent starts a session, the bootstrap script:

Fetches all workspace-scoped items where pinned = true.
Looks up the agent’s ID from its slug.
Fetches all items linked to that agent via knowledge_item_agents.
Deduplicates (an item can be both workspace-pinned and agent-assigned).
Injects the combined set into the agent’s startup context, grouped by scope.

The gateway’s HQ bootstrap plugin renders this context with kind labels ([page], [skill], [file]) and scope grouping (Workspace Knowledge vs Your Knowledge).
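The deduplicate-and-group steps might look roughly like the following. This is a sketch, not the bootstrap script itself: the function name, the `title` field, and the choice to keep a duplicated item under Workspace Knowledge are all assumptions.

```python
def build_boot_context(workspace_pinned: list, agent_items: list) -> dict:
    """Merge both result sets, drop duplicates by ID, and group lines by scope."""
    seen = set()
    grouped = {"Workspace Knowledge": [], "Your Knowledge": []}
    sources = (
        ("Workspace Knowledge", workspace_pinned),
        ("Your Knowledge", agent_items),
    )
    for group, items in sources:
        for item in items:
            if item["id"] in seen:
                continue  # already listed under an earlier group
            seen.add(item["id"])
            grouped[group].append(f"[{item['kind']}] {item['title']}")
    return grouped
```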

Folders and organization

Knowledge items live in folders (knowledge_folders). Folders support:

Nesting (parent/child via parent_id)
Custom icons and colors
Sort ordering

Folders are organizational — they don’t affect scope or agent access.
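Because nesting is expressed only through parent_id, a full folder path has to be reconstructed by walking up the parent chain. A minimal sketch, assuming each folder row has `id`, `name`, and `parent_id` keys (the `name` column is an assumption):

```python
def folder_paths(folders: list) -> dict:
    """Map each folder ID to its slash-separated path from the root."""
    by_id = {folder["id"]: folder for folder in folders}

    def path(folder: dict) -> str:
        parts, visited = [], set()
        # Walk up via parent_id; the visited set guards against accidental cycles.
        while folder is not None and folder["id"] not in visited:
            visited.add(folder["id"])
            parts.append(folder["name"])
            folder = by_id.get(folder.get("parent_id"))
        return "/".join(reversed(parts))

    return {folder["id"]: path(folder) for folder in folders}
```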

Search

Knowledge items support two search paths:

Semantic search: vector similarity using the embedding column (384-dimensional vectors from the gateway embedder). Used by the search_knowledge_items() RPC.
Full-text search: PostgreSQL tsvector over title, plain_text, content, and tags. Used by the search_knowledge_items_text() RPC.

Both RPCs support filtering by tags, folder, and kind.
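To illustrate what vector similarity with a kind filter means in practice, here is a small in-memory sketch. It assumes cosine similarity as the metric, which the document does not state, and uses 3-dimensional vectors in the usage for readability instead of the real 384 dimensions. The function names are hypothetical and unrelated to the RPC implementations.

```python
import math
from typing import Optional


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec: list, items: list, kind: Optional[str] = None, limit: int = 5) -> list:
    """Filter by kind, then return the items most similar to the query vector."""
    candidates = [i for i in items if kind is None or i["kind"] == kind]
    candidates.sort(key=lambda i: cosine_similarity(query_vec, i["embedding"]), reverse=True)
    return candidates[:limit]
```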

Chunks

Long-form items are split into chunks (knowledge_chunks) for granular retrieval. Each chunk has its own embedding and full-text search vector. The search_knowledge_chunks() and search_knowledge_chunks_text() RPCs search at the chunk level and join back to the parent item for metadata. Chunks reference their parent via knowledge_item_id. When an item’s content changes, the mark_knowledge_item_pending trigger resets chunk and embedding status, and the embedder re-indexes on its next cycle.
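The document does not describe how chunk boundaries are chosen. As one plausible approach, the sketch below packs paragraphs into chunks up to a character budget; the paragraph-based strategy, the `max_chars` default, and the function name are all assumptions rather than the embedder's actual behavior.

```python
def split_into_chunks(text: str, max_chars: int = 800) -> list:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)  # budget exceeded, start a new chunk
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```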

Embedding pipeline

The gateway embedder daemon handles indexing:

Calls lease_knowledge_items_for_indexing() to atomically claim pending items.
Generates embeddings using the local BGE model.
Splits content into chunks, embeds each chunk.
Calls mark_knowledge_item_indexed() on success or mark_knowledge_item_failed() on error.

Items in pending or failed embedding status are picked up automatically. The lease mechanism prevents parallel embedders from duplicating work.
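The lease pattern can be demonstrated without a database. In the sketch below, `InMemoryQueue` is a stand-in for the lease and mark RPCs, not their real implementation; the `leased_until` field, the 60-second lease, and the `"indexed"` status value are assumptions. The point is that a claimed item is invisible to other workers until its lease expires.

```python
import time


class InMemoryQueue:
    """Stand-in for the indexing RPCs, for illustration only."""

    def __init__(self, items):
        self.items = {
            i["id"]: dict(i, embedding_status="pending", leased_until=0.0) for i in items
        }

    def lease(self, limit, lease_seconds=60.0, now=None):
        """Claim up to `limit` pending or failed items whose lease has expired."""
        now = time.time() if now is None else now
        claimed = []
        for item in self.items.values():
            if len(claimed) >= limit:
                break
            if item["embedding_status"] in ("pending", "failed") and item["leased_until"] <= now:
                item["leased_until"] = now + lease_seconds
                claimed.append(item)
        return claimed

    def mark_indexed(self, item_id):
        self.items[item_id]["embedding_status"] = "indexed"

    def mark_failed(self, item_id, error):
        self.items[item_id]["embedding_status"] = "failed"
        self.items[item_id]["error"] = error


def run_once(queue, embed, limit=10):
    """One embedder cycle: lease, embed, then record success or failure per item."""
    for item in queue.lease(limit):
        try:
            embed(item["plain_text"])
            queue.mark_indexed(item["id"])
        except Exception as exc:
            queue.mark_failed(item["id"], str(exc))
```

In the real system the claim would need to be atomic at the database level, which a single-process loop like this one does not have to worry about.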

Entity links

Knowledge items participate in the entity links system. Any owner entity (task, routine, collection record, agent) can link to any target entity (knowledge item, collection record, contact, organization, task, or URL). When an agent claims a task, it receives all linked entities as context. This is a universal replacement for the old task-specific attachments model. Entity links are stored in entity_links with polymorphic owner_type/target_type columns and a check constraint ensuring URL links carry a url and entity links carry a target_id.
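The check constraint on entity_links can be mirrored in application code as a quick sanity check before inserting a row. This sketch assumes the URL target type is stored as the string `"url"` and does not enforce that the two columns are mutually exclusive, since the document does not say whether they are.

```python
def is_valid_link(link: dict) -> bool:
    """Mirror the described constraint: URL links need a url, entity links need a target_id."""
    if link.get("target_type") == "url":
        return bool(link.get("url"))
    return link.get("target_id") is not None
```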

Database tables

Table	Purpose
`knowledge_folders`	Folder hierarchy for organizing items
`knowledge_items`	All knowledge content — pages, skills, files, sources
`knowledge_item_agents`	Junction table linking agent-scoped items to specific agents
`knowledge_chunks`	Chunked content with per-chunk embeddings for granular retrieval
`source_connections`	External source integrations with plugin-based providers, credentials, and sync schedules
`source_sync_runs`	Sync execution history and status tracking
`entity_links`	Universal polymorphic links between any entities

Agent scripts

Every agent template ships with HQ skills that interact with the knowledge system:

Script	Purpose
`hq_session_bootstrap.py`	Fetches workspace-pinned + agent-specific items at session start
`hq_boot_docs.py`	Loads boot context using scope + junction queries
`hq_skill_upsert.py`	Creates or updates agent-scoped skills with auto-embedding and junction linking
`hq_create_doc.py`	Creates a new knowledge item (page or skill)
`hq_update_doc.py`	Updates an existing knowledge item
`hq_search_docs.py`	Semantic + full-text search across all knowledge items
`hq_get_knowledge_chunks.py`	Retrieves chunks for a specific knowledge item by ID
`hq_claim_task.py`	Claims a task and resolves all entity links (knowledge items, contacts, orgs, collection records)
`hq_inbox_process.py`	Processes inbox items and resolves linked entities

All scripts use the Supabase PostgREST API via the service role key.
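As a rough picture of what such a call involves, the sketch below builds (but does not send) a PostgREST RPC request using the standard Supabase conventions: the `/rest/v1/rpc/<function>` path and the `apikey` and `Authorization` headers. The helper name, the example URL, and the `query` parameter in the usage are hypothetical; the actual RPC parameter names are not documented here.

```python
import json
from urllib.request import Request


def build_rpc_request(base_url: str, service_role_key: str, function: str, params: dict) -> Request:
    """Build a POST request for a PostgREST RPC call, authenticated with the service role key."""
    return Request(
        url=f"{base_url.rstrip('/')}/rest/v1/rpc/{function}",
        data=json.dumps(params).encode("utf-8"),
        method="POST",
        headers={
            "apikey": service_role_key,
            "Authorization": f"Bearer {service_role_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; pass the result to urllib.request.urlopen to actually send it.
request = build_rpc_request(
    "https://example.supabase.co", "service-role-key", "search_knowledge_items_text", {"query": "style guide"}
)
```

Because the service role key bypasses row-level security in Supabase, it should stay on the gateway and never reach client-side code.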
