KB search and AI retrieval
The Atender Knowledge Base has one search engine that two very different audiences use:
- Customers typing into the search box on your public help center.
- AI agents — Capabilities, Sidekick, Web Chat, voice — retrieving articles to answer questions in conversations.
Both run through the same hybrid retrieval pipeline. Optimizing your KB content for one optimizes it for the other.
The hybrid pipeline
A search query goes through two paths in parallel and the results are blended:
- Keyword search (full-text via Postgres tsvector): matches exact words, common stems, and synonym lists. Fast and precise on rare terms (error codes, product names).
- Vector search (embeddings via pgvector): matches the semantic meaning of the query. Catches paraphrases, related concepts, and intent.
The blend combines the strengths of both paths. A customer typing “my card was charged twice” gets the article titled “Duplicate charge” even though the wording differs, because the vector path catches the meaning. A customer typing the literal error code E_PAYMENT_502 gets a hit even if the code appears only in the keywords field and nowhere in the article body, because the keyword path matches it exactly.
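The exact blending formula Atender uses is not described here. As a rough illustration only, the sketch below uses reciprocal rank fusion, a common way to merge two ranked lists. The function name, the constant k, and the article IDs are all assumptions made for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Blend several ranked lists of article IDs into one ranked list.

    Each article earns 1 / (k + rank) from every list it appears in,
    and the per-list scores are summed. This is an illustrative
    technique, not necessarily the one Atender uses.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, article_id in enumerate(ranking, start=1):
            scores[article_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda a: (-scores[a], a))

# Hypothetical results from the two paths for one query.
keyword_hits = ["error-codes", "duplicate-charge"]
vector_hits = ["duplicate-charge", "refund-a-charge", "error-codes"]

blended = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(blended)
```

An article that ranks well on both paths rises above one that ranks first on only a single path, which is the behavior the blend is meant to give.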
Embeddings and chunks
Articles are chunked before embedding. Atender splits each article into smaller passages (typically a few hundred words) and embeds each chunk separately. When a query comes in, the most relevant chunks across the whole KB are retrieved, and the article each chunk belongs to is returned.
Why chunks: a long article has many topics. If you embed the whole article as one vector, only the dominant topic gets a strong signal. With chunks, each topic gets its own representation, so a 4000-word “Billing FAQ” can answer “How do I refund a charge?” and “How do I add a credit card?” without one drowning out the other.
The chunker has a version. When the chunker is improved (better boundary detection, smarter overlap), articles re-chunk and re-embed automatically. You don’t manage this — it’s a background job.
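To make the idea of chunks with overlap concrete, here is a minimal sketch that splits text on word counts. The real chunker's boundary detection is not documented here, so the sizes, the function names, and the version number are assumptions for illustration.

```python
CHUNKER_VERSION = 2  # hypothetical version number

def chunk_words(text, size=200, overlap=40):
    """Split text into passages of up to `size` words.

    Consecutive passages share `overlap` words so that a sentence
    straddling a boundary still appears whole in one of them.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def needs_rechunk(stored_version):
    """True when an article was chunked by an older chunker version."""
    return stored_version < CHUNKER_VERSION

# A 500-word stand-in article where each "word" is its own index.
sample = " ".join(str(i) for i in range(500))
passages = chunk_words(sample)
print(len(passages), needs_rechunk(1))
```

Comparing a stored version against the current one is one simple way a background job could decide which articles to re-chunk and re-embed.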
What gets searched
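Each field is listed with its coverage on the keyword path, then on the vector path:
- Title: keyword yes (boosted); vector yes
- Summary: keyword yes (boosted); vector yes
- Body: keyword yes; vector yes (chunked)
- Keywords field: keyword yes (boosted); vector no
- Tag display names: keyword yes; vector no
- Category and subcategory names: keyword yes; vector no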
Use the keywords field for terms you specifically want matched but don’t want in the prose: synonyms, alternative spellings, error codes, and common misspellings.
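The field coverage above can be pictured as a small configuration. In the sketch below the boost weights are invented, plain substring matching stands in for Postgres full-text search, and joining fields into one string for embedding is an assumption; only which field feeds which path comes from this article.

```python
# Fields on the keyword path with hypothetical boost weights.
KEYWORD_BOOST = {
    "title": 2.0, "summary": 2.0, "keywords": 2.0,  # boosted fields
    "body": 1.0, "tags": 1.0, "categories": 1.0,    # unboosted fields
}
# Fields that feed the vector path.
VECTOR_FIELDS = ("title", "summary", "body")

def keyword_score(article, term):
    """Toy keyword score: sum the boosts of every field containing the term."""
    term = term.lower()
    return sum(
        boost
        for field, boost in KEYWORD_BOOST.items()
        if term in article.get(field, "").lower()
    )

def vector_input(article):
    """Text that would be embedded; the keywords field is left out."""
    return "\n".join(article.get(field, "") for field in VECTOR_FIELDS)

# Hypothetical article record.
article = {
    "title": "Duplicate charge",
    "summary": "What to do when a payment appears twice.",
    "body": "If you see two identical charges, contact support for a refund.",
    "keywords": "E_PAYMENT_502 double charge charged twice",
}

print(keyword_score(article, "E_PAYMENT_502"))
print("E_PAYMENT_502" in vector_input(article))
```

The error code scores on the keyword path through the keywords field alone, while the embedded text never contains it.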
What does not get searched
- Drafts and needs-review articles: invisible to search until status is published.
- Archived articles: invisible. Archive is the soft-delete state.
- Internal Handbook procedures: searched separately and only by AI agents (Sidekick, Capabilities), never by public customers.
- Article role assignments: roles control browsing visibility, not search ranking. The search engine doesn’t bias by role.
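The status rule above amounts to a simple filter. In this sketch the record shape and the status strings "draft" and "archived" are assumptions; "needs-review" and "published" follow the wording used above.

```python
def searchable(articles):
    """Keep only articles whose status is 'published'."""
    return [a for a in articles if a["status"] == "published"]

# Hypothetical article records, one per status.
articles = [
    {"id": "duplicate-charge", "status": "published"},
    {"id": "new-pricing", "status": "draft"},
    {"id": "refund-policy", "status": "needs-review"},
    {"id": "old-checkout", "status": "archived"},
]

visible_ids = [a["id"] for a in searchable(articles)]
print(visible_ids)
```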
How AI agents use retrieval
When Sidekick, a Web Chat agent, or a voice agent needs to answer a customer question, it sends the question to the same retrieval pipeline and gets back the most relevant articles plus the most relevant chunks. The agent then reads those chunks and composes an answer.
A few consequences worth knowing:
- Articles are the unit of authority. AI agents quote and link articles, not chunks. A precise chunk gets retrieved, but the agent points the customer to the article.
- Recency matters more than perfection. A slightly imperfect article that’s up to date beats a perfect article that’s six months stale. The Self-Learning engine and the quarterly review pass exist to catch staleness.
- Translations are first-class. AI agents retrieve from the customer’s language when one’s available, and fall back to the default language when it isn’t.
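Two of the points above, chunks rolling up to articles and language fallback, can be sketched in a few lines. The tuple layout, the scores, and both function names are assumptions for illustration, not the agents' actual interface.

```python
def top_articles(chunk_hits, limit=3):
    """Rank articles by their best-scoring chunk.

    chunk_hits is a list of (article_id, chunk_text, score) tuples.
    Returns (article_id, best_chunk_text) pairs, best article first.
    """
    best = {}
    for article_id, chunk_text, score in chunk_hits:
        if article_id not in best or score > best[article_id][1]:
            best[article_id] = (chunk_text, score)
    ranked = sorted(best.items(), key=lambda item: -item[1][1])
    return [(article_id, text) for article_id, (text, _) in ranked[:limit]]

def pick_translation(translations, customer_lang, default_lang="en"):
    """Use the customer's language when available, else the default."""
    return translations.get(customer_lang, translations[default_lang])

# Hypothetical chunk hits for "How do I refund a charge?"
hits = [
    ("billing-faq", "To refund a charge...", 0.91),
    ("billing-faq", "To add a credit card...", 0.40),
    ("duplicate-charge", "If you were charged twice...", 0.75),
]
results = top_articles(hits)
titles = {"en": "Refunds", "de": "Rückerstattungen"}
print(results, pick_translation(titles, "fr"))
```

The agent reads the winning chunk for each article but links the customer to the article itself, which matches the "unit of authority" point above.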
How to optimize for retrieval
Same as how to write good articles for humans:
- Use the words your customers use. “Refund” not “reimbursement”. “My card” not “the cardholder’s payment instrument”.
- Lead with the answer. The first paragraph should resolve the question. Embeddings weight the lead heavily.
- Keep one topic per article. Long mixed-topic articles dilute their own embedding. Split when topics diverge.
- Use the keywords field. Add synonyms, error codes, common misspellings, internal jargon. The keyword path catches these even when the body doesn’t include them.
- Update the article when the feature changes. Stale articles confuse both customers and AI.
Where to go next
- Write articles that retrieve well: Create an article
- Troubleshoot retrieval misses: Why is search not finding an article?
- See how AI agents use the KB: Capabilities, Sidekick