For those building AI assistants and looking for a convenient knowledge store
The Starting Point: The Storage Problem
When you want to add real product knowledge to your AI assistant, the first question that comes up is: where do you store the documents?
The obvious options are: Notion, Google Docs, Confluence, a folder of PDFs, a custom-built CMS. Each offers something, but each has friction:
- Notion — great for editing, but the API is paid and unreliable
- Google Docs — familiar, but no proper structure or versioning
- Confluence — enterprise-grade, overkill for a small team
- A folder of files — simple, but no editing UI, no access control, no webhooks
I arrived at a different solution: I already had WordPress. And I decided not to multiply systems.
WordPress as a Knowledge Base CMS
WordPress isn’t just a blog. It’s a full-featured content management system with:
- An editor (Gutenberg or Classic) — any manager can write and edit articles without technical knowledge
- Access control — who can edit, who can only read
- A REST API out of the box — GET /wp-json/wp/v2/posts returns JSON with content, meta fields, and update dates
- Hooks — you can attach an action to post save (save_post)
- Publication statuses — publish, private, draft
The last point turned out to be the key.
Two Types of Content in One CMS
In our implementation, the knowledge base is built from two sources within a single WordPress:
Public pages are real articles on the website: product descriptions, tutorials, FAQs. They simultaneously serve as SEO content for people and as a knowledge source for the AI assistant.
Private pages (status: private) are internal documents, invisible to site visitors but accessible via the REST API with authentication. This includes: sales scripts, objection-handling guides, AI system prompts, and technical specifications.
This separation solves an important problem: the same WordPress is simultaneously a public website and a private knowledge base.
WordPress
├── Public posts/pages → website + AI knowledge base
└── Private pages → AI knowledge base only
Additionally, an “Exclude from AI” field (a custom checkbox _wifly_no_ai) lets you mark a specific post so it won’t be indexed. This is useful for service pages, drafts, and promotional copy that shouldn’t influence the assistant’s answers.
Sync Architecture
The most interesting part: how content from WordPress gets into the vector database in real time.
WordPress Plugin as Trigger
We wrote a small WordPress plugin (~80 lines of PHP) that hooks into save_post:
add_action('save_post', function ($post_id) {
    if (wp_is_post_revision($post_id)) return;
    if (get_post_meta($post_id, '_wifly_no_ai', true)) return;

    // Debounce: no more than once every 2 minutes
    $last = get_transient('wifly_kb_sync_' . $post_id);
    if ($last) return;
    set_transient('wifly_kb_sync_' . $post_id, 1, 120);

    // Webhook to the AI assistant server
    wp_remote_post(KB_SYNC_URL, [
        'body'     => json_encode(['post_id' => $post_id]),
        'headers'  => ['Authorization' => 'Bearer ' . KB_SYNC_SECRET],
        'timeout'  => 5,
        'blocking' => false, // don't wait for a response
    ]);
});
Key details:
- blocking: false — WordPress doesn’t wait for the server’s response; the page saves instantly
- Debounce via set_transient — prevents repeated calls during autosave
- Secret token — the webhook is protected with Bearer authorization
The Server Receives the Webhook and Updates the Index
On the Node.js/Express server side, the webhook handler:
- Receives post_id
- Fetches the latest content via the WP REST API
- Parses HTML → extracts clean text
- Splits into chunks by h2/h3 headings (~800 characters each)
- Vectorizes each chunk using text-embedding-3-large
- Updates records in Qdrant (upsert by post_id + chunk_index)
app.post('/api/kb/sync', verifySecret, async (req, res) => {
  res.json({ ok: true }); // Respond immediately

  const { post_id } = req.body;
  const post = await fetchFromWordPress(post_id);
  const chunks = splitByHeadings(post.content, 800);

  for (const [i, chunk] of chunks.entries()) {
    const vector = await openai.embeddings.create({
      model: 'text-embedding-3-large',
      input: chunk.text,
    });
    await qdrant.upsert('wifly_kb', {
      points: [{
        id: `${post_id}_${i}`,
        vector: vector.data[0].embedding,
        payload: { text: chunk.text, heading: chunk.heading, post_id, url: post.link },
      }],
    });
  }
});
Result: an editor saves an article — within 5–10 seconds, the AI assistant already knows the updated content.
Vectorization and Search
Embedding Model
We use text-embedding-3-large from OpenAI (3072 dimensions). It’s more expensive than text-embedding-3-small, but the search accuracy is noticeably better — especially for domain-specific technical terminology.
Hybrid Search: Dense + Keyword → RRF
Pure vector search is good at finding semantically similar content, but struggles with exact names — product codes, product names, abbreviations. If a user asks about “flyAir” or a specific pricing plan, the vector might miss.
The solution is hybrid search with Reciprocal Rank Fusion (RRF):
User query
│
├──→ Dense search (Qdrant ANN) → TOP-14 candidates
│
└──→ Keyword search (full-text filter) → TOP-10 candidates
│
▼
RRF Fusion (k=60)
│
▼
TOP-7 final chunks → into LLM context
The RRF formula for each document:
score(d) = Σ 1 / (k + rank_i(d))
Documents that rank high in both lists get the highest final score. This is a simple and highly effective method with no need to tune weights.
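The fusion step can be sketched as a pure function over the two ranked candidate lists. This is an illustrative helper (the function name and input shape — arrays of chunk ids, ordered best-first — are assumptions, not the production code):

```javascript
// Reciprocal Rank Fusion: merge two ranked lists of chunk ids.
// k = 60 dampens the influence of exact rank positions.
function rrfFuse(denseIds, keywordIds, k = 60, topN = 7) {
  const scores = new Map();
  for (const list of [denseIds, keywordIds]) {
    list.forEach((id, rank) => {
      // rank is 0-based, so rank + 1 is the 1-based position in the formula
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .slice(0, topN)
    .map(([id]) => id);
}
```

A chunk that appears in both lists accumulates two reciprocal-rank terms, which is exactly why it outranks a chunk that tops only one list.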
The full-text index in Qdrant is created at server startup (idempotent):
await qdrant.createPayloadIndex('wifly_kb', {
  field_name: 'text',
  field_schema: 'text',
});
What’s Stored in Qdrant
Each chunk is a point in vector space with a payload:
{
  "id": "1847_2",
  "vector": [0.023, -0.041, ...], // 3072 floats
  "payload": {
    "text": "flyAir is a device for passive MAC address collection...",
    "heading": "How the MAC Radar Works",
    "post_id": 1847,
    "post_type": "page",
    "url": "https://wifly.ru/flyair",
    "updated_at": "2026-04-27T11:32:00Z"
  }
}
Chunking by headings matters: it preserves the semantic integrity of each fragment. Mechanical splitting at 800 characters often cuts context at the wrong place.
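A dependency-free sketch of such a splitter, assuming well-formed HTML. In production we parse with cheerio; here a regex stands in, which is a simplification that works only for clean markup:

```javascript
// Split HTML into chunks at h2/h3 boundaries, carrying the heading along.
// Simplified: real-world HTML should go through a proper parser (e.g. cheerio).
function splitByHeadings(html, maxLen = 800) {
  const stripTags = (s) => s.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
  // split() with a capturing group yields: [intro, heading1, body1, heading2, body2, ...]
  const parts = html.split(/<h[23][^>]*>(.*?)<\/h[23]>/is);
  const chunks = [];
  const push = (heading, text) => {
    // Fall back to hard splitting only when a single section exceeds maxLen
    for (let i = 0; i < text.length; i += maxLen) {
      chunks.push({ heading, text: text.slice(i, i + maxLen) });
    }
  };
  const intro = stripTags(parts[0]);
  if (intro) push('', intro);
  for (let i = 1; i < parts.length; i += 2) {
    const body = stripTags(parts[i + 1] || '');
    if (body) push(stripTags(parts[i]), body);
  }
  return chunks;
}
```

Each chunk keeps its nearest heading, so the payload's heading field comes for free and a hard length cut only happens inside an oversized section, not across section boundaries.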
What Works Well
Frictionless editorial workflow. Managers keep working in familiar WordPress. They don’t know their articles are becoming part of an AI system — it just works.
Live data. Classic RAG with manual file uploads goes stale quickly. Webhook synchronization keeps the index up to date automatically.
Separation of public and private content. One WordPress — two layers: a public website and a private knowledge base. No need to maintain two separate systems.
Full-text + vector search. The hybrid approach closes the blind spots of pure vector search — especially for product names and technical terms.
Pitfalls
save_post fires many times. During autosave, publish, and meta field updates — the hook can fire 3–5 times for a single action. Debounce via set_transient is mandatory.
Private pages require REST API authentication. A plain GET /wp-json/wp/v2/pages won’t return them. You need an Application Password and an Authorization: Basic ... header.
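An authenticated fetch then looks roughly like this. The function names and the WP_USER/WP_APP_PASSWORD credentials are placeholders; Application Passwords are generated on the WordPress user profile page:

```javascript
// Build the Basic auth header WordPress expects for Application Passwords.
function wpAuthHeader(user, appPassword) {
  const token = Buffer.from(`${user}:${appPassword}`).toString('base64');
  return { Authorization: `Basic ${token}` };
}

// Usage sketch: only authenticated requests see status=private pages.
// Requires Node 18+ for the global fetch.
async function fetchPrivatePages(baseUrl, user, appPassword) {
  const res = await fetch(`${baseUrl}/wp-json/wp/v2/pages?status=private`, {
    headers: wpAuthHeader(user, appPassword),
  });
  if (!res.ok) throw new Error(`WP REST API error: ${res.status}`);
  return res.json();
}
```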
HTML in the content API. WordPress returns HTML, not clean text. You need a parser — we use cheerio on the Node.js side. It’s important to strip shortcodes, scripts, and ad blocks.
Embedding costs. text-embedding-3-large costs $0.13 / 1M tokens. At 800 characters per chunk and 1,000 articles, it’s trivial for the initial load — but keep this in mind for frequent updates to large knowledge bases.
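A back-of-the-envelope estimate of that cost (assumptions: ~4 characters per token on average, and the chunks-per-article figure is illustrative):

```javascript
// Rough embedding cost estimate for an initial knowledge-base load.
function estimateEmbeddingCostUSD({ articles, chunksPerArticle, charsPerChunk }) {
  const CHARS_PER_TOKEN = 4;         // rough average for English text
  const PRICE_PER_1M_TOKENS = 0.13;  // text-embedding-3-large list price
  const tokens = (articles * chunksPerArticle * charsPerChunk) / CHARS_PER_TOKEN;
  return (tokens / 1e6) * PRICE_PER_1M_TOKENS;
}
```

For example, 1,000 articles at 5 chunks of 800 characters each comes to roughly 1M tokens, about $0.13 for the whole initial load — it is the re-embedding on every save of a frequently edited large base that adds up.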
Final Architecture
WordPress (cms.wifly.ru)
├── Public pages → SEO + KB
├── Private pages → KB only
└── Plugin: save_post → webhook (non-blocking)
│
▼
Node.js API Server
├── Fetch WP content (REST API)
├── Parse HTML → clean text
├── Chunk by headings (~800 chars)
├── Embed (text-embedding-3-large)
└── Upsert → Qdrant
│
▼
Hybrid Search (dense + keyword → RRF)
│
▼
GPT-4o with KB context → user response
Conclusion
If you already have WordPress — don’t rush to find another storage solution for RAG. It gives you everything you need: an editor, access control, a REST API, and hooks for synchronization. Add vectorization via webhook — and your existing CMS becomes a living, self-updating knowledge source for your AI assistant.
The full stack we use: WordPress → Node.js/Express → Qdrant → OpenAI Realtime API — and it works in production.
Interested? The next article will cover how to build a voice assistant on top of this same knowledge base using the OpenAI Realtime API.


