Reading time · 15 min Updated 30/04/2026 Level · Intermediate

Vector Search on Cloud SQL pgvector

Build a semantic search system that understands user intent instead of only matching keywords. It is the foundation for RAG chatbots that know your company documents, for product recommendation systems, and for internal semantic search.

How it works

Embedding

Text is converted into a 768-dimensional vector by the text-embedding-004 model. Each sentence or passage is represented by 768 floating-point numbers.

Storing in pgvector

Cloud SQL Postgres supports the pgvector extension, whose HNSW index can search 1M vectors in under 50 ms.
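
For readers curious about the database level, here is a minimal sketch of the kind of pgvector SQL such a setup could rest on. The table name `doc_chunks` and its columns are hypothetical, and the Zeni API may manage storage quite differently. Only the pgvector pieces are standard: the `vector(n)` column type, the `hnsw` index with the `vector_cosine_ops` operator class, and the `<=>` cosine-distance operator.

```python
def pgvector_statements(table="doc_chunks", dimension=768):
    """Return illustrative SQL for a pgvector table with an HNSW cosine index.

    `table` and the column names are hypothetical, not Zeni's actual schema.
    """
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE {table} (id text PRIMARY KEY, body text, "
        f"embedding vector({dimension}));",
        # HNSW index built with the cosine-distance operator class
        f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops);",
    ]


def top_k_query(table="doc_chunks"):
    """SQL for a top-k search; <=> is pgvector's cosine-distance operator."""
    return (
        f"SELECT id, body, 1 - (embedding <=> %(q)s::vector) AS score "
        f"FROM {table} ORDER BY embedding <=> %(q)s::vector LIMIT %(k)s;"
    )


for stmt in pgvector_statements():
    print(stmt)
print(top_k_query())
```

The `%(q)s` and `%(k)s` placeholders follow the named-parameter style of common Python Postgres drivers; the score is computed as one minus the cosine distance.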

Querying

Embed the question → find the top-k nearest vectors (cosine similarity) → return the original metadata.
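
To make the "top-k nearest vectors" step concrete, the sketch below computes cosine similarity by brute force over a handful of toy vectors. It is only an illustration of the ranking idea: a real deployment would rely on the HNSW index rather than scoring every vector, and the 3-dimensional vectors here stand in for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Brute-force top-k: score every stored vector and keep the best k."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in stored.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy vectors; the ids are made up for this example
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], stored, k=2))
```

Here `doc_a` ranks first because it points in almost the same direction as the query, while `doc_c` is orthogonal to it and is dropped.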

RAG (Retrieval-Augmented Generation)

Pass the retrieved results to ZeniRouter as context so the AI answers from real documents, which reduces hallucination.

Create a collection

curl -X POST "https://zenicloud.io/api/v1/vector/collections?ws=prod" \
  -H "Authorization: Bearer $ZENI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "company_docs",
    "dimension": 768,
    "metric": "cosine",
    "metadata_schema": {
      "title": "string",
      "category": "string",
      "created_at": "timestamp"
    }
  }'

Response:

{
  "collection_id": "col_8f3a9b1c",
  "name": "company_docs",
  "dimension": 768,
  "metric": "cosine",
  "vector_count": 0,
  "created_at": "2026-04-30T10:15:32Z"
}

Upsert vectors: adding data

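The upsert endpoint accepts either text (Zeni embeds it for you) or a precomputed vector. The sample documents below are in Vietnamese: doc_001 is a leave policy (12 paid leave days per year) and doc_002 is an overtime policy (overtime paid at 150% of base salary):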

curl -X POST "https://zenicloud.io/api/v1/vector/upsert?ws=prod" \
  -H "Authorization: Bearer $ZENI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "company_docs",
    "items": [
      {
        "id": "doc_001",
        "text": "Quy định nghỉ phép: nhân viên được nghỉ 12 ngày phép có lương mỗi năm.",
        "metadata": {
          "title": "Quy định nhân sự 2026",
          "category": "hr",
          "created_at": "2026-01-15T00:00:00Z"
        }
      },
      {
        "id": "doc_002",
        "text": "Chính sách OT: làm thêm giờ tính 150% mức lương cơ bản.",
        "metadata": {
          "title": "Chính sách lương",
          "category": "hr"
        }
      }
    ]
  }'
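
The request above shows only the text form. For the precomputed-vector form, a small client-side helper can catch dimension mismatches before anything is sent. This is a sketch under one loud assumption: the field name `"vector"` is a guess, since the source does not show how a precomputed vector is passed, so check the API reference before relying on it. The helper name `make_item` is also hypothetical.

```python
def make_item(item_id, text=None, vector=None, metadata=None, dimension=768):
    """Build one upsert item from either raw text or a precomputed vector.

    ASSUMPTION: a precomputed vector is sent under a "vector" key. The
    source only documents the "text" form.
    """
    if (text is None) == (vector is None):
        raise ValueError("provide exactly one of text or vector")
    item = {"id": item_id, "metadata": metadata or {}}
    if text is not None:
        item["text"] = text
    else:
        if len(vector) != dimension:
            raise ValueError(
                f"expected {dimension} dimensions, got {len(vector)}"
            )
        item["vector"] = [float(x) for x in vector]  # hypothetical field name
    return item


item = make_item("doc_003", vector=[0.0] * 768, metadata={"category": "hr"})
print(item["id"], len(item["vector"]))
```

Validating the length against the collection's `dimension` (768 here) keeps a wrongly sized vector from reaching the server at all.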

Top-k similarity search

curl -X POST "https://zenicloud.io/api/v1/vector/search?ws=prod" \
  -H "Authorization: Bearer $ZENI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "company_docs",
    "query_text": "Tôi có bao nhiêu ngày nghỉ một năm?",
    "top_k": 3,
    "filter": {"category": "hr"},
    "include_metadata": true
  }'

Response:

{
  "results": [
    {
      "id": "doc_001",
      "score": 0.91,
      "text": "Quy định nghỉ phép: nhân viên được nghỉ 12 ngày phép có lương mỗi năm.",
      "metadata": {"title": "Quy định nhân sự 2026", "category": "hr"}
    },
    {
      "id": "doc_002",
      "score": 0.42,
      "text": "Chính sách OT: làm thêm giờ tính 150% mức lương cơ bản.",
      "metadata": {"title": "Chính sách lương", "category": "hr"}
    }
  ],
  "search_time_ms": 23
}
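
In the response above, doc_002 is returned with a score of 0.42 even though it is about overtime rather than leave. One way to keep such weak matches out of a RAG context is to drop results below a score threshold. The sketch below assumes the result shape shown above; the threshold of 0.5 is an illustrative value, not a documented recommendation, and the helper name is hypothetical.

```python
def filter_by_score(results, min_score=0.5):
    """Keep only results whose similarity score is at least min_score.

    min_score=0.5 is an illustrative value; tune it on your own data.
    """
    return [r for r in results if r.get("score", 0.0) >= min_score]


results = [
    {"id": "doc_001", "score": 0.91},
    {"id": "doc_002", "score": 0.42},
]
print([r["id"] for r in filter_by_score(results)])  # prints ['doc_001']
```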

Python example: a complete RAG chatbot

import os

import requests

ZENI = "https://zenicloud.io/api/v1"
TOKEN = os.environ["ZENI_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

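def vector_search(query, top_k=5):
    """Return the top_k most similar documents for a query."""
    r = requests.post(
        f"{ZENI}/vector/search?ws=prod",
        headers=HEADERS,
        json={
            "collection": "company_docs",
            "query_text": query,
            "top_k": top_k,
            "include_metadata": True,
        },
        timeout=30,
    )
    r.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
    return r.json()["results"]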

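def rag_answer(question):
    # 1. Retrieve the relevant documents
    docs = vector_search(question, top_k=3)

    # 2. Build the context from the documents ("Nguồn" means "Source").
    #    Documents indexed without a title fall back to "unknown".
    parts = []
    for d in docs:
        title = (d.get("metadata") or {}).get("title", "unknown")
        parts.append(f"[Nguồn: {title}]\n{d['text']}")
    context = "\n\n".join(parts)

    # 3. Call the AI with the context. The Vietnamese system prompt says:
    #    "You are an internal assistant. Answer only from the provided
    #    DOCUMENTS. If the information is missing, say clearly that it was
    #    not found in the documents."
    r = requests.post(
        f"{ZENI}/router/complete?ws=prod",
        headers=HEADERS,
        json={
            "messages": [
                {"role": "system", "content": (
                    "Bạn là trợ lý nội bộ. Chỉ trả lời dựa trên TÀI LIỆU "
                    "được cung cấp. Nếu không có thông tin, nói rõ là "
                    "không tìm thấy trong tài liệu."
                )},
                {"role": "user", "content": (
                    f"TÀI LIỆU:\n{context}\n\n"
                    f"CÂU HỎI: {question}"
                )},
            ],
            "task_type": "qa_simple",
        },
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["text"]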

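# Usage. The outputs shown are illustrative model answers.
# Question: "How many days of leave do I get per year?"
print(rag_answer("Tôi có bao nhiêu ngày nghỉ phép một năm?"))
# → "Theo quy định nhân sự 2026, bạn được nghỉ 12 ngày phép có lương mỗi năm."
#   (According to the 2026 HR policy, you get 12 paid leave days per year.)

# Question: "What is the Tet bonus policy?"
print(rag_answer("Chính sách thưởng Tết như thế nào?"))
# → "Tôi không tìm thấy thông tin về chính sách thưởng Tết trong tài liệu."
#   (I could not find information about the Tet bonus policy in the documents.)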

Chunking: splitting large documents

Long text must be split into smaller pieces before embedding (at most 8000 tokens per vector). Recommendations:

Size: 500-1000 tokens per chunk
Overlap: 100-150 tokens between chunks to preserve context
Split by paragraph: better than splitting by character count

def chunk_text(text, chunk_size=800, overlap=120):
    """Split text on paragraph boundaries, carrying trailing paragraphs as overlap.

    chunk_size and overlap are counted in whitespace-separated words, a rough
    stand-in for tokens. A single paragraph longer than chunk_size is kept whole.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0

    for p in paragraphs:
        p_len = len(p.split())
        if current and current_len + p_len > chunk_size:
            chunks.append("\n\n".join(current))
            # Carry over trailing paragraphs totalling at most `overlap` words
            tail, tail_len = [], 0
            for prev in reversed(current):
                prev_len = len(prev.split())
                if tail_len + prev_len > overlap:
                    break
                tail.insert(0, prev)
                tail_len += prev_len
            current, current_len = tail, tail_len
        current.append(p)
        current_len += p_len

    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Index a whole PDF (after OCR has produced policy.txt).
# Reuses requests, ZENI and HEADERS from the RAG example above.
with open("policy.txt", encoding="utf-8") as f:
    full_text = f.read()

chunks = chunk_text(full_text)
items = [
    {
        "id": f"policy_chunk_{i}",
        "text": chunk,
        "metadata": {"source": "policy.txt", "chunk_index": i}
    }
    for i, chunk in enumerate(chunks)
]

r = requests.post(
    f"{ZENI}/vector/upsert?ws=prod",
    headers=HEADERS,
    json={"collection": "company_docs", "items": items},
    timeout=60,
)
r.raise_for_status()
print(f"Indexed {len(items)} chunks")
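
A large document can produce many chunks, and sending them all in one request may not be practical. The source does not state a per-request item limit, so the batch size of 100 below is purely an assumption, as is the helper name `batched`. The sketch only shows how to slice a list of items into batches.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of items, each with at most batch_size entries.

    batch_size=100 is an assumed value; the API's real limit is not documented here.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


items = [{"id": f"policy_chunk_{i}", "text": f"chunk {i}"} for i in range(250)]
sizes = [len(batch) for batch in batched(items)]
print(sizes)  # prints [100, 100, 50]
```

Each batch would then go into the `"items"` field of its own `/vector/upsert` request.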

Filter by metadata

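Combine similarity search with SQL-like metadata filters. The example below searches for "ghế công thái học" (ergonomic chair) among in-stock office products with a price below 5000000: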

{
  "collection": "products",
  "query_text": "ghế công thái học",
  "top_k": 10,
  "filter": {
    "$and": [
      {"category": "office"},
      {"price": {"$lt": 5000000}},
      {"in_stock": true}
    ]
  }
}
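
To illustrate how such a filter might be read, here is a small local evaluator for the three forms used above: `$and`, plain equality, and `$lt`. It is a sketch of the presumed semantics, not the service's implementation. The function name `matches` and the sample product records are made up, and any operators beyond these three are not covered.

```python
def matches(metadata, flt):
    """Return True if metadata satisfies the filter.

    Supports the three forms in the example: {"$and": [...]},
    {field: value} equality, and {field: {"$lt": value}}.
    """
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            for op, value in cond.items():
                if op != "$lt":
                    raise ValueError(f"unsupported operator: {op}")
                if key not in metadata or not metadata[key] < value:
                    return False
        elif metadata.get(key) != cond:
            return False
    return True


flt = {
    "$and": [
        {"category": "office"},
        {"price": {"$lt": 5000000}},
        {"in_stock": True},
    ]
}
chair = {"category": "office", "price": 3200000, "in_stock": True}
desk = {"category": "office", "price": 7500000, "in_stock": True}
print(matches(chair, flt), matches(desk, flt))  # prints True False
```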

Use cases

Enterprise chatbot: answers HR, policy, and FAQ questions based on the handbook
Semantic search: finds related documents on a website instead of only matching keywords
Recommendation: suggests similar products or articles
Code search: finds functions in a codebase from a description written in Vietnamese
Customer support: auto-routes tickets based on the meaning of their content

Pricing

Item	Unit price
Text embedding	$0.025 / 1M tokens
Vector storage	$0.30 / 1M vectors / month
Search queries	$0.10 / 1K queries
Free tier	10K stored vectors, 1K queries/month
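
The prices above can be turned into a quick monthly estimate. The sketch below applies the three unit prices directly and ignores the free tier, since the source does not say how it is deducted. The function name and the sample workload are made up for illustration.

```python
PRICES = {
    "embedding_per_1m_tokens": 0.025,
    "storage_per_1m_vectors_month": 0.30,
    "search_per_1k_queries": 0.10,
}


def estimate_monthly_cost(embedded_tokens, stored_vectors, queries):
    """Estimate one month's cost in USD from the price table, ignoring the free tier."""
    embedding = embedded_tokens / 1_000_000 * PRICES["embedding_per_1m_tokens"]
    storage = stored_vectors / 1_000_000 * PRICES["storage_per_1m_vectors_month"]
    search = queries / 1_000 * PRICES["search_per_1k_queries"]
    return {
        "embedding": embedding,
        "storage": storage,
        "search": search,
        "total": embedding + storage + search,
    }


cost = estimate_monthly_cost(
    embedded_tokens=2_000_000, stored_vectors=500_000, queries=100_000
)
print({name: round(value, 2) for name, value in cost.items()})
```

For 2M embedded tokens, 500K stored vectors, and 100K queries, this works out to about $0.05 + $0.15 + $10.00 = $10.20, so query volume dominates the bill in this scenario.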

Production best practices

1. Use a chunk size of 600-800 tokens for Vietnamese, where sentences tend to be longer than in English.
2. Re-embed every 6 months if a new model version is available.
3. Cache search results for 5 minutes for repeated queries.
4. Start with top_k between 5 and 10; setting it too high lowers RAG quality.
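
The caching advice in item 3 could be sketched with a small in-memory cache whose entries expire after 300 seconds. This is one possible approach, not a prescribed one: the class `TTLCache` and the helper `cached_search` are hypothetical names, normalizing the query with `strip().lower()` is an assumption about what counts as a repeated query, and a multi-process deployment would need a shared cache instead.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def cached_search(cache, search_fn, query, top_k=5):
    """Return cached results for (query, top_k), calling search_fn on a miss."""
    key = (query.strip().lower(), top_k)
    results = cache.get(key)
    if results is None:
        results = search_fn(query, top_k)
        cache.set(key, results)
    return results


# Stand-in for a real search call, so the example runs offline
calls = []


def fake_search(query, top_k):
    calls.append(query)
    return [{"id": "doc_001"}]


cache = TTLCache(ttl_seconds=300)
cached_search(cache, fake_search, "leave policy")
cached_search(cache, fake_search, "Leave policy ")
print(len(calls))  # prints 1: the second call was served from the cache
```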

Next steps

AI Router: combine with RAG for chatbots
OCR: index PDFs, invoices, and contracts
Cron: automatically re-index the database every night