rag-engineer
Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications. Use when: building RAG, vector search, embeddings, semantic search, document retrieval.
RAG Engineer - Retrieval-Augmented Generation System Architecture Expert
Skill Overview
A RAG Engineer is an AI specialist dedicated to building Retrieval-Augmented Generation (RAG) systems. They are proficient in vector embeddings, vector databases, document chunking strategies, and retrieval optimization—helping developers create high-quality LLM document Q&A applications.
Core Capabilities
- Design chunking strategies around semantic boundaries rather than fixed token counts
- Preserve document structure (headings, paragraphs) and include contextual overlap between chunks
- Automatically detect topic shifts so each chunk remains semantically complete
- Attach metadata to chunks to enable precise filtering
- Build hierarchical indexes at multiple granularities (paragraph, section, document)
- Implement a two-stage retrieval flow: coarse filtering followed by fine-grained refinement
- Maintain parent-child document relationships to balance precision and context
- Apply relevance-threshold filtering to keep noisy passages out of the context
- Fuse BM25/TF-IDF keyword matching with vector-based semantic retrieval
- Combine scores with Reciprocal Rank Fusion (RRF)
- Dynamically adjust retrieval weights for different query types
- Re-rank retrieved results to improve final ordering
- Define independent metrics to evaluate retrieval quality
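Reciprocal Rank Fusion is simple enough to sketch in a few lines. The function below is a minimal illustration (the constant k=60 follows the original RRF paper; the document IDs and the two input rankings are made up for the example):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists into one.

    Each list is an ordered sequence of document IDs, best first.
    k damps the influence of any single top-ranked outlier.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: a BM25 ranking fused with a vector-similarity ranking
bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc4", "doc3"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Here `doc1` wins because it appears near the top of both rankings, which is exactly the behavior that makes RRF a robust default for hybrid search.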
Common Questions
What is a RAG system? What scenarios is it suitable for?
A RAG (Retrieval-Augmented Generation) system is an architectural approach that combines information retrieval with a large language model. It first retrieves relevant document snippets from a knowledge base, then uses those contents as context for the LLM to generate an answer. RAG is especially suitable for scenarios requiring answers grounded in specific documents—such as enterprise knowledge base Q&A, technical document assistants, contract review, and more. Compared to pure model generation, it can significantly reduce hallucinations and improve answer accuracy.
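The retrieve-then-generate flow can be sketched as follows. This is a toy illustration: `retrieve` uses keyword overlap as a stand-in for embedding search, and the assembled prompt would normally be sent to an LLM API, which is omitted here:

```python
def retrieve(query, documents, top_k=2):
    """Naive keyword-overlap retrieval; a real RAG system would
    embed the query and search a vector database instead."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, contexts):
    """Assemble retrieved snippets as grounding context for the LLM."""
    joined = "\n---\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )

docs = [
    "RAG combines retrieval with generation.",
    "Vector databases store embeddings.",
    "Bananas are yellow.",
]
query = "what is RAG retrieval"
prompt = build_prompt(query, retrieve(query, docs))
```

The key property is that the model answers from retrieved context rather than from its parametric memory, which is what reduces hallucination.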
Why is document chunking important? What are the best practices?
Document chunking directly affects retrieval quality. Fixed-size chunks can cut through sentence and semantic boundaries, resulting in incomplete retrieval results. A best practice is semantic chunking: split by sentence and paragraph boundaries, use embedding similarity to detect topic changes, preserve an appropriate amount of contextual overlap, and add metadata (e.g., chapter titles, document type) for subsequent filtering.
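The topic-shift detection described above can be sketched roughly as follows, using word-overlap (Jaccard) similarity as a cheap stand-in for the embedding similarity a real system would compute; the threshold value and function names are illustrative:

```python
import re

def similarity(a, b):
    """Word-overlap (Jaccard) similarity between two sentences;
    a stand-in for embedding cosine similarity."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / max(len(wa | wb), 1)

def semantic_chunks(sentences, threshold=0.2, overlap=1):
    """Group consecutive sentences into chunks, starting a new chunk
    when similarity to the previous sentence drops below the
    threshold; `overlap` sentences are carried forward for context."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if similarity(prev, sent) < threshold:
            chunks.append(current)
            current = current[-overlap:] + [sent]  # contextual overlap
        else:
            current.append(sent)
    chunks.append(current)
    return chunks

sents = [
    "Vector search relies on embeddings.",
    "Embeddings make vector search fast.",
    "Cats enjoy warm milk.",
]
chunks = semantic_chunks(sents)
```

Note how the topic change before the last sentence starts a new chunk, and the preceding sentence is carried over so the new chunk keeps some context.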
How can I improve retrieval accuracy in a RAG system?
Improving retrieval accuracy typically requires multiple optimizations:
1) Use hybrid search combining semantic and keyword retrieval;
2) Apply metadata pre-filtering to narrow the search space;
3) Evaluate and select appropriate embedding models for different content types;
4) Implement hierarchical retrieval—coarse filtering first, then refinement;
5) Re-rank retrieval results;
6) Filter out low-quality results using relevance thresholds;
7) Establish independent retrieval evaluation metrics and continuously optimize.
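Steps 2 and 6, metadata pre-filtering and relevance thresholds, can be sketched together. The result shape below (dicts with `score` and `meta` keys) is hypothetical; real vector stores return their own result objects and usually apply metadata filters server-side:

```python
def filter_results(results, min_score=0.75, doc_type=None):
    """Drop low-relevance hits and apply a metadata filter.

    `results` is assumed to be a list of dicts like
    {"id": ..., "score": ..., "meta": {"type": ...}}.
    """
    kept = []
    for r in results:
        if r["score"] < min_score:
            continue  # relevance threshold: keep noise out of the context
        if doc_type and r["meta"].get("type") != doc_type:
            continue  # metadata filter narrows the candidate set
        kept.append(r)
    return kept

hits = [
    {"id": "a", "score": 0.91, "meta": {"type": "manual"}},
    {"id": "b", "score": 0.62, "meta": {"type": "manual"}},
    {"id": "c", "score": 0.88, "meta": {"type": "faq"}},
]
kept = filter_results(hits, min_score=0.75, doc_type="manual")
```

In practice the metadata filter should run before vector search (pre-filtering) so the index only scores documents that can actually be used.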