llm-app-patterns
Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications, implementing RAG, building agents, or setting up LLM observability.
LLM Application Patterns - Production-Grade LLM Application Architecture Patterns
Skills Overview
LLM Application Patterns provides a complete architectural guide for building production-grade large language model (LLM) applications. It covers core patterns such as RAG pipelines, agent architectures, prompt engineering, and LLMOps monitoring, helping developers quickly design and deploy AI applications.
Use Cases
When you need to plan the overall architecture for an AI application, this skill offers a complete decision matrix—from simple RAG to complex multi-agent collaboration—helping you choose the right technical approach.
When you need to build a Q&A system based on an enterprise knowledge base, this skill covers end-to-end implementations including document ingestion, vector storage, hybrid retrieval, and context compression.
When you need to develop AI assistants that can use external tools, this skill provides multiple agent architecture patterns such as ReAct, Function Calling, and Plan-and-Execute, along with code examples.
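The knowledge-base Q&A use case above can be sketched as a minimal retrieve-then-prompt loop. This is a toy illustration with hand-made two-dimensional embeddings and hypothetical helper names (`retrieve`, `build_prompt`); a real system would use an embedding model and a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Return the top-k (score, passage) pairs by cosine similarity."""
    scored = [(cosine(query_vec, vec), text) for vec, text in index]
    return sorted(scored, reverse=True)[:k]

def build_prompt(question, passages):
    """Stuff retrieved passages into a grounded-answer prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy index of (embedding, passage) pairs; real systems store these in a vector DB.
index = [
    ([1.0, 0.0], "Refunds are processed within 5 business days."),
    ([0.0, 1.0], "Shipping is free on orders over $50."),
]
top = retrieve([0.9, 0.1], index, k=1)
prompt = build_prompt("How long do refunds take?", [t for _, t in top])
```

The final prompt would then be sent to the LLM; the key design point is that the model only sees the retrieved context, not the whole knowledge base.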
Core Features
RAG Pipelines: includes document chunking strategies (fixed-size, semantic, and document-aware chunking), vector database selection guidance (Pinecone, Weaviate, ChromaDB, pgvector), hybrid retrieval (semantic search plus keyword BM25), and advanced patterns such as multi-query retrieval and context compression.
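Of the chunking strategies listed, fixed-size chunking with overlap is the simplest to show in full. A minimal sketch, splitting on whitespace tokens (production splitters usually count model tokens, not words):

```python
def chunk_fixed(text, size=200, overlap=50):
    """Fixed-size chunking with overlap: each chunk shares `overlap` words
    with the previous one so retrieval does not lose sentences cut at a boundary."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks

# A 500-word document yields 3 chunks: [0:200], [150:350], [300:500].
doc = " ".join(str(i) for i in range(500))
chunks = chunk_fixed(doc, size=200, overlap=50)
```

Semantic and document-aware chunking replace the fixed `step` with boundaries derived from embedding similarity or document structure (headings, paragraphs), but the sliding-window skeleton stays the same.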
Agent Architectures: covers the ReAct reason-and-act loop, structured tool calling via Function Calling, the Plan-and-Execute pattern, and multi-agent collaboration architectures. Each pattern includes a full code implementation and a description of the scenarios it suits.
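The core of the ReAct pattern is a loop that alternates model-proposed actions with tool observations. A minimal sketch with a stubbed model and illustrative tool names (`search`, `calculator`); a real implementation would call an LLM API and parse its structured output:

```python
import json

# Registered tools; names and signatures are illustrative.
TOOLS = {
    "search": lambda q: f"3 results for '{q}'",
    "calculator": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
}

def react_loop(model, question, max_steps=5):
    """ReAct loop: the model emits either a tool action or a final answer;
    tool observations are appended to the history for the next step."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model("\n".join(history))      # model returns a JSON-encoded action
        action = json.loads(step)
        if action["type"] == "final":
            return action["answer"]
        observation = TOOLS[action["tool"]](action["input"])
        history.append(f"Action: {action['tool']}({action['input']})")
        history.append(f"Observation: {observation}")
    return "Gave up after max_steps."

# Stub model: calls the calculator once, then reads the observation and answers.
def stub_model(prompt):
    if "Observation" not in prompt:
        return json.dumps({"type": "tool", "tool": "calculator", "input": "6*7"})
    return json.dumps({"type": "final", "answer": "42"})

answer = react_loop(stub_model, "What is 6*7?")
```

Function Calling differs mainly in that the model's action is a structured call validated against a declared schema rather than free-form JSON in the prompt.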
LLMOps and Production Readiness: covers LLM application monitoring metrics (latency, quality, cost, reliability), request logging and distributed tracing, an output quality evaluation framework, and production-critical capabilities such as caching, rate limiting, retries, and degradation fallbacks.
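Two of the production capabilities above, retries and degradation fallbacks, compose naturally into one wrapper. A minimal sketch (the helper name `call_with_retries` and the flaky-call simulation are illustrative):

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.1, fallback=None):
    """Retry with exponential backoff; on exhaustion, degrade to a fallback
    (e.g. a cached answer or a cheaper model) instead of failing the request."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                break
            time.sleep(base_delay * (2 ** i))  # e.g. 0.1s, 0.2s, 0.4s, ...
    return fallback() if fallback else None

# Simulated flaky LLM call: fails twice, then succeeds.
state = {"calls": 0}
def flaky_llm():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("upstream timeout")
    return "ok"

result = call_with_retries(flaky_llm, attempts=3, base_delay=0.01,
                           fallback=lambda: "cached answer")
```

In production this wrapper would also record each attempt to the tracing backend so retry rate shows up in the reliability metrics.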
Frequently Asked Questions
What are common architecture patterns for LLM applications?
Common patterns include: simple RAG (FAQ and document search), hybrid RAG (semantic + keyword), ReAct agents (multi-step reasoning tasks), Function Calling (structured tool calling), Plan-and-Execute (complex task planning), and multi-agent collaboration (research and analysis tasks). When choosing, weigh task complexity, development cost, and runtime cost.
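The decision matrix above can be sketched as a simple lookup; the task categories and complexity labels here are illustrative, not an exhaustive taxonomy:

```python
# Illustrative decision matrix mapping (task type, complexity) to a pattern.
PATTERNS = {
    ("qa", "simple"): "simple RAG",
    ("qa", "mixed-terms"): "hybrid RAG (semantic + BM25)",
    ("tools", "single-step"): "Function Calling",
    ("tools", "multi-step"): "ReAct agent",
    ("planning", "complex"): "Plan-and-Execute",
    ("research", "parallel"): "multi-agent collaboration",
}

def choose_pattern(task, complexity):
    """Default to the cheapest pattern when no specific match exists."""
    return PATTERNS.get((task, complexity), "simple RAG")
```

Starting from the cheapest pattern and escalating only when quality metrics demand it keeps both development and runtime cost down.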
How do I choose the right vector database?
Choose based on the scenario: Pinecone is a fully managed service suited to teams that want minimal operational overhead; Weaviate is open source with built-in hybrid search; ChromaDB is lightweight and well suited to local development and prototyping; pgvector is a PostgreSQL extension that fits teams already running Postgres who want to avoid operating a separate datastore.
For embedding models: OpenAI's text-embedding-3-small offers low cost and good quality, while a locally hosted BGE-large model can run entirely for free.
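The cost side of that trade-off is easy to estimate. A minimal sketch, assuming a hosted price of roughly $0.02 per million tokens for text-embedding-3-small (an assumption at time of writing; check current pricing before relying on it):

```python
def embedding_cost_usd(num_tokens, price_per_million=0.02):
    """Estimate hosted embedding cost in USD.
    price_per_million is an assumed rate; verify against current pricing."""
    return num_tokens / 1_000_000 * price_per_million

# Embedding a 10M-token corpus at the assumed rate:
corpus_cost = embedding_cost_usd(10_000_000)  # 0.2 (USD)
```

A self-hosted BGE-large deployment trades this per-token fee for GPU and operations cost, which dominates only at small volumes.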
How do I monitor LLM applications in production?
You need to track four categories of metrics: performance metrics (P50/P99 latency, generation speed), quality metrics (user satisfaction, task completion rate, hallucination rate), cost metrics (cost per request, cache hit rate), and reliability metrics (error rate, timeout rate, retry rate). It is recommended to use OpenTelemetry for distributed tracing and to establish complete request/response logging.
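The P50/P99 latency metrics mentioned above reduce to a percentile over recorded samples. A minimal sketch using the nearest-rank method (one of several percentile definitions; metrics backends may interpolate instead):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over a non-empty list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# One slow outlier (e.g. a retried request) barely moves P50 but dominates P99.
latencies_ms = [120, 95, 110, 300, 105, 98, 2500, 115, 102, 130]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

This is why tracking P99 alongside P50 matters for LLM apps: generation latency is heavy-tailed, and averages hide the requests users actually complain about.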