langchain-architecture

name: langchain-architecture
description: Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.

LangChain Architecture

Master the LangChain framework for building sophisticated LLM applications with agents, chains, memory, and tool integration.

Do not use this skill when

  • The task is unrelated to LangChain architecture
  • You need a different domain or tool outside this scope

Instructions

  • Clarify goals, constraints, and required inputs.
  • Apply relevant best practices and validate outcomes.
  • Provide actionable steps and verification.
  • If detailed examples are required, open resources/implementation-playbook.md.

Use this skill when

  • Building autonomous AI agents with tool access
  • Implementing complex multi-step LLM workflows
  • Managing conversation memory and state
  • Integrating LLMs with external data sources and APIs
  • Creating modular, reusable LLM application components
  • Implementing document processing pipelines
  • Building production-grade LLM applications

Core Concepts

1. Agents

Autonomous systems that use LLMs to decide which actions to take.

Agent Types (see the enum sketch after this list):

  • ReAct: Reasoning + Acting in interleaved manner
  • OpenAI Functions: Leverages function calling API
  • Structured Chat: Handles multi-input tools
  • Conversational: Optimized for chat interfaces
  • Self-Ask with Search: Decomposes complex queries
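
These names map onto the AgentType enum in langchain.agents (classic, pre-LCEL agent API). A minimal reference sketch of the correspondence:

    from langchain.agents import AgentType

    # The agent styles above, as AgentType values
    AgentType.ZERO_SHOT_REACT_DESCRIPTION                   # ReAct
    AgentType.OPENAI_FUNCTIONS                              # OpenAI Functions
    AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION   # Structured Chat
    AgentType.CONVERSATIONAL_REACT_DESCRIPTION              # Conversational
    AgentType.SELF_ASK_WITH_SEARCH                          # Self-Ask with Search
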
2. Chains

Sequences of calls to LLMs or other utilities.

Chain Types (a TransformChain sketch follows this list):

  • LLMChain: Basic prompt + LLM combination
  • SequentialChain: Multiple chains in sequence
  • RouterChain: Routes inputs to specialized chains
  • TransformChain: Data transformations between steps
  • MapReduceChain: Parallel processing with aggregation
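
As a small example of a non-LLM step, a minimal TransformChain sketch (classic langchain API; the clean_text function is illustrative):

    from langchain.chains import TransformChain

    # A plain-Python transform: receives and returns dicts of chain variables
    def clean_text(inputs: dict) -> dict:
        return {"clean_text": " ".join(inputs["text"].split())}  # collapse whitespace

    transform_chain = TransformChain(
        input_variables=["text"],
        output_variables=["clean_text"],
        transform=clean_text,
    )
    print(transform_chain.run("Some   messy\n\ntext"))
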
3. Memory

Systems for maintaining context across interactions.

Memory Types:

  • ConversationBufferMemory: Stores all messages
  • ConversationSummaryMemory: Summarizes older messages
  • ConversationBufferWindowMemory: Keeps last N messages
  • EntityMemory: Tracks information about entities
  • VectorStoreMemory: Semantic similarity retrieval

4. Document Processing

Loading, transforming, and storing documents for retrieval.

Components (a retriever-configuration sketch follows this list):

  • Document Loaders: Load from various sources
  • Text Splitters: Chunk documents intelligently
  • Vector Stores: Store and retrieve embeddings
  • Retrievers: Fetch relevant documents
  • Indexes: Organize documents for efficient access
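
Retriever configuration is where much of the tuning happens; a minimal sketch, assuming a vectorstore built as in Pattern 1 below:

    # Turn a vector store into a retriever; search_type and k are tuning knobs.
    # "mmr" (maximal marginal relevance) trades pure similarity for diversity.
    retriever = vectorstore.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 4},
    )
    docs = retriever.get_relevant_documents("What is the main topic?")
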
5. Callbacks

Hooks for logging, monitoring, and debugging.

Use Cases (a token-tracking sketch follows this list):

  • Request/response logging
  • Token usage tracking
  • Latency monitoring
  • Error handling
  • Custom metrics collection
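
For the token-usage case specifically, langchain ships a ready-made context manager (OpenAI-backed models only); a minimal sketch:

    from langchain.callbacks import get_openai_callback
    from langchain.llms import OpenAI

    llm = OpenAI(temperature=0)

    # Every call made inside the context manager is metered
    with get_openai_callback() as cb:
        llm.predict("Tell me a joke")
        print(f"Total tokens: {cb.total_tokens}")
        print(f"Total cost (USD): {cb.total_cost}")
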
Quick Start

    from langchain.agents import AgentType, initialize_agent, load_tools
    from langchain.llms import OpenAI
    from langchain.memory import ConversationBufferMemory

    # Initialize LLM
    llm = OpenAI(temperature=0)

    # Load tools (serpapi requires a SERPAPI_API_KEY)
    tools = load_tools(["serpapi", "llm-math"], llm=llm)

    # Add memory
    memory = ConversationBufferMemory(memory_key="chat_history")

    # Create agent
    agent = initialize_agent(
        tools,
        llm,
        agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
        memory=memory,
        verbose=True,
    )

    # Run agent
    result = agent.run("What's the weather in SF? Then calculate 25 * 4")

Architecture Patterns

Pattern 1: RAG with LangChain

    from langchain.chains import RetrievalQA
    from langchain.document_loaders import TextLoader
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.vectorstores import Chroma
    from langchain.embeddings import OpenAIEmbeddings

    # Load and process documents
    loader = TextLoader('documents.txt')
    documents = loader.load()

    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    texts = text_splitter.split_documents(documents)

    # Create vector store
    embeddings = OpenAIEmbeddings()
    vectorstore = Chroma.from_documents(texts, embeddings)

    # Create retrieval chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )

    # Query
    result = qa_chain({"query": "What is the main topic?"})

Pattern 2: Custom Agent with Tools

    from langchain.agents import AgentType, initialize_agent
    from langchain.tools import tool

    @tool
    def search_database(query: str) -> str:
        """Search internal database for information."""
        # Your database search logic
        return f"Results for: {query}"

    @tool
    def send_email(recipient: str, content: str) -> str:
        """Send an email to specified recipient."""
        # Email sending logic
        return f"Email sent to {recipient}"

    tools = [search_database, send_email]

    # send_email takes two inputs, so use the structured-chat agent;
    # the zero-shot ReAct agent only supports single-input tools.
    agent = initialize_agent(
        tools,
        llm,
        agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True,
    )

Pattern 3: Multi-Step Chain

    from langchain.chains import LLMChain, SequentialChain
    from langchain.prompts import PromptTemplate

    # Step 1: Extract key information
    extract_prompt = PromptTemplate(
        input_variables=["text"],
        template="Extract key entities from: {text}\n\nEntities:",
    )
    extract_chain = LLMChain(llm=llm, prompt=extract_prompt, output_key="entities")

    # Step 2: Analyze entities
    analyze_prompt = PromptTemplate(
        input_variables=["entities"],
        template="Analyze these entities: {entities}\n\nAnalysis:",
    )
    analyze_chain = LLMChain(llm=llm, prompt=analyze_prompt, output_key="analysis")

    # Step 3: Generate summary
    summary_prompt = PromptTemplate(
        input_variables=["entities", "analysis"],
        template="Summarize:\nEntities: {entities}\nAnalysis: {analysis}\n\nSummary:",
    )
    summary_chain = LLMChain(llm=llm, prompt=summary_prompt, output_key="summary")

    # Combine into a sequential chain
    overall_chain = SequentialChain(
        chains=[extract_chain, analyze_chain, summary_chain],
        input_variables=["text"],
        output_variables=["entities", "analysis", "summary"],
        verbose=True,
    )

Memory Management Best Practices

Choosing the Right Memory Type

    # For short conversations (< 10 messages)
    from langchain.memory import ConversationBufferMemory
    memory = ConversationBufferMemory()

    # For long conversations (summarize old messages)
    from langchain.memory import ConversationSummaryMemory
    memory = ConversationSummaryMemory(llm=llm)

    # For a sliding window (last N messages)
    from langchain.memory import ConversationBufferWindowMemory
    memory = ConversationBufferWindowMemory(k=5)

    # For entity tracking
    from langchain.memory import ConversationEntityMemory
    memory = ConversationEntityMemory(llm=llm)

    # For semantic retrieval of relevant history
    # (retriever as built in the Document Processing section)
    from langchain.memory import VectorStoreRetrieverMemory
    memory = VectorStoreRetrieverMemory(retriever=retriever)

Callback System

Custom Callback Handler

    from langchain.callbacks.base import BaseCallbackHandler

    class CustomCallbackHandler(BaseCallbackHandler):
        def on_llm_start(self, serialized, prompts, **kwargs):
            print(f"LLM started with prompts: {prompts}")

        def on_llm_end(self, response, **kwargs):
            print(f"LLM ended with response: {response}")

        def on_llm_error(self, error, **kwargs):
            print(f"LLM error: {error}")

        def on_chain_start(self, serialized, inputs, **kwargs):
            print(f"Chain started with inputs: {inputs}")

        def on_agent_action(self, action, **kwargs):
            print(f"Agent taking action: {action}")

    # Use the callback
    agent.run("query", callbacks=[CustomCallbackHandler()])

Testing Strategies

    import pytest
    from unittest.mock import Mock

    def test_agent_tool_selection():
        # Mock the LLM to return a specific tool selection
        mock_llm = Mock()
        mock_llm.predict.return_value = "Action: search_database\nAction Input: test query"

        agent = initialize_agent(tools, mock_llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

        result = agent.run("test query")

        # Verify the correct tool was selected
        assert "search_database" in str(mock_llm.predict.call_args)

    def test_memory_persistence():
        memory = ConversationBufferMemory()

        memory.save_context({"input": "Hi"}, {"output": "Hello!"})

        assert "Hi" in memory.load_memory_variables({})['history']
        assert "Hello!" in memory.load_memory_variables({})['history']

Performance Optimization

1. Caching

    import langchain
    from langchain.cache import InMemoryCache

    # Cache identical LLM calls in memory for the life of the process
    langchain.llm_cache = InMemoryCache()
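
The in-memory cache is lost on restart; langchain also ships a SQLite-backed variant when you want the cache to persist across runs:

    import langchain
    from langchain.cache import SQLiteCache

    # Persist cached completions to disk, surviving process restarts
    langchain.llm_cache = SQLiteCache(database_path=".langchain.db")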

2. Batch Processing

    # Process multiple documents in parallel
    # (text_splitter as defined in Pattern 1)
    from concurrent.futures import ThreadPoolExecutor
    from langchain.document_loaders import DirectoryLoader

    loader = DirectoryLoader('./docs')
    docs = loader.load()

    def process_doc(doc):
        return text_splitter.split_documents([doc])

    with ThreadPoolExecutor(max_workers=4) as executor:
        split_docs = list(executor.map(process_doc, docs))

3. Streaming Responses

    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])

Resources

  • references/agents.md: Deep dive on agent architectures
  • references/memory.md: Memory system patterns
  • references/chains.md: Chain composition strategies
  • references/document-processing.md: Document loading and indexing
  • references/callbacks.md: Monitoring and observability
  • assets/agent-template.py: Production-ready agent template
  • assets/memory-config.yaml: Memory configuration examples
  • assets/chain-example.py: Complex chain examples

Common Pitfalls

  • Memory Overflow: Not managing conversation history length
  • Tool Selection Errors: Poor tool descriptions confuse agents
  • Context Window Exceeded: Exceeding LLM token limits (see the token-budget sketch after this list)
  • No Error Handling: Not catching and handling agent failures
  • Inefficient Retrieval: Not optimizing vector store queries
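
The memory-overflow and context-window pitfalls can be addressed together by budgeting history in tokens rather than messages; a minimal sketch using ConversationTokenBufferMemory:

    from langchain.llms import OpenAI
    from langchain.memory import ConversationTokenBufferMemory

    llm = OpenAI(temperature=0)

    # Keep only as much history as fits the token budget;
    # the oldest messages are pruned first.
    memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=1000)
    memory.save_context({"input": "Hi"}, {"output": "Hello!"})
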
Production Checklist

  • [ ] Implement proper error handling (see the sketch after this list)
  • [ ] Add request/response logging
  • [ ] Monitor token usage and costs
  • [ ] Set timeout limits for agent execution
  • [ ] Implement rate limiting
  • [ ] Add input validation
  • [ ] Test with edge cases
  • [ ] Set up observability (callbacks)
  • [ ] Implement fallback strategies
  • [ ] Version control prompts and configurations
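
A minimal sketch covering the error-handling, timeout, and fallback items, using executor limits the classic initialize_agent API accepts (tools and llm as defined earlier; the limit values are illustrative):

    from langchain.agents import AgentType, initialize_agent

    agent = initialize_agent(
        tools,
        llm,
        agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
        handle_parsing_errors=True,  # recover when the LLM emits malformed output
        max_iterations=10,           # stop runaway tool loops
        max_execution_time=30,       # wall-clock timeout in seconds
    )

    try:
        result = agent.run("query")
    except Exception as exc:
        # Fallback strategy: degrade gracefully instead of crashing
        result = f"Agent failed: {exc}"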
