RAG with OpenSearch: From Fundamentals to Production¶
Overview¶
Retrieval-Augmented Generation (RAG) represents a breakthrough in how AI systems access and utilize knowledge. By combining the reasoning capabilities of large language models with the precision of search systems, RAG enables AI to provide accurate, up-to-date, and contextually relevant responses while citing specific sources.
This guide explores RAG fundamentals and demonstrates how OpenSearch serves as a powerful foundation for building production-ready RAG systems, from simple document retrieval to sophisticated multi-modal knowledge bases.
Understanding RAG: The Knowledge Problem¶
The Challenge with Traditional Language Models¶
Large language models like GPT-4 and Claude possess remarkable reasoning abilities, but they face fundamental limitations when it comes to knowledge:
Knowledge Cutoff: Training data has a specific cutoff date, making models unaware of recent events or updates. A model trained on data through 2023 cannot answer questions about 2024 developments.
Hallucination: When models lack specific information, they may generate plausible-sounding but factually incorrect responses. This occurs because models are trained to produce coherent text rather than to explicitly acknowledge knowledge gaps.
Static Knowledge: Information learned during training cannot be updated without retraining the entire model, making it impractical to incorporate new company policies, product updates, or domain-specific knowledge.
Source Attribution: Traditional language models cannot cite specific sources for their responses, making it difficult to verify accuracy or provide transparency in critical applications.
How RAG Solves These Problems¶
RAG addresses these limitations by introducing a two-step process that separates knowledge retrieval from response generation:
Step 1: Retrieval - Given a user question, search through a knowledge base to find relevant documents or passages that contain pertinent information.
Step 2: Augmented Generation - Provide both the user question and retrieved context to the language model, instructing it to generate a response based on the specific retrieved information.
This approach transforms the language model from a knowledge repository into a reasoning engine that works with provided evidence, similar to how a skilled researcher synthesizes information from multiple sources to answer complex questions.
Real-World Example:
Traditional Approach: - User: "What's our company's remote work policy?" - Model: Generates a generic response about remote work policies, potentially inaccurate for the specific company
RAG Approach: - User: "What's our company's remote work policy?" - System retrieves: Company HR manual section on remote work - Model receives: Question + Retrieved policy document - Model responds: "According to the company handbook, employees can work remotely up to 3 days per week with manager approval..."
RAG Architecture Components¶
A complete RAG system consists of several interconnected components that work together to provide accurate, relevant responses:
Knowledge Base: The searchable repository of information, typically consisting of documents, web pages, databases, or other structured data sources. This serves as the system's external memory.
Embedding Model: Converts text into vector representations that capture semantic meaning. This enables the system to find relevant information even when exact keywords don't match.
Vector Database: Stores and indexes the embedded documents to enable fast similarity search. OpenSearch excels in this role with its advanced vector search capabilities.
Retrieval System: Identifies and ranks the most relevant documents or passages for a given query. This often combines multiple search strategies for optimal results.
Generation Model: The large language model that synthesizes retrieved information into coherent, helpful responses while maintaining accuracy and proper attribution.
Orchestration Layer: Coordinates the flow between retrieval and generation, handles error cases, and manages the overall user experience.
RAG Implementation Patterns¶
Basic RAG: Straightforward Document Retrieval¶
The fundamental RAG pattern retrieves relevant documents and includes them as context for the language model. This approach works well for knowledge bases with clear, self-contained documents.
Architecture Flow: 1. User submits a question 2. Convert question to embedding vector 3. Search knowledge base for similar documents 4. Retrieve top-k most relevant documents 5. Combine question and retrieved documents in a prompt 6. Generate response using language model
Example Implementation:
{
"user_query": "How do I configure SSL certificates?",
"retrieved_documents": [
{
"title": "SSL Configuration Guide",
"content": "To configure SSL certificates, first obtain a valid certificate from a trusted CA...",
"source": "admin_manual.pdf",
"score": 0.89
}
],
"prompt_template": "Based on the following documentation, answer the user's question.\n\nQuestion: {user_query}\n\nRelevant Documentation:\n{retrieved_content}\n\nPlease provide a comprehensive answer based on the documentation above."
}
When to Use Basic RAG: - Knowledge base consists of well-structured documents - Questions typically map to single documents or clear topics - Information doesn't require complex reasoning across multiple sources - Getting started with RAG implementation
Advanced RAG: Multi-Step Reasoning and Filtering¶
Advanced RAG implementations incorporate sophisticated retrieval strategies and reasoning patterns to handle complex queries that require information synthesis from multiple sources.
Hierarchical Retrieval: First identifies relevant document categories or sections, then performs detailed search within those areas. This prevents the system from getting overwhelmed by irrelevant but superficially similar content.
Query Expansion: Generates multiple versions of the user's question to capture different ways the same information might be expressed in the knowledge base.
Iterative Retrieval: Performs multiple rounds of retrieval based on partial answers, allowing the system to gather comprehensive information for complex questions.
Context Filtering: Applies relevance scoring and filtering to ensure only the most pertinent information reaches the language model, preventing confusion from marginally relevant content.
Example Multi-Step Query:
User Question: "What are the performance implications of different caching strategies for our e-commerce platform?"
Step 1: Retrieve general caching documentation
Step 2: Retrieve e-commerce specific performance guidelines
Step 3: Retrieve case studies of caching implementations
Step 4: Synthesize information across all retrieved sources
Step 5: Generate comprehensive response with trade-off analysis
Conversational RAG: Maintaining Context Across Interactions¶
Conversational RAG extends the basic pattern to support multi-turn conversations, maintaining context and allowing for follow-up questions that build on previous interactions.
Context Preservation: Maintains conversation history to understand references like "that approach" or "the previous solution" in follow-up questions.
Query Contextualization: Combines current question with conversation history to create more specific search queries.
Response Continuity: Ensures responses build logically on previous answers while incorporating new retrieved information.
Example Conversation Flow:
Turn 1:
User: "How do I set up monitoring for our database?"
System: [Retrieves monitoring setup docs] "To set up database monitoring, you'll need to configure..."
Turn 2:
User: "What about alerting thresholds?"
System: [Understands "alerting thresholds" in context of database monitoring] "For the database monitoring we discussed, recommended alerting thresholds are..."
OpenSearch as a RAG Foundation¶
Vector Search Capabilities¶
OpenSearch provides enterprise-grade vector search functionality that forms the backbone of effective RAG systems. Its integration of traditional text search with modern vector search creates powerful hybrid retrieval capabilities.
Semantic Understanding: OpenSearch's vector search capabilities enable RAG systems to find relevant information even when the query uses different terminology than the source documents. For example, searching for "automobile maintenance" can retrieve documents about "car repair" through semantic similarity.
Hybrid Search Excellence: OpenSearch uniquely combines keyword-based text search with vector-based semantic search, providing both precision and recall. This combination ensures that exact matches are prioritized while also capturing semantically related content.
Algorithm Flexibility: Support for multiple vector search algorithms (HNSW, IVF) allows optimization for different RAG scenarios. HNSW excels for real-time retrieval needs, while IVF provides memory-efficient search for large knowledge bases.
Advanced Filtering: OpenSearch enables complex filtering during vector search, allowing RAG systems to constrain retrieval by document type, date ranges, user permissions, or other metadata. This ensures retrieved information is both relevant and appropriate for the user.
Document Storage and Management¶
OpenSearch's document-centric architecture aligns perfectly with RAG requirements, providing flexible storage and rich metadata capabilities.
Flexible Schema Design: Store documents with arbitrary metadata fields, enabling sophisticated filtering and routing in RAG applications. Documents can include content embeddings, source information, classification labels, and custom attributes.
{
"document_id": "tech_guide_ssl_001",
"title": "SSL Certificate Configuration",
"content": "Complete guide to configuring SSL certificates...",
"content_vector": [0.1, -0.2, 0.8, ...],
"metadata": {
"document_type": "technical_guide",
"department": "infrastructure",
"last_updated": "2024-01-15",
"security_level": "internal",
"topics": ["ssl", "security", "configuration"],
"author": "Infrastructure Team"
}
}
Version Management: Track document versions and updates, enabling RAG systems to work with the most current information while maintaining historical context when needed.
Access Control Integration: Leverage OpenSearch's security features to ensure RAG systems only retrieve information the user is authorized to access, maintaining data governance in enterprise environments.
Real-Time Updates and Indexing¶
OpenSearch's near real-time indexing capabilities ensure RAG systems work with current information, critical for dynamic environments where knowledge bases frequently change.
Incremental Updates: Add new documents or update existing ones without full re-indexing, keeping RAG systems current as knowledge bases evolve.
Change Detection: Monitor document changes and automatically update embeddings when content is modified, ensuring vector representations remain accurate.
Bulk Operations: Efficiently process large batches of documents during initial setup or major updates, with built-in error handling and progress tracking.
Implementing RAG with OpenSearch¶
Basic RAG Implementation¶
Here's a practical implementation of a basic RAG system using OpenSearch, demonstrating the core patterns that can be extended for more complex use cases.
Index Configuration for RAG:
{
"settings": {
"index": {
"knn": true,
"number_of_shards": 3,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"title": {"type": "text"},
"content": {"type": "text"},
"content_vector": {
"type": "knn_vector",
"dimension": 768,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "lucene",
"parameters": {
"ef_construction": 256,
"m": 32
}
}
},
"source": {"type": "keyword"},
"document_type": {"type": "keyword"},
"last_updated": {"type": "date"},
"security_tags": {"type": "keyword"}
}
}
}
RAG Query Implementation:
{
"size": 5,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "SSL certificate configuration",
"fields": ["title^2", "content"],
"type": "best_fields"
}
},
{
"knn": {
"content_vector": {
"vector": [0.1, -0.2, 0.8, ...],
"k": 10
}
}
}
],
"filter": [
{"term": {"document_type": "technical_guide"}},
{"range": {"last_updated": {"gte": "2023-01-01"}}}
]
}
},
"_source": ["title", "content", "source", "last_updated"],
"highlight": {
"fields": {
"content": {"fragment_size": 200, "number_of_fragments": 3}
}
}
}
Advanced RAG Patterns¶
Multi-Step Retrieval Pipeline:
Advanced RAG systems often require multiple retrieval steps to gather comprehensive information for complex queries. OpenSearch's flexibility enables sophisticated retrieval workflows.
def advanced_rag_retrieval(query, opensearch_client):
# Step 1: Initial broad retrieval
initial_results = opensearch_client.search(
index="knowledge_base",
body={
"size": 20,
"query": {
"bool": {
"should": [
{"multi_match": {"query": query, "fields": ["title^3", "content"]}},
{"knn": {"content_vector": {"vector": encode_query(query), "k": 15}}}
]
}
}
}
)
# Step 2: Extract key concepts from initial results
key_concepts = extract_concepts(initial_results)
# Step 3: Focused retrieval based on identified concepts
focused_results = []
for concept in key_concepts:
concept_results = opensearch_client.search(
index="knowledge_base",
body={
"size": 10,
"query": {
"bool": {
"must": [
{"knn": {"content_vector": {"vector": encode_query(concept), "k": 8}}},
{"terms": {"topics": key_concepts}}
]
}
}
}
)
focused_results.extend(concept_results['hits']['hits'])
# Step 4: Deduplicate and rank final results
return deduplicate_and_rank(initial_results, focused_results)
Contextual Retrieval with User Personalization:
{
"query": {
"bool": {
"should": [
{
"knn": {
"content_vector": {
"vector": [0.1, -0.2, 0.8, ...],
"k": 10,
"boost": 1.0
}
}
}
],
"filter": [
{"terms": {"security_tags": ["public", "internal"]}},
{"term": {"user_role_access": "developer"}},
{"range": {"relevance_score": {"gte": 0.7}}}
]
}
},
"rescore": {
"window_size": 50,
"query": {
"rescore_query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "user_rating",
"factor": 1.5,
"modifier": "log1p"
}
},
{
"gauss": {
"last_accessed": {
"origin": "now",
"scale": "7d",
"decay": 0.8
}
}
}
]
}
}
}
}
}
Reranking for RAG Optimization¶
Reranking plays a crucial role in RAG systems by refining initial retrieval results to maximize the quality of information provided to the generation model. OpenSearch's built-in rescoring capabilities enable sophisticated reranking strategies.
Cross-Encoder Reranking Integration:
{
"query": {
"knn": {
"content_vector": {
"vector": [0.1, -0.2, 0.8, ...],
"k": 100
}
}
},
"rescore": {
"window_size": 20,
"query": {
"rescore_query": {
"script_score": {
"query": {"match_all": {}},
"script": {
"source": "params.rerank_scores.get(doc['_id'].value, 0.0)",
"params": {
"rerank_scores": {
"doc_1": 0.95,
"doc_2": 0.87,
"doc_3": 0.82
}
}
}
}
},
"query_weight": 0.3,
"rescore_query_weight": 0.7
}
}
}
Multi-Signal Reranking for RAG:
{
"rescore": {
"window_size": 50,
"query": {
"rescore_query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "citation_count",
"factor": 0.1,
"modifier": "log1p"
}
},
{
"field_value_factor": {
"field": "user_feedback_score",
"factor": 2.0,
"modifier": "sqrt"
}
},
{
"gauss": {
"publish_date": {
"origin": "now",
"scale": "365d",
"decay": 0.7
}
}
},
{
"script_score": {
"script": {
"source": "Math.max(0, (doc['content'].value.length() - 100) / 1000.0)"
}
}
}
],
"score_mode": "sum",
"boost_mode": "multiply"
}
}
}
}
}
Advanced RAG Architectures¶
Multi-Modal RAG¶
Modern RAG systems increasingly need to work with diverse content types beyond text, including images, documents, and multimedia content. OpenSearch's multi-modal capabilities enable sophisticated cross-content retrieval.
Unified Multi-Modal Index:
{
"mappings": {
"properties": {
"content_id": {"type": "keyword"},
"content_type": {"type": "keyword"},
"title": {"type": "text"},
"text_content": {"type": "text"},
"text_vector": {
"type": "knn_vector",
"dimension": 768,
"method": {"name": "hnsw", "space_type": "cosinesimil"}
},
"image_vector": {
"type": "knn_vector",
"dimension": 512,
"method": {"name": "hnsw", "space_type": "cosinesimil"}
},
"unified_vector": {
"type": "knn_vector",
"dimension": 1024,
"method": {"name": "hnsw", "space_type": "cosinesimil"}
},
"media_metadata": {
"properties": {
"file_type": {"type": "keyword"},
"size_bytes": {"type": "long"},
"dimensions": {"type": "keyword"},
"duration": {"type": "float"}
}
}
}
}
}
Cross-Modal Retrieval Query:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{"term": {"content_type": "image"}},
{"knn": {"unified_vector": {"vector": [0.1, 0.2, ...], "k": 10}}}
]
}
},
{
"bool": {
"must": [
{"term": {"content_type": "text"}},
{"knn": {"text_vector": {"vector": [0.2, -0.1, ...], "k": 10}}}
]
}
}
]
}
}
}
Hierarchical RAG¶
For large knowledge bases with natural hierarchical structures, hierarchical RAG enables more efficient and accurate retrieval by first identifying relevant sections before performing detailed search.
Two-Stage Retrieval Architecture:
Stage 1: Section-Level Retrieval
{
"size": 5,
"query": {
"knn": {
"section_vector": {
"vector": [0.1, -0.2, 0.8, ...],
"k": 10
}
}
},
"aggs": {
"top_sections": {
"terms": {
"field": "section_id",
"size": 3
}
}
}
}
Stage 2: Document-Level Retrieval Within Sections
{
"size": 20,
"query": {
"bool": {
"must": [
{"knn": {"content_vector": {"vector": [0.1, -0.2, 0.8, ...], "k": 15}}},
{"terms": {"section_id": ["section_1", "section_2", "section_3"]}}
]
}
}
}
Agentic RAG¶
Agentic RAG systems use language models not just for generation but also for query planning, retrieval strategy selection, and result evaluation. This creates more sophisticated systems that can adapt their approach based on query complexity.
Query Planning with OpenSearch:
def agentic_rag_pipeline(user_query, opensearch_client):
# Step 1: Query analysis and planning
query_plan = analyze_query_complexity(user_query)
if query_plan.complexity == "simple":
# Direct retrieval for straightforward questions
results = simple_retrieval(user_query, opensearch_client)
elif query_plan.complexity == "multi_aspect":
# Break down query into sub-questions
sub_queries = decompose_query(user_query)
results = []
for sub_query in sub_queries:
sub_results = opensearch_client.search(
index="knowledge_base",
body=build_targeted_query(sub_query, query_plan.domain)
)
results.extend(sub_results['hits']['hits'])
elif query_plan.complexity == "analytical":
# Multi-step reasoning approach
results = analytical_retrieval_pipeline(user_query, opensearch_client)
# Step 2: Result validation and filtering
validated_results = validate_retrieval_quality(results, user_query)
# Step 3: Generate response with confidence scoring
response = generate_with_confidence(user_query, validated_results)
return response
def build_targeted_query(sub_query, domain):
base_query = {
"bool": {
"should": [
{"multi_match": {"query": sub_query, "fields": ["title^2", "content"]}},
{"knn": {"content_vector": {"vector": encode_query(sub_query), "k": 10}}}
]
}
}
# Domain-specific filtering
if domain == "technical":
base_query["bool"]["filter"] = [
{"term": {"document_type": "technical_documentation"}},
{"range": {"technical_accuracy_score": {"gte": 0.8}}}
]
elif domain == "policy":
base_query["bool"]["filter"] = [
{"term": {"document_type": "policy"}},
{"range": {"last_reviewed": {"gte": "now-1y"}}}
]
return {"query": base_query, "size": 15}
RAG Evaluation and Optimization¶
Retrieval Quality Metrics¶
Measuring and optimizing RAG system performance requires comprehensive evaluation of both retrieval and generation components. OpenSearch provides tools and patterns to support effective RAG evaluation.
Retrieval Evaluation Metrics:
Recall@K: Measures what percentage of relevant documents appear in the top-k retrieved results. Critical for ensuring the RAG system has access to necessary information.
Precision@K: Evaluates how many of the top-k retrieved documents are actually relevant. Important for minimizing noise in the generation context.
Mean Reciprocal Rank (MRR): Assesses how quickly users find relevant information, focusing on the rank position of the first relevant result.
NDCG (Normalized Discounted Cumulative Gain): Provides a more nuanced evaluation that considers both relevance and ranking quality.
Implementation Example:
def evaluate_retrieval_quality(test_queries, opensearch_client):
metrics = []
for query_data in test_queries:
query = query_data['query']
relevant_docs = query_data['relevant_document_ids']
# Perform retrieval
results = opensearch_client.search(
index="knowledge_base",
body={
"size": 20,
"query": build_rag_query(query),
"_source": ["document_id"]
}
)
retrieved_ids = [hit['_source']['document_id'] for hit in results['hits']['hits']]
# Calculate metrics
metrics.append({
'query': query,
'recall_at_5': calculate_recall_at_k(relevant_docs, retrieved_ids, 5),
'recall_at_10': calculate_recall_at_k(relevant_docs, retrieved_ids, 10),
'precision_at_5': calculate_precision_at_k(relevant_docs, retrieved_ids, 5),
'mrr': calculate_mrr(relevant_docs, retrieved_ids),
'ndcg_at_10': calculate_ndcg(relevant_docs, retrieved_ids, 10)
})
return aggregate_metrics(metrics)
End-to-End RAG Evaluation¶
Answer Quality Assessment:
Beyond retrieval metrics, RAG systems require evaluation of the final generated responses for accuracy, relevance, completeness, and factual consistency.
def evaluate_rag_responses(test_cases, rag_system):
evaluation_results = []
for test_case in test_cases:
question = test_case['question']
expected_answer = test_case['expected_answer']
source_documents = test_case['relevant_documents']
# Generate RAG response
rag_response = rag_system.generate_response(question)
# Evaluate response quality
evaluation = {
'question': question,
'generated_answer': rag_response.answer,
'retrieved_sources': rag_response.sources,
# Automatic metrics
'factual_consistency': check_factual_consistency(
rag_response.answer,
rag_response.sources
),
'answer_relevance': calculate_answer_relevance(
question,
rag_response.answer
),
'source_quality': evaluate_source_quality(
rag_response.sources,
source_documents
),
# Human evaluation (when available)
'human_rating': test_case.get('human_rating'),
'human_feedback': test_case.get('human_feedback')
}
evaluation_results.append(evaluation)
return evaluation_results
Performance Optimization Strategies¶
Index Optimization for RAG:
{
"settings": {
"index": {
"refresh_interval": "30s",
"number_of_replicas": 1,
"knn.algo_param.ef_search": 100,
"translog.durability": "async"
}
}
}
Query Optimization Patterns:
def optimized_rag_retrieval(query, opensearch_client, optimization_config):
# Adaptive retrieval size based on query complexity
base_size = optimization_config.get('base_retrieval_size', 10)
query_complexity = analyze_query_complexity(query)
retrieval_size = base_size * query_complexity.multiplier
# Use query profiling for optimization
search_body = {
"size": retrieval_size,
"query": build_optimized_query(query, optimization_config),
"profile": True if optimization_config.get('enable_profiling') else False
}
# Execute search with timeout
results = opensearch_client.search(
index="knowledge_base",
body=search_body,
timeout="5s"
)
# Log performance metrics
if 'profile' in results:
log_query_performance(results['profile'], query)
return results
def build_optimized_query(query, config):
return {
"bool": {
"should": [
{
"multi_match": {
"query": query,
"fields": config.get('text_fields', ["title^2", "content"]),
"type": "best_fields"
}
},
{
"knn": {
"content_vector": {
"vector": encode_query(query),
"k": config.get('vector_k', 15),
"boost": config.get('vector_boost', 1.0)
}
}
}
],
"minimum_should_match": 1
}
}
Production RAG Considerations¶
Scalability and Performance¶
Production RAG systems must handle varying loads while maintaining response quality and speed. OpenSearch's distributed architecture provides the foundation for scalable RAG deployments.
Horizontal Scaling Strategy:
{
"cluster_settings": {
"cluster.routing.allocation.disk.watermark.low": "85%",
"cluster.routing.allocation.disk.watermark.high": "90%",
"indices.memory.index_buffer_size": "20%"
},
"index_template": {
"template": "rag-knowledge-*",
"settings": {
"number_of_shards": 6,
"number_of_replicas": 1,
"refresh_interval": "30s"
}
}
}
Load Balancing for RAG Queries:
class RAGLoadBalancer:
def __init__(self, opensearch_clusters):
self.clusters = opensearch_clusters
self.cluster_weights = self.calculate_cluster_weights()
def route_query(self, query, query_type="standard"):
if query_type == "complex":
# Route complex queries to high-performance cluster
return self.clusters['high_performance']
elif query_type == "batch":
# Route batch queries to dedicated cluster
return self.clusters['batch_processing']
else:
# Load balance standard queries
return self.select_cluster_by_load()
def select_cluster_by_load(self):
cluster_loads = self.get_current_loads()
return min(cluster_loads.items(), key=lambda x: x[1])[0]
Security and Privacy¶
RAG systems often work with sensitive information, requiring robust security measures and privacy protections throughout the retrieval and generation pipeline.
Access Control in RAG:
{
"query": {
"bool": {
"must": [
{"knn": {"content_vector": {"vector": [0.1, -0.2, ...], "k": 10}}}
],
"filter": [
{"terms": {"user_access_groups": ["engineering", "public"]}},
{"term": {"classification_level": "internal"}},
{"bool": {
"should": [
{"term": {"owner_department": "user_department"}},
{"term": {"visibility": "company_wide"}}
]
}}
]
}
}
}
Privacy-Preserving RAG Patterns:
def privacy_aware_rag_retrieval(query, user_context, opensearch_client):
# Sanitize query to remove potential PII
sanitized_query = sanitize_user_query(query)
# Apply user-specific filters
access_filters = build_access_filters(user_context)
# Retrieve with privacy constraints
results = opensearch_client.search(
index="knowledge_base",
body={
"query": {
"bool": {
"must": [build_semantic_query(sanitized_query)],
"filter": access_filters
}
}
}
)
# Post-process to remove sensitive information
filtered_results = apply_data_minimization(results, user_context.clearance_level)
return filtered_results
def sanitize_user_query(query):
# Remove potential PII patterns
pii_patterns = [
r'\b\d{3}-\d{2}-\d{4}\b', # SSN pattern
r'\b[\w\.-]+@[\w\.-]+\.\w+\b', # Email pattern
r'\b\d{16}\b' # Credit card pattern
]
sanitized = query
for pattern in pii_patterns:
sanitized = re.sub(pattern, '[REDACTED]', sanitized)
return sanitized
Monitoring and Observability¶
Production RAG systems require comprehensive monitoring to ensure performance, quality, and reliability. OpenSearch provides extensive monitoring capabilities that integrate well with RAG-specific metrics.
RAG-Specific Monitoring Dashboard:
def create_rag_monitoring_dashboard(opensearch_client):
dashboard_config = {
"retrieval_metrics": {
"avg_retrieval_latency": {
"query": {
"aggs": {
"avg_latency": {
"avg": {"field": "retrieval_time_ms"}
}
}
}
},
"retrieval_success_rate": {
"query": {
"aggs": {
"success_rate": {
"terms": {"field": "retrieval_status"}
}
}
}
}
},
"quality_metrics": {
"user_satisfaction": {
"query": {
"aggs": {
"avg_rating": {
"avg": {"field": "user_rating"}
}
}
}
},
"answer_accuracy": {
"query": {
"aggs": {
"accuracy_distribution": {
"histogram": {
"field": "accuracy_score",
"interval": 0.1
}
}
}
}
}
}
}
return dashboard_config
Alerting for RAG Systems:
{
"trigger": {
"schedule": {"interval": {"period": 5, "unit": "MINUTES"}},
"condition": {
"compare": {
"ctx.payload.aggregations.avg_latency.value": {
"gt": 2000
}
}
}
},
"actions": {
"send_alert": {
"email": {
"to": ["rag-team@company.com"],
"subject": "RAG System Performance Alert",
"body": "Average retrieval latency exceeded 2000ms: {{ctx.payload.aggregations.avg_latency.value}}ms"
}
}
}
}
This comprehensive guide demonstrates how OpenSearch serves as a powerful foundation for building sophisticated RAG systems, from basic document retrieval to advanced multi-modal and agentic architectures. The combination of OpenSearch's robust search capabilities with thoughtful RAG design patterns enables the creation of knowledge systems that provide accurate, relevant, and trustworthy responses while maintaining the scalability and security requirements of production environments.