AI Security

RAG not working? Common problems and how to fix them

Nicu Constantin
7 min read
#rag #llm #vector-database #embeddings #troubleshooting

RAG (Retrieval Augmented Generation) promises to make AI answer from your own documents, but it often doesn't work the way you expect. This guide helps you diagnose and fix the most common RAG problems.

How RAG should work

User Question → Embed Question → Search Vector DB → Get Relevant Chunks
                                                            ↓
                                         LLM + Context → Answer
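
When every stage works, the whole loop fits in a few lines. A minimal sketch with LangChain, Chroma, and OpenAI (assuming OPENAI_API_KEY is set and documents is a list of LangChain Document objects):

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma

# Index: embed the documents and store the vectors
vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())

# Retrieve: embed the question and find the closest chunks
question = "What is the refund policy?"
docs = vectorstore.similarity_search(question, k=3)

# Generate: answer from the retrieved context
context = "\n".join(d.page_content for d in docs)
llm = ChatOpenAI(model="gpt-4o-mini")
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}").content)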

When it fails:
- Wrong documents returned
- Right documents, wrong answer
- Slow performance
- No answer at all

Problem 1: Retrieval returns irrelevant documents

Symptom: You ask "What's the return policy?" but get documents about shipping.

Solution 1 - Check embedding quality:

from langchain_openai import OpenAIEmbeddings
import numpy as np
 
embeddings = OpenAIEmbeddings()
 
# Test semantic similarity
query = "What is the refund policy?"
doc1 = "Our refund policy allows returns within 30 days."
doc2 = "Shipping takes 3-5 business days."
 
query_emb = embeddings.embed_query(query)
doc1_emb = embeddings.embed_query(doc1)
doc2_emb = embeddings.embed_query(doc2)
 
# Calculate cosine similarity
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
 
print(f"Query vs Doc1 (refund): {cosine_sim(query_emb, doc1_emb):.3f}")  # Should be high ~0.85+
print(f"Query vs Doc2 (shipping): {cosine_sim(query_emb, doc2_emb):.3f}")  # Should be low ~0.70
 
# If both are similar, your embeddings aren't distinguishing well
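
Note: OpenAI embeddings tend to cluster in a narrow similarity band, so even unrelated texts often score around 0.7. The gap between relevant and irrelevant documents matters more than the absolute values.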

Solution 2 - Improve chunking:

from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# Bad: Too small chunks lose context
bad_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,  # Too small!
    chunk_overlap=0
)
 
# Good: Larger chunks with overlap
good_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Good size
    chunk_overlap=200,    # Overlap preserves context
    separators=["\n\n", "\n", ". ", " ", ""]  # Semantic breaks
)
 
# Even better: Section-aware splitting
def split_by_headers(document):
    """Split document by markdown headers."""
    import re
    sections = re.split(r'\n#{1,3}\s', document)
    return [s.strip() for s in sections if s.strip()]
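
If your sources are markdown, LangChain also ships a header-aware splitter that keeps the header hierarchy as metadata; a short sketch:

from langchain.text_splitter import MarkdownHeaderTextSplitter
 
# Split on H1-H3 headers and record them as metadata on each chunk
header_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
sections = header_splitter.split_text(document)  # list of Documents with header metadata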

Solution 3 - Add metadata filtering:

# Add metadata when indexing
documents = []
for doc in raw_documents:
    documents.append({
        "content": doc.text,
        "metadata": {
            "category": doc.category,  # "refund", "shipping", etc.
            "date": doc.date,
            "source": doc.filename
        }
    })
 
# Filter during retrieval
retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 4,
        "filter": {"category": "refund"}  # Only search refund docs
    }
)

Solution 4 - Use hybrid search:

from langchain.retrievers import BM25Retriever, EnsembleRetriever
 
# BM25 for keyword matching
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 4
 
# Vector search for semantic
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
 
# Combine both
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5]  # Equal weight
)
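
Note: BM25Retriever needs the rank_bm25 package (pip install rank_bm25). The equal weights are just a starting point; shift weight toward BM25 when users query with exact terms (product codes, error messages) and toward the vector retriever for natural-language questions.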

Problem 2: It finds the right documents, but the answer is wrong

Symptom: The returned chunks contain the answer, but the LLM still gets it wrong.

Solution 1 - Improve the prompt:

# Bad prompt
prompt = f"Answer this: {question}\nContext: {context}"
 
# Good prompt with clear instructions
prompt = f"""Answer the question based ONLY on the context below.
If the context doesn't contain the answer, say "I don't have that information."
Do not make up information not in the context.
 
Context:
{context}
 
Question: {question}
 
Answer:"""

Solution 2 - Check that the context actually reaches the LLM:

# Debug: Print what the LLM actually sees
def debug_rag(question, context):
    full_prompt = f"Context:\n{context}\n\nQuestion: {question}"
    print("="*50)
    print("LLM INPUT:")
    print(full_prompt)
    print("="*50)
 
    response = llm.invoke(full_prompt)
    print("\nLLM OUTPUT:")
    print(response)
    return response

Solution 3 - Reduce the number of chunks if there's too much noise:

# Too many chunks can confuse the LLM
retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 3  # Fewer, more relevant chunks
    }
)
 
# Or use relevance score filtering
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 5,
        "score_threshold": 0.7  # Only high-confidence matches
    }
)
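
Note: score scales differ between vector stores (some return distances, others similarities), so a 0.7 threshold is not portable; calibrate it against a few known-good queries on your backend.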

Solution 4 - Use a re-ranker:

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank
 
# Initial retrieval gets more docs
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
 
# Re-ranker picks the best
compressor = CohereRerank(top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever
)
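
Note: CohereRerank requires the cohere package and a Cohere API key (the COHERE_API_KEY environment variable). The pattern generalizes to any re-ranker: retrieve generously first, then let the re-ranker pick the few chunks the LLM actually sees.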

Problem 3: RAG is too slow

Symptom: Simple questions take 5+ seconds to answer.
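
Before optimizing, measure which stage is actually slow. A minimal timing sketch (assuming vectorstore and llm are already configured):

import time
 
def time_rag_stages(question, vectorstore, llm, k=3):
    """Time retrieval and generation separately to find the bottleneck."""
    t0 = time.perf_counter()
    docs = vectorstore.similarity_search(question, k=k)
    t1 = time.perf_counter()
 
    context = "\n".join(d.page_content for d in docs)
    response = llm.invoke(f"Context: {context}\n\nQuestion: {question}")
    t2 = time.perf_counter()
 
    print(f"Retrieval:  {t1 - t0:.2f}s")
    print(f"Generation: {t2 - t1:.2f}s")
    return response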

Solution 1 - Optimize the vector database:

# Use persistent storage instead of in-memory
from langchain_community.vectorstores import Chroma
 
# Slow: Rebuilds every time
vectorstore = Chroma.from_documents(documents, embeddings)
 
# Fast: Persists to disk
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)

Solution 2 - Reduce embedding calls:

# Cache embeddings
from functools import lru_cache
 
@lru_cache(maxsize=1000)
def cached_embed(text):
    return tuple(embeddings.embed_query(text))
 
# Or pre-compute common queries
common_queries = ["refund policy", "shipping time", "contact support"]
precomputed = {q: embeddings.embed_query(q) for q in common_queries}
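
To actually use a precomputed embedding at query time, search by vector instead of by text; similarity_search_by_vector is supported by most LangChain vector stores, including Chroma:

# Skip the embedding call for queries we have already embedded
query = "refund policy"
if query in precomputed:
    docs = vectorstore.similarity_search_by_vector(precomputed[query], k=3)
else:
    docs = vectorstore.similarity_search(query, k=3)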

Solution 3 - Use async operations:

import asyncio
from langchain_openai import ChatOpenAI
 
async def fast_rag(question):
    # Run retrieval and LLM prep in parallel
    retrieval_task = asyncio.create_task(
        vectorstore.asimilarity_search(question, k=3)
    )
 
    # While retrieving, set up LLM
    llm = ChatOpenAI(model="gpt-4o-mini")
 
    # Wait for retrieval
    docs = await retrieval_task
 
    # Generate answer
    context = "\n".join([d.page_content for d in docs])
    response = await llm.ainvoke(f"Context: {context}\nQuestion: {question}")
 
    return response
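
Usage: asyncio.run(fast_rag("What is the refund policy?")). The gain from overlapping client setup with retrieval is small; the bigger async payoff comes when serving many questions concurrently, since requests no longer block each other.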

Problem 4: Vector store connection problems

Symptom: Errors like "Collection not found" or connection timeouts.

Solution for Chroma:

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
import chromadb
 
# Create persistent client
client = chromadb.PersistentClient(path="./chroma_db")
 
# Check collections exist
print(client.list_collections())
 
# Create/get collection
vectorstore = Chroma(
    client=client,
    collection_name="my_docs",
    embedding_function=OpenAIEmbeddings()
)

Solution for Pinecone:

import os
 
from pinecone import Pinecone
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
 
# Initialize with explicit error handling
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
 
# Check the index exists before connecting
if "my-index" not in pc.list_indexes().names():
    raise ValueError("Index 'my-index' not found!")
 
# Connect
vectorstore = PineconeVectorStore(
    index_name="my-index",
    embedding=OpenAIEmbeddings()
)

Problem 5: Documents aren't being indexed

Symptom: Searches return nothing even though you added documents.

Solution 1 - Check that indexing succeeded:

# Add documents with confirmation
ids = vectorstore.add_documents(documents)
print(f"Added {len(ids)} documents")
 
# Verify they exist
test_results = vectorstore.similarity_search("test query", k=1)
print(f"Can retrieve: {len(test_results) > 0}")

Solution 2 - Check the document format:

from langchain.schema import Document
 
# Risky: bare strings work, but no metadata gets attached
vectorstore.add_texts(["text1", "text2"])
 
# Correct format
docs = [
    Document(page_content="text1", metadata={"source": "file1"}),
    Document(page_content="text2", metadata={"source": "file2"})
]
vectorstore.add_documents(docs)

Solution 3 - Persist your changes (Chroma):

# After adding documents
vectorstore.persist()
 
# Or use auto-persist
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
# Changes auto-persist in newer versions

Complete RAG debugging checklist

def debug_rag_pipeline(question, vectorstore, llm):
    print("="*60)
    print("RAG DEBUG REPORT")
    print("="*60)
 
    # 1. Test embedding
    print("\n1. EMBEDDING TEST")
    try:
        # _embedding_function is Chroma's private attribute; other stores differ
        emb = vectorstore._embedding_function.embed_query(question)
        print(f"   Embedding works (dim: {len(emb)})")
    except Exception as e:
        print(f"   Embedding failed: {e}")
        return
 
    # 2. Test retrieval
    print("\n2. RETRIEVAL TEST")
    try:
        docs = vectorstore.similarity_search(question, k=3)
        print(f"   Retrieved {len(docs)} documents")
        for i, doc in enumerate(docs):
            print(f"   Doc {i+1}: {doc.page_content[:100]}...")
    except Exception as e:
        print(f"   Retrieval failed: {e}")
        return
 
    # 3. Test LLM
    print("\n3. LLM TEST")
    try:
        context = "\n".join([d.page_content for d in docs])
        prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
        response = llm.invoke(prompt)
        print(f"   LLM responded")
        print(f"   Response: {response.content[:200]}...")
    except Exception as e:
        print(f"   LLM failed: {e}")
 
    print("\n" + "="*60)
 
# Usage
debug_rag_pipeline("What is the refund policy?", vectorstore, llm)

Quick reference: common fixes

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| Wrong documents returned | Poor chunking | Increase chunk size + overlap |
| Low relevance scores | Weak embeddings | Try a different embedding model |
| Right documents, wrong answer | Prompt issues | Improve prompt clarity |
| Slow queries | No persistence | Use a persistent vector DB |
| Empty results | Documents never indexed | Check add_documents + persist |
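
For the "try a different embedding model" fix, compare candidates on how well they separate a relevant from an irrelevant document; absolute scores are not comparable across models, the gap is what matters. A sketch assuming the sentence-transformers MiniLM model as the alternative (requires the sentence-transformers package):

from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
import numpy as np
 
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
 
query = "What is the refund policy?"
relevant = "Our refund policy allows returns within 30 days."
irrelevant = "Shipping takes 3-5 business days."
 
for name, model in [
    ("openai", OpenAIEmbeddings()),
    ("minilm", HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")),
]:
    q = model.embed_query(query)
    gap = cosine_sim(q, model.embed_query(relevant)) - cosine_sim(q, model.embed_query(irrelevant))
    print(f"{name}: relevant/irrelevant gap = {gap:.3f}")  # bigger gap = better separation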

Need help with your RAG system?

Production RAG systems need careful tuning. Our team offers:

  • RAG architecture design
  • Retrieval optimization
  • LLM security audits
  • Performance tuning

Get RAG Expertise

