AI Security

LLM Hallucinations: How to Detect and Prevent AI Fabrications

Nicu Constantin
7 min read
#llm#hallucination#ai-safety#chatgpt#rag

LLM hallucinations - when the AI confidently generates false information - are a critical challenge. This guide covers detection and prevention strategies.

Understanding Hallucinations

Types of LLM Hallucinations:

1. Factual Errors
   "The Eiffel Tower was built in 1920"  ❌ (Actually 1889)

2. Fabricated Sources
   "According to a 2023 Nature study..."  ❌ (The study does not exist)

3. Logical Inconsistencies
   "X is true" → later → "X is false"  ❌

4. Entity Confusion
   "Einstein invented the telephone"  ❌ (Bell did)
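
As a rough illustration of how these categories can be used in code, here is a minimal sketch (the names and the regex heuristic are my own assumptions, not from any library) that tags responses for follow-up verification:

```python
import re
from enum import Enum

class HallucinationType(Enum):
    """The four failure modes above, as machine-readable tags."""
    FACTUAL_ERROR = "factual_error"
    FABRICATED_SOURCE = "fabricated_source"
    LOGICAL_INCONSISTENCY = "logical_inconsistency"
    ENTITY_CONFUSION = "entity_confusion"

# Citation-like phrases worth double-checking against a real reference list
CITATION_PATTERN = re.compile(
    r"according to (a|an|the)|a \d{4} study|et al\.", re.IGNORECASE
)

def flag_possible_fabricated_source(text: str) -> bool:
    """Cheap heuristic: flag citation-like phrasing for manual verification.
    It does NOT prove fabrication - it only marks claims worth checking."""
    return bool(CITATION_PATTERN.search(text))
```

A flagged sentence would then be routed to one of the verification strategies below rather than served directly.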

Strategy 1: Retrieval-Augmented Generation (RAG)

Ground answers in real documents:

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
 
# Build a vector store from your documents
# (`documents` is assumed to be a pre-loaded list of Document objects)
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
 
# RAG prompt that discourages hallucination
prompt = ChatPromptTemplate.from_template("""
Answer the question based ONLY on the following context.
If the context doesn't contain the answer, say "I don't have information about that."
Do NOT make up information.
 
Context:
{context}
 
Question: {question}
 
Answer:""")
 
# Chain with retrieval
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(temperature=0)
)
 
response = chain.invoke("What is our refund policy?")

Check that the answer is actually grounded in the retrieved documents:

def verify_rag_response(response, retrieved_docs):
    """Check whether the response content is grounded in the retrieved documents."""
    doc_content = " ".join(doc.page_content.lower() for doc in retrieved_docs)
 
    # Extract candidate facts from the response
    sentences = response.split('. ')
    verified = []
    unverified = []
 
    for sentence in sentences:
        # Check whether key trigrams from the sentence appear in the source docs
        words = sentence.lower().split()
        key_phrases = [' '.join(words[i:i+3]) for i in range(len(words)-2)]
 
        if any(phrase in doc_content for phrase in key_phrases):
            verified.append(sentence)
        else:
            unverified.append(sentence)
 
    total = len(verified) + len(unverified)
    return {
        "verified": verified,
        "unverified": unverified,
        "confidence": len(verified) / total if total else 0.0
    }

Strategy 2: Self-Consistency Checking

Sample the same question several times and compare the answers:

import openai
 
def check_consistency(question, model="gpt-4o", samples=5):
    """Generate multiple answers and check whether they agree."""
    client = openai.OpenAI()
 
    responses = []
    for _ in range(samples):
        response = client.chat.completions.create(
            model=model,
            temperature=0.7,  # allow some variation across samples
            messages=[
                {"role": "system", "content": "Answer concisely and factually."},
                {"role": "user", "content": question}
            ]
        )
        responses.append(response.choices[0].message.content)
 
    # Check whether the answers agree with each other
    # For factual questions, the answers should be consistent
    return {
        "responses": responses,
        "unique_answers": len(set(responses)),
        "consistent": len(set(responses)) <= 2  # tolerate minor wording variation
    }
 
# Usage
result = check_consistency("What year was Python released?")
if not result["consistent"]:
    print("Warning: inconsistent answers detected - possible hallucination")
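
Exact string comparison is brittle for free-text answers. A lightweight refinement (a sketch of my own, not part of any library) is to normalize the answers and take a majority vote:

```python
import re
from collections import Counter

def majority_answer(responses):
    """Normalize free-text answers (lowercase, strip punctuation) and
    take a majority vote; low agreement suggests a possible hallucination."""
    def normalize(text):
        return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

    counts = Counter(normalize(r) for r in responses)
    answer, votes = counts.most_common(1)[0]
    return {"answer": answer, "agreement": votes / len(responses)}
```

An agreement score below roughly 0.6 is a reasonable trigger for escalating to one of the heavier verification strategies.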

Strategy 3: Confidence Scoring

Ask the model to rate its own confidence:

def get_answer_with_confidence(question):
    client = openai.OpenAI()
 
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """
Answer the question, then rate your confidence on a scale of 1-10.
Format:
Answer: [your answer]
Confidence: [1-10]
Reasoning: [why this confidence level]
 
If you're not sure, say so. It's better to admit uncertainty than to guess.
"""},
            {"role": "user", "content": question}
        ]
    )
 
    text = response.choices[0].message.content
 
    # Parse the confidence score
    import re
    confidence_match = re.search(r'Confidence:\s*(\d+)', text)
    confidence = int(confidence_match.group(1)) if confidence_match else 5
 
    return {
        "full_response": text,
        "confidence": confidence,
        "needs_verification": confidence < 7
    }
 
# Usage
result = get_answer_with_confidence("Who invented the transistor?")
if result["needs_verification"]:
    print("Low confidence - verify this information!")

Strategy 4: Fact-Checking Pipeline

Cross-reference claims with external sources:

import requests
 
def fact_check_claim(claim):
    """Use external APIs to verify claims."""
 
    # Option 1: Wikipedia search
    wiki_url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": claim,
        "format": "json"
    }
 
    response = requests.get(wiki_url, params=params)
    results = response.json().get("query", {}).get("search", [])
 
    # Option 2: Use a search API
    # Option 3: Check against an internal knowledge base
 
    return {
        "claim": claim,
        "evidence_found": len(results) > 0,
        "sources": [r["title"] for r in results[:3]]
    }
 
def validate_response(llm_response):
    """Extract and verify factual claims."""
    client = openai.OpenAI()
 
    # Extract verifiable claims as JSON
    extraction = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": """
Extract factual claims from this text that can be verified.
Return JSON: {"claims": ["claim 1", "claim 2", ...]}
Only include objective, verifiable facts, not opinions.
"""},
            {"role": "user", "content": llm_response}
        ]
    )

    import json
    claims = json.loads(extraction.choices[0].message.content).get("claims", [])
 
    # Verify each claim
    results = []
    for claim in claims:
        verification = fact_check_claim(claim)
        results.append(verification)
 
    return results
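
When external APIs are unavailable, the same cross-referencing idea works against a local list of trusted snippets. A minimal token-overlap sketch (the function name and scoring are my own assumptions, not a standard API):

```python
def check_against_kb(claim, knowledge_base):
    """Score a claim against trusted reference snippets by token overlap.
    Returns the best-matching snippet and an overlap score in [0, 1]."""
    claim_tokens = set(claim.lower().split())
    best = {"snippet": None, "overlap": 0.0}
    for snippet in knowledge_base:
        snippet_tokens = set(snippet.lower().split())
        overlap = len(claim_tokens & snippet_tokens) / max(len(claim_tokens), 1)
        if overlap > best["overlap"]:
            best = {"snippet": snippet, "overlap": overlap}
    return best
```

A production system would replace raw token overlap with embedding similarity or an NLI model, but the retrieval-then-compare shape stays the same.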

Strategy 5: Structured Output Validation

Enforce a schema and validate:

from pydantic import BaseModel, validator
from typing import List, Optional
import openai
import json
 
class FactualResponse(BaseModel):
    answer: str
    sources: List[str]
    confidence_level: str  # "high", "medium", "low"
    caveats: Optional[List[str]] = None
 
    # pydantic v1-style validator; use field_validator in pydantic v2
    @validator('confidence_level')
    def validate_confidence(cls, v, values):
        if v not in ["high", "medium", "low"]:
            raise ValueError("Invalid confidence level")
        # sources is defined before confidence_level, so it has already been
        # validated and is available in `values` (the reverse order would not work)
        if v == "high" and not values.get("sources"):
            raise ValueError("High confidence claims need sources")
        return v
 
def get_validated_answer(question):
    client = openai.OpenAI()
 
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": f"""
Answer questions with verified information.
Return JSON matching this schema:
{FactualResponse.schema_json()}
 
Rules:
- Only claim "high" confidence for well-known facts
- Include sources when possible
- List caveats for uncertain information
"""},
            {"role": "user", "content": question}
        ]
    )
 
    data = json.loads(response.choices[0].message.content)
 
    # Validate with Pydantic
    try:
        validated = FactualResponse(**data)
        return validated
    except Exception as e:
        print(f"Validation failed: {e}")
        return None

Strategy 6: Knowledge Cutoff Awareness

Handle questions about recent events:

def handle_temporal_query(question):
    client = openai.OpenAI()
 
    # Check whether the question is about recent events
    temporal_check = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": """
Analyze if this question requires knowledge of events after April 2024.
Return JSON: {"requires_recent": true/false, "reason": "..."}
"""},
            {"role": "user", "content": question}
        ]
    )
 
    import json
    result = json.loads(temporal_check.choices[0].message.content)
 
    if result["requires_recent"]:
        return {
            "warning": "This question may require information past the knowledge cutoff (April 2024)",
            "recommendation": "Verify with up-to-date sources",
            "answer": None
        }
 
    # Otherwise, answer normally
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": question}
        ]
    )
 
    return {"answer": answer.choices[0].message.content}
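
Calling an LLM just to classify recency adds latency and cost. A cheap pre-filter (a heuristic of my own, assuming the April 2024 cutoff used in this article) can catch the obvious cases first:

```python
import re

CUTOFF_YEAR = 2024  # assumed knowledge cutoff year

def mentions_post_cutoff_year(question, cutoff_year=CUTOFF_YEAR):
    """Flag questions that explicitly mention a year at or after the cutoff.
    Misses implicit recency ("the latest release"), so use it only as a
    fast first pass before the LLM-based classification."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", question)]
    return any(y >= cutoff_year for y in years)
```

Questions that pass this filter can still be sent through the LLM-based temporal check for the subtler cases.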

Strategy 7: Human-in-the-Loop

Flag uncertain answers for human review:

class HumanReviewQueue:
    def __init__(self):
        self.pending_reviews = []
 
    def process_with_review(self, question, answer, confidence):
        if confidence < 0.7:
            self.pending_reviews.append({
                "question": question,
                "proposed_answer": answer,
                "confidence": confidence,
                "status": "pending_review"
            })
            return {
                "answer": "This answer is pending human review for accuracy.",
                "draft": answer,
                "review_id": len(self.pending_reviews) - 1
            }
        return {"answer": answer, "verified": True}
 
    def approve_review(self, review_id, corrected_answer=None):
        review = self.pending_reviews[review_id]
        review["status"] = "approved"
        review["final_answer"] = corrected_answer or review["proposed_answer"]
        return review["final_answer"]
 
# Production usage
review_queue = HumanReviewQueue()
 
def answer_question(question):
    # get_llm_response / estimate_confidence are placeholders for your own
    # model call and confidence-scoring logic
    llm_answer = get_llm_response(question)
    confidence = estimate_confidence(llm_answer)
 
    return review_queue.process_with_review(question, llm_answer, confidence)

Production Anti-Hallucination Checklist

def production_safe_response(question, context=None):
    """End-to-end pipeline for hallucination-resistant answers."""
 
    # 1. Use RAG when context is available
    if context:
        grounded_answer = rag_chain.invoke({
            "question": question,
            "context": context
        })
    else:
        grounded_answer = None
 
    # 2. Get the LLM answer with a confidence score
    llm_result = get_answer_with_confidence(question)
 
    # 3. Check self-consistency
    consistency = check_consistency(question, samples=3)
 
    # 4. Decide on the final answer
    if grounded_answer and llm_result["confidence"] >= 7:
        return {
            "answer": grounded_answer,
            "source": "grounded",
            "confidence": "high"
        }
    elif consistency["consistent"] and llm_result["confidence"] >= 7:
        return {
            "answer": llm_result["full_response"],
            "source": "llm",
            "confidence": "medium",
            "note": "Validated via consistency check"
        }
    else:
        return {
            "answer": "I'm not confident enough to answer this accurately.",
            "confidence": "low",
            "recommendation": "Verify with authoritative sources"
        }

Quick Reference: Prevention Techniques

| Technique | Best for | Complexity |
|-----------|----------|------------|
| RAG | Domain-specific facts | Medium |
| Self-consistency | General questions | Low |
| Confidence scoring | All answers | Low |
| Fact-checking | Critical claims | High |
| Structured output | Data extraction | Medium |
| Human review | High-stakes decisions | High |

Building Reliable AI Systems?

Preventing hallucinations is critical for enterprise AI. Our team offers:

  • AI reliability audits
  • Custom fact-checking pipelines
  • RAG implementation consulting
  • EU AI Act compliance

Get AI safety expertise

Need help with EU AI Act compliance or AI security?

Book a free 30-minute consultation. No obligations.
