n8n Automation

File Processing and Document Automation with n8n

Petru Constantin
12 min read
Tags: n8n, document automation, file processing, workflow automation, PDF

Document automation eliminates manual file handling and enables scalable business processes. This guide covers building robust file processing workflows in n8n for different document types and cloud storage systems.

File Processing Fundamentals

Binary Data Processing

// Function Node: File Processing Utilities
// n8n exposes binary data under named properties; "data" is the default name
const binary = $input.first().binary.data;
const base64Data = binary.data;
const fileName = binary.fileName;
const mimeType = binary.mimeType;

// Pick a processor based on the file type
// (processPDF, processExcel, etc. are assumed to be defined elsewhere in the node)
const processors = {
  'application/pdf': processPDF,
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': processExcel,
  'text/csv': processCSV,
  'image/jpeg': processImage,
  'image/png': processImage,
  'application/json': processJSON
};

const processor = processors[mimeType];
if (!processor) {
  throw new Error(`Unsupported file type: ${mimeType}`);
}

// File metadata
const metadata = {
  fileName,
  mimeType,
  size: Buffer.from(base64Data, 'base64').length,
  processedAt: new Date().toISOString()
};

return [{
  json: {
    metadata,
    processorType: mimeType
  },
  binary: {
    data: binary
  }
}];

File Validation and Security

// Function Node: Secure File Validation
const binary = $input.first().binary.data; // "data" is the default binary property name
const buffer = Buffer.from(binary.data, 'base64');

// Configuration
const config = {
  maxSizeBytes: 50 * 1024 * 1024, // 50MB
  allowedMimeTypes: [
    'application/pdf',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'text/csv',
    'image/jpeg',
    'image/png'
  ],
  blockedExtensions: ['.exe', '.bat', '.cmd', '.sh', '.ps1', '.vbs']
};

// Validation checks
const validations = [];

// Size check
if (buffer.length > config.maxSizeBytes) {
  validations.push({
    check: 'size',
    passed: false,
    message: `File exceeds the maximum size of ${config.maxSizeBytes / 1024 / 1024}MB`
  });
} else {
  validations.push({ check: 'size', passed: true });
}

// MIME type check
if (!config.allowedMimeTypes.includes(binary.mimeType)) {
  validations.push({
    check: 'mimeType',
    passed: false,
    message: `MIME type ${binary.mimeType} is not allowed`
  });
} else {
  validations.push({ check: 'mimeType', passed: true });
}

// Extension check (guard against names without an extension)
const fileName = binary.fileName || '';
const extension = fileName.includes('.')
  ? fileName.substring(fileName.lastIndexOf('.')).toLowerCase()
  : '';
if (config.blockedExtensions.includes(extension)) {
  validations.push({
    check: 'extension',
    passed: false,
    message: `The ${extension} extension is blocked`
  });
} else {
  validations.push({ check: 'extension', passed: true });
}

// Magic-byte check for common file types
const magicBytes = {
  'application/pdf': [0x25, 0x50, 0x44, 0x46], // %PDF
  'image/jpeg': [0xFF, 0xD8, 0xFF],
  'image/png': [0x89, 0x50, 0x4E, 0x47]
};

const expectedMagic = magicBytes[binary.mimeType];
if (expectedMagic) {
  const fileMagic = [...buffer.subarray(0, expectedMagic.length)];
  const magicMatch = expectedMagic.every((byte, i) => byte === fileMagic[i]);

  if (!magicMatch) {
    validations.push({
      check: 'magicBytes',
      passed: false,
      message: 'File content does not match the declared MIME type'
    });
  } else {
    validations.push({ check: 'magicBytes', passed: true });
  }
}

// Overall result
const allPassed = validations.every(v => v.passed);

return [{
  json: {
    valid: allPassed,
    validations,
    fileName: binary.fileName,
    mimeType: binary.mimeType,
    sizeBytes: buffer.length
  },
  binary: allPassed ? { data: binary } : undefined
}];

PDF Processing Workflows

Extracting Text from PDFs

// Function Node: PDF Text Extraction (using pdf-parse)
// External npm modules must be allowed via NODE_FUNCTION_ALLOW_EXTERNAL
const pdfParse = require('pdf-parse');

const pdfBuffer = Buffer.from($input.first().binary.data.data, 'base64');
 
const options = {
  max: 0, // Parse all pages (0 = no limit)
  pagerender: function(pageData) {
    return pageData.getTextContent().then(function(textContent) {
      let text = '';
      let lastY = null;
 
      for (const item of textContent.items) {
        // Insert a newline when the Y position changes significantly
        if (lastY !== null && Math.abs(lastY - item.transform[5]) > 5) {
          text += '\n';
        }
        text += item.str + ' ';
        lastY = item.transform[5];
      }
 
      return text;
    });
  }
};
 
const data = await pdfParse(pdfBuffer, options);
 
return [{
  json: {
    text: data.text,
    numPages: data.numpages,
    info: {
      title: data.info?.Title,
      author: data.info?.Author,
      creator: data.info?.Creator,
      creationDate: data.info?.CreationDate
    },
    metadata: data.metadata
  }
}];

Generating a PDF from a Template

// Function Node: Generate a PDF from an HTML Template
const puppeteer = require('puppeteer');
 
const templateData = $input.first().json;
 
// HTML template with data binding
const htmlTemplate = `
<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: Arial, sans-serif; margin: 40px; }
    .header { border-bottom: 2px solid #333; padding-bottom: 20px; margin-bottom: 30px; }
    .logo { max-height: 60px; }
    h1 { color: #2563eb; }
    table { width: 100%; border-collapse: collapse; margin: 20px 0; }
    th, td { padding: 12px; text-align: left; border-bottom: 1px solid #ddd; }
    th { background: #f5f5f5; }
    .total { font-size: 18px; font-weight: bold; text-align: right; }
    .footer { margin-top: 40px; padding-top: 20px; border-top: 1px solid #ddd; font-size: 12px; color: #666; }
  </style>
</head>
<body>
  <div class="header">
    <h1>Invoice #${templateData.invoiceNumber}</h1>
    <p>Date: ${templateData.date}</p>
    <p>Due date: ${templateData.dueDate}</p>
  </div>
 
  <div class="customer">
    <h3>Bill to:</h3>
    <p>${templateData.customer.name}</p>
    <p>${templateData.customer.address}</p>
    <p>${templateData.customer.email}</p>
  </div>
 
  <table>
    <thead>
      <tr>
        <th>Description</th>
        <th>Quantity</th>
        <th>Unit Price</th>
        <th>Total</th>
      </tr>
    </thead>
    <tbody>
      ${templateData.items.map(item => `
        <tr>
          <td>${item.description}</td>
          <td>${item.quantity}</td>
          <td>$${item.unitPrice.toFixed(2)}</td>
          <td>$${(item.quantity * item.unitPrice).toFixed(2)}</td>
        </tr>
      `).join('')}
    </tbody>
  </table>
 
  <div class="total">
    <p>Subtotal: $${templateData.subtotal.toFixed(2)}</p>
    <p>Tax (${templateData.taxRate}%): $${templateData.tax.toFixed(2)}</p>
    <p>Total: $${templateData.total.toFixed(2)}</p>
  </div>
 
  <div class="footer">
    <p>Thank you for your business!</p>
    <p>Payment terms: Net 30</p>
  </div>
</body>
</html>
`;
 
// Generate the PDF
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.setContent(htmlTemplate);
 
const pdfBuffer = await page.pdf({
  format: 'A4',
  margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' },
  printBackground: true
});
 
await browser.close();
 
return [{
  json: {
    fileName: `invoice-${templateData.invoiceNumber}.pdf`,
    generated: new Date().toISOString()
  },
  binary: {
    data: {
      // Newer Puppeteer versions return a Uint8Array, so wrap it in a Buffer first
      data: Buffer.from(pdfBuffer).toString('base64'),
      mimeType: 'application/pdf',
      fileName: `invoice-${templateData.invoiceNumber}.pdf`
    }
  }
}];
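The template above assumes a particular shape for the incoming item. A sample payload may make that shape explicit; all field names below are inferred from the template itself, and the values are illustrative only:

```javascript
// Illustrative input for the invoice template above — field names are taken
// from the template, the values are made up
const templateData = {
  invoiceNumber: '2024-0042',
  date: '2024-05-01',
  dueDate: '2024-05-31',
  customer: {
    name: 'Acme SRL',
    address: '1 Main St, Bucharest',
    email: 'billing@acme.example'
  },
  items: [
    { description: 'Consulting', quantity: 10, unitPrice: 120 },
    { description: 'Support retainer', quantity: 1, unitPrice: 500 }
  ],
  subtotal: 1700,
  taxRate: 19,
  tax: 323,    // subtotal * 19%
  total: 2023  // subtotal + tax
};
```

Note that `subtotal`, `tax`, and `total` are expected to be computed upstream; the template only formats them.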

Spreadsheet Processing

Processing Excel Files

// Function Node: Excel Processing with XLSX
const XLSX = require('xlsx');
 
const excelBuffer = Buffer.from($input.first().binary.data.data, 'base64');
const workbook = XLSX.read(excelBuffer, { type: 'buffer' });
 
// Get all sheet names
const sheetNames = workbook.SheetNames;
 
// Process each sheet
const sheets = {};
for (const sheetName of sheetNames) {
  const worksheet = workbook.Sheets[sheetName];
 
  // Convert to JSON
  const jsonData = XLSX.utils.sheet_to_json(worksheet, {
    header: 1, // Return rows as raw arrays; row 0 holds the headers
    defval: null, // Default value for empty cells
    blankrows: false
  });
 
  // Get the headers (first row)
  const headers = jsonData[0] || [];
 
  // Convert to an array of objects
  const rows = jsonData.slice(1).map(row => {
    const obj = {};
    headers.forEach((header, index) => {
      if (header) {
        obj[header] = row[index];
      }
    });
    return obj;
  });
 
  sheets[sheetName] = {
    headers,
    rows,
    rowCount: rows.length
  };
}
 
return [{
  json: {
    fileName: $input.first().binary.data.fileName,
    sheetNames,
    sheets,
    totalSheets: sheetNames.length
  }
}];

Creating an Excel File from Data

// Function Node: Generate an Excel Report
const XLSX = require('xlsx');
 
const reportData = $input.first().json;
 
// Create the workbook
const workbook = XLSX.utils.book_new();
 
// Create the summary sheet
const summaryData = [
  ['Report Generated', new Date().toISOString()],
  ['Period', reportData.period],
  ['Total Records', reportData.totalRecords],
  [],
  ['Metric', 'Value'],
  ...Object.entries(reportData.metrics)
];

const summarySheet = XLSX.utils.aoa_to_sheet(summaryData);
XLSX.utils.book_append_sheet(workbook, summarySheet, 'Summary');
 
// Create the data sheet
const dataSheet = XLSX.utils.json_to_sheet(reportData.records);
 
// Set the column widths
const colWidths = Object.keys(reportData.records[0] || {}).map(key => ({
  wch: Math.max(key.length, 15)
}));
dataSheet['!cols'] = colWidths;
 
XLSX.utils.book_append_sheet(workbook, dataSheet, 'Data');
 
// Generate the buffer
const excelBuffer = XLSX.write(workbook, {
  type: 'buffer',
  bookType: 'xlsx'
});
 
return [{
  json: {
    fileName: `report-${reportData.period}.xlsx`,
    sheets: ['Summary', 'Data']
  },
  binary: {
    data: {
      data: excelBuffer.toString('base64'),
      mimeType: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
      fileName: `report-${reportData.period}.xlsx`
    }
  }
}];

CSV Processing

// Function Node: CSV Parser with Validation
const Papa = require('papaparse');

const csvContent = Buffer.from(
  $input.first().binary.data.data, 'base64'
).toString('utf8');
 
// Parsing configuration
const config = {
  header: true,
  skipEmptyLines: true,
  transformHeader: (header) => header.trim().toLowerCase().replace(/\s+/g, '_'),
  transform: (value) => value.trim()
};
 
const result = Papa.parse(csvContent, config);
 
// Validation
const errors = [];
const validRows = [];
 
// Define the expected schema
const schema = {
  required: ['email', 'name'],
  email: {
    pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
    message: 'Invalid email format'
  },
  phone: {
    pattern: /^\+?[\d\s-()]{10,}$/,
    message: 'Invalid phone format'
  }
};
 
result.data.forEach((row, index) => {
  const rowErrors = [];
 
  // Check required fields
  for (const field of schema.required) {
    if (!row[field]) {
      rowErrors.push(`Missing required field: ${field}`);
    }
  }
 
  // Validate the patterns
  for (const [field, rules] of Object.entries(schema)) {
    if (field === 'required') continue;
    if (row[field] && rules.pattern && !rules.pattern.test(row[field])) {
      rowErrors.push(`${field}: ${rules.message}`);
    }
  }
 
  if (rowErrors.length > 0) {
    errors.push({ row: index + 2, errors: rowErrors }); // +2 accounts for the header row and zero-based indexing
  } else {
    validRows.push(row);
  }
});
 
return [{
  json: {
    totalRows: result.data.length,
    validRows: validRows.length,
    errorRows: errors.length,
    errors: errors.slice(0, 100), // Cap error reporting
    headers: result.meta.fields,
    data: validRows
  }
}];

Cloud Storage Integration

Multi-Cloud File Sync

// Function Node: Cloud Storage Router
const item = $input.first();
const binary = item.binary.data; // "data" is the default binary property name
const config = item.json.config;

// Pick the target storage based on file type and size
function determineStorage(binary, config) {
  const size = Buffer.from(binary.data, 'base64').length;
  const mimeType = binary.mimeType;

  // Large files go to S3
  if (size > 100 * 1024 * 1024) { // 100MB
    return 's3';
  }

  // Images go to Cloudinary for CDN delivery
  if (mimeType.startsWith('image/')) {
    return 'cloudinary';
  }

  // Documents go to Google Drive for collaboration
  if (mimeType.includes('document') || mimeType.includes('spreadsheet')) {
    return 'google_drive';
  }

  // Default to S3
  return 's3';
}

const targetStorage = determineStorage(binary, config);

// Generate the storage path
const timestamp = Date.now();
const sanitizedName = binary.fileName
  .replace(/[^a-zA-Z0-9.-]/g, '_')
  .toLowerCase();

const storagePath = `uploads/${new Date().toISOString().slice(0, 7)}/${timestamp}-${sanitizedName}`;

return [{
  json: {
    targetStorage,
    storagePath,
    fileName: binary.fileName,
    mimeType: binary.mimeType,
    size: Buffer.from(binary.data, 'base64').length
  },
  binary: {
    data: binary
  }
}];

S3 Uploads with Presigned URLs

// Function Node: Generate S3 Presigned URLs
const AWS = require('aws-sdk');
 
const s3 = new AWS.S3({
  accessKeyId: $env.AWS_ACCESS_KEY_ID,
  secretAccessKey: $env.AWS_SECRET_ACCESS_KEY,
  region: $env.AWS_REGION
});
 
const fileInfo = $input.first().json;
 
// Generate a presigned URL for uploads
const uploadParams = {
  Bucket: $env.S3_BUCKET,
  Key: fileInfo.storagePath,
  ContentType: fileInfo.mimeType,
  Expires: 3600 // 1 hour
};
 
const uploadUrl = s3.getSignedUrl('putObject', uploadParams);
 
// Generate a presigned URL for downloads
const downloadParams = {
  Bucket: $env.S3_BUCKET,
  Key: fileInfo.storagePath,
  Expires: 86400 // 24 hours
};
 
const downloadUrl = s3.getSignedUrl('getObject', downloadParams);
 
return [{
  json: {
    uploadUrl,
    downloadUrl,
    bucket: $env.S3_BUCKET,
    key: fileInfo.storagePath,
    expiresIn: {
      upload: 3600,
      download: 86400
    }
  }
}];
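The `uploadUrl` returned above is typically consumed outside n8n by whatever client holds the file. A minimal sketch of that client side, assuming Node 18+ for the built-in `fetch` (the `uploadWithPresignedUrl` helper is hypothetical, not part of any SDK):

```javascript
// Hypothetical client-side consumer of the presigned upload URL
async function uploadWithPresignedUrl(uploadUrl, body, contentType) {
  const res = await fetch(uploadUrl, {
    method: 'PUT',
    // Must match the ContentType the URL was signed with, or S3 rejects the request
    headers: { 'Content-Type': contentType },
    body
  });
  if (!res.ok) throw new Error(`Upload failed with status ${res.status}`);
  return res.status;
}
```

Because the URL already embeds the signature, the client needs no AWS credentials of its own.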

Document Conversion Workflows

Format Conversion Pipeline

// Function Node: Document Conversion Router
const file = $input.first();
const targetFormat = file.json.targetFormat;
 
const conversionMap = {
  // Source format -> supported targets
  'application/pdf': ['image/png', 'text/plain', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'],
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['application/pdf', 'text/plain', 'text/html'],
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['text/csv', 'application/pdf', 'application/json'],
  'text/csv': ['application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 'application/json'],
  'image/jpeg': ['image/png', 'image/webp', 'application/pdf'],
  'image/png': ['image/jpeg', 'image/webp', 'application/pdf']
};
 
const sourceMime = file.binary.data.mimeType;
const supportedTargets = conversionMap[sourceMime] || [];
 
if (!supportedTargets.includes(targetFormat)) {
  throw new Error(
    `Cannot convert ${sourceMime} to ${targetFormat}. Supported: ${supportedTargets.join(', ')}`
  );
}
 
// Pick the conversion method
let conversionMethod;
if (sourceMime.startsWith('image/') && targetFormat.startsWith('image/')) {
  conversionMethod = 'image_transform';
} else if (targetFormat === 'application/pdf') {
  conversionMethod = 'to_pdf';
} else if (sourceMime === 'application/pdf') {
  conversionMethod = 'from_pdf';
} else {
  conversionMethod = 'document_convert';
}
 
return [{
  json: {
    sourceFormat: sourceMime,
    targetFormat,
    conversionMethod,
    fileName: file.binary.data.fileName
  },
  binary: {
    data: file.binary.data
  }
}];

Image Processing

// Function Node: Image Processing with Sharp
const sharp = require('sharp');

const imageBuffer = Buffer.from($input.first().binary.data.data, 'base64');
const options = $input.first().json.options || {};
 
// Default options
const config = {
  width: options.width || null,
  height: options.height || null,
  format: options.format || 'jpeg',
  quality: options.quality || 80,
  fit: options.fit || 'inside' // cover, contain, fill, inside, outside
};
 
let processor = sharp(imageBuffer);
 
// Resize if dimensions are specified
if (config.width || config.height) {
  processor = processor.resize({
    width: config.width,
    height: config.height,
    fit: config.fit,
    withoutEnlargement: true
  });
}
 
// Format conversion
switch (config.format) {
  case 'jpeg':
  case 'jpg':
    processor = processor.jpeg({ quality: config.quality });
    break;
  case 'png':
    processor = processor.png({ compressionLevel: 9 });
    break;
  case 'webp':
    processor = processor.webp({ quality: config.quality });
    break;
  case 'avif':
    processor = processor.avif({ quality: config.quality });
    break;
}
 
const outputBuffer = await processor.toBuffer();
const metadata = await sharp(outputBuffer).metadata();
 
// Build the new file name
const originalName = $input.first().binary.data.fileName;
const baseName = originalName.substring(0, originalName.lastIndexOf('.'));
const newFileName = `${baseName}.${config.format}`;
 
return [{
  json: {
    originalSize: imageBuffer.length,
    newSize: outputBuffer.length,
    compressionRatio: ((1 - outputBuffer.length / imageBuffer.length) * 100).toFixed(1) + '%',
    dimensions: {
      width: metadata.width,
      height: metadata.height
    },
    format: metadata.format
  },
  binary: {
    data: {
      data: outputBuffer.toString('base64'),
      // Normalize 'jpg' to the registered MIME type 'image/jpeg'
      mimeType: `image/${config.format === 'jpg' ? 'jpeg' : config.format}`,
      fileName: newFileName
    }
  }
}];

Batch Processing Workflows

Batch File Processor

// Function Node: Batch Processing Coordinator
const files = $input.all();
const batchConfig = {
  maxConcurrent: 5,
  retryAttempts: 3,
  timeoutMs: 30000
};
 
// Group files by type for efficient processing
const fileGroups = {};
for (const file of files) {
  const type = file.binary?.data?.mimeType || 'unknown';
  if (!fileGroups[type]) {
    fileGroups[type] = [];
  }
  fileGroups[type].push(file);
}
 
// Create the batch jobs
const batchJobs = [];
let jobId = 0;
 
for (const [type, groupFiles] of Object.entries(fileGroups)) {
  // Split into chunks for concurrent processing
  for (let i = 0; i < groupFiles.length; i += batchConfig.maxConcurrent) {
    const chunk = groupFiles.slice(i, i + batchConfig.maxConcurrent);
    batchJobs.push({
      jobId: ++jobId,
      fileType: type,
      files: chunk.map(f => ({
        fileName: f.binary?.data?.fileName,
        size: f.binary?.data?.data ? Buffer.from(f.binary.data.data, 'base64').length : 0
      })),
      fileCount: chunk.length
    });
  }
}
 
return batchJobs.map(job => ({
  json: {
    ...job,
    config: batchConfig,
    timestamp: new Date().toISOString()
  }
}));

Error Handling and Logging

A Comprehensive File Processing Logger

// Function Node: File Processing Logger
const result = $input.first().json;
const startTime = result.startTime || Date.now();
 
const logEntry = {
  timestamp: new Date().toISOString(),
  processingTime: Date.now() - startTime,
  operation: result.operation,
  status: result.success ? 'success' : 'failed',
  input: {
    fileName: result.inputFile,
    mimeType: result.inputMimeType,
    size: result.inputSize
  },
  output: result.success ? {
    fileName: result.outputFile,
    mimeType: result.outputMimeType,
    size: result.outputSize
  } : null,
  error: result.error || null,
  metadata: {
    workflowId: $workflow.id,
    executionId: $execution.id,
    nodeId: $node.id
  }
};
 
// Determine the log level
let level = 'info';
if (result.error) {
  level = result.error.includes('timeout') ? 'warn' : 'error';
}
 
return [{
  json: {
    ...logEntry,
    level
  }
}];

Best Practices Summary

  1. Always validate file types using magic bytes, not just extensions
  2. Enforce size limits to prevent resource exhaustion
  3. Stream large files to minimize memory usage
  4. Handle errors gracefully, with proper logging
  5. Sanitize file names before storing them
  6. Use presigned URLs for secure file access
  7. Process in batches when working with multiple files
  8. Monitor processing times to identify bottlenecks

Document automation with n8n enables powerful file processing pipelines while maintaining security and scalability. Combine these patterns to match the specific requirements of your use case.

