Talend tMap Component: Complete Tutorial with Examples

The tMap component is Talend's most powerful transformation tool. This comprehensive guide covers every aspect of tMap from basic mappings to advanced multi-output routing and complex expressions.

tMap Architecture Overview

┌──────────────────────────────────────────────────────────────────────────┐
│                         tMAP COMPONENT STRUCTURE                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐│
│  │                        INPUT SECTION                                 ││
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              ││
│  │  │  Main Flow   │  │   Lookup 1   │  │   Lookup 2   │              ││
│  │  │  (Required)  │  │  (Optional)  │  │  (Optional)  │              ││
│  │  │              │  │              │  │              │              ││
│  │  │ • customer_id│  │ • cust_id    │  │ • code       │              ││
│  │  │ • name       │  │ • segment    │  │ • description│              ││
│  │  │ • order_amt  │  │ • tier       │  │              │              ││
│  │  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              ││
│  └─────────┼─────────────────┼─────────────────┼────────────────────────┘│
│            │                 │                 │                          │
│            ▼                 ▼                 ▼                          │
│  ┌─────────────────────────────────────────────────────────────────────┐│
│  │                      VARIABLES SECTION                               ││
│  │  var_fullName = row1.first_name + " " + row1.last_name              ││
│  │  var_discountRate = row2.tier.equals("GOLD") ? 0.2 : 0.1            ││
│  │  var_finalAmount = row1.order_amt * (1 - var_discountRate)          ││
│  └─────────────────────────────────────────────────────────────────────┘│
│            │                                                             │
│            ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐│
│  │                       OUTPUT SECTION                                 ││
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              ││
│  │  │   Output 1   │  │   Output 2   │  │   Reject     │              ││
│  │  │  (All rows)  │  │ (Filtered)   │  │  (Errors)    │              ││
│  │  │              │  │              │  │              │              ││
│  │  │ Filter: none │  │ Filter:      │  │ Catch lookup │              ││
│  │  │              │  │ amt > 1000   │  │ reject       │              ││
│  │  └──────────────┘  └──────────────┘  └──────────────┘              ││
│  └─────────────────────────────────────────────────────────────────────┘│
└──────────────────────────────────────────────────────────────────────────┘

Basic Data Transformations

Simple Column Mapping

// Direct mapping (drag and drop)
// row1.customer_id → output.customer_id
 
// Expression-based mapping examples:
 
// String transformations
output.upper_name = StringHandling.UPCASE(row1.name)
output.lower_name = StringHandling.DOWNCASE(row1.name)
output.trimmed = StringHandling.TRIM(row1.name)
output.padded = StringHandling.LPAD(row1.code, 10, "0")
 
// Substring operations
output.first_3 = StringHandling.LEFT(row1.code, 3)
output.last_4 = StringHandling.RIGHT(row1.phone, 4)
output.middle = StringHandling.MID(row1.code, 2, 4)
 
// String replacement
output.cleaned = StringHandling.REPLACE(row1.text, "\n", " ")
output.no_spaces = StringHandling.EREPLACE(row1.text, "\\s+", "_")
 
// Concatenation
output.full_address = row1.street + ", " + row1.city + " " + row1.zip
 
// Null handling
output.safe_name = row1.name != null ? row1.name : "Unknown"
output.safe_amount = row1.amount == null ? 0 : row1.amount

Numeric Transformations

// Basic math
output.total = row1.quantity * row1.unit_price
output.discounted = row1.price * (1 - row1.discount_rate)
output.rounded = Math.round(row1.amount * 100.0) / 100.0
 
// Type conversions
output.int_value = Integer.parseInt(row1.string_number)
output.double_value = Double.parseDouble(row1.string_decimal)
output.string_value = String.valueOf(row1.numeric_field)
 
// Formatting numbers
output.formatted = String.format("%.2f", row1.amount)
output.with_comma = String.format("%,.2f", row1.large_number)
output.percentage = String.format("%.1f%%", row1.rate * 100)
 
// Number functions
output.absolute = Math.abs(row1.value)
output.ceiling = Math.ceil(row1.decimal)
output.floor = Math.floor(row1.decimal)
output.max_value = Math.max(row1.val1, row1.val2)
output.min_value = Math.min(row1.val1, row1.val2)
output.power = Math.pow(row1.base, row1.exponent)
output.sqrt = Math.sqrt(row1.value)
 
// Random values
output.random_int = (int)(Math.random() * 1000)
output.uuid = java.util.UUID.randomUUID().toString()

Date Transformations

// Parse dates from strings
output.parsed_date = TalendDate.parseDate("yyyy-MM-dd", row1.date_string)
output.parsed_datetime = TalendDate.parseDate("yyyy-MM-dd HH:mm:ss", row1.datetime_string)
 
// Format dates to strings
output.formatted_date = TalendDate.formatDate("MM/dd/yyyy", row1.date_field)
output.iso_date = TalendDate.formatDate("yyyy-MM-dd'T'HH:mm:ss'Z'", row1.datetime_field)
 
// Date arithmetic
output.next_week = TalendDate.addDate(row1.order_date, 7, "dd")
output.next_month = TalendDate.addDate(row1.start_date, 1, "MM")
output.next_year = TalendDate.addDate(row1.birth_date, 1, "yyyy")
output.yesterday = TalendDate.addDate(TalendDate.getCurrentDate(), -1, "dd")
 
// Date parts extraction
output.year = TalendDate.getPartOfDate("YEAR", row1.date_field)
output.month = TalendDate.getPartOfDate("MONTH", row1.date_field)
output.day = TalendDate.getPartOfDate("DAY_OF_MONTH", row1.date_field)
output.day_of_week = TalendDate.getPartOfDate("DAY_OF_WEEK", row1.date_field)
output.week_of_year = TalendDate.getPartOfDate("WEEK_OF_YEAR", row1.date_field)
output.quarter = (TalendDate.getPartOfDate("MONTH", row1.date_field) - 1) / 3 + 1
 
// Date comparisons
output.days_between = TalendDate.diffDate(row1.end_date, row1.start_date, "dd")
output.months_between = TalendDate.diffDate(row1.end_date, row1.start_date, "MM")
 
// Current date/time
output.today = TalendDate.getCurrentDate()
output.now = TalendDate.parseDate("yyyy-MM-dd HH:mm:ss",
    TalendDate.formatDate("yyyy-MM-dd HH:mm:ss", new java.util.Date()))
 
// First/Last day of month
output.first_of_month = TalendDate.getFirstDayOfMonth(row1.date_field)
output.last_of_month = TalendDate.getLastDayOfMonth(row1.date_field)

Lookup Operations

Single Table Lookup

/*
 * Lookup Configuration:
 *
 * Main Input: row1 (customers)
 * Lookup Input: row2 (customer_segments)
 *
 * Join Key: row1.customer_id = row2.customer_id
 *
 * Lookup Model: Load once (for small lookup tables)
 * Join Type: Left Outer Join
 * Match Model: First Match
 */
 
// Output expressions with lookup data
output.customer_id = row1.customer_id
output.customer_name = row1.name
output.segment = row2.segment != null ? row2.segment : "UNKNOWN"
output.tier = row2.tier != null ? row2.tier : "STANDARD"
 
// Handle lookup misses with default values
output.discount_rate = row2 != null && row2.discount_rate != null
    ? row2.discount_rate
    : 0.0
 
// Conditional based on lookup result
output.is_premium = row2 != null && "PREMIUM".equals(row2.tier)

Multiple Lookups

/*
 * Multiple Lookup Configuration:
 *
 * Main Input: row1 (orders)
 * Lookup 1: row2 (customers) - join on customer_id
 * Lookup 2: row3 (products) - join on product_id
 * Lookup 3: row4 (regions) - join on region_code
 *
 * Processing Order: Lookups evaluated in sequence
 */
 
// Variables section (for reusability)
Var.customer_name = row2 != null ? row2.name : "Unknown Customer"
Var.product_name = row3 != null ? row3.product_name : "Unknown Product"
Var.region_name = row4 != null ? row4.region_name : "Unknown Region"
Var.unit_price = row3 != null ? row3.unit_price : row1.default_price
 
// Output expressions
output.order_id = row1.order_id
output.customer_name = Var.customer_name
output.product_name = Var.product_name
output.region = Var.region_name
output.line_total = row1.quantity * Var.unit_price
output.lookup_status = (row2 != null ? "C" : "") +
                       (row3 != null ? "P" : "") +
                       (row4 != null ? "R" : "")

Lookup with Composite Keys

/*
 * Composite Key Lookup:
 *
 * Main: row1 (transactions)
 * Lookup: row2 (price_matrix)
 *
 * Join Keys:
 *   row1.product_id = row2.product_id AND
 *   row1.region_code = row2.region_code AND
 *   row1.effective_date >= row2.start_date AND
 *   row1.effective_date <= row2.end_date
 *
 * Note: For complex date ranges, use tMap with inner join expression
 * or pre-filter lookup data
 */
 
// Expression key example (in tMap key settings)
// Key 1: row1.product_id.equals(row2.product_id)
// Key 2: row1.region_code.equals(row2.region_code)
 
// For date range matching, use expression:
row2.start_date.compareTo(row1.trans_date) <= 0 &&
row2.end_date.compareTo(row1.trans_date) >= 0

Variables for Complex Logic

Variable Declaration and Usage

/*
 * Variables Section Benefits:
 * - Avoid repeating complex expressions
 * - Improve readability
 * - Optimize performance (expression evaluated once)
 * - Build calculations step by step
 */
 
// Variable definitions (evaluated in order)
Var.base_price = row1.quantity * row3.unit_price
 
Var.discount_rate = row2 != null && row2.tier != null
    ? (row2.tier.equals("GOLD") ? 0.20 :
       row2.tier.equals("SILVER") ? 0.15 :
       row2.tier.equals("BRONZE") ? 0.10 : 0.05)
    : 0.0
 
Var.discount_amount = Var.base_price * Var.discount_rate
 
Var.subtotal = Var.base_price - Var.discount_amount
 
Var.tax_rate = row4 != null ? row4.tax_rate : 0.08
 
Var.tax_amount = Var.subtotal * Var.tax_rate
 
Var.total = Var.subtotal + Var.tax_amount
 
// Output using variables
output.base_price = Var.base_price
output.discount_rate = Var.discount_rate
output.discount_amount = Var.discount_amount
output.subtotal = Var.subtotal
output.tax_rate = Var.tax_rate
output.tax_amount = Var.tax_amount
output.total = Var.total

Conditional Logic in Variables

// Complex conditional assignment
Var.customer_category =
    row1.annual_revenue > 1000000 ? "ENTERPRISE" :
    row1.annual_revenue > 100000 ? "MID_MARKET" :
    row1.annual_revenue > 10000 ? "SMALL_BUSINESS" :
    "MICRO"
 
Var.risk_level =
    row1.payment_days_late > 90 ? "HIGH" :
    row1.payment_days_late > 30 ? "MEDIUM" :
    row1.payment_days_late > 0 ? "LOW" :
    "NONE"
 
// Pattern matching with regex
Var.is_valid_email = row1.email != null &&
    row1.email.matches("^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$")
 
Var.phone_type = row1.phone == null ? "UNKNOWN" :
    row1.phone.matches("^1?8[0-9]{2}.*") ? "TOLL_FREE" :
    row1.phone.matches("^\\+1.*") ? "US_INTERNATIONAL" :
    "DOMESTIC"
 
// State machine logic
Var.order_status =
    row1.shipped_date != null && row1.delivered_date != null ? "DELIVERED" :
    row1.shipped_date != null ? "IN_TRANSIT" :
    row1.paid_date != null ? "PROCESSING" :
    row1.confirmed_date != null ? "CONFIRMED" :
    "PENDING"

Filtering and Routing

Output Filters

/*
 * Filter Configuration in Output Settings:
 *
 * Output 1 (all_orders): No filter
 * Output 2 (large_orders): row1.amount > 1000
 * Output 3 (premium_customers): row2.tier.equals("PREMIUM")
 * Output 4 (needs_review): Var.needs_review == true
 */
 
// Filter expressions
 
// Simple value filter
row1.status.equals("ACTIVE")
 
// Numeric comparison
row1.amount >= 1000 && row1.amount <= 10000
 
// Date filter
row1.order_date.after(TalendDate.parseDate("yyyy-MM-dd", "2024-01-01"))
 
// Null check filter
row1.email != null && !row1.email.isEmpty()
 
// Multiple conditions
row1.status.equals("ACTIVE") &&
row1.amount > 100 &&
row2 != null &&
row2.credit_limit > row1.amount
 
// Pattern filter
row1.product_code.startsWith("PREM-")
 
// List membership
java.util.Arrays.asList("US", "CA", "MX").contains(row1.country_code)

Reject Flows

/*
 * Reject Flow Configuration:
 *
 * Catch lookup reject: true
 * Catch expression filter reject: true
 *
 * Reject causes:
 * - Lookup with "Inner Join" and no match
 * - Filter expression evaluated to false
 */
 
// In reject output, you can access:
output.original_id = row1.id
output.original_data = row1.data
output.reject_reason = row2 == null ? "Customer lookup failed" :
    row3 == null ? "Product lookup failed" :
    "Filter condition not met"
output.rejected_at = TalendDate.getCurrentDate()
 
// Log reject for analysis
// Enable "Catch lookup inner join reject" on lookup table settings

Multi-Output Routing

/*
 * Routing Strategy:
 *
 * Main Output: Successful records
 * Error Output: Validation failures
 * Archive Output: Historical records
 * Delta Output: Changed records only
 */
 
// Using variables for routing decisions
Var.is_valid = row1.id != null &&
               row1.amount > 0 &&
               row1.customer_id != null
 
Var.is_historical = row1.record_date.before(
    TalendDate.addDate(TalendDate.getCurrentDate(), -365, "dd"))
 
Var.is_changed = !row1.hash.equals(row2.previous_hash)
 
// Output filters:
// main_output: Var.is_valid && !Var.is_historical
// error_output: !Var.is_valid
// archive_output: Var.is_historical
// delta_output: Var.is_valid && Var.is_changed

Advanced Techniques

Aggregation in tMap

/*
 * Note: tMap is row-by-row processing
 * For true aggregation, use tAggregateRow
 *
 * But for running totals and counters, use globalMap:
 */
 
// In tJava before tMap, initialize counters:
globalMap.put("runningTotal", 0.0);
globalMap.put("rowCount", 0);
 
// In tMap variable:
Var.running_total = ((Double)globalMap.get("runningTotal")) + row1.amount
Var.row_number = ((Integer)globalMap.get("rowCount")) + 1
 
// Update globalMap (in output expression with side effect - use carefully):
// This is a workaround - tAggregateRow is preferred for aggregations

Working with Arrays and Lists

// Split delimited string into array
String[] parts = row1.tags.split(",");
output.first_tag = parts.length > 0 ? parts[0].trim() : null
output.tag_count = parts.length
 
// Join array to string
output.combined = String.join(", ", row1.tag_array)
 
// Check if value in list
output.is_priority = java.util.Arrays.asList(
    row1.categories.split(",")).contains("PRIORITY")
 
// Process JSON array (basic)
// For complex JSON, use tExtractJSONFields before tMap
output.first_item = row1.json_array.replace("[", "")
    .replace("]", "")
    .split(",")[0]
    .replace("\"", "")
    .trim()

Custom Java Methods

// Define reusable method in tJava (begin section)
public static String maskEmail(String email) {
    if (email == null || !email.contains("@")) return email;
    String[] parts = email.split("@");
    String username = parts[0];
    String masked = username.charAt(0) +
        "*".repeat(Math.max(0, username.length() - 2)) +
        (username.length() > 1 ? username.charAt(username.length() - 1) : "");
    return masked + "@" + parts[1];
}
 
public static String cleanPhone(String phone) {
    if (phone == null) return null;
    return phone.replaceAll("[^0-9]", "");
}
 
public static boolean isValidSSN(String ssn) {
    if (ssn == null) return false;
    return ssn.matches("^\\d{3}-\\d{2}-\\d{4}$");
}
 
// Use in tMap expressions:
output.masked_email = maskEmail(row1.email)
output.clean_phone = cleanPhone(row1.phone)
output.ssn_valid = isValidSSN(row1.ssn)

Error Handling in Expressions

// Safe navigation with null checks
output.nested_value = row1.parent != null &&
                      row1.parent.child != null
    ? row1.parent.child.value
    : null
 
// Try-catch in expression (for risky operations)
output.parsed_number = (
    (java.util.function.Supplier<Integer>)() -> {
        try {
            return Integer.parseInt(row1.string_num);
        } catch (NumberFormatException e) {
            return -1; // Default on error
        }
    }
).get()
 
// Simpler approach - use ternary with validation
output.parsed_number = row1.string_num != null &&
    row1.string_num.matches("-?\\d+")
    ? Integer.parseInt(row1.string_num)
    : 0

Performance Optimization

Expression Optimization

// BAD: Repeated complex expression
output.field1 = StringHandling.UPCASE(StringHandling.TRIM(row1.name))
output.field2 = StringHandling.UPCASE(StringHandling.TRIM(row1.name)) + "_SUFFIX"
output.field3 = StringHandling.UPCASE(StringHandling.TRIM(row1.name)).length()
 
// GOOD: Use variable
Var.clean_name = StringHandling.UPCASE(StringHandling.TRIM(row1.name))
output.field1 = Var.clean_name
output.field2 = Var.clean_name + "_SUFFIX"
output.field3 = Var.clean_name.length()
 
// BAD: Multiple null checks
output.val = row2 != null ? (row2.field != null ? row2.field : "default") : "default"
 
// GOOD: Single variable with short-circuit
Var.lookup_value = row2 != null && row2.field != null ? row2.field : "default"
output.val = Var.lookup_value

Lookup Optimization

/*
 * Lookup Model Selection:
 *
 * 1. Load Once
 *    - Entire lookup loaded into memory at start
 *    - Best for: < 100K rows, frequently accessed
 *    - Memory: O(n) where n = lookup size
 *
 * 2. Reload at Each Row
 *    - Query lookup for each main row
 *    - Best for: Very large lookups, selective main flow
 *    - Performance: Slow, but low memory
 *
 * 3. Reload at Each Row (cache)
 *    - Query + cache results
 *    - Best for: Large lookup, repeated keys in main
 *    - Memory: O(unique keys accessed)
 */
 
// For huge lookups, consider:
// 1. Pre-filter lookup data with tFilterRow
// 2. Use tHash* components for disk-based lookups
// 3. Split job into chunks with tFlowToIterate

Complete Job Example

Job Structure: Customer Order Processing

tFileInputDelimited (orders.csv)
        │
        ▼
    [tMap_Main]
        │
        ├──► lookup: tDBInput (customers)
        │    JOIN: order.customer_id = customer.id
        │
        ├──► lookup: tDBInput (products)
        │    JOIN: order.product_id = product.id
        │
        ├──► Variables:
        │    Var.unit_price = row3.price != null ? row3.price : row1.default_price
        │    Var.discount = row2.tier.equals("GOLD") ? 0.2 : 0.1
        │    Var.line_total = row1.qty * Var.unit_price * (1 - Var.discount)
        │    Var.is_valid = row1.qty > 0 && row2 != null
        │
        ├──► Output 1: valid_orders (Filter: Var.is_valid)
        │    └──► tDBOutput (processed_orders)
        │
        ├──► Output 2: large_orders (Filter: Var.line_total > 10000)
        │    └──► tFileOutputDelimited (large_orders.csv)
        │
        └──► Output 3: reject_orders (Catch lookup reject)
             └──► tLogRow + tFileOutputDelimited (rejected.csv)

Best Practices Summary

tmap_best_practices:
  design:
    - "Use meaningful variable names"
    - "Comment complex expressions"
    - "Keep expressions simple and readable"
    - "Break complex logic into multiple variables"
 
  performance:
    - "Choose appropriate lookup model based on data size"
    - "Use variables for repeated expressions"
    - "Filter early to reduce processing"
    - "Pre-filter large lookups before tMap"
 
  error_handling:
    - "Always handle null values explicitly"
    - "Use reject outputs for error tracking"
    - "Validate data types before conversion"
    - "Log rejected records for analysis"
 
  maintenance:
    - "Document join keys and business logic"
    - "Test with edge cases (nulls, empty strings)"
    - "Version control tMap configurations"
    - "Review and optimize periodically"

Conclusion

The tMap component is essential for Talend data transformations. Master the use of variables for complex logic, choose appropriate lookup strategies based on data volumes, and implement proper error handling with reject flows. These techniques enable efficient, maintainable ETL jobs.