Transformation Issues

Troubleshoot problems during data masking, generation, and subsetting.

Foreign Key Constraint Violations

Symptom

Error: "Foreign key constraint violation", "Referential integrity error".

Common Causes

  1. Parent table processed after child

The platform should order tables by dependency automatically. If you still see this error, list parent tables before the tables that reference them:

# Ensure parent tables are listed first
table_schema:
  - table_name_pattern: "customers"  # Parent
  - table_name_pattern: "orders"     # Child

  2. Missing foreign key metadata

Solution: If the relationship is not declared in the database schema, define a virtual foreign key for it:

foreign_keys:
  - columns: ["customer_id"]
    reference_table: "customers"
    reference_columns: ["id"]
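
The parent-before-child rule from cause 1 is a topological ordering of tables by their foreign keys. The following is a minimal Python sketch of that idea, not the platform's implementation; the `foreign_keys` mapping and the `order_items` and `products` tables are hypothetical examples.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical metadata: each child table maps to the parent tables it references.
foreign_keys = {
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}


def processing_order(fk_map):
    """Return table names with every parent ahead of its children."""
    # TopologicalSorter expects node -> predecessors, which matches child -> parents.
    # It raises graphlib.CycleError if the foreign keys form a cycle.
    return list(TopologicalSorter(fk_map).static_order())


order = processing_order(foreign_keys)
print(order)
```

If a relationship is missing from the mapping, the sort has no reason to place the parent first, which is why undeclared foreign keys can lead to the same error.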

Type Conversion Errors

Symptom

"Cannot convert value", "Type mismatch", "Invalid data type".

Solutions

  1. Check transformer compatibility:

# ❌ Wrong: Email transformer on numeric column
- columns: ["customer_id"]
  params:
    type: person_generator
    column_templates: ["${email}"]

# ✓ Correct: Use appropriate transformer
- columns: ["customer_id"]
  params:
    type: int_sequence_generator

  2. Verify column data types:

-- Check the column's actual data type
SELECT data_type
FROM information_schema.columns
WHERE table_name = 'customers'
AND column_name = 'email';

  3. Use custom transformations:

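The compatibility check from steps 1 and 2 can also be scripted before a run. The following is a minimal Python sketch, not a platform feature: `fits_column` and the `INTEGER_TYPES` set are hypothetical names, and the type list is only a small sample of what `information_schema.columns` can report.

```python
# Hypothetical sample of integer type names as reported in data_type.
INTEGER_TYPES = {"smallint", "integer", "bigint"}


def fits_column(value, data_type):
    """Return True if a generated value could be stored in a column of data_type."""
    if data_type.lower() in INTEGER_TYPES:
        try:
            int(str(value))
        except ValueError:
            return False
        return True
    # In this sketch, every other type is treated as text-like and accepts any value.
    return True


print(fits_column("jane@example.com", "integer"))            # False
print(fits_column(42, "integer"))                            # True
print(fits_column("jane@example.com", "character varying"))  # True
```
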
NULL Value Handling

Symptom

Unexpected NULL values in output, "NOT NULL constraint violation".

Solutions

  1. Preserve NULLs from source:

The platform preserves NULL values by default.

  2. Generate non-NULL values:

transformations:
  - columns: ["email"]
    params:
      type: person_generator
      column_templates: ["${email}"]
    # Platform preserves NULLs by default; configure as needed

  3. Check NOT NULL constraints:

-- Verify constraint
SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 'customers';
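
Once the NOT NULL columns are known, a sample of transformed rows can be checked against them. This is a minimal Python sketch under the assumption that rows are available as dictionaries; `find_null_violations` is a hypothetical helper, not part of the platform.

```python
def find_null_violations(rows, not_null_columns):
    """Return (row_index, column) pairs where a NOT NULL column holds None."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col in not_null_columns
        if row.get(col) is None
    ]


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},  # NULL preserved from the source
]
print(find_null_violations(rows, ["id", "email"]))  # [(1, 'email')]
```

A non-empty result points at the columns where preserved NULLs conflict with the target schema.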

Memory Errors

Symptom

"OutOfMemoryError", "GC overhead limit exceeded", workflow hangs.

Solutions

  1. Increase JVM heap:

export JAVA_OPTS="-Xmx8g"

  2. Reduce batch size:

default_config:
  batch_size: 5000  # Smaller batches

  3. Process tables separately:

Split large workflows into multiple smaller workflows.
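
To see why a smaller batch size lowers peak memory, consider a generic batching loop: only one batch of rows is held in memory at a time. The following Python sketch illustrates the general technique and makes no claim about how the platform batches internally; `batches` is a hypothetical helper.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


# 12 rows in batches of 5: at most 5 rows are materialized at once.
sizes = [len(b) for b in batches(range(12), 5)]
print(sizes)  # [5, 5, 2]
```
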

Slow Transformation Performance

Symptom

Workflow takes much longer than expected.

Solutions

  1. Check database indexes:

-- Ensure foreign key columns are indexed
CREATE INDEX idx_customer_id ON orders(customer_id);

  2. Optimize transformations:

    • Avoid complex scripting transformers when simple ones suffice

    • Use deterministic transformations when possible

  3. Scale horizontally:

Add more agents. See the Scaling Guide.

Character Encoding Issues

Symptom

Special characters appear garbled or corrupted.

Solutions

  1. Verify database encoding:

-- PostgreSQL
SHOW server_encoding;

-- MySQL
SHOW VARIABLES LIKE 'character_set%';

  2. Set JDBC encoding:

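# characterEncoding is a MySQL Connector/J property; the PostgreSQL JDBC driver always uses UTF-8 and needs no parameter
url: jdbc:mysql://host:3306/db?characterEncoding=UTF-8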
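
Garbled characters usually mean that bytes written in one encoding were read back in another. The short Python sketch below reproduces the most common case, UTF-8 bytes decoded as Latin-1, so the pattern is easier to recognize in output data. The sample string is illustrative only.

```python
original = "café"

# UTF-8 bytes misread as Latin-1: each multi-byte character becomes two characters.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# Reversing the wrong step recovers the text, provided no bytes were lost on the way.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

If the output shows sequences like `Ã©` where accented characters are expected, check that the source database, the connection, and the target database all agree on UTF-8.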