How Masking Works

Understand the masking engine’s internals, how transformers replace sensitive data, and how the platform preserves referential integrity during masking.

Overview

Data masking replaces sensitive information with realistic but fake data. The platform’s masking engine reads every row from the source database, applies transformers to specified columns, and writes the masked data to the destination while preserving referential integrity, data types, and business logic.
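
The read, transform, write flow described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the platform's implementation: `mask_rows`, the `transformers` mapping, and the lambda transformers are hypothetical names introduced here.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]
Transformer = Callable[[object], object]


def mask_rows(rows: Iterable[Row], transformers: Dict[str, Transformer]) -> Iterator[Row]:
    """Yield exactly one masked row per source row (hypothetical helper)."""
    for row in rows:
        masked = dict(row)  # copy, so the source row is never mutated
        for column, transform in transformers.items():
            if column in masked:
                masked[column] = transform(masked[column])
        yield masked  # columns without a transformer pass through unchanged


# Hypothetical transformers, for illustration only.
transformers = {
    "first_name": lambda _value: "Alice",
    "email": lambda _value: "alice.smith@example.com",
}

source = [
    {
        "customer_id": 1,
        "first_name": "John",
        "email": "john.doe@company.com",
        "created_at": "2023-01-15",
    },
]
masked = list(mask_rows(source, transformers))
```

Because only the listed columns are transformed, the row count, the primary key, and the non-sensitive `created_at` value come through unchanged in this sketch.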

MASKING Mode

Key Characteristics

  • Preserves row count: Output has the same number of rows as input

  • Preserves IDs: Primary keys are typically kept unchanged

  • 1:1 mapping: Each source row maps to exactly one destination row

  • Schema preserved: Same tables, columns, data types

  • Referential integrity maintained: Foreign keys remain valid

Masking Process

Example: Before and After Masking

Table 1. Customer Table Transformation
Column       Original Data         Masked Data              Transformation
-----------  --------------------  -----------------------  -----------------------------------
customer_id  1                     1                        Preserved (Primary Key)
first_name   John                  Alice                    Realistic name generated
last_name    Doe                   Smith                    Realistic name generated
email        john.doe@company.com  alice.smith@example.com  Format preserved, content masked
phone        +1-555-123-4567       +1-555-987-6543          Format preserved, digits randomized
created_at   2023-01-15            2023-01-15               Preserved (non-sensitive)

Notice how the masked data:

  • Maintains all data types

  • Preserves primary keys for referential integrity

  • Generates realistic values that pass validation

  • Keeps non-sensitive data unchanged
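
The "format preserved, digits randomized" behavior shown for the phone column can be sketched as follows. This is a hedged illustration, not the platform's transformer: `mask_digits` and its `keep` parameter are hypothetical, and keeping the first six characters is an assumption chosen to match the `+1-555` prefix in the table.

```python
import random


def mask_digits(value: str, rng: random.Random, keep: int = 0) -> str:
    """Replace each digit after the first `keep` characters with a random digit.

    Separators such as '+', '-', and spaces are left in place, so the
    result has the same length and layout as the input.
    """
    head, tail = value[:keep], value[keep:]
    masked_tail = "".join(
        str(rng.randrange(10)) if ch.isdigit() else ch for ch in tail
    )
    return head + masked_tail


rng = random.Random()  # pass a seeded Random for repeatable output
original_phone = "+1-555-123-4567"
masked_phone = mask_digits(original_phone, rng, keep=6)
```

A value masked this way still matches the same length and pattern checks as the original, which is why format-preserving transformers suit validated fields.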

When to Use Masking

Masking is ideal when you need to:

  • Anonymize production data for non-production environments

  • Maintain exact row counts and relationships

  • Preserve business logic and data distributions

  • Comply with data privacy regulations (GDPR, HIPAA, etc.)

Comparing Modes: To understand how masking differs from generation and subsetting, see Mode Comparison.

Common Questions

Is masking reversible?

No. Masking is intentionally irreversible for security. Once data is masked, the original values cannot be recovered. This is a feature, not a limitation: it ensures that masked data cannot be de-anonymized.

Never mask your only copy of production data. Always mask a copy or backup.

Masking is also deterministic: the same input always produces the same output, which is useful for:

  • Masking multiple related tables

  • Re-running workflows

  • Maintaining consistency across environments
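
One common way to get output that is both repeatable and irreversible is to derive the replacement from a keyed hash of the input. The sketch below assumes that approach purely for illustration; the source does not state how the platform achieves it. `mask_name`, `FIRST_NAMES`, and `SECRET` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical pool of replacement values and a hypothetical secret key.
FIRST_NAMES = ["Alice", "Bruno", "Chen", "Divya", "Elena", "Farid"]
SECRET = b"example-secret"


def mask_name(value: str, secret: bytes = SECRET) -> str:
    """Map a name to a replacement chosen by a keyed hash of the input."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


# The same input yields the same output on every call and every run,
# so a value that appears in several tables is masked consistently.
first = mask_name("John")
second = mask_name("John")
```

Many different inputs map to the same replacement, so the original cannot be recovered from the masked value, while repeated runs with the same key stay consistent.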

Will masking break my application?

No, if configured correctly. The platform preserves:

  • All data types

  • All constraints (PK, FK, NOT NULL, CHECK)

  • All referential integrity

  • Data formats and patterns

However, you should:

  • Test with a small dataset first

  • Verify application functionality with masked data

  • Use format-preserving transformers for validated fields

  • Avoid relying on specific data values in your tests
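
When testing with a small dataset, a quick check for orphaned foreign keys can confirm that relationships survived masking. The following is a minimal sketch using an in-memory SQLite database; the `customers` and `orders` tables and the `orphaned_rows` helper are hypothetical and stand in for your own masked destination.

```python
import sqlite3

# Hypothetical masked destination, built in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bruno');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
    """
)


def orphaned_rows(conn, child, fk, parent, pk):
    """Count child rows whose foreign key has no matching parent row.

    Table and column names are interpolated into the SQL text, so pass
    only trusted identifiers.
    """
    query = (
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE p.{pk} IS NULL"
    )
    return conn.execute(query).fetchone()[0]


customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
orphans = orphaned_rows(conn, "orders", "customer_id", "customers", "customer_id")
```

Comparing `customer_count` with the source row count and confirming that `orphans` is zero covers the row-count and referential-integrity guarantees; application-level behavior still needs its own tests.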