[experimental] Incremental Masking: Quickstart

This feature is experimental and may change in future releases.

Get started with incremental masking in 5 minutes. This guide shows you how to set up incremental processing for large tables that need regular updates without full reprocessing.

Prerequisites

  • A table you want to process incrementally with a monotonically increasing column (timestamp, auto-incrementing ID, etc.)

  • The column should be indexed and not nullable

Basic Configuration

Create a configuration file with incremental checkpoints for tables you want to process incrementally. Ensure table_truncation_mode: IGNORE is set to avoid data loss. Add checkpoint with mode: ignore_table to default config.

default_config:
  mode: MASKING
  checkpoint:
    mode: ignore_table

tables:
  - table_name_with_schema: <your table>
    mode: MASKING
    checkpoint:
      mode: incremental
      column_name: <your column>
      source:
        type: last_value

table_truncation_mode: IGNORE
schema_creation_mode: CREATE_IF_NOT_EXISTS

With this configuration:

  • First run: TDK processes all data in all tables

  • Subsequent runs:

    • TDK only processes new rows in tables with incremental checkpoints

    • Other tables will be skipped due to ignore_table in the default config

Key Points

  • Required setting: table_truncation_mode: IGNORE prevents data loss

  • Checkpoint column: Must be monotonically increasing (timestamps, IDs)

  • Execution: First run processes all data, subsequent runs process only new data

  • Performance: Efficient for large datasets with regular updates

  • Default config: Use checkpoint with mode: ignore_table in default_config to skip processing other tables on subsequent runs

Tables with Dependencies

If you want to process dependent tables incrementally, you can set up checkpoints for both tables, otherwise, only the rows that reference already processed rows will be masked.

tables:
  - table_name_with_schema: <your child table>
    mode: MASKING
    checkpoint:
      mode: incremental
      column_name: <child column>
      source:
        type: last_value

  - table_name_with_schema: <your parent table>
    mode: MASKING
    checkpoint:
      mode: incremental
      column_name: <parent column>
      source:
        type: last_value

If the related table does not have a monotonically increasing column, you can use mode: refresh to reprocess the entire table on each run.

tables:
  - table_name_with_schema: <your child table>
    mode: MASKING
    checkpoint:
      mode: incremental
      column_name: <child column>
      source:
        type: last_value

  - table_name_with_schema: <your parent table>
    mode: MASKING
    checkpoint:
      mode: refresh

This configuration will reprocess the customer table entirely on each run, while the payment table will be processed incrementally. In this case, it might be not worth using incremental processing at all.

Next Steps

  • Read the complete Incremental Masking guide

  • Learn about different checkpoint modes and constraints

  • Explore advanced scenarios like starting from scratch