[experimental] Incremental Masking: Quickstart
This feature is experimental and may change in future releases. |
Get started with incremental masking in 5 minutes. This guide shows you how to set up incremental processing for large tables that need regular updates without full reprocessing.
Prerequisites
-
A table you want to process incrementally with a monotonically increasing column (timestamp, auto-incrementing ID, etc.)
-
The column should be indexed and not nullable
Basic Configuration
Create a configuration file with incremental checkpoints for tables you want to process incrementally. Ensure table_truncation_mode: IGNORE
is set to avoid data loss. Add checkpoint
with mode: ignore_table
to default config.
default_config:
mode: MASKING
checkpoint:
mode: ignore_table
tables:
- table_name_with_schema: <your table>
mode: MASKING
checkpoint:
mode: incremental
column_name: <your column>
source:
type: last_value
table_truncation_mode: IGNORE
schema_creation_mode: CREATE_IF_NOT_EXISTS
With this configuration:
-
First run: TDK processes all data in all tables
-
Subsequent runs:
-
TDK only processes new rows in tables with
incremental
checkpoints -
Other tables will be skipped due to
ignore_table
in the default config
-
Key Points
-
Required setting:
table_truncation_mode: IGNORE
prevents data loss -
Checkpoint column: Must be monotonically increasing (timestamps, IDs)
-
Execution: First run processes all data, subsequent runs process only new data
-
Performance: Efficient for large datasets with regular updates
-
Default config: Use
checkpoint
withmode: ignore_table
indefault_config
to skip processing other tables on subsequent runs
Tables with Dependencies
If you want to process dependent tables incrementally, you can set up checkpoints for both tables, otherwise, only the rows that reference already processed rows will be masked.
tables:
- table_name_with_schema: <your child table>
mode: MASKING
checkpoint:
mode: incremental
column_name: <child column>
source:
type: last_value
- table_name_with_schema: <your parent table>
mode: MASKING
checkpoint:
mode: incremental
column_name: <parent column>
source:
type: last_value
If the related table does not have a monotonically increasing column, you can use mode: refresh
to reprocess the entire table on each run.
tables:
- table_name_with_schema: <your child table>
mode: MASKING
checkpoint:
mode: incremental
column_name: <child column>
source:
type: last_value
- table_name_with_schema: <your parent table>
mode: MASKING
checkpoint:
mode: refresh
This configuration will reprocess the customer
table entirely on each run, while the payment
table will be processed incrementally. In this case, it might be not worth using incremental processing at all.
Next Steps
-
Read the complete Incremental Masking guide
-
Learn about different checkpoint modes and constraints
-
Explore advanced scenarios like starting from scratch