[experimental] Data append mode

Starting from version 1.58, Synthesized TDK supports not only provisioning data into an empty database, but also appending data to an existing database.

This feature is experimental and is subject to change in the future as more feedback is gathered and more cases are covered.

Limitations

Smart data append is only supported for the GENERATION mode. If a table is being transformed in the MASKING mode, the resulting data will be appended to the existing output data without any reconciliation.

Configuration

The append mode is only in effect for operations modes that do not purge the output database before the transformation but at the same time allow the execution to proceed:

  • schema_create_mode: CREATE_IF_NOT_EXISTS or schema_create_mode: DO_NOT_CREATE

  • table_truncation_mode: IGNORE

The append mode does not require any additional configuration. It automatically kicks in when the above condition is met.

The behaviour with regard to the data in the parent table is controlled by the parent_data_mode configuration option. The following values are supported:

  • OLD - only the data that existed in the parent table before the transformation is used to create the output data in the child table.

  • NEW - only the data that was created by the transformation is used to create the output data in the child table.

  • ALL (default) - both pre-existing and newly created data are used to create the output data in the child table.

The parent_data_mode setting only has effect in combination with the GENERATION transformation mode. In other modes, it is ignored.

To get hold of how the parent_data_mode options differ, consider the example of two tables linked with a foreign key.

CREATE TABLE public.customer (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE public.transaction (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount INTEGER NOT NULL,
    date DATE NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES public.customer(id)
);

Let’s say we have the following data in the customer table:

id name address_id

1

Alice

11

2

Bob

22

And the following data in the address table:

id street

11

Elm Street

22

Maple Street

Now, let’s run a transformation with the following configuration:

  - table_name_with_schema: "public.customer"
    transformations:
      - columns: [ address_id ]
        params:
          type: foreign_key_generator
          parent_data_mode: ALL
          distribution: ROUND_ROBIN

This may result in the following data in the address table, with the new data highlighted in bold and color:

id street

11

Elm Street

22

Maple Street

33

Oak Street

The customer table will have address_id drawn from both pre-existing and newly created data:

id name address_id

1

Alice

11

2

Bob

22

3

Charlie

33

4

David

11

The same setup, but with the parent_data_mode set to OLD, would result in the following data in the following customer table: The same setup, but with the parent_data_mode set to OLD, would result in the following data in the following customer table:

id name address_id

1

Alice

11

2

Bob

22

3

Charlie

11

4

David

22

Note that the address_id for the new data is taken from the pre-existing data in the address table for Charlie and David.

If we replace the parent_data_mode with NEW, the customer table will have the data like shown below:

id name address_id

1

Alice

11

2

Bob

22

3

Charlie

33

4

David

33

Now, both Charlie and David have the address_id taken from the newly created data in the address table.

The OLD data mode can only work if the parent table is not empty. Otherwise, there would be no data to draw from. If the parent table is empty, an error will be thrown during the configuration validation phase.

Note that the parent_data_mode is a transformer-level setting, so if a table has multiple. Not all combinations of parent_data_mode are supported in the current version.

Current TDK release might not support some combinations of parent_data_mode for complex schemas. In such cases, an error will be thrown during the transformation planning phase. As the feature evolves, more corner cases will be covered.