[experimental] Data Append mode
Starting from version 1.58, Synthesized TDK supports not only provisioning data into an empty database, but also appending data to an existing database.
| This feature is experimental and is subject to change in the future as more feedback is gathered and more cases are covered. |
Limitations
Smart data append is only supported for the GENERATION mode. If a table is being transformed in the MASKING mode, the
resulting data will be appended to the existing output data without any reconciliation.
Configuration
The append mode is only in effect for operations modes that do not purge the output database before the transformation but at the same time allow the execution to proceed:
-
schema_create_mode: CREATE_IF_NOT_EXISTSorschema_create_mode: DO_NOT_CREATE -
table_truncation_mode: IGNORE
The append mode does not require any additional configuration. It automatically kicks in when the above condition is met.
The behaviour with regard to the data in the parent table is controlled by the parent_data_mode configuration option. The following values are supported:
-
OLD- only the data that existed in the parent table before the transformation is used to create the output data in the child table. -
NEW- only the data that was created by the transformation is used to create the output data in the child table. -
ALL(default) - both pre-existing and newly created data are used to create the output data in the child table.
The parent_data_mode setting only has effect in combination with the GENERATION transformation mode. In other modes, it is ignored.
To get hold of how the parent_data_mode options differ, consider the example of two tables linked with a foreign key.
CREATE TABLE public.customer (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE public.transaction (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
amount INTEGER NOT NULL,
date DATE NOT NULL,
FOREIGN KEY (customer_id) REFERENCES public.customer(id)
);
Let’s say we have the following data in the customer table:
| id | name | address_id |
|---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
And the following data in the address table:
| id | street |
|---|---|
11 |
Elm Street |
22 |
Maple Street |
Now, let’s run a transformation with the following configuration:
- table_name_with_schema: "public.customer"
transformations:
- columns: [ address_id ]
params:
type: foreign_key_generator
parent_data_mode: ALL
distribution: ROUND_ROBIN
This may result in the following data in the address table, with the new data highlighted in bold and color:
| id | street |
|---|---|
11 |
Elm Street |
22 |
Maple Street |
33 |
Oak Street |
The customer table will have address_id drawn from both pre-existing and newly created data:
| id | name | address_id |
|---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
3 |
Charlie |
33 |
4 |
David |
11 |
The same setup, but with the parent_data_mode set to OLD, would result in the following data in the following customer table:
The same setup, but with the parent_data_mode set to OLD, would result in the following data in the following customer table:
| id | name | address_id |
|---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
3 |
Charlie |
11 |
4 |
David |
22 |
Note that the address_id for the new data is taken from the pre-existing data in the address table for Charlie and David.
If we replace the parent_data_mode with NEW, the customer table will have the data like shown below:
| id | name | address_id |
|---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
3 |
Charlie |
33 |
4 |
David |
33 |
Now, both Charlie and David have the address_id taken from the newly created data in the address table.
The OLD data mode can only work if the parent table is not empty. Otherwise, there would be no data to draw from. If the parent table is empty, an error will be thrown during the configuration validation phase.
|
Note that the parent_data_mode is a transformer-level setting, so if a table has multiple. Not all combinations of parent_data_mode are supported in the current version.
Current TDK release might not support some combinations of parent_data_mode for complex schemas. In such cases, an error will be thrown during the transformation planning phase. As the feature evolves, more corner cases will be covered.
|