[experimental] Data append mode
Starting from version 1.58, Synthesized TDK supports not only provisioning data into an empty database, but also appending data to an existing database.
This feature is experimental and is subject to change in the future as more feedback is gathered and more cases are covered. |
Limitations
Smart data append is only supported for the GENERATION
mode. If a table is being transformed in the MASKING
mode, the
resulting data will be appended to the existing output data without any reconciliation.
Configuration
The append mode is only in effect for operations modes that do not purge the output database before the transformation but at the same time allow the execution to proceed:
-
schema_create_mode: CREATE_IF_NOT_EXISTS
orschema_create_mode: DO_NOT_CREATE
-
table_truncation_mode: IGNORE
The append mode does not require any additional configuration. It automatically kicks in when the above condition is met.
The behaviour with regard to the data in the parent table is controlled by the parent_data_mode
configuration option. The following values are supported:
-
OLD
- only the data that existed in the parent table before the transformation is used to create the output data in the child table. -
NEW
- only the data that was created by the transformation is used to create the output data in the child table. -
ALL
(default) - both pre-existing and newly created data are used to create the output data in the child table.
The parent_data_mode
setting only has effect in combination with the GENERATION
transformation mode. In other modes, it is ignored.
To get hold of how the parent_data_mode
options differ, consider the example of two tables linked with a foreign key.
CREATE TABLE public.customer (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE public.transaction (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
amount INTEGER NOT NULL,
date DATE NOT NULL,
FOREIGN KEY (customer_id) REFERENCES public.customer(id)
);
Let’s say we have the following data in the customer
table:
id | name | address_id |
---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
And the following data in the address
table:
id | street |
---|---|
11 |
Elm Street |
22 |
Maple Street |
Now, let’s run a transformation with the following configuration:
- table_name_with_schema: "public.customer"
transformations:
- columns: [ address_id ]
params:
type: foreign_key_generator
parent_data_mode: ALL
distribution: ROUND_ROBIN
This may result in the following data in the address
table, with the new data highlighted in bold and color:
id | street |
---|---|
11 |
Elm Street |
22 |
Maple Street |
33 |
Oak Street |
The customer
table will have address_id
drawn from both pre-existing and newly created data:
id | name | address_id |
---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
3 |
Charlie |
33 |
4 |
David |
11 |
The same setup, but with the parent_data_mode
set to OLD
, would result in the following data in the following customer
table:
The same setup, but with the parent_data_mode
set to OLD
, would result in the following data in the following customer
table:
id | name | address_id |
---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
3 |
Charlie |
11 |
4 |
David |
22 |
Note that the address_id
for the new data is taken from the pre-existing data in the address
table for Charlie and David.
If we replace the parent_data_mode
with NEW
, the customer
table will have the data like shown below:
id | name | address_id |
---|---|---|
1 |
Alice |
11 |
2 |
Bob |
22 |
3 |
Charlie |
33 |
4 |
David |
33 |
Now, both Charlie and David have the address_id
taken from the newly created data in the address
table.
The OLD data mode can only work if the parent table is not empty. Otherwise, there would be no data to draw from. If the parent table is empty, an error will be thrown during the configuration validation phase.
|
Note that the parent_data_mode
is a transformer-level setting, so if a table has multiple. Not all combinations of parent_data_mode
are supported in the current version.
Current TDK release might not support some combinations of parent_data_mode for complex schemas. In such cases, an error will be thrown during the transformation planning phase. As the feature evolves, more corner cases will be covered.
|