Virtual Foreign Keys
The Synthesized TDK is able to automatically understand and maintain the relationships defined using foreign keys. If tables are linked with foreign key constraints, any synthetic data will retain those links.
However, some systems define these links at the application layer rather than at the database layer. In such cases, Synthesized provides the option to define virtual foreign keys. This means that referential integrity can be preserved without any changes to the source database.
Virtual foreign keys are only needed for situations where the foreign keys are missing from the database. Foreign keys present in the database are handled automatically. |
Configuration-defined foreign keys follow the same rules as database-defined foreign keys. Importantly, that means each virtual foreign key should refer to a combination of one or more columns that identify one unique row. |
Adding a virtual foreign key
Each workflow script can contain a metadata
element that contains all virtual foreign keys.
The structure is like this:
- metadata: tables: - table_name_with_schema: "DEPENDENT_SCHEMA.DEPENDENT_TABLE" foreign_keys: NAME_OF_THE_VIRTUAL_FOREIGN_KEY: referred_schema: "REFERRED_SCHEMA" referred_table: "REFERRED_TABLE" columns: - column: "DEPENDENT_COLUMN_NAME" referred_column: "REFERRED_COLUMN_NAME"
For example, a business has decided that rather than asking customers for their surname, they will use the staff member’s id instead. This is a business rule, rather than logic in the database.
default_config: target_ratio: 1 mode: GENERATION metadata: tables: - table_name_with_schema: "public.customer" foreign_keys: fk_surname_using_staff_id: referred_schema: "public" referred_table: "staff" columns: - column: "last_name" referred_column: "staff_id"
id |
last_name |
store_id |
0 |
91 |
26 |
1 |
19 |
48 |
2 |
218 |
46 |
staff_id |
store_id |
… |
|
19 |
25 |
… |
|
91 |
0 |
… |
|
218 |
86 |
The metadata section adds a foreign key from public.customer.last_name
to public.staff.staff_id
, and refers to it
as fk_surname_using_staff_id
. All of the last_name
values in the customer
table come from the staff_id
column of the staff
table.
Referencing multiple columns
Virtual foreign keys can reference multiple columns. In the previous example, the customer’s store_id
wasn’t linked to the linked staff member’s store_id
. This can be fixed by referencing multiple columns in the foreign key.
default_config: target_ratio: 1 mode: GENERATION metadata: tables: - table_name_with_schema: "public.customer" foreign_keys: fk_surname_using_staff_id: referred_schema: "public" referred_table: "staff" columns: - column: "last_name" referred_column: "staff_id" - column: "store_id" referred_column: "store_id"
id |
last_name |
store_id |
0 |
91 |
0 |
1 |
19 |
25 |
2 |
218 |
86 |
staff_id |
store_id |
… |
|
19 |
25 |
… |
|
91 |
0 |
… |
|
218 |
86 |
Now that the foreign key refers to both staff_id
and store_id
, the customer’s store_id
matches the store_id
of the staff member they have been linked to.
For the full reference docs, please see Configuration Reference: Metadata.