Virtual Foreign Keys

The Synthesized TDK is able to automatically understand and maintain the relationships defined using foreign keys. If tables are linked with foreign key constraints, any synthetic data will retain those links.

However, some systems define these links at the application layer rather than at the database layer. In such cases, Synthesized provides the option to define virtual foreign keys. This means that referential integrity can be preserved without any changes to the source database.

Virtual foreign keys are only needed for situations where the foreign keys are missing from the database. Foreign keys present in the database are handled automatically.

Configuration-defined foreign keys follow the same rules as database-defined foreign keys. Importantly, that means each virtual foreign key should refer to a combination of one or more columns that identify one unique row.

Adding a virtual foreign key

Each workflow script can contain a metadata element that contains all virtual foreign keys.

The structure is like this:

- metadata:
  tables:
    - table_name_with_schema: "DEPENDENT_SCHEMA.DEPENDENT_TABLE"
      foreign_keys:
        NAME_OF_THE_VIRTUAL_FOREIGN_KEY:
          referred_schema: "REFERRED_SCHEMA"
          referred_table: "REFERRED_TABLE"
          columns:
            - column: "DEPENDENT_COLUMN_NAME"
              referred_column: "REFERRED_COLUMN_NAME"

For example, a business has decided that rather than asking customers for their surname, they will use the staff member’s id instead. This is a business rule, rather than logic in the database.

default_config:
  target_ratio: 1
  mode: GENERATION

metadata:
  tables:
    - table_name_with_schema: "public.customer"
      foreign_keys:
        fk_surname_using_staff_id:
          referred_schema: "public"
          referred_table: "staff"
          columns:
            - column: "last_name"
              referred_column: "staff_id"
Output: public.customer

id

last_name

store_id

0

91

26

1

19

48

2

218

46

Output: public.staff

staff_id

store_id

…​

19

25

…​

91

0

…​

218

86

The metadata section adds a foreign key from public.customer.last_name to public.staff.staff_id, and refers to it as fk_surname_using_staff_id. All of the last_name values in the customer table come from the staff_id column of the staff table.

Referencing multiple columns

Virtual foreign keys can reference multiple columns. In the previous example, the customer’s store_id wasn’t linked to the linked staff member’s store_id. This can be fixed by referencing multiple columns in the foreign key.

default_config:
  target_ratio: 1
  mode: GENERATION

metadata:
  tables:
    - table_name_with_schema: "public.customer"
      foreign_keys:
        fk_surname_using_staff_id:
          referred_schema: "public"
          referred_table: "staff"
          columns:
            - column: "last_name"
              referred_column: "staff_id"
            - column: "store_id"
              referred_column: "store_id"
Output: public.customer

id

last_name

store_id

0

91

0

1

19

25

2

218

86

Output: public.staff

staff_id

store_id

…​

19

25

…​

91

0

…​

218

86

Now that the foreign key refers to both staff_id and store_id, the customer’s store_id matches the store_id of the staff member they have been linked to.

For the full reference docs, please see Configuration Reference: Metadata.