Linking CSV Files in a Workflow
Generate multiple CSV files in a single workflow and link columns across them with virtual foreign keys.
Overview
A single workflow can generate data for more than one CSV file at a time. When the input directory contains multiple table folders, each folder is treated as a separate logical table within the same public schema. You can then declare foreign-key relationships in the configuration so that values generated in one file reference values generated in another.
This is useful when you need referentially consistent data across files — for example, a parent file of profiles and a child file of transactions where each transaction must point to an existing profile.
Directory Structure
Organise the input bucket (or local directory) so that each linked file lives in its own table folder under a common parent directory:
my-bucket/my-input-dir/
├── PartyProfile/
│ └── profiles.csv
└── Equity/
└── transactions.csv
Each folder name becomes the table name. After the workflow runs, the output directory mirrors the same structure, with a single data.csv per folder:
my-bucket/my-output-dir/
├── PartyProfile/
│ └── data.csv
└── Equity/
└── data.csv
Configuration
The configuration defines both tables under tables, then declares the relationship between them under metadata.tables.foreign_keys. The schema is always public for CSV workflows.
In the example below:
-
PartyProfileis the parent table. ItsProfile IDcolumn is generated as a 10-digit sequence and kept (mode: KEEP) so child rows have stable values to reference. -
Equityis the child table. It generates 300 rows of transaction data, and itsProfile IDcolumn is linked back to the parent through a foreign key.
default_config:
mode: GENERATION
target_row_number: 100
use_working_directory: false
locale: "en-GB"
tables:
- table_name_with_schema: "public.PartyProfile"
mode: KEEP
transformations:
- columns: ["Profile ID"]
params:
type: string_sequence_generator
length: 10
start_from: "1"
alphabets:
- type: digits
- table_name_with_schema: "public.Equity"
target_row_number: 300
transformations:
- columns: ["FI name", "FI val"]
params:
type: loop_generator
repeatable: true
source:
value_source: "MULTIPLE_PROVIDED"
column_types:
"FI name": STRING
"FI val": NUMERIC
values:
- {"FI name": "Chase", "FI val": "213"}
- {"FI name": "Wells Fargo", "FI val": "3434"}
- columns: ["Amount"]
params:
type: continuous_generator
numeric_type: FLOAT
min: 1200.34
max: 123654.98
mean: 2344.00
std: 3423.00
round: 2
- columns: ["First Name"]
params:
type: person_generator
column_templates: ["${first_name}"]
- columns: ["Last name"]
params:
type: person_generator
column_templates: ["${last_name}"]
metadata:
tables:
- table_name_with_schema: "public.Equity"
foreign_keys:
fk_user_order:
referred_schema: "public"
referred_table: "PartyProfile"
columns:
- column: "Profile ID"
referred_column: "Profile ID"
cycle_resolution_strategy: "FAIL"
schema_creation_mode: "DROP_AND_CREATE"
table_truncation_mode: TRUNCATE
safety_mode: "RELAXED"
use_text_column_heuristics: true
How the Link Works
The metadata.tables.foreign_keys block declares a virtual foreign key between the two files. Synthesized:
-
Generates the parent table (
PartyProfile) first, producing the pool ofProfile IDvalues. -
Generates the child table (
Equity), drawing eachProfile IDvalue from the parent pool rather than inventing a new one.
The result is that every Profile ID in Equity/data.csv exists in PartyProfile/data.csv, so the two files can be joined safely downstream.
| Since CSV files do not carry foreign-key metadata of their own, the relationship must be declared in the configuration. This is the same mechanism used for virtual foreign keys on database workflows. |