Scripting Transformer
The Scripting Transformer is Synthesized’s most flexible transformer. It lets you define custom logic using JavaScript, and supports both GENERATION and MASKING modes.
The Scripting Transformer is best suited for complex logic that can’t be handled by native transformers. Before using it, consider whether your use case can be addressed by other advanced transformers such as the Arithmetic, Sieve, or Conditional Transformers.
To ensure security and stability, the Scripting Transformer enforces strict limitations: only permitted external dependencies can be imported, network requests are blocked, and scripts may not access files.
Scripted generation
A Scripting Transformer defines a JavaScript lambda function. To generate a single value, the lambda
function takes in the context of the generation (ctx) and returns a value.
transformations:
  - columns:
      - name
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => "Hello, World!";
| id | name |
|---|---|
| 0 | Hello, World! |
| 1 | Hello, World! |
| 2 | Hello, World! |
This GENERATION script sets the name column to Hello, World! for every row.
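As a slightly less trivial sketch of a single-column script, the lambda below picks a value at random from a fixed list. The NAMES list and the empty stand-in for ctx are hypothetical and exist only so the function can be tried outside Synthesized; in a workflow, the list and the lambda would go under script.code.

```javascript
// Hypothetical list of candidate values; not part of Synthesized.
const NAMES = ["Alice", "Bob", "Carol", "Dave"];

// Same shape as a single-column GENERATION script: takes ctx, returns one value.
const pickName = (ctx) => NAMES[Math.floor(Math.random() * NAMES.length)];

// Local try-out with an empty stand-in for ctx (this script never reads it).
const sample = Array.from({ length: 5 }, () => pickName({}));
console.log(sample);
```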
To generate multiple columns, the lambda function must return a dictionary of column-value pairs.
transformations:
  - columns:
      - rental_rate
      - replacement_cost
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => {
            const actualCost = 100 + (Math.random() * 1900);
            return {
              rental_rate: 0.99 + Math.floor(actualCost / 500),
              replacement_cost: 3.99 + (Math.ceil(actualCost * 1.5 / 100))
            }
          }
| id | rental_rate | replacement_cost |
|---|---|---|
| 0 | 2.99 | 19.99 |
| 1 | 2.99 | 20.99 |
| 2 | 3.99 | 27.99 |
This example generates both the rental_rate and the replacement_cost. To do this:
- It generates a random original cost between 100 and 2000 (1.00 to 20.00 if the value is read as cents).
- It uses that cost to derive the price to rent the item and the price to replace it.
- The function returns a dictionary with keys that match the column names.
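Because the returned keys must match the configured column names, it can help to check a multi-column script locally before adding it to a workflow. The sketch below reuses the script from the example above; checkKeys is a hypothetical helper written for this page, not a Synthesized API, and the empty object stands in for ctx.

```javascript
// The multi-column script from the example above, unchanged in behavior.
const generateCosts = (ctx) => {
  const actualCost = 100 + (Math.random() * 1900);
  return {
    rental_rate: 0.99 + Math.floor(actualCost / 500),
    replacement_cost: 3.99 + (Math.ceil(actualCost * 1.5 / 100))
  };
};

// Hypothetical local check: lists configured columns missing from the result
// and returned keys that are not configured columns.
function checkKeys(result, columns) {
  const keys = Object.keys(result);
  return {
    missing: columns.filter((c) => !keys.includes(c)),
    unexpected: keys.filter((k) => !columns.includes(k))
  };
}

const report = checkKeys(generateCosts({}), ["rental_rate", "replacement_cost"]);
console.log(report); // both lists are empty when keys and columns line up
```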
Scripted masking
Using a Scripting Transformer for masking is similar to generation. The key difference is that the function takes an
extra argument: originalRecord. This allows you to use the original unmasked values during masking.
transformations:
  - columns:
      - title
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx, originalRecord) => originalRecord;
| id | title |
|---|---|
| 0 | ACADEMY DINOSAUR |
| 1 | ACE GOLDFINGER |
| 2 | ADAPTATION HOLES |

| id | title |
|---|---|
| 0 | ACADEMY DINOSAUR |
| 1 | ACE GOLDFINGER |
| 2 | ADAPTATION HOLES |
By returning the originalRecord as the result, this script has left the input values unchanged. This is effectively how the Passthrough transformer works.
To access the data inside an originalRecord, you can use originalRecord.get(column_name). Importantly, the originalRecord contains data from all columns in the original table, not just columns that are getting masked.
transformations:
  - columns:
      - title
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx, originalRecord) => {
            const title = originalRecord.get("title");
            return title.split(' ').reverse().join(' ');
          }
| id | title |
|---|---|
| 0 | ACADEMY DINOSAUR |
| 1 | ACE GOLDFINGER |
| 2 | ADAPTATION HOLES |

| id | title |
|---|---|
| 0 | DINOSAUR ACADEMY |
| 1 | GOLDFINGER ACE |
| 2 | HOLES ADAPTATION |
This script took the original values from the title column and reversed the order of the words.
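A masking script often keeps part of the original value and hides the rest. The sketch below masks the local part of an email address while keeping the domain. The email column and the fakeRecord helper are hypothetical; fakeRecord assumes only the get(column_name) accessor described above, so the lambda can be tried outside Synthesized.

```javascript
// Masking-style script: hides the local part of an email and keeps the domain.
// "email" is a hypothetical column name used for illustration.
const maskEmail = (ctx, originalRecord) => {
  const email = originalRecord.get("email");
  if (email == null) return email;              // leave missing values untouched
  const at = email.lastIndexOf("@");
  if (at < 0) return "*".repeat(email.length);  // not an email: hide everything
  return "*".repeat(at) + email.slice(at);
};

// Stand-in for originalRecord exposing only get(column_name).
const fakeRecord = (row) => ({ get: (name) => row[name] });

console.log(maskEmail({}, fakeRecord({ email: "penelope@example.com" })));
// → ********@example.com
```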
Managing state with init_script
The script in a Scripting Transformer is called for each record. To keep state between calls or to avoid repeating setup work, you can also provide an init_script. The init_script runs before generation and lets you define variables and functions that the script can reuse.
transformations:
  - columns:
      - payment_id
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      init_script:
        code: |
          var previousId = 0;
          function nextId(difference) {
            previousId += difference;
            return previousId;
          }
      script:
        code: |
          (ctx) => nextId(1 + Math.floor(Math.random() * 5));
| id | payment_id |
|---|---|
| 0 | 3 |
| 1 | 7 |
| 2 | 11 |
In this GENERATION example, payment IDs increase continuously, with a gap of 1 to 5 between consecutive values.
Both the previousId variable and the nextId function are available in the script.
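State kept in init_script can also support consistent replacement, where repeated original values always receive the same masked value. The following is a minimal sketch under stated assumptions: the customer column, the CUST- prefix, and the fakeRecord helper are hypothetical, and it assumes column values arrive as ordinary JavaScript strings. How state behaves across parallel workers is not covered here.

```javascript
// --- would go in init_script ---
var seen = new Map();   // original value -> replacement
var counter = 0;
function pseudonymFor(value) {
  if (!seen.has(value)) {
    counter += 1;
    seen.set(value, "CUST-" + String(counter).padStart(4, "0"));
  }
  return seen.get(value);
}

// --- would go in script ---
const maskCustomer = (ctx, originalRecord) =>
  pseudonymFor(originalRecord.get("customer"));

// Local try-out with a stand-in record exposing only get(column_name).
const fakeRecord = (row) => ({ get: (name) => row[name] });
const results = ["ann", "bob", "ann"].map((c) =>
  maskCustomer({}, fakeRecord({ customer: c }))
);
console.log(results); // → [ 'CUST-0001', 'CUST-0002', 'CUST-0001' ]
```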
Storing custom scripts
Simple scripts can be written directly in the workflow config files. If your scripts grow and make the config file hard to read, you can move them into separate files.
The scripts can be stored on the local file system, AWS S3, or Google Storage:
transformations:
  - columns:
      - credit_card
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      init_script:
        file: "/path/init_script.js"
      script:
        file: "/path/script.js"
The path to your scripts folder must be added to the configuration. For example, when running with Docker Compose, you could mount a scripts folder as a volume under both the backend and the agent services:
services:
  backend:
    ...
    volumes:
      - ./scripts:/scripts
    ...
  agent:
    ...
    volumes:
      - ./scripts:/scripts
    ...
To load scripts from S3, set the property TDK_AWS_ENABLED to true. More details can be found here.
To load scripts from Google Storage, set the property TDK_GCP_ENABLED to true. More details can be found here.
Reusing scripts for different transformers
The utility objects column and columns are available from within scripts. These
properties are designed to allow script reuse across multiple transformers.
- columns: The list of columns the transformer is producing.
- column: The column name this transformer is producing. This variable is only bound for single-column transformations.
These properties make it possible to use the same script in multiple places, but with individual configuration.
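(ctx, originalRecord) => {
  // Per-transformer character limit, read from additionalProperties.
  const max_chars = additionalProperties.max_characters;
  // columns holds the names of the two columns this transformer produces.
  const firstColumn = columns.get(0);
  const secondColumn = columns.get(1);
  const firstContents = originalRecord.get(firstColumn);
  const secondContents = originalRecord.get(secondColumn);
  // Swap the two values, keeping at most max_chars characters of each.
  // slice(0, undefined) returns the whole string, so values stay intact when no limit is set.
  return {
    [firstColumn]: secondContents.slice(0, max_chars),
    [secondColumn]: firstContents.slice(0, max_chars)
  };
}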
- table_name_with_schema: public.actor
  transformations:
    - columns:
        - first_name
        - last_name
      params:
        type: scripting_transformer
        language: "JAVASCRIPT"
        script:
          file: "/scripts/flip_values.js"
- table_name_with_schema: public.film
  transformations:
    - columns:
        - title
        - description
      params:
        type: scripting_transformer
        language: "JAVASCRIPT"
        script:
          file: "/scripts/flip_values.js"
| id | first_name | last_name |
|---|---|---|
| 0 | PENELOPE | GUINESS |
| 1 | NICK | WAHLBERG |
| 2 | ED | CHASE |

| id | title | description |
|---|---|---|
| 0 | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad… |
| 1 | ACE GOLDFINGER | A Astounding Epistle of a Database Admin… |
| 2 | ADAPTATION HOLES | A Astounding Reflection of a Lumberjack… |

| id | first_name | last_name |
|---|---|---|
| 0 | GUINESS | PENELOPE |
| 1 | WAHLBERG | NICK |
| 2 | CHASE | ED |

| id | title | description |
|---|---|---|
| 0 | A Epic Drama of a Feminist And a Mad… | ACADEMY DINOSAUR |
| 1 | A Astounding Epistle of a Database Admin… | ACE GOLDFINGER |
| 2 | A Astounding Reflection of a Lumberjack… | ADAPTATION HOLES |
This MASKING example shows the same script used in two places to flip the values in two columns.
Instead of hardcoding the column names, it gets them from columns.
This allows it to work on both first_name-last_name and title-description.
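For single-column transformers, the same idea works with column. In the sketch below, column is declared by hand only so the snippet runs outside Synthesized; inside a script it is already bound and the declaration would be dropped. The fakeRecord helper is likewise a hypothetical stand-in that exposes only get(column_name).

```javascript
// Stand-in for the bound variable; inside a Scripting Transformer this is provided.
const column = "title";

// Reusable single-column MASKING script: upper-cases whichever column it is attached to.
const upperCase = (ctx, originalRecord) => {
  const value = originalRecord.get(column);
  return value == null ? value : String(value).toUpperCase();
};

// Local try-out with a stand-in record.
const fakeRecord = (row) => ({ get: (name) => row[name] });
console.log(upperCase({}, fakeRecord({ title: "academy dinosaur" })));
// → ACADEMY DINOSAUR
```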
Working with the generation context
Each Scripting Transformer function takes in a ctx object.
This represents the context of the current row’s generation.
This object can be used for sophisticated scripts that need to consider details like generated output values.
The available functions are:
| function | type | description |
|---|---|---|
| getTable() | String | The current table that we are generating |
| getRowNum() | Long | The row number of the current row |
| getResultRecord() | Record | A representation of the row that is being generated |
| getParentRecords() | Map<ReferenceKey, List<Record>> | Chosen records from all already generated tables that are referred by a foreign key |
| getGlobalSeed() | String | The global random generator seed as a Base64 string. Only if a global seed has been set. |
| getOutputTableSize() | Long | The target number of rows in the table |
Here, a Record represents a dictionary of key-value pairs.
It provides a get(String) → Any method to access the values.
transformations:
  - columns:
      - rental_rate
    params:
      type: categorical_generator
  - columns:
      - replacement_cost
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => {
            const rental_rate = ctx.getResultRecord().get('rental_rate');
            return 5 + rental_rate * (5 + (Math.random() * 5));
          };
| id | rental_rate | replacement_cost |
|---|---|---|
| 0 | 0.99 | 13.15 |
| 1 | 4.99 | 49.95 |
| 2 | 2.99 | 20.17 |
In this GENERATION example, the rental rate is generated using a Categorical Generator.
This script uses the generated value to calculate a new replacement cost.
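The row-level functions can be combined in a similar way. The sketch below uses getRowNum() and getOutputTableSize() to assign each row to a tier according to its position in the table. The TIERS list and the fakeCtx helper are hypothetical. Whether row numbers start at 0 or 1 is not stated here, so the index is clamped to stay in range either way, and the Long values are assumed to convert cleanly with Number().

```javascript
// Hypothetical tier labels, ordered from first rows to last rows.
const TIERS = ["bronze", "silver", "gold", "platinum"];

// GENERATION-style script: picks a tier from the row's relative position in the table.
const tierForRow = (ctx) => {
  const position = Number(ctx.getRowNum()) / Number(ctx.getOutputTableSize());
  const index = Math.min(TIERS.length - 1, Math.floor(position * TIERS.length));
  return TIERS[index];
};

// Stand-in ctx exposing only the two documented functions used here.
const fakeCtx = (rowNum, size) => ({
  getRowNum: () => rowNum,
  getOutputTableSize: () => size
});

const tiers = [0, 1, 2, 3, 4, 5, 6, 7].map((n) => tierForRow(fakeCtx(n, 8)));
console.log(tiers);
// → bronze, bronze, silver, silver, gold, gold, platinum, platinum
```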
Available classes and functions
The JavaScript environment is based on the ECMAScript 2021 (ES12) standard, and includes the built-in objects, global functions, and control flow constructs defined in the specification.
| Category | Available Features |
|---|---|
| Primitives | |
| Wrapper Objects | |
| Utility Objects | |
| Collections | |
| JSON | |
| Global Functions | |
The environment also provides access to some key Java and Kotlin packages:
- java.util
- java.lang
- java.time
- javax.crypto
- kotlin.Pair
The following script is always executed to simplify the use of classes from the host application:
const JavaHashMap = Java.type("java.util.HashMap");
const JavaLong = Java.type("java.lang.Long");
const JavaDouble = Java.type("java.lang.Double");
const JavaInteger = Java.type("java.lang.Integer");
const JavaRandom = Java.type("java.util.Random");
const JavaLocalDateTime = Java.type("java.time.LocalDateTime");
const JavaMath = Java.type("java.lang.Math");
const Pair = Java.type("kotlin.Pair");
Scripts that create dates
Synthesized dates should be returned as Java LocalDateTime objects. This example shows how you can work with dates:
transformations:
  - columns:
      - rental_date
      - return_date
      - last_update
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => {
            const ChronoUnit = Java.type('java.time.temporal.ChronoUnit');
            const now = JavaLocalDateTime.now();
            const startPoint = JavaLocalDateTime.parse("2022-01-01T00:00:00");
            const totalSeconds = startPoint.until(now, ChronoUnit.SECONDS);
            const randomSeconds = Math.floor(Math.random() * totalSeconds);
            const rentalDate = startPoint.plusSeconds(randomSeconds);
            return {
              rental_date: rentalDate,
              return_date: rentalDate.plusWeeks(1),
              last_update: JavaLocalDateTime.now()
            };
          }
This GENERATION example produces three different dates.
The last_update date uses the current timestamp from JavaLocalDateTime.now().
The rental_date is a random date between 2022-01-01T00:00:00 and now. It uses parse to get a start point, then adds a random number of seconds with plusSeconds.
The return_date takes the rental date and adds one week with plusWeeks(1).
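The same random-offset arithmetic can be tried outside Synthesized with plain JavaScript Date objects. This sketch is not a replacement for the script above: it returns ISO-8601 strings in the shape that LocalDateTime.parse accepts, not LocalDateTime objects. The function names and the injectable random argument are hypothetical, added so the result can be reproduced.

```javascript
// Formats a Date as yyyy-MM-ddTHH:mm:ss in UTC, the ISO local date-time shape.
const toLocalIso = (date) => date.toISOString().slice(0, 19);

// Picks a random rental date between start and now, and a return date one week later.
function randomRental(start, now, random = Math.random) {
  const totalSeconds = Math.floor((now - start) / 1000);
  const randomSeconds = Math.floor(random() * totalSeconds);
  const rental = new Date(start.getTime() + randomSeconds * 1000);
  const returned = new Date(rental.getTime() + 7 * 24 * 60 * 60 * 1000); // plus one week
  return { rental_date: toLocalIso(rental), return_date: toLocalIso(returned) };
}

const start = new Date("2022-01-01T00:00:00Z");
const end = new Date("2022-03-01T00:00:00Z");
const midpoint = randomRental(start, end, () => 0.5);
console.log(midpoint);
// → rental_date 2022-01-30T12:00:00, return_date 2022-02-06T12:00:00
```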