Scripting Transformer
The Scripting Transformer is Synthesized’s most flexible transformer. It lets you define custom logic using JavaScript, and supports both GENERATION and MASKING modes.
The Scripting Transformer is best suited for complex logic that can’t be handled by native transformers. Before using it, consider whether your use case can be addressed by other advanced transformers such as the Arithmetic, Sieve, or Conditional Transformers.
To ensure security and stability, the Scripting Transformer enforces strict limitations. Only permitted external dependencies can be imported, network requests are blocked, and scripts may not access files.
Scripted generation
A Scripting Transformer defines a JavaScript lambda function. To generate a single value, the lambda function takes in the context of the generation (ctx) and returns a value.
transformations:
  - columns:
      - name
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => "Hello, World!";
id | name |
---|---|
0 | Hello, World! |
1 | Hello, World! |
2 | Hello, World! |
This GENERATION script sets the name column to Hello, World! for every row.
To generate multiple columns, the lambda function must return a dictionary of column-value pairs.
transformations:
  - columns:
      - rental_rate
      - replacement_cost
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => {
            const actualCost = 100 + (Math.random() * 1900);
            return {
              rental_rate: 0.99 + Math.floor(actualCost / 500),
              replacement_cost: 3.99 + (Math.ceil(actualCost * 1.5 / 100))
            }
          }
id | rental_rate | replacement_cost |
---|---|---|
0 | 2.99 | 19.99 |
1 | 2.99 | 20.99 |
2 | 3.99 | 27.99 |
This example generates both the rental_rate and the replacement_cost. To do this:
- It generates a random original cost between 1.00 and 20.00 (held in actualCost as a value between 100 and 2000).
- It uses that cost to derive the price to rent the item and the price to replace it.
- It returns a dictionary whose keys match the column names.
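The three steps above can be checked outside the transformer. The sketch below is a minimal standalone Node.js version of the same pricing logic. The makePrices helper and its injectable random parameter are hypothetical names introduced here for testing; they are not part of the Scripting Transformer API.

```javascript
// Hypothetical helper (not part of the Synthesized API): the same pricing
// logic as the script above, with the random source passed in so that the
// result can be checked deterministically.
const makePrices = (random = Math.random) => {
  // Step 1: random base cost in the range [100, 2000).
  const actualCost = 100 + random() * 1900;
  // Steps 2 and 3: derive both prices and return them keyed by column name.
  return {
    rental_rate: 0.99 + Math.floor(actualCost / 500),
    replacement_cost: 3.99 + Math.ceil((actualCost * 1.5) / 100),
  };
};

// Inside the transformer, the script itself would reduce to this lambda.
const script = (ctx) => makePrices();

// A fixed "random" value of 0.5 gives a base cost of exactly 1050.
const sample = makePrices(() => 0.5);
console.log(sample);
```

Passing the random source as a parameter is only a convenience for testing; the script used in the workflow can keep calling Math.random() directly.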
Scripted masking
Using a Scripting Transformer for masking is similar to generation. The key difference is that the function takes an extra argument: originalRecord. This allows you to use the original unmasked values during masking.
transformations:
  - columns:
      - title
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx, originalRecord) => originalRecord;
Input:
id | title |
---|---|
0 | ACADEMY DINOSAUR |
1 | ACE GOLDFINGER |
2 | ADAPTATION HOLES |
Output:
id | title |
---|---|
0 | ACADEMY DINOSAUR |
1 | ACE GOLDFINGER |
2 | ADAPTATION HOLES |
By returning the originalRecord as the result, this script leaves the input values unchanged. This is effectively how the Passthrough transformer works.
To access the data inside an originalRecord, you can use originalRecord.get(column_name). Importantly, the originalRecord contains data from all columns in the original table, not just the columns that are being masked.
transformations:
  - columns:
      - title
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx, originalRecord) => {
            const title = originalRecord.get("title");
            return title.split(' ').reverse().join(' ');
          }
Input:
id | title |
---|---|
0 | ACADEMY DINOSAUR |
1 | ACE GOLDFINGER |
2 | ADAPTATION HOLES |
Output:
id | title |
---|---|
0 | DINOSAUR ACADEMY |
1 | GOLDFINGER ACE |
2 | HOLES ADAPTATION |
This script takes the original values from the title column and reverses the order of the words.
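Because originalRecord exposes every column of the source row, a masking script can also derive its output from columns it is not masking. The sketch below assumes a hypothetical release_year column on the same table. It uses a plain JavaScript Map as a stand-in for the record so that it can be run with Node.js; only the get(column_name) accessor is taken from the description above.

```javascript
// Masking script: replaces the title with a placeholder built from another
// column of the same row. "release_year" is a hypothetical column that is
// read here but not masked.
const script = (ctx, originalRecord) => {
  const year = originalRecord.get("release_year");
  const wordCount = originalRecord.get("title").split(" ").length;
  return `FILM ${year} (${wordCount} words)`;
};

// Stand-in for the record the transformer would pass in. A Map is used only
// because it also offers get(key); the real object is supplied by the
// transformer.
const originalRecord = new Map([
  ["title", "ACADEMY DINOSAUR"],
  ["release_year", 2006],
]);

console.log(script({}, originalRecord)); // FILM 2006 (2 words)
```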
Managing state with init_script
The script in a Scripting Transformer is called for each record. To keep state between records or to avoid repeating expensive setup, you can also provide an init_script. The init_script runs before generation, and allows you to define variables and functions that can be reused by the script.
transformations:
  - columns:
      - payment_id
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      init_script:
        code: |
          var previousId = 0;
          function nextId(difference) {
            previousId += difference;
            return previousId;
          }
      script:
        code: |
          (ctx) => nextId(1 + Math.floor(Math.random() * 5));
id | payment_id |
---|---|
0 | 3 |
1 | 7 |
2 | 11 |
In this GENERATION example, payment IDs increase continuously, with a gap of 1 to 5 between consecutive values.
Both the previousId variable and the nextId function are available in the script.
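The same pattern works for any state that should survive between rows. The sketch below is an illustration under stated assumptions rather than a Synthesized feature: the init part builds a lookup list and keeps a counter, and the per-row script cycles through the list. The names statuses, callCount, and nextStatus are hypothetical. Both parts are shown in one file so the example can be run with Node.js; in a workflow, the first part would go under init_script and the lambda under script.

```javascript
// --- init_script part: runs before any row is generated ---
var statuses = ["PENDING", "PAID", "REFUNDED"]; // hypothetical lookup values
var callCount = 0; // state shared across rows

function nextStatus() {
  // Pick the next value in round-robin order, then advance the counter.
  const status = statuses[callCount % statuses.length];
  callCount += 1;
  return status;
}

// --- script part: called once per row ---
const script = (ctx) => nextStatus();

// Simulate four rows.
const rows = [0, 1, 2, 3].map(() => script({}));
console.log(rows); // [ 'PENDING', 'PAID', 'REFUNDED', 'PENDING' ]
```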
Storing custom scripts
Simple scripts can be written directly in the workflow config files. If your scripts grow and make the config file hard to read, you can move them into separate files.
The scripts can be stored on the local file system, AWS S3 or Google Storage:
transformations:
  - columns:
      - credit_card
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      init_script:
        file: "/path/init_script.js"
      script:
        file: "/path/script.js"
The path to your scripting folder needs to be added to the configuration. For example, when running with Docker Compose, you could add a scripts folder as a volume under the backend and the agents:
services:
  backend:
    ...
    volumes:
      - ./scripts:/scripts
    ...
  agent:
    ...
    volumes:
      - ./scripts:/scripts
    ...
To load scripts from S3, the property TDK_AWS_ENABLED should be set to true. More details can be found here.
To load scripts from Google Storage, the property TDK_GCP_ENABLED should be set to true. More details can be found here.
Reusing scripts for different transformers
The utility objects column and columns are available from within scripts. These properties are designed to allow script reuse across multiple transformers.
- columns - The list of columns the transformer is producing.
- column - The column name this transformer is producing. This variable is only bound for single-column transformations.
These properties make it possible to use the same script in multiple places, but with individual configuration.
(ctx, originalRecord) => {
  // max_characters is read from additionalProperties; if it is undefined,
  // slice() keeps the whole value.
  const max_chars = additionalProperties.max_characters;
  const firstColumn = columns.get(0);
  const secondColumn = columns.get(1);
  const firstContents = originalRecord.get(firstColumn);
  const secondContents = originalRecord.get(secondColumn);
  // Swap the two values, keeping at most max_chars characters of each.
  return {
    [firstColumn]: secondContents.slice(0, max_chars),
    [secondColumn]: firstContents.slice(0, max_chars)
  };
}
- table_name_with_schema: public.actor
  transformations:
    - columns:
        - first_name
        - last_name
      params:
        type: scripting_transformer
        language: "JAVASCRIPT"
        script:
          file: "/scripts/flip_values.js"
- table_name_with_schema: public.film
  transformations:
    - columns:
        - title
        - description
      params:
        type: scripting_transformer
        language: "JAVASCRIPT"
        script:
          file: "/scripts/flip_values.js"
Input:
id | first_name | last_name |
---|---|---|
0 | PENELOPE | GUINESS |
1 | NICK | WAHLBERG |
2 | ED | CHASE |
id | title | description |
---|---|---|
0 | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad… |
1 | ACE GOLDFINGER | A Astounding Epistle of a Database Admin… |
2 | ADAPTATION HOLES | A Astounding Reflection of a Lumberjack… |
Output:
id | first_name | last_name |
---|---|---|
0 | GUINESS | PENELOPE |
1 | WAHLBERG | NICK |
2 | CHASE | ED |
id | title | description |
---|---|---|
0 | A Epic Drama of a Feminist And a Mad… | ACADEMY DINOSAUR |
1 | A Astounding Epistle of a Database Admin… | ACE GOLDFINGER |
2 | A Astounding Reflection of a Lumberjack… | ADAPTATION HOLES |
This MASKING example shows the same script being used in two places to flip the values in two columns.
Instead of hardcoding the column names, it gets them from columns.
This allows it to work on both first_name and last_name, and on title and description.
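For single-column transformations, the column binding can play the same role. The minimal sketch below upper-cases whichever column it is attached to. To keep it runnable with Node.js, it is wrapped in a hypothetical makeScript factory that takes column as a parameter; inside the transformer, column would already be bound and the inner lambda alone would be the script.

```javascript
// Hypothetical wrapper so the example runs standalone. In the transformer,
// `column` is provided as a binding and only the inner lambda is needed.
const makeScript = (column) => (ctx, originalRecord) => {
  const value = originalRecord.get(column);
  // Leave missing values untouched instead of failing on null or undefined.
  return value == null ? value : String(value).toUpperCase();
};

// Stand-in record; a Map is used only because it offers get(key).
const record = new Map([
  ["first_name", "Penelope"],
  ["title", "Academy Dinosaur"],
]);

console.log(makeScript("first_name")({}, record)); // PENELOPE
console.log(makeScript("title")({}, record)); // ACADEMY DINOSAUR
```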
Working with the generation context
Each Scripting Transformer function takes in a ctx object.
This represents the context of the current row’s generation.
This object can be used for sophisticated scripts that need to consider details like generated output values.
The available functions are:
function | type | description |
---|---|---|
getTable() | String | The current table that is being generated |
getRowNum() | Long | The row number of the current row |
getResultRecord() | Record | A representation of the row that is being generated |
getParentRecords() | Map<ReferenceKey, List<Record>> | Chosen records from all already generated tables that are referred to by a foreign key |
getGlobalSeed() | String | The global random generator seed as a Base64 string. Only available if a global seed has been set. |
getOutputTableSize() | Long | The target number of rows in the table |
Record objects represent dictionaries of key-value pairs.
They provide a get(String) → Any method to access the values.
transformations:
  - columns:
      - rental_rate
    params:
      type: categorical_generator
  - columns:
      - replacement_cost
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => {
            const rental_rate = ctx.getResultRecord().get('rental_rate');
            return 5 + rental_rate * (5 + (Math.random() * 5));
          };
id | rental_rate | replacement_cost |
---|---|---|
0 | 0.99 | 13.15 |
1 | 4.99 | 49.95 |
2 | 2.99 | 20.17 |
In this GENERATION example, the rental rate is generated using a Categorical Generator.
The script then uses the generated value to calculate a new replacement cost.
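Other context functions can be combined in the same way. The sketch below uses getRowNum() and getOutputTableSize() from the table above to spread a score evenly over the generated rows. It assumes that row numbers start at 0 and that the Long values can be converted with Number() inside the script; neither is stated above, so treat both as assumptions. The ctx object here is a hand-written stub for running the example with Node.js.

```javascript
// Generation script: assigns each row a percentile-style score based on its
// position in the output table.
const script = (ctx) => {
  // Number(...) guards against the values arriving as non-number types.
  const rowNum = Number(ctx.getRowNum());
  const total = Number(ctx.getOutputTableSize());
  // Avoid dividing by zero for empty or single-row tables.
  if (total <= 1) return 0;
  return Math.round((rowNum / (total - 1)) * 100);
};

// Hand-written stub of the context (an assumption for local testing only).
const makeCtx = (rowNum, size) => ({
  getRowNum: () => rowNum,
  getOutputTableSize: () => size,
});

const scores = [0, 1, 2, 3, 4].map((n) => script(makeCtx(n, 5)));
console.log(scores); // [ 0, 25, 50, 75, 100 ]
```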
Available classes and functions
The JavaScript environment is based on the ECMAScript 2021 (ES12) standard, and includes the built-in objects, global functions, and control flow constructs defined in the specification.
Category | Available Features |
---|---|
Primitives | |
Wrapper Objects | |
Utility Objects | |
Collections | |
JSON | |
Global Functions | |
The environment also provides access to some key Java and Kotlin packages:
- java.util
- java.lang
- java.time
- javax.crypto
- kotlin.Pair
The following script is always executed, so that commonly used classes from the host application are available under short aliases:
const JavaHashMap = Java.type("java.util.HashMap");
const JavaLong = Java.type("java.lang.Long");
const JavaDouble = Java.type("java.lang.Double");
const JavaInteger = Java.type("java.lang.Integer");
const JavaRandom = Java.type("java.util.Random");
const JavaLocalDateTime = Java.type("java.time.LocalDateTime");
const JavaMath = Java.type("java.lang.Math");
const Pair = Java.type("kotlin.Pair");
Scripts that create dates
Synthesized dates should be returned as Java LocalDateTime objects. This example shows how you can work with dates:
transformations:
  - columns:
      - rental_date
      - return_date
      - last_update
    params:
      type: scripting_transformer
      language: "JAVASCRIPT"
      script:
        code: |
          (ctx) => {
            const ChronoUnit = Java.type('java.time.temporal.ChronoUnit');
            const now = JavaLocalDateTime.now();
            const startPoint = JavaLocalDateTime.parse("2022-01-01T00:00:00");
            const totalSeconds = startPoint.until(now, ChronoUnit.SECONDS);
            const randomSeconds = Math.floor(Math.random() * totalSeconds);
            const rentalDate = startPoint.plusSeconds(randomSeconds);
            return {
              rental_date: rentalDate,
              return_date: rentalDate.plusWeeks(1),
              last_update: JavaLocalDateTime.now()
            };
          }
This GENERATION example produces three different dates.
The last_update date uses the current timestamp, JavaLocalDateTime.now().
The rental_date is a random date since 2022-01-01T00:00:00. It uses parse to get a start point, then adds a random number of seconds with plusSeconds.
The return_date takes the rental date and adds one week with plusWeeks(1).