Scripting Transformer

The Scripting Transformer is Synthesized’s most flexible transformer. It lets you define custom logic using JavaScript, and supports both GENERATION and MASKING modes.

The Scripting Transformer is best suited for complex logic that can’t be handled by native transformers. Before using it, consider whether your use case can be addressed by other advanced transformers such as the Arithmetic, Sieve, or Conditional Transformers.

To ensure security and stability, the Scripting Transformer enforces strict limitations. Only permitted external dependencies can be imported, network requests are blocked, and scripts may not access files.

Scripted generation

A Scripting Transformer defines a JavaScript lambda function. To generate a single value, the lambda function takes in the context of the generation (ctx) and returns a value.

    transformations:
      - columns:
          - name
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            code: |
              (ctx) => "Hello, World!";
Output: public.language
id  name
0   Hello, World!
1   Hello, World!
2   Hello, World!

This GENERATION script sets the name column to Hello, World! for every row.

Use the | character after code: to define a multi-line string. This preserves line breaks and indentation in YAML.

To generate multiple columns, the lambda function must return a dictionary of column-value pairs.

    transformations:
      - columns:
          - rental_rate
          - replacement_cost
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            code: |
              (ctx) => {
                const actualCost = 100 + (Math.random() * 1900);

                return {
                  rental_rate: 0.99 + Math.floor(actualCost / 500),
                  replacement_cost: 3.99 + (Math.ceil(actualCost * 1.5 / 100))
                }
              }
Output: public.film
id  rental_rate  replacement_cost
0   2.99         19.99
1   2.99         20.99
2   3.99         27.99

This example generates both the rental_rate and the replacement_cost. To do this:

  1. It generates a random original cost between 100 and 2000.

  2. It uses that cost to come up with the price to rent the item and to replace it.

  3. The function returns a dictionary with keys that match the column names.
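Because the returned keys must match the configured column names, it can help to dry-run the lambda outside the platform before adding it to a workflow. The sketch below is one way to do that in plain JavaScript. The configuredColumns array, the missingColumns helper, and the empty object passed as ctx are assumptions made for local testing and are not part of the Scripting Transformer itself.

```javascript
// Columns listed under `columns:` in the workflow (copied by hand for this local check).
const configuredColumns = ["rental_rate", "replacement_cost"];

// The same lambda as in the example above.
const generateFilmCosts = (ctx) => {
  const actualCost = 100 + (Math.random() * 1900);

  return {
    rental_rate: 0.99 + Math.floor(actualCost / 500),
    replacement_cost: 3.99 + (Math.ceil(actualCost * 1.5 / 100))
  };
};

// Hypothetical helper: lists configured columns that the result does not provide.
function missingColumns(result, columns) {
  return columns.filter((name) => !(name in result));
}

const sample = generateFilmCosts({}); // an empty object stands in for ctx
console.log(sample, missingColumns(sample, configuredColumns));
```

An empty list from missingColumns suggests that every configured column receives a value.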

Scripted masking

Using a Scripting Transformer for masking is similar to generation. The key difference is that the function takes an extra argument: originalRecord. This allows you to use the original unmasked values during masking.

    transformations:
      - columns:
          - title
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            code: |
              (ctx, originalRecord) => originalRecord;
Input: public.film
id  title
0   ACADEMY DINOSAUR
1   ACE GOLDFINGER
2   ADAPTATION HOLES

Output: public.film
id  title
0   ACADEMY DINOSAUR
1   ACE GOLDFINGER
2   ADAPTATION HOLES

By returning the originalRecord as the result, this script has left the input values unchanged. This is effectively how the Passthrough transformer works.

To access the data inside an originalRecord, you can use originalRecord.get(column_name). Importantly, the originalRecord contains data from all columns in the original table, not just columns that are getting masked.

    transformations:
      - columns:
          - title
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            code: |
              (ctx, originalRecord) => {
                const title = originalRecord.get("title");
                return title.split(' ').reverse().join(' ');
              }
Input: public.film
id  title
0   ACADEMY DINOSAUR
1   ACE GOLDFINGER
2   ADAPTATION HOLES

Output: public.film
id  title
0   DINOSAUR ACADEMY
1   GOLDFINGER ACE
2   HOLES ADAPTATION

This script took the original values from the title column and reversed the order of the words.
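Since originalRecord exposes every column of the source row, a masking script can also read columns that are not listed under columns. The following is a minimal sketch of that idea. It assumes the table has an id column (as in the sample data) and uses a plain JavaScript Map as a local stand-in for originalRecord, because both offer get(column_name). The maskTitle name is hypothetical.

```javascript
// Hypothetical masking lambda: builds a placeholder title from the row's id.
const maskTitle = (ctx, originalRecord) => {
  const id = originalRecord.get("id");       // not a masked column, but still readable
  const title = originalRecord.get("title");
  // Keep the word count of the original title but hide the words themselves.
  const wordCount = title.split(" ").length;
  return `FILM ${id} (${wordCount} words)`;
};

// Local stand-in for originalRecord: a Map also exposes get(columnName).
const record = new Map([["id", 0], ["title", "ACADEMY DINOSAUR"]]);
console.log(maskTitle(null, record)); // FILM 0 (2 words)
```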

Managing state with init_script

The script in a Scripting Transformer is called for each record. To keep state between records or to avoid repeating setup work, you can also provide an init_script. The init_script runs before generation and lets you define variables and functions that the script can reuse.

    transformations:
      - columns:
          - payment_id
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          init_script:
            code: |
              var previousId = 0;

              function nextId(difference) {
                previousId += difference;
                return previousId;
              }
          script:
            code: |
              (ctx) => nextId(1 + Math.floor(Math.random() * 5));
Output: public.payment
id  payment_id
0   3
1   7
2   11

In this GENERATION example, payment IDs increase continuously, with a gap of 1 to 5 between consecutive values. Both the previousId variable and the nextId function are available in the script.
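The same mechanism can help with speed: values that are costly to build can be prepared in init_script and then only read in script. The sketch below is written as plain JavaScript so it can be run locally, and the comments mark which part would go under init_script and which under script. The STATUSES list, the pickStatus helper, and the stub ctx objects are hypothetical, and the final loop merely stands in for the platform calling the script for each record.

```javascript
// --- would go under init_script ---
// Build the lookup list up front instead of rebuilding it for every record.
var STATUSES = ["PENDING", "PAID", "REFUNDED"];

function pickStatus(rowNum) {
  // Cycle through the list; Number() is a precaution for non-JS numeric types.
  return STATUSES[Number(rowNum) % STATUSES.length];
}

// --- would go under script ---
const script = (ctx) => pickStatus(ctx.getRowNum());

// --- local dry run only: a stub ctx that exposes getRowNum() ---
const rows = [0, 1, 2, 3].map((n) => script({ getRowNum: () => n }));
console.log(rows); // [ 'PENDING', 'PAID', 'REFUNDED', 'PENDING' ]
```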

Storing custom scripts

Simple scripts can be written directly in the workflow config files. If your scripts grow to the point where they make the config file hard to read, you can move them into separate files.

The scripts can be stored on the local file system, AWS S3 or Google Storage:

    transformations:
      - columns:
          - credit_card
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          init_script:
            file: "/path/init_script.js"
          script:
            file: "/path/script.js"

The path to your scripts folder needs to be added to the configuration. For example, when running with Docker Compose, you could mount a scripts folder as a volume under both the backend and the agent:

services:
  backend:
    ...
    volumes:
      - ./scripts:/scripts
    ...
  agent:
    ...
    volumes:
      - ./scripts:/scripts
    ...

To load scripts from S3, the property TDK_AWS_ENABLED must be set to true. More details can be found here.

To load scripts from Google Storage, the property TDK_GCP_ENABLED must be set to true. More details can be found here.

Reusing scripts for different transformers

The utility objects column and columns are available from within scripts. These properties are designed to allow script reuse across multiple transformers.

  • columns - The list of columns the transformer is producing.

  • column - The name of the column this transformer is producing. This variable is only bound for single-column transformations.

These properties make it possible to use the same script in multiple places, but with individual configuration.

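// /scripts/flip_values.js
(ctx, originalRecord) => {
  // max_characters is read from additionalProperties; if it is not set,
  // slice() below leaves the values untruncated.
  const max_chars = additionalProperties.max_characters;

  // Resolve the column names from the transformer configuration.
  const firstColumn = columns.get(0);
  const secondColumn = columns.get(1);

  const firstContents = originalRecord.get(firstColumn);
  const secondContents = originalRecord.get(secondColumn);

  // Swap the two values, keeping at most max_chars characters of each.
  return {
    [firstColumn]: secondContents.slice(0, max_chars),
    [secondColumn]: firstContents.slice(0, max_chars)
  };
}

  - table_name_with_schema: public.actor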
    transformations:
      - columns:
          - first_name
          - last_name
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            file: "/scripts/flip_values.js"

  - table_name_with_schema: public.film
    transformations:
      - columns:
          - title
          - description
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            file: "/scripts/flip_values.js"
Input: public.actor
id  first_name  last_name
0   PENELOPE    GUINESS
1   NICK        WAHLBERG
2   ED          CHASE

Input: public.film
id  title             description
0   ACADEMY DINOSAUR  A Epic Drama of a Feminist And a Mad…
1   ACE GOLDFINGER    A Astounding Epistle of a Database Admin…
2   ADAPTATION HOLES  A Astounding Reflection of a Lumberjack…

Output: public.actor
id  first_name  last_name
0   GUINESS     PENELOPE
1   WAHLBERG    NICK
2   CHASE       ED

Output: public.film
id  title                                      description
0   A Epic Drama of a Feminist And a Mad…      ACADEMY DINOSAUR
1   A Astounding Epistle of a Database Admin…  ACE GOLDFINGER
2   A Astounding Reflection of a Lumberjack…   ADAPTATION HOLES

This MASKING example shows the same script being used in two places to flip the values in two columns.

Instead of hardcoding the column names, the script gets them from columns. This allows it to work on both first_name-last_name and title-description.
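For single-column transformations, column can play the same role. As a rough sketch, the script below masks whichever column it is attached to, keeping the first character and replacing the rest with asterisks. The maskKeepFirst name is hypothetical, and the local column constant and the Map record are stand-ins for what the transformer would provide. They are only there so the snippet can be run on its own.

```javascript
// Stand-in for the `column` binding that the transformer provides.
const column = "first_name";

// Reusable masking lambda: it never mentions a concrete column name.
const maskKeepFirst = (ctx, originalRecord) => {
  const value = String(originalRecord.get(column));
  // Keep the first character and replace every remaining one with "*".
  return value.slice(0, 1) + "*".repeat(Math.max(value.length - 1, 0));
};

// Local stand-in for originalRecord: a Map also exposes get(columnName).
const actor = new Map([["first_name", "PENELOPE"], ["last_name", "GUINESS"]]);
console.log(maskKeepFirst(null, actor)); // P*******
```

Attached to last_name instead, the same file would mask that column without any change to the script.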

Working with the generation context

Each Scripting Transformer function takes in a ctx object. This represents the context of the current row’s generation. This object can be used for sophisticated scripts that need to consider details like generated output values.

The available functions are:

function              type                             description
getTable()            String                           The table that is currently being generated
getRowNum()           Long                             The row number of the current row
getResultRecord()     Record                           A representation of the row that is being generated
getParentRecords()    Map<ReferenceKey, List<Record>>  Chosen records from all already generated tables that are referenced by a foreign key
getGlobalSeed()       String                           The global random generator seed as a Base64 string. Only available if a global seed has been set.
getOutputTableSize()  Long                             The target number of rows in the table

A Record represents a dictionary of key-value pairs. It provides a get(String) → Any method to access the values.

    transformations:
      - columns:
          - rental_rate
        params:
          type: categorical_generator

      - columns:
          - replacement_cost
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            code: |
              (ctx) => {
                const rental_rate = ctx.getResultRecord().get('rental_rate');
                return 5 + rental_rate * (5 + (Math.random() * 5));
              };
Output: public.film
id  rental_rate  replacement_cost
0   0.99         13.15
1   4.99         49.95
2   2.99         20.17

In this GENERATION example, the rental rate is generated using a Categorical Generator. This script uses the generated value to calculate a new replacement cost.
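Other ctx functions can be combined in a similar way. The sketch below uses getRowNum() and getOutputTableSize() to spread values evenly between two bounds as the table fills up. The bounds, the spreadOverTable name, and the stub ctx are illustrative assumptions. Whether row numbers start at 0 or 1 is not stated here, so the fraction is clamped to stay inside the range either way.

```javascript
// Illustrative bounds, not taken from any real schema.
const MIN_PRICE = 5;
const MAX_PRICE = 50;

const spreadOverTable = (ctx) => {
  // Number() is a precaution in case the Long values are not plain JS numbers.
  const row = Number(ctx.getRowNum());
  const size = Number(ctx.getOutputTableSize());
  // Fraction of the table generated so far, clamped to [0, 1].
  const fraction = size > 0 ? Math.min(Math.max(row / size, 0), 1) : 0;
  const price = MIN_PRICE + fraction * (MAX_PRICE - MIN_PRICE);
  // Round to two decimal places.
  return Math.round(price * 100) / 100;
};

// Local stub exposing only the two functions used above.
const stubCtx = (row, size) => ({
  getRowNum: () => row,
  getOutputTableSize: () => size
});

console.log(spreadOverTable(stubCtx(0, 4)), spreadOverTable(stubCtx(2, 4))); // 5 27.5
```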

Available classes and functions

The JavaScript environment is based on the ECMAScript 2021 (ES12) standard, and includes the built-in objects, global functions, and control flow constructs defined in the specification.

Category          Available Features
Primitives        undefined, null, Infinity, NaN
Wrapper Objects   String, Number, Boolean, BigInt, Symbol
Utility Objects   Object, Function, Array, Date, Math, RegExp, Error and subclasses
Collections       Map, Set, WeakMap, WeakSet
JSON              JSON.parse(), JSON.stringify()
Global Functions  eval(), isFinite(), isNaN(), parseFloat(), parseInt(), encodeURI(), decodeURI(), encodeURIComponent(), decodeURIComponent()
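As a small illustration of these language features, the sketch below uses nullish coalescing (??) and String.prototype.replaceAll() to mask digits while tolerating missing values. The phone column, the maskDigits name, and the Map stand-ins for originalRecord are assumptions made for the example.

```javascript
// Hypothetical masking lambda that replaces every digit with "#".
const maskDigits = (ctx, originalRecord) => {
  // Fall back to an empty string when the column is null or undefined.
  const value = String(originalRecord.get("phone") ?? "");
  // replaceAll requires the global flag when given a regular expression.
  return value.replaceAll(/\d/g, "#");
};

// Local stand-ins for originalRecord: a Map also exposes get(columnName).
const withPhone = new Map([["phone", "555-0134"]]);
const withoutPhone = new Map([["phone", null]]);

console.log(maskDigits(null, withPhone));    // ###-####
console.log(maskDigits(null, withoutPhone)); // prints an empty line
```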

The environment also provides access to some key Java and Kotlin packages:

  • java.util

  • java.lang

  • java.time

  • javax.crypto

  • kotlin.Pair

The following script is always executed, so that classes from the host application are easier to use:

const JavaHashMap = Java.type("java.util.HashMap");
const JavaLong = Java.type("java.lang.Long");
const JavaDouble = Java.type("java.lang.Double");
const JavaInteger = Java.type("java.lang.Integer");
const JavaRandom = Java.type("java.util.Random");
const JavaLocalDateTime = Java.type("java.time.LocalDateTime");
const JavaMath = Java.type("java.lang.Math");
const Pair = Java.type("kotlin.Pair");

Scripts that create dates

Synthesized dates should be returned as Java LocalDateTime objects. This example shows how you can work with dates:

    transformations:
      - columns:
          - rental_date
          - return_date
          - last_update
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            code: |
              (ctx) => {
                const ChronoUnit = Java.type('java.time.temporal.ChronoUnit');

                const now = JavaLocalDateTime.now();
                const startPoint = JavaLocalDateTime.parse("2022-01-01T00:00:00");

                const totalSeconds = startPoint.until(now, ChronoUnit.SECONDS);
                const randomSeconds = Math.floor(Math.random() * totalSeconds);

                const rentalDate = startPoint.plusSeconds(randomSeconds);

                return {
                  rental_date: rentalDate,
                  return_date: rentalDate.plusWeeks(1),
                  last_update: JavaLocalDateTime.now()
                };
              }

This GENERATION example produces three different dates.

The last_update date uses the current timestamp, JavaLocalDateTime.now().

The rental_date is a random date since 2022-01-01T00:00:00. It uses parse to get a start point, then adds a random number of seconds with plusSeconds.

The return_date takes the rental date and adds one week with plusWeeks(1).
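When the random part of a date is easier to compute with JavaScript's own Date, one option is to build an ISO-8601 string and hand it to JavaLocalDateTime.parse, which the example above already uses. The helper below is only a sketch of that idea: randomIsoBetween and its injectable rand parameter are hypothetical, it works in UTC, and the final parse call is left as a comment because JavaLocalDateTime only exists inside the scripting environment.

```javascript
// Returns a "yyyy-MM-ddTHH:mm:ss" string at or after startIso and before endIso.
function randomIsoBetween(startIso, endIso, rand = Math.random) {
  // Appending "Z" makes Date interpret both inputs as UTC.
  const startMs = new Date(startIso + "Z").getTime();
  const endMs = new Date(endIso + "Z").getTime();
  const totalSeconds = Math.floor((endMs - startMs) / 1000);
  const offsetSeconds = Math.floor(rand() * totalSeconds);
  // toISOString() yields e.g. "2022-01-02T00:00:00.000Z"; keep the first 19 characters.
  return new Date(startMs + offsetSeconds * 1000).toISOString().slice(0, 19);
}

// A fixed rand makes the local run reproducible.
const iso = randomIsoBetween("2022-01-01T00:00:00", "2022-01-03T00:00:00", () => 0.5);
console.log(iso); // 2022-01-02T00:00:00

// Inside a script, the string could then be converted with:
//   return JavaLocalDateTime.parse(iso);
```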