Scripting Transformer

The scripting transformer lets you implement your own logic for both GENERATION mode and MASKING mode.

Currently, only a JavaScript implementation is supported.

TDK API classes

The following classes make up the public TDK API and can be used in scripts.

package io.synthesized.tdk.api

/**
 * Provides all the information a transformer needs to produce a result.
 * @param table the table currently being generated.
 * @param rowNum the number of the current row.
 * @param resultRecord the record being produced by the generators for the table.
 * @param parentRecords records chosen from already generated tables that are referenced by a foreign key.
 * @param globalSeed global random generator seed.
 * @param outputTableSize desired output table size.
 */
data class GenerationContext(
    val table: String,
    val rowNum: Long,
    val resultRecord: Record,
    val parentRecords: Map<String, List<Record>>,
    val globalSeed: Int = 0,
    val outputTableSize: Long
)

package io.synthesized.tdk.api

/**
 *  Represents a "row" in original and produced tables
 */
class Record(val fields: Map<String, Any?>) {
    constructor(vararg nameValues: Pair<String, Any?>) : this(nameValues.toMap())

    /**
     * Return a new record containing only selected columns.
     */
    fun getSubRecord(fieldNames: List<String>): Record {
        check(fields.keys.containsAll(fieldNames)) {
            "Given fieldNames $fieldNames not present in record keys ${fields.keys}"
        }
        return Record(
            fieldNames.associateWith { fields[it] }
        )
    }

    fun getSubRecord(vararg fieldNames: String): Record = getSubRecord(fieldNames.toList())

    operator fun plus(newFields: Map<String, Any?>): Record = Record(fields.plus(newFields))

    operator fun plus(newFields: Pair<String, Any?>): Record = Record(fields.plus(newFields))

    operator fun plus(otherRecord: Record): Record = this.plus(otherRecord.fields)

    operator fun get(key: String): Any? {
        check(fields.containsKey(key)) { "Given key $key not present." }
        return fields[key]
    }

    override fun toString(): String = "Record(fields=$fields)"

    fun asMap(): Map<String, Any?> = fields
}

Getters are generated for each Kotlin property. For example, to get the globalSeed from the GenerationContext, use the following JavaScript syntax:

(ctx) => {
    const seed = ctx.getGlobalSeed();
    ...
}
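
The same getter convention should apply to the other GenerationContext properties shown above (getTable, getRowNum, getOutputTableSize, and so on). The sketch below is a minimal illustration, not an official TDK sample: describeRow and stubContext are hypothetical names, and the stub object only imitates the context so that the function can be tried outside TDK. In a TDK configuration, only the arrow function itself would be the script.

```javascript
// Bound to a name here only so it can be called outside TDK.
const describeRow = (ctx) => {
  // One getter per Kotlin property of GenerationContext.
  const table = ctx.getTable();
  const rowNum = ctx.getRowNum();
  const size = ctx.getOutputTableSize();
  return `${table}: row ${rowNum} of ${size}`;
};

// Hypothetical stand-in for GenerationContext, used only for a local run.
const stubContext = {
  getTable: () => "customers",
  getRowNum: () => 7,
  getOutputTableSize: () => 100,
};

console.log(describeRow(stubContext));
```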

GraalJS implementation

GraalJS is used as the JavaScript implementation. More information can be found in the GraalJS documentation.

The script for GENERATION mode must define a lambda function that returns a dictionary whose keys are column names and whose values are the values to write to the record.

If a transformer is applied to a single column, the value may be returned instead of a dictionary.
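
The two return shapes can be sketched as follows. This is an illustration under assumptions: the columns stub imitates the list that TDK binds for the script (it would be omitted inside TDK), and the column names and returned values are made up.

```javascript
// Hypothetical stand-in for the `columns` binding; omit it inside TDK.
const columns = { get: (i) => ["first_name", "last_name"][i] };

// Multi-column form: return an object keyed by column name.
const multiColumnScript = (ctx) => ({
  [columns.get(0)]: "Jane",
  [columns.get(1)]: "Doe",
});

// Single-column form: the bare value may be returned instead.
const singleColumnScript = (ctx) => "Jane";

console.log(multiColumnScript({}));
console.log(singleColumnScript({}));
```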

The script for MASKING mode must define a lambda function with two arguments, ctx and originalRecord.

The following example shows how to use a custom script for multiple columns in MASKING mode.

    transformations:
      - columns:
          - textdescription
          - htmldescription
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          script:
            code: |
              /**
               * @typedef { Object.<string, *> | * } Result
               *
               * @param {GenerationContext} ctx
               * @param {Record} originalRecord
               * @returns {Result}
               */
              (ctx, originalRecord) => {
                const dict = originalRecord.asMap();
                const textDescriptionColumn = columns.get(0);
                const htmlDescriptionColumn = columns.get(1);
                const descriptionWithoutSpaces = dict.get(textDescriptionColumn).trim();
                return { [textDescriptionColumn]: descriptionWithoutSpaces, [htmlDescriptionColumn]: descriptionWithoutSpaces };
              }
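
For a single-column transformation, a MASKING script can presumably read the original value through the column variable and return a bare value. The sketch below masks the local part of an email address. It is an illustration only: maskEmail, makeRecord, and the email column are hypothetical, and makeRecord merely imitates the get method of Record so that the function can be run outside TDK.

```javascript
// Hypothetical stand-ins for TDK bindings; omit them inside TDK.
const column = "email";
const makeRecord = (fields) => ({
  // Mirrors Record.get: fails when the key is missing.
  get: (key) => {
    if (!(key in fields)) throw new Error(`Given key ${key} not present.`);
    return fields[key];
  },
});

// The masking script: (ctx, originalRecord) => masked value.
const maskEmail = (ctx, originalRecord) => {
  const original = originalRecord.get(column);
  if (original == null) return null;      // keep missing values as they are
  const at = original.indexOf("@");
  if (at < 0) return "***";               // not an email: mask everything
  return "***" + original.slice(at);      // hide the local part, keep the domain
};

console.log(maskEmail({}, makeRecord({ email: "jane.doe@example.com" })));
```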

The script for GENERATION mode must define a lambda function with a single argument, ctx.

The following example shows how to use a custom script in GENERATION mode:

    transformations:
      - columns:
          - credit_card
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          additional_properties:
            first_credit_card_digit: 4
          init_script:
            code: |
              /**
               * @returns {String}
               */
              function generateRandomCreditCardNumber() {
                // Start from the configured first digit, as a string, so that += appends digits.
                let creditCardNumber = String(additionalProperties["first_credit_card_digit"]);

                for (let i = 1; i < 16; i++) {
                  const digit = Math.floor(Math.random() * 10);
                  creditCardNumber += digit.toString();
                }

                return creditCardNumber;
              }
          script:
            code: |
              /**
               * @typedef { Object.<string, *> | * } Result
               *
               * @param {GenerationContext} ctx
               * @returns {Result}
               */
              (ctx) => generateRandomCreditCardNumber();
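
The example above draws its digits from Math.random(), which a script cannot seed itself. If repeatable values are wanted, one possible variation is to seed a small pure-JavaScript generator from ctx.getGlobalSeed() and ctx.getRowNum(), and to finish the number with a Luhn check digit. This is a sketch under assumptions rather than TDK behavior: mulberry32, luhnCheckDigit, generateCard, the seed-mixing constant, and the stub objects are all illustrative and not part of TDK.

```javascript
// Hypothetical stand-in for the `additionalProperties` binding; omit it inside TDK.
const additionalProperties = { first_credit_card_digit: 4 };

// Small mulberry32-style PRNG: returns a function producing numbers in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Luhn check digit for a string of payload digits (check digit not included).
function luhnCheckDigit(payload) {
  let sum = 0;
  for (let i = 0; i < payload.length; i++) {
    let d = Number(payload[payload.length - 1 - i]);
    if (i % 2 === 0) {        // every second digit, starting at the rightmost
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return (10 - (sum % 10)) % 10;
}

// The generation script: the same seed and row number give the same card number.
const generateCard = (ctx) => {
  const seed =
    (Math.imul(Number(ctx.getGlobalSeed()), 0x9e3779b1) ^ Number(ctx.getRowNum())) >>> 0;
  const next = mulberry32(seed);
  let payload = String(additionalProperties["first_credit_card_digit"]);
  while (payload.length < 15) {
    payload += Math.floor(next() * 10);
  }
  return payload + luhnCheckDigit(payload);
};

// Local run with a hypothetical context stub.
const cardContext = { getGlobalSeed: () => 42, getRowNum: () => 1 };
console.log(generateCard(cardContext));
```

Mixing the row number into the seed keeps each row's value independent of the order in which rows are generated, at the cost of creating a new generator per row.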

Scripts can also be loaded from the local file system, AWS S3, or Google Cloud Storage:

    transformations:
      - columns:
          - credit_card
        params:
          type: scripting_transformer
          language: "JAVASCRIPT"
          init_script:
            file: "/path/init_script.js"
          script:
            file: "/path/script.js"

To load scripts from S3, the property TDK_AWS_ENABLED must be set to true. More details can be found here.

Setting the property TDK_GCP_ENABLED to true allows loading scripts from Google Cloud Storage. More details can be found here.

Limitations

The current solution does not support:

  • Additional code dependencies

  • Reading and writing files from scripts

  • Network requests

  • Using environment variables from the host application

Accessing host application classes

GraalJS provides an ECMAScript-compliant JavaScript language runtime. You can find additional details here.

The following Java packages and classes are available for use in the GraalJS environment:

  • io.synthesized.tdk.api

  • java.util

  • java.lang

  • java.time

  • javax.crypto

  • kotlin.Pair

The following script is always executed, so that commonly used host application classes are available under short names:

const JavaHashMap = Java.type("java.util.HashMap");
const JavaLong = Java.type("java.lang.Long");
const JavaDouble = Java.type("java.lang.Double");
const JavaInteger = Java.type("java.lang.Integer");
const JavaRandom = Java.type("java.util.Random");
const JavaLocalDateTime = Java.type("java.time.LocalDateTime");
const JavaMath = Java.type("java.lang.Math");
const Pair = Java.type("kotlin.Pair");

const Record = Java.type("io.synthesized.tdk.api.Record");

The following variables are available in scripts:

  • columns - The list of columns that the transformation is applied to.

  • column - A column name. This variable is only bound for single-column transformations.

  • additionalProperties - A dictionary with properties defined in the user configuration.
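
As an illustration of these bindings, the sketch below reads a setting from additionalProperties and cycles through its values by row number. The allowed_values property, the pickStatus name, and the stub objects are hypothetical. Inside TDK, additionalProperties is already bound and the stub would be omitted.

```javascript
// Hypothetical stand-in for the `additionalProperties` binding; omit it inside TDK.
const additionalProperties = { allowed_values: "NEW,ACTIVE,CLOSED" };

// Generation script for a single column: cycles through the configured values.
const pickStatus = (ctx) => {
  const raw = additionalProperties["allowed_values"];
  // Fall back to a single placeholder value when the property is not configured.
  const values = (raw == null ? "UNKNOWN" : String(raw)).split(",");
  return values[Number(ctx.getRowNum()) % values.length];
};

// Local run with a hypothetical context stub.
console.log(pickStatus({ getRowNum: () => 4 }));
```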