Changelog

Version 1.9

29 Jul 2022

Version 1.9 of the Synthesized TDK.

feature Microsoft SQL Server Support

Microsoft SQL Server can be used as an input and output database.

feature Advanced format_preserving_hashing configuration

A hash transformation is applied to each character of a given text that belongs to a configured group, so that the output preserves the format but contains different characters. This transformation is secure and non-reversible.

Parameters:

  • groups: List<FormatPreservingHashingGroup>: A list of selector and alphabet pairs. The selector chooses characters from the input string; the alphabet is the set of characters used to replace them.

  • filter: Filters are used to mask only a specified substring and keep other characters as is (e.g., mask only last 5 characters).

Available character selectors:

  • numeric

  • lower_letters

  • upper_letters

  • regex

Available alphabets:

  • numeric

  • lower_letters

  • upper_letters

  • custom

Available filters:

  • first - Mask only first n characters.

  • last - Mask only last n characters.

  • characters - Mask only the specified characters. Parameters: characters - set of characters to mask, ignore_case (default: false) - whether to ignore case when matching.

  • substring - Mask all occurrences of the specified substring. Parameters: substring - substring to mask, ignore_case (default: false) - whether to ignore case when matching.

  • regex - Mask only characters matched by the specified regex pattern. Parameters: pattern - regex pattern that finds the characters to mask, ignore_case (default: false) - whether to ignore case when matching.

For more information, see Transformations List.
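As a rough illustration of how the parameters above might fit together, the following sketch masks only the last five characters of a column and replaces digits with other digits. The nesting of groups, selector, alphabet and filter, the type and n keys, and the card_number column are all assumptions made for illustration; the exact schema is described in Transformations List.

```yaml
column_params:
  - columns: ["card_number"]   # hypothetical column name
    params:
      type: "format_preserving_hashing"
      groups:
        # Assumed layout: each group pairs a selector with an alphabet.
        - selector:
            type: "numeric"    # choose digits from the input string
          alphabet:
            type: "numeric"    # replace them with digits
      filter:
        # Assumed keys: mask only the last 5 characters.
        type: "last"
        n: 5
```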

feature constant_numeric, constant_date, constant_string, constant_boolean generators

Added new constant generators for numeric, date, string, boolean data types.

For more information, see Transformations List.
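A minimal sketch of how two of the new generators might be configured, assuming they accept the same value parameter as the constant generator introduced in version 1.7. The column names are hypothetical.

```yaml
column_params:
  - columns: ["status"]        # hypothetical string column
    params:
      type: "constant_string"
      value: "ACTIVE"          # assumed parameter name, by analogy with `constant`
  - columns: ["is_deleted"]    # hypothetical boolean column
    params:
      type: "constant_boolean"
      value: false
```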

feature Numeric type for categorical_generator

categorical_generator supports numeric columns.

For more information, see Transformations List.
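A numeric configuration might look like the sketch below, which mirrors the string-based categorical_generator example from version 1.7. The numeric type name and the priority column are assumptions; the probabilities are illustrative and sum to 1.0.

```yaml
column_params:
  - columns: ["priority"]   # hypothetical numeric column
    params:
      type: "categorical_generator"
      categories:
        type: numeric        # assumed type name for numeric categories
        values:
          - 1
          - 2
          - 3
      probabilities:
        - 0.5
        - 0.3
        - 0.2
```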

enhancement Increased insert_batch_size to optimize performance

Default value for insert_batch_size increased from 30 to 1000.

Version 1.8

8 Jul 2022

Version 1.8 of the Synthesized TDK.

enhancement New Documentation

Besides the improved appearance, many pages and YAML examples are now generated automatically from source code and tests, which reduces mistakes and keeps the documentation in step with the product version.

Enjoy!

enhancement Performance Improvement for MASKING

This release includes a significant performance improvement for MASKING mode with target_ratio: 1.0.

bugfix Handle Empty Tables

Fixed issues with processing empty tables in MASKING and GENERATION modes.

bugfix Handle Columns with Single Value

Fixed issues with processing columns with single values in MASKING mode.

bugfix Missing relations in GENERATION using KEEP

Fixed an issue with missing relations in data generation when GENERATION tables have foreign keys to KEEP tables.

Version 1.7

17 Jun 2022

Version 1.7 of the Synthesized TDK.

feature Custom Database Types Support

To support custom database types:

  • Use an output database in which the schema and its child objects already exist; see the DO_NOT_CREATE mode in Configuration File for more details.

  • Explicitly define a generator for each custom-type column in the configuration file.

For example, for the following custom ENUM type:

CREATE TYPE public.transaction_type_t AS ENUM ('SENT', 'RECEIVED');

Use a configuration like this:

column_params:
- columns:
  - "transaction_type"
  params:
    type: "categorical_generator"
    categories:
      type: string
      values:
      - "SENT"
      - "RECEIVED"
    probabilities:
    - 0.6
    - 0.4

For more information, see Custom database types.

feature Constant Generator

Generate a single numeric value for the entire column.

Parameters:

  • value: Number?: numeric value to generate

Compatible modes: GENERATION, MASKING

Compatible column data types: NUMERIC

Supports multiple columns: No

Example:

column_params:
- columns: [ "balance" ]
  params:
    type: "constant"
    value: 0.0

For more information, see Transformations List.

feature UUID Support for MASKING

UUID data type support for MASKING mode.

feature BIGINT and SMALLINT Support

BIGINT and SMALLINT data type support for GENERATION, MASKING, and KEEP modes.

feature Global Seed Parameter

Added the global_seed parameter, which sets the seed for random number generators.

A 32-bit integer between -2147483648 and 2147483647, used as a seed for random number generators. Running generation repeatedly with the same seed and workflow configuration produces the same result. By default, global_seed is 0.

Example:

default_config:
  mode: "MASKING"
  target_ratio: 1.0
global_seed: 42

For more information, see Configuration File.

Version 1.6

10 Jun 2022

Version 1.6 of the Synthesized TDK.

enhancement Performance Improvements

This release includes significant rework of transformation execution internals, bringing the following benefits to end users:

  • Heavy parallelization of transformations and database operations. To the extent the logic of a transformation permits, operations are performed in parallel. This results in better hardware utilization and reduced latencies.

  • Memory consumption optimization. The solution can now handle tables noticeably larger than the main memory available to the process.

Version 1.5

8 Jun 2022

Version 1.5 of the Synthesized TDK.

feature H2 Support

An H2 database can be used as an input and output database.

Note

Add the following arguments to H2 JDBC URLs: ;DATABASE_TO_LOWER=TRUE;CASE_INSENSITIVE_IDENTIFIERS=TRUE

feature SQLite Support

An SQLite database can be used as an input and output database.

Version 1.4

7 Jun 2022

Version 1.4 of the Synthesized TDK.

feature License Expiration API endpoint

The license expiration can be requested via API:

curl -X 'GET' \
  "http://${API_SERVICE_URL}:${API_SERVICE_PORT}/api/v1/license-expiration" \
  -H 'accept: */*'

Where:

  • API_SERVICE_URL is the host of the service. If running locally, this will likely be localhost.

  • API_SERVICE_PORT is the port exposed for the service. The default port is 8081.

If the service is up and running correctly, you should receive a 200 status with the body containing information like:

{"expiry_date":"2023-06-01"}

For more information, see License Expiration.

feature UUID Data Type Support

UUID data type support for GENERATION and KEEP modes.

feature Boolean Data Type Support

BOOLEAN data type support for GENERATION, MASKING, and KEEP modes.

enhancement Configuration File Upload

YAML configuration can be uploaded as a file via API.

For more information, see Create Workflow.

Version 1.3

20 May 2022

Version 1.3 of the Synthesized TDK.

feature Google Secret Manager Integration

Database credentials can be provided from Google Secret Manager:

"password": {
  "type": "gcp",
  "project": "${GCP_PROJECT_ID}",
  "secret": "${SECRET_ID}",
  "version": "${VERSION_ID}"
}

For more information, see Database Credentials.

feature Append Data

A new table_truncation_mode:

  • IGNORE: if this mode is selected, the status of the output database is ignored.

This mode keeps the existing data in the output database and appends the newly generated data to it.

For more information, see Configuration File.
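Following the table_truncation_mode usage shown for version 1.2, selecting the IGNORE mode described above might look like this sketch. The default_config values are illustrative only.

```yaml
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
table_truncation_mode: "IGNORE"   # keep existing rows and append the generated ones
```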

feature Locale For Address and Person Generators

  • locale: String = 'en-GB': Change this parameter to generate names and addresses from different geographical areas. Defaults to 'en-GB', which corresponds to British names and addresses.

Supported locales:

  • bg

  • ca

  • ca-CAT

  • da-DK

  • de

  • de-AT

  • de-CH

  • en

  • en-AU

  • en-au-ocker

  • en-BORK

  • en-CA

  • en-GB

  • en-IND

  • en-MS

  • en-NEP

  • en-NG

  • en-NZ

  • en-PAK

  • en-SG

  • en-UG

  • en-US

  • en-ZA

  • es

  • es-MX

  • fa

  • fi-FI

  • fr

  • he

  • hu

  • in-ID

  • it

  • ja

  • ko

  • nb-NO

  • nl

  • pl

  • pt

  • pt-BR

  • ru

  • sk

  • sv

  • sv-SE

  • tr

  • uk

  • vi

  • zh-CN

  • zh-TW

For more information, see Transformations List.
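A minimal sketch of setting the locale on an address generator, reusing the address_generator example from version 1.1. It assumes that locale sits alongside the other generator parameters under params.

```yaml
column_params:
  - columns: ["street_name", "zip_code"]
    params:
      type: "address_generator"
      column_templates: ["${street_name}", "${zip_code}"]
      locale: "en-US"   # assumed placement; any locale from the list above
```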

enhancement Null Generator by Default

For currently unsupported types, such as the XML data type, null_generator is used by default.
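The default is roughly equivalent to configuring the generator explicitly, as in the sketch below. The payload_xml column is hypothetical, and the sketch assumes null_generator needs no further parameters.

```yaml
column_params:
  - columns: ["payload_xml"]   # hypothetical column of an unsupported type
    params:
      type: "null_generator"   # assumed to take no additional parameters
```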

enhancement Stop Workflow API Endpoint

Added ways to stop the workflow using workflow_id and workflow_run_id. Improved error handling.

For more information, see Stop Workflow.

enhancement Ability to Process a Subset of Tables

Removed the comparison between the input and output schemas, which makes it possible to process a subset of the input tables.

bugfix Consistent Formatted Strings

formatted_string_generator in MASKING mode generates consistent values across the schema.

bugfix Positive Output Based on Positive Input

If the input numeric column contains only positive values, then the generated values will also be positive by default.

Version 1.2

29 Apr 2022

Version 1.2 of the Synthesized TDK.

feature Table Truncation Mode

There are two table truncation modes:

  • DO_NOT_TRUNCATE: (default) if this mode is selected, tables in the output database won’t be truncated. An empty output database is required.

  • TRUNCATE: if this mode is selected, tables in the output database will be truncated.

Usage example for table_truncation_mode:

default_config:
    mode: "GENERATION"
    target_ratio: 1.0
table_truncation_mode: "TRUNCATE"

feature Support CHAR Primary Keys

MASKING mode for tables with CHAR primary keys can be used without any additional configuration. In previous versions, the passthrough transformation was used as a workaround.

feature Support Composite Keys

Composite primary and foreign keys are handled automatically without any additional configuration. In previous versions, foreign_key_generator was used as a workaround.

enhancement Advanced Subsetting

Advanced subsetting implementation for KEEP and MASKING modes. In previous versions, some tables were empty after subsetting.

enhancement CLI Parameters

Changed CLI parameters from camelCase to kebab-case:

Usage: engine-lite [-hV] [-c=<config-file>] [-ip=<input-password>]
                   -iu=<input-url> [-iU=<input-username>]
                   [-op=<output-password>] -ou=<output-url>
                   [-oU=<output-username>]
TDK engine lite.
  -c, --config-file=<config-file>
                  Configuration file
  -h, --help      Show this help message and exit.
      -ip, --input-password=<input-password>
                  Input password, default to null
      -iu, --input-url=<input-url>
                  JDBC URL to the INPUT database
      -iU, --input-username=<input-username>
                  Input username, default to null
      -op, --output-password=<output-password>
                  Output password, default to null
      -ou, --output-url=<output-url>
                  JDBC URL to the OUTPUT database
      -oU, --output-username=<output-username>
                  Output username, default to null
  -V, --version   Print version information and exit.

bugfix Consistent Fake Generators

person_generator and address_generator in MASKING mode will generate consistent values across the schema.

For example, every mention of James Bond with a UK address is masked as Jon Snow with a Seven Kingdoms address throughout the schema.

Version 1.1

15 Apr 2022

Version 1.1 of the Synthesized TDK.

feature Schema creation mode

There are four schema creation modes:

  • CREATE_IF_NOT_EXISTS: (default) if this mode is selected, the DDL schema will be copied from the source database to the target one if it does not exist; otherwise, the existing schema will be used.

  • DO_NOT_CREATE: if this mode is selected, existing schema will be used.

  • CREATE: if this mode is selected, DDL schema will be copied from the source database to the target one. The target database should be empty.

  • DROP_AND_CREATE: if this mode is selected, DDL schema will be copied from the source database to the target one. Existing schema in the target database will be dropped. Please use this mode carefully.

Note: If the CREATE_IF_NOT_EXISTS or DO_NOT_CREATE mode is used, the target schema should match the source schema.
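As a sketch, selecting one of the modes above might look like the following. The schema_creation_mode key name is an assumption made by analogy with other top-level settings, and the default_config values are illustrative only; Configuration File has the exact name.

```yaml
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
# Assumed key name. DROP_AND_CREATE drops the existing schema in the target database.
schema_creation_mode: "DROP_AND_CREATE"
```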

feature Address generator

Generate address fields (e.g. street, zip code) and keep them consistent across columns.

Parameters:

  • column_templates: List<String>: For each column, the template used to generate address data.

  • consistent_with_column: String?: If given, the column that the generated addresses must be consistent with. For example, if consistent_with_column="user_id", all people with the same user_id will have the same street.

Available templates are:

  • ${zip_code}

  • ${country}

  • ${city}

  • ${street_name}

  • ${house_number}

  • ${flat_number}

Compatible modes: GENERATION, MASKING, KEEP

Compatible column data types: STRING

Supports multiple columns: Yes

Example for multiple columns:

column_params:
  - columns: ["street_name", "zip_code"]
    params:
      type: "address_generator"
      column_templates: ["${street_name}", "${zip_code}"]

Example for a single column:

column_params:
  - columns: ["address"]
    params:
      type: "address_generator"
      column_templates: ["${country}, ${city}, ${street_name}, ${house_number}, ${flat_number}, ${zip_code}"]
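The consistent_with_column parameter could be combined with the multi-column example as in this sketch. It assumes the parameter sits next to column_templates under params, and that a user_id column exists in the same table.

```yaml
column_params:
  - columns: ["street_name", "zip_code"]
    params:
      type: "address_generator"
      column_templates: ["${street_name}", "${zip_code}"]
      # Rows that share a user_id get the same generated address values.
      consistent_with_column: "user_id"
```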

feature Cycle resolution strategy

There are two cycle resolution strategies:

  • FAIL: (default) if this mode is selected, cycle_breaker_references should be provided in the configuration file. Otherwise, execution will fail if it detects a circular reference.

  • DELETE_NOT_REQUIRED: if this mode is selected, cyclic references will be resolved automatically by removing the last nullable reference leading to the cycle.

Example for FAIL mode:

default_config:
    mode: "GENERATION"
    target_ratio: 1.0
user_table_configs:
  - table_name_with_schema: "employees"
    cycle_breaker_references: ["employees"]
cycle_resolution_strategy: "FAIL"

Here, the employees table contains a cyclic reference.

Example for DELETE_NOT_REQUIRED mode:

default_config:
    mode: "GENERATION"
    target_ratio: 1.0
cycle_resolution_strategy: "DELETE_NOT_REQUIRED"

Version 1.0

1 Apr 2022

Version 1.0 of the Synthesized TDK.

First release

We have been working hard to combine our products into a single product with enhanced architecture that will enable us to add exciting new features and optimizations!