Changelog

Version 1.51.0

15 April 2024

feature Multiple columns support added for categorical_generator

The categorical_generator now includes support for multiple columns, enabling users to specify multi-column categories. This enhancement facilitates the preservation of relationships between values across different columns within a single category.

For detailed implementation, refer to the following example:

    transformations:
      - columns:
          - productcode
          - productname
        params:
          type: categorical_generator
          categories:
            value_source: MULTIPLE_PROVIDED
            null_values: ["nil"]
            category_values:
              - values:
                  productcode: "P1"
                  productname: "Product 1"
                weight: 0.5
              - values:
                  productcode: "P2"
                  productname: "Product 2"
                weight: 0.3
              - values:
                  productcode: "P3"
                  productname: "Product 3"
                weight: 0.2
              - values:
                  productcode: "nil"
                  productname: "nil"
                weight: 0.5
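The effect of multi-column categories can be sketched in a few lines of Python. This is an illustration only, not TDK's implementation: `CATEGORIES` and `sample_rows` are hypothetical names, and it assumes weights are treated as relative values. Each draw selects one whole category, so `productcode` and `productname` always stay paired.

```python
import random

# Hypothetical in-memory form of the `category_values` list from the example above.
CATEGORIES = [
    ({"productcode": "P1", "productname": "Product 1"}, 0.5),
    ({"productcode": "P2", "productname": "Product 2"}, 0.3),
    ({"productcode": "P3", "productname": "Product 3"}, 0.2),
    ({"productcode": "nil", "productname": "nil"}, 0.5),
]


def sample_rows(categories, n, seed=0):
    """Draw n rows; each draw picks one whole category, so columns stay paired."""
    rng = random.Random(seed)
    values = [value for value, _ in categories]
    weights = [weight for _, weight in categories]
    # random.choices treats weights as relative, so they need not sum to 1
    # (assumption: TDK normalizes weights in a similar way).
    return rng.choices(values, weights=weights, k=n)


rows = sample_rows(CATEGORIES, 5)
```

Because a category is sampled as a unit, a row such as `P1` with `Product 2` cannot occur.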

You can also specify a CSV file as the source of categories for multiple columns:

    transformations:
      - columns:
          - productcode
          - productname
        params:
          type: categorical_generator
          categories:
            value_source: MULTIPLE_CSV_FILE
            path: src/e2e/resources/data_multi.csv

The example above works with the following CSV file data:

    P1    Product1    43
    P2    Product2    23
    P3    Product3    24

Example of advanced configuration with multiple columns and multiple categories:

    transformations:
      - columns:
          - productcode
          - productname
        params:
          type: categorical_generator
          categories:
            value_source: MULTIPLE_CSV_FILE
            path: src/e2e/resources/data_with_header_multi.csv
            null_values: ["null", ""]
            format:
              columns:
                column_accessor_type: NAME
                categories:
                  productcode: "code"
                  productname: "name"
                weights: "rank"
              encoding: "UTF-8"
              delimiter: ","
              trim: true

The example above works with the following CSV file data:

    code  name      rank
    P1    Product1  43
    P2    Product2  23
    P3    Product3  24

Alternatively, you can leave the categorical_generator configuration empty, in which case TDK will automatically generate categories from the source table.

enhancement Breaking change: Replace nullable_weight with null_values for categorical_generator

The nullable_weight parameter in the categorical_generator has been replaced with null_values. This change allows users to specify any number of values that are treated as the special NULL value. By default, null_values is set to ["null"].
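A minimal Python sketch of the idea, assuming an exact, case-sensitive match against the entries of null_values (the matching rules TDK actually applies are not stated here, and `to_output_value` is a hypothetical helper):

```python
def to_output_value(value, null_values=("null",)):
    """Return None (written as SQL NULL) when the value is listed in null_values."""
    # Assumption: exact string comparison; the default mirrors null_values: ["null"].
    return None if value in null_values else value


row = {"productcode": "nil", "productname": "Product 1"}
converted = {col: to_output_value(v, null_values=("nil",)) for col, v in row.items()}
```

With `null_values=("nil",)`, only the `productcode` entry becomes NULL; the string "null" would be kept as ordinary text unless it is listed as well.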

Version 1.50.2

22 March 2024

bugfix Fix freeze on highly-concurrent table writes

Fixed the freeze when processing too many tables concurrently.

bugfix Resolved issue with generating a small number of rows

In certain scenarios, a foreign_key_generator with the default distribution could generate an empty output, which led to errors.

feature Support the "mode" parameter at the column level

Support has been added to set the transformation mode parameter (such as MASKING, GENERATION, KEEP) at the column level, in addition to the global and table levels. This feature allows us to handle a common scenario - setting the KEEP mode as a global mode and specifying which columns to mask. In this case, TDK automatically determines the necessary transformation for each column (such as format_preserving_hashing, noising, and so on). More information can be found in this example:

default_config:
  mode: KEEP

tables:
  - table_name_with_schema: "public.customer"
    transformations:
      - columns: [ "first_name", "last_name", "email" ]
        mode: MASKING
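The way the three levels interact can be pictured as a most-specific-wins lookup. The Python sketch below is an assumption about the precedence order (column, then table, then global); `effective_mode` is a hypothetical helper, not part of TDK.

```python
from typing import Optional


def effective_mode(global_mode: str,
                   table_mode: Optional[str] = None,
                   column_mode: Optional[str] = None) -> str:
    """Pick the most specific mode that is set: column, then table, then global."""
    if column_mode is not None:
        return column_mode
    if table_mode is not None:
        return table_mode
    return global_mode


# Global KEEP with a column-level override, as in the configuration above.
email_mode = effective_mode("KEEP", column_mode="MASKING")
other_mode = effective_mode("KEEP")
```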

feature Support the "mapping" parameter as an independent parameter

Ability to specify the mapping value without providing any transformation parameters (the params parameter):

default_config:
  mode: MASKING

tables:
  - table_name_with_schema: "public.customer"
    transformations:
      - columns: [ "username" ]
        mapping:
          read: "?"
          write: "current_user()"

bugfix Improved MSSQL Schema Copy for Specific Data Types

Enhanced the schema copy support for MSSQL. Now, it can handle the following data types: timestamp, numeric, decimal, money, smalldatetime, datetime, and sysname.

Version 1.49.0

15 March 2024

This is a stability improvement release. A number of bugs and regressions from the previous release are fixed.

Version 1.48.0

08 March 2024

enhancement Support for truncating cyclically referenced tables

Added support for truncating cyclically referenced tables in DB2, H2, MSSQL, and Oracle databases.

bugfix Improved Stability in Schema Copying for SQL Server

Resolved an issue where schema copying for SQL Server would attempt to access other databases, leading to permission access errors.

enhancement Improved Performance in MASKING mode

Significantly enhanced the performance of TDK in MASKING mode when the working directory is enabled. This enhancement particularly benefits large tables by accelerating row filtering.

bugfix Resolve Filtering Issue in MASKING Mode

Corrects a bug where certain rows were erroneously filtered out in MASKING mode. This issue primarily affected composite keys containing Null values. With this fix, Null values are now properly handled, ensuring the retention of all rows during the filtering process.

bugfix Fixed issue with working directory being used when disabled in the configuration

Version 1.47.0

21 February 2024

enhancement Enhancements to DROP_AND_CREATE Mode in MSSQL

Enhanced DROP_AND_CREATE mode to also drop schema-related objects such as stored functions and procedures, domains, and triggers.

enhancement In-Memory Filter Threshold Property

This advanced performance tuning property, the In-Memory Filter Threshold, is designed to optimize filtering operations.

This property determines whether the values of the parent table are loaded into memory during filtering operations, depending on the table size. If the size of the table does not exceed the specified value, its values are loaded into memory and used to filter child tables.

The property can be set globally and overridden in the table-level configuration.

enhancement Revise TDK logs and validation messages for clarity and conciseness

Version 1.46.0

16 February 2024

enhancement Revise TDK logs and validation messages for clarity and conciseness

Version 1.45.0

13 February 2024

Certain datatypes previously had a different handling method that resulted in exceptions for some columns with names containing spaces, such as [Column Name with Spaces].

bugfix Resolved Issue when Copying DDL for SQL Server

This bugfix addresses an issue encountered when copying tables that include columns with varbinary(max) data type in SQL Server.

Version 1.44.0

7 February 2024

feature Constant XML generator

Introducing the new Constant XML Generator.

bugfix Working with XML datatype

Fixed the handling of XML fields on SQL Server for runs that use the working directory.

Version 1.43.0

31 January 2024

bugfix Disable Postgres Trigger Copying During Schema Transfer

Implemented a fix to enhance the stability of schema transfer functionality by disabling trigger copying for Postgres.

Version 1.42.0

29 January 2024

bugfix Resolve Parameters Calculation Issue for date_generator for DB2

Addressed a bug in the date_generator where automatic parameter calculation led to an arithmetic overflow error for DB2.

Version 1.41.1

23 January 2024

bugfix Disable View Copying During Schema Transfer

The view copying functionality was not fully implemented, leading to inconveniences during schema copying. To maintain consistency with previous versions, view copying during schema transfer has been disabled.

Version 1.41.0

23 January 2024

feature Add Configuration Flags: Skip Failed Batch or Fail on First Failure

This release introduces two new configuration flags, providing users with greater control over batch handling:

NOT_FALLBACK_TO_ONE_BY_ONE_INSERTS

In the event of a failed batch insert, this flag prevents the system from resorting to one-by-one inserts for the entire batch. If any batch encounters an error, the entire TDK execution will fail.

SKIP_FAILED_BATCHES

Skip failed batches without resorting to one-by-one inserts.

Only one of the flags NOT_FALLBACK_TO_ONE_BY_ONE_INSERTS and SKIP_FAILED_BATCHES can be chosen at a time.
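The three behaviours can be compared with a small Python sketch. It is not TDK code: `write_batches`, `insert_batch`, and `insert_row` are hypothetical, and what happens to a row that still fails during one-by-one inserts is an assumption (here it is skipped).

```python
def write_batches(batches, insert_batch, insert_row, flag=None):
    """Insert batches, handling a failed batch according to `flag`.

    flag=None: retry the rows of the failed batch one by one (fallback);
    "NOT_FALLBACK_TO_ONE_BY_ONE_INSERTS": re-raise, failing the whole run;
    "SKIP_FAILED_BATCHES": drop the failed batch and continue.
    """
    written = 0
    for batch in batches:
        try:
            insert_batch(batch)
            written += len(batch)
        except Exception:
            if flag == "NOT_FALLBACK_TO_ONE_BY_ONE_INSERTS":
                raise
            if flag == "SKIP_FAILED_BATCHES":
                continue
            for row in batch:
                try:
                    insert_row(row)
                    written += 1
                except Exception:
                    pass  # assumption: a row that still fails is skipped
    return written


# Toy in-memory "table" in which a None row is rejected.
store = []


def insert_batch(batch):
    if any(row is None for row in batch):
        raise ValueError("bad batch")
    store.extend(batch)


def insert_row(row):
    if row is None:
        raise ValueError("bad row")
    store.append(row)
```

With the batches `[[1, 2], [3, None], [4]]`, the fallback behaviour keeps row `3`, while SKIP_FAILED_BATCHES drops the whole second batch.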

bugfix Address TDK Run Interference

This release includes a fix to mitigate the rare bug that could cause interference from previous runs affecting the current execution.

Version 1.40.0

17 January 2024

bugfix Handling autogenerated types for DB2.

Version 1.39.0

5 January 2024

feature MSSQL CSV file insertion strategy.

This strategy optimally performs bulk loading for SQL Server, significantly enhancing the efficiency of writing large files into tables.

bugfix Handling unique timestamp for table with empty columns.

bugfix Categorical generator instead of constant for single-value Boolean and Numeric types.

Version 1.38.0

15 Dec 2023

enhancement Automatic search indexes deactivation during processing.

This enhancement boosts performance by reducing processing time in schemas with numerous search indexes.

bugfix Subsetting support for auto-generated MSSQL Timestamp.

Version 1.37.0

6 Dec 2023

enhancement MSSQL Categorical generator enhancement.

Enhanced support for category names in various cases and encodings.

enhancement Improved Generation from Empty Table with Alias Types.

Enhanced support for generating data from an empty table with alias types.

bugfix Fixed ignoring for auto-generated columns.

Version 1.36.1

28 Nov 2023

enhancement Partial database processing enhancement.

Implemented support for an ignore filter in table processing and introduced a caching strategy for improved performance in table size calculation.

enhancement Enhance handling of MSSQL timestamp data type.

The timestamp (Transact-SQL) data type is just an incrementing number and does not preserve a date or a time.

Version 1.36.0

27 Nov 2023

feature Financial Data Generator.

Introducing the new Finance Generator.

Choose from a variety of templates, including:

  • ${credit_card}

  • ${bic}

  • ${iban}

  • ${nasdaq_ticker}

  • ${nyse_ticker}

  • ${stock_market}

  • ${us_routing_number}

Configure specific card types with templates such as:

  • ${credit_card.visa}

  • ${credit_card.mastercard}

  • ${credit_card.discover}

  • ${credit_card.american_express}

  • ${credit_card.diners_club}

  • ${credit_card.jcb}

  • ${credit_card.switch}

  • ${credit_card.solo}

  • ${credit_card.dankort}

  • ${credit_card.forbrugsforeningen}

  • ${credit_card.laser}

Example:

    transformations:
      - columns: [ "credit_card" ]
        params:
          type: finance_generator
          column_templates: [ "${credit_card.visa}" ]

feature Realistic Text Column Generation.

TDK now intelligently applies heuristics based on column and table names, allowing for more suitable default transformation selections.

Therefore, Person Generator, Address Generator and Finance Generator can be chosen by default for GENERATION mode.

This behavior is enabled by default but can be disabled using the use_text_column_heuristics property.

feature Foreign Key generation with Poisson distribution.

The foreign key generator now enables the generation of foreign key relationships with links that follow a Poisson distribution, offering a more accurate representation of real-world data structures.

This feature is now set as the default behavior, replacing the previous default of ROUND_ROBIN. However, ROUND_ROBIN remains available for use if preferred.
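To see why the two distributions produce different data shapes, the Python sketch below compares the number of child rows per parent under each approach. It only illustrates the statistics; how TDK samples internally is not described here, and all function names are hypothetical.

```python
import math
import random


def round_robin_counts(n_parents, n_children):
    """Spread children evenly: per-parent counts differ by at most one."""
    base, extra = divmod(n_children, n_parents)
    return [base + (1 if i < extra else 0) for i in range(n_parents)]


def poisson_sample(lam, rng):
    """Knuth's algorithm for one Poisson(lam) draw; suitable for small lam."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def poisson_counts(n_parents, mean_children, seed=0):
    """Draw an independent Poisson child count for every parent row."""
    rng = random.Random(seed)
    return [poisson_sample(mean_children, rng) for _ in range(n_parents)]


even = round_robin_counts(4, 10)
varied = poisson_counts(1000, 2.5)
```

Round robin gives every parent nearly the same number of children, whereas Poisson counts vary around the mean, including parents with no children at all.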

enhancement New Templates for Person and Address Generators.

New templates are now supported for Person Generator:

  • ${full_name} - First name and Last name

  • ${company} - Company name

  • ${phone_national} - The phone number in domestic format

  • ${phone_international} - The phone number in international format

  • ${ssn} - U.S. Social Security Number (SSN)

New templates for Address Generator:

  • ${full_address}

  • ${street_address}

  • ${region}

  • ${latitude}

  • ${longitude}

  • ${coordinates}

  • ${time_zone}

enhancement Default Value Truncation for Fake Generators.

length_exceeded_mode: TRUNCATE is now the default for person_generator, address_generator and finance_generator.

This ensures generated values are truncated to the column’s maximum length instead of causing overflow errors.

enhancement Global Locale Setting.

The Default Locale property can be set globally for person_generator, address_generator and finance_generator.

This setting can be overridden at the table-level configuration.

enhancement Extended Locales Support.

The expanded list of supported locales for person_generator, address_generator and finance_generator:

ar

bg

ca

ca-CAT

cs

da-DK

de

de-AT

de-CH

en

en-AU

en-CA

en-GB

en-IND

en-MS

en-NEP

en-NG

en-NZ

en-PAK

en-SG

en-UG

en-US

en-ZA

en-PH

es

es-MX

fa

fi-FI

fr

he

hu

in-ID

it

ja

ko

nb-NO

nl

pl

pt

pt-BR

ru

sk

sv

sv-SE

tr

uk

vi

zh-CN

zh-TW

enhancement Enhance handling of MSSQL Server user-defined data type aliases.

Enhanced support for user-defined data type aliases, allowing these types to be processed by the TDK.

Version 1.35.2

10 Nov 2023

enhancement Support PostgreSQL JSONB type for json_pointer_transformer

The JSON Pointer transformer can now process the PostgreSQL JSONB type.

enhancement Java 17 to be the minimum required version

TDK requires at least JVM 17 to be run.

Update your JDK to be able to use the latest versions of TDK.

No changes required for Docker and Kubernetes users.

Version 1.35.1

6 Nov 2023

enhancement Support SQL Server text and ntext data types

Although text and ntext are deprecated, they may be present in production schemas.

Now these types can be processed by the TDK.

Version 1.35.0

6 Nov 2023

feature Scripting transformer

With the introduction of the new Scripting Transformer, you can write custom scripts in the JavaScript programming language.

This makes it possible to extend the TDK and add any specific logic.

The Scripting Transformer can be applied to any column for the GENERATION and MASKING modes.

feature Preserve Foreign Key distribution

The foreign key generator now allows you to preserve the original relationship of links, which can significantly improve data quality.

This feature can be enabled by specifying the distribution ORIGINAL for the foreign key columns:

tables:
  - table_name_with_schema: "public.orders"
    transformations:
      - columns: [ "customer_id" ]
        params:
          type: "foreign_key_generator"
          distribution: ORIGINAL

Version 1.34.0

10 Oct 2023

feature Category generator enhancements.

New and more convenient format for configuring categorical_generator, including the ability to specify a CSV file as the source of categories.

bugfix SQLite and PostgreSQL support.

Fix issues in SQLite and PostgreSQL support.

enhancement Optimize Oracle inserts.

Optimize Oracle batch inserts.

Version 1.33.0

20 Sep 2023

feature Snowflake support.

Snowflake support using a JDBC-compatible driver.

enhancement Enhancing performance in Oracle foreign keys processing

Increased efficiency in foreign keys metadata collecting.

Version 1.32.0

22 Aug 2023

feature TDK is now available on GCP Marketplace

The TDK is now available on the GCP Marketplace. See the getting started page: https://console.cloud.google.com/marketplace/product/synthesized-marketplace-public/synthesized-tdk

Version 1.31.0

11 Aug 2023

enhancement Eliminate the usage of Java-dependent hashes in computations

All transformations now use Java-agnostic hashes; there is no more Objects.hash() or similar unportable calls.

bugfix Multiple unique and referential integrity constraint fixes

Fixed a number of corner cases where unique and referential integrity constraints were processed incorrectly.

Version 1.30.0

28 Jul 2023

feature Hashicorp Vault as a secret manager

Added support for Hashicorp Vault secret manager.

enhancement Usability improvements for the date generator

Before this release, the std option of the date generator in GENERATION mode had to be set in milliseconds.

Now the following formats are supported:

  • ISO-8601 Duration format, e.g., P1DT2H3M4.058S.

  • The concise format described here, e.g., 10s, 1h 30m or -(1h 30m)

  • Milliseconds without the specific unit, e.g., 12534.
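As a rough illustration of how the three notations relate, the Python sketch below converts each of them to whole milliseconds. It is not TDK's parser: `parse_duration_ms` is hypothetical, and the set of units accepted by the concise format (d, h, m, s, ms) is an assumption.

```python
import re

_UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}
_ISO = re.compile(
    r"^(?P<sign>-)?P(?:(?P<d>\d+)D)?"
    r"(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?)?$"
)
_CONCISE = re.compile(r"(?:\s*\d+\s*(?:ms|d|h|m|s))+\s*")


def parse_duration_ms(text: str) -> int:
    """Convert one of the three notations to whole milliseconds."""
    text = text.strip()
    if re.fullmatch(r"-?\d+", text):  # bare number: already milliseconds
        return int(text)
    iso = _ISO.match(text)
    if iso:  # ISO-8601 duration restricted to days, hours, minutes, seconds
        total = (int(iso["d"] or 0) * _UNIT_MS["d"]
                 + int(iso["h"] or 0) * _UNIT_MS["h"]
                 + int(iso["m"] or 0) * _UNIT_MS["m"]
                 + round(float(iso["s"] or 0) * 1000))
        return -total if iso["sign"] else total
    negative = text.startswith("-(") and text.endswith(")")
    body = text[2:-1] if negative else text
    if not _CONCISE.fullmatch(body):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = re.findall(r"(\d+)\s*(ms|d|h|m|s)", body)
    total = sum(int(number) * _UNIT_MS[unit] for number, unit in parts)
    return -total if negative else total


one_and_a_half_hours = parse_duration_ms("1h 30m")
```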

bugfix Fix UUID transformer behaviour in masking mode

Starting from release v1.17, the UUID transformer worked as a pure generator and never took input values into account, so it was of no use when processing FK-connected columns. This release restores the correct behaviour for masking mode.

Version 1.29.0

04 Jul 2023

feature DEFER_FOREIGN_KEY Cycle Resolution Strategy

New DEFER_FOREIGN_KEY Cycle Resolution Strategy: when selected - all FK references will be preserved, but the ones that lead to cycles will be disabled during masking and then re-enabled after data is inserted.

This strategy is suitable for databases with cyclic schema and works only with MASKING mode without subsetting.

enhancement Using the Kubernetes TTL mechanism to delete completed pods

Extended the TDK CLI Helm chart with the property ttlSecondsAfterFinished. This allows Kubernetes pods to be removed a specified number of seconds after they have completed.

enhancement Bring back views for MySQL DDL copying

Copying DDL for views was previously disabled for MySQL and is now enabled again.

enhancement Support unsigned data types for unique_hashing

enhancement Show all found errors happening during the effective configuration creation

The process of creating an effective configuration is no longer aborted after the first detected error.

bugfix Fix DDL copying for MySQL columns with unsigned decimal types.

Previously, these columns were copied with a different data type.

bugfix Fix DDL copying for MySQL datetime/timestamp columns

Version 1.28.0

15 Jun 2023

feature New Tutorials

New tutorials for Masking, Generation, Subsetting and Data Filtering.

enhancement Support negative values for unique_hashing

bugfix Handling zero date

For more information, refer to the java.sql.SQLException: Zero date value prohibited page.

bugfix Ignore views when exporting DDL

Version 1.27.2

7 Jun 2023

bugfix Fix handling of several data types in MySQL

Version 1.27.1

2 Jun 2023

bugfix Fix an issue with MySQL metadata extraction for foreign keys referring to an indexed column which is neither unique-constrained nor a primary key.

bugfix Fix an issue when certain MySQL data types could not be parsed.

Version 1.27

31 May 2023

feature DB2 support for Linux, Unix, and Windows

bugfix Fix functionality for AWS Aurora MySQL

bugfix Fix https://github.com/synthesized-io/pagila-tdk-demo failure during startup on ARM-based machines

bugfix Fix the "Cannot truncate" error in MySQL when using the CREATE_IF_NOT_EXISTS schema creation mode.

Version 1.26

10 May 2023

enhancement Hourly pricing model is now available for the TDK product on AWS marketplace

enhancement Getting started page now uses Pagila Docker-compose demo

enhancement Improved performance with RocksDB

bugfix TDK CLI Docker image now works on arm64

Version 1.25

21 Apr 2023

Version 1.25 of the Synthesized TDK.

feature Subsetting mode is now available in the free version of TDK

Starting from this version, the Subsetting mode is available in the free version of TDK. For more details about the Subsetting mode, please see here.

feature Postgres partitioned tables support

TDK can operate with Postgres partitioned tables.

Version 1.24

04 Apr 2023

Version 1.24 of the Synthesized TDK.

feature TDK is now available on AWS Marketplace

The TDK is now available on the AWS Marketplace with Docker, ECS Fargate, and Helm charts delivery options. See more details here: AWS Marketplace.

feature TDK Docker container is now available

TDK can now be launched not only via command line interface but as a Docker image, see details here: Docker.

Version 1.23

7 Mar 2023

Version 1.23 of the Synthesized TDK.

feature AWS S3 support for configuration loading

To be able to read the configuration file from AWS S3 you need to enable the TDK_AWS_ENABLED property, see Application properties.

feature AWS Secrets Manager support

Database credentials can be requested from AWS Secrets Manager:

{
  "type": "aws",
  "secret": "${SECRET_ID}",
  "version": "${VERSION_ID}"
}

Where:

  • type - The password provider type.

  • secret - The ARN or name of the secret to retrieve.

  • version (optional) - The unique identifier of the version of the secret to retrieve. If you don’t specify the version, then the AWSCURRENT version is used.

Note

The SECRETS_AWS_SECRET_MANAGER_ENABLED property must be enabled; see Application properties.

Version 1.22

14 Feb 2023

Version 1.22 of the Synthesized TDK.

feature Ability to provide Foreign Keys in the yaml configuration file

By default, TDK preserves referential integrity based on the foreign keys in the source database schema. This release adds the ability to provide additional foreign keys in the yaml configuration file.

For example, if order.user_id is a foreign key referring to user.id, and it’s not defined in the database schema, then the following configuration can be provided:

default_config:
  mode: "MASKING"
  target_ratio: 0.5
metadata:
  tables:
    - table_name_with_schema: "public.order"
      foreign_keys:
        fk_user_order:
          referred_schema: "public"
          referred_table: "user"
          columns:
            - column: "user_id"
              referred_column: "id"

More details about additional foreign keys can be found in the Configuration reference section.

enhancement Performance Improvements

Significant performance improvement for the subsetting mode.

To enable the performance improvement, set the following application properties:

TDK_WORKINGDIRECTORY_ENABLED=true
TDK_WORKINGDIRECTORY_PATH=/home/tdk/working-directory

enhancement Constant generators in the relaxed mode

In the RELAXED mode, constant generators constant_numeric, constant_string, constant_date, constant_boolean will be chosen by default where the source column contains the same value in all rows.

enhancement FAQ page

FAQ page in the documentation.

Version 1.21

31 Jan 2023

Version 1.21 of the Synthesized TDK.

feature Safety Mode

By default, STRICT mode is enabled. If no suitable transform is found, then the passthrough, null_generator and categorical_generator will not be chosen by default.

This change breaks compatibility with the previous version.

To keep the behavior of previous versions, you can use the RELAXED mode.

feature Data Filtering

Data Filtering feature.

feature Multiple Database

Version 1.20

13 Jan 2023

Version 1.20 of the Synthesized TDK.

feature YAML anchors and aliases support

YAML anchors and aliases support to reduce repeated sections in the configuration.

enhancement Download page

The latest free TDK CLI version is currently available in the documentation for download.

enhancement GitHub Actions Integration

GitHub Actions Integration page for the free TDK CLI version in the documentation.

enhancement Mutually exclusive options have been turned into separate commands

--dry-run, --default-config, --json-schema, --license-expiration options have been turned into separate commands:

.....
Commands:
  help                    Display help information about the specified command.
  default-config, dc      Print built-in default configuration (can be
                            overridden in the user config)
  dry-run, dr             Print the effective configuration instead of running
                            the transformation to the console (by default) or
                            to the file (specified as `-ec` parameter value)
  json-schema, js         Print json schema for configuration YAML file
  license-expiration, le  Print the expiration date of the license key

enhancement Ability to save effective config into custom file

dry-run command prints the effective configuration to the console (by default) or to the file (specified as -ec or --effective-config-file parameter value).

enhancement Added help command to display detailed information for each command

For example, tdk.jar help dry-run:

Usage: engine-lite dry-run [-ec=<effective-config-file>]
Print the effective configuration instead of running the transformation to the
console (by default) or to the file (specified as `-ec` parameter value)
      -ec, --effective-config-file=<effective-config-file>
         Effective configuration file

bugfix Error getting connection from data source

Fixed the error Error getting connection from data source when running TDK transformation on a huge database.

Version 1.19

19 Dec 2022

Version 1.19 of the Synthesized TDK.

bugfix Oracle identifier is too long error when working with Oracle database

Fixed the behaviour where ORA-00972: identifier is too long error appeared when working with Oracle DB instances.

feature Better default rules for Boolean fields

Default rules now recognize boolean-typed fields (e.g., TINYINT(1) in MySQL or BOOL in PostgreSQL) and use the Categorical generator to generate random boolean values in GENERATION and MASKING modes.

enhancement Improve performance of ConfigConverter

The performance of creating effective config is now improved for certain cases.

Version 1.18

9 Dec 2022

Version 1.18 of the Synthesized TDK.

feature Mapping expressions on read and on write

Allows transforming columns as they are being read from the input database and written to the output database.

For example, a column might need a cast to a different type on read and then a cast back to the original type on write. The following configuration might be provided to address that:

tables:
  - table_name_with_schema: "test_schema.test_table"
    mode: "MASKING"
    transformations:
      - columns: ["my_binary_column"]
        mapping:
          read: "cast(? as char)"
          write: "cast(? as binary)"

The my_binary_column will be cast to char type on read and the result of transformation will be cast back to binary type on write.

feature Support for unsigned numeric types

TDK is now aware of unsigned numeric types supported by some DBMS (MySQL, Oracle).

enhancement Collision-free format-preserving transformation

format_preserving_hashing is now collision-free even for short strings.

enhancement --dry-run produces reusable configuration

The --dry-run CLI option now creates a YAML config file that contains transformations for all tables and can be used for further runs.

Version 1.17

28 Nov 2022

Version 1.17 of the Synthesized TDK.

feature Filter Schemas

Introduced the ability to define the list of schemas to process:

schemas: array of String. If not set or null, all schemas available to the source database user will be processed.

Example:

default_config:
  mode: GENERATION
  target_ratio: 2.0
schemas: ["accounts", "payments"]
schema_creation_mode: CREATE_IF_NOT_EXISTS
table_truncation_mode: TRUNCATE

feature JSON transformer

json_pointer_transformer transforms the JSON value nodes indicated by JSON pointers; the rest of the values are kept as is:

    transformations:
      - columns: ["productspec"]
        params:
          type: "json_pointer_transformer"
          specifications:
            - pointers: [ "/sku" ]
              transformation:
                type: "format_preserving_hashing"
            - pointers: [ "/tags/0" ]
              transformation:
                type: "format_preserving_hashing"
              ignore_errors: true

Refer to JSON Pointer Transformer for more details.
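The idea of transforming only the addressed nodes can be sketched in Python as follows. This is an illustration, not the transformer's implementation: `transform_at` is hypothetical, `mask` is a stand-in for format_preserving_hashing, and treating ignore_errors as "skip pointers that cannot be resolved" is an assumption. Only non-empty pointers are handled.

```python
import json


def _unescape(token):
    # RFC 6901 escapes: "~1" means "/" and "~0" means "~".
    return token.replace("~1", "/").replace("~0", "~")


def transform_at(document, pointer, transform, ignore_errors=False):
    """Replace the node addressed by a JSON pointer with transform(node), in place."""
    tokens = [_unescape(t) for t in pointer.split("/")[1:]]
    try:
        parent = document
        for token in tokens[:-1]:
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        last = tokens[-1]
        key = int(last) if isinstance(parent, list) else last
        parent[key] = transform(parent[key])
    except (KeyError, IndexError, ValueError, TypeError):
        if not ignore_errors:
            raise


spec = json.loads('{"sku": "AB-123", "tags": [], "price": 10}')
mask = lambda value: "X" * len(value)  # stand-in for a real masking function
transform_at(spec, "/sku", mask)
transform_at(spec, "/tags/0", mask, ignore_errors=True)  # missing node is skipped
```

After these calls only `sku` has changed; `tags` and `price` are kept as is.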

enhancement Better tuned default rules for generation from empty database

Introduced more empirical default rules for generation from empty database. If a source table is empty and the generated field is not null and no user configuration provided for this field, then reasonable defaults for generators are chosen for the following data types:

  • DATE: a random date from 1970-01-01 to 2030-01-01

  • NUMERIC: a random integer from 1 to 100

  • ANY (blobs and binary arrays): a single random byte

Version 1.16

17 Nov 2022

Version 1.16 of the Synthesized TDK.

enhancement Improved Yaml Configuration Structure

This change breaks compatibility with the previous version.

The following yaml configuration parameters have been renamed:

  • column_params → transformations

  • user_table_configs → tables

Example before:

default_config:
  mode: MASKING
  target_ratio: 1.0
user_table_configs:
  - table_name_with_schema: "public.delivery"
    column_params:
      - columns: ["status"]
        params:
          type: categorical_generator

Example after:

default_config:
  mode: MASKING
  target_ratio: 1.0
tables:
  - table_name_with_schema: "public.delivery"
    transformations:
      - columns: ["status"]
        params:
          type: categorical_generator

enhancement Output Schema Cannot Be Source

Added a validation step that checks that the output schema is not the source schema, preventing corruption of source data or schema.

Version 1.15

11 Nov 2022

Version 1.15 of the Synthesized TDK.

feature Testcontainers Integration

By combining Testcontainers with Synthesized TDK, developers can populate any Testcontainers database with synthetically generated data, enabling rapid development of tests for logic which involves interaction with the database. Refer to documentation for more details.

feature Documentation Tutorials

Documentation tutorials for different modes:

feature Boost performance using working directory on a local file system

A transient local storage area can now be configured to speed up TDK operations. Refer to documentation for more details.

feature --json-schema parameter for CLI

Introduced --json-schema parameter for CLI which prints JSON Schema for the YAML Configuration. This schema can be used in an IDE to provide auto-completion for your YAML and to validate your configuration before run.

feature Output alphabets in format_preserving_hashing

Introduced the ability to define the output alphabets in format_preserving_hashing with unicode_block and unicode_range. Refer to custom alphabets for more details.

enhancement Better tuned default rules for generation from empty database

Introduced more empirical default rules for generation from empty database.

enhancement Oracle user permissions

Added the minimum database permissions required to run Synthesized TDK with Oracle database, see Oracle permissions.

enhancement Preserve null values for all masking transformers

Preserve null values for all masking transformers.

bugfix Make CategoricalGenerator’s masking mode the same as in generation

Fixed the behaviour when CategoricalGenerator didn’t preserve the input probabilities in masking mode.

bugfix Make CategoricalGenerator produce null values

Fixed the behaviour when CategoricalGenerator produced 'null' strings instead of null values.

bugfix Fixed the issue when insertion did not work for wide tables

For tables wider than a certain threshold (for example, 21 columns for SQL Server) insertion failed. This issue is now fixed for all supported databases.

Version 1.14

30 Sep 2022

Version 1.14 of the Synthesized TDK.

feature Significant performance improvement

Significant performance improvement for MASKING and GENERATION modes for all supported databases.

feature GENERATION mode for empty tables

To use GENERATION mode for empty tables, the user should specify target_row_number at the global or table level:

target_row_number: optional Integer (int64). The absolute size of the output table in rows. This parameter is applicable only for GENERATION mode. If not provided, target_ratio will be used.
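As a rough sketch of the two levels mentioned above, the fragment below sets target_row_number once in default_config and overrides it for a single table. The placement of the key under default_config and under a tables entry is an assumption, modelled on how target_ratio and table_name_with_schema are used elsewhere in this changelog; the table name and row counts are hypothetical.

```yaml
default_config:
  mode: "GENERATION"
  # Assumed global-level placement: applies to every table unless overridden.
  target_row_number: 1000
tables:
  # Hypothetical table name, used for illustration only.
  - table_name_with_schema: "public.customers"
    # Assumed table-level override for this one table.
    target_row_number: 50000
```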

enhancement Improved error handling

More user-friendly messages in scenarios where there is not enough data to determine the generation parameters.

Improved error handling for Microsoft SQL Server IDENTITY exceptions.

Version 1.13

16 Sep 2022

Version 1.13 of the Synthesized TDK.

feature MySQL support

MySQL database can be used as an input and output database.

enhancement Improved performance of all transformations by up to 30%

Improved performance of all transformations by up to 30%.

enhancement Improved data quality and reduced memory consumption of date_generator

Improved data quality and reduced memory consumption of date_generator.

bugfix Fixed CREATE_IF_NOT_EXISTS for Microsoft SQL Server

Fixed an issue where the second run with the schema_creation_mode: CREATE_IF_NOT_EXISTS configuration failed for Microsoft SQL Server.

bugfix Fixed IDENTITY support for non-PRIMARY KEY columns

Fixed IDENTITY property support for non-PRIMARY KEY columns. Now int_sequence_generator is used by default for any IDENTITY column.

Version 1.12

2 Sep 2022

Version 1.12 of the Synthesized TDK.

feature Oracle support

Oracle database can be used as an input and output database.

enhancement New unique_hashing algorithm

New unique_hashing algorithm:

  • provides random bijective (one-to-one) mapping between unique identifiers of the input and the output databases

  • prevents key collisions – unique input keys are mapped to unique output keys

  • 2x performance improvement in MASKING scenarios due to the absence of key collisions.

enhancement Microsoft SQL Server IDENTITY property support

Microsoft SQL Server IDENTITY property support.

enhancement Changed MASKING default behavior

The default MASKING behavior has become more secure. Now format_preserving_hashing is always used by default for string columns. To pass values through unchanged, the user must configure passthrough explicitly.
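A minimal sketch of opting a string column out of the new default might look like the fragment below. The column name is hypothetical, and the type identifier "passthrough" is an assumption based on the passthrough transformation mentioned elsewhere in this changelog; check the Transformations page for the exact name.

```yaml
transformations:
  # Hypothetical column name, used for illustration only.
  - columns: ["comment"]
    params:
      # Assumed type identifier for the passthrough transformation.
      type: "passthrough"
```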

bugfix Fixed conditional_generator

conditional_generator now works correctly both when conditional_column is in the same table as the column being generated and when they are in different tables.

Version 1.11

23 Aug 2022

Version 1.11 of the Synthesized TDK.

enhancement New format_preserving_hashing Algorithm

New format_preserving_hashing algorithm:

  • supports Unicode’s Basic Multilingual Plane in input and output data

  • maximum length of input text is increased to 2^32 characters

  • performance increased by 30-60% (the boost may vary depending on the message size and hashing groups configuration).

enhancement Required Database Permissions

Database permissions page describing the minimum database permissions required to run Synthesized TDK.

enhancement length_exceeded_mode for fake generators

A new parameter, length_exceeded_mode, has been added for address_generator and person_generator. It allows generated values to be truncated to the column size.

For example, if the country column is varchar(20), and the generated value is "Saint Vincent And The Grenadines":

  • length_exceeded_mode: IGNORE (default) fails to insert "Saint Vincent And The Grenadines" into the varchar(20) column

  • length_exceeded_mode: TRUNCATE mode inserts the truncated value "Saint Vincent And Th" to the varchar(20) column.
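The country example above could be configured roughly as follows. This is a sketch only: the placement of length_exceeded_mode alongside the other generator params is an assumption, while the column name and template come from the example and from the address_generator templates listed under Version 1.1.

```yaml
transformations:
  - columns: ["country"]
    params:
      type: "address_generator"
      column_templates: ["${country}"]
      # Assumed placement; TRUNCATE cuts generated values to the column size.
      length_exceeded_mode: "TRUNCATE"
```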

enhancement Tests for the Documentation Examples

All the examples in the documentation are now covered with tests and up-to-date with the Synthesized TDK version.

bugfix Temporary Columns Cleanup

Fixed an issue where temporary columns were left behind after a failed execution; they are now cleaned up in most cases.

Version 1.10

12 Aug 2022

Version 1.10 of the Synthesized TDK.

feature BigID integration

For more information, see BigID integration.

feature CREATE_IF_NOT_EXISTS support for Microsoft SQL Server

CREATE_IF_NOT_EXISTS schema creation mode support for Microsoft SQL Server.

feature Autogenerated Documentation

Transformations and YAML configuration are now autogenerated and up-to-date.

enhancement Quick Start

Added Getting Started and Installation sections to the documentation, using an H2 demo database.

enhancement noising and continuous_generator enhancement

Columns with a single value do not fail with an error, but are kept as-is for MASKING mode and filled with nulls for GENERATION.

enhancement global_seed for subsetting

Subsetting is repeatable with the same global_seed value.

enhancement Performance improvement for fake generators

Huge performance improvement for address_generator and person_generator.

bugfix XML data type support for Microsoft SQL Server

XML data type support for Microsoft SQL Server.

bugfix constant_date fix

Fixed constant_date; it now works in both value and range modes.

bugfix categorical_generator validation

Fixed categorical_generator validation; it now fails when probabilities are not provided.

Version 1.9

29 Jul 2022

Version 1.9 of the Synthesized TDK.

feature Microsoft SQL Server Support

Microsoft SQL Server can be used as input and output database.

feature Advanced format_preserving_hashing configuration

A hash transformation is applied to each character of a given text that belongs to a configured group, so that the output preserves the format but contains different characters. This transformation is secure and non-reversible.

Parameters:

  • groups: List<FormatPreservingHashingGroup>: Each group pairs a selector with a list of alphabets. The selector chooses characters from the input string; an alphabet is a set of characters used to replace the selected ones.

  • filter: Filters are used to mask only a specified substring and keep other characters as is (e.g., mask only last 5 characters).

Available character selectors:

  • numeric

  • lower_letters

  • upper_letters

  • regex

Available alphabets:

  • numeric

  • lower_letters

  • upper_letters

  • custom

Available filters:

  • first - Mask only first n characters.

  • last - Mask only last n characters.

  • characters - Mask only specified characters. Parameters: characters - set of characters to mask, ignore_case (default: false) - indicates if case is taken into account.

  • substring - Mask all occurrences of specified substring. Parameters: substring - Substring to mask, ignore_case (default: false) - indicates if case is taken into account.

  • regex - Mask only characters matched by the specified regex pattern. Parameters: pattern - regex pattern used to find characters to mask, ignore_case (default: false) - indicates if case is taken into account.

For more information, see Transformations.

feature constant_numeric, constant_date, constant_string, constant_boolean generators

Added new constant generators for numeric, date, string, boolean data types.
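As an illustrative sketch, two of the new generators might be configured as below. The value parameter is an assumption modelled on the constant generator example from Version 1.7, and the column names and values are hypothetical.

```yaml
transformations:
  # Hypothetical string column filled with one fixed value.
  - columns: ["status"]
    params:
      type: "constant_string"
      # Assumed parameter name, by analogy with the Version 1.7 constant generator.
      value: "ACTIVE"
  # Hypothetical boolean column filled with one fixed value.
  - columns: ["is_deleted"]
    params:
      type: "constant_boolean"
      value: false
```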

For more information, see Transformations.

feature Numeric type for categorical_generator

categorical_generator supports numeric columns.

For more information, see Transformations.

enhancement Increased insert_batch_size to optimize performance

Default value for insert_batch_size increased from 30 to 1000.

Version 1.8

8 Jul 2022

Version 1.8 of the Synthesized TDK.

enhancement New Documentation

In addition to the nice appearance, many pages and yaml examples are now generated automatically from source code and tests, which reduces the number of mistakes and allows the documentation to be up-to-date with the product version.

Enjoy!

enhancement Performance Improvement for MASKING

This release includes significant performance improvement for MASKING mode with target_ratio: 1.0.

bugfix Handle Empty Tables

Fixed issues with processing empty tables in MASKING and GENERATION modes.

bugfix Handle Columns with Single Value

Fixed issues with processing columns with single values in MASKING mode.

bugfix Missing relations in GENERATION using KEEP

Fixed an issue with missing relations in data generation when GENERATION tables have foreign keys to KEEP tables.

Version 1.7

17 Jun 2022

Version 1.7 of the Synthesized TDK.

feature Custom Database Types Support

To support custom database types:

  • Use an output database with an already created schema and its child objects; see DO_NOT_CREATE in YAML configuration for more details.

  • Explicitly define a generator for the custom type column in the configuration file.

For example, for the following custom ENUM type:

CREATE TYPE public.transaction_type_t AS ENUM ('SENT', 'RECEIVED');

Use a configuration like this:

transformations:
- columns:
  - "transaction_type"
  params:
    type: "categorical_generator"
    categories:
      values:
      - "SENT"
      - "RECEIVED"
    probabilities:
    - 0.6
    - 0.4

For more information, see Custom database types.

feature Constant Generator

Generate a single numeric value for the entire column

Parameters:

  • value: Number?: numeric value to generate

Compatible modes: GENERATION, MASKING

Compatible column data types: NUMERIC

Supports multiple columns: No

Example:

transformations:
- columns: [ "balance" ]
  params:
    type: "constant"
    value: 0.0

For more information, see Transformations.

feature UUID Support for MASKING

UUID data type support for MASKING mode.

feature BIGINT and SMALLINT Support

BIGINT and SMALLINT data type support for GENERATION, MASKING, and KEEP modes.

feature Global Seed Parameter

global_seed to set the seed for random number generators.

A 32-bit integer value between -2147483648 and 2147483647, used as a seed for random number generators. The generation result is the same each time generation is run with the same seed and workflow configuration. By default, global_seed is 0.

Example:

default_config:
  mode: "MASKING"
  target_ratio: 1.0
global_seed: 42

For more information, see YAML configuration.

Version 1.6

10 Jun 2022

Version 1.6 of the Synthesized TDK.

enhancement Performance Improvements

This release includes significant rework of transformation execution internals, bringing the following benefits to end users:

  • Heavy parallelization of transformations and database operations. To the extent the logic of a transformation permits, operations are performed in parallel. This results in better hardware utilization and reduced latencies.

  • Memory consumption optimization. The solution can now handle tables whose size noticeably exceeds the main memory available to the process.

Version 1.5

8 Jun 2022

Version 1.5 of the Synthesized TDK.

feature H2 Support

H2 database can be used as input and output database.

Note

Add the following arguments to H2 JDBC URLs: ;DATABASE_TO_LOWER=TRUE;CASE_INSENSITIVE_IDENTIFIERS=TRUE

feature SQLite Support

SQLite database can be used as input and output database.

Version 1.4

7 Jun 2022

Version 1.4 of the Synthesized TDK.

feature License Expiration API endpoint

The license expiration can be requested via API:

curl -X 'GET' \
  'http://${API_SERVICE_URL}:${API_SERVICE_PORT}/api/v1/license-expiration' \
  -H 'accept: */*'

Where:

  • API_SERVICE_URL is the endpoint of the service. If running locally, this will likely be localhost

  • API_SERVICE_PORT is the port exposed for the service. The default port is 8081.

If the service is up and running correctly, you should receive a 200 status with the body containing information like:

{"expiry_date":"2023-06-01"}

feature UUID Data Type Support

UUID data type support for GENERATION and KEEP modes.

feature Boolean Data Type Support

BOOLEAN data type support for GENERATION, MASKING, and KEEP modes.

enhancement Configuration File Upload

YAML configuration can be uploaded as a file via API.

Version 1.3

20 May 2022

Version 1.3 of the Synthesized TDK.

feature Google Secret Manager Integration

The database credentials can be provided from Google Secret Manager:

"password": {
  "type": "gcp",
  "project": "${GCP_PROJECT_ID}",
  "secret": "${SECRET_ID}",
  "version": "${VERSION_ID}"
}

feature Append Data

A new table_truncation_mode:

  • IGNORE: if this mode is selected, the status of the output database is ignored.

This mode leaves existing data in the output database in place and appends the newly generated data to it.
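A minimal sketch of an append-style run is shown below. It mirrors the table_truncation_mode usage example from Version 1.2 and only swaps in the new IGNORE value; the mode and target_ratio values are illustrative.

```yaml
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
# IGNORE: the state of the output database is not checked and tables are not truncated.
table_truncation_mode: "IGNORE"
```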

For more information, see YAML configuration.

feature Locale For Address and Person Generators

  • locale: String = 'en-GB': To generate names and addresses from different geographical areas, the user can change this parameter. Defaults to 'en-GB', which corresponds to British names.

Supported locales:

  • bg

  • ca

  • ca-CAT

  • da-DK

  • de

  • de-AT

  • de-CH

  • en

  • en-AU

  • en-au-ocker

  • en-BORK

  • en-CA

  • en-GB

  • en-IND

  • en-MS

  • en-NEP

  • en-NG

  • en-NZ

  • en-PAK

  • en-SG

  • en-UG

  • en-US

  • en-ZA

  • es

  • es-MX

  • fa

  • fi-FI

  • fr

  • he

  • hu

  • in-ID

  • it

  • ja

  • ko

  • nb-NO

  • nl

  • pl

  • pt

  • pt-BR

  • ru

  • sk

  • sv

  • sv-SE

  • tr

  • uk

  • vi

  • zh-CN

  • zh-TW
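For instance, German-style addresses could be requested with a fragment like the one below. This is a sketch: the placement of locale alongside the other generator params is an assumption, and the columns and templates are reused from the address_generator example under Version 1.1.

```yaml
transformations:
  - columns: ["street_name", "zip_code"]
    params:
      type: "address_generator"
      column_templates: ["${street_name}", "${zip_code}"]
      # Assumed placement; "de" is one of the supported locales listed above.
      locale: "de"
```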

For more information, see Transformations.

enhancement Null Generator by Default

For currently unsupported types, such as XML datatype, null_generator will be used by default.

enhancement Stop Workflow API Endpoint

Added ways to stop the workflow using workflow_id and workflow_run_id. Improved error handling.

enhancement Ability to Process a Subset of Tables

Removed the comparison between the input and output schemas. This makes it possible to process a subset of the input tables.

bugfix Consistent Formatted Strings

formatted_string_generator in MASKING mode generates consistent values across the schema.

bugfix Positive Output Based on Positive Input

If the input numeric column contains only positive values, then the generated values will also be positive by default.

Version 1.2

29 Apr 2022

Version 1.2 of the Synthesized TDK.

feature Schema Truncation Mode

There are two table truncation modes:

  • DO_NOT_TRUNCATE: (default) if this mode is selected, tables in the output database won’t be truncated. An empty output database is required.

  • TRUNCATE: if this mode is selected, tables in the output database will be truncated.

Usage example for table_truncation_mode:

default_config:
    mode: "GENERATION"
    target_ratio: 1.0
table_truncation_mode: "TRUNCATE"

feature Support CHAR Primary Keys

MASKING mode for tables with CHAR primary keys can be used without any additional configuration. In previous versions, the passthrough transformation was used as a workaround.

feature Support Composite Keys

Composite primary and foreign keys can be automatically handled without any additional configuration. In previous versions, foreign_key_generator was used as a workaround.

enhancement Advanced Subsetting

Advanced subsetting implementation for KEEP and MASKING modes. In previous versions, some of the tables were empty after subsetting.

enhancement CLI Parameters

Changed CLI parameters from camelCase to kebab-case:

Usage: engine-lite [-hV] [-c=<config-file>] [-ip=<input-password>]
                   -iu=<input-url> [-iU=<input-username>]
                   [-op=<output-password>] -ou=<output-url>
                   [-oU=<output-username>]
TDK engine lite.
  -c, --config-file=<config-file>
                  Configuration file
  -h, --help      Show this help message and exit.
      -ip, --input-password=<input-password>
                  Input password, default to null
      -iu, --input-url=<input-url>
                  JDBC URL to the INPUT database
      -iU, --input-username=<input-username>
                  Input username, default to null
      -op, --output-password=<output-password>
                  Output password, default to null
      -ou, --output-url=<output-url>
                  JDBC URL to the OUTPUT database
      -oU, --output-username=<output-username>
                  Output username, default to null
  -V, --version   Print version information and exit.

bugfix Consistent Fake Generators

person_generator and address_generator in MASKING mode will generate consistent values across the schema.

For example, every mention of James Bond with a UK address in the schema will be masked as Jon Snow with a Seven Kingdoms address.

Version 1.1

15 Apr 2022

Version 1.1 of the Synthesized TDK.

feature Schema creation mode

There are four schema creation modes:

  • CREATE_IF_NOT_EXISTS: (default) if this mode is selected, DDL schema will be copied from the source database to the target one if it does not exist; otherwise, the existing schema will be used.

  • DO_NOT_CREATE: if this mode is selected, existing schema will be used.

  • CREATE: if this mode is selected, DDL schema will be copied from the source database to the target one. The target database should be empty.

  • DROP_AND_CREATE: if this mode is selected, DDL schema will be copied from the source database to the target one. Existing schema in the target database will be dropped. Please use this mode carefully.

Note: If the CREATE_IF_NOT_EXISTS or DO_NOT_CREATE mode is used, the target schema should match the source schema.
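A sketch of selecting one of these modes is shown below. The schema_creation_mode key is the one referenced in the Version 1.13 notes; its placement at the top level of the configuration is an assumption, modelled on how cycle_resolution_strategy is set in the examples in this release.

```yaml
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
# Assumed top-level placement; reuse the schema that already exists in the target database.
schema_creation_mode: "DO_NOT_CREATE"
```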

feature Address generator

Generate address fields (e.g. street, zip code) and keep them consistent across columns.

Parameters:

  • column_templates: List<String>: For each column, the template to be used to generate address data.

  • consistent_with_column: String?: If given, the column that the generated values must be consistent with. For example, if consistent_with_column="user_id", all people with the same user_id will have the same street.

Available templates are:

  • ${zip_code}

  • ${country}

  • ${city}

  • ${street_name}

  • ${house_number}

  • ${flat_number}

Compatible modes: GENERATION, MASKING, KEEP

Compatible column data types: STRING

Supports multiple columns: Yes

Example for multiple columns:

transformations:
  - columns: ["street_name", "zip_code"]
    params:
      type: "address_generator"
      column_templates: ["${street_name}", "${zip_code}"]

Example for a single column:

transformations:
  - columns: ["address"]
    params:
      type: "address_generator"
      column_templates: ["${country}, ${city}, ${street_name}, ${house_number}, ${flat_number}, ${zip_code}"]

feature Cycle resolution strategy

There are two cycle resolution strategies:

  • FAIL: (default) if this mode is selected, cycle_breaker_references should be provided in the configuration file. Otherwise, execution will fail if it detects a circular reference.

  • DELETE_NOT_REQUIRED: if this mode is selected, cyclic references will be resolved automatically by removing the last nullable reference leading to the cycle.

Example for FAIL mode:

default_config:
    mode: "GENERATION"
    target_ratio: 1.0
tables:
  - table_name_with_schema: "employees"
    cycle_breaker_references: ["employees"]
cycle_resolution_strategy: "FAIL"

Here, the employees table contains a cyclic reference.

Example for DELETE_NOT_REQUIRED mode:

default_config:
    mode: "GENERATION"
    target_ratio: 1.0
cycle_resolution_strategy: "DELETE_NOT_REQUIRED"

Version 1.0

1 Apr 2022

Version 1.0 of the Synthesized TDK.

First release

We have been working hard to combine our products into a single product with enhanced architecture that will enable us to add exciting new features and optimizations!