Changelog
enhancement Support diagnostic package for Oracle
The diagnostic package now supports Oracle, allowing it to collect summaries of schemas, tables, indexes, and more.
Version 1.72.0
12 Nov 2024
enhancement Auto-detection strategy for Format Preserving Hashing
This update introduces a new auto-detection policy for the transformer.
Text columns exceeding 64 characters are now processed by formatted_string_generator.
This limit is designed to improve overall performance in typical scenarios.
If the maximum value length in a column exceeds the limit and format_preserving_hashing is specified in the configuration, a validation error occurs. Users can override this behavior for each transformer by setting the length_threshold parameter.
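As a rough illustration, a per-transformer override might look like the fragment below. The column name is hypothetical, and both the placement of length_threshold under params and the value 128 are assumptions rather than confirmed syntax.

```yaml
transformations:
  - columns: ["description"]            # hypothetical column name
    params:
      type: format_preserving_hashing
      length_threshold: 128             # assumed placement and value; overrides the 64-character limit
```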
Version 1.71.0
05 Nov 2024
enhancement Support masking for non-English alphabets
A new locale parameter is now supported for the Format Preserving Hashing Masker.
The masker uses the global locale by default, but the user can override this setting for a specific masker instance.
enhancement Add column size validations for some transformations
If a transformation can produce values outside the range of the column, an error is now raised at the validation stage instead of failing on insert.
bugfix Fix CSV table name issues
Fixed a bug where certain CSV directory names, such as users, could result in an obscure error during processing.
enhancement Title template for Person and Address Generators.
A new template is now supported for Person Generator:
- ${title}: Prefix or title (Mr., Mrs., Dr., etc.)
Version 1.68.0
15 Oct 2024
feature [early access] CSV support
It is now possible to process CSV files as if they were tables. For more details, please refer to CSV documentation.
enhancement Performance Optimization
Improved performance for the majority of use cases, resulting in faster execution times.
Version 1.67.0
23 Sep 2024
bugfix Support Postgres point type
It is now possible to set formatted_string_generator for point type columns in Postgres, using the pattern "\\(([1-8]{1,2}(\\.[0-9]{1,5})?),([1-8]{1,2}(\\.[0-9]{1,5})?)\\)".
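A minimal sketch of such a transformation is shown below. The column name is hypothetical, and the pattern key name is an assumption based on the wording above, so it may differ in the actual configuration schema.

```yaml
transformations:
  - columns: ["location"]               # hypothetical column of type point
    params:
      type: formatted_string_generator
      # assumed key name; the value is the pattern quoted in this entry
      pattern: "\\(([1-8]{1,2}(\\.[0-9]{1,5})?),([1-8]{1,2}(\\.[0-9]{1,5})?)\\)"
```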
bugfix Support sysname alias copy for SQL Server
This update adds explicit copy support for sysname.
Version 1.64.0
6 Sep 2024
bugfix VoidGenerator for computed and autogenerated columns in MSSQL
This update changes the default behavior for computed and autogenerated columns: they are now processed by the VoidGenerator.
feature Implementing new workflow table
Introduced a new workflow table with the following capabilities:
- Grouping: Supports grouping by multiple fields, including tags.
- Multi-Sorting: Allows users to sort data by multiple criteria for enhanced organization.
- Multi-Filters: Enables the use of multiple filters to refine displayed data.
- Column Configuration: Users can customize the order and visibility of columns to suit their preferences.
- Multiple Views: Supports multiple views of a table.
- Persistence: Table views and configuration changes are saved on the backend.
- Sharing: Workflow views can be shared.
Version 1.63.0
7 Aug 2024
enhancement Improvements to SQL Server Schema Copying
This update includes several fixes and improvements:
- Identity columns for numeric types are now copied correctly.
- Time columns with a scale of 0 no longer lose precision.
- Complex primary keys are now copied accurately.
- Complex unique constraints are copied correctly.
- Both clustered and non-clustered indexes are now copied properly.
feature Triggers disablement support for SQL Server
TDK now supports the DISABLE_DB_TRIGGERS_ON_WRITE flag for disabling triggers during data generation or masking for SQL Server databases. See more configuration flags.
feature Partial schema processing and truncation
TDK now supports partial masking of columns not contained in primary keys and foreign keys. This feature allows you to process only specified tables and columns, leaving the rest of the schema untouched. To exclude a table from processing, use the target_ratio: 0.0 parameter in the table configuration.
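For instance, excluding a single table could be sketched as follows. The table name is hypothetical; the surrounding keys follow the configuration examples used elsewhere in this changelog.

```yaml
default_config:
  mode: MASKING
tables:
  - table_name_with_schema: "public.audit_log"   # hypothetical table to leave unprocessed
    target_ratio: 0.0                            # excludes this table from processing
```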
Version 1.61.0
19 Jul 2024
feature MSSQL diagnostics CLI command
The diagnostic package is a collection of scripts capable of collecting the most important information about a database without exposing sensitive data.
The summaries help us to identify database schema limitations in specific cases in order to find the best approach for processing your schema.
This feature is experimental and supports only MSSQL databases.
Version 1.54.0
17 May 2024
enhancement StringSequenceGenerator generation in append mode
The new StringSequenceGenerator can continue a string sequence for an existing target database using the uppercase alphanumeric Latin alphabet.
The starting value is automatically detected as the maximum value of the field in the target database, or it can be set manually.
transformations:
  - columns: ["country_id"]
    params:
      type: string_sequence_generator
      start_from: "BE"
      alphabets:
        - type: digits
        - type: upper_letters
Version 1.52.0
19 April 2024
feature New Loop Generator
The new Loop Generator produces values sequentially from a provided list in round-robin fashion. This is particularly useful when you need to iterate through a set of values repeatedly.
When the number of rows to generate exceeds the number of values in the list and the repeatable property is set to true, the list is repeated.
You can use the Loop Generator in several ways.
Generating values from a provided list:
transformations:
  - columns:
      - transaction_type
    params:
      type: "loop_generator"
      repeatable: true
      source:
        value_source: "PROVIDED"
        values:
          - "sent"
          - "skipping_value"
          - "received"
          - null
Loading values from a local CSV file:
transformations:
  - columns:
      - productcode
      - productname
    params:
      type: "loop_generator"
      repeatable: true
      source:
        value_source: "CSV_FILE"
        path: src/e2e/resources/data_with_header_multi.csv
        null_values: null
        format:
          encoding: "UTF-8"
          delimiter: ","
          trim: true
          columns:
            column_accessor_type: NAME
            names: ["code", "name"]
This illustrates how the generator can handle multiple columns with provided tuples of values that always appear together:
transformations:
  - columns:
      - productcode
      - productname
    params:
      type: "loop_generator"
      repeatable: true
      source:
        value_source: "MULTIPLE_PROVIDED"
        values:
          - productcode: "P1"
            productname: "Product 1"
          - productcode: "P2"
            productname: "Product 2"
          - productcode: "P3"
            productname: "Product 3"
          - productcode: null
            productname: null
feature Pre and Post execution SQL scripts support
The TDK now includes support for executing SQL scripts before and after the data generation process. This feature allows users to perform additional operations if needed.
Both inline and file scripts are supported. For detailed implementation, please refer to the following example:
scripts:
  pre:
    source: INLINE
    script: |
      ALTER TABLE public.transaction DISABLE TRIGGER ALL;
  post:
    source: FILE
    path: post_script.sql
feature Multiple columns support added for categorical_generator
The categorical_generator now supports multiple columns, enabling users to specify multi-column categories. This preserves relationships between values across different columns within a single category.
For detailed implementation, refer to the following example:
transformations:
  - columns:
      - productcode
      - productname
    params:
      type: categorical_generator
      categories:
        value_source: MULTIPLE_PROVIDED
        null_values: ["nil"]
        category_values:
          - values:
              productcode: "P1"
              productname: "Product 1"
            weight: 0.5
          - values:
              productcode: "P2"
              productname: "Product 2"
            weight: 0.3
          - values:
              productcode: "P3"
              productname: "Product 3"
            weight: 0.2
          - values:
              productcode: "nil"
              productname: "nil"
            weight: 0.5
You can also specify a CSV file as the source of categories for multiple columns:
transformations:
  - columns:
      - productcode
      - productname
    params:
      type: categorical_generator
      categories:
        value_source: MULTIPLE_CSV_FILE
        path: src/e2e/resources/data_multi.csv
The example above works with the following CSV file data:
P1 | Product1 | 43 |
P2 | Product2 | 23 |
P3 | Product3 | 24 |
Example of advanced configuration with multiple columns and multiple categories:
transformations:
  - columns:
      - productcode
      - productname
    params:
      type: categorical_generator
      categories:
        value_source: MULTIPLE_CSV_FILE
        path: src/e2e/resources/data_with_header_multi.csv
        null_values: ["null", ""]
        format:
          columns:
            column_accessor_type: NAME
            categories:
              productcode: "code"
              productname: "name"
            weights: "rank"
          encoding: "UTF-8"
          delimiter: ","
          trim: true
The example above works with the following CSV file data:
code | name | rank |
---|---|---|
P1 | Product1 | 43 |
P2 | Product2 | 23 |
P3 | Product3 | 24 |
Alternatively, you can leave the categorical_generator configuration empty, in which case TDK will automatically generate categories from the source table.
enhancement Breaking change: Replace nullable_weight with null_values for categorical_generator
The nullable_weight parameter in the categorical_generator has been replaced with null_values. This change allows users to specify a list of values to be treated as the special NULL value. By default, null_values is set to ["null"].
bugfix Resolved issue with generating a small number of rows
In certain scenarios, a foreign_key_generator with the default distribution could generate an empty output, causing errors.
feature Support the "mode" parameter at the column level
Support has been added for setting the transformation mode parameter (such as MASKING, GENERATION, KEEP) at the column level, in addition to the global and table levels. This makes it possible to handle a common scenario: setting KEEP as the global mode and specifying which columns to mask. In this case, TDK automatically determines the necessary transformation for each column (such as format_preserving_hashing, noising, and so on). More information can be found in this example:
default_config:
  mode: KEEP
tables:
  - table_name_with_schema: "public.customer"
    transformations:
      - columns: [ "first_name", "last_name", "email" ]
        mode: MASKING
feature Support the "mapping" parameter as an independent parameter
The mapping value can now be specified without providing any transformation parameters (the params parameter):
default_config:
  mode: MASKING
tables:
  - table_name_with_schema: "public.customer"
    transformations:
      - columns: [ "username" ]
        mapping:
          read: "?"
          write: "current_user()"
Version 1.48.0
08 March 2024
enhancement Support for truncating cyclically referenced tables
Added support of cyclically referenced table truncation for: DB2, H2, MSSQL and ORACLE databases.
bugfix Improved Stability in Schema Copying for SQL Server
Resolved an issue where schema copying for SQL Server would attempt to access other databases, leading to permission access errors.
enhancement Improved Performance in MASKING mode
Significantly enhanced the performance of the TDK in MASKING mode when the working directory is enabled. This enhancement particularly benefits large tables by accelerating the row filtering process.
bugfix Resolve Filtering Issue in MASKING Mode
Corrected a bug where certain rows were erroneously filtered out in MASKING mode. This issue primarily affected composite keys containing NULL values. With this fix, NULL values are handled properly, so all rows are retained during the filtering process.
Version 1.47.0
21 February 2024
enhancement Enhancements to DROP_AND_CREATE Mode in MSSQL
Enhanced DROP_AND_CREATE mode to support dropping schema-related objects such as stored functions and procedures, domains, and triggers.
enhancement In-Memory Filter Threshold Property
This advanced performance tuning property, the In-Memory Filter Threshold, is designed to optimize filtering operations.
This property determines whether values of the parent table are loaded into memory during filtering operations, depending on the table size. If the size of the table does not exceed the specified value, the values are loaded into memory and used to filter child tables.
The property can be set globally and be overridden at the table-level configuration.
Version 1.45.0
Version 1.44.0
7 February 2024
feature Constant XML generator
Introducing the new Constant XML Generator.
Version 1.41.0
23 January 2024
feature Add Configuration Flags: Skip Failed Batch or Fail on First Failure
This release introduces two new configuration flags, providing users with greater control over batch handling:
- NOT_FALLBACK_TO_ONE_BY_ONE_INSERTS: In the event of a failed batch insert, this flag prevents the system from resorting to one-by-one inserts for the entire batch. If any batch encounters an error, the entire TDK execution will fail.
- SKIP_FAILED_BATCHES: Skip failed batches without resorting to one-by-one inserts.
Only one of the flags NOT_FALLBACK_TO_ONE_BY_ONE_INSERTS and SKIP_FAILED_BATCHES can be chosen at a time.
Version 1.39.0
Version 1.38.0
15 Dec 2023
enhancement Automatic search indexes deactivation during processing.
This enhancement boosts performance by reducing processing time in schemas with numerous search indexes.
bugfix Subsetting support for auto-generated MSSQL Timestamp.
Version 1.37.0
Version 1.36.1
28 Nov 2023
enhancement Partial database processing enhancement.
Implemented support for an ignore filter in table processing and introduced a caching strategy for improved performance in table size calculation.
enhancement Enhance handling of MSSQL timestamp data type.
The timestamp (Transact-SQL) data type is just an incrementing number and does not preserve a date or a time.
Version 1.36.0
27 Nov 2023
feature Financial Data Generator.
Introducing the new Finance Generator.
Choose from a variety of templates, including:
- ${credit_card}
- ${bic}
- ${iban}
- ${nasdaq_ticker}
- ${nyse_ticker}
- ${stock_market}
- ${us_routing_number}
Configure specific card types with templates such as:
- ${credit_card.visa}
- ${credit_card.mastercard}
- ${credit_card.discover}
- ${credit_card.american_express}
- ${credit_card.diners_club}
- ${credit_card.jcb}
- ${credit_card.switch}
- ${credit_card.solo}
- ${credit_card.dankort}
- ${credit_card.forbrugsforeningen}
- ${credit_card.laser}
Example:
transformations:
  - columns: [ "credit_card" ]
    params:
      type: finance_generator
      column_templates: [ "${credit_card.visa}" ]
feature Realistic Text Column Generation.
TDK now intelligently applies heuristics based on column and table names, allowing for more suitable default transformation selections.
As a result, Person Generator, Address Generator and Finance Generator can be chosen by default for GENERATION mode.
This behavior is enabled by default but can be disabled using the use_text_column_heuristics property.
feature Foreign Key generation with Poisson distribution.
The foreign key generator now enables the generation of foreign key relationships with links that follow a Poisson distribution, offering a more accurate representation of real-world data structures.
This is now the default behavior, replacing the previous default of ROUND_ROBIN.
However, ROUND_ROBIN remains available for use if preferred.
enhancement New Templates for Person and Address Generators.
New templates are now supported for Person Generator:
- ${full_name}: First name and last name
- ${company}: Company name
- ${phone_national}: The phone number in domestic format
- ${phone_international}: The phone number in international format
- ${ssn}: U.S. Social Security Number (SSN)
New templates for Address Generator:
- ${full_address}
- ${street_address}
- ${region}
- ${latitude}
- ${longitude}
- ${coordinates}
- ${time_zone}
enhancement Default Value Truncation for Fake Generators.
length_exceeded_mode: TRUNCATE is now the default for person_generator, address_generator and finance_generator.
This ensures generated values are truncated to the column’s max length instead of causing overflow errors.
enhancement Global Locale Setting.
The Default Locale property can be set globally for person_generator, address_generator and finance_generator.
This setting can be overridden at the table-level configuration.
enhancement Extended Locales Support.
The list of supported locales for person_generator, address_generator and finance_generator has been expanded.
enhancement Enhance handling of MSSQL Server user-defined data type aliases.
Enhanced support for user-defined data type aliases, allowing these types to be processed by the TDK.
Version 1.35.2
10 Nov 2023
enhancement Support PostgreSQL JSONB type for json_pointer_transformer
The JSON Pointer transformer can now process the PostgreSQL JSONB type.
enhancement Java 17 to be the minimum required version
TDK requires at least JVM 17 to be run.
To use the latest versions of TDK, update your JDK to version 17 or later.
No changes required for Docker and Kubernetes users.
Version 1.35.1
6 Nov 2023
enhancement Support SQL Server text and ntext data types
Although text and ntext are deprecated, they may be present in production schemas.
These types can now be processed by the TDK.
Version 1.35.0
6 Nov 2023
feature Scripting transformer
The new Scripting Transformer lets you write custom scripts in JavaScript.
This makes it possible to extend the TDK and add any specific logic.
The Scripting Transformer can be applied to any column in the GENERATION and MASKING modes.
feature Preserve Foreign Key distribution
The foreign key generator now allows you to preserve the original relationship of links, which can significantly improve data quality.
This feature can be enabled by specifying the distribution ORIGINAL for the foreign key columns:
tables:
  - table_name_with_schema: "public.orders"
    transformations:
      - columns: [ "customer_id" ]
        params:
          type: "foreign_key_generator"
          distribution: ORIGINAL
Version 1.34.0
Version 1.32.0
22 Aug 2023
feature TDK is now available on GCP Marketplace
The TDK is now available on the GCP Marketplace. Getting started page https://console.cloud.google.com/marketplace/product/synthesized-marketplace-public/synthesized-tdk
Version 1.31.0
Version 1.30.0
28 Jul 2023
feature Hashicorp Vault as a secret manager
Added support for Hashicorp Vault secret manager.
enhancement Usability improvements for the date generator
Before this release, the std option of the date generator in GENERATION mode had to be set in milliseconds.
Now the following formats are supported:
- ISO-8601 Duration format, e.g., P1DT2H3M4.058S.
- The concise format described here, e.g., 10s, 1h 30m or -(1h 30m).
- Milliseconds without the specific unit, e.g., 12534.
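As a hedged sketch, the same standard deviation could then be written in the concise format instead of raw milliseconds. The column name is hypothetical, and other date generator parameters are omitted here.

```yaml
transformations:
  - columns: ["created_at"]     # hypothetical date column
    params:
      type: date_generator
      std: "1h 30m"             # concise format; equal to 5400000 milliseconds
```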
bugfix Fix UUID transformer behaviour in masking mode
Starting from release v1.17, the UUID transformer worked as a pure generator and never took input values into account, so it had no utility when processing FK-connected columns. This release restores the correct behaviour for the masking mode.
Version 1.29.0
04 Jul 2023
feature DEFER_FOREIGN_KEY Cycle Resolution Strategy
New DEFER_FOREIGN_KEY Cycle Resolution Strategy: when selected, all FK references are preserved, but the ones that lead to cycles are disabled during masking and re-enabled after the data is inserted.
This strategy is suitable for databases with a cyclic schema and works only with MASKING mode without subsetting.
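Selecting the strategy might look like the fragment below. The cycle_resolution_strategy key name and its top-level placement are assumptions that this entry does not confirm; consult the configuration reference for the exact key.

```yaml
default_config:
  mode: MASKING                                  # the strategy works only with MASKING mode without subsetting
cycle_resolution_strategy: DEFER_FOREIGN_KEY     # assumed key name and placement
```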
enhancement Using the Kubernetes TTL mechanism to delete completed pods
Extended the TDK CLI Helm chart with the ttlSecondsAfterFinished property. This allows Kubernetes pods to be removed a specified number of seconds after they have completed.
enhancement Bring back views for MySQL DDL copying
Copy DDL for views was previously disabled for MySQL.
enhancement Show all found errors happening during the effective configuration creation
The process of creating an effective configuration is no longer aborted after the first detected error.
Version 1.28.0
15 Jun 2023
feature New Tutorials
New tutorials for Masking, Generation, Subsetting and Data Filtering.
bugfix Handling zero date
For more information, refer to java.sql.SQLException: Zero date value prohibited page.
Version 1.27
31 May 2023
bugfix Fix https://github.com/synthesized-io/pagila-tdk-demo failure during startup on ARM-based machines
Version 1.26
10 May 2023
enhancement Getting started page now uses Pagila Docker-compose demo
Version 1.25
21 Apr 2023
Version 1.25 of the Synthesized TDK.
feature Subsetting mode is now available in the free version of TDK
Starting from this version, the Subsetting mode is available in the free version of TDK. For more details about the Subsetting mode, please see here.
Version 1.24
04 Apr 2023
Version 1.24 of the Synthesized TDK.
feature TDK is now available on AWS Marketplace
The TDK is now available on the AWS Marketplace with Docker, ECS Fargate, and Helm charts delivery options. See more details here: AWS Marketplace.
feature TDK Docker container is now available
TDK can now be launched not only via command line interface but as a Docker image, see details here: Docker.
Version 1.23
7 Mar 2023
Version 1.23 of the Synthesized TDK.
feature AWS S3 support for configuration loading
To read the configuration file from AWS S3, enable the TDK_AWS_ENABLED property; see Application properties.
feature AWS Secrets Manager support
Database credentials can be requested from AWS Secrets Manager:
{
  "type": "aws",
  "secret": "${SECRET_ID}",
  "version": "${VERSION_ID}"
}
Where:
- type: the password provider type.
- secret: the ARN or name of the secret to retrieve.
- version (optional): the unique identifier of the version of the secret to retrieve. If you don’t specify the version, the AWSCURRENT version is used.
Version 1.22
14 Feb 2023
Version 1.22 of the Synthesized TDK.
feature Ability to provide Foreign Keys in the yaml configuration file
By default, TDK preserves referential integrity based on the foreign keys in the source database schema. This release adds the ability to provide additional foreign keys in the yaml configuration file.
For example, if order.user_id is a foreign key referring to user.id and it is not defined in the database schema, the following configuration can be provided:
default_config:
  mode: "MASKING"
  target_ratio: 0.5
metadata:
  tables:
    - table_name_with_schema: "public.order"
      foreign_keys:
        fk_user_order:
          referred_schema: "public"
          referred_table: "user"
          columns:
            - column: "user_id"
              referred_column: "id"
More details about the additional foreign keys are available in the Configuration reference section.
enhancement Performance Improvements
Significant performance improvement for the subsetting mode.
To enable the performance improvement, the following application properties should be set:
TDK_WORKINGDIRECTORY_ENABLED=true
TDK_WORKINGDIRECTORY_PATH=/home/tdk/working-directory
enhancement Constant generators in the relaxed mode
In the RELAXED mode, the constant generators constant_numeric, constant_string, constant_date, and constant_boolean will be chosen by default where the source column contains the same value in all rows.
enhancement FAQ page
FAQ page in the documentation.
Version 1.21
31 Jan 2023
Version 1.21 of the Synthesized TDK.
feature Safety Mode
By default, STRICT mode is enabled. If no suitable transform is found, then the passthrough, null_generator and categorical_generator will not be chosen by default.
This change breaks compatibility with the previous version.
To keep the behavior of previous versions, you can use the RELAXED mode.
feature Data Filtering
Data Filtering feature.
feature Multiple Database
Multiple Database support.
Version 1.20
13 Jan 2023
Version 1.20 of the Synthesized TDK.
feature YAML anchors and aliases support
YAML anchors and aliases support to reduce repeated sections in the configuration.
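A short sketch of how an anchor and an alias can remove a repeated section is shown below. The anchor name and the second table are hypothetical; the remaining keys mirror the configuration examples elsewhere in this changelog.

```yaml
default_config:
  mode: KEEP
tables:
  - table_name_with_schema: "public.customer"
    transformations: &mask_contact_columns       # anchor: defines the reusable section
      - columns: [ "first_name", "last_name", "email" ]
        mode: MASKING
  - table_name_with_schema: "public.employee"    # hypothetical table with the same columns
    transformations: *mask_contact_columns       # alias: reuses the section defined above
```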
enhancement Download page
The latest free TDK CLI version is currently available in the documentation for download.
enhancement GitHub Actions Integration
GitHub Actions Integration page for the free TDK CLI version in the documentation.
enhancement Mutually exclusive options have been turned into separate commands
The --dry-run, --default-config, --json-schema, and --license-expiration options have been turned into separate commands:
.....
Commands:
help Display help information about the specified command.
default-config, dc Print built-in default configuration (can be
overridden in the user config)
dry-run, dr Print the effective configuration instead of running
the transformation to the console (by default) or
to the file (specified as `-ec` parameter value)
json-schema, js Print json schema for configuration YAML file
license-expiration, le Print the expiration date of the license key
enhancement Ability to save effective config into custom file
The dry-run command prints the effective configuration to the console (by default) or to the file specified as the -ec or --effective-config-file parameter value.
enhancement Added help command to display detailed information about each command
For example, tdk.jar help dry-run:
Usage: engine-lite dry-run [-ec=<effective-config-file>]
Print the effective configuration instead of running the transformation to the
console (by default) or to the file (specified as `-ec` parameter value)
-ec, --effective-config-file=<effective-config-file>
Effective configuration file
Version 1.19
19 Dec 2022
Version 1.19 of the Synthesized TDK.
bugfix Oracle identifier is too long error when working with Oracle database
Fixed the behaviour where the ORA-00972: identifier is too long error appeared when working with Oracle DB instances.
Version 1.18
9 Dec 2022
Version 1.18 of the Synthesized TDK.
feature Mapping expressions on read and on write
Allows transforming columns as they are being read from the input database and written to the output database.
For example, a column might need a cast to a different type on read and then a cast back to the original type on write. The following configuration might be provided to address that:
tables:
  - table_name_with_schema: "test_schema.test_table"
    mode: "MASKING"
    transformations:
      - columns: ["my_binary_column"]
        mapping:
          read: "cast(? as char)"
          write: "cast(? as binary)"
The my_binary_column will be cast to the char type on read, and the result of the transformation will be cast back to the binary type on write.
feature Support for unsigned numeric types
TDK is now aware of unsigned numeric types supported by some DBMS (MySQL, Oracle).
Version 1.17
28 Nov 2022
Version 1.17 of the Synthesized TDK.
feature Filter Schemas
Introduced the ability to define the list of schemas to process:
schemas: array of String. If not set or null, all schemas available to the source database user will be processed.
Example:
default_config:
  mode: GENERATION
  target_ratio: 2.0
schemas: ["accounts", "payments"]
schema_creation_mode: CREATE_IF_NOT_EXISTS
table_truncation_mode: TRUNCATE
feature JSON transformer
json_pointer_transformer transforms JSON value nodes indicated by JSON pointers; the rest of the values are kept as is:
transformations:
  - columns: ["productspec"]
    params:
      type: "json_pointer_transformer"
      specifications:
        - pointers: [ "/sku" ]
          transformation:
            type: "format_preserving_hashing"
        - pointers: [ "/tags/0" ]
          transformation:
            type: "format_preserving_hashing"
      ignore_errors: true
Refer to JSON Pointer Transformer for more details.
enhancement Better tuned default rules for generation from empty database
Introduced more empirical default rules for generation from empty database. If a source table is empty and the generated field is not null and no user configuration provided for this field, then reasonable defaults for generators are chosen for the following data types:
- DATE: a random date from 1970-01-01 to 2030-01-01
- NUMERIC: a random integer from 1 to 100
- ANY (blobs and binary arrays): a single random byte
Version 1.16
17 Nov 2022
Version 1.16 of the Synthesized TDK.
enhancement Improved Yaml Configuration Structure
This change breaks compatibility with the previous version.
The following yaml configuration parameters have been renamed:
- column_params → transformations
- user_table_configs → tables
Example before:
default_config:
  mode: MASKING
  target_ratio: 1.0
user_table_configs:
  - table_name_with_schema: "public.delivery"
    column_params:
      - columns: ["status"]
        params:
          type: categorical_generator
Example after:
default_config:
  mode: MASKING
  target_ratio: 1.0
tables:
  - table_name_with_schema: "public.delivery"
    transformations:
      - columns: ["status"]
        params:
          type: categorical_generator
Version 1.15
11 Nov 2022
Version 1.15 of the Synthesized TDK.
feature Testcontainers Integration
By combining Testcontainers with Synthesized TDK, developers can populate any Testcontainers database with synthetically generated data, enabling rapid development of tests for logic which involves interaction with the database. Refer to documentation for more details.
feature Boost performance using working directory on a local file system
A transient local storage area can now be configured to speed up TDK operations. Refer to documentation for more details.
feature --json-schema parameter for CLI
Introduced the --json-schema parameter for the CLI, which prints the JSON Schema for the YAML configuration. This schema can be used in an IDE to provide auto-completion for your YAML and to validate your configuration before a run.
feature Output alphabets in format_preserving_hashing
Introduced the ability to define the output alphabets in format_preserving_hashing with unicode_block and unicode_range.
Refer to custom alphabets for more details.
enhancement Better tuned default rules for generation from empty database
Introduced more empirical default rules for generation from empty database.
enhancement Oracle user permissions
Added the minimum database permissions required to run Synthesized TDK with Oracle database, see Oracle permissions.
enhancement Preserve null values for all masking transformers
Preserve null values for all masking transformers.
bugfix Make CategoricalGenerator’s masking mode the same as in generation
Fixed the behaviour where CategoricalGenerator did not preserve the input probabilities in masking mode.
Version 1.14
30 Sep 2022
Version 1.14 of the Synthesized TDK.
feature Significant performance improvement
Significant performance improvement for MASKING and GENERATION modes for all supported databases.
feature GENERATION mode for empty tables
To use GENERATION mode for empty tables, specify target_row_number at the global or table level:
target_row_number: optional Integer (int64).
The absolute size of the output table in rows.
This parameter is applicable only for GENERATION mode.
If not provided, target_ratio will be used.
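A minimal sketch of both levels is shown below. The table name and row counts are hypothetical, and the exact placement of target_row_number is assumed to match that of target_ratio in other examples.

```yaml
default_config:
  mode: GENERATION
  target_row_number: 1000                        # global default: 1000 rows per table
tables:
  - table_name_with_schema: "public.customer"    # hypothetical table
    target_row_number: 50000                     # table-level override
```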
Version 1.13
16 Sep 2022
Version 1.13 of the Synthesized TDK.
enhancement Improved performance of all transformations up to 30%
Improved performance of all transformations up to 30%.
enhancement Improved data quality and reduced memory consumption of date_generator
Improved data quality and reduced memory consumption of date_generator.
Version 1.12
2 Sep 2022
Version 1.12 of the Synthesized TDK.
enhancement New unique_hashing algorithm
New unique_hashing algorithm:
- provides random bijective (one-to-one) mapping between unique identifiers of the input and the output databases
- prevents key collisions – unique input keys are mapped to unique output keys
- 2x performance improvement in MASKING scenarios due to the absence of key collisions.
enhancement Microsoft SQL Server IDENTITY property support
Microsoft SQL Server IDENTITY property support.
Version 1.11
23 Aug 2022
Version 1.11 of the Synthesized TDK.
enhancement New format_preserving_hashing Algorithm
New format_preserving_hashing algorithm:
- supports Unicode’s Basic Multilingual Plane in input and output data
- maximum length of input text is increased to 232 characters
- performance increased by 30-60% (the boost may vary depending on the message size and hashing groups configuration).
enhancement Required Database Permissions
Database permissions page describing the minimum database permissions required to run Synthesized TDK.
enhancement length_exceeded_mode for fake generators
A new parameter, length_exceeded_mode, is added for address_generator and person_generator. It allows generated values to be truncated to the column size.
For example, if the country column is varchar(20), and the generated value is "Saint Vincent And The Grenadines":
- length_exceeded_mode: IGNORE (default) fails to insert "Saint Vincent And The Grenadines" into the varchar(20) column
- length_exceeded_mode: TRUNCATE inserts the truncated value "Saint Vincent And Th" into the varchar(20) column.
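The two modes can be sketched in a few lines of Python. The helper below is hypothetical and only mirrors the behaviour described above: TRUNCATE cuts the value to the column size, while IGNORE leaves it untouched, so the database would reject it on insert.

```python
def fit_to_column(value: str, max_length: int,
                  length_exceeded_mode: str = "IGNORE") -> str:
    """Hypothetical helper mirroring the two documented modes."""
    if length_exceeded_mode == "TRUNCATE":
        # Keep only as many characters as the column can hold.
        return value[:max_length]
    # IGNORE: the value is passed through unchanged; an oversized
    # value would then fail when inserted into the column.
    return value


country = "Saint Vincent And The Grenadines"
truncated = fit_to_column(country, 20, "TRUNCATE")
untouched = fit_to_column(country, 20)
```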
Version 1.10
12 Aug 2022
Version 1.10 of the Synthesized TDK.
feature BigID integration
For more information, see BigID integration.
feature CREATE_IF_NOT_EXISTS support for Microsoft SQL Server
CREATE_IF_NOT_EXISTS schema creation mode support for Microsoft SQL Server.
feature Autogenerated Documentation
Transformations and YAML configuration are now autogenerated and up-to-date.
enhancement Quick Start
Getting Started and Installation sections in the documentation with H2 demo database.
enhancement noising and continuous_generator enhancement
Columns with a single value do not fail with an error, but are kept as-is for MASKING mode and filled with nulls for GENERATION.
enhancement Performance improvement for fake generators
Huge performance improvement for address_generator and person_generator.
Version 1.9
29 Jul 2022
Version 1.9 of the Synthesized TDK.
feature Advanced format_preserving_hashing configuration
A hash transformation is applied to each character in a given text that belongs to a configured group, so that the output preserves the format but contains different characters. This transformation is secure and non-reversible.
Parameters:
- groups: List<FormatPreservingHashingGroup>: The pair of selector and list of alphabet. selector is used to choose characters from the input string; alphabet is a set of characters used to replace the source ones.
- filter: Filters are used to mask only a specified substring and keep other characters as is (e.g., mask only the last 5 characters).
Available character selectors:
- numeric
- lower_letters
- upper_letters
- regex
Available alphabets:
- numeric
- lower_letters
- upper_letters
- custom
Available filters:
- first: Mask only the first n characters.
- last: Mask only the last n characters.
- characters: Mask only specified characters. Parameters: characters (set of characters to mask), ignore_case (default: false; indicates if case is taken into account).
- substring: Mask all occurrences of the specified substring. Parameters: substring (substring to mask), ignore_case (default: false; indicates if case is taken into account).
- regex: Mask only characters matched by the specified regex pattern. Parameters: pattern (regex pattern to find characters to mask), ignore_case (default: false; indicates if case is taken into account).
For more information, see Transformations.
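As a rough illustration of the idea, the following Python sketch replaces each character with another character from the same group and applies a "last n" filter. It uses HMAC-SHA256 as a stand-in keyed hash; the real format_preserving_hashing algorithm, its key handling, and its guarantees are not described here, so every name below is an assumption. Unlike the real transformer, this sketch may occasionally map a character to itself.

```python
import hashlib
import hmac
import string

# Groups where the selector and the replacement alphabet coincide.
ALPHABETS = {
    "numeric": string.digits,
    "lower_letters": string.ascii_lowercase,
    "upper_letters": string.ascii_uppercase,
}


def mask_char(ch: str, position: int, key: bytes) -> str:
    """Replace ch with a character from its own group; keep others as is."""
    for alphabet in ALPHABETS.values():
        if ch in alphabet:
            message = f"{position}:{ch}".encode()
            digest = hmac.new(key, message, hashlib.sha256).digest()
            index = int.from_bytes(digest, "big") % len(alphabet)
            return alphabet[index]
    # Characters outside every group (e.g. "-") are left untouched.
    return ch


def mask_last(text: str, n: int, key: bytes) -> str:
    """Mimic the `last` filter: mask only the last n characters."""
    start = max(len(text) - n, 0)
    masked_tail = "".join(
        mask_char(ch, pos, key) for pos, ch in enumerate(text[start:], start)
    )
    return text[:start] + masked_tail


masked = mask_last("AB-1234-xyz", 5, b"demo-key")
```

The output has the same length as the input, the first six characters are unchanged, and each masked position keeps its character class (digit, separator, lowercase letters).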
feature constant_numeric, constant_date, constant_string, constant_boolean generators
Added new constant generators for numeric, date, string, boolean data types.
For more information, see Transformations.
feature Numeric type for categorical_generator
categorical_generator supports numeric columns.
For more information, see Transformations.
Version 1.8
8 Jul 2022
Version 1.8 of the Synthesized TDK.
enhancement New Documentation
In addition to the nicer appearance, many pages and YAML examples are now generated automatically from source code and tests, which reduces the number of mistakes and keeps the documentation up-to-date with the product version.
Enjoy!
enhancement Performance Improvement for MASKING
This release includes a significant performance improvement for MASKING mode with target_ratio: 1.0.
bugfix Handle Empty Tables
Fixed issues with processing empty tables in MASKING and GENERATION modes.
Version 1.7
17 Jun 2022
Version 1.7 of the Synthesized TDK.
feature Custom Database Types Support
To support custom database types:
- Use an output database with an already created schema and its child objects; see DO_NOT_CREATE in YAML configuration for more details.
- Explicitly define a generator for the custom type column in the configuration file.
For example, for the following custom ENUM type:
CREATE TYPE public.transaction_type_t AS ENUM ('SENT', 'RECEIVED');
Use a configuration like this:
transformations:
  - columns:
      - "transaction_type"
    params:
      type: "categorical_generator"
      categories:
        values:
          - "SENT"
          - "RECEIVED"
        probabilities:
          - 0.6
          - 0.4
For more information, see Custom database types.
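The effect of the values and probabilities lists in the configuration above can be pictured with Python's standard weighted sampling. This is only a sketch of weighted categorical sampling in general; the function name and the seed are assumptions, and it does not reproduce how categorical_generator is implemented.

```python
import random

values = ["SENT", "RECEIVED"]
probabilities = [0.6, 0.4]


def sample_categories(n: int, seed: int = 0) -> list:
    """Draw n category values with the configured probabilities."""
    rng = random.Random(seed)
    return rng.choices(values, weights=probabilities, k=n)


sample = sample_categories(10_000)
# The observed share of "SENT" should be close to the configured 0.6.
share_sent = sample.count("SENT") / len(sample)
```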
feature Constant Generator
Generate a single numeric value for the entire column.
Parameters:
- value: Number?: numeric value to generate
Compatible modes: GENERATION, MASKING
Compatible column data types: NUMERIC
Supports multiple columns: No
Example:
transformations:
  - columns: [ "balance" ]
    params:
      type: "constant"
      value: 0.0
For more information, see Transformations.
feature BIGINT and SMALLINT Support
BIGINT and SMALLINT data type support for GENERATION, MASKING, and KEEP modes.
feature Global Seed Parameter
global_seed sets the seed for random number generators.
A 32-bit integer value between -2147483648 and 2147483647, used as a seed for random number generators. The result of generation must be the same each time the generation is run with the same seed and workflow configuration. By default, global_seed is 0.
Example:
default_config:
  mode: "MASKING"
  target_ratio: 1.0
global_seed: 42
For more information, see YAML configuration.
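The reproducibility guarantee can be illustrated with a seeded generator from the Python standard library. The function below is a hypothetical stand-in for a generation run; it only shows that the same seed yields the same values and that the seed must fit the documented signed 32-bit range.

```python
import random

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1  # -2147483648 .. 2147483647


def generate_balances(global_seed: int = 0, count: int = 5) -> list:
    """Hypothetical generation run driven by a single seed."""
    if not INT32_MIN <= global_seed <= INT32_MAX:
        raise ValueError("global_seed must fit in a signed 32-bit integer")
    rng = random.Random(global_seed)
    return [rng.randint(0, 1000) for _ in range(count)]


first_run = generate_balances(global_seed=42)
second_run = generate_balances(global_seed=42)
# Same seed and same configuration: identical output on every run.
runs_match = first_run == second_run
```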
Version 1.6
10 Jun 2022
Version 1.6 of the Synthesized TDK.
enhancement Performance Improvements
This release includes significant rework of transformation execution internals, bringing the following benefits to end users:
- Heavy parallelization of transformations and database operations. To the extent the logic of a transformation permits, operations are performed in parallel. This results in better hardware utilization and reduced latencies.
- Memory consumption optimization. The solution can now handle tables whose size noticeably exceeds the main memory size of the process itself.
Version 1.5
Version 1.4
7 Jun 2022
Version 1.4 of the Synthesized TDK.
feature License Expiration API endpoint
The license expiration can be requested via API:
curl -X 'GET' \
  "http://${API_SERVICE_URL}:${API_SERVICE_PORT}/api/v1/license-expiration" \
  -H 'accept: */*'
Where:
- API_SERVICE_URL is the endpoint of the service. If running locally, this will likely be localhost.
- API_SERVICE_PORT is the port exposed for the service. The default port is 8081.
If the service is up and running correctly, you should receive a 200 status with the body containing information like:
{"expiry_date":"2023-06-01"}
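A client could turn a response body of this shape into a number of remaining days. The sketch below works on the body as a string, so no network call is made; the function name is hypothetical, and it assumes expiry_date is always an ISO 8601 date, as in the sample above.

```python
import json
from datetime import date


def days_until_expiry(body: str, today: date) -> int:
    """Parse the response body and return the days left on the license."""
    expiry = date.fromisoformat(json.loads(body)["expiry_date"])
    return (expiry - today).days


sample_body = '{"expiry_date":"2023-06-01"}'
remaining = days_until_expiry(sample_body, today=date(2023, 5, 1))
```

A negative result would mean the license has already expired relative to the given date.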
Version 1.3
20 May 2022
Version 1.3 of the Synthesized TDK.
feature Google Secret Manager Integration
The database credentials can be provided from Google Secret Manager:
"password": { "type": gcp, "project": "${GCP_PROJECT_ID}", "secret": "${SECRET_ID}", "version": "${VERSION_ID}" }
feature Append Data
A new table_truncation_mode:
- IGNORE: if this mode is selected, the status of the output database is ignored. Existing data in the output database is not deleted; newly generated data is appended to it.
For more information, see YAML configuration.
feature Locale For Address and Person Generators
- locale: String = 'en-GB': To generate names and addresses from different geographical areas, the user can change this parameter. Defaults to 'en-GB', which corresponds to British names.
Supported locales:
bg
ca
ca-CAT
da-DK
de
de-AT
de-CH
en
en-AU
en-au-ocker
en-BORK
en-CA
en-GB
en-IND
en-MS
en-NEP
en-NG
en-NZ
en-PAK
en-SG
en-UG
en-US
en-ZA
es
es-MX
fa
fi-FI
fr
he
hu
in-ID
it
ja
ko
nb-NO
nl
pl
pt
pt-BR
ru
sk
sv
sv-SE
tr
uk
vi
zh-CN
zh-TW
For more information, see Transformations.
enhancement Null Generator by Default
For currently unsupported types, such as the XML datatype, null_generator will be used by default.
enhancement Stop Workflow API Endpoint
Added ways to stop the workflow using workflow_id and workflow_run_id. Improved error handling.
enhancement Ability to Process a Subset of Tables
Removed the comparison between input and output schemas. This allows processing a subset of the input tables.
Version 1.2
29 Apr 2022
Version 1.2 of the Synthesized TDK.
feature Schema Truncation Mode
There are two table truncation modes:
- DO_NOT_TRUNCATE: (default) if this mode is selected, tables in the output database won’t be truncated. An empty output database is required.
- TRUNCATE: if this mode is selected, tables in the output database will be truncated.
Usage example for table_truncation_mode:
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
table_truncation_mode: "TRUNCATE"
feature Support CHAR Primary Keys
MASKING mode for tables with CHAR primary keys can be used without any additional configuration. In previous versions, the passthrough transformation was used as a workaround.
feature Support Composite Keys
Composite primary and foreign keys can be automatically handled without any additional configuration. In previous versions, foreign_key_generator was used as a workaround.
enhancement Advanced Subsetting
Advanced subsetting implementation for KEEP and MASKING modes. In previous versions, some of the tables were empty after subsetting.
enhancement CLI Parameters
Changed CLI parameters from camelCase to kebab-case:
Usage: engine-lite [-hV] [-c=<config-file>] [-ip=<input-password>]
                   -iu=<input-url> [-iU=<input-username>]
                   [-op=<output-password>] -ou=<output-url>
                   [-oU=<output-username>]
TDK engine lite.
  -c, --config-file=<config-file>
                  Configuration file
  -h, --help      Show this help message and exit.
  -ip, --input-password=<input-password>
                  Input password, default to null
  -iu, --input-url=<input-url>
                  JDBC URL to the INPUT database
  -iU, --input-username=<input-username>
                  Input username, default to null
  -op, --output-password=<output-password>
                  Output password, default to null
  -ou, --output-url=<output-url>
                  JDBC URL to the OUTPUT database
  -oU, --output-username=<output-username>
                  Output username, default to null
  -V, --version   Print version information and exit.
Version 1.1
15 Apr 2022
Version 1.1 of the Synthesized TDK.
feature Schema creation mode
There are four schema creation modes:
- CREATE_IF_NOT_EXISTS: (default) if this mode is selected, the DDL schema will be copied from the source database to the target one if it does not exist; otherwise, the existing schema will be used.
- DO_NOT_CREATE: if this mode is selected, the existing schema will be used.
- CREATE: if this mode is selected, the DDL schema will be copied from the source database to the target one. The target database should be empty.
- DROP_AND_CREATE: if this mode is selected, the DDL schema will be copied from the source database to the target one. The existing schema in the target database will be dropped. Please use this mode carefully.
Note: If the CREATE_IF_NOT_EXISTS or DO_NOT_CREATE mode is used, the target schema should be equal to the source one.
feature Address generator
Generate address fields (e.g. street, zip code) and keep them consistent across columns.
Parameters:
- column_templates: List<String>: For each column, the template to be used to generate address data
- consistent_with_column: String?: If given, the column that the generated values need to be consistent with. For example, if consistent_with_column="user_id", all people with the same user_id will have the same street
Available templates are:
- ${zip_code}
- ${country}
- ${city}
- ${street_name}
- ${house_number}
- ${flat_number}
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: STRING
Supports multiple columns: Yes
Example for multiple columns:
transformations:
  - columns: ["street_name", "zip_code"]
    params:
      type: "address_generator"
      column_templates: ["${street_name}", "${zip_code}"]
Example for a single column:
transformations:
  - columns: ["address"]
    params:
      type: "address_generator"
      column_templates: ["${country}, ${city}, ${street_name}, ${house_number}, ${flat_number}, ${zip_code}"]
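The way one generated address can fill several column templates consistently can be sketched with Python's string.Template, which happens to use the same ${name} placeholder syntax. The function name and the sample address values are made up for illustration; how address_generator actually produces and substitutes values is not covered here.

```python
from string import Template


def render_address(column_templates, address):
    """Fill every column template from one and the same address record."""
    return [Template(template).substitute(address) for template in column_templates]


# Hypothetical generated address; every column is rendered from this record,
# which keeps the columns consistent with each other.
address = {
    "country": "United Kingdom",
    "city": "London",
    "street_name": "Baker Street",
    "house_number": "221",
    "flat_number": "2",
    "zip_code": "NW1 6XE",
}

two_columns = render_address(["${street_name}", "${zip_code}"], address)
single_column = render_address(
    ["${country}, ${city}, ${street_name}, ${house_number}, ${flat_number}, ${zip_code}"],
    address,
)
```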
feature Cycle resolution strategy
There are two cycle resolution strategies:
- FAIL: (default) if this mode is selected, cycle_breaker_references should be provided in the configuration file. Otherwise, execution will fail if it detects a circular reference.
- DELETE_NOT_REQUIRED: if this mode is selected, cyclic references will be resolved automatically by removing the last nullable reference leading to the cycle.
Example for FAIL mode:
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
tables:
  - table_name_with_schema: "employees"
    cycle_breaker_references: ["employees"]
cycle_resolution_strategy: "FAIL"
Where the employees table contains a cycle reference.
Example for DELETE_NOT_REQUIRED mode:
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
cycle_resolution_strategy: "DELETE_NOT_REQUIRED"
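The idea behind DELETE_NOT_REQUIRED can be sketched as a small graph routine: find a cycle among foreign-key references and drop the last nullable reference on it, repeating until no cycle remains. This is one possible reading of "the last nullable reference leading to the cycle"; the data layout, function names, and traversal order are assumptions, not the TDK implementation.

```python
def find_cycle(references):
    """Return the references forming one cycle, in path order, or None."""
    outgoing = {}
    for ref in references:
        outgoing.setdefault(ref[0], []).append(ref)
    done = set()  # tables fully explored without finding a cycle

    def visit(table, path):
        # `path` holds the references followed from the search root to `table`.
        tables_on_path = [ref[0] for ref in path] + [table]
        for ref in outgoing.get(table, []):
            target = ref[1]
            if target in tables_on_path:
                return (path + [ref])[tables_on_path.index(target):]
            if target not in done:
                found = visit(target, path + [ref])
                if found:
                    return found
        done.add(table)
        return None

    for table in list(outgoing):
        if table not in done:
            found = visit(table, [])
            if found:
                return found
    return None


def break_cycles(references):
    """Drop the last nullable reference of each cycle until none remain.

    Each reference is a (from_table, to_table, nullable) tuple.
    """
    remaining = list(references)
    while True:
        cycle = find_cycle(remaining)
        if cycle is None:
            return remaining
        nullable = [ref for ref in cycle if ref[2]]
        if not nullable:
            raise ValueError("cycle has no nullable reference to remove")
        remaining.remove(nullable[-1])


# A self-referencing table, like the employees example above.
self_reference = break_cycles([("employees", "employees", True)])
# A three-table cycle where only two references are nullable.
three_tables = break_cycles([("a", "b", False), ("b", "c", True), ("c", "a", True)])
```

In FAIL mode the equivalent decision is made by the user through cycle_breaker_references rather than by an automatic rule like this one.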