Changelog
Version 1.9
29 Jul 2022
Version 1.9 of the Synthesized TDK.
feature Advanced format_preserving_hashing configuration
A hash transformation is applied to each character of the input text that belongs to a configured group, so that the output preserves the format but contains different characters. This transformation is secure and non-reversible.
Parameters:
- groups: List<FormatPreservingHashingGroup>: A list of groups, each pairing a selector with an alphabet. The selector chooses characters from the input string; the alphabet is the set of characters used to replace them.
- filter: Filters are used to mask only a specified substring and keep other characters as is (e.g., mask only the last 5 characters).
Available character selectors:
- numeric
- lower_letters
- upper_letters
- regex
Available alphabets:
- numeric
- lower_letters
- upper_letters
- custom
Available filters:
- first - Mask only the first n characters.
- last - Mask only the last n characters.
- characters - Mask only specified characters. Parameters: characters - set of characters to mask, ignore_case (default: false) - indicates if case is taken into account.
- substring - Mask all occurrences of the specified substring. Parameters: substring - substring to mask, ignore_case (default: false) - indicates if case is taken into account.
- regex - Mask only characters matched by the specified regex pattern. Parameters: pattern - regex pattern to find characters to mask, ignore_case (default: false) - indicates if case is taken into account.
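How selectors, alphabets, and a filter interact can be sketched in a few lines of Python. This is an illustration only, not the TDK implementation: the function name fp_hash, the GROUPS table, the key, and the choice of HMAC-SHA256 are all assumptions, and only the numeric, lower_letters, and upper_letters groups plus a last-n filter are modelled.

```python
import hashlib
import hmac
import string
from typing import Optional

# Hypothetical groups: (characters the selector picks, replacement alphabet).
GROUPS = [
    (string.digits, string.digits),
    (string.ascii_lowercase, string.ascii_lowercase),
    (string.ascii_uppercase, string.ascii_uppercase),
]


def fp_hash(text: str, key: bytes, last: Optional[int] = None) -> str:
    """Replace every selected character with one from its group's alphabet.

    `last` mimics the `last` filter: only the final `last` characters are masked.
    """
    start = 0 if last is None else max(len(text) - last, 0)
    data = text.encode("utf-8")
    out = []
    for i, ch in enumerate(text):
        alphabet = next((a for sel, a in GROUPS if ch in sel), None)
        if i < start or alphabet is None:
            out.append(ch)  # outside the filter or not selected: keep as is
            continue
        # Keyed digest over the whole input plus the position: deterministic
        # for the same input and key, but not reversible from the output.
        mac = hmac.new(key, data + i.to_bytes(4, "big"), hashlib.sha256).digest()
        out.append(alphabet[int.from_bytes(mac[:8], "big") % len(alphabet)])
    return "".join(out)


masked = fp_hash("AB-1234-xy", key=b"demo-key")
print(masked)  # same shape as the input: two upper-case, four digits, two lower-case
```

Because each character is replaced from the alphabet of its own group, separators such as the hyphens stay in place and the output keeps the input's format.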
For more information, see Transformations List.
feature constant_numeric, constant_date, constant_string, constant_boolean generators
Added new constant generators for numeric, date, string, boolean data types.
For more information, see Transformations List.
feature Numeric type for categorical_generator
categorical_generator supports numeric columns.
For more information, see Transformations List.
Version 1.8
8 Jul 2022
Version 1.8 of the Synthesized TDK.
enhancement New Documentation
In addition to the nice appearance, many pages and YAML examples are now generated automatically from source code and tests, which reduces the number of mistakes and keeps the documentation up to date with the product version.
Enjoy!
enhancement Performance Improvement for MASKING
This release includes a significant performance improvement for MASKING mode with target_ratio: 1.0.
bugfix Handle Empty Tables
Fixed issues with processing empty tables in MASKING and GENERATION modes.
Version 1.7
17 Jun 2022
Version 1.7 of the Synthesized TDK.
feature Custom Database Types Support
To support custom database types:
- Use an output database with an already created schema and its child objects; see DO_NOT_CREATE in Configuration File for more details.
- Explicitly define a generator for the custom type column in the configuration file.
For example, for the following custom ENUM type:
CREATE TYPE public.transaction_type_t AS ENUM ('SENT', 'RECEIVED');
Use a configuration like this:
column_params:
  - columns:
      - "transaction_type"
    params:
      type: "categorical_generator"
      categories:
        type: string
        values:
          - "SENT"
          - "RECEIVED"
      probabilities:
        - 0.6
        - 0.4
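To illustrate what the values and probabilities in this configuration mean, here is a small sketch that uses Python's standard library rather than the TDK itself. The variable names and the fixed seed are assumptions made for the example; it simply draws values with the given weights and reports the observed share.

```python
import random
from collections import Counter

values = ["SENT", "RECEIVED"]
probabilities = [0.6, 0.4]  # one weight per value, in the same order

rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.choices(values, weights=probabilities, k=10_000)

counts = Counter(sample)
share_sent = counts["SENT"] / len(sample)
print(f"SENT share: {share_sent:.2f}")  # close to the configured 0.6
```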
For more information, see Custom database types.
feature Constant Generator
Generate a single numeric value for the entire column.
Parameters:
- value: Number?: numeric value to generate
Compatible modes: GENERATION, MASKING
Compatible column data types: NUMERIC
Supports multiple columns: No
Example:
column_params:
  - columns: [ "balance" ]
    params:
      type: "constant"
      value: 0.0
For more information, see Transformations List.
feature BIGINT and SMALLINT Support
BIGINT and SMALLINT data type support for GENERATION, MASKING, and KEEP modes.
feature Global Seed Parameter
Added the global_seed parameter to set the seed for random number generators.
A 32-bit integer value between -2147483648 and 2147483647, used as a seed for random number generators. The result of generation must be the same each time generation is run with the same seed and workflow configuration. By default, global_seed is 0.
Example:
default_config:
  mode: "MASKING"
  target_ratio: 1.0
global_seed: 42
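The reproducibility guarantee of a seed can be illustrated with Python's own random module. This is only a sketch of the idea, not of the TDK internals: the generate function and its range check are hypothetical, with the default of 0 and the signed 32-bit bounds taken from the description above.

```python
import random
from typing import List

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1  # -2147483648 .. 2147483647


def generate(seed: int = 0, n: int = 5) -> List[int]:
    """Return n pseudo-random values that depend only on the seed."""
    if not INT32_MIN <= seed <= INT32_MAX:
        raise ValueError("seed must fit in a signed 32-bit integer")
    rng = random.Random(seed)  # private generator, no global state involved
    return [rng.randint(0, 999) for _ in range(n)]


first_run = generate(seed=42)
second_run = generate(seed=42)
print(first_run == second_run)  # same seed, same values
```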
For more information, see Configuration File.
Version 1.6
10 Jun 2022
Version 1.6 of the Synthesized TDK.
enhancement Performance Improvements
This release includes significant rework of transformation execution internals, bringing the following benefits to end users:
- Heavy parallelization of transformations and database operations. To the extent the logic of a transformation permits, operations are performed in parallel. That results in better hardware utilization and reduced latencies.
- Memory consumption optimization. The solution can now handle tables with sizes noticeably exceeding the main memory size of the process itself.
Version 1.5
Version 1.4
7 Jun 2022
Version 1.4 of the Synthesized TDK.
feature License Expiration API endpoint
The license expiration can be requested via API:
curl -X 'GET' \
'http://${API_SERVICE_URL}:${API_SERVICE_PORT}/api/v1/license-expiration' \
-H 'accept: */*'
Where:
- API_SERVICE_URL is the endpoint of the service. If running locally, this will likely be localhost.
- API_SERVICE_PORT is the port exposed for the service. The default port is 8081.
If the service is up and running correctly, you should receive a 200 status with the body containing information like:
{"expiry_date":"2023-06-01"}
For more information, see License Expiration.
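A client could turn the response body shown above into a "days remaining" figure. The sketch below works on the body text only and leaves the HTTP call out; the function name days_until_expiry is hypothetical, and it assumes expiry_date is always an ISO date as in the sample response.

```python
import json
from datetime import date


def days_until_expiry(body: str, today: date) -> int:
    """Parse the response body and return the number of days left.

    A negative result means the license has already expired.
    """
    expiry = date.fromisoformat(json.loads(body)["expiry_date"])
    return (expiry - today).days


body = '{"expiry_date":"2023-06-01"}'
print(days_until_expiry(body, today=date(2023, 5, 1)))  # 31
```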
feature Boolean Data Type Support
BOOLEAN data type support for GENERATION, MASKING, and KEEP modes.
enhancement Configuration File Upload
YAML configuration can be uploaded as a file via API.
For more information, see Create Workflow.
Version 1.3
20 May 2022
Version 1.3 of the Synthesized TDK.
feature Google Secret Manager Integration
The database credentials can be provided from Google Secret Manager:
"password": { "type": gcp, "project": "${GCP_PROJECT_ID}", "secret": "${SECRET_ID}", "version": "${VERSION_ID}" }
For more information, see Database Credentials.
feature Append Data
A new table_truncation_mode:
- IGNORE: if this mode is selected, the state of the output database is ignored. Existing data in the output database is not deleted; newly generated data is appended to it.
For more information, see Configuration File.
feature Locale For Address and Person Generators
- locale: String = 'en-GB': To generate names and addresses from different geographical areas, the user can change this parameter. Defaults to 'en-GB', which corresponds to British names.
Supported locales:
bg
ca
ca-CAT
da-DK
de
de-AT
de-CH
en
en-AU
en-au-ocker
en-BORK
en-CA
en-GB
en-IND
en-MS
en-NEP
en-NG
en-NZ
en-PAK
en-SG
en-UG
en-US
en-ZA
es
es-MX
fa
fi-FI
fr
he
hu
in-ID
it
ja
ko
nb-NO
nl
pl
pt
pt-BR
ru
sk
sv
sv-SE
tr
uk
vi
zh-CN
zh-TW
For more information, see Transformations List.
enhancement Null Generator by Default
For currently unsupported types, such as the XML data type, null_generator will be used by default.
enhancement Stop Workflow API Endpoint
Added ways to stop the workflow using workflow_id and workflow_run_id. Improved error handling.
For more information, see Stop Workflow.
enhancement Ability to Process a Subset of Tables
Removed the comparison between the input and output schemas. This makes it possible to process a subset of the input tables.
Version 1.2
29 Apr 2022
Version 1.2 of the Synthesized TDK.
feature Schema Truncation Mode
There are two table truncation modes:
- DO_NOT_TRUNCATE: (default) if this mode is selected, tables in the output database won’t be truncated. An empty output database is required.
- TRUNCATE: if this mode is selected, tables in the output database will be truncated.
Usage example for table_truncation_mode:
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
table_truncation_mode: "TRUNCATE"
feature Support CHAR Primary Keys
MASKING mode for tables with CHAR primary keys can be used without any additional configuration. In previous versions, the passthrough transformation was used as a workaround.
feature Support Composite Keys
Composite primary and foreign keys can be automatically handled without any additional configuration. In previous versions, foreign_key_generator was used as a workaround.
enhancement Advanced Subsetting
Advanced subsetting implementation for KEEP and MASKING modes. In previous versions, some of the tables were empty after subsetting.
enhancement CLI Parameters
Changed CLI parameters from camelCase to kebab-case:
Usage: engine-lite [-hV] [-c=<config-file>] [-ip=<input-password>]
                   -iu=<input-url> [-iU=<input-username>]
                   [-op=<output-password>] -ou=<output-url>
                   [-oU=<output-username>]
TDK engine lite.
  -c, --config-file=<config-file>
                  Configuration file
  -h, --help      Show this help message and exit.
  -ip, --input-password=<input-password>
                  Input password, default to null
  -iu, --input-url=<input-url>
                  JDBC URL to the INPUT database
  -iU, --input-username=<input-username>
                  Input username, default to null
  -op, --output-password=<output-password>
                  Output password, default to null
  -ou, --output-url=<output-url>
                  JDBC URL to the OUTPUT database
  -oU, --output-username=<output-username>
                  Output username, default to null
  -V, --version   Print version information and exit.
Version 1.1
15 Apr 2022
Version 1.1 of the Synthesized TDK.
feature Schema creation mode
There are four schema creation modes:
- CREATE_IF_NOT_EXISTS: (default) if this mode is selected, the DDL schema will be copied from the source database to the target one if it does not exist; otherwise, the existing schema will be used.
- DO_NOT_CREATE: if this mode is selected, the existing schema will be used.
- CREATE: if this mode is selected, the DDL schema will be copied from the source database to the target one. The target database should be empty.
- DROP_AND_CREATE: if this mode is selected, the DDL schema will be copied from the source database to the target one. The existing schema in the target database will be dropped. Please use this mode carefully.
Note: If the CREATE_IF_NOT_EXISTS or DO_NOT_CREATE mode is used, the target schema should be equal to the source one.
feature Address generator
Generate address fields (e.g. street, zip code) and keep them consistent across columns.
Parameters:
- column_templates: List<String>: For each column, the template to be used to generate address data.
- consistent_with_column: String?: If given, the column that the generated values need to be consistent with. For example, if consistent_with_column="user_id", all people with the same user_id will have the same street.
Available templates are:
- ${zip_code}
- ${country}
- ${city}
- ${street_name}
- ${house_number}
- ${flat_number}
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: STRING
Supports multiple columns: Yes
Example for multiple columns:
column_params:
  - columns: ["street_name", "zip_code"]
    params:
      type: "address_generator"
      column_templates: ["${street_name}", "${zip_code}"]
Example for a single column:
column_params:
  - columns: ["address"]
    params:
      type: "address_generator"
      column_templates: ["${country}, ${city}, ${street_name}, ${house_number}, ${flat_number}, ${zip_code}"]
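The ${...} placeholders use the same syntax as Python's string.Template, which makes the templating idea easy to sketch. Everything below is illustrative and not how the TDK produces addresses: the sample field values, the fake_address_fields and render helpers, and the use of a seeded generator to model consistent_with_column are all assumptions.

```python
import random
from string import Template
from typing import Dict, List, Optional, Union


def fake_address_fields(rng: random.Random) -> Dict[str, str]:
    """Draw one made-up address; the value pools are placeholders."""
    return {
        "zip_code": str(rng.randint(10000, 99999)),
        "country": "United Kingdom",
        "city": rng.choice(["London", "Leeds", "Bristol"]),
        "street_name": rng.choice(["High Street", "Station Road", "Church Lane"]),
        "house_number": str(rng.randint(1, 200)),
        "flat_number": str(rng.randint(1, 40)),
    }


def render(column_templates: List[str],
           consistency_key: Optional[Union[int, str]] = None) -> List[str]:
    """Fill one template per column from a single generated address.

    Seeding by consistency_key means rows sharing a key get the same address,
    a stand-in for the behaviour described for consistent_with_column.
    """
    fields = fake_address_fields(random.Random(consistency_key))
    return [Template(t).substitute(fields) for t in column_templates]


print(render(["${street_name}", "${zip_code}"], consistency_key=7))
print(render(["${country}, ${city}, ${street_name}, ${house_number}, "
              "${flat_number}, ${zip_code}"], consistency_key=7))
```

Both calls draw from the same address because they share a key, so the street and zip code in the single-column output match the two-column output.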
feature Cycle resolution strategy
There are two cycle resolution strategies:
- FAIL: (default) if this mode is selected, cycle_breaker_references should be provided in the configuration file. Otherwise, execution will fail if it detects a circular reference.
- DELETE_NOT_REQUIRED: if this mode is selected, cyclic references will be resolved automatically by removing the last nullable reference leading to the cycle.
Example for FAIL mode:
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
user_table_configs:
  - table_name_with_schema: "employees"
    cycle_breaker_references: ["employees"]
cycle_resolution_strategy: "FAIL"
Where the employees table contains a cycle reference.
Example for DELETE_NOT_REQUIRED mode:
default_config:
  mode: "GENERATION"
  target_ratio: 1.0
cycle_resolution_strategy: "DELETE_NOT_REQUIRED"
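The difference between the two strategies can be sketched on a toy foreign-key graph. This is a simplified model, not the TDK algorithm: the Ref type and both functions are hypothetical, cycle_breaker_references is not modelled, and "last nullable reference" is interpreted here as the last nullable edge on the detected cycle.

```python
from typing import Dict, List, NamedTuple, Optional


class Ref(NamedTuple):
    source: str     # table holding the foreign key
    target: str     # referenced table
    nullable: bool  # whether the foreign key column may be NULL


def find_cycle(refs: List[Ref]) -> Optional[List[Ref]]:
    """Return the references forming one cycle, or None if there is none."""
    graph: Dict[str, List[Ref]] = {}
    for r in refs:
        graph.setdefault(r.source, []).append(r)
    state: Dict[str, str] = {}  # table -> "visiting" | "done"
    path: List[Ref] = []        # references followed from the DFS root

    def visit(table: str) -> Optional[List[Ref]]:
        state[table] = "visiting"
        for r in graph.get(table, []):
            if state.get(r.target) == "visiting":
                # The cycle closes at r.target: keep the references from there on.
                cycle = path + [r]
                start = next(i for i, e in enumerate(cycle) if e.source == r.target)
                return cycle[start:]
            if r.target not in state:
                path.append(r)
                found = visit(r.target)
                if found:
                    return found
                path.pop()
        state[table] = "done"
        return None

    for table in list(graph):
        if table not in state:
            found = visit(table)
            if found:
                return found
    return None


def resolve_cycles(refs: List[Ref], strategy: str = "FAIL") -> List[Ref]:
    """Apply the chosen strategy until no cycle is left."""
    refs = list(refs)
    while (cycle := find_cycle(refs)) is not None:
        if strategy == "FAIL":
            raise ValueError(f"circular reference via {[r.source for r in cycle]}")
        nullable = [r for r in cycle if r.nullable]
        if not nullable:
            raise ValueError("cycle has no nullable reference to remove")
        refs.remove(nullable[-1])  # drop the last nullable reference on the cycle
    return refs


# employees.manager_id -> employees is a nullable self-reference.
refs = [Ref("employees", "employees", True), Ref("orders", "employees", False)]
print(resolve_cycles(refs, strategy="DELETE_NOT_REQUIRED"))
```

With DELETE_NOT_REQUIRED the self-reference on employees is dropped and the reference from orders is kept; with FAIL the same input raises an error instead.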