YAML configuration
User single database config
Object.
User configuration used when working with a single database
Properties
-
default_config
:Default configuration
.
-
tables
: array ofUser-defined table configuration
.
-
cycle_resolution_strategy
:Cycle resolution strategy
.
-
schema_creation_mode
:Schema Creation Mode
.
-
table_truncation_mode
:Table truncation mode
.
-
safety_mode
:Safety Mode
.
-
global_seed
:Integer.
A value used a seed for random number generators. The result of generation must be the same each time the generation is being run with the same seed and workflow configuration. By defaultglobal_seed
is 0.
-
schemas
: array ofString.
List of schemas to process. If not set or null, all available schemas will be processed.
-
metadata
:Additional Metadata
.
User multiple databases config
Object.
User configuration used when working with multiple databases
Properties
-
transformation_configs
:TransformationConfigs
.
-
synchronised_transformations
: array of array ofString.
-
data_source_mapping
:Mapping of data sources
.
Default configuration
Used in: default_config
optional Object.
Consists of the default table configuration parameters that are applied to all tables by default and a list of rules that are conditionally applied to the table.
Properties
-
mode
:Transformation mode
.
-
target_ratio
: optionalNumber (double).
The relative size of each table of the output database with respect to the input. The number of rows of each output table will be computed by multiplying this parameter by the input table size. If not provided, this parameter will be target_ratio = 1, resulting on same size for input and output databases. Can be overridden bytarget_row_number
-
target_row_number
: optionalInteger (int64).
The absolute size of each table of the output database in rows. This parameter is applicable only for GENERATION mode. If not provided,target_ratio
will be used.
-
insert_batch_size
: optionalInteger.
Indicates how many table rows are inserted to the database per each batch operation.
-
use_working_directory
:Boolean.
Whether to use the configured local working directory to perform certain I/O operations more efficiently. In some cases, using working directory can speed up table insertion speed by several times. However, the more tables use the working directory, the more space is needed for it. Note that this setting is only effective when the working directory is configured globally (see Working directory properties).
-
items
: array ofDefault configuration item
.
The rules used to select the appropriate transformation depending on the database metadata if one is not specified in the user’s configuration.
User-defined table configuration
Used in: tables
Object.
The parameters defined in the default configuration are applied to all tables in the database, so there’s no need to configure each table individually. But if needed, the user can override default configuration for any specific table present in the database.
For each table, the user can create a UserTableConfig
and add it to the list tables
of YAML configuration
.
Properties
-
table_name_with_schema
:String.
The name of the table affected by thisUserTableConfig
. This value must contain both the schema name and table name, separated by a dot. In other words, this value must follow the format$schema.$table
. In most cases, this value is case-insensitive (except when a database object has been created with a double-quoted name). Here are some examples of valid values:HR.employees
,SCOT.SALGRADE
,sakila.film
,public.Actor
,db2inst1.SALES
. In these examples,HR
,SCOT
,sakila
,public
anddb2inst1
are schema names, whileemployees
,SALGRADE
,film
,Actor
andSALES
are table names.
In MySQL, a schema is synonymous with a database, as defined in the MySQL Glossary. Therefore, in MySQL, you should always place the database name as the first component of this property. |
-
mode
:Transformation mode
.
-
transformations
: array ofColumn transformation parameters
.
-
target_ratio
: optionalNumber (double).
The relative size of the output table with respect to the input. The number of rows of each output table will be computed by multiplying this parameter by the input table size. If not provided, this parameter will betarget_ratio = 1
, resulting on same size for input and output databases. Can be overridden bytarget_row_number
When setting target_ratio at a table level, the result may end up being smaller than the given value due to relationships with parent table. For example, if a customer table is set to target_ratio = 0.5 , and its child table transactions has target_ratio = 1.0 , the output transaction table will also end with half it’s samples due to its downstream dependency to the reduced table customer.
|
-
target_row_number
: optionalInteger (int64).
The absolute size of the output table in rows. This parameter is applicable only for GENERATION mode. If not provided,target_ratio
will be used.
-
cycle_breaker_references
: array ofString.
WhenCycle resolution strategy
isFAIL
, this list may contain a list of table names, references to which will be ignored during the data generation.
-
insert_batch_size
: optionalInteger.
Indicates how many table rows are inserted to the database per each batch operation.
-
use_working_directory
:Boolean.
-
filter
:String.
An expression to filter a source table data. Any SQL expression that can be used as the search condition in WHERE clause. For example:
org_id != 120 and custom_function() = 'foo'
Cycle resolution strategy
Used in: cycle_resolution_strategy
optional String.
Defines how to deal with cycles in table relations via foreign keys.
FAIL
-
if this mode is selected,
cycle_breaker_references
should be provided in the configuration file. Otherwise, execution will fail if a circular reference is detected in the schema. DELETE_NOT_REQUIRED
-
if this mode is selected, cyclic references will be resolved automatically by removing the last nullable reference that leads to the cycle. As a result, the cyclic reference columns will be masked to nulls.
DEFER_FOREIGN_KEY
-
if this mode is selected, cyclic references will be automatically resolved by disabling references that lead to the cycles during insertion, and then re-enabled back. As a result, the columns with cyclic references will be masked like any other keys in the schema. This strategy can only be used with the
MASKING
mode without subsetting, and only for Oracle, MSSQL, and DB2.
Example for a cycle breaker reference:
schema_creation_mode: CREATE_IF_NOT_EXISTS
cycle_resolution_strategy: FAIL
table_truncation_mode: TRUNCATE
safety_mode: "RELAXED"
default_config:
mode: GENERATION
target_ratio: 1.0
tables:
- table_name_with_schema: "public.employees"
cycle_breaker_references: ["public.employees"]
Where the employees table contains a cycle reference.
- Enum values
-
-
FAIL
-
DELETE_NOT_REQUIRED
-
DEFER_FOREIGN_KEY
-
Schema Creation Mode
Used in: schema_creation_mode
optional String.
Defines the mode of schema creation.
CREATE_IF_NOT_EXISTS
-
if this mode is selected, DDL schema will be copied from the source database to the target one if it does not exist, existing schema will be used otherwise.
DO_NOT_CREATE
-
if this mode is selected, the existing schema will be used without any validations. Please use this mode carefully: run-time errors may occur if the input and output schema do not match.
CREATE
-
if this mode is selected, DDL schema will be copied from the source database to the target one. The target database should be empty.
DROP_AND_CREATE
-
if this mode is selected, DDL schema will be copied from the source database to the target one. Existing schema in the target database will be dropped. Please use this mode carefully.
- Enum values
-
-
DO_NOT_CREATE
-
CREATE
-
CREATE_IF_NOT_EXISTS
-
DROP_AND_CREATE
-
Table truncation mode
Used in: table_truncation_mode
optional String.
Defines the mode of table truncation.
DO_NOT_TRUNCATE
-
(default) if this mode is selected, tables in the target database won’t be truncated. An empty target database required.
TRUNCATE
-
if this mode is selected, tables in the target database will be truncated.
IGNORE
-
if this mode is selected, the status of the target database is ignored.
- Enum values
-
-
DO_NOT_TRUNCATE
-
TRUNCATE
-
IGNORE
-
Safety Mode
Used in: safety_mode
optional String.
The mode of choosing transformations.
STRICT
-
if this mode is selected and no matching transformation found, then no
passthrough
,categorical_generator
,null_generator
transformations will be chosen by default. If no matching transformation found, an error occurs. IfDEFER_FOREIGN_KEY
cycle resolution strategy fails to re-enable a foreign key after transformation, an error occurs. RELAXED
-
if this mode is selected and no matching transformation found, then the
passthrough
transformation will be applied forMASKING
mode, andnull_generator
orcategorical_generator
forGENERATION
mode. Constant generatorsconstant_numeric
,constant_string
,constant_date
,constant_boolean
will be chosen by default where the source column contains the same value in all rows. IfDEFER_FOREIGN_KEY
cycle resolution strategy fails to re-enable a foreign key after transformation, a warning is shown in logs and transformation continues.
This property has no meaning for KEEP
mode.
Default: STRICT
.
- Enum values
-
-
STRICT
-
RELAXED
-
Additional Metadata
Used in: metadata
optional Object.
Metadata that is not defined at the database level but must be taken into account when performing transformations.
Properties
-
tables
: array ofTable Additional Metadata
.
TransformationConfigs
Used in: transformation_configs
map of String keys to`User single database config`.
Map of named transformation configs. Keys of this map are transformation config ids and values are objects of type User single database config.
Mapping of data sources
Used in: data_source_mapping
map of String keys to`DataSourceMappingEntry`.
Mapping of data sources defined in the Inventory file.
Transformation mode
optional String.
Defines table processing mode.
KEEP
-
if this mode is selected, the original data will be copied as it is. When this mode is selected, the output size needs to be smaller than the input, i.e.
target_ratio <= 1
. MASKING
-
if this mode is selected, masking transformations will be applied to the original data. When this mode is selected, the output size needs to be smaller than the input, i.e.
target_ratio <= 1
. GENERATION
-
if this mode is selected, the synthesized engine will learn the original data and generate new synthetic data. For this mode, the output database can be bigger than the input, so
target_ratio
can be greater than 1.
Both KEEP and MASKING modes apply a transformation to original data. While KEEP uses passthrough as default transformation, while MASKING automatically assigns a privacy preserving masking transformation to all columns. See transformations list for more details. For all modes, the user can override default transformers.
|
- Enum values
-
-
MASKING
-
GENERATION
-
KEEP
-
Default configuration item
Used in: items
Object.
The rule that applies to the tables by default, written in form "if given conditions are met, the given Transformation parameters are applied."
Properties
-
id
: optionalString.
Optional identifier of the rule (generally needed for debugging).
-
conditions
: array ofCondition
.
The list of conditions that must be met in order for the transformation params to be applied.
-
transformation
:Transformation parameters
.
Column transformation parameters
Used in: transformations
Object.
List of column names associated with Transformation parameters.
Properties
-
columns
: array ofString.
List of columns that are affected by this generator.
-
params
:Transformation parameters
.
-
id
: optionalString.
-
mapping
:ValueMapping
.
Table Additional Metadata
Used in: tables
Object.
Properties
-
table_name_with_schema
:String.
-
foreign_keys
:Additional Foreign Keys
.
Condition
Used in: conditions
Object.
Condition on which the default parameters are being applied.
Depending on type
property value, can be one of the following:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
ValueMapping
Used in: mapping
Object.
The pair of SQL expressions which are applied while reading and writing values from the specified column. The expression should be a valid SQL that can contain the macro ?
. This macro will be replaced with a column name on reading and transformed value on writing.
When using plain SQL expressions, you must guarantee syntax integrity. You may create the possibility of malicious SQL injection. Be sure to properly use this functionality. |
Properties
-
read
:String.
The SQL expression which is applied while reading the column value from a source database. The value "?" can be used if the mapping is not required.
-
write
:String.
The SQL expression which is applied while writing the column value to a target database. The value "?" can be used if the mapping is not required.
Additional Foreign Keys
Used in: foreign_keys
map of String keys to`Additional Foreign Key`.
Additional foreign keys that are not defined at the database level but must be taken into account when performing transformations.
IsIgnoredForeignKeyCondition
Applied to columns that are the part of FK which is ignored due to cycles
Properties
-
type = is_ignored_foreign_key
IsUniqueCondition
The column is either a part of primary key or UNIQUE
constraint.
Properties
-
type = is_unique
ModeInCondition
The transformation mode is in a given array.
Properties
-
type = mode_in
-
modes
: array ofTransformation mode
.
DataTypeCondition
The column has one of the given data types.
Properties
-
type = data_type
-
data_type
:Transformation data type
.
UniqueValuesCondition
Check whether the given field can be modelled as a format preserving hashing column instead of categorical.
Properties
-
type = unique_values
-
unique_ratio_threshold
:Number (double).
The fraction of unique values. The null values are not taken into account.
-
min_table_size_threshold
:Integer.
Minimum table size, as for small tables theunique_ratio_threshold
can lead to false positives.
ParentTransformationParamsCondition
If the field refers to a FK, check whether the parent column is transformed by a specific transformer
Properties
-
type = parent_transformation
-
parent_transformation_params
:Transformation parameters
.
NoParentTransformationParamsCondition
If the field refers to a FK, check that the parent transformation for specific column is not defined
Properties
-
type = no_parent_transformation
SingleDistinctValueCondition
The column contains only one distinct value. The null values are not taken into account.
Properties
-
type = single_distinct_value
DistinctValuesCondition
Check whether the given field distinct value number is greater than the specified threshold. The null values are not taken into account.
Properties
-
type = distinct_values
-
minimum_threshold
:Integer.
Distinct value number (non-negative)
-
maximum_threshold
: optionalInteger.
Distinct value number (non-negative)
StandardDeviationCondition
Check whether the given field standard deviation is greater than the specified threshold.
Properties
-
type = std_dev
-
std_dev_threshold
:Number (double).
Standard deviation threshold (non-negative)
Additional Foreign Key
Used in: foreign_keys
Object.
Properties
-
referred_database
:String.
-
referred_schema
:String.
-
referred_table
:String.
-
columns
: array ofAdditional Foreign Key Column Mapping
.