UserConfig Schema

UserConfig

Object.
To execute a workflow, the user can provide a YAML configuration file to fine-tune parameters of the transformations. UserConfig is the root object of this configuration file.

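This object consists of a default configuration, a list of UserTableConfig objects, and global settings for cycle resolution, schema creation, table truncation, and the random seed.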

Properties

  • default_config: DefaultConfig.

  • user_table_configs: array of UserTableConfig.

  • cycle_resolution_strategy: CycleResolutionStrategy.

  • schema_creation_mode: SchemaCreationMode.

  • table_truncation_mode: TableTruncationMode.

  • global_seed: Integer.
    The seed for the random number generators. Running the generation again with the same seed and workflow configuration must produce the same result. By default, global_seed is 0.
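For illustration, the following Python sketch represents a loaded UserConfig as a plain dict and fills in the documented default global_seed = 0. The helper names with_defaults and sample_values, the empty-list fallback for user_table_configs, and the use of Python's random module are assumptions. The sketch only shows why a fixed seed makes repeated runs reproducible, not how the generator seeds itself internally.

```python
import random
from typing import Any


def with_defaults(user_config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a UserConfig dict with defaults filled in (hypothetical helper)."""
    config = dict(user_config)
    config.setdefault("global_seed", 0)  # documented default
    config.setdefault("user_table_configs", [])  # assumption: no overrides means an empty list
    return config


def sample_values(seed: int, n: int) -> list[float]:
    """Draw n pseudo-random numbers; the same seed always yields the same list."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


config = with_defaults({
    "default_config": {"mode": "MASKING", "target_ratio": 0.5},
    "cycle_resolution_strategy": "DELETE_NOT_REQUIRED",
    "schema_creation_mode": "CREATE_IF_NOT_EXISTS",
    "table_truncation_mode": "TRUNCATE",
})
print(config["global_seed"])  # 0
print(sample_values(config["global_seed"], 3) == sample_values(config["global_seed"], 3))  # True
```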

DefaultConfig

Object.
Consists of the default table configuration parameters, which are applied to all tables, and a list of rules that are applied to a table conditionally.

Properties

  • mode: UserTransformationMode.

  • target_ratio: Number (double).
    The relative size of the output database with respect to the input. The number of rows of each output table is computed by multiplying this parameter by the number of rows of the corresponding input table. If not provided, target_ratio = 1, so the output database has the same size as the input database.

  • insert_batch_size: Integer.
    The number of table rows inserted into the database in each batch operation.

  • items: array of DefaultConfigItem.
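The row-count rule for target_ratio and the batching implied by insert_batch_size can be sketched as follows. Rounding fractional row counts to the nearest integer is an assumption, since the schema does not say how they are handled; the function names output_row_count and batches are hypothetical.

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def output_row_count(input_rows: int, target_ratio: float = 1.0) -> int:
    """Output rows = target_ratio * input rows (rounding is an assumption)."""
    if target_ratio < 0:
        raise ValueError("target_ratio must be non-negative")
    return math.floor(input_rows * target_ratio + 0.5)


def batches(rows: Sequence[T], insert_batch_size: int) -> Iterator[Sequence[T]]:
    """Split rows into chunks of at most insert_batch_size for batched inserts."""
    if insert_batch_size <= 0:
        raise ValueError("insert_batch_size must be positive")
    for start in range(0, len(rows), insert_batch_size):
        yield rows[start:start + insert_batch_size]


print(output_row_count(1000, 0.5))  # 500
print([len(chunk) for chunk in batches(list(range(10)), 4)])  # [4, 4, 2]
```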

UserTableConfig

Object.
The parameters defined in the default configuration are applied to all tables in the database, so there’s no need to configure each table individually. If needed, however, the user can override the default configuration for any specific table in the database by creating a UserTableConfig for that table and adding it to the user_table_configs list of UserConfig.
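A minimal sketch of how such an override could be resolved, assuming that fields omitted from a UserTableConfig fall back to the values in default_config. The helper effective_table_settings and the table names are hypothetical.

```python
from typing import Any

# Settings that appear in both DefaultConfig and UserTableConfig.
OVERRIDABLE = ("mode", "target_ratio", "insert_batch_size")


def effective_table_settings(user_config: dict[str, Any], table: str) -> dict[str, Any]:
    """Resolve settings for one table: table-level values win over defaults."""
    defaults = user_config.get("default_config", {})
    settings = {key: defaults[key] for key in OVERRIDABLE if key in defaults}
    for table_config in user_config.get("user_table_configs", []):
        if table_config["table_name_with_schema"] == table:
            # Assumption: only keys present in the table config override defaults.
            settings.update({key: table_config[key] for key in OVERRIDABLE if key in table_config})
    return settings


config = {
    "default_config": {"mode": "MASKING", "target_ratio": 1.0, "insert_batch_size": 1000},
    "user_table_configs": [
        {"table_name_with_schema": "public.customer", "mode": "GENERATION", "target_ratio": 2.0},
    ],
}
print(effective_table_settings(config, "public.customer"))
print(effective_table_settings(config, "public.orders"))
```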

Properties

  • table_name_with_schema: String.
    The name of the table affected by this UserTableConfig. Must be in the format $schema.$table, and the table must exist in the database.

  • mode: UserTransformationMode.

  • column_params: array of ColumnTransformationParams.

  • target_ratio: Number (double).
    The relative size of the output table with respect to the input table. The number of rows in the output table is computed by multiplying this parameter by the number of rows in the input table. If not provided, target_ratio = 1, so the output table has the same size as the input table.

    When target_ratio is set at table level, the output table may end up smaller than the given value because of its relationships with parent tables. For example, if the customer table has target_ratio = 0.5 and its child table transactions has target_ratio = 1.0, the output transactions table will also end up with half of its rows, because it depends on the reduced customer table.

  • cycle_breaker_references: array of String.
    When cycle_resolution_strategy is FAIL, this list can contain table names; references to these tables are ignored during data generation, which breaks the cycle.

  • insert_batch_size: Integer.
    The number of table rows inserted into the database in each batch operation.
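The effect described in the note on target_ratio can be approximated with a simplified model, which is an assumption rather than the documented algorithm: a table's effective ratio is its own target_ratio capped by the effective ratios of its parent tables. The real outcome depends on how child rows are distributed over parent rows. The sketch assumes an acyclic foreign-key graph; the function name effective_ratios is hypothetical.

```python
def effective_ratios(ratios: dict[str, float], parents: dict[str, list[str]]) -> dict[str, float]:
    """Cap each table's target_ratio by the effective ratios of its parent tables.

    ratios maps table -> configured target_ratio; parents maps table -> referenced tables.
    Tables without a configured ratio are treated as target_ratio = 1.
    """
    memo: dict[str, float] = {}

    def resolve(table: str) -> float:
        if table not in memo:
            own = ratios.get(table, 1.0)
            inherited = [resolve(parent) for parent in parents.get(table, [])]
            memo[table] = min([own, *inherited])
        return memo[table]

    return {table: resolve(table) for table in ratios}


print(effective_ratios({"customer": 0.5, "transactions": 1.0}, {"transactions": ["customer"]}))
```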

CycleResolutionStrategy

String.
Defines how to deal with cycles in table relations via foreign keys.

FAIL

If this mode is selected, execution fails when a circular reference is detected, unless the cycle is broken by cycle_breaker_references in the configuration file.

DELETE_NOT_REQUIRED

If this mode is selected, cyclic references are resolved automatically by removing the last nullable reference leading to the cycle.

Enum values
  • FAIL

  • DELETE_NOT_REQUIRED
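Under the FAIL strategy, a run has to find circular foreign-key references that remain after the references listed in cycle_breaker_references are ignored. The following sketch does this with a depth-first search. The input shapes (a map from each table to the tables it references, and a map from each table to its cycle_breaker_references) and the error message are assumptions for illustration. It does not attempt the DELETE_NOT_REQUIRED resolution, which needs nullability information not modelled here.

```python
from typing import Optional


def find_cycle(
    references: dict[str, list[str]],
    breakers: dict[str, list[str]],
) -> Optional[list[str]]:
    """Return one foreign-key cycle as a path of table names, or None."""
    graph = {
        table: [p for p in parents if p not in breakers.get(table, [])]
        for table, parents in references.items()
    }
    state: dict[str, str] = {}  # "visiting" while on the DFS path, "done" afterwards
    path: list[str] = []

    def visit(table: str) -> Optional[list[str]]:
        state[table] = "visiting"
        path.append(table)
        for parent in graph.get(table, []):
            if state.get(parent) == "visiting":
                # parent is already on the current path, so the path closes a cycle
                return path[path.index(parent):] + [parent]
            if parent not in state:
                cycle = visit(parent)
                if cycle is not None:
                    return cycle
        path.pop()
        state[table] = "done"
        return None

    for table in graph:
        if table not in state:
            cycle = visit(table)
            if cycle is not None:
                return cycle
    return None


def check_fail_strategy(references: dict[str, list[str]], breakers: dict[str, list[str]]) -> None:
    """Raise if a circular reference remains, mimicking the FAIL strategy."""
    cycle = find_cycle(references, breakers)
    if cycle is not None:
        raise ValueError("circular reference: " + " -> ".join(cycle))


references = {"employee": ["department"], "department": ["employee"]}
print(find_cycle(references, {}))  # ['employee', 'department', 'employee']
print(find_cycle(references, {"department": ["employee"]}))  # None
```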

SchemaCreationMode

String.
Defines the mode of schema creation.

CREATE_IF_NOT_EXISTS

If this mode is selected, the DDL schema is copied from the source database to the target database if it does not exist there; otherwise, the existing schema is used.

DO_NOT_CREATE

If this mode is selected, the existing schema is used without any validation. Use this mode carefully: run-time errors may occur if the input and output schemas do not match.

CREATE

If this mode is selected, the DDL schema is copied from the source database to the target database. The target database should be empty.

DROP_AND_CREATE

If this mode is selected, the DDL schema is copied from the source database to the target database, and any existing schema in the target database is dropped. Use this mode carefully.

Enum values
  • DO_NOT_CREATE

  • CREATE

  • CREATE_IF_NOT_EXISTS

  • DROP_AND_CREATE
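The four schema creation modes can be summarized as a decision over whether the target database already contains the schema. This sketch returns descriptive step labels rather than calling any real API; the function name schema_actions is hypothetical, and raising an error for CREATE on a non-empty target is an assumption based on "the target database should be empty".

```python
def schema_actions(mode: str, target_has_schema: bool) -> list[str]:
    """Return the steps a run would take for a given SchemaCreationMode."""
    if mode == "DO_NOT_CREATE":
        return ["use existing schema without validation"]
    if mode == "CREATE_IF_NOT_EXISTS":
        return ["use existing schema"] if target_has_schema else ["copy DDL from source"]
    if mode == "CREATE":
        if target_has_schema:
            # Assumption: a non-empty target is rejected up front.
            raise ValueError("CREATE expects an empty target database")
        return ["copy DDL from source"]
    if mode == "DROP_AND_CREATE":
        steps = ["drop existing schema"] if target_has_schema else []
        return steps + ["copy DDL from source"]
    raise ValueError(f"unknown SchemaCreationMode: {mode}")


print(schema_actions("DROP_AND_CREATE", True))  # ['drop existing schema', 'copy DDL from source']
```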

TableTruncationMode

String.
Defines the mode of table truncation.

DO_NOT_TRUNCATE

(Default) If this mode is selected, tables in the target database are not truncated. An empty target database is required.

TRUNCATE

If this mode is selected, tables in the target database are truncated.

IGNORE

If this mode is selected, the state of the target database is ignored.

Enum values
  • DO_NOT_TRUNCATE

  • TRUNCATE

  • IGNORE
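A sketch of the check implied by each truncation mode, taking a hypothetical map from target table name to its current row count. The function name prepare_target is hypothetical, and whether DO_NOT_TRUNCATE actually aborts up front on a non-empty target is an assumption.

```python
def prepare_target(mode: str, table_row_counts: dict[str, int]) -> list[str]:
    """Return the target tables to truncate for a given TableTruncationMode."""
    if mode == "DO_NOT_TRUNCATE":
        # Assumption: the run refuses to start if any target table has data.
        non_empty = sorted(table for table, rows in table_row_counts.items() if rows > 0)
        if non_empty:
            raise ValueError(f"DO_NOT_TRUNCATE requires empty tables, found data in {non_empty}")
        return []
    if mode == "TRUNCATE":
        return sorted(table_row_counts)
    if mode == "IGNORE":
        return []  # the state of the target database is not checked
    raise ValueError(f"unknown TableTruncationMode: {mode}")


print(prepare_target("TRUNCATE", {"public.orders": 12, "public.customer": 3}))
```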

UserTransformationMode

String.
Defines table processing mode.

KEEP

If this mode is selected, the original data is copied as is. In this mode, the output cannot be larger than the input, i.e. target_ratio <= 1.

MASKING

If this mode is selected, masking transformations are applied to the original data. In this mode, the output cannot be larger than the input, i.e. target_ratio <= 1.

GENERATION

If this mode is selected, the synthesis engine learns from the original data and generates new synthetic data. In this mode, the output database can be larger than the input, so target_ratio can be greater than 1.

Both KEEP and MASKING modes apply a transformation to the original data: KEEP uses passthrough as the default transformation, while MASKING automatically assigns a privacy-preserving masking transformation to all columns. See the transformations list for more details. For all modes, the user can override the default transformers.

Enum values
  • MASKING

  • GENERATION

  • KEEP
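The target_ratio constraint per mode can be expressed as a small validation step. The function check_mode is hypothetical; only the rule itself (target_ratio <= 1 for KEEP and MASKING, any ratio for GENERATION) comes from the mode descriptions above.

```python
MODES = ("KEEP", "MASKING", "GENERATION")


def check_mode(mode: str, target_ratio: float = 1.0) -> None:
    """Raise ValueError if target_ratio is not allowed for the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown UserTransformationMode: {mode}")
    # KEEP and MASKING transform existing rows, so they cannot grow the output.
    if mode in ("KEEP", "MASKING") and target_ratio > 1:
        raise ValueError(f"{mode} requires target_ratio <= 1, got {target_ratio}")


check_mode("GENERATION", 3.0)  # allowed: generation can grow the output
check_mode("MASKING", 0.5)     # allowed
```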

DefaultConfigItem

Object.
A rule that applies to tables by default, in the form "if the given conditions are met, the given TransformationParams are applied."

Properties

The list of conditions that must be met in order for the transformation params to be applied.
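The rule semantics can be sketched as follows. The schema excerpt does not name the properties of DefaultConfigItem or say how overlapping rules are ordered, so the keys conditions and params, the first-match-wins order, and the column metadata keys are all assumptions; only two condition types are implemented.

```python
from typing import Any, Callable, Optional

Column = dict[str, Any]  # hypothetical column metadata, e.g. {"primary_key": True}

# Minimal predicates for two condition types; the keys of Column are assumptions.
PREDICATES: dict[str, Callable[[dict[str, Any], Column], bool]] = {
    "true": lambda condition, column: True,
    "is_primary_key": lambda condition, column: bool(column.get("primary_key", False)),
}


def pick_params(items: list[dict[str, Any]], column: Column) -> Optional[dict[str, Any]]:
    """Return the params of the first item whose conditions are all met, else None."""
    for item in items:
        if all(PREDICATES[c["type"]](c, column) for c in item["conditions"]):
            return item["params"]
    return None


items = [
    {"conditions": [{"type": "is_primary_key"}],
     "params": {"type": "int_sequence_generator", "start_from": 1}},
    {"conditions": [{"type": "true"}],
     "params": {"type": "passthrough"}},
]
print(pick_params(items, {"primary_key": True}))   # {'type': 'int_sequence_generator', 'start_from': 1}
print(pick_params(items, {"primary_key": False}))  # {'type': 'passthrough'}
```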

ColumnTransformationParams

Object.
A list of column names together with the TransformationParams applied to them.

Properties

  • columns: array of String.
    List of columns that are affected by this generator.

  • params: TransformationParams.
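A sketch of expanding column_params into a per-column lookup. Rejecting a column that appears in two entries is a choice made for this sketch, not something the schema specifies; the function name and column names are hypothetical.

```python
from typing import Any


def column_params_lookup(column_params: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Map each column name to the TransformationParams that apply to it."""
    lookup: dict[str, dict[str, Any]] = {}
    for entry in column_params:
        for column in entry["columns"]:
            # Sketch choice: a column may appear in only one entry.
            if column in lookup:
                raise ValueError(f"column {column!r} appears in more than one entry")
            lookup[column] = entry["params"]
    return lookup


column_params = [
    {"columns": ["card_number"],
     "params": {"type": "redaction", "action": "KEEP", "which": "LAST",
                "count": 4, "mask_with": "*"}},
    {"columns": ["id", "created_at"], "params": {"type": "passthrough"}},
]
lookup = column_params_lookup(column_params)
print(lookup["created_at"])  # {'type': 'passthrough'}
```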

Condition

Object.
A condition under which the default parameters are applied.

Depending on the value of the type property, can be one of the following:
  • is_key

  • is_primary_key

  • is_foreign_key

  • is_ignored_foreign_key

  • is_unique

  • mode_in

  • true

  • data_type

  • unique_values

  • is_empty

  • parent_transformation

  • is_uuid

  • single_distinct_value

TransformationParams

Object.
Parameters of the generator. Every parameter object has a type key naming the transformation, plus parameters specific to that transformation.

Depending on the value of the type property, can be one of the following:
  • categorical_generator

  • conditional_generator

  • continuous_generator

  • quantile_generator

  • copy_parent_generator

  • foreign_key_generator

  • unique_generator

  • format_preserving_hashing

  • formatted_string_generator

  • int_sequence_generator

  • string_sequence_generator

  • noising

  • null_generator

  • passthrough

  • person_generator

  • address_generator

  • redaction

  • unique_hashing

  • date_generator

  • uuid_generator

  • constant_numeric

  • constant_string

  • constant_date

  • constant_boolean

IsKeyCondition

The column is part of either a primary key or a foreign key.

Properties

  • type = is_key

IsPrimaryKeyCondition

The column is a part of a primary key.

Properties

  • type = is_primary_key

IsForeignKeyCondition

The column is a part of a foreign key.

Properties

  • type = is_foreign_key

IsIgnoredForeignKeyCondition

The column is part of a foreign key that is ignored because of cycles.

Properties

  • type = is_ignored_foreign_key

IsUniqueCondition

The column is part of either a primary key or a UNIQUE constraint.

Properties

  • type = is_unique

ModeInCondition

The transformation mode is in a given array.

Properties

TrueCondition

Always true, thus making the rule applicable to every column.

Properties

  • type = true

DataTypeCondition

The column has one of the given data types.

Properties

UniqueValuesCondition

Checks whether the column can be modelled as a format-preserving hashing column instead of a categorical one.

Properties

  • type = unique_values

  • unique_ratio_threshold: Number (double).
    The threshold for the fraction of unique values in the column.

  • min_table_size_threshold: Integer.
    The minimum table size, since on small tables unique_ratio_threshold can lead to false positives.
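One plausible reading of this condition, sketched below, is that it holds when the table has at least min_table_size_threshold rows and the fraction of distinct values reaches unique_ratio_threshold. The inclusive comparisons and the function name are assumptions; values must be hashable.

```python
from typing import Hashable, Sequence


def unique_values_condition(
    values: Sequence[Hashable],
    unique_ratio_threshold: float,
    min_table_size_threshold: int,
) -> bool:
    """True if the column looks unique enough to hash rather than treat as categorical."""
    n = len(values)
    # Small tables are skipped because their unique ratio is unreliable.
    if n == 0 or n < min_table_size_threshold:
        return False
    unique_ratio = len(set(values)) / n
    return unique_ratio >= unique_ratio_threshold


print(unique_values_condition(list(range(100)), 0.9, 50))  # True
print(unique_values_condition(["a", "b"] * 50, 0.9, 50))  # False
```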

IsEmptyCondition

The column is empty.

Properties

  • type = is_empty

ParentTransformationParamsCondition

If the column is part of a foreign key, checks whether the referenced parent column is transformed by a specific transformer.

Properties

IsUuidCondition

The column is of UUID type.

Properties

  • type = is_uuid

SingleDistinctValueCondition

The column contains only one distinct value.

Properties

  • type = single_distinct_value
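The metadata-based conditions above can be sketched as predicates over a column description. ColumnInfo and its fields are hypothetical, and two interpretations are assumptions: is_empty means the column holds no non-null values, and single_distinct_value ignores nulls. Conditions with extra properties (mode_in, data_type, unique_values, parent_transformation) are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ColumnInfo:
    """Hypothetical column metadata used only by this sketch."""
    in_primary_key: bool = False
    in_foreign_key: bool = False
    foreign_key_ignored: bool = False
    in_unique_constraint: bool = False
    sql_type: str = "text"
    values: list[Any] = field(default_factory=list)


def _non_null(column: ColumnInfo) -> list[Any]:
    return [v for v in column.values if v is not None]


CONDITIONS: dict[str, Callable[[ColumnInfo], bool]] = {
    "is_key": lambda c: c.in_primary_key or c.in_foreign_key,
    "is_primary_key": lambda c: c.in_primary_key,
    "is_foreign_key": lambda c: c.in_foreign_key,
    "is_ignored_foreign_key": lambda c: c.in_foreign_key and c.foreign_key_ignored,
    "is_unique": lambda c: c.in_primary_key or c.in_unique_constraint,
    "true": lambda c: True,
    "is_empty": lambda c: not _non_null(c),  # assumption: no non-null values
    "is_uuid": lambda c: c.sql_type.lower() == "uuid",
    "single_distinct_value": lambda c: len(set(_non_null(c))) == 1,  # nulls ignored
}


def holds(condition: dict[str, Any], column: ColumnInfo) -> bool:
    """Evaluate one condition object, dispatching on its type key."""
    return CONDITIONS[condition["type"]](column)


print(holds({"type": "is_key"}, ColumnInfo(in_foreign_key=True)))  # True
```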

CategoricalGeneratorParams

Properties

  • type = categorical_generator

  • categories: Categories.

  • probabilities: array of Number (double).
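A sketch of how categorical_generator parameters could drive sampling, assuming probabilities[i] is the weight of categories.values[i] and that omitting probabilities means uniform sampling. It uses Python's random.choices and is not the product's implementation; the function name is hypothetical.

```python
import random
from typing import Any


def categorical_sample(params: dict[str, Any], n: int, seed: int = 0) -> list[Any]:
    """Draw n values from categories.values, weighted by probabilities."""
    values = params["categories"]["values"]
    probabilities = params.get("probabilities")  # None -> uniform (assumption)
    if probabilities is not None and len(probabilities) != len(values):
        raise ValueError("probabilities must match categories.values in length")
    rng = random.Random(seed)
    return rng.choices(values, weights=probabilities, k=n)


params = {
    "type": "categorical_generator",
    "categories": {"type": "string", "values": ["NL", "DE", "FR"]},
    "probabilities": [0.5, 0.3, 0.2],
}
print(categorical_sample(params, 5))
```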

ConditionalGeneratorParams

Properties

ContinuousGeneratorParams

Properties

  • type = continuous_generator

  • mean: Number (double).

  • std: Number (double).

  • min: Number (double).

  • max: Number (double).

  • numeric_type: NumericType.

  • round: Integer.
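The following sketch reads continuous_generator parameters as a normal distribution with the given mean and std, clipped to [min, max]. Clipping (instead of resampling), treating round as a number of decimal places, and rounding to whole numbers for integer NumericType values are assumptions; the function name is hypothetical.

```python
import random
from typing import Any

INTEGER_TYPES = {"INT", "LONG", "SHORT", "BIG_INTEGER"}


def continuous_sample(params: dict[str, Any], n: int, seed: int = 0) -> list[float]:
    """Draw n values from N(mean, std) clipped to [min, max]."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        x = rng.gauss(params["mean"], params["std"])
        x = min(max(x, params["min"]), params["max"])  # clip to the configured range
        if params.get("numeric_type") in INTEGER_TYPES:
            x = int(round(x))  # whole numbers for integer column types
        elif params.get("round") is not None:
            x = round(x, params["round"])  # keep `round` decimal places
        values.append(x)
    return values


params = {"type": "continuous_generator", "mean": 35.0, "std": 10.0,
          "min": 18.0, "max": 90.0, "numeric_type": "INT", "round": 0}
print(continuous_sample(params, 5))
```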

QuantileGeneratorParams

Properties

  • type = quantile_generator

  • hist: array of Number (double).

  • bin_edges: array of Number (double).

  • numeric_type: NumericType.
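A common way to sample from a histogram, and a plausible reading of hist and bin_edges, is to pick a bin with probability proportional to its hist weight and then draw uniformly inside that bin. The sketch assumes the numpy.histogram convention that bin_edges has one more entry than hist; flooring for integer types and the function name are also assumptions.

```python
import math
import random
from typing import Any

INTEGER_TYPES = {"INT", "LONG", "SHORT", "BIG_INTEGER"}


def quantile_sample(params: dict[str, Any], n: int, seed: int = 0) -> list[float]:
    """Sample n values from the histogram described by hist and bin_edges."""
    hist = params["hist"]
    edges = params["bin_edges"]
    if len(edges) != len(hist) + 1:
        raise ValueError("bin_edges must have exactly one more entry than hist")
    rng = random.Random(seed)
    bins = rng.choices(range(len(hist)), weights=hist, k=n)  # bin i with weight hist[i]
    values = [rng.uniform(edges[i], edges[i + 1]) for i in bins]
    if params.get("numeric_type") in INTEGER_TYPES:
        values = [math.floor(v) for v in values]
    return values


params = {"type": "quantile_generator", "hist": [1, 3, 0],
          "bin_edges": [0.0, 10.0, 20.0, 30.0], "numeric_type": "DOUBLE"}
print(quantile_sample(params, 5))
```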

CopyParentGeneratorParams

Properties

  • type = copy_parent_generator

  • parent_columns: array of String.

  • parent_tables: array of String.

ForeignKeyGeneratorParams

Properties

  • type = foreign_key_generator

  • referred_schema: String.

  • referred_table: String.

  • referred_fields: array of String.

UniqueGeneratorParams

Properties

FormatPreservingHashingParams

Properties

FormattedStringGeneratorParams

Properties

  • type = formatted_string_generator

  • pattern: String.

IntSequenceGeneratorParams

Properties

  • type = int_sequence_generator

  • start_from: Integer.

StringSequenceGeneratorParams

Properties

  • type = string_sequence_generator

  • length: Integer.

NoisingParams

Properties

  • type = noising

  • sensitivity: Number (double).

  • min: Number (double).

  • max: Number (double).

NullGeneratorParams

Properties

  • type = null_generator

PassthroughParams

Properties

  • type = passthrough

PersonGeneratorParams

Properties

  • type = person_generator

  • column_templates: array of String.

  • consistent_with_column: String.

  • locale: String.

AddressGeneratorParams

Properties

  • type = address_generator

  • column_templates: array of String.

  • consistent_with_column: String.

  • locale: String.

RedactionParams

Properties

  • type = redaction

  • action: Action.

  • which: Position.

  • count: Integer.

  • mask_with: String.
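The redaction parameters can be read as "MASK or KEEP the FIRST or LAST count characters, replacing masked characters with mask_with". The sketch below implements that reading; replacing each masked character with one copy of mask_with, clamping count to the string length, and the function name are assumptions.

```python
def redact(value: str, action: str, which: str, count: int, mask_with: str = "*") -> str:
    """Apply a redaction defined by action (KEEP/MASK), which (FIRST/LAST) and count."""
    count = max(0, min(count, len(value)))
    if which == "FIRST":
        selected = set(range(0, count))
    elif which == "LAST":
        selected = set(range(len(value) - count, len(value)))
    else:
        raise ValueError(f"unknown Position: {which}")
    if action == "MASK":
        to_mask = selected  # mask the selected characters
    elif action == "KEEP":
        to_mask = set(range(len(value))) - selected  # mask everything else
    else:
        raise ValueError(f"unknown Action: {action}")
    return "".join(mask_with if i in to_mask else ch for i, ch in enumerate(value))


print(redact("4111222233334444", "KEEP", "LAST", 4))  # ************4444
print(redact("secret", "MASK", "FIRST", 3, "#"))      # ###ret
```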

UniqueHashingParams

Properties

  • type = unique_hashing

  • max_value: Number (double).
    Maximum value to generate; null means no limit.

  • precision: Integer.
    Maximum number of digits to generate (e.g. if the value is 3, the maximum value is 999); null means no limit. If both max_value and precision are specified, the smaller of the two limits applies.
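The interaction between max_value and precision reduces to taking the smaller of the two limits. A sketch, with a hypothetical function name:

```python
from typing import Optional


def effective_max(max_value: Optional[float], precision: Optional[int]) -> Optional[float]:
    """Upper bound for unique_hashing output; None means unlimited."""
    limits = []
    if max_value is not None:
        limits.append(max_value)
    if precision is not None:
        limits.append(10 ** precision - 1)  # e.g. precision 3 -> 999
    return min(limits) if limits else None


print(effective_max(None, 3))     # 999
print(effective_max(500, 3))      # 500
print(effective_max(5000, 3))     # 999
print(effective_max(None, None))  # None
```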

DateGeneratorParams

Properties

  • type = date_generator

  • mean: String (date-time).

  • std: Integer.
    Standard deviation in milliseconds.

  • min: String (date-time).

  • max: String (date-time).
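A sketch of date_generator sampling, assuming a normal distribution around mean with std in milliseconds, clipped to [min, max]. ISO 8601 strings with explicit UTC offsets are parsed with datetime.fromisoformat; clipping and the function name are assumptions.

```python
import random
from datetime import datetime, timedelta
from typing import Any


def date_sample(params: dict[str, Any], n: int, seed: int = 0) -> list[datetime]:
    """Draw n datetimes around params["mean"], clipped to [min, max]."""
    mean = datetime.fromisoformat(params["mean"])
    low = datetime.fromisoformat(params["min"])
    high = datetime.fromisoformat(params["max"])
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        offset_ms = rng.gauss(0.0, params["std"])  # std is given in milliseconds
        value = mean + timedelta(milliseconds=offset_ms)
        values.append(min(max(value, low), high))
    return values


params = {
    "type": "date_generator",
    "mean": "2020-06-01T00:00:00+00:00",
    "std": 30 * 24 * 60 * 60 * 1000,  # 30 days in milliseconds
    "min": "2020-01-01T00:00:00+00:00",
    "max": "2020-12-31T23:59:59+00:00",
}
for d in date_sample(params, 3):
    print(d.isoformat())
```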

UuidGeneratorParams

Properties

  • type = uuid_generator

ConstantNumericGeneratorParams

Properties

  • type = constant_numeric

  • value: Number.

  • min: Number.

  • max: Number.

  • numeric_type: NumericType.

ConstantStringGeneratorParams

Properties

  • type = constant_string

  • value: String.

ConstantDateGeneratorParams

Properties

  • type = constant_date

  • value: String (date-time).

  • min: String (date-time).

  • max: String (date-time).

ConstantBooleanGeneratorParams

Properties

  • type = constant_boolean

  • value: Boolean.

TransformationDataType

String.

Enum values
  • TEXT

  • NUMERIC

  • DATE

  • BOOLEAN

  • ANY

Categories

Object.

Depending on the value of the type property, can be one of the following:
  • string

  • boolean

  • numeric

NumericType

String.

Enum values
  • INT

  • LONG

  • DOUBLE

  • FLOAT

  • BIG_DECIMAL

  • BIG_INTEGER

  • SHORT

FormatPreservingHashingGroup

Object.

FormatPreservingHashingFilter

Object.

Depending on the value of the type property, can be one of the following:
  • first

  • last

  • characters

  • substring

  • regex

Action

String.

Enum values
  • KEEP

  • MASK

Position

String.

Enum values
  • FIRST

  • LAST

StringCategories

Properties

  • type = string

  • values: array of String.

BooleanCategories

Properties

  • type = boolean

  • values: array of Boolean.

NumericCategories

Properties

  • type = numeric

  • values: array of Number.

FormatPreservingHashingGroupSelector

Object.

Depending on the value of the type property, can be one of the following:
  • digits

  • lower_letters

  • upper_letters

  • regex

FormatPreservingHashingGroupAlphabet

Object.

Depending on the value of the type property, can be one of the following:
  • digits

  • lower_letters

  • upper_letters

  • custom

FirstCharactersFilter

Masks only the first n characters of the input string.

Properties

  • type = first

  • n: Integer (int32).

LastCharactersFilter

Masks only the last n characters of the input string.

Properties

  • type = last

  • n: Integer (int32).

CharactersFilter

Masks only the specified characters of the input string.

Properties

  • type = characters

  • characters: String.

  • ignore_case: Boolean.

SubstringFilter

Masks only the specified substring of the input string.

Properties

  • type = substring

  • substring: String.

  • ignore_case: Boolean.

RegexFilter

Masks only the characters matched by the regular expression in pattern.

Properties

  • type = regex

  • pattern: String.

  • ignore_case: Boolean.
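The five filters differ only in which character positions they select. The sketch computes those positions and, purely for illustration, replaces them with a fixed character; how the selected characters are transformed in practice is not specified in this excerpt. The function names and the mask_char parameter are hypothetical, and masking every occurrence of a substring is an assumption.

```python
import re
from typing import Any


def filtered_positions(value: str, flt: dict[str, Any]) -> set[int]:
    """Return the indices of value selected by a FormatPreservingHashingFilter dict."""
    kind = flt["type"]
    n = len(value)
    if kind == "first":
        return set(range(min(flt["n"], n)))
    if kind == "last":
        return set(range(max(n - flt["n"], 0), n))
    ignore_case = flt.get("ignore_case", False)
    if kind == "characters":
        chars = flt["characters"]
        if ignore_case:
            return {i for i, ch in enumerate(value) if ch.lower() in chars.lower()}
        return {i for i, ch in enumerate(value) if ch in chars}
    if kind in ("substring", "regex"):
        # A substring is matched literally; a regex pattern is used as given.
        pattern = re.escape(flt["substring"]) if kind == "substring" else flt["pattern"]
        flags = re.IGNORECASE if ignore_case else 0
        return {i for m in re.finditer(pattern, value, flags) for i in range(m.start(), m.end())}
    raise ValueError(f"unknown filter type: {kind}")


def mask(value: str, flt: dict[str, Any], mask_char: str = "*") -> str:
    positions = filtered_positions(value, flt)
    return "".join(mask_char if i in positions else ch for i, ch in enumerate(value))


print(mask("john.doe@example.com", {"type": "first", "n": 4}))  # ****.doe@example.com
print(mask("id=12345", {"type": "regex", "pattern": r"\d"}))     # id=*****
```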

DigitsSelector

Properties

  • type = digits

LowerLettersSelector

Properties

  • type = lower_letters

UpperLettersSelector

Properties

  • type = upper_letters

RegexSelector

Properties

  • type = regex

  • pattern: String.

DigitsAlphabet

Properties

  • type = digits

LowerLettersAlphabet

Properties

  • type = lower_letters

UpperLettersAlphabet

Properties

  • type = upper_letters

CustomAlphabet

Properties

  • type = custom

  • characters: String.
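Judging by their names, selectors choose which characters of a value are hashed, and alphabets define which characters may replace them, so that, for example, digits stay digits. The sketch below illustrates that idea with a keyed HMAC-SHA256 per position. It is an assumption for illustration, not the product's algorithm, and unlike a real format-preserving encryption scheme such as NIST FF1 it does not guarantee distinct outputs for distinct inputs. The key and the function name fp_hash are hypothetical.

```python
import hashlib
import hmac
import re
import string
from typing import Any

SELECTORS = {
    "digits": lambda sel: re.compile(r"[0-9]"),
    "lower_letters": lambda sel: re.compile(r"[a-z]"),
    "upper_letters": lambda sel: re.compile(r"[A-Z]"),
    "regex": lambda sel: re.compile(sel["pattern"]),
}

ALPHABETS = {
    "digits": lambda alph: string.digits,
    "lower_letters": lambda alph: string.ascii_lowercase,
    "upper_letters": lambda alph: string.ascii_uppercase,
    "custom": lambda alph: alph["characters"],
}


def fp_hash(value: str, selector: dict[str, Any], alphabet: dict[str, Any], key: bytes) -> str:
    """Replace selected characters with alphabet characters chosen by a keyed hash."""
    pattern = SELECTORS[selector["type"]](selector)
    chars = ALPHABETS[alphabet["type"]](alphabet)
    positions = {i for m in pattern.finditer(value) for i in range(m.start(), m.end())}
    out = list(value)
    for i in sorted(positions):
        # Deterministic: the same key, value and position always give the same character.
        digest = hmac.new(key, f"{value}|{i}".encode(), hashlib.sha256).digest()
        out[i] = chars[int.from_bytes(digest[:8], "big") % len(chars)]
    return "".join(out)


print(fp_hash("AB-1234", {"type": "digits"}, {"type": "digits"}, b"demo-key"))
```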