UserConfig Schema
UserConfig
Object.
To execute a workflow, the user can provide a YAML configuration file to fine-tune the parameters of the transformations. UserConfig is the root object of this configuration file.
This object consists of a default configuration and a list of UserTableConfig objects.
Properties
- default_config: DefaultConfig.
- user_table_configs: array of UserTableConfig.
- cycle_resolution_strategy: CycleResolutionStrategy.
- schema_creation_mode: SchemaCreationMode.
- table_truncation_mode: TableTruncationMode.
- global_seed: Integer.
  A value used as a seed for the random number generators. Generation must produce the same result each time it is run with the same seed and workflow configuration. By default, global_seed is 0.
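As a rough illustration of how the top-level properties could fit together, the following is a minimal sketch of a configuration file. The property names and enum values come from this reference; the specific values chosen and the table name public.customer are assumptions made for the example.

```yaml
# Sketch of a root UserConfig object; all values are illustrative only.
default_config:
  mode: GENERATION
user_table_configs:
  - table_name_with_schema: public.customer   # hypothetical table
    mode: MASKING
cycle_resolution_strategy: DELETE_NOT_REQUIRED
schema_creation_mode: CREATE_IF_NOT_EXISTS
table_truncation_mode: TRUNCATE
global_seed: 42   # same seed and same configuration should reproduce the same output
```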
DefaultConfig
Object.
Consists of the default table configuration parameters, which are applied to all tables by default, and a list of rules that are conditionally applied to a table.
Properties
- mode: UserTransformationMode.
- target_ratio: Number (double).
  The relative size of the output database with respect to the input. The number of rows in each output table is computed by multiplying this parameter by the input table size. If not provided, target_ratio = 1, which results in input and output databases of the same size.
- insert_batch_size: Integer.
  Indicates how many table rows are inserted into the database per batch operation.
- items: array of DefaultConfigItem.
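To make the row-count rule concrete, here is a sketch of a default_config block. The values are assumptions chosen for the example: with a target_ratio of 0.5, an input table of 1,000 rows would be expected to yield about 500 output rows (1,000 × 0.5). How fractional results are rounded is not stated in this reference.

```yaml
default_config:
  mode: GENERATION
  target_ratio: 0.5         # output rows = input rows * 0.5
  insert_batch_size: 1000   # rows per batch insert; illustrative value
```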
UserTableConfig
Object.
The parameters defined in the default configuration are applied to all tables in the database, so there’s no need to configure each table individually. If needed, however, the user can override the default configuration for any specific table in the database.
For each table, the user can create a UserTableConfig and add it to the user_table_configs list of UserConfig.
Properties
- table_name_with_schema: String.
  The name of the table affected by this UserTableConfig. Must be in the format $schema.$table, and the table must exist in the database.
- mode: UserTransformationMode.
- column_params: array of ColumnTransformationParams.
- target_ratio: Number (double).
  The relative size of the output database with respect to the input. The number of rows in each output table is computed by multiplying this parameter by the input table size. If not provided, target_ratio = 1, which results in input and output databases of the same size.
  When target_ratio is set at the table level, the resulting table may end up smaller than the given value because of relationships with a parent table. For example, if a customer table is set to target_ratio = 0.5 and its child table transactions has target_ratio = 1.0, the output transactions table will also end up with half of its rows, because it depends on the reduced customer table.
- cycle_breaker_references: array of String.
  When CycleResolutionStrategy is FAIL, this list may contain table names; references to those tables will be ignored during data generation.
- insert_batch_size: Integer.
  Indicates how many table rows are inserted into the database per batch operation.
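The parent and child scenario described for target_ratio can be sketched as follows. The schema name public and the batch size are assumptions for illustration; only the property names and the customer and transactions example come from this reference.

```yaml
user_table_configs:
  - table_name_with_schema: public.customer       # hypothetical schema name
    mode: MASKING
    target_ratio: 0.5
  - table_name_with_schema: public.transactions   # child table of customer
    mode: MASKING
    target_ratio: 1.0        # may still end up around half size, since its parent is reduced
    insert_batch_size: 500   # overrides the default for this table only
```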
CycleResolutionStrategy
String.
Defines how to deal with cycles in table relations via foreign keys.
FAIL - if this mode is selected, cycle_breaker_references should be provided in the configuration file. Otherwise, execution will fail if it detects a circular reference.
DELETE_NOT_REQUIRED - if this mode is selected, cyclic references will be resolved automatically by removing the last nullable reference leading to the cycle.
Enum values
- FAIL
- DELETE_NOT_REQUIRED
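A sketch of the FAIL strategy combined with explicit cycle breakers is shown below. The tables public.employee and public.department are hypothetical and are assumed to reference each other through foreign keys. Whether entries in cycle_breaker_references must be schema-qualified is also an assumption.

```yaml
cycle_resolution_strategy: FAIL
user_table_configs:
  - table_name_with_schema: public.employee   # hypothetical table
    # References to the tables listed here are ignored during data generation.
    cycle_breaker_references:
      - public.department                      # hypothetical table
```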
SchemaCreationMode
String.
Defines the mode of schema creation.
CREATE_IF_NOT_EXISTS - if this mode is selected, the DDL schema will be copied from the source database to the target one if it does not exist; otherwise, the existing schema will be used.
DO_NOT_CREATE - if this mode is selected, the existing schema will be used without any validation. Please use this mode carefully: run-time errors may occur if the input and output schemas do not match.
CREATE - if this mode is selected, the DDL schema will be copied from the source database to the target one. The target database should be empty.
DROP_AND_CREATE - if this mode is selected, the DDL schema will be copied from the source database to the target one. Any existing schema in the target database will be dropped. Please use this mode carefully.
Enum values
- DO_NOT_CREATE
- CREATE
- CREATE_IF_NOT_EXISTS
- DROP_AND_CREATE
TableTruncationMode
String.
Defines the mode of table truncation.
DO_NOT_TRUNCATE - (default) if this mode is selected, tables in the target database won’t be truncated. An empty target database is required.
TRUNCATE - if this mode is selected, tables in the target database will be truncated.
IGNORE - if this mode is selected, the status of the target database is ignored.
Enum values
- DO_NOT_TRUNCATE
- TRUNCATE
- IGNORE
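The schema creation and truncation modes are both set at the root of the configuration. The sketch below shows one plausible pairing for re-running a workflow against a target that may already contain a schema and data. The pairing is an assumption made for illustration, not a recommendation from this reference.

```yaml
schema_creation_mode: CREATE_IF_NOT_EXISTS   # reuse the target schema if it already exists
table_truncation_mode: TRUNCATE              # truncate the tables in the target database
```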
UserTransformationMode
String.
Defines table processing mode.
KEEP - if this mode is selected, the original data will be copied as it is. In this mode, the output cannot be larger than the input, i.e. target_ratio <= 1.
MASKING - if this mode is selected, masking transformations will be applied to the original data. In this mode, the output cannot be larger than the input, i.e. target_ratio <= 1.
GENERATION - if this mode is selected, the synthesized engine will learn the original data and generate new synthetic data. In this mode, the output database can be bigger than the input, so target_ratio can be greater than 1.
Both KEEP and MASKING modes apply a transformation to the original data. KEEP uses passthrough as the default transformation, while MASKING automatically assigns a privacy-preserving masking transformation to all columns. See the transformations list for more details. For all modes, the user can override the default transformers.
Enum values
- MASKING
- GENERATION
- KEEP
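The interaction between mode and target_ratio can be sketched as follows. The table name public.country is hypothetical, and the ratios are illustrative values chosen to respect the constraints stated above.

```yaml
default_config:
  mode: GENERATION
  target_ratio: 2.0      # GENERATION may produce more rows than the input
user_table_configs:
  - table_name_with_schema: public.country   # hypothetical lookup table
    mode: KEEP
    target_ratio: 1.0    # KEEP and MASKING require target_ratio <= 1
```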
DefaultConfigItem
Object.
The rule that applies to the tables by default, written in the form "if the given conditions are met, the given TransformationParams are applied."
Properties
- conditions: array of Condition.
  The list of conditions that must be met for the transformation params to be applied.
- transformation: TransformationParams.
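A rule of this "if conditions, then transformation" shape might look like the sketch below. It assumes that condition and transformation objects are written as YAML mappings with a type key. The rule itself (hashing every column of tables processed in MASKING mode) is for illustration only and is not meant as a sensible default.

```yaml
default_config:
  mode: MASKING
  items:
    - conditions:
        - type: mode_in          # ModeInCondition
          modes: [MASKING]
      transformation:
        type: unique_hashing     # UniqueHashingParams
        precision: 6             # illustrative value
```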
ColumnTransformationParams
Object.
List of column names associated with TransformationParams.
Properties
- columns: array of String.
  List of columns that are affected by this generator.
- params: TransformationParams.
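As a sketch, a column-level override inside a UserTableConfig could be written like this. The table name, column name, and numeric values are hypothetical; the property names are the ones listed in this reference.

```yaml
user_table_configs:
  - table_name_with_schema: public.customer   # hypothetical table
    column_params:
      - columns: [loyalty_points]              # hypothetical column
        params:
          type: continuous_generator
          mean: 100.0
          std: 15.0
```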
Condition
Object.
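The condition under which the default parameters are applied.
Depending on the value of the type property, it can be one of several condition types, including IsIgnoredForeignKeyCondition, ModeInCondition, DataTypeCondition, UniqueValuesCondition, and ParentTransformationParamsCondition.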
TransformationParams
Object.
Parameters of the generator. Every parameter object has a type key holding the type name of the transformation, plus other parameters that are transformation-specific.
Depending on the value of the type property, it can be one of several parameter types, including CategoricalGeneratorParams, ConditionalGeneratorParams, ContinuousGeneratorParams, QuantileGeneratorParams, UniqueGeneratorParams, UniqueHashingParams, and ConstantGeneratorParams.
IsIgnoredForeignKeyCondition
Applied to columns that are part of a foreign key that is ignored due to cycles.
ModeInCondition
The transformation mode is in a given array.
Properties
- type = mode_in
- modes: array of UserTransformationMode.
DataTypeCondition
The column has one of the given data types.
Properties
- type = data_type
- data_type: TransformationDataType.
UniqueValuesCondition
Checks whether the given field can be modelled as a format-preserving hashing column instead of a categorical one.
ParentTransformationParamsCondition
If the field refers to a foreign key, checks whether the parent column is transformed by a specific transformer.
Properties
- type = parent_transformation
- parent_transformation_params: TransformationParams.
CategoricalGeneratorParams
Properties
- type = categorical_generator
- categories: Categories.
- probabilities: array of Number (double).
ConditionalGeneratorParams
Properties
- type = conditional_generator
- conditional_column: String.
- conditional_table: String.
- conditional_value: String.
- if_false: TransformationParams.
- if_true: TransformationParams.
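This reference does not spell out how the condition is evaluated. The sketch below assumes that if_true is used for rows where conditional_column in conditional_table equals conditional_value, and if_false otherwise. The table, column, and values are hypothetical.

```yaml
type: conditional_generator
conditional_table: public.customer   # hypothetical table
conditional_column: is_active        # hypothetical column
conditional_value: "true"            # String, so it is quoted
if_true:                             # assumed: used when the column matches the value
  type: continuous_generator
  mean: 50.0
  std: 10.0
if_false:                            # assumed: used otherwise
  type: constant
  value: 0
```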
ContinuousGeneratorParams
Properties
- type = continuous_generator
- mean: Number (double).
- std: Number (double).
- min: Number (double).
- max: Number (double).
- numeric_type: NumericType.
- round: Integer.
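A sketch of these parameters for a hypothetical age-like column is shown below. The numbers are illustrative. numeric_type and round are left out because their accepted values and exact meaning are not described in this section.

```yaml
type: continuous_generator
mean: 35.0   # centre of the generated values
std: 12.0    # spread of the generated values
min: 18.0    # lower bound
max: 90.0    # upper bound
```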
QuantileGeneratorParams
Properties
- type = quantile_generator
- hist: array of Number (double).
- bin_edges: array of Number (double).
- numeric_type: NumericType.
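The relationship between hist and bin_edges is not defined in this section. The sketch below assumes the common histogram convention in which hist holds one weight per bin and bin_edges therefore has one more entry than hist. Whether hist contains counts or probabilities is also an assumption.

```yaml
type: quantile_generator
bin_edges: [0.0, 10.0, 20.0, 30.0]   # 4 edges define 3 bins (assumed convention)
hist: [0.2, 0.5, 0.3]                # one weight per bin (assumed to be probabilities)
```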
UniqueGeneratorParams
Properties
- type = unique_generator
- params: array of TransformationParams.
UniqueHashingParams
Properties
- type = unique_hashing
- max_value: Number (double).
  Max value to generate; null means no limit.
- precision: Integer.
  Max precision to generate (e.g. if the value is 3, the maximal value is 999); null means no limit. If both max_value and precision are specified, the smaller of the two resulting limits is applied.
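As a worked example of how the two limits combine, consider the sketch below. The values are illustrative: a precision of 3 allows values up to 999, while max_value caps them at 500, so the lower limit of 500 would be the one applied.

```yaml
type: unique_hashing
max_value: 500   # explicit upper limit
precision: 3     # up to 3 digits, i.e. at most 999
# Both limits are set, so the smaller one (500) applies.
```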
ConstantGeneratorParams
Parameters for generating a single value for the entire column.
Properties
- type = constant
- value: Number.
- numeric_type: NumericType.
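For instance, a column could be filled with a single fixed number using a fragment like the one below, placed inside a UserTableConfig. The column name discount is hypothetical, and numeric_type is omitted because its accepted values are not listed in this section.

```yaml
column_params:
  - columns: [discount]   # hypothetical column
    params:
      type: constant
      value: 0            # every row receives this value
```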