Transformations
Object.
Parameters of a transformation. All parameters have a type key with the type name of the transformation, and other parameters that are transformation-specific.
Depending on the type property value, can be one of the following:
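Key | Link | Modes | Data types | Multiple columns |
categorical_generator | Categorical generator | GENERATION, MASKING, KEEP | ANY | No |
conditional_generator | Conditional generator | GENERATION, MASKING, KEEP | ANY | Yes |
continuous_generator | Continuous generator | GENERATION, MASKING, KEEP | NUMERIC | No |
quantile_generator | Quantile generator | GENERATION, MASKING, KEEP | NUMERIC | No |
copy_parent_generator | Copy parent generator | GENERATION, MASKING, KEEP | ANY | Yes |
foreign_key_generator | Foreign key generator | GENERATION, MASKING, KEEP | ANY | Yes |
unique_generator | Unique generator | GENERATION, MASKING, KEEP | ANY | Yes |
format_preserving_hashing | Format preserving hashing | MASKING, KEEP | TEXT | No |
formatted_string_generator | Formatted string generator | GENERATION, MASKING, KEEP | ANY | No |
int_sequence_generator | Integers sequence generator | GENERATION, MASKING, KEEP | NUMERIC | No |
string_sequence_generator | String sequence generator | GENERATION, MASKING, KEEP | TEXT | No |
noising | Noising transformation | MASKING, KEEP | NUMERIC | No |
null_generator | Null generator | GENERATION, MASKING, KEEP | ANY | Yes |
passthrough | Passthrough transformation | MASKING, KEEP | ANY | Yes |
person_generator | Person generator | GENERATION, MASKING, KEEP | TEXT | Yes |
address_generator | Address generator | GENERATION, MASKING, KEEP | TEXT | Yes |
redaction | Redaction masker | MASKING, KEEP | TEXT | No |
unique_hashing | Unique hashing | MASKING, KEEP | NUMERIC | No |
date_time_unique_hashing | Date unique hashing | MASKING, KEEP | DATE | No |
date_generator | Date generator | GENERATION, MASKING, KEEP | DATE | No |
uuid_generator | UUID generator | GENERATION, MASKING, KEEP | ANY | No |
constant_numeric | Constant numeric generator | GENERATION, MASKING, KEEP | NUMERIC | No |
constant_string | Constant string generator | GENERATION, MASKING, KEEP | TEXT | No |
constant_date | Constant date generator | GENERATION, MASKING, KEEP | DATE | No |
constant_boolean | Constant boolean generator | GENERATION, MASKING, KEEP | BOOLEAN | No |
json_pointer_transformer | JSON Pointer Transformer | MASKING, KEEP | ANY | No |
void_generator | Void Generator | GENERATION, MASKING, KEEP | ANY | Yes |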
Categorical generator
Randomly sample from a given set of categories and probabilities. Probabilities and categories can be provided or learned from data. If given, both parameters are required.
Example:
transformations:
  - columns:
      - transaction_type
    params:
      type: categorical_generator
      categories:
        type: string
        values:
          - "SENT"
          - "RECEIVED"
      probabilities:
        - 0.6
        - 0.4
Properties
- type = categorical_generator
- categories: Categories for Categorical Generator.
- probabilities: array of Number (double). Probabilities for each category (must have the same size as categories).
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: No
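As a minimal sketch of the learned-from-data case described above: since categories and probabilities can be learned from the data, and are only required together, leaving both out should be enough. The column name is reused from the example above, and this fragment has not been checked against a running instance.

```yaml
transformations:
  - columns:
      - transaction_type
    params:
      # No categories or probabilities given:
      # both are expected to be learned from the original column.
      type: categorical_generator
```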
Conditional generator
Uses one of two transformations or generators depending on the value of a given column of the parent table. For example, a conditional generator can apply different generators depending on the value of the "gender" column of the parent table.
Example:
transformations:
  - columns: [ "status" ]
    params:
      type: conditional_generator
      conditional_table: "public.delivery"
      conditional_column: "status"
      conditional_value: "DONE"
      if_true:
        type: constant_string
        value: "CLOSED"
      if_false:
        type: constant_string
        value: "OPEN"
Properties
- type = conditional_generator
- conditional_column: String. Parent column.
- conditional_table: optional String. Parent table.
- conditional_value: String. Value to be compared with. If the value of the parent column is equal to conditional_value, the if_true generator is used; otherwise the if_false generator is used.
- if_false: Transformations.
- if_true: Transformations.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: Yes
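The "gender" case mentioned in the description could look roughly like the sketch below. The table name public.customers, the column names and the value "M" are hypothetical; the person_generator templates are the ones listed in the Person generator section.

```yaml
transformations:
  - columns: ["first_name"]
    params:
      type: conditional_generator
      conditional_table: "public.customers"   # hypothetical parent table
      conditional_column: "gender"            # hypothetical parent column
      conditional_value: "M"                  # hypothetical value
      if_true:
        # Used when the parent "gender" value equals "M"
        type: person_generator
        column_templates: ["${male_first_name}"]
      if_false:
        # Used for every other value
        type: person_generator
        column_templates: ["${female_first_name}"]
```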
Continuous generator
Output data is sampled from a parameterized continuous distribution. If parameters are not given, they will be fitted from the original data.
Example:
transformations:
  - columns:
      - "amount"
    params:
      type: continuous_generator
      mean: 354.21
      std: 98.96
      min: 0.0
Properties
- type = continuous_generator
- mean: optional Number (double). Mean of the sampled distribution.
- std: optional Number (double). Standard deviation.
- min: optional Number (double). Minimum value.
- max: optional Number (double). Maximum value.
- numeric_type: Numeric type.
- round: Integer. If given, output data will be rounded to this number of digits.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: NUMERIC
Supports multiple columns: No
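The example above leaves out max, numeric_type and round. A sketch that sets all of the documented properties might look as follows; the values are illustrative only, and DOUBLE is one of the values listed under Numeric type.

```yaml
transformations:
  - columns:
      - "amount"
    params:
      type: continuous_generator
      mean: 354.21
      std: 98.96
      min: 0.0
      max: 1000.0            # illustrative upper bound
      numeric_type: DOUBLE   # see the Numeric type section
      round: 2               # round output to 2 digits
```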
Quantile generator
Given a list of probabilities and bin edges, the output data is sampled from a mixture of uniform distributions, where each uniform distribution i is chosen with probability hist[i] and its edges are given by bin_edges[i] and bin_edges[i + 1]. If parameters are not given, they will be fitted from the original data.
Example:
transformations:
  - columns: ["amount"]
    params:
      type: quantile_generator
      hist: [0.3, 0.45, 0.25]
      bin_edges: [-2.1, 0.0, 3.4, 5.6]
Properties
- type = quantile_generator
- hist: optional array of Number (double). Probabilities of each uniform distribution.
- bin_edges: optional array of Number (double). Bin edges of each uniform distribution.
- numeric_type: Numeric type.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: NUMERIC
Supports multiple columns: No
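To make the relationship between the two arrays concrete, here is an annotated sketch. It assumes, as the description implies, that n probabilities in hist need n + 1 values in bin_edges; the numbers are illustrative.

```yaml
transformations:
  - columns: ["amount"]
    params:
      type: quantile_generator
      # Three uniform distributions, so three probabilities (summing to 1.0)...
      hist: [0.3, 0.45, 0.25]
      # ...and four bin edges:
      #   30% of values fall between -2.1 and 0.0
      #   45% of values fall between  0.0 and 3.4
      #   25% of values fall between  3.4 and 5.6
      bin_edges: [-2.1, 0.0, 3.4, 5.6]
      numeric_type: DOUBLE
```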
Copy parent generator
Copies values from the parent table. Can be used for de-normalization of the database, e.g. for copying the address or phone_number field from the customers table to the orders table.
Example:
transformations:
  - columns: [ "phone_number" ]
    params:
      type: copy_parent_generator
      parent_tables: [ "public.employees" ]
      parent_columns: [ "phone_number" ]
Properties
- type = copy_parent_generator
- parent_columns: array of String. Columns to copy the values from.
- parent_tables: array of String. Tables to copy the values from.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: Yes
Foreign key generator
Fills columns with the primary key values of a random row of the parent table.
Normally this generator is created implicitly wherever data generation for tables related by foreign keys is needed, so it should not be set up explicitly by the user.
Properties
- type = foreign_key_generator
- referred_schema: optional String.
- referred_table: optional String.
- referred_fields: optional array of String.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: Yes
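Although this generator is normally created implicitly, an explicit configuration would presumably take the shape sketched below, using only the properties listed above. The schema, table and column names are hypothetical, and the exact semantics of each property are an assumption based on their names.

```yaml
transformations:
  - columns: ["customer_id"]          # hypothetical foreign key column
    params:
      type: foreign_key_generator
      referred_schema: "public"       # hypothetical schema of the parent table
      referred_table: "customers"     # hypothetical parent table
      referred_fields: ["id"]         # hypothetical primary key column(s)
```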
Unique generator
This generator is intended for the case where primary key values are part of the foreign key.
Normally this generator is created implicitly wherever data generation for tables related by foreign keys is needed, so it should not be set up explicitly by the user.
Properties
- type = unique_generator
- params: optional array of Transformations.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: Yes
Format preserving hashing
A hash transformation is applied to each character of a given text that belongs to a configured group, so that the output preserves the format but contains different characters. This transformation is secure and non-reversible.
Examples:
Default configuration:
transformations:
  - columns: ["registration_number"]
    params:
      type: format_preserving_hashing
      groups:
        - selector:
            type: digits
          alphabets:
            - type: digits
        - selector:
            type: lower_letters
          alphabets:
            - type: lower_letters
        - selector:
            type: upper_letters
          alphabets:
            - type: upper_letters
Mask only last 5 characters:
transformations:
  - columns: ["registration_number"]
    params:
      type: format_preserving_hashing
      filter:
        type: "last"
        n: 5
Mask only substring ignoring case:
transformations:
  - columns: ["registration_number"]
    params:
      type: format_preserving_hashing
      filter:
        type: substring
        substring: sub
        ignore_case: true
Mask only a set of characters ignoring case:
transformations:
  - columns: ["registration_number"]
    params:
      type: format_preserving_hashing
      filter:
        type: characters
        characters: "abc"
        ignore_case: true
Mask characters selected by regex with a custom alphabet:
transformations:
  - columns: ["phone_number"]
    params:
      type: format_preserving_hashing
      groups:
        - selector:
            type: regex
            pattern: "[123]"
          alphabets:
            - type: custom
              parts:
                - type: characters
                  characters: "456"
                - type: characters
                  characters: "789"
                - type: unicode_block
                  name: LATIN_EXTENDED_D
                - type: unicode_block
                  name: "Latin Extended-A"
                - type: unicode_range
                  from: 0x0D00
                  to: 0x0D7F
Properties
- type = format_preserving_hashing
- groups: array of Hashing group. Hashing groups to apply on top of the specified filter. Multiple groups can be configured. In that case, the groups are tried against each region of the filtered value in the order in which they are specified in the configuration. If a match is found, the corresponding group's alphabet is used for the transformation, and no other groups are tried for that region. This implies that the most specific hashing groups must be specified first in the configuration. An unspecified parameter or null is equivalent to the following:
transformations:
  - columns: ["registration_number"]
    params:
      type: format_preserving_hashing
      groups:
        - selector:
            type: digits
          alphabets:
            - type: digits
        - selector:
            type: lower_letters
          alphabets:
            - type: lower_letters
        - selector:
            type: upper_letters
          alphabets:
            - type: upper_letters
        - selector:
            type: word_characters
          alphabets:
            - type: digits
            - type: lower_letters
            - type: upper_letters
- filter: Format preserving hashing filter.
Compatible modes: MASKING, KEEP
Compatible column data types: TEXT
Supports multiple columns: No
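Because groups are applied on top of the filter, the two can be combined. The sketch below restricts hashing to the first three characters and, within those, to digits only. It assumes that characters matched by no group are left unchanged, which the description suggests but does not state outright.

```yaml
transformations:
  - columns: ["registration_number"]
    params:
      type: format_preserving_hashing
      # Step 1: the filter narrows the value to its first 3 characters
      filter:
        type: first
        n: 3
      # Step 2: within that region, only digits are hashed (to other digits)
      groups:
        - selector:
            type: digits
          alphabets:
            - type: digits
```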
Formatted string generator
Generates a string column based on a given pattern. If the pattern is not given, random characters are generated with a length similar to that of the original column.
Example:
transformations:
  - columns:
      - "phone_number"
    params:
      type: formatted_string_generator
      pattern: "\\+44[0-9]{10}"
Properties
- type = formatted_string_generator
- pattern: optional String. Regular expression pattern used to sample data from.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: No
Integers sequence generator
Generates a sequence of unique integers, suitable for a unique id column.
Properties
- type = int_sequence_generator
- start_from: Integer. Where to start the sequence from; defaults to 0. If the generator is used on existing data, this should be set to the maximum of the existing data plus 1.
Example:
transformations:
  - columns:
      - "user_id"
    params:
      type: int_sequence_generator
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: NUMERIC
Supports multiple columns: No
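A sketch of the start_from property for the existing-data case: assuming, purely for illustration, that the largest existing user_id is 1000, the sequence would start at the maximum plus 1.

```yaml
transformations:
  - columns:
      - "user_id"
    params:
      type: int_sequence_generator
      # Hypothetical: existing data has max(user_id) = 1000, so start at 1001
      start_from: 1001
```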
String sequence generator
Generates a sequence of unique strings, consisting of uppercase alphabetic and numeric characters, suitable for a unique id column.
Example:
transformations:
  - columns: ["country_id"]
    params:
      type: string_sequence_generator
Properties
- type = string_sequence_generator
- length: optional Integer. Maximum length of the column; extracted from the database DDL if not given.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: TEXT
Supports multiple columns: No
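When the length should not be taken from the database DDL, it can presumably be set explicitly, as in this sketch. The value 8 is illustrative.

```yaml
transformations:
  - columns: ["country_id"]
    params:
      type: string_sequence_generator
      # Illustrative maximum length; overrides the value from the DDL
      length: 8
```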
Noising transformation
Adds Laplacian noise to the input column to protect privacy while keeping the output values similar to the input.
Example:
transformations:
  - columns: ["product_price"]
    params:
      type: noising
      sensitivity: 23.47
      min: 0
Properties
- type = noising
- sensitivity: optional Number (double). Amount of noise to be added.
- min: optional Number (double). If there is a hard minimum, output values smaller than it are truncated to it.
- max: optional Number (double). If there is a hard maximum, output values greater than it are truncated to it.
Compatible modes: MASKING, KEEP
Compatible column data types: NUMERIC
Supports multiple columns: No
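A sketch that bounds the noised output on both sides, using the min and max properties listed above. The numbers are illustrative, not recommended settings.

```yaml
transformations:
  - columns: ["product_price"]
    params:
      type: noising
      sensitivity: 5.0   # illustrative amount of noise
      min: 0             # noised values below 0 are truncated to 0
      max: 1000          # noised values above 1000 are truncated to 1000
```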
Null generator
The output column is filled with null values.
Example:
transformations:
  - columns: ["empty_column"]
    params:
      type: null_generator
Properties
- type = null_generator
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: Yes
Passthrough transformation
The output data is equal to the input, no transformation is applied.
Example:
transformations:
  - columns: ["customer_number", "plate"]
    params:
      type: passthrough
Properties
- type = passthrough
Compatible modes: MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: Yes
Person generator
Generate personal fields (e.g., name, surname, title) and keep them consistent across columns.
Available templates are:
- ${email}
- ${first_name}
- ${male_first_name}
- ${female_first_name}
- ${last_name}
Supported locales: (the list of locale codes is missing from this copy; the default is 'en', see the locale property below)
Example for several columns:
transformations:
  - columns: ["first_name", "last_name"]
    params:
      type: person_generator
      column_templates: ["${first_name}", "${last_name}"]
Example for a single column:
transformations:
  - columns: ["full_name"]
    params:
      type: person_generator
      column_templates: ["${first_name} ${last_name}"]
Properties
- type = person_generator
- column_templates: array of String. For each column, the template to be used to generate personal data.
- consistent_with_column: String. If given, the column to be consistent with. For example, if consistent_with_column="user_id", all people with the same user_id will have the same name. The "self" value means consistency with the source value.
- locale: String. Change this parameter to generate names from different geographical areas. Defaults to 'en', which corresponds to British names.
- length_exceeded_mode: Value length exceeded mode.
- column_lengths: optional array of Integer. Maximum allowed lengths for the columns. Ignored when length_exceeded_mode is "IGNORE".
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: TEXT
Supports multiple columns: Yes
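The consistency and locale properties are not shown in the examples above. A sketch, assuming a hypothetical user_id column in the same table, could be:

```yaml
transformations:
  - columns: ["first_name", "last_name", "email"]
    params:
      type: person_generator
      column_templates: ["${first_name}", "${last_name}", "${email}"]
      # Rows sharing the same user_id (hypothetical column) get the same person
      consistent_with_column: "user_id"
      # 'en' is the documented default, shown here explicitly
      locale: "en"
```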
Address generator
Generate address fields (e.g., street, zip code) and keep them consistent across columns.
Available templates are:
- ${zip_code}
- ${country}
- ${city}
- ${street_name}
- ${house_number}
- ${flat_number}
Supported locales: see Person generator.
Example for several columns:
transformations:
  - columns: [ "street_name", "zip_code" ]
    params:
      type: address_generator
      column_templates: [ "${street_name}", "${zip_code}" ]
Example for a single column:
transformations:
  - columns: ["address"]
    params:
      type: address_generator
      column_templates: ["${country}, ${city}, ${street_name}, ${house_number}, ${flat_number}, ${zip_code}"]
Properties
- type = address_generator
- column_templates: array of String. For each column, the template to be used to generate address data.
- consistent_with_column: String. If given, the column to be consistent with. For example, if consistent_with_column="user_id", all people with the same user_id will have the same street. The "self" value means consistency with the source value.
- locale: String. Change this parameter to generate addresses from different geographical areas. Defaults to 'en-GB', which corresponds to Great Britain addresses.
- length_exceeded_mode: Value length exceeded mode.
- column_lengths: optional array of Integer. Maximum allowed lengths for the columns. Ignored when length_exceeded_mode is "IGNORE".
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: TEXT
Supports multiple columns: Yes
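A sketch of length handling with length_exceeded_mode and column_lengths. It assumes that column_lengths holds one maximum length per entry in columns, in the same order; the lengths themselves are illustrative.

```yaml
transformations:
  - columns: ["city", "zip_code"]
    params:
      type: address_generator
      column_templates: ["${city}", "${zip_code}"]
      # Truncate generated values instead of using the default IGNORE mode
      length_exceeded_mode: "TRUNCATE"
      # Assumed to align with `columns`: city up to 40 chars, zip_code up to 8
      column_lengths: [40, 8]
```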
Redaction masker
Some characters of the input string are replaced with the same masking character, producing partially masked text in the output.
Example:
transformations:
  - columns: ["credit_card"]
    params:
      type: redaction
      action: MASK
      which: FIRST
      count: 4
      mask_with: "#"
Properties
- type = redaction
- action: Action.
- which: Position.
- count: Integer. Number of characters to be masked or kept; defaults to 4.
- mask_with: String. Character used to mask values; defaults to *.
Compatible modes: MASKING, KEEP
Compatible column data types: TEXT
Supports multiple columns: No
Unique hashing
Apply a hash transformation to a given value so that the output is encrypted but structural coherence is preserved (same input hashed with same key is always going to produce same output). Output values are unique.
This transformation is applied to primary and foreign keys by default in MASKING mode.
Example:
transformations:
  - columns: ["card_id"]
    params:
      type: unique_hashing
Properties
- type = unique_hashing
- max_value: Number (double). Maximum value to generate; null means no limit.
- precision: Integer. Maximum precision to generate (e.g. if the value is 3, the maximal value is 999); null means no limit. If both max_value and precision are specified, the smaller of the two limits is applied.
Compatible modes: MASKING, KEEP
Compatible column data types: NUMERIC
Supports multiple columns: No
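A sketch of limiting the hashed output to fit a narrow numeric column, using the precision property. Following the rule stated above (a precision of 3 gives a maximal value of 999), a precision of 6 should cap values at 999999.

```yaml
transformations:
  - columns: ["card_id"]
    params:
      type: unique_hashing
      # At most 6 digits, i.e. hashed values up to 999999
      precision: 6
```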
Date unique hashing
Apply a hash transformation to a date-time value so that the output is encrypted but structural coherence is preserved (the same input hashed with the same key always produces the same output). Output values are unique.
This transformation is applied to primary and foreign keys by default in MASKING mode.
Example:
transformations:
  - columns: ["create_date"]
    params:
      type: date_time_unique_hashing
      min: 2000-01-01T12:00:00Z
      max: 2022-01-01T12:00:00Z
Properties
- type = date_time_unique_hashing
- min: optional String (date-time). Minimum value.
- max: optional String (date-time). Maximum value.
Compatible modes: MASKING, KEEP
Compatible column data types: DATE
Supports multiple columns: No
Date generator
Output data is sampled from a parameterized continuous distribution and transformed into dates. If parameters are not given, they will be extracted from the original data.
Example:
transformations:
  - columns:
      - "date_of_birth"
    params:
      type: date_generator
      mean: 2018-02-01T12:00:00Z
      std: 2d 4h 45m 12s 434ms
      min: 2000-01-01T12:00:00Z
      max: 2022-01-01T12:00:00Z
Properties
- type = date_generator
- mean: optional String (date-time). Average date of the sampled distribution.
- std: optional String. Standard deviation. The following formats are accepted:
  - ISO-8601 Duration format, e.g., P1DT2H3M4.058S.
  - The concise format, e.g., 10s, 1h 30m or -(1h 30m).
  - Milliseconds without the specific unit, e.g., 12534.
- min: optional String (date-time). Minimum value.
- max: optional String (date-time). Maximum value.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: DATE
Supports multiple columns: No
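The example above gives std in the concise format. The same standard deviation written in the ISO-8601 duration format, which is also listed as accepted, would look like this sketch (2 days, 4 hours, 45 minutes and 12.434 seconds):

```yaml
transformations:
  - columns:
      - "date_of_birth"
    params:
      type: date_generator
      mean: 2018-02-01T12:00:00Z
      # ISO-8601 equivalent of "2d 4h 45m 12s 434ms"
      std: P2DT4H45M12.434S
      min: 2000-01-01T12:00:00Z
      max: 2022-01-01T12:00:00Z
```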
UUID generator
The output column is filled with UUIDs.
Example:
transformations:
  - columns: ["unique_id"]
    params:
      type: uuid_generator
Properties
- type = uuid_generator
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: No
Constant numeric generator
Generates a single numeric value for the entire column
Example:
transformations:
  - columns: [ "balance" ]
    params:
      type: constant_numeric
      value: 0.0
Example (range):
transformations:
  - columns: [ "balance" ]
    params:
      type: constant_numeric
      min: 0.0
      max: 10000.0
Properties
- type = constant_numeric
- value: optional Number. Numeric value to generate.
- min: optional Number. The lower boundary for the value (inclusive).
- max: optional Number. The upper boundary for the value (exclusive).
- numeric_type: Numeric type.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: NUMERIC
Supports multiple columns: No
Constant string generator
Generates a single string value for the entire column
Example:
transformations:
  - columns: [ "status" ]
    params:
      type: constant_string
      value: "ACTIVE"
Properties
- type = constant_string
- value: optional String. String value to generate.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: TEXT
Supports multiple columns: No
Constant date generator
Generates a single date value for the entire column
Example:
transformations:
  - columns: [ "creation_date" ]
    params:
      type: constant_date
      value: 2022-07-28T12:21:00Z
Example (range):
transformations:
  - columns: [ "creation_date" ]
    params:
      type: constant_date
      min: 2022-07-01T00:00:00Z
      max: 2022-07-31T23:59:59Z
Properties
- type = constant_date
- value: optional String (date-time). Date value to generate.
- min: optional String (date-time). The lower boundary for the value (inclusive).
- max: optional String (date-time). The upper boundary for the value (exclusive).
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: DATE
Supports multiple columns: No
Constant boolean generator
Generates a single boolean value for the entire column
Example:
transformations:
  - columns: [ "is_active" ]
    params:
      type: constant_boolean
      value: true
Properties
- type = constant_boolean
- value: optional Boolean. Boolean value to generate.
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: BOOLEAN
Supports multiple columns: No
JSON Pointer Transformer
Transforms JSON value nodes indicated by JSON pointers. The rest of the values are kept as is.
Example:
transformations:
  - columns: ["productspec"]
    params:
      type: "json_pointer_transformer"
      specifications:
        - pointers: [ "/sku" ]
          transformation:
            type: "format_preserving_hashing"
        - pointers: [ "/tags/0" ]
          transformation:
            type: "format_preserving_hashing"
          ignore_errors: true
Properties
- type = json_pointer_transformer
- specifications: array of JSON Pointer Transformer Specification.
Compatible modes: MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: No
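JSON Pointers can also address nested members and keys that contain special characters. The sketch below assumes a hypothetical document shape for the productspec column; the escaping rule (a "/" inside a key is written as "~1", a "~" as "~0") comes from RFC 6901 itself.

```yaml
# Hypothetical value stored in the "productspec" column:
#   {"sku": "AB-123", "supplier": {"contact": "J. Doe"}, "dims/cm": "10x20"}
transformations:
  - columns: ["productspec"]
    params:
      type: json_pointer_transformer
      specifications:
        # "/supplier/contact" addresses a nested member.
        # "/dims~1cm" addresses the top-level key "dims/cm" ("/" escaped as "~1").
        - pointers: ["/supplier/contact", "/dims~1cm"]
          transformation:
            type: format_preserving_hashing
          # Leave the document unchanged where a pointer does not resolve
          ignore_errors: true
```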
Void Generator
An auxiliary transformer that throws an error when called. It is used only when it is necessary to ignore the processing of the entire table.
Properties
- type = void_generator
Compatible modes: GENERATION, MASKING, KEEP
Compatible column data types: ANY
Supports multiple columns: Yes
Categories for Categorical Generator
Used in: categories
optional Object.
Depending on the type property value, can be one of several variants (the list is missing from this copy; the Categorical generator example uses type: string with a values list).
Numeric type
Used in: numeric_type (Continuous generator, Quantile generator, Constant numeric generator)
optional String.
Type of numbers used by a generator.
Enum values
- INT
- LONG
- DOUBLE
- FLOAT
- BIG_DECIMAL
- BIG_INTEGER
- SHORT
- BYTE
- UNSIGNED_BYTE
- UNSIGNED_INTEGER
- UNSIGNED_LONG
- UNSIGNED_SHORT
Hashing group
Used in: groups
Object.
A pair consisting of a selector and a list of alphabets. The selector chooses characters from the input string; an alphabet is the set of characters used to replace them.
Properties
- selector: Hashing group selector.
- alphabets: array of Format preserving hashing group alphabet.
Format preserving hashing filter
Used in: filter
optional Object.
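Depending on the type property value, can be one of the following:
Key | Link |
first | First N characters |
last | Last N characters |
characters | Specified characters |
substring | Specified substring |
regex | Regex filter |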
Value length exceeded mode
Used in: length_exceeded_mode (Person generator, Address generator)
optional String.
Action to take when a value exceeds the column length.
Modes:
- IGNORE (default): error if the value exceeds the column length
- TRUNCATE: truncate the value to the field length
Enum values
- IGNORE
- TRUNCATE
JSON Pointer Transformer Specification
Used in: specifications
Properties
- pointers: array of String. JSON Pointers (as specified by RFC 6901).
- transformation: Transformations.
- ignore_errors: Boolean. Controls the behaviour when no JSON node is found at the pointer or the node has a type incompatible with the specified transformer. If this setting is true, the found JSON node, if any, will remain unchanged. If the setting is false, an error will be raised. Default is false.
Hashing group selector
Used in: selector
Object.
Depending on the type property value, can be one of the following: digits, lower_letters, upper_letters, word_characters, regex.
Format preserving hashing group alphabet
Used in: alphabets
Object.
Depending on the type property value, can be one of the following: digits, lower_letters, upper_letters, custom.
First N characters
Mask only first N characters of the input string
Properties
- type = first
- n: Integer (int32).
Last N characters
Mask only last N characters of the input string
Properties
- type = last
- n: Integer (int32).
Specified characters
Mask only specified characters of the input string
Properties
- type = characters
- characters: String.
- ignore_case: Boolean.
Specified substring
Mask only specified substring of the input string
Properties
- type = substring
- substring: String.
- ignore_case: Boolean.
Regex filter
Mask only characters filtered by regex
Properties
- type = regex
- pattern: String.
- ignore_case: Boolean.
Word characters (equivalent of regex '\w+')
Custom alphabet
Custom alphabet which can consist of characters, unicode blocks and unicode ranges. In total it can contain from 1 to 2^16 characters.
Properties
- type = custom
- parts: optional array of Custom alphabet part.
Custom alphabet part
Used in: parts
optional Object.
Depending on the type property value, can be one of the following: characters (Character set), unicode_block (Unicode block), unicode_range.
Character set
Custom alphabet which can consist of 1 to (2^16) characters. All printable characters from Unicode Basic Multilingual Plane are supported.
Properties
- type = characters
- characters: String.
Unicode block
Unicode block by name. Name of the Unicode block formatted according to the results described in Java’s UnicodeBlock documentation. Examples: "BASIC_LATIN", "Basic Latin". Only the blocks from BMP (codepoints from 0x0000 to 0xFFFF) are supported. You can refer to the Unicode specification to find out the range for a block of interest.
Properties
- type = unicode_block
- name: String.