Masking

By default, TDK copies data over without any modification. This table mode is called KEEP. If we call TDK with an empty configuration file, that would be the equivalent of using:

default_config:
  mode: KEEP

When processing production data, we need to ensure that no sensitive information goes to the final destination. We can achieve this if we use MASKING mode. Let’s start with a simple config:

default_config:
  mode: MASKING
safety_mode: "RELAXED"

If the output database is not empty, we can add the following settings for the output database:

default_config:
  mode: MASKING

table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"

Above we have just defined the following: truncate all tables in the output database and create schema only if it does not exist. For more options, you can check the YAML configuration part of the documentation.

Given that you run demo databases explained in Getting Started, you can use the following command to run the transformation:

tdk \
    -iu jdbc:h2:tcp://localhost:9123/input;DATABASE_TO_LOWER=TRUE -iU sa \
    -ou jdbc:h2:tcp://localhost:9124/output;DATABASE_TO_LOWER=TRUE -oU sa \
    -c config.yaml

TDK tries to detect the best masking mechanism based on the input type. You can see what transformations were applied by using the dry-run command (see more arguments here: CLI reference)

Let’s check the customers table:

Table 1. CUSTOMERS table before masking
CUSTOMERNUMBER CUSTOMERNAME CONTACTLASTNAME CONTACTFIRSTNAME PHONE ADDRESSLINE1

103

Atelier graphique

Schmitt

Carine

40.32.2555

54, rue Royale

112

Signal Gift Stores

King

Jean

5551838

8489 Strong St.

114

Australian Collectors, Co.

Ferguson

Peter

03 9520 4555

636 St Kilda Road

119

La Rochelle Gifts

Labrune

Janine

40.67.8555

67, rue des Cinquante Otages

Table 2. CUSTOMERS table after masking
CUSTOMERNUMBER CUSTOMERNAME CONTACTLASTNAME CONTACTFIRSTNAME PHONE ADDRESSLINE1

7604416

Fwomnubri Jdorefqpfjn Vepbwfoe

Iylapd

Wqvj

7010-370953

Lmmejweaooyha 954

29919216

Ubkcg & Hhitqfm Wz

Hjpxftbrdybfxev

Rgykous

(72) 478-1967

um. Akseehys 97

65286067

Pwaibpu Lmypt

Llsrsmu

Grrnbqbi

281-942530

Sqy Teyfbglp qo Cdqo 21

68022786

Fudqrgjjycmz Sau Gsct Dms.

Ywsyew

Uurxn

5919048124

7833 Xtuvvhu Gb.

As you can see, all strings were masked with random characters. You can also notice that the format of strings, such as telephone numbers, is kept. This approach is called Format Preserving Encryption. Read more about transformations here.

If we have a look at the products table, it also has been changed:

Table 3. OFFICES table before masking
OFFICECODE CITY PHONE ADDRESSLINE1 ADDRESSLINE2 STATE

1

San Francisco

+1 650 219 4782

100 Market Street

Suite 300

CA

2

Boston

+1 215 837 0825

1550 Court Place

Suite 102

MA

3

NYC

+1 212 555 3000

523 East 53rd Street

apt. 5A

NY

Table 4. OFFICES table after masking
OFFICECODE CITY PHONE ADDRESSLINE1 ADDRESSLINE2 STATE

1

Ozc Zuuokybyl

+7 880 078 0951

654 Dnqsaj Bwcnag

Nlnet 086

CO

2

Jskfmy

+8 005 822 3352

2010 Jyuyl Yjprh

Atika 383

KC

3

XNR

+1 622 556 5429

452 Kebx 77xk Ctrzdk

ljh. 2C

OX

Sometimes we want to keep some tables as is. For that, we can use table settings overrides:

default_config:
  mode: MASKING

tables:
  - table_name_with_schema: "public.offices"
    mode: "KEEP"

table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"

Please note that the mode defined in default_config is applied automatically to all tables and tables work as override for that global default.

Also, we can customise our masking for others tables. Let’s imagine we need more realistic names, addresses, and phone numbers in the table customers. For this, we can use person_generator, address_generator and formatted_string_generator :

default_config:
  mode: MASKING

tables:
  - table_name_with_schema: "public.customers"
    transformations:
      - columns: ["customername", "contactlastname", "contactfirstname"]
        params:
          type: "person_generator"
          column_templates: ["${last_name} ${first_name}", "${last_name}", "${first_name}"]
      - columns: ["addressline1"]
        params:
          type: "address_generator"
          column_templates: ["${zip_code}, ${city}"]
      - columns: ["phone"]
        params:
          type: "formatted_string_generator"
          pattern: "\\+44[0-9]{4} [0-9]{6}"

table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"

Let’s see the result:

Table 5. CUSTOMERS table before masking
CUSTOMERNUMBER CUSTOMERNAME CONTACTLASTNAME CONTACTFIRSTNAME PHONE ADDRESSLINE1

103

Atelier graphique

Schmitt

Carine

40.32.2555

54, rue Royale

112

Signal Gift Stores

King

Jean

5551838

8489 Strong St.

114

Australian Collectors, Co.

Ferguson

Peter

03 9520 4555

636 St Kilda Road

119

La Rochelle Gifts

Labrune

Janine

40.67.8555

67, rue des Cinquante Otages

Table 6. CUSTOMERS table before masking
CUSTOMERNUMBER CUSTOMERNAME CONTACTLASTNAME CONTACTFIRSTNAME PHONE ADDRESSLINE1

7604416

Kshlerin Gale

Kshlerin

Gale

+441835 660356

F5J 5EY, Lake Arlene

29919216

Glover Chasidy

Glover

Chasidy

+440644 860341

H2P 9FL, Chuckborough

65286067

Purdy Mara

Purdy

Mara

+444110 954168

MD9A 6AL, Carminemouth

68022786

Gleason Annetta

Gleason

Annetta

+442048 764663

R2 3JU, Moenberg

More transformation parameters can be found here.

On the transformations page, you can find transformation compatibility in modes column. All "generators" can be used in both MASKING and GENERATION modes, but there are some purely "masking" transformations, such as noising, that can only be used in MASKING mode.

Let’s combine all examples above:

default_config:
  mode: MASKING

tables:
  - table_name_with_schema: "public.offices"
    mode: "KEEP"
  - table_name_with_schema: "public.customers"
    transformations:
      - columns: ["customername", "contactlastname", "contactfirstname"]
        params:
          type: "person_generator"
          column_templates: ["${last_name} ${first_name}", "${last_name}", "${first_name}"]
      - columns: ["addressline1"]
        params:
          type: "address_generator"
          column_templates: ["${zip_code}, ${city}"]
      - columns: ["phone"]
        params:
          type: "formatted_string_generator"
          pattern: "\\+44[0-9]{4} [0-9]{6}"

table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"