Masking
By default, TDK copies data over without any modification. This table mode is called KEEP. If we call TDK with an empty configuration file, that would be the equivalent of using:
default_config:
mode: KEEP
When processing production data, we need to ensure that no sensitive information goes to the final destination. We can achieve this if we use MASKING
mode. Let’s start with a simple config:
default_config:
mode: MASKING
safety_mode: "RELAXED"
If the output database is not empty, we can add the following settings for the output database:
default_config:
mode: MASKING
table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"
Above we have just defined the following: truncate all tables in the output database and create schema only if it does not exist. For more options, you can check the YAML configuration part of the documentation.
Given that you run demo databases explained in Getting Started, you can use the following command to run the transformation:
java -jar tdk.jar \
-iu jdbc:h2:tcp://localhost:9123/input;DATABASE_TO_LOWER=TRUE -iU sa \
-ou jdbc:h2:tcp://localhost:9124/output;DATABASE_TO_LOWER=TRUE -oU sa \
-c config.yaml
TDK tries to detect the best masking mechanism based on the input type.
You can see what transformations were applied by using the dry-run
command (see more arguments here: CLI reference)
Let’s check the customers table:
CUSTOMERNUMBER | CUSTOMERNAME | CONTACTLASTNAME | CONTACTFIRSTNAME | PHONE | ADDRESSLINE1 |
---|---|---|---|---|---|
103 |
Atelier graphique |
Schmitt |
Carine |
40.32.2555 |
54, rue Royale |
112 |
Signal Gift Stores |
King |
Jean |
5551838 |
8489 Strong St. |
114 |
Australian Collectors, Co. |
Ferguson |
Peter |
03 9520 4555 |
636 St Kilda Road |
119 |
La Rochelle Gifts |
Labrune |
Janine |
40.67.8555 |
67, rue des Cinquante Otages |
CUSTOMERNUMBER | CUSTOMERNAME | CONTACTLASTNAME | CONTACTFIRSTNAME | PHONE | ADDRESSLINE1 |
---|---|---|---|---|---|
7604416 |
Fwomnubri Jdorefqpfjn Vepbwfoe |
Iylapd |
Wqvj |
7010-370953 |
Lmmejweaooyha 954 |
29919216 |
Ubkcg & Hhitqfm Wz |
Hjpxftbrdybfxev |
Rgykous |
(72) 478-1967 |
um. Akseehys 97 |
65286067 |
Pwaibpu Lmypt |
Llsrsmu |
Grrnbqbi |
281-942530 |
Sqy Teyfbglp qo Cdqo 21 |
68022786 |
Fudqrgjjycmz Sau Gsct Dms. |
Ywsyew |
Uurxn |
5919048124 |
7833 Xtuvvhu Gb. |
As you can see, all strings were masked with random characters. You can also notice that the format of strings, such as telephone numbers, is kept. This approach is called Format Preserving Encryption
. Read more about transformations here.
If we have a look at the products table, it also has been changed:
OFFICECODE | CITY | PHONE | ADDRESSLINE1 | ADDRESSLINE2 | STATE |
---|---|---|---|---|---|
1 |
San Francisco |
+1 650 219 4782 |
100 Market Street |
Suite 300 |
CA |
2 |
Boston |
+1 215 837 0825 |
1550 Court Place |
Suite 102 |
MA |
3 |
NYC |
+1 212 555 3000 |
523 East 53rd Street |
apt. 5A |
NY |
OFFICECODE | CITY | PHONE | ADDRESSLINE1 | ADDRESSLINE2 | STATE |
---|---|---|---|---|---|
1 |
Ozc Zuuokybyl |
+7 880 078 0951 |
654 Dnqsaj Bwcnag |
Nlnet 086 |
CO |
2 |
Jskfmy |
+8 005 822 3352 |
2010 Jyuyl Yjprh |
Atika 383 |
KC |
3 |
XNR |
+1 622 556 5429 |
452 Kebx 77xk Ctrzdk |
ljh. 2C |
OX |
Sometimes we want to keep some tables as is. For that, we can use table settings overrides:
default_config:
mode: MASKING
tables:
- table_name_with_schema: "public.offices"
mode: "KEEP"
table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"
Please note that the mode defined in default_config
is applied automatically to all tables and
tables
work as override for that global default.
Also, we can customise our masking for others tables. Let’s imagine we need more realistic names, addresses, and phone numbers in the table customers
. For this, we can use person_generator, address_generator and formatted_string_generator :
default_config:
mode: MASKING
tables:
- table_name_with_schema: "public.customers"
transformations:
- columns: ["customername", "contactlastname", "contactfirstname"]
params:
type: "person_generator"
column_templates: ["${last_name} ${first_name}", "${last_name}", "${first_name}"]
- columns: ["addressline1"]
params:
type: "address_generator"
column_templates: ["${zip_code}, ${city}"]
- columns: ["phone"]
params:
type: "formatted_string_generator"
pattern: "\\+44[0-9]{4} [0-9]{6}"
table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"
Let’s see the result:
CUSTOMERNUMBER | CUSTOMERNAME | CONTACTLASTNAME | CONTACTFIRSTNAME | PHONE | ADDRESSLINE1 |
---|---|---|---|---|---|
103 |
Atelier graphique |
Schmitt |
Carine |
40.32.2555 |
54, rue Royale |
112 |
Signal Gift Stores |
King |
Jean |
5551838 |
8489 Strong St. |
114 |
Australian Collectors, Co. |
Ferguson |
Peter |
03 9520 4555 |
636 St Kilda Road |
119 |
La Rochelle Gifts |
Labrune |
Janine |
40.67.8555 |
67, rue des Cinquante Otages |
CUSTOMERNUMBER | CUSTOMERNAME | CONTACTLASTNAME | CONTACTFIRSTNAME | PHONE | ADDRESSLINE1 |
---|---|---|---|---|---|
7604416 |
Kshlerin Gale |
Kshlerin |
Gale |
+441835 660356 |
F5J 5EY, Lake Arlene |
29919216 |
Glover Chasidy |
Glover |
Chasidy |
+440644 860341 |
H2P 9FL, Chuckborough |
65286067 |
Purdy Mara |
Purdy |
Mara |
+444110 954168 |
MD9A 6AL, Carminemouth |
68022786 |
Gleason Annetta |
Gleason |
Annetta |
+442048 764663 |
R2 3JU, Moenberg |
More transformation parameters can be found here.
On the transformations page, you can find transformation compatibility in modes column. All "generators" can be used in both MASKING and GENERATION modes, but there are some purely "masking" transformations, such as noising , that can only be used in MASKING mode.
|
Let’s combine all examples above:
default_config:
mode: MASKING
tables:
- table_name_with_schema: "public.offices"
mode: "KEEP"
- table_name_with_schema: "public.customers"
transformations:
- columns: ["customername", "contactlastname", "contactfirstname"]
params:
type: "person_generator"
column_templates: ["${last_name} ${first_name}", "${last_name}", "${first_name}"]
- columns: ["addressline1"]
params:
type: "address_generator"
column_templates: ["${zip_code}, ${city}"]
- columns: ["phone"]
params:
type: "formatted_string_generator"
pattern: "\\+44[0-9]{4} [0-9]{6}"
table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
safety_mode: "RELAXED"