Frequently Asked Questions :: Synthesized Docs

There is no limit on the number of columns or tables that can be synthesized as part of the data generation process.

Yes, data generation from schema is enabled using the generation mode of Synthesized TDK.

Synthesized is commonly used to handle terabytes of data.

Database subsetting is available with the software; see the subsetting documentation for details.
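To illustrate the general idea of condition-based subsetting, here is a minimal sketch using Python's built-in sqlite3 module. It is not the TDK configuration syntax, which is described in the product documentation; the table names, columns, and filter are hypothetical. The point it shows is that a subset is driven by a SQL-like condition on one table, and that child rows are kept only when their parent row survives the filter.

```python
import sqlite3

def subset(conn, customer_filter):
    """Copy customers matching a SQL condition, plus their orders, into *_subset tables.

    The filter is treated as a trusted, developer-supplied condition in this sketch.
    """
    cur = conn.cursor()
    cur.execute(
        f"CREATE TABLE customers_subset AS SELECT * FROM customers WHERE {customer_filter}"
    )
    # Keep only orders whose parent customer made it into the subset,
    # so the foreign-key relationship stays consistent.
    cur.execute(
        "CREATE TABLE orders_subset AS "
        "SELECT o.* FROM orders o JOIN customers_subset c ON o.customer_id = c.id"
    )
    conn.commit()

# Hypothetical demo schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
    INSERT INTO customers VALUES (1, 'UK'), (2, 'US'), (3, 'UK');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

subset(conn, "country = 'UK'")
kept_customers = [r[0] for r in conn.execute("SELECT id FROM customers_subset ORDER BY id")]
kept_orders = [r[0] for r in conn.execute("SELECT id FROM orders_subset ORDER BY id")]
```

In this sketch the order belonging to the filtered-out customer is dropped along with it, which is the referential-integrity behaviour a subsetting tool is expected to preserve.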

Workflows are saved in the system and retain the sensitive data annotations for the next run.

Yes, it is possible to apply data synthesis only to selected columns and use the data in the other columns as is.
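As a rough illustration of column-selective processing, the following sketch masks only the columns named by the caller and passes every other column through unchanged. It is a generic Python example, not TDK code: the function names, the salt, and the hash-based masking are assumptions chosen for brevity.

```python
import hashlib

def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a short deterministic digest (illustrative only)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def transform_rows(rows, columns_to_mask):
    """Mask only the listed columns; copy every other column unchanged."""
    return [
        {
            col: mask_value(str(val)) if col in columns_to_mask else val
            for col, val in row.items()
        }
        for row in rows
    ]

# Hypothetical sample data: only "email" is treated as sensitive.
rows = [
    {"id": 1, "email": "alice@example.com", "plan": "pro"},
    {"id": 2, "email": "bob@example.com", "plan": "free"},
]
masked = transform_rows(rows, columns_to_mask={"email"})
```

Because the masking is deterministic, the same input value always maps to the same output, so joins on a masked column would still line up across tables.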

To perform data synthesis, Synthesized TDK performs data profiling as part of the workflow.

Synthesized understands relationships between tables automatically. It also has the capability to persist user-defined relationships.
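The idea of combining automatically discovered relationships with user-defined ones can be sketched as follows. This is a generic example built on SQLite's `PRAGMA foreign_key_list`, not a description of how Synthesized detects relationships internally; the schema and the `user_defined` list are hypothetical.

```python
import sqlite3

def declared_foreign_keys(conn, table):
    """Return (child_table, child_column, parent_table, parent_column) tuples
    for the foreign keys declared on `table`."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # PRAGMA columns: id, seq, table (parent), from (child col), to (parent col), ...
    return [(table, r[3], r[2], r[4]) for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
    -- No declared constraint here: the link to customers is only a convention.
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_email TEXT);
""")

discovered = declared_foreign_keys(conn, "orders")

# A relationship the database does not declare, supplied by the user.
user_defined = [("tickets", "customer_email", "customers", "email")]

# Merge both sources, dropping duplicates while keeping order.
relationships = list(dict.fromkeys(discovered + user_defined))
```

A tool that persists the user-defined entries alongside the workflow can then reuse them on later runs without the user declaring them again.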

Synthesized supports all relational databases. The following are available out of the box with no additional configuration: PostgreSQL, MySQL, MariaDB, Oracle, MSSQL, SQLite, DB2. Additional relational database support is provided for the following databases: Aurora MySQL Edition, Aurora PostgreSQL Edition, Azure SQL Data Warehouse (Azure Synapse Analytics), Derby, Firebird, H2, HANA, HSQLDB, Informix, Ingres, Microsoft Access, Redshift, Sybase Adaptive Server Enterprise, Sybase SQL Anywhere, Teradata, Vertica.

The detailed workflow configuration and execution plan are available both before and after the execution. This includes a list of all transformations and their parameters for all columns and tables in the processed schemas. Extensive logs cover the whole workflow execution: connecting to the databases, every transformation performed, and the inserts, along with all warnings and errors that occurred.

Synthesized TDK supports integrations with various secret managers. More options can be added on request. The HashiCorp Vault integration tutorial shows how such an integration can be used with the TDK CLI.

The primary masking transformer, format_preserving_hashing, uses the FF1 encryption algorithm (Section 5.1, FF1).
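To give a feel for what format-preserving encryption means in practice, here is a toy Feistel cipher over decimal strings. It is deliberately not FF1: FF1 uses AES as its round function and has precise parameter rules, whereas this sketch swaps in HMAC-SHA256 from the Python standard library so that it stays self-contained. It is not the implementation behind `format_preserving_hashing` and is not suitable for production use. What it does show is the two properties that matter for masking: the output keeps the length and alphabet of the input, and the transformation is reversible with the key.

```python
import hashlib
import hmac

ROUNDS = 10  # must be even so the two halves end up with their original lengths

def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Pseudo-random integer derived from one half of the input (HMAC-SHA256, not AES)."""
    digest = hmac.new(key, f"{round_no}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus

def _check(digits: str) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two ASCII digits")

def encrypt_digits(key: bytes, digits: str) -> str:
    """Encrypt a decimal string into another decimal string of the same length."""
    _check(digits)
    u = len(digits) // 2
    left, right = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = len(left)
        c = (int(left) + _round_value(key, i, right, 10 ** m)) % (10 ** m)
        left, right = right, str(c).zfill(m)
    return left + right

def decrypt_digits(key: bytes, digits: str) -> str:
    """Invert encrypt_digits by running the rounds backwards."""
    _check(digits)
    u = len(digits) // 2
    left, right = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = len(right)
        a = (int(right) - _round_value(key, i, left, 10 ** m)) % (10 ** m)
        left, right = str(a).zfill(m), left
    return left + right

key = b"demo-key-not-for-production"
card = "4111111111111111"
token = encrypt_digits(key, card)
restored = decrypt_digits(key, token)
```

The masked value remains a 16-digit string, so it fits the same column type and passes the same length checks as the original, and anyone holding the key can recover the original value.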

Yes, it's possible. We have experience integrating with various solutions such as BigID. In such an integration, insights from data profiling and PII discovery can be used to auto-tune the workflow configuration. TDK can be integrated with open-source data profiling and PII discovery solutions in a similar way.

Yes, TDK supports integration with Flyway, and a demo is available. Flyway supports drift detection. Synthesized TDK focuses on data masking and generation and can be easily integrated with other tools for comparing the source and target databases.

Synthesized TDK is designed to process complex enterprise schemas with extremely flexible YAML configuration.

BLOB and other special data types require additional user configuration, which is possible through flexible value mapping.

Yes, after starting the workflow with the Synthesized UI, the 'Cancel run' button is available to stop execution at any step. In the CLI, the process can simply be interrupted.

No, GPU support is not required, even for a higher number of columns. The software is already optimized to run on CPUs.

Frequently Asked Questions

When synthesizing data, is there any limitation on the number of columns or tables?

Is there an option to use only a canonical data schema (structure only) to produce a dataset?

What is the maximum database size handled? What are the infrastructure considerations to scale performance with size?

Is it possible to subset the data using specific test conditions, like using SQL query-like filtering?

How does the software remember the data profiling and sensitive data identification for the next run?

Is it possible to apply data synthesis only to selected columns and use other column data as it is?

Is there a capability within the tool to perform data profiling (query and understand data)?

Can you define custom relationships amongst the tables? If so, does the tool have the capability to persist user-defined relationships?

Please provide a list of supported databases.

What types of logs and reports are available post-masking and post-generation execution to aid in identifying and sampling the results?

Regarding database password security while executing the workflows via CLI commands, how can we encrypt the DB credentials in the CLI commands or through API channels?

Does the Synthesized capability support encryption masking with various encryption methods, and does it also allow reverse masking through decryption transformations?

Profiling and PII discovery tool integration: since Synthesized does not have built-in PII scanning functionality, can it be integrated with open-source data profiling and personally identifiable information (PII) discovery solutions?

Can Synthesized be integrated with any open-source tools that can compare schemas and detect schema changes between the source and target databases? Does Synthesized assist in comparing data between the source and target databases, ensuring consistency and accuracy?

Does Synthesized TDK support SaaS-based databases and applications like Oracle ERP, Salesforce, etc.?

Does Synthesized support masking or generating BLOB data types and other special data types?

Is it possible to pause or abort a TDK execution (masking, generation) once it has started? What execution controls are available during TDK execution?

With a higher number of columns and tables, do we need GPU support?