Architecture Summary

How handling of original data is organized

Once data has been loaded onto the platform from a data source, Synthesized handles it internally in three steps:

  • Annotate and preprocess data: the software infers data formats and types automatically, and it can handle missing data and erroneous values.

  • Build a mathematical generative model of data: the software builds a generative representation, a mathematical equation that encapsulates what the properties of the data should look like. Internally, this equation allows the user to take pure noise (a sequence of standard normal random variables) and transform it into output data that has the properties of the original data.

  • Synthesize a new dataset from the generative model: once the generative model is trained, it can be used to generate new samples of data on demand. The software also supports data manipulation, which can rebalance some of the variables so that the output data has the desired properties.
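
The three steps above can be sketched with a deliberately simple stand-in model. This is not Synthesized's actual model or API: it assumes a multivariate Gaussian fitted with NumPy, and the function names (preprocess, fit_gaussian_model, synthesize) are hypothetical. It only illustrates how standard normal noise can be transformed into rows that share the mean and covariance of the original data.

```python
import numpy as np


def preprocess(data):
    """Step 1 (illustrative): replace missing values (NaN) with the column mean."""
    data = np.asarray(data, dtype=float)
    col_means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), col_means, data)


def fit_gaussian_model(data):
    """Step 2 (illustrative): estimate the mean and a Cholesky factor of the covariance.

    A real generative model is far more expressive; a Gaussian is assumed
    here only to keep the sketch short.
    """
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    chol = np.linalg.cholesky(cov)  # requires a positive-definite covariance
    return mean, chol


def synthesize(model, n_rows, rng):
    """Step 3 (illustrative): map standard normal noise to synthetic rows.

    Only the fitted parameters are used here, never the original rows.
    """
    mean, chol = model
    noise = rng.standard_normal((n_rows, mean.shape[0]))
    return mean + noise @ chol.T


rng = np.random.default_rng(0)

# Toy "original" table with two correlated numeric columns.
original = rng.multivariate_normal([10.0, 50.0], [[4.0, 3.0], [3.0, 9.0]], size=5000)
original[::100, 1] = np.nan  # simulate some missing values

clean = preprocess(original)
model = fit_gaussian_model(clean)
synthetic = synthesize(model, 5000, rng)

print("original mean :", clean.mean(axis=0))
print("synthetic mean:", synthetic.mean(axis=0))
```

In this sketch the third step touches only the fitted parameters (mean and Cholesky factor), which mirrors the idea that generation itself does not need access to the original rows.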

Architecture used for handling tabular data

The process of handling tabular data is illustrated in Figure 1 below. The software interacts with the original data only during the first two steps; the final step does not involve the original data. The user can compare the statistical properties of the output data with those of the original data, either through the SDK's built-in methods or through the web interface.

Figure 1: Overview of Synthesized's SDK

The SDK is a self-contained package and does not require an internet connection.
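
To make the comparison of statistical properties mentioned above more concrete, here is a minimal sketch of a column-by-column check between an original and a synthetic table. It does not use the SDK's own methods, which are not shown in this document. The helper names (ks_statistic, compare_columns) and the column names are hypothetical, and the two-sample Kolmogorov-Smirnov statistic is simply one common choice of distance between distributions.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def compare_columns(original, synthetic, names):
    """Report per-column differences in mean, standard deviation, and KS distance."""
    report = {}
    for j, name in enumerate(names):
        o, s = original[:, j], synthetic[:, j]
        report[name] = {
            "mean_diff": float(abs(o.mean() - s.mean())),
            "std_diff": float(abs(o.std() - s.std())),
            "ks": ks_statistic(o, s),
        }
    return report


rng = np.random.default_rng(1)
names = ["age", "income"]  # hypothetical column names

original = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(2000, 2))
good = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(2000, 2))  # same distribution
bad = good + np.array([1.0, 0.0])  # first column shifted by one unit

report_good = compare_columns(original, good, names)
report_bad = compare_columns(original, bad, names)

for name in names:
    print(name, "good:", report_good[name], "bad:", report_bad[name])
```

A KS value near 0 suggests that the two columns have similar distributions, while a value near 1 indicates almost no overlap. Per-column checks like this do not capture relationships between columns, so a fuller comparison would also look at correlations.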