Synthesized’s Documentation

Synthesized generates high-quality structured synthetic data.

Overview

We can consider three different stages of the synthesis process which connect data to the information it represents.

  1. Analysis - creating a description/understanding from data.

  2. Augmentation - modifying a description with another description.

  3. Curation - creating data from a given description/understanding.
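The three stages above can be sketched in a few lines of plain Python. This is a toy illustration only, not the Synthesized API: the function names `analyze`, `augment`, and `curate` are hypothetical, and the "description" is assumed to be nothing more than a mean and a standard deviation for a single numeric column.

```python
import random
import statistics

def analyze(values):
    """Analysis: summarize data as a description (here, mean and standard deviation)."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}

def augment(description, override):
    """Augmentation: modify one description with another (override wins on shared keys)."""
    return {**description, **override}

def curate(description, n, seed=0):
    """Curation: create new data from a description by sampling a normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(description["mean"], description["stdev"]) for _ in range(n)]

# Hypothetical input column.
ages = [23, 35, 41, 29, 52, 38]

description = analyze(ages)                      # data -> description
shifted = augment(description, {"mean": 60.0})   # description + description -> description
synthetic = curate(shifted, n=1000)              # description -> data
```

Here the synthetic values keep the spread of the original column but are centered on the overridden mean, which is the kind of change the augmentation stage is meant to express.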

This functionality is offered at two levels, which correspond to different data scales and use cases.

Synthesized’s Data Kits

Scientific Data Kit

The SDK generates high-quality, privacy-preserving datasets for machine learning and data science use cases.

  • Bootstrap datasets

  • Rebalance and impute missing values

  • Create privacy-preserving data

Test Data Kit

For very large test databases with complicated primary-foreign key relationships, our TDK is the tool to use.

  • Maintain referential integrity

  • Subset and mask databases

  • Generate privacy-preserving replicas
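To make the idea of subsetting and masking while keeping referential integrity concrete, here is a minimal sketch in plain Python. It does not use or represent the TDK: the tables, the column names, and the helpers `subset_with_integrity` and `mask_email` are all hypothetical, and the tables are assumed to be small lists of dictionaries.

```python
# Hypothetical parent table (primary key: customer_id).
customers = [
    {"customer_id": 1, "email": "ana@example.com"},
    {"customer_id": 2, "email": "ben@example.com"},
    {"customer_id": 3, "email": "cy@example.com"},
]
# Hypothetical child table (foreign key: customer_id).
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 3},
    {"order_id": 13, "customer_id": 2},
]

def subset_with_integrity(parents, children, keep_ids, key):
    """Keep the chosen parent rows and only the child rows that reference them."""
    kept_parents = [row for row in parents if row[key] in keep_ids]
    kept_children = [row for row in children if row[key] in keep_ids]
    return kept_parents, kept_children

def mask_email(row):
    """Replace the email with a placeholder derived from the key, leaving the key intact."""
    return {**row, "email": f"user{row['customer_id']}@masked.invalid"}

kept_customers, kept_orders = subset_with_integrity(
    customers, orders, keep_ids={1, 3}, key="customer_id"
)
masked_customers = [mask_email(row) for row in kept_customers]
```

Because the key column is never altered, every remaining order still points at a customer that exists in the subset, even though the email addresses no longer match the originals.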

Community support for free versions of SDK and TDK is available here.